Glossary
Every abbreviation, scenario code, metric, and statistical term that appears elsewhere in the dashboard, defined in one place.
Looking for the conceptual why instead of the literal what? See the About page.
Policies under test
- MAS (mas)
- Multi-Agent System — the architecture under study. Five cooperating agents (forecasting, drift detection, replenishment, inventory state, coordinator) share the inventory loop.
- Static ROP (static_rop)
- Baseline policy. Reorder Point and order quantity are fixed at the start of the simulation from historical mean demand and never adapt. Stand-in for the simplest textbook approach used widely in small-to-mid retailers.
- Periodic Forecasting (periodic_forecasting)
- Baseline policy. Re-fits the demand forecast on a fixed schedule (every N steps) regardless of whether demand has changed. Stronger than Static ROP — adapts to slow trends — but blind to abrupt shifts between refit cycles.
Forecasting
- Tiered forecasting
- Forecaster selection per SKU based on data sufficiency: MA → SES → Holt-Winters. New / sparse SKUs get a simple moving average; rich-history SKUs graduate to seasonally-decomposed models.
- MA (moving_average)
- Moving Average. Mean of demand over the last w observations.
- SES (simple_exp_smoothing)
- Simple Exponential Smoothing. Forecast is an exponentially-weighted average of past demand. Tracks a slowly changing demand level; has no explicit trend or seasonal component.
- Holt-Winters
- Triple Exponential Smoothing — level + trend + seasonal components. Used for SKUs with enough history to identify a seasonal cycle.
- MAPE (mean_abs_pct_error)
- Mean Absolute Percentage Error. Forecast quality metric: average of |actual − forecast| / actual. Lower is better; 0% is perfect.
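The MA, SES, and MAPE definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the dashboard's implementation: the function names, the default smoothing factor `alpha=0.3`, and the choice to skip zero-demand steps in MAPE (where the ratio is undefined) are all assumptions.

```python
def moving_average(history, w):
    """Forecast = mean of the last w observations."""
    window = history[-w:]
    return sum(window) / len(window)


def simple_exp_smoothing(history, alpha=0.3):
    """Forecast = exponentially weighted average of past demand.

    alpha is a hypothetical default; higher values react faster.
    """
    level = history[0]
    for y in history[1:]:
        level = alpha * y + (1.0 - alpha) * level
    return level


def mape(actuals, forecasts):
    """Mean of |actual - forecast| / |actual|, as a percentage.

    Assumption: steps with actual == 0 are skipped, because the
    ratio is undefined there.
    """
    pairs = [(a, f) for a, f in zip(actuals, forecasts) if a != 0]
    if not pairs:
        raise ValueError("MAPE is undefined when every actual is zero")
    return 100.0 * sum(abs(a - f) / abs(a) for a, f in pairs) / len(pairs)
```

For example, `moving_average([10, 20, 30, 40], w=2)` averages only the last two observations, while `simple_exp_smoothing` gives every past observation some weight.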
Drift detection
- Concept drift
- Change over time in the distribution generating the data. In inventory: yesterday's demand model no longer predicts today's demand.
- ADWIN (adaptive_windowing)
- ADaptive WINdowing (Bifet & Gavaldà, 2007). Streaming change detector that maintains a variable-size window of recent observations and signals when the means of an older and a newer sub-window differ by more than a statistical bound, then drops the older portion. Triggers a forecaster refit when it fires.
- Per-SKU vs global drift
- We run one ADWIN per SKU on forecast residuals (catches SKU-specific shifts) and a second ADWIN on the population-mean residual (catches market-wide shifts that sparse per-SKU streams miss).
- Refit
- Re-training the forecasting model on the most recent window of data. Triggered either by ADWIN (MAS) or on a fixed schedule (Periodic Forecasting).
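To make the ADWIN idea concrete, here is a deliberately naive sketch of its cut test. It is not the dashboard's detector and not the full algorithm: it stores the raw window instead of compressed buckets, makes at most one cut per update, and assumes inputs are scaled to [0, 1] (residuals would need rescaling first). The class name, `delta`, and `max_window` are hypothetical.

```python
import math


class NaiveAdwin:
    """Simplified ADWIN-style detector (illustrative only)."""

    def __init__(self, delta=0.002, max_window=500):
        self.delta = delta            # confidence parameter (assumed default)
        self.max_window = max_window  # hypothetical cap to bound cost
        self.window = []

    def update(self, x):
        """Add one observation in [0, 1]; return True if drift is signaled."""
        self.window.append(x)
        if len(self.window) > self.max_window:
            self.window.pop(0)
        n = len(self.window)
        total = sum(self.window)
        left_sum = 0.0
        # Try every split of the window into an older and a newer part.
        for split in range(1, n):
            left_sum += self.window[split - 1]
            n0, n1 = split, n - split
            mean0 = left_sum / n0
            mean1 = (total - left_sum) / n1
            m = 1.0 / (1.0 / n0 + 1.0 / n1)  # harmonic-mean term
            eps = math.sqrt(math.log(4.0 * n / self.delta) / (2.0 * m))
            if abs(mean0 - mean1) > eps:
                # Drop the older part; the caller would refit here.
                self.window = self.window[split:]
                return True
        return False
```

In a per-SKU setup, one such detector would be kept per SKU and fed that SKU's scaled forecast residual at every step. A production system would more likely rely on a maintained implementation than on this sketch.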
Inventory math
- (s, S) policy
- Continuous-review inventory rule (Scarf, 1959). When position falls below s (reorder point), order up to S (order-up-to level). Provably optimal under a class of single-item assumptions.
- EOQ (economic_order_qty)
- Economic Order Quantity — the order size that minimizes the sum of holding + ordering costs. Used to derive the default S − s gap.
- Reorder Point (rop)
- Inventory level at which a replenishment order is placed. Computed as lead-time demand + safety stock.
- Safety stock
- Buffer above forecast demand sized to cover demand-during-lead-time variability at a target service level. The MAS version of this is dynamic — it scales with recent forecast uncertainty.
- Lead time
- Number of simulation steps between placing an order and the order landing on-hand.
- On-hand / On-order
- On-hand = units physically in stock right now. On-order = units already ordered but still in transit. Inventory position = on-hand + on-order − backorders.
- SKU (stock_keeping_unit)
- An individual product variant tracked separately in inventory.
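The inventory quantities above fit together as in the sketch below. It uses the textbook formulas, which the glossary does not spell out, so treat them as assumptions: EOQ as sqrt(2 × demand × ordering cost / holding cost), and safety stock as z × σ × sqrt(lead time) with normally distributed demand. Function and parameter names are hypothetical.

```python
import math
from statistics import NormalDist


def eoq(demand_per_step, order_cost, holding_cost_per_unit_step):
    """Classic EOQ: order size minimizing holding + ordering cost."""
    return math.sqrt(2.0 * demand_per_step * order_cost / holding_cost_per_unit_step)


def safety_stock(sigma_per_step, lead_time, service_level):
    """Assumed textbook form: z * sigma * sqrt(lead time)."""
    z = NormalDist().inv_cdf(service_level)
    return z * sigma_per_step * math.sqrt(lead_time)


def reorder_point(mean_per_step, lead_time, ss):
    """Reorder point = expected lead-time demand + safety stock."""
    return mean_per_step * lead_time + ss


def order_quantity(on_hand, on_order, backorders, s, S):
    """(s, S) rule applied to inventory position."""
    position = on_hand + on_order - backorders
    return S - position if position < s else 0.0


# Usage: derive s and S for one SKU, then decide whether to order.
ss = safety_stock(sigma_per_step=8.0, lead_time=4, service_level=0.95)
s = reorder_point(mean_per_step=20.0, lead_time=4, ss=ss)
S = s + eoq(demand_per_step=20.0, order_cost=50.0, holding_cost_per_unit_step=0.5)
qty = order_quantity(on_hand=30.0, on_order=10.0, backorders=0.0, s=s, S=S)
```

A dynamic safety stock, as described for the MAS, would recompute `sigma_per_step` from recent forecast errors rather than fixing it once.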
Drift scenarios
- no_drift
- Stationary demand throughout the horizon. Sanity-check baseline.
- gradual
- Demand mean drifts slowly and continuously over the horizon (e.g. monotonic trend). Tests the "slow shift" case.
- seasonal
- Demand follows a repeating cyclical pattern. Tests whether forecasters pick up the seasonality and the policy doesn't over- or under-react at cycle boundaries.
- abrupt
- Single sharp change in demand level at a known step. Tests how quickly the system detects and adapts.
- severe_abrupt
- Larger-magnitude version of abrupt.
- catastrophic
- Multiple compounding shifts (e.g. abrupt level change + variance spike). Worst-case stress test.
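One way to picture the six scenario codes is as per-step (mean, standard deviation) profiles for demand. The sketch below is purely illustrative: every magnitude (the 1.5× and 2.5× level shifts, the 52-step season, the variance spike), the midpoint change step, and the function name are assumptions, not the values used in the experiments.

```python
import math

SCENARIOS = {"no_drift", "gradual", "seasonal", "abrupt", "severe_abrupt", "catastrophic"}


def demand_profile(scenario, n_steps, base=100.0, change_step=None):
    """Return a list of (mean, std) per step. All magnitudes are hypothetical."""
    if scenario not in SCENARIOS:
        raise ValueError(f"unknown scenario: {scenario}")
    if change_step is None:
        change_step = n_steps // 2
    profile = []
    for t in range(n_steps):
        mean, std = base, 0.1 * base
        if scenario == "gradual":
            # Slow monotonic trend across the whole horizon.
            mean = base * (1.0 + 0.5 * t / max(n_steps - 1, 1))
        elif scenario == "seasonal":
            # Repeating cycle with an assumed 52-step period.
            mean = base * (1.0 + 0.3 * math.sin(2.0 * math.pi * t / 52))
        elif scenario == "abrupt" and t >= change_step:
            mean = 1.5 * base
        elif scenario == "severe_abrupt" and t >= change_step:
            mean = 2.5 * base
        elif scenario == "catastrophic" and t >= change_step:
            # Level change compounded with a variance spike.
            mean, std = 2.5 * base, 0.4 * base
        profile.append((mean, std))
    return profile
```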
Metrics
- Stockout rate
- Fraction of SKU × time-step cells where demand could not be satisfied from on-hand inventory. Primary loss metric for H1.
- Total cost
- Sum of holding cost (per unit-step on-hand) + ordering cost (per order placed) + stockout penalty (per unmet unit). Reported as a single dollar figure across the horizon.
- n_orders
- Total number of replenishment orders placed during the run.
- n_drift_events
- Count of times the per-SKU ADWIN signaled a drift and triggered a forecaster refit during the run.
- n_global_drift_events
- Count of times the global (population-mean) ADWIN signaled. Often much smaller than per-SKU count; correlates with regime shifts.
- n_refits
- Number of forecaster refit operations performed. Equals n_drift_events for MAS and n_steps / refit_period for Periodic.
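The stockout rate and total cost definitions translate directly into code. The sketch below assumes per-SKU, per-step matrices of unmet demand and on-hand units; the data layout, function names, and cost parameter names are hypothetical.

```python
def stockout_rate(unmet):
    """Fraction of SKU x time-step cells with any unmet demand.

    unmet[sku][t] = units of demand not served from on-hand stock.
    """
    cells = [u > 0 for row in unmet for u in row]
    return sum(cells) / len(cells)


def total_cost(on_hand, unmet, n_orders, holding_cost, order_cost, stockout_penalty):
    """Holding (per unit-step) + ordering (per order) + penalty (per unmet unit)."""
    holding = holding_cost * sum(sum(row) for row in on_hand)
    ordering = order_cost * n_orders
    penalty = stockout_penalty * sum(sum(row) for row in unmet)
    return holding + ordering + penalty


# Usage with two SKUs over three steps.
unmet = [[0, 2, 0], [0, 0, 0]]
on_hand = [[5, 0, 3], [4, 4, 4]]
rate = stockout_rate(unmet)
cost = total_cost(on_hand, unmet, n_orders=3,
                  holding_cost=0.5, order_cost=10.0, stockout_penalty=4.0)
```

Note that the rate counts cells, while the penalty term counts units, so one large shortfall moves the cost more than it moves the rate.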
Statistics
- Seed
- RNG seed for one independent replication of an experiment. Reported as seed_001 through seed_010, so every cell is N = 10.
- Mann-Whitney U (mw_p)
- Non-parametric two-sample test of whether one distribution is stochastically greater than another. Our primary significance test because sweep distributions aren't normal.
- Welch's t (welch_p)
- Two-sample t-test that does NOT assume equal variances. Reported alongside Mann-Whitney as a parametric cross-check.
- Cohen's d
- Standardized effect size. |d| ≥ 0.8 = large effect, 0.5 = medium, 0.2 = small. A p-value tells you whether an effect is distinguishable from chance; d tells you how big it is.
- CI95
- 95% confidence interval. Bootstrap-style bounds on the per-cell mean; non-overlapping CIs across policies indicate practical separation.
- p ≤ 0.05
- Statistical significance threshold used throughout. Sub-0.001 values are reported as ≤ 0.001 for compactness.
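Two of the statistics above are easy to sketch with the standard library alone. The version of Cohen's d below uses the pooled standard deviation, and the confidence interval is a percentile bootstrap of the mean; both are common choices but are assumptions here, as are the function names, the resample count, and the fixed seed.

```python
import random
from statistics import mean, stdev


def cohens_d(a, b):
    """Standardized mean difference using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / pooled_var ** 0.5


def bootstrap_ci95(values, n_resamples=2000, seed=0):
    """Percentile bootstrap 95% interval for the mean of one cell."""
    rng = random.Random(seed)
    n = len(values)
    means = sorted(mean(rng.choices(values, k=n)) for _ in range(n_resamples))
    lower = means[int(0.025 * n_resamples)]
    upper = means[int(0.975 * n_resamples) - 1]
    return lower, upper
```

With N = 10 seeds per cell, `values` would hold the ten per-seed results for one policy in one scenario. The p-values themselves (Mann-Whitney U, Welch's t) are better taken from a statistics library than reimplemented.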
Experiment organization
- Sweep
- A batch of seed runs for a single configuration. Output of one sweep is one row under results/<config_name>/.
- Ablation
- Variant of the MAS with one component disabled (e.g. no ADWIN, no safety stock, MA-only forecaster). Used to attribute performance contributions to specific design choices. See /reports/ablation.
- Family
- Experiments are grouped into families: h1 (six scenarios × three policies), h3 (scale sweep, 50 → 1000 SKUs), ablation (component removal), and custom (everything else).
- Tier promotion
- When a SKU accumulates enough observations to support a more expressive forecaster, it is "promoted" from MA to SES or SES to Holt-Winters mid-simulation.
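Tier selection and promotion can be sketched as a simple threshold rule on the number of observations. The thresholds below (10 observations for SES, two full seasonal cycles for Holt-Winters), the function names, and the `holt_winters` code are assumptions for illustration; the glossary does not state the actual cut-offs.

```python
def select_tier(n_obs, season_length, min_ses_obs=10, min_cycles=2):
    """Pick a forecaster tier from data sufficiency (hypothetical thresholds)."""
    if n_obs >= min_cycles * season_length:
        return "holt_winters"
    if n_obs >= min_ses_obs:
        return "simple_exp_smoothing"
    return "moving_average"


def is_promoted(n_obs_before, n_obs_after, season_length):
    """True when new observations move the SKU to a different tier."""
    return select_tier(n_obs_before, season_length) != select_tier(n_obs_after, season_length)
```

For instance, with a 12-step season a SKU would start on the moving average, move to SES after 10 observations, and reach Holt-Winters after 24.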