Glossary

Every abbreviation, scenario code, metric, and statistical term that appears elsewhere in the dashboard, defined in one place.

Looking for the conceptual why instead of the literal what? See the About page.

Policies under test

MAS
mas
Multi-Agent System — the architecture under study. Five cooperating agents (forecasting, drift detection, replenishment, inventory state, coordinator) share the inventory loop.
Static ROP
static_rop
Baseline policy. Reorder Point and order quantity are fixed at the start of the simulation from historical mean demand and never adapt. Stand-in for the simplest textbook approach used widely in small-to-mid retailers.
Periodic Forecasting
periodic_forecasting
Baseline policy. Re-fits the demand forecast on a fixed schedule (every N steps) regardless of whether demand has changed. Stronger than Static ROP — adapts to slow trends — but blind to abrupt shifts between refit cycles.

Forecasting

Tiered forecasting
Forecaster selection per SKU based on data sufficiency: MA → SES → Holt-Winters. New / sparse SKUs get a simple moving average; rich-history SKUs graduate to seasonally-decomposed models.
MA
moving_average
Moving Average. Mean of demand over the last w observations.
SES
simple_exp_smoothing
Simple Exponential Smoothing. Forecast is an exponentially weighted average of past demand. Tracks a slowly changing demand level; models neither trend nor seasonality.
Holt-Winters
Triple Exponential Smoothing — level + trend + seasonal components. Used for SKUs with enough history to identify a seasonal cycle.
MAPE
mean_abs_pct_error
Mean Absolute Percentage Error. Forecast quality metric: average of |actual − forecast| / |actual|, expressed as a percentage. Lower is better; 0% is perfect. Undefined for steps where actual demand is zero.
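The MA, SES, and MAPE definitions above can be sketched in a few lines of Python. This is an illustrative sketch, not the dashboard's implementation: the function names, the `alpha` smoothing parameter, the choice to seed SES with the first observation, and the choice to skip zero-demand steps in MAPE are all assumptions.

```python
def moving_average(history, w):
    """Forecast = mean of the last w observations (all of them if fewer exist)."""
    window = history[-w:]
    return sum(window) / len(window)


def simple_exp_smoothing(history, alpha):
    """Forecast = exponentially weighted average of past demand.

    Assumption: the level is initialised with the first observation.
    alpha in (0, 1]; larger alpha weights recent demand more heavily.
    """
    level = history[0]
    for y in history[1:]:
        level = alpha * y + (1 - alpha) * level
    return level


def mape(actual, forecast):
    """Mean absolute percentage error, in percent.

    Assumption: steps with zero actual demand are skipped, because the
    ratio is undefined there.
    """
    pairs = [(a, f) for a, f in zip(actual, forecast) if a != 0]
    return 100.0 * sum(abs(a - f) / abs(a) for a, f in pairs) / len(pairs)
```

For example, `moving_average([10, 12, 14, 16], w=2)` averages only the last two observations, while `simple_exp_smoothing` with a small `alpha` reacts more slowly to the most recent value.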

Drift detection

Concept drift
Change in the distribution that is generating the data over time. In inventory: yesterday's demand model no longer predicts today's demand.
ADWIN
adaptive_windowing
ADaptive WINdowing (Bifet & Gavaldà, 2007). Streaming change detector that maintains a variable-size window of recent observations and signals when the means of an older and a newer sub-window differ by more than a statistical bound. On a signal it drops the older data from the window. Triggers a forecaster refit when it fires.
Per-SKU vs global drift
We run one ADWIN per SKU on forecast residuals (catches SKU-specific shifts) and a second ADWIN on the population-mean residual (catches market-wide shifts that sparse per-SKU streams miss).
Refit
Re-training the forecasting model on the most recent window of data. Triggered either by ADWIN (MAS) or on a fixed schedule (Periodic Forecasting).
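To make the windowing idea concrete, here is a deliberately simplified ADWIN-style detector. It is a sketch under stated assumptions, not the published algorithm in full and not necessarily what the dashboard runs: it stores the raw window instead of ADWIN's compressed buckets, it assumes inputs are scaled to [0, 1] (residuals would need normalising first), and the class name, `delta` value, and `min_sub_window` guard are hypothetical.

```python
import math


class SimpleAdwin:
    """Simplified ADWIN-style detector (raw window, no bucket compression)."""

    def __init__(self, delta=0.002, min_sub_window=5):
        self.delta = delta                    # confidence parameter (assumed value)
        self.min_sub_window = min_sub_window  # hypothetical guard against tiny splits
        self.window = []

    def update(self, x):
        """Add one observation in [0, 1]; return True if drift was signalled."""
        self.window.append(x)
        drift = False
        shrinking = True
        while shrinking:
            shrinking = False
            n = len(self.window)
            total = sum(self.window)
            head_sum = 0.0
            for n0 in range(1, n):
                head_sum += self.window[n0 - 1]
                n1 = n - n0
                if n0 < self.min_sub_window or n1 < self.min_sub_window:
                    continue
                mean_old = head_sum / n0
                mean_new = (total - head_sum) / n1
                # Hoeffding-style cut threshold from the ADWIN paper.
                m = 1.0 / (1.0 / n0 + 1.0 / n1)
                eps = math.sqrt(math.log(4.0 * n / self.delta) / (2.0 * m))
                if abs(mean_old - mean_new) > eps:
                    # Drop the older sub-window and re-check what remains.
                    self.window = self.window[n0:]
                    drift = True
                    shrinking = True
                    break
        return drift


def drift_indices(stream, delta=0.002):
    """Indices at which the detector fires; each would trigger a refit."""
    detector = SimpleAdwin(delta)
    return [i for i, x in enumerate(stream) if detector.update(x)]
```

On a stream that sits at 0.2 for 100 steps and then jumps to 0.8, this detector stays silent during the stationary stretch and fires some steps after the jump, once the newer sub-window holds enough observations to clear the threshold. Running one detector per SKU plus one on the population-mean residual would mirror the per-SKU vs global setup described above.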

Inventory math

(s, S) policy
Continuous-review inventory rule (Scarf, 1959). When position falls below s (reorder point), order up to S (order-up-to level). Provably optimal under a class of single-item assumptions.
EOQ
economic_order_qty
Economic Order Quantity — the order size that minimizes the sum of holding + ordering costs. Used to derive the default S − s gap.
Reorder Point
rop
Inventory level at which a replenishment order is placed. Computed as lead-time demand + safety stock.
Safety stock
Buffer above forecast demand sized to cover demand-during-lead-time variability at a target service level. The MAS version of this is dynamic — it scales with recent forecast uncertainty.
Lead time
Number of simulation steps between placing an order and the order landing on-hand.
On-hand / On-order
On-hand = units physically in stock right now. On-order = units already ordered but still in transit. Inventory position = on-hand + on-order − backorders.
SKU
stock_keeping_unit
An individual product variant tracked separately in inventory.
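The inventory quantities above fit together roughly as follows. This is a minimal sketch using the standard textbook formulas; the dashboard's exact formulas are not stated here, so treat the safety-stock rule (normally distributed, independent per-step demand), the parameter names, and the function names as assumptions.

```python
import math
from statistics import NormalDist


def eoq(demand_per_step, order_cost, holding_cost_per_unit_step):
    """Classic EOQ: order size minimising holding + ordering cost.

    Demand and holding cost must use the same time unit.
    """
    return math.sqrt(2 * demand_per_step * order_cost / holding_cost_per_unit_step)


def safety_stock(sigma_demand, lead_time, service_level):
    """Assumed textbook rule: z * sigma * sqrt(lead time)."""
    z = NormalDist().inv_cdf(service_level)
    return z * sigma_demand * math.sqrt(lead_time)


def reorder_point(mean_demand, lead_time, ss):
    """Reorder point s = expected lead-time demand + safety stock."""
    return mean_demand * lead_time + ss


def inventory_position(on_hand, on_order, backorders):
    """Inventory position = on-hand + on-order - backorders."""
    return on_hand + on_order - backorders


def order_quantity(position, s, S):
    """(s, S) rule: when position falls below s, order up to S."""
    return S - position if position < s else 0
```

With these pieces, a default order-up-to level could be set as `S = s + eoq(...)`, matching the note that EOQ is used to derive the S − s gap.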

Drift scenarios

no_drift
Stationary demand throughout the horizon. Sanity-check baseline.
gradual
Demand mean drifts slowly and continuously over the horizon (e.g. monotonic trend). Tests the "slow shift" case.
seasonal
Demand follows a repeating cyclical pattern. Tests whether forecasters pick up the seasonality and the policy doesn't over- or under-react at cycle boundaries.
abrupt
Single sharp change in demand level at a known step. Tests how quickly the system detects and adapts.
severe_abrupt
Larger-magnitude version of abrupt.
catastrophic
Multiple compounding shifts (e.g. abrupt level change + variance spike). Worst-case stress test.

Metrics

Stockout rate
Fraction of SKU × time-step cells where demand could not be satisfied from on-hand inventory. Primary loss metric for H1.
Total cost
Sum of holding cost (per unit-step on-hand) + ordering cost (per order placed) + stockout penalty (per unmet unit). Reported as a single dollar figure across the horizon.
n_orders
Total number of replenishment orders placed during the run.
n_drift_events
Count of times the per-SKU ADWIN signaled a drift and triggered a forecaster refit during the run.
n_global_drift_events
Count of times the global (population-mean) ADWIN signaled. Often much smaller than per-SKU count; correlates with regime shifts.
n_refits
Number of forecaster refit operations performed. Equals n_drift_events for MAS, equals n_steps / refit_period for Periodic.
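As a worked illustration of the two headline metrics, the sketch below computes stockout rate and total cost from per-cell simulation traces. The data layout (one row per SKU, one column per time step) and all names are assumptions made for the example, not the dashboard's actual schema.

```python
def stockout_rate(unmet):
    """Fraction of SKU x time-step cells with unmet demand.

    unmet[sku][t] = units of demand that could not be served from on-hand.
    """
    cells = [u for row in unmet for u in row]
    return sum(1 for u in cells if u > 0) / len(cells)


def total_cost(on_hand, unmet, n_orders,
               holding_cost, order_cost, stockout_penalty):
    """Holding (per unit-step) + ordering (per order) + penalty (per unmet unit)."""
    holding = holding_cost * sum(sum(row) for row in on_hand)
    ordering = order_cost * n_orders
    penalty = stockout_penalty * sum(sum(row) for row in unmet)
    return holding + ordering + penalty
```

For two SKUs over three steps with `unmet = [[0, 2, 0], [0, 0, 1]]`, two of the six cells have unmet demand, so the stockout rate is one third.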

Statistics

Seed
RNG seed for one independent replication of an experiment. Reported as seed_001 through seed_010 — every cell is N = 10.
Mann-Whitney U
mw_p
Non-parametric two-sample test of whether one distribution is stochastically greater than another. Our primary significance test because sweep distributions aren't normal.
Welch's t
welch_p
Two-sample t-test that does NOT assume equal variances. Reported alongside Mann-Whitney as a parametric cross-check.
Cohen's d
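Standardized effect size: the difference in means divided by a pooled standard deviation. |d| ≥ 0.8 = large effect, 0.5 = medium, 0.2 = small. A p-value tells you how unlikely the observed difference would be if there were no true effect; d tells you how big the effect is.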
CI95
95% confidence interval. Bootstrap-style bounds on the per-cell mean. Non-overlapping CIs across policies indicate clear separation; overlapping CIs do not by themselves rule out a significant difference.
p ≤ 0.05
Statistical significance threshold used throughout. Sub-0.001 values are reported as ≤ 0.001 for compactness.
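The statistics above can be computed from two lists of per-seed results using only the Python standard library. This is a sketch under assumptions: Cohen's d uses the pooled-standard-deviation form, the CI is a percentile bootstrap with a hypothetical resample count and seed, and the functions return test statistics only. Turning U or t into a p-value needs the corresponding reference distribution, which a statistics package would normally supply.

```python
import math
import random
from statistics import mean, variance


def mann_whitney_u(a, b):
    """U statistic for sample a: pairs where a > b, counting ties as half."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def cohens_d(a, b):
    """Difference in means divided by the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)


def bootstrap_ci95(values, n_resamples=2000, seed=0):
    """Percentile bootstrap bounds on the mean (resample count is an assumption)."""
    rng = random.Random(seed)
    means = sorted(
        mean(rng.choices(values, k=len(values))) for _ in range(n_resamples)
    )
    return means[int(0.025 * n_resamples)], means[int(0.975 * n_resamples) - 1]
```

With N = 10 seeds per cell, `a` and `b` would each hold ten values of the metric being compared, for example total cost under MAS and under Static ROP.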

Experiment organization

Sweep
A batch of seed runs for a single configuration. Output of one sweep is one row under results/<config_name>/.
Ablation
Variant of the MAS with one component disabled (e.g. no ADWIN, no safety stock, MA-only forecaster). Used to attribute performance contributions to specific design choices. See /reports/ablation.
Family
Experiments are grouped into families: h1 (six scenarios × three policies), h3 (scale sweep, 50 → 1000 SKUs), ablation (component removal), and custom (everything else).
Tier promotion
When a SKU accumulates enough observations to support a more expressive forecaster, it is "promoted" from MA to SES or SES to Holt-Winters mid-simulation.
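Tier selection and promotion can be pictured as a simple rule on history length. The thresholds below (`ses_min_obs`, `hw_min_cycles`) and the `holt_winters` label are hypothetical placeholders chosen for illustration; the actual promotion criteria are not given in this glossary.

```python
def pick_forecaster(n_obs, season_length, ses_min_obs=10, hw_min_cycles=2):
    """Choose a forecaster tier from the number of observations for a SKU.

    All thresholds are illustrative assumptions, not the system's settings.
    """
    if n_obs >= hw_min_cycles * season_length:
        return "holt_winters"          # enough history to see full seasonal cycles
    if n_obs >= ses_min_obs:
        return "simple_exp_smoothing"
    return "moving_average"


def promoted(previous_tier, n_obs, season_length):
    """True when the SKU's tier changes after a new observation arrives."""
    return pick_forecaster(n_obs, season_length) != previous_tier
```

Under these assumed thresholds and a season length of 7, a SKU would start on the moving average, move to SES at 10 observations, and move to Holt-Winters at 14.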