DiffLOB: Diffusion Models for Counterfactual Generation in Limit Order Books

Notice: This research summary and analysis were automatically generated using AI technology. For authoritative details, please refer to the original arXiv paper.

Modern generative models for limit order books (LOBs) can reproduce realistic market dynamics, but remain fundamentally passive: they either model what typically happens without accounting for hypothetical future market conditions, or they require interaction with another agent to explore alternative outcomes. This limits their usefulness for stress testing, scenario analysis, and decision-making. We propose DiffLOB, a regime-conditioned Diffusion model for controllable and counterfactual generation of LOB trajectories. DiffLOB explicitly conditions the generative process on future market regimes (trend, volatility, liquidity, and order-flow imbalance), which enables the model to answer counterfactual queries of the form: “If the future market regime were X instead of Y, how would the limit order book evolve?” Our systematic evaluation framework for counterfactual LOB generation consists of three criteria: (1) Controllable realism, measuring how well generated trajectories reproduce marginal distributions, temporal dependence structure, and regime variables; (2) Counterfactual validity, testing whether interventions on future regimes induce consistent changes in the generated LOB dynamics; (3) Counterfactual usefulness, assessing whether synthetic counterfactual trajectories improve downstream prediction of future market regimes.

💡 Research Summary

DiffLOB addresses a critical gap in limit order book (LOB) modeling: the inability to generate realistic market trajectories under hypothetical future conditions. Existing generative LOB models either reproduce average dynamics from historical data or require interaction with a trading agent to explore alternative outcomes, limiting their usefulness for stress testing, scenario analysis, and strategic decision‑making.

The authors formulate counterfactual LOB generation as a conditional generative problem. They introduce a set of future market regime variables—trend, volatility, liquidity, and order‑flow imbalance—as controllable conditions. By learning the conditional distribution p(x_{t+1:t+τ} | x_{1:t}, c), the model can answer queries of the form “If the future regime were X instead of Y, how would the LOB evolve?”

Technically, DiffLOB builds on denoising diffusion probabilistic models (DDPM). In the forward diffusion process, Gaussian noise is added to the target LOB trajectory over 100 discrete timesteps. The reverse process is parameterized by a neural network that predicts the injected noise, thereby reconstructing a sample from the data distribution. Crucially, the network is conditioned on three sources of information: (1) the historical LOB window, (2) time‑of‑day embeddings, and (3) the future regime vector c.
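The forward noising step and the noise-prediction objective described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the linear beta schedule and its endpoint values are assumptions (the summary only states that 100 timesteps are used), and `q_sample` and `noise_prediction_loss` are hypothetical helper names.

```python
import math
import random

T = 100  # number of diffusion timesteps, as stated in the summary

# Assumption: a linear beta schedule with illustrative endpoints.
BETA_START, BETA_END = 1e-4, 0.1
betas = [BETA_START + (BETA_END - BETA_START) * k / (T - 1) for k in range(T)]

# alpha_bars[k] is the running product of (1 - beta_j) for j <= k.
alpha_bars = []
running = 1.0
for beta in betas:
    running *= 1.0 - beta
    alpha_bars.append(running)


def q_sample(x0, k, rng):
    """Draw x_k ~ q(x_k | x_0) in closed form; return (x_k, injected noise)."""
    a = math.sqrt(alpha_bars[k])
    s = math.sqrt(1.0 - alpha_bars[k])
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xk = [a * x + s * e for x, e in zip(x0, eps)]
    return xk, eps


def noise_prediction_loss(eps_true, eps_pred):
    """Mean squared error between injected and predicted noise."""
    return sum((t - p) ** 2 for t, p in zip(eps_true, eps_pred)) / len(eps_true)


rng = random.Random(0)
x0 = [0.1, -0.2, 0.05, 0.0]  # a flattened toy "trajectory"
xk, eps = q_sample(x0, k=T - 1, rng=rng)
perfect = noise_prediction_loss(eps, eps)  # a perfect predictor has zero loss
```

In the model described here, `eps_pred` would come from the conditioned network rather than being supplied by hand; the sketch only shows what that network is trained to match.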

The architecture consists of four main components:

Backbone – A stack of 16 WaveNet‑style residual blocks captures spatio‑temporal dependencies across price levels and time. Each block produces residual and skip connections that are aggregated to form the final score estimate.
Regime Condition Encoder – Future regimes are split into “local” (step‑wise signals such as liquidity and imbalance) and “global” (long‑horizon trend and volatility). Local information is encoded with convolutional layers, while global information passes through a multilayer perceptron. This separation allows the model to handle heterogeneous effects at different temporal scales.
Control Module – Inspired by ControlNet, a set of zero‑initialized 1×1 convolutions injects regime‑specific control signals into the backbone after the backbone has been pretrained on the unconditional data distribution. Because the weights start at zero, the control pathway initially has no effect and learns stable, interpretable interventions during fine‑tuning.
FiLM Modulation – Feature‑wise linear modulation (FiLM) layers inject diffusion timestep embeddings together with local and global regime embeddings into each residual block. This sequential modulation enables a structured interaction between diffusion time, short‑term market state, and long‑term regime characteristics.
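Two of the components above lend themselves to a small illustration: the zero-initialized 1×1 convolution of the control module, and FiLM's per-channel scale and shift. The sketch below uses plain Python lists laid out as `[channel][time]`; all function names and values are hypothetical, and the FiLM coefficients are fixed constants here, whereas in the described model they would be produced from the timestep and regime embeddings.

```python
def film(features, gamma, beta):
    """FiLM: scale and shift each channel; features is [channel][time]."""
    return [[g * v + b for v in row] for row, g, b in zip(features, gamma, beta)]


def conv1x1(features, weight, bias):
    """1x1 convolution: the same channel-mixing linear map at every time step."""
    n_time = len(features[0])
    return [
        [sum(w * features[c][t] for c, w in enumerate(w_row)) + b
         for t in range(n_time)]
        for w_row, b in zip(weight, bias)
    ]


def zero_conv_params(n_out, n_in):
    """Zero-initialized weights and biases for the control pathway."""
    return [[0.0] * n_in for _ in range(n_out)], [0.0] * n_out


hidden = [[1.0, 2.0, 3.0], [0.5, -0.5, 0.0]]   # 2 channels, 3 time steps
control = [[0.3, 0.1, -0.2], [1.0, 1.0, 1.0]]  # hypothetical regime features

# With zero weights the control signal contributes nothing to the backbone.
weight, bias = zero_conv_params(2, 2)
injected = conv1x1(control, weight, bias)
steered = [[h + c for h, c in zip(h_row, c_row)]
           for h_row, c_row in zip(hidden, injected)]

# FiLM with illustrative constants: channel 0 is doubled, channel 1 is shifted.
modulated = film(hidden, gamma=[2.0, 1.0], beta=[0.0, 0.5])
```

Because `steered` equals `hidden` before any fine-tuning, the pretrained backbone's behavior is preserved at the start of control training, which is the property the zero initialization is meant to provide.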

The dataset consists of LOBSTER snapshots for three highly liquid stocks (AMZN, AAPL, GOOG). One‑second snapshots include the top 10 bid and ask levels. Prices are represented by mid‑price differences and cross‑level price gaps, while volumes are capped at the 99th percentile and square‑root transformed to reduce heavy‑tail effects. Regime variables are computed directly from the target future trajectory: cumulative return for trend, standard deviation of returns for volatility, total standing volume for liquidity, and normalized volume imbalance for order‑flow imbalance.
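The preprocessing and regime definitions above can be sketched as follows. This is an illustration under stated assumptions rather than the paper's exact formulas: log returns are assumed for the return series, imbalance is assumed to be (bid − ask) / (bid + ask), the cap value is passed in rather than estimated from data, and the function names are hypothetical. Liquidity and imbalance are kept per snapshot, matching their description as step-wise signals, while trend and volatility are single numbers for the window.

```python
import math
import statistics


def transform_volumes(volumes, cap):
    """Cap volumes at `cap` (e.g. a 99th-percentile value), then take square roots."""
    return [math.sqrt(min(v, cap)) for v in volumes]


def regime_variables(mid_prices, bid_volumes, ask_volumes):
    """Compute four regime summaries from a future window.

    mid_prices: mid-price per snapshot.
    bid_volumes / ask_volumes: total standing volume per snapshot on each side.
    """
    # Assumption: log returns; simple price differences would also fit the text.
    returns = [math.log(b / a) for a, b in zip(mid_prices, mid_prices[1:])]
    trend = sum(returns)                     # cumulative return over the window
    volatility = statistics.pstdev(returns)  # standard deviation of returns
    liquidity = [b + a for b, a in zip(bid_volumes, ask_volumes)]
    # Guard against an empty book to avoid division by zero.
    imbalance = [(b - a) / (b + a) if (b + a) else 0.0
                 for b, a in zip(bid_volumes, ask_volumes)]
    return {"trend": trend, "volatility": volatility,
            "liquidity": liquidity, "imbalance": imbalance}


regimes = regime_variables(
    mid_prices=[100.0, 101.0, 100.5, 102.0],
    bid_volumes=[300.0, 200.0, 250.0, 400.0],
    ask_volumes=[100.0, 200.0, 350.0, 400.0],
)
scaled = transform_volumes([4.0, 100.0], cap=25.0)
```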

Training uses Adam (lr = 1e‑4), batch size 128, early stopping on validation loss, and an exponential moving average of parameters. Samples are drawn with ancestral DDPM sampling, using classifier‑free guidance to incorporate the conditioning information.
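Two of the ingredients named above have compact standard forms, sketched here. The guidance weight convention (`w = 1` recovers the conditional prediction, `w = 0` the unconditional one) and the decay value are assumptions, since the summary gives neither; the function names are hypothetical.

```python
def guided_noise(eps_cond, eps_uncond, w):
    """Classifier-free guidance: eps_uncond + w * (eps_cond - eps_uncond)."""
    return [u + w * (c - u) for c, u in zip(eps_cond, eps_uncond)]


def ema_update(ema_params, params, decay):
    """One exponential-moving-average step over a flat list of parameters."""
    return [decay * e + (1.0 - decay) * p for e, p in zip(ema_params, params)]


eps_c = [1.0, 2.0]  # toy noise prediction with the regime condition
eps_u = [0.0, 1.0]  # toy noise prediction with the condition dropped

conditional_only = guided_noise(eps_c, eps_u, w=1.0)
unconditional_only = guided_noise(eps_c, eps_u, w=0.0)
amplified = guided_noise(eps_c, eps_u, w=2.0)  # pushes further toward the condition

ema = ema_update([0.0], [1.0], decay=0.9)  # illustrative decay value
```

Weights above 1 extrapolate past the conditional prediction, which typically strengthens adherence to the condition at some cost in sample diversity.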

Evaluation is organized around three criteria:

Controllable Realism – When conditioned on observed future regimes, DiffLOB’s generated trajectories match real data across a suite of statistical metrics: marginal distributions of mid‑price returns, bid‑ask spread, volatility clustering (autocorrelation of absolute returns), and volume‑level cross‑correlations. DiffLOB outperforms diffusion baselines (Diff‑CSDI, Diff‑S4), GAN/AE baselines (cGAN, cVAE), and an autoregressive S4 model.
Counterfactual Validity – The authors intervene on regime variables (e.g., impose a strong upward trend, high volatility, or low liquidity) and observe the generated LOB dynamics. The synthetic trajectories exhibit consistent, interpretable changes: returns shift in the desired direction, spread widens under low liquidity, and imbalance reflects the imposed order‑flow bias. Moreover, the statistical properties of these counterfactual samples align with real market periods that exhibited similar regime characteristics.
Counterfactual Usefulness – Synthetic counterfactual trajectories are added to the training set of a downstream regime‑prediction classifier. The augmented data improve prediction performance, especially for extreme or rare regimes, yielding average AUC gains of 4–6% over models trained on real data alone. This demonstrates that DiffLOB not only generates realistic samples but also provides actionable information for downstream tasks.
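One of the realism metrics listed above, volatility clustering measured as the autocorrelation of absolute returns, can be computed as in the sketch below. The estimator (a standard biased sample autocorrelation) and the toy return series are assumptions for illustration; the summary does not specify how the authors compute or compare this statistic.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at a positive integer lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((v - mean) ** 2 for v in series)
    if var == 0.0 or lag >= n:
        return 0.0
    cov = sum((series[i] - mean) * (series[i + lag] - mean)
              for i in range(n - lag))
    return cov / var


def volatility_clustering(returns, max_lag):
    """Autocorrelation of absolute returns for lags 1..max_lag."""
    abs_returns = [abs(r) for r in returns]
    return [autocorrelation(abs_returns, lag) for lag in range(1, max_lag + 1)]


# Toy series: a calm stretch followed by a turbulent one.
toy_returns = [0.01, -0.01] * 10 + [0.05, -0.05] * 10
acf = volatility_clustering(toy_returns, max_lag=3)
raw_lag1 = autocorrelation(toy_returns, 1)
```

On this toy series the raw returns alternate in sign, so their lag-1 autocorrelation is negative, while the absolute returns are strongly positively autocorrelated. Comparing such curves between real and generated trajectories is one way to check whether a generator reproduces clustering.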

Ablation studies show that removing the control module degrades both realism and the ability to steer the generation, confirming the importance of the ControlNet‑style pathway.

Limitations include the relatively simple definition of regime variables, the fixed prediction horizon (τ = 32 seconds), and the lack of integration with active trading agents for policy evaluation. Future work is suggested to enrich regime representations (e.g., macro‑economic indicators, news sentiment), extend the horizon, and combine DiffLOB with multi‑agent simulations for optimal execution and risk management.

In summary, DiffLOB introduces a novel, regime‑conditioned diffusion framework that enables controllable, counterfactual generation of LOB trajectories. It bridges the gap between realistic market simulation and scenario‑driven analysis, offering a powerful tool for researchers and practitioners interested in stress testing, strategy design, and market microstructure research.
