Metaorder modelling and identification from public data

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

Market-order flow in financial markets exhibits long-range correlations, a widely known stylised fact. A popular explanation is the Lillo-Mike-Farmer (LMF) order-splitting theory. However, quantitative tests of this theory have historically relied on proprietary datasets with trader identifiers, limiting reproducibility and cross-market validation. We show that the LMF theory can be validated using publicly available Johannesburg Stock Exchange (JSE) data by leveraging recently developed methods for reconstructing synthetic metaorders. We demonstrate the validation using 3 years of Transaction and Quote Data (TAQ) for the largest 100 stocks on the JSE, assuming that either N=50 or N=150 effective traders manage metaorders in the market.

💡 Research Summary

The paper “Metaorder modelling and identification from public data” tackles a long‑standing limitation in the empirical validation of the Lillo‑Mike‑Farmer (LMF) order‑splitting theory: the reliance on proprietary datasets that contain trader identifiers. The LMF theory posits that the long‑range autocorrelation observed in market order flow originates from the splitting of large “parent” orders into many smaller child orders, which individual agents execute incrementally as meta‑orders. According to the theory, the distribution of meta‑order lengths follows a power law P(L) ∝ L⁻ᵅ (α > 1) and the autocorrelation function of trade signs decays as C(τ) ∝ τ⁻ᵞ, with the exponents linked by γ = α − 1. Historically, confirming this relationship required data that could link each trade to a specific trader, a resource that is rarely publicly available.
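As a rough illustration of how the exponent relation could be checked, the sketch below estimates the sample autocovariance of a trade-sign series and fits the decay exponent γ by a log-log least-squares slope, from which the relation γ = α − 1 gives an implied α. The function names `sign_autocorrelation` and `fit_decay_exponent` are hypothetical, and the plain least-squares fit is an assumption made for brevity, not the estimator used in the paper.

```python
import math

def sign_autocorrelation(signs, lag):
    """Sample autocovariance of trade signs at one lag: <e_t e_{t+lag}> - <e>^2.

    For a roughly balanced +1/-1 series this is close to the autocorrelation.
    """
    n = len(signs) - lag
    mean = sum(signs) / len(signs)
    return sum(signs[t] * signs[t + lag] for t in range(n)) / n - mean ** 2

def fit_decay_exponent(lags, acf):
    """Least-squares slope of log C(tau) against log tau; returns gamma = -slope.

    All acf values must be strictly positive so that the logarithm is defined.
    """
    xs = [math.log(lag) for lag in lags]
    ys = [math.log(c) for c in acf]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    return -num / den

# Illustration on an exact power law C(tau) = tau^(-0.5).
lags = [1, 2, 4, 8, 16]
acf = [lag ** -0.5 for lag in lags]
gamma = fit_decay_exponent(lags, acf)
alpha_implied = gamma + 1  # from the relation gamma = alpha - 1
```

On real data the fitted γ would be compared with an α estimated independently from the meta‑order length distribution; agreement between the two is what the LMF relation predicts.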

The authors overcome this obstacle by employing a recently proposed synthetic meta‑order reconstruction method (Maitrier et al., 2024) that works solely with public Level‑1 transaction and quote (TAQ) data. The method first fixes a hypothetical number of traders N and a participation distribution F (either homogeneous or power‑law). Using a mapping function (Algorithm 1), each trade in the chronological stream is randomly assigned to one of the N synthetic traders while preserving the original time order. Consecutive trades with the same sign (+1 for buy, −1 for sell) belonging to the same synthetic trader are then grouped into a meta‑order (Algorithm 2). This approach yields a set of synthetic meta‑orders that retain key aggregate statistics of the real market.
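The two steps described above can be sketched roughly as follows. This is a minimal illustration, not the paper's Algorithm 1 or Algorithm 2: the function names, the dictionary layout, and the power‑law weights taken proportional to rank^(−δ) are all assumptions, and "consecutive" is interpreted here as consecutive within each synthetic trader's own trade sequence.

```python
import random

def assign_traders(n_trades, n_traders, rng, delta=None):
    """Randomly map each trade (kept in time order) to a synthetic trader id.

    delta=None gives homogeneous participation; otherwise the weights are
    proportional to rank^(-delta), which is an assumed power-law form.
    """
    if delta is None:
        weights = [1.0] * n_traders
    else:
        weights = [(k + 1) ** (-delta) for k in range(n_traders)]
    return rng.choices(range(n_traders), weights=weights, k=n_trades)

def build_metaorders(signs, volumes, trader_ids):
    """Group each trader's consecutive same-sign trades into metaorders."""
    open_orders = {}   # trader id -> metaorder currently being extended
    metaorders = []    # ordered by the time of the first child trade
    for t, (sign, vol, trader) in enumerate(zip(signs, volumes, trader_ids)):
        current = open_orders.get(trader)
        if current is not None and current["sign"] == sign:
            # Same trader, same direction: extend the open metaorder.
            current["n_child"] += 1
            current["volume"] += vol
            current["end"] = t
        else:
            # First trade for this trader, or a sign flip: start a new one.
            current = {"trader": trader, "sign": sign, "start": t,
                       "end": t, "n_child": 1, "volume": float(vol)}
            open_orders[trader] = current
            metaorders.append(current)
    return metaorders

# Toy example with five trades and a fixed assignment to two traders.
toy = build_metaorders(signs=[1, 1, -1, 1, -1],
                       volumes=[10, 20, 5, 7, 3],
                       trader_ids=[0, 1, 0, 1, 0])
```

In the toy example, trader 0 produces a one‑trade buy followed by a two‑trade sell, and trader 1 produces a single two‑trade buy. The `n_child` field of each synthetic meta‑order is the length L whose distribution would be examined for a power‑law tail.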

The empirical analysis uses three years (January 2023 – December 2025) of JSE TAQ data for the 100 largest stocks. The authors consider two plausible values for the number of effective traders, N = 50 and N = 150, and test both homogeneous and power‑law participation (δ ≈ 2). For each configuration they generate synthetic meta‑orders and evaluate four stylised facts that have become benchmarks for meta‑order realism:

Square‑Root Law (SQL) – The price impact I(Q) of a meta‑order of total volume Q scales as √Q when normalised by daily volume V_D and intraday volatility σ_D. By binning Q/V_D on a log‑log scale and averaging impact, the authors find that for Q/V_D ∈
