Fixed-Horizon Self-Normalized Inference for Adaptive Experiments via Martingale AIPW/DML with Logged Propensities

Reading time: 5 minutes

📝 Original Info

  • Title: Fixed-Horizon Self-Normalized Inference for Adaptive Experiments via Martingale AIPW/DML with Logged Propensities
  • ArXiv ID: 2602.15559
  • Date: 2026-02-17
  • Authors: **Author information is not specified in the paper body and could not be confirmed.**

📝 Abstract

Adaptive randomized experiments update treatment probabilities as data accrue, but still require an end-of-study interval for the average treatment effect (ATE) at a prespecified horizon. Under adaptive assignment, propensities can keep changing, so the predictable quadratic variation of AIPW/DML score increments may remain random. When no deterministic variance limit exists, Wald statistics normalized by a single long-run variance target can be conditionally miscalibrated given the realized variance regime. We assume no interference, sequential randomization, i.i.d. arrivals, and executed overlap on a prespecified scored set, and we require two auditable pipeline conditions: the platform logs the executed randomization probability for each unit, and the nuisance regressions used to score unit $t$ are constructed predictably from past data only. These conditions make the centered AIPW/DML scores an exact martingale difference sequence. Using self-normalized martingale limit theory, we show that the Studentized statistic, with variance estimated by realized quadratic variation, is asymptotically N(0,1) at the prespecified horizon, even without variance stabilization. Simulations validate the theory and highlight when standard fixed-variance Wald reporting fails.

💡 Deep Analysis

📄 Full Content

Adaptive randomized experiments, including response-adaptive clinical trials, contextual bandits, and large-scale platform experimentation systems, update assignment probabilities as data accrue to balance learning and deployment (Kasy and Sautmann, 2021). In practice, however, many platforms still require conventional end-of-study reporting for classical causal estimands such as the superpopulation average treatment effect (ATE) computed once at a prespecified horizon. Hereafter, we denote by π_t the executed assignment probability used to randomize unit t after applying any platform guardrails, as recorded in the experiment log. In this paper, we study fixed-horizon Wald inference for the standard logged-propensity AIPW/DML estimator, the sample average of doubly robust pseudo-outcomes scored using these logged propensities. Under adaptive assignment, the propensity process {π_t} is itself data-dependent, so the predictable quadratic variation of the AIPW/DML score increments can remain replication-random and need not converge to a single deterministic long-run variance target.
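As a concrete sketch of the estimator just described, the doubly robust pseudo-outcomes can be scored directly from the logged executed propensities. The function and variable names below are illustrative (not from the paper), and `mu1_hat`/`mu0_hat` stand in for outcome regressions that would be fit predictably from past data only:

```python
import numpy as np

def aipw_pseudo_outcomes(y, a, pi_logged, mu1_hat, mu0_hat):
    """Doubly robust AIPW pseudo-outcomes scored with the logged
    executed propensities pi_logged (one per unit).

    y        : observed outcomes
    a        : binary treatment indicators (0/1)
    pi_logged: executed assignment probabilities from the experiment log
    mu1_hat, mu0_hat: outcome-regression predictions for arms 1 and 0,
                      assumed fit predictably from past data only
    """
    y = np.asarray(y, dtype=float)
    a = np.asarray(a, dtype=float)
    pi = np.asarray(pi_logged, dtype=float)
    # Standard AIPW score: regression part plus inverse-propensity residuals.
    psi = (mu1_hat - mu0_hat
           + a * (y - mu1_hat) / pi
           - (1.0 - a) * (y - mu0_hat) / (1.0 - pi))
    return psi

# Toy illustration with hypothetical logged data (true ATE = 1).
rng = np.random.default_rng(0)
n = 1000
pi = rng.uniform(0.2, 0.8, n)          # executed propensities from the log
a = rng.binomial(1, pi).astype(float)
y = 1.0 * a + rng.normal(size=n)
psi = aipw_pseudo_outcomes(y, a, pi, mu1_hat=np.ones(n), mu0_hat=np.zeros(n))
ate_hat = psi.mean()                   # sample average of pseudo-outcomes
```

The ATE estimate is just the sample mean of these pseudo-outcomes; everything design-dependent enters through the logged propensity vector.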

Some end-of-study Wald arguments for AIPW/A2IPW (and related DML estimators) under adaptivity proceed via Slutsky steps that rely on a deterministic variance target for the predictable quadratic variation, often enforced through stabilization-type or design-stability conditions on assignment probabilities and/or average conditional variances (Hadad et al., 2021;Zhan et al., 2021;Kato et al., 2020;Cook et al., 2024;Li and Owen, 2024;Sengupta et al., 2025). On modern platforms, however, the policy can keep reacting to noisy intermediate estimates. Clipping and guardrails can activate intermittently and batch updates can induce regime switches. When this happens, a Wald statistic normalized using a single deterministic variance target can be systematically miscalibrated conditional on the realized variance regime, even if marginal coverage appears close to nominal.

We treat the adaptive assignment policy as given but assume that the platform logs the executed propensity used to randomize each unit and that nuisance regressions used for AIPW/DML scoring are fit predictably using only past data. Related work emphasizes that careful use of the logging policy and past-only fitting is central for post-adaptive inference (Bibaut et al., 2021;Kato et al., 2021;Cook et al., 2024). Under these auditable conditions, the centered score increments form an exact martingale difference sequence, and we obtain fixed-horizon Wald inference by studentizing with realized quadratic variation along the realized propensity path. This yields asymptotic N(0, 1) calibration without requiring the predictable quadratic variation to converge to a deterministic long-run variance target.

• Auditable martingale scoring. We formalize a logging/predictability contract (logged executed propensities and predictable nuisance fitting) under which centered AIPW/DML score increments form an exact martingale difference sequence (Lemma 5.3).

• Fixed-horizon self-normalized Wald inference. We prove that the usual Studentized statistic, with variance estimated by realized quadratic variation, is asymptotically N(0, 1) at a prespecified horizon even when no deterministic long-run variance limit exists (Theorem 5.14).

• Feasible studentization. We show that the standard plug-in studentizer used in practice consistently estimates realized quadratic variation, so the feasible Wald interval inherits the same fixed-horizon validity (Proposition 4.11).

• Oracle benchmarking and nuisance-learning effects. We provide a conditional second-moment decomposition yielding an oracle precision benchmark and isolate a nonnegative augmentation term capturing variance inflation from nuisance error (Proposition 5.8). Under weighted L_2 convergence, the feasible statistic is asymptotically oracle-equivalent (Theorem C.4).
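To illustrate the flavor of these claims, here is a toy Monte Carlo of our own (not the paper's simulation design): propensities adapt to a noisy running estimate of the arm difference, the centered scores remain a martingale difference sequence, and the self-normalized 95% interval should stay near nominal coverage. The simple IPW score (outcome regressions set to zero) is used for brevity.

```python
import numpy as np

def simulate_coverage(n_reps=200, T=1000, ate=1.0, seed=0):
    """Coverage of the self-normalized 95% interval under an adaptive,
    predictable assignment policy with a clipping guardrail."""
    rng = np.random.default_rng(seed)
    cover = 0
    for _ in range(n_reps):
        psi = np.empty(T)
        s1 = n1 = s0 = n0 = 0.0
        for t in range(T):
            # Propensity depends only on past data (predictable),
            # with clipping to [0.1, 0.9] as a guardrail.
            diff = (s1 / n1 - s0 / n0) if (n1 > 0 and n0 > 0) else 0.0
            pi = float(np.clip(0.5 + 0.45 * np.tanh(diff), 0.1, 0.9))
            a = rng.random() < pi
            y = ate * a + rng.normal()
            # IPW score with zero outcome regressions: still an exact
            # martingale difference sequence around the true ATE.
            psi[t] = a * y / pi - (1 - a) * y / (1 - pi)
            if a:
                s1 += y; n1 += 1
            else:
                s0 += y; n0 += 1
        est = psi.mean()
        # Studentize by realized quadratic variation.
        se = np.sqrt(np.sum((psi - est) ** 2)) / T
        cover += (est - 1.96 * se <= ate <= est + 1.96 * se)
    return cover / n_reps

coverage = simulate_coverage()
```

Note that no variance-stabilization condition is imposed on the policy here; calibration comes entirely from studentizing along the realized path.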

The subsequent sections are structured as follows. Section 2 reviews related work. Section 3 states the model and assumptions. Section 4 presents the estimator and auditable implementation details. Section 5 develops the main theoretical results and Section 6 reports simulations. The appendices collect supporting limit-theory background, additional results and proofs, and an operational logging protocol.

A growing literature studies inference with adaptively collected data, where observations are generated under an evolving information set and classical i.i.d. arguments do not directly apply. Early econometric work by Hahn et al. (2011) highlighted how propensity information can be leveraged for inference in sequential designs. More recent general frameworks derive asymptotic representations for sequential decisions and adaptive experiments under broad conditions (Hirano and Porter, 2023). In the contextual-bandit and adaptive-experiment literature, fixed-horizon inference has also been developed via batched OLS/batchwise studentization arguments (Zhang et al., 2020).


This content is AI-processed based on open access ArXiv data.
