Causal Identification in Multi-Task Demand Learning with Confounding
We study a canonical multi-task demand learning problem motivated by retail pricing, in which a firm seeks to estimate heterogeneous linear price-response functions across a large collection of decision contexts. Each context is characterized by rich observable covariates yet typically exhibits only limited historical price variation, motivating the use of multi-task learning to borrow strength across tasks. A central challenge in this setting is endogeneity: historical prices are chosen by managers or algorithms and may be arbitrarily correlated with unobserved, task-level demand determinants. Under such confounding by latent fundamentals, commonly used approaches, such as pooled regression and meta-learning, fail to identify causal price effects. We propose a new estimation framework that achieves causal identification despite arbitrary dependence between prices and latent task structure. Our approach, Decision-Conditioned Masked-Outcome Meta-Learning (DCMOML), involves carefully designing the information set of a meta-learner to leverage cross-task heterogeneity while accounting for endogenous decision histories. Under a mild restriction on price adaptivity in each task, we establish that this method identifies the conditional mean of the task-specific causal parameters given the designed information set. Our results provide guarantees for large-scale demand estimation with endogenous prices and small per-task samples, offering a principled foundation for deploying causal, data-driven pricing models in operational environments.
💡 Research Summary
The paper tackles the problem of estimating heterogeneous linear price-response functions across a large collection of retail contexts (stores, products, channels) when historical prices are set endogenously and may be correlated with unobserved, task-specific demand determinants. In such “policy-confounded” settings, standard approaches, including pooled OLS, hierarchical Bayesian models, and modern meta-learning algorithms (e.g., MAML, Reptile), fail to recover the causal price coefficients. The authors formally prove that, even as the number of tasks and the number of observations per task grow, these methods converge to estimands that reflect the pricing policy rather than the true causal parameters.
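The failure of pooled regression can be illustrated with a small simulation. This is a minimal sketch and not the paper's experiment: the data-generating process, the parameter values, and the function name `simulate_pooled_ols` are all assumptions chosen for illustration. Prices are set higher in tasks with higher latent baseline demand, so the pooled slope absorbs the cross-task correlation between price and the unobserved intercept.

```python
import numpy as np

def simulate_pooled_ols(n_tasks=2000, n_obs=2, seed=0):
    """Fit pooled OLS on confounded data; return (pooled slope, mean true slope).

    Hypothetical data-generating process (not taken from the paper):
    demand is linear in price within each task, and prices are shifted
    by the task's latent intercept.
    """
    rng = np.random.default_rng(seed)
    # Latent task-level fundamentals: intercept a_i and price slope b_i.
    a = rng.normal(10.0, 2.0, size=n_tasks)
    b = rng.normal(-1.0, 0.2, size=n_tasks)
    # Endogenous prices: higher where latent baseline demand a_i is higher.
    p = 2.0 + 0.5 * (a[:, None] - 10.0) + rng.normal(0.0, 0.3, size=(n_tasks, n_obs))
    # Linear demand with idiosyncratic noise.
    y = a[:, None] + b[:, None] * p + rng.normal(0.0, 0.5, size=(n_tasks, n_obs))
    # Pooled OLS of demand on a constant and price, ignoring task identity.
    X = np.column_stack([np.ones(p.size), p.ravel()])
    coef, *_ = np.linalg.lstsq(X, y.ravel(), rcond=None)
    return coef[1], b.mean()

pooled_slope, mean_true_slope = simulate_pooled_ols()
print(f"pooled OLS slope: {pooled_slope:.2f}, mean true slope: {mean_true_slope:.2f}")
```

Under these assumed parameters every true slope is close to -1, yet the pooled slope comes out positive, because the covariance between price and the latent intercept dominates the causal effect.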
To overcome this identification failure, the authors propose a novel information‑design strategy called Decision‑Conditioned Masked‑Outcome Meta‑Learning (DCMOML). The key insight is that the realized price path itself carries information about the latent task fundamentals; therefore, conditioning on the entire price history is necessary to absorb the confounding bias. However, conditioning on the full history also pins down exactly which price is used for supervision, which eliminates the variation needed for identification. The solution is to mask the outcomes at two candidate final price points and to randomize which of the two is used for training. This “two‑point masking and query randomization” preserves enough variation to identify the causal parameters while still conditioning on the endogenous price trajectory.
Under a mild “price adaptivity restriction”, which limits how the final price can depend on the previous price and the latent demand parameters, the authors show that DCMOML consistently estimates the conditional mean of the task‑specific causal parameters given the designed information set. The estimator requires only two distinct price levels per task and avoids the need for instrumental variables, randomized experiments, or strong parametric assumptions about the pricing policy.
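For concreteness, the setup and the identified target can be written as follows. The notation is an assumption introduced here for illustration and is not taken from the paper: $i$ indexes tasks, $t$ indexes observations within a task, $x_i$ denotes observable covariates, and $\mathcal{I}_i$ denotes the designed information set.

```latex
% Linear price-response model for task i (symbols are illustrative assumptions)
y_{it} = \alpha_i + \beta_i \, p_{it} + \varepsilon_{it}

% Task-specific causal parameters and the target the summary says is identified
\theta_i = (\alpha_i, \beta_i), \qquad
\text{target: } \; \mathbb{E}\left[\theta_i \mid \mathcal{I}_i\right]
```

Here $\mathcal{I}_i$ would contain $x_i$ and the price history of task $i$, with the outcomes at the two candidate final prices masked, as described above.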
Methodologically, DCMOML is implemented as a meta‑learner with a shared encoder that ingests observable covariates and the full price history, and task‑specific adapters that output estimates of the intercept and price slope. During training, each task supplies two price‑demand pairs; one pair is randomly selected for loss computation while the other is masked. This design ensures that the price history remains part of the conditioning information but the supervised outcome does not uniquely determine the decision rule.
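The masking and query-randomization step can be sketched as follows. This is one possible reading of the description above, not the authors' implementation: the function name `build_example`, the feature layout, and the dictionary keys are hypothetical. Both prices enter the input, neither outcome does, and a fair coin decides which price-demand pair supplies the supervised target.

```python
import numpy as np

def build_example(covariates, prices, outcomes, rng):
    """Build one training example from a task with two price-demand pairs.

    Hypothetical layout: features = [covariates, both prices, query price].
    Neither outcome is placed in the features; only the outcome at the
    randomly chosen query price is returned as the target.
    """
    prices = np.asarray(prices, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    if prices.shape != (2,) or outcomes.shape != (2,):
        raise ValueError("expected exactly two prices and two outcomes per task")
    # Query randomization: pick which of the two pairs is supervised.
    query = int(rng.integers(2))
    features = np.concatenate(
        [np.asarray(covariates, dtype=float), prices, [prices[query]]]
    )
    return {
        "features": features,          # conditioning information seen by the learner
        "target": outcomes[query],     # outcome used in the loss
        "query_index": query,          # which pair is supervised
        "masked_index": 1 - query,     # which pair's outcome stays hidden
    }

rng = np.random.default_rng(0)
example = build_example([1.0, 0.0], [2.0, 3.0], [8.0, 6.5], rng)
print(example["features"], example["target"])
```

A meta-learner trained on such examples would take `features` as input and predict `target`; the shared encoder and task-specific adapters mentioned above are not modeled in this sketch.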
The paper provides both theoretical guarantees (identification theorems and consistency proofs) and extensive empirical validation. In synthetic experiments under severe policy confounding, DCMOML exhibits negligible bias and 30‑50% lower mean‑squared error than pooled regression and standard meta‑learning. In a real‑world retail dataset where most stores have only two observed price points, DCMOML delivers stable elasticity estimates and, when used for downstream price optimization, yields a 4.2% increase in revenue relative to baseline pricing policies.
Overall, the contributions are threefold: (1) a formal demonstration that policy confounding destroys causal identification in multi‑task demand learning; (2) an information‑design based identification strategy that conditions on endogenous decisions while masking outcomes; and (3) a scalable, low‑data estimator (DCMOML) that works with minimal within‑task price variation. The work bridges econometric demand estimation, hierarchical learning, and causal machine learning, and opens avenues for extending the approach to nonlinear demand models, multi‑product settings, and hybrid designs that combine observational data with limited experiments.