Causal inference and model explainability tools for retail

Notice: This research summary and analysis were generated automatically using AI. For full accuracy, please refer to the original arXiv source.

Most major retailers today operate multiple divisions covering areas such as marketing, supply chain, online customer experience, in-store customer experience, employee productivity, and vendor fulfillment. They also regularly collect data on all of these areas in the form of dashboards and weekly, monthly, and quarterly reports. Although several machine learning and statistical techniques are in place to analyze and predict key metrics, such models typically lack interpretability, and they do not allow causal links to be validated or discovered. In this paper, we aim to provide a recipe for applying model interpretability and causal inference to derive sales insights. We review the existing literature on causal inference and interpretability in the context of e-commerce and retail problems, and apply both to a real-world dataset. We find that an inherently explainable model has lower variance in its SHAP values, and show that including multiple confounders through a double machine learning approach recovers the correct sign of the causal effect.


💡 Research Summary

The paper addresses a common dilemma in large retail enterprises: while abundant operational data enable sophisticated machine‑learning forecasts of key metrics such as sales, the resulting models are often black‑boxes that provide little insight into why a prediction was made, nor can they reliably answer “what‑if” counterfactual questions. The authors propose a two‑pronged recipe that couples model interpretability with causal inference to turn predictive analytics into actionable, trustworthy business intelligence.

First, the authors review interpretability techniques. They distinguish local versus global explanations, post‑hoc versus inherently interpretable models, and model‑agnostic versus model‑dependent methods. SHAP (SHapley Additive exPlanations) is selected as the primary tool because it satisfies fairness axioms from cooperative game theory and works with any model. The paper then surveys causal inference, focusing on Double Machine Learning (DML), a flexible framework that first de‑confounds the treatment variable using nuisance models fitted on the observed covariates and then estimates the average treatment effect (ATE) with a second learner. The authors note that DML assumes all confounders are observed; otherwise instrumental variables, regression discontinuity, or difference‑in‑differences must be employed.
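The Shapley machinery behind SHAP can be illustrated without any library. The sketch below (an illustration, not the paper's code) computes exact Shapley values for a single prediction by enumerating feature coalitions, with "missing" features replaced by background means, as in KernelSHAP's interventional convention. For a linear model the Shapley value of feature i reduces to w_i·(x_i − mean_i), which makes the output easy to verify.

```python
from itertools import combinations
from math import factorial

import numpy as np

def exact_shapley(predict, x, background):
    """Exact Shapley values for one instance x.

    Features outside a coalition are replaced by background (mean)
    values -- the interventional convention used by KernelSHAP.
    Exponential in the number of features, so only for tiny examples.
    """
    n = len(x)
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for subset in combinations(others, size):
                # Shapley kernel weight |S|! (n - |S| - 1)! / n!
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                idx = list(subset)
                z_off = background.copy()
                z_off[idx] = x[idx]          # coalition features "on"
                z_on = z_off.copy()
                z_on[i] = x[i]               # add feature i
                phi[i] += weight * (predict(z_on) - predict(z_off))
    return phi

# Toy linear "sales" model over three features (illustrative weights).
w = np.array([2.0, -1.0, 0.5])
predict = lambda z: float(w @ z)
background = np.array([1.0, 1.0, 1.0])       # feature means
x = np.array([3.0, 0.0, 2.0])

phi = exact_shapley(predict, x, background)  # -> w * (x - background)
```

By the efficiency axiom, the attributions sum to `predict(x) - predict(background)`.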

The empirical study uses a proprietary retail dataset spanning 2019‑2024, originally containing 64 columns. To keep the analysis tractable, the authors prune the data to seven engineered features (F1‑F7) and a sales target Y, dropping one feature from every pair with Pearson correlation > 0.3. The reduced set still shows signs of hidden high‑dimensional structure, as indicated by a slowly rising scree plot.
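A correlation-based pruning step of this kind could look as follows. This is a hypothetical sketch, not the authors' code: the greedy drop order, the synthetic column names, and the helper `prune_correlated` are all illustrative assumptions.

```python
import numpy as np
import pandas as pd

def prune_correlated(df, target, threshold=0.3):
    """Greedily drop one feature from every pair whose absolute
    Pearson correlation exceeds `threshold`; the target column is kept."""
    feats = [c for c in df.columns if c != target]
    corr = df[feats].corr().abs()
    dropped = set()
    for i, a in enumerate(feats):
        if a in dropped:
            continue
        for b in feats[i + 1:]:
            if b not in dropped and corr.loc[a, b] > threshold:
                dropped.add(b)
    return df.drop(columns=sorted(dropped))

# Tiny synthetic example: x2 is a near-copy of x1 and gets dropped.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
df = pd.DataFrame({
    "x1": x1,
    "x2": x1 + 0.05 * rng.normal(size=200),  # corr(x1, x2) close to 1
    "x3": rng.normal(size=200),              # independent feature
    "Y": 2 * x1 + rng.normal(size=200),      # sales target, never dropped
})
pruned = prune_correlated(df, target="Y")
```

Note that which member of a correlated pair survives depends on column order; in practice one would keep the feature with the stronger business meaning.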

Three tree‑based regressors are trained: XGBoost, Random Forest, and an Explainable Boosting Regressor (EBR), a type of Generalized Additive Model with built‑in interpretability. Hyper‑parameter tuning is performed, and performance is reported using a dimensionless “mean scaled absolute error” (MSAE). XGBoost achieves the lowest MSAE (0.106), while Random Forest and EBR obtain 0.221 and 0.211 respectively. Despite its higher error, EBR produces SHAP values with markedly lower variance across repetitions, suggesting that an inherently interpretable model yields more stable feature‑importance estimates. Visual inspection of SHAP scatter plots reveals several non‑monotonic relationships (e.g., under‑performing products first increase then slightly decrease sales contribution), hinting at hidden interactions or confounding.
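The summary does not spell out the MSAE formula. One plausible reading, assumed here, is the mean absolute error scaled by the mean magnitude of the target, which yields a dimensionless score comparable across models:

```python
import numpy as np

def mean_scaled_absolute_error(y_true, y_pred):
    """MAE divided by the mean magnitude of the target, giving a
    dimensionless error. One plausible definition of the paper's MSAE;
    the exact formula is not given in the summary."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean(np.abs(y_true - y_pred)) / np.mean(np.abs(y_true))

# Illustrative numbers: MAE = (10 + 10 + 30) / 3, mean |y| = 200.
y_true = np.array([100.0, 200.0, 300.0])
y_pred = np.array([110.0, 190.0, 330.0])
msae = mean_scaled_absolute_error(y_true, y_pred)
```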

To probe these interactions, the authors perform hierarchical clustering on the seven features. The dendrogram uncovers strong redundancy between F5 (negative customer perception) and F6 (seller unresponsiveness), as well as links to F2 (under‑performing products) and F4 (out‑of‑stock items). Such observable confounding can mislead post‑hoc explanations: a model may attribute importance to a proxy variable rather than the true causal driver.
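Feature clustering of this kind can be sketched with a fixed-threshold single-linkage cut on correlation distance, which is equivalent to taking connected components of the graph linking features whose distance falls below the cut. The snippet below is a minimal self-contained illustration using synthetic stand-ins for F1, F5, and F6, not the paper's dendrogram code:

```python
import numpy as np

def cluster_features(X, names, threshold=0.3):
    """Single-linkage clustering at a fixed cut on correlation distance
    d = 1 - |corr|, computed as connected components via union-find."""
    corr = np.corrcoef(X, rowvar=False)
    dist = 1.0 - np.abs(corr)
    parent = list(range(len(names)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if dist[i, j] < threshold:
                parent[find(i)] = find(j)

    clusters = {}
    for i, name in enumerate(names):
        clusters.setdefault(find(i), []).append(name)
    return sorted(clusters.values(), key=len, reverse=True)

# Synthetic stand-ins: "F5" and "F6" share a latent driver (a
# hypothetical construction), "F1" is independent noise.
rng = np.random.default_rng(1)
base = rng.normal(size=500)
X = np.column_stack([
    rng.normal(size=500),                # F1
    base + 0.2 * rng.normal(size=500),   # F5
    base + 0.2 * rng.normal(size=500),   # F6
])
groups = cluster_features(X, ["F1", "F5", "F6"])  # F5 and F6 cluster together
```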

The causal analysis therefore applies Double Machine Learning. The treatment of interest is F6 (seller unresponsiveness). In the first stage, the authors regress F6 on the remaining features (F2, F4, F5, etc.) to obtain residualized values purged of observed confounding. In the second stage, they regress sales Y on these residuals to estimate the ATE. When F6 alone is used, without controlling for confounders, the estimated effect is positive, contrary to the business intuition that higher unresponsiveness should depress sales. After adding the identified confounders (F2, F4, F5) to the DML pipeline, the sign flips to negative, in line with domain knowledge. This experiment demonstrates that DML can recover the correct causal direction once sufficient covariates are included, whereas naïve SHAP explanations may remain biased.
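The residualization logic, and the sign flip it produces, can be reproduced on synthetic data. The sketch below uses plain least squares as the nuisance learners for brevity (the linear "partialling-out" special case); real DML would use flexible ML models with cross-fitting, and all data-generating numbers here are invented:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Synthetic analogue of the paper's setup: W stands in for the observed
# confounders (F2, F4, F5), T for the treatment F6, Y for sales.
# The true causal effect of T on Y is -1, but W pushes both T and Y up,
# masking the sign in a naive regression.
W = rng.normal(size=n)
T = 2.0 * W + rng.normal(size=n)
Y = -1.0 * T + 5.0 * W + rng.normal(size=n)

def residualize(y, x):
    """Residuals of y after a least-squares fit on x (with intercept)."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

def slope(x, y):
    """Slope coefficient of y ~ x (with intercept)."""
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

naive = slope(T, Y)  # confounded: comes out positive (~ +1)
ate = slope(residualize(T, W), residualize(Y, W))  # ~ -1, correct sign
```

The naive estimate picks up the confounding path through W (cov(T, Y)/var(T) = 5/5 = +1 in expectation), while the residual-on-residual regression recovers the structural coefficient −1.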

The paper concludes with practical recommendations: (1) start with SHAP or similar tools to obtain a global sense of feature relevance; (2) use clustering or correlation analysis to surface potential confounders; (3) feed the identified confounders into a DML workflow to obtain unbiased causal estimates; (4) translate the resulting ATEs into “what‑if” simulations for strategic decisions such as marketing budget adjustments or inventory rebalancing. The authors acknowledge limitations, notably the aggressive feature reduction that may omit important variables, and the reliance on fully observed confounders—situations where DML would fail and alternative quasi‑experimental designs would be required. Future work is suggested on scaling the approach to higher‑dimensional data, integrating automated causal discovery methods, and building real‑time causal dashboards for retail executives.
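Step (4) of the recipe amounts to linear extrapolation from the estimated ATE. A hypothetical what-if, with invented numbers and the (strong) assumption that the effect is linear over the intervened range:

```python
# Hypothetical what-if from an estimated ATE (all numbers illustrative).
ate = -120.0       # assumed change in weekly sales per unit of F6
current_f6 = 4.0   # current seller-unresponsiveness score
target_f6 = 3.0    # scenario: improve responsiveness by one unit

# Linear-effect extrapolation: uplift = ATE * (scenario - current).
predicted_uplift = ate * (target_f6 - current_f6)  # +120.0 weekly sales
```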

