A-Collapsibility of Distribution Dependence and Quantile Regression Coefficients

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

The Yule-Simpson paradox notes that an association between random variables can be reversed when averaged over a background variable. Cox and Wermuth (2003) introduced the concept of distribution dependence between two random variables X and Y, and developed two dependence conditions, each of which guarantees that reversal cannot occur. Ma, Xie and Geng (2006) studied the collapsibility of distribution dependence over a background variable W, under a rather strong homogeneity condition. Collapsibility ensures the association remains the same for conditional and marginal models, so that Yule-Simpson reversal cannot occur. In this paper, we investigate a more general condition for avoiding effect reversal: A-collapsibility. The conditions of Cox and Wermuth imply A-collapsibility, without assuming homogeneity. In fact, we show that, when W is a binary variable, collapsibility is equivalent to A-collapsibility plus homogeneity, and A-collapsibility is equivalent to the conditions of Cox and Wermuth. Recently, Cox (2007) extended Cochran’s result on regression coefficients of conditional and marginal models to quantile regression coefficients. The conditions of Cox and Wermuth are sufficient for A-collapsibility of quantile regression coefficients. If the conditional distribution of W, given Y = y and X = x, belongs to a one-dimensional natural exponential family, they are also necessary. Some applications of A-collapsibility include the analysis of contingency tables, linear regression models and quantile regression models.

💡 Research Summary

The paper addresses the long‑standing Yule‑Simpson paradox, which occurs when an association between two variables reverses after marginalizing over a background variable. Building on Cox and Wermuth’s (2003) notion of distribution dependence, the authors introduce a more flexible condition—A‑collapsibility—that guarantees the absence of such reversal without requiring the strong homogeneity assumption used in earlier work (Ma, Xie, and Geng 2006).
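The reversal itself is easy to reproduce numerically. The following minimal sketch is not taken from the paper: it uses hypothetical counts, chosen only for illustration, for a binary exposure X, a binary outcome Y and a binary background variable W, and it compares the X–Y risk difference within each level of W with the risk difference after pooling over W. The names `counts`, `risk_difference` and `pooled` are illustrative.

```python
# Hypothetical (successes, trials) counts, for illustration only.
# Outer key: level of the background variable W; inner key: level of X.
counts = {
    0: {1: (81, 87), 0: (234, 270)},
    1: {1: (192, 263), 0: (55, 80)},
}


def risk_difference(exposed, unexposed):
    """Return P(Y=1 | X=1) - P(Y=1 | X=0) from two (successes, trials) pairs."""
    (s1, n1), (s0, n0) = exposed, unexposed
    return s1 / n1 - s0 / n0


def pooled(x):
    """Sum successes and trials for exposure level x over all levels of W."""
    successes = sum(counts[w][x][0] for w in counts)
    trials = sum(counts[w][x][1] for w in counts)
    return successes, trials


# Association within each level of W (conditional on W).
conditional = {w: risk_difference(cells[1], cells[0]) for w, cells in counts.items()}

# Association after marginalizing over W.
marginal = risk_difference(pooled(1), pooled(0))

for w, diff in conditional.items():
    print(f"W={w}: risk difference = {diff:+.3f}")
print(f"pooled: risk difference = {marginal:+.3f}")
```

With these counts the risk difference is positive within both levels of W but negative after pooling. In this table W is associated with both X and Y, which is the kind of configuration in which the sign of the pooled association can differ from the signs within strata.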

Cox and Wermuth identified two sufficient conditions for non‑reversal: (A) monotonicity of the conditional cumulative distribution function of Y given X and W with respect to X, and (B) non‑increasing conditional density of Y given X and W. Either condition ensures that the sign of the X–Y association cannot change after averaging over W. Ma et al. defined collapsibility as the equality of conditional and marginal models, but they required that the X–Y relationship be identical across all levels of W (homogeneity), a restriction often violated in practice.

A‑collapsibility relaxes this requirement. It is defined as the property that the association between X and Y remains unchanged after marginalizing over W when at least one of Cox and Wermuth’s conditions holds, regardless of whether the X–Y relationship varies with W. The authors prove that Cox and Wermuth’s conditions automatically imply A‑collapsibility, making the latter a genuine generalization. Moreover, when the background variable W is binary, they show that traditional collapsibility is equivalent to A‑collapsibility together with homogeneity; thus, A‑collapsibility captures the essence of collapsibility while allowing for heterogeneity.

The second major contribution concerns quantile regression coefficients. Cox (2007) extended Cochran’s classic result on the equality of conditional and marginal regression coefficients to the quantile regression setting. The present paper demonstrates that Cox and Wermuth’s two conditions are sufficient for A‑collapsibility of quantile regression coefficients as well. Importantly, if the conditional distribution of W given (Y = y, X = x) belongs to a one‑dimensional natural exponential family (e.g., Bernoulli, Poisson, Gamma), these conditions become not only sufficient but also necessary. This result provides a complete characterization of when quantile regression coefficients are immune to Simpson’s paradox.

To illustrate the practical relevance, three applications are discussed. In contingency‑table analysis, A‑collapsibility ensures that cell‑proportion odds remain consistent between conditional and marginal tables, eliminating paradoxical interpretations. In linear regression, it guarantees that regression slopes retain their sign and magnitude after marginalization, even when the error structure varies with W. In quantile regression, the equality of conditional and marginal quantile curves under the identified conditions allows analysts to draw reliable conclusions about distributional effects across different quantiles, a crucial feature for policy evaluation and risk assessment.
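For the linear regression case, the gap between conditional and marginal slopes can be sketched with the standard omitted-variable identity for least squares: the slope of Y on X alone equals the slope of Y on X adjusted for W, plus the coefficient of W times the slope of W on X. This is an illustration under assumptions, not code or data from the paper. The simulated data, the coefficient values and the variable names are all hypothetical, and the identity shown is the in-sample least-squares version rather than the paper's distributional statement.

```python
import random

# Simulated data with hypothetical coefficients, for illustration only.
rng = random.Random(0)
n = 2000
w = [rng.gauss(0, 1) for _ in range(n)]
x = [0.8 * wi + rng.gauss(0, 1) for wi in w]
y = [1.0 + 0.5 * xi - 2.0 * wi + rng.gauss(0, 1) for xi, wi in zip(x, w)]


def cov(u, v):
    """Sample covariance of two equal-length sequences (divisor n)."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)


sxx, sww, sxw = cov(x, x), cov(w, w), cov(x, w)
sxy, swy = cov(x, y), cov(w, y)

# Conditional model: least squares of Y on X and W (2x2 normal equations).
det = sxx * sww - sxw ** 2
b_cond = (sxy * sww - swy * sxw) / det  # coefficient of X given W
c_cond = (swy * sxx - sxy * sxw) / det  # coefficient of W given X

# Auxiliary regression of W on X, and marginal regression of Y on X.
e_wx = sxw / sxx
b_marg = sxy / sxx

print(f"conditional slope of X: {b_cond:+.3f}")
print(f"marginal slope of X:    {b_marg:+.3f}")
print(f"b_cond + c_cond * e_wx: {b_cond + c_cond * e_wx:+.3f}")
```

In this simulation the conditional slope of X is positive while the marginal slope is negative, because the term `c_cond * e_wx` is large and of opposite sign. The two slopes coincide when that product is zero, that is, when W has no coefficient in the conditional model or when W is uncorrelated with X in the sample.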

In summary, the paper establishes A‑collapsibility as a unifying and less restrictive framework for preventing Yule‑Simpson reversal. It subsumes the earlier sufficient conditions, clarifies the relationship between collapsibility and homogeneity for binary background variables, and extends the theory to quantile regression coefficients with both sufficient and necessary conditions under natural exponential family assumptions. These theoretical advances broaden the toolkit for statisticians and data scientists, enabling more robust inference in a wide range of models where background variables cannot be assumed homogeneous.
