Wasserstein projection distance for fairness testing of regression models

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

Fairness testing evaluates whether a model satisfies a specified fairness criterion across different groups, yet most research has focused on classification models, leaving regression models underexplored. This paper introduces a framework for fairness testing of regression models that leverages the Wasserstein distance to project the data distribution, focusing on expectation-based criteria. After categorizing fairness criteria for regression, we derive a Wasserstein projection test statistic via a dual reformulation and establish asymptotic bounds and limiting distributions, which allow us to formulate both a hypothesis-testing procedure and an optimal data perturbation method that improves fairness while balancing accuracy. Experiments on synthetic data demonstrate that the proposed hypothesis-testing approach offers higher specificity than permutation-based tests. To illustrate its potential applications, we apply the framework to two case studies on real data, showing (1) statistically significant gender disparities in student performance data across multiple models, and (2) significant unfairness across pollution areas in housing price data under multiple fairness criteria, robust to different group divisions, with feature-level analysis identifying spatial and socioeconomic drivers.


💡 Research Summary

The paper addresses the largely unexplored problem of fairness testing for regression models by introducing a Wasserstein‑projection‑based statistical framework. After categorising regression fairness criteria, the authors focus on expectation‑based notions (e.g., equal mean predictions, equal mean errors, bounded group loss) and define a set F_R of “fair” probability distributions that satisfy a given criterion with respect to a fixed regressor R. The core test statistic T is the squared Wasserstein distance between the empirical data distribution \hat P_N and the closest distribution in F_R, using a cost function that penalises differences in feature space (weight α) and output space (weight β) while assigning infinite cost to moving mass across groups, thereby enforcing strict group separation.
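As a concrete illustration, the group-separating ground cost described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the paper's implementation: the function name, the squared Euclidean norm on features, and the sample encoding as `(x, y, a)` tuples are all choices made here for illustration.

```python
import numpy as np

def ground_cost(u, v, alpha=1.0, beta=1.0):
    """Sketch of a ground cost between samples u = (x, y, a) and v = (x', y', a'):
    alpha * ||x - x'||^2 on features plus beta * (y - y')^2 on outputs when the
    group labels agree, and +inf across groups, so that no feasible transport
    plan moves mass between protected groups (strict group separation).
    """
    x, y, a = u
    xp, yp, ap = v
    if a != ap:
        return np.inf  # infinite cost forbids cross-group transport
    feature_term = alpha * np.sum((np.asarray(x) - np.asarray(xp)) ** 2)
    output_term = beta * (y - yp) ** 2
    return feature_term + output_term
```

With such a cost, any transport plan of finite total cost necessarily preserves the group marginals, which is what motivates restricting the projection to distributions with the empirical group proportions.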

Because directly solving inf_{Q∈F_R} W_c²(\hat P_N,Q) is infinite‑dimensional, the authors apply two reductions. First, they show that it suffices to project onto the marginally‑constrained set F_R(\hat p_N), preserving the empirical group proportions. Second, they invoke duality to rewrite the problem as a finite‑dimensional max‑min formulation:

T = (1/N) · sup_{γ ∈ ℝ^N} ∑_{i=1}^N inf_{x_i, y_i} { … }

(The inner objective of this max-min formulation is truncated in the source.)
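For intuition about what the projection statistic measures, consider the simplest instance: the equal-group-mean-predictions criterion, where only the prediction coordinate is allowed to move (cost β·(y − y')²) and mass cannot cross groups. Minimizing the total squared movement subject to equal group means is achieved by a constant shift within each group, which yields the closed form T = β · N₀N₁(m₁ − m₀)² / N². The sketch below computes this; the function name and the restriction to y-only movement are assumptions made here for illustration, not the paper's general formulation.

```python
import numpy as np

def projection_distance_equal_means(y_pred, groups, beta=1.0):
    """Squared Wasserstein projection distance of the empirical distribution
    onto the set of distributions with equal group mean predictions, when
    only the prediction coordinate may move (cost beta * (y - y')^2) and
    mass may not cross groups.

    Minimizing sum_i (y_i' - y_i)^2 subject to equal group means is attained
    by constant within-group shifts delta_0 = N1*(m1 - m0)/N and
    delta_1 = -N0*(m1 - m0)/N, giving beta * N0*N1/N^2 * (m1 - m0)^2.
    """
    y0 = y_pred[groups == 0]
    y1 = y_pred[groups == 1]
    n0, n1 = len(y0), len(y1)
    n = n0 + n1
    delta = y1.mean() - y0.mean()  # observed group mean gap
    return beta * n0 * n1 / n**2 * delta**2
```

The statistic is zero exactly when the group means already agree, and grows quadratically in the observed gap, weighted by how balanced the groups are.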

