Trade-offs Between Individual and Group Fairness in Machine Learning: A Comprehensive Review
Algorithmic fairness has become a central concern in computational decision-making systems, where ensuring equitable outcomes is essential for both ethical and legal reasons. Two dominant notions of fairness have emerged in the literature: Group Fairness (GF), which focuses on mitigating disparities across demographic subpopulations, and Individual Fairness (IF), which emphasizes consistent treatment of similar individuals. These notions have traditionally been studied in isolation. In contrast, this survey examines methods that jointly address GF and IF, integrating both perspectives within unified frameworks and explicitly characterizing the trade-offs between them. We provide a systematic and critical review of hybrid fairness approaches, organizing existing methods according to the fairness mechanisms they employ and the algorithmic and mathematical strategies used to reconcile multiple fairness criteria. For each class of methods, we examine their theoretical foundations, optimization mechanisms, and empirical evaluation practices, and discuss their limitations. Additionally, we discuss the challenges and identify open research directions for developing principled, context-aware hybrid fairness methods. By synthesizing insights across the literature, this survey aims to serve as a comprehensive resource for researchers and practitioners seeking to design hybrid algorithms that provide reliable fairness guarantees at both the individual and group levels.
💡 Research Summary
This survey provides a comprehensive review of methods that aim to satisfy both group fairness (GF) and individual fairness (IF) within a single machine‑learning framework, and it systematically analyzes the inherent trade‑offs between these two notions. The authors begin by motivating the problem: high‑stakes applications such as criminal justice, credit lending, hiring, and healthcare increasingly rely on automated decision‑making, yet historical biases can be amplified by learned models, raising ethical, legal, and societal concerns. While the fairness literature traditionally treats GF and IF as separate strands—GF focusing on statistical parity or error‑rate equality across predefined demographic groups, and IF insisting that similar individuals receive similar outcomes—the survey argues that real‑world deployments often require a joint consideration of both.
The paper first establishes a unified notation and formal definitions. GF is expressed through three widely used statistical criteria: independence (demographic parity), separation (equalized odds or equal opportunity), and sufficiency (predictive parity). Each criterion is written as a conditional independence statement and can be measured directly from model outputs when sensitive attributes are available. IF, by contrast, is formalized via a Lipschitz‑continuity condition linking a distance metric in feature space to a distance metric in prediction space, or via counterfactual consistency where the only change between two instances is the sensitive attribute. The authors emphasize that specifying an appropriate similarity metric is domain‑specific and often the most challenging aspect of implementing IF.
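The three statistical GF criteria and the Lipschitz formulation of IF described above can be checked directly from model outputs. The following is a minimal sketch of such checks on synthetic data; the dataset, the score function, the distance metric, and the Lipschitz constant `L` are all illustrative assumptions, not specifics from the survey.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative synthetic data (assumed, not from the survey): a 1-D feature X
# correlated with a binary sensitive attribute A, labels Y, and model scores S.
n = 1000
A = rng.integers(0, 2, size=n)                  # sensitive group membership
X = rng.normal(loc=0.5 * A, scale=1.0, size=n)  # feature shifted by group
Y = (X + rng.normal(scale=0.5, size=n) > 0.25).astype(int)
S = 1.0 / (1.0 + np.exp(-2.0 * X))              # model score in (0, 1)
Yhat = (S > 0.5).astype(int)                    # hard predictions

# Independence (demographic parity): |P(Yhat=1 | A=0) - P(Yhat=1 | A=1)|
dp_gap = abs(Yhat[A == 0].mean() - Yhat[A == 1].mean())

# Separation (equalized odds): true- and false-positive-rate gaps across groups
tpr_gap = abs(Yhat[(A == 0) & (Y == 1)].mean() - Yhat[(A == 1) & (Y == 1)].mean())
fpr_gap = abs(Yhat[(A == 0) & (Y == 0)].mean() - Yhat[(A == 1) & (Y == 0)].mean())

# Individual fairness as a Lipschitz condition: |S(x) - S(x')| <= L * d(x, x'),
# with d the absolute distance in feature space and L a chosen constant.
def lipschitz_violations(x, s, L=2.0):
    """Count pairs (i, j) whose score gap exceeds L times their feature distance."""
    dx = np.abs(x[:, None] - x[None, :])
    ds = np.abs(s[:, None] - s[None, :])
    return int((ds > L * dx + 1e-12).sum() // 2)

print(f"DP gap: {dp_gap:.3f}  TPR gap: {tpr_gap:.3f}  FPR gap: {fpr_gap:.3f}")
print("Lipschitz violations:", lipschitz_violations(X, S))
```

Because the assumed score function here is 0.5-Lipschitz in the feature, it satisfies the IF condition for `L = 2.0` even when the group-level gaps are nonzero, which previews the tension between the two notions discussed next.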
A central contribution of the survey is the articulation of theoretical incompatibility results: under mild assumptions, perfect GF and perfect IF can be achieved simultaneously only in trivial settings where the data distribution is already perfectly balanced across groups. Empirical studies cited in the survey corroborate this tension in practice, showing that improving one notion of fairness typically degrades the other.
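The intuition behind the incompatibility result can be made concrete with a small numeric demonstration. A classifier that ignores the sensitive attribute entirely is counterfactually consistent (identical individuals receive identical outputs), yet it violates demographic parity whenever the group feature distributions differ. The setup below is an illustrative assumption, not an example taken from the survey.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed synthetic population: the feature distribution of group A=1 is
# shifted upward relative to group A=0, i.e. the data are not balanced.
n = 10_000
A = rng.integers(0, 2, size=n)
X = rng.normal(loc=1.0 * A, scale=1.0, size=n)

# A threshold rule that depends only on X and never on A: it treats any two
# individuals with the same feature value identically (counterfactually
# consistent, hence individually fair under a feature-based metric).
Yhat = (X > 0.5).astype(int)

p0 = Yhat[A == 0].mean()  # positive rate in group 0
p1 = Yhat[A == 1].mean()  # positive rate in group 1
print(f"P(Yhat=1 | A=0) = {p0:.3f}, P(Yhat=1 | A=1) = {p1:.3f}")
```

The two positive rates differ substantially because the feature distributions differ, so demographic parity fails even though the rule is individually consistent; equalizing the rates would instead require treating identical individuals from different groups differently.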