Double Fairness Policy Learning: Integrating Action Fairness and Outcome Fairness in Decision-making
Fairness is a central pillar of trustworthy machine learning, especially in domains where accuracy- or profit-driven optimization is insufficient. While most fairness research focuses on supervised learning, fairness in policy learning remains less explored. Because policy learning is interventional, it induces two distinct fairness targets: action fairness (equitable action assignments) and outcome fairness (equitable downstream consequences). Crucially, equalizing actions does not generally equalize outcomes when groups face different constraints or respond differently to the same action. We propose a novel double fairness learning (DFL) framework that explicitly manages the trade-off among three objectives: action fairness, outcome fairness, and value maximization. We integrate fairness directly into a multi-objective optimization problem for policy learning and employ a lexicographic weighted Tchebyshev method that recovers Pareto solutions beyond convex settings, with theoretical regret guarantees. Our framework is flexible and accommodates various commonly used fairness notions. Extensive simulations demonstrate improved performance relative to competing methods. In applications to a motor third-party liability insurance dataset and an entrepreneurship training dataset, DFL substantially improves both action and outcome fairness while incurring only a modest reduction in overall value.
💡 Research Summary
The paper addresses a gap in the fairness‑aware policy learning literature by introducing a “double fairness learning” (DFL) framework that simultaneously enforces action fairness (equal treatment in decision making) and outcome fairness (equal distribution of downstream consequences) while still maximizing expected reward. The authors first formalize the two fairness notions in the context of policy learning, distinguishing equal‑opportunity (EO) and counterfactual fairness for both actions and outcomes. They then derive necessary and sufficient conditions under which a policy can satisfy both fairness criteria within a given policy class, assuming sufficient richness of the class and monotonicity of the reward and fairness functions.
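As a rough sketch of the distinction (our notation, not necessarily the paper's), the two demographic-parity-style gaps for a stochastic policy \(\pi\), a binary sensitive attribute \(G\), and the outcome \(Y(\pi)\) induced by following \(\pi\) could be written as:

```latex
% Illustrative notation; the paper also covers equal-opportunity and
% counterfactual variants of both gaps.
\Delta_{\mathrm{act}}(\pi)
  = \bigl|\, \mathbb{E}[\pi(X) \mid G = 0] - \mathbb{E}[\pi(X) \mid G = 1] \,\bigr|,
\qquad
\Delta_{\mathrm{out}}(\pi)
  = \bigl|\, \mathbb{E}[Y(\pi) \mid G = 0] - \mathbb{E}[Y(\pi) \mid G = 1] \,\bigr|.
```

The paper's central observation is that \(\Delta_{\mathrm{act}}(\pi) = 0\) does not imply \(\Delta_{\mathrm{out}}(\pi) = 0\) when the response to an action differs across groups.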
To manage the inevitable trade‑offs among three objectives—value maximization, action‑fairness violation, and outcome‑fairness violation—the authors cast the problem as a multi‑objective optimization (MOO) task following Miettinen’s framework. Because many realistic policy parameterizations are non‑convex, standard linear scalarization would miss Pareto‑optimal solutions. The paper therefore adopts a lexicographic weighted Tchebyshev scalarization, which respects a predefined priority order (e.g., value > action fairness > outcome fairness) and can recover all Pareto‑efficient points even in non‑convex spaces. The method iteratively solves a sequence of weighted min‑max problems, adjusting weights after each stage to enforce the lexicographic hierarchy.
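The core weighted Tchebyshev step can be illustrated on a toy problem. The sketch below is ours, not the paper's code: the two objectives, the grid search, and the weights are all hypothetical stand-ins, chosen only to show the min–max structure \(\min_\theta \max_i w_i\,|f_i(\theta) - z_i^*|\) relative to an ideal (utopia) point \(z^*\).

```python
# Illustrative sketch (not the paper's implementation): weighted
# Tchebyshev scalarization of two toy objectives over a 1-D policy
# parameter theta. Objective definitions and weights are hypothetical.
import math

def objectives(theta):
    # Two toy objectives to MINIMIZE: a stand-in for the negative
    # policy value and a stand-in for a fairness-violation gap.
    neg_value = -math.sin(theta)
    fairness_gap = abs(math.cos(2 * theta))
    return neg_value, fairness_gap

def tchebyshev(theta, weights, ideal):
    # Weighted Tchebyshev distance to the ideal point:
    # max_i w_i * |f_i(theta) - z_i*|.
    return max(w * abs(f - z)
               for w, f, z in zip(weights, objectives(theta), ideal))

grid = [i * math.pi / 500 for i in range(501)]       # theta in [0, pi]
ideal = (min(objectives(t)[0] for t in grid),        # component-wise minima
         min(objectives(t)[1] for t in grid))

# Unlike linear scalarization, sweeping the weights of the Tchebyshev
# distance can reach Pareto points on non-convex fronts.
weights = (0.7, 0.3)
best = min(grid, key=lambda t: tchebyshev(t, weights, ideal))
print(round(best, 3), [round(f, 3) for f in objectives(best)])
```

In the paper's lexicographic variant, a sequence of such min–max problems is solved, with the weights re-adjusted after each stage so the higher-priority objective is fixed before the next one is optimized.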
An estimator is built by first learning regression models for the primary reward r(s,x,a) and the fairness‑related outcome f(s,x,a). The policy is parameterized (e.g., via a logistic model) and optimized with a loss that combines the negative estimated value and weighted fairness penalties. The lexicographic scheme dynamically updates the penalty weights, ensuring that the higher‑priority objective is satisfied before the lower‑priority ones are minimized.
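A minimal sketch of that penalized loss, under illustrative assumptions: the data, the fitted models `r_hat` and `f_hat`, the penalty weights, and the grid search below are all hypothetical placeholders for the paper's estimator, shown only to make the "negative value plus weighted fairness penalties" structure concrete.

```python
# Hypothetical sketch of a penalized policy loss: a logistic policy
# pi_theta(a=1 | x) = sigmoid(theta * x), combining negative estimated
# value with action- and outcome-fairness penalties. All models and
# weights are illustrative, not the paper's.
import math
import random

random.seed(0)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy data: covariate x and binary sensitive group g.
data = [(random.gauss(0, 1), random.randint(0, 1)) for _ in range(200)]

# Stand-ins for the fitted regression models of the reward r(s, x, a)
# and the fairness-related outcome f(s, x, a).
r_hat = lambda x, a: a * (0.5 + x)
f_hat = lambda x, a: a * (1.0 + 0.5 * x)

def loss(theta, lam_a=1.0, lam_o=1.0):
    probs = [(sigmoid(theta * x), g, x) for x, g in data]
    # Estimated policy value under pi_theta.
    value = sum(p * r_hat(x, 1) + (1 - p) * r_hat(x, 0)
                for p, _, x in probs) / len(probs)

    def group_mean(pairs, grp):
        sel = [v for v, g in pairs if g == grp]
        return sum(sel) / len(sel)

    acts = [(p, g) for p, g, _ in probs]
    outs = [(p * f_hat(x, 1) + (1 - p) * f_hat(x, 0), g) for p, g, x in probs]
    af = abs(group_mean(acts, 0) - group_mean(acts, 1))  # action-fairness gap
    of = abs(group_mean(outs, 0) - group_mean(outs, 1))  # outcome-fairness gap
    return -value + lam_a * af + lam_o * of

# Crude grid search in place of a proper gradient-based optimizer.
best_theta = min((t / 10 for t in range(-30, 31)), key=loss)
```

In the paper, the penalty weights are not fixed as here but updated by the lexicographic scheme, so the higher-priority objective is met before the lower-priority penalties are minimized.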
Theoretical contributions include: (1) consistency of the learned policy with respect to the Pareto set defined by the Tchebyshev scalarization, and (2) a regret bound of order O_p(1/√n) for the combined objective, showing that adding fairness constraints does not degrade statistical efficiency. The proofs rely on uniform convergence of the regression estimators and a constrained Lagrangian analysis that guarantees the fairness violations converge to zero.
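In our paraphrased notation (the paper's exact statement may differ), the regret guarantee has the shape:

```latex
% Shape of the guarantee (paraphrase): the scalarized objective of the
% estimated policy approaches the best attainable value at the
% parametric rate.
\mathrm{Regret}(\hat\pi_n)
  \;=\; S_w(\hat\pi_n) - \inf_{\pi \in \Pi} S_w(\pi)
  \;=\; O_p\!\bigl(n^{-1/2}\bigr),
```

where \(S_w\) denotes the weighted Tchebyshev scalarized objective over the policy class \(\Pi\), and the same rate is what one would expect from unconstrained policy value maximization alone.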
Empirical evaluation comprises synthetic experiments and two real‑world case studies. In a Belgian motor liability insurance dataset, standard policies produce sizable premium and claim‑approval disparities across groups. DFL reduces both action‑fairness and outcome‑fairness gaps by roughly 30‑40 % while incurring only a 3‑5 % drop in expected profit. In an entrepreneurship training program, the baseline policy is action‑fair but yields a 20 % difference in internship success rates. DFL modestly adjusts admission criteria, preserving action fairness (≈1 % loss) and cutting the outcome gap to about 6 %. Across all scenarios, DFL dominates single‑fairness baselines (action‑only or outcome‑only) and pure value‑maximizing policies on a radar chart that visualizes the three objectives.
The authors discuss extensions to multi‑group, multi‑action, and continuous fairness metrics, noting that the lexicographic weighting can be tuned to reflect domain‑specific priorities. Limitations include reliance on counterfactual models for causal fairness and the need for expert input to set the lexicographic order. Nonetheless, the work provides a principled, theoretically grounded, and practically effective approach to embedding dual fairness considerations into policy learning, offering a valuable tool for responsible AI deployment in high‑stakes decision‑making contexts.