Altruism and Fair Objective in Mixed-Motive Markov games

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

Cooperation is fundamental to society’s viability, as it enables the emergence of structure within heterogeneous groups that seek collective well-being. However, individuals are inclined to defect in order to benefit from the group’s cooperation without bearing the associated costs, leading to unfair situations. In game theory, social dilemmas capture this dichotomy between individual interest and collective outcome. The dominant approach to multi-agent cooperation is utilitarian welfare, which can produce efficient but highly inequitable outcomes. This paper proposes a novel framework to foster fairer cooperation by replacing the standard utilitarian objective with Proportional Fairness. We introduce a fair altruistic utility for each agent, defined on the individual log-payoff space, and derive the analytical conditions required to ensure cooperation in classic social dilemmas. We then extend this framework to sequential settings by defining a Fair Markov Game and deriving novel fair Actor-Critic algorithms to learn fair policies. Finally, we evaluate our method in various social-dilemma environments.


💡 Research Summary

The paper tackles a fundamental tension in multi‑agent systems: while cooperation is essential for collective welfare, selfish incentives often drive agents to defect, leading to socially inefficient and unfair outcomes. The dominant paradigm in multi‑agent reinforcement learning (MARL) is to maximize a utilitarian social welfare—typically the sum or weighted sum of individual rewards. Although this approach can achieve high total returns, it ignores how those returns are distributed among agents, frequently producing highly inequitable solutions.

To address this limitation, the authors propose replacing the utilitarian objective with a proportional‑fairness (PF) based objective. PF, originally introduced in telecommunications for rate control, maximizes the sum of logarithms of agents’ utilities (Σ_i log u_i). This is equivalent to maximizing the Nash welfare (the product of utilities) and thus balances efficiency with equity: agents with low payoffs receive relatively larger marginal gains from improvements, encouraging more balanced allocations.
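A minimal numerical sketch can make the contrast concrete. The snippet below (illustrative only; the payoff values are hypothetical) compares the utilitarian welfare with the PF objective on two allocations that have the same total payoff:

```python
import math

def utilitarian(payoffs):
    # Utilitarian welfare: the (unweighted) sum of individual payoffs.
    return sum(payoffs)

def proportional_fairness(payoffs):
    # PF objective: sum of log-payoffs, equivalent to maximizing
    # the Nash welfare (the product of payoffs).
    return sum(math.log(p) for p in payoffs)

balanced = [5.0, 5.0]  # hypothetical equitable allocation
skewed = [9.0, 1.0]    # hypothetical inequitable allocation

# Both allocations have identical utilitarian welfare (10.0)...
assert utilitarian(balanced) == utilitarian(skewed)
# ...but PF strictly prefers the balanced one,
# since log(5)+log(5) > log(9)+log(1).
assert proportional_fairness(balanced) > proportional_fairness(skewed)
```

The concavity of the logarithm is what drives this preference: raising a low payoff yields a larger marginal gain in the PF objective than raising an already-high one.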

The core technical contribution is the definition of a “fair altruistic utility” for each agent. Starting from the classic α‑altruistic game, where each player’s payoff is a convex combination of its own reward p_i and a social welfare term SW(s), the authors replace p_i with a logarithmic transformation F_i(p_i)=log(p_i−m_p) (m_p being the minimal possible payoff) to bring individual rewards onto the same scale as the PF term. The resulting utility is

 u_i(s) = (1−α)·log(p_i(s)−m_p) + α·SW(s),

where α ∈ [0, 1] balances the agent’s own (log-transformed) payoff against the social-welfare term.
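As a small numerical sketch of the fair altruistic utility (an assumption-laden illustration: the payoff values are hypothetical, and SW(s) is taken here to be the PF welfare, i.e. the sum of shifted log-payoffs over all agents):

```python
import math

def fair_altruistic_utility(i, payoffs, alpha, m_p):
    # u_i = (1 - alpha) * log(p_i - m_p) + alpha * SW(s),
    # where SW(s) is assumed here to be the PF welfare term:
    # the sum of shifted log-payoffs over all agents.
    sw = sum(math.log(p - m_p) for p in payoffs)
    return (1 - alpha) * math.log(payoffs[i] - m_p) + alpha * sw

payoffs = [3.0, 1.0]  # hypothetical stage-game payoffs, minimal payoff m_p = 0

# alpha = 0 recovers the purely selfish (log-transformed) utility...
selfish = fair_altruistic_utility(0, payoffs, alpha=0.0, m_p=0.0)
# ...while alpha = 1 recovers the purely altruistic PF welfare.
altruistic = fair_altruistic_utility(0, payoffs, alpha=1.0, m_p=0.0)
```

Intermediate values of α interpolate between the two extremes, which is what the paper’s analytical conditions for cooperation constrain.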

