Learning to Reach Agreement in a Continuous Ultimatum Game
It is well known that acting in an individually rational manner, according to the principles of classical game theory, may lead to sub-optimal solutions in a class of problems known as social dilemmas. In contrast, humans generally do not have much difficulty with social dilemmas, as they are able to balance personal benefit and group benefit. As agents in multi-agent systems are regularly confronted with social dilemmas, for instance in tasks such as resource allocation, these agents may benefit from the inclusion of mechanisms thought to facilitate human fairness. Although many such mechanisms have already been implemented in a multi-agent systems context, their application is usually limited to rather abstract social dilemmas with a discrete set of available strategies (usually two). Given that many real-world social dilemmas are continuous in nature, we extend this previous work to more general dilemmas, in which agents operate in a continuous strategy space. The social dilemma under study here is the well-known Ultimatum Game, in which an optimal solution is achieved if agents agree on a common strategy. We investigate whether a scale-free interaction network helps agents reach agreement, especially in the presence of fixed-strategy agents that represent a desired (e.g. human) outcome. Moreover, we study the influence of rewiring in the interaction network. The agents are equipped with continuous-action learning automata and play a large number of random pairwise games in order to establish a common strategy. From our experiments, we may conclude that results obtained in discrete-strategy games can be generalized to continuous-strategy games to a certain extent: a scale-free interaction network structure allows agents to achieve agreement on a common strategy, and rewiring in the interaction network greatly enhances the agents' ability to reach agreement. However, it also becomes clear that some alternative mechanisms, such as reputation and volunteering, involve many subtleties and do not have convincing beneficial effects in the continuous case.
💡 Research Summary
The paper investigates how autonomous agents can learn to reach a common agreement in a continuous‑action version of the Ultimatum Game (UG), a classic social dilemma in which a proposer suggests a split of a resource and a responder either accepts or rejects it. Classical game theory predicts that rational proposers will offer the smallest possible amount while responders will accept any positive offer, yet human participants typically converge on fair splits (around 50 % of the total). The authors aim to reproduce this human‑like fairness in multi‑agent systems by extending previous work on discrete‑action games to a setting where strategies are real‑valued.
Methodology
- Learning mechanism – Agents are equipped with Continuous‑Action Learning Automata (CALA). Each CALA maintains a probability distribution (typically a Gaussian) over possible offers; after each interaction, the distribution's mean is shifted in the direction indicated by the payoff received, which depends on whether the offer was accepted or rejected. This yields smooth, incremental adjustments of offers, unlike the binary reinforcement schemes used for discrete strategies.
- Interaction topology – Two network configurations are examined: (a) a static scale‑free network generated by preferential attachment, and (b) the same network augmented with a rewiring rule. In the rewiring scenario, if a proposer’s offer falls below the responder’s current expectation, the responder may cut the link with a fixed probability and randomly reconnect to another node. This mimics the human tendency to avoid repeatedly interacting with unfair partners.
- Population composition – Experiments are run with (i) only learning agents, and (ii) a mixture where a fraction (10–30 %) of agents are “fixed‑strategy” agents that always propose and accept a pre‑specified fair split (e.g., 0.5). The presence of fixed agents serves as a proxy for an externally imposed human‑like norm.
- Simulation protocol – Thousands of random pairwise games (≥10 000 rounds) are played. The authors record the average offer, acceptance rate, variance of offers, and convergence speed. They also test two additional mechanisms—reputation (recording past acceptance behavior) and volunteering (allowing agents to opt out of a game)—to see whether they improve outcomes in the continuous setting.
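The learning mechanism and population composition above can be sketched in a minimal, self-contained Python simulation. This is an illustrative simplification, not the paper's implementation: the acceptance rule (responder accepts iff the offer meets its own current mean), the learning rate, the population sizes, and all parameter values are assumptions, the full CALA scheme also adapts the standard deviation (omitted here), and the interaction topology and rewiring are left out in favor of uniform random pairing.

```python
import random

class CALAAgent:
    """Simplified CALA-style learner (a sketch, not the paper's exact rule):
    keeps a Gaussian over offers and nudges its mean toward sampled actions
    that earned more payoff than the mean action would have."""

    def __init__(self, mu=0.1, sigma=0.2, lr=0.05, sigma_min=0.02):
        self.mu, self.sigma = mu, sigma
        self.lr, self.sigma_min = lr, sigma_min

    def sample_offer(self):
        # Propose a share of the pie, clipped to [0, 1]
        return min(1.0, max(0.0, random.gauss(self.mu, self.sigma)))

    def update(self, action, payoff, baseline_payoff):
        # CALA mean update: move mu toward the sampled action if it
        # outperformed the payoff the mean action would have received
        s = max(self.sigma, self.sigma_min)
        self.mu += self.lr * (payoff - baseline_payoff) * (action - self.mu) / s
        self.mu = min(1.0, max(0.0, self.mu))

def play_round(proposer, responder):
    """One UG interaction. Acceptance rule is a stand-in: the responder
    accepts iff the offer meets its own current mean offer."""
    offer = proposer.sample_offer()
    accepted = offer >= responder.mu
    payoff = (1.0 - offer) if accepted else 0.0  # proposer keeps the rest
    # Payoff the proposer's mean offer would have earned (CALA baseline)
    baseline = (1.0 - proposer.mu) if proposer.mu >= responder.mu else 0.0
    proposer.update(offer, payoff, baseline)
    return offer, accepted

random.seed(42)
learners = [CALAAgent(mu=random.uniform(0.0, 0.3)) for _ in range(20)]
# Fixed-strategy "fair" agents: lr=0 and sigma=0 freeze them at 0.5
fixed = [CALAAgent(mu=0.5, sigma=0.0, lr=0.0) for _ in range(5)]
population = learners + fixed
for _ in range(20000):
    proposer, responder = random.sample(population, 2)
    play_round(proposer, responder)
mean_offer = sum(a.mu for a in learners) / len(learners)
```

The fixed agents illustrate the paper's externally imposed norm: with a zero learning rate and zero variance they always demand and offer exactly 0.5, while the learners adjust their means through repeated play.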
Key Findings
- Scale‑free networks facilitate convergence – Even without rewiring, the hub‑rich topology enables rapid diffusion of successful offer values. Over time, the entire population’s offers collapse to a narrow band, indicating spontaneous agreement.
- Rewiring dramatically accelerates and sharpens agreement – When agents can drop unsatisfactory partners, the average offer converges faster and settles much closer to the target fair split imposed by the fixed agents. Rewiring effectively filters out “defective” strategies, preventing them from contaminating the network.
- Reputation and volunteering are ineffective in continuous UG – Reputation scores, which are binary (accepted/rejected), do not capture the fine‑grained differences between offers and therefore have little influence on partner selection. Volunteering reduces the total number of interactions, limiting the learning opportunities for CALA and slowing convergence.
- CALA proves robust for continuous strategies – The learning automaton reliably drives agents toward a stable equilibrium despite noisy feedback and the presence of heterogeneous partners. Its incremental update rule avoids the oscillations often seen in discrete reinforcement learning when agents over‑react to single outcomes.
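The rewiring rule from the findings above (drop a link to an unsatisfactory partner with some probability, then reconnect at random) can be sketched as follows. The function name, threshold parameter, and rewiring probability are illustrative assumptions, not taken from the paper; the graph is a plain undirected adjacency-set dictionary rather than a scale-free network.

```python
import random

def maybe_rewire(graph, responder, proposer, offer, threshold, p_rewire=0.5):
    """Hypothetical rewiring step: after receiving an offer below its
    threshold, the responder cuts the link with probability p_rewire
    and reconnects to a random node it is not yet linked to."""
    if offer >= threshold or random.random() >= p_rewire:
        return
    # Sever the undirected link between responder and proposer
    graph[responder].discard(proposer)
    graph[proposer].discard(responder)
    # Candidate new partners: any node not already adjacent to the responder
    candidates = [n for n in graph
                  if n not in (responder, proposer) and n not in graph[responder]]
    if candidates:
        new_partner = random.choice(candidates)
        graph[responder].add(new_partner)
        graph[new_partner].add(responder)

# Usage on a 4-node ring: node 0 receives an unfair offer from node 1,
# cuts that link, and relinks to node 2 (the only non-neighbor left)
random.seed(1)
g = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {2, 0}}
maybe_rewire(g, responder=0, proposer=1, offer=0.1, threshold=0.4, p_rewire=1.0)
```

Keeping both directions of the adjacency in sync is what makes the "filtering" effect described above possible: once the link is cut, the unfair proposer also loses the responder as a partner.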
Implications and Limitations
The study suggests that, for continuous social dilemmas, the structure of the interaction network and the ability to rewire connections are more critical than auxiliary mechanisms such as reputation. This aligns with observations in human societies where individuals tend to avoid repeatedly dealing with unfair counterparts, thereby reinforcing norms of fairness. The results also validate CALA as a practical tool for continuous‑action multi‑agent learning, offering smoother adaptation than traditional Q‑learning or discrete automata.
However, the authors acknowledge several constraints. A high proportion of fixed‑strategy agents (>50 %) can dominate the dynamics, potentially suppressing the emergence of diverse strategies. The rewiring rule is modeled with a simple probability and does not account for realistic costs (e.g., time, communication overhead) associated with breaking and forming links. Moreover, the reputation system tested was overly simplistic; richer, graded reputation models might yield different outcomes. Future work is proposed to incorporate explicit rewiring costs, explore multi‑dimensional continuous actions (e.g., amount, timing, quality), and design more sophisticated reputation mechanisms that can capture subtle variations in fairness.
Conclusion
Overall, the paper demonstrates that extending discrete‑action agreement mechanisms to continuous‑action environments is feasible, provided that agents operate on a scale‑free network and are allowed to rewire away from unfair partners. These findings offer concrete design guidelines for building multi‑agent systems—such as autonomous resource‑allocation platforms or decentralized marketplaces—that aim to emulate human‑like fairness and cooperation in settings where decisions are not limited to a few discrete options.