Simulating the Economic Impact of Rationality through Reinforcement Learning and Agent-Based Modelling


💡 Research Summary

The paper introduces a novel framework, the Rational macro Agent‑Based Model (R‑MABM), which augments a classic macroeconomic ABM with multi‑agent reinforcement learning (RL) to create fully rational firms that learn profit‑maximizing policies. The baseline model (CC‑MABM) contains households, banks, K‑firms (capital‑goods producers) and C‑firms (consumption‑goods producers). In the original model, C‑firms follow a simple heuristic: they adjust prices toward the market average and modify production based on excess inventory, embodying bounded rationality.
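The heuristic C‑firm rule described above can be illustrated with a minimal sketch. The summary gives no functional form, so the names `CFirm`, `heuristic_update`, `price_step` and `quantity_step`, the step sizes, and the specific update equations are all assumptions made for illustration rather than the paper's actual rule.

```python
from dataclasses import dataclass


@dataclass
class CFirm:
    """Hypothetical container for a consumption-goods firm's state."""
    price: float
    target_output: float
    inventory: float  # unsold goods left at the end of the period


def heuristic_update(firm: CFirm, avg_price: float,
                     price_step: float = 0.05,
                     quantity_step: float = 0.05) -> CFirm:
    """One period of an illustrative bounded-rationality rule (assumed form).

    The price moves a fixed fraction of the gap toward the market average.
    Target output shrinks if stock was left over and grows if the firm sold out.
    """
    new_price = firm.price + price_step * (avg_price - firm.price)
    if firm.inventory > 0:
        new_target = firm.target_output * (1 - quantity_step)
    else:
        new_target = firm.target_output * (1 + quantity_step)
    return CFirm(new_price, new_target, firm.inventory)
```

Under these assumptions, a firm priced above the market average with unsold stock lowers both its price and its production target in the next period. The rule reacts only to the current price gap and inventory and involves no optimisation, which is the sense in which the baseline firms are boundedly rational.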

R‑MABM replaces a configurable fraction of C‑firms with RL agents. Each RL firm observes a state consisting of the logarithmic price deviation from the market average and the logarithmic inventory deviation, both discretized into bins. The action space is also discrete, representing log‑adjustments to price and target production. Agents are trained with asynchronous Q‑learning using an ε‑greedy policy; the reward at each step is the firm’s profit, with a heavy penalty for bankruptcy. Because other firms follow fixed rules, they are treated as part of the environment, allowing each RL agent to learn independently in a partially observable stochastic game.
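A minimal sketch of the learning setup just described: a discretized two‑dimensional state, a discrete action grid of log‑adjustments, ε‑greedy action selection, and a tabular Q‑learning update. The bin edges, step sizes, learning rate `alpha` and discount `gamma` are hypothetical values chosen for illustration, since the summary does not report them. The update shown is the standard one‑step tabular rule and does not attempt to reproduce the asynchronous scheme used in the paper.

```python
import numpy as np

# Hypothetical grids; the summary gives no bin edges or step sizes.
PRICE_EDGES = np.array([-0.10, -0.02, 0.02, 0.10])  # log price deviation from market average
INV_EDGES = np.array([-0.10, -0.02, 0.02, 0.10])    # log inventory deviation
LOG_STEPS = np.array([-0.05, 0.0, 0.05])            # log-adjustment per action component

N_PRICE = len(PRICE_EDGES) + 1
N_INV = len(INV_EDGES) + 1
N_STATES = N_PRICE * N_INV
N_ACTIONS = len(LOG_STEPS) ** 2  # (price step, production step) pairs


def encode_state(log_price_dev, log_inv_dev):
    """Map the two continuous deviations to a single discrete state index."""
    i = int(np.digitize(log_price_dev, PRICE_EDGES))
    j = int(np.digitize(log_inv_dev, INV_EDGES))
    return i * N_INV + j


def decode_action(a):
    """Split an action index into (price log-step, production log-step)."""
    dp, dq = divmod(a, len(LOG_STEPS))
    return float(LOG_STEPS[dp]), float(LOG_STEPS[dq])


def epsilon_greedy(q, state, epsilon, rng):
    """Explore with probability epsilon, otherwise pick the greedy action."""
    if rng.random() < epsilon:
        return int(rng.integers(N_ACTIONS))
    return int(np.argmax(q[state]))


def q_update(q, s, a, reward, s_next, alpha=0.1, gamma=0.95, done=False):
    """Standard one-step Q-learning update; `done` marks bankruptcy (terminal)."""
    target = reward if done else reward + gamma * np.max(q[s_next])
    q[s, a] += alpha * (target - q[s, a])
```

In a simulation loop, each RL firm would hold its own table `q = np.zeros((N_STATES, N_ACTIONS))`, pick an action with `epsilon_greedy`, apply the two log‑steps to its price and target production, and then call `q_update` with its realised profit as the reward. A bankruptcy would be passed as a large negative reward with `done=True`; the size of that penalty is not given in the summary.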

Experiments vary two key parameters: the proportion of RL firms (degree of rationality) and the competition intensity parameter z_c, which determines how many firms each consumer samples when shopping. The results reveal three emergent strategies among the RL firms: (1) a “price‑competition” strategy that undercuts the market average and expands output, (2) a “margin‑optimization” strategy that sets prices above average to maximize profit per unit, and (3) a “mixed” adaptive strategy that switches between the two depending on recent market signals. The optimal strategy depends on market competition: low competition (small z_c) favors price competition, while high competition (large z_c) favors margin optimization.
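To make the role of z_c concrete, the sketch below has each consumer sample `z_c` firms at random and buy from the cheapest one sampled. The uniform sampling, the cheapest‑price choice and the function names `choose_seller` and `market_shares` are assumptions for illustration; the summary states only that z_c sets how many firms each consumer samples.

```python
import random


def choose_seller(prices, z_c, rng):
    """Return the index of the cheapest firm among z_c randomly sampled firms."""
    k = min(z_c, len(prices))
    sampled = rng.sample(range(len(prices)), k)
    return min(sampled, key=lambda i: prices[i])


def market_shares(prices, z_c, n_consumers=10_000, seed=0):
    """Estimate each firm's share of consumers for a given search intensity z_c."""
    rng = random.Random(seed)
    counts = [0] * len(prices)
    for _ in range(n_consumers):
        counts[choose_seller(prices, z_c, rng)] += 1
    return [c / n_consumers for c in counts]
```

Under these assumptions, `z_c = 1` means consumers pick a firm at random regardless of price, so firms face almost no price pressure, while `z_c` equal to the number of firms sends every consumer to the single cheapest seller. Intermediate values interpolate between the two, which is why z_c acts as a competition‑intensity dial.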

A striking finding is that, even without explicit communication, RL firms with similar policies self‑segregate into distinct strategic groups, effectively concentrating market power and raising aggregate profits. This spontaneous clustering demonstrates that coordination can arise purely from the learning dynamics.

From a macroeconomic perspective, increasing the share of rational RL firms consistently raises total output (GDP) because firms allocate resources more efficiently. However, the impact on economic stability varies with the dominant strategy. When margin‑optimization dominates, the economy experiences higher profit levels but also greater output volatility, suggesting a trade‑off between growth and stability. Conversely, the price‑competition strategy yields more stable output at the cost of lower average profits.

The authors make the R‑MABM code publicly available, ensuring reproducibility, and argue that the framework offers a principled way to embed rational optimisation into ABMs without hand‑crafting complex behavioural rules. By allowing researchers and policymakers to tune the proportion of rational agents and observe resulting macro‑effects, R‑MABM opens new avenues for studying the interplay between micro‑level learning and macro‑level outcomes, and for designing policies that balance growth, profitability, and economic volatility.
