Simulating the Economic Impact of Rationality through Reinforcement Learning and Agent-Based Modelling
Research Summary
The paper introduces a novel framework, the Rational macro Agent-Based Model (R-MABM), which augments a classic macroeconomic ABM with multi-agent reinforcement learning (RL) to create fully rational firms that learn profit-maximizing policies. The baseline model (CC-MABM) contains households, banks, K-firms (capital-goods producers) and C-firms (consumption-goods producers). In the original model, C-firms follow a simple heuristic: they adjust prices toward the market average and modify production based on excess inventory, embodying bounded rationality.
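The C-firm heuristic can be sketched as a small Python class. The class name, the adjustment speeds `price_speed` and `output_speed`, and the inventory buffer `desired_inventory` are hypothetical choices made for illustration. They are not parameters taken from CC-MABM. The sketch only shows the two rules in their simplest form consistent with the description: a partial move of price toward the market average, and a production target that falls when unsold stock piles up.

```python
import math
from dataclasses import dataclass


@dataclass
class HeuristicCFirm:
    """Hypothetical bounded-rational C-firm (illustrative, not the CC-MABM code)."""
    price: float
    target_output: float
    inventory: float = 0.0
    desired_inventory: float = 0.0  # assumed inventory buffer the firm aims to hold
    price_speed: float = 0.1        # assumed share of the log price gap closed per step
    output_speed: float = 0.5       # assumed share of excess inventory removed from the target

    def step(self, market_avg_price: float) -> None:
        # Price rule: move part of the way toward the market average, in log space.
        log_gap = math.log(market_avg_price) - math.log(self.price)
        self.price *= math.exp(self.price_speed * log_gap)
        # Quantity rule: produce less when unsold stock exceeds the buffer,
        # more when it falls short; the target is never negative.
        excess = self.inventory - self.desired_inventory
        self.target_output = max(0.0, self.target_output - self.output_speed * excess)


firm = HeuristicCFirm(price=10.0, target_output=10.0, inventory=4.0,
                      desired_inventory=2.0, price_speed=0.5)
firm.step(market_avg_price=12.0)
print(round(firm.price, 3), firm.target_output)  # 10.954 9.0
```

Because the same fixed rule is applied every period, such a firm never anticipates how rivals or consumers will respond. The RL firms are meant to relax exactly this limitation.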
R-MABM replaces a configurable fraction of C-firms with RL agents. Each RL firm observes a state made of two quantities: the logarithmic deviation of its price from the market average and the logarithmic deviation of its inventory. Both are discretized into bins. The action space is also discrete, representing log-adjustments to price and target production. Agents are trained with asynchronous Q-learning using an ε-greedy policy. The reward at each step is the firm's profit, with a heavy penalty for bankruptcy. Because the heuristic firms follow fixed rules, they can be treated as part of the environment. Each RL agent therefore learns independently in what is formally a partially observable stochastic game.
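The following is a minimal tabular sketch of one RL firm, assuming NumPy. Several elements are hypothetical and may differ from the paper's actual discretization and hyperparameters:

- the bin edges
- the three-step action grid
- the penalty size
- the reference inventory level
- all function and class names

The sketch shows three things. First, the two log deviations are binned into a single discrete state. Second, a discrete action maps to log-adjustments of price and target output. Third, an ε-greedy Q-learner updates only the visited (state, action) entry, using profit as the reward.

```python
import numpy as np

# Assumed discretization: 4 edges split each log deviation into 5 bins.
BIN_EDGES = np.array([-0.2, -0.05, 0.05, 0.2])
N_BINS = len(BIN_EDGES) + 1
# Assumed action grid: joint log-adjustments (d log price, d log target output).
LOG_STEPS = (-0.05, 0.0, 0.05)
ACTIONS = [(dp, dq) for dp in LOG_STEPS for dq in LOG_STEPS]
BANKRUPTCY_PENALTY = -1e4  # assumed magnitude of the "heavy penalty"


def encode_state(price, avg_price, inventory, ref_inventory):
    """Bin the log price and log inventory deviations into one state index."""
    p_bin = int(np.digitize(np.log(price / avg_price), BIN_EDGES))
    # Clamp inventory away from zero so the logarithm stays finite.
    i_bin = int(np.digitize(np.log(max(inventory, 1e-9) / ref_inventory), BIN_EDGES))
    return p_bin * N_BINS + i_bin


def apply_action(price, target_output, action):
    """Turn a discrete action into multiplicative changes to price and target output."""
    dp, dq = ACTIONS[action]
    return price * np.exp(dp), target_output * np.exp(dq)


class QLearningFirm:
    def __init__(self, alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
        self.q = np.zeros((N_BINS * N_BINS, len(ACTIONS)))
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = np.random.default_rng(seed)

    def act(self, state):
        # Epsilon-greedy: explore uniformly with probability epsilon, else exploit.
        if self.rng.random() < self.epsilon:
            return int(self.rng.integers(len(ACTIONS)))
        return int(np.argmax(self.q[state]))

    def update(self, state, action, profit, next_state, bankrupt):
        # Reward is the step's profit, plus a large penalty if the firm went bankrupt.
        reward = profit + (BANKRUPTCY_PENALTY if bankrupt else 0.0)
        # Bankruptcy is treated as terminal, so there is no bootstrapped future value.
        future = 0.0 if bankrupt else self.gamma * self.q[next_state].max()
        # Only the visited (state, action) entry is updated.
        self.q[state, action] += self.alpha * (reward + future - self.q[state, action])


firm = QLearningFirm(epsilon=0.0)
s = encode_state(price=9.0, avg_price=10.0, inventory=5.0, ref_inventory=5.0)
a = firm.act(s)
firm.update(s, a, profit=3.0, next_state=s, bankrupt=False)
print(s, a, round(firm.q[s, a], 3))  # 7 0 0.3
```

Treating bankruptcy as terminal is a design choice made for this sketch. It means the penalty is the last signal the agent receives on that trajectory, so actions leading toward insolvency are discouraged directly rather than through discounted future values.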
Experiments vary two key parameters: the proportion of RL firms (the degree of rationality) and the competition intensity parameter z_c, which determines how many firms each consumer samples when shopping. The results reveal three emergent strategies among the RL firms:

(1) a “price-competition” strategy that undercuts the market average and expands output;
(2) a “margin-optimization” strategy that sets prices above average to maximize profit per unit;
(3) a “mixed” adaptive strategy that switches between the two depending on recent market signals.

The optimal strategy depends on market competition: low competition (small z_c) favors price competition, while high competition (large z_c) favors margin optimization.
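To make z_c concrete, the sketch below assumes that each consumer samples z_c distinct firms uniformly at random and buys from the cheapest one. The paper's actual matching rule may include other elements, and `consumer_choices` is a hypothetical helper. The example shows how raising z_c shifts demand toward a firm that undercuts the others.

```python
import random
from collections import Counter


def consumer_choices(prices, z_c, n_consumers, seed=0):
    """Count purchases per firm when each consumer samples z_c firms (assumed rule)."""
    rng = random.Random(seed)
    firm_ids = range(len(prices))
    purchases = Counter()
    for _ in range(n_consumers):
        # Draw z_c distinct firms uniformly at random, then buy from the cheapest.
        sampled = rng.sample(firm_ids, z_c)
        purchases[min(sampled, key=lambda f: prices[f])] += 1
    return purchases


prices = [9.0, 10.0, 10.0, 10.0, 11.0]  # hypothetical: firm 0 undercuts the rest
for z_c in (1, 2, 5):
    share = consumer_choices(prices, z_c, n_consumers=10_000)[0] / 10_000
    print(f"z_c={z_c}: firm 0 demand share ~ {share:.2f}")
```

Under this rule, the undercutting firm appears in a consumer's sample with probability z_c/5. Its expected share therefore climbs from about 0.2 at z_c = 1 to exactly 1 at z_c = 5, where every consumer sees every firm.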
A striking finding is that, even without explicit communication, RL firms with similar policies self-segregate into distinct strategic groups, effectively concentrating market power and raising aggregate profits. This spontaneous clustering demonstrates that coordination can arise purely from the learning dynamics.
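One way to detect such strategic groups after training is to label each firm from its history of log price deviations, log(p / p_avg), and then group firms by label. This is a hypothetical post-hoc classifier, not the paper's procedure. The 80% threshold is an assumption. The classifier also ignores output, which the price-competition strategy adjusts as well.

```python
from collections import defaultdict


def label_strategy(log_price_devs, threshold=0.8):
    """Label a firm from the share of periods it priced below the market average."""
    share_below = sum(d < 0 for d in log_price_devs) / len(log_price_devs)
    if share_below >= threshold:
        return "price-competition"
    if share_below <= 1 - threshold:
        return "margin-optimization"
    return "mixed"


def group_by_strategy(histories):
    """Map each strategy label to the list of firm ids that carry it."""
    groups = defaultdict(list)
    for firm_id, devs in histories.items():
        groups[label_strategy(devs)].append(firm_id)
    return dict(groups)


# Hypothetical log price deviations log(p / p_avg) for three firms.
histories = {
    "A": [-0.10, -0.20, -0.10, -0.05],
    "B": [0.10, 0.20, 0.15, 0.10],
    "C": [-0.10, 0.10, -0.10, 0.10],
}
print(group_by_strategy(histories))
# {'price-competition': ['A'], 'margin-optimization': ['B'], 'mixed': ['C']}
```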
From a macroeconomic perspective, increasing the share of rational RL firms consistently raises total output (GDP) because firms allocate resources more efficiently. However, the impact on economic stability varies with the dominant strategy. When margin optimization dominates, the economy experiences higher profit levels but also greater output volatility, suggesting a trade-off between growth and stability. Conversely, the price-competition strategy yields more stable output at the cost of lower average profits.
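Comparing growth against stability requires a volatility measure. A common choice, assumed here rather than taken from the paper, is the standard deviation of log output growth. The sketch computes it alongside mean output for two made-up series. The numbers are illustrative only and are not results from the paper.

```python
import numpy as np


def output_summary(gdp):
    """Mean output and volatility (std. dev. of log growth rates) of a GDP series."""
    gdp = np.asarray(gdp, dtype=float)
    log_growth = np.diff(np.log(gdp))
    return {"mean_output": float(gdp.mean()), "volatility": float(log_growth.std(ddof=1))}


# Hypothetical series for illustration only; these are not results from the paper.
steady = [100, 101, 102, 103, 104]
choppy = [100, 110, 96, 112, 104]
print(output_summary(steady))
print(output_summary(choppy))
```

Using log growth rather than raw levels keeps the volatility measure comparable across runs whose output levels differ, which matters when the share of RL firms itself shifts average GDP.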
The authors make the R-MABM code publicly available, ensuring reproducibility. They argue that the framework offers a principled way to embed rational optimisation into ABMs without hand-crafting complex behavioural rules. Researchers and policymakers can tune the proportion of rational agents and observe the resulting macro-level effects. R-MABM thus opens new avenues for studying the interplay between micro-level learning and macro-level outcomes, and for designing policies that balance growth, profitability, and economic volatility.
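Tuning the proportion of rational agents amounts to two steps. First, choose which C-firms are RL agents for a given share. Second, sweep that share against z_c. The helpers `assign_firm_types` and `experiment_grid`, and their parameter names, are hypothetical. They do not reflect the interface of the released R-MABM code.

```python
import itertools
import random


def assign_firm_types(n_c_firms, rl_share, seed=0):
    """Randomly mark round(rl_share * n_c_firms) C-firms as RL agents (hypothetical helper)."""
    if not 0.0 <= rl_share <= 1.0:
        raise ValueError("rl_share must lie in [0, 1]")
    rng = random.Random(seed)
    rl_ids = set(rng.sample(range(n_c_firms), round(rl_share * n_c_firms)))
    return ["rl" if i in rl_ids else "heuristic" for i in range(n_c_firms)]


def experiment_grid(rl_shares, z_c_values):
    """All (rl_share, z_c) combinations for a two-parameter sweep."""
    return [{"rl_share": s, "z_c": z} for s, z in itertools.product(rl_shares, z_c_values)]


types = assign_firm_types(n_c_firms=100, rl_share=0.25)
print(types.count("rl"))  # 25
grid = experiment_grid(rl_shares=(0.0, 0.25, 0.5, 1.0), z_c_values=(2, 5))
print(len(grid))  # 8
```

Fixing the random seed per configuration makes it possible to rerun a given cell of the grid with the same firm assignment.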