Deviations from the Nash equilibrium in a two-player optimal execution game with reinforcement learning
The use of reinforcement learning algorithms in financial trading is becoming increasingly prevalent. However, the autonomous nature of these algorithms can lead to unexpected outcomes that deviate from traditional game-theoretical predictions and may even destabilize markets. In this study, we examine a scenario in which two autonomous agents, modelled with Double Deep Q-Learning, learn to liquidate the same asset optimally in the presence of market impact, under the Almgren-Chriss (2000) framework. We show that the strategies learned by the agents deviate significantly from the Nash equilibrium of the corresponding market impact game. Notably, the learned strategies are supra-competitive, which might be compatible with tacit collusive behaviour, and align closely with the Pareto-optimal solution. We further explore how different levels of market volatility influence the agents' performance and the equilibria they discover, including scenarios where volatility differs between the training and testing phases.
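As a quick illustration of the learning mechanism the abstract refers to, below is a minimal sketch of the Double DQN bootstrap target; the network interfaces and the value of `gamma` are illustrative assumptions, not the paper's actual architecture or hyperparameters.

```python
import torch

def double_dqn_target(reward, next_state, done,
                      online_net, target_net, gamma=0.99):
    """Double DQN bootstrap target (van Hasselt et al., 2016).

    The online network selects the greedy next action; the separate
    target network evaluates it. This decoupling mitigates the
    overestimation bias of vanilla Q-learning and is the "Double"
    in Double Deep Q-Learning.
    """
    with torch.no_grad():
        # Greedy action choice from the online network ...
        next_action = online_net(next_state).argmax(dim=1, keepdim=True)
        # ... evaluated by the target network.
        next_q = target_net(next_state).gather(1, next_action).squeeze(1)
    # Terminal transitions receive no bootstrap term.
    return reward + gamma * (1.0 - done.float()) * next_q
```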
💡 Research Summary
This paper investigates how two autonomous trading agents, each equipped with a Double Deep Q-Learning (DDQL) algorithm, learn to liquidate a common asset under the classic Almgren-Chriss market-impact framework. The authors first extend the single-agent optimal execution problem to a two-player open-loop game in which each agent's trades contribute both a permanent impact (κ·vₜ/τ) and a temporary impact (α·vₜ/τ) on the mid-price; the Nash equilibrium of this game has been derived analytically in prior work.
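To make these dynamics concrete, here is a minimal sketch of how the two sellers' trades could enter the mid-price in a discrete Almgren-Chriss market. The function name, parameter values, and the choice to charge each agent temporary impact on its own flow only are illustrative assumptions, not specifications from the paper.

```python
import numpy as np

def simulate_two_agent_execution(v1, v2, s0=100.0, sigma=0.5, tau=1.0,
                                 kappa=1e-6, alpha=2e-6, seed=0):
    """Two sellers in a discrete Almgren-Chriss market, per the summary above.

    v1, v2 : volumes each agent sells in every interval of length tau.
    kappa  : permanent-impact coefficient (illustrative value).
    alpha  : temporary-impact coefficient (illustrative value).
    Returns the mid-price path and each agent's execution prices.
    """
    rng = np.random.default_rng(seed)
    n = len(v1)
    mid = np.empty(n + 1)
    mid[0] = s0
    exec1, exec2 = np.empty(n), np.empty(n)
    for t in range(n):
        # Temporary impact alpha * v_t / tau: assumed here to be paid by
        # each agent on its own trading rate only.
        exec1[t] = mid[t] - alpha * v1[t] / tau
        exec2[t] = mid[t] - alpha * v2[t] / tau
        # Permanent impact kappa * v_t / tau, acting over an interval of
        # length tau, shifts the mid-price by kappa * (v1 + v2); the
        # Gaussian term models arithmetic-random-walk volatility.
        mid[t + 1] = (mid[t]
                      + sigma * np.sqrt(tau) * rng.standard_normal()
                      - kappa * (v1[t] + v2[t]))
    return mid, exec1, exec2

# Example: both agents liquidate 100,000 shares on a flat (TWAP) schedule.
twap = np.full(10, 10_000.0)
mid, p1, p2 = simulate_two_agent_execution(twap, twap)
```

Note how the combined selling pressure in the permanent-impact term is what couples the two agents' objectives: each agent's revenue depends on the other's schedule, which is what turns the execution problem into the market impact game described above.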