The Invisible Handshake: Tacit Collusion between Adaptive Market Agents

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

We study the emergence of tacit collusion in a repeated game between a market maker, who controls market liquidity, and a market taker, who chooses trade quantities. The market price evolves according to the endogenous price impact of trades and exogenous innovations to economic fundamentals. We define collusion as persistent overpricing over economic fundamentals and characterize the set of feasible and collusive strategy profiles. Our main result shows that a broad class of simple learning dynamics, including gradient ascent updates, converges in finite time to collusive strategies when the agents maximize individual wealth, defined as the value of their portfolio, without any explicit coordination. The key economic mechanism is that when aggregate supply in the market is positive, overpricing raises the market capitalization and thus the total wealth of market participants, inducing a cooperative component in otherwise non-cooperative learning objectives. These results identify an inherent structure through which decentralized learning by AI-driven agents can autonomously generate persistent overpricing in financial markets.


💡 Research Summary

The paper develops a tractable theoretical framework to study how tacit collusion can arise spontaneously between two AI‑driven market participants—a market maker (liquidity provider) and a market taker (liquidity consumer)—when each agent simply maximizes its own portfolio value. The authors model the market as a discrete‑time stochastic game. At each period a signed trade quantity (Q_t) is executed, generating an endogenous price impact (\delta_t) that is proportional to the square‑root of the trade size: (\delta_t = \alpha_t\sqrt{Q_t}) for buys and (\delta_t = \beta_t\sqrt{-Q_t}) for sells, where (\alpha_t\ge 0) and (\beta_t\le 0) are the maker’s liquidity‑parameter choices. An exogenous fundamental shock (\varepsilon_{t+1}>0) (i.i.d. with finite mean and variance) then multiplicatively updates the price: (P_{t+1} = (P_t+\delta_t)\varepsilon_{t+1}).
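The price recursion above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names, the fixed liquidity parameters, and the choice of a lognormal shock with mean one are all assumptions made here (the summary only requires a positive i.i.d. shock with finite mean and variance).

```python
import math
import random


def price_impact(q, alpha, beta):
    """Square-root impact: alpha*sqrt(q) for buys (q > 0), beta*sqrt(-q) for sells (q < 0)."""
    if q > 0:
        return alpha * math.sqrt(q)
    if q < 0:
        return beta * math.sqrt(-q)
    return 0.0


def next_price(p, q, alpha, beta, eps):
    """One step of P_{t+1} = (P_t + delta_t) * eps_{t+1}."""
    return (p + price_impact(q, alpha, beta)) * eps


def simulate(p0, trades, alpha, beta, sigma=0.01, seed=0):
    """Roll the price forward over a list of signed trade quantities.

    Assumption: the shock is lognormal with mean 1; alpha and beta are held
    constant here, whereas in the model the maker chooses them each period.
    """
    rng = random.Random(seed)
    prices = [p0]
    for q in trades:
        eps = rng.lognormvariate(-0.5 * sigma**2, sigma)  # strictly positive
        prices.append(next_price(prices[-1], q, alpha, beta, eps))
    return prices
```

With the shock switched off ((\varepsilon_{t+1}=1)), a buy of 4 units at (\alpha_t=0.5) moves a price of 100 to 101, and a sell of 9 units at (\beta_t=-0.5) moves it to 98.5.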

Both agents hold cash and inventory, and their wealth at time (t) is the mark‑to‑market value (W_p(t)=C_p(t)+P_t I_p(t)). Two objective formulations are considered: (i) a myopic goal of maximizing the expected one‑step wealth increment, and (ii) a farsighted goal of maximizing long‑run average wealth as the horizon (T\to\infty).
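The mark-to-market definition can be made concrete with a small sketch. The `Agent` class and the `settle` rule are hypothetical: the summary does not state the price at which a trade is executed, so `exec_price` is left as an explicit argument rather than derived from the model.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    cash: float
    inventory: float

    def wealth(self, price):
        """Mark-to-market wealth W_p(t) = C_p(t) + P_t * I_p(t)."""
        return self.cash + price * self.inventory


def settle(maker, taker, q, exec_price):
    """Taker buys q units from the maker (q < 0 means the taker sells).

    Assumption: the trade is a pure transfer of cash and inventory between
    the two agents at a single execution price.
    """
    taker.inventory += q
    taker.cash -= q * exec_price
    maker.inventory -= q
    maker.cash += q * exec_price
```

Under this settlement rule a trade only redistributes cash and inventory: at any fixed marking price, the sum of the two agents' wealth is the same before and after the trade.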

The central analytical contribution is a decomposition of the game’s payoff into a competitive component and a cooperative component. The competitive component dominates when the aggregate inventory (I = I_M+I_T) equals zero; in that case any price movement harms at least one player, and the unique stable equilibrium coincides with the fundamental price (no overpricing). The cooperative component emerges when (I>0): a higher price raises market capitalization and hence the combined wealth of the two participants, thereby aligning a portion of each player’s wealth‑maximization objective with the goal of raising the price. This creates an implicit “team” incentive embedded within a non‑cooperative setting.
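The intuition behind the two components can be seen from an accounting identity. The following is a sketch under the assumption that trades only move cash and inventory between the two agents, so that total cash (C) and aggregate inventory (I) stay constant; it is not the paper's formal decomposition, which the summary does not reproduce.

```latex
\begin{align*}
W_M(t) + W_T(t)
  &= \underbrace{C_M(t) + C_T(t)}_{C}
   + P_t \,\underbrace{\bigl(I_M(t) + I_T(t)\bigr)}_{I}, \\
\Delta\bigl(W_M + W_T\bigr)
  &= I \,\bigl(P_{t+1} - P_t\bigr)
  \qquad \text{(with $C$ and $I$ held fixed)}.
\end{align*}
```

When (I = 0) the right-hand side vanishes, so one agent's gain from a price move is exactly the other's loss. When (I > 0) any price increase raises the sum, which is the shared incentive described above.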

Learning dynamics are modeled as gradient‑ascent updates on the expected wealth‑increase function. The authors consider a broad class of adaptive algorithms that can be expressed as stochastic block‑coordinate gradient steps (i.e., at each iteration a randomly selected parameter—(\alpha_t), (\beta_t) or (Q_t)—is updated using an unbiased gradient estimate). The main theorem shows that, under mild regularity conditions, such dynamics enter the collusive region (the set of parameter values that generate persistent price overvaluation relative to fundamentals) in finite time with probability one, and thereafter remain trapped there. The proof hinges on two facts: (1) inside the collusive region the expected wealth‑increase gradient points strictly inward, guaranteeing a positive drift toward the interior; (2) on the boundary the stochastic update has a non‑zero probability of moving back inside, making the boundary reflecting rather than absorbing.
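The update rule described above (pick one parameter at random, step along an unbiased gradient estimate) can be sketched as follows. The objective here is a toy concave function with a made-up maximizer `TARGET`; it stands in for the expected wealth-increase function, which the summary does not give in closed form. The projection step, which keeps (\alpha\ge 0) and (\beta\le 0), is likewise an assumption about how the sign constraints might be enforced.

```python
import random

# Hypothetical maximizer of the toy objective; not a value from the paper.
TARGET = {"alpha": 0.8, "beta": -0.3, "q": 1.5}


def toy_grad(theta, name, rng):
    """Unbiased estimate of the partial derivative of -sum((theta - TARGET)^2).

    The exact partial derivative plus zero-mean Gaussian noise.
    """
    return -2.0 * (theta[name] - TARGET[name]) + rng.gauss(0.0, 0.1)


def project(name, value):
    """Keep alpha >= 0 and beta <= 0; q is left unconstrained."""
    if name == "alpha":
        return max(0.0, value)
    if name == "beta":
        return min(0.0, value)
    return value


def block_coordinate_ascent(grad_estimate, theta, steps, lr, seed=0):
    """Stochastic block-coordinate gradient ascent over a dict of parameters."""
    rng = random.Random(seed)
    theta = dict(theta)  # do not mutate the caller's dict
    names = sorted(theta)
    for _ in range(steps):
        name = rng.choice(names)  # randomly selected coordinate
        step = lr * grad_estimate(theta, name, rng)
        theta[name] = project(name, theta[name] + step)
    return theta
```

Starting from all zeros, a few thousand steps with a small learning rate bring each parameter close to its toy maximizer. In the paper's setting the analogous claim is about entering the collusive region, which depends on the actual wealth gradient rather than this placeholder.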

The analysis is extended to the farsighted objective. By deriving a closed‑form decomposition of long‑run wealth, the authors demonstrate that the cooperative term continues to dominate when aggregate inventory is positive, so the same finite‑time convergence to the collusive region holds for the long‑run objective as well. Consequently, whether agents are myopic or forward‑looking, the gradient‑based learning dynamics studied in the paper lead to persistent overpricing whenever aggregate inventory is positive.

Policy implications are emphasized. Current antitrust enforcement focuses on explicit communication; however, the paper shows that AI‑driven agents can autonomously generate a “price‑inflation club” without any overt coordination. The structural driver is the sign of total inventory: positive net inventory creates a built‑in incentive for agents to collectively push prices up. Regulators might therefore consider rules that limit net positions, impose minimum liquidity provision standards (to keep (\alpha_t) and (\beta_t) from becoming too large in magnitude), or monitor the evolution of agents’ learning parameters for signatures of entry into the collusive region.

Limitations include the restriction to two representative agents (ignoring the many‑agent, high‑frequency environment of real markets), the assumption of a square‑root price impact law (real markets exhibit asymmetric and state‑dependent impact), and the focus on gradient‑based learning (actual deep‑reinforcement learners involve exploration‑exploitation trade‑offs and non‑convex loss surfaces). Future work could extend the model to multiple heterogeneous agents, incorporate more realistic microstructure dynamics, and test the theoretical predictions with empirical market data or high‑fidelity simulations.

In sum, the paper provides a rigorous, game‑theoretic explanation for how decentralized, wealth‑maximizing AI agents can self‑organize into a tacitly collusive regime that sustains price overvaluation, highlighting a novel source of systemic risk in algorithmic finance.

