Learning Distributed Equilibria in Linear-Quadratic Stochastic Differential Games: An $α$-Potential Approach
We analyze independent policy-gradient (PG) learning in $N$-player linear-quadratic (LQ) stochastic differential games. Each player employs a distributed policy that depends only on its own state and updates the policy independently using the gradient of its own objective. We establish global linear convergence of these methods to an equilibrium by showing that the LQ game admits an $α$-potential structure, with $α$ determined by the degree of pairwise interaction asymmetry. For pairwise-symmetric interactions, we construct an affine distributed equilibrium by minimizing the potential function and show that independent PG methods converge globally to this equilibrium, with complexity scaling linearly in the population size and logarithmically in the desired accuracy. For asymmetric interactions, we prove that independent projected PG algorithms converge linearly to an approximate equilibrium, with suboptimality proportional to the degree of asymmetry. Numerical experiments confirm the theoretical results across both symmetric and asymmetric interaction networks.
💡 Research Summary
The paper investigates independent policy‑gradient (PG) learning in N‑player linear‑quadratic (LQ) stochastic differential games in which each agent's policy depends only on its own state. The authors show that such games possess an "α‑potential" structure: each player's cost can be expressed as a common potential function Φ plus an asymmetric correction term scaled by a scalar α that quantifies pairwise interaction asymmetry (α = 0 corresponds to fully symmetric interactions, i.e., a classic potential game).
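A precise statement of this structure, following the standard formulation in the α‑potential game literature (the paper's exact definition and constants may differ), is that unilateral deviations change each player's cost by the same amount as the potential, up to an α‑sized error:

```latex
% alpha-potential property: for some function Phi and all players i,
% all unilateral deviations k_i -> k_i' (opponents' policies k_{-i} fixed):
\bigl| J_i(k_i', k_{-i}) - J_i(k_i, k_{-i})
      - \bigl( \Phi(k_i', k_{-i}) - \Phi(k_i, k_{-i}) \bigr) \bigr| \le \alpha .
% alpha = 0 recovers an exact (Monderer-Shapley) potential game.
```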
For pairwise‑symmetric interactions (α = 0) the authors explicitly construct an affine distributed equilibrium by minimizing Φ. The equilibrium coincides with the solution of coupled Riccati equations, yielding linear feedback gains K_i = −R_i⁻¹B_iᵀP_i. They then analyze independent PG updates of the form K_i ← K_i − η∇_{K_i}J_i(K), where each agent computes the gradient of its own quadratic cost with respect to its own feedback matrix. Because the game is an exact potential game in this case, every PG step reduces the shared potential Φ, which therefore serves as a Lyapunov function. Leveraging the strong convexity and Lipschitz smoothness of Φ, the authors prove global linear convergence of the decentralized learning dynamics. The iteration complexity scales as O(N·log(1/ε)), i.e., linear in the number of agents N and logarithmic in the desired accuracy ε.
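The mechanism behind this result can be illustrated with a much simpler stand-in: a hypothetical static quadratic game with pairwise-symmetric couplings (made-up costs, not the paper's continuous-time LQ dynamics). Each player's partial gradient of its own cost coincides with the corresponding partial gradient of Φ, so independent gradient steps are exactly gradient descent on Φ, and Φ decreases monotonically:

```python
import numpy as np

# Toy stand-in for the symmetric case: a static quadratic game with
# J_i(k) = (k_i - c_i)^2 + sum_{j != i} w[i, j] * (k_i - k_j)^2, w symmetric.
# The exact potential is
#   Phi(k) = sum_i (k_i - c_i)^2 + sum_{i < j} w[i, j] * (k_i - k_j)^2,
# and grad_{k_i} J_i(k) = grad_{k_i} Phi(k), so independent gradient play
# is plain gradient descent on Phi (a Lyapunov function for the dynamics).

rng = np.random.default_rng(0)
N = 10
c = rng.normal(size=N)
w = rng.uniform(0.0, 0.2, size=(N, N))
w = (w + w.T) / 2            # pairwise-symmetric interactions (alpha = 0)
np.fill_diagonal(w, 0.0)

def grad_i(k, i):
    # gradient of player i's own cost with respect to its own decision k_i
    return 2 * (k[i] - c[i]) + 2 * np.sum(w[i] * (k[i] - k))

def potential(k):
    pair = sum(w[i, j] * (k[i] - k[j]) ** 2
               for i in range(N) for j in range(i + 1, N))
    return np.sum((k - c) ** 2) + pair

k = np.zeros(N)
eta = 0.05
history = [potential(k)]
for _ in range(500):
    g = np.array([grad_i(k, i) for i in range(N)])  # independent updates
    k = k - eta * g
    history.append(potential(k))
# history is monotonically decreasing, and k approaches the minimizer of Phi,
# which is a Nash equilibrium of this toy game.
```

The same one-step argument (each PG step descends Φ) is what the paper leverages, with the extra work of handling the nonconvex LQ policy parametrization and stochastic dynamics.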
When interactions are asymmetric (α > 0), the exact potential property breaks down. The authors introduce an approximate α‑potential formulation: each player's gradient equals the gradient of Φ plus an error term bounded by α·Δ_i. To handle stability constraints (e.g., ensuring each feedback matrix yields a stabilizing closed‑loop system), they employ a projected PG algorithm that projects each update onto a feasible set defined by Riccati‑type positivity conditions. Under the assumption that α is sufficiently small, they show that the projected dynamics still decrease a modified Lyapunov function, guaranteeing linear convergence to an approximate equilibrium. The equilibrium's suboptimality is O(α): the gap between each player's cost at the learned policy and the true Nash cost is proportional to the degree of asymmetry.
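The O(α) error structure can be checked numerically in a hypothetical static quadratic game (again a stand-in for the LQ setting, with made-up couplings): build Φ from the symmetrized interaction weights, and verify that any unilateral deviation changes J_i and Φ by the same amount up to an error proportional to the asymmetric part of the weights.

```python
import numpy as np

# Hypothetical static quadratic game with asymmetric couplings w[i,j] != w[j,i]:
# J_i(k) = (k_i - c_i)^2 + sum_{j != i} w[i, j] * (k_i - k_j)^2.
# Phi is built from the symmetrized weights; the mismatch between a unilateral
# deviation's effect on J_i and on Phi is then driven by (w - w.T) / 2.

rng = np.random.default_rng(1)
N = 8
c = rng.normal(size=N)
w = rng.uniform(0.0, 0.2, size=(N, N))
np.fill_diagonal(w, 0.0)

ws = (w + w.T) / 2                   # symmetric part -> builds Phi
a = (w - w.T) / 2                    # asymmetric part -> O(alpha) error
alpha = np.abs(a).sum(axis=1).max()  # one natural scalar asymmetry measure

def cost_i(k, i):
    return (k[i] - c[i]) ** 2 + np.sum(w[i] * (k[i] - k) ** 2)

def potential(k):
    pair = 0.5 * np.sum(ws * (k[:, None] - k[None, :]) ** 2)
    return np.sum((k - c) ** 2) + pair

# For random unilateral deviations k_i -> k_i', the gap between the change
# in J_i and the change in Phi is bounded by alpha times the largest
# per-neighbour deviation term.
worst = 0.0
for _ in range(200):
    k = rng.normal(size=N)
    i = int(rng.integers(N))
    k2 = k.copy()
    k2[i] += rng.normal()
    dJ = cost_i(k2, i) - cost_i(k, i)
    dPhi = potential(k2) - potential(k)
    d = (k2[i] - k) ** 2 - (k[i] - k) ** 2   # per-neighbour deviation terms
    assert abs(dJ - dPhi) <= alpha * np.abs(d).max() + 1e-9
    worst = max(worst, abs(dJ - dPhi))
```

Scaling the asymmetric part `a` toward zero shrinks `worst` proportionally, which mirrors the paper's O(α) suboptimality bound in this toy setting (the paper's projection step onto stabilizing gains has no counterpart here, since the toy game is unconstrained).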
The theoretical results are validated through extensive simulations. Experiments cover fully connected, sparse, and random asymmetric interaction graphs with population sizes ranging from 10 to 500 agents. In symmetric settings, the empirical convergence matches the predicted O(N·log(1/ε)) complexity, and the learned policies coincide with the analytically derived affine equilibrium. In asymmetric settings, the cost gaps scale linearly with the measured α, confirming the O(α) bound. Moreover, the projection step preserves stability without noticeably degrading convergence speed. Sensitivity analyses on the step size η and projection strength illustrate the trade‑off between convergence speed and final approximation error.
Key contributions of the work are:
- Identification of an α‑potential structure in LQ stochastic differential games, extending the class of games amenable to distributed learning beyond exact potential games.
- Rigorous global linear convergence guarantees for independent PG updates in the symmetric case, with explicit iteration‑complexity bounds that are favorable for large‑scale multi‑agent systems.
- Extension to asymmetric interactions via projected PG, providing a quantitative relationship between interaction asymmetry and equilibrium approximation error.
- Comprehensive empirical validation that demonstrates scalability, robustness to network topology, and practical feasibility of the proposed algorithms.
The paper concludes by outlining future directions: (i) extending the α‑potential framework to nonlinear dynamics and non‑Gaussian noise, (ii) studying partial‑information or communication‑constrained settings where agents have limited access to global state information, and (iii) integrating adaptive step‑size or meta‑learning schemes to dynamically adjust to varying degrees of asymmetry. These avenues promise to broaden the applicability of distributed equilibrium learning to real‑world cyber‑physical systems such as smart grids, autonomous vehicle fleets, and large‑scale IoT networks.