No-Regret Learning in Stackelberg Games with an Application to Electric Ride-Hailing


We consider the problem of efficiently learning to play single-leader multi-follower Stackelberg games when the leader lacks knowledge of the lower-level game. Such games arise in hierarchical decision-making problems involving self-interested agents. For example, in electric ride-hailing markets, a central authority aims to learn optimal charging prices to shape fleet distributions and charging patterns of ride-hailing companies. Existing works typically apply gradient-based methods to find the leader’s optimal strategy. Such methods are impractical as they require that the followers share private utility information with the leader. Instead, we treat the lower-level game as a black box, assuming only that the followers’ interactions approximate a Nash equilibrium while the leader observes the realized cost of the resulting approximation. Under kernel-based regularity assumptions on the leader’s cost function, we develop a no-regret algorithm that converges to an $ε$-Stackelberg equilibrium in $O(\sqrt{T})$ rounds. Finally, we validate our approach through a numerical case study on optimal pricing in electric ride-hailing markets.


💡 Research Summary

The paper tackles the problem of learning optimal strategies in single‑leader, multi‑follower Stackelberg games when the leader has no knowledge of the followers’ utility functions or the exact lower‑level equilibrium. This setting is motivated by electric ride‑hailing markets, where a central authority (the leader) wishes to set spatially varying electricity prices to steer the distribution of electric vehicles (EVs) across city districts, while competing ride‑hailing firms (the followers) independently decide how many EVs to allocate to each district. Existing approaches rely on first‑order hyper‑gradient methods that require the followers to share private gradient information, which is unrealistic in practice.

The authors model the lower‑level game as a black box that, given a price vector π, returns an approximate Nash equilibrium x_t(π) after a finite number of internal iterations. They assume each follower's problem is concave and the game is α‑strongly monotone, which guarantees a unique Nash equilibrium x^*(π) for every π. The leader's cost J(π, x) is assumed to be Lipschitz in the followers' actions and to belong to a reproducing kernel Hilbert space (RKHS) with bounded norm, enabling a Gaussian‑process (GP) prior over J. Feedback is bandit‑type: after each round t the leader observes only the realized cost J(π_t, x_t(π_t)) = J(π_t, x^*(π_t)) − ε_t, where ε_t is a sub‑Gaussian error induced by the followers' approximation.
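The black box can be pictured with a toy strongly monotone game. The sketch below (the two‑player quadratic utilities, step size, and box constraint are hypothetical illustrations, not the paper's model) runs projected gradient descent, which converges to the game's unique Nash equilibrium:

```python
import numpy as np

def approx_ne(c, steps=200, lr=0.5, box=10.0):
    """Toy ApproxNE: projected gradient descent for a 2-player
    strongly monotone quadratic game in which player i minimizes
    0.5*x_i^2 + 0.25*x_i*x_j - c_i*x_i over [0, box]."""
    x = np.zeros(2)
    for _ in range(steps):
        grad = x + 0.25 * x[::-1] - c       # simultaneous pseudo-gradient
        x = np.clip(x - lr * grad, 0.0, box)  # projected gradient step
    return x

# with c = (1, 1) the unique interior equilibrium solves
# x_i + 0.25*x_j = 1, i.e. x* = (0.8, 0.8)
x = approx_ne(np.array([1.0, 1.0]))
```

Strong monotonicity makes the iteration a contraction here, so a fixed iteration budget K (as in the paper's inner loop) yields a controllably small approximation error.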

Using the GP framework, the leader maintains a posterior mean μ_t(π) and variance σ_t²(π) based on all past cost observations. The outer learning loop selects the next price vector by an optimistic confidence‑bound (GP‑UCB‑style) rule that trades off exploration (high posterior variance) against exploitation (low posterior mean, since the leader minimizes cost). The inner loop, denoted ApproxNE(π_t, K), runs a fixed number K of iterations of a standard Nash‑equilibrium learning algorithm (e.g., projected gradient descent) for the followers; choosing K appropriately makes the approximation error decay polynomially in T.
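A minimal sketch of the outer loop on a one‑dimensional price set, using scikit-learn's GP regressor (the toy cost function with minimizer 0.3, the candidate grid, and the β value are illustrative assumptions, not the paper's setup):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def noisy_cost(pi, rng):
    # Stand-in for J(pi, ApproxNE(pi, K)): a smooth cost whose true
    # minimizer is pi = 0.3, plus small sub-Gaussian noise from the
    # followers' approximation error.
    return (pi - 0.3) ** 2 + 0.01 * rng.standard_normal()

def gp_ucb(T=30, beta=2.0, seed=0):
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.0, 1.0, 101).reshape(-1, 1)  # candidate prices Pi
    kernel = RBF(length_scale=0.2, length_scale_bounds="fixed")
    X = [[0.5]]                                       # arbitrary first query
    y = [noisy_cost(X[0][0], rng)]
    for _ in range(T - 1):
        gp = GaussianProcessRegressor(kernel=kernel, alpha=1e-4,
                                      normalize_y=True).fit(X, y)
        mu, sigma = gp.predict(grid, return_std=True)
        # the leader minimizes cost, so "optimism" means querying
        # the minimizer of the lower confidence bound mu - beta*sigma
        pi_next = float(grid[np.argmin(mu - beta * sigma), 0])
        X.append([pi_next])
        y.append(noisy_cost(pi_next, rng))
    return X[int(np.argmin(y))][0]  # best price observed so far

pi_hat = gp_ucb()
```

Early rounds chase high posterior variance across the grid; once the posterior mean is accurate, queries concentrate near the cost minimizer.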

The main theoretical contribution is a regret analysis showing that, under the stated regularity assumptions, the cumulative regret R_T = Σ_{t=1}^T [J(π_t, x_t(π_t)) − min_{π∈Π} J(π, x^*(π))] grows at most as O(√T). Consequently, the average regret R_T/T vanishes as T → ∞, establishing a no‑regret guarantee. Moreover, the authors show that no‑regret implies convergence to an ε‑Stackelberg equilibrium, where ε captures both the followers' approximation error and the GP estimation error.
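The step from no-regret to an ε-Stackelberg equilibrium follows a standard averaging argument: the best iterate among the T queries can be at most R_T/T worse than optimal. In the notation above,

```latex
\min_{1 \le t \le T} \left[ J(\pi_t, x_t(\pi_t)) - \min_{\pi \in \Pi} J(\pi, x^*(\pi)) \right]
\;\le\; \frac{1}{T} \sum_{t=1}^{T} \left[ J(\pi_t, x_t(\pi_t)) - \min_{\pi \in \Pi} J(\pi, x^*(\pi)) \right]
\;=\; \frac{R_T}{T} \;=\; O\!\left(\tfrac{1}{\sqrt{T}}\right),
```

so for any ε > 0, some queried price π_t becomes ε-optimal for the leader once T is large enough, with ε additionally absorbing the followers' equilibrium-approximation error.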

To validate the approach, the authors construct a realistic simulation of an electric ride‑hailing market. Each firm i has a fleet of M_i EVs and chooses allocations x_i ∈ ℝ^d subject to capacity constraints. The firm’s utility combines revenue proportional to market share (a function of total vehicles in a district) and charging costs determined by the price vector π. The central authority’s objective is to minimize the squared L2 distance between the resulting vehicle distribution x(π) and a target distribution ξ^*. The simulation demonstrates that the proposed GP‑UCB algorithm rapidly reduces the authority’s cost, achieving near‑optimal pricing within a few dozen rounds and maintaining the vehicle distribution within a few percent of the target. Compared to prior single‑follower methods and zeroth‑order hyper‑gradient schemes, the new algorithm converges faster and requires no information about the followers’ utilities.
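The authority's objective in this case study reduces to a squared distance to a target distribution. A minimal sketch (the 3-district shares and target vector are made-up numbers for illustration):

```python
import numpy as np

def leader_cost(x, xi_star):
    """Leader cost J: squared L2 distance between the realized
    fleet distribution x(pi) and the target distribution xi_star."""
    x, xi_star = np.asarray(x, float), np.asarray(xi_star, float)
    return float(np.sum((x - xi_star) ** 2))

# hypothetical 3-district example: firms over-serve district 1
x_pi    = [0.5, 0.3, 0.2]   # realized vehicle shares under prices pi
xi_star = [0.4, 0.4, 0.2]   # authority's target shares
cost = leader_cost(x_pi, xi_star)   # approx 0.02
```

This is the only quantity the GP-UCB loop ever observes; the firms' utilities and allocation dynamics stay inside the black box.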

In summary, the paper presents a principled, privacy‑preserving, and computationally efficient method for learning Stackelberg equilibria in hierarchical games with black‑box lower levels. By leveraging kernel‑based regularity and Gaussian‑process bandit optimization, it achieves O(√T) regret and provable convergence to an ε‑Stackelberg equilibrium, making it directly applicable to smart‑mobility policy design and other domains where only outcome‑level feedback is available. Future work may extend the framework to non‑convex follower games, multi‑objective leader problems, and online settings with streaming data.

