Competitive Multi-Operator Reinforcement Learning for Joint Pricing and Fleet Rebalancing in AMoD Systems
Autonomous Mobility-on-Demand (AMoD) systems promise to revolutionize urban transportation by providing affordable on-demand services to meet growing travel demand. However, realistic AMoD markets will be competitive, with multiple operators competing for passengers through strategic pricing and fleet deployment. While reinforcement learning has shown promise in optimizing single-operator AMoD control, existing work fails to capture competitive market dynamics. We investigate the impact of competition on policy learning by introducing a multi-operator reinforcement learning framework where two operators simultaneously learn pricing and fleet rebalancing policies. By integrating discrete choice theory, we enable passenger allocation and demand competition to emerge endogenously from utility-maximizing decisions. Experiments using real-world data from multiple cities demonstrate that competition fundamentally alters learned behaviors, leading to lower prices and distinct fleet positioning patterns compared to monopolistic settings. Notably, we demonstrate that learning-based approaches are robust to the additional stochasticity of competition, with competitive agents successfully converging to effective policies while accounting for partially unobserved competitor strategies.
💡 Research Summary
This paper tackles a critical gap in autonomous mobility‑on‑demand (AMoD) research: the impact of competition on reinforcement‑learning (RL) based control. While most prior work assumes a single, monopolistic operator that learns either vehicle rebalancing, dynamic pricing, or a joint policy, real‑world AMoD markets will feature several firms simultaneously adjusting fares and fleet deployments. To study this, the authors propose a competitive multi‑operator RL framework in which two independent agents learn, at each 3‑minute decision epoch, an origin‑based price scalar and a desired idle‑vehicle distribution.
The environment is modeled as a directed graph of N spatial zones. Each operator controls its own fleet (M₀ and M₁ vehicles) and observes only the competitor’s posted prices, not its fleet state or demand. Passenger demand is generated from real taxi‑trip data for three cities (San Francisco, Washington DC, and Manhattan South) and allocated through a multinomial logit (MNL) choice model. The utility for a passenger k choosing operator o on an OD pair (i, j) is
Uᵗₖᵢⱼₒ = β₀ − βₜ·vₖ·τᵢⱼ − v̄·vₖ·pᵗᵢⱼₒ,
where vₖ is the hourly wage, τᵢⱼ the travel time, pᵗᵢⱼₒ the fare (historical base price multiplied by the learned scalar ρᵗᵢₒ), and v̄ a fixed price‑sensitivity coefficient. This formulation captures both price sensitivity and income effects, allowing demand to shift endogenously as prices change. Passengers join a first‑come, first‑served (FCFS) queue at their origin and abandon the request if it is not served within a six‑minute waiting threshold.
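As a concrete sketch, the logit allocation can be written in a few lines. The function names and all coefficient values below are illustrative assumptions, not the paper's fitted parameters:

```python
import math
import random

def choice_probabilities(utilities):
    """Multinomial-logit probabilities over the alternatives in `utilities`."""
    exps = [math.exp(u) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]

def passenger_utility(beta0, beta_t, wage, travel_time, price, price_coef):
    """Systematic utility of one operator's offer, following the form
    U = beta0 - beta_t * v_k * tau_ij - v_bar * v_k * p_ij.
    All coefficients here are made up for illustration."""
    return beta0 - beta_t * wage * travel_time - price_coef * wage * price

# Example: one passenger chooses between two operators on the same OD pair.
wage = 25.0 / 3600.0          # hourly wage expressed per second (assumption)
tau = 600.0                   # a 10-minute trip
u0 = passenger_utility(1.0, 0.001, wage, tau, price=8.0, price_coef=0.02)
u1 = passenger_utility(1.0, 0.001, wage, tau, price=6.5, price_coef=0.02)
p0, p1 = choice_probabilities([u0, u1])
# The cheaper operator receives the larger choice probability; sample one choice:
chosen = 0 if random.random() < p0 else 1
```

In the full model the choice set would also include an outside (no-ride) option and the probabilities would be evaluated per passenger, per OD pair, at each epoch.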
The control problem is cast as a Markov Decision Process. The state includes the network adjacency matrix, each operator's idle vehicle counts per zone, vehicles en‑route over a planning horizon, the previous price vectors of both firms, and local queue lengths. The action consists of (i) a vector of price scalars ρᵗᵢₒ ∈ (0, 1] and (ii) a vector of target idle‑vehicle shares wᵗᵢₒ that sum to one. Prices are computed as pᵗᵢⱼₒ = β·ρᵗᵢₒ·pᵗᵢⱼ, where pᵗᵢⱼ is the historical base fare and the constant β ≤ 2 caps the markup at twice the base price. The desired share vector is fed into a minimum‑cost flow problem that yields concrete rebalancing flows yᵗᵢⱼₒ. The reward for operator o is the profit from served trips minus operational and rebalancing costs.
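The paper realizes the desired distribution with a minimum‑cost flow solver; as a rough stand‑in, a greedy transport between surplus and deficit zones conveys the idea. The function, zone counts, and cost matrix below are all invented for illustration:

```python
def rebalancing_flows(idle, target_shares, cost):
    """Turn a desired idle-vehicle distribution into zone-to-zone moves.

    `idle[i]` is the current idle count in zone i, `target_shares` sums to 1,
    and `cost[i][j]` is the per-vehicle move cost. The paper solves a true
    minimum-cost flow; this greedy transport (matching surplus to deficit
    zones in ascending cost order) is a simplified sketch of the mapping.
    """
    total = sum(idle)
    target = [round(s * total) for s in target_shares]
    target[0] += total - sum(target)  # fix rounding so targets sum to fleet size
    n = len(idle)
    surplus = {i: idle[i] - target[i] for i in range(n) if idle[i] > target[i]}
    deficit = {j: target[j] - idle[j] for j in range(n) if target[j] > idle[j]}
    flows = {}
    for c, i, j in sorted((cost[i][j], i, j) for i in surplus for j in deficit):
        if surplus.get(i, 0) and deficit.get(j, 0):
            moved = min(surplus[i], deficit[j])
            flows[(i, j)] = moved
            surplus[i] -= moved
            deficit[j] -= moved
    return flows

# Example: 8 idle vehicles, zone 1 is under-served relative to its target share.
idle = [6, 0, 2]
shares = [0.25, 0.5, 0.25]
cost = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
flows = rebalancing_flows(idle, shares, cost)
# Moves four vehicles from zone 0 to zone 1.
```

A production version would replace the greedy loop with an exact min‑cost flow (e.g. via a network-simplex solver) so that total rebalancing cost is provably minimized.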
Learning is performed with independent Advantage Actor‑Critic (A2C) agents. Both actor and critic use a graph convolutional network (GCN) to embed spatial information, followed by fully‑connected layers. The actor outputs parameters of a Beta distribution (for pricing) and a Dirichlet distribution (for rebalancing shares), from which stochastic actions are sampled. No parameters are shared, so each firm must infer the competitor's strategy solely from observed prices and the resulting system dynamics.
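Sampling the two action heads needs no deep‑learning machinery to illustrate: Python's standard library provides Beta draws directly, and a Dirichlet sample can be built from normalized Gamma draws. The concentration parameters below are placeholders standing in for the actor network's outputs:

```python
import random

def sample_price_scalars(alphas, betas):
    """One Beta-distributed price scalar per zone, each lying in (0, 1)."""
    return [random.betavariate(a, b) for a, b in zip(alphas, betas)]

def sample_rebalancing_shares(concentration):
    """A Dirichlet sample via normalized Gamma draws; the shares sum to one."""
    draws = [random.gammavariate(c, 1.0) for c in concentration]
    total = sum(draws)
    return [d / total for d in draws]

random.seed(0)  # for reproducibility of this sketch
# Placeholder per-zone parameters (in the paper, emitted by the GCN actor):
rho = sample_price_scalars([2.0, 5.0, 1.5], [2.0, 1.0, 3.0])
w = sample_rebalancing_shares([1.0, 4.0, 1.0])
```

Both distributions have supports matching the action constraints stated above: the Beta keeps each price scalar inside the unit interval, and the Dirichlet keeps the share vector on the probability simplex.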
Experiments span three cities with distinct demand variability (coefficient of variation CV = 1.31, 1.26, 0.69). Baselines include a “No‑Control” scenario (no pricing or rebalancing) and a uniform‑distribution heuristic (vehicles spread evenly, fixed prices). In the monopolistic setting, a joint pricing‑and‑rebalancing policy consistently outperforms pricing‑only, rebalancing‑only, and the baselines, especially in high‑variability San Francisco where it yields a 22 % gain over rebalancing alone and a 42 % gain over pricing alone. In low‑variability Manhattan South, pricing‑only already achieves near‑optimal performance, reducing the marginal benefit of joint control.
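The coefficient of variation used to characterize the three cities is simply the standard deviation of demand divided by its mean. A toy computation, with invented demand profiles rather than the paper's data:

```python
import statistics

def coefficient_of_variation(demand):
    """CV = std / mean; higher values indicate burstier demand."""
    return statistics.pstdev(demand) / statistics.fmean(demand)

# Illustrative hourly request counts (not the paper's data):
volatile = [5, 40, 10, 90, 15, 70]   # bursty, San Francisco-like profile
steady = [30, 35, 32, 34, 31, 33]    # flat, Manhattan South-like profile
assert coefficient_of_variation(volatile) > coefficient_of_variation(steady)
```

Intuitively, a high CV means demand swings sharply across time, which is when proactive fleet repositioning pays off most, consistent with the San Francisco results above.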
In the competitive dual‑operator setting, results diverge across cities. In San Francisco, the joint policy still attains the highest combined profit (≈ $10,415), reflecting aggressive price competition that drives fares down while both firms concentrate vehicles in high‑demand zones. In Washington DC, rebalancing‑only control dominates (≈ $15,620), indicating that spatial positioning is the primary lever when demand variability is moderate. In Manhattan South, pricing‑only control yields the greatest profit (≈ $18,632), suggesting that in dense, relatively homogeneous demand environments, price competition can serve as an implicit rebalancing mechanism. Notably, the joint policy underperforms in Manhattan South, highlighting the added difficulty of coordinating long‑horizon rebalancing when the opponent’s future actions are uncertain.
Despite the added stochasticity from simultaneous learning, both agents converge to stable policies across all experiments, demonstrating the robustness of the RL approach to competitive dynamics and partial observability (operators only see competitor prices). The authors also conduct sensitivity analyses on fleet size asymmetry and wage heterogeneity, confirming that the qualitative patterns persist.
Overall, the paper provides a rigorous, data‑driven framework for studying AMoD markets under competition. By integrating discrete‑choice demand modeling with multi‑agent RL, it offers actionable insights for operators and regulators: in markets with high demand volatility, fleet positioning is the key competitive lever; in dense, low‑variance markets, price competition can effectively balance supply and demand. The work opens avenues for extending to more than two operators, incorporating real‑time traffic conditions, and optimizing additional objectives such as passenger welfare or environmental impact.