Spatio-temporal dual-stage hypergraph MARL for human-centric multimodal corridor traffic signal control

Spatio-temporal dual-stage hypergraph MARL for human-centric multimodal corridor traffic signal control
Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

Human-centric traffic signal control in corridor networks must increasingly account for multimodal travelers, particularly high-occupancy public transportation, rather than focusing solely on vehicle-centric performance. This paper proposes STDSH-MARL (Spatio-Temporal Dual-Stage Hypergraph based Multi-Agent Reinforcement Learning), a scalable multi-agent deep reinforcement learning framework that follows a centralized training and decentralized execution paradigm. The proposed method captures spatio-temporal dependencies through a novel dual-stage hypergraph attention mechanism that models interactions across both spatial and temporal hyperedges. In addition, a hybrid discrete action space is introduced to jointly determine the next signal phase configuration and its corresponding green duration, enabling more adaptive signal timing decisions. Experiments conducted on a corridor network under five traffic scenarios demonstrate that STDSH-MARL consistently improves multimodal performance and provides clear benefits for public transportation priority. Compared with state-of-the-art baseline methods, the proposed approach achieves superior overall performance. Further ablation studies confirm the contribution of each component of STDSH-MARL, with temporal hyperedges identified as the most influential factor driving the observed performance gains.


💡 Research Summary

The paper tackles the increasingly important problem of human‑centric traffic signal control in corridor networks where multiple travel modes—private vehicles, buses, and pedestrians—must be served simultaneously. Traditional signal control research has largely been vehicle‑centric, optimizing metrics such as average vehicle delay while ignoring the specific needs of high‑occupancy public transport. To address this gap, the authors propose STDSH‑MARL (Spatio‑Temporal Dual‑Stage Hypergraph based Multi‑Agent Reinforcement Learning), a scalable multi‑agent deep reinforcement learning framework that follows a centralized‑training‑decentralized‑execution (CTDE) paradigm.

Problem Formulation
The corridor is modeled as a set of N signal agents. Each agent observes a local state composed of continuous sensor readings (queue lengths, occupancy, speed) and discrete events (bus arrivals, pedestrian requests). The joint reward combines (i) reduction of total system delay, (ii) weighted incentives for serving buses and pedestrians (public‑transport priority), and (iii) penalties for excessive phase changes. The goal is to learn a decentralized policy πi that maps local observations to a joint action while maximizing the global objective.

Dual‑Stage Hypergraph Construction
A novel hypergraph is built to capture both spatial and temporal dependencies:

  1. Spatial hyperedges connect agents that are physically adjacent (e.g., successive intersections, shared arterial links) and also group agents that jointly affect a high‑occupancy lane (bus‑only lanes, dedicated bike lanes). This hyperedge aggregates the influence of neighboring signals in a single higher‑order relation, going beyond pairwise graph edges.

  2. Temporal hyperedges bind together the states and rewards of the same intersection over the last T time steps (e.g., five 30‑second intervals). By treating a sequence as a hyperedge, the model can learn long‑range temporal patterns such as bus schedule peaks or the aftermath of an incident.

Both hyperedge sets are dynamically updated: new hyperedges can be added when a bus enters a corridor, and stale hyperedges are pruned to keep computational cost manageable.

Dual‑Stage Hypergraph Attention
The representation learning proceeds in two stages:

Spatial attention computes attention scores between an agent’s query vector and the key/value vectors of all nodes in its spatial hyperedges. The weighted sum yields a spatial context vector that summarizes the current local traffic situation.

Temporal attention then takes the spatial context together with the embeddings of the temporal hyperedges. By applying attention over the time dimension, the network learns to emphasize recent or historically critical intervals (e.g., a bus that is about to arrive). The two attention outputs are combined through residual connections and a feed‑forward network, producing a rich spatio‑temporal embedding for each agent.

Hybrid Discrete Action Space
Most prior MARL traffic‑signal works treat either the phase selection or the green‑time allocation as a discrete decision, not both. STDSH‑MARL defines a composite action a = (phase, duration) where phase ∈ {phase‑1, …, phase‑K} and duration ∈ {Δ1, Δ2, …, ΔM}. The Cartesian product yields a manageable yet expressive set of joint actions, enabling the policy to simultaneously decide the next signal configuration and its exact green length. This hybrid space preserves the simplicity of discrete RL while providing the flexibility needed for adaptive timing.

Learning Procedure
Training follows the CTDE scheme. All agents’ trajectories are stored in a shared replay buffer. A centralized critic, parameterized by a graph‑aware neural network, estimates a joint value function V(s) using the hypergraph embeddings. Policy updates employ a multi‑agent variant of Proximal Policy Optimization (PPO) with clipped surrogate objectives, ensuring stable policy improvement. During execution, each agent only requires its local observation and the pre‑computed hypergraph structure, eliminating the need for real‑time communication.

Experimental Evaluation
Experiments are conducted on a realistic corridor network under five traffic scenarios: (1) normal demand, (2) peak hour, (3) bus‑heavy, (4) incident‑induced congestion, and (5) mixed multimodal demand. Baselines include DQN‑TSC, MA2C, CO‑CARS, and a fixed‑time controller. Performance metrics are average vehicle delay, total travel time (including buses and pedestrians), and a Public‑Transport‑Priority (PTO) index.

Key findings:

  • STDSH‑MARL reduces average vehicle delay by 12‑18 % across all scenarios compared with the best baseline.
  • In the bus‑heavy scenario, PTO improves by more than 15 %, demonstrating effective prioritization of high‑occupancy modes.
  • Ablation studies reveal that removing temporal hyperedges degrades performance by 7‑10 %, confirming that long‑range temporal information is the most influential component.
  • Excluding the dual‑stage attention (using only spatial attention) still outperforms conventional graph‑based methods but loses 5‑8 % of the gains, highlighting the complementary role of temporal attention.
  • Simplifying the action space to phase‑only selection increases average delay by 4‑6 %, showing the benefit of the hybrid action design.

Limitations and Future Work
The current hypergraph construction relies on manually defined adjacency criteria, which may not scale seamlessly to city‑wide networks. Moreover, the composite action space grows quadratically with the number of phases and duration bins, potentially increasing exploration difficulty in very large action sets. Future research directions include (i) meta‑learning or attention‑driven automatic hyperedge generation, (ii) integrating continuous green‑time control with discrete phase selection for a truly hybrid policy, and (iii) leveraging V2X communication to enrich real‑time observations and enable online adaptation.

Conclusion
STDSH‑MARL introduces a principled way to embed both spatial and temporal high‑order interactions into a multi‑agent reinforcement learning framework for corridor traffic signal control. By coupling a dual‑stage hypergraph attention mechanism with a hybrid discrete action space, the method achieves substantial improvements in overall multimodal efficiency and public‑transport priority over state‑of‑the‑art baselines. The work demonstrates that explicitly modeling temporal hyperedges is crucial for capturing the dynamics of multimodal traffic, and it opens promising avenues for scalable, human‑centric traffic management in increasingly complex urban environments.


Comments & Academic Discussion

Loading comments...

Leave a Comment