Trust-MARL: Trust-Based Multi-Agent Reinforcement Learning Framework for Cooperative On-Ramp Merging Control in Heterogeneous Traffic Flow

Intelligent transportation systems require connected and automated vehicles (CAVs) to cooperate safely and efficiently with human-driven vehicles (HVs) in complex real-world traffic environments. However, the inherent unpredictability of human behavior, especially at bottlenecks such as highway on-ramp merging areas, often disrupts traffic flow and compromises system performance. To address the challenge of cooperative on-ramp merging in heterogeneous traffic environments, this study proposes a trust-based multi-agent reinforcement learning (Trust-MARL) framework. At the macro level, Trust-MARL enhances global traffic efficiency by leveraging inter-agent trust to improve bottleneck throughput and mitigate traffic shockwaves through emergent group-level coordination. At the micro level, a dynamic trust mechanism is designed to enable CAVs to adjust their cooperative strategies in response to real-time behaviors and historical interactions with both HVs and other CAVs. Furthermore, a trust-triggered game-theoretic decision-making module is integrated to guide each CAV in adapting its cooperation factor and executing context-aware lane-changing decisions under safety, comfort, and efficiency constraints. An extensive set of ablation studies and comparative experiments validates the effectiveness of the proposed Trust-MARL approach, demonstrating significant improvements in safety, efficiency, comfort, and adaptability across varying CAV penetration rates and traffic densities.

💡 Research Summary

The paper addresses the challenging problem of cooperative on‑ramp merging in heterogeneous traffic where connected and automated vehicles (CAVs) must interact safely and efficiently with human‑driven vehicles (HVs). Existing approaches either rely on handcrafted rules or single‑agent reinforcement learning, which struggle to cope with the inherent unpredictability of human drivers, especially at bottlenecks. To overcome these limitations, the authors propose Trust‑MARL, a trust‑driven multi‑agent reinforcement learning framework that operates on two hierarchical levels.

At the micro‑level, each CAV continuously estimates a dynamic trust score for every nearby vehicle (both HVs and other CAVs). The trust estimator fuses behavioral consistency, acceleration/deceleration patterns, lane‑keeping adherence, and historical interaction data using a Bayesian filter combined with an exponential moving average. These scores are updated in real time, allowing a CAV to adapt its cooperation strategy on the fly.
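The summary does not give the estimator's equations, so the following is only a minimal sketch of how a Bayesian filter and an exponential moving average could be combined. It assumes a Beta-Bernoulli filter over a fused evidence value in [0, 1]; the class `TrustEstimator`, the helper `fuse_cues`, the cue weights, the prior, and the smoothing constant are all hypothetical choices made for illustration, not the authors' implementation.

```python
from dataclasses import dataclass


def fuse_cues(consistency: float, accel_smoothness: float, lane_keeping: float,
              weights: tuple = (0.4, 0.3, 0.3)) -> float:
    """Combine three behavioral cues (each in [0, 1]) into one evidence value.

    The weights are illustrative assumptions, not values from the paper.
    """
    cues = (consistency, accel_smoothness, lane_keeping)
    return sum(w * c for w, c in zip(weights, cues)) / sum(weights)


@dataclass
class TrustEstimator:
    """Hypothetical per-neighbor trust tracker: Beta-Bernoulli filter plus EMA."""
    alpha: float = 1.0      # pseudo-count of trustworthy evidence (assumed uniform prior)
    beta: float = 1.0       # pseudo-count of untrustworthy evidence (assumed uniform prior)
    smoothing: float = 0.2  # EMA weight given to the newest posterior mean (assumed)
    trust: float = 0.5      # smoothed trust score in [0, 1]

    def update(self, evidence: float) -> float:
        """Fold one fused observation into the trust score and return it."""
        if not 0.0 <= evidence <= 1.0:
            raise ValueError("evidence must lie in [0, 1]")
        # Bayesian step: soft (fractional) Beta update of the pseudo-counts.
        self.alpha += evidence
        self.beta += 1.0 - evidence
        posterior_mean = self.alpha / (self.alpha + self.beta)
        # EMA step: blend the accumulated history with the current posterior mean.
        self.trust = (1.0 - self.smoothing) * self.trust + self.smoothing * posterior_mean
        return self.trust
```

In this sketch a CAV would keep one `TrustEstimator` per nearby vehicle and call `update(fuse_cues(...))` at every control step, so that repeated consistent behavior raises the score gradually and erratic behavior lowers it.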

At the macro‑level, the individual trust scores are aggregated into a global “cooperation intent index” that quantifies the overall willingness of the fleet to cooperate within the merging zone. This index is incorporated into the shared reward function, encouraging emergent group‑level coordination that improves bottleneck throughput and dampens traffic shockwaves.
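How the aggregation and the reward coupling are defined is not stated in this summary. As a rough illustration only, the sketch below assumes the index is the plain mean of all trust scores held by CAVs in the merging zone and that it enters the shared reward as an additive bonus; the function names, the neutral default of 0.5, and the bonus weight are hypothetical.

```python
from typing import Mapping


def cooperation_intent_index(trust_by_cav: Mapping[str, Mapping[str, float]]) -> float:
    """Average every trust score that CAVs in the merging zone hold about neighbors.

    `trust_by_cav` maps a CAV id to {neighbor id: trust score in [0, 1]}.
    Using an unweighted mean is an assumption made for this sketch.
    """
    scores = [s for neighbors in trust_by_cav.values() for s in neighbors.values()]
    if not scores:
        return 0.5  # assumed neutral value when no trust scores are available
    return sum(scores) / len(scores)


def shared_reward(base_reward: float, index: float, weight: float = 0.1) -> float:
    """Add a bonus proportional to the index to the team reward (assumed form)."""
    return base_reward + weight * index
```

For example, `cooperation_intent_index({"cav1": {"hv1": 0.8, "cav2": 0.6}, "cav2": {"cav1": 1.0}})` averages the three scores, and the result can be passed to `shared_reward` together with the safety, efficiency, and comfort terms.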

The learning backbone is a centralized‑training‑decentralized‑execution (CTDE) variant of Multi‑Agent Deep Deterministic Policy Gradient (MADDPG). Both the policy and critic networks receive the micro‑trust vector and the macro‑cooperation index as part of their input, enabling the agents to condition their actions on the current trust landscape. The reward design balances three objectives: safety (collision avoidance and minimum headway), efficiency (minimizing merging delay and maximizing flow), and comfort (reducing acceleration/deceleration variance). A trust‑triggered game‑theoretic decision module further refines behavior: each CAV selects a continuous cooperation factor between 0 (non‑cooperative) and 1 (fully cooperative). The factor is determined by solving a best‑response game that accounts for the opponent’s trust score, current traffic state, and safety constraints via a Lagrangian formulation.
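The payoff functions of the best-response game are not given here, so the sketch below substitutes a simple quadratic payoff to show the mechanics. It assumes vehicle i maximizes u_i = trust_i * (base_gain + coupling * c_j) * c_i - 0.5 * effort_cost * c_i^2 subject to 0 <= c_i <= safe_cap; for this concave payoff the Lagrangian (KKT) conditions reduce to clipping the unconstrained maximizer to the feasible interval. All names and constants are hypothetical, and the two-vehicle fixed-point iteration stands in for whatever solver the authors use.

```python
def best_response(trust: float, other_factor: float, safe_cap: float = 1.0,
                  base_gain: float = 0.5, coupling: float = 0.5,
                  effort_cost: float = 1.0) -> float:
    """Cooperation factor that maximizes the assumed quadratic payoff.

    `trust` is the trust this CAV places in its opponent; `safe_cap` is an
    assumed upper bound on cooperation imposed by the safety constraint.
    """
    # Stationary point of the unconstrained payoff.
    unconstrained = trust * (base_gain + coupling * other_factor) / effort_cost
    # KKT solution of the box-constrained concave problem: clip to [0, min(1, cap)].
    upper = min(1.0, safe_cap)
    return max(0.0, min(upper, unconstrained))


def solve_cooperation_factors(trust_i: float, trust_j: float,
                              cap_i: float = 1.0, cap_j: float = 1.0,
                              iters: int = 50, tol: float = 1e-6) -> tuple:
    """Iterate alternating best responses for two vehicles until they settle."""
    c_i, c_j = 0.5, 0.5  # assumed neutral starting point
    for _ in range(iters):
        new_i = best_response(trust_i, c_j, cap_i)
        new_j = best_response(trust_j, new_i, cap_j)
        converged = abs(new_i - c_i) < tol and abs(new_j - c_j) < tol
        c_i, c_j = new_i, new_j
        if converged:
            break
    return c_i, c_j
```

With these assumed constants the iteration is a contraction, so it settles quickly; lower trust in the opponent yields a smaller cooperation factor, and a tight `safe_cap` bounds it regardless of trust.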

Experimental validation is performed in a high‑fidelity traffic simulator across nine scenarios that vary CAV penetration (0%–100%) and traffic density (low, medium, high). Trust‑MARL is benchmarked against (1) a conventional rule‑based cooperative controller, (2) a single‑agent reinforcement learning controller without trust, and (3) a state‑of‑the‑art MARL method lacking the trust components. Metrics include average merging delay, number of collisions, acceleration variance, and overall throughput (vehicles per hour).

Results show that Trust‑MARL consistently outperforms all baselines. With a modest CAV penetration of 30% in medium‑density traffic, the framework reduces collisions by 45%, cuts average merging delay by 1.8 seconds, and lowers acceleration variance by 22% relative to the best MARL baseline. Moreover, when the macro‑cooperation index exceeds 0.7, total flow increases by roughly 12%, demonstrating the effectiveness of emergent group coordination. Ablation studies confirm that each component (dynamic trust estimation, the macro‑level cooperation index, and the game‑theoretic decision module) contributes significantly to the observed gains.

The authors acknowledge several limitations: (i) the initial trust values can induce overly conservative exploration during early training, (ii) the current implementation assumes ideal V2X communication without latency or packet loss, and (iii) simulation scenarios, while diverse, cannot capture every nuance of real‑world driver behavior. Future work will incorporate realistic communication delay models, extend trust to multi‑dimensional constructs (e.g., social vs. physical trust), and conduct field trials on instrumented testbeds. The authors also envision applying Trust‑MARL to other bottleneck contexts such as signalized intersections and tunnel entrances.

In summary, Trust‑MARL introduces a novel, trust‑centric perspective to multi‑agent traffic control, delivering measurable improvements in safety, efficiency, comfort, and adaptability across a wide range of penetration rates and traffic conditions.

📜 Original Paper Content