We present ROSA -- Roundabout Optimized Speed Advisory -- a system that combines multi-agent trajectory prediction with coordinated speed guidance for multimodal, mixed traffic at roundabouts. Using a Transformer-based model, ROSA jointly predicts the future trajectories of vehicles and Vulnerable Road Users (VRUs) at roundabouts. Trained for single-step prediction and deployed autoregressively, it generates deterministic outputs, enabling actionable speed advisories. Incorporating motion dynamics, the model achieves high accuracy (ADE: 1.29m, FDE: 2.99m at a five-second prediction horizon), surpassing prior work. Adding route intention further improves performance (ADE: 1.10m, FDE: 2.36m), demonstrating the value of connected vehicle data. Based on predicted conflicts with VRUs and circulating vehicles, ROSA provides real-time, proactive speed advisories for approaching and entering the roundabout. Despite prediction uncertainty, ROSA significantly improves vehicle efficiency and safety, with positive effects even on perceived safety from a VRU perspective. The source code of this work is available under: github.com/urbanAIthi/ROSA.
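To make the abstract's terminology concrete, the following minimal sketch shows how a single-step predictor can be rolled out autoregressively and how ADE and FDE are computed for the resulting trajectory. It is not the ROSA model: the constant-velocity step function stands in for the Transformer, it handles a single agent rather than joint multi-agent prediction, and all function names are hypothetical.

```python
import math
from typing import Callable, List, Tuple

Point = Tuple[float, float]


def rollout(history: List[Point],
            step_fn: Callable[[List[Point]], Point],
            n_steps: int) -> List[Point]:
    """Apply a single-step predictor autoregressively for n_steps."""
    window = list(history)
    predictions = []
    for _ in range(n_steps):
        nxt = step_fn(window)
        predictions.append(nxt)
        # Feed the prediction back in and drop the oldest observation.
        window = window[1:] + [nxt]
    return predictions


def constant_velocity_step(window: List[Point]) -> Point:
    """Placeholder single-step model (assumption): extrapolate the last displacement."""
    (x0, y0), (x1, y1) = window[-2], window[-1]
    return (2 * x1 - x0, 2 * y1 - y0)


def ade_fde(pred: List[Point], truth: List[Point]) -> Tuple[float, float]:
    """ADE is the mean Euclidean error over all steps; FDE is the error at the last step."""
    errors = [math.dist(p, t) for p, t in zip(pred, truth)]
    return sum(errors) / len(errors), errors[-1]


# Usage: three predicted steps compared against a hypothetical ground truth.
preds = rollout([(0.0, 0.0), (1.0, 0.0)], constant_velocity_step, 3)
ade, fde = ade_fde(preds, [(2.0, 0.0), (3.0, 1.0), (4.0, 2.0)])
```

Because each prediction is fed back as input, errors can accumulate over the horizon, which is why FDE is typically larger than ADE.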
Automated Driving (AD) promises safer, more efficient, and sustainable mobility. However, roundabouts remain a challenge to the AD stack due to dense, dynamic traffic and interaction-based driving behavior [1], [2]. Complexity further intensifies in multimodal traffic with Vulnerable Road Users (VRUs, i.e., pedestrians and cyclists), whose behavior is highly variable and difficult to predict. Unlike human drivers, Automated Vehicles (AVs) cannot rely on eye contact or body language as a fallback to resolve uncertain situations, which poses a risk to the safe coexistence of AVs and VRUs at roundabouts [3], [4].
©2026 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. This article has been accepted for publication in the proceedings of the 2026 IEEE International Conference on Intelligent Transportation Systems (ITSC). This is the accepted manuscript version. The final version will be available in IEEE Xplore. DOI: To appear.
To improve VRU safety and predictability, infrastructure-based solutions such as prioritized zebra crossings or signalized crosswalks have been proposed [5], [6]. While effective in reducing risks, this static batching of road users causes wasteful stops and inefficient gap usage, reducing roundabout capacity, increasing delays, and raising emissions [5], [7]. Coordinating vehicle behavior in response to prioritized crossing VRUs can reduce inefficiencies while preserving safety. Proactive speed reduction has also been shown to build trust and improve perceived safety from a VRU perspective [8], [9]. However, such coordination depends on accurate prediction of VRU behavior. This paper proposes ROSA, a Roundabout Optimized Speed Advisory system. ROSA integrates interaction-aware trajectory prediction into coordinated speed guidance for multimodal traffic. It is designed to support both automated and human-driven vehicles, aiming to safely and efficiently interact with VRUs at roundabouts.
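The idea of proactive speed reduction in response to a predicted conflict can be illustrated with a small sketch. It is not ROSA's advisory logic, which this section does not specify. It assumes a single conflict point (for example, a crosswalk), a predicted time until which a VRU occupies it, and hypothetical parameters for the safety margin and the minimum advised speed.

```python
def advise_speed(distance_m: float,
                 speed_mps: float,
                 occupied_until_s: float,
                 margin_s: float = 1.0,
                 v_min_mps: float = 2.0) -> float:
    """Return an advised speed so the vehicle reaches the conflict point
    only after the predicted occupancy (plus a margin) has ended.

    All parameters and defaults are illustrative assumptions.
    """
    if speed_mps <= 0.0:
        return speed_mps  # stationary vehicle: nothing to advise

    arrival_s = distance_m / speed_mps
    clear_s = occupied_until_s + margin_s
    if arrival_s >= clear_s:
        return speed_mps  # no predicted conflict: keep the current speed

    # Slow down so that arrival coincides with the conflict point clearing.
    target_mps = distance_m / clear_s
    # Respect the minimum advised speed, but never advise accelerating.
    return min(speed_mps, max(v_min_mps, target_mps))


# Usage: 50 m from the crosswalk at 10 m/s, VRU predicted to clear after 9 s.
advised = advise_speed(50.0, 10.0, 9.0)
```

The sketch only ever lowers the speed, in line with the observation that proactive speed reduction supports perceived safety from a VRU perspective.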
To increase efficiency and traffic flow at roundabouts, several works propose a coordination among vehicles in a fully automated setting. By means of different techniques, such as forming clusters [23] or determining sequence and speed trajectories [24], [25], the vehicles are steered in a way that their driving behavior, i.e., speed or acceleration, is optimized. All studies successfully reduce travel and waiting times, fuel consumption, and emissions, depending on traffic conditions and demand. However, existing approaches neglect VRUs in both the coordination logic and evaluation, limiting applicability in multimodal traffic. Even accelerating toward the roundabout is encouraged if a cluster or a gap can be caught, which may compromise perceived safety from a VRU perspective. Previous works assume cooperative driving behavior in a fully automated setting. Based on their premises, a full AV penetration rate is required to achieve improvements [24], [26], which restricts applicability in mixed-traffic environments. Rather than predicting future traffic states, they analytically solve optimization problems based on the current situation, lacking an integrated prediction and coordination framework. Moreover, the works are not grounded in real-world data and do not quantify efficiency or safety gains in realistic traffic scenarios.
Proactive speed coordination in response to conflicting vehicles and VRUs requires accurate trajectory prediction to estimate their future positions. Several approaches exist, trained on real-world datasets and varying in methods and architectural design. Most adopt an ego-centric perspective [10], [12], [16], where an ego vehicle predicts future states of surrounding agents within its field of view. In contrast, so-called multi-agent approaches use a bird’s-eye view perspective to jointly predict motion of all agents, capturing interdependencies and improving accuracy [14], [15], [21], [22]. Common model architectures include Graph Neural Networks (GNNs), Recurrent Neural Networks (RNNs), such as Long Short-Term Memory (LSTM), and Transformers (TFs). RNNs are limited by their sequential processing, while GNNs rely on prior assumptions (e.g., graph construction or map inputs). In contrast, Transformers offer a more flexible, data-driven approach with minimal structural constraints [27]. Many works produce multi-trajectory predictions to handle uncertainty in motion and intent, capturing a probability distribution over possible future trajectories rather than a single deterministic path [11], [15], [16], [18], [19]. Prediction performance is typically assessed using Average Displacement Error (ADE) and Final D