TrajMoE: Scene-Adaptive Trajectory Planning with Mixture of Experts and Reinforcement Learning

Reading time: 5 minutes

📝 Original Info

  • Title: TrajMoE: Scene-Adaptive Trajectory Planning with Mixture of Experts and Reinforcement Learning
  • ArXiv ID: 2512.07135
  • Date: 2025-12-08
  • Authors: Zebin Xing, Pengxuan Yang, Linbo Wang, Yichen Zhang, Yiming Hu, Yupeng Zheng, Junli Wang, Yinfeng Gao, Guang Li, Kun Ma, Long Chen, Zhongpu Xia, Qichao Zhang, Hangjun Ye, Dongbin Zhao

📝 Abstract

Current autonomous driving systems often favor end-to-end frameworks, which take sensor inputs like images and learn to map them into trajectory space via neural networks. Previous work has demonstrated that models can achieve better planning performance when provided with a prior distribution of possible trajectories. However, these approaches often overlook two critical aspects: 1) The appropriate trajectory prior can vary significantly across different driving scenarios. 2) Their trajectory evaluation mechanism lacks policy-driven refinement, remaining constrained by the limitations of one-stage supervised training. To address these issues, we explore improvements in two key areas. For problem 1, we employ MoE to apply different trajectory priors tailored to different scenarios. For problem 2, we utilize Reinforcement Learning to fine-tune the trajectory scoring mechanism. Additionally, we integrate models with different perception backbones to enhance perceptual features. Our integrated model achieved a score of 51.08 on the navsim ICCV benchmark, securing third place.

💡 Deep Analysis

[Figure 1 — see main_fig.png in the Image Gallery below]

📄 Full Content

TrajMoE: Scene-Adaptive Trajectory Planning with Mixture of Experts and Reinforcement Learning

Zebin Xing¹, Pengxuan Yang¹, Linbo Wang¹, Yichen Zhang¹, Yiming Hu¹, Yupeng Zheng†¹, Junli Wang¹,², Yinfeng Gao¹,², Guang Li², Kun Ma², Long Chen², Zhongpu Xia¹, Qichao Zhang¹, Hangjun Ye², Dongbin Zhao¹
¹Institute of Automation, Chinese Academy of Sciences ²Xiaomi †Project Leader

Abstract— Current autonomous driving systems often favor end-to-end frameworks, which take sensor inputs like images and learn to map them into trajectory space via neural networks. Previous work has demonstrated that models can achieve better planning performance when provided with a prior distribution of possible trajectories. However, these approaches often overlook two critical aspects: 1) The appropriate trajectory prior can vary significantly across different driving scenarios. 2) Their trajectory evaluation mechanism lacks policy-driven refinement, remaining constrained by the limitations of one-stage supervised training. To address these issues, we explore improvements in two key areas. For problem 1, we employ MoE to apply different trajectory priors tailored to different scenarios. For problem 2, we utilize Reinforcement Learning to fine-tune the trajectory scoring mechanism. Additionally, we integrate models with different perception backbones to enhance perceptual features. Our integrated model achieved a score of 51.08 on the navsim ICCV benchmark, securing third place.

Index Terms— End-to-end Autonomous Driving, Trajectory Planning, Model Ensembling.

I. INTRODUCTION

Frameworks like UniAD [2] utilize transformers with distinct functional modules to process and pass information, enabling mapping from sensor data to trajectory action space. Others, such as VAD [3], adopt sparser perceptual representations to reduce computational demands. In recent end-to-end research, several works incorporate action anchors or priors. For instance, VADv2 [4] and GTRS [9] first cluster a large set of trajectories from the training dataset and then learn to select the optimal trajectory for the current scene from this cluster. DiffusionDrive [5] adds and removes noise based on a trajectory cluster, while GoalFlow [6] clusters trajectory endpoints to serve as prior information for goal points.

A key finding from recent methods is that incorporating a prior distribution over trajectories in some form helps the model generate safer and more reliable planned trajectories. However, these methods typically rely on a fixed trajectory vocabulary or prior, despite the fact that action distributions can differ greatly across scenarios (e.g., going straight vs. turning). To address this limitation, we introduce Sparse MoE [10] to differentiate between trajectory distributions. Specifically, trajectories from the vocabulary are routed to different experts based on the router's decision. Furthermore, we extend the standard supervised learning framework by performing additional fine-tuning using GRPO [8], following initial supervised learning on various driving-score metrics. This aims to enhance the model's accuracy in predicting driving scores. Finally, building upon the GTRS baseline, we experiment with replacing different perception backbones and ensemble the outputs of our models. This approach led to a score of 51.08 in the navsimv2 [1] ICCV competition, earning a third-place ranking.
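To make the routing idea concrete, below is a minimal PyTorch sketch of a top-k sparse MoE layer applied to trajectory-vocabulary embeddings. All names, sizes, and the dense gating formulation are illustrative assumptions for readability, not the paper's actual implementation.

```python
import torch
import torch.nn as nn


class TrajectorySparseMoE(nn.Module):
    """Top-k sparse MoE over per-trajectory embeddings (illustrative sketch)."""

    def __init__(self, dim: int, num_experts: int = 4, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Router produces a gating logit per expert for each trajectory token.
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )

    def forward(self, traj_emb: torch.Tensor) -> torch.Tensor:
        # traj_emb: (B, V, D) -- one embedding per anchor trajectory in the vocabulary.
        logits = self.router(traj_emb)                       # (B, V, E)
        weights, idx = logits.topk(self.top_k, dim=-1)       # route each token to its top-k experts
        weights = weights.softmax(dim=-1)                    # normalized gates over the selected experts
        out = torch.zeros_like(traj_emb)
        for e, expert in enumerate(self.experts):
            mask = (idx == e)                                # (B, V, top_k): slots assigned to expert e
            if mask.any():
                gate = (weights * mask).sum(-1, keepdim=True)  # gate weight, zero where unrouted
                out = out + gate * expert(traj_emb)
        return out
```

In this dense formulation every expert processes all tokens and unrouted tokens are zero-gated, which keeps the sketch short; a production Sparse MoE would dispatch only the routed tokens to each expert and typically add a load-balancing auxiliary loss, as is standard in Sparse MoE training [10].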
In summary, our main contributions are as follows:

• We explore integrating Sparse MoE with a standard trajectory vocabulary, allowing trajectories to be processed by different experts to account for varying scenario-specific distributions.
• We investigate augmenting supervised learning on driving scores with subsequent GRPO fine-tuning to improve the model's understanding and prediction of these scores (see the advantage sketch after the methodology below).
• We ensemble models with different perception backbones, achieving strong performance on the navsim ICCV benchmark.

II. METHODOLOGY

A. Base Model

Based on the established anchor-trajectory paradigm, our architectural foundation is built upon the state-of-the-art GTRS framework [9]. The input to GTRS comprises a concatenated tensor $C \in \mathbb{R}^{H \times W \times 3}$, formed by spatially stitching image data from the front, front-left, and front-right cameras. A powerful perception backbone, specifically V2-99 [7], is employed to extract a dense and informative scene feature representation, denoted as $F_{\text{cam}} \in \mathbb{R}^{T_{\text{cam}} \times D_{\text{cam}}}$, from this composite input. Subsequently, a predefined set of trajectory anchors, termed the trajectory vocabulary $\mathcal{V}$, is embedded into a latent space $F_{\text{traj}}$. The camera feature $F_{\text{cam}}$ and the trajectory vocabulary feature $F_{\text{traj}}$ interact through a transformer architecture to produce an enhanced representation $F_{\text{fusion}}$. To facilitate discriminative scoring within the vocabulary, the GTRS model is optimized under a multi-task learning objective, supervising the predicted trajectory scores from several critical aspects.
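The base-model pipeline described above (stitched camera tensor → backbone features $F_{\text{cam}}$, embedded trajectory vocabulary → $F_{\text{traj}}$, transformer fusion → $F_{\text{fusion}}$, multi-aspect score heads) can be sketched as follows. This is a hedged reconstruction from the text: module choices such as nn.TransformerDecoder, the number of score heads, and all dimensions are assumptions, not the released GTRS/TrajMoE code.

```python
import torch
import torch.nn as nn


class AnchorTrajectoryScorer(nn.Module):
    """Minimal sketch of a GTRS-style anchor-trajectory scoring pipeline."""

    def __init__(self, vocab: torch.Tensor, dim: int = 256, num_score_heads: int = 3):
        super().__init__()
        # vocab: (V, T, 2) -- V anchor trajectories, T waypoints of (x, y).
        self.register_buffer("vocab", vocab)
        # Flatten each anchor's waypoints and embed it into the latent space F_traj.
        self.traj_embed = nn.Linear(vocab.shape[1] * 2, dim)
        decoder_layer = nn.TransformerDecoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.fusion = nn.TransformerDecoder(decoder_layer, num_layers=3)
        # One linear head per supervised driving-score aspect (aspect count assumed).
        self.score_heads = nn.ModuleList(nn.Linear(dim, 1) for _ in range(num_score_heads))

    def forward(self, cam_feat: torch.Tensor) -> torch.Tensor:
        # cam_feat: (B, T_cam, dim) -- backbone scene features, assumed projected to `dim`.
        B = cam_feat.shape[0]
        f_traj = self.traj_embed(self.vocab.flatten(1)).expand(B, -1, -1)  # (B, V, dim)
        f_fusion = self.fusion(tgt=f_traj, memory=cam_feat)  # trajectories attend to the scene
        scores = torch.cat([h(f_fusion) for h in self.score_heads], dim=-1)  # (B, V, heads)
        return scores  # select the planned trajectory via an aggregate over the per-aspect scores
```

For the GRPO fine-tuning stage mentioned in the contributions, the central quantity is a group-relative advantage computed over a group of sampled trajectories scored by some reward (e.g., simulator-derived driving metrics). A minimal sketch of that normalization, assuming `rewards` is a 1-D tensor for one sampled group:

```python
def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    # Group-relative baseline: standardize rewards within the sampled group.
    return (rewards - rewards.mean()) / (rewards.std() + eps)
```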

📸 Image Gallery

main_fig.png


This content is AI-processed from open-access arXiv data.
