CRoSS: A Continual Robotic Simulation Suite for Scalable Reinforcement Learning with High Task Diversity and Realistic Physics Simulation


Continual reinforcement learning (CRL) requires agents to learn from a sequence of tasks without forgetting previously acquired policies. In this work, we introduce a novel benchmark suite for CRL based on realistically simulated robots in the Gazebo simulator. Our Continual Robotic Simulation Suite (CRoSS) benchmarks rely on two robotic platforms: a two-wheeled differential-drive robot with lidar, camera and bumper sensor, and a robotic arm with seven joints. The former represents the agent in line-following and object-pushing scenarios, where variation of visual and structural parameters yields a large number of distinct tasks, whereas the latter is used in two goal-reaching scenarios with either high-level Cartesian hand-position control (modeled after the Continual World benchmark) or low-level control based on joint angles. For the robotic arm benchmarks, we provide additional kinematics-only variants that bypass the need for physical simulation (as long as no sensor readings are required), and which can be run two orders of magnitude faster. CRoSS is designed to be easily extensible and enables controlled studies of continual reinforcement learning in robotic settings with high physical realism; in particular, it allows the use of almost arbitrary simulated sensors. To ensure reproducibility and ease of use, we provide a containerized setup (Apptainer) that runs out-of-the-box, and report the performance of standard RL algorithms, including Deep Q-Networks (DQN) and policy gradient methods. This highlights its suitability as a scalable and reproducible benchmark for CRL research.


💡 Research Summary

The paper introduces CRoSS (Continual Robotic Simulation Suite), a new benchmark designed for continual reinforcement learning (CRL) that combines high‑fidelity physics simulation with massive task diversity. CRoSS is built on the open‑source Gazebo simulator and ROS‑Transport, and it ships as an Apptainer container to guarantee reproducibility across Linux platforms. Two robotic platforms are provided: (1) a two‑wheeled differential‑drive robot equipped with lidar, RGB camera, bumper, and six controllable LEDs, and (2) a 7‑degree‑of‑freedom Franka Emika Panda‑style arm.

For the mobile robot, two families of environments are defined. The Multi‑Task Line Following (MLF) suite generates 150 distinct tracks by varying line colors, ground textures, and LED cues. Each task consists of 50 episodes (max 30 steps), during which the agent receives a 100×3 RGB line‑camera image augmented with a lidar‑derived distance row. The action space is 18‑dimensional (three motion primitives × six LED selections). Rewards encourage the robot to stay on the left side of a central line and to activate the correct LED in the appropriate half of the track. A simplified version (SS) compresses the visual input into three 15‑pixel mono‑images encoding line colors, distance, and pixel offset; an ultra‑simplified version (SSS) further reduces the action set to six LED commands, while a non‑adaptive controller determines the motion from the known pixel offset.
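The 18‑way discrete action space factors cleanly into a motion primitive and an LED selection. A minimal sketch of one plausible encoding follows; the primitive names and their ordering are assumptions for illustration, not taken from the paper:

```python
# Hypothetical flat encoding of MLF's 18 discrete actions as
# (motion primitive, LED index) pairs: 3 primitives x 6 LEDs = 18.
MOTIONS = ["left", "straight", "right"]  # assumed primitive names
NUM_LEDS = 6                             # six controllable LEDs

def decode_action(a: int) -> tuple[str, int]:
    """Map a flat action index in [0, 18) to a (motion, led) pair."""
    if not 0 <= a < len(MOTIONS) * NUM_LEDS:
        raise ValueError(f"action {a} out of range")
    motion, led = divmod(a, NUM_LEDS)
    return MOTIONS[motion], led
```

Any bijection between the flat index and the (motion, LED) pair would serve; `divmod` keeps the mapping explicit and invertible.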

The Multi‑Task Pushing Objects (MPO) suite also contains 150 variations, defined by combinations of five colors, six symbols, and five shapes. In each episode the robot starts 0.45 m from an object with a random orientation offset of ±18°, and must decide among four actions (turn left, turn right, go straight, stop) to either push a “pushable” object or avoid a “non‑pushable” one. Observations are 20×20 RGB front‑view images. Like MLF, MPO offers DS, SS, and SSS configurations that progressively simplify the visual input into population‑coded mono‑images.
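The 150 MPO variations follow from a full Cartesian product of the three visual parameters. A short sketch of such a sweep; the specific color, symbol, and shape names are illustrative assumptions, and only the counts (5 × 6 × 5) come from the summary above:

```python
import itertools

# Illustrative parameter sweep generating the MPO task grid.
COLORS  = ["red", "green", "blue", "yellow", "white"]                  # 5 (assumed)
SYMBOLS = ["circle", "cross", "star", "square", "triangle", "diamond"]  # 6 (assumed)
SHAPES  = ["cube", "cylinder", "sphere", "cone", "prism"]               # 5 (assumed)

# Every (color, symbol, shape) combination defines one distinct task.
TASKS = list(itertools.product(COLORS, SYMBOLS, SHAPES))
```

Generating tasks as a deterministic product like this also fixes a canonical task ordering, which is convenient when defining continual-learning sequences.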

The arm platform supports two control modalities. In the high‑level mode, the agent selects one of six Cartesian directions for the end‑effector, mirroring the Continual World benchmark. In the low‑level mode, the agent directly outputs joint angles for all seven joints, increasing the dimensionality of the action space and requiring precise torque control. Additionally, a kinematics‑only variant removes the physics engine, allowing the same tasks to run up to two orders of magnitude faster when sensor feedback is not required.
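In the high‑level mode, each of the six discrete actions plausibly nudges the end‑effector along one Cartesian axis. A hedged sketch of such a decoding; the axis ordering and the step size are assumptions, not values from the paper:

```python
# Hypothetical decoding of the six high-level Cartesian actions into
# end-effector displacement vectors.
STEP = 0.01  # metres per action (assumed step size)

DIRECTIONS = {
    0: ( STEP, 0.0, 0.0),  1: (-STEP, 0.0, 0.0),  # +/- x
    2: (0.0,  STEP, 0.0),  3: (0.0, -STEP, 0.0),  # +/- y
    4: (0.0, 0.0,  STEP),  5: (0.0, 0.0, -STEP),  # +/- z
}

def apply_action(pos: tuple[float, float, float], a: int) -> tuple[float, float, float]:
    """Return the target end-effector position after one discrete action."""
    dx, dy, dz = DIRECTIONS[a]
    return (pos[0] + dx, pos[1] + dy, pos[2] + dz)
```

In the low‑level mode the action would instead be a 7‑vector of joint angles, which is why the dimensionality of the control problem grows.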

All environments expose a Gymnasium‑compatible API, enabling immediate use with existing RL libraries. The authors provide baseline results for Deep Q‑Network (DQN) on the mobile‑robot tasks and REINFORCE on the arm tasks. They also define three evaluation metrics: Average Forgetting (to quantify catastrophic forgetting across task switches), Forward Transfer (to measure positive knowledge transfer to new tasks), and Final Performance (overall success after the full task sequence). Experiments reveal that standard algorithms suffer substantial forgetting in the default (high‑complexity) settings, while performance stabilizes in the simplified configurations, highlighting the importance of task difficulty in CRL research.
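The three metrics can all be computed from a performance matrix R, where R[i][j] is the score on task j after training through task i. The sketch below uses the common continual‑learning formulations of these quantities; the paper's exact formulas may differ in detail (e.g. in the choice of baseline for forward transfer):

```python
import numpy as np

def final_performance(R) -> float:
    """Mean score over all tasks after training on the full sequence."""
    return float(np.mean(R[-1]))

def average_forgetting(R) -> float:
    """For each task j (except the last), the drop from its best score
    seen during training to its score at the end; averaged over tasks."""
    R = np.asarray(R, dtype=float)
    T = R.shape[0]
    drops = [R[:T - 1, j].max() - R[-1, j] for j in range(T - 1)]
    return float(np.mean(drops))

def forward_transfer(R, baseline) -> float:
    """Mean zero-shot gain on each task just before it is trained,
    relative to a reference (e.g. random-policy) baseline score."""
    R = np.asarray(R, dtype=float)
    gains = [R[j - 1, j] - baseline[j] for j in range(1, R.shape[0])]
    return float(np.mean(gains))
```

With this convention, a large `average_forgetting` signals catastrophic forgetting across task switches, while a positive `forward_transfer` indicates that earlier tasks helped on later ones.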

Key contributions of the work are: (1) realistic, physics‑based robotic environments that can be transferred to real hardware via the ROS bridge; (2) systematic generation of hundreds of distinct tasks through visual and structural parameter sweeps, providing a scalable testbed for continual learning; (3) support for multiple control modalities (high‑level Cartesian, low‑level joint, kinematics‑only) to probe algorithmic robustness across action spaces; (4) a fully containerized, reproducible setup that eliminates the notorious installation hurdles of Gazebo‑based benchmarks; (5) publicly released baseline implementations and evaluation scripts, establishing a common ground for future CRL research. By addressing the limitations of existing robotic CRL suites—namely limited task numbers, lack of sensor diversity, and poor reproducibility—CRoSS positions itself as a comprehensive platform for advancing continual reinforcement learning in embodied, physically realistic settings.

