Curriculum-Based Reinforcement Learning for Autonomous UAV Navigation in Unknown Curved Tubular Conduit
Autonomous drone navigation in confined tubular environments remains a major challenge due to the constraining geometry of the conduits, the proximity of the walls, and the perceptual limitations inherent to such scenarios. We propose a reinforcement learning approach enabling a drone to navigate unknown three-dimensional tubes without any prior knowledge of their geometry, relying solely on local observations from LiDAR and a conditional visual detection of the tube center. In contrast, the Pure Pursuit algorithm, used as a deterministic baseline, benefits from explicit access to the centerline, creating an information asymmetry designed to assess the ability of RL to compensate for the absence of a geometric model. The agent is trained through a progressive Curriculum Learning strategy that gradually exposes it to increasingly curved geometries, where the tube center frequently disappears from the visual field. A turning-negotiation mechanism, based on the combination of direct visibility, directional memory, and LiDAR symmetry cues, proves essential for ensuring stable navigation under such partial observability conditions. Experiments show that the PPO policy acquires robust and generalizable behavior, consistently outperforming the deterministic controller despite its limited access to geometric information. Validation in a high-fidelity 3D environment further confirms the transferability of the learned behavior to continuous physical dynamics. The proposed approach thus provides a complete framework for autonomous navigation in unknown tubular environments and opens perspectives for industrial, underground, or medical applications where progressing through narrow conduits with poor perceptual conditions represents a central challenge.
Research Summary
This paper presents a reinforcement learning (RL) framework for enabling an autonomous drone to navigate unknown three-dimensional curved tubular conduits without any prior geometric knowledge. The core challenge lies in operating within highly constrained, GPS-denied environments where the drone must avoid collisions while maintaining a centered trajectory, despite severe perceptual limitations, especially during sharp turns when the tube’s central reference point vanishes from view.
The proposed approach trains a drone agent using the Proximal Policy Optimization (PPO) algorithm. The agent’s perception is restricted to local observations: a forward-facing camera that conditionally detects the tube’s center point (when visible), and front/rear LiDARs providing distance-to-wall measurements. This setup creates a deliberate “information asymmetry” when compared to a deterministic Pure Pursuit baseline controller, which is granted privileged access to the tube’s exact centerline geometry. This comparison rigorously tests the RL agent’s ability to compensate for the lack of a global model.
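To make the information asymmetry concrete, the following is a minimal sketch of how a Pure Pursuit style baseline could turn a known centerline into a steering direction. It is an illustration only, not the paper's implementation: the function name `pure_pursuit_target`, the point-list representation of the centerline, and the `lookahead` parameter are all assumptions introduced here.

```python
import math


def pure_pursuit_target(position, centerline, lookahead):
    """Unit vector from `position` toward a look-ahead point on the centerline.

    Hypothetical helper: `centerline` is an ordered list of 3D points,
    which is exactly the privileged information the RL agent does not get.
    """
    # Index of the centerline sample closest to the drone.
    nearest = min(
        range(len(centerline)),
        key=lambda i: math.dist(position, centerline[i]),
    )
    # Walk forward from there to the first sample at least `lookahead` away;
    # fall back to the final sample near the end of the tube.
    target = centerline[-1]
    for point in centerline[nearest:]:
        if math.dist(position, point) >= lookahead:
            target = point
            break
    delta = [t - p for t, p in zip(target, position)]
    norm = math.hypot(*delta)
    if norm == 0.0:
        return (0.0, 0.0, 0.0)
    return tuple(d / norm for d in delta)


# Straight tube along x, drone slightly off-center in y.
centerline = [(0.5 * i, 0.0, 0.0) for i in range(10)]
direction = pure_pursuit_target((0.0, 0.1, 0.0), centerline, lookahead=1.0)
```

In this example the returned direction points mostly along the tube axis with a small corrective component back toward the center. The RL agent has no equivalent of `centerline` and must infer a comparable direction from its camera and LiDAR observations alone.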
A key innovation is the “turning-negotiation mechanism” designed to handle partial observability during turns. When the visual target is lost, the agent relies on a combination of short-term directional memory (retaining the last known target direction) and geometric cues derived from LiDAR symmetry and asymmetry features. The observation space is a 37-dimensional vector of engineered features that combines these elements with drone kinematics and orientation, rather than raw sensor data.
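The interplay of visibility, memory, and LiDAR asymmetry can be sketched as a simple fallback rule. This is a hedged, one-axis illustration under stated assumptions, not the paper's mechanism: in the paper these cues are inputs to the learned policy, whereas here they are combined by hand. The names `lidar_asymmetry` and `negotiate_turn`, the normalized bearing in [-1, 1], and the `decay` and `gain` values are all hypothetical.

```python
def lidar_asymmetry(left, right):
    """Normalized left/right clearance difference in [-1, 1].

    Positive values mean more free space on the left (assumed convention).
    """
    total = left + right
    return 0.0 if total <= 0.0 else (left - right) / total


def negotiate_turn(visible_bearing, memory, left, right, decay=0.9, gain=0.5):
    """Return (steering_hint, updated_memory) for one time step.

    visible_bearing: bearing to the tube center, or None when it is occluded.
    memory: last remembered bearing, faded a little each step it goes unseen.
    """
    if visible_bearing is not None:
        # Direct visibility wins and refreshes the directional memory.
        return visible_bearing, visible_bearing
    # Target lost: blend the fading memory with the LiDAR asymmetry cue.
    faded = decay * memory
    hint = faded + gain * lidar_asymmetry(left, right)
    hint = max(-1.0, min(1.0, hint))
    return hint, faded


# Center visible, then occluded in a left-hand bend (more clearance on the left).
hint, memory = negotiate_turn(0.3, 0.0, left=1.0, right=1.0)
hint, memory = negotiate_turn(None, memory, left=2.0, right=1.0)
```

After the target disappears, the hint keeps the previously observed turning direction and is reinforced by the extra clearance measured on the inside of the bend, which is the qualitative behavior the summary describes.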
To ensure stable and generalizable policy learning, a Curriculum Learning strategy is employed. Training begins with nearly straight tubes (Level 0) and progressively advances to moderately curved (Level 1) and finally highly curved geometries (Level 2) where the center point is frequently occluded. This gradual exposure prevents policy collapse and teaches the agent robust behaviors adaptable to unseen tube shapes.
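One common way to implement such a progression is to promote the agent to the next level once its recent success rate crosses a threshold. The sketch below illustrates that idea only. The summary does not state the paper's promotion criterion, so the `CurriculumScheduler` class, the window size, and the 0.8 threshold are assumptions chosen for illustration.

```python
from collections import deque


class CurriculumScheduler:
    """Tracks episode outcomes and advances through levels 0..max_level."""

    def __init__(self, max_level=2, window=100, threshold=0.8):
        self.level = 0
        self.max_level = max_level
        self.threshold = threshold
        # Sliding window of the most recent episode outcomes.
        self.results = deque(maxlen=window)

    def record(self, success):
        """Store one episode outcome and return the level for the next episode."""
        self.results.append(bool(success))
        window_full = len(self.results) == self.results.maxlen
        if window_full and self.level < self.max_level:
            rate = sum(self.results) / len(self.results)
            if rate >= self.threshold:
                # Promote, then restart the statistics for the harder level.
                self.level += 1
                self.results.clear()
        return self.level


# Small window for demonstration: five straight successes promote to Level 1.
scheduler = CurriculumScheduler(window=5, threshold=0.8)
for _ in range(5):
    level = scheduler.record(True)
```

Clearing the window on promotion means the agent must demonstrate competence on the new, harder geometry before advancing again, rather than carrying over successes earned on easier tubes.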
Experimental results demonstrate that the learned RL policy consistently outperforms the Pure Pursuit baseline in terms of success rate and trajectory smoothness, despite having access to far less information. Furthermore, validation in a high-fidelity 3D simulation environment confirms the transferability of the discrete-time learned policy to continuous physical dynamics. The work provides a complete framework for autonomous navigation in confined tubular spaces, opening perspectives for practical applications in industrial inspection, underground exploration, and medical endoscopic procedures.