Multi-Task Reinforcement Learning of Drone Aerobatics by Exploiting Geometric Symmetries
Flight control for autonomous micro aerial vehicles (MAVs) is evolving from steady flight near equilibrium points toward more aggressive aerobatic maneuvers, such as flips, rolls, and power loops. Although reinforcement learning (RL) has shown great potential in these tasks, conventional RL methods often suffer from low data efficiency and limited generalization. This challenge becomes more pronounced in multi-task scenarios where a single policy is required to master multiple maneuvers. In this paper, we propose a novel end-to-end multi-task reinforcement learning framework, called GEAR (Geometric Equivariant Aerobatics Reinforcement), which fully exploits the inherent SO(2) rotational symmetry in MAV dynamics and explicitly incorporates this property into the policy network architecture. By integrating an equivariant actor network, FiLM-based task modulation, and a multi-head critic, GEAR achieves both efficiency and flexibility in learning diverse aerobatic maneuvers, enabling a data-efficient, robust, and unified framework for aerobatic control. GEAR attains a 98.85% success rate across various aerobatic tasks, significantly outperforming baseline methods. In real-world experiments, GEAR demonstrates stable execution of multiple maneuvers and the capability to combine basic motion primitives to complete complex aerobatics.
💡 Research Summary
The paper introduces GEAR (Geometric Equivariant Aerobatics Reinforcement), a multi‑task reinforcement learning framework designed to enable a single policy to master a suite of aggressive micro‑aerial vehicle (MAV) maneuvers such as flips, rolls, rotates, and power loops. The core insight is that MAV dynamics are invariant under rotations about the gravity axis, i.e., an SO(2) symmetry in yaw. By embedding this symmetry directly into the neural network architecture, the authors dramatically improve sample efficiency and generalization across tasks.
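The yaw symmetry claimed above is easy to verify numerically: if every world-frame quantity (attitude, position, target, velocity) is rotated by the same yaw angle about the gravity axis, any state expressed in the body frame relative to the target is unchanged. The sketch below is illustrative (not the paper's code) and uses arbitrary example poses:

```python
# Illustrative check of SO(2) (yaw) invariance of a body-frame relative state.
# Poses and numbers are arbitrary examples, not from the paper.
import numpy as np

def rot_z(psi):
    c, s = np.cos(psi), np.sin(psi)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def rot_x(phi):
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

def body_state(R_wb, p, p_des, v):
    # Relative position and velocity expressed in the body frame.
    return np.concatenate([R_wb.T @ (p_des - p), R_wb.T @ v])

R_wb = rot_z(0.4) @ rot_x(0.9)                 # some example attitude
p, p_des = np.array([1.0, 2.0, 3.0]), np.array([0.0, 0.0, 5.0])
v = np.array([0.5, -1.0, 2.0])

s0 = body_state(R_wb, p, p_des, v)

# Apply the same global yaw rotation g to the whole scene:
g = rot_z(1.3)
s1 = body_state(g @ R_wb, g @ p, g @ p_des, g @ v)

assert np.allclose(s0, s1)  # body-frame state is invariant under global yaw
```

Algebraically this is just (gR)ᵀ(g·p_des − g·p) = Rᵀgᵀg(p_des − p) = Rᵀ(p_des − p), which is why a policy fed this representation cannot distinguish yaw-rotated copies of the same maneuver.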
The state is represented in body-frame relative coordinates (position, velocity, angular velocity, and rotation relative to the desired pose). This representation is provably invariant to global yaw rotations, preserving the SO(2) symmetry in the policy inputs. The policy network consists of three components: (1) an SO(2)-equivariant backbone (implemented via group convolutions or equivariant linear layers) that processes the invariant state; (2) a FiLM (Feature-wise Linear Modulation) module that receives a one-hot task identifier and scalar task parameters, producing task-specific scaling (γ) and bias (β) vectors that modulate the backbone features, allowing a single shared encoder to represent many maneuvers; (3) a multi-head critic, where each head estimates the value function for a particular task, preventing interference between heterogeneous reward signals.
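A minimal NumPy sketch of how these three components fit together is shown below. This is not the authors' implementation: a plain tanh MLP stands in for the SO(2)-equivariant backbone, and all layer sizes and task dimensions are made-up placeholders. The point is the data flow: shared features, FiLM modulation from the task encoding, one actor head, and one value head per task.

```python
# Hypothetical sketch of the GEAR-style architecture: shared backbone,
# FiLM task modulation, and a multi-head critic. A plain MLP replaces the
# paper's SO(2)-equivariant layers; all dimensions are illustrative.
import numpy as np

rng = np.random.default_rng(1)

def linear(in_dim, out_dim):
    return rng.normal(scale=0.1, size=(in_dim, out_dim)), np.zeros(out_dim)

state_dim, hidden, act_dim, n_tasks, param_dim = 12, 32, 4, 4, 2
W1, b1 = linear(state_dim, hidden)                   # shared backbone
Wf, bf = linear(n_tasks + param_dim, 2 * hidden)     # FiLM generator
Wa, ba = linear(hidden, act_dim)                     # actor head
heads = [linear(hidden, 1) for _ in range(n_tasks)]  # per-task value heads

def policy(state, task_onehot, task_params):
    h = np.tanh(state @ W1 + b1)
    # FiLM: task encoding -> per-feature scale (gamma) and shift (beta)
    gamma, beta = np.split(np.concatenate([task_onehot, task_params]) @ Wf + bf, 2)
    h = gamma * h + beta
    action = np.tanh(h @ Wa + ba)                     # normalized motor commands
    values = np.array([h @ W + b for W, b in heads]).ravel()
    return action, values

state = rng.normal(size=state_dim)
task_onehot = np.eye(n_tasks)[2]                      # e.g. the "roll" primitive
action, values = policy(state, task_onehot, np.array([1.0, 0.5]))
```

During training, only the critic head matching the sampled task would supply the value target, which is what isolates heterogeneous reward scales from one another.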
The reward function is unified across tasks but includes task‑specific shaping terms. Basic tracking terms penalize deviations in relative position, linear velocity, and angular velocity using a smooth kernel H(x;k)=1/(1+kx). An additional command‑adherence term forces the policy to achieve high‑level command attributes (e.g., desired angular velocity or number of rolls). Each maneuver contributes a specific geometric term: quaternion similarity for hover, lateral offset for rotate, axis alignment for roll, and a constant reward for flip.
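The smooth kernel H(x; k) = 1/(1 + kx) maps a non-negative error to (0, 1], with k controlling how quickly the reward falls off. A small illustrative sketch (the gains and the additive combination below are assumptions, not the paper's exact reward):

```python
# Sketch of the smooth tracking kernel H(x; k) = 1/(1 + k*x) described above.
# The gains k_p, k_v, k_w and the simple sum are illustrative placeholders.
import numpy as np

def H(x, k):
    """Map a non-negative error x to (0, 1]; larger k means faster falloff."""
    return 1.0 / (1.0 + k * x)

def tracking_reward(pos_err, vel_err, omega_err, k_p=1.0, k_v=0.5, k_w=0.1):
    # One smooth kernel per tracked quantity: relative position,
    # linear velocity, and angular velocity.
    return (H(np.linalg.norm(pos_err), k_p)
            + H(np.linalg.norm(vel_err), k_v)
            + H(np.linalg.norm(omega_err), k_w))

# Zero error yields the maximum reward of 3.0; reward decays smoothly
# (never to exactly zero) as any error grows.
assert tracking_reward(np.zeros(3), np.zeros(3), np.zeros(3)) == 3.0
```

Because H is bounded and strictly positive, each term shapes the policy without the gradient cliffs that quadratic penalties can introduce for large aerobatic errors.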
Experiments are conducted in two stages. In high-fidelity simulation, GEAR is trained on four basic primitives simultaneously. Compared with baseline multi-task methods that lack equivariance or use a single critic, GEAR achieves a 9.53% higher final return and a 98.85% success rate across all tasks. Ablation studies confirm that removing the equivariant backbone, the FiLM modulation, or the multi-head critic each leads to substantial performance drops.
Real‑world validation uses a 0.2 kg quadrotor. The same policy, without any fine‑tuning, successfully executes each primitive and can compose them to perform complex aerobatics such as power loops, barrel rolls, and multi‑flip sequences. Task parameters (flip speed, number of rolls, rotation velocity) are adjustable at runtime, demonstrating flexibility.
The contributions are threefold: (1) the first end‑to‑end multi‑task RL framework that explicitly encodes SO(2) geometric symmetry into the policy architecture; (2) a novel combination of equivariant representation, FiLM‑based task conditioning, and multi‑head value estimation that yields both data efficiency and task separation; (3) extensive simulation and physical experiments that prove the approach’s robustness and practicality for high‑speed MAV aerobatics.
Limitations include the focus on only yaw‑axis symmetry; environments with wind, obstacles, or full SE(3) asymmetries are not addressed. The FiLM modulation is linear, which may restrict representation of highly nonlinear task variations. Future work is suggested to extend symmetry handling to larger groups, incorporate adaptive symmetry relaxation, and bridge the sim‑to‑real gap with meta‑learning or domain randomization techniques.