Residual Reinforcement Learning for Waste-Container Lifting Using Large-Scale Cranes with Underactuated Tools
This paper studies the container-lifting phase of a waste-container recycling task in urban environments, performed by a hydraulic loader crane equipped with an underactuated discharge unit, and proposes a residual reinforcement learning (RRL) approach that combines a nominal Cartesian controller with a learned residual policy. All experiments are conducted in simulation, where the task is characterized by tight geometric tolerances between the discharge-unit hooks and the container rings relative to the overall crane scale, making precise trajectory tracking and swing suppression essential. The nominal controller uses admittance control for trajectory tracking and pendulum-aware swing damping, followed by damped least-squares inverse kinematics with a nullspace posture term to generate joint-velocity commands. A residual policy, trained with PPO in NVIDIA Isaac Lab, compensates for unmodeled dynamics and parameter variations, improving precision and robustness without requiring end-to-end learning from scratch. We further employ randomized episode initialization and domain randomization over payload properties, actuator gains, and passive-joint parameters to enhance generalization. Simulation results demonstrate improved tracking accuracy, reduced oscillations, and higher lifting success rates compared to the nominal controller alone.
💡 Research Summary
The paper addresses the highly demanding task of lifting waste containers in urban recycling operations using a large‑scale hydraulic loader crane equipped with an underactuated discharge unit. Precise engagement of the container’s hooking rings requires sub‑centimeter accuracy despite the crane’s massive inertia, compliance, and the pendulum‑like dynamics of the discharge tool. To meet these challenges, the authors propose a Residual Reinforcement Learning (RRL) framework that augments a well‑engineered Cartesian controller with a learned residual policy, thereby combining the reliability of model‑based control with the adaptability of model‑free reinforcement learning.
Nominal Cartesian controller – The baseline controller operates in task space and consists of three components: (i) an admittance controller that translates the desired TCP trajectory into a virtual force using tunable mass, damping, and stiffness parameters; (ii) a pendulum‑aware anti‑swing term that derives a corrective horizontal acceleration from a linearized pendulum model (θ̈ + 2ζωₙθ̇ + ωₙ²θ = 0) and filters swing estimates; (iii) a damped least‑squares inverse kinematics (IK) solver augmented with a null‑space posture term to generate joint‑velocity commands. This controller alone can track the reference trajectory and provide basic swing damping, but residual errors remain due to unmodeled hydraulic dynamics, payload variations, and structural compliance.
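The three components above can be sketched compactly in NumPy. This is a minimal illustration of the general techniques, not the paper's implementation: the function names, gain conventions, and the sign/scaling of the anti-swing term are assumptions.

```python
import numpy as np

def admittance_accel(x_ref, x, xd_ref, xd, M, D, K):
    """Virtual mass-spring-damper: M * x_dd = K (x_ref - x) + D (xd_ref - xd)."""
    return np.linalg.solve(M, K @ (x_ref - x) + D @ (xd_ref - xd))

def antiswing_accel(theta, theta_dot, omega_n, zeta, length):
    """Corrective horizontal TCP acceleration from the linearized pendulum
    model θ̈ + 2ζωₙθ̇ + ωₙ²θ = 0 (this particular gain form is an assumption)."""
    return length * (2.0 * zeta * omega_n * theta_dot + omega_n**2 * theta)

def dls_ik(J, xd_cmd, q, q_rest, lam=0.05, k_null=0.1):
    """Damped least-squares IK with a null-space posture term."""
    m, n = J.shape
    # J# = J^T (J J^T + lam^2 I)^-1 -- damping keeps the solve well-conditioned
    # near singularities, at the cost of a small tracking bias.
    J_dls = J.T @ np.linalg.inv(J @ J.T + lam**2 * np.eye(m))
    qd_task = J_dls @ xd_cmd
    # Project a rest-posture objective into the null space of the task Jacobian
    N = np.eye(n) - J_dls @ J
    return qd_task + N @ (k_null * (q_rest - q))
```

The damping factor `lam` trades exact tracking for conditioning, which matters for a long-reach crane whose Jacobian degenerates near full extension.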
Residual policy – A Proximal Policy Optimization (PPO) network (an MLP with hidden layers of 128, 64, and 32 units) learns a corrective joint‑velocity vector (res_u). The policy receives a 78‑dimensional observation that concatenates joint positions and velocities, discharge‑unit states, reference TCP points, a tube‑distance metric δ_tube (measuring deviation from a tubular corridor around the reference path), the previous nominal action, and the previous residual action, stacked over the last three timesteps. By including this short history, the policy can infer the underlying dynamics and the behavior of the nominal controller. The residual is blended with the nominal command using an error‑dependent weight λ, and it is activated only during the critical horizontal‑alignment segment (segment B), where precision is most demanding.
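One plausible form of the error-dependent blending is a weight λ that grows with tracking error and saturates at 1. The linear schedule, the clipping limit, and all names below are illustrative assumptions; the paper's exact λ(·) is not reproduced here.

```python
import numpy as np

def blend_residual(u_nom, u_res, track_err, err_scale=0.05, res_limit=0.1):
    # lambda rises linearly with the tracking error and saturates at 1,
    # so the residual only gains authority when the nominal controller
    # is struggling (one plausible schedule, not the paper's exact form).
    lam = float(np.clip(track_err / err_scale, 0.0, 1.0))
    # Bound the residual so the learned policy cannot override safety-critical
    # behavior of the nominal controller.
    u_res = np.clip(u_res, -res_limit, res_limit)
    return u_nom + lam * u_res
```

In the paper's setup this blending would apply only during segment B; elsewhere the nominal command is passed through unchanged.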
Training methodology – All experiments are performed in NVIDIA Isaac Lab. Each episode begins with a randomized container pose within a bounded workspace and a randomized initial TCP position above the container. A spline‑generated reference trajectory is split into three phases: approach (A), horizontal alignment (B), and lift (C). Domain randomization is applied at each reset: payload mass and center of mass, actuator stiffness and damping, friction of the passive discharge‑unit joints, and scaling of the admittance gains are each sampled from a bounded range.
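A per-reset domain-randomization draw can be sketched as a dictionary of uniform ranges. The parameter names mirror the quantities listed above, but all numeric bounds are placeholders; the paper does not state its ranges here.

```python
import numpy as np

# Illustrative ranges only -- the paper's actual bounds are not given.
DR_RANGES = {
    "payload_mass":       (200.0, 800.0),  # kg
    "payload_com_offset": (-0.10, 0.10),   # m, shift of center of mass
    "actuator_stiffness": (0.8, 1.2),      # multiplicative scale
    "actuator_damping":   (0.8, 1.2),      # multiplicative scale
    "passive_friction":   (0.5, 1.5),      # passive discharge-unit joints
    "admittance_scale":   (0.9, 1.1),      # scaling of admittance gains
}

def sample_domain(rng):
    """Draw one set of physical parameters at episode reset."""
    return {k: rng.uniform(lo, hi) for k, (lo, hi) in DR_RANGES.items()}
```

Re-sampling at every reset forces the residual policy to identify the current dynamics from its three-step observation history rather than overfit to one parameter set.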