Uncertainty-Aware Non-Prehensile Manipulation with Mobile Manipulators under Object-Induced Occlusion
Non-prehensile manipulation using onboard sensing presents a fundamental challenge: the manipulated object blocks the sensor’s field of view, creating occluded regions that can lead to collisions. We propose CURA-PPO, a reinforcement learning framework that addresses this challenge by explicitly modeling uncertainty under partial observability. By predicting collision possibility as a distribution, we extract both risk and uncertainty to guide the robot’s actions. The uncertainty term encourages active perception, enabling simultaneous manipulation and information gathering to resolve occlusions. When combined with confidence maps that capture observation reliability, our approach enables safe navigation despite severe sensor occlusion. Extensive experiments across varying object sizes and obstacle configurations demonstrate that CURA-PPO achieves up to 3× higher success rates than the baselines, with learned behaviors that actively resolve occlusions. Our method provides a practical solution for autonomous manipulation in cluttered environments using only onboard sensing.
💡 Research Summary
This paper tackles a practical yet under‑explored problem in mobile manipulation: when a robot pushes an object using only onboard sensing, the object itself blocks the sensor’s field of view, creating occluded regions that can hide unforeseen obstacles. The authors formulate the task as a partially observable Markov decision process (POMDP) and introduce CURA‑PPO (Collision Uncertainty‑Risk Aware Proximal Policy Optimization), a reinforcement‑learning framework that explicitly models both collision risk and perceptual uncertainty.
The core technical contribution is the Distributional Collision Estimator (DCE). For a given observation oₜ, DCE predicts a set of quantiles (N = 50) of the discounted cumulative collision indicator Cπ, effectively representing a probability distribution over future collisions. From this distribution the algorithm extracts two scalar signals: the mean (risk R) and the variance (uncertainty U). Risk reflects how likely a collision is, while uncertainty captures how unreliable the prediction is due to partial observability. Both signals are fed back as intrinsic costs C_R and C_U in the PPO surrogate objective, encouraging the policy to avoid high‑risk actions and to seek actions that reduce uncertainty (e.g., moving to gain a better view).
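The reduction from a quantile set to the two intrinsic cost signals can be sketched as follows. This is a minimal illustration of the idea described above, not the paper's implementation; the function name and toy inputs are ours.

```python
import numpy as np

def risk_and_uncertainty(quantiles: np.ndarray):
    """Collapse predicted collision quantiles into the two scalar signals
    used as intrinsic costs (an illustrative sketch).

    quantiles: shape (N,), the N = 50 quantile estimates of the discounted
    cumulative collision indicator C^pi for observation o_t.
    """
    risk = quantiles.mean()          # R: expected future collision
    uncertainty = quantiles.var()    # U: spread of the predicted distribution
    return risk, uncertainty

# Toy example: a tight distribution means a confident prediction;
# a wide one signals an occluded, hard-to-predict view.
q_confident = np.full(50, 0.1)
q_occluded = np.linspace(0.0, 0.9, 50)
r1, u1 = risk_and_uncertainty(q_confident)
r2, u2 = risk_and_uncertainty(q_occluded)
```

The policy is then penalized in proportion to both signals, so it avoids likely collisions (high R) and prefers actions that shrink the distribution (high U), such as moving to reveal the occluded region.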
To provide a reliable perception backbone, the method builds a confidence map from 2D LiDAR scans. Each cell’s confidence decays exponentially (α = 0.9) over time, ensuring that stale measurements are down‑weighted. A local 100 × 100 pixel window around the robot is encoded by a pretrained variational auto‑encoder (VAE) into a latent vector zₜ, which is supplied to the policy alongside proprioceptive data, object pose/velocity, and the goal pose. The policy network outputs desired base and end‑effector velocities; these are transformed into joint targets via differential inverse kinematics and executed with joint‑impedance control.
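The per-cell decay rule for the confidence map can be sketched in a few lines. This is a simplified illustration of the update described above (function name and grid shapes are ours), assuming cells covered by the current scan are reset to full confidence.

```python
import numpy as np

def update_confidence_map(conf: np.ndarray, observed: np.ndarray,
                          alpha: float = 0.9) -> np.ndarray:
    """One time step of the confidence map (an illustrative sketch).

    conf:     H x W map of per-cell confidences in [0, 1].
    observed: boolean H x W mask of cells covered by the current 2D LiDAR scan.

    Unseen cells decay exponentially with factor alpha, so stale
    measurements are progressively down-weighted; freshly observed
    cells are restored to full confidence.
    """
    conf = conf * alpha      # exponential decay everywhere
    conf[observed] = 1.0     # current measurements are fully trusted
    return conf
```

After k consecutive unobserved steps a cell's confidence is alpha**k, so with alpha = 0.9 an occluded cell falls below half confidence in about seven steps; the 100 × 100 local window of this map is what the VAE compresses into zₜ.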
Training is performed in Isaac Sim with 2048 parallel environments on an RTX 4090 GPU. The PPO loss is augmented with the risk and uncertainty penalties, and DCE is trained simultaneously using an energy‑distance loss that aligns predicted quantile distributions with Bellman‑updated targets. Experiments vary object size, obstacle number, placement, and the timing of dynamic obstacles. Compared against three baselines—standard PPO, risk‑only PPO, and a non‑distributional collision predictor—CURA‑PPO achieves up to three times higher success rates (≈78 % vs. ≈26 % for vanilla PPO) and reduces average collisions by more than 40 %. Qualitatively, the learned policy exhibits active‑perception behaviors: it occasionally backs up or rotates the object to uncover hidden regions before proceeding, a pattern absent in the baselines.
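The energy-distance loss used to train DCE can be sketched from its standard sample-based form, E(X, Y) = 2·E|X − Y| − E|X − X′| − E|Y − Y′|, applied to the predicted and Bellman-updated quantile sets. This is the textbook estimator, shown here as an assumption about the loss shape; the paper's exact estimator and weighting may differ.

```python
import numpy as np

def energy_distance(pred: np.ndarray, target: np.ndarray) -> float:
    """Sample-based energy distance between two quantile sets
    (an illustrative sketch of the DCE training loss).

    pred:   shape (N,), predicted quantiles of C^pi.
    target: shape (N,), Bellman-updated target quantiles.
    """
    cross = np.abs(pred[:, None] - target[None, :]).mean()
    within_pred = np.abs(pred[:, None] - pred[None, :]).mean()
    within_target = np.abs(target[:, None] - target[None, :]).mean()
    return 2.0 * cross - within_pred - within_target
```

The loss is zero exactly when the two quantile sets describe the same distribution and grows as they diverge, which is what drives the predicted distribution toward the Bellman targets during joint training with PPO.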
Limitations include reliance on a 2‑D LiDAR (no height information), potential sim‑to‑real gaps, and the computational overhead of predicting many quantiles. Future work is suggested to integrate 3‑D depth sensors, apply domain randomization for real‑world transfer, and extend the framework to multi‑robot scenarios where collective uncertainty reduction could be coordinated.
In summary, CURA‑PPO demonstrates that embedding distributional collision predictions and explicit uncertainty costs into RL yields safe, information‑seeking non‑prehensile manipulation even when the manipulated object severely occludes the robot’s own sensors. This represents a significant step toward robust, autonomous manipulation in cluttered, partially observable environments using only onboard sensing.