Why Look at It at All?: Vision-Free Multifingered Blind Grasping Using Uniaxial Fingertip Force Sensing
Grasping under limited sensing remains a fundamental challenge for real-world robotic manipulation, as vision and high-resolution tactile sensors often introduce cost, fragility, and integration complexity. This work demonstrates that reliable multifingered grasping can be achieved under extremely minimal sensing by relying solely on uniaxial fingertip force feedback and joint proprioception, without vision or multi-axis tactile sensing. To enable such blind grasping, we employ an efficient teacher-student training pipeline in which a reinforcement-learned teacher exploits privileged simulation-only observations to generate demonstrations for distilling a transformer-based student policy operating under partial observation. The student policy is trained to act using only sensing modalities available at real-world deployment. We validate the proposed approach on real hardware across 18 objects, including both in-distribution and out-of-distribution cases, achieving a 98.3% overall grasp success rate. These results demonstrate strong robustness and generalization beyond the simulation training distribution, while significantly reducing sensing requirements for real-world grasping systems.
💡 Research Summary
The paper tackles the long‑standing problem of robotic grasping under severely limited sensing. Instead of relying on expensive RGB‑D cameras, high‑resolution tactile skins, or multi‑axis force/torque sensors, the authors demonstrate that a three‑finger, nine‑degree‑of‑freedom gripper equipped only with uniaxial fingertip force sensors (simple FSR‑type devices) and joint proprioception can achieve robust, dexterous grasping. The key technical contribution is a two‑stage teacher‑student learning pipeline that bridges the gap between privileged simulation observations and the partial observations available on the real robot.
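The minimal sensor suite described above can be made concrete with a small sketch. The 21-scalar student input (nine joint positions, nine joint velocities, three uniaxial force readings) comes from the summary itself; the rolling history window for the transformer is an assumed detail, and the window length here is illustrative:

```python
from collections import deque
import numpy as np

HISTORY_LEN = 16  # assumed context window for the transformer student


def student_observation(joint_pos, joint_vel, fingertip_forces):
    """Assemble the 21-D deployment-time observation: 9 joint positions,
    9 joint velocities, and 3 uniaxial fingertip force readings."""
    obs = np.concatenate([joint_pos, joint_vel, fingertip_forces])
    assert obs.shape == (21,), "expected 9 + 9 + 3 scalars"
    return obs


# The transformer student consumes a rolling history of these observations,
# letting it infer contact state from temporal patterns alone.
history = deque(maxlen=HISTORY_LEN)
history.append(student_observation(np.zeros(9), np.zeros(9), np.zeros(3)))
```

Note that nothing in this vector identifies the object: all shape and pose information must be inferred from how the proprioceptive and force streams evolve over time.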
In the first stage, a “teacher” policy πₜ is trained in IsaacLab using Proximal Policy Optimization (PPO) with a rich privileged observation vector (≈95 dimensions) that includes joint positions/velocities, full 6‑DoF object pose, 3‑axis contact forces, projected uniaxial forces, planar distance to the object, and the previous action. The reward function is carefully crafted: a task reward that encourages the object to rise to a target height while minimizing the planar distance; an incentive reward that grants a binary bonus only when all three fingertips are in contact, thereby forcing the policy to exploit the uniaxial force signal; and three penalty terms that discourage joint‑limit violations, large actions, and abrupt action changes. Domain randomization is applied extensively: joint offsets, object positions, friction coefficients, masses, actuator gains, and random external forces are sampled from wide ranges, and Gaussian noise is added to both joint angles (σ = 0.005 rad) and force readings (σ = 0.5 N). The result is a teacher that learns smooth, contact‑rich grasping behavior robust to variations in dynamics and external disturbances. Training runs in 9,000 parallel environments (18 object types × 500 seeds) on a single RTX 5080 GPU and converges in roughly three hours.
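The qualitative structure of the teacher's reward can be sketched as below. The paper specifies only the nature of each term (task, all-contact incentive, three penalties); the weights and exact functional forms here are illustrative assumptions, not the authors' coefficients:

```python
import numpy as np


def teacher_reward(obj_height, target_height, planar_dist, contacts,
                   joint_pos, joint_limits, action, prev_action,
                   w_task=1.0, w_incentive=0.5, w_limit=0.1,
                   w_action=0.01, w_smooth=0.01):
    """Sketch of the teacher's reward: task + contact incentive - penalties.
    All weights and functional forms are assumed for illustration."""
    # Task term: reward lifting toward the target height, penalize planar drift.
    task = w_task * (min(obj_height / target_height, 1.0) - planar_dist)

    # Incentive term: binary bonus only when all three fingertips register
    # contact, forcing the policy to exploit the uniaxial force signal.
    incentive = w_incentive if all(contacts) else 0.0

    # Penalty terms: joint-limit violations, large actions, abrupt changes.
    lo, hi = joint_limits
    limit_pen = w_limit * np.sum((joint_pos < lo) | (joint_pos > hi))
    action_pen = w_action * np.sum(np.square(action))
    smooth_pen = w_smooth * np.sum(np.square(action - prev_action))

    return task + incentive - limit_pen - action_pen - smooth_pen
```

The binary all-contact gate is the key design choice: because partial contact earns nothing, the policy cannot ignore the force channel and must actively verify all three fingertips before lifting.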
The second stage distills the teacher’s expertise into a “student” policy πₛ that operates under the real‑world sensor suite: nine joint positions, nine joint velocities, and three uniaxial force readings (total 21 scalar inputs). The student uses a transformer architecture to capture temporal dependencies in the proprioceptive and force streams, enabling it to infer contact states and plan over longer horizons despite the limited observation space. Demonstrations are generated by rolling out the teacher in simulation; successful trajectories (those that achieve a stable lift) are collected and used for behavioral cloning. The student is trained with standard imitation‑learning losses, action clipping, and a KL‑divergence constraint to keep it close to the teacher while allowing fine‑tuning to the reduced observation space.
Hardware validation is performed on a custom three‑finger gripper derived from the open‑source D’Claw platform, with each fingertip fitted with a uniaxial sensor. The system is tested on 18 geometric objects: six shapes (two cuboids, a capsule, two cylinders, a sphere), each in three sizes. Six objects belong to the training distribution, while the remaining twelve are out‑of‑distribution (different sizes or shapes). Across 200 grasp attempts (≈10 per object), the system achieves an overall success rate of 98.3%. Success is defined as establishing stable three‑finger contact, lifting the object at least 10 cm, and maintaining the grasp without dropping it. The policy remains effective under random external pushes applied during the lift, confirming the robustness imparted by the domain randomization and by the incentive reward that forces reliance on force feedback.
The authors claim four primary contributions: (1) proof that uniaxial force sensing plus proprioception suffices for reliable multi‑finger grasping, dramatically lowering hardware cost and integration complexity; (2) a teacher‑student framework that leverages privileged simulation data to train a partial‑observation policy efficiently; (3) the use of a transformer‑based student to handle temporal sequences of minimal sensory data; and (4) extensive randomization and noise injection that enable strong sim‑to‑real transfer and generalization to unseen objects.
Future work suggested includes scaling to more fingers or asymmetric grippers, extending to dynamic manipulation tasks such as in‑hand reorientation, assembly, or tool use, and exploring hybrid sensing where a cheap uniaxial sensor is combined with low‑resolution multi‑axis sensors to recover limited torque information. The presented approach offers a practical pathway for industrial and service robots that need robust manipulation without the expense and fragility of vision‑centric or high‑fidelity tactile systems.