ChicGrasp: Imitation-Learning based Customized Dual-Jaw Gripper Control for Delicate, Irregular Bio-products Manipulation
Automated poultry processing lines still rely on humans to lift slippery, easily bruised carcasses onto a shackle conveyor. Deformability, anatomical variance, and strict hygiene rules make conventional suction and scripted motions unreliable. We present ChicGrasp, an end-to-end hardware-software co-design for this task. An independently actuated dual-jaw pneumatic gripper clamps both chicken legs, while a conditional diffusion-policy controller, trained from only 50 multi-view teleoperation demonstrations (RGB + proprioception), plans 5-DoF end-effector motion, including jaw commands, in a single pass. On individually presented raw broiler carcasses, our system achieves a 40.6% grasp-and-lift success rate and completes the pick-to-shackle cycle in 38 s, whereas state-of-the-art implicit behaviour cloning (IBC) and LSTM-GMM baselines fail entirely. All CAD files, code, and datasets will be open-sourced. ChicGrasp shows that imitation learning can bridge the gap between rigid hardware and variable bio-products, offering a reproducible benchmark and a public dataset for researchers in agricultural engineering and robot learning.
💡 Research Summary
The paper introduces ChicGrasp, an end‑to‑end hardware‑software system designed to automate the “pick‑and‑rehang” step in poultry processing, a task traditionally performed by human workers. The core hardware innovation is a custom 2‑DoF pneumatic dual‑jaw gripper that independently actuates each jaw. Each jaw terminates in 30‑degree chevron ridges fabricated by additive manufacturing, converting normal clamping force into tangential friction to securely grip wet, slippery chicken legs without bruising or piercing tissue. The gripper is driven by double‑acting parallel air cylinders, controlled via a 5/2‑way solenoid valve and an Arduino Uno R4 that receives high‑level commands from the host PC.
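The paper only states that the Arduino Uno R4 receives high-level commands from the host PC; the single-byte wire format sketched below is a hypothetical illustration of how two independently actuated jaws might be commanded over such a link, not the authors' actual protocol.

```python
# Hypothetical host-side encoding for the dual-jaw gripper commands.
# Bit 0 = left jaw (1 = close), bit 1 = right jaw (1 = close).
# This protocol is an assumption for illustration only.

def encode_jaw_command(left_closed: bool, right_closed: bool) -> bytes:
    """Pack the two independent jaw states into one command byte."""
    return bytes([(1 if left_closed else 0) | (2 if right_closed else 0)])

def decode_jaw_command(cmd: bytes) -> tuple:
    """Inverse of encode_jaw_command, e.g. for firmware-side parsing."""
    b = cmd[0]
    return bool(b & 1), bool(b & 2)
```

In practice the host would push each byte over a serial link (e.g. `serial_port.write(encode_jaw_command(True, True))` with pyserial), and the firmware would map each bit to the corresponding 5/2-way solenoid valve channel.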
For perception, the system employs three RGB‑D cameras: two fixed side views to reduce occlusions and a wrist‑mounted eye‑in‑hand camera that keeps both jaws in view during demonstrations. During tele‑operation data collection, a UR10e robot was guided with a 3Dconnexion SpaceMouse, capturing 50 multi‑view demonstrations (each 25–40 s) that include synchronized RGB streams (640 × 480 @ 30 Hz), joint positions/velocities (100 Hz), and binary jaw states (100 Hz).
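Because the RGB streams (30 Hz) and the proprioception/jaw-state streams (100 Hz) run at different rates, each camera frame must be paired with the temporally closest robot sample before training. The sketch below shows one common way to do this with nearest-timestamp matching; the function names are illustrative and not taken from the released dataset code.

```python
# Align a 30 Hz frame stream with a 100 Hz proprioception stream by
# picking, for each frame timestamp, the nearest proprioception sample.
import bisect

def nearest_sample(timestamps, t):
    """Index of the entry in the sorted list `timestamps` closest to t."""
    i = bisect.bisect_left(timestamps, t)
    if i == 0:
        return 0
    if i == len(timestamps):
        return len(timestamps) - 1
    # Choose whichever neighbour is closer in time.
    return i if timestamps[i] - t < t - timestamps[i - 1] else i - 1

def align_streams(frame_ts, proprio_ts):
    """For each camera frame, the index of the closest proprio sample."""
    return [nearest_sample(proprio_ts, t) for t in frame_ts]

# Example: three frames at ~33 ms spacing against a 10 ms proprio grid.
indices = align_streams([0.0, 1 / 30, 2 / 30],
                        [i / 100 for i in range(100)])  # -> [0, 3, 7]
```

Nearest-timestamp matching keeps the worst-case alignment error below half the proprioception period (5 ms here), which is typically negligible relative to the 33 ms frame interval.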
The control policy is a conditional diffusion model (Diffusion Policy) that maps the current observation (stacked RGB frames, robot proprioception, and jaw states) to a distribution over a 5-dimensional action vector aₜ, which jointly specifies the end-effector motion and the commands for both jaws in a single denoising pass.
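At inference time, a diffusion policy of this kind starts from Gaussian noise and iteratively denoises it, conditioned on the observation, into an action vector. The minimal sketch below illustrates that loop with a stand-in noise predictor; the schedule length, the deterministic DDIM-style update, and all names are assumptions for illustration, not the paper's actual implementation.

```python
# Minimal sketch of diffusion-policy inference over a 5-dim action.
import math
import random

ACTION_DIM = 5   # per the paper: end-effector motion plus jaw commands
T = 10           # illustrative number of denoising steps

# Linear noise schedule and cumulative alpha products (DDPM bookkeeping).
betas = [1e-4 + (0.02 - 1e-4) * t / (T - 1) for t in range(T)]
alpha_bars = []
prod = 1.0
for b in betas:
    prod *= 1.0 - b
    alpha_bars.append(prod)

def eps_model(x, t, obs):
    """Stand-in for the trained conditional noise-prediction network."""
    return [0.0] * ACTION_DIM  # a real model would predict the added noise

def sample_action(obs, rng):
    """Deterministic DDIM-style reverse process, from pure noise to action."""
    x = [rng.gauss(0.0, 1.0) for _ in range(ACTION_DIM)]
    for t in reversed(range(T)):
        eps = eps_model(x, t, obs)
        ab = alpha_bars[t]
        # Estimate the clean action, then step to the previous noise level.
        x0 = [(xi - math.sqrt(1 - ab) * ei) / math.sqrt(ab)
              for xi, ei in zip(x, eps)]
        ab_prev = alpha_bars[t - 1] if t > 0 else 1.0
        x = [math.sqrt(ab_prev) * x0i + math.sqrt(1 - ab_prev) * ei
             for x0i, ei in zip(x0, eps)]
    return x

action = sample_action(obs={}, rng=random.Random(0))
```

The key property this loop illustrates is that the policy outputs the full 5-dimensional action (motion and jaw commands) jointly, rather than predicting each component with a separate head.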