Failure-Aware Bimanual Teleoperation via Conservative Value Guided Assistance
Teleoperation of high-precision manipulation is constrained by tight success tolerances and complex contact dynamics, which make impending failures difficult for human operators to anticipate under partial observability. This paper proposes a value-guided, failure-aware framework for bimanual teleoperation that provides compliant haptic assistance while preserving continuous human authority. The framework is trained entirely from heterogeneous offline teleoperation data containing both successful and failed executions. Task feasibility is modeled as a conservative success score learned via Conservative Value Learning, yielding a risk-sensitive estimate that remains reliable under distribution shift. During online operation, the learned success score regulates the level of assistance, while a learned actor provides a corrective motion direction. Both are integrated through a joint-space impedance interface on the master side, yielding continuous guidance that steers the operator away from failure-prone actions without overriding intent. Experimental results on contact-rich manipulation tasks demonstrate improved task success rates and reduced operator workload compared to conventional teleoperation and shared-autonomy baselines, indicating that conservative value learning provides an effective mechanism for embedding failure awareness into bilateral teleoperation. Experimental videos are available at https://www.youtube.com/watch?v=XDTsvzEkDRE
💡 Research Summary
The paper tackles a fundamental challenge in high‑precision bimanual teleoperation: the difficulty of anticipating irreversible failures under partial observability, complex contact dynamics, and communication latency. To address this, the authors propose a failure‑aware shared‑autonomy framework that learns a conservative success score from heterogeneous offline teleoperation datasets containing both successful and failed executions. The success score is obtained via Conservative Value Learning (CVL), a variant of offline reinforcement learning that penalizes over‑optimistic Q‑values for out‑of‑distribution actions, thereby yielding a risk‑sensitive estimate that remains reliable when the operator’s commands deviate from the training distribution.
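The CVL objective can be made concrete with a short sketch. Below is a minimal, illustrative PyTorch implementation of a conservative critic update in the style of Conservative Q-Learning: a standard TD regression toward the eventual success outcome, plus a logsumexp penalty that pushes Q-values down on randomly sampled (likely out-of-distribution) actions and up on dataset actions. The network shape, the +1/0 success-reward convention, and all names (`QNet`, `conservative_critic_loss`, `alpha`) are assumptions for illustration, not the paper's code.

```python
# Minimal sketch of a conservative critic update (CQL-style penalty).
# Shapes, names, and the reward convention are illustrative assumptions.
import torch
import torch.nn as nn

class QNet(nn.Module):
    """Critic Q_c(s, a): maps a state-action pair to a success score."""
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1)).squeeze(-1)

def conservative_critic_loss(q_net, target_q, batch,
                             alpha=1.0, gamma=0.99, n_rand=10):
    # batch holds offline transitions; r = +1 on success, 0 otherwise.
    s, a, r, s_next, a_next, done = batch

    # Standard TD regression toward the success/failure outcome,
    # using the dataset's next action (SARSA-style, fully offline).
    with torch.no_grad():
        target = r + gamma * (1.0 - done) * target_q(s_next, a_next)
    td_loss = ((q_net(s, a) - target) ** 2).mean()

    # Conservative penalty: lower Q on random (likely OOD) actions,
    # raise Q on in-distribution dataset actions.
    B, act_dim = a.shape
    rand_a = torch.rand(B, n_rand, act_dim) * 2.0 - 1.0  # actions in [-1, 1]
    s_rep = s.unsqueeze(1).expand(-1, n_rand, -1).reshape(-1, s.shape[-1])
    q_rand = q_net(s_rep, rand_a.reshape(-1, act_dim)).reshape(B, n_rand)
    cql_penalty = (torch.logsumexp(q_rand, dim=1) - q_net(s, a)).mean()

    return td_loss + alpha * cql_penalty
```

The `alpha` weight trades off conservatism against fit to the data: larger values yield more pessimistic success scores on actions far from the training distribution, which is exactly the behavior the framework relies on when the operator's commands drift out of distribution.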
During online operation, the learned success score $Q_c(s,a)$ is evaluated for the current state-action pair. If the score falls below a predefined threshold $\tau$, a gating variable $\lambda_t \in [0,1]$ raises the level of haptic assistance: the joint-space impedance interface on the master side steers the operator toward the actor's corrective motion direction, while the operator retains continuous authority over the commanded motion.
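How the gate might translate the score into assistance can likewise be sketched. The snippet below is a hypothetical implementation consistent with the description above: a smooth sigmoid gate $\lambda_t$ that rises as $Q_c(s,a)$ falls below $\tau$, scaling a joint-space impedance torque that pulls the master toward the actor's corrective configuration. The sigmoid form, the gain values, and the helper names (`gate`, `assistance_torque`) are illustrative assumptions, not the paper's exact control law.

```python
# Hypothetical gating + impedance sketch; gains and gate shape are
# illustrative assumptions, not the paper's exact scheme.
import numpy as np

def gate(q_value, tau, sharpness=10.0):
    """Gating variable lambda_t in [0, 1]: assistance rises smoothly
    as the conservative score Q_c(s, a) drops below the threshold tau."""
    return 1.0 / (1.0 + np.exp(sharpness * (q_value - tau)))

def assistance_torque(q_master, q_actor, q_value, tau, K, D, dq_master):
    """Joint-space impedance pulling the master toward the actor's
    corrective configuration, scaled by the gate so the human keeps
    authority when the score is high."""
    lam = gate(q_value, tau)
    return lam * (K @ (q_actor - q_master) - D @ dq_master)

# Example: a 7-DoF master arm with diagonal stiffness/damping gains.
K = np.diag([30.0] * 7)   # N*m/rad (illustrative)
D = np.diag([2.0] * 7)    # N*m*s/rad (illustrative)
q_m = np.zeros(7)         # current master joint configuration
dq_m = np.zeros(7)        # current master joint velocities
q_a = np.full(7, 0.05)    # actor's suggested corrective configuration
tau_cmd = assistance_torque(q_m, q_a, q_value=0.3, tau=0.5,
                            K=K, D=D, dq_master=dq_m)
```

Because the gate multiplies the entire impedance torque, assistance vanishes smoothly when the conservative score sits comfortably above the threshold, so the operator feels no guidance during confidently successful motions.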