From Vision to Assistance: Gaze and Vision-Enabled Adaptive Control for a Back-Support Exoskeleton
Back-support exoskeletons have been proposed to mitigate spinal loading in industrial handling, yet their effectiveness critically depends on timely and context-aware assistance. Most existing approaches rely either on load-estimation techniques (e.g., EMG, IMU) or on vision systems that do not directly inform control. In this work, we present a vision-gated control framework for an active lumbar occupational exoskeleton that leverages egocentric vision with wearable gaze tracking. The proposed system integrates real-time grasp detection from a first-person YOLO-based perception system, a finite-state machine (FSM) for task progression, and a variable admittance controller to adapt torque delivery to both posture and object state. A user study with 15 participants performing stooped load-lifting trials under three conditions (no exoskeleton, exoskeleton without vision, exoskeleton with vision) shows that vision-gated assistance significantly reduces perceived physical demand and improves fluency, trust, and comfort. Quantitative analysis reveals earlier and stronger assistance when vision is enabled, while questionnaire results confirm user preference for the vision-gated mode. These findings highlight the potential of egocentric vision to enhance the responsiveness, ergonomics, safety, and acceptance of back-support exoskeletons.
💡 Research Summary
The paper presents a novel control framework for an active lumbar occupational exoskeleton that directly incorporates egocentric vision and gaze tracking into the assistance loop. Traditional back‑support exoskeletons rely on indirect load‑estimation signals such as EMG or IMU, which react only after the user has already exerted effort, and they lack awareness of the surrounding environment. To address these limitations, the authors equip the user with eye‑tracking glasses that stream RGB video and gaze coordinates. A lightweight YOLOv9 detector, fine‑tuned to distinguish “Grasped” from “Not Grasped” boxes, processes each frame in real time. By mapping the gaze point onto the image, a binary “vision gate” is generated: the gate turns on when the gaze remains inside a “Grasped” bounding box for a configurable dwell time, and turns off when the gaze shifts to a “Not Grasped” box. Temporal filtering (window length and dwell ratio) suppresses spurious gaze fluctuations.
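The gating logic described above can be sketched as follows. This is a minimal illustrative implementation, not the authors' code: the class name, window length, and dwell ratio are assumptions, and the detector output is abstracted as a list of `(label, box)` pairs.

```python
from collections import deque

class VisionGate:
    """Binary vision gate driven by gaze-in-bounding-box events.

    The gate turns on when the gaze point dwells inside a "Grasped"
    detection for at least `dwell_ratio` of the last `window` frames,
    and turns off symmetrically for "Not Grasped" detections.
    Anything else (gaze outside all boxes) leaves the gate unchanged,
    which suppresses spurious gaze fluctuations.
    """

    def __init__(self, window=15, dwell_ratio=0.8):
        self.history = deque(maxlen=window)  # per-frame gaze labels
        self.dwell_ratio = dwell_ratio
        self.active = False

    @staticmethod
    def _gaze_in_box(gaze, box):
        """box = (x1, y1, x2, y2) in image coordinates."""
        gx, gy = gaze
        x1, y1, x2, y2 = box
        return x1 <= gx <= x2 and y1 <= gy <= y2

    def update(self, gaze, detections):
        """detections: list of (label, box) pairs from the detector."""
        label = None
        for det_label, box in detections:
            if self._gaze_in_box(gaze, box):
                label = det_label
                break
        self.history.append(label)
        n = len(self.history)
        if n == self.history.maxlen:  # decide only on a full window
            grasped = sum(l == "Grasped" for l in self.history) / n
            released = sum(l == "Not Grasped" for l in self.history) / n
            if grasped >= self.dwell_ratio:
                self.active = True
            elif released >= self.dwell_ratio:
                self.active = False
        return self.active
```

Requiring a full window before switching trades a fixed latency (window length × frame period) for robustness against single-frame detector or gaze glitches.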
The vision gate feeds a finite‑state machine (FSM) that models the lifting task with four states: (0) standing with no box, (1) bending to pick, (2) standing with box, (3) bending to place. State transitions are triggered by trunk‑inclination thresholds (θ_stand, θ_bend) and by the vision gate. When the gate is active, the FSM includes a box‑related torque term (τ_box) in the assistance calculation; otherwise only the trunk torque (τ_trunk) is used, so the reference assistance torque takes the form τ_ref_ass = γ (τ_trunk + g · τ_box), where γ is a scaling gain and g ∈ {0, 1} is the vision gate.
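The four-state task model and torque selection can be sketched as below. The threshold values, state encoding, and function names are illustrative assumptions; the paper specifies only the states, the θ_stand/θ_bend thresholds, and the gating of τ_box.

```python
# Trunk-inclination thresholds in degrees (assumed values for illustration)
THETA_STAND = 20.0   # below this, the trunk counts as upright
THETA_BEND = 45.0    # above this, the trunk counts as bent

# States: 0 = standing, no box; 1 = bending to pick;
#         2 = standing with box; 3 = bending to place
def fsm_step(state, trunk_angle, gate_active):
    """Advance the lifting-task FSM one tick.

    Transitions combine posture (trunk inclination) with the vision
    gate, so assistance ramps up only once the object is actually held.
    """
    if state == 0 and trunk_angle > THETA_BEND:
        return 1   # started bending toward the box
    if state == 1 and gate_active and trunk_angle < THETA_STAND:
        return 2   # stood back up while holding the box
    if state == 2 and trunk_angle > THETA_BEND:
        return 3   # bending down to place the box
    if state == 3 and not gate_active and trunk_angle < THETA_STAND:
        return 0   # box released, back to unloaded standing
    return state   # no transition this tick

def reference_torque(gamma, tau_trunk, tau_box, gate_active):
    """tau_ref_ass = gamma * (tau_trunk + g * tau_box), g in {0, 1}."""
    return gamma * (tau_trunk + (tau_box if gate_active else 0.0))
```

Keeping the torque computation separate from the FSM means the same controller can run without vision by holding the gate permanently off, which matches the "exoskeleton without vision" condition in the user study.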