QuaMo: Quaternion Motions for Vision-based 3D Human Kinematics Capture
Vision-based 3D human motion capture from videos remains a challenging problem in computer vision. Traditional 3D pose estimation approaches often ignore temporal consistency between frames, causing implausible and jittery motion. The emerging field of kinematics-based 3D motion capture addresses these issues by instead estimating the temporal transitions between poses. A major drawback of current kinematics approaches is their reliance on Euler angles. Despite their simplicity, Euler angles suffer from discontinuities that lead to unstable motion reconstructions, especially in online settings where trajectory refinement is unavailable. In contrast, quaternions have no discontinuities and can produce continuous transitions between poses. In this paper, we propose QuaMo, a novel Quaternion Motions method that uses quaternion differential equations (QDE) for human kinematics capture. We adopt a state-space model, an effective system for real-time kinematics estimation, with a quaternion state and the QDE describing the quaternion velocity. The corresponding angular acceleration is computed by a meta-PD controller with a novel acceleration enhancement that adaptively regulates the control signals as the human quickly transitions to a new pose. Unlike previous work, our QDE is solved under the quaternion unit-sphere constraint, which yields more accurate estimates. Experimental results show that our novel formulation of the QDE with acceleration enhancement accurately estimates 3D human kinematics with no discontinuities and minimal implausibilities. QuaMo outperforms comparable state-of-the-art methods on multiple datasets, namely Human3.6M, Fit3D, SportsPose, and AIST. The code is available at https://github.com/cuongle1206/QuaMo
💡 Research Summary
QuaMo introduces a novel online framework for vision‑based 3D human motion capture that directly addresses the temporal inconsistency and rotation discontinuity problems inherent in current pose‑estimation pipelines. The method treats each joint’s orientation as a unit quaternion and models the dynamics of both the quaternion and its angular velocity within a discrete‑time state‑space system. The core of the approach is a quaternion differential equation (QDE), q̇ = ½ Ω(ω) q, where Ω(ω) is the 4 × 4 matrix built from the angular velocity vector ω. Assuming ω is constant over a sampling interval Δt, QuaMo solves the QDE analytically using the matrix exponential, yielding the exact update q_{t+Δt} = exp((Δt/2) Ω(ω_{t+Δt})) q_t. This exact integration respects the unit‑sphere constraint ‖q‖ = 1 and eliminates the drift and gimbal‑lock issues that plague Euler‑angle representations and first‑order Euler integration.
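The exact update above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes a (w, x, y, z) quaternion component ordering and a world-frame Ω(ω), conventions the summary does not specify. Because Ω(ω)² = −‖ω‖² I, the matrix exponential reduces to a closed form, so no general-purpose `expm` is needed.

```python
import numpy as np

def omega_mat(w):
    """4x4 matrix Ω(ω) such that Ω(ω)·q equals the quaternion product
    [0, ω] ⊗ q, using a (w, x, y, z) component ordering (an assumed convention)."""
    wx, wy, wz = w
    return np.array([[0.0, -wx, -wy, -wz],
                     [ wx, 0.0, -wz,  wy],
                     [ wy,  wz, 0.0, -wx],
                     [ wz, -wy,  wx, 0.0]])

def qde_step(q, w, dt):
    """Exact solution of q̇ = ½ Ω(ω) q over one interval dt with ω held constant.
    Since Ω(ω)² = -‖ω‖² I, the matrix exponential has the closed form
    exp((dt/2) Ω) = cos(θ) I + (sin(θ)/θ) (dt/2) Ω, where θ = ‖ω‖ dt / 2."""
    theta = 0.5 * dt * np.linalg.norm(w)
    if theta < 1e-12:                    # near-zero rotation: just renormalize
        return q / np.linalg.norm(q)
    return np.cos(theta) * q + (np.sin(theta) / theta) * 0.5 * dt * omega_mat(w) @ q
```

Because each step applies an exact rotation, the unit-sphere constraint is preserved to machine precision, whereas a first-order Euler step q + dt·q̇ drifts off the sphere and must be renormalized, which is one reading of the drift the summary attributes to first-order integration.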
Angular acceleration ω̇ is generated by three complementary components: (1) a data‑driven term f_ω(q_t, ω_t) predicted by a lightweight ControlNet, (2) a meta‑PD controller that computes a proportional‑derivative signal from the quaternion error between the target pose q̂_t (obtained from any off‑the‑shelf 3D pose estimator) and the current pose, and (3) a novel second‑order acceleration enhancement α that measures the change across the last three reference poses. The enhancement term amplifies the control signals during rapid pose transitions and damps them as the system approaches the target, thereby reducing overshoot and jitter. ControlNet outputs the PD gains κ_P and κ_D, the acceleration scaling factor κ_A, and a bias term b_t via linear projections from an embedding of (q_t, ω_t, q̂_t). This design makes the controller adaptable to diverse motion styles while retaining a physically grounded formulation.
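A minimal sketch of this control law follows. The gains κ_P, κ_D, κ_A and the bias b are passed as plain scalars standing in for ControlNet's predicted parameters, and the "quaternion error" is realized as a shortest-arc rotation-vector error q_tgt ⊗ q* (a plausible choice the summary does not pin down); the (w, x, y, z) ordering is likewise an assumption.

```python
import numpy as np

def qmul(a, b):
    """Hamilton product of two quaternions, (w, x, y, z) ordering (assumed)."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([w1*w2 - x1*x2 - y1*y2 - z1*z2,
                     w1*x2 + x1*w2 + y1*z2 - z1*y2,
                     w1*y2 - x1*z2 + y1*w2 + z1*x2,
                     w1*z2 + x1*y2 - y1*x2 + z1*w2])

def rot_err(q_tgt, q):
    """Rotation-vector error taking q to q_tgt (small-angle log-map approximation)."""
    e = qmul(q_tgt, q * np.array([1.0, -1.0, -1.0, -1.0]))  # q_tgt ⊗ q*
    if e[0] < 0.0:
        e = -e                          # pick the shortest arc
    return 2.0 * e[1:]

def meta_pd(q, w, q_tgt, q_refs, kP, kD, kA, b):
    """PD signal on the quaternion error plus a second-order enhancement term
    α built from the last three reference poses q_refs = (q̂_{t-2}, q̂_{t-1}, q̂_t)."""
    alpha = rot_err(q_refs[2], q_refs[1]) - rot_err(q_refs[1], q_refs[0])
    return kP * rot_err(q_tgt, q) - kD * w + kA * alpha + b
```

Note how α behaves as described: when the reference poses change rapidly (a fast transition), the second difference is large and boosts the command, while for a static or smoothly tracked target it vanishes and the plain PD law takes over.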
The full pipeline proceeds as follows: (i) obtain a noisy target quaternion q̂_t from a monocular 3D pose estimator; (ii) feed the current state and q̂_t into ControlNet to predict the control parameters; (iii) compute ω̇_t using the meta‑PD law plus the acceleration enhancement; (iv) update the angular velocity with a simple Euler step; (v) integrate the QDE analytically to obtain the next quaternion; (vi) apply the updated quaternion and angular velocity to the SMPL body model to generate a mesh and 3D keypoints. All operations are differentiable, allowing end‑to‑end training of ControlNet together with any downstream loss (e.g., MPJPE, acceleration error).
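The per-frame loop can be sketched end to end for a single joint. This is an illustrative reduction, not the paper's code: fixed PD gains stand in for ControlNet's predictions, the enhancement and bias terms are dropped, the SMPL forward pass (step vi) is omitted, and the quaternion conventions are assumed as above.

```python
import numpy as np

def qmul(a, b):
    """Hamilton product, (w, x, y, z) ordering (assumed convention)."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([w1*w2 - x1*x2 - y1*y2 - z1*z2,
                     w1*x2 + x1*w2 + y1*z2 - z1*y2,
                     w1*y2 - x1*z2 + y1*w2 + z1*x2,
                     w1*z2 + x1*y2 - y1*x2 + z1*w2])

def rot_err(q_tgt, q):
    """Shortest-arc rotation-vector error taking q to q_tgt."""
    e = qmul(q_tgt, q * np.array([1.0, -1.0, -1.0, -1.0]))  # q_tgt ⊗ q*
    if e[0] < 0.0:
        e = -e
    return 2.0 * e[1:]

def qde_step(q, w, dt):
    """Exact integration of q̇ = ½ Ω(ω) q over dt (ω held constant)."""
    theta = 0.5 * dt * np.linalg.norm(w)
    if theta < 1e-12:
        return q / np.linalg.norm(q)
    wq = qmul(np.concatenate(([0.0], w)), q)  # Ω(ω) q as a quaternion product
    return np.cos(theta) * q + (np.sin(theta) / theta) * 0.5 * dt * wq

def track(targets, dt=0.01, kP=60.0, kD=16.0):
    """Steps (i)-(v) of the pipeline for one joint, with fixed gains kP, kD."""
    q = np.array([1.0, 0.0, 0.0, 0.0])        # current orientation
    w = np.zeros(3)                           # current angular velocity
    for q_tgt in targets:                     # (i)-(ii) per-frame target pose
        dw = kP * rot_err(q_tgt, q) - kD * w  # (iii) meta-PD control law
        w = w + dt * dw                       # (iv) explicit Euler step on ω
        q = qde_step(q, w, dt)                # (v) exact QDE integration
        # (vi) q and ω would drive an SMPL joint here
    return q, w
```

Fed a stream of (possibly noisy) target quaternions, the state converges smoothly to the target while ‖q‖ stays exactly 1, which is the qualitative behavior the summary claims for the online setting.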
Extensive experiments on four benchmarks—Human3.6M, Fit3D, SportsPose, and AIST—demonstrate that QuaMo consistently outperforms state‑of‑the‑art kinematics‑based methods. Quantitatively, it reduces mean per‑joint position error (MPJPE) by 4–6 % and cuts acceleration error by more than 30 % compared to Euler‑angle baselines. Qualitatively, the resulting motions exhibit far fewer artifacts such as foot‑skating, jittery joints, and unnatural rotations, even in challenging fast‑action sequences. An ablation study confirms that (a) replacing the exact QDE solution with Euler integration degrades performance, (b) removing the acceleration enhancement leads to higher jitter, and (c) omitting the unit‑sphere constraint causes drift, underscoring the necessity of each component.
In summary, QuaMo offers a mathematically rigorous yet practically efficient solution for real‑time 3D human motion capture. By leveraging the continuity and Lie‑group properties of quaternions, combined with a meta‑PD controller enriched by a second‑order acceleration term, the method achieves temporally coherent, physically plausible motion without the discontinuities inherent to Euler representations. The approach is suitable for real‑time applications such as autonomous driving perception, biomechanical analysis, and interactive animation, and opens avenues for future work on contact‑aware dynamics, multi‑person scenarios, and further optimization of the ControlNet architecture.