Improving Sparse IMU-based Motion Capture with Motion Label Smoothing
📝 Abstract
Human motion capture based on sparse Inertial Measurement Units (IMUs) has gained significant momentum, driven by the adaptation of fundamental AI tools, such as recurrent neural networks (RNNs) and transformers, tailored for temporal and spatial modeling. Despite these achievements, current research predominantly focuses on pipeline and architectural designs, with comparatively little attention given to regularization methods, highlighting a critical gap in developing a comprehensive AI toolkit for this task. To bridge this gap, we propose motion label smoothing, a novel method that adapts the classic label smoothing strategy from classification to the sparse IMU-based motion capture task. Specifically, we first demonstrate that a naive adaptation of label smoothing, such as simply blending in a uniform vector or a “uniform” motion representation (e.g., dataset-average motion or a canonical T-pose), is suboptimal, and we argue that a proper adaptation requires increasing the entropy of the smoothed labels. Second, we conduct a thorough analysis of human motion labels, identifying three critical properties: 1) Temporal Smoothness, 2) Joint Correlation, and 3) Low-Frequency Dominance, and we show that conventional approaches to entropy enhancement (e.g., blending Gaussian noise) are ineffective because they disrupt these properties. Finally, we propose blending a novel skeleton-based Perlin noise into the labels for motion label smoothing, designed to raise label entropy while preserving these motion properties. Extensive experiments applying our motion label smoothing to three state-of-the-art methods across four real-world IMU datasets demonstrate its effectiveness and robust generalization (plug-and-play) capability.
📄 Content
Human motion capture plays a critical role in diverse domains, including film production (Menache 2000), interactive gaming (Geng and Yu 2003), and medical rehabilitation (Mousavi Hondori and Khademi 2014). Recently, motion capture systems based on sparse Inertial Measurement Units (IMUs) have emerged as a lightweight yet promising alternative.
These systems achieve real-time human motion reconstruction using only six IMUs strategically positioned on the wrists, ankles, head, and hips. This minimal configuration offers compelling advantages in portability, affordability, and resilience to occlusions or lighting variations, making these systems highly suitable for ubiquitous motion capture scenarios.
Figure 1: A live comparison between the state-of-the-art sparse IMU-based motion capture system, GlobalPose (Yi, Pan, and Xu 2025) (left, red), and its improved variant enhanced with our motion label smoothing technique (right, green) clearly illustrates the effectiveness of our method.
Fueled by recent advances in AI, sparse IMU-based motion capture has made remarkable progress. Early methods (Huang et al. 2018; Yi, Zhou, and Xu 2021) leveraged RNNs to reconstruct human motion; TIP (Jiang et al. 2022) introduced transformer architectures to improve accuracy; PIP (Yi et al. 2022) enhanced RNNs with hidden-state initialization to disambiguate complex motions; and PNP (Yi, Zhou, and Xu 2024) calibrated acceleration signals via an autoregressive MLP. While effective, these approaches largely focus on adapting core neural architectures for temporal and spatial modeling to the task. In contrast, regularization, an equally critical component of deep learning, remains largely unexplored, revealing a key gap in building a more comprehensive AI toolkit for sparse IMU-based motion capture.
In this paper, we address this gap by introducing motion label smoothing, an adaptation of the classic label smoothing technique (Szegedy et al. 2016) tailored for sparse IMU-based motion capture. Although it may appear straightforward, this adaptation poses significant challenges. Specifically, a naive adaptation of label smoothing from classification tasks (Szegedy et al. 2016; Müller, Kornblith, and Hinton 2019), such as blending the ground-truth label with a uniform label vector or a “uniform” motion representation (e.g., dataset-average motion or a canonical T-pose), proves suboptimal. We argue that this stems from a fundamental misinterpretation of “smoothness” in label smoothing: its purpose is to increase label entropy, not merely to push label vectors toward uniformity. In classification, blending in a uniform vector serves this purpose because it spreads probability mass across all classes; in motion capture, however, a “uniform” motion collapses into a static pose (e.g., a T-pose), so blending it in pulls every frame toward the same target and reduces entropy rather than increasing it.
Therefore, to adapt label smoothing properly for sparse IMU-based motion capture, we first conduct a rigorous analysis of motion labels and identify three key properties: (1) Temporal Smoothness: motion evolves continuously over time; (2) Joint Correlation: rotations of adjacent joints within a kinematic chain are inherently linked; and (3) Low-Frequency Dominance: joint rotation signals in Euclidean space are dominated by low-frequency components. Building on these properties, we show that naive entropy-enhancement strategies, such as blending Gaussian or uniform noise, are ineffective because they inevitably disrupt these intrinsic characteristics. To address this challenge, we propose a novel skeleton-based Perlin noise for motion label smoothing. Specifically, we first map joint rotations from the SO(3) manifold to a tractable Euclidean 6D rotation representation (R6D; Zhou et al. 2019), where these motion properties can be modeled explicitly.
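A minimal, stdlib-only Python sketch of the argument above, under stated assumptions. A single joint-angle channel stands in for a full motion label. Variance and mean squared frame-to-frame change (`roughness`) serve as crude proxies for label spread and high-frequency energy. These proxies and all helper names are hypothetical illustrations, not the paper's metrics or code. The sketch contrasts classic label smoothing with two naive motion adaptations: blending toward a static pose, and blending in Gaussian noise.

```python
import math
import random

def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(x * math.log(x) for x in p if x > 0)

def classic_label_smoothing(one_hot, eps):
    """Classification label smoothing: y' = (1 - eps) * y + eps / K."""
    k = len(one_hot)
    return [(1 - eps) * y + eps / k for y in one_hot]

def blend_toward_static(traj, static_value, eps):
    """Naive motion adaptation (hypothetical helper): pull every frame toward one fixed pose."""
    return [(1 - eps) * x + eps * static_value for x in traj]

def blend_gaussian(traj, eps, rng):
    """Entropy-raising but unstructured (hypothetical helper): i.i.d. Gaussian noise per frame."""
    return [(1 - eps) * x + eps * rng.gauss(0.0, 1.0) for x in traj]

def variance(xs):
    """Population variance, used here as an assumed proxy for label spread."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)

def roughness(xs):
    """Mean squared frame-to-frame change: an assumed proxy for high-frequency energy."""
    return sum((b - a) ** 2 for a, b in zip(xs, xs[1:])) / (len(xs) - 1)

# Classification: smoothing a one-hot target raises its entropy above zero.
one_hot = [0.0, 1.0, 0.0, 0.0]
smoothed = classic_label_smoothing(one_hot, eps=0.1)

# Toy motion label: one smooth joint-angle channel (radians), 120 frames.
traj = [0.5 * math.sin(2 * math.pi * t / 60) for t in range(120)]
toward_tpose = blend_toward_static(traj, static_value=0.0, eps=0.3)
with_gauss = blend_gaussian(traj, eps=0.3, rng=random.Random(0))

# Blending toward a constant pose scales the spread by (1 - eps)^2 (up to rounding).
tpose_ratio = variance(toward_tpose) / variance(traj)
# Blending Gaussian noise adds spread but makes frame-to-frame changes far larger.
gauss_ratio = roughness(with_gauss) / roughness(traj)
```

Blending toward the static pose keeps the trajectory smooth but shrinks its variation by a factor of (1 - eps)^2, in line with the entropy-reduction argument. Gaussian blending adds spread at the cost of temporal smoothness.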
In this R6D space, we construct a structured Perlin noise field whose spatial distribution encodes joint correlations (via kinematic chains) and whose temporal continuity preserves smoothness, while still raising label entropy. Overlaying this skeleton-based noise onto motion labels yields smoothed labels that follow the principle of label smoothing while respecting the properties of human motion. We conduct comparison experiments with three state-of-the-art sparse IMU-based motion capture models on four real-world IMU datasets, and we also empirically compare our method with naive adaptations of label smoothing and other label modification strategies. Extensive experimental results demonstrate that our method consistently improves motion capture performance and outperforms competing strategies.
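The noise construction described above can be sketched as follows. This is not the authors' implementation, and the exact form of their noise field is not given here. The toy skeleton `PARENTS`, the parent-coupling rule (a child adds `coupling` times its parent's noise), the frequency, and the blend `y' = (1 - eps) * y + eps * n` are all assumptions for illustration. Labels are stored in the R6D form of Zhou et al. (2019), namely the first two columns of each rotation matrix. The hypothetical helper `r6d_to_matrix_columns` applies their Gram-Schmidt mapping to recover a valid rotation after blending.

```python
import math
import random

# Hypothetical toy skeleton, NOT the skeleton used in the paper:
# 0 root -> 1 spine -> 2 neck -> 3 head, and 0 root -> 4 hip.
# PARENTS[j] is the parent of joint j (-1 for the root); parents precede children.
PARENTS = [-1, 0, 1, 2, 0]

def fade(t):
    """Perlin's quintic fade curve 6t^5 - 15t^4 + 10t^3."""
    return t * t * t * (t * (t * 6 - 15) + 10)

def perlin_1d(x, grads):
    """Classic 1D gradient (Perlin) noise; grads holds a random slope per lattice point."""
    i = math.floor(x)
    f = x - i
    g0 = grads[i % len(grads)]
    g1 = grads[(i + 1) % len(grads)]
    d0, d1 = g0 * f, g1 * (f - 1.0)
    return d0 + fade(f) * (d1 - d0)

def skeleton_perlin_noise(num_frames, parents, freq=0.05, coupling=0.7, seed=0):
    """Return noise[t][j] as a 6-vector (an R6D offset) per frame t and joint j.

    Temporal smoothness: every channel is low-frequency 1D Perlin noise over time.
    Joint correlation (assumed rule): a child adds `coupling` times its parent's
    noise, so joints along one kinematic chain drift together.
    """
    rng = random.Random(seed)
    grads = [[[rng.uniform(-1.0, 1.0) for _ in range(64)] for _ in range(6)]
             for _ in parents]
    noise = []
    for t in range(num_frames):
        frame = []
        for j, p in enumerate(parents):
            own = [perlin_1d(t * freq, grads[j][c]) for c in range(6)]
            if p >= 0:  # the parent's noise for this frame is already computed
                own = [o + coupling * q for o, q in zip(own, frame[p])]
            frame.append(own)
        noise.append(frame)
    return noise

def blend(labels, noise, eps=0.1):
    """Label-smoothing-style blend y' = (1 - eps) * y + eps * n, applied in R6D space."""
    return [[[(1 - eps) * y + eps * n for y, n in zip(yj, nj)]
             for yj, nj in zip(yt, nt)]
            for yt, nt in zip(labels, noise)]

def r6d_to_matrix_columns(v):
    """Gram-Schmidt map from a 6D vector to the three columns of a rotation matrix."""
    a1, a2 = v[:3], v[3:]
    n1 = math.sqrt(sum(x * x for x in a1))
    b1 = [x / n1 for x in a1]
    dot = sum(x * y for x, y in zip(b1, a2))
    u2 = [y - dot * x for x, y in zip(b1, a2)]
    n2 = math.sqrt(sum(x * x for x in u2))
    b2 = [x / n2 for x in u2]
    b3 = [b1[1] * b2[2] - b1[2] * b2[1],  # b3 = b1 x b2
          b1[2] * b2[0] - b1[0] * b2[2],
          b1[0] * b2[1] - b1[1] * b2[0]]
    return b1, b2, b3

# Usage: 120 frames in which every joint holds the identity rotation.
T = 120
identity_r6d = [1.0, 0.0, 0.0, 0.0, 1.0, 0.0]  # first two columns of I
labels = [[identity_r6d[:] for _ in PARENTS] for _ in range(T)]
noise = skeleton_perlin_noise(T, PARENTS)
smoothed = blend(labels, noise, eps=0.1)
b1, b2, b3 = r6d_to_matrix_columns(smoothed[10][3])  # head joint, frame 10
```

The coupling rule is only one simple way to make adjacent joints share noise. Treat `coupling`, `freq`, and `eps` as tunable placeholders rather than values from the paper.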
In summary, our contributions are:
• We propose motion label smoothing, a novel adaptation of the classic label smoothing technique specifically tailored for sparse IMU-based motion capture, featuring the blending of skeleton-based Perlin noise into motion data.
• To justify its necessity, we identify why a naive adaptation of label smoothing from classification is suboptimal, showing that such adaptations misinterpret “smoothness” as unifo
This content is AI-processed based on ArXiv data.