CMD-HAR: Cross-Modal Disentanglement for Wearable Human Activity Recognition

Notice: This research summary and analysis were generated automatically using AI technology. For authoritative details, please refer to the original arXiv source.

Human Activity Recognition (HAR) is a fundamental technology for numerous human-centered intelligent applications. Although deep learning methods have accelerated feature extraction, issues such as multimodal data mixing, activity heterogeneity, and complex model deployment remain largely unresolved; this paper aims to address them in sensor-based HAR. We propose a spatiotemporal-attention modal decomposition, alignment, and fusion strategy to tackle the mixed distribution of sensor data. Key discriminative features of activities are captured through cross-modal spatiotemporal disentangled representation, combined with gradient modulation to alleviate data heterogeneity. In addition, a wearable deployment simulation system is constructed. Experiments on a large number of public datasets demonstrate the effectiveness of the model.


💡 Research Summary

The paper introduces CMD‑HAR, a novel framework for wearable sensor‑based human activity recognition (HAR) that simultaneously tackles three persistent challenges: multimodal data mixing, activity heterogeneity across users, and the deployment of complex models on resource‑constrained devices. The authors propose a four‑stage architecture.

First, a Channel Expansion module partitions input feature maps into cardinality groups and radix‑aware sub‑groups, applying a radix‑aware spatial attention to dynamically weight each subgroup and align disparate sensor modalities.

Second, a Spatiotemporal Disentanglement mechanism employs separate self‑attention branches for spatial (S(·)) and temporal (T(·)) dimensions, producing distinct spatial and temporal representations while also generating a shared cross‑modal representation via cross‑attention. A disentanglement loss comprising three L2 distance terms (spatial‑temporal, shared‑spatial, shared‑temporal) encourages independence between the three representations while minimizing intra‑modal redundancy.

Third, an Adaptive Modulation Gradient Balancing (AMGB) component monitors modality‑specific gradient magnitudes (‖∇θ Lₘ‖²) during back‑propagation, computes dominance ratios ρₘ, and applies a hyperbolic‑tangent‑based modulation factor γₘ to suppress over‑dominant modalities and boost under‑represented ones. This dynamic gradient re‑weighting ensures balanced learning across modalities.

Finally, a lightweight deployment pipeline is validated on Raspberry Pi 5 hardware, measuring parameter count, memory footprint, and inference latency (≈28 ms per sample), demonstrating feasibility for real‑time wearable applications. Extensive experiments on five public HAR datasets (UCI HAR, PAMAP2, HHAR, WISDM, OPPORTUNITY) show that CMD‑HAR achieves an average accuracy of 99.75%, outperforming state‑of‑the‑art methods by 2–3% on complex activities.
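The three-term disentanglement objective described above can be sketched as follows. This is a minimal NumPy illustration, not the paper's exact formulation: the sign convention (pushing the spatial and temporal representations apart while pulling the shared representation toward both) and the weighting coefficients `lam` are assumptions made for illustration.

```python
import numpy as np

def disentanglement_loss(z_s, z_t, z_sh, lam=(1.0, 1.0, 1.0)):
    """Sketch of a three-term L2 disentanglement objective.

    z_s, z_t, z_sh: spatial, temporal, and shared representations,
    each of shape (batch, dim). The signs below are assumptions:
    the spatial-temporal term is negated so that minimizing the loss
    drives the two branch representations apart (independence), while
    the shared representation stays close to both branches.
    """
    d_st = np.mean(np.sum((z_s - z_t) ** 2, axis=1))   # spatial vs. temporal
    d_hs = np.mean(np.sum((z_sh - z_s) ** 2, axis=1))  # shared vs. spatial
    d_ht = np.mean(np.sum((z_sh - z_t) ** 2, axis=1))  # shared vs. temporal
    return -lam[0] * d_st + lam[1] * d_hs + lam[2] * d_ht
```

In practice such a term would be added to the classification loss with a small weight, so the disentanglement acts as a regularizer rather than the primary objective.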
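The AMGB re-weighting step can be illustrated with a short sketch. The exact dominance ratio and tanh form are assumptions about the paper's formulation; here ρₘ compares each modality's gradient energy to the mean, and γₘ falls below 1 for over-dominant modalities and rises above 1 for under-represented ones. The scale `alpha` is a hypothetical hyperparameter.

```python
import numpy as np

def amgb_factors(grad_sq_norms, alpha=0.5):
    """Sketch of Adaptive Modulation Gradient Balancing (AMGB).

    grad_sq_norms: dict mapping modality name -> ||grad_theta L_m||^2
    for the current step. Returns a per-modality modulation factor
    gamma_m used to rescale that modality's gradients.
    """
    mean_g = np.mean(list(grad_sq_norms.values()))
    factors = {}
    for m, g in grad_sq_norms.items():
        rho = g / mean_g                                  # dominance ratio rho_m
        factors[m] = 1.0 - np.tanh(alpha * (rho - 1.0))   # modulation gamma_m
    return factors
```

For example, if the accelerometer branch carries four times the gyroscope's gradient energy, its factor drops below 1 (suppressed) while the gyroscope's rises above 1 (boosted); when all modalities are balanced, every factor is exactly 1 and training proceeds unmodified.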
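The per-sample latency figure quoted above is the kind of number a simple timing harness produces. The sketch below shows one plausible measurement protocol (warm-up runs followed by an averaged timed loop); the warm-up and run counts are illustrative, not the paper's exact settings.

```python
import time

def measure_latency_ms(model_fn, sample, warmup=10, runs=100):
    """Sketch of an on-device inference latency measurement.

    model_fn: any callable performing one forward pass;
    sample: a single input window. Returns mean ms per sample.
    """
    for _ in range(warmup):           # warm-up to stabilize caches
        model_fn(sample)
    start = time.perf_counter()
    for _ in range(runs):
        model_fn(sample)
    return (time.perf_counter() - start) / runs * 1000.0
```

On a Raspberry Pi 5 this style of harness, applied to the deployed model, would yield the ≈28 ms per-sample figure reported in the paper.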
Ablation studies confirm that each component contributes positively: channel expansion improves modality alignment, spatiotemporal disentanglement adds roughly 1.2 percentage points of accuracy, and gradient modulation a further 0.9 points. Moreover, the model maintains high performance across different users without requiring personalized fine‑tuning, indicating effective mitigation of activity heterogeneity. In summary, CMD‑HAR provides a principled solution that disentangles spatial and temporal information, dynamically balances multimodal gradients, and is validated on actual low‑power hardware, thereby bridging the gap between advanced HAR research and practical wearable deployment. Future work may explore broader sensor suites, ultra‑low‑power optimizations, and online continual learning for lifelong activity monitoring.

