Human Activity Learning and Segmentation using Partially Hidden Discriminative Models

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

Learning and understanding the typical patterns in the daily activities and routines of people from low-level sensory data is an important problem in many application domains, such as building smart environments or providing intelligent assistance. Traditional approaches to this problem typically rely on supervised learning and generative models such as the hidden Markov model and its extensions. While activity data can be readily acquired from pervasive sensors, e.g. in smart environments, providing manual labels to support supervised training is often extremely expensive. In this paper, we propose a new approach based on semi-supervised training of partially hidden discriminative models such as the conditional random field (CRF) and the maximum entropy Markov model (MEMM). We show that these models allow us to incorporate both labeled and unlabeled data for learning, and at the same time, provide us with the flexibility and accuracy of the discriminative framework. Our experimental results in the video surveillance domain illustrate that these models can perform better than their generative counterpart, the partially hidden Markov model, even when a substantial portion of the labels is unavailable.


💡 Research Summary

The paper addresses the problem of learning and segmenting human activities from low‑level sensor streams when only a fraction of the activity labels are available. Traditional approaches rely on fully supervised training of generative models such as Hidden Markov Models (HMMs) and their hierarchical extensions. While sensor data can be collected cheaply in smart environments, manual annotation is costly, motivating a semi‑supervised solution.

The authors propose partially hidden discriminative models: Conditional Random Fields (CRFs) and Maximum Entropy Markov Models (MEMMs). In both models the label sequence y is split into a visible part v (labels that are known, either manually or via reliable sensors) and a hidden part h (unknown labels). The conditional probability p(v|x; λ) is obtained by marginalising over h: p(v|x; λ)=∑ₕ p(v,h|x; λ). For CRFs the standard chain‑structured log‑linear formulation is used, with feature functions fₖ(yₜ₋₁, yₜ, x) and parameters λₖ. MEMMs are reformulated so that a single parameter set is shared across all source states; a sliding window Ωₜ around time t provides contextual observation features, allowing the model to capture temporal dependencies beyond the immediate observation.
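The marginalisation p(v|x; λ) = ∑ₕ p(v,h|x; λ) can be computed exactly for a chain-structured model with a constrained forward pass: at time steps where the label is visible, the recursion is clamped to that single state; at hidden steps it sums over all states. The following is a minimal sketch in Python (not the authors' implementation); the log-potentials `log_psi` are assumed to be given, e.g. precomputed from the feature functions and parameters.

```python
import numpy as np
from scipy.special import logsumexp

def crf_partial_loglik(log_psi, visible):
    """Log p(v|x) for a chain CRF when only some labels are observed.

    log_psi: array (T, K, K) of log-potentials log psi_t(y_{t-1}, y_t | x);
             by convention only row log_psi[0, 0, :] is used at t = 0.
    visible: length-T list; visible[t] is the observed label at time t,
             or None if that label is hidden.
    """
    T, K, _ = log_psi.shape

    def forward(constrain):
        # alpha[y] = log-sum of scores of all paths ending in state y
        alpha = log_psi[0, 0].copy()
        if constrain and visible[0] is not None:
            mask = np.full(K, -np.inf)
            mask[visible[0]] = 0.0
            alpha = alpha + mask           # clamp to the observed label
        for t in range(1, T):
            alpha = logsumexp(alpha[:, None] + log_psi[t], axis=0)
            if constrain and visible[t] is not None:
                mask = np.full(K, -np.inf)
                mask[visible[t]] = 0.0
                alpha = alpha + mask
        return logsumexp(alpha)

    # numerator sums over hidden labels h only; denominator is the full Z
    return forward(True) - forward(False)
```

With uniform (all-zero) log-potentials, observing m labels out of T over K states gives log p(v|x) = −m·log K, a quick sanity check for the recursion.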

Training with partially observed labels is performed using the Expectation‑Maximisation (EM) framework. In the E‑step the posterior distribution over hidden labels p(h|v,x; λʲ) is computed under the current parameters λʲ. The M‑step maximises a regularised lower bound Q(λʲ, λ)=∑ₕ p(h|v,x; λʲ) log p(v,h|x; λ)−½σ⁻²‖λ‖². Because both CRFs and MEMMs are log‑linear, closed‑form solutions are unavailable; the authors employ quasi‑Newton optimisation (e.g., L‑BFGS) for CRFs and a similar gradient‑based update for MEMMs. A Gaussian prior (σ) prevents over‑fitting when training data are scarce.
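The same EM-plus-quasi-Newton recipe can be illustrated on a simpler log-linear classifier without the chain structure, which keeps the sketch short while preserving the steps the paper describes: an E-step that forms posteriors over hidden labels under the current parameters, and an M-step that maximises a Gaussian-regularised expected log-likelihood with L-BFGS, since log-linear models admit no closed-form update. All names and the toy setup below are illustrative, not from the paper.

```python
import numpy as np
from scipy.optimize import minimize

def em_loglinear(X, y, n_classes, sigma2=1.0, n_iter=10):
    """EM for a log-linear classifier p(y|x) ∝ exp(x·lam_y) where some
    labels are hidden (marked y[i] = -1)."""
    n, d = X.shape
    lam = np.zeros((n_classes, d))

    def posteriors(lam):
        # E-step: p(h | v, x; lam), with visible labels clamped one-hot
        logits = X @ lam.T
        logits -= logits.max(axis=1, keepdims=True)
        q = np.exp(logits)
        q /= q.sum(axis=1, keepdims=True)
        vis = y >= 0
        q[vis] = 0.0
        q[vis, y[vis]] = 1.0
        return q

    for _ in range(n_iter):
        q = posteriors(lam)

        def neg_Q(flat):
            # M-step objective: -(expected log-lik - ||lam||^2 / (2 sigma^2))
            L = flat.reshape(n_classes, d)
            logits = X @ L.T
            m = logits.max(axis=1, keepdims=True)
            logZ = m.ravel() + np.log(np.exp(logits - m).sum(axis=1))
            ell = (q * logits).sum() - logZ.sum()
            return -(ell - 0.5 / sigma2 * (L ** 2).sum())

        res = minimize(neg_Q, lam.ravel(), method="L-BFGS-B")
        lam = res.x.reshape(n_classes, d)
    return lam
```

In the sequence models of the paper the E-step posteriors are themselves computed by forward-backward over the hidden positions, but the alternation between posterior computation and a gradient-based M-step is the same.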

Experiments were conducted in a video surveillance setting. A 4 × 6 m² dining‑kitchen area was monitored by two static cameras. The actor performed a series of primitive movements among six landmarks (door, cupboard, fridge, stove, TV chair, dining chair). Three activity scenarios—short meal, snack, and normal meal—were recorded, yielding multiple training and test video sequences. Labels consist of 12 primitive actions (e.g., “door → cupboard”). The authors varied the proportion of visible labels from 100 % down to 10 % and compared partially hidden CRF, MEMM, and the partially hidden HMM (PHMM) introduced in earlier work.

Results show that when labels are abundant all three models achieve high accuracy, but as the visible‑label ratio drops below 30 % the discriminative models outperform the PHMM by 8–12 % in F1 score. MEMM, with its contextual window and shared parameters, is particularly robust to severe label sparsity, while CRF benefits from richer feature flexibility at the cost of higher computational demand. The PHMM retains the ability to model p(x) when no labels are present, but its generative nature limits the exploitation of complex observation features.

Key insights include: (1) discriminative models allow arbitrary, possibly overlapping features from multiple sensor streams; (2) semi‑supervised EM training effectively leverages the small set of known labels; (3) MEMM’s parameter sharing and context windows provide a computationally efficient way to capture temporal dependencies; (4) when any label information is available, modeling p(y|x) directly yields superior segmentation performance compared with joint modeling p(y,x).
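Insight (1) and the sliding window Ωₜ from insight (3) can be made concrete with a small, purely hypothetical feature extractor: a discriminative model may condition each transition on overlapping raw observations around time t, something a generative HMM cannot do without modelling their joint distribution.

```python
def window_features(obs, t, width=2):
    """Sketch of overlapping contextual features: indicator features on
    every observation within +/- width of time t (names are illustrative)."""
    lo, hi = max(0, t - width), min(len(obs), t + width + 1)
    feats = {}
    for i in range(lo, hi):
        # e.g. "obs[-1]=fridge" means the previous observation was 'fridge'
        feats[f"obs[{i - t:+d}]={obs[i]}"] = 1.0
    return feats
```

Each returned key would correspond to one feature function fₖ paired with a learned weight λₖ; the features overlap heavily across adjacent time steps, which is unproblematic for CRFs and MEMMs.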

The work demonstrates that partially hidden CRFs and MEMMs are practical for activity recognition in smart environments where labeling effort is limited. Future directions suggested by the authors involve extending the approach to multi‑sensor fusion, online learning, and interactive labeling interfaces to further reduce annotation cost while maintaining high segmentation quality.

