Deciding of HMM parameters based on number of critical points for gesture recognition from motion capture data


This paper presents a method of choosing the number of states of an HMM based on the number of critical points in the motion capture data. The choice of Hidden Markov Model (HMM) parameters is crucial for the recognizer's performance, as it is the first step of training and cannot be corrected automatically within the HMM framework. In this article we define a predictor of the number of states based on the number of critical points of the sequence and test its effectiveness against sample data.


💡 Research Summary

The paper addresses a fundamental problem in hidden Markov model (HMM) based gesture recognition: how to choose the number of hidden states (n) before training. Conventional practice relies on exhaustive search over a range of n values, evaluating each model with criteria such as Akaike Information Criterion (AIC) or Bayesian Information Criterion (BIC). This approach is computationally expensive, especially when dealing with large motion‑capture datasets and real‑time applications.
The authors propose a data‑driven estimator that derives n directly from the structural properties of the input sequences. Specifically, they treat each sensor channel of a motion‑capture glove as a time series, normalize it, and count its “critical points” – the two boundary points, all local maxima, and all local minima. The total count cp = (#local maxima) + (#local minima) + 2 is taken as a proxy for the intrinsic complexity of the signal. Three variants are examined: (i) cp (all points), (ii) cp‑2 (excluding the two boundaries), and (iii) cp‑1 (interpreted as the number of trends).
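The critical-point count described above can be sketched in a few lines. This is a minimal illustration, assuming the strict-inequality comparison with both immediate neighbours (neighbourhood size γ = 1) described in the evaluation pipeline; the function name is our own.

```python
import numpy as np

def count_critical_points(x):
    """Count critical points of a 1-D signal: all local maxima,
    all local minima, plus the two boundary points.

    With neighbourhood size 1, a point is a local maximum (minimum)
    if it is strictly greater (smaller) than both immediate neighbours.
    """
    x = np.asarray(x, dtype=float)
    maxima = sum(1 for t in range(1, len(x) - 1)
                 if x[t] > x[t - 1] and x[t] > x[t + 1])
    minima = sum(1 for t in range(1, len(x) - 1)
                 if x[t] < x[t - 1] and x[t] < x[t + 1])
    # cp = maxima + minima + 2; the paper's other variants are
    # cp - 2 (interior extrema only) and cp - 1 (number of trends).
    return maxima + minima + 2

print(count_critical_points([0.0, 1.0, 0.5, 2.0, 1.5]))  # → 5
```

The returned `cp` (or `cp - 2` / `cp - 1`) is then used directly as the candidate number of hidden states `n` for that sequence.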
To evaluate the estimator, the authors follow a multi‑step pipeline:

  1. Resampling and Normalization – each raw sensor vector is interpolated to a fixed length M = 64 and standardized (zero mean, unit variance).
  2. Critical‑Point Detection – with a neighbourhood size γ = 1, a point is declared a local maximum (minimum) if it exceeds (is lower than) both immediate neighbours. The two ends are always counted as critical points.
  3. Clustering (Quantization) – the normalized values are quantized using k‑means clustering into c discrete symbols, where c ranges from 4 to 11. This yields a symbolic observation sequence suitable for discrete‑state HMMs.
  4. HMM Training – for each combination of sensor (j), gesture class (i), cluster count (c), and candidate state number n (also ranging from 4 to 11), a separate HMM λ(Fijc, n) is trained with the Baum‑Welch algorithm.
  5. Model Evaluation – the log‑likelihood of each trained model on its training sequences is computed, and AIC is calculated as AIC = –2·log‑likelihood + 2·q, where q = n² reflects the number of free parameters in the transition matrix (emission matrix size is ignored for simplicity).
  6. Performance Metric ξ – for each dataset Fijc, the authors compute AICmin (the smallest AIC across all n), AICmax (the largest), and AICcp (AIC for the state number suggested by the critical‑point predictor). The normalized distance ξ = (AICcp – AICmin) / (AICmax – AICmin) lies in [0, 1] by construction: values near 0 mean the predicted state number is close to the AIC‑optimal choice, while values near 1 mean it is close to the worst candidate.
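Steps 1, 5, and 6 of the pipeline above can be sketched as follows. This is a simplified illustration with hypothetical AIC values; the k‑means quantization (step 3) and Baum‑Welch training (step 4) are omitted, and all function names are our own.

```python
import numpy as np

def preprocess(raw, M=64):
    """Step 1: resample a raw sensor vector to fixed length M by linear
    interpolation, then standardize to zero mean and unit variance."""
    t_old = np.linspace(0.0, 1.0, len(raw))
    t_new = np.linspace(0.0, 1.0, M)
    x = np.interp(t_new, t_old, raw)
    return (x - x.mean()) / x.std()

def aic(log_likelihood, n):
    """Step 5: AIC = -2*logL + 2*q, with q = n**2 free parameters
    (transition matrix only; emission parameters are ignored)."""
    return -2.0 * log_likelihood + 2.0 * n ** 2

def xi(aic_by_n, n_cp):
    """Step 6: normalized distance of the predictor's AIC from the
    best AIC over all candidate state numbers n."""
    values = list(aic_by_n.values())
    a_min, a_max = min(values), max(values)
    return (aic_by_n[n_cp] - a_min) / (a_max - a_min)

# Hypothetical AIC values for candidate state counts n = 4..7:
aics = {4: 120.0, 5: 100.0, 6: 110.0, 7: 140.0}
print(xi(aics, n_cp=6))  # → 0.25: the predictor is near-optimal here
```

A ξ of 0 would mean the critical-point predictor picked exactly the AIC-optimal state number, so averaging ξ over all sensor/gesture/cluster combinations measures how much is lost by skipping the exhaustive search.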
