Online EM Algorithm for Hidden Markov Models


Online (also called “recursive” or “adaptive”) estimation of fixed model parameters in hidden Markov models is a topic of much interest in time series modelling. In this work, we propose an online parameter estimation algorithm that combines two key ideas. The first one, which is deeply rooted in the Expectation-Maximization (EM) methodology, consists in reparameterizing the problem using complete-data sufficient statistics. The second ingredient consists in exploiting a purely recursive form of smoothing in HMMs based on an auxiliary recursion. Although the proposed online EM algorithm resembles a classical stochastic approximation (or Robbins-Monro) algorithm, it is sufficiently different to resist conventional analysis of convergence. We thus provide limited results which identify the potential limiting points of the recursion as well as the large-sample behavior of the quantities involved in the algorithm. The performance of the proposed algorithm is numerically evaluated through simulations in the case of a noisily observed Markov chain. In this case, the algorithm reaches estimation results that are comparable to those of the maximum likelihood estimator for large sample sizes.


💡 Research Summary

The paper tackles the problem of estimating fixed parameters of hidden Markov models (HMMs) in an online (recursive or adaptive) fashion, a setting that is increasingly relevant for real‑time time‑series applications. Classical Expectation‑Maximization (EM) for HMMs is inherently batch‑oriented: it requires the whole observation sequence to compute the forward‑backward smoothing and then update the parameters. This makes it unsuitable for streaming data where storage and latency constraints are critical.

To overcome this limitation, the authors combine two well‑known ideas in a novel way. First, they re‑parameterize the problem using the sufficient statistics of the complete data (the joint hidden‑state/observation sequence). In an HMM these statistics are simply the expected counts of state transitions and the expected sufficient statistics of the emission distributions. By working directly with these quantities, the M‑step of EM reduces to elementary normalizations (e.g., transition probability = expected transition count / total outgoing count).
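To make the normalization concrete, here is a minimal sketch for a two-state discrete HMM; the matrix values are invented for illustration and are not taken from the paper:

```python
import numpy as np

# Hypothetical expected transition counts S[i, j] for a two-state chain
# (illustrative values, not from the paper).
S = np.array([[4.0, 1.0],
              [2.0, 3.0]])

# M-step: transition probability = expected transition count / total outgoing
# count, i.e. each row of the estimated transition matrix sums to one.
A_hat = S / S.sum(axis=1, keepdims=True)
```

With sufficient statistics in this form, no iterative optimization is needed in the M-step; the same row-normalization idea carries over to the emission parameters via their own expected statistics.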

Second, they introduce a purely recursive smoothing scheme based on an auxiliary recursion. Instead of the classic forward‑backward algorithm, which needs a backward pass over the entire data, the auxiliary recursion updates the expected transition counts “on the fly” using only the current forward messages and a set of auxiliary variables that capture the contribution of the most recent observation. The recursion can be written as

 α_t(i) = Σ_j α_{t-1}(j) a_{ji} b_i(y_t)

 ξ_t(i,j) = α_{t-1}(i) a_{ij} b_j(y_t) / Σ_{i',j'} α_{t-1}(i') a_{i'j'} b_{j'}(y_t)

where α_t are the forward probabilities, b_i(y_t) the emission likelihood, a_{ij} the transition matrix, and ξ_t(i,j) the conditional expectation of a transition from state i to j at time t. The ξ_t values are then incorporated into an exponential‑moving‑average update of the sufficient statistics:

 S_t = (1-γ_t) S_{t-1} + γ_t ξ_t

with a learning‑rate schedule γ_t that satisfies Σγ_t = ∞ and Σγ_t² < ∞ (e.g., γ_t = t^{-α}, 0.5 < α ≤ 1).
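The recursion and the moving-average update above can be sketched as a single per-observation step. This is an illustrative implementation, not the paper's code: the function name and interface are assumptions, and a full algorithm would also accumulate emission statistics and periodically re-run the M-step from S_t:

```python
import numpy as np

def online_em_step(alpha_prev, S_prev, A, B_col, gamma_t):
    """One online EM update for a discrete HMM (illustrative sketch).

    alpha_prev : normalized forward probabilities at time t-1, shape (N,)
    S_prev     : running estimate of the transition statistics, shape (N, N)
    A          : current transition matrix estimate, A[i, j] = a_ij
    B_col      : emission likelihoods b_j(y_t) for the new observation, shape (N,)
    gamma_t    : step size, e.g. t ** -0.7
    """
    # Unnormalized joint weight of (state i at t-1, state j at t).
    joint = alpha_prev[:, None] * A * B_col[None, :]
    xi = joint / joint.sum()            # xi_t(i, j), sums to one

    alpha = joint.sum(axis=0)           # forward recursion
    alpha /= alpha.sum()                # renormalize to avoid underflow

    # Stochastic-approximation (moving-average) update of the statistics.
    S = (1.0 - gamma_t) * S_prev + gamma_t * xi
    return alpha, S
```

Note that normalizing α at every step leaves ξ_t unchanged, since the normalization constant cancels in the ratio defining ξ_t.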

The resulting algorithm looks similar to a stochastic approximation (Robbins‑Monro) scheme, but the dependence of the auxiliary recursion on the current parameter estimate makes the standard convergence proofs inapplicable. The authors therefore provide limited theoretical results: they show that any limit point of the recursion must satisfy the stationary equations of the EM algorithm (i.e., it is a stationary point of the likelihood), and that under the usual step‑size conditions the sufficient‑statistics estimates converge almost surely to their expected values. Moreover, in the large‑sample regime the parameter estimates are asymptotically normal with the same covariance as the batch maximum‑likelihood estimator, indicating that the online method retains statistical efficiency.

Empirical validation is performed on a simple noisily observed Markov chain: a three‑state Markov chain with Gaussian emissions corrupted by additive noise. The authors compare three methods—batch EM, the proposed online EM, and a direct maximum‑likelihood estimator obtained by numerical optimization—across data lengths ranging from 10³ to 10⁶ observations. Performance metrics include the mean‑squared error (MSE) of the estimated transition matrix and the log‑likelihood on a held‑out test set. Results show that for sample sizes larger than about 10⁵ the online EM achieves MSE and log‑likelihood virtually indistinguishable from those of batch EM and the true MLE, while using only O(N²) memory (N = number of hidden states) and O(N²) per‑time‑step computation. The learning‑rate exponent α = 0.7 yields a good trade‑off between early‑stage adaptation and asymptotic stability.
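Data of the kind used in this experiment can be generated as follows. All numerical values here (transition matrix, state levels, noise level, sequence length) are invented for illustration and are not the paper's actual settings:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative three-state chain observed in additive Gaussian noise
# (values are assumptions, not the paper's configuration).
A_true = np.array([[0.8, 0.1, 0.1],
                   [0.1, 0.8, 0.1],
                   [0.1, 0.1, 0.8]])
means = np.array([0.0, 1.0, 2.0])   # state-dependent signal levels
sigma = 0.5                          # observation noise std. dev.

T = 1000
states = np.empty(T, dtype=int)
states[0] = rng.integers(3)
for t in range(1, T):
    # Draw the next state from the row of A_true for the current state.
    states[t] = rng.choice(3, p=A_true[states[t - 1]])

obs = means[states] + sigma * rng.standard_normal(T)
```

Feeding such a stream one observation at a time into the online recursion, and comparing the final parameter estimates against batch EM on the same data, reproduces the style of comparison reported in the paper.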

In summary, the paper makes four main contributions:

  1. Sufficient‑statistics re‑parameterization that simplifies the M‑step of online EM for HMMs.
  2. Auxiliary‑recursion‑based recursive smoothing, enabling true online updates without a backward pass.
  3. A convergence analysis that, although limited, identifies the set of possible limiting points and establishes asymptotic normality under standard step‑size conditions.
  4. Extensive simulation evidence that the method attains statistical efficiency comparable to batch EM while dramatically reducing memory and latency.

The authors acknowledge that the current theoretical guarantees are modest and that extensions to more complex HMMs (e.g., continuous hidden states, higher‑dimensional emissions, non‑stationary dynamics) remain open problems. Future work could aim at stronger convergence proofs (e.g., almost‑sure convergence to a global maximum under additional regularity), adaptive step‑size schemes, and real‑world applications such as high‑frequency finance, online speech recognition, or sensor‑network monitoring.

