The adjusted Viterbi training for hidden Markov models
The EM procedure is a principal tool for parameter estimation in hidden Markov models. However, applications often replace EM by Viterbi extraction, or training (VT). VT is computationally less intensive, more stable, and has more intuitive appeal, but VT estimation is biased and does not satisfy the following fixed-point property: hypothetically, given an infinitely large sample and initialized at the true parameters, VT will generally move away from the initial values. We propose adjusted Viterbi training (VA), a new method to restore the fixed-point property and thus alleviate the overall imprecision of the VT estimators, while preserving the computational advantages of the baseline VT algorithm. Simulations elsewhere have shown that VA appreciably improves the precision of estimation in both the special case of mixture models and more general HMMs. However, being entirely analytic, the VA correction relies on infinite Viterbi alignments and associated limiting probability distributions. While explicit in the mixture case, the existence of these limiting measures is not obvious for more general HMMs. This paper proves that, under certain mild conditions, the required limiting distributions for general HMMs do exist.
💡 Research Summary
The paper addresses a fundamental limitation of Viterbi training (VT) for hidden Markov models (HMMs): unlike the Expectation‑Maximization (EM) algorithm, VT does not possess the fixed‑point property, meaning that even with an infinite amount of data and initialization at the true parameters, the VT iteration will typically move away from the true values. This deficiency stems from the fact that the empirical measures induced by a Viterbi alignment converge not to the true emission distributions but to certain limiting probability measures $Q_l(\psi)$ that depend on the current parameter guess $\psi$. Consequently, the state‑specific maximum‑likelihood estimates $\hat\mu_n^l$ converge to $\mu_l(\psi)=\arg\max_{\theta'}\int \log f_l(x;\theta')\,Q_l(dx;\psi)$, which generally differs from the true emission parameter $\theta_l$. Likewise, the estimated transition probabilities converge to limits $q_{ij}(\psi)$ that differ from the true transition matrix $P$.
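In the mixture special case this bias can be computed in closed form. The following minimal Python sketch (my own illustration, not code from the paper; the equal-weight, unit-variance two-component Gaussian mixture, the midpoint decision boundary, and the function names are all assumptions made for this example) evaluates the limiting hard-assignment mean when the current guess equals the truth:

```python
import math

def Phi(x):  # standard normal cdf
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def phi(x):  # standard normal pdf
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def hard_assignment_mean_limit(mu0, mu1):
    """Limiting 'component 1' sample mean under hard (Viterbi-style) assignment
    for the equal-weight mixture 0.5*N(mu0,1) + 0.5*N(mu1,1): every observation
    above the midpoint c = (mu0 + mu1)/2 is labelled component 1."""
    c = 0.5 * (mu0 + mu1)
    def mass_and_mean_above(mu):
        # P(X > c) and E[X | X > c] for X ~ N(mu, 1)
        a = c - mu
        mass = 1.0 - Phi(a)
        return mass, mu + phi(a) / mass
    m1, e1 = mass_and_mean_above(mu1)
    m0, e0 = mass_and_mean_above(mu0)
    return (m1 * e1 + m0 * e0) / (m1 + m0)

# Initialized at the true means (-1, 1), the hard-assignment estimate of the
# second mean converges to roughly 1.167, not to 1.
limit = hard_assignment_mean_limit(-1.0, 1.0)
```

This is the quantity $\mu_1(\psi)$ of the summary, evaluated at $\psi$ equal to the truth; the gap between it and $\theta_1 = 1$ is exactly the kind of bias VA is designed to remove.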
To remedy this, the authors propose adjusted Viterbi training (VA). VA retains the computational simplicity of VT (only a single Viterbi alignment per iteration) but adds an analytic correction based on the limiting measures $Q_l(\psi)$ and the limits $q_{ij}(\psi)$. By explicitly characterizing the biases $\mu_l(\psi)-\theta_l$ and $q_{ij}(\psi)-p_{ij}$, VA modifies the VT update rules so that the true parameters become an asymptotic fixed point of the algorithm, thereby restoring the desirable statistical properties of EM while preserving VT's speed.
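The restored fixed point can be checked numerically in the same mixture setting. The sketch below (again my own illustration under an assumed symmetric equal-weight mixture $0.5\,N(-\mu,1)+0.5\,N(\mu,1)$; `vt_limit`, the sampling setup, and the one-step update are not taken from the paper) runs one hard-assignment VT update from the truth on simulated data and then subtracts the analytically computed bias at the current guess:

```python
import math
import random

def Phi(x):  # standard normal cdf
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def phi(x):  # standard normal pdf
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def vt_limit(mu):
    """Limiting 'positive-state' sample mean under hard assignment at boundary 0
    for the symmetric mixture 0.5*N(-mu,1) + 0.5*N(mu,1)."""
    m_hi, m_lo = Phi(mu), Phi(-mu)       # masses above 0 of each component
    e_hi = mu + phi(mu) / m_hi           # E[X | X > 0] for N(mu, 1)
    e_lo = -mu + phi(mu) / m_lo          # E[X | X > 0] for N(-mu, 1)
    return (m_hi * e_hi + m_lo * e_lo) / (m_hi + m_lo)

random.seed(0)
truth = 1.0
data = [random.gauss(random.choice((-truth, truth)), 1.0) for _ in range(200_000)]

guess = truth                                 # initialize at the true parameter
assigned = [x for x in data if x > 0]         # hard (Viterbi-style) assignment
vt_estimate = sum(assigned) / len(assigned)   # plain VT update: drifts above 1
bias = vt_limit(guess) - guess                # analytic bias at the current guess
va_estimate = vt_estimate - bias              # adjusted update: stays near the truth
```

Starting at the truth, the uncorrected VT update moves the estimate upward, while the adjusted update remains near 1, which is the asymptotic fixed-point behavior the summary describes.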
A major theoretical contribution of the paper is the rigorous proof that the required limiting measures exist for general HMMs under mild conditions (finite state space, irreducible and aperiodic transition matrix, and sufficiently regular emission densities). The authors introduce the concepts of "nodes" and "no‑node intervals". A node is a time point at which the observation forces a unique optimal state in the Viterbi path, independent of the surrounding data. They show that, almost surely, an infinite sequence of such nodes occurs, providing regeneration points for the Viterbi alignment process. By partitioning the observation sequence at these regeneration points, the Viterbi path becomes a regenerative process: each block between successive nodes is independent and identically distributed. This regenerative structure yields the weak convergence of the empirical measures $\hat P_n^l$ to $Q_l(\psi)$ and the almost sure convergence of the transition estimates to $q_{ij}(\psi)$.
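The regeneration idea can be made concrete by watching survivor paths coalesce in a Viterbi trellis. The toy sketch below is my own Python example, not from the paper: the 2-state HMM, its parameters, and `viterbi_coalescence_times` are invented for illustration, and path coalescence is a weaker, directly computable cousin of the paper's node condition. It finds times after which the past alignment is fixed no matter what is observed later:

```python
import math

# Toy 2-state HMM over symbols {0, 1} (hypothetical parameters)
A = [[0.9, 0.1], [0.2, 0.8]]    # transition probabilities
B = [[0.7, 0.3], [0.1, 0.9]]    # emission probabilities
pi = [0.5, 0.5]                 # initial distribution

def viterbi_coalescence_times(obs):
    """Run the Viterbi recursion, tracking the best path ('survivor') ending in
    each state; report times t at which all survivors share the same prefix,
    so the alignment of times 0..t-1 can no longer change."""
    n = len(A)
    delta = [math.log(pi[s]) + math.log(B[s][obs[0]]) for s in range(n)]
    survivors = [[s] for s in range(n)]
    coalesced = []
    for t in range(1, len(obs)):
        new_delta, new_surv = [], []
        for s in range(n):
            scores = [delta[r] + math.log(A[r][s]) for r in range(n)]
            best = max(range(n), key=lambda r: scores[r])
            new_delta.append(scores[best] + math.log(B[s][obs[t]]))
            new_surv.append(survivors[best] + [s])
        delta, survivors = new_delta, new_surv
        k = t  # survivors have length t+1; compare the states at times 0..t-1
        if all(surv[:k] == survivors[0][:k] for surv in survivors):
            coalesced.append(t)
    return coalesced

obs = [0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1]
times = viterbi_coalescence_times(obs)
```

Such coalescence times play the role the summary assigns to nodes: they cut the observation sequence into blocks whose alignments are decided independently of the future, which is what makes the regenerative-process argument work.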
The paper also clarifies and extends prior work that had either restricted attention to the two‑state case or made unjustified assumptions about the existence of “special columns”. By formulating general node conditions, the authors provide sufficient criteria that hold for a broad class of HMMs, correcting earlier misconceptions.
Simulation studies presented in the paper compare EM, standard VT, and the proposed VA on both mixture models and more general HMMs. The results demonstrate that VA achieves estimation accuracy comparable to EM while requiring only marginally more computation than VT. Biases observed in VT are substantially reduced, confirming the theoretical predictions.
In conclusion, Adjusted Viterbi training offers a practically viable alternative to EM for large‑scale or real‑time applications (e.g., streaming audio/video, bio‑informatics) where computational resources are limited. The paper’s rigorous treatment of infinite Viterbi alignments and the associated limiting distributions not only underpins VA but also enriches the theoretical understanding of Viterbi decoding in stochastic processes.