Adjusted Viterbi training for hidden Markov models
To estimate the emission parameters in hidden Markov models one commonly uses the EM algorithm or one of its variants. Our primary motivation, however, is the Philips speech recognition system, in which the EM algorithm is replaced by the Viterbi training algorithm. Viterbi training is faster and computationally less involved than EM, but it is also biased and need not even be consistent. We propose an alternative to Viterbi training, called adjusted Viterbi training, that has the same order of computational complexity as Viterbi training but gives more accurate estimators. Elsewhere, we studied adjusted Viterbi training for a special case of mixtures, supporting the theory by simulations. This paper shows that adjusted Viterbi training is also possible for more general hidden Markov models.
💡 Research Summary
The paper addresses a long‑standing practical problem in hidden Markov model (HMM) parameter estimation: the trade‑off between the statistical efficiency of the Expectation‑Maximization (EM) algorithm and the computational speed of Viterbi training (VT). While VT is attractive for real‑time applications such as the Philips speech‑recognition system because it replaces the costly E‑step with a single Viterbi decoding, it suffers from systematic bias and can even be inconsistent, as the most likely state path does not represent the true posterior distribution over hidden states.
To remedy this, the authors propose Adjusted Viterbi Training (AVT), an algorithm that retains the O(T·|S|²) complexity of VT but adds a correction term accounting for the discrepancy between the Viterbi path and the full posterior. The method proceeds in two stages at each iteration k: (1) compute the Viterbi alignment \(\hat S^{V}\) under the current parameters \(\theta^{(k)}\); (2) evaluate expected sufficient statistics (transition counts \(T_{ij}\) and emission statistics) conditioned on the observed sequence using the forward‑backward algorithm, and incorporate the difference between these expectations and the raw counts obtained from \(\hat S^{V}\) into a modified objective function.
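The two-stage idea above can be illustrated on a toy discrete-emission HMM. The sketch below is not the paper's algorithm: the exact correction term of adjusted Viterbi training is not reproduced here. It only computes the two quantities the adjustment mediates between, namely the hard state counts from the Viterbi path and the soft expected counts from the forward‑backward posteriors; all model parameters and helper names (`viterbi`, `forward_backward`) are hypothetical.

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Most likely state path for a discrete-emission HMM (log-space)."""
    T, S = len(obs), len(pi)
    logd = np.log(pi) + np.log(B[:, obs[0]])
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        # scores[i, j] = best log-prob of ending in i, then moving to j
        scores = logd[:, None] + np.log(A)
        back[t] = scores.argmax(axis=0)
        logd = scores.max(axis=0) + np.log(B[:, obs[t]])
    path = np.zeros(T, dtype=int)
    path[-1] = logd.argmax()
    for t in range(T - 2, -1, -1):
        path[t] = back[t + 1, path[t + 1]]
    return path

def forward_backward(obs, pi, A, B):
    """Posterior state probabilities gamma[t, i] = P(S_t = i | obs)."""
    T, S = len(obs), len(pi)
    alpha = np.zeros((T, S))
    beta = np.zeros((T, S))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)
    return gamma

# Toy 2-state, 3-symbol HMM (all numbers hypothetical).
obs = np.array([0, 0, 1, 2, 2, 1, 0])
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])

path = viterbi(obs, pi, A, B)
gamma = forward_backward(obs, pi, A, B)

# Hard (Viterbi) vs. soft (posterior) state-occupancy counts: the gap
# between these two is the bias that adjusted Viterbi training corrects.
hard_counts = np.bincount(path, minlength=2)
soft_counts = gamma.sum(axis=0)
```

The unscaled forward-backward recursion is fine for short sequences like this one; real implementations rescale `alpha` and `beta` (or work in log-space) to avoid underflow on long observation sequences.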