Learning Hidden Markov Models using Non-Negative Matrix Factorization

Reading time: 5 minutes

📝 Original Info

  • Title: Learning Hidden Markov Models using Non-Negative Matrix Factorization
  • ArXiv ID: 0809.4086
  • Date: 2011-01-11
  • Authors: George Cybenko, Fellow, IEEE (Dartmouth College, Thayer School of Engineering); Valentino Crespi, Member, IEEE (California State University, Los Angeles)

📝 Abstract

The Baum-Welch algorithm together with its derivatives and variations has been the main technique for learning Hidden Markov Models (HMM) from observational data. We present an HMM learning algorithm based on the non-negative matrix factorization (NMF) of higher order Markovian statistics that is structurally different from the Baum-Welch and its associated approaches. The described algorithm supports estimation of the number of recurrent states of an HMM and iterates the non-negative matrix factorization (NMF) algorithm to improve the learned HMM parameters. Numerical examples are provided as well.


📄 Full Content

arXiv:0809.4086v2 [cs.LG] 8 Jan 2011. Submitted to IEEE Transactions on Information Theory, September 2008.

Learning Hidden Markov Models using Non-Negative Matrix Factorization
George Cybenko, Fellow, IEEE, and Valentino Crespi, Member, IEEE

Abstract—The Baum-Welch algorithm together with its derivatives and variations has been the main technique for learning Hidden Markov Models (HMM) from observational data. We present an HMM learning algorithm based on the non-negative matrix factorization (NMF) of higher order Markovian statistics that is structurally different from the Baum-Welch and its associated approaches. The described algorithm supports estimation of the number of recurrent states of an HMM and iterates the non-negative matrix factorization (NMF) algorithm to improve the learned HMM parameters. Numerical examples are provided as well.

Index Terms—Hidden Markov Models, machine learning, non-negative matrix factorization.

I. INTRODUCTION

Hidden Markov Models (HMM) have been successfully used to model stochastic systems arising in a variety of applications ranging from biology to engineering to finance [1], [2], [3], [4], [5], [6]. Following accepted notation for representing the parameters and structure of HMMs (see [7], [8], [9], [1], [10] for example), we will use the following terminology and definitions:

1) N is the number of states of the Markov chain underlying the HMM. The state space is S = {S_1, ..., S_N} and the system's state process at time t is denoted by x_t.

2) M is the number of distinct observables or symbols generated by the HMM. The set of possible observables is V = {v_1, ..., v_M} and the observation process at time t is denoted by y_t. We denote by y_{t1}^{t2} the subprocess y_{t1} y_{t1+1} ... y_{t2}.

3) The joint probabilities

   a_{ij}(k) = P(x_{t+1} = S_j, y_{t+1} = v_k | x_t = S_i)

   are the time-invariant probabilities of transitioning to state S_j at time t + 1 and emitting observation v_k given that at time t the system was in state S_i.
   Observation v_k is emitted during the transition from state S_i to state S_j. We use A(k) = (a_{ij}(k)) to denote the matrix of state transition probabilities that emit the same symbol v_k. Note that A = Σ_k A(k) is the stochastic matrix representing the Markov chain state process x_t.

4) The initial state distribution, at time t = 1, is given by Γ = {γ_1, ..., γ_N} where γ_i = P(x_1 = S_i) ≥ 0 and Σ_i γ_i = 1.

(G. Cybenko is with the Thayer School of Engineering, Dartmouth College, Hanover, NH 03755 USA, e-mail: gvc@dartmouth.edu. V. Crespi is with the Department of Computer Science, California State University at Los Angeles, LA, CA 90032 USA, e-mail: vcrespi@calstatela.edu. Manuscript submitted September 2008.)

Collectively, the matrices A(k) and Γ completely define the HMM, and we say that a model for the HMM is λ = ({A(k) | 1 ≤ k ≤ M}, Γ).

We present an algorithm for learning an HMM from single or multiple observation sequences. The traditional approach for learning an HMM is the Baum-Welch algorithm [1], which has been extended in a variety of ways by others [11], [12], [13]. Recently, a novel and promising approach to the HMM approximation problem was proposed by Finesso et al. [14]. That approach is based on Anderson's HMM stochastic realization technique [15], which demonstrates that a positive factorization of a certain Hankel matrix (consisting of observation string probabilities) can be used to recover the hidden Markov model's probability matrices. Finesso and his coauthors used recently developed non-negative matrix factorization (NMF) algorithms [16] to express those stochastic realization techniques as an operational algorithm. Earlier ideas in that vein were anticipated by Upper in 1997 [17], although that work did not benefit from HMM stochastic realization techniques or NMF algorithms, both of which were developed after 1997.
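The parameterization above — joint transition/emission matrices A(k) whose sum is a row-stochastic matrix A, plus an initial distribution Γ — can be sketched directly in code. This is an illustrative toy model with made-up sizes (N = 3 states, M = 2 symbols), not anything from the paper; it only demonstrates the model λ = ({A(k)}, Γ) and how a sequence is generated from it.

```python
import numpy as np

# Sketch of the HMM parameterization in the text (illustrative values):
# A[k, i, j] = a_{ij}(k) = P(x_{t+1} = S_j, y_{t+1} = v_k | x_t = S_i),
# so summing A(k) over k must give a row-stochastic matrix A.

rng = np.random.default_rng(0)
N, M = 3, 2  # hypothetical numbers of states and symbols

# Random joint probabilities, normalized so each row i satisfies
# sum over (k, j) of A[k, i, j] = 1.
A = rng.random((M, N, N))
A /= A.sum(axis=(0, 2), keepdims=True)

Gamma = np.full(N, 1.0 / N)  # uniform initial state distribution

def sample(A, Gamma, T, rng):
    """Draw an observation sequence y_1 .. y_T from the model ({A(k)}, Gamma)."""
    M, N, _ = A.shape
    x = rng.choice(N, p=Gamma)
    ys = []
    for _ in range(T):
        # Joint distribution over (symbol k, next state j) given current x.
        joint = A[:, x, :].ravel()        # shape (M * N,), sums to 1
        idx = rng.choice(M * N, p=joint)
        k, x = divmod(idx, N)             # symbol emitted, next state
        ys.append(k)
    return ys

obs = sample(A, Gamma, 20, rng)
```

Note that each observation is emitted on a transition, matching the convention in item 3) rather than a state-emission convention.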
Methods based on stochastic realization techniques, including the one presented here, are fundamentally different from Baum-Welch based methods in that the algorithms take observation sequence probabilities as input, as opposed to raw observation sequences. Anderson's and Finesso's approaches use system realization methods, while our algorithm is in the spirit of the Myhill-Nerode construction [18] for building automata from languages. In the Myhill-Nerode construction, states are defined as equivalence classes of pasts which produce the same futures. In an HMM, the "future" of a state is a probability distribution over future observations. Following this intuition, we derive our result in a manner that appears comparatively more concise and elementary than the aforementioned approaches by Anderson and Finesso.

At a conceptual level, our algorithm operates as follows. We first estimate the matrix of an observation sequence's high order statistics. This matrix has a natural non-negative matrix factorization (NMF) [16] which can be interpreted in terms of the probability distribution of future observations given the current state of the underlying
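The conceptual step described above can be sketched as follows: estimate a matrix of higher-order statistics (here, empirical "past word vs. future word" co-occurrence frequencies) from a symbol sequence, then compute a rank-r non-negative factorization of it. The statistics matrix chosen here and the Lee-Seung multiplicative update rule are generic NMF machinery used for illustration, under assumed word lengths and a guessed rank — not the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical observation sequence over M = 2 symbols.
M, L = 2, 5000
seq = rng.integers(0, M, size=L)

# Empirical joint statistics of length-2 past and future words:
# C[u, v] ~ P(y_t y_{t+1} = word u, y_{t+2} y_{t+3} = word v).
C = np.zeros((M * M, M * M))
for t in range(L - 3):
    u = seq[t] * M + seq[t + 1]
    v = seq[t + 2] * M + seq[t + 3]
    C[u, v] += 1.0
C /= C.sum()

# Rank-r NMF C ~= W @ H via Lee-Seung multiplicative updates,
# which keep W and H entrywise non-negative at every step.
r = 2  # guessed number of hidden states
W = rng.random((M * M, r))
H = rng.random((r, M * M))
eps = 1e-12  # guards against division by zero
for _ in range(500):
    H *= (W.T @ C) / (W.T @ W @ H + eps)
    W *= (C @ H.T) / (W @ (H @ H.T) + eps)

err = np.linalg.norm(C - W @ H)
```

In the paper's interpretation, the columns of one factor correspond to distributions over futures conditioned on hidden states, which is what makes the inner dimension r an estimate of the number of recurrent states.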

…(Full text truncated)…

Reference

This content is AI-processed based on ArXiv data.
