Estimating Discrete Markov Models From Various Incomplete Data Schemes

The parameters of a discrete stationary Markov model are the transition probabilities between states. Traditionally, data consist of sequences of observed states for a given number of individuals over the whole observation period. In such a case, the transition probabilities are straightforwardly estimated by counting one-step moves from a given state to another. In many real-life problems, however, inference is much more difficult because state sequences are not fully observed: the state of each individual is known only at some given values of the time variable. A review of the problem is given, focusing on Markov chain Monte Carlo (MCMC) algorithms to perform Bayesian inference and evaluate posterior distributions of the transition probabilities in this missing-data framework. Exploiting the dependence between the rows of the transition matrix, an adaptive MCMC mechanism that accelerates the classical Metropolis-Hastings algorithm is then proposed and empirically studied.


💡 Research Summary

This paper addresses the problem of estimating the transition probabilities of a discrete‑time, stationary Markov chain when the state trajectories are only partially observed. In the classical setting, each individual’s full sequence of states over the observation window is available, and the maximum‑likelihood estimator reduces to simple one‑step counts. Real‑world applications—such as longitudinal medical studies, ecological monitoring, or social‑science surveys—often provide only sporadic snapshots of each subject’s state at a limited set of time points. Consequently, the likelihood involves latent paths that must be integrated out, making direct estimation infeasible.
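In the fully observed setting described above, the maximum-likelihood estimator is just the matrix of normalized one-step counts. A minimal sketch in Python (function and variable names are ours, not the paper's):

```python
import numpy as np

def mle_transition_matrix(sequences, K):
    """Maximum-likelihood estimate of a K-state transition matrix
    from fully observed state sequences (states coded 0..K-1)."""
    counts = np.zeros((K, K))
    for seq in sequences:
        # Count every observed one-step move a -> b.
        for a, b in zip(seq[:-1], seq[1:]):
            counts[a, b] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Avoid division by zero for states never left.
    row_sums[row_sums == 0] = 1.0
    return counts / row_sums
```

With partially observed paths this simple counting no longer applies, which is what motivates the Bayesian machinery below.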

The authors adopt a Bayesian framework. Each row of the transition matrix $P$ is assigned an independent Dirichlet prior, which naturally respects the simplex constraint (row sums equal one). The observed data consist of tuples $(i, t_{ij}, x_i(t_{ij}))$ indicating that subject $i$ was seen in state $x_i(t_{ij})$ at time $t_{ij}$. Missing intervals between observed times are treated as latent sequences. The posterior distribution of $P$ given the incomplete data is analytically intractable, so the paper develops Markov chain Monte Carlo (MCMC) algorithms to draw samples from it.
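The Dirichlet prior is conjugate: given complete-data transition counts, each row's posterior is again Dirichlet. A hedged sketch of this row update, assuming a symmetric Dirichlet(alpha) prior (the hyperparameter value and all names are our choice, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_rows_dirichlet(counts, alpha=1.0):
    """Draw each row of the transition matrix from its conjugate
    Dirichlet posterior given complete-data transition counts.
    Assumes a symmetric Dirichlet(alpha) prior on every row."""
    K = counts.shape[0]
    P = np.empty((K, K))
    for k in range(K):
        # Posterior for row k: Dirichlet(counts[k] + alpha).
        P[k] = rng.dirichlet(counts[k] + alpha)
    return P
```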

A baseline Gibbs sampler is described: (1) conditional on the current transition matrix, each subject's missing path is sampled using a forward-backward procedure that respects the Markov property; (2) given the complete (observed + imputed) paths, each row of $P$ is updated from its conjugate Dirichlet posterior. While conceptually straightforward, this Gibbs scheme can suffer from poor mixing when the unobserved intervals are long, because the number of possible hidden trajectories grows exponentially with the gap length.
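Step (1) amounts to sampling a "bridge" of hidden states between two consecutive observations, conditional on the current transition matrix. The sketch below uses backward probabilities and assumes the end state is reachable from the start state; it is an illustration, not the paper's implementation, and all names are ours:

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_bridge(P, start, end, n_steps):
    """Sample hidden states x_1..x_{n_steps-1} between an observed
    state `start` at time 0 and `end` at time n_steps, given
    transition matrix P (assumes `end` is reachable from `start`)."""
    K = P.shape[0]
    # beta[t, j] = P(X_{n_steps} = end | X_t = j)
    beta = np.zeros((n_steps + 1, K))
    beta[n_steps, end] = 1.0
    for t in range(n_steps - 1, -1, -1):
        beta[t] = P @ beta[t + 1]
    path = [start]
    for t in range(1, n_steps):
        # Condition each step on the previous state and the endpoint.
        probs = P[path[-1]] * beta[t]
        probs /= probs.sum()
        path.append(int(rng.choice(K, p=probs)))
    path.append(end)
    return path
```

Step (2) then reduces to the conjugate Dirichlet row update on counts pooled over observed and imputed transitions.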

To improve mixing, the authors embed a Metropolis–Hastings (MH) step that proposes new rows of the transition matrix. The key innovation is an adaptive proposal that exploits the dependence among rows. After a burn-in period, the algorithm computes the empirical mean vector $\mu$ and covariance matrix $\Sigma$ of the sampled rows. A multivariate normal proposal $N(\mu, \Sigma)$ is drawn, transformed to a positive vector, and normalized to lie on the simplex, yielding a candidate row. This adaptive proposal is more likely to fall in high-density regions of the posterior, leading to higher acceptance rates. Moreover, the covariance is gradually shrunk as sampling proceeds, balancing exploration and exploitation.
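The proposal construction can be sketched as follows. The clip-and-renormalize mapping back to the simplex is our assumption (the paper only states that the draw is made positive and normalized), and a full MH step would also need the proposal density in the acceptance ratio:

```python
import numpy as np

rng = np.random.default_rng(2)

def adaptive_row_proposal(history, shrink=1.0, eps=1e-8):
    """Propose a candidate transition-matrix row from a Gaussian
    fitted to past samples of that row, then map it to the simplex.
    `shrink` scales the fitted covariance (decreased over time);
    `eps` regularizes the covariance and enforces positivity."""
    mu = history.mean(axis=0)
    sigma = np.cov(history, rowvar=False) * shrink + eps * np.eye(history.shape[1])
    draw = rng.multivariate_normal(mu, sigma)
    draw = np.clip(draw, eps, None)   # force positivity
    return draw / draw.sum()          # renormalize onto the simplex
```

Because the candidate is centered on the empirical posterior mean with matched covariance, it tends to land in high-density regions, which is the source of the reported acceptance-rate gains.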

The paper categorises three missing-data schemes: (i) regular-interval observation (all subjects observed every $\Delta t$ steps), (ii) random-missing observation (each observation time is drawn independently), and (iii) a mixed scheme combining the two. Extensive simulation studies are performed for state spaces of size $K = 3, 5, 10$ under each scheme. Performance metrics include Effective Sample Size (ESS), the Gelman–Rubin convergence diagnostic ($\hat R$), and the $L_2$ distance between posterior mean transition matrices and the true matrices. Results show that the adaptive MH-augmented Gibbs sampler achieves ESS gains of 2–3× over a naïve independent-proposal MH, $\hat R$ values consistently below 1.05, and markedly reduced bias in the estimated transition probabilities. Computational complexity scales as $O(NK^2)$ per iteration (where $N$ is the number of subjects), and the path-sampling step is embarrassingly parallel.
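To make the three schemes concrete, here is a hypothetical generator of observed (time, state) pairs from a fully simulated path; the scheme names and parameters are ours, chosen only to mirror the descriptions above:

```python
import numpy as np

rng = np.random.default_rng(3)

def observe(path, scheme, delta=2, p_obs=0.5):
    """Subsample a fully simulated path under one of three
    missing-data schemes: 'regular' (every delta-th step),
    'random' (each time kept independently with prob p_obs),
    or 'mixed' (regular grid plus random extras)."""
    T = len(path)
    if scheme == "regular":
        times = list(range(0, T, delta))
    elif scheme == "random":
        times = [t for t in range(T) if rng.random() < p_obs]
    elif scheme == "mixed":
        times = sorted(set(range(0, T, delta)) |
                       {t for t in range(T) if rng.random() < p_obs / 2})
    else:
        raise ValueError(scheme)
    return [(t, path[t]) for t in times]
```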

The authors discuss limitations: when observations are extremely sparse (e.g., each subject observed only once), the adaptive proposal may not have enough information to learn a reliable covariance structure, leading to slower convergence. They suggest incorporating stronger priors, hierarchical modeling, or variational approximations in such regimes. Potential extensions include non‑stationary or continuous‑time Markov processes, high‑dimensional state spaces with sparse transition structures, and real‑data applications such as disease‑stage progression modeling.

In summary, the paper provides a comprehensive Bayesian treatment of discrete Markov model estimation under various incomplete‑data scenarios, and demonstrates that an adaptive MCMC mechanism leveraging inter‑row dependence can substantially accelerate convergence and improve estimation accuracy compared with conventional Metropolis–Hastings approaches.

