Order estimation of Markov chains
We describe estimators $\chi_n(X_0,X_1,\dots,X_n)$ which, when applied to an unknown stationary process taking values in a countable alphabet $\mathcal{X}$, converge almost surely to $k$ if the process is a $k$-th order Markov chain and to infinity otherwise.
💡 Research Summary
The paper introduces a novel estimator, denoted χₙ, for determining the order of a Markov chain when the underlying process is stationary and takes values from a countable alphabet. Unlike traditional methods that rely on information criteria such as AIC or BIC and often assume a finite alphabet or a predefined model class, χₙ is fully non‑parametric and works under minimal assumptions. The construction of χₙ proceeds by computing empirical transition probabilities from the observed sequence (X₀,…,Xₙ) and evaluating a deviation function that measures the discrepancy between empirical conditional distributions and their limiting values for each candidate order k. Two safeguards are incorporated: a frequency threshold that excludes rarely observed strings, and a convergence test that checks whether the deviation for all length‑(k+1) strings tends to zero.
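The construction described above can be illustrated with a minimal Python sketch. It is not the paper's definition of χₙ: the function names, the fixed comparison depth `depth`, the frequency threshold `min_count`, and the tolerance `tol` are all assumptions made for illustration, and they are held constant here for simplicity, whereas an estimator that diverges on non-Markov processes would have to let the depth grow with n. The sketch compares the empirical next-symbol distribution given a long context with the one given only its length-k suffix, and skips contexts seen fewer than `min_count` times.

```python
from collections import Counter, defaultdict


def conditional_counts(seq, k):
    """Map each length-k context to a Counter of the symbols that follow it."""
    table = defaultdict(Counter)
    for i in range(k, len(seq)):
        table[tuple(seq[i - k:i])][seq[i]] += 1
    return table


def max_deviation(seq, k, depth, min_count):
    """Largest L1 distance between the empirical next-symbol distribution
    given a length-`depth` context and the one given its length-k suffix.
    Contexts observed fewer than `min_count` times are ignored."""
    short = conditional_counts(seq, k)
    long_ = conditional_counts(seq, depth)
    worst = 0.0
    for ctx, nxt in long_.items():
        n_long = sum(nxt.values())
        if n_long < min_count:
            continue  # frequency threshold: skip rarely observed strings
        base = short[ctx[depth - k:]]
        n_short = sum(base.values())
        # Every symbol seen after the long context is also seen after its suffix,
        # so summing over the suffix's symbols covers the whole support.
        dev = sum(abs(nxt[s] / n_long - base[s] / n_short) for s in base)
        worst = max(worst, dev)
    return worst


def estimate_order(seq, depth, min_count=5, tol=0.05):
    """Smallest k < depth whose deviation is within `tol`; `depth` if none is."""
    for k in range(depth):
        if max_deviation(seq, k, depth, min_count) <= tol:
            return k
    return depth
```

On the alternating sequence `[0, 1] * 500` with `depth=3`, the order-0 deviation is 1 and the order-1 deviation is 0, so `estimate_order` returns 1. Comparing each candidate against the deepest context, rather than only against the next longer one, avoids accepting an order k when the dependence skips the symbol at lag k+1 but reappears at a longer lag.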
The authors prove that if the process is a true k‑th order Markov chain, then χₙ converges almost surely to k as n → ∞. Conversely, if the process does not possess a finite Markov order, χₙ diverges to infinity with probability one. The proof leverages the strong law of large numbers, stationarity, and the finiteness of transition probabilities even in a countable state space.
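Written in the notation of the abstract, the two-sided consistency claim reads as follows. This is a restatement only, under the assumption that "$k$-th order" means $k$ is the smallest order for which the Markov property holds:

$$
\chi_n(X_0,\dots,X_n) \xrightarrow[n\to\infty]{\text{a.s.}}
\begin{cases}
k, & \text{if } \{X_n\} \text{ is a Markov chain of order } k,\\
\infty, & \text{if } \{X_n\} \text{ is not a Markov chain of any finite order.}
\end{cases}
$$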
Computationally, χₙ requires tracking frequencies of all observed substrings up to a certain length, which can be efficiently implemented using hash tables or trie structures. The paper provides a detailed analysis of time and memory complexity, showing that the method remains feasible for large datasets because most long substrings are sparse in practice.
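As a rough illustration of the hash-table approach mentioned above, the following Python sketch counts every substring of length at most `max_len` in a dictionary-backed `Counter`. The function name and the `max_len` parameter are hypothetical and not taken from the paper. Only substrings that actually occur are stored, which is why memory stays manageable when long substrings are sparse.

```python
from collections import Counter


def substring_counts(seq, max_len):
    """Count all contiguous substrings of seq with length 1..max_len.

    Keys are tuples of symbols, so any hashable alphabet works.
    The loop performs at most len(seq) * max_len dictionary updates.
    """
    counts = Counter()
    n = len(seq)
    for i in range(n):
        for m in range(1, min(max_len, n - i) + 1):
            counts[tuple(seq[i:i + m])] += 1
    return counts
```

For example, `substring_counts("abab", 2)` holds four keys: `('a',)` and `('b',)` with count 2 each, `('a', 'b')` with count 2, and `('b', 'a')` with count 1.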
Empirical evaluation includes synthetic Markov chains of orders 2–4 and real‑world data such as text corpora and DNA sequences. Simulations demonstrate that χₙ identifies the correct order with high probability after relatively modest sample sizes (on the order of 10⁴–10⁵ observations). For processes lacking a finite Markov order, χₙ exhibits a monotonic increase, confirming the theoretical divergence result.
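A synthetic test sequence of the kind described above could be generated along the following lines. This is a sketch under stated assumptions: the function `simulate_chain`, the binary alphabet, and the transition probabilities in `table` are made up for illustration and do not come from the paper's experiments.

```python
import random


def simulate_chain(transitions, start, n, seed=0):
    """Sample n symbols from a Markov chain whose order is len(start).

    `transitions` maps each context (a tuple of the last `order` symbols)
    to a dict {next_symbol: probability}.
    """
    rng = random.Random(seed)
    order = len(start)
    seq = list(start)
    while len(seq) < n:
        ctx = tuple(seq[len(seq) - order:])
        symbols, weights = zip(*transitions[ctx].items())
        seq.append(rng.choices(symbols, weights=weights)[0])
    return seq[:n]


# Hypothetical order-2 binary chain: the rows for (0, 0) and (1, 0) differ,
# so the next symbol depends on the symbol two steps back.
table = {
    (0, 0): {0: 0.9, 1: 0.1},
    (0, 1): {0: 0.2, 1: 0.8},
    (1, 0): {0: 0.7, 1: 0.3},
    (1, 1): {0: 0.4, 1: 0.6},
}
sample = simulate_chain(table, start=(0, 0), n=10_000, seed=1)
```

Fixing the seed makes runs repeatable, which is convenient when comparing an order estimator across sample sizes.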
The authors also discuss extensions, including adaptation to non‑stationary processes, continuous alphabets, and online (streaming) settings where the estimator updates incrementally. Overall, the paper delivers a rigorous, broadly applicable tool for order estimation that connects theoretical consistency with practical use in fields ranging from natural language processing to bioinformatics.
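For the streaming setting, one way the underlying substring statistics might be maintained incrementally is sketched below. The class name `StreamingCounts` and its interface are assumptions for illustration, not the paper's method. Each new symbol triggers at most `max_len` counter updates, and only the last `max_len` symbols are kept in memory besides the counts.

```python
from collections import Counter, deque


class StreamingCounts:
    """Incrementally count substrings of length 1..max_len as symbols arrive."""

    def __init__(self, max_len):
        self.window = deque(maxlen=max_len)  # most recent max_len symbols
        self.counts = Counter()

    def update(self, symbol):
        """Record one new symbol and every substring that ends at it."""
        self.window.append(symbol)
        recent = tuple(self.window)
        for m in range(1, len(recent) + 1):
            self.counts[recent[len(recent) - m:]] += 1
```

Feeding the symbols of `"abab"` one at a time into `StreamingCounts(2)` yields the same counts as a single batch pass over the full string.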