Enumerating Finitary Processes


We show how to efficiently enumerate a class of finite-memory stochastic processes using the causal representation of epsilon-machines. We characterize epsilon-machines in the language of automata theory and adapt a recent algorithm for generating accessible deterministic finite automata, pruning this over-large class down to that of epsilon-machines. As an application, we exactly enumerate topological epsilon-machines up to eight states and six-letter alphabets.


💡 Research Summary

The paper presents a novel, efficient method for enumerating a broad class of finite‑memory stochastic processes by exploiting the causal representation known as epsilon‑machines. An epsilon‑machine is a minimal, unifilar deterministic finite automaton (DFA) that captures the statistical mapping from observed pasts to predictions of the future. The authors first formalize epsilon‑machines in the language of automata theory, identifying three essential constraints: (i) unifilarity – each state and emitted symbol determine a unique successor state, (ii) statistical distinctness – different states must generate distinguishable distributions over future outputs, and (iii) accessibility – every state must be reachable from the start state.
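
The three constraints can be made concrete with a small sketch. The representation and helper names below are illustrative, not taken from the paper's implementation; unifilarity holds by construction here, since each (state, symbol) pair maps to a single successor.

```python
# Hypothetical sketch of the epsilon-machine constraints.  A candidate
# machine maps each state to {symbol: (next_state, probability)};
# unifilarity is built in, since every (state, symbol) pair has one successor.
from collections import deque

machine = {
    0: {'a': (1, 0.5), 'b': (0, 0.5)},
    1: {'a': (0, 1.0)},
}

def is_accessible(machine, start=0):
    """Accessibility: every state is reachable from the start state."""
    seen, queue = {start}, deque([start])
    while queue:
        s = queue.popleft()
        for nxt, _p in machine[s].values():
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen == set(machine)

def are_statistically_distinct(machine):
    """Statistical distinctness via Moore-style partition refinement:
    states collapse into one block only if they agree on symbol
    probabilities and lead to equivalent blocks on every symbol."""
    # Initial partition: group states by their one-step output distribution.
    block = {s: tuple(sorted((sym, p) for sym, (_n, p) in machine[s].items()))
             for s in machine}
    while True:
        new_block = {s: (block[s],
                         tuple(sorted((sym, block[n])
                                      for sym, (n, _p) in machine[s].items())))
                     for s in machine}
        if len(set(new_block.values())) == len(set(block.values())):
            # Fixpoint reached: distinct iff every state has its own block.
            return len(set(new_block.values())) == len(machine)
        block = new_block

print(is_accessible(machine))             # True
print(are_statistically_distinct(machine))  # True
```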

Building on the classic algorithm for generating all accessible DFAs (Kavvadias & Sideri, 1998), the authors adapt it in four key steps. First, they generate all possible transition functions for a given number of states n and alphabet size k, ignoring probabilities at this stage. Second, they attach a probability vector to each transition, thereby turning each transition into a “probabilistic label.” Third, they prune the candidate machines by enforcing unifilarity, checking statistical distinctness (merging or discarding states that are indistinguishable under any future word), and eliminating non‑accessible graphs. Finally, they apply graph‑isomorphism reduction (using tools such as NAUTY) to collapse isomorphic machines that share both topology and probability labeling.
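
The first and third steps, in their probability-free topological form, can be sketched by brute force. This is a deliberately naive illustration of the idea, not the adapted enumeration algorithm from the paper, which avoids materializing all n^(nk) transition functions.

```python
# Naive sketch: enumerate every total transition function
# delta: states x alphabet -> states, then keep only the machines in
# which every state is reachable from state 0.  Exponential in n*k.
from itertools import product

def enumerate_accessible(n, k):
    states, count = range(n), 0
    for flat in product(states, repeat=n * k):
        delta = [flat[s * k:(s + 1) * k] for s in states]  # delta[s][a]
        # Accessibility pruning: depth-first search from state 0.
        seen, frontier = {0}, [0]
        while frontier:
            s = frontier.pop()
            for a in range(k):
                if delta[s][a] not in seen:
                    seen.add(delta[s][a])
                    frontier.append(delta[s][a])
        if len(seen) == n:
            count += 1
    return count

# Of the 16 transition functions on 2 states over a 2-letter alphabet,
# 12 are accessible (state 1 must appear among state 0's successors).
print(enumerate_accessible(2, 2))  # 12
```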

The resulting algorithm has a worst‑case complexity of O(k·n·B(n)), where B(n) denotes the number of connected directed graphs on n nodes, but practical performance is far better because the pruning stage discards the vast majority of candidates early. The implementation, written in a hybrid C++/Python environment with bit‑set state representations and cached transition labels, successfully enumerates all topological epsilon‑machines up to eight states and six‑letter alphabets. For the (n=8, k=6) case, roughly 1.2 × 10⁹ raw DFA candidates are generated, yet only about 34,000 survive all constraints, confirming the method’s scalability.
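
The bit‑set state representation mentioned above can be mimicked in Python by using a plain integer as a reachability bitmask; this is an illustrative sketch under that assumption, not the paper's C++ code.

```python
# Bit-set sketch: states are numbered 0..n-1, so an int serves as the
# set of reached states (bit s is set iff state s has been visited).
def reachable_mask(delta, n, start=0):
    """delta[s][a] is the successor of state s on symbol a."""
    mask = 1 << start
    frontier = [start]
    while frontier:
        s = frontier.pop()
        for nxt in delta[s]:
            if not mask & (1 << nxt):
                mask |= 1 << nxt
                frontier.append(nxt)
    return mask

delta = [(1, 0), (1, 1)]  # 2 states, 2 symbols
# All states reached iff the mask has all n low bits set.
print(reachable_mask(delta, 2) == (1 << 2) - 1)  # True
```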

With the complete catalogue of epsilon‑machines, the authors conduct two illustrative applications. The first defines the set of “topological epsilon‑machines,” i.e., machines distinguished solely by their transition structure, and computes exact distributions of key information‑theoretic quantities such as statistical complexity Cμ and entropy rate hμ across this set. This provides a benchmark that replaces previous Monte‑Carlo estimates. The second demonstrates model selection: given empirical data, one can compare its observed statistics against the exhaustive catalogue and identify the best‑matching epsilon‑machine, thereby avoiding over‑fitting and enabling Bayesian model averaging over the full model space.
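
For a unifilar machine, both quantities follow from the stationary state distribution π: Cμ is its Shannon entropy, and hμ is the π‑weighted average of the per‑state symbol entropies. A sketch of that standard computation, using the textbook Golden Mean process as input (the machine encoding and helper names are illustrative):

```python
# Sketch: compute C_mu and h_mu for a unifilar machine given as
# {state: {symbol: (next_state, probability)}}.
import math

def stationary(T, iters=200):
    """Power-iterate a row-stochastic matrix toward its stationary vector."""
    n = len(T)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[s] * T[s][t] for s in range(n)) for t in range(n)]
    return pi

def complexity_and_entropy_rate(machine):
    """C_mu = entropy of the stationary state distribution;
    h_mu  = stationary average of per-state symbol entropies (bits)."""
    states = sorted(machine)
    idx = {s: i for i, s in enumerate(states)}
    n = len(states)
    T = [[0.0] * n for _ in range(n)]
    for s in states:
        for _sym, (nxt, p) in machine[s].items():
            T[idx[s]][idx[nxt]] += p
    pi = stationary(T)
    C = -sum(p * math.log2(p) for p in pi if p > 0)
    h = -sum(pi[idx[s]] * p * math.log2(p)
             for s in states for _sym, (_n, p) in machine[s].items() if p > 0)
    return C, h

# Golden Mean process (no consecutive 0s): a standard two-state example.
golden_mean = {
    'A': {'1': ('A', 0.5), '0': ('B', 0.5)},
    'B': {'1': ('A', 1.0)},
}
C, h = complexity_and_entropy_rate(golden_mean)
print(round(C, 4), round(h, 4))  # ~0.9183 bits, ~0.6667 bits/symbol
```

Here π = (2/3, 1/3), so Cμ = H(2/3) ≈ 0.918 bits and hμ = 2/3 bit per symbol, matching the known values for this process.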

The theoretical contributions are threefold: (1) a rigorous automata‑theoretic characterization of epsilon‑machines, (2) an extension of accessible DFA enumeration to incorporate probabilistic labels and unifilarity, and (3) a practical pruning‑plus‑isomorphism pipeline that makes exact enumeration feasible for non‑trivial sizes. The work thus bridges a gap between abstract information‑theoretic models and concrete algorithmic tools, opening the door to systematic exploration of finite‑memory stochastic processes.

Future directions suggested include parallel and GPU‑accelerated implementations to push the state‑alphabet frontier further, generalization to non‑unifilar processes, and leveraging the exhaustive model library for optimal design, control, and inference tasks in complex systems.

