Enumerating Finitary Processes
We show how to efficiently enumerate a class of finite-memory stochastic processes using the causal representation of epsilon-machines. We characterize epsilon-machines in the language of automata theory and adapt a recent algorithm for generating accessible deterministic finite automata, pruning this over-large class down to that of epsilon-machines. As an application, we exactly enumerate topological epsilon-machines up to eight states and six-letter alphabets.
Research Summary
The paper presents a novel, efficient method for enumerating a broad class of finite-memory stochastic processes by exploiting the causal representation known as epsilon-machines. An epsilon-machine is a minimal, unifilar deterministic finite automaton (DFA) that captures the statistical mapping from past symbols to future predictions. The authors first formalize epsilon-machines in the language of automata theory, identifying three essential constraints: (i) unifilarity: each state and input symbol determines a unique successor state; (ii) statistical distinctness: different states must generate distinguishable future output distributions; and (iii) accessibility: every state must be reachable from the start state.
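Two of the three constraints above are purely structural and can be checked directly on a labeled graph. The following is a minimal sketch, assuming a machine is stored as a list of `(source, symbol, target)` edges; the function names, the edge-list representation, and the two-state example are illustrative assumptions, not the paper's code. Statistical distinctness is not checked here.

```python
from collections import deque


def is_unifilar(edges):
    """Return True if no (state, symbol) pair leads to two different successors."""
    successor = {}
    for src, sym, dst in edges:
        # setdefault records the first successor seen for (src, sym)
        if successor.setdefault((src, sym), dst) != dst:
            return False
    return True


def is_accessible(edges, start, states):
    """Return True if every state can be reached from `start` along edges."""
    neighbours = {}
    for src, _sym, dst in edges:
        neighbours.setdefault(src, set()).add(dst)
    reached = {start}
    queue = deque([start])
    while queue:  # breadth-first search from the start state
        s = queue.popleft()
        for t in neighbours.get(s, ()):
            if t not in reached:
                reached.add(t)
                queue.append(t)
    return reached == set(states)


# Hypothetical two-state example: from A both symbols are allowed,
# from B only symbol 0 is allowed.
example_edges = [("A", 0, "A"), ("A", 1, "B"), ("B", 0, "A")]
# Same state and symbol, two different targets: not unifilar.
ambiguous_edges = [("A", 0, "A"), ("A", 0, "B")]
```

Calling `is_unifilar(example_edges)` and `is_accessible(example_edges, "A", ["A", "B"])` both return `True`, while `is_unifilar(ambiguous_edges)` returns `False`.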
Building on the classic algorithm for generating all accessible DFAs (Kavvadias & Sideri, 1998), the authors adapt it in four key steps. First, they generate all possible transition functions for a given number of states n and alphabet size k, ignoring probabilities at this stage. Second, they attach a probability vector to each transition, thereby turning each transition into a "probabilistic label." Third, they prune the candidate machines by enforcing unifilarity, checking statistical distinctness (merging or discarding states that are indistinguishable under any future word), and eliminating non-accessible graphs. Finally, they apply graph-isomorphism reduction (using tools such as NAUTY) to collapse isomorphic machines that share both topology and probability labeling.
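The generate, prune, and deduplicate pattern described above can be illustrated at toy scale. The sketch below is a brute-force stand-in, not the authors' algorithm: it assumes states are numbered 0..n-1 with state 0 as the start state, stores a partial transition function as a flat tuple, and replaces NAUTY with an exhaustive search over relabelings that fix the start state. Probabilities and the statistical-distinctness check are omitted, and all names are hypothetical.

```python
from itertools import permutations, product


def reachable_from_start(delta, k):
    """States reachable from state 0; delta[s * k + a] is the successor or None."""
    reached, stack = {0}, [0]
    while stack:
        s = stack.pop()
        for a in range(k):
            t = delta[s * k + a]
            if t is not None and t not in reached:
                reached.add(t)
                stack.append(t)
    return reached


def canonical_form(delta, n, k):
    """Smallest relabeled table over all permutations that keep state 0 fixed."""
    best = None
    for perm in permutations(range(1, n)):
        relabel = (0,) + perm  # relabel[old_state] = new_state
        image = [None] * (n * k)
        for s in range(n):
            for a in range(k):
                t = delta[s * k + a]
                image[relabel[s] * k + a] = None if t is None else relabel[t]
        key = tuple(-1 if t is None else t for t in image)  # -1 marks "no edge"
        if best is None or key < best:
            best = key
    return best


def enumerate_structures(n, k):
    """Accessible transition structures on n states and k symbols, up to relabeling."""
    found = set()
    # Each (state, symbol) slot is either absent (None) or points to one state,
    # so every candidate is unifilar by construction.
    for delta in product([None, *range(n)], repeat=n * k):
        if len(reachable_from_start(delta, k)) == n:  # prune non-accessible
            found.add(canonical_form(delta, n, k))  # collapse isomorphic copies
    return found
```

The candidate space has (n+1)^(n·k) entries, so this is only usable for very small n and k; it is meant to show where pruning and isomorphism reduction sit in the pipeline, not to reproduce the reported enumeration.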
The resulting algorithm has a worst-case complexity of O(k·n·B(n)), where B(n) denotes the number of connected directed graphs on n nodes, but practical performance is far better because the pruning stage discards the vast majority of candidates early. The implementation, written in a hybrid C++/Python environment with bit-set state representations and cached transition labels, successfully enumerates all topological epsilon-machines up to eight states and six-letter alphabets. For the (n=8, k=6) case, roughly 1.2 × 10⁹ raw DFA candidates are generated, yet only about 34,000 survive all constraints, confirming the method's scalability.
With the complete catalogue of epsilon-machines, the authors conduct two illustrative applications. The first defines the set of "topological epsilon-machines," i.e., machines distinguished solely by their transition structure, and computes exact distributions of key information-theoretic quantities such as statistical complexity Cμ and entropy rate hμ across this set. This provides a benchmark that replaces previous Monte Carlo estimates. The second demonstrates model selection: given empirical data, one can compare its observed statistics against the exhaustive catalogue and identify the best-matching epsilon-machine, thereby avoiding over-fitting and enabling Bayesian model averaging over the full model space.
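For a single machine, Cμ and hμ can be computed from the stationary distribution over states. The sketch below assumes that each state's outgoing transitions are equally probable (one common way to assign probabilities to a purely topological structure, but an assumption here), that the machine is unifilar and strongly connected, and that every state has at least one outgoing edge. The function names and the two-state example are hypothetical.

```python
from math import log2


def stationary_distribution(P, iterations=2000):
    """Approximate the stationary distribution of row-stochastic matrix P.

    Iterates the lazy chain (P + I) / 2, which has the same stationary
    distribution as P but also converges when P is periodic.
    """
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        step = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        pi = [0.5 * (pi[j] + step[j]) for j in range(n)]
    return pi


def topological_measures(delta):
    """Return (C_mu, h_mu) in bits for delta: state -> {symbol: next_state}.

    Assumes uniform probabilities over each state's outgoing symbols.
    """
    states = sorted(delta)
    index = {s: i for i, s in enumerate(states)}
    n = len(states)
    P = [[0.0] * n for _ in range(n)]
    for s in states:
        for _sym, t in delta[s].items():
            P[index[s]][index[t]] += 1.0 / len(delta[s])
    pi = stationary_distribution(P)
    # Statistical complexity: Shannon entropy of the state distribution.
    c_mu = -sum(p * log2(p) for p in pi if p > 0)
    # Entropy rate of a unifilar machine: state-averaged symbol entropy,
    # which is log2(out-degree) under uniform branching.
    h_mu = sum(pi[index[s]] * log2(len(delta[s])) for s in states)
    return c_mu, h_mu


# Hypothetical two-state example: A allows both symbols, B allows only 0.
example = {"A": {0: "A", 1: "B"}, "B": {0: "A"}}
c_mu, h_mu = topological_measures(example)
```

For this example the stationary distribution is (2/3, 1/3), so Cμ is about 0.918 bits and hμ is 2/3 bit per symbol.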
The theoretical contributions are threefold: (1) a rigorous automata-theoretic characterization of epsilon-machines, (2) an extension of accessible DFA enumeration to incorporate probabilistic labels and unifilarity, and (3) a practical pruning-plus-isomorphism pipeline that makes exact enumeration feasible for non-trivial sizes. The work thus bridges a gap between abstract information-theoretic models and concrete algorithmic tools, opening the door to systematic exploration of finite-memory stochastic processes.
Future directions suggested include parallel and GPU-accelerated implementations to push the state-alphabet frontier further, generalization to non-unifilar processes, and leveraging the exhaustive model library for optimal design, control, and inference tasks in complex systems.