Prediction, Retrodiction, and The Amount of Information Stored in the Present
We introduce an ambidextrous view of stochastic dynamical systems, comparing their forward-time and reverse-time representations and then integrating them into a single time-symmetric representation. The perspective is useful theoretically, computationally, and conceptually. Mathematically, we prove that the excess entropy, a familiar measure of organization in complex systems, is the mutual information not only between the past and future, but also between the predictive and retrodictive causal states. Practically, we exploit the connection between prediction and retrodiction to calculate the excess entropy directly. Conceptually, these connections lead to new system invariants for stochastic dynamical systems: crypticity (information accessibility) and causal irreversibility. Ultimately, we introduce a time-symmetric representation that unifies all of these quantities, compressing the two directional representations into one. The resulting compression offers a new conception of the amount of information stored in the present.
💡 Research Summary
The paper presents a unified, time‑symmetric framework for analyzing stochastic dynamical systems by treating forward‑time (predictive) and reverse‑time (retrodictive) representations on equal footing. It begins by recalling the standard computational‑mechanics construction of the ε‑machine, a minimal unifilar hidden Markov model whose causal states S⁺ capture all information from the past needed to predict the future. By time‑reversing the process, an analogous ε‑machine is built for the reverse direction, yielding retrodictive causal states S⁻ that contain all information from the future needed to reconstruct the past.
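The causal-state construction groups histories that induce the same conditional distribution over futures. A minimal sketch of that equivalence-class idea for an order-1 binary Markov chain, where the next-symbol distribution already determines the full predictive distribution (the function name and the example chain are illustrative, not from the paper):

```python
from collections import defaultdict

def causal_states_markov1(T):
    """Group length-1 histories of an order-1 binary Markov chain.

    T[x] = (P(next=0 | last=x), P(next=1 | last=x)). Histories with
    identical predictive distributions fall into the same causal state;
    for an order-1 chain the last symbol determines that distribution,
    so grouping last symbols by their rows of T recovers S+.
    """
    states = defaultdict(list)
    for x, dist in T.items():
        states[dist].append(x)
    return list(states.values())

# Golden-Mean-like chain: after a 0 the next symbol must be 1;
# after a 1 it is 0 or 1 with equal probability.
T = {"0": (0.0, 1.0), "1": (0.5, 0.5)}
print(causal_states_markov1(T))  # → [['0'], ['1']]
```

Running the same grouping on the time-reversed chain's predictive distributions yields the retrodictive states S⁻; for longer-range processes the equivalence must be taken over distributions conditioned on whole histories, which this sketch does not attempt.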
The central theoretical contribution is the proof that the excess entropy E, traditionally defined as the mutual information I(Past;Future), is exactly the mutual information between the predictive and retrodictive causal states: E = I(S⁺;S⁻). The proof exploits Bayes' rule and the fact that both S⁺ and S⁻ are sufficient statistics for their respective conditioning variables, establishing that the joint distribution P(S⁺,S⁻) fully captures the past‑future correlation structure.
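A sketch of how the sufficiency argument chains together, writing the past and future as left and right arrows and ε± for the causal-state maps (notation assumed, following standard computational-mechanics conventions rather than quoted from the paper):

```latex
\begin{align*}
E &= I[\overleftarrow{X}; \overrightarrow{X}] \\
  &= I[\overleftarrow{X}; \mathcal{S}^{-}]
     && \text{since } \mathcal{S}^{-} = \epsilon^{-}(\overrightarrow{X})
        \text{ is a sufficient statistic of the future for the past} \\
  &= I[\mathcal{S}^{+}; \mathcal{S}^{-}]
     && \text{since } \mathcal{S}^{+} = \epsilon^{+}(\overleftarrow{X})
        \text{ is a sufficient statistic of the past for the future}
\end{align*}
```

Each step replaces a raw semi-infinite sequence by its causal state without losing mutual information, which is exactly what sufficiency licenses.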
From this identity the authors derive a practical method for computing E without having to estimate the full past‑future distribution. One only needs the stationary state distributions π⁺, π⁻ and the transition matrices of the two ε‑machines. By constructing the joint state distribution P(S⁺,S⁻) (the “bidirectional” distribution) and evaluating I(S⁺;S⁻) = H(S⁺) + H(S⁻) – H(S⁺,S⁻), the excess entropy follows directly. This dramatically reduces computational complexity, especially for processes with long-range dependencies where conventional estimators become infeasible.
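Once the joint causal-state distribution P(S⁺,S⁻) is in hand, the computation of E is a single mutual-information evaluation. A minimal sketch (the function name and the two toy distributions are illustrative; the paper's own examples are not reproduced here):

```python
import numpy as np

def mutual_information(joint):
    """Mutual information I(S+; S-) in bits from a joint distribution.

    `joint[i, j]` is P(S+ = i, S- = j).
    """
    joint = np.asarray(joint, dtype=float)
    p_plus = joint.sum(axis=1)   # marginal P(S+)
    p_minus = joint.sum(axis=0)  # marginal P(S-)

    def entropy(p):
        p = p[p > 0]             # 0 log 0 = 0 by convention
        return -np.sum(p * np.log2(p))

    # I(S+; S-) = H(S+) + H(S-) - H(S+, S-)
    return entropy(p_plus) + entropy(p_minus) - entropy(joint.ravel())

# Two limiting cases as sanity checks:
independent = np.outer([0.5, 0.5], [0.5, 0.5])   # states carry no shared information
perfectly_correlated = np.diag([0.5, 0.5])       # S- is determined by S+
print(mutual_information(independent))           # → 0.0
print(mutual_information(perfectly_correlated))  # → 1.0
```

The saving over direct estimation is that the joint distribution lives on the (typically small, finite) causal-state sets rather than on the space of arbitrarily long past-future sequence pairs.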
The paper then introduces two new invariants. Crypticity χ⁺ = C_μ⁺ – E measures how much internal information the predictive ε‑machine stores that is not reflected in observable past‑future correlations; its reverse‑time counterpart χ⁻ = C_μ⁻ – E is defined analogously. Their difference, causal irreversibility η = χ⁺ – χ⁻ = C_μ⁺ – C_μ⁻, quantifies an asymmetry in information storage: η > 0 indicates that predicting the future requires storing more information than retrodicting the past, while η < 0 signals the opposite. These quantities provide a nuanced view of temporal asymmetry beyond thermodynamic entropy production.
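The invariants are simple arithmetic once C_μ⁺, C_μ⁻, and E are known; note in particular that the excess-entropy terms cancel in η. A small sketch with hypothetical numbers (the values below are for illustration only, not results from the paper):

```python
def crypticities(C_plus, C_minus, E):
    """Return (chi_plus, chi_minus, eta) in bits.

    chi+ = C_mu+ - E and chi- = C_mu- - E measure hidden stored
    information in each direction; their difference
    eta = chi+ - chi- = C_mu+ - C_mu- is independent of E,
    since the excess-entropy terms cancel.
    """
    chi_plus = C_plus - E
    chi_minus = C_minus - E
    return chi_plus, chi_minus, chi_plus - chi_minus

# Hypothetical statistical complexities and excess entropy:
print(crypticities(C_plus=1.5, C_minus=1.0, E=0.25))  # → (1.25, 0.75, 0.5)
```

The cancellation makes η directly comparable across processes even when E itself is hard to pin down.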
To bring the two directional models together, the authors construct a “bidirectional machine” whose states are ordered pairs (S⁺,S⁻). Transitions are defined only when the forward and reverse transitions are compatible (i.e., they emit the same observable symbol). The joint entropy H(S⁺,S⁻) of this machine equals C_μ⁺ + C_μ⁻ – E, showing that the bidirectional representation compresses the total stored information into a single structure. Consequently, the present’s effective information content is precisely the excess entropy E, while the remaining entropy H(S⁺,S⁻) – E corresponds to hidden, inaccessible information (crypticity).
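The compatibility condition on pair states can be sketched as a product construction that keeps only transitions both component machines support on the same symbol. All names and the two toy transition tables below are hypothetical, and real bidirectional machines additionally restrict to pairs with P(S⁺,S⁻) > 0, which this sketch omits:

```python
from itertools import product

def bidirectional_transitions(T_plus, T_minus, alphabet):
    """Transitions of a bidirectional machine on pair states (s+, s-).

    T_plus and T_minus map (state, symbol) -> next state for the forward
    and reverse epsilon-machines (unifilar, so each map is a function).
    A pair state steps on symbol x only when both component machines
    have a transition on x -- the compatibility condition from the text.
    """
    pairs = product({s for s, _ in T_plus}, {s for s, _ in T_minus})
    trans = {}
    for (sp, sm), x in product(pairs, alphabet):
        if (sp, x) in T_plus and (sm, x) in T_minus:
            trans[((sp, sm), x)] = (T_plus[(sp, x)], T_minus[(sm, x)])
    return trans

# Hypothetical two-state machines for illustration:
T_plus = {("A", "1"): "A", ("A", "0"): "B", ("B", "1"): "A"}
T_minus = {("a", "1"): "a", ("a", "0"): "b", ("b", "1"): "a"}
print(bidirectional_transitions(T_plus, T_minus, alphabet={"0", "1"}))
```

Dropping the incompatible pair transitions is what lets the bidirectional machine be smaller than the full product of the two directional machines.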
The authors validate their framework on several examples, including binary Markov chains, hidden Markov models with non‑unifilar structure, and processes with infinite Markov order. In each case, the bidirectional machine yields a more compact representation than either directional ε‑machine alone, confirming the theoretical advantage of the time‑symmetric approach.
In conclusion, the paper demonstrates that viewing stochastic processes through the dual lenses of prediction and retrodiction not only yields a deeper conceptual understanding of excess entropy but also provides concrete computational tools and novel invariants (crypticity and causal irreversibility). This time‑symmetric perspective has potential applications across physics (e.g., nonequilibrium statistical mechanics), biology (e.g., genetic regulatory networks), and machine learning (e.g., bidirectional recurrent architectures), wherever the balance between stored and observable information is of central interest.