How Hidden are Hidden Processes? A Primer on Crypticity and Entropy Convergence
We investigate a stationary process's crypticity---a measure of the difference between its hidden state information and its observed information---using the causal states of computational mechanics. Here, we motivate crypticity and cryptic order as physically meaningful quantities that monitor how hidden a hidden process is. This is done by recasting previous results on the convergence of block entropy and block-state entropy in a geometric setting, one that is more intuitive and that leads to a number of new results. For example, we connect crypticity to how an observer synchronizes to a process. We show that the block-causal-state entropy is a convex function of block length. We give a complete analysis of spin chains. We present a classification scheme that surveys stationary processes in terms of their possible cryptic and Markov orders. We illustrate related entropy convergence behaviors using a new form of foliated information diagram. Finally, along the way, we provide a variety of interpretations of crypticity and cryptic order to establish their naturalness and pervasiveness. Hopefully, these will inspire new applications in spatially extended and network dynamical systems.

Authors: John R. Mahoney (Physics Department, University of California at Merced, 5200 North Lake Road, Merced, CA 95343); Christopher J. Ellison and Ryan G. James (Complexity Sciences Center, Physics Department, University of California at Davis, One Shields Avenue, Davis, CA 95616); and James P. Crutchfield (University of California at Davis and the Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501)

Santa Fe Institute Working Paper 11-08-XXX; arxiv.org:1108.XXXX [physics.gen-ph]. (Dated: March 12, 2018)
Keywords: crypticity, stored information, statistical complexity, excess entropy, information diagram, synchronization, irreversibility

PACS numbers: 02.50.-r 89.70.+c 05.45.Tp 02.50.Ey 02.50.Ga

CONTENTS

I. Introduction
II. Definitions
III. Crypticity: From state paths to synchronization
   A. Global versus Local
   B. Mazes and Stacks
   C. Transient versus Relaxed
   D. Naive versus Informed
   E. Statistical Complexity versus Crypticity
IV. Crypticity through Information Theory
   A. Crypticity
   B. Cryptic Order
V. Crypticity and Entropy Convergence
   A. Block Entropy
   B. State-block entropy
   C. Block-state entropy
VI. Crypticity in Spin Chains
VII. Geometric Constraints
VIII. The Cryptic Markovian Zoo
IX. Information Diagrams for Stationary Processes
X. Conclusion
Acknowledgments
A. Why Crypticity?
B. Equivalence of Forward and Reverse Restricted State-Paths
C. Crypticity and Co-unifilarity
References

Contact: jmahoney3@ucmerced.edu, cellison@cse.ucdavis.edu, rgjames@ucdavis.edu, chaos@cse.ucdavis.edu

A black box is a metaphor for ignorance: One cannot see inside, but the presumption is that something, unknown in whole or in part, is there to be discovered. Moreover, the conceit is that the impoverished outputs from the box do contain something partially informative. Physically, ignorance comes in the act of measurement---measurements that are generically incomplete, inaccurate, and infrequent. Since measurements dictate that one can have only a partial view, it goes without saying that these distortions make discovery both difficult and one of the key challenges to scientific methodology. Measurement necessarily leads to our viewing the world as being hidden from us. Of course, the world is not completely hidden. If it were, then there would be neither gain from nor motivation for probing measurements to build models.
Scientific theory building and its experimental verification operate, then, in the framework of hidden processes---processes from which we have observations and from which, in turn, we attempt to understand the hidden mechanisms. At least philosophically, this setting is not even remotely new. The circumstance is that addressed by Plato's metaphor of our knowledge of the world deriving from the data of shadows on a cave wall.

Fortunately, we are far beyond metaphors these days. Hidden processes pose a quantitative question: How hidden are they? Here, we show how to quantitatively measure just this: How much internal information is hidden by measuring a process? Of course, this assumes, as in the black box metaphor, that there is something to be discovered. The tool we use to ground the intentional stance of discovering the internal mechanisms---to say what is hidden---is computational mechanics. Computational mechanics is a theory of what patterns are and how to measure a hidden process's degree of structure and organization. Computational mechanics has a long history, though, going back to the original challenges of nonlinear modeling posed in the 1970s that led to the concept of reconstructing "geometry from a time series". The explorations here can be seen in this light, with one important difference: Computational mechanics shows that measurements of a hidden process tell how the process's internal organization should be represented. Building on this, we develop a quantitative theory of how hidden hidden processes are.

I. INTRODUCTION

Many scientific domains face the confounding problems of defining and measuring information processing in dynamical systems. These range from technology to fundamental science and, even, the epistemology of science [1]:

1. The 2020 Digital Roadblock: The end of Moore's scaling laws for microelectronics [2–4].
2. The Central Dogma of Neurobiology: How are the intricate physical, biochemical, and biological components structured and coordinated to support natural, intrinsic neural computing?

3. Physical Intelligence: Does intelligence require biology, though? Or can there be alternative nonbiological substrates that support system behaviors that are to some degree "smart"?

4. Structure versus Function: Intelligence aside, how do we define and detect spontaneous organization in the first place? How do these emergent patterns take on and support functionality?

Many have worked to quantify various aspects of information dynamics; cf. Ref. [5]. One often finds references to information storage, transfer, and processing. Sophisticated measures are devised to characterize these quantities in multidimensional settings, including networks and adaptive systems.

Here, we investigate foundational questions that bear on all these domains, using methods with very few modeling and representation requirements attached that, nonetheless, allow a good deal of progress. In quantifying information processing in stochastic dynamical systems, two measures have repeatedly appeared and been successfully applied: the past-future mutual information of observations (excess entropy) E [6, and references therein] and the internal stored information (statistical complexity) C_µ [7]. Curiously, the difference between these measures---the crypticity χ [8]---has only recently received attention. To our knowledge, the first attempt to understand χ directly was in Ref. [9]. The following provides additional perspective and clarity to the results contained there and in the related works of Refs. [8, 10, 11].
In particular, we add to the body of knowledge surrounding crypticity and cryptic order, develop a further classification of the space of processes, and introduce several alternative ways to visualize these concepts. An appendix demonstrates that crypticity captures a notable and unique property when compared to alternative information measures. The goal is to provide a more intuitive and geometric toolbox for posing and answering the increasing range of increasingly complex research challenges surrounding information processing in nature and technology.

II. DEFINITIONS

We denote contiguous groups of random variables X_i using X_{n:m+1} = X_n ... X_m. A semi-infinite group is denoted either X_{n:} = X_n X_{n+1} ... or X_{:n} = ... X_{n-2} X_{n-1}. We refer to these as the future and the past, respectively. Consistent with this, the bi-infinite chain of random variables is denoted X_:. A process is specified by the distribution Pr(X_:). Throughout the following, we assume we are given a stationary process.

Please refer to Refs. [9, 12] for supplementary definitions of presentations, causal states, ε-machines, unifilarity, co-unifilarity, Shannon block information, information diagrams, and the like. The following assumes familiarity with these concepts and the results and techniques there. However, our development calls for a few reminders.

There are two notions of memory central to characterizing stochastic processes. These are the excess entropy E (sometimes called the predictive information) and the statistical complexity C_µ. The excess entropy is a measure of correlation between the past and future: the degree to which one can remove uncertainty in the future given knowledge of the past. (This is illustrated as the green information atom at the intersection of the past and future in the information diagram of Fig. 1.)
The statistical complexity is a quantity that arises in the context of modeling rather than prediction. Specifically, C_µ is the amount of information required for an observer to synchronize to a stochastic process. In the setting of finite-state hidden Markov models, it is the information stored in the process's causal states. Then, we have the crypticity:

Definition 1. A process's crypticity χ is defined as:

    χ = H[S_0 | X_{0:}] ,

where S_t is the process's causal state at time t.

Clearly, the definition relies on having a process's ε-machine presentation; the states used are causal states. Other presentations, whose alternative states we denote R, suggest an analogous, but more general, definition of crypticity; cf. Ref. [12].

To give us something to temporarily hang our hat on, it turns out that the crypticity is simply how much stored information is hidden from observations. That is, it is the difference between the internal stored information (C_µ) and the apparent past-future mutual information (E). This is directly illustrated in Fig. 1.

FIG. 1. Information diagram over the past information H[X_{:0}], the future information H[X_{0:}], and the state information C_µ = H[S], with atoms H[X_{:0}|S], H[X_{:0}|X_{0:}], H[X_{0:}|S], χ = H[S|X_{0:}], and E = I[S; X_{0:}] = I[X_{:0}; X_{0:}]. Crypticity χ is represented by the red (dark) crescent shape in this ε-machine I-diagram. The excess entropy E, by the (green) overlap of the past information H[X_{:0}] and future information H[X_{0:}]. The statistical complexity C_µ is the information in the internal causal states S and comprises both χ and E. For a review of information measures and diagrams, refer to the citations given in the text or quickly read the first portions of Sec. IX.

We are also interested in the range required to "learn" the crypticity. This is the cryptic order.

Definition 2. A process's cryptic order k is defined as:

    k = min{ L ∈ Z+ : H[S_L | X_{0:}] = 0 } .

These definitions do not easily admit an intuitive interpretation. Their connection to hidden stored information is not immediately clear, for example. They mask the importance and centrality of the crypticity property. Given this, we devote some effort in the following to motivating them and to giving several supplementary interpretations.

As a start, Fig. 1 gives a graphical definition of crypticity using the ε-machine information diagram of Ref. [8]. It is the red crescent highlighted there, which is the state information C_µ = H[S] minus that information derivable from the future H[X_{0:}]. This begins to explain crypticity as a measure of a process's hidden-ness. We'll return to this, but first let's consider several other alternatives.

III. CRYPTICITY: FROM STATE PATHS TO SYNCHRONIZATION

Crypticity and, in particular, cryptic order have straightforward interpretations when one considers the internal state-paths taken as an observer synchronizes to a process [13]. In this, cryptic order is seen to be analogous to, and potentially simpler than, a process's Markov order. While both the Markov and cryptic orders derive from a notion of synchronization, the cryptic order depends on a subset of the paths realized during synchronization. We illustrate this via an example: the (R, k)-Golden Mean Process---a generalization of the Golden Mean Process with tunable Markov order R and tunable cryptic order k. In particular, we examine the (3, 2)-Golden Mean Process shown in Fig. 2.

FIG. 2. The (3, 2)-Golden Mean ε-machine, with states A, B, C, D, E and binary transition symbols: Markov order 3 and cryptic order 2.

It is straightforward to verify that the only words of length 3 generated by this process are {000, 001, 011, 100, 110, 111}. Since the process is Markov order 3 (by construction), we know that each of these words is a synchronizing word [14]. Some words lead to synchronization in fewer than three steps, though.
For instance, 011 yields synchronization to state E after just the first two symbols 01. In Fig. 3, we display the internal-state paths taken by each possible initial state under evolution governed by the six synchronizing words.

Let's take a moment to describe these illustrations carefully. Before reading any word, there is maximum uncertainty in the internal state. We represent this using a circle for each of the five causal states of the ε-machine. Each of these states is led to a next state by following the first symbol seen [15]. For word 001, the first symbol is 0, and A, for instance, is led to B. Notice that E is not led to any state. This is because E has no outgoing transition on symbol 0. The path from E, therefore, ends and is not considered further. The termination of paths is one of the important features of synchronization to note.

Looking at the synchronizing word 100, we see that the transition on the first symbol 1 takes both states A and E to the same state A. Since we use unifilar presentations (ε-machines), this merging can never be undone. Path merging is yet another important feature.

Both the termination and merging of paths are relevant to synchronization, but they have different roles in the determination of the Markov and cryptic orders.

Although we already know the Markov order of this process, we can read it from Fig. 3 by looking at the length, for each word, at which only one path remains. These lengths {3, 3, 2, 2, 2, 2} are marked with orange diamonds. The maximum value of these lengths is the Markov order (3, in this example).

FIG. 3. Synchronization paths for the (3, 2)-Golden Mean ε-machine: Each synchronizing word induces a set of state-paths, some of which terminate and some of which merge.

In the next illustration, Fig. 4, we keep only those paths that do not terminate early.
In this way, we remove paths that generally are quite long, but that terminate before having the chance to merge with the final synchronizing paths. We similarly mark, with green triangles, the lengths at which these reduced path sets have ultimately merged. Note that restricting paths can only preserve or decrease each length. Finally, in analogy to the Markov order, the maximum of these lengths {0, 0, 0, 1, 2, 2} is the cryptic order (2, in this example).

FIG. 4. The paths that are not terminated before the Markov order are highlighted in red. These are the paths relevant for the cryptic order. For each word, the contribution to the Markov order is still indicated by an orange diamond, whereas the contribution toward the cryptic order is indicated by a green triangle.

This demonstrates how crypticity relates to paths and path merging. It is a small step then to ask for a direct connection to co-unifilarity [9]: H[S_0 | X_0 S_1] = 0. In fact, there are three primary equivalent statements about a process: (i) its ε-machine being co-unifilar, (ii) its χ = 0, and (iii) its cryptic order k = 0. (Appendix C presents a proof of this equivalence in terms of entropy growth functions and includes the connection to cryptic order as well.) This exposes the elementary nature of the cryptic order as a property of synchronizing paths. Appendix B goes further to show that state-paths traced similarly, but in the reverse time direction, are the same as those singled out in the forward direction, as just done.

The remainder of this section offers different perspectives on crypticity, some of which are less strict, but which provide intuition and suggest its broad applicability.

A. Global versus Local

Imagine a synchronization task involving a group of agents.
The agents begin in different locations (states) and move to next locations based on the synchronization input they receive from a common controller. The goal is to provide a uniform input that causes (a subset of) the agents to arrive at the same location. This is reminiscent of a road-coloring problem. In many road-coloring contexts, only uniform-degree graph structures are investigated, largely due to theoretical tractability. However, real-world graphs are rarely of uniform degree. This means that some agents may receive instructions that they cannot carry out. These agents quit, and their paths are terminated. Assuming that the instructions are synchronizing for some subset of the agents (the instruction is a synchronizing word), the synchronization task will end with this subset of agents at the desired destination.

There are two ways in which we may view this process. One is global and corresponds to the Markov order, while the other is local and corresponds to the cryptic order.

If we monitor the entire collection of agents from a bird's eye view evolving under the synchronization input, we observe paths terminating and merging. Our global notion of synchronization is the point at which each path has either terminated or merged with every other valid path. This is clearly coincident with the description of Markov order previously described.

Alternatively, we monitor the collective by querying the agents after the task is complete. The unsuccessful agents, whose paths were terminated and who never arrived at the destination, cannot be queried. From this viewpoint, synchronization takes place relative to the group of agents that were not terminated. As locally interacting entities, they know the latest time at which an agent merged with their group---the group which ultimately synchronized.
Even after this event, there may be other agents still operating that will inevitably be terminated at some later time. This means that from the local (agent) perspective, synchronization may happen earlier than from the global (controller) perspective. We claim, based on this setting, that the cryptic order has a straightforward and physically relevant basis in the context of synchronization. Upcoming discussions, some more technical, will emphasize this point further, as well as demonstrate new results.

B. Mazes and Stacks

The Markov versus cryptic order distinction is relevant to any maze-solving algorithm [16]. Imagining the solution of a maze as a sequence of moves---left, right, or straight---we may write down a list of potential solutions (which must contain all actual solutions) by listing all 3^{N^2} sequences [17]. A brute-force algorithm tries all of these paths. Since we are interested in worst-case scenarios, many of the details (e.g., depth- versus breadth-first search) are not relevant. What is relevant is the object that the algorithm must maintain in memory or that it ultimately returns to the user.

An algorithm might try out each potential solution, feeding in each move sequentially and testing for either maze completion or termination (walking into a wall or a previously visited location) at each step. The end of each solution is marked with a length. When all solutions have been tried, this set of solutions and lengths is returned. While this is not a stationary stochastic process, we may think of the longest of these lengths as being similar to the Markov order. The speed and memory use of this algorithm are obviously improved by using a tree structure, but this does not affect the result we are interested in.

If we were only interested in paths that end in maze completion, an even more memory-conscious algorithm would realize that dead-ends in the tree could be removed.
One accomplishes this with a stack memory for the active-path tree branch. On reaching a nonsolving termination, the algorithm pops the end states until returning to the most recent unexplored option. This process continues recursively until the tree has been filled out. The relevant lengths are now the lengths of the maze-completing paths (all root-to-leaf paths), the longest of which is an analog of the cryptic order.

C. Transient versus Relaxed

Rather than using the global versus local distinction, we can think in terms of a dynamical view of synchronization. We might imagine a collection of ants attempting to create paths from a resource-rich region to their nest, or a watershed in the process of forming the transport network from collection regions to the main body of water. Until these networks develop, it is not clear which will become the important paths. A log not worth climbing over causes ants to make the effort less often, thereby dropping less pheromone, leading fewer ants to attempt this path, until finally it is empty. Similarly, slow water deposits more sediment and fills underused channels.

As these networks evolve from an initial transitory state to a relaxed state, the types of paths within the network and their synchronization properties change. In particular, while the early-time synchronization depends on the terminating paths, the later-time synchronization will not. In this dynamical picture, we see that a property akin to cryptic order emerges as the system evolves.

D. Naive versus Informed

It is only a small step from this dynamical picture to viewing these self-reinforcing systems as evolving from naive to informed states. Over time, a system "realizes" which paths are undesirable and quits them. Consider an individual learning to navigate a new city. She will experience a similar network evolution, where the pruning of dead-end paths is an intentional act.
This navigation structure also will tend to reflect the cryptic order.

E. Statistical Complexity versus Crypticity

In addition to describing the Markov and cryptic orders via a dynamical picture of synchronization, we can explore the same phenomenon with the associated entropies, a more statistical perspective.

Beginning with the global view, the entropy of the distribution over the set of all starting points is the state entropy H[S_0], commonly called the statistical complexity C_µ. By considering the initial state distribution conditioned on the removal of the terminating paths, we are left with only a portion of this entropy, and this is the crypticity χ [18]. As discussed, we might consider this removal a result of memory, relaxation, or prescience.

IV. CRYPTICITY THROUGH INFORMATION THEORY

The discussion above in terms of paths is relatively intuitive. The original conception, however, was not in terms of paths, but rather in terms of information-theoretic quantities. Information identities based on ε-machines are beginning to provide a growing set of interpretations, some more subtle and some more direct than others. The following will show that crypticity and cryptic order have diverse implications and also that even elementary information-theoretic quantities form a rich toolset.

A. Crypticity

The ε-machine causal presentation pairs up pasts with futures in a way appropriate for prediction. Since pasts can be different but predictively equivalent, this pairing operates on sets of pasts that, in turn, are equivalent to the causal states themselves. Furthermore, a single past can be followed by a set of futures. This is natural since the processes are stochastic. So, any past, or predictively equivalent group of pasts, is linked to a distribution over futures. Finally, these future distributions often overlap. As we will now show, crypticity is a measure of this overlap.
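The identity stated earlier, that χ is the difference between the stored information C_µ and the observed information E, can be checked numerically on a small example. The sketch below assumes the standard Golden Mean Process ε-machine; the transition probabilities, the stationary distribution, and all function names are our illustrative choices, not the paper's. It computes C_µ as the entropy of the causal-state distribution and estimates E from the block-entropy estimate H[X_{0:L}] − h_µ L.

```python
from math import log2

# Golden Mean Process epsilon-machine, assumed here purely as an
# illustration: from state A, emit 1 (prob 1/2, stay in A) or 0
# (prob 1/2, go to B); from B, emit 1 (prob 1, return to A).
TRANS = {"A": {"1": (0.5, "A"), "0": (0.5, "B")}, "B": {"1": (1.0, "A")}}
PI = {"A": 2 / 3, "B": 1 / 3}   # stationary causal-state distribution

def H(ps):
    return -sum(p * log2(p) for p in ps if p > 0)

# entropy rate: state-averaged uncertainty of the next symbol
h_mu = sum(PI[s] * H([p for p, _ in TRANS[s].values()]) for s in PI)

# statistical complexity: entropy of the causal-state distribution
C_mu = H(PI.values())

def block_entropy(L):
    """H[X_{0:L}]: entropy of the length-L word distribution."""
    dist = {(): dict(PI)}          # word -> {current state: joint prob}
    for _ in range(L):
        new = {}
        for w, states in dist.items():
            for s, p in states.items():
                for x, (px, s2) in TRANS[s].items():
                    d = new.setdefault(w + (x,), {})
                    d[s2] = d.get(s2, 0.0) + p * px
        dist = new
    return H(sum(states.values()) for states in dist.values())

# H[X_{0:L}] - h_mu L is a lower-bound estimate converging to E;
# this process is Markov order 1, so it has already converged at L = 1.
E = block_entropy(8) - 8 * h_mu
chi = C_mu - E
print(round(C_mu, 4), round(E, 4), round(chi, 4))  # -> 0.9183 0.2516 0.6667
```

For this machine the crypticity comes out strictly positive (χ = 2/3 bit), even though the process itself looks maximally simple: most of the stored state information never shows up in the past-future mutual information.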
Historically, it has taken some time to sort out the similarities and differences between various measures of memory. Eventually, two emerged naturally as key concepts: C_µ, the statistical complexity or information processing "size" of the internal mechanism; and E, the excess entropy, or the apparent (to an observer) amount of past-future mutual information. It has been recognized for some time [19, 20] that C_µ is an upper bound on E. The strictness of this inequality and the nature of the relationship between the two, however, were not significantly explored until Ref. [9]. The first simple statement [8] about crypticity in terms of information-theoretic quantities is that it is the quantifiable difference between two predominant measures of information storage: χ = C_µ − E.

Taking this view a bit further, since E is the amount of uncertainty in the future that one can reduce through study of the past, and C_µ is the amount of information necessary to do optimal prediction (using a minimal predictor), their difference is the amount of modeling overhead. One may object that a minimal optimal predictor should not require more information than will be made use of. In fact, it is known that many processes with large χ have nonunifilar representations that are much smaller [9]. What is not obvious is that this is simply a re-representation of the causal states as mixtures of the new states [11, 21]. In other words, the overhead is inescapable. This suggests a useful language with which to discuss stochastic processes---not only do we identify a process with an ε-machine, but we analyze the efficiency of these machines in terms of required resources.

For the following, we briefly invoke the reverse ε-machine, the causal representation of a process when scanned in reverse time, to extend our view of crypticity. (For details on reverse causal states, see Refs. [8, 11].)
Recall that forward causal states are built for prediction and, similarly, reverse causal states are built for retrodiction. We say they are "built" for these purposes in the sense that they are minimal and optimal, two desirable design goals.

Given this, it is somewhat surprising to see that forward causal states are better at retrodiction than reverse causal states. The information diagram in Fig. 5 illustrates this. We will now show that the degree to which this is true is precisely the forward process's crypticity.

FIG. 5. Crypticity as the degree to which forward causal states are better retrodictors than reverse causal states. The diagram shows H[X_{:0}], H[X_{0:}], C_µ^+ = H[S^+], C_µ^- = H[S^-], E, and the nested past blocks H[X_{-1:0}], H[X_{-2:0}], H[X_{-3:0}], H[X_{-4:0}].

Here, we write this difference in retrodictive uncertainty as follows:

    H[X_{-L:0} | S_0^-] − H[X_{-L:0} | S_0^+] ≥ 0 .

Then, this difference converges to χ:

    χ = lim_{L→∞} ( H[X_{-L:0} | S_0^-] − H[X_{-L:0} | S_0^+] ) .

We might wonder why the reverse causal states were not built to be better at their job. This is explained by the fact that the information input to the above constructs is not equivalent. The forward causal states are built from the past, while the reverse causal states are built from the future. It is no surprise, then, that the forward states can offer information about the pasts from which they were built. It is more interesting to consider why they do not maintain all of this information. This is because the forward states were designed for predicting a stochastic process, a goal for which maintaining information about the past offers diminishing returns.

Rather than comparing the function of two objects (forward and reverse causal states), we can compare two functions of the same object. In this light, the crypticity is the degree to which forward causal states are better at retrodiction than they are at prediction.
More precisely, we have:

    H[X_{0:L} | S_0] − H[X_{0:L} | S_L]
        = H[S_0 X_{0:L}] − H[X_{0:L} S_L]
        = H[S_0 | X_{0:L} S_L] − H[S_L | S_0 X_{0:L}]
        = H[S_0 | X_{0:L} S_L]
        ≥ 0 .

The first step follows from stationarity, the second appeals to an informational identity, and the next to unifilarity of the ε-machine. Similarly, this difference converges to χ:

    lim_{L→∞} H[S_0 | X_{0:L} S_L] = lim_{L→∞} H[S_0 | X_{0:L}] = χ .

Thus, crypticity is the amount of information that, although necessary for current prediction, must be erased at some future time.

B. Cryptic Order

Many of these statements about uncertainty can be rephrased in terms of length scales. The length scale associated with the crypticity is the cryptic order: the distance we must look into the past to discover the modeling overhead. Following our discussion of forward and reverse states, we can interpret cryptic order as the length at which the difference converges to χ:

    k = min{ L : H[X_{-L:0} | S_0^-] − H[X_{-L:0} | S_0^+] = χ } .

Stated differently, it is the length at which all advantage of a forward state over a reverse state as a retrodictor is lost. In other words:

    k = min{ L : H[X_0 | X_{1:L+1} S_{L+1}^+] = H[X_0 | X_{1:L+1} S_{L+1}^-] } .

Equivalently, cryptic order is the length at which a forward state's uncertainties in prediction and retrodiction equalize. More colloquially, it is the range beyond which a forward state is equally good at prediction and retrodiction, or:

    k = min{ L : H[X_L | S_0 X_{0:L}] = H[X_0 | X_{1:L+1} S_{L+1}] } .

As Sec. III suggested, the cryptic order k is closely analogous to the Markov order R. Here, we state the parallel formally:

    R = min{ L : H[S_L | X_{0:L}] = 0 } ,
    k = min{ L : H[S_L | X_{0:L}, X_{L:}] = 0 } .

Appendix A argues for the uniqueness of this parallel.

Cryptic order is the length of the longest noninferable state sequence. Given an infinite string of measurements
... x_{-2} x_{-1} x_0, one eventually synchronizes to a particular causal state [22], for any finite-state ε-machine. The same symbol sequence can then be used to retrodict states beginning at the point of synchronization. All but the earliest k states can be definitively retrodicted, regardless of which observed sequence (and resulting predictive state) occurs.

V. CRYPTICITY AND ENTROPY CONVERGENCE

It has become increasingly clear that entropy functions are useful characterizations of processes. Since a process is a bi-infinite collection of random variables [23], it typically is not useful to calculate the entropy of the entire collection. The alternative strategy is to analyze the entropy of increasingly large finite portions. The scaling, then, captures the system's bulk properties in the large-size (thermodynamic) limit, as well as how those properties emerge from the individual components.

These functions capture much of the behavior that we are interested in here. The block entropy H[X_{0:L}] was used to great effect in Ref. [6] to understand the way perceived randomness may be reformulated as structure, when longer correlations are considered. More recently, Ref. [12] used extended functions---the block-state entropy H[X_{0:L} R_L] and the state-block entropy H[R_0 X_{0:L}]---to explore the relationship between alternate presentations of a process and the information-theoretic measures of memory in a presentation.

We will borrow these two new entropy functions and turn them back on the canonical set of presentations, ε-machines, to expose the workings of crypticity. The result is a graphical approach that offers a more intuitive understanding of the results originally developed in Ref. [11]. Using this, we sharpen several theorems, discover new bounds, and pose additional challenges.
Block Entropy

The block entropy H[X_{0:L}] is the joint Shannon entropy of finite sequences. As it is treated rather thoroughly in Ref. [6], we simply recall several of its features. First, recall that X_{0:0} represents the random variable for a null observation and, since there is just one way to do this, H[X_{0:0}] = 0. As L increases, the block entropy curve is a nondecreasing, concave function that limits to the linear asymptote E + h_µ L, where E is the excess entropy and h_µ is the process entropy rate.

Given a block entropy curve, Markov processes are easily identified, since the curve reaches its linear asymptote at finite block length. That is, the Markov order R satisfies:

  R ≡ min { L : H[X_{0:L}] = E + h_µ L }.

Before reaching the Markov order, one has not discovered all process statistics and, so, new symbols appear more surprising than they otherwise would. Mathematically, this is formulated through a lower bound:

  H[X_L | X_{0:L}] ≥ h_µ, for all L.

Since the block entropy curve for Markovian processes reaches its asymptote at L = R and since the linear asymptote has slope equal to the entropy rate, we know that Markov processes attain the lower bound whenever L > R:

  H[X_L | X_{0:L}] = h_µ.

Finally, since the block entropy is concave and nondecreasing, it is bounded above by its linear asymptote. This naturally leads to a concave, nondecreasing lower-bound estimate for the excess entropy:

  E(L) ≡ H[X_{0:L}] − h_µ L.

Thus, E(L) ≤ E(L + 1) ≤ E and lim_{L→∞} E(L) = E.

B. State-Block Entropy

The state-block entropy H[R_0 X_{0:L}] is the joint uncertainty one has in a presentation's internal state R_0 and the block of symbols immediately following. Its behavior is generally nontrivial, but when restricted to ε-machines, its behavior is simple [12]. In that case, it refers to the process's unknown causal state S_0 and is denoted H[S_0 X_{0:L}].
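These block-entropy properties are easy to check numerically. The following sketch uses the Golden Mean Process (mentioned later in the text) under its standard two-state ε-machine with fair-coin branching; the specific matrices and numbers are our illustrative choice, not taken from the paper's own computations.

```python
import numpy as np
from itertools import product

# Golden Mean Process epsilon-machine (illustrative parametrization):
# from state A, emit 0 or 1 with probability 1/2 (a 0 leads to B);
# from state B, emit 1 with certainty, returning to A.
# T[x][i, j] = Pr(emit symbol x and move to state j | current state i).
T = {0: np.array([[0.0, 0.5],
                  [0.0, 0.0]]),
     1: np.array([[0.5, 0.0],
                  [1.0, 0.0]])}
pi = np.array([2/3, 1/3])   # stationary state distribution
h_mu = 2/3                  # entropy rate in bits (only state A branches)

def entropy(p):
    p = np.asarray([q for q in p if q > 0])
    return float(-(p * np.log2(p)).sum())

def block_entropy(L):
    """H[X_{0:L}]: joint Shannon entropy of length-L words."""
    word_probs = []
    for w in product((0, 1), repeat=L):
        v = pi.copy()
        for x in w:
            v = v @ T[x]            # propagate Pr(word so far, state)
        word_probs.append(v.sum())  # Pr(X_{0:L} = w)
    return entropy(word_probs)

# E(L) = H[X_{0:L}] - h_mu * L is the nondecreasing lower-bound
# estimate of the excess entropy E described above.
for L in range(1, 6):
    print(L, round(block_entropy(L) - h_mu * L, 6))
```

For this process the estimate is already exact at L = 1 (about 0.2516 bits), consistent with Markov order R = 1: the curve sits on its linear asymptote from the first step onward.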
Its simplicity is a direct consequence of the causal states' efficient encoding of the past. To see this, note that the differences of the state-block entropy curve, the rate at which it grows with block length, are constant:

  H[S_0 X_{0:L+1}] − H[S_0 X_{0:L}]
    = H[X_L | S_0 X_{0:L}]
    = H[X_L | S_{0:L} X_{0:L}]
    = H[X_L | S_L]
    = h_µ.

Here, we used the unifilarity property of ε-machines: H[S_{L+1} | S_L, X_L] = 0. So, given the causal state S_0, the block X_{0:L} of symbols immediately following it determines each causal state S_{0:L} along the way. Since causal states are sufficient statistics for prediction, the future symbol X_L depends only on the most recent causal state S_L and, finally, the optimality of ε-machines means that the next symbol can be predicted at the entropy rate h_µ. In other words, the state-block entropy for a process's ε-machine presentation is a straight line with slope h_µ and y-intercept H[S_0 X_{0:0}] = H[S_0] ≡ C_µ.

Note that H[S_0 X_{0:L}] ≥ H[X_{0:L}], with equality if and only if H[S_0 | X_{0:L}] = 0. Since conditioning never increases uncertainty, these two block-entropy curves remain equal from that point onward. This necessarily implies that they tend to the same asymptote. So, if the state-block entropy curve ever equals the block entropy curve, then the y-intercepts of the two curves must also be equal: C_µ = E. Stated differently, the two curves meet if and only if the process has χ = 0.

C. Block-State Entropy

Finally, we consider the block-state entropy H[X_{0:L} R_L], a measure of the joint uncertainty one has in a block of symbols and the presentation's subsequent internal state. Once again, our interest here is with ε-machines, and so we consider H[X_{0:L} S_L]. Unlike the state-block entropy, however, the behavior of this entropy is nontrivial.
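Before turning to that nontrivial behavior, the straight-line form C_µ + h_µ L of the state-block entropy can be verified numerically. The sketch below again uses our illustrative Golden Mean ε-machine (same assumed matrices as before, not the paper's numerics).

```python
import numpy as np
from itertools import product

# Illustrative Golden Mean epsilon-machine (our example parametrization):
# T[x][i, j] = Pr(emit x, go to state j | current state i).
T = {0: np.array([[0.0, 0.5], [0.0, 0.0]]),
     1: np.array([[0.5, 0.0], [1.0, 0.0]])}
pi = np.array([2/3, 1/3])
h_mu = 2/3
C_mu = float(-(pi * np.log2(pi)).sum())   # statistical complexity H[S]

def entropy(p):
    p = np.asarray([q for q in p if q > 0])
    return float(-(p * np.log2(p)).sum())

def state_block_entropy(L):
    """H[S_0 X_{0:L}]: joint uncertainty in the initial causal state
    and the L symbols that follow it."""
    probs = []
    for s in range(len(pi)):
        for w in product((0, 1), repeat=L):
            v = np.zeros(len(pi))
            v[s] = pi[s]               # clamp the initial state S_0 = s
            for x in w:
                v = v @ T[x]
            probs.append(v.sum())      # Pr(S_0 = s, X_{0:L} = w)
    return entropy(probs)

# Unifilarity makes the curve exactly linear: C_mu + h_mu * L.
for L in range(5):
    assert abs(state_block_entropy(L) - (C_mu + h_mu * L)) < 1e-9
print("state-block entropy is the line C_mu + h_mu * L")
```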
We recall a number of its properties and also establish the equivalence of the cryptic order definitions given in Refs. [9, 12]. Then, we provide a detailed proof of its convexity, as this does not appear previously.

The block-state entropy begins at C_µ when L = 0. As L increases, the curve is nondecreasing and tends, from above, to the same linear asymptote as the block entropy: E + h_µ L. Since the state-block entropy is C_µ + h_µ L and since C_µ ≥ E, we see that the state-block entropy curve is greater than or equal to the block-state entropy: H[S_0 X_{0:L}] ≥ H[X_{0:L} S_L]. Equality for L > 0 occurs if and only if the process has C_µ = E or, equivalently, χ = 0; then the curves are equal for all L.

Similarly, the block-state entropy is greater than or equal to the block entropy: H[X_{0:L} S_L] ≥ H[X_{0:L}]. We have equality if and only if H[S_L | X_{0:L}] = 0. Recall that the smallest such L is the Markov order R. So, the block-state entropy equals the block entropy only at the Markov order. Further, once the curves are equal, they remain equal:

  H[X_{0:L} S_L] = H[X_{0:L}]  ⇒  H[X_{0:L+1} S_{L+1}] = H[X_{0:L+1}].

This can be shown by individually expanding both H[X_{0:L+1} S_{L+1}] and H[X_{0:L+1}] to H[X_{0:L}] + h_µ. The interpretation is that the two curves become equal only at the Markov order and only after both curves have reached their linear asymptotes.

Reference [12] defined the cryptic order as the minimum L for which the block-state entropy reaches its asymptote. This is in contrast to the definition provided here and also in Ref. [9], which defines the cryptic order as the minimum L for which H[S_L | X_{0:}] = 0. We now establish the equivalence of these two definitions.

Theorem 1. H[S_L | X_{0:}] = 0  ⟺  H[X_{0:L} S_L] = E + h_µ L.   (1)

Proof.
  H[S_L | X_{0:}] = 0                                                  (2)
  ⟺ H[S_0 | X_{0:}] = H[S_0 | X_{0:L}, S_L]                           (3)
  ⟺ I[S_0 ; X_{0:}] = I[S_0 ; X_{0:L}, S_L]                           (4)
  ⟺ E = H[X_{0:L}, S_L] − H[X_{0:L}, S_L | S_0]                       (5)
  ⟺ H[X_{0:L}, S_L] = E + H[S_L | S_0, X_{0:L}] + H[X_{0:L} | S_0]    (6)
  ⟺ H[X_{0:L}, S_L] = E + h_µ L.                                      (7)

The step from Eq. (2) to Eq. (3) follows from Thm. 1 of Ref. [12]. In moving from Eq. (4) to Eq. (5), we used the prescience of causal states: E = I[S_0 ; X_{0:}] [20]. Finally, Eq. (6) leads to Eq. (7) using the unifilarity of ε-machines (H[S_L | S_0, X_{0:L}] = 0) and the fact that they allow for prediction at the process entropy rate: H[X_{0:L} | S_0] = h_µ L.

We obtain estimates for the crypticity χ by considering the difference between the state-block and block-state entropy curves:

  χ(L) ≡ H[S_0 X_{0:L}] − H[X_{0:L} S_L]                              (8)
       = h_µ L − H[X_{0:L} | S_L].                                    (9)

Ref. [9] showed that this approximation limits from below in a nondecreasing manner to the process crypticity: χ(L) → χ and χ(L) ≤ χ(L + 1) ≤ χ. This also provides an upper-bound estimate of the excess entropy:

  E ≤ C_µ − χ(L).

Combined with the lower-bound estimate the block entropy provides, one can be confident in estimates of the excess entropy.

The retrodictive error H[X_{0:L} | S_L] is the difference of the block-state entropy from the statistical complexity. It is also the difference of χ(L) from h_µ L. Furthermore, it follows from Ref. [12] that the asymptotic retrodiction rate [24] equals the process entropy rate:

  lim_{L→∞} H[X_{0:L} | S_L] / L = h_µ.

In a sense, this describes short-term retrodiction. As we will see in a moment, order-R spin chains are a class of processes that have no retrodiction error for a full R-block. The opposite class, in this sense, consists of processes with χ = 0, that is, the co-unifilar processes. These immediately begin retrodicting at the optimal rate, which is h_µ.
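The estimate χ(L) of Eq. (9) is straightforward to compute from labeled transition matrices. Here is a sketch, once more on our illustrative Golden Mean ε-machine (an assumed example, not the paper's own calculation); that process is 1-cryptic, so χ(L) should jump to χ at L = 1 and stay there.

```python
import numpy as np
from itertools import product

# Illustrative Golden Mean epsilon-machine (our example):
T = {0: np.array([[0.0, 0.5], [0.0, 0.0]]),
     1: np.array([[0.5, 0.0], [1.0, 0.0]])}
pi = np.array([2/3, 1/3])
h_mu = 2/3

def entropy(p):
    p = np.asarray([q for q in p if q > 0])
    return float(-(p * np.log2(p)).sum())

def block_state_entropy(L):
    """H[X_{0:L} S_L]: joint uncertainty in a block and the state after it."""
    probs = []
    for w in product((0, 1), repeat=L):
        v = pi.copy()
        for x in w:
            v = v @ T[x]
        probs.extend(v)           # v[j] = Pr(X_{0:L} = w, S_L = j)
    return entropy(probs)

def chi(L):
    """chi(L) = h_mu * L - H[X_{0:L} | S_L], following Eq. (9)."""
    C_mu = entropy(pi)            # H[S_L] = C_mu by stationarity
    return h_mu * L - (block_state_entropy(L) - C_mu)

print([round(chi(L), 6) for L in range(4)])
```

The printed sequence starts at χ(0) = 0 and is constant at χ = h_µ = 2/3 bit from L = 1 on, illustrating both the nondecreasing convergence χ(L) → χ and the bound E ≤ C_µ − χ(L).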
Finally, we establish the convexity of the block-state entropy, which appears to be new.

Theorem 2. H[X_{0:L} S_L] is convex upwards in L.

FIG. 6. Four-variable I-diagram for the block-state entropy convexity proof, over the variables H[X_{−1}], H[X_{0:L−1}], H[X_{L−1} S_L], and H[S_{L−1}], with the needed sigma-algebra atoms (α, β, γ, δ, ε, ζ) appropriately labeled.

Proof. Convexity here means:

  H[X_{0:L+1} S_{L+1}] − H[X_{0:L} S_L] ≥ H[X_{0:L} S_L] − H[X_{0:L−1} S_{L−1}].

Stationarity gives us:

  H[X_{−1:L} S_L] − H[X_{0:L} S_L] ≥ H[X_{−1:L−1} S_{L−1}] − H[X_{0:L−1} S_{L−1}].

Simplifying, we have:

  H[X_{−1} | X_{0:L} S_L] ≥ H[X_{−1} | X_{0:L−1} S_{L−1}].

We can use the I-diagram of Fig. 6 to help understand this last convexity statement. There, it translates into α + γ ≥ α + β or, since α ≥ 0:

  γ ≥ β.   (10)

Using the fact that the causal state is an optimal representation of the past, we have the following expressions that are asymptotically equivalent to the entropy rate h_µ:

  H[X_{L−1} S_L | S_{L−1}] = β + ε + δ + ζ
  H[X_{L−1} S_L | S_{L−1} X_{0:L−1}] = β + ζ
  H[X_{L−1} S_L | S_{L−1} X_{−1}] = ε + ζ
  H[X_{L−1} S_L | S_{L−1} X_{−1} X_{0:L−1}] = ζ.

The associations with the sigma-algebra atoms are readily gleaned from the I-diagram. Note that the various finite-L expressions for the entropy rate rely on the shielding property of the causal states and also on the ε-machine's unifilarity. Taken together in the L → ∞ limit, the four relations yield:

  ζ = h_µ and β = δ = ε = 0.

These, in turn, transform the convexity criterion of Eq. (10) into the simple statement that γ ≥ 0. Since γ = I[X_{−1} ; S_{L−1} | X_{0:L} S_L] is a conditional mutual information and, therefore, positive semidefinite, this establishes that the block-state entropy is convex.
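Theorem 2 can be spot-checked numerically: the successive differences of H[X_{0:L} S_L] should be nondecreasing in L. The sketch below does this on our illustrative Golden Mean ε-machine (an assumed example; the theorem itself holds for any ε-machine).

```python
import numpy as np
from itertools import product

# Illustrative Golden Mean epsilon-machine (our example):
T = {0: np.array([[0.0, 0.5], [0.0, 0.0]]),
     1: np.array([[0.5, 0.0], [1.0, 0.0]])}
pi = np.array([2/3, 1/3])

def entropy(p):
    p = np.asarray([q for q in p if q > 0])
    return float(-(p * np.log2(p)).sum())

def block_state_entropy(L):
    """H[X_{0:L} S_L] computed by summing over words and final states."""
    probs = []
    for w in product((0, 1), repeat=L):
        v = pi.copy()
        for x in w:
            v = v @ T[x]
        probs.extend(v)
    return entropy(probs)

H = [block_state_entropy(L) for L in range(7)]
diffs = [b - a for a, b in zip(H, H[1:])]
# Convexity (Theorem 2): each difference is at least the one before it.
assert all(d2 >= d1 - 1e-12 for d1, d2 in zip(diffs, diffs[1:]))
print("first differences:", [round(d, 6) for d in diffs])
```

For this process the differences step from 0 up to h_µ and stay there, a nontrivially convex profile matching the flat-then-linear curve described above.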
FIG. 7. The block entropy H[X_{0:L}], state-block entropy H[S_0 X_{0:L}], and block-state entropy H[X_{0:L} S_L] curves compared (word length versus entropy, with C_µ, E, χ, and χ(L) marked). The sloped dashed line is the asymptote E + h_µ L, to which both the block entropy and the block-state entropy asymptote. Finite Markov order and finite cryptic order are illustrated by the vertical dashed lines that indicate where the respective entropies meet the linear asymptote. The convergence of the crypticity approximation χ(L) to χ is also shown.

It will help to summarize the point that we have now reached. We used the various block entropy curves to synthesize much of our information-theoretic viewpoint of a process into a single representation, that shown in Fig. 7. We can amortize the effort to develop this viewpoint by applying it to a broad class of processes familiar from statistical mechanics.

VI. CRYPTICITY IN SPIN CHAINS

We first consider a subset of processes drawn from statistical mechanics known as one-dimensional spin chains. (For background, see Refs. [19, 25].) They are processes such that H[X_{0:R}] = C_µ. Using this, the simple geometry presented in Fig. 7 reveals that:

  χ(L) = h_µ L for 0 ≤ L ≤ R, and χ(L) = χ for L > R,   (11)

and this, in turn, implies that k = R. This can also be seen through Eq. (9). Recall that the block-state entropy is nondecreasing and begins at C_µ. Since spin chains have H[X_{0:R}] = C_µ, we know that the block-state entropy curve for spin chains must remain flat until L = R. Consequently, H[X_{0:L} | S_L] = 0 and χ(L) = h_µ L for L ≤ R. Notice that a nonvanishing H[X_{0:L} | S_L] gives a way to understand how χ(L) deviates from linear growth. That is, the nonlinearity of the approach of χ(L) to χ is exactly the coentropy H[X_{0:L} | S_L].
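The spin-chain signature of Eq. (11), namely zero retrodictive error H[X_{0:L} | S_L] for L ≤ R, is easy to exhibit numerically. The sketch below builds an order-2 Markov spin chain in the spirit of Fig. 8 (states are the last two symbols, full support); the transition probabilities are arbitrary illustrative values chosen by us, not parameters from the paper.

```python
import numpy as np
from itertools import product

# An order-2 spin chain: states are the last two symbols 00, 01, 10, 11.
# The emission probabilities below are arbitrary illustrative choices.
states = [(0, 0), (0, 1), (1, 0), (1, 1)]
idx = {s: i for i, s in enumerate(states)}
p1 = {(0, 0): 0.3, (0, 1): 0.6, (1, 0): 0.2, (1, 1): 0.8}  # Pr(emit 1 | state)

T = {x: np.zeros((4, 4)) for x in (0, 1)}
for (a, b), i in idx.items():
    for x in (0, 1):
        # Emitting x from state (a, b) leads to state (b, x).
        T[x][i, idx[(b, x)]] = p1[(a, b)] if x == 1 else 1 - p1[(a, b)]

# Stationary distribution: eigenvector of (T0 + T1)^T at eigenvalue 1.
M = T[0] + T[1]
evals, vecs = np.linalg.eig(M.T)
pi = np.real(vecs[:, np.argmin(np.abs(evals - 1))])
pi = pi / pi.sum()

def entropy(p):
    p = np.asarray([q for q in p if q > 1e-15])
    return float(-(p * np.log2(p)).sum())

def retrodiction_error(L):
    """H[X_{0:L} | S_L] = H[X_{0:L} S_L] - H[S_L]."""
    joint = []
    for w in product((0, 1), repeat=L):
        v = pi.copy()
        for x in w:
            v = v @ T[x]
        joint.extend(v)              # Pr(X_{0:L} = w, S_L = j)
    return entropy(joint) - entropy(pi)

# The state (x_{L-2}, x_{L-1}) determines the preceding block for
# L <= R = 2, so the retrodictive error vanishes there and chi(L)
# grows linearly; for L > 2 the error turns on.
for L in (0, 1, 2):
    assert retrodiction_error(L) < 1e-9
assert retrodiction_error(3) > 1e-6
print("flat block-state entropy through L = R = 2")
```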
This property amounts to a very simple test to determine whether a process is a spin chain. If one obtains a plot similar to Fig. 7 for the process in question, it is a spin chain if H[X_{0:L} S_L] runs flat from (0, C_µ) to (R, C_µ) and then follows E + h_µ L. That is, the block-state entropy curve is flat until it reaches its asymptote at L = k = R, at which point it tracks it.

Furthermore, given (i) the above proof, (ii) the convexity proof from Sec. V C, and (iii) the fact that k ≤ R, for a given E, h_µ, and R spin chains are seen to be maximally cryptic processes. By this we mean that, among all processes with a particular set of values for E, h_µ, and R, the process that maximizes χ is a spin chain. This implies that C_µ is also maximized.

FIG. 8. An order-2 Markov spin chain with full support.

FIG. 9. An order-2 Markov spin chain with partial support.

FIG. 10. An order-2 Markov process, but not a spin chain.

Figures 8 and 9 show two order-2 Markov spin chains. The first is a full-support order-2 Markov chain, while the second has only partial support. In fact, the latter process has the Golden Mean support, consisting of all bi-infinite sequences that do not contain consecutive 0s.

Figure 10 gives an ε-machine of similar structure to the spin chains just examined; while it is also an order-2 Markov process, it is not a spin chain. The reason is that one causal state (labeled "01, 11") is induced by two words: 01 and 11. This means that the correspondence between inducing words and causal states is broken. It is no longer a spin chain.

We close this section with a number of open questions about spin chains. The first two regard the structure of spin chains. If an ε-machine is a subgraph of an order-R Markov skeleton, then is it a spin chain?
That is, does the removal of an edge from a spin chain produce another spin chain? The intuition behind this question is straightforward: removing transitions disallows blocks, but it would not cause any block to be associated with a different state. A related question asks whether all spin chains are of this form.

The next two questions regard the transformation from a spin chain to any other process and vice versa. First, can any order-R Markov, order-k cryptic ε-machine be obtained by starting with an order-R Markov skeleton, reducing some probabilities to zero and adjusting others to cause state merging? Also, given an order-R Markov, order-k cryptic ε-machine, we can break the existing degeneracy so that H[X_{0:R}] = C_µ. How does the nonspin chain we started with compare with the spin chain we end up with?

VII. GEOMETRIC CONSTRAINTS

The geometry of the block entropy convergence illustrated in Fig. 7 can be exploited. In particular, as we will now show, a variety of constraints leads to further results on the allowed convergence behaviors the block and block-state entropy curves can express. Figure 11 depicts these results graphically.

FIG. 11. Constraints on entropy convergence, illustrated for a process that is order-5 Markov and order-4 cryptic. The blue region circumscribes where the block entropy curve can lie; the tan, where the block-state entropy may be. These and the discreteness of L lead to restrictions on allowed cryptic orders as well.

First, given the block entropy's concavity and its asymptote, one sees that the block entropy curve is contained within the triangle described by {(0, 0), (0, E), (R, H[X_{0:R}])}. We also know that the block entropy cannot grow faster than H[X_0], and this excludes the triangle {(0, 0), (0, E), (1, H[X_0])}.
The resulting allowed region is shown in light blue in Fig. 11.

Second and similarly, the block-state entropy's own properties require it to lie within the triangle described by {(0, C_µ), (χ/h_µ, C_µ), (R, H[X_{0:R}])}.

Third, since the entropy functions are defined only for discrete values of the word length L, we can go a little further than these observations. The block-state entropy cannot intersect the asymptote E + h_µ L at a noninteger L. Therefore, the small triangle {(⌊χ/h_µ⌋, C_µ), (χ/h_µ, C_µ), (⌈χ/h_µ⌉, E + ⌈χ/h_µ⌉ h_µ)} is excluded. The resulting allowed trapezoid is displayed in tan in Fig. 11.

Fourth, recalling the results on the block-state entropy, this exclusion means that processes with C_µ ≠ E + h_µ k, for some k, must have a degree of nonoptimal retrodiction. In short, they are prevented from being spin chains.

Finally, given a process that has cryptic order k, we see that C_µ ≤ E + h_µ k. A more detailed result then says that C_µ = E + h_µ k if and only if H[X_{0:k}] = C_µ. Moreover, such a process is Markov order-k; that is, it is a spin chain.

VIII. THE CRYPTIC MARKOVIAN ZOO

It turns out that there exist finite-state processes with all combinations of Markov and cryptic order, subject, of course, to the constraint that R ≥ k. These range from the zero-structural-complexity independent, identically distributed processes, for which R = 0 and k = 0, to few-state processes where either or both are infinite. (For a complementary and exhaustive survey see Ref. [26].) In practice, given what we now know about these properties, it is not difficult to design a variety of processes that fulfill a given specification.

Also noteworthy is how the introduction of the new crypticity "coordinate" affects our view of several well-studied examples. For instance, the Even Process [6] is one of the canonical finite-state, infinite-order Markov processes.
In the past, it was often thought of as representing both intractability and compactness. Now, though, we see that it is trivial in this respect, being 0-cryptic. The Golden Mean Process, one of the simplest (order-1 Markov) subshifts of finite type studied, is now seen as more sophisticated, being 1-cryptic. These and similar explorations naturally lead one to delve deeper to find extreme examples, such as the Nemo Process below, that are infinite in both cryptic and Markov orders. Again, see Ref. [26].

Figure 12 presents a crypticity-Markovity roadmap for the space of finite-state processes. Borrowing from the immediately preceding citations, it also displays a select few processes, using their ε-machines, to show concretely the full diversity of possible Markov and cryptic orders a finite-state process can possess. The green bar at k = 0 consists of all co-unifilar processes. The orange line contains all processes where the Markov and cryptic orders are identical, a subset of which are the spin chains. All other processes lie above this line. The Even Process is in the upper left corner. The Golden Mean Process (no consecutive 0s) is in the lower left. The ∞-cryptic, infinite-order Markov Nemo Process is in the upper right corner. Several of the other prototype ε-machines depicted illustrate (R, k)-parametrized classes of processes for which the Markov and cryptic orders can be selected arbitrarily.

FIG. 12.
The crypticity-Markovity roadmap for finite-state stationary processes: the range of possible Markov and cryptic orders (cryptic order on the horizontal axis, Markov order on the vertical), illustrated by a sample of processes depicted by their ε-machines. Lower left: the Fair Coin Process and all other IID processes. Upper left: the infinite-order Markov Even Process. Upper right: the Nemo Process. Left vertical (green) line: the co-unifilar processes.

IX. INFORMATION DIAGRAMS FOR STATIONARY PROCESSES

Information diagrams, or simply I-diagrams, are an important analysis tool when using information theory to analyze multivariate stochastic processes [27]. They are particularly useful when working with processes and, as we have already seen here, give a good deal of insight when the ε-machine presentation is employed [8, 11].

The essential idea is that there is a one-to-one correspondence between information-theoretic quantities (mutual informations and conditional and joint entropies) and measurable sets. Constructively, informational relationships and constraints are depicted via set-theoretic operations: joint entropies are set unions, conditional entropies correspond to set differences, mutual informations correspond to set intersections, and the like. The mathematical structure is a sigma algebra over the process's events (words). The noncomposite sets are the atoms of the sigma algebra, and their sizes are the magnitudes of the corresponding informational quantities. When depicted graphically, though, one often ignores magnitudes and, instead, focuses on the set-theoretic relationships.

Armed with simple and familiar rules, one can often accomplish several algebraic calculational steps on compound entropy expressions via a simple I-diagram and a small description. Perhaps more importantly, I-diagrams afford a visual calculus that lends a heightened intuition about complicated relationships among random variables.
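The central atom of these diagrams, the past-future intersection E = I[X_{:0}; X_{0:}], can be approximated with finite blocks as I[X_{−M:0}; X_{0:M}]. The sketch below does so for our illustrative Golden Mean ε-machine (an assumed running example, not the paper's numerics), using the stationarity identity I[X_{−M:0}; X_{0:M}] = 2 H[X_{0:M}] − H[X_{0:2M}].

```python
import numpy as np
from itertools import product

# Illustrative Golden Mean epsilon-machine (our example):
T = {0: np.array([[0.0, 0.5], [0.0, 0.0]]),
     1: np.array([[0.5, 0.0], [1.0, 0.0]])}
pi = np.array([2/3, 1/3])
h_mu = 2/3

def entropy(p):
    p = np.asarray([q for q in p if q > 0])
    return float(-(p * np.log2(p)).sum())

def block_entropy(L):
    probs = []
    for w in product((0, 1), repeat=L):
        v = pi.copy()
        for x in w:
            v = v @ T[x]
        probs.append(v.sum())
    return entropy(probs)

def mutual_info(M):
    """I[X_{-M:0}; X_{0:M}] = 2 H[X_{0:M}] - H[X_{0:2M}] by stationarity."""
    return 2 * block_entropy(M) - block_entropy(2 * M)

# For an order-1 Markov process the finite-block estimate already
# equals E at M = 1, mirroring the nested I-diagram circles.
E = block_entropy(1) - h_mu
for M in (1, 2, 3):
    assert abs(mutual_info(M) - E) < 1e-9
print("E =", round(E, 6))
```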
Figures 13 through 17 show how to make the preceding formal views of entropy convergence and its relationship to Markovity and crypticity more explicit and intuitive. The two large circles in each represent the past, via H[X_{:0}], and the future, via H[X_{0:}]. The excess entropy E = I[X_{:0} ; X_{0:}], being a mutual information, is their intersection. The I-diagrams show the nested dependence of the various information measures as one increases the block size and so increases the number of random variables. In the general multivariate case, this would lead to an explosion of atoms. However, due to the nature of processes and the ε-machine itself, many simplifications are possible. Figures 14-16 also depict the ε-machine's causal-state information, C_µ = H[S], as a circle entirely inside the past H[X_{:0}]. This is so since the causal states are a function of the past.

FIG. 13. Information diagram for an order-4 Markov process. Only the four most recent history symbols are needed to reduce as much uncertainty in the future as using the whole past would.

To start with the simplest case, Fig. 13 gives the I-diagram for an order-4 Markov process. As one expects, only the four most recent history symbols are needed to reduce as much uncertainty in the future as using the whole past would. Equivalently, as soon as the history contains four symbols, all of the shared information between the past and the future (the excess entropy E) is captured.

FIG. 14. The causal state overlaid onto an I-diagram for an order-4 Markov process. As drawn, no fewer than 4 history symbols are required to determine the causal state. The causal state, though, does not generally determine this length-four history.
Figure 14 then overlays the causal-state measure H[S]. In this, we see that no fewer than four history symbols are required to determine the causal state. Importantly, it is now also made explicit that causal states do not generally determine this history.

FIG. 15. An I-diagram for a process that is order-4 Markov, but order-3 cryptic. Four history symbols are required to determine the state, but only three are required if one conditions on the future.

Consider now the order-4 Markov, order-3 cryptic process of Fig. 15. As before, four history symbols are required to determine the state. But, as depicted, only three history symbols are required if one conditions on the future as well.

FIG. 16. The separation between Markov and cryptic orders can be widened: a Markov order-4, cryptic order-2 process.

Figure 16 demonstrates how the difference between the Markov and cryptic orders can be increased without bound. The I-diagram illustrates the sigma algebra for an order-4 Markov, order-2 cryptic process.

FIG. 17. The highly regular I-diagram for an order-4 spin chain.

Finally, Fig. 17 gives the I-diagram for an order-4 spin chain. Several features of spin chains are clearly rendered in this I-diagram. First, the shortest history that uniquely determines the state occurs at length 4. Specifically, as depicted, min { L : H[S_0 | X_{−L:0}] = 0 } = 4. And, at the same time, this length-4 history is itself uniquely determined by the causal state.

X. CONCLUSION

Crypticity, as the difference between a process's stored information and its observed information, is a key property. The fundamental definitions, Eqs.
(1) and (2), though, are not immediately transparent. However, they do lead to several interpretations that prove useful in different settings. Given this, our main goals were to explicate the basic notions behind crypticity and to motivate various of its interpretations. Along the way, we provided a new geometric interpretation of the cryptic order, established a number of previously outstanding properties, and illustrated crypticity by giving a complete analysis of spin chains.

More specifically, using state-paths, we introduced several new interpretations of crypticity that not only helped to explain the basic idea but also suggest future applications in distributed dynamical systems. We also gave a simple geometric picture that relates the cryptic and Markov orders. We established the equivalence between co-unifilarity and being 0-cryptic, as well as the convexity of the block-state entropy H[X_{0:L} S_L]. We derived several geometric constraints and drew out their implications for bounds on crypticity. These also led to an improved bound on the Markov order. Presumably, the bounds will help improve estimates of crypticity and cryptic order, in both the finite and infinite cases.

To give a sense of the relationship between cryptic and Markov orders, we gave a graphical overview classifying processes in their terms. In a complementary way, we introduced the technique of foliated information diagrams to analyze entropy convergence and the Markov and cryptic orders in terms of Shannon information measures and their now block-length-dependent sigma algebra.

To ground the results in a concrete and familiar class of processes, we analyzed range-R 1D spin chains in detail. We established their Markov order and showed that the block-state entropy H[X_{0:L} S_L] is flat for spin chains and that χ(L) = h_µ L for all L ≤ R.
From these properties one can determine whether or not a given process is representable as a spin chain: Is the R-block entropy equal to the statistical complexity? The properties also suggest what the processes in the neighborhood of a spin chain look like.

Finally, by way of making contact with applications to physics and computation, we close by briefly outlining the relationship between crypticity and dynamical irreversibility in physical processes [28]. Consider the morph map φ : S_0 → {X_{0:}}. A process's entropy rate controls the prediction uncertainty of this map: h_µ = lim_{L→∞} H[X_{0:L} | S_0] / L. Now, consider the state uncertainty determined by the inverse of the morph map, φ^{−1} : X_{0:} → {S_0}. This is already familiar: the crypticity controls this uncertainty, χ = lim_{L→∞} H[S_0 | X_{0:L}]. Just as the entropy rate is a process's rate of producing information, the crypticity is its rate of information loss or, what one can call, a process's information-processing irreversibility. And the latter, appropriately adapting Landauer's Principle [29], provides a lower bound on the energy dissipation required to support a process's irreversible intrinsic computation. We leave the full development of the thermodynamics of intrinsic computation, however, to another venue.

ACKNOWLEDGMENTS

This work was partially supported by NSF Grant No. PHY-0748828 and by the Defense Advanced Research Projects Agency (DARPA) Physical Intelligence Subcontract No. 9060-000709. The views, opinions, and findings contained in this article are those of the authors and should not be interpreted as representing the official views or policies, either expressed or implied, of DARPA or the Department of Defense.

Appendix A: Why Crypticity?

There are many ways to assemble information-theoretic quantities or, more specifically, information measures [27].
Why should one care ab out crypticit y and cryptic order? What makes them sp ecial? W e show that crypticit y stands out among reasonable alternative mea- sures by a rather direct comparison. It turns out that there are fewer information quan- tities than one migh t expect—at least few er interesting ones—o ver pasts, futures, and states. Let’s limit our- selv es to quantities that dep end on only a finite set of ob jects and require that we lo ok for a “1-parameter fini- tization” prop erty , based on block length. In this case, w e 16 can make an exhaustive list of the information measures and describ e each one. The list, at first, app ears long. But this length is illustrative of the fact that crypticit y and cryptic order really do capture a relatively unique pro cess prop erty . Everything else is either trivial, p eri- o dic, or Mark o v. T able I presen ts the list. It was assembled in a direct w ay b y systematically writing down alternative expres- sions ov er single v ariables, pairs of v ariables and their join t and conditional entrop y p ossibilities, o ver three v ariables, and so on. One could also consider enumer- ating only the relev an t sigma-algebra atoms. This, how- ev er, obscures parallels to existing quantities. In addition, alternatives suc h as H ( X − L :0 | X :0 ) are not included, since they are trivial. Nor were quantities such as H ( X :0 | X − L :0 ) added, although they could b e. Quan- tities along these lines would needlessly expand the list, to little b enefit. As elsewhere here, w e assume the state random v ari- able denotes a causal state. App endix B: Equiv alence of F orward and Rev erse Restricted State-Paths Wh y are the restricted state-paths the same in the forw ard and backw ard lattice diagrams of Figs. 3 and 4? Recall that a forward path is allow ed if Pr( X 0: L = w , S 1 = σ B |S 0 = σ A ) 6 = 0. Similarly , a backw ard path is allo wed when Pr( S 0 = σ A , X 0: L = w |S L = σ B ) 6 = 0. 
Since both causal states σ_A and σ_B have nonzero probability by definition of being recurrent, we see that we can state both cases as paths for which Pr(S_0 = σ_A, X_{0:L} = w, S_L = σ_B) ≠ 0.

[FIG. 18. Why forward and backward restricted paths are the same. State-paths over states A–E and symbols 0 and 1 are traced back from final states. Cf. Fig. 4.]

Figure 18 illustrates this by tracing state-paths backward through the machine, starting at each final state. Of course, since processes and their ε-machines are generally not co-unifilar, there will be splitting in these paths. For example, consider the paths that end in state A on a 1. A's predecessors on a 1 are states A and E. Note that this produces a different initial set of candidate state-paths when compared with those in light blue in Fig. 4. Now, eliminate all paths that do not trace back successfully along the entire word. Fig. 18 shows these remaining state-paths in red, and we see that they are the same as those in Fig. 4.

Information Measure                                Property Detected
--------------------------------------------------------------------
H[A]:
  H[X:0] = H[X−L:0]                                Periodic
  H[X0:] = H[X0:L]                                 Periodic
H[A|B]:
  H[S|X:0] = H[S|X−L:0]                            Markov
  H[S|X0:] = H[S|X0:L]                             Markov
  H[X0:|X:0] = H[X0:|X−L:0]                        Markov
  H[X:0|X0:] = H[X:0|X0:L]                         Markov
  H[X:0|S] = H[X−L:0|S]                            Periodic
  H[X0:|S] = H[X0:L|S]                             Periodic
H[A|BC]:
  H[S|X:0 X0:] = H[S|X−L:0 X0:]                    Cryptic Order
  H[S|X:0 X0:] = H[S|X:0 X0:L]                     Trivial
  H[X:0|S X0:] = H[X:0|S X0:L]                     Trivial
  H[X0:|X:0 S] = H[X0:|X−L:0 S]                    Trivial
  H[X:0|S X0:] = H[X−L:0|S X0:]                    Periodic
  H[X0:|X:0 S] = H[X0:L|X:0 S]                     Periodic
H[AB]:
  H[X:0 S] = H[X−L:0 S]                            Periodic
  H[S X0:] = H[S X0:L]                             Periodic
  H[X:0 X0:] = H[X−L:0 X0:]                        Periodic
  H[X:0 X0:] = H[X:0 X0:L]                         Periodic
H[AB|C]:
  H[X:0 S|X0:] = H[X−L:0 S|X0:]                    Periodic
  H[S X0:|X:0] = H[S X0:L|X:0]                     Periodic
  H[X:0 S|X0:] = H[X:0 S|X0:L]                     Markov
  H[S X0:|X:0] = H[S X0:|X−L:0]                    Markov
H[ABC]:
  H[X:0 S X0:] = H[X−L:0 S X0:]                    Periodic
  H[X:0 S X0:] = H[X:0 S X0:L]                     Periodic

TABLE I. Alternative information measures over the past, the future, and the causal state, and the property detected when each achieves its limit at finite block length L. As seen, almost all are either trivial, periodic, or detect the Markov property. Cryptic order stands out as unique.

Appendix C: Crypticity and Co-unifilarity

Here, we explore the equivalence of E = C_µ, co-unifilarity, and 0-crypticity using several results obtained in Ref. [11]. With a small modification, the latter results allow for a more straightforward proof that leads to a better understanding of these relations.

The "forward" argument is that χ(k) = 0 at some finite k implies crypticity vanishes at all lengths. First, we recall two results.

Corollary 6 [11]: If there exists a k ≥ 1 for which χ(k) = 0, then χ(j) = 0 for all j ≥ 1.

Proposition 3 [11]: lim_{k→∞} χ(k) = χ.

Combining Cor. 6 and Prop. 3, we have the following: If there exists a k ≥ 1 for which χ(k) = 0, then χ(j) = 0 for all j ≥ 1 and χ = 0.

The "backward" argument is that crypticity vanishing in the limit implies it vanishes at all finite lengths. Since χ(k) is nonnegative (a conditional entropy), nondecreasing (Prop. 2 [11]), and limits to χ (Prop. 3 [11]), we have that χ = 0 implies χ(k) = 0 for all k ≥ 0.

All that remains is to recall that co-unifilarity is identical to χ(1) = 0, and this establishes the desired chain of implications:

Co-unifilar ⇔ χ(1) = 0
           ⇔ ∃ k ≥ 1 : χ(k) = 0
           ⇔ χ(k) = 0 for all k ≥ 0
           ⇔ χ = 0
           ⇔ 0-cryptic.

The heart of the result falls in the middle.
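The backward path-tracing procedure of Fig. 18 and the co-unifilarity condition can be illustrated with a short script. The five-state machine below is a hypothetical stand-in: its transitions are invented for illustration, except for the one fact stated in the text, that state A's predecessors on a 1 are A and E. It is unifilar by construction, fails the co-unifilarity test precisely because of that shared predecessor pair, and its backward-traced surviving state-paths coincide with the forward ones.

```python
# Hypothetical unifilar machine over alphabet {0, 1}. Only the fact that
# A's predecessors on symbol 1 are A and E is taken from the text; the
# rest of the transition structure is invented for illustration.
# transitions[(state, symbol)] = next_state (one entry per pair: unifilar).
transitions = {
    ("A", "1"): "A",
    ("A", "0"): "B",
    ("B", "0"): "C",
    ("C", "1"): "D",
    ("D", "0"): "E",
    ("E", "1"): "A",
}

def is_co_unifilar(trans):
    """Co-unifilarity: for each symbol, every state has at most one
    predecessor on that symbol."""
    seen = set()
    for (state, symbol), nxt in trans.items():
        if (symbol, nxt) in seen:
            return False
        seen.add((symbol, nxt))
    return True

def forward_paths(trans, word):
    """All state-paths (s_0, ..., s_L) that read `word` left to right."""
    paths = []
    for start in {s for s, _ in trans}:
        path = [start]
        for sym in word:
            nxt = trans.get((path[-1], sym))
            if nxt is None:
                break
            path.append(nxt)
        else:
            paths.append(tuple(path))
    return set(paths)

def backward_paths(trans, word):
    """Trace candidate state-paths backward from every final state,
    keeping only those that survive the entire word (cf. Fig. 18).
    Non-co-unifilarity shows up as splitting: e.g., a path ending in A
    on a 1 branches back to both A and E."""
    preds = {}  # preds[(symbol, next_state)] = set of possible prior states
    for (state, sym), nxt in trans.items():
        preds.setdefault((sym, nxt), set()).add(state)
    states = {s for s, _ in trans} | set(trans.values())
    partial = [(final,) for final in states]
    for sym in reversed(word):
        partial = [(p,) + path
                   for path in partial
                   for p in preds.get((sym, path[0]), ())]
    return set(partial)

print(is_co_unifilar(transitions))  # False: A has two predecessors on 1
# Forward and backward restricted paths agree, as Fig. 18 illustrates.
print(forward_paths(transitions, "01") == backward_paths(transitions, "01"))
```

Because the machine is a toy, only the qualitative behavior carries over: backward tracing initially admits more candidate paths than forward tracing, and the elimination step restores the same surviving set.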
It shows us that any nontrivial zero in χ(k) is equivalent to the entire function, as well as χ itself, vanishing.

[1] J. P. Crutchfield, W. Ditto, and S. Sinha. Intrinsic and designed computation: Information processing in dynamical systems—Beyond the digital hegemony. CHAOS, 20(3):037101, 2010.
[2] G. E. Moore. Cramming more components onto integrated circuits. Electronics, 38(8):56–59, 1965.
[3] G. E. Moore. Progress in digital integrated electronics. Technical Digest 1975, International Electron Devices Meeting, IEEE, pages 11–13, 1975.
[4] G. E. Moore. Lithography and the future of Moore's law. Proceedings of the SPIE, 2437:1–8, May 1995.
[5] H. Atmanspacher and H. Scheingraber. Information Dynamics. Plenum, New York, 1991.
[6] J. P. Crutchfield and D. P. Feldman. Regularities unseen, randomness observed: Levels of entropy convergence. CHAOS, 13(1):25–54, 2003.
[7] J. P. Crutchfield and K. Young. Inferring statistical complexity. Phys. Rev. Lett., 63:105–108, 1989.
[8] J. P. Crutchfield, C. J. Ellison, and J. R. Mahoney. Time's barbed arrow: Irreversibility, crypticity, and stored information. Phys. Rev. Lett., 103(9):094101, 2009.
[9] J. R. Mahoney, C. J. Ellison, and J. P. Crutchfield. Information accessibility and cryptic processes. J. Phys. A: Math. Theo., 42:362002, 2009.
[10] R. G. James, C. J. Ellison, and J. P. Crutchfield. Anatomy of a bit: Information in a time series observation. Submitted, 2010. Santa Fe Institute Working Paper 11-05-XXX; arxiv.org:1105.2988 [math.IT].
[11] C. J. Ellison, J. R. Mahoney, and J. P. Crutchfield. Prediction, retrodiction, and the amount of information stored in the present. J. Stat. Phys., 136(6):1005–1034, 2009.
[12] J. P. Crutchfield, C. J. Ellison, J. R. Mahoney, and R. G. James. Synchronization and control in intrinsic and designed computation: An information-theoretic analysis of competing models of stochastic computation. CHAOS, 20(3):037105, 2010.
[13] J. P. Crutchfield and D. P. Feldman. Synchronizing to the environment: Information theoretic limits on agent learning. Adv. in Complex Systems, 4(2):251–264, 2001.
[14] A synchronizing word is a symbol sequence that induces one and only one causal state. This is in contrast with a minimal synchronizing word, of which no proper prefix is a synchronizing word.
[15] The fact that any state is led by any symbol to at most one next state is the property known as unifilarity, a direct consequence of the states being causal states.
[16] The maze example is not a stationary process, so there are some important differences. For instance, there can be words that terminate after the length of the longest maze solution.
[17] Assume an N by N maze. A nonintersecting solution cannot contain more instructions than there are locations within the maze.
[18] To be more precise, it is not so much that the statistical complexity is derived from considering all paths as it is derived from considering no paths.
[19] J. P. Crutchfield and D. P. Feldman. Statistical complexity of simple one-dimensional spin systems. Phys. Rev. E, 55(2):1239R–1243R, 1997.
[20] C. R. Shalizi and J. P. Crutchfield. Computational mechanics: Pattern and prediction, structure and simplicity. J. Stat. Phys., 104:817–879, 2001.
[21] J. R. Mahoney, C. J. Ellison, and J. P. Crutchfield. Information accessibility and cryptic processes: Linear combinations of causal states. 2009. arxiv.org:0906.5099 [cond-mat].
[22] N. Travers and J. P. Crutchfield. Asymptotically synchronizing to finitary sources. 2010. SFI Working Paper 10-06-XXX; arxiv.org:10XX.XXXX [XXXX].
[23] One might choose to consider the process to be a bi-infinite collection of symbols or consider including causal states as well. We could similarly consider reverse causal states.
[24] J. P. Crutchfield and C. R. Shalizi. Thermodynamic depth of causal states: Objective complexity via minimal representations. Phys. Rev. E, 59(1):275–283, 1999.
[25] D. P. Feldman and J. P. Crutchfield. Discovering noncritical organization: Statistical mechanical, information theoretic, and computational views of patterns in simple one-dimensional spin systems. 1998. Santa Fe Institute Working Paper 98-04-026.
[26] R. G. James, J. R. Mahoney, C. J. Ellison, and J. P. Crutchfield. Many roads to synchrony: Natural time scales and their algorithms. Submitted, 2010. arxiv.org:1010.5545 [nlin.CD].
[27] R. W. Yeung. A new outlook on Shannon's information measures. IEEE Trans. Info. Th., 37(3):466–474, 1991.
[28] C. J. Ellison, J. R. Mahoney, R. G. James, J. P. Crutchfield, and J. Reichardt. Information symmetries in irreversible processes. pages 1–32, 2011. Santa Fe Institute Working Paper 11-07-028 [cond-mat.stat-mech].
[29] R. Landauer. Dissipation and noise immunity in computation, measurement, and communication. J. Stat. Phys., 54(5/6):1509–1517, 1989.