Entropy-based Generating Markov Partitions for Complex Systems

Finding approximate GMPs for Complex Systems

Nicolás Rubido,¹ Celso Grebogi,² and Murilo S. Baptista²
1) Universidad de la República (UdelaR), Instituto de Física de Facultad de Ciencias (IFFC), Iguá 4225, Montevideo, Uruguay
2) University of Aberdeen (UoA), King's College, Institute for Complex Systems and Mathematical Biology (ICSMB), AB24 3UE Aberdeen, United Kingdom
(Dated: 27 November 2017)

Finding the correct encoding for a generic dynamical system's trajectory is a complicated task: the symbolic sequence needs to preserve the invariant properties of the system's trajectory. In theory, the solution to this problem is found when a Generating Markov Partition (GMP) is obtained, which is only defined once the unstable and stable manifolds are known with infinite precision and for all times. However, these manifolds usually form highly convoluted Euclidean sets, are a priori unknown, and, as happens in any real-world experiment, measurements are made with finite resolution and over a finite time span. The task gets even more complicated if the system is a network composed of interacting dynamical units, namely, a high-dimensional complex system. Here, we tackle this task and solve it by defining a method to approximately construct GMPs for any complex system's finite-resolution and finite-time trajectory. We critically test our method on networks of coupled maps, encoding their trajectories into symbolic sequences. We show that these sequences are optimal because they minimise the information loss and also any spurious information added. Consequently, our method allows us to approximately calculate the invariant probability measures of complex systems from observed data.
Thus, we can efficiently define complexity measures that are applicable to a wide range of complex phenomena, such as the characterisation of brain activity from EEG signals measured at different brain regions or the characterisation of climate variability from temperature anomalies measured at different Earth regions.

PACS numbers: 89.75.-k, 05.45.-a, 02.50.-r, 89.75.Fb
Keywords: Markov partitions, Shannon entropy, Information theory, Complex Systems

The use of measures from Information Theory for the analysis of complex systems requires the estimation of probabilities. In practice, these probabilities need to be derived from finite data sets, such as EEG signals coming from different brain regions, EKG signals coming from the heart, or temperature anomalies coming from different Earth regions. Respectively, the complex systems in these cases are the brain, the heart, and the Earth's climate, all of them composed of many dynamically interacting components. The main reason behind using measures from Information Theory to analyse complex systems is that these measures help to better understand and predict their behaviour and functioning. However, calculating probabilities from observed data is never straightforward; in particular, up to now, we lack practical ways to define them without losing useful (or adding meaningless) information in the process. In order to minimise these spurious additions or losses, we propose here a method to derive these probabilities optimally. Our method makes an entropy-based encoding of the measured signals, thus transforming them into easy-to-handle symbolic sequences containing most of the relevant information about the system dynamics. Consequently, we can find the Information Theory measures, or any other spatio-temporal average, that we seek when analysing a complex system.

I. INTRODUCTION

Complex systems are attracting attention at a breathtaking pace.
The reason is simple: natural and man-made systems are filled with such examples, where many units interact dynamically and are able to collectively self-organise. Examples are our brains, composed of billions of neurons interconnected in complex synaptic networks, or our power grids, composed of steady power plants, fluctuating renewable power sources, and (somewhat) randomly demanding consumers, all interconnected by a complex network of transmission lines. In general, it is important to understand and foresee the emerging collective behaviours that complex systems can exhibit, since this can help us to control them; for example, to prevent epileptic seizures or power blackouts. An important way to characterise these complex systems and their emerging behaviours is by using measures from Information Theory, which require the calculation of invariant probabilities from observed data, a process that is never trivial.

The invariant probability measure (IPM)¹⁻⁴, $\mu(\Gamma)$, of a complex system or a dynamical system is the probability measure, $\mu$, that is preserved under the system's equations of motion, and gives the probability density of finding the system at a given point in state space, $\Gamma$. Different statistical quantities can be defined in terms of IPMs, such as the average position of the system in state space ($\langle x \rangle = \int_\Gamma x \, d\mu$), the differential entropy⁵ ($H = -\int_\Gamma \log(\mu)\, d\mu$) or the Kolmogorov-Sinai entropy⁶, which are measures of the system's average unpredictability/information content, or the Lyapunov exponents ($\lambda = \int_\Gamma \log|DF(x)|\, d\mu$, where $DF(x)$, $x \in \Gamma$, is the system's Jacobian), which measure the system's chaoticity. Nonetheless, we can rarely derive or guess the exact IPM for higher-dimensional systems ($D > 1$), since this requires infinite precision for all times, which is completely impractical.
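As an illustration of an IPM average (our addition, not part of the original analysis), the Python sketch below estimates the Lyapunov exponent of the isolated logistic map $f(x) = 4x(1-x)$ as a time average of $\log|f'(x)|$ along a single trajectory. For an ergodic map this time average approximates $\int \log|f'(x)|\, d\mu$, whose exact value for $r = 4$ is $\ln 2$. The function name, seed, and iteration counts are illustrative assumptions.

```python
import math
import random

def lyapunov_logistic(r=4.0, n_iter=50_000, transient=1_000, seed=0):
    """Estimate the Lyapunov exponent of x -> r x (1 - x) as the time average
    of log|f'(x)| = log|r (1 - 2x)| along one trajectory (ergodic average)."""
    x = random.Random(seed).random()
    for _ in range(transient):              # discard transient iterates
        x = r * x * (1.0 - x)
    total = 0.0
    for _ in range(n_iter):
        total += math.log(abs(r * (1.0 - 2.0 * x)))
        x = r * x * (1.0 - x)
    return total / n_iter

print(lyapunov_logistic(), math.log(2))
```

With a fixed seed the estimate should land within a few hundredths of $\ln 2$; a finite-precision trajectory is only a proxy for a true orbit, so the agreement is statistical rather than exact.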
From a practical point of view, however, it is advantageous to derive a coarse-grained discrete IPM by making finite-resolution observations, only during a finite time interval, and on a projected lower-dimensional space, from which all relevant statistical quantities can still be well estimated. Thus, instead of dealing with a continuous IPM, we need to transform the system's trajectories into finite symbolic sequences drawn from a finite alphabet¹⁻⁴ and then find a discrete IPM for the symbols' probabilities.

Encoding a trajectory into a symbolic sequence without adding meaningless (or losing important) information is only achieved once a Generating Markov Partition (GMP) is defined²⁻⁴. The reason is that a GMP preserves the flow's invariant properties and divides the state space into a complete set of disjoint regions, namely, it covers the whole state space. Specifically, a GMP encoding has a one-to-one relationship with the system's trajectory (i.e., each symbolic sequence is specific to each initial condition), contains the maximum amount of information that any trajectory encoding can have (i.e., maximises the sequence's entropy), has a minimum number of symbols (i.e., any larger number of symbols that could be used is already contained in the encoding, which is the partition's generating character, and defines a finite alphabet), and results in a symbolic sequence that is memoryless, namely, Markovian⁷. Consequently, a GMP encodes a deterministic trajectory into a symbolic sequence that behaves as if it were generated by independent random sources and contains all the relevant information.
This important memoryless character of the symbolic sequence is what allows us to use measures from Information Theory, such as the Shannon Entropy (SE), since these measures are typically only defined in terms of random sources⁵⁻⁷ (for a non-random source, the SE is an upper bound for its information content). Therefore, if a non-Markovian encoding is used, the random character is lost: not only are these measures deficient, but infinitely long trajectories are needed. However, a GMP is correctly defined only after the system's invariant manifolds are known, and the invariant manifolds of nonlinear complex systems generally form highly convoluted sets⁸⁻¹³, thus requiring infinitely precise measurements for all times. In order to avoid calculating the unstable and stable manifolds, previous methods have obtained approximate Markov partitions for dynamical systems by searching for alternative approaches¹⁴. For example, Ref. 15 shows how to calculate partitions from the primary homoclinic tangencies of dissipative systems, later extended to conservative systems¹⁶. In Ref. 17, the authors show how to find partitions by locating the Unstable Periodic Orbits of chaotic systems¹⁸. There are also methods that approximate Markov partitions by finding a partition that generates a symbolic sequence that is unique and one-to-one with the trajectory¹⁹.

Here, we show how to find an approximate GMP for a complex system using finite resolution and finite time intervals. Our method uses optimally encoded data sequences that behave, from an informational point of view, in the same way as encoded sequences obtained from true GMPs. Namely, we provide an entropy-based methodology to obtain an optimal symbolic encoding that contains most of the relevant information about the system dynamics. From this encoding, spatio-temporal invariant averages can be estimated.
Our approach follows the lines of the method proposed in Ref. 20. There, a Markov memoryless representation of a system is constructed based on the assumption that the more accurate the partition is, the more predictive information it provides; hence, finer partitions could lower the uncertainty in future estimates. In contrast, our main idea here is to consider informational manifestations of a GMP, namely, a partition that makes the Shannon Information Rate (SIR) value [see Eq. (1)] constant and positive for any length of the symbolic sequence. Another entropy-based manifestation that reflects an encoding from a GMP is that the SIR for partitions of different orders (resolutions) remains invariant over an appropriate range of symbolic sequence lengths. Moreover, an optimal partition must extract as much information from the complex system as possible, that is, it must generate a Markov-like process. This is achieved by satisfying Eq. (2). In practice, to make our methodology accessible, we seek the maximisation and invariance of SIR values for a range of word lengths ($L$ values). Another condition, seen as a manifestation of a memoryless system observed over a lower-order partition, is that the Shannon Entropy (SE) equals the value at which the SIR is maximal and invariant. This implies that the SIR is invariant for any word length for that partition. Our partitions, however, did not satisfy this latter condition. Instead, we seek a partition that remains invariant while the SIR tends to an asymptotic value for increasing $L$ values. Hence, our method constructs the approximate GMP from these entropy-based conditions, rather than from the topology of the system's manifolds.
In order to validate our method, we use networks of coupled maps and encode their trajectories into symbolic sequences, showing that our results are optimal as they minimise the information loss and also any spurious additions.

II. METHODS AND MODEL

A. Generating Markov Partitions and symbolic encodings

We define a partition as the parting of a smooth dynamical system's domain into disjoint open regions (i.e., state-space regions). An encoding process uses these regions to define a shift-invariant constraint that acts on a finite dictionary, namely, each state-space region is associated with a particular symbol, thus creating a finite symbolic space from the dynamical system's state space. In other words, for each state-space region the encoding defines a symbol that is assigned to all trajectory points that fall within that region. Hence, given a partition, any trajectory can be encoded into a symbolic sequence.

For example, consider an invertible dynamical system, i.e., a set $X$ together with an invertible mapping $F: X \to X$, which defines an orbit through a given point (initial condition) $x \in X$, namely, $\{\ldots, F^{-2}(x), F^{-1}(x), x, F(x), F^{2}(x), \ldots\}$. Thus, point $x$ is represented by a bi-infinite sequence. Using a partition encoding means that a symbolic sequence is generated from this orbit. The symbolic sequence is formed by the successive disjoint regions, defined by the partition, that the orbit visits. Consequently, after the encoding, the orbit passing through point $x$ is represented by a bi-infinite symbolic sequence
$$\alpha(x) = \ldots \alpha_{-2}\,\alpha_{-1}\,.\,\alpha_{0}\,\alpha_{1}\,\alpha_{2} \ldots$$
where $\alpha_n \in S$ is the symbol (also known as a letter) associated with the partition region where the $n$-th iterate, $F^n(x)$, falls, $S$ is the dictionary (i.e., the finite alphabet resulting from all the disjoint regions that the partition defines), and the "." marks where the sequence starts, i.e., it precedes $\alpha_0$. For non-invertible mappings, the orbit and the symbolic sequence are one-sided infinite sequences instead, namely, $\{x, F(x), F^2(x), \ldots\}$ and $\alpha(x) = \alpha_0\alpha_1\alpha_2\ldots$, respectively.

By encoding a deterministic trajectory into symbols, we gain the following: instead of having real-valued iterates or trajectory points (with possibly infinitely precise observations, i.e., a continuum formalism) from which to analyse its statistics and find the invariant probability measure, we have a symbolic sequence of elements coming from a finite number of letters in an alphabet, i.e., $\alpha_n \in S$, which is significantly simpler to analyse. We can draw an analogy between this encoding approach and the Statistical Mechanics viewpoint of ideal particles inside a box. Instead of looking at each particle's trajectory, or that of the whole system, Statistical Mechanics is interested in the probability of having the particle (or the whole system) with a particular value of position and momentum. Hence, the time dependence of the particles' trajectories is lost, and only the trajectory's visits to the different state-space regions matter; namely, the description goes from a time viewpoint, which is deterministic, to a space viewpoint, which, for the case of Markov partitions, is memoryless.
In particular, for the ideal particles, if the values of all positions and momenta are considered to be discrete (because, for example, the measurement device has low resolution) instead of possibly taking any continuum value that satisfies the system's Hamiltonian, we should have a finite set of probabilities, one for each region of the state space corresponding to the given discretisation. This is analogous to the partitioning of the state space and the symbolic encoding that we propose and that GMPs perform.

After the trajectory is encoded into a symbolic sequence, we can find word statistics, namely, the statistics coming from a string of $L$ letters, i.e., a word, given by $\alpha_n^{n+L-1} \equiv \alpha_n \alpha_{n+1} \ldots \alpha_{n+L-1}$. For example, if $p(\alpha_0^{L-1})$ is the probability of having word $\alpha_0^{L-1}$, the Shannon Information Rate (SIR), $h$, is found by⁵,⁷
$$h = \lim_{L \to \infty} h_L, \qquad h_L \equiv -\frac{1}{L} \sum p\left(\alpha_0^{L-1}\right) \log p\left(\alpha_0^{L-1}\right), \tag{1}$$
where the summation is over all $|S|^L$ possible words of length $L$, and the logarithm is taken in base 2 if the unit is bits per symbol⁷. In other words, $h_L$ in Eq. (1) quantifies the average information per symbol that words of length $L$ carry in the symbolic sequence, which is an approximation to the Kolmogorov-Sinai entropy, namely, $h = -\lim_{L\to\infty} \frac{1}{L} \left\langle \log p\left(\alpha_0^{L-1}\right) \right\rangle$.

We note that, when a symbol generator is close to a random generator, for every $L$, the inter-symbol dependence is negligible and the information source is said to be memoryless. Inter-symbol independence is verified when the symbolic sequence is mixing; specifically, when, after a finite time lag, $\tau$, the joint probability of finding a symbol, $\alpha_n$, of a sequence at iteration $n$ and another symbol, $\alpha_{n+\tau}$, at iteration $n+\tau$, i.e., $p(\alpha_n; \alpha_{n+\tau})$, is (approximately) identical to the product of the probabilities of finding each symbol independently.
In other words, the symbols are uncorrelated after $\tau$ and
$$p(\alpha_n; \alpha_{n+\tau}) \simeq p(\alpha_n)\, p(\alpha_{n+\tau}). \tag{2}$$
For memoryless sequences, the SIR [Eq. (1)] is constant for any finite $L$ and identical to the Shannon Entropy⁵ (SE), $H(Y)$. The SE is defined for a random variable $Y$ (which could be the symbolic sequence obtained by a GMP encoding) with outcomes $y \in S$ (which could be letters from an alphabet) and probability $p(y)$ by
$$H(Y) \equiv -\sum_{y \in S} p(y) \log[p(y)] = -\langle \log(p) \rangle. \tag{3}$$
So, the SE is $h_{L=1}$, i.e., it equals the SIR for $L = 1$ [Eq. (1)]. Consequently, if one wants to apply Eqs. (1) and (3) under the same hypothesis as that used by Shannon in Ref. 5 (namely, random sources), the partition of a dynamical system's state space must encode any trajectory and initial condition such that the resulting symbolic sequences are all memoryless.

In order to encode any deterministic trajectory by means of a state-space parting and obtain a memoryless symbolic sequence that maximises the SE and SIR, we need a Generating Markov Partition (GMP). The reason is twofold. Firstly, a generating partition is unique and an order-$q$ partition can be used to generate one of order $(q+1)$²¹. This means that the dictionary (i.e., the different letters corresponding to the disjoint regions) derived from an order-$q$ partition, $S_q$, is contained in the dictionary from an order-$(q+1)$ partition, $S_{q+1}$. Hence, the smallest dictionary/partition order can be used. Similarly, this also implies that we can use the coarsest parting of the system's domain into disjoint regions. Secondly, a Markov partition maps the expanding [contracting] directions of the system's dynamics into expanding [contracting] directions of the symbolic space²⁻⁴.
Hence, using a GMP to encode trajectories results in bijective (each symbolic sequence maps one-to-one onto the observed trajectory; in particular, different initial conditions result in different symbolic sequences) and uncorrelated (memoryless) symbolic sequences with constant SIR (i.e., the SE production rate that a symbolic sequence has with respect to the partition order is constant) as we increase the word length, $L$, and the partition order, $q$.

We would like to stress that, if an order-$q$ partition obtained after proper maximisation of the SIR over all possible partition locations satisfies the Markovian property, given by Eq. (2), for a given word length $L$, the SIR obtained for the same partition but for word length $L+1$ will be the same. This can also be used to verify the Markovian property, not by fulfilling Eq. (2), but by seeking a plateau of the SIR values as a function of $L$; namely, by seeking the domain-parting locations that make the SIR maximal and invariant for different $L$.

B. Our encoding methodology

For a GMP encoding, the dynamical properties of the system are preserved and the memory between consecutive symbols is lost. Of course, almost any encoding will recover the full information of the dynamical system's trajectories when the partitions have infinite order, since this would define an infinite dictionary/alphabet. This is still an improvement, since infinite dictionaries are countable sets, while trajectory points in state space are uncountable. However, for practical matters, this is still useless. Thus, our aim is to recover the maximum (possible) amount of information with the least number of partitions, i.e., finite and small order-$q$ partitions, and a memoryless finite sequence, i.e., sequences of length $L$, to deal with the practical constraint that words of infinite length, as in Eq. (1), are infeasible.
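To illustrate Eqs. (1)–(3) and the SIR-plateau criterion with finite words, the sketch below encodes a trajectory of the isolated logistic map at $r = 4$ with an order-1 binary partition, either at 0.5 (the known order-1 GMP of this map, see Section III A) or at 0.7 (a deliberately shifted split). It is a minimal illustration, not the authors' code: it assumes plug-in (frequency-count) probabilities from one finite trajectory, reports $h_L$ with logarithms in base $|S| = 2$, and measures the largest deviation from the factorisation of Eq. (2) at lag $\tau = 1$. Function names, seed, and trajectory length are illustrative.

```python
import math
import random
from collections import Counter

def logistic_orbit(n, r=4.0, transient=1_000, seed=1):
    """n iterates of x -> r x (1 - x) after discarding a transient."""
    x = random.Random(seed).random()
    for _ in range(transient):
        x = r * x * (1.0 - x)
    orbit = []
    for _ in range(n):
        x = r * x * (1.0 - x)
        orbit.append(x)
    return orbit

def encode(orbit, threshold):
    """Order-1 binary partition: symbol 1 if x >= threshold, else 0."""
    return [1 if x >= threshold else 0 for x in orbit]

def sir(symbols, L, base=2):
    """Finite-L SIR h_L = H_L / L of Eq. (1), from plug-in word frequencies."""
    words = Counter(tuple(symbols[i:i + L]) for i in range(len(symbols) - L + 1))
    total = sum(words.values())
    h = -sum(c / total * math.log(c / total, base) for c in words.values())
    return h / L

def mixing_gap(symbols, tau):
    """max |p(a; b) - p(a) p(b)| over symbol pairs at lag tau, cf. Eq. (2)."""
    n = len(symbols) - tau
    joint = Counter(zip(symbols[:n], symbols[tau:]))
    single = Counter(symbols)
    total = len(symbols)
    return max(abs(joint[(a, b)] / n - single[a] * single[b] / total**2)
               for a in single for b in single)

orbit = logistic_orbit(100_000)
for threshold in (0.5, 0.7):
    s = encode(orbit, threshold)
    print(threshold, [round(sir(s, L), 3) for L in range(1, 9)],
          round(mixing_gap(s, 1), 4))
```

For the 0.5 split one should see a plateau, with $h_L$ close to 1 for every $L$ and a negligible mixing gap. For the 0.7 split, $h_1$ is close to the value of about 0.95 implied by the arcsine density, and $h_L$ decreases with $L$ instead of staying at its $L = 1$ value, while the mixing gap is clearly non-zero.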
Our partitions are marginal, since we divide the domain by means of planes, which implies that our encoding always defines a dictionary whose number of symbols is a power of 2. Specifically, our marginal partitions consist of orthogonally intersecting straight lines for two-dimensional systems, planes for three-dimensional systems, and hyperplanes for higher-dimensional systems. The reason is that, by using straight lines or planes to divide the state space into disjoint regions, we unequivocally discard all other state-space regions. As a result, the probability of having a trajectory lying within any particular region is computed independently of the rest of the regions, hence the term marginal. On the other hand, GMPs define complex partitions, generally involving highly convoluted borders between regions. Thus, despite GMPs always being the correct way to observe any dynamical system, extracting all relevant information without losing any, their construction from data is usually impossible, especially when dealing with experimental systems.

Recently, different methods have been proposed to achieve this goal without relying on GMPs. For example, in Ref. 23, the authors introduce a measure based on the sequential ordering of the elements in the data series, namely, ordinal patterns, which have proven useful in various fields²⁴⁻²⁶. The symbolic sequence is then found in a natural way without any assumptions about the model. This avoids the difficult problem of finding the right GMP²¹; however, the generating and bijective properties of GMPs are lost. This loss is also seen in different threshold-crossing analyses, as pointed out in Ref. 22.
Other methods have been proposed that preserve the bijective and generating character, such as using higher-order partitions²⁶⁻²⁸, computing unstable periodic orbits¹⁷,¹⁸, defining a symbolic shadowing¹⁹, or finding symbolic nearest neighbours²⁹.

In our case, we find approximate GMPs using marginal partitions, as exemplified in Fig. 1 for a two-dimensional state space. There, we start by arbitrarily dividing the system's domain into 4 regions (shaded quadrants in Fig. 1). This division corresponds to an order-1 marginal partition that defines 4 symbols, i.e., a dictionary $S_1 \equiv \{\alpha^{(1)}_1 = \alpha, \alpha^{(1)}_2 = \beta, \alpha^{(1)}_3 = \chi, \alpha^{(1)}_4 = \delta\}$ (shown in the lower left corner of Fig. 1). The border between these regions (thick dashed lines), namely, the division placement, is shifted until the mixing and information properties of the resulting symbolic sequence are optimal (continuous lines), as we explain in what follows. In particular, the shift is done first on the horizontal dashed line, $y^{(1)}$ (across the interval), and later on the vertical dashed line, $x^{(1)}$. For each division placement, the particular trajectory being encoded is transformed into a symbolic sequence, for which the SE, $H(x^{(1)}, y^{(1)})$, and the SIR values, $h_L(x^{(1)}, y^{(1)})$, for different $L$ are found. Then, the optimal partition is chosen among all these sequences by taking the one yielding a maximum SIR value [Eq. (1)] with invariant $L$ characteristics and a valid SE [Eq. (3)], namely, the optimal memoryless and Markovian sequence.
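A minimal brute-force sketch of this order-1 search follows, for two identical coupled logistic maps [Eq. (4) with $N = 2$, $r = 4$, and $\epsilon = 0.10$, the setting of Section III A]. It is our illustration, not the authors' code: it scans thresholds on a grid with spacing $1/R$, encodes every point into one of 4 quadrant symbols, and keeps the location that maximises the finite-$L$ SIR, with logarithms in base 4 so that the maximum possible value is 1. For simplicity it scans all $(x, y)$ threshold pairs jointly rather than shifting the horizontal line first and the vertical one afterwards; the grid resolution $R$, word length $L$, trajectory length, and seed are assumptions chosen for speed.

```python
import math
import random
from collections import Counter

def coupled_logistic(n, eps=0.10, r=4.0, transient=1_000, seed=2):
    """Points (x_1, x_2) of Eq. (4) with N = 2, A_12 = A_21 = 1 (so k_i = 1)."""
    rng = random.Random(seed)
    x1, x2 = rng.random(), rng.random()
    points = []
    for t in range(transient + n):
        f1, f2 = r * x1 * (1.0 - x1), r * x2 * (1.0 - x2)
        x1, x2 = (1 - eps) * f1 + eps * f2, (1 - eps) * f2 + eps * f1
        if t >= transient:
            points.append((x1, x2))
    return points

def quadrant_symbols(points, xc, yc):
    """Order-1 marginal partition: one of 4 symbols (0..3) per point."""
    return [2 * (x >= xc) + (y >= yc) for x, y in points]

def sir(symbols, L, base=4):
    """Finite-L SIR h_L = H_L / L [Eq. (1)], plug-in estimate, log base |S_1| = 4."""
    words = Counter(tuple(symbols[i:i + L]) for i in range(len(symbols) - L + 1))
    total = sum(words.values())
    return -sum(c / total * math.log(c / total, base) for c in words.values()) / L

def best_order1_partition(points, R=20, L=2):
    """Scan thresholds k/R, k = 1..R-1, on both axes.
    Returns (h_L, xc, yc) for the location with the largest h_L."""
    grid = [k / R for k in range(1, R)]
    return max((sir(quadrant_symbols(points, xc, yc), L), xc, yc)
               for xc in grid for yc in grid)

points = coupled_logistic(3_000)
print(best_order1_partition(points))
```

Repeating the scan for several $L$ and keeping a location whose maximum is stable across $L$ would implement the invariance requirement; the text reports a finer grid ($1/R = 0.02$) and longer trajectories.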
[Figure 1 panels: "variable order-1 partition" (thresholds $x^{(1)}_1$, $y^{(1)}_1$); "optimal order-1 partition" (at $x^{(1)}_{opt}$, $y^{(1)}_{opt}$); "variable order-2 partition" (thresholds $x^{(2)}_1$, $x^{(2)}_2$, $y^{(2)}_1$, $y^{(2)}_2$). Axes: X domain interval and Y domain interval, both from 0 to 1. Legend: order-1 (region) symbols $\alpha$, $\beta$, $\chi$, $\delta$; order-2 (region) symbols.]

FIG. 1. Approximate Generating Markov Partition (GMP) method depiction for a two-dimensional state space (filled square). The vertical and horizontal thick dashed lines show the order-1 marginal partition of the domain intervals as they shift positions, where the symbolic sequence (lower left corner) is found from the resulting quadrants (filled areas). The continuous lines represent the optimal position, which is the one with the highest Shannon Information Rate (SIR) [Eq. (1)]. Fixing the optimal order-1 partition, an order-2 partition approximation is sought in a similar way (fine dashed lines).

Once we fix the order-1 partition location approximately (continuous vertical and horizontal lines in Fig. 1 located at $x^{(1)}_{opt}$ and $y^{(1)}_{opt}$), we attempt to increase the partition order by making sub-divisions, namely, generating disjoint sub-domains from the former 4 quadrants. We repeat the former analysis with the resulting sequences of this new partition, which attempts to approximate an order-2 partition. In particular, this order-2 partition expands the alphabet from $|S_1| = 4^1$ to $|S_2| = 4^2$ possible symbols, which are shown in the lower left corner of Fig. 1 and represent the 16 disjoint regions.
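For the isolated logistic map at $r = 4$, a one-dimensional stand-in for the two-dimensional case of Fig. 1 (so $|S_1| = 2$ and $|S_2| = 4$), the order-2 step and the generating condition $h^{q=1}_{L=2} \simeq h^{q=2}_{L=1}$ can be checked numerically. The sketch assumes that the order-2 sub-division places its cuts at the two preimages of 0.5, $(1 \mp \sqrt{1/2})/2$; this is our choice for this particular map, whereas the method itself scans the order-2 locations just as it does for order 1. Function names and the seed are illustrative.

```python
import math
import random
from bisect import bisect_right
from collections import Counter

def logistic_orbit(n, r=4.0, transient=1_000, seed=3):
    """n iterates of x -> r x (1 - x) after discarding a transient."""
    x = random.Random(seed).random()
    for _ in range(transient):
        x = r * x * (1.0 - x)
    out = []
    for _ in range(n):
        x = r * x * (1.0 - x)
        out.append(x)
    return out

def encode(orbit, thresholds):
    """Marginal 1-D partition: symbol = number of thresholds <= x."""
    cuts = sorted(thresholds)
    return [bisect_right(cuts, x) for x in orbit]

def sir(symbols, L, base):
    """Finite-L SIR h_L = H_L / L [Eq. (1)], log base |S_q|."""
    words = Counter(tuple(symbols[i:i + L]) for i in range(len(symbols) - L + 1))
    total = sum(words.values())
    return -sum(c / total * math.log(c / total, base) for c in words.values()) / L

orbit = logistic_orbit(100_000)
order1 = encode(orbit, [0.5])                  # |S_1| = 2
a = (1 - math.sqrt(0.5)) / 2                   # assumed order-2 cuts: preimages of 0.5
order2 = encode(orbit, [a, 0.5, 1 - a])        # |S_2| = 4
print(sir(order1, 2, base=2), sir(order2, 1, base=4))
```

Both numbers should be close to 1 here: under the arcsine density each of the four order-2 cells carries probability 1/4, and each cell corresponds to exactly one two-letter order-1 word.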
We note that the SIR value for the order-1 marginal partition and words of length $L = 2$, $h^{q=1}_{L=2}$, has to be approximately equal to the SE value for the order-2 marginal partition, $H^{q=2}$, which is simply the SIR value for words of length $L = 1$, i.e., $h^{q=2}_{L=1}$. In other words, the generating character of our approximate Markov partition is revealed if $h^{q=1}_{L=2} \simeq h^{q=2}_{L=1}$, or when $L$ is a multiple of $q$. Otherwise, the sub-division process is continued until this condition is met.

We highlight that the attractiveness of symbolic analysis is that, for a one-dimensional system and a partition with spatial resolution 1/2 (namely, a trajectory that is encoded into a binary sequence using an order-1 partition), using words of length $L$ in Eq. (1) provides the same results as doing the analysis on trajectories with resolution proportional to $2^{-L}$. Moreover, if the partition is a GMP, then the analysis can be done with words as short as $L = 1$, i.e., with the SE. Hence, our method is aimed at finding a suitable $L$ such that the partition behaves as closely as possible to a GMP.

Although we detailed our method for a square-like two-dimensional state space as in Fig. 1, its generalisation to non-square domains and higher-dimensional state spaces is straightforward, as long as our previous considerations are met, namely, memoryless symbolic sequences with high SIR values and approximate generating characteristics. In particular, it is worth noting that our marginal partitions increase the number of symbols in the alphabet as $|S_q| = (2^D)^q$ every time we increase the order $q$ of the partition; namely, our marginal order-$q$ partitions divide a $D$-dimensional state-space domain into $2^{qD}$ disjoint regions. Moreover, for each order $q$, we have to test different locations for the marginalisation (as depicted by the dashed lines in Fig. 1), which correspond to the partition's location resolution. Let $R$ be the number of different locations we try per domain dimension in order to find an optimal order-$q$ partition, i.e., the partition location is defined with a $1/R$ precision. This implies that our brute-force computation scales as $O(R^D) \times O(|S_q|^L)$, where $R^D$ is the total number of different symbolic sequences resulting from the partition locations explored and $|S_q|^L$ is the number of different words of length $L$ that the order-$q$ partition generates and that have to be analysed to find the SIR and SE values. We acknowledge that our methodology can be improved by using optimisation schemes and/or GPU parallel computations, but standard CPUs suffice for the analysis of our current results.

We summarise in the following the main steps, concepts, and ideas behind our methodology. (i) Find a lower-order partition (order-1 in this work) that maximises the SIR for large $L$ values. If there is a range of $L$ values for which the maximal value of the SIR (over many possible marginal partitions) remains invariant, the partition is generating in the time sense and for that range. This is also a manifestation of a partition with a Markovian characteristic (i.e., fulfilling Eq. (2)). (ii) Calculate the SE for the partition obtained in (i) for which the SIR is maximal (over marginal partition locations) and invariant (over a range of $L$ values). If the SE equals the invariant value of the SIR for a range of $L$ values, the partition has Markovian properties. Instead of doing this, we seek a partition whose location remains invariant as the maximum value of the SIR (over all possible marginal partitions) tends asymptotically to a constant value as $L$ is increased.
This is done because the Markovian property only emerges for larger values of $L$, likely a consequence of using marginal partitions instead of respecting the invariant manifolds. (iii) After finding a lower-order partition that is generating in the time sense and has a Markovian property, find a higher-order partition (order-2 in this work) by splitting the lower-order partition into sub-intervals. The generating (in the time sense) and Markovian properties of this new division can be tested using conditions (i) and (ii), respectively. If the maximal value of the SIR for large $L$ values remains invariant with respect to the values obtained for the lower-order partition, the partition is generating in the space sense. From the perspective of Information Theory, a generating partition is one whose encoding neither generates nor destroys information.

C. Our complex system model

In order to test our method, we analyse a particular complex system: a set of one-dimensional coupled maps following the Kaneko coupling type³⁰. Hence, our system is $N$-dimensional (because it has $N$ one-dimensional maps) and evolves according to the recursive relationship
$$x_i^{(t+1)} = (1-\epsilon)\, f_i\!\left(x_i^{(t)}\right) + \epsilon \sum_{j=1}^{N} \frac{A_{ij}}{k_i}\, f_j\!\left(x_j^{(t)}\right), \tag{4}$$
where $x_i^{(t)}$ [$x_j^{(t)}$] is the $i$-th [$j$-th] map's state at discrete time $t$, $f_i(x_i^{(t)})$ [$f_j(x_j^{(t)})$] is the $i$-th [$j$-th] map's function, $\epsilon$ is the coupling strength, and $A_{ij}$ is the adjacency matrix of the network. Specifically, $A_{ij}$ is the $ij$-th entry of a binary matrix, which is 1 [0] if a link connecting nodes $i$ and $j$ is present [absent]. In particular, the edge density of the network is $\rho \equiv \sum_{i,j} A_{ij} / [N(N-1)]$ and the node degree is $k_i \equiv \sum_j A_{ij}$ (i.e., the number of neighbours of node $i$).
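Equation (4) translates directly into one synchronous update of all $N$ maps. The sketch below is a minimal pure-Python version, assuming a dense 0/1 adjacency matrix given as nested lists, logistic local maps $f_i(x) = r_i x(1-x)$ as in the working example, and nodes with at least one neighbour (so that $k_i > 0$); the function names are hypothetical.

```python
import random

def kaneko_step(x, A, r, eps):
    """One synchronous update of Eq. (4) with logistic f_i(x) = r_i x (1 - x)."""
    fx = [ri * xi * (1.0 - xi) for ri, xi in zip(r, x)]       # f_i(x_i^(t))
    new_x = []
    for i, row in enumerate(A):
        k_i = sum(row)                                          # degree of node i (assumed > 0)
        coupling = sum(a_ij * f_j for a_ij, f_j in zip(row, fx)) / k_i
        new_x.append((1.0 - eps) * fx[i] + eps * coupling)
    return new_x

def edge_density(A):
    """rho = sum_{i,j} A_ij / (N (N - 1))."""
    n = len(A)
    return sum(map(sum, A)) / (n * (n - 1))

# Working example of the text: N = 2, A_12 = A_21 = 1, r_1 = r_2 = 4, eps = 0.10.
A = [[0, 1], [1, 0]]
rng = random.Random(5)
x = [rng.random(), rng.random()]
for _ in range(1_000):
    x = kaneko_step(x, A, r=[4.0, 4.0], eps=0.10)
print(x, edge_density(A))
```

Because each update is a convex combination of values of maps that send $[0, 1]$ into itself, the states stay in the unit interval for any $0 \le \epsilon \le 1$.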
To commence with a working example, we set $N = 2$ and let the maps be logistic, i.e., $f_i(x^{(t)}) = r_i x^{(t)} (1 - x^{(t)})$, with $r_i$ being the control parameter for the isolated dynamics of map $i$, $i = 1, 2$, and let the adjacency matrix be $A_{12} = A_{21} = 1$, $A_{11} = A_{22} = 0$. This sets a symmetric coupling between the maps, although the configuration can be heterogeneous if $r_1 \neq r_2$.

Our goal is to find an approximate GMP from informational measures to obtain a finite-resolution discrete Invariant Probability Measure (IPM), which provides an optimal encoding of the system and from which spatio-temporal invariant quantities can be estimated. This discrete IPM enables the estimation of the relevant statistical quantities, such as the Lyapunov exponents or the Kolmogorov-Sinai entropy. In order to do that, in what follows, we take $1.1 \times 10^5$ iterations of Eq. (4) and remove the first $0.1 \times 10^5$ iterations, which we consider a transient. Also, our initial conditions are randomly set.

III. RESULTS AND DISCUSSION

A. Identical logistic maps order-1 partition approximation

The IPM of an isolated logistic map in its chaotic regime, i.e., $r = 4$, is known³; its density is $d\mu/dx = 1/\left[\pi\sqrt{x(1-x)}\right]$. This IPM implies that most of the time the system is found close to the interval extremes. Similarly, the exact GMP is also known: the order-1 GMP divides the unit interval $[0, 1]$ in half. However, when two logistic maps are coupled, no expression for the IPM or the GMP is known. The reason is that, even in the case where both maps are identical and in their chaotic regime, the coupling deforms their isolated attractors, changing the IPM and GMP.

FIG. 2. The left panel shows the attractor for $N = 2$ identical logistic maps symmetrically coupled [Eq. (4)]. The coupling strength [map parameter] is set to $\epsilon = 0.10$ [$r = 4$ (chaotic regime)].
The initial condition is set randomly and 10 5 itera- tions are shown. The rectangles indicate particular symmetric areas of the attractor, whic h are sho wn on the resp ectiv e righ t panels. The dashed vertical and horizon tal lines sho w the order-1 generating Mark ov partition of the uncoupled maps. F or example, we see from Fig. 2 that there are state- space regions where there is a larger probabilit y of ﬁnd- ing the tra jectory of the coupled system, namely , the signalled rectangle areas that appear in the right panels in Fig 2. These areas show a higher complexity than the rest of the attractor, hence, w e notice that the order- 1 GMP for an isolated map (i.e., the interv al splitting in to tw o halves) could b e unhelpful. Namely , the order-1 GMP migh t ha v e to be mov ed to a diﬀerent place in each maps’ interv al, dep ending on the coupling strength v alue,  . Nev ertheless, in order to kno w where the best marginal partition should b e placed, we need to maximise the SIR v alue for increasing word-length L , as in Eq. (1). The imp ortance of using higher L v alues is seen when lo oking at the case with L = 1. F or L = 1, the SIR v alues are identical to the SE, as seen from Eqs. (1) and (3). Hence, the order-1 partition location that maximises the SIR of words with L = 1 is the same as the one that maximises the SE. As Fig. 3 shows in colour co de for  = 0 . 10, instead of locating the order-1 partition at 0 . 5 (v ertical and horizontal dashed lines in the left panel of Fig. 2) to achiev e maximum SE, the maximum is achiev ed when splitting the interv al at a higher p osition, namely , at 0 . 70 ± 0 . 02, as signalled b y the vertical and horizontal FIG. 3. Shannon Entrop y (SE) v alues [Eq. (3)] (colour co de) for the coupled maps shown in Fig. 2. The v alues are found from the symbolic sequence that results after dividing the state-space into 4 regions, i.e., 4 diﬀerent symbols enco de the coupled-system’s tra jectory . 
The division sets an order- 1 marginal partition and the symbolic sequence c hanges ac- cording to the partition’s lo cation. The maxim um SE is found for the partition that is signalled b y the vertical and horizon tal contin uous lines, where the dashed lines show a ± 1 / 50 = ± 0 . 02 resolution for its placement. con tin uous lines. The dashed lines are the resolution we use to deﬁne the partition placemen t, whic h in this case is 1 /R = 1 / 50 = 0 . 02. In general, the highest p ossible SE is 1 when using log 4 in Eq. (3) and a 4-symbol alphab et; instead of using bits and a 2-symbol alphab et. W e are unable to attain 1 (i.e., for a completely random source) for an y of our order-1 marginal partition lo cations, al- though our v alue is close to 1, i.e., max { H p =1 } = 0 . 98 when the split is done in b oth in terv als at 0 . 70 ± 0 . 02. The more we increase  , the more the system’s attrac- tor extension is reduced, hence, the less entropic the sys- tem is. F urthermore, it is known 31 that there is a critical coupling strength,  c , also v alid for other chaotic regimes (i.e., r < 4), where for  >  c the collectiv e dynamics of the t wo maps is coheren t and collapses to the diago- nal of the state-space. Th us, the p osition of the order-1 partition that maximises the SE clearly dep ends on  . Sp eciﬁcally , when the maps are identical,  c c hanges as a function of the map’s parameter as 31 ,  c ≡ 1 2  1 − e − λ iso ( r )  , (5) where λ iso ( r ) > 0 is the Ly apuno v exp onent of an isolated logistic map in a chaotic state, which depends on r . In particular, for r = 4, λ iso (4) = ln(2), hence,  c = 1 / 4. F or  >  c = 1 / 4, the system synchronises and the state- space dynamic is restricted to the diagonal line. It is w orth noting that this condition [Eq. (5)] holds for any n um b er N of all-to-all coupled maps if one changes the m ultiplying factor of 1 / 2 b y ( N − 1) / N . 
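Equation (5) can be checked numerically. The sketch below is an illustration, not the procedure of Ref. 31: it estimates $\lambda_{\rm iso}(r)$ as the orbit average of $\ln|f'(x)| = \ln|r(1-2x)|$ for an isolated logistic map and plugs it into Eq. (5), with the prefactor generalised to $(N-1)/N$ as stated above. The initial condition, transient length, and iteration count are arbitrary assumptions.

```python
import math

def lyapunov_logistic(r, n_iter=100_000, n_transient=1_000, x0=0.3):
    """Orbit average of ln|f'(x)| for the isolated logistic map f(x) = r x (1 - x)."""
    x = x0
    for _ in range(n_transient):                       # discard the transient
        x = r * x * (1.0 - x)
    total = 0.0
    for _ in range(n_iter):
        total += math.log(abs(r * (1.0 - 2.0 * x)))   # f'(x) = r (1 - 2x)
        x = r * x * (1.0 - x)
    return total / n_iter

def critical_coupling(lam, N=2):
    """Eq. (5) with prefactor (N - 1)/N; N = 2 recovers the factor 1/2."""
    return (N - 1) / N * (1.0 - math.exp(-lam))

lam = lyapunov_logistic(4.0)
print(f"lambda_iso(4) estimate: {lam:.3f} (ln 2 = {math.log(2.0):.3f})")
print(f"eps_c estimate: {critical_coupling(lam):.3f}")
```

Feeding the exact value $\lambda_{\rm iso}(4) = \ln 2$ into `critical_coupling` returns $1/4$ for $N = 2$ and $1/3$ for $N = 3$, so the critical coupling grows with the number of all-to-all coupled maps.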
However, maximising the SE is insufficient to guarantee a Markovian (memoryless) symbolic sequence with a generating character. This is possible only after maximising the SIR values and also contrasting successive orders of the partition such that they are generating. Otherwise, we are only finding the partition location that splits the state-space into disjoint regions where the system spends the same time. In that case, and for our working example, a 4-symbol alphabet with equally distributed appearance probabilities, $p(\alpha) = 1/4$, leads to a maximum SE given by [Eq. (3)]

$$H^{(p=1)}(\mathcal{S}_4) = -\sum_{\alpha=1}^{4} \frac{1}{4}\log_4\!\left(\frac{1}{4}\right) = -\log_4\!\left(\frac{1}{4}\right) = 1.$$

Consequently, the order-1 partition location is only defined after the SIR is maximised for increasing $L$.

B. Non-identical logistic maps order-1 partition approximation

The problem of finding the optimal order-1 partition placement becomes harder if heterogeneity is introduced into the system, as it breaks the symmetry between the maps. An example of this heterogeneous condition is shown in Fig. 4, where the same coupling strength, $\epsilon = 0.10$, and panel distribution as in Fig. 2 are used, but slightly different map parameters are set, i.e., $r_1 = 3.9$ and $r_2 = 4$. In this case, as Fig. 5 shows, the order-1 partition placement that maximises the SE (colour code) changes. The order-1 partition that results in the highest SE value, i.e., $H^{(p=1)} = 0.92$, is now obtained when dividing the interval of map 1 at 0.84 ± 0.02 and that of map 2 at 0.58 ± 0.02. Comparing Figs. 3 and 5, we see that the 4 state-space regions that encode the system's trajectory into 4 symbols have changed. Before, the divisions (vertical and horizontal lines in Fig. 3) crossed each other at the state-space diagonal. Now, the region for the left quadrants in Fig. 5 is larger than the respective regions in Fig. 3; thus, the crossing is below the state-space diagonal.

FIG. 4. The left panel shows the attractor for two heterogeneous logistic maps symmetrically coupled. As in Fig. 2, the coupling strength is set to 0.10 and both maps are set in their chaotic regime, but with slightly different parameters, namely, $r_1 = 3.9$ and $r_2 = 4$. The remaining parameters, panel distribution, and symbols are identical to Fig. 2.

The reason behind the shift in the order-1 partition position is seen from the attractor in Fig. 4: because symmetry is broken by the heterogeneous parameters, the system spends more time on the upper-diagonal portion of the attractor (see top right panel in Fig. 4) than on the lower-diagonal portion (see bottom right panel in Fig. 4). Hence, the partition that maximises the SE is now the one that balances this effect in the encoded trajectory. Given that one of the maps is set to $r_1 = 3.9$, the system is unable to reach the entire state space ($[0, 1] \times [0, 1]$) for any $\epsilon > 0$. Consequently, a unit SE (in base 4) is unattainable in this scenario, contrary to the former scenario where both maps were identical and in their chaotic regime, $r_1 = r_2 = 4$ (where the SE can be nearly 1). However, SE maximisation is an insufficient condition to fix the location of the order-1 marginal partition, as we show in what follows.

FIG. 5. Shannon Entropy (SE, colour code) as a function of the order-1 partition location for the coupled-map dynamics shown in Fig. 4. SE values and lines are found as in Fig. 3.

C. Shannon Information Rate (SIR) and Generating Markov Partition (GMP) approximations

Defining an approximate GMP in terms of SE maximisation does not guarantee the generating character of the partition.
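To make the difference between maximising the SE and maximising the SIR concrete, the following Python sketch encodes a two-map trajectory with an order-1 marginal partition (one split per map, hence 4 symbols), computes word entropies with a base-4 logarithm as in Eq. (3), and scans the split locations on a grid of spacing $1/R$. The exact form of Eq. (1) is given earlier in the paper, so the finite-$L$ SIR used here is an assumption: the block entropy per symbol, $H_L/L$, which reduces to the SE at $L = 1$ as the text requires; a conditional estimate $H_L - H_{L-1}$ would be an equally plausible reading. All function names are hypothetical.

```python
import numpy as np

def encode_order1(traj, splits):
    """Order-1 marginal partition: coordinate i is cut once at splits[i];
    the D resulting bits form one symbol in {0, ..., 2**D - 1}."""
    bits = (np.asarray(traj) >= np.asarray(splits)).astype(np.int64)  # shape (T, D)
    weights = 2 ** np.arange(bits.shape[1])
    return bits @ weights

def block_entropy(symbols, L, alphabet_size):
    """Shannon entropy of the length-L words, in base alphabet_size."""
    s = np.asarray(symbols, dtype=np.int64)
    n_words = len(s) - L + 1
    codes = np.zeros(n_words, dtype=np.int64)
    for j in range(L):                          # map each L-word to one integer
        codes = codes * alphabet_size + s[j:j + n_words]
    _, counts = np.unique(codes, return_counts=True)
    p = counts / n_words
    return float(-np.sum(p * np.log(p)) / np.log(alphabet_size))

def sir(symbols, L, alphabet_size=4):
    """Assumed finite-L SIR: H_L / L, which equals the SE when L = 1."""
    return block_entropy(symbols, L, alphabet_size) / L

def scan_order1(traj, L, R=50):
    """Grid search (spacing 1/R) of the two split locations maximising the SIR."""
    grid = np.arange(1, R) / R
    best_value, best_splits = -np.inf, None
    for c1 in grid:
        for c2 in grid:
            value = sir(encode_order1(traj, (c1, c2)), L)
            if value > best_value:
                best_value, best_splits = value, (c1, c2)
    return best_value, best_splits
```

Calling `scan_order1(traj, L)` for $L = 1, 2, 3, \ldots$ on a trajectory of Eq. (4) and tracking where the best splits settle follows the logic behind Figs. 6 and 7. With $T = 10^5$ and $R = 50$, each call evaluates $49 \times 49$ partitions, so it is slow in plain NumPy.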
Hence, our order-1 partition placement is set only after the resulting symbolic sequence maximises the SIR with respect to the location of the partition and the SIR value behaves asymptotically as $L$ is increased. Once an order-1 approximate GMP is found, we seek an order-2 partition that is also an approximation to a GMP. If the maximal SIR value is the same for both partition orders, we say that the partition is also generating in the spatial sense.

For the dynamical scenario of Fig. 2, as Fig. 6 shows, the maximum SIR value (filled circles) for a given $L$ decreases as $L$ increases. This value is expected to eventually converge to the Kolmogorov-Sinai entropy of the system for large $L$, which is upper bounded by the sum of the positive Lyapunov exponents, here equal to 0.59. We also note that, for increasing $L$, the order-1 partition location moves away from the one that maximises the SE, but it converges faster than the maximum SIR value does. After $L = 2$, we find that the location is nearly unchanged and points to an optimal order-1 partition located at 0.50 ± 0.02 for both map intervals. Hence, although the maximum SIR value still decreases for increasing $L$, the order-1 partition location stops changing after small $L$ increments. This is also corroborated by the unchanging SE values (filled squares), which in this case are computed for the partition locations that maximise the SIR.

FIG. 6. Shannon Entropy (SE) and maximum Shannon Information Rate (SIR) as a function of the word length, $L$, for an order-1 marginal partition of the coupled dynamical system shown in Fig. 2. The maximum SIR (circles) depends on the partition location, similarly to the SE value (squares), and it is found from Eq. (1) without the limit and using a base-4 logarithm. As $L$ increases, the location converges to the division that splits both maps' intervals at 0.50 ± 0.02.

FIG. 7. Shannon Entropy (SE) and maximum Shannon Information Rate (SIR) as a function of the word length, $L$, found for an order-1 marginal partition of the coupled dynamical system shown in Fig. 4. Contrary to Fig. 6, the partition location converges to 0.48 ± 0.02 for map 1 ($r_1 = 3.9$) and 0.50 ± 0.02 for map 2 ($r_2 = 4$) after $L = 2$.

Similarly, for the dynamical scenario of Fig. 4, Fig. 7 shows that the maximum SIR values (filled circles) decrease as the word length $L$ increases. The sum of the positive Lyapunov exponents for this case is 0.24; hence, the maximum SIR is expected to decrease even further than in the previous dynamical scenario. However, the location of the optimal order-1 partition converges faster than before, and it now splits the intervals at 0.48 ± 0.02 for map 1 ($r_1 = 3.9$) and 0.50 ± 0.02 for map 2 ($r_2 = 4$).

After the order-1 partition location is set by maximising the SIR values for increasing $L$, we sub-divide the state-space into an order-2 partition, maintaining the previous partition location. For the 2 coupled logistic maps, this means that each interval is further divided into 2 more regions. Thus, we go from a $2^{pD} = 4$-letter alphabet ($p = 1$, $D = 2$), $\mathcal{S}_1$, to a $2^{pD} = 16$-letter alphabet ($p = 2$, $D = 2$), $\mathcal{S}_2$. We need $h^{(p=1)}_{L=2} \simeq h^{(p=2)}_{L=1} = H^{(p=1)}$ to asymptotically preserve the generating character of the partition. We must note that, since the time-series length is fixed ($T = 10^5$ iterations), higher-order partitions and longer word lengths start to become ill-defined. For example, for an order-1 partition, the SIR probabilities in Eq. (1) for words of length $L = 2$ rest on an average of $T/|\mathcal{S}_1|^{L} = 10^5/4^2 = 6250$ occurrences per possible word, but for words of length $L = 5$ this statistic drops to $T/|\mathcal{S}_1|^{L} = 10^5/4^5 \simeq 100$.
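The word-statistics estimates above follow from $T/|\mathcal{S}_q|^L$ with $|\mathcal{S}_q| = 2^{qD}$. The small Python helper below reproduces them. The threshold of 100 occurrences per word used to pick the longest usable $L$ is an assumed rule of thumb read off the numbers quoted in this section, not a criterion the authors state in this form.

```python
def mean_count_per_word(T, q, D, L):
    """Average occurrences per possible length-L word: T / (2**(q*D))**L."""
    return T / (2 ** (q * D)) ** L

def max_word_length(T, q, D, threshold=100.0):
    """Largest L whose mean count per word is at least `threshold`
    (assumed rule of thumb; requires q * D >= 1 and threshold > 0)."""
    L = 0
    while mean_count_per_word(T, q, D, L + 1) >= threshold:
        L += 1
    return L

for T, q, L in [(10**5, 1, 2), (10**5, 1, 5), (10**5, 2, 2), (10**5, 2, 3), (5 * 10**5, 2, 3)]:
    print(f"T={T}, order {q}, L={L}: {mean_count_per_word(T, q, 2, L):.1f} per word")
```

For $D = 2$ the loop yields 6250, about 98, about 391, about 24, and about 122 occurrences per word, respectively, which is why the order-2 analysis at $L = 3$ requires extending the series to $T = 5\times10^5$.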
For the order-2 partition, the statistic behind the SIR probabilities for words of length $L = 2$ is $T/|\mathcal{S}_2|^{L} = 10^5/16^2 \simeq 400$ occurrences per possible word, which is still statistically significant. But the probabilities of $L = 3$ words are already ill-defined unless the time-series length $T$ is extended.

For the order-2 marginal partition of the homogeneous coupled system ($r_1 = r_2 = 4.0$), our results show that the maximum SIR value for $L = 2$ is achieved when splitting the map intervals at 0.40 ± 0.05 and 0.80 ± 0.05 (maintaining the former order-1 partition at 0.50 ± 0.02). This result holds after we increase the word length to $L = 3$, for which we extend the time series to $T = 5 \times 10^5$ so that the word statistics are well-defined. In contrast, for the heterogeneous coupled system ($r_1 = 3.9$ and $r_2 = 4.0$), the maximum SIR for $L = 2$ is achieved when splitting the interval of map 1 at 0.393 ± 0.048 and 0.811 ± 0.052 (maintaining the former order-1 partition at 0.48 ± 0.02), and that of map 2 at 0.40 ± 0.05 and 0.80 ± 0.05 (maintaining the former order-1 partition at 0.50 ± 0.02). Again, the results hold when $L = 3$ and $T \mapsto 5 \times 10^5$.

The resolution changes are a consequence of using 10 different subdivision locations within each sub-interval resulting from the order-1 split, namely, in each sub-interval we place the split at 10 different locations. Thus, for the order-2 partition, we explore $10 \times 10$ locations per map, namely, $10^2 \times 10^2$ locations for the whole state-space. In contrast, for the order-1 partition, we explored 50 split locations per map, namely, $50 \times 50$ possible division placements in the whole state-space. This means that we achieve a better resolution for the order-1 location than for the order-2 one. Specifically, the different order-1 locations are separated by 1.0/50 (1.0 being the map's interval length), while the different order-2 locations are separated by, e.g., 0.5/10 when the sub-divided interval length is 0.5, or 0.48/10 when it is 0.48.

D. Heterogeneous scenario

Here we show that a further scenario, in which we set significantly different parameters for the two logistic maps, supports our previous analysis. Specifically, we set $r_1 = 3$ and $r_2 = 4$, corresponding to periodic and chaotic isolated map dynamics, respectively. Figures 8 and 9 show the corresponding SE analysis, as presented in the Results section. For this case, the optimal order-1 GMP approximation, namely, the one that maximises the SIR values for $L = 5$, is found when the split divides the interval of map 1 at 0.72 ± 0.02 and that of map 2 at 0.86 ± 0.02, which is higher than the split that maximises the SE (horizontal and vertical continuous lines in Fig. 9). We note that the interval division for map 2 differs from the previous 2 dynamical scenarios, which kept the order-1 partition at the location of the exact GMP for the isolated dynamics of a chaotic logistic map. However, in this case, convergence to the location is slower than in the previous cases, as shown in Fig. 10 by the decreasing SE values.

FIG. 8. Attractor for $N = 2$ different logistic maps symmetrically coupled [Eq. (4)]. The coupling strength is set to $\epsilon = 0.10$ and the map parameters are $r_1 = 3$ (periodic regime) and $r_2 = 4$ (chaotic regime). The initial condition is set randomly and $10^5$ iterations are shown. The rectangles indicate the areas shown in Figs. 2 and 4.

E. Discussion: extension to systems near tipping points

Statistical dynamical invariants that can be estimated from approximate GMPs, as described in this paper, can be essential to understand the properties of dynamical systems near a tipping point, contributing to predicting the tendency of the system to drift toward it, to issuing early warnings, and, finally, to applying control to reverse or slow down the trend.

Here, the proposed method, which is based on informational quantities, is appropriate to deal with events that have positive entropy, as with chaotic systems. However, in several situations, the dynamics of a system undergoing a tipping point is periodic, namely, a zero-entropy event. Nonetheless, in nature [32], tipping points also happen in systems that present noise. Then, the noise reveals a transient dynamics with positive entropy (due to the noisy trajectories); hence, the present methodology could also be applied successfully.

FIG. 9. Shannon Entropy (SE) values [Eq. (3)] (colour code) for the coupled maps shown in Fig. 8. The values are found from the symbolic sequence that results after dividing the state-space into 4 regions, i.e., 4 different symbols encode the coupled system's trajectory. The division sets an order-1 marginal partition and the symbolic sequence changes according to the partition's location. The maximum SE is found for the partition signalled by the vertical and horizontal continuous lines, where the dashed lines show a ±1/50 = ±0.02 resolution for its placement.

FIG. 10. Shannon Entropy (SE, filled squares) and maximum Shannon Information Rate (SIR, filled circles) as a function of the word length, $L$, for an order-1 marginal partition of the coupled dynamical system shown in Fig. 8, determined as in Figs. 6 and 7.
An important requirement in the study of tipping points is to determine whether the system's parameter is before, at, or after the tipping point. For systems with noise, Ref. 32 has shown that important dynamical characteristics do not fully reveal the status of the system. Another well-studied case where tipping happens is multistability, i.e., the destruction of one attractor or the complete destruction of the oscillatory behaviour (oscillation death). This tipping results in a merging of the manifolds of co-existing sets, causing drastic changes in the partitions. Consequently, our proposed method could be successful in determining the status of a system that can potentially tip.

IV. CONCLUSIONS

In this work we present a procedure that uses an information-theoretical perspective to approximate a Generating Markov Partition (GMP) for a complex system from finite-resolution, finite-time trajectories. Our method divides the state-space, or a projection of it, using marginal partitions, namely, straight divisions that define disjoint regions. These regions encode the system's trajectory into discrete symbols from a finite alphabet (i.e., a finite number of regions). The encoded sequence is then used to find its Shannon Information Rate (SIR) for different word lengths (i.e., different symbol strings). The partition placement is shifted across the state-space to find the one that maximises the SIR for increasing word lengths. Moreover, to have a partition that is generating in the spatial sense, a sub-division of the state-space (a higher-order partition) needs to have an SIR similar to that of the previous division (a lower-order partition).
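The spatial step of the procedure just summarised, refining an order-1 split into an order-2 one while keeping the order-1 cut and comparing information rates, can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' implementation: the SIR is again assumed to be the base-|alphabet| block entropy per symbol, the check compares $h^{(p=1)}_{L=2}$ with $h^{(p=2)}_{L=1}$ as in Sec. III C, and the candidate grid (`n_div - 1` equally spaced interior cuts in each half) is a hypothetical stand-in for the 10 locations per sub-interval used in the paper.

```python
import itertools
import numpy as np

def encode_marginal(traj, cuts_per_map):
    """Marginal partition: map i is cut at the positions in cuts_per_map[i];
    the cell indices of all maps are combined (mixed radix) into one symbol."""
    traj = np.asarray(traj)
    symbols = np.zeros(len(traj), dtype=np.int64)
    for i, cuts in enumerate(cuts_per_map):
        cell = np.searchsorted(np.sort(cuts), traj[:, i], side="right")
        symbols = symbols * (len(cuts) + 1) + cell
    return symbols

def sir(symbols, L, alphabet_size):
    """Assumed finite-L SIR: entropy of L-words (base alphabet_size) divided by L."""
    s = np.asarray(symbols, dtype=np.int64)
    n = len(s) - L + 1
    codes = np.zeros(n, dtype=np.int64)
    for j in range(L):
        codes = codes * alphabet_size + s[j:j + n]
    _, counts = np.unique(codes, return_counts=True)
    p = counts / n
    return float(-np.sum(p * np.log(p)) / (L * np.log(alphabet_size)))

def best_order2(traj, order1_cuts, n_div=10, L=1):
    """Hypothetical grid search: keep each order-1 cut c, add one cut in (0, c)
    and one in (c, 1), and return the combination with the largest SIR."""
    options = []
    for c in order1_cuts:
        left = c * np.arange(1, n_div) / n_div
        right = c + (1.0 - c) * np.arange(1, n_div) / n_div
        options.append([(a, c, b) for a in left for b in right])
    alphabet = 4 ** len(order1_cuts)                 # 4 cells per map
    best_value, best_cuts = -np.inf, None
    for cuts in itertools.product(*options):
        value = sir(encode_marginal(traj, cuts), L, alphabet)
        if value > best_value:
            best_value, best_cuts = value, cuts
    return best_value, best_cuts

def generating_gap(traj, order1_cuts, order2_cuts):
    """|h^(1)_{L=2} - h^(2)_{L=1}|; a small value is the spatial generating check."""
    s1 = encode_marginal(traj, [[c] for c in order1_cuts])
    s2 = encode_marginal(traj, order2_cuts)
    D = len(order1_cuts)
    return abs(sir(s1, 2, 2 ** D) - sir(s2, 1, 4 ** D))
```

With $D = 2$ and `n_div=10`, `best_order2` evaluates $(9 \times 9)^2 = 6561$ partitions, which illustrates why the order-2 search is run at a coarser resolution than the order-1 search.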
When these conditions are met, the resulting symbolic sequence and partition location define an approximate GMP, which allows one to find a discrete approximation of the Invariant Probability Measure (IPM) of the complex system, providing most of the relevant information content of the system's dynamics. If the partition is generating, it provides an encoding that preserves the system's characteristics without adding [removing] meaningless [important] information. Furthermore, having the approximate GMP allows one to estimate other spatio-temporal invariants, which are important for the characterisation of complex systems from time series.

It is often believed that an optimal partition containing most of the relevant information about a system is obtained by maximising the Shannon Entropy. This work shows that this is not the case. That belief only holds if the system is random and the probabilistic events triggered by the system dynamics are uncorrelated. For correlated systems, the appropriate information-theoretical quantity to determine an approximate GMP is the SIR.

Although our results focus on a particular complex system, namely, two coupled logistic maps in their chaotic regime, the applicability of our method is not restricted to this case. In fact, the main restrictions to its applicability are computational power and data availability, i.e., the order-$q$ partition resolution $R$ and the time-series length $T$. The reason is that our state-space split creates $2^{qD}$ disjoint regions from an order-$q$ split of a $D$-dimensional state-space (which can be a projection of the full state-space). Consequently, our method is efficient until the statistics for the SIR values at large word lengths $L$ become ill-defined, which happens when $T/(2^{qD})^{L} \ll 100$.

V. ACKNOWLEDGEMENTS

The authors thank the Scottish Universities Physics Alliance (SUPA) for support, and NR also thanks PEDECIBA.

[1] J. P. Eckmann and D. Ruelle, "Ergodic theory of chaos and strange attractors", Rev. Mod. Phys. 57 (3), 617-656 (1985).
[2] S. Wiggins, Introduction to Applied Nonlinear Dynamical Systems and Chaos (Springer Science & Business Media, 2003).
[3] K. T. Alligood, T. D. Sauer, and J. A. Yorke, Chaos: An Introduction to Dynamical Systems (Springer Science & Business Media, 3rd Ed., 2006).
[4] J. Guckenheimer and P. J. Holmes, Nonlinear oscillations, dynamical systems, and bifurcations of vector fields, Vol. 42 (Springer Science & Business Media, 2013).
[5] C. E. Shannon, "A mathematical theory of communication", Bell Syst. Tech. J. 27, 623-656 (1948).
[6] A. N. Kolmogorov, "A new metric invariant of transient dynamical systems and automorphisms in Lebesgue spaces", Dokl. Akad. Nauk SSSR 119, 861-864 (1958).
[7] J. M. Amigó, Permutation Complexity in Dynamical Systems: Ordinal Patterns, Permutation Entropy, and All That (Springer Science & Business Media, 1st Ed., 2010).
[8] C. Grebogi, E. Ott, and J. A. Yorke, "Crises, sudden changes in chaotic attractors, and transient chaos", Physica D 7, 181-200 (1983).
[9] S. W. McDonald, C. Grebogi, E. Ott, and J. A. Yorke, "Fractal basin boundaries", Physica D 17, 125-153 (1984).
[10] C. Grebogi, E. Ott, and J. A. Yorke, "Chaos, Strange Attractors, and Fractal Basin Boundaries in Nonlinear Dynamics", Science 238 (4827), 632-638 (1987).
[11] P. Cvitanović, G. H. Gunaratne, and I. Procaccia, "Topological and metric properties of Hénon-type strange attractors", Phys. Rev. A 38 (3), 1503-1520 (1988).
[12] A. S. Pikovsky and U. Feudel, "Characterizing Strange Nonchaotic Attractors", Chaos 5 (1), 253-260 (1995).
[13] U. Feudel and C. Grebogi, "Multistability and the control of complexity", Chaos 7 (4), 597-604 (1997).
[14] Y. G. Sinaĭ, "Construction of Markov partitionings", Funktsionalnyi Analiz i ego Prilozhenija 2 (3), 70-80 (1968).
[15] P. Grassberger and H. Kantz, "Generating partitions for the dissipative Hénon map", Phys. Lett. A 113 (5), 235-238 (1985).
[16] F. Christiansen and A. Politi, "A generating partition for the standard map", Phys. Rev. E 51, R3811 (1995).
[17] R. L. Davidchack, Y.-C. Lai, E. M. Bollt, and M. Dhamala, "Estimating generating partitions of chaotic systems by unstable periodic orbits", Phys. Rev. E 61 (2), 1353-1356 (2000).
[18] C. Grebogi, E. Ott, and J. A. Yorke, "Unstable periodic orbits and the dimensions of multifractal chaotic attractors", Phys. Rev. A 37, 1711-1724 (1988).
[19] Y. Hirata, K. Judd, and D. Kilminster, "Estimating a generating partition from observed time series: Symbolic shadowing", Phys. Rev. E 70, 016215 (2004).
[20] D. Holstein and H. Kantz, "Optimal Markov approximations and generalized embeddings", Phys. Rev. E 79, 056202 (2009).
[21] H. Teramoto and T. Komatsuzaki, "How does a choice of Markov partition affect the resultant symbolic dynamics?", Chaos 20, 037113 (2010).
[22] E. M. Bollt, T. Stanford, Y.-C. Lai, and K. Życzkowski, "Validity of Threshold-Crossing Analysis of Symbolic Dynamics from Chaotic Time Series", Phys. Rev. Lett. 85 (16), 3524-3527 (2000).
[23] C. Bandt and B. Pompe, "Permutation Entropy: A Natural Complexity Measure for Time Series", Phys. Rev. Lett. 88 (17), 174102 (2002).
[24] N. Rubido, et al., "Language organization and temporal correlations in the spiking activity of an excitable laser: experiments and model comparison", Phys. Rev. E 84 (2), 026202 (2011).
[25] J. M. Amigó, R. Monetti, T. Aschenbrenner, and W. Bunk, "Transcripts: An algebraic approach to coupled time series", Chaos 22, 013105 (2012).
[26] E. Bianco-Martínez, N. Rubido, C. G. Antonopoulos, and M. S. Baptista, "Successful network inference from time-series data using mutual information rate", Chaos 26, 043102 (2016).
[27] M. S. Baptista, C. Grebogi, and R. Köberle, "Dynamically Multilayered Visual System of the Multifractal Fly", Phys. Rev. Lett. 97, 178102 (2006).
[28] M. S. Baptista and J. Kurths, "Transmission of information in active networks", Phys. Rev. E 77, 026205 (2008).
[29] M. B. Kennel and M. Buhl, "Estimating Good Discrete Partitions from Observed Data: Symbolic False Nearest Neighbors", Phys. Rev. Lett. 91 (8), 084102 (2003).
[30] K. Kaneko, "Clustering, coding, switching, hierarchical ordering, and control in a network of chaotic elements", Physica D 41, 137-172 (1990).
[31] F. Xie and H. Cerdeira, "Coherent-ordered transition in chaotic globally coupled maps", Phys. Rev. E 54 (4), 3235-3238 (1996).
[32] E. S. Medeiros, I. L. Caldas, M. S. Baptista, and U. Feudel, "Trapping Phenomenon Attenuates Tipping Points for Limit Cycles", Sci. Rep. 7, 42351 (2017).