Detecting emergent processes in cellular automata with excess information

David Balduzzi
Department of Empirical Inference, MPI for Intelligent Systems, Tübingen, Germany
david.balduzzi@tuebingen.mpg.de

Abstract

Many natural processes occur over characteristic spatial and temporal scales. This paper presents tools for (i) flexibly and scalably coarse-graining cellular automata and (ii) identifying which coarse-grainings express an automaton's dynamics well, and which express its dynamics badly. We apply the tools to investigate a range of examples in Conway's Game of Life and Hopfield networks and demonstrate that they capture some basic intuitions about emergent processes. Finally, we formalize the notion that a process is emergent if it is better expressed at a coarser granularity.

Introduction

Biological systems are studied across a range of spatiotemporal scales – for example as collections of atoms, molecules, cells, and organisms (Anderson, 1972). However, not all scales express a system's dynamics equally well. This paper proposes a principled method for identifying which spatiotemporal scale best expresses a cellular automaton's dynamics. We focus on Conway's Game of Life and Hopfield networks as test cases where collective behavior arises from simple local rules.

Conway's Game of Life is a well-studied artificial system with interesting behavior at multiple scales (Berlekamp et al., 1982). It is a 2-dimensional grid whose cells are updated according to deterministic rules. Remarkably, a sufficiently large grid can implement any deterministic computation. Designing patterns that perform sophisticated computations requires working with distributed structures such as gliders and glider guns rather than individual cells (Dennett, 1991). This suggests grid computations may be better expressed at coarser spatiotemporal scales.
The first contribution of this paper is a coarse-graining procedure for expressing a cellular automaton's dynamics at different scales. We begin by considering cellular automata as collections of spacetime coordinates termed occasions (cell n_i at time t). Coarse-graining groups occasions into structures called units. For example, a unit could be a 3×3 patch of grid containing a glider at time t. Units do not have to be adjacent to one another; they interact through the channel – transparent occasions whose outputs are marginalized over. Finally, some occasions are set as ground, which fixes the initial condition of the coarse-grained system.

Gliders propagate at 1/4 diagonal squares per tic – the grid's "speed of light". Units more than 4n cells apart cannot interact within n tics, imposing constraints on which coarse-grainings can express glider dynamics. It is also intuitively clear that units should group occasions concentrated in space and time rather than scattered occasions that have nothing to do with each other. In fact, it turns out that most coarse-grainings express a cellular automaton's dynamics badly.

The second contribution of this paper is a method for distinguishing good coarse-grainings from bad based on the following principle:

• Coarse-grainings that generate more information, relative to their sub-grainings, better express an automaton's dynamics than those generating less.

We introduce two measures to quantify the information generated by coarse-grained systems. Effective information, ei, quantifies how selectively a system's output depends on its input. Effective information is high if few inputs cause the output, and low if many do. Excess information, ξ, measures the difference between the information generated by a system and its subsystems.
With these tools in hand we investigate coarse-grainings of Game of Life grids and Hopfield networks and show that grainings with high ei and ξ capture our basic intuitions regarding emergent processes. For example, excess information distinguishes boring (redundant) from interesting (synergistic) information-processing, exemplified by blank patches of grid and gliders respectively.

Finally, the penultimate section converts our experience with examples in the Game of Life and Hopfield networks into a provisional formalization of the principle above. Roughly, we define a process as emergent if it is better expressed at a coarser scale. The principle states that emergent processes are more than the sum of their parts – in agreement with many other approaches to quantifying emergence (Crutchfield, 1994; Tononi, 2004; Polani, 2006; Shalizi and Moore, 2006; Seth, 2010).

Two points distinguishing our approach from prior work are worth emphasizing. First, coarse-graining is scalable: coarse-graining a cellular automaton yields another cellular automaton. Prior works identify macro-variables such as temperature (Shalizi and Moore, 2006) or centre-of-mass (Seth, 2010) but do not show how to describe a system's dynamics purely in terms of these macro-variables. By contrast, an emergent coarse-graining is itself a cellular automaton, whose dynamics are computed via the mechanisms of its units and their connectivity (see below).

Second, our starting point is selectivity rather than predictability. Assessing predictability necessitates building a model and deciding what to predict. Although emergent variables may be robust against model changes (Seth, 2010), it is unsatisfying for emergence to depend on properties of both the process and the model. By contrast, effective and excess information depend only on the process: the mechanisms, their connectivity, and their output.
A process is then emergent if its internal dependencies are best expressed at coarse granularities.

Probabilistic cellular automata

Concrete examples. This paper considers two main examples of cellular automata: Conway's Game of Life and Hopfield networks (Hopfield, 1982). The Game of Life is a grid of deterministic binary cells. A cell outputs 1 at time t iff: (i) three of its neighbors outputted 1s at t−1 or (ii) it and two neighbors outputted 1s at t−1. In a Hopfield network (Amit, 1989), cell n_k fires with probability proportional to

$$p(n_{k,t} = 1 \mid n_{\bullet,t-1}) \;\propto\; \exp\Big( \frac{1}{T} \sum_{j \to k} \alpha_{jk} \cdot n_{j,t-1} \Big). \qquad (1)$$

Temperature T controls network stochasticity. Attractors {ξ¹, …, ξᴺ} are embedded into a network by setting the connectivity matrix as $\alpha_{jk} = \sum_{\mu=1}^{N} (2\xi^\mu_j - 1)(2\xi^\mu_k - 1)$.

Abstract definition. A cellular automaton is a finite directed graph X with vertices V_X = {v_1, …, v_n}. Vertices are referred to as occasions; they correspond to spacetime coordinates in concrete examples. Each occasion v_l ∈ V_X is equipped with finite output alphabet A_l and Markov matrix (or mechanism) p_l(a_l | s_l), where $s_l \in S_l = \prod_{k \to l} A_k$, the combined alphabet of the occasions targeting v_l. The mechanism specifies the probability that occasion v_l chooses output a_l given input s_l. The input alphabet of the entire automaton X is the product of the alphabets of its occasions, $X_{in} := \prod_{l \in V_X} A_l$. The output alphabet is X_out = X_in.

Remark. The input X_in and output X_out alphabets are distinct copies of the same set. Inputs are causal interventions imposed via Pearl's do(−) calculus (Pearl, 2000). The probability of output a_l is computed via the Markov matrix: p_l(a_l | do(s_l)). The do(−) is not included in the notation explicitly to save space. However, it is always implicit when applying any Markov matrix.

A Hopfield network over time interval [α, β] is an abstract automaton.
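The two concrete mechanisms can be written down directly. Below is a minimal sketch; the toroidal wrap-around, the 0/1 output convention, and normalizing Eq. (1) against the silent alternative (a logistic) are implementation assumptions not fixed by the text:

```python
import numpy as np

def life_step(grid):
    """One synchronous Game of Life update on a torus.

    A cell outputs 1 at time t iff (i) three of its neighbors
    outputted 1s at t-1, or (ii) it and two neighbors outputted
    1s at t-1."""
    g = np.asarray(grid, dtype=int)
    # Count the eight neighbors by summing shifted copies of the grid.
    nbrs = sum(np.roll(np.roll(g, di, 0), dj, 1)
               for di in (-1, 0, 1) for dj in (-1, 0, 1)
               if (di, dj) != (0, 0))
    return ((nbrs == 3) | ((g == 1) & (nbrs == 2))).astype(int)

def hopfield_weights(patterns):
    """Connectivity α_jk = Σ_μ (2ξ^μ_j - 1)(2ξ^μ_k - 1) embedding the attractors."""
    s = 2 * np.asarray(patterns) - 1      # map {0,1} -> {-1,+1}
    return s.T @ s

def hopfield_probs(alpha, n_prev, T=0.25):
    """Firing probabilities p(n_{k,t}=1 | n_{.,t-1}), Eq. (1), for every
    cell k, normalized against the silent alternative (assumption)."""
    h = (alpha.T @ n_prev) / T            # field at each cell k
    return np.exp(h) / (np.exp(h) + 1)    # logistic normalization
```

Running `life_step` four times on a grid containing a single glider reproduces the glider translated one diagonal square, matching the c/4 propagation speed quoted above.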
Occasions are spacetime coordinates – e.g. v_l = n_{i,t}, cell i at time t. An edge connects v_k → v_l if there is a connection from v_k's cell to v_l's and the time coordinates are t−1 and t respectively for some t. The mechanism is given by Eq. (1). Occasions at t = α, with no incoming edges, can be set as fixed initial conditions or noise sources. Similar considerations apply to the Game of Life. Non-Markovian automata (whose outputs depend on inputs over multiple time steps) have edges connecting occasions separated by more than one time step.

Coarse-graining

Define a subsystem X of cellular automaton Y as a subgraph containing a subset of Y's vertices and a subset of the edges targeting those vertices. We show how to coarse-grain X.

Definition (coarse-graining). Let X be a subsystem of Y. The coarse-graining algorithm detailed below takes X ⊂ Y and data K as arguments, and produces a new cellular automaton X_K. Data K consists of (i) a partition of X's occasions V_X = G ∪ C ∪ U_1 ∪ ⋯ ∪ U_N into ground G, channel C and units U_1, …, U_N and (ii) ground output s_G.

Vertices of automaton X_K, the new coarse-grained occasions, are units: V_{X_K} := {U_1, …, U_N}. The directed graph of X_K is computed in Step 4 and the alphabets A_l of units U_l are computed in Step 5. Computing the Markov matrices (mechanisms) of the units takes all five steps.

The ground specifies occasions whose outputs are fixed: the initial condition s_G. The channel specifies unobserved occasions: interactions between units propagate across the channel. Units are macroscopic occasions whose interactions are expressed by the coarse-grained automaton. Fig. 1 illustrates coarse-graining a simple automaton.

There are no restrictions on partitions.
For example, although the ground is intended to provide the system's initial condition, it can contain any spacetime coordinates, so that in pathological cases it may obstruct interactions between units. Distinguishing good coarse-grainings from bad is postponed to later sections.

Algorithm. Apply the following steps to coarse-grain:

Step 1. Marginalize over extrinsic inputs.
External inputs are treated as independent noise sources; we are only interested in internal information-processing. An occasion's input alphabet decomposes into a product $S_l = S^X_l \times S^{Y\setminus X}_l$ of inputs from within and without the system. For each occasion v_l ∈ V_X, marginalize over external outputs using the uniform distribution:

$$p_l(a_l \mid s^X_l) := \sum_{S^{Y\setminus X}_l} p_l(a_l \mid s^X_l, s^{Y\setminus X}_l) \cdot p_{unif}(s^{Y\setminus X}_l). \qquad (2)$$

Figure 1: (A) An automaton of 6 cells connected to their immediate neighbors. (B) The directed graph of occasions over time interval [−6, 0]. Green occasions are ground. Red and blue occasions form two units. Other occasions are channel. (C) Edges whose signals do not reach the blue unit have no effect. (D) The coarse-grained system consists of two units (macro-occasions).

Step 2. Fix the ground.
Ground outputs are fixed in the coarse-grained system. Graining K imposes a second decomposition onto v_l's input alphabet, $S^X_l = S^G_l \times S^C_l \times S^U_l$ where U = ∪_k U_k. Subsume the ground into v_l's mechanism by specifying

$$p^G_l(a_l \mid s^C_l, s^U_l) := p_l(a_l \mid s^G_l, s^C_l, s^U_l).$$

Step 3. Marginalize over the channel.
The channel specifies transparent occasions. Perturbations introduced into units propagate through the channel until they reach other units where they are observed.
Transparency is imposed by marginalizing over the channel occasions in the product mechanism

$$p^K(x^K_{out} \mid x^K_{in}) := \sum_{C} \prod_{l \in C \cup U} p^G_l(x^l_{out} \mid x^l_{in}), \qquad (3)$$

where superscripts denote that inputs and outputs are restricted, for K, to occasions in units of K (since the channel is summed over and the ground is already fixed) and, for each l, to the inputs and outputs of occasion v_l.

For example, consider the cellular automaton with graph v_a → v_b → v_c and product mechanism p(c|b) p(b|a) p(a). Setting v_b as channel and marginalizing yields the coarse-grained mechanism Σ_b p(c|b) p(b|a) p(a) = p(c|a) p(a). The channel is rendered transparent and the new mechanism p(c|a) convolves p(c|b) and p(b|a).

Step 4. Compute the effective graph of coarse-graining X_K.
The micro-alphabet of unit U_l is $\tilde A_l := \prod_{k \in U_l} A_k$. The mechanism of U_l is computed as in Eq. (3) with the product restricted to occasions j ∈ C ∪ U_l, thus obtaining p_{U_l}(a_l | x_in) where a_l ∈ Ã_l. Two units U_k and U_l are connected by an edge if the outputs of U_k make a difference to the behavior of U_l. More precisely, we draw an edge if there exist a_k, a′_k ∈ Ã_k such that p_{U_l}(a_l | x_in, a_k) ≠ p_{U_l}(a_l | x_in, a′_k) for some a_l ∈ Ã_l. Here, x_in denotes the input from all units other than U_k. The effective graph need not be acyclic. Intervening via the do(−) calculus allows us to work with cycles.

Step 5. Compute macro-alphabets of units in X_K.
Coarse-graining can eliminate low-level details. Outputs that are distinguishable at the base level may not be after coarse-graining. This can occur in two ways. Outputs b and b′ have indistinguishable effects if p(a | b, c) = p(a | b′, c) for all a and c. Alternatively, two outputs react indistinguishably if p(b | c) = p(b′ | c) for all c.
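Once mechanisms are stored as column-stochastic Markov matrices, the chain example in Step 3 is just matrix multiplication. A small numerical check (the alphabet sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_mechanism(n_in, n_out):
    """Column-stochastic Markov matrix M[o, i] = p(out = o | in = i)."""
    m = rng.random((n_out, n_in))
    return m / m.sum(axis=0)

# Chain v_a -> v_b -> v_c with mechanisms p(b | a) and p(c | b).
p_b_a = random_mechanism(4, 3)
p_c_b = random_mechanism(3, 5)

# Setting v_b as channel and marginalizing, sum_b p(c|b) p(b|a),
# renders the channel transparent: the matrix product p(c | a).
p_c_a = p_c_b @ p_b_a
```

The columns of `p_c_a` still sum to 1, so the result is again a mechanism – which is what makes the coarse-graining procedure scalable.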
More precisely, two outputs u_l and u′_l of unit U_l are equivalent, denoted u_l ∼_K u′_l, iff p^K(x_out | x_in, u_l) = p^K(x_out | x_in, u′_l) and p_{U_l}(u_l | x^K_in) = p_{U_l}(u′_l | x^K_in) for all x_out, x_in. Picking a single element from each equivalence class obtains the macro-alphabet A_l of the unit U_l. The mechanism of U_l is p_{U_l}, Step 4, restricted to macro-alphabets.

Information

This section extends prior work to quantify the information generated by a cellular automaton, both as a whole and relative to its subsystems (Balduzzi and Tononi, 2008, 2009).

Given subsystem m of X, let p_m(x_out | x_in), or m for short, denote its mechanism or Markov matrix. The mechanism is computed by taking the Markov matrix of each occasion in X, marginalizing over extrinsic inputs (edges not in X) as in Eq. (2), and taking the product. It is notationally convenient to write p_m as though its inputs and outputs are x_out and x_in, even though m does not in general contain all occasions in X and therefore treats some inputs and outputs as extrinsic, unexplainable noise. We switch freely between the terms "subsystem" and "submechanism" below.

Effective information quantifies how selectively a mechanism discriminates between inputs when assigning them to an output. Alternatively, it measures how sharp the functional dependencies leading to an output are. The actual repertoire p̂_m(X_in | x_out) is the set of inputs that cause (lead to) mechanism m choosing output x_out, weighted by likelihood according to Bayes' rule:

$$\hat p_m(x_{in} \mid x_{out}) := \frac{p_m(x_{out} \mid do(x_{in}))}{p(x_{out})} \cdot p_{unif}(x_{in}). \qquad (4)$$

The do(−) notation and hat p̂ remind that we first intervene to impose x_in and then apply Markov matrix p_m. For deterministic mechanisms, i.e. functions f: X_in → X_out, the actual repertoire assigns p̂ = 1/|f⁻¹(x_out)| to elements of the pre-image and p̂ = 0 to other elements of X_in.
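For a small mechanism the actual repertoire of Eq. (4), and the effective information of Eq. (5) below, can be computed directly. A minimal sketch; the threshold cell at the end is an invented toy example, not the exact mechanism of Fig. 2:

```python
import numpy as np

def actual_repertoire(p_m, x_out):
    """Eq. (4): p-hat(x_in | x_out) is proportional to
    p_m(x_out | do(x_in)) * p_unif(x_in), normalized by p(x_out).
    Convention: p_m[o, i] = p(out = o | do(in = i))."""
    joint = p_m[x_out] / p_m.shape[1]       # likelihood x uniform prior
    return joint / joint.sum()

def ei(p_m, x_out):
    """Eq. (5): KL divergence of the actual repertoire from uniform."""
    rep = actual_repertoire(p_m, x_out)
    nz = rep > 0
    return float(np.sum(rep[nz] * np.log2(rep[nz] * p_m.shape[1])))

# A deterministic cell that fires iff at least 3 of its 4 binary
# inputs are 1: 16 possible inputs, 5 of them in the pre-image of 1.
f = np.array([bin(i).count("1") >= 3 for i in range(16)], dtype=int)
p_m = np.zeros((2, 16))
p_m[f, np.arange(16)] = 1.0
```

For this deterministic cell the repertoire is uniform on the pre-image and `ei(p_m, 1)` equals log2(16/5), matching the counting formula for deterministic mechanisms.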
Figure 2: Categorization and information. Cells fire if they receive two or more spikes. The 16 = 2⁴ possible outputs by the top layer are arranged in a grid. (A, B) Cells n_1 and n_4 fire when the output is in the orange and blue regions respectively: ei = log(16) − log(4) = 2 bits for n_1 and ei = log(16) − log(8) = 1 bit for n_4. Cell n_1's response is more informative than n_4's since it fires for fewer inputs.

The shaded regions in Fig. 2 show outputs of the top layer that cause the bottom cell to fire. Effective information generated when m outputs x_out is the Kullback-Leibler divergence ($KL[p \| q] = \sum_i p_i \log_2 \frac{p_i}{q_i}$):

$$ei(m, x_{out}) := KL\Big[\, \hat p_m(X_{in} \mid x_{out}) \,\Big\|\, p_{unif}(X_{in}) \,\Big]. \qquad (5)$$

Effective information is not a statistical measure: it depends on the mechanism and a particular output x_out. Effective information generated by deterministic function f is $ei(f, x_{out}) = \log_2 \frac{|X_{in}|}{|f^{-1}(x_{out})|}$, where |·| denotes cardinality. In Fig. 2, ei is the logarithm of the ratio of the total number of squares to the number of shaded squares.

Excess information quantifies how much more information a mechanism generates than the sum of its submechanisms – how synergistic the internal dependencies are. Given a subsystem with mechanism m, partition P = {M¹, …, Mᵐ} of the occasions in src(m), and output x_out, define excess information as follows. Let m_j := m ∩ (Mʲ × X) be the restriction of m to sources in Mʲ. Excess information over P is

$$\xi(m, P, x_{out}) := ei(m, x_{out}) - \sum_j ei(m_j, x_{out}). \qquad (6)$$

Excess information (sans partition) is computed over the information-theoretic weakest link P^MIP:

$$\xi(m, x_{out}) := \xi(m, P^{MIP}, x_{out}). \qquad (7)$$

Let $A_{M^j} := \prod_{l \in M^j} A_l$.
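Eq. (6) can be sketched for a mechanism whose input factors into parts; restricting the other factors with the uniform distribution plays the role of Eq. (2). The input encoding and the XOR example are illustrative assumptions:

```python
import numpy as np

def ei(p_m, x_out):
    """Eq. (5): KL of the actual repertoire (uniform prior) from uniform."""
    rep = p_m[x_out] / p_m[x_out].sum()
    nz = rep > 0
    return float(np.sum(rep[nz] * np.log2(rep[nz] * p_m.shape[1])))

def restrict(p_m, sizes, keep):
    """Sub-mechanism m_j: input factors outside `keep` are treated as
    extrinsic noise and marginalized with the uniform distribution."""
    cube = p_m.reshape((p_m.shape[0],) + tuple(sizes))
    drop = tuple(ax + 1 for ax in range(len(sizes)) if ax not in keep)
    return cube.mean(axis=drop).reshape(p_m.shape[0], -1)

def xi(p_m, sizes, x_out, parts):
    """Eq. (6): excess information over the bipartition `parts`."""
    return ei(p_m, x_out) - sum(
        ei(restrict(p_m, sizes, part), x_out) for part in parts)

# XOR of two binary inputs (flat input index i = 2*i0 + i1) is
# synergistic: the whole generates 1 bit, each part generates 0.
p_xor = np.zeros((2, 4))
for i0 in (0, 1):
    for i1 in (0, 1):
        p_xor[i0 ^ i1, 2 * i0 + i1] = 1.0
```

Here `xi(p_xor, (2, 2), 0, [[0], [1]])` is positive (1 bit), the synergistic signature of Fig. 3D, whereas a mechanism that copies one input and ignores the other yields ξ = 0, like the disjoint units of Fig. 6A.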
The minimum information partition¹ P^MIP minimizes normalized excess information:

$$P^{MIP} := \arg\min_{P} \frac{\xi(m, P, x_{out})}{N_P}, \quad \text{where } N_P := (m-1) \cdot \min_j \{\log_2 |A_{M^j}|\}.$$

¹ We restrict to bipartitions to reduce the computational burden.

Excess information is negative if any decomposition of the system generates more information than the whole. Fig. 3 shows how two cells taken together can generate the same, less, or more information than their sum taken individually, depending on how their categorizations overlap. Note the figure decomposes the mechanism of the system over targets rather than sources and so does not depict excess information – which is more useful but harder to illustrate.

Effective information and excess information can be computed for any submechanism of any coarse-graining of any cellular automaton.

Figure 3: Independent, redundant and synergistic information. (A, B) Independent: orthogonal categorizations, orange+pink and blue+pink shadings respectively, by n_1 and n_2; ei = 2.4 = 2 + 0.4 and ei = 4 = 2 + 2. (C) Partially redundant: both cells fire; categorizations overlap (pink) more "than expected" and ei(n_3 n_4, 11) < ei(n_3, 1) + ei(n_4, 1); ei = 1.4 < 1 + 1. (D) Synergistic: overlap is less "than expected"; ei(n_3 n_4, 01) > ei(n_3, 0) + ei(n_4, 1); ei = 3 > 1 + 1.

Application: Conway's Game of Life

The Game of Life has interesting dynamics at a range of spatiotemporal scales. At the atomic level, each coordinate (cell i at time t) is an occasion and information processing is extremely local.
At coarser granularities, information can propagate through channels, so that units generate information at a distance. Gliders, for example, are distributed objects that can interact over large distances in space and time, Fig. 4A, and provide an important example of an emergent process (Dennett, 1991; Beer, 2004). This section shows how effective and excess information quantifiably distinguish coarse-grainings expressing glider dynamics well from those expressing it badly.

Figure 4: Detecting focal points. (A) A glider moves 1 diagonal square every 4 time steps. (B) Cells in the orange and black outlined 3×3 squares are units at t = 0 and t = −20 respectively, with x_out the glider shown. Cells at t = −21 are blank ground; other occasions are channel. Shifting the position of the black square produces a family of coarse-grainings. Effective information (color scale, 0 to 9 bits) is shown as the black square's center varies over the grid.

Effective information detects focal points. Fig. 4A shows a glider trajectory, which passes through 1 diagonal step over 4 tics. Fig. 4B investigates how glider trajectories are captured by coarse-grainings: if there is a glider in the 3×3 orange square at time 0, Fig. 4B, it must have passed through the black square at t = −20 to get there. Are coarse-grainings that respect glider trajectories quantifiably better than those that do not?

Fig. 4B fixes occasions in the black square at t = −20 and the orange square at t = 0 as units (18 total), the ground as blank grid at t = −21 and everything else as channel. Varying the spatial location of the black square over the grid, we obtain a family of coarse-grainings. Effective information for each graining in the family is shown in the figure. There is a clear focal point exactly where the black square intersects the spatiotemporal trajectory of the glider, where ei is maximized (dark red).
Effective information is zero for locations that are too far or too close at t = −20 to affect the output of the orange square at t = 0. Effective information thus provides a tool analogous to a camera focus: grainings closer to the focal point express glider dynamics better.

Macroscopic texture varies with distance. The behavior of individual cells within a glider trajectory is far more complicated than the glider itself, which transitions through 4 phases as it traverses its diagonal trajectory, Fig. 4A. Does coarse-graining quantifiably simplify dynamics?

Figure 5: Macro-alphabets as a function of distance. (A) Consider two families of coarse-grainings with channel and ground as in Fig. 4. First, take the blue squares (filled and empty) as units at times −4n and 0, where n is the diagonal distance between them. Second, repeat for the red squares. (B) Log-plot of the size of the filled squares' macro-alphabets (number of macrostates) as a function of the number of time steps prior, −4n.

Fig. 5 constructs pairs of 3×3 units out of occasions at various distances from one another and computes their macro-alphabets. A 3×3 unit has a micro-alphabet of 2⁹ = 512 outputs. The macro-alphabet is found by grouping micro-outputs together into equivalence classes if their effect is the same after propagating through the channel. We find that the size of the macro-alphabet decreases exponentially as the distance between units increases, stabilizing at 5 macro-outputs: the 4 glider phases in Fig. 4A and a large equivalence class of outputs that do not propagate to the target unit and are equivalent to a blank patch of grid. A similar phenomenon occurs for pairs of 4×4 units, also Fig. 5.

Continuing the camera analogy: at close range the texture of units is visible. As the distance increases, the channel absorbs more of the detail.
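The grouping into equivalence classes can be sketched directly: represent the propagated effect of each micro-output as a column of conditional probabilities over the target unit, and merge outputs whose columns coincide (Step 5, first condition; the "reacts indistinguishably" condition is omitted here). The effect matrix below is a hypothetical stand-in for the result of marginalizing the channel:

```python
import numpy as np

def macro_alphabet(p_effect, decimals=10):
    """Merge micro-outputs with identical downstream effects.

    p_effect[t, u] = p(target output t | source micro-output u)
    after propagating through the channel. Returns the equivalence
    classes of micro-outputs; one representative per class forms
    the macro-alphabet."""
    classes = {}
    for u in range(p_effect.shape[1]):
        # Round so that numerically identical columns hash together.
        key = tuple(np.round(p_effect[:, u], decimals))
        classes.setdefault(key, []).append(u)
    return list(classes.values())

# Hypothetical effect matrix: of 8 micro-outputs, only two distinct
# effects survive the channel -- e.g. "glider arrives" vs "blank".
effects = np.array([[1, 1, 1, 0, 1, 1, 0, 1],
                    [0, 0, 0, 1, 0, 0, 1, 0]], dtype=float)
classes = macro_alphabet(effects)
```

For the matrix above the 8 micro-outputs collapse to 2 macro-outputs, mirroring how a 512-letter micro-alphabet collapses to 5 macro-outputs at large distances in Fig. 5.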
The computational texture of the system is simpler at coarser grains, yielding a more symbolic description where glider dynamics are described via 4 basic phases produced by a single macroscopic unit rather than 2⁹ outputs produced by 9 microscopic occasions.

Excess information detects spatial organization. So far we have only considered grainings of the Game of Life that respect its spatial organization – in effect, taking the spatial structure for granted. A priori, there is nothing stopping us from grouping the 8 gray cells in Fig. 6A into a single unit that does not respect the spatial organization, since its constituents are separated in space. Are coarse-grainings that respect the grid-structure quantifiably better than others?

Fig. 6A shows a coarse-graining that does not respect the grid. It constructs two units, one from both gray squares at t = 1 and the other from both red squares at t = 0. Intuitively, the coarse-graining is unsatisfactory since it builds units whose constituent occasions have nothing to do with each other over the time-scale in question. Quantitatively, excess information over the obvious partition P of the system into two parts is 0 bits. It is easy to show ξ ≤ 0 for any disjoint units. By comparison, the coarse-grainings in panels C and D, which respect the grid structure, both generate positive excess information.

Figure 6: Detecting spatial organization. Units are the cells in the red (thick-edged) and gray (filled) squares at t = 0 and t = 1 respectively; other occasions are extrinsic noise. (A) ξ|_P = 0, ei = 0.4 = 0.2 + 0.2: the coarse-graining groups non-interacting occasions into units. (B) ξ = −0.2, ei = 1.3: a blank grid is highly redundant. (C, D) ξ = 0.9, ei = 2.6: gliders perform interesting information-processing.
Thus we find that not only does our information-theoretic camera have an automatic focus, it also detects when processes hang together to form a single coherent scene.

Excess information detects gliders. Blank stretches of grid, Fig. 6B, are boring. There is nothing going on. Are interesting patches of grid quantifiably distinguishable from boring patches?

Excess information distinguishes blank grids from gliders: ξ on the blank grid is negative, Fig. 6B, since the information generated by the cells is redundant, analogous to Fig. 3C. By contrast, ξ for a glider is positive, Fig. 6CD, since its cells perform synergistic categorizations, similarly to Fig. 3D. Glider trajectories are also captured by excess information: varying the location of the red units (at t = 0) around the gray units, we find that ξ is maximized in the positions shown, Fig. 6CD, thus expressing the rightwards and downwards motions of the respective gliders.

Returning to the camera analogy, blank patches of grid fade into (back)ground or are (transparent) channel, whereas gliders are highlighted front and center as units.

Application: Hopfield networks

Hopfield networks embed energy landscapes into their connectivity. For any initial condition they tend to one of few attractors – troughs in the landscape (Hopfield, 1982; Amit, 1989). Although cells in Hopfield networks are quite different from neurons, there is evidence suggesting neuronal populations transition between coherent distributed states similar to attractors (Abeles et al., 1995; Jones et al., 2007).

Table 1: Analysis of unidirectionally coupled Hopfield networks A → B, each containing 8 cells. The networks and coupling embed attractors {00001111, 00110011, 01010101} and their mirrors. Temperature is T = 0.25. A sample run is analyzed using two coarse-grainings: INT (B → B) captures B's effect on itself and EXT (A → B) captures A's effect on B; see text.

    t   A (output)   B (output)   INT: ei   max ξ    EXT: ei   max ξ
    0   00000000     01010101        –        –         –        –
    1   10100011     01010101      2.42     0.10      0.31     0.04
    2   10101010     00010101      1.85     0.08      2.44     0.16
    3   10101010     00101011      1.96     0.12      6.89     0.27
    4   10101010     00101010      1.85     0.08      1.60     0.10
    5   10101010     10101010      2.42     0.10      0.90     0.06
    6   10101010     10101010      2.42     0.10      0.31     0.04

Attractors are population-level phenomena. They arise because of interactions between groups of cells – no single cell is responsible for their existence – suggesting that coarse-graining may reveal interesting features of attractor dynamics.

Effective information detects causal interactions. Table 1 analyzes a sample run of unidirectionally coupled Hopfield networks A → B. Network A is initialized at an unstable point in the energy landscape and B in an attractor. A settles into a different attractor from B and then shoves B into the new attractor over a few time steps. Intuitively, A only exerts a strong force on B once it has settled in an attractor and before B transitions to the same attractor. Is the force A exerts on B quantitatively detectable?

Table 1 shows the effects of A and B respectively on B by computing ei for two coarse-grainings constructed for each transition t → t+1. Coarse-graining INT sets cells in B at t and t+1 as units and A as extrinsic noise. EXT sets cells in A at t and B at t+1 as units and fixes B at time t as ground. INT generates higher ei for all transitions except 1 → 2 → 3, precisely when A shoves B. Effective information is high when an output is sensitive to changes in an input, so it is unsurprising that B is more sensitive to changes in A exactly when A forces B out from one attractor into another. Analyzing other sample runs (not shown) confirms that ei reliably detects when A shoves B out of an attractor.

Macroscopic mechanisms depend on the ground.
Fixing the ground incorporates population-level biases into a coarse-grained cellular automaton's information-processing. The ground in coarse-graining EXT (i.e. the output of B at t−1) biases the mechanisms of the units in B at time t. When the ground is an attractor, it introduces tremendous inertia into the coarse-grained dynamics, since B is heavily biased towards outputting the attractor again. Few inputs from A can overcome this inertia, so if B is pushed out of an attractor it generates high ei about A. Conversely, when B stays in an attractor, e.g. transition 5 → 6, it follows its internal bias and so generates low ei about A.

Excess information detects attractor redundancy. Following our analysis of gliders, we investigate how attractors are captured by excess information. It turns out that ξ is negative in all cases: the functional dependencies within Hopfield networks are redundant. An attractor is analogous to a blank Game of Life grid where little is going on. Thus, although attractors are population-level phenomena, we exclude them as emergent processes.

Excess information expresses attractor transitions. We therefore refine our analysis and compute the subset of units at time t that maximizes ξ; maximum values are shown in Table 1. We find that the system decomposes into pairs of occasions with low ξ, except when B is shoved, in which case larger structures of 5 occasions emerge. This fits prior analysis showing transitions between attractors yield more integrated dynamics (Balduzzi and Tononi, 2008) and suggestions that cortical dynamics is metastable, characterized by antagonism between local attractors (Friston, 1997).

Our analysis suggests that transitions between attractors are the most interesting emergent behaviors in coupled Hopfield networks. How this generalizes to more sophisticated models remains to be seen.
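The coupled-network setup behind Table 1 can be sketched as follows. The coupling matrix (here simply a scaled copy of the internal Hebbian weights), its strength 0.5, and the logistic normalization of Eq. (1) are all assumptions for illustration, not the paper's exact construction:

```python
import numpy as np

rng = np.random.default_rng(7)

def hopfield_weights(patterns):
    """Hebbian connectivity alpha_jk = sum_mu (2 xi^mu_j - 1)(2 xi^mu_k - 1)."""
    s = 2 * np.asarray(patterns) - 1
    return s.T @ s

def hopfield_step(field, T):
    """Sample every cell: p(fire) proportional to exp(field / T), Eq. (1),
    normalized against the silent alternative (assumed logistic form)."""
    e = np.exp(field / T)
    p = e / (e + 1)
    return (rng.random(len(field)) < p).astype(int)

attractors = [[0, 0, 0, 0, 1, 1, 1, 1],
              [0, 0, 1, 1, 0, 0, 1, 1],
              [0, 1, 0, 1, 0, 1, 0, 1]]
W = hopfield_weights(attractors)      # internal weights of A and of B
C = 0.5 * W                           # assumed A -> B coupling strength

T = 0.25
A = np.array([1, 0, 1, 0, 0, 0, 1, 1])   # A: unstable initial point
B = np.array(attractors[2])              # B: starts inside an attractor

trace = []
for t in range(6):
    # Synchronous update: B feels its own field plus A's (old) output.
    A, B = hopfield_step(W @ A, T), hopfield_step(W @ B + C @ A, T)
    trace.append((A.copy(), B.copy()))
```

With a run of this kind one can then build the INT and EXT coarse-grainings transition by transition, as in Table 1; whether A actually drags B out of its attractor depends on the (assumed) coupling strength.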
Emergence

The examples show we can quantify how well a graining expresses a cellular automaton's dynamics. Effective information detects glider trajectories and also captures when one Hopfield network shoves another. However, ei does not detect whether a unit is integrated. For this we need excess information, which compares the information generated by a mechanism to that generated by its submechanisms. Forming units out of disjoint collections of occasions yields ξ = 0. Moreover, boring units (such as blank patches of grid or dead-end fixed point attractors) have negative ξ. Thus, ξ is a promising candidate for quantifying emergent processes.

This section formalizes the intuition that a system is emergent if its dynamics are better expressed at coarser spatiotemporal granularities. The idea is simple. Emergent units should generate more excess information, and have more excess information generated about them, than their sub-units. Moreover, emergent units should generate more excess information than neighboring units, recall Fig. 4.

Stating the definition precisely requires some notation. Let src v_l = {v_l} ∪ {v_k | k → l} and similarly for trg v_l. Let J be a subgraining of K, denoted J ≺ K, if for every U_j ∈ J there is a unit U_k ∈ K such that U_j ⊊ U_k. We compare mechanism m ⊂ K with its subgrains via

$$\xi_{K/J}(m, x_{out}) := ei_{\tilde K}(m, x_{out}) - \sum_{v_j \in J} ei_{\tilde J}(m_j, x_{out}),$$

where m_j = m ∩ src v_j and ei_{\tilde K} signifies effective information computed over K using micro-alphabets.

Definition (emergence). Fix cellular automaton X with output x_out. Coarse-graining² K is emergent if it satisfies conditions E1 and E2.

E1. Each unit U_l ∈ K generates excess information about its sources and has excess information generated about it by its targets, relative to subgrains J ≺ K:

$$0 < \xi_{K/J}(\mathrm{src}\,U_l, x_{out}) \quad\text{and}\quad 0 < \xi_{K/J}(\mathrm{trg}\,U_l, x_{out}). \qquad (8)$$

E2.
There is an emergent subgrain J ≺ K such that (i) every unit of K contains a unit of J and (ii) neighbors K′ (defined below) of K with respect to J satisfy

$$\xi_{\mathcal{K}'/\mathcal{J}}(\mathrm{src}\,U', x_{out}) \le \xi_{\mathcal{K}/\mathcal{J}}(\mathrm{src}\,U, x_{out}) \tag{9}$$

for all U ∈ K, and similarly for trg's. If K has no emergent subgrains then E2 is vacuous.

Grain K′ is a neighbor of K with respect to J ≺ K if for every U ∈ K there is a unique U′ ∈ K′ satisfying:

N1. there is a unit T ∈ J such that T ⊂ U, U′ and src T ⊂ src U, src U′, and similarly for trg; and

N2. the alphabet of U′ is no larger than that of U: $\prod_{k \in U'} |A_k| \le \prod_{l \in U} |A_l|$, and similarly for the combined alphabets of their sources and targets respectively.

The graining E_X that best expresses X outputting x_out is found by maximizing normalized excess information:

$$\mathcal{E}_X(x_{out}) := \arg\max_{\{\mathcal{K} \,\mid\, \text{emergent}\}} \frac{\xi(\mathcal{K}, x_{out})}{N_{\mathcal{K}}^{MIP}}. \tag{10}$$

Here, $N_{\mathcal{K}}^{MIP}$ is the normalizing constant found when computing the minimum information partition for K.

Some implications. We apply the definition to the Game of Life to gain insight into its mechanics.

Condition E1 requires that interactions between units and their sources (and targets) are synergistic, Fig. 6CD. Units that decompose into independent pieces, Fig. 6A, or perform highly redundant operations, Fig. 6B, are therefore not emergent.

Condition E2 compares units to their neighbors. Rather than build the automaton's spatial organization directly into the definition, neighbors of K are defined as coarse-grainings whose units overlap with K and whose alphabets are no bigger. Coarse-grainings with higher ξ than their neighbors are closer to focal points, recall Fig. 4 and Fig. 6CD, where ξ was maximized for units respecting glider trajectories. An analysis of glider boundaries similar in spirit to this paper is (Beer, 2004).

² The ground output s_G is x_out restricted to ground occasions.

Finally, Eq. (10) picks out the most expressive coarse-graining. The normalization plays two roles.
First, it biases the optimization towards grainings whose MIPs contain few, symmetric parts, following (Balduzzi and Tononi, 2008). Second, it biases the optimization towards systems with simpler macro-alphabets. Recall, Fig. 5, that coarse-graining produces more symbolic interactions by decreasing the size of alphabets. Simplifying alphabets typically reduces effective and excess information since there are fewer bits to go around. The normalization term rewards simpler levels of description, so long as they use the bits in play more synergistically.

Discussion

In this paper we introduced a flexible, scalable coarse-graining method that applies to any cellular automaton. Our notion of automaton applies to a broad range of systems. The constraints are that they (i) decompose into discrete components with (ii) finite alphabets where (iii) time passes in discrete tics. We then described how to quantify the information generated when a system produces an output (at any scale), both as a whole and relative to its subsystems.

An important feature of our approach is that the output x_out of a graining is incorporated into the ground and also directly influences ei and ξ through computation of the actual repertoires. Coarse-graining and emergence therefore capture some of the suppleness of biological processes (Bedau, 1997): they are context-dependent and require many ceteris paribus clauses (i.e. background) to describe.

Investigating examples taken from Conway's Game of Life and coupled Hopfield networks, we accumulated a small but significant body of evidence confirming the principle that expressive coarse-grainings generate more information relative to sub-grainings. Finally, we provisionally defined emergent processes. The definition is provisional since it derives from analyzing a small fraction of the possible coarse-grainings of only two kinds of cellular automata.
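The whole-versus-parts comparison at the heart of excess information can be illustrated with a toy calculation. The sketch below is a deliberate simplification, not the paper's construction (which uses actual repertoires over a full coarse-graining and its minimum information partition): inputs are given a uniform max-entropy prior, ei of a deterministic mechanism is the KL divergence from that prior to the posterior over inputs compatible with the observed output, and ξ is the whole's ei minus the sum over single-input sub-mechanisms. An XOR mechanism is synergistic (ξ > 0), while a mechanism that depends on one input independently decomposes (ξ = 0), matching the intuition that disjoint collections of occasions yield ξ = 0.

```python
import itertools
import math

def ei_whole(f, n, out):
    """ei of the whole mechanism: log2(#inputs / #inputs compatible with out)."""
    inputs = list(itertools.product([0, 1], repeat=n))
    compatible = [x for x in inputs if f(x) == out]
    return math.log2(len(inputs) / len(compatible))

def ei_part(f, n, out, j):
    """ei of the sub-mechanism at input j: KL of the marginal posterior on x_j."""
    inputs = list(itertools.product([0, 1], repeat=n))
    compatible = [x for x in inputs if f(x) == out]
    p1 = sum(x[j] for x in compatible) / len(compatible)  # P(x_j = 1 | out)
    return sum(p * math.log2(p / 0.5) for p in (p1, 1 - p1) if p > 0)

def xi(f, n, out):
    """Excess information (simplified): whole minus sum of single-input parts."""
    return ei_whole(f, n, out) - sum(ei_part(f, n, out, j) for j in range(n))

xor  = lambda x: x[0] ^ x[1]  # output constrains the input pair only jointly
copy = lambda x: x[0]         # output depends on x_0 alone

print(xi(xor, 2, 0))   # 1.0 -> synergistic interaction
print(xi(copy, 2, 0))  # 0.0 -> decomposes into independent pieces
```

In this simplified setting ξ cannot go negative; capturing the negative ξ of redundant units (blank grid patches, attractors) requires the paper's fuller partition-based construction.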
Hopfield networks and the Game of Life are simple models capturing some important aspects of biological systems. Ultimately, we would like to analyze emergent phenomena in more realistic models, in particular of the brain. Conscious percepts take 100–200 ms to arise, and brain activity is (presumably) better expressed as comparatively leisurely interactions between neurons or neuronal assemblies rather than much faster interactions between atoms or molecules (Tononi, 2004). To apply the techniques developed here to more realistic models we must confront a computational hurdle: the number of coarse-grainings that can be imposed on large cellular automata is vast. Nevertheless, the approach developed here may still be of use. First, manipulating macro-alphabets provides a method for performing approximate computations on large-scale systems. Second, for more fine-grained analysis, initial estimates about which coarse-grainings best express a system's dynamics can be fine-tuned by comparing them with neighbors.

Acknowledgements. The author thanks Dominik Janzing for many useful comments on an earlier draft, Giulio Tononi for stimulating conversations and Virgil Griffiths for emphasizing the importance of excess information.

References

Abeles, M., Bergman, H., Gat, I., Meilijson, I., Seidemann, E., Tishby, N., and Vaadia, E. (1995). Cortical activity flips among quasi-stationary states. Proc. Nat. Acad. Sci., 92:8616–8620.

Amit, D. (1989). Modelling Brain Function: The World of Attractor Neural Networks. Cambridge University Press.

Anderson, P. W. (1972). More is different. Science, 177(4047):393–6.

Balduzzi, D. and Tononi, G. (2008). Integrated information in discrete dynamical systems: motivation and theoretical framework. PLoS Comput Biol, 4(6):e1000091.

Balduzzi, D. and Tononi, G. (2009). Qualia: the geometry of integrated information. PLoS Comput Biol, 5(8):e1000462.

Bedau, M. A. (1997).
Emergent models of supple dynamics in life and mind. Brain Cogn, 34(1):5–27.

Beer, R. D. (2004). Autopoiesis and cognition in the game of life. Artif Life, 10(3):309–26.

Berlekamp, E., Conway, J., and Guy, R. (1982). Winning Ways for Your Mathematical Plays, volume 2. Academic Press.

Crutchfield, J. (1994). The calculi of emergence: computation, dynamics, and induction. Physica D, 75:11–54.

Dennett, D. C. (1991). Real patterns. J. Philosophy, 88(1):27–51.

Friston, K. (1997). Transients, metastability and neuronal dynamics. NeuroImage, 5:164–171.

Hopfield, J. (1982). Neural networks and physical systems with emergent computational properties. Proc. Nat. Acad. Sci., 79:2554–2558.

Jones, L. M., Fontanini, A., Sadacca, B. F., Miller, P., and Katz, D. B. (2007). Natural stimuli evoke dynamic sequences of states in sensory cortical ensembles. Proc Natl Acad Sci U S A, 104(47):18772–18777.

Pearl, J. (2000). Causality: Models, Reasoning and Inference. Cambridge University Press.

Polani, D. (2006). Emergence, intrinsic structure of information, and agenthood. Int J Complex Systems, 1937.

Seth, A. K. (2010). Measuring autonomy and emergence via Granger causality. Artif Life, 16(2):179–96.

Shalizi, C. and Moore, C. (2006). What is a macrostate: subjective observations and objective dynamics. http://arxiv.org/abs/cond-mat/0303625.

Tononi, G. (2004). An information integration theory of consciousness. BMC Neurosci, 5:42.