The Computational Structure of Spike Trains



Robert Haslinger,(1,2) Kristina Lisa Klinkner,(3) and Cosma Rohilla Shalizi(3,4)
(1) Martinos Center for Biomedical Imaging, Massachusetts General Hospital, Charlestown MA
(2) Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge MA
(3) Department of Statistics, Carnegie Mellon University, Pittsburgh PA
(4) Santa Fe Institute, Santa Fe NM
(Dated: September 2008; January 2009)

Neurons perform computations, and convey the results of those computations through the statistical structure of their output spike trains. Here we present a practical method, grounded in the information-theoretic analysis of prediction, for inferring a minimal representation of that structure and for characterizing its complexity. Starting from spike trains, our approach finds their causal state models (CSMs), the minimal hidden Markov models or stochastic automata capable of generating statistically identical time series. We then use these CSMs to objectively quantify both the generalizable structure and the idiosyncratic randomness of the spike train. Specifically, we show that the expected algorithmic information content (the information needed to describe the spike train exactly) can be split into three parts describing (1) the time-invariant structure (complexity) of the minimal spike-generating process, which describes the spike train statistically, (2) the randomness (internal entropy rate) of the minimal spike-generating process, and (3) a residual pure noise term not described by the minimal spike-generating process. We use CSMs to approximate each of these quantities. The CSMs are inferred non-parametrically from the data, making only mild regularity assumptions, via the Causal State Splitting Reconstruction (CSSR) algorithm. The methods presented here complement more traditional spike train analyses by describing not only spiking probability and spike train entropy, but also the complexity of a spike train's structure. We demonstrate our approach using both simulated spike trains and experimental data recorded in rat barrel cortex during vibrissa stimulation.

I. INTRODUCTION

The recognition that neurons are computational devices is one of the foundations of modern neuroscience (McCulloch & Pitts, 1943). However, determining the functional form of such computation is extremely difficult, if only because while one often knows the output (the spikes), the input (synaptic activity) is almost always unknown. Often, therefore, scientists must draw inferences about the computation from its results, namely the output spike trains and their statistics. In this vein, many researchers have used information theory to determine, via calculation of the entropy rate, a neuron's channel capacity, i.e., how much information the neuron could conceivably transmit, given the distribution of observed spikes (Rieke et al., 1997). However, entropy quantifies randomness, and says little about how much structure a spike train has, or the amount and type of computation which must have, at a minimum, taken place to produce this structure. Here, and throughout this paper, we mean "computational structure" information-theoretically, i.e., the most compact effective description of a process capable of statistically reproducing the observed spike trains. The complexity of this structure is the number of bits needed to describe it.
This is different from the algorithmic information content of a spike train, which is the number of bits needed to reproduce the latter exactly, describing not only its regularities, but also its accidental, noisy details. Our goal is to develop rigorous yet practical methods for determining the minimal computational structure necessary and sufficient to generate neural spike trains. We are able to do this through non-parametric analysis of the directly-observable spike trains, without resorting to a priori assumptions about what kind of structure they have. We do this by identifying the minimal hidden Markov model (HMM) which can statistically predict the future of the spike train without loss of information. This HMM also generates spike trains with the same statistics as the observed train. It thus defines a program which describes the spike train's computational structure, letting us quantify, in bits, the structure's complexity.

From multiple directions, several groups, including our own, have shown that minimal generative models of time series can be discovered by clustering histories into "states", based on their conditional distributions over future events (Crutchfield & Young, 1989; Grassberger, 1986; Jaeger, 2000; Knight, 1975; Littman et al., 2002; Shalizi & Crutchfield, 2001). The observed time series need not be Markovian (few spike trains are), but the construction always yields the minimal HMM capable of generating and predicting the original process. Following Shalizi (2001); Shalizi & Crutchfield (2001), we will call such an HMM a "Causal State Model" (CSM). Within this framework, the model discovery algorithm called Causal State Splitting Reconstruction, or CSSR (Shalizi & Klinkner, 2004), is an adaptive non-parametric method which consistently estimates a system's CSM from time-series data. In this paper we adapt CSSR for use in spike train analysis. CSSR provides us with non-parametric estimates of the time- and history-dependent spiking probabilities found by more familiar parametric analyses. Unlike those analyses, it is also capable, in the limit of infinite data, of capturing all the information about the computational structure of the spike-generating process contained in the spikes themselves. In particular, the CSM quantifies the complexity of the spike-generating process by showing how much information about the history of the spikes is relevant to their future, i.e., how much information is needed to reproduce the spike train statistically. This is equivalent to the log of the effective number of statistically-distinct states of the process (Crutchfield & Young, 1989; Grassberger, 1986; Shalizi & Crutchfield, 2001). While this is not the same as the algorithmic information content, we show that CSMs can also approximate the average algorithmic information content, splitting it into three parts: (1) the generative process's complexity, in our sense; (2) the internal entropy rate of the generative process, the extra information needed to describe the exact state transitions it underwent while generating the spike train; and (3) the residual randomness in the spikes, unconstrained by the generative process. The first of these quantifies the spike train's structure, the last two its randomness.
Below, we give precise definitions of these quantities, both their ensemble averages (§ II.C) and their functional dependence on time (§ II.D). The time-dependent versions allow us to determine when the neuron is traversing states requiring complex descriptions. Our methods put hard numerical lower bounds on the amount of computational structure which must be present to generate the observed spikes. They also quantify, in bits, the extent to which the neuron is driven by external forces. We demonstrate our approach using both simulated and experimentally recorded single-neuron spike trains. We discuss the interpretation of our measures, and how they add to our understanding of neuronal computation.

II. THEORY AND METHODS

Throughout this paper we treat spike trains as stochastic binary time series, with time divided into discrete, equal-duration bins (typically at one millisecond resolution); "1" corresponds to a spike and "0" to no spike. Our aim is to find a minimal description of the computational structure present in such a time series. Heuristically, the structure present in a spike train can be described by a "program" which can reproduce the spikes statistically. The information needed to describe this program (loosely speaking, the program length) quantifies the structure's complexity. Our approach uses minimal, optimally predictive HMMs, or Causal State Models (CSMs), reconstructed from the data, to describe the program. (We clarify our use of "minimal" below.) The CSMs are then used to calculate various measures of the computational structure, such as its complexity.

The states are chosen so that they are optimal predictors of the spike train's future, using only the information available from the train's history. (We discuss the limitations of this below.) Specifically, the states S_t are defined by grouping the histories of past spiking activity X_{-∞}^t which occur in the spike train into equivalence classes, where all members of a given equivalence class are statistically equivalent in terms of predicting the future spiking X_{t+1}^∞. (X_{t_0}^t denotes the sequence of random observables, i.e., spikes or their absence, between t_0 and t > t_0, while X_t denotes the random observable at time t. The notation is similar for the states.) This construction ensures that the causal states are Markovian, even if the spike train is not (Shalizi & Crutchfield, 2001, Lemma 6, p. 839). Therefore, at all times t, the system and its possible future evolution(s) can be specified by the state S_t. Like all HMMs, a CSM can be represented pictorially by a directed graph, with nodes standing for the process's hidden states and directed edges for the possible transitions between these states. Each edge is labeled with the observable/symbol emitted during the corresponding transition ("1" for a spike and "0" for no spike), and the probability of traversing that edge given that the system started in that state. The CSM also specifies the time-averaged probability of occupying any state (via the ergodic theorem for Markov chains). The theory is described in more detail below, but at this point examples may clarify the ideas. Figures 1A and B show two simple CSMs. Both are built from simulated ≈ 40 Hz spike trains 200 seconds in length (1 msec time bins, p = 0.04 IID at each time when spiking is possible).
However, spike trains generated from the CSM in Figure 1B have a 5 msec refractory period after each spike (when p = 0), while the spiking rate in non-refractory periods is still 40 Hz (p = 0.04). The refractory period is additional structure, represented by the extra states. State A represents the status of the neuron during 40 Hz spiking, outside of the refractory periods. While in this state, the neuron either emits no spike (X_{t+1} = 0), staying in state A, or emits a spike (X_{t+1} = 1) with probability p = 0.04 and moves to state B. The equivalence class of past spiking histories defining state A therefore includes all past spiking histories for which the most recent five symbols are 0, symbolically {∗00000}. State B is the neuron's state during the first millisecond of the refractory period. It is defined by the set of spiking histories {∗1}. No spike can be emitted during a refractory period, so the transition to state C is certain and the symbol emitted is always "0". In this manner the neuron proceeds through states C to F and back to state A, whereupon it is possible to spike again.
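For concreteness, here is a minimal Python sketch (ours, not drawn from any published implementation; the function and variable names are hypothetical) of the process just described: IID Bernoulli firing at p = 0.04 outside a 5 msec hard refractory period.

```python
import numpy as np

def simulate_refractory_train(n_steps=200_000, p=0.04, refractory=5, seed=0):
    """Binary spike train at 1 msec resolution: Bernoulli(p) firing
    (state A), with `refractory` forced-silent bins after each spike
    (states B through F of Figure 1B)."""
    rng = np.random.default_rng(seed)
    spikes = np.zeros(n_steps, dtype=int)
    silent = 0                      # refractory bins still to serve
    for t in range(n_steps):
        if silent > 0:              # in B..F: emit "0" deterministically
            silent -= 1
        elif rng.random() < p:      # in A: spike with probability p
            spikes[t] = 1
            silent = refractory
    return spikes

x = simulate_refractory_train()
print(x.mean())  # ≈ p / (1 + refractory * p) ≈ 0.033 spikes/msec
```

A 200 second train of exactly this kind is what the CSM of Figure 1B was reconstructed from.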
The rest of this section is divided into four subsections. First, we briefly review the formal theory behind CSMs (for details, see Shalizi (2001); Shalizi & Crutchfield (2001)) and discuss why they can be considered a good choice for understanding the structural content of spike trains. Second, we describe the Causal State Splitting Reconstruction (CSSR) algorithm used to reconstruct CSMs from observed spike trains (Shalizi & Klinkner, 2004). We emphasize that CSSR requires no a priori knowledge of the structure of the CSM, which is discovered from the spike train. Third, we discuss two different notions of spike train structure, namely statistical complexity and algorithmic information content. These two measures can be interpreted as different aspects of a spike train's computational structure, and each can be related to the reconstructed CSM. Fourth and finally, we show how the reconstructed CSM can be used to predict spiking, measure the neural response, and detect the influence of external stimuli.

A. Causal State Models

The foundation of the theory of causal states is the concept of a predictively sufficient statistic. A statistic, η, on one random variable, X, is sufficient for predicting another random variable, Y, when η(X) and X have the same information [Footnote 1: See Cover & Thomas (1991) for information-theoretic definitions and notation.] about Y, I[X; Y] = I[η(X); Y]. This holds if and only if X and Y are conditionally independent given η(X): P(Y | X, η(X)) = P(Y | η(X)). This is a close relative of the familiar idea of parametric sufficiency; in Bayesian statistics, where parameters are random variables, parametric sufficiency is a special case of predictive sufficiency (Bernardo & Smith, 1994). Predictive sufficiency shares all of parametric sufficiency's optimality properties (Bernardo & Smith, 1994). However, a statistic's predictive sufficiency depends only on the actual joint distribution of X and Y, not on any parametric model of that distribution. Again as in the parametric case, a minimal predictively sufficient statistic ε is one which is a function of every other sufficient statistic η, i.e., ε(X) = h(η(X)) for some h. Minimal sufficient statistics are the most compact summaries of the data which retain all the predictively-relevant information.

A basic result is that a minimal sufficient statistic always exists and is (essentially) unique, up to isomorphism (Bernardo & Smith, 1994; Shalizi & Crutchfield, 2001). In the context of stochastic processes, such as spike trains, ε is the minimal sufficient statistic of the history X_{-∞}^t for predicting the future of the process, X_{t+1}^∞. This statistic is the optimal predictor of the observations. The sequence of values of the minimal sufficient statistic, S_t = ε(X_{-∞}^t), is another stochastic process. This process is always a homogeneous Markov chain, whether or not the X_t process is (Knight, 1975; Shalizi & Crutchfield, 2001). Turned around, this means that the original X_t process is always a random function of a homogeneous Markov chain, whose latent states, named the causal states by Crutchfield & Young (1989), are optimal, minimal predictors of the future of the time series. A causal state model or causal state machine is a stochastic automaton or HMM constructed so that its Markov states are minimal sufficient statistics for predicting the future of the spike train, and consequently can generate spike trains statistically identical to those observed. [Footnote 2: Some authors use "hidden Markov model" only for models where the current observation is independent of all other variables given the current state, and call the broader class which includes CSMs "partially observable Markov models".] Causal state reconstruction means inferring the causal states from the observed spike train.

Following Crutchfield & Young (1989); Shalizi & Crutchfield (2001), the causal states can be seen as equivalence classes of spike-train histories X_{-∞}^t which maximize the mutual information between the state(s) and the future of the spike train X_{t+1}^∞. Because they are sufficient, they predict the future of the spike train as well as it can be predicted from its history alone. Because they are minimal, the number of states or equivalence classes is as small as it can be without discarding predictive power. [Footnote 3: There may exist more compact representations, but then the states, or their equivalents, can never be empirically identified — see Shalizi & Crutchfield (2001, thm. 3, p. 846), or Löhr & Ay (2009).] Formally, two histories, x⁻ and y⁻, are equivalent when P(X_{t+1}^∞ | X_{-∞}^t = x⁻) = P(X_{t+1}^∞ | X_{-∞}^t = y⁻). The equivalence class of x⁻ is [x⁻]. Define the function which maps histories to their equivalence classes:

ε(x⁻) ≡ [x⁻] = { y⁻ : P(X_{t+1}^∞ | X_{-∞}^t = y⁻) = P(X_{t+1}^∞ | X_{-∞}^t = x⁻) }

The causal states are the possible values of ε, i.e., the equivalence classes; each corresponds to a distinct distribution for the future. The state at time t is S_t = ε(X_{-∞}^t). Clearly, ε(x⁻) is a sufficient statistic. It is also minimal, since if η is sufficient, then η(x⁻) = η(y⁻) implies ε(x⁻) = ε(y⁻). One can further show (Shalizi & Crutchfield, 2001, Theorem 3) that ε is the unique minimal sufficient statistic, meaning that any other must be isomorphic to it.
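As an illustration of the equivalence-class idea, the following Python sketch (ours; deliberately naive) groups finite histories by their empirical next-step conditional distributions. It truncates histories at a fixed length L, looks only one step ahead rather than at entire futures, and merges on an ad hoc threshold `tol` instead of a statistical test; CSSR, described next, does this job properly.

```python
import numpy as np
from collections import defaultdict

def next_step_dists(x, L):
    """Empirical P(X_t = 1 | last L symbols) for each length-L history
    occurring in the binary sequence x."""
    counts = defaultdict(lambda: np.zeros(2))
    for t in range(L, len(x)):
        counts[tuple(x[t - L:t])][x[t]] += 1
    return {h: c[1] / c.sum() for h, c in counts.items()}

def group_histories(p_spike, tol=0.02):
    """Cluster histories whose conditional spike probabilities agree
    within `tol`; each cluster is a crude stand-in for a causal state."""
    states = []                      # list of [representative p, histories]
    for h, p in sorted(p_spike.items()):
        for state in states:
            if abs(state[0] - p) < tol:
                state[1].append(h)
                break
        else:
            states.append([p, [h]])
    return states
```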
In addition to being minimal sufficient statistics, the causal states have some other important properties which make them ideal for quantifying structure (Shalizi & Crutchfield, 2001). (1) As mentioned, {S_t} is a Markov process, and one can write the observed process X as a random function of the causal state process, i.e., X has a natural hidden-Markov-model representation. (2) The causal states are recursively calculable; there is a function T such that S_{t+1} = T(S_t, X_{t+1}) — see Appendix A. (3) CSMs are closely related to the "predictive state representations" of controlled dynamical systems (Littman et al., 2002); see Appendix C.

B. Causal State Splitting Reconstruction

Our goal is to find a minimal sufficient statistic for the spike train, which will form a hidden Markov model. As stated previously, the states of this model are equivalence classes of spiking histories X_{-∞}^t. In practice, we need an algorithm which can both cluster histories into groups which preserve their conditional distribution of futures, and find the history length Λ at which the past may be truncated while preserving the computational structure of the spike train. The former is accomplished by the CSSR algorithm (Shalizi & Klinkner, 2004) for inferring causal states from data by building a recursive next-step-sufficient statistic. [Footnote 4: A next-step-sufficient statistic contains all the information needed for optimal one-step-ahead prediction, I[X_{t+1}; η(X_{-∞}^t)] = I[X_{t+1}; X_{-∞}^t], but not necessarily for longer predictions. CSSR relies on the theorem that if η is next-step sufficient, and it is recursively calculable, then η is sufficient for the whole of the future (Shalizi & Crutchfield, 2001, pp. 842–843). CSSR first finds a next-step-sufficient statistic, and then refines it to be recursive.] We do the latter by minimizing Schwarz's Bayesian Information Criterion (BIC) over Λ. To save space, we just sketch the CSSR algorithm here. [Footnote 5: In addition to Shalizi & Klinkner (2004), which gives pseudocode and some details of convergence, applications to process classification are treated in Klinkner & Shalizi (2009); Shalizi et al. (2009). An open-source C++ implementation is available at http://bactra.org/CSSR/. The CSMs generated by CSSR can be displayed graphically, as we do in this paper, with the open-source program dot (http://www.graphviz.org/).]

CSSR starts by treating the process as an independent, identically-distributed sequence, with one causal state. It adds states when statistical tests show that the current set of states is not sufficient. Suppose we have a sequence x_1^N = x_1, x_2, ..., x_N of length N from a finite alphabet A of size k. We wish to derive from this an estimate ε̂ of the minimal sufficient statistic ε. We do this by finding a set Σ of states, each of which will be a set of strings, or finite-length histories. The function ε̂ will then map a history x⁻ to whichever state contains a suffix of x⁻ (taking "suffix" in the usual string-manipulation sense). Although each state can contain multiple suffixes, one can check (Shalizi & Klinkner, 2004) that the mapping ε̂ will never be ambiguous. The null hypothesis is that the process is Markovian on the basis of the states in Σ,

P(X_t | X_{t-L}^{t-1} = a x_{t-L+1}^{t-1}) = P(X_t | Ŝ = ε̂(x_{t-L+1}^{t-1}))   (1)

for all a ∈ A. In words, adding an extra piece of history does not change the conditional distribution for the next observation. We can check this with standard statistical tests, such as χ² or Kolmogorov-Smirnov. In this paper, we used a KS test of size α = 0.01. [Footnote 6: For finite N, decreasing α tends to yield simpler CSMs with fewer states. In a sense, it is a sort of regularization coefficient. The influence of this regularization diminishes as N increases. For the data used in the Results section of this paper, varying α in the range 0.001 < α < 0.1 made little difference.] If we reject this hypothesis, we fall back on a restricted alternative hypothesis, that we have the right set of conditional distributions, but have matched them with the wrong histories. That is,

P(X_t | X_{t-L}^{t-1} = a x_{t-L+1}^{t-1}) = P(X_t | Ŝ = s*)   (2)

for some s* ∈ Σ, but s* ≠ ε̂(x_{t-L+1}^{t-1}). If this hypothesis passes a test of size α, then s* is the state to which we assign the history. [Footnote 7: If more than one such state s* exists, we chose the one for which P̂(X_t | Ŝ = s*) differs least, in total variation distance, from P̂(X_t | X_{t-L}^{t-1} = a x_{t-L+1}^{t-1}), which is plausible and convenient. However, which state we chose is irrelevant in the limit N → ∞, so long as the difference between the distributions is not statistically significant.] Only if Eq. (2) is itself rejected do we create a new state, with the suffix a x_{t-L+1}^{t-1}. [Footnote 8: The conceptually similar algorithm of Kennel & Mees (2002) in effect always creates a new state, which leads to more complex models, sometimes infinitely more complex ones; see Shalizi & Klinkner (2004).]
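The decision logic of Eqs. (1)-(2) can be sketched as follows. For a binary alphabet the conditional distributions are Bernoulli, so for simplicity this sketch tests equality with a χ² contingency test rather than the KS test we actually used; `states` is a hypothetical list of dictionaries, each holding a state's suffixes and its pooled next-symbol counts, and none of the names come from the CSSR implementation.

```python
import numpy as np
from scipy.stats import chi2_contingency

def same_next_symbol_dist(counts_a, counts_b, alpha=0.01):
    """H0: two next-symbol count vectors [#"0"s, #"1"s] share one
    distribution. A chi-squared stand-in for the paper's KS test."""
    table = np.array([counts_a, counts_b], dtype=float)
    if table.sum(axis=0).min() == 0 or table.sum(axis=1).min() == 0:
        return True                 # degenerate table: no evidence to split
    _, p_value, _, _ = chi2_contingency(table)
    return p_value >= alpha

def place_suffix(states, parent, suffix, counts, alpha=0.01):
    """CSSR-style placement of a candidate suffix ax: keep it in its
    parent state if Eq. (1) passes; otherwise try every existing state
    (Eq. (2)); only if all tests fail, open a new state."""
    candidates = [parent] + [s for s in states if s is not parent]
    for s in candidates:
        if same_next_symbol_dist(s["counts"], counts, alpha):
            s["suffixes"].append(suffix)
            s["counts"] = [a + b for a, b in zip(s["counts"], counts)]
            return s
    new_state = {"suffixes": [suffix], "counts": list(counts)}
    states.append(new_state)
    return new_state
```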
The algorithm itself has three phases. Phase I initializes Σ to a single state, which contains only the null suffix ∅. (That is, ∅ is a suffix of any string.) The length of the longest suffix in Σ is L; this starts at 0. Phase II iteratively tests the successive versions of the null hypothesis, Eq. (1), and L increases by one each iteration, until we reach some maximum length Λ. At the end of Phase II, ε̂ is (approximately) next-step sufficient. Phase III makes ε̂ recursively calculable, by splitting the states until they have deterministic transitions. Under mild technical conditions (a finite true number of states, etc.), CSSR converges in probability on the correct CSM as N → ∞, provided only that Λ is long enough to discriminate all of the states. The error of the predicted distributions of futures P(X_{t+1}^∞ | X_{-∞}^t), measured by total variation distance, decays as N^{-1/2}. Section 4 of Shalizi & Klinkner (2004) details CSSR's convergence properties. Comparisons of CSSR's performance with that of more traditional expectation-maximization based approaches can also be found in Shalizi & Klinkner (2004), as can time complexity bounds for the algorithm. Depending upon the machine used, CSSR can process an N = 10^6 time series in under a minute.

1. Choosing Λ

CSSR requires no a priori knowledge of the CSM's structure, but does need a choice of Λ; here we pick it by minimizing the BIC of the reconstructed models over Λ, i.e.,

BIC ≡ -2 log L + d log N   (3)

where L is the likelihood, N is the data length and d is the number of model parameters, in our case the number of predictive states. [Footnote 9: The number of independent parameters d involved in describing the CSM will be (number of states) × (number of symbols − 1), since the sum of the outgoing probabilities for each state is constrained to be 1. Thus, for a binary alphabet, d = number of states.] BIC's logarithmic-with-N penalty term helps keep the number of causal states from growing too quickly with increased data size, which is why we use it instead of the Akaike Information Criterion (AIC). Also, BIC is known to be consistent for selecting the order of Markov chains and variable-length Markov models (Csiszár & Talata, 2006), both of which are sub-classes of CSMs.
Writing the observed spike train as x_1^N, and the state sequence as s_0^N, the total likelihood of the spike train is

L = Σ_{s_0^N ∈ Σ^{N+1}} P(X_1^N = x_1^N | S_0^N = s_0^N) P(S_0^N = s_0^N),   (4)

the sum over all possible causal state sequences of the joint probability of the spike train and the state sequence. Since the states update recursively, s_{t+1} = T(s_t, x_{t+1}), the starting state s_0 and the spike train x_1^N fix the entire state sequence s_0^N. Thus the sum over state sequences can be replaced by a sum over initial states,

L = Σ_{s_i ∈ Σ} P(X_1^N = x_1^N | S_0 = s_i) P(S_0 = s_i)   (5)

with the state probabilities P(S_0 = s_i) coming from the CSM. By the Markov property,

P(X_1^N = x_1^N | S_0 = s_i) = Π_{j=1}^{N} P(X_j = x_j | S_{j-1} = s_{j-1})   (6)

Selecting Λ is now straightforward: for each value of Λ, we build the CSM from the spike train, calculate the likelihood using Eqs. (5) and (6), and pick the value, and CSM, minimizing Eq. (3). We try all values of Λ up to a model-independent upper bound. For a wide range of stochastic processes, Marton & Shields (1994) showed that the length m of subsequences for which probabilities can be consistently and non-parametrically estimated can grow as fast as log N / h, where h is the entropy rate, but no faster. CSSR estimates the distribution of the next symbol given the previous Λ symbols, which is equivalent to estimating joint probabilities of blocks of length m = Λ + 1. Thus Marton and Shields's result limits the usable values of Λ:

Λ ≤ (log N) / h − 1   (7)

Using Eq. (7) requires the entropy rate h. The latter can either be upper-bounded as the log of the alphabet size (here, log 2 = 1 bit), or by some other, less pessimistic, estimator of the entropy rate (such as the output of CSSR with Λ = 1). Use of an upper bound on h results in a conservative maximum value for Λ. For example, a 30 minute experiment with 1 msec time bins lets us use at least Λ ≈ 20 by the most pessimistic estimate of h = 1; the actual maximum value of Λ may be much larger. We use Λ ≤ 25 in this paper, but see no indication that this can't be extended further, if need be.
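Equations (3)-(7) translate directly into code. The sketch below assumes a hypothetical `csm` object exposing the states, their occupation probabilities `pi[s]`, emission probabilities `p_spike[s]`, and the recursive update `T[s][symbol]`; none of these names come from the CSSR implementation.

```python
import numpy as np

def log_likelihood(csm, x):
    """log L, Eqs. (4)-(6): a sum over initial states, each term a
    product of one-step emission probabilities along the (unique)
    state sequence fixed by s_{t+1} = T(s_t, x_{t+1})."""
    total = -np.inf
    for s0 in csm.states:
        ll = np.log(csm.pi[s0])
        s = s0
        for xt in x:
            p = csm.p_spike[s]
            ll += np.log(p if xt == 1 else 1.0 - p)
            s = csm.T[s][xt]
        total = np.logaddexp(total, ll)   # sum the terms in log space
    return total

def bic(csm, x):
    """Eq. (3), with d = number of states for a binary alphabet."""
    return -2.0 * log_likelihood(csm, x) + len(csm.states) * np.log(len(x))

def max_history_length(N, h=1.0):
    """Eq. (7): Lambda <= log2(N)/h - 1, with h in bits per time-step;
    h = 1 is the most pessimistic (conservative) choice."""
    return int(np.log2(N) / h - 1)

# e.g. a 30 minute recording at 1 msec bins:
print(max_history_length(30 * 60 * 1000))   # -> 19, i.e. Lambda ≈ 20
```

Model selection then loops over Λ = 1, ..., max_history_length(N), reconstructing a CSM at each Λ and keeping the model with the smallest BIC.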
2. Condensing the CSM

For real neural data, the number of causal states can be very large — hundreds or more. This creates an interpretation problem, if only because it is hard to fit such a CSM on a single page for inspection. We thus developed a way to reduce the full CSM while still accounting for most of the spike train's structure. Our "state culling" technique finds the least-probable states and selectively removes them, appropriately redirecting state transitions and reassigning state occupation probabilities. By keeping the most probable states, we focus on the ones which contribute the most to the spike train's structure and complexity. Again, we used BIC as our model selection criterion. First, we sorted the states by probability, finding the least probable state (the "remove" state) with a single incoming edge from a state (its "ancestor") with outgoing transitions to two different states, the remove state and a second, "keep" state. We redirected both of the ancestor's outgoing edges to the keep state. Second, we reassigned the remove state's outgoing transitions to the keep state. If the outgoing transitions from the keep state were still deterministic (at most a single 0-emitting edge and a single 1-emitting edge), we stopped. If the transitions were non-deterministic, we merged states reached by emitting 0s with each other (likewise those reached by 1s), repeating this until termination. Third, we checked that there existed a state sequence of the new model which could generate the observed spikes. If there was, we accepted the new CSM. If not, we rejected the new CSM and chose the next lowest-probability state from the original CSM to remove. This culling was iterated until removing any state made it impossible for the CSM to generate the spike train. At each iteration, we calculated BIC (as described in the previous section), and ultimately chose the culled CSM with the minimum BIC. This gave a culled CSM for each value of Λ; the final one we used was chosen after also minimizing BIC over Λ. The CSMs shown below in the Results section result from this minimization of BIC over Λ and state culling.

3. ISI Bootstrapping

While we do model selection with BIC, we also want to do model checking or adequacy-testing. For the most part, we do this by using the CSM to bootstrap point-wise confidence bounds on the interspike interval (ISI) distribution, and checking their coverage of the empirical ISI distribution. Because this distribution is not used by CSSR in reconstructing the CSM, it provides a check on the latter's ability to accurately describe the spike train's statistics. Specifically, we generated confidence bounds as follows. To simulate one spike train, we picked a random starting state according to the CSM's inferred state-occupation probabilities, and then ran the CSM forward for N time-steps, N being the length of the original spike train. This gives a binary time series, where a "1" stands for a spike and a "0" for no-spike, and gave us a sample of inter-spike intervals from the CSM, and hence an "empirical" ISI distribution. Repeating this over 10^4 independent runs of the CSM, and taking the 0.005 and 0.995 quantiles of the distributions at each ISI length, gives 99% pointwise confidence bounds. (Pointwise bounds are necessary because the ISI distribution often modulates rapidly with ISI length.) If the CSM is correct, the empirical ISI will, by chance, lie outside the bounds at ≈ 1% of the ISI lengths. If we split the data into training and validation sets, a CSM reconstructed from the training set can be used to bootstrap ISI confidence bounds, which can be compared to the ISI distribution of the test set. We discuss this sort of cross-validation, as well as an additional test based on the time-rescaling theorem, in Appendix B.
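A sketch of this bootstrap, again in terms of the hypothetical `csm` object used above (occupation probabilities `pi`, emission probabilities `p_spike`, transition map `T`); it assumes each simulated train contains at least two spikes.

```python
import numpy as np

def simulate_csm(csm, n_steps, rng):
    """Run the CSM forward from a start state drawn from its
    state-occupation probabilities; return a binary spike train."""
    probs = np.array([csm.pi[s] for s in csm.states])
    s = csm.states[rng.choice(len(csm.states), p=probs)]
    out = np.zeros(n_steps, dtype=int)
    for t in range(n_steps):
        out[t] = int(rng.random() < csm.p_spike[s])
        s = csm.T[s][out[t]]
    return out

def isi_confidence_bounds(csm, n_steps, max_isi=100, n_boot=10_000, seed=0):
    """99% pointwise bounds: simulate n_boot trains of the original
    length, histogram their ISIs, and take the 0.005/0.995 quantiles
    at each ISI length."""
    rng = np.random.default_rng(seed)
    edges = np.arange(1, max_isi + 2)          # ISI bins 1..max_isi msec
    hists = np.zeros((n_boot, max_isi))
    for b in range(n_boot):
        spike_times = np.flatnonzero(simulate_csm(csm, n_steps, rng))
        isi = np.diff(spike_times)
        hists[b], _ = np.histogram(isi, bins=edges, density=True)
    return np.quantile(hists, [0.005, 0.995], axis=0)
```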
C. Complexity and Algorithmic Information Content

The algorithmic information content K(x_1^n) of a sequence x_1^n is the length of the shortest complete (input-free) computer program which will output x_1^n exactly and then halt (Cover & Thomas, 1991). [Footnote 10: The algorithmic information content is also called the Kolmogorov complexity. We do not use this term, to avoid confusion with our "complexity" C, the information needed to reproduce the spike train statistically rather than exactly (Eq. 11). See Badii & Politi (1997) for a detailed comparison of complexity measures.] In general, K(x_1^n) is strictly uncomputable, but when x_1^n is the realization of a stochastic process X_1^n, the ensemble-averaged algorithmic information essentially coincides with the Shannon entropy ("Brudno's theorem"; see Badii & Politi (1997)), reflecting the fact that both are maximized for completely random sequences (Cover & Thomas, 1991).

Both the algorithmic information and the Shannon entropy can be conveniently written in terms of a minimal sufficient statistic Q:

E[K(X_1^n)] = H[X_1^n] + o(n) = H[Q] + H[X_1^n | Q] + o(n)   (8)

The equality H[X_1^n] = H[Q] + H[X_1^n | Q] holds because Q is a function of X_1^n, so H[Q | X_1^n] = 0. The key to determining a spike train's expected algorithmic information is thus to find a minimal sufficient statistic. By construction, causal state models provide exactly this; a minimal sufficient statistic for x_1^n is the state sequence s_0^n = s_0, s_1, ..., s_n (Shalizi & Crutchfield, 2001). Thus the ensemble-averaged algorithmic information content, dropping terms o(n) and smaller, is

E[K(X_1^n)] = H[S_0^n] + H[X_1^n | S_0^n]
            = H[S_0] + Σ_{i=1}^{n} H[S_i | S_{i-1}] + Σ_{i=1}^{n} H[X_i | S_i, S_{i-1}]   (9)

Going from the first to the second line uses the causal states' Markov property. Assuming stationarity, Eq. (9) becomes

E[K(X_1^n)] = H[S_t] + n (H[S_t | S_{t-1}] + H[X_t | S_t, S_{t-1}]) = C + n (J + R)   (10)

This separates terms representing structure from those representing randomness. The first term in Eq. (10) is the complexity, C, of the spike-generating process (Crutchfield & Young, 1989; Grassberger, 1986; Shalizi et al., 2004).

C = H[S_t] = -E[log P(S_t)]   (11)

C is the entropy of the causal states, quantifying the structure present in the observed spikes. This is distinct from the entropy of the spikes themselves, which quantifies not their structure but their randomness (and is approximated by the other two terms). Intuitively, C is the (time-averaged) amount of information about the past of the system which is relevant to predicting its future. For example, consider again the IID 40 Hz Bernoulli process of Figure 1A. With p = 0.04, this has an entropy of 0.24 bits/msec, but because it can be described by a single state, the complexity is zero. (That state emits either a "0" or a "1", with respective probabilities 0.96 and 0.04, but either way the state transitions back to itself.) In contrast, adding a 5 msec refractory period to the process means six states are needed to describe the spike trains (Figure 1B). The new structure of the refractory period is quantified by the higher complexity, C = 1.05 bits.

The second and third terms in Eq. (10) both describe randomness, but of distinct kinds. The second term, the internal entropy rate J, quantifies the randomness in the state transitions; it is the entropy of the next state given the current state.

J = H[S_{t+1} | S_t] = -E[log P(S_{t+1} | S_t)]   (12)

This is the average number of bits per time-step needed to describe the sequence of states the process moved through (beyond those given by C). The last term in Eq. (10) accounts for any residual randomness in the spikes which is not captured by the state transitions.

R = H[X_{t+1} | S_t, S_{t+1}] = -E[log P(X_{t+1} | S_t, S_{t+1})]   (13)

For long trains, the entropy of the spikes, H[X_1^n], is approximately the sum of these two terms, H[X_1^n] ≈ n (J + R).
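As a worked check on Eqs. (10)-(13), the sketch below computes C and J for the six-state refractory CSM of Figure 1B, entering its transition structure and stationary distribution by hand; for this machine R = 0, since the emitted symbol is fixed by the state transition. The numbers differ slightly from the empirical values quoted above, which were estimated from a finite simulated train.

```python
import numpy as np

def complexity_and_internal_entropy(pi, trans):
    """C = H[S_t] (Eq. 11) and J = H[S_{t+1}|S_t] (Eq. 12), in bits.
    pi[s]: stationary probability of state s.
    trans[s]: symbol -> (edge probability, next state)."""
    C = -sum(p * np.log2(p) for p in pi.values() if p > 0)
    J = -sum(pi[s] * pr * np.log2(pr)
             for s, edges in trans.items()
             for pr, _nxt in edges.values() if pr > 0)
    return C, J

p = 0.04                    # spiking probability in state A
pi = {'A': 1 / 1.2, **{s: p / 1.2 for s in 'BCDEF'}}   # stationary dist.
trans = {'A': {0: (1 - p, 'A'), 1: (p, 'B')},
         'B': {0: (1.0, 'C')}, 'C': {0: (1.0, 'D')},
         'D': {0: (1.0, 'E')}, 'E': {0: (1.0, 'F')},
         'F': {0: (1.0, 'A')}}
C, J = complexity_and_internal_entropy(pi, trans)
print(C, J)                 # ≈ 1.04 bits, ≈ 0.20 bits/msec
```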
Computationally, C represents the fixed generating structure of the process, which needs to be described once, at the beginning of the time series, and n(J + R) represents the growing list of details which pick out a particular time series from the ensemble which could be generated; this needs, on average, J + R extra bits per time-step. (Cf. the "sophistication" of Gács et al. (2001).)

Consider again the 40 Hz Bernoulli process. As there is only one state, the process always stays in that state. Thus the entropy of the next state J = 0. However, the state sequence yields no information about the emitted symbols (the process is IID), so the residual randomness R = 0.24 bits/msec — as it must be, since the total entropy rate is 0.24 bits/msec. In contrast, the states of the 5 msec refractory process are informative about the process's future. The internal entropy rate J = 0.20 bits/msec and the residual randomness R = 0. All of the randomness is in the state transitions, because they uniquely define the output spike train. The randomness in the state transitions is confined to state A, where the process "decides" whether it will stay in A, emitting no spike, or emit a spike and go to B. The decision needs, or gives, 0.24 bits of information. The transitions from B through F and back to A are fixed and contribute 0 bits, reducing the expected J. The important point is that the structure present in the refractory period makes the spike train less random, lowering its entropy. Averaged over time, the mean firing rate of the process is p = 0.0333. Were the spikes IID, the entropy rate would be 0.21 bits/msec, but in fact J + R = 0.20 bits/msec. This is because, in a minimal description of a long sequence X_{t_1} ... X_{t_N} = X_{t_1}^{t_N}, the generating process only needs to be described once (C), while the internal entropy rate and randomness need to be updated at each time step (n(J + R)). Simply put, a complex, structured spike train can be exactly described in fewer bits than one which is entirely random. The CSM lets us calculate this reduction in algorithmic information, and quantify the structure by means of the complexity.
D. Time-Varying Complexity and Entropies

The complexity and entropy are ensemble-averaged quantities. In the previous section the ensemble was the entire time series, and the averaged complexity and entropies were analogous to a mean firing rate. The time-varying complexity and entropies are also of interest, for example their variation after stimuli. A peri-stimulus time histogram (PSTH) shows how the firing probability varies with time; the same idea works for the complexity and entropy. Since the states form a Markov chain, and any one spike train stays within a single ergodic component, we can invoke the ergodic theorem (Gray, 1988), and (almost surely) assert that

Σ_{S_t, S_{t+1}} P(S_t, S_{t+1}, X_{t+1}) f(S_t, S_{t+1}, X_{t+1}) = lim_{N→∞} (1/N) Σ_{t=1}^{N} f(S_t, S_{t+1}, X_{t+1}) = lim_{N→∞} ⟨f(S_t, S_{t+1}, X_{t+1})⟩_N   (14)

for arbitrary integrable functions f(S_t, S_{t+1}, X_{t+1}). In the case of the mean firing rate, the function to time-average is l(t) ≡ X_{t+1}. For the time-averaged complexity, internal entropy and residual randomness, the functions (respectively c, j and r) are

c(t) = -log P(S_t)
j(t) = -log P(S_{t+1} | S_t)
r(t) = -log P(X_{t+1} | S_t, S_{t+1})   (15)

and the time-varying entropy is h(t) = j(t) + r(t). The PSTH averages over an ensemble of stimulus presentations, rather than time:

λ_PSTH(t) = (1/M) Σ_{i=1}^{M} l_i(t) = (1/M) Σ_{i=1}^{M} X_{t+1,i}   (16)

with M being the number of stimulus presentations, and t re-set to zero at each presentation. Analogously, the "PSTH" of the complexity is

C_PSTH(t) = (1/M) Σ_{i=1}^{M} c_i(t) = (1/M) Σ_{i=1}^{M} -log P(S_{t,i})   (17)

For the entropies, replace c with j, r or h as appropriate. Similar calculations can be made with any well-defined ensemble of reference times, not just stimulus presentations; we will also calculate c and the entropies as functions of the time since the latest spike. We can estimate the error of these time-dependent quantities as the standard error of the mean as a function of time, SE_t = s_t / √M, where s_t is the sample standard deviation in each time bin t and M is the number of trials. The probabilities appearing in the definitions of c(t), j(t), r(t) also have some estimation errors, either because of sampling noise or, more interestingly, because the ensemble is being distorted by outside influences. The latter creates a gap between their averages (over time or stimuli) and what the CSM predicts for those averages. In the next section, we explain how to use this to measure the influence of external drivers.
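Given state sequences aligned to stimulus onset, Eq. (17) and its standard error are straightforward to compute. In this sketch, `state_seqs[i][t]` is the state of trial i at time t (assumed already decoded by the CSM's recursive filter), and `pi` holds the CSM's state-occupation probabilities; the names are ours.

```python
import numpy as np

def complexity_psth(state_seqs, pi, t_max):
    """C_PSTH(t): trial-average of c(t) = -log2 P(S_t) (Eqs. 15, 17),
    with its standard error SE_t = s_t / sqrt(M)."""
    M = len(state_seqs)
    mean_c = np.zeros(t_max)
    se_c = np.zeros(t_max)
    for t in range(t_max):
        vals = np.array([-np.log2(pi[seq[t]]) for seq in state_seqs])
        mean_c[t] = vals.mean()
        se_c[t] = vals.std(ddof=1) / np.sqrt(M)
    return mean_c, se_c
```

The entropy analogues j(t) and r(t) follow by substituting the transition and emission probabilities of Eq. (15) for P(S_t).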
E. The Influence of External Forces

If we know that S_t = s, the CSM predicts that the firing probability is λ(t) = P(X_{t+1} = 1 | S_t = s). By means of the CSM's recursive filtering property (Appendix A), once a transient regime has passed, the state is always known with certainty. Thereafter, the CSM predicts what the firing probability should be at all times, incorporating the effects of the spike train's history. As we show in the next section, these predictions give good matches to the actual response function in simulations where the spiking probability depends only on the spike history. But real neurons' spiking rates generally also depend on external processes, e.g., stimuli. As currently formulated, the CSM is (or, rather, converges on) the optimal predictor of the future of the process given its own past. Such an "output only" model does not represent the (possible) effects of other processes, and so ignores external covariates and stimuli. Presently, determining the precise form of spike trains' responses to external forces is best left to parametric models. However, we can use output-only CSMs to learn something about the computation: the PSTH-calculated entropy rate H_PSTH(t) = J_PSTH(t) + R_PSTH(t) quantifies the extent to which external processes drive the neuron. (The PSTH subscript is henceforth suppressed.)

Suppose we know the true firing probability λ_true(t). At each time step, the CSM predicts the firing probability λ_CSM(t). If λ_CSM(t) = λ_true(t), then the CSM correctly describes the spiking and the PSTH entropy rate is

H_CSM(t) = -λ_CSM(t) log[λ_CSM(t)] - (1 - λ_CSM(t)) log[1 - λ_CSM(t)]   (18)

However, if λ_CSM(t) ≠ λ_true(t), then the CSM mis-describes the spiking, because it neglects the influence of external processes. Simply put, the CSM has no way of knowing when the stimuli happen. The PSTH entropy rate calculated using the CSM becomes

H_CSM(t) = -λ_true(t) log[λ_CSM(t)] - (1 - λ_true(t)) log[1 - λ_CSM(t)]   (19)

Solving for λ_true(t),

λ_true(t) = (H_CSM(t) + log[1 - λ_CSM(t)]) / (log[1 - λ_CSM(t)] - log[λ_CSM(t)])   (20)

The discrepancy between λ_CSM(t) and λ_true(t) indicates how much of the apparent randomness in the entropy rate is actually due to external driving. The true PSTH entropy rate H_true(t) is

H_true(t) = -λ_true(t) log[λ_true(t)] - (1 - λ_true(t)) log[1 - λ_true(t)]   (21)

The difference between H_CSM(t) and H_true(t) quantifies, in bits, the driving by external forces as a function of the time since stimulus presentation:

ΔH = H_CSM(t) - H_true(t) = λ_true(t) log[λ_true(t)/λ_CSM(t)] + (1 - λ_true(t)) log[(1 - λ_true(t))/(1 - λ_CSM(t))]   (22)

This stimulus-driven entropy ΔH is the relative entropy or Kullback-Leibler divergence D(X_true ‖ X_CSM) between the true distribution of symbol emissions and that predicted by the CSM. Information-theoretically, this relative entropy is the error in our prediction of the next state due to assuming the neuron is running autonomously when it is actually externally driven. Since every state corresponds to a distinct distribution over future behavior, this is our error in predicting the future due to ignorance of the stimulus. [Footnote 11: Cf. the informational coherence introduced by Klinkner et al. (2006) to measure information-sharing between neurons, by quantifying the error in predicting the distribution of the future of one neuron due to ignoring its coupling with another.]
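Equations (19)-(22) reduce to a few lines; this sketch assumes λ_CSM(t) is strictly between 0 and 1 (and, for the KL divergence, that λ_true(t) is as well), with logs base 2 so that all quantities are in bits.

```python
import numpy as np

def lambda_true_from_entropy(H_csm, lam_csm):
    """Invert Eq. (19) for the true firing probability, Eq. (20)."""
    H_csm, lam_csm = np.asarray(H_csm), np.asarray(lam_csm)
    return ((H_csm + np.log2(1 - lam_csm))
            / (np.log2(1 - lam_csm) - np.log2(lam_csm)))

def stimulus_driven_entropy(lam_true, lam_csm):
    """Delta H(t), Eq. (22): the KL divergence, in bits, between the
    true Bernoulli spiking distribution and the CSM's prediction."""
    lt, lc = np.asarray(lam_true), np.asarray(lam_csm)
    return (lt * np.log2(lt / lc)
            + (1 - lt) * np.log2((1 - lt) / (1 - lc)))
```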
III. RESULTS

We now present a few examples. (All of them use a time-step of 1 millisecond.) We begin with idealized model neurons to illustrate our technique. We recover CSMs for the model neurons using only the simulated spike trains as input to our algorithms. From the CSM we calculate the complexity, entropies, and, when appropriate, stimulus-driven entropy (the Kullback-Leibler divergence between the true and CSM-predicted firing probabilities) of each model neuron. We then analyze spikes recorded in vivo from a neuron in layer II/III of rat SI (barrel) cortex. We use spike trains recorded both with and without external stimulation of the rat's whiskers. See Andermann & Moore (2006) for experimental details.

A. Model neuron with a "soft" refractory period and bursting

We begin with a refractory, bursting model neuron, whose spiking rate depends only on the time since the last spike. The baseline rate is 40 Hz. Every spike is followed by a 2 msec "hard" refractory period, during which spikes never occur. The spiking rate then rebounds to twice its baseline, to which it slowly decays. (See the dashed line in the first panel of Figure 3B.) This history dependence mimics that of a bursting neuron, and is, intuitively, more complex than the simple refractory period of the model in Figure 1. Figure 2 shows the 17-state CSM reconstructed from a 200 second spike train (at 1 msec resolution) generated by this model. It has a complexity of C = 3.16 bits (higher than that of the model in Figure 1, as anticipated), an internal entropy rate of J = 0.25 bits/msec and a residual randomness of R = 0 bits/msec. The CSM was obtained with Λ = 17 (selected by BIC). Figure 3A shows how the 99% ISI bounds bootstrapped from the CSM enclose the empirical ISI distribution, with the exception of one short segment.

The CSM is easily interpreted. State A is the baseline state. When it emits a spike, the CSM moves to state B. There are then two deterministic transitions, to C and then D, which never emit spikes; this is the hard 2 msec refractory period. Once in D it is possible to spike again, and if that happens, the transition is back to state B. However, if no spike is emitted, the transition is to state E. This is repeated, with varying firing probabilities, as states E through Q are traversed. Eventually, the process returns to A and so to baseline. Figure 3B plots the firing rate, complexity, and internal entropies as functions of the time since the last spike, conditional on no subsequent spike emission. This lets us compare the firing rate predicted by the CSM (solid line with squares) to the specification of the model which generated the spike train (dashed line) and a PSTH calculated by triggering on the last spike (solid line). Except at 16 and 17 msec post-spike, the CSM-predicted firing rate agrees with both the generating model and the PSTH. The discrepancy arises because the CSM only discerns the structure in the data, and most of the ISIs are shorter than 16 msec. There is much closer agreement between the CSM and the PSTH if firing rates are plotted as a function of time since a spike without conditioning on no subsequent spike emission (not shown). The second and third panels of Figure 3 plot the time-dependent complexity and entropies. The complexity is much higher after the emission of a spike than during baseline, because the states traversed (B–Q) are less probable, and represent the additional structures of refractoriness and bursting. The time-dependent entropies (third panel) show that just after a spike, the refractory period imposes temporary determinism on the spike train, but burstiness increases the randomness before the dynamics return to the baseline state.

B. Model neuron under periodic stimulation

Figure 4 shows the CSM for a periodically-stimulated model neuron. This CSM was reconstructed from 200 seconds of spikes with a baseline firing rate of 40 Hz (p = 0.04). Each second, the firing rate rose over the course of 5 msec to p = 0.54 spikes/msec, falling slowly back to baseline over the next 50 msec. This mimics the periodic presentation of a strong external stimulus. (The exact inhomogeneous firing rate used was λ(t) = 0.93[e^{-t/10} - e^{-t/2}] + 0.04, with t in msec. See Figure 5B, first panel, dashed line.) In this model, the firing rate does not directly depend on the spike train's history, but there is a sort of history dependence in the stimulus time-course, and this is what CSSR discovers. BIC selected Λ = 7, giving a 16-state CSM with C = 0.89 bits, J = 0.27 bits/msec and R = 0.0007 bits/msec. The baseline is again state A, and if no spike is emitted then the process stays in A. Spikes are either spontaneous and random, or stimulus-driven. Because the stimulus is external, it is not immediately clear which of these two causes produced a given spike. Thus, if a spike is emitted, the CSM traverses states B through F, deciding, so to speak, whether or not the spike is due to a stimulus. If two spikes happen within 3 msec of each other, then the CSM decides that it is being stimulated and goes to one of states G, H or M.
States G through P represent the response to the stimulus. The CSM moves between these states until no spike is emitted for 3 msec, when it returns to the baseline, A. The ISI distribution from the CSM matches that from the model (Figure 5A). However, because the stimulus doesn't depend on the spike train's history, the CSM makes inaccurate predictions during stimulation. The first panel of Figure 5B plots the firing rate as a function of time since stimulus presentation, comparing the model (dashed line) and the PSTH (solid line) with the CSM's prediction (line with squares). The discrepancy between these is due to the CSM having no way of knowing that an external stimulus has been applied until several spikes in a row have been emitted (represented, as we just saw, by states B–F). [Footnote 12: In effect, this part of the CSM implements Bayes's rule, balancing the increased likelihood of a spike after a stimulus against the low a priori probability, or base rate, of stimulation.] Despite this, c(t) shows that something more complex than simple random firing is happening (second panel of Figure 5B), as do j(t) and r(t) (third panel). Further, something is clearly wrong with the entropy rate, because it should be upper-bounded by h = 1 bit/msec (when p = 0.5). The fact that h(t) exceeds this bound indicates an external force, not fully captured by the CSM, is at work. As discussed in Methods (§ II.E), drive from the stimulus can be quantified with a relative entropy (Figure 5C). Stimuli are presented at t = 1 msec, where ΔH(t) > 1 bit. It is not until ≈ 25 msec post-stimulus that ΔH(t) ≈ 0 and the CSM once again correctly describes the internal entropy rate. Thus, as expected, the stimulus strongly influences neuronal dynamics immediately after its presentation. The true internal entropy rate H_true(t) is slightly less than 1 bit/msec shortly after stimulation, when the true spiking rate has a maximum of p_max = 0.54. The fact that the CSM gives an inaccurate value for J actually lets us find the number of bits of information gain supplied by the stimulus, e.g., ΔH > 1 bit immediately after the stimulus is presented.

C. Spontaneously Spiking Barrel Cortex Neuron

We reconstructed a CSM from 90 seconds of spontaneous (no vibrissa deflection) spiking recorded from a layer II/III FSU barrel cortex neuron. CSSR, using Λ = 21, discovered a CSM with 315 states, a complexity of C = 1.78 bits, and an internal entropy rate of J = 0.013 bits/msec. After state culling (§ II.B.2), the reduced CSM, plotted in Figure 6, has 14 states, C = 1.02 bits, J = 0.10 bits/msec, and a residual randomness of R = 0.005 bits/msec. We focus on the reduced CSM from this point onwards. This CSM resembles that of the spontaneously-firing model neuron of § III.A and Fig. 2. The complexity and entropies are lower than those of our model neuron because the mean spike rate is much lower, and so simple descriptions suffice most of the time. (Barrel cortex neurons exhibit notoriously low spike rates, especially during anesthesia.) There is a baseline state A which emits a spike with probability p = 0.01, i.e., 10 Hz. When a spike is emitted, the CSM moves to state B and then on through the chain of states C through N, returning to A if no further spike is emitted. However, the CSM can emit a second or even third spike after the first, and indeed this neuron displays spike doublets and triplets. In general, emitting a spike moves the CSM to B, with some exceptions that show the structure to be more intricate than the model neuron's.
Figure 7A shows the CSM's 99% confidence bounds almost completely enclosing the empirical ISI distribution. The first panel of Figure 7B plots the history-dependent firing probability predicted by the CSM as a function of the time since the latest spike, according to both the PSTH and the CSM's prediction. They are highly similar in the first 13 msec post-spike, indicating that the CSM gets the spiking statistics right in this epoch. The CSM and PSTH then diverge, for two reasons. First, as with the model neuron, there are few ISIs of this length. Most of the ISIs are either shorter, due to the neuron's burstiness, or much longer, due to the low baseline firing rate. Secondly, 90 seconds is not very much data. We show in Figure 10 that a CSM reconstructed from a longer spike train does capture all of the structure. We present the results of this shorter spike train to emphasize that, as a non-parametric method, CSSR only uncovers the statistical structure in the data, no more, no less. Finally, the second and third panels of Figure 6B show, respectively, the complexity and entropies as functions of the time since the latest spike. As with the model of § III.A, the structure in the process occurs after spiking, during the refractory and bursting periods. This is when the complexity is largest, and also when the entropies vary most.

D. Periodically Stimulated Barrel Cortex Neuron

We reconstructed CSMs from 335 seconds of spike trains taken from the same neuron used above, but recorded while it was being periodically stimulated by vibrissa deflection. BIC selected Λ = 25, giving the 29-state CSM shown in Figure 8. (Before state culling, the original CSM had 1916 states, C = 2.55 bits and J = 0.11 bits/msec.) The reduced CSM has a complexity of C = 1.97 bits, an internal entropy rate of J = 0.10 bits/msec, and a residual randomness of R = 0.005 bits/msec. Note that C is higher when the neuron is being stimulated than when it is spontaneously firing, indicating more structure in the spike train. While at first the CSM may seem only to represent history-dependent refractoriness and bursting, ignoring the external stimulus, this is not quite true. Once again, there is a baseline state A, and most of the other states (B–X) comprise a refractory/bursting chain, like the one this neuron has during spontaneous firing. However, the transition upon A emitting a spike is not back to B and then down the chain again, but to either state C1, and subsequently C2, or, more often, to state ZZ. These three states represent the structure induced by the external stimulus, as we saw with the model stimulated neuron of § III.B and Figure 4. (The state ZZ is comparable to the state M of the model stimulated neuron: both loop back to themselves if they emit a spike.) Three states are enough because, in this experiment, barrel cortex neurons spike extremely sparsely, 0.1–0.2 spikes per stimulus presentation. Figure 9A plots the ISI distribution, nicely enclosed by the bootstrapped confidence bounds. Figure 9B shows the firing rate, complexity and entropies as functions of the time since stimulus presentation (averaged over all presentations). These plots look much like those in Figure 7B.
However, there is a clear indication that something more complex takes place after stimulation: the CSM's firing-rate predictions are wrong. The stimulus-driven entropy ΔH turns out to be as large as 0.02 bits within 5–15 msec post-stimulus. This agrees with the known ≈ 5–10 msec stimulus propagation time between vibrissae and barrel cortex (Andermann & Moore, 2006). The reason that ΔH is so much smaller for the real neuron than for the stimulated model neuron of § III.B is that the former's firing rate is much lower. Although the firing rate post-stimulus can be almost twice as large as the CSM's prediction, the actual rate is still low, max λ(t) ≈ 0.04 spikes/msec. Most of the time the neuron does not spike, even when stimulated, so on average the stimulus provides little information per presentation. For completeness, Figure 10 shows the spike probability, complexity and entropies as functions of the time since the latest spike. Averaged over this ensemble, the CSM's predictions are highly accurate.

IV. DISCUSSION

The goal of this paper was to present methods for determining the structural content of spike trains while making minimal a priori assumptions as to the form which that structure takes. We use the CSSR algorithm to build minimal, optimally predictive hidden Markov models (CSMs) from spike trains, Schwarz's Bayesian Information Criterion to find the optimal history length Λ for the CSSR algorithm, and bootstrapped confidence bounds on the ISI distribution from the CSM to check goodness-of-fit. We demonstrated how CSMs can estimate a spike train's complexity, thus quantifying its structure, and its mean algorithmic information content, quantifying the minimal computation necessary to generate the spike train. Finally, we showed how to quantify, in bits, the influence of external stimuli upon the spike-generating process. We applied these methods both to simulated spike trains, for which the resulting CSMs agreed with intuition, and to real spike trains recorded from a layer II/III rat barrel cortex neuron, demonstrating increased structure, as measured by the complexity, when the neuron was being stimulated.

We are unaware of any other practical techniques for quantifying the complexity and computational structure of a spike train as we define them. Intuitively, neither random (Poisson) nor highly ordered (e.g., strictly periodic, as in Olufsen et al. (2003)) spike trains should be thought of as complex, since they do not possess structure requiring a sophisticated program to generate. Instead, complexity lies between order and disorder (Badii & Politi, 1997), in the non-random variation of the spikes. Higher complexity means a greater degree of organization in neural activity than would be implied by random spiking. It is the reconstruction of the CSM through CSSR which allows us to calculate the complexity. Our definition of complexity stands in stark contrast to other complexity measures which assign high values to highly disordered systems. Some of these, such as Lempel-Ziv complexity (Amigo et al., 2002, 2004; Jimenez-Montano et al., 2002; Szczepanski et al., 2004) and context-free grammar complexity (Rapp et al., 1994), have been applied to spike trains. However, both of these are measures of the amount of information required to reproduce the spike train exactly, and take on very high values for completely random sequences.
We are unaware of any other practical techniques for quantifying the complexity and computational structure of a spike train as we define them. Intuitively, neither random (Poisson) nor highly ordered (e.g., strictly periodic, as in Olufsen et al. (2003)) spike trains should be thought of as complex, since neither possesses structure requiring a sophisticated program to generate. Instead, complexity lies between order and disorder (Badii & Politi, 1997), in the non-random variation of the spikes. Higher complexity means a greater degree of organization in neural activity than would be implied by random spiking. It is the reconstruction of the CSM through CSSR which allows us to calculate the complexity. Our definition of complexity stands in stark contrast to other complexity measures, which assign high values to highly disordered systems. Some of these, such as Lempel-Ziv complexity (Amigo et al., 2002, 2004; Jimenez-Montano et al., 2002; Szczepanski et al., 2004) and context-free grammar complexity (Rapp et al., 1994), have been applied to spike trains. However, both of these are measures of the amount of information required to reproduce the spike train exactly, and take on very high values for completely random sequences. These "complexity" measures are therefore much more similar to the total algorithmic information content, and even to the entropy rate, than to our sort of complexity. Our measure of complexity is the entropy of the distribution of causal states. This has the desired property of being maximized for structured, rather than ordered or disordered, systems, because the causal states are defined statistically, as equivalence classes of histories conditioned on future events. Other researchers have also calculated complexity measures which are entropies of state distributions, but have defined their states differently. Amigo et al. (2002) use the observables (symbol strings) present in the spike train to define a k-th order Markov process, and call each individual length-k string which appears in the spike train a state. Gorse and Taylor (1990) similarly use single suffix symbol strings to define the states of a Markov process. In both cases, IID Bernoulli sequences could exhibit up to 2^k states (in long enough sequences), and so possess an extremely high "complexity". However, all of these states make the same prediction for the future of the process. The minimal representation is a single causal state, a CSM with a complexity of zero.
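As a worked illustration of this definition (our own sketch, not the authors' code): the complexity C is simply the Shannon entropy, in bits, of the empirical occupation frequencies of the causal states, e.g. of a state sequence obtained by the recursive filtering of Appendix A.

import numpy as np
from collections import Counter

def statistical_complexity(state_sequence):
    # C = entropy (in bits) of the empirical distribution over causal states.
    counts = Counter(state_sequence)
    p = np.array(list(counts.values()), dtype=float)
    p /= p.sum()
    return float(-(p * np.log2(p)).sum())

# An IID Bernoulli train reduces to a single causal state, so C = 0 however
# high its entropy rate; a chain of N states occupied non-uniformly has
# 0 < C < log2(N).  E.g., two states occupied 90%/10% of the time:
print(statistical_complexity(["A"] * 9_000 + ["B"] * 1_000))   # ~0.469 bits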
It should be noted that there are also many works which model spike trains using HMMs, but in which the hidden states represent macro-states of the system (awake/asleep, Up/Down, etc.), and spiking rates are modeled separately in each macro-state (Abeles et al., 1995; Achtman et al., 2007; Chen et al., 2008; Danoczy & Hahnloser, 2005; Jones et al., 2007). Although the graphical representations of such HMMs may look like those of CSMs, the two kinds of states have very different meanings. Finally, there are also state-space methods which model the dynamical state of the system as a continuous hidden variable, the best known of which is the linear Gaussian model with Kalman filtering. These have been extensively applied to neural encoding and decoding problems (Eden et al., 2007; Smith et al., 2004; Srinivasan et al., 2007). Interestingly, for a univariate Gaussian ARMA model in state-space form, the Kalman filter's one-step-ahead prediction and mean-squared prediction error are, jointly, minimal-sufficient for next-step prediction, and since they can be updated recursively they in fact constitute the minimal sufficient statistic, and hence the causal state, in this special case.

Neurons are driven by their afferent synapses. Although, as discussed in Appendix C, there is a parallel "transducer" formalism for generating CSMs which takes external influences into account, it is not yet computationally implemented, and our current approach reconstructs CSMs from the spike train alone. Since the history of the neuron under study is typically connected with the history of the network in which it is located, this CSM will, in general, reflect more than the neuron's internal biophysical properties. Nonetheless, in both our model neurons and in the real barrel cortex neuron, states not interpretable as simple refractoriness or bursting appeared when a stimulus was present, proving that we can detect stimulus-driven complexity. Further, we showed that the CSM can be used to determine the extent (in bits) to which a neuron is driven by external stimuli.

The methods presented here complement more established modes of spike-train analysis, which have different goals. Parametric methods, such as PSTHs or maximum likelihood estimation (Brown et al., 2004; Truccolo et al., 2005), generally focus on determining a neuron's firing rate (mean, instantaneous, or history-dependent), and on how known external covariates modulate that rate. They have the advantage of requiring less data than non-parametric methods such as CSSR, but the disadvantage, for our purposes, of imposing the structure of the model at the outset. When the experimenter wants to know how a neuron encodes a particular aspect of a covariate, e.g., how neurons in the sensory periphery or primary sensory cortices encode stimuli, parametric methods have proved highly illuminating. However, in many cases the identity, or even the existence, of relevant external covariates is uncertain. For example, one could envision using CSMs to analyze recordings in pre-frontal cortex during different cognitive tasks, or perhaps to compare spiking structure across different attentional states. In both cases the relevant external covariates are not at all clear, but CSMs could still be used to quantify changes in computational structure, for single neurons or for groups of them. For neural populations, one can envision generating distributions (over the population) of complexities and examining how these distributions change in different cortical macro-states. This would be entirely analogous to analyzing distributions of firing rates or tuning curves. In addition to calculations of the complexity, the whole array of mutual-information analyses can be applied to CSMs: instead of calculating the mutual information between the spikes and the covariates (which could include other spike trains), one can calculate the mutual information between the covariates and the causal states. The advantage is that the causal states represent the behavioral patterns of the spike-generating process, and so are closer to the actual state of the system than the spikes (the output observables) are themselves. Results on calculating the mutual information between the causal states of different neurons (informational coherence) in a large simulated network show that synchronous neuronal dynamics are revealed more effectively than when the calculation is done directly on the spikes (Klinkner et al., 2006).
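A toy version of this state-covariate calculation, in plug-in form, is sketched below (ours, not the authors' code; a real analysis would also need bias corrections for limited data).

import numpy as np
from collections import Counter

def mutual_information_bits(states, covariate):
    # Plug-in estimate of I(S;Y) in bits between a filtered causal-state
    # sequence and a discrete covariate on the same 1 msec time base,
    # e.g. a binary stimulus-on indicator or another neuron's states.
    n = len(states)
    joint = Counter(zip(states, covariate))
    ps, py = Counter(states), Counter(covariate)
    return sum((c / n) * np.log2(n * c / (ps[s] * py[y]))
               for (s, y), c in joint.items())

Applied to the causal-state sequences of two different neurons, the same computation yields the informational coherence just mentioned.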
In closing, our methods provide a way to understand structure in spike trains, and should be considered complements to traditional analysis methods. We rigorously define structure, and show how to discover it from the data itself. Our methods go beyond those which seek to describe the observed variation in the spiking rates, by also describing the underlying computational process (in the form of a CSM) needed to generate that variation. A CSM can show not only that the spike rate has changed, but also exactly how it has changed.

Acknowledgments

The authors thank Mark Andermann and Christopher Moore for the use of their data. RH thanks Emery Brown, Anna Dreyer and Christopher Moore for valuable discussions. CRS thanks Anthony Brockwell, Dave Feldman, Chris Genovese, Rob Kass and Alessandro Rinaldo for valuable discussions.

APPENDIX A: Filtering with CSMs

A common difficulty with hidden Markov models is that predictions can be made only from a knowledge of the state, which must itself be guessed at from the time series, since it is, after all, hidden. This creates the state-estimation or filtering problem. Under strong assumptions (linear Gaussian stochastic dynamics, linearly observed through IID additive Gaussian noise), the Kalman filter is an optimal yet tractable solution. For non-linear processes, however, optimal filtering essentially amounts to maintaining a posterior distribution over the states and updating it via Bayes's rule (Ahmed, 1998). (This distribution is sometimes called the process's "information state".) One convenient and important feature of CSMs is that this whole machinery of filtering is unnecessary, because of their recursive-updating property. Given the state at time t, S_t, and the observation at time t+1, X_{t+1}, the state at time t+1 is fixed: S_{t+1} = T(S_t, X_{t+1}) for some transition function T. Clearly, if the state is known with certainty at any time, it remains known thereafter. Moreover, the same recursive-updating property allows us to show that the state does become certain, i.e., that after some finite (but possibly random) time τ, P(S_τ = s | X_1^τ) is either 0 or 1 for every state s. For Markov chains of order k, clearly τ ≤ k; under more general circumstances P(τ ≥ t) goes to zero exponentially or faster. Thus, after a transient period, the state is completely unambiguous. This is useful to us in multiple places, including understanding the computational structure of the process and predicting the firing rate of the neuron. It also leads to considerable numerical simplifications, compared to approaches which demand conventional filtering. Further, recursive filtering is easily applied to a new spike train, not merely the one from which the CSM was reconstructed. This helps in cross-validating CSMs, as discussed in the next appendix.

APPENDIX B: Cross-Validation

It is often desirable to cross-validate a statistical model by splitting one's data set in two, using one part (generally the larger) as a training set for the model, and the other part to validate the model by some statistical test. In the case of CSMs it is particularly important to check the validity of the BIC used to select the control setting Λ. One possible test is the ISI bootstrapping of §II.B.3. A second, somewhat stronger, goodness-of-fit test is based on the time-rescaling theorem of Brown et al. (2002). This test rescales each interspike interval by the integrated history-dependent spiking rate over the ISI:

τ_k = 1 − exp( −∫_{t_k}^{t_{k+1}} λ(t) dt )        (B1)

where the {t_k} are the spike times and λ(t) is the history-dependent spiking rate from the CSM. If the CSM describes the data well, the rescaled ISIs {τ_k} should follow a uniform distribution. This can be tested either with a Kolmogorov-Smirnov test, or by plotting the empirical CDF of the rescaled times against the CDF of the uniform distribution (a Kolmogorov-Smirnov or "KS" plot) (Brown et al., 2002).
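To make both the recursive updating of Appendix A and the rescaling of Eq. (B1) concrete, here is a minimal, self-contained sketch. The two-state CSM, its probabilities, and all function names are our own illustrative assumptions, not a fitted model or the authors' code.

import numpy as np
from scipy import stats

# Toy two-state CSM (illustrative only): baseline state A spikes with
# probability 0.04/msec; a spike enters a one-bin refractory state B.
EMIT_P = {"A": 0.04, "B": 0.0}              # P(spike in next bin | state)
T = {("A", 0): "A", ("A", 1): "B",          # S_{t+1} = T(S_t, X_{t+1})
     ("B", 0): "A", ("B", 1): "B"}          # (B,1) never occurs; kept for totality

def filter_states(spikes, s0="A"):
    # Appendix A: recursive state updating. Once the state is known it stays
    # known, so no posterior over hidden states is ever maintained.
    states, s = [], s0
    for x in spikes:
        s = T[(s, x)]
        states.append(s)                    # states[t] = S_t after seeing x_t
    return states

def time_rescaling_ks(spikes):
    # Appendix B: rescale each ISI by the integrated CSM rate (Eq. B1)
    # and compare the rescaled ISIs against Uniform(0,1).
    states = filter_states(spikes)
    # per-bin integrated rate; -log(1-p) is approximately p for small p
    lam = np.array([-np.log1p(-EMIT_P[s]) for s in states])
    spike_times = np.flatnonzero(np.asarray(spikes))
    taus = [1.0 - np.exp(-lam[a:b].sum())   # lam[t] covers the bin at t+1
            for a, b in zip(spike_times[:-1], spike_times[1:])]
    return stats.kstest(taus, "uniform")

# Generate spikes from the same toy CSM, then check the fit; here the only
# KS deviations come from the 1 msec discretization (cf. Fig. 11).
rng = np.random.default_rng(1)
spikes, s = [], "A"
for _ in range(200_000):
    x = int(rng.random() < EMIT_P[s])
    spikes.append(x)
    s = T[(s, x)]
print(time_rescaling_ks(spikes))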
Figure 11 gives cross-validation results for the rat barrel cortex neuron, during both spontaneous firing and periodic vibrissa deflection. The 90 seconds of spontaneous spiking were split into a 75 second training set and a 15 second validation set; the 335 seconds of stimulus-evoked firing were split into a 270 second training set and a 65 second validation set. Panels A and B show the ISI bootstrapping results for the spontaneous and stimulus-evoked firing, respectively. The dashed lines are 99% confidence bounds from a CSM reconstructed from the training set, and the solid line is the ISI distribution of the validation set. The ISI distribution largely falls within these bounds for both the spontaneous and the stimulus-evoked data. Panels C–F display the time-rescaling test. Panels C and D show the time-rescaling plots for the spontaneous and stimulus-evoked training data, respectively. The dashed lines are 95% confidence bounds. The spontaneous KS plot largely falls within the bounds. The stimulus-evoked plot does not, but this is expected because, as discussed, the CSM does not completely capture the imposition of the external stimulus. (The jagged "steps" in both plots result from the 1 msec temporal discretization.) Panels E and F show the time-rescaling plots for, respectively, the spontaneous and stimulus-evoked validation data. The fits here are somewhat worse. In the stimulated case this is not surprising. In the spontaneous case the cause is likely non-stationarity in the data, a problem shared with other spike-train analysis techniques, such as the Generalized Linear Model approaches described in the next appendix. It should be emphasized that the point of reconstructing CSMs is not to obtain perfect fits to the data, but to estimate the structure inherent in the spike train, and the cross-validation results should be viewed in this light.

APPENDIX C: Causal State Transducers and Predictive State Representations

Mathematically, CSMs can be expanded to include the influence of external stimuli on the process, yielding causal state transducers, which are optimal representations of the history-dependent mapping from inputs to outputs (Shalizi, 2001, ch. 7). Such causal state transducers are a type of partially-observable Markov decision process, closely related to predictive state representations (PSRs) (Littman et al., 2002). In both formalisms, the right notion of "state" is a statistic, a measurable function of the observable past of the process. Causal states represent this through an equivalence relation on the space of observable histories. For PSRs, the representation is through "tests", i.e., a distinguished set of input/output sequence pairs; the idea is that states can be uniquely characterized by their probabilities of producing the output sequences conditional on the input sequences. An algorithm for reconstructing causal state transducers would begin by estimating the probability distributions of futures conditioned on both the history of the spikes and the history of an external covariate Y, e.g. P(X_{t+1}^∞ | X_{−∞}^t, Y_{−∞}^t), and would otherwise be entirely parallel to CSSR. This has not yet been implemented.
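Although that algorithm does not yet exist, its starting point is easy to picture. The sketch below (ours, with hypothetical names) merely tabulates the empirical conditional distributions of the next spike symbol given joint finite histories of spikes and stimulus; the CSSR-style splitting of these histories into transducer states is the unimplemented part.

from collections import Counter, defaultdict

def next_symbol_distributions(spikes, stimulus, lam=3):
    # Tabulate P(X_t | spike history X_{t-lam}^{t-1}, stimulus history
    # Y_{t-lam}^{t-1}) from aligned binary sequences on 1 msec bins.
    table = defaultdict(Counter)
    for t in range(lam, len(spikes)):
        hist = (tuple(spikes[t - lam:t]), tuple(stimulus[t - lam:t]))
        table[hist][spikes[t]] += 1
    # Normalize the counts into conditional probability estimates.
    return {h: {x: c / sum(cnt.values()) for x, c in cnt.items()}
            for h, cnt in table.items()}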
References

Abeles, M., Bergman, H., Gat, I., Meilijson, I., Seidemann, E., Tishby, N., & Vaadia, E. (1995). Cortical activity flips among quasi-stationary states. Proc. Natl. Acad. Sci. USA, 92, 8616–8620.
Achtman, N., Afshar, A., Santhanam, G., Yu, B. M., Ryu, S. I., & Shenoy, K. V. (2007). Free-paced high-performance brain-computer interfaces. Journal of Neural Engineering, 4, 336–347.
Ahmed, N. U. (1998). Linear and Nonlinear Filtering for Scientists and Engineers. Singapore: World Scientific.
Amigo, J. M., Szczepanski, J., Wajnryb, E., & Sanchez-Vives, M. V. (2002). On the number of states of the neuronal sources. Biosystems, 68, 57–66.
Amigo, J. M., Szczepanski, J., Wajnryb, E., & Sanchez-Vives, M. V. (2004). Estimating the entropy rate of spike trains via Lempel-Ziv complexity. Neural Computation, 16, 717–736.
Andermann, M. L. & Moore, C. I. (2006). A sub-columnar direction map in rat barrel cortex. Nature Neuroscience, 9, 543–551.
Badii, R. & Politi, A. (1997). Complexity: Hierarchical Structures and Scaling in Physics. Cambridge, England: Cambridge University Press.
Bernardo, J. M. & Smith, A. F. M. (1994). Bayesian Theory. New York: Wiley.
Brown, E. N., Barbieri, R., Ventura, V., Kass, R. E., & Frank, L. M. (2002). The time-rescaling theorem and its application to neural spike train data analysis. Neural Computation, 14, 325–346.
Brown, E. N., Kass, R. E., & Mitra, P. P. (2004). Multiple neural spike train data analysis: State-of-the-art and future challenges. Nature Neuroscience, 7, 456–461.
Chen, Z., Vijayan, S., Barbieri, R., Wilson, M. A., & Brown, E. N. (2008). Discrete- and continuous-time probabilistic models and inference algorithms for neuronal decoding of Up and Down states. In review at Neural Computation.
Cover, T. M. & Thomas, J. A. (1991). Elements of Information Theory. New York: Wiley.
Crutchfield, J. P. & Young, K. (1989). Inferring statistical complexity. Physical Review Letters, 63, 105–108.
Csiszár, I. & Talata, Z. (2006). Context tree estimation for not necessarily finite memory processes, via BIC and MDL. IEEE Transactions on Information Theory, 52, 1007–1016.
Danoczy, M. G. & Hahnloser, R. H. R. (2005). Efficient estimation of hidden state dynamics. In Advances in Neural Information Processing Systems (NIPS 2005). Cambridge, Massachusetts: MIT Press.
Eden, U. T., Frank, L. M., Barbieri, R., Solo, V., & Brown, E. N. (2004). Dynamic analysis of neural encoding by point process adaptive filtering. Neural Computation, 16, 971–998.
Gács, P., Tromp, J. T., & Vitanyi, P. M. B. (2001). Algorithmic statistics. IEEE Transactions on Information Theory, 47, 2443–2463.
Gorse, D. & Taylor, J. G. (1990). A general model of stochastic neural processing. Biological Cybernetics, 63, 299–306.
Grassberger, P. (1986). Toward a quantitative theory of self-generated complexity. International Journal of Theoretical Physics, 25, 907–938.
Gray, R. M. (1988). Probability, Random Processes, and Ergodic Properties. New York: Springer-Verlag.
Jaeger, H. (2000). Observable operator models for discrete stochastic time series. Neural Computation, 12, 1371–1398.
Jimenez-Montano, M. A., Ebeling, W., Pohl, T., & Rapp, P. E. (2002). Entropy and complexity of finite sequences and fluctuating quantities. Biosystems, 64, 23–32.
Jones, L. M., Fontanini, A., Sadacca, B. F., & Katz, D. B. (2007). Natural stimuli evoke dynamic sequences of states in sensory cortical ensembles. Proc. Natl. Acad. Sci. USA, 104, 18772–18777.
Kennel, M. B. & Mees, A. I. (2002). Context-tree modeling of observed symbolic dynamics. Physical Review E, 66, 056209.
Klinkner, K. L. & Shalizi, C. R. (2009). CSSR: A nonparametric algorithm for predicting and classifying time series. Manuscript in preparation.
Klinkner, K. L., Shalizi, C. R., & Camperi, M. F. (2006). Measuring shared information and coordinated activity in neuronal networks. In Weiss, Y., Schölkopf, B., & Platt, J. C. (Eds.), Advances in Neural Information Processing Systems 18 (NIPS 2005) (pp. 667–674). Cambridge, Massachusetts: MIT Press.
Knight, F. B. (1975). A predictive view of continuous time processes. Annals of Probability, 3, 573–596.
Littman, M. L., Sutton, R. S., & Singh, S. (2002). Predictive representations of state. In Dietterich, T. G., Becker, S., & Ghahramani, Z. (Eds.), Advances in Neural Information Processing Systems 14 (NIPS 2001) (pp. 1555–1561). Cambridge, Massachusetts: MIT Press.
Löhr, W. & Ay, N. (2009). On the generative nature of prediction. Advances in Complex Systems, forthcoming.
Marton, K. & Shields, P. C. (1994). Entropy and the consistent estimation of joint distributions. Annals of Probability, 22, 960–977. Correction, Annals of Probability, 24 (1996), 541–545.
McCulloch, W. S. & Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics, 5, 115–133.
Olufsen, M. S., Whittington, M. A., Camperi, M., & Kopell, N. (2003). New roles for the gamma rhythm: Population tuning and processing for the beta rhythm. Journal of Computational Neuroscience, 14, 33–54.
Rapp, P. E., Zimmerman, I. D., Vining, E. P., Cohen, N., Albano, A. M., & Jimenez-Montano, M. A. (1994). The algorithmic complexity of neural spike trains increases during focal seizures. Journal of Neuroscience, 14, 4731–4739.
Rieke, F., Warland, D., de Ruyter van Steveninck, R., & Bialek, W. (1997). Spikes: Exploring the Neural Code. Cambridge, Massachusetts: MIT Press.
Shalizi, C. R. (2001). Causal Architecture, Complexity and Self-Organization in Time Series and Cellular Automata. PhD thesis, University of Wisconsin-Madison.
Shalizi, C. R. & Crutchfield, J. P. (2001). Computational mechanics: Pattern and prediction, structure and simplicity. Journal of Statistical Physics, 104, 817–879.
Shalizi, C. R. & Klinkner, K. L. (2004). Blind construction of optimal nonlinear recursive predictors for discrete sequences. In Chickering, M. & Halpern, J. Y. (Eds.), Uncertainty in Artificial Intelligence: Proceedings of the Twentieth Conference (UAI 2004) (pp. 504–511). Arlington, Virginia: AUAI Press.
Shalizi, C. R., Klinkner, K. L., & Haslinger, R. (2004). Quantifying self-organization with optimal predictors. Physical Review Letters, 93, 118701.
Shalizi, C. R., Rinaldo, A., & Klinkner, K. L. (2009). Adaptive nonparametric prediction and bootstrapping of discrete time series. Manuscript in preparation.
Singh, S., Littman, M. L., Jong, N. K., Pardoe, D., & Stone, P. (2003). Learning predictive state representations. In Fawcett, T. & Mishra, N. (Eds.), Proceedings of the Twentieth International Conference on Machine Learning (ICML-2003) (pp. 712–719). AAAI Press.
Smith, A. C., Frank, L. M., Wirth, S., Yanike, M., Hu, D., Kubota, Y., Graybiel, A. M., Suzuki, W. A., & Brown, E. N. (2004). Dynamic analysis of learning in behavioral experiments. Journal of Neuroscience, 24, 447–461.
Srinivasan, L., Eden, U. T., Mitter, S. K., & Brown, E. N. (2007). General-purpose filter design for neural prosthetic devices. Journal of Neurophysiology, 98, 2456–2475.
Szczepanski, J., Amigo, J. M., Wajnryb, E., & Sanchez-Vives, M. V. (2004). Characterizing spike trains with Lempel-Ziv complexity. Neurocomputing, 58, 79–84.
Truccolo, W., Eden, U. T., Fellows, M. R., Donoghue, J. P., & Brown, E. N. (2005). A point process framework for relating neural spiking activity to spiking history, neural ensemble and covariate effects. Journal of Neurophysiology, 93, 1074–1089.

[Figures 1–11 follow; only their captions are reproduced here, as the state-transition diagrams and plots do not survive in text form.]

FIG. 1 Two simple CSMs reconstructed from 200 sec of simulated spikes using CSSR. States are represented as the nodes of a directed graph. The transitions between states are labeled with the symbol emitted during the transition (1 = spike, 0 = no spike) and the probability of the transition given the origin state. (A) The CSM for a 40 Hz Bernoulli spiking process consists of a single state A which always transitions back to itself, emitting a spike with probability p = 0.04 per msec. (B) CSM for a 40 Hz Bernoulli spiking process with a 5 msec refractory period imposed after each spike. State A again spikes with probability p = 0.04. Upon spiking the CSM transitions through a deterministic chain of states B–F (squares) which represent the refractory period. The increased structure of the refractory period requires a more complex representation.

FIG. 2 CSM reconstructed from a 200 sec simulated spike train with a "soft" refractory/bursting structure. C = 3.16, J = 0.25, R = 0. State A (circle) is the baseline 40 Hz spiking state. Upon emitting a spike the transition is to state B. States B and C (squares) are "hard" refractory states from which no spike may be emitted. States D through Q (hexagons) comprise a refractory/bursting chain from which, if a spike is emitted, the transition is back to state B. Upon exiting the chain the CSM returns to the baseline state A.

FIG. 3 "Soft" refractory and bursting model ISI distribution and time-dependent firing probability, complexity and entropies. (A) ISI distribution and 99% confidence bounds bootstrapped from the CSM. (B) First panel: Firing probability as a function of time since the most recent spike. Line with squares = firing probability predicted by the CSM. Solid line = firing probability deduced from the PSTH. Dashed line = model firing rate used to generate spikes. Second panel: Complexity as a function of time since the most recent spike. Third panel: Entropies as a function of time since the most recent spike. Squares = internal entropy rate, circles = residual randomness, solid line = entropy rate (overlaps squares).

FIG. 4 16-state CSM reconstructed from 200 sec of simulation of periodically-stimulated spiking. C = 0.89, J = 0.27, R = 0.0007. State A is the baseline state. States B through F (octagons) are "decision" states in which the CSM evaluates whether a spike indicates a stimulus or was spontaneous. Two spikes within 3 msec cause the CSM to transition to states G through P, which represent the structure imposed by the stimulus. If no spikes are emitted within 5 (often fewer) sequential msec, the CSM goes back to the baseline state A.

FIG. 5 Stimulus model ISI distribution and time-dependent complexity and entropies. (A) ISI distribution and 99% confidence bounds. (B) First panel: Firing probability as a function of time since stimulus presentation. Second panel: Time-dependent complexity. Third panel: Time-dependent entropies. (C) The stimulus-driven entropy is > 1 bit, indicating strong external drive. See text for discussion.

FIG. 6 14-state CSM reconstructed from 90 sec of spiking recorded from a spontaneously spiking (no stimulus) neuron located in layer II/III of rat barrel cortex. C = 1.02, J = 0.10, R = 0.005. State A (circle) is baseline 10 Hz spiking. States B through N comprise a refractory/bursting chain similar to, but with a somewhat more intricate structure than, that of the model neuron in Figure 2.

FIG. 7 Spontaneously spiking barrel cortex neuron. (A) ISI distribution and 99% bootstrapped confidence bounds. (B) First panel: Firing probability as a function of time since the most recent spike. See text for explanation of the discrepancy between the CSM and PSTH spike probabilities. Second panel: Complexity as a function of time since the most recent spike. Third panel: Entropy rates as a function of time since the most recent spike.

FIG. 8 29-state CSM reconstructed from 335 seconds of spikes recorded from a layer II/III barrel cortex neuron undergoing periodic (125 msec inter-stimulus interval) stimulation via vibrissa deflection. C = 1.97, J = 0.11, R = 0.004. Most of the states are devoted to refractory/bursting behavior; however, states C1, C2 and ZZ represent the structure imposed by the external stimulus. See text for discussion.

FIG. 9 Stimulated barrel cortex neuron ISI distribution and time-dependent complexity and entropies. (A) ISI distribution and 99% confidence bounds. (B) First panel: Firing probability as a function of time since stimulus presentation. Second panel: Time-dependent complexity. Third panel: Time-dependent entropies. (C) The stimulus-driven entropy (maximum of 0.02 bits/msec) is low because the number of spikes per stimulus (≈ 0.1–0.2) is very low, and hence the stimulus does not supply much information.

FIG. 10 Firing probability, complexity and entropies of the stimulated barrel cortex neuron as a function of time since the most recent spike.

FIG. 11 Cross-validation of CSMs reconstructed from spontaneously firing and stimulus-evoked rat barrel cortex data on an independent validation set. (A,B) ISI distributions of the spontaneous and stimulus-evoked validation sets and 99% confidence bounds bootstrapped from the CSM. (C,D) Time-rescaling plots of the training data sets for spontaneous and stimulus-evoked firing, respectively. Dashed lines are 95% confidence bounds and the solid line is the rescaled ISIs. The line along the diagonal is for visual comparison to an ideal fit. (E,F) Similar time-rescaling plots for the validation data sets.
