Information topology identifies emergent model classes
We develop a language for describing the relationship among observations, mathematical models, and the underlying principles from which they are derived. Using Information Geometry, we consider geometric properties of statistical models for different…
Authors: Mark K. Transtrum, Gus Hart, Peng Qiu
Mark K. Transtrum,1 Gus L. W. Hart,1 and Peng Qiu2
1 Department of Physics and Astronomy, Brigham Young University, Provo, Utah 84602, USA ∗
2 Department of Biomedical Engineering, Georgia Tech and Emory University, Atlanta, Georgia 30332, USA

We develop a language for describing the relationship among observations, mathematical models, and the underlying principles from which they are derived. Using Information Geometry, we consider geometric properties of statistical models for different observations. As observations are varied, the model manifold may be stretched, compressed, or even collapsed. Observations that preserve the structural identifiability of the parameters also preserve certain topological features (such as edges and corners) that characterize the model's underlying physical principles. We introduce Information Topology in analogy with information geometry as characterizing the “abstract model” of which statistical models are realizations. Observations that change the topology, i.e., “manifold collapse,” require a modification of the abstract model in order to construct identifiable statistical models. Often, the essential topological feature is a hierarchical structure of boundaries (faces, edges, corners, etc.) which we represent as a hierarchical graph known as a Hasse diagram. Low-dimensional elements of this diagram are simple models that describe the dominant behavioral modes, what we call emergent model classes. Observations that preserve the Hasse diagram are diffeomorphically related and form a group, the collection of which forms a partially ordered set. All possible observations have a semi-group structure. For hierarchical models, we consider how the topology of simple models is embedded in that of larger models.
When emergent model classes are unstable to the introduction of new parameters, we classify the new parameters as relevant. Conversely, the emergent model classes are stable to the introduction of irrelevant parameters. In this way, information topology provides a general language for exploring representations of physical systems and their relationships to observations.

I. INTRODUCTION

Mathematical modeling is a central component of nearly all scientific inquiry. Simple models of real systems, coupled with robust methods for interacting with them, are one of the primary engines for scientific progress[1-3]. From an information theoretic perspective, mathematical models act both as a type of “information container” for storing the answers to experimental questions (in the form of parameter confidence regions, for example) and as a transfer mechanism for using that information to predict the outcome of new experiments. In order to facilitate these functions, models reflect the properties of the physical system that are relevant for explaining the outcome of experiments while disregarding irrelevant degrees of freedom. Generically, which properties are relevant or irrelevant depends on the observations, and a complete understanding of the system is best achieved when the relationship among all the potential representations and observations is characterized.

For example, a complete microscopic description of an ideal gas is neither necessary nor insightful for understanding its thermodynamic properties. On the other hand, a macroscopic description has a limited domain of applicability and masks the microscopic mechanisms that control the system behavior.
By relating the macroscopic description (temperature and pressure) to microscopic variables (kinetic energy) one simultaneously explains the emergence of the thermodynamic model and identifies its domain of applicability (i.e., the thermodynamic limit). These deep insights derived from understanding the relationship among different representations and their appropriate domains are lost when either representation is considered alone.

∗ Electronic address: mktranstrum@byu.edu

Another context in which this type of insight is useful is that of effective field theories. Effective field theories demonstrate the utility of including only the appropriate degrees of freedom for describing a particular phenomenon, and the Renormalization Group (RG) makes systematic the process of removing irrelevant degrees of freedom. Although the actual procedure is usually limited to systems with an emergent scale invariance or conformal symmetry, the analysis makes clear the utility of effective theories and how they emerge from a more complete underlying physical theory.

Recently it has been suggested that the general success of effective models and emergent theories can be attributed to an information theoretic “parameter space compression” resulting from coarse observations[3]. The present work builds upon this relationship as a vehicle for extending concepts and insights originating in RG to other contexts. This work should not be interpreted as a direct translation of RG principles, although the concepts we discuss are motivated by analogy with renormalization.

We assume that the appropriate model of a physical system depends not just on the intrinsic properties of the system itself, but also on the questions one wishes to ask about it, i.e., the observations to be made at specific experimental conditions. We develop a new mathematical language, which we call Information Topology, for describing the relationships among mathematical models, observations, and the underlying principles from which they are derived. Understanding these relationships leads to a more complete understanding of the physical system.

FIG. 1: Abstract and Statistical Models. An abstract model, $Y$, maps a parameter space $\Theta$ into a behavior space $B$. The behavior space consists of all possible model behaviors under all possible experimental conditions (even those measurements that may be impractical or impossible). A real experiment $X$ defines a data space $D$ as all possible outcomes of the experiment and is a subset of $B$. The statistical model, $y$, maps the parameter space into the data space of the specific experiments. Statistical inference consists of identifying an inverse function $y^{-1}$ that encodes the information from the experiments into the parameter space for extrapolation to new experiments. Information Topology characterizes the experimental conditions $X$ for which the inverse $y^{-1}$ is well-defined for a given abstract model.

The foundation for this theory is a distinction between two types of models, as we graphically illustrate in Figure 1. The first, what we call the abstract model and denote by $Y$, is characterized by parameters $\Theta$ and a set of rules that describe the underlying physical principles implemented by the model. Applying these rules to specific experimental conditions, $X$, leads to what we call the statistical model and denote by $y$. Importantly, it is the statistical model that makes falsifiable predictions that can be compared to observations. We express this relationship by saying a statistical model is a realization of a particular abstract model. Statistical inference is performed by comparing the statistical model to experimental data through the inverse function $y^{-1}$.
This process induces a metric on the parameter space of the abstract model; however, without any observations, there is no natural metric on the parameter space of the abstract model.

As a concrete example, consider Newton's law of gravitation. The abstract model is characterized by a parameter (i.e., the universal gravitational constant $G$) and a set of rules (i.e., the inverse square law) for predicting the motion of objects. Varying the parameter defines a family of abstract gravitational models, and the optimal value is found by comparing the predictions of a statistical model to observations, e.g., predicting the position of the moon, that are a small subset of all possible predictions that the abstract model could make. The distinguishability of the statistical model's predictions for different values of $G$ defines a metric on the parameter space. The metric is not intrinsic to the physical system, however, but depends on the observations. Furthermore, the metric would be different if the moon's position were measured to a different accuracy or if another experiment (such as observing the position of Mars or measuring the acceleration near the surface of the Earth) were performed. Each of these potential observations corresponds to a different $X$ in Figure 1 and therefore leads to a different statistical model $y$.

Since statistical models (unlike abstract models) make predictions for the outcomes of specific experiments, they are naturally interpreted as a manifold of potential predictions embedded in data space. The interpretation of models as manifolds is the basis for the field of information geometry[4-13]. To study the properties of the abstract model, we consider those properties of the model manifold that are invariant to changes in the metric, i.e., what properties are shared by different realizations of the same abstract model.
We therefore consider how geometric properties of statistical models vary depending on specific observations and which properties are invariant to these changes.

This paper is organized as follows: first, we focus on the observations for which statistical inference can be performed, i.e., for which $y^{-1}$ exists. We find that the observations for which this holds are diffeomorphisms of the model manifold. Therefore, properties of the model manifold that are invariant under diffeomorphisms characterize the abstract model. Information topology is therefore the synthesis of differential topology and information theory, just as information geometry is the combination of differential geometry and information theory, as we illustrate in Figure 2 and elaborate in section II.

In section III, we consider the effect of coarsening the observations. Coarsened observations may lead to structural changes in the manifold compared to the identifiable case, what we call manifold collapse. Manifold collapse leads to unidentifiable statistical manifolds that require modifying the abstract models in order to formulate identifiable statistical models. In other cases, coarse observations can lead to a collapse of the boundaries of the model manifold, which in turn leads to practically unidentifiable statistical models. In many cases, the parameters in a practically unidentifiable model can be arranged in a hierarchy of decreasing relative importance[3], a phenomenon known as sloppiness and observed in many different models[14-20]. We suggest that sloppiness may be a local manifestation of the collapse of manifold boundaries.

FIG. 2: Information Topology and Abstract Models. An abstract model is characterized by a parameter space without a metric. Comparing the predictions of the model to experimental observations leads to a family of statistical models and imposes a metric on the model's parameter space. This statistical model can be interpreted as a manifold embedded in the space of all possible experimental outcomes and is the domain of information geometry. Changing the details of the experimental conditions leads to different models with the same parameter space but different metrics and by extension different model manifolds. Experimental conditions that do not change the key structural characteristics of the manifold (i.e., are diffeomorphically related to one another) form an information topological description of the abstract model from which they arise.

In section IV, we show that the topological structure of an abstract model can be graphically represented as a hierarchical graph known as a Hasse diagram. This visualization naturally leads to a reinterpretation of the model parameters in terms of refining approximations to simple reduced models[21]. These reduced models are low dimensional elements of the Hasse diagram and identify distinct behavioral modes of the model. In this way, the topology of the abstract model identifies the relationships among distinct system behaviors and minimal representations of those behaviors. We find that the set of all possible observations forms a semi-group structure with diffeomorphism subgroups characterized by unique Hasse diagrams.

In section V we consider how topologies of hierarchical models are embedded within one another. This allows us to identify stable and unstable model classes and classify parameters as either relevant or irrelevant (in analogy with corresponding classifications in the context of renormalization group analysis[22, 23]).

II. INFORMATION TOPOLOGY AND MATHEMATICAL MODELS

To formalize the concepts outlined in the introduction, we consider statistical models that are defined as a probability distribution for specific observations, $P(\xi)$. Here $P$ represents the probability of observing some outcome $\xi$. We focus on probability distributions because they have broad applicability and have a natural metric (i.e., the Fisher Information defined below) for measuring distances among model predictions. This assumption is not strictly necessary, however, and we discuss other possible metrics in the supplement. We assume these statistical models are derived as specific applications of some abstract model to particular experimental conditions, so that there are many statistical models that share a common parameter space.

We now consider the concept of identifiability of a statistical model. In general, there are two classes of identifiability issues, structural and practical. A model $y$ with structurally identifiable parameters is an injective map from parameters to predictions so that an inverse, $y^{-1}$, can be constructed. Structural identifiability is often described as the potential to accurately infer unique parameter values from a model given perfect, noise-free data and is a necessary condition for parameter inference in a real experiment[24]. Structural identifiability is further categorized as either global or local. Global identifiability means that the inferred parameters are unique when considering the entire domain of physically relevant parameters. Local identifiability, on the other hand, means that there exists an open neighborhood centered on the inferred parameters in which the inferred parameters are unique[25, 26].

Local identifiability is closely related to the Fisher Information Matrix (FIM):

$$
g = \left\langle \left( \frac{\partial \log P}{\partial \theta} \right)^2 \right\rangle = -\left\langle \frac{\partial^2 \log P}{\partial \theta^2} \right\rangle, \tag{1}
$$

where $\langle \cdot \rangle$ means expectation with respect to the model.
It can be shown that a model is locally identifiable at parameter values $\theta$ if and only if the FIM is non-singular at $\theta$[25]. Qualitatively, a locally unidentifiable model has redundant parameters so that the parameter values can be changed in a coordinated way without changing the predictions of the model. The null space of the FIM is precisely the linear subspace in which parameter values can be shifted without changing the model behavior.

The other type of identifiability is practical. Although in principle it may be possible to uniquely identify true parameter values, in practice the number of repeated observations necessary to obtain a reasonable estimate may be impractical. Practical unidentifiability is also related to the FIM through the Cramér-Rao inequality:

$$
\mathrm{Cov}(\hat{\theta}) \ge g^{-1}/n, \tag{2}
$$

where $\mathrm{Cov}(\hat{\theta})$ is the asymptotic covariance matrix of an unbiased estimator $\hat{\theta}$ and $n$ is the number of repeated samples. If the FIM is not singular but approximately so, it may take an unreasonable sample size, i.e., a large value of $n$, to obtain accurate estimates.

It is not uncommon, particularly in models with many parameters, for the FIM to be poorly conditioned, leading to practical identifiability problems, a phenomenon sometimes known as sloppiness[14-20]. Parameter identifiability can be improved through repeated application of experimental design techniques[19, 27-34] or model reduction[21].

The basic principle behind the geometric interpretation of statistics is that the FIM in Eq. (1) can be interpreted as a Riemannian metric tensor. (Although we use the FIM as the measure of distance throughout this paper, our results generalize to any distance measure among models as we discuss in the supplement.) A parameterized statistical model is therefore equivalent to a Riemannian manifold.
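The link between the FIM, local identifiability, and the Cramér-Rao bound can be checked numerically. The following is a minimal sketch, not the authors' code. It assumes a toy model $y_i(\theta) = e^{-\theta_1 t_i} + e^{-\theta_2 t_i}$ observed with independent Gaussian noise of standard deviation $\sigma$, for which the FIM takes the standard form $g = J^T J / \sigma^2$ with $J$ the Jacobian of the predictions. The time points, parameter values, and function names are illustrative assumptions.

```python
import math

def jacobian(theta, times):
    """Rows are d y_i / d theta_a for y_i = exp(-theta_1 t_i) + exp(-theta_2 t_i)."""
    return [[-t * math.exp(-th * t) for th in theta] for t in times]

def fisher_information(theta, times, sigma=1.0):
    """FIM for independent Gaussian noise: g_ab = sum_i J_ia J_ib / sigma^2."""
    J = jacobian(theta, times)
    p = len(theta)
    return [[sum(row[a] * row[b] for row in J) / sigma**2 for b in range(p)]
            for a in range(p)]

def det2(g):
    """Determinant of a 2x2 matrix; zero signals a singular FIM."""
    return g[0][0] * g[1][1] - g[0][1] * g[1][0]

def cramer_rao_bound(g, n):
    """Diagonal of g^{-1}/n for a 2x2 FIM: lower bounds on estimator variances."""
    d = det2(g)
    return [g[1][1] / (d * n), g[0][0] / (d * n)]

# Assumed sampling: ten time points, logarithmically spaced between 0.1 and 10.
times = [10 ** (-1 + 2 * k / 9) for k in range(10)]

g_distinct = fisher_information((1.0, 3.0), times)  # two different decay rates
g_equal = fisher_information((2.0, 2.0), times)     # redundant parameters

print("det, distinct rates:", det2(g_distinct))
print("det, equal rates:   ", det2(g_equal))
print("variance bounds, n=10:", cramer_rao_bound(g_distinct, 10))
```

When the two rates coincide, the two columns of the Jacobian are identical, so the FIM is singular and its null direction is the coordinated shift $(1, -1)$ of the two parameters. For distinct rates the determinant is positive and the variance bounds shrink as $1/n$.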
We present here four simple examples that will then serve to illustrate the topological principles we explore later. First, we consider a model as the sum and difference of exponentials:

$$
\xi_i(\theta_1, \theta_2) =
\begin{cases}
e^{-\theta_1 t_i} + e^{-\theta_2 t_i} + z_i, & 1 \le i \le M_s \\
e^{-\theta_1 t_i} - e^{-\theta_2 t_i} + z_i, & M_s < i \le M_s + M_d
\end{cases}
\tag{3}
$$

where $z_i$ is a normally distributed random variable with zero mean and standard deviation $\sigma_i$, and $M_s$ and $M_d$ are the number of observations of the sum and difference respectively. This model could describe, for example, the radioactivity of a sample with two radioactive components in the case it is possible to distinguish between the two types of radiation, i.e., it is possible to measure both the total radiation and the difference in radiation types as in Eq. (3).

For the model in Eq. (3), model manifolds for two different observations (i.e., different choices of time points $t_i$ and experimental accuracies $\sigma_i$) are given in Figures 3 and 4. Specifically, both manifolds correspond to a sampling of time points logarithmically spaced between 0.1 and 10. Figure 3 corresponds to $\sigma_i = 1$ for all values of $i$, while Figure 4 has $\sigma_i = 1$ for $i \le M_s$ and $\sigma_i = 1/2$ for $i > M_s$. The manifolds are then generated by considering the model predictions for all physically allowed values of $\theta_1$ and $\theta_2$. Notice that since the observations that led to Figures 3 and 4 are different, geometrical features (e.g., distances and curvatures) of the manifolds are also different. However, because these manifolds are both identifiable realizations of the same abstract model, they share important structural characteristics. Specifically in this case, they are both square-like, in the sense of having four edges and four corners.

FIG. 3: Exponential Model Visualization. A visualization of the model manifold described by Eq. (3) for one possible realization of the abstract model. An alternative realization is given in Figure 4.

FIG. 4: Alternate Exponential Model Visualization. An alternative realization of the model described by Eq. (3) with different observations than those that led to Figure 3. Notice that by changing the observations, the geometric properties of the model manifold also change. The manifold may stretch and bend, but the manifold remains topologically equivalent to a square.

Next, we consider two generalizations of the Ising model given by the Hamiltonians

$$
H = -\sum_i J_i s_i s_{i+1}, \tag{4}
$$

and

$$
H = -\sum_{i,\alpha} J_\alpha s_i s_{i+\alpha}. \tag{5}
$$

In Eqs. (4) and (5), $s_i$ are spin random variables arranged in a one-dimensional chain that can take values $\pm 1$. The probability of a particular configuration is then given by a Boltzmann distribution

$$
P \propto e^{-H}, \tag{6}
$$

where we have taken $k_B T = 1$ and the normalization is determined by summing over all configurations.

The parameters $J_i$ in Eq. (4) are the site-specific nearest neighbor couplings of the magnetic moments $s_i$. This could describe an inhomogeneous magnet with short range interactions. On the other hand, the parameters $J_\alpha$ in Eq. (5) are the nearest and second nearest neighbor couplings of the magnetic moments, describing a homogeneous magnet with long(er) range interactions. A visualization of the model manifold for Eq. (4) is given in Figure 5 for the case of three spins and two parameters. The model manifold for Eq. (5) is given in Figure 6 for the case of two parameters and four spins with periodic boundary conditions. Because these models are derived from different abstract models, the topological features are different (a square vs. a triangle).

FIG. 5: Model Manifold of the Ising Model in Eq. (4). The generalized Ising model given by Eq. (4) with three spins and two parameters is topologically a square.

FIG. 6: Model Manifold of the Ising Model in Eq. (5). The generalized Ising model given by Eq. (5) with four spins and two parameters is topologically a triangle.

A final example is drawn from biochemistry, an enzyme substrate reaction: $\mathrm{E} + \mathrm{S} \rightleftharpoons \mathrm{ES} \rightarrow \mathrm{E} + \mathrm{P}$. Modeled as three mass-action reactions, the time-dependence of the concentration of each chemical species is determined by the set of differential equations

$$
\frac{d}{dt}[\mathrm{E}] = -k_f [\mathrm{E}][\mathrm{S}] + k_r [\mathrm{ES}] + k_{cat} [\mathrm{ES}] \tag{7}
$$

$$
\frac{d}{dt}[\mathrm{S}] = -k_f [\mathrm{E}][\mathrm{S}] + k_r [\mathrm{ES}] \tag{8}
$$

$$
\frac{d}{dt}[\mathrm{ES}] = k_f [\mathrm{E}][\mathrm{S}] - k_r [\mathrm{ES}] - k_{cat} [\mathrm{ES}] \tag{9}
$$

$$
\frac{d}{dt}[\mathrm{P}] = k_{cat} [\mathrm{ES}]. \tag{10}
$$

We take the observations to be the values of each of the chemical species at various times with added random Gaussian noise of variable size. This model has three parameters, the three reaction rates $k_f$, $k_r$, $k_{cat}$, so that the model manifold is a volume rather than a surface. The model manifold is shown in Figure 7, where the five colors are five sides (which we consider below) that enclose the volume. Initial conditions are chosen to be a known mixture of [E], [S] and [ES]. This is an unrealistic initial condition since [ES] is unstable and spontaneously decays into its constituent parts. We consider a more realistic scenario in a later section when we discuss manifold collapse.

FIG. 7: Model Manifold of the Enzyme-Substrate Model is a three dimensional volume with five sides (colored red, green, blue, yellow, and purple).

To this point, the observations for all of the statistical models were chosen so that all the parameters of the model were identifiable (both structurally and practically). As such, repeated sampling from these models contains all the information to uniquely identify the parameters in the abstract model, which in turn leads to precise predictions for other observations (i.e., extrapolation to other statistical models). We now consider for which other observations the structural identifiability of the parameters in the abstract model will be preserved (at least locally).
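To make the enzyme-substrate example concrete, the rate equations (7)-(10) can be integrated directly. The sketch below is an illustration under stated assumptions rather than the procedure used to produce Figure 7: it uses a hand-written fixed-step fourth-order Runge-Kutta integrator, and the rate constants, initial mixture, and step count are arbitrary illustrative values. Each point of a model manifold would correspond to one such trajectory, sampled at the observation times, for one choice of $(k_f, k_r, k_{cat})$.

```python
def rates(state, kf, kr, kcat):
    """Right-hand sides of Eqs. (7)-(10) for state = ([E], [S], [ES], [P])."""
    E, S, ES, P = state
    bind = kf * E * S      # E + S -> ES
    unbind = kr * ES       # ES -> E + S
    convert = kcat * ES    # ES -> E + P
    return (-bind + unbind + convert,   # d[E]/dt,  Eq. (7)
            -bind + unbind,             # d[S]/dt,  Eq. (8)
            bind - unbind - convert,    # d[ES]/dt, Eq. (9)
            convert)                    # d[P]/dt,  Eq. (10)

def rk4_step(state, dt, kf, kr, kcat):
    """One classical fourth-order Runge-Kutta step."""
    def shifted(base, slope, h):
        return tuple(x + h * dx for x, dx in zip(base, slope))
    k1 = rates(state, kf, kr, kcat)
    k2 = rates(shifted(state, k1, dt / 2), kf, kr, kcat)
    k3 = rates(shifted(state, k2, dt / 2), kf, kr, kcat)
    k4 = rates(shifted(state, k3, dt), kf, kr, kcat)
    return tuple(x + dt / 6 * (a + 2 * b + 2 * c + d)
                 for x, a, b, c, d in zip(state, k1, k2, k3, k4))

def simulate(initial, kf, kr, kcat, t_end, steps):
    """Return the list of states at steps + 1 equally spaced times in [0, t_end]."""
    dt = t_end / steps
    trajectory = [tuple(initial)]
    for _ in range(steps):
        trajectory.append(rk4_step(trajectory[-1], dt, kf, kr, kcat))
    return trajectory

# Assumed initial mixture of [E], [S], [ES] with no product, and unit rates.
trajectory = simulate((1.0, 1.0, 0.5, 0.0), kf=1.0, kr=1.0, kcat=1.0,
                      t_end=5.0, steps=500)
E, S, ES, P = trajectory[-1]
print("total enzyme    [E] + [ES]       =", E + ES)
print("total substrate [S] + [ES] + [P] =", S + ES + P)
```

Equations (7)-(10) conserve the total enzyme $[\mathrm{E}] + [\mathrm{ES}]$ and the total substrate $[\mathrm{S}] + [\mathrm{ES}] + [\mathrm{P}]$, which gives a simple sanity check on the integration.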
To answer this question, we use the fact mentioned above that a statistical model is locally structurally identifiable if and only if the FIM is non-singular. Therefore, structure-preserving experimental conditions can distort the distances measured on the model manifold provided no new singularities are introduced into the metric tensor. These types of transformations are known as diffeomorphisms, i.e., differentiable, invertible transformations of the model manifold. Therefore the manifolds in Figures 3-5 are diffeomorphic to a square, Figure 6 is diffeomorphic to a triangle, and Figure 7 is diffeomorphic to a pentahedron.

As an aside, throughout this paper we say that a manifold that is square-like (as in Figures 3-5) is topologically a square (and similarly for other shapes). By this we mean the manifold is diffeomorphic to a square. We warn the reader that the phrase “topological equivalence” is colloquially used to mean invariance under homeomorphisms rather than diffeomorphisms. In this paper we restrict ourselves to the study of differential topological properties of manifolds, so we use this phrase without ambiguity although it is not standard.

Notice that each of the manifolds in Figures 3-7 is bounded by edges, which are in turn bounded by corners. If we restrict our attention to observations that are diffeomorphic to those in these figures, then this hierarchical structure of boundaries and edges will be preserved. This indicates that the hierarchical boundary structure is a feature of the abstract model and not of the specific observations. Indeed, we now give explicit formulas and interpretations for each of these limits and find that these boundaries always represent an extreme limit of the principles in the abstract model.

The simplest example is in Figures 3-4 in which the model manifold is topologically a square.
The limits of the four edges correspond to either of the parameters reaching their physically limiting values individually: $\theta_\mu \to \infty$ or $\theta_\mu \to 0$. Because these limits act on the underlying physical principles, it is not difficult to ascribe a physical interpretation to these edges. They are the cases in which one of the radioactive species either decays instantly or not at all (relative to the experimental time scales).

The generalized Ising model in Eq. (4) (depicted graphically in Figure 5) is likewise diffeomorphic to a square. The four limiting cases are likewise each of the two parameters reaching the limits of their physically relevant values: $J_\mu \to \pm\infty$. These limits physically correspond to perfect, local ferromagnetism or anti-ferromagnetism between two neighboring spins.

Now consider the generalized Ising model in Eq. (5), visualized in Figure 6. This model manifold is diffeomorphic to a triangle. Because this model is constructed to be translationally invariant, it is natural to consider a Fourier transform of the spins as in [21, supplement]. One limiting case corresponds to the limit of $J_1 \to \infty$, $J_2 \to -\infty$ such that $J_1 + 2 J_2$ remains finite. Careful analysis shows that in this limit, configurations with the highest Fourier frequency have infinite energy (i.e., have zero probability). We refer to this as the ferromagnetic limit (high-frequency configurations in the spins correspond to anti-ferromagnetic order and have zero probability). Another limit occurs when $J_1 \to \infty$, $J_2 \to -\infty$ such that $J_1 - 2 J_2$ remains finite, which in a similar way removes the low-frequency configurations in the spins, i.e., an anti-ferromagnetic limit. Finally, the limit $J_2 \to \infty$ with $J_1$ remaining finite corresponds to the limit of no mid-frequency configurations.

Finally, we consider the five surfaces bounding the pentahedron in Figure 7.
The green surface corresponds to the limit $k_r \to 0$, i.e., the first reaction is no longer reversible. The red surface corresponds to $k_f, k_r \to \infty$ such that $K_d = k_r / k_f$ remains finite, interpreted as the case in which the first reaction is always in equilibrium. The blue surface corresponds to $k_{cat} \to 0$, i.e., the second reaction does not occur. The yellow surface corresponds to $k_f \to 0$, i.e., the first reaction proceeds only in the reverse direction. The purple surface corresponds to the limit $k_r, k_{cat} \to \infty$ such that the ratio $k_r / k_{cat}$ remains finite, corresponding to the case that the intermediate complex [ES] never accumulates. The topological relationship among the faces, edges, corners, etc. of the model is rich with physical meaning, as we will see in section IV.

III. MANIFOLD COLLAPSE LEADS TO CHANGES IN MODEL STRUCTURE

We now consider how the analysis described in section II changes when the observations alter the topological structures of the manifolds, i.e., are not related by diffeomorphisms. Since the models in section II were constructed to be identifiable, a structural change here corresponds to a coarsening of the observations. In most cases, this means simply not observing some of the potential model predictions. In the language of mathematical probability, this means the distribution is marginalized over a subset of the random variables, resulting in a coarse-grained set of observations. Geometrically, the corresponding manifolds are compressed along some direction. In this way, manifolds may be compressed, folded, or edges may be glued together. On the other hand, reversing this process can result in manifolds that are stretched or torn.

Consider the exponential model in Eq. (3) for the limit in which the difference in the exponential terms is not observed.
This occurs if the specific radioactive products of the two decay channels are indistinguishable, or if the experimental cost of observing the difference is prohibitive so that the experiment is not performed. In this case, Eq. (3) is modified to be

$$
\xi_i(\theta_1, \theta_2) = e^{-\theta_1 t_i} + e^{-\theta_2 t_i} + z_i. \tag{11}
$$

The manifold of this model is given in Figure 8. Notice that the topological structure of the manifold is fundamentally different from that in Figures 3 and 4. The relationship between the coarsened manifold and the original manifold is illustrated by the colored lines in Figures 8 and 9. In effect, the manifold has been folded in half such that the two white lines in Figure 9 are identified with each other and correspond to the white line in Figure 8. The same holds for the blue, green and red lines. The black line is the “fold line.”

FIG. 8: Coarse-grained Exponential Model. If only the sum of the exponential terms is observed, then the model manifold is structurally different from that in Figures 3 and 4; it is now a triangle. The topological change reflects a change in the information content of the observations about the underlying theory and manifests itself as a structural unidentifiability in the resulting model.

FIG. 9: Full Exponential Model showing which points will be identified under coarse-graining. In transitioning from the manifolds in Figures 3 and 4, the (square) manifold is effectively folded in half to produce the triangle structure in Figure 8. Here, lines of the same color on the original topology become identified with each other after coarsening. The black line corresponds to the fold line.

The black “fold line” in Figures 8 and 9 corresponds to the curve for which $\theta_1 = \theta_2$ in the model. This line is significant because for the coarsened model in Eq. (11), the Fisher Information is singular along this line. This singularity is the mathematical indication of the corresponding structural change. For all other points on the model, the Fisher Information remains nonsingular, so that the model is still structurally identifiable in the local sense almost everywhere. However, the model is no longer globally identifiable since each point on the manifold corresponds to two points in parameter space.

The structural change illustrated in Figures 8 and 9 indicates that the observations carry qualitatively different information about the parameters of the abstract model. Specifically, there is a loss of information regarding the distinguishability of the radioactive products. In order to construct an identifiable model, the abstract model must be modified. In this case, the physical domain of the parameters can be restricted to $\theta_1 > \theta_2$, i.e., we arbitrarily order the parameters so that they are no longer identified with a specific radioactive agent.

Now consider the generalized Ising model in Eq. (4). For the specific case of three spins visualized in Figure 5, we now consider the effect of observing only spins one and three and marginalizing the distribution over spin two. In this case, the two dimensional manifold in Figure 5 collapses to a one-dimensional curve. The nature of this collapse is illustrated by the colored lines in Figure 10. After marginalizing the distribution, the colored lines each collapse to a single point. Additionally, the manifold is “folded up” so that the disconnected lines of the same color each map to the same point. That is, the blue line near the top and the blue line near the bottom each collapse to a single point, and the manifold is folded in half to identify these points.

FIG. 10: Identified points for the generalized Ising model in Eq. (4). After coarsening, the two dimensional square topology collapses to a line segment. Here, lines of the same color are the set of points that collapse to a single point on the final line segment. Notice that this collapse involves both a type of compression, in which a sequence of connected points are squeezed together, and a global folding, in which blue lines (for example) on opposite sides of the manifold are identified with each other. The black line represents the model in which the two parameters are equal, i.e., the usual one-parameter nearest neighbor Ising model. Notice that this line is folded in half under coarsening.

It is interesting to consider the effect on the common Ising model corresponding to $J_1 = J_2$ in Eq. (4). This one-dimensional curve is illustrated by the black line in Figure 10. Notice that it is folded in half by the coarse-graining.

Since the previously two-dimensional manifold is collapsed to a one-dimensional curve, the FIM is singular for all parameter values upon coarsening. Thus the new model is structurally unidentifiable. The meaning of the parameters in the abstract model must be modified in order to construct an identifiable model. In this case, the information lost is the nature of the short-range interactions. Are they ferromagnetic or anti-ferromagnetic? After coarsening, the answer to this question is lost. As shown in reference [21], the new model can be exactly represented by an effective interaction between spins one and three:

$$
H = -\tilde{J} s_1 s_3. \tag{12}
$$

We quantify this effective interaction with the renormalized parameter $\tilde{J}$. The origin of this interaction is understood to be mediated by the microscopic interactions described by $J_1$ and $J_2$, neither of which can be identified individually from the data.

The generalized Ising model in Eq. (5) also collapses if we similarly only observe spins one and three, ignoring spins two and four.
The two-dimensional surface in Figure 6 then collapses to a one-dimensional curve. The details of this collapse are illustrated in Figure 11, in which lines of a single color collapse to a single point. Notice that the manifold is not "folded" in the same way as that in Figure 10. Indeed, two edges of the triangle are simply brought together and the third edge is collapsed to a point. Considering the curve $J_2 = 0$ (given by the black line), we see that this curve is folded in half, just as it was in Figure 10.

FIG. 11: Identified points for generalized Ising model in Eq. (5). After coarsening, the two-dimensional triangle topology condenses to the same line segment as that for the model in Eq. (4). Here, lines of the same color are collapsed to a single point. Unlike the collapse in Figure 10, the structure does not experience any folding; rather, the area between two of the edges is collapsed to a line and the third edge collapses to a point. The black line corresponds to the model with $J_2 = 0$ (leaving $J_1$ as the only parameter), i.e., the usual one-parameter nearest neighbor Ising model. This black line is the same as that in Figure 10. Although the whole manifold is not folded, this particular sub-manifold is folded in half under coarsening, just as it was in Figure 10.

The abstract models of these two generalized Ising models are fundamentally different. Eq. (4) corresponds to an inhomogeneous magnet with short-range interactions, while Eq. (5) is a homogeneous magnet with longer-range interactions. These differences are captured in the different topological structures. However, upon coarsening, both models collapse to a one-dimensional curve. In fact, they collapse to the same model and are realizations of the same abstract model. This effective model is one in which observed spins interact "directly".
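The collapse of the three-spin model onto the effective model of Eq. (12) can be reproduced by brute-force marginalization. The sketch below assumes that Eq. (4) has the open-chain form $H = -J_1 s_1 s_2 - J_2 s_2 s_3$ with $P \propto e^{-H}$; this form is an assumption made for illustration, as are the numerical coupling values. Under that assumption, summing over $s_2$ gives $\tanh \tilde{J} = \tanh J_1 \tanh J_2$.

```python
import math
from itertools import product

def marginal_s1_s3(j1, j2):
    """P(s1, s3) for the assumed chain H = -j1*s1*s2 - j2*s2*s3, P ~ exp(-H),
    obtained by summing over the unobserved middle spin s2."""
    weights = {}
    for s1, s2, s3 in product((-1, 1), repeat=3):
        w = math.exp(j1 * s1 * s2 + j2 * s2 * s3)
        weights[(s1, s3)] = weights.get((s1, s3), 0.0) + w
    total = sum(weights.values())
    return {pair: w / total for pair, w in weights.items()}

def effective_coupling(j1, j2):
    """J-tilde such that P(s1, s3) ~ exp(J-tilde * s1 * s3), as in Eq. (12)."""
    p = marginal_s1_s3(j1, j2)
    return 0.5 * math.log(p[(1, 1)] / p[(1, -1)])

# Hypothetical couplings chosen only for illustration.
j_tilde = effective_coupling(0.7, 1.3)

# Closed form that follows from the assumed Hamiltonian.
closed_form = math.atanh(math.tanh(0.7) * math.tanh(1.3))

# Folding: flipping the sign of both couplings gives the same effective model,
# so ferromagnetic and anti-ferromagnetic chains cannot be told apart.
folded = effective_coupling(-0.7, -1.3)

# Compression: a different pair with the same product tanh(J1)*tanh(J2)
# also lands on the same point of the coarsened manifold.
j2_other = math.atanh(math.tanh(0.7) * math.tanh(1.3) / math.tanh(1.0))
compressed = effective_coupling(1.0, j2_other)
```

Both mechanisms described for Figure 10 appear here: whole curves of constant $\tanh J_1 \tanh J_2$ map to one value of $\tilde{J}$ (compression), and the points $(J_1, J_2)$ and $(-J_1, -J_2)$ are identified (folding).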
The "direct" interaction is, of course, mediated by different microscopic interactions of a now unknown nature (and unknowable from the coarsened observations).

This simple example demonstrates how manifold collapse can both explain universality and justify the use of simple, effective models. This example is best understood in analogy with similar arguments based on renormalization group analysis. The similarity of these analyses becomes a vehicle for generalizing these concepts to broader model classes.

Finally, we turn our attention to the enzyme-substrate model. To coarsen this model, we consider an alternative initial condition in which only [E] and [S] have nonzero initial conditions, and we observe only the final product [P] (ignoring the time course of the other three variables). The corresponding model manifold is illustrated in Figure 12. The model manifold for these observations is three dimensional; however, the pentahedron of Figure 7 has collapsed to a narrow volume bounded by two surfaces (red and green), each of which is a digon, i.e., a two-sided polygon.

FIG. 12: Coarse-grained Enzyme-Substrate Model. A more realistic experimental condition for the enzyme-substrate model results in a manifold that has not completely collapsed, as it remains a three-dimensional volume. However, several of the boundaries of the manifold in Figure 7 have collapsed. The yellow and blue surfaces have collapsed to points and the purple surface has collapsed to a line. The resulting model is structurally identifiable, but not practically identifiable. The practical unidentifiability is closely related to the topological changes of the boundary structure.

To understand manifold collapse in Figure 12, notice that if $k_{cat} = 0$ in Eqs. (7), no product can be produced. Consequently, the blue and yellow surfaces in Figure 7 are collapsed to a single point in Figure 12.
The entire manifold has been compressed, so that the two remaining surfaces are very close to one another and are therefore practically unidentifiable. Thus, in principle it is possible to identify all of the parameters in the model using only observations of the product; however, in practice the near-collapse of the manifold makes it very difficult.

The structural collapse of the boundaries in Figure 12 reflects the causal dependence among $k_f$, $k_r$, and $k_{cat}$ in the abstract model. The practical unidentifiability resulting from the coarsening is intimately tied to this relationship and is manifest as a structural change in the model's topology. Inspecting Figure 7, we see that the blue and yellow surfaces are necessary to "pull apart" the red and green surfaces. If the choice of observations collapses the blue and yellow surfaces to a single point (as in Figure 12), it becomes difficult to statistically distinguish between the red and green models (as well as all the models between them). We discuss this phenomenon in more detail in section IV.

IV. HYPER-CORNERS DEFINE PHENOMENOLOGICAL CLASSES

A. Boundary structures are represented by Hasse Diagrams

The models we have considered here have manifolds with an intrinsic hierarchical structure. They are bounded by (hyper-)surfaces of one less dimension, which are in turn bounded by other (hyper-)surfaces of even lower dimension. This structure is derived from the "rules" of the abstract model. It has been shown that this structure is common to many models [35]. This hierarchical structure is described mathematically as a graded partially ordered set, i.e., a graded poset. The grading of each element in the set is determined by its dimension (corners are zero-dimensional, edges are one-dimensional, etc.). The partial ordering is induced by the fact that some corners are contained within some edges but not others.
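The graded poset described above can be built explicitly for a small example. The following sketch constructs the boundary hierarchy of a square, using hypothetical labels (edges named by a letter, corners by the two adjacent edges), and extracts the covering relations, i.e., the pairs of faces with nothing strictly in between. Identifying each face with the set of corners it contains is a simplification that works for the square; it is not claimed to be the construction used by the authors.

```python
def square_faces():
    """Return {label: (dimension, frozenset_of_corners)} for a square."""
    corners = ["AB", "BC", "CD", "DA"]      # each corner joins two edges
    faces = {"LF": (-1, frozenset())}       # least face: the empty set
    for c in corners:
        faces[c] = (0, frozenset([c]))
    for edge in "ABCD":
        faces[edge] = (1, frozenset(c for c in corners if edge in c))
    faces["GF"] = (2, frozenset(corners))   # greatest face: the full model
    return faces

def hasse_edges(faces):
    """Covering relations (lower, upper): lower is a proper subset of upper
    and no third face lies strictly between them."""
    def below(a, b):
        return faces[a][1] < faces[b][1]    # proper-subset test on corner sets
    covers = []
    for lo in faces:
        for hi in faces:
            if below(lo, hi) and not any(
                below(lo, mid) and below(mid, hi) for mid in faces
            ):
                covers.append((lo, hi))
    return covers

faces = square_faces()
covers = hasse_edges(faces)

# Number of nodes on each row of the diagram, indexed by dimension.
rows = {}
for dim, _ in faces.values():
    rows[dim] = rows.get(dim, 0) + 1

# The poset is graded if every covering relation raises the dimension by one.
is_graded = all(faces[hi][0] - faces[lo][0] == 1 for lo, hi in covers)
```

For the square this yields one node of dimension -1, four corners, four edges, and one greatest face, connected by 16 covering relations; corner `AB` is covered by edges `A` and `B` but not by `C` or `D`, which is the partial ordering referred to in the text.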
In general, models that share this basic construction will have many more than two or three parameters, making them difficult to visualize in data space as we have done previously. However, the relationships among this hierarchy of hyper-surfaces can be represented as a hierarchical graph structure known as a Hasse diagram, as shown in Figure 13.

A Hasse diagram graphically illustrates the relationship among the model manifold, its faces, edges, corners, etc. The complete model is represented as a single manifold of dimension $N$ (where $N$ is the number of parameters), given by the top node of the graph. This manifold may be bounded by a collection of hyper-surfaces of dimension $N-1$, i.e., the second row in the figure. Likewise, each of these hyper-surfaces may be bounded by other hyper-surfaces of dimension $N-2$, i.e., the third row in the figure. In this paper, we do not consider models that are unbounded in some directions. The arrows connecting the top node to the nodes in the second row show that each of these surfaces is a boundary to the node above it. The arrows connecting the second and third rows similarly represent which $(N-2)$-dimensional hyper-surfaces are boundaries to which $(N-1)$-dimensional hyper-surfaces. In this way, a Hasse diagram summarizes the topological relationships among all the boundaries of the model.

Near the bottom of the graph are nodes of zero dimension (labeled by Greek letters), which are vertices of the manifold. It is common in the theory of abstract polytopes to include a single node in the Hasse diagram below the points of dimension 0 (i.e., dimension -1) corresponding to the empty set [36-38]. Examples of Hasse diagrams for the manifolds in sections II-III are shown in Figure 14.

FIG. 13: Hasse Diagram. The relationship among boundaries, corners, hyper-corners, etc.
may be summarized by a hierarchical graph structure known as a Hasse Diagram. The top node of the graph represents the full $N$-dimensional model (known as the greatest face and denoted by GF). The first row of nodes below it corresponds to the surfaces of dimension $N-1$ that bound this model. The next row represents the $(N-2)$-dimensional surfaces that bound those in the previous row, and so forth. The tips of the graph correspond to models of dimension zero (i.e., a single point). It is common in Hasse diagrams to have a single node at the bottom (corresponding to dimension -1) representing the empty set and known as the least face (denoted by LF).

B. f-vector and Euler Characteristic

A topologically important quantity that can be read directly from the Hasse diagram is the f-vector. The f-vector is a list of integers giving the total number of nodes on each row (i.e., of a particular dimension) of the Hasse diagram. The f-vectors for the Hasse diagrams in Figure 14 are line segment: (1,2,1), triangle: (1,3,3,1), square: (1,4,4,1), enzyme reaction: (1,5,8,5,1), coarsened enzyme reaction: (1,2,2,2,1).

The Euler characteristic is a topological invariant calculated as the alternating sum of terms in the f-vector

$$\chi = \sum_{i=0}^{N-1} (-1)^i f_i, \tag{13}$$

where $f_i$ is the number of faces of dimension $i$ (the f-vectors listed above start at dimension $-1$ and end at dimension $N$). It is straightforward to check that for each of the models considered here, $\chi = 1 - (-1)^N$ [36]. For the triangle, for instance, $\chi = 3 - 3 = 0 = 1 - (-1)^2$. The significance of this result is that all of the manifolds we consider here are orientable and none have holes or handles (as would a donut or a coffee mug). We anticipate that these are properties that will be common for many models, but it is possible to find other examples. For example, consider another coarse-grained version of Eq. (3) in which only the difference of exponentials is observed:

$$\xi_i(\theta_1, \theta_2) = e^{-\theta_1 t_i} - e^{-\theta_2 t_i} + z_i. \tag{14}$$

In this case, the square of Figure 3 is pinched off to a point at the line $\theta_1 = \theta_2$. The resulting structure is not a manifold since the manifold structure breaks down at $\theta_1 = \theta_2$. Instead, it is two manifolds that are topologically digons, glued together at one of their tips as illustrated in Figure 15. The f-vector for this structure is (1,3,4,1), which gives an Euler characteristic $\chi = -1$.

Because there is only one greatest face and one least face, the first and last entries of the f-vector will generically be one, so there are several equivalent variations of Eq. (13). For models satisfying $\chi = 1 - (-1)^N$, we can similarly write [36]

$$\sum_{i=0}^{N} (-1)^i f_i = 1, \tag{15}$$

$$\sum_{i=-1}^{N} (-1)^i f_i = 0. \tag{16}$$

FIG. 14: Examples of Hasse Diagrams. Top: From left to right, the Hasse diagrams of a line segment, a triangle, and a square. In each case, edges are arbitrarily labeled by a letter and corners by two letters corresponding to the adjacent edges. Bottom left: The Hasse diagram of the pentahedron in Figure 7. Faces are labeled with the letter of the corresponding color in Figure 7 (R for red, etc.). Four of the five faces are triangles; the green face is a square. Bottom right: The Hasse diagram for Figure 12. In this case, the three-dimensional volume is bounded by two digons.

FIG. 15: Collapsed Exponential Model. Left: If only the difference in exponentials is observed (as in Eq. (14)), then the square topology in Figure 3 collapses to a point along the line $\theta_1 = \theta_2$. The manifold structure breaks down at the vertex $\gamma$, so this structure is not a manifold. It is two separate manifolds, each with the topology of a digon, connected at their tips. Right: the Hasse diagram for this structure.

C. Hasse diagrams identify important parameter combinations

Another use of the Hasse diagram is understanding the relationship between model parameters and the model's phenomenology.
Quite often, particularly for sloppy or practically unidentifiable models, the $N$ independent parameter combinations of a model are not equally important for explaining the model's observations, nor is it clear how variations in parameter values translate to model behavior. Indeed, for many observations, there is a clear hierarchy of importance in the model parameters that is revealed by an eigenvalue decomposition of the FIM. The eigenvectors of the FIM can then be interpreted as the linear combinations of parameters that are relatively important or unimportant for understanding the model behavior. Unfortunately, this interpretation is based on a local, linear analysis. In reality, the truly important parameter combinations are nonlinear combinations of the model's bare parameters. Identifying and interpreting the appropriate nonlinear combination requires a global, topological analysis. To illustrate, we consider the enzyme-substrate model in Figure 12.

Notice that the coarsened manifold in Figure 12 is a three-dimensional volume. As such, the statistical model is structurally identifiable. However, the manifold is very thin and as such it is practically unidentifiable. The practical unidentifiability is closely related to the collapse of the boundaries. Furthermore, there is a global anisotropy; the manifold has a clear long axis.

In the statistical model for observations of the product P, any information about the rates $k_f$ and $k_r$ must of necessity be inferred through the third parameter $k_{cat}$. Furthermore, when only the substrate is stimulated, any information about this stimulation must first pass through the intermediate complex [ES]. It is precisely this informational dependence that is described by the boundary collapse as discussed in section III.
Because the boundaries of the manifold collapse, many different points on the manifold are drawn very near one another, leading to a hierarchy of importance in the parameters. Parameters that were originally easy to distinguish become practically unidentifiable.

We now consider the functional form of the red and green surfaces in Figure 12. The green surface corresponds to the limit $k_r \to 0$, i.e., the approximation that the first reaction is not reversible. The red surface corresponds to the limit $k_f, k_r \to \infty$ with $K_d = k_r/k_f$ remaining finite, i.e., the approximation that the first reaction is in equilibrium. From the equilibrium approximation, one can derive the well-known Michaelis-Menten approximation:

$$\frac{d}{dt}[\mathrm{P}] = \frac{k_{cat} E_0 [\mathrm{S}]}{K_d + [\mathrm{S}]}, \tag{17}$$

where $E_0 = [\mathrm{E}] + [\mathrm{ES}]$ is the total amount of enzyme and $K_d = k_r/k_f$.

The two edges that join the red and green surfaces can then be found as limiting approximations to Eq. (17). The first is $K_d \to 0$, which corresponds to the approximation that the reaction is always saturated, i.e., a constant production rate:

$$\frac{d}{dt}[\mathrm{P}] = k_{cat} E_0. \tag{18}$$

The second limit corresponds to $k_{cat}, K_d \to \infty$ with $k_{eff} = k_{cat}/K_d$ remaining finite, leading to the form

$$\frac{d}{dt}[\mathrm{P}] = k_{eff} E_0 [\mathrm{S}], \tag{19}$$

which corresponds to a linear approximation.

The vertices bounding the two edges are the limits $k_{cat}, k_{eff} \to 0$ or $k_{cat}, k_{eff} \to \infty$, corresponding to either no reaction or a reaction that completes instantaneously.

This hierarchy of models is a standard interpretation of the Michaelis-Menten dynamics in biochemistry and is naturally recovered by interpreting the nodes in the Hasse diagram as illustrated in Figure 16. Notice that a simplified model can be associated with each node in the Hasse diagram.
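The two edge limits of Eq. (17) can be verified numerically. The sketch below evaluates the Michaelis-Menten rate at extreme parameter values and compares it with the saturated form of Eq. (18) and the linear form of Eq. (19). The function names and all numerical values (substrate concentration, rate constants, $E_0 = 1$) are hypothetical choices made for illustration.

```python
def michaelis_menten_rate(s, k_cat, k_d, e0=1.0):
    """Eq. (17): d[P]/dt = k_cat * E0 * [S] / (K_d + [S])."""
    return k_cat * e0 * s / (k_d + s)

def saturated_rate(k_cat, e0=1.0):
    """Eq. (18): constant production rate, the K_d -> 0 limit."""
    return k_cat * e0

def linear_rate(s, k_eff, e0=1.0):
    """Eq. (19): the k_cat, K_d -> infinity limit at fixed k_eff = k_cat / K_d."""
    return k_eff * e0 * s

s = 2.0  # hypothetical substrate concentration

# K_d -> 0: the rate no longer depends on [S].
near_saturated = michaelis_menten_rate(s, k_cat=3.0, k_d=1e-9)
saturated = saturated_rate(k_cat=3.0)

# k_cat, K_d -> infinity with k_eff = 0.5 held fixed: the rate is linear in [S].
big = 1e9
near_linear = michaelis_menten_rate(s, k_cat=0.5 * big, k_d=big)
linear = linear_rate(s, k_eff=0.5)

# Vertex limit k_cat -> 0: no reaction at all.
no_reaction = michaelis_menten_rate(s, k_cat=0.0, k_d=1.0)
```

Each limit removes one parameter combination from the rate law, which is the sense in which moving to a lower-dimensional node yields a simpler model.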
As one moves to lower dimension, these models become progressively simpler and the relationship between the model behavior and the model parameters becomes progressively clearer. Furthermore, the simpler models naturally group the parameters of the complex abstract model into the appropriate nonlinear combinations that are connected to the model behavior. We refer to these simplified models as emergent model classes and their behaviors as the dominant behavioral modes. The behavior of the full model can then be reinterpreted as a combination of these characteristic modes.

FIG. 16: Model Hierarchy for coarsened Enzyme-Substrate Model. The nodes of the Hasse diagram for the coarsened enzyme-substrate model (as in Figure 12) correspond to a well-known hierarchy of models, including the Michaelis-Menten rate law. (Figure node labels: Full Model, Irreversible, Michaelis-Menten, Saturated, Linear, Instantaneous, No Reaction.)

For models of moderate complexity, constructing the Hasse diagram may reveal important insights regarding the informational relationships among model parameters and behaviors, as we have done above for the enzyme-substrate model. Such insights may guide experimental design, model reduction, and model interpretation. For more complex models, it may be impractical to explicitly construct the entire diagram. In these cases, much of the benefit can still be found by identifying a single path down the graph from the top node to an appropriate approximate model.

D. The observation semi-group and diffeomorphism subgroups

Observations that preserve the boundary structure of the model manifold are represented by the same Hasse diagram and are related by diffeomorphisms. The set of diffeomorphisms of a manifold forms a group. Therefore, among the set of all possible observations there are subsets that are groups and are characterized by a unique Hasse diagram.
The entire set of all possible experimental conditions does not form a group, however. This is because some observations lead to manifold collapse, which has no unique inverse operation. All possible observations therefore form a semi-group (a group-like structure but with some elements that do not have an inverse). This structure is reminiscent of the renormalization group, which is also famously not a group but a semi-group, for similar reasons. We refer to this as the observation semi-group and to the proper subgroups characterized by unique Hasse diagrams as diffeomorphism subgroups.

FIG. 17: Hasse Diagram for Observation semi-group. The model defined in Eq. (3) corresponds to the diffeomorphism subgroup described by a square topology. The "triangle" subgroup is given by Eq. (11) and the "2 Digons" subgroup is given by Eq. (14). If only a single time point is observed, these structures collapse to line segments. The least face corresponds to the "point" topology. The Hasse diagram illustrates the ordering point ≺ line segment 1 ≺ triangle ≺ square. There is no relation between the triangle and digons subgroups or their collapsed line segments. Note that the ordering is not graded; the square, triangle, and digons are each two-dimensional structures. (Figure node labels: Square, Triangle, 2 Digons, Line Segment 1, Line Segment 2, Point.)

The set-like structure of observations induces a partial ordering on the diffeomorphism subgroups. Let $G_1$ and $G_2$ represent two diffeomorphism subgroups of an abstract model. We say $G_1 \prec G_2$ if there exist observations $o_1 \in G_1$ and $o_2 \in G_2$ such that $o_1 \subset o_2$. Being a poset, the diffeomorphism subgroups can be represented by a Hasse diagram, as we demonstrate for the exponential model from Eq. (3) in Figure 17. At the top of the diagram is the group preserving the square topology as in Figures 3 and 4.
Below this are the subgroups whose topologies are a triangle, as in Figure 8 and Eq. (11), and two digons, as in Figure 15. If the observations are further coarsened so that only one time point is observed, these topologies collapse to line segments. Finally, if nothing is observed we arrive at the least face, i.e., a single point.

Unlike the Hasse diagrams of section IV A, the Hasse diagram for the diffeomorphism subgroups is not graded. In particular, note in Figure 17 that triangle ≺ square even though they are both two dimensional. Similar to the Hasse diagrams in section IV A, the diffeomorphism subgroups will always include a minimal element that corresponds to the empty set (i.e., no observations) whose topology is a single point. We speculate that for most models there will also exist a unique "maximal topology" (i.e., greatest face). The square is the unique maximal topology for the exponential model, and the pentahedron in Figure 7 is the maximal topology for the enzyme-substrate model.

V. TOPOLOGICAL EMBEDDING DEFINES STABILITY OF MODEL CLASSES

A. Model classes are ground states of statistical mechanics models

In this section, we consider the topology of hierarchical models. We introduce a concept called topological embedding, i.e., how topologies of small models are embedded within larger ones. We find that topological embeddings allow us to define the concept of stability for behavioral modes with respect to structural changes in the abstract model. We motivate this definition in the context of log-linear statistical mechanics models, such as cluster expansions of binary alloys. We therefore begin by discussing the topology for these models and its physical significance.

Consider a simple cluster expansion on a two-dimensional four-by-four square lattice with periodic boundary conditions. Sites can be occupied by one of two atom types, i.e., a binary system.
The Hamiltonian for binary interactions up to fifth nearest neighbor takes the form

$$H = -\sum_{k=1}^{5} \sum_{d(i,j) = d_k} J_k s_i s_j, \tag{20}$$

where the first sum indicates a sum over interactions and the second sum indicates a sum over sites such that the distance between sites $i$ and $j$ is equal to the $k$th nearest neighbor distance. The random variables $s_i$ take on values $\pm 1$, indicating which atom type occupies the site, according to a Boltzmann distribution $P(s) \propto e^{-H}$, where we have absorbed the temperature dependence into the definition of $J_k$. This model has 16 binary random variables and therefore can exhibit $2^{16} = 65536$ distinct configurations. Many of these states are symmetrically equivalent; only 432 configurations are crystallographically distinct.

The topology of this model is summarized by the Hasse diagram in Fig. 18 (omitting the least face). Although this model corresponds to a five-dimensional manifold (and is thus difficult to visualize), from the Hasse diagram it can be seen that the manifold has six hyper-corners corresponding to zero-dimensional limiting cases (six nodes on the bottom row). These six points form the boundaries for 15 lines (second row from the bottom). Continuing to work up the graph, these lines form the boundaries for 20 triangles, which form the boundaries for 15 tetrahedra, which form the boundaries for 6 first-order boundaries of the manifold. These first-order boundaries are each four-dimensional objects bounded by five tetrahedra.

FIG. 18: Hasse diagram of a square-lattice, fifth-nearest neighbor model. The vertices of the model correspond to all of the ground state energy configurations for this model class. Paths from the greatest face (labeled 0) to the vertices (labeled 57 - 62) correspond to systematic coarsening of available configurations.
For example, the path along nodes 0 → 1 → 7 → 22 → 42 → 57 systematically removes the high frequency configurations.

As we have seen, the low-dimensional elements of the Hasse diagram are models characterizing the behavioral modes of the model, which we now interpret. Moving from the greatest face to a first-order boundary (i.e., any node from 1 to 6) corresponds to a limiting case in which $J_k \to \pm\infty$, i.e., a zero-temperature limit. In these limits, many of the possible configurations are "frozen out" so that the models corresponding to the nodes 1 through 6 have between 21 and 65 structurally distinct configurations with nonzero probability (rather than the 432 in the complete model). Although this is a low-temperature limit, the remaining configurations are not limited to the ground states. Rather, the discarded configurations are those that are first to become irrelevant at low temperatures.

Moving down the Hasse diagram results in similar limits. At each level, more and more configurations are removed. Each level removes the next group of configurations least relevant at low temperatures. The final result of this process is that only the ground states remain for models of dimension 0. Each vertex corresponds to a model with only a ground state configuration. These ground states are illustrated in Fig. 18.

The process of systematically removing high-energy states from a model is reminiscent of a renormalization group procedure. This analogy is reinforced by the observation that the sequence of approximations corresponding to the path labeled 0 → 1 → 7 → 22 → 42 → 57 removes configurations in order of their Fourier frequencies, from highest to lowest. However, for many parameter values, high-frequency configurations do not correspond to high energy, the anti-ferromagnet being the seminal example.
For the anti-ferromagnet, the checkerboard configuration is a ground state (node 62).

The topological structure of the model identifies the sequence of configurations least relevant and leads to a series of approximate effective models in which these configurations are removed. The superficial similarity to a renormalization group procedure alluded to here will be explored in more detail elsewhere.

We now consider a similar model, a cluster expansion of a binary alloy on an FCC lattice, and ask: what are the possible ground state configurations for models with different numbers of clusters? This model takes the same basic form as before:

$$H(s) = \sum_i \Pi_i(s)\, \theta_i, \tag{21}$$

where $s$ is the configuration of the two atom types on the lattice and $\Pi_i(s)$ are the energy contributions of a cluster expansion. The parameters then are the contributions to the energy from nearest neighbor interactions, next nearest neighbor interactions, etc., but can also include many-body interactions. For a real alloy, one expects that the true energy involves contributions from all of these terms. However, including all orders of the cluster expansion is both impractical and not theoretically enlightening. In practice, the sum is truncated after a finite number of clusters. How do the possible ground states depend on this truncation?

Contributions to the total energy will be dominated by a few large terms in the sum (e.g., binary interactions of nearby neighbors) while other terms are less important (many-body interactions of distant neighbors). However, the question of which are the relevant parameters to include is more complicated than simply identifying which parameters are small. In particular, some configurations may be unstable to small perturbations in other parameters.
Identifying an appropriate model involves identifying the appropriate parameter combinations that produce ground states that are stable with respect to perturbations in the omitted parameters.

B. Topological embedding identifies relevant and irrelevant parameters

We now consider a sequence of models defined in Eq. (21) found by truncating the series at different numbers of parameters. For a binary alloy on an FCC lattice, the manifold for a two parameter model with both nearest and second nearest neighbor two-body interactions is a pentagon, illustrated in Figure 19. We now consider the effect of truncating this model to a one parameter model. We do this by setting each of the two parameters to zero in turn and varying the other parameter. These one parameter models are illustrated by the red and blue curves in Figure 19.

FIG. 19: Two parameter FCC lattice model. This model is topologically a pentagon. Corners are indicated by white dots. Two sub-manifolds (red and blue lines) correspond to the models in which one of the two parameters is fixed to zero. The blue curve connects two ground states for the two parameter system, indicating that the ground states for this one parameter model are stable to small perturbations in the second parameter. The red curve, on the other hand, ends on an edge of the two parameter model, indicating that the corresponding ground state is unstable to small perturbations in the second parameter. The topological embedding of these models is summarized by the hierarchical graph in Figure 20. The cartoon inset summarizes the topological relationship among the three manifolds.

The corners of the two parameter model are illustrated by white dots in Figure 19. These dots represent the emergent model classes and behavioral modes of the two parameter model (in this case the distinct ground states of the alloy).
Because the one parameter models are each included as special cases of the two parameter model, their topology is embedded within the topology of the two parameter model. We summarize the relationship between the topologies of the two parameter model and each of the one parameter models by the dashed lines in Figure 20. For example, because each of the red and blue curves is a subset of the full two parameter model, we draw a dashed line from the greatest face of the two parameter graph to the greatest face of each of the one parameter diagrams. Furthermore, because both vertices of the blue curve correspond to corners of the full model, we connect them to the appropriate vertices of the two-dimensional diagram with dashed lines with two arrows to indicate equivalence (Fig. 20, top). In contrast, only one vertex of the red curve is also a vertex of the two parameter model.

FIG. 20: Topological embedding summarized by Hasse diagrams. The two parameter model visualized in Figure 19 is topologically a pentagon summarized by the Hasse diagram on the right. The two models of one parameter (red and blue curves in Figure 19) are line segments (Hasse diagrams on the left) that are embedded in the pentagon. Embedding relationships are indicated by dashed lines across Hasse diagrams. The ground states of the blue curve ($\theta_2 = 0$, above) are stable to small perturbations in $\theta_2$ and are also ground states of the two parameter model (above). Dashed lines with two arrows indicate that nodes are equivalent. One ground state of the red curve ($\theta_1 = 0$) is stable to small perturbations in $\theta_1$ and is likewise a ground state of the two parameter model. The second ground state (colored black) is unstable because it is a subset of an edge of the two parameter model, indicated by a dashed line with one arrow. This ground state will be unstable to small perturbations in $\theta_1$.
The other vertex intersects an edge of the two parameter model. We indicate that this vertex is a subset of an edge by a dashed line connecting the relevant nodes of the graph (Fig. 20, bottom).

To summarize, the vertices of the red and blue curves in Figure 19 represent the ground states for the one parameter models. One of the ground states for the red curve is not a ground state for the two parameter model, and using this simple model would incorrectly predict the existence of a stable structure corresponding to this vertex. The reason for this is not that the missing parameter is large in an actual alloy system, but that this configuration is unstable to arbitrarily small perturbations in this parameter.

This instability is manifest in the way the one parameter topology is embedded in that of the two parameter model. Since the vertex of the one parameter model lies on an edge and not a vertex of the two parameter model, the second parameter must be tuned to realize this behavior. If a model class is unstable to the addition of a new parameter, we classify this parameter as relevant for modeling the behavior. Similarly, if a model class is stable to the introduction of a new parameter, we classify the parameter as irrelevant. Qualitatively, relevant parameters are those that must be tuned to realize a behavior.

The terms stable/unstable and relevant/irrelevant are used in analogy with similar definitions in RG analysis. The present discussion has revolved around finding ground states for alloy models (i.e., phase transitions) in order to make the connection to these standard definitions more transparent. An RG fixed point with no unstable direction (i.e., a sink) corresponds to a bulk phase [22, 23] because the system behavior is stable to variations in the parameters. Near a fixed point, relevant parameters are those that need to be tuned to realize a phase transition or a critical point.
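The stable/unstable and relevant/irrelevant classification described above for the FCC example can be sketched in a few lines. This is only an illustrative sketch, not the authors' implementation: the `Face` class, the functions `polygon_hasse` and `classify`, and the node labels (`v0`, `e3`, and so on) are hypothetical names introduced here. Only the pattern is taken from the text: the blue curve ends on two corners of the pentagon, and the red curve ends on one corner and one edge.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Face:
    name: str
    dim: int  # 0 = corner, 1 = edge, 2 = interior of the polygon


def polygon_hasse(n):
    """Faces and cover relations (face -> faces one dimension lower) of an n-gon."""
    corners = [Face(f"v{i}", 0) for i in range(n)]
    edges = [Face(f"e{i}", 1) for i in range(n)]
    top = Face("interior", 2)
    covers = {top: set(edges)}
    for i, edge in enumerate(edges):
        covers[edge] = {corners[i], corners[(i + 1) % n]}
    faces = {f.name: f for f in corners + edges + [top]}
    return faces, covers


def classify(embedding, faces):
    """Label each endpoint of a one parameter model.

    An endpoint that maps to a corner of the larger model is a stable ground
    state, so the omitted parameter is irrelevant for it. An endpoint that maps
    to a higher-dimensional face is unstable, so the omitted parameter is relevant.
    """
    labels = {}
    for endpoint, target in embedding.items():
        on_corner = faces[target].dim == 0
        labels[endpoint] = ("stable", "irrelevant") if on_corner else ("unstable", "relevant")
    return labels


faces, covers = polygon_hasse(5)

# Hypothetical node labels: which corners and which edge the curves touch is
# not taken from the paper, only the corner/corner versus corner/edge pattern.
blue = {"blue_end_1": "v0", "blue_end_2": "v2"}
red = {"red_end_1": "v1", "red_end_2": "e3"}

blue_labels = classify(blue, faces)
red_labels = classify(red, faces)
print(blue_labels)
print(red_labels)
```

The dictionary passed to `classify` plays the role of the dashed lines in Figure 20: each entry records which node of the larger Hasse diagram an endpoint of the smaller model is embedded in.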
The language of information topology allows the same notions of stability of system behavior to extend to other diverse systems. In this way, the question of what representation is appropriate for a particular physical system can be answered in a systematic way.

VI. DISCUSSION

In this paper we have developed a mathematical language for exploring the informational relationships between models and observations. A cornerstone of this formalism is the distinction between an abstract model and a statistical model. The former refers to a collection of physical principles encoded in a parameterization, such as chemical reaction rates, and a set of rules for making quantifiable predictions, such as the law of mass action. Application of these physical principles to specific experimental conditions leads to precise, quantifiable predictions that we refer to as the statistical model. In other words, a statistical model is a realization of an abstract model for specific observations, so that there are many statistical models that share a common parameter space.

The distinguishability of predictions for different parameter values induces a metric on the parameter space. Varying the observations changes the statistical model, which in turn changes the geometric properties of the parameter space, such as distance and curvature. However, if we consider the observations for which the parameterization remains structurally identifiable (at least in the local sense), then the resulting differential topology characterizes the abstract model. We refer to the subsequent topological analysis as Information Topology.

We have seen that for many models, topological properties can be visualized as a hierarchical graph known as a Hasse diagram. A Hasse diagram naturally reveals the hierarchical relationship among the model's parameters.
We identify the low-dimensional nodes of the Hasse diagram as emergent model classes that describe the behavioral modes of the system. The complete system behavior is understood as a combination of these modes. Approximate effective models are systematically constructed from nodes of lower dimension in the Hasse diagram. This procedure is made explicit by the manifold boundary approximation method [21].

For hierarchical models, we have shown how the topology of small models is embedded in the larger model. In this way, we characterize behavioral modes of the model as being either stable or unstable to variations in other parameter values. This distinction leads naturally to the classification of parameters as either relevant or irrelevant depending on whether or not they need to be tuned to realize a behavior. This classification is motivated by the similar classification originating in renormalization group analysis.

The framework presented in this paper provides new tools for understanding the relationship between abstract and statistical models and their manifest behaviors. The problem of mathematical modeling in complex systems is an important one that spans many disciplines. We anticipate that the concept of an information topology will be useful for studying systems in statistical mechanics, biology, chemistry, engineering, climate, and economics, among others. Application of the concepts in this paper may also be useful for problems related to experimental design, engineering and control, as well as providing a deeper understanding and explanation of the emergent physical principles that govern behavior in complex systems.

The authors thank Lei Huang, Jim Sethna, and Chris Myers for suggesting Figure 1 and Kolten Barfuss and Alexander Shumway for helpful conversations.

[1] E. Wigner: Wigner, EP op. cit (1995) 534
[2] P. W. Anderson, et al.: Science 177 (1972) 393
[3] B. B. Machta, R. Chachra, M. K. Transtrum, J. P. Sethna: Science 342 (2013) 604
[4] C. R. Rao: Sankhya: The Indian Journal of Statistics 9 (1949) 246
[5] E. Beale: Journal of the Royal Statistical Society. Series B (Methodological) (1960) 41
[6] D. M. Bates, D. G. Watts: Journal of the Royal Statistical Society. Series B (Methodological) (1980) 1
[7] S.-i. Amari: Differential-geometrical methods in statistics: Springer (1985)
[8] S.-I. Amari, O. E. Barndorff-Nielsen, R. Kass, S. Lauritzen, C. Rao: Lecture Notes-Monograph Series (1987) i
[9] R. E. Kass: Statistical Science (1989) 188
[10] M. K. Murray, J. W. Rice: Differential geometry and statistics, vol. 48: CRC Press (1993)
[11] S.-i. Amari, H. Nagaoka: Methods of information geometry, vol. 191: American Mathematical Soc. (2007)
[12] M. K. Transtrum, B. B. Machta, J. P. Sethna: Physical Review Letters 104 (2010) 060201
[13] M. K. Transtrum, B. B. Machta, J. P. Sethna: Physical Review E 83 (2011) 036701
[14] K. S. Brown, J. P. Sethna: Physical Review E 68 (2003) 021904
[15] K. S. Brown, C. C. Hill, G. A. Calero, C. R. Myers, K. H. Lee, J. P. Sethna, R. A. Cerione: Physical Biology 1 (2004) 184
[16] S. L. Frederiksen, K. W. Jacobsen, K. S. Brown, J. P. Sethna: Physical Review Letters 93 (2004) 165501
[17] J. J. Waterfall, F. P. Casey, R. N. Gutenkunst, K. S. Brown, C. R. Myers, P. W. Brouwer, V. Elser, J. P. Sethna: Physical Review Letters 97 (2006) 150601
[18] R. N. Gutenkunst, J. J. Waterfall, F. P. Casey, K. S. Brown, C. R. Myers, J. P. Sethna: PLoS Computational Biology 3 (2007) e189
[19] F. P. Casey, D. Baird, Q. Feng, R. N. Gutenkunst, J. J. Waterfall, C. R. Myers, K. S. Brown, R. A. Cerione, J. P. Sethna: IET Systems Biology 1 (2007) 190
[20] B. C. Daniels, Y.-J. Chen, J. P. Sethna, R. N. Gutenkunst, C. R. Myers: Current Opinion in Biotechnology 19 (2008) 389
[21] M. K. Transtrum, P. Qiu: Physical Review Letters 113 (2014) 098701
[22] N. Goldenfeld (1992)
[23] J. Zinn-Justin: Phase transitions and renormalization group: Oxford University Press (2007)
[24] M. C. Eisenberg, S. L. Robertson, J. H. Tien: Journal of Theoretical Biology 324 (2013) 84
[25] T. J. Rothenberg: Econometrica: Journal of the Econometric Society (1971) 577
[26] C. Cobelli, J. J. Distefano 3rd: American Journal of Physiology-Regulatory, Integrative and Comparative Physiology 239 (1980) R7
[27] D. Faller, U. Klingmüller, J. Timmer: Simulation 79 (2003) 717
[28] K.-H. Cho, S.-Y. Shin, W. Kolch, O. Wolkenhauer: Simulation 79 (2003) 726
[29] E. Balsa-Canto, A. A. Alonso, J. R. Banga: IET Systems Biology 2 (2008) 163
[30] J. F. Apgar, J. E. Toettcher, D. Endy, F. M. White, B. Tidor: PLoS Computational Biology 4 (2008) e30
[31] J. F. Apgar, D. K. Witmer, F. M. White, B. Tidor: Molecular BioSystems 6 (2010) 1890
[32] K. Erguler, M. P. Stumpf: Molecular BioSystems 7 (2011) 1593
[33] R. Chachra, M. K. Transtrum, J. P. Sethna: Molecular BioSystems 7 (2011) 2522
[34] M. K. Transtrum, P. Qiu: BMC Bioinformatics 13 (2012) 181
[35] M. K. Transtrum: arXiv preprint (2016)
[36] B. Grunbaum, V. Klee, M. A. Perles, G. C. Shephard: Convex polytopes: Springer (1967)
[37] G. M. Ziegler: Lectures on polytopes, vol. 152: Springer Science & Business Media (1995)
[38] A. Brondsted: An introduction to convex polytopes, vol. 90: Springer Science & Business Media (2012)

Supplemental Information

A. Visualization Methods

Most examples throughout the text were chosen to consist of either two or three parameters, corresponding to manifolds of two or three dimensions, so that their topological structure could be more easily visualized. However, these manifolds are embedded in a data space of often much higher dimension.
We therefore project this high-dimensional embedding space into a three dimensional subspace in order to generate the figures found in the main text.

In order to generate low-dimensional projections, we first constructed a grid of the parameter space and evaluated the model prediction at each point. The resulting grid of model predictions corresponds to a grid of vectors in the high-dimensional embedding space. We then performed a principal component analysis of this grid of prediction vectors and projected the grid onto each of the principal components. We then created the visualization by truncating all but the first three principal components. For two-dimensional manifolds, we have colored the manifold according to the fourth principal component, effectively visualizing a four-dimensional embedding. For three-dimensional manifolds, i.e., the enzyme-substrate model, we have colored each boundary of the volume a solid color to help illustrate the topological structure.

B. Embedding space for non least-squares models

We have used the Fisher Information Matrix (FIM) as the Riemannian metric for measuring distances between models. When the model in question is a least-squares model, there is a natural Euclidean metric in behavior space corresponding to the sum of squares defined by the $\chi^2$ cost function. This Euclidean metric induces a non-Euclidean metric on the embedded model that is equivalent to the Fisher Information [4–13].

A similar Euclidean embedding space can be found for an arbitrary probabilistic model $P(\xi, \theta)$. We derive this metric for the case of discrete probability distributions, but the result also holds for probability densities. This embedding space can be found by constructing the variable $z_i(\theta) = 2\sqrt{P_i(\theta)}$, where the index $i$ labels each configuration of the random variable $\xi$. Assuming a Euclidean metric on the vector $z$ gives

$$ds^2 = \sum_i dz_i^2 = \sum_i \frac{dP_i^2}{P_i} = \left\langle \frac{dP^2}{P^2} \right\rangle, \tag{22}$$

where in the last term we have written the sum as an expectation value. Note that

$$\frac{dP}{P} = d \log P = \sum_\mu \frac{\partial \log P}{\partial \theta^\mu} \, d\theta^\mu, \tag{23}$$

from which it follows that

$$ds^2 = \sum_{\mu\nu} \left\langle \frac{\partial \log P}{\partial \theta^\mu} \frac{\partial \log P}{\partial \theta^\nu} \right\rangle d\theta^\mu \, d\theta^\nu, \tag{24}$$

so that the induced metric on the manifold is

$$g_{\mu\nu} = \left\langle \frac{\partial \log P}{\partial \theta^\mu} \frac{\partial \log P}{\partial \theta^\nu} \right\rangle, \tag{25}$$

which is the Fisher Information Metric.

Notice that normalization requires that $\sum_i z_i^2 = 4 \sum_i P_i = 4$, so that the variables $z_i$ correspond to a hyper-sphere embedded in a Euclidean data space. The model manifold is then embedded in this hyper-sphere. We interpret this Euclidean, hyper-sphere embedding in the same way as the Euclidean embedding space for least squares models, i.e., as a data or behavior space.

C. Topology and alternate metrics

In the main text of the paper, we have used the Fisher Information Matrix (FIM) as the metric that defines relative distances between different statistical models that are realized from the same abstract model. Since an abstract model is characterized by its parameterization, we found that changing the observations leads to different metrics on the same parameter space. Although the FIM has many desirable statistical properties, it is not the unique measure of statistical distance between probability distributions. Other notable examples include the Hellinger distance, total variation distance, the Levy-Prokhorov metric, Bhattacharyya distance, earth mover's distance (also known as the Wasserstein or Kantorovich metric) and the energy distance. Furthermore, there are many other distance measures that are applicable to specific types of models and not probability distributions generally, such as the $H_\infty$ norm in control theory applicable to dynamical systems.
(While all of these metrics are valid measures of statistical distance, they are not all Riemannian metrics, i.e., they do not define an inner product. Indeed, one of the advantages of the FIM is that it corresponds to a Riemannian metric, thus allowing the use of differential geometry.)

Changing metrics among the many possible choices is in many ways analogous to changing observations. The net effect is to change the meaning of distance for parameter values in the model. As such, the results of this paper can be applied to alternative metrics.

Furthermore, there are many measures of statistical divergence that, while not properly metrics (they may not satisfy either symmetry or the triangle inequality), are often interpreted as a type of distance measure. One class of such divergences is the f-divergences, among which the Kullback-Leibler divergence is a well-known example. (Interestingly, the FIM corresponds to the f-divergence for infinitesimally separated distributions.) Other classes of divergences include the M-divergences and S-divergences. Specific examples include Renyi's divergence and the Jensen-Shannon divergence. Although not properly metrics, statistical divergences do define a topological space. Consequently, the topological properties of an abstract model can be realized using a divergence rather than a proper metric.
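Both remarks about the Kullback-Leibler divergence can be checked numerically: it is not symmetric at finite separation, yet for nearby parameter values it approaches $\tfrac{1}{2} g\, d\theta^2$, where $g$ is the Fisher information. The following is a minimal sketch under stated assumptions: the two-trial binomial model, the parameter values, and the function names are chosen here for illustration and are not taken from the paper.

```python
import numpy as np


def binomial2(theta):
    """Probabilities of 0, 1, 2 successes in two trials (illustrative model)."""
    return np.array([(1 - theta) ** 2, 2 * theta * (1 - theta), theta ** 2])


def kl(p, q):
    """Kullback-Leibler divergence D(p || q) for strictly positive distributions."""
    return float(np.sum(p * np.log(p / q)))


theta = 0.3
p = binomial2(theta)
g = 2.0 / (theta * (1 - theta))  # Fisher information of Binomial(2, theta)

# Finite separation: the divergence depends on the order of its arguments.
far = binomial2(0.6)
forward, backward = kl(p, far), kl(far, p)

# Small separation: the divergence approaches the quadratic form of the FIM.
dtheta = 1e-3
local = kl(p, binomial2(theta + dtheta))
quadratic = 0.5 * g * dtheta ** 2

print(forward, backward)
print(local, quadratic)
```

The residual difference between `local` and `quadratic` is of third order in `dtheta`, so the agreement improves as the separation shrinks.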