Parameter identifiability and redundancy: theoretical considerations


Authors: Mark P. Little, Wolfgang F. Heidenreich, Guangquan Li

1 Department of Epidemiology and Public Health, Imperial College Faculty of Medicine, St Mary's Campus, London, United Kingdom; 2 Institut für Strahlenschutz, Helmholtz Zentrum München, German Research Center for Environmental Health, Ingolstädter Landstraße, Neuherberg, Germany

Abstract

Background: Models for complex biological systems may involve a large number of parameters. It may well be that some of these parameters cannot be derived from observed data via regression techniques. Such parameters are said to be unidentifiable, the remaining parameters being identifiable. Closely related to this idea is that of redundancy, that a set of parameters can be expressed in terms of some smaller set. Before data are analysed it is critical to determine which model parameters are identifiable or redundant, to avoid ill-defined and poorly convergent regression.

Methodology/Principal Findings: In this paper we outline general considerations on parameter identifiability, and introduce the notions of weak local identifiability and gradient weak local identifiability. These are based on local properties of the likelihood, in particular the rank of the Hessian matrix. We relate these to the notions of parameter identifiability and redundancy previously introduced by Rothenberg (Econometrica 39 (1971) 577–591) and Catchpole and Morgan (Biometrika 84 (1997) 187–196). Within the widely used exponential family, parameter irredundancy, local identifiability, gradient weak local identifiability and weak local identifiability are shown to be largely equivalent. We consider applications to a recently developed class of cancer models of Little and Wright (Math Biosciences 183 (2003) 111–134) and Little et al. (J Theoret Biol 254 (2008) 229–238) that generalize a large number of other recently used quasi-biological cancer models.
Conclusions/Significance: We have shown that the previously developed concepts of parameter local identifiability and redundancy are closely related to the apparently weaker properties of weak local identifiability and gradient weak local identifiability; within the widely used exponential family these concepts largely coincide.

Citation: Little MP, Heidenreich WF, Li G (2010) Parameter Identifiability and Redundancy: Theoretical Considerations. PLoS ONE 5(1): e8915. doi:10.1371/journal.pone.0008915

Editor: Fabio Rapallo, University of East Piedmont, Italy

Received November 23, 2009; Accepted January 8, 2010; Published January 27, 2010

Copyright: © 2010 Little et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: This work was funded partially by the European Commission under contracts FI6R-CT-2003-508842 (RISC-RAD) and FP6-036465 (NOTE). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing Interests: The authors have declared that no competing interests exist.

* E-mail: mark.little@imperial.ac.uk

Introduction

Models for complex biological systems may involve a large number of parameters. It may well be that some of these parameters cannot be derived from observed data via regression techniques. Such parameters are said to be unidentifiable or non-identifiable, the remaining parameters being identifiable. Closely related to this idea is that of redundancy, that a set of parameters can be expressed in terms of some smaller set. Before data are analysed it is critical to determine which model parameters are identifiable or redundant, to avoid ill-defined and poorly convergent regression.
Identifiability in stochastic models has been considered previously in various contexts. Rothenberg [1] and Silvey [2] (pp. 50, 81) defined a set of parameters for a model to be identifiable if no two sets of parameter values yield the same distribution of the data. Catchpole and Morgan [3] considered identifiability and parameter redundancy and the relations between them in a general class of (exponential family) models. Rothenberg [1], Jacquez and Perry [4] and Catchpole and Morgan [3] also defined a notion of local identifiability, which we shall extend in the Analysis Section. [There is also a large literature on identifiability in deterministic (rather than stochastic) models, for example the papers of Audoly et al. [5] and Bellu et al. [6], which we shall not consider further.] Catchpole et al. [7] and Gimenez et al. [8] outlined the use of computer algebra techniques to determine numbers of identifiable parameters in the exponential family. Viallefont et al. [9] considered parameter identifiability issues in a general setting, and outlined a method based on considering the rank of the Hessian for determining identifiable parameters; however, some of their claimed results are incorrect (as we outline briefly later). Gimenez et al. [8] used Hessian-based techniques, as well as a number of purely numerical techniques, for determining the number of identifiable parameters. Further general observations on parameter identifiability and its relation to properties of sufficient statistics are given by Picci [10], and a more recent review of the literature is given by Paulino and de Bragança Pereira [11]. In this paper we outline some general considerations on parameter identifiability.
We shall demonstrate that the concepts of parameter local identifiability and redundancy are closely related to apparently weaker properties of weak local identifiability and gradient weak local identifiability that we introduce in the Analysis Section. These latter properties relate to the uniqueness of likelihood maxima and likelihood turning points within the vicinity of sets of parameter values, and are shown to be based on local properties of the likelihood, in particular the rank of the Hessian matrix. Within the widely-used exponential family we demonstrate that these concepts (local identifiability, redundancy, weak local identifiability, gradient weak local identifiability) largely coincide. We briefly consider applications of all these ideas to a recently developed general class of carcinogenesis models [12,13,14], presenting results that generalize those of Heidenreich [15] and Heidenreich et al. [16] in the context of the two-mutation cancer model [17]. These are outlined in the later parts of the Analysis and the Discussion, and in more detail in a companion paper [12].

Analysis

General Considerations on Parameter Identifiability

As outlined in the Introduction, a general criterion for parameter identifiability has been set out by Jacquez and Perry [4]. They proposed a simple linearization of the problem, in the context of models with normal error. They defined a notion of local identifiability, which is that in a local region of the parameter space there is a unique $\theta_0$ that fits some specified body of data, $(x_i, y_i)_{i=1}^n$, i.e., for which the model-predicted mean $h(x|\theta)$ is such that the residual sum of squares:

$$S = \sum_{l=1}^{n} \left[ y_l - h(x_l|\theta) \right]^2 \qquad (1)$$

has a unique minimum. We present here a straightforward generalization of this to other error structures.
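The linearized criterion can be probed numerically: with unit weights, rank deficiency of the Gauss-Newton matrix $J^T J$, where $J_{lj} = \partial h(x_l|\theta)/\partial \theta_j$, signals that the residual sum of squares (1) has no unique local minimum near $\theta$. The following sketch is ours, not from the paper; both toy models are purely illustrative.

```python
# Numerical sketch (ours, not from the paper): probe local identifiability of
# a least-squares model through the rank of the Gauss-Newton matrix J^T J,
# where J_lj = dh(x_l|theta)/dtheta_j.  Both toy models are illustrative.
import numpy as np

def jacobian(h, x, theta, eps=1e-6):
    """Forward-difference Jacobian of the model predictions w.r.t. theta."""
    base = h(x, theta)
    J = np.empty((len(x), len(theta)))
    for j in range(len(theta)):
        tp = theta.copy()
        tp[j] += eps
        J[:, j] = (h(x, tp) - base) / eps
    return J

def gn_rank(h, x, theta):
    """Rank of J^T J; the relative tolerance absorbs finite-difference noise."""
    J = jacobian(h, x, theta)
    JtJ = J.T @ J
    return np.linalg.matrix_rank(JtJ, tol=1e-8 * np.linalg.norm(JtJ, 2))

x = np.linspace(0.1, 1.0, 20)
theta = np.array([2.0, 3.0])

# Mean depends on theta_1, theta_2 only through their product: one flat direction.
h_redundant = lambda x, t: t[0] * t[1] * x
# Two separately estimable coefficients: full rank.
h_full = lambda x, t: t[0] * x + t[1] * x**2

print(gn_rank(h_redundant, x, theta))  # 1: not locally identifiable
print(gn_rank(h_full, x, theta))       # 2: unique local minimum of S
```

In the redundant model any rescaling $\theta_1 \to c\theta_1$, $\theta_2 \to \theta_2/c$ leaves every prediction unchanged, so one direction in parameter space leaves $S$ constant and the rank drops below $p = 2$.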
If the model prediction $h(x) = h(x|\theta)$ for the observed data $y$ is a function of some vector of parameters $\theta = (\theta_j)_{j=1}^p$, then in general it can be assumed, under the general equivalence of likelihood maximization and iteratively reweighted least squares for generalized linear models [18] (chapter 2), that one is trying to minimize:

$$S = \sum_{l=1}^{n} \frac{1}{v_l} \left[ y_l - h(x_l|\theta_0) - \sum_{j=1}^{p} \left. \frac{\partial h(x_l|\theta)}{\partial \theta_j} \right|_{\theta=\theta_0} \cdot \Delta\theta_j \right]^2 \qquad (2)$$

where $y_l$ $(1 \le l \le n)$ $(n \ge p)$ is the observed measurement (e.g., the number of observed cases in the case of binomial or Poisson models) at point $l$, and the $v_l$ $(1 \le l \le n)$ are the current estimates of variance at each point. This has a unique minimum in the perturbing $\Delta\theta = (\Delta\theta_j)_{j=1}^p$ ($\theta = \theta_0 + \Delta\theta$) given by $H^T D H \Delta\theta = H^T D d$, where $d = (d_l)_{l=1}^n = (y_l - h(x_l|\theta_0))_{l=1}^n$, $H = (H_{lj})_{l=1,j=1}^{n,p} = \left( \left. \frac{\partial h(x_l|\theta)}{\partial \theta_j} \right|_{\theta=\theta_0} \right)_{l=1,j=1}^{n,p}$ and $D = \mathrm{diag}[1/v_1, 1/v_2, \ldots, 1/v_n]$, whenever $H^T D H$ has full rank ($=p$).

More generally, suppose that the likelihood associated with observation $x_l$ is $l(x_l|\theta)$ and let $L(x_l|\theta) = \ln[l(x_l|\theta)]$. Then, generalizing the least squares criterion (1), we now extend the definition of local identifiability to mean that there is at most one maximum of:

$$L = L(x|\theta) = \sum_{l=1}^{n} L(x_l|\theta) \qquad (3)$$

in the neighborhood of any given $\theta \in \Omega \subseteq \mathbf{R}^p$. More formally:

Definitions 1. A set of parameters $(\theta_i)_{i=1}^p$ is identifiable if for any $\theta \in \Omega$ there is no $\delta \in \Omega \setminus \{\theta\}$ for which $L(x|\delta) = L(x|\theta)$ for $x$ almost everywhere (a.e.). A set of parameters $(\theta_i)_{i=1}^p$ is locally identifiable if there exists a neighborhood $N \ni \theta$ such that for no $\delta \in N \setminus \{\theta\}$ is $L(x|\delta) = L(x|\theta)$ ($x$ a.e.).
A set of parameters $(\theta_i)_{i=1}^p$ is weakly locally identifiable if there exists a neighborhood $N \ni \theta$ and data $x = (x_1, \ldots, x_n) \in S^n$ such that the log-likelihood $L = L(x|\theta) = \sum_{l=1}^n L(x_l|\theta)$ is maximized by at most one set of $\hat{\theta} \in N$. If $L = L(x|\theta)$ is $C^1$ as a function of $\theta \in \Omega$, a set of parameters $(\theta_i)_{i=1}^p \in \mathrm{int}(\Omega)$ is gradient weakly locally identifiable if there exists a neighborhood $N \ni \theta$ and data $x = (x_1, \ldots, x_n) \in S^n$ such that $\left( \frac{\partial L(x|\hat{\theta})}{\partial \hat{\theta}_i} \right)_{i=1}^p = 0$ (i.e., $\hat{\theta}$ is a turning point of $L(x|\theta)$) for at most one set of $\hat{\theta} \in N$.

Our definitions of identifiability and local identifiability coincide with those of Rothenberg [1], Silvey [2] (pp. 50, 81) and Catchpole and Morgan [3]. Rothenberg [1] proved that if the Fisher information matrix, $I = I(\theta)$, in a neighborhood of $\theta \in \mathrm{int}(\Omega)$ is of constant rank and satisfies various other more minor regularity conditions, then $\theta \in \mathrm{int}(\Omega)$ is locally identifiable if and only if $I(\theta)$ is non-singular. Clearly identifiability implies local identifiability, which in turn implies weak local identifiability. By the Mean Value Theorem [19] (p. 107), gradient weak local identifiability implies weak local identifiability. Heuristically, (gradient) weak local identifiability happens when:

$$0 = \frac{\partial L}{\partial \theta_i} = \sum_{l=1}^{n} \frac{\partial L(x_l|\theta)}{\partial \theta_i} = \sum_{l=1}^{n} \left[ \frac{\partial L(x_l|\theta_0)}{\partial \theta_i} + \sum_{j=1}^{p} \frac{\partial^2 L(x_l|\theta_0)}{\partial \theta_i \partial \theta_j} \cdot \Delta\theta_j \right] + O(|\Delta\theta|^2), \quad 1 \le i \le p \qquad (4)$$

and in general this system of $p$ equations has a unique solution in $\Delta\theta = (\Delta\theta_j)_{j=1}^p$ in the neighborhood of $\theta_0$ (assumed $\in \mathrm{int}(\Omega)$) whenever $\left( \sum_{l=1}^{n} \frac{\partial^2 L(x_l|\theta_0)}{\partial \theta_i \partial \theta_j} \right)_{i,j=1}^p$ has full rank ($=p$). This turns out to be (nearly) the case, and will be proved later (Corollary 2). More rigorously, we have the following result.

Theorem 1. Suppose that the log-likelihood $L(x|\theta)$ is $C^2$ as a function of the parameter vector $\theta \in \Omega \subseteq \mathbf{R}^p$, for all $x = (x_1, \ldots, x_n) \in S^n$.
(i) Suppose that for some $x$ and $\theta \in \mathrm{int}(\Omega)$ it is the case that $\mathrm{rk}\left[ \left( \frac{\partial^2 L(x|\theta)}{\partial \theta_i \partial \theta_j} \right)_{i,j=1}^p \right] = p$. Then turning points of the likelihood in the neighborhood of $\theta$ are isolated, i.e., there is an open neighborhood $N \ni \theta$, $N \subseteq \Omega$, for which there is at most one $\hat{\theta} \in N$ that satisfies $\left. \left( \frac{\partial L(x|\theta)}{\partial \theta_i} \right)_{i=1}^p \right|_{\theta=\hat{\theta}} = 0$.

(ii) Suppose that for some $x$ and $\theta \in \mathrm{int}(\Omega)$ it is the case that $\mathrm{rk}\left[ \left( \frac{\partial^2 L(x|\theta)}{\partial \theta_i \partial \theta_j} \right)_{i,j=1}^p \right] = p$. Then local maxima of the likelihood in the neighborhood of $\theta$ are isolated, i.e., there is an open neighborhood $N \ni \theta$, $N \subseteq \Omega$, for which there is at most one $\hat{\theta} \in N$ that is a local maximum of $L(x|\theta)$.

(iii) Suppose that for some $x$ and all $\theta \in \mathrm{int}(\Omega)$ it is the case that $\mathrm{rk}\left[ \left( \frac{\partial^2 L(x|\theta)}{\partial \theta_i \partial \theta_j} \right)_{i,j=1}^p \right] = r < p$. Then all local maxima of the likelihood in $\mathrm{int}(\Omega)$ are not isolated, as indeed are all $\theta \in \mathrm{int}(\Omega)$ for which $\left( \frac{\partial L(x|\theta)}{\partial \theta_i} \right)_{i=1}^p = 0$.

We prove this result in Text S1 Section A. As an immediate consequence we have the following result.

Corollary 1. For a given $x = (x_1, \ldots, x_n) \in S^n$, a sufficient condition for the likelihood (3) to have at most one maximum and one turning point in the neighborhood of a given $\theta = (\theta_1, \ldots, \theta_p) \in \mathrm{int}(\Omega)$ is that $\mathrm{rk}\left[ \left( \frac{\partial^2 L(x|\theta)}{\partial \theta_i \partial \theta_j} \right)_{i,j=1}^p \right] = p$. In particular, if this condition is satisfied, $\theta$ is gradient weakly locally identifiable (and therefore weakly locally identifiable). ($\Omega \subseteq \mathbf{R}^p$ is the parameter space.)

That this condition is not necessary is seen by consideration of the likelihood $l(x|\theta) = C \cdot \exp\left( -\sum_{i=1}^{p} [x_i - \theta_i]^4 \right)$, where $C$ is chosen so that this has unit mass. Then $\frac{\partial^2 L(x|\theta)}{\partial \theta_i \partial \theta_j} = -12 \cdot [x_i - \theta_i]^2 \cdot \delta_{ij}$, which has rank 0 at $\theta = x$ and a unique maximum there. In particular, this shows that the result claimed by Viallefont et al. [9] (proposition 2, p. 322) is incorrect.

Definitions 2. A subset of parameters $(\theta_{\pi(i)})_{i=1}^k$ (for some permutation $\pi: \{1,2,\ldots,p\} \to \{1,2,\ldots,p\}$) is weakly maximal (respectively weakly gradient maximal) if for any permissible fixed $(\theta_{\pi(i)})_{i=k+1}^p$ (such that $\Omega^{k,p}_{(\theta_{\pi(i)})_{i=k+1}^p} = \left\{ (\theta_{\pi(i)})_{i=1}^k : (\theta_1, \ldots, \theta_k, \theta_{k+1}, \ldots, \theta_p) \in \Omega \right\} \ne \emptyset$) $(\theta_{\pi(i)})_{i=1}^k$ is weakly locally identifiable (respectively gradient weakly locally identifiable) at that point, but that this is not the case for any larger number of parameters. A subset of parameters $(\theta_{\pi(i)})_{i=1}^k$ is strongly maximal (respectively strongly gradient maximal) if for any permissible fixed $(\theta_{\pi(i)})_{i=k+1}^p$ and any open $U \subseteq \Omega^{k,p}_{(\theta_{\pi(i)})_{i=k+1}^p}$, $(\theta_{\pi(i)})_{i=1}^k$ restricted to the set $U$ is weakly maximal (respectively weakly gradient maximal), i.e., all $(\theta^0_{\pi(i)})_{i=1}^k \in U$ are weakly maximal (respectively weakly gradient maximal).

From this it easily follows that a strongly (gradient) maximal set of parameters $(\theta_{\pi(i)})_{i=1}^k$ is a fortiori weakly (gradient) maximal at all points $(\theta^0_{\pi(i)})_{i=1}^k \in \Omega^{k,p}_{(\theta_{\pi(i)})_{i=k+1}^p}$ for any permissible $(\theta_{\pi(i)})_{i=k+1}^p$.

Assume now that $k$ of the $p$ $\theta_i$ are a weakly maximal set of parameters. So for some permutation $\pi: \{1,2,\ldots,p\} \to \{1,2,\ldots,p\}$, for any permissible fixed $(\theta_{\pi(i)})_{i=k+1}^p$ and any $(\theta_{\pi(i)})_{i=1}^k \in \Omega^{k,p}_{(\theta_{\pi(i)})_{i=k+1}^p} \subseteq \mathbf{R}^k$ there is an open neighborhood $N \ni (\theta_{\pi(i)})_{i=1}^k$, $N \subseteq \Omega^{k,p}_{(\theta_{\pi(i)})_{i=k+1}^p}$, and some data $x = (x_1, \ldots, x_n) \in S^n$ for which $L_{(\theta_{\pi(i)})_{i=k+1}^p}\left( x \,\middle|\, (\theta_{\pi(i)})_{i=1}^k \right)$ is maximized by at most one set of $(\hat{\theta}_{\pi(i)})_{i=1}^k \in N$, but that this is not the case for any larger number of parameters.
Assume that $r = \max\left\{ \mathrm{rk}\left[ \left( \frac{\partial^2 L_{(\theta_{\pi(i)})_{i=k+1}^p}(x|(\theta_{\pi(i)})_{i=1}^k)}{\partial \theta_{\pi(i)} \partial \theta_{\pi(j)}} \right)_{i,j=1}^k \right] : (\theta_{\pi(i)})_{i=1}^k \in N \right\} < k$. If $L$ is $C^2$ as a function of $\theta$ then it follows easily that

$$\Omega_{k,r} = \left\{ (\theta_{\pi(i)})_{i=1}^k \in N : \mathrm{rk}\left[ \left( \frac{\partial^2 L_{(\theta_{\pi(i)})_{i=k+1}^p}(x|(\theta_{\pi(i)})_{i=1}^k)}{\partial \theta_{\pi(i)} \partial \theta_{\pi(j)}} \right)_{i,j=1}^k \right] = r \right\}$$

must be an open non-empty subset of $N$. By Theorem 1 (iii) any $\hat{\theta} \in \Omega_{k,r}$ which maximizes $L_{(\theta_{\pi(i)})_{i=k+1}^p}$ in $\Omega_{k,r}$ cannot be isolated, a contradiction (unless there are no maximizing $\hat{\theta} \in \Omega_{k,r}$). Therefore, either there are no maximizing $\hat{\theta} \in \Omega_{k,r}$ or for at least one $\hat{\theta} \in N$, $\mathrm{rk}\left[ \left. \left( \frac{\partial^2 L_{(\theta_{\pi(i)})_{i=k+1}^p}(x|(\theta_{\pi(i)})_{i=1}^k)}{\partial \theta_{\pi(i)} \partial \theta_{\pi(j)}} \right)_{i,j=1}^k \right|_{(\theta_{\pi(i)})_{i=1}^k = \hat{\theta}} \right] = k$. This implies that $\mathrm{rk}\left[ \left. \left( \frac{\partial^2 L(x|\theta)}{\partial \theta_i \partial \theta_j} \right)_{i,j=1}^p \right|_{\theta = \hat{\theta}^0} \right] \ge k$, where $\hat{\theta}^0 = (\hat{\theta}) \times (\theta_{\pi(i)})_{i=k+1}^p$ in the obvious sense.

Assume now that the $(\theta_{\pi(i)})_{i=1}^k$ are strongly maximal. Suppose that for some $\theta^1 = (\theta^1_i)_{i=1}^p \in \Omega$ and some $x = (x_1, \ldots, x_n) \in S^n$ it is the case that $\mathrm{rk}\left[ \left. \left( \frac{\partial^2 L(x|\theta)}{\partial \theta_i \partial \theta_j} \right)_{i,j=1}^p \right|_{\theta = \theta^1} \right] > k$. Because $\left. \left( \frac{\partial^2 L(x|\theta)}{\partial \theta_i \partial \theta_j} \right)_{i,j=1}^p \right|_{\theta = \theta^1}$ is symmetric, there is a permutation $\pi': \{1, \ldots, p\} \to \{1, \ldots, p\}$ for which $\mathrm{rk}\left[ \left. \left( \frac{\partial^2 L(x|\theta)}{\partial \theta_{\pi'(i)} \partial \theta_{\pi'(j)}} \right)_{i,j=1}^{k+1} \right|_{\theta = \theta^1} \right] = k+1$ [20] (p. 79). If $L$ is $C^2$ as a function of $\theta$ this will be the case in some open neighborhood $N' \ni (\theta^1_{\pi'(i)})_{i=1}^{k+1}$, $N' \subseteq \mathbf{R}^{k+1}$.
By Theorem 1 (ii) this implies that the parameters $(\theta_{\pi'(i)})_{i=1}^{k+1}$ have at most one maximum in $N'$, so that $(\theta_{\pi(i)})_{i=1}^k$ is not a strongly maximal set of parameters in $N'$. With small changes everything above also goes through with "weakly gradient maximal" substituted for "weakly maximal" and "strongly gradient maximal" substituted for "strongly maximal". Therefore we have proved the following result.

Theorem 2. Let $L(x|\theta)$ be $C^2$ as a function of $\theta \in \Omega \subseteq \mathbf{R}^p$ for all $x \in S^n$.

(i) If there is a weakly maximal (respectively weakly gradient maximal) subset of $k$ parameters, $(\theta_{\pi(1)}, \theta_{\pi(2)}, \ldots, \theta_{\pi(k)})$ (for some permutation $\pi: \{1,2,\ldots,p\} \to \{1,2,\ldots,p\}$), and for fixed $(\theta_{\pi(i)})_{i=k+1}^p$ and some $x = (x_1, \ldots, x_n) \in S^n$ $L_{(\theta_{\pi(i)})_{i=k+1}^p}(x|(\theta_{\pi(i)})_{i=1}^k)$ has a maximum (respectively turning point) on the set of $\theta$ where $\mathrm{rk}\left[ \left( \frac{\partial^2 L_{(\theta_{\pi(i)})_{i=k+1}^p}(x|(\theta_{\pi(i)})_{i=1}^k)}{\partial \theta_{\pi(i)} \partial \theta_{\pi(j)}} \right)_{i,j=1}^k \right]$ is maximal, then $\max\left\{ \mathrm{rk}\left[ \left( \frac{\partial^2 L_{(\theta_{\pi(i)})_{i=k+1}^p}(x|(\theta_{\pi(i)})_{i=1}^k)}{\partial \theta_{\pi(i)} \partial \theta_{\pi(j)}} \right)_{i,j=1}^k \right] : (\theta_{\pi(i)})_{i=1}^k \in \Omega^{k,p}_{(\theta_{\pi(i)})_{i=k+1}^p} \right\} = k$ and $\max\left\{ \mathrm{rk}\left[ \left( \frac{\partial^2 L(x|\theta)}{\partial \theta_i \partial \theta_j} \right)_{i,j=1}^p \right] : \theta \in \Omega \right\} \ge k$.

(ii) If there is a strongly maximal (respectively strongly gradient maximal) subset of $k$ parameters, $(\theta_{\pi(1)}, \theta_{\pi(2)}, \ldots, \theta_{\pi(k)})$ (for some permutation $\pi: \{1,2,\ldots,p\} \to \{1,2,\ldots,p\}$), then $\mathrm{rk}\left[ \left( \frac{\partial^2 L(x|\theta)}{\partial \theta_{\pi(i)} \partial \theta_{\pi(j)}} \right)_{i,j=1}^k \right] \le k$ $\forall \theta \in \Omega$.

All further results in this Section assume that the model is a member of the exponential family, so that if the observed data $x = (x_l)_{l=1}^n \in S^n$ then the log-likelihood is given by $L(x|\theta) = \sum_{l=1}^n \left[ \frac{x_l z_l - b(z_l)}{a(\varphi)} + c(x_l, \varphi) \right]$ for some functions $a(\varphi)$, $b(z)$, $c(x, \varphi)$.
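For concreteness (a worked instance of ours, not spelled out in the paper): a Poisson observation $x$ with mean $\mu$ belongs to this family, since

$$\log P(x|\mu) = x \log\mu - \mu - \log x! = \frac{x z - b(z)}{a(\varphi)} + c(x, \varphi), \qquad z = \log\mu, \quad b(z) = e^{z}, \quad a(\varphi) = 1, \quad c(x, \varphi) = -\log x!,$$

so that $b'(z) = e^{z} = \mu = E[x]$, consistent with the identity $\mu_l = b'(z_l)$ used in the text.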
We assume that the natural parameters $z_l = z_l[(\theta_i)_{i=1}^p, \mathbf{z}_l]$ are functions of the model parameters $(\theta_i)_{i=1}^p$ and some auxiliary data $\mathbf{z}_l$, but that the scaling parameter $\varphi$ is not. Let $\mu_l = b'(z_l) = E[x_l]$, so that $\mu_l = b'(z_l[(\theta_i)_{i=1}^p, \mathbf{z}_l])$. In all that follows we shall assume that the function $b(z)$ is $C^2$. The following definition was introduced by Catchpole and Morgan [3].

Definition 3. With the above notation, a set of parameters $(\theta_i)_{i=1}^p \in \Omega$ is parameter redundant for an exponential family model if $\mu_l = b'(z_l[(\rho_i)_{i=1}^q, \mathbf{z}_l])$ can be expressed in terms of some strictly smaller parameter vector $(\rho_i)_{i=1}^q$ $(q < p)$. Otherwise, the set of parameters $(\theta_i)_{i=1}^p$ is parameter irredundant or full rank.

Catchpole and Morgan [3] proved (their Theorem 1) that a set of parameters is parameter redundant if and only if $\mathrm{rk}\left[ \left( \frac{\partial \mu_l}{\partial \theta_i} \right)_{l=1,i=1}^{n,p} \right] < p$. They defined full rank models to be essentially full rank if $\mathrm{rk}\left[ \left( \frac{\partial \mu_l}{\partial \theta_i} \right)_{l=1,i=1}^{n,p} \right] = p$ for every $(\theta_i)_{i=1}^p \in \Omega$; if $\mathrm{rk}\left[ \left( \frac{\partial \mu_l}{\partial \theta_i} \right)_{l=1,i=1}^{n,p} \right] = p$ only for some $(\theta_i)_{i=1}^p \in \Omega$ then the parameter set is conditionally full rank. They also showed (their Theorem 3) that if $I = I(\theta)$ is the Fisher information matrix then $\mathrm{rk}\left[ \left( \frac{\partial \mu_l}{\partial \theta_i} \right)_{l=1,i=1}^{n,p} \right] = \mathrm{rk}[I(\theta)]$, and that parameter redundancy implies lack of local identifiability; indeed their proof of Theorems 2 and 4 showed that there is also lack of weak local identifiability (respectively gradient weak local identifiability) for all $(\theta_i^0)_{i=1}^p \in \Omega$ which for some $x = (x_l)_{l=1}^n \in S^n$ are local maxima (respectively turning points) of the likelihood. Assume that $\theta = (\theta_i)_{i=1}^p$ is an essentially full rank set of parameters for the model.
From the above result, for every $\theta = (\theta_i)_{i=1}^p \in \Omega$, $\mathrm{rk}\left[ \left( \frac{\partial \mu_l}{\partial \theta_i} \right)_{l=1,i=1}^{n,p} \right] = \mathrm{rk}(I(\theta)) = p$. Therefore, since $E\left[ \frac{\partial^2 L(x|\theta)}{\partial \theta_i \partial \theta_j} \right] = -E\left[ \frac{\partial L(x|\theta)}{\partial \theta_i} \frac{\partial L(x|\theta)}{\partial \theta_j} \right] = -I(\theta)$ is of full rank and so negative definite, by the strong law of large numbers we can choose $x = (x_l)_{l=1}^n \in S^n$ so that the same is true of

$$\frac{\partial^2 L(x|\theta)}{\partial \theta_i \partial \theta_j} = \sum_{l=1}^{n} \left\{ \frac{x_l - b'(z_l)}{a(\varphi)} \frac{\partial^2 z_l}{\partial \theta_i \partial \theta_j} - \frac{b''(z_l)}{a(\varphi)} \frac{\partial z_l}{\partial \theta_i} \frac{\partial z_l}{\partial \theta_j} \right\}.$$

This implies that on some $N \ni \theta$, $N \subseteq \mathbf{R}^p$, $\frac{\partial^2 L(x|\theta)}{\partial \theta_i \partial \theta_j}$ is of full rank, and therefore by Corollary 1 $\theta = (\theta_i)_{i=1}^p$ is (gradient) weakly locally identifiable. Furthermore, the above argument shows that if $\theta = (\theta_i)_{i=1}^p$ is a conditionally full rank set of parameters then on the (open) set $\Omega_p = \left\{ \theta = (\theta_i)_{i=1}^p \in \Omega : \mathrm{rk}\left[ \left( \frac{\partial \mu_l}{\partial \theta_i} \right)_{l=1,i=1}^{n,p} \right] = p \right\}$, $\theta = (\theta_i)_{i=1}^p$ is gradient weakly locally identifiable. We have therefore proved:

Theorem 3. Let $L(x|\theta)$ belong to the exponential family and be $C^2$ as a function of $\theta \in \Omega \subseteq \mathbf{R}^p$ for all $x \in S^n$.

(i) If the parameter set $\theta = (\theta_i)_{i=1}^p$ is parameter redundant then it is not locally identifiable, and is not weakly locally identifiable (respectively gradient weakly locally identifiable) for all $(\theta_i^0)_{i=1}^p \in \Omega$ which for some $x = (x_l)_{l=1}^n \in S^n$ are local maxima (respectively turning points) of the likelihood.

(ii) If the parameter set $\theta = (\theta_i)_{i=1}^p$ is of essentially full rank then for some $x = (x_l)_{l=1}^n \in S^n$ $\frac{\partial^2 L(x|\theta)}{\partial \theta_i \partial \theta_j}$ is of full rank, and therefore $\theta = (\theta_i)_{i=1}^p$ is gradient weakly locally identifiable (and so weakly locally identifiable) for all $\theta = (\theta_i)_{i=1}^p \in \Omega$.
(iii) If the parameter set $\theta = (\theta_i)_{i=1}^p$ is of conditionally full rank then it is gradient weakly locally identifiable on the open set $\Omega_p = \left\{ \theta = (\theta_i)_{i=1}^p \in \Omega : \mathrm{rk}\left[ \left( \frac{\partial \mu_l}{\partial \theta_i} \right)_{l=1,i=1}^{n,p} \right] = p \right\}$.

Remarks: It should be noted that part (i) of this generalizes part (i) of Theorem 4 of Catchpole and Morgan [3], who proved that if a model is parameter redundant then it is not locally identifiable. However, one component of part (ii) (that being essentially full rank implies gradient weak local identifiability) is weaker than the other result, proved in part (ii) of Theorem 4 of Catchpole and Morgan [3], namely that if a model is of essentially full rank it is locally identifiable. As noted by Catchpole and Morgan [3] (pp. 193–4), there are exponential-family models that are conditionally full rank, but not locally identifiable, so part (iii) is about as strong a result as can be hoped for. From Theorem 3 we deduce the following.

Corollary 2. Let $L(x|\theta)$ belong to the exponential family and be $C^2$ as a function of $\theta \in \Omega \subseteq \mathbf{R}^p$ for all $x \in S^n$. Then

(i) If for some subset of parameters $(\theta_{\pi(i)})_{i=1}^k$ and some $x = (x_1, \ldots, x_n) \in S^n$ it is the case that $\mathrm{rk}\left[ \left( \frac{\partial^2 L(x|\theta)}{\partial \theta_{\pi(i)} \partial \theta_{\pi(j)}} \right)_{i,j=1}^k \right] = k$ then this subset is gradient weakly locally identifiable at this point.

(ii) If a subset of parameters $(\theta_{\pi(i)})_{i=1}^k$ is weakly locally identifiable and for some $x \in S^n$ this point is a local maximum of the likelihood then it is parameter irredundant, i.e., of full rank, so $\mathrm{rk}[I(\theta)] = k$, so that for some $x' \in S^{n'}$, $\mathrm{rk}\left[ \left( \frac{\partial^2 L(x'|\theta)}{\partial \theta_{\pi(i)} \partial \theta_{\pi(j)}} \right)_{i,j=1}^k \right] = k$.

In particular, if this holds for all $\theta \in \Omega$ then parameter irredundancy, local identifiability, gradient weak local identifiability and weak local identifiability are all equivalent.

Proof.
This is an immediate consequence of the remarks after Definition 1, Corollary 1, Theorem 3 (i) and Theorems 1 and 3 of Catchpole and Morgan [3]. QED.

Remarks: (i) By the remarks preceding Theorem 3, the conditions of part (i) (that for some $x = (x_1, \ldots, x_n) \in S^n$ it is the case that $\mathrm{rk}\left[ \left( \frac{\partial^2 L(x|\theta)}{\partial \theta_i \partial \theta_j} \right)_{i,j=1}^k \right] = k$) are automatically satisfied if $\theta = (\theta_i)_{i=1}^k$ is an essentially full rank set of parameters for the model.

(ii) Assume the model is constructed from a stochastic cancer model embedded in the exponential family, in the sense outlined in Text S1 Section B, so that the natural parameters $z_l = z_l[(\theta_i)_{i=1}^p, \mathbf{z}_l]$ are functions of the model parameters $(\theta_i)_{i=1}^p$ and some auxiliary data $(\mathbf{z}_l)_{l=1}^n$, and the means are given by $\mu_l = b'(z_l[(\theta_i)_{i=1}^p, \mathbf{z}_l]) = \mathbf{z}_l \cdot h[(\theta_i)_{i=1}^p, y_l]$, where $h[(\theta_i)_{i=1}^p, y_l]$ is the cancer hazard function. In this case, as shown in Text S1 Section B,

$$\frac{\partial^2 L(x|\theta)}{\partial \theta_i \partial \theta_j} = \sum_{l=1}^{n} \left[ \frac{[x_l - b'(z_l)]\, \mathbf{z}_l}{a(\varphi)\, b''(z_l)} \frac{\partial^2 h(\theta, y_l)}{\partial \theta_i \partial \theta_j} - \frac{\mathbf{z}_l^2}{a(\varphi)} \frac{\partial h(\theta, y_l)}{\partial \theta_i} \frac{\partial h(\theta, y_l)}{\partial \theta_j} \left\{ \frac{[b''(z_l)]^2 + b'''(z_l)[x_l - b'(z_l)]}{[b''(z_l)]^3} \right\} \right].$$

The second term inside the summation, $\left( -\frac{\mathbf{z}_l^2}{a(\varphi)} \frac{\partial h(\theta, y_l)}{\partial \theta_i} \frac{\partial h(\theta, y_l)}{\partial \theta_j} \left\{ \frac{[b''(z_l)]^2 + b'''(z_l)[x_l - b'(z_l)]}{[b''(z_l)]^3} \right\} \right)_{i,j=1}^p$, is a rank 1 matrix and can be made small in relation to the first term, e.g., by making $\mathbf{z}_l$ small. Therefore finding data $(x, y, \mathbf{z}) = (x_1, \ldots, x_n, y_1, \ldots, y_n, \mathbf{z}_1, \ldots, \mathbf{z}_n) \in S^n$ for which $\mathrm{rk}\left[ \left( \frac{\partial^2 L(x|\theta)}{\partial \theta_{\pi(i)} \partial \theta_{\pi(j)}} \right)_{i,j=1}^k \right] = k$ is equivalent to finding data for which $\mathrm{rk}\left[ \left( \frac{\partial^2 h(\theta, y_l)}{\partial \theta_{\pi(i)} \partial \theta_{\pi(j)}} \right)_{i,j=1}^k \right] = k$, or, by the result of Dickson [20] (p. 79), for which $\mathrm{rk}\left[ \left( \frac{\partial^2 h(\theta, y_l)}{\partial \theta_i \partial \theta_j} \right)_{i,j=1}^p \right] = k$.

Hessian vs Fisher Information Matrix as a Method of Determining Redundancy and Identifiability in Generalised Linear Models

We, as with Catchpole and Morgan [3], emphasise use of the Hessian of the likelihood rather than the Fisher information matrix considered by Rothenberg [1]. In the context of GLMs, we have $L(x|\theta) = \sum_{l=1}^n \left[ \frac{x_l z_l - b(z_l)}{a(\varphi)} + c(x_l, \varphi) \right]$ and $g(\mu_i) = g(b'(z_i)) = \sum_{j=1}^p A_{ij} \theta_j$ for some link function $g(\cdot)$ and fixed matrix $A$. We define $D_{ij} = \frac{\partial \mu_j}{\partial \theta_i} = \frac{1}{g'(\mu_j)} A_{ji} = (A^T G^{-1})_{ij}$, where $G = \mathrm{diag}[g'(\mu_1), g'(\mu_2), \ldots, g'(\mu_n)]$. Theorem 1 of Catchpole and Morgan [3] states that a model is parameter irredundant if and only if $\mathrm{rk}[D] = p$. The score vector is given by $U_i = \frac{\partial L(x|\theta)}{\partial \theta_i} = \sum_{l=1}^n \frac{x_l - \mu_l}{a(\varphi)} \frac{\partial z_l}{\partial \theta_i} = \sum_{l=1}^n \frac{x_l - \mu_l}{b''(z_l)\, a(\varphi)} \frac{\partial \mu_l}{\partial \theta_i} = \frac{1}{a(\varphi)} \left( D \Delta (x - \mu) \right)_i$, where $\Delta = \mathrm{diag}\left[ \frac{1}{b''(z_1)}, \frac{1}{b''(z_2)}, \ldots, \frac{1}{b''(z_n)} \right]$. The Fisher information is therefore given by $I(\theta) = E[U U^T] = \frac{1}{a(\varphi)^2} D \Delta V \Delta D^T$, where $V = \left( E[(x_i - \mu_i)(x_j - \mu_j)] \right)_{i,j}$ is the data variance matrix. Theorem 1 of Rothenberg [1] states that a model is locally identifiable if and only if $\mathrm{rk}[I(\theta)] = p$. As above (Corollary 2 (ii)), heuristically parameter irredundancy, local identifiability, gradient weak local identifiability and weak local identifiability are all equivalent and occur whenever $\mathrm{rk}(D \Delta V \Delta D^T) = \mathrm{rk}(D) = p$. Clearly evaluating the rank of $D$ is generally much easier than that of $D \Delta V \Delta D^T$. Catchpole and Morgan [3] demonstrate use of Hessian-based methods to estimate parameter redundancy in a class of capture-recapture models.
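As an illustrative numerical check (our own sketch; the design matrix and parameter values are invented), consider a Poisson GLM with log link, for which $a(\varphi) = 1$, $b(z) = e^z$, $g'(\mu) = 1/\mu$ and $V = \mathrm{diag}(\mu)$; the two rank criteria then agree, as expected:

```python
# Illustrative sketch (ours): Poisson GLM with log link, where
# D = A^T diag(mu), Delta = diag(1/mu), V = diag(mu), a(phi) = 1, so that
# I(theta) = D Delta V Delta D^T reduces to A^T diag(mu) A and rk(I) = rk(D).
import numpy as np

rng = np.random.default_rng(0)
n, p = 30, 3
A = rng.normal(size=(n, p))        # fixed design matrix (invented)
theta = np.array([0.3, -0.5, 0.2])
mu = np.exp(A @ theta)             # Poisson means under the log link

D = A.T @ np.diag(mu)              # D_ij = dmu_j/dtheta_i = (A^T G^{-1})_ij
Delta = np.diag(1.0 / mu)          # diag(1/b''(z_l)) with b''(z) = e^z = mu
V = np.diag(mu)                    # Poisson variance matrix
I = D @ Delta @ V @ Delta @ D.T    # Fisher information, a(phi) = 1

print(np.linalg.matrix_rank(D), np.linalg.matrix_rank(I))  # 3 3
```

Here $I(\theta)$ collapses to $A^T \mathrm{diag}(\mu) A$, so $\mathrm{rk}(I) = \mathrm{rk}(D) = p$ whenever $A$ has full column rank, illustrating why checking the rank of $D$ alone suffices.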
However, for certain applications both the Fisher information and the Hessian must be employed, as we now outline. Assume that the model is constructed from a stochastic cancer model embedded in an exponential family model in the sense outlined in Text S1 Section B. The key to showing that such an embedded model has no more than $N$ irredundant parameters is to construct (as is done in Little et al. [12]) some scalar functions $G_1(\cdot), G_2(\cdot), \ldots, G_N(\cdot)$ such that the cancer hazard function $h(\theta)$ can be written as $h(G_1(\theta), G_2(\theta), \ldots, G_N(\theta))$. Since the cancer model is embedded in a member of the exponential family (in the sense outlined in Text S1 Section B) the same will be true of the total log-likelihood, $L(x|\theta) = L(x|G_1(\theta), G_2(\theta), \ldots, G_N(\theta))$. By means of the Chain Rule we obtain $\frac{\partial^2 L(x|\theta)}{\partial \theta_i \partial \theta_j} = \sum_{l,k=1}^{N} \frac{\partial^2 L(x|G_1, \ldots, G_N)}{\partial G_l \partial G_k} \frac{\partial G_l}{\partial \theta_i} \frac{\partial G_k}{\partial \theta_j} + \sum_{l=1}^{N} \frac{\partial L(x|G_1, \ldots, G_N)}{\partial G_l} \frac{\partial^2 G_l}{\partial \theta_i \partial \theta_j}$, so that the Fisher information matrix is given by:

$$I(\theta) = -E_\theta\left[ \frac{\partial^2 L(x|\theta)}{\partial \theta_i \partial \theta_j} \right] = -E\left[ \sum_{l,k=1}^{N} \frac{\partial^2 L(x|G_1, \ldots, G_N)}{\partial G_l \partial G_k} \frac{\partial G_l}{\partial \theta_i} \frac{\partial G_k}{\partial \theta_j} \right] = -\sum_{l,k=1}^{N} \frac{\partial G_l}{\partial \theta_i} E\left[ \frac{\partial^2 L(x|G_1, \ldots, G_N)}{\partial G_l \partial G_k} \right] \frac{\partial G_k}{\partial \theta_j} \qquad (5)$$

(the first-derivative term vanishes in expectation, since the score has zero mean), which therefore has rank at most $N$. Therefore by Corollary 2 there can be at most $N$ irredundant parameters, or indeed (gradient) weakly locally identifiable parameters. [A similar argument shows that if one were to reparameterise (via some invertible $C^2$ mapping $\theta = f(\omega)$) then the embedded log-likelihood $L(x|f^{-1}(\theta)) = L(x|\omega)$ associated with $h(f^{-1}(\theta)) = h(\omega)$ must also have Fisher information matrix of rank at most $N$.]
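The rank bound underlying (5) is easy to verify numerically on a toy example (ours; the quadratic stand-in log-likelihood and functions $G_l$ are hypothetical, not the cancer models of [12]): with $p = 3$ parameters entering only through $N = 2$ combinations, the Hessian rank cannot exceed 2.

```python
# Toy sketch (ours): p = 3 parameters enter a stand-in log-likelihood only
# through N = 2 combinations G1 = theta1 + theta2 and G2 = theta3, so the
# Hessian (and Fisher information) rank cannot exceed N = 2.
import numpy as np

def loglik(th):
    G1, G2 = th[0] + th[1], th[2]
    return -(G1 - 1.0)**2 - (G2 - 2.0)**2   # hypothetical stand-in

def num_hessian(f, th, eps=1e-4):
    """Central-difference Hessian of f at th."""
    p = len(th)
    H = np.empty((p, p))
    for i in range(p):
        for j in range(p):
            t = th.copy()
            t[i] += eps; t[j] += eps; fpp = f(t)
            t[j] -= 2 * eps;          fpm = f(t)
            t[i] -= 2 * eps;          fmm = f(t)
            t[j] += 2 * eps;          fmp = f(t)
            H[i, j] = (fpp - fpm - fmp + fmm) / (4 * eps**2)
    return H

H = num_hessian(loglik, np.array([0.2, 0.4, 1.0]))
print(np.linalg.matrix_rank(H, tol=1e-3))   # 2
```

The flat direction $(1, -1, 0)$, along which $G_1$ and $G_2$ are unchanged, is exactly the redundant parameter combination.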
By remark (ii) after Corollary 2, to show that a subset of cardinality $N$ of the parameters $(\theta_i)_{i=1}^p$ is (gradient) weakly locally identifiable requires that one show that $\left( \frac{\partial^2 h(\theta, y_l)}{\partial \theta_i \partial \theta_j} \right)_{i,j=1}^p$ has rank at least $N$ for some $(\theta, y_l)$. This is the approach adopted in the paper of Little et al. [12].

Discussion

In this paper we have introduced the notions of weak local identifiability and gradient weak local identifiability, which we have related to the notions of parameter identifiability and redundancy previously introduced by Rothenberg [1] and Catchpole and Morgan [3]. In particular, we have shown that within exponential family models parameter irredundancy, local identifiability, gradient weak local identifiability and weak local identifiability are largely equivalent. The slight novelty of our approach is that the notions of weak local identifiability and gradient weak local identifiability that we introduce are related much more to the Hessian of the likelihood than to the Fisher information matrix that was considered by Rothenberg [1]. However, in practice the two approaches are very similar; Catchpole and Morgan [3] used the Hessian of the likelihood, as do we, because of its greater analytic tractability. The use of this approach is motivated by the application, namely to determine identifiable parameter combinations in a large class of stochastic cancer models, as we outline at the end of the Analysis Section. In certain applications the Fisher information may be best for estimating the upper bound to the number of irredundant parameters, but the Hessian may be best for estimating the lower bound of this quantity. In the companion paper of Little et al. [12] we consider the problem of parameter identifiability in a particular class of stochastic cancer models, those of Little and Wright [13] and Little et al. [14].
These models generalize a large number of other quasi-biological cancer models, in particular those of Armitage and Doll [21], the two-mutation model [17], the generalized multistage model of Little [22], and a recently developed cancer model of Nowak et al. [23] that incorporates genomic instability. These and other cancer models are generally embedded in an exponential family model in the sense outlined in Text S1 Section B, in particular when cohort data are analysed using Poisson regression models, e.g., as in Little et al. [13,14,24]. As we show at the end of the Analysis Section, proving (gradient) weak local identifiability of a subset of cardinality $k$ of the parameters $(\theta_i)_{i=1}^p$ can be done by showing that for this subset of parameters

$$\mathrm{rk}\left[\frac{\partial^2 h(\theta, y)}{\partial\theta_i\partial\theta_j}\right]_{i,j=1}^p = k,$$

where $h$ is the cancer hazard function. Little et al. [12] demonstrate (by exhibiting a particular parameterization) that there is redundancy in the parameterization of this model: the number of theoretically estimable parameters in the models of Little and Wright [13] and Little et al. [14] is at most two less than the number that are theoretically available, demonstrating (by Corollary 2) that there can be no more than this number of irredundant parameters. Two numerical examples suggest that this bound is sharp: we show that the rank of the Hessian, $\mathrm{rk}\left[\frac{\partial^2 h(\theta, y)}{\partial\theta_i\partial\theta_j}\right]_{i,j=1}^p$, is two less than the row dimension of this matrix. This result generalizes previously derived results of Heidenreich and others [15,16] relating to the two-mutation model.

Supporting Information

Text S1. Found at: doi:10.1371/journal.pone.0008915.s001 (0.33 MB DOC)

Acknowledgments

The authors are very grateful for the comments of Professor Byron Morgan on an advanced draft of the paper, and also for the detailed and helpful remarks of a referee.

PLoS ONE | www.plosone.org | January 2010 | Volume 5 | Issue 1 | e8915
Author Contributions

Conceived and designed the experiments: MPL WFH. Performed the experiments: MPL GL. Analyzed the data: MPL GL. Wrote the paper: MPL WFH GL.

References

1. Rothenberg TJ (1971) Identification in parametric models. Econometrica 39: 577–591.
2. Silvey SD (1975) Statistical inference. London: Chapman and Hall. 191 p.
3. Catchpole EA, Morgan BJT (1997) Detecting parameter redundancy. Biometrika 84: 187–196.
4. Jacquez JA, Perry T (1990) Parameter estimation: local identifiability of parameters. Am J Physiol 258: E727–E736.
5. Audoly S, D'Angio L, Saccomani MP, Cobelli C (1998) Global identifiability of linear compartmental models – a computer algebra algorithm. IEEE Trans Biomed Eng 45: 36–47.
6. Bellu G, Saccomani MP, Audoly S, D'Angio L (2007) DAISY: a new software tool to test global identifiability of biological and physiological systems. Computer Meth Prog Biomed 88: 52–61.
7. Catchpole EA, Morgan BJT, Viallefont A (2002) Solving problems in parameter redundancy using computer algebra. J Appl Stat 29: 626–636.
8. Gimenez O, Viallefont A, Catchpole EA, Choquet R, Morgan BJT (2004) Methods for investigating parameter redundancy. Animal Biodiversity Conservation 27.1: 1–12.
9. Viallefont A, Lebreton J-D, Reboulet A-M, Gory G (1998) Parameter identifiability and model selection in capture-recapture models: a numerical approach. Biometrical J 40: 313–325.
10. Picci G (1977) Some connections between the theory of sufficient statistics and the identifiability problem. SIAM J Appl Math 33: 383–398.
11. Paulino CDM, de Bragança Pereira CA (1994) On identifiability of parametric statistical models. J Ital Stat Soc 1: 125–151. (Stat Methods Appl 3).
12. Little MP, Heidenreich WF, Li G (2009) Parameter identifiability and redundancy in a general class of stochastic carcinogenesis models. PLoS ONE 4(12): e8520.
13. Little MP, Wright EG (2003) A stochastic carcinogenesis model incorporating genomic instability fitted to colon cancer data. Math Biosci 183: 111–134.
14. Little MP, Vineis P, Li G (2008) A stochastic carcinogenesis model incorporating multiple types of genomic instability fitted to colon cancer data. J Theoret Biol 254: 229–238; 255: 268.
15. Heidenreich WF (1996) On the parameters of the clonal expansion model. Radiat Environ Biophys 35: 127–129.
16. Heidenreich WF, Luebeck EG, Moolgavkar SH (1997) Some properties of the hazard function of the two-mutation clonal expansion model. Risk Anal 17: 391–399.
17. Moolgavkar SH, Venzon DJ (1979) Two-event models for carcinogenesis: incidence curves for childhood and adult tumors. Math Biosci 47: 55–77.
18. McCullagh P, Nelder JA (1989) Generalized linear models (2nd edition). London: Chapman and Hall. 511 p.
19. Rudin W (1976) Principles of mathematical analysis (3rd edition). Auckland: McGraw-Hill. 352 p.
20. Dickson LE (1926) Modern algebraic theories. Chicago: Sanborn. 273 p.
21. Armitage P, Doll R (1954) The age distribution of cancer and a multi-stage theory of carcinogenesis. Br J Cancer 8: 1–12.
22. Little MP (1995) Are two mutations sufficient to cause cancer? Some generalizations of the two-mutation model of carcinogenesis of Moolgavkar, Venzon, and Knudson, and of the multistage model of Armitage and Doll. Biometrics 51: 1278–1291.
23. Nowak MA, Komarova NL, Sengupta A, Jallepalli PV, Shih I-M, et al. (2002) The role of chromosomal instability in tumor initiation. Proc Natl Acad Sci U S A 99: 16226–16231.
24. Little MP, Li G (2007) Stochastic modelling of colon cancer: is there a role for genomic instability? Carcinogenesis 28: 479–487.

Supplementary material A. Proof of Theorem 1

In this Section we outline a proof of Theorem 1 in the main text.
To prove this result we need the following lemma of Rudin [19] (p. 229).

Lemma A1. Suppose $m, n, r$ are non-negative integers such that $m \geq r$, $n \geq r$, and $F: E \subset \mathbb{R}^n \to \mathbb{R}^m$ is a $C^1$ function, where $E$ is an open set. Suppose that $\mathrm{rk}(F'(x)) = r$ for all $x \in E$. Fix $a \in E$, put $A = F'(a)$, let $Y_1 = A(\mathbb{R}^n)$, and let $P: \mathbb{R}^m \to \mathbb{R}^m$ be a linear projection operator ($P^2 = P$) such that $P(\mathbb{R}^m) = Y_1$, and let $Y_2 = \mathrm{null}(P)$. Then there exist open sets $U, V \subset \mathbb{R}^n$ and a bijective $C^1$ function $H: V \to U$, whose inverse is also $C^1$, such that $F(H(x)) = Ax + \varphi(Ax)$ for all $x \in V$, where $\varphi: A(V) \subset Y_1 \to Y_2$ is a $C^1$ function.

We now restate Theorem 1 here.

Theorem A2. Suppose that the log-likelihood $L(x|\theta)$ is $C^2$ as a function of the parameter vector $\theta \in \Omega \subset \mathbb{R}^p$, for all $x = (x_1, \ldots, x_n) \in \Sigma^n$.

(i) Suppose that for some $x$ and $\theta \in \mathrm{int}(\Omega)$ it is the case that $\mathrm{rk}\left[\left(\frac{\partial^2 L(x|\theta)}{\partial\theta_i\partial\theta_j}\right)_{i,j=1}^p\right] = p$. Then turning points of the likelihood in the neighborhood of $\theta$ are isolated, i.e., there is an open neighborhood $\theta \in \mathcal{N} \subset \Omega$ in which there is at most one $\tilde\theta \in \mathcal{N}$ that satisfies $\left(\frac{\partial L(x|\theta)}{\partial\theta_i}\Big|_{\theta=\tilde\theta}\right)_{i=1}^p = 0$.

(ii) Suppose that for some $x$ and $\theta \in \mathrm{int}(\Omega)$ it is the case that $\mathrm{rk}\left[\left(\frac{\partial^2 L(x|\theta)}{\partial\theta_i\partial\theta_j}\right)_{i,j=1}^p\right] = p$. Then local maxima of the likelihood in the neighborhood of $\theta$ are isolated, i.e., there is an open neighborhood $\theta \in \mathcal{N} \subset \Omega$ in which there is at most one $\tilde\theta \in \mathcal{N}$ that is a local maximum of $L(x|\theta)$.

(iii) Suppose that for some $x$ and all $\theta \in \mathrm{int}(\Omega)$ it is the case that $\mathrm{rk}\left[\left(\frac{\partial^2 L(x|\theta)}{\partial\theta_i\partial\theta_j}\right)_{i,j=1}^p\right] = r < p$. Then no local maximum of the likelihood in $\mathrm{int}(\Omega)$ is isolated, and the same is true of all $\theta \in \mathrm{int}(\Omega)$ for which $\left(\frac{\partial L(x|\theta)}{\partial\theta_i}\right)_{i=1}^p = 0$.

Proof: (i) Let $F: \Omega \subset \mathbb{R}^p \to \mathbb{R}^p$ be defined by $F(\theta_1, \theta_2, \ldots, \theta_p) = \left(\frac{\partial L(x|\theta)}{\partial\theta_1}, \frac{\partial L(x|\theta)}{\partial\theta_2}, \ldots, \frac{\partial L(x|\theta)}{\partial\theta_p}\right)$. Since $L$ is $C^2$, $F$ is $C^1$ on $\mathrm{int}(\Omega) \subset \mathbb{R}^p$. By assumption $F'(\theta) = \left(\frac{\partial^2 L(x|\theta)}{\partial\theta_i\partial\theta_j}\right)_{i,j=1}^p$ is of full rank at $\theta$. By the inverse function theorem [19] (pp. 221–223) there are open sets $N, M \subset \mathbb{R}^p$ with $\theta \in N$, and a bijective $C^1$ function $G: M \to N$ such that $G(F(\tilde\theta)) = \tilde\theta$ for all $\tilde\theta \in N$. In particular there can be at most a single $\tilde\theta \in N$ for which $F(\tilde\theta) = 0$. QED.

(ii) By (i) there is an open neighborhood $\theta \in \mathcal{N} \subset \Omega$ for which, if $\tilde\theta \in \mathcal{N}$ is such that $\left(\frac{\partial L(x|\theta)}{\partial\theta_i}\Big|_{\theta=\tilde\theta}\right)_{i=1}^p = 0$, then $\left(\frac{\partial L(x|\theta)}{\partial\theta_i}\Big|_{\theta=\theta'}\right)_{i=1}^p \neq 0$ for any $\theta' \neq \tilde\theta \in \mathcal{N}$. Suppose now that $\tilde\theta \in \mathcal{N}$ is a local maximum of $L(x|\theta)$. Any member of this neighborhood other than $\tilde\theta$ cannot be a turning point, and so by the Mean Value Theorem [19] (p. 107) cannot be a local maximum. QED.

(iii) Let $F: \Omega \subset \mathbb{R}^p \to \mathbb{R}^p$ be defined by $F(\theta_1, \theta_2, \ldots, \theta_p) = \left(\frac{\partial L(x|\theta)}{\partial\theta_1}, \frac{\partial L(x|\theta)}{\partial\theta_2}, \ldots, \frac{\partial L(x|\theta)}{\partial\theta_p}\right)$. Since $L$ is $C^2$, $F$ is $C^1$ on $\mathrm{int}(\Omega) \subset \mathbb{R}^p$. By assumption $\mathrm{rk}\left[\left(\frac{\partial^2 L(x|\theta)}{\partial\theta_i\partial\theta_j}\right)_{i,j=1}^p\right] = \mathrm{rk}(F'(\theta)) = r$ for all $\theta \in \mathrm{int}(\Omega) \subset \mathbb{R}^p$. Suppose that $\theta_0 \in \mathrm{int}(\Omega)$ is a local maximum of $L$. Let $A = F'(\theta_0) \in L(\mathbb{R}^p, \mathbb{R}^p)$, put $Y_1 = A(\mathbb{R}^p)$, choose some arbitrary projection $P \in L(\mathbb{R}^p, \mathbb{R}^p)$ such that $P^2 = P$ and $P(\mathbb{R}^p) = Y_1$, and let $Y_2 = \mathrm{null}(P)$. By Lemma A1 there are open sets $U, V \subset \mathbb{R}^p$ with $\theta_0 \in U \subset \mathrm{int}(\Omega)$ and a bijective $C^1$ mapping $H: V \to U$ with $C^1$ inverse such that $F(y) = A H^{-1}(y) + \varphi(A H^{-1}(y))$ for all $y \in U$, where $\varphi: A(V) \subset Y_1 \to Y_2$ is a $C^1$ function.

Since $\theta_0 \in \mathrm{int}(\Omega)$ is a local maximum of $L(x|\theta)$, by the Mean Value Theorem [19] (p. 107) $F(\theta_0) = 0$. Now choose some non-trivial vector $k \in \mathrm{null}(A)$ and define a function, as we can, on some interval $\delta: (-\varepsilon, \varepsilon) \to \mathbb{R}^p$ by $\delta(t) = H(H^{-1}(\theta_0) + tk)$. Because $H: V \to U$ is bijective and $k$ is non-trivial, $\delta(t) = \delta(t') \Leftrightarrow t = t'$.
Also, it is the case that:

$$F(\delta(t)) = A H^{-1}(H(H^{-1}(\theta_0) + tk)) + \varphi(A H^{-1}(H(H^{-1}(\theta_0) + tk))) = A[H^{-1}(\theta_0) + tk] + \varphi(A[H^{-1}(\theta_0) + tk]) = A H^{-1}(\theta_0) + \varphi(A H^{-1}(\theta_0)) = F(\theta_0) = 0 \quad (A1)$$

since $Ak = 0$. Define $G: (-\varepsilon, \varepsilon) \to \mathbb{R}$ by $G(t) = L(x|\delta(t)) = L(x|(\delta_1(t), \delta_2(t), \ldots, \delta_p(t)))$. By the chain rule [19] (p. 215) $\frac{dG}{dt} = \sum_{i=1}^p \frac{\partial L(x|\theta)}{\partial\theta_i}\Big|_{\delta(t)} \frac{d\delta_i}{dt} = 0$ for all $t \in (-\varepsilon, \varepsilon)$. Finally, by the Mean Value Theorem [19] (p. 107) $G$ must be constant; in particular $L(x|\delta(t)) = L(x|\delta(0)) = L(x|\theta_0)$ for all $t \in (-\varepsilon, \varepsilon)$, and so all points $\delta(t)$ must also be local maxima of $L(x|\theta)$. Therefore $\theta_0$ is not an isolated local maximum. Since all we used about $\theta_0 \in \mathrm{int}(\Omega)$ was that $F(\theta_0) = \left(\frac{\partial L(x|\theta)}{\partial\theta_1}\Big|_{\theta_0}, \ldots, \frac{\partial L(x|\theta)}{\partial\theta_p}\Big|_{\theta_0}\right) = 0$, the above argument also shows that turning points cannot be isolated. QED.

Supplementary material B. Specification of embedded exponential family model

In this Section we outline the specification of an embedding of a stochastic cancer model in a general class of statistical models, the so-called exponential family [18]. This is often done in fitting cancer models to epidemiological and biological data (e.g., see references [12,13,14,24]). Recall that a model is a member of the exponential family if the observed data $x = (x_l)_{l=1}^n \in \Sigma^n$ is such that the log-likelihood is given by

$$L(x|\theta) = \sum_{l=1}^n \left[\frac{x_l \varsigma_l - b(\varsigma_l)}{a(\phi)} + c(x_l, \phi)\right]$$

for some functions $a(\phi)$, $b(\varsigma)$, $c(x, \phi)$. We assume that the natural parameters $\varsigma_l = \varsigma_l[(\theta_i)_{i=1}^p, z_l]$ are functions of the model parameters $(\theta_i)_{i=1}^p$ and some auxiliary data $(z_l)_{l=1}^n$, and that $\mu_l = b'(\varsigma_l[(\theta_i)_{i=1}^p, z_l]) = z_l \cdot h[(\theta_i)_{i=1}^p, y_l]$. Here $h$ is the cancer hazard function (for example, that of Little et al. [14], as also specified in the main text and in Text S1 Section B of Little et al. [12]), $(y_l)_{l=1}^n$ are some further auxiliary data, and we assume that the $(z_l)_{l=1}^n$ are all non-zero. [Note: this is not necessarily a generalized linear model (GLM).] In this case it is seen that

$$\frac{\partial^2 L(x|\theta)}{\partial\theta_i\partial\theta_j} = \sum_{l=1}^n \left[\frac{[x_l - b'(\varsigma_l)]\, z_l}{a(\phi)\, b''(\varsigma_l)} \frac{\partial^2 h(\theta, y_l)}{\partial\theta_i\partial\theta_j} - z_l^2 \frac{\partial h(\theta, y_l)}{\partial\theta_i} \frac{\partial h(\theta, y_l)}{\partial\theta_j} \frac{[b''(\varsigma_l)]^2 + b'''(\varsigma_l)[x_l - b'(\varsigma_l)]}{a(\phi)[b''(\varsigma_l)]^3}\right] \quad (B1)$$

so that the Fisher information matrix is given by

$$I(\theta) = -E\left[\frac{\partial^2 L(x|\theta)}{\partial\theta_i\partial\theta_j}\right] = \sum_{l=1}^n z_l^2 \frac{\partial h(\theta, y_l)}{\partial\theta_i} \frac{\partial h(\theta, y_l)}{\partial\theta_j} \frac{1}{a(\phi)\, b''(\varsigma_l)} \quad (B2)$$
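As a numerical sanity check on (B2), the sketch below works in the Poisson special case, where $a(\phi) = 1$, $b(\varsigma) = e^\varsigma$ and hence $b''(\varsigma_l) = \mu_l = z_l h_l$, so that (B2) reduces to $I_{ij} = \sum_l z_l (\partial h/\partial\theta_i)(\partial h/\partial\theta_j)/h_l$. The hazard used is a hypothetical toy, $h(\theta, y) = \theta_1 e^{-\theta_2 y}$, not that of Little et al. [14]; the formula is compared against minus the expected Hessian of the Poisson log-likelihood.

```python
import numpy as np

# A sketch checking equation (B2) in the Poisson case, with a hypothetical
# hazard h(theta, y) = theta1 * exp(-theta2 * y) (not the Little et al.
# hazard). For Poisson data a(phi) = 1, b(s) = exp(s) and
# b''(s_l) = mu_l = z_l * h_l, so (B2) reduces to
# I_ij = sum_l z_l * (dh/dtheta_i)(dh/dtheta_j) / h_l.

def h(theta, y):
    return theta[0] * np.exp(-theta[1] * y)

def grad_h(theta, y, eps=1e-6):
    p = len(theta)
    return np.array([(h(theta + eps * np.eye(p)[i], y)
                      - h(theta - eps * np.eye(p)[i], y)) / (2 * eps)
                     for i in range(p)])

theta = np.array([2.0, 0.5])
ys = np.linspace(0.5, 3.0, 6)       # auxiliary data y_l
zs = np.linspace(1.0, 2.0, 6)       # auxiliary data z_l (all non-zero)

# Fisher information assembled directly from (B2).
I_b2 = sum(z * np.outer(grad_h(theta, y), grad_h(theta, y)) / h(theta, y)
           for y, z in zip(ys, zs))

# Cross-check: minus the expected Hessian of the Poisson log-likelihood,
# i.e. the Hessian of sum_l mu_l*log(z_l h) - z_l h with x_l fixed at
# its expectation mu_l = z_l h(theta, y_l).
def expected_L(t):
    return sum(z * h(theta, y) * np.log(z * h(t, y)) - z * h(t, y)
               for y, z in zip(ys, zs))

eps, p = 1e-4, len(theta)
H = np.zeros((p, p))
for i in range(p):
    for j in range(p):
        ei, ej = eps * np.eye(p)[i], eps * np.eye(p)[j]
        H[i, j] = (expected_L(theta + ei + ej) - expected_L(theta + ei - ej)
                   - expected_L(theta - ei + ej) + expected_L(theta - ei - ej)) / (4 * eps ** 2)

print(np.allclose(I_b2, -H, rtol=1e-4))  # True
```

The agreement reflects the fact that, on taking expectations in (B1), the terms involving $x_l - b'(\varsigma_l)$ vanish, leaving exactly the outer-product form (B2).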
