Practical Estimation of High Dimensional Stochastic Differential Mixed-Effects Models
Stochastic differential equations (SDEs) are established tools to model physical phenomena whose dynamics are affected by random noise. By estimating parameters of an SDE intrinsic randomness of a system around its drift can be identified and separat…
Authors: Umberto Picchini, Susanne Ditlevsen
Practical Estimation of High Dimensional Sto c hastic Differen tial Mixed-Effects Mo dels Um b erto Picc hini a,b, ∗ , Susanne Ditlevsen b a Dep artment of Mathematic al Scienc es, University of Durham, South R o ad, DH1 3LE Durham, England. Phone: +44 (0)191 334 4164; F ax: +44 (0)191 334 3051 b Dep artment of Mathematic al Scienc es, University of Cop enhagen, Universitetsp arken 5, DK-2100 Cop enhagen, Denmark Abstract Sto c hastic differen tial equations (SDEs) are established to ols to mo del ph ysical phenomena whose dynamics are affected b y random noise. By estimating parameters of an SDE intrin- sic randomness of a system around its drift can b e identified and separated from the drift itself. When it is of in terest to mo del dynamics within a giv en p opulation, i.e. to mo del sim ultaneously the p erformance of sev eral experiments or sub jects, mixed-effects mod- elling allo ws for the distinction of b etw een and within exp erimen t v ariabilit y . A framew ork to mo del dynamics within a p opulation using SDEs is prop osed, representing sim ultane- ously several sources of v ariation: v ariabilit y b et ween exp erimen ts using a mixed-effects approac h and stochasticit y in the individual dynamics using SDEs. These sto chastic differ- ential mixe d-effe cts mo dels hav e applications in e.g. pharmacokinetics/pharmaco dynamics and biomedical modelling. A parameter estimation metho d is prop osed and computational guidelines for an efficient implemen tation are given. Finally the metho d is ev aluated using sim ulations from standard mo dels like the t wo-dimensional Ornstein-Uhlen b ec k (OU) and the square ro ot models. Keywor ds: automatic differen tiation, closed-form transition density expansion, maxim um likelihoo d estimation, p opulation estimation, stochastic differen tial equation, Co x-Ingersoll-Ross pro cess 1. INTRODUCTION Mo dels defined through stochastic differen tial equations (SDEs) allo w for the represen- tation of random v ariability in dynamical systems. This class of mo dels is b ecoming more and more imp ortant (e.g. Allen (2007) and Øksendal (2007)) and is a standard to ol to mo del financial, neuronal and p opulation gro wth dynamics. How ev er, m uch has still to b e ∗ Corresp onding author Email addr esses: umberto.picchini@durham.ac.uk (Um b erto Picchini), susanne@math.ku.dk (Susanne Ditlevsen) Pr eprint submitte d to Elsevier Novemb er 13, 2018 done, b oth from a theoretical and from a computational p oin t of view, to mak e it applica- ble in those statistical fields that are already well established for deterministic dynamical mo dels (ODE, DDE, PDE). F or e xample, studies in which rep eated measurements are tak en on a series of individuals or experimental units pla y an imp ortan t role in biomedical researc h: it is often reasonable to assume that resp onses follow the same mo del form for all experimental sub jects, but parameters v ary randomly among individuals. The imp or- tance of these mixe d-effe cts mo dels (e.g. McCullo c h and Searle (2001), Pinheiro and Bates (2002)) lies in their ability to split the total v ariation in to within- and b et ween-individual comp onen ts. The SDE approac h has only recen tly b een com bined with mixed-effects models, because of the difficulties arising b oth from the theoretical and the computational side when dealing with SDEs. In Ov ergaard et al. (2005) and T ornøe et al. (2005) a sto c hastic differential mixed-effects mo del (SDMEM) with log-normally distributed random parameters and a constan t diffusion term is estimated via (extended) Kalman filter. Donnet and Samson (2008) developed an estimation metho d based on a sto c hastic EM algorithm for fitting one-dimensional SDEs with mixed-effects. In Donnet et al. (2010) a Bay esian approac h is applied to a one-dimensional mo del for growth curv e data. The metho ds presen ted in the aforementioned references are computationally demanding when the dimension of the mo del and/or parameter space gro ws. How ev er, they are all able to handle data contami- nated b y measuremen t noise. In Ditlevsen and De Gaetano (2005) the likelihoo d function for a simple SDME M with normally distributed random parameters is calculated explic- itly , but generally the likelihoo d function is una v ailable. In Picchini et al. (2010) a com- putationally efficien t metho d for estimating SDMEMs with random parameters follo wing an y sufficiently well-behav ed con tinuous distribution w as developed. First the conditional transition density of the diffusion process given the random effects is approximated in closed-form b y a Hermite expansion, and then the obtained conditional likelihoo d is n u- merically integrated with resp ect to the random effects using Gaussian quadrature. The metho d resulted to b e statistically accurate and computationally fast. How ever, in practice it w as limited to one random effect only (see Picchini et al. (2008) for an application in neuroscience) since Gaussian quadrature is computationally inefficient when the num b er of random effects gro ws. Here the metho d presen ted in Picc hini et al. (2010) is extended to handle SDMEMs with m ultiple random effects. F urthermore, the application is extended in a second direction to handle multidimensional SDMEMs. Computational guidelines for a practical implemen- tation are giv en, using e.g. automatic differentiation (AD) to ols. Results obtained from sim ulation stud ies sho w that, at least for the examples discussed in the presen t work, the metho dology is flexible enough to accommo date complex mo dels ha ving different distribu- tions for the random effects, not only the normal or log-normal distributions whic h are the ones usually emplo yed. Satisfactory estimates for the unknown parameters are obtained ev en considering small populations of sub jects (i.e. few rep etitions of an exp erimen t) and few observ ations p er sub ject/exp erimen t, whic h is often relev ant, especially in biomedicine where large data sets are t ypically unav ailable. A drawbac k of our approach is that measurement error is not accoun ted for, and th us 2 it is most useful in those situations where the v ariance of the measuremen t noise is small compared to the system noise. The pap er is organized as follo ws. Section 2 i n tro duces the SDMEMs, the observ ation sc heme and the necessary notation. Section 3 introduces the lik eliho o d function. Section 4 considers appro ximate metho ds for when the expression of the exact likelihoo d function cannot b e obtained; computational guidelines and softw are to ols are also discussed. Section 5 is dev oted to the application on simulated datasets. Section 6 summarizes the results and the adv antages and limitations of the metho d are discussed. An Appendix containing tec hnical results closes the pap er. 2. STOCHASTIC DIFFERENTIAL MIXED EFFECTS MODELS In the following, b old characters are used to denote v ectors and matrices. Consider a d -dimensional (Itô) SDE mo del for some con tinuous pro cess X t ev olving in M different exp erimen tal units randomly c hosen from a theoretical p opulation: d X i t = µ ( X i t , θ , b i ) dt + σ ( X i t , θ , b i ) d W i t , X i 0 = x i 0 , i = 1 , . . . , M (1) where X i t is the v alue at time t ≥ t i 0 of the i th unit, with X i 0 = X i t i 0 ; θ ∈ Θ ⊆ R p is a p -dimensional fixed effects parameter (the same for the en tire p opulation) and b i ≡ b i ( Ψ ) ∈ B ⊆ R q is a q -dimensional random effects parameter (unit sp ecific) with comp onen ts ( b i 1 , ..., b i q ) ; eac h comp onen t b i l ma y follow a different contin uous distribution ( l = 1 , ..., q ). Let p B ( b i | Ψ ) denote the join t distribution of the v ector b i , parametrized b y an r -dimensional parameter Ψ ∈ Υ ⊆ R r . The W i t ’s are d -dimensional standard Brownian motions with comp onen ts W ( h ) i t ( h = 1 , ..., d ). The W ( h ) i t and b j l are assumed mutually indep enden t for all 1 ≤ i, j ≤ M , 1 ≤ h ≤ d , 1 ≤ l ≤ q . The initial condition X i 0 is assumed equal to a vector of constants x i 0 ∈ R d . The drift and the diffusion co efficien t functions µ ( · ) : E × Θ × B → R d and σ ( · ) : E × Θ × B → S are assumed kno wn up to the parameters, and are assumed sufficien tly regular to ensure a unique weak solution (Øksendal (2007)), where E ⊆ R d denotes the state space of X i t and S denotes the set of d × d p ositiv e definite matrices. Model (1) assumes that in eac h of the M units the ev olution of X follo ws a common functional form, and therefore differences b etw een units are due to differen t realizations of the Brownian motion paths { W i t } t ≥ t i 0 and of the random parameters b i . Thus, the dynamics within a generic unit i are expressed via an SDE mo del driv en b y Brownian motion, and the introduction of a v ector parameter randomly v arying among units allo ws for the explanation of the v ariabilit y b et ween the M units. Assume that the distribution of X i t giv en ( b i , θ ) and X i s = x s , s < t , has a strictly p ositiv e densit y w.r.t. the Leb esgue measure on E , whic h is denoted by x → p X ( x , t − s | x s , b i , θ ) > 0 , x ∈ E . (2) Assume moreo ver that unit i is observed at the same set of n i + 1 discrete time p oints { t i 0 , t i 1 , . . . , t i n i } for eac h co ordinate X ( h ) i t of the pro cess X i t ( h = 1 , ..., d ), whereas differen t 3 units ma y b e observ ed at different sets of time p oin ts. Let x i b e the ( n i + 1) × d matrix of resp onses for unit i , with j th row giv en by x i ( t i j ) = x i j = ( x (1) i j , ..., x ( d ) i j ) , and let the follo wing b e the N × d total resp onse matrix, N = P M i =1 ( n i + 1) : x = ( x 1 T , ..., x M T ) T = x (1)1 0 · · · x ( d )1 0 . . . . . . . . . x (1)1 n 1 · · · x ( d )1 n 1 . . . . . . . . . x (1) i j · · · x ( d ) i j . . . . . . . . . x (1) M 0 · · · x ( d ) M 0 . . . . . . . . . x (1) M n M · · · x ( d ) M n M , where T denotes transp osition. W rite ∆ i j = t i j − t i j − 1 for the time distance b etw een x i j − 1 and x i j . Notice that this observ ation scheme implies that the matrix of data x must not con tain missing v alues. The goal is to estimate ( θ , Ψ ) using sim ultaneously all the data in x . The sp ecific v alues of the b i ’s are not of interest, but only the iden tification of the v ector-parameter Ψ c haracterizing their distribution. How ever, estimates of the random parameters b i are also obtained, since it is necessary to estimate them when estimating ( θ , Ψ ) . 3. MAXIMUM LIKELIHOOD ESTIMA TION T o obtain the marginal density of x i , the conditional densit y of the data giv en the non-observ able random effects b i is in tegrated with respect to the marginal densit y of the random effects, using that W ( h ) i t and b j l are indep endent. This yields the likelihoo d function L ( θ , Ψ ) = M Y i =1 p ( x i | θ , Ψ ) = M Y i =1 Z B p X ( x i | b i , θ ) p B ( b i | Ψ ) d b i . (3) Here p ( · ) is the density of x i giv en ( θ , Ψ ) , p B ( · ) is the densit y of the random effects, and p X ( · ) is the pro duct of the transition densities p X ( · ) giv en in (2) for a giv en realization of the random effects and for a giv en θ , p X ( x i | b i , θ ) = n i Y j =1 p X ( x i j , ∆ i j | x i j − 1 , b i , θ ) . (4) In applications the random effects are often assumed to b e (multi)normally distributed, but p B ( · ) could b e any w ell-b eha ved density function. Solving the integral in (3) yields the marginal lik eliho o d of the parameters for unit i , indep enden t of the random effects 4 b i ; b y maximizing the resulting expression (3) with resp ect to θ and Ψ the corresp onding maxim um likelihoo d estimators (MLE) ˆ θ and ˆ Ψ are obtained. Notice that it is p ossible to consider random effects having discrete distributions: in that case the in tegral b ecomes a sum and can b e easily computed when the transition densit y p X is kno wn. In this paper only random effects ha ving contin uous distributions are considered. In simple cases an explicit expression for the likelihoo d function, and ev en explicit estimating equations for the MLEs can b e found (Ditlevsen and De Gaetano (2005)). Ho wev er, in general it is not p ossible to find an explicit solution for the in tegral, and thus exact MLEs are una v ailable. This o ccurs when: (i) p X ( x i j , ·| x i j − 1 , · ) is kno wn but it is not p ossible to analytically solve the integral, and (ii) p X ( x i j , ·| x i j − 1 , · ) is unkno wn. In (i) the in tegral must b e ev aluated numerically to obtain an approximation of the lik eliho o d (3). In (ii) first p X ( x i j , ·| x i j − 1 , · ) is appro ximated, then the integral in (3) is solved n umerically . In situation (ii) there exist several strategies to approximate the densit y p X ( x i j , ·| x i j − 1 , · ) , e.g. Monte Carlo approximations, direct solution of the F okk er-Planck equation, or Hermite expansions, just to mention some of the p ossible approac hes, see Hurn et al. (2007) for a comprehensiv e review fo cused on one-dimensional diffusions. W e prop ose to approximate the transition density in closed-form using a Hermite expansion (Aït-Sahalia (2008)). It often provides a go o d appro ximation to p X , and Jensen and P oulsen (2002) show ed that the metho d is computationally efficien t. Using the obtained expression, the lik eliho o d function is appro ximated, thus deriving approximated MLEs of θ and Ψ . 4. CLOSED-FORM TRANSITION DENSITY EXP ANSION AND LIKELI- HOOD APPRO XIMA TION 4.1. T r ansition Density Exp ansion for Multidimensional SDEs Here the transition density expansion of a d -dimensional time-homogeneous SDE as suggested in Aït-Sahalia (2008) is review ed; the same reference pro vides a method to handle m ulti-dimensional time-inhomogeneous SDEs, but for ease of exp osition atten tion is focused on the former case. Also references on further extensions, e.g. Lévy pro cesses, are given in the pap er. W e will only consider SDEs reducible to unit diffusion, i.e. multi- dimensional diffusions X for which there exists a one-to-one transformation to another diffusion with diffusion matrix the iden tity matrix. It is p ossible to p erform the density expansion also for non-reducible SDEs (Aït-Sahalia (2008)). F or the momen t reference to θ is dropp ed when not necessary , i.e. a function f ( x, θ ) is written f ( x ) . Consider the follo wing d -dimensional, reducible, time-homogeneous SDE d X t = µ ( X t ) dt + σ ( X t ) d W t , X 0 = x 0 (5) and a series of d -dimensional discrete observ ations x 0 , x 1 , ..., x n from X , all observ ed at the same time p oin ts { t 0 , t 1 , ..., t n } ; denote with E the state space of X . W e wan t to appro ximate p X ( x j , ∆ j | x j − 1 ) , the conditional densit y of X j giv en X j − 1 = x j − 1 , where ∆ j = t j − t j − 1 ( j = 1 , ..., n ). Under regularity conditions (e.g. µ ( x ) and σ ( x ) are assumed to b e infinitely differentiable in x on E , v ( x ) := σ ( x ) σ ( x ) T is a d × d p ositiv e definite matrix 5 for all x in the in terior of E and all the drift and diffusion comp onen ts satisfy linear growth conditions, see Aït-Sahalia (2008) for details), the logarithm of the transition densit y can b e expanded in closed form using an order J = + ∞ Hermite series, and appro ximated by a T a ylor expansion up to order K , ln p ( K ) X ( x j , ∆ j | x j − 1 ) = − d 2 ln(2 π ∆ j ) − 1 2 ln(det( v ( x j ))) + C ( − 1) Y ( γ ( x j ) | γ ( x j − 1 )) ∆ j + K X k =0 C ( k ) Y ( γ ( x j ) | γ ( x j − 1 )) ∆ k j k ! . (6) Here ∆ k j denotes ∆ j raised to the p o wer of k . The co efficien ts C ( k ) Y are giv en in the Appendix and γ ( x ) = ( γ (1) ( x ) , ..., γ ( d ) ( x )) T is the Lamp erti transform, whic h b y definition exists when the diffusion is reducible, and is suc h that O γ ( x ) = σ − 1 ( x ) . See the App endix for a sufficien t and necessary condition for reducibilit y . Using Itô’s lemma, the transformation Y t = γ ( X t ) defines a new diffusion pro cess Y t , solution of the follo wing SDE d Y t = µ Y ( Y t ) dt + d W t , Y 0 = y 0 , where the h -th elemen t of µ Y is giv en by ( h = 1 , ..., d ) µ ( h ) Y ( Y t ) = d X i =1 σ − 1 ( γ − 1 ( Y t )) hi µ ( i ) ( γ − 1 ( Y t )) − 1 2 d X i,j,k σ − 1 ( γ − 1 ( Y t )) ∂ σ ∂ x j ( γ − 1 ( Y t )) σ − 1 ( γ − 1 ( Y t )) hi σ ik ( γ − 1 ( Y t )) σ j k ( γ − 1 ( Y t )) . F or ease of interpretation the Lamp erti transform and the drift term µ Y for a scalar ( d = 1 ) SDE are rep orted. Namely γ ( · ) is defined b y Y t ≡ γ ( X t ) = Z X t du σ ( u ) , where the lo wer b ound of integration is an arbitrary p oin t in the interior of E . The drift term is giv en by µ Y ( Y t ) = µ ( γ − 1 ( Y t )) σ ( γ − 1 ( Y t )) − 1 2 ∂ σ ∂ x ( γ − 1 ( Y t )) . The transformation of X t in to Y t is a necessary step to make the transition density of the transformed pro cess closer to a normal distribution, so that the Hermite expansion giv es reasonable results. How ever, the reader is warned that this is b y no means an easy task for many m ultiv ariate SDEs, and imp ossible for those ha ving non-reducible diffusion (see Aït-Sahalia (2008) for details). The use of a softw are with sym b olic algebra capabilities lik e e.g. Ma thema tica , Maple or Maxima is necessary to carry out the calculations. 6 4.2. Likeliho o d Appr oximation and Par ameter Estimation F or a reducible time-homogeneous SDMEM, the co efficien ts C ( k ) Y are obtained as in Section 4.1 by taking ( θ , b i ) , ∆ i j and ( x i j , x i j − 1 ) instead of θ , ∆ j and ( x j , x j − 1 ) , resp ec- tiv ely . Then p ( K ) X is substituted for the unknown transition densit y in (4), thus obtaining a sequence of appro ximations to the likelihoo d function L ( K ) ( θ , Ψ ) = M Y i =1 Z B p ( K ) X ( x i | b i , θ ) p B ( b i | Ψ ) d b i , (7) where p ( K ) X ( x i | b i , θ ) = n i Y j =1 p ( K ) X ( x i j , ∆ i j | x i j − 1 , b i , θ ) (8) and p ( K ) X is given b y equation (6). By maximizing (7) with respect to ( θ , Ψ ) , approximated MLEs ˆ θ ( K ) and ˆ Ψ ( K ) are obtained. In general, the in tegral in (7) does not ha v e a closed form solution, and therefore efficien t n umerical in tegration metho ds are needed; see Picchini et al. (2010) for the case of a single random effect ( q = 1 ). General purp ose approximation methods for one- or m ulti- dimensional in tegrals, irrespective of the random effects distribution, are av ailable (e.g. Krommer and Ueberhuber (1998)) and implemen ted in sev eral soft w are pac k ages, though the complexit y of the problem grows fast when increasing the dimension of B . Ho wev er, since exact transition densities or a closed-form approximation to p X are supp osed to be a v ailable, analytic expressions for the in tegrands in (3) or (7) are kno wn and the Laplace metho d (e.g. Sh un and McCullagh (1995)) ma y b e used. W rite b i = ( b i 1 , ..., b i q ) and define f ( b i | θ , Ψ ) = log p X ( x i | b i , θ ) + log p B ( b i | Ψ ) , (9) where p X ( x i | b i , θ ) is giv en in (4). Then log R B e f ( b i | θ , Ψ ) d b i can b e appro ximated using a second order T aylor series expansion, known as Laplace approximation: log Z B e f ( b i | θ , Ψ ) d b i ' f ( ˆ b i | θ , Ψ ) + q 2 log(2 π ) − 1 2 log − H ( ˆ b i | θ , Ψ ) where ˆ b i = arg max b i f ( b i | θ , Ψ ) , and H ( b i | θ , Ψ ) = ∂ 2 f ( b i | θ , Ψ ) /∂ b i ∂ b i T is the Hessian of f w.r.t. b i . Thus, the log-likelihoo d function is appro ximately given b y log L ( θ , Ψ ) ' M X i =1 f ( ˆ b i | θ , Ψ ) + q 2 log(2 π ) − 1 2 log − H ( ˆ b i | θ , Ψ ) (10) and the v alues of θ and Ψ maximizing (10) are approximated MLEs. F or the sp ecial case where − f ( b i | θ , Ψ ) is quadratic and conv ex in b i the Laplace approximation is ex- 7 act (Jo e (2008)) and (10) pro vides the exact lik eliho o d function. An appro ximation of log L ( K ) ( θ , Ψ ) can b e derived in the same wa y , and w e denote with ( ˆ θ ( K ) , ˆ Ψ ( K ) ) = arg min θ ∈ Θ , Ψ ∈ Υ − log L ( K ) ( θ , Ψ ) the corresp onding appro ximated MLE of ( θ , Ψ ) . Davidian and Giltinan (2003) recommend to use the Laplace approximation only if n i is “large”; ho wev er Ko and Davidian (2000) note that even if the n i ’s are small, the inferences should still b e v alid if the magnitude of in tra-individual v ariation is small relativ e to the inter-individual v ariation. This happens in man y applications, e.g. in pharmacokinetics. In general, computing (10) or log L ( K ) ( θ , Ψ ) is non-trivial, since M indep enden t opti- mization pro cedures m ust be run to obtain the ˆ b i ’s and then the M Hessians H ( ˆ b i | θ , Ψ ) m ust b e computed. The latter problem can b e solved using either (i) approximations based on finite differences, (ii) computing the analytic expressions of the Hessians using a sym b olic calculus program or (iii) using automatic differen tiation to ols (AD, e.g. Griew ank (2000)). W e recommend to av oid metho d (i) since it is computationally costly when the dimen- sion of M and/or b i gro ws, whereas metho ds (ii)-(iii) are reliable choices, since sym b olic pac k ages are b ecoming standard in most softw are and are anyw ay necessary to calculate an approximation p ( K ) X to the transition densit y . How ev er, when a symbolic pac k age is not a v ailable to the user or is not of help in some sp ecific situations, AD can b e a con venien t (if not the only p ossible) choice, esp ecially when the function to b e differen tiated is defined via a complex soft ware co de; see the Conclusion for a discussion. In order to derive the required Hessian automatically w e used the AD to ol ADiMat for Ma tlab (Bisc hof et al. (2005)), see http://www.autodiff.org for a comprehensive list of other AD softw are. F or example, assume that a user defined Ma tlab function named loglik_indiv computes the f function in (9) at a giv en v alue of the random effects b i (named b_rand ): result = loglik_indiv(b_rand) so result con tains the v alue of f at b i . The follo wing Ma tlab code then in vok es ADiMat and creates automatically a file named g_loglik_indiv containing the code necessary to return the exact (to mac hine precision) Hessian of loglik_indiv w.r.t. b_rand : addiff(@loglik_indiv, ’b_rand’,[], ’--2ndorderfwd’) A t this p oin t we initialize the arra y and the matrix that will con tain the gradien t and the Hessian of f w.r.t. the q -dimensional vector b i : gradient = createFullGradients(b_rand); % inizialize the gradient Hessian = createHessians([q q], b_rand); % inizialize the Hessian [Hessian, gradient] = g_loglik_indiv(Hessian, gradient, b_rand); The last line returns the desired Hessian and the gradien t of f ev aluated at b_rand . W e used the Hessian either to compute the Laplace appro ximation or in the trust region 8 metho d used for the in ternal step optimization, read b elo w. When it is p ossible to deriv e the expression for the Hessian analytically w e strongly recommend to av oid the use of AD to ols in order to sp eed up the estimation algorithm. F or example, in Section 5.1 only t wo random effects are considered and using the Ma tlab Sym b olic Calculus T o olbox w e ha v e obtained the analytic expression for the Hessian without m uch effort. F or the remainder of this Section the reference to the index K is dropp ed, except where necessary , as the following apply irresp ectiv ely of whether p X or p ( K ) X is used. The mini- mization of − log L ( θ , Ψ ) is a nested optimization problem. First the in ternal optimization step estimates the ˆ b i ’s for every unit (the ˆ b i ’s are sometimes kno wn in the literature as em- pirical Ba y es estimators). Since b oth symbolic calculus and AD tools provide exact results for the deriv ativ es of f ( b i ) , the v alues pro vided via AD being only limited b y the computer precision, the exact gradient and Hessian of f ( b i ) can b e used to minimize − f ( b i ) w.r.t. b i . W e used the subspace trust-region metho d describ ed in Coleman and Li (1996) and implemen ted in the Ma tlab fminunc function. The external optimization step minimizes − log L ( θ , Ψ ) w.r.t. ( θ , Ψ ) , after plugging the ˆ b i ’s in to (10). This is a computationally hea vy task, esp ecially for large M , b ecause the M internal optimization steps must b e p erformed for eac h infinitesimal v ariation of the parameters ( θ , Ψ ) . Therefore to p erform the external step we rev erted to deriv ative-free optimization metho ds, namely the Nelder- Mead simplex with additional chec ks on parameter b ounds, as implemented b y D’Errico (2006) for Ma tlab . T o sp eed up the algorithm con vergence, ˆ b i ( k ) ma y b e used as starting v alue for b i in the ( k + 1) th iteration of the in ternal step, where ˆ b i ( k ) is the estimate of b i at the end of the k th iteration of the in ternal step. This might not b e an optimal strategy , ho wev er it should improv e ov er the c hoice made b y some authors who use a v ector of zeros as starting v alue for b i eac h time the internal step is p erformed. The latter strategy may b e inefficient when dealing with highly time consuming problems, as it requires many more iterations. Once estimates for θ and Ψ are av ailable, estimates of the random parameters b i are automatically giv en b y the v alues of the ˆ b i at the last iteration of the external optimization step, see Section 5.2 for an example. 5. SIMULA TION STUDIES In this Section the efficacy of the metho d is assessed through Monte Carlo sim ulations under different exp erimen tal designs. W e alw ays choose M and n to b e not v ery large, since in most applications, e.g. in the biomedical context, large datasets are often una v ailable. Ho wev er, see Picc hini et al. (2008) for the application of a one-dimensional SDMEM on a v ery large data set. 5.1. Or ange T r e es Gr owth Mo del The follo wing is as a toy example for gro wth mo dels, where SDEs are used regularly , esp ecially to describ e animal gro wth that allows for non-monotone growth and can mo del unexp ected c hanges in gro wth rates, see Donnet et al. (2010) for an application to chic ken 9 gro wth, Strathe et al. (2009) for an application to gro wth of pigs, and Filipe et al. (2010) for an application to b o vine data. In Lindstrom and Bates (1990) and Pinheiro and Bates (2002, Sections 8.1.1-8.2.1), data from a study on the growth of orange trees are studied by means of deterministic nonlinear mixed-effects mo dels using the method proposed in Lindstrom and Bates (1990). The data are a v ailable in the Orange dataset provided in the nlme R pack age (Pinheiro et al. (2007)). This is a balanced design consisting of sev en measuremen ts of the circumference of eac h of fiv e orange trees. In these references, a logistic mo del was considered to study the relationship b et ween the circumference X i,j (mm), measured on the i th tree at age t ij (da ys), and the age ( i = 1 , ..., 5 and j = 1 , ..., 7 ): X i,j = φ 1 1 + exp( − ( t ij − φ 2 ) /φ 3 ) + ε ij (11) with φ 1 (mm), φ 2 (da ys) and φ 3 (da ys) all p ositiv e, and ε ij ∼ N (0 , σ 2 ε ) are i.i.d. mea- suremen t error terms. The parameter φ 1 represen ts the asymptotic v alue of X as time go es to infinit y , φ 2 is the time v alue at whic h X = φ 1 / 2 (the inflection p oint of the logis- tic mo del) and φ 3 is the time distance b et w een the inflection p oin t and the p oin t where X = φ 1 / (1 + e − 1 ) . In Picc hini et al. (2010) a SDMEM was deriv ed from mo del (11) with a normally distributed random effect on φ 1 . The lik eliho o d approximation describ ed in Section 4.2 was applied to estimate parameters, but using Gaussian quadrature instead of the Laplace metho d to solv e the one-di mensional in tegral. Now consider a SDMEM with random effects on b oth φ 1 and φ 3 . The dynamical mo del corresp onding to (11) for the i th tree and ignoring the error term is giv en by the following ODE dX i t dt = 1 ( φ 1 + φ i 1 )( φ 3 + φ i 3 ) X i t ( φ 1 + φ i 1 − X i t ) , X i 0 = x i 0 , t ≥ t i 0 with φ i 1 ∼ N (0 , σ 2 φ 1 ) indep enden t of φ i 3 ∼ N (0 , σ 2 φ 3 ) and b oth indep enden t of ε ij ∼ N (0 , σ 2 ε ) for all i and j . Now φ 2 only app ears in the deterministic initial condition X i 0 = X i t i 0 = φ 1 / (1 + exp[( φ 2 − t i 0 ) /φ 3 ]) , where t i 0 = 118 da ys for all the trees. In growth data it is often observ ed that the v ariance is prop ortional to the lev el, which is obtained in an SDE if the diffusion co efficien t is prop ortional to the square root of the pro cess itself. Consider a state-dep enden t diffusion co efficien t leading to the SDMEM: dX i t = 1 ( φ 1 + φ i 1 )( φ 3 + φ i 3 ) X i t ( φ 1 + φ i 1 − X i t ) dt + σ p X i t dW i t , X i 0 = x i 0 , (12) φ i 1 ∼ N (0 , σ 2 φ 1 ) , φ i 3 ∼ N (0 , σ 2 φ 3 ) , (13) where σ has units (mm / days) 1 / 2 . Th us, θ = ( φ 1 , φ 3 , σ ) , b i = ( φ i 1 , φ i 3 ) and Ψ = ( σ φ 1 , σ φ 3 ) . Since the random effects are indep enden t, the densit y p B in (9) is p B ( b i | Ψ ) = ϕ ( φ i 1 ) ϕ ( φ i 3 ) , where ϕ ( φ i 1 ) and ϕ ( φ i 3 ) are normal p dfs with means zero and standard deviations σ φ 1 and σ φ 3 , resp ectiv ely . 10 T able 1: Orange trees gro wth: Monte Carlo maxim um lik eliho od estimates and 95% confidence interv als from 1000 sim ulations of mo del (12)-(13), using an order K = 2 for the closed form density expansion (CFE) and the densit y approximation based on the Euler-Maruy ama discretization (EuM). F or the CFE metho d measures of symmetry are also rep orted. T rue parameter values φ 1 φ 3 σ σ φ 1 σ φ 3 ˆ φ 1 ˆ φ 3 ˆ σ ˆ σ φ 1 ˆ σ φ 3 M = 5 , n + 1 = 7 195 350 0.08 25 52.5 Mean CFE [95% CI] 197.40 [164.98, 236.97] 356.88 [281.74, 460.95] 0.079 [0.057, 0.102] 15.68 [ 1 . 7 × 10 − 7 , 44.33] 28.67 [ 5 . 8 × 10 − 8 , 112.28] Skewness CFE 0.35 0.59 0.22 0.55 1.11 Kurtosis CFE 3.30 3.56 3.06 2.76 3.88 Mean EuM [95% CI] 183.35 [154.62, 217.93] 303.75 [236.57, 398.10] 0.089 [0.060, 0.123] 12.60 [ 1 . 5 × 10 − 7 , 39.56] 34.96 [ 5 . 9 × 10 − 8 , 112.48] M = 5 , n + 1 = 20 195 350 0.08 25 52.5 Mean CFE [95% CI] 196.71 [164.48, 236.39] 352.16 [274.88, 461.84] 0.079 [0.067, 0.090] 15.73 [ 2 × 10 − 7 , 43.67] 30.77 [ 7 × 10 − 8 , 114.94] Skewness CFE 0.33 0.63 -0.04 0.44 1.03 Kurtosis CFE 3.33 3.67 2.93 2.38 3.69 Mean EuM [95% CI] 192.50 [161.45, 230.09] 339.12 [264.82, 445.84] 0.080 [0.068, 0.091] 15.01 [ 1 . 9 × 10 − 7 , 41.48] 32.63 [ 6 . 4 × 10 − 8 , 114.79] M = 30 , n + 1 = 7 195 350 0.08 25 52.5 Mean CFE [95% CI] 196.06 [183.41, 209.52] 354.55 [317.66, 395.48] 0.081 [0.072, 0.092] 22.71 [7.25, 33.45] 42.18 [ 1 . 5 × 10 − 4 , 73.84] Skewness CFE 0.20 0.32 0.14 -0.89 -0.65 Kurtosis CFE 3.15 3.29 3.14 5.33 3.20 Mean EuM [95% CI] 182.89 [172.02, 194.68] 303.87 [273.18, 341.84] 0.093 [0.080, 0.106] 19.23 [0.05, 27.82] 48.54 [12.13, 75.81] M = 30 , n + 1 = 20 195 350 0.08 25 52.5 Mean CFE [95% CI] 195.62 [183.33, 209.20] 351.18 [315.47, 389.21] 0.080 [0.075, 0.085] 23.04 [9.61, 34.03] 44.83 [ 2 . 2 × 10 − 4 , 74.65] Skewness CFE 0.20 0.30 0.05 -0.73 -0.71 Kurtosis CFE 3.10 3.27 2.76 5.08 3.62 Mean EuM [95% CI] 191.51 [179.93, 204.40] 338.19 [304.38, 374.94] 0.081 [0.076, 0.086] 22.24 [8.93, 32.83] 46.34 [ 4 . 4 × 10 − 4 , 75.03] W e generated 1000 datasets of dimension ( n + 1) × M from (12)-(13) and estimated ( θ , Ψ ) on eac h dataset, th us obtaining 1000 sets of parameter estimates. This w as rep eated for ( M , n + 1) = (5 , 7) , (5 , 20) , (30 , 7) and (30 , 20) . T ra jectories were generated using the Milstein scheme (Kloeden and Platen (1992)) with unit step size in the same time in terv al [118, 1582] as in the real data. The data were then extracted b y linear interpolation from the simulated tra jectories at the linearly equally spaced sampling times { t 0 , t 1 , ..., t n } for differen t v alues of n , where t 0 = 118 and t n = 1582 for ev ery n . An exception is the case M = 7 , n i = n = 5 , where { t 0 , ..., t n } = { 118 , 484 , 664 , 1004 , 1231 , 1372 , 1582 } , the same as in the data. P arameters w ere fixed at ( X 0 , φ 1 , φ 3 , σ, σ φ 1 , σ φ 3 ) = (30 , 195 , 350 , 0 . 08 , 25 , 52 . 5) . The v alue for σ φ 3 is chosen suc h that the co efficien t of v ariation for ( φ 3 + φ i 3 ) is 15%, i.e. φ i 3 has non-negligible influence. An order K = 2 approximation to the likelihoo d was used, see the App endix for the co efficien ts. The estimates ( ˆ θ (2) , ˆ Ψ (2) ) ha ve b een obtained as describ ed in Section 4.2 and are denoted as CFE in T able 1, where CFE stands for Closed F orm Expansion to denote that a closed-form transition density expansion technique has b een used. The CFE estimates w ere used to pro duce the fit for ( M , n + 1) = (5 , 20) given in Figure 1(a), rep orting simulated data and empirical mean of 5000 sim ulated tra jectories from (12)- (13) generated with the Milstein scheme using a step-size of unit length. Empirical 95% confidence bands of tra jectory v alues and three example tra jectories are also rep orted. F or eac h simulated tra jectory indep enden t realizations of φ i 1 and φ i 3 w ere pro duced b y dra wing from the normal distributions N (0 , ( ˆ σ (2) φ 1 ) 2 ) and N (0 , ( ˆ σ (2) φ 3 ) 2 ) . The corresp onding fit for ( M , n ) = (30 , 20) is giv en in Figure 1(b). There is a p ositive linear correlation ( r = 0 . 42 , p < 0 . 001 ) betw een the estimates of φ 1 and φ 3 , see Figure 2 rep orting also the least squares fit line. Similar relations w ere found when using differen t combinations of M and n . Histograms of the p opulation parameter estimates ˆ φ (2) 1 , ˆ φ (2) 3 and ˆ σ (2) are giv en in Figure 11 0 200 400 600 800 1000 1200 1400 1600 0 50 100 150 200 250 Time since December 31, 1968 (days) Trunk circumference (mm) (a) ( M , n + 1) = (5 , 20) 0 200 400 600 800 1000 1200 1400 1600 0 50 100 150 200 250 Time since December 31, 1968 (days) Trunk circumference (mm) (b) ( M , n + 1) = (30 , 20) Figure 1: Orange trees growth: simulated data (circles connected b y straight lines) and fit of the SDMEM (12)-(13) using an order K = 2 for the densit y expansion. In panel (a) is ( M , n + 1) = (5 , 20) and in panel (b) is ( M , n + 1) = (30 , 20) . Eac h panel rep orts the empirical mean curve (smooth solid line), 95% empirical confidence curves (dashed lines) and example sim ulated tra jectories. 170 175 180 185 190 195 200 205 210 215 220 280 300 320 340 360 380 400 420 440 φ 1 (2) φ 3 (2) Figure 2: Orange trees gro wth: scatterplot of ˆ φ (2) 3 vs ˆ φ (2) 1 and least squares fit for ( M , n + 1) = (30 , 20) . 12 170 175 180 185 190 195 200 205 210 215 220 0 20 40 60 80 100 120 140 φ 1 (2) (a) ˆ φ (2) 1 280 300 320 340 360 380 400 420 440 0 20 40 60 80 100 120 140 φ 3 (2) (b) ˆ φ (2) 3 0.075 0.08 0.085 0.09 0.095 0 20 40 60 80 100 120 140 σ (2) (c) ˆ σ (2) Figure 3: Orange trees growth: histogram of p opulation parameter estimates obtained using an order K = 2 for the density expansion, and fits of normal probability density functions for ( M , n + 1) = (30 , 20) . 3, with normal probability densit y functions fitted on top. The normal densities fit well to the histograms with estimated means (standard deviations) equal to 195.6 (6.6), 351.2 (19.2) and 0.080 (0.003) for ˆ φ (2) 1 , ˆ φ (2) 3 and ˆ σ (2) , resp ectiv ely . It is worth while to compare the metho dology presen ted here, where a closed form expan- sion to the transition densit y is used, with a more straightforw ard approac h, namely the ap- pro ximation to the transition density based on the so-called “one-step” Euler-Maruyama ap- pro ximation. Consider for ease of exp osition a scalar SDE dX t = µ ( X t ) dt + σ ( X t ) dW t start- ing at X 0 = x 0 . The SDE can b e appro ximated b y X t +∆ − X t = µ ( X t )∆ + σ ( X t )∆ 1 / 2 ε t +∆ for ∆ “small”, with { ε t +∆ } a sequence of indep enden t dra ws from the standard normal distribution. This leads to the following transition density appro ximation p X ( x t +∆ , ∆ | x t ) ≈ ϕ ( x t +∆ ; x t + µ ( x t )∆ , σ 2 ( x t )∆) (14) where ϕ ( · ; m, v ) is the p df of the normal distribution with mean m and v ariance v . The parameter estimates obtained using (14) instead of the closed-form appro ximation p ( K ) X are giv en in T able 1 and are denoted EuM (standing for Euler-Maruyama). The v alue for ∆ used in (14) is the time distance b et w een the simulated data p oin ts. The comparisons b et w een CFE and EuM for the same v alues of M and n hav e b een p erformed using the same simulated datasets. The quality of the estimates obtained with the CFE metho d compared to the simple EuM appro ximation is considerable impro ved for data sampled at lo w frequency ( ∆ large), i.e. when n = 7 , whic h is a common situation in applications. F or ( M = 30 , n = 7) the 95% confidence in terv als for the EuM metho d even fail to con tain the true parameter v alues. The bad b eha vior of the EuM approximation when ∆ is not s mall is well do cumen ted (e.g. Jensen and P oulsen (2002), Sørensen (2004)) and therefore our results are not surprising. Several exp erimen ts with different SDE models (not SDMEMs) ha ve b een conducted in Jensen and P oulsen (2002) where the conclusion is that although the CFE technique do es require tedious algebraic calculations, they seem to b e w orth the 13 effort. 5.2. Two-Dimensional Ornstein-Uhlenb e ck Pr o c ess The OU pro cess has found n umerous applications in biology , physics, engineering and finance, see e.g. Picchini et al. (2008) and Ditlevsen and Lansky (2005) for applications in neuroscience, or F av etto and Samson (2010) for a tw o-dimensional OU mo del describing the tissue micro v ascularization in anti-cancer therap y . Consider the follo wing SDMEM of a tw o-dimensi onal OU pro cess: dX (1) i t = − β 11 b i 11 ( X (1) i t − α 1 ) + β 12 b i 12 ( X (2) i t − α 2 ) dt + σ 1 dW (1) i t , (15) dX (2) i t = − β 21 b i 21 ( X (1) i t − α 1 ) + β 22 b i 22 ( X (2) i t − α 2 ) dt + σ 2 dW (2) i t , (16) b i ll 0 ∼ Γ( ν ll 0 , ν − 1 ll 0 ) , l , l 0 = 1 , 2; i = 1 , ..., M (17) with initial v alues X ( k ) i 0 = x ( k ) i 0 , k = 1 , 2 . Here Γ( r , s ) denotes the Gamma distribution with p ositiv e parameters r and s and probabilit y densit y function p Γ ( z ) = 1 s r Γ( r ) z r − 1 e − z /s , z ≥ 0 , with mean 1 when s = r − 1 . The parameters b i ll 0 , β ll 0 , σ l and ν ll 0 are strictly positive ( l , l 0 = 1 , 2 ) whereas α 1 and α 2 are real. Let ∗ denote elemen t-wise m ultiplication. Rewrite the system in matrix notation as d X i t = β ∗ b i ( α − X i t ) dt + σ d W i t , X i 0 = x i 0 , i = 1 , ..., M (18) where X i t = X (1) i t X (2) i t ! , β = β 11 β 12 β 21 β 22 , b i = b i 11 b i 12 b i 21 b i 22 , α = α 1 α 2 , σ = σ 1 0 0 σ 2 , W i t = W (1) i t W (2) i t ! , X i 0 = X (1) i 0 X (2) i 0 ! . The matrices β ∗ b i and σ are assumed to ha v e full rank. Assume the random effects are m utually indep enden t and indep enden t of X i 0 and W i t . Because of (17) the random effects ha ve mean one and therefore E ( β ∗ b i ) = β is the p opulation mean. The set of parameters to b e estimated in the external optimization step is θ = ( α 1 , α 2 , β 11 , β 12 , β 21 , β 22 , σ 1 , σ 2 ) and Ψ = ( ν 11 , ν 12 , ν 21 , ν 22 ) . Ho wev er, during the internal optimization step it is necessary to estimate the b i ’s also, that is 4 M parameters. Thus, the total n umber of parameters in the o verall estimation algorithm with internal and external steps is 12 + 4 M . A stationary solution to (18) exists when the real parts of the eigen v alues of β ∗ b i are strictly p ositiv e, i.e. β ∗ b i has to b e p ositiv e definite. The OU pro cess is one of the only 14 m ultiv ariate mo dels with a kno wn transition densit y other than multiv ariate mo dels whic h reduce to the sup erp osition of univ ariate pro cesses. The transition densit y of mo del (18) for a giv en realization of the random effects is the biv ariate Normal p X ( x i j , ∆ i j | x i j − 1 , b i , θ ) = (2 π ) − 1 | Ω | − 1 / 2 exp( − ( x i j − m ) T Ω − 1 ( x i j − m ) / 2) (19) with mean vector m = α + ( x i j − 1 − α ) exp( − ( β ∗ b i )∆ i j ) and cov ariance matrix Ω = λ − exp( − ( β ∗ b i )∆ i j ) λ exp( − ( β ∗ b i ) T ∆ i j ) , where λ = 1 2tr( β ∗ b i ) | β ∗ b i | | β ∗ b i | σ σ T + ( β ∗ b i − tr( β ∗ b i ) I ) σ σ T ( β ∗ b i − tr( β ∗ b i ) I ) T is the 2 × 2 matrix solution of the Lyapuno v equation ( β ∗ b i ) λ + λ ( β ∗ b i ) T = σ σ T and I is the 2 × 2 iden tity matrix (Gardiner (1985)). Here | A | denotes the determinan t and tr( A ) denotes the trace of a square matrix A . >F rom (18) we generated 1000 datasets of dimension 2( n + 1) × M and estimated the parameters using the proposed approximated metho d, thus obtaining 1000 sets of parameter estimates. A dataset consists of 2( n + 1) observ ations at the equally spaced sampling times { 0 = t i 0 < t i 1 < · · · < t i n = 1 } for eac h of the M exp erimen ts. The observ ations are obtained b y linear in terp olation from sim ulated tra jectories using the Euler-Maruy ama sc heme with step size equal to 10 − 3 (Klo eden and Platen (1992)). W e used the following set-up: ( X (1) i 0 , X (2) i 0 ) =(3, 3), ( α 1 , α 2 , β 11 , β 12 , β 21 , β 22 , σ 1 , σ 2 ) =(1, 1.5, 3, 2.5, 1.8, 2, 0.3, 0.5,), and ( ν 11 , ν 12 , ν 21 , ν 22 ) =(45, 100, 100, 25). An order K = 2 appro ximation to the lik eliho od was used, see the App endix for the co efficien ts of the transition densit y expansion. The estimates ( ˆ θ (2) , ˆ Ψ (2) ) are giv en in T able 2. The fit for ( M , 2( n + 1)) = (20 , 40) is given in Figures 4(a)-4(b) for X (1) i t and X (2) i t , resp ectiv ely . Eac h figure rep orts the sim ulated data, the empirical mean of 5000 simulated tra jectories from (15)-(17), generated with the Euler-Maruy ama sc heme using a step size of length 10 − 3 , the empirical 95% confidence bands of tra jectory v alues as well as fiv e example tra jectories. F or eac h sim ulated tra jectory a realization of b i ll 0 w as pro duced b y dra wing from the Γ( ν (2) ll , ( ν (2) ll 0 ) − 1 ) distribution using the estimates given in T able 2. The empirical correlations of the p opulation parameter estimates is rep orted in Figure 5. There is a strong negativ e correlation b et ween the estimates of α 1 and α 2 ( r = − 0 . 97 , p < 0 . 001 ), whic h are the asymptotic means for X (1) i t and X (2) i t . The sum ˆ α (2) 1 + ˆ α (2) 2 results alw ays almost exactly equal to 2.5 in eac h dataset ( mean = 2 . 50 , standard deviation = 0.04), so the sum is more precisely determined than eac h mean parameter. This o ccurs b ecause there is a strong negativ e correlation b et w een X (1) i t and X (2) i t equal to -0.898 in the stationary distribution in this n umerical example. The individual mean parameters are un biased but with standard deviations fiv e times larger than the sum. There is a mo derate negativ e correlation b et w een β 21 and β 22 ( r = − 0 . 53 , p < 0 . 001 ). The estimation metho d pro vides estimates for the b i ’s also, given by the last v alues returned b y the internal optimization step in the last round of the ov erall algorithm. An 15 T able 2: Ornstein-Uhlenbeck mo del: Monte Carlo maxim um lik eliho od estimates and 95% confidence in terv als from 1000 sim ulations of model (18), using an order K = 2 densit y expansion. T rue parameter v alues Estimates for M = 7 , 2( n + 1) = 40 α 1 α 2 β 11 β 12 ˆ α (2) 1 ˆ α (2) 2 ˆ β (2) 11 ˆ β (2) 12 1 1.5 3 2.5 Mean [95% CI] 1.00 [0.59, 1.40] 1.50 [1.00, 1.97] 3.03 [2.50, 3.59] 2.50 [2.49, 2.51] Skewness 0.19 -0.32 0.21 3.17 Kurtosis 5.27 5.73 3.29 59.60 β 21 β 22 σ 1 σ 2 ˆ β (2) 21 ˆ β (2) 22 ˆ σ (2) 1 ˆ σ (2) 2 1.8 2 0.3 0.5 Mean [95% CI] 1.61 [0.80, 2.14] 2.30 [1.56, 3.63] 0.307 [0.274, 0.339] 0.500 [0.494, 0.508] Skewness -1.13 1.17 0.15 -1.10 Kurtosis 5.45 4.77 3.05 46.62 ν 11 ν 12 ν 21 ν 22 ˆ ν (2) 11 ˆ ν (2) 12 ˆ ν (2) 21 ˆ ν (2) 22 45 100 100 25 Mean [95% CI] 104.79 [16.63, 171.62] 120.97 [5.90, 171.62] 105.97 [2.02, 171.62] 98.60 [4.92, 171.62] Skewness -0.10 -0.72 -0.35 -0.17 Kurtosis 1.26 2.13 1.74 1.25 T rue parameter v alues Estimates for M = 20 , 2( n + 1) = 40 α 1 α 2 β 11 β 12 ˆ α (2) 1 ˆ α (2) 2 ˆ β (2) 11 ˆ β (2) 12 1 1.5 3 2.5 Mean [95% CI] 1.00 [0.72, 1.27] 1.50 [1.19, 1.83] 3.01 [2.71, 3.32] 2.50 [2.50, 2.50] Skewness -0.13 -0.17 0.14 0.01 Kurtosis 7.01 6.84 3.32 34.66 β 21 β 22 σ 1 σ 2 ˆ β (2) 21 ˆ β (2) 22 ˆ σ (2) 1 ˆ σ (2) 2 1.8 2 0.3 0.5 Mean [95% CI] 1.71 [1.26, 2.03] 2.13 [1.72, 2.74] 0.307 [0.289, 0.327] 0.500 [0.495, 0.503] Skewness -1.05 0.80 -0.01 1.69 Kurtosis 5.23 3.96 3.00 41.16 ν 11 ν 12 ν 21 ν 22 ˆ ν (2) 11 ˆ ν (2) 12 ˆ ν (2) 21 ˆ ν (2) 22 45 100 100 25 Mean [95% CI] 83.35 [22.15, 171.62] 114.16 [18.18, 171.62] 105.00 [6.04, 171.62] 84.61 [7.36, 171.62] Skewness 0.66 -0.35 -0.26 0.24 Kurtosis 1.83 1.86 1.84 1.26 0 0.2 0.4 0.6 0.8 1 0 0.5 1 1.5 2 2.5 3 3.5 t X t (1) (a) X (1) i t 0 0.2 0.4 0.6 0.8 1 0.5 1 1.5 2 2.5 3 3.5 t X t (2) (b) X (2) i t Figure 4: Ornstein-Uhlen b ec k: simulated data (circles connected by straight lines), fit of X (1) i t (panel (a)) and of X (2) i t (panel (b)) from the SDMEM (18) for ( M , 2( n + 1)) = (7 , 40) . F or each co ordinate of the system the panels rep ort the empirical mean curve of the SDMEM (smooth solid line), 95% empirical confidence curv es (dashed lines) and five simulated tra jectories. 16 0.4 0.5 0.6 0.25 0.3 0.35 0 2 4 0 2 4 2.4 2.5 2.6 2 3 4 0 2 4 0 1 2 0.4 0.5 0.6 0.25 0.3 0.35 0 2 4 0 2 4 2.4 2.5 2.6 2 3 4 0 2 4 0 1 2 α 1 α 2 β 11 β 12 β 21 β 22 σ 1 σ 2 α 1 α 2 β 11 β 12 β 21 β 22 σ 1 σ 2 Figure 5: Ornstein-Uhlenbeck: scatterplot matrix of the estimates ( ˆ α (2) 1 , ˆ α (2) 2 , ˆ β (2) 11 , ˆ β (2) 12 , ˆ β (2) 21 , ˆ β (2) 22 , ˆ σ (2) 1 , ˆ σ (2) 2 ) for ( M , 2( n + 1)) = (20 , 40) . equiv alent strategy is to plug ( ˆ θ (2) , ˆ Ψ (2) ) in to (9) and then minimize − f ( b i ) w.r.t. b i and obtain ˆ b i (2) . The estimation of the random effects is fast b ecause we mak e use of the explicit Hessian, and for this example only 2-3 iterations of the internal step algorithm w ere necessary . W e estimated the b i ’s b y plugging eac h of the 1000 sets of estimates in to (9), th us obtaining the corresp onding 1000 sets of estimates of b i . In Figure 6 b o xplots of the estimates of the four random effects are rep orted for M = 7 , where estimates from different units ha ve b een p o oled together. F or b oth M = 7 and M = 20 the estimates of the random effects ha ve sample means equal to one, as it should b e giv en the distributional h yp othesis. The standard deviations of the true random effects are giv en by 1 / √ ν ll and th us equal 0.15, 0.1, 0.1 and 0.2 for b i 11 , b i 12 , b i 21 and b i 22 , resp ectiv ely . The empirical standard deviations of the estimated random effects for M = 7 are 0.09, 0.09, 0.12 and 0.11, whereas for M = 20 they are 0.09, 0.06, 0.08 and 0.09. The parameters could b e estimated b y plugging the exact transition densit y (19) in to (4) to form (9) and then maximize (10). Ho w ever, the effort required for the estimation algorithm to con verge is computationally costly , b oth using the analytic expression for the Hessian of f in (10) or the one obtained using AD, since the Hessian has a huge expression when using the exact transition densit y . This problem is not present when using the densit y 17 0.5 1 1.5 2 2.5 b_11 b_12 b_21 b_22 Figure 6: Ornstein-Uhlenbeck: b o xplots of the random effects estimates ˆ b i (2) for the SDMEM (18) for ( M , 2( n + 1)) = (7 , 40) . expansion b ecause the expansion consists of polynomials of the parameters. 5.3. The squar e r o ot SDMEM The square ro ot process is given b y dX t = − β ( X t − α ) dt + σ p X t dW t . This pro cess is ergo dic and its stationary distribution is the Gamma distribution with shap e parameter 2 β α/σ 2 and scale parameter σ 2 / (2 β ) provided that β > 0 , α > 0 , σ > 0 , and 2 β α ≥ σ 2 . The pro cess has many applications: it is, for instance, used in mathematical finance to mo del sh ort term interest rates where it is called the CIR pro cess, see Co x et al. (1985). It is also a particular example of an integrate-and-fire model used to describ e the ev olution of the mem brane p oten tial in a neuron b et ween emission of electrical impulses, see e.g. Ditlevsen and Lansky (2006) and references therein. In the neuronal literature it is called the F eller pro cess, b ecause William F eller prop osed it as a mo del for p opulation gro wth in 1951. Consider the SDMEM dX i t = − ˜ β i ( X i t − α − α i ) dt + ˜ σ i p X i t dW i t , i = 1 , ..., M . (20) 18 Assume α i ∼ B ( p α , p α ) , ˜ σ i ∼ LN ( p σ 1 , p 2 σ 2 ) and ˜ β i ∼ LN ( p β 1 , p 2 β 2 ) . Here LN ( · , · ) denotes the (standard or 2-parameter) log-normal distribution and B ( p α , p α ) denotes the (general- ized symmetric) Beta distribution on the in terv al [ a, b ] , with densit y function p B ( z ) = 1 B ( p α , p α ) ( z − a ) p α − 1 ( b − z ) p α − 1 ( b − a ) 2 p α − 1 , p α > 0 , a ≤ z ≤ b, where B ( · , · ) is the b eta function and a and b are known constants. F or ease of in ter- pretation, assume the individual parameters ˜ β i and ˜ σ i to hav e unknown means β and σ resp ectiv ely , e.g. assume ˜ β i = β + β i and ˜ σ i = σ + σ i , β i and σ i b eing zero mean random quan tities. This implies that β and σ do not need to b e estimated directly: in fact the estimate for β results determined via the momen t relation β = exp( p β 1 + p 2 β 2 / 2) and can b e calculated once estimates for p β 1 and p β 2 are av ailable. Similarly , an estimate for σ can b e determined via σ = exp( p σ 1 + p 2 σ 2 / 2) b y plugging in the estimates for p σ 1 and p σ 2 . The parameters to b e estimated are θ = α , Ψ = ( p α , p β 1 , p β 2 , p σ 1 , p σ 2 ) and b i = ( α i , ˜ β i , ˜ σ i ) . T o ensure that X i t sta ys p ositiv e it is required that 2( α + α i ) ˜ β i / ( ˜ σ i ) 2 ≥ 1 . This condition must be c hec ked in eac h iteration of the estimation algorithm. The means and v ariances of the p opulation parameters added the random effects are E ( α + α i ) = α + ( a + b ) / 2 ; V ar ( α + α i ) = ( b − a ) 2 / (4(2 p α + 1)) , E ( ˜ σ i ) = σ = exp( p σ 1 + p 2 σ 2 / 2) ; V ar ( ˜ σ i ) = (exp( p 2 σ 2 ) − 1) exp(2 p σ 1 + p 2 σ 2 ) , E ( ˜ β i ) = β = exp( p β 1 + p 2 β 2 / 2) ; V ar ( ˜ β i ) = (exp( p 2 β 2 ) − 1) exp(2 p β 1 + p 2 β 2 ) . F or fixed v alues of the random effects, the asymptotic mean for the exp erimen tal unit i is α + α i . In most applications this v alue should b e b ounded within physical realistic v alues, and th u s the Beta distribution w as c hosen for α i , since the supp ort of the distribution of α + α i is then [ α + a, α + b ] . As in the previous examples, 1000 sim ulations w ere p erformed b y generating equidistan t observ ations in the time in terv al [0 , 1] with the following setup: ( X i 0 , α, p α , p β 1 , p β 2 , p σ 1 , p σ 2 ) = (1 , 3 , 5 , 0 , 0 . 25 , 0 . 1 , 0 . 3) with fixed constan ts [ a, b ] = [0 . 1 , 5] . The coefficient of v ariations for α + α i , ˜ β i and ˜ σ i are then 13.3%, 25.4% and 30.7%, resp ectiv ely . The estimates obtained using an order K = 2 densit y expansion are giv en in T able 3. A p ositiv e bias for ˆ α (2) is noticeable, ho wev er results are ov erall satisfactory , ev en using small sample sizes. Bias in estimates of drift parameters on finite observ ation in terv als is a well kno wn problem, and especially the sp eed parameter β in mean reverting diffusion mo dels is kno wn to b e biased and highly v ariable. In T ang and Chen (2009) the biases for β in the OU and the square ro ot mo del are calculated to b e on the order of T , where T is the length of the observ ation interv al, and th us increasing n do es not impro v e the estimates unless the observ ation interv al is also increased. As describ ed in the Ornstein-Uhlenbeck example, w e ha ve v erified that the small sample distributions for the estimates of α i , ˜ β i and ˜ σ i ha ve the exp ected c haracteristics: e.g. in the case ( M , n + 1) = (5 , 7) , b y p ooling together estimates from differen t units we hav e the follo wing means (standard deviations) 2.61 (1.40), 0.95 (0.38) and 1.06 (0.29) for the estimates of α i , ˜ β i and ˜ σ i , resp ectiv ely . These v alues match w ell with the first moments 19 T able 3: The square ro ot model: Mon te Carlo maximum likelihoo d estimates and 95% confidence interv als from 1000 simulations of model (20), using an order K = 2 density expansion. Determined parameters are denoted with (*), i.e. true v alues for β and σ are determined according to the momen t relations β = exp( p β 1 + p 2 β 2 / 2) and σ = exp( p σ 1 + p 2 σ 2 / 2) . Estimates for determined parameters are calculated by plugging the estimates of p β 1 , 2 and p σ 1 , 2 obtained from eac h of the 1000 Monte Carlo simulations into the momen t relations, then av eraging o v er the 1000 determined v alues. T rue parameter values Estimates for M = 5 , n + 1 = 7 α β (*) σ (*) p α ˆ α (2) ˆ β (2) (*) ˆ σ (2) (*) ˆ p (2) α 3 1.03 1.16 5 Mean [95% CI] 4.06 [1,52, 9.85] 1.14 [0.51, 1.69] 1.13 [0.76, 1.60] 8.80 [0.94, 112.84] Skewness 1.23 0.49 0.60 4.37 Kurtosis 4.46 3.20 4.72 22.47 p β 1 p β 2 p σ 1 p σ 2 ˆ p (2) β 1 ˆ p (2) β 2 ˆ p (2) σ 1 ˆ p (2) σ 2 0 0.25 0.1 0.3 Mean [95% CI] -0.038 [-0.824, 0.177] 0.372 [0.001, 1.000] 0.082 [-0.278, 0.450] 0.173 [0.001, 0.561] Skewness -2.92 0.79 -0.20 0.88 Kurtosis 16.67 2.10 4.16 4.00 T rue parameter values Estimates for M = 10 , n + 1 = 20 α β (*) σ (*) p α ˆ α (2) ˆ β (2) (*) ˆ σ (2) (*) ˆ p (2) α 3 1.03 1.16 5 Mean [95% CI] 4.43 [1.85, 9.63] 1.21 [0.44, 1.69] 1.15 [0.88, 1.48] 5.31 [0.99, 64.63] Skewness 1.01 -0.04 0.44 6.35 Kurtosis 3.65 2.66 3.44 46.77 p β 1 p β 2 p σ 1 p σ 2 ˆ p (2) β 1 ˆ p (2) β 2 ˆ p (2) σ 1 ˆ p (2) σ 2 0 0.25 0.1 0.3 Mean [95% CI] -0.045 [-0.953, 0.154] 0.487 [0.001, 1] 0.108 [-0.166, 0.376] 0.153 [0.001, 0.447] Skewness -3.04 0.16 -0.03 0.71 Kurtosis 14.65 1.40 3.29 2.84 T rue parameter values Estimates for M = 20 , n + 1 = 20 α β (*) σ (*) p α ˆ α (2) ˆ β (2) (*) ˆ σ (2) (*) ˆ p (2) α 3 1.03 1.16 5 Mean [95% CI] 4.00 [2.35, 6.78] 1.21 [0.97, 1.68] 1.15 [0.98, 1.35] 2.33 [1.00, 5.00] Skewness 1.50 0.27 0.27 12.78 Kurtosis 6.74 2.98 3.38 174.05 p β 1 p β 2 p σ 1 p σ 2 ˆ p (2) β 1 ˆ p (2) β 2 ˆ p (2) σ 1 ˆ p (2) σ 2 0 0.25 0.1 0.3 Mean [95% CI] -0.011 [-0.069, 0.042] 0.498 [0.010, 1.000] 0.101 [-0.061, 0.256] 0.27 [0.13, 0.41] Skewness -5.97 0.22 -0.02 -0.01 Kurtosis 47.98 1.59 3.17 3.17 of the true random effects E ( α i ) = (0 . 1 + 5) / 2 = 2 . 55 , E ( ˜ β i ) = 1 . 03 and E ( ˜ σ i ) = 1 . 16 and less well with the standard deviations SD α i = 0 . 74 , SD ˜ β i = 0 . 26 and SD ˜ σ i = 0 . 35 . A v erage estimation time on a dataset of dimension ( M , n + 1) = (10 , 20) w as around 95 seconds and around 160 seconds when ( M , n + 1) = (20 , 20) , using a Ma tlab program on an Intel Core 2 Quad CPU (3 GHz). 6. CONCLUSIONS An estimation metho d for population models defined via SDEs, incorp orating random effects, has been prop osed and ev aluated through simulations. SDE models with ran- dom effects hav e rarely b een studied, as it is still non-trivial to estimate parameters in SDEs, even on single/individual tra jectories, due to difficulties in deriving analytically the transition densities and the computational cost required to appro ximate the densities n u- merically . Approximation metho ds for transition densities is an imp ortan t research topic, since a go od appro ximation is necessary to carry out inferences based on the likelihoo d function, whic h guaran tees well known optimal prop erties for the resulting estimators. Of the several approximate metho ds prop osed in the last decades (see e.g. Sørensen (2004) and Hurn et al. (2007) for reviews) here we hav e considered the one suggested b y Aït-Sahalia (2008) for the case of multidimensional SDEs, since it results in an accurate closed-form appro ximation for p X (Jensen and P oulsen (2002)). In this w ork SDEs with m ultiple random effects ha ve b een studied, mo ving a step 20 forw ard with resp ect to the results presented in Picc hini et al. (2010), where Gaussian quadrature w as used to solv e the in tegrals for a single random effect. The latter approac h results unfeasible when there are sev eral random effects because the dimension of the in tegral gro ws. In fact, it ma y b e difficult to n umerically ev aluate the in tegral in (3) and (7) when b i ∈ B ⊆ R q , with q m uch larger than 2, and efficient n umerical algorithms are needed. As noted by Bo oth et al. (2001), if e.g. q = 20 one cannot count on standard statistical soft ware to maximize the lik eliho od, and numerical integration quadrature is only an option if the dimension of the in tegral is lo w, whereas it quickly becomes unreliable when the dimension gro ws. Some references are the review pap er b y Co ols (2002), Krommer and Ueb erh ub er (1998) and references therein, or one of the several monographs on Mon te Carlo metho ds (e.g. Ripley (2006)). In the mixed-effects framework the amount of literature dev oted to the ev aluation of q -dimensional integrals is large, see e.g. Da vidian and Giltinan (2003), Pinheiro and Bates (1995), McCullo ch and Searle (2001) and Pinheiro and Chao (2006). W e decided to use the Laplace approximation because using a symbolic calculus soft ware it is relatively easy to obtain the Hessian matrix necessary for the calculations, whic h results useful also to sp eed up the optimization algorithm. Computing deriv atives of long expressions can be a tedious and error prone task even with the help of a softw are for sym b olic calculus. In those cases we rev erted to softw are for automatic differen tiation (AD, e.g. Griewank (2000)). Although the presen t w ork do es not necessarily rely on AD to ols, it is worth while to sp end a few words to describ e roughly what AD is, since it is relatively unkno wn in the statistical comm unity ev en if it has already b een applied in the mixed-effects field (Sk aug (2002); Sk aug and F ournier (2006)). AD should not b e confused with sym b olic calculus since it do es not produce analytic expressions for the deriv ativ es/Hessians of a given function, i.e. it do es not pro duce expressions mean t to b e understo od by the human ey e. Instead, given a program computing some function h ( u ) , the application of AD on h ( u ) pro duces another program implementing the calculations necessary to compute gradients, Hessians etc. of h ( u ) exactly (to machine precision); furthermore, AD can differen tiate programs including e.g. for lo ops or if-else statemen ts, whic h are outside the scope of symbolic differen tiation. See http://www. autodiff.org for a list of AD soft ware tools. Ho wev er, the p ossibilit y of easily deriving gradien ts and Hessians using AD comes at a price. The co de pro duced b y AD to compute the deriv atives of h ( u ) ma y result so long and complex that it might affect negatively the p erformance of the ov erall estimation pro cedure, when inv ok ed into an optimization pro cedure. Thus, w e suggest to use analytic expressions whenever possible. How ever, at the v ery least, an AD program can still b e useful to c heck whether analytically obtained results are correct or not. Mo dellers and practitioners might consider the soft ware AD Mo del Builder (ADMB Pro ject (2009)), pro viding a framework in tegrating AD, mo del building and data fitting to ols, whic h comes with its o wn mo dule for mixed-effects mo delling. This w ork has a n umber of limitations, mostly due to the difficulty in carrying out the closed-form appro ximation to the lik eliho o d for m ultidimensional S DEs ( d ≥ 2 ). It is ev en more difficult when the diffusion is not reducible, although mathematical metho ds to treat this case are a v ailable (Aït-Sahalia (2008)). Another limitation is that measurement error is not mo delled, which is a problem if this noise source is not negligible relatively 21 to the system noise. The R PSM pack age is capable of mo delling measuremen t error and uses the Extended Kalman Filter (EKF) to estimate SDMEMs (Klim et al. (2009)). EKF pro vides appro ximations for the individual lik eliho o ds which are exact only for linear SDEs. The closed-form expansion considered in the presen t w ork can instead provide an approximation as go o d as desired to the individual likelihoo d (8) by increasing the order K of the expansion, though it can b e a tedious task. Lik e in the present pap er, PSM considers a Laplace approximation to multidimensional integrals, but Hessians are obtained using an approximation to the second order deriv atives (first order conditional estimation, FOCE); in our work Hessians are obtained exactly (to machine precision) using automatic differen tiation. Unfortunately , the structural differences b etw een our metho d and PSM mak e a rigorous comparison betw een the t w o methods imp ossible, even simply in terms of computational times, since PSM requires the specification of a measurement error factor and th us b oth the observ ations and the num b er of parameters considered in the estimation are different. Finally , PSM assumes multiv ariate normally distributed effects only , whereas in our metho d this restriction is not necessary . W e b eliev e the class of SDMEMs to b e useful in applications, esp ecially in those areas where mixed-effects theory is used routinely , e.g. in biomedical and pharmacoki- netic/pharmaco dynamic studies. F rom a theoretical point of view SDMEMs are necessary when analyzing rep eated measurements data if b oth the v ariabilit y b et w een exp eriments to obtain more precise estimates of p opulation characteristics, as well as sto c hasticity in the individual dynamics should b e tak en in to account. App endixA. REDUCIBILITY AND DENSITY EXP ANSION COEFFICIENTS App endixA.1. R e ducible diffusions The following is a necessary and sufficien t condition for the reducibility of a multiv ariate diffusion pro cess (Aït-Sahalia (2008)): Prop osition 1. The diffusion X is r e ducible if and only if d X q =1 ∂ σ ik ( x ) ∂ x ( q ) σ q j ( x ) = d X q =1 ∂ σ ij ( x ) ∂ x ( q ) σ q k ( x ) for e ach x in E and triplet ( i, j, k ) = 1 , ..., d . If σ is nonsingular, then the c ondition is ∂ { σ − 1 } ij ( x ) ∂ x ( k ) = ∂ { σ − 1 } ik ( x ) ∂ x ( j ) wher e { σ − 1 } ij ( x ) is the ( i, j ) -th element of σ − 1 ( x ) . App endixA.2. Gener al expr essions for the density exp ansion c o efficients Here are rep orted the explicit expressions for the coefficients of the log-densit y expan- sion (6) as given in Aït-Sahalia (2008). The use of a symbolic algebra program is advised 22 for the practical calculation of the co efficien ts. F or t wo given d -dimensional v alues y and y 0 of the pro cess Y t = γ ( X t ) the co efficien ts of the log-densit y expansion are given by C ( − 1) Y ( y | y 0 ) = − 1 2 d X h =1 ( y ( h ) − y ( h ) 0 ) 2 , C (0) Y ( y | y 0 ) = d X h =1 ( y ( h ) − y ( h ) 0 ) Z 1 0 µ ( h ) Y ( y 0 + u ( y − y 0 )) du C ( k ) Y ( y | y 0 ) = k Z 1 0 G ( k ) Y ( y 0 + u ( y − y 0 ) | y 0 ) u k − 1 du. for k ≥ 1 . The functions G ( k ) Y are giv en by G (1) Y ( y | y 0 ) = − d X h =1 ∂ µ Y ( h ) ( y ) ∂ y ( h ) − d X h =1 µ Y ( h ) ( y ) ∂ C (0) Y ( y | y 0 ) ∂ y ( h ) + 1 2 d X h =1 ∂ 2 C (0) Y ( y | y 0 ) ∂ y ( h ) 2 + ∂ C (0) Y ( y | y 0 ) ∂ y ( h ) 2 and for k ≥ 2 G ( k ) Y ( y | y 0 ) = − d X h =1 µ Y ( h ) ( y ) ∂ C ( k − 1) Y ( y | y 0 ) ∂ y ( h ) + 1 2 d X h =1 ∂ 2 C ( k − 1) Y ( y | y 0 ) ∂ y ( h ) 2 + 1 2 d X h =1 k − 1 X h 0 =0 k − 1 h 0 ∂ C ( h 0 ) Y ( y | y 0 ) ∂ y ( h ) ∂ C ( k − 1 − h 0 ) Y ( y | y 0 ) ∂ y ( h ) . App endixA.3. Co efficients of the or ange tr e es gr owth SDMEM In mo del (12)-(13) is Y i t = γ ( X i t ) = 2 p X i t /σ so µ Y ( Y i t ) = Y i t ( φ 1 + φ i 1 − σ 2 Y i t 2 / 4) / (2( φ 3 + φ i 3 )( φ 1 + φ i 1 )) − 1 / (2 Y i t ) , and for giv en v alues y i j and y i j − 1 of Y i t , w e hav e C ( − 1) Y ( y i j | y i j − 1 ) = − 1 2 ( y i j − y i j − 1 ) 2 C (0) Y ( y i j | y i j − 1 ) = − σ 2 ( y i j 4 − y i j − 1 4 ) 32( φ 3 + φ i 3 )( φ 1 + φ i 1 ) + ( y i j 2 − y i j − 1 2 ) 4( φ 3 + φ i 3 ) − 1 2 log y i j y i j − 1 23 C (1) Y ( y i j | y i j − 1 ) = − σ 4 y i j 6 + y i j 5 y i j − 1 + y i j 4 y i j − 1 2 + ( y i j y i j − 1 ) 3 + y i j 2 y i j − 1 4 + y i j y i j − 1 5 + y i j − 1 6 896( φ 3 + φ i 3 ) 2 ( φ 1 + φ i 1 ) 2 + σ 2 (10( φ 3 + φ i 3 )( y i j 2 + y i j y i j − 1 + y i j − 1 2 ) + 3( y i j 5 + y i j 2 y i j − 1 + y i j − 1 3 )) 240( φ 3 + φ i 3 ) 2 ( φ 1 + φ i 1 ) − 9( φ 3 + φ i 3 ) 2 + y i j y i j − 1 ( y i j 2 + y i j y i j − 1 + y i j − 1 2 ) 24 y i j y i j − 1 ( φ 3 + φ i 3 ) 2 C (2) Y ( y i j | y i j − 1 ) = − σ 4 (5( y i j 4 + y i j − 1 4 ) + 8 y i j y i j − 1 ( y i j 2 + y i j − 1 2 ) + 9 y i j 2 y i j − 1 2 ) 896( φ 3 + φ i 3 ) 2 ( φ 1 + φ i 1 ) 2 + σ 2 9( y i j 2 + y i j − 1 2 ) + 12 y i j y i j − 1 + 10( φ 3 + φ i 3 ) 240( φ 3 + φ i 3 ) 2 ( φ 1 + φ i 1 ) − ( y i j 2 y i j − 1 2 + 9( φ 3 + φ i 3 ) 2 ) 24 y i j 2 y i j − 1 2 ( φ 3 + φ i 3 ) 2 and p (2) X ( x i j , ∆ i j | x i j − 1 ) = 1 q 2 π σ 2 ∆ i j x i j exp − 2 q x i j − q x i j − 1 2 σ 2 ∆ i j + ˜ C (0) ( x i j | x i j − 1 ) + ˜ C (1) ( x i j | x i j − 1 )∆ i j + ∆ i j 2 2 ˜ C (2) ( x i j | x i j − 1 ) where ˜ C ( k ) ( x i j | x i j − 1 ) = C ( k ) Y 2 √ x i j σ 2 √ x i j − 1 σ , k = 0 , 1 , 2 . App endixA.4. Co efficients of the two-dimensional OU SDMEM The pro cess (15)-(16) is reducible and γ ( x i ) = σ − 1 x i = ( x (1) i /σ 1 , x (2) i /σ 2 ) T , so d Y i t = σ − 1 ( β ∗ b i ) α − σ − 1 ( β ∗ b i ) σ Y i t dt + d W i t := κ i ( η − Y i t ) dt + d W i t where η = σ − 1 α = ( η 1 , η 2 ) T and κ i = σ − 1 ( β ∗ b i ) σ = { κ i q ,q 0 } q ,q 0 =1 , 2 . If y i j = ( y (1) i j , y (2) i j ) T and y i j − 1 = ( y (1) i j − 1 , y (2) i j − 1 ) T are tw o v alues from Y i t , the co efficien ts of the order K = 2 densit y 24 expansion (6) for mo del (15)-(16) are giv en b y: C ( − 1) Y ( y i j | y i j − 1 ) = − 1 2 ( y (1) i j − y (1) i j − 1 ) 2 − 1 2 ( y (2) i j − y (2) i j − 1 ) 2 , C (0) Y ( y i j | y i j − 1 ) = − 1 2 ( y (1) i j − y (1) i j − 1 )(( y (1) i j + y (1) i j − 1 − 2 η 1 ) κ i 11 + ( y (2) i j + y (2) i j − 1 − 2 η 2 ) κ i 12 ) − 1 2 ( y (2) i j − y (2) i j − 1 )(( y (1) i j + y (1) i j − 1 − 2 η 1 ) κ i 21 + ( y (2) i j + y (2) i j − 1 − 2 η 2 ) κ i 22 ) , C (1) Y ( y i j | y i j − 1 ) = 1 2 ( κ i 11 − (( y (1) i j − 1 − η 1 ) κ i 11 + ( y (2) i j − 1 − η 2 ) κ i 12 ) 2 ) + 1 2 ( κ i 22 − (( y (1) i j − 1 − η 1 ) κ i 21 + ( y (2) i j − 1 − η 2 ) κ i 22 ) 2 ) − 1 2 ( y (1) i j − y (1) i j − 1 )(( y (1) i j − 1 − η 1 )( κ i 11 2 + κ i 21 2 ) + ( y (2) i j − 1 − η 2 )( κ i 11 κ i 12 + κ i 21 κ i 22 )) + 1 24 ( y (1) i j − y (1) i j − 1 ) 2 ( − 4 κ i 11 2 + κ i 12 2 − 2 κ i 12 κ i 21 − 3 κ i 21 2 ) − 1 2 ( y (2) i j − y (2) i j − 1 )(( y (1) i j − 1 − η 1 )( κ i 11 κ i 12 κ i 21 κ i 22 ) +( y (2) i j − 1 − η 2 )(( κ i 12 ) 2 + ( κ i 22 ) 2 )) + 1 24 ( y (2) i j − y (2) i j − 1 ) 2 ( − 4( κ i 22 ) 2 + ( κ i 21 ) 2 − 2 κ i 12 κ i 21 − 3( κ i 12 ) 2 ) − 1 3 ( y (1) i j − y (1) i j − 1 )( y (2) i j − y (2) i j − 1 )( κ i 11 κ i 12 + κ i 21 κ i 22 ) , C (2) Y ( y i j | y i j − 1 ) = − 1 12 (2 κ i 11 2 + 2 κ i 22 2 + ( κ i 12 + κ i 21 ) 2 ) + 1 6 ( y (1) i j − y (1) i j − 1 )( κ i 12 − κ i 21 )(( y (1) i j − 1 − η 1 )( κ i 11 κ i 12 + κ i 21 κ i 22 ) + ( y (2) i j − 1 − η 2 )( κ i 12 2 + κ i 22 2 )) + 1 12 ( y (1) i j − y (1) i j − 1 ) 2 ( κ i 12 − κ i 21 )( κ i 11 κ i 12 + κ i 21 κ i 22 ) + 1 12 ( y (2) i j − y (2) i j − 1 ) 2 ( κ i 21 − κ i 12 )( κ i 11 κ i 12 + κ i 21 κ i 22 ) + 1 6 ( y (2) i j − y (2) i j − 1 )( κ i 21 − κ i 12 )(( y (1) i j − 1 − η 1 )( κ i 11 2 + κ i 21 2 ) +( y (2) i j − 1 − η 2 )( κ i 11 κ i 12 + κ i 21 κ i 22 )) + 1 12 ( y (1) i j − y (1) i j − 1 )( y (2) i j − y (2) i j − 1 )( κ i 12 − κ i 21 )( κ i 22 2 + κ i 12 2 − κ i 11 2 + κ i 21 2 ) . App endixA.5. Co efficients of the squar e r o ot SDMEM F or mo del (20) w e hav e Y i t = 2 p X i t ˜ σ i 25 and µ Y ( Y i t ) = 2 q + 1 2 Y i t − ˜ β i Y i t 2 , where q = 2 ˜ β i ( α + α i ) / ( ˜ σ i ) 2 − 1 . F or giv en v alues y i j − 1 and y i j of Y i t the co efficien ts of the order K = 2 densit y expansion are: C ( − 1) Y ( y i j | y i j − 1 ) = − 1 2 ( y i j − y i j − 1 ) 2 , C (0) Y ( y i j | y i j − 1 ) = log y i j y i j − 1 q + 1 2 − 1 4 ˜ β i ( y i j 2 − y i j − 1 2 ) , C (1) Y ( y i j | y i j − 1 ) = − 1 24 y i j − 1 y i j − 12 ˜ β i y i j y i j − 1 ( q + 1) + ( ˜ β i ) 2 ( y i j 3 y j − 1 + ( y i j y i j − 1 ) 2 + y i j y i j − 1 3 ) + 12 q 2 − 3 , C (2) Y ( y i j | y i j − 1 ) = = − 1 24( y i j y i j − 1 ) 2 ( ˜ β i ) 2 ( y i j y i j − 1 ) 2 + 12 q 2 − 3 . A c knowledgmen ts Supp orted by grants from the Danish Council for Indep enden t Researc h | Natural Sci- ences to S. Ditlevsen. U. Picchini thanks the Department of Mathematical Sciences at the Univ ersity of Cop enhagen, Denmark, for funding his research for the present w ork during y ear 2008. 26 ADMB Pro ject, 2009. AD Mo del Builder: automatic differentiation mo del builder. Devel- op ed b y Da vid F ournier and freely av ailable from admb- project.org . Aït-Sahalia, Y., 2008. Closed-form likelihoo d expansion for multiv ariate diffusions. Ann. Stat. 36 (2), 906–937. Allen, E., 2007. Mo deling with Itô Stochastic Differen tial Equations. Springer. Bisc hof, C., Bück er, M., V ehreschild, A., 2005. ADiMat. R WTH Aac hen Univ ersit y , Ger- man y , av ailable at http://www.sc.rwth- aachen.de/adimat/ . Bo oth, J., Hob ert, J., Jank, W., 2001. A surv ey of Mon te Carlo algorithms for maximizing the lik eliho o d of a t w o-stage hierarchical mo del. Statistical Mo delling 1, 333–349. Coleman, T., Li, Y., 1996. An interior, trust region approach for nonlinear minimization sub ject to b ounds. SIAM Journal on Optimization 6, 418–445. Co ols, R., 2002. A dv ances in multidimensional integration. Journal of Computational and Applied Mathematics 149, 1–12. Co x, J., Ingersoll, J., Ross, S., 1985. A theory of the term structure of interest rate. Econometrica 53, 385–407. Da vidian, M., Giltinan, D., 2003. Nonlinear mo dels for rep eated measuremen ts: an o verview and up date. Journal of Agricultural, Biological, and Environmen tal Statistics 8, 387–419. D’Errico, J., 2006. fminsearchbnd . Bound constrained optimization using fminsearch , http://www.mathworks.com/matlabcentral/fileexchange/8277- fminsearchbnd . Ditlevsen, S., De Gaetano, A., 2005. Mixed effects in sto c hastic differential equations mo dels. REVST A T - Statistical Journal 3 (2), 137–153. Ditlevsen, S., Lansky , P ., 2005. Estimation of the input parameters in the Ornstein- Uhlen b ec k neuronal mo del. Ph ysical Review E 71, 011907. Ditlevsen, S., Lansky , P ., 2006. Estimation of the input parameters in the Feller neuronal mo del. Ph ys. Rev. E 73, Art. No. 061910. Donnet, S., F oulley , J., Samson, A., 2010. Ba yesian analysis of gro wth curv es using mixed mo dels defined b y sto c hastic differen tial equations. Biometrics 66 (3), 733–741. Donnet, S., Samson, A., 2008. P arametric inference for mixed mo dels defined b y sto c hastic differen tial equations. ESAIM: Probability & Statistics 12, 196–218. F a vetto, B., Samson, A., 2010. P arameter estimation for a bidimensional partially ob- serv ed Ornstein-Uhlenbeck pro cess with biological application. Scandina vian Journal of Statistics 37, 200–220. 27 Filip e, P ., Braumann, C., Ro quete, C., 2010. Multiphasic individual growth mo dels in random environmen ts. Metho dology and Computing in Applied Probability , 1–8. 10.1007/s11009-010-9172-0. Gardiner, C., 1985. Handb o ok of Sto c hastic Metho ds for Ph ysics, Chemistry and the Nat- ural Sciences. Springer. Griew ank, A., 2000. Ev aluating Deriv ativ es: Princip les and T ec hniques of Algorithmic Differen tiation. SIAM Philadelphia, P A. Hurn, A., Jeisman, J., Lindsay , K., 2007. Seeing the w o o d for the trees: A critical ev aluation of metho ds to estimate the parameters of sto c hastic differential equations. Journal of Financial Econometrics 5 (3), 390–455. Jensen, B., Poulsen, R., 2002. T ransition densities of diffusion pro cesses: numerical com- parison of appro ximation techniques. Journal of Deriv ativ es 9, 1–15. Jo e, H., 2008. Accuracy of Laplace approximation for discrete response mixed mo dels. Computational Statistics & Data Analyis 52 (12), 5066–5074. Klim, S., Mortensen, S. B., Kristensen, N. R., Overgaard, R. V., Madsen, H., 2009. P op- ulation sto c hastic mo delling (PSM) – An R pac k age for mixed-effects mo dels based on sto c hastic differential equations. Computer Metho ds and Programs in Biomedicine 94, 279–289. Klo eden, P ., Platen, E., 1992. Numerical Solution of Sto c hastic Differential Equations. Springer. K o, H., Davidian, M., 2000. Correcting for measurement error in individual-level co v ariates in nonlinear mixed effects mo dels. Biometrics 56 (2), 368–375. Krommer, A., Ueberhuber, C., 1998. Computational Integration. So ciet y for Industrial and Applied Mathematics. Lindstrom, M., Bates, D., 1990. Nonlinear mixed-effects mo dels for rep eated measures data. Biometrics 46, 673–687. McCullo c h, C., Searle, S., 2001. Generalized, Linear and Mixed Mo dels. Wiley Series in Probabilit y and Statistics. John Wiley & Sons, Inc. Øksendal, B., 2007. Sto c hastic Differen tial Equations: An In tro duction With Applications, sixth Edition. Springer. Ov ergaard, R., Jonsson, N., T ornøe, C., Madsen, H., 2005. Non-linear mixed-effects mo d- els with sto c hastic differential equations: implementation of an estimation algorithm. Journal of Pharmacokinetics and Pharmaco dynamics 32, 85–107. 28 Picc hini, U., De Gaetano, A., Ditlevsen, S., 2010. Sto c hastic differen tial mixed-effects mo dels. Scandina vian Journal of Statistics 37 (1), 67–90. Picc hini, U., Ditlevsen, S., De Gaetano, A., Lansky , P ., 2008. Parameters of the diffusion leaky in tegrate-and-fire neuronal mo del for a slo wly fluctuating signal. Neural Compu- tation 20 (11), 2696–2714. Pinheiro, J., Bates, D., 1995. Appro ximations of the log-lik eliho o d function in the nonlinear mixed-effects mo del. Journal of Computational and Graphical Statistics 4 (1), 12–35. Pinheiro, J., Bates, D., 2002. Mixed-effects mo dels in S and S-PLUS. Springer-V erlag, NY. Pinheiro, J., Bates, D., DebRo y , S., Sark ar, D., the R Dev elopmen t Core T eam, 2007. The nlme Pac k age. R F oundation for Statistical Computing, http://www.R- project.org/ . Pinheiro, J., Chao, E., 2006. Efficien t Laplacian and adaptive Gaussian quadrature algo- rithms for multilev el generalized linear mixed mo dels. Journal of Computational and Graphical Statistics 15 (1), 58–81. Ripley , B., 2006. Sto c hastic Simulation. Wiley-Interscience. Sh un, Z., McCullagh, P ., 1995. Laplace appro ximation of high dimensional integrals. Jour- nal of the Ro yal Statistical So ciety B 57 (4), 749–760. Sk aug, H., 2002. Automatic differentiation to facilitate maximum likelihoo d estimation in nonlinear random effects mo dels. Journal of Computational and Graphical Statistics 11 (2), 458–470. Sk aug, H., F ournier, D., 2006. Automatic appro ximation of the marginal likelihoo d in non- Gaussian hierarc hical mo dels. Computational Statistics & Data Analysis 51, 699–709. Sørensen, H., 2004. P arametric inference for diffusion pro cesses observ ed at discrete points in time: a survey . International Statistical Review 72 (3), 337–354. Strathe, A., Sørensen, H., Danfær, A., 2009. A new mathematical mo del for com bining gro wth and energy intak e in animals: The case of the growing pig. Journal of Theoretical Biology 261 (2), 165 – 175. T ang, C., Chen, S., 2009. P arameter estimation and bias correction for diffusion processes. Journal of Econometrics 149 (1), 65–81. T ornøe, C., Ov ergaard, R., Agersø, H., Nielsen, H., Madsen, H., Jonsson, E. N., 2005. Sto c hastic differential equations in NONMEM: implemen tation, application, and com- parison with ordinary differential equations. Pharmaceutical Researc h 22 (8), 1247–1258. 29
Original Paper
Loading high-quality paper...
Comments & Academic Discussion
Loading comments...
Leave a Comment