Rising Above Chaotic Likelihoods

Berliner (Likelihood and Bayesian prediction for chaotic systems, J. Am. Stat. Assoc. 1991) identified a number of difficulties in using the likelihood function within the Bayesian paradigm which arise both for state estimation and for parameter esti…

Authors: Hailiang Du, Leonard A. Smith

Rising Above Chaotic Likelihoods
Rising Ab o v e Chaotic Lik eliho o ds Haili ang Du 1 , 2 Leonard A. Smith 1 , 2 , 3 1 Cen tr e for the Analysis of Time Series, London Sc ho ol of Economi cs, London W C2A 2AE. UK 2 Cen ter for Robust Decision Making on C limate and Ener gy Policy , Universit y of Chicag o, Chica go, IL 60637, US 3 P embrok e Co llege, Ox f ord, UK No vem b er 8, 2021 Abstract Berliner (Lik eliho od and Ba y esian prediction for c haotic systems, J. Am. Stat. As- so c. 1991) identified a num b er of difficulties in u sing the lik eliho od function within th e Ba y esian paradigm wh ic h arise b oth for s ta te estimatio n and for parameter estimation of c haotic systems. E v en when the equations of the system are giv en, h e demonstrated “c haotic likelihoo d fu nctio ns ” b oth of initia l conditions and of paramete r v al u es in the Logistic Map. Chaotic lik eliho od functions, while ultimately smo oth, ha ve suc h complicated sm al l scale structure as to cast doubt on the p ossibilit y of iden tifying high lik eliho od states in pr actice. In this pap er, the challenge of c haotic likel iho o ds is o v ercome by emb edding the observ ations in a higher d im en sional sequence-space; this allo ws goo d state estimation with finite computational p o wer. An imp ortance sam- pling appr oa ch is in tro duced, where Pseu d o-o rb it Data Assimilation is emplo y ed in the sequence-space, fir s t to iden tify r ele v an t pseud o-orbits and then r elev an t tra jecto- ries. Estimates are identified w ith lik eliho ods orders of m agnitud e higher t h an those previously iden tified in the examples giv en by Berliner. Pseudo-orbit Data Assimi- lation imp ortance sampler exploits the information b oth from the mo del d ynamics 1 and from the observ ations. While sampling from the r el ev an t prior (here, th e natural measure) will, of course, ev entuall y yield an acco untable sample, giv en the realistic computational resource this traditional approac h w ould pr o vide no high lik eliho od p oin ts at all. Wh ile one of the chall enges Berliner p osed is ov ercome, his cent r al con- clusion is supp orted. “Chaotic lik elihoo d fu nctio ns ” for parameter estimatio n still p ose a challenge ; this fact helps clarify w hy p h ysical scien tists main tain a strong distinction bet ween the initia l condition uncertain t y and parameter uncertai nt y . 1 In tro duc t ion Nonlinear ch ao tic systems pose sev eral c hallenges both for state es timation and for param- eter estimation. Chaos as a phenomenon implies sensitiv e dep endence to initial condition: initially nearby states will ev entually div erge in the future. The bifurcations of v arious c haotic systems [3 0 ] reve al ho w the b eha vior of the system differs as a parameter v alue c hanges. One migh t think that Bay esian analysis should be a ble to obtain g oo d estimation b oth of initial conditions a nd of pa r amete r v alues without m uc h trouble. Berliner [2] exam- ined the log-lik eliho o d function of estimates of initial conditions and parameter v alues for the Logistic Map, noting that c haotic systems can lead to “c haotic lik eliho o d functions” and suggesting that Ba y esian ana lysis would require prohibitive ly in tensiv e computing. The failure o f v ar iational appro ac hes, when a pplie d to long windo w observ ations of c haotic systems [7, 1 1 , 20], supp orts his p oin t. Sensitivit y to initial conditio n on the other hand sug- gests that information in the observ a t io ns (ev en o ver a relatively short windo w) can lead to go o d estimates of the initial condition [12]. An imp ortance sampling approac h to Berliner’s c hallenge without “intensiv e computing” is deplo ye d in this pap er. Adopting Pseu do- orbit Data Ass imilation ( PDA) [7, 11] recasts the problem in to a higher dimensional sequence space, where truly hig h lik eliho o d states are successfully lo cated. The c hallenges o f initial condition estimation and parameter estimation are dissimilar f o r c ha otic systems. PD A do es not easily gene ralize to parameter estimation, as it is unclear ho w to mathematically define a relev ant subs pace of the para mete r space in whic h the high lik eliho od tra jectories 2 migh t exist. Thus c hallenges remain in iden tifying high lik eliho o d parameter v alues giv en the initial condition; this asymmetry r efle cts differences b etw een the initial conditions and parameter v alues. In terms of estimating initial conditions giv en the parameter v alues, ho w ev er, Berliner’s c hallenge a s originally stated is m et and resolve d. 2 Chaotic L i k elih o o d F unction of Initial Con ditions Berliner’s [2] form ulation of the problem is adapted here. The Logistic M ap is the sy stem; in sections 2-4, the parameter a = 4 is kno wn but the true initial state ˜ x 0 is not. In that case, t he exp erimen t is said to fall within the p erfect mo del scenario 1 . In the first four sections of this pa per, the mathematical system a nd the mo del are iden tical, the w ord mo del is not us ed. Once real systems are considered, either structural mo del error or parameter inaccuracy requires one to distinguish the s ystem whic h generated the data from the mo del(s ) emplo y ed to a nalyze it. The ev olution of system s tat es 2 x i ∈ R m is then gov erned b y the nonlinear dynamics f : x i +1 = f ( x i ), where for the Logistic Map f ( x i ) = ax i (1 − x i ) . (1) Assuming additiv e observ at io nal noise, δ i , yields observ ations, s i = ˜ x i + δ i where ˜ x is the tr ue system state (T ruth) and the observ at ional no ise, δ i , is Indep enden t Nor mally Distributed (IND), δ i ∼ N (0 , σ 2 ). Under this assumption of normalit y , the log- lik eliho o d function is: LLik ( x 0 ) = − n − 1 X i =1 ( s i − f i ( x 0 )) 2 / 2 σ 2 , (2) where f i ( x ) is the i th iteration of f ( x ), s i is the i th observ ation, and n is the duration of observ ations considered. 1 The p erfect mo del scenar io, when the sys tem and the mo del are identical, ease s use of a digital computer; one av oids the issue of “ro und-off err or” b y consistent ly us e o f th e same computer and co de. This implies , of co urse, that E quation 1 no long er reflects the p erfect mo del. F or discussion, se e [1, 27, 28] and referenc e s thereof. 2 F o r m = 1, the s tate x i is a scalar . 3 0 0.2 0.4 0.6 0.8 1 −400 −350 −300 −250 −200 −150 −100 −50 0 x 0 LLik 0 0.2 0.4 0.6 0.8 1 −400 −350 −300 −250 −200 −150 −100 −50 0 x 0 RLLik Figure 1: T ypical log- lik eliho o d o f 1024 states (unifor mly distributed on [0 , 1]) for the Logistic Map. The true initial condition ˜ x 0 = √ 2 / 2 (denoted by ‘ × ’), σ = 0 . 1 and n = 32 . a) Log-likelihoo d function, b) Relative log-lik eliho o d to ˜ x 0 . States whic h hav e (relativ e) log-lik eliho o d less tha n -4 00 are plotted on the -400 horizontal line. All logarithms are using natural base. The maxim um lik eliho o d v alue found was ∼ e − 109 , whic h corresp onds to log-lik eliho o d of − 132 relativ e to T ruth. Figure 1 sho ws the c haotic lik eliho o d structure of 1024 candidate v a lues of x 0 dra wn uniformly o n the inte rv al zero, one. P a nel (a) plot s the log-likelih o o d fo r eac h x 0 , this can b e con trasted with v arious panels in Berliner’s [2] Figure 3. 3 P anel (b) shows the log- lik eliho o d relativ e to that of t he true tra jectory of the system state 4 . F or the conv enience of illustration, the same normalization used in panel (b) is applied in Figures 2-4 in this pap er. F rom Figure 1, it is clear that no high lik eliho o d states are iden tified. This is not a 4 0.4 0.5 0.6 0.7 0.8 0.9 1 −400 −350 −300 −250 −200 −150 −100 −50 0 x 0 RLLik 0.697 0.702 0.707 0.712 0.717 −400 −350 −300 −250 −200 −150 −100 −50 0 x 0 RLLik Figure 2 : Relativ e log-likelihoo d of 1024 states a) sampled from inv erse observ atio na l noise, b) uniformly sampled from [ ˜ x 0 − σ 10 , ˜ x 0 + σ 10 ]. The true initial condition ˜ x 0 = √ 2 / 2 (denoted b y ‘ × ’), σ = 0 . 1 and n = 32. case of equifinalit y 5 . Giv en the observ ationa l noise dis tribution, one can add random dra ws from the inv erse of the observ ational noise distribution to the observ a tion to obtain candidate estimates of initial condition. Figure 2a sho ws the r elat ive log-lik eliho o d of 1024 samples f rom in ve rse observ ational noise. N o high lik eliho o d states ar e iden tified in this w ay . (The maximum lik eliho o d v alue found w as ∼ e − 90 .) T o illustrat e the impact o f making muc h more precis e observ ations, consider a case whe re ˜ x 0 is kno wn to be within a region of ra dius o nly σ/ 1 0. Figure 2b sho ws the relativ e log-lik eliho o d of 102 4 uniformly sampled states in t he region around T ruth with σ / 1 0 radius. Y et again, no high lik eliho o d state are identified . (The 3 Here log-likelihoo d funct io ns based on a se quence of 32 observ ations are computed beca use pr oblem bec omes more obvious when more obser v ations ar e used. B erliner exa mined 15 & 10 o bserv ations . Shorter sequences o f obser v ations are examined in Section 3. 4 Note: given that only finite observ ations a re considered the true state of the system is, with probability 1, not the ma xim um likelihoo d state. 5 Equifinality occur s when man y p oten tial solutions to a task are each goo d, making it impo ssible to ident ify the true solution given the information in hand. In such cases the sampled lik eliho o d functions ar e relatively flat. Equidisma lit y ar ises when the sampled relative likelihoo d function is flat yet all solutions tested hav e v anishingly small likeliho od given the infor mation. Examining the relative likeliho od obscures the difference ; for tunately the e x pected (distribution of ) likelihoo d can b e computed fro m the noise mo del alone without knowledge o f the true initia l condition. 5 maxim um lik eliho o d v alue found increas es to ∼ e − 85 .) 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 −400 −350 −300 −250 −200 −150 −100 −50 0 x 0 RLLik 0.707106732 0.707106752 0.707106772 0.707106792 0.707106812 0.707106831 −400 −350 −300 −250 −200 −150 −100 −50 0 x 0 RLLik Figure 3: a) F ollow ing Figure 1b, but in this case including a set of 102 4 states (blue), whic h a re ex tremely close to ˜ x 0 , generated by spiral sampling around the ˜ x 0 (see fo otnote 6); b) zo om in o f a ). This difficulty here is not a shortcoming of the like liho o d f ramew ork, as there are high lik eliho o d states other tha n T ruth. One may demonstrate that suc h high lik eliho o d states exist b y sampling the p oin ts on a loga rithmic spiral approaching T ruth (to mac hine precision) 6 . Figure 3 de monstrates that hig h likelihoo d states other than T ruth do exist (i.e. some of the blue p oin ts). A smo oth curve o f the log-like liho o d function is only observ ed within a r adius of ˜ x 0 smaller than ∼ 1 0 − 7 , (see Figure 3b). Without kno wing T ruth, of course, this approach to identifying high lik eliho o d p oin ts is inaccessible. The lik eliho o d function is extremely ja gged; a s Berliner [2] stressed, finding ev en one high lik eliho o d state b y sampling the state space is prohibitiv ely costly . That said, there is no sense in whic h “sensitivit y to the initial conditions” can b e tak en to imply that the info rmation in the initial condition is “forgotten” or “lost”. There is sufficien t infor- mation in the observ a tion segmen t to distinguish high lik eliho o d initial states. Candidate states with non v anishing log - lik eliho o d relative to T ruth can be found b y extracting the information from the system dynamics using a relativ ely new approach to data assimilation: 6 In this exp erimen ts 1024 p oin ts a re generated by ˜ x 0 + 2 − (10+ 60 i 1024 ) ǫ i , i = 1 , 2 , ..., 1 0 24 where ǫ i is ra ndom drawn fro m U (0 , 1). 6 Pseudo-orbit Data Assimilation. 3 Imp ortance Sampling via Pseudo-orbi t D ata A s sim- ilation 3.1 Metho dology Uniform sampling in state space is not an efficien t approac h to lo cating high like liho o d states. As the dimension of the system increases, the task b ecomes ev en more computation- ally impractical. PDA imp ortance sampler 7 lo cates high lik eliho o d states in the tra jectory manifold by adopting t he Pseudo-orbit D a ta Assimilation approach [7 , 11]. PDA t a k es adv antage of the fact that solutions consisten t with the known dy namics will lie along a lo w er dimensional manifold in the higher dimensional sequence space. A brief introduction of t he PDA approach is give n in the following paragra ph (see [7, 11] for additional details). Giv en a dynamical system of dimension m and a sequence of n observ ations 8 , define a sequence space as the m × n dimensional space in whic h a single p oin t can b e thought of as a particular series of n states u i , for i = 0 , . . . , n − 1 where u i is an m dimensional v ector. Most p oin ts in seq uence space do not corres p ond to a tra jectory of the system. Define a pseudo-orbit , U ≡ { u 0 , ..., u n − 2 , u n − 1 } , to b e a p oin t in the m × n dimensional sequenc e space for whic h u i +1 6 = f ( u i ) for one or more comp onen ts of U . Th us a pseudo-orbit corresp onds to a sequence of states which is not a tra j ec tor y of the system. Each sequence of n o bserv ations s i , i = 0 , . . . , n − 1 defines a pseudo-orbit (a p oin t in the sequence space). Call t his an o bserve d p seudo-orbit , S ≡ { s 0 , ..., s n − 2 , s n − 1 } , whic h with probability one will not b e a tra jectory . 7 In high-dimensional space the sampler targets the r elev ant lower-dimensional tra jectory manifold which is more efficient than sampling a hyper sphere suggested by one obs e r v ation. Even in the one-dimensiona l Logistic Map this appro ac h suc c e eds by using PD A to sa mple the tra jectory manifold in the n-dimensional sequence spa ce. 8 F o r the Logistic Map, observ ations and sy stem states share the sa me one dimensiona l spac e . 7 Define the mi smatch to b e: e i = | f ( u i ) − u i +1 | (3) By construction, tra jectories ha ve a mismatc h of zero. The mismatc h cost function is then giv en by : C ( U ) = n − 1 X i =0 e 2 i (4) The most common form o f Pseudo-orbit Data Assimilation simply minimizes t he mismatc h cost function for U in the m × n dime nsional sequence space starting from the observ ed pseudo-orbit. If a gradient desce nt ( G D) approac h is a dopted 9 , then a minim um of the mismatc h cost function can b e obta ined b y solving the ordinary differen tia l equation d U dτ = −∇ C ( U ) , (5) where τ denotes algorit hmic time. 10 In practice, the minimization is initialized with the observ ed pseudo-orbit, i.e. 0 U = S where the pre-sup er-script zero on U denotes the initial algorithmic time ( τ = 0 ) of the GD. Under gradien t descen t, a p oin t in sequence space mo v es tow ards the tra jectory manifold under Equation 5. The minimization algorithm requires differentiating the mismatch cost function (Equation 4), whic h giv es: ∂ C ( U ) ∂ u i = 2 ×          − ( u i +1 − f ( u i )) J ( u i ) i = 0 − ( u i − f ( u i − 1 )) + ( u i +1 − f ( u i )) J ( u i ) 0 < i < n − 1 − u i − f ( u i − 1 ) i = n − 1 (6) where J ( u i ) is the Jacobian of f at u i . The ordinary differen tial Equation 5 is solv ed using the Euler approximation in t he examples b elo w. The mismatc h cost function has no lo cal minima other t ha n p oin ts on the manifold, for whic h C ( U ) = 0, (t his defines the tra jectory manifold 11 ) and of course ev ery segmen t 9 Other methods for this minimization ar e av ailable, GD is discussed here due to its simplicit y and adequacy . 10 The approach can b e ge ne r alized to situa tio ns where gradient is no t known analytically [13]; impro ving the abilit y to work without gradient information would widen the application of the approa c h sig nifica n tly . 11 Back-substitution of the solution of − u n − 1 − f ( u n − 2 ) = 0 into Equation 6 shows that the only cr itical po in ts for C ( U ) have u i − f ( u i − 1 ) = 0 for all i in 0 ≤ i ≤ n − 1. All p oin ts on the tra jectory manifold hav e zero mis matc h (are tra jectories) and only po in ts on the tr a jectory manifold have zero misma tc h. 8 of tra jectory lies on this manifold [11]. Denote the result o f the GD minimization at algorithmic time τ = α as α U ≡ α u 0 , ..., α u n − 1 . Here α indicates algorithmic time in GD (i.e. the n um b er of iterations of the GD minimization). As α → ∞ , the pseudo-orbit α U approac hes a tra jectory o f the mo del a symptotically . In other w ords, the GD minimization tak es the observ ed pseudo-orbit to w ards a mo del tra jectory ( ∞ U is a p oin t in sequence space on the tra jectory manifold). In practice, the GD algorithm is r un f o r a finite time and a tra jectory is not obta ined . Nev ertheless, the v alues of α u 0 for large α , define candidates tra jectories using infor ma t io n from the observ ation windo w 0 ≤ i ≤ n − 1. F or i > 0, the i -step preimages of the relev an t comp onen t of α u i pro vide candidates for the initial state. Complications arise from the fact that the Logistic Map is tw o -to-one; these complications hav e nothing to do with chaos p er se 12 . In one-t o -one chaotic ma ps the calculation of preimages is straigh tfor w ard. The Logistic Map is a tw o- to-one map, and in most cases 13 only one of the tw o preimages for eac h α u i is relev ant to ˜ x i − 1 . In practice a threshold criteria to discard irrelev ant preimages mus t b e defined, a simple example w ould b e to discard (with high probability) those preimages whose distance from the cor r esp onding previous observ at io n exceeds some t hr eshold based on the prop erties of the observ ational no is e (a 3 σ criteria is adopted in the fo llo wing section and sho wn to b e adequate for purp ose). 3.2 Results The green p oin ts in Figure 4 w ere lo cated using PDA imp ortance sampler; note that some ha v e relativ e log- likelihoo d near zero (maxim um 1.3). As exp ected, t he observ ations do no t con tain sufficien t inf o rmation to identify the state of t he system at the time of the final observ ation with the same degree of precision as the initial state. This is reflected in the fact that the green p oin ts are mu ch less close to the true state at time 31 (Figure 4b) than the corr esp onding green p oin ts at time 0 (Figure 4a). Tw o ex p erimen ts w ere conducted to tes t the robustness of the PDA impo rtance sam- 12 Beyond the fact that o ne - to-one maps in o ne-dimension cannot display chaotic dynamics, of c o urse. 13 Not in all ca ses, howev er. F or discus s ion of the p oin t see [1 5]. 9 0 0.2 0.4 0.6 0.8 1 −400 −350 −300 −250 −200 −150 −100 −50 0 x 0 RLLik 0 0.2 0.4 0.6 0.8 1 −400 −350 −300 −250 −200 −150 −100 −50 0 x 31 RLLik Figure 4: a) F ollowing Figure 3a, but in this case including he states lo cated b y PDA imp ortance sampler are plotted in green. b) The forward images of those states plotted in (a) at time 3 1. pling approac h. The first is based on 2048 differen t realizations o f observ atio ns g iven the same initial condition, ˜ x 0 = √ 2 / 2, to examine consistency . Th e second considers 2048 differen t initial conditions to examine robustness ov er different states. Three differen t o b- serv ation window lengths we re used in eac h exp erimen t. T able 1 and T able 2 sho ws the results. States with greater likelih o o d than the true state are of ten iden tified. Giv en uncertain observ ations, one can nev er iden tify T ruth of a c haotic system unam- biguously; this w as noted b y Berliner & MacEac hen [3, 18], then Lalley [15, 16 ] and later explored by Judd & Smith [11]. Using the PD A approac h, high lik eliho o d states a r e indeed found: states with relative log- lik eliho o d larger than min us one are f ound in ev ery sin- gle exp erime ntal run. The fact that some PD A states ha ve gr eat er lik eliho o d than T ruth reflects t he fa ct that T ruth is not exp ected to b e the most like ly mo del state giv en the observ ations. F or each exp erimen tal run, the minimu m distance b et w een those states obtained by PD A imp ortance sampler (whose relativ e log-likelihoo d larg er than min us one) and T ruth is recorded. The minim um, maxim um and median statistical v a lues of t he minim um distance from T ruth are rep o rted in T able 1 and 2. It is clear that the PD A imp ortance sampler iden tifies states at higher qualit y (the minim um distance from T ruth decreases) as the observ ation window length increases. This is exp ected inasm uc h as mor e information from 10 the dynamics is a v ailable giv en a lo ng er windo w. T able 1 show s that the maxim um v a lue of t he minimum distance a mong the 2048 different r ealizat io ns is 1 . 49 × 10 − 10 for a window length of 32 and in T a ble 2 the maxim um v alue of the minimum distance among differen t true initia l conditions is 2 . 02 × 1 0 − 10 . The PD A imp ortance sampler is b oth robust and efficien t in this case. 14 Windo w # of RLLik > − 1 # of RLLik > 0 Minimum distance to ˜ x 0 length Min Max Median Min Max Median Min Max Median 32 6 15 8 0 14 6 2 . 00 × 10 − 15 1 . 49 × 10 − 10 9 . 98 × 10 − 12 16 2 11 8 0 11 6 1 . 57 × 10 − 10 7 . 63 × 10 − 6 5 . 13 × 10 − 7 8 2 7 7 0 7 5 8 . 58 × 10 − 8 4 . 55 × 10 − 2 2 . 56 × 10 − 4 T able 1: Statistics of high lik eliho o d states lo cated by PD A imp ortance sampler based on 2048 differen t realizations of observ atio ns (of ˜ x 0 = √ 2 / 2) for the Logistic Map, i) statistics of the num b er of states (whose log-likelihoo d relativ e to the true state larger than min us one) ii) statistics of the n umber of stat es (whose relativ e log -lik eliho od larger tha n zero) iii) statistics of the minim um distance b et we en the states (whose r elat ive log- lik eliho o d larger than min us one) and T ruth. The exp erimen ts ab ov e solv e b y demonstration Berliner’s “imp ossible” challenge; they demonstrate that truly high lik eliho o d p oin ts can b e lo cated using dynamical informatio n, easing Berliner’s identification problem of initial condition. Selecting an ensem ble from this high lik eliho o d set allow s for informativ e f orecasts which do not b ecome useless until aft er those fro m the p oin t fo r ec asts illustrated by Berliner [2] b ecome uninformative . 14 Drawing samples unifor mly from within a distance of 0 . 1 of T ruth would requir e ∼ 10 8 candidates in order to find a ca ndidate within ∼ 2 × 10 − 10 of T ruth; even then tha t clo se ca ndidate need no t hav e high likelihoo d. The results o f T a ble 1 and 2 were obtained with o nly 10 24 GD minimization iterations in ea c h realization (each and every one of which identified hig h likeliho od s ta tes clo s e to T r uth) . 11 Windo w # of RLLik > − 1 # of RLLik > 0 Minimum distance to ˜ x 0 length Min Max Median Min Max Median Min Max Median 32 6 360 28 0 338 16 4 . 7 7 × 10 − 15 2 . 02 × 10 − 10 1 . 50 × 10 − 11 16 3 56 14 0 48 9 3 . 04 × 10 − 10 1 . 54 × 10 − 5 9 . 93 × 10 − 7 8 2 20 7 0 20 7 6 . 36 × 10 − 8 5 . 60 × 10 − 3 3 . 32 × 10 − 4 T able 2: Statistics o f hig h lik eliho o d states lo cated b y PDA imp ortance sampler ba sed on 2048 differen t true initia l states f o r the Logistic Map, i) statistics of the num b er of states (whose log -lik eliho o d relativ e to the true state larger t ha n minus one) ii) statistics of the n umber of states (whose relativ e log-like liho o d larger than zero) iii) statistics of the minim um distance b et we en the states (whose relativ e lo g -lik eliho od larger than min us one) and T ruth. 3.3 Equifinalit y and Relativ e Lik eliho o d Maxim um lik eliho o d estimation has b een widely used [25] since its in tro duction b y Fisher [8] in 192 2. The “b est” estimate is often c hosen fr o m a set of candidates, and only the relativ e lik eliho o d of candidates within t ha t sample is considered. Figure 5 shows the log - lik eliho o d of 1024 states (the same set used in Figure 2b), the red dashed line is the median log - lik eliho o d of those states. In this case, the problem is not one of whic h estimate to select (equifinalit y), but one of recognizing that eac h and ev ery candidate has v anishing lik eliho o d (equidismalit y). In practice, T ruth is unkno wn therefore it cannot b e used as a reference, as it is in Figure 2. Giv en the observ ations and the noise mo del, ho w ev er, t he exp ected log-lik eliho o d of T ruth can b e determined 15 and serv e as a reference. Figure 5, the lo g- lik eliho o d of 1024 states are plot ted along with the exp ected log- lik eliho o d of T ruth (bla c k dashed line). Figure 5 show s that it is not a case of equifinality but a case of equidismalit y . In cases where it is observ ed that all traditional candidate states ha v e v anishingly small log-lik eliho o d relativ e to the exp ected log-like liho o d of T ruth, a PD A based imp ortance 15 The log-likelihoo d of T ruth is − P n − 1 i =0 δ 2 i 2 σ 2 (from Equation 2) where δ i (observ ation no ise) is I I D ∼ N (0 , σ 2 ) dis tributed. Let Z = P n − 1 i =0 δ 2 i σ 2 , Z is a r andom v aria ble fo llo wing chi-squared distr ibution with n degrees of freedom. Statistics of the lo g-lik eliho o d of T ruth can therefo re simply derived fr om Z . 12 sampling might pro v e of v aluable. 0.698 0.7 0.702 0.704 0.706 0.708 0.71 0.712 0.714 0.716 −400 −350 −300 −250 −200 −150 −100 −50 0 x 0 LRik Figure 5: Log-likelihoo ds of 1 0 24 states uniformly sampled from [ ˜ x 0 − σ 10 , ˜ x 0 + σ 10 ], the r ed dashed line is the median log - lik eliho o d of those states and the black dashed line is the exp ected log-lik eliho o d of T ruth, following Figure 2b. 4 Implications for Quan tifying Unce rtain ty in Prac- tice Ha ving solv ed one of Berliner’s “impossible” c hallenges for nonlinear s ystems, a review er asks for p oten tial implications this solution migh t hold in practice. Tw o p ossibilities stand out. The first r egards improv ed noise reduction on the initia l condition of the forecast via a PDA imp ortance sampler; this w ould indeed b e exp ected to impro ve forecast skill. Berliner’s problem, ho w ev er, fo cuses on identification of the p oin t at t he starting (oldest) observ ation, while weather forecasts are launc hed near the time of the end (most recen t) observ ation. In o perational w eather forecasting, a r elat ively small set of initial conditions ( ∼ 51) are launche d as forecasts in a relativ ely high dimensional space ( ∼ 10 7 ) in a Mon te Carlo fashion eve ry forecast cycle (t ypically ev ery 6 hours); see [10, 17, 3 1]. As is long held 13 in w eather forecasting (see [9 ] for a discussion), uncertain ty in the state at the start of the observ ation window in time collapses m uch more dramatically than that in the state at the end (presen t time); this is consisten t with results shown in Figur e 4. Figure 2 of [9] captures c hallenge Berliner & MacEach en [3, 18] , Hansen & Smith [9 ] and Lalley [16] foresa w. High likelihoo d states at presen t time lie along the unstable manifold, suc h sets of states a re more efficien t and informative, in terms of exploring initial condition uncertaint y , than traditional approach es like Singular V ectors [23] or Bred V ectors [32]. A second po ss ibility lies in reanalysis [24]. A reanalysis uses the most mo dern w eather mo del to revis it and improv e estimate of the state o f the atmosphere o f the distant past; here the opp ortunit y for PD A imp ortance sampler to make a contribution app ears g reater. A reanalysis tak es a mo dern mo del and reconsiders past p erio ds of time; reanalysis no w common in atmospheric (w eather) and o ceanic contexts [14, 22, 33]. This task closely re- sem bles that p osed by Berliner. The results ach ieve d in T a ble 1 and T able 2 reflect the aims of reanalysis directly , a bet in a p erfectly known one-dimensional system. In terpretations in terms of scalar “noise reduction” are limited, as there can b e info rmation gained simply b y restricting the state estimate to the mo del manifold. Reanalysis hav e many applications, one of significan t current in terest is index based insurance. In this case, uncertain ty quan tification of the index w ould b e of significan t in terest. In short, the complications of ev a luating insurance claims ba sed o n certifying damages in far flung regions of the w orld has lead to the insurance policies t r iggered by a mo del based meteorological index. [34] T o the exten t that the metho ds deplo y ed in t his pap er b etter quantify (and reduce) the uncertain ty in the wind sp eeds o f a storm within a few days of its passing, the efficiency of index based insurance could b e improv ed. Note from T a ble 1 and 2 t ha t the longer the windo w length the b etter estimate (no ise reduction) is ac hiev ed. Noise reduction based up on PDA imp ortance sampler migh t also help fill in missing historical o bs erv ations, as w ell identify mo del deficiency when iden tifying mo del tra jectories intended to reflect past observ atio ns . 14 5 Disting uishing b et w e e n i nitial cond itions and pa- rameters The lik eliho o d functions of initial conditions and that of par amete r v alues ha v e similar features [2]. There are, ho w eve r, fundamen tal differences in the information a v aila ble to address these t w o distinct estimation problems. Giv en the structure of the mo del class, the mo del par a mete r v alue determines dynam- ical b ehav iors of the mo del ( e.g. natural measures ) whic h are not changed by the initial condition. Give n the mo del and “ true” parameter v alue, the searc h fo r unkno wn initial conditions is aided b y the restriction the nature measure(s) places o n candidate initial conditions in the state space (and, thereby , on t r a jectories in the sequence space). It is unclear how to construct similar constraints on unkno wn parameter v alues in the parameter space giv en the “true” initial state (if they exist 16 ). Uncertain t y in init ia l state differs from uncertain t y in the parameter v a lue. The info rmation in a me asur ement of the initial condition uncertain ty will dec ay with time and ev en tually b ecome statistically indistinguishable from a random sample of the natural measure, while the information in the initia l condition is preserv ed, arg uably forev er. While assuming the para mete r v alue is p erfect mig ht initially app ear extreme, it app ears less nonsensical giv en that one has already assumed that the mo del structure is p erfect. Assuming t he initial stat e is p erfect indicates a noise free observ ation is p oss ible. Let the mo del’s parameters be con tained in the v ector a ∈ R l . A set of l + 1 sequen tial noise free observ ations s i , s i +1 , ..., s i + l w ould, in g ene ral, b e sufficien t to determine a [19]. Th us if one noise free observ ation is obtainable, o btaining only a few more noise f r ee observ ations would define the true parameter v a lue precisely . P erhaps a more r ealis tic w a y to put the problem is to estimate the par a mete r v alue(s) giv en realistic observ ations, without assuming the “true” initial condition is kno wn. In that case, the goal is t o lo cate high lik eliho o d tra jectories 16 Neither is it clea r how to construct the set of parameter v alues whose cor r esponding inv ariant meas ure contains the “true” initial state, or how to exploit this set; what is cle a r is how to exploit the existence o f tra jecto ry manifold given a pa rticular v alue of the par ameter. 15 (Smith et al. [29] call these shado wing tra jectories) for particular sets of parameter v a lues . It is not y et clear ho w to solv e such a problem. In fact, it is not clear if it is p ossible to constrain the solution in the parameter space in a manner analogous t o the constrain ts in the space o f initia l condition ac hiev ed b y using tra jectory manifo ld in the state space. Giv en a p erfect mo del structure a nd kno wing the true parameter v alue(s), the true initial state is a well defined goal o f the iden tification. Inasm uc h as structural mo del errors imply no true parameter v alue exists [6, 12], it is unclear ho w one migh t define “true” initial state a nd the g o al of estimation m ust b e rethought when the structure of the system differs fro m t ha t of the mo del. Despite the imp ortance of mo del para meters, outside linear systems there is no general metho d of par amete r estimation 17 . Metho ds ha ve b een dev elop ed to obtain useful param- eter v a lues with some success: McSharry a nd Smith [19, 26] estimate mo del parameters b y incorp orating the global b eha viour of the mo del in to the selection criteria; Crev eling et al. [4] hav e exploited sync hro nization for para meter estimation; Smith et al. [29] fo cused on the geometric prop erties of tra jectories; Du and Smith [5] select parameter v alues based on the Ignorance sc ore of ens emb le fo r ec asts. Eac h of these metho ds, how ev er, require a large set of observ ations. Challenges remain when only a short sequence o f observ at io ns is a v ailable. Berliner’s second challenge still stands. 6 Conclus ion Berliner illustrated that even in t he p erfect mo del scenario traditional a pproac hes are un- able to prov ide go o d es timates of the initia l condition for nonlinear c hao tic sy stems. In large pa r t , those f a ilures are due to the inability of traditional approac hes to skillfully meld the informat ion in the dynamics of the nonlinear system itself with that in the observ ations. The imp ortance sampling approa c h presen ted here uses Pseudo-orbit Data Assimilation t o 17 That is no t to dismiss importa n t adv ances in para meter es timation when atten tion is r estricted to certain par ticular clas s es of mo del structure [2 1]; o ur claim her e is that no general appr oac h to par ameter estimation is known which is applicable to nonlinear mo dels in gener a l. 16 com bine information from b oth observ ations and dynamics more effectiv ely , thereby lo - cating high lik eliho o d initial states; this a chiev es one aim Berliner (1991) argued to b e imp ossible . As discussed in Section 5, this ac hiev emen t may prov e useful in reanalysis studies. Despite the similarity of state estimation and parameter estimation, there are fun- damen tal differences b et w een uncertaint y in the initial state and uncertaint y in pa r a mete r v alue. Significant obstacles remain in solving Berliner’s second c hallenge regarding para me- ter estimation while Pseudo-orbit Data Assimilation ov ercomes his first c hallenge regar ding initial condition. Ac kno wled gmen t This researc h w as supp orted b y the LSE’s Gr a n tha m Researc h Institute on Climate Change and the En vironmen t a nd the ESR C Cen tre fo r Climate Change Economics and P olicy , funded b y the Economic and So cial Researc h Council and Munic h Re. Additional sup- p ort fo r H.D. w as also provided b y the National Science F oundation Aw ard No. 0951576 “DMUU: Cen ter for Robust Decision Makin g on Climate and Energy P olicy ( R DCEP )” . L.A.S. g ratefully ac know ledges the con tin uing support of Pe mbrok e College, Oxford. W e also ac knowle dge the insigh tful commen ts o f Mic hael Stein and Mark Berliner in discussions of this w or k. References [1] C. Bec k. Effects of roundoff errors on c haotic dynamics. In J. V andew alle, R. Boite, M. Mo onen, and A. Oosterlinc k, editor s, Signal Pro cessing VI - Theories and Appli- cations Pro ceedings o f EUSIPCO-92, volume 1, page 18318 6 , 1992. [2] M.L. Berliner. Likelihoo d and ba y esian prediction for chaotic systems. J. Am. Stat. Asso c., 86:93 8952, 1 9 91. [3] M. L. Berliner and S. N. MacEac hern. Examples o f inconsisten t bay es pro cedures 17 based on observ atio ns on dynamical system s. Statistics and Proba bility Letters, 314 17(5):3553 60, 199 3 . [4] D.R. Crev eling, P .E. G ill, and H.D .I. Abarbanel. State a nd parameter 258 estimation in nonlinear systems as an optimal trac king problem. Phys ics Letters A, 372:2640264 4, 2008. [5] H. D u and L.A. Smith. P ar a mete r estimation thro ug h ignora nce. Phys . Rev. E, 016213, 2012. [6] H. Du and L.A. Smith. Pseudo-orbit data a ssimilation. I I: Assimilation with imp erf ect mo dels. Journal of the Atmosph eric Science s, 71:48349 5, 2014. [7] H. D u and L.A. Smith. Pseudo-orbit data assimilation par t I: The p erfec t mo del scenario. Journal of the A tmospheric Sciences, 71:4 69482, 2 014. [8] R.A. Fisher. On the mathematical foundations of theoretical statistics. Philosophical T ransactions of the Roy al So ciet y o f Lo ndon, 222:309 368, 1922. [9] J. A. Hansen and L. A. Smith. Probabilistic noise reduction. T ellus, 5(53):5 8 5598, 2001. [10] Brian Hoskins. The p oten tia l fo r skill across the range of the seamless w eather-climate prediction problem: a stim ulus f o r our s cience. Quarterly Journal of the Roy al Me- teorological So ciet y , 139 ( 6 72):573584, 2013. [11] K. Judd and L .A. Smith. Indistinguishable states I: The p erfect mo del scenario. Ph ysica D , 151:1 25141, 2001. [12] K. Judd and L.A. Smith. Indistinguishable states II: the imp erfect model scenario. Ph ysica D , 196:2 24242, 2004. [13] Kevin Judd, Leonard Smith, and An tje W eisheimer. G r a dien t fr ee descen t: shadow- ing, and state estimation using limited deriv ativ e informat io n. Phy sica D: Nonlinear Phenomena, 190(34):1 5 3166, April 2004. 18 [14] E. Kalnay , M. Kanamitsu, R. Kistler, W. Collins, D. Deav en, L. Gandin, M. Iredell, S. Saha, G . White, J. W o ollen, Y. Zh u, A. Leetmaa, R. Reynolds, M. Chelliah, W. Ebisuzaki, W. Higgins, J. Ja no wiak, K . C. Mo, C. Rop elewsk i, J. W ang, Roy Jenne, and Dennis Joseph. The ncep/ncar 40-y ear reanalysis pro ject. Bulletin of the American Meteorological So ciet y , 7 7(3):437471, 1996. [15] S.P . La lley . Beneath the noise, chaos. The Annals of Stat is tics, 27(2):461479, 1999. [16] S.P . Lalley . Remo ving t he noise from chaos plus noise. In A.I. Mees, editor, Nonlinear Dynamics and Statistics, Boston, 2000. [17] M. Leutb ec her and T. Palme r. Ensem ble forecasting. Journal of Computational Ph ysics, 2 2 7:35153539, 2008. [18] S.N. MacEache rn and L.M. Berliner. Asymptotic inference for dynamical systems observ ed with error. Journa l of Statistical Planning and Inference, 46:2 77292, 1995. [19] P . McSharry and L.A. Smith. Better nonlinear mo de ls from noisy data: A ttracto r s with maximum lik eliho o d. Ph ys. Rev. Lett, 83 ( 2 1):42854288, 1999. [20] R.N. Miller, M. G hil, and F. Ga ut hiez. Adv anced data assimilation in strongly non- linear dynamical systems. J. A tmos. Sci., 51 :10371056, 1 994. [21] No el A. C. Cressie and Christopher K. Wikle. St a tistics fo r spatio-temp oral da ta. Wiley series in pro ba bilit y and statistics. Hob o k en, N.J. Wiley , 2011. [22] P .R . Ok e, G.B. Brassington, D .A. Griffin, and A. Sc hiller. The bluelink o cean dat a assimilation system (b o das). Ocean Mo delling, 21 :4670, 2 0 08. [23] T. N. P almer, R. Buizza, F.Molteni, Y.- Q. Chen, and S. Corti. Singular v ec- tors and t he predictabilit y of w eather a nd climate. Philosophical T ransactions of the Roy al So ciet y of London A: Mathematical, Phys ical and Engineering Sciences, 348(1688 ) :4 59475, 1994 . 19 [24] W. Park er. Computer sim ulation, measuremen t, and data assimilation. British Jour- nal for the Philosophy of Science, 201 5 . [25] Y. P awitan. In all likelih o o d : statistical mo delling and inference 28 3 using lik eliho o d. Oxford science publications. Clarendon press, Oxford, 2001 . [26] V.F. Pisarenk o a nd D . Sornette. Statistical methods of parameter estimation for deterministically chaotic time series. Phys . Rev. E, 69:036122 , 2004. [27] L.A. Smith. La cunarity and Chaos in Nature. PhD thesis, Columbia Univ ersit y , New Y ork, 1987. [28] L.A. Smith. What might w e learn from climate forecasts? In Pro c. Na t io nal Acad. Sci., v olume 4 , pages 2487- 2492, USA, 2 0 02. [29] L.A. Smith, M.C. Cuellar, H. Du, and K. Judd. Exploiting dynamical coherence: A geometric approach to parameter estimation in nonlinear mo dels. Ph ysics Letters A, 374:26182 623, 20 1 0. [30] J.C. Sprott. Chaos and Time-Series Analysis. Oxford Unive rsity Press, Oxford, 2003 . [31] Z. T oth and E. Kalnay . Ensem ble forecasting at nmc: The generation of p erturba- tions. Bulletin of the American Meteorological So ciet y , 7 4:23172330, 1993 . [32] Z. T oth and Eugenia Kalna y . Ensem ble F orecasting at NCEP and the Breeding Metho d. Mon thly W eather R eview, 125(12):329 7 3319, Decem b er 1997. [33] S. M. Uppala, P . W. Kllb erg, A. J. Simmons, U. Andrae, V. Da Costa Bec h told, M. Fiorino, J. K. Gibson, J. Ha seler, A. Hernandez, G. A. Kelly , X. Li, K. Onogi, S. Saarinen, N. Sokk a, R. P . Allan, E. Andersson, K. Arp e, M. A. Balmaseda, A. C. M. Beljaars, L. V an De Berg, J. Bidlo t , N. Bormann, S. Caires, F. Chev allier, A. Dethof, M. D ragosa v ac, M. Fisher, M. F uen tes, S. Hagemann, E. Hlm, B. J. Hoskins, L . Isaksen, P . A. E. M. Janssen, R. Jenne, A. P . Mcnally , J.- F. Mahfo uf, J.-J. Morcrette, N. A. Rayner, R. W. Saunders, P . Simon, A. Sterl, K. E. T renberth, 20 A. Untc h, D . V asiljevic, P . Viterb o, and J. W o ollen. The era-40 re-analysis. Quarterly Journal of the R o y al Meteorological So ciet y , 1 31(612):29613012 , 2005 . [34] Caribb ean Catastrophe Risk Insurance F acilit y . Understanding ccrif: A collection of questions and a ns we rs. CCRIF SPC, Marc h 2015 . 21

Original Paper

Loading high-quality paper...

Comments & Academic Discussion

Loading comments...

Leave a Comment