Sharp failure rates for the bootstrap particle filter in high dimensions
Authors: Peter Bickel, Bo Li, Thomas Bengtsson
IMS Collections: Pushing the Limits of Contemporary Statistics: Contributions in Honor of Jayanta K. Ghosh, Vol. 3 (2008) 318–329. © Institute of Mathematical Statistics, 2008. DOI: 10.1214/074921708000000228

Affiliations: Peter Bickel, Department of Statistics, University of California, Berkeley (bickel@stat.berkeley.edu); Bo Li, School of Economics and Management, Tsinghua University, Beijing (libo@sem.tsinghua.edu); Thomas Bengtsson, Bell Labs, Murray Hill (tocke@cgd.ucar.edu).

AMS 2000 subject classifications: 93E11, 62L12, 86A22, 60G50, 86A32, 86A10.

Keywords and phrases: Bayesian filter, curse of dimensionality, ensemble forecast, ensemble methods, importance sampling, large deviations, Monte Carlo, numerical weather prediction, sample size requirements, state-space model.

Abstract: We prove that the maximum of the sample importance weights in a high-dimensional Gaussian particle filter converges to unity unless the ensemble size grows exponentially in the system dimension. Our work is motivated by and parallels the derivations of Bengtsson, Bickel and Li (2007); however, we weaken their assumptions on the eigenvalues of the covariance matrix of the prior distribution and establish rigorously their strong conjecture on when weight collapse occurs. Specifically, we remove the assumption that the nonzero eigenvalues are bounded away from zero which, although the dimension of the involved vectors grows to infinity, essentially permits the effective system dimension to remain bounded. Moreover, with some restrictions on the rate of growth of the maximum eigenvalue, we relax their assumption that the eigenvalues are bounded from above, allowing the system to be dominated by a single mode.

Contents

1. Introduction
2. Model setting
  2.1. The particle filter
  2.2. Monte Carlo scheme
3. Gaussian case
Appendix
References

1. Introduction

Bayesian filtering methods are a commonly employed tool for combining physical models and data. The filters treat the unknown system state as a random variable and resolve its probability density conditional on the data (and the system dynamics) through Monte Carlo sampling techniques. When applied sequentially in time, these methods are commonly referred to as particle filters ([8], [10]). For a diverse collection of applications and an excellent introduction to the field in general, see the edited volume by Doucet [6]. The particle filter method relies heavily on a likelihood-based reweighting mechanism of the involved sample draws.
This reweighting scheme produces the so-called importance weights, and these weights are the primary focus of our work. Specifically, in a Gaussian filter context, we examine the behavior of the importance weights as a function of the system dimension and of the sample size.

The popularity of the particle filter is no doubt due to the flexibility of the model framework, which handles both non-linear and non-Gaussian structures. However, in spite of its generality, the method is not without flaws: the particle filter is known to require large Monte Carlo ensembles and frequent resampling to estimate the desired densities (cf. [9]). This drawback is particularly prevalent in higher-dimensional systems, where the filter becomes unstable and quickly collapses onto a single point mass. In recent work, for a single Bayes update step in a Gaussian setting, Bengtsson, Bickel and Li [3] give a derivation of the weight collapse as a function of the system dimension and of the sample size. To shed further light on the weight collapse, this paper establishes conjectures (given in [3]) which make their arguments fully rigorous. Just as significantly, we exhibit that collapse is a function of the effective dimension (defined in Section 3), rather than the absolute dimension. As in [3], our analysis is given in the context of a stylized Gaussian example, but we conjecture (and simulations show) that our results are informative for situations that depend on similarly defined reweighting schemes. The results imply that, to avoid collapse, the sample size must grow super-exponentially in the effective dimension. We do not investigate refinements of particle filter methods, such as simulated tempering [4], although our discussion in Section 2.1 suggests that their approach is not a solution to avoid collapse in truly high-dimensional settings.

Our work is outlined as follows. The next section describes the particle filter, provides notation, and describes the use of the ensemble method for approximating posterior densities. The main developments are then presented in Section 3, where we give several results establishing the conditions under which the maximum sample weight in a Gaussian particle filter converges to unity. All technical results are proved in the Appendix. (We note that some material in Section 2.1 and Section 3 is given in [3], but is reproduced here for completeness.)

2. Model setting

2.1. The particle filter

Let $X_t$ represent the unknown system state at time $t$, let $Y_t$ be a noisy data measurement of $X_t$, and let $\mathbf{Y}_t$ represent all data up to and including time $t$. Based on the data $\mathbf{Y}_t$ and (some) knowledge of the time evolution of the system state from $X_{t-1}$ to $X_t$, we seek the posterior distribution $p(X_t \mid \mathbf{Y}_t)$. We assume we have available a random sample $\{X^f_{t,i}\}$ of size $n$ from the prior distribution $p(X_t \mid \mathbf{Y}_{t-1})$. Associated with the prior sample is a set of weights $\{w^f_i\}$. We assume further that the likelihood density $p(Y_t \mid X_t)$ is computable for arbitrary $X_t$.

The particle filter seeks to estimate, recursively in time, the probability distribution of the unknown state $X_t$. At each time $t$, the probability distribution is represented by the sample ensemble $\{X^f_{t,i}, w^f_i\}$, and the distribution can be propagated forward one time step by evolving each $X^f_{t,i}$ using the system dynamics.
Once new data $Y_t$ are available, Bayes' theorem is used to adjust the weights based on how "close" the associated sample points are to the data. The following schematic describes the particle filter:

$$p(X_t \mid \mathbf{Y}_{t-1}),\ Y_t \ \xrightarrow{\text{Bayes}}\ p(X_t \mid \mathbf{Y}_t) \ \xrightarrow{G(\cdot)}\ p(X_{t+1} \mid \mathbf{Y}_t),\ Y_{t+1} \ \xrightarrow{\text{Bayes}}\ p(X_{t+1} \mid \mathbf{Y}_{t+1}).$$

Here, at time $t$ (on the left), Bayes' theorem combines $p(X_t \mid \mathbf{Y}_{t-1})$ and $Y_t$ to produce $p(X_t \mid \mathbf{Y}_t)$. The system dynamics, represented above by $G(\cdot)$ (middle), are used to propagate $p(X_t \mid \mathbf{Y}_t)$ one time step, and this yields $p(X_{t+1} \mid \mathbf{Y}_t)$. Bayes' theorem is then again employed to find the posterior $p(X_{t+1} \mid \mathbf{Y}_{t+1})$ (right).

In a particle filter, the above schematic is straightforwardly implemented (at least conceptually) using a random sample. We note first that the change-of-variables problem represented by the propagation of $p(X_t \mid \mathbf{Y}_t)$ can be solved by evaluating $G(\cdot)$ at each sample point. We will not discuss the implementation of the forecast step here; instead, our focus is on the Bayes update step. As mentioned, the particle filter implements the Bayes step by reweighting the prior sample according to the likelihood. We note in passing that the particle filter may be derived as a (sequential) importance sampler (e.g., [2]), where the proposal distribution is given by the prior and the target distribution is given by the posterior. In the schematic below, which describes a bootstrap-likelihood filter, the prior sample is "converted" to a posterior sample by resampling (with replacement) each member $X^f_{t,i}$ with probability proportional to $w^f_i \times p(Y_t \mid X^f_{t,i})$, i.e.,

$$\underbrace{\{X^f_{t,1}, \ldots, X^f_{t,n}\}}_{\text{prior ensemble}},\ Y_t \ \xrightarrow{\text{resample}}\ \underbrace{\{X^u_{t,1}, \ldots, X^u_{t,n}\}}_{\text{posterior ensemble}}.$$

Although the particle filter has been successfully applied in a variety of settings, it often produces highly varying importance weights. Remedies to stabilize the filter include resampling (renormalizing) the involved empirical measure at regular time intervals [8, 9] and marginalizing or restricting the sample space by conditioning on a larger information set [10, 11]. Another approach is given by simulated tempering [4], which makes use of the regularized likelihood $p(Y_t \mid X^f_{t,i})^{\alpha}$, where $0 < \alpha < 1$. However, as can be seen from our derivations (e.g., Proposition 3.1), a fixed $\alpha$ does not alter the conclusion of collapse. Moreover, for each time point, to obtain samples from the target density, simulated tempering generates a sequence of ensembles from kernels $K_i(\cdot)$ ($i = 1, \ldots, I$) such that $K_I(\cdot)$ approaches the desired kernel $K(\cdot)$ associated with the posterior density. Unfortunately, for truly high-dimensional systems, we conjecture that the number of intermediate sampling steps $I$ would be prohibitively large and render the approach practically infeasible. Thus, such remedies do not fundamentally address performance when the filter is applied to very large scale systems. For example, as noted in [1] and [13], when applied in high dimensions, the filter collapses to a point mass after a few (or even one!) observation cycles. In particular, as will be shown in Section 3, it is the normalized quantity $w_i = p(Y_t \mid X^f_{t,i}) / \sum_j p(Y_t \mid X^f_{t,j})$ that behaves singularly.
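To make the bootstrap update concrete, the sketch below implements one Bayes step, reweighting a prior ensemble by the likelihood and resampling with replacement, for a toy model with identity observation operator and unit observation noise. This is our own minimal illustration, not code from the paper; all names and the specific model are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_update(prior_ensemble, y, loglik):
    """One Bayes update of a bootstrap particle filter: reweight each
    prior member by its likelihood, then resample with replacement
    with probability proportional to the normalized weights."""
    n = prior_ensemble.shape[0]
    logw = np.array([loglik(y, x) for x in prior_ensemble])
    logw -= logw.max()              # stabilize before exponentiating
    w = np.exp(logw)
    w /= w.sum()                    # normalized importance weights
    idx = rng.choice(n, size=n, replace=True, p=w)
    return prior_ensemble[idx], w

# Toy example: Y = X + eps with eps ~ N(0, I), so that
# log p(y | x) = -||y - x||^2 / 2 up to an additive constant.
d, n = 30, 500                      # system dimension, ensemble size
x_true = rng.standard_normal(d)
ensemble = rng.standard_normal((n, d))          # prior sample
y = x_true + rng.standard_normal(d)             # noisy observation
posterior, w = bootstrap_update(
    ensemble, y, lambda y, x: -0.5 * np.sum((y - x) ** 2))
print(f"max weight: {w.max():.3f}")  # already close to 1 for moderate d
```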
The next section sets up the necessary notation and formalizes our problem.

2.2. Monte Carlo scheme

We formalize our problem as follows. Consider a set of $n$ sample points $\mathcal{X} = \{X_1, \ldots, X_n\}$, where $X_i \in \Re^d$ and both the sample size $n$ and the system dimension $d$ are "large." (To lighten notation, we have dropped the time subscript and the forecast superscript.) We assume that the sample $\mathcal{X}$ is drawn randomly from the prior (or proposal) distribution $p(X)$. New data $Y$ are related to the state $X$ by the conditional density $p(Y \mid X)$. For concreteness, a functional relationship $Y = f(X) + \varepsilon$ is assumed, and $\varepsilon$ is taken to be independent of the state $X$. The goal is to estimate posterior expectations using the importance ratio, i.e., for some function $h(\cdot)$, we want to estimate

$$E\{h(X) \mid Y\} = \int h(X)\, \frac{p(Y \mid X)\, p(X)}{\int p(Y \mid X)\, p(X)\, \mathrm{d}X}\, \mathrm{d}X,$$

and use

$$\hat{E}\{h(X) \mid Y\} = \sum_{i=1}^{n} h(X_i)\, \frac{p(Y \mid X_i)}{\sum_{j=1}^{n} p(Y \mid X_j)}$$

as an estimator. Based on this formulation, the weights attached to each ensemble member,

(1) $$w_i = \frac{p(Y \mid X_i)}{\sum_{j=1}^{n} p(Y \mid X_j)},$$

are the primary objects of our study. As mentioned, in large-scale analyses, the weights in (1) are highly variable and often produce estimates $\hat{E}(\cdot)$ which are collapsed onto a point mass with $\max(w_i) \approx 1$. As illuminated in [3], this degeneracy is pervasive for high-dimensional systems, and appears to hold for a variety of prior and likelihood distributions. We next consider the case when both the prior and the likelihood distributions are Gaussian.

3. Gaussian case

We assume a data model given by

$$Y = HX + \varepsilon,$$

where $Y$ is a $d \times 1$ vector, $H$ is a known $d \times q$ matrix, and $X$ is a $q \times 1$ vector. Both the proposal distribution and the error distribution are Gaussian, with $p(X) = N(\mu_X, \Sigma_X)$ and $p(\varepsilon) = N(0, \Sigma_\varepsilon)$, and the noise $\varepsilon$ is taken independent of the state $X$. Since the data model can be pre-rotated by $\Sigma_\varepsilon^{-1/2}$, we set $\Sigma_\varepsilon = I_d$ without loss of generality (wlog). Moreover, since $EY = EHX$, we can replace $X_i$ by $(X_i - EX_i)$ and $Y$ by $(Y - EY)$ and leave $p(Y \mid X)$ unchanged. Hence, wlog, we also set $\mu_X = 0$. Further, define, for conformable $A$ and $B$, the inner product $\langle A, B \rangle = A^T B$ (where the superscript $T$ denotes matrix transpose), and let $\|A\|^2 = \langle A, A \rangle$. With $p(Y \mid X) \sim N(HX, I_d)$, the weights in (1) can be expressed as

(2) $$w_i = \frac{\exp\{-\|Y - HX_i\|^2/2\}}{\sum_{j=1}^{n} \exp\{-\|Y - HX_j\|^2/2\}}.$$

To establish weight collapse for high-dimensional Gaussian $p(Y \mid X)$ and $p(X)$, we first write the exponent in (2) in terms of the singular values of $\mathrm{cov}(HX)$. Let $d' = \mathrm{rank}(H)$. With $\lambda_1^2, \ldots, \lambda_{d'}^2$ the singular values of $\mathrm{cov}(HX)$, define the $d' \times d'$ matrix $D = \mathrm{diag}(\lambda_1, \ldots, \lambda_{d'})$. Then, with $Q$ an orthogonal matrix obtained from the singular value decomposition of $\mathrm{cov}(HX)$, define the $d' \times 1$ vector $V$ such that $Q^T H X = DV$. Note that the $V_i$ corresponding to $X_i$ is $N(0, I_{d'})$. Since $Q$ is orthogonal, we can write

(3) $$\|Y - HX_i\|^2 = \|Q^T Y - D V_i\|^2 = \sum_{j=1}^{d'} \lambda_j^2 W_{ij}^2 + \sum_{j=d'+1}^{d} \epsilon_{0j}^2,$$

where, conditional on $Y$, $[W_{i1}, \ldots, W_{id'}]^T$ is $N(\xi, I_{d'})$, and where $\epsilon_{0j}$ is the $j$th component of the observation noise vector $\varepsilon$.
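The reduction in (3) can be checked numerically. The sketch below, our own illustration with assumed names and sizes, draws a linear-Gaussian model, eigendecomposes $\mathrm{cov}(HX) = HH^T$ (taking $\Sigma_X = I_q$), and verifies that the rotated exponent splits into $\sum_j \lambda_j^2 W_{ij}^2$ plus a term that is the same for every $X_i$ and hence cancels in $w_i$.

```python
import numpy as np

rng = np.random.default_rng(1)
d, q, n = 40, 25, 10
H = rng.standard_normal((d, q))
X = rng.standard_normal((n, q))             # prior draws, Sigma_X = I_q
Y = H @ rng.standard_normal(q) + rng.standard_normal(d)  # one observation

# Eigendecomposition of cov(HX) = H H^T: eigenvalues lambda_j^2 and an
# orthogonal eigenvector matrix Q, sorted so the nonzero modes come first.
lam2, Q = np.linalg.eigh(H @ H.T)
order = np.argsort(lam2)[::-1]
lam2, Q = lam2[order], Q[:, order]
d_prime = int(np.sum(lam2 > 1e-8))          # rank(H)

R = (Y - X @ H.T) @ Q                       # rows are (Q^T (Y - H X_i))^T
lhs = np.sum((Y - X @ H.T) ** 2, axis=1)    # ||Y - H X_i||^2
head = np.sum(R[:, :d_prime] ** 2, axis=1)  # sum_j lambda_j^2 W_ij^2
tail = np.sum(R[:, d_prime:] ** 2, axis=1)  # second term in (3)
assert np.allclose(head + tail, lhs)        # rotation preserves the norm
assert np.allclose(tail, tail[0])           # constant in i: cancels in w_i
```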
The mean vector $\xi = [\mu_1, \ldots, \mu_{d'}]^T$ is given by

(4) $$\xi = D^{-1} Q^T Y = V + D^{-1} \varepsilon',$$

where $V$ and $\varepsilon'$ are independent $N(0, I_{d'})$. Now, for $i = 1, \ldots, n$, define

(5) $$S_i = \frac{\sum_{j=1}^{d'} \lambda_j^2 \{W_{ij}^2 - (1 + \mu_j^2)\}}{\{2 \sum_{j=1}^{d'} \lambda_j^4 (1 + 2\mu_j^2)\}^{1/2}}.$$

Note that the second term in (3) is constant for every $X_i$, and will not influence the weight $w_i$. By (2), we can express the maximum weight as

(6) $$w_{(n)} = \frac{1}{1 + T_{n,d'}}, \quad \text{where } T_{n,d'} = \sum_{\ell=2}^{n} e^{-\sigma_{d'} \sqrt{d'}\, (S_{(\ell)} - S_{(1)})} \text{ and } \sigma_{d'}^2 = \frac{2}{d'} \sum_{j=1}^{d'} \lambda_j^4 (1 + 2\mu_j^2).$$

Thus, to prove weight collapse, we need to show convergence of the denominator in (6) to unity. We now state the following.

Proposition 3.1. Let $S_i$, $i = 1, \ldots, n$, be independent random variables with cumulative distribution function (cdf) $G_d(\cdot)$ satisfying the conditions specified in Lemma A.1 and Lemma A.2 stated in the Appendix. Let $S_{(1)} \le \cdots \le S_{(n)}$ be the ordered sequence of $S_1, \ldots, S_n$, and define, for some $\sigma > 0$, $T_{n,d} = \sum_{\ell=2}^{n} e^{-\sigma \sqrt{d}\, (S_{(\ell)} - S_{(1)})}$. Then, as $n, d \to \infty$, if $(\log n \log d)/d \to 0$, we have

$$\sqrt{\frac{\sigma^2 d}{2 \log n}}\; E(T_{n,d}) \to 1.$$

A proof of the result is provided in the Appendix. For the Gaussian case considered here, an immediate implication of Proposition 3.1 is weight collapse. Specifically, with two additional assumptions, we may assert the following.

Proposition 3.2. We assume, for the Gaussian case considered here:

A1: there is a positive constant $\delta$ such that $1/\delta \ge \lambda_1, \cdots, \lambda_{d'} > \delta$; and

A2: $\tau_{d'}^2 = \frac{2}{d'} \sum_{j=1}^{d'} (3\lambda_j^4 + 2\lambda_j^2) \to \sigma^2 > 0$.

Then, if $(\log n \log d')/d' \to 0$, we have $w_{(n)} \xrightarrow{P} 1$.

Proposition 3.2 follows by Lemma A.3 (Appendix) and Proposition 3.1. The above result implies that, unless $n$ grows super-exponentially in $d'$, we have weight collapse. We note that Proposition 3.2 is a sharpening of the convergence rate as compared to that implied by Section 3.1 of [3]. The $\log d'$ term appears only because $\max_j |\mu_j| = O_p(\sqrt{\log d'})$, and we need to make our analysis conditional on the $\{\mu_j\}$.
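A small Monte Carlo experiment, our own illustration rather than one from the paper, makes Proposition 3.2 tangible: in the stylized setting with $\lambda_j = 1$ (so that $d' = d$), the expected maximum weight climbs toward 1 as $d'$ grows while the ensemble size is held fixed.

```python
import numpy as np

rng = np.random.default_rng(2)

def mean_max_weight(d_prime, n, reps=200):
    """Estimate E[max_i w_i] in the Gaussian setting with lambda_j = 1:
    Y = X + eps, X_i ~ N(0, I), eps ~ N(0, I), weights as in (2)."""
    out = np.empty(reps)
    for r in range(reps):
        x_true = rng.standard_normal(d_prime)
        y = x_true + rng.standard_normal(d_prime)
        X = rng.standard_normal((n, d_prime))
        logw = -0.5 * np.sum((y - X) ** 2, axis=1)
        w = np.exp(logw - logw.max())
        out[r] = 1.0 / w.sum()      # the maximum normalized weight, cf. (6)
    return out.mean()

for dp in (5, 20, 80):
    print(f"d' = {dp:3d}  E[max weight] ~ {mean_max_weight(dp, n=1000):.3f}")
# With n fixed at 1000, the estimate rises toward 1 as d' grows,
# illustrating the collapse asserted by Proposition 3.2.
```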
The results in Proposition 3.2 suggest that large $d'$ leads to collapse. However, we argue now that what really matters is the effective dimension of $X$, defined as the sum of the singular values of $\mathrm{cov}(HX)$. We shall assume that

B: $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_{d'} \ge \cdots$ are part of an infinite sequence.

Our arguments can be modified to the case where $\{\lambda_j : 1 \le j \le d'\}$ is a double array, but we eschew this complication. There are two possibilities:

(i) $\sum_{j=1}^{\infty} \lambda_j^2 < \infty$, or (ii) $\sum_{j=1}^{\infty} \lambda_j^2 = \infty$.

We claim that if (i) holds, there is no weight collapse. That is, if, say, $g: \mathbb{R} \mapsto \mathbb{R}$ is bounded and continuous,

(7) $$\sum_{i=1}^{n} w_i\, g(X_i^*) \xrightarrow{P} E\{g(X) \mid Y\}.$$

In the above, $X_i^*$ is drawn from the empirical measure $\sum_{j=1}^{n} w_j \delta(X_j)$, where $\delta(\cdot)$ represents the delta function, and where, as before, the $w_i$ represent the likelihood-defined weights. To verify the convergence in (7), note that $w_i = U_i / \sum_{j=1}^{n} U_j$, where

(8) $$U_i = c_{d'}^{-1} \exp\Big\{-\frac{1}{2} \sum_{j=1}^{d'} \big[\lambda_j^2 (Z_{ij}^2 - 1) + 2 \lambda_j^2 \mu_j Z_{ij}\big]\Big\}.$$

In (8), the $Z_{ij}$'s are i.i.d. $N(0,1)$ and

$$c_{d'} = E \exp\Big\{-\frac{1}{2} \sum_{j=1}^{d'} \big[\lambda_j^2 (Z_{ij}^2 - 1) + 2 \lambda_j^2 \mu_j Z_{ij}\big]\Big\} = \prod_{j=1}^{d'} (1 + \lambda_j^2)^{-1/2}\, e^{\lambda_j^2/2}\, e^{\frac{\lambda_j^4 \mu_j^2}{2(1 + \lambda_j^2)}}.$$

Now, since (i) implies that $\prod_{j=1}^{d'} (1 + \lambda_j^2)^{-1/2} e^{\lambda_j^2/2}$ converges and

$$E \sum_{j=1}^{\infty} \frac{\lambda_j^4 \mu_j^2}{1 + \lambda_j^2} = \sum_{j=1}^{\infty} \frac{\lambda_j^4\, E(\mu_j^2)}{1 + \lambda_j^2} = \sum_{j=1}^{\infty} \lambda_j^2 < \infty,$$

we have $E(U_1) = 1$ and $c_{d'} \to c$ (with $c$ a constant). Arguing as in Proposition 4.1 of [3], we can show that

$$\mathrm{Var}\Big(\frac{1}{n} \sum_{i=1}^{n} U_i\, g(X_i)\Big) \le \frac{1}{n}\, E\{U_1^2\, g^2(X_1)\} \to 0,$$

since a straightforward computation shows that $E(U_1^2) \le M < \infty$ for all $d'$. Thus, under (i), the importance weights have the correct expectation and vanishing variance. On the other hand, if (ii) holds, we can state the following proposition.

Proposition 3.3. Under B, if $\sum_{j=1}^{\infty} \lambda_j^2 = \infty$ and $(\log n \log d')/\tau_{d'}^2 \to 0$, we have

$$\frac{\tau_{d'}}{\sqrt{2 \log n}}\; E(T_{n,d'}) \to 1.$$

We note that our conditions imply that

$$\max_{1 \le j \le d'} \frac{\lambda_j^2 (1 + y_{0j}^2)}{\tau_{d'}^2} \to 0,$$

so that asymptotic normality holds. The proof requires Lemmas A.1 and A.3. The form reveals that it is possible to have much slower collapse than what Proposition 3.2 suggests. For instance, if $\lambda_j^2 = 1/j$, then B holds but $\tau_{d'}^2 = \log d'\,(1 + o(1))$. In fact, the requirement that the $\lambda_j$ form an infinite sequence as above can be weakened to requiring simply that the $\lambda_j$ be bounded above uniformly, and this can be verified using a subsequence argument. In conclusion, on the basis of Proposition 3.3, provided that the nonzero $\lambda_j$'s are commensurate, it seems reasonable to define $\sum_{j=1}^{d'} \lambda_j^2$ as the effective dimension. We note that the form of the effective dimension also plays a crucial role in the work of [7], who study Monte Carlo sample size requirements in the ensemble Kalman filter framework.
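The $\lambda_j^2 = 1/j$ example is easy to check numerically. The short sketch below, our own illustration, computes the effective dimension $\sum_{j \le d'} \lambda_j^2$ and shows that it grows only logarithmically in $d'$, which is why collapse is far slower than the raw dimension would suggest.

```python
import numpy as np

# Effective dimension sum_{j<=d'} lambda_j^2 for lambda_j^2 = 1/j.
# The harmonic sum grows like log d' plus Euler's constant, so the
# effective dimension stays tiny even for very large d'.
for d_prime in (10**2, 10**4, 10**6):
    lam2 = 1.0 / np.arange(1, d_prime + 1)
    eff_dim = lam2.sum()
    print(f"d' = {d_prime:>7}  effective dim = {eff_dim:6.2f}"
          f"  log d' = {np.log(d_prime):6.2f}")
```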
Appendix

We first introduce two lemmas that pertain to Edgeworth-expansion-type uniform normal approximations of the distribution (the cdf and the density, respectively) of independent sums of random variables. The two lemmas lay the groundwork for the proof of Proposition 3.1. Valid for moderately large deviations, the first result (Lemma A.1) is a special case of Theorem 2.5 in [12], and is stated here without proof.

Lemma A.1. Let $\xi_1, \ldots, \xi_d$ be independent random variables with $E\xi_j = 0$ and $\sigma_j^2 = \mathrm{Var}(\xi_j) < \infty$. Set

$$S_d = \frac{1}{B_d}(\xi_1 + \cdots + \xi_d), \quad \text{where } B_d^2 = \sum_{j=1}^{d} \sigma_j^2,$$

and define the Lyapunov quotients

$$L_{k,d} = \frac{1}{B_d^k} \sum_{j=1}^{d} E|\xi_j|^k, \quad k = 1, 2, \ldots.$$

We also suppose that

$$|E(\xi_j^k)| \le k!\, \gamma_j^{k-2}\, \sigma_j^2, \quad k \ge 3,$$

where $\gamma_1, \ldots, \gamma_d$ are constant terms. With these conditions, as $d \to \infty$, there exist analytic functions $P_d(x) = \sum_{k=3}^{\infty} \lambda_{k,d}\, x^k$ with $|\lambda_{k,d}| \le A c^k d^{-\frac{k-2}{2}}$ for some $A, c$ and all $d$, such that the cdf of $S_d$, denoted $G_d(\cdot)$, satisfies

$$1 - G_d(x) = (1 - \Phi(x)) \exp\{P_d(x)\}(1 + o(1)),$$
$$G_d(-x) = \Phi(-x) \exp\{P_d(-x)\}(1 + o(1))$$

uniformly for all $x \ge 0$ and $x = o(B_d/K_d)$, where $K_d = \max_{1 \le j \le d} \{\gamma_j, \sigma_j\}$. Furthermore, $P_d$ satisfies

(9) $$|P_d(x)| \le c x^3 / B_d$$

for some constant $c > 0$. We use $c$ generically as a constant independent of $d$.

Lemma A.1 gives a normal approximation for the cdf of independent sums, and serves as the basis for the normality conditions of Proposition 3.1. Next we give a lemma for a normal approximation of the density of independent sums, which can be derived directly from Proposition 2 and Theorem 3 of [5].

Lemma A.2. With the same notation and conditions as in Lemma A.1, we assume $\xi_{j,d}$ has density $g_{j,d}$ such that $\sup_x \{|g_{j,d}(x)| : 1 \le j \le d\} \le M < \infty$. Then, as $d \to \infty$, the density of $S_d$, $g_d(\cdot) = G_d'(\cdot)$, satisfies

$$g_d(x) = \phi(x) \exp\{P_d(x)\}(1 + o(1)),$$
$$g_d(-x) = \phi(-x) \exp\{P_d(-x)\}(1 + o(1))$$

uniformly for all $x \ge 0$ and $x = o(B_d/K_d)$, where $K_d = \max_{1 \le j \le d} \{\gamma_j, \sigma_j\}$.

We note in passing that the condition of uniform boundedness of the $g_{j,d}$ does not hold for the $Z_j^2$ in the Gaussian–Gaussian case. However, the sum $\lambda_1^2 Z_1^2 + \lambda_2^2 Z_2^2$, where $\lambda_1, \lambda_2 > 0$ and $Z_1, Z_2$ are independent Gaussian, does indeed satisfy the condition. This may be verified by a direct calculation of the density of the convolution.

The next lemma is given for the purpose of verifying the Lyapunov-quotient conditions appearing in Lemmas A.1 and A.2.

Lemma A.3. Let $Z_j, V_j, \epsilon_j$, $j = 1, \ldots, d$, be i.i.d. $N(0,1)$. Let $\lambda_1 \ge \lambda_2 \ge \cdots$, where $\sum_{j=1}^{\infty} \lambda_j^2 = \infty$. Then, given $\mu_j \equiv V_j + \epsilon_j/\lambda_j$, for all $j$ and $k \ge 3$, we have

(10) $$\lambda_j^{2k}\, E\big\{\big|(Z_j + \mu_j)^2 - (1 + \mu_j^2)\big|^k \,\big|\, \mu_j\big\} \le O_p(\sqrt{\log d})^{k-2}\, k!\, \rho^k\, \lambda_j^4\, E\big\{\big[(Z_j + \mu_j)^2 - (1 + \mu_j^2)\big]^2 \,\big|\, \mu_j\big\}.$$

Thus, given the mean vector $\xi = [\mu_1, \mu_2, \cdots, \mu_d]$ defined in (4), Lemma A.3 states that the Lyapunov conditions required by Lemma A.1 hold, with probability tending to 1. We note that our argument also implies Lemma A.1.

Proof of Lemma A.3. Since $(Z_j + \mu_j)^2 - (1 + \mu_j^2) = (Z_j^2 - 1) + 2\mu_j Z_j$, it is enough to bound

$$\lambda_j^{2k}\, E\big\{|(Z_j^2 - 1) + 2\mu_j Z_j|^k \,\big|\, \mu_j\big\} \le 2^k \lambda_j^{2k}\, E|Z_j^2 - 1|^k + 2^k (|\mu_j| \lambda_j^2)^k\, E|Z_j|^k.$$

By standard properties of the Gaussian moments, for some positive constant $C$,

$$E|Z_j^2 - 1|^k \le C^k k! \quad \text{and} \quad E|Z_j|^k \le C^k k!.$$

Since $E\{(Z_j^2 - 1 + 2\mu_j Z_j)^2 \mid \mu_j\} = 2 + 4\mu_j^2$, we see that (10) follows from the bound

$$\lambda_j^{2k} |\mu_j|^k \le \lambda_j^2 \mu_j^2\, \max\{|\lambda_\ell^2 \mu_\ell|^{k-2} : 1 \le \ell \le d\} = \lambda_j^2 \mu_j^2\, O_p(\sqrt{\log d})^{k-2},$$

since the $\lambda_j^2 \mu_j$ are independent $N(0, \lambda_j^2 + \lambda_j^4)$, so that $\max\{|\lambda_\ell^2 \mu_\ell| : 1 \le \ell \le d\} \le (\lambda_1^2 + \lambda_1^4)^{1/2} \max\{|V_\ell| : 1 \le \ell \le d\}$, where the $V_\ell$ are i.i.d. $N(0,1)$. The lemma follows.
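The only probabilistic input in the last step is the growth rate of the maximum of $d$ i.i.d. standard normals, which is $O_p(\sqrt{2 \log d})$. A quick numerical check of that rate (our own illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
# max_{l<=d} |V_l| for V_l i.i.d. N(0,1) grows like sqrt(2 log d),
# the bound used at the end of the proof of Lemma A.3.
for d in (10**2, 10**4, 10**6):
    v = rng.standard_normal(d)
    print(f"d = {d:>7}  max|V| / sqrt(2 log d) = "
          f"{np.abs(v).max() / np.sqrt(2 * np.log(d)):.3f}")
# The printed ratios stay near 1 as d grows.
```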
The remainder of the Appendix is devoted to the proof of the main result given in Proposition 3.1.

Proof of Proposition 3.1. Let $S_j$ ($j = 1, \ldots, n$) be as defined in the Proposition, let $S_{(1)}$ be the minimum, and write $\tau_d = \sigma \sqrt{d}$. Note that

(11) $$E(T_{n,d} \mid S_{(1)}) = (n-1)\, \frac{\int_{S_{(1)}}^{\infty} \exp\{-\tau_d (z - S_{(1)})\}\, \mathrm{d}G_d(z)}{\bar{G}_d(S_{(1)})},$$

since, given $S_{(1)}$, the remaining $(n-1)$ observations are i.i.d. with cdf equal to $G_d(z)/\bar{G}_d(S_{(1)})$, $z \ge S_{(1)}$. Let $\varepsilon_d$ be a sequence of constants such that $\varepsilon_d \to 0$ and $\varepsilon_d \tau_d / \sqrt{2 \log n} \to \infty$ as $n, d \to \infty$. We first define, for $x < \varepsilon_d \tau_d$,

(12) $$h_{n,d}(x) := \int_x^{\infty} \exp\{-\tau_d (z - x)\}\, \mathrm{d}G_d(z).$$

To evaluate $h_{n,d}(x)$, we break the integral into two parts: the first part is the integral from $x$ to $x + \varepsilon_d \tau_d$, and the second part is the tail integral from $x + \varepsilon_d \tau_d$ to $\infty$. By using the normal approximations of Lemmas A.1 and A.2, under the assumption that $(\log n)/\tau_d^2 \to 0$, one can show that the second part is $o(\sqrt{2 \log n}/(n \tau_d))$.

To deal with the first part, we shall show that, as $x \to -\infty$ with $x > -\varepsilon_d \tau_d$,

(13) $$\int_x^{x + \varepsilon_d \tau_d} \exp\{-\tau_d (z - x)\}\, \mathrm{d}G_d(z) = \frac{1}{\tau_d}\, \phi(x) \exp\{P_d(x)\}\, (1 + o(1)).$$

To this end, applying Lemma A.2, we obtain, with $\Delta_{n,d} = \varepsilon_d \tau_d$ and the substitution $w = |x| v$,

(14) $$\begin{aligned} R_d(x) &:= \int_x^{x + \varepsilon_d \tau_d} \exp\Big\{-\tau_d (z - x) - \tfrac{1}{2}(z^2 - x^2) + P_d(z) - P_d(x)\Big\}\, \mathrm{d}z\, (1 + o(1)) \\ &= \int_0^{\Delta_{n,d}} \exp\Big\{-\tau_d v - \tfrac{1}{2}\big((x+v)^2 - x^2\big) + \sum_{k=3}^{\infty} \lambda_{k,d}\big((x+v)^k - x^k\big)\Big\}\, \mathrm{d}v\, (1 + o(1)) \\ &= \frac{1}{|x|} \int_0^{|x| \varepsilon_d \tau_d} \exp\Big\{-\Big(-1 + \frac{\tau_d}{|x|}\Big) w - \frac{w^2}{2|x|^2} + \sum_{k=3}^{\infty} \lambda_{k,d} \sum_{j=1}^{k} (-1)^{k-j} C_{k,j} |x|^{k-2j} w^j\Big\}\, \mathrm{d}w\, (1 + o(1)) \\ &= \frac{1}{|x|} \int_0^{|x| \varepsilon_d \tau_d} \exp\Big\{-b_1 w - b_2 w^2 + \sum_{j=3}^{\infty} b_j w^j\Big\}\, \mathrm{d}w\, (1 + o(1)), \end{aligned}$$

where

$$b_1 = -1 + \frac{\tau_d}{|x|} - b_1^* = -1 + \frac{\tau_d}{|x|} - \sum_{k=3}^{\infty} (-1)^{k-1} C_{k,1}\, \lambda_{k,d}\, |x|^{k-2},$$

$$b_2 = \frac{1}{2|x|^2} - b_2^* = \frac{1}{2|x|^2} - \sum_{k=3}^{\infty} (-1)^{k-2} C_{k,2}\, \lambda_{k,d}\, |x|^{k-4},$$

and

$$b_j = \sum_{k=j}^{\infty} (-1)^{k-j} C_{k,j}\, \lambda_{k,d}\, |x|^{k-2j}, \quad j \ge 3.$$

Note that $|\lambda_{k,d}| \le A c_0^k \tau_d^{-(k-2)}$ and $C_{k,j} < c^k$ for some constants $A, c_0, c$. Hereafter, we use $c$ as a generic positive constant that does not depend on $x$ and $d$. Under the assumptions that $x \to -\infty$, $|x| < \varepsilon_d \tau_d$ (hence $|x|/\tau_d \to 0$), and $|x| \Delta_{n,d} \to \infty$, we have, firstly,

(15) $$b_1^* \le \sum_{k=3}^{\infty} A k c_0^k \tau_d^{-(k-2)} |x|^{k-2} = A c_0^2 \Big[\frac{3 (c_0 |x|/\tau_d)}{1 - (c_0 |x|/\tau_d)} + \frac{(c_0 |x|/\tau_d)^2}{(1 - (c_0 |x|/\tau_d))^2}\Big] = o(1);$$

secondly,

(16) $$b_2^* \le |x|^{-2} \sum_{k=3}^{\infty} c\, (c |x|/\tau_d)^{k-2} = |x|^{-2}\, \frac{c |x|/\tau_d}{1 - c |x|/\tau_d} = o(|x|^{-2});$$

and thirdly,

(17) $$b_j \le \Big(\sum_{k=j}^{2j-1} + \sum_{k=2j}^{\infty}\Big) A (c c_0)^k \tau_d^{-(k-2)} |x|^{k-2j} = \sum_{k=j}^{2j-1} A (c c_0)^k \Big(\frac{|x|}{\tau_d}\Big)^{k-j} |x|^{-j} \tau_d^{-(j-2)} + \sum_{k=2j}^{\infty} A (c c_0)^k \Big(\frac{|x|}{\tau_d}\Big)^{k-2j} \tau_d^{2-2j} \le |x|^{-2} (c |x| \tau_d)^{-(j-2)} + c \tau_d^{2-2j} \le 2 |x|^{-2} (c |x| \tau_d)^{-(j-2)}.$$

Since $w/(|x| \tau_d) \le \varepsilon_d \to 0$, we can further derive

$$\sum_{j=3}^{\infty} b_j w^j \le 2 (w/|x|)^2 \sum_{j=3}^{\infty} \Big(\frac{c w}{|x| \tau_d}\Big)^{j-2} = 2 (w/|x|)^2\, \frac{c w/(|x| \tau_d)}{1 - c w/(|x| \tau_d)} = o(|x|^{-2})\, w^2.$$

Combining (14), (15), (16), and (17) yields

(18) $$R_d(x) = \frac{1}{|x|} \int_0^{|x| \Delta_{n,d}} \exp\Big\{-\Big(-1 + \frac{\tau_d}{|x|}\Big)(1 + o(1))\, w - \frac{w^2}{2 |x|^2}(1 + o(1))\Big\}\, \mathrm{d}w.$$

The $o(1)$'s appearing in the last expression are uniform as $w$ varies over the integration interval. Now, the bounded convergence theorem ensures $R_d(x) = (1/\tau_d)(1 + o(1))$, which establishes (13). Taking into account the remainder term, we conclude that

(19) $$h_{n,d}(x) = \frac{1}{\tau_d}\, \phi(x) \exp\{P_d(x)\}\, (1 + o(1)) + o\Big(\frac{\sqrt{2 \log n}}{n \tau_d}\Big).$$

Our target $(\tau_d/\sqrt{2 \log n})\, E(T_{n,d})$ can now be written as

(20) $$\frac{\tau_d}{\sqrt{2 \log n}}\, E(T_{n,d}) = \frac{\tau_d (n-1)}{\sqrt{2 \log n}}\, E\Big\{\frac{h_{n,d}(S_{(1)})}{\bar{G}_d(S_{(1)})}\Big\} = \frac{\tau_d\, n (n-1)}{\sqrt{2 \log n}} \int_{-\infty}^{\infty} h_{n,d}(x)\, \bar{G}_d^{n-2}(x)\, \mathrm{d}G_d(x).$$

We decompose the preceding integral into three parts:

(21) $$\frac{\tau_d}{\sqrt{2 \log n}}\, E(T_{n,d}) = I_{n,d} + II_{n,d} + III_{n,d},$$

where $I_{n,d}$, $II_{n,d}$, and $III_{n,d}$ represent the integral in (20) over the intervals $(-\infty, -\varepsilon_d \tau_d]$, $(-\varepsilon_d \tau_d, -(\log n)^{1/4})$, and $[-(\log n)^{1/4}, \infty)$, respectively. The preceding discussion, combined with the approximation $g_d(x) = |x|\, G_d(x)(1 + o(1))$ as $x \to -\infty$ and $|x| = o(\tau_d)$, implies that the dominating part is the quantity represented by $II_{n,d}$.
We have

(22) $$\begin{aligned} II_{n,d} &= \frac{n(n-1)}{\sqrt{2 \log n}} \int_{-\varepsilon_d \tau_d}^{-(\log n)^{1/4}} |x|\, G_d(x)\, \bar{G}_d^{n-2}(x)\, \mathrm{d}G_d(x)\, (1 + o(1)) \\ &= \frac{1}{\sqrt{2 \log n}} \int_{n G_d(-\varepsilon_d \tau_d)}^{n G_d(-(\log n)^{1/4})} \big|G_d^{-1}(w/n)\big|\, w\, (1 - w/n)^n\, \mathrm{d}w\, (1 + o(1)) \\ &= \frac{1}{\sqrt{2 \log n}} \int_{n G_d(-\varepsilon_d \tau_d)}^{n G_d(-(\log n)^{1/4})} \sqrt{-2 \log(w/n)}\; w\, e^{-w}\, \mathrm{d}w\, (1 + o(1)) \\ &= \int_0^{\infty} w\, e^{-w}\, \mathrm{d}w\, (1 + o(1)) - \frac{1}{\sqrt{2 \log n}} \int_0^{\infty} w \log w\; e^{-w}\, \mathrm{d}w\, (1 + o(1)) \\ &= 1 + o(1). \end{aligned}$$

To arrive at (22), we have used the approximation $G_d^{-1}(z) = -\sqrt{-2 \log z}\,(1 + o(1))$ for $z \to 0$, in light of Lemma A.1 and Mills' ratio. For the remaining two parts, we use Mills' ratio and obtain

(23) $$\begin{aligned} I_{n,d} + III_{n,d} &\le \frac{\tau_d (n-1)}{\sqrt{2 \log n}} \Big[P\big(S_{(1)} \le -\varepsilon_d \tau_d\big) + P\big(S_{(1)} \ge -(\log n)^{1/4}\big)\Big] \\ &= \frac{\tau_d (n-1)}{\sqrt{2 \log n}} \Big[1 - \bar{G}_d^n(-\varepsilon_d \tau_d) + \bar{G}_d^n\big(-(\log n)^{1/4}\big)\Big] \\ &\le \frac{\tau_d (n-1)}{\sqrt{2 \log n}} \Big[n\, G_d(-\varepsilon_d \tau_d) + \bar{G}_d^n\big(-(\log n)^{1/4}\big)\Big] \to 0. \end{aligned}$$

Finally, combining (21), (22), and (23) yields the desired result.

References

[1] Anderson, J. and Anderson, S. (1999). A Monte Carlo implementation of the nonlinear filtering problem to produce ensemble assimilations and forecasts. Monthly Weather Review 127 2741–2758.

[2] Arulampalam, M., Maskell, S., Gordon, N. and Clapp, T. (2002). A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking. IEEE Transactions on Signal Processing 50 174–188.

[3] Bengtsson, T., Bickel, P. and Li, B. (2007). Probability and Statistics: Essays in Honor of David A. Freedman. IMS Monograph Series 337–356.

[4] Del Moral, P., Doucet, A. and Jasra, A. (2006). Sequential Monte Carlo samplers. J. Roy. Statist. Soc. Ser. B 68 411–436. MR2278333

[5] Deltuviene, D. and Saulis, L. (2003). Asymptotic expansion of the distribution density function for the sum of random variables in the series scheme in large deviations zones. Acta Appl. Math. 78 87–97. MR2021771

[6] Doucet, A., de Freitas, N. and Gordon, N., eds. (2001). Sequential Monte Carlo Methods in Practice. Springer, New York.

[7] Furrer, R. and Bengtsson, T. (2007). Estimation of high-dimensional prior and posterior covariance matrices in Kalman filter variants. J. Multivariate Anal. 98 227–255. MR2301751

[8] Gordon, N., Salmond, D. and Smith, A. (1993). Novel approach to nonlinear/non-Gaussian Bayesian state estimation. IEE Proceedings-F 140 107–113.

[9] Liu, J. (2001). Monte Carlo Strategies in Scientific Computing. Springer, New York. MR1842342

[10] Liu, J. and Chen, R. (1998). Sequential Monte Carlo methods for dynamic systems. J. Amer. Statist. Assoc. 93 1032–1044. MR1649198

[11] Pitt, M. and Shephard, N. (1999). Filtering via simulation: Auxiliary particle filters. J. Amer. Statist. Assoc. 94 590–599. MR1702328

[12] Saulis, L. and Statulevičius, V. (2000). Limit Theorems of Probability Theory. Springer, New York. MR1798811

[13] van Leeuwen, P. (2003). A variance minimizing filter for large-scale applications. Monthly Weather Review 131 2071–2084.