Perfect simulation using dominated coupling from the past with application to area-interaction point processes and wavelet thresholding
We consider perfect simulation algorithms for locally stable point processes based on dominated coupling from the past, and apply these methods in two different contexts. A new version of the algorithm is developed which is feasible for processes whi…
Authors: Graeme K. Ambler, Bernard W. Silverman
3 P erfect sim ulatio n using dominated coupling from the past with appli cation to area-in teraction p oin t pro cesses and w a velet thresholding Graeme K. Am bler a and Bernard W. Silv erman b Abstract W e consider perfect simulation algorithms for loca lly stable point pro- cesses based on dominated coupling from the past, and apply these meth- o ds in tw o differ e nt contexts. A new version of the algo rithm is developed which is feasible for proce sses whic h are neither purely attr a ctive nor purely repulsive. Suc h pr o cesses include mult is cale area -interaction pro- cesses, whic h ar e capable of modelling p oint patterns whose clustering structure v aries acr oss s c a les. The other topic consider e d is no nparamet- ric regression using w avelets, where w e use a suitable area-interaction pro cess on the discrete spa c e of indices of w av elet co efficie nts to model the notion that if one wav elet co efficient is non-zer o then it is more likely that neighbouring co efficients will be also. A metho d based o n perfect simulation within this mo del shows promising results compared to the standard methods which threshold co efficient s indep endently . Keywor ds coupling from the past (CFTP), dominated CFTP , exact sim- ulation, lo ca l stability , Marko v chain Mont e Carlo, p erfect simulation, Papangelou conditional intensit y , spatial birth-and-death pro cess AMS subje ct classific ation (MSC2010) 62M30 , 60G55, 60K35 a Departmen t of Medicine, Addenbrook e’s Hospital, Hills Road, Cambridge CB2 0QQ; graeme@am bler. m e.uk b Smith School of Enterprise and Environmen t, Ha yes House, 75 George Street, Oxford OX1 2BQ; b ernard.silverman@stats.o x.ac.uk 2 G. K. Ambler and B. W. Silverman 1 In tro duction Marko v chain Monte Carlo (MCMC) is now one o f the standard ap- proaches of computational Bay esian inference. A standard issue when using MCMC is the need to ensure that the Marko v chain w e are using for sim ula tion has reached equilibrium. F or certain classes of problem, this problem w a s solv ed by the introduction of coupling from the past (CFTP) (Pr o pp and Wilson, 199 6, 199 8). More recently , metho ds based on CFTP hav e been develope d for perfect simulation o f spa tial po int pro cess mo dels (see for example Kenda ll (1997, 1998); H¨ aggstr ¨ om e t al. (1999); Kendall and Mø ller (2000)). Exact CFTP metho ds are therefore a ttractive, as one does not need to ch e ck convergence rigorously or worry ab out burn-in, or use co m- plicated meth o ds to find appropriate standard errors for Mon te Carlo estimates based on cor related samples. Independent and identically dis- tributed sa mples are now av ailable, so estimation reduces to the simplest case. Unfortunately , this simplicity comes at a price . These metho ds ar e notorious for taking a long time to r e turn just one exact sample and are often difficult to code, leading man y to g ive up and return to none x act metho ds. In respo ns e to these issues, in the first part o f this pap er we present a dominated CFTP algor ithm for the simulation o f lo ca lly stable po int pr o cesses which potentially requires far few e r ev aluatio ns per it- eration than the exis ting met ho d in the literature (Kendall and Møller, 2000). The pap er then go es on to discuss applications of this CFTP al- gorithm, in t wo different con texts, the modelling of point patterns and nonparametric reg ression by w av elet thresho lding. In par ticula r it will be seen that these tw o proble m a reas are muc h mor e clos ely r elated than might b e imagined, beca use of the w ay that the non-ze r o co efficients in a w avelet expansion ma y b e mo delled as an appropriate point pro cess. The structure o f the paper is as follows. In Section 2 w e discuss per- fect simulation, b eginning with ordinary coupling from the past (CFTP ) and moving on to do mina ted CFTP for spa tial p oint pro c esses. W e then int r o duce and justify our per fect s im ula tion algorithm. In Section 3 we first rev iew the standa rd area-interaction pro cess. W e then in tr o duce our multiscale process , des c rib e ho w to use our new p erfect sim ulation algorithm to sim ula te from it, and discuss a method for inferring the parameter v alues from data, and pr e s ent a n applica tio n to the Redw o o d seedlings data. In Section 4 we turn attention to the w av ele t r egressio n problem. Bayesian appro aches are reviewed, and a mo del introduced Perfe ct simulation using dominate d c oupling fr om the p ast 3 which incorp ora tes an area-interaction pro cess o n the discrete space of indices of wa velet co efficients. In Section 5 the application of o ur p erfect simulation algorithm in this cont e x t is developed. The need appropri- ately to modify the appro ach to increas e its computational feasibilit y is addressed, and a sim ulation study inv es tigating its performa nce on standard test examples is carried o ut. Sections 3 and 5 b oth co nclude with some suggestions for future w or k. 2 P erfect sim ulation 2.1 Coupling from the past In this section, w e offer a brief intuitiv e introduction to the principle b e- hind CFTP . F or more for mal descriptions and details, see, for exa mple, Propp and Wilso n (1996), MacKay (2003, Chapter 32) and C o nnor (2007). Suppo se w e w anted to s ample f r om the stationary distribution of an irreducible ap erio dic Marko v c hain { Z t } on s ome (finite) state space X with s tates 1, . . . , n . Int uitively , if it w er e p oss ible to g o ba ck a n infinite amount in time and start the chain running, the chain would b e in its stationary distribution when one r eturned to the present (i.e. Z 0 ∼ π , where π is the stationar y distr ibutio n o f the chain). Now, supp ose we were to set no t one, but n chains { Z (1) t } , . . . , { Z ( n ) t } running at a fixed time − M in the past, where Z ( i ) − M = i for ea ch c hain { Z ( i ) t } . No w let all the c hains be coupled so that if Z ( i ) s = Z ( j ) s at an y time s then Z ( i ) t = Z ( j ) t ∀ t ≥ s . Then if a ll the chains ended up in the sa me s tate j a t time zero (i.e. Z ( i ) 0 = j ∀ i ∈ X ), w e w o uld know that whichev er state the chain passing from time min us infinity to zer o was in at time − M , the c ha in would end up in state j at time zero . Thu s the state a t time zero is a sample from the stationary distribution provided M is large enough for coalescence to ha ve b een achiev ed for the r ealisations be ing considered. When p erfo r ming CFTP , a useful property of the coupling chosen is that it b e st o chastic al ly monotone as in the following definitio n. Definition 2.1 Let { Z ( i ) t } and { Z ( j ) t } b e t wo Marko v c ha ins ob eying the same tra ns ition kernel. Then a coupling o f these Markov chains is sto chastically monotone with resp ect to a par tial ordering if whenev e r Z ( i ) t Z ( j ) t , then Z ( i ) t + k Z ( j ) t + k for all po sitive k . Whenever the coupling use d is stochastically mono tone and there are 4 G. K. Ambler and B. W. Silverman maximal and minimal elements with re s pec t to then we need only simulate chains whic h start in the top and b ottom states , since c hains starting in all other states a re sandwiched by thes e t wo. This is an im- po rtant ingredien t of the dominated coupling from the past algorithm int r o duced in the next section. Although attempts have be en made to generalise CFTP to co n tinuous state spaces (n o tably Murdo ch and Green (1998) and Gree n a nd Murdo ch (1998), as w ell as K endall and Møller (200 0), discussed in Section 2.2), there is still m uch w o rk to be done befor e exact sampling beco mes uni- versally , or ev en generally applicable. F o r example, there are no tr uly general methods fo r pro cesses in hig h, or e ven mo derate, dimensions. 2.2 Dominated coupling from the past Dominated coupling from the past was intro duced as an extension of coupling fro m the past which allow ed the simulation of the area-interact- ion pro cess (Kendall, 1998), though it w as so on extended to o ther t yp es of point pro c esses and more general spaces (Kendall and Møller, 2000). W e giv e the formulation for lo c a lly stable point proc e sses. Let x be a spa tial p oint patter n in some bo unded subset S ⊂ R n , and u a single point u ∈ S . Suppose that x is a realisation of a spatial p oint pro cess X with density f with resp ect to the unit rate P ois son pro cess . The Pap angelou c onditional int en sity λ f is defined b y λ f ( u ; x ) = f ( x ∪ { u } ) f ( x ) ; see, for example, Papangelou (1 974) and Baddeley et al. (2005). If the pro cess X is lo cally stable, then there exists a consta nt λ such that λ f ( u ; x ) ≤ λ for a ll finite point co nfig urations x ⊂ S a nd a ll p oints u ∈ S \ x . The algorithm given in Kendall and Møller (2000) is then as follows. 1 Obtain a sample of the P ois son pro ces s with rate λ . 2 Evolve a Mark ov pro cess D ( T ) b ackwar ds until so me fixed time − T , using a birth-and- death proc ess with death ra te equal to 1 and birth rate eq ual to λ . The configuratio n genera ted in step 1 is used as the initial state. 3 Mark all o f the p oints in the pro ces s wit h U[0,1] marks. W e refer to the mark of point x a s P ( x ). Perfe ct simulation using dominate d c oupling fr om the p ast 5 4 Recursively define upper and lo wer pro cesses, U and L as follows. The initial configurations at time − T for the pro cesses are U − T ( − T ) = { x : x ∈ D ( − T ) } ; L − T ( − T ) = { 0 } . 5 Evolve the pro ces ses forwar ds in time to t = 0 in the following way . Suppo se that the pro cess e s hav e b een generated up a given time, u , and suppose that the next birth or death to o ccur after that time happ ens at time t i . If a birth ha pp ens next then we acc e pt the birth of the point x in U − T or L − T if the p oint’s mark, P ( x ), is less than min λ f ( x ; X ) λ : L − T ( t i ) ⊆ X ⊆ U − T ( t i ) or max λ f ( x ; X ) λ : L − T ( t i ) ⊆ X ⊆ U − T ( t i ) (2.1) resp ectively , where x is the po int to b e bo rn. If, how ever, a death happ ens next then if the ev ent is present in either of our pro cesse s we remov e the dying even t, setting U − T ( t i ) = U − T ( u ) \ { x } a nd L − T ( t i ) = L − T ( u ) \ { x } . 6 Define U − T ( u + ε ) = U − T ( u ) a nd L − T ( u + ε ) = L − T ( u ) for u < u + ε < t i . 7 If U − T and L − T are ident ic a l at time zero (i.e. if U − T (0) = L − T (0)), then we have the r equired s ample f r om the ar e a -interaction proces s with r ate para meter λ and a ttraction para meter γ . If not, go to step 2 and rep eat, extending the underlying P oisso n pro ce s s bac k to − ( T + S ) and generating additional U [0 , 1] marks (keeping the ones a lready generated). This algorithm inv olves calcula tion o f λ ( u ; X ) for each configuration that is both a subs e t o f U ( T ) and a sup erset of L ( T ). Since calculation of λ ( u ; X ) is typically expensive, this calculatio n may be very costly . The metho d prop osed in Section 2.3 uses an alternative version of step 5 which requires us o nly to c a lculate λ ( u ; X ) for upp e r and low er pro cesses. The more general form given in Kendall and Møller (2000) may b e obtained from the ab ov e algo rithm by replacing the evolving Poisson pro cess D ( T ) with a general dominating pro cess on a pa rtially or de r ed space (Ω , ) with a unique minimal elemen t 0 . The par tial or dering in the ab ov e algo rithm is that induced by the s ubset relation ⊆ . Step 5 is 6 G. K. Ambler and B. W. Silverman replaced b y any step which preserves the crucial funnel ling pr op erty L − T ( u ) L − ( T + S ) ( u ) U − ( T + S ) ( u ) U − T ( u ) (2.2) for all u < 0 and T , S > 0 and the sandwich ing r elations L − T ( u ) X − T ( u ) U − T ( u ) D ( u ) and (2.3) L − T ( t ) = U − T ( t ) if L − T ( s ) = U − T ( s ) (2.4) for s ≤ t ≤ 0. In equation (2.3 ), X − T ( u ) is the Markov chain or pro cess from whose stationary distribution w e wish to sample. 2.3 A perfect sim ulat ion algorithm Suppo se that we w is h to sa mple from a lo cally stable po int pr o cess with density p ( X ) = α m Y i =1 f i ( X ) , (2.5) where α ∈ (0 , ∞ ) and f i : R f → R are positive-v alued functions whic h are monotonic with re s pec t to the partial ordering induced by the subset relation 1 and hav e unifor mly bo unded Papangelo u conditional int ens ity: λ f i ( u ; x ) = f i ( x ∪ { u } ) f i ( x ) ≤ K . (2.6) When the conditional in tensity (2 .6) can be expr essed in this way , as the product of monotonic interactions, then we sha ll demonstrate t ha t the crucia l step of the Kendall–Møller alg orithm may b e r e - written in a form which is computationally muc h more efficie nt, essentially by dea ling with each factor separately . Clearly λ p ( u ; x ) ≤ λ = m Y i =1 max X, { x } λ f i ( x ; X ) (2.7) for all u and x , and λ is finite. Thus we may use the algo rithm in Section 2.2 to simulate from this proce s s using a P o is son process with rate λ a s the dominating pro cess. How ever, as previously ment io ned, calculation of λ p ( u ; x ) is typically exp ensive, incr e asing a t least linearly in n ( x ). Thus to calculate the expressions in (2.1), w e m ust in general perform 2 n ( U − T ( t i )) − n ( L − T ( t i )) 1 That is, configurations x and y satisfy x y if x ⊆ y . Perfe ct simulation using dominate d c oupling fr om the p ast 7 of these calculations, mak ing the algorithm non-p olyno mial. In practice it is c le a rly no t feasible to use this algo rithm in all but the most trivial of cases, so we must lo ok for some wa y to re duce the computational burde n in step 5 o f the alg orithm. This can b e do ne by replacing s tep 5 with the fo llowing alternative. 5 ′ Evolv e the pr o cesses forwar ds in time to t = 0 in the following wa y . Suppo se that the pro cess e s hav e b een generated up a given time, u , and suppose that the next birth or death to o ccur after that time happ ens at time t i . If a birth ha pp ens next then we acc e pt the birth of the point x in U − T or L − T if the p oint’s mark, P ( x ), is less than Q m i =1 [max { λ f i ( u ; U ( T )) , λ f i ( u ; L ( T )) } / λ ] or (2.8) Q m i =1 [min { λ f i ( u ; U ( T )) , λ f i ( u ; L ( T )) } / λ ] (2.9) resp ectively , where x is the po int to b e bo rn. If, how ever, a death happ ens next then if the ev ent is present in either of our pro cesse s we remov e the dying even t, setting U − T ( t i ) = U − T ( u ) \ { x } and L − T ( t i ) = L − T ( u ) \ { x } . Lemma 2.2 Step 5 ′ ob eys pr op erties ( 2.2 ) , ( 2.3 ) and ( 2.4 ) , and is thus a vali d dominate d c oupling-fr om-the-p ast algorithm. Pr o of Prop erty (2.3) follows by noting that (2.9) ≤ λ p ( u ; X ) ≤ (2.8) ≤ 1 . Prop erty (2.4) is trivial. Prop erty (2.2) follows from the monotonicity of the f i . Theorem 2.3 Supp ose that we wish to simulate fr om a lo c al ly stable p oint pr o c ess whose de n sity p ( X ) with r esp e ct to the unit- r ate Poisson pr o c ess is r epr esentable in form ( 2 .5 ) . Then by r eplacing Step 5 by Step 5 ′ it is p ossible t o b ound the ne c essary numb er of c alculations of λ p ( u ; X ) p er iter ation in the dominate d c oupling-fr om-the-p ast algorithm inde- p endently of n ( X ) . Pr o of Step 5 ′ clearly in volv es only a constan t n umber of ca lculations, so by Lemma 2.2 above and The o rem 2.1 of Kendall and Møller (2000), the r esult holds. In the ca se where it is p oss ible to write p ( X ) in form (2 .5) with m = 1, Step 5 ′ is identical to Step 5. This is the c a se for mo dels whic h are either purely attractiv e or purely repulsive, suc h as the sta ndard area- int er action proces s discussed in Sectio n 3 .1. It is no t the case f o r the 8 G. K. Ambler and B. W. Silverman m ultisca le pro ces s dis c us sed in Section 3.2, or the mo del for wav elet co efficients discussed in Section 4.2. The pro o f of Theorem 2.1 in Kendall and Møller (2000) do es not re- quire that the initial configuration of L − T be the minimal element 0 , only that it be cons tr ucted in such a wa y that prop erties (2.2), (2.3) a nd (2.4) are satisfied. Th us we ma y refine our metho d further by mo difying step 4 s o that the initial config uration of L − T is given by L − T ( − T ) = ( x ∈ D ( − T ) : P ( x ) ≤ m Y i =1 min X, { x } λ f i ( x ; X ) / λ ) , (2.10) which clearly satisfies the necessary requirements. 3 Area-in teraction pro cesses 3.1 Standard area-in teraction process There are several classes o f mo del for stochastic p oint pro cesses , for ex- ample simple Poisson pro cesses, cluster pro cesses such as Cox pro c esses, and pro cess es defined as the s tationary distribution of Mar ko v p oint pro cesses, such as Stra uss pro cesse s (Strauss, 19 75) and area-interaction pro cesses (Baddeley a nd Lie shout, 19 95). The a rea-interaction p oint pro - cess is capable of pro ducing both mo dera tely clustered and mo dera tely ordered patterns depe nding on the v alue of its clustering par ameter. It w a s in tro duced primarily to fill a gap left by the Strauss point pro- cess (Strauss, 197 5), which can pr o duce only ordered p oint patterns (Kelly and Ripley, 1976). The general definition of the a rea-interaction pr o cess depends on a sp ecification of the n eighb ourho o d of any p oint in the s pa ce χ on w hich the pro cess is defined. Giv en any x ∈ χ we denote b y B ( x ) the neigh- bo urho o d of the point x . Given a set X ⊆ χ , the neigh b ourho o d U ( X ) of X is defined as S x ∈ X B ( x ). The general area- in ter action pro cess is then defined b y Baddeley and Lieshout (1995) as follows. Let χ be s ome loca lly compact complete metric space and R f be the space o f a ll possible configurations o f p oints in χ . Suppos e that m is a finite Bo rel regula r measure on χ and B : χ → K be a my opically con- tin uo us function (Ma ther on, 19 75), where K is the class of all compact subsets of χ . Then the pr obability density of the gener a l a rea-interaction pro cess is g iven by p ( X ) = αλ N ( X ) γ − m { U ( X ) } (3.1) Perfe ct simulation using dominate d c oupling fr om the p ast 9 Figure 3.1 An example of some ev ents together with circular ‘grains’ G . The even ts in t he ab ov e diagram w ould b e the actual members of the pro cess. The circles around them are to show what the set X ⊕ G w ould lo ok lik e. If γ w ere large, t h e p oint configuration on the righ t w ould b e fav oured, whereas if γ w ere small, th e confi guration on the the left w ould b e fa voured. with resp ect to the unit rate Poisson pro cess , wher e N ( X ) is the num b er of p oints in configur a tion X = { x 1 , . . . , x N ( X ) } ∈ R f , α is a nor malising constant and U ( X ) = S N ( X ) i =1 B ( x i ) as above. In the spatial p o int - pro cess case, fo r so me fixed co mpact set G in R d , the ne ig hbourho o d B ( x ) of each p o int x is defined to be x ⊕ G . Here ⊕ is the Mink owski addition operato r , defined by A ⊕ B = { a + b : a ∈ A, b ∈ B } for sets A and B . So the resulting area -interaction pro cess ha s density p ( X ) = αλ N ( X ) γ − m ( X ⊕ G ) (3.2) with resp ect to the unit - rate Poisson pro cess , where α is a normalising constant, λ > 0 is the r ate para meter, N ( X ) is the num b er o f p o in ts in the configuration X , γ > 0 is the clustering para meter. Here 0 < γ < 1 is the re pulsive c ase, while γ > 1 is the att r active case. The case γ = 1 reduces to the ho mogeneous Poisson pro cess with rate λ . Figure 3.1 gives an example of the construction when G is a disc. 3.2 A m ultiscale area-in teraction process The a rea-interaction pro cess is a flexible mo del yielding a go od range of models, from regular through total spatial randomness to clustered. Unfortunately it do es not allow for mo dels whose b ehaviour c hang es a t different r esolutions, for example repulsion at sma ll dista nces and at- traction at larg e distances. So me examples which displa y this so r t of behaviour a r e the distribution of trees on a hillside, o r the distribution 10 G. K. Ambler and B. W. Silverman of ze br a in a patch of s av annah. A physical exa mple o f la rge scale attrac- tion and small scale repulsio n is the in tera ction betw een the strong nuc- lear for ce and the elec tro-magnetic force be t ween tw o opp os itely c ha r ged particles. The physical laws gov erning this behaviour are different fro m those go verning the b ehaviour of the ar e a-interaction class of models, though they may b e sufficiently similar so as to provide a useful approx- imation. W e prop ose the following model to capture these types of behaviour. Definition 3.1 The multisc ale ar e a-inter action pr o c ess has densit y p ( X ) = αλ N ( X ) γ − m ( X ⊕ G 1 ) 1 γ − m ( X ⊕ G 2 ) 2 , (3.3) where α , λ and N ( X ), are as in equa tion (3.2); γ 1 ∈ [1 , ∞ ) and γ 2 ∈ (0 , 1 ]; and G 1 and G 2 are balls of radius r 1 and r 2 resp ectively . The proce ss is clearly Marko v of range max { r 1 , r 2 } . If G 1 ⊃ G 2 , we will have small-scale re puls ion and large- scale attraction. If G 1 ⊂ G 2 , we will hav e small-scale attraction and large-scale repulsion. Theorem 3.2 The density (3.3) is b oth me asur able a n d inte gr able. This is a str aightforw ar d extension of the pro of of Baddeley and Lieshout (1995) for the sta ndard are a-interaction pro cess; for details, see the Ap- pendix of Amb le r and Silverman (2004b). 3.3 P erfect sim ulation of the multiscale pro cess Perfect sim ulation of the m ultiscale pr o cess (3.3) is p ossible us ing the metho d introduced in Section 2 .3. Since (3.3) is alrea dy wr itten as a pro duct of three monotonic functions with uniformly b ounded P apan- gelou c o nditional intensities, we need only substitute into eq uations (2.7– 2.10) as follo ws. Substituting in to equation (2.7), we find that the rate of a suitable dominating pro cess is λγ − m ( G 2 ) 2 . The initial configurations of the uppe r and low er pro cesses U and L are then found b y simulating this pr o cess, thinning w ith a probabilit y of γ − m ( G 1 ) 1 γ m ( G 2 ) 2 for L . Perfe ct simulation using dominate d c oupling fr om the p ast 11 P S f r a g r e p l a c e m e n t s x x Y − T ( u ) ⊕ G Y − T ( u ) ⊕ G ( x ⊕ G ) \ ( Y − T ( u ) ⊕ G ) ( x ⊕ G ) \ ( Y − T ( u ) ⊕ G ) Figure 3.2 A nother lo ok at Figure 3.1 with some shading added to sho w the pro cess of simulation. Dark shading shows Y − T ( u ) ⊕ G where Y − T ( u ) is th e state of either U or L immediately b efore we add th e new even t and G could b e either G 1 or G 2 . Ligh t shading shows the amount added if we accept th e new even t. I n the configuration on th e left, x ⊕ G = ( x ⊕ G ) \ ( Y − T ( u ) ⊕ G ), so that the attractiv e term in (3.4) or ( 3.5 ) will b e v ery small, w hereas the repulsiv e term will b e large. In the configuration on t h e righ t w e are adding ve ry little area to ( Y − T ( u ) ⊕ G ) by adding the event, so th e attractive term will be larger and th e repulsive term will b e smaller. As U and L evolve to wards time 0, we accept p oints x in U with probability γ − m (( x ⊕ G 1 ) \ U − T ( u ) ⊕ G 1 ) 1 γ m ( G 2 ) − m (( x ⊕ G 2 ) \ L − T ( u ) ⊕ G 2 ) 2 (3.4) and accept ev ents in L whenever P ( x ) ≤ γ − m (( x ⊕ G 1 ) \ L − T ( u ) ⊕ G 1 ) 1 γ m ( G 2 ) − m (( x ⊕ G 2 ) \ U − T ( u ) ⊕ G 2 ) 2 . (3.5) Figure 3.2 giv es examples of the construction ( x ⊕ G ) \ Y − T ( u ) ⊕ G . 3.4 Redwoo d seedlings data W e take a brief lo ok at a da ta set which has been mu ch analysed in the liter ature, the Redwoo d seedlings da ta first considered by Strauss (1975). W e examine a subs e t of the or iginal data chosen by Ripley (1977) and later analysed by Diggle (1978) among others . T he data a re plotted in Figure 3.3. W e wish to mode l these data using the multiscale mo del we hav e introduced. The rig h t pane of Figure 3 .3 gives the estimated p oint pro cess L-function 2 of the data, defined b y L ( t ) = p π − 1 K ( t ) w her e K is the K-function a s defined b y Ripley (1976, 1977). 2 There is no connection b et ween the p oint pro cess L-f unction and the use of the notation L elsewhere in this paper for the low er process in the CFTP algorithm; the clash of notation is an unfortunate result of the standard use of L in both con texts. Nor does either use of L refer to a likelihoo d. 12 G. K. Ambler and B. W. Silverman 0.0 0.1 0.2 0.3 0.4 0.5 −0.01 0.01 0.03 0.05 PSfrag replaemen ts r L ( r ) r Figure 3.3 Redwoo d seedlings data. Left: The data, selected by Ripley (1977) from a larger d ata set analysed by Strauss (1975). Right: Plot of the p oint-process L-function for the redwoo d seed- lings. There seems to b e interactio n at 3 different scales: (very) small scale repu lsion follo wed b y attraction at a mo derate sca le and th en repulsion at larg er scales. F rom this plot we estimate v alues of R 1 and R 2 as 0 . 07 and 0 . 01 3 resp ectively , g iving repulsion at sma ll scales and attra ction at mo dera te scales. It also s eems that there is some r epulsion a t slightly larger scales, so it may b e p o s sible to use R 2 = 0 . 2 and to mo del the larg e-scale int er action rather than the small- scale in terac tion as we ha ve c hose n. Exp erimenting with v ar ious v alues for the remaining parameters, we chose v alues γ 1 = 2 0 00 and γ 2 = 1 0 − 200 . The v alue λ = 0 . 118 was chosen to give about 62 points in eac h r e alisation, the num b er in the o bserved data set. The remark ably small v alue of γ 2 was necessary beca use the v alue of R 2 was also very sma ll. It is clear from these num b ers that it would be mor e natural to define γ 1 and γ 2 on a logar ithmic scale. Figure 3.4 shows po int pr o cess L- and T - function plots for 19 simu la tions from this mo del, providing approximate 95% Monte-Carlo confidence env elop es for the v alues of the functions. It can b e se e n that on the basis o f these functions, the model appea rs to fit the data reasonably well. The T - function, defined b y Schladitz and Ba ddeley (2 000), is a third or der analo gue of the K-function, a nd for a Poisson pro cess T ( r ) is prop ortiona l to r 4 ; in Fig ur e 3.4 the function is transformed b y taking the fourth ro ot of a suitable multiple and then subtracting r , in or der Perfe ct simulation using dominate d c oupling fr om the p ast 13 0.0 0.1 0.2 0.3 0.4 0.5 −0.02 0.02 0.06 0.10 PSfrag replaemen ts r L ( r ) r 0.00 0.10 0.20 0.30 −0.05 0.00 0.05 0.10 0.15 PSfrag replaemen ts r 4 r 2 b T ( r ) ( 3 4 p 3) r Figure 3.4 Po int pro cess L- and transformed T-function p lots of the redw o o d seedlings data. Lef t: L-fun ction plots of the data together with simulatio n s of the multiscale mo del with parameters R 1 = 0 . 07, R 2 = 0 . 013, λ = 0 . 118 , γ 1 = 2 000 and γ 2 = 1 0 − 200 . Dotted lines giv e an envelope of 19 sim ulations of the mo del, the solid line is the redw o o d seedlings data and the dashed line is the a verage of the 19 simulations. Right: the co rresp onding plots for t he transformed T-function. to yield a function whose theor etical v alue f o r a Poisson pro cess would be z e ro. The plots show sev er al things: firstly that th e mo del fits reasonably well, but that it is p ossible that we chose a v alue of R 1 which was slightly to o la r ge. Perhaps R 1 = 0 . 0 6 would hav e b een b etter. Secondly , it seems that the la rge-sc a le repulsion may b e an impor tant factor which should not b e ignore d. Thir dly , in this case we hav e gained little ne w infor mation by plo tting the T-function—the third-order b ehaviour of the data seems to be similar in nature to the second-or der structure. 3.5 F urther comments The main adv antage of our method for the perfect simulation of lo ca lly stable p oint pro cesses is that it a llows acceptance probabilities to b e computed in O ( n ) instead of O (2 n ) steps for models which ar e neither purely attractive nor purely repulsive. Beca use of t he ex po nent ia l de- pendenc e on n , the alg orithm of Kendall and Møller (2 0 00) is not f e a s- ible in these situations. It is clear that in pr a ctice it is p ossible to ex tend the work to more 14 G. K. Ambler and B. W. Silverman general m ultisca le mo dels . F or example, the sample L -function of the redwoo d seedlings might, if the sa mple size were larger , indicate the appropria teness of a three-sca le mo del p ( X ) = αλ N ( X ) γ − m ( X ⊕ G 1 ) 1 γ − m ( X ⊕ G 2 ) 2 γ − m ( X ⊕ G 3 ) 3 . (3.6) The proo f giv en in the App endix o f Am bler and Silverman (2004 b) can easily be extended to show the existence of this pro ces s, and (3.6) is also amenable to p erfect simulation using the metho d of Section 2.3. Because of the small size o f the redwoo d seedlings data set a model of this complexity is not warranted, but the fitting o f such mo dels, and even higher o rder m ultiscale models in appropriate circumsta nces, would b e an in teres ting topic for future res earch. Another topic is the p os sibility of fitting para meter s by a more system- atic appr o ach than the sub jective adjustment appro ach we hav e used. Am bler and Silverman (2004b) set out the p ossibility of using pseudo- likelihoo d (Besag, 1974, 1975, 197 7; J ensen and Mø ller, 199 1) to estim- ate the pa rameters λ , γ 1 and γ 2 for given R 1 and R 2 . How ever, this metho d has y et to b e implemented and inv estiga ted in practice. 4 Nonparametric regression b y wa v elet thresholding 4.1 In tro ductory remarks W e no w turn to our next theme, nonparametric regress ion. Supp ose we observe y i = g ( t i ) + ε i . (4.1) where g is an unkno wn function sampled with erro r at regular ly spaced int er v als t i . The noise, ε i is assumed to b e indep endent and Norma lly distributed with zero mean and v ar iance σ 2 . The standa rd wav elet-based approa ch to this pr o blem is based on t wo prop erties of the w avelet transform: 1. A lar g e class of ‘well-be haved’ functions can be sparsely represented in w av ele t space; 2. The wa velet transform maps indep endent iden tica lly distributed noise to independent iden tica lly distributed w avelet co efficient s . These tw o proper ties combine to suggest that a go o d way to remov e noise from a signal is to transform the sig nal in to wa velet space, dis - card all o f the s mall co efficie nts (i.e. threshold), and p er form the inv e r se Perfe ct simulation using dominate d c oupling fr om the p ast 15 transform. Since the true (noiseless) signal had a sparse representation in wa velet space, the signa l will es sentially be co ncent r ated in a small nu mber of large co efficients. The noise, on the other hand, w ill still be spread ev enly among the co efficients, so by dis carding the small co effi- cients w e m ust ha ve discarded mostly noise and will th us ha ve found a better estimate o f the true signal. The pr oblem then arises of how to c ho o se the thresho ld v a lue. General metho ds tha t ha ve b een applied in the wa velet context are SureShrink (Donoho and Johnstone, 19 95), cross-v alidation (Nas o n, 1996) and false discov ery rates (Abramovich a nd Benjamini, 1996). In the Bay esThresh approach (Abramovich et al., 1998) prop os e s a Bay esian hierarchical mo del fo r the w av elet co efficien ts, using a mixture o f a po in t mass at 0 and a N (0 , τ 2 ) density as their prior. The marginal po sterior media n of t he po pulation w av elet coefficient is then used as the estimate. This gives a thresholding rule, s ince the p oint mass at 0 in the prior gives non-zero probability that the popula tion w avelet co efficient will b e zero. Most Bayesian approaches to w av elet thresho lding mo del th e coeffi- cients indep e nden tly . In or der to capture the notion that nonzero wa velet co efficients ma y be in some wa y clus tered, w e allo w prior dependency betw e e n the co efficients b y mo delling them using an extension o f the area-interaction pro cess as defined in Sectio n 3.1 a b ove. The basic idea is that if a co efficient is nonzero then it is mor e lik ely that its neig h b o ur s (in a suitable sense) ar e a lso no n-zero. W e then use a n a ppropriate CFTP approach to sample from the poster ior distribution of our mo del. 4.2 A Ba y esian mo del for wa v elet thresholding Abramovic h et al. (1998) consider the problem where the true wav elet co efficients a re observed sub ject to Gaussian noise with ze ro mean and some v ariance σ 2 , b d j k | d j k ∼ N ( d j k , σ 2 ) , where b d j k is the v alue o f the noisy wa velet co efficient (the data ) and d j k is the v alue of the true (noiseless) co efficient. Their prior distribution on the true wav elet coefficients is a mixtur e of a Normal distribution with zero mean a nd v ariance dep endent on the level of the co efficient, and a po in t mas s at zer o as follows: d j k ∼ π j N (0 , τ 2 j ) + (1 − π j ) δ (0) , (4.2) where d j k is the v alue of the k th co efficient at level j of the discrete 16 G. K. Ambler and B. W. Silverman wa velet tr a nsform, and the mixture weigh ts { π j } ar e constant within each level. An alterna tive for mulation of this ca n be o btained by in tro - ducing auxiliary v ariables Z = { ζ j k } with ζ j k ∈ { 0 , 1 } a nd independent hyperprior s ζ j k ∼ Bernoulli( π j ) . (4.3) The prior given in equation (4.2) is then e xpressed as d j k | Z ∼ N (0 , ζ j k τ 2 j ) . (4.4) The s tarting p o int for our extension o f this approa ch is to no te that Z can b e consider ed to b e a p oint pro cess o n the discrete spa ce, or lattice, χ of indices ( j, k ) of the wav elet coefficients. The p oints of Z give the lo cations at which the prio r v ar iance of the w avelet co efficient , conditional on Z , is nonzero. F rom this point of view, the hyperpr ior structure g iven in equation (4 .3) is equiv a lent to sp ecifying Z to b e a Binomial pro cess with rate function p ( j, k ) = π j . Our ge ne r al approa ch will b e to repla ce Z by a more g eneral lat- tice pro cess ξ on χ . W e allow ξ to have multiple po int s at par ticula r lo cations ( j, k ), so that the num be r ξ j k of points a t ( j, k ) will be a non- negative integer, not ne c e ssarily confined to { 0 , 1 } . W e will assume that the prior v a riance is propo rtional to the num b er of points of ξ falling at the co rresp onding la ttice lo ca tion. So if ther e a re no po int s , the prior will b e concentrated at z e ro and the corresp onding observed wav elet will be treated as pure noise; on the other hand, the la rger the n um b er of po int s , the larger the prior v aria nce and the less shr ink ag e applied to the obser ved co efficient. T o allow for this gene r alisation, we extend (4.4) in the na tural w ay to d j k | ξ ∼ N (0 , τ 2 ξ j k ) , (4.5) where τ 2 is a constant. W e now cons ider the sp ecification of the pro cess ξ . While it is reas- onable that the wa velet trans fo rm will pro duce a spar se r epresentation, the time-fre q uency lo calis ation prop erties of the tr a nsform also make it natural to expect that the representation will b e clustered in some sense. The existence of t his clustered structure can be se en clearly in Figure 4 .1, whic h sho ws the discr e te wav elet tra nsform of several co m- mon test functions r epresented in the natural binar y tree configuration. With this clustering in mind, we mo del ξ a s an area-interaction pro cess on the space χ . The c hoice of the neighbourho o ds B ( x ) for x in χ will be discussed b elow. Given the choice of neighbourho o ds, the pr o cess will Perfe ct simulation using dominate d c oupling fr om the p ast 17 Figure 4.1 Examples of the discrete wa velet transform o f some test functions. There is clear evidence of clustering in most of the graphs. The original functions are sho wn above their discrete wa velet trans- form eac h time. be defined by p ( ξ ) = αλ N ( ξ ) γ − m { U ( ξ ) } (4.6) where p ( ξ ) is the intensit y r elative to the unit rate indep endent auto- 18 G. K. Ambler and B. W. Silverman Poisson pro cess (Cressie, 1993). If w e take γ > 1 this giv es a cluster ed configuratio n. Thus we would e xpe ct to see clusters of larg e v alues of d j k if this were a r easonable mo del—which is exactly wha t we do see in Figure 4.1. A s imple application of Bay es’s theorem tells us that the p os terior for our model is p ( ξ , d | b d ) = p ( ξ ) Y j,k p ( d j k | ξ j k ) Y j,k p ( b d j k | d j k , ξ j k ) = αλ N ( ξ ) γ − m { U ( ξ ) } Y j,k exp( − d 2 j k / 2 τ 2 ξ j k ) p 2 π τ 2 ξ j k Y j,k exp {− ( b d j k − d j k ) 2 / 2 σ 2 } √ 2 π σ 2 = αλ N ( ξ ) γ − m { U ( ξ ) } Y j,k exp {− d 2 j k / 2 τ 2 ξ j k − ( b d j k − d j k ) 2 / 2 σ 2 } p 2 π τ 2 ξ j k √ 2 π σ 2 . (4.7 ) Clearly (4.7) is not a standard density . In Sectio n 5.1 we show ho w the extension of the coupling-from-the-past a lgorithm describ ed in Sec- tion 2.3 enables us to sa mple from it. 4.3 Completing the sp e cification W e first note that in this co nt ex t χ is a discrete spac e, so the technical conditions requir ed in Section 3.1 of m ( · ) and B ( · ) are tr ivially satisfie d. In order to co mplete the sp ecification of our a rea-interaction prior for ξ , w e need a suitable in terpretation o f the neigh b ourho o d o f a lo catio n x = ( j, k ) on the lattice χ of indices ( j, k ) of w avelet co efficients. This lattice is a binary tr ee, and there are m a ny p ossibilities. W e decided to use the pa rent, the co efficient on the pare nt’s level of the transfo r m which is next-nea rest to x , the t wo adjacent coefficients on the level of x , the t wo children and the co efficients a djacent to them, making a total of nine co efficients (including x itself ). Figure 4.2 illustr a tes this s cheme, which captures the localis ation of b oth time and frequency effects. Figure 4.2 also shows how we dealt with b o undaries: w e assume that the s ig nal we are examining is p erio dic, making it natural to hav e p erio dic b oundary conditions in time. If B ( x ) ov erla ps with a frequency b oundar y we simply discard those parts which hav e no lo c ations a sso ciated with them. The simple counting mea sure used has m { B ( x ) } = 9 unless x is in the b ottom row or one of the to p t wo rows. Other p oss ible neighbour ho od functions include using only the par e n t, children and immediate sibling and cousin of a co efficient as B ( x ), or a v ariation o n this taking into account the length of supp or t of the wav elet Perfe ct simulation using dominate d c oupling fr om the p ast 19 Figure 4.2 The four plots give examples of what w e u sed as B ( · ) for four different example locations sho wing ho w w e d ealt with b ound - aries. Grey b oxes are B ( x ) \ { x } for each example location x , while x itself is sho wn as black. used. Though we hav e chosen to use p erio dic b oundary c onditions, o ur metho d is equa lly applicable without this assumption, w ith appropr iate mo dification of B ( x ). 5 P erfect sim ulation for wa v elet curv e estimation 5.1 Exact p osterior sampling for lattice pro cesses In this section, w e develop a practical approach to simulation fr om a close approximation to the p oster ior density (4.7 ), making use of coup- ling from the past. One of the adv an ta ges of the Normal mo del w e pro- po se in Section 4.2 is that it is p ossible to integrate out d j k and work only with the lattice pr o cess ξ . Performing this calculation, we see that equation (4.7) can be rewr itten a s p ( ξ | b d ) = p ( ξ ) Y j,k exp n − b d 2 j k / 2( σ 2 + τ 2 ξ j k ) o p 2 π ( σ 2 + τ 2 ξ j k ) , by the standa rd co nv olution pro p er ties of normal densities. W e now see that it is possible to sample from the poster ior b y simulating o nly the pro cess ξ and ignoring the marks d . This lattice pro cess is amenable to per fect sim ulation using the metho d of Section 3.3 ab ov e. Let f 1 ( ξ ) = λ N ( ξ ) , f 2 ( ξ ) = γ − m { U ( ξ ) } , 20 G. K. Ambler and B. W. Silverman f 3 ( ξ ) = Y j,k exp {− b d 2 j k / 2 ( σ 2 + τ 2 ξ j k ) } and f 4 ( ξ ) = Y j,k 2 π ( σ 2 + τ 2 ξ j k ) − 1 / 2 . Then λ f 1 ( u ; ξ ) = λ, λ f 2 ( u ; ξ ) = γ − m { B ( u ) \ U ( ξ ) } ≤ 1 , λ f 3 ( u ; ξ ) = exp ( b d 2 u τ 2 2( σ 2 + τ 2 ξ u ) { σ 2 + τ 2 ( ξ u + 1) } ) ≤ exp ( b d 2 u τ 2 2 σ 2 ( τ 2 + σ 2 ) ) and λ f 4 ( u ; ξ ) = τ 2 ξ u + σ 2 τ 2 ( ξ u + 1) + σ 2 1 / 2 ≤ 1 . By a slight abuse o f no ta tion, in the se c ond and third equations ab ov e we use u to refer both to the p oint { u } a nd the lo cation ( j, k ) at which it is found. The functions f 1 , . . . , f 4 are also monoto ne with resp ect to the subs e t re lation, so all of the conditions for exact sim ulation using the metho d of Section 2.3 are satisfied. In the spatial process es considered in detail in Section 3.3, the dom- inating pr o cess had constant in tensity a cross the space χ . In the prese nt context, how ever, it is neces sary in practice to use a do mina ting pro cess which ha s a differ e n t rate at each la ttice lo catio n, and then use lo cation- sp ecific m a xima and minima rather than g lobal maxima and minima. Because we can no w us e lo cation-sp ecific, rather than globa l, maxima and minima, we can initialise upp er and lower pro cesses that are muc h closer together than would hav e be e n p ossible with a constant-rate dom- inating pro cess . This has the consequence of reducing coa lescence times to feasible levels. A constant-rate dominating process would not ha ve bee n feas ible due to the siz e o f the globa l maxima, so this mo difica- tion to the metho d of Section 3 .3 is essential; see Section 5 .3 for deta ils . Chapter 5 o f Ambler (2002) giv es some other examples of dominating pro cesses with loc ation-sp ecific in tensities. The loca tion-sp ecific rate of the do mina ting proc ess D is λ dom j k = λe b d 2 jk τ 2 / 2 σ 2 ( τ 2 + σ 2 ) (5.1) for each lo ca tion ( j, k ) on the lattice. The lower pro cess is then star ted Perfe ct simulation using dominate d c oupling fr om the p ast 21 as a thinn e d version of D . Poin ts are accepted with probability P ( x ) = γ − M ( χ ) σ 2 τ 2 + σ 2 1 / 2 × exp ( − b d 2 x τ 2 2 σ 2 ( τ 2 + σ 2 ) ) , where M ( χ ) = max χ [ m { B ( x ) } ]. The upper and lower pro ces ses ar e then evolv ed through time, acce pting p oints as descr ibed in Section 2.3 with probability 1 λ dom j k λ f 1 ( u ; ξ up ) λ f 2 ( u ; ξ up ) λ f 3 ( u ; ξ low ) λ f 4 ( u ; ξ up ) for the upper pro cess and 1 λ dom j k λ f 1 ( u ; ξ low ) λ f 2 ( u ; ξ low ) λ f 3 ( u ; ξ up ) λ f 4 ( u ; ξ low ) for the lower pr o cess. The r e mainder o f the a lgorithm carries over in the obvious wa y . There a re still some is sues to b e a ddressed due to very high birth rates in the dominating pro cess, and this will be done in Section 5.3. 5.2 Using the generated samples Although d w as in teg rated out for sim ulation reasons in Section 4.2 it is, natura lly , the quantit y of in teres t. Having simulated r e alisations of ξ | b d we then generate d | ξ , b d fo r each rea lisation ξ generated in the first step. The sample median o f d | ξ , b d gives an estimate for d . The median is used instea d of the mean as this gives a thresholding rule, defined by Abramovic h et al. (1 998) as a r ule giving p ( d j k = 0 | b d ) > 0 . W e ca lculate p ( d | ξ , b d ) using logarithms for eas e of notation. Assuming that ξ j k 6 = 0 we find log p ( d j k | b d j k , ξ j k 6 = 0 ) = log p ( d j k | ξ j k 6 = 0 ) + lo g p ( b d j k | d j k , ξ j k 6 = 0 ) + C = − d 2 j k 2 τ 2 ξ j k + − ( b d j k − d j k ) 2 2 σ 2 + C 1 = − ( σ 2 + τ 2 ξ j k ) d j k − τ 2 ξ jk b d jk σ 2 + τ 2 ξ jk 2 2 σ 2 τ 2 ξ j k + C 2 where C , C 1 and C 2 are constants. Thus d j k | b d j k , ξ j k 6 = 0 ∼ N τ 2 ξ j k b d j k σ 2 + τ 2 ξ j k , σ 2 τ 2 ξ j k σ 2 + τ 2 ξ j k ! . 22 G. K. Ambler and B. W. Silverman When ξ j k = 0 we clea rly have p ( d j k | ξ j k , b d j k ) = 0. 5.3 Dealing with large and small ra tes W e now deal with some approximations whic h are necessa ry to allo w our alg o rithm to b e feasible computationally. Recall from e q uation (5 .1 ) that if the maximum data v alue d j k is tw ent y times larger in magnitude than the standard devia tion of the noise (a no t uncommon even t for reasona ble noise levels) then we ha ve λ dom = λe 400 σ 2 τ 2 / 2 σ 2 ( τ 2 + σ 2 ) = λe 200 τ 2 / ( τ 2 + σ 2 ) . Now unles s τ is sig nificantly smaller than σ , this will result in enormous birth rates, which make it necessary to mo dify the algorithm appro pri- ately . T o addre ss this issue, we noted that the chances of there being no live points at a loca tion whos e data v alue is la rge (resulting in a v alue of λ dom larger than e 4 ) is sufficien tly small that for the purp oses of calcu- lating λ f 2 ( u ; ξ ) for near b y lo cations it can b e assumed that the num b er of points alive w a s strictly positive. This means that w e do not know the true v alue of ξ j k for the lo c ations with the larges t v alues of d j k . This leads to problems since w e need to generate d j k from the distribution d j k | ξ j k , b d j k ∼ N τ 2 ξ j k b d j k σ 2 + τ 2 ξ j k , σ 2 τ 2 ξ j k σ 2 + τ 2 ξ j k ! , which requires v alues of ξ j k for ea ch loc a tion ( j, k ) in the co nfiguration. T o deal with this issue, we first note tha t, a s ξ j k → ∞ , τ 2 ξ j k b d j k σ 2 + τ 2 ξ j k − → b d j k monotonically from below, and τ 2 ξ j k σ 2 σ 2 + τ 2 ξ j k − → σ 2 , also monotonically from b elow. Since σ is t ypically sma ll, conv erge nce is very fas t indeed. T aking τ = σ as an example we s ee that even when ξ j k = 5 we have τ 2 ξ j k b d j k σ 2 + τ 2 ξ j k = 5 6 b d j k Perfe ct simulation using dominate d c oupling fr om the p ast 23 and τ 2 ξ j k σ 2 σ 2 + τ 2 ξ j k = 5 6 σ 2 . W e see that w e are alrea dy within 1 6 of the limit. Con vergence is ev en faster for larger v alues of τ . W e also recall that the dominating pro ce s s gives an upper bound f o r the v alue of ξ j k at ev ery loca tion. Th us a goo d estimate for d j k would be gained by taking the v alue of ξ j k in the dominating proc e ss for those po int s where we do not know the exa ct v alue. This is a go o d solution but is unnecess ary in some cases, as sometimes the v alue of λ dom is so large that there is little adv antage in using this v alue. Thus for exceptionally large v alues of λ dom we simply use N ( b d j k , σ 2 ) num b ers as o ur estimate of d j k . 5.4 Sim ulation study W e now prese nt a simulation study of the p erforma nce of o ur estima tor relative to several establishe d w av elet- ba sed estimators. Similar to the study of Abramovic h et al. (19 98), we inv estigate the p erformance of our metho d on the four standard test functions o f Donoho a nd Johnstone (1994, 1995), na mely ‘Blo cks’, ‘Bumps’, ‘Doppler ’ and ‘Heavisine’. These test functions are used beca use they exhibit different kinds o f b ehaviour t y pic a l of sig nals arising in a v ar iet y of applications. The tes t functions were simulated at 256 p o ints eq ually spaced o n the unit interv al. The test signa ls were centred and scaled s o a s to hav e mean v alue 0 and s ta ndard devia tion 1 . W e then added indep endent N (0 , σ 2 ) noise to ea ch of the functions, where σ w as taken a s 1 / 1 0, 1 / 7 and 1 / 3. The noise levels then corres po nd to ro ot signal- to -noise ratios (RSNR) o f 10, 7 and 3 resp ectively . W e p er formed 2 5 replicatio ns . F or our metho d, w e simulated 25 indep endent dr aws fr o m the po sterior distribution of the d j k and used the sample median as o ur estimate, as this gives a thresholding rule. F or ea ch of the runs, σ was set to the standard devia tion of the noise we a dded, τ was set to 1 . 0, λ was s et to 0 . 05 and γ was set to 3 . 0. The v alues of parameter s σ and τ were set to the true v a lues of the standard devia tio n of the no is e and the signal, resp ectively . In pr a ctice it will b e necessar y to develop some metho d for estimating these v alues. The v alue of λ was chosen to b e 0 . 05 b e cause it w a s felt tha t no t ma n y 24 G. K. Ambler and B. W. Silverman T able 5.1 Aver age me an-squar e err ors ( × 10 4 ) for the ar e a-inter action BayesThr esh (AIBT), Sure S hrink (SS), cr oss-validatio n ( CV) , or dinary BayesThr esh (BT) and false disc overy r ate (FDR) estimators fo r four test functions f or thr e e values of the r o ot signal-to-noise r atio. A ver ages ar e b ase d on 25 r eplic ates. Standar d err ors ar e given in p ar entheses. RSNR Method T est fun ctions Blocks Bumps Doppler Heavisine AIBT 25 (1) 84 (2) 49 (1) 32 (1) SS 49 (2) 131 (6) 54 (2) 66 (2) 10 CV 55 (2) 392 (21) 112 (5) 31 ( 1) BT 3 44 (10) 1651 (17) 167 (5) 35 (2) FDR 159 (14) 449 (17) 145 (5) 64 (3) AIBT 56 (3) 185 ( 5) 87 (3) 52 (2) SS 98 (3) 253 ( 10) 99 (4) 94 (4) 7 CV 96 (3) 441 (25) 135 (6) 54 ( 3) BT 4 14 (11) 1716 (21) 225 (6) 57 (2) FDR 294 (18) 758 (27) 253 (9) 93 (4) AIBT 535 (21) 1023 (15) 448 (18) 153 (6) SS 482 (13) 973 (45) 399 (14) 147 (3) 3 CV 452 (11) 914 (34) 375 (13) 148 (6) BT 8 60 (24) 2015 (37) 448 ( 12) 140 (4) FDR 1230 (52) 2324 (88) 862 (31) 148 ( 3) of the co efficients would b e significant. The v alue of γ w as c hose n based on small trials for the heavisine and jumpsine da tasets. W e co mpare o ur metho d with several es tablished wa velet-based estim- ators for reconstructing no isy sig nals: SureShrink (Donoho and Johnstone, 1994), tw o- fold cro s s-v alidation as applied by Nason (1 9 96), o rdinary Bay esThre s h (Abramo vich et al., 1998), and the f a lse discovery rate as applied b y Abramovich a nd Benjamini (1996). F or tes t signals ‘Bumps’, ‘Doppler’ and ‘Hea v isine’ w e used Daub e- chies’ least asymmetric wa velet o f order 10 (Daubechies, 1992). F or the ‘Blo cks’ signal we used the Haar wa velet, as the origina l sig nal was piecewise co nstant. The analysis was carried out using the freely a v ail- able R statistical pack age. The W av eThres h pac k age (Nason, 199 3) was used to per form the discrete wa velet transfor m and als o to compute the SureShrink, cross-v alidation, BayesThresh and false discov e r y rate estimators. Perfe ct simulation using dominate d c oupling fr om the p ast 25 The go o dness of fit of each estimator w a s measured by its av e r age mean-squar e error (AMSE) o ver the 25 replications. T able 5.1 presents the r esults. It is clear that our estimator performs extremely well with resp ect to the other estimators when the signal- to-noise ratio is mo der ate or larg e, but less w ell, thoug h still co mpetitively , when there is a small signal-to- noise ratio . 5.5 Remarks and directions for future w ork Our pro cedure for Bayesian w av elet thresholding has used the natur- ally c lus tered na tur e of the w avelet transform when deciding how m uch weigh t to give co efficient v a lue s . In co mparisons with other metho ds, our a pproach perfor med very well for moder ate and low nois e levels, and reasonably c omp e titiv ely f o r higher noise lev els . One p ossible area fo r future work would b e to replace equation (4.5 ) with d j k | ξ ∼ N (0 , τ 2 ( ξ j k ) z ) , where z would be a further para meter. This w ould modify the num b er of p oints which are likely to b e a live at any g iven lo ca tion and thus also mo dify the tail behaviour of the prior. The idea b ehind this suggestion is that when we know that the behaviour of the data is either heavy o r light tailed, w e could adjust z to compensa te. This could p oss ibly also help sp eed up conv erg ence by reducing the n umber of po ints at lo cations with large v a lues of d j k . A seco nd p ossible a rea fo r future work would b e to develop some automatic metho ds for choosing the para meter v alues, p erhaps using the metho d of maximum pseudo-likelihoo d (Besag, 1 974, 1975, 1977). Finally , it would b e o f obvious interest to find an approach which made the appr oximations of Section 5.3 unnecessar y and allowed for true CFTP to be preser ved. 6 Conclusion This paper , based on Am bler and Silverman (200 4a,b), has drawn to - gether a num b er of themes which demonstrate the wa y that mo dern computational statistics has made use of work in applied probability and sto chastic processes in ways which would ha ve been inconceiv able 26 G. K. Ambler and B. W. Silverman not m a ny decades ago. It is therefo r e a pa r ticular ple a sure to dedicate it to John K ingman on his birthda y ! References Abramovic h, F., and Benjamini, Y. 1996. Ad ap t ive thresholding of w av elet coefficients. Com put. Statist. Data A nal. , 22 , 351–361. Abramovic h, F., Sapatinas, T., and Silverman, B. W. 1998. W av elet t hreshold- ing via a Bay esian approach. J. R oy. Statist. So c. Ser. B , 60 , 725–749 . Ambler, G. K. 2002. Dominate d Coupling fr om the Past and Some Extensions of the Ar e a-Inter action Pr o c ess . Ph.D. thesis, D epartment of Mathemat- ics, Universit y of Bris t ol. Ambler, G. K., and Silv erman, B. W. 2004a. Perfe ct Simula- tion for Bayesian Wavelet Thr esholding with Corr elate d Co effi- cients . T ech. rept. Department of Mathematics, Universit y of Bristol. http://arX iv.org/abs/0903.26 54v 1 [s t at.ME]. Ambler, G. K., and Silverman, B. W . 2004b. Perfe ct Si mulation of Sp atial Point Pr o c esses using Dominate d Coupli ng fr om the Past wi th Appli c ation to a Multisc ale A r e a-Inter action Point Pr o- c ess . T ech. rept. Department of Mathematics, Un ivers ity of Bristol. http://arX iv.org/abs/0903.26 51v 1 [s t at.ME]. Baddeley , A. J., and Lieshout, M. N . M. v an. 1995. A rea-interac t ion point processes. A nn. I nst. Statist. M ath. , 47 , 601–619 . Baddeley , A . J., T urner, R ., Møller, J., and Hazelton, M. 2005. Residual analysis for spatial p oint pro cesses (with Discussion). J. R oy. Stat i st. So c. Ser. B , 67 , 617–666. Besag, J. E. 1974. Spatial interactio n and the statistical analysis of l attice systems. J. R oy. Statist. So c. Ser. B , 36 , 192–236. Besag, J . E. 197 5. Statistical analysis of non-lattice data. The St atistician , 24 , 179–195. Besag, J. E. 1977. Some metho ds of statistical analysis for spatial data. Bul l . Int. Statist . Inst. , 47 , 77–92. Connor, S. 2007. Perfect sampling. In: Ruggeri, F., Ken ett, R ., and F altin, F. (eds), Encyclop e dia of Statistics i n Quality and R eliabili ty . New Y ork: John Wiley & Sons. Cressie, N. A. C. 1993. Statistics for Sp atial Data . New Y ork : John W iley & Sons. Daub echies, I . 1992. T en L e ctur es on Wavelets . Philadelphia, P A: SIAM. Diggle, P . J . 1978. O n para meter estimation for spatial point p rocesses. J. R oy. Statist . So c. Ser. B , 40 , 178–18 1. Donoho, D. L., and Johnstone, I. M. 1994. Ideal spatial adaption by w av elet shrink age. Bi ometrika , 81 , 425–455. Donoho, D . L., and Johnstone, I. M. 1995. Adapting to unknown smo othness via w ave let shrink age. J . Amer. Statist. A sso c. , 90 , 1200–1224. Perfe ct simulation using dominate d c oupling fr om the p ast 27 Green, P . J., and Murd och, D. J. 1998. Exact sampling for Ba yesian inference: to wards general purp ose algorithms (with discussion). P ages 301–321 of: Bernardo, J. M., Berger, J. O., Dawi d , A. P ., and Smith, A. F. M. (eds), Bayesian Statist i cs 6 . Oxford: Oxford Univ. Press. H¨ aggstr¨ om, O., Lieshout, M. N. M. va n , and Møll er, J. 1999. Characterisa- tion results and Marko v chain Mon te Carlo algorithms including exact sim ulation for some spatial point pro cesses. Bernoul li , 5 , 641–658. Jensen, J. L., and Møller, J. 19 91. Pseudolikeli h oo d for expon ential family mod els of spatial p oint pro cesses. A nn. Appl. Pr ob ab. , 1 , 445–461. Kelly , F. P ., and Ripley , B. D. 1976. A note on Strauss’s mod el for clustering. Biometrika , 63 , 357–360. Kendall, W. S. 1997. O n so me w eigh t ed B o olean models. Pa ges 105– 120 of: Jeulin, D. (ed), A dvanc es in The ory and Applic ations of R andom Set s . Singap ore: W orld Scien tifi c. Kendall, W. S. 1998. Perf ect sim ulation for th e area-interactio n p oin t p ro- cess. Pages 218–2 34 of: Accardi, L., and H eyde, C. C. (eds), Pr ob ability T owar ds 2000 . New Y ork: Sprin ger-V erlag. Kendall, W. S., and Møller, J. 2000. P erfect simulatio n using dominated processes on ord ered spaces, with applications t o locally stable p oin t processes. A dv. in Appl. Pr ob ab. , 32 , 844–865. MacKa y , D. J. C. 2003. Information The ory, Infer enc e, and L e arning Al- gorithms . Cambridge: Cambridge Univ. Press. Matheron, G. 1975. R andom Sets an d Inte gr al Ge ometry . New Y ork: John Wiley & Son s. Murdo ch, D. J., and Green, P . J. 1998 . Exact sa mp ling from a contin uous state space. Sc and. J. Statist. , 25 , 483–5 02. Nason, G. P . 1993 . The WaveThresh Package: Wavelet T r ansform and Thr esholding Softwar e for S-Plus and R . Av ailable from Statlib. Nason, G. P . 1996 . W a velet shrinka ge using cross-v alidation. J. R oy. Statist. So c. Ser. B , 58 , 463–479. P apangelou, F. 1974. The conditional intensit y of general p oint processes and an app lication to line pro cesses. Z. W ahrscheinlichkeitsthe orie verw. Geb. , 28 , 207–226. Propp, J. G., and Wilson, D. B. 1996. Exact sampling with coupled Mark ov chai n s and applications to statistical mec h anics. R andom Struct ur es & Algor i thms , 9 , 223–252. Propp, J. G., and Wilson, D. B. 1998 . Ho w to get a p erfectly random sample from a generic Mark ov chai n a n d generate a random spanning tree of a directed graph. J. Algorithms , 27 , 170–217. Ripley , B. D. 1976. The second- order analysis of stationary p oint processes. J. Appl. Pr ob ab. , 13 , 255–266. Ripley , B. D. 1977. Mo delling spatial patterns (with Discussion). J. R oy. Statist. So c. Se r. B , 39 , 172–212. Schladitz, K., and Baddeley , A. J. 2000. A third order p oint pro cess charac- teristic. Sc and. J. Statist. , 27 , 657–671 . Strauss, D. J. 1975. A model for clustering. Biometrika , 62 , 467–475.
Original Paper
Loading high-quality paper...
Comments & Academic Discussion
Loading comments...
Leave a Comment