Usefulness of the Reversible Jump Markov Chain Monte Carlo Model in Regional Flood Frequency Analysis

Regional flood frequency analysis is a convenient way to reduce estimation uncertainty when few data are available at the gauging site. In this work, a model that allows a non-null probability to a regional fixed shape parameter is presented. This me…

Authors: Mathieu Ribatet (UR HHLY, INRS), Eric Sauquet (UR HHLY)

Usefulness of the Reversible Jump Markov Chain Monte Carlo Model in   Regional Flood Frequency Analysis
W A TER RESOURCES RESEARCH, V OL. 43, W08403, DOI:10.1029/ 2006WR00552 5, 2007 Usefulness of the Rev ersible Jump Mark o v Chain Mon te Carlo Mo del in Regional Flo o d F re quency Analysis M. Ribatet, 1,2 E. Sauquet, 1 JM. Gr ´ esillon, 1 and T.B.M.J. Ouarda 2 Abstract. Regional flo od frequency analys is is a con venien t wa y to reduce estimation uncertaint y when few data are av ailable a t the ga uging site. In this w ork , a model that allows a non null pro ba bility to a regional fixed shap e parameter is pr esented. This method- ology is in tegrated within a B ayesian fra mework and use s rev ersible jump tec hniques. The p erfor ma nce on sto chastic data of this new estimator is co mpared to t w o other mod- els: a con ven tional Bay esian a nalysis and the index flo o d a pproach. Results show that the pr op osed estimator is a bsolutely suited to regiona l estimation when only a few data is av ailable at the target site. Moreov er, unlike the index floo d estimator, tar g et site in- dex flo o d error estimation see ms to ha ve less impact o n Bay esian estimator s. Some sug - gestions ab out configurations of the p o oling groups ar e also pres e nt ed to increase the per formance of eac h es timator. Keywords: Reg ional F r equency Analysis, Extreme V alue Theory , Gene r alized Pareto Distribution, Rev ersible Jumps, Marko v Chain Mo nte Car lo. 1. In tro duction Extreme v alue theory is no w widely applied when mod- eling blo ck maxima or exceedences ov er a threshold is of interes t. In particular, the Generalized P areto Distribution ( GPD ) describes the limiting distribution of normalized ex- cesses of a threshold as the threshold approaches the end- p oint of the va riable [ Pickands , 197 5]. The GPD has a dis- tribution function defined by: G ( x ; µ, σ , ξ ) = 1 − » 1 + ξ ( x − µ ) σ – − 1 /ξ , x > µ, 1+ ξ ( x − µ ) σ > 0 (1) where σ > 0, ξ ∈ R . µ, σ and ξ are resp ectively the location, scale and sh ape p arameters. Thus, when extreme v alues must b e estimated, this ap- proximati on is frequently used. Most applications based on this result are related t o environmental sciences, as extreme wind sp eed [ Payer and Kuchenhoff , 2004], extreme sea level [ Bortot and Coles , 2000; Pandey et al. , 2004] or extreme river discharge [ Northr op , 200 4]. How ever, one must often d eal with small samples and large uncertain ties on estimation. Several publications p oin t out the problem of th e shap e parameter estima tion. This parameter is of great interest as it determines th e tail b e- haviour of the distribution. Therefore, many authors ana- lyzed th e p erformance of particular estimators given a sp ec- ified range for t he shap e parameter: Ros bjer g et al. [1992] for the metho d of moments; Coles and Dixon [1999] for the max im um likeli ho o d; Hosking and Wal lis [1987] for the probabilit y weigh ted moments; Ju´ ar ez and Schuc any [2004] for the minimum den sit y p ow er div ergence estimator; Mar- tins and Ste di nger [2000 ] for a prop osed generalized maxi- mum lik eliho od . H ow ever, these results p ro vide the most ac- curate estimator give n th e shap e p arameter; whic h is never 1 CEMAGREF Lyon, Unit´ e de Rec herche Hydrologie-Hydraulique, 3 bis quai Chauvea u, CP220, 69336 Ly on cedex 09, FRAN CE 2 INRS-ETE, Universit y of Qu´ ebec, 490, de la Couronne Qu´ ebec, Qc, G1K 9A9, CA N ADA. Cop yright 2018 b y the Am erican Geophysical Union. 0043-1397 /18/2006WR005 525$9.00 the case in practice. Therefore, Park [2005] introdu ced a systematic wa y of selecting hyper-parameters for his pro- p osed generalized maximum likelihood estimator. All t hese app roac hes only d eal with information from the target site sample. How ever, it is frequent in h ydrology to p erform a Regional F requency Analysis ( RF A ). T raditional RF A consists of t wo steps: (a ) delineation of homogeneous regions i.e. a p ooling group of stations with simila r b e- haviour; (b) regional estimation i.e. estimate target site distribution from the regional information. More recently , Ba yesian app roac hes have been app lied with success t o incorp orate regional informatio n in fre- quency analysis [ Coles and T awn , 1996; Northr op , 2004; Sei- dou et al. , 2006; Rib atet et al. , 2007]. Empirical Bay esian estimators hav e also b een p rop osed [ Kuczer a , 1982; M adsen and R osbjer g , 1997] . One of the adv antages of these ap- proac hes is to distinguish the at site information from t he other sites data in the estimation pro cedure. This is an im- p ortant p oint as, no m atter how high the homogeneity level ma y be, the only data which represents p erfectly the tar- get site is obviously the targe t site one. Thus, the whole information av ailable is used more efficiently . In addition, according to R ib atet et al. [2007], the Ba yesian approac h es allo w to relax the scale inv ariance prop erty required by the most app lied RF A mo d el, th at is, the index fl oo d [ Dalrym- ple , 1960]. How ever, a preliminary stu dy on sim ulated data show ed that the ap p roac h developed b y R ib atet et al. [20 07] may lead to unreliable estimates for larger return p erio ds ( T > 20 years ) when small samples are in volv ed. This p o or p erfor- mance is mainly due to the large v ariance on the shap e pa- rameter estimation. Co nsequently , for such cases, attention must b e paid to the regional estimation pro cedure for th e shap e parameter. The b asis of our new d evelo pment w as formerly prop osed by Stephenson and T awn [2004]. They u se reversible jump Mark ov c h ain Mon te Carl o techniques [ Gr e en , 1995] to at- tribute a non null probabilit y to the Gumbel case. There- fore, realizations are not supp osed to b e Gumbel d istribu t ed, but hav e a non null probabili ty to b e Gu mb el distributed. An application to extreme rainfall and sea-leve l is given. In this work, this app roach is extend ed to take in to account a regional shap e parameter, n ot only the Gum b el/Exp onential case, within a RF A framew ork. The reversible jump tech- nique allo ws t o focu s on a “likely” sh ap e parameter va lue 1 X - 2 RIBA T ET ET AL.: REVERSIBLE JUMP TECHNIQUES IN REGION A L FLOOD FREQU ENCY ANAL YSIS giv en b y the hydrological relev ance of the homogeneous re- gion. Thus, this approac h may reduce the shap e parameter v ariance estimation while relaxing the scale inv ariance prop- erty . The main ob jectiv es of this article is first to p resen t new developmen ts in the metho dology prop osed by Stephenson and T awn [2004] required for a RF A context; second to as- sess the qualit y of tw o Bay esian mo dels based on th e index floo d h yp othesis: th e regional Ba yesian mo del proposed b y Rib atet et al . [2007] ( BA Y ) and the new prop osed Bay esian approac h apply ing reversible jumps Marko w chains ( REV ). They are compared to the classical index flo o d approac h of Dalrymple [1960] ( IFL ). The assessment is developed through a stochastic generation of regional data p erformed in order to obtain realistic features of homogeneous regions. Detailing th e ind ex flo o d concept is out of the scope of this article. Estimation proced ure can be found i n Hosk ing and Wal lis [1997]. The pap er is organized as follow s. The next t wo sections concentrate on methodological a sp ects. Section 2 d escribes the Ba yesia n framew ork inclu d ing the specific Marko v Chain Mon te Carlo ( MC MC ) algorithm, required to extend the w ork by Stephenson and T awn [2004]. Section 3 presents the simple and efficient algorithm to generate stochastically hydrologi cal homogeneous regions. A sensitivity analysis is p erformed in section 4 to assess how quantile estimates and related uncertainties are influenced by the va lues of tw o parameters of the reversible jump Marko v chains. Section 5 compares th e performance of each estimator on six represen- tative case stud ies. The impact of the bias in the target site index flo o d estimation is analyzed in section 6, while sug- gestions for building efficient p o oling groups are presented in section 7 . Finally , some conclusions are drawn in section 8. 2. Metho dolog y In th e B ay esian framew ork, the posterior distribution o f parameters m ust b e known to derive q uantile estima tes. The p osterior distribution π ( θ | x ) is giv en by the Ba yes Theo- rem [ Bayes , 1763]: π ( θ | x ) = π ( θ ) π ( x ; θ ) R Θ π ( θ ) π ( x ; θ ) dθ ∝ π ( θ ) π ( x ; θ ) (2) where θ is the v ector of parameters of the distribution t o b e fitted , Θ is the parameter space. π ( x ; θ ) is th e likeli ho o d function, x is the vector of observ ations and π ( θ ) is th e prior distribution. In this stud y , as excesses ov er a high t h reshold are of in- terest, the likeli ho o d fun ction π ( x ; θ ) is related to the GPD - see equ ation (1). 2.1. Prior Distribution In this section, the methodology to elicit the prior dis- tribution is presented. In this study , regi onal information is used to define the prior d istribution. F u rt hermore, the prior is specific as it m ust accoun t for a fixed shape pa- rameter ξ Fix with a non null probability p ξ . Let Θ 0 b e a sub-space of the parameter space Θ of θ . More precisely , Θ 0 = { θ ∈ Θ : ξ = ξ Fix } . p ξ is a h yp er- p arameter of the prior distribution. The approac h is to co nstruct a suitable prior distribution on Θ ; then, for p ξ fixed, to mo dify this prior to account for the probability of Θ 0 . F or clarity purp oses, the prior d istribution is d efi ned in tw o step s. First, an initial prior distribution π in ( θ ) defin ed on Θ is introduced. Second, a revised prior distribution π ( θ ) is d erived from π in ( θ ) to attribu te a non null probability to the Θ 0 sub-sample. 2.1.1. Initial prior distribution As the prop osed mo del is fully parametric, the initial prior distribution π in ( θ ) is a m ultiv ariate distribution en- tirely defined by its hyper-parameters. In our case study , the initial prior d istribution corresp onds to th e one intro- duced b y R ib atet et al . [2007] . Co nsequently , the marginal prior d istributions were supp osed to b e indep en dent lognor- mal for both location and scale parameters a nd normal for the sh ape p arameter. Thus, π in ( θ ) ∝ J exp h ( θ ′ − γ ) T Σ − 1 ( θ ′ − γ ) i (3) where γ , Σ are hyper-p arameters, θ ′ = (log µ, log σ, ξ ) and J is the Jacobian of the transformation from θ ′ to θ , namely J = 1 /µσ . γ = ( γ 1 , γ 2 , γ 3 ) is the mean vector, Σ is the co v ariance matrix. A s marginal priors are supp osed to b e indep endent, Σ is a 3 × 3 diagonal matrix with diagonal elemen ts d 1 , d 2 , d 3 . Hyp er-parameters are defi ned through th e index flo o d concept, that is, all d istributions are identical up to an at- site dep endent constant. Consider all sites of a region except the target site - sa y the j -th site. A set of pseudo t arget site parameters can b e computed: ˜ µ i = C ( j ) µ ( i ) ∗ (4) ˜ σ i = C ( j ) σ ( i ) ∗ (5) ˜ ξ i = ξ ( i ) ∗ (6) for i 6 = j , where C ( j ) is the target site index floo d and µ ( i ) ∗ , σ ( i ) ∗ , ξ ( i ) ∗ are resp ectively th e location, scale and shap e at-site parameter estimates from the res caled sample - e.g . normalized by its respective in d ex flo o d estimate. U nder t he hypothesis of the index flo o d concept, pseudo- p arameters are exp ected to b e distributed as p arameters of the target site. Information from t h e target site sa mple can not b e used to elicit the prior distribution. Thus, C ( j ) in equations (4) and (5) must be estimated without use of the j -th sample site. In this case s tud y , C ( j ) is estima ted through a General- ized Linear Model ( GLM ) defin ed by: 8 < : E h log C ( j ) i = ν, ν = X β V ar h log C ( j ) i = φV ( ν ) (7) where X are basin chara cteristics ( p ossibly log tran s- formed), φ is the disp ersion parameter, V the v ariance func- tion and ν is the linear predictor. McCul lagh and Nelder [1989] give a comprehensiv e introd uction to GLM. O ther alternativ es for modeling the target site in dex flo o d can be considered such as Generalized A dditive Mo dels [ Wo o d and Aug ustin , 2 002], Neural Netw orks [ Shu and Burn , 2004 ] or Kriging [ Merz and Bl¨ oschl , 2005]. H ow ever, the v ariance of C ( j ) should b e estimated. Indeed, as C ( j ) is estimated without use of the target site data, uncertainties due t o th is estimation must b e incorp orated in the prior distribution. F rom these p seudo parameters, hyp er-parameters can b e computed: γ 1 = 1 N − 1 X i 6 = j log ˜ µ ( i ) , d 1 = 1 N − 1 X i 6 = j V ar h log ˜ µ ( i ) i (8) γ 2 = 1 N − 1 X i 6 = j log ˜ σ ( i ) , d 2 = 1 N − 1 X i 6 = j V ar h log ˜ σ ( i ) i (9) γ 3 = 1 N − 1 X i 6 = j ˜ ξ ( i ) , d 3 = 1 N − 2 X i 6 = j “ ˜ ξ ( i ) − γ 3 ” 2 (10) RIBA T ET ET AL .: REVERSIBLE JUMP TECHNIQUES IN REGIONAL FLOOD FREQUENCY ANAL YSIS X - 3 Under the indep endence assumption b etw een C ( j ) and µ ( i ) ∗ , σ ( i ) ∗ , the follow ing rela tions h old: V ar h log ˜ µ ( i ) i = V ar h log C ( j ) i + V ar h log µ ( i ) ∗ i (11) V ar h log ˜ σ ( i ) i = V ar h log C ( j ) i + V ar h log σ ( i ) ∗ i (12) The ind ep endence assumption is not to o restrictive as the target site ind ex flo o d is estimated indep endently from µ ( i ) ∗ , σ ( i ) ∗ . Note that V ar h log · ( i ) ∗ i are estima ted th anks to Fisher information and the delta method . Estimation of V ar h log C ( j ) i is a special case and d ep ends on the metho d for estimating t he at-site index flo o d. Nevertheless, it is al- w ays p ossible to carry out an estimation of this v ariance, at least through standard errors. 2.1.2. Revised prior distribution The initial prior distribution π in ( θ ) gives a null probabil- it y t o t he sub -sample Θ 0 . Thus, from this initial prior π in ( θ ), a revised prior π ( θ ) is constructed to attribute a non null probabilit y to t he Θ 0 sub-sample. A ccording to Stephenson and T awn [2004], π ( θ ) is defined as : π ( θ ) = ( (1 − p ξ ) π in ( θ ) for θ ∈ Θ \ Θ 0 p ξ π ξ Fix ( θ ) for θ ∈ Θ 0 (13) where p ξ ∈ [0 , 1] and with π ξ Fix ( θ ) = π in ( µ, σ , ξ Fix ) R µ,σ π in ( µ, σ, ξ Fix ) dµdσ (14) for θ ∈ Θ 0 . The in tegral in equation (14) can b e easily ev aluated by standard numerical in tegration metho ds. By construction, the new prior distribution π ( θ ) giv es the required p robab ility t o th e sub-space Θ 0 . Stephenson and T awn [2004] hav e already applied form u lations (13) and (14) with success for sea-lev el maxima and rainfall threshold ex- ceedences. 2.2. Poste rior Estimation As it is of ten th e case i n Bay esian analysis, the integra l in equation (2) is insolv able analytically . MCMC techniques are used to o vercome th is p roblem. Y et, due to th e duality of π ( θ ) distribution, standard Metrop olis-Hastings [ Hastings , 1970] within Gibbs [ Geman and Geman , 1984] metho ds are not sufficient. R eversi ble jump techniques [ Gr e en , 1995] are used to allo w mo ves from the tw o dimensional space Θ 0 to the t hree dimensional space Θ \ Θ 0 and v ice-versa. The classical Bay esian analysis, on Θ \ Θ 0 , is p erformed with Gibbs cycle ov er each comp onent of θ using Metrop olis- Hastings up dates, with random wa lk prop osals [ Coles and T awn , 1996]. Stephenson and T awn [2004] extended this algorithm to incorp orate the mass on th e Gumbel/Exp onential case. How ever, as our approach do es not only focus on the ξ Fix = 0 case, a new algorithm must be implemented. T o help un der- stand the algorithmic developments , some details ab out t h e classical Metropolis-Hastings algorithm and the reversi ble jump case are rep orted in Appendix A. The prop osed algorithm must deal with tw o dimensional changes : a change to Θ 0 from Θ \ Θ 0 space and vice-versa. These tw o t yp es of special mov es must b e defined cautiously . As inspired by Stephenson and T awn [2004], quantiles asso- ciated to a non exceedence probabilit y p are set to b e equal at current state θ t and p rop osal θ prop , p b eing fixed. F or a prop osal mov e to Θ \ Θ 0 from Θ 0 , i.e., ξ t = ξ Fix and a prop osal shap e ξ prop 6 = ξ Fix , the candidate mo ve is t o change θ t = ( µ t , σ t , ξ t ) to θ prop = ( µ prop , σ prop , ξ prop ) where µ prop = µ t (15a) σ prop = σ t ξ prop ( y − ξ t − 1) ξ t ( y − ξ prop − 1) (15b) ξ prop ∼ N ( ˜ ξ , s 2 ξ ) (15c) where y = 1 − p , p b eing fixed, ˜ ξ is taken to b e the mo de of the marginal d istribution for ξ when there is no mass on Θ 0 [ Stephens on and T awn , 20 04], and s ξ is the standard devi- ation selected to g ive g o o d mixing prop erties to the chain. As it i s u sually the cas e with Metropolis-Hastings up dates, this mo ve is accepted with probability min(1 , ∆) with ∆ = π ( µ prop , σ prop , ξ prop | x ) π ( µ t , σ t , ξ Fix | x ) p ξ 1 − p ξ h φ ( ξ prop ; ˜ ξ , s 2 ξ ) J ξ Fix ( ξ prop ) i − 1 (16) where φ ( · ; m, s 2 ) denotes the density function of the Nor- mal d istribution with mean m and v ariance s 2 , and J ξ Fix is the Jacobian of th e p arameter transformation for quantile matc hing, t hat is: J ξ Fix ( ξ ) = ξ Fix ξ y − ξ − 1 y − ξ Fix − 1 (17) If the mo ve is accepted, then θ t +1 = ( µ prop , σ prop , ξ prop ), else θ t +1 = θ t . F or a prop osal mov e to Θ 0 from Θ \ Θ 0 , i.e., ξ t 6 = ξ Fix and a p rop osal shap e ξ prop = ξ Fix , the prop osal is to change θ t = ( µ t , σ t , ξ t ) to θ prop = ( µ prop , σ prop , ξ prop ) where µ prop = µ t (18a) σ prop = σ t ξ prop ( y − ξ t − 1) ξ t ( y − ξ prop − 1) (18b) ξ prop = ξ Fix (18c) This mov e is accepted with probability min ( 1 , ∆) where ∆ = π ( µ prop , σ prop , ξ Fix | x ) π ( µ t , σ t , ξ t | x ) 1 − p ξ p ξ φ ( ξ t ; ˜ ξ , s 2 ξ ) J ξ Fix ( ξ t ) (19) If t he mov e is accepted, then θ t +1 = ( µ prop , σ prop , ξ prop ) else θ t +1 = θ t . Obviously , sp ecial mov es introduced in this study are not the only conceiv able ones. Oth er reversible jumps can b e ex- plored - see for example Stephenson and T awn [2004]. How - ever, for this application, the proposed mo ves seem to b e particularly well suited. I ndeed, a prelimi nary study sho ws that the lo cation parameter w as w ell estimated by a regional Ba yesian approach. Thus, a sp ecial mov e which only affects the shap e and scale parameters shou ld be consistent. 3. Generation Pro cedure In this section, the procedure implemen ted to generate stochastic homogeneous regions is describ ed. The idea con- sists in generating sample p oints in a neigh b orho o d of t he L-moment space (Mean, L-CV, L-Skewness). The genera- tion proced ure can b e su mmarized as follo ws: 1. Set the center of the neighborho o d i.e. ( l 1 ,R , τ R , τ 3 ,R ) or equiv alently parameters of the regional distribution ( µ R , σ R , ξ R ); X - 4 RIBA T ET ET AL.: REVERSIBLE JUMP TECHNIQUES IN REGION A L FLOOD FREQU ENCY ANAL YSIS R 2 Frequency 0.2 0.4 0.6 0.8 1.0 0 2000 4000 6000 8000 Figure 1. Histogram of th e co efficient of determination for the regressiv e mo del (7). Application of section 5. 2. Generate N p oin ts ( l 1 ,i , τ i , τ 3 ,i ) uniformly in the sp here B (( l 1 ,R , τ R , τ 3 ,R ); ε ); 3. Generate N index floo ds C using the scali ng model parametrization: C = αAr ea β (20) Catc hment areas are defined as realizations of a lognorma l random var iable. 4. F or eac h ( l 1 ,i , τ i , τ 3 ,i ), compu te adimensi onal parame- ters by: ξ ∗ i = 3 τ 3 ,i − 1 1 + τ 3 ,i (21a) σ ∗ i = ( ξ ∗ i − 1)( ξ ∗ i − 2) l 1 ,i τ i (21b) µ ∗ i = l 1 ,i − σ ∗ i 1 − ξ ∗ i (21c) 5. Then, compute at-site parameters from: ξ i = ξ ∗ i (22a) σ i = C i σ ∗ i (22b) µ i = C i µ ∗ i (22c) 6. Simulate samples from a GPD with parameters ( µ i , σ i , ξ i ). As a GLM is used to elicit the prior distribution, the scal- ing model (20) must b e altered to a void giving an adv antag e to t h e Bay esian approac hes o ver the index fl o o d mo del. F or this purp ose, a noise in relation (20) at step 3 is introduced. Thus, areas are altered by add ing uniform random va riables v arying in ( − 0 . 5 × A r ea, 0 . 5 × Ar ea ). This distortion is necessary to ensure t hat the regressiv e mod el is not too comp etitive and is consistent with observ a- tions. Indeed la rge deviations to the area-index flo o d rela- tionship are often encoun tered in practice. In the follo wing applications, α = 0 . 1 2 , β = 1 . 01 and A r ea ∼ LN (4 . 8 , 1). These v alues arise from a p revious stu dy on a F rench data set [ Rib atet et al. , 2007] and en sure realistic magnitudes. F or the app lication of section 5, th e coefficients of determination for the regressiv e mo del ( 7) v aries from 0.20 to 0.99, with a mean va lue of 0.89 . The h istogram of these co efficients of determination is presented in Figure 1. The radius ε in the generation algorithm is set to 0.04. This va lue is chosen to refl ect v ariabilit y met in practice while preserving a lo w disp ersion around the regional distribution. The ε v alue pri- marily impacts the prop ortions of regions satisfying H 1 < 1. F or sp ecific applications, regions with a heterogeneity statis- tic H 1 such as H 1 > 1 ma y b e discarded. 4. Sensitivit y Analysis In this section, a sensitivity analysis for the algorithm in- trod uced in section 2.2 is carried out. The primary g oal is to chec k if results are not too impacted by the choice of th e tw o user-selectable parameters p ξ and ξ Fix . F or this pu r- p ose, th e effect of b oth p ξ and ξ Fix v alues on estimates and credibilit y in t erv als i s ex amined. F or this sensitivity analy- sis, the parameters of the regional distribution is set t o be (0.64, 0.4 8, 0.26). The regions ha ve 20 sites with a sample size of 70. F or the whole sensitivit y analysis, 10 000 regions w ere generated. The target site has a sample size of 10. W e concentra te on estimates at sites with very few data, to exhibit the main differences in the most restricting config- uration. Oth er configurations w ere found to demonstrate features similar to Fig. 3 and Fig. 2. 4.1. Effect of p ξ The evolution of t h e normalized biases (expressed in p er- cent) for return levels with non exceedence p robabilities 0.75, 0.95 and 0.995 associated to sev eral p ξ v alues are de- picted in Fig. 3. Each b oxplot is obtained from at-site esti- mates comp u ted on more than 365 stochastic homogeneous regions. The case p ξ = 0 corresp onds to a classica l Bay esian approac h free from any point mass. In addition, to analyze only the effect of the parameter p ξ , ξ Fix is temp orarily fix ed to b e equ al to t he th eoretical regional shape p arameter. T able 1. Po sterior proportions (in p ercen t) of even ts { θ ∈ Θ 0 } for differen t v alues of p ξ and ξ Fix . T arget Sample Size 60. ξ Fix features p ξ v alues R Shap e D Shap e 1/8 1/6 1/4 1/3 1/2 2/3 -0.50 2e − 5 0.00 0.03 0.00 0.00 0.05 0.00 0.00 0.06 10.07 14.55 17.27 21.99 41.53 61.84 0.50 0.70 38.88 46.94 59.96 67.42 81.88 92.17 0.83 1.00 46.21 57.33 67.53 76.08 85.33 92.20 1.00 0.87 48.24 55.14 68.90 76.16 86.05 91.85 1.50 0.41 32.72 45.61 54.62 66.18 82.11 89.90 2.00 0.10 22.95 22.83 35.06 49.82 57.86 81.92 2.50 0.01 13.93 7.04 9.86 36.21 38.89 42.28 −0.2 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.5 1.0 1.5 2.0 2.5 3.0 Shape Parameter ξ Density Figure 2. Posterior marginal densit y for the shap e pa- rameter. RIBA T ET ET AL .: REVERSIBLE JUMP TECHNIQUES IN REGIONAL FLOOD FREQUENCY ANAL YSIS X - 5 0 1/8 1/6 1/4 1/3 1/2 2/3 1 −20 −10 0 10 20 30 40 Point Mass Probability p ξ NBIAS (%) Q 0.75 0 1/8 1/6 1/4 1/3 1/2 2/3 1 −20 0 20 40 60 Point Mass Probability p ξ NBIAS (%) Q 0.95 0 1/8 1/6 1/4 1/3 1/2 2/3 1 −40 −20 0 20 40 60 80 Point Mass Probability p ξ NBIAS (%) Q 0.995 Figure 3. Effect of p ξ v alue on qu antile estimation with n on exceedence probabilities 0.75, 0.95 and 0.995. S ample size 10. ξ Fix = 0 . 26. 0.5 5.0 50.0 500.0 0 50 100 150 Return Period Return Level p ξ = 0 0.5 5.0 50.0 500.0 0 50 100 150 Return Period Return Level p ξ = 1 8 0.5 5.0 50.0 500.0 0 50 100 150 Return Period Return Level p ξ = 1 6 0.5 5.0 50.0 500.0 0 50 100 150 Return Period Return Level p ξ = 1 4 0.5 5.0 50.0 500.0 0 50 100 150 Return Period Return Level p ξ = 1 3 0.5 5.0 50.0 500.0 0 50 100 150 Return Period Return Level p ξ = 1 2 0.5 5.0 50.0 500.0 0 50 100 150 Return Period Return Level p ξ = 2 3 0.5 5.0 50.0 500.0 0 50 100 150 Return Period Return Level p ξ = 1 Figure 4. Effect of p ξ v alue on 90% p osterior credibility interv al. Sample Size 10. F rom Fig. 3, th e quantil e estimates distribution seems to b e stationary , provided th at p ξ > 0. I ntroducing a p oint mass do es not impact Q 0 . 75 estimates, whereas significant reduction in median biases and scatter of estimates is no- ticeable for more extremal quantiles. Fig. 4 sho ws the p osterior distributions of return levels and 90% p osterior credibility interv als for seve ral p ξ v alues. It is clear that credibility interv als are sensitive to th e p ξ v alue. This result is consistent as more and more prop os- als in the MCMC simulation belong to Θ 0 as p ξ increases. Thus, by construction, the Mark ov c hain is less v ariable. As denoted b y Stephenson and T awn [2004], the sp ecial case p ξ = 1 is particular as uncertain ty i n the shap e parameter is not considered. I n th at case, credibility in terv als could b e falsely narro w. 4.2. Effect of ξ Fix It is imp ortant to analyze th e influence of the choice of ξ Fix on the sim u lated Marko v chains; and thus, its impact on estimations. I ndeed, when sp ecifying an unreasonable ξ Fix X - 6 RIBA T ET ET AL.: REVERSIBLE JUMP TECHNIQUES IN REGION A L FLOOD FREQU ENCY ANAL YSIS v alue, the estimations must not differ significantly from the conv enti onal Bay esian ones. F or this purp ose, T ab. 1 d is- pla ys the p osterior prop ortions of even ts { θ ∈ Θ 0 } for several ξ Fix and p ξ v alues. This table is obtained with a target site sample size of 60. F or eac h sp ecified ξ Fix v alue, tw o features are computed to measure the relev ance of the ξ Fix v alue: (a) R Shap e the ratio of ξ Fix to the true shape p arameter ; and (b) D Shap e the ratio of the marginal posterior den sit y from a con ven tional Bay esian analysis eva luated in ξ Fix and ˜ ξ . R Shap e chara cterizes how muc h th e p oint Mass differs from the true v alue. D Shap e quantifies t h e distance of ξ Fix from th e estimator of the shap e parameter prop osed by Ri- b atet et al. [2007 ]. Thus, from these tw o statistics, con- sistency of t he p osterior proportions wi th deviations from theoretical and empiri cal va lues can b e analyzed. The results in T ab. 1 show that v alues of ξ Fix that are not consistent with the d ata imply low prop ortions of state in Θ 0 . Th us, for such v alues, t he prop osed mo del is quite similar to a conv entional Ba yesian analysis. H o wev er, for tw o different v alues of ξ Fix ( R Shap e equal to 0.83 and 1), the p osterior prop ortions are quite equ iv alent. This emphasizes the large uncertaint y on the shap e parameter estimation for small sample sizes. Uncertaint y on the shape parameter estimation is also corrob orated b y the p osterior marginal distribution of a conv entio nal Bay esian analysis - see Fig. 2. As noticed abov e, these results are obtained w ith a tar- get site sample size of 6 0. This particular sample size w as selected as it is the most illustrative case. How ever, the p os- terior prop ortions are q uite similar when dealing with other target site sample sizes - even if for very small sample sizes, this is l ess noticeable. 5. Sim ulation Study In this section, performance of three d ifferent estimators are analyzed: a conv entional Ba yesian estimator ( BA Y ) in- trod uced by R ib atet et al. [2007], the p rop osed estimator based on reversible jumps ( REV ) and th e ind ex floo d esti- mator ( IF L ). In particular, the B AY estimator is related to the in itial prior distribut ion d efined in Section 2.1.1. Thus, the B AY estimator is iden tical t o the R E V approach with p ξ = 0. F or the prop osed estimator, the p oint Mass probability p ξ w as set to b e a function of the H 1 statistic of Hosk ing and Wal l is [1997]; that is: p ξ = exp( − H 1 ) 1 + exp( − H 1 ) (23) F or this parametrization, necessary requirements are sat- isfied; i.e. p ξ → 0 when H 1 → + ∞ and p ξ → 1 when H 1 → −∞ . Moreov er, for H 1 = 0, p ξ = 0 . 5 which corre- sp on d s to th e estimator introdu ced b y Stephenson and T awn [2004]. Note t hat p ξ in Eq. (23) is defined with the negativ e inv erse of the so called lo git function. Thus, for this choi ce, as u nderlined by t he sensitivity analysis, cred ib ilit y interv als are related to the degree of T able 2. Characteristics of the sixth case studies. The tar- get site is omitted in the couple ( n Site , n Size ) and has a sample size of: 10, 25 and 40. ( µ R , σ R , ξ R ) N Site ( n Site , n Size ) N Even ts Conf1 (0.64, 0.48, 0.26) 10 (9 , 50) 450 Conf2 (0.64, 0.48, 0.26) 20 (9 , 30) × (10 , 18) 450 Conf3 (0.64, 0.48, 0.26) 15 (14 , 50) 700 Conf4 (0.66, 0.48, 0.08) 10 (9 , 50) 450 Conf5 (0.66, 0.48, 0.08) 20 (9 , 30) × (10 , 18) 450 Conf6 (0.66, 0.48, 0.08) 15 (14 , 50) 700 confidence of the p oint Mass ξ Fix to b e the true shap e pa- rameter and implici tly to the lev el of homogeneit y of the regions. In addition, the non exceedence p robabilit y p used for quantiles matching in our algo rithm (see Section 2.2) is equal to 1 − 1 / 2 n , where n is the target site sample size. This last p oint guarantees that quan tiles associated with non excee- dence probability 1 − 1 / 2 n for b oth prop osal and cu rrent state of the Marko v c hain are identical. Oth er choices for p are arguable. Here, we introduce a quantile matching equa- tion for a v alue closely related t o th e scale parameter and for which uncertainties are not to o large. The analysis w as performed on s ix differen t case stud ies summarized in T ab. 2. The configurations differ b y th e w ay in formation is distributed in space; that is , (a) “small regions” with wel l instrumented but few sites ( C onf 1 and C onf 4); (b) “large regions” with less instrumented and nu- merous sites ( C onf 2 and C onf 5) and (c) “medium regions” with well instru mented sites and an intermediate number of gauging stations. C onf 1 ( resp. C onf 2, C onf 3) corresp ond to C onf 4 (resp. C onf 5 , C onf 6 ) apart from the ( µ R , σ R , ξ R ) v alues. The target site sample size takes the v alues in 10, 25 and 40. 1000 regions w ere generated for eac h configuration. Mark ov chains o f length 15 000 w ere generated. T o ensure goo d mixing prop erties for all simulated Mark ov chains, an automated trial and error p rocess w as used to defin e pro- p osal standard deviations of the MCMC algorithm. F ur- thermore, the first 2000 iterations were discarded to ensure that t he eq uilibrium w as reached. The performance of eac h estimator is assessed th rough the t hree follow ing statistics: N B I AS = 1 k k X i =1 ˆ Q i − Q i Q i (24) S D = v u u t 1 k − 1 k X i =1 ˆ Q i − Q i Q i − N B I A S ! 2 (25) N M S E = 1 k k X i =1 ˆ Q i − Q i Q i ! 2 (26) where ˆ Q i is the estimate o f the theoretical v alue Q i and k is t he total n umber of th eoretical va lues. 5.1. B AY vs. I F L Approach T able 3 sho ws that, for a small target site sample size and quantiles Q 0 . 75 and Q 0 . 95 , the B AY approac h is more com- p etitive th an the I F L one. I ndeed, the th ree B AY statistics ( N B I AS , S D , N M S E ) are smaller than the ones related to I F L . How ever, for C onf 2 and C onf 5, I F L Q 0 . 95 estimates are more comp etitive. These t w o case studies correspond to the same configuration - i.e. numerous sites with short records. I F L estimates for Q 0 . 995 are alw ays more accurate than B AY for all configurations. These results indicate that the relative performance of B A Y compared to I F L dep ends on the p o oling group. Thus, for the B A Y approac h and qu antil es Q 0 . 75 and Q 0 . 95 , it seems preferable to work with less gauging stations b ut whic h hav e larger data series, indep endently of th e target site sample size. The sensitivit y to t h e configuration of the sites and th e av ailabilit y of long time series is a draw back for th e app lication of this Ba yesian app roac h. These conclusions obtained on stochastic regions are in line with a previous analysis on a F rench data set [ R ib atet et al. , 2007]. The B AY approach is suited to work with “small” or “medium” regions and w ell instrumented gaug- ing stations. In add ition, this approac h is accurate for “rea- sonable” quantile estimation – see the bad p erformance of B A Y for Q 0 . 995 in table 3. RIBA T ET ET AL .: REVERSIBLE JUMP TECHNIQUES IN REGIONAL FLOOD FREQUENCY ANAL YSIS X - 7 T able 3. Performance of B AY and I F L estimators for quantile Q 0 . 75 , Q 0 . 95 and Q 0 . 995 . T arget si te sample size: 10. Mo del Q 0 . 75 Q 0 . 95 Q 0 . 995 N B I AS S D N M S E N B I AS S D N M S E N B I AS S D N M S E Conf1 B AY 0.015 0.123 0.015 0.001 0. 187 0.035 − 0.006 0.318 0.101 I F L 0.037 0.189 0.037 0.025 0.195 0.038 − 0.004 0.230 0.053 Conf2 B AY 0.019 0.122 0.015 0.030 0. 249 0.063 0.110 0.561 0.326 I F L 0.041 0.183 0.035 0.025 0.191 0.037 − 0.022 0.221 0.049 Conf3 B AY 0.019 0.110 0.012 0.006 0. 174 0.030 − 0.003 0.292 0.085 I F L 0.035 0.188 0.037 0.025 0.195 0.039 − 0.002 0.222 0.049 Conf4 B AY 0.009 0.104 0.011 − 0.007 0.149 0.022 − 0.021 0.233 0.054 I F L 0.023 0.157 0.025 0.022 0.163 0.027 0.022 0.192 0.037 Conf5 B AY 0.018 0.109 0.012 0.012 0. 193 0.037 0.033 0.378 0.144 I F L 0.036 0.168 0.029 0.033 0.173 0.031 0.024 0.197 0.039 Conf6 B AY 0.024 0.103 0.011 0.001 0. 151 0.023 − 0.038 0.222 0.050 I F L 0.028 0.168 0.029 0.028 0.177 0.032 0.028 0.202 0.042 T able 4. Performance of B AY and RE V estimators f or quan tile Q 0 . 75 , Q 0 . 95 and Q 0 . 995 . T arget site sample si ze: 10. Mo del Q 0 . 75 Q 0 . 95 Q 0 . 995 N B I AS S D N M S E N B I AS S D N M S E N B I AS S D N M S E Conf1 B AY 0.015 0.123 0.015 0.001 0. 187 0.035 − 0.006 0.318 0.101 RE V 0.011 0.119 0.014 − 0.012 0.159 0.026 − 0.046 0.213 0.047 Conf2 B AY 0.019 0.122 0.015 0.030 0. 249 0.063 0.110 0.561 0.326 RE V 0.005 0.105 0.011 − 0.026 0.154 0.024 − 0.066 0.269 0.077 Conf3 B AY 0.019 0.110 0.012 0.006 0. 174 0.030 − 0.003 0.292 0.085 RE V 0.014 0.103 0.011 − 0.008 0.139 0.019 − 0.042 0.185 0.036 Conf4 B AY 0.009 0.104 0.011 − 0.007 0.149 0.022 − 0.021 0.233 0.054 RE V 0.010 0.102 0.011 0.002 0.136 0.018 − 0.001 0.182 0.033 Conf5 B AY 0.018 0.109 0.012 0.012 0. 193 0.037 0.033 0.378 0.144 RE V 0.013 0.097 0.010 0.000 0.126 0.016 − 0.014 0.171 0.030 Conf6 B AY 0.024 0.103 0.011 0.001 0. 151 0.023 − 0.038 0.222 0.050 RE V 0.031 0.099 0.011 0.033 0.133 0.019 0.034 0.174 0.032 How ever, the white noise introdu ced in th e generation proced ure is indep endent of the targ et site sample size. It only regards b oth Bay esian approaches. Thus, the p erfor- mance of the B AY estimator f or large sample sizes may b e too impacted. Indeed, while t he I F L estimation procedure is not altered, b oth Bay esian approaches must deal with ar- tificially generated biases. The main idea for the RE V ap p roac h is to com bine the goo d performance of the B AY estimator for “reasonable” quantiles and th e effic iency of the I F L approach for la rger quantiles. 5.2. B AY vs. RE V Approach The comparison of the tw o Bay esian estimators is sum- marized in T ab. 4. RE V leads to more accurate estimated quantiles, in particular for Q 0 . 95 and Q 0 . 995 . This last p oint confirms t he benefits of u sing a regi onal shap e parameter through a reversible jump approac h. By construction of the algorithm described in section 2.2, Mark ov c hains generated from the RE V approac h are less v ariable th an the ones generated from the B AY mo d el. Thus, RE V is associated to smalle r standard deviation than B A Y whatev er the configuration is (T able 4). Moreov er, if the regio nal fixed sh ap e parameter ξ Fix is suited, RE V should hav e the same biases th an B A Y . Thereby , the RE V estimator alw a ys lea ds to a smaller N M S E . 5.3. Global Comparison Figures 5 to 7 illustrate the results for different target site sample sizes and regions. W e concentrate on the N M S E cri- teria since it measures v ariation of th e estimator around the true parameter v alue. F rom Figure 5, it is clear that Bay esian estimations, i.e. B A Y and RE V , of Q 0 . 75 are more accurate; sp ecially fo r a target site sample size of 10. F or larger target site sam- ple sizes, Ba yesia n approaches are alwa ys more comp etitive than the I F L estimator, even if this is less clear-cut on the graphs. F urthermore, B AY and RE V estimators often hav e the same p erformance. This result is logical as the Q 0 . 75 v alue is m ostly impacted by the location parameter µ . Thus, reversible jumps do not have a significant result on RE V Q 0 . 75 estimation. The plots in Figure 6 and those display ed in Figure 5 are q uite different. F or a target site sample size of 10, b oth Ba yesian app roac hes are th e most accurate - except for B AY applied on C onf 2 and C onf 5 - and th e RE V estimator leads alw a ys to th e smalles t N M S E . Thus, R E V is the most comp etitive mod el. F or larger t arget site sample sizes, RE V is at least as accurate a s I F L , except for C onf 2. F or Q 0 . 995 and a target site sample size of 10, RE V is the most accurate model, except for C onf 2. As the target site sample size increases, t h e I F L approach b ecomes more efficien t. Ho wev er, for these cases, N M S E for the RE V es- timator are often close to the I F L ones. Although the B AY X - 8 RIBA T ET ET AL.: REVERSIBLE JUMP TECHNIQUES IN REGION A L FLOOD FREQU ENCY ANAL YSIS Conf1 Conf2 Conf3 Conf4 Conf5 Conf6 BAY REV IFL (a) NMSE 0.000 0.020 Conf1 Conf2 Conf3 Conf4 Conf5 Conf6 BAY REV IFL (b) NMSE 0.000 0.008 Conf1 Conf2 Conf3 Conf4 Conf5 Conf6 BAY REV IFL (c) NMSE 0.000 0.006 Figure 5. Evolution of the N M S E for quantile Q 0 . 75 in function of th e regio n configuration. T arget site sample size: (a) 10, (b) 25 and (c) 40. Conf1 Conf2 Conf3 Conf4 Conf5 Conf6 BAY REV IFL (a) NMSE 0.00 0.03 0.06 Conf1 Conf2 Conf3 Conf4 Conf5 Conf6 BAY REV IFL (b) NMSE 0.00 0.02 0.04 Conf1 Conf2 Conf3 Conf4 Conf5 Conf6 BAY REV IFL (c) NMSE 0.000 0.020 Figure 6. Evolution of the N M S E for quantile Q 0 . 95 in function of th e regio n configuration. T arget site sample size: (a) 10, (b) 25 and (c) 40. Figure 7. Evolution of the N M S E for quantil e Q 0 . 995 in f unction of the region configuration. T arget site sample size: (a) 10, (b) 25 and (c) 40. Figure 8. Evolution of N B I AS for Q 0 . 95 in function of the normalized bias on target site index flo o d estimation (Bias(C)). T arget site sample size: (a) 10, ( b ) 25 and ( c) 40. Solid green lines: local smo others, blac k dashed lines: y = x . approac h p erforms p o orly for Q 0 . 995 , its N M S E for C onf 6 is close t o the R E V and I F L ones. In conclusion, these results illustrate t he go o d o verall p er- formance of the RE V model. Indeed, this approach b en efits RIBA T ET ET AL .: REVERSIBLE JUMP TECHNIQUES IN REGIONAL FLOOD FREQUENCY ANAL YSIS X - 9 from the efficiency of t h e B AY estimator for quantiles with small non exceedence probabilities w hile being as co mp eti- tive as the I F L approach for l arger non exceedence proba- bilities. How ever, the Bay esian a pproaches outp erform th e index floo d mo del bu t d ifferences in accuracy seem to b e less and less significant as the sample site increases. This may b e related to th e white noise introdu ce in the generation pro ce- dure. Indeed, this white noise is ind ep endent of the t arget site sample size and may strongly p enalize the p erformances of the both Ba yesian approaches. The n ext section tries to outline the effect of the target site ind ex flo od estimation error t o the quantile estimates. 6. Effect o f Bias on the T arget Site Index Flo o d Estimation According t o the mod el being considered, tw o t yp es of biases are encountered for th e target site index flo od esti- mation. Indeed, on one hand, th e in dex floo d for the I F L mod el is derived from the target site sa mple. On th e other hand, for B AY and RE V approac hes, the index flo o d is es- timated from a scaling model. Thus, b iases on ind ex flo o d estimation are due to th e relev ance of this scaling mo del but also to the index fl oo d error estimation for the other sites within the region. T o illustrates these tw o t yp es of biases, the normalized bias on target site index flo o d estimation is computed as follo ws: B ias ( C ) = ˆ C − C C (27) where C is the target site index floo d, and ˆ C is an estimate of C . Figure 8 depicts changes in N B I A S for qu antil e Q 0 . 95 in function of B ias ( C ). As normalized biases are considered, statistics for t he si x configurations are plotted in the same graphic. Solid lines corresp ond to lo cal p olynomial regres- sion fits to help u nderline t ren ds. Scatter-plots in Fig ure 8 sho w clearly these tw o t y p es of biases. I ndeed, on one hand, the range of B ias ( C ) is not the same for I F L than for B AY and RE V , particularly for a target site sample si ze of 25 and 40. On the oth er hand, for the B AY and RE V approaches, biases on index flo o d estimation are indep endent of the targe t site sample size; while this is not the case for I F L . This last p oint is also illustrated as the bias ranges for the Bay esian approaches remain the same for all target site sample size . Thus, for large sample size, efficiency of t h e Bay esian estimators ma y b e too much imp acted as the artificial bias in tro d uced in the generation pro cedure is to o p en alizing. The Ba yesian approac hes do not hav e th e same b ehaviour than the I F L model. In particular, B AY and R E V seem to b e less s ensitive to a large bias i n target s ite index flo o d estimation. N B I A S for the I F L mo del are clearly li near with a resp onse y = x . This last p oint is an exp ected result. Indeed, apart from sampling v ariabilit y , if a uniqu e regional distribution exists, quantil e I F L estimate biases are only in- duced by biases on target site index fl oo d estima tes. Th us, the relev ance of the generation pro cedure is corrob orated. The main difference b etw een the B AY and R E V estima- tors is the dispersion around lo cal smoothers. Indeed, RE V has a smaller range while preserving the same robustness to the b ias on target site index floo d estimation. These results and conclusions are indep endent of the tar- get site index floo d estimation p rocedu re. How ever, the p er- formance of the tw o Bay esian estimators is related to the bias and va riance of the target site index flo od estimate. Thus, for similar va riance, these results should b e i dentical if GAMs or Kriging were used. 7. Suggestions for Region Configuration This section attempts to present some suggestions for building suitable p o oling groups according to the considered estimator. Hosking and Wal lis [1997] already adv ice not to build regions greater than 20 sites b ecause of the small gain affected with a dditional stations. Ho we ver, they only focus on the I F L methodology . W e attempt to do the same for the tw o Ba yesian estimators considered in this study . F or this purp ose, tables 5 , 6 and 7 include th e N M S E and the related stand ard errors for eac h configuration and target site sample size. F rom T able 5, t h e I F L estimator seems to ha ve the same p erformance l evel indep end ently of the configuration. This result p oints out th at the information is not used opti- mally as regions with th e most information (i.e. C onf 3 and C onf 6) do not alw ays lead t o b ett er estimations. This last p oint corrob orates a previous commen ts of Rib atet et al. [2007]. T able 6 shows that th e B AY estimator is more accu- rate with “medium” regions, i.e. C onf 3 and C onf 6. Ho w- ever, results for “small” regions, i.e. C onf 1 and C onf 4 , are often cl ose to the b est ones - esp ecially for a lig ht tail. Thus, it is preferable to work with w ell-instrumented sites, i.e. C onf 1 , C onf 3 , C onf 4 and C onf 6. T able 7 sho ws th at the RE V estimator more efficien t with “medium” regions, i.e. C onf 3 and C onf 6. In addition, it seems to be more accurate with few bu t w ell-instrumented gauging stations rather more b u t less-instrumen ted ones. Nevertheless for a ligh t tail, all configurations seems to lead to similar p erformance lev els. T ables 5, 6 an d 7 sho w that the estimation of Q 0 . 75 is indep endent of th e region configuration for all estimators . Thus, it seems that th e regional information is not relev ant for quantiles with small non exceedence probabilities. 8. Conclusions This article introduced a new Ba yesian estimator whic h uses regio nal information in an innov ative wa y . The pro- p osed model acco unts for a fixed regional shape parameter with a non null probability . Thus, as in Rib atet et al. [2007], the regional i nformation is still u sed to elicit the prior dis- tribution. How ever, the prior distribution is now a m ix ture of a GEV/GPD and a GEV/GPD with only tw o parameters - t he remaining one correspond s to the fi xed regional shap e parameter. The estimation pro cedure is achieved using reversible jump Marko v chains [ Gr e en , 1995]; and theoretical de- tails for simulated suited Marko v chains were presented. A sensitivity analysis for t he prop osed algorithm was p er- formed. The results show ed th at the estimates are consis- tent provided that the probabilit y attributed to the fixed regional shap e p arameter is p ositive. In addition, as noticed by Steph enson and T awn [2004 ], the credibility interv als are sensitiv e t o this probabilit y va lue. Th us, th e p rop osed esti- mator relates th is p robabilit y va lue to th e homogeneit y de- gree of the region - using the heterogeneity statistic of Hosk- ing and W al lis [1997] . Therefore, th e credibilit y in terv als take into ac count the belief ab out the fixed reg ional shap e parameter to b e the tru e v alue. A p erformance analysis was carried out on sto chastic data for three different estimators. F or this pu rp ose, another algorithm which generates sto chastic homogeneous regions w as implemen ted. The goo d ov erall p erformance of th e pro- p osed estimator has b een d emonstrated. I ndeed, on one hand, th is approac h com bines the accuracy of the regional Ba yesian approach of Rib atet et al. [2007] for qu antile s asso- ciated to small exceedence probabilities. On the other hand, X - 10 RIBA TET ET AL .: REVERSIBLE JUMP TECHNIQUES IN REGIONAL FLOOD FREQUENCY ANAL YSIS T able 5. Changes in N M S E for Q 0 . 75 , Q 0 . 95 and Q 0 . 995 in function of the region configuration and the target site sample size for the I F L estimator. Related standard errors are display ed in brack ets. Mo del Hea vy T ail Light T ail C onf 1 C onf 2 C onf 3 C onf 4 C onf 5 C onf 6 T arget site sample size 10 Q 0 . 75 0.037 (3e-3) 0.035 (3e-3) 0.037 (3e-3) 0.025 (2e-3) 0. 029 (2e-3) 0.029(3e-3) Q 0 . 95 0.038 (3e-3) 0.037 (3e-3) 0.039 (3e-3) 0.027 (4e-3) 0. 031 (2e-3) 0.032 (3e-3) Q 0 . 995 0.053 (4e-3) 0.049 (3e-3) 0.049 (4e-3) 0.037 (2e-3) 0. 039 (3e-3) 0.042 (4e-3) T arget site sample size 25 Q 0 . 75 0.014 (8e-4) 0.015 (1e-3) 0.015 (1e-3) 0.011 (7e-4) 0. 011 (7e-4) 0.011(7e-4) Q 0 . 95 0.018 (1e-3) 0.018 (1e-3) 0.018 (1e-3) 0.014 (9e-4) 0. 014 (9e-4) 0.013 (9e-4) Q 0 . 995 0.034 (2e-3) 0.032 (2e-3) 0.027 (2e-3) 0.024 (2e-3) 0. 023 (2e-3) 0.020 (1e-3) T arget site sample size 40 Q 0 . 75 0.010 (6e-4) 0.009 (6e-4) 0.010 (6e-4) 0.007 (4e-4) 0. 007 (4e-4) 0.007 (5e-4) Q 0 . 95 0.013 (8e-4) 0.013 (8e-4) 0.012 (8e-4) 0.010 (6e-4) 0. 009 (5e-4) 0.010 (6e-4) Q 0 . 995 0.028 (2e-3) 0.028 (2e-3) 0.023 (2e-3) 0.020 (1e-3) 0. 017 (1e-3) 0.019 (1e-3) T able 6. Changes in N M S E for Q 0 . 75 , Q 0 . 95 and Q 0 . 995 in function of the region configuration and the target site sample size for the B A Y estimator. Related standard errors are display ed in brack ets. Mo del Hea vy T ail Light T ail C onf 1 C onf 2 C onf 3 C onf 4 C onf 5 C onf 6 T arget site sample size 10 Q 0 . 75 0.015 (1e-3) 0.015 (1e-3) 0.012 (8e-4) 0.011 (6e-4) 0. 012 (9e-4) 0.011(9e-4) Q 0 . 95 0.035 (2e-3) 0.063 (4e-3) 0.030 (2e-4) 0.022 (1e-3) 0. 037 (3e-3) 0.023 (2e-3) Q 0 . 995 0.101 (1e-2) 0.326 (3e-2) 0.085 (6e-3) 0.054 (5e-3) 0. 144 (1e-2) 0.050 (3e-3) T arget site sample size 25 Q 0 . 75 0.010 (6e-4) 0.011 (7e-4) 0.009 (5e-4) 0.008 (5e-4) 0. 007 (4e-4) 0.007(5e-4) Q 0 . 95 0.026 (2e-3) 0.041 (3e-3) 0.025 (1e-3) 0.017 (1e-3) 0. 023 (1e-3) 0.016 (9e-4) Q 0 . 995 0.089 (8e-3) 0.212 (2e-2) 0.079 (4e-3) 0.044 (3e-3) 0. 086 (6e-3) 0.038 (2e-3) T arget site sample size 40 Q 0 . 75 0.008 (5e-4) 0.008 (5e-4) 0.007 (4e-4) 0.005 (3e-4) 0. 005 (3e-4) 0.006 (4e-4) Q 0 . 95 0.020 (1e-3) 0.032 (2e-3) 0.020 (1e-3) 0.012 (8e-4) 0. 015 (9e-4) 0.013 (8e-4) Q 0 . 995 0.072 (5e-3) 0.187 (2e-2) 0.074 (5e-3) 0.038 (3e-3) 0. 070 (6e-3) 0.036 (2e-3) T able 7. Changes in N M S E for Q 0 . 75 , Q 0 . 95 and Q 0 . 995 in function of the region configuration and the target site sample size for the RE V estimator. Related standard err or s are di s play ed in brac ket s. Mo del Hea vy T ail Light T ail C onf 1 C onf 2 C onf 3 C onf 4 C onf 5 C onf 6 T arget site sample size 10 Q 0 . 75 0.014 (1e-3) 0.011 (7e-4) 0.011 (7e-4) 0.011 (6e-4) 0. 010 (7e-4) 0.011(9e-4) Q 0 . 95 0.026 (2e-3) 0.024 (2e-3) 0.019 (1e-3) 0.018 (1e-3) 0. 016 (1e-3) 0.019 (2e-3) Q 0 . 995 0.047 (3e-3) 0.077 (2e-2) 0.036 (2e-3) 0.033 (2e-3) 0. 030 (2e-3) 0.032 (3e-3) T arget site sample size 25 Q 0 . 75 0.009 (6e-4) 0.009 (6e-4) 0.008 (5e-4) 0.008 (5e-4) 0. 006 (4e-4) 0.007(5e-4) Q 0 . 95 0.019 (1e-3) 0.020 (2e-3) 0.016 (9e-4) 0.014 (1e-3) 0. 014 (9e-4) 0.014 (9e-4) Q 0 . 995 0.040 (3e-3) 0.061 (1e-2) 0.031 (2e-3) 0.026 (2e-3) 0. 032 (3e-3) 0.024 (2e-3) T arget site sample size 40 Q 0 . 75 0.008 (5e-4) 0.007 (5e-4) 0.006 (4e-4) 0.005 (3e-4) 0. 005 (3e-4) 0.006 (3e-4) Q 0 . 95 0.015 (1e-3) 0.016 (1e-3) 0.012 (9e-4) 0.010 (7e-4) 0. 010 (5e-4) 0.011 (6e-4) Q 0 . 995 0.034 (2e-3) 0.055 (1e-2) 0.027 (2e-3) 0.022 (2e-3) 0. 023 (2e-3) 0.021 (1e-3) the duality of the p rior distribution ( and th e fixed regional shap e parameter) allo ws th e prop osed estimator to b e at least as efficien t as the ind ex floo d mo del. Th us, this new estimator seems very suited for regional estimation when th e target site is not well instru mented. F u rt hermore, the tw o Bay esian approaches considered here app ear to b e less sensitive to biases on target site index floo d estimatio n than the index floo d estimator. Thus, the Ba yesian approaches are more readily adapt ab le whic h is a ma jor adv antage as errors on the index floo d estimation are often uncontroll able. As noticed by Rib atet et al . [2007], the index flo o d mo del does not use information optimally . This p oin t is corrob- orated in this study as the mod el initiated by Dal rympl e [1960] is not inevitably more accurate as the in formation within the p ooling group increases. This is not th e case for the Ba yesian approac hes. In addition, th ey seem to b e more accurate when dealing with regions with w ell instrumented sites, particularly f or large quantiles. All statistica l analysis were carried out by u se of R Devel- opment Cor e T e am [2006]. F or this p urp ose, th e algorithm presented in section 2.2 was incorp orated in the ev dbay e s pack ages [ Stephe nson and R ib atet , 2006]. The algorithm for the generatio n pro cedure is a v ailable on request from the author. Ac knowledgmen ts. The authors wi s h to thank A lec Stephenson for providing the original codes of his article. The financial supp ort provided by the National Science and Engineer- ing Researc h Council of Canada (NSERC) is ackno wledge. W e are also grateful to the editor, the asso ciate editor and t wo anon y- mous r eferees for useful criticism of the or iginal ve rsi on of the paper. App endix A: The Metrop olis-Hastings Algorithm In this section, th e Metrop olis-Hastings algorithm is pre- sented. According to the results derived by Gr e en [1995], some details will be giv en to consider the revers ible jump case. The basic idea of the Metrop olis-Hastings algorithm is to obtain a Marko v c hain that con verges to a known station- ary distribution. The strength of the Metropolis-Hasting RIBA T ET ET AL .: REVERSIBLE JUMP TECHNIQUES IN REGIONAL FLOOD FREQUENCY ANAL YSIS X - 11 approac h is t h at t he con verg ence is reached whatev er t he initial state of th e Marko v chain is and th at the distribu- tions could b e known up to a constan t. Let f den ote the target distribut ion of interest. Most often, in Bay esian inference, π will be t he p osterior dis- tribution for the parameters. Let q ( · , x ) b e th e prop osal distribution i.e. th e prop osal states will b e sampled from this prop osal distribution giv en the current state x t . The Metropolis-Hastings algorithm can b e summarized as fol- lo ws: 1. Generate u from a uniform distribution on [0 , 1]; 2. Generate x prop from q ( · , x t ) 3. ∆ class ← f ( x prop ) f ( x t ) q ( x t | x prop ) q ( x prop | x t ) 4. if u < min (1 , ∆ class ) then 5. x t +1 ← x prop 6. el se 7. x t +1 ← x t 8. endi f 9. Go to 1. The initial Metrop olis-Hastings algorithm can n ot ac- count for dimensional switch. F or this pu rp ose, the “jumps” b etw een sub-spaces must b e defined (see equations (15a)– (15c) and (18a)–(18c)) a nd the quantit y ∆ class must b e re- defined each t ime a jump is considered. H ere, only a simple case of th e reversible jumps approach is considered (see Sec- tion 3.3 of Gr e en [1995]). If only tw o mov es m 1 ( x t ) and m 2 ( x t ) can occur with probabilities p 1 and p 2 respectively , then t h e quantity ∆ class must b e rep laced by ∆ rev . Conse- quently , for a prop osal mov e of type m 1 : ∆ rev = ∆ class p 1 p 2 J 1 (A1) where J 1 is the jacobian of th e t ransformation x t 7→ m 1 ( x t ). If the p rop osal mo ve is of type m 2 , then ∆ rev = ∆ class p 2 p 1 J 2 (A2) where J 2 is the jacobian of th e t ransformation x t 7→ m 2 ( x t ). References Ba yes, T. (1763) , A n essa y tow ar ds solving a problem in the doc- trine of chance , Philosoph ic al T r ansaction of the R oyal So ci- ety , 53 . Bortot, P ., and S. Coles (2000), The multiv ariate gaussian tail model: An application to o ceanographic data, Journal of the R oyal Statistica l So ciety . Series C: Applie d Statistics , 49 (1), 31–49. Coles, S., and M. Dixon (1999), Lik eliho o d-based i nference for extreme v al ue models, Ext r emes , 2 (1), 5–23. Coles, S., and J. T awn (1996), A bay esian analysis of extreme rainfall dat a, Journal of the R oyal Statistic al So cie ty. Series C: Applie d St atistics , 45 (4), 463–478. Dalrymple, T. (1960) , Flo od frequency analysis, U.S. Ge ol. Surv. Water Supply Pap. , 1543 A . Geman, S., and D. Geman (1984), Sto ch astic relaxation, gibbs distributions, and the bay esi an restoration of images., IEEE T r ansactions on Pattern A nalysis and Machine Intel ligenc e , P AMI-6 ( 6), 721–741. Green, P . (1995), Reversible j ump marko v chain monte carlo com- putation and bay esian mo del dete rmination, Biometric a , 82 , 711–732. Hastings, W. K. (1970), Mon te carlo sampling metho ds using marko v c hains and their applications, Biometrika , 57 , 97–109. Hosking, J., and J. W allis (1987), Parameter a nd quant ile esti- mation f or the generalized pareto distri bution, T e chnometrics , 29 (3), 339–349. Hosking, J. R. M., and J. R. W al l is (1997), R egiona l F r e quency Ana lysis , Cam bridge Universit y Press. Ju´ ar ez, S., and W. Sc huc any (2004), Robust and efficient estima- tion for the generalized pareto distribution, Extr emes , 7 (3), 237–251. Kuczera, G. (1982), Combining at-site and regional i nforma- tion: A n empiri cal bay es approac h, Wate r R e sour c es R ese ar ch , 18 (2), 306–314. Madsen, H., and D. R osb jerg (1997), Generalized least squares and empir ical Bay es estimation i n regional partial dur ation se- ries index-flood mo deling, Water R esour c es R esea r ch , 33 (4), 771–781. Martins, E., and J. Stedinger (2000), Generalized maximum- likelihoo d generalized extreme-v alue quant ile estimators for h ydrologic data, Water R esour ces Rese ar ch , 36 (3), 737–744. McCullagh, P ., and J. A. Nelder (1989) , Gener alize d Li ne ar Mo d- els , Chapman and Hall. Merz, R., and G. Bl¨ osc hl (20 05), Floo d frequency regionalisa- tion – Spatial proximit y vs. catc hmen t a ttributes, J. Hydr ol. , 302 (1-4), 283–306. Northrop, P . (20 04), Likelihoo d-based approaches to flo od fre- quency estimation, Journal of Hydr olo gy , 292 (1-4), 96–113. Pa ndey , M., P . V an Gelder, and J. V rijl ing (200 4), Dutc h case studies of the estimation of extreme quantiles and associated uncertain t y by bo otstrap simulations, Envir onmetrics , 15 (7), 687–699. Pa rk, J.- S. (2005), A simulation-based hyperparameter selection for quan tile estimation of the generalized extreme v alue dis- tribution, Mathematics and Computers in Simulation , 70 (4), 227–234. Pa yer, T. , and H. Kuche nhoff (2 004), Modelli ng extreme wind speeds at a german w eather stat ion as basic input for a sub- sequen t r isk analysis for high-speed trains, Journal of Wind Engine e ring and Industrial Aer o dynamics , 92 (3-4), 241–261 . Pick ands, J. I. (19 75), Statistical inference using e xtreme order statistics, Annal s of St atistics , 3 , 119–131. R Developmen t Cor e T eam (2006) , R: A L anguage and Envir on- ment for Statistic al Co mputing , R F oundation for Statistical Computing, Vienna, Austria, ISBN 3-900051-07-0. Ribatet, M., E. Sauquet, J.-M. Gr ´ esillon, and T. B. M. J. Ouarda (2007), A regional bay esian p ot mo del for flo o d frequency anal- ysis, Sto chastic Envir onmental R ese ar c h and Risk Assessment (SERRA) , 21 (4), 327–339. Rosb j er g, D., H. Madsen, and P . Rasm ussen (1992), Prediction in partial dur ation series with generalised pareto- distri buted exceedan ces, Water R esour c es R e se ar ch , 28 (11), 3001–3010. Seidou, O. , T. Ouarda, M. Barbet, P . Bruneau, and B. Bob´ ee (2006), A parametric bay esian combination of lo cal and re- gional i nformation in floo d fr equency analysis, Water R esour. R es. , 42 (11), W11408 . Sh u, C., and D. H. Burn (2004), Artificial neural netw ork ensem- bles and their application in p o oled flood frequenc y analysis, Water R e sour c es R ese ar ch , 40 (9), W09,301. Stephenson , A., and M. Ribatet (2006), A User’s Guide to the evdb aye s Package (V ersion 1.1) . Stephenson , A., and J. T awn (2004), Bay esian inference for ex- tremes: Accounting for the three extremal types, Extr emes , 7 (4), 291–307. W o o d, S., and N. Augustin (2002), GAMs with i n tegrated model selection using p enalized regression s pl ines and applications to en vironmental modell ing, Ec ol. Mo del. , 157 (2-3), 157–177. M. Ribatet, Unit´ e de Rec herch e HH , Cemagref Groupement de Ly on, 3bis quai Chauv eau CP220, 69336 Lyo n Cedex 09, FRANCE. (ribatet@ly on.cemagref.fr)

Original Paper

Loading high-quality paper...

Comments & Academic Discussion

Loading comments...

Leave a Comment