Regression modeling of multivariate precipitation extremes under regular variation


Motivated by the EVA2025 data challenge, where we participated as the team DesiBoys, we propose a regression strategy within the framework of regular variation to estimate the occurrences and intensities of high precipitation extremes derived from different climate runs of the CESM2 Large Ensemble Community Project (LENS2). Our approach first empirically estimates the target quantities at sub-asymptotic (lower threshold) levels and sets them as response variables within a simple regression framework arising from the theoretical expressions of joint regular variation. Although a seasonal pattern is evident in the data, the precipitation intensities do not exhibit any significant long-term trends across years. Besides, we can safely assume the data to be independent across different climate model runs, thereby simplifying the modeling framework. Once the regression parameters are estimated, we employ a standard prediction approach to infer precipitation levels at very high quantiles. We calculate the confidence intervals using a nonparametric block bootstrap procedure. While a likelihood-based inference grounded in multivariate extreme value theory may provide more accurate estimates and confidence intervals, it would involve a significantly higher computational burden. Our proposed simple and computationally straightforward two-stage approach provides reasonable estimates for the desired quantities, securing us a joint second position in the final rankings of the EVA2025 conference data challenge competition.


💡 Research Summary

The paper presents a two‑stage regression methodology grounded in regular variation theory for estimating multivariate precipitation extremes derived from the CESM2 Large Ensemble (LENS2) climate model runs. The authors participated in the EVA2025 data challenge as the “DesiBoys” team and aimed to predict three specific quantities: (1) the expected number of days when all 25 grid points exceed 1.7 Leadbetters, (2) the expected number of days when at least six points exceed 5.7 Leadbetters, and (3) the expected number of occurrences of at least three points exceeding 5 Leadbetters for a run of at least two consecutive days. The dataset consists of daily precipitation for 165 years (1850‑2014) across a 5×5 spatial grid from four independent model realizations, with the original units transformed into an artificial “Leadbetters” scale.

Exploratory analysis shows no significant long‑term trend in annual maxima, clear seasonal patterns in daily averages, and negligible temporal extremal dependence after deseasonalising. Statistical tests confirm that yearly maxima are approximately independent and identically distributed across runs, justifying the assumption of independence in the modeling framework.

The theoretical foundation relies on multivariate regular variation: the precipitation vector \(Y\) is assumed to be regularly varying with index \(\alpha>0\) and a scaling function \(b(t)\). A homogeneous Radon measure \(\nu\) characterises the tail dependence, and for a high threshold \(u\) the joint exceedance probability behaves as \(\Pr\{Y_{(1)}>u\}\approx C\,u^{-\alpha}\ell(u)\), where \(C=\nu((1,\infty]^{25})\) and \(\ell(u)\) is slowly varying. This asymptotic relationship provides a simple power‑law form linking exceedance probabilities at different thresholds.
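To make the power-law link concrete, the display below takes logarithms of the stated approximation. The shorthand \(p(u)\) and the two thresholds \(u_1 < u_2\) are notation introduced here for illustration, and treating \(\ell\) as roughly constant between the two thresholds is an added simplifying assumption, not a claim from the paper.

```latex
\begin{align*}
p(u) &:= \Pr\{Y_{(1)} > u\} \approx C\,u^{-\alpha}\,\ell(u), \\
\log p(u) &\approx \log C - \alpha \log u + \log \ell(u), \\
\frac{p(u_2)}{p(u_1)} &\approx \left(\frac{u_2}{u_1}\right)^{-\alpha}
  \quad \text{when } \ell(u_1) \approx \ell(u_2).
\end{align*}
```

Under that simplification, \(\log p(u)\) is approximately linear in \(\log u\) with slope \(-\alpha\) and intercept \(\log C\).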

In the first stage, the authors compute empirical exceedance rates at several sub‑asymptotic quantiles (e.g., 0.975, 0.98, 0.985, 0.99). By taking logarithms of both the empirical probabilities and the corresponding thresholds, they fit a linear regression model whose slope estimates \(-\alpha\) and intercept estimates \(\log C\). This step yields estimates of the tail index and the joint tail mass using data well within the observed range, avoiding the instability of direct high‑threshold estimation.
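The first stage can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the authors' code: the function names (`joint_exceedance_rate`, `fit_log_log`, `predict_prob`) are hypothetical, the thresholds are supplied directly rather than derived from quantiles, and the check at the end uses a synthetic exact power law instead of precipitation data.

```python
import math

def joint_exceedance_rate(days, u):
    """Fraction of days on which every site exceeds the threshold u."""
    return sum(1 for day in days if min(day) > u) / len(days)

def fit_log_log(thresholds, probs):
    """Ordinary least squares of log(p) on log(u).

    Returns (alpha_hat, log_c_hat): minus the slope and the intercept.
    """
    xs = [math.log(u) for u in thresholds]
    ys = [math.log(p) for p in probs]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return -slope, intercept

def predict_prob(u, alpha_hat, log_c_hat):
    """Extrapolated exceedance probability exp(log C - alpha * log u)."""
    return math.exp(log_c_hat - alpha_hat * math.log(u))

# Synthetic check (assumed data): an exact power law p(u) = 0.5 * u^(-3).
thresholds = [1.0, 1.1, 1.2, 1.3]
probs = [0.5 * u ** -3 for u in thresholds]
alpha_hat, log_c_hat = fit_log_log(thresholds, probs)
p_extrapolated = predict_prob(2.0, alpha_hat, log_c_hat)
```

With real data, `probs` would come from `joint_exceedance_rate` evaluated at each lower threshold, and `predict_prob` would then be called at the much higher target threshold.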

In the second stage, the fitted regression is extrapolated to the extreme thresholds required by the challenge (1.7, 5.7, and 5 Leadbetters). The resulting predicted exceedance probabilities are inserted into combinatorial formulas that count the expected number of days satisfying each of the three event definitions. For the third task, which involves runs of consecutive days, the authors adjust the count to avoid double‑counting overlapping sequences.
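The three event definitions can be illustrated by counting them empirically on a small array of daily values. The sketch below is hypothetical: the helper names are invented, the toy data use 3 sites rather than 25, and counting each maximal run once is only one possible way to avoid double-counting overlapping sequences, since the summary does not spell out the adjustment used.

```python
def count_exceeding(day, u):
    """Number of sites whose value exceeds u on a single day."""
    return sum(1 for y in day if y > u)

def count_event_days(days, u, k):
    """Number of days on which at least k sites exceed u."""
    return sum(1 for day in days if count_exceeding(day, u) >= k)

def count_runs(days, u, k, min_len=2):
    """Number of maximal runs of at least min_len consecutive days
    on which at least k sites exceed u. Each run is counted once."""
    runs = 0
    current = 0  # length of the run currently in progress
    for day in days:
        if count_exceeding(day, u) >= k:
            current += 1
        else:
            if current >= min_len:
                runs += 1
            current = 0
    if current >= min_len:  # close a run that reaches the last day
        runs += 1
    return runs

# Toy data (assumed): 8 days, 3 sites.
toy_days = [
    [6, 6, 6], [6, 6, 1], [6, 6, 6], [1, 1, 1],
    [6, 6, 6], [1, 1, 1], [6, 6, 6], [6, 6, 6],
]
all_sites = count_event_days(toy_days, u=5, k=3)      # every site exceeds 5
at_least_two = count_event_days(toy_days, u=5, k=2)   # at least two sites exceed 5
runs = count_runs(toy_days, u=5, k=2, min_len=2)      # runs of two or more days
```

Dividing such counts by the number of years in the record gives an empirical annual rate, which is only usable at thresholds low enough for the event to be observed.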

Uncertainty quantification is performed via a non‑parametric block bootstrap that resamples whole years as blocks, preserving intra‑year dependence while breaking inter‑year dependence. For each bootstrap replicate, the entire two‑stage procedure is repeated, producing a distribution of the target quantities. The 2.5th and 97.5th percentiles of these bootstrap distributions constitute 95 % confidence intervals.
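A year-block bootstrap with percentile intervals can be sketched as follows. This is a generic illustration under stated assumptions, not the authors' implementation: `year_block_bootstrap`, `percentile`, and `mean_annual_exceedances` are hypothetical names, the toy data are invented, and the statistic here is a simple exceedance count, whereas in the procedure described above it would be the full two-stage estimator.

```python
import random

def percentile(sorted_vals, q):
    """Linear-interpolation percentile of a sorted list, q in [0, 1]."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac

def year_block_bootstrap(blocks, statistic, n_boot=1000, seed=0):
    """Resample whole years with replacement and return the
    2.5th and 97.5th percentiles of the recomputed statistic."""
    rng = random.Random(seed)
    n = len(blocks)
    stats = []
    for _ in range(n_boot):
        resampled = [blocks[rng.randrange(n)] for _ in range(n)]
        stats.append(statistic(resampled))
    stats.sort()
    return percentile(stats, 0.025), percentile(stats, 0.975)

def mean_annual_exceedances(blocks, u=1.7):
    """Average number of values above u per year-block."""
    total = sum(sum(1 for y in year if y > u) for year in blocks)
    return total / len(blocks)

# Toy data (assumed): 4 "years" of 3 daily values each.
years = [[0.2, 1.9, 0.4], [2.5, 0.1, 3.0], [0.3, 0.2, 0.1], [1.8, 2.2, 0.9]]
ci_low, ci_high = year_block_bootstrap(years, mean_annual_exceedances,
                                       n_boot=500, seed=1)
```

Passing the statistic as a callable keeps the resampling logic separate from the estimator, so any function of the resampled years can be plugged in without changing the bootstrap loop.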

The proposed approach is computationally lightweight compared with full multivariate extreme‑value likelihood methods, which carry a significantly higher computational burden for high‑dimensional climate ensembles. Despite its simplicity, the method achieved a joint second place in the EVA2025 competition, suggesting that regular‑variation‑based regression can deliver reasonable extreme‑value estimates in large‑scale applications.

The authors acknowledge limitations: the regular variation assumption may not capture all nuances of tail dependence, and the block bootstrap ignores any residual dependence within a year beyond the block structure. They suggest future extensions such as incorporating non‑regular variation models, spatial dependence structures (e.g., max‑stable processes), and Bayesian bootstrap techniques to improve inference robustness.

