A new mixture model for spatiotemporal exceedances with flexible tail dependence
We propose a new model and estimation framework for spatiotemporal streamflow exceedances above a threshold that flexibly captures asymptotic dependence and independence in the tail of the distribution. We model streamflow using a mixture of processes with spatial, temporal and spatiotemporal asymptotic dependence regimes. A censoring mechanism allows us to use only observations above a threshold to estimate marginal and joint probabilities of extreme events. As the likelihood is intractable, we use simulation-based inference powered by random forests to estimate model parameters from summary statistics of the data. Simulations and modeling of streamflow data from the U.S. Geological Survey illustrate the feasibility and practicality of our approach.
💡 Research Summary
This paper introduces a novel spatiotemporal mixture model designed to capture flexible tail dependence structures in extreme streamflow exceedances. Traditional extreme‑value approaches, such as max‑stable processes (MSPs) and existing max‑mixture or copula‑based mixtures, typically assume temporal independence or enforce a fixed asymptotic dependence (AD) versus asymptotic independence (AI) regime. Consequently, they struggle with high‑frequency data (e.g., daily maxima) where both spatial and temporal dependence can vary across the tail.
The authors construct a latent process X(s,t) as a convex combination of four component processes, each representing a distinct asymptotic regime:
- R_ST – a Brown‑Resnick process exhibiting AD in both space and time.
- R_S – a spatial‑only Brown‑Resnick process (AD in space, AI in time).
- R_T – a temporal‑only Brown‑Resnick process (AD in time, AI in space).
- W – an inverted Brown‑Resnick process that is AI in both dimensions.
All components are marginally transformed to standard exponential distributions, and the mixture weights λ₁,…,λ₄ are non‑negative and sum to one. The resulting X(s,t) has a hypo‑exponential marginal distribution, and its tail dependence, as measured by the χ coefficient, is inherited from the component processes. The observable streamflow Y(s,t) is obtained via a transformation G(·;s,t) that maps X to a generalized Pareto distribution (GPD) in the upper tail; because the GPD parameters may vary across space and time, the transformation is specific to each site and time.
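To make the marginal construction concrete, here is a minimal Python sketch of the convex combination and the GPD transformation at a single site and time. It is an illustration under stated assumptions, not the authors' implementation: the four components are replaced by independent standard exponential draws (no Brown‑Resnick simulation, so no spatial or temporal dependence), the weights are assumed strictly positive and pairwise distinct so that the closed‑form hypo‑exponential CDF applies, and the function names and the `exceed_prob` parameter are hypothetical.

```python
import math
import random


def hypoexp_cdf(x, weights):
    """CDF of sum_i weights[i] * E_i for independent standard exponentials E_i.

    Assumes strictly positive, pairwise distinct weights, so that
    weights[i] * E_i is exponential with rate 1 / weights[i].
    """
    if x <= 0.0:
        return 0.0
    rates = [1.0 / w for w in weights]
    survival = 0.0
    for i, r_i in enumerate(rates):
        coef = 1.0
        for j, r_j in enumerate(rates):
            if j != i:
                coef *= r_j / (r_j - r_i)
        survival += coef * math.exp(-r_i * x)
    return 1.0 - survival


def gpd_quantile(p, threshold, scale, shape):
    """Quantile function of a GPD above `threshold`, for p in [0, 1)."""
    if abs(shape) < 1e-12:  # exponential limit as shape -> 0
        return threshold - scale * math.log1p(-p)
    return threshold + scale * ((1.0 - p) ** (-shape) - 1.0) / shape


def simulate_x(weights, rng):
    """Stand-in for X(s,t): convex combination of independent Exp(1) draws."""
    return sum(w * rng.expovariate(1.0) for w in weights)


def to_streamflow(x, weights, threshold, scale, shape, exceed_prob):
    """Map a latent value x to the streamflow scale (hypothetical helper).

    Returns None when x falls below the latent quantile matching the
    threshold (a censored observation); otherwise returns a GPD exceedance.
    """
    f = hypoexp_cdf(x, weights)
    if f <= 1.0 - exceed_prob:
        return None
    p = (f - (1.0 - exceed_prob)) / exceed_prob
    p = min(p, 1.0 - 1e-12)  # guard against p == 1 from rounding
    return gpd_quantile(p, threshold, scale, shape)


if __name__ == "__main__":
    rng = random.Random(42)
    lam = [0.4, 0.3, 0.2, 0.1]  # illustrative weights, not fitted values
    for _ in range(5):
        x = simulate_x(lam, rng)
        print(round(x, 3), to_streamflow(x, lam, 100.0, 20.0, 0.1, 0.05))
```

In this sketch only the top `exceed_prob` fraction of latent values is mapped to the GPD, which mirrors the censoring idea: values below the threshold contribute only the information that they did not exceed it.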
Because the likelihood of censored max‑stable models is intractable for moderate to large numbers of sites, the authors adopt a simulation‑based inference (SBI) framework. They generate a dense grid of plausible parameter vectors, simulate data for each, and compute a set of carefully chosen summary statistics—most notably empirical χ‑values across a range of spatial and temporal lags, as well as marginal exceedance statistics. Instead of the more common deep‑learning surrogates, they train a random‑forest regression model to learn the mapping from these summaries to the underlying parameters. Random forests are computationally cheaper, require fewer hyper‑parameters, and still achieve high predictive accuracy for this problem.
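The empirical χ summaries mentioned above can be sketched as follows. This is a rank‑based estimator of χ(u) = P(F_Y(Y) > u | F_X(X) > u) for one pair of series, plus a lagged variant for a single station; the function names are hypothetical, and the summary does not specify the exact estimator, thresholds, or lags the authors use.

```python
import random


def empirical_chi(x, y, u):
    """Rank-based estimate of chi(u) = P(F_Y(Y) > u | F_X(X) > u).

    x and y are paired samples of equal length; u is a probability level
    in (0, 1). Ties are broken by position, which is harmless for
    continuous data.
    """
    n = len(x)

    def pseudo_uniform(values):
        # Replace each value by rank / (n + 1) to get approximate uniforms.
        order = sorted(range(n), key=lambda i: values[i])
        ranks = [0.0] * n
        for rank, idx in enumerate(order, start=1):
            ranks[idx] = rank / (n + 1)
        return ranks

    ux, uy = pseudo_uniform(x), pseudo_uniform(y)
    exceed_x = sum(1 for a in ux if a > u)
    if exceed_x == 0:
        return float("nan")
    joint = sum(1 for a, b in zip(ux, uy) if a > u and b > u)
    return joint / exceed_x


def lagged_chi(series, lag, u):
    """Temporal chi at a positive integer lag for one station's series."""
    return empirical_chi(series[:-lag], series[lag:], u)


if __name__ == "__main__":
    rng = random.Random(0)
    site_a = [rng.expovariate(1.0) for _ in range(5000)]
    site_b = [rng.expovariate(1.0) for _ in range(5000)]
    # Independent series: chi(u) should be close to 1 - u.
    print(empirical_chi(site_a, site_b, 0.95))
    print(lagged_chi(site_a, 1, 0.95))
```

Evaluating such estimates over several station pairs, temporal lags, and levels u gives a fixed‑length vector of summaries; in a simulation‑based workflow, vectors computed from simulated data sets would serve as the regression features and the generating parameters as the targets.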
A comprehensive simulation study demonstrates that the mixture model can accurately recover the four asymptotic regimes, that the random‑forest estimator yields low bias and variance, and that performance degrades gracefully as the censoring threshold increases.
The methodology is applied to a 60‑year record of daily streamflow maxima from the USGS Hydro‑Climatic Data Network, comprising over 150 stations. After fitting the marginal GPDs and estimating the mixture weights, the model reveals seasonal patterns in the tail dependence: during wet seasons the fully AD component (R_ST) dominates, while in drier periods the AI components become more prominent. Out‑of‑sample validation at unobserved stations shows a 10–15% reduction in both mean absolute error and Continuous Ranked Probability Score (CRPS) compared with a standard MSP model. Trend analysis based on the fitted parameters indicates a statistically significant increase in extreme flow frequency in several downstream basins over the past six decades, consistent with broader climate‑change signals.
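For readers unfamiliar with the Continuous Ranked Probability Score, one common sample‑based estimator is E|X − y| − ½ E|X − X′|, where X and X′ are independent draws from the predictive distribution and y is the observation. The sketch below implements that estimator; the summary does not state how the score was computed in the paper, so this is an assumption, and the function name is hypothetical.

```python
def crps_ensemble(samples, obs):
    """Sample-based CRPS: mean |X - obs| minus half the mean |X - X'|.

    `samples` are draws from the predictive distribution and `obs` is the
    observed value. Lower scores are better; for a single-sample forecast
    the score reduces to the absolute error.
    """
    n = len(samples)
    term_obs = sum(abs(s - obs) for s in samples) / n
    term_spread = sum(abs(a - b) for a in samples for b in samples) / (2 * n * n)
    return term_obs - term_spread


if __name__ == "__main__":
    # Two-member ensemble centred on the observation.
    print(crps_ensemble([0.0, 2.0], 1.0))
```

The double loop is quadratic in the ensemble size, which is fine for illustration but would usually be replaced by a sorted‑sample formula for large ensembles.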
In summary, the paper’s contributions are threefold: (1) a flexible spatiotemporal mixture that can represent any combination of spatial and temporal AD/AI, (2) a practical SBI approach using random forests to bypass intractable likelihoods for censored extremes, and (3) a demonstration of improved predictive skill and insightful climatological findings on real streamflow data. Limitations include the computational cost of exploring the high‑dimensional λ‑space and the relatively abstract interpretation of mixture weights. Future work is suggested on adaptive parameter search (e.g., Bayesian optimization), incorporation of non‑stationarity in the mixture weights, and extension to multivariate environmental extremes.