Perturbation-based Inference for Extreme Value Index
The extreme value index (EVI) characterizes the tail behavior of a distribution and is crucial for extreme value theory. Inference on the EVI is challenging due to data scarcity in the tail region. We propose a novel method for constructing confidence intervals for the EVI using synthetic exceedances generated via perturbation. Rather than perturbing the entire sample, we add noise to exceedances above a high threshold and apply the generalized Pareto distribution (GPD) approximation. Confidence intervals are derived by simulating the distribution of pivotal statistics from the perturbed data. We show that the pivotal statistic is consistent, ensuring the proposed method provides consistent intervals for the EVI. Additionally, we demonstrate that the perturbed data is differentially private. When the GPD approximation is inadequate, we introduce a refined perturbation method. Simulation results show that our approach outperforms existing methods, providing robust and reliable inference.
💡 Research Summary
This paper addresses the significant challenge of performing statistical inference on the Extreme Value Index (EVI), a crucial parameter characterizing the tail behavior of a distribution within Extreme Value Theory (EVT). The primary difficulty stems from the inherent scarcity of data in the tail region, which complicates the construction of accurate confidence intervals. The authors propose a novel, perturbation-based methodology that generates synthetic exceedances over a high threshold to facilitate robust inference while preserving data privacy.
The core innovation lies in perturbing only the exceedances above a carefully chosen threshold, rather than the entire dataset. This approach leverages the Pickands-Balkema-de Haan theorem, which states that exceedances over a sufficiently high threshold asymptotically follow a Generalized Pareto Distribution (GPD). The procedure begins by estimating the GPD parameters (shape γ and scale β) from the original sample’s top order statistics. Then, for each exceedance, its probability integral transform under the estimated GPD is computed, and Laplace noise is added to this value. A specific transformation function maps this noised value back to a synthetic exceedance that follows the GPD with the estimated parameters. This process is repeated to create multiple independent perturbed datasets.
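The perturbation steps described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The summary does not give the "specific transformation function", so the sketch uses one hypothetical choice: pass the noised value through the distribution function of Uniform(0, 1) plus Laplace noise, which makes it uniform again, then apply the GPD quantile function. Also assumed: the data are positive, γ is estimated by the Hill estimator, the scale is set to β = γ × threshold (which is exact for a pure Pareto tail), and the Laplace scale is 1/ε. All function names are illustrative.

```python
import numpy as np

def gpd_cdf(y, gamma, beta):
    """GPD distribution function for shape gamma > 0 and scale beta > 0."""
    return 1.0 - (1.0 + gamma * y / beta) ** (-1.0 / gamma)

def gpd_quantile(v, gamma, beta):
    """Inverse of gpd_cdf."""
    return beta / gamma * ((1.0 - v) ** (-gamma) - 1.0)

def noised_uniform_cdf(z, b):
    """CDF of U + E for independent U ~ Uniform(0, 1) and E ~ Laplace(0, b)."""
    z = np.asarray(z, dtype=float)
    out = np.empty_like(z)
    lo, hi = z < 0.0, z > 1.0
    mid = ~(lo | hi)
    c = 1.0 - np.exp(-1.0 / b)
    out[lo] = 0.5 * b * np.exp(z[lo] / b) * c
    out[hi] = 1.0 - 0.5 * b * np.exp(-(z[hi] - 1.0) / b) * c
    out[mid] = z[mid] + 0.5 * b * (np.exp(-z[mid] / b) - np.exp(-(1.0 - z[mid]) / b))
    return out

def perturb_exceedances(x, k, epsilon, rng):
    """Return synthetic exceedances, the threshold, and the Hill estimate."""
    x = np.sort(np.asarray(x, dtype=float))
    threshold = x[-k - 1]                          # (k+1)-th largest observation
    top = x[-k:]                                   # k largest observations
    gamma_hat = np.mean(np.log(top / threshold))   # Hill estimator
    beta_hat = gamma_hat * threshold               # assumption: Pareto-type tail
    u = gpd_cdf(top - threshold, gamma_hat, beta_hat)  # probability integral transform
    b = 1.0 / epsilon                              # assumed Laplace scale
    z = u + rng.laplace(0.0, b, size=k)            # noised PIT values
    v = np.clip(noised_uniform_cdf(z, b), 1e-12, 1.0 - 1e-12)  # back to (0, 1)
    return gpd_quantile(v, gamma_hat, beta_hat), threshold, gamma_hat

rng = np.random.default_rng(0)
x = (1.0 - rng.random(5000)) ** (-0.5)             # exact Pareto sample, gamma = 0.5
y_star, threshold, gamma_hat = perturb_exceedances(x, k=200, epsilon=1.0, rng=rng)
gamma_star = np.mean(np.log1p(y_star / threshold)) # Hill-type estimate on synthetic data
```

Calling `perturb_exceedances` repeatedly with fresh noise yields the multiple perturbed datasets the procedure requires. Whether this particular mapping matches the paper's transformation is not confirmed by the summary.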
Statistical inference is conducted using a pivotal statistic, specifically T = √k(γ̂ − γ)/γ̂, where γ̂ is the Hill estimator. Instead of relying on its asymptotic normal distribution, the method simulates the finite-sample distribution of this pivot using the ensemble of perturbed samples. For each perturbed dataset, a perturbed Hill estimator γ̂* is computed, and a corresponding perturbed pivotal statistic T* is formed. The empirical distribution of these T* values is then used to construct the confidence interval for γ, using the appropriate quantiles.
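To make the interval construction concrete: if q_lo and q_hi are the α/2 and 1 − α/2 empirical quantiles of the T* values, then solving q_lo ≤ √k(γ̂ − γ)/γ̂ ≤ q_hi for γ gives the interval [γ̂(1 − q_hi/√k), γ̂(1 − q_lo/√k)]. The sketch below implements only this inversion step. The function name `pivot_interval` is hypothetical, and the T* values in the usage lines are standard normal stand-ins rather than output of the paper's perturbation scheme.

```python
import numpy as np

def pivot_interval(gamma_hat, k, t_star, alpha=0.05):
    """Invert the pivot T = sqrt(k) * (gamma_hat - gamma) / gamma_hat.

    t_star holds simulated pivot values; their empirical quantiles
    replace the normal quantiles of the asymptotic approximation.
    """
    q_lo, q_hi = np.quantile(t_star, [alpha / 2.0, 1.0 - alpha / 2.0])
    root_k = np.sqrt(k)
    lower = gamma_hat * (1.0 - q_hi / root_k)   # large pivot values imply small gamma
    upper = gamma_hat * (1.0 - q_lo / root_k)
    return lower, upper

# Stand-in pivot draws (assumption: used here only to exercise the function).
rng = np.random.default_rng(1)
t_star = rng.standard_normal(10_000)
lower, upper = pivot_interval(gamma_hat=0.5, k=100, t_star=t_star)
```

With standard normal stand-ins the result is close to the usual normal-approximation interval γ̂(1 ± 1.96/√k); the method's claimed advantage comes from replacing those stand-ins with T* values computed from perturbed samples.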
The paper provides substantial theoretical underpinnings for the proposed method. It derives the tail quantile process of the perturbed sample (Theorem 1) and establishes the asymptotic distribution of the perturbed Hill estimator (Corollary 1). The key theoretical result (Theorem 2) demonstrates the consistency of the perturbed pivotal statistic: under the standard second-order condition, with k chosen so that √k A(n/k) → 0, the distribution of T* converges to the distribution of T, guaranteeing that the resulting confidence intervals are consistent. Furthermore, the authors prove that the perturbation mechanism satisfies ε-differential privacy, a formal guarantee that bounds how much the synthetic data can reveal about any individual data point in the original sample.
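As a rough numerical check of the setting in which these results apply, the sketch below simulates the unperturbed pivot T for exact Pareto data, where the tail has no second-order bias and the condition √k A(n/k) → 0 holds trivially. It is an illustrative experiment with arbitrarily chosen sample sizes, not a reproduction of the paper's simulations, and the function names are assumptions.

```python
import numpy as np

def hill_estimator(x, k):
    """Hill estimator based on the k largest observations of a positive sample."""
    x = np.sort(np.asarray(x, dtype=float))
    return np.mean(np.log(x[-k:] / x[-k - 1]))

def simulate_pivot(gamma, n, k, reps, rng):
    """Monte Carlo draws of T = sqrt(k) * (gamma_hat - gamma) / gamma_hat."""
    t = np.empty(reps)
    for r in range(reps):
        # Exact Pareto sample: P(X > x) = x ** (-1 / gamma) for x >= 1.
        x = (1.0 - rng.random(n)) ** (-gamma)
        g = hill_estimator(x, k)
        t[r] = np.sqrt(k) * (g - gamma) / g
    return t

rng = np.random.default_rng(2)
t = simulate_pivot(gamma=0.5, n=2000, k=200, reps=2000, rng=rng)
# Share of replications in which the normal-approximation interval covers gamma.
coverage = np.mean(np.abs(t) <= 1.96)
```

In this bias-free case the simulated pivot should look roughly standard normal. For distributions with a non-negligible second-order term, the same experiment would be expected to show the finite-sample distortions that motivate simulating the pivot's distribution instead.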
Acknowledging that the GPD approximation may be inadequate in finite samples, the authors introduce a refined perturbation procedure. This extension employs a more flexible model to reduce approximation bias and combines results from both the standard and refined perturbation schemes to form a final confidence interval that is more robust to the choice of the threshold.
Comprehensive simulation studies across various heavy-tailed distributions (e.g., Pareto, Fréchet) demonstrate the practical superiority of the proposed method. Compared to established benchmarks like normal approximation, bootstrap, and empirical likelihood, the perturbation-based method consistently delivers confidence intervals with coverage probabilities closer to the nominal level (e.g., 95%) while often achieving shorter interval lengths. This superior performance is particularly evident in scenarios with moderate sample sizes or suboptimal threshold choices, highlighting the method’s robustness.
In conclusion, this work presents a sophisticated and effective framework for EVI inference that successfully integrates ideas from perturbation theory, differential privacy, and extreme value theory. It offers a principled solution to the dual problems of data scarcity in the tails and privacy concerns, providing statistically consistent and private confidence intervals with strong finite-sample performance. The methodology holds considerable promise for applications in fields such as finance, insurance, and environmental science, where analyzing extreme events is critical.