A Flexible Modeling of Extremes in the Presence of Inliers

Notice: This research summary and analysis were automatically generated using AI technology. For full accuracy, refer to the original arXiv paper.

Many random phenomena, including life-testing and environmental data, take positive values with an excess of zeros, which poses modeling challenges. In life testing, immediate failures result in zero lifetimes, often due to defects or poor quality, especially in electronics and clinical trials. These failures, called inliers at zero, are difficult to model using standard approaches. The presence and proportion of inliers may influence the accuracy of extreme value analysis, bias parameter estimates, or even lead to severe events or extreme effects, such as drought or crop failure. In such scenarios, a key issue in extreme value analysis is determining a suitable threshold to capture tail behaviour accurately. Although some extreme value mixture models address threshold and tail estimation, they often handle inliers inadequately, which leads to suboptimal results. Bulk model misspecification can affect the threshold, the extreme value estimates, and, in particular, the tail proportion. There is no unified framework for defining extreme value mixture models, and in particular the tail proportion. This paper proposes a flexible model that handles extremes, inliers, and the tail proportion. Parameters are estimated by maximum likelihood. The proposed model's estimates are compared with those obtained from the classical mean excess plot, parameter stability plot, and Pickands plot. Theoretical results are established, and the proposed model outperforms traditional methods in both simulation studies and real data analysis.


💡 Research Summary

The paper addresses a pervasive problem in extreme‑value analysis: data sets that contain a substantial proportion of “inliers” (observations exactly at zero) together with genuinely extreme observations. Such situations arise in life‑testing (instantaneous failures), rainfall records (dry days), and many other domains. Traditional extreme‑value theory (EVT) and the widely used peak‑over‑threshold (POT) approach assume a positive continuous support and treat the threshold as a fixed, user‑chosen quantity. Consequently, standard graphical tools for threshold selection (mean excess plot, parameter stability plot, Pickands plot) are highly sensitive to the presence of zero‑valued inliers, leading to biased estimates of the threshold, the GPD scale σ, and especially the shape ξ. Moreover, existing extreme‑value mixture models (EVMMs) incorporate a bulk distribution below the threshold but do not explicitly model a point mass at zero, so misspecification of the bulk can propagate bias into the tail‑fraction estimate ϕu = Pr(X>u).

To overcome these shortcomings, the authors propose the Flexible Extreme‑Value Inlier Mixture Model (FEVIMM). The model consists of three components:

  1. A degenerate distribution at the origin with probability mass ϕ1, capturing the inliers.
  2. A continuous bulk distribution G∗(·|Φ) for observations between 0 and the threshold u (excluding the origin). The bulk can be any parametric family; the authors illustrate with a Gamma distribution.
  3. A Generalized Pareto Distribution (GPD) for exceedances above u, with parameters (ξ, σ) and an explicit tail‑fraction parameter ϕ2 = Pr(X>u) that is estimated jointly with the other parameters.
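
To make the three components concrete, here is a minimal Python sketch of a CDF and a sampler built from them. It is an illustration under stated assumptions, not the paper's implementation: the bulk is an exponential distribution (the Gamma with shape 1) truncated to (0, u) so that the example needs only the standard library, and the function names (`gpd_cdf`, `bulk_cdf`, `mixture_cdf`, `sample_mixture`) and the parameter `rate` are hypothetical. The CDF shown is one form consistent with the component description above and is not claimed to match the paper's equation (3) term for term.

```python
import math
import random


def gpd_cdf(y, xi, sigma):
    """GPD CDF evaluated at an excess y = x - u over the threshold."""
    if y <= 0.0:
        return 0.0
    if abs(xi) < 1e-12:              # xi -> 0 limit: exponential tail
        return 1.0 - math.exp(-y / sigma)
    z = 1.0 + xi * y / sigma
    if z <= 0.0:                     # past the upper endpoint when xi < 0
        return 1.0
    return 1.0 - z ** (-1.0 / xi)


def bulk_cdf(x, rate, u):
    """Assumed bulk: exponential(rate) truncated to (0, u) and renormalized."""
    return (1.0 - math.exp(-rate * x)) / (1.0 - math.exp(-rate * u))


def mixture_cdf(x, phi1, phi2, rate, u, xi, sigma):
    """Point mass phi1 at zero, bulk on (0, u], GPD tail with mass phi2 above u."""
    if x < 0.0:
        return 0.0
    if x == 0.0:
        return phi1
    if x <= u:
        return phi1 + (1.0 - phi1 - phi2) * bulk_cdf(x, rate, u)
    return (1.0 - phi2) + phi2 * gpd_cdf(x - u, xi, sigma)


def sample_mixture(n, phi1, phi2, rate, u, xi, sigma, seed=0):
    """Draw n values: pick a component, then invert that component's CDF."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        p = rng.random()
        v = rng.random()
        if p < phi1:                                  # inlier at zero
            draws.append(0.0)
        elif p < 1.0 - phi2:                          # truncated-exponential bulk
            draws.append(-math.log(1.0 - v * (1.0 - math.exp(-rate * u))) / rate)
        elif abs(xi) < 1e-12:                         # exponential tail
            draws.append(u - sigma * math.log(1.0 - v))
        else:                                         # GPD tail
            draws.append(u + sigma / xi * ((1.0 - v) ** (-xi) - 1.0))
    return draws


params = dict(phi1=0.2, phi2=0.1, rate=1.0, u=2.0, xi=0.1, sigma=1.0)
data = sample_mixture(20000, **params)
zero_share = sum(1 for x in data if x == 0.0) / len(data)
tail_share = sum(1 for x in data if x > params["u"]) / len(data)
```

In this sketch `zero_share` and `tail_share` are the empirical counterparts of ϕ1 and ϕ2. Treating the tail fraction as a free parameter, rather than deriving it from the bulk, is what keeps the bulk choice from dictating the probability mass above u.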

The cumulative distribution function is given by equation (3) and the density by equation (4), which can be written as a three‑component mixture:
f(x)=ϕ1 δ0(x)+(1−ϕ1−ϕ2) f1(x)+ϕ2 f2(x),
where f1 is the normalized bulk density on (0,u) and f2 is the GPD density on

