Power of earthquake cluster detection tests
Testing the global earthquake catalogue for indications of non-Poissonian attributes has been an area of intense research, especially since the 2011 Tohoku earthquake. The usual approach is to test statistically the hypothesis that the global earthquake catalogue is well explained by a Poisson process. In this paper we analyse one aspect of this problem that has been largely disregarded in the literature: the power of such tests to detect non-Poissonian features if they existed, that is, the complement of the probability of a type II statistical error. We argue that the low frequency of large events and the brevity of our earthquake catalogues reduce the power of these statistical tests, so that an unequivocal answer to this question is not guaranteed. We demonstrate this by providing a counterexample, a stochastic process that is clustered by construction, and by analysing the resulting distribution of p-values given by the current tests.
💡 Research Summary
The paper addresses a largely overlooked aspect of global earthquake‑frequency analysis: the statistical power of tests that are routinely used to assess whether the catalogue of large earthquakes can be described by a Poisson process. While most previous studies focus on controlling Type I error (the probability of falsely rejecting the Poisson hypothesis), they rarely consider Type II error—the chance of failing to detect genuine non‑Poissonian clustering when it exists. The authors argue that the scarcity of mega‑events (typically M ≥ 8.5) and the relatively short observational window (≈ 100 years) severely limit the power of standard tests, making any conclusion that “the catalogue is Poissonian” potentially misleading.
To illustrate this point, the authors construct a synthetic stochastic process that is deliberately clustered. The baseline is a homogeneous Poisson process with rate λ₀. Superimposed on this are randomly occurring “cluster windows” of fixed duration (e.g., 10 years) during which the event rate jumps to a higher value λ₁ > λ₀. The windows themselves occur according to a Poisson process, so the overall model is a mixture of a constant background and intermittent bursts: a clear deviation from a simple Poisson process, but one that mimics the low event counts and short record length of the real catalogue.
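To make the construction concrete, here is a minimal simulation sketch of such a two-state process. The specific parameter values (λ₀, λ₁, the window rate) are illustrative assumptions chosen so that a 100-year catalogue contains roughly 30–40 events; they are not taken from the paper. The generator uses the standard thinning recipe for a piecewise-constant rate.

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_clustered_catalogue(T=100.0, lam0=0.25, lam1=1.25,
                                 window_rate=0.01, window_len=10.0):
    """Simulate one synthetic catalogue of event times on [0, T] years.

    The background rate is lam0; inside randomly placed cluster windows
    (whose start times form a Poisson process with rate `window_rate`,
    each window lasting `window_len` years) the rate jumps to lam1 > lam0.
    Parameter values are illustrative assumptions, not the paper's settings.
    """
    # Cluster-window start times: homogeneous Poisson process on [0, T].
    n_win = rng.poisson(window_rate * T)
    starts = np.sort(rng.uniform(0.0, T, n_win))

    def rate(t):
        in_window = np.any((starts <= t) & (t < starts + window_len))
        return lam1 if in_window else lam0

    # Thinning: propose events at the maximal rate lam1,
    # keep each proposal with probability rate(t) / lam1.
    n_prop = rng.poisson(lam1 * T)
    proposals = np.sort(rng.uniform(0.0, T, n_prop))
    keep = rng.uniform(size=n_prop) < np.array([rate(t) for t in proposals]) / lam1
    return proposals[keep]

events = simulate_clustered_catalogue()
print(f"{events.size} events in 100 years")
```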
Three widely used statistical tests are then applied to 10 000 simulated catalogues generated under this clustered model, each catalogue having the same length (100 years) and roughly the same number of events (≈ 30–40) as the actual global M ≥ 8.5 record. The tests are:
- Kolmogorov–Smirnov (KS) test on inter‑event times, which checks whether the waiting times follow an exponential distribution.
- Chi‑square (χ²) goodness‑of‑fit test on the number of events per fixed time bin, assessing conformity with a Poisson count distribution.
- Likelihood‑ratio test (LRT) comparing a homogeneous Poisson model against a model with a time‑varying rate (the clustered alternative).
For each test the distribution of p‑values is recorded, and the proportion of simulations that reject the null hypothesis at the conventional 5 % significance level is taken as the empirical power.
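A minimal sketch of this simulate-test-count loop is shown below for the KS variant, reusing the `simulate_clustered_catalogue` function from the earlier sketch; the χ² and likelihood-ratio tests follow the same pattern. Estimating the exponential scale from the data makes this KS test slightly conservative (a Lilliefors correction would be stricter), and the loop uses 2,000 replications rather than the 10,000 reported in the paper, purely to keep the example fast.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def ks_pvalue(events):
    """KS test of inter-event times against an exponential distribution
    whose scale is estimated from the data."""
    waits = np.diff(events)
    return stats.kstest(waits, "expon", args=(0.0, waits.mean())).pvalue

n_sim, alpha = 2000, 0.05       # fewer replications than the paper, for speed
pvals = []
for _ in range(n_sim):
    cat = simulate_clustered_catalogue()   # clustered model from the sketch above
    if cat.size >= 3:                      # need at least two waiting times
        pvals.append(ks_pvalue(cat))

# Empirical power: fraction of clustered catalogues rejected at the 5 % level.
power = np.mean(np.array(pvals) < alpha)
print(f"empirical power of the KS test ≈ {power:.2f}")
```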
The resulting power estimates are strikingly low. Even when λ₁ is set to five times λ₀, the KS test rejects the Poisson hypothesis in only about 12 % of simulations; the χ² test achieves roughly 20 % power; and the LRT, the most sensitive of the three, reaches only about 30 %. In other words, even with the best‑performing test, more than two‑thirds of genuinely clustered catalogues would be mistakenly classified as Poissonian by these standard procedures.
A systematic exploration of the factors influencing power shows that (i) the total number of events is the dominant driver, with power rising sharply once the catalogue contains more than 50 events; (ii) extending the observation window to 200 years roughly doubles the power, but this is impractical given the limited reliability of older historical records; (iii) lowering the magnitude threshold to include smaller events (e.g., M ≥ 7.5) improves power but introduces heterogeneity in detection completeness; and (iv) the strength and duration of the clusters (the λ₁/λ₀ ratio and the cluster length) have a secondary effect compared with sample size.
The authors conclude that the prevailing “Poissonian” view of global large‑earthquake occurrence may be an artifact of insufficient statistical power rather than genuine evidence of randomness. They caution that risk assessments based on a Poisson assumption could underestimate the probability of temporal clustering, especially in the aftermath of a mega‑event. To overcome these limitations, the paper recommends (a) incorporating longer‑term paleoseismic and geological evidence to extend the effective catalogue, (b) pooling data across regions and magnitude ranges while carefully correcting for completeness, (c) employing Bayesian frameworks that allow prior information about clustering, and (d) explicitly testing alternative point‑process models such as Hawkes or ETAS processes that naturally encode triggering.
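As a pointer toward the last recommendation, the sketch below simulates a self-exciting Hawkes process with Ogata's thinning algorithm. The exponential kernel and all parameter values are illustrative assumptions rather than anything specified in the paper; the point is simply that triggering can be encoded directly in the point-process model instead of being tested for after the fact.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_hawkes(mu=0.15, alpha=0.5, beta=1.0, T=100.0):
    """Simulate a self-exciting Hawkes process on [0, T] via Ogata thinning.

    Illustrative conditional intensity (not the paper's model):
        lambda(t) = mu + sum_i alpha * beta * exp(-beta * (t - t_i))
    alpha < 1 keeps the process subcritical (finite expected offspring).
    """
    events, t = [], 0.0
    while True:
        # Between events the intensity only decays, so its value just after
        # time t is a valid upper bound for the thinning envelope.
        lam_bar = mu + sum(alpha * beta * np.exp(-beta * (t - ti)) for ti in events)
        t += rng.exponential(1.0 / lam_bar)
        if t >= T:
            break
        lam_t = mu + sum(alpha * beta * np.exp(-beta * (t - ti)) for ti in events)
        if rng.uniform() < lam_t / lam_bar:   # accept with prob lambda(t) / lam_bar
            events.append(t)
    return np.array(events)

print(f"simulated {simulate_hawkes().size} events in 100 years")
```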
In summary, the study provides a rigorous quantification of the low power of current earthquake‑cluster detection tests, demonstrates through a controlled counter‑example that substantial clustering can remain hidden, and calls for a methodological shift toward power‑aware statistical designs in seismology.