Testing earthquake predictions


Statistical tests of earthquake predictions require a null hypothesis to model occasional chance successes. To define and quantify `chance success' is knotty. Some null hypotheses ascribe chance to the Earth: Seismicity is modeled as random. The null distribution of the number of successful predictions -- or any other test statistic -- is taken to be its distribution when the fixed set of predictions is applied to random seismicity. Such tests tacitly assume that the predictions do not depend on the observed seismicity. Conditioning on the predictions in this way sets a low hurdle for statistical significance. Consider this scheme: When an earthquake of magnitude 5.5 or greater occurs anywhere in the world, predict that an earthquake at least as large will occur within 21 days and within an epicentral distance of 50 km. We apply this rule to the Harvard centroid-moment-tensor (CMT) catalog for 2000--2004 to generate a set of predictions. The null hypothesis is that earthquake times are exchangeable conditional on their magnitudes and locations and on the predictions -- a common ``nonparametric'' assumption in the literature. We generate random seismicity by permuting the times of events in the CMT catalog. We consider an event successfully predicted only if (i) it is predicted and (ii) there is no larger event within 50 km in the previous 21 days. The $P$-value for the observed success rate is $<0.001$: The method successfully predicts about 5% of earthquakes, far better than `chance,' because the predictor exploits the clustering of earthquakes -- occasional foreshocks -- which the null hypothesis lacks. Rather than condition on the predictions and use a stochastic model for seismicity, it is preferable to treat the observed seismicity as fixed, and to compare the success rate of the predictions to the success rate of simple-minded predictions like those just described. If the proffered predictions do no better than a simple scheme, they have little value.


💡 Research Summary

The paper addresses a fundamental methodological flaw that pervades many statistical tests of earthquake predictions. Conventional significance testing typically fixes the set of predictions and then generates a null distribution by applying those predictions to a stochastic model of seismicity—often by randomizing event times, assuming exchangeability conditional on magnitudes and locations. Implicit in this approach is the assumption that the predictions are independent of the observed seismicity. By conditioning on the predictions while randomizing the earthquake catalog, the test sets an artificially low bar for significance because it ignores the strong spatiotemporal clustering that characterizes real seismicity.

To illustrate the problem, the authors construct an extremely simple “baseline” predictor: whenever a magnitude ≥ 5.5 earthquake occurs anywhere on the globe, they forecast that another earthquake of at least the same magnitude will occur within the next 21 days and within a 50 km epicentral radius. Applying this rule to the Harvard centroid‑moment‑tensor (CMT) catalog for 2000–2004 yields a success rate of roughly 5%. A success is defined as (i) the event is indeed predicted by the rule and (ii) no larger event has occurred within 50 km in the preceding 21 days, thereby ensuring that the predicted event is not simply an aftershock of a larger quake.
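As a concrete illustration, here is a minimal sketch of how this rule and success definition might be scored. The tuple-based catalog format, the haversine distance, and the function names are our own assumptions for illustration, not code from the paper:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    r = 6371.0  # mean Earth radius, km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def count_successes(events, window_days=21.0, radius_km=50.0, min_mag=5.5):
    """Count events successfully predicted by the baseline rule.

    `events` is a list of (time_days, lat, lon, mag) tuples, assumed
    sorted by time.  Event j counts as a success if (i) some earlier
    event i with mag >= min_mag and mag_j >= mag_i lies within
    `window_days` and `radius_km` of j, and (ii) no strictly larger
    event occurred within `radius_km` of j in the preceding window.
    """
    successes = 0
    for j, (t_j, lat_j, lon_j, m_j) in enumerate(events):
        predicted, preempted = False, False
        for t_i, lat_i, lon_i, m_i in events[:j]:
            if t_j - t_i > window_days:  # outside the 21-day window
                continue
            if haversine_km(lat_i, lon_i, lat_j, lon_j) > radius_km:
                continue
            if m_i > m_j:                # a larger nearby event preceded j
                preempted = True
                break
            if m_i >= min_mag and m_j >= m_i:
                predicted = True
        if predicted and not preempted:
            successes += 1
    return successes
```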

For the null hypothesis, the authors adopt the common non‑parametric assumption that earthquake times are exchangeable given magnitudes, locations, and the set of predictions. They generate surrogate catalogs by permuting the timestamps of the CMT events while leaving magnitudes and coordinates untouched. Because this permutation destroys the natural clustering of foreshocks and aftershocks, successful predictions are far rarer under the null than in the real catalog. Consequently, the observed 5% success rate yields a p‑value below 0.001, suggesting “highly significant” predictive skill.
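Continuing the sketch above, the permutation null and its Monte Carlo p-value could be approximated as follows; the number of permutations, the seed, and the tie-breaking in the re-sort are illustrative choices, not the paper's:

```python
import random

def permutation_p_value(events, n_perm=999, seed=0):
    """Monte Carlo p-value under the exchangeable-times null.

    Event times are permuted while magnitudes and locations stay
    fixed; the p-value is the fraction of catalogs (counting the
    observed one) whose success count reaches the observed count.
    """
    rng = random.Random(seed)
    observed = count_successes(events)  # from the sketch above
    times = [t for t, _, _, _ in events]
    hits = 1  # the observed catalog counts as one such catalog
    for _ in range(n_perm):
        rng.shuffle(times)
        # Reattach permuted times and re-sort so the sorted-by-time
        # assumption of count_successes still holds.
        permuted = sorted(
            (t, lat, lon, m)
            for t, (_, lat, lon, m) in zip(times, events)
        )
        if count_successes(permuted) >= observed:
            hits += 1
    return hits / (n_perm + 1)
```

For the 2000–2004 CMT catalog, the paper reports that exactly this kind of comparison yields a p-value below 0.001.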

The authors argue that this apparent significance is illusory. The predictor’s performance is not evidence of genuine forecasting ability; rather, it exploits the very clustering that the null model fails to capture. In other words, the null hypothesis is misspecified: it treats earthquake times as exchangeable when, in reality, seismicity exhibits strong temporal dependence (e.g., foreshock and aftershock sequences). By conditioning on the predictions and randomizing the catalog, the test understates the success rate that any method leveraging clustering would achieve by chance.

A more appropriate evaluation, the paper proposes, is to treat the observed seismicity as fixed and to compare the proposed prediction scheme against simple, transparent baselines such as the 21‑day/50‑km rule itself. If a sophisticated method does not outperform this naïve benchmark, its practical value is questionable. This comparative approach respects the dependence between predictions and the observed catalog and avoids the pitfalls of an ill‑chosen null model.
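Continuing the sketch above, this comparison reduces to scoring both prediction schemes against the same fixed catalog; `method_successes` below is a hypothetical count produced by whatever scheme is under test, scored under the same success definition:

```python
def compare_to_baseline(events, method_successes):
    """Compare a proffered scheme to the naive baseline on the
    same fixed catalog (no randomized seismicity).

    `method_successes` is a hypothetical success count for the
    scheme under test, computed with the same success definition.
    """
    baseline_rate = count_successes(events) / len(events)
    method_rate = method_successes / len(events)
    print(f"baseline: {baseline_rate:.1%}  method: {method_rate:.1%}")
    # The scheme has demonstrated value only if it beats the baseline.
    return method_rate > baseline_rate
```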

Beyond the specific example, the paper’s broader contribution is a conceptual clarification: statistical tests of earthquake predictions must account for the fact that predictions are often derived from, and therefore dependent on, the same seismicity they aim to forecast. Ignoring this dependence leads to overly optimistic p‑values and can give a false impression of predictive skill. The authors advocate for baseline‑comparison frameworks, cross‑validation on independent time windows, or simulation models that faithfully reproduce clustering (e.g., ETAS models) when constructing null distributions.

In summary, the study demonstrates that many published tests of earthquake prediction may be misleading because they condition on predictions while randomizing the catalog. By showing that a trivial clustering‑based rule dramatically outperforms the exchangeable‑time null, the authors highlight the necessity of more realistic null hypotheses or, preferably, direct performance comparisons with simple benchmarks. Only through such rigorous evaluation can the true scientific merit of earthquake prediction schemes be properly assessed.

