Gambling scores in earthquake prediction analysis
The number of successes n and the normalized measure of space-time alarm τ are commonly used to characterize the strength of an earthquake prediction method and the significance of prediction results. To better evaluate the forecaster's skill, it has recently been suggested to use a new characteristic, the gambling score R, which incorporates the difficulty of guessing each target event by using different weights for different alarms. We expand the class of R-characteristics and apply these to the analysis of results of the M8 prediction algorithm. We show that the level of significance α strongly depends (1) on the choice of weighting alarm parameters, (2) on the partitioning of the entire alarm volume into component parts, and (3) on the accuracy of the spatial rate of target events, m(dg). These tools are at the disposal of the researcher and can affect the significance estimate in either direction. All the R-statistics discussed here corroborate that the prediction of 8.0 ≤ M < 8.5 events by the M8 method is nontrivial. However, conclusions based on the traditional characteristics (n, τ) are more reliable owing to two circumstances: τ is stable since it is based on relative values of m(.), and the n statistic enables constructing an upper estimate of α that takes into account the uncertainty of m(.).
Research Summary
The paper critically examines the conventional metrics used to assess earthquake prediction methods—namely the number of successful predictions (n) and the normalized space‑time alarm measure (τ)—and proposes a more nuanced statistic called the gambling score (R). While n simply counts how many target events fall within declared alarm regions and τ reflects the proportion of the total space‑time volume that is under alarm, both metrics ignore the varying difficulty of individual alarms. To address this, the authors define R as a weighted sum of successes and failures, where each alarm i receives a weight w(p_i) that is a decreasing function of its estimated probability p_i (or spatial rate m(dg)). In practice they use a family of weight functions w(p)=p^‑β, with β controlling how strongly low‑probability (hard) alarms are emphasized.
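As a rough illustration of the weighting idea, the sketch below computes a score of the form R = Σ w(p_i)(x_i − p_i) with w(p) = p^(−β), where x_i is 1 if alarm i captured a target event and 0 otherwise. This centred form is an assumption made here for illustration, not necessarily the exact definition used in the paper; the function name gambling_score and its arguments are hypothetical.

```python
from typing import Sequence


def gambling_score(p: Sequence[float], hit: Sequence[bool], beta: float = 1.0) -> float:
    """Weighted score R = sum_i w(p_i) * (x_i - p_i), with w(p) = p ** (-beta).

    ASSUMPTION: this centred form is an illustrative choice, not taken from
    the paper. p[i] is the reference probability that alarm i succeeds by
    chance; hit[i] records whether alarm i actually succeeded.
    """
    if len(p) != len(hit):
        raise ValueError("p and hit must have the same length")
    total = 0.0
    for p_i, x_i in zip(p, hit):
        if not 0.0 < p_i < 1.0:
            raise ValueError("each probability must lie strictly between 0 and 1")
        weight = p_i ** (-beta)  # rarer (harder) alarms get larger weights
        total += weight * ((1.0 if x_i else 0.0) - p_i)
    return total


# Illustrative numbers only (not M8 data): one hard alarm that succeeded,
# one easy alarm that failed.
print(gambling_score([0.25, 0.5], [True, False], beta=1.0))
```

Under this assumed form, β = 1 gives a fair bet in which a success on alarm i pays (1 − p_i)/p_i and a failure costs 1, while β = 0 weights every alarm equally, so the score reduces to the number of successes minus its expected value.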
The study systematically investigates three factors that can dramatically alter the statistical significance level α associated with a given R value. The first is the choice of β: larger β values assign higher scores to rare alarms, which reduces α (increases apparent significance) but also inflates variance and makes the result more sensitive to modeling errors. The second is the partitioning of the alarm volume into sub-regions (cells): finer partitioning yields many cells with small p_i, which again inflates weights and can lead to over-optimistic R values, whereas coarser partitioning dilutes the weight differences and reduces discriminative power. The third is the accuracy of the spatial rate of target events, m(dg): this rate is estimated from historical seismicity and geological models, and its uncertainty propagates directly into both τ and R. Because R incorporates w(p) explicitly, any error in m(dg) can cause α to swing widely.
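The dependence of α on β can be explored with a small Monte Carlo experiment. The sketch below assumes a null hypothesis in which each alarm succeeds independently with its reference probability p_i, and it uses an illustrative centred score Σ p_i^(−β)(x_i − p_i); both the null model and the names score and mc_alpha are assumptions made here, not the paper's procedure.

```python
import random
from typing import Sequence


def score(p: Sequence[float], hit: Sequence[bool], beta: float) -> float:
    # Illustrative centred score (an assumption): sum of p_i**(-beta) * (x_i - p_i).
    return sum(p_i ** (-beta) * (float(h) - p_i) for p_i, h in zip(p, hit))


def mc_alpha(p: Sequence[float], hit: Sequence[bool], beta: float,
             n_sim: int = 20000, seed: int = 0) -> float:
    """Estimate alpha = P(R_null >= R_observed) by simulation.

    ASSUMPTION: under the null, alarm i succeeds independently with
    probability p[i]. A fixed seed keeps the estimate reproducible.
    """
    rng = random.Random(seed)
    observed = score(p, hit, beta)
    exceed = 0
    for _ in range(n_sim):
        simulated = [rng.random() < p_i for p_i in p]
        if score(p, simulated, beta) >= observed:
            exceed += 1
    return exceed / n_sim


# Made-up probabilities and outcomes, only to show how alpha moves with beta.
probs = [0.05, 0.2, 0.4, 0.4]
hits = [True, False, True, False]
for beta in (0.0, 0.5, 1.0, 2.0):
    print(beta, mc_alpha(probs, hits, beta))
```

Rerunning the loop with perturbed values of probs gives a feel for how errors in the reference rates feed into α under these assumptions.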
To illustrate these effects, the authors apply the expanded R‑framework to the well‑known M8 algorithm, which aims to predict earthquakes of magnitude 8.0 ≤ M < 8.5 worldwide. Using a range of β values (0.5–2.0), cell sizes (10 km to 200 km), and plausible upper and lower bounds for m(dg) (±30 % around the best‑fit estimate), they compute R and the corresponding α for each configuration. The results show that α can vary from as low as 0.01 (highly significant) to as high as 0.2 (borderline), depending on the chosen parameters. Nevertheless, in every scenario R exceeds the expectation under a random‑alarm null hypothesis, confirming that M8’s predictions are non‑trivial.
Crucially, the authors emphasize that the flexibility inherent in choosing β, cell partitioning, and m(dg) estimates means that a researcher could, intentionally or unintentionally, steer α in either direction. By contrast, τ is relatively stable because it depends on relative values of m(dg) rather than absolute magnitudes, and n can be used to construct a conservative upper bound on α that explicitly accounts for uncertainty in m(dg). Consequently, while the gambling score provides a valuable complementary perspective—especially for highlighting the contribution of difficult, low‑probability alarms—the traditional n‑τ framework remains more robust for drawing reliable conclusions about prediction skill. The paper concludes that both approaches should be employed together: R to capture nuanced performance aspects, and n and τ to anchor the analysis in stable, less parameter‑sensitive statistics.
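One simple way to picture a conservative bound based on n is a binomial tail evaluated at the largest plausible alarm fraction. The sketch below assumes that, under the null, each of N target events falls inside the alarm volume independently with probability τ, and that uncertainty in m(dg) is summarized as an interval [tau_lo, tau_hi]. These modeling choices and the function names are assumptions made for illustration; the paper's own construction may differ.

```python
from math import comb


def binom_tail(n_hits: int, n_events: int, tau: float) -> float:
    """P(X >= n_hits) for X ~ Binomial(n_events, tau)."""
    if not 0.0 <= tau <= 1.0:
        raise ValueError("tau must lie in [0, 1]")
    return sum(
        comb(n_events, k) * tau ** k * (1.0 - tau) ** (n_events - k)
        for k in range(n_hits, n_events + 1)
    )


def alpha_upper(n_hits: int, n_events: int, tau_lo: float, tau_hi: float) -> float:
    """Upper estimate of alpha when tau is only known to lie in [tau_lo, tau_hi].

    The binomial tail is non-decreasing in tau, so the worst case over the
    interval is attained at tau_hi. ASSUMPTION: the interval is a stand-in
    for the uncertainty in m(dg), not something specified by the paper.
    """
    if not 0.0 <= tau_lo <= tau_hi <= 1.0:
        raise ValueError("need 0 <= tau_lo <= tau_hi <= 1")
    return binom_tail(n_hits, n_events, tau_hi)


# Illustrative numbers only: 8 of 10 events predicted, tau between 0.30 and 0.35.
print(alpha_upper(8, 10, 0.30, 0.35))
```

Because the bound depends on the data only through n and on the model only through the largest plausible τ, it has far fewer adjustable choices than a weighted score.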