A note on the ranking of earthquake forecasts
The ranking problem of earthquake forecasts is considered. We formulate simple statistical requirements for a forecast-quality measure R and, on this basis, analyze several R-ranking methods, in particular the pari-mutuel gambling method of Zechar & Zhuang (2014).
💡 Research Summary
The paper addresses the longstanding problem of how to rank competing earthquake‑forecasting models in a statistically sound manner. It begins by establishing four fundamental requirements that any forecast‑quality metric R should satisfy. First, R must faithfully capture the probabilistic agreement between forecasted rates and observed seismicity, ensuring that higher values correspond to better calibrated predictions. Second, R should provide a common scale so that disparate models can be compared directly. Third, the metric must be robust to sample‑size variations, avoiding undue sensitivity when only a few events are available. Fourth, R should be neutral for random or uninformative forecasts, i.e., its expected value under a null model should be zero, preventing systematic over‑ or under‑ranking.
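The fourth requirement (neutrality for uninformative forecasts) can be checked empirically: center a score by its expectation under the forecast itself, then verify that the mean score is near zero when outcomes really are drawn from the forecast probabilities. A minimal sketch, where the per-bin probabilities and the centering convention are illustrative assumptions rather than constructions from the paper:

```python
import math
import random

random.seed(0)

def centered_ll(probs, outcomes):
    """Bernoulli log-likelihood minus its expectation under the forecast
    itself, so a forecast evaluated against its own null model scores
    approximately zero on average (requirement 4, neutrality)."""
    ll = sum(math.log(p) if o else math.log(1 - p)
             for p, o in zip(probs, outcomes))
    expected = sum(p * math.log(p) + (1 - p) * math.log(1 - p)
                   for p in probs)
    return ll - expected

# Hypothetical per-bin event probabilities for a toy forecast.
probs = [0.1, 0.3, 0.5, 0.2, 0.4]

# Monte Carlo check: draw outcomes from the forecast and average the score.
trials = 20000
mean = sum(centered_ll(probs, [1 if random.random() < p else 0 for p in probs])
           for _ in range(trials)) / trials
print(round(mean, 3))  # should be close to zero
```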
Using these criteria, the author reviews three widely used ranking approaches: the log‑likelihood score, the Brier score, and the pari‑mutuel gambling method introduced by Zechar and Zhuang (2014). The log‑likelihood evaluates the full probability distribution but can become unstable for rare, high‑impact earthquakes. The Brier score measures mean squared error between forecast probabilities and binary outcomes, offering intuitive interpretation but lacking sensitivity to low‑probability, high‑consequence events. The pari‑mutuel method treats each model’s forecast as a “bet” against a collective pool; the payoff reflects the model’s relative performance within the pool, automatically normalizing by the total betting volume.
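The three scores can be made concrete on a toy example with binary per-bin outcomes. A minimal sketch, in which the two models, their probabilities, and the specific pari-mutuel payout rule (stake 1 unit per bin, winning side shares the pool in proportion to its stakes) are simplifying assumptions, not the paper's exact definitions:

```python
import math

# Toy setup: 5 spatial bins, binary "did an event occur?" outcomes,
# and two hypothetical models giving per-bin event probabilities.
outcomes = [0, 1, 0, 0, 1]
model_a = [0.1, 0.6, 0.2, 0.1, 0.7]
model_b = [0.3, 0.4, 0.3, 0.2, 0.5]

def log_likelihood(probs, obs):
    """Bernoulli log-likelihood of the observations: higher is better."""
    return sum(math.log(p) if o else math.log(1 - p)
               for p, o in zip(probs, obs))

def brier(probs, obs):
    """Mean squared error between probabilities and outcomes: lower is better."""
    return sum((p - o) ** 2 for p, o in zip(probs, obs)) / len(obs)

def parimutuel_gains(all_probs, obs):
    """Simplified pari-mutuel pool: each player stakes 1 unit per bin,
    split between 'event' (p) and 'no event' (1 - p); the winning side
    shares the whole bin pool in proportion to its stakes.
    Gains are zero-sum across the pool by construction."""
    n = len(all_probs)
    gains = [0.0] * n
    for i, o in enumerate(obs):
        stakes = [p[i] if o else 1 - p[i] for p in all_probs]
        for j, s in enumerate(stakes):
            gains[j] += n * s / sum(stakes) - 1  # bin pool totals n units
    return gains

print(log_likelihood(model_a, outcomes), log_likelihood(model_b, outcomes))
print(brier(model_a, outcomes), brier(model_b, outcomes))
print(parimutuel_gains([model_a, model_b], outcomes))
```

Note how the pari-mutuel gains are normalized only against the other players in the pool, not against any external reference — the property the analysis below turns on.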
The analysis reveals that while the pari‑mutuel scheme satisfies the second and third requirements (common scale and sample‑size robustness), it fails to meet the first and fourth. Specifically, models assigning near‑zero probability to a large earthquake incur severe penalties when such an event occurs, effectively over‑penalizing realistic but low‑probability forecasts. Moreover, the composition of the betting pool heavily influences outcomes; adding or removing models can dramatically reshuffle rankings, undermining fair comparison. These issues stem from the method’s reliance on the pool’s internal distribution rather than an external, objective reference.
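The pool-composition problem can be shown on a tiny constructed example: with the same two forecasts and the same observations, adding a third (weak) model to the pool reverses the head-to-head ordering. The payout rule and all numbers are illustrative assumptions:

```python
def parimutuel_gains(all_probs, outcomes):
    """Simplified pari-mutuel pool: each player stakes 1 unit per bin,
    split between 'event' (p) and 'no event' (1 - p); the winning side
    shares the whole bin pool in proportion to its stakes."""
    n = len(all_probs)
    gains = [0.0] * n
    for i, o in enumerate(outcomes):
        stakes = [p[i] if o else 1 - p[i] for p in all_probs]
        for j, s in enumerate(stakes):
            gains[j] += n * s / sum(stakes) - 1
    return gains

# Two bins: an event occurs in the first, none in the second.
outcomes = [1, 0]
model_a = [0.9, 0.6]     # hypothetical forecasts
model_b = [0.5, 0.1]
model_c = [0.05, 0.05]   # a weak model joining the pool

print(parimutuel_gains([model_a, model_b], outcomes))
# head-to-head, model_b comes out ahead of model_a ...
print(parimutuel_gains([model_a, model_b, model_c], outcomes))
# ... but with model_c in the pool, the a-vs-b ordering reverses
```

The reversal happens because each bin's payoff is normalized by that bin's total stake, so a new player reweights the bins differently and can reshuffle the others' aggregate gains.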
To remedy these shortcomings, the author proposes two modifications. First, decouple the betting pool from the prior probability distribution by fixing the pool’s total weight independently of the models’ forecasts, thereby preventing extreme penalties for rare events. Second, introduce a log‑likelihood‑based normalization that expresses each model’s score relative to the average log‑likelihood across all models, preserving the common‑scale property while ensuring neutrality for random forecasts. The combined “hybrid” metric retains the intuitive competitive aspect of pari‑mutuel betting but aligns with all four statistical requirements.
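One way to read the second modification: express each model's log-likelihood relative to the pool average, so that scores share a common scale and a pool of identical (or equally uninformative) forecasts scores exactly zero. A minimal sketch of that normalization idea, with all details assumed rather than taken from the paper:

```python
import math

def bernoulli_log_likelihood(probs, outcomes):
    """Log-likelihood of binary outcomes under per-bin probabilities."""
    return sum(math.log(p) if o else math.log(1 - p)
               for p, o in zip(probs, outcomes))

def relative_scores(all_probs, outcomes):
    """Score each model by its log-likelihood minus the pool average,
    so scores sum to zero over the pool and identical forecasts tie at 0."""
    lls = [bernoulli_log_likelihood(p, outcomes) for p in all_probs]
    mean_ll = sum(lls) / len(lls)
    return [ll - mean_ll for ll in lls]

# Hypothetical forecasts over 5 bins.
outcomes = [0, 1, 0, 0, 1]
model_a = [0.1, 0.6, 0.2, 0.1, 0.7]
model_b = [0.3, 0.4, 0.3, 0.2, 0.5]
print(relative_scores([model_a, model_b], outcomes))
```

Unlike the pari-mutuel payoff, the underlying quantity here is each model's own likelihood against the data, so the pool only shifts the common baseline rather than reweighting individual bins.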
In conclusion, the paper argues that any ranking metric for earthquake forecasts must balance statistical rigor with practical interpretability. By formalizing the essential properties of R and critically evaluating existing methods, especially the pari‑mutuel approach, the study provides a clear roadmap for developing more reliable, fair, and actionable model rankings—not only for seismic hazard assessment but also for broader natural‑hazard forecasting contexts.