Making and Evaluating Point Forecasts


Typically, point forecasting methods are compared and assessed by means of an error measure or scoring function, such as the absolute error or the squared error. The individual scores are then averaged over forecast cases, to result in a summary measure of the predictive performance, such as the mean absolute error or the (root) mean squared error. I demonstrate that this common practice can lead to grossly misguided inferences, unless the scoring function and the forecasting task are carefully matched. Effective point forecasting requires that the scoring function be specified ex ante, or that the forecaster receives a directive in the form of a statistical functional, such as the mean or a quantile of the predictive distribution. If the scoring function is specified ex ante, the forecaster can issue the optimal point forecast, namely, the Bayes rule. If the forecaster receives a directive in the form of a functional, it is critical that the scoring function be consistent for it, in the sense that the expected score is minimized when following the directive. A functional is elicitable if there exists a scoring function that is strictly consistent for it. Expectations, ratios of expectations and quantiles are elicitable. For example, a scoring function is consistent for the mean functional if and only if it is a Bregman function. It is consistent for a quantile if and only if it is generalized piecewise linear. Similar characterizations apply to ratios of expectations and to expectiles. Weighted scoring functions are consistent for functionals that adapt to the weighting in peculiar ways. Not all functionals are elicitable; for instance, conditional value-at-risk is not, despite its popularity in quantitative finance.


💡 Research Summary

The paper “Making and Evaluating Point Forecasts” challenges the widespread practice of assessing point forecasts solely through generic error measures such as absolute error or squared error, followed by averaging these scores into summary statistics like MAE, MSE, or RMSE. The author argues that unless the scoring function (the loss used to compute the error) is deliberately matched to the forecasting task, the resulting performance rankings can be severely misleading.

The central concept introduced is that of a functional—a statistical summary of the predictive distribution that the forecaster is asked to deliver, such as the mean, a quantile, a ratio of expectations, or an expectile. For a functional to be meaningfully evaluated, the scoring function must be consistent for that functional: the expected score is minimized precisely when the forecaster reports the value of the functional. If a scoring function is strictly consistent, the forecaster’s optimal action (the Bayes rule) is to issue the functional itself.
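To make the consistency requirement concrete, here is a small simulation sketch (illustrative only; the distribution, sample size, and search grid are my own choices, not from the paper): under squared error the expected score is minimized by reporting the mean, while under absolute error it is minimized by reporting the median, so the two scoring functions direct the forecaster to different functionals of the same predictive distribution.

```python
import numpy as np

rng = np.random.default_rng(0)
# A skewed predictive distribution: a lognormal sample stands in for F.
y = rng.lognormal(mean=0.0, sigma=1.0, size=200_000)

def expected_score(score, points):
    """Approximate the expected score of each candidate point forecast."""
    return np.array([score(x, y).mean() for x in points])

candidates = np.linspace(0.1, 4.0, 400)

# Squared error (a Bregman function): minimized at the mean.
se = expected_score(lambda x, y: (y - x) ** 2, candidates)
# Absolute error (piecewise linear): minimized at the median.
ae = expected_score(lambda x, y: np.abs(y - x), candidates)

best_se = candidates[se.argmin()]  # close to y.mean()  (~1.65 here)
best_ae = candidates[ae.argmin()]  # close to np.median(y)  (~1.0 here)
```

For the lognormal, mean and median differ substantially, so ranking forecasters by MAE when they were asked for the mean (or by MSE when they were asked for the median) rewards the wrong report.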

A functional is called elicitable if there exists at least one strictly consistent scoring function for it. The paper provides complete characterizations for several important functionals:

  • Mean – A scoring function is strictly consistent for the mean if and only if it is a Bregman function, of the form φ(y) − φ(ŷ) − φ′(ŷ)(y − ŷ) for a strictly convex φ; squared error is the special case φ(y) = y².
  • Quantile – Generalized piecewise‑linear losses, exemplified by ρτ(y − ŷ) = (τ − 1{y < ŷ})(y − ŷ), are the only strictly consistent scores for the τ‑quantile.
  • Ratio of Expectations – The ratio E[X]/E[Z] (with Z > 0) is elicitable; for example, the expected value of the score (X − ŷZ)²/Z is minimized at ŷ = E[X]/E[Z].
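The quantile and ratio characterizations above can be checked numerically. The following sketch (my own toy example, not code from the paper) minimizes the empirical expected score over a grid of candidate reports and recovers the targeted functional in each case:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# --- Quantile: the pinball (generalized piecewise-linear) score ---
y = rng.exponential(scale=1.0, size=n)
tau = 0.9

def pinball(x_hat, y, tau):
    # rho_tau(y - x_hat) = (tau - 1{y < x_hat}) * (y - x_hat)
    return (tau - (y < x_hat)) * (y - x_hat)

grid_q = np.linspace(0.0, 5.0, 500)
best_q = grid_q[np.argmin([pinball(x, y, tau).mean() for x in grid_q])]
# best_q lands near the empirical 0.9-quantile, about -log(0.1) ~ 2.30

# --- Ratio of expectations E[X]/E[Z], scored by (x - x_hat*z)^2 / z ---
x = rng.gamma(shape=2.0, size=n)       # E[X] = 2
z = rng.uniform(0.5, 1.5, size=n)      # E[Z] = 1, strictly positive
grid_r = np.linspace(1.0, 3.0, 400)
best_r = grid_r[np.argmin([((x - r * z) ** 2 / z).mean() for r in grid_r])]
# best_r lands near x.mean() / z.mean(), i.e. near 2
```

In both cases the grid minimizer sits within one grid step of the sample functional, consistent with the expected score being minimized exactly at the directive.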
