Discussion of “A statistical analysis of multiple temperature proxies: Are reconstructions of surface temperatures over the last 1000 years reliable?” by B.B. McShane and A.J. Wyner [arXiv:1104.4002]
💡 Research Summary
The paper provides a rigorous statistical critique of the widely used proxy‑based reconstructions of surface temperature over the past millennium. The authors begin by assembling a comprehensive database of more than 1,200 proxy records—including tree rings, ice cores, lake sediments, and coral—collected from a global network. Each series is averaged into 30‑year blocks to match the instrumental temperature record for the period 1850‑1998, which serves as the calibration target.
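As a rough illustration of the preprocessing step described above, the sketch below block-averages an annual series into 30-year means over the 1850–1998 calibration window. The arrays and the function name are illustrative stand-ins, not the authors' code.

```python
import numpy as np

def block_average(series: np.ndarray, block: int = 30) -> np.ndarray:
    """Average an annual series into consecutive block-year means,
    dropping any incomplete block at the end."""
    n_blocks = series.size // block
    return series[: n_blocks * block].reshape(n_blocks, block).mean(axis=1)

# Stand-in data: one annual proxy and the instrumental record on 1850-1998.
rng = np.random.default_rng(0)
years = np.arange(1850, 1999)
proxy = rng.standard_normal(years.size)          # placeholder proxy record
instrumental = rng.standard_normal(years.size)   # placeholder temperature record

proxy_blocks = block_average(proxy)              # coarse proxy values
temp_blocks = block_average(instrumental)        # coarse calibration target
```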
Instead of the conventional multiple linear regression coupled with principal component analysis (PCA), the authors adopt a Bayesian hierarchical model. In this framework, each proxy is allowed its own regression coefficient and its own noise variance, reflecting the heterogeneous signal‑to‑noise ratios that are known to exist among different proxy types. Prior distributions are placed on the regression coefficients, and posterior inference is performed via Markov chain Monte Carlo (MCMC) sampling. This approach yields full posterior distributions for the relationship between each proxy and temperature, rather than point estimates that ignore uncertainty.
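The sketch below illustrates the general idea in a deliberately reduced form: each proxy gets its own intercept, slope, and noise variance, with conjugate normal and inverse-gamma priors, sampled by a short Gibbs-style MCMC run. All priors, data shapes, and names are assumptions for illustration, and each proxy is fitted independently here; the hierarchical element described above would additionally pool the per-proxy coefficients through shared hyperpriors, which this sketch omits for brevity.

```python
import numpy as np

rng = np.random.default_rng(1)

def gibbs_proxy_regression(temp, proxy, n_iter=2000, prior_prec=1e-2, a0=2.0, b0=1.0):
    """Gibbs sampler for one proxy: proxy_t = alpha + beta * temp_t + eps_t,
    eps_t ~ N(0, sigma2), with N(0, 1/prior_prec) priors on (alpha, beta)
    and an Inverse-Gamma(a0, b0) prior on sigma2."""
    X = np.column_stack([np.ones_like(temp), temp])   # design matrix [1, T]
    n = temp.size
    coef, sigma2 = np.zeros(2), 1.0
    draws = np.empty((n_iter, 3))                     # columns: alpha, beta, sigma2
    for it in range(n_iter):
        # (1) coefficients given sigma2: conjugate bivariate normal update
        prec = X.T @ X / sigma2 + prior_prec * np.eye(2)
        cov = np.linalg.inv(prec)
        mean = cov @ (X.T @ proxy / sigma2)
        coef = rng.multivariate_normal(mean, cov)
        # (2) sigma2 given coefficients: conjugate inverse-gamma update
        resid = proxy - X @ coef
        sigma2 = 1.0 / rng.gamma(a0 + 0.5 * n, 1.0 / (b0 + 0.5 * resid @ resid))
        draws[it] = (coef[0], coef[1], sigma2)
    return draws

# Fit each proxy separately against stand-in calibration temperatures;
# every proxy keeps its own posterior for slope and noise variance.
temp = rng.standard_normal(50)
proxies = 0.3 * temp[None, :] + rng.standard_normal((5, 50))
posteriors = [gibbs_proxy_regression(temp, p) for p in proxies]
```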
Model validation is carried out through a systematic cross‑validation scheme. Thirty percent of the data are randomly held out, the model is fitted on the remaining 70 %, and predictions are generated for the excluded block. The authors evaluate predictive performance using mean squared error (MSE) and the width of the 95 % predictive intervals. When the model is trained on the full dataset, the reconstructed temperature series shows a high correlation (≈0.8) with the instrumental record and reproduces the recent warming trend. However, in the cross‑validation experiments the MSE rises dramatically and the predictive intervals become so wide that they frequently fail to contain the observed temperatures. This discrepancy signals severe over‑fitting: the model captures not only the climate signal but also the idiosyncratic noise present in the proxies.
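A minimal sketch of this kind of holdout comparison, using ordinary least squares of temperature on a proxy matrix as a deliberately crude stand-in for the full Bayesian model; all sizes, names, and the random split are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

def holdout_mse(fit, predict, X, y, holdout_frac=0.3):
    """Generic holdout check: fit on ~70% of the calibration blocks,
    then score mean squared error on both the training and held-out parts."""
    idx = rng.permutation(y.size)
    n_test = int(holdout_frac * y.size)
    test, train = idx[:n_test], idx[n_test:]
    model = fit(X[train], y[train])
    in_sample = np.mean((predict(model, X[train]) - y[train]) ** 2)
    held_out = np.mean((predict(model, X[test]) - y[test]) ** 2)
    return in_sample, held_out

# Stand-in reconstruction model: least squares of temperature on a proxy
# matrix with more proxies than calibration blocks.
n_blocks, n_proxies = 40, 60
X = rng.standard_normal((n_blocks, n_proxies))       # proxy values per block
y = rng.standard_normal(n_blocks)                    # calibration temperatures
ols_fit = lambda X, y: np.linalg.lstsq(X, y, rcond=None)[0]
ols_predict = lambda coef, X: X @ coef

in_mse, out_mse = holdout_mse(ols_fit, ols_predict, X, y)
print(f"in-sample MSE: {in_mse:.3f}")                # near zero: noise is memorized
print(f"held-out MSE:  {out_mse:.3f}")               # much larger: little real skill
```

The same split can be scored by the width and coverage of 95 % predictive intervals rather than MSE; the pattern the paragraph describes is the gap between the in-sample and held-out numbers.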
A second, more subtle problem identified by the authors is what they term the “snowball effect.” When a small set of proxies is first entered into the model, subsequent proxies tend to be highly correlated with the already‑included ones, causing the model to be driven more by the internal correlation structure of the proxy network than by the external temperature signal. This phenomenon implies that the choice of proxies can heavily influence the reconstruction, raising concerns about the robustness of published millennium‑scale temperature histories.
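The concern can be illustrated with synthetic data in which the proxies share a strong common factor but carry only a weak temperature signal. This is an illustrative construction, not the authors' experiment, and all coefficients are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)
n_blocks, n_proxies = 40, 60

# Pseudo-proxies with a shared non-climatic factor and a weak real signal.
temperature = rng.standard_normal(n_blocks)
common = rng.standard_normal(n_blocks)
proxies = (0.1 * temperature[:, None]            # weak temperature signal
           + 0.8 * common[:, None]               # strong shared factor
           + 0.6 * rng.standard_normal((n_blocks, n_proxies)))

# Internal proxy-proxy correlation dominates the proxy-temperature correlation.
corr = np.corrcoef(proxies, rowvar=False)
internal = np.abs(corr[np.triu_indices(n_proxies, k=1)]).mean()
external = np.abs([np.corrcoef(proxies[:, j], temperature)[0, 1]
                   for j in range(n_proxies)]).mean()
print(f"mean |proxy-proxy| correlation:       {internal:.2f}")
print(f"mean |proxy-temperature| correlation: {external:.2f}")
```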
The authors explore the sensitivity of their results to different prior specifications. Even when they adopt weakly informative, diffuse priors or more informative priors based on expert knowledge, the posterior estimates and predictive uncertainties remain largely unchanged. This stability indicates that the data themselves do not contain enough independent information to overcome the inherent noise in the proxy network.
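A prior-sensitivity check of this kind can be sketched with a conjugate linear-regression posterior refitted under increasingly diffuse priors; the stand-in data, zero-mean priors, and known noise variance are all simplifying assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 50
temp = rng.standard_normal(n)
proxy = 0.3 * temp + rng.standard_normal(n)        # stand-in calibration data
X = np.column_stack([np.ones(n), temp])

def posterior(prior_sd, noise_var=1.0):
    """Closed-form posterior mean and sd of (alpha, beta) under a
    N(0, prior_sd^2) prior and known noise variance (conjugate case)."""
    prec = X.T @ X / noise_var + np.eye(2) / prior_sd**2
    cov = np.linalg.inv(prec)
    mean = cov @ (X.T @ proxy / noise_var)
    return mean, np.sqrt(np.diag(cov))

# Rerun the same fit under increasingly diffuse priors and compare summaries.
for prior_sd in (0.1, 1.0, 10.0, 100.0):
    mean, sd = posterior(prior_sd)
    print(f"prior sd {prior_sd:>6}: beta = {mean[1]:.3f} +/- {sd[1]:.3f}")
```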
To quantify the overall uncertainty, the authors compute 95 % credible intervals for the reconstructed temperature over the entire 1000‑year period. Strikingly, these intervals are broad enough that the recent observed warming of roughly 0.8 °C may lie outside the reconstructed range. In other words, the statistical evidence that the past millennium’s temperature variation can explain the modern warming is weak.
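Given posterior draws of the reconstructed series, pointwise credible intervals are simply percentiles across draws. The sketch below assumes a matrix of stand-in draws rather than the authors' actual output, and the 0.8 °C value is taken from the summary above only for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)

# Stand-in posterior draws of a reconstructed temperature path:
# one row per MCMC draw, one column per reconstructed 30-year block.
n_draws, n_blocks = 4000, 33
draws = rng.standard_normal((n_draws, n_blocks)) * 0.5

# Pointwise 95% credible interval for each block.
lower, upper = np.percentile(draws, [2.5, 97.5], axis=0)
print(f"typical interval width: {(upper - lower).mean():.2f} deg C")

# Check whether a given anomaly (e.g. ~0.8 deg C) falls inside each interval.
inside = (0.8 >= lower) & (0.8 <= upper)
print(f"blocks whose interval contains 0.8 deg C: {inside.sum()} of {n_blocks}")
```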
In the discussion, the authors argue that the prevailing reconstruction methodology—relying on linear regression, PCA, and a small number of “dominant” components—tends to be overly optimistic about the skill of proxy‑based reconstructions. They recommend several avenues for future work: (1) stricter, physically motivated criteria for proxy selection, (2) the incorporation of non‑linear and non‑stationary models such as state‑space or stochastic volatility frameworks, and (3) the use of independent validation data, for example high‑resolution climate model simulations, to test reconstruction skill out‑of‑sample. Moreover, they emphasize the need for a deeper mechanistic understanding of each proxy’s response to temperature, which would allow researchers to assign more realistic signal‑to‑noise ratios before statistical modeling.
Overall, the paper provides a compelling statistical argument that current millennium‑scale temperature reconstructions are far less certain than often portrayed. By highlighting over‑fitting, proxy interdependence, and the limited information content of the proxy network, the authors call for a reassessment of the confidence placed in long‑term climate reconstructions and for methodological advances that more faithfully represent uncertainty.