Generative deep learning improves reconstruction of global historical climate records

📝 Original Info

  • Title: Generative deep learning improves reconstruction of global historical climate records
  • ArXiv ID: 2602.16515
  • Date: 2026-02-18
  • Authors: Not stated in the source (to be confirmed from the paper itself or its journal page).

📝 Abstract

Accurate assessment of anthropogenic climate change relies on historical instrumental data, yet observations from the early 20th century are sparse, fragmented, and uncertain. Conventional reconstructions rely on disparate statistical interpolation, which excessively smooths local features and creates unphysical artifacts, leading to systematic underestimation of intrinsic variability and extremes. Here, we present a unified, probabilistic generative deep learning framework that overcomes these limitations and reveals previously unresolved historical climate variability back to 1850. Leveraging a learned generative prior of Earth system dynamics, our model performs probabilistic inference to recover spatiotemporally consistent historical temperature and precipitation fields from sparse observations. Our approach preserves the higher-order statistics of climate dynamics, transforming reconstruction into a robust uncertainty-aware assessment. We demonstrate that our reconstruction overcomes pronounced biases in widely used historical reference products, including those underlying IPCC assessments, especially regarding extreme weather events. Notably, we uncover higher early 20th-century global warming levels compared to existing reconstructions, primarily driven by more pronounced polar warming, with mean Arctic warming trends exceeding established benchmarks by 0.15–0.29°C per decade for 1900–1980. Conversely, for the modern era, our reconstruction indicates that the broad Arctic warming trend is likely overestimated in recent assessments, yet explicitly resolves previously unrecognized intense, localized hotspots in the Barents Sea and Northeastern Greenland. Furthermore, based on our seamless global reconstruction that recovers precipitation variability across the oceans and under-monitored regions, we uncover an intensification of the global hydrological cycle.

📄 Full Content

Understanding historical climate change is crucial for constraining the climate system's current state and predicting its future trajectory [1,8,9]. However, our view of historical climate change remains incomplete. Instrumental measurements of essential variables, such as temperature or precipitation, only extend back to the early 19th century at best, and are restricted to a highly limited number of observation sites [4,10]. Even with the introduction of satellite observations in the 1970s, many regions remain poorly observed [11,12]. Accurate knowledge of the evolution of temperature and precipitation is fundamental to understanding Earth's energy and water cycles [13,14] as well as past changes of climate variability and extreme events, yet observational gaps obscure the pace of long-term warming, the dynamics of climate variability, and the occurrence of past extremes [15]. Reconstructing climate fields is therefore indispensable, turning sparse data into spatiotemporally consistent records that underpin assessments of variability, change, and budget analyses [16,17].

A variety of techniques have been developed to reconstruct climate fields. Statistical approaches, such as Gaussian processes, Kriging, and angular-distance weighting (ADW), exploit spatial covariance to interpolate across missing regions, and have been shown to be effective in producing long-term pixel-wise datasets [18]. These methods are used to create global gridded benchmark datasets such as HadCRUT5 [4] and Berkeley Earth [19], which form the basis for the United Nations Intergovernmental Panel on Climate Change (IPCC) Sixth Assessment Report (AR6) [9]. However, it is well recognized that these interpolation methods inherently prioritize minimizing error variance, often at the cost of suppressing high-frequency variability and spatial heterogeneity [1,20,21,22]. By smoothing over data-sparse regions, such approaches tend to dampen the magnitude of local extremes and smear distinct climatic gradients [15,19]. Consequently, while robust for diagnosing large-scale mean states, such conventional interpolations inevitably compress the dynamic range of historical climate, placing a conservative lower bound on the climate system’s intrinsic variability.
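The range-compressing behaviour of distance-weighted interpolation can be illustrated with a toy one-dimensional example. The sketch below is not the procedure used by HadCRUT5 or Berkeley Earth: it uses plain inverse-distance weighting as a simplified stand-in for ADW, and the synthetic field, the number of stations, and the function name `idw_interpolate` are all assumptions made for illustration.

```python
import numpy as np

def idw_interpolate(x_obs, y_obs, x_grid, power=2.0, eps=1e-12):
    """Inverse-distance-weighted estimate at each grid point.

    Hypothetical helper: a simplified stand-in for distance-weighted
    interpolation schemes such as ADW, not any product's actual method.
    """
    dist = np.abs(x_grid[:, None] - x_obs[None, :])   # (n_grid, n_obs)
    weights = 1.0 / (dist ** power + eps)             # closer stations weigh more
    weights /= weights.sum(axis=1, keepdims=True)     # each row sums to 1
    return weights @ y_obs

rng = np.random.default_rng(0)
x_grid = np.linspace(0.0, 1.0, 200)

# Synthetic "true" field (assumption): large-scale signal plus small-scale variability
truth = np.sin(2 * np.pi * x_grid) + 0.5 * np.sin(2 * np.pi * 15 * x_grid)

# Sparse "stations": 12 randomly chosen grid points
station_idx = rng.choice(x_grid.size, size=12, replace=False)
recon = idw_interpolate(x_grid[station_idx], truth[station_idx], x_grid)

print("range of truth:          ", truth.max() - truth.min())
print("range of reconstruction: ", recon.max() - recon.min())
print("variance of truth:         ", truth.var())
print("variance of reconstruction:", recon.var())
```

Because every interpolated value is a weighted average of the observed values, the reconstruction can never exceed the largest station value or fall below the smallest one, so any extreme located between stations is lost by construction.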

More recently, deterministic machine-learning approaches have offered promising alternatives. Convolutional inpainting models, including partial convolutional networks [6] and Fourier-domain architectures [7], can capture finer spatial patterns. However, those methods largely remain confined to coarse-resolution (e.g., 5°) temperature fields and formulate reconstruction as a static spatial inpainting problem, thereby neglecting the evolving dynamics of the climate system through time. Such temporal consistency is particularly important for representing dynamical properties such as autocorrelation and variance, which underpin analyses of stability changes of potential tipping systems [23][24][25][26][27][28].
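To make the role of temporal consistency concrete, the following sketch computes the two statistics named above, lag-1 autocorrelation and variance, for a synthetic time series and for a version of it in which every time step carries its own independent error. The AR(1) process, the noise level, and the helper names are assumptions chosen for illustration; the independent errors are only a crude proxy for a reconstruction that treats each time step in isolation.

```python
import numpy as np

def lag1_autocorrelation(x):
    """Pearson correlation between a series and itself shifted by one step."""
    x = np.asarray(x, dtype=float)
    a = x[:-1] - x[:-1].mean()
    b = x[1:] - x[1:].mean()
    return float((a * b).sum() / np.sqrt((a * a).sum() * (b * b).sum()))

def simulate_ar1(phi, n, rng):
    """AR(1) process x[t] = phi * x[t-1] + unit-variance white noise."""
    x = np.zeros(n)
    noise = rng.standard_normal(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + noise[t]
    return x

rng = np.random.default_rng(1)
series = simulate_ar1(phi=0.8, n=5000, rng=rng)

# Hypothetical frame-by-frame reconstruction: independent error at every step
framewise = series + 1.5 * rng.standard_normal(series.size)

ac_true = lag1_autocorrelation(series)
ac_frame = lag1_autocorrelation(framewise)
print("lag-1 autocorrelation, original: ", ac_true)
print("lag-1 autocorrelation, framewise:", ac_frame)
print("variance, original: ", series.var())
print("variance, framewise:", framewise.var())
```

In this toy setting the temporally uncorrelated errors lower the estimated autocorrelation and inflate the variance, which are exactly the quantities that stability analyses depend on.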

Generative machine learning offers a fundamental departure from these deterministic approaches. Unlike standard deterministic inpainting models, which must be trained on specific input-mask pairs and often struggle to generalize to the irregular sparsity of historical records, generative models learn to approximate the complex high-dimensional probability distribution underlying the physical system in an unsupervised manner [29,30]. Among these models, probabilistic diffusion models (DMs) have emerged as the method of choice, overcoming the training instabilities and mode collapse that are characteristic of generative adversarial networks (GANs) [31][32][33]. By integrating spatiotemporal neural architectures, DMs can approximate the joint high-dimensional distribution of evolving dynamical systems in space and time, effectively learning the unconditional prior distribution of the climate states [34][35][36][37]. This capacity enables DMs to reconstruct the temporal continuity of climate fields, explicitly capturing dependencies [38] that are often neglected by the static spatial interpolation used in current methods [6,7]. Crucially, the score-based formulation [39] facilitates controllable conditional generation, allowing DMs to flexibly integrate sparse observations as constraints to ensure that reconstructions are spatiotemporally consistent, physically plausible, and faithful to historical records [36,38,40].
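The idea of steering a score-based sampler with sparse observations can be sketched on a toy problem where everything is analytic. The example below is not the paper's model: it replaces the learned diffusion prior with a small Gaussian prior whose score is known in closed form, and it uses plain unadjusted Langevin dynamics rather than a reverse diffusion process. The field size, covariance, observed indices, noise level, and step size are all assumptions. The sampler adds the prior score and the observation (likelihood) score, and the result is checked against the exact Gaussian posterior mean.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8                                            # size of the toy "field"
grid = np.arange(d)

# Assumed Gaussian prior with exponentially decaying spatial covariance
prior_cov = np.exp(-np.abs(grid[:, None] - grid[None, :]) / 2.0)
prior_prec = np.linalg.inv(prior_cov)

# Sparse observations: only two of the eight locations are measured
obs_idx = np.array([1, 5])
H = np.zeros((obs_idx.size, d))
H[np.arange(obs_idx.size), obs_idx] = 1.0
sigma = 0.3                                      # observation noise std (assumption)
x_true = rng.multivariate_normal(np.zeros(d), prior_cov)
y = H @ x_true + sigma * rng.standard_normal(obs_idx.size)

def prior_score(x):
    """Gradient of the log prior density; x has shape (n_chains, d)."""
    return -x @ prior_prec

def likelihood_score(x):
    """Gradient of log p(y | x) for Gaussian observation noise."""
    return (y - x @ H.T) @ H / sigma**2

# Unadjusted Langevin dynamics on the posterior score (prior + likelihood)
step, n_steps, n_chains = 0.01, 3000, 2000
x = rng.standard_normal((n_chains, d))
for _ in range(n_steps):
    score = prior_score(x) + likelihood_score(x)
    x = x + step * score + np.sqrt(2.0 * step) * rng.standard_normal(x.shape)
samples = x

# Exact posterior mean of the linear-Gaussian problem, for comparison
post_prec = prior_prec + H.T @ H / sigma**2
post_mean = np.linalg.solve(post_prec, H.T @ y / sigma**2)

print("sampled mean:", np.round(samples.mean(axis=0), 2))
print("exact mean:  ", np.round(post_mean, 2))
print("ensemble std:", np.round(samples.std(axis=0), 2))
```

The ensemble spread is smallest at the observed locations and grows away from them, which is the sense in which such a reconstruction is uncertainty-aware. In a diffusion model the analytic `prior_score` would be replaced by a trained network evaluated at each noise level.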

Here, we present a generative, DM-based probabilistic framework to address the structural limitations of primary gridded observational products of global temperature and precipitation (Supplementary Fig. S1; see Methods and Supplementary Notes for details), as used in the IPCC’s assessment reports. In this framework, we unconditionally pre-train a DM to encode the joint spatiotemporal distribution of Earth system dynamics, derived from historical climate model simulations and reanalysis data. Subsequently, the model employs sparse observations as dynamical constraints t

Reference

This content is AI-processed based on open access ArXiv data.
