PGDM: Physically guided diffusion model for land surface temperature downscaling

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv paper.

Land surface temperature (LST) is a fundamental parameter in thermal infrared remote sensing, while current LST products are often constrained by the trade-off between spatial and temporal resolutions. To mitigate this limitation, numerous studies have been conducted to enhance the resolutions of LST data, with a particular emphasis on the spatial dimension (commonly known as LST downscaling). Nevertheless, a comprehensive benchmark dataset tailored for this task remains scarce. In addition, existing downscaling models face challenges related to accuracy, practical usability, and the capability to self-evaluate their uncertainties. To overcome these challenges, this study first compiled three representative datasets, including one dataset over mainland China containing 22,909 image patches for model training and evaluation, as well as two datasets covering 40 heterogeneous regions worldwide for external evaluation. Subsequently, grounded in the surface energy balance (SEB)-based geophysical reasoning, we proposed the physically guided diffusion model (PGDM) for LST downscaling. In this framework, the downscaling task was formulated as an inference problem, aiming to sample from the posterior distribution of high-spatial-resolution (HR) LST conditioned on low-spatial-resolution (LR) LST observations and a suite of HR geophysical priors. Comprehensive evaluations demonstrate the effectiveness of PGDM, which generates high-quality downscaling results and outperforms existing representative interpolation, kernel-driven, hybrid, and deep learning approaches. Finally, by exploiting the inherent stochasticity of PGDM, the scene-level standard deviation of multiple generations was computed, revealing a strong positive linear correlation with the actual downscaling error…


💡 Research Summary

Land surface temperature (LST) is a key variable in thermal infrared remote sensing, yet operational LST products are limited by a trade-off between spatial and temporal resolution. The field lacks a standardized benchmark dataset. Existing downscaling approaches, ranging from simple interpolation to sophisticated deep-learning models, incorporate physical constraints insufficiently and cannot quantify their own uncertainties. To address these gaps, the authors first assembled three datasets. The primary dataset covers mainland China and contains 22,909 image patches. Each patch comprises low-resolution (LR) LST (≈1 km), high-resolution (HR) LST (≈250 m), and a suite of HR geophysical variables (surface reflectance, soil moisture, NDVI) chosen on the basis of surface energy balance (SEB) reasoning. Two additional datasets span 40 heterogeneous regions worldwide (urban, desert, agricultural, mountainous) and are used exclusively for external evaluation.
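The summary does not state how LR and HR patches are paired. If LR patches are simulated from HR ones, one common choice for the ≈1 km / ≈250 m pair (a factor of 4) is to average emitted radiance rather than temperature directly. Under the Stefan-Boltzmann law, emitted radiance scales with T^4. The following is a minimal sketch under that assumption, with uniform emissivity. `aggregate_lst` is a hypothetical helper, the 256 x 256 patch size is illustrative only, and the paper may instead pair real LR and HR products.

```python
import numpy as np


def aggregate_lst(hr_lst: np.ndarray, factor: int = 4) -> np.ndarray:
    """Aggregate an HR LST patch (kelvin) to LR resolution.

    Hypothetical helper: averages T**4 (proportional to emitted radiance under
    the Stefan-Boltzmann law, assuming uniform emissivity) over non-overlapping
    factor x factor blocks, then converts back to temperature.
    """
    h, w = hr_lst.shape
    if h % factor or w % factor:
        raise ValueError("patch size must be divisible by the aggregation factor")
    radiance = hr_lst.astype(np.float64) ** 4
    # Reshape so each factor x factor block occupies axes 1 and 3.
    blocks = radiance.reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3)) ** 0.25


# Usage: an illustrative 256 x 256 HR patch (~250 m) becomes a 64 x 64 LR patch (~1 km).
rng = np.random.default_rng(0)
hr_patch = 300.0 + 5.0 * rng.standard_normal((256, 256))
lr_patch = aggregate_lst(hr_patch, factor=4)
```

Because T^4 is convex, averaging over a heterogeneous block gives a slightly warmer LR value than a plain temperature mean. The gap grows with sub-pixel temperature contrast.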

The core contribution is the Physically Guided Diffusion Model (PGDM), which reframes LST downscaling as conditional posterior sampling. PGDM builds on denoising diffusion probabilistic models (DDPM) but augments them with SEB-based physical guidance. In the forward diffusion process, Gaussian noise is gradually added to the HR LST until the field approaches a standard Gaussian, which serves as the starting point for sampling. In the reverse denoising process, a UNet-style noise-prediction network is conditioned on three inputs: (i) the observed LR LST, (ii) the HR geophysical priors, and (iii) a learned embedding of the diffusion timestep. Crucially, the loss function combines the standard variational lower-bound (VLB) term with an “energy-balance loss” that penalizes violations of the SEB equation linking temperature, reflectance, moisture, and vegetation. This soft constraint promotes physical plausibility and discourages the network from generating thermodynamically implausible temperature fields.
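The conditioning and training objective described above can be sketched in NumPy. This is an illustrative sketch, not the authors' implementation, and it rests on several assumptions:

- The linear β schedule (1e-4 to 0.02 over 1000 steps) is the common DDPM default.
- Conditioning is done by channel concatenation with a nearest-neighbour-upsampled LR field, which is one typical design.
- The summary does not give the SEB residual, so `seb_residual` is a hypothetical placeholder supplied by the caller, and the weight `lam` is also hypothetical.
- The noise term uses the simplified ε-prediction MSE commonly trained in place of the full VLB.
- The timestep embedding would live inside the UNet and is omitted here.

```python
import numpy as np

T = 1000                                  # diffusion steps
betas = np.linspace(1e-4, 0.02, T)        # assumed linear noise schedule
alpha_bars = np.cumprod(1.0 - betas)      # cumulative signal-retention factors


def q_sample(x0, t, noise):
    """Forward diffusion: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps.

    x0 is an HR LST patch, assumed normalized (e.g., to roughly [-1, 1]).
    """
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * noise


def build_network_input(x_t, lr_lst, hr_priors, factor=4):
    """Stack noisy HR LST, upsampled LR LST and HR priors as (C, H, W) channels."""
    lr_up = np.repeat(np.repeat(lr_lst, factor, axis=0), factor, axis=1)
    return np.concatenate([x_t[None], lr_up[None], hr_priors], axis=0)


def composite_loss(eps_pred, eps_true, seb_residual, lam=0.1):
    """Noise-prediction MSE plus a weighted physics penalty.

    seb_residual is a hypothetical per-pixel energy-balance violation and
    lam a hypothetical weight; neither is specified in the summary.
    """
    denoise = np.mean((eps_pred - eps_true) ** 2)
    physics = np.mean(seb_residual ** 2)
    return denoise + lam * physics


# Usage with random stand-ins for one training example.
rng = np.random.default_rng(0)
x0 = rng.uniform(-1, 1, (64, 64))          # normalized HR LST
lr = rng.uniform(-1, 1, (16, 16))          # normalized LR LST
priors = rng.uniform(-1, 1, (3, 64, 64))   # e.g. reflectance, soil moisture, NDVI
t = int(rng.integers(0, T))
eps = rng.standard_normal(x0.shape)
net_in = build_network_input(q_sample(x0, t, eps), lr, priors)  # shape (5, 64, 64)
```

Channel concatenation keeps the conditioning simple: the network sees the noisy target, the coarse observation and the priors on the same HR grid at every denoising step.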

Training runs for 400 epochs with a batch size of 16 and the Adam optimizer (learning rate 1e-4), together with extensive data augmentation (rotations, flips, radiometric jitter). Inference uses the canonical 1000-step DDPM schedule; a faster 200-step DDIM schedule is also evaluated for operational scenarios.
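The 200-step DDIM option can be illustrated with a deterministic (η = 0) DDIM loop over an evenly spaced subsequence of the 1000 training timesteps. This is a minimal sketch that assumes the same linear β schedule as a standard DDPM. `eps_model(x, t, cond)` is a hypothetical stand-in for the trained conditional UNet, and the paper's actual sampler settings may differ.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)    # assumed linear schedule, as in training
alpha_bars = np.cumprod(1.0 - betas)


def ddim_sample(eps_model, shape, cond, n_steps=200, rng=None):
    """Deterministic DDIM (eta = 0) using n_steps of the T training timesteps."""
    rng = np.random.default_rng() if rng is None else rng
    # Evenly spaced, strictly decreasing timesteps from T - 1 down to 0.
    timesteps = np.linspace(T - 1, 0, n_steps).round().astype(int)
    x = rng.standard_normal(shape)    # start from pure Gaussian noise
    for i, t in enumerate(timesteps):
        eps = eps_model(x, t, cond)
        ab_t = alpha_bars[t]
        # Predict the clean HR field implied by the current noise estimate.
        x0_pred = (x - np.sqrt(1.0 - ab_t) * eps) / np.sqrt(ab_t)
        # abar for the next (less noisy) timestep; 1.0 means "fully denoised".
        ab_prev = alpha_bars[timesteps[i + 1]] if i + 1 < n_steps else 1.0
        x = np.sqrt(ab_prev) * x0_pred + np.sqrt(1.0 - ab_prev) * eps
    return x


# Usage: an "oracle" model that knows the target recovers it exactly,
# which checks the update algebra rather than any learned behaviour.
target = np.linspace(-1.0, 1.0, 16).reshape(4, 4)


def oracle(x, t, cond):
    ab = alpha_bars[t]
    return (x - np.sqrt(ab) * target) / np.sqrt(1.0 - ab)


result = ddim_sample(oracle, target.shape, cond=None, n_steps=200,
                     rng=np.random.default_rng(0))
```

With η = 0 the DDIM trajectory is deterministic given the initial noise. Any diversity between generations therefore comes entirely from the starting Gaussian draw, and each sample costs 200 network evaluations instead of 1000.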

Quantitative evaluation on the Chinese test set shows PGDM achieving PSNR = 38.6 dB, SSIM = 0.962, and RMSE = 0.31 K. Relative to the next-best deep-learning baseline (SR3/ESRGAN), this is a gain of 2.1 dB in PSNR, a gain of 0.018 in SSIM, and a reduction of 0.12 K in RMSE. Visual inspection confirms that PGDM preserves fine-scale temperature gradients over complex terrain and along urban edges, where other methods tend to oversmooth. External evaluation across the 40 global sites yields an average RMSE of 0.42 K and a Pearson correlation of 0.94, indicating strong generalization to unseen climates and land-cover types.
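For reference, the pixel-wise metrics quoted above can be computed as follows. This is a minimal sketch. PSNR for temperature fields depends on the chosen data range, which the summary does not state, so the range is left as an explicit parameter. PSNR values are only comparable between studies that use the same range. SSIM is omitted here; an existing implementation such as scikit-image's `skimage.metrics.structural_similarity` (which also takes a `data_range` argument) is normally used.

```python
import numpy as np


def rmse(pred: np.ndarray, ref: np.ndarray) -> float:
    """Root-mean-square error, in the units of the inputs (kelvin for LST)."""
    return float(np.sqrt(np.mean((pred - ref) ** 2)))


def psnr(pred: np.ndarray, ref: np.ndarray, data_range: float) -> float:
    """Peak signal-to-noise ratio in dB for an explicitly chosen data range."""
    mse = np.mean((pred - ref) ** 2)
    return float(10.0 * np.log10(data_range ** 2 / mse))


def pearson_r(pred: np.ndarray, ref: np.ndarray) -> float:
    """Pearson correlation between flattened predicted and reference fields."""
    return float(np.corrcoef(pred.ravel(), ref.ravel())[0, 1])


# Usage on a toy 2 x 2 scene with a uniform 0.5 K warm bias.
ref = np.array([[300.0, 301.0], [302.0, 303.0]])
pred = ref + 0.5
scores = {
    "rmse_K": rmse(pred, ref),                                   # 0.5
    "psnr_dB": psnr(pred, ref, data_range=ref.max() - ref.min()),
    "r": pearson_r(pred, ref),    # a pure offset leaves r at 1 (up to rounding)
}
```

The toy case shows why several metrics are reported together. A uniform bias raises RMSE and lowers PSNR but leaves the correlation untouched.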

A distinctive advantage of PGDM is its stochastic nature. By generating multiple samples (e.g., 30) for the same scene, the authors compute a scene‑level standard deviation, which exhibits a strong positive linear correlation (R = 0.87) with the true downscaling error. This relationship enables PGDM to self‑assess uncertainty, offering end‑users a quantitative confidence measure that can be incorporated into downstream climate, hydrology, or urban heat‑island analyses.
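The uncertainty analysis can be sketched as follows, under two assumptions. The summary does not define the scene-level statistic precisely, so this sketch takes it to be the per-pixel standard deviation across generations, averaged over all pixels. Error is measured as the RMSE of a single generation against the reference. The demo uses synthetic scenes whose sample spread is known by construction, so it only checks the bookkeeping, not PGDM's actual calibration.

```python
import numpy as np


def scene_std(samples: np.ndarray) -> float:
    """Scene-level spread of an ensemble with shape (n_samples, H, W).

    Assumed definition: per-pixel standard deviation across generations,
    averaged over all pixels of the scene.
    """
    return float(samples.std(axis=0, ddof=1).mean())


def fit_uncertainty_calibration(stds: np.ndarray, errors: np.ndarray):
    """Fit errors ~ slope * stds + intercept and return (slope, intercept, r)."""
    slope, intercept = np.polyfit(stds, errors, deg=1)
    r = float(np.corrcoef(stds, errors)[0, 1])
    return float(slope), float(intercept), r


# Synthetic demo: 10 scenes, 30 generations each, with increasing spread.
rng = np.random.default_rng(0)
stds, errors = [], []
for sigma in np.linspace(0.2, 1.0, 10):
    truth = 300.0 + rng.standard_normal((32, 32))            # reference HR LST (K)
    samples = truth + sigma * rng.standard_normal((30, 32, 32))
    stds.append(scene_std(samples))
    errors.append(np.sqrt(np.mean((samples[0] - truth) ** 2)))  # RMSE of one generation
slope, intercept, r = fit_uncertainty_calibration(np.array(stds), np.array(errors))
```

With real model outputs, the fitted `slope` and `intercept` would map a scene's ensemble spread to an estimated RMSE. The R = 0.87 reported above suggests such a mapping is informative but imperfect.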

The paper acknowledges two primary limitations. First, diffusion‑based sampling is computationally intensive, limiting real‑time applicability; however, the authors note that recent accelerated samplers (DDIM, PNDM) can reduce inference time by an order of magnitude with modest performance loss. Second, the model’s accuracy depends on the quality of the HR geophysical priors; errors in reflectance or soil‑moisture retrievals propagate into the temperature estimates. Future work will explore automated generation of these priors via physics‑based simulators or transfer learning, and will integrate adaptive timestep scheduling to further speed up inference.

In summary, the study introduces a novel, physics‑informed diffusion framework for LST downscaling, supplies the community with a large, publicly available benchmark, demonstrates superior accuracy over a wide range of existing methods, and provides a built‑in mechanism for uncertainty quantification. PGDM thus represents a significant step toward operational, high‑resolution LST products that are both physically consistent and statistically reliable.

