Physics Encoded Spatial and Temporal Generative Adversarial Network for Tropical Cyclone Image Super-resolution
High-resolution satellite imagery is indispensable for tracking the genesis, intensification, and trajectory of tropical cyclones (TCs). However, existing deep learning-based super-resolution (SR) methods often treat satellite image sequences as generic videos, neglecting the underlying atmospheric physical laws governing cloud motion. To address this, we propose a Physics Encoded Spatial and Temporal Generative Adversarial Network (PESTGAN) for TC image super-resolution. Specifically, we design a disentangled generator architecture incorporating a PhyCell module, which approximates the vorticity equation via constrained convolutions and encodes the resulting approximate physical dynamics as implicit latent representations, separating physical dynamics from visual textures. Furthermore, a dual-discriminator framework is introduced, employing a temporal discriminator to enforce motion consistency alongside spatial realism. Experiments on the Digital Typhoon dataset for 4$\times$ upscaling demonstrate that PESTGAN achieves stronger structural fidelity and perceptual quality than existing approaches. While maintaining competitive pixel-wise accuracy, our method excels at reconstructing meteorologically plausible cloud structures with high physical fidelity.
💡 Research Summary
The paper introduces PESTGAN, a novel super‑resolution (SR) framework specifically designed for tropical cyclone (TC) satellite imagery. Traditional deep‑learning SR methods treat image sequences as generic videos, ignoring the governing atmospheric dynamics that shape cloud motion. PESTGAN addresses this gap by embedding physical knowledge directly into the network architecture and by enforcing temporal consistency through a dual‑discriminator scheme.
Core Architecture
The generator, called the Physics‑Encoded Generator (PEG), follows a disentangled design. Low‑resolution (LR) frames are first up‑sampled by nearest‑neighbor interpolation and passed through a shared encoder to obtain compact feature maps. These features are then split into two parallel branches:
- Physical Dynamics Branch (Branch A) – Implements a PhyCell recurrent unit. PhyCell uses large (7×7) convolution kernels that are constrained to behave like differential operators. By regularizing the kernel moments (the L_ker loss), the unit approximates the vorticity equation of a tropical cyclone in latent space, producing a physics‑focused latent state h_phy that captures macro‑scale cloud translation, rotation, and deformation.
- Residual Texture Branch (Branch B) – Utilizes a ConvLSTM with small (3×3) kernels to learn high‑frequency texture details without physical constraints, producing a residual latent state h_res.
The outputs h_phy and h_res are concatenated, fused through a convolutional layer, and decoded by transposed convolutions and residual blocks to generate the final high‑resolution (HR) frame I_SR. This separation ensures that physically implausible texture hallucinations are minimized while preserving realistic fine‑scale structures.
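The moment-based kernel constraint described for Branch A can be sketched as follows. This is an illustrative PyTorch implementation in the spirit of PhyDNet-style PhyCells; the function name `kernel_moment_loss` and the exact normalization are assumptions, not the paper's code. The idea is that a kernel whose moment matrix is a one-hot at position (i, j) acts like the finite-difference operator ∂^(i+j)/∂x^i∂y^j:

```python
import math
import torch

def kernel_moment_loss(kernels, targets):
    """Sketch of the moment-based regularization L_ker (hypothetical form).

    kernels: (N, K, K) constrained convolution kernels
    targets: (N, K, K) moment targets; a one-hot entry at (i, j) asks the
             kernel to approximate the operator d^(i+j) / dx^i dy^j
    """
    n, k, _ = kernels.shape
    # coordinates centered on the kernel: [-(k-1)/2, ..., (k-1)/2]
    coords = torch.arange(k, dtype=kernels.dtype) - (k - 1) / 2
    # powers[i, u] = u^i for each centered coordinate u
    powers = coords[None, :] ** torch.arange(k, dtype=kernels.dtype)[:, None]
    facts = torch.tensor([math.factorial(i) for i in range(k)],
                         dtype=kernels.dtype)
    # moments[n, i, j] = sum_{u,v} w[u,v] * u^i * v^j / (i! * j!)
    moments = torch.einsum('iu,nuv,jv->nij', powers, kernels, powers)
    moments = moments / (facts[:, None] * facts[None, :])
    return ((moments - targets) ** 2).sum()
```

For example, a 3×3 kernel holding a central difference along one axis has a moment matrix that is (to rounding) one-hot at the first-derivative position, so its loss is near zero.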
Dual‑Discriminator System
PESTGAN employs two discriminators:
- Spatial Discriminator (D_S) – A conventional 2‑D CNN that receives the concatenated up‑sampled LR and generated HR images. Spectral normalization is applied to all layers except the final one, and a Feature Matching loss forces the generator to align its intermediate features with those of real HR images, improving perceptual realism.
- Temporal Discriminator (D_T) – Receives a 5‑channel tensor composed of three consecutive frames (real HR at t‑1 and t+1, generated HR at t) and their forward/backward difference maps (Δ_SR_prev, Δ_SR_next). By explicitly feeding motion cues, D_T penalizes discontinuities that violate the continuity equation, thereby suppressing flickering and ensuring physically coherent motion.
Both discriminators are trained with the hinge loss, which stabilizes adversarial learning.
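The hinge loss mentioned here is the standard GAN formulation; a compact reference implementation (applicable to either D_S or D_T):

```python
import torch
import torch.nn.functional as F

def d_hinge_loss(d_real, d_fake):
    """Discriminator hinge loss: push real scores above +1
    and fake scores below -1."""
    return F.relu(1.0 - d_real).mean() + F.relu(1.0 + d_fake).mean()

def g_hinge_loss(d_fake):
    """Generator side: simply maximize the discriminator's
    score on generated samples."""
    return -d_fake.mean()
```

The margin saturates gradients on already well-classified samples, which is what stabilizes adversarial training relative to the unbounded vanilla GAN loss.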
Loss Function
The total objective combines five terms:
- Reconstruction loss (L1) – Pixel‑wise L1 distance for basic structural fidelity.
- Feature Matching loss (L_fea) – Euclidean distance between discriminator feature maps of real and generated samples.
- Adversarial loss (L_adv) – Sum of spatial and temporal hinge losses, encouraging sharp textures and smooth transitions.
- Physics‑encoded loss (L_ker) – Moment‑based regularization of PhyCell kernels to enforce approximation of differential operators derived from the vorticity equation.
- Statistical consistency loss (L_stat) – Matches spatial variance (spectral energy) of generated and real images and penalizes high variance in temporal differences, reducing non‑physical flicker.
Hyper‑parameters λ balance the contributions of each term, allowing the model to trade off pixel accuracy, perceptual quality, and meteorological validity.
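How the five terms combine can be sketched as below. The statistical consistency term is written out since it is the least standard; its exact formulation and all λ values here are assumptions for illustration, not the paper's settings:

```python
import torch

def statistical_consistency_loss(sr_seq, hr_seq):
    """Sketch of L_stat (hypothetical form): match per-frame spatial
    variance between generated and real sequences, and penalize
    variance in the generated sequence's temporal differences to
    suppress non-physical flicker.

    sr_seq, hr_seq: (B, T, H, W) image sequences
    """
    var_match = (sr_seq.var(dim=(-2, -1)) - hr_seq.var(dim=(-2, -1))).abs().mean()
    flicker = (sr_seq[:, 1:] - sr_seq[:, :-1]).var(dim=(-2, -1)).mean()
    return var_match + flicker

def total_generator_loss(l_rec, l_fea, l_adv, l_ker, l_stat,
                         lambdas=(1.0, 10.0, 0.1, 1.0, 1.0)):
    """Weighted sum of the five loss terms; the lambda defaults are
    placeholders, not the paper's hyper-parameters."""
    terms = (l_rec, l_fea, l_adv, l_ker, l_stat)
    return sum(lam * term for lam, term in zip(lambdas, terms))
```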
Experiments
The authors train and evaluate PESTGAN on the Digital Typhoon dataset, which provides multi‑spectral satellite imagery of tropical cyclones at native resolutions of 0.5–2 km. They perform a 4× up‑scaling task (e.g., 64 × 64 → 256 × 256). Baselines include SRGAN, ESRGAN, EDSR, and video‑SR models such as DUF. Evaluation metrics cover standard SR measures (PSNR, SSIM, LPIPS) and physics‑oriented metrics (vorticity preservation, divergence continuity).
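The physics-oriented metrics are only named above, not defined; one plausible instantiation of "vorticity preservation" compares finite-difference vorticity of motion fields estimated from the SR and HR sequences (e.g., via optical flow). The functions below are a hypothetical sketch under that assumption, not the paper's metric code:

```python
import numpy as np

def vorticity(u, v):
    """Finite-difference relative vorticity zeta = dv/dx - du/dy.
    u, v: (H, W) velocity components, e.g. optical flow between
    consecutive frames."""
    dvdx = np.gradient(v, axis=1)
    dudy = np.gradient(u, axis=0)
    return dvdx - dudy

def vorticity_preservation_error(flow_sr, flow_hr):
    """Mean absolute vorticity difference between SR- and HR-derived
    flow fields (lower is better). flow_*: tuples (u, v) of (H, W)
    arrays."""
    return np.abs(vorticity(*flow_sr) - vorticity(*flow_hr)).mean()
```

As a sanity check, a solid-body rotation field u = -y, v = x has constant vorticity 2, so a model that reproduces the rotation exactly scores zero error.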
Results show that PESTGAN attains comparable or slightly higher PSNR/SSIM than the baselines while achieving substantially lower LPIPS, indicating superior perceptual fidelity. More importantly, the physics‑aware metrics demonstrate that the generated cloud structures retain realistic vorticity patterns and exhibit smooth, divergence‑consistent motion. Visual inspection confirms that eye‑walls, spiral rainbands, and convective cores are reconstructed with physically plausible shapes, reducing artifacts such as discontinuous cloud flow or impossible deformations that plague purely data‑driven models.
Limitations and Future Work
PESTGAN’s PhyCell operates in a 2‑D latent space, so vertical atmospheric structure (e.g., vertical wind shear, moisture stratification) is not explicitly modeled. The kernel‑moment regularization captures linear differential operators well but may struggle with highly nonlinear processes such as deep convection or precipitation loading. The dual‑branch architecture also adds computational overhead, making inference roughly 1.5× slower than standard SRGAN. Future directions include extending the physics encoder to 3‑D atmospheric fields, designing richer physics losses for nonlinear terms, and optimizing the model for real‑time operational deployment.
Conclusion
By integrating a physics‑encoded recurrent cell and a temporal discriminator that explicitly evaluates motion consistency, PESTGAN establishes a new paradigm for satellite image super‑resolution: one that respects the governing fluid dynamics of tropical cyclones while delivering high‑quality, perceptually realistic high‑resolution imagery. This approach bridges the gap between data‑driven deep learning and physically grounded modeling, offering a valuable tool for meteorologists and disaster‑response agencies that require both visual detail and scientific fidelity in satellite observations.