VibrantSR: Sub-Meter Canopy Height Models from Sentinel-2 Using Generative Flow Matching

Notice: This research summary and analysis were automatically generated using AI. For full accuracy, please refer to the original arXiv source.

We present VibrantSR (Vibrant Super-Resolution), a generative super-resolution framework for estimating 0.5 m canopy height models (CHMs) from 10 m Sentinel-2 imagery. Unlike approaches based on aerial imagery, which are constrained by infrequent and irregular acquisition schedules, VibrantSR leverages globally available Sentinel-2 seasonal composites, enabling consistent monitoring at a seasonal-to-annual cadence. Evaluated across 22 EPA Level 3 ecoregions in the western United States using spatially disjoint validation splits, VibrantSR achieves a Mean Absolute Error of 4.39 m for canopy heights ≥ 2 m, outperforming the satellite-based Meta (4.83 m), LANDFIRE (5.96 m), and ETH (7.05 m) benchmarks. While the aerial-based VibrantVS (2.71 m MAE) retains an accuracy advantage, VibrantSR enables operational forest monitoring and carbon accounting at continental scales without reliance on costly and temporally infrequent aerial acquisitions.


💡 Research Summary

VibrantSR introduces a generative super‑resolution framework that converts globally available 10 m Sentinel‑2 multispectral imagery into 0.5 m canopy height models (CHMs). The core innovation is a flow‑matching based latent‑space transformation: a frozen Sentinel‑2 auto‑encoder compresses the 12‑band input into a 32 × 32 latent grid, while a separately pretrained CHM auto‑encoder maps high‑resolution lidar‑derived CHMs into a compatible latent space. A trainable flow‑matching network, implemented as a U‑shaped Vision Transformer (U‑ViT) with 16 transformer layers and 16 attention heads, learns to transport the Sentinel‑2 latent distribution to the CHM latent distribution by minimizing a velocity‑matching loss. This probabilistic formulation allows the model to generate realistic fine‑scale canopy structures rather than a single deterministic surface, preserving height variability and edge detail.
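The velocity-matching objective described above can be sketched in a few lines. This is a hedged illustration, not the paper's implementation: a random linear map stands in for the U-ViT velocity network, a straight-line interpolation path between noise and the CHM latent is assumed (standard conditional flow matching), and the latents are flattened with the channel dimension omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
LATENT = 32 * 32                      # flattened 32x32 latent grid (assumed layout)

# Toy stand-in for the trained U-ViT: a random linear map over [x_t, s2, t].
W = rng.standard_normal((2 * LATENT + 1, LATENT)) * 0.01

def velocity_net(x_t, s2_latent, t):
    """Predict a velocity from the current point, the Sentinel-2 conditioning
    latent, and the time t (all concatenated; stand-in for the U-ViT)."""
    return np.concatenate([x_t, s2_latent, t], axis=-1) @ W

def flow_matching_loss(s2_latent, chm_latent):
    """Regress the network onto the constant velocity (x1 - x0) of the
    straight-line path x_t = (1 - t) * x0 + t * x1."""
    x0 = rng.standard_normal(chm_latent.shape)   # noise endpoint (t = 0)
    t = rng.random((chm_latent.shape[0], 1))     # random time in [0, 1]
    x_t = (1 - t) * x0 + t * chm_latent          # point on the path
    target_v = chm_latent - x0                   # the path's constant velocity
    pred_v = velocity_net(x_t, s2_latent, t)
    return float(((pred_v - target_v) ** 2).mean())

loss = flow_matching_loss(rng.standard_normal((4, LATENT)),
                          rng.standard_normal((4, LATENT)))
```

Minimizing this loss over real Sentinel-2/CHM latent pairs is what lets the learned velocity field transport one latent distribution to the other.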

Training used a large dataset covering the western United States: 168,834 Sentinel‑2/CHM tile pairs for training and 66,154 for validation, spanning 22 EPA Level 3 ecoregions. Sentinel‑2 inputs were seasonally aggregated (June–August 2024 median) and resampled to 10 m; each tile covers 480 m × 480 m (48 × 48 pixels) and is paired with a 960 × 960 pixel (0.5 m) CHM derived from USGS 3DEP airborne lidar (DSM − DTM). Tiles with negative heights, water, or implausibly tall values (> 120 m) were excluded or clipped. Spatially disjoint “checkerboard” splits ensured that training and test areas were geographically independent, minimizing spatial autocorrelation bias.
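A spatially disjoint "checkerboard" split can be sketched as below. The cell size (10 × 10 tiles, i.e. 4.8 km) is an illustrative assumption, not the paper's value; the idea is simply that tiles are grouped into coarse blocks whose parity decides the split, so neighbouring tiles never straddle train and test.

```python
TILE_M = 480          # each tile covers 480 m x 480 m (from the summary)
BLOCK_TILES = 10      # assumed: 10x10 tiles per checkerboard cell (4.8 km)

def checkerboard_split(tile_x_m, tile_y_m):
    """Assign a tile to 'train' or 'test' from its origin in metres,
    using the parity of its coarse checkerboard cell."""
    bx = int(tile_x_m // (TILE_M * BLOCK_TILES))
    by = int(tile_y_m // (TILE_M * BLOCK_TILES))
    return "train" if (bx + by) % 2 == 0 else "test"

# Horizontally adjacent checkerboard cells land in different splits:
print(checkerboard_split(0, 0))      # train
print(checkerboard_split(4800, 0))   # test
```

Because entire 4.8 km cells go to one side of the split, spatial autocorrelation between nearby tiles cannot leak information from training into evaluation.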

The model was trained on a single node with eight NVIDIA A100 GPUs, consuming roughly 5,400 A100‑hours. Data augmentation included random horizontal/vertical flips and 90° rotations. During inference, the Sentinel‑2 latent is concatenated with a fixed (seeded) noise vector, and the ODE defined by the flow network is integrated from t = 0 to t = 1 using the dopri5 solver with 100 steps, yielding fully reproducible outputs.
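The inference-time integration can be illustrated with a toy example. In this sketch a fixed-step RK4 integrator stands in for the adaptive dopri5 solver, and the trained network is replaced by the exact conditional velocity of a straight-line path (constant v = x1 − x0), so the flow is known in closed form; the fixed seed makes the noise, and hence the output, reproducible, mirroring the deterministic inference described above.

```python
import numpy as np

rng = np.random.default_rng(42)       # seeded noise -> reproducible output
x1 = np.linspace(0.0, 3.0, 8)         # toy target latent

def integrate(x0, velocity, steps=100):
    """Classic fixed-step RK4 over t in [0, 1] (stand-in for dopri5)."""
    x, h = x0.copy(), 1.0 / steps
    for i in range(steps):
        t = i * h
        k1 = velocity(t, x)
        k2 = velocity(t + h / 2, x + h / 2 * k1)
        k3 = velocity(t + h / 2, x + h / 2 * k2)
        k4 = velocity(t + h, x + h * k3)
        x = x + (h / 6) * (k1 + 2 * k2 + 2 * k3 + k4)
    return x

x0 = rng.standard_normal(8)           # the "noise vector" at t = 0
velocity = lambda t, x: x1 - x0       # constant velocity of the straight path
out = integrate(x0, velocity)         # lands on x1 at t = 1
```

In the real system the lambda is replaced by the trained U-ViT conditioned on the Sentinel-2 latent, and the integrated endpoint is decoded by the CHM auto-encoder into a 0.5 m height map.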

Performance was evaluated with four metrics: Mean Absolute Error (MAE), Mean Error (ME) as a measure of bias, Block‑R² (heights aggregated over 30 m blocks), and Edge Error (EE) based on Sobel edge comparison. For canopy heights ≥ 2 m, VibrantSR achieved an MAE of 4.39 m, outperforming the three satellite‑based benchmarks (Meta, 4.83 m; LANDFIRE, 5.96 m; ETH, 7.05 m), improvements of 9 %, 26 %, and 38 %, respectively. The ME of –2.35 m indicates a modest under‑estimation, while a Block‑R² of 0.62 shows strong explanatory power at the block level. An EE of 0.16 was substantially lower than the benchmarks’ (0.30–0.64), confirming superior preservation of fine structural detail. Compared with the aerial‑based VibrantVS (MAE = 2.71 m), VibrantSR incurs higher error but gains the ability to produce continental‑scale, seasonally updated CHMs without costly aerial acquisitions.
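The four metrics can be sketched as follows. MAE and ME follow their standard definitions; Block-R² is computed here on block-mean heights (30 m blocks = 60 pixels at 0.5 m); and the Sobel-based Edge Error is approximated with `np.gradient` magnitudes. The paper's exact masking and normalisation may differ.

```python
import numpy as np

def mae(pred, ref):
    return float(np.abs(pred - ref).mean())

def mean_error(pred, ref):
    # Negative values indicate systematic under-estimation of height.
    return float((pred - ref).mean())

def block_mean(a, b):
    h, w = a.shape
    a = a[:h - h % b, :w - w % b]                 # crop to a multiple of b
    return a.reshape(h // b, b, w // b, b).mean(axis=(1, 3))

def block_r2(pred, ref, block=60):                # 60 px * 0.5 m = 30 m blocks
    p = block_mean(pred, block).ravel()
    r = block_mean(ref, block).ravel()
    return float(1.0 - ((p - r) ** 2).sum() / ((r - r.mean()) ** 2).sum())

def edge_error(pred, ref):
    # Crude stand-in for the Sobel comparison: mean absolute difference
    # of gradient magnitudes.
    gp = np.hypot(*np.gradient(pred))
    gr = np.hypot(*np.gradient(ref))
    return float(np.abs(gp - gr).mean())

rng = np.random.default_rng(0)
ref = rng.random((120, 120)) * 30                 # toy 0.5 m CHM tile (metres)
pred = ref + rng.normal(0, 1, ref.shape)          # toy noisy prediction
scores = (mae(pred, ref), mean_error(pred, ref),
          block_r2(pred, ref), edge_error(pred, ref))
```

A perfect prediction would give MAE = ME = EE = 0 and Block-R² = 1; the block aggregation rewards getting mean canopy height right at stand scale even when individual 0.5 m pixels differ.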

Key contributions include: (1) the first application of flow‑matching generative models to biophysical field generation, moving beyond image‑only super‑resolution; (2) a staged training strategy that freezes both encoders, isolating the transport learning and improving training stability; (3) extensive spatially independent validation across diverse forest types, demonstrating operational readiness. Limitations include reliance on dense lidar training labels (scarce in many regions), potential loss of extreme‑height information due to clipping, and residual errors that still limit precision for certain management tasks. Future work could explore semi‑supervised learning to reduce label dependence, multi‑temporal Sentinel‑2 composites for change detection, and integration of additional satellite sources (e.g., PlanetScope) to further improve robustness and global applicability.

