Estimating Canopy Height at Scale

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

We propose a framework for global-scale canopy height estimation based on satellite data. Our model leverages advanced data preprocessing techniques, uses a novel loss function designed to counter geolocation inaccuracies inherent in the ground-truth height measurements, and employs data from the Shuttle Radar Topography Mission to effectively filter out erroneous labels in mountainous regions, enhancing the reliability of our predictions in those areas. A comparison between predictions and ground-truth labels yields an MAE / RMSE of 2.43 / 4.73 (meters) overall and 4.45 / 6.72 (meters) for trees taller than five meters, which represents a substantial improvement compared to existing global-scale maps. The resulting height map as well as the underlying framework will facilitate and enhance ecological analyses at a global scale, including, but not limited to, large-scale forest and biomass monitoring.

💡 Research Summary

The paper presents a comprehensive framework for generating a global, high‑resolution (10 m) canopy height map by integrating Sentinel‑1 radar, Sentinel‑2 multispectral imagery, and GEDI LiDAR measurements. Recognizing the limitations of existing national forest inventories and previous global canopy height products—namely coarse resolution, limited accuracy, and sparse ground truth—the authors design a pipeline that addresses data quality, label noise, model architecture, and loss function robustness.

Data preprocessing begins with constructing seasonal composites: Sentinel‑1 images are median‑aggregated over the summer leaf‑on period to mitigate weather‑related backscatter noise, while a cloud‑reduction algorithm removes clouds, shadows, and cirrus from the Sentinel‑2 data. Four Sentinel‑1 bands and ten Sentinel‑2 bands are combined, yielding 14 input channels per 512 × 512 pixel patch. GEDI’s RH100 metric serves as the target label, but because GEDI footprints can be mis‑located and their heights overestimated on steep slopes, the authors discard measurements where the corresponding SRTM slope exceeds 20°. This step removes erroneous high values in mountainous terrain.
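The two preprocessing steps described above, a per‑pixel median composite and a slope‑based label filter, can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the list‑of‑lists image layout, and the `(row, col, rh100_m)` footprint tuples are assumptions made for illustration; only the 20° threshold comes from the summary.

```python
from statistics import median

SLOPE_LIMIT_DEG = 20.0  # threshold quoted in the summary


def median_composite(acquisitions):
    """Per-pixel median over equally sized 2-D images (lists of lists)."""
    rows, cols = len(acquisitions[0]), len(acquisitions[0][0])
    return [
        [median(img[r][c] for img in acquisitions) for c in range(cols)]
        for r in range(rows)
    ]


def filter_by_slope(footprints, slope_deg, limit=SLOPE_LIMIT_DEG):
    """Keep footprints whose terrain slope does not exceed `limit`.

    footprints: list of (row, col, rh100_m) tuples (hypothetical layout).
    slope_deg:  2-D grid of slope in degrees, assumed to be derived from SRTM.
    """
    return [(r, c, h) for (r, c, h) in footprints if slope_deg[r][c] <= limit]


# Three single-row "scenes" of the same area; the outlier 9.0 is suppressed.
scenes = [[[1.0, 9.0]], [[3.0, 2.0]], [[2.0, 4.0]]]
composite = median_composite(scenes)

# The second footprint sits on a 35 degree slope and is dropped.
kept = filter_by_slope([(0, 0, 30.0), (0, 1, 55.0)], [[5.0, 35.0]])
```

The median is a natural choice here because a single noisy acquisition cannot pull the composite value far, which matches the stated goal of reducing weather‑related backscatter noise.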

From the preprocessed imagery, 100 000 patches are sampled uniformly, each containing 10–400 GEDI points. The dataset is split into 80 % training, 10 % validation, and 10 % test sets, ensuring no spatial overlap. The model adopts a U‑Net architecture with a ResNet‑50 backbone, modified for regression by using a single linear output channel. Training employs AdamW (learning rate 0.001, weight decay 0.001), a batch size of 32, a 10 % warm‑up phase, and a linear decay schedule thereafter. To counter the skewed distribution of height values, a weighted sampler is used during training.
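As a rough illustration of the training setup, the sketch below implements a learning‑rate schedule with a 10 % warm‑up followed by linear decay, plus one possible way to weight samples against a skewed height distribution. Several details are assumptions, since the summary does not specify them: the warm‑up is taken to be linear, the decay is taken to end at zero, and the inverse‑frequency weighting over 5 m height bins is a hypothetical stand‑in for whatever weighting the authors actually use.

```python
import random
from collections import Counter


def learning_rate(step, total_steps, base_lr=1e-3, warmup_frac=0.10):
    """Linear warm-up to `base_lr`, then linear decay to zero (assumed shape)."""
    warmup_steps = max(1, round(total_steps * warmup_frac))
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    decay_steps = max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, (total_steps - step) / decay_steps)


def inverse_frequency_weights(heights, bin_width=5.0):
    """Weight each sample by 1 / (number of samples in its height bin).

    Hypothetical weighting scheme; the summary only says a weighted
    sampler is used.
    """
    bins = [int(h // bin_width) for h in heights]
    counts = Counter(bins)
    return [1.0 / counts[b] for b in bins]


# Mean label height per patch (made-up numbers): tall canopies are rare.
patch_heights = [1.0, 2.0, 3.0, 30.0]
weights = inverse_frequency_weights(patch_heights)

# Draw a batch of patch indices; the tall patch is as likely as all short ones combined.
batch = random.choices(range(len(patch_heights)), weights=weights, k=32)
```

With these weights each height bin receives the same total sampling probability, so rare tall canopies are seen about as often as common short ones.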

The most novel contribution is the “Shift‑Resilient Loss.” GEDI measurements suffer from systematic geolocation errors that manifest as consistent shifts along each track. Standard pixel‑wise L2 loss would penalize correct predictions simply because the ground‑truth point is displaced. The proposed loss allows each track to be shifted within a radius r (e.g., a few pixels) in any direction and selects the shift that yields the lowest loss. Formally, each track is rendered as a sparse, masked height map; the loss for that track is evaluated under every admissible shift and the minimum is kept, and the final loss is the average of these per‑track minima. This makes the training process tolerant to label mis‑registration while still encouraging accurate height estimation.
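The idea can be made concrete with a small, framework‑free sketch. It follows one reading of the description above (minimum over shifts per track, then an average over tracks) and is not the authors' implementation. The squared‑error term, the integer pixel shifts inside a Euclidean radius, the rule that a shift is admissible only if every label stays inside the patch, and the `(row, col, height)` track layout are all assumptions.

```python
def shift_resilient_loss(pred, tracks, radius=1):
    """Average over tracks of the minimum mean squared error across shifts.

    pred:   2-D list of predicted heights.
    tracks: list of tracks, each a list of (row, col, height) labels
            (hypothetical layout).
    radius: maximum shift in pixels; shifts are integer offsets (dr, dc)
            with dr**2 + dc**2 <= radius**2.
    """
    rows, cols = len(pred), len(pred[0])
    shifts = [
        (dr, dc)
        for dr in range(-radius, radius + 1)
        for dc in range(-radius, radius + 1)
        if dr * dr + dc * dc <= radius * radius
    ]
    track_losses = []
    for track in tracks:
        best = None
        for dr, dc in shifts:
            moved = [(r + dr, c + dc, h) for r, c, h in track]
            # Assumption: skip shifts that push any label outside the patch.
            if any(not (0 <= r < rows and 0 <= c < cols) for r, c, _ in moved):
                continue
            loss = sum((pred[r][c] - h) ** 2 for r, c, h in moved) / len(moved)
            if best is None or loss < best:
                best = loss
        if best is not None:
            track_losses.append(best)
    return sum(track_losses) / len(track_losses) if track_losses else 0.0


# The model places a 20 m tree at (1, 2); the label is displaced to (1, 1).
pred = [[0.0, 0.0, 0.0],
        [0.0, 0.0, 20.0],
        [0.0, 0.0, 0.0]]
track = [(1, 1, 20.0), (0, 0, 0.0)]

plain = shift_resilient_loss(pred, [track], radius=0)    # no shift allowed
shifted = shift_resilient_loss(pred, [track], radius=1)  # one-pixel shifts allowed
```

In this toy case the unshifted loss penalizes a prediction that is correct up to a one‑pixel offset, while allowing a shift of one pixel lets the whole track snap into place. In a real training loop the same minimum would be taken over differentiable tensor operations so that gradients flow through the selected shift.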

Evaluation on the held‑out test set shows an overall MAE of 2.43 m and RMSE of 4.73 m. For trees taller than 5 m, the errors increase to MAE 4.45 m and RMSE 6.72 m, reflecting the greater variability of taller canopies but still representing a substantial improvement over prior global products (e.g., Potapov 2021, Lang 2023), which typically report MAE in the range of 5–7 m. The authors also demonstrate that the trained model can be deployed in a distributed inference environment, producing a seamless global canopy height layer that is made available via streaming services.
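For reference, the two reported metrics and the "taller than 5 m" subset can be computed as in the sketch below. The function name and the choice to apply the height threshold to the ground‑truth label (rather than the prediction) are assumptions, and the numbers in the example are made up.

```python
import math


def mae_rmse(pred, truth, min_height=None):
    """Return (MAE, RMSE) in the units of the inputs.

    If `min_height` is given, only pairs whose ground-truth height
    exceeds it are evaluated (assumed reading of "taller than 5 m").
    """
    errors = [
        p - t
        for p, t in zip(pred, truth)
        if min_height is None or t > min_height
    ]
    if not errors:
        raise ValueError("no samples left after filtering")
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mae, rmse


pred = [2.0, 4.0, 10.0, 20.0]
truth = [0.0, 4.0, 14.0, 20.0]

overall = mae_rmse(pred, truth)               # all four samples
tall = mae_rmse(pred, truth, min_height=5.0)  # only the 14 m and 20 m labels
```

Because RMSE squares each error before averaging, it is always at least as large as MAE and reacts more strongly to a few large misses, which is consistent with the gap between the two figures reported above.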

Limitations include the inherent sparsity of GEDI observations, the exclusion of steep‑slope regions (which may bias results in mountainous areas), and the sensitivity of the shift‑resilient loss to the chosen radius parameter. Future work could incorporate up‑to‑date digital terrain models for more precise label correction, explore multi‑scale fusion with higher‑resolution commercial satellites, and extend the loss formulation to handle anisotropic or terrain‑dependent shifts. Overall, the paper delivers a robust, scalable solution that bridges the gap between regional high‑quality canopy height maps and global coverage, offering a valuable resource for forest monitoring, carbon accounting, and ecological research.
