SelvaMask: Segmenting Trees in Tropical Forests and Beyond
Tropical forests harbor most of the planet’s tree biodiversity and are critical to global ecological balance. Canopy trees in particular play a disproportionate role in carbon storage and in the functioning of these ecosystems. Studying canopy trees at scale requires accurate delineation of individual tree crowns, typically performed on high-resolution aerial imagery. Despite advances in transformer-based models for individual tree crown segmentation, performance remains low in most forests, especially tropical ones. To address this, we introduce SelvaMask, a new tropical dataset containing over 8,800 manually delineated tree crowns across three Neotropical forest sites in Panama, Brazil, and Ecuador. SelvaMask features comprehensive annotations, including an inter-annotator agreement evaluation, that capture the dense structure of tropical forests and highlight the difficulty of the task. Leveraging this benchmark, we propose a modular detection-segmentation pipeline that adapts vision foundation models (VFMs) using a domain-specific detection prompter. Our approach reaches state-of-the-art performance, outperforming both zero-shot generalist models and fully supervised end-to-end methods in dense tropical forests. We validate these gains on external tropical and temperate datasets, demonstrating that SelvaMask serves as both a challenging benchmark and a key enabler for generalized forest monitoring. Our code and dataset will be released publicly.
💡 Research Summary
The paper introduces SelvaMask, a new, large‑scale dataset for individual tree crown segmentation in tropical forests, and a modular detection‑prompting pipeline that leverages vision foundation models (VFMs) such as Segment Anything Model (SAM). SelvaMask comprises 8,861 manually delineated crowns collected from three Neotropical sites—Barro Colorado Island (Panama), ZF2 Reserve (Brazil), and Tiputini Biodiversity Station (Ecuador)—using high‑resolution (1.3–3.5 cm pixel⁻¹) RGB orthomosaics captured by a DJI Mavic 3 Enterprise UAV. Unlike existing tropical datasets that focus on large emergent trees, SelvaMask follows a “complete‑crown” protocol, annotating every visible crown, which results in a distribution heavily weighted toward small crowns (52 % of crowns are 0–9 m²). The authors detail a rigorous two‑stage annotation workflow (expert tracing, secondary review, independent cross‑review) and report inter‑annotator agreement on a 500‑crown subset, showing decreasing IoU agreement for smaller crowns and establishing a human‑level performance ceiling.
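The complete-crown protocol and the size-stratified agreement analysis both rest on two simple quantities: a crown's ground area, obtained from its pixel count and the orthomosaic's ground sampling distance, and the IoU between two masks of the same crown (for example, from two annotators). A minimal sketch, assuming boolean NumPy masks; the function names are hypothetical, and every bin edge except the 0–9 m² "tiny" bin is a placeholder, not the paper's binning.

```python
import numpy as np

def crown_area_m2(mask: np.ndarray, gsd_cm: float) -> float:
    """Ground area of a boolean crown mask, given the pixel size in cm per pixel."""
    pixel_side_m = gsd_cm / 100.0            # cm per pixel -> m per pixel
    return float(mask.sum()) * pixel_side_m ** 2

def mask_iou(a: np.ndarray, b: np.ndarray) -> float:
    """Intersection over union of two same-shaped boolean masks (0.0 if both are empty)."""
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 0.0
    return float(np.logical_and(a, b).sum() / union)

# Hypothetical size bins in m^2; only the 0-9 m^2 bin comes from the summary.
SIZE_BINS = [(0.0, 9.0, "0-9"), (9.0, 50.0, "9-50"), (50.0, float("inf"), "50+")]

def size_bin(area_m2: float) -> str:
    """Assign a crown area to a half-open [lo, hi) size bin."""
    for lo, hi, label in SIZE_BINS:
        if lo <= area_m2 < hi:
            return label
    raise ValueError(f"area out of range: {area_m2}")
```

For example, at 2 cm pixel⁻¹ a 100 × 100 pixel crown covers 4 m² and falls in the 0–9 m² bin. Because a one-pixel boundary disagreement is a larger fraction of a small crown than of a large one, IoU between annotators naturally drops for small crowns, which is consistent with the agreement trend reported above.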
To evaluate segmentation quality, the authors extend the raster‑level F1 (RF1) metric to a multi‑threshold version (mRF1) that averages F1 scores over IoU thresholds from 0.50 to 0.95, in line with COCO‑style mAP; a predicted crown therefore only counts at the stricter thresholds if its boundary closely matches the reference, which gives a more nuanced assessment of boundary accuracy. The dataset is split spatially into non‑overlapping training, validation, and test zones within each site to limit spatial autocorrelation between splits, and tiles are extracted from each zone with overlap.
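The multi-threshold averaging behind mRF1 can be illustrated with a small sketch. The summary does not spell out how RF1 matches predictions to reference crowns at the raster level, so this sketch assumes greedy one-to-one matching on a precomputed prediction × reference IoU matrix; the function names are hypothetical and this is not the authors' implementation.

```python
import numpy as np

# COCO-style thresholds 0.50, 0.55, ..., 0.95
IOU_THRESHOLDS = np.linspace(0.50, 0.95, 10)

def f1_at_threshold(iou: np.ndarray, thr: float) -> float:
    """F1 after greedy one-to-one matching of predictions (rows) to references (cols)."""
    n_pred, n_gt = iou.shape
    # Candidate pairs above the threshold, visited from highest IoU to lowest.
    rows, cols = np.nonzero(iou >= thr)
    order = np.argsort(-iou[rows, cols], kind="stable")
    used_pred, used_gt = set(), set()
    tp = 0
    for k in order:
        r, c = int(rows[k]), int(cols[k])
        if r not in used_pred and c not in used_gt:
            used_pred.add(r)
            used_gt.add(c)
            tp += 1
    fp, fn = n_pred - tp, n_gt - tp
    denom = 2 * tp + fp + fn
    # Nothing to find and nothing predicted counts as a perfect score.
    return 1.0 if denom == 0 else 2 * tp / denom

def mean_f1(iou: np.ndarray) -> float:
    """Average F1 over the IoU thresholds (mRF1-style)."""
    return float(np.mean([f1_at_threshold(iou, t) for t in IOU_THRESHOLDS]))
```

With `iou = [[0.92, 0.1], [0.2, 0.62]]`, both predictions match at thresholds up to 0.60, only the first matches from 0.65 to 0.90, and neither matches at 0.95, giving a mean F1 of 0.6. Greedy matching is a common simplification; an optimal assignment (e.g. Hungarian matching) can give different results when one prediction overlaps several reference crowns.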
Methodologically, the paper compares two families of approaches. First, end‑to‑end baselines: Mask R‑CNN (ResNet‑50 backbone) and Mask2Former (Swin‑L backbone), both pre‑trained on COCO and fine‑tuned on SelvaMask. Second, a modular pipeline in which a detection model (DeepForest, DINO‑Swin‑L, or the newly trained SelvaBox) generates bounding‑box prompts that are fed into SAM 2 or SAM 3 to produce instance masks. SelvaBox, a DINO‑Swin‑L detector trained on tropical data, yields the strongest prompts.
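The control flow of the modular pipeline, where detector boxes become prompts for a promptable segmenter, can be sketched independently of any particular model. In the paper the detector is DeepForest, DINO‑Swin‑L, or SelvaBox and the segmenter is SAM 2 or SAM 3; here both are hypothetical callables, the 0.5 score threshold is an assumed value, and the toy stand-ins (a fixed detector and a segmenter that simply fills the box) exist only so the sketch runs without model weights. A real SAM predictor would be wrapped behind the `Segmenter` signature.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

import numpy as np

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels

@dataclass
class Instance:
    box: Box
    score: float
    mask: np.ndarray  # boolean, same height and width as the image

# Hypothetical interfaces: a detector maps an image to (boxes, scores);
# a promptable segmenter maps (image, box prompt) to a mask.
Detector = Callable[[np.ndarray], Tuple[List[Box], List[float]]]
Segmenter = Callable[[np.ndarray, Box], np.ndarray]

def detect_then_segment(image: np.ndarray, detector: Detector, segmenter: Segmenter,
                        score_thr: float = 0.5) -> List[Instance]:
    """Run the detector, then prompt the segmenter once per confident box."""
    boxes, scores = detector(image)
    instances = []
    for box, score in zip(boxes, scores):
        if score < score_thr:          # drop low-confidence prompts
            continue
        mask = segmenter(image, box)   # one box prompt -> one crown mask
        instances.append(Instance(box=box, score=score, mask=mask.astype(bool)))
    return instances

# Toy stand-ins (hypothetical), not real models.
def toy_detector(image: np.ndarray):
    return [(2, 2, 6, 6), (10, 10, 14, 14)], [0.9, 0.3]

def box_fill_segmenter(image: np.ndarray, box: Box) -> np.ndarray:
    x0, y0, x1, y1 = (int(round(v)) for v in box)
    mask = np.zeros(image.shape[:2], dtype=bool)
    mask[y0:y1, x0:x1] = True
    return mask
```

Calling `detect_then_segment(np.zeros((16, 16, 3)), toy_detector, box_fill_segmenter)` keeps only the 0.9-score box and returns one 16-pixel mask. Keeping detection and segmentation behind separate interfaces is what lets the detector (the domain-specific part) be swapped without retraining the segmenter.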
Experimental results demonstrate that the modular detection‑prompting pipeline achieves an mRF1 of 0.68, surpassing Mask2Former’s 0.61 and the zero‑shot Detectree2 baseline (0.53). Cross‑site validation shows the pipeline maintains high performance (mRF1 ≈ 0.65) on the unseen Ecuador test zone, indicating strong out‑of‑distribution generalization. Performance stratified by crown size reveals notable gains for tiny crowns (0–9 m²), where the modular approach reaches an F1 of 0.62, approaching the inter‑annotator ceiling. Ablation studies assess the impact of different detectors, SAM versions, and the effect of SAM’s input‑size limitation (1024 × 1024), which forces down‑sampling of larger tiles and slightly degrades accuracy.
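The cost of SAM's 1024 × 1024 input limit is easiest to see as arithmetic. A minimal sketch, assuming the tile's longest side is resized to 1024 pixels, box prompts are rescaled by the same factor, and the predicted mask is upsampled back to tile resolution by nearest-neighbour indexing; all names are hypothetical, only the down-sampling case is modelled, and this is not SAM's own preprocessing code.

```python
import numpy as np

SAM_INPUT_SIDE = 1024  # input size cited in the summary

def resize_scale(height: int, width: int, target: int = SAM_INPUT_SIDE) -> float:
    """Factor mapping the tile's longest side to `target`; capped at 1.0 (down-sampling only)."""
    return min(1.0, target / max(height, width))

def scale_box(box, s: float):
    """Map an (x_min, y_min, x_max, y_max) box from tile pixels into the resized frame."""
    return tuple(v * s for v in box)

def upsample_mask_nearest(mask_small: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Nearest-neighbour upsampling of a mask predicted in the resized frame."""
    in_h, in_w = mask_small.shape
    rows = np.minimum((np.arange(out_h) * in_h) // out_h, in_h - 1)
    cols = np.minimum((np.arange(out_w) * in_w) // out_w, in_w - 1)
    return mask_small[rows[:, None], cols[None, :]]
```

For a hypothetical 2048 × 2048 tile the scale is 0.5, so at 1.5 cm pixel⁻¹ the model effectively sees 3 cm pixels, and a crown boundary can only be recovered to within one 3 cm step; this loss matters most for the smallest crowns.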
The authors acknowledge limitations: the need to resize tiles for SAM, the high cost of manual annotation, and the exclusive reliance on RGB imagery. They propose future work on semi‑automatic annotation tools, incorporation of LiDAR or multispectral data for multimodal fusion, and exploration of active‑learning strategies to reduce labeling effort.
In conclusion, SelvaMask provides the most extensive, densely annotated tropical crown dataset to date, accompanied by a rigorous benchmark (mRF1, size‑stratified analysis, spatial cross‑validation). The proposed modular detection‑prompting pipeline outperforms traditional end‑to‑end models and demonstrates robust generalization across tropical and temperate forests, positioning SelvaMask as a critical resource for advancing scalable, accurate forest monitoring worldwide. The code and dataset will be released publicly.