Ice-FMBench: A Foundation Model Benchmark for Sea Ice Type Segmentation

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

Accurate segmentation and mapping of sea ice types is crucial for safe polar navigation, offshore operations, and climate monitoring. Deep learning has demonstrated strong potential for automating sea ice type segmentation, but its success often relies on extensive expert-labeled datasets, which are resource-intensive and time-consuming to create. Foundation models (FMs), recently developed through self-supervised training on large-scale datasets, have demonstrated impressive performance. Their applicability to sea ice type segmentation from Synthetic Aperture Radar (SAR) imagery nevertheless remains uncertain, for two reasons. First, sea ice poses unique challenges: intricate geophysical patterns, pronounced seasonal variability, and SAR-specific artifacts such as banding, scalloping, and heterogeneous backscatter. Second, SAR data in polar regions are often acquired using specialized sensor modes that differ markedly from those used to collect FM training data at lower latitudes, which limits direct transferability to polar environments. To address this gap, we contribute: (1) IceFMBench, a comprehensive benchmark framework for evaluating state-of-the-art remote sensing FMs on sea ice type segmentation using Sentinel-1 SAR imagery, composed of a widely used standardized dataset, diverse evaluation metrics, and a representative set of remote sensing FMs suitable for the task, with the ability to add new models alongside the existing ones; (2) an extensive comparative evaluation of the representative FMs using IceFMBench, with additional case studies that assess the top-performing model's transferability across temporal and spatial domains; and (3) a multi-teacher knowledge distillation approach to address the lack of spatiotemporal transferability.


💡 Research Summary

The paper introduces IceFMBench, a dedicated benchmark for evaluating foundation models (FMs) on the task of sea‑ice type segmentation using Sentinel‑1 synthetic aperture radar (SAR) imagery. Recognizing that accurate ice‑type maps are essential for polar navigation, offshore operations, and climate monitoring, the authors note that while deep learning, particularly U‑Net variants, has shown strong performance, it remains heavily dependent on large, expertly labeled datasets that are costly and scarce in polar regions. Recent self‑supervised foundation models, trained on massive unlabeled collections, promise to alleviate label scarcity. Most, however, have been pre‑trained on optical or low‑latitude data and have not been systematically tested on polar SAR data, which pose unique challenges such as speckle, banding, heterogeneous backscatter, and sensor‑mode differences (e.g., Sentinel‑1 Extra‑Wide mode in the Arctic).

IceFMBench addresses this gap through three contributions. First, it assembles a standardized dataset based on the AI4Arctic Sea Ice Challenge, selecting 512 Sentinel‑1 EW‑mode scenes for training and 20 for testing, covering January 2018–December 2021 across 16 Greenland‑adjacent regions. This dataset captures seasonal and regional variability, providing a realistic testbed for spatial and temporal transferability. Second, the benchmark evaluates eleven state‑of‑the‑art remote‑sensing foundation models that satisfy three criteria: (i) relevance to SAR (either SAR‑specific or multi‑modal with SAR channels), (ii) strong performance on established remote‑sensing benchmarks, and (iii) use of modern self‑supervised pre‑training (contrastive learning, masked image modeling, or hybrids). The model roster includes RVSA, CMID, the Prithvi family (100M–600M parameters), CROMA, DINO‑MM, DOFA, FG‑MAE, SegMunich, SARA‑TR‑X, and others, spanning ViT, Swin, and hierarchical ViT backbones, with pre‑training datasets ranging from BigEarthNet to SSL4EO‑S12.
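To make the idea of a spatial and temporal transferability testbed concrete, the following is a minimal sketch of how scenes might be partitioned so that whole years or regions are held out from training. It is not the benchmark's actual split procedure: the `Scene` record, its field names, the region labels, and `split_for_transfer` are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scene:
    """Hypothetical scene record; the field names are assumptions."""
    scene_id: str
    year: int
    region: str


def split_for_transfer(scenes, held_out_years=(), held_out_regions=()):
    """Send a scene to the test set if its year OR its region is held out.

    Everything else goes to the training set, so the test set only
    contains acquisition years or regions the model never saw.
    """
    train, test = [], []
    for scene in scenes:
        if scene.year in held_out_years or scene.region in held_out_regions:
            test.append(scene)
        else:
            train.append(scene)
    return train, test


# Illustrative records only; these are not scenes from the dataset.
scenes = [
    Scene("s1", 2018, "region_A"),
    Scene("s2", 2019, "region_B"),
    Scene("s3", 2021, "region_A"),
    Scene("s4", 2020, "region_C"),
]
train, test = split_for_transfer(
    scenes, held_out_years={2021}, held_out_regions={"region_C"}
)
```

Holding out by year probes temporal transfer, holding out by region probes spatial transfer, and combining both gives the strictest test.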

Comprehensive experiments compare these FMs against strong baselines (U‑Net, DeepLabv3+, multitask variants) using metrics such as overall accuracy, mean Intersection‑over‑Union (mIoU), per‑class IoU, and boundary F‑score. Results show that most FMs surpass the baselines, especially those incorporating SAR during pre‑training (CROMA, DOFA, SARA‑TR‑X), which better handle speckle and texture nuances. Optical‑dominant models (RVSA, CMID) achieve high overall accuracy but suffer larger drops in class‑boundary delineation and temporal transfer. Sensitivity analysis reveals that self‑supervised models retain reasonable performance even with as little as 10 % of the labeled data, whereas performance degrades sharply for conventional supervised models when training data are scarce.
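As a reference for the metrics named above, here is a minimal pure-Python sketch of overall accuracy, per-class IoU, and mean IoU computed from a confusion matrix over flattened pixel labels. The function names, the `ignore_index` value of 255, and the choice to average IoU only over classes that appear are assumptions for illustration, not details taken from the paper. The boundary F-score is not covered here.

```python
def confusion_matrix(y_true, y_pred, num_classes, ignore_index=255):
    """Count (true, predicted) label pairs, skipping ignored pixels."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        if t == ignore_index:
            continue  # e.g. land or unlabeled pixels (assumed convention)
        cm[t][p] += 1
    return cm


def segmentation_scores(cm):
    """Overall accuracy, per-class IoU, and mean IoU from a confusion matrix."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[k][k] for k in range(n))
    ious = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                        # true k, predicted other
        fp = sum(cm[r][k] for r in range(n)) - tp   # predicted k, true other
        denom = tp + fp + fn
        ious.append(tp / denom if denom else None)  # None: class never appears
    present = [v for v in ious if v is not None]
    return {
        "overall_accuracy": correct / total if total else 0.0,
        "per_class_iou": ious,
        "mean_iou": sum(present) / len(present) if present else 0.0,
    }


# Toy example: six pixels, three classes, the last pixel is ignored.
y_true = [0, 0, 1, 1, 2, 255]
y_pred = [0, 1, 1, 1, 2, 2]
scores = segmentation_scores(confusion_matrix(y_true, y_pred, num_classes=3))
```

In the toy example, four of the five valid pixels are correct, so overall accuracy is 0.8, while the per-class IoUs are 1/2, 2/3, and 1. This shows how mIoU penalizes confusion between classes more visibly than accuracy does.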

To improve spatiotemporal generalization, the authors propose a multi‑teacher knowledge distillation (KD) framework. Expert “teacher” models are fine‑tuned on specific years or regions (e.g., a model specialized for 2019 Arctic, another for 2020 Greenland). Their soft predictions are aggregated to train a compact “student” model. The distilled student, with fewer than 30 % of the teachers’ parameters, matches or exceeds the teachers’ average mIoU while exhibiting markedly better transfer to unseen years and regions (e.g., 2022 Arctic data). This approach demonstrates that knowledge from multiple specialized FM experts can be consolidated into an efficient model suitable for operational deployment.
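The distillation step can be sketched for a single pixel as follows. This is one common formulation, not necessarily the authors' exact loss. The summary only says that the teachers' soft predictions are aggregated, so the weighted averaging, the temperature `temperature`, the mixing weight `alpha`, and all function names below are assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with temperature scaling."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_teachers(teacher_logits, temperature, weights=None):
    """Weighted average of the teachers' softened class probabilities."""
    k = len(teacher_logits)
    weights = weights or [1.0 / k] * k  # assumption: equal weights by default
    probs = [softmax(logits, temperature) for logits in teacher_logits]
    num_classes = len(probs[0])
    return [sum(w * p[c] for w, p in zip(weights, probs))
            for c in range(num_classes)]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """alpha * CE(student, label) + (1 - alpha) * T^2 * KL(teachers || student)."""
    target = aggregate_teachers(teacher_logits, temperature)
    student_soft = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s)
             for t, s in zip(target, student_soft) if t > 0)
    ce = -math.log(softmax(student_logits)[label])
    return alpha * ce + (1 - alpha) * temperature ** 2 * kl


# Two hypothetical teachers (e.g. experts for different years) and one student.
teachers = [[2.0, 0.5, -1.0], [1.0, 1.5, -0.5]]
student = [1.2, 0.8, -0.7]
loss = distillation_loss(student, teachers, label=0)
```

The `temperature ** 2` factor is the usual scaling that keeps the soft-target term comparable in magnitude to the hard-label term as the temperature changes. In a real training loop this loss would be averaged over all labeled pixels and minimized with an automatic differentiation framework.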

The paper concludes by outlining future directions: expanding the benchmark to include Southern Ocean and multi‑sensor (optical, SAR, LiDAR) data, exploring larger-scale pre‑training on polar‑specific SAR archives, and developing ultra‑lightweight models for real‑time on‑board inference. IceFMBench thus fills a critical void in the remote‑sensing community by providing a rigorous, reproducible platform for assessing foundation models in polar SAR contexts, offering insights into model selection, fine‑tuning strategies, and transfer learning techniques that can accelerate reliable sea‑ice monitoring.

