Benchmarking Foundation Models for Mitotic Figure Classification

Notice: This research summary and analysis were automatically generated using AI technology. For full accuracy, please refer to the original arXiv source.

The performance of deep learning models is known to scale with data quantity and diversity. In pathology, as in many other medical imaging domains, the availability of labeled images for a specific task is often limited. Self-supervised learning techniques have enabled the use of vast amounts of unlabeled data to train large-scale neural networks, i.e., foundation models, that can address the limited-data problem by providing semantically rich feature vectors that generalize well to new tasks with minimal training effort, increasing model performance and robustness. In this work, we investigate the use of foundation models for mitotic figure classification. The mitotic count, which can be derived from this classification task, is an independent prognostic marker for specific tumors and part of certain tumor grading systems. In particular, we investigate the data scaling laws of multiple current foundation models and evaluate their robustness to unseen tumor domains. In addition to the commonly used linear probing paradigm, we also adapt the models using low-rank adaptation (LoRA) of their attention mechanisms. We compare all models against end-to-end-trained baselines, both CNNs and Vision Transformers. Our results demonstrate that LoRA-adapted foundation models provide superior performance to those adapted with standard linear probing, reaching performance levels close to those achieved with 100% data availability using only 10% of the training data. Furthermore, LoRA adaptation of the most recent foundation models almost closes the out-of-domain performance gap when evaluated on unseen tumor domains. However, full fine-tuning of traditional architectures still yields competitive performance.


💡 Research Summary

This paper presents a systematic benchmark of self‑supervised learning (SSL)–based foundation models for the task of mitotic figure classification in computational pathology. Recognizing that annotated pathology data are scarce, costly, and subject to inter‑observer variability, the authors investigate whether large‑scale vision transformers pre‑trained on billions of unlabeled tiles can serve as robust feature extractors for downstream mitosis detection. They evaluate several state‑of‑the‑art foundation models of varying size—ViT‑B, ViT‑L, ViT‑H, ViT‑G, and the recent H‑optimus‑0—each trained with modern SSL methods such as DINOv2, iBOT, or MoCo‑v3.

Two publicly available mitosis datasets (MIDOG++ and ICPR) are used as the primary benchmarks, and additional out‑of‑domain tumor types (e.g., canine mast cell tumors, human soft‑tissue sarcomas) serve to assess generalization across unseen domains. The experimental protocol varies the proportion of labeled training data from 1 % to 100 % and compares two adaptation strategies: (1) linear probing, where a simple linear classifier is trained on frozen embeddings, and (2) Low‑Rank Adaptation (LoRA), which injects trainable low‑rank matrices into the attention layers while keeping the bulk of the model frozen.
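The LoRA strategy described above can be sketched in a few lines. The snippet below is an illustration of the core idea and its parameter-count arithmetic applied to a single frozen weight matrix, not the authors' implementation; the hidden dimension, rank, and scaling factor are invented for the example.

```python
# Minimal LoRA sketch: a frozen weight matrix W is augmented with a
# trainable low-rank update (alpha / r) * B @ A, as in the attention-layer
# adaptation described above. Dimensions and rank are assumptions for
# illustration only.
import random

random.seed(0)

d = 64       # hidden dimension of one attention projection (assumed)
r = 4        # LoRA rank (assumed)
alpha = 8    # LoRA scaling factor (assumed)

# Frozen pre-trained weight W (d x d): never updated during adaptation.
W = [[random.gauss(0, 0.02) for _ in range(d)] for _ in range(d)]

# Trainable low-rank factors: B (d x r) starts at zero and A (r x d) is
# random, so at step 0 the adapted layer is identical to the frozen one.
B = [[0.0] * r for _ in range(d)]
A = [[random.gauss(0, 0.02) for _ in range(d)] for _ in range(r)]

def matvec(M, x):
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def lora_forward(x):
    # y = W x + (alpha / r) * B (A x)
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    return [b + (alpha / r) * dlt for b, dlt in zip(base, delta)]

x = [random.gauss(0, 1) for _ in range(d)]
y = lora_forward(x)

frozen_params = d * d                  # untouched during training
trainable_params = d * r + r * d      # only B and A receive gradients
print(f"frozen: {frozen_params}, trainable: {trainable_params}")
```

For a single 64x64 projection the low-rank factors are still a noticeable fraction of the layer, but across a billion-parameter ViT, where only the attention projections receive adapters, the trainable share drops to the sub-percent range reported in the findings below. Linear probing is the limiting case where even these factors are dropped and only a classification head on top of the frozen embeddings is trained.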

Key findings include:

(i) All foundation models outperform conventional end‑to‑end‑trained CNNs (ResNet‑50) and ViTs (DeiT) even when only a linear head is trained, confirming the richness of the learned representations.

(ii) LoRA consistently improves performance over linear probing by 3–7 % absolute F1 score across all data regimes, despite updating less than 0.1 % of the total parameters.

(iii) Scaling laws reveal a logarithmic‑linear relationship between the amount of labeled data and performance; larger models (ViT‑G, H‑optimus‑0) reach near‑saturation with as little as 10 % of the full training set, achieving >95 % of the maximum F1 score.

(iv) In out‑of‑domain evaluation, LoRA‑adapted models reduce the domain gap by roughly 4 % compared with linear probing, and the biggest models close the gap to under 10 %, indicating strong cross‑domain robustness.

(v) Full fine‑tuning of traditional architectures can match or slightly exceed LoRA‑adapted foundation models, but at a substantially higher computational cost and memory footprint.
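The logarithmic‑linear scaling behaviour in finding (iii) can be made concrete with a toy model of the form F1(f) ≈ a + b·log10(f), where f is the fraction of labeled training data. The coefficients below are invented for illustration and are not fitted to the paper's results.

```python
# Toy log-linear data-scaling curve: F1(f) = a + b * log10(f).
# The intercept a and slope b are assumptions chosen so the curve
# saturates quickly, mirroring the behaviour described above.
import math

a, b = 0.90, 0.04          # assumed coefficients (not from the paper)

def f1(fraction):
    """Predicted F1 score at a given fraction of labeled data."""
    return a + b * math.log10(fraction)

full = f1(1.0)             # performance with 100% of the data
for f in (0.01, 0.1, 1.0):
    print(f"{f:>5.0%} data -> F1 {f1(f):.3f} ({f1(f) / full:.1%} of full)")
```

With these assumed coefficients, 10 % of the data already recovers more than 95 % of the full‑data F1, which is the near‑saturation pattern the benchmark reports for the largest models.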

The authors also discuss practical implications: LoRA offers a parameter‑efficient way to tailor massive pre‑trained models to niche medical tasks without the need for extensive labeled data or expensive GPU resources. They acknowledge limitations, such as the predominance of animal tumor data and the lack of prospective clinical validation, and suggest future work on multimodal foundation models and integration into real‑time slide‑analysis pipelines.

In conclusion, the study demonstrates that foundation models, when adapted with low‑rank techniques, provide a highly effective and resource‑efficient solution for mitotic figure classification, achieving near‑full‑data performance with only a fraction of the annotations and exhibiting strong resilience to domain shifts. This approach holds promise for broader adoption across diverse medical imaging tasks where labeled data are limited.

