AtlasPatch: Efficient Tissue Detection and High-throughput Patch Extraction for Computational Pathology at Scale


Whole-slide image (WSI) preprocessing, comprising tissue detection followed by patch extraction, is foundational to AI-driven computational pathology but remains a major bottleneck for scaling to large and heterogeneous cohorts. We present AtlasPatch, a scalable framework that couples foundation-model tissue detection with high-throughput patch extraction at minimal computational overhead. Our tissue detector achieves high precision (0.986) and remains robust across varying tissue conditions (e.g., brightness, fragmentation, boundary definition, tissue heterogeneity) and common artifacts (e.g., pen/ink markings, scanner streaks). This robustness is enabled by our annotated, heterogeneous multi-cohort training set of ~30,000 WSI thumbnails combined with efficient adaptation of the Segment-Anything (SAM) model. AtlasPatch also reduces end-to-end WSI preprocessing time by up to 16× versus widely used deep-learning pipelines, without degrading downstream task performance. The AtlasPatch tool is open-source, efficiently parallelized for practical deployment, and supports options to save extracted patches or stream them into common feature-extraction models for on-the-fly embedding, making it adaptable to both pathology departments (tissue detection and quality control) and AI researchers (dataset creation and model training). The AtlasPatch software package is available at https://github.com/AtlasAnalyticsLab/AtlasPatch.


💡 Research Summary

Whole‑slide images (WSIs) are the cornerstone of modern computational pathology, yet their gigapixel size means that most of the image consists of background and only a small fraction contains diagnostically relevant tissue. Consequently, every analysis pipeline must first detect tissue regions and then extract smaller patches for downstream tasks such as tumor detection, grading, or prognosis. Existing solutions either rely on simple color‑threshold heuristics, which are fast but fragile to stain, illumination, and artifact variations, or on deep‑learning segmenters like U‑Net, which are more robust but require thousands of forward passes per slide and thus become a bottleneck at scale.
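To make the contrast concrete, the color-threshold heuristics mentioned above can be sketched as Otsu thresholding on a saturation channel: stained tissue is colorful, background is near-white. This is purely illustrative of the fragile baseline, not AtlasPatch's detector; all function names here are assumptions.

```python
import numpy as np

def otsu_threshold(channel: np.ndarray) -> int:
    """Otsu's threshold (0-255) for a uint8 image channel."""
    hist = np.bincount(channel.ravel(), minlength=256).astype(float)
    total = hist.sum()
    cum_w = np.cumsum(hist)
    cum_mean = np.cumsum(hist * np.arange(256))
    best_t, best_var = 0, -1.0
    for t in range(1, 256):
        w0, w1 = cum_w[t - 1], total - cum_w[t - 1]
        if w0 == 0 or w1 == 0:
            continue
        m0 = cum_mean[t - 1] / w0
        m1 = (cum_mean[-1] - cum_mean[t - 1]) / w1
        var = w0 * w1 * (m0 - m1) ** 2  # between-class variance
        if var > best_var:
            best_var, best_t = var, t
    return best_t

def tissue_mask_from_rgb(rgb: np.ndarray) -> np.ndarray:
    """Crude tissue mask: stained tissue is saturated, background is near-white."""
    mx = rgb.max(axis=2).astype(float)
    mn = rgb.min(axis=2).astype(float)
    sat = ((mx - mn) / np.maximum(mx, 1) * 255).astype(np.uint8)
    return sat > otsu_threshold(sat)

# Toy thumbnail: left half near-white background, right half stained pink
img = np.zeros((10, 20, 3), dtype=np.uint8)
img[:, :10] = 240
img[:, 10:] = (200, 100, 150)
mask = tissue_mask_from_rgb(img)
```

A single global threshold like this fails exactly where the paper says it does: faded stains, dark scanner streaks, and pen markings all shift the saturation histogram and corrupt the mask.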

AtlasPatch addresses these limitations with a four‑component, end‑to‑end framework: (1) a tissue detector built on the Segment‑Anything 2 (SAM2) model, (2) high‑throughput patch‑coordinate generation, (3) optional on‑the‑fly embedding using popular medical or general‑purpose encoders, and (4) optional patch image export. The authors assembled a heterogeneous corpus of roughly 36,000 WSI thumbnails drawn from four institutions (CHUM, TCGA, Radboud UMC, Karolinska) covering multiple organs, scanners, and staining protocols. Semi‑automatic annotation was performed in Labelbox: an initial automated mask was refined by expert annotators and then quality‑controlled by a senior pathologist, yielding about 30,000 high‑quality tissue‑vs‑background mask pairs. This dataset captures a wide range of tissue coverage, fragmentation, boundary contrast, brightness, hue entropy, and colorfulness, ensuring that the detector sees realistic variability.

For model adaptation, the authors fine‑tuned only the normalization layers of the Hiera‑tiny variant of SAM2, which represents less than 0.1% of the total parameters. This lightweight fine‑tuning preserves the powerful backbone learned from massive internet‑scale data while dramatically reducing memory and compute requirements. Training on the thumbnail dataset yields a detector that operates directly on low‑resolution thumbnails (≈2k × 2k pixels) with a precision of 0.986 and an IoU above 0.94 on a held‑out test set. The detector reliably separates tissue from background even in challenging cases such as low‑contrast edges, fragmented biopsies, and common artifacts like ink markings or scanner streaks.
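The norm-layer-only adaptation strategy can be sketched in generic PyTorch terms. This is a hedged illustration with a toy stand-in trunk, not the authors' SAM2 code; the helper name and the toy architecture are assumptions.

```python
import torch.nn as nn

def freeze_all_but_norm(model: nn.Module) -> float:
    """Freeze every parameter except those of normalization layers
    and return the trainable-parameter fraction."""
    norm_types = (nn.LayerNorm, nn.GroupNorm, nn.BatchNorm2d)
    for p in model.parameters():
        p.requires_grad = False
    for m in model.modules():
        if isinstance(m, norm_types):
            for p in m.parameters():
                p.requires_grad = True
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    return trainable / total

# Toy stand-in trunk (not SAM2): conv stages with normalization layers
model = nn.Sequential(
    nn.Conv2d(3, 64, 7), nn.GroupNorm(8, 64), nn.GELU(),
    nn.Conv2d(64, 128, 3), nn.GroupNorm(8, 128), nn.GELU(),
)
frac = freeze_all_but_norm(model)  # only the norm parameters remain trainable
```

Because normalization layers carry only scale and shift vectors, the trainable fraction stays tiny even on large backbones, which is what keeps the fine-tuning memory footprint small.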

After detection, the pipeline vectorizes the contour, extrapolates it to the desired magnification using the WSI pyramid metadata, and generates a grid of patch coordinates entirely in contour space. This approach prunes background and artifacts before any image I/O occurs, dramatically reducing the number of patches that need to be read from disk. Coordinate generation and patch I/O are parallelized across multiple CPU cores, while optional embedding can be GPU‑accelerated using encoders such as ViT‑B, ResNet‑50, or specialized medical image models. The entire workflow is exposed through a simple Python API and can be run in a fully parallel fashion, making it suitable for both on‑premise clusters and cloud environments.
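The contour-space pruning step can be sketched as follows. This is a hedged illustration, not AtlasPatch's implementation: the function name, patch size, and scale factor are assumptions, and a matplotlib polygon test stands in for whatever containment check the tool actually uses.

```python
import numpy as np
from matplotlib.path import Path

def patch_coords_in_contour(contour, patch=256, scale=16):
    """Return top-left patch coordinates (at full magnification) whose
    centers fall inside a thumbnail-space tissue contour.

    contour -- (N, 2) array of (x, y) polygon vertices in thumbnail space
    scale   -- downsample factor between full slide and thumbnail
    """
    poly = Path(contour)
    # Bounding box of the contour, scaled up to full magnification
    x0, y0 = contour.min(axis=0) * scale
    x1, y1 = contour.max(axis=0) * scale
    xs = np.arange(x0, x1 - patch + 1, patch)
    ys = np.arange(y0, y1 - patch + 1, patch)
    gx, gy = np.meshgrid(xs, ys)
    top_left = np.stack([gx.ravel(), gy.ravel()], axis=1)
    # Map patch centers back to thumbnail space and prune background
    centers = (top_left + patch / 2) / scale
    keep = poly.contains_points(centers)
    return top_left[keep].astype(int)

# Toy example: a square "tissue" region in a 100 x 100 thumbnail
square = np.array([[10, 10], [90, 10], [90, 90], [10, 90]])
coords = patch_coords_in_contour(square)
```

The key property is that background patches are discarded before any pixel data is read: only the surviving coordinates ever trigger slide I/O, which is what makes the subsequent parallel extraction cheap.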

Performance was benchmarked against a suite of widely used tools: threshold‑based methods (HistoQC, TIA Toolbox, dplabtools, EntropyMasker), a zero‑shot SAM2 baseline, and recent deep‑learning pipelines (Trident‑GrandQC, Trident‑Hest). Qualitative visual comparisons show that AtlasPatch’s masks align closely with ground truth across all cohorts, while the competing methods either miss tissue in low‑contrast regions or erroneously include artifacts. Quantitatively, AtlasPatch outperforms all baselines in precision, recall, and F1‑score.

Downstream, the authors evaluated multi‑instance learning (MIL) classifiers on four organ‑specific tasks (kidney, lung, breast, colorectal) using patches extracted by AtlasPatch versus patches from conventional pipelines. Classification accuracy was statistically indistinguishable, and in some cases marginally higher, demonstrating that the thumbnail‑based detection does not sacrifice diagnostic information. Crucially, the end‑to‑end preprocessing time was reduced by up to 16× compared with the best competing deep‑learning pipelines, turning what is often the dominant cost in foundation‑model training into a minor overhead.

AtlasPatch is released as open‑source software (https://github.com/AtlasAnalyticsLab/AtlasPatch) with Docker and Conda installation options, comprehensive documentation, and example notebooks. Its modular design allows easy swapping of embedding models, addition of color‑normalization steps, or integration into larger training pipelines. By providing a robust, fast, and scalable tissue‑detection and patch‑extraction engine, AtlasPatch removes a major bottleneck for large‑scale computational pathology, enabling researchers and clinical laboratories to build and deploy foundation models on heterogeneous, multi‑institutional cohorts with far lower time and cost.

