Histo-Miner: Deep learning based tissue features extraction pipeline from H&E whole slide images of cutaneous squamous cell carcinoma


Recent advancements in digital pathology have enabled comprehensive analysis of Whole-Slide Images (WSI) from tissue samples, leveraging high-resolution microscopy and computational capabilities. Despite this progress, there is a lack of labeled datasets and open source pipelines specifically tailored for analysis of skin tissue. Here we propose Histo-Miner, a deep learning-based pipeline for analysis of skin WSIs and generate two datasets with labeled nuclei and tumor regions. We develop our pipeline for the analysis of patient samples of cutaneous squamous cell carcinoma (cSCC), a frequent non-melanoma skin cancer. Utilizing the two datasets, comprising 47,392 annotated cell nuclei and 144 tumor-segmented WSIs respectively, both from cSCC patients, Histo-Miner employs convolutional neural networks and vision transformers for nucleus segmentation and classification as well as tumor region segmentation. Performance of trained models positively compares to state of the art with multi-class Panoptic Quality (mPQ) of 0.569 for nucleus segmentation, macro-averaged F1 of 0.832 for nucleus classification and mean Intersection over Union (mIoU) of 0.907 for tumor region segmentation. From these predictions we generate a compact feature vector summarizing tissue morphology and cellular interactions, which can be used for various downstream tasks. Here, we use Histo-Miner to predict cSCC patient response to immunotherapy based on pre-treatment WSIs from 45 patients. Histo-Miner identifies percentages of lymphocytes, the granulocyte to lymphocyte ratio in tumor vicinity and the distances between granulocytes and plasma cells in tumors as predictive features for therapy response. This highlights the applicability of Histo-Miner to clinically relevant scenarios, providing direct interpretation of the classification and insights into the underlying biology.


💡 Research Summary

The manuscript introduces Histo‑Miner, an end‑to‑end deep‑learning pipeline specifically designed for the analysis of hematoxylin‑and‑eosin (H&E) whole‑slide images (WSIs) of cutaneous squamous cell carcinoma (cSCC). Recognizing the scarcity of skin‑focused annotated datasets and the unique histological challenges posed by skin tissue, the authors first created two publicly available resources: (1) NucSeg, comprising 47,392 manually annotated nuclei across 21 WSIs, labeled into five cell types (granulocytes, lymphocytes, plasma cells, stromal cells, tumor cells); and (2) TumSeg, containing binary tumor‑region masks for 144 WSIs from 125 patients collected at three German medical centers. Annotation quality was ensured through dual‑expert review and, where ambiguous, immunohistochemistry validation.

For nucleus segmentation and classification, the authors employed Hovernet, a convolutional neural network architecture that simultaneously predicts instance masks and cell‑type maps. Data augmentation (rotation, color jitter, scaling) was applied to improve robustness. For tumor region segmentation, a Vision Transformer‑based model (named SCC Segmenter) was trained on the TumSeg set, leveraging patch‑wise processing and ImageNet‑based color normalization. Both models were evaluated with rigorous cross‑validation and an independent test set.

Performance metrics demonstrate state‑of‑the‑art results: nucleus segmentation achieved a multi‑class Panoptic Quality (mPQ) of 0.569, nucleus classification reached a macro‑averaged F1‑score of 0.832 across six classes (including a post‑hoc “non‑neoplastic epithelial” class added by re‑labeling tumor‑predicted nuclei outside the tumor mask), and tumor segmentation obtained a mean Intersection‑over‑Union (mIoU) of 0.907. These numbers are comparable to or exceed those reported for leading H&E segmentation frameworks such as Cellpose, Stardist, and other recent transformer‑based methods.
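As a rough illustration of two of the metrics quoted above, the sketch below computes a mean IoU for a binary tumor mask and a macro-averaged F1 for nucleus labels on toy inputs. It is not the authors' evaluation code: the function names, the flat-list mask representation, and the choice to average IoU over the tumor and background classes are assumptions made for this example, and mPQ is not covered.

```python
def iou(pred, truth):
    """Intersection over Union for two binary masks given as flat 0/1 sequences."""
    inter = sum(1 for p, t in zip(pred, truth) if p and t)
    union = sum(1 for p, t in zip(pred, truth) if p or t)
    # Convention (assumed): two empty masks agree perfectly.
    return inter / union if union else 1.0


def mean_iou(pred, truth):
    """Average of the tumor IoU and the background IoU (assumed definition of mIoU)."""
    inv_pred = [1 - p for p in pred]
    inv_truth = [1 - t for t in truth]
    return (iou(pred, truth) + iou(inv_pred, inv_truth)) / 2


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores, so rare classes count as much as common ones."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy inputs, invented for illustration only.
pred_mask = [1, 1, 0, 0, 1, 0]
true_mask = [1, 0, 0, 0, 1, 1]
true_labels = ["tumor", "tumor", "lymphocyte", "lymphocyte"]
pred_labels = ["tumor", "lymphocyte", "lymphocyte", "lymphocyte"]

print(mean_iou(pred_mask, true_mask))
print(macro_f1(true_labels, pred_labels))
```

Macro averaging matters here because cell types such as plasma cells and granulocytes are typically much rarer than tumor cells; a plain accuracy score would be dominated by the frequent classes.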

Beyond pixel‑level predictions, Histo‑Miner extracts a compact, 317‑dimensional feature vector that encodes tissue morphology, cellular composition, and spatial interactions. Features include cell‑type percentages overall and within tumor regions, density‑based metrics, and for every ordered pair of cell types the average nearest‑neighbor distance within tumor areas. By using percentages and distances rather than raw counts, the representation is invariant to slide size and scanning resolution. The resulting JSON file compresses a multi‑gigabyte WSI into ~3.7 KB, enabling efficient storage, sharing, and downstream machine‑learning applications.
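The kind of features described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the Histo‑Miner implementation: the nucleus record layout (`type`, `xy`, `in_tumor`), the feature names, and the brute-force distance search are all hypothetical, and coordinates are assumed to be in a physical unit such as micrometers.

```python
import json
import math
from itertools import permutations


def cell_type_percentages(nuclei):
    """Percentage of each cell type among all nuclei on the slide."""
    if not nuclei:
        return {}
    counts = {}
    for n in nuclei:
        counts[n["type"]] = counts.get(n["type"], 0) + 1
    return {t: 100.0 * c / len(nuclei) for t, c in counts.items()}


def mean_nearest_distance(sources, targets):
    """Average, over source nuclei, of the distance to the closest target nucleus."""
    return sum(min(math.dist(s, t) for t in targets) for s in sources) / len(sources)


def extract_features(nuclei):
    """Build a flat feature dict: percentages plus ordered-pair distances inside tumor."""
    features = {f"pct_{t}": p for t, p in cell_type_percentages(nuclei).items()}
    in_tumor = {}
    for n in nuclei:
        if n["in_tumor"]:
            in_tumor.setdefault(n["type"], []).append(n["xy"])
    # Ordered pairs: distance from A to nearest B differs from B to nearest A.
    for a, b in permutations(sorted(in_tumor), 2):
        features[f"dist_{a}_to_{b}"] = mean_nearest_distance(in_tumor[a], in_tumor[b])
    return features


# Toy nuclei, invented for illustration only.
nuclei = [
    {"type": "lymphocyte", "xy": (0.0, 0.0), "in_tumor": True},
    {"type": "lymphocyte", "xy": (10.0, 0.0), "in_tumor": True},
    {"type": "granulocyte", "xy": (0.0, 3.0), "in_tumor": True},
    {"type": "tumor", "xy": (50.0, 50.0), "in_tumor": False},
]
features = extract_features(nuclei)
print(json.dumps(features, indent=2))
```

In this toy case the granulocyte-to-lymphocyte distance is 3.0, while the lymphocyte-to-granulocyte distance is larger because one lymphocyte sits far from the only granulocyte. That asymmetry is why ordered pairs carry more information than unordered ones. For a real slide with many thousands of nuclei, a spatial index would be a more practical choice than the quadratic search used here.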

To showcase clinical relevance, the authors applied Histo‑Miner to a cohort of 45 cSCC patients treated with anti‑PD‑1 checkpoint inhibitors. Pre‑treatment WSIs were processed, features were fed into a gradient‑boosted decision tree classifier, and 5‑fold cross‑validation yielded a mean area under the ROC curve (AUC) of 0.755 ± 0.091. Model‑interpretability analyses (SHAP) identified three key predictive features: (i) the percentages of lymphocytes, (ii) the granulocyte‑to‑lymphocyte ratio in the tumor vicinity, and (iii) the distances between granulocytes and plasma cells inside tumors. These findings suggest that both the abundance and the spatial organization of immune cells are associated with response to immunotherapy in cSCC.
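To make the reported "mean AUC ± spread over folds" concrete, the sketch below computes a ROC AUC per fold from its pairwise-ranking definition and then aggregates across folds. The labels and scores are toy values invented for this example, not the study's data, and the use of the population standard deviation is an assumption, since the summary does not say how the ± value was computed.

```python
from statistics import mean, pstdev


def roc_auc(labels, scores):
    """Probability that a random responder is scored above a random non-responder.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Two hypothetical validation folds: (true labels, predicted response scores).
folds = [
    ([1, 0, 1, 0], [0.9, 0.2, 0.6, 0.7]),
    ([1, 1, 0, 0], [0.8, 0.7, 0.3, 0.1]),
]
fold_aucs = [roc_auc(y, s) for y, s in folds]
print(f"AUC = {mean(fold_aucs):.3f} ± {pstdev(fold_aucs):.3f}")
```

With a cohort of 45 patients, each of five folds holds only about nine validation cases, so per-fold AUC values can vary widely; reporting the spread alongside the mean is what makes the estimate interpretable.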

The pipeline is modular: tumor segmentation, nucleus segmentation/classification, and feature extraction can be run independently or combined, and all code, pretrained weights, and the two datasets are released under an open‑source license on GitHub and Zenodo. This openness facilitates reproducibility, adaptation to other cancer types, and integration into existing digital pathology workflows.

In summary, Histo‑Miner fills a critical gap in skin‑cancer digital pathology by providing (1) high‑quality, publicly available annotated datasets, (2) robust deep‑learning models that achieve competitive segmentation and classification performance, (3) a systematic method to translate massive WSIs into concise, biologically meaningful feature vectors, and (4) a proof‑of‑concept clinical application that links histomorphological patterns to immunotherapy outcomes. The work represents a substantial step toward interpretable, scalable, and clinically actionable AI in dermatopathology.

