Iterative annotation to ease neural network training: Specialized machine learning in medical image analysis
Neural networks promise to bring robust, quantitative analysis to medical fields, but adoption is limited by the technicalities of training these networks. To address this translation gap between medical researchers and neural networks in the field of pathology, we have created an intuitive interface that uses the commonly used whole slide image (WSI) viewer, Aperio ImageScope (Leica Biosystems Imaging, Inc.), for the annotation and display of neural network predictions on WSIs. Leveraging this, we propose the use of a human-in-the-loop strategy to reduce the burden of WSI annotation. We track network performance improvements as a function of iteration and quantify the use of this pipeline for the segmentation of renal histologic findings on WSIs. More specifically, we present network performance when applied to segmentation of renal microcompartments, and demonstrate multi-class segmentation in human and mouse renal tissue slides. Finally, to show the adaptability of this technique to other medical imaging fields, we demonstrate its ability to iteratively segment human prostate glands from radiology imaging data.
💡 Research Summary
This paper introduces H‑AI‑L (Human‑AI‑Loop), an intuitive, iterative annotation pipeline that bridges the gap between pathologists’ workflow and state‑of‑the‑art deep‑learning segmentation models. By integrating the widely used whole‑slide image viewer Aperio ImageScope with the semantic segmentation network DeepLab v2, the authors let experts draw annotations in ImageScope’s native XML format, automatically convert those annotations to pixel masks for training, and then convert model predictions back to XML for immediate visual inspection and correction. The entire process is orchestrated by a single Python script that requires only a predefined folder structure for data management.
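The XML‑to‑mask conversion at the heart of this round trip can be sketched in a few lines. This is an illustrative sketch, not the authors’ code: it assumes the standard ImageScope annotation schema (nested `Annotation`/`Regions`/`Region` elements whose `Vertices`/`Vertex` children carry `X` and `Y` attributes), and the function name `xml_to_mask` is ours.

```python
import xml.etree.ElementTree as ET

import numpy as np
from PIL import Image, ImageDraw

def xml_to_mask(xml_path, mask_size):
    """Rasterize ImageScope XML annotation regions into a binary mask.

    mask_size is (width, height) in pixels at the target resolution.
    Assumes the standard schema: Annotations/Annotation/Regions/Region,
    each Region holding Vertices/Vertex elements with X/Y attributes.
    """
    mask = Image.new("L", mask_size, 0)           # background = 0
    draw = ImageDraw.Draw(mask)
    root = ET.parse(xml_path).getroot()
    for region in root.iter("Region"):
        pts = [(float(v.get("X")), float(v.get("Y")))
               for v in region.iter("Vertex")]
        if len(pts) >= 3:                         # need a closed polygon
            draw.polygon(pts, outline=1, fill=1)  # foreground = 1
    return np.array(mask, dtype=np.uint8)
```

The reverse step (predictions back to XML) is the mirror image: trace each predicted contour and write its points out as `Vertex` elements, so the viewer renders the network’s output as ordinary editable annotations.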
The core training strategy follows an active‑learning, human‑in‑the‑loop paradigm. An initial small set of manual annotations (iteration 0) is used to train a baseline model. The model’s predictions on new whole‑slide images are then displayed in ImageScope, where experts correct errors. Corrected regions are added to the training set, and the model is retrained. This loop is repeated several times. In the case study of mouse kidney glomeruli segmentation, five iterations were performed, incorporating both PAS‑ and H&E‑stained sections and, in later iterations, tissue from streptozotocin‑induced diabetic nephropathy to increase variability. Annotation speed increased 4‑ to 10‑fold across three annotators, corresponding to 71‑82 % time savings, and the model’s F1 score was near‑perfect by iteration 4.
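The loop described above can be written out schematically. This is a skeleton of the paradigm, not the authors’ implementation: the callables `train`, `predict`, and `correct` stand in for model training, inference, and the expert’s correction pass in ImageScope.

```python
def human_in_the_loop(train, predict, correct, seed_labels, batches,
                      init_model=None):
    """Schematic active-learning loop.

    train(model, data) -> model; predict(model, slide) -> prediction;
    correct(prediction) -> expert-corrected annotation. All three are
    placeholders for the real training, inference, and viewer-based
    correction steps; `batches` yields the new slides for each iteration.
    """
    training_set = list(seed_labels)            # iteration 0: manual labels
    model = init_model
    for batch in batches:                       # one pass per iteration
        model = train(model, training_set)      # retrain on corrected data
        for slide in batch:
            pred = predict(model, slide)        # network proposes labels
            training_set.append(correct(pred))  # expert fixes, keep result
    return model, training_set
```

The key property is that every correction the expert makes is recycled as training data, so each iteration’s model starts from a strictly larger, expert-vetted set.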
To address the computational burden of processing gigapixel whole‑slide images, the authors devised a multi‑resolution “DeepZoom” approach. A low‑resolution network (1/16 scale) first identifies candidate “hot‑spot” regions; a high‑resolution network then performs fine‑grained segmentation only within those regions. This two‑stage pipeline yields a 4.5× speedup in inference time compared with a full‑resolution pass, while improving the overall F1 score by reducing false positives through the pre‑filtering step. Performance on four hold‑out slides after five iterations was reported as sensitivity 0.92 ± 0.02, specificity 0.99 ± 0.001, precision 0.93 ± 0.14, and accuracy 0.99 ± 0.001.
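The two-stage idea can be sketched as follows. This is a simplified illustration under our own assumptions, not the paper’s code: both networks are placeholder callables returning binary masks, and the downsampling is a crude stride slice rather than a proper pyramid read from the WSI file.

```python
import numpy as np

def two_stage_segment(image, lowres_net, highres_net, scale=16, tile=512):
    """Sketch of two-stage 'hot-spot' inference.

    lowres_net segments a 1/scale downsampled copy; highres_net is run
    only on full-resolution tiles whose low-res footprint contains
    foreground, so sparse structures are refined without a full pass.
    """
    h, w = image.shape[:2]
    small = image[::scale, ::scale]              # crude 1/scale downsample
    coarse = lowres_net(small)                   # candidate hot-spot mask
    out = np.zeros((h, w), dtype=np.uint8)
    for y in range(0, h, tile):
        for x in range(0, w, tile):
            footprint = coarse[y // scale:(y + tile) // scale,
                               x // scale:(x + tile) // scale]
            if footprint.any():                  # hot spot: refine this tile
                out[y:y + tile, x:x + tile] = highres_net(
                    image[y:y + tile, x:x + tile])
    return out
```

The speedup comes from skipping the (typically many) background tiles entirely, and the pre-filter also discards isolated high-resolution false positives that fall outside any coarse hot spot.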
Beyond binary glomerular segmentation, the framework was extended to multi‑class tasks: (1) distinguishing podocyte versus non‑podocyte nuclei within glomeruli using immunofluorescence‑labeled data, (2) segmenting interstitial fibrosis and tubular atrophy (IFTA) in human renal biopsies, and (3) differentiating sclerotic from non‑sclerotic glomeruli. Although the IFTA task suffered from limited training data (15 biopsies) and required a single‑resolution pass due to the non‑sparse nature of the lesions, preliminary results were promising.
The authors also demonstrated the modality‑agnostic nature of H‑AI‑L by applying it to prostate gland segmentation in T2‑weighted MRI. A dataset of 39 patients (average 32 slices per patient) was iteratively expanded by adding four new patients per iteration. After five iterations, the model achieved sensitivity 0.88 ± 0.04, specificity 0.99 ± 0.001, precision 0.90 ± 0.03, and accuracy 0.99 ± 0.001. Annotation time dropped by roughly 90 % after the second iteration, with only 10 % of slices falling below a predefined performance threshold.
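The sensitivity, specificity, precision, and accuracy figures quoted in both case studies (and the F1 score tracked per iteration) are the standard per‑pixel metrics. For readers less familiar with them, a small helper showing the definitions:

```python
def pixel_metrics(tp, fp, tn, fn):
    """Standard segmentation metrics from per-pixel confusion counts:
    tp/fp/tn/fn are true/false positives and negatives over all pixels."""
    return {
        "sensitivity": tp / (tp + fn),            # recall on true foreground
        "specificity": tn / (tn + fp),            # recall on true background
        "precision":   tp / (tp + fp),            # fraction of hits correct
        "accuracy":    (tp + tn) / (tp + tn + fp + fn),
        "f1":          2 * tp / (2 * tp + fp + fn),
    }
```

Note that with sparse structures like glomeruli, background pixels dominate, so specificity and accuracy sit near 0.99 almost by construction; sensitivity, precision, and F1 are the more informative numbers here.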
Key contributions of the work include: (i) seamless integration of expert annotation tools with deep‑learning pipelines, eliminating the need for custom GUIs; (ii) a practical active‑learning loop that quantifies annotation efficiency gains and provides immediate visual feedback on model performance; (iii) a scalable multi‑resolution inference strategy (DeepZoom) that accelerates processing of large, sparsely annotated images without sacrificing accuracy; and (iv) proof‑of‑concept extensions to diverse histopathology tasks and to radiology imaging.
The authors acknowledge that ultimate performance is bounded by the quality and quantity of training data, but argue that H‑AI‑L accelerates convergence toward the theoretical optimum by dramatically reducing the manual labeling bottleneck. Future directions include integration with DICOM viewers for radiology workflows, automated prediction of optimal annotation batch size per iteration, confidence‑based anomaly flagging, and broader clinical validation studies. In sum, H‑AI‑L offers a practical, extensible solution for bringing deep‑learning‑driven quantitative analysis into routine pathology and medical imaging practice.