Glance and Focus Reinforcement for Pan-cancer Screening


Pan-cancer screening in large-scale CT scans remains challenging for existing AI methods, primarily due to the difficulty of localizing diverse types of tiny lesions in large CT volumes. The extreme foreground-background imbalance significantly hinders models from focusing on diseased regions, while redundant focus on healthy regions not only decreases the efficiency but also increases false positives. Inspired by radiologists’ glance and focus diagnostic strategy, we introduce GF-Screen, a Glance and Focus reinforcement learning framework for pan-cancer screening. GF-Screen employs a Glance model to localize the diseased regions and a Focus model to precisely segment the lesions, where segmentation results of the Focus model are leveraged to reward the Glance model via Reinforcement Learning (RL). Specifically, the Glance model crops a group of sub-volumes from the entire CT volume and learns to select the sub-volumes with lesions for the Focus model to segment. Given that the selecting operation is non-differentiable for segmentation training, we propose to employ the segmentation results to reward the Glance model. To optimize the Glance model, we introduce a novel group relative learning paradigm, which employs group relative comparison to prioritize high-advantage predictions and discard low-advantage predictions within sub-volume groups, not only improving efficiency but also reducing false positives. In this way, for the first time, we effectively extend cutting-edge RL techniques to tackle the specific challenges in pan-cancer screening. Extensive experiments on 16 internal and 7 external datasets across 9 lesion types demonstrated the effectiveness of GF-Screen. Notably, GF-Screen leads the public validation leaderboard of MICCAI FLARE25 pan-cancer challenge, surpassing the FLARE24 champion solution by a large margin (+25.6% DSC and +28.2% NSD).


💡 Research Summary

The paper introduces GF‑Screen, a novel “Glance‑and‑Focus” reinforcement learning (RL) framework designed to tackle the long‑standing challenges of pan‑cancer screening in large‑scale computed tomography (CT) volumes. Traditional AI approaches for cancer detection suffer from two intertwined problems: (1) extreme foreground‑background imbalance, where lesions occupy less than 0.1% of the total voxel space, causing models to over‑fit to the abundant healthy tissue and miss tiny lesions; and (2) inefficiency, because most existing pipelines apply a sliding‑window segmentation over the entire volume, wasting computation on regions that contain no pathology and inflating false‑positive rates.

GF‑Screen mimics the diagnostic workflow of radiologists who first “glance” at the whole scan to locate suspicious regions and then “focus” on those regions for detailed analysis. The system comprises two cooperating components: a lightweight Glance model that classifies each cropped sub‑volume as “lesion‑present” or “lesion‑absent”, and a high‑capacity Focus model that performs pixel‑wise segmentation on the selected sub‑volumes. The key novelty lies in training the Glance model with RL: the Glance model acts as a policy network that decides whether to keep or discard each sub‑volume, while the Focus model provides a reward signal derived from its segmentation quality (e.g., Dice score, surface distance). Because the selection operation is non‑differentiable, standard back‑propagation cannot be used; instead, policy gradients (e.g., REINFORCE) are employed to update the Glance policy based on the reward.
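The policy-gradient idea can be sketched in a few lines. This is a minimal illustration of a REINFORCE-style update under the keep/discard formulation described above, not the paper's implementation; the function name `reinforce_grad`, the two-action logit layout, and the mean-reward baseline are assumptions made for the sketch:

```python
import numpy as np

def softmax(x):
    """Row-wise softmax with the usual max-subtraction for stability."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def reinforce_grad(logits, actions, rewards):
    """REINFORCE gradient of expected reward w.r.t. keep/discard logits.

    logits:  (N, 2) Glance scores for N sub-volumes (keep vs. discard).
    actions: (N,)   sampled action index per sub-volume.
    rewards: (N,)   segmentation-quality reward from the Focus model.
    Uses d/dlogits log pi(a) = onehot(a) - softmax(logits), scaled by a
    mean-baseline advantage to reduce gradient variance.
    """
    probs = softmax(np.asarray(logits, dtype=float))
    onehot = np.eye(probs.shape[-1])[np.asarray(actions)]
    advantage = np.asarray(rewards, dtype=float) - np.mean(rewards)
    return (onehot - probs) * advantage[:, None]
```

Ascending this gradient raises the keep-probability of sub-volumes whose segmentation reward beats the batch average, which is exactly how a non-differentiable selection step can still be trained.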

To further address class imbalance and the variability of sub‑volume views, the authors propose a “Group Relative Learning” paradigm. Sub‑volumes are grouped (e.g., by spatial proximity), and within each group the advantage of each sub‑volume is computed relative to its peers. High‑advantage sub‑volumes (those yielding good segmentation) receive amplified gradient updates, while low‑advantage ones are suppressed. This relative comparison encourages the Glance model to prioritize optimal diagnostic views—full lesion coverage and favorable orientation—over partial or poorly angled views that would otherwise degrade segmentation performance.
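The group-relative comparison can be illustrated with a small sketch. The function name `group_relative_advantages` and the choice of per-group z-normalization are assumptions for illustration; the paper may define the comparison differently:

```python
import numpy as np

def group_relative_advantages(rewards, group_ids, eps=1e-8):
    """Score each sub-volume's reward relative to its own group.

    Within every group (e.g. spatially adjacent crops), rewards are
    z-normalized so that above-average views get positive advantage and
    below-average views negative advantage, independent of the absolute
    reward scale of that group.
    """
    rewards = np.asarray(rewards, dtype=float)
    group_ids = np.asarray(group_ids)
    adv = np.empty_like(rewards)
    for g in np.unique(group_ids):
        mask = group_ids == g
        r = rewards[mask]
        adv[mask] = (r - r.mean()) / (r.std() + eps)
    return adv
```

Because the advantage is relative, a group of uniformly mediocre views produces no strong gradient signal, while a group containing one clearly superior view sharply favors it.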

Training couples the two models. All sub‑volumes are fed to the Focus model, which is supervised with a combination of binary cross‑entropy and Dice losses using ground‑truth masks; in parallel, the Glance model samples actions from its softmax policy, receives the segmentation‑based reward, and updates its parameters via policy gradient. During inference, the Glance model rapidly filters out healthy sub‑volumes, passing only those it predicts as lesion‑containing to the Focus model. This selective pipeline reduces the number of forward passes through the heavy segmentation network by a factor of 5.7 on average, dramatically improving runtime without sacrificing accuracy.
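A Dice-style reward and the selective inference loop might look like the following sketch (the names `dice_reward` and `screen` and the 0.5 keep-threshold are hypothetical, chosen only to make the flow concrete):

```python
import numpy as np

def dice_reward(pred_mask, gt_mask, eps=1e-8):
    """Soft Dice overlap between predicted and ground-truth binary masks."""
    pred_mask = np.asarray(pred_mask, dtype=float)
    gt_mask = np.asarray(gt_mask, dtype=float)
    inter = (pred_mask * gt_mask).sum()
    return (2.0 * inter + eps) / (pred_mask.sum() + gt_mask.sum() + eps)

def screen(sub_volumes, glance_prob, focus_segment, threshold=0.5):
    """Run the expensive Focus model only where the cheap Glance model fires."""
    results = {}
    for idx, vol in enumerate(sub_volumes):
        if glance_prob(vol) >= threshold:      # cheap lesion-presence check
            results[idx] = focus_segment(vol)  # heavy segmentation pass
    return results
```

The efficiency gain comes entirely from `screen`: every sub-volume rejected by the Glance model is one forward pass through the segmentation network that is never paid for.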

The authors evaluate GF‑Screen on an unprecedentedly large pan‑cancer dataset: 5,117 CT scans covering nine lesion types (lung tumor, lung nodule, COVID‑19, pleural effusion, liver tumor, pancreas tumor, kidney tumor, adrenal carcinoma, colon tumor) collected from 23 public and internal sources (16 internal, 7 external). Across these heterogeneous datasets, GF‑Screen consistently outperforms state‑of‑the‑art baselines such as nnUNet, SwinUNETR, and PASTA. On the MICCAI FLARE25 public validation leaderboard, GF‑Screen achieves a Dice coefficient improvement of +25.6% and a Normalized Surface Distance (NSD) gain of +28.2% over the previous year’s champion (FLARE24). Moreover, the method reduces false positives and computational cost simultaneously, addressing both major bottlenecks of prior work.

Limitations are acknowledged. The binary sub‑volume selection may miss lesions that are split across multiple sub‑volumes, especially when lesions are extremely small relative to the chosen crop size. The RL reward design, while effective, can be sensitive to hyper‑parameters and may require careful tuning for new datasets. Future directions proposed include multi‑scale Glance models, sequential decision policies (e.g., Transformer‑based agents) that can adaptively adjust crop size and stride, and integration of self‑supervised pre‑training or large medical vision‑language models to enrich the reward signal.

In summary, GF‑Screen represents a significant step forward for AI‑assisted cancer screening: it combines a biologically inspired two‑stage workflow with modern reinforcement learning to overcome foreground‑background imbalance, improve detection sensitivity, and achieve unprecedented efficiency on massive, multi‑lesion CT datasets. The results demonstrate that reinforcement learning can be successfully extended from classification‑centric vision tasks to dense medical image parsing, opening new avenues for intelligent, resource‑aware diagnostic systems.

