EndoCaver: Handling Fog, Blur and Glare in Endoscopic Images via Joint Deblurring-Segmentation

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

Endoscopic image analysis is vital for colorectal cancer screening, yet real-world conditions often suffer from lens fogging, motion blur, and specular highlights, which severely compromise automated polyp detection. We propose EndoCaver, a lightweight transformer with a unidirectional-guided dual-decoder architecture, enabling joint multi-task capability for image deblurring and segmentation while significantly reducing computational complexity and model parameters. Specifically, it integrates a Global Attention Module (GAM) for cross-scale aggregation, a Deblurring-Segmentation Aligner (DSA) to transfer restoration cues, and a cosine-based scheduler (LoCoS) for stable multi-task optimisation. Experiments on the Kvasir-SEG dataset show that EndoCaver achieves 0.922 Dice on clean data and 0.889 under severe image degradation, surpassing state-of-the-art methods while reducing model parameters by 90%. These results demonstrate its efficiency and robustness, making it well-suited for on-device clinical deployment. Code is available at https://github.com/ReaganWu/EndoCaver.

💡 Research Summary

The paper addresses a critical bottleneck in computer-assisted colorectal cancer screening: endoscopic images are frequently degraded by lens fog, motion blur, and specular highlights, which sharply reduce the reliability of automated polyp detection. Existing solutions rely on heavyweight segmentation networks, multi-encoder fusion schemes, cascaded deblurring-segmentation pipelines, or complex joint-learning frameworks, all of which incur substantial computational overhead and are unsuitable for real-time clinical deployment. To overcome these limitations, the authors propose EndoCaver, a lightweight transformer-based architecture that restores degraded images and segments polyps in a single end-to-end pipeline.

The backbone is a MiT-B0 encoder that extracts hierarchical features at four scales. A Global Attention Module (GAM) first resizes all encoder outputs to a common spatial resolution, averages them channel-wise to reduce redundancy, and then applies multi-head attention to capture global dependencies across scales. The enhanced feature maps are fed into two decoders: a D-Decoder that reconstructs a deblurred, defogged, and deglared image (Ĩ), and an S-Decoder that predicts the segmentation mask (M̂).

The Deblurring-Segmentation Aligner (DSA) bridges the two tasks. It performs a first cross-attention between segmentation queries and the GAM-enhanced encoder features, then a second cross-attention that injects the latent deblurring representation into the segmentation stream. Because this guidance is unidirectional, restoration cues inform the segmentation process directly without the cost of bidirectional connections.

Training is performed with a joint loss consisting of an L2 reconstruction term and a Dice-based segmentation term.
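The joint objective described above can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the summary does not give the exact formulas, so the soft Dice form with a smoothing constant `eps`, the convex combination `(1 - w_seg) * L_rec + w_seg * L_seg`, and all function and argument names are assumptions. Images and masks are represented as flat lists of floats in [0, 1] to keep the example dependency-free.

```python
def l2_loss(restored, target):
    """Mean squared error between two equally sized flat pixel lists."""
    if len(restored) != len(target) or not restored:
        raise ValueError("inputs must be non-empty and the same length")
    return sum((r - t) ** 2 for r, t in zip(restored, target)) / len(restored)


def dice_loss(pred, mask, eps=1e-6):
    """Soft Dice loss, 1 - 2|P*M| / (|P| + |M|).

    eps is an assumed smoothing constant that avoids division by zero
    when both the prediction and the mask are empty.
    """
    if len(pred) != len(mask) or not pred:
        raise ValueError("inputs must be non-empty and the same length")
    intersection = sum(p * m for p, m in zip(pred, mask))
    denominator = sum(pred) + sum(mask)
    return 1.0 - (2.0 * intersection + eps) / (denominator + eps)


def joint_loss(restored, clean, pred, mask, w_seg=0.5):
    """Weighted sum of the reconstruction and segmentation terms.

    The convex-combination form is an assumption; the summary only says
    that the two terms are weighted relative to each other.
    """
    if not 0.0 <= w_seg <= 1.0:
        raise ValueError("w_seg must lie in [0, 1]")
    return (1.0 - w_seg) * l2_loss(restored, clean) + w_seg * dice_loss(pred, mask)
```

With a perfect restoration and a perfect mask, both terms vanish and the joint loss is zero for any weight; a prediction that misses the mask entirely pushes the Dice term toward 1.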
The relative weighting w_seg(t) follows a cosine-annealing schedule (LoCoS) that emphasizes reconstruction early in training and gradually shifts the focus to segmentation, which stabilizes multi-task optimization.

Experiments are conducted on the Kvasir-SEG dataset (900 images, 80/20 split) with synthetic degradations that mimic real-world fog, blur, and glare. EndoCaver achieves a Dice score of 0.922 on clean images and 0.889 under severe degradation, outperforming state-of-the-art models such as SegFormer-B5 (0.919/0.862) while using only 7.8 M parameters and 11.9 GMACs, which corresponds to over 90% fewer parameters and far fewer FLOPs. Out-of-distribution evaluation on CVC-ClinicDB and CVC-ColonDB yields Dice scores of 0.782 and 0.702 respectively, which the authors take as evidence of robust generalization.

Ablation studies show that removing LoCoS, DSA, or GAM each degrades performance, and that the deblurring branch is essential: without it, Dice drops to 0.823. Qualitative visualizations show that EndoCaver restores fine structural details and produces cleaner segmentation boundaries, reducing both false positives and missed regions.

In summary, EndoCaver delivers a compact, efficient, and robust solution for joint deblurring and polyp segmentation, making it well suited for on-device deployment in endoscopy suites and for real-time AI assistance in colorectal cancer screening.
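To make the scheduling idea concrete, the following sketch shows one way a cosine schedule could move the segmentation weight from a low to a high value over training. The summary does not state the exact LoCoS formula, so the half-cosine ramp, the endpoint values `w_min` and `w_max`, and the function name `locos_weight` are all assumptions chosen for illustration.

```python
import math


def locos_weight(step, total_steps, w_min=0.0, w_max=1.0):
    """Segmentation weight at a given training step (assumed half-cosine ramp).

    Starts at w_min (reconstruction dominates) and rises smoothly to
    w_max (segmentation dominates) at total_steps.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp progress to [0, 1] so steps past the end keep the final weight.
    progress = min(max(step / total_steps, 0.0), 1.0)
    return w_min + (w_max - w_min) * 0.5 * (1.0 - math.cos(math.pi * progress))


if __name__ == "__main__":
    # Print the weight at a few points of a hypothetical 100-step run.
    for step in (0, 25, 50, 75, 100):
        print(step, round(locos_weight(step, 100), 3))
```

The ramp changes slowly at the start and end of training and fastest in the middle, so the shift in emphasis from reconstruction to segmentation is smooth, with no abrupt switch between the two tasks.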
