DGGAN: Degradation Guided Generative Adversarial Network for Real-time Endoscopic Video Enhancement

Reading time: 5 minutes

📝 Original Info

  • Title: DGGAN: Degradation Guided Generative Adversarial Network for Real-time Endoscopic Video Enhancement
  • ArXiv ID: 2512.07253
  • Date: 2025-12-08
  • Authors: Handing Xu, Zhenguo Nie, Tairan Peng, Huimin Pan, Xin-Jun Liu

📝 Abstract

Endoscopic surgery relies on intraoperative video, making image quality a decisive factor for surgical safety and efficacy. Yet, endoscopic videos are often degraded by uneven illumination, tissue scattering, occlusions, and motion blur, which obscure critical anatomical details and complicate surgical manipulation. Although deep learning-based methods have shown promise in image enhancement, most existing approaches remain too computationally demanding for real-time surgical use. To address this challenge, we propose a degradation-aware framework for endoscopic video enhancement, which enables real-time, high-quality enhancement by propagating degradation representations across frames. In our framework, degradation representations are first extracted from images using contrastive learning. We then introduce a fusion mechanism that modulates image features with these representations to guide a single-frame enhancement model, which is trained with a cycle-consistency constraint between degraded and restored images to improve robustness and generalization. Experiments demonstrate that our framework achieves a superior balance between performance and efficiency compared with several state-of-the-art methods. These results highlight the effectiveness of degradation-aware modeling for real-time endoscopic video enhancement. Moreover, our results suggest that implicitly learning and propagating degradation representations offer a practical pathway toward clinical application.
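The abstract's first ingredient is a degradation representation learned contrastively from images. The paper does not spell out the objective here, so the following is a minimal PyTorch sketch assuming an InfoNCE-style loss (as used in DASR-like degradation representation learning); `DegradationEncoder`, `info_nce`, the encoder architecture, embedding size, and temperature are all illustrative assumptions, not the paper's actual design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DegradationEncoder(nn.Module):
    """Hypothetical encoder mapping a frame to a compact degradation embedding."""
    def __init__(self, dim=128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),          # global pool -> (B, 128, 1, 1)
        )
        self.proj = nn.Linear(128, dim)

    def forward(self, x):
        h = self.backbone(x).flatten(1)       # (B, 128)
        return F.normalize(self.proj(h), dim=1)  # unit-norm embedding

def info_nce(anchor, positive, negatives, tau=0.07):
    """InfoNCE: pull together embeddings of patches sharing a degradation,
    push apart embeddings from differently degraded frames.
    anchor, positive: (B, D); negatives: (K, D)."""
    pos = (anchor * positive).sum(dim=1, keepdim=True) / tau  # (B, 1)
    neg = anchor @ negatives.t() / tau                        # (B, K)
    logits = torch.cat([pos, neg], dim=1)                     # positive at index 0
    labels = torch.zeros(anchor.size(0), dtype=torch.long, device=anchor.device)
    return F.cross_entropy(logits, labels)
```

In this sketch, a positive pair would be two patches from the same degraded frame and negatives would come from differently degraded frames, mirroring the contrastive setup the abstract describes at a high level.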

💡 Deep Analysis

[Figure 1]

📄 Full Content

DGGAN: Degradation Guided Generative Adversarial Network for Real-time Endoscopic Video Enhancement

Handing Xu, Zhenguo Nie (corresponding author), Tairan Peng, Xin-Jun Liu
Department of Mechanical Engineering, State Key Laboratory of Tribology in Advanced Equipment, Beijing Key Laboratory of Transformative High-end Manufacturing Equipment and Technology, Tsinghua University, Beijing, China
xhd21@mails.tsinghua.edu.cn, zhenguonie@tsinghua.edu.cn

Huimin Pan
Department of Mechanical Engineering, Tsinghua University, Beijing, China

Keywords: Real-time video enhancement · Degradation representation · Cyclical consistency · Endoscopic video

1 Introduction

Minimally invasive surgery (MIS) has become a cornerstone of modern clinical practice, offering reduced surgical trauma, shorter recovery times, and improved postoperative outcomes compared with traditional open procedures. Among various MIS techniques, endoscopic surgery plays a particularly critical role, as it enables surgeons to access deep or delicate anatomical regions through narrow working channels with minimal disruption of surrounding tissues. For example, in spine surgery, endoscopic approaches have been increasingly adopted for the treatment of conditions such as lumbar disc herniation, spinal stenosis, and degenerative diseases [1]. Unlike open spine procedures, spine endoscopy relies entirely on intraoperative video as the sole source of visual feedback, making video quality a decisive factor for surgical safety and efficacy.

However, endoscopic videos are often far from ideal. The imaging environment inside the human body presents a variety of challenges: illumination is highly non-uniform due to directional light sources; optical scattering by tissues and fluids degrades image contrast; blood, smoke, and surgical instruments frequently occlude the field of view; and camera motion or limited depth of field can introduce blur. These degradations collectively compromise the visibility of fine anatomical structures, hinder accurate surgical manipulation, and may increase the risk of complications. In spine endoscopy, where the operative corridor is typically only a few millimeters in diameter and critical neural structures lie in close proximity, such limitations become particularly critical. Even subtle degradations in video clarity can obscure vital anatomical cues and complicate intraoperative decision-making.

To address these challenges, video enhancement techniques have been widely investigated. Conventional approaches [2, 3, 4, 5] can provide modest improvements in brightness or contrast, but they often fail under severe degradations and may amplify noise or introduce artifacts [6]. More recently, deep learning-based methods have demonstrated remarkable success in image restoration tasks, including denoising, deblurring [7], and super-resolution [8, 9], by leveraging large-scale data and learning complex degradation models. These techniques have been gradually extended to the domain of endoscopic imaging, with promising results in improving surgical visibility. Nevertheless, several critical limitations remain. Most existing methods are designed and evaluated in offline settings, where computational efficiency is not a primary concern. As a result, they often involve complex models or iterative optimization procedures that are computationally expensive and unsuitable for deployment.
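To make the remaining two ingredients concrete, here is a rough PyTorch sketch under explicit assumptions: the fusion mechanism is rendered as FiLM-style per-channel modulation, the cross-frame propagation as a keyframe exponential moving average, and the cycle-consistency constraint as an L1 re-degradation loss. `DegradationModulation`, `enhance_stream`, `degrade_net`, and all hyperparameters are hypothetical names for illustration, not the paper's actual components.

```python
import torch
import torch.nn as nn

class DegradationModulation(nn.Module):
    """Assumed FiLM-style fusion: the degradation embedding predicts a
    per-channel scale and shift that modulate the enhancer's feature maps."""
    def __init__(self, feat_ch=64, deg_dim=128):
        super().__init__()
        self.to_scale = nn.Linear(deg_dim, feat_ch)
        self.to_shift = nn.Linear(deg_dim, feat_ch)

    def forward(self, feat, deg):
        # feat: (B, C, H, W); deg: (B, deg_dim)
        gamma = self.to_scale(deg).unsqueeze(-1).unsqueeze(-1)  # (B, C, 1, 1)
        beta = self.to_shift(deg).unsqueeze(-1).unsqueeze(-1)
        return feat * (1 + gamma) + beta

def enhance_stream(frames, deg_encoder, enhancer, momentum=0.9, refresh_every=8):
    """Assumed propagation scheme: run the degradation encoder only on
    occasional keyframes and carry the embedding forward with an EMA, so
    most frames pay only the enhancer's cost -- one plausible reading of
    'propagating degradation representations across frames'."""
    outputs, deg = [], None
    with torch.no_grad():
        for t, frame in enumerate(frames):        # frame: (1, 3, H, W)
            if deg is None or t % refresh_every == 0:
                fresh = deg_encoder(frame)
                deg = fresh if deg is None else momentum * deg + (1 - momentum) * fresh
            outputs.append(enhancer(frame, deg))
    return outputs

def cycle_consistency_loss(degrade_net, x_degraded, x_restored, deg):
    """Re-degrade the restored frame conditioned on the same degradation
    representation and compare against the original degraded input. The L1
    distance is an assumption; the paper only states a cycle-consistency
    constraint between degraded and restored images."""
    return torch.abs(degrade_net(x_restored, deg) - x_degraded).mean()
```

The keyframe-plus-EMA loop is one way such a design could hit real-time rates: the expensive representation extraction is amortized over many frames, while the lightweight conditioned enhancer runs per frame.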

📸 Image Gallery

DNet1.png DNet2.png DRPM.png Genertor1.png Genertor2.png Genertor3.png PCA.png PCA2.png cycle1.png cycle2.png image_result.png overall_workflow.png

Reference

This content is AI-processed based on open access ArXiv data.
