AquaDiff: Diffusion-Based Underwater Image Enhancement for Addressing Color Distortion
Underwater images are severely degraded by wavelength-dependent light absorption and scattering, resulting in color distortion, low contrast, and loss of fine details that hinder vision-based underwater applications. To address these challenges, we propose AquaDiff, a diffusion-based underwater image enhancement framework designed to correct chromatic distortions while preserving structural and perceptual fidelity. AquaDiff integrates a chromatic prior-guided color compensation strategy with a conditional diffusion process, where cross-attention dynamically fuses degraded inputs and noisy latent states at each denoising step. An enhanced denoising backbone with residual dense blocks and multi-resolution attention captures both global color context and local details. Furthermore, a novel cross-domain consistency loss jointly enforces pixel-level accuracy, perceptual similarity, structural integrity, and frequency-domain fidelity. Extensive experiments on multiple challenging underwater benchmarks demonstrate that AquaDiff outperforms state-of-the-art traditional, CNN-, GAN-, and diffusion-based methods, achieving superior color correction and competitive overall image quality across diverse underwater conditions.
💡 Research Summary
AquaDiff addresses the long‑standing challenge of underwater image degradation caused by wavelength‑dependent absorption and scattering, which lead to severe color casts, low contrast, and loss of fine details. While traditional model‑free and physics‑based methods rely on handcrafted priors that often fail to generalize, and existing CNN/GAN approaches produce deterministic, sometimes over‑smoothed results, recent diffusion models offer a promising generative framework but have not been fully adapted to the specific needs of underwater imaging.
The proposed system combines a physics‑inspired chromatic prior with a conditional diffusion process. First, a chromatic‑prior guided compensation map y is generated by estimating depth‑dependent attenuation coefficients and applying inverse color correction to the degraded input. This map explicitly encodes the expected color shift for each pixel, providing a strong conditioning signal for the diffusion model.
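The compensation idea can be sketched as follows. This is a minimal illustration, not the paper's implementation: the attenuation coefficients `beta`, the transmission clipping bound, and the simple exponential attenuation model are all assumptions, chosen to reflect the fact that red light attenuates fastest underwater.

```python
import numpy as np

def compensation_map(degraded, depth, beta=(0.20, 0.08, 0.03)):
    """Hypothetical chromatic-prior compensation: invert per-channel
    attenuation exp(-beta_c * d) (red attenuates fastest underwater).

    degraded : (H, W, 3) float image in [0, 1]
    depth    : (H, W) per-pixel depth estimate
    beta     : assumed per-channel attenuation coefficients (R, G, B)
    """
    beta = np.asarray(beta).reshape(1, 1, 3)
    # Transmission t(x) = exp(-beta * d(x)); clip so division stays bounded
    t = np.clip(np.exp(-beta * depth[..., None]), 0.1, 1.0)
    # Inverse correction amplifies each channel in proportion to its loss
    return np.clip(degraded / t, 0.0, 1.0)

img = np.full((4, 4, 3), 0.3)   # uniform gray test input
d = np.full((4, 4), 5.0)        # 5 m of water everywhere
y = compensation_map(img, d)
```

Because the red coefficient is largest, the red channel receives the strongest boost, which is exactly the color shift the conditioning map is meant to encode.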
A forward diffusion process corrupts a clean reference image x₀ into a noisy latent x_T through a standard Gaussian Markov chain. The reverse process is realized by a denoising network that, at each timestep t, receives the noisy latent x_t, the compensation map y, and the timestep embedding. Crucially, cross‑attention (⊗Cross‑Att) fuses x_t and y dynamically: high‑noise steps emphasize global color context from y, while low‑noise steps focus on preserving local structural details.
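The forward corruption admits the standard DDPM closed form x_t = √(ᾱ_t)·x₀ + √(1−ᾱ_t)·ε. A minimal sketch, assuming the usual linear beta schedule (the paper's schedule and T may differ):

```python
import numpy as np

def make_schedule(T=1000, beta_start=1e-4, beta_end=0.02):
    # Linear beta schedule as in standard DDPM; alphas_bar = prod(1 - beta)
    betas = np.linspace(beta_start, beta_end, T)
    return np.cumprod(1.0 - betas)

def q_sample(x0, t, alphas_bar, rng):
    """Closed-form forward step: x_t = sqrt(a_bar_t)*x0 + sqrt(1-a_bar_t)*eps."""
    eps = rng.standard_normal(x0.shape)
    ab = alphas_bar[t]
    return np.sqrt(ab) * x0 + np.sqrt(1.0 - ab) * eps, eps

rng = np.random.default_rng(0)
alphas_bar = make_schedule()
x0 = rng.standard_normal((8, 8, 3))
x_t, eps = q_sample(x0, 500, alphas_bar, rng)
```

At large t, ᾱ_t is close to zero and x_t is dominated by noise, which is why the cross-attention conditioning leans on the global color cues in y at those steps.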
The denoiser architecture extends the classic DDPM U‑Net with Residual Dense Blocks (RDB) and Multi‑Resolution Attention (MRA) modules. RDBs promote feature reuse and improve robustness to color variations, whereas MRA captures both coarse color context and fine‑grained textures across scales. Skip connections further safeguard low‑level details.
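The dense-connectivity pattern of an RDB can be sketched in PyTorch. This follows the generic residual dense block design; the layer count, width, and growth rate below are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class ResidualDenseBlock(nn.Module):
    """Sketch of a Residual Dense Block: each conv sees the concatenation
    of all preceding feature maps (feature reuse), and a 1x1 fusion conv
    plus a local residual connection produce the output."""
    def __init__(self, channels=32, growth=16, n_layers=3):
        super().__init__()
        self.layers = nn.ModuleList()
        c = channels
        for _ in range(n_layers):
            self.layers.append(nn.Sequential(
                nn.Conv2d(c, growth, 3, padding=1), nn.ReLU(inplace=True)))
            c += growth  # dense connectivity grows the input width
        self.fuse = nn.Conv2d(c, channels, 1)  # local feature fusion

    def forward(self, x):
        feats = [x]
        for layer in self.layers:
            feats.append(layer(torch.cat(feats, dim=1)))
        return x + self.fuse(torch.cat(feats, dim=1))  # local residual

x = torch.randn(1, 32, 16, 16)
y = ResidualDenseBlock()(x)
```

The residual path keeps the block shape-preserving, so it drops into a U-Net stage without changing skip-connection wiring.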
Training is guided by a novel cross‑domain consistency loss that aggregates four complementary terms: (1) L₁ pixel loss for absolute color fidelity, (2) VGG‑based perceptual loss for visual similarity, (3) SSIM loss to retain structural edges and contrast, and (4) a frequency‑domain loss that penalizes the discrepancy of high‑frequency Fourier components, thereby mitigating the typical diffusion‑induced blurring. The weighted sum of these terms enforces simultaneous preservation of color, structure, and texture.
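Two of the four terms are easy to make concrete: the L₁ pixel term and the frequency-domain term. The sketch below uses a radial high-pass mask over the 2-D FFT with a hypothetical `cutoff` threshold; the paper's exact formulation and weights may differ, and the perceptual and SSIM terms are omitted for brevity.

```python
import numpy as np

def l1_loss(pred, target):
    return np.mean(np.abs(pred - target))

def freq_loss(pred, target, cutoff=0.25):
    """Penalise the discrepancy of high-frequency Fourier components.
    A radial mask keeps normalised frequencies above `cutoff` (assumed)."""
    H, W = pred.shape[:2]
    fy = np.fft.fftfreq(H)[:, None]
    fx = np.fft.fftfreq(W)[None, :]
    mask = np.sqrt(fx**2 + fy**2) > cutoff          # (H, W) high-pass mask
    Fp = np.fft.fft2(pred, axes=(0, 1))
    Ft = np.fft.fft2(target, axes=(0, 1))
    return np.mean(np.abs(Fp - Ft) * mask[..., None])

def total_loss(pred, target, w=(1.0, 0.5)):
    # Weighted sum of the pixel and frequency terms (weights are assumed)
    return w[0] * l1_loss(pred, target) + w[1] * freq_loss(pred, target)

rng = np.random.default_rng(1)
a = rng.random((16, 16, 3))
b = rng.random((16, 16, 3))
zero = total_loss(a, a)
pos = total_loss(a, b)
```

Penalizing only the masked high frequencies is what targets the over-smoothing failure mode: a blurry prediction matches the target at low frequencies but loses energy above the cutoff.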
Extensive experiments were conducted on four public underwater benchmarks—TEST‑U90, U45, S16, and C60—covering a wide range of turbidity, depth, and lighting conditions. Quantitative metrics (UIQM, UCIQE, PSNR, SSIM) show that AquaDiff consistently outperforms state‑of‑the‑art traditional, CNN, GAN, and prior diffusion methods, achieving 3–5% higher UIQM/UCIQE scores and comparable or better PSNR/SSIM. Qualitative results demonstrate natural color restoration without over‑correction and sharp preservation of edges and textures.
A lightweight inference variant reduces the number of sampling steps from 100 to 50, cutting runtime by roughly 40% while maintaining visual quality, indicating potential for near‑real‑time deployment. Nonetheless, the model remains computationally intensive, and its reliance on an accurate chromatic prior means that extreme turbidity or rapidly changing illumination can degrade performance if the prior is misestimated.
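Reducing 100 training steps to 50 sampling steps is typically done by visiting an evenly spaced subset of timesteps, as in strided DDIM-style samplers. A minimal sketch (the selection rule is an assumption; the paper only states the step counts):

```python
import numpy as np

def subsample_steps(T=100, S=50):
    """Evenly spaced subset of T training timesteps for faster sampling."""
    return np.linspace(0, T - 1, S).round().astype(int)

steps = subsample_steps()  # denoiser is then run only at these timesteps
```

Halving the step count roughly halves the number of denoiser forward passes, consistent with the reported ~40% runtime reduction once fixed per-image costs are included.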
In summary, AquaDiff introduces a principled integration of physics‑based color compensation and cross‑attention conditioned diffusion, delivering superior color correction and structural fidelity for underwater images. Future work may explore more efficient attention mechanisms, end‑to‑end learned priors, and multimodal cues (e.g., depth sensors) to further enhance robustness and speed.