Filter2Noise: A Framework for Interpretable and Zero-Shot Low-Dose CT Image Denoising
Noise in low-dose computed tomography (LDCT) can obscure important diagnostic details. While deep learning offers powerful denoising, supervised methods require impractical paired data, and self-supervised alternatives often use opaque, parameter-heavy networks that limit clinical trust. We propose Filter2Noise (F2N), a novel self-supervised framework for interpretable, zero-shot denoising from a single LDCT image. Instead of a black-box network, its core is an Attention-Guided Bilateral Filter, a transparent, content-aware mathematical operator. A lightweight attention module predicts spatially varying filter parameters, making the process transparent and allowing interactive radiologist control. To learn from a single image with correlated noise, we introduce a multi-scale self-supervised loss coupled with Euclidean Local Shuffle (ELS) to disrupt noise patterns while preserving anatomical integrity. On the Mayo Clinic LDCT Challenge, F2N achieves state-of-the-art results, outperforming competing zero-shot methods by up to 3.68 dB in PSNR. It accomplishes this with only 3.6k parameters, orders of magnitude fewer than competing models, which accelerates inference and simplifies deployment. By combining high performance with transparency, user control, and high parameter efficiency, F2N offers a trustworthy solution for LDCT enhancement. We further demonstrate its applicability by validating it on clinical photon-counting CT data. Code is available at: https://github.com/sypsyp97/Filter2Noise.
💡 Research Summary
Low‑dose computed tomography (LDCT) offers essential diagnostic information while adhering to the ALARA principle, but the reduced photon budget inevitably introduces quantum and electronic noise that can mask subtle pathologies. Conventional denoising pipelines—ranging from classic non‑local means and BM3D to modern supervised deep networks—either rely on hand‑tuned parameters, assume unrealistic noise models, or demand large paired datasets that are ethically and practically infeasible in a clinical setting. Recent self‑supervised “zero‑shot” approaches (e.g., Noise2Void, Self2Self, Noise2Self) alleviate the data‑dependency issue but inherit two critical drawbacks: (1) they are built on deep, black‑box architectures that provide little insight into their decision process, undermining clinician trust; and (2) they struggle with the spatially correlated noise characteristic of reconstructed CT images, often collapsing to an identity mapping when trained on a single noisy slice.
The paper introduces Filter2Noise (F2N), a novel framework that simultaneously addresses interpretability, data‑efficiency, and computational practicality. At its core lies an Attention‑Guided Bilateral Filter (AGBF)—a fully differentiable, content‑aware extension of the classic bilateral filter. Unlike the traditional filter that uses globally fixed spatial (σ_x, σ_y) and range (σ_r) standard deviations, AGBF predicts spatially varying σ values for each image patch. These predictions are generated by a lightweight dual‑attention module:
- Feature Attention extracts patch‑wise contextual embeddings via a scaled dot‑product attention mechanism, allowing the network to distinguish between soft tissue, bone, air, and other anatomical structures.
- Sigma Attention maps the contextual embeddings to the three σ parameters using separate linear layers followed by a Softplus activation, guaranteeing positive values.
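The two attention stages above can be sketched in plain NumPy. This is an illustrative, untrained toy (the random weight matrices stand in for learned parameters, and the embedding size `d_k = 16` is an assumption, not a value from the paper); it only shows how scaled dot-product attention followed by per-sigma linear heads with Softplus yields strictly positive (σ_x, σ_y, σ_r) per patch:

```python
import numpy as np

def softplus(x):
    # Numerically stable softplus: log(1 + e^x)
    return np.logaddexp(0.0, x)

def predict_sigmas(patches, rng=None):
    """Toy sketch of F2N's dual-attention sigma prediction.
    patches: (N, D) flattened patch embeddings.
    Returns (N, 3) positive (sigma_x, sigma_y, sigma_r), one set per patch.
    Weights are random placeholders; in F2N they are learned."""
    rng = np.random.default_rng(0) if rng is None else rng
    N, D = patches.shape
    d_k = 16  # assumed embedding width, for illustration only

    # Feature Attention: scaled dot-product self-attention across patches
    Wq, Wk, Wv = (rng.standard_normal((D, d_k)) / np.sqrt(D) for _ in range(3))
    Q, K, V = patches @ Wq, patches @ Wk, patches @ Wv
    scores = Q @ K.T / np.sqrt(d_k)
    attn = np.exp(scores - scores.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)
    context = attn @ V  # (N, d_k) contextual embeddings

    # Sigma Attention: one linear head per sigma, Softplus keeps them positive
    heads = [rng.standard_normal((d_k, 1)) / np.sqrt(d_k) for _ in range(3)]
    return np.concatenate([softplus(context @ Wh) for Wh in heads], axis=1)
```

Softplus (rather than, say, ReLU) guarantees the filter standard deviations are strictly positive and differentiable everywhere, which matters because the σ maps are the only trainable quantities.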
The patch size (P = 8) balances granularity and computational load; each patch receives its own set of filter parameters, enabling strong smoothing in homogeneous regions while preserving edges in high‑contrast areas. Because the only learnable parameters are the σ maps (≈3.6 k total), the model is orders of magnitude smaller than typical U‑Net‑based zero‑shot methods, which often contain millions of weights. This extreme parameter efficiency translates into fast inference (≈30 fps for 512 × 512 slices) and easy deployment on clinical workstations or embedded devices.
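A minimal NumPy sketch of the filtering step itself may help. It assumes the per-patch σ predictions have already been upsampled to per-pixel maps (e.g. by nearest-neighbor repetition with factor P), and it fixes a window radius of 3, which is an assumption rather than a value stated in the paper:

```python
import numpy as np

def bilateral_filter(img, sigma_x, sigma_y, sigma_r, radius=3):
    """Bilateral filter with spatially varying parameters (sketch).
    img: (H, W) float image.
    sigma_x, sigma_y, sigma_r: (H, W) per-pixel maps, upsampled from
    the per-patch predictions in F2N. Naive loop version for clarity."""
    H, W = img.shape
    padded = np.pad(img, radius, mode="reflect")
    out = np.zeros_like(img)
    # Offset grid for the (2r+1) x (2r+1) window
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    for i in range(H):
        for j in range(W):
            window = padded[i:i + 2 * radius + 1, j:j + 2 * radius + 1]
            # Anisotropic spatial kernel driven by local sigma_x, sigma_y
            spatial = np.exp(-(xs**2 / (2 * sigma_x[i, j]**2)
                               + ys**2 / (2 * sigma_y[i, j]**2)))
            # Range kernel driven by local sigma_r
            range_w = np.exp(-(window - img[i, j])**2
                             / (2 * sigma_r[i, j]**2))
            w = spatial * range_w
            out[i, j] = (w * window).sum() / w.sum()
    return out
```

Large local σ values smooth aggressively in homogeneous regions; small σ_r near edges keeps the range kernel narrow, so high-contrast structures survive. A production implementation would vectorize or run this on GPU.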
Training on a single noisy LDCT slice requires a self‑supervised strategy that can break the inherent noise correlation. F2N introduces Euclidean Local Shuffle (ELS), a novel augmentation that randomly permutes pixels within small Euclidean neighborhoods (e.g., 3 × 3 blocks). ELS disrupts the spatial correlation of the noise while preserving anatomical structures, preventing the network from learning a trivial identity mapping. In parallel, a multi‑scale loss is applied: L1/L2 reconstruction errors are computed at several down‑sampled resolutions, encouraging the model to denoise both coarse and fine details. The combination of ELS and multi‑scale supervision enables robust learning from a single image without any external data.
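The two training ingredients can be sketched as follows. This is an assumed implementation: ELS is modeled here as an independent permutation inside non-overlapping k × k blocks (the paper's exact neighborhood definition may differ), and the multi-scale loss as an L2 error summed over average-pooled resolutions:

```python
import numpy as np

def euclidean_local_shuffle(img, k=2, rng=None):
    """Sketch of ELS: permute pixels inside each non-overlapping k x k
    block. Small k decorrelates noise while keeping anatomy intact."""
    rng = np.random.default_rng(0) if rng is None else rng
    H, W = img.shape
    out = img.copy()
    for i in range(0, H - H % k, k):
        for j in range(0, W - W % k, k):
            block = out[i:i + k, j:j + k].ravel().copy()
            rng.shuffle(block)
            out[i:i + k, j:j + k] = block.reshape(k, k)
    return out

def multiscale_loss(pred, target, scales=(1, 2, 4)):
    """L2 reconstruction error summed over average-pooled resolutions,
    so both coarse structure and fine detail are supervised."""
    loss = 0.0
    for s in scales:
        H, W = (pred.shape[0] // s) * s, (pred.shape[1] // s) * s
        p = pred[:H, :W].reshape(H // s, s, W // s, s).mean(axis=(1, 3))
        t = target[:H, :W].reshape(H // s, s, W // s, s).mean(axis=(1, 3))
        loss += np.mean((p - t) ** 2)
    return loss
```

Note that ELS only rearranges values locally: each block keeps exactly the same pixel multiset, so image statistics are preserved while the spatial arrangement of correlated noise is broken.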
Experimental validation is performed on the Mayo Clinic LDCT Challenge dataset. F2N achieves state‑of‑the‑art zero‑shot performance, surpassing the best competing methods by up to 3.68 dB in PSNR and delivering consistent SSIM improvements. Despite its compact size, the model matches or exceeds the visual quality of heavyweight deep‑learning baselines, as confirmed by both quantitative metrics and radiologist assessments.
To demonstrate clinical relevance beyond the benchmark, the authors evaluate F2N on photon‑counting CT (PCCT) data—a next‑generation modality with distinct noise characteristics and limited training data. The method reduces noise by more than 45 % compared with BM3D, and the spatially varying σ maps can be visualized and manually adjusted by radiologists to fine‑tune denoising in regions of interest (e.g., small nodules). This interactive control is a key differentiator: clinicians can inspect the σ heatmaps, modify them post‑training, and instantly observe the effect on the reconstructed image, thereby establishing a transparent feedback loop that is impossible with opaque deep networks.
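The interactive control described above amounts to editing the predicted σ maps after training. A hypothetical helper (the function name and interface are illustrative, not from the paper) might look like:

```python
import numpy as np

def adjust_sigma_map(sigma_r, roi_mask, factor):
    """Hypothetical post-hoc control: scale the predicted range-sigma map
    inside a radiologist-selected ROI. factor < 1 narrows the range kernel
    (less smoothing, more detail preserved, e.g. around a small nodule);
    factor > 1 smooths more aggressively. Re-running the bilateral filter
    with the edited map shows the effect immediately."""
    out = sigma_r.copy()  # leave the original prediction untouched
    out[roi_mask] *= factor
    return out
```

Because the filter itself is a fixed, transparent operator, the edited σ map fully determines the new output; there is no hidden network state to reason about.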
Limitations and future work are acknowledged. Currently, F2N operates slice by slice (2D), so inter‑slice consistency is not guaranteed; extending the AGBF to 3D kernels and incorporating temporal attention could address volumetric coherence. Moreover, the strength of ELS must be carefully calibrated: excessive shuffling may degrade fine anatomical details, suggesting the need for adaptive augmentation schedules in clinical practice.
In summary, Filter2Noise delivers a triad of advantages: (1) interpretability through explicit, visualizable filter parameters; (2) zero‑shot learning that requires only a single noisy LDCT image; and (3) parameter efficiency enabling real‑time deployment. By marrying a classic, well‑understood image processing operator with modern attention mechanisms, the authors present a compelling pathway toward trustworthy AI‑assisted CT denoising that aligns with regulatory expectations and clinical workflow demands.