Deep Unfolded BM3D: Unrolling Non-local Collaborative Filtering into a Trainable Neural Network

Reading time: 5 minutes

📝 Original Info

  • Title: Deep Unfolded BM3D: Unrolling Non-local Collaborative Filtering into a Trainable Neural Network
  • ArXiv ID: 2511.12248
  • Date: 2025-11-15
  • Authors: Author information was not provided with the paper data.

📝 Abstract

Block-Matching and 3D Filtering (BM3D) exploits non-local self-similarity priors for denoising but relies on fixed parameters. Deep models such as U-Net are more flexible but often lack interpretability and fail to generalize across noise regimes. In this study, we propose Deep Unfolded BM3D (DU-BM3D), a hybrid framework that unrolls BM3D into a trainable architecture by replacing its fixed collaborative filtering with a learnable U-Net denoiser. This preserves BM3D's non-local structural prior while enabling end-to-end optimization. We evaluate DU-BM3D on low-dose CT (LDCT) denoising and show that it outperforms classic BM3D and standalone U-Net across simulated LDCT at different noise levels, yielding higher PSNR and SSIM, especially in high-noise conditions.

💡 Deep Analysis

📄 Full Content

X-ray Computed Tomography (CT) is essential in clinical practice but faces a fundamental trade-off between image quality and radiation exposure. Low-Dose CT (LDCT) reduces patient risk but introduces strong noise that obscures anatomy and degrades diagnostic confidence. Post-processing denoising is flexible and does not depend on specific scanner models, but generalizing across dose levels without protocol-specific retraining is challenging because CT acquisition settings vary by anatomy, patient, and task.

Traditional model-based methods like BM3D [1] exploit non-local self-similarity but rely on fixed parameters. Deep learning approaches such as DnCNN [2] and GANs learn noise-to-clean mappings but lack interpretability and often require retraining per dose regime. In this study, we introduce a hybrid framework “Deep Unfolded BM3D (DU-BM3D)” that unrolls BM3D into a trainable network by replacing its fixed collaborative filtering with a learnable U-Net denoiser. This preserves BM3D’s structural prior while enabling end-to-end optimization. We demonstrate that a single DU-BM3D model, trained at one dose level, generalizes across simulated LDCT noise levels ranging from 10k to 500k photon counts, outperforming both classical BM3D and standalone U-Net.

LDCT denoising has been pursued through (i) traditional model-based algorithms, (ii) data-driven deep learning, and (iii) hybrid deep unfolding approaches.

Model-based denoising: Classical methods impose explicit image priors: Total Variation (TV) enforces edge-preserving smoothness, wavelet models use multiscale sparsity, and Non-Local Means (NLM) averages self-similar patches. Block-Matching and 3D Filtering (BM3D) [1] is particularly effective in medical imaging: it groups similar patches into 3D stacks, applies transforms (e.g., DCT), and attenuates noise via coefficient shrinkage. BM3D and its LDCT-oriented extensions (e.g., context-aware BM3D [3]) perform strongly but rely on fixed hand-tuned parameters and adapt poorly across dose levels.

Deep learning-based denoising: Deep learning methods learn direct mappings from low-dose to normal-dose CT. U-Net [4] is widely used in biomedical imaging via an encoder-decoder with skip connections that fuses semantic context with fine detail. Subsequent work incorporates self-attention [5] and non-local similarity modeling [6]. However, such models act as “black boxes”, lack explicit priors, and require large annotated datasets that are difficult to obtain in medical imaging.

Hybrid / deep unfolding: Deep Unfolding (DU) [7], [8] maps iterative optimization procedures to neural networks, exposing interpretable structure while allowing parameters to be learned. BM3D-Net [9] follows this direction by learning transform-domain filters. Our approach differs in scope: we unfold the full BM3D pipeline and replace its entire filtering stage with a learnable U-Net denoiser, yielding an end-to-end trainable model that couples BM3D’s non-local structural prior with data-driven adaptability.
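The transform-domain shrinkage at the heart of BM3D's filtering stage can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: a 3D FFT stands in for BM3D's separable 2D-DCT plus 1D-Haar transforms, and the helper name `collaborative_hard_threshold` is hypothetical.

```python
import numpy as np

def collaborative_hard_threshold(stack, sigma, lam=2.7):
    """Denoise a 3D stack of K similar p-by-p patches (shape (K, p, p))
    by transform-domain shrinkage.

    A 3D FFT stands in for BM3D's DCT/Haar transforms (a simplifying
    assumption); coefficients whose magnitude falls below a threshold
    proportional to the noise level sigma are zeroed out.
    """
    coeffs = np.fft.fftn(stack)
    # Transform coefficients of i.i.d. noise scale with sqrt(stack.size),
    # so the hard threshold is lam * sigma * sqrt(N).
    thresh = lam * sigma * np.sqrt(stack.size)
    coeffs[np.abs(coeffs) < thresh] = 0.0
    return np.real(np.fft.ifftn(coeffs))
```

Because the noise energy is spread thinly across all transform coefficients while the grouped patches share structure, thresholding removes mostly noise; this fixed shrinkage rule is exactly what DU-BM3D replaces with a learnable U-Net.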

In this study, we introduce a hybrid framework integrating model-based algorithms with deep learning. Inspired by Deep Unfolding (DU) [7], we re-interpret BM3D [1] as a learnable network. BM3D comprises three stages:

(1) block-matching, (2) collaborative filtering, and (3) aggregation. We keep the first and third stages as fixed operators, preserving their mathematical structure, and replace the second stage with a compact learnable U-Net [10], yielding a parameter-efficient architecture. Our proposed Deep Unfolded BM3D (DU-BM3D) framework consists of three steps.

Block-matching (non-learnable): Given a noisy low-dose CT input x_l ∈ ℝ^{H×W}, we apply a block-matching operator M(·) that leverages non-local self-similarity: it searches for similar patches and groups them into 3D stacks, G_l = M(x_l) (1).

Collaborative filtering (learnable): Each stack is denoised by a compact U-Net D_θ(·), producing Ĝ_n = D_θ(G_l) (2), where the encoder-decoder with skip connections learns to separate signal from noise using both local and non-local correlations.

Aggregation (non-learnable): The denoised stacks Ĝ_n are re-projected to their original 2D locations using a fixed aggregation operator A(·), which reconstructs the final image x̂_n via weighted averaging of overlapping patches: x̂_n = A(Ĝ_n) (3).
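The two fixed operators M(·) and A(·) can be sketched roughly as below. The patch size, search window, and uniform aggregation weights are simplifying assumptions (BM3D weights patches by their sparsity), and the function names are hypothetical.

```python
import numpy as np

def block_match(img, ref_yx, patch=8, search=16, k=8):
    """M(.): group the k patches most similar to the reference patch.

    Exhaustive L2 search in a window of +/- `search` pixels around the
    reference location; returns the 3D stack and the patch coordinates.
    """
    ry, rx = ref_yx
    ref = img[ry:ry + patch, rx:rx + patch]
    cands = []
    for y in range(max(0, ry - search), min(img.shape[0] - patch, ry + search) + 1):
        for x in range(max(0, rx - search), min(img.shape[1] - patch, rx + search) + 1):
            p = img[y:y + patch, x:x + patch]
            cands.append((np.sum((p - ref) ** 2), y, x))
    cands.sort(key=lambda t: t[0])  # most similar first
    coords = [(y, x) for _, y, x in cands[:k]]
    stack = np.stack([img[y:y + patch, x:x + patch] for y, x in coords])
    return stack, coords

def aggregate(shape, stacks_coords, patch=8):
    """A(.): re-project denoised stacks to their 2D locations and
    reconstruct the image by (here uniform) weighted averaging."""
    num = np.zeros(shape)
    den = np.zeros(shape)
    for stack, coords in stacks_coords:
        for p, (y, x) in zip(stack, coords):
            num[y:y + patch, x:x + patch] += p
            den[y:y + patch, x:x + patch] += 1.0
    return num / np.maximum(den, 1e-8)
```

In DU-BM3D these two operators are frozen; only the denoiser applied to each stack between them carries trainable parameters.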

The complete DU-BM3D model f_θ(·) is the composition of the three stages (1), (2), and (3): x̂_n = f_θ(x_l) = A(D_θ(M(x_l))).

For each training pair (x_l, x_n) of low-dose and normal-dose CT images, overlapping patches are extracted from both x_l and x_n. Similar patches from the noisy image are grouped via block-matching, forming non-local 3D stacks that are fed to the U-Net denoiser D_θ(·). We optimize the U-Net parameters θ to minimize the discrepancy between the predicted output x̂_n and the ground truth x_n.

Given N training pairs {(x_{l,i}, x_{n,i})}_{i=1}^{N}, we minimize the Mean Squared Error (MSE) loss: L(θ) = (1/N) Σ_{i=1}^{N} ‖f_θ(x_{l,i}) − x_{n,i}‖².
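A minimal sketch of this objective, averaging the squared error per pixel and then over the N pairs (the helper name `mse_loss` is an assumption, not from the paper):

```python
import numpy as np

def mse_loss(f_theta, pairs):
    """MSE loss L(theta) over training pairs (x_l, x_n).

    `f_theta` is the full model A(D_theta(M(.))); the per-pixel squared
    error of its prediction against the normal-dose target is averaged,
    then averaged again over the N pairs.
    """
    return float(np.mean([np.mean((f_theta(xl) - xn) ** 2)
                          for xl, xn in pairs]))
```

Since M and A carry no parameters, backpropagating this loss updates only the U-Net weights θ.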

Optimization uses the Adam optimizer with backpropagation. Since M and A are fixed, gradients flow only through the U-Net parameters θ, forcing the U-Net to learn the collaborative filtering stage within BM3D's fixed non-local structure.

We validate DU-BM3D through comparison with BM3D and U-Net baselines.


This content is AI-processed based on open access ArXiv data.
