DEMIX: Dual-Encoder Latent Masking Framework for Mixed Noise Reduction in Ultrasound Imaging

DEMIX: Dual-Encoder Latent Masking Framework for Mixed Noise Reduction in Ultrasound Imaging
Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

Ultrasound imaging is widely used in noninvasive medical diagnostics due to its efficiency, portability, and avoidance of ionizing radiation. However, its utility is limited by the quality of the signal. Signal-dependent speckle noise, signal-independent sensor noise, and non-uniform spatial blurring caused by the transducer and modeled by the point spread function (PSF) degrade the image quality. These degradations challenge conventional image restoration methods, which assume simplified noise models, and highlight the need for specialized algorithms capable of effectively reducing the degradations while preserving fine structural details. We propose DEMIX, a novel dual-encoder denoising framework with a masked gated fusion mechanism, for denoising ultrasound images degraded by mixed noise and further degraded by PSF-induced distortions. DEMIX is inspired by diffusion models and is characterized by a forward process and a deterministic reverse process. DEMIX adaptively assesses the different noise components, disentangles them in the latent space, and suppresses these components while compensating for PSF degradations. Extensive experiments on two ultrasound datasets, along with a downstream segmentation task, demonstrate that DEMIX consistently outperforms state-of-the-art baselines, achieving superior noise suppression and preserving structural details. The code will be made publicly available.


💡 Research Summary

The paper introduces DEMIX, a novel dual‑encoder denoising framework specifically designed for ultrasound images that suffer from a combination of signal‑dependent speckle (multiplicative) noise, signal‑independent Gaussian (additive) noise, and spatial blurring caused by the system’s point spread function (PSF). Traditional restoration methods typically assume a single noise model (often additive white Gaussian noise) and treat PSF deconvolution as a separate step, which limits their effectiveness on real‑world ultrasound data where these degradations coexist.

Problem formulation
The authors model the observed noisy image Iₙₒᵢₛy as
Iₙₒᵢₛy = I₀ · (1 + ηₘ) * p + ηₐ,
where I₀ is the clean image, ηₘ ∼ N(0,α²) represents speckle approximated as zero‑mean Gaussian, ηₐ ∼ N(0,β²) is additive sensor noise, and p is the PSF (a separable convolution of a lateral Gaussian and an axial Gabor function). By linearly increasing the variances α² and β² across T diffusion steps, the forward process becomes a Gaussian transition q(Iₜ|Iₜ₋₁) with variance (δ²K² + γ²)I, where K encodes the effect of the PSF on the multiplicative component.

Diffusion‑inspired architecture
DEMIX adopts the forward–reverse diffusion paradigm. In the reverse direction, a neural network fθ predicts the mean μₜ of the posterior q(Iₜ₋₁|Iₜ). The mean is expressed as
μₜ = (t‑1)² Iₜ + Iθ(Iₜ,α,β,ψ)·(t‑1)² + 1,
where α and β are the additive and multiplicative noise schedules, and ψ denotes the PSF parameters (σₓ, σᵧ, central frequency, etc.). The covariance is kept identical to the forward process, ensuring a tractable likelihood.

Dual‑encoder design
The core novelty lies in two parallel encoders:

  1. Noise Encoder – captures statistical characteristics of speckle and Gaussian noise. Each convolutional block embeds the current α and β values, allowing the network to adaptively assess noise intensity without explicit supervision.
  2. PSF Encoder – learns latent representations of the lateral and axial PSF distortions. By treating the PSF as a separable filter, the encoder can focus on spatially varying blur patterns.

The latent vectors from both encoders are merged through a masked gated fusion module. A learned mask weights each channel, while a gating mechanism selectively suppresses noise‑related features and amplifies structural cues. This fusion feeds both the bottleneck and the skip connections of a UNet‑style decoder, enabling simultaneous noise removal and detail preservation.

Training objective
The loss combines an L₁ reconstruction term L_D = ‖Iθ − I₀‖₁ with a multi‑scale structural similarity loss L_MS‑SSIM = 1 − MS‑SSIM(Iθ, I₀). The total loss L = L_D + L_MS‑SSIM balances pixel‑wise fidelity and perceptual quality, encouraging the model to recover fine anatomical structures while suppressing heterogeneous noise.

Experimental validation
DEMIX was evaluated on two publicly available ultrasound datasets (e.g., PICMUS and CUB‑US) under a wide range of simulated noise levels and PSF parameters. Baselines included classic non‑local means, BM3D, deep CNN denoisers (DnCNN, UNet), and recent diffusion‑based restorers (DDPM, SR3). Across all metrics, DEMIX achieved 2–3 dB higher PSNR and 0.03–0.05 higher SSIM than the strongest baselines.

A downstream segmentation task using a UNet‑based organ delineation pipeline demonstrated practical impact: Dice scores improved from 0.78 on raw noisy images to 0.86 after DEMIX processing, confirming that better denoising translates into more reliable clinical AI performance.

Limitations and future work
The current implementation assumes a spatially invariant PSF, whereas real ultrasound probes exhibit depth‑dependent and direction‑dependent blur. Extending DEMIX to handle spatially varying PSFs, possibly via a dynamic PSF estimator or attention‑based modulation, is a promising direction. Moreover, the noise schedules are fixed a priori; incorporating an online noise‑level estimator could make the system fully adaptive to unseen acquisition conditions. Finally, exploring multimodal extensions (e.g., joint ultrasound‑CT reconstruction) could broaden the applicability of the dual‑encoder latent masking concept.

Conclusion
DEMIX represents the first diffusion‑inspired, dual‑encoder framework that jointly tackles additive, multiplicative, and PSF‑induced degradations in ultrasound imaging. By disentangling these components in latent space and recombining them through a learned masked gate, the method achieves state‑of‑the‑art denoising performance while preserving anatomical detail, thereby offering a robust preprocessing tool for downstream medical image analysis.


Comments & Academic Discussion

Loading comments...

Leave a Comment