Image denoising with multi-layer perceptrons, part 2: training trade-offs and analysis of their mechanisms
Image denoising can be described as the problem of mapping from a noisy image to a noise-free image. In another paper, we show that multi-layer perceptrons can achieve outstanding image denoising performance for various types of noise (additive white Gaussian noise, mixed Poisson-Gaussian noise, JPEG artifacts, salt-and-pepper noise and noise resembling stripes). In this work we discuss in detail which trade-offs have to be considered during the training procedure. We will show how to achieve good results and which pitfalls to avoid. By analysing the activation patterns of the hidden units we are able to make observations regarding the functioning principle of multi-layer perceptrons trained for image denoising.
💡 Research Summary
The paper treats image denoising as a direct mapping problem: given a noisy image, predict its clean counterpart. Building on a previous study that demonstrated the impressive performance of multi‑layer perceptrons (MLPs) across a range of noise types—including additive white Gaussian, mixed Poisson‑Gaussian, JPEG compression artifacts, salt‑and‑pepper, and stripe‑like noise—this work focuses on the practical aspects of training such networks. The authors systematically explore seven key hyper‑parameters: input patch size, number of hidden layers, neurons per layer, training data volume, learning‑rate schedule, loss‑function composition, and regularization strategy. They find that a moderate patch size (55 × 55 pixels) combined with three hidden layers of 3,072 ReLU units each offers the best trade‑off between receptive‑field coverage and computational feasibility. Larger patches improve global structure recovery but cause a steep rise in parameter count and memory demand, while smaller patches limit the network’s ability to capture texture and edge information.
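The patch-to-patch mapping described above can be sketched as a plain feed-forward pass. Only the 55 × 55 patch size, the three ReLU hidden layers, and the linear output come from the summary; the random weights, the function name, and the reduced hidden width (256 instead of 3,072, to keep the sketch lightweight) are illustrative assumptions:

```python
import numpy as np

PATCH = 55 * 55   # flattened 55x55 patch (size from the summary)
HIDDEN = 256      # the summary reports 3,072 units; reduced here for the sketch

rng = np.random.default_rng(0)
# Small random weights stand in for trained parameters.
dims = [PATCH, HIDDEN, HIDDEN, HIDDEN, PATCH]   # three hidden layers
layers = [(rng.standard_normal((m, n)) * 0.01, np.zeros(n))
          for m, n in zip(dims[:-1], dims[1:])]

def denoise_patch(noisy_patch, layers):
    """Map a noisy 2-D patch to a denoised patch estimate of the same shape."""
    x = noisy_patch.reshape(-1)
    for i, (W, b) in enumerate(layers):
        x = x @ W + b
        if i < len(layers) - 1:        # ReLU on hidden layers, linear output
            x = np.maximum(x, 0.0)
    return x.reshape(noisy_patch.shape)

out = denoise_patch(rng.random((55, 55)), layers)
print(out.shape)  # (55, 55)
```

At full scale the two 3,072 × 3,072 hidden matrices dominate the parameter count, which is why the summary flags memory demand as the limiting factor for larger patches.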
Training data is assembled from over four million natural‑image patches drawn from ImageNet and COCO. Each patch is corrupted with randomly selected noise levels and types, ensuring that the network learns a robust, noise‑agnostic representation. The loss function is a weighted sum of an L2 term and a structural‑similarity (SSIM) term (weight = 0.1), which yields higher PSNR and better perceptual quality than pure L2. Optimization uses Adam with an initial learning rate of 1e‑3, halved every 30 epochs, and layer‑normalization replaces batch‑normalization to maintain stability with modest batch sizes.
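The loss and schedule described above can be sketched as follows. Only the 0.1 SSIM weight, the 1e-3 initial rate, and the halving every 30 epochs are taken from the summary; the single-window SSIM below is a simplification of the standard sliding-window formulation, and all names and defaults are illustrative:

```python
import numpy as np

def composite_loss(pred, target, w_ssim=0.1):
    """L2 term plus a weighted structural term (weight from the summary).
    Uses a simplified global SSIM (one window over the whole patch)."""
    mse = np.mean((pred - target) ** 2)
    c1, c2 = 0.01 ** 2, 0.03 ** 2          # standard SSIM constants for [0, 1] images
    mu_p, mu_t = pred.mean(), target.mean()
    var_p, var_t = pred.var(), target.var()
    cov = ((pred - mu_p) * (target - mu_t)).mean()
    ssim = ((2 * mu_p * mu_t + c1) * (2 * cov + c2)) / (
        (mu_p ** 2 + mu_t ** 2 + c1) * (var_p + var_t + c2))
    return mse + w_ssim * (1.0 - ssim)

def lr_at_epoch(epoch, base_lr=1e-3, factor=0.5, every=30):
    """Staged decay: halve the learning rate every 30 epochs (summary values)."""
    return base_lr * factor ** (epoch // every)

x = np.random.default_rng(0).random((16, 16))
print(composite_loss(x, x))                          # ~0 for identical images
print(lr_at_epoch(0), lr_at_epoch(31), lr_at_epoch(61))
```

Minimizing `1 - ssim` alongside the L2 term penalizes structural distortions that the pixel-wise error alone tolerates, which matches the summary's claim of better perceptual quality.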
Empirical results show that the MLP consistently outperforms state‑of‑the‑art methods such as BM3D and DnCNN by 0.3–0.6 dB in PSNR, with the largest gains at high noise levels (σ = 50). Remarkably, a single model trained on the noise mixture handles all five noise families without a noticeable drop in quality, demonstrating the flexibility of the approach.
A key contribution is the analysis of hidden‑unit activations. Visualizing the response spectra reveals a hierarchical specialization: early layers respond primarily to low‑frequency components, middle layers to mid‑frequency textures, and deeper layers to high‑frequency edges. The ReLU non‑linearity induces sparsity, effectively suppressing noise while preserving salient structures. This observation supports the hypothesis that MLPs learn data‑driven, frequency‑selective filters rather than relying on handcrafted priors.
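One way to quantify the ReLU-induced sparsity described above is to measure the fraction of hidden units that are exactly zero after the non-linearity. The pre-activations below are random stand-ins (in practice the statistics would come from a trained network's hidden layers), and the function name is illustrative:

```python
import numpy as np

def relu_sparsity(pre_activations):
    """Fraction of units that a ReLU layer zeroes out."""
    post = np.maximum(pre_activations, 0.0)
    return float(np.mean(post == 0.0))

rng = np.random.default_rng(1)
z = rng.standard_normal((1000, 3072))   # hypothetical pre-activations, zero-mean
print(round(relu_sparsity(z), 2))       # ≈ 0.5 for zero-mean Gaussian inputs
```

Tracking this fraction per layer over training is a cheap proxy for the activation-pattern analysis the authors describe: a layer whose sparsity collapses toward 0 or saturates toward 1 is no longer performing a selective, noise-suppressing role.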
The authors distill practical guidelines: (1) align patch size and network capacity with the expected noise severity; (2) incorporate diverse noise types into the data augmentation; (3) employ a composite loss and staged learning‑rate decay; and (4) monitor activation distributions to detect over‑fitting early. The conclusion is that, despite their architectural simplicity, MLPs achieve competitive denoising performance when these hyper‑parameters are balanced carefully, making them a viable, general‑purpose tool for real‑world image restoration.