A Novel Cost Function for Despeckling using Convolutional Neural Networks
Removing speckle noise from SAR images is still an open issue. It is well known that interpreting SAR images is challenging, and despeckling algorithms are necessary to improve the extraction of information. Urban environments make this task harder because of the variety of structures and object scales involved. Following the recent spread of deep learning methods across remote sensing applications, this work proposes a despeckling algorithm based on convolutional neural networks. The network is trained on simulated SAR data. The paper focuses mainly on a cost function that accounts for both the spatial consistency of the image and the statistical properties of the noise.
💡 Research Summary
The paper addresses the challenging problem of speckle reduction (despeckling) in synthetic aperture radar (SAR) images, with a particular focus on urban scenes that contain structures at many different scales. Speckle is modeled as multiplicative noise whose intensity follows a Gamma distribution parameterized by the number of looks ($L$). Traditional despeckling methods divide into local filters (e.g., Lee, Enhanced Lee, Kuan), which rely on statistics of neighboring pixels, and non-local filters (e.g., Probabilistic Patch-Based (PPB), SAR-BM3D, NL-SAR), which search for similar patches over a larger window. While effective, these approaches often trade noise suppression against preservation of fine details, especially in complex urban environments.
Recent advances in deep learning have motivated the use of convolutional neural networks (CNNs) for SAR despeckling. A major difficulty is the lack of clean reference images for real SAR data, which forces researchers to either mimic existing filters or train on simulated data where clean images are artificially corrupted with speckle. This work follows the latter strategy, using clean images from three public datasets (UCID, BSD, and a Google‑Maps collection of urban scenes) and adding simulated speckle according to the Gamma model.
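The Gamma speckle model used to build the simulated training data can be sketched in a few lines of NumPy. `add_speckle` below is an illustrative helper under the stated model (mean-one Gamma intensity noise with shape $L$), not the authors' code:

```python
import numpy as np

def add_speckle(clean, looks=1, rng=None):
    """Corrupt a clean intensity image with simulated multiplicative speckle.

    The speckle N is drawn from a Gamma distribution with shape `looks`
    and scale 1/looks, so E[N] = 1 and Var[N] = 1/looks (the standard
    L-look intensity model). Returns the noisy image Y = X * N and N.
    """
    rng = np.random.default_rng(rng)
    noise = rng.gamma(shape=looks, scale=1.0 / looks, size=clean.shape)
    return clean * noise, noise

# Example: single-look speckle on a flat region of intensity 100
clean = np.full((256, 256), 100.0)
noisy, n = add_speckle(clean, looks=1, rng=0)
```

For single-look data (`looks=1`) the noise variance equals 1, which is why single-look SAR images look so grainy; increasing `looks` tightens the noise around its unit mean.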
The proposed network consists of ten 3 × 3 convolutional layers. The intermediate layers (all but the first and the last) contain 64 filters and are followed by ReLU activations; no pooling or batch-normalization layers are used, keeping the architecture simple and computationally efficient. The network receives a single-band noisy SAR image $Y$ and outputs a filtered image $\hat{X}$. Importantly, the loss function is designed to enforce two complementary objectives: (1) spatial fidelity and (2) statistical fidelity of the speckle component.
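One consequence of this all-convolutional design is easy to check: ten stacked 3 × 3 stride-1 convolutions give each output pixel a 21 × 21 receptive field, comfortably inside the 65 × 65 training patches. A small helper (hypothetical, not from the paper) computes this:

```python
def receptive_field(num_layers, kernel=3, stride=1):
    """Receptive field of a stack of identical conv layers.

    With stride 1, each layer grows the receptive field by (kernel - 1),
    so ten 3x3 layers give 1 + 10 * 2 = 21.
    """
    rf, jump = 1, 1
    for _ in range(num_layers):
        rf += (kernel - 1) * jump
        jump *= stride
    return rf

print(receptive_field(10))  # prints 21
```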
The loss is a weighted sum $C = \lambda C_1 + C_2$.
- $C_2$ is the mean-squared error (MSE) between the network output $\hat{X}$ and the clean reference $X$, encouraging accurate reconstruction of image content.
- $C_1$ is a single-band adaptation of Spectral Information Divergence (SID), originally introduced for hyperspectral analysis, applied to the estimated speckle ratio $\hat{N} = Y/\hat{X}$ and the true speckle $N$. SID measures the divergence between the probability distributions of the estimated and true speckle, thus preserving the statistical properties of the noise.
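Under these definitions the combined loss can be sketched in NumPy as below. SID is taken as the symmetric sum of the two relative entropies; the per-image normalization into probability vectors and the default $\lambda$ are our assumptions for illustration, and the paper's exact implementation may differ:

```python
import numpy as np

def sid(a, b, eps=1e-12):
    """Symmetric Spectral Information Divergence between two non-negative
    images, treated as probability distributions after normalization:
    SID(p, q) = KL(p || q) + KL(q || p)."""
    p = a.ravel() / (a.sum() + eps) + eps
    q = b.ravel() / (b.sum() + eps) + eps
    return float(np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))

def despeckle_loss(x_hat, x, y, lam=1.0, eps=1e-12):
    """C = lam * C1 + C2: SID on the speckle ratio plus MSE on the image."""
    c2 = float(np.mean((x_hat - x) ** 2))   # spatial fidelity (MSE)
    n_hat = y / (x_hat + eps)               # estimated speckle Y / X_hat
    n_true = y / (x + eps)                  # true speckle N = Y / X
    c1 = sid(n_hat, n_true)                 # statistical fidelity
    return lam * c1 + c2
```

When the network output equals the clean reference, both terms vanish, so a perfect reconstruction scores zero under either objective.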
Training is performed with stochastic gradient descent (SGD) with momentum, a learning rate of $2\times10^{-6}$, and patches of size 65 × 65. The dataset comprises 30 000 training patches and 12 000 validation patches drawn from the three image collections, with a particular emphasis on the urban Google-Maps set to ensure the network learns to handle complex structures.
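Patch preparation can be illustrated with a simple non-overlapping tiling. The actual sampling strategy (overlap, ordering, augmentation) is not detailed in the summary, so `extract_patches` is only a hypothetical stand-in:

```python
import numpy as np

def extract_patches(image, patch=65, stride=65):
    """Cut a 2-D image into patch x patch tiles at the given stride.

    With stride == patch this yields non-overlapping tiles; a smaller
    stride would produce overlapping training patches instead.
    """
    h, w = image.shape
    tiles = [image[i:i + patch, j:j + patch]
             for i in range(0, h - patch + 1, stride)
             for j in range(0, w - patch + 1, stride)]
    return np.stack(tiles)

img = np.arange(130 * 130, dtype=float).reshape(130, 130)
tiles = extract_patches(img)  # a 2 x 2 grid of 65 x 65 tiles
```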
Experimental evaluation is carried out on both simulated SAR data (two test clips) and a real SAR image. Quantitative performance is measured using the M‑index, a metric that jointly accounts for Equivalent Number of Looks (ENL) in homogeneous regions and the homogeneity of ratio images, with lower values indicating better performance. On the simulated clips, the proposed method achieves M‑index values of 5.59 and 6.55, substantially lower than PPB’s 10.65 and 10.27, respectively. Visual inspection confirms that the CNN preserves fine details such as cars, trees, and roof patterns that PPB tends to blur.
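The ENL component of the M-index is standard and easy to reproduce: for a homogeneous intensity region, ENL = mean² / variance, and pure $L$-look speckle gives ENL ≈ $L$. The sketch below checks this on simulated 4-look noise (the full M-index, which also scores the ratio image, is not reproduced here):

```python
import numpy as np

def enl(region):
    """Equivalent Number of Looks of a homogeneous intensity region.

    ENL = mean^2 / variance; it is scale-invariant, so the mean
    backscatter level of the region does not matter.
    """
    m = region.mean()
    return float(m * m / region.var())

# A flat area of mean intensity 50 corrupted by simulated 4-look speckle
rng = np.random.default_rng(0)
flat = 50.0 * rng.gamma(shape=4.0, scale=0.25, size=(200, 200))
```

Higher ENL after filtering means stronger smoothing in flat areas, which is why ENL alone rewards over-smoothing and must be balanced by ratio-image statistics, as the M-index does.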
On a real SAR image, the proposed method yields an M‑index of 8.36 compared to PPB’s 7.29; despite the slightly higher index, the CNN’s output retains more structural detail, while PPB produces an over‑smoothed result. Ratio images further illustrate that PPB suppresses a large amount of high‑frequency information, whereas the CNN maintains a more faithful representation of the underlying speckle statistics, albeit with some difficulty in completely removing very strong scatterers.
The authors conclude that integrating a statistical divergence term (SID) with a conventional MSE loss enables a CNN to simultaneously respect the spatial structure of SAR images and the known statistical behavior of speckle. This dual‑objective loss mitigates the over‑smoothing common in traditional filters and improves detail preservation across multiple scales. Limitations include residual artifacts around strong scatterers and the reliance on simulated data for supervised training. Future work will explore unsupervised or self‑supervised training regimes to eliminate the need for clean references, potentially extending the approach to fully real‑world SAR datasets.
Overall, the paper demonstrates that a carefully crafted loss function, grounded in both image‑domain and statistical‑domain considerations, can significantly enhance deep‑learning‑based despeckling for urban SAR imagery, offering a promising direction for preprocessing in downstream remote‑sensing tasks such as classification and object detection.