Deep Graph-Convolutional Image Denoising

Diego Valsesia, Giulia Fracastoro, Enrico Magli

Abstract: Non-local self-similarity is well known to be an effective prior for the image denoising problem. However, little work has been done to incorporate it in convolutional neural networks, which surpass non-local model-based methods despite only exploiting local information. In this paper, we propose a novel end-to-end trainable neural network architecture employing layers based on graph convolution operations, thereby creating neurons with non-local receptive fields. The graph convolution operation generalizes the classic convolution to arbitrary graphs. In this work, the graph is dynamically computed from similarities among the hidden features of the network, so that the powerful representation learning capabilities of the network are exploited to uncover self-similar patterns. We introduce a lightweight Edge-Conditioned Convolution which addresses vanishing gradient and over-parameterization issues of this particular graph convolution. Extensive experiments show state-of-the-art performance with improved qualitative and quantitative results on both synthetic Gaussian noise and real noise.

Keywords: Graph neural networks, image denoising, graph convolution

I. INTRODUCTION

Denoising is a staple among image processing problems and its importance cannot be overstated. Despite decades of work and countless methods, it still remains an active research topic because its purpose goes far beyond generating visually pleasing pictures. Denoising is fundamental to enhance the performance of higher-level computer vision tasks such as classification, segmentation or object recognition, and is a building block in the solution to various problems [1]–[4].
The recent successes achieved by convolutional neural networks (CNNs) extended to this problem as well and have brought a new generation of learning-based methods that is redefining the state of the art. However, it is important to learn the lessons of past research on the topic and integrate them with the new deep learning techniques. In particular, classic denoising methods such as BM3D [5] showed the importance of exploiting non-local self-similar patterns. However, the convolution operation underpinning all CNN architectures [6]–[9] is unable to capture such patterns because of the locality of the convolution kernels. Only very recently, some works started addressing the integration of non-local information into CNNs [10]–[13].

This paper presents a denoising neural network, called GCDN, where the convolution operation is generalized by means of graph convolution, which is used to create layers with hidden neurons having non-local receptive fields that successfully capture self-similar information. Graph convolution is a generalization of the traditional convolution operation when the data are represented as sitting over the vertices of a graph. In this work, every pixel is a vertex and the edges in the graph are dynamically computed from the similarities in the feature space of the hidden layers of the network. This allows us to exploit the powerful representational features of neural networks to discover and use latent self-similarities.

The authors are with Politecnico di Torino, Department of Electronics and Telecommunications, Italy. Email: {name.surname}@polito.it. This research has been partially funded by the SmartData@PoliTO center for Big Data and Machine Learning technologies. We thank Nvidia for donating a Quadro P6000 GPU.
With respect to other CNNs integrating non-local information for the denoising task, the proposed approach has several advantages: i) it creates an adaptive receptive field for the pixels in the hidden layers by dynamically computing a nearest-neighbor graph from the latent features; ii) it creates dynamic non-local filters where feature vectors that may be spatially distant but close in a latent vector space are aggregated with weights that depend on the features themselves; iii) the aggregation weights are estimated by a fully-learned operation, implemented as a subnetwork, instead of a predefined parameterized operation, allowing more generality and adaptability. Starting from the Edge-Conditioned Convolution (ECC) definition of graph convolution, we propose several improvements to address stability, over-parameterization and vanishing gradient issues. Finally, we also propose a novel neural network architecture which draws from an analogy with an unrolled regularized optimization method.

A preliminary version of this work appeared in [14]. The work in this paper differs from it in several respects. The architecture of the network is improved by drawing an analogy with proximal gradient descent methods, and it is significantly deeper. Moreover, we propose several solutions to address the ECC over-parameterization and computational issues. Finally, we also present an in-depth analysis of the network behavior and greatly extended experimental results.

This paper is structured as follows. Sec. II provides some background material on graph-convolutional neural networks and state-of-the-art denoising approaches. Sec. III describes the proposed method. Sec. IV analyzes the characteristics of the proposed method and experimentally compares it with state-of-the-art approaches. Finally, Sec. V draws some conclusions.

II. RELATED WORK

A. Graph neural networks

Inspired by the overwhelming success of deep neural networks in computer vision, a significant research effort has recently been made in order to develop deep learning methods for data that naturally lie on irregular domains. One case is when the data domain can be structured as a graph and the data are defined as vectors on the nodes of this graph. Extending CNNs from signals with a regular structure, such as images and video, to graph-structured signals is not straightforward, since even simple operations such as shifts are undefined over graphs.

One of the major challenges in this field is defining a convolution-like operation for this kind of data. Convolution has a key role in classical CNNs, thanks to its properties of locality, stationarity and compositionality, which match prior knowledge on many kinds of data well and thus allow effective weight reuse. For this reason, defining an operation with similar characteristics for graph-structured data is of primary importance in order to obtain effective graph neural networks. The literature has identified two main classes of approaches to tackle this problem, namely spectral and spatial. In the former case [15]–[17], the convolution is defined in the spectral domain through the graph Fourier transform [18]. Fast polynomial approximations [16] have been proposed in order to obtain an efficient implementation of this operation. Graph-convolutional neural networks (GCNNs) with this convolution operator have been successfully applied in problems of semi-supervised node classification and link prediction [17], [19]. The main drawback of these methods is that the graph is supposed to be fixed and it is not clear how to handle the cases where the structure varies. The latter class of approaches overcomes this issue by defining the convolution operator in the spatial domain [20]–[25]. In this case, the convolution is performed by local aggregations, i.e., a weighted combination of the signal values over neighboring nodes. Since in this case the operation is defined at a neighborhood level, the convolution remains well-defined even when the graph structure varies. Many of the spatial approaches present in the literature [22]–[24] perform local aggregations with scalar weights. Instead, [20] proposes to weight the contributions of the neighbors using edge-dependent matrices. This makes the convolution a more general function, increasing its descriptive power. For this reason, in this paper we employ the convolution operator proposed in [20]. However, in order to obtain an efficient operation, we introduce several approximations that reduce its computational complexity and memory occupation, and mitigate vanishing gradient issues that arise when trying to build very deep architectures.

B. Image denoising

The literature on image denoising is vast, as it is one of the most classic problems in image processing. Focusing on the recent developments, we can broadly define two categories of methods: model-based approaches and learning-based approaches.

Model-based approaches traditionally focused on defining hand-crafted priors to carefully capture the salient features of natural images. Early works in this category include total variation minimization [26] and bilateral filtering [27]. Non-local means [28] introduced the idea of non-local averaging according to the similarity of local neighborhoods. The popular BM3D [5] expanded on the idea by collaborative filtering of the matched patches. WNNM [29] used nuclear norm minimization to enforce a low-rank prior. Finally, some works recently introduced graph-based regularizers [30] to enforce a measure of smoothness of the signal across the edges of a graph of patch or pixel similarities. Many of the most successful model-based approaches are non-local, i.e., they exploit the concept of self-similarity among structures in the image beyond the local neighborhood.
Learning-based approaches use training data to learn a model for natural images. The popular K-SVD algorithm [31] learns a dictionary in which natural patches have a sparse representation, and therefore casts image denoising as a sparse coding problem on this learned dictionary. The TNRD method [32] uses a nonlinear reaction diffusion model with trainable filters. An early work with neural networks [33] used a multilayer perceptron discriminatively trained on synthetic Gaussian noise and showed significant improvements over model-based methods. More recently, CNNs have achieved remarkable performance. Zhang et al. [6] showed that the residual structure and the use of batch normalization [34] in their DnCNN greatly help the denoising task. Following the DnCNN, many other architectures have been proposed, such as RED [7], MemNet [8] and a CNN working on wavelet coefficients [9]. However, those CNN-based methods are limited by the local nature of the convolution operation, which is unable to increase the receptive field of a neuron-pixel to model non-local image features. This means that CNNs are unable to exploit the self-similar patterns that were proven to be highly successful in model-based methods.

Very recently, a few works started addressing this issue by trying to incorporate non-local information in a CNN. NN3D [10] uses a global post-processing stage based on a non-local filter after the output of a denoising CNN. This stage performs block matching and filtering over the whole image denoised by the CNN. This is clearly suboptimal as the non-local information does not contribute to the training of the CNN. UNLNet [11] introduces a trainable non-local layer which collaboratively filters image blocks. However, performance is limited by the selection of matching blocks from the noisy input image instead of the feature space, and ultimately UNLNet does not improve over the performance of the simpler DnCNN.
N$^3$Net [12] introduces a continuous nearest-neighbor relaxation to create a non-local layer. Finally, NLRN [13] proposes a non-local module that uses the distances among hidden feature vectors of a search window around the pixel of interest to aggregate such vectors and return the output features of the pixel. However, there are significant differences with respect to the work in this paper. First, they use all the pixels in the search window instead of only a number of nearest neighbors, which means that their receptive field cannot dynamically adapt to the content of the image. Then, while in both works the feature aggregation weights are dynamically computed from the features themselves, NLRN uses an explicitly-parameterized function with learnable parameters, in contrast to this work where the function is fully learned as a dedicated sub-network. These choices increase the adaptivity of the proposed non-local operations, which results in better performance around edges.

III. PROPOSED DENOISER

A. Overview

An overview of the proposed graph-convolutional denoiser network (GCDN) can be seen in Fig. 1. The structure will be explained in more detail in Sec. III-D, where an analogy is drawn between unrolled proximal gradient descent with a graph total variation regularizer and the proposed network architecture. At first glance, the network has a global input-output residual connection whereby the network learns to estimate the noise rather than successively clean the image. This has been shown [6] to improve training convergence for the denoising problem. The main feature of the proposed network is the use of graph-convolutional layers where the graphs are dynamically computed from the feature space. The graph-convolutional layer, described in Sec. III-B, creates a non-local receptive field for each pixel-neuron, so that pixels that are spatially distant but similar in the feature space created by the network can be merged.

Figure 1. GCDN architecture.

An important block of the proposed network is the preprocessing stage at the input. It can be noticed that the first layers of the network are classic 2D convolutions rather than graph convolutions. This is done to create an embedding over a receptive field larger than a single pixel and stabilize the graph construction operation, which would otherwise be affected by the input noise. The preprocessing stage has three parallel branches that operate on multiple scales, in a fashion similar to the architectures in [35] and [36]. The multiscale features are extracted by a sequence of three convolutional layers with filters of size $3 \times 3$, $5 \times 5$, and $7 \times 7$, depending on the branch. After a final graph-convolutional layer, the features are concatenated.

The remaining network layers are grouped into an HPF block and multiple LPF blocks, named after the analogy with highpass and lowpass graph filters described in Sec. III-D. These blocks have an initial $3 \times 3$ convolutional layer followed by three graph-convolutional layers sharing the same graph constructed from the output of the convolutional layer. All layers are interleaved by Batch Normalization operations [34] and leaky ReLU nonlinearities. Notice that the LPF blocks themselves have a residual connection to help backpropagation, as in ResNet architectures [37]. The final layer is a graph-convolutional layer mapping from the feature space to the image space.

B. Graph-convolutional layer

The operation performed by the graph-convolutional layer is summarized in Fig. 2. The two inputs to the graph-convolutional layer are the feature vectors $H^l \in \mathbb{R}^{F^l \times N}$ associated to the $N$ image pixels at layer $l$ and the adjacency matrix of a graph connecting image pixels.
In this work, the graph is constructed as a $K$-nearest neighbor graph in the feature space. For each pixel, the Euclidean distances between its feature vector and the feature vectors of pixels inside a search window are computed and an edge is drawn between the pixel and the $K$ pixels with smallest distance. Using this method, we obtain a $K$-regular graph $\mathcal{G}^l(\mathcal{V}, \mathcal{E}^l)$, where $\mathcal{V}$ is the set of vertices with $|\mathcal{V}| = N$ and $\mathcal{E}^l \subseteq \mathcal{V} \times \mathcal{V}$ is the set of edges. We also assume that the edges of $\mathcal{G}^l$ are labeled, i.e., there exists a function $L : \mathcal{E}^l \to \mathbb{R}^{F^l}$ that assigns a label to each edge. In this work, we define the edge labeling function as the difference between the two feature vectors, i.e., $L(i,j) = H^l_j - H^l_i = d^{l,j \to i}$.

Figure 2. Graph-convolutional layer. The operation has a receptive field with a local component ($3 \times 3$ 2D convolution) and a non-local component (pixels selected as nearest neighbors in the feature space).

A classic $3 \times 3$ local convolution processes the local neighborhood to provide its estimate of the output feature vector for the current pixel, while the feature vectors of the non-local pixels connected by the graph are aggregated by means of the edge-conditioned convolution (ECC) [20]. Notice that the 8 local neighbors of the pixel are excluded from graph construction as they are already used by the local convolution. The non-local aggregation is computed as:

$$H^{l+1,\mathrm{NL}}_i = \sum_{j \in \mathcal{S}^l_i} \gamma^{l,j \to i} \frac{\mathcal{F}^l_{w^l}\left(d^{l,j \to i}\right) H^l_j}{|\mathcal{S}^l_i|} = \sum_{j \in \mathcal{S}^l_i} \gamma^{l,j \to i} \frac{\Theta^{l,j \to i} H^l_j}{|\mathcal{S}^l_i|}, \tag{1}$$

where $\mathcal{F}^l_{w^l} : \mathbb{R}^{F^l} \to \mathbb{R}^{F^{l+1} \times F^l}$ is a fully-connected network that takes as input the edge labels and outputs the corresponding weight matrix $\Theta^{l,j \to i} = \mathcal{F}^l_{w^l}(L(i,j)) \in \mathbb{R}^{F^{l+1} \times F^l}$, $w^l$ are the weights parameterizing network $\mathcal{F}^l$, and $\mathcal{S}^l_i$ is the set of neighbors of node $i$ in the graph $\mathcal{G}^l$. The scalar $\gamma^{l,j \to i}$ is an edge-attention term computed as:

$$\gamma^{l,j \to i} = \exp\left(-\|d^{l,j \to i}\|_2^2 / \delta\right), \tag{2}$$

where $\delta$ is a cross-validated hyper-parameter. This term is reminiscent of the edge attention mechanism from the graph neural network literature [38] and it serves the purpose of stabilizing training by underweighting the edges that connect nodes with distant feature vectors. Note that this term could, in principle, be learned by the $\mathcal{F}$ network, but we found that decoupling it and making it explicitly dependent on feature distances in an exponential way accelerated and stabilized training. Also notice that in Sec. IV we show that this term alone, i.e., without the weight matrices $\Theta$, is not powerful enough to reach good performance. Moreover, it is worth mentioning that the edge weights $\Theta$ and the edge-attention term $\gamma$ depend only on the edge labels. This means that two pairs of nodes with the same edge labels will have the same weights, resulting in a behaviour similar to weight sharing in classical CNNs.

Finally, we combine the feature vector estimated by the non-local aggregation with the one produced by the local convolution to provide the output features as follows

$$H^{l+1}_i = \frac{H^{l+1,\mathrm{NL}}_i + H^{l+1,\mathrm{L}}_i}{2} + b^l,$$

where $H^{l+1,\mathrm{L}}_i$ is the output of the $3 \times 3$ local convolution for the node $i$ and $b^l \in \mathbb{R}^{F^l}$ is the bias.

The advantages of the ECC with respect to other definitions of graph convolution are threefold: i) the edge weights depend on the edge label, ii) it allows computing an affine transformation along every edge, and iii) the edge weight function is highly general since it does not have a predefined structure. By making the edge weights depend on the input features, the ECC implements an adaptive filter which can be more complex than the non-adaptive local filters.
Moreover, the second advantage is due to the fact that $\Theta^{l,j \to i}$ is an edge-dependent matrix, making the convolution operation more general than other non-local aggregation methods using scalar edge weights. Among such methods we can find GCN [17], GIN [22], MoNet [23], and FeastNet [24]. Finally, the $\mathcal{F}$ function is a general function which can be learned to be the optimal one for the denoising task by the function approximation capability of the subnetwork implementing it. This is in contrast with other methods where the function predicting the edge weights is fixed with some learnable parameters. For example, FeastNet [24] employs scalar edge weights computed using the following function

$$f(H^l_i, H^l_j) \propto \exp\left(u^T H^l_i + v^T H^l_j + c\right),$$

where $u, v \in \mathbb{R}^{F^l}$ and $c \in \mathbb{R}$ are learnable parameters. Instead, MoNet [23] employs a Gaussian kernel as follows

$$f(H^l_i, H^l_j) = \exp\left(-\frac{1}{2}\left(d^{l,j \to i} - \mu\right)^T \Sigma^{-1} \left(d^{l,j \to i} - \mu\right)\right),$$

where $\Sigma \in \mathbb{R}^{F^l \times F^l}$ and $\mu \in \mathbb{R}^{F^l}$ are learnable parameters. NLRN [13] also uses a Gaussian kernel to perform non-local aggregations. We can consider this operation as a graph convolution where each pixel is connected to all the other pixels in its search window and the edge weights are defined as follows

$$f(H^l_i, H^l_j) = \frac{\exp\left(H^{lT}_i W_\theta^T W_\phi H^l_j\right)}{\sum_{j \in \mathcal{S}_i} \exp\left(H^{lT}_i W_\theta^T W_\phi H^l_j\right)} W_g,$$

where $W_\theta, W_\phi \in \mathbb{R}^{t \times F^l}$ and $W_g \in \mathbb{R}^{F^{l+1} \times F^l}$ are learnable parameters.

Figure 3. Circulant approximation of a fully-connected layer.

C. Lightweight Edge-Conditioned Convolution

As seen in the previous section, the function $\mathcal{F}$ has a key role in the ECC because it defines the weights for the neighborhood aggregation. In the original definition of ECC [20], the function $\mathcal{F}$ is implemented as a two-layer fully connected network. This definition raises some relevant issues. In the following, we describe these issues in detail and present two possible solutions.
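To make the size of the original two-layer formulation concrete, the short sketch below counts its weights. It assumes, for illustration only, that the first layer maps $F^l$ inputs to $F^l$ hidden units, that biases are ignored, and that $F^l = F^{l+1}$; the helper name is hypothetical.

```python
def dense_f_weight_counts(f_in, f_out):
    """Weights (biases ignored) of a two-layer fully connected F network.
    Assumed layout: first layer f_in -> f_in, last layer f_in -> f_out * f_in,
    i.e. the last layer emits every entry of the f_out x f_in matrix Theta."""
    first_layer = f_in * f_in
    last_layer = f_in * (f_out * f_in)
    return first_layer, last_layer

for width in (44, 66, 132):
    first, last = dense_f_weight_counts(width, width)
    print(f"F = {width:3d}: first layer {first:,} weights, last layer {last:,} weights")
```

Doubling the feature width multiplies the last-layer count by eight, so that layer dominates the size of the network.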
1) Circulant approximation of dense layer: The first issue is related to the risk of over-parameterization. The dimension of the input of the $\mathcal{F}$ network is $F^l$, while the dimension of its output is $F^{l+1} \times F^l$. This means that the number of weights of the network depends cubically on the number of features. Therefore, the number of parameters quickly becomes excessively large, resulting in vanishing gradients or overfitting. To address the over-parameterization problem we propose to use a partially-structured matrix for the last layer, instead of an unstructured one. We impose that this matrix is composed of multiple stacked partial circulant matrices, i.e., matrices where only a few shifted versions of the first row are used instead of all the possible ones of the full square matrix. Fig. 3 shows the structure of the approximated matrix. Using this approximation, the only free parameters are in the first row of each partial circulant matrix. If only $m$ shifts per partial circulant matrix are allowed, we reduce the number of parameters by a factor $m$. Thus, if the unstructured dense matrix has $F^l F^{l+1} \times F^l$ parameters, with the proposed approximation the number of parameters drops to $\frac{F^l F^{l+1}}{m} \times F^l$. Similar approaches to approximate fully connected layers have already been studied in the literature [39], [40]. In particular, [39] shows that imposing a partial circulant structure does not significantly impact the final performance in a classification problem. Indeed, there are connections with results stating that random partial circulant matrices implement stable embeddings almost as well as fully random matrices [41]–[43].

Figure 4. $\mathcal{F}$ network. FC$_0$ is a fully-connected layer followed by a leaky ReLU non-linearity. The FC$_R$, FC$_L$, FC$_\kappa$ layers do not have any output non-linearities.

2) Low-rank node aggregation: The second issue related to the $\mathcal{F}$ network regards memory occupation and computations.
In order to perform the ECC operation, we have to compute a weight matrix $\Theta^{l,j \to i}$ for each edge $j \to i$ of every neighborhood $\mathcal{S}^l_i$ of every image in the batch. If we consider a $K$-regular graph and a batch of $B$ images with $N$ pixels each, the memory occupation needed to store all the matrices $\Theta^{l,j \to i}$ as single-precision floating point tensors is equal to $B \times N \times K \times F^{l+1} \times F^l \times 4$ bytes and this quantity can easily become unmanageable. To give an idea of the required amount of memory, let us consider an example with $B = 16$, $N = 1024$, $K = 8$, $F^l = F^{l+1} = 66$: the memory required to store all the matrices $\Theta^{l,j \to i}$ for only one graph-convolutional layer is then around 2 GB.

In order to solve this issue, we propose to impose a low-rank approximation for $\Theta^{l,j \to i}$. Let us consider the singular value decomposition of a matrix

$$A = \Phi \Lambda \Psi^T = \sum_s \lambda_s \phi_s \psi_s^T,$$

where $\phi_s$ and $\psi_s$ are the left and right singular vectors and $\lambda_s$ the singular values. We can obtain a low-rank approximation of rank $r$ by keeping only the $r$ largest singular values and setting the others to zero. Therefore, the approximation is reduced to a sum of $r$ outer products. Inspired by this fact, we define $\Theta^{l,j \to i}$ as follows

$$\Theta^{l,j \to i} = \sum_{s=1}^{r} \kappa^{j \to i}_s \, \theta^{j \to i,L}_s \, \theta^{j \to i,R\,T}_s, \tag{3}$$

where $\theta^{j \to i,L}_s \in \mathbb{R}^{F^l}$, $\theta^{j \to i,R}_s \in \mathbb{R}^{F^{l+1}}$, $\kappa^{j \to i}_s \in \mathbb{R}$ and $1 \le r \le F^l$. Notice that the approximation in (3) ensures that the rank is at most $r$ rather than exactly enforcing a rank-$r$ structure, because we do not impose orthogonality between $\theta^{j \to i,L}_s$ and $\theta^{j \to i,R}_s$, even though random initialization makes them quasi-orthogonal. Using this approximation, we can redefine the $\mathcal{F}$ network in such a way that it outputs $\theta^{j \to i,L}_s$, $\theta^{j \to i,R}_s$, $\kappa^{j \to i}_s$ for $s = 1, 2, \dots, r$.
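The memory figure quoted above and the construction in Eq. (3) can be checked with a few lines of NumPy. This is a minimal sketch with hypothetical helper names; it takes $F^l = F^{l+1} = F$ and draws the vectors at random purely for illustration, whereas in the network they are produced by $\mathcal{F}$.

```python
import numpy as np

def dense_theta_bytes(B, N, K, f_in, f_out, bytes_per_value=4):
    """Bytes needed to store one dense Theta per edge for a whole batch."""
    return B * N * K * f_out * f_in * bytes_per_value

total = dense_theta_bytes(B=16, N=1024, K=8, f_in=66, f_out=66)
print(f"dense storage: {total / 1e9:.2f} GB")

def low_rank_theta(theta_L, theta_R, kappa):
    """Eq. (3): sum over s of kappa[s] * outer(theta_L[s], theta_R[s]).
    theta_L and theta_R have shape (r, F); kappa has shape (r,)."""
    return np.einsum("s,su,sv->uv", kappa, theta_L, theta_R)

rng = np.random.default_rng(0)
F, r = 6, 2
theta = low_rank_theta(rng.standard_normal((r, F)),
                       rng.standard_normal((r, F)),
                       rng.standard_normal(r))
rank = np.linalg.matrix_rank(theta)
print(theta.shape, "rank:", rank)   # the rank can never exceed r
```

The byte count evaluates to 2,283,798,528, consistent with the "around 2 GB" stated in the text.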
In particular, we redefine the second layer of the $\mathcal{F}$ network: instead of having a single fully connected layer that outputs the entire matrix $\Theta^{l,j \to i}$, we have three parallel fully connected layers that separately output $\theta^{j \to i,L}_s$, $\theta^{j \to i,R}_s$ and $\kappa^{j \to i}_s$, as shown in Fig. 4. The advantage of this approximation is that we only need to store $\theta^{j \to i,L}_s$, $\theta^{j \to i,R}_s$ and $\kappa^{j \to i}_s$ instead of the entire matrix $\Theta^{l,j \to i}$, drastically reducing the memory occupation to $B \times N \times K \times r(2F^l + 1) \times 4$ bytes. If we consider the example presented above and set $r = 10$, the memory requirement drops from 2 GB to 700 MB. Another advantage of this approximation is that it also leads to a significant reduction of the computational burden, because we never have to actually compute all the matrices $\Theta^{l,j \to i}$. In fact, the neighborhood aggregation can be reduced as follows

$$\begin{aligned} H^{l+1,\mathrm{NL}}_i &= \sum_{j \in \mathcal{S}^l_i} \gamma^{l,j \to i} \frac{\Theta^{l,j \to i} H^l_j}{|\mathcal{S}^l_i|} \\ &= \sum_{j \in \mathcal{S}^l_i} \gamma^{l,j \to i} \frac{\sum_{s=1}^{r} \kappa^{j \to i}_s \, \theta^{j \to i,L}_s \, \theta^{j \to i,R\,T}_s H^l_j}{|\mathcal{S}^l_i|}, \end{aligned} \tag{4}$$

where the computational cost of the full operation on the first line is $O(F^l F^{l+1})$, whereas the cost of the decoupled operation on the second line is $O(r(F^l + F^{l+1}))$. Finally, this approximation also helps to reduce the number of parameters of the last layer of the $\mathcal{F}$ network, since the output has size $r(F^l + F^{l+1} + 1)$ instead of $F^{l+1} F^l$.

When we employ the new structure of the $\mathcal{F}$ network, we need to pay special attention to the weight initialization. In particular, we have to carefully define the variance of the random weight initialization of the three parallel layers to avoid scaling problems. We define $W^0$ as the weight matrix of the first layer of the $\mathcal{F}$ network, and $W^L$, $W^R$ and $W^\kappa$ as the weight matrices of the three parallel fully connected layers. Let us suppose that $d^{j \to i}_t$ has been normalized to be approximately a standard Gaussian, i.e., $d^{j \to i}_t \sim \mathcal{N}(0, 1)$ for $t = 1, \dots, F^l$, and that $W^0$ has been initialized using Glorot initialization [44], i.e., $W^0_{uv} \sim \mathcal{N}\left(0, \frac{1}{F^l}\right)$ with $u, v = 1, \dots, F^l$. Let us also assume that $W^L_{uv} \sim \mathcal{N}(0, \sigma_L^2)$, $W^R_{uv} \sim \mathcal{N}(0, \sigma_R^2)$, and $W^\kappa_u \sim \mathcal{N}(0, \sigma_\kappa^2)$. Then, we obtain

$$\theta^{j \to i,L}_{s,u} \sim \mathcal{N}(0, F^l \sigma_L^2), \quad \theta^{j \to i,R}_{s,u} \sim \mathcal{N}(0, F^l \sigma_R^2), \quad \kappa^{j \to i}_s \sim \mathcal{N}(0, F^l \sigma_\kappa^2),$$

where $s = 1, \dots, r$. Finally, considering the aggregation formula in Eq. (4) leads to the following result:

$$H^{l+1,\mathrm{NL}}_{i,u} \sim \mathcal{N}\left(0, \frac{1}{2} r \left(F^l\right)^4 \sigma_L^2 \sigma_R^2 \sigma_\kappa^2\right), \tag{5}$$

with $u = 1, \dots, F^{l+1}$. In Eq. (5), we can observe that the variance of $H^{l+1,\mathrm{NL}}_{i,u}$ depends on the fourth power of the number of features. This term can easily become extremely large, therefore it is important to set $\sigma_L^2$, $\sigma_R^2$ and $\sigma_\kappa^2$ in such a way that they can balance it. In this work, we set $\sigma_L^2 = \sigma_R^2 = \frac{1}{(F^l)^2}$ and $\sigma_\kappa^2 = \frac{2}{r}$. This allows us to obtain $H^{l+1,\mathrm{NL}}_{i,u} \sim \mathcal{N}(0, 1)$ with $u = 1, \dots, F^{l+1}$.

D. Analogy with unrolled graph smoothness optimization

The neural network architecture presented in Sec. III-A can be seen as a generalization of a few iterations of an unrolled proximal gradient descent optimization method, which is widely used to solve linear inverse problems in the form of

$$y = Ax + n, \tag{6}$$

where $x$ is the clean image, $A$ a forward model (e.g., a degradation such as blurring, downsampling, compressed sensing, etc.) and $n$ a noise term.

Figure 5. Single iteration: (a) linear inverse problem; (b) denoising. LPF is a graph lowpass filter, HPF is a graph highpass filter.

A well-known technique to recover $x$ from $y$ is to cast the problem as a least-squares minimization problem with a regularization term that models some prior knowledge about the image. One such regularizer is graph smoothness.
Considering a graph with Laplacian matrix $L$ where edges connect pixels that are deemed correlated according to some criterion, the graph smoothness $x^T L x$ is the graph equivalent of the total variation measure, indicating how much $x$ varies across the edges of the graph. Natural images where the graph connects the local neighborhood typically have lowpass behavior, resulting in a low graph smoothness value. Reconstruction is therefore cast as:

$$\hat{x} = \arg\min_x \; \frac{1}{2}\|y - Ax\|_2^2 + \frac{\beta}{2} x^T L x. \tag{7}$$

The functional in Eq. (7) is in the form of a sum of two terms ($f(x) + g(x)$) and can be minimized by means of proximal gradient descent [45], which alternates a gradient descent step over $f$ and a proximal mapping over $g$:

$$x^{(t+1)} = \mathrm{prox}_g\left(x^{(t)} - \alpha \nabla f\left(x^{(t)}\right)\right) = \mathrm{prox}_g\left(\left(I - \alpha A^T A\right) x^{(t)} + \alpha A^T y\right),$$

$$\mathrm{prox}_g(\mu) = \arg\min_z \; \frac{1}{2}\|z - \mu\|_2^2 + \frac{\beta}{2} z^T L z.$$

Setting the gradient of the proximal objective to zero gives $(I + \beta L) z = \mu$, which results in the following update equation:

$$x^{(t+1)} = (I + \beta L)^{-1}\left[\left(I - \alpha A^T A\right) x^{(t)} + \alpha A^T y\right]. \tag{8}$$

In order to match the framework of residual networks, let us define the least-squares solution $x_n = A^+ y = \left(A^T A\right)^{-1} A^T y$ and perform a change of variable whereby the optimization estimates the residual of the least-squares solution, i.e., $\nu^{(t)} = x_n - x^{(t)}$. Hence, we can rewrite Eq. (8) as:

$$x_n - \nu^{(t+1)} = (I + \beta L)^{-1}\left[\left(I - \alpha A^T A\right)\left(x_n - \nu^{(t)}\right) + \alpha A^T y\right].$$

Since $A^T A x_n = A^T y$, the terms in $\alpha$ involving $x_n$ and $y$ cancel and the following update equation can be derived:

$$\nu^{(t+1)} = (I + \beta L)^{-1}\left[\left(I - \alpha A^T A\right) \nu^{(t)} + \beta L x_n\right]. \tag{9}$$

This update can be visualized as in Fig. 5a and is composed of two major operations involving the signal prior:
1) $L x_n$: the graph Laplacian can be seen as a graph highpass filter applied to $x_n$;
2) $(I + \beta L)^{-1}$: this term can be seen as a graph lowpass filter.
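Both filter interpretations can be checked numerically on a small graph before turning to the analytical argument. The sketch below is an illustration under assumptions that are not in the paper: the graph is an unweighted path graph (a hypothetical stand-in for a pixel-similarity graph), $\beta = 2$, and the test signal is a noisy ramp. It compares the spectral response of $(I + \beta L)^{-1}$ with the closed form $1/(\beta\lambda_i + 1)$ obtained in the text that follows, and evaluates the smoothness $x^T L x$ before and after filtering.

```python
import numpy as np

def path_laplacian(n):
    """Combinatorial Laplacian L = D - W of an unweighted path graph with n nodes."""
    W = np.zeros((n, n))
    idx = np.arange(n - 1)
    W[idx, idx + 1] = 1.0
    W[idx + 1, idx] = 1.0
    return np.diag(W.sum(axis=1)) - W

n, beta = 16, 2.0
L = path_laplacian(n)
lam, U = np.linalg.eigh(L)            # graph frequencies (ascending) and Fourier basis

lowpass = np.linalg.inv(np.eye(n) + beta * L)
lp_response = np.diag(U.T @ lowpass @ U)   # gain applied to each graph frequency
hp_response = np.diag(U.T @ L @ U)         # gain of L itself: equals lam, zero at DC

rng = np.random.default_rng(0)
noisy = np.linspace(0.0, 1.0, n) + 0.5 * rng.standard_normal(n)
filtered = lowpass @ noisy
smooth_noisy = noisy @ L @ noisy           # graph smoothness x^T L x
smooth_filtered = filtered @ L @ filtered

print("lowpass gains :", np.round(lp_response, 3))
print("highpass gains:", np.round(hp_response, 3))
print("smoothness before/after filtering:", smooth_noisy, smooth_filtered)
```

The lowpass gains decrease with the graph frequency while the gains of $L$ grow from zero, and filtering can only reduce the smoothness value.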
To see that $(I + \beta L)^{-1}$ behaves as a graph lowpass filter, let us use the matrix inversion lemma:

$$(I + \beta L)^{-1} = \left(I + \beta U \Lambda U^H\right)^{-1} = I - U\left(\beta^{-1}\Lambda^{-1} + I\right)^{-1} U^H = U\left[I - \left(\beta^{-1}\Lambda^{-1} + I\right)^{-1}\right] U^H,$$

where $L = U \Lambda U^H$ is the eigendecomposition of the graph Laplacian and $U$ is the graph Fourier transform. The term $I - \left(\beta^{-1}\Lambda^{-1} + I\right)^{-1}$ is a diagonal matrix whose entries are equal to $\frac{1}{\beta\lambda_i + 1}$, where $\lambda_i$ are the eigenvalues of the graph Laplacian, and the lowpass behavior is due to the decreasing value of such entries for increasing $\lambda_i$.

For the denoising problem, we can set $A = I$ and obtain the update shown in Fig. 5b. The network architecture proposed in Sec. III-A draws from this derivation by unrolling a finite number of iterations of Eq. (9) and generalizing the lowpass and highpass filters with learned graph filters interleaved by nonlinearities. In Sec. IV we experimentally show that the learned filters actually show an approximate highpass and lowpass behavior.

IV. EXPERIMENTAL RESULTS

A. Training details

The training protocol follows the one used in [6]. The network is trained with patches of size $42 \times 42$ randomly extracted from 400 images from the train and test partitions of the Berkeley Segmentation Dataset (BSD) [46], withholding the 68 images in the validation set for testing purposes (BSD68 dataset). The loss function is the mean squared error (MSE) between the denoised patch output by the network and the ground truth. Each model is trained for approximately 800000 iterations with a batch size of 8. The Adam optimizer [47] has been used with an exponentially decaying learning rate between $10^{-4}$ and $10^{-5}$. The behavior of the graph-convolutional layer is slightly different between training and testing for efficiency reasons. During training all pairwise distances are computed among the feature vectors corresponding to the pixels in the patch.
On the other hand, testing is “fully convolutional”, as every pixel has a search window centered around it and neighbors are identified as the closest pixels in such search window. The search window size is $43 \times 43$, roughly comparable to the patch size used in training. This procedure is slightly suboptimal as some pixels might suffer from border effects during training (their search windows are not centered around them) but it is advantageous in terms of speed and memory requirements. Reflection padding is used for all 2D convolutions to avoid border effects. The $\delta$ parameter in the edge attention term in Eq. (2) is set to a value equal to 10. The number of features used in all convolutional layers is 132, except for the three parallel branches of the preprocessing stage which have 44 features. The number of circulant rows in the circulant approximation of dense layers in the $\mathcal{F}$ network is $m = 3$. The low-rank approximation uses $r = 11$ terms. During training, we noticed that the proposed lightweight ECC presented in Sec. III-C is extremely useful. In fact, without it, the network suffered from vanishing gradient problems even with a significantly lower number of layers.

B. Feature analysis

In this section we study the properties of the features in the hidden layers of the network.

1) Adaptive receptive field: We first analyze the characteristics of the receptive field of a single pixel. Since the proposed network employs graph-convolutional layers, the shape of the receptive field is not fixed as in classical CNNs, but it depends on the structure of the graph. In Fig. 6 we show two examples of the receptive field of a single pixel for the graph-convolutional layers in an LPF block with respect to the input of the block. Instead, in Fig. 7 we show the receptive field of a single pixel for the layers in the HPF and in the first LPF blocks with respect to the output of the preprocessing block.
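To make the dependence on the graph concrete, the receptive field of a pixel across stacked graph-convolutional layers can be obtained by repeatedly taking the union of neighbor sets, going backwards from the output. The sketch below assumes that each layer is described by a hypothetical array of neighbor indices (local and non-local neighbors merged); it illustrates the idea and is not the code used to produce the figures.

```python
import numpy as np

def receptive_field(pixel, neighbor_lists):
    """Vertices of the block input that influence `pixel` at the block output.

    neighbor_lists: one (N, k) integer array per layer, in forward order;
    row i of a layer's array lists the vertices aggregated into vertex i.
    """
    field = {int(pixel)}
    for nbrs in reversed(neighbor_lists):
        grown = set(field)  # each vertex also depends on itself
        for v in field:
            grown.update(int(u) for u in nbrs[v])
        field = grown
    return field

# Toy usage on 5 vertices with a different graph in each of two layers.
layer1 = np.array([[1], [2], [3], [4], [0]])
layer2 = np.array([[2], [3], [4], [0], [1]])
print(sorted(receptive_field(0, [layer1, layer2])))
```

Because the neighbor arrays change from layer to layer and from image to image, the resulting set has no fixed shape, unlike the square receptive field of a stack of 2D convolutions.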
From Figs. 6 and 7, we can clearly see that the receptive field is adapted to the characteristics of the image: if we consider a pixel in a uniform area, its receptive field will mostly contain pixels that belong to similar regions; if instead we consider a pixel on an edge, its receptive field will be mainly composed of other edge pixels. This is beneficial to the denoising task, as it allows self-similarity to be exploited, and it descends from the use of a nearest neighbor graph connecting each pixel to other pixels with similar features. Notice that, unlike algorithms performing block matching in the pixel space, we compute distances between feature vectors, which can capture more complex image characteristics. This can be seen in Fig. 8, where we compute the Euclidean distances between the feature vector of the central pixel and the feature vectors of the other pixels in the search window. We notice that the distances reflect the type of edge that includes the central pixel, e.g., a pixel sitting on a horizontal edge will detect as closest other pixels sitting on horizontal edges. This is due to the visual features learned by the network and would not happen in pixel-space matching. Thanks to the adaptability of the receptive field, graph convolution can be interpreted as a generalization of the block matching operation performed in other non-local denoising methods, such as BM3D [5].

2) Filter analysis: We also study the behavior of the LPF and HPF operators. In particular, we are interested in validating the analogy made in Sec. III-D. We compute the discrete Fourier transform (DFT) of the feature maps at the output of these operators. As an example, Fig. 9 shows the log-magnitude of the coefficients of three feature maps at the output of the HPF block and of the first LPF block. The energy of the DFT coefficients of the LPF feature maps is concentrated in the low frequencies, thus showing a lowpass behavior.
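This kind of spectral inspection can be reproduced with a few lines of NumPy. The snippet is a sketch under stated assumptions: the feature map is a plain 2D array, the small constant `eps` (hypothetical) avoids taking the logarithm of zero, and `low_frequency_energy_ratio` is an illustrative summary statistic that does not appear in the experiments.

```python
import numpy as np

def log_magnitude_spectrum(feature_map, eps=1e-8):
    """Log-magnitude of the 2D DFT, with the zero frequency moved to the center."""
    spectrum = np.fft.fftshift(np.fft.fft2(feature_map))
    return np.log(np.abs(spectrum) + eps)

def low_frequency_energy_ratio(feature_map, radius):
    """Fraction of the DFT energy lying within `radius` bins of the zero frequency."""
    spectrum = np.fft.fftshift(np.fft.fft2(feature_map))
    h, w = feature_map.shape
    yy, xx = np.ogrid[:h, :w]
    mask = (yy - h // 2) ** 2 + (xx - w // 2) ** 2 <= radius ** 2
    energy = np.abs(spectrum) ** 2
    return energy[mask].sum() / energy.sum()

# Toy usage: a constant map is purely lowpass, a checkerboard is purely highpass.
flat = np.ones((8, 8))
checker = np.indices((8, 8)).sum(axis=0) % 2 * 2.0 - 1.0
print(low_frequency_energy_ratio(flat, 2), low_frequency_energy_ratio(checker, 2))
```

A ratio close to one indicates a lowpass map, while a ratio close to zero indicates that the energy sits at high frequencies.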
The coefficients of the HPF feature maps, instead, show a typical highpass behavior, with the energy concentrated along a few directions. This substantiates our claim that the learned convolutional layers actually approximate nonlinear highpass and lowpass operators.

Figure 6. Receptive field (green) of a single pixel (red) for the three graph-convolutional layers in the LPF 1 block with respect to the input of the first graph-convolutional layer in the block. Top row: gray pixel on an edge. Bottom row: white pixel in a uniform area.

3) Edge prediction: Lastly, we measure how well the true graph constructed by pixel or patch similarities on the noiseless image is predicted by the graph constructed from the feature vectors in the hidden layers. In order to construct the true graph of the image, we first compute the average pixel value of a $5 \times 5$ window centered at the considered pixel, for every pixel in the image, and then we use the obtained values to compute a nearest neighbor graph with Euclidean distances. We then compare the true graph with the graph computed in the hidden layers of the network. Fig. 10 shows the percentage of edges correctly identified as a function of the number of neighbors considered for the true graph. We can notice that the accuracy of the prediction decreases in layers closer to the output. This is due to the fact that we use a residual network that estimates the noise instead of approximating the clean image. In fact, the network learns to successively remove the latent correlations in the feature space and, as a consequence, the graph becomes more random in the later layers.

C. Ablation studies

We study the impact of various design parameters on denoising performance. First, Table I shows the PSNR on the Set12 testing set as a function of the number of neighbors used by the graph convolution operation for several values of the noise standard deviation $\sigma$.
Each model has been independently trained for the specified number of neighbors. It can be noticed that increasing the number of neighbors improves the denoising performance up to a saturation point, and then the performance slightly decreases. This shows that there is an optimal neighborhood size and that it is important to employ only a small number of neighbors, in order to select only pixels with similar characteristics. This is in contrast with the NLRN method, which uses all the pixels in the search window.

Figure 7. Receptive field (green) of a single pixel (red) for the layers in the HPF and LPF 1 blocks in the same order as a forward pass, with respect to the output of the preprocessing block.

Figure 8. Euclidean distances between feature vectors of the central pixel and all the pixels in the search window (input of first graph-convolutional layer of LPF 1). Left to right: pixel on a horizontal edge, pixel on a vertical edge, pixel on a diagonal edge. Blue represents lower distance.

Figure 9. Log-magnitude of discrete Fourier transform of three feature maps at the output of the LPF 1 block (top) and HPF block (bottom). Blue is lower magnitude.

Then, we study the relative impact on performance of the edge aggregation matrices $\Theta$ in Eq. (1) with respect to using only the edge attention scalar $\gamma$. Table II reports the PSNR achieved on Set12 by the proposed method with the non-local aggregation performed as in Eq. (1) and by a variant where the aggregation is computed as:

$$H_i^{l+1,\mathrm{NL}} = \sum_{j \in \mathcal{S}_i^l} \gamma^{j \to i} H_j^l.$$

Both methods use a non-local graph with 8 nearest neighbors. We can notice that the edge attention term alone achieves a worse PSNR with respect to GCDN by approximately 0.2 dB, even though it improves over a model without non-local neighbors (see Table I for the corresponding 0-NN value). This shows the advantage of using a trainable affine transformation, such as $\Theta$ in Eq. (1), instead of a scalar weight function with a predefined structure.
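The edge-attention-only variant amounts to a weighted sum of neighbor features. The sketch below is illustrative only: the form $\gamma^{j \to i} = \exp(-\|H_j - H_i\|_2^2 / \delta)$ is an assumption standing in for Eq. (2), which is not reproduced in this section, and the function and argument names are hypothetical.

```python
import numpy as np

def edge_attention_aggregate(H, neighbors, delta=10.0):
    """Non-local aggregation using only a scalar attention weight per edge.

    H: (N, F) feature vectors; neighbors: (N, k) indices of the non-local neighbors.
    """
    H_nbrs = H[neighbors]                                  # (N, k, F) neighbor features
    diff = H_nbrs - H[:, None, :]                          # feature differences on each edge
    gamma = np.exp(-np.sum(diff ** 2, axis=-1) / delta)    # (N, k), assumed form of the attention
    return np.sum(gamma[..., None] * H_nbrs, axis=1)       # (N, F)

# Toy usage with 3 vertices, 2 features, and 2 neighbors per vertex.
H = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
neighbors = np.array([[1, 2], [0, 2], [0, 1]])
out = edge_attention_aggregate(H, neighbors, delta=1.0)
```

With a scalar weight, every feature channel of a neighbor is scaled by the same amount, whereas a matrix such as $\Theta$ can mix channels differently on each edge.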
Figure 10. Accuracy of edge prediction from hidden layers (x-axis: output neighbors; y-axis: true positive rate; curves: LPF 1, LPF 2, LPF 3, LPF 4, last layer).

Table I. PSNR (dB) vs. non-local neighborhood size (Set12)
σ    0-NN   4-NN   8-NN   12-NN  16-NN  20-NN
15   32.91  33.09  33.11  33.13  33.14  33.13
25   30.50  30.70  30.74  30.75  30.78  30.78
50   27.28  27.52  27.58  27.58  27.60  27.59

Table II. Edge attention vs. ECC + edge attention (8-NN). PSNR (dB).
σ    Edge attention only   Proposed
25   30.53                 30.74

Finally, we remark that we do not compare with respect to the full ECC without the approximations introduced in Sec. III-C because it suffers from vanishing gradient problems, rendering training unstable even for a much smaller number of layers, and it would be computationally prohibitive.

D. Comparison with state of the art

In this section we compare the proposed network with state-of-the-art models for the Gaussian denoising task of grayscale images. We train an independent model for each noise standard deviation, which is assumed to be known a priori for all methods. We fix the number of neighbors for the proposed method to 16. The reference methods can be classified into model-based algorithms such as BM3D [5], WNNM [29], TNRD [32] and recent deep-learning methods such as DnCNN [6], N3Net [12] and NLRN [13]. In particular, among the deep-learning methods, N3Net and NLRN propose non-local approaches. All results have been obtained running the pre-trained models provided by the authors, except for N3Net at $\sigma = 15$, which is unavailable. Table III reports the PSNR and SSIM values obtained for the Set12, BSD68 and Urban100 standard test sets. It can be seen that the proposed method achieves state-of-the-art performance and works especially well at low to medium levels of noise. This can be explained by a higher difficulty in constructing a meaningful graph from the noisy image at higher noise levels.

Table III. Natural image denoising results. Metrics are PSNR (dB) and SSIM.
Dataset   σ   BM3D            WNNM            TNRD            DnCNN           N3Net           NLRN            GCDN
Set12     15  32.37 / 0.8952  32.70 / 0.8982  32.50 / 0.8958  32.86 / 0.9031  - / -           33.16 / 0.9070  33.14 / 0.9072
Set12     25  29.97 / 0.8504  30.28 / 0.8557  30.06 / 0.8512  30.44 / 0.8622  30.55 / -       30.80 / 0.8689  30.78 / 0.8687
Set12     50  26.72 / 0.7676  27.05 / 0.7775  26.81 / 0.7680  27.18 / 0.7829  27.43 / -       27.64 / 0.7980  27.60 / 0.7957
BSD68     15  31.07 / 0.8717  31.37 / 0.8766  31.42 / 0.8769  31.73 / 0.8907  - / -           31.88 / 0.8932  31.83 / 0.8933
BSD68     25  28.57 / 0.8013  28.83 / 0.8087  28.92 / 0.8093  29.23 / 0.8278  29.30 / -       29.41 / 0.8331  29.35 / 0.8332
BSD68     50  25.62 / 0.6864  25.87 / 0.6982  25.97 / 0.6994  26.23 / 0.7189  26.39 / -       26.47 / 0.7298  26.38 / 0.7389
Urban100  15  32.35 / 0.9220  32.97 / 0.9271  31.86 / 0.9031  32.68 / 0.9255  - / -           33.42 / 0.9348  33.47 / 0.9358
Urban100  25  29.70 / 0.8777  30.39 / 0.8885  29.25 / 0.8473  29.97 / 0.8797  30.19 / -       30.88 / 0.9003  30.95 / 0.9020
Urban100  50  25.95 / 0.7791  26.83 / 0.8047  25.88 / 0.7563  26.28 / 0.7874  26.82 / -       27.40 / 0.8244  27.41 / 0.8160

Table IV. Depth map denoising results. Metrics are PSNR (dB) and SSIM.
σ   Method  aloe            art             baby            cones           dolls           laundry         moebius         reindeer        Average
15  GCDN    40.74 / 0.9873  40.66 / 0.9886  41.64 / 0.9917  39.29 / 0.9832  40.70 / 0.9830  41.97 / 0.9842  42.07 / 0.9877  42.62 / 0.9915  41.21 / 0.9872
15  NLRN    40.50 / 0.9844  40.48 / 0.9858  41.76 / 0.9899  39.50 / 0.9814  40.69 / 0.9800  41.96 / 0.9814  42.01 / 0.9848  42.44 / 0.9880  41.17 / 0.9845
15  OGLR    40.82 / 0.9801  40.77 / 0.9821  40.90 / 0.9806  39.65 / 0.9774  40.41 / 0.9756  41.32 / 0.9764  41.48 / 0.9793  41.72 / 0.9823  40.88 / 0.9792
25  GCDN    37.12 / 0.9771  37.15 / 0.9788  37.50 / 0.9814  35.88 / 0.9697  37.05 / 0.9705  38.62 / 0.9730  38.39 / 0.9786  38.80 / 0.9836  37.56 / 0.9766
25  NLRN    37.08 / 0.9720  37.01 / 0.9734  37.37 / 0.9797  36.09 / 0.9661  37.01 / 0.9646  38.42 / 0.9679  38.33 / 0.9723  38.65 / 0.9786  37.50 / 0.9718
25  OGLR    36.67 / 0.9592  36.68 / 0.9649  36.29 / 0.9594  35.51 / 0.9545  36.41 / 0.9541  37.44 / 0.9541  37.17 / 0.9575  37.86 / 0.9655  36.75 / 0.9587
50  GCDN    33.37 / 0.9522  33.18 / 0.9536  32.23 / 0.9468  31.61 / 0.9379  32.37 / 0.9417  34.07 / 0.9526  33.73 / 0.9567  34.35 / 0.9672  33.11 / 0.9511
50  NLRN    33.23 / 0.9444  32.86 / 0.9448  32.42 / 0.9534  31.53 / 0.9304  32.40 / 0.9347  34.15 / 0.9459  33.58 / 0.9475  34.37 / 0.9603  33.07 / 0.9452
50  OGLR    32.24 / 0.9121  31.92 / 0.9129  31.23 / 0.9027  30.21 / 0.8926  31.44 / 0.8999  32.85 / 0.9051  32.46 / 0.9093  32.99 / 0.9191  31.92 / 0.9067
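For reference, PSNR values such as those in the tables are commonly computed as shown below. The sketch assumes 8-bit images with a peak value of 255 and ignores clipping of the noisy image; under those assumptions, additive Gaussian noise with standard deviation $\sigma$ has an expected PSNR of $20 \log_{10}(255 / \sigma)$, i.e., about 20.17 dB for $\sigma = 25$ and about 14.15 dB for $\sigma = 50$, which is close to the noisy-input values quoted in the captions of Figs. 11 and 12. The function names are hypothetical.

```python
import numpy as np

def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB (peak=255 is an assumption for 8-bit images)."""
    ref = np.asarray(reference, dtype=np.float64)
    est = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((ref - est) ** 2)  # identical images give mse = 0 and an infinite PSNR
    return 10.0 * np.log10(peak ** 2 / mse)

def expected_noisy_psnr(sigma, peak=255.0):
    """PSNR of an image corrupted by zero-mean noise of standard deviation sigma (no clipping)."""
    return 20.0 * np.log10(peak / sigma)

print(round(expected_noisy_psnr(25), 2), round(expected_noisy_psnr(50), 2))
```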
We also notice that the proposed method achieves strong results on the Urban100 dataset. This dataset contains higher resolution images with respect to the other two and is mainly composed of photos of buildings and other regular structures where exploiting self-similarity is very important. In addition, it is also worth mentioning that the proposed method provides a better visual quality. In many cases, the proposed method has a higher SSIM score, even if NLRN has better performance in terms of PSNR. This can also be noticed in Fig. 11, which shows a visual comparison on an image from the Urban100 dataset. In general, the images produced by the proposed algorithm present sharper edges and smoother content in uniform areas. We can notice that many areas in the photos from Urban100 have approximately piecewise smooth characteristics. It is well known that image processing algorithms based on graphs are well suited for piecewise smooth content (see, e.g., [48], [49] in the context of compression and [30] for denoising). To further show this point, we study the performance of the proposed method for denoising of depth maps, e.g., generated by time-of-flight cameras. The OGLR algorithm [30] based on a graph smoothness regularizer achieved state-of-the-art results among model-based algorithms for this specific task, where it is essential to preserve edge sharpness while simultaneously smoothing the flat areas. Table IV reports the PSNR and SSIM results achieved on a standard set of depth maps¹. It can be seen that the proposed method outperforms both NLRN and OGLR on average, even at high levels of noise. Also, we can notice that OGLR displays competitive performance at low noise levels, but its visual quality significantly degrades in the presence of stronger noise. Fig. 12 shows a visual comparison where it can be seen that GCDN produces sharper edges while also providing a very smooth background.

¹ http://vision.middlebury.edu/stereo/data/

Figure 11. Extract from Urban100 scene 13, σ = 25. Left to right: ground truth, noisy (20.16 dB), BM3D (30.40 dB), DnCNN (30.71 dB), NLRN (31.41 dB), GCDN (31.53 dB).

Figure 12. aloe depth map denoising, σ = 50. Left to right: ground truth, noisy (14.16 dB), OGLR (32.24 dB), NLRN (33.23 dB), GCDN (33.37 dB).

E. Real image denoising

Real image noise is generally more challenging than synthetic Gaussian noise. There are multiple contributions such as quantization noise, shot noise, fixed-pattern noise [1], [50], dark current, etc. that make it overall signal-dependent. It has been observed [51], [52] that deep learning methods trained on synthetic Gaussian noise perform poorly in the presence of real noise. However, suitable retraining with real data generally improves their performance. In this section, we study the behavior of the proposed network in a blind denoising setting with real noisy images acquired by smartphones. We retrain the proposed method, NLRN and DnCNN on the SIDD dataset [52], composed of 30000 high-resolution images acquired by smartphone cameras at varying illumination and ISO levels. The authors provide clean and carefully registered ground truths for all the available scenes, so that it is possible to perform a supervised training. We create training and testing subsets from the sRGB images in the SIDD dataset by selecting a range of noise levels. Our training set is composed of 3500 crops of size $512 \times 512$ whose RMSE with respect to the ground truth is below 15. The testing set is composed of 25 random crops of size $512 \times 512$ with noise in the same range as the training set. Table V reports the results for CBM3D [53], DnCNN, NLRN and the proposed GCDN. Notice that CBM3D is not a blind method, so we provide an estimate of the noise standard deviation, as computed by a noise estimation algorithm [54]. We can notice that the proposed method achieves better results, and this is confirmed by the visual comparison in Fig. 13.

Table V. Real image denoising (SIDD dataset)
        CBM3D     DnCNN     NLRN      GCDN
PSNR    38.73 dB  39.98 dB  41.24 dB  41.48 dB
SSIM    0.9587    0.9605    0.9652    0.9697

V. CONCLUSIONS

In this paper, we presented a graph-convolutional neural network targeted for image denoising. The proposed graph-convolutional layer makes it possible to exploit both local and non-local similarities, resulting in an adaptive receptive field. We showed that the proposed architecture can outperform state-of-the-art denoising methods, achieving very strong results on piecewise smooth images. Finally, we have also considered a real image denoising setting, showing that the proposed method can provide a significant performance gain. Future work will focus on extending the proposed architecture to other inverse problems, such as super-resolution [55], [56].

REFERENCES

[1] J. Lukáš, J. Fridrich, and M. Goljan, “Digital camera identification from sensor pattern noise,” IEEE Transactions on Information Forensics and Security, vol. 1, no. 2, pp. 205–214, 2006.
[2] D. Valsesia, G. Coluccia, T. Bianchi, and E. Magli, “Compressed fingerprint matching and camera identification via random projections,” IEEE Transactions on Information Forensics and Security, vol. 10, no. 7, pp. 1472–1485, July 2015.
[3] Y. Romano, M. Elad, and P. Milanfar, “The little engine that could: Regularization by denoising (RED),” SIAM Journal on Imaging Sciences, vol. 10, no. 4, pp. 1804–1844, 2017.
[4] Y. Sun, J. Liu, and U. S. Kamilov, “Block coordinate regularization by denoising,” arXiv preprint, 2019.
[5] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian, “Image denoising by sparse 3-D transform-domain collaborative filtering,” IEEE Transactions on Image Processing, vol. 16, no. 8, pp. 2080–2095, 2007.
[6] K. Zhang, W. Zuo, Y. Chen, D.
Meng, and L. Zhang, “Beyond a Gaussian denoiser: residual learning of deep CNN for image denoising,” IEEE Transactions on Image Processing, vol. 26, no. 7, pp. 3142–3155, 2017.
[7] X. Mao, C. Shen, and Y.-B. Yang, “Image restoration using very deep convolutional encoder-decoder networks with symmetric skip connections,” in Advances in Neural Information Processing Systems 29, D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, Eds. Curran Associates, Inc., 2016, pp. 2802–2810.
[8] Y. Tai, J. Yang, X. Liu, and C. Xu, “Memnet: A persistent memory network for image restoration,” in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 4539–4547.
[9] W. Bae, J. J. Yoo, and J. C. Ye, “Beyond deep residual learning for image restoration: Persistent homology-guided manifold simplification,” in IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), July 2017, pp. 1141–1149.
[10] C. Cruz, A. Foi, V. Katkovnik, and K. Egiazarian, “Nonlocality-reinforced convolutional neural networks for image denoising,” IEEE Signal Processing Letters, vol. 25, no. 8, pp. 1216–1220, Aug 2018.
[11] S. Lefkimmiatis, “Universal denoising networks: a novel CNN architecture for image denoising,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 3204–3213.
[12] T. Plötz and S. Roth, “Neural nearest neighbors networks,” in Advances in Neural Information Processing Systems, 2018, pp. 1087–1098.

Figure 13. Real image denoising. Left to right: ground truth, noisy (23.80 dB), CBM3D (34.84 dB), DnCNN (36.05 dB), NLRN (37.15 dB), GCDN (37.33 dB).

[13] D. Liu, B. Wen, Y. Fan, C. C. Loy, and T. S. Huang, “Non-local recurrent network for image restoration,” in Advances in Neural Information Processing Systems, 2018, pp. 1673–1682.
[14] D. Valsesia, G. Fracastoro, and E.
Magli, “Image denoising with graph-convolutional neural networks,” in 2019 26th IEEE International Conference on Image Processing (ICIP), 2019.
[15] M. Henaff, J. Bruna, and Y. LeCun, “Deep convolutional networks on graph-structured data,” arXiv preprint arXiv:1506.05163, 2015.
[16] M. Defferrard, X. Bresson, and P. Vandergheynst, “Convolutional neural networks on graphs with fast localized spectral filtering,” in Advances in Neural Information Processing Systems, 2016, pp. 3844–3852.
[17] T. N. Kipf and M. Welling, “Semi-supervised classification with graph convolutional networks,” in International Conference on Learning Representations (ICLR) 2017, 2017.
[18] D. Shuman, S. Narang, P. Frossard, A. Ortega, and P. Vandergheynst, “The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains,” IEEE Signal Processing Magazine, vol. 30, no. 3, pp. 83–98, 2013.
[19] M. Schlichtkrull, T. N. Kipf, P. Bloem, R. Van Den Berg, I. Titov, and M. Welling, “Modeling relational data with graph convolutional networks,” in European Semantic Web Conference. Springer, 2018, pp. 593–607.
[20] M. Simonovsky and N. Komodakis, “Dynamic edge-conditioned filters in convolutional neural networks on graphs,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017, pp. 29–38.
[21] Y. Wang, Y. Sun, Z. Liu, S. E. Sarma, M. M. Bronstein, and J. M. Solomon, “Dynamic graph cnn for learning on point clouds,” arXiv preprint arXiv:1801.07829, 2018.
[22] K. Xu, W. Hu, J. Leskovec, and S. Jegelka, “How powerful are graph neural networks?” in International Conference on Learning Representations (ICLR) 2019, 2019.
[23] F. Monti, D. Boscaini, J. Masci, E. Rodola, J. Svoboda, and M. M.
Bronstein, “Geometric deep learning on graphs and manifolds using mixture model cnns,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 5115–5124.
[24] N. Verma, E. Boyer, and J. Verbeek, “Feastnet: Feature-steered graph convolutions for 3d shape analysis,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 2598–2606.
[25] D. Valsesia, G. Fracastoro, and E. Magli, “Learning localized generative models for 3d point clouds via graph convolution,” in International Conference on Learning Representations (ICLR) 2019, 2019.
[26] L. I. Rudin, S. Osher, and E. Fatemi, “Nonlinear total variation based noise removal algorithms,” Physica D: Nonlinear Phenomena, vol. 60, no. 1-4, pp. 259–268, 1992.
[27] C. Tomasi and R. Manduchi, “Bilateral filtering for gray and color images,” in ICCV, vol. 98, no. 1, 1998, p. 2.
[28] A. Buades, B. Coll, and J.-M. Morel, “A non-local algorithm for image denoising,” in 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), vol. 2. IEEE, 2005, pp. 60–65.
[29] S. Gu, L. Zhang, W. Zuo, and X. Feng, “Weighted nuclear norm minimization with application to image denoising,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 2862–2869.
[30] J. Pang and G. Cheung, “Graph Laplacian regularization for image denoising: analysis in the continuous domain,” IEEE Transactions on Image Processing, vol. 26, no. 4, pp. 1770–1785, 2017.
[31] M. Elad and M. Aharon, “Image denoising via sparse and redundant representations over learned dictionaries,” IEEE Transactions on Image Processing, vol. 15, no. 12, pp. 3736–3745, Dec 2006.
[32] Y. Chen and T. Pock, “Trainable nonlinear reaction diffusion: A flexible framework for fast and effective image restoration,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 6, pp. 1256–1272, 2016.
[33] H. C. Burger, C. J. Schuler, and S. Harmeling, “Image denoising: Can plain neural networks compete with BM3D?” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2012, pp. 2392–2399.
[34] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” in Proceedings of the 32nd International Conference on Machine Learning - Volume 37, ser. ICML’15. JMLR.org, 2015, pp. 448–456. [Online]. Available: http://dl.acm.org/citation.cfm?id=3045118.3045167
[35] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, “Going deeper with convolutions,” in 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2015, pp. 1–9.
[36] N. Divakar and R. V. Babu, “Image denoising via CNNs: an adversarial approach,” in New Trends in Image Restoration and Enhancement, CVPR, 2017.
[37] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
[38] L. Gong and Q. Cheng, “Exploiting edge features for graph neural networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 9211–9219.
[39] Y. Cheng, F. X. Yu, R. S. Feris, S. Kumar, A. Choudhary, and S.-F. Chang, “An exploration of parameter redundancy in deep networks with circulant projections,” in The IEEE International Conference on Computer Vision (ICCV), December 2015.
[40] J. Wu, “Compression of fully-connected layer in neural network by kronecker product,” in 2016 Eighth International Conference on Advanced Computational Intelligence (ICACI), Feb 2016, pp. 173–179.
[41] A. Hinrichs and J. Vybíral, “Johnson-lindenstrauss lemma for circulant matrices,” Random Structures & Algorithms, vol. 39, no. 3, pp. 391–398, 2011.
[42] D. Valsesia, G. Coluccia, T. Bianchi, and E. Magli, “User authentication via prnu-based physical unclonable functions,” IEEE Transactions on Information Forensics and Security, vol. 12, no. 8, pp. 1941–1956, Aug 2017.
[43] D. Valsesia and E. Magli, “Binary adaptive embeddings from order statistics of random projections,” IEEE Signal Processing Letters, vol. 24, no. 1, pp. 111–115, Jan 2017.
[44] X. Glorot and Y. Bengio, “Understanding the difficulty of training deep feedforward neural networks,” in Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, ser. Proceedings of Machine Learning Research, Y. W. Teh and M. Titterington, Eds., vol. 9. Chia Laguna Resort, Sardinia, Italy: PMLR, 13–15 May 2010, pp. 249–256.
[45] P. L. Combettes and J.-C. Pesquet, “Proximal splitting methods in signal processing,” in Fixed-point algorithms for inverse problems in science and engineering. Springer, 2011, pp. 185–212.
[46] D. Martin, C. Fowlkes, D. Tal, and J. Malik, “A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics,” in Proc. 8th Int’l Conf. Computer Vision, vol. 2, July 2001, pp. 416–423.
[47] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
[48] W. Hu, G. Cheung, A. Ortega, and O. C. Au, “Multiresolution graph fourier transform for compression of piecewise smooth images,” IEEE Transactions on Image Processing, vol. 24, no. 1, pp. 419–433, Jan 2015.
[49] G. Fracastoro, D. Thanou, and P. Frossard, “Graph-based transform coding with application to image compression,” arXiv preprint arXiv:1712.06393, 2017.
[50] G. C. Holst, CCD arrays, cameras, and displays. JCD Publishing SPIE Press, 1992.
[51] T. Plötz and S.
Roth, “Benchmarking denoising algorithms with real photographs,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 1586–1595.
[52] A. Abdelhamed, S. Lin, and M. S. Brown, “A high-quality denoising dataset for smartphone cameras,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.
[53] K. Dabov, A. Foi, V. Katkovnik, and K. O. Egiazarian, “Color image denoising via sparse 3d collaborative filtering with grouping constraint in luminance-chrominance space,” in ICIP (1), 2007, pp. 313–316.
[54] G. Chen, F. Zhu, and P. A. Heng, “An efficient statistical method for image noise level estimation,” in 2015 IEEE International Conference on Computer Vision (ICCV), Dec 2015, pp. 477–485.
[55] C. Dong, C. C. Loy, K. He, and X. Tang, “Learning a deep convolutional network for image super-resolution,” in European Conference on Computer Vision. Springer, 2014, pp. 184–199.
[56] A. Bordone Molini, D. Valsesia, G. Fracastoro, and E. Magli, “Deepsum: Deep neural network for super-resolution of unregistered multitemporal images,” arXiv preprint arXiv:1907.06490, 2019.
