The Projected GSURE for Automatic Parameter Tuning in Iterative Shrinkage Methods

In many applications in signal and image processing there is a need for solving a linear inverse problem [1,2]. Here we consider the scenario in which an original image x is degraded by a linear operator H followed by an additive white Gaussian noise w. Our observation can be written as where the goal is to reconstruct the original image x from y. In our setting, there is no Bayesian prior on x; it is therefore treated as a deterministic unknown. There are many techniques in the literature for this task that focus on different objectives [1,2,3,4,5,6,7,8,9,10,11]. Most of these methods include parameters that require tuning. Some of the algorithms are iterative, and thus the number of iterations needs to be set as well. Tuning of such parameters is generally not a trivial task. Very often, the objective in these problems is recovery of the signal with a minimal Mean-Squared Error (MSE) between the original image and the estimated one [12]. In this case, ideally, the parameters should be chosen such that this MSE is minimized. However, since the original image x is deterministic, the MSE will depend in general on the unknown x. Consequently, we cannot know what choice of parameters minimizes the true MSE. In practice, therefore, a common approach to parameter tuning is still manual, by looking at the reconstructed result and judging its quality subjectively. The literature offers several automatic ways for choosing the parameters of the reconstruction algorithm [1]. One such method is the Unbiased Predictive Risk Estimator (UPRE) [13], which is valid only for the case that the reconstruction operator is linear. Other popular tools are the generalized cross validation (GCV) technique with its many variants [14], and the L-Curve method [15], both available for general reconstruction algorithms. The number of iterations can be chosen using the discrepancy principle [16], such that the norm of the error between the degraded image and reconstructed image is close to the noise variance. An alternative technique for parameter tuning is the Stein Unbiased Risk Estimator (SURE) [17,18]. SURE provides an unbiased estimate of the MSE for a candidate solver, including non-linear ones. Minimizing this estimated MSE, leads to the automatic tuning desired. The original SURE by Stein is limited to the case of denoising of white additive Gaussian noise. One of the best known algorithms that uses SURE is Donoho's SureShrink denoising algorithm [19]. There are also other contributions along these lines, as in [20,21,22]. Recently, a generalization of SURE (GSURE) [23] has been developed for more general models that can be described in the form of exponential family distribution. This provides a new opportunity for choosing the parameters automatically in a much more diverse set of inverse problems. In the case where H is rank-deficient, or even rectangular (with more columns than rows), the inverse problem becomes ill-posed. In such a scenario, GSURE can at best estimate the MSE in the range of H T . We refer hereafter to a tuning of the algorithm's parameters with this estimation as the projected GSURE. The reconstruction algorithms used in our work belong to the iterated shrinkage family of methods [7,8,24,25], In these algorithms, the image is known to have a sparse representation over a dictionary. Reconstruction is obtained by an iterative process composed of applying the dictionary and the degradation operator with their conjugates, in addition to a point-wise shrinkage operation. In these algorithms the threshold value λ in the shrinkage parameter needs to be tuned, along with the number of iterations K to be performed. The work reported in [26] develops a special case of GSURE for the case where the blur operator H is assumed to be full-rank. This work also addresses automatic tuning of the parameters λ and K in an iterated shrinkage algorithm, used for the deblurring problem. For a pre-chosen number of iterations, the GSURE is used to select a fixed value of λ that optimizes the overall estimated MSE. For a given λ, the GSURE is again used to determine the number of iterations. We refer to this method of setting λ and K as the global method. Similar to the work in [26], this paper proposes to tune λ and K for iterated shrinkage algorithms, which are geared towards solving inverse problems in image processing. Our work extends the applicability of GSURE to illposed inverse problems, using the projected GSURE. We show that turning from the plain GSURE to its projected version, the performance of the parameter tuning improves substantially. In addition, we propose an alternative to the global method mentioned above, where the value of λ is chosen by minimizing the estimated MSE at each iteration, thereby obtaining a different value of λ per iteration. We refer to this technique as the greedy method. The main benefit of such a strategy is the natural way with which the number of iterations is set simultaneously -the iterations can be stopped when the estimated MSE improvement is under a certain threshold. In order to further improve the performance with regard to the overall estimated MSE in this greedy approach, λ in each iteration can be chosen with a look-ahead on the estimated MSE in the next few iterations. This paper is organized as follows: In Section 2 we present the deblurring and image scaling (zooming) problems, and the iterative shrinkage algorithms used for solving them. Section 3 describes several conventional techniques for parameter setting (GCV, L-curve, and the discrepancy method). Section 4 presents the SURE with its generalization, as developed in [23]. In Section 5 we use GSURE for the task of parameter selection in the iterative shrinkage algorithms using the two approaches -global and greedy. The results of these methods are presented in Section 6. We conclude in Section 7 with a summary of this work, and a discussion on future research directions. To define a linear inverse problems, we need to specify the model of the original image, the noise characteristics, and the degradation operator. We consider two types of operators H. The first is a blur operator, and the second is a scale-down operator that defines a scale-up problem. In the blur case, H is a square matrix that blurs each pixel by replacing it with a weighted average of its neighbors. In the scale-down scenario, H is composed of two operators applied sequentially, the first blurs the image and the second down-samples it. In both settings we assume that the operators are known, and the additive noise w is white Gaussian with known variance σ 2 . Our goal in both settings is to reconstruct the original image x without assuming any Bayesian prior. In order to evaluate the quality of the reconstruction, we measure the MSE between the reconstructed image and the original one, This criterion is very commonly used in image processing despite its known shortcomings [27]. Two related measures that are also common are the Peak Signal to Noise Ratio (PSNR), and the Improvement in Signal to Noise Ratio (ISNR) 1 , which are both inversely proportional to the MSE. Thus, all of the criteria favor low MSE. Since the goodness of our fit is measured by the MSE, a natural approach would be to choose an estimate x that directly minimizes the MSE. Unfortunately, however, the MSE depends in general on x and therefore cannot be minimized directly. This difficulty is explained in more detail in [12]. To avoid direct MSE minimization, in the context of imaging it is often assumed that x has a sparse representation α under a given dictionary D [28]. This knowledge of x is then used together with a standard least-squares criterion to find an estimate x. More specifically, a popular approach to recover x is to first seek the value of α that minimizes an objective of the form: The first term requires that the reconstructed image, after degradation, should be close to the measured image y. The second term forces sparsity using the penalty function ρ. The minimizer of this function is the representation of the desired image, and its multiplication by D gives the reconstructed image. Although intuitive, this strategy does not in general minimize the MSE. The minimization of f (α) in ( 3) is typically performed using an iterative procedure. An appealing choice is the iterative shrinkage methods [7,8]. These algorithms are iterative, applying the matrices H and D and their adjoints, and an additional element-wise thresholding operation in each iteration. Consequently, these methods are well-suited for high-dimensional problems with sparse unknowns. This family of techniques includes the Separable Surrogate functionals (SSF) method [7], the Parallel Coordinate Descent (PCD) method [29,24], and some others. In this work we focus on the SSF and PCD techniques. The SSF algorithm proposes the iteration formula which is proven to converge [7] to a local minimum of (3). In the case where ρ is convex, this is also the global minimum. The term S ρ,λ/c is a shrinkage operator dependant on ρ with threshold λ/c. The constant c depends on the combined operator HD. The resulting estimator is where K is the number of SSF iterations performed. When our task is minimizing the objective in (3), K is chosen to be the point that the decreasing rate is under a certain threshold. When aiming at minimization of the MSE, both λ and K need to be tuned carefully: in many cases, from a certain iteration, the objective value continues to descend, while the MSE value increases. In [29,24] an alternative iterated shrinkage approach is presented, which leads to faster convergence. In this technique, termed PCD, the iteration formula is given by where The diagonal matrix W contains the norms of the columns of HD, which are computed off line. Once v k is computed, a line search is performed along the trajectory v kα k , using the parameter µ. This method is also proven to converge [24]. Our estimator in this case is computed by (5) like in the SSF. In order to use the SSF and the PCD methods for the inverse problems mentioned, the penalty function ρ and the dictionary D need to be chosen as well as the λ and the number of iterations. Following Elad et. al. [24], we use the function This is a smoothed version of the ℓ p -norm in the range 1 < p < 2. The parameter s defines which ℓ p -norm we are approximating in this range. For small values of s (e.g., s = 0.001) this function serves as a smoothed version of the ℓ 1 -norm, and we use it as the penalty function throughout this work. This choice of ρ dictates the shape of the shrinkage operator S [24]. There are various possibilities for choosing D. In our simulations, and following the work reported in [9,25,8], we use the undecimated Haar wavelet with 3 layers of resolution. The motivation for this choice is that when looking for sparse representations, redundant dictionaries are favorable. The parameter λ which determines the weight of the penalty function also needs setting, as do the number of iterations. Ideally, we would like to choose the parameters that minimize the MSE between the original image x and its estimate. However, as we have seen already, the dependence of the MSE on x precludes its direct minimization. We now turn to present several methods that aim to bypass this difficulty. The main approaches the literature offers for automatic parameter setting are the GCV [14], the L-Curve [15] and the discrepancy principle [16]. For linear deconvolution methods, the parameter selection can also be based on the Unbiased Predictive Risk Estimator (UPRE) [13]; however, the algorithms we treat here are nonlinear. In this section we accompany the description of these methods with an image deblurring experiment along the following lines: Working on the image cameraman, we use the blur kernel and an additive white Gaussian noise with variance σ 2 = 2. This experiment and its chosen parameters are adopted from Figueiredo et. al. [9]. For a reconstruction algorithm h λ,K (•) and a degraded image y, we define the residual of the algorithm to be In the case where h λ,K (•) is linear, the GCV method selects the parameters to be those that minimize the GCV functional given by 1 where I is the identity matrix, and n y is the length of y. In high dimensions, the trace in the denominator of ( 11) is costly to compute directly. Stochastic estimation of this expression can be used instead [30]. Specifically, noting that where n is a random vector n ∼ N(0, I), we can approximate the trace by one such instance of the random vector. More than one instance of the random variable n can be used for better estimation of the trace by averaging, but in this work we use just one such realization of n. Replacing the trace in (11) yields Some extensions of the GCV to non-linear cases were treated in [31,32,33,1], but as far as we know the literature does not offer extensions of the GCV to non-linear deconvolution algorithms like the one we address in this work. We suggest using (12) as the GCV objective to be minimized for the parameters selection, despite the non-linearity of h λ,K (•). Figure 1 presents an experiment using GCV. For fixed number of iterations, K = 43, the GCV as a function of λ is presented in this figure. The value of λ is chosen to be the minimizer of the GCV function. This minimal point can be found by a line search, but such a procedure may get trapped in a local minimum of the GCV curve, since the obtained curve is not necessarily unimodal. Here and in later figures, the value chosen based on the true MSE is marked by an asterisk. In this case, as can be seen, the GCV performs well. For setting a parameter with the L-curve method, a log of the squared norm of the result of the algorithm versus the squared norm of its residual is plotted over a range of parameter values. The result is typically a curve with an L shape. The L-curve criterion is to choose the parameter value that corresponds to the "corner" of the curve -the point with maximum curvature [15]. For a parameter of interest λ, the curve is defined by the x-axis and the y-axis Y(λ) = log h λ (y) 2 2 . The curvature is then given by The value of λ is selected to maximize (13). Since the L-curve can only be sampled at specific discrete points, the calculation of the second order derivatives is very sensitive and thus should be avoided. Following Hansen and O'Leary [15], we apply a local smoothing on the L-curve points and use the new smoothed points as control points for a cubic spline. We then calculate the derivatives of the spline at those points and get the curvature on each of them. The smoothing is done by fitting five points, where the smoothed point is the center of them, to a straight line in the least squares sense. As in the case of the GCV, according to our knowledge, the L-Curve has not been applied to non-linear deconvolution methods. Limitations of this approach for linear deconvolution are presented in [1,34]. In addition, the curvature is very sensitive and thus we can very easily be diverted from the appropriate value. In addition to this shortcoming, each calculation of a curvature of a point needs the values of the L-curve of its neighbors. Therefore, it is hard to tune the parameters in an efficient way. Figure 2 presents the experiment results using the L-curve method. This figure shows both the L-curve (top) and its curvature (bottom) for a fixed K = 43. The location of the chosen parameter λ is marked by a diamond. As can be seen, the λ value chosen as the curvature maximizer, in this case, is very close to the ideal value. A very intuitive parameter tuning idea is to use the discrepancy principle. Based on the noise properties, the original image approximately satisfies Hxy 2 2 ≈ n y σ 2 . The discrepancy principle assumes that the same should hold for the reconstructed image and thus selects the parameters based on the assumption that The iteration number, K, is chosen to be the first iteration in which Hh K (y)y 2 2 < n y σ 2 . Experimenting with the discrepancy principle, Fig. 3 presents the value of Hh K (y)y 2 2n y σ 2 as a function of K, where λ = 0.065 is fixed. The discrepancy principle selects the parameter value that achieves the minimal value on the graphs. As can be seen, although this method is very appealing, in many instances like this one, it tends to select a value of K far from the optimal one. In this section we have described three parameter tuning methods: The GCV, the L-curve and the discrepancy principle. The discrepancy principle, which is commonly used as a stopping criterion, tends to fail in many cases, as can be seen in this section. As for the GCV and the L-curve methods, as far as we know, they were not applied before to non-linear deconvolution algorithms of the kind we use here. While their deployment is reasonable, their performance tends to be overly sensitive, possibly leading to poor results, as will be shown in the results section. The fact that these three methods may give good parameter tuning in part of the cases may be explained by the fact that these methods do not aim at minimizing the MSE but use different considerations for the tuning. This leads us naturally to consider ways to select the parameters directly based on an estimated value of the MSE. In this section we discuss an unbiased estimator for the MSE obtained by any reconstruction algorithm. This MSE estimate will later be used for parameter selection. The Stein unbiased risk estimator (SURE) was proposed in [17,18] as an estimate of the MSE in the Gaussian denoising problem, namely, when H = I and the noise is i.i.d. and Gaussian. This idea was then extended in [23] to handle more general inverse problems of the form discussed here, and a wider class of noise distributions. We refer to the later MSE estimate as the generalized SURE (GSURE). The essential concept behind the GSURE principle is to seek an unbiased estimate for the MSE that is only a function of the estimate h(y) and y (up to a constant which may depend on x but is independent of h(y)). Denote such an MSE estimate by η(x, y). Then, If we have access to such a function η(x, y) that estimates the MSE, while being also dependent on a set of parameters Θ (that control the reconstruction performance), then we may choose the values of Θ so as to minimize this function, and in that way aim at minimizing the true MSE as a consequence. For the case where H has full column rank, the GSURE is given by Eldar [23] η(h(u), y) Here, u = (1/σ 2 )H T y is the sufficient statistic for the model (1), ∇ is the divergence operator and c 1 is a constant independent of h(u). Given that our goal is to minimize the MSE by setting h(u), the constant c 1 is irrelevant for the minimization process. The term x ML stands for the Maximum Likelihood (ML) estimator -the minimizer of min z y -Hz . If H T H is invertible, then this estimate is given by The MSE estimate presented in ( 16) is valid only if H T H is invertible. When this condition is not satisfied, H has a nontrivial null space. Therefore, we have no information about the components of x that lie in the null space N(H). Thus we can estimate only the projected MSE on the range of H T (denoted by R(H T )), which is equal to the orthogonal complement of the null space N(H) ⊥ . Therefore, instead of attempting to minimize an unbiased estimate of the MSE, we consider an unbiased estimate of the projected MSE: E P R(H T ) h(u)x 2 2 , where is the orthogonal projection onto R(H T ) = N(H) ⊥ . Here † denotes the Moore-Pernose pseudo inverse. As shown in [23], an unbiased estimator for the projected MSE is given by the same formula as ( 16), but applied on the projected estimate h(u): Again, c 2 is a constant independent of h(y). Here the ML estimate is chosen as the minimum-norm minimizer of (18), given by The generalization of SURE provides the possibility of using this estimator for the case of general H, and specifically, for the deblurring and image scale-up problems. The tuning will be demonstrated for the SSF and PCD algorithms. As seen in Section 2, these methods have mainly two unknown parameters, the iterations number K and the thresholding value λ. Vonesch, Ramani and Unser, in [26], have already addressed this problem for the case where the operator H has full column rank. In their method, referred to as the global method, one parameter is being set at a time, given the other. We extend this for general H using the projected GSURE, showing the advantage of its usage over the regular GSURE, and over the conventional parameter setting methods that were presented in Section 3. Though the global method for parameter setting provides us with a good estimation of the parameters as demonstrated in the results section, it sets only one parameter given the other. We present an additional approach that sets both together. The parameter λ takes on a different value in each iteration of the algorithm in a greedy fashion. This way, we choose K together with λ, setting it to be the point where the estimated MSE stops decreasing. This method has the same computational cost for choosing two parameters as the global method has for selecting only one, without degrading the MSE performance. We also suggest a greedy look-ahead method that aims at better reconstruction results at the expense of higher complexity. We compare between these algorithms and the global method demonstrating the advantage of using these approaches for parameter tuning when both λ and K are unknown. In order to use GSURE with the iterated shrinkage algorithms, described in Section 2, we need to calculate the ML estimator, the derivative of the reconstruction algorithm with respect to u, and the projection operator in the rankdeficient case. In our experiments, the ML estimator and the projection matrix are easily calculated by exploiting the fact that the blur operators considered are block circulant. As such, they are diagonalized by the 2D discrete Fourier transform (DFT). This helps both for the deblurring and the scale-up problems we handle. To calculate the derivatives with respect to u, we first reformulate the iterative equation as a function of u instead of y. Rewriting (4) leads to where x = h λ,K (u) = Dα K as in (5). The derivatives of x are calculated recursively by first noticing that where S ′ ρ,λ/c (•) is an element wise derivative of the shrinkage function, organized as a diagonal matrix. From here, the divergence of the estimator can be directly obtained by multiplying the above by D from the left, gathering the diagonal of the matrix and summing it up: It is impracticable to calculate the Jacobian of ( 24) in high dimensions. Instead we use the trace estimator that was used for the GCV in Section 3.1. Following Vonesch et. al. [26] we choose a random vector n ∼ N(0, I). Using this estimator we iterate over dα k du n , which can be calculated instead of dα k du , leading to the following iterative formula For the estimation of the MSE for the PCD algorithm we need to calculate the trace as was done for the SSF. We rewrite the PCD iteration in terms of u The derivative with respect to u is given by Using randomization we get the following iterative formula for estimating the trace value, Now that we hold a complete expression for the GSURE MSE estimate, K and λ can be chosen by minimizing it. This strategy was developed in [26] and is referred to as the global method. In the full-rank case, the projection matrix is the identity and the projected GSURE coincides with the regular GSURE. Thus we can use the projected GSURE equation in (21) for both cases. For a fixed and pre-chosen number of iterations K, the λ which minimizes the GSURE expression is chosen. Repeating this process for various values of K, one can minimize the overall global MSE with respect to both K and λ. As an analytical minimization of the GSURE is hard to achieve, we use a golden section search [35]. We initialize these algorithms by and Denoting by T the GSURE calculation time (per-iteration), n gs the golden-section number of iterations, and n is the number of iterations of the iterated shrinkage algorithm, the time complexity of the global method for setting λ, when K is fixed, is O(n is n gs T ). For setting K another golden section can be performed over the number of iterations. In the greedy method, instead of choosing one global λ for all the iterations, we take a different route. Turning to a local alternative, the value of λ can be determined as the minimizer of the estimated MSE of the current iteration. In this strategy, K is set together with λ as the point where the estimated MSE (almost) stops decreasing. This allows us to avoid setting each of the parameters separately, as done in the global method. The algorithm proposed is the following: • Initialize α 0 and calculate its derivative w.r.t. u. • Repeat: )). 2. Perform the iterations in ( 27) and ( 29) (PCD case) or ( 23) and ( 26) (SSF case) for the calculation of α k and dα k du n using λ * k . 21), ( 5), ( 25), ( 20) and ( 19) or ( 22). The complexity of the greedy algorithm is also O(n is n gs T ). When one of the parameters is fixed in the global method and we need to set only the other parameter we get the same complexity in the two approaches. In the case where both parameters are to be set, the greedy technique has an advantage since it chooses the number of iterations along with the parameter λ. In contrast, in the global method the number of iterations is either pre-chosen and thus suboptimal, or searched using yet another golden-search for optimizing K increasing the overall complexity. To further decrease the MSE, we can modify the greedy algorithm by introducing a look-ahead option. One of the problems of the greedy strategy is that it minimizes the MSE of the current iteration but can harm the overall MSE. Thus, instead of choosing λ that minimizes the estimated MSE of the current iteration, we select it to minimize the estimated MSE of r iterations ahead, assuming that these r iterations are performed using the greedy approach described above. This change provides a look-ahead of r iterations, formulated as the following algorithm: • Initialize α 0 and calculate its derivative w.r.t. u. • Repeat: 1. Set λ * k by minimizing the estimated MSE r iterations ahead using the above-described greedy algorithm. 2. Perform a single iteration as in ( 27) and ( 29) (PCD case) or ( 23) and ( 26) (SSF case) for the calculation of α k and dα k du n using λ * k . 21), ( 5), ( 25), ( 20) and ( 19) or ( 22). In step 1, for each test λ in the golden-section, r iterations of the greedy method are performed, as described for the greedy algorithm. Finally, λ of the current iteration is chosen such that the estimated MSE of the last rth greedy iteration is minimized. The time complexity of the r look-ahead greedy algorithm is n is (n gs ) 2 rT , which is n gs r times slower than the other two methods. However, this is partially compensated for in some cases by getting smaller n is due to faster convergence. Furthermore, the resulting MSE is often lower, as we illustrate in the results section. The greedy methods were previously presented, with preliminary results on low dimension signals, in [36]. In this work we extended their use for images. In this section we present a set of experiments that demonstrate parameter tuning for the deblurring and the scale-up problems, using the PCD and the SSF algorithms. The representative experiments chosen correspond to different scenarios, which enable a demonstration of the various algorithms and their behavior. We should note that our work included various more tests, which are not brought here because of space limitation, and which lead to the same final conclusions. Table 1 lists the five tests performed. The tests we present divide into two groups: (i) preliminary ones that demonstrate the superiority of the projected GSURE over all the other alternatives, and then (ii) advanced tests that compare the global and the greedy methods. Blur We start by demonstrating the core ability of the GSURE and the classical techniques to choose the required parameters. In Fig. 4 we present the GSURE ISNR estimation 2 for Problem-1 applied on Cameraman with σ 2 = 2 as a function of the iterations. Similarly, Fig. 5 presents the GSURE ISNR as a function of λ for the same problem with σ 2 = 8. In both cases, the other parameter is kept fixed, choosing it to be the optimal one based on the true MSE. It can be observed in both graphs that the GSURE gives a very good estimation for the MSE and thus also a high-quality estimate of both λ and the number of iterations K. It can also be seen that it selects the parameters to be much closer to the optimal selection, compared to all the conventional methods. Among these conventional methods, the L-curve gives the best approximation for the parameters. However, it is hard to compute in an efficient way. The other methods approximate λ relatively well but fail in choosing K. Repeating the experiment reported in Fig. 4 on the image Boat and fixing λ = 0.0725, Table 2 summarizes the ISNR reconstruction results for setting K. It can be seen again that the GSURE gives the best reconstruction results, being approximately the same as those achieved by the optimal selection. In Figs. 6 and7 a comparison is made between the ISNR estimation using the regular and the projected GSURE. The problem considered in the first graph is Problem-2 and in the second is Problem-3, where both are applied on Cameraman. The selected parameters of the various methods are marked. The first graph is a function of λ and the second is a function of the iterations K. It can be seen that for both operators the estimation of λ and the iterations number using the projected GSURE is reasonable and always better than using the regular GSURE. Also, we see that in the case where we have a blur with real null space we get a totally unreliable estimation of the ISNR with the regular GSURE. We also see that L-curve fails this time in estimating λ. Table 3 compares the various parameter setting methods for a deblurring experiment based on Problem-2 for reconstructing the image Peppers when λ is being tuned. The visual effect of the various parameter selections can be observed in Fig. 8. We see the same behavior as before, indicating that the projected GSURE is the best choice, The true and estimated ISNR as a function of K, using the GSURE and its projected version. Applied on Problem-3, λ is set to λ = 0.67, and K is estimated using the various proposed methods. leading to the best reconstruction results. The results again show that the regular GSURE (and the conventional methods) tend to fail, while the projected version performs very well. In Figs. 9 and 10 a comparison is made between the estimation of the PSNR using the regular and the projected GSURE for Problem 4 (scale-up). As before, a comparison is done between the parameter values selected by the various methods. In addition, we show the reconstruction quality achieved by the bicubic, the Lanczos, and the bilinear interpolation methods 3 , as a reference performance to compare against. The first graph is a function of the iterations K and the second is a function of λ. It can be seen that the reconstruction algorithm with correctly tuned parameters achieves better reconstruction results than the conventional scale-up techniques, demonstrating the importance of the parameter tuning. Here as well, it can be observed that the projected GSURE provides the best estimation for the selection of λ and K. As in the case of deblurring, the scale-up results show that the projected GSURE gives the best reconstruction results. Because of its superiority, we shall use it hereafter in the various experiments that follow. Since in the case of a full rank operator, the GSURE and projected GSURE coincide, in every place that GSURE is referenced, the projected GSURE would be the one being intended. We now turn to look at the greedy strategies and compare them to the global method, all aiming to jointly estimate K and λ. In Fig. 11 we present the results obtained for Problem-1 applied on Cameraman. The graph shows the true ISNR after each iteration of the greedy and the greedy look-ahead methods (with r = 1), together with their ISNR estimation. As can be seen, the estimation has the same behavior as the true ISNR. In particular, in both graphs the maxima are very close. Choosing the number of iterations based on GSURE yields an ISNR very close to the one based on the true ISNR. This justifies our claim about using the greedy algorithm as an automatic stopping rule, by detecting an increase in the ISNR. The maxima in both graphs (marked in a circle as before) are very close, both in the greedy, as well as in the greedy look-ahead case. Figure 12 presents the actual result obtained for a similar experiment applied on the image House with Problem-2. A comparison of the MSE of the global, the greedy, and the look-ahead techniques for the deblurring problem in images is presented in Fig. 13 for Problems-1, 2, and 3 on the image Cameraman. In all cases, no real advantage is seen in terms of ISNR by each of the techniques. Since the greedy algorithm sets the iterations number together with λ, this means that the greedy methods are more efficient in the overall performance, at a possible slight costs in terms of the final ISNR. Turning to the scale-up problems 4 and 5, the same comparison is done in Fig. 14. Here again we get similar results in terms of PSNR for the three methods. This implies that the two approaches (global and greedy) are equivalent in terms of the output quality, and the differences are mostly in their complexities, with a clear advantage to the greedy techniques. To summarize, for the various deblurring and scale-up tests, there is no real advantage to one of the methods over the other, in terms of the quality of the reconstruction results. Thus, when we need to set both λ and K, the greedy methods are to be favored. In this work we have considered automatic tuning of parameters for inverse problems. Our focus has been iterated shrinkage algorithms for deblurring and image scale-up, targeting an automatic way for choosing λ in each iteration in these methods, and choosing the number of iterations K. We extended the global tuning method developed in [26] to general ill-posed problems, by exploiting a projected version of the GSURE. We also applied the GCV, the L-curve, and the discrepancy-principle methods for this task of parameter tuning, and compared their selection with those of the (projected) GSURE. The GSURE was shown to give better approximation for the true values of the tuned parameters, leading to better results in the reconstruction. Two greedy methods for parameter tuning -a simple and a look-ahead version -were presented. These two methods are shown to perform as well as the global method, with the advantage of setting an automatic stopping rule together with the λ parameter, whereas the previous methods set only one of the parameters, given the other. We are aware of fact that there are better algorithms today for image deblurring and scale-up. However, the scope of this paper is to demonstrate the applicability of the GSURE for parameter tuning for such methods and not to compete with current state of the art reconstruction results. Using the concepts presented in this paper, automatic parameter tuning can be performed also for other techniques. These are defined as PS NR = 10 log 10 2 /MS E and IS NR = 10 log 10 MS E re f /MS E , where MS E re f is a reference-MSE to compare against. The reference MSE is obtained by considering y as the estimate. these scale-up techniques do not require parameter tuning.

The Projected GSURE for Automatic Parameter Tuning in Iterative Shrinkage Methods

Original Paper

Comments & Academic Discussion

Leave a Comment

Original Paper

Related Papers

Comments & Academic Discussion

Leave a Comment