An Alternating Direction Method for Finding Dantzig Selectors

Reading time: 5 minutes

📝 Original Info

  • Title: An Alternating Direction Method for Finding Dantzig Selectors
  • ArXiv ID: 1011.4604
  • Date: 2007-03-01
  • Authors: Yuan, Y.; Lin, Y.

📝 Abstract

In this paper, we study the alternating direction method for finding Dantzig selectors, which were first introduced in [8]. In particular, at each iteration we apply the nonmonotone gradient method proposed in [17] to approximately solve one subproblem of this method. We compare our approach with a first-order method proposed in [3]. The computational results show that our approach usually outperforms that method in terms of CPU time while producing solutions of comparable quality.

💡 Deep Analysis

Figure 1

📄 Full Content

Consider the standard linear regression model:

    y = Xβ + ε,    (1)

where y ∈ ℜ^n is a vector of responses, X ∈ ℜ^{n×p} is a design matrix, β ∈ ℜ^p is an unknown regression vector, and ε is a vector of random noise. One widely studied problem for this model is variable selection, that is, determining the support of β (i.e., the indices of the nonzero entries of β). When p ≪ n, this problem can be tackled by many classical approaches. In recent years, however, situations where p ≫ n have become increasingly common in applications such as signal processing and gene expression studies. Efforts have therefore been directed at developing new variable selection methods that work for large values of p. A few examples of such methods include the lasso [23], the elastic net [28], and the more recent Dantzig selector [8].
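To make this setting concrete, the following minimal NumPy sketch simulates data from model (1) in the p ≫ n regime; the dimensions, sparsity level, and noise scale below are illustrative choices, not values taken from the paper.

```python
import numpy as np

# Minimal simulation of model (1): y = X @ beta + eps, with p >> n.
# All sizes and the sparsity level below are illustrative assumptions.
rng = np.random.default_rng(0)
n, p, k = 100, 1000, 10             # k = number of nonzero coefficients
X = rng.standard_normal((n, p))     # design matrix
beta = np.zeros(p)
support = rng.choice(p, size=k, replace=False)
beta[support] = rng.standard_normal(k)
eps = 0.1 * rng.standard_normal(n)  # random noise
y = X @ beta + eps                  # responses
```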

A Dantzig selector for (1) is a solution of the following optimization problem:

    min_{β ∈ ℜ^p} ‖β‖_1   subject to   ‖D^{-1}X^T(y - Xβ)‖_∞ ≤ δ,    (2)

where δ > 0 and D is the diagonal matrix whose diagonal entries are the norms of the columns of X. The Dantzig selector was first proposed in [8] and justified on detailed statistical grounds. In particular, it was shown that this estimator achieves a loss within a logarithmic factor of the ideal mean squared error, i.e., the error one would achieve if one knew the support of β and the coordinates of β that exceed the noise level. For more discussion of the importance of the Dantzig selector and its relationship with other estimators such as the lasso, we refer the reader to [5,6,9,11,13,18,21,14].

Despite the importance of the Dantzig selector and its many connections with other estimators, there are very few existing algorithms for solving (2). One natural way to solve (2) is to recast it as a linear programming (LP) problem and apply LP techniques. This approach is adopted in the package ℓ1-magic [7], which solves the resulting LP via a primal-dual interior-point (IP) method. However, IP methods are typically inefficient for large-scale problems because they require solving a dense Newton system at each iteration. Another approach to solving (2) uses homotopy methods to compute the entire solution path of the Dantzig selector (see, for example, [22,14]). As discussed in [3, Section 1.2], however, these methods are also unable to deal with large-scale problems. Recently, first-order methods capable of solving large-scale problems were proposed for (2) in [16,3]. In [16], problem (2) and its dual are recast into a smooth convex programming problem, and an optimal first-order method proposed in [2] is then applied to solve the resulting problem. In [3], problem (2) is recast as a linear cone programming problem, and optimal first-order methods (see, for example, [19,2,20,24,15]) are then applied to solve a smooth approximation to the dual of the latter problem.
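As an aside, the LP recast mentioned above is easy to write down explicitly. The sketch below solves a small instance of (2) with scipy.optimize.linprog by introducing an auxiliary variable u with -u ≤ β ≤ u; the helper name dantzig_lp and the dense constraint matrices are illustrative, this is the generic LP reformulation rather than the paper's ADM, and it is only practical for small problems.

```python
import numpy as np
from scipy.optimize import linprog

def dantzig_lp(X, y, delta):
    """Solve (2) via the standard LP recast: minimize 1'u subject to
    -u <= beta <= u and ||D^{-1} X'(y - X beta)||_inf <= delta,
    with stacked variables z = [beta; u]. Illustrative sketch only;
    dense matrices make this suitable for small instances."""
    n, p = X.shape
    d = np.linalg.norm(X, axis=0)      # column norms -> diagonal of D
    A = (X.T @ X) / d[:, None]         # D^{-1} X'X
    c0 = (X.T @ y) / d                 # D^{-1} X'y
    I = np.eye(p)
    Z = np.zeros((p, p))
    A_ub = np.block([[ I, -I],         #  beta - u <= 0
                     [-I, -I],         # -beta - u <= 0
                     [ A,  Z],         #  A beta <= delta + c0
                     [-A,  Z]])        # -A beta <= delta - c0
    b_ub = np.concatenate([np.zeros(p), np.zeros(p),
                           delta + c0, delta - c0])
    cost = np.concatenate([np.zeros(p), np.ones(p)])   # minimize sum(u)
    bounds = [(None, None)] * p + [(0, None)] * p      # beta free, u >= 0
    res = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[:p]
```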

In this paper, we consider an alternative approach, namely, the alternating direction method (ADM), for solving (2). The ADM and its many variants have recently been widely used to solve large-scale problems in compressed sensing, image processing and statistics (see, for example, [1,12,25,27,26]). In general, the ADM can be applied to solve problems of the following form:

    min_{x,y} f(x) + g(y)   subject to   Ax + By = b,  x ∈ C_1,  y ∈ C_2,    (3)

where f and g are convex functions, A and B are matrices, b is a vector, and C_1 and C_2 are closed convex sets. Each iteration of the ADM involves solving two subproblems successively and then updating a multiplier, and the method converges to an optimal solution of (3) under mild assumptions (see, for example, [4,10]). In this paper, we show that (2) can be rewritten in the form of (3), and hence the ADM can be suitably applied. Moreover, we show that one of the ADM subproblems has a simple closed-form solution, while the other can be efficiently and approximately solved by a nonmonotone gradient method proposed recently in [17]. We also discuss the convergence of this ADM. Finally, we compare our method for solving (2) with a first-order method proposed in [3] on large-scale simulated problems. The computational results show that our approach usually outperforms that method in terms of CPU time while producing solutions of comparable quality.

The rest of the paper is organized as follows. In Subsection 1.1, we define the notation used in this paper. In Section 2, we study the alternating direction method for solving problem (2) and address its convergence. Finally, in Section 3 we conduct numerical experiments comparing our method with a first-order method proposed in [3].
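For orientation, here is a minimal sketch of a generic ADM loop for (3), assuming the standard augmented-Lagrangian multiplier update with a penalty parameter mu; the callback names, parameters, and stopping test are illustrative and need not match the precise scheme analyzed in the paper.

```python
import numpy as np

def adm(solve_x, solve_y, A, B, b, x0, y0, mu=1.0, max_iter=500, tol=1e-6):
    """Generic ADM skeleton for (3): min f(x) + g(y) s.t. Ax + By = b,
    x in C_1, y in C_2.  The callbacks solve_x(y, lam, mu) and
    solve_y(x, lam, mu) are assumed to return (approximate) minimizers of
    the augmented Lagrangian over x in C_1 and y in C_2, respectively;
    in the paper's setting one subproblem has a closed form and the other
    is handled by a nonmonotone gradient method."""
    x, y, lam = x0, y0, np.zeros(b.shape)
    for _ in range(max_iter):
        x = solve_x(y, lam, mu)        # first subproblem
        y = solve_y(x, lam, mu)        # second subproblem
        r = A @ x + B @ y - b          # primal residual
        lam = lam + mu * r             # multiplier update
        if np.linalg.norm(r) <= tol:   # simple stopping test (illustrative)
            break
    return x, y, lam
```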

In this paper, ℜ^n denotes the n-dimensional Euclidean space and ℜ^{m×n} denotes the set of all m × n matrices with real entries. For a vector x ∈ ℜ^n, ‖x‖_1, ‖x‖_2 and ‖x‖_∞ denote the 1-norm, 2-norm and ∞-norm of x, respectively. For any vector x in ℜ^n, |x| is the vector whose ith entry is |x_i|, while sgn(x) is the vector whose ith entry is 1 if x_i > 0 and -1 otherwise. Given two vectors x and y in ℜ^n, x • y denotes the Hadamard (entry-wise) product of x and y, and max{x, y} denotes the vector whose ith entry is max{x_i, y_i}.
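This entry-wise notation can be illustrated with a few NumPy one-liners; the soft-thresholding expression at the end is an illustrative aside (such closed forms are typical of ADM subproblems), not a formula quoted from this excerpt.

```python
import numpy as np

# The entry-wise notation above, illustrated in NumPy.
x = np.array([-1.5, 0.0, 2.0])
y = np.array([ 0.5, 1.0, -1.0])
abs_x = np.abs(x)                      # |x|
sgn_x = np.where(x > 0, 1.0, -1.0)     # sgn(x): 1 if x_i > 0, -1 otherwise
hadamard = x * y                       # x • y, entry-wise product
max_xy = np.maximum(x, y)              # max{x, y}, entry-wise maximum

# For instance, sgn(x) • max{|x| - lam, 0} is the soft-thresholding operator,
# a typical closed-form expression of the kind that arises in ADM subproblems
# (an illustrative aside, not a formula taken from this excerpt).
lam = 1.0
soft = sgn_x * np.maximum(abs_x - lam, 0.0)
```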

Reference

This content is AI-processed based on open access ArXiv data.
