Let $A$ be an $n \times n$ matrix. A skeleton decomposition is any factorization of the form $CUR$, where $C$ comprises columns of $A$ and $R$ comprises rows of $A$. In this paper, we consider uniformly sampling $\ell \simeq k \log n$ rows and columns to produce a skeleton decomposition. The algorithm runs in $O(\ell^3)$ time and has the following error guarantee. Let $\|\cdot\|$ denote the 2-norm. Suppose $A \simeq X B Y^T$, where $X, Y$ each have $k$ orthonormal columns. Assuming that $X, Y$ are incoherent, we show that, with high probability, the approximation error $\|A - CUR\|$ scales with $(n/\ell)\|A - X B Y^T\|$ or better. A key step in this algorithm involves regularization, and empirical results suggest that this step is crucial when $A$ is nonsymmetric. Finally, we use our proof framework to analyze two existing algorithms in an intuitive way.
1.1. Skeleton decompositions. This paper is concerned with the decomposition known as the matrix skeleton, pseudo-skeleton [22], or CUR factorization [29, 18].
For the rest of the paper, we adopt the following Matlab-friendly notation: given $A \in \mathbb{C}^{m \times n}$, $A_{:C}$ will denote the restriction of $A$ to columns indexed by $C$, and $A_{R:}$ will denote the restriction of $A$ to rows indexed by $R$. A skeleton decomposition of $A$ is any factorization of the form $A_{:C} Z A_{R:}$ for some $Z \in \mathbb{C}^{k \times k}$.
In general, a rank-$k$ approximation of $A$ takes up $O((m + n)k)$ space. The major advantage of the skeleton decomposition is that it may require only $O(k^2)$ space. Consider the case where $A$'s entries can be specified by a function in closed form. Then the skeleton decomposition is fully specified by $Z \in \mathbb{C}^{k \times k}$, together with the two index sets $C$ and $R$. In addition, rows and columns of the original matrix may carry more physical significance than linear combinations thereof.
There are important examples where the full matrix itself is not low rank but can be partitioned into blocks each of which has low numerical rank. One example is the Green's function of elliptic operators with mild regularity conditions [4]. Another example is the amplitude factor in certain Fourier integral operators and wave propagators [11, 16]. Algorithms that compute good skeleton representations can be used to compress such matrices.

Algorithm 1. Skeleton via uniform sampling, $O(k^3)$ algorithm.
Input: a matrix $A \in \mathbb{C}^{m \times n}$ and user-defined parameters $\ell$ and $\delta$.
Output: a skeleton representation with $\ell$ rows and $\ell$ columns.
Steps:
1. Uniformly and independently select $\ell$ columns to form $A_{:C}$.
2. Uniformly and independently select $\ell$ rows to form $A_{R:}$.
3. Compute the SVD of $A_{RC}$. Remove all the components with singular values less than $\delta$. Treat this as a perturbation $E$ of $A$ such that $\|E_{RC}\| \le \delta$ and $\|(A_{RC} + E_{RC})^{+}\| \le \delta^{-1}$.
4. Compute $Z = (A_{RC} + E_{RC})^{+}$. (In Matlab, Z = pinv(A(R,C), delta).)
5. Output $A_{:C} Z A_{R:}$.
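For concreteness, the following NumPy sketch spells out these steps; the function name skeleton_uniform, the random-number handling, and all variable names are our own, and the snippet is meant as an illustration rather than a reference implementation.

```python
import numpy as np

def skeleton_uniform(A, ell, delta, rng=None):
    """Sketch of Algorithm 1: skeleton via uniform sampling with regularization."""
    rng = np.random.default_rng() if rng is None else rng
    m, n = A.shape
    # Steps 1-2: uniformly and independently sample ell columns and ell rows.
    C = rng.integers(0, n, size=ell)
    R = rng.integers(0, m, size=ell)
    A_RC = A[np.ix_(R, C)]
    # Steps 3-4: SVD of A_RC, drop singular values below delta, then invert,
    # i.e. Z = pinv(A(R,C), delta) in the Matlab notation above.
    U, s, Vh = np.linalg.svd(A_RC, full_matrices=False)
    keep = s > delta
    Z = (Vh[keep].conj().T / s[keep]) @ U[:, keep].conj().T
    # Step 5: the skeleton approximation is A[:, C] @ Z @ A[R, :].
    return C, R, Z
```

If the entries of $A$ can be generated on demand, only the index sets $C$, $R$ and the $\ell \times \ell$ matrix $Z$ need to be stored.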
1.2. Overview. In this paper, we consider uniformly sampling $\ell \simeq k \log \max(m, n)$ rows and columns of $A$ to build a skeleton decomposition. After obtaining $A_{:C}$ and $A_{R:}$ this way, we compute the middle matrix $Z$ as the pseudoinverse of $A_{RC}$ with thresholding or regularization. See Algorithm 1 for more details. Suppose $A \simeq X_1 A_{11} Y_1^T$ where $X_1, Y_1$ have $k$ orthonormal columns. Assuming that $X_1, Y_1$ are incoherent, we show in Theorem 1.2 that the 2-norm approximation error $\|A - A_{:C} Z A_{R:}\|$ is bounded by $O(\|A - X_1 A_{11} Y_1^T\| (mn)^{1/2} / \ell)$ with high probability. We believe that this is the first known relative error bound in the 2-norm for a nonsymmetric $A$.
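To see how this scaling can be probed numerically, here is a small hedged experiment (our own, not from the paper) that builds a matrix of the form $X_1 A_{11} Y_1^T$ plus a small residual and compares the skeleton error with $\|A - X_1 A_{11} Y_1^T\|(mn)^{1/2}/\ell$. It reuses the skeleton_uniform sketch above; the test matrix and parameter choices are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 2000, 2000, 10
X, _ = np.linalg.qr(rng.standard_normal((m, k)))   # incoherent with high probability
Y, _ = np.linalg.qr(rng.standard_normal((n, k)))
B = np.diag(np.logspace(0, -3, k))
A = X @ B @ Y.T + 1e-6 * rng.standard_normal((m, n))   # A is approximately X B Y^T

ell, delta = 200, 1e-5
C, R, Z = skeleton_uniform(A, ell, delta, rng)
err = np.linalg.norm(A - A[:, C] @ Z @ A[R, :], 2)
residual = np.linalg.norm(A - X @ B @ Y.T, 2)
print(err, residual * np.sqrt(m * n) / ell)   # skeleton error vs. the bound's scaling
```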
The idea of uniformly sampling rows and columns to build a skeleton is not new. In particular, for the case where $A$ is symmetric, this technique may be known as the Nyström method. The skeleton used is $A_{:C} A_{CC}^{+} A_{C:}$, which is symmetric, and its 2-norm error was recently analyzed by Talwalkar [37] and Gittens [21]. Both papers make the same assumption that $X_1, Y_1$ are incoherent. In fact, Gittens arrives at the same relative error bound as we do.
Nonetheless, our results are more general. They apply to nonsymmetric matrices that are low rank in a broader sense. Specifically, when we write $A \simeq X_1 A_{11} Y_1^T$, the matrix $A_{11}$ is not necessarily diagonal and $X_1, Y_1$ are not necessarily the singular vectors of $A$. This relaxes the incoherence requirement on $X_1, Y_1$. Furthermore, in the physical sciences, it is not uncommon to work with linear operators that are known a priori to be almost diagonalized by the Fourier basis or related bases in harmonic analysis. These bases are often incoherent. One example is an integral operator with a smooth kernel. See Section 4 for more details.
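As a toy illustration of such an operator (our own example, with arbitrary kernel, grid sizes, and tolerance), the following snippet discretizes a smooth Gaussian kernel and reports its numerical rank together with a coherence-type quantity for its leading left singular vectors:

```python
import numpy as np

# A discretized integral operator with a smooth (Gaussian) kernel on [0, 1];
# the kernel, grid sizes, and tolerance are arbitrary choices for illustration.
m, n, k = 1500, 1200, 20
x = np.linspace(0.0, 1.0, m)
y = np.linspace(0.0, 1.0, n)
A = np.exp(-20.0 * (x[:, None] - y[None, :]) ** 2) / n

U, s, Vh = np.linalg.svd(A, full_matrices=False)
print(np.sum(s > 1e-10 * s[0]))                         # numerical rank << min(m, n)
print((m / k) * np.max(np.sum(U[:, :k] ** 2, axis=1)))  # coherence of the leading k left singular vectors
```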
A drawback of our result is that it requires us to set an appropriate regularization parameter in advance. Unfortunately, there is no known way of estimating this parameter quickly, and the regularization step cannot be skipped. In Section 4.1, we illustrate with a numerical example that, without regularization, the approximation error can blow up in a way predicted by Theorem 1.2.
Finally, we also establish error bounds for two other algorithms. We shall postpone the details.
1.3. Some previous work. Before presenting our main results, we would like to provide a sample of some previous work. The well-informed reader may want to move on.
The precursor to the skeleton decomposition is the interpolative decomposition [14], also called the column subset selection problem [20]. An interpolative decomposition of $A$ is a factorization of the form $A_{:C} D$ for some $D$. The earliest work on this topic is probably the pivoted QR algorithm of Businger and Golub [8] in 1965. In 1987, Chan [12] introduced the Rank Revealing QR (RRQR); a sequence of improved algorithms and bounds followed [19, 26, 31, 23]. RRQR algorithms can also be used to compute skeletons [28].
An early result due to Goreinov et al. [22] says that for any $A \in \mathbb{C}^{m \times n}$, there exists a skeleton
Although the proof i