Low Rank Matrix-Valued Chernoff Bounds and Approximate Matrix Multiplication

Reading time: 6 minutes

📝 Original Info

  • Title: Low Rank Matrix-Valued Chernoff Bounds and Approximate Matrix Multiplication
  • ArXiv ID: 1005.2724
  • Date: 2010-05
  • Authors: Avner Magen, Anastasios Zouzias

📝 Abstract

In this paper we develop algorithms for approximating matrix multiplication with respect to the spectral norm. Let $A \in \mathbb{R}^{n \times m}$ and $B \in \mathbb{R}^{n \times p}$ be two matrices and $\varepsilon > 0$. We approximate the product $A^\top B$ using two down-sampled sketches, $\tilde{A} \in \mathbb{R}^{t \times m}$ and $\tilde{B} \in \mathbb{R}^{t \times p}$, where $t \ll n$, such that $\|\tilde{A}^\top \tilde{B} - A^\top B\| \leq \varepsilon \|A\| \|B\|$ with high probability. We use two different sampling procedures for constructing $\tilde{A}$ and $\tilde{B}$: one samples rows of $A$ and $B$ i.i.d. and non-uniformly, the other takes random linear combinations of their rows. We prove bounds that depend only on the intrinsic dimensionality of $A$ and $B$, that is, their rank and their stable rank, namely the squared ratio between their Frobenius and operator norms. For the bounds that depend on rank we employ standard tools from high-dimensional geometry, such as concentration of measure arguments combined with elaborate $\varepsilon$-net constructions. For bounds that depend on the smaller parameter of stable rank, this technology by itself seems too weak; however, we show that in combination with a simple truncation argument it does yield such bounds. To handle similar bounds for row sampling, we develop a novel matrix-valued Chernoff bound inequality, which we call the low-rank matrix-valued Chernoff bound. Thanks to this inequality, we are able to give bounds that depend only on the stable rank of the input matrices...
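
As a concrete illustration of the target guarantee $\|\tilde{A}^\top \tilde{B} - A^\top B\| \leq \varepsilon \|A\| \|B\|$, the sketch below implements the second procedure (random linear combinations of rows) by setting $\tilde{A} = RA$ and $\tilde{B} = RB$ for a rescaled random sign matrix $R$. The matrix sizes and the choice of $t$ are illustrative assumptions, not the values prescribed by the paper's theorems.

```python
# Minimal sketch: approximate A^T B via random linear combinations of rows.
# Assumption: R is a rescaled random sign matrix; t is chosen ad hoc for illustration,
# whereas the paper's results pick t as a function of rank / stable rank and epsilon.
import numpy as np

rng = np.random.default_rng(0)
n, m, p, t = 2000, 50, 60, 500          # t << n

A = rng.standard_normal((n, m)) @ rng.standard_normal((m, m))  # low intrinsic dimension
B = rng.standard_normal((n, p)) @ rng.standard_normal((p, p))

R = rng.choice([-1.0, 1.0], size=(t, n)) / np.sqrt(t)          # R_ij = +/- 1/sqrt(t)
A_tilde, B_tilde = R @ A, R @ B                                # sketches with t rows each

err = np.linalg.norm(A_tilde.T @ B_tilde - A.T @ B, 2)         # spectral-norm error
scale = np.linalg.norm(A, 2) * np.linalg.norm(B, 2)
print(f"relative spectral-norm error: {err / scale:.3f}")
```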


📄 Full Content

In many scientific applications, data is often naturally expressed as a matrix, and computational problems on such data are reduced to standard matrix operations including matrix multiplication, ℓ2-regression, and low rank matrix approximation.

In this paper we analyze several approximation algorithms with respect to these operations. All of our algorithms share a common underlying framework, which can be described as follows: let A be an input matrix to which we want to apply a matrix computation in order to infer some useful information about the data it represents. The main idea is to work with a sample of A (a.k.a. a sketch), call it Ã, and hope that the information obtained from à is, in some sense, close to the information that would have been extracted from A.

In this generality, the above approach (sometimes called “Monte-Carlo method for linear algebraic problems”) is ubiquitous, and is responsible for much of the development in fast matrix computations [FKV04, DKM06a, Sar06, DMM06, AM07, CW09, DR10].

As we sample A to create a sketch Ã, our goal is twofold: (i) guarantee that à resembles A in the relevant measure, and (ii) achieve such an à using as few samples as possible. The standard tool that provides a handle on these requirements when the objects are real numbers is the Chernoff bound inequality. However, since we deal with matrices, we would like an analogous probabilistic tool suitable for matrices. Quite recently, a non-trivial generalization of Chernoff-type inequalities to matrix-valued random variables was introduced by Ahlswede and Winter [AW02]. Such inequalities are suitable for the type of problems that we consider here. However, this type of inequality and the variants that have been proposed in the literature [GLF+09, Rec09, Gro09, Tro10] all suffer from the fact that their bounds depend on the dimensionality of the samples. We argue that in a wide range of applications this dependency can be quite detrimental.

Specifically, whenever the following two conditions hold we typically provide stronger bounds than the existing tools: (a) the input matrix has low intrinsic dimensionality, such as rank or stable rank; (b) the matrix samples themselves have low rank. Condition (a) is very common in applications, for the simple reason that representing data as matrices typically leads to redundant representations. Typical sampling methods rely on extremely simple sampling matrices, i.e., samples that are supported on only one entry [AHK06, AM07, DZ10] or samples that are obtained as the outer product of sampled rows or columns [DKM06a, RV07], so condition (b) is often natural to assume. By incorporating the rank assumption on the matrix samples into the above matrix-valued inequalities, we are able to develop a "dimension-free" matrix-valued Chernoff bound. See Theorem 1.1 for more details.
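
To illustrate condition (b): in row sampling, each sample contributes a rank-one outer product of a sampled row of $A$ and the corresponding row of $B$. The sketch below uses sampling probabilities proportional to $\|A_i\| \|B_i\|$, a standard choice in this line of work; the exact distribution and sample size used in the paper's theorems may differ, so treat this purely as an illustration.

```python
# Illustration of condition (b): a row-sampling estimator of A^T B built from
# rank-one samples. Assumption (illustrative, not the paper's exact procedure):
# rows are drawn i.i.d. with probability proportional to ||A_i|| * ||B_i||.
import numpy as np

rng = np.random.default_rng(1)
n, m, p, t = 2000, 50, 60, 500

A = rng.standard_normal((n, m))
B = rng.standard_normal((n, p))

probs = np.linalg.norm(A, axis=1) * np.linalg.norm(B, axis=1)
probs /= probs.sum()

idx = rng.choice(n, size=t, p=probs)                  # i.i.d. non-uniform row indices
# Unbiased estimator: average of the rank-one matrices A_i^T B_i / p_i.
est = sum(np.outer(A[i], B[i]) / probs[i] for i in idx) / t

rel_err = np.linalg.norm(est - A.T @ B, 2) / (np.linalg.norm(A, 2) * np.linalg.norm(B, 2))
print(f"relative spectral-norm error: {rel_err:.3f}")
```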

Fundamental to the applications we derive are two probabilistic tools that provide concentration bounds for certain random matrices. These tools are inherently different, and each pertains to a different sampling procedure. In the first, we multiply the input matrix by a random sign matrix, whereas in the second we sample rows according to a distribution that depends on the input matrix. In particular, the first method is oblivious (the probability space does not depend on the input matrix) while the second is not.

The first tool is the so-called subspace Johnson–Lindenstrauss lemma. Such a result was obtained in [Sar06] (see also [Cla08, Theorem 1.3]), although it appears implicitly in results extending the original Johnson–Lindenstrauss lemma (see [Mag07]). The techniques for proving such a result, with possibly worse bounds, are not new and can be traced back to Milman's proof of Dvoretzky's theorem [Mil71].

Lemma 1.1. (Subspace JL lemma [Sar06]) Let $W \subseteq \mathbb{R}^d$ be a linear subspace of dimension $k$ and $\varepsilon \in (0, 1/3)$. Let $R$ be a $t \times d$ random sign matrix rescaled by $1/\sqrt{t}$, namely $R_{ij} = \pm 1/\sqrt{t}$ with equal probability. Then

$$\Pr\left[\,\forall\, w \in W:\ (1-\varepsilon)\|w\|_2 \leq \|Rw\|_2 \leq (1+\varepsilon)\|w\|_2\,\right] \geq 1 - c_1^{k}\, c_2^{-\varepsilon^2 t},$$

where $c_1 > 0$, $c_2 > 1$ are constants.
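
A quick way to check the lemma numerically: if $U \in \mathbb{R}^{d \times k}$ is an orthonormal basis of $W$, then $(1-\varepsilon)\|w\|_2 \leq \|Rw\|_2 \leq (1+\varepsilon)\|w\|_2$ holds for all $w \in W$ exactly when every singular value of $RU$ lies in $[1-\varepsilon, 1+\varepsilon]$. The dimensions and the value of $t$ below are arbitrary choices for illustration, not the bound the lemma prescribes.

```python
# Empirical check of the subspace JL lemma: norm preservation over the whole
# subspace W is equivalent to the singular values of R @ U lying in [1-eps, 1+eps],
# where U is an orthonormal basis of W. Dimensions here are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(2)
d, k, eps, t = 1000, 10, 0.2, 400

U, _ = np.linalg.qr(rng.standard_normal((d, k)))       # orthonormal basis of a k-dim subspace
R = rng.choice([-1.0, 1.0], size=(t, d)) / np.sqrt(t)  # rescaled random sign matrix

sv = np.linalg.svd(R @ U, compute_uv=False)
print("singular values of RU lie in", (round(sv.min(), 3), round(sv.max(), 3)),
      "; target interval:", (1 - eps, 1 + eps))
```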

The importance of such a tool is that it allows us to bound the necessary dimensions of the random sign matrix in terms of the rank of the input matrices; see Theorem 3.2 (i.a).

While the assumption that the input matrices have low rank is fairly reasonable, one should be a little cautious, as the property of having low rank is not robust. Indeed, if random noise is added to a matrix, even one of low rank, the resulting matrix will have full rank almost surely. On the other hand, it can be shown that the added noise cannot distort the Frobenius and operator norms significantly, which makes the notion of stable rank robust; hence the assumption of low stable rank on the input is more widely applicable than the low rank assumption.
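
The contrast is easy to see numerically: adding tiny noise to a rank-$k$ matrix makes it full rank, while its stable rank $\|A\|_F^2 / \|A\|^2$ barely moves. A minimal sketch, with an arbitrary noise level chosen for illustration:

```python
# Rank versus stable rank ||A||_F^2 / ||A||_2^2 under small additive noise.
# The matrix sizes and the noise level are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(3)
n, k = 500, 5

A = rng.standard_normal((n, k)) @ rng.standard_normal((k, n))  # exact rank k
E = 1e-6 * rng.standard_normal((n, n))                          # tiny random perturbation

def stable_rank(M):
    return np.linalg.norm(M, "fro") ** 2 / np.linalg.norm(M, 2) ** 2

for name, M in [("A", A), ("A + E", A + E)]:
    print(f"{name}: rank = {np.linalg.matrix_rank(M)}, stable rank = {stable_rank(M):.3f}")
```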

Given the above discussion, we resort to a different methodology, called matrix-valued Chernoff bounds. These are non-trivial generalizations of...
