A tail inequality for quadratic forms of subgaussian random vectors

We prove an exponential probability tail inequality for positive semidefinite quadratic forms in a subgaussian random vector. The bound is analogous to one that holds when the vector has independent Gaussian entries.
Suppose that x = (x_1, . . . , x_n) is a random vector. Let A ∈ R^{m×n} be a fixed matrix. A natural quantity that arises in many settings is the quadratic form ‖Ax‖² = x⊤(A⊤A)x. Throughout, ‖v‖ denotes the Euclidean norm of a vector v, and ‖M‖ denotes the spectral (operator) norm of a matrix M. We are interested in how close ‖Ax‖² is to its expectation.
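The identity ‖Ax‖² = x⊤(A⊤A)x is immediate from expanding the norm, and easy to confirm numerically; the matrix size and vector below are arbitrary choices for illustration.

```python
import numpy as np

# Check the identity ||Ax||^2 = x^T (A^T A) x for an arbitrary
# matrix A and vector x (dimensions here are illustrative only).
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
x = rng.standard_normal(3)

lhs = np.linalg.norm(A @ x) ** 2
rhs = x @ (A.T @ A) @ x
assert np.isclose(lhs, rhs)
```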
Consider the special case where x_1, . . . , x_n are independent standard Gaussian random variables. The following proposition provides an (upper) tail bound for ‖Ax‖².

Proposition 1. Let A ∈ R^{m×n} be a matrix, and let Σ := A⊤A. Let x = (x_1, . . . , x_n) be an isotropic multivariate Gaussian random vector with mean zero. For all t > 0,

Pr[ ‖Ax‖² > tr(Σ) + 2√(tr(Σ²) t) + 2‖Σ‖ t ] ≤ e^{−t}.
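Proposition 1 can be sanity-checked by Monte Carlo: for Gaussian x, the empirical frequency of the event ‖Ax‖² > tr(Σ) + 2√(tr(Σ²) t) + 2‖Σ‖ t should not exceed e^{−t}. The dimensions, seed, and t below are arbitrary choices.

```python
import numpy as np

# Monte Carlo sanity check of Proposition 1 for x ~ N(0, I_n).
rng = np.random.default_rng(1)
m, n, trials, t = 4, 6, 20000, 2.0

A = rng.standard_normal((m, n))
S = A.T @ A  # Sigma = A^T A
bound = (np.trace(S) + 2 * np.sqrt(np.trace(S @ S) * t)
         + 2 * np.linalg.norm(S, 2) * t)  # ord=2 is the spectral norm

x = rng.standard_normal((trials, n))
q = np.einsum("ij,ij->i", x @ S, x)  # ||A x_k||^2 for each trial k
freq = np.mean(q > bound)
assert freq <= np.exp(-t)  # empirical tail respects the e^{-t} bound
```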
The proof, given in Appendix A.2, is straightforward given the rotational invariance of the multivariate Gaussian distribution, together with a tail bound for linear combinations of χ 2 random variables due to Laurent and Massart (2000). We note that a slightly weaker form of Proposition 1 can be proved directly using Gaussian concentration (Pisier, 1989).
In this note, we consider the case where x = (x_1, . . . , x_n) is a subgaussian random vector. By this, we mean that there exists a σ ≥ 0 such that, for all α ∈ R^n,

E[ exp(α⊤x) ] ≤ exp( ‖α‖² σ² / 2 ).
We provide a sharp upper tail bound for this case analogous to one that holds in the Gaussian case (indeed, the same as Proposition 1 when σ = 1).
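A canonical example of a subgaussian vector with σ = 1 is a vector of independent Rademacher (±1) signs: its moment generating function factors as a product of cosh terms, and cosh(u) ≤ exp(u²/2) gives the condition exactly. The directions tested below are arbitrary.

```python
import numpy as np

# Rademacher vectors are subgaussian with sigma = 1:
# E[exp(a^T x)] = prod_i cosh(a_i) <= exp(||a||^2 / 2),
# using cosh(u) <= exp(u^2 / 2). The MGF is exact, so no sampling needed.
rng = np.random.default_rng(2)
a = rng.standard_normal((100, 8)) * rng.uniform(0.1, 3.0, size=(100, 1))

mgf = np.prod(np.cosh(a), axis=1)            # exact E[exp(a^T x)]
bound = np.exp(np.sum(a**2, axis=1) / 2)     # subgaussian bound
assert np.all(mgf <= bound + 1e-9)
```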
One motivation for our main result comes from the following observations about sums of random vectors. Let a_1, . . . , a_n be vectors in a Euclidean space, and let A = [a_1 | ⋯ | a_n] be the matrix with a_i as its ith column. Consider the squared norm of the random sum

‖x_1 a_1 + x_2 a_2 + ⋯ + x_n a_n‖² = ‖Ax‖².   (1)

Under mild boundedness assumptions on the x_i, the probability that the squared norm in (1) is much larger than its expectation falls off exponentially fast. This can be shown, for instance, using the following proposition by taking u_i := x_i a_i (the proof is standard, but we give it for completeness in Appendix A.1).
Proposition 2. Let u_1, . . . , u_n be a martingale difference vector sequence (i.e., E[u_i | u_1, . . . , u_{i−1}] = 0 for all i = 1, . . . , n) such that ‖u_i‖ ≤ b_i for all i = 1, . . . , n, almost surely. For all t > 0,

Pr[ ‖u_1 + ⋯ + u_n‖ > √(b_1² + ⋯ + b_n²) + √(8 (b_1² + ⋯ + b_n²) t) ] ≤ e^{−t}.
After squaring the quantities in the stated probabilistic event, Proposition 2 (with u_i := x_i a_i and b_i := ‖a_i‖, so that b_1² + ⋯ + b_n² = tr(Σ)) gives the bound

‖Ax‖² ≤ tr(Σ) (1 + √(8t))²

with probability at least 1 − e^{−t} when the x_i are almost surely bounded by 1 (or any constant).
Unfortunately, this bound obtained from Proposition 2 can be suboptimal when the x_i are subgaussian. For instance, if the x_i are Rademacher random variables, so Pr[x_i = 1] = Pr[x_i = −1] = 1/2, then

‖Ax‖² ≤ tr(Σ) + 2√(tr(Σ²) t) + 2‖Σ‖ t   (2)

with probability at least 1 − e^{−t}. A similar result holds for any subgaussian distribution on the x_i (Hanson and Wright, 1971). This is an improvement over the previous bound because the deviation terms (i.e., those involving t) can be significantly smaller, especially for large t.
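The Rademacher case can be checked empirically the same way as the Gaussian case: the tail event ‖Ax‖² > tr(Σ) + 2√(tr(Σ²) t) + 2‖Σ‖ t should have frequency at most e^{−t}. Dimensions, seed, and t below are illustrative choices.

```python
import numpy as np

# Empirical check of the Rademacher tail bound (2): the upper tail of
# ||Ax||^2 beyond tr(S) + 2*sqrt(tr(S^2)*t) + 2*||S||*t should occur
# with probability at most e^{-t}, matching the Gaussian constants.
rng = np.random.default_rng(3)
m, n, trials, t = 4, 8, 20000, 2.0

A = rng.standard_normal((m, n))
S = A.T @ A
bound = (np.trace(S) + 2 * np.sqrt(np.trace(S @ S) * t)
         + 2 * np.linalg.norm(S, 2) * t)

x = rng.choice([-1.0, 1.0], size=(trials, n))  # Rademacher signs
q = np.einsum("ij,ij->i", x @ S, x)            # ||A x_k||^2 per trial
freq = np.mean(q > bound)
assert freq <= np.exp(-t)
```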
In this work, we give a simple proof of (2) with explicit constants that match the analogous bound when the x i are independent standard Gaussian random variables.
Our main theorem, given below, is a generalization of (2).
Theorem 1. Let A ∈ R^{m×n} be a matrix, and let Σ := A⊤A. Suppose that x = (x_1, . . . , x_n) is a random vector such that, for some μ ∈ R^n and σ ≥ 0,

E[ exp(α⊤x) ] ≤ exp( α⊤μ + ‖α‖² σ² / 2 ) for all α ∈ R^n.

For all t > 0,

Pr[ ‖Ax‖² > σ² tr(Σ) + ‖Aμ‖² + 2√( (σ⁴ tr(Σ²) + 2σ² μ⊤Σ²μ) t ) + 2σ² ‖Σ‖ t ] ≤ e^{−t}.   (3)
Remark 1. Note that when μ = 0 and σ = 1, the bound (3) reads

Pr[ ‖Ax‖² > tr(Σ) + 2√(tr(Σ²) t) + 2‖Σ‖ t ] ≤ e^{−t},

which is the same as Proposition 1.
Remark 2. Our proof actually establishes the following upper bound on the moment generating function of ‖Ax‖²: for 0 ≤ η < 1/(2σ²‖Σ‖),

E[ exp(η ‖Ax‖²) ] ≤ E[ exp( √(2η) z⊤Aμ + η σ² ‖A⊤z‖² ) ],

where z is a vector of m independent standard Gaussian random variables.
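In the centered Rademacher case (σ = 1, μ = 0) this moment generating function comparison can be verified exactly for small n: enumerating all 2^n sign vectors gives E[exp(η‖Ax‖²)] exactly, while the Gaussian side evaluates in closed form as ∏_i (1 − 2η ρ_i)^{−1/2} over the eigenvalues ρ_i of AA⊤ (the standard Gaussian quadratic-form MGF). The matrix and η below are arbitrary choices satisfying η < 1/(2‖Σ‖).

```python
import numpy as np
from itertools import product

# Exact check of the MGF comparison (sigma = 1, mu = 0, Rademacher x):
# E[exp(eta*||Ax||^2)] over all 2^n sign vectors, versus the closed-form
# Gaussian side prod_i (1 - 2*eta*rho_i)^(-1/2), rho_i = eigenvalues of A A^T.
rng = np.random.default_rng(4)
m, n = 3, 6
A = rng.standard_normal((m, n)) / np.sqrt(n)
rho = np.linalg.eigvalsh(A @ A.T)
eta = 0.9 / (2 * rho.max())  # any 0 <= eta < 1/(2*||Sigma||) works

lhs = np.mean([np.exp(eta * np.linalg.norm(A @ np.array(s)) ** 2)
               for s in product([-1.0, 1.0], repeat=n)])
rhs = np.prod((1 - 2 * eta * rho) ** -0.5)
assert lhs <= rhs + 1e-12
```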
Proof of Theorem 1. Let z be a vector of m independent standard Gaussian random variables (sampled independently of x). For any α ∈ R^m,

E[ exp(α⊤z) ] = exp( ‖α‖² / 2 ).
Thus, for any λ ∈ R and ε ≥ 0,
Moreover,
Let USV⊤ be a singular value decomposition of A, where U and V are, respectively, matrices of orthonormal left and right singular vectors, and S = diag(√ρ_1, . . . , √ρ_m) is the diagonal matrix of corresponding singular values. Note that

‖A⊤z‖² = ‖S U⊤z‖² = ρ_1 (U⊤z)_1² + ⋯ + ρ_m (U⊤z)_m².
By rotational invariance, y := U ⊤ z is an isotropic multivariate Gaussian random vector with mean zero. Therefore
for 0 ≤ γ < 1/(2‖ρ‖_∞). Combining (4), (5), and (6) gives
where h_1(a) := 1 + a − √(1 + 2a), which has the inverse function h_1^{−1}(b) = b + √(2b) for b ≥ 0.
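The inversion step is easy to confirm: substituting b + √(2b) into h_1 gives 1 + b + √(2b) − √((1 + √(2b))²) = b. A quick numerical check over an arbitrary grid:

```python
import numpy as np

# Verify that h1(a) = 1 + a - sqrt(1 + 2a) and h1_inv(b) = b + sqrt(2b)
# are inverses for b >= 0, as used to invert the Chernoff-type bound.
h1 = lambda a: 1 + a - np.sqrt(1 + 2 * a)
h1_inv = lambda b: b + np.sqrt(2 * b)

b = np.linspace(0.0, 50.0, 1001)
assert np.allclose(h1(h1_inv(b)), b)
```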
The following lemma is a standard estimate of the logarithmic moment generating function of a quadratic form in standard Gaussian random variables, proved much along the lines of the estimate due to Laurent and Massart (2000).
Lemma 1. Let z be a vector of m independent standard Gaussian random variables. Fix any non-negative vector α ∈ R m + and any vector
The right-hand side can be bounded using the inequalities
Example: fixed-design regression with subgaussian noise
We give a simple application of Theorem 1 to fixed-design linear regression with the ordinary least squares estimator. Let x_1, . . . , x_n be fixed design vectors in R^d. Let the responses y_1, . . . , y_n be random variables for which there exists σ > 0 such that

E[ exp( α_1 (y_1 − E[y_1]) + ⋯ + α_n (y_n − E[y_n]) ) ] ≤ exp( (σ²/2) (α_1² + ⋯ + α_n²) )

for any α_1, . . . , α_n ∈ R. This condition is satisfied, for instance, if the y_i are independent and each y_i − E[y_i] is subgaussian with parameter σ.
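A minimal sketch of this application, assuming the special case of independent Gaussian noise y_i = x_i⊤β + ε_i with ε_i ∼ N(0, σ²): the in-sample error ‖X(β̂ − β)‖² equals ‖Pε‖² for the projection P = X(X⊤X)^{−1}X⊤, a quadratic form to which Theorem 1 applies with Σ = P, so tr(Σ) = tr(Σ²) = d and ‖Σ‖ = 1, giving the bound σ²(d + 2√(dt) + 2t) with probability at least 1 − e^{−t}. The dimensions, β, and t below are arbitrary illustrative choices.

```python
import numpy as np

# Fixed-design OLS: check that ||X(beta_hat - beta)||^2 exceeds
# sigma^2 * (d + 2*sqrt(d*t) + 2*t) with frequency at most e^{-t}.
rng = np.random.default_rng(5)
n, d, sigma, t, trials = 50, 5, 1.0, 2.0, 5000

X = rng.standard_normal((n, d))          # fixed design
beta = rng.standard_normal(d)            # true coefficients
bound = sigma**2 * (d + 2 * np.sqrt(d * t) + 2 * t)

errs = np.empty(trials)
for k in range(trials):
    y = X @ beta + sigma * rng.standard_normal(n)
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]  # OLS estimate
    errs[k] = np.linalg.norm(X @ (beta_hat - beta)) ** 2
freq = np.mean(errs > bound)
assert freq <= np.exp(-t)
```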