We show how to compactly represent any $n$-dimensional subspace of $R^m$ as a banded product of Householder reflections using $n(m - n)$ floating point numbers. This is optimal since these subspaces form a Grassmannian manifold $Gr_n(m)$ of dimension $n(m - n)$. The representation is stable and easy to compute: any matrix can be factored into the product of a banded Householder matrix and a square matrix using two to three QR decompositions.
If $m \ge n$, the Householder QR algorithm represents an $m \times n$ orthogonal matrix $U$ as a product of $n$ Householder reflections using a total of $n(m - (n+1)/2)$ floating point numbers [1, chap. 5]. However, in some applications only the range of $U$ is important; any other orthogonal matrix with the same range is equivalent. One example is the hierarchically semiseparable representations of [2], where a tree of orthogonal matrices is used to compress matrices with significant off-diagonal structure. Since the $n$-dimensional subspaces of $R^m$ form a Grassmannian manifold $Gr_n(m)$ of dimension $n(m - n)$, we expect that some orthogonal matrix with the correct range can be represented with $n(m - n)$ floats. The following theorem provides such a representation:
Theorem 1. If $m \ge n$, any $A \in R^{m \times n}$ can be factored as
$$A = GB, \qquad (1)$$
where $B \in R^{n \times n}$ is square and $G$ is a product of $n$ Householder reflections
$$H_i = I - \beta_i v_i v_i^T, \qquad \beta_i = 2/(v_i^T v_i),$$
whose vectors $v_i$ vanish outside the band $i \le j \le i + m - n$.
Since each $v_i$ has $m - n + 1$ nonzero components, the first of which is $1$, $G$ can be stored in $n(m - n)$ floats. If $A$ is full rank, the matrices $G$ and $B$ are unique, although the Householder vectors $v_i$ may not be.
Proof. Observe that (1) is exactly the factored form produced by standard Householder QR except for the trailing zeroes in each $v_i$, which correspond to the extreme lower triangle $i > j + m - n$. To introduce these zeroes, define $A^\diamond$ as $A$ rotated by $180^\circ$ (pronounced “flip $A$”),
$$(A^\diamond)_{ij} = A_{m+1-i,\,n+1-j},$$
and perform an LQ decomposition of $A^\diamond$:
$$A^\diamond = LQ.$$
Flipping back gives $A = L^\diamond Q^\diamond$, where $L^\diamond$ vanishes on the extreme lower triangle. The Householder QR algorithm constructs $v_i$ as a linear combination of $e_i$ and the $i$th column of the matrix (after rotation by the previous Householder reflections), and the first component of each vector can be chosen to be $1$ [1, chap. 5]. Therefore, a Householder QR decomposition
$$L^\diamond = GR$$
will produce $G$ with the correct banded structure. Our final factorization is
$$A = L^\diamond Q^\diamond = G(RQ^\diamond) = GB.$$
The steps are visualized in Figure 1. Uniqueness of $G$ and $B$ follows from the uniqueness of the QR decomposition when $A$ is full rank [1, chap. 5]. Note that $B$ is orthogonal whenever $A$ is orthogonal.
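The construction can be sketched in a few lines of numpy. The sketch below is ours, not the authors' code: it implements the flip, the LQ decomposition (via QR of the transpose), and a Householder QR whose reflection $i$ touches only rows $i$ through $i + m - n$, with each vector normalized so its first component is $1$.

```python
import numpy as np

def flip(A):
    # Rotate a matrix by 180 degrees: flip(A)[i, j] = A[m-1-i, n-1-j].
    return A[::-1, ::-1]

def banded_householder(A):
    """Factor a full-rank A (m x n, m >= n) as in Theorem 1.

    Returns (V, betas, B): column i of V holds the banded Householder
    vector v_i (v_i[i] == 1, zero outside rows i..i+m-n) and
    betas[i] = 2 / (v_i . v_i), so that A = H_1 ... H_n [B; 0]."""
    m, n = A.shape
    band = m - n + 1                       # nonzeros per Householder vector
    # LQ decomposition of flip(A), computed via QR of its transpose.
    Q, R = np.linalg.qr(flip(A).T)         # flip(A) = R.T @ Q.T
    L = flip(R.T)                          # zero on the extreme lower triangle
    Qf = flip(Q.T)                         # flip of the n x n orthogonal factor
    # Householder QR of L; reflection i only touches rows i..i+m-n.
    V = np.zeros((m, n))
    betas = np.zeros(n)
    for i in range(n):
        x = L[i:i + band, i].copy()
        sign = 1.0 if x[0] >= 0 else -1.0
        v = x.copy()
        v[0] += sign * np.linalg.norm(x)   # choose the sign to avoid cancellation
        v /= v[0]                          # normalize so v[0] == 1
        beta = 2.0 / (v @ v)
        L[i:i + band, i:] -= beta * np.outer(v, v @ L[i:i + band, i:])
        V[i:i + band, i] = v
        betas[i] = beta
    B = L[:n, :] @ Qf                      # square factor B = R Q-flip
    return V, betas, B
```

Reapplying the stored reflections to $[B; 0]$ reproduces $A$, which is how the factorization can be checked.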
Since the construction uses only matrix multiplication and Householder QR decomposition as primitives, the computation of $G$ is stable and requires $O(mn^2)$ flops. Normally the scalar factors $\beta_i = 2/(v_i^T v_i)$ will be precomputed and stored for an additional $n$ floats of storage. Once $\beta_i$ is available, a single Householder reflection requires $4(m - n) + 2$ flops, and matrix-vector products $Gx$ or $G^T y$ can be computed in $4n(m - n) + 2n$ flops. If blocking is desired, we can represent products of $b$ Householder reflections, where $b$ is the block size, using the $I - VTV^T$ representation [3], which involves a relatively small increase in storage and flops if $b \ll m - n$.

Unfortunately, the banded Householder representation in Theorem 1 is inefficient for large subspaces of $R^m$, since $Gx$ involves a large number of small level 1 BLAS operations if $m - n$ is small. To remedy this problem, we can represent a large $n$-dimensional subspace in terms of its small $(m - n)$-dimensional orthogonal complement as follows:
Theorem 2. If $m \ge n$, any $A \in R^{m \times n}$ can be factored as
$$A = GB,$$
where $B \in R^{n \times n}$ is square and $G$ is a banded product of $m - n$ Householder reflections with $n + 1$ nonzero components per vector, the first of which is $1$ (equivalent to (1) with $m - n$ and $n$ swapped). In particular, $G$ can also be stored in $n(m - n)$ floats.
Proof. Perform a QR decomposition of $A$ to get
$$A = \begin{pmatrix} U_1 & U_2 \end{pmatrix} \begin{pmatrix} R \\ 0 \end{pmatrix} = U_1 R.$$
Here the column span of $U_1 \in R^{m \times n}$ contains the range of $A$, and the span of $U_2 \in R^{m \times (m - n)}$ is its orthogonal complement. Applying Theorem 1 to $U_2$ yields an orthogonal product $H$ of $m - n$ banded Householder reflections, each vector having $n + 1$ nonzero components, whose leading $m - n$ columns span the range of $U_2$. The remaining $n$ columns of $H$ then span the range of $A$, whence
$$A = GB,$$
with $G$ those columns and $B \in R^{n \times n}$ square.

Figure 2: Frames from an animation of a digital character using blend shapes stored in a hierarchically semiseparable (HSS) representation. Using banded Householder form for the rotations in the HSS tree reduces the storage costs by 45.7% relative to dense storage, or by 29.5% relative to Householder storage.
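The complement argument can be sanity-checked numerically. The sketch below is ours (variable names are hypothetical), and it substitutes a plain full QR for the banded product of Theorem 1: once an orthogonal $H$ has its first $m - n$ columns spanning the orthogonal complement of the range of $A$, its last $n$ columns $G$ satisfy $A = G(G^T A)$.

```python
import numpy as np

# Numerical check of the Theorem 2 argument, using a plain full QR in place
# of the banded product of Theorem 1.
rng = np.random.default_rng(2)
m, n = 6, 4
A = rng.standard_normal((m, n))
U, _ = np.linalg.qr(A, mode='complete')   # full QR of A
U1, U2 = U[:, :n], U[:, n:]               # range(A) and its orthogonal complement
# Any orthogonal H whose first m-n columns span range(U2) will do:
H = np.hstack([U2, U1])
G = H[:, m - n:]                          # last n columns span range(A)
B = G.T @ A                               # the square n x n factor
assert np.allclose(G @ B, A)
```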
Using Theorem 1 for $m - n \ge n$ and Theorem 2 for $m - n < n$, the resulting $G$ consists of at most $m/2$ Householder vectors, each with at least $m/2 + 1$ nonzero components. In particular, the blocked $I - VTV^T$ form is efficient whenever $b \ll m, n$, regardless of the value of $m - n$.
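As an illustration of the matrix-vector cost quoted earlier, a banded product $Gx$ reduces to $n$ small level 1 BLAS operations, each touching $m - n + 1$ entries. The function below is a hedged sketch of ours (the name and storage layout, vectors as columns of an $m \times n$ array with precomputed $\beta_i$, are assumptions, not the paper's code):

```python
import numpy as np

def apply_banded_G(V, betas, x, transpose=False):
    """Apply G = H_1 ... H_n (or G^T) to a vector x, where column i of the
    m x n array V stores the banded vector v_i (nonzero only in rows
    i..i+m-n) and betas[i] = 2 / (v_i . v_i).  Each reflection is a small
    level 1 BLAS operation: 4(m-n) + 2 flops once beta is precomputed."""
    m, n = V.shape
    y = np.array(x, dtype=float)
    # G x applies H_n first; G^T x = H_n ... H_1 x applies H_1 first.
    order = range(n) if transpose else reversed(range(n))
    for i in order:
        s = slice(i, i + m - n + 1)
        v = V[s, i]
        y[s] -= betas[i] * (v @ y[s]) * v
    return y
```

Since each $H_i$ is orthogonal, applying $G$ and then $G^T$ recovers the original vector, which gives a quick correctness check.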
Our motivating application for the banded Householder decomposition is the compression of blend shape matrices for digital characters. We start with a large, mostly dense matrix in which each column represents a pose of the digital character mesh. An example face with 42391 vertices and 730 blend shapes is shown in Figure 2. The original matrix consumes 348 MB of storage in single precision. To reduce this, we compute a lossy hierarchically semiseparable (HSS) representation of the matrix [2], which represents a matrix as a tree of rotations and dense blocks. If the rotations are stored in dense form, the HSS representation requires 46.8 MB of storage. Using Householder form reduces the storage cost to 36.0 MB (77.7% of dense), and banded Householder form reduces the cost further to 25.4 MB (54.3% of dense). On an 8-core Intel Xeon 2.8 GHz machine, the cost to multiply the HSS representation with a vector is 11.2 ms using dense storage with optimized BLAS, and 10.7 ms using banded Householder storage with handwritten, unvectorized C. Since the required memory traffic in t