Counting the Faces of Randomly-Projected Hypercubes and Orthants, with Applications
Let $A$ be an $n$ by $N$ real valued random matrix, and $\h$ denote the $N$-dimensional hypercube. For numerous random matrix ensembles, the expected number of $k$-dimensional faces of the random $n$-dimensional zonotope $A\h$ obeys the formula $E f_…
Authors: David L. Donoho, Jared Tanner
Abstract. Let $A$ be an $n$ by $N$ real valued random matrix, and $H^N$ denote the $N$-dimensional hypercube. For numerous random matrix ensembles, the expected number of $k$-dimensional faces of the random $n$-dimensional zonotope $AH^N$ obeys the formula $E f_k(AH^N)/f_k(H^N) = 1 - P_{N-n,N-k}$, where $P_{N-n,N-k}$ is a fair-coin-tossing probability:
$$P_{N-n,N-k} \equiv \mathrm{Prob}\{N-n-1 \text{ or fewer successes in } N-k-1 \text{ tosses}\}.$$
The formula applies, for example, where the columns of $A$ are drawn i.i.d. from an absolutely continuous symmetric distribution. The formula exploits Wendel's Theorem [19].

Let $\mathbb{R}^N_+$ denote the positive orthant; the expected number of $k$-faces of the random cone $A\mathbb{R}^N_+$ obeys $E f_k(A\mathbb{R}^N_+)/f_k(\mathbb{R}^N_+) = 1 - P_{N-n,N-k}$. The formula applies to numerous matrix ensembles, including those with iid random columns from an absolutely continuous, centrally symmetric distribution.

The probabilities $P_{N-n,N-k}$ change rapidly from nearly 0 to nearly 1 near $k \approx 2n - N$. Consequently, there is an asymptotically sharp threshold in the behavior of face counts of the projected hypercube; thresholds known for projecting the simplex and the cross-polytope occur at very different locations. We briefly consider face counts of the projected orthant when $A$ does not have mean zero; these do behave similarly to those for the projected simplex. We consider non-random projectors of the orthant; the 'best possible' $A$ is the one associated with the first $n$ rows of the Fourier matrix.

These geometric face-counting results have implications for signal processing, information theory, inverse problems, and optimization.
Most of these flow in some way from the fact that face counting is related to conditions for uniqueness of solutions of underdetermined systems of linear equations.

a) A vector in $\mathbb{R}^N_+$ is called $k$-sparse if it has at most $k$ nonzeros. For such a $k$-sparse vector $x_0$, let $b = Ax_0$, where $A$ is a random matrix ensemble covered by our results. With probability $1 - P_{N-n,N-k}$ the inequality-constrained system $Ax = b$, $x \ge 0$ has $x_0$ as its unique nonnegative solution. This is so, even if $n < N$, so that the system $Ax = b$ is underdetermined.

b) A vector in the hypercube $H^N$ will be called $k$-simple if all entries except at most $k$ are at the bounds 0 or 1. For such a $k$-simple vector $x_0$, let $b = Ax_0$, where $A$ is a random matrix ensemble covered by our results. With probability $1 - P_{N-n,N-k}$ the inequality-constrained system $Ax = b$, $x \in H^N$ has $x_0$ as its unique solution in the hypercube.

Date: May 2008.

2000 Mathematics Subject Classification. 52A22, 52B05, 52B11, 52B12, 62E20, 68P30, 68P25, 68W20, 68W40, 94B20, 94B35, 94B65, 94B70.

The authors would like to thank the Isaac Newton Mathematical Institute at Cambridge University for hosting the programme "Statistical Challenges of High Dimensional Data" in 2008, and Professor D.M. Titterington for organizing this programme. DLD acknowledges support from NSF DMS 05-05303 and a Rothschild Visiting Professorship at the University of Cambridge. JT acknowledges support from the Alfred P. Sloan Foundation and thanks John E. and Marva M. Warnock for their generous support in the form of an endowed chair.

Keywords. Zonotope, Random Polytopes, Random Cones, Wendel's Theorem, Threshold Phenomena, Universality, Random Matrices, Compressed Sensing, Unique Solution of Underdetermined Systems of Linear Equations.

1.
Introduction

There are 3 fundamental regular polytopes in $\mathbb{R}^N$, $N \ge 5$: the hypercube $H^N$, the cross-polytope $C^N$, and the simplex $T^{N-1}$. For each of these, projecting the vertices into $\mathbb{R}^n$, $n < N$, yields the vertices of a new polytope; in fact, every polytope in $\mathbb{R}^n$ can be generated by rotating the simplex $T^{N-1}$ and orthogonally projecting on the first $n$ coordinates, for some choice of $N$ and of $N$-dimensional rotation. Similarly, every centro-symmetric polytope can be generated by projecting the cross-polytope, and every zonotope by projecting the hypercube.

1.1. Random polytopes. Choosing the projection $A$ at random has become popular. Let $A$ be an $n \times N$ uniformly distributed random orthogonal projection, obtained by first applying a uniformly-distributed rotation to $\mathbb{R}^N$ and then projecting on the first $n$ coordinates. Let $Q$ be a polytope in $\mathbb{R}^N$. Then $AQ$ is a random polytope in $\mathbb{R}^n$. Taking $Q$ in turn from each of the three families of regular polytopes we get three arenas for scholarly study:
• Random polytopes of the form $AT^{N-1}$ were first studied by Affentranger and Schneider [1] and by Vershik and Sporyshev [18];
• Random polytopes of the form $AC^N$ were first studied extensively by Böröczky and Henk [5];
• The random zonotope $AH^N$ will be heavily studied in this paper; beginnings of a literature on zonotopes can be found in [4, 2].

Such random polytopes can have face lattices undergoing abrupt changes in properties as dimensions change only slightly. In the case of $AT^{N-1}$ and $AC^N$, previous work by the authors [7, 10, 13, 12] documented the following threshold phenomenon. (Our work built on fundamental formulas developed by Affentranger and Schneider [1] and used an asymptotic framework pioneered by Vershik and Sporyshev [18], who pointed to the first such threshold effect.)
Let $f_k(Q)$ denote the number of $k$-dimensional faces of a polyhedron $Q$. It turns out that for large $n$, the number $f_k(AQ)$ of $k$-dimensional faces of $AQ$ might either be approximately equal to $f_k(Q)$ or else significantly smaller, depending on the size of $k$ relative to a threshold depending on the ratio of $n$ to $N$. To make this precise, consider the following proportional-dimensional asymptotic framework. A dimension specifier is a triple of integers $(k, n, N)$, representing a 'face' dimension $k$, a 'small' dimension $n$ and a 'large' dimension $N$; $k < n < N$. For fixed $\delta, \rho \in (0, 1)$, consider sequences of dimension specifiers, indexed by $n$, and obeying
$$k_n/n \to \rho \quad\text{and}\quad n/N_n \to \delta. \tag{1.1}$$
For such sequences the small dimension $n$ is held proportional to the large dimension $N$ as both dimensions grow. We omit subscripts on $k_n$ and $N_n$ when possible.

For $Q = T^{N-1}, C^N$, the papers [7, 10, 13, 12] exhibited thresholds $\rho_W(\delta; Q)$ for the ratio between the expected number of faces of the low-dimensional polytope $AQ$ and the number of faces of the high-dimensional polytope $Q$:
$$\lim_{n \to \infty} \frac{E f_k(AQ)}{f_k(Q)} \;\begin{cases} = 1, & \rho < \rho_W(\delta; Q), \\ < 1, & \rho > \rho_W(\delta; Q). \end{cases} \tag{1.2}$$
(In this relation, we take a limit as $n \to \infty$ along some sequence obeying the proportional-dimensional constraint (1.1).) In words, the random object $AQ$ has roughly as many $k$-faces as its generator $Q$, for $k$ below a threshold; and has noticeably fewer $k$-faces than $Q$, for $k$ above the threshold. The threshold functions are defined in terms of Gaussian integrals and other special functions, and can be calculated numerically.

These phenomena, described here from the viewpoint of combinatorial geometry, have surprising consequences in probability theory, information theory and signal processing; see [8, 11, 13], and Section 5 below.

1.2.
Random Zonotopes. Missing from the above picture is information about the third family of regular polytopes, the hypercube. Böröczky and Henk [5] discussed it in passing, but only considered the asymptotic framework where the small dimension $n$ is held fixed while the large dimension $N \to \infty$. In that framework, the threshold phenomenon is not visible. In this paper, we again consider the proportional-dimensional case (1.1) and prove the following.

Theorem 1.1 ('Weak' Threshold for Hypercube). Let
$$\rho_W(\delta; H^N) := \max(0, 2 - \delta^{-1}). \tag{1.3}$$
For $\rho, \delta$ in $(0, 1)$, consider a sequence of dimension specifiers $(k, n, N)$ obeying (1.1). Let $A$ denote a uniformly-distributed random orthogonal projection from $\mathbb{R}^N$ to $\mathbb{R}^n$. Then
$$\lim_{n \to \infty} \frac{E f_k(AH^N)}{f_k(H^N)} = \begin{cases} 1, & \rho < \rho_W(\delta; H^N), \\ 0, & \rho > \rho_W(\delta; H^N). \end{cases} \tag{1.4}$$

Thus we prove a sharp discontinuity in the behavior of the face lattices of random zonotopes; the location of the threshold is precisely identified. (Such sharpness of the phase transition is also observed empirically for (1.2) above; to our knowledge, a proof of discontinuity has not yet been published in that setting.) Our use of the modifier 'weak' and the subscript $W$ on $\rho$ matches usage in the previous cases $T^{N-1}$ and $C^N$.

Although this result has been stated in the language of combinatorial convexity, as with the earlier results for $AT^{N-1}$ and $AC^N$, there are implications for applied fields including optimization and signal processing; see Section 5 below.

1.3. More General Notion of Random Projection. In fact, Theorem 1.1 is only the tip of the iceberg. The ensemble of random matrices used in that result, the uniformly distributed random orthoprojector, is only one example of a random matrix ensemble for which the conclusion (1.4) holds. As it turns out, what really matters are the statistical properties of the nullspace of $A$.
Definition 1.2 (Orthant-Symmetry). Let $B$ be a random $N - n$ by $N$ matrix such that for each diagonal matrix $S$ with diagonal in $\{-1, 1\}^N$, and for every measurable set $\Omega$,
$$\mathrm{Prob}\{BS \in \Omega\} = \mathrm{Prob}\{B \in \Omega\}.$$
Then we say that $B$ is an orthant-symmetric random matrix. Let $V_B$ be the linear span of the rows of $B$. If $B$ is an orthant-symmetric random matrix we say that $V_B$ is an orthant-symmetric random subspace.

Remark 1.3 (Orthant-Symmetric Ensembles). The following ensembles of random matrices are orthant-symmetric:
• Uniformly-distributed random orthoprojectors from $\mathbb{R}^N$ to $\mathbb{R}^{N-n}$; implicitly this was the example considered earlier.
• Gaussian Ensembles. A random matrix $B$ with entries chosen from a Gaussian zero-mean distribution, i.e. such that the $(N - n) \cdot N$-element vector $\mathrm{vec}(B)$ is $N(0, \Sigma)$ with $\Sigma$ a nondegenerate covariance matrix.
• Symmetric i.i.d. Ensembles. Matrices with entries sampled i.i.d. from a symmetric probability distribution; examples include Gaussian $N(0, 1)$, uniform on $[-1, 1]$, uniform from the set $\{-1, 1\}$, and from the set $\{-1, 0, 1\}$ where $-1$ and $1$ have equal non-zero probability.
• Sign Ensembles. For any fixed generator matrix $B_0$, let the random matrix $B = B_0 S$ where $S$ is a random diagonal matrix with entries drawn uniformly from $\{-1, 1\}$.

New orthant-symmetric ensembles can be created from an existing one by multiplying on the left by an arbitrary random matrix $T$ which is stochastically independent of $B$, and multiplying on the right by a random diagonal matrix $R$ also stochastically independent of $B$ and $T$: thus $B' = TBR$ inherits orthant symmetry from $B$.

Definition 1.4 (General Position). Let $B$ be a random $N - n$ by $N$ matrix such that every subset of $N - n$ columns is almost surely linearly independent. Let $V_B$ be the linear span of the rows of $B$.
We say that $V_B$ is a generic random subspace.

Many orthant-symmetric ensembles from our list create generic row spaces:
• Uniformly-distributed random orthoprojectors;
• Gaussian Ensembles;
• Symmetric iid ensembles having an absolutely continuous distribution;
• Sign Ensembles with generator matrix $B_0$ having its columns in general position.

Define a censored symmetric iid ensemble as a symmetric iid ensemble from which we discard realizations $B$ where the columns happen to be not in general position. Censoring a symmetric iid ensemble made from the Bernoulli $\{-1, +1\}$ coin tossing distribution produces a new random matrix model $\tilde{B}$ whose realizations are in general position with probability one. (The probability of a censoring event is exponentially small in $N$, [17].)

Theorem 1.5 ('Weak' Threshold for Hypercube). Let the random matrix $A$ have a random nullspace which is orthant symmetric and generic. In the proportional-dimensional framework (1.1) the random zonotope $AH^N$ obeys the same conclusion (1.4) as in Theorem 1.1.

In a sense, this theorem extends the conclusion of Theorem 1.1 to vastly more cases. It has been previously observed that some results known for the Goodman-Pollack random orthoprojector model actually extend to other ensembles of random matrices. It was observed for the simplex by Affentranger and Schneider [1], and proven by Baryshnikov and Vitale [3, 2], that face-counting results known for uniformly-distributed random orthoprojectors follow as well for Gaussian iid matrices $A$. Our extension of Theorem 1.1 from orthoprojectors to orthant-symmetric null spaces in Theorem 1.5 follows this program. However, it is a vastly larger extension.

1.4. Random Cone. Convex cones provide another type of fundamental polyhedral set.
Amongst these, the simplest and most natural is the positive orthant $P = \mathbb{R}^N_+$. The image of a cone under projection $A : \mathbb{R}^N \to \mathbb{R}^n$ is again a cone $K = AP$. Typically the cone has $f_0(K) = 1$ vertex (at 0), and $f_1(K) = N$ extreme rays, etc. In fact, every such pointed cone in $\mathbb{R}^n$ can be generated as a projection of the positive orthant, with an appropriate orthogonal projection from an appropriate $\mathbb{R}^N$.

As with the polytope models, surprising threshold phenomena can arise when the projector is random.

Theorem 1.6 ('Weak' Threshold for Orthant). Let $A$ be a random matrix whose nullspace is an orthant-symmetric and generic random subspace. In the proportional-dimensional framework (1.1) we have
$$\lim_{n \to \infty} \frac{E f_k(A\mathbb{R}^N_+)}{f_k(\mathbb{R}^N_+)} = \begin{cases} 1, & \rho < \rho_W(\delta; \mathbb{R}^N_+), \\ 0, & \rho > \rho_W(\delta; \mathbb{R}^N_+), \end{cases} \tag{1.5}$$
with $\rho_W(\delta; \mathbb{R}^N_+) \equiv \rho_W(\delta; H^N)$ as defined in (1.3).

Here the threshold for the orthant is at precisely the same place as it was for the hypercube. Theorem 1.6 is proven in Section 2.3, and there are significant implications in optimization and signal processing briefly discussed in Section 5.

1.5. Exact equality in the number of faces. Our focus in Sections 1.1-1.4 has been on the 'weak' agreement of $E f_k(AQ)$ with $f_k(Q)$; we have seen that in the proportional-dimensional framework, for $\rho$ below threshold $\rho_W(\delta; Q)$, we have limiting relative equality:
$$\frac{E f_k(A\mathbb{R}^N_+)}{f_k(\mathbb{R}^N_+)} \to 1, \quad n \to \infty.$$
We now focus on the 'strong' agreement; it turns out that in the proportional-dimensional framework, for $\rho$ below a somewhat lower threshold $\rho_S(\delta; Q)$, we actually have exact equality with overwhelming probability:
$$\mathrm{Prob}\{f_k(Q) = f_k(AQ)\} \to 1, \quad n \to \infty. \tag{1.6}$$
The existence of such 'strong' thresholds for $Q = T^{N-1}$ and $Q = C^N$ was proven in [7, 10], which exhibited thresholds $\rho_S(\delta; Q)$ below which (1.6) occurs.
These "strong thresholds" and the previously mentioned "weak thresholds" (1.2) are depicted in Figure 3.1. A similar strong threshold also holds for the projected orthant.

Theorem 1.7 ('Strong' Threshold for Orthant). Let
$$H(\gamma) := \gamma \log(1/\gamma) - (1 - \gamma) \log(1 - \gamma) \tag{1.7}$$
denote the usual (base-$e$) Shannon Entropy. Let
$$\psi^{\mathbb{R}_+}_S(\delta, \rho) := H(\delta) + \delta H(\rho) - (1 - \rho\delta) \log 2. \tag{1.8}$$
For $\delta \ge 1/2$, let $\rho_S(\delta; \mathbb{R}^N_+)$ denote the zero crossing of $\psi^{\mathbb{R}_+}_S(\delta, \rho)$. In the proportional-dimensional framework (1.1) with $\rho < \rho_S(\delta; \mathbb{R}^N_+)$,
$$\mathrm{Prob}\{f_k(A\mathbb{R}^N_+) = f_k(\mathbb{R}^N_+)\} \to 1, \quad \text{as } n \to \infty. \tag{1.9}$$

The threshold $\rho_W(\delta; Q)$ for $Q = \mathbb{R}^N_+$ and $H^N$, and $\rho_S(\delta; \mathbb{R}^N_+)$, are depicted in Figure 1.1.

Figure 1.1 (horizontal axis $\delta = n/N$, vertical axis $\rho = k/n$). The 'weak' thresholds, $\rho_W(\delta; H^N)$ and $\rho_W(\delta; \mathbb{R}^N_+)$ (black), and a lower bound on the strong threshold for the positive orthant, $\rho_S(\delta; \mathbb{R}^N_+)$ (blue).

In contrast to the projected simplex, cross-polytope, and orthant, for the hypercube there is no nontrivial regime where a phenomenon like (1.6) can occur.

Lemma 1.8 (Zonotope Vertices). Let $A$ be an $n \times N$ matrix with $n < N$, and let $H^N$ be the $N$-dimensional hypercube. Then
$$f_k(AH^N) < f_k(H^N), \quad k = 0, 1, 2, \dots, n.$$

Proof of Lemma 1.8. In fact, we will show that $AH^N$ always has fewer than $2^N$ vertices. This immediately implies the full result. There exists a $w \in \mathcal{N}(A)$ with $w \ne 0$. $H^N$ has a vertex $x_0$ obeying
$$x_0(i) := \begin{cases} 0, & \mathrm{sgn}(w(i)) > 0, \\ 1, & \text{else.} \end{cases}$$
Let $x_t := x_0 + tw$ with $t > 0$. For $t$ sufficiently small $x_t$ is in the interior of $H^N$, and by construction $Ax_0 = Ax_t$. Invoking Lemma 2.5, $Ax_0$ is not a vertex of $AH^N$, and $f_0(AH^N) < f_0(H^N)$.
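The construction in the proof of Lemma 1.8 can be checked by hand on a small case. The following Python sketch is only an illustration: the $2 \times 3$ matrix `A`, the null vector `w`, the step `t`, and the helper names `matvec` and `lost_vertex` are hypothetical choices made here, not objects taken from the paper.

```python
# Minimal sketch of the construction in the proof of Lemma 1.8.
# The matrix A, the vector w, and all function names are illustrative assumptions.

def matvec(A, x):
    """Plain matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def lost_vertex(w):
    """Hypercube vertex x0 with x0[i] = 0 where w[i] > 0, and 1 elsewhere."""
    return [0.0 if wi > 0 else 1.0 for wi in w]

A = [[1.0, 0.0, 1.0],
     [0.0, 1.0, -1.0]]        # n = 2, N = 3
w = [1.0, -1.0, -1.0]         # a nonzero vector in the nullspace of A

x0 = lost_vertex(w)           # vertex of the unit cube chosen from the signs of w
t = 0.25                      # a small positive step
xt = [x + t * wi for x, wi in zip(x0, w)]   # x_t = x_0 + t * w

print("A w   =", matvec(A, w))
print("x0    =", x0, " x_t =", xt)
print("A x0  =", matvec(A, x0), " A x_t =", matvec(A, xt))
```

In this example every entry of `w` is nonzero, so `xt` has all coordinates strictly between 0 and 1 while `A` maps `x0` and `xt` to the same point.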
Although this proof only highlights a single vertex of $H^N$ whose image is interior to $AH^N$, it is clear from its construction that there are typically many such lost vertices. Theorem 1.7 is proven in Section 2.5.

1.6. Exact Non-Asymptotic Results. We have so far exclusively used the Vershik-Sporyshev proportional-dimensional asymptotic framework; this makes for the most natural comparisons between results for the three families of regular polytopes. However, for the positive orthant and hypercube, something truly remarkable happens: there is a simple exact expression for finite $N$ which connects to a beautiful result in geometric probability.

Theorem 1.9 (Wendel, [19]). Let $M$ points in $\mathbb{R}^m$ be drawn i.i.d. from a centro-symmetric distribution such that the points are in general position. Then the probability that all the points fall in some half space is
$$P_{m,M} = 2^{-M+1} \sum_{\ell=0}^{m-1} \binom{M-1}{\ell}. \tag{1.10}$$

This elegant result is often presented as simply a piece of recreational mathematics. In our setting, it turns out to be truly powerful, because of the following identity.

Theorem 1.10. Let $A$ be an $n \times N$ random matrix with an orthant-symmetric and generic random nullspace. Then
$$\frac{E f_k(A\mathbb{R}^N_+)}{f_k(\mathbb{R}^N_+)} = 1 - P_{N-n,N-k}. \tag{1.11}$$

Symmetry implies a similar identity for the hypercube:

Theorem 1.11. Let $A$ be a random matrix with an orthant-symmetric and generic random nullspace. Then
$$\frac{E f_k(AH^N)}{f_k(H^N)} = 1 - P_{N-n,N-k}. \tag{1.12}$$

These formulae are not at all asymptotic or approximate. But all the earlier asymptotic results derive from them. Theorem 1.10 is proven in Section 2.1, and the symmetry argument for Theorem 1.11 is formalized in Lemma 2.6 and proven in Section 6.3.

1.7. Contents.
Theorem 1.6 is proven in Section 2.3, Theorems 1.1 and 1.5 are proven in Section 2.4, and Theorem 1.7 is proven in Section 2.5; each using the classical Wendel's Theorem [19], Theorem 1.9. Their relationships with existing results in convex geometry and matroid theory are discussed in Section 3; the implications of these results for information theory, signal processing, and optimization are briefly discussed in Section 5.

2. Proof of main results

Our plan is to start with the key non-asymptotic exact identity (1.11) and then derive from it Theorem 1.6 by asymptotic analysis of the probabilities in Wendel's Theorem. We then infer Theorem 1.5, and later Theorem 1.7 follows in Section 2.5.

2.1. Proof of Theorem 1.10. Here and below we follow the convention that, if we don't give the proof of a lemma or corollary immediately following its statement, then the proof can be found in Section 6.

Our proof of the key formula (1.11) starts with the following observation on the expected number of $k$-faces of $A\mathbb{R}^N_+$:
$$\frac{E f_k(A\mathbb{R}^N_+)}{f_k(\mathbb{R}^N_+)} = \mathrm{Ave}_F \, \mathrm{Prob}\{AF \text{ is a } k\text{-face of } A\mathbb{R}^N_+\}. \tag{2.1}$$
Here $\mathrm{Ave}_F$ denotes the arithmetic mean over all $k$-faces of $\mathbb{R}^N_+$.

Because of (2.1) we will be implicitly averaging across faces below. As a calculation device we suppose that all faces are statistically equivalent; this allows us to study one $k$-face, and yet compute the average across all $k$-faces.

Definition 2.1 (Exchangeable columns). Let $A$ be a random $n$ by $N$ matrix such that for each permutation matrix $\Pi$, and for every measurable set $\Omega$,
$$\mathrm{Prob}\{A \in \Omega\} = \mathrm{Prob}\{A\Pi \in \Omega\}.$$
Then we say that $A$ has exchangeable columns.

Below we assume without loss of generality that $A$ has exchangeable columns. Then (2.1) becomes: let $F$ be a fixed $k$-face of $\mathbb{R}^N_+$; then
$$\frac{E f_k(A\mathbb{R}^N_+)}{f_k(\mathbb{R}^N_+)} = \mathrm{Prob}\{AF \text{ is a } k\text{-face of } A\mathbb{R}^N_+\}. \tag{2.2}$$
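As an aside, the exact identities that this section sets out to prove are easy to evaluate for finite dimensions. The sketch below computes Wendel's probability (1.10) with exact rational arithmetic and then the ratio $1 - P_{N-n,N-k}$, the form stated in the abstract and in (1.12). The function names `wendel_probability` and `expected_face_ratio` and the sample dimensions are assumptions introduced here for illustration only.

```python
from fractions import Fraction
from math import comb

def wendel_probability(m, M):
    """P_{m,M} of (1.10): 2^(-M+1) * sum_{l=0}^{m-1} C(M-1, l), as an exact fraction."""
    if M < 1 or m < 0:
        raise ValueError("need M >= 1 and m >= 0")
    total = sum(comb(M - 1, l) for l in range(m))   # comb returns 0 when l > M-1
    return Fraction(total, 2 ** (M - 1))

def expected_face_ratio(k, n, N):
    """1 - P_{N-n, N-k}: expected fraction of k-faces surviving the projection."""
    if not (0 <= k < n < N):
        raise ValueError("need 0 <= k < n < N")
    return 1 - wendel_probability(N - n, N - k)

# Illustrative dimensions (an assumption): N = 200, n = 150, so 2n - N = 100.
N, n = 200, 150
for k in (50, 90, 100, 110, 140):
    print(k, float(expected_face_ratio(k, n, N)))
```

With these sample dimensions the ratio equals exactly 1/2 at $k = 2n - N$, since $P_{50,100}$ is the probability of at most 49 heads in 99 fair tosses; it is close to 1 for $k$ well below that value and close to 0 for $k$ well above it.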
Let $P$ be a polytope in $\mathbb{R}^N$ and $x_0 \in P$. The vector $v$ is a feasible direction for $P$ at $x_0$ if $x_0 + tv \in P$ for all sufficiently small $t > 0$. Let $\mathrm{Feas}_{x_0}(P)$ denote the set of all feasible directions for $P$ at $x_0$.

Lemma 2.2. Let $x_0$ be a vector in $\mathbb{R}^N_+$ with exactly $k$ nonzeros. Let $F$ denote the associated $k$-face of $\mathbb{R}^N_+$. For an $n \times N$ matrix $A$, let $AF$ denote the image of $F$ under $A$. The following are equivalent:
(Survive($A, F, \mathbb{R}^N_+$)): $AF$ is a $k$-face of $A\mathbb{R}^N_+$,
(Transverse($A, x_0, \mathbb{R}^N_+$)): $\mathcal{N}(A) \cap \mathrm{Feas}_{x_0}(\mathbb{R}^N_+) = \{0\}$.

We now develop the connections to the probabilities in Wendel's theorem.

Lemma 2.3. Let $x_0 \in \mathbb{R}^N_+$ have $k$ nonzeros. Let $A$ be $n \times N$ with $n < N$ and have an orthant-symmetric null space with exchangeable columns. Then
$$\mathrm{Prob}\{(\mathrm{Transverse}(A, x_0, \mathbb{R}^N_+)) \text{ holds}\} = 1 - P_{N-n,N-k}.$$

Proof. Exchangeability of the columns implies that $\mathrm{Prob}\{(\mathrm{Transverse}(A, x_0, \mathbb{R}^N_+)) \text{ holds}\}$ does not depend on $x_0$, but only on the number of nonzeros in $x_0$ and the size of $A$. Therefore, let $k$ be the number of nonzeros in $x_0$, and set
$$\pi_{k,n,N} \equiv \mathrm{Prob}\{(\mathrm{Transverse}(A, x_0, \mathbb{R}^N_+)) \text{ holds}\}.$$
The matrix $A$ has its columns in general position. Therefore we may construct a basis $b_i$ for its null space, $\mathcal{N}(A)$, having exactly $N - n$ basis vectors. The $N$ by $N - n$ matrix $B^T$ having the $b_i$ for its columns generates every vector $w$ in $\mathcal{N}(A)$ via a product of the form $w = B^T c$, where $c \in \mathbb{R}^{N-n}$.

Without loss of generality, suppose the nonzeros of $x_0$ are in positions $i = N - k + 1, \dots, N$. Then
$$\mathrm{Feas}_{x_0}(\mathbb{R}^N_+) = \{v : v_1, \dots, v_{N-k} \ge 0\}.$$
Condition (Transverse($A, x_0, \mathbb{R}^N_+$)) can be restated as

(Ineq) The only vector $c$ satisfying
$$(B^T c)_i \ge 0, \quad i = 1, \dots, N - k, \tag{2.3}$$
is the vector $c = 0$.

Suppose the contrary to (Ineq), i.e. suppose there is a $c \ne 0$ solving (2.3). Let now $\beta_i$ denote the $i$-th row of $B^T$, with $i = 1, \dots, N - k$. Then (2.3) is the same as
$$\beta_i \cdot c \ge 0, \quad i = 1, \dots, N - k.$$
Geometrically, this says that each vector $\beta_i$, $i = 1, \dots, N - k$, falls in the half-space $\{\beta : \beta \cdot c \ge 0\}$. Here $c$ is some fixed but arbitrary nonzero vector. Thus the event $\{$(Ineq) does not hold$\}$ is equivalent to the event that all the vectors $\beta_i$ with $i = 1, \dots, N - k$ fall in some half-space of $\mathbb{R}^{N-n}$.

By our hypothesis, the vectors $\beta_i$ with $i = 1, \dots, N - k$ are drawn i.i.d. from a centrosymmetric distribution and are in general position. We now invoke Wendel's Theorem, and it follows that $\pi_{k,n,N} = 1 - P_{N-n,N-k}$.

2.2. Some Generalities about Binomial Probabilities. The probability $P_{m,M}$ in Wendel's theorem has a classical interpretation: it gives the probability of at most $m - 1$ heads in $M - 1$ tosses of a fair coin. The usual Normal approximation to the binomial tells us that
$$P_{m,M} \approx \Phi\left( \frac{(m - 1) - (M - 1)/2}{\sqrt{(M - 1)/4}} \right),$$
with $\Phi$ the usual standard normal distribution function $\Phi(x) = \int_{-\infty}^{x} e^{-y^2/2} \, dy / \sqrt{2\pi}$; here the approximation symbol $\approx$ can be made precise using standard limit theorems, e.g. appropriate for small or large deviations. In this expression, the approximating normal has mean $(M - 1)/2$ and standard deviation $\sqrt{(M - 1)/4}$. There are three regimes of interest, for large $m$, $M$, and three behaviors for $P_{m,M}$:
• Lower Tail: $m \ll M/2 - \sqrt{M/4}$. $P_{m,M} \approx 0$.
• Middle: $m \approx M/2$. $P_{m,M} \in (0, 1)$.
• Upper Tail: $m \gg M/2 + \sqrt{M/4}$. $P_{m,M} \approx 1$.

2.3. Proof of Theorem 1.6.
Using the correspondence $N - n \leftrightarrow m$, $N - k \leftrightarrow M$, and the connection to Wendel's theorem, we have three regimes of interest:
• $N - n \ll (N - k)/2$
• $N - n \approx (N - k)/2$
• $N - n \gg (N - k)/2$

In the proportional-dimensional framework, the above discussion translates into three separate regimes, and separate behaviors we expect to be true:
• Case 1: $\rho < \rho_W(\delta; H^N)$. $P_{N_n - n, N_n - k_n} \to 0$.
• Case 2: $\rho = \rho_W(\delta; H^N)$. $P_{N_n - n, N_n - k_n} \in (0, 1)$.
• Case 3: $\rho > \rho_W(\delta; H^N)$. $P_{N_n - n, N_n - k_n} \to 1$.

Case 2 is trivially true, but it has no role in the statement of Theorem 1.6. Cases 1 and 3 correspond exactly to the two parts of (1.5) that we must prove.

To prove Cases 1 and 3, we need an upper bound deriving from standard large-deviations analysis of the lower tail of the binomial.

Lemma 2.4. Let $N - n < (N - k)/2$. Then
$$P_{N-n,N-k} \le n^{3/2} \exp\left( N \psi^{\mathbb{R}_+}_W\left( \frac{n}{N}, \frac{k}{n} \right) \right) \tag{2.4}$$
where the exponent is defined as
$$\psi^{\mathbb{R}_+}_W(\delta, \rho) := H(\delta) + \delta H(\rho) - H(\rho\delta) - (1 - \rho\delta) \log 2 \tag{2.5}$$
with $H(\cdot)$ the Shannon Entropy (1.7).

Proof. Upper bounding the sum in $P_{N-n,N-k}$ by $N - n - 1$ times $\binom{N-k-1}{N-n}$ we arrive at
$$P_{N-n,N-k} \le 2^{-(N-k-1)} \, \frac{n - k}{N - k} \cdot (N - k + 1) \binom{N}{n} \binom{n}{k} \binom{N}{k}^{-1}. \tag{2.6}$$
We can bound $\binom{m}{\gamma \cdot m}$ for $\gamma < 1$ using the Shannon entropy (1.7):
$$c_1 n^{-1/2} e^{mH(\gamma)} \le \binom{m}{\gamma \cdot m} \le c_2 e^{mH(\gamma)} \tag{2.7}$$
where $c_1 := \frac{16}{25} \sqrt{2/\pi}$ and $c_2 := 5/(4\sqrt{2\pi})$. Recalling the definition of $\psi^{\mathbb{R}_+}_W$, we obtain (2.4).

We will now consider Cases 1 and 3, and prove the corresponding conclusion.

Case 1: $\rho < \rho_W(\delta; \mathbb{R}^N_+)$. The threshold function $\rho_W(\delta; \mathbb{R}^N_+)$ is defined as the zero level curve $\psi^{\mathbb{R}_+}_W(\delta, \rho_W(\delta; \mathbb{R}^N_+)) = 0$; thus for any $\rho$ strictly below $\rho_W(\delta; \mathbb{R}^N_+)$, the exponent $\psi^{\mathbb{R}_+}_W(\delta, \rho)$ is strictly negative. Lemma 2.4 thus implies that $P_{N_n - n, N_n - k_n} \to 0$ as $n \to \infty$.
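The statement that the zero level curve of the exponent (2.5) sits at the threshold (1.3) can be checked numerically. The following sketch evaluates $H$, $\psi^{\mathbb{R}_+}_W$ and $\rho_W$ at a few points; the function names `shannon_entropy`, `psi_weak`, `rho_weak` and the sample values of $\delta$ and $\rho$ are assumptions made here for illustration.

```python
from math import log

def shannon_entropy(g):
    """Base-e Shannon entropy H(g) of (1.7), evaluated here only for 0 < g < 1."""
    if not 0.0 < g < 1.0:
        raise ValueError("this sketch evaluates H only for 0 < g < 1")
    return -g * log(g) - (1.0 - g) * log(1.0 - g)

def psi_weak(delta, rho):
    """Exponent (2.5): H(delta) + delta*H(rho) - H(rho*delta) - (1 - rho*delta)*log 2."""
    rd = rho * delta
    return (shannon_entropy(delta) + delta * shannon_entropy(rho)
            - shannon_entropy(rd) - (1.0 - rd) * log(2.0))

def rho_weak(delta):
    """Weak threshold (1.3): max(0, 2 - 1/delta)."""
    return max(0.0, 2.0 - 1.0 / delta)

# Sample values of delta above 1/2, where the threshold is strictly positive.
for delta in (0.6, 0.75, 0.9):
    r = rho_weak(delta)
    # Print the exponent on the threshold curve and at half the threshold.
    print(delta, r, psi_weak(delta, r), psi_weak(delta, 0.5 * r))
```

On the threshold curve the exponent vanishes up to rounding error, and strictly below it the exponent is negative, which is the property used in Case 1.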
Figure 2.1 (panel title: weak exponent $\psi^{H}_W$; horizontal axis $\delta = n/N$, vertical axis $\rho = k/n$). Exponent for the weak phase transition, $\psi^{\mathbb{R}_+}_W(\delta, \rho)$, (2.5), which has its zero level curve at $\rho_W(\delta; \mathbb{R}^N_+)$, equation (1.3). The projected hypercube has the same weak phase transition and exponent $\psi^{H}_W \equiv \psi^{\mathbb{R}_+}_W$.

Case 3: $\rho > \rho_W(\delta; \mathbb{R}^N_+)$. Binomial probabilities have a standard symmetry (relabel every 'head' outcome as a 'tail', and vice versa). It follows that $P_{m,M} = 1 - P_{M-m,M}$. We have $P_{N-n,N-k} = 1 - P_{n-k,N-k}$. In this case $N - n > (N - k)/2$, that is $n - k < (N - k)/2$, so Lemma 2.4 tells us that $P_{n-k,N-k} \to 0$ as $n \to \infty$; we conclude $P_{N-n,N-k} \to 1$ as $n \to \infty$.

2.4. Proofs of Theorems 1.1 and 1.5. We derive the exact non-asymptotic result Theorem 1.11 from Theorem 1.10 by symmetry. The limit results in Theorems 1.1 and 1.5 follow immediately from the asymptotic analysis of Section 2.3.

We begin as before, relating face counts to probabilities of survival:
$$\frac{E f_k(AH^N)}{f_k(H^N)} = \mathrm{Ave}_F \, \mathrm{Prob}\{AF \text{ is a } k\text{-face of } AH^N\}. \tag{2.8}$$
Here $\mathrm{Ave}_F$ denotes the average over $k$-faces of $H^N$. As before, we assume exchangeable columns as a calculation device, allowing us to focus on one $k$-face, but compute the average. Under exchangeability, for any fixed $k$-face $F$,
$$\frac{E f_k(AH^N)}{f_k(H^N)} = \mathrm{Prob}\{AF \text{ is a } k\text{-face of } AH^N\}. \tag{2.9}$$
We also again reformulate matters in terms of transversal intersection.

Lemma 2.5. Let $x_0$ be a vector in $H^N$ with exactly $k$ nonzeros. Let $F$ denote the associated $k$-face of $H^N$. For an $n \times N$ matrix $A$ the following are equivalent:
(Survive($A, F, H^N$)): $AF$ is a $k$-face of $AH^N$,
(Transverse($A, x_0, H^N$)): $\mathcal{N}(A) \cap \mathrm{Feas}_{x_0}(H^N) = \{0\}$.

We next connect the hypercube to the positive orthant.
Informally, the point is that the positive orthant in some sense shares faces with the "lower faces" of the hypercube. Formally, let $x_0$ be a vector having $x_0(i) = 0$, $1 \le i \le N - k - 1$, and $x_0(i) = 1/2$, $N - k \le i \le N$. Then $x_0$ belongs to both $H^N$ and $\mathbb{R}^N_+$. It makes sense to define the two cones $\mathrm{Feas}_{x_0}(H^N)$ and $\mathrm{Feas}_{x_0}(\mathbb{R}^N_+)$ for this specific point $x_0$, and we immediately see
$$\mathrm{Feas}_{x_0}(H^N) = \mathrm{Feas}_{x_0}(\mathbb{R}^N_+).$$
In fact this equality holds for all points in the relative interior of the $k$-face of $H^N$ containing $x_0$. We conclude:

Lemma 2.6. Let $F_{k,H}$ be the $k$-dimensional face of $H^N$ consisting of all vectors $x$ with $x(i) = 0$, $1 \le i \le N - k - 1$, and $0 \le x(i) \le 1$, $N - k \le i \le N$. Let $F_{k,\mathbb{R}_+}$ be the $k$-dimensional face of $\mathbb{R}^N_+$ consisting of all vectors $x$ with $x(i) = 0$, $1 \le i \le N - k - 1$, and $0 \le x(i)$, $N - k \le i \le N$. Then
$$\mathrm{Prob}\{AF_{k,H} \text{ is a } k\text{-face of } AH^N\} = \mathrm{Prob}\{AF_{k,\mathbb{R}_+} \text{ is a } k\text{-face of } A\mathbb{R}^N_+\}. \tag{2.10}$$

Combining (2.8) and Lemma 2.6 we obtain the non-asymptotic Theorem 1.11 from the corresponding non-asymptotic result for the positive orthant.

2.5. Proof of Theorem 1.7. By Lemma 2.3, $P_{N-n,N-k}$ is the probability that one fixed $k$-dimensional face $F$ of $\mathbb{R}^N_+$ fails to generate a $k$-face $AF$ of $A\mathbb{R}^N_+$. The probability that some $k$-dimensional face fails to generate a $k$-face can be upper bounded, using Boole's inequality, by $f_k(\mathbb{R}^N_+) \cdot P_{N-n,N-k}$. From (2.7), (2.4), and $f_k(\mathbb{R}^N_+) = \binom{N}{k}$ we have
$$f_k(\mathbb{R}^N_+) \cdot P_{N-n,N-k} \le n^{3/2} \exp\left( N \psi^{\mathbb{R}_+}_S(\delta_n, \rho_n) \right)$$
where $\psi^{\mathbb{R}_+}_S$ was defined earlier in (1.8), as
$$\psi^{\mathbb{R}_+}_S(\delta, \rho) := H(\delta) + \delta H(\rho) - (1 - \rho\delta) \log 2. \tag{2.11}$$
Recall that for $\delta \ge 1/2$, $\rho_S(\delta; \mathbb{R}^N_+)$ is the zero crossing of $\psi^{\mathbb{R}_+}_S$. For any $\rho < \rho_S(\delta; \mathbb{R}^N_+)$ we have $\psi^{\mathbb{R}_+}_S(\delta, \rho) < 0$ and as a result (1.9) follows.

3.
Contrasting the Hypercube with Other Polytopes

The theorems in Section 1 contrast strongly with existing results for other polytopes.

3.1. Non-Existence of Weak Thresholds at $\delta < 1/2$. Theorem 1.5 identifies a region of $(\frac{n}{N}, \frac{k}{N})$ where the typical random zonotope has nearly as many $k$-faces as its generating hypercube; in particular, if $n < N/2$, it has many fewer $k$-faces than the hypercube, for every $k$. This behavior at $n/N < 1/2$ is quite different from the behavior of typical random projections of the simplex and the cross-polytope. Those polytopes have $f_k(AQ) \approx f_k(Q)$ for quite a large range of $k$ even at relatively small values of $k/n$, [13]; see Figure 3.1.

Figure 3.1 (horizontal axis $\delta = n/N$, vertical axis $\rho = k/n$). Weak thresholds for the simplex, $\rho_W(\delta; T^{N-1})$ (black-dash), and cross-polytope, $\rho_W(\delta; C^N)$ (black-solid). Consider sequences obeying the proportional-dimensional asymptotic with parameters $\delta$, $\rho$. For $(\delta, \rho)$ below these curves, and for large $n$, each projected polytope has nearly as many $k$-faces as its generator; above these curves the projected polytope has noticeably fewer. Strong thresholds for the simplex, $\rho_S(\delta; T^{N-1})$ (blue-dash), and cross-polytope, $\rho_S(\delta; C^N)$ (blue-solid). For $(\delta, \rho)$ below these curves, and for large $n$, each projected polytope and its generator typically have exactly the same number of $k$-faces.

3.2. Non-Existence of Strong Thresholds for Hypercube. Lemma 1.8 shows that projected zonotopes always have strictly fewer $k$-faces than their generators, $f_k(AH^N) < f_k(H^N)$, for every $n < N$. This is again quite different from the situation with the simplex and the cross-polytope, where we can even have $n \ll N$ and still find $k$ for which $f_k(AQ) = f_k(Q)$, [13]; see Figure 3.1.

3.3.
Universalit y of w eak phase transitions. F or Theo rems 1.1 and 1.5, A ca n be sampled from a n y ensemble of random matrice s having an or than t-symmetric and gener ic rando m null s pa ce. Our r esult is thus universal acr oss a wide cla s s of matrix ensembles. In pr oving w eak a nd strong thresho ld results for the simplex a nd cross- poly tope, we requir ed A to either b e a random ortho-pro jector or to hav e Gaussian iid en tries. Thu s, what w e prov ed for those families of regula r p olyto p es applies to a muc h mo re limited ra nge of matrix ensembles than what has now been prov en for h yp ercub es. Our empir ical studies s uggest that the same ensembles of ma trices which ‘work’ for the hypercub e weak threshold also ‘work’ for the simplex and cr o ss-p olytop e F ACES OF THE RANDOML Y-P R OJECTED HYPERCUBE AND OR THANT 13 thresholds. It seems to us that the universality across matrix ensem bles prov en here may point to a m uch larger phenomenon, v alid also for other p olytop e families. F or our empir ical s tudies s ee [1 4]. In fact, even in the hypercub e case , the weak threshold phenomenon may b e more g eneral than wha t can b e prov en to day; it seems also to hold for some matrix ensembles that may not hav e an or tha n t-symmetric n ull space. 4. Contrasting the Cone with the Hypercube The weak Cone thresho ld depends very muc h more delicately on details ab out A than do the hypercub e thresholds; it really makes a difference to the r esults if the matrix A is not ‘zer o-mean’. 4.1. The Lo w-F requency Partial F ourier Matrix. Co nsider the sp ecial partial F o ur ier matrix made only of the n low est frequency entries. Cor ol lary 4.1 . Assume n is o dd and let (4.1) Ω ij = cos π ( j − 1)( i − 1) N i = 1 , 3 , 5 , . . . , n sin π ( j − 1) i N i = 2 , 4 , 6 , . . . , n − 1 . Then f k (Ω R N + ) = f k ( R N + ) , k = 0 , 1 , . . . , 1 2 ( n − 1) . 
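As an illustrative sketch only (not part of the original argument), the matrix of (4.1) can be tabulated in plain Python. The function name `low_frequency_fourier` and the example sizes $n = 5$, $N = 12$ are assumptions made for this illustration; the row and column indices are kept 1-based to match (4.1).

```python
import math

def low_frequency_fourier(n, N):
    """Return the n-by-N matrix of (4.1) as a list of rows; n must be odd."""
    if n % 2 != 1:
        raise ValueError("Corollary 4.1 assumes n is odd")
    rows = []
    for i in range(1, n + 1):                      # 1-based row index i
        if i % 2 == 1:                             # odd rows are cosines
            row = [math.cos(math.pi * (j - 1) * (i - 1) / N)
                   for j in range(1, N + 1)]
        else:                                      # even rows are sines
            row = [math.sin(math.pi * (j - 1) * i / N)
                   for j in range(1, N + 1)]
        rows.append(row)
    return rows

# Hypothetical small example: n = 5 rows, N = 12 columns.
omega = low_frequency_fourier(5, 12)
```

In this sketch the first row (i = 1) is identically one, since it is the cosine at frequency zero, and rows i = 2 and i = 3 are the sine and cosine at the same frequency.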
This behavior is dramatically different than the case for random $A$ of the type considered so far, and in some sense dramatically better.

Corollary 4.1 is closely connected with the classical question of neighborliness. There are famous polytopes which can be generated by projections $AT^{N-1}$ and have exactly as many $k$-faces as $T^{N-1}$ for $k \le \lfloor n/2 \rfloor$. A standard example is provided by the matrix $\Omega$ defined in (4.1); it obeys $f_k(\Omega T^{N-1}) = f_k(T^{N-1})$, $0 \le k \le \lfloor n/2 \rfloor$. (There is a vast literature touching in some way on the phenomenon $f_k(\Omega T^{N-1}) = f_k(T^{N-1})$. In that literature, the polytope $\Omega T^{N-1}$ is usually called a cyclic polytope, and the columns of $\Omega$ are called points of the trigonometric moment curve; see standard references [16, 20].)

Hence the matrix $\Omega$ offers both $f_k(\Omega T^{N-1}) = f_k(T^{N-1})$ and $f_k(\Omega \mathbb{R}^N_+) = f_k(\mathbb{R}^N_+)$ for $0 \le k \le \lfloor n/2 \rfloor$. This is exceptional. For random $A$ of the type discussed in earlier sections, there is a large disparity between the sets of triples $(k, n, N)$ where $f_k(AT^{N-1}) = f_k(T^{N-1})$ (this happens for $k/n < \rho_S(n/N; T^{N-1})$) and those where $f_k(A\mathbb{R}^N_+) = f_k(\mathbb{R}^N_+)$ (this happens for $k/n < \rho_S(n/N; \mathbb{R}^N_+)$). These two strong thresholds are displayed in Figures 3.1 and 1.1 respectively. Even if we relax our notion of agreement of face counts to weak agreement, the collections of triples where $f_k(AT^{N-1}) \approx f_k(T^{N-1})$ and $f_k(A\mathbb{R}^N_+) \approx f_k(\mathbb{R}^N_+)$ are very different, because the two curves $\rho_W(n/N; T^{N-1})$ and $\rho_W(n/N; \mathbb{R}^N_+)$ are so dramatically different, particularly at $n < N/2$.

4.2. Adjoining a Row of Ones to $A$. An important feature of the random matrices $A$ studied earlier is that their random nullspace is orthant symmetric. In particular, the positive orthant plays no distinguished role with respect to these matrices.
On the other hand, the partial Fourier matrix $\Omega$ constructed in the last subsection contains a row of ones, and thus the positive orthant has a distinguished role to play for this matrix. Moreover, this distinction is crucial; we find empirically that removing the row of ones from $\Omega$ causes the conclusion of Corollary 4.1 to fail drastically.

Conversely, consider the matrix $\tilde{A}$ obtained by adjoining a row of $N$ ones to some matrix $A$:

$\tilde{A} = \begin{bmatrix} \mathbf{1} \\ A \end{bmatrix}$.

Adding this row of ones to a random matrix causes a drastic shift in the strong and weak thresholds. The following is proved in Section 6.

Theorem 4.1. Consider the proportional-dimensional asymptotic with parameters $\delta, \rho$ in $(0, 1)$. Let the random $n-1$ by $N$ matrix $A$ have iid standard normal entries. Let $\tilde{A}$ denote the corresponding $n$ by $N$ matrix whose first row is all ones and whose remaining rows are identical to those of $A$. Then

(4.2)  $\lim_{n \to \infty} \frac{E f_k(\tilde{A}\mathbb{R}^N_+)}{f_k(\mathbb{R}^N_+)} \begin{cases} = 1, & \rho < \rho_W(\delta, T^{N-1}), \\ < 1, & \rho > \rho_W(\delta, T^{N-1}). \end{cases}$

(4.3)  $\lim_{n \to \infty} P\{ f_k(\tilde{A}\mathbb{R}^N_+) = f_k(\mathbb{R}^N_+) \} = \begin{cases} 1, & \rho < \rho_S(\delta, T^{N-1}), \\ 0, & \rho > \rho_S(\delta, T^{N-1}). \end{cases}$

Note particularly the mixed form of this relationship. Although the conclusions concern the behavior of faces of the randomly-projected orthant, the thresholds are those that were previously obtained for the randomly-projected simplex. Since there is such a dramatic difference between $\rho(\delta, T^{N-1})$ and $\rho(\delta, \mathbb{R}^N_+)$, the single row of ones can fairly be said to have a huge effect. In particular, the region 'below' the simplex weak phase transition $\rho_W(\delta, T^{N-1})$ comprises $\approx 0.5634$ of the $(\delta, \rho)$ parameter area, and the region 'below' the hypercube weak phase transition $\rho_W(\delta, H^N)$ comprises $1 - \log 2 \approx 0.3069$.

5. Application: Compressed Sensing

Our face counting results can all be reinterpreted as statements about "simple" solutions of underdetermined systems of linear equations. This reinterpretation allows us to make connections with numerous problems of current interest in signal processing, information theory, and probability. The reinterpretation follows from the two following lemmas, which are restatements of Lemmas 2.2 and 2.5, rephrasing the notion of (Transverse($A, x_0, Q$)) with the all but linguistically equivalent (Unique($A, x_0, Q$)). For proofs of Lemmas 5.1 and 5.2 see the proofs of Lemmas 2.2 and 2.5.

Lemma 5.1. Let $x_0$ be a vector in $\mathbb{R}^N_+$ with exactly $k$ nonzeros. Let $F$ denote the associated $k$-face of $\mathbb{R}^N_+$. For an $n \times N$ matrix $A$, let $AF$ denote the image of $F$ under $A$ and $b_0 = Ax_0$ the image of $x_0$ under $A$. The following are equivalent:
(Survive($A, F, \mathbb{R}^N_+$)): $AF$ is a $k$-face of $A\mathbb{R}^N_+$,
(Unique($A, x_0, \mathbb{R}^N_+$)): The system $b_0 = Ax$ has a unique solution in $\mathbb{R}^N_+$.

Lemma 5.2. Let $x_0$ be a vector in $H^N$ with exactly $k$ entries strictly between the bounds $\{0, 1\}$. Let $F$ denote the associated $k$-face of $H^N$. For an $n \times N$ matrix $A$, let $AF$ denote the image of $F$ under $A$ and $b_0 = Ax_0$ the image of $x_0$ under $A$. The following are equivalent:
(Survive($A, F, H^N$)): $AF$ is a $k$-face of $AH^N$,
(Unique($A, x_0, H^N$)): The system $b_0 = Ax$ has a unique solution in $H^N$.

Note that the systems of linear equations referred to in these lemmas are underdetermined: $n < N$. Hence these lemmas identify conditions on underdetermined systems of linear equations, such that, when the solution is known to obey certain constraints, there are many cases where this seemingly weak a priori knowledge in fact uniquely determines the solution.
The first result can be paraphrased as saying that nonnegativity constraints can be very powerful, if the object is known to have relatively few nonzeros; the second result says that upper and lower bounds can be very powerful, provided those bounds are active in most cases.

These results provide a theoretical vantage point on an area of recent intense interest in signal processing, appearing variously under the labels "Compressed Sensing" or "Compressive Sampling".

In many practical applications of scientific and engineering signal processing (spectroscopy is one example) one can obtain $n$ linear measurements of an object $x$, obtaining data $b = Ax$; here the rows of the matrix $A$ give the linear response functions of the measurement devices. We wish to reconstruct $x$, knowing only the measurements $b$, the measurement matrix $A$, and various a priori constraints on $x$.

It could be very useful to be able to do this in the case $n < N$, allowing us to save measurement time or other resources. This seems hopeless, because the linear system is underdetermined; but the above lemmas show that there is some fundamental soundness to the idea that we can have $n < N$ and still reconstruct. We now spell out the consequences of these lemmas in more detail.

5.1. Reconstruction Exploiting Nonnegativity Constraints. In many practical applications, such as spectroscopy and astronomy, the object $x$ to be recovered is known a priori to be nonnegative. We wish to reconstruct the unknown $x$, knowing only the linear measurements $b = Ax$, the matrix $A$, and the constraint $x \in \mathbb{R}^N_+$.

Let $J(x)$ be some function of $x$. Consider the positivity-constrained variational problem

($\mathrm{Pos}_J$)  $\min J(x)$ subject to $b = Ax$, $x \in \mathbb{R}^N_+$.

Let $\mathrm{pos}_J(b, A)$ denote any solution of the problem instance ($\mathrm{Pos}_J$) defined by data $b$ and matrix $A$.
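To make the constraint set of the positivity-constrained problem concrete, here is a minimal sketch in plain Python that checks whether a candidate $x$ satisfies $b = Ax$ and $x \ge 0$. The helper names `matvec` and `is_feasible`, the tolerance `tol`, and the tiny 2-by-3 example are all assumptions introduced for illustration; they are not taken from the paper.

```python
def matvec(A, x):
    """Plain-Python product Ax for A stored as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def is_feasible(A, b, x, tol=1e-9):
    """True if x satisfies b = Ax and x >= 0, up to the tolerance tol."""
    nonnegative = all(xi >= -tol for xi in x)
    fits_data = all(abs(r - bi) <= tol for r, bi in zip(matvec(A, x), b))
    return nonnegative and fits_data

# Hypothetical underdetermined instance: n = 2 equations, N = 3 unknowns.
A = [[1.0, 1.0, 1.0],
     [0.0, 1.0, 2.0]]
x0 = [0.0, 0.0, 1.5]                 # nonnegative, with one nonzero entry
b = matvec(A, x0)                    # data generated by x0

other = [0.5, -1.0, 2.0]             # also solves Ax = b, but is not nonnegative
print(is_feasible(A, b, x0), is_feasible(A, b, other))
```

In this toy instance every solution of $Ax = b$ has the form $x_0 + t\,(1, -2, 1)$, and any $t \ne 0$ makes either the first or the second entry negative, so $x_0$ is the only feasible point even though $n < N$.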
Typical variational functions $J$ include
• Sparsity: $\|x\|_{\ell_0} := \#\{ i : x(i) > 0 \}$.
• Size: $1'x$.
• Negative entropy: $\sum_j x(j) \log(x(j))$.
• Energy: $\sum_j x(j)^2$.
This framework contains as special cases the popular signal processing methods of maximum entropy reconstruction and nonnegative least-squares reconstruction. We conclude the following:

Corollary 5.1. Suppose that

$f_k(A\mathbb{R}^N_+) = f_k(\mathbb{R}^N_+)$.

Let $x_0 \ge 0$ and $\|x_0\|_{\ell_0} \le k$. For the problem instance defined by $b = Ax_0$,

$\mathrm{pos}_J(b, A) = x_0$.

In words: under the given conditions on the face numbers, any variational prescription which imposes nonnegativity constraints will correctly recover the $k$-sparse solution in any problem instance where such a $k$-sparse solution exists. This may seem surprising; as $n < N$, the system of linear equations is underdetermined, yet we correctly find a sparse solution if it exists.

Corresponding to this 'strong' statement is a 'weak' statement. Consider the following probability measure on $k$-sparse problem instances.
• Choose a random subset $I$ of size $k$ from $\{1, \dots, N\}$, by $k$ simple random draws without replacement.
• Set the entries of $x_0$ not in the selected subset to zero.
• Choose the entries of $x_0$ in the selected set $I$ from some fixed joint distribution $\psi_I$ supported in $(0, 1)^k$.
• Generate the problem instance $b = Ax_0$.
We speak of drawing a $k$-sparse problem instance at random.

Corollary 5.2. Suppose that for some $\epsilon \in (0, 1)$,

$f_k(A\mathbb{R}^N_+) \ge (1 - \epsilon) \cdot f_k(\mathbb{R}^N_+)$.

For $(b, A)$ a problem instance drawn at random, as above:

$\mathrm{Prob}\{ \mathrm{pos}_J(b, A) = x_0 \} \ge (1 - \epsilon)$.

In words: under the given conditions on the face lattice, any variational prescription which imposes nonnegativity constraints will correctly recover the $k$-sparse solution in at least a fraction $(1 - \epsilon)$ of all $k$-sparse problem instances.
This may seem surprising; since $n < N$, the system of linear equations is underdetermined, and yet we typically find a sparse solution if it exists.

Here are some simple applications:
• In the proportional-dimensional framework, consider triples $(k_n, n, N_n)$ with parameters $\delta, \rho$. Let $A$ denote an $n$ by $N_n$ matrix having random nullspace which is orthant symmetric and generic.
  – If the parameters $\delta, \rho$ name a point 'below' the orthant weak threshold $\rho_W(\delta; \mathbb{R}^N_+)$, then for the vast majority of $k_n$-sparse vectors, any variational method imposing positivity constraints will correctly recover the vector.
  – If the parameters $\delta, \rho$ name a point 'below' the orthant strong threshold $\rho_S(\delta; \mathbb{R}^N_+)$, then for large enough $n$, every $k_n$-sparse vector can be correctly recovered by any variational method imposing positivity constraints.
• In the proportional-dimensional asymptotic, consider triples $(k_n, n, N_n)$ with parameters $\delta, \rho$. Let $A_0$ denote an $n-1$ by $N_n$ matrix having iid standard normal entries, and let $A$ denote the $n$ by $N_n$ matrix formed by adjoining a row of ones to $A_0$.
  – If the parameters $\delta, \rho$ name a point 'below' the simplex weak threshold $\rho_W(\delta; T^{N-1})$, then for the vast majority of $k_n$-sparse vectors, any variational method imposing positivity constraints will correctly recover the sparse vector.
  – If the parameters $\delta, \rho$ name a point 'below' the simplex strong threshold $\rho_S(\delta; T^{N-1})$, then for large enough $n$, every $k_n$-sparse vector can be correctly recovered by any variational method imposing positivity constraints.
• Let $A$ denote the $n$ by $N$ partial Fourier matrix built from low frequencies and called $\Omega$ in Section 4.1. Every $\lfloor n/2 \rfloor$-sparse vector will be correctly recovered by any variational method imposing positivity constraints.
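The random $k$-sparse problem-instance model and the four example functionals $J$ of this subsection can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, the joint distribution $\psi_I$ is taken to be iid uniform on $[0.05, 0.95]$ (one arbitrary choice supported in $(0,1)^k$), the Gaussian matrix is just one convenient example ensemble, and the negative entropy uses the convention $0 \log 0 = 0$.

```python
import math
import random

def matvec(A, x):
    """Plain-Python product Ax for A stored as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def sparse_instance(A, k, rng):
    """Draw (x0, b) following the four steps of the k-sparse instance model."""
    N = len(A[0])
    support = rng.sample(range(N), k)      # k draws without replacement
    x0 = [0.0] * N                         # entries off the support are zero
    for i in support:
        x0[i] = rng.uniform(0.05, 0.95)    # assumed psi_I, supported in (0, 1)^k
    return x0, matvec(A, x0)               # problem instance b = A x0

def sparsity(x):
    """Number of strictly positive entries."""
    return sum(1 for v in x if v > 0)

def size(x):
    """The functional 1'x."""
    return sum(x)

def neg_entropy(x):
    """Sum of x log x over positive entries (convention 0 log 0 = 0)."""
    return sum(v * math.log(v) for v in x if v > 0)

def energy(x):
    """Sum of squared entries."""
    return sum(v * v for v in x)

rng = random.Random(0)                     # fixed seed, for reproducibility only
n, N, k = 6, 10, 3                         # hypothetical small sizes with n < N
A = [[rng.gauss(0.0, 1.0) for _ in range(N)] for _ in range(n)]
x0, b = sparse_instance(A, k, rng)
```

Any solver for the positivity-constrained problem could then be run on `(b, A)` and its output compared with `x0`; no particular solver is assumed here.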
Hence in positivity-constrained reconstruction problems where the object to be recovered is zero in most entries, an assumption which approximates the truth in many problems of spectroscopy and astronomical imaging [9], we can work with fewer than $N$ samples.

The above paragraphs show that it matters a great deal what matrix $A$ we use. Our preference order: $\Omega$ is better than the random matrix $\tilde{A}$, and $\tilde{A}$ is better than a random zero-mean matrix $A$.

(These results extend and generalize results which were previously obtained by the authors in [11], in the case where $J(x) = 1'x$, and by the first author and coauthors in [9]; see also Fuchs [15] and Bruckstein, Elad, and Zibulevsky [6].)

5.2. Reconstruction Exploiting Box Constraints. Consider again the problem of reconstruction from measurements $b = Ax$, but this time assuming the object $x$ obeys box constraints: $0 \le x(j) \le 1$, $1 \le j \le N$. Such constraints can arise for example in infrared absorption spectroscopy and in binary digital communications.

We define the box-constrained variational problem

($\mathrm{Box}_J$)  $\min J(x)$ subject to $b = Ax$, $0 \le x(j) \le 1$, $j = 1, \dots, N$.

Let $\mathrm{box}_J(b, A)$ denote any solution of the problem instance ($\mathrm{Box}_J$) defined by data $b$ and matrix $A$.

In this setting, the notion corresponding to 'sparse' is 'simple'. We say that a vector $x$ is $k$-simple if at most $k$ of its entries differ from the bounds $\{0, 1\}$.

Here, the interesting functions $J$ penalize deviations from simple structure; they include:
• Simplicity: $\#\{ i : x(i) \notin \{0, 1\} \}$.
• Violation Energy: $\sum_j x(j)(1 - x(j))$.

Corollary 5.3. Suppose that

$f_k(AH^N) = f_k(H^N)$.

Let $x_0$ be a $k$-simple vector obeying the box constraints $0 \le x_0 \le 1$. For the problem instance defined by $b = Ax_0$,

$\mathrm{box}_J(b, A) = x_0$.
In words: under the given conditions on the face lattice, any variational prescription which imposes box constraints, when presented with a problem instance where there is a $k$-simple solution, will correctly recover the $k$-simple solution.

Corresponding to this 'strong' statement is a 'weak' statement. Consider the following probability measure on problem instances having $k$-simple solutions. Recall that $k$-simple vectors have all entries equal to 0 or 1 except at $k$ exceptional locations.
• Choose the subset $I$ of $k$ exceptional entries uniformly at random from the set $\{1, \dots, N\}$ without replacement.
• Choose the nonexceptional entries to be either 0 or 1 based on tossing a fair coin.
• Choose the values of the exceptional $k$ entries according to a joint probability measure $\psi_I$ supported in $(0, 1)^k$.
• Define the problem instance $b = Ax_0$.

Corollary 5.4. Suppose that for some $\epsilon \in (0, 1)$,

$f_k(AH^N) \ge (1 - \epsilon) \cdot f_k(H^N)$.

Randomly sample a problem instance $(b, A)$ using the method just described. Then

$P\{ \mathrm{box}_J(b, A) = x_0 \} \ge (1 - \epsilon)$.

In words: under the given conditions on the face lattice, any variational prescription which imposes box constraints will correctly recover at least a fraction $(1 - \epsilon)$ of all underdetermined systems generated by the matrix $A$ which have $k$-simple solutions.

Here is a simple application. In the proportional-dimensional asymptotic framework, consider triples $(k_n, n, N_n)$ with parameters $\delta, \rho$. Let $A$ denote an $n$ by $N_n$ matrix having random nullspace which is orthant symmetric and generic. If the parameters $\delta, \rho$ name a point 'below' the hypercube weak threshold, then for the vast majority of $k_n$-simple vectors, any variational method imposing box constraints will correctly recover the vector.
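The random $k$-simple problem-instance model, together with the simplicity and violation-energy functionals of this subsection, can be sketched in plain Python as follows. As before this is only an illustration: the function names are hypothetical, $\psi_I$ is assumed to be iid uniform on $[0.05, 0.95]$, and the Gaussian matrix is one example ensemble chosen for convenience.

```python
import random

def matvec(A, x):
    """Plain-Python product Ax for A stored as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def simple_instance(A, k, rng):
    """Draw (x0, b) following the four steps of the k-simple instance model."""
    N = len(A[0])
    exceptional = set(rng.sample(range(N), k))     # k exceptional locations
    x0 = []
    for i in range(N):
        if i in exceptional:
            x0.append(rng.uniform(0.05, 0.95))     # assumed psi_I, inside (0, 1)
        else:
            x0.append(float(rng.randint(0, 1)))    # fair coin: bound 0 or bound 1
    return x0, matvec(A, x0)                       # problem instance b = A x0

def simplicity(x):
    """Number of entries that differ from both bounds 0 and 1."""
    return sum(1 for v in x if v not in (0.0, 1.0))

def violation_energy(x):
    """Sum of x(j) (1 - x(j)); zero exactly when every entry sits at a bound."""
    return sum(v * (1.0 - v) for v in x)

rng = random.Random(1)                             # fixed seed, for reproducibility only
n, N, k = 8, 12, 3                                 # hypothetical small sizes with n < N
A = [[rng.gauss(0.0, 1.0) for _ in range(N)] for _ in range(n)]
x0, b = simple_instance(A, k, rng)
```

Both functionals vanish on vectors whose entries all lie at the bounds and grow as entries move into the interior of the box, which is the sense in which they penalize deviations from simple structure.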
In the hypercube case, to our knowledge, there is no phenomenon comparable to that which arose in the positive orthant with the special constructions $\Omega$ and $\tilde{A}$. Consequently, the hypercube weak threshold is the best known general result on the ability to undersample by exploiting box constraints.

In particular, the difference between the weak simplex threshold and the weak hypercube threshold has this interpretation:

A given degree $k$ of sparsity of a nonnegative object is much more powerful than that same degree of simplicity of a box-constrained object.

Specifically, we shouldn't expect to be able to undersample a typical box-constrained object by more than a factor of 2 and then reconstruct it using some garden-variety variational prescription. In comparison, the last section showed that we can severely undersample very sparse nonnegative objects.

Because box constraints are of interest in important areas of signal processing, it seems that much more attention should be paid to thresholds associated with the hypercube.

6. Additional Proofs

6.1. Proof of Lemma 2.2. Let $b_0 := Ax_0$. Assume (Survive($A, F, \mathbb{R}^N_+$)), that $AF$ is a $k$-face of $A\mathbb{R}^N_+$. General position of $A$ implies that $AF$ is a simplicial cone of dimension $k - 1$, and that there exists a unique $x \in \mathbb{R}^N_+$ satisfying $Ax = b_0$, with $x_0$ being that solution. We now assume that there exists a nonzero $\nu \in \mathcal{N}(A) \cap \mathrm{Feas}_{x_0}(\mathbb{R}^N_+)$. Then there exists $\epsilon > 0$ small enough such that $z_0 := x_0 + \epsilon\nu \in \mathbb{R}^N_+$. This $z_0$ satisfies $Az_0 = b_0$, in contradiction to the uniqueness condition previously stated; therefore $\mathcal{N}(A) \cap \mathrm{Feas}_{x_0}(\mathbb{R}^N_+) = \{0\}$.

For the converse direction, assume (Transverse($A, x_0, \mathbb{R}^N_+$)), that $\mathcal{N}(A) \cap \mathrm{Feas}_{x_0}(\mathbb{R}^N_+) = \{0\}$. Assume $AF$ is not a $k$-face of $A\mathbb{R}^N_+$, that is, $AF$ is interior to $A\mathbb{R}^N_+$.
As $A$ projects the interior of $\mathbb{R}^N_+$ to the complete interior of $A\mathbb{R}^N_+$, there exists $z_0 \in \mathbb{R}^N_+$ with $z_0 > 0$ and $Az_0 = b_0$. The difference $\nu := z_0 - x_0 \ne 0$, but $\nu \in \mathcal{N}(A) \cap \mathrm{Feas}_{x_0}(\mathbb{R}^N_+)$, contradicting the Transverse assumption and implying that $AF$ is a $k$-face of $A\mathbb{R}^N_+$.

6.2. Proof of Lemma 2.5. This proof follows similarly to that of Lemma 2.2 and is omitted.

6.3. Proof of Lemma 2.6. For points $x_0$ on $k$-faces of $H^N$ that are also $k$-faces of $\mathbb{R}^N_+$, the two polytopes share the same feasible set, $\mathrm{Feas}_{x_0}(H^N) = \mathrm{Feas}_{x_0}(\mathbb{R}^N_+)$, and by Lemmas 2.2 and 2.5 the probabilities of (Survive($A, F, Q$)) for $Q = \mathbb{R}^N_+, H^N$ must be equal. Consider a point $x_0$ on a $k$-face of $H^N$ that is not a $k$-face of $\mathbb{R}^N_+$; without loss of generality, due to column exchangeability, let

$x_0(i) = \begin{cases} 0, & i = 1, \dots, \ell, \\ 1, & i = \ell + 1, \dots, N - k, \\ 1/2, & i = N - k + 1, \dots, N. \end{cases}$

Then $\mathrm{Feas}_{x_0}(H^N) = \{ \nu : \nu_1, \dots, \nu_\ell \ge 0, \; \nu_{\ell+1}, \dots, \nu_{N-k} \le 0 \}$. Following the proof of Lemma 2.3, condition (Transverse($A, x_0, H^N$)) can be restated as

(6.1)  (Ineq $H^N$) The only vector $c$ satisfying
  $(B^T c)_i \ge 0$, $i = 1, \dots, \ell$,
  $(B^T c)_i \le 0$, $i = \ell + 1, \dots, N - k$,
is the vector $c = 0$,

where $B$ is the orthogonal complement of $A$.

Orthant symmetry of $B$ states that the sign of $(B^T c)_i$ is equiprobable; consequently, the probability of the event named (6.1) is independent of $\ell$, and is in fact equal to the probability of the event named in (2.3).

6.4. Proof of Corollary 4.1. The result is a corollary of [9, Theorem 3, pp. 56]. However, it may require effort on the part of readers to see this, so we select the key step from the proof of Theorem 3, [9, Lemma 2, pp. 63], and use it directly within the framework of this paper.

As $n$ is odd, write $n = 2m + 1$ where $m$ is an integer.
The range of the matrix $\Omega$ is the span of all Fourier frequencies from $0$ to $\pi(m-1)/N$. In accord with terminology in electrical engineering, this space of vectors will be called the space of Lowpass sequences $L(m)$. The nullspace of $\Omega$ is the span of all Fourier frequencies from $\pi m/N$ to $\pi/N$. It will be called the space of Highpass sequences $H(m)$. We have the following:

Lemma 6.1. [9] Every sequence in $H(m)$ has at least $m$ negative entries.

Recall condition (Transverse($\Omega, x_0, \mathbb{R}^N_+$)). If $x_0$ has $k$ nonzeros, then vectors in $\mathrm{Feas}_{x_0}(\mathbb{R}^N_+)$ have at most $k$ negative entries. But vectors in $\mathcal{N}(\Omega) = H(m)$ have at least $m$ negative entries. Therefore, if $m > k$, (Transverse($\Omega, x_0, \mathbb{R}^N_+$)) must hold. By Lemma 2.2, (Survive($\Omega, F, \mathbb{R}^N_+$)) must hold for every $k$-face with $k < m$. Hence $f_{k-1}(\Omega\mathbb{R}^N_+) = f_{k-1}(\mathbb{R}^N_+)$, for $k \le m = \tfrac{1}{2}(n-1)$.

6.5. Proof of Theorem 4.1. The Theorem is an immediate consequence of the following identity.

Lemma 6.2. Suppose that the row vector $\mathbf{1}$ is not in the row span of $A$. Then

$f_k(\tilde{A}\mathbb{R}^N_+) = f_{k-1}(AT^{N-1}), \quad 0 < k < n$.

Proof. We observe that there is a natural bijection between $k$-faces of $\mathbb{R}^N_+$ and the $(k-1)$-faces of $T^{N-1}$. The $(k-1)$-faces of $T^{N-1}$ are in bijection with the corresponding support sets of cardinality $k$: i.e. we can identify with each $(k-1)$-face $F$ the union $I$ of all supports of all members of the face. Similarly, to each support set $I$ of cardinality $k$ there is a unique $k$-face $\tilde{F}$ of $\mathbb{R}^N_+$ consisting of all points in $\mathbb{R}^N_+$ whose support lies in $I$. Composing bijections $F \leftrightarrow I \leftrightarrow \tilde{F}$ we have the bijection $F \leftrightarrow \tilde{F}$.

Concretely, let $x_0$ be a point in the relative interior of some $(k-1)$-face $F$ of $T^{N-1}$. Then $x_0$ has $k$ nonzeros.
$x_0$ is also in the relative interior of the $k$-face $\tilde{F}$ of $\mathbb{R}^N_+$. Conversely, let $y_0$ be a point in the relative interior of some $k$-face of $\mathbb{R}^N_+$; then $x_0 = (1'y_0)^{-1} y_0$ is a point in the relative interior of a $(k-1)$-face of $T^{N-1}$.

The last two paragraphs show that for each pair of corresponding faces $(F, \tilde{F})$, we may find a point $x_0$ in both the relative interior of $\tilde{F}$ and the relative interior of $F$. For such $x_0$,

$\mathrm{Feas}_{x_0}(\mathbb{R}^N_+) = \mathrm{Feas}_{x_0}(T^{N-1}) + \mathrm{lin}(x_0)$.

Clearly $\mathcal{N}(\tilde{A}) \cap \mathrm{lin}(x_0) = \{0\}$, because $1'x_0 > 0$. We conclude that the following are equivalent:
(Transverse($A, x_0, T^{N-1}$)): $\mathcal{N}(A) \cap \mathrm{Feas}_{x_0}(T^{N-1}) = \{0\}$.
(Transverse($\tilde{A}, x_0, \mathbb{R}^N_+$)): $\mathcal{N}(\tilde{A}) \cap \mathrm{Feas}_{x_0}(\mathbb{R}^N_+) = \{0\}$.

Rephrasing [11], the following are equivalent for $x_0$ a point in the relative interior of $F$:
(Survive($A, F, T^{N-1}$)): $AF$ is a $(k-1)$-face of $AT^{N-1}$,
(Transverse($A, x_0, T^{N-1}$)): $\mathcal{N}(A) \cap \mathrm{Feas}_{x_0}(T^{N-1}) = \{0\}$.

We conclude that for two corresponding faces $F$, $\tilde{F}$, the following are equivalent:
(Survive($A, F, T^{N-1}$)): $AF$ is a $(k-1)$-face of $AT^{N-1}$,
(Survive($\tilde{A}, \tilde{F}, \mathbb{R}^N_+$)): $\tilde{A}\tilde{F}$ is a $k$-face of $\tilde{A}\mathbb{R}^N_+$.

Combining this with the natural bijection $F \leftrightarrow \tilde{F}$, the lemma is proved.

Acknowledgments. Art Owen suggested that we pay attention to Wendel's Theorem. We also thank Goodman, Pollack, and Schneider for providing scholarly background.

References

[1] Fernando Affentranger and Rolf Schneider, Random projections of regular simplices, Discrete Comput. Geom. 7 (1992), no. 3, 219–226. MR MR1149653 (92k:52008)
[2] Yuliy M. Baryshnikov, Gaussian samples, regular simplices, and exchangeability, Discrete Comput. Geom. 17 (1997), no. 3, 257–261. MR MR1432063 (98a:52006)
[3] Yuliy M. Baryshnikov and Richard A. Vitale, Regular simplices and Gaussian samples, Discrete Comput. Geom. 11 (1994), no. 2, 141–147. MR MR1254086 (94j:60017)
[4] Ethan D. Bolker, A class of convex bodies, Trans. Amer. Math. Soc. 145 (1969), 323–345. MR MR0256265 (41 #921)
[5] Károly Böröczky, Jr. and Martin Henk, Random projections of regular polytopes, Arch. Math. (Basel) 73 (1999), no. 6, 465–473. MR MR1725183 (2001b:52004)
[6] A. M. Bruckstein, M. Elad, and M. Zibulevsky, On the uniqueness of non-negative sparse and redundant representations, ICASSP 2008 special session on Compressed Sensing, Las Vegas, Nevada, 2008.
[7] David L. Donoho, High-dimensional centrally-symmetric polytopes with neighborliness proportional to dimension, Disc. Comput. Geometry 35 (2006), no. 4, 617–652.
[8] David L. Donoho, Neighborly polytopes and sparse solutions of underdetermined linear equations, Stanford University, Technical Report (2006).
[9] David L. Donoho, Iain M. Johnstone, Jeffrey C. Hoch, and Alan S. Stern, Maximum entropy and the nearly black object, Journal of the Royal Statistical Society, Series B (Methodological) 54 (1992), no. 1, 41–81.
[10] David L. Donoho and Jared Tanner, Neighborliness of randomly-projected simplices in high dimensions, Proc. Natl. Acad. Sci. USA 102 (2005), no. 27, 9452–9457.
[11] David L. Donoho and Jared Tanner, Sparse nonnegative solutions of underdetermined linear equations by linear programming, Proc. Natl. Acad. Sci. USA 102 (2005), no. 27, 9446–9451.
[12] David L. Donoho and Jared Tanner, Exponential bounds implying construction of neighborly polytopes, error-correcting codes and compressed sensing matrices by random sampling, preprint (2007).
[13] David L. Donoho and Jared Tanner, Counting faces of randomly-projected polytopes when the projection radically lowers dimension, J. AMS (2008).
[14] David L. Donoho and Jared Tanner, Sharp thresholds in compressed sensing are universal across matrix ensembles, (2008).
[15] Jean-Jacques Fuchs, On sparse representations in arbitrary redundant bases, IEEE Trans. Inform. Theory 50 (2004), no. 6, 1341–1344. MR MR2094894
[16] Branko Grünbaum, Convex polytopes, second ed., Graduate Texts in Mathematics, vol. 221, Springer-Verlag, New York, 2003, Prepared and with a preface by Volker Kaibel, Victor Klee and Günter M. Ziegler. MR MR1976856
[17] Mark Rudelson and Roman Vershynin, The smallest singular value of a rectangular random matrix, preprint (2008).
[18] A. M. Vershik and P. V. Sporyshev, Asymptotic behavior of the number of faces of random polyhedra and the neighborliness problem, Selecta Math. Soviet. 11 (1992), no. 2, 181–201. MR MR1166627 (93d:60017)
[19] James G. Wendel, A problem in geometric probability, Mathematics Scandinavia 11 (1962), 109–111.
[20] Günter M. Ziegler, Lectures on polytopes, Graduate Texts in Mathematics, vol. 152, Springer-Verlag, New York, 1995. MR MR1311028 (96a:52011)

Department of Statistics, Stanford University
Current address: Department of Statistics, Stanford University
E-mail address: donoho@stanford.edu

School of Mathematics, University of Edinburgh
Current address: School of Mathematics, University of Edinburgh
E-mail address: jared.tanner@ed.ac.uk