Super-resolution, Extremal Functions and the Condition Number of Vandermonde Matrices



Ankur Moitra∗

April 30, 2015

Abstract

Super-resolution is a fundamental task in imaging, where the goal is to extract fine-grained structure from coarse-grained measurements. Here we are interested in a popular mathematical abstraction of this problem that has been widely studied in the statistics, signal processing and machine learning communities. We exactly resolve the threshold at which noisy super-resolution is possible. In particular, we establish a sharp phase transition for the relationship between the cutoff frequency (m) and the separation (Δ). If m > 1/Δ + 1, our estimator converges to the true values at an inverse polynomial rate in terms of the magnitude of the noise. And when m < (1 − ε)/Δ, no estimator can distinguish between a particular pair of Δ-separated signals even if the magnitude of the noise is exponentially small. Our results involve making novel connections between extremal functions and the spectral properties of Vandermonde matrices. We establish a sharp phase transition for their condition number which in turn allows us to give the first noise tolerance bounds for the matrix pencil method. Moreover we show that our methods can be interpreted as giving preconditioners for Vandermonde matrices, and we use this observation to design faster algorithms for super-resolution. We believe that these ideas may have other applications in designing faster algorithms for other basic tasks in signal processing.

∗ Massachusetts Institute of Technology. Department of Mathematics and the Computer Science and Artificial Intelligence Lab. Email: moitra@mit.edu. This work is supported in part by a grant from the MIT NEC Corporation and a Google Research Award.
1 Introduction

1.1 Background

Super-resolution is a fundamental task in imaging, where the goal is to resolve features at a scale beyond the diffraction limit. This problem has received considerable attention both as a statistical and algorithmic problem — where the goal is to localize objects using only low frequency measurements — as well as an engineering problem — where the goal is to design devices that can work around the diffraction limit. In fact, the 2014 Nobel Prize in Chemistry was recently awarded to Eric Betzig, Stefan Hell and William Moerner "...for the development of super-resolved fluorescence microscopy..." which has played a key role in a number of scientific discoveries, such as recent discoveries about how viruses change their shape before entering a cell [32].

We will be interested in a popular mathematical abstraction of this problem that has been widely studied in the statistics, signal processing and machine learning communities [11, 5, 6, 3]. In particular, there is an unknown superposition of k point sources:

    x(t) = Σ_{j=1}^k u_j δ_{f_j}(t)

where δ_τ(t) is the Dirac delta function at τ and we assume that each f_j ∈ [0, 1). Additionally, each measurement v_ℓ of these point sources takes the form

    v_ℓ = ∫_0^1 e^{i2πℓt} x(t) dt + η_ℓ = Σ_{j=1}^k u_j e^{i2πf_j ℓ} + η_ℓ

where η_ℓ is noise in the measurement. However, due to physical limitations, we are only able to measure the data at low frequencies — i.e. we can observe v_ℓ for |ℓ| ≤ m, where m is referred to as the cutoff frequency. The basic question is: how large does m need to be in order to recover the locations f_j and the coefficients u_j? The problem of recovering fine-grained structure from coarse-grained measurements has a broad set of applications including in domains such as medical imaging, microscopy, astronomy, radar, geophysics and spectroscopy (see [11, 30, 5] and references therein).
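As a concrete illustration, the measurement model above is easy to simulate. The following sketch (the function name and parameters are ours, not from the paper) generates the noisy low-frequency measurements v_ℓ for |ℓ| ≤ m:

```python
import numpy as np

def measurements(u, f, m, noise=0.0, seed=0):
    """v_l = sum_j u_j e^{i 2 pi f_j l} + eta_l for l = -m..m (stored at index l + m)."""
    rng = np.random.default_rng(seed)
    ls = np.arange(-m, m + 1)
    # (2m+1) x k matrix of exponentials e^{i 2 pi f_j l}
    E = np.exp(2j * np.pi * np.outer(ls, f))
    eta = noise * (rng.standard_normal(2 * m + 1) + 1j * rng.standard_normal(2 * m + 1))
    return E @ np.asarray(u, dtype=complex) + eta

# two point sources at f = 0.1 and 0.6 with coefficients 1 and -0.5
v = measurements([1.0, -0.5], [0.1, 0.6], m=4)
```

Note that the ℓ = 0 measurement is just the sum of the coefficients, which is a convenient sanity check.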
This is also a fundamental algorithmic and statistical problem in its own right. This particular estimation problem has a long history of study, and there are a variety of methods that work in the noise-free case for m = k, such as Prony's method (1795) [14], Pisarenko's method (1973) [31], the matrix pencil method (1990) [23] and other linear prediction methods (see references in [41]). This is optimal since if we had any fewer than 2k + 1 total measurements, regardless of their frequency, the true signal would not necessarily be the sparsest solution to the inverse problem [16].

Here we study this problem in the presence of noise, and establish that it exhibits a sharp phase transition. Let Δ denote the minimum separation between any pair of the f_j's according to the wrap-around metric on the unit interval. Our main algorithmic result shows that if m > 1/Δ + 1 then noisy super-resolution is possible and our estimates converge to the true values at an inverse polynomial rate in terms of the magnitude of the noise. This is tight in that if m = (1 − ε)/Δ there is a pair of k point sources x and x′, each with separation Δ, where we would need the noise to be exponentially small to tell them apart. Along the way, we also prove a sharp phase transition for the condition number of the Vandermonde matrix, based on connections to certain extremal functions studied in analytic number theory.

To elaborate on this, one can think of a hypothesis testing version of various learning problems. Then in many settings we need a family of test functions that are strong enough to distinguish any pair of hypotheses with suitably different parameters.
For mixtures of Gaussians, low degree polynomials work [24, 28, 1], but we can think of the present paper as fitting into a broader agenda where the goal is to find more creative choices for these test functions which in turn lead to better algorithms (see also [26] where a similar framework is used to design algorithms for the population recovery problem).

1.2 Previous Work and Connections

Recall that there are a variety of algorithms for super-resolution that work for m = k in the noise-free case. We review one popular technique called the matrix pencil method in Section 2.1, where the idea is to express the unknown parameters as the solution to a generalized eigenvalue problem. There have been numerous attempts to analyze the noise tolerance of these methods, but all of them pass to limiting arguments as m tends to infinity and yield ineffective bounds. The main bottleneck is that this approach relies on solving linear systems in Vandermonde matrices, which can be very poorly conditioned. In fact, many other algorithms for inverse problems that use closely related techniques work in the exact case but are either hopeless or require very different settings of their parameters in the presence of noise. For example:

(a) Given the evaluation of a degree k polynomial p(x) at k + 1 distinct points, we can solve for the coefficients of p(x) and then we can evaluate p(x) anywhere. But if we are given the values of p(x) up to some additive noise η, even for all values of x ∈ [0, 1], we can only hope to estimate p(2) within an additive c(2 + √3)^k η. In fact this is tight by standard results in approximation theory [10].

(b) In sparse recovery, there is an unknown s-sparse signal x and we observe Ax + η. If η = 0 and A is chosen to be the first 2s rows of the discrete Fourier transform matrix, we can recover x exactly using Prony's method.
But A must have at least Ω(s log(n/s)) rows in order to recover x in the presence of some moderate amount of noise [15].

We could wonder whether super-resolution suffers from the same type of extreme noise-intolerance, which would make it possible in the exact case but require (say) η_ℓ to be exponentially small in order to recover the u_j's and f_j's to any reasonable accuracy.

Prior to the more modern work on super-resolution, there was a considerable effort to develop empirical methods that work when there is some a priori information about the sparsity of the signal (see references in [11]). In a remarkable work, Donoho formulated the notion of the Rayleigh index, which can be thought of as a continuous analogue of separation, and studied super-resolution where the f_j's are restricted to be on a grid [11]. Donoho used Beurling's theory of interpolation [2] to prove that any Δ-separated signal can be uniquely recovered from its Fourier transform on the continuous range [−1/Δ, 1/Δ]. However this result is information theoretic in the sense that there could be two distinct signals that are both Δ-separated, but whose Fourier transforms restricted to [−1/Δ, 1/Δ] differ on a measure zero set. (Hence they are not noticeably different.) Donoho also gave results on the modulus of continuity for the recovery of Δ-separated signals, but these results lose an extra factor of two in the range of low-frequency measurements that are needed. The main question left open by this work was to prove recovery bounds when the signal is not restricted to be on a grid.

Subsequently, Candes and Fernandez-Granda [5, 6] proposed an exciting new method for solving this inverse problem based on convex programming. The authors proved a variety of results in the setting where (a) there is no noise or (b) there is noise, but the f_j's are restricted to be on a grid.
In the noise-free setting, the authors showed their approach recovers the u_j's and f_j's exactly provided that m ≥ 2/Δ and m ≥ 128, where Δ is the minimum separation. The authors further improved this bound, when the u_j's are real, to a separation condition of m ≥ 1.87/Δ (unpublished)¹. In the grid setting, the authors defined a parameter C called the super-resolution factor and proved various bounds using it on the error of their estimator. However, the approach used to measure the error is non-standard since it uses the Fejer kernel to 'mask' the error on the high-frequency parts of the signal. Finally, Fernandez-Granda [17] returned to the setting where the f_j's can be at arbitrary locations, and showed that if the noise is small enough the estimate becomes localized around the true spikes.

After the initial publication of our work, we became aware of the recent work of Liao and Fannjiang [25] who analyze the MUSIC algorithm. Their approach is also based on upper-bounding the condition number of the Vandermonde matrix and works with a separation condition of m ≥ (1 + C(Δ))/Δ where C(Δ) goes to zero as Δ increases. We will prove a sharper bound that is optimal for all Δ, through extremal functions.

¹ One should note though, that the methods we described above work in the noise-free setting for m = k without any separation condition. So the crucial issue really is in understanding when super-resolution is possible in the presence of noise.

Further Connections

In fact, there are a number of connections — both conceptual and technical — to the rich literature on the sparse Fourier transform (SFFT). We can interpret two recent works as providing algorithms for approximately recovering a signal that is the superposition of k point sources at discrete locations, even in the presence of noise, from few measurements [18, 19]², and state-of-the-art algorithms run in time O(k log k log(n/k)) [21].
What if these point sources are not at discrete locations and are allowed to be off the grid? In fact, Bhaskar et al. [3] defined a problem called compressed sensing off the grid where we are given a subset of the low frequency measurements. More precisely, we are given v_ℓ for all ℓ ∈ S ⊂ {−m, −m + 1, ..., m − 1, m} where |S| is roughly O(k log(m/k)). The authors proved that if S is chosen uniformly at random and m ≥ 2/Δ, the convex program recovers the u_j's and f_j's exactly. Additionally, the authors needed a technical assumption that each u_j ∈ {±1} and is chosen uniformly at random. However these algorithms are only known to work in the noise-free setting, and it remains an interesting open question whether compressed sensing off the grid can be solved in the presence of noise, with similar guarantees to what we are able to obtain here.

1.3 Our Results and Techniques

Here we show the first noise tolerance bounds for the matrix pencil method (see Section 2.1) and we use this to give algorithms for noisy super-resolution with the optimal relationship between the number of measurements and the separation. Our key technical ingredient is giving a tight characterization of when the Vandermonde matrix V_m^k(α_1, α_2, ..., α_k) (see Section 2.1) is well conditioned. Set α_j = e^{i2πf_j}. Recall that Δ is the minimum separation of the f_j's according to the wrap-around metric on the unit interval.

Theorem 1.1. The condition number κ of V_m^k(α_1, α_2, ..., α_k) satisfies

    κ² ≤ (m + 1/Δ − 1) / (m − 1/Δ − 1)

provided that m > 1/Δ + 1.

This theorem is easy to prove, given the right extremal functions. The key is the Beurling-Selberg majorant and minorant³ (see Section 2.2), which majorize and minorize the sign function on the real line but have a compactly supported Fourier transform [33].
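Theorem 1.1 is easy to check numerically. The sketch below (our own illustration; the helper name is ours) builds V_m^k for five equispaced frequencies with Δ = 0.2 and compares its condition number against the bound:

```python
import numpy as np

def vandermonde(f, m):
    """V_m^k with rows (alpha_1^l, ..., alpha_k^l), l = 0..m-1, alpha_j = e^{i 2 pi f_j}."""
    alphas = np.exp(2j * np.pi * np.asarray(f))
    return alphas[None, :] ** np.arange(m)[:, None]

delta = 0.2
f = np.arange(5) * delta            # minimum (wrap-around) separation is 0.2
m = 11                              # m > 1/delta + 1 = 6
s = np.linalg.svd(vandermonde(f, m), compute_uv=False)
kappa = s[0] / s[-1]
bound = np.sqrt((m + 1 / delta - 1) / (m - 1 / delta - 1))
```

For these particular nodes (fifth roots of unity) the Gram matrix is 10I plus the all-ones matrix, so κ = √(3/2) ≈ 1.22, comfortably below the bound ≈ 1.73.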
In fact, these functions are the solutions to their respective extremal problems and have led to sharp forms of various inequalities in analytic number theory. Most notably, these functions can be used to give simple, direct proofs of the large sieve inequality (see [44]) and extensions of Hilbert's inequality [29]. These inequalities yield strong upper and lower bounds for integrals of sums of exponentials, and here we adapt the techniques to our setting to reason about evaluating sums of exponentials at discrete sets of points. This readily yields our upper bound on the condition number. Indeed it is the condition number of V_m^k that governs whether or not the matrix pencil method is stable. Despite its widespread use in signal processing, this is the first effective bound on its noise tolerance, and we get immediate consequences for super-resolution.

Throughout this paper we will quantify the error between our estimates {(û_j, f̂_j)} and the true values {(u_j, f_j)} based on the best matching between them:

    min_π max_j max( |û_j − u_{π(j)}|, |f̂_j − f_{π(j)}| )

where π is a permutation. In particular, we refer to our estimates as being ε-close to the true parameters if there is a matching of the k spikes in our estimate to the true k spikes so that across the matching we get both the coefficients and the locations correct within an additive ε.

Theorem 1.2. If the cut-off frequency satisfies m > 1/Δ + 1 where Δ is the minimum separation (according to the wrap-around metric), then there is a polynomial time algorithm to recover the u_j's and f_j's whose estimates converge to the true values at an inverse polynomial rate in terms of the magnitude of the noise.

We will state the precise quantitative bounds later (see Theorem 2.8).
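The matching-based error measure above is straightforward to compute by brute force for small k. A sketch (function names are ours), using the wrap-around metric for the locations:

```python
import itertools
import numpy as np

def wrap_dist(a, b):
    """Wrap-around distance d_w on the unit interval."""
    d = np.abs(a - b) % 1.0
    return np.minimum(d, 1.0 - d)

def matching_error(u_hat, f_hat, u, f):
    """min over permutations pi of max_j max(|u_hat_j - u_{pi(j)}|, d_w(f_hat_j, f_{pi(j)}))."""
    k = len(u)
    best = np.inf
    for pi in itertools.permutations(range(k)):
        err = max(max(abs(u_hat[j] - u[pi[j]]), wrap_dist(f_hat[j], f[pi[j]]))
                  for j in range(k))
        best = min(best, err)
    return best
```

For example, estimates that are a relabeling of the truth have zero error, and locations 0.99 and 0.01 are only 0.02 apart in the wrap-around metric.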
These results are tight (as we will describe next) and settle the question of what is the best relationship between the number of measurements and the minimum separation that we can achieve for super-resolution. Notice that this theorem has none of the restrictions that were needed in some of the previous work; there are no restrictions on the f_j's being on a grid, or on the coefficients.

² The guarantees in these works are even stronger, since the usual goal is to recover a k-sparse approximation to the discrete Fourier transform (DFT) whose error is within a (1 + ε) factor of the error of the best k-sparse approximation. In particular, these algorithms work even when the signal is not particularly close to a sum of k sinusoids.

³ In fact, the minorant is the more important of the two extremal functions for our purposes. It is the opposite for most of the standard applications, where the majorant plays the key role and the minorant is a footnote.

As we noted earlier, if k = 1/Δ and m < 1/Δ then the true signal x is not necessarily the sparsest solution to the inverse problem. This is related to the fact that V_m^k is not full rank. However it is easy to give more compelling lower bounds where k is itself much smaller than m. Candes and Fernandez-Granda [5] proved almost what we need, and gave an x that is Δ-separated where

    ∫_E |x̂(t)|² dt < 2^{−Ω(k)}

for E = [−m/2, m/2] and m = (1 − ε)/Δ. Their approach was based on the asymptotics of Slepian's discrete prolate spheroidal wave functions, which were developed in a sequence of papers [34, 35, 36, 37, 38]. Here we give a discrete version of the above result, and give an elementary and self-contained proof. First, we prove:

Theorem 1.3. If m < (1 − ε)/Δ and k = Ω(log m), the condition number of V_m^k(α_{k/2}, α_{k/2+1}, ..., α_{3k/2−1}) is at least 2^{Ω(k)}.
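This blow-up is easy to observe numerically. The sketch below (an illustration in the spirit of Theorem 1.3, not the paper's exact construction) compares clustered frequencies whose separation is well below 1/m against well-separated ones:

```python
import numpy as np

def vandermonde(f, m):
    alphas = np.exp(2j * np.pi * np.asarray(f))
    return alphas[None, :] ** np.arange(m)[:, None]

def cond(f, m):
    s = np.linalg.svd(vandermonde(f, m), compute_uv=False)
    return s[0] / s[-1]

m, k = 25, 10
kappa_bad = cond(np.arange(k) * 0.02, m)    # Delta = 0.02, so m = 25 < 1/Delta = 50
kappa_good = cond(np.arange(k) * 0.10, m)   # Delta = 0.10, so m = 25 > 1/Delta + 1 = 11
```

In the clustered regime the condition number is astronomically large, while in the separated regime it is close to 1, matching the phase transition described above.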
It turns out that this gives us a pair of point sources that are each 2Δ-separated, where setting the cut-off frequency m = (1 − ε)/(2Δ) we would need the magnitude of the noise to be exponentially small in order to tell them apart. Hence we cannot even solve the following hypothesis testing variant in the presence of noise:

Corollary 1.4 (Informal). If the cut-off frequency satisfies m < (1 − ε)/Δ where Δ is the minimum separation (according to the wrap-around metric), then there is a pair x and x′ of k point sources each with separation Δ where we would need |η_ℓ| ≤ 2^{−Ω(k)} in order to tell x and x′ apart.

This proves the optimality of our algorithms in a strong sense. In fact, this impossibility result holds even in a stochastic model, where the noise η_ℓ has its real and imaginary parts chosen as independent Gaussian random variables with variance 2^{−Ω(k)}. We state the precise result later (see Corollary 3.2).

We can think about our use of majorants and minorants as being a "two-function method" where we upper and lower bound some function we are interested in — in this case ‖V_m^k u‖₂². We develop another framework, which we call a "one-function method", in which we prove upper bounds on the condition number of V_m^k in an alternate manner. For example, the proof of the Salem inequality (see Section 4.2) relies on lower bounding some integral we are interested in as:

    ∫_{−∞}^{∞} χ(t) |f(t)|² dt ≤ ∫_E |f(t)|² dt

where χ(t) is an appropriately chosen "test function" that is supported on the interval E. If additionally f̂ is well-spaced and χ̂ decays quickly, then we can write the left hand side as a relatively simple sum-of-squares plus some correction terms. These same sorts of techniques have been applied in the sparse Fourier transform literature, where a combination of hashing and filtering is used to identify large coefficients [18, 19, 21].
Our key observation is that if we transfer these types of inequalities to the discrete setting, the test function itself can be thought of as a way to re-weight the rows of V_m^k to precondition it. This is a very different setting than usual because we do not know V_m^k = V_m^k(α_1, α_2, ..., α_k) — the matrix we want to precondition — only a family that it belongs to. We call this universal preconditioning, which seems to be an interesting notion in its own right. The key point is that χ(t) is a universal preconditioner for any V_m^k where the f_j's are well-separated, and indeed we will use it to design faster algorithms for super-resolution. Here we restrict to the case where each u_j = 1 to keep things simple. Our final result is:

Theorem 1.5. If the cut-off frequency satisfies m > Ω((1/Δ) log(1/ε)) where Δ is the minimum separation (according to the wrap-around metric) and each u_j = 1, there is an Õ(m²) time algorithm to recover the f_j's up to accuracy ε, provided that the noise satisfies |η_j| ≤ ε²/(4k) for each j.

We prove this result by using χ(t) to construct a function F(z) whose local minima are ε-close to the f_j's. Moreover we establish that F(z) is approximately strongly convex in a neighborhood around each f_j, and we give a variant of gradient descent that finds all of them in parallel. This approach is much faster than trying to solve the convex program in [5] or finding all the generalized eigenvalues for a pair of m × m Toeplitz matrices as in the matrix pencil method.

We remark that least-squares is a popular approach to solve super-resolution, but most analyses pass to a limiting argument and use the fact that as m goes to infinity, the (re-normalized) columns of V_m^k become orthogonal. Our approach instead is to precondition V_m^k, and we believe that this idea of using test functions (i.e.
filters, kernels) as a preconditioner may have other applications in designing faster algorithms for other basic tasks in signal processing.

2 The Optimal Separation Criteria

2.1 The Matrix Pencil Method

Here we review the matrix pencil method, which is a basic tool in statistics, signal processing and learning. The basic idea is to set up a generalized eigenvalue problem whose solutions are the unknown parameters. In particular, in the setting of super-resolution we are given v_ℓ = Σ_j u_j e^{i2πf_j ℓ} + η_ℓ for integers |ℓ| ≤ m, where η_ℓ is noise. We will limit our discussion (for now) to the idealized case to explain the method, and later we will see that it is the condition number of the Vandermonde matrix V_m^k (defined below) that controls whether or not this method is noise tolerant. Let α_j = e^{i2πf_j}; then:

    V_m^k(α_1, α_2, ..., α_k) =
    [ 1           1           ...   1          ]
    [ α_1         α_2         ...   α_k        ]
    [ ...         ...         ...   ...        ]
    [ α_1^{m−1}   α_2^{m−1}   ...   α_k^{m−1}  ]

Moreover, let D_u = diag({u_j}) and let D_α = diag({α_j}). Recall that in a generalized eigenvalue problem we are given a pair of matrices A and B, and the goal is to find the generalized eigenvalues λ for which there is a vector x that satisfies Ax = λBx. In particular, the eigenvalues of A are the solution to the generalized eigenvalue problem where B = I. Then the following well-known observations are the key to the matrix pencil method:

Observation 1. The non-zero generalized eigenvalues of V D_u V^H and V D_u D_α V^H are exactly {α_j}.

Observation 2. The entry in row i, column j of V D_u V^H is v_{i−j} and the corresponding entry in V D_u D_α V^H is v_{i−j+1}.

Hence the entries in these matrices are precisely the low-frequency measurements we have access to. Now it is straightforward to find the unknown frequencies {f_j} by setting up the above generalized eigenvalue problem and solving it.
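In the noise-free case with m = k, this is only a few lines of linear algebra. A minimal sketch (with our own indexing convention that v[ℓ + k] stores v_ℓ): since V D_u D_α V^H − λ V D_u V^H = V D_u (D_α − λI) V^H, the eigenvalues of A⁻¹B are exactly the α_j:

```python
import numpy as np

def pencil_frequencies(v, k):
    """Noise-free matrix pencil with m = k; v[l + k] holds v_l for l = -k..k."""
    A = np.array([[v[i - j + k] for j in range(k)] for i in range(k)])      # A_{ij} = v_{i-j}
    B = np.array([[v[i - j + 1 + k] for j in range(k)] for i in range(k)])  # B_{ij} = v_{i-j+1}
    lam = np.linalg.eigvals(np.linalg.solve(A, B))   # eigenvalues of A^{-1} B are the alpha_j
    return np.sort(np.angle(lam) / (2 * np.pi) % 1.0)

# two sources at f = 0.2 and 0.7 with coefficients 1 and 2, measurements for |l| <= 2
ls = np.arange(-2, 3)
v = np.exp(2j * np.pi * np.outer(ls, [0.2, 0.7])) @ np.array([1.0, 2.0])
f_hat = pencil_frequencies(v, k=2)   # recovers [0.2, 0.7] up to machine precision
```

This clean behavior is exactly what breaks down under noise, which is why the conditioning of V_m^k becomes the central issue.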
Moreover, we can find the coefficients too by solving the following linear system:

    v = V_m^k(α_1, α_2, ..., α_k) u

where v and u are the vectors of {v_ℓ}'s and {u_j}'s respectively. This approach succeeds provided that V_m^k(α_1, α_2, ..., α_k) has full column rank, which is true if the f_j's are distinct and it has at least as many rows as columns. Thus as long as the cutoff frequency satisfies m ≥ k + 1 we can solve the super-resolution problem exactly in the noise-free setting. This algorithmic guarantee has been rediscovered numerous times.

However, this approach can sometimes be extremely intolerant to noise. What governs the noise-tolerance of the above generalized eigenvalue problem (and consequently the overall method) is the condition number of V_m^k(α_1, α_2, ..., α_k). There are by now many known perturbation bounds for generalized eigenvalue problems (see [40]), however we must take extra care in that the rectangular Vandermonde matrix will have full column rank but will not be invertible in our setting. We defer proving perturbation bounds for the above problem until Section 2.4, but let us for now focus on understanding when the Vandermonde matrix is well-conditioned.

2.2 The Beurling-Selberg Majorant

The Beurling-Selberg majorant B(t) plays a key role in analytic number theory, where it has been used to establish sharp forms of the large sieve inequality and also important extensions to the classic Hilbert inequality. In particular:

Definition 2.1. B(t) = (sin(πt)/π)² ( Σ_{j=0}^{∞} (t − j)^{−2} − Σ_{j=−∞}^{−1} (t − j)^{−2} + 2/t )

We will describe its key properties next, but for now we remark that B(t) is an entire function of exponential type 2π — i.e. its Fourier transform B̂ is supported on [−1, 1].
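Definition 2.1 can be evaluated directly by truncating the two series. The sketch below (the truncation length and function name are our choices) checks the majorization property B(t) ≥ sgn(t) at a few non-integer points:

```python
import numpy as np

def selberg_majorant(t, terms=20000):
    """Truncated evaluation of B(t) from Definition 2.1 (t must not be a non-negative integer)."""
    j = np.arange(terms, dtype=float)
    # sum_{j>=0} (t - j)^{-2}  -  sum_{j<=-1} (t - j)^{-2}  +  2/t
    series = np.sum(1.0 / (t - j) ** 2) - np.sum(1.0 / (t + j + 1.0) ** 2) + 2.0 / t
    return (np.sin(np.pi * t) / np.pi) ** 2 * series

for t in (0.3, 2.7, -0.3, -2.7):
    assert selberg_majorant(t) >= np.sign(t)   # B majorizes the sign function
```

The tails of the two series nearly cancel, so the truncation error is tiny; the prefactor (sin πt/π)² also forces B to agree with sgn at the integers.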
Indeed, the definition above looks quite strange at first glance, but any function of exponential type 2π satisfies a certain interpolation formula based on its values at the integers (see e.g. [43] or [45]), and B(t) comes from plugging in the sign function instead (which is not of exponential type 2π). The key properties of B(t) are that it majorizes the sign function sgn — i.e. sgn(t) ≤ B(t) — and that:

    ∫_{−∞}^{∞} B(t) − sgn(t) dt = 1

In fact, B(t) is the extremal function that satisfies the above properties. Hence any function F(t) of exponential type 2π that majorizes the sign function satisfies

    ∫_{−∞}^{∞} F(t) − sgn(t) dt ≥ 1

and equality holds if and only if F(t) = B(t). There is a similar construction that minorizes the sign function, and it is also extremal. However, these functions are usually used to majorize and minorize a given interval, and we will instead state the theorems we need restricted to that setting. Let E = [−n, +n] and let I_E(t) be the indicator function for E. Then:

Theorem 2.2. There are entire functions C_E(t) and c_E(t) whose Fourier transforms are supported on [−Δ, Δ] and that satisfy c_E(t) ≤ I_E(t) ≤ C_E(t) and

    ∫_{−∞}^{∞} C_E(t) − I_E(t) dt = ∫_{−∞}^{∞} I_E(t) − c_E(t) dt = 1/Δ

The functions C_E(t) and c_E(t) are called majorants and minorants respectively. In [29], Montgomery and Vaughan used these properties of C_E(t) and c_E(t) to give a strong generalization of Hilbert's inequality. Here, we will modify their approach and prove upper bounds on the condition number of the Vandermonde matrix.

2.3 Upper Bounding the Condition Number

Here we will adapt the approach of Montgomery and Vaughan [29] to give an upper bound on the condition number of V_m^k. Recall that α_j = e^{i2πf_j} and d_w is the wrap-around distance on the unit interval.
Then suppose each f_j ∈ [0, 1) and the minimum separation is strictly larger than Δ — i.e. d_w(f_j, f_{j′}) > Δ for j ≠ j′. We prove that V_m^k(α_1, α_2, ..., α_k) is well-conditioned provided that m > 1/Δ + 1. Moreover we will prove in Section 3 that if m ≤ (1 − ε)/Δ then V_m^k can have condition number that is exponentially large in k, so there is a sharp transition.

Let Ĉ_E denote the Fourier transform of C_E, and let

    h(ℓ) = Σ_{j=−∞}^{∞} δ(ℓ − j) = Σ_{t=−∞}^{∞} e^{i2πtℓ}

denote the Dirac comb, where we have used the fact that the Dirac comb is its own Fourier transform. Recall that E = [−n, +n], I_E(t) is the indicator function for E and v_ℓ = Σ_{j=1}^k u_j e^{i2πf_j ℓ}. Then we can write

    Σ_{ℓ=−n}^{n} |v_ℓ|² = ∫_{−∞}^{∞} h(ℓ) I_E(ℓ) |v_ℓ|² dℓ ≤ ∫_{−∞}^{∞} h(ℓ) C_E(ℓ) |v_ℓ|² dℓ
        = Σ_{j=1}^{k} Σ_{j′=1}^{k} u_j ū_{j′} ∫_{−∞}^{∞} h(ℓ) C_E(ℓ) e^{i2π(f_j − f_{j′})ℓ} dℓ
        = Σ_{j=1}^{k} Σ_{j′=1}^{k} Σ_{t=−∞}^{∞} u_j ū_{j′} ∫_{−∞}^{∞} e^{i2πtℓ} C_E(ℓ) e^{i2π(f_j − f_{j′})ℓ} dℓ
        = Σ_{j=1}^{k} Σ_{j′=1}^{k} Σ_{t=−∞}^{∞} u_j ū_{j′} Ĉ_E(f_j − f_{j′} + t)
        = Ĉ_E(0) Σ_{j=1}^{k} |u_j|²

where the last equality follows because |f_j − f_{j′} + t| > Δ for any integer t (when j ≠ j′), and because Ĉ_E is supported on [−Δ, Δ]. Moreover,

    Ĉ_E(0) = ∫_{−∞}^{∞} C_E(ℓ) dℓ = |E| + 1/Δ = 2n + 1/Δ

We can instead use the minorant c_E(ℓ), and combined with the above inequality we get

    (2n − 1/Δ) Σ_{j=1}^{k} |u_j|² ≤ Σ_{ℓ=−n}^{n} |v_ℓ|² ≤ (2n + 1/Δ) Σ_{j=1}^{k} |u_j|²

Now we can make the substitution b_j = u_j e^{−i2πf_j n}, in which case ‖V_{2n+1}^k(α_1, α_2, ..., α_k) b‖² = Σ_{ℓ=−n}^{n} |v_ℓ|², which by the above lies between (2n − 1/Δ) Σ_j |b_j|² and (2n + 1/Δ) Σ_j |b_j|². Recall that the condition number is κ(V) = max_{‖a‖=1} ‖Va‖ / min_{‖a′‖=1} ‖Va′‖, and hence we obtain the following theorem by setting m = 2n + 1:

Theorem 2.3. The condition number κ of V_m^k(α_1, α_2, ..., α_k) satisfies

    κ² ≤ (m + 1/Δ − 1) / (m − 1/Δ − 1)

provided that m > 1/Δ + 1.
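The sandwich inequality above is easy to test numerically. In this sketch (the parameters are our own choices) the empirical power Σ_ℓ |v_ℓ|² indeed lands between the two bounds for random coefficients:

```python
import numpy as np

rng = np.random.default_rng(1)
delta = 0.15
f = np.array([0.05, 0.21, 0.40, 0.63, 0.86])   # minimum wrap-around gap 0.16 > delta
n = 20                                          # m = 2n + 1 = 41 > 1/delta + 1
u = rng.standard_normal(5) + 1j * rng.standard_normal(5)
ls = np.arange(-n, n + 1)
v = np.exp(2j * np.pi * np.outer(ls, f)) @ u    # v_l = sum_j u_j e^{i 2 pi f_j l}
power = np.sum(np.abs(v) ** 2)
energy = np.sum(np.abs(u) ** 2)
assert (2 * n - 1 / delta) * energy <= power <= (2 * n + 1 / delta) * energy
```

Squaring the ratio of the two bounds recovers the condition number bound of Theorem 2.3.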
2.4 The Generalized Eigenvalue Problem

There are a number of well-known results that bound the noise-tolerance of a generalized eigenvalue problem (see [40] and references therein). However, often these results assume that the matrices A and B are both full rank. We will be able to reduce to this case and appeal to their perturbation bounds as a black-box. For convenience, we will adopt the same notation as in [40], and let us first focus on settings where A and B are square matrices of order k. We are interested in the solutions to det(αA − βB) = 0. Additionally, let λ = α/β. Then the pair (A, B) is called regular if there is any choice of α and β for which det(αA − βB) ≠ 0. Of course, if either A or B is full rank then the pair is regular. The bounds in [40] are stated in terms of the chordal metric, which will be convenient for our purposes.

Definition 2.4. The chordal metric for λ, μ ∈ C is defined as

    χ(λ, μ) = |λ − μ| / ( √(1 + |λ|²) √(1 + |μ|²) )

Equivalently, if we take s(λ) and s(μ) to be the points on S² so that λ and μ are their stereographic projections, then χ(λ, μ) = ½ ‖s(λ) − s(μ)‖₂. This metric is well-suited for our applications since our eigenvalues λ will anyways be on the boundary of the complex disk.

Throughout this section let Â = A + E and let B̂ = B + F. Let λ̂ be the generalized eigenvalues for the pair (Â, B̂). We will work with the following measure:

Definition 2.5. The matching distance with respect to χ is defined as

    md_χ[(A, B), (Â, B̂)] = min_π max_i χ(λ_i, λ̂_{π(i)})

where π is a permutation on {1, 2, ..., k}.

Note that we can define the matching distance analogously using another metric. In fact, we will state the guarantees of the matrix pencil method based on the matching distance of the output f̂_j to the true f_j's according to the wrap-around metric.
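For intuition, the chordal metric is one line of code, and for two points on the unit circle it reduces to half the straight-line (chordal) distance between them. A sketch, with names of our choosing:

```python
import numpy as np

def chordal(lam, mu):
    """chi(lam, mu) = |lam - mu| / (sqrt(1 + |lam|^2) * sqrt(1 + |mu|^2))."""
    return abs(lam - mu) / np.sqrt((1 + abs(lam) ** 2) * (1 + abs(mu) ** 2))

# on the unit circle, chi(e^{ia}, e^{ib}) = |e^{ia} - e^{ib}| / 2
a, b = np.exp(0.3j), np.exp(1.1j)
assert np.isclose(chordal(a, b), abs(a - b) / 2)
```

This is why the metric is convenient here: our generalized eigenvalues lie on the unit circle, where the chordal metric is comparable to arc length.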
We will make use of the following theorem, which combines Theorem 2.4 and Corollary 2.5 from [40]:

Theorem 2.6 ([40]). Let (A, B) and (Â, B̂) be regular pairs and further suppose that for some nonsingular X we have (XAX^H, XBX^H) = (I, D) where D is diagonal. Also let e_i be the i-th row of X(A − Â)X^H, let f_i be the i-th row of X(B − B̂)X^H, and set ρ = max_i { ‖e_i‖₁ + ‖f_i‖₁ }. If the regions

    G_i = { μ : χ(D_ii, μ) ≤ ρ }

are disjoint, then the matching distance of the generalized eigenvalues of (Â, B̂) to {D_ii} is at most ρ with respect to the chordal metric.

2.5 A Modified Matrix Pencil Method

Now we are ready to prove error bounds for the matrix pencil method. Recall that we are interested in solving the generalized eigenvalue problem (V D_u V^H, V D_u D_α V^H) where V is short-hand for V_m^k, D_u = diag(u_j) and D_α = diag(α_j). Instead we are given V D_u V^H + E and V D_u D_α V^H + F. We will first reduce to a generalized eigenvalue problem on order-k matrices so that the matrix pair will be regular. Ideally, we would project onto the column span of V to preserve the fact that the α_j's are the generalized eigenvalues. However we do not have access to this space, and instead we will project onto the top k singular vectors of V D_u V^H + E. Let Û be an orthonormal basis for this space. It follows from Wedin's theorem (see e.g. [22]) that the column space of V and the span of Û have a small principal angle. In fact, an even stronger statement is true: we can align the basis of the column span of V to Û. Throughout this section, let u_min = min_j |u_j| and u_max = max_j |u_j|.

Lemma 2.7. There is an orthonormal basis U for the column span of V that satisfies ‖U − Û‖₂ ≤ 2‖E‖₂/u_min.

Proof.
By Wedin's theorem and the definition of the principal angle we have

  sin Θ(U, Û) = ‖(I − UU^H)Û‖₂ ≤ ‖E‖₂/u_min.

Then using Weyl's inequality (see e.g. [22]) we can write

  ‖U − Û‖₂ ≤ ‖UU^H Û − Û‖₂ + ‖U − UU^H Û‖₂.

The first term is just sin Θ(U, Û). Also, we can use the fact that cos Θ(U, Û) = σ_min(U^H Û). Then we can bound the second term as

  ‖U − UU^H Û‖₂ ≤ ‖I − U^H Û‖₂ ≤ 1 − cos Θ(U, Û) ≤ sin Θ(U, Û) ≤ ‖E‖₂/u_min,

and this completes the proof.

Now let A = U^H V D_u V^H U, let B = U^H V D_u D_α V^H U, and let X = D_u^{−1/2} V⁺ U. It follows that XAX^H = I and XBX^H = D_α. Also consider the perturbed generalized eigenvalue problem for Â = Û^H (V D_u V^H + E) Û and B̂ = Û^H (V D_u D_α V^H + F) Û. Let us set ‖U − Û‖₂ = τ for notational convenience. Then we can write ‖X(A − Â)X^H‖₂ ≤ ‖X‖₂² ‖A − Â‖₂, and furthermore

  ‖A − Â‖₂ ≤ ‖E‖₂ + ‖U^H V D_u V^H U − Û^H V D_u V^H Û‖₂ ≤ ‖E‖₂ + (2τ‖V‖₂² + τ²‖V‖₂²) u_max.

Combining these inequalities with τ = 2‖E‖₂/u_min we have

  ‖X(A − Â)X^H‖₂ ≤ ‖E‖₂/(σ_min² u_min) + 4κ² ( ‖E‖₂/u_min + ‖E‖₂²/u_min² ) (u_max/u_min),

where κ and σ_min are the condition number and minimum singular value of V respectively. An identical bound holds for B and B̂ with E replaced by F. We remark that a bound on the spectral norm of X(A − Â)X^H is also a bound on the ℓ₂-norm of any of its rows; to get a bound on the ℓ₁-norm of any of its rows, we can multiply the above bound by k^{1/2}. Hence we can combine the above bounds with Theorem 2.6 to conclude that the generalized eigenvalues of (Â, B̂) converge to the generalized eigenvalues of (A, B), with respect to the chordal metric, at an inverse polynomial rate in the noise. We remark that the generalized eigenvalues of (A, B) lie on the complex disk.
Hence if (α̂, β̂) is a generalized eigenvalue of (Â, B̂), we can find the closest point on the complex disk to λ̂ = α̂/β̂, which in turn has arc-length at most O(χ(λ̂, λ)) from λ = α/β, where (α, β) is the generalized eigenvalue of (A, B) matched to (α̂, β̂). Hence the output of ModifiedMPM is a set of locations {f̂_j} whose matching distance to {f_j} with respect to d_w converges to zero at an inverse polynomial rate in the magnitude of the error.

Algorithm 1 ModifiedMPM
Input: k and estimates v_ℓ for |ℓ| ≤ m
Output: {(û_j, f̂_j)}
1. Let Ã and B̃ be the matrices with Ã_{i,j} = v_{i−j} and B̃_{i,j} = v_{i−j+1}.
2. Let Û be the top k singular vectors of Ã.
3. Set Â = Û^H Ã Û and B̂ = Û^H B̃ Û.
4. Solve for the generalized eigenvalues of (Â, B̂).
5. For each generalized eigenvalue λ̂ = α̂/β̂, let f̂_j be the argument of the projection of λ̂ onto the complex disk.
6. Set V̂ to be the Vandermonde matrix with nodes {f̂_j}.
7. Solve for û such that V̂û = v.
8. Output {(û_j, f̂_j)}.

All that remains is to recover the coefficients too. To this end, we write V̂ to denote the Vandermonde matrix with the f̂_j's in place of the f_j's. Then we are given v = Vu + η = V̂u + (V − V̂)u + η, so if we solve the linear system v = V̂û we get û such that

  ‖u − û‖₂ ≤ ‖V̂⁺‖₂ (‖V − V̂‖₂‖u‖₂ + ‖η‖₂).

Let the matching distance between {f_j} and {f̂_j} be ρ. It is easy to see that each entry of V − V̂ is bounded by O(ρm), and hence

  ‖u − û‖₂ ≤ O( (ρ m^{3/2} k u_max + ‖η‖₂) / √(m − 1 − 1/(∆ − 2ρ)) ),

since the minimum separation of {f̂_j} is at least ∆ − 2ρ. We can now state our main theorem, which gives a quantitative rate for how quickly our estimates converge to the true values:

Theorem 2.8.
Let η be the vector with entries {η_ℓ}, and set

  γ = k‖η‖₂/(σ_min² u_min) + 4κ² ( k‖η‖₂/u_min + k^{3/2}‖η‖₂²/u_min² ) (u_max/u_min)

and

  ζ = O( (γ m^{3/2} k u_max + ‖η‖₂) / √(m − 1 − 1/(∆ − 2γ)) ).

Further suppose that ‖E‖₂ + ‖F‖₂ < σ_min² u_min and γ < ∆/4. Then the output of ModifiedMPM satisfies d_w(f_j, f̂_{π(j)}) ≤ 2γ and |u_j − û_{π(j)}| ≤ ζ for all j, for some permutation π.

The important point is that the estimates converge to the true values at an inverse polynomial rate in ‖η‖₂, provided that m > (1 + ε)/∆ for some ε > 0.

3 The Need for Separation

In the previous section we gave an upper bound on the condition number of V_m^k in terms of m and ∆. Here we establish that there is a sharp phase transition: if m = (1 − ε)/∆ then V_m^k has an exponentially large condition number. From this it follows that if m = (1 − ε)/∆ there is a pair of k-point sources x and x′, each with separation ∆, that we would need exponentially small noise to tell apart.

3.1 Impossibility Results

In order to make the connection to super-resolution it will be more convenient to work with the shifted Vandermonde matrix below, but all the results in this section carry over to the other case as well:

  V_{−n/2,n/2}^k(α_1, α_2, ..., α_k) =
    [ α_1^{−n/2}     α_2^{−n/2}     ...   α_k^{−n/2}   ]
    [ α_1^{−n/2+1}   α_2^{−n/2+1}   ...   α_k^{−n/2+1} ]
    [     ...            ...        ...       ...      ]
    [ α_1^{n/2}      α_2^{n/2}      ...   α_k^{n/2}    ]

Throughout this section we use slightly different notation: we are interested in delta functions spaced at intervals of length 1/m, and we use n to denote the number of measurements. Let n = (1 − ε)m.

Theorem 3.1. Let k = Ω(log m). Then the matrix V_{−n/2,n/2}^k with α_j = e^{i2πj/m} has condition number at least 2^{Ω(k)}.

Throughout this section let V be shorthand for V_{−n/2,n/2}^k.
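As a quick numerical sanity check of Theorem 3.1 (this sketch is not part of the paper's proof, and the parameter values and the helper name `shifted_vandermonde` are illustrative), we can build this shifted Vandermonde matrix for nodes spaced 1/m apart with n = (1 − ε)m measurements and watch the condition number blow up as k grows:

```python
import numpy as np

def shifted_vandermonde(m, n, k):
    """V^k_{-n/2, n/2} with nodes alpha_j = e^{i 2 pi j / m}, j = 1, ..., k."""
    exponents = np.arange(-(n // 2), n // 2 + 1)        # rows -n/2, ..., n/2
    alpha = np.exp(2j * np.pi * np.arange(1, k + 1) / m)
    return alpha[None, :] ** exponents[:, None]          # (n+1) x k matrix

m, n = 40, 20   # n = (1 - eps) m with eps = 1/2; node separation is 1/m
conds = [np.linalg.cond(shifted_vandermonde(m, n, k)) for k in range(4, 11)]
print(conds)
```

The condition numbers grow rapidly with k, consistent with the 2^{Ω(k)} lower bound; with the cutoff raised above 1/∆ the same matrices become well-conditioned.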
It is easy to upper bound the maximum singular value of V, and hence the culprit is the smallest singular value. We conclude that there is a vector u with ‖u‖₂ = 1 and ‖Vu‖₂ ≤ 2^{−Ω(k)}. Then:

  Σ_{ℓ=−n/2}^{n/2} | Σ_{j odd} u_j e^{i2πjℓ/m} + Σ_{j even} u_j e^{i2πjℓ/m} |² = ‖Vu‖₂² ≤ 2^{−Ω(k)}.

Hence we have two sets of point sources, corresponding to the odd and even indices respectively, each with separation ∆ = 2/m, but where measuring them up to the cutoff frequency n/2 = (1 − ε)m/2 = (1 − ε)/∆ would require exponentially small error to tell them apart.

Corollary 3.2. There is a pair of k-point sources x and x′, each with separation ∆, such that if m = (1 − ε)/∆ and the noise η_ℓ has its real and imaginary parts chosen as independent Gaussian random variables with variance 2^{−Ω(k)}, then for any integer ℓ with |ℓ| ≤ m we have d_TV(v_ℓ, v′_ℓ) ≤ 2^{−Ω(k)}, where v_ℓ and v′_ℓ are the random variables associated with measuring x and x′ at frequency ℓ respectively.

Recall that in the noise-free case the separation is irrelevant: it is possible to reconstruct the point sources with m = k. But in the noisy version of the problem k is irrelevant, and the minimum number of measurements essentially depends only on the minimum separation. Let us now prove the main theorem of this section. We will make use of the Fejér kernel here (and later), although many other kernels could be used in its place.

3.2 The Fejér Kernel I: Basic Properties

Here we introduce the Fejér kernel, which we will use in the next subsection to construct a unit vector u for which ‖Vu‖₂ is exponentially small.

Definition 3.3. The Fejér kernel⁴ is

  K_ℓ(x) = (1/ℓ²) Σ_{j=−ℓ}^{ℓ} (ℓ − |j|) e^{i2πjx} = (1/ℓ²) ( sin(ℓπx) / sin(πx) )².

Here we have given both the Fourier representation and the closed form for the Fejér kernel. We will use K̂_ℓ to denote its Fourier transform.
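The two expressions in Definition 3.3, and the decay bound of Fact 3.4 below, are easy to check numerically; the following sketch (illustrative values, helper names `fejer_fourier` and `fejer_closed` are not from the paper) compares them on a grid:

```python
import numpy as np

def fejer_fourier(ell, x):
    """Fourier-side expression (1/ell^2) sum_{|j|<=ell} (ell - |j|) e^{i 2 pi j x}."""
    js = np.arange(-ell, ell + 1)
    return np.sum((ell - np.abs(js)) * np.exp(2j * np.pi * js * x)).real / ell**2

def fejer_closed(ell, x):
    """Closed form (1/ell^2) (sin(ell pi x) / sin(pi x))^2."""
    return (np.sin(ell * np.pi * x) / np.sin(np.pi * x)) ** 2 / ell**2

ell = 5
xs = np.linspace(0.01, 0.49, 25)
gap = max(abs(fejer_fourier(ell, x) - fejer_closed(ell, x)) for x in xs)
# Decay bound K_ell(x) <= 1/(4 ell^2 x^2) on [-1/2, 1/2] (Fact 3.4).
decay_ok = all(fejer_closed(ell, x) <= 1 / (4 * ell**2 * x**2) + 1e-12 for x in xs)
print(gap, decay_ok)
```

The two expressions agree to machine precision, and the quadratic decay away from zero is exactly what makes the kernel useful both here and as a preconditioner in Section 4.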
The key point is that K̂_ℓ is supported on {−ℓ, −ℓ+1, ..., ℓ−1, ℓ} and yet K_ℓ(x) decays quickly. We will also make use of powers of the Fejér kernel, which we denote by K_ℓ^r(x); its Fourier transform is supported on {−rℓ, −rℓ+1, ..., rℓ−1, rℓ}. We make use of the following well-known property of the Fejér kernel:

Fact 3.4. K_ℓ(x) ≤ 1/(4ℓ²x²) for x ∈ [−1/2, 1/2], and K_ℓ^r(x) ≤ 1/(4^r ℓ^{2r} x^{2r}).

Proof. It is well known that sin(πx) ≥ 2x for x ∈ [0, 1/2]. Substituting this bound into the denominator and using (sin(ℓπx))² ≤ 1 implies the bound on K_ℓ(x), and similarly for K_ℓ^r(x).

3.3 Lower Bounding the Condition Number

We are now ready to prove Theorem 3.1.

Proof. Consider H(x) = K_ℓ^r(x/m − 1/2) e^{iπx}, where we set ℓ = 4/ε and r = (k − 1)/(2ℓ). We first establish some elementary properties of H(x) and its Fourier transform Ĥ:

Claim 3.5. Ĥ(t) = Σ_{j=k/2}^{3k/2−1} h_j e^{i2πjt/m} and Σ_j |h_j| = 1.

Proof. Let K̂_ℓ denote the Fourier transform of K_ℓ(x/m). Then we can write Ĥ as the convolution of K̂_ℓ with itself r times, then convolved with δ(1/2). This directly implies the first part of the claim. Furthermore, the coefficients of K̂_ℓ are nonnegative and sum to one, so we can interpret both K̂_ℓ and Ĥ as distributions on a discrete set of points, and this yields the second part of the claim.

We will let u denote the vector whose entries are the h_j's. Then ‖u‖₂ ≥ ‖u‖₁/√k = 1/√k because u has k non-zeros. Now we give an upper bound on ‖Vu‖_∞.

Claim 3.6. ‖Vu‖_∞ ≤ max_{|x|≤n/2} |H(x)|.

Proof. The claim is immediate since each coordinate of Vu equals H(j) for some j ∈ {−n/2, −n/2+1, ..., n/2}.

Now all that remains is to bound the right-hand side above. From Fact 3.4, with ℓ = 4/ε we get K_ℓ^r(x) ≤ 1/(8x/ε)^{2r}.
Since |H(x)| = K_ℓ^r(x/m − 1/2), we therefore have

  max_{|x|≤n/2} |H(x)| ≤ 1/4^{2r} ≤ 2^{−Ω(k)},

where we have used the fact that K_ℓ^r has period one and that r = Ω(k). This completes the proof of the theorem.

4 Universal Preconditioning

In the previous sections we analyzed the condition number of V_m^k in terms of m and the minimum separation ∆ according to the wrap-around distance, and this gave us algorithms for super-resolution that are optimal in the relationship between the number of measurements and the minimum separation. Here we pursue a different direction: is there a way to precondition V_m^k to make it almost orthogonal? Just as the Beurling-Selberg majorant played the key role in the previous section, here the Fejér kernel will itself be a good preconditioner for a family of Vandermonde matrices.

4.1 The Fejér Kernel II: Local Approximation

In Section 3.2 we introduced the Fejér kernel and stated some of its basic properties. Here we will also need upper and lower bounds for K_ℓ(x) in a neighborhood of zero, which we establish now for later reference. Recall that

  K_ℓ(x) = (1/ℓ²) ( sin(ℓπx) / sin(πx) )²  and  K_ℓ^r(x) = ( K_ℓ(x) )^r.

Lemma 4.1. For ℓ ≥ 4, |x| ≤ 1/ℓ and any integer r ≥ 1 we have

  1 − Crℓ²x² ≤ K_ℓ^r(x) ≤ 1 − cℓ²x²,

where we can take C = 12 and c = 1/3.

Proof. In order to analyze the behavior of K_ℓ^r(x) in a neighborhood of zero, we need upper and lower bounds for sin x that behave like x − Θ(x³). We will make crucial use of the following facts, which are easy to check:

⁴We use a slightly non-standard definition for the Fejér kernel: ours is normalized so that K_ℓ(0) = 1 and is periodic on [0, 1) instead of [0, 2π), since this suits our applications more conveniently.

Fact 4.2. x(1 − x²/2) ≤ sin x for all x ≥ 0.

Fact 4.3.
sin x ≤ x(1 − x²/10) for all x ∈ [0, π).

Substituting these inequalities, we obtain

  K_ℓ^r(x) ≤ (1/ℓ^{2r}) ( ℓπx(1 − (ℓπx)²/10) / (πx(1 − (πx)²/2)) )^{2r} = ( (1 − (ℓπx)²/10) / (1 − (πx)²/2) )^{2r} ≤ (1 − (ℓπx)²/10) / (1 − (πx)²/2),

where we have used the facts above, and we require |x| ≤ 1/ℓ in order to invoke Fact 4.3. Hence

  K_ℓ^r(x) ≤ 1 − (ℓπx)²/10 + (πx)² ≤ 1 − ℓ²x²/3,

where we have used that (1 − (πx)²/2)^{−1} = 1 + (πx)²/2 + ... ≤ 1 + (πx)², which holds for x ≤ 1/π. Finally, we can prove our lower bound:

  K_ℓ^r(x) ≥ (1/ℓ^{2r}) ( ℓπx(1 − (ℓπx)²/2) / (πx(1 − (πx)²/10)) )^{2r} = ( (1 − (ℓπx)²/2) / (1 − (πx)²/10) )^{2r} ≥ ( 1 − (ℓπx)²/2 )^{2r} ≥ 1 − r(ℓπx)²,

where again we have used the facts above, and we require |x| ≤ 1/π in order to invoke Fact 4.3.

4.2 From Uncertainty Principles to Preconditioners

Let us contrast the approach here with the way in which we used majorants in the previous section. Recall that our goal was to upper and lower bound the sum Σ_{t=−m}^{m} |f(t)|². Once we plugged in majorants and minorants, both the upper and lower bounds became simple expressions, but the bounds themselves are not close to each other unless m is much larger than 1/∆, and this is roughly tight. Instead, we could ask: are there coefficients c_t so that the sum Σ_{t=−m}^{m} c_t |f(t)|² itself has a simple expression, for which we can prove tighter upper and lower bounds (see e.g. Claim 4.5)? The point is that many inequalities in harmonic analysis are established this way. One classic example is the Salem inequality (see e.g. [45]). This can be thought of as a strengthening of some of the more well-known uncertainty principles [13]; it applies to the setting where a signal is sparse and its spikes are well-separated.
Again let f(t) = Σ_{j=1}^{k} u_j e^{i2πf_j t} and suppose that |f_j − f_{j′}| ≥ ∆ for all j ≠ j′. Then:

Lemma 4.4 ([45]). Let E be any interval of length (1 + ε)/∆. Then

  Σ_j |u_j|² ≤ (C_ε/|E|) ∫_E |f(t)|² dt,

where C_ε = 2π(1+ε)²/(4ε(2+ε)) = O(1 + 1/ε).

In fact, the proof of this inequality follows by choosing an appropriate test function χ(t) that is supported on E, analyzing

  ∫_{−∞}^{∞} χ(t) |f(t)|² dt

directly, and showing that it equals (up to scaling) Σ_j |u_j|² plus cross terms that decay as the separation increases. So when we use a single test function (as opposed to a majorant and a minorant), it can really be thought of as a preconditioner that makes the right-hand side close to the sum of squares of the coefficients! We will pursue this direction, but will make use of different test functions than in [45], supported on a discrete set of points: namely, the Fourier transform of the Fejér kernel and its powers. This has the effect of transforming the right-hand side into a sum instead of an integral.

Claim 4.5. Let F(x) be a kernel with F̂(t) = Σ_{j=−m}^{m} c_j δ(t − j). Then

  Σ_{j=−m}^{m} c_j |f(j)|² = Σ_{j=1}^{k} Σ_{j′=1}^{k} u_j ū_{j′} F(f_j − f_{j′}).

Proof.

  Σ_{j=−m}^{m} c_j |f(j)|² = Σ_{j=1}^{k} Σ_{j′=1}^{k} u_j ū_{j′} ∫_{−∞}^{∞} F̂(t) e^{i2π(f_j − f_{j′})t} dt = Σ_{j=1}^{k} Σ_{j′=1}^{k} u_j ū_{j′} F(f_j − f_{j′}).

Now we can set F to be the Fejér kernel or one of its powers. Using the above claim, we will show that this choice of c_j's is a good preconditioner. Suppose that d_w(f_j, f_{j′}) ≥ ∆.

Lemma 4.6. If the c_j's are the coefficients of K̂_ℓ, then

  | Σ_{j=−ℓ}^{ℓ} c_j |f(j)|² − Σ_{j=1}^{k} |u_j|² | ≤ ( Σ_{j=1}^{k} |u_j|² ) π²/(24ℓ²∆²).

Proof.
We can use Claim 4.5 to obtain

  | Σ_{j=−ℓ}^{ℓ} c_j |f(j)|² − Σ_{j=1}^{k} |u_j|² | ≤ Σ_{j≠j′} |u_j||u_{j′}| |K_ℓ(f_j − f_{j′})|,

where we have used the fact that K_ℓ(0) = 1 (recall that we chose a non-standard normalization for the Fejér kernel). Moreover, rearranging we get

  Σ_{j≠j′} |u_j||u_{j′}| |K_ℓ(f_j − f_{j′})| ≤ Σ_{j≠j′} ½(|u_j|² + |u_{j′}|²) |K_ℓ(f_j − f_{j′})| ≤ ( Σ_{j=1}^{k} |u_j|² ) Σ_{j′=1}^{∞} 1/(4ℓ²(j′)²∆²) = ( Σ_{j=1}^{k} |u_j|² ) π²/(24ℓ²∆²),

where we have used Fact 3.4 in the last inequality. This implies the lemma.

Corollary 4.7. If the c′_j's are the coefficients of K̂_ℓ^r, then

  | Σ_{j=−rℓ}^{rℓ} c′_j |f(j)|² − Σ_{j=1}^{k} |u_j|² | ≤ ( Σ_{j=1}^{k} |u_j|² ) π²/(4^r ℓ^{2r} ∆^{2r}).

Proof. We can follow the same proof as above, but for powers of the Fejér kernel. The only differences are that we use the fact that K_ℓ^r(∆) ≤ 1/(4^r ℓ^{2r} ∆^{2r}) and that Σ_{j=1}^{∞} 1/j^{2r} ≤ Σ_{j=1}^{∞} 1/j² = π²/6. This implies the corollary.

4.3 Approximate Strong Convexity

In the previous subsection we proved upper and lower bounds for Σ_{j=−rℓ}^{rℓ} c′_j |f(j)|², and eventually we want to use this potential function to guide our algorithm. Let us focus on the case where all the coefficients u_j = 1 for simplicity. What we want to prove are soundness and completeness results:

  Σ_{j=−rℓ}^{rℓ} c′_j |f(j) − e^{i2πjx}|² ≈ k − 1

if and only if x is sufficiently close to some f_j. The challenge is that we cannot directly apply Lemma 4.6 or Corollary 4.7, because the set {x} ∪ {f_j}_j will not necessarily be well-separated (and we hope it won't be). The starting point is combining Claim 4.5 and Corollary 4.7 to obtain

  Σ_{j=−rℓ}^{rℓ} c′_j |f(j) − e^{i2πjx}|² = T − 2 Σ_{j=1}^{k} K_ℓ^r(x − f_j),

where T = Σ_{j,j′} K_ℓ^r(f_j − f_{j′}) + K_ℓ^r(0) ≈ k + 1 is a constant that does not depend on x, and we have used the fact that K_ℓ^r is symmetric about the origin.
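The preconditioning effect of Lemma 4.6 is easy to check numerically. The following sketch (with arbitrary illustrative parameter values, not taken from the paper) compares the weighted power sum, with the Fejér coefficients c_j = (ℓ − |j|)/ℓ² as weights, against Σ_j |u_j|²:

```python
import numpy as np

freqs = np.array([0.0, 0.31, 0.64])      # wrap-around separation Delta = 0.31
u = np.array([1.0, -0.5, 0.7 + 0.2j])    # coefficients
ell, delta = 10, 0.31
js = np.arange(-ell, ell + 1)
c = (ell - np.abs(js)) / ell**2          # Fourier coefficients of K_ell

# f(t) = sum_j u_j e^{i 2 pi f_j t}, evaluated at the integers -ell, ..., ell.
f_vals = np.array([np.sum(u * np.exp(2j * np.pi * freqs * t)) for t in js])
S = np.sum(c * np.abs(f_vals) ** 2)      # preconditioned power sum
target = np.sum(np.abs(u) ** 2)          # sum of squared coefficients
bound = target * np.pi**2 / (24 * ell**2 * delta**2)
print(abs(S - target), bound)
```

The gap between the weighted sum and Σ_j |u_j|² stays within the Lemma 4.6 bound, which is what "preconditioning" buys us here: a simple expression that is already nearly the quantity we want.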
Let γ = d_w(x, {f_j}).

Lemma 4.8. If γ < min(∆/2, 1/ℓ) and ℓ ≥ 4, then

  1 − 12rℓ²γ² − π²/(4^r ℓ^{2r} ∆^{2r}) ≤ Σ_{j=1}^{k} K_ℓ^r(x − f_j) ≤ 1 − ℓ²γ²/3 + π²/(4^r ℓ^{2r} ∆^{2r}).

And if instead γ ≥ 1/ℓ, then

  Σ_{j=1}^{k} K_ℓ^r(x − f_j) ≤ 1/4^r + π²/(4^r ℓ^{2r} ∆^{2r}).

Proof. Let f_1, f_2, ..., f_k be sorted so that d_w(x, f_j) is monotonically increasing. We can write

  Σ_{j=1}^{k} K_ℓ^r(x − f_j) = K_ℓ^r(γ) + Σ_{j=2}^{k} K_ℓ^r(x − f_j) ≤ 1 − ℓ²γ²/3 + Σ_{j=2}^{k} K_ℓ^r(x − f_j),

where we have used Lemma 4.1 to upper bound K_ℓ^r(γ). Moreover, we have d_w(x, f_j) ≥ ∆(j − 1) − γ ≥ ∆(j − 1)/2 for all j ≥ 2, and so we can use Fact 3.4 as we did in the proof of the modified Salem inequality (Lemma 4.6); this implies the upper bound. An identical argument implies the lower bound too. The second part of the lemma follows by instead invoking Fact 3.4 to bound the closest term as well.

Notice that if we set r = Θ(log n) and ℓ > 1/∆, then the terms that depend on ∆ can be bounded by any inverse polynomial, and we have proven that our potential function is additively close to being strongly convex. The intuition is that if we have two candidate solutions x and x′ at distances γ and γ′ respectively from f_1, and γ is smaller than γ′ by a suitable constant factor, then our potential function must be smaller at x than at x′; so we can iteratively discard points that are too far from f_1 and make geometric progress in each step.

4.4 The Algorithm

Here we consider the following abstraction. Suppose there are unknown f_j's that are ∆-separated according to the wrap-around metric, and set ℓ = 1/∆. Furthermore, we are given an oracle F so that if d_w(z, {f_j}) = γ, then

  A + cℓ²γ² − ε²/4 ≤ F(z) ≤ A + Crℓ²γ² + ε²/4   (∗)

if γ ≤ 1/ℓ, and otherwise F(z) ≥ A + 1/4. Our goal is to find approximations f̂_j to the f_j's using this oracle, with matching distance at most ε.
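As an illustration of such an oracle (a sketch with arbitrary parameter choices, not the paper's implementation), we can instantiate F(z) with the Fejér-power weights of Section 4.3 and check that its minimizer over a coarse grid lands next to one of the hidden f_j's:

```python
import numpy as np

# Hidden, Delta-separated frequencies and the weights c'_j of K_ell^r,
# obtained by convolving the Fejer coefficients with themselves r - 1 times.
freqs = np.array([0.15, 0.5, 0.85])
ell, r = 6, 3
c = (ell - np.abs(np.arange(-ell, ell + 1))) / ell**2   # coefficients of K_ell
cp = c.copy()
for _ in range(r - 1):
    cp = np.convolve(cp, c)                              # coefficients of K_ell^r
js = np.arange(-r * ell, r * ell + 1)
f_vals = np.array([np.sum(np.exp(2j * np.pi * freqs * t)) for t in js])  # u_j = 1

def F(z):
    """Potential F(z) = sum_j c'_j |f(j) - e^{i 2 pi j z}|^2."""
    return np.sum(cp * np.abs(f_vals - np.exp(2j * np.pi * js * z)) ** 2)

grid = np.linspace(0, 1, 400, endpoint=False)
z_star = min(grid, key=F)
gamma = min(min(abs(z_star - f), 1 - abs(z_star - f)) for f in freqs)  # d_w
print(z_star, gamma)
```

The minimizer lands within one grid step of a hidden frequency, which is exactly the behavior the abstraction (∗) packages up; the algorithm below then refines such coarse locations on successively finer grids.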
This will immediately yield a fast, greedy algorithm for the super-resolution problem. Let C, c be the constants from (∗); then:

Theorem 4.9. The output of IterativeRefinement is a set {f̂_i} that has matching distance at most ε to {f_i} with respect to d_w. If R is the running time of a call to the oracle F(z), then IterativeRefinement runs in time O(Rℓr + Rkr^{1/2} log(1/ε)).

Proof. We will prove that IterativeRefinement succeeds by establishing a set of invariants for the algorithm. We will refer to Steps 2-6 as the initialization phase.

Claim 4.10. After initialization, the matching distance between {f̂_i} and {f_i} according to d_w is at most γ₁ ≤ 1/(4ℓ).

Proof. Consider the first iteration, where i = 1. Since the grid length is δ₁, there must be an x in U with F(x) ≤ A + Crℓ²δ₁² + ε²/4. Hence the minimizer z must satisfy

  A + cℓ²γ² − ε²/4 ≤ F(z) ≤ A + Crℓ²δ₁² + ε²/4,

where d_w({f_i}, z) = γ. This implies that γ ≤ γ₁ = (Cr/c)^{1/2}δ₁ + ε/2 ≤ 1/(4ℓ). Hence, if f_1 is the closest point to z according to d_w, we necessarily remove all points x′ with d_w(x′, f_1) ≤ 1/(4ℓ). At the start of the next iteration, there still remains a point of U that is within δ₁ of some f_j (even after we delete points close to f̂_1), and so the next point z will be close to a new f_j. This establishes the claim by induction.

Algorithm 2 IterativeRefinement
Input: oracle F, k and constants C, c, ε
Output: {f̂_j} with matching distance at most ε to {f_j}
1. Set δ_j = (1/(4ℓ))(c/(Cr))^j and γ_j = (Cr/c)^{1/2}δ_j + ε/2.
2. Set U to be the grid on [0, 1) with length δ₁.
3. For i = 1 to k:
4.   Find z in U that minimizes F(z); set f̂_i to z.
5.   Remove all x′ with d_w(f̂_i, x′) ≤ 1/(2ℓ).
6. Set j = 2.
7. While γ_j > ε (incrementing j after each pass):
8.   For i = 1 to k:
9.     Let U_i be the grid of points x′ with d_w(f̂_i, x′) ≤ γ_{j−1}, with length δ_j.
10.    Find z in U_i that minimizes F(z); set f̂_i to z.
11.
Output {f̂_i}.

Claim 4.11. After iteration j, the matching distance between {f̂_i} and {f_i} is at most γ_j.

Proof. By induction, at the start of iteration j the matching distance between {f̂_i} and {f_i} is at most γ_{j−1}. Thus in iteration j, for each i there is a point in U_i that is within distance δ_j of f_i. Moreover, the grids U_i are disjoint. Again using the properties of the oracle, this implies that for each i the point z we find satisfies d_w(f_i, z) = γ with

  γ ≤ (Cr/c)^{1/2}δ_j + ε/2 = γ_j,

and this completes the proof of the claim.

These invariants immediately yield our main theorem.

We are now ready to prove Theorem 1.5.

Proof. We set ℓ = 1/∆ and r = Θ(log 1/ε). Recall that m = rℓ is the cutoff frequency. Furthermore, let

  F(z) = Σ_{j=−rℓ}^{rℓ} c′_j |f(j) − e^{i2πjz}|²,

in which case A = T − 2, and using Lemma 4.8 we conclude that F(z) has the desired form in (∗). Moreover, we can evaluate F(z) in time O(m) by computing it directly. Finally, it is easy to see that if there is noise in our measurements we can account for it by expanding F(z) as follows:

  | Σ_{j=−rℓ}^{rℓ} c′_j |f(j) − e^{i2πjz} + η_j|² − F(z) | ≤ Σ_{j=−rℓ}^{rℓ} c′_j ( 2(k + 1)|η_j| + |η_j|² ).

Note that Σ_j c′_j = 1 because (c′_j) is the convolution of discrete probability distributions. Hence if the noise satisfies |η_j| ≤ ε²/(4k) for each j, we can absorb the error terms due to the noise into the ε-dependent terms in (∗), and the theorem now follows by invoking Theorem 4.9.

Acknowledgements

Thanks to David Jerison and Henry Cohn for many helpful discussions about extremal functions.

References

[1] M. Belkin and K. Sinha. Polynomial learning of distribution families. In FOCS, pages 103-112, 2010.

[2] A. Beurling. Interpolation for an interval on ℝ¹.
(Mittag-Leffler Lectures on Harmonic Analysis.) In Collected Works of Arne Beurling, Volume 2: Harmonic Analysis, pages 351-359, 1989.

[3] B. Bhaskar, G. Tang and B. Recht. Compressed sensing off the grid, 2012.

[4] A. Bhaskara, M. Charikar, A. Moitra and A. Vijayaraghavan. Smoothed analysis of tensor decompositions. In STOC, 2014.

[5] E. Candes and C. Fernandez-Granda. Towards a mathematical theory of super-resolution. Comm. of Pure and Applied Math., to appear.

[6] E. Candes and C. Fernandez-Granda. Super-resolution from noisy data. J. of Fourier Analysis and Applications, pages 1229-1254, 2013.

[7] E. Candes, J. Romberg and T. Tao. Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Transactions on Information Theory, pages 489-509, 2006.

[8] E. Candes and T. Tao. Decoding by linear programming. IEEE Trans. on Information Theory, 51(12):4203-4215, 2005.

[9] V. Chandrasekaran, B. Recht, P. Parrilo and A. Willsky. The convex geometry of linear inverse problems. Foundations of Computational Mathematics, pages 805-849, 2012.

[10] E. Cheney. Introduction to Approximation Theory. American Mathematical Society, 2000.

[11] D. Donoho. Superresolution via sparsity constraints. SIAM Journal on Mathematical Analysis, pages 1309-1331, 1992.

[12] D. Donoho and B. Logan. Signal recovery and the large sieve inequality. SIAM Journal of Applied Math, pages 577-591, 1992.

[13] D. Donoho and P. Stark. Uncertainty principles and signal recovery. SIAM J. on Appl. Math, pages 906-931, 1999.

[14] B. G. R. de Prony. Essai expérimental et analytique: sur les lois de la dilatabilité de fluides élastiques et sur celles de la force expansive de la vapeur de l'alcool, à différentes températures. Journal de l'École Polytechnique, pages 24-76, 1795.

[15] K. Do Ba, P. Indyk, E. Price and D. Woodruff. Lower bounds for sparse recovery.
In SODA, pages 1190-1197, 2010.

[16] M. Elad. Sparse and Redundant Representations. Springer, 2010.

[17] C. Fernandez-Granda. Support detection in super-resolution. International Conference on Sampling Theory and Appl., pages 145-148, 2013.

[18] A. Gilbert, S. Guha, P. Indyk, S. Muthukrishnan and M. Strauss. Near-optimal sparse Fourier representations via sampling. In STOC, pages 152-161, 2002.

[19] A. Gilbert, S. Muthukrishnan and M. Strauss. Improved time bounds for near-optimal sparse Fourier representations. SPIE 5914, Wavelets XI, 2005.

[20] N. Goyal, S. Vempala and Y. Xiao. Fourier PCA. In STOC, pages 584-593, 2014.

[21] H. Hassanieh, P. Indyk, D. Katabi and E. Price. Nearly optimal sparse Fourier transform. In STOC, pages 563-578, 2012.

[22] R. Horn and C. Johnson. Matrix Analysis. Cambridge University Press, 1990.

[23] Y. Hua and T. Sarkar. Matrix pencil method for estimating parameters of exponentially damped/undamped sinusoids in noise. IEEE Trans. on Acoustics, Speech and Signal Processing, pages 814-824, 1990.

[24] A. T. Kalai, A. Moitra and G. Valiant. Efficiently learning mixtures of two Gaussians. In STOC, pages 553-562, 2010.

[25] W. Liao and A. Fannjiang. MUSIC for single-snapshot spectral estimation: stability and super-resolution, 2014.

[26] A. Moitra and M. Saks. A polynomial time algorithm for lossy population recovery. In FOCS, pages 110-116, 2013.

[27] H. Montgomery. Ten Lectures on the Interface Between Analytic Number Theory and Harmonic Analysis. American Mathematical Society, 1994.

[28] A. Moitra and G. Valiant. Settling the polynomial learnability of mixtures of Gaussians. In FOCS, pages 93-102, 2010.

[29] H. Montgomery and R. Vaughan. Hilbert's inequality. Journal of the London Math. Society, pages 43-81, 1974.

[30] S. Park, M. Park and M. Kang. Super-resolution image reconstruction: a technical overview.
Signal Processing Magazine, pages 21-36, 2003.

[31] V. Pisarenko. The retrieval of harmonics from a covariance function. Geophysical Journal of the Royal Astronomical Society, pages 347-366, 1973.

[32] The Royal Swedish Academy of Sciences. The Nobel Prize in Chemistry 2014. http://www.nobelprize.org/nobel_prizes/chemistry/laureates/2014/press.html

[33] A. Selberg. Collected Papers. Springer-Verlag, 1991.

[34] D. Slepian and H. Pollak. Prolate spheroidal wave functions, Fourier analysis and uncertainty - I. Bell System Technical Journal, pages 43-64, 1961.

[35] D. Slepian and H. Pollak. Prolate spheroidal wave functions, Fourier analysis and uncertainty - II. Bell System Technical Journal, pages 65-84, 1961.

[36] D. Slepian and H. Pollak. Prolate spheroidal wave functions, Fourier analysis and uncertainty - III. Bell System Technical Journal, pages 1295-1336, 1962.

[37] D. Slepian. Prolate spheroidal wave functions, Fourier analysis and uncertainty - IV. Bell System Technical Journal, pages 3009-3058, 1964.

[38] D. Slepian. Prolate spheroidal wave functions, Fourier analysis and uncertainty - V. Bell System Technical Journal, pages 1371-1430, 1978.

[39] G. Stewart. Gershgorin theory for the generalized eigenvalue problem Ax = λBx. Mathematics of Computation, pages 600-606, 1975.

[40] G. Stewart and J. Sun. Matrix Perturbation Theory. Academic Press Inc., 1990.

[41] P. Stoica. List of references on spectral line analysis. Signal Processing, pages 329-340, 1993.

[42] G. Tang, B. Bhaskar and B. Recht. Sparse recovery over continuous dictionaries: just discretize. Asilomar Conference on Signals, Systems and Computers, 2013.

[43] A. Timan. Theory of Approximation of Functions of a Real Variable. Pergamon Press-Macmillan, 1963.

[44] J. Vaaler. Some extremal functions in Fourier analysis. Bulletin of the American Mathematical Soc., pages 183-216, 1985.

[45] A. Zygmund. Trigonometric Series. Cambridge University Press, 1968.
