Off-grid Direction of Arrival Estimation Using Sparse Bayesian Inference


Authors: Zai Yang, Lihua Xie, Cishen Zhang

Abstract

Direction of arrival (DOA) estimation is a classical problem in signal processing with many practical applications. Its research has recently been advanced owing to the development of methods based on sparse signal reconstruction. While these methods have shown advantages over conventional ones, there are still difficulties in practical situations where true DOAs are not on the discretized sampling grid. To deal with such an off-grid DOA estimation problem, this paper studies an off-grid model that takes into account effects of the off-grid DOAs and has a smaller modeling error. An iterative algorithm is developed based on the off-grid model from a Bayesian perspective, while joint sparsity among different snapshots is exploited by assuming a Laplace prior for the signals at all snapshots. The new approach applies to both single-snapshot and multi-snapshot cases. Numerical simulations show that the proposed algorithm has improved accuracy in terms of mean squared estimation error. The algorithm can maintain high estimation accuracy even under a very coarse sampling grid.

1 Introduction

Source localization using sensor arrays [1] has been an active research area for decades. This paper focuses on the narrowband far-field source case, where the wavefront is assumed to be planar and the angle/direction information is to be estimated, known as the direction of arrival (DOA) estimation problem. MUSIC [2] is the most successful method among conventional DOA estimation techniques. It has been proven to be a realization of the maximum likelihood method in the case of a large number of snapshots and uncorrelated source signals [3].
The research on DOA estimation has been advanced in recent years owing to the development of methods based on sparse signal reconstruction (SSR) or compressed sensing (CS) [4]. In these methods, e.g., $\ell_1$-SVD [5], a fixed sampling grid is selected first that serves as the set of all candidate DOA estimates. Then, by assuming that all true (unknown) DOAs lie exactly on the selected grid, an SSR problem can be formulated in which the DOAs of interest constitute the support of the sparse signal to be recovered. In the case of a single measurement vector (SMV, or a single snapshot), $\ell_1$ optimization [6] is a favorable approach to sparse signal recovery due to its guaranteed recovery accuracy. It has been proven in [6] that a sparse signal can be accurately recovered under a so-called restricted isometry property (RIP) condition that requires all columns of the measurement matrix to be highly incoherent. In the case of multiple measurement vectors (MMV), the sparse signals at all snapshots share the same support. It has been shown in [7] that such joint sparsity can be exploited to improve the averaged recovery success probability under a similar incoherent-matrix condition. Sparse Bayesian inference/learning (SBI) [8–11] is another popular method for sparse signal recovery in CS. In SBI, the signal recovery problem is formulated from a Bayesian perspective, while the sparsity information is exploited by assuming a sparse prior for the signal of interest. As an example, a Laplace signal prior leads to a maximum

∗ Z. Yang and L. Xie are with EXQUISITUS, Centre for E-City, School of Electrical and Electronic Engineering, Nanyang Technological University, 639798, Singapore (e-mail: yang0248@e.ntu.edu.sg; elhxie@ntu.edu.sg).
† C. Zhang is with the Faculty of Engineering and Industrial Sciences, Swinburne University of Technology, Hawthorn VIC 3122, Australia (e-mail: cishenzhang@swin.edu.au).
a posteriori (MAP) optimal estimate that coincides with an optimal solution to the $\ell_1$ optimization [10]. In the MMV case, the joint sparsity among different (uncorrelated) snapshots is utilized by assuming the same sparse prior for the signals at all snapshots [12]. Correlations between snapshots have also been studied in a recent paper [13]. One merit of SBI is its flexibility in modeling sparse signals: it can not only promote the sparsity of its solution, e.g., in [11], but also exploit possible structure of the signal to be recovered, e.g., in [14]. Since Bayesian inference is a probabilistic method and based on heuristics to some extent, one shortcoming of SBI is that it offers fewer guarantees on the signal recovery accuracy as compared with, for example, $\ell_1$ optimization. Recent advancements in array signal processing include compressive (CS-) MUSIC [15] and subspace-augmented (SA-) MUSIC [16]. They are combinations of the conventional MUSIC technique and recent CS methods with guaranteed support recovery performance, and they can outperform MUSIC and standard CS approaches. Though existing CS-based approaches have shown improvements in DOA estimation, e.g., their success in the case of limited snapshots, there are still difficulties in practical situations where the true DOAs are not on the sampling grid. On one hand, a dense sampling grid is necessary for accurate DOA estimation to reduce the gap between the true DOA and its nearest grid point, since the estimated DOAs are constrained to the grid. On the other hand, a dense sampling grid leads to a highly coherent matrix that violates the condition for sparse signal recovery. We refer to the model adopted in the standard CS methods as an on-grid model hereafter, in the sense that the estimated DOAs are constrained to the fixed grid. An off-grid model for DOA estimation is studied in [17], where the estimated DOAs are no longer constrained to the sampling grid set.
The model takes into account the basis mismatch in the measurement matrix caused by the off-grid DOAs. It has been shown in [17] that the sparse total least squares (STLS) solver proposed therein can yield an MAP optimal estimate if the matrix perturbation caused by the basis mismatch is Gaussian. However, we show in this paper that the Gaussian condition cannot be satisfied in the off-grid DOA estimation problem, and hence a new solver is needed. In this paper, we propose a Bayesian algorithm for DOA estimation based on the off-grid model that applies to both SMV and MMV cases. The off-grid distance (the distance from the true DOA to the nearest grid point), which lies in a bounded interval, is assumed to be uniformly distributed (noninformative) rather than Gaussian as in [17]. We refer to our algorithm as off-grid sparse Bayesian inference (OGSBI) in the body of the paper. We then incorporate into our algorithm an idea in [5], using the singular value decomposition (SVD) to reduce the computational workload of the signal recovery process and the sensitivity to noise; the resulting algorithm is named OGSBI-SVD. Our approach is a spectral-like method with the value of the spectrum at each DOA being the estimated source power from that direction. We show by numerical simulations that the proposed method has a smaller mean squared error (MSE) in comparison with $\ell_1$-SVD [5]. Indeed, the proposed method can exceed a lower bound on the MSE that is shared among all on-grid model based methods, including CS-MUSIC and SA-MUSIC. It is also shown that the proposed method is more accurate and faster than STLS. Moreover, the proposed method can estimate DOAs accurately even under a coarse sampling grid. Notations used in this paper are as follows. Boldface letters are reserved for vectors and matrices. $\bar{x}$, $x^T$ and $x^H$ denote the complex conjugate, transpose and conjugate transpose of a vector $x$, respectively.
$\|\cdot\|_1$, $\|\cdot\|_2$ and $\|\cdot\|_F$ denote the $\ell_1$ norm, $\ell_2$ norm and Frobenius norm, respectively. $|A|$ and $\mathrm{Tr}\{A\}$ are the determinant and trace of a matrix $A$, respectively. $x_j$ is the $j$th entry of a vector $x$. $A_{\cdot j}$, $A_{j\cdot}$ and $A_{ij}$ are the $j$th column, $j$th row and $(i,j)$th entry of a matrix $A$, respectively. $\mathrm{diag}(A)$ denotes a column vector composed of the diagonal elements of a matrix $A$, and $\mathrm{diag}(x)$ is a diagonal matrix with $x$ on its diagonal. $x'(\theta)$ is the derivative of $x(\theta)$ with respect to $\theta$. $\Re$ and $\Im$ take the real and imaginary parts of a complex variable, respectively. $x \odot y$ is the Hadamard (element-wise) product of $x$ and $y$. $\hat{x}$ denotes an estimate of $x$.

The rest of the paper is organized as follows. Section 2 studies the off-grid model used in this paper. Section 3 introduces the proposed OGSBI and OGSBI-SVD algorithms. Section 4 presents our simulation results. Section 5 concludes this paper.

2 Off-grid DOA Estimation Model

Consider $K$ narrowband far-field sources $s_k(t)$, $k = 1, \dots, K$, impinging on an array of $M$ omnidirectional sensors from directions $\theta_k$, $k = 1, \dots, K$. Time delays at different sensors can be represented by simple phase shifts, leading to the observation model [1]:

$$ y(t) = A(\theta)\, s(t) + e(t), \quad t = 1, \dots, T, \tag{1} $$

where $y(t) = [y_1(t), \dots, y_M(t)]^T$, $\theta = [\theta_1, \dots, \theta_K]^T$, $s(t) = [s_1(t), \dots, s_K(t)]^T$, $e(t) = [e_1(t), \dots, e_M(t)]^T$, and $y_m(t)$ and $e_m(t)$, $m = 1, \dots, M$, are the output and measurement noise of the $m$th sensor at time $t$, respectively. The matrix $A(\theta) = [a(\theta_1), \dots, a(\theta_K)]$ is the array manifold matrix, and $a(\theta_k)$ is called the steering vector of the $k$th source. The entry $a_m(\theta_k)$ contains the delay information of the $k$th source to the $m$th sensor. In this paper, we assume that the number of sources $K$ is already known.
Readers are referred to a preprint version [18] for discussions on the case of unknown $K$. So, the goal is to find the unknown DOAs $\theta$ given $K$, $y(t)$ and the mapping $\theta \to A(\theta)$. In the following we re-derive the off-grid model proposed in [17] using linear approximation and further show its relationship with the on-grid one. Let $\tilde{\theta} = \{\tilde{\theta}_1, \dots, \tilde{\theta}_N\}$ be a fixed sampling grid in the DOA range $[0, \pi]$, where $N$ denotes the grid size and typically satisfies $N \gg M > K$. Without loss of generality, let $\tilde{\theta}$ be a uniform grid with grid interval $r = \tilde{\theta}_2 - \tilde{\theta}_1 \propto N^{-1}$. Suppose $\theta_k \notin \{\tilde{\theta}_1, \dots, \tilde{\theta}_N\}$ for some $k \in \{1, \dots, K\}$ and that $\tilde{\theta}_{n_k}$, $n_k \in \{1, \dots, N\}$, is the nearest grid point to $\theta_k$. We approximate the steering vector $a(\theta_k)$ using linearization:

$$ a(\theta_k) \approx a(\tilde{\theta}_{n_k}) + b(\tilde{\theta}_{n_k})\, (\theta_k - \tilde{\theta}_{n_k}) \tag{2} $$

with $b(\tilde{\theta}_{n_k}) = a'(\tilde{\theta}_{n_k})$. Denote $A = [a(\tilde{\theta}_1), \dots, a(\tilde{\theta}_N)]$, $B = [b(\tilde{\theta}_1), \dots, b(\tilde{\theta}_N)]$, $\beta = [\beta_1, \dots, \beta_N]^T \in [-\frac{1}{2}r, \frac{1}{2}r]^N$ and $\Phi(\beta) = A + B\,\mathrm{diag}(\beta)$, where for $n = 1, \dots, N$,

$$ \beta_n = \theta_k - \tilde{\theta}_{n_k},\ \ x_n(t) = s_k(t), \ \text{if } n = n_k \text{ for some } k \in \{1, \dots, K\}; \qquad \beta_n = 0,\ \ x_n(t) = 0, \ \text{otherwise}, \tag{3} $$

with $n_k \in \{1, \dots, N\}$ and $\tilde{\theta}_{n_k}$ being the nearest grid point to a source $\theta_k$, $k \in \{1, \dots, K\}$. By absorbing the approximation error into the measurement noise, the observation model in (1) can be written as

$$ y(t) = \Phi(\beta)\, x(t) + e(t), \quad t = 1, \dots, T, \tag{4} $$

which is the off-grid model to be used in this paper. This model will be empirically validated in Subsection 4.1 by showing that the total noise (approximation error plus measurement noise) follows the Gaussian distribution with high probability if the measurement noise is Gaussian. It should be noted that the off-grid model in (4) is closely related to the on-grid one, which can be obtained by setting $\beta = 0$ in (4) (note that $\Phi(0) = A$).
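As a quick sanity check of the first-order approximation (2), the following sketch compares the zeroth-order (on-grid) and first-order (off-grid) approximations of the steering vector for an off-grid DOA. It assumes the half-wavelength ULA used later in Section 4, with the phase origin at the array center; the angles chosen are illustrative only.

```python
import numpy as np

M = 8  # number of sensors (ULA geometry from Section 4)
m = np.arange(1, M + 1)

def a(theta):
    # Steering vector with phase origin at the array midpoint
    return np.exp(1j * np.pi * (m - (M + 1) / 2) * np.cos(theta))

def b(theta):
    # b(theta) = a'(theta), the derivative of the steering vector
    return -1j * np.pi * (m - (M + 1) / 2) * np.sin(theta) * a(theta)

r = np.deg2rad(2.0)          # grid interval
grid_pt = np.deg2rad(60.0)   # nearest grid point
theta_k = grid_pt + 0.4 * r  # true DOA, off the grid

err_on = np.linalg.norm(a(theta_k) - a(grid_pt))
err_off = np.linalg.norm(a(theta_k) - (a(grid_pt) + b(grid_pt) * (theta_k - grid_pt)))
print(err_on, err_off)  # the first-order (off-grid) error is much smaller
```

The residual of the first-order model is of second order in the off-grid distance, which is why the off-grid model tolerates much coarser grids.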
In fact, the off-grid model can be considered as the first-order approximation of the true observation model, while the on-grid one is the zeroth-order approximation. As a result, the off-grid model has a much smaller modeling error than the on-grid one. Such an advantage is twofold. First, with the same sampling grid, the off-grid model results in higher accuracy, especially in the case of low measurement noise where the modeling error is the dominant modeling uncertainty. Second, a coarser sampling grid can be adopted in the off-grid model to achieve a considerably reduced computational workload with comparable modeling accuracy. To estimate the DOAs $\theta$ we need to find not only the support of the sparse signals $x(t)$, $t = 1, \dots, T$, but also the off-grid difference $\beta$. In this paper, we formulate the problem from a Bayesian perspective and develop an iterative algorithm to jointly estimate $x(t)$, $t = 1, \dots, T$, and $\beta$ in the following section.

3 OGSBI: Off-grid Sparse Bayesian Inference

We consider complex-valued signals throughout the paper since the matrix $\Phi(\beta)$ is complex-valued. We derive our algorithm in the MMV case; the SMV case follows by simply setting $T = 1$. Denote $Y = [y(1), \dots, y(T)]$, $X = [x(1), \dots, x(T)]$ and $E = [e(1), \dots, e(T)]$. The off-grid DOA estimation model in (4) becomes

$$ Y = \Phi(\beta)\, X + E \tag{5} $$

with $\Phi(\beta) = A + B\,\mathrm{diag}(\beta)$, $Y, E \in \mathbb{C}^{M \times T}$, $X \in \mathbb{C}^{N \times T}$, $A, B, \Phi(\beta) \in \mathbb{C}^{M \times N}$ and $\beta \in [-\frac{1}{2}r, \frac{1}{2}r]^N$. The matrix $X$ of interest is jointly sparse (or row-sparse), i.e., all columns of $X$ are sparse and share the same support.
3.1 Sparse Bayesian Formulation

3.1.1 Noise model

Under an assumption of white (circularly symmetric) complex Gaussian [19] noise, we have

$$ p(E \mid \alpha_0) = \prod_{t=1}^{T} \mathcal{CN}\left( e(t) \mid 0, \alpha_0^{-1} I \right) \tag{6} $$

where $\alpha_0 = \sigma^{-2}$ denotes the noise precision, with $\sigma^2$ being the noise variance. The probability density function (PDF) of a (circularly symmetric) complex Gaussian random variable $u \sim \mathcal{CN}(\mu, \Sigma)$ with mean $\mu$ and covariance $\Sigma$ is [19]

$$ \mathcal{CN}(u \mid \mu, \Sigma) = \frac{1}{\pi^N |\Sigma|} \exp\left\{ -(u - \mu)^H \Sigma^{-1} (u - \mu) \right\}. \tag{7} $$

Then we have

$$ p(Y \mid X, \alpha_0, \beta) = \prod_{t=1}^{T} \mathcal{CN}\left( y(t) \mid \Phi(\beta)\, x(t), \alpha_0^{-1} I \right). \tag{8} $$

In this paper we assume that the noise precision $\alpha_0$ is unknown. A Gamma hyperprior is assumed for $\alpha_0$ since it is a conjugate prior of the Gaussian distribution:

$$ p(\alpha_0; c, d) = \Gamma(\alpha_0 \mid c, d) \tag{9} $$

where $\Gamma(\alpha_0 \mid c, d) = [\Gamma(c)]^{-1} d^c \alpha_0^{c-1} \exp\{-d \alpha_0\}$, with $\Gamma(\cdot)$ being the Gamma function. We set $c, d \to 0$ as in [8, 9] to obtain a broad hyperprior.

3.1.2 Sparse signal model

A sparse prior is needed for the jointly sparse matrix $X$ of interest. We assume that the signals among snapshots are independent and adopt the two-stage hierarchical prior $p(X; \rho) = \int p(X \mid \alpha)\, p(\alpha; \rho)\, d\alpha$, where $\rho > 0$, $\alpha \in \mathbb{R}^N$, $\Lambda = \mathrm{diag}(\alpha)$ and

$$ p(X \mid \alpha) = \prod_{t=1}^{T} \mathcal{CN}(x(t) \mid 0, \Lambda), \tag{10} $$

$$ p(\alpha; \rho) = \prod_{n=1}^{N} \Gamma(\alpha_n \mid 1, \rho). \tag{11} $$

It is easy to show that all columns of $X$ are independent and share the same prior. According to [10], for $t = 1, \dots, T$ both $\Re\{x(t)\}$ and $\Im\{x(t)\}$ are Laplace distributed and share the same PDF, which is strongly peaked at the origin. As a result, the two-stage hierarchical prior is a sparse prior that favors most rows of $X$ being zero.

3.1.3 Off-grid distance model

We assume a uniform prior for $\beta$:

$$ \beta \sim \mathcal{U}\left( \left[ -\tfrac{1}{2}r, \tfrac{1}{2}r \right]^N \right). \tag{12} $$

The prior is noninformative in the sense that the only information about $\beta$ we use is its boundedness.
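As an aside, the sparsity-promoting nature of the two-stage prior (10)-(11) can be illustrated by sampling from it: the marginal of the signal entries is Laplace-like, i.e., strongly peaked with heavy tails (positive excess kurtosis), unlike a Gaussian. A minimal sketch with hypothetical sizes:

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, rho = 1000, 200, 0.01

# Stage 1: alpha_n ~ Gamma(1, rho), i.e., Exponential with rate rho (mean 1/rho)
alpha = rng.gamma(shape=1.0, scale=1.0 / rho, size=N)

# Stage 2: x(t) | alpha ~ CN(0, diag(alpha)); real/imag parts each carry half the variance
X = (rng.normal(size=(N, T)) + 1j * rng.normal(size=(N, T))) * np.sqrt(alpha / 2)[:, None]

# Pooled over rows, Re{X} follows a Laplace-type scale mixture of Gaussians
z = X.real.ravel() / X.real.std()
excess_kurtosis = np.mean(z ** 4) - 3  # about 3 for a Laplace density, 0 for a Gaussian
print(excess_kurtosis)
```

The large positive excess kurtosis reflects the sharp peak at zero that drives most rows of $X$ toward zero in the posterior.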
By combining the stages of the hierarchical Bayesian model, the joint PDF is

$$ p(X, Y, \alpha_0, \alpha, \beta) = p(Y \mid X, \alpha_0, \beta)\, p(X \mid \alpha)\, p(\alpha)\, p(\alpha_0)\, p(\beta) \tag{13} $$

with the distributions on the right-hand side as defined in (8), (10), (11), (9) and (12), respectively.

3.2 Bayesian Inference

An evidence procedure [20] is exploited to perform the Bayesian inference since the exact posterior distribution $p(X, \alpha_0, \alpha, \beta \mid Y)$ cannot be explicitly calculated. Similar approaches have been used in standard Bayesian CS methods [9, 10]. First, it is easy to show that the posterior distribution of $X$ is complex Gaussian:

$$ p(X \mid Y, \alpha_0, \alpha, \beta) = \prod_{t=1}^{T} \mathcal{CN}(x(t) \mid \mu(t), \Sigma) \tag{14} $$

with

$$ \mu(t) = \alpha_0 \Sigma \Phi^H y(t), \quad t = 1, \dots, T, \tag{15} $$

$$ \Sigma = \left( \alpha_0 \Phi^H \Phi + \Lambda^{-1} \right)^{-1}. \tag{16} $$

Calculations of $\Sigma$ and $\mu(t)$, $t = 1, \dots, T$, require estimates of the hyperparameters $\alpha_0$, $\alpha$ and $\beta$. In an evidence procedure, they are estimated by an MAP estimate that maximizes $p(\alpha_0, \alpha, \beta \mid Y)$. Observe that maximizing $p(\alpha_0, \alpha, \beta \mid Y)$ is equivalent to maximizing the joint PDF $p(Y, \alpha_0, \alpha, \beta) = p(\alpha_0, \alpha, \beta \mid Y)\, p(Y)$, since $p(Y)$ is independent of the hyperparameters. An expectation-maximization (EM) algorithm is implemented that treats $X$ as a hidden variable and maximizes $\mathrm{E}\{\log p(X, Y, \alpha_0, \alpha, \beta)\}$, where $p(X, Y, \alpha_0, \alpha, \beta)$ is given in (13) and $\mathrm{E}\{\cdot\}$ denotes the expectation with respect to the posterior of $X$ given in (14), using the current estimates of the hyperparameters. Denote $U = [\mu(1), \dots, \mu(T)] = \alpha_0 \Sigma \Phi^H Y$ and the normalized quantities $\underline{X} = X/\sqrt{T}$, $\underline{Y} = Y/\sqrt{T}$, $\underline{U} = U/\sqrt{T}$ and $\underline{\rho} = \rho/T$.
Following a similar procedure as in [8], it is easy to obtain the following updates of $\alpha$ and $\alpha_0$:

$$ \alpha_n^{\text{new}} = \frac{\sqrt{1 + 4\underline{\rho}\, \mathrm{E}\left\{ \|\underline{X}_{n\cdot}\|_2^2 \right\}} - 1}{2\underline{\rho}}, \quad n = 1, \dots, N, \tag{17} $$

$$ \alpha_0^{\text{new}} = \frac{M + (c - 1)/T}{\mathrm{E}\left\{ \|\underline{Y} - \Phi \underline{X}\|_F^2 \right\} + d/T}, \tag{18} $$

where $\mathrm{E}\{\|\underline{X}_{n\cdot}\|_2^2\} = \|\underline{U}_{n\cdot}\|_2^2 + \Sigma_{nn}$ and $\mathrm{E}\{\|\underline{Y} - \Phi \underline{X}\|_F^2\} = \|\underline{Y} - \Phi \underline{U}\|_F^2 + \alpha_0^{-1} \sum_{n=1}^{N} \gamma_n$ with $\gamma_n = 1 - \alpha_n^{-1} \Sigma_{nn}$.

For $\beta$, its estimate maximizes $\mathrm{E}\{\log p(Y \mid X, \alpha_0, \beta)\, p(\beta)\}$ by (13) and thus minimizes

$$ \begin{aligned} &\mathrm{E}\left\{ \frac{1}{T} \sum_{t=1}^{T} \left\| y(t) - (A + B\,\mathrm{diag}(\beta))\, x(t) \right\|_2^2 \right\} \\ &\quad = \frac{1}{T} \sum_{t=1}^{T} \left\| y(t) - (A + B\,\mathrm{diag}(\beta))\, \mu(t) \right\|_2^2 + \mathrm{Tr}\left\{ (A + B\,\mathrm{diag}(\beta))\, \Sigma\, (A + B\,\mathrm{diag}(\beta))^H \right\} \\ &\quad = \beta^T P \beta - 2 v^T \beta + C \end{aligned} \tag{19} $$

where $C$ is a constant independent of $\beta$, $P$ is a positive semi-definite matrix, and

$$ P = \Re\left\{ B^H B \odot \left( \underline{U}\, \underline{U}^H + \Sigma \right)^{*} \right\}, \tag{20} $$

$$ v = \Re\left\{ \frac{1}{T} \sum_{t=1}^{T} \mathrm{diag}\left( \bar{\mu}(t) \right) B^H \left( y(t) - A \mu(t) \right) \right\} - \Re\left\{ \mathrm{diag}\left( B^H A \Sigma \right) \right\}. \tag{21} $$

To simplify the exposition, the detailed derivation of (19) is provided in the Appendix. As a result, we have

$$ \beta^{\text{new}} = \arg\min_{\beta \in [-\frac{1}{2}r, \frac{1}{2}r]^N} \left\{ \beta^T P \beta - 2 v^T \beta \right\}. \tag{22} $$

Remark 1. Though an explicit expression of $\beta^{\text{new}}$ cannot be given, by recognizing that $\beta$ is jointly sparse with $x$, the dimension of $\beta$ can be reduced to $K$ in the computation, and hence $\beta^{\text{new}}$ can be efficiently calculated. We provide the details in Subsection 3.5.

The proposed OGSBI algorithm is implemented as follows. After initializing the hyperparameters $\alpha$, $\alpha_0$ and $\beta$, we calculate $\Sigma$ and $\mu(t)$, $t = 1, \dots, T$, using the current values of the hyperparameters according to (16) and (15), respectively. Then we update $\alpha$, $\alpha_0$ and $\beta$ according to (17), (18) and (22), respectively. The process is repeated until some convergence criterion is satisfied. We note that OGSBI has guaranteed convergence, since $p(\alpha_0, \alpha, \beta \mid Y)$ is guaranteed to increase at each iteration by the property of the EM algorithm [21].
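One round of the E-step (15)-(16) and the M-step updates (17)-(18) can be sketched as follows. This is a sketch only: the sizes are hypothetical, `Phi` and `Y` are random stand-ins for $\Phi(\beta)$ and the data, and the $\beta$ update (22) is omitted.

```python
import numpy as np

rng = np.random.default_rng(1)
M, N, T, rho = 8, 30, 50, 0.01
c = d = 1e-4

Phi = rng.normal(size=(M, N)) + 1j * rng.normal(size=(M, N))  # stand-in for A + B diag(beta)
Y = rng.normal(size=(M, T)) + 1j * rng.normal(size=(M, T))    # stand-in measurements
alpha = np.ones(N)
alpha0 = 1.0

# E-step: posterior moments, (15)-(16)
Sigma = np.linalg.inv(alpha0 * Phi.conj().T @ Phi + np.diag(1 / alpha))
U = alpha0 * Sigma @ Phi.conj().T @ Y

# M-step: hyperparameter updates, (17)-(18), using the normalized quantities
Un = U / np.sqrt(T)
rho_bar = rho / T
E_row = np.sum(np.abs(Un) ** 2, axis=1) + Sigma.diagonal().real   # E{||X_n.||^2}
alpha_new = (np.sqrt(1 + 4 * rho_bar * E_row) - 1) / (2 * rho_bar)
gamma = 1 - Sigma.diagonal().real / alpha
E_res = np.linalg.norm(Y / np.sqrt(T) - Phi @ Un, 'fro') ** 2 + gamma.sum() / alpha0
alpha0_new = (M + (c - 1) / T) / (E_res + d / T)
print(alpha_new[:3], alpha0_new)
```

In the full algorithm these updates, together with (22), are iterated until convergence.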
3.3 OGSBI-SVD

In this subsection we recall a subspace-based idea in [5] that uses the SVD of the measurement matrix, $Y = U S V^H$, to reduce the computation of the signal reconstruction process and the sensitivity to the measurement noise, and we then incorporate it into our OGSBI algorithm. Consider the noise-free case where $Y = \Phi X$ with $K \le T$. We have $\mathrm{Rank}(Y) \le \mathrm{Rank}(X) \le K$. Let $V = [V_1\ V_2]$, where $V_1$ and $V_2$ consist of the first $K$ and the remaining $T - K$ columns of $V$, respectively. Then $Y_{SV} = Y V_1 \in \mathbb{C}^{M \times K}$ preserves all signal information. In the general noisy case, by the SVD we have $Y V = [Y_{SV}\ \ Y V_2]$, where the first part $Y_{SV}$ preserves most of the signal information and is used in the following signal recovery process, while the second part is discarded. Denote $X_{SV} = X V_1$ and $E_{SV} = E V_1$. Then we have

$$ Y_{SV} = \Phi X_{SV} + E_{SV}. \tag{23} $$

In (23), $Y_{SV}$, $X_{SV}$ and $E_{SV}$ can be viewed as the new matrices of sensor measurements, source signals and measurement noises, respectively. The joint sparsity still holds in $X_{SV}$. We do not exploit possible correlations that exist between columns of $X_{SV}$ (and of $E_{SV}$), i.e., we still assume that $X_{SV}$ (and $E_{SV}$) have independent columns.¹ It is then straightforward to apply the proposed OGSBI algorithm to estimate $X_{SV}$, $\beta$ and then the DOAs. We use OGSBI-SVD to refer to the resulting algorithm. Based on implementation details to be introduced in Subsection 3.5, it can be shown that OGSBI-SVD has a computational complexity of order $O(MN^2)$ per iteration, while that of OGSBI is $O(\max\{MN^2, MNT\})$ per iteration. An additional computational workload of order $O(\max\{M^2 T, MT^2\})$ is required for the SVD of $Y$ in OGSBI-SVD. Since it is empirically found that OGSBI-SVD converges much faster than OGSBI, the overall computational workload of OGSBI-SVD is less than that of OGSBI in general.²
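The dimension-reduction step can be sketched as follows, with hypothetical sizes and randomly generated stand-in data. Note that NumPy's `svd` returns $V^H$, so $V_1$ is formed from its first $K$ rows.

```python
import numpy as np

rng = np.random.default_rng(2)
M, N, T, K = 8, 90, 200, 2

# Approximately rank-K data: Y = Phi X + E with X having K nonzero rows
Phi = rng.normal(size=(M, N)) + 1j * rng.normal(size=(M, N))
X = np.zeros((N, T), complex)
X[[20, 45], :] = rng.normal(size=(K, T)) + 1j * rng.normal(size=(K, T))
Y = Phi @ X + 0.01 * (rng.normal(size=(M, T)) + 1j * rng.normal(size=(M, T)))

# Keep only the K principal right-singular directions: Y_SV = Y V1 (M x K)
_, s, Vh = np.linalg.svd(Y, full_matrices=False)
V1 = Vh[:K, :].conj().T
Y_sv = Y @ V1
print(Y_sv.shape)  # (8, 2): T = 200 snapshots compressed to K = 2 columns
```

The T-snapshot problem thus shrinks to a K-column one of the same form (23), while almost all signal energy is retained in `Y_sv`.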
3.4 Source Power and DOA Estimation

We use the estimated source powers from different directions to form the spectrum of the proposed algorithm. In the following we derive a formula for estimating the source powers, taking OGSBI-SVD as an example; the case of OGSBI is similar with some modifications [18]. Let $\widehat{X} = X_{SV} V_1^H$ be an estimate of the signal $X$, where $X_{SV}$ follows its posterior distribution. Considering $\widehat{X}$ row by row, we have $\widehat{X}_{n\cdot} \sim \mathcal{CN}\left( \widehat{U}_{n\cdot} V_1^H, \widehat{\Sigma}_{nn} V_1 V_1^H \right)$, where $\widehat{U}$ and $\widehat{\Sigma}$ denote the final estimates of the mean and covariance of $X_{SV}$, respectively. We use the expectation as an estimate of the power $\wp_n$ from direction $\tilde{\theta}_n$ (with a modification by $\beta_n$):

$$ \hat{\wp}_n = \mathrm{E}\{\wp_n\} = \frac{1}{T}\, \mathrm{E}\left\{ \left\| \widehat{X}_{n\cdot} \right\|_2^2 \right\} = \frac{1}{T}\left( \left\| \mathrm{E}\{\widehat{X}_{n\cdot}\} \right\|_2^2 + \mathrm{E}\left\{ \left\| \widehat{X}_{n\cdot} - \mathrm{E}\{\widehat{X}_{n\cdot}\} \right\|_2^2 \right\} \right) = \frac{1}{T}\left( \left\| \widehat{U}_{n\cdot} V_1^H \right\|_2^2 + \mathrm{Tr}\left\{ \widehat{\Sigma}_{nn} V_1 V_1^H \right\} \right) = \frac{\| \widehat{U}_{n\cdot} \|_2^2}{T} + \frac{K \widehat{\Sigma}_{nn}}{T}. \tag{24} $$

Like other spectral-based methods, the DOAs are estimated from the locations of the highest peaks of the spectrum. Suppose that the grid indices of the $K$ highest peaks of $\hat{\wp}$ are $\hat{n}_k$, $k = 1, \dots, K$. The estimated $K$ DOAs are then $\hat{\theta}_k = \tilde{\theta}_{\hat{n}_k} + \hat{\beta}_{\hat{n}_k}$, $k = 1, \dots, K$.

3.5 Implementation Details

This subsection presents some details of our implementations of OGSBI and OGSBI-SVD. At each iteration of OGSBI or OGSBI-SVD, an $N \times N$ matrix inversion is required when updating $\Sigma$ according to (16). Since $M < N$, the Woodbury matrix identity is applied to give

$$ \Sigma = \Lambda - \Lambda \Phi^H C^{-1} \Phi \Lambda, \quad C = \alpha_0^{-1} I + \Phi \Lambda \Phi^H \in \mathbb{C}^{M \times M}. $$

By the fact that $\beta$ is jointly sparse with $x(t)$, whose $K$ nonzero entries correspond to the locations of the $K$ sources, we calculate only the entries of $\beta$ that correspond to the locations of the $K$ largest entries of $\alpha$ and set the others to zero. As a result, $\beta$, $P$ and $v$ can be truncated to dimension $K$ or $K \times K$. We still use $\beta$, $P$ and $v$ hereafter to denote their truncated versions for simplicity.
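The Woodbury step above can be checked numerically; the following is a sketch with hypothetical small sizes, where `Phi` stands in for $\Phi(\beta)$.

```python
import numpy as np

rng = np.random.default_rng(3)
M, N, alpha0 = 8, 60, 2.0
Phi = rng.normal(size=(M, N)) + 1j * rng.normal(size=(M, N))
alpha = rng.uniform(0.5, 2.0, size=N)
Lam = np.diag(alpha)

# Direct N x N inversion per (16)
Sigma_direct = np.linalg.inv(alpha0 * Phi.conj().T @ Phi + np.diag(1 / alpha))

# Woodbury form: only an M x M system needs to be solved
C = np.eye(M) / alpha0 + Phi @ Lam @ Phi.conj().T
Sigma_wood = Lam - Lam @ Phi.conj().T @ np.linalg.solve(C, Phi @ Lam)

print(np.allclose(Sigma_direct, Sigma_wood))  # True
```

Since $M \ll N$ in typical grids, replacing the $N \times N$ inversion with an $M \times M$ solve is what makes the per-iteration cost $O(MN^2)$.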
By (22) and $\frac{\partial}{\partial \beta}\left( \beta^T P \beta - 2 v^T \beta \right) = 2(P\beta - v)$, we have $\beta^{\text{new}} = \check{\beta}$ if $P$ is invertible and $\check{\beta} = P^{-1} v \in [-\frac{1}{2}r, \frac{1}{2}r]^K$. Otherwise, we update $\beta$ elementwise, i.e., at each step we update one $\beta_n$ while fixing the other entries of $\beta$. For $n = 1, \dots, K$, we first let

$$ \check{\beta}_n = \frac{v_n - (P_{n\cdot})_{-n}^T \beta_{-n}}{P_{nn}}, \tag{25} $$

where $u_{-n}$ denotes $u$ with the $n$th entry removed, for a vector $u$. Then, constraining $\beta_n \in [-\frac{1}{2}r, \frac{1}{2}r]$, we have

$$ \beta_n^{\text{new}} = \begin{cases} \check{\beta}_n, & \text{if } \check{\beta}_n \in [-\frac{1}{2}r, \frac{1}{2}r]; \\ -\frac{1}{2}r, & \text{if } \check{\beta}_n < -\frac{1}{2}r; \\ \frac{1}{2}r, & \text{otherwise}. \end{cases} \tag{26} $$

It is easy to show that the objective function is guaranteed to decrease at each step with $\beta_n$ defined in (26). We terminate OGSBI and OGSBI-SVD if $\|\alpha^{i+1} - \alpha^i\|_2 / \|\alpha^i\|_2 < \tau$ or the maximum number of iterations is reached, where $\tau$ is a user-defined tolerance and the superscript $i$ refers to the iteration.

¹ The correlations between columns of the signal matrix ($X_{SV}$ in our case) have recently been studied in [13].
² A possible exception happens in the case of $T \gg N$, where the computation of the SVD is quite heavy. A modified approach in such a case is to first partition $Y$ into blocks of about $N$ columns each, then perform the SVD on each block and keep the resulting signal subspaces, and finally perform another SVD on the new signal matrix composed of all signal subspaces. A model similar to (23) can then be cast.

4 Numerical Simulations

In this section, we present our numerical results for DOA estimation. A standard uniform linear array (ULA) of $M = 8$ sensors is considered. The origin is set at the midpoint of the ULA to reduce the approximation error in (2). So we have $A_{mn} = \exp\left\{ j\pi \left( m - \frac{M+1}{2} \right) \cos\tilde{\theta}_n \right\}$ and $B_{mn} = -j\pi \left( m - \frac{M+1}{2} \right) \sin\tilde{\theta}_n \cdot A_{mn}$, $m = 1, \dots, M$, $n = 1, \dots, N$, with $j = \sqrt{-1}$. A uniform sampling grid $\{0°, r, 2r, \dots, 180°\}$ is considered, with $r$ being the grid interval. The number of snapshots is set to $T = 200$ in the MMV case.
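The matrices $A$ and $B$ above can be constructed directly; the following sketch assumes a grid interval of r = 2°.

```python
import numpy as np

M = 8
r = 2.0                                           # grid interval in degrees
grid = np.deg2rad(np.arange(0.0, 180.0 + r, r))   # uniform grid {0°, r, ..., 180°}
N = grid.size
m = np.arange(1, M + 1).reshape(-1, 1)            # sensor index; origin at array midpoint

A = np.exp(1j * np.pi * (m - (M + 1) / 2) * np.cos(grid))
B = -1j * np.pi * (m - (M + 1) / 2) * np.sin(grid) * A   # B_mn = dA_mn / d(theta_n)

print(A.shape, B.shape)  # (8, 91) (8, 91)
```

Every entry of $A$ has unit modulus (a pure phase shift), and each column of $B$ is the exact derivative of the corresponding steering vector.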
We consider only OGSBI-SVD in the MMV case, since it is empirically observed to converge faster and be more accurate in comparison with OGSBI. In OGSBI-SVD, we set $\rho = 0.01$ and $c = d = 1 \times 10^{-4}$. We initialize $\alpha_0 = \frac{100K}{\sum_{t=1}^{K} \mathrm{Var}\{(Y_{SV})_{\cdot t}\}}$, $\alpha = \frac{1}{MK} \sum_{t=1}^{K} \left| A^H (Y_{SV})_{\cdot t} \right|$ and $\beta = 0$, where $|\cdot|$ applies elementwise. We set $\tau = 10^{-3}$ and the maximum number of iterations to 1000. We note that the proposed algorithm is insensitive to the initializations of $\alpha_0$, $\alpha$ and $\beta$, as well as to $\rho$ provided $\rho$ is not too large. As reported in [12], the estimate of $\alpha_0$ can be inaccurate in some cases, but we have observed minimal effects on the DOA estimation result. Readers are referred to [18] for detailed discussions. All experiments are carried out in Matlab v7.7.0 on a PC with a Windows XP system and a 3 GHz CPU. Matlab codes have been made available online at https://sites.google.com/site/zaiyang0248/publication.

4.1 Comparison with $\ell_1$-SVD

We take $\ell_1$-SVD in [5] as a representative of on-grid model based methods and compare OGSBI-SVD with it in terms of mean squared error (MSE) and computational time with respect to the grid interval $r$ and the SNR. In our experiment, we consider SNR = 10 and 0 dB, and $r$ = 0.5°, 1°, 2° and 4°. In each trial, $K = 2$ sources $\theta_1$, $\theta_2$ are uniformly generated within the intervals [58°, 62°] and [86°, 90°], respectively. Before presenting our comparison results, we show using the Kolmogorov–Smirnov test that the total noise (measurement noise plus approximation error) in (4) is Gaussian distributed with a rate of at least 94% in all scenarios. This empirically validates the off-grid model. For each combination $(\text{SNR}, r)$, the MSE is averaged over $R = 200$ trials: $\mathrm{MSE} = \frac{1}{RK} \sum_{i=1}^{R} \sum_{k=1}^{K} \left( \theta_k^i - \hat{\theta}_k^i \right)^2$, where the superscript $i$ refers to the $i$th trial.
It should be noted that there exists a lower bound on the MSE of $\ell_1$-SVD regardless of the SNR, since the best DOA estimate that $\ell_1$-SVD can obtain is the grid point nearest to the true DOA. In fact, this lower bound is shared among all on-grid model based methods, including CS-MUSIC, SA-MUSIC and the algorithm in [13]. By assuming that the true DOA is uniformly distributed, the lower bound can be easily calculated as $\mathrm{LB} = r^2/12$.³ Fig. 1 presents our experimental results. In all scenarios under consideration, OGSBI-SVD achieves more accurate DOA estimation than $\ell_1$-SVD. Moreover, OGSBI-SVD can exceed the lower bound for $\ell_1$-SVD in most scenarios. The phenomenon is significant in the case of a higher SNR or a coarser sampling grid, where the on-grid model describes the true observation model poorly while this is overcome to a large extent by the off-grid model used in this paper.

³ The presented lower bound is, in fact, the expectation in the case of limited trials. The variance approaches zero as the number of trials gets large.

[Figure 1: MSEs of OGSBI-SVD and $\ell_1$-SVD versus grid interval (degrees) at SNR = 10 dB and 0 dB, with MSE in dB. The lower bound shown is for $\ell_1$-SVD regardless of the SNR.]

Table 1: Averaged time consumptions of $\ell_1$-SVD and OGSBI-SVD with respect to SNR and $r$. Time unit: sec.

SNR = 10 dB     r = 0.5°   r = 1°    r = 2°    r = 4°
$\ell_1$-SVD     0.601      0.413     0.324     0.291
OGSBI-SVD        10.2       0.782     0.096     0.025

SNR = 0 dB      r = 0.5°   r = 1°    r = 2°    r = 4°
$\ell_1$-SVD     0.413      0.295     0.218     0.190
OGSBI-SVD        10.9       0.831     0.104     0.024

Table 1 presents the averaged CPU times of OGSBI-SVD and $\ell_1$-SVD (excluding the SVD process, which takes about 0.003 s in our case) with respect to SNR and $r$.⁴ For both OGSBI-SVD and $\ell_1$-SVD, the CPU time decreases as the grid gets coarser.
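As a side check, the on-grid lower bound LB = r²/12 quoted above can be reproduced by a quick Monte Carlo experiment: for a true DOA uniformly distributed about its nearest grid point, the best on-grid estimate errs by an amount uniform on (−r/2, r/2), whose second moment is r²/12.

```python
import numpy as np

rng = np.random.default_rng(4)
r = np.deg2rad(2.0)  # grid interval

# Error of the best on-grid estimate: uniform on (-r/2, r/2)
err = rng.uniform(-r / 2, r / 2, size=1_000_000)
mse_mc = np.mean(err ** 2)
lb = r ** 2 / 12     # variance of the uniform distribution on (-r/2, r/2)
print(mse_mc, lb)    # the two agree closely
```

No on-grid method can average below this floor, which is why the off-grid estimate $\hat{\theta}_k = \tilde{\theta}_{\hat{n}_k} + \hat{\beta}_{\hat{n}_k}$ can beat it.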
OGSBI-SVD is faster than $\ell_1$-SVD at $r$ = 2° and 4°. One drawback of the proposed method is that it is slow in the case of a dense sampling grid. In practice, we recommend using a coarser grid with $r$ = 2° for the proposed algorithm, since it gives accurate yet fast DOA estimation.

Remark 2. (1) We choose $\ell_1$ optimization for comparison because it is typically known to have better theoretical guarantees for sparse recovery, though simpler solvers in CS, e.g., OMP [22], may succeed in our setting where the two sources are well separated, and they have lower computational cost in such low-dimensional problems. Readers are referred to [18] for more simulation results of $\ell_1$ optimization and the proposed method in the case of closely spaced sources. (2) Another advantage of the proposed algorithm is its smaller DOA estimation bias in comparison with that of $\ell_1$-SVD [18].

⁴ The code of $\ell_1$-SVD is provided by the author of [5]. We note that its speed can be accelerated using state-of-the-art algorithms for CS.

Table 2: Averaged MSEs and CPU times of STLS and OGSBI in the SMV case with respect to $r$.

          MSE (dB)            Time (sec)
          r = 2°    r = 4°    r = 2°    r = 4°
STLS      −36.5     −36.6     5.31      1.77
OGSBI     −45.1     −43.3     0.098     0.028

Table 3: Averaged MSEs of OGSBI-SVD in the presence of outliers. $\kappa$ denotes the ratio by which outliers are augmented.

κ          1        5        10       20       50       100
MSE (dB)   −63.1    −62.9    −47.6    −38.6    −34.6    −32.0

4.2 Comparison with STLS

The off-grid model has recently been used in [17] for DOA estimation, where a sparse total least-squares (STLS) approach is proposed. In the SMV case, STLS seeks to solve the nonconvex optimization problem

$$ \min_{x, \beta}\ \left\{ \|\beta\|_2^2 + \left\| y - (A + B\,\mathrm{diag}(\beta))\, x \right\|_2^2 + \lambda \|x\|_1 \right\} \tag{27} $$

where $x$ is the sparse source signal of interest, $y$ is the noisy measurement, $A$, $B$ and $\beta$ are the same as defined in the off-grid model, and $\lambda > 0$ is a regularization parameter.
From the Bayesian perspective, this is equivalent to seeking an MAP solution of $(x, \beta)$ by assuming that the measurement noise is white Gaussian, $x$ is Laplacian and $\beta$ is Gaussian. It is noted that the last assumption cannot properly capture the property of $\beta$. A local minimum of the problem in (27) is achieved in [17] by an alternating approach, i.e., alternately solving for $x$ with $\beta$ fixed, which requires solving an $\ell_1$-regularized least squares problem, and solving for $\beta$ with $x$ fixed, which requires solving an $N$-dimensional linear system. As argued in [5], the SVD used in OGSBI-SVD can alleviate the sensitivity to the measurement noise in the MMV case; no such step is used in [17]. To make a fair comparison, we consider only the SMV case when comparing our method with STLS, though a similar problem can be cast for STLS in the MMV case. In our implementation of OGSBI, we initialize $\alpha_0 = 100/\mathrm{Var}\{y\}$ and $\alpha = \frac{1}{MK} \left| A^H y \right|$. The remaining settings are the same as those for OGSBI-SVD in the MMV case. In our experiment, we consider two DOAs at 63.2° and 90.3° with SNR = 20 dB. We consider $r$ = 2° and 4° for both OGSBI and STLS. The parameter $\lambda$ in (27) is tuned to our best such that STLS achieves the smallest error. Table 2 presents the averaged MSEs and CPU times of STLS and OGSBI over $R = 200$ trials. OGSBI obtains more accurate DOA estimates than STLS in both scenarios, with remarkably less computational time. We also note that it is possible to accelerate STLS using state-of-the-art algorithms for CS.

4.3 Sensitivity to Measurement Outliers

The SVD procedure in OGSBI-SVD is related to principal component analysis (PCA). It is known that standard PCA is sensitive to outliers: even a single corrupted measurement can deteriorate the quality of the approximation. In this subsection we carry out experiments to check whether the proposed OGSBI-SVD is sensitive to measurement outliers due to the SVD.
The experimental setup is similar to that in Subsection 4.1 but with SNR = ∞. After acquiring the noiseless measurements, we randomly choose 3 out of the MT = 1600 measurements, multiply them by a constant ratio κ and save them as the outliers. Besides the case of no outliers (κ = 1), we consider five other cases where κ is set to 5, 10, 20, 50 and 100, respectively. Table 3 presents the resulting MSEs. It can be seen that the estimation accuracy of OGSBI-SVD can degrade significantly even with only about 0.2% of the measurements corrupted, due to the sensitivity of the SVD.

Note that the corrupted measurement matrix Y due to the outliers is the sum of a low-rank matrix (the noiseless measurement matrix of rank K) and a sparse matrix (the outliers). A robust PCA technique has recently been proposed in [23] that can recover the original low-rank matrix in the presence of sparse outliers. It is thus possible to combine the robust PCA technique in [23] with the proposed OGSBI-SVD to improve its robustness to outliers, which, however, is beyond the scope of this paper.

5 Conclusion

In this paper, we studied the off-grid DOA estimation model first proposed in [17] for reducing the modeling error caused by discretization of a continuous range. We proposed an algorithm based on the off-grid model from a Bayesian perspective that is applicable to both single-snapshot and multi-snapshot cases. A subspace-based idea was used to reduce the computational complexity of the signal recovery process and the sensitivity to noise. We illustrated by simulations that the proposed approach outperforms standard CS methods, whose performance is limited by the underlying standard on-grid model. It is also more accurate than the algorithm in [17] based on the off-grid model. One drawback of the proposed algorithm is its slow speed in the case of a dense sampling grid, though a coarser grid can be adopted to obtain accurate yet fast DOA estimation.
A future direction is to develop fast versions of our algorithm. After this work, we have shown in [24] that ℓ1 optimization also works for the off-grid DOA estimation problem, where performance guarantees are provided under some conditions.

Appendix: Derivation of (19)

Denote Δ = diag(β). Eq. (19) is based on the following two equalities:

  \|y - (A + B\Delta)\mu\|_2^2 = \|(y - A\mu) - B \cdot \mathrm{diag}(\mu) \cdot \beta\|_2^2
    = \beta^T \left[ (B^H B) \odot (\mu \mu^H) \right] \beta - 2 \Re\left\{ \mathrm{diag}(\mu^*)\, B^H (y - A\mu) \right\}^T \beta + C_1,

  \mathrm{Tr}\left\{ (A + B\Delta)\, \Sigma\, (A + B\Delta)^H \right\} = 2 \Re\left\{ \mathrm{Tr}\left( B^H A \Sigma \Delta \right) \right\} + \mathrm{Tr}\left( \Delta \Sigma \Delta B^H B \right) + C_2
    = 2 \Re\left\{ \mathrm{diag}\left( B^H A \Sigma \right) \right\}^T \beta + \beta^T \left[ \Sigma \odot (B^H B) \right] \beta + C_2,

where C_1, C_2 are constants independent of β, and we have used the equality

  \mathrm{Tr}\left( \mathrm{diag}^H(u) \cdot Q \cdot \mathrm{diag}(v) \cdot R^T \right) = u^H (Q \odot R)\, v

for vectors u, v and matrices Q, R of proper dimensions. Note that βᵀSβ ∈ ℝ for a positive semi-definite matrix S of proper dimension, and thus βᵀSβ = ℜ{βᵀSβ} = βᵀ · ℜ{S} · β since β is real-valued. Then (19) is obtained by observing that both (BᴴB) ⊙ (μμᴴ) and Σ ⊙ (BᴴB) are positive semi-definite; by the Schur product theorem, the Hadamard product of two positive semi-definite matrices is again positive semi-definite.

References

[1] H. Krim and M. Viberg, "Two decades of array signal processing research: the parametric approach," IEEE Signal Processing Magazine, vol. 13, no. 4, pp. 67–94, 1996.
[2] R. Schmidt, "A signal subspace approach to multiple emitter location and spectral estimation," Ph.D. dissertation, Stanford University, 1981.
[3] P. Stoica and A. Nehorai, "MUSIC, maximum likelihood, and Cramér-Rao bound," IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 37, no. 5, pp. 720–741, 1989.
[4] E. Candès, "Compressive sampling," in Proceedings of the International Congress of Mathematicians, Madrid, 2006, pp. 1433–1452.
[5] D. Malioutov, M. Cetin, and A. Willsky, "A sparse signal reconstruction perspective for source localization with sensor arrays," IEEE Transactions on Signal Processing, vol. 53, no. 8, pp.
3010–3022, 2005.
[6] E. Candès, J. Romberg, and T. Tao, "Stable signal recovery from incomplete and inaccurate measurements," Communications on Pure and Applied Mathematics, vol. 59, no. 8, pp. 1207–1223, 2006.
[7] Y. Eldar and H. Rauhut, "Average case analysis of multichannel sparse recovery using convex relaxation," IEEE Transactions on Information Theory, vol. 56, no. 1, pp. 505–519, 2010.
[8] M. Tipping, "Sparse Bayesian learning and the relevance vector machine," The Journal of Machine Learning Research, vol. 1, pp. 211–244, 2001.
[9] S. Ji, Y. Xue, and L. Carin, "Bayesian compressive sensing," IEEE Transactions on Signal Processing, vol. 56, no. 6, pp. 2346–2356, 2008.
[10] S. Babacan, R. Molina, and A. Katsaggelos, "Bayesian compressive sensing using Laplace priors," IEEE Transactions on Image Processing, vol. 19, no. 1, pp. 53–63, 2010.
[11] Z. Yang, L. Xie, and C. Zhang, "Bayesian compressed sensing with new sparsity-inducing prior," arXiv preprint, available at http://arxiv.org/pdf/1208.6464, 2012.
[12] D. Wipf and B. Rao, "An empirical Bayesian strategy for solving the simultaneous sparse approximation problem," IEEE Transactions on Signal Processing, vol. 55, no. 7, pp. 3704–3716, 2007.
[13] Z. Zhang and B. Rao, "Sparse signal recovery with temporally correlated source vectors using sparse Bayesian learning," IEEE Journal of Selected Topics in Signal Processing, no. 99, pp. 912–926, 2011.
[14] L. He and L. Carin, "Exploiting structure in wavelet-based Bayesian compressive sensing," IEEE Transactions on Signal Processing, vol. 57, no. 9, pp. 3488–3497, 2009.
[15] J. Kim, O. Lee, and J. Ye, "Compressive MUSIC: Revisiting the link between compressive sensing and array signal processing," IEEE Transactions on Information Theory, vol. 58, no. 1, pp. 278–301, 2012.
[16] K. Lee, Y. Bresler, and M.
Junge, "Subspace methods for joint sparse recovery," IEEE Transactions on Information Theory, vol. 58, no. 6, pp. 3613–3641, 2012.
[17] H. Zhu, G. Leus, and G. Giannakis, "Sparsity-cognizant total least-squares for perturbed compressive sampling," IEEE Transactions on Signal Processing, vol. 59, no. 5, pp. 2002–2016, 2011.
[18] Z. Yang, L. Xie, and C. Zhang, "Off-grid direction of arrival estimation using sparse Bayesian inference," arXiv preprint, available at http://arxiv.org/pdf/1108.5838v2.pdf, 2011.
[19] N. Goodman, "Statistical analysis based on a certain multivariate complex Gaussian distribution (an introduction)," The Annals of Mathematical Statistics, vol. 34, no. 1, pp. 152–177, 1963.
[20] D. MacKay, "Bayesian interpolation," Neural Computation, vol. 4, no. 3, pp. 415–447, 1992.
[21] G. McLachlan and T. Krishnan, The EM Algorithm and Extensions. John Wiley and Sons, 1997.
[22] J. Tropp and A. Gilbert, "Signal recovery from random measurements via orthogonal matching pursuit," IEEE Transactions on Information Theory, vol. 53, no. 12, pp. 4655–4666, 2007.
[23] E. Candès, X. Li, Y. Ma, and J. Wright, "Robust principal component analysis?" Journal of the ACM, vol. 58, no. 7, 2011.
[24] Z. Yang, C. Zhang, and L. Xie, "Robustly stable signal recovery in compressed sensing with structured matrix perturbation," IEEE Transactions on Signal Processing, vol. 60, no. 9, pp. 4658–4671, 2012.
