Compressed Anomaly Detection with Multiple Mixed Observations
Authors: Natalie Durgin, Rachel Grotheer, Chenxi Huang, Shuang Li, Anna Ma, Deanna Needell, and Jing Qin

Abstract. We consider a collection of independent random variables that are identically distributed, except for a small subset which follows a different, anomalous distribution. We study the problem of detecting which random variables in the collection are governed by the anomalous distribution. Recent work proposes to solve this problem by conducting hypothesis tests based on mixed observations (e.g. linear combinations) of the random variables. Recognizing the connection between taking mixed observations and compressed sensing, we view the problem as recovering the "support" (index set) of the anomalous random variables from multiple measurement vectors (MMVs). Many algorithms have been developed for recovering jointly sparse signals and their support from MMVs. We establish the theoretical and empirical effectiveness of these algorithms in detecting anomalies. We also extend the LASSO algorithm to an MMV version for our purpose. Further, we perform experiments on synthetic data, consisting of samples from the random variables, to explore the trade-off between the number of mixed observations per sample and the number of samples required to detect anomalies.

Affiliations:
Natalie Durgin, Spiceworks, Austin, TX 78746. e-mail: njdurgin@gmail.com
Rachel Grotheer, Goucher College, Baltimore, MD 21204.
Chenxi Huang, Yale University, New Haven, CT 06511. e-mail: chenxi.huang@yale.edu
Shuang Li, Colorado School of Mines, Golden, CO 80401.
Anna Ma, Claremont Graduate University, Claremont, CA 91711.
Deanna Needell, University of California, Los Angeles, Los Angeles, CA 90095.
Jing Qin, Montana State University, Bozeman, MT 59717.

1 Introduction

The problem of anomaly detection has been the focus of interest in many fields of science and engineering, including network tomography, cognitive radio, and radar [36, 40, 27, 4]. In this paper, we study the problem of identifying a small number of anomalously distributed random variables within a much larger collection of independent and otherwise identically distributed random variables. We call the random variables following the anomalous distribution anomalous random variables. A conventional approach to detecting these anomalous random variables is to sample from each random variable individually and then apply hypothesis testing techniques [29, 30, 31].

A recent paper [13] proposes to perform hypothesis testing on mixed observations (e.g. linear combinations) of random variables instead of on samples from individual random variables. They call this technique compressed hypothesis testing. Such an approach is motivated by the recent development of compressed sensing [7, 15, 23, 21], a signal processing paradigm that shows a small number of random linear measurements of a signal is sufficient for accurate reconstruction. A large body of work in this area shows that optimization-based [17, 8, 16, 10, 37] and iterative [38, 33, 6] methods can reconstruct the signal accurately and efficiently when the samples are taken via a sensing matrix satisfying certain incoherence properties [8, 9]. Compressed sensing has also been studied in a Bayesian framework, where signals are assumed to obey some prior distribution [25, 41, 3].
The results presented in [13] show that the "mixed" measurement approach achieves better detection accuracy from fewer samples than the conventional "un-mixed" approach. However, compressed hypothesis testing requires that the distributions of the random variables be known a priori, knowledge that may not be available in practice. Further, as the authors point out, their approach requires conducting a large number of hypothesis tests, especially when the number of random variables in the collection is large, rendering it computationally prohibitive. Two efficient algorithms are proposed as alternatives in [13], but no analytical study of their performance is provided.

We propose new methods for detecting anomalous random variables that require minimal knowledge of the distributions, are computationally efficient, and whose performance is easy to characterize. We begin by generalizing the compressed hypothesis testing method and posing our problem as a multiple measurement vector (MMV) problem [24, 2, 19, 18, 11, 14, 32, 5]. In the MMV compressed sensing setting, a collection of signals are recovered simultaneously, under the assumption that they have some commonalities, such as sharing the same support. A related vein of work involves signals that are smoothly varying, where the support may not be consistent but changes slowly over time [1, 22, 35]. While the compressed hypothesis testing in [13] is certainly motivated by compressed sensing techniques, the authors do not formally frame the anomaly detection problem in the compressed sensing setting. Also, they do not focus on compressed sensing algorithms that might eliminate the need for prior knowledge of the distributions and might lead to more efficient detection for large collections of random variables.

In the following, we view the collection of random variables as a random vector and aim to identify the indices of the anomalous random variables within the random vector. We also draw an analogy between the collection of independent samples from the random vector and an ensemble of signals, where in practice these signals often become available over time. More specifically, we consider a random vector $X = (X_1, \ldots, X_N)$, where the $X_n$'s are independent random variables. We assume that each $X_n$ follows one of two distributions, $\mathcal{D}_1$ and $\mathcal{D}_2$. We call $\mathcal{D}_1$ the prevalent distribution and $\mathcal{D}_2$ the anomalous distribution. We let $\mathcal{N} = \{n \in \mathbb{N} : 1 \le n \le N\}$ denote the index set of the random variables $X_n$, and let $\mathcal{K}$ denote the index set of the $K$ random variables that follow the anomalous distribution. Let $x(\cdot,t) \in \mathbb{R}^N$ denote the independent realization of the random vector at time $t$. At each time-step $t$, we obtain $M$ mixed observations by applying the sensing matrix $\phi_t \in \mathbb{R}^{M \times N}$:

$$y_t = \phi_t \, x(\cdot,t), \quad 1 \le t \le T,$$

with $y_t \in \mathbb{R}^M$. The goal of the anomaly detection problem in this setting is thus to recover the index set $\mathcal{K}$ from the MMVs $y_t$, $t = 1, \ldots, T$.

The signals $x(\cdot,t)$ in our formulation are not necessarily sparse and may have different supports, since they are samples from a random vector and change over time. Nevertheless, there is still a close connection between our formulation and that of recovering the common sparse support of a collection of signals from MMVs.
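As a concrete illustration of this measurement model, the following sketch (our own, assuming numpy; the dimensions and Gaussian parameters are illustrative and anticipate the experimental settings of Section 3) draws $T$ independent realizations of $X$ and forms the mixed observations:

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, T, K = 100, 10, 50, 5        # variables, measurements/step, time-steps, anomalies

# Index set K of anomalous variables (unknown to the detector).
anomalous = rng.choice(N, size=K, replace=False)

# Independent realizations x(., t): prevalent entries ~ N(0, 1), anomalous ~ N(7, 1).
x = rng.normal(0.0, 1.0, size=(N, T))
x[anomalous, :] = rng.normal(7.0, 1.0, size=(K, T))

# Mixed observations y_t = phi_t @ x(., t), with a fresh M x N Gaussian phi_t each step.
phis = rng.normal(size=(T, M, N))
ys = np.einsum('tmn,nt->tm', phis, x)   # ys[t] holds y_t, shape (T, M)
```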
The index set of the anomalous random variables, which corresponds to the index set of the anomalies (realizations of anomalous random variables) in the signals $x(\cdot,t)$, is shared by all signals. This index set can thus be viewed as the common "support" of the anomalies in the signals, which motivates us to consider the applicability of the many MMV algorithms designed for signal reconstruction. Further, analytical studies of many of these algorithms are readily available. We therefore investigate which of these MMV algorithms can be applied or adapted to the anomaly detection problem under consideration, and analyze their detection accuracy both in theory and through numerical experiments. We focus on algorithms presented in [2].

1.1 Contributions

In this paper, by extending the definitions of two so-called joint sparsity models (JSMs) from [2], we introduce two new signal models, JSM-2R and JSM-3R, for the problem of anomaly detection. For JSM-2R and JSM-3R signals, we adapt several MMV signal reconstruction algorithms to anomaly detection. Additionally, we develop a new algorithm for the JSM-2R model that extends the Least Absolute Shrinkage and Selection Operator (LASSO) algorithm [12] to the MMV framework. We show theoretically and numerically that these algorithms accurately detect the anomalous random variables. We also provide numerical results that demonstrate the trade-off between the number of time-steps and the number of mixed observations per time-step needed to detect the anomalies.

1.2 Organization

In Section 2, we introduce the models JSM-2R and JSM-3R and the four algorithms we have repurposed from MMV signal recovery for MMV anomaly detection, as well as our new LASSO algorithm. We also provide theoretical guarantees in this section. In Section 3, we explore the performance of these algorithms by conducting numerical experiments for some strategic choices of the parameters involved. Finally, we conclude in Section 4. To help keep track of notation, we provide a handy reference table in the appendix. We adopt the convention that random variables are upper case and their realizations are lower case. All matrix entries have two subscripted indices: the first indicates the row position, the second the column position.

2 Method

In this section, we introduce two new signal models for the anomaly detection problem and describe five algorithms for detecting anomalous random variables under these signal models. We also provide theoretical guarantees for the algorithms. Recall that we consider the problem of detecting $K$ anomalous random variables from a collection of $N$ random variables, where $K \ll N$. The anomalous random variables have a different probability distribution from that of the remaining $N - K$ random variables. We seek to identify the $K$ anomalous random variables from $T$ independent realizations of the $N$ random variables. To emphasize our framing of this random variable problem as a compressed sensing problem, we refer to the independent realizations as signals. These $T$ signals have an important commonality: they share the same indices of anomalous entries (realizations of anomalous random variables).
Commonality among signals has already been explored in the field of distributed compressed sensing for recovering signals that have specific correlations among them. Three joint sparsity models (JSMs) were introduced in [2] to characterize different correlation structures. To utilize the commonality of the signals for anomaly detection, we propose two new signal models that are motivated by two of the JSMs defined in [2], namely JSM-2 and JSM-3. Since the signals under consideration are realizations of random variables, we term the new models JSM-2R and JSM-3R, respectively, where the appended "R" indicates the "random variable" version of the existing JSMs.

Before we define the new models, we first briefly describe JSM-2 and JSM-3. JSM-2 signals are jointly sparse signals that share the same support (the indices of non-zero entries). JSM-3 signals consist of two components: a non-sparse "common component" shared by all signals and a sparse "innovation component" that is different for each signal; the innovation components of JSM-3 signals nonetheless share the same support. We next extend these definitions to signals in the anomaly detection setting. The new JSM-2R and JSM-3R models are defined as follows.

Definition 1 (JSM-2R and JSM-3R). Let the random variable $X_n \sim \mathcal{D}_1$ if $n \notin \mathcal{K}$ and $X_n \sim \mathcal{D}_2$ if $n \in \mathcal{K}$, where $\mathcal{K}$ is the set of anomalous indices. For a signal ensemble $x \in \mathbb{R}^{N \times T}$, where each entry $x(n,t)$ denotes the realization of $X_n$ at time $t$:

1. $x$ is a JSM-2R signal ensemble when $|x(n,t)|$ is small if $n \notin \mathcal{K}$ and $|x(n,t)|$ is large if $n \in \mathcal{K}$;
2. $x$ is a JSM-3R signal ensemble when $x(n,t) = x^C_n + x^I_{(n,t)}$, such that $|x^I_{(n,t)}|$ is small if $n \notin \mathcal{K}$ and $|x^I_{(n,t)}|$ is large if $n \in \mathcal{K}$. Here $x^C_n$ is a common component shared by all $t$, and $x^I_{(n,t)}$ is an innovation component that differs across $t$.

The JSM-2R signal model assumes a small amplitude for variables generated from the prevalent distribution and a large amplitude for variables generated from the anomalous distribution. Such a model characterizes a scenario where anomalies exhibit large spikes. This model relates to a sparse signal model in which the support of the sparse signal corresponds to the set of indices of the anomalous random variables. In fact, when $\mathcal{D}_1 = \mathcal{N}(0, \sigma^2)$ and $\mathcal{D}_2 = \mathcal{N}(\mu, \sigma^2)$ with $\mu \neq 0$, the JSM-2R signal is a sparse signal with additive Gaussian noise. An example of anomalies following the JSM-2R model is a network where some of the sensors completely malfunction and produce signals with vastly different amplitudes than the rest of the sensors.

Unlike the JSM-2R model, the JSM-3R signal model introduced above places no constraints on the amplitude of the signal entries $x(n,t)$. Rather, the signals at different time-steps are assumed to share an unknown common component $x^C_n$ while having different innovation components $x^I_{(n,t)}$ at different time-steps. Of note, the common component $x^C_n$ from the prevalent distribution may or may not be the same as that from the anomalous distribution. Further, the innovation component $x^I_{(n,t)}$ is assumed to follow the JSM-2R signal model. Such a model characterizes a scenario where there exists a background signal that does not change over time, and the anomalies exhibit large spikes on top of the background signal. Because of the common component, JSM-3R signals no longer correspond to a sparse signal model. The JSM-3R model has applications in geophysical monitoring, where a constant background signal is present and anomalies appear as large spikes of erratic behavior. Figure 1 provides a visual illustration of the model nuances.

[Fig. 1 Depiction of the existing joint sparsity models (JSM-2 and JSM-3) and the new models developed for anomaly detection (JSM-2R and JSM-3R). The distributions used to generate this example are the same as the ones used for the numerical experiments in Section 3; see Table 1. The index set of the anomalies is $\mathcal{K} = \{6, 10\}$.]
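To make Definition 1 concrete, here is a minimal generator for the two ensembles (our sketch, assuming numpy; the default means and variances mirror the experimental choices in Table 1 of Section 3):

```python
import numpy as np

def jsm2r(rng, N, T, anomalous, d1=(0.0, 1.0), d2=(7.0, 1.0)):
    """JSM-2R ensemble: |x(n,t)| small off the anomalous set, large on it.
    d1 and d2 are (mean, standard deviation) of D_1 and D_2."""
    x = rng.normal(*d1, size=(N, T))
    x[anomalous, :] = rng.normal(*d2, size=(len(anomalous), T))
    return x

def jsm3r(rng, N, T, anomalous, d1=(7.0, 1.0), d2=(0.0, 10.0**0.5)):
    """JSM-3R ensemble: x(n,t) = x_C(n) + x_I(n,t). The common component x_C
    (here the per-variable mean) is constant over t; the zero-mean innovation
    x_I has small deviations off the anomalous set and large ones on it."""
    mask = np.isin(np.arange(N), anomalous)
    common = np.where(mask, d2[0], d1[0])    # x_C, shared by all t
    scale = np.where(mask, d2[1], d1[1])     # innovation std per variable
    return common[:, None] + scale[:, None] * rng.normal(size=(N, T))
```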
2.1 Algorithms

We briefly describe the five algorithms we study in this paper, three of which are for JSM-2R signals and two for JSM-3R signals. Two of the algorithms for JSM-2R signals were originally proposed for recovering JSM-2 signals: the one-step greedy algorithm (OSGA) and the multiple measurement vector simultaneous orthogonal matching pursuit (MMV-SOMP) algorithm. We further propose a new MMV version of the LASSO algorithm for detecting anomalies in JSM-2R signals and investigate its performance via numerical experiments. The two algorithms for JSM-3R signals were also proposed in [2], for recovering JSM-3 signals: the Transpose Estimation of Common Component (TECC) algorithm and the Alternating Common and Innovation Estimation (ACIE) algorithm. For each of the presented algorithms, the goal is to identify the indices of the anomalous random variables from the mixed measurements $y_t = \phi_t x(\cdot,t)$ for $t = 1, 2, \ldots, T$. The number of anomalies $K$ is assumed to be known a priori. We first describe the three algorithms we apply to anomaly detection for JSM-2R signals.

2.1.1 OSGA

The OSGA algorithm is a non-iterative greedy algorithm introduced in [2] to recover the support of JSM-2 signals based on inner products of the measurements and the columns of the sensing matrix (Algorithm 1). We show in Theorem 1 that, under a mild condition on the prevalent and anomalous distributions, the OSGA algorithm recovers the anomaly indices under the JSM-2R model using a small number of measurements per time-step. Although the OSGA algorithm is shown to work asymptotically, it may not perform well when only a small number of time-steps are available. Empirical evidence has confirmed this conjecture when the OSGA algorithm is used to reconstruct JSM-2 signals [2]. Thus we further consider approaches like matching pursuit [28, 34] for our problem; the MMV version of orthogonal matching pursuit proposed in [2] is described in the next subsection.

Algorithm 1 OSGA
1: Input: $y_1, \ldots, y_T$, $\phi_t$, $K$
2: Output: $\widehat{\mathcal{K}}$
3: for $n = 1, 2, \ldots, N$ do
4:   Compute $\xi_n = \frac{1}{T} \sum_{t=1}^{T} \langle y_t, \phi_t(\cdot,n) \rangle^2$
5: end for
6: return $\widehat{\mathcal{K}} = \{n \text{ for the } K \text{ largest } \xi_n\}$
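A direct transcription of Algorithm 1 (our sketch; it assumes the stacked `ys` of shape (T, M) and `phis` of shape (T, M, N) from the earlier snippet):

```python
import numpy as np

def osga(ys, phis, K):
    """Algorithm 1 (OSGA): score each index n by the average squared inner
    product between y_t and the n-th column of phi_t, return the top K."""
    # xi[n] = (1/T) * sum_t <y_t, phi_t(., n)>^2
    inner = np.einsum('tm,tmn->tn', ys, phis)   # inner[t, n] = <y_t, phi_t(., n)>
    xi = np.mean(inner ** 2, axis=0)
    return np.sort(np.argsort(xi)[-K:])         # indices of the K largest xi_n
```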
2.1.2 MMV-SOMP

The MMV-SOMP algorithm is an iterative greedy pursuit algorithm for recovery of jointly sparse signals. SOMP was first proposed in [39] and was adapted to the MMV framework in [2]. Since our focus is not on signal recovery but on detecting anomalous entries, we adapt this algorithm to our JSM-2R signal model. The adapted algorithm, presented in Algorithm 2, identifies the anomaly indices one at a time. In each iteration, the column index of the sensing matrices that accounts for the largest residual across the signals of all time-steps is selected. Then the remaining columns of each sensing matrix (one for each time-step) are orthogonalized against the selected column. The algorithm stops after $K$ iterations, where $K$ is the number of anomalous random variables. We show through numerical experiments in Section 3 that the adapted MMV-SOMP algorithm performs better than the OSGA algorithm for a small number of time-steps.

Algorithm 2 MMV-SOMP
1: Input: $y_1, \ldots, y_T$, $\phi_t$, $K$
2: Output: $\widehat{\mathcal{K}}$
3: Initialize: $\widehat{\mathcal{K}} = \emptyset$, residual $r_t^0 = y_t$
4: for $k = 1, \ldots, K$ do
5:   Select $n_k = \arg\max_n \sum_{t=1}^{T} \dfrac{|\langle r_t^{k-1}, \phi_t(\cdot,n) \rangle|}{\| \phi_t(\cdot,n) \|_2}$
6:   Update $\widehat{\mathcal{K}} = [\widehat{\mathcal{K}}, n_k]$
7:   Orthogonalize the selected basis vector against the previously selected vectors for all $t$, $1 \le t \le T$:
     $\gamma_t^0 = \phi_t(\cdot,n_k)$ if $k = 1$;
     $\gamma_t^k = \phi_t(\cdot,n_k) - \sum_{l=0}^{k-1} \dfrac{\langle \phi_t(\cdot,n_k), \gamma_t^l \rangle}{\| \gamma_t^l \|_2^2} \gamma_t^l$ if $k > 1$
8:   Update the residual for all $t$, $1 \le t \le T$: $r_t^k = r_t^{k-1} - \dfrac{\langle r_t^{k-1}, \gamma_t^k \rangle}{\| \gamma_t^k \|_2^2} \gamma_t^k$
9: end for
10: return $\widehat{\mathcal{K}}$

2.1.3 MMV-LASSO

The LASSO algorithm finds a sparse solution to a regression problem by constraining the $\ell_1$ norm of the solution [12]. The LASSO algorithm was also considered in [13] as an efficient algorithm for anomaly detection from mixed observations; however, the authors of [13] considered it using only one measurement at each time-step. In this paper, we extend the LASSO algorithm to the more general MMV setting and term the result the MMV-LASSO algorithm, described in Algorithm 3. The measurements $y_t \in \mathbb{R}^M$ up to $T$ time-steps are concatenated vertically into a vector $y \in \mathbb{R}^{(MT) \times 1}$; the sensing matrices $\phi_t \in \mathbb{R}^{M \times N}$ are likewise concatenated vertically into $\phi \in \mathbb{R}^{(MT) \times N}$. The concatenated measurements and sensing matrices are then fed to the regular LASSO algorithm, and the anomaly indices are taken to be the indices corresponding to the $K$ largest amplitudes of the estimate. The LASSO problem, that is, step 4 in Algorithm 3, can be tackled by various approaches [20, 26], a discussion of which is outside the scope of this paper.

Algorithm 3 MMV-LASSO
1: Input: $y_1, \ldots, y_T$, $\phi_t$, $K$
2: Output: $\widehat{\mathcal{K}}$
3: Let $y = [y_1^T, \ldots, y_T^T]^T$ and $\phi = [\phi_1^T, \ldots, \phi_T^T]^T$
4: Solve $\hat{x} = \arg\min_x \frac{1}{2} \| y - \phi x \|_2^2 + \lambda \| x \|_1$
5: Let $\hat{x}_n$ denote the $n$-th element of $\hat{x}$
6: return $\widehat{\mathcal{K}} = \{n \text{ for the } K \text{ largest } |\hat{x}_n|\}$
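A compact rendering of Algorithm 3 (our sketch; it delegates step 4 to scikit-learn's `Lasso` solver, one of many possible choices; note that sklearn scales the quadratic term by 1/(MT), so its `alpha` corresponds to λ/(MT), and λ remains a tuning parameter):

```python
import numpy as np
from sklearn.linear_model import Lasso  # one of many ways to solve step 4

def mmv_lasso(ys, phis, K, lam=0.1):
    """Algorithm 3 (MMV-LASSO): concatenate measurements and sensing matrices
    vertically, solve a single LASSO problem, keep the K largest |x_n|."""
    y = ys.reshape(-1)                       # (M*T,), i.e. [y_1; ...; y_T]
    phi = phis.reshape(-1, phis.shape[-1])   # (M*T, N), i.e. [phi_1; ...; phi_T]
    xhat = Lasso(alpha=lam, fit_intercept=False).fit(phi, y).coef_
    return np.sort(np.argsort(np.abs(xhat))[-K:])
```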
We next describe the two algorithms for anomaly detection for JSM-3R signals.

2.1.4 TECC

The key difference between JSM-2R and JSM-3R signals is that JSM-3R signals share a common component that is unknown. Thus the two algorithms for JSM-3R signals aim to first estimate the common component from the mixed measurements and subtract the contribution of this component from the measurements. The TECC algorithm was proposed in [2] for recovering JSM-3 signals. We adapt the algorithm to focus only on detecting the anomalous indices of JSM-3R signals; the adapted algorithm is given in Algorithm 4. The first step of the TECC algorithm estimates the common component of the JSM-3R signals. Using this estimate, the contribution of the remaining innovation component to the measurements can be estimated. Then the algorithms for JSM-2R signals can be applied to identify the anomaly indices. We show in Theorem 2 that the TECC algorithm is able to identify the anomalous variables under some conditions on the prevalent and anomalous distributions. As with the OSGA algorithm, while Theorem 2 guarantees the success of the TECC algorithm in the asymptotic case as $T$ goes to infinity, it may not perform well for small $T$. We subsequently describe an alternative algorithm, also proposed in [2], for cases with small $T$.

Algorithm 4 TECC
1: Input: $y_1, \ldots, y_T$, $\phi_t$, $K$
2: Output: $\widehat{\mathcal{K}}$
3: Let $y = [y_1^T, \ldots, y_T^T]^T$ and $\phi = [\phi_1^T, \ldots, \phi_T^T]^T$
4: Calculate $\hat{x}^C = \frac{1}{TM} \phi^T y$
5: Calculate $\hat{y}_t = y_t - \phi_t \hat{x}^C$
6: Estimate $\widehat{\mathcal{K}}$ from the $\hat{y}_t$ by Algorithm 1, 2, or 3
7: return $\widehat{\mathcal{K}}$

2.1.5 ACIE

The ACIE algorithm is an extension of the TECC algorithm, also introduced in [2], based on the observation that the initial estimate of the common component may not be sufficiently accurate for subsequent steps. Instead of the one-time estimation of the TECC algorithm, the ACIE algorithm iteratively refines the estimates of the common component and the innovation components. The ACIE algorithm is easily adapted to the JSM-3R signals for anomaly detection. In the adapted version, described in Algorithm 5, we first obtain an initial estimate of the anomaly index set $\widehat{\mathcal{K}}$ using the TECC algorithm. Then, for each iteration, we build a basis $B_t$ for $\mathbb{R}^M$, where $M$ is the number of measurements at each time-step: $B_t = [\phi_{t,\widehat{\mathcal{K}}}, q_t]$, where $\phi_{t,\widehat{\mathcal{K}}}$ is the subset of the basis vectors in $\phi_t$ corresponding to the indices in $\widehat{\mathcal{K}}$, and $q_t$ has orthonormal columns spanning the orthogonal complement of the column span of $\phi_{t,\widehat{\mathcal{K}}}$. We then project the measurements onto $q_t$ to obtain the part of the measurements caused by signals not in $\widehat{\mathcal{K}}$:

$\tilde{y}_t = q_t^T y_t$,  (1)
$\tilde{\phi}_t = q_t^T \phi_t$.  (2)

Then $\tilde{y}_t$ and $\tilde{\phi}_t$ are used to refine the estimate of the common component. After subtracting the contribution of this estimated common component, algorithms such as OSGA and MMV-SOMP described above can be applied to detect the anomalies.

Algorithm 5 ACIE
1: Input: $y_1, \ldots, y_T$, $\phi_t$, $K$, $L$ (number of iterations)
2: Output: $\widehat{\mathcal{K}}$
3: Let $y = [y_1^T, \ldots, y_T^T]^T$
4: Obtain an initial estimate of $\widehat{\mathcal{K}}$ from Algorithm 4
5: for $l = 1, 2, \ldots, L$ do
6:   Update $\tilde{y}_t$ and $\tilde{\phi}_t$ according to Equations (1) and (2) for all $t$, $1 \le t \le T$
7:   Update $\tilde{x}^C = \tilde{\phi}^{\dagger} \tilde{y}$, where $\tilde{y} = [\tilde{y}_1^T, \ldots, \tilde{y}_T^T]^T$, $\tilde{\phi} = [\tilde{\phi}_1^T, \ldots, \tilde{\phi}_T^T]^T$, and $\tilde{\phi}^{\dagger} = (\tilde{\phi}^T \tilde{\phi})^{-1} \tilde{\phi}^T$
8: end for
9: Calculate $\hat{y}_t = y_t - \phi_t \tilde{x}^C$
10: Estimate $\widehat{\mathcal{K}}$ from the $\hat{y}_t$ by Algorithm 1, 2, or 3
11: return $\widehat{\mathcal{K}}$
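The common-component subtraction at the heart of TECC is a few lines on top of the earlier pieces (our sketch; `detector` can be any of the JSM-2R routines, e.g. the `osga` function above):

```python
import numpy as np

def tecc(ys, phis, K, detector):
    """Algorithm 4 (TECC): estimate the common component, subtract its
    contribution from each measurement, then run a JSM-2R detector."""
    T, M, N = phis.shape
    y = ys.reshape(-1)                   # (M*T,)
    phi = phis.reshape(-1, N)            # (M*T, N)
    x_common = phi.T @ y / (T * M)       # step 4: estimate of E[X]
    ys_hat = ys - np.einsum('tmn,n->tm', phis, x_common)   # step 5
    return detector(ys_hat, phis, K)     # step 6

# e.g. tecc(ys, phis, K, detector=osga)
```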
2.2 Theoretical Guarantees

In this section we show theoretically that Algorithm 1 and Algorithm 4 (coupled with Algorithm 1 in step 6) can detect anomalies in the JSM-2R and JSM-3R settings, respectively.

Recall that Algorithm 1 is designed for JSM-2R signals, where variables generated from the prevalent distribution are much smaller in amplitude than those generated from the anomalous distribution. The following theorem shows that for JSM-2R signals, the OSGA algorithm identifies the indices of the anomalous variables asymptotically, with very few measurements at each time-step.

Theorem 1 (Adapted from [2], Theorem 8). Let the $M \times N$ sensing matrix $\phi_t$ contain entries that are i.i.d. $\mathcal{N}(0,1)$ at each time-step $t$. Suppose the random variables $X_n$ are distributed with $\mathcal{D}_1 = \mathcal{N}(0, \sigma_1^2)$ if $n \notin \mathcal{K}$ and $\mathcal{D}_2 = \mathcal{N}(\mu_2, \sigma_2^2)$ if $n \in \mathcal{K}$. Assuming $\mu_2^2 + \sigma_2^2 > \sigma_1^2$, then with $M \ge 1$ measurements per time-step, OSGA recovers $\mathcal{K}$ with probability approaching one as $T \to \infty$.

Before diving into the proof of Theorem 1, we first observe that these signals correspond to JSM-2R signals: with a zero mean and a potentially small variance $\sigma_1^2$ for the prevalent distribution $\mathcal{D}_1$, the signal entry $x(n,t)$, $n \notin \mathcal{K}$ (i.e. the realization of $X_n$ at time-step $t$), is expected to have small amplitude. In contrast, with a non-zero mean $\mu_2$ and a similar or possibly larger variance $\sigma_2^2$ for the anomalous distribution $\mathcal{D}_2$, the amplitude of $x(n,t)$, $n \in \mathcal{K}$, is expected to be much larger.

Proof. We assume, for convenience and without loss of generality, that the anomalous random variables are indexed by $\mathcal{K} = \{1, 2, \ldots, K\}$ and the prevalent random variables by $\mathcal{N} \setminus \mathcal{K} = \{K+1, \ldots, N\}$. The test statistic $\xi_n = \frac{1}{T} \sum_{t=1}^{T} \langle y_t, \phi_t(\cdot,n) \rangle^2$ is the sample mean of the random variable $\langle Y, \Phi(\cdot,n) \rangle^2$, so by the Law of Large Numbers,

$$\lim_{T \to \infty} \xi_n = \mathbb{E}\big[\langle Y, \Phi(\cdot,n) \rangle^2\big].$$

We select an arbitrary index $n$ from each of the anomalous and prevalent index sets and compute $\mathbb{E}[\langle Y, \Phi(\cdot,n) \rangle^2]$ in each case. As the final step, we compare the expected values of the two $\xi_n$ and establish that they are distinguishable under very general conditions. Without loss of generality, we select $n = K+1$ for the "prevalent case" and $n = 1$ for the "anomalous case". Note that [2] refers to these cases respectively as the "bad statistics" and the "good statistics" in their setting: there, "bad" reflects an incorrect estimate of the sparse support and "good" a correct one.

Prevalent case: Substituting $\Phi X$ for $Y$ in $\langle Y, \Phi(\cdot,K+1) \rangle$ and rearranging, we obtain $\langle Y, \Phi(\cdot,K+1) \rangle = \sum_{n=1}^{N} X_n \langle \Phi(\cdot,n), \Phi(\cdot,K+1) \rangle$. We can then write

$$\mathbb{E}\big[\langle Y, \Phi(\cdot,K+1) \rangle^2\big]
= \mathbb{E}\Big[\Big( \sum_{n=1}^{N} X_n \langle \Phi(\cdot,n), \Phi(\cdot,K+1) \rangle \Big)^2\Big]
= \sum_{n=1}^{N} \mathbb{E}[X_n^2]\, \mathbb{E}\big[\langle \Phi(\cdot,n), \Phi(\cdot,K+1) \rangle^2\big]
+ \sum_{n=1}^{N} \sum_{\substack{l=1 \\ l \neq n}}^{N} \mathbb{E}[X_n]\, \mathbb{E}[X_l]\, \mathbb{E}\big[\langle \Phi(\cdot,l), \Phi(\cdot,K+1) \rangle \langle \Phi(\cdot,n), \Phi(\cdot,K+1) \rangle\big].$$
The last step follows from the independence of $\Phi$ and $X$ and the independence of the $X_n$'s from each other. We claim that the cross-terms above sum to zero. To see this, set $\Phi(\cdot,l) = a$, $\Phi(\cdot,K+1) = b$, and $\Phi(\cdot,n) = c$, where the entries of $a$, $b$, $c$ are all i.i.d. $\mathcal{N}(0,1)$. If $l$, $K+1$, and $n$ are mutually distinct, then $a$, $b$, $c$ are mutually independent, and in this case

$$\mathbb{E}[\langle a,b \rangle \langle c,b \rangle] = \mathbb{E}[a^T b\, c^T b] = \mathbb{E}[a^T]\, \mathbb{E}[b c^T b] = 0.$$

Since the cross-terms assume $l \neq n$, it remains to consider the cases $n = K+1$ and $l = K+1$. In the case $n = K+1$ we have

$$\mathbb{E}[\langle a,b \rangle \langle b,b \rangle] = \mathbb{E}[a^T b\, b^T b] = \mathbb{E}[a^T]\, \mathbb{E}[b b^T b] = 0,$$

and similarly, in the case $l = K+1$,

$$\mathbb{E}[\langle b,b \rangle \langle c,b \rangle] = \mathbb{E}[b^T b\, c^T b] = \mathbb{E}[c^T b\, b^T b] = \mathbb{E}[c^T]\, \mathbb{E}[b b^T b] = 0.$$

Thus all cross-terms vanish, and returning to our original goal we may write

$$\mathbb{E}\big[\langle Y, \Phi(\cdot,K+1) \rangle^2\big]
= \sum_{n=1}^{K} \mathbb{E}[X_n^2]\, \mathbb{E}\big[\langle \Phi(\cdot,n), \Phi(\cdot,K+1) \rangle^2\big]
+ \mathbb{E}[X_{K+1}^2]\, \mathbb{E}\big[\| \Phi(\cdot,K+1) \|^4\big]
+ \sum_{n=K+2}^{N} \mathbb{E}[X_n^2]\, \mathbb{E}\big[\langle \Phi(\cdot,n), \Phi(\cdot,K+1) \rangle^2\big].$$

Examining each expected value individually, we recall that for $n \in \{1, \ldots, K\} = \mathcal{K}$ the $X_n$ are distributed with $\mathcal{D}_2$, so $\mathbb{E}[X_n^2] = \mathbb{E}[X_n]^2 + \mathrm{Var}(X_n) = \mu_2^2 + \sigma_2^2$. The remaining $X_n$ are distributed with $\mathcal{D}_1$, which has mean zero, so $\mathbb{E}[X_n^2] = \sigma_1^2$ in the remaining cases. In [2] it is established that $\mathbb{E}[\| \Phi(\cdot,K+1) \|^4] = M(M+2)$ and $\mathbb{E}[\langle \Phi(\cdot,n), \Phi(\cdot,K+1) \rangle^2] = M$; we may use these results without further argument because we make the same assumptions about $\Phi$. Substituting the expected values just calculated, we have that as $T$ grows large, the statistic $\xi_n$ for $n \notin \mathcal{K}$ converges to

$$\mathbb{E}\big[\langle Y, \Phi(\cdot,K+1) \rangle^2\big] = K(\mu_2^2 + \sigma_2^2)M + \sigma_1^2 M(M+2) + (N-K-1)\sigma_1^2 M = M\big[K(\mu_2^2 + \sigma_2^2) + (M+1+N-K)\sigma_1^2\big]. \quad (3)$$

Anomalous case: With $n = 1$, we proceed as in the previous case. Since the cross-terms again vanish, and the middle sum below runs over the $K - 1$ indices $n = 2, \ldots, K$,

$$\mathbb{E}\big[\langle Y, \Phi(\cdot,1) \rangle^2\big]
= \mathbb{E}[X_1^2]\, \mathbb{E}\big[\| \Phi(\cdot,1) \|^4\big]
+ \sum_{n=2}^{K} \mathbb{E}[X_n^2]\, \mathbb{E}\big[\langle \Phi(\cdot,n), \Phi(\cdot,1) \rangle^2\big]
+ \sum_{n=K+1}^{N} \mathbb{E}[X_n^2]\, \mathbb{E}\big[\langle \Phi(\cdot,n), \Phi(\cdot,1) \rangle^2\big]
= (\mu_2^2 + \sigma_2^2)M(M+2) + (K-1)(\mu_2^2 + \sigma_2^2)M + (N-K)\sigma_1^2 M
= M\big[(M+1+K)(\mu_2^2 + \sigma_2^2) + (N-K)\sigma_1^2\big]. \quad (4)$$

Combining the results of (3) and (4), we have

$$\lim_{T \to \infty} \xi_n =
\begin{cases}
M\big[(M+1+K)(\mu_2^2 + \sigma_2^2) + (N-K)\sigma_1^2\big], & n \in \mathcal{K}, \\
M\big[K(\mu_2^2 + \sigma_2^2) + (M+1+N-K)\sigma_1^2\big], & n \notin \mathcal{K}.
\end{cases}$$

The difference of the two expectations is thus $M(M+1)(\mu_2^2 + \sigma_2^2 - \sigma_1^2)$. For any $M \ge 1$ and $\mu_2^2 + \sigma_2^2 > \sigma_1^2$, the expected value of $\xi_n$ in the anomalous case is strictly larger than in the prevalent case. Therefore, as $T$ increases, OSGA distinguishes the two expected values of $\xi_n$ with overwhelming probability. □
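The two limiting values are easy to check by simulation. The following sketch (ours; the parameter values are illustrative) estimates $\mathbb{E}[\langle Y, \Phi(\cdot,n)\rangle^2]$ for one anomalous and one prevalent index and compares them against the closed forms in (3) and (4):

```python
import numpy as np

rng = np.random.default_rng(1)
N, M, K = 20, 5, 3
mu2, var1, var2 = 7.0, 1.0, 1.0
trials = 100_000

acc = np.zeros(2)   # running sums for n = 1 (anomalous) and n = K + 1 (prevalent)
for _ in range(trials):
    x = rng.normal(0.0, np.sqrt(var1), N)
    x[:K] = rng.normal(mu2, np.sqrt(var2), K)   # anomalies occupy indices 1..K
    phi = rng.normal(size=(M, N))
    y = phi @ x
    acc += [(y @ phi[:, 0]) ** 2, (y @ phi[:, K]) ** 2]

anom, prev = acc / trials
print(anom, M * ((M + 1 + K) * (mu2**2 + var2) + (N - K) * var1))   # eq. (4)
print(prev, M * (K * (mu2**2 + var2) + (M + 1 + N - K) * var1))     # eq. (3)
```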
The next theorem shows that, asymptotically, Algorithm 4 detects anomalous variables with very few measurements at each time-step for JSM-3R signals. Recall that JSM-3R signals have an unknown common component shared by the signals at all time-steps, while each signal has a different innovation component that follows the JSM-2R model. The following theorem and proof assume that Algorithm 1 is implemented for step 6 of Algorithm 4. Once results like Theorem 1 exist for Algorithms 2 and 3, any JSM-2R algorithm could be used in step 6, and Theorem 2 would still hold.

Theorem 2 (Adapted from [2], Theorem 10). Let the $M \times N$ sensing matrix $\phi_t$ at each time-step $t$ contain entries that are i.i.d. $\mathcal{N}(0,1)$. For random variables $X_n$ distributed with $\mathcal{D}_1 = \mathcal{N}(\mu_1, \sigma_1^2)$ if $n \notin \mathcal{K}$ and $\mathcal{D}_2 = \mathcal{N}(\mu_2, \sigma_2^2)$ if $n \in \mathcal{K}$, if $\sigma_2^2 > \sigma_1^2$, then with $M \ge 1$ the TECC algorithm (with OSGA) recovers $\mathcal{K}$ with probability approaching one as $T \to \infty$.

We first note that the signals in Theorem 2 correspond to JSM-3R signals: for $n \notin \mathcal{K}$, the signal entries can be written as $x(n,t) = \mu_1 + x^I_{(n,t)}$, where the $x^I_{(n,t)}$ are i.i.d. $\mathcal{N}(0, \sigma_1^2)$. With zero mean and a potentially small variance, the amplitude of $x^I_{(n,t)}$, $n \notin \mathcal{K}$, is expected to be small. For $n \in \mathcal{K}$, the signal entries can be written as $x(n,t) = \mu_2 + x^I_{(n,t)}$, where the $x^I_{(n,t)}$ are i.i.d. $\mathcal{N}(0, \sigma_2^2)$. With a larger variance $\sigma_2^2$, the amplitude of $x^I_{(n,t)}$, $n \in \mathcal{K}$, is expected to be much larger.

Proof. From the common component estimation in Algorithm 4, we have

$$\hat{x}^C = \frac{1}{TM} \phi^T y = \frac{1}{M} \cdot \frac{1}{T} \sum_{t=1}^{T} \phi_t^T y_t = \frac{1}{M} \cdot \frac{1}{T} \sum_{t=1}^{T} \phi_t^T \phi_t \, x(\cdot,t).$$

Note that this is $1/M$ times the sample mean of the random variable $\Phi^T \Phi X$. Letting $I_N$ denote the $N \times N$ identity matrix, we note that since $\Phi$ has independent $\mathcal{N}(0,1)$ entries, $\mathbb{E}[\Phi^T \Phi] = M I_N$. Since $\Phi$ is fully independent of $X$,

$$\frac{1}{M} \mathbb{E}[\Phi^T \Phi X] = \frac{1}{M} \mathbb{E}[\Phi^T \Phi]\, \mathbb{E}[X] = I_N \mathbb{E}[X] = \mathbb{E}[X].$$

Invoking the Law of Large Numbers, we have $\lim_{T \to \infty} \hat{x}^C = \mathbb{E}[X]$. Let $\widehat{X} = X - \hat{x}^C$; then as $T \to \infty$, $\widehat{X}_n$ is distributed as $\mathcal{N}(0, \sigma_1^2)$ if $n \notin \mathcal{K}$ and as $\mathcal{N}(0, \sigma_2^2)$ if $n \in \mathcal{K}$. Since $\widehat{Y} = Y - \Phi \hat{x}^C = \Phi(X - \hat{x}^C) = \Phi \widehat{X}$, it follows from Theorem 1 that with $M \ge 1$ and $\sigma_2^2 > \sigma_1^2$, TECC with OSGA recovers $\mathcal{K}$ with probability approaching one as $T \to \infty$. □

3 Experiments

In this section, we evaluate numerically the performance of Algorithms 1, 2, 3, 4, and 5 for anomaly detection. More specifically, we examine the success rate of determining the anomalous index set $\mathcal{K}$ from the signal matrix $x \in \mathbb{R}^{N \times T}$, whose columns are the signals obtained at each time-step and share the same anomalous indices. The performance is assessed under various settings by varying the number of anomalies, the number of columns of $x$ (i.e. the number of time-steps), and the number of mixed measurements $M$ at each time-step. Our focus is on the trade-off between the number of measurements $M$ and the number of time-steps $T$ required to identify $\mathcal{K}$ for varying numbers of anomalies. In all experiments, the measurement matrices $\phi_t \in \mathbb{R}^{M \times N}$ comprise independent $\mathcal{N}(0,1)$ entries, and the measurement vectors $y_t \in \mathbb{R}^M$ are calculated by $y_t = \phi_t x(\cdot,t)$ for $t = 1, \ldots, T$.

To estimate an algorithm's recovery success rate with high confidence, instead of using a fixed number of random trials across the different parameter combinations, we adaptively determine the necessary number of trials with a Jeffreys interval, a Bayesian two-tailed binomial proportion confidence interval. When the 95% confidence interval around the true success rate shrinks to a width smaller than 0.1, we report the current proportion of successes as the recovery accuracy for the algorithm.
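A minimal version of this stopping rule (our sketch, assuming scipy; it ignores the boundary conventions sometimes applied to the Jeffreys interval when all trials succeed or all fail):

```python
from scipy.stats import beta

def success_rate(run_trial, max_trials=10_000, width=0.1, level=0.95):
    """Run Bernoulli trials until the two-tailed Jeffreys interval, i.e. the
    equal-tailed credible interval of the Beta(s + 1/2, f + 1/2) posterior,
    is narrower than `width`. run_trial() returns True on a successful recovery."""
    s = f = 0
    q = (1.0 - level) / 2.0
    for _ in range(max_trials):
        if run_trial():
            s += 1
        else:
            f += 1
        lo, hi = beta.ppf([q, 1.0 - q], s + 0.5, f + 0.5)
        if hi - lo < width:
            break
    return s / (s + f)
```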
The signals (i.e. the $x(\cdot,t)$) are generated under two models corresponding to the JSM-2R and JSM-3R signal definitions introduced in Section 2. Algorithms 1, 2, and 3 are applied to the JSM-2R signals, while Algorithms 4 and 5 are applied to the JSM-3R signals. The experiments are summarized in Table 1. The JSM-2R experiments assume a mean of zero for the prevalent distribution and a much larger mean for the anomalous distribution, while letting the variances be small. As shown in the previous section, signals generated from these distributions satisfy the definition of JSM-2R. For the JSM-3R experiments, we explore two settings: in the first, the prevalent and anomalous distributions are assumed to have different means; in the second, the two distributions have the same mean. Recall from the previous section that the means of the distributions are the common components for the JSM-3R signals generated from these distributions. Note that the algorithms for JSM-3R signals have no knowledge of the mean of either the prevalent or the anomalous distribution.

Table 1 The signal models, distributions, and algorithms used in the experiments.

Signal model   D_1       D_2        Algorithms
JSM-2R         N(0, 1)   N(7, 1)    OSGA, MMV-SOMP, MMV-LASSO
JSM-3R         N(7, 1)   N(0, 10)   TECC, ACIE
JSM-3R         N(7, 1)   N(7, 10)   TECC, ACIE

We chose the distributions in Table 1 for our numerical simulations to remain consistent with [13]. We observe, in the JSM-2R experiments, that the means of the distributions N(0,1) and N(7,1) are separated by three standard deviations each, with one additional standard deviation in between for good measure. This ensures that the distributions are statistically distinct from each other. We have not explored how the detection accuracy is affected as we vary the proportion of overlap of the two distributions.

3.1 JSM-2R

We now present the results of recovering the anomalous index set for the JSM-2R signals. The signal length is fixed at $N = 100$, and results for $K = 1, 5$, and $10$ anomalies are presented. For each value of $K$, the $K$ anomalous random variables follow the distribution N(7,1) and the other $N - K$ random variables follow the distribution N(0,1). The goal is to recover the index set $\mathcal{K}$ of these $K$ random variables. Figure 2 shows the success rate of identifying $\mathcal{K}$ for the three values of $K$ using the OSGA algorithm. Each dot in the figure denotes the success rate for a specific $M$ (number of measurements per time-step) and a specific $T$ (number of time-steps), estimated from a number of trials, with the value indicated by the color (see the colorbar). Both $M$ and $T$ take values from 1 to 100. Figures 3 and 4 plot the success rates for the MMV-SOMP and MMV-LASSO algorithms, respectively. For all three algorithms, the success rate of anomaly identification increases as the number of measurements $M$ increases and/or as the number of time-steps $T$ increases. A 100% success rate is obtained with a sufficiently large number of measurements and time-steps.
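The phase diagrams can be reproduced in outline by wiring the earlier pieces together; here is a sketch of a single trial at one $(M, T)$ grid point (ours; `detect` and `make_signals` stand for any of the detectors and ensemble generators sketched above):

```python
import numpy as np

def phase_point(detect, make_signals, N, M, T, K, rng):
    """One random trial at grid point (M, T): generate signals, take mixed
    observations, detect, and succeed iff the true index set is recovered."""
    anomalous = np.sort(rng.choice(N, size=K, replace=False))
    x = make_signals(rng, N, T, anomalous)       # e.g. jsm2r from Section 2
    phis = rng.normal(size=(T, M, N))
    ys = np.einsum('tmn,nt->tm', phis, x)
    return np.array_equal(np.sort(detect(ys, phis, K)), anomalous)
```

Sweeping $M$ and $T$ over 1 to 100, estimating each cell's success rate with the Jeffreys stopping rule above, and coloring cells by that rate yields grids with the layout of the phase diagrams in Figures 2 through 9.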
There are some important differences in performance among the three algorithms. First, for the OSGA and MMV-SOMP algorithms, with a sufficiently large number of time-steps, the minimum number of measurements per time-step required for anomaly detection increases with the number of anomalies present. The performance of MMV-LASSO seems less affected by varying the number of anomalies than that of the other two algorithms. Second, comparing Figures 2 and 3 reveals that MMV-SOMP requires fewer time-steps than the OSGA algorithm to reach 100% success for a given number of measurements. Third, the MMV-LASSO algorithm requires significantly fewer measurements and time-steps for 100% success compared with OSGA and MMV-SOMP. Finally, there is asymmetry between the effect of increasing the number of measurements and that of increasing the number of time-steps on the performance of OSGA and MMV-SOMP: for these two algorithms, increasing the number of measurements improves performance more effectively than increasing the number of time-steps. No obvious asymmetry of recovery performance is found for the MMV-LASSO algorithm. The near-symmetry of the MMV-LASSO is expected, since doubling either $M$ or $T$ doubles the number of rows of the matrix $\phi$ in Algorithm 3, providing similar amounts of information to the algorithm.

For comparison with a benchmark, we note that in [13] the authors propose LASSO as an efficient algorithm to detect anomalies. The performance of their proposed method is shown as the first row, $M = 1$, in the phase diagrams of Figure 4. Here, we expand the application of LASSO by allowing a trade-off between the number of measurements per time-step, $M$, and the number of time-steps, $T$, over which measurements are taken. Applications with the ability to store multiple measurements at each time-step, while seeking to minimize the time needed to accumulate data, might prefer the MMV-LASSO of Algorithm 3 for detecting anomalies.

[Fig. 2 The recovery phase transition for the OSGA algorithm with K = 1, 5, and 10 anomalous random variables.]

[Fig. 3 The recovery phase transition for the MMV-SOMP algorithm with K = 1, 5, and 10 anomalous random variables.]

[Fig. 4 The recovery phase transition for the MMV-LASSO algorithm with K = 1, 5, and 10 anomalous random variables.]

In these experiments, we have assumed that the number of anomalies, $K$, is known. To explore the possibility of estimating the number of anomalies as we detect them, we consider the following experiments.

1. For OSGA, we calculate the test statistics $\xi_n$ in Algorithm 1 for all $n = 1, 2, \ldots, N$ and sort them in descending order; we then determine whether the amplitudes of the $\xi_n$ can be used to estimate $K$.

2. Similarly, for MMV-SOMP, we use the amplitude of $\sum_{t=1}^{T} \frac{|\langle r_t^{k-1}, \phi_t(\cdot,n) \rangle|}{\| \phi_t(\cdot,n) \|_2}$ in Algorithm 2 to determine the number of anomalies.
3. Lastly, for MMV-LASSO, we calculate the reconstructed signal $|\hat{x}|$ from Algorithm 3, sort its entries in descending order, and determine $K$ based on the amplitudes.

In each case, we fix $M = 50$ and $T = 50$ to ensure that recovery is possible when $K$ is known (as can be seen from the results in Figures 2, 3, and 4). The results, shown in Figure 5, demonstrate the potential of these methods to estimate $K$. Theoretical justification of these methods is left as future work.

[Fig. 5 Plots of the values from which indices are selected for $\widehat{\mathcal{K}}$ in the JSM-2R algorithms. The dotted line denotes the drop between the top K values and the remaining N - K values.]

3.2 JSM-3R

We next present the results of recovering the anomalous index set for the JSM-3R signals. As with the JSM-2R signals, the signal length is set to $N = 100$ and the number of anomalies takes values $K = 1, 5$, and $10$. Unlike the JSM-2R signals, the $N - K$ prevalent random variables now follow the distribution N(7,1), while the $K$ anomalous random variables follow the distribution N(0,10) or N(7,10). For a fair comparison between the algorithms, we implement the OSGA algorithm for both step 6 of the TECC algorithm and step 10 of the ACIE algorithm. The number of iterations in the ACIE algorithm is set to $L = 5$. The performance of the TECC and ACIE algorithms for varying numbers of measurements $M$ and time-steps $T$ when the anomalous distribution is N(0,10) is presented in Figures 6 and 7, where both $M$ and $T$ range from 1 to 100. The performance for the setting where the anomalous variables are distributed as N(7,10) is similar to Figures 6 and 7 and is thus omitted.

With a sufficiently large number of measurements and time-steps, both algorithms achieve 100% success in recovering the anomalous index set. For a fixed number of time-steps, the minimum number of measurements required for identification increases as the number of anomalies increases, for both algorithms. The ACIE algorithm improves on the TECC algorithm: it requires fewer time-steps to reach 100% recovery success for a given number of measurements and, similarly, fewer measurements for a given number of time-steps.

[Fig. 6 The recovery phase transition for the TECC algorithm with K = 1, 5, and 10 anomalous random variables. Here the prevalent distribution is N(7,1) and the anomalous distribution is N(0,10).]

[Fig. 7 The recovery phase transition for the ACIE algorithm with K = 1, 5, and 10 anomalous random variables. Here the prevalent distribution is N(7,1) and the anomalous distribution is N(0,10).]

Thus far, we have assumed that the prevalent and anomalous distributions have very different variances: $\sigma_1^2 = 1$ and $\sigma_2^2 = 10$ in these experiments. To investigate the performance of the algorithms as the ratio of the variances changes, we experiment with $\sigma_2^2 / \sigma_1^2 = 2, 5$, and $10$, for $K = 1, 5$, and $10$. Figure 8 shows the resulting phase transitions for the TECC algorithm, and Figure 9 shows those for the ACIE algorithm. In both cases, the algorithms behave as one might expect: the smaller the ratio between the variances, the more measurements and time-steps it takes to detect the anomalies.
[Fig. 8 The recovery phase transition for the TECC algorithm with K = 1, 5, and 10 anomalous random variables. Here the prevalent distribution is N(7,1) and the anomalous distribution is N(0, σ₂²), with σ₂² = 2, 5, and 10 shown.]

[Fig. 9 The recovery phase transition for the ACIE algorithm with K = 1, 5, and 10 anomalous random variables. Here the prevalent distribution is N(7,1) and the anomalous distribution is N(0, σ₂²), with σ₂² = 2, 5, and 10 shown.]

4 Conclusion

In this paper, we formally posed the problem of detecting anomalously distributed random variables as an MMV problem by drawing an analogy between samples of the random variables and ensembles of signals. We further established two signal models characterizing possible correlation structures among signals that contain anomalous entries. Based on the new signal models, we showed through theoretical and numerical analysis that many of the MMV algorithms for sparse signal recovery can be adapted to the anomaly detection problem. For two of the algorithms, we provided theoretical guarantees of anomaly detection in the asymptotic case. Our experimental results on synthetic data show good performance for signals conforming to either model when a sufficiently large number of time-steps is available.

While these algorithms succeed in detecting anomalies, there is still room for optimizing performance. Currently these algorithms require storing the sensing matrices at each time-step in memory. In future work, we would like to explore optimal ways to design sensing matrices that reduce this memory burden. Having provided asymptotic anomaly detection guarantees for two algorithms, we are further interested in providing such guarantees for all the algorithms presented. Additionally, we are interested in characterizing performance bounds for each algorithm in the finite-sample case. Theorem 2 shows that the anomalies can be detected by the algorithm only when the variances of the anomalous and prevalent distributions are distinct. With additional information about the means of the distributions, the algorithms could perhaps be extended to identify differences in means and so detect anomalies even with identical variances. Finally, the theoretical results presented rely on Gaussian distributions. We are interested in extending these algorithms to distributions that might not be distinguishable with the current approach.
For distributions with heavy tails, where the variance is no longer finite, a theorem relying on the law of large numbers may fail to hold, or the convergence to the expected value may be very slow. It would be interesting to investigate for which kinds of heavy-tailed distributions these algorithms start to fail.

Acknowledgements. The initial research for this effort was conducted at the Research Collaboration Workshop for Women in Data Science and Mathematics, July 17-21, held at ICERM. Funding for the workshop was provided by ICERM, AWM, and DIMACS (NSF grant CCF-1144502). SL was supported by NSF CAREER grant CCF-1149225. DN was partially supported by the Alfred P. Sloan Foundation, NSF CAREER #1348721, and NSF BIGDATA #1740325. JQ was supported by the faculty start-up fund of Montana State University.

Appendix

Here we provide a summary of notation for reference.

$N$             Number of random variables
$\mathcal{N}$   Set of random variable indices, $\{n \in \mathbb{N} : 1 \le n \le N\}$
$K$             Number of anomalous random variables
$\mathcal{K}$   Set of anomalous random variable indices, $\mathcal{K} \subset \mathcal{N}$, $|\mathcal{K}| = K$
$M$             Number of measurements per time-step
$m$             Measurement index, $1 \le m \le M$
$T$             Number of time-steps measured
$t$             Time-step index, $1 \le t \le T$
$\mathcal{D}_1$ Prevalent distribution
$\mathcal{D}_2$ Anomalous distribution
$X$             Random vector comprising independent random variables $X_1, \ldots, X_N$
$x$             $N \times T$ matrix of independent realizations of $X$ for all $T$ time-steps
$\Phi$          $M \times N$ sensing matrix with i.i.d. $\mathcal{N}(0,1)$ entries
$\phi_t$        $M \times N$ realization of $\Phi$ at time $t$
$\phi$          $(MT) \times N$ vertical concatenation of the $\phi_t$, $[\phi_1^T, \ldots, \phi_T^T]^T$
$y_t$           $M$-dimensional result of measuring the signal, $\phi_t \, x(\cdot,t)$, at time $t$
$y$             $(MT)$-dimensional vertical concatenation of the $y_t$, $[y_1^T, \ldots, y_T^T]^T$
$Y$             $M$-dimensional random vector defined by $\Phi X$
JSM             Joint sparsity model, introduced in [2]
JSM-2           Signals are nonzero only on a common set of indices
JSM-3           Signals consist of a common non-sparse component and a sparse innovation
JSM-2R          "Random variable" version of JSM-2
JSM-3R          "Random variable" version of JSM-3
OSGA            One-step greedy algorithm
MMV             Multiple measurement vector
MMV-LASSO       MMV least absolute shrinkage and selection operator
MMV-SOMP        MMV simultaneous orthogonal matching pursuit
TECC            Transpose estimation of common component
ACIE            Alternating common and innovation estimation

We adopt the convention that random variables are upper case and their realizations are lower case. All matrix entries have two subscripted indices: the first indicates the row position, the second the column position. We indicate row and column vectors by substituting a dot for the respective index.

References

[1] Angelosante D, Giannakis GB, Grossi E (2009) Compressed sensing of time-varying signals. In: Digital Signal Processing, 2009 16th International Conference on, IEEE, pp 1-8
[2] Baron D, Wakin MB, Duarte MF, Sarvotham S, Baraniuk RG (2005) Distributed compressed sensing. Preprint available at https://www.ece.rice.edu/~shri/pub/DCS_TR.pdf
[3] Baron D, Sarvotham S, Baraniuk RG (2010) Bayesian compressive sensing via belief propagation. IEEE Trans Signal Process 58(1):269-280
[4] Basseville M, Nikiforov IV, et al (1993) Detection of abrupt changes: theory and application, vol 104. Prentice Hall, Englewood Cliffs
[5] Berg E, Friedlander MP (2009) Joint-sparse recovery from multiple measurements. arXiv preprint arXiv:0904.2051
[6] Blumensath T, Davies ME (2009) Iterative hard thresholding for compressed sensing. Appl Comput Harmon Anal 27(3):265-274
[7] Candès EJ (2006) Compressive sampling. In: Proc. Int. Congress of Mathematicians, Madrid, Spain, vol 3, pp 1433-1452
[8] Candès EJ, Tao T (2005) Decoding by linear programming. IEEE Trans Info Theory 51:4203-4215
[9] Candès EJ, Tao T (2006) Near optimal signal recovery from random projections: Universal encoding strategies? IEEE Trans Info Theory 52(12):5406-5425
[10] Candès EJ, Romberg J, Tao T (2006) Stable signal recovery from incomplete and inaccurate measurements. Commun Pur Appl Math 59(8):1207-1223
[11] Chen J, Huo X (2006) Theoretical results on sparse representations of multiple measurement vectors. IEEE Trans Signal Process 54(12):4634-4643
[12] Chen SS, Donoho DL, Saunders MA (2001) Atomic decomposition by basis pursuit. SIAM Rev 43(1):129-159
[13] Cho M, Xu W, Lai L (2016) Compressed hypothesis testing: To mix or not to mix? arXiv preprint arXiv:1609.07528
[14] Cotter S, Rao B, Engan K, Kreutz-Delgado K (2005) Sparse solutions to linear inverse problems with multiple measurement vectors. IEEE Trans Signal Process 53(7):2477-2488
[15] Donoho D (2006) Compressed sensing. IEEE Trans Info Theory 52(4):1289-1306
[16] Donoho DL (2006) Compressed sensing. IEEE Trans Info Theory 52(4):1289-1306
[17] Donoho DL, Huo X (2001) Uncertainty principles and ideal atomic decomposition. IEEE Trans Info Theory 47:2845-2862
[18] Duarte M, Wakin M, Baron D, Baraniuk R (2006) Universal distributed sensing via random projections. In: Proc. Inf. Process. Sensor Networks (IPSN)
[19] Duarte MF, Wakin MB, Baron D, Sarvotham S, Baraniuk RG (2013) Measurement bounds for sparse signal ensembles via graphical models. IEEE Trans Info Theory 59(7):4280-4289
[20] Efron B, Hastie T, Johnstone I, Tibshirani R (2004) Least angle regression. Annals of Statistics 32:407-499
[21] Eldar YC, Kutyniok G (2012) Compressed sensing: theory and applications. Cambridge University Press
[22] Filos J, Karseras E, Dai W, Yan S (2013) Tracking dynamic sparse signals with hierarchical Kalman filters: a case study. In: Digital Signal Processing (DSP), 2013 18th International Conference on, IEEE, pp 1-6
[23] Foucart S, Rauhut H (2013) A mathematical introduction to compressive sensing. Birkhäuser Basel
[24] Haupt J, Nowak R (2006) Signal reconstruction from noisy random projections. IEEE Trans Info Theory 52(9):4036-4048
[25] Ji S, Xue Y, Carin L (2008) Bayesian compressive sensing. IEEE Trans Signal Process 56(6):2346-2356
[26] Keerthi SS, Shevade S (2007) A fast tracking algorithm for generalized LARS/LASSO. IEEE Transactions on Neural Networks 18(6):1826-1830
[27] Lai L, Fan Y, Poor HV (2008) Quickest detection in cognitive radio: A sequential change detection framework. In: Global Telecommunications Conference, 2008. IEEE GLOBECOM 2008. IEEE, IEEE, pp 1-5
[28] Mallat SG, Zhang Z (1993) Matching pursuits with time-frequency dictionaries. IEEE Transactions on Signal Processing 41(12):3397-3415
[29] Malloy M, Nowak R (2011) On the limits of sequential testing in high dimensions. In: 2011 Conference Record of the Forty Fifth Asilomar Conference on Signals, Systems and Computers (ASILOMAR), pp 1245-1249
[30] Malloy M, Nowak R (2011) Sequential analysis in high-dimensional multiple testing and sparse recovery. In: 2011 IEEE International Symposium on Information Theory Proceedings, pp 2661-2665
[31] Malloy ML, Tang G, Nowak RD (2012) Quickest search for a rare distribution. In: 2012 46th Annual Conference on Information Sciences and Systems (CISS), pp 1-6
[32] Mishali M, Eldar YC (2009) Reduce and boost: recovering arbitrary sets of jointly sparse vectors. IEEE Trans Signal Process 56(10):4692-4702
[33] Needell D, Vershynin R (2007) Signal recovery from incomplete and inaccurate measurements via Regularized Orthogonal Matching Pursuit. IEEE J Sel Top Signa 4:310-316
[34] Pati YC, Rezaiifar R, Krishnaprasad PS (1993) Orthogonal matching pursuit: recursive function approximation with applications to wavelet decomposition. In: Proceedings of 27th Asilomar Conference on Signals, Systems and Computers, pp 40-44 vol 1
[35] Patterson S, Eldar YC, Keidar I (2014) Distributed compressed sensing for static and time-varying networks. IEEE Trans Signal Process 62(19):4931-4946
[36] Poor HV, Hadjiliadis O (2009) Quickest detection, vol 40. Cambridge University Press, Cambridge
[37] Tropp JA (2006) Just relax: Convex programming methods for subset selection and sparse approximation. IEEE Trans Info Theory 52(3):1030-1051
[38] Tropp JA, Gilbert AC (2007) Signal recovery from random measurements via Orthogonal Matching Pursuit. IEEE Trans Info Theory 53(12):4655-4666
[39] Tropp JA, Gilbert AC, Strauss M (2005) Simultaneous sparse approximation via greedy pursuit. In: ICASSP
[40] Xia Y, Tse D (2006) Inference of link delay in communication networks. IEEE J Sel Areas in Commun 24(12):2235-2248
[41] Yu G, Sapiro G (2011) Statistical compressed sensing of Gaussian mixture models. IEEE Trans Signal Process 59(12):5842-5858