Towards stationary time-vertex signal processing
Authors: Nathanaël Perraudin, Andreas Loukas, Francesco Grassi, Pierre Vandergheynst
École Polytechnique Fédérale de Lausanne, Switzerland
(A. Loukas and N. Perraudin contributed equally to this work.)

Abstract: Graph-based methods for signal processing have shown promise for the analysis of data exhibiting irregular structure, such as those found in social, transportation, and sensor networks. Yet, though these systems are often dynamic, state-of-the-art methods for signal processing on graphs ignore the dimension of time, treating successive graph signals independently or taking a global average. To address this shortcoming, this paper considers the statistical analysis of time-varying graph signals. We introduce a novel definition of joint (time-vertex) stationarity, which generalizes the classical definition of time stationarity and the more recent definition appropriate for graphs. Joint stationarity gives rise to a scalable Wiener optimization framework for joint denoising, semi-supervised learning, or, more generally, inverting a linear operator, that is provably optimal. Experimental results on real weather data demonstrate that taking into account graph and time dimensions jointly can yield significant accuracy improvements in the reconstruction effort.

I. INTRODUCTION

Whether examining opinion dichotomy in social networks [1], how traffic evolves in the roads of a city [2], or neuronal activation patterns present in the brain [3], much of the high-dimensional data one encounters exhibit complex non-Euclidean properties. This realization has been the driving force behind recent efforts to re-invent the mathematical models used for data analysis. Within the field of signal processing, one of the main research thrusts has been to extend harmonic analysis to graph signals, i.e., signals supported on the vertices of irregular graphs. The key breakthrough in the field has been the introduction of a notion of frequency appropriate for graph signals and of the associated graph Fourier transform (GFT). Because it enables us to process signals taking into account complex relations between variables, the GFT has led to advances in problems such as denoising [4] and semi-supervised learning [5], [6].

Yet, state-of-the-art graph-frequency-based methods often fail to produce useful results when applied to real datasets. One of the main reasons underlying this shortcoming is that they ignore the time dimension, for example by treating successive signals independently or by performing a global average [3], [7], [8]. On the contrary, many of the systems to which graph signal processing is applied are dynamic. Consider for instance a sensor network, and suppose that we want to infer the weather conditions on a mountain given temperature measurements from a small set of weather stations. Approaches that do not take into account the temporal evolution of weather will be biased by seasonal variations and unable to provide insights about transient phenomena. Moreover, when the weather dynamics are slow and predictable, taking into account the time dimension, e.g., by imposing a smoothness prior, can yield accuracy improvements in the reconstruction effort. Motivated by this need, this paper considers the statistical analysis of time-evolving graph signals.
Our results are inspired by the recent introduction of a joint temporal and graph Fourier transform (JFT), a generalization of the GFT appropriate for time-varying graph signals [9], and by the recent generalization of stationarity to graphs [7], [10], [11]. Our main contribution is a novel definition of time-vertex (wide-sense) stationarity, or joint stationarity for short. We believe that the proposed definition is natural, as it elegantly generalizes existing definitions of stationarity in the time and vertex domains. We show that joint stationarity carries along important properties classically associated with stationarity. Moreover, our definition leads to a Wiener framework for denoising and interpolating time-varying graph signals that yields superior performance compared to state-of-the-art methods in the time or vertex domains. The proposed framework is composed of two key components: a scalable joint power spectral density estimation method, and an optimization framework suitable for deconvolution under additive error. The latter is shown to be optimal in the mean-squared error sense. Experiments with a real weather dataset illustrate the superior performance of our method, ultimately demonstrating that joint stationarity is a useful assumption in practice.

II. PRELIMINARIES

Our objective is to model and predict the evolution of graph signals, i.e., signals supported on the vertices $\mathcal{V} = \{v_1, v_2, \ldots, v_N\}$ of a weighted undirected graph $G = (\mathcal{V}, \mathcal{E}, W_G)$, with $\mathcal{E}$ the set of edges and $W_G$ the weighted adjacency matrix. A more convenient matrix representation of $G$ is the (combinatorial¹) Laplacian matrix $L_G = \mathrm{diag}(W_G \mathbf{1}_N) - W_G$, where $\mathbf{1}_N$ is the all-ones vector of size $N$ and $\mathrm{diag}(W_G \mathbf{1}_N)$ is the diagonal degree matrix.

Harmonic vertex analysis. In the context of graph signal processing, the importance of the Laplacian matrix stems from it giving rise to a graph-specific notion of frequency. The graph Fourier transform (GFT) of a graph signal $x \in \mathbb{R}^N$ is defined as $\mathrm{GFT}\{x\} = U_G^* x$, where $U_G$ is the eigenvector matrix of $L_G$, so that $L_G = U_G \Lambda_G U_G^*$. The GFT allows us to extend filtering to graphs [13], [14], [15]. Filtering a signal $x$ with a graph filter $h(L_G)$ corresponds to element-wise multiplication in the spectral domain:
$$h(L_G)\, x \triangleq \mathrm{GFT}^{-1}\{ h(\Lambda_G) \circ \mathrm{GFT}\{x\}\} = U_G\, h(\Lambda_G)\, U_G^*\, x,$$
where the scalar function $h : \mathbb{R}_+ \mapsto \mathbb{R}$, referred to as the graph frequency response, has been applied to each diagonal entry of $\Lambda_G$. It is often convenient to represent the diagonal of matrix $h(\Lambda_G)$ as a vector, in which case we write $h = \mathrm{diag}(h(\Lambda_G))$. The notation $U_G^*$ denotes the transposed complex conjugate of $U_G$, $U_G^\top$ the transpose of $U_G$, and $\bar{U}_G$ the complex conjugate of $U_G$.

¹ Though we use the combinatorial Laplacian in our presentation, our results are applicable to any positive semi-definite matrix representation of a graph, or to the recently introduced shift operator [12].

We will also use the notion of graph localization [15], [7], a generalization appropriate for graphs of the translation operator used in the classical setting². Localizing a filter with frequency response $h$ onto vertex $v_i$ reads
$$\mathcal{T}^G_i h \triangleq h(L_G)\, \delta_i, \qquad (1)$$
where $\delta_i$ is a Kronecker delta centered at vertex $v_i$. For a sufficiently regular function $h$, this operation localizes the filter around $v_i$ [15, Theorem 1 and Corollary 2].
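For concreteness, the following is a minimal numpy sketch of graph filtering and of the localization operator (1). It is illustrative only (the paper's own simulations use the MATLAB GSPBox [26]); the function name, the random test graph, and the heat-like response are assumptions made here for the example.

```python
import numpy as np

def graph_filter(W, x, h):
    """Filter a graph signal x with spectral response h via the GFT.

    W : (N, N) symmetric weighted adjacency matrix
    x : (N,) graph signal
    h : callable mapping graph eigenvalues to spectral weights
    """
    L = np.diag(W.sum(axis=1)) - W          # combinatorial Laplacian L_G
    lam, U = np.linalg.eigh(L)              # L_G = U_G Lambda_G U_G*
    x_hat = U.conj().T @ x                  # GFT{x} = U_G* x
    return U @ (h(lam) * x_hat)             # U_G h(Lambda_G) U_G* x

# Localization (1): T^G_i h = h(L_G) delta_i, i.e., filter a Kronecker delta.
rng = np.random.default_rng(0)
A = (rng.random((20, 20)) < 0.2).astype(float)
W = np.triu(A, 1); W = W + W.T              # random symmetric test graph
delta0 = np.zeros(20); delta0[0] = 1.0
localized = graph_filter(W, delta0, lambda lam: np.exp(-2.0 * lam))  # heat-like response
```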
Evaluated at the $i_2$-th vertex, the localization (1) becomes
$$\mathcal{T}^G_{i_1} h\,(i_2) = \sum_{n=1}^{N} h(\lambda_n)\, \bar{u}_n(i_1)\, u_n(i_2), \qquad (2)$$
where we use the notation $u_n(i) = [U_G]_{i,n}$ and $\lambda_n = [\Lambda_G]_{n,n}$. Note that in the expression above the localization operator takes precedence over indexing, i.e., $\mathcal{T}^G_{i_1} h\,(i_2) = [\mathcal{T}^G_{i_1} h](i_2)$; this convention is used throughout this paper.

The concept of localization is intimately linked to that of translation in the time domain. If $\mathcal{T}^T_\tau h$ is the localization operator taken on a cycle graph of $T$ vertices (representing time), localization is equivalent to translation:
$$\mathcal{T}^T_\tau h\,(t) = \mathcal{T}^T_0 h\,(t - \tau), \quad \text{for all } t, \tau = 1, \ldots, T. \qquad (3)$$
In simple words, for a cyclic graph the localization operator computes the inverse Fourier transform of the frequency response $h$ and translates it to vertex $v_i$. We can verify this using the fact that the complex exponentials of the Fourier basis form the eigenvector set of all cyclic graphs [16], which together with (2) implies that
$$\mathcal{T}^T_{t_1} h\,(t_2) = \frac{1}{T} \sum_{\tau=1}^{T} h(\omega_\tau)\, e^{-\frac{2\pi j (\tau-1) t_1}{T}}\, e^{\frac{2\pi j (\tau-1) t_2}{T}} = \frac{1}{T} \sum_{\tau=1}^{T} h(\omega_\tau)\, e^{\frac{2\pi j (\tau-1)(t_2 - t_1)}{T}} = \mathcal{T}^T_0 h\,(t_2 - t_1). \qquad (4)$$
Above, $\mathcal{T}^T_0 h = U_T\, h$ is the inverse Fourier transform of $h$ and $U_T$ is the orthonormal Fourier basis. In the case of irregular graphs, localization differs further from translation because the shape of the localized filter adapts to the graph and varies as a function of its topology. Additional insights about the localization operator can be found in [15], [13], [17], [7].

² Stationarity is classically defined as the invariance of the statistical moments of a signal with respect to translation. This definition, however, cannot be directly generalized to graphs, which do not possess regular structure and thus lack an isometric translation operator.

Harmonic time-vertex analysis. Suppose that a graph signal $x_t$ is sampled at $T$ successive regular intervals of unit length. The time-varying graph signal $X = [x_1, x_2, \ldots, x_T] \in \mathbb{R}^{N \times T}$ is then the matrix having graph signal $x_t$ as its $t$-th column. Equivalently, $X = [x_1, x_2, \ldots, x_N]^\top$ holds $N$ temporal signals $x_i \in \mathbb{R}^T$, one for each vertex $v_i$. Throughout this paper, we denote by $x = \mathrm{vec}(X)$ (without subscript) the vectorized representation of the matrix $X$.

The frequency representation of $X$ is given by the joint (time-vertex) Fourier transform (or JFT for short)
$$\mathrm{JFT}\{X\} = U_G^*\, X\, \bar{U}_T, \qquad (5)$$
where, once more, $U_G$ is the graph Laplacian eigenvector matrix, whereas $\bar{U}_T$ is the complex conjugate of the DFT matrix, scaled by $1/\sqrt{T}$ so as to be unitary. In fact, matrix $U_T$ is the eigenvector matrix of the lag operator (or of the Laplacian matrix) $L_T = U_T \Lambda_T U_T^*$. Denote by $\Omega$ the diagonal matrix of angular frequencies (i.e., $\Omega_{tt} = \omega_t = 2\pi t / T$). In case $L_T$ is the lag operator, we have $\Lambda_T = e^{-j\Omega}$, where $j = \sqrt{-1}$; when $L_T$ is the Laplacian, we have $\Lambda_T = \mathrm{real}(I - e^{-j\Omega})$. Expressed in vector form, the joint Fourier transform becomes $\mathrm{JFT}\{x\} = U_J^*\, x$, where $U_J = U_T \otimes U_G$ is unitary and the operator $\otimes$ denotes the Kronecker product. The inverse joint Fourier transforms in matrix and vector form are, respectively, $\mathrm{JFT}^{-1}\{X\} = U_G\, X\, U_T^\top$ and $\mathrm{JFT}^{-1}\{x\} = U_J\, x$. For an in-depth discussion of the JFT and its properties, we refer the reader to [9].
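As a sanity check of (5) and of its vector form, here is a small numpy sketch. It is illustrative: the DFT normalization and the column-major vec(.) convention are assumptions chosen to be consistent with the index ordering $k = N(\tau - 1) + n$ used later.

```python
import numpy as np

def jft(X, U_G):
    """Joint Fourier transform (5): JFT{X} = U_G^* X conj(U_T)."""
    T = X.shape[1]
    U_T = np.fft.fft(np.eye(T)).conj() / np.sqrt(T)   # orthonormal Fourier basis of the cycle graph
    return U_G.conj().T @ X @ U_T.conj()

# Consistency check with the vector form JFT{x} = U_J^* x, where U_J = kron(U_T, U_G),
# using the column-major vec(.) convention.
rng = np.random.default_rng(1)
N, T = 5, 8
X = rng.standard_normal((N, T))
W = np.triu(rng.random((N, N)), 1); W = W + W.T
_, U_G = np.linalg.eigh(np.diag(W.sum(1)) - W)
U_T = np.fft.fft(np.eye(T)).conj() / np.sqrt(T)
U_J = np.kron(U_T, U_G)
assert np.allclose(jft(X, U_G).flatten(order='F'), U_J.conj().T @ X.flatten(order='F'))
```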
Leveraging the definition of the JFT, filtering and localization can also be extended to the joint (time-vertex) domain. A joint filter $h(L_J)$ is defined by a function on the joint spectral domain, $h : \mathbb{R}_+ \times \mathbb{R} \mapsto \mathbb{R}$, evaluated at the graph eigenvalues $\lambda$ and the angular frequencies $\omega$. The output of a joint filter is
$$h(L_J)\, x \triangleq U_J\, h(\Lambda_G, \Omega)\, U_J^*\, x, \qquad (6)$$
where $h(\Lambda_G, \Omega)$ is an $NT \times NT$ diagonal matrix with $[h(\Lambda_G, \Omega)]_{k,k} = h(\lambda_n, \omega_\tau)$ and $k = N(\tau - 1) + n$. Equivalently, if we define the $N \times T$ matrix $H$ with $H_{n,\tau} = h(\lambda_n, \omega_\tau)$ for every graph frequency $\lambda_n$ and temporal frequency $\omega_\tau$, we have
$$h(L_J)\, x = \mathrm{vec}\big( \mathrm{JFT}^{-1}\{ H \circ \mathrm{JFT}\{X\} \} \big), \qquad (7)$$
with $\circ$ being the element-wise multiplication (Hadamard product). In analogy to (1), we define the joint localization operator as
$$\mathcal{T}^J_{i,t} h \triangleq \mathrm{mat}\big( h(L_J)\, (\delta_t \otimes \delta_i) \big) = \mathrm{JFT}^{-1}\{ H \circ \mathrm{JFT}\{\delta_i \delta_t^\top\} \}, \qquad (8\text{--}9)$$
where $\mathrm{mat}(\cdot)$ is the matricization operator, such that $\mathrm{mat}(\mathrm{vec}(X)) = X$. In order to link (8) with graph localization (1) and the classical translation operator, we observe the following relations:
$$\mathcal{T}^J_{i_1,t_1} h\,(i_2, t_2) = \frac{1}{T} \sum_{n=1}^{N} \sum_{\tau=1}^{T} h(\lambda_n, \omega_\tau)\, \bar{u}_n(i_1)\, u_n(i_2)\, e^{\frac{2\pi j (\tau-1)(t_2 - t_1)}{T}} = \mathcal{T}^J_{i_1,0} h\,(i_2, t_2 - t_1) \qquad (10)$$
$$= \frac{1}{T} \sum_{n=1}^{N} \Bigg[ \sum_{\tau=1}^{T} h(\lambda_n, \omega_\tau)\, e^{\frac{2\pi j (\tau-1)(t_2 - t_1)}{T}} \Bigg] \bar{u}_n(i_1)\, u_n(i_2) = \sum_{n=1}^{N} \mathcal{T}^T_{t_1} H_{n,\cdot}\,(t_2)\, \bar{u}_n(i_1)\, u_n(i_2) \qquad (11)$$
$$= \frac{1}{T} \sum_{\tau=1}^{T} \Bigg[ \sum_{n=1}^{N} h(\lambda_n, \omega_\tau)\, \bar{u}_n(i_1)\, u_n(i_2) \Bigg] e^{\frac{2\pi j (\tau-1)(t_2 - t_1)}{T}} = \frac{1}{T} \sum_{\tau=1}^{T} \mathcal{T}^G_{i_1} H_{\cdot,\tau}\,(i_2)\, e^{\frac{2\pi j (\tau-1)(t_2 - t_1)}{T}}. \qquad (12)$$
The above equations provide three key insights about the joint localization operator:
1) From (10), we observe that the localization operator performs a translation along the time dimension.
2) From (11), it follows that joint localization consists of first localizing (translating) each row of the matrix $H$ independently in time, and then localizing each column of the resulting matrix independently on the graph. Joint localization is thus equivalent to a successive application of a graph and a time localization operator.
3) Furthermore, according to (12), the successive localizations in time and on the graph can be performed in either order.
When the filter is separable, i.e., when the joint frequency response can be written as the product of a frequency response defined solely in the vertex domain and one defined solely in the time domain, $h(\lambda, \omega) = h_1(\lambda)\, h_2(\omega)$, the joint localization is simply
$$h(L_J)\, (\delta_t \otimes \delta_i) = \mathrm{vec}\big( h_1(L_G)\, \delta_i \delta_t^\top\, h_2(L_T) \big). \qquad (13)$$
Nevertheless, in this work we do not assume that the filter is separable, as separability is too restrictive a hypothesis.
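A compact numpy sketch of joint filtering as in (7) follows. The function name, the sampling of the response on the $(\lambda_n, \omega_\tau)$ grid, and the example response are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def joint_filter(X, U_G, lam, h):
    """Apply a joint filter with response h(lambda, omega) to X, as in (7)."""
    N, T = X.shape
    U_T = np.fft.fft(np.eye(T)).conj() / np.sqrt(T)    # orthonormal DFT basis
    omega = 2 * np.pi * np.arange(T) / T
    H = h(lam[:, None], omega[None, :])                # H[n, tau] = h(lambda_n, omega_tau)
    X_hat = U_G.conj().T @ X @ U_T.conj()              # JFT{X}
    # Real part: exact when h is symmetric in omega, i.e. h(l, w) = h(l, 2*pi - w).
    return (U_G @ (H * X_hat) @ U_T.T).real            # JFT^{-1}{H o JFT{X}}

# Example of a non-separable low-pass response (illustrative choice):
# Y = joint_filter(X, U_G, lam, lambda l, w: 1.0 / (1.0 + l + (1.0 - np.cos(w))))
```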
III. JOINT TIME-VERTEX STATIONARITY

Let $X$ be a discrete multivariate stochastic process with a finite number of time steps $T$, indexed by vertex $v_i$ and time $t$. We refer to such processes as joint time-vertex processes, or joint processes for short.

To put our results in context, let us first review the established definitions of stationarity over the time and vertex domains, respectively. Our definition will emerge as a consequence of both. We note that, although our exposition is self-contained, the reader will benefit from familiarity with previous work on stationarity on graphs [7].

Definition 1 (Time stationarity). A joint process $X$ is Time Wide-Sense Stationary (TWSS) if and only if the following two properties hold independently for each vertex $v_i$:
1) The expected value is constant over the time domain: $E[x_i] = c_i \mathbf{1}_T$.
2) There exists a function $\gamma_i$ for which
$$[\Sigma_{x_i}]_{t,\cdot} = \big[ E[x_i x_i^*] - E[x_i]\, E[x_i]^* \big]_{t,\cdot} = \mathcal{T}^T_t \gamma_i.$$
Function $\gamma_i$ is the autocorrelation function of signal $x_i$ in the Fourier domain, and is also referred to as the Time Power Spectral Density (TPSD).

We remind the reader that on a cyclic graph, localizing the TPSD is equivalent to translating the autocorrelation. Thus, using (3) we recover the classical definition, in which the autocorrelation function depends only on the time difference: $[\Sigma_{x_i}]_{t,\tau} = \mathcal{T}^T_0 \gamma_i\,(t - \tau)$. Simply put, assuming time stationarity is equivalent to asserting that the statistics of the first two moments are independent of time. We also observe that the TPSD is the Fourier transform of the autocorrelation, in agreement with the Wiener–Khintchine theorem [18]. To summarize, the TPSD encodes the statistics of the signal in the spectral domain. This consideration allows us to generalize the concept of stationarity to graph signals; please refer to the work of Perraudin and Vandergheynst [7] for a more detailed study. We state a variation of their definition in the following.

Definition 2 (Vertex stationarity). A joint process $X = [x_1, x_2, \ldots, x_T]$ is called Vertex Wide-Sense (or second-order) Stationary (VWSS) if and only if the following two properties hold independently for each time $t$:
1) The expected value is in the null space of the Laplacian: $L_G\, E[x_t] = 0_N$.
2) There exists a graph filter $s_t(L_G)$ for which
$$[\Sigma_{x_t}]_{i,\cdot} = \big[ E[x_t x_t^*] - E[x_t]\, E[x_t^*] \big]_{i,\cdot} = \mathcal{T}^G_i s_t.$$
Function $s_t$ is the autocorrelation function of signal $x_t$ in the graph Fourier domain, and is also referred to as the Vertex Power Spectral Density (VPSD).

Considering that the null space of $L_T$, in both the normalized and the combinatorial case, is the span of the constant eigenvector $\mathbf{1}_T$, the first condition of the above definition is analogous to the corresponding condition of the time stationarity definition. Moreover, the condition on the second moment is a natural generalization of the second condition of time stationarity where, instead of imposing translation invariance, we suppose invariance under the localization operator. This second condition is in fact equivalent to a generalization of the Wiener–Khintchine theorem and implies that $\Sigma_{x_t}$ is jointly diagonalizable with $L_G$ (in the TWSS case the covariance is Toeplitz, indeed circulant, and thus diagonalizable with the DFT matrix $U_T$).

We now unify TWSS and VWSS in order to leverage both the time- and vertex-domain statistics.

Definition 3 (Joint stationarity). A process $X$ is called Jointly (or time-vertex) Wide-Sense Stationary (JWSS) if and only if its vector form $x = \mathrm{vec}(X)$ satisfies the following properties:
1) The expected value is in the null space of the Laplacian: $L_J\, E[x] = 0_{NT}$.
2) There exists a joint filter $h(L_J)$ for which
$$[\Sigma_x]_{k,\cdot} = \big[ E[x x^*] - E[x]\, E[x^*] \big]_{k,\cdot} = \mathrm{vec}\big( \mathcal{T}^J_{i,t} h \big), \quad \text{where } k = N(t-1) + i.$$
Function $h$ is the autocorrelation function of signal $x$ in the joint Fourier domain, and is also referred to as the time-vertex power spectral density, or Joint Power Spectral Density (JPSD) for short.

The definition above is in fact equivalent to stating that the mean is constant and that the covariance matrix $\Sigma_x$ is jointly diagonalizable with the joint Laplacian $L_J$. The latter statement is (also) a generalization of the Wiener–Khintchine theorem and is proven next.
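Before stating the formal result, note that this spectral characterization suggests a direct numerical test: whiten the covariance with $U_J$ and check that it is diagonal, in which case the diagonal holds the JPSD. A small illustrative sketch follows; the function name, the tolerance, and the reshaping into an $N \times T$ matrix are assumptions made here.

```python
import numpy as np

def jpsd_from_covariance(Sigma, U_G, U_T, tol=1e-8):
    """Check whether Sigma is (approximately) diagonalized by U_J = kron(U_T, U_G).

    Returns the N x T JPSD matrix H if so, and None otherwise.
    """
    U_J = np.kron(U_T, U_G)                       # joint Fourier basis
    S_hat = U_J.conj().T @ Sigma @ U_J            # should be diagonal for a JWSS process
    off = S_hat - np.diag(np.diag(S_hat))
    if np.max(np.abs(off)) > tol * np.max(np.abs(S_hat)):
        return None                               # not jointly diagonalizable: not JWSS
    N, T = U_G.shape[0], U_T.shape[0]
    # The NT diagonal entries are indexed by k = N*(tau-1) + n; reshape them into H[n, tau].
    return np.real(np.diag(S_hat)).reshape(T, N).T
```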
Theorem 1. A process $X$ is JWSS if and only if 1) $L_J\, E[x] = 0_{NT}$, and 2) its covariance matrix is jointly diagonalizable by the joint Fourier basis $U_J$.

Proof: To prove the equivalence between the two definitions (i.e., the JWSS definition and the one stated by the theorem), we prove a one-to-one equivalence between their respective conditions. Clearly, the first conditions of both definitions are identical. The second condition $[\Sigma_x]_{k,\cdot} = \mathrm{vec}(\mathcal{T}^J_{i,t} h)$ of the joint stationarity definition, together with (8), asserts that the covariance is a joint filter, $\Sigma_x = h(L_J)$, for some function $h$. We therefore have that $\Sigma_x = U_J\, h(\Lambda_G, \Omega)\, U_J^*$ with $h(\Lambda_G, \Omega)$ diagonal, which implies our claim.

Interestingly, assuming joint stationarity is equivalent to assuming stationarity in both domains at the same time.

Theorem 2. If a joint process $X$ is JWSS, then it is both TWSS and VWSS.

Proof: It is straightforward to see that $L_J\, E[x] = 0_{NT}$ if and only if both $L_G\, E[x_t] = 0_N$ and $L_T\, E[x_i] = 0_T$ hold for all $t$ and $i$. We still need to show that the second-order moment properties of TWSS and VWSS follow from that of JWSS. If a process is jointly stationary, then from (12) we have that, for each vertex $v_i$,
$$[\Sigma_{x_i}]_{t_1,t_2} = [\mathcal{T}^J_{i,t_1} h](i, t_2) = \frac{1}{T} \sum_{\tau=1}^{T} \mathcal{T}^G_i H_{\cdot,\tau}\,(i)\, e^{\frac{2\pi j (\tau-1)(t_2 - t_1)}{T}},$$
which is equivalent to asserting that $x_i$ is stationary in time with TPSD $\gamma_i(\omega_\tau) = [\mathcal{T}^G_i H_{\cdot,\tau}](i)$. Similarly, using (11), we find that for each time $t$,
$$[\Sigma_{x_t}]_{i_1,i_2} = [\mathcal{T}^J_{i_1,t} h](i_2, t) = \sum_{n=1}^{N} \mathcal{T}^T_t H_{n,\cdot}\,(t)\, \bar{u}_n(i_1)\, u_n(i_2),$$
meaning that process $x_t$ is vertex stationary with VPSD $s_t(\lambda_n) = [\mathcal{T}^T_t H_{n,\cdot}](t)$.

Example 1 (White i.i.d. noise). White i.i.d. noise $w \in \mathbb{R}^{NT}$ is JWSS for any graph. Indeed, the first moment $E[w]$ is constant over all times and vertices. Moreover, since it is an identity matrix, the covariance of $w$ is diagonalized by the joint Fourier basis of any graph: $\Sigma_w = I = U_J\, I\, U_J^*$. This last equation tells us that the JPSD is constant which, as in the classical case, implies that white noise contains all joint (time-vertex) frequencies.

An interesting property of JWSS processes is that stationarity is preserved by filtering.

Theorem 3. When a joint filter $f(L_J)$ is applied to a JWSS process $X$, the result $Y$ remains JWSS, with mean $f(0,0)\, E[X]$ and JPSD satisfying
$$h_Y(\lambda, \omega) = f^2(\lambda, \omega) \cdot h_X(\lambda, \omega). \qquad (14)$$

Proof: The output of the filter can be written in vector form as $y = f(L_J)\, x$. If the input signal $x$ is JWSS, the first moment of the filter output is $E[f(L_J)\, x] = f(L_J)\, E[x] = f(0,0)\, E[x]$, where the last equality follows from the fact that, by definition, $E[x]$ lies in the null space of $L_J$. The computation of the second moment gives
$$\Sigma_y = E\big[ f(L_J) x\, (f(L_J) x)^* \big] - E[f(L_J) x]\, E[(f(L_J) x)^*] = f(L_J)\big( E[x x^*] - E[x]\, E[x^*] \big) f(L_J)^* = f(L_J)\, \Sigma_x\, f(L_J)^* = U_J\, f^2(\Lambda_G, \Omega)\, h_X(\Lambda_G, \Omega)\, U_J^*,$$
which, by Theorem 1, is JWSS since it is diagonalizable by $U_J$.

Theorem 3 provides a simple way to artificially produce JWSS signals with a prescribed PSD $f^2$: simply filter white noise with the joint filter $f(L_J)$. The resulting signal will be stationary with PSD $f^2$, and this holds for white noise following any distribution (not only Gaussian).
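Following this recipe, here is a minimal sketch for drawing realizations of a JWSS process with prescribed JPSD $f^2$. The helper name and the DFT convention are assumptions; $U_G$ and lam are assumed to come from an eigendecomposition of $L_G$, as in the earlier sketches.

```python
import numpy as np

def sample_jwss(U_G, lam, T, f, rng=None):
    """Draw one realization of a JWSS process by filtering white noise with a
    joint filter f(L_J), as suggested by Theorem 3 (the resulting JPSD is f^2)."""
    rng = np.random.default_rng() if rng is None else rng
    N = U_G.shape[0]
    U_T = np.fft.fft(np.eye(T)).conj() / np.sqrt(T)    # orthonormal DFT basis
    omega = 2 * np.pi * np.arange(T) / T
    F = f(lam[:, None], omega[None, :])                # joint frequency response on the grid
    W = rng.standard_normal((N, T))                    # white i.i.d. noise (any distribution works)
    W_hat = U_G.conj().T @ W @ U_T.conj()              # JFT{W}
    # Real part: exact when f is symmetric in omega; otherwise keep the complex output.
    return (U_G @ (F * W_hat) @ U_T.T).real            # JFT^{-1}{F o JFT{W}}
```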
In the sequel, we assume for simplicity that the signal is centered, i.e., $E[x] = 0$. Whenever it is clear from the context, we simply refer to the TPSD, VPSD, and JPSD as the PSD.

IV. JOINT POWER SPECTRAL DENSITY ESTIMATION

As the JPSD is central to our method, we need a reliable way to compute it. Since we take into account the correlation both in the time and in the vertex domain, the actual size of the covariance matrix $\Sigma_x$ is $NT \times NT$. In many cases, this matrix can neither be computed nor even stored. Additionally, if we attempt to estimate it using classical covariance estimation methods, the number of samples necessary for obtaining a reasonable estimation accuracy can be prohibitive. The number of samples needed for obtaining a good sample covariance matrix of an $n$-dimensional process is generally not known, but for distributions with finite second moment it has been shown to be $O(n \log n)$ by Rudelson [19], [20]. In our case, this result implies that we would need $O(NT \log(NT))$ signals, of $NT$ variables each, to obtain a good estimate of the statistics of a joint process.

To circumvent this issue, we leverage the time-vertex structure of the data. The basic idea behind our approach stems from two established methods used to estimate the TPSD of a temporal signal, namely Bartlett's and Welch's methods [21], which are summarized below.

TPSD estimation methods. In Bartlett's method, the signal (time series) is first cut into equally sized segments without overlap. Then, the Fourier transform of each segment is computed. Finally, the PSD is obtained by averaging the squared amplitude of the Fourier coefficients over the segments. Welch's method [22] is a generalization that works with overlapping segments. We can view the TPSD estimation of both methods as the averaging over time of the squared coefficients of a short-time Fourier transform (STFT). We remind the reader that the STFT extracts the frequency content of a temporal signal at a given time by first selecting a part of the signal using a window and then computing the discrete Fourier transform. More concretely, for a discrete signal $s$ of length $T$, the circular discrete sampled STFT of $s$ at the $m$-th (out of $M$) frequency band, under window $g$, is
$$\mathrm{STFT}\{s\}(k, m) \triangleq \sum_{t=1}^{T} s(t)\, g(t_k)\, e^{-\frac{2\pi j (t-1)(m-1)}{M}},$$
where $t_k = \mathrm{mod}(t - ak - 1, T) + 1$, scalar $a$ is the shift in time between two successive windows [23, equation 1], and $\mathrm{mod}(t, T)$ is the remainder after division by $T$, i.e., $\mathrm{mod}(t, T) = t - T \lfloor t/T \rfloor$. Note that $k = 0, 1, \ldots, \lfloor T/a \rfloor - 1$ indexes the time band centered at $ka$, and that $m = 1, \ldots, M$ is the frequency band index. For additional insights about this transform, we refer the reader to [24], [25].

Joint PSD estimation. Based on the observation that Bartlett's method is an average of STFT coefficients, we propose to use the GFT of the STFT as a tool to estimate the joint PSD. Consider a time window $g$ and a time-vertex signal $X$. We first define the coefficient tensor as
$$C_{n,k,m} \triangleq \sum_{i=1}^{N} [U_G]_{i,n}\, \mathrm{STFT}\{x_i\}(k, m) = \sum_{i=1}^{N} [U_G]_{i,n} \sum_{t=1}^{T} X_{i,t}\, g(t_k)\, e^{-\frac{2\pi j (t-1)(m-1)}{M}}.$$
A usual choice for $M$ is the support size of $g$; then, for half-overlapping windows, we set $a$ to $M/2$.
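A numpy sketch of the coefficient tensor $C$ is given below. It is illustrative only: it uses a zero-padded window circularly shifted by $ak$ as a simplified stand-in for the exact $t_k$ indexing above, and the function name is an assumption.

```python
import numpy as np

def coefficient_tensor(X, U_G, g, a):
    """Compute C[n, k, m]: GFT over vertices of a circular, sampled STFT over time.

    X : (N, T) time-vertex signal, g : window of support size M, a : hop between windows.
    """
    N, T = X.shape
    M = len(g)
    K = T // a                                                   # number of time bands
    t = np.arange(T)
    dft = np.exp(-2j * np.pi * np.outer(t, np.arange(M)) / M)    # e^{-2 pi j (t-1)(m-1)/M}
    g_full = np.zeros(T)
    g_full[:M] = g                                               # zero-padded window
    C = np.zeros((N, K, M), dtype=complex)
    for k in range(K):
        win = np.roll(g_full, a * k)                             # window shifted circularly by a*k
        C[:, k, :] = U_G.conj().T @ (X * win) @ dft              # GFT over vertices, DFT over time
    return C
```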
For any discrete vertex frequency $\lambda_n$ and time frequency $\omega_m = 2\pi m / M$, our JPSD estimator is
$$\tilde{h}(\lambda_n, \omega_m) \triangleq \frac{a}{T\, \|g\|_2^2} \sum_{k=0}^{\lfloor T/a \rfloor - 1} |C_{n,k,m}|^2. \qquad (15)$$
In order to get an estimate of $h$ at $\omega \neq \omega_m$, we interpolate between the known points. Alternatively, with sufficient computational power, one may set $M = T$. Though alternative choices are possible, we suggest using the iterated sine window
$$g(t) = \sin\!\big( 0.5\, \pi \cos(\pi t / M)^2 \big)\, \chi_{[-M/2,\, M/2]}(t),$$
where $\chi_{[-M/2,\, M/2]}(t) = 1$ if $t \in [-M/2, M/2]$ and $0$ otherwise, as it turns the STFT into a tight operator for $M = 2a$. We defer an error analysis of the estimator to the longer version of this paper.

Other PSD estimation methods. In case $T \ll N$, two problems arise with the aforementioned method. First, we cannot compute the graph Fourier basis $U_G$, and second, the number of samples in time might not be sufficient to average over time. To circumvent this problem, one can average over the graph vertices using a technique similar to the PSD estimator of [7]. This technique will be studied in future work.

V. OPTIMIZATION FRAMEWORK

We can leverage our definition of stationarity to generalize the optimization framework of [7], which is useful for denoising, interpolating, and more generally deconvolving stationary processes. Concretely, suppose that our measurements $y$ are generated by the linear model
$$y = A x + w, \qquad (16)$$
where, as in the rest of this document, $x$ and $y$ are the vectorized versions of $X$ and $Y$. Further, suppose that the JPSD of $x$ is $h_X$, whereas the noise $w$ is zero-mean, has JPSD $h_W$, and may follow any distribution. Matrix $A$ is a general linear operator, not assumed to be jointly diagonalizable with $L_J$.

Tikhonov regularization. When the signal $x$ varies smoothly on the graph, i.e., is dominated by low frequencies, the classical approach for recovering $x$ from $y$ consists of solving the following optimization problem, commonly referred to as Tikhonov regularization:
$$\arg\min_x\; \| A x - y \|_2^2 + \alpha\, x^* L_J x. \qquad (17)$$
Notice that the prior above is separable into two terms:
$$x^* L_J x = \mathrm{tr}(X^* L_G X) + \mathrm{tr}(X L_T X^*). \qquad (18)$$
As a result, optimization problem (17) can only encode a particular joint time-vertex structure. Additionally, this scheme requires the parameter $\alpha$ to be tuned and does not take into account the statistical structure of the signals.

Wiener optimization framework. We instead propose to recover $x$ as the solution of the Wiener optimization problem
$$\dot{x} = \arg\min_x\; \| A x - y \|_2^2 + \| f(L_J)(x - E[x]) \|_2^2, \qquad (19)$$
where $f(\lambda, \omega)$ are the joint Fourier penalization weights, defined as
$$f(\lambda, \omega) \triangleq \sqrt{ \frac{h_W(\lambda, \omega)}{h_X(\lambda, \omega)} } = \frac{1}{\sqrt{\mathrm{SNR}(\lambda, \omega)}}. \qquad (20)$$
In the noiseless case, one alternatively solves the problem
$$\dot{x} = \arg\min_x\; \| h_X^{-\frac{1}{2}}(L_J)\, x \|_2^2, \quad \text{subject to } A x = y. \qquad (21)$$
Intuitively, the weight $f(\lambda, \omega)$ heavily penalizes frequencies associated with low SNR, and vice versa. Formally, we can show that:
• If $X$ is a Gaussian process, then the solution of Problem (19) coincides with a MAP estimator.
• If $A$ is a masking operator, then the solution of Problem (19) coincides with the minimum mean square error linear estimator.
• If $A = a(L_J)$ is a joint filter, then the solution of Problem (19) is a joint Wiener filter [9].
The proofs are generalizations of Theorems 3, 4, and 5 of [7].
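For the special case $A = I$ (denoising) and a centered signal, problem (19) admits a closed form in the joint spectral domain: $\hat{x}(\lambda, \omega) = \frac{h_X}{h_X + h_W}\, \hat{y}(\lambda, \omega)$, i.e., a joint Wiener filter. A minimal sketch of this case follows; the function signature, the responses passed as callables, and the stabilizing epsilon are assumptions, and the general case with an arbitrary $A$ is handled, per the paper, with proximal splitting methods instead.

```python
import numpy as np

def joint_wiener_denoise(Y, U_G, lam, h_X, h_W):
    """Closed-form solution of (19) for A = I and E[x] = 0:
    in the joint spectral domain, x_hat = h_X / (h_X + h_W) * y_hat."""
    N, T = Y.shape
    U_T = np.fft.fft(np.eye(T)).conj() / np.sqrt(T)        # orthonormal DFT basis
    omega = 2 * np.pi * np.arange(T) / T
    HX = h_X(lam[:, None], omega[None, :])                 # signal JPSD on the N x T grid
    HW = h_W(lam[:, None], omega[None, :])                 # noise JPSD on the N x T grid
    G = HX / (HX + HW + 1e-12)                             # Wiener gain, small epsilon for stability
    Y_hat = U_G.conj().T @ Y @ U_T.conj()                  # JFT{Y}
    return (U_G @ (G * Y_hat) @ U_T.T).real                # JFT^{-1}{G o JFT{Y}}
```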
Comparison to the MAP estimator. There are three main advantages of the Wiener optimization framework over a Gaussian MAP estimator based on an empirical covariance estimate. First, assuming stationarity allows for a more robust estimate of the covariance matrix. This is crucial here, since the number of variables $N \times T$ is typically large and an empirical estimate of the covariance matrix would require prohibitively many samples. Second, storing the covariance might not be possible, as it consists of $O((NT)^2)$ elements; on the contrary, the JPSD $h_X$ has only $NT$ elements. Finally, thanks to proximal splitting methods, we can derive an algorithm for solving Problem (19) that requires only the application of $A$ and spectral graph filtering, whereas the classical Gaussian MAP estimator requires inverting a large part of the covariance matrix.

VI. EXPERIMENTS

We apply our method to a weather dataset containing the temperature of 32 weather stations over a span of 31 days. Our experiment aims to show 1) that joint stationarity is a useful model, even for datasets which may violate the strict conditions of our definition, and 2) that time-vertex stationarity can yield a significant increase in denoising and recovery accuracy on a real dataset, as compared to time- or vertex-based methods.

Experimental setup. The French national meteorological service has published in open access a dataset³ with hourly weather observations collected during the month of January 2014 in the region of Brest (France). The graph is built from the coordinates of the weather stations by connecting all neighbors within a given radius, with weights $[W_G]_{i_1,i_2} = \exp(-k\, d(i_1, i_2)^2)$, where $d(i_1, i_2)$ is the Euclidean distance between stations $i_1$ and $i_2$. The parameter $k$ is adjusted so as to obtain an average degree of around 3 ($k$, however, is not a sensitive parameter). As sole pre-processing, we remove the mean (over time and stations) of the temperature; this is equivalent to removing the first moment. The dataset, which consists of a total of $T = 744$ time steps, was split into two parts of size $\rho T$ and $(1 - \rho) T$, respectively. We use the first part of the dataset to estimate the PSD and the second to quantify the joint filter performance. We compare our joint method to the state-of-the-art Wiener filters for the disjoint time and vertex domains, which are known to outperform methods not based on statistics, such as graph/time Tikhonov and graph/time TV. To highlight the benefit of the joint approach, in the disjoint cases we use the entire dataset to estimate the PSD (for $\rho = 1$, the same data are used for both training and testing).
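As a reference for the setup described above, here is a small sketch of the graph construction from the station coordinates. The function name is an assumption, and coords, radius, and k are user-supplied; the paper only specifies the weight function and the target average degree of about 3.

```python
import numpy as np

def weather_station_graph(coords, radius, k):
    """Build the station graph: connect stations within `radius` of each other
    with Gaussian weights exp(-k * d^2), where d is the Euclidean distance."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)   # pairwise distances
    W = np.exp(-k * d**2) * (d <= radius)                                  # Gaussian kernel, thresholded
    np.fill_diagonal(W, 0.0)                                               # no self-loops
    return W
```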
Denoising. For this experiment, we add Gaussian random noise to the data and remove it with the Wiener filter ($A = I$ in problem (19)). The result is displayed in Figure 1. Joint stationarity outperforms time or vertex stationarity, especially when the noise level is high; indeed, joint stationarity allows the estimator to average over more samples. In order to obtain good denoising, we need a good JPSD estimate. The effect of the dataset size can be observed through the parameter $\rho$, with larger $\rho$ resulting in higher accuracy. Especially for large input SNR, the joint approach becomes particularly meaningful, as it outperforms the other approaches even when a very small portion of the data is used for JPSD estimation (whereas the time- and vertex-based methods use $\rho = 1$, meaning that they use the entire dataset for PSD estimation).

³ Access to the raw data is possible directly from https://donneespubliques.meteofrance.fr/donnees_libres/Hackathon/RADOMEH.tar.gz

[Fig. 1: Experiments on Molène temperatures. (a) Denoising: output SNR (dB) versus input SNR (dB) for Vertex (ρ = 100%), Time (ρ = 100%), and Joint (ρ = 25%, 50%, 75%, 100%). (b) Recovery: output SNR (dB) versus percentage of missing values for Vertex (ρ = 100%), Time (ρ = 100%), and Joint (ρ = 75%, 100%). The joint approach becomes especially meaningful when the available data are very noisy or are few. The recovery performance is slightly improved when a larger percentage ρ of data is available for training.]

Recovery. We also consider a recovery problem, where a given percentage of the entries of matrix $X$ is missing. Figure 1 depicts the recovery error obtained using problem (21). Again, we observe a significant improvement over competing methods. This improvement is achieved because the joint approach leverages the correlation both in the time and in the vertex domain: each random variable in a TWSS or VWSS process depends on only $T - 1$ or $N - 1$ other random variables, respectively (rather than $NT - 1$ as in the joint case), implying a higher recovery variance.

VII. CONCLUSION

This paper proposed a novel definition of (wide-sense) stationarity appropriate for time-varying graph signals. We showed that joint stationarity possesses a number of useful properties that are familiar from the classical setting. Based on our definition, we proposed a Wiener optimization framework and an accompanying PSD estimation method, which together can be used for solving the problem of inverting a rank-deficient linear system under a jointly stationary input and disturbance. The proposed optimization framework is optimal in the mean-squared error sense and scales well with the number of time samples. In our experiment with a weather dataset, the joint approach was shown to yield a significant benefit over disjoint statistical methods for signal denoising and recovery.

The longer version of this paper will expand our analysis and evaluate our approach on a larger set of experiments. We will additionally provide a detailed complexity analysis and propose solutions to avoid the computationally expensive diagonalization of the graph Laplacian $L_G$. We remark that our simulations were done using the GSPBox [26], the UNLocBoX [27], and LTFAT [23]. The code reproducing all figures will be made available soon.

REFERENCES

[1] L. A. Adamic and N. Glance, "The political blogosphere and the 2004 US election: divided they blog," in Proceedings of the 3rd International Workshop on Link Discovery. ACM, 2005, pp. 36–43.
[2] P. Mohan, V. N. Padmanabhan, and R. Ramjee, "Nericell: rich monitoring of road and traffic conditions using mobile smartphones," in Proceedings of the 6th ACM Conference on Embedded Network Sensor Systems. ACM, 2008, pp. 323–336.
[3] W. Huang, L. Goldsberry, N. F. Wymbs, S. T. Grafton, D. S. Bassett, and A. Ribeiro, "Graph frequency analysis of brain signals," arXiv preprint arXiv:1512.00037, 2015.
[4] F. Zhang and E. R. Hancock, "Graph spectral image smoothing using the heat kernel," Pattern Recognition, vol. 41, no. 11, pp. 3328–3342, 2008.
[5] A. J. Smola and R. Kondor, "Kernels and regularization on graphs," in Learning Theory and Kernel Machines. Springer, 2003, pp. 144–158.
[6] M. Belkin and P. Niyogi, "Semi-supervised learning on Riemannian manifolds," Machine Learning, vol. 56, no. 1–3, pp. 209–239, 2004.
[7] N. Perraudin and P. Vandergheynst, "Stationary signal processing on graphs," arXiv preprint, 2016.
[8] V. Kalofolias, "How to learn a graph from smooth signals," arXiv preprint arXiv:1601.02513, 2016.
[9] A. Loukas and D. Foucard, "Frequency analysis of temporal graph signals," arXiv preprint, 2016.
[10] B. Girault, "Stationary graph signals using an isometric graph translation," in Signal Processing Conference (EUSIPCO), 2015 23rd European. IEEE, 2015, pp. 1516–1520.
[11] A. G. Marques, S. Segarra, G. Leus, and A. Ribeiro, "Stationary graph processes and spectral estimation," arXiv preprint arXiv:1603.04667, 2016.
[12] A. Sandryhaila and J. M. Moura, "Discrete signal processing on graphs," IEEE Transactions on Signal Processing, vol. 61, pp. 1644–1656, 2013.
[13] D. K. Hammond, P. Vandergheynst, and R. Gribonval, "Wavelets on graphs via spectral graph theory," Applied and Computational Harmonic Analysis, vol. 30, no. 2, pp. 129–150, 2011.
[14] D. I. Shuman, S. K. Narang, P. Frossard, A. Ortega, and P. Vandergheynst, "The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains," IEEE Signal Processing Magazine, vol. 30, no. 3, pp. 83–98, 2013.
[15] D. I. Shuman, B. Ricaud, and P. Vandergheynst, "Vertex-frequency analysis on graphs," arXiv preprint, 2013.
[16] G. Strang, "The discrete cosine transform," SIAM Review, vol. 41, no. 1, pp. 135–147, 1999.
[17] N. Perraudin, B. Ricaud, D. Shuman, and P. Vandergheynst, "Global and local uncertainty principles for signals on graphs," arXiv preprint arXiv:1603.03030, 2016.
[18] N. Wiener, "Generalized harmonic analysis," Acta Mathematica, vol. 55, no. 1, pp. 117–258, 1930.
[19] M. Rudelson, "Random vectors in the isotropic position," Journal of Functional Analysis, vol. 164, no. 1, pp. 60–72, 1999.
[20] R. Vershynin, "How close is the sample covariance matrix to the actual covariance matrix?" Journal of Theoretical Probability, vol. 25, no. 3, pp. 655–686, 2012.
[21] M. S. Bartlett, "Periodogram analysis and continuous spectra," Biometrika, pp. 1–16, 1950.
[22] P. Welch, "The use of fast Fourier transform for the estimation of power spectra: a method based on time averaging over short, modified periodograms," IEEE Transactions on Audio and Electroacoustics, pp. 70–73, 1967.
[23] Z. Prusa, P. L. Sondergaard, N. Holighaus, C. Wiesmeyr, and P. Balazs, "The Large Time-Frequency Analysis Toolbox 2.0," in Sound, Music, and Motion, ser. Lecture Notes in Computer Science, M. Aramaki, O. Derrien, R. Kronland-Martinet, and S. Ystad, Eds. Springer International Publishing, 2014, pp. 419–442. [Online]. Available: http://dx.doi.org/10.1007/978-3-319-12976-1_25
[24] K. Gröchenig, Foundations of Time-Frequency Analysis. Springer Science & Business Media, 2013.
[25] H. G. Feichtinger and T. Strohmer, Gabor Analysis and Algorithms: Theory and Applications. Springer Science & Business Media, 2012.
[26] N. Perraudin, J. Paratte, D. Shuman, V. Kalofolias, P. Vandergheynst, and D. K. Hammond, "GSPBOX: A toolbox for signal processing on graphs," ArXiv e-prints, Aug. 2014.
[27] N. Perraudin, D. Shuman, G. Puy, and P. Vandergheynst, "UNLocBoX: A matlab convex optimization toolbox using proximal splitting methods," ArXiv e-prints, Feb. 2014.