Irregularity-Aware Graph Fourier Transforms


Authors: Benjamin Girault, Antonio Ortega, Shrikanth Narayanan

Benjamin Girault, Antonio Ortega, Fellow, IEEE, and Shrikanth S. Narayanan, Fellow, IEEE

Abstract—In this paper, we present a novel generalization of the graph Fourier transform (GFT). Our approach is based on separately considering the definitions of signal energy and signal variation, leading to several possible orthonormal GFTs. Our approach includes traditional definitions of the GFT as special cases, while also leading to new GFT designs that are better at taking into account the irregular nature of the graph. As an illustration, in the context of sensor networks we use the Voronoi cell area of vertices in our GFT definition, showing that it leads to a more sensible definition of graph signal energy even when sampling is highly irregular.

Index Terms—Graph signal processing, graph Fourier transform.

I. INTRODUCTION

With the ever growing deluge of data, graph signal processing has been proposed for numerous applications in recent years, thanks to its ability to study signals lying on irregular discrete structures. Examples include weather stations [1], taxi pickups [2] or bicycle rides [3], [4], people in a social network [5], or motion capture data [6]. While classical signal processing is typically built on top of regular sampling in time (1D Euclidean space) or space (2D Euclidean space), graphs can be applied in irregular domains, as well as for irregular sampling of Euclidean spaces [7]. Thus, graph signal processing can be used to process datasets while taking into consideration irregular relationships between the data points. Successful use of graph signal processing methods for a given application requires identifying: i) the right graph structure, and ii) the right frequency representation of graph signals. The choice of graph structure has been studied in recent work on graph learning from data [8]–[12].
[The authors are with the Signal and Image Processing Institute, University of Southern California, Los Angeles, CA 90089, USA (firstname.lastname@usc.edu). Work partially funded by NSF under grants CCF-1410009, CCF-1527874, CCF-1029373. The research is based upon work supported by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), via IARPA Contract No. 2017-17042800005. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the ODNI, IARPA, or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon.]

In this paper we focus on the second question, namely, given a graph structure, how to extend the classical Fourier transform definition, which relies on a regular structure, to a graph Fourier transform, which is linked to a discrete irregular structure. State of the art methods derive the definition of the graph Fourier transform (GFT) from algebraic representations of the graph such as the adjacency matrix A, whose entries are the weights of the edges connecting vertices: if i and j are two vertices connected through an edge weighted by w(ij), then A_ij = w(ij). In the context of sensor networks, edges are often defined by selecting the K-nearest neighbors of each vertex, with weights given by a Gaussian kernel of the Euclidean distance. This is a setting shown to have good properties in the context of manifold sampling [13] when the number of samples is large. The adjacency matrix is then used to compute the Laplacian matrix L = D − A, with D the diagonal degree matrix verifying D_ii = d_i = Σ_j w(ij).
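As a sketch of this construction, the snippet below builds such a sensor-network graph with NumPy. The sensor coordinates, K, and the kernel bandwidth sigma are hypothetical illustration values, and the symmetrization rule is one common choice among several:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(size=(20, 2))            # hypothetical sensor positions in 2D
K, sigma = 3, 0.3                        # illustrative parameter values

# Pairwise Euclidean distances between sensors.
dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

A = np.zeros((20, 20))
for i in range(20):
    nn = np.argsort(dist[i])[1:K + 1]    # K nearest neighbors, excluding i
    A[i, nn] = np.exp(-dist[i, nn]**2 / (2 * sigma**2))  # Gaussian kernel weights
A = np.maximum(A, A.T)                   # symmetrize: keep edge if either side selects it

d = A.sum(axis=1)                        # degrees D_ii = d_i
L = np.diag(d) - A                       # Laplacian L = D - A
```

By construction A is symmetric with zero diagonal, and the rows of L sum to zero.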
Conventional methods in graph signal processing use the eigenvectors of either A [14] or L [15] as graph Fourier modes. One motivation of this paper comes from physical sensing applications, where data collection is performed by recording on a discrete set of points, such as weather readings in a weather station network. In these applications the distribution of the sensors is irregular, and so it is not easy to map these measurements back onto a grid in order to use the classical signal processing toolbox. Importantly, in many of these applications, the specific arrangement of sensors is unrelated to the quantity measured. As an obvious example, the exact location and distribution of weather stations does not have any influence over the weather patterns in the region. Thus, it would be desirable to develop graph signal processing representations that i) have a meaningful interpretation in the context of the sensed data, and ii) are not sensitive to changes in graph structure, and in particular its degree of regularity.^1 In order to motivate our problem more precisely on this sensor network example, consider a popular definition of GFT, based on the eigenvalues of L, with variation for a graph signal x defined as:

  x^H L x = (1/2) Σ_ij w(ij) |x_i − x_j|^2.

A signal x with high variation will have very different values (i.e., large |x_i − x_j|^2) at nodes connected by edges with large weights w(ij). This GFT definition is such that variation for constant signals is zero, since L1 = 0. This is a nice property, as it matches definitions of frequency in the continuous domain, i.e., a constant signal corresponds to minimal variation and thus frequency zero. Note that this property is valid independently of the number and position of sensors in the environment, thus achieving our goal of limiting the impact of graph choice on signal analysis.
^1 Note that studies of stationarity of graph signals, such as [1], are concerned with the variability across multiple observations of signals on the same graph. Instead here we focus on how the choice of different graphs (i.e., different vertex/sensor positions placed in space) affects the spectral representation of the corresponding graph signals (e.g., different discrete samples of the same continuous domain signal).

In contrast, consider the frequency representation associated to an impulse localized to one vertex in the graph. The impulse signal δ_i has variation δ_i^H L δ_i = d_i, where d_i is the degree of vertex i. Impulses do not have equal variation, so that a highly localized phenomenon in continuous space (or a sensor anomaly affecting just one sensor) would produce different spectral signatures depending on the degree of the vertex where the measurement is localized. As a consequence, if the set of sensors changes, e.g., because only a subset of sensors are active at any given time, the same impulse at the same vertex may have a significantly different spectral representation. As an alternative, choosing the symmetric normalized Laplacian 𝓛 = D^{-1/2} L D^{-1/2} to define the GFT would lead to the opposite result: all impulses would have the same variation, but a constant signal would no longer correspond to the lowest frequency of the GFT. Note the importance of graph regularity in this trade-off: the more irregular the degree distribution, the more we deviate from desirable behavior for the impulses (L) or for the lowest frequency (𝓛). As further motivation, consider the definition of signal energy, which is typically its ℓ2-norm in the literature: E_x = ‖x‖_2^2 [14]–[16]. Assume that there is an area within the region being sensed where the energy of the continuous signal is higher.
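This degree dependence is easy to check numerically. A minimal sketch on a hypothetical 4-vertex graph (NumPy only): the constant signal has zero Laplacian variation, while each impulse δ_i has variation exactly d_i.

```python
import numpy as np

# Hypothetical small undirected graph: 4 vertices, unit edge weights.
A = np.array([[0., 1., 1., 0.],
              [1., 0., 1., 1.],
              [1., 1., 0., 0.],
              [0., 1., 0., 0.]])
d = A.sum(axis=1)            # degrees: [2, 3, 2, 1]
L = np.diag(d) - A           # combinatorial Laplacian

ones = np.ones(4)
print(ones @ L @ ones)       # constant signal: zero variation, since L @ 1 = 0

for i in range(4):
    delta = np.zeros(4)
    delta[i] = 1.0
    # impulse variation equals the vertex degree: delta_i^H L delta_i = d_i
    print(delta @ L @ delta, d[i])
```

The mismatch across vertices (here variations 2, 3, 2, 1) is precisely the irregularity the paper sets out to correct.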
For this given continuous space signal, energy estimates through graph signal energy will depend significantly on the position of the sensors (e.g., these estimates will be higher if there are more sensors where the continuous signal energy is higher). Thus, estimated energy will depend on the choice of sensor locations, with more significant differences the more irregular the sensor distribution is. In this paper, we propose a novel approach to address the challenges associated with irregular graph structures by replacing the dot product and the ℓ2-norm with a different inner product and its induced norm. Note that here irregularity refers to how irregular the graph structure is, with vertices having local graph structures that can vary quite significantly. In particular, a graph can be regular in the sense of all vertices having equal degree, yet vertices may not be homogeneous in some other respect. Different choices of inner product can be made for different applications. For any of these choices we show that we can compute a set of graph signals of increasing variation, orthonormal with respect to the chosen inner product. These graph signals form the basis vectors for novel irregularity-aware graph Fourier transforms that are both theoretically and computationally efficient. This framework applies not only to our motivating example of a sensor network, but also to a wide range of applications where vertex irregularity needs to be taken into consideration. In the rest of this paper, we first introduce our main contribution of an irregularity-aware graph Fourier transform using a given graph signal inner product and graph signal energy (Sec. II). We then explore the definition of graph filters in the context of this novel graph Fourier transform and show that they share many similarities with the classical definition of graph filters, but with more richness allowed by the choice of inner product (Sec. III).
We then discuss specific choices of inner product, including some that correspond to GFTs known in the literature, as well as some novel ones (Sec. IV). Finally, we present two applications of these irregularity-aware graph Fourier transforms: vertex clustering and analysis of sensor network data (Sec. V).

II. GRAPH SIGNAL ENERGY: LEAVING THE DOT PRODUCT

One of the cornerstones of classical signal processing is the Fourier transform, and one of its essential properties is orthogonality. Indeed, this leads to the Generalized Parseval Theorem: the inner products in the time ⟨x, y⟩ and spectral ⟨x̂, ŷ⟩ domains of two signals are equal. In this section, we propose a generalization of the definition of the GFT. Our key observation is that the choice of an inner product is a parameter in the definition of the GFT, and the usual dot product is not the only choice available for a sound definition of a GFT verifying Parseval's Theorem.

A. Graph Signal Processing Notations

Let G = (V, E, w) be a graph with V = {1, ..., N} its vertex set,^2 E ⊆ V × V its edge set, and w : E → R+ the weight function of the edges (as studied in [17], these weights should describe the similarity between vertices). For simplicity, we will denote the edge from i to j as ij. A graph is said undirected if for any edge ij, ji is also an edge, and their weights are equal: w(ij) = w(ji). A self-loop is an edge connecting a vertex to itself. A graph is algebraically represented by its adjacency matrix^3 A verifying A_ij = 0 if ij is not an edge, and A_ij = w(ij) otherwise.^4 We also use a normalization of the adjacency matrix by its eigenvalue of largest magnitude μ_max such that A^(norm) = (1/μ_max) A [18]. We denote d_i = Σ_j A_ij the degree of vertex i, and D the diagonal degree matrix having d as diagonal. When the graph is undirected, the Laplacian matrix is the matrix L = D − A.
Two normalizations of this matrix are frequently used in the literature: the normalized Laplacian matrix 𝓛 = D^{-1/2} L D^{-1/2} = I − D^{-1/2} A D^{-1/2} and the random walk Laplacian L_RW = D^{-1} L = I − D^{-1} A. In the graph signal processing literature, the goal is to study signals lying on the vertices of a graph. More precisely, a graph signal is defined as a function of the vertices x : V → R or C. The vertices being indexed by {1, ..., N}, we represent this graph signal using the column vector x = [x_1, ..., x_N]^T. Following the literature, the GFT is defined as the mapping from a signal x to its spectral content x̂ by orthogonal projection onto a set of graph Fourier modes (which are themselves graph signals) {u_0, ..., u_{N−1}} such that x_i = Σ_l x̂(l) [u_l]_i = [U x̂]_i, where U = [u_0 ⋯ u_{N−1}]. We denote F^{-1} = U the inverse GFT matrix and F = U^{-1} the GFT matrix. Furthermore, we assume that there is an operator ∆ : C^V → R+ quantifying how much variation a signal shows on the graph.

^2 Although vertices are indexed, the analysis should be independent of the specific indexing chosen, and any other indexing shall lead to the same outcome for any vertex for a sound GSP setting.
^3 Throughout this paper, bold uppercase characters represent matrices and bold lowercase characters represent column vectors.
^4 Note that A_ij corresponds here to the edge from i to j, opposite of the convention used in [14]. This allows for better clarity at the expense of the interpretation of the filtering operation Ax. Here, (Ax)_i = Σ_j A_ij x_j is the diffusion along the edges in reverse direction, from j to i.

TABLE I. Hermitian positive semidefinite (HPSD) graph variation operators. Here GQV is the Graph Quadratic Variation (see Sec. IV-A3).

  Name (M)                ∆(x)
  [15] Comb. Lapl. (L)    (1/2) Σ_ij w(ij) |x_i − x_j|^2
  [15] Norm. Lapl.
  (𝓛)                     (1/2) Σ_ij w(ij) |x_i/√d_i − x_j/√d_j|^2
  [18] GQV ((I − A^(norm))^2)   Σ_i |x_i − [A^(norm) x]_i|^2

∆ typically depends on the graph chosen. Several examples are given in Tabs. I and II. The graph variation ∆ is related to a frequency, as variation increases when frequency increases. The graph frequency of the graph Fourier mode u_l is then defined as λ_l = ∆(u_l).^5 In summary, the two ingredients we need to perform graph signal processing are i) the graph Fourier matrix F defining the projection on a basis of elementary signals (i.e., the graph Fourier modes), and ii) the variation operator ∆ defining the graph frequency of those elementary signals. Our goal in this paper is to propose a new method to choose the graph Fourier matrix F, one driven by the choice of graph signal inner product. For clarity and conciseness, we focus on undirected graphs, but our definitions extend naturally to directed graphs. An in-depth study of directed graphs will be the subject of a future communication.

B. Norms and Inner Products

In the literature on graph signal processing, it is desirable for the GFT to be orthogonal, i.e., the graph Fourier transform is an orthogonal projection on an orthonormal basis of graph Fourier modes. Up to now, orthogonality has been defined using the dot product ⟨x, y⟩ = y^H x, with ·^H the transpose conjugate operator. We propose here to relax this condition and explore the benefits of choosing an alternative inner product on graph signals to define orthogonality. First, note that ⟨·,·⟩_Q is an inner product on graph signals if and only if there exists a Hermitian positive definite (HPD) matrix Q such that:

  ⟨x, y⟩_Q = y^H Q x,

for any graph signals x and y. We refer to this inner product as the Q-inner product. With this notation, the standard dot product is the I-inner product.
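As an illustrative sketch, the Q-inner product and its induced norm are one line each in NumPy. The vertex importances q below are hypothetical values chosen only for the example:

```python
import numpy as np

q = np.array([0.5, 1.0, 2.0, 1.5])   # hypothetical vertex importances
Q = np.diag(q)                       # HPD matrix defining the inner product

def q_inner(x, y):
    """<x, y>_Q = y^H Q x."""
    return np.conj(y) @ Q @ x

def q_norm(x):
    """||x||_Q = sqrt(<x, x>_Q)."""
    return np.sqrt(np.real(q_inner(x, x)))

x = np.array([1.0, -1.0, 2.0, 0.0])
# For diagonal Q, the squared Q-norm is the importance-weighted sum of |x_i|^2.
print(q_norm(x)**2, np.sum(q * np.abs(x)**2))
```

Both expressions agree, which previews the weighted-energy interpretation used below for diagonal Q.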
Moreover, the Q-inner product induces the Q-norm: ‖x‖_Q = √⟨x, x⟩_Q = √(x^H Q x). Therefore, an orthonormal set for the Q-inner product is a set {u_l}_l verifying:

  u_k^H Q u_l = 0 if k ≠ l, and 1 otherwise,

i.e., U^H Q U = I with U = [⋯ u_l ⋯]. Although any HPD matrix Q defines a proper inner product, we will mostly focus on diagonal matrices Q = diag(q_1, ..., q_N). In that case, the squared Q-norm of a graph signal x, i.e., its energy, is a weighted sum of its squared components:

  ‖x‖_Q^2 = Σ_{i∈V} q_i |x_i|^2.

Such a norm is simple but will be shown to yield interesting results in our motivating example of sensor networks (see Sec. V-B). Essentially, if q_i quantifies how important vertex i is on the graph, the energy above can be used to account for irregularity of the graph structure by correctly balancing vertex importance. Examples of diagonal matrices Q include I and D, but more involved definitions can be chosen, such as the illustrating example of a ring in Sec. II-D or our novel approach based on Voronoi cells in Sec. IV-D. Moreover, endowing the space of graph signals with the Q-inner product yields the Hilbert space H_G(Q) = (C^V, ⟨·,·⟩_Q) of graph signals of G. This space is important for the next section where we generalize the graph Fourier transform.

^5 We also used before λ_l = √∆(u_l) when ∆ is a quadratic variation [16]. In this paper, however, the difference between the two does not affect the results.

TABLE II. Non-HPSD graph variation operators. Here, GTV is the Graph Total Variation (see Sec. IV-A3), and GDV is the graph directed variation (see Sec. II-F).

  Name         ∆(x)
  [18] GTV     Σ_i |x_i − [A^(norm) x]_i|
  [19] GDV     Σ_ij w(ij) [x_i − x_j]_+

Remark 1. To simplify the proofs, we observe that many results from matrix algebra that rely on diagonalization in an orthonormal basis w.r.t.
the dot product can actually be extended to the Q-inner product using the following isometric operator on Hilbert spaces:

  φ : H_G(Q) → H_G(I),  x ↦ Q^{1/2} x.

Since Q is invertible, φ is invertible, and an orthonormal basis in one space is mapped to an orthonormal basis in the other space, i.e., {u_l}_l is an orthonormal basis of H_G(Q) if and only if {φ(u_l)}_l is an orthonormal basis of H_G(I). For example, if h is a linear operator of H_G(Q) such that h(x) = Hx, then h̃ : y ↦ φ(h(φ^{-1}(y))) = Q^{1/2} H Q^{-1/2} y is an operator of H_G(I). Also, the eigenvectors of H are orthonormal w.r.t. the Q-inner product if and only if the eigenvectors of Q^{1/2} H Q^{-1/2} are orthonormal w.r.t. the dot product. This applies for example to L_RW = D^{-1} L, the random walk Laplacian. Since 𝓛 = D^{1/2} (D^{-1} L) D^{-1/2} has orthonormal eigenvectors w.r.t. the dot product, the eigenvectors of L_RW are orthonormal w.r.t. the D-inner product (see Sec. IV-B).

C. Contribution 1: Generalized Graph Fourier Modes

Definition 1 (Generalized Graph Fourier Modes). Given the Hilbert space H_G(Q) of graph signals and the variation operator ∆, the set of (∆, Q)-graph Fourier modes is defined as an orthonormal basis of graph signals {u_l}_l solving the following sequence of minimization problems, for increasing L ∈ {0, ..., N−1}:

  min_{u_L} ∆(u_L)  subj. to  U_L^H Q U_L = I,     (1)

where U_L = [u_0 ... u_L].

This definition is mathematically sound since the search space of each minimization problem is non-empty. Note also that this definition relies on only two assumptions: i) Q is an inner product matrix, and ii) ∆ maps graph signals to real values.^6 Among the examples of graph variation given in Tabs. I and II, those of Tab. I share the important property of being quadratic forms:

Definition 2 (Hermitian Positive Semi-Definite Form).
The variation operator ∆ is a Hermitian positive semi-definite (HPSD) form if and only if ∆(x) = x^H M x and M is a Hermitian positive semi-definite matrix.

When ∆ is an HPSD form it is algebraically characterized by the matrix M. In what follows we denote the graph variation operator as M whenever ∆(x) = x^H M x is verified. Examples from the literature of graph variation operators that are HPSD forms are shown in Tab. I, and non-HPSD ones are shown in Tab. II. The following theorem is of utmost importance as it relates the solution of (1) to a generalized eigenvalue problem when ∆ is an HPSD form:

Theorem 1. If ∆ is an HPSD form with HPSD matrix M, then {u_l}_l is a set of (∆, Q)-graph Fourier modes if and only if the graph signals {u_l}_l solve the following generalized eigenvalue problem for increasing eigenvalues λ_l = ∆(u_l):

  M u_l = λ_l Q u_l,  with  ‖u_l‖_Q^2 = 1.

Proof. Observing that x^H M x = ⟨Q^{-1} M x, x⟩_Q, we can show that M Hermitian is equivalent to Q^{-1} M being a self-adjoint operator on the Hilbert space H_G(Q). Therefore, using the spectral theorem, there exists an orthonormal basis {u_l}_l of H_G(Q) of eigenvectors of Q^{-1} M, such that:

  ∆(u_l) = u_l^H M u_l = u_l^H Q Q^{-1} M u_l = λ_l u_l^H Q u_l = λ_l.

The relation above between the variation of the graph Fourier modes and the eigenvalues of M also shows that, just as in approaches of the literature based on the Laplacian [15] or on the adjacency matrix [14], the set of graph Fourier modes is not unique when there are multiple eigenvalues: if ∆(u_l) = ∆(u_k), then with v_l = α u_l + (1 − α) u_k and v_k = (1 − α) u_l + α u_k, we have ∆(u_l) = ∆(v_l) = ∆(v_k) and ⟨v_l, v_k⟩_Q = 0, for any α ∈ [0, 1]. This remark will be important in Sec. III, as it is desirable for the processing of graph signals to be independent of the particular choice of (∆, Q)-graph Fourier modes.
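Theorem 1 makes the modes directly computable with a symmetric generalized eigensolver. A minimal sketch, assuming a hypothetical ring graph with one vertex weighted differently; `scipy.linalg.eigh(M, Q)` solves M u = λ Q u for Hermitian M and HPD Q, returning eigenvalues in ascending order and Q-orthonormal eigenvectors:

```python
import numpy as np
from scipy.linalg import eigh

# Ring graph with N = 8 vertices and unit weights (hypothetical example).
N = 8
A = np.zeros((N, N))
for i in range(N):
    A[i, (i + 1) % N] = A[(i + 1) % N, i] = 1.0
M = np.diag(A.sum(axis=1)) - A          # HPSD variation matrix: M = L
Q = np.diag([10.0] + [1.0] * (N - 1))   # HPD inner product, vertex 1 emphasized

# Generalized eigenvalue problem M u_l = lambda_l Q u_l.
lam, U = eigh(M, Q)

print(np.allclose(U.T @ Q @ U, np.eye(N)))     # Q-orthonormal modes
print(np.allclose(np.diag(U.T @ M @ U), lam))  # variations = graph frequencies
print(np.isclose(lam[0], 0.0))                 # constant mode has zero variation
```

Note that the Hermitian structure of M and Q is exploited directly, so the non-Hermitian matrix Q^{-1} M never needs to be formed.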
^6 Q may depend on ∆, but no such relation is required in Def. 1. An example is given at the end of this section where q_i = ∆(δ_i) is desirable.

D. Example with the Ring Graph: Influence of Q

To give intuition on the influence of Q on the (∆, Q)-graph Fourier modes, we study the example of a ring graph with the Laplacian matrix as graph variation operator matrix: M = L. In Fig. 1, we show the (L, Q)-graph Fourier modes of a ring with 8 vertices for three choices of Q:

• Q = diag(0.1, 1, ..., 1), with a less important vertex 1,
• Q = I, i.e., the combinatorial Laplacian GFT,
• Q = diag(10, 1, ..., 1), with a more important vertex 1.

Several observations can be made. First of all, any (L, I)-graph Fourier mode which is zero on vertex 1 (i.e., u_1, u_3, u_5) is also an (L, Q)-graph Fourier mode with the same graph variation. Intuitively, this vertex having zero value means that it has no influence on the mode (here, Q u_l = u_l), hence its importance does not affect the mode. For the remaining modes, we make several observations. First, we consider the spectral representation of a highly localized signal on vertex 1, such as δ_1, pictured in the last column of Fig. 1. While Q = I involves many spectral components of roughly the same power, the other two cases q_1 = 0.1 and q_1 = 10 are distinctive. Indeed, in the first case, δ̂_1(7) is large (almost an order of magnitude larger than the other components), explained by a graph Fourier mode u_7 that is close to our localized signal. On the other hand, for q_1 = 10 the two largest Fourier components of δ_1 are δ̂_1(0) and δ̂_1(1), with u_1 being close to our localized signal. In other words, q_i shifts the impulse δ_1 to the higher end of the spectrum when small (unimportant vertex i) or to the lower end when large (important vertex i). These cases give intuition on the impact of Q, and ultimately on how to choose it.
E. Discussion on M

When ∆ is an HPSD form, we can rewrite the minimization of (1) as a generalized Rayleigh quotient minimization. Indeed, the Lth minimization problem is also given by:

  min_{x : U_L^H Q x = 0} ∆(x / ‖x‖_Q),

which is exactly a Rayleigh quotient minimization when ∆ is an HPSD form of HPSD matrix M since:

  ∆(x / ‖x‖_Q) = (x^H M x) / (x^H Q x).

For example, this allows the study of bandpass signals using spectral proxies as in [20]. Having an HPSD form for ∆ also has two advantages, the first one being a simple solution to (1) as stated in Thm. 1. But more importantly, this solution is efficient^7 to compute through the generalized eigenvalue problem since both Q and M are Hermitian [21, Section 8.7]. Therefore, instead of inverting Q and computing eigenvalues and eigenvectors of the non-Hermitian matrix Q^{-1} M, the two matrices M and Q can be directly used to compute the (M, Q)-graph Fourier modes using their Hermitian properties.

^7 Efficiency comes here from the cost of computing generalized eigenvalues and eigenvectors compared to the cost when M is not HPSD or Q is not HPD.

Fig. 1. (L, Q)-graph Fourier modes of the ring graph with N = 8 vertices, with Q = diag(q_1, 1, ..., 1) and one choice of q_1 per row, for increasing values of q_1. The graph frequencies per row are:
  q_1 = 0.1: λ = 0, 0.59, 0.74, 2, 2.42, 3.41, 3.8, 21.05;
  Q = I:     λ = 0, 0.59, 0.59, 2, 2, 3.41, 3.41, 4;
  q_1 = 10:  λ = 0, 0.24, 0.59, 1.31, 2, 2.8, 3.41, 3.85.
Modes are not aligned on their index, but on exact (3rd, 5th, and 8th columns) or approximate correspondence, to highlight the impact of q_1 on the graph Fourier modes.
Colors scale from deep blue for −1 to deep red for 1, with lighter colors as values get closer to 0, until white for 0. Note that [u_7]_1 = −3.08 for q_1 = 0.1 is out of this range. The last column shows the graph Fourier transform δ̂_1 of the impulse δ_1.

In some applications, it may also be desirable to have Q depend on M. In our motivating example the key variations of the constant signal 1 and of the impulses δ_i are given by:

  ∆(1) = Σ_ij M_ij,    ∆(δ_i) = M_ii.

In the case of 1, variation should be zero, hence Σ_ij M_ij = 0. However, for δ_i, the variations above cannot be directly compared from one impulse to another, as those impulses have different energy. The variation of the energy-normalized signals yields:

  ∆(δ_i / ‖δ_i‖_Q) = M_ii / Q_ii.

We advocated for a constant (energy-normalized) variation. In effect, this leads to a relation between Q and M given by a constant M_ii / Q_ii, i.e., the diagonals of Q and M should be equal, up to a multiplicative constant. For instance, choosing M = L, which leads to ∆(1) = 0 and ∆(δ_i) = d_i, our requirement on Q leads to q_i = d_i, hence Q = D. This is the random walk Laplacian approach described in Sec. IV-B.

F. Relation to [19]

Finally, we look at recent advances in defining the graph Fourier transform from the literature, and in particular at [19], which aims at defining an orthonormal set of graph Fourier modes for a directed graph with non-negative weights. After defining the graph directed variation as:

  GDV(x) := Σ_ij w(ij) [x_i − x_j]_+,

where [x]_+ = max(0, x), the graph Fourier modes are computed as a set of orthonormal vectors w.r.t. the dot product that solve:

  min_{{u_l}_l} Σ_l GDV(u_l)  subj. to  U^H U = I.     (2)

The straightforward generalization of this optimization problem to any graph variation operator ∆ and Q-inner product is then:

  min_{{u_l}_l} Σ_l ∆(u_l)  subj. to  U^H Q U = I.
(3)

Unfortunately, we cannot use this generalization when ∆ is an HPSD form. Indeed, in that case, the sum in (3) is exactly tr(U^H M U) = tr(Q^{-1} M) for any matrix U verifying U^H Q U = I. For example, U = Q^{-1/2} solves (3), but under the assumption that Q is diagonal, this leads to a trivial diagonal GFT, with modes localized on vertices of the graph. Note that, for any graph variation, a solution to our proposed minimization problem in (1) is also a solution to the generalization in (3):

Property 1. If {u_l}_l is a set of (∆, Q)-graph Fourier modes, then it is a solution to:

  min_{{u_l}_l} Σ_l ∆(u_l)  subj. to  U^H Q U = I,

with U = [u_0 ... u_{N−1}].

For example, the set of (GDV, I)-graph Fourier modes is a solution to (2), hence a set of graph Fourier modes according to [19]. This property is important as it allows, when direct computation through a closed form solution of the graph Fourier modes is not possible, to approximate the (∆, Q)-graph Fourier modes by first using the techniques of [19] to obtain the (Q^{1/2} ∆ Q^{-1/2}, I)-graph Fourier modes and then using Remark 1. Finally, another recent work uses a related optimization function to obtain graph Fourier modes [22]:

  min_{{u_l}_l} Σ_l (f_l − GQDV(u_l))^2  subj. to  U^H U = I,

with f_l = ((l − 1)/(N − 1)) f_max and the graph quadratic directed variation defined as:

  GQDV(x) := Σ_ij w(ji) [x_i − x_j]_+^2.

Beyond the use of a squared directed difference [x_i − x_j]_+^2, the goal of this optimization is to obtain evenly distributed graph frequencies, rather than the orthogonal graph Fourier modes of minimally increasing variation of [19] and of our contribution. Note that this alternative approach could implement the constraint U^H Q U = I to use an alternative definition of graph signal energy. This is however out of the scope of this paper.
G. Contribution 2: Generalized GFT

Given the (∆, Q)-graph Fourier modes of the previous section, and the definition of the inverse graph Fourier transform F^{-1} = U found in the literature and recalled in Sec. II-A, we can now define the generalized graph Fourier transform. Note that ∆ is not assumed to be an HPSD form anymore.

Definition 3 (Generalized Graph Fourier Transform). Let U be the matrix of (∆, Q)-graph Fourier modes. The (∆, Q)-graph Fourier transform ((∆, Q)-GFT) is then:

  F = U^H Q,

and its inverse is:

  F^{-1} = U.

The inverse in Def. 3 above is a proper inverse since F F^{-1} = U^H Q U = I. In Sec. II-A, we introduced the graph Fourier transform as an orthogonal projection on the graph Fourier modes. Def. 3 is indeed such a projection since:

  x̂(l) = [F x]_l = [U^H Q x]_l = u_l^H Q x = ⟨x, u_l⟩_Q.

Another important remark on Def. 3 concerns the complexity of computing the graph Fourier matrix. Indeed, this matrix is the inverse of the inverse graph Fourier matrix F^{-1} = U, which is directly obtained from the graph Fourier modes. Computation of a matrix inverse is in general costly and subject to approximation errors. However, just as with classical GFTs for undirected graphs, we can obtain this inverse without performing an actual matrix inversion. Indeed, our GFT matrix is obtained through a simple matrix multiplication U^H Q = U^{-1} that uses the orthonormality of the graph Fourier mode basis. Additionally, when Q is diagonal, this matrix multiplication is extremely simple and cheap to compute. One property that is essential in the context of an orthonormal set of graph Fourier modes is Parseval's Identity, where the energies in the vertex and spectral domains are equal:

Property 2 (Generalized Parseval's Theorem). The (∆, Q)-GFT is an isometric operator from H_G(Q) to H_Ĝ(I). Mathematically:

  ⟨x, y⟩_Q = ⟨x̂, ŷ⟩_I.

Proof.
  ⟨x, y⟩_Q = y^H Q x = ŷ^H (F^{-1})^H Q F^{-1} x̂ = ŷ^H x̂.

Finally, Prop. 2 bears similarities with Remark 1. Indeed, in both cases, there is an isometric map from the Hilbert space H_G(Q) to either H_G(I) or H_Ĝ(I). However, in Prop. 2, the graph signal x is mapped to its spectral components F x instead of Q^{1/2} x, such that the two cases are distinct. Intuitively, φ and Q^{1/2} reshape the space of graph signals to account for irregularity, while the graph Fourier matrix F decomposes the graph signal into elementary graph Fourier modes of distinct graph variation.

III. GRAPH FILTERS

In this section we explore the concept of operators on graph signals, and more specifically, operators whose definition is intertwined with the graph Fourier transform. Such a relation enforces a spectral interpretation of the operator and ultimately ensures that the graph structure plays an important role in the output of the operator.

A. Fundamental Matrix of the GFT

Before defining graph filters, we need to introduce what we call the fundamental matrix of the (∆, Q)-GFT, given its graph Fourier matrix F and the diagonal matrix of graph frequencies Λ:

  Z := F^{-1} Λ F = U Λ U^H Q.

Although ∆ does not appear in the definition above, Z does depend on ∆ through the graph Fourier matrix F. Some authors use the term shift for this matrix when Q = I. However, the literature very often uses the Laplacian matrix [15] as the fundamental matrix, and as a close equivalent to a second order differential operator [13], it does not qualify as the equivalent of a shift operator. Therefore, we choose the more neutral term fundamental matrix. Yet, our definition is a generalization of the literature, where such a matrix is always diagonalizable in the graph Fourier basis, with graph frequencies as eigenvalues. Tab. III shows several classical choices of fundamental matrices depending on Q and ∆.
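These pieces fit together in a few lines. The sketch below (hypothetical path graph and Q; NumPy/SciPy) computes the (L, Q)-GFT, checks that F = U^H Q inverts U without any matrix inversion, verifies the generalized Parseval identity, and confirms that the fundamental matrix reduces to Q^{-1} M:

```python
import numpy as np
from scipy.linalg import eigh

# Hypothetical path graph on 5 vertices, unit weights.
N = 5
A = np.diag(np.ones(N - 1), 1) + np.diag(np.ones(N - 1), -1)
M = np.diag(A.sum(axis=1)) - A            # variation matrix M = L
Q = np.diag(np.arange(1.0, N + 1.0))      # hypothetical HPD diagonal Q

lam, U = eigh(M, Q)                       # (Delta, Q)-graph Fourier modes
F = U.T @ Q                               # GFT (Def. 3): F = U^H Q, no inversion
print(np.allclose(F @ U, np.eye(N)))      # F F^{-1} = I

x = np.sin(np.arange(N))
y = np.cos(np.arange(N))
# Generalized Parseval: <x, y>_Q = <Fx, Fy>_I
print(np.isclose(y @ Q @ x, (F @ y) @ (F @ x)))

Z = U @ np.diag(lam) @ F                  # fundamental matrix Z = F^{-1} Lambda F
print(np.allclose(Z, np.linalg.inv(Q) @ M))   # Z = Q^{-1} M for HPSD forms
```

All three checks pass, since U^H Q U = I makes U^H Q a left inverse of U, and M U = Q U Λ implies Q^{-1} M = U Λ U^H Q.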
Further assuming that ∆ is an HPSD form, we have Z = Q⁻¹ M. As we will see in Sec. IV, this is consistent with the graph signal processing literature. Noticeably, this matrix is also uniquely defined under these conditions, even though the graph Fourier matrix may not be. Moreover, the complexity of computing this matrix is negligible when Q is diagonal. A diagonal matrix Q also leads to Z having the same sparsity as M, since M and Z differ only on the non-zero elements of M. Algorithms whose efficiency is driven by the sparsity of M therefore generalize without loss of efficiency to Z = Q⁻¹ M. This will be the case for the examples of this paper.

B. Definitions of Graph Filters

We recall here the classical definitions of graph filters and describe how we straightforwardly generalize them using the fundamental matrix Z of the GFT. First of all, given a graph filter H, we denote by Ĥ = F H F⁻¹ the same graph filter in the spectral domain, such that the GFT of Hx is Ĥx̂. This notation allows us to properly study the spectral response of a given graph filter. We can now state the three definitions of graph filters. The first one directly extends invariance through time shifting. This is the approach followed by [14], replacing the adjacency matrix by the fundamental matrix of the graph:⁷

Definition 4 (Graph Filters by Invariance). H is a graph filter if and only if it commutes with the fundamental matrix Z of the GFT:

HZ = ZH.

The second definition extends the convolution theorem for temporal signals, where convolution in the time domain is equivalent to pointwise multiplication in the spectral domain [17, Sec. 3.2].⁸ The following definition is identical to those in the literature, simply replacing existing choices of GFT with one of our proposed GFTs:

Definition 5 (Graph Filters by Convolution Theorem). H is a graph filter if and only if there exists a graph signal h such that:

(Hx)^(l) = ĥ(l) x̂(l).
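A small numerical sketch of Defs. 4 and 5 (our own; the graph, Q, and the spectral response are illustrative assumptions): a filter designed by its spectral response, as in Def. 5, commutes with the fundamental matrix Z, as in Def. 4, and Z coincides with Q⁻¹M for an HPSD form.

```python
import numpy as np
from scipy.linalg import eigh

# Path graph on 4 vertices; combinatorial Laplacian as the variation matrix,
# and an arbitrary diagonal Q (our choice, for illustration).
A = np.eye(4, k=1) + np.eye(4, k=-1)
L = np.diag(A.sum(1)) - A
Q = np.diag([1.0, 0.5, 2.0, 1.0])

lam, U = eigh(L, Q)                  # Q-orthonormal (L, Q)-Fourier modes
F, F_inv = U.T @ Q, U
Z = F_inv @ np.diag(lam) @ F         # fundamental matrix Z = U Lambda U^H Q

assert np.allclose(Z, np.linalg.inv(Q) @ L)   # Z = Q^{-1} M for an HPSD form

# Def. 5: pick a spectral response h_hat and build H = F^{-1} diag(h_hat) F.
h_hat = np.exp(-lam)                  # e.g. a low-pass response (assumption)
H = F_inv @ np.diag(h_hat) @ F

assert np.allclose(H @ Z, Z @ H)      # Def. 4: H commutes with Z

rng = np.random.default_rng(0)
x = rng.standard_normal(4)
assert np.allclose(F @ (H @ x), h_hat * (F @ x))  # pointwise in the spectrum
```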
The third definition is also a consequence of the convolution theorem, but with a different interpretation. Indeed, in the time domain, the Fourier transform ŝ of a signal s is a function of the (signed) frequency. Given that there is a finite number of graph frequencies, any function of the graph frequency can be written as a polynomial of the graph frequency (through polynomial interpolation). We obtain the following definition using the fundamental matrix instead [23], [24]:

Definition 6 (Graph Filters by Polynomials). H is a graph filter if and only if it is a polynomial in the fundamental matrix Z of the GFT:

H = Σ_k h_k Z^k.

Interestingly, these definitions are equivalent for the majority of graphs, namely those where no two graph frequencies are equal. However, in the converse case of a graph with two equal graph frequencies λ_k = λ_l, these definitions differ. Indeed, Def. 6 implies that ĥ(k) = ĥ(l), while ĥ(k) and ĥ(l) are arbitrary according to Def. 5. Also, a graph filter according to Def. 4 does not necessarily have Ĥ diagonal, since Hu_l = H_{k,l} u_k + H_{l,l} u_l defines a valid graph filter even with H_{k,l} ≠ 0. Choosing one definition over another is an application-dependent choice, driven by the meaning of two graph Fourier modes of equal graph frequency. For example, if these modes are highly related, then Def. 6 is a natural choice, with equal spectral responses of the filter, whereas in the opposite case of unrelated modes, Def. 5 allows for more flexibility in the design of the filter.

C. Mean Square Error

The Mean Square Error (MSE) is classically used to study the error made by a filter when attempting to recover a signal from a noisy input. More precisely, given an observation y = x + n

⁸In [23], the authors also define convolutions as pointwise multiplications in the spectral domain.
However, the notation used there for the GFT is unclear, since writing x̂(λ_l) (instead of x̂(l)) can be interpreted as the graph signal x having equal spectral components for equal graph frequencies. Such a requirement actually falls within the setting of Def. 6.

of a signal x with additive random noise n (with zero mean), the MSE is defined as:

MSE_H(x) := E[‖Hy − x‖₂²] = Σ_i E[([Hy]_i − x_i)²].  (4)

In other words, this is the mean energy of the error made by the filter. However, the definition of energy used above is based on the dot product. As stated before, this energy does not account for the irregularity of the structure. In the general case of an irregular graph, we defined graph signal energy in Sec. II-B using the Q-norm. We define in this section the energy of an error by the Q-norm of that error, thus generalizing the mean square error into the Q-MSE:

MSE_H^(Q)(x) := E[‖Hy − x‖_Q²].

Using the zero-mean assumption, this yields the classical bias/variance formula:

MSE_H^(Q)(x) = ‖Hx − x‖_Q² [bias term] + E[‖Hn‖_Q²] [noise variance term].  (5)

The graph Fourier transform introduced in Def. 3 is then a natural fit to study this MSE, thanks to Parseval's identity (Prop. 2):

MSE_H^(Q)(x) = ‖(Ĥ − I)x̂‖₂² + E[‖Ĥn̂‖₂²].

Since Ĥ is diagonal (or block diagonal if using Def. 4), studying the bias and the variance from a spectral point of view is much simpler. Recalling the meaning of the graph Fourier modes in terms of graph variation, this also allows us to quantify the bias and variance from slowly varying components of the signal (lower spectrum) to quickly varying components (higher spectrum), giving an intuitive interpretation to the MSE and to the bias/variance trade-off across graph frequencies. To gain better intuition on how using the Q-norm accounts for irregularity, we now assume that Q is diagonal. Let ε = Hn be the filtered noise.
The noise variance term above becomes:

E[‖ε‖_Q²] = Σ_i q_i E[|ε_i|²].

In other words, vertices with higher q_i have a higher impact on the overall noise energy. Using E[‖ε‖_Q²] = tr(E[εε^H] Q), we also have:

E[‖Hn‖_Q²] = tr(Ĥ^H Ĥ F Σ_n Q F⁻¹),

where Σ_n = E[nn^H] is the covariance matrix of the noise. We introduce a definition of Q-white noise with a tractable power spectrum:

Definition 7 (Q-White Noise). The graph signal noise n is said to be Q-white noise (Q-WN) if and only if it is centered, E[n] = 0, and its covariance matrix verifies Σ_n = σ² Q⁻¹, for some σ ≥ 0.

There are two important observations to make on this definition. First, in the vertex domain, if we assume a diagonal Q, this noise has higher power on vertices with smaller q_i:

E[|n_i|²] = σ²/q_i.

Assuming Q-WN is thus equivalent to assuming less noise on vertices with higher q_i. Therefore, this definition of noise can account for the irregularity of the graph structure through Q. Second, the importance of this definition is best seen in the spectral domain. Indeed, the spectral covariance matrix verifies:

Σ_n̂ = F Σ_n F^H = σ² I.

In other words, a Q-WN has a flat power spectrum. Note that this is true independently of the variation operator ∆ chosen to define the (∆, Q)-GFT matrix F, hence the name Q-WN (and not (∆, Q)-WN). The noise variance term in (5) under the assumption of a Q-WN n becomes:

E[‖Hn‖_Q²] = σ² tr(Ĥ^H Ĥ).

In other words, it is completely characterized by the spectral response of the filter H. Note that using Def. 4, Ĥ is not necessarily Hermitian. Using Def. 5 or Def. 6, Ĥ is diagonal and the noise variance term simplifies to:

E[‖Hn‖_Q²] = σ² Σ_l |ĥ(l)|².

IV. CHOOSING THE GRAPH SIGNAL ENERGY MATRIX Q

The goal of this section is twofold.
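Before examining choices of Q, the Q-white noise construction of the previous section can be checked empirically. The sketch below (our own toy setup; graph and Q are assumptions) draws Q-WN and verifies both the vertex-domain power σ²/q_i and the flat spectral covariance σ²I.

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)

# Toy undirected graph and diagonal Q (our choice, for illustration).
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(1)) - A
Q = np.diag([1.0, 2.0, 4.0, 0.5])

lam, U = eigh(L, Q)
F = U.T @ Q

# Draw Q-white noise: independent entries, n_i ~ N(0, sigma^2 / q_i).
sigma = 1.0
n = rng.standard_normal((4, 200_000)) * (sigma / np.sqrt(np.diag(Q)))[:, None]

# Vertex-domain power is sigma^2 / q_i ...
assert np.allclose((n**2).mean(axis=1), sigma**2 / np.diag(Q), rtol=0.05)

# ... but the spectral covariance is flat: Sigma_n_hat = sigma^2 I.
n_hat = F @ n
emp_cov = (n_hat @ n_hat.T) / n.shape[1]
assert np.allclose(emp_cov, sigma**2 * np.eye(4), atol=0.05)
```

Note that the flat spectrum holds for any HPSD variation matrix here, as the text states: only Q enters the noise model.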
First, we examine the literature on the graph Fourier transform to show how closely it relates to our definition, including the classical Fourier transform for temporal signals. This is summarized in Tab. III. The second goal is to give examples of graph Fourier transforms that use the newly introduced degree of freedom allowed by the Q-inner product.

A. The Dot Product: Q = I

1) Temporal Signals: This case derives from classical digital signal processing [25], with x[k] a periodic temporal signal of period T, sampled with sampling frequency N/T (N samples per period). This sampling corresponds to a ring graph with N vertices. In this case, energy is classically defined as a scaled ℓ₂-norm of the vector [x[0], …, x[N−1]]^T, i.e., E_x = (T/N) x^H I x. DFT modes are eigenvectors of the continuous Laplacian operator ∆_T x = d²x/dt², thus corresponding to the variation ∆(x) = ⟨∆_T x, x⟩ = x^H ∆_T x. Finally, DFT modes are orthogonal w.r.t. the dot product.

2) Combinatorial and Normalized Laplacians: This classical case relies on computing the eigenvectors of the combinatorial Laplacian L = D − A or of the normalized Laplacian 𝓛 = D^(−1/2) L D^(−1/2) to define the graph Fourier modes [15]. These graph Fourier transforms are exactly the (L, I)-GFT and the (𝓛, I)-GFT.

3) Graph Shift: This is a more complex case, where the (generalized) eigenvectors of the adjacency matrix are used as graph Fourier modes [14]. When the graph is undirected, it can be shown that this case corresponds to the ((I − A^(norm))², I)-GFT [18], with ∆(x) = ‖x − A^(norm) x‖₂² = GQV(x), the Graph Quadratic Variation. An alternative definition of smoothness, based on the ℓ₁-norm of x − A^(norm) x and leading to the Graph Total Variation ∆(x) = ‖x − A^(norm) x‖₁ = GTV(x), is also investigated in [18].
However, this norm leads to a different graph Fourier transform when used to solve (1), since an ℓ₁-norm promotes sparsity (smaller support of the graph Fourier modes) while an ℓ₂-norm promotes smoothness. Finally, when the graph is directed, the resulting GFT of [14] has no guarantee that its graph Fourier modes are orthogonal. Note that the same ∆(x) = ‖x − A^(norm) x‖₂² can be used in the directed case, leading to M = (I − A^(norm))^H (I − A^(norm)). Solving (1) is then equivalent to computing the SVD of (I − A^(norm)) and using its right singular vectors as (M, I)-graph Fourier modes. However, these differ from the eigenvectors of (I − A^(norm)) used in [14], since the singular vectors are orthogonal while the eigenvectors need not be.

4) Directed Variation Approach (∆ = GDV): We presented in Sec. II-F a recently proposed GFT [19]. This approach minimizes the sum of directed variations of an orthogonal set of graph signals (see (3)). As explained in Sec. II-F, the (∆, I)-GFT is a directed GFT as described in [19].

B. The Degree-Norm (Random Walk Laplacian): Q = D

The random walk Laplacian, L_RW = D⁻¹ L, is not widely used in the graph signal processing literature, given its lack of symmetry, so that its eigenvectors are not necessarily orthogonal. Therefore, the graph Fourier matrix F is not unitary, and naively computing it through the matrix inverse F = U⁻¹ is not efficient. Yet, this normalization is successfully used in clustering, in a similar manner to the combinatorial and normalized Laplacians [26]. Noticeably, it can be leveraged to compute an optimal embedding of the vertices into a lower-dimensional space, using the first few eigenvectors [27]. In graph signal processing, we can cite [28], [29] as examples of applications of the random walk Laplacian.
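The key property exploited in this section, that the eigenvectors of L_RW can be chosen orthonormal w.r.t. the D-inner product, is easy to verify numerically. A minimal sketch (our own, on a random toy graph):

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(1)

# Random undirected weighted graph (illustrative; all degrees positive).
A = rng.random((6, 6))
A = np.triu(A, 1); A = A + A.T
D = np.diag(A.sum(1))
L = D - A

# (L, D)-GFT: the generalized eigenproblem L u = lambda D u yields
# D-orthonormal modes, i.e. eigenvectors of L_RW = D^{-1} L.
lam, U = eigh(L, D)
assert np.allclose(U.T @ D @ U, np.eye(6))      # D-orthonormality

# Its fundamental matrix is the random walk Laplacian.
F = U.T @ D
Z = U @ np.diag(lam) @ F
assert np.allclose(Z, np.linalg.inv(D) @ L)

# Desirable properties: the constant signal has zero variation ...
assert np.allclose(L @ np.ones(6), 0)
# ... and every D-normalized impulse has unit variation.
for i in range(6):
    delta = np.zeros(6)
    delta[i] = 1 / np.sqrt(D[i, i])             # ||delta_i||_D = sqrt(d_i)
    assert np.isclose(delta @ L @ delta, 1.0)
```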
Another example, in image processing, is [30], where the authors use the random walk Laplacian as a smoothness prior for soft decoding of JPEG images through x^H L D⁻¹ L x = ‖L_RW x‖_D². Our framework actually allows for better insights on this case from a graph signal processing perspective.⁹ Indeed, considering the inner product with Q = D, we obtain the (L, D)-GFT, whose fundamental matrix is Z = D⁻¹ L = L_RW. In [30], this leads to minimizing ‖L_RW x‖_D² = ‖Zx‖_D², which is equivalent to minimizing the energy in the higher spectrum since Z is a high-pass filter. This GFT is orthonormal w.r.t. the D-inner product, leading to a graph signal energy definition based on the degree-norm: E_x = ‖x‖_D² = x^H D x. As stated in Remark 1, this normalization is related to the normalized Laplacian through the relation L_RW = D^(−1/2) 𝓛 D^(1/2), such that L_RW and 𝓛 share the same eigenvalues and their eigenvectors are related: if x is an eigenvector of L_RW, then ϕ(x) is an eigenvector of 𝓛.

⁹The property that the eigenvectors of the random walk Laplacian are orthogonal w.r.t. the Q-norm is well known in the literature. We are, however, the first to make the connection with a properly defined graph Fourier transform.

TABLE III. State of the art of the graph Fourier transform.

Ref  | Name        | Directed | Weights  | ∆                           | Q | Orthon. | ∆(1)=0 | ∆(δ_i)=cst | Z
[15] | Comb. Lapl. | ✗        | Non-neg. | L = D − A                   | I | ✓       | ✓      | ✗          | L
[15] | Norm. Lapl. | ✗        | Non-neg. | 𝓛 = D^(−1/2) L D^(−1/2)     | I | ✓       | ✗      | ✓          | 𝓛
[14] | Graph Shift | ✗        | Complex  | (I − A_norm)^H (I − A_norm) | I | ✓       | ✗      | ✗          | A
[14] | Graph Shift | ✓        | Complex  | (I − A_norm)^H (I − A_norm) | I | ✗       | ✗      | ✗          | A
[19] | Graph Cut   | ✓        | Non-neg. | GDV                         | I | Approx. | ✓      | ✗          | n.a.
 —   | RW Lapl.    | ✗        | Non-neg. | L_RW = D⁻¹ L                | I | ✗       | ✓      | ✓          | L_RW
 —   | RW Lapl.    | ✗        | Non-neg. | L                           | D | ✓       | ✓      | ✓          | L_RW

By combining the variation of the combinatorial Laplacian with the normalization of the normalized Laplacian, the random walk Laplacian achieves properties that are desirable for a sensor network: ∆(δ_i/‖δ_i‖_D) = d_i/d_i = 1 since ‖δ_i‖_D² = d_i, and L1 = 0 = 0 · D1, i.e.
, the graph frequencies are normalized, constant signals have zero variation, and all impulses have the same variation while having different energies.¹⁰ This case is actually justified in the context of manifold sampling in the extension [13] of [7]. The authors show that, under some conditions on the weights of the edges of the similarity graph, the random walk Laplacian is essentially the continuous Laplacian, without additive or multiplicative term, even if the probability density function of the samples is not uniform.

C. Bilateral Filters: Q = I + D

Bilateral filters are used in image processing to denoise images while retaining sharp edges [31]:

y_i = (1/(1 + d_i)) (x_i + Σ_j w(ij) x_j),

with

w(ij) = exp(−‖p_i − p_j‖²/(2σ_d²)) exp(−‖I(i) − I(j)‖²/(2σ_i²)),

where p_i is the position of pixel i, I(i) is its intensity, and σ_d and σ_i are parameters. Intuitively, weights are smaller when pixels are either far apart (first Gaussian kernel) or of different intensities (second Gaussian kernel). This second case corresponds to image edges. We can rewrite this filtering operation in matrix form as:

y = (I + D)⁻¹ (I + A) x = (I − (I + D)⁻¹ L) x.

In other words, using Z = (I + D)⁻¹ L, we obtain that bilateral filtering is the polynomial graph filter I − Z. This corresponds exactly to a graph filter for the (L, I + D)-GFT. Moreover, given a noisy observation y = x + n of the noiseless image x, the error on the output of this filter is given by (I − Z)y − x = −Zx + (I − Z)n. This error can be studied using the (I + D)-MSE introduced in Sec. III-C. Indeed, we can experimentally observe that pixels do not contribute equally to the overall error, with pixels on edges (lower degree) being less filtered than pixels in smooth regions (higher degree). Experimentally, we observe that (I − Z)n is an (I + D)-WN, thus validating the use of the (I + D)-MSE.
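The matrix identity above can be checked directly; the sketch below (our own, with arbitrary symmetric weights standing in for the bilateral kernel) verifies that (I + D)⁻¹(I + A) = I − Z with Z = (I + D)⁻¹L, and the resulting error decomposition for a noisy observation.

```python
import numpy as np

rng = np.random.default_rng(2)

# Arbitrary symmetric non-negative weights standing in for w(ij).
W = rng.random((5, 5))
W = np.triu(W, 1); W = W + W.T
D = np.diag(W.sum(1))
L = D - W
I = np.eye(5)

# Bilateral filtering in matrix form: y = (I + D)^{-1} (I + W) x ...
B = np.linalg.inv(I + D) @ (I + W)
# ... which is the polynomial graph filter I - Z with Z = (I + D)^{-1} L.
Z = np.linalg.inv(I + D) @ L
assert np.allclose(B, I - Z)

# Error decomposition for a noisy observation y = x + n:
x, n = rng.standard_normal(5), rng.standard_normal(5)
err = B @ (x + n) - x
assert np.allclose(err, -Z @ x + (I - Z) @ n)
```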
This approach to quantifying noise is coherent with human perception of noise, as the human eye is more sensitive to small changes in smooth areas. We will develop the study of bilateral filters with respect to our setting in a future communication.

¹⁰Impulse energies show here the irregularity of the graph structure and, as such, are naturally not constant.

Finally, note that the expression of bilateral filtering shown here differs from that of [29]:

y_i = (1/d_i) Σ_j w(ij) x_j.

This expression is equivalent to considering the filter I − Z with the (L, D)-GFT, and is not the original approach of [31].

D. Voronoi Cell Areas: Q = C

In the motivating example of a sensor network, we wished to find a graph Fourier transform that is not biased by the particular sampling being performed. Considering the energy of the signal being measured, this energy should not vary with the sampling. In the continuous domain, the energy of a continuous signal s̃ is defined as:

E_s := ∫ |s̃(x)|² dx.

In the definition above, dx acts as an elementary volume, and the integral can be interpreted as a sum, over elementary volumes, of the typical value |s̃(x)| within each volume times the size dx of the volume. The discrete version of this integral is therefore:

E_s ≈ Σ_i |s̃(x_i)|² vol(i) = Σ_i |s_i|² vol(i),

with s the signal sampled at the points {x_i}_i. Given a particular sampling, the question is then: what is a good value of vol(i)? A simple and intuitive choice is to approximate this volume by the volume of the subspace of points whose closest sample is x_i. This subspace is exactly the Voronoi cell of i, and in 2D its volume is the area of the cell. Let c_i be this area, and C = diag(c_1, …, c_N). We obtain:

E_s ≈ Σ_i |s_i|² c_i = s^H C s = ‖s‖_C².
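A simple way to obtain the c_i in practice is a Monte Carlo estimate of the Voronoi cell areas: the area of cell i is the fraction of the domain whose nearest sample is x_i. The sketch below (our own; sensor positions, signal, and sample sizes are illustrative assumptions) estimates C this way and uses it to approximate the continuous energy of a sampled cosine.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(3)

# Irregular sensor placement on the unit square.
sensors = rng.random((200, 2))

# Monte Carlo Voronoi areas: area of cell i = fraction of the domain whose
# nearest sensor is i (the domain area is 1 here).
grid = rng.random((200_000, 2))
nearest = cKDTree(sensors).query(grid)[1]
c = np.bincount(nearest, minlength=len(sensors)) / len(grid)
C = np.diag(c)

assert np.isclose(c.sum(), 1.0)      # the cells partition the unit square

# Energy of a sampled signal under Q = C: E ≈ sum_i |s_i|^2 c_i = ||s||_C^2.
s = np.cos(2 * np.pi * sensors[:, 0])
E = s @ C @ s
# Should approximate the continuous energy of cos(2*pi*x) on the square, 1/2.
assert abs(E - 0.5) < 0.1
```

An exact polygon-based computation (clipping the Voronoi diagram to the domain) would be the precise counterpart, but the nearest-sample estimate mirrors the definition in the text directly.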
Intuitively, this corresponds to interpolating the sampled signal into a piecewise constant signal whose value is constant within each Voronoi cell, and then computing the continuous energy of this interpolated signal. Other interpolation schemes could be considered. For example, if we assume a weighted linear interpolation scheme, then we obtain the interpolated signal s̃(x) = Σ_i f(x, x_i) s_i, and its energy is:

E_s̃ = ∫ |s̃(x)|² dx = Σ_{i,j} (∫ f(x, x_i)^H f(x, x_j) dx) s_i^H s_j,

and we have E_s̃ = ‖s‖_Q² with q_ij = ∫ f(x, x_i)^H f(x, x_j) dx. As soon as there is one location x whose interpolation involves two samples i, j, this interpolation scheme corresponds to a non-diagonal matrix Q. However, as we will show in Sec. V-B, the approach based on the Voronoi cell areas already gives good results compared to the state of the art of graph signal processing for sensor networks.

Fig. 2. Two-cluster dataset input with a sparse cluster (left) and a dense cluster (right). (a) Ground truth clusters. (b) Vertex degrees from the 10-nearest neighbor graph with Gaussian weights.

V. EXPERIMENTS

The choice of a graph Fourier transform, by selecting a graph variation operator ∆ and a matrix Q such as those discussed in Sec. IV, is application dependent. In particular, meaningful properties for the graph Fourier transform can be drastically different. For example, in a clustering context, the practitioner is interested in extracting the structure, i.e., in identifying tight groups of vertices. This is in contrast to sensor networks, where the particular arrangement of sensors should have as little influence over the results as possible, i.e., tight groups of stations should not be a source of bias (e.g., weather stations). For this reason, there is no unique way of studying how good a particular graph Fourier transform is.
In this section, we use these two applications, i.e., clustering and sensor networks, to show how our framework can be leveraged to achieve their application-specific goals. All experiments were performed using the GraSP toolbox for Matlab [32].

A. Clustering

In this section, we study the problem of clustering, defined as partitioning objects into groups such that objects within a group are similar while being dissimilar to objects in other groups. In a graph setting, we are interested in grouping the vertices of a graph such that there are many edges within each group, while groups are connected by few edges. Our goal is not to provide a new approach to clustering, but rather to use this problem as a showcase for how a well-chosen matrix Q can help achieve the goals of a target application.

In the context of clustering, spectral clustering extracts groups using the spectral properties of the graph [26]. More precisely, using C-means (k-means with k = C) on the first C graph Fourier modes yields interesting partitions. In [26], the author interprets this approach using graph cuts, for the (L, I)-GFT (combinatorial Laplacian), the (𝓛, I)-GFT (normalized Laplacian) and the (L, D)-GFT (random walk Laplacian). We extend this interpretation here to any variation operator ∆ and any diagonal inner product matrix Q. For each cluster c, let V_c ⊂ V be the subset of its vertices. Then the set {V_c}_c of all these subsets is a partition of the set of vertices V. Let the normalized cut of {V_c}_c associated to the (∆, Q)-GFT be:

(∆, Q)-Ncut({V_c}_c) := Σ_c ∆(h_c^(Q)),

with h_c^(Q) the normalized indicator function of cluster V_c, verifying:

h_c^(Q) := 1_{V_c} / ‖1_{V_c}‖_Q,

and 1_{V_c} the indicator function of cluster c. This extends the normalized cut interpretation of [26]. Using the notation H = [h_1^(Q) ⋯ h_C^(Q)], and since Q is diagonal, we obtain the orthonormality property H^H Q H = I.
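The normalized-cut machinery above leads to the usual spectral clustering recipe. The sketch below (our own toy version of a two-blob setup, using scipy's kmeans2 rather than the paper's experimental code; all sizes and parameters are assumptions) clusters vertices using features from the first two (L, D)-Fourier modes, the random walk Laplacian case.

```python
import numpy as np
from scipy.linalg import eigh
from scipy.cluster.vq import kmeans2

rng = np.random.default_rng(4)

# Two Gaussian blobs of different densities (toy stand-in for Fig. 2).
pts = np.vstack([rng.normal([0.0, 0.0], 0.15, (10, 2)),
                 rng.normal([1.5, 0.0], 0.15, (60, 2))])
N = len(pts)

# Fully connected Gaussian-kernel weights (sigma chosen for illustration).
d2 = ((pts[:, None] - pts[None, :]) ** 2).sum(-1)
W = np.exp(-d2 / (2 * 0.3**2))
np.fill_diagonal(W, 0)
D = np.diag(W.sum(1))
L = D - W

# (L, D)-GFT: features are the first C = 2 D-orthonormal modes.
lam, U = eigh(L, D)
features = U[:, :2]                  # vertex i -> ([u_0]_i, [u_1]_i)
_, labels = kmeans2(features, 2, seed=0, minit='++')

# Both blobs should come out (almost) pure.
first = labels[:10]
assert (first == first[0]).mean() > 0.8
assert (labels[10:] != first[0]).mean() > 0.8
```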
Finding a partition of vertices minimizing the normalized cut (∆, Q)-Ncut({V_c}_c) is therefore equivalent to finding orthonormal normalized indicator functions of minimal variation. Spectral clustering is performed by first relaxing the constraint that the graph signals h_c^(Q) are normalized indicator functions. Using Prop. 1, the first C (∆, Q)-graph Fourier modes are solutions to this relaxed optimization problem. The final partition is then obtained through C-means on the spectral features, where the feature vector of vertex i is [[u_0]_i, …, [u_{C−1}]_i]^T. The accuracy of spectral clustering is therefore driven by the choice of the variation operator ∆ and the inner product matrix Q.

To illustrate this, and the statement that choosing Q and ∆ is application-dependent, we look at several choices in the context of a 2-class clustering problem with skewed clusters. Combined with an irregular distribution of inputs within each cluster, we are in the context of irregularity where our framework thrives through its flexibility. The resulting graph is shown in Fig. 2, with a sparse cluster on the left (30 inputs) and a dense one on the right (300 inputs). Each cluster is defined by a 2D Gaussian distribution, with overlapping supports, and samples are drawn from these two distributions. We already observe that inputs are irregularly distributed, and some inputs are almost isolated. These almost isolated inputs are important as they represent outliers. We will see that correctly choosing Q alleviates the influence of those outliers.

TABLE IV. F1 score of the sparse (left) cluster of Fig. 2 for the clustering results of Fig. 3.

Approach                  | Accuracy | Sparse F1 score
(L, I)                    | 91.21%   | 6.45%
(𝓛, I)                    | 93.64%   | 46.15%
(𝓛, I), feat. norm.       | 46.67%   | 4.35%
(L, D)                    | 96.67%   | 77.55%
(L, C)                    | 91.21%   | 6.45%
(‖x − A^(norm) x‖₁, I)    | 86.06%   | 4.17%

The graph of inputs is built in a
Fig. 3. Several spectral clustering results on the graph of Fig. 2. The smallest cluster is in dark red, and the largest cluster is in light blue. The second row of plots shows the input in the feature space, i.e., to each vertex i is associated a point [[u_0]_i, [u_1]_i]^T in the spectral feature space. The set ⊤_s (resp. ⊤_d) corresponds to the vertices in the sparse (resp. dense) cluster that are correctly clustered. The set ⊥_s (resp. ⊥_d) corresponds to the vertices incorrectly clustered in the sparse (resp. dense) cluster. Colors in both rows of figures match for consistency. The third row shows the (M, Q)-GFT of the corresponding normalized indicator function h_blue^(Q) (the first Fourier component is not shown as it is quite large, except for (f)). The fourth row shows the Q-MSE of the ideal low-pass filter H_l with varying cut-off frequency λ_l applied to this normalized indicator function.
Generalized GFT: (a) (L, I), (b) (𝓛, I), (c) (𝓛, I) with normalized feature vectors, (d) (L, D), (e) (L, C), and (f) (∆, I) with ∆(x) = ‖x − A^(norm) x‖₁.

K-nearest neighbor fashion (K = 10), with weights chosen using a Gaussian kernel of the Euclidean distance (σ = 0.4). Fig. 3 shows the results of spectral clustering on this graph, using several GFTs (first two rows), the analysis of h_blue^(Q) in the spectral domain (third row), and an example of the Q-MSE of several ideal low-pass filters applied to the same indicator function.

First of all, the random walk Laplacian case, based on the (L, D)-GFT, gives the best results, consistent with [26], where the author advocates this approach. [26, Prop. 5] gives an intuition on why this case works best, using a random walk interpretation: if a starting vertex is chosen at random from the stationary distribution of the random walk on the graph, then the (L, D)-Ncut is exactly the probability of jumping from one cluster to another. Minimizing this cut is then minimizing transitions between clusters. In the spectral domain, we see that the indicator function has a lot of energy in the first few spectral components, with a rapid decay afterwards. We also notice a slight increase of energy around graph frequency 1, due to the vertices of the sparse (red) cluster that lie in the middle of the dense (blue) cluster (see Fig. 2(a)). As expected, the Q-MSE decreases when the cut-off frequency increases. But interestingly, if we use the classical I-MSE, it is not decreasing with the cut-off frequency, because of the use of the Q-norm instead of the ℓ₂-norm (see Fig. 4(a)).

The combinatorial Laplacian case, i.e., using the (L, I)-GFT, suffers from the presence of outliers. Indeed, if i is such an outlier, then placing it in a separate cluster, i.e., V_1 = {i} and V_2 = V∖{i} for C = 2, leads to ∆(h_1^(I)) = ∆(h_2^(I)) = d_i.
Fig. 4. I-MSE of the ideal low-pass filter H_l applied to the normalized indicator function h_blue^(Q), corresponding (a) to Fig. 3(d) and (b) to Fig. 3(e).

Since i is isolated, it has an extremely low degree, and the resulting cut is small, making it a good clustering according to the (L, I)-Ncut. Not accounting for the irregularity of the degree in the definition of the normalized indicator function h_c^(I) therefore leads to poor clustering in the presence of outliers. A similar behavior is observed for the (L, C)-GFT, with a normalization by the Voronoi cell area that is large for isolated vertices, hence an even smaller normalized cut. In the spectral domain, more Fourier components of the normalized indicator function are large, especially around graph frequency 0. This results in a Q-MSE that is large for our ideal low-pass filters. In other words, more lowpass components are required to correctly approximate the indicator function using a lowpass signal.

Next is the normalized Laplacian case, i.e., using the (𝓛, I)-GFT. Just as in the random walk Laplacian case, singleton clusters of outliers are not associated to small cuts, since (𝓛, I)-Ncut({i}, V∖{i}) = (L, D)-Ncut({i}, V∖{i}). However, a careful study of the graph Fourier modes reveals the weakness of this case. Indeed, using the well-known relation between the (𝓛, I)-graph Fourier modes {u_l}_l and the (L, D)-graph Fourier modes {v_l}_l, with ϕ(v_l) = u_l = D^(1/2) v_l, this case deviates from the random walk Laplacian by introducing a scaling of the graph Fourier modes by the degree matrix. Differences are then noticeable in the spectral feature space (bottom plots in Fig. 3). Indeed, the graph Fourier modes verify [u_l]_i = √d_i [v_l]_i, such that low-degree vertices, e.g.
, outliers and vertices on the boundary of a cluster, have spectral features of smaller magnitude:

‖[[u_0]_i, [u_1]_i]^T‖ ≪ ‖[[v_0]_i, [v_1]_i]^T‖.

In other words, the spectral features of low-degree vertices are moved closer to the origin of the feature space, and closer to each other. C-means then clusters those vertices together, resulting in the poor separation of clusters seen in Fig. 3(b). In the spectral domain, we clearly see the interest of using the random walk Laplacian approach: here the normalized indicator functions are not lowpass. This is further seen in the Q-MSE, which decreases continuously with the cut-off frequency, without the fast decay we wish for.

The spectral clustering literature actually does not use the raw spectral features to run C-means, but normalizes those features by their ℓ₂-norm prior to running C-means [33]. This is actually equivalent to the projection of the (L, D)-graph Fourier modes {v_l}_l onto the unit circle:

[ũ_0]_i = cos(θ_i),  [ũ_1]_i = sin(θ_i),  with θ_i = atan(‖1‖_D [v_1]_i).

We can use this relation to characterize how the spectral features of vertices are modified compared to the (L, D)-GFT approach, and how C-means behaves. Indeed, the separation of features (i.e., how far apart the spectral features of two vertices are) is modified by the transformation above. Looking at the norm of the gradient of these spectral features yields:

‖grad([ũ_0]_i, [ũ_1]_i)‖₂ = ‖1‖_D / (1 + (‖1‖_D [v_1]_i)²).

Therefore, the gradient is smaller for higher values of |[v_1]_i|: this spectral feature normalization brings features closer together for higher values of |[v_1]_i|. As the features of the dense and sparse clusters are identified by the magnitude of |[v_1]_i|, this normalization actually brings the features of the two clusters closer. This yields less separation in the feature space and poor clustering output from C-means (see Fig. 3(c)).
Finally, we performed the same spectral clustering approach using the (∆, I)-GFT with ∆(x) = ‖x − A^(norm) x‖₁, the graph total variation. The results in Fig. 3(f) show that this approach actually identifies the stronger small clusters rather than the larger clusters. More importantly, in this case, the spectral features of both clusters are not separated, such that C-means cannot separate the clusters based on these spectral features. Furthermore, it turns out that the clusters given by these spectral features are not the best, with a (GTV, I)-Ncut of 8.21, while those given by the combinatorial Laplacian yield a (GTV, I)-Ncut of 6.61. In other words, the relaxation we made, considering arbitrary h_c^(I) instead of normalized indicator functions, leads to large errors. The spectral domain analysis suffers from the difficulty of computing the graph Fourier modes efficiently, since the graph variation is not an HPSD form. However, the first few components of the normalized indicator function show that this signal is now lowpass, with a lot of energy in these components. The Q-MSE is not sharply decreasing either, and stays very high.

B. Sensor Networks

Here we explore our motivating example of a sensor network, where the goal is the opposite of the one above: clusters of vertices should not bias the results. More precisely, we are interested in studying signals measured by sensor networks where the location of the sensors is unrelated to the quantity measured, e.g., temperature readings in a weather station network. In this section, we show that we can obtain a good definition of the inner product achieving this goal. Let G^(U) be a graph whose vertices are uniformly sampled on a 2D square plane, and G^(NU) a similar graph obtained with a non-uniform sampling distribution (see Fig. 5(e), where the areas of higher density are indicated). Edges of both graphs are again selected using K-nearest neighbors, with K = 10.
Weights are chosen as a Gaussian kernel of the Euclidean distance, w_ij = exp(−d(i, j)² / (2σ²)), with σ = 0.3 (empirically chosen to have a good spread of weights). Example graphs with 500 sampled vertices are shown in Figs. 5(a) and 5(e).

In this experiment, we are interested in studying how we can remove the influence of the particular arrangement of sensors, so as to obtain the same results from one graph realization to another. To that end, we generate 500 graph realizations, each with 1000 vertices, to compute empirical statistics. The signals we consider are pure sine waves, which allows us to control the energies and frequencies of the signals we use. We experiment with several signals for each frequency and report the worst-case statistics over those signals of equal (continuous) frequency. This is done by choosing several values of phase shift for each frequency. Let

s̃_ν(x, y; ϕ) = cos(2πνx + ϕ)

be a continuous horizontal Fourier mode of frequency ν, phase shifted by ϕ. For a given graph G generated with either of the two schemes above, we define s_ν(i; ϕ, G) = s̃_ν(x_i, y_i; ϕ) as the sampled signal on the graph. Its energy w.r.t. the Q-norm is then:

E_ν^(Q)(ϕ; G) := ‖s_ν(·; ϕ, G)‖²_Q.

Note that this energy depends on the graph if Q does, i.e., Q(G) may vary from one graph to another, as with Q(G) = D(G); we simply write Q to keep notations short.

To get better insight into the influence of the sensor sampling, we are interested in statistical quantities of the graph signal energy.

Fig. 5. Study of the graph energy of a single continuous cosine signal s̃_ν(x, y; ϕ) = cos(2πνx + ϕ), sampled uniformly (a)-(d) and non-uniformly (e)-(h) as s_ν(i; ϕ) = s̃_ν(x_i, y_i; ϕ). Example sampling with 500 samples (vertices) showing vertex degree ((a) and (e)) and Voronoi cell area ((b) and (f)) with colors. (c)-(d) and (g)-(h): 1000 samples (vertices), with results averaged over 500 sampling (graph) realizations. (c) and (g): coefficient of variation CV^(Q)(ν) of the signal energy depending on the continuous frequency ν. (d) and (h): maximum absolute deviation m^(Q)(ν) of the normalized mean signal energy depending on the continuous frequency ν.

First, we study the empirical mean:

μ^(Q)(ν; ϕ) := ⟨E_ν^(Q)(ϕ; G)⟩_G = (1/N_G) Σ_{n=1}^{N_G} E_ν^(Q)(ϕ; G_n),

with N_G the number of sampling realizations. It is interesting to study how the mean varies with ϕ and ν, in order to check whether the graph signal energy remains constant, given that the energy of the continuous signals being sampled is constant. The first thing we observe is that averaging over ϕ, i.e., over many signals of equal continuous frequency, yields the same average mean for all continuous frequencies. However, μ^(Q)(ν; ϕ) does depend on ϕ. To show this, we use the maximum absolute deviation of the normalized mean μ̄^(Q)(ν; ϕ) := μ^(Q)(ν; ϕ) / ⟨μ^(Q)(ν; ϕ)⟩_ϕ:

m^(Q)(ν) := max_ϕ |1 − μ̄^(Q)(ν; ϕ)|.

Use of the normalized mean is necessary to be able to compare the various choices of Q, since they can yield quite different average means ⟨μ^(Q)(ν; ϕ)⟩_ϕ.
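The statistics above can be sketched as follows. This is a toy version with smaller sizes than the 500 realizations of 1000 vertices used in the paper, and the Voronoi cell areas are approximated by nearest-neighbor Monte Carlo rather than an exact tessellation; all sizes and names are illustrative assumptions.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)

def voronoi_areas(xy, n_mc=20_000):
    """Approximate Voronoi cell areas in the unit square: the fraction
    of random points whose nearest vertex is i (Monte-Carlo sketch)."""
    pts = rng.random((n_mc, 2))
    _, nearest = cKDTree(xy).query(pts)
    return np.bincount(nearest, minlength=len(xy)) / n_mc

def q_energy(s, q_diag):
    """Signal energy ||s||_Q^2 for a diagonal inner-product matrix Q."""
    return float(np.sum(q_diag * s ** 2))

nu, phi = 2.0, 0.4
E_I, E_C = [], []
for _ in range(30):                     # graph (sampling) realizations
    xy = rng.random((300, 2))           # uniform sensor placement
    s = np.cos(2 * np.pi * nu * xy[:, 0] + phi)
    E_I.append(q_energy(s, np.ones(len(xy))))   # Q = I (dot product)
    E_C.append(q_energy(s, voronoi_areas(xy)))  # Q = C (cell areas)
mu_I, mu_C = np.mean(E_I), np.mean(E_C)
cv_I = np.std(E_I, ddof=1) / mu_I       # coefficient of variation
cv_C = np.std(E_C, ddof=1) / mu_C
```

With Q = C the energy is a Riemann-sum approximation of the continuous energy ∫∫ cos²(2πνx + ϕ) dx dy = 1/2, so μ_C should sit near 0.5 regardless of the particular realization.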
However, the mean only characterizes the bias of the graph signal energy as an approximation to the continuous signal energy. We also need to characterize the variance of this estimator. To that end, we consider the empirical standard deviation of the graph signal energy:

σ^(Q)(ν; ϕ) := √( (1/(N_G − 1)) Σ_{n=1}^{N_G} ( E_ν^(Q)(ϕ; G_n) − μ^(Q)(ν; ϕ) )² ).

This quantity shows how much the particular sampling being performed influences the signal energy estimator. Since the mean energy is influenced by the choice of Q, we need to normalize the standard deviation to be able to compare results between the various choices of Q. This yields the coefficient of variation, whose maximum over all signals of equal (continuous) frequency we report:

CV^(Q)(ν) := max_ϕ σ^(Q)(ν; ϕ) / μ^(Q)(ν; ϕ),

from which we can study the variance of the graph signal energy estimator depending on the continuous frequency.

We experiment here with three choices for Q. The first is a standard GFT from the literature, corresponding to the ℓ2-norm: Q = I. The second corresponds to the random walk Laplacian, with Q = D. The third is our novel approach based on the Voronoi cell area inner product, Q = C (see Sec. IV-D). Results for G^(U) and G^(NU) are given in Fig. 5.

The coefficient of variation shows the strong advantage of using the Voronoi cell area inner product: CV^(C)(ν) is very small in the lower continuous spectrum, almost zero (see Figs. 5(c) and 5(g)). In other words, for signals that do not vary too quickly compared to the sampling density, the C-norm gives an estimated energy with small variance, removing the influence of the particular arrangement of sensors. This is true for both the uniform and the non-uniform distribution. Both the dot product Q = I and the degree norm Q = D show larger variance here. Finally, the maximum absolute deviation in Figs.
5(d) and 5(h) shows again the advantage of the C-norm. Indeed, considering several signals of equal continuous frequency, these should have equal average energy. While this is the case for both the dot product and the C-norm in the lower spectrum under uniform sampling, a non-uniform sampling yields a strong deviation for the dot product. The C-norm appears again as a good approximation to the continuous energy, with small deviation between signals of equal frequency and between samplings.

Remark 2. In [13], [34], the authors advocate for the use of the random walk Laplacian as a good approximation to the continuous (Laplace-Beltrami) operator of a manifold. We showed here that this case is not ideal when it comes to the variance of the signal energy estimator, but in those communications the authors actually normalize the weights of the graph using the degree matrix prior to computing the random walk Laplacian, thus working on a different graph. If Ã is the adjacency matrix prior to normalization, and D̃ the associated degree matrix, then A = D̃⁻¹ Ã D̃⁻¹. This normalization is important when we look at the degree. Indeed, without it, we see in Figs. 5(a)-(b) and 5(e)-(f) that degree and Voronoi cell area evolve in opposite directions: large degrees correspond to small areas. Pre-normalizing by the degree corrects this. Using this normalization leads to better results with respect to the energy (omitted here to save space); however, Voronoi cells still yield the best results.

These results show that it is possible to better account for irregularity in the definition of the energy of a graph signal: the Voronoi cell area norm yields very good results when analysing a signal lying in a Euclidean space (or on a manifold) independently of the sampling performed.

VI. CONCLUSION

We showed that it is possible to define an orthonormal graph Fourier transform with respect to any inner product.
This allows, in particular, to finely account for an irregular graph structure. Defining the associated graph filters is then straightforward using the fundamental matrix of the graph. Under some conditions, this leads to a fundamental matrix that is easy to compute and efficient to use. We also showed that the graph signal processing literature can be interpreted through this graph Fourier transform, often with the dot product as the inner product on graph signals. Finally, we showed that we are able to obtain promising results for sensor networks using the Voronoi cell area inner product.

This work calls for many extensions, and giving a complete list of them would be too long for this conclusion. At this time, we are working on the sensor network case to obtain graph signal energies with even less variance, especially once the signals are filtered. We are also working on studying bilateral filters, and image smoothers in general [35], and on extending them to graph signals in this new setting. We are also exploring an extension of the study of random graph signals and stationarity with respect to this new graph Fourier transform. Finally, many communications on graph signal processing can be re-interpreted using alternative inner products, which will give more intuition on the impact of irregularity.

ACKNOWLEDGEMENT

The authors would like to thank the anonymous reviewers for their careful reading and constructive comments that helped improve the quality of this communication.

APPENDIX

A. Low Pass Filtering of an Indicator Function

We now show an example comparing the quality of different lowpass approximations to the normalized cluster indicator function, under different GFT definitions. This is motivated by the fact that spectral clustering techniques make use of the low frequencies of the graph spectrum. Fig. 6 shows these approximations in the vertex domain. Note that these correspond to the Q-MSE plots of Fig. 3.
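The ideal low pass filter H_l used in this appendix can be sketched from its definition: compute the (L, Q)-Fourier modes as generalized eigenvectors of the pair (L, Q), then zero out all but the first l + 1 spectral components. This is a minimal sketch assuming a symmetric L and a positive definite Q (e.g., Q = D); the graph and signal below are illustrative.

```python
import numpy as np
from scipy.linalg import eigh

def ideal_lowpass(L, Q, h, l):
    """Apply H_l to h in the (L, Q)-GFT: the Fourier modes u solve
    L u = lambda Q u and satisfy U.T @ Q @ U = I (Q-orthonormality)."""
    lam, U = eigh(L, Q)          # generalized eigendecomposition
    h_hat = U.T @ Q @ h          # analysis with the Q-inner product
    h_hat[l + 1:] = 0.0          # keep graph frequencies up to lambda_l
    return U @ h_hat             # synthesis

# Path graph on 4 vertices with Q = D (random-walk-Laplacian GFT):
A = np.diag(np.ones(3), 1); A = A + A.T
D = np.diag(A.sum(axis=1))
L = D - A
h = np.array([1.0, 1.0, 0.0, 0.0])   # indicator of a "cluster"
h_lp = ideal_lowpass(L, D, h, l=1)   # smooth approximation of h
```

Note that `scipy.linalg.eigh(L, Q)` returns eigenvectors normalized so that U.T @ Q @ U = I, which is exactly the Q-orthonormality required of the (L, Q)-GFT modes.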
Comparing results for l = 1 and l = 5, we note that for both L and ℒ, the lowpass approximation is good, and significantly better than that achieved with the GFT corresponding to the graph total variation. However, the (L, I)-GFT suffers from isolated vertices (l = 1 case in Fig. 6), and its approximation needs more spectral components for an output closer to the indicator function (l = 5 case), while the (ℒ, I)-GFT is clearly biased by the degree (smaller amplitudes on the vertices of the cluster boundaries). This confirms that the indicator function approximation (and thus spectral clustering performance) is better for the (L, D)-GFT, which does not have the bias towards isolated vertices shown by the (L, I)-GFT (impulses on those isolated vertices are associated with lowpass signals, while they are associated with the graph frequency 1 for the (L, D)-GFT), and which avoids the bias of the first Fourier mode towards vertex degrees of the (ℒ, I)-GFT.

REFERENCES

[1] Benjamin Girault, "Stationary Graph Signals using an Isometric Graph Translation," in Signal Processing Conference (EUSIPCO), 2015 Proceedings of the 23rd European. IEEE, 2015.
[2] Siheng Chen, Yaoqing Yang, Christos Faloutsos, and Jelena Kovačević, "Monitoring Manhattan's Traffic at 5 Intersections?," in 2016 IEEE Global Conference on Signal and Information Processing, Washington D.C., USA, Dec 2016.
[3] Ronan Hamon, Pierre Borgnat, Patrick Flandrin, and Céline Robardet, "Networks as signals, with an application to a bike sharing system," in 2013 IEEE Global Conference on Signal and Information Processing, Dec 2013, pp. 611–614.
[4] Ronan Hamon, Analysis of temporal networks using signal processing methods: Application to the bike-sharing system in Lyon, PhD thesis, École normale supérieure de Lyon - ENS LYON, Sept. 2015.
[5] Benjamin Girault, Paulo Gonçalves, and Éric Fleury, "Graphe de contacts et ondelettes : étude d'une diffusion bactérienne," in Proceedings of Gretsi 2013, Sept. 2013.
[6] Jiun-Yu Kao, Antonio Ortega, and Shrikanth S. Narayanan, "Graph-based approach for motion capture data representation and analysis," in Image Processing (ICIP), 2014 IEEE International Conference on. IEEE, 2014, pp. 2061–2065.
[7] Mikhail Belkin and Partha Niyogi, "Towards a theoretical foundation for Laplacian-based manifold methods," Journal of Computer and System Sciences, vol. 74, no. 8, pp. 1289–1308, 2008.
[8] Samuel I. Daitch, Jonathan A. Kelner, and Daniel A. Spielman, "Fitting a graph to vector data," in Proceedings of the 26th Annual International Conference on Machine Learning. ACM, 2009, pp. 201–208.
[9] Xiaowen Dong, Dorina Thanou, Pascal Frossard, and Pierre Vandergheynst, "Learning Laplacian matrix in smooth graph signal representations," IEEE Transactions on Signal Processing, vol. 64, no. 23, pp. 6160–6173, 2016.
[10] Hilmi E. Egilmez, Eduardo Pavez, and Antonio Ortega, "Graph Learning from Data under Laplacian and Structural Constraints," IEEE Journal of Selected Topics in Signal Processing, vol. 11, no. 6, pp. 825–841, Sept 2017.
[11] Bastien Pasdeloup, Vincent Gripon, Grégoire Mercier, Dominique Pastor, and Michael G. Rabbat, "Characterization and inference of graph diffusion processes from observations of stationary signals," IEEE Transactions on Signal and Information Processing over Networks, 2017.
[12] Jonathan Mei and José M. F. Moura, "Signal processing on graphs: Causal modeling of unstructured data," IEEE Transactions on Signal Processing, vol. 65, no. 8, pp. 2077–2092, 2017.

Fig. 6. Output of the ideal low pass filter H_l applied to the normalized indicator function h^(Q)_blue, for the (L, I), (ℒ, I), (L, D), (L, C), and (‖x − A^(norm)x‖_1, I) GFTs (columns), with l = 1, l = 5, and l = 50 (rows).
H_l keeps only the first l + 1 spectral components (i.e., up to the graph frequency λ_l). Note that H_l depends on the (∆, Q)-GFT.

[13] Matthias Hein, Jean-Yves Audibert, and Ulrike von Luxburg, "Graph Laplacians and their Convergence on Random Neighborhood Graphs," Journal of Machine Learning Research, vol. 8, pp. 1325–1368, 2007.
[14] Aliaksei Sandryhaila and José M. F. Moura, "Discrete Signal Processing on Graphs," IEEE Transactions on Signal Processing, vol. 61, no. 7, pp. 1644–1656, 2013.
[15] David I. Shuman, Sunil K. Narang, Pascal Frossard, Antonio Ortega, and Pierre Vandergheynst, "The Emerging Field of Signal Processing on Graphs: Extending High-Dimensional Data Analysis to Networks and Other Irregular Domains," IEEE Signal Processing Magazine, vol. 30, no. 3, pp. 83–98, 2013.
[16] Benjamin Girault, Paulo Gonçalves, and Éric Fleury, "Translation on Graphs: An Isometric Shift Operator," Signal Processing Letters, IEEE, vol. 22, no. 12, pp. 2416–2420, Dec 2015.
[17] Benjamin Girault, Signal Processing on Graphs - Contributions to an Emerging Field, PhD thesis, École normale supérieure de Lyon - ENS LYON, Dec. 2015.
[18] Aliaksei Sandryhaila and José M. F. Moura, "Discrete Signal Processing on Graphs: Frequency Analysis," Signal Processing, IEEE Transactions on, vol. 62, no. 12, pp. 3042–3054, June 2014.
[19] Stefania Sardellitti, Sergio Barbarossa, and Paolo Di Lorenzo, "On the Graph Fourier Transform for Directed Graphs," IEEE Journal of Selected Topics in Signal Processing, vol. 11, no. 6, pp. 796–811, 2017.
[20] Aamir Anis, Akshay Gadde, and Antonio Ortega, "Efficient Sampling Set Selection for Bandlimited Graph Signals Using Graph Spectral Proxies," IEEE Trans. Signal Processing, vol. 64, no. 14, pp. 3775–3789, 2016.
[21] Gene H. Golub and Charles F. Van Loan, Matrix Computations, The Johns Hopkins University Press, 3rd edition, 1996.
[22] Rasoul Shafipour, Ali Khodabakhsh, Gonzalo Mateos, and Evdokia Nikolova, "A Digraph Fourier Transform With Spread Frequency Components," May 2017.
[23] David I. Shuman, Benjamin Ricaud, and Pierre Vandergheynst, "Vertex-frequency analysis on graphs," Applied and Computational Harmonic Analysis, vol. 40, no. 2, pp. 260–291, 2016.
[24] David K. Hammond, Pierre Vandergheynst, and Rémi Gribonval, "Wavelets on graphs via spectral graph theory," Applied and Computational Harmonic Analysis, vol. 30, no. 2, pp. 129–150, 2011.
[25] Alan V. Oppenheim and Ronald W. Schafer, Discrete-time Signal Processing, Pearson, 3rd edition, 2013.
[26] Ulrike von Luxburg, "A tutorial on spectral clustering," Statistics and Computing, vol. 17, no. 4, pp. 395–416, 2007.
[27] Mikhail Belkin and Partha Niyogi, "Laplacian eigenmaps for dimensionality reduction and data representation," Neural Computation, vol. 15, no. 6, pp. 1373–1396, 2003.
[28] Sunil K. Narang and Antonio Ortega, "Compact Support Biorthogonal Wavelet Filterbanks for Arbitrary Undirected Graphs," IEEE Transactions on Signal Processing, vol. 61, no. 19, pp. 4673–4685, 2013.
[29] Akshay Gadde, Sunil K. Narang, and Antonio Ortega, "Bilateral filter: Graph spectral interpretation and extensions," in 2013 IEEE International Conference on Image Processing, Sept 2013, pp. 1222–1226.
[30] Xianming Liu, Gene Cheung, Xiaolin Wu, and Debin Zhao, "Random Walk Graph Laplacian-Based Smoothness Prior for Soft Decoding of JPEG Images," IEEE Transactions on Image Processing, vol. 26, no. 2, pp. 509–524, 2017.
[31] Carlo Tomasi and Roberto Manduchi, "Bilateral Filtering for Gray and Color Images," in 6th International Conference on Computer Vision '98, Proceedings of, Washington, DC, USA, 1998, p. 839.
[32] Benjamin Girault, Shrikanth S. Narayanan, Antonio Ortega, Paulo Gonçalves, and Éric Fleury, "GraSP: A Matlab toolbox for graph signal processing," in ICASSP. 2017, pp.
6574–6575, IEEE.
[33] Andrew Y. Ng, Michael I. Jordan, and Yair Weiss, "On spectral clustering: Analysis and an algorithm," in Advances in Neural Information Processing Systems, MIT Press, Vancouver, Canada, 2002, pp. 849–856.
[34] Ronald R. Coifman and Stéphane Lafon, "Diffusion maps," Applied and Computational Harmonic Analysis, vol. 21, no. 1, pp. 5–30, 2006.
[35] Peyman Milanfar, "A Tour of Modern Image Filtering: New Insights and Methods, Both Practical and Theoretical," IEEE Signal Processing Magazine, vol. 30, no. 1, pp. 106–128, 2013.
