Multiresolution Representations for Piecewise-Smooth Signals on Graphs


Authors: Siheng Chen, Aarti Singh, Jelena Kovačević

Abstract—What is a mathematically rigorous way to describe the taxi-pickup distribution in Manhattan, or the profile information in online social networks? A deep understanding of how to represent those data not only provides insights into the data properties, but also benefits many subsequent processing procedures, such as denoising, sampling, recovery and localization. In this paper, we model those complex and irregular data as piecewise-smooth graph signals and propose a graph dictionary to effectively represent them. We first propose the graph multiresolution analysis, which provides a principle for designing good representations. We then propose a coarse-to-fine approach, which iteratively partitions a graph into two subgraphs until we reach individual nodes. This approach efficiently implements the graph multiresolution analysis, and the induced graph dictionary promotes sparse representations for piecewise-smooth graph signals. Finally, we validate the proposed graph dictionary on two tasks: approximation and localization. The empirical results show that the proposed graph dictionary outperforms eight other representation methods on six datasets, including traffic networks, social networks and point cloud meshes.

Index Terms—Signal processing on graphs, signal representations, graph dictionary

I. INTRODUCTION

Today's data is being generated from a diversity of sources, all residing on complex and irregular structures; examples include profile information in social networks, stimuli in brain connectivity networks and traffic flow in city street networks [1], [2].
The need for understanding and analyzing such complex data has led to the birth of signal processing on graphs [3], [4], which generalizes classical signal processing tools to data supported on graphs; the data is the graph signal, indexed by the nodes of the underlying graph.

Modeling real-world data using piecewise-smooth graph signals. In urban settings, the intersections around shopping areas will exhibit homogeneous mobility patterns and life-style behaviors, while the intersections around residential areas will exhibit different, yet still homogeneous, mobility patterns and life-style behaviors. Similarly, in social networks, within a given social circle users' profiles tend to be homogeneous, while within a different social circle they will be different, yet still homogeneous. We can model data generated from both cases as piecewise-smooth graph signals, as they capture large variations between pieces and small variations within pieces. Figure 1 illustrates how a piecewise-smooth signal model can be used to approximate the taxi-pickup distribution in Manhattan and users' profile information on Facebook (hard thresholding is applied for better visualization).

The piecewise-smooth signal model has been intensely studied and widely used in classical signal processing, image processing and computer graphics [5], [6], [7]. Multiresolution analysis and splines are standard representation tools to analyze piecewise-smooth signals [8].

Fig. 1: Piecewise-smooth graph signals approximate irregular, nonsmooth graph signals by capturing both large variations at boundaries and small variations within pieces. (a) Taxi-pickup distribution in Manhattan (13,679 intersections). (b) Piecewise-smooth approximation (50 coefficients). (c) Profile information on Facebook (277 users). (d) Piecewise-smooth approximation (5 coefficients).
The idea of using piecewise-smooth graph signals is not novel either; in [3], the authors show that graph wavelets can capture discontinuities in a piecewise-smooth graph signal, and in [9], the authors propose denoising for piecewise-polynomial graph signals through minimizing a generalized total-variation term. There are two gaps in the previous literature that we address here: (1) define piecewise-smooth graph signals precisely and find appropriate representations; and (2) provide theoretical results on sparse representations for piecewise-smooth graph signals.

Representations for piecewise-smooth graph signals. Signal representations are at the heart of most signal processing techniques [10], allowing for targeted signal models for tasks such as denoising, compression, sampling, recovery and detection [11]. Our aim in this paper is to find an appropriate and efficient approach to represent piecewise-smooth graph signals. We define a mathematical model for piecewise-smooth graph signals and propose a graph dictionary to sparsely represent them. Inspired by classical signal processing, we generalize the idea of multiresolution analysis to graphs as a representation tool for piecewise-smooth signals [12]. We implement the graph multiresolution analysis by using a coarse-to-fine decomposition approach; that is, we iteratively partition a graph into two connected subgraphs until we reach individual nodes. We show that the process leads to an efficient construction of a graph wavelet basis that satisfies the graph multiresolution analysis, and that the induced graph dictionary promotes sparse representations for piecewise-smooth graph signals. We validate the proposed graph dictionaries on two tasks: approximation and localization. We show that the proposed graph dictionaries outperform eight other representation methods on six graphs, including traffic networks, a citation network, social networks and point cloud meshes.
Contributions. The main contributions of this paper are:
• A novel and explicit definition of piecewise-smooth graph signals; see Section III.
• A novel graph multiresolution analysis that is implemented by a coarse-to-fine decomposition approach; see Section IV.
• A novel graph dictionary that promotes sparsity for piecewise-smooth graph signals and is well-structured and storage-friendly; see Section V.
• An extensive empirical study on a number of real-world graphs, including traffic networks, citation networks, social networks and point cloud meshes; see Section VI.

Outline of the paper. Section II reviews the background material; Section III defines piecewise-smooth graph signals; Section IV proposes the graph multiresolution analysis, which provides a principled way to represent graph signals; Section V shows that the proposed graph dictionary promotes sparse representations for piecewise-smooth graph signals. We validate the proposed methods in Section VI and conclude in Section VII.

II. BACKGROUND

We briefly introduce the framework of graph signal processing. We then overview related work on graph signal representation.

A. Graph Signal Processing

Graphs. Let $G = (\mathcal{V}, \mathcal{E}, A)$ be an undirected, irregular and non-negatively weighted graph, where $\mathcal{V} = \{v_i\}_{i=1}^N$ is the set of nodes (vertices), $\mathcal{E} = \{e_i\}_{i=1}^E$ is the set of weighted edges and $A \in \mathbb{R}^{N \times N}$ is the weighted adjacency matrix whose element $A_{i,j}$ is the edge weight between the $i$th and the $j$th nodes. Let $D \in \mathbb{R}^{N \times N}$ be the diagonal degree matrix with $D_{i,i} = \sum_j A_{i,j}$. The graph Laplacian matrix is $L = D - A \in \mathbb{R}^{N \times N}$, which is a second-order difference operator on graphs [13]. Let $\Delta \in \mathbb{R}^{E \times N}$ be the graph incidence matrix; its rows correspond to edges. If $e_i$ is an edge that connects the $j$th node to the $k$th node ($j < k$), the elements of the $i$th row of $\Delta$ are
$$\Delta_{i,\ell} = \begin{cases} -\sqrt{A_{j,k}}, & \ell = j; \\ \sqrt{A_{j,k}}, & \ell = k; \\ 0, & \text{otherwise}. \end{cases}$$
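To make these definitions concrete, here is a minimal numpy sketch on a hypothetical 4-node path graph with unit edge weights; it builds $A$, $D$, $L$ and the incidence matrix $\Delta$, and checks the identity $\Delta^T \Delta = L$ that relates the first- and second-order difference operators.

```python
import numpy as np

# Hypothetical toy graph: 4 nodes on a path, unit weights.
A = np.array([[0., 1., 0., 0.],
              [1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [0., 0., 1., 0.]])
D = np.diag(A.sum(axis=1))        # diagonal degree matrix
L = D - A                         # graph Laplacian

# Incidence matrix: one row per edge (j, k) with j < k,
# entry -sqrt(A[j,k]) at column j and +sqrt(A[j,k]) at column k.
edges = [(j, k) for j in range(4) for k in range(j + 1, 4) if A[j, k] > 0]
Delta = np.zeros((len(edges), 4))
for i, (j, k) in enumerate(edges):
    w = np.sqrt(A[j, k])
    Delta[i, j], Delta[i, k] = -w, w

# The first-order difference operator squares to the Laplacian.
assert np.allclose(Delta.T @ Delta, L)
```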
The graph incidence matrix measures the first-order difference and $\Delta^T \Delta = L$.

Graph signals. Given a fixed ordering of nodes, we assign a signal coefficient to each node; a graph signal is then defined as a vector, $x = [x_1, x_2, \cdots, x_N]^T \in \mathbb{R}^N$, with $x_n$ the signal coefficient corresponding to the node $v_n$. We say that the graph signal is smooth when adjacent nodes have similar signal coefficients [3], [14].

• Smoothness in the vertex domain. Consider $\Delta x \in \mathbb{R}^E$ as an edge signal representing the first-order difference of $x$. The $i$th element of $\Delta x$,
$$(\Delta x)_i = \sqrt{A_{j,k}} \, (x_k - x_j),$$
assigns the difference between two adjacent signal coefficients to the $i$th edge, which connects the $j$th node to the $k$th node ($j < k$). When the variation $\|\Delta x\|_2^2 = x^T L x$ is small, the differences are small and $x$ is smooth.

• Smoothness in the spectral domain. We typically call this type of smoothness bandlimitedness [15], [16]. Let the graph Fourier basis $V \in \mathbb{R}^{N \times N}$ be the eigenvector matrix of $L$,¹ with $L = V \Lambda V^T$, where the diagonal elements of $\Lambda$ are arranged in ascending order. The graph spectrum is then $\hat{x} = V^T x \in \mathbb{R}^N$. Let $V_{(K)} \in \mathbb{R}^{N \times K}$ be the first $K$ columns of $V$, which span the lowpass bandlimited subspace. For $K \ll N$, when $\|V_{(K)}^T x\|_2^2 / \|x\|_2^2 = 1$, the energy concentrates in the lowpass band and $x$ is smooth.

In practice, graph signals may not satisfy the smoothness constraints above (as shown in Figure 1), which serves as motivation to further develop graph signal models and tools to represent them, the topic of this paper.

B. Related Works

Ideally, a good representation should be efficient, have some structure such as orthogonality and promote sparse representations for graph signals (at least in some subspaces). To deal with large-scale graphs, we may also need the representation itself to be sparse and storage-friendly. We categorize the previous work on graph signal representations as follows:²
1) Graph Filter Banks: Here, representations are constructed by generalizing classical filter banks to graphs. [17], [18] design critically-sampled filter banks via bipartite subgraph decomposition; [19], [20] design critically-sampled filter banks for circulant graphs; [21] designs oversampled filter banks; [22] designs iterative filter banks; [23] designs critically-sampled filter banks via community detection; and [24] designs each channel via sampling and recovery.

2) Graph Vertex-Domain Designs: Here, representations are constructed by designing each basis vector (atom) in the graph vertex domain. [25] designs spatial wavelets via neighborhoods; and [26], [27], [28], [29] consider coarse-to-fine approaches.

3) Graph Spectral-Domain Designs: Here, representations are constructed by designing graph filters in the graph spectral domain. [30] designs graph wavelets; [31], [32] design tight frames; and [33] designs the windowed graph Fourier transform by generalizing translation.

4) Graph Diffusion-Based Designs: Here, representations are constructed based on polynomials of the transition matrix. [34] designs diffusion wavelets; and [35] designs diffusion wavelet packets.

5) Graph Dictionary Learning: Here, representations are constructed by learning from the given graph signals. The representations in the other branches depend on the graph structure only; [36], [37] learn graph dictionaries that provide smoothness for given graph signals, which is adaptive but biased to the observed graph signals.

In this paper, we consider connecting graph filter banks and graph vertex-domain designs. Similarly to [23], [26], [27], [28], [29], the proposed representation considers a coarse-to-fine decomposition in the graph vertex domain.

¹The graph Fourier basis can also be defined based on the adjacency matrix [4].
²The categorization follows https://www.macalester.edu/~dshuman1/Talks/Shuman_GSP_2016.pdf.
Our goal is to implement the graph multiresolution analysis, where the coarse-to-fine approach is more efficient and straightforward than the local-to-global approach (graph filter banks). We further show that the proposed representation is efficient, orthogonal and storage-friendly; it also satisfies the graph multiresolution analysis and promotes sparsity for piecewise-constant and piecewise-smooth graph signals.

The representations of smooth graph signals have been thoroughly studied in the graph spectral domain [38], [39]. In this paper, we emphasize the representations of piecewise-smooth graph signals in the graph vertex domain. As a continuous counterpart of graph signal representations, some other works study manifold data representations [40], [41], [42].

III. GRAPH SIGNAL MODELS

Piecewise-smooth graph signals can model a number of real-world cases, as they capture large variations between pieces and small variations within pieces. In this section, we mathematically define piecewise-smooth graph signals. We start with piecewise-constant graph signals, an important subclass, and then extend them to piecewise-smooth graph signals.

A. Piecewise-Constant Model

In classical signal processing, a piecewise-constant signal is a signal that is locally constant over connected regions separated by lower-dimensional boundaries. Such a signal is often related to step functions, square waves and Haar wavelets and is widely used in image processing [10]. Piecewise-constant graph signals have been used in many applications without having been explicitly defined; for example, in community detection, community labels form a piecewise-constant graph signal for a given social network, and in semi-supervised learning, classification labels form a piecewise-constant graph signal for a graph constructed from the dataset.
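As a toy illustration of the community-detection case above (a hypothetical 6-node graph with two communities and one cross edge), community labels form a signal whose first-order difference $\Delta x$ is nonzero only on edges across communities:

```python
import numpy as np

# Hypothetical example: two 3-node communities joined by a single edge.
A = np.zeros((6, 6))
for j, k in [(0, 1), (0, 2), (1, 2),     # community {0, 1, 2}
             (3, 4), (3, 5), (4, 5),     # community {3, 4, 5}
             (2, 3)]:                    # one cross edge
    A[j, k] = A[k, j] = 1.0

# Incidence matrix, one row per edge (j, k) with j < k.
edges = [(j, k) for j in range(6) for k in range(j + 1, 6) if A[j, k] > 0]
Delta = np.zeros((len(edges), 6))
for i, (j, k) in enumerate(edges):
    Delta[i, j], Delta[i, k] = -np.sqrt(A[j, k]), np.sqrt(A[j, k])

# Community labels as a piecewise-constant signal: constant within each piece.
x = np.array([1., 1., 1., 5., 5., 5.])
cut = np.count_nonzero(Delta @ x)        # nonzero only on the single cross edge
```

Here `cut` counts the edges whose endpoints carry different labels, which is exactly the cut cost discussed next.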
While smooth graph signals emphasize slow transitions, piecewise-constant graph signals emphasize fast transitions (corresponding to boundaries) and localization in the vertex domain (corresponding to signals being nonzero in a local neighborhood).

We define a piecewise-constant graph signal through the concept of a piece, which has been implicitly used before [43], [44]. The definition is intuitive: a piecewise-constant graph signal partitions a graph into several pieces; within each piece, the signal coefficients are constant.

Definition 1. Let $S$ be a subset of the node set $\mathcal{V}$. We call $S$ a piece when its corresponding subgraph $G_S$ is connected.

We can represent a piece $S$ by using a one-piece graph signal, $1_S \in \mathbb{R}^N$. A piecewise-constant graph signal is a linear combination of several one-piece graph signals.

Definition 2. Let $\{S_c\}_{c=1}^C$ be a partition of the node set $\mathcal{V}$, where each $S_c$ is a piece. A graph signal $x \in \mathbb{R}^N$ is piecewise-constant with $C$ pieces when
$$x = \sum_{c=1}^C a_c 1_{S_c},$$
where $a_c \in \mathbb{R}$ is the piece coefficient for the piece $S_c$. Denote this class by PC($C$).

For most graph signals, two adjacent signal coefficients are typically not the same; that is, $\|\Delta x\|_0$ may be close to the number of edges $E$. For a piecewise-constant graph signal $x$ with a small number of pieces, however, $\|\Delta x\|_0$ is usually small: within a piece, the first-order difference is 0, while across pieces, $\|\Delta x\|_0$ is the cut cost to separate the pieces. For example, in an unweighted graph,
$$\|\Delta x\|_0 \le \#\{\text{edges across pieces } \{S_c\}_{c=1}^C\} \quad \text{for all } x \in \text{PC}(C),$$
where the equality is achieved when all $a_c$ are different. Thus, $\|\Delta x\|_0 \ll E$ when $C \ll N$; see a quick summary in Table I.

Table I: Property summary of some typical graph signals.

  Graph signal model                ‖Δx‖₀   ‖Δx‖₂
  Arbitrary graph signal            O(E)    large
  Smooth graph signal               O(E)    small
  Piecewise-constant graph signal   O(1)    large

B.
Piecewise-Smooth Model

Piecewise-smooth signals are widely used to represent images, where edges are captured by the piece boundaries and smooth content is captured by the pieces themselves. Piecewise-smooth graph signals arise naturally from piecewise-constant graph signals, with more flexibility to model real-world data, such as the taxi-pickup distribution supported on city-street networks and 3D point cloud information supported on meshes.

We define a piecewise-smooth graph signal as a generalization of a piecewise-constant graph signal. For a piecewise-constant signal, signal coefficients within a piece are constant; for a piecewise-smooth signal, signal coefficients within a piece form a smooth graph signal over that piece. Let $S$ be a piece, $G_S$ the corresponding subgraph and $V_S \in \mathbb{R}^{|S| \times |S|}$ the corresponding graph Fourier basis. Given a graph signal $x \in \mathbb{R}^N$, $x_S \in \mathbb{R}^{|S|}$ denotes the signal coefficients supported on $G_S$ and $x_{\bar{S}} \in \mathbb{R}^{N - |S|}$ denotes the remaining signal coefficients.

Definition 3. A graph signal $x \in \mathbb{R}^N$ is localized and bandlimited over the piece $S$ with bandwidth $K$ ($K \le |S|$) when $x_{\bar{S}} = 0$ and
$$V_{S(K)} V_{S(K)}^T x_S = x_S \in \mathbb{R}^{|S|},$$
where $V_{S(K)} \in \mathbb{R}^{|S| \times K}$ contains the first $K$ columns of the graph Fourier basis $V_S$.

Definition 3 shows a class of graph signals that is localized in both the vertex and the graph spectral domains. Since these signals are bandlimited over a piece, we consider them lowpass and smooth within this piece. A similar definition has also been proposed in [45]; the difference is that the bandlimitedness in [45] is defined for the entire graph, while the bandlimitedness in Definition 3 is defined for a subgraph only. We then consider piecewise-bandlimited graph signals as linear combinations of localized and bandlimited graph signals.

Definition 4.
A graph signal $x \in \mathbb{R}^N$ is piecewise-bandlimited with $C$ pieces and bandwidth $K$ when $x = \sum_{c=1}^C x^{(c,K)}$, where each $S_c$, $c = 1, \ldots, C$, is a valid piece and $x^{(c,K)} \in \mathbb{R}^N$ is bandlimited over the piece $S_c$ with bandwidth $K$. Denote this class by PBL($C$, $K$).

IV. GRAPH MULTIRESOLUTION

Having defined a piecewise-smooth model for the data we are interested in, we now embark upon looking for the appropriate representations. Inspired by classical signal processing, we generalize the multiresolution analysis to graph signals and propose a coarse-to-fine approach to implement it.

A. Graph Multiresolution Analysis

Definition 5. A multiresolution analysis on graphs consists of a sequence of embedded closed subspaces
$$V^{(L)} \subset \cdots \subset V^{(1)} \subset V^{(0)},$$
such that
• it satisfies upward completeness, $V^{(0)} = \mathbb{R}^N$;
• it satisfies downward completeness, $V^{(L)} = \{c 1_{\mathcal{V}}, c \in \mathbb{R}\}$;
• there exists an orthonormal basis $\{v_k^{(\ell)}\}_{k=0}^{K^{(\ell)}-1}$ for $V^{(\ell)}$;
• it satisfies generalized shift invariance; that is, for any $x \in V^{(\ell)}$, there exists a nontrivial permutation operator $\Phi \in \{0,1\}^{N \times N}$ ($\Phi 1 = \Phi^T 1 = 1$) such that $\Phi x \in V^{(\ell)}$. The permutation operator $\Phi$ only allows for swapping signal coefficients in two nonoverlapping pieces;
• it satisfies generalized scale invariance; that is, for any $x \in V^{(\ell)}$, there exists a nontrivial permutation operator $\Phi \in \{0,1\}^{N \times N}$ such that $\Phi x \in V^{(\ell)}$. When the permutation operator $\Phi$ swaps signal coefficients in two nonoverlapping pieces, each piece has at most $2^\ell$ nodes.

While similar in spirit, the proposed graph multiresolution analysis is different from the original one [12]. For example, the complete space here is $\mathbb{R}^N$ instead of $L_2(\mathbb{R})$ because of the discrete nature of the graph. We unify the shift and scale invariance axioms via a permutation operator, which reshapes a graph signal by swapping signal coefficients.
The standard shift invariance axiom ensures that the input signal shape is preserved during shifting; here, this is accomplished by requiring that the permutation operator swap only the signal coefficients supported on two nonoverlapping pieces. The standard scale invariance axiom ensures that the input signal shape is preserved during scaling; here, this is accomplished by requiring that the number of swaps scale exponentially as the multiresolution level $\ell$ grows; see Figure 3 for an illustration.

Fig. 2: Coarse-to-fine decomposition approach. At each step, we partition a larger piece into two smaller disjoint pieces and generate a pair of lowpass/highpass basis vectors. Piece $v_1^{(3)} = \mathcal{V}$ is at Level 3; pieces $v_1^{(2)}, v_2^{(2)}$ are at Level 2; and pieces $v_1, v_2, v_3, v_4$ are at Level 1.

Fig. 3: Permutation leads to the generalized shift and scale invariances. (a) Permutation in $V^{(1)}$: we swap the signal coefficients between the blue piece and the yellow piece; the total number of swaps is 2. (b) Permutation in $V^{(2)}$: we swap the signal coefficients between the blue piece and the yellow piece; the total number of swaps is 4. The permutation operator $\Phi$ shifts a graph signal $x \in V^{(\ell)}$ to another graph signal $\Phi x \in V^{(\ell)}$ by swapping signal coefficients supported on two different pieces, which leads to the generalized shift invariance; the permutation operator needs twice as many swaps to permute a graph signal in a coarser space, which leads to the generalized scale invariance.

B. Coarse-to-Fine Construction

Our goal now is to implement the graph multiresolution analysis. In classical signal processing, this is typically accomplished by using filter banks, which involve a series of downsampling and shifting operations. Filter banks start with building filters in a fine space, which captures local information, and gradually build them in coarser spaces, which capture global information.
For discrete-time signals, filter banks happen to be an efficient way to implement the multiresolution analysis because the downsampling and shifting operators follow naturally. For graph signals, however, there is no recipe to permute the nodes; thus, it is hard to obtain efficient downsampling and shifting operators; see details in Appendix A.

Instead, we consider implementing the graph multiresolution analysis using a coarse-to-fine approach. The main idea is to recursively partition each piece into two smaller disjoint child pieces as follows: given a connected graph $G_0(\mathcal{V}_0, \mathcal{E}_0, A_0)$ with $|\mathcal{V}_0| > 1$, partition $G_0$ into two smaller graphs $G_1(\mathcal{V}_1, \mathcal{E}_1, A_1)$ and $G_2(\mathcal{V}_2, \mathcal{E}_2, A_2)$ by solving
$$\min_{\mathcal{V}_1, \mathcal{V}_2} \; \big| |\mathcal{V}_1| - |\mathcal{V}_2| \big| \quad\quad (1a)$$
$$\text{subject to} \quad \mathcal{V}_1 \cap \mathcal{V}_2 = \emptyset, \quad \mathcal{V}_1 \cup \mathcal{V}_2 = \mathcal{V}_0, \quad G_1, G_2 \text{ are connected}. \quad\quad (1b)$$
In other words, we want (1) each of the two child pieces to be connected; and (2) the partition to be close to a bisection; that is, the difference between the cardinalities of the two child pieces should be as small as possible. These properties ensure that the coarse-to-fine approach implements the graph multiresolution analysis. We solve (1) in Section IV-D.

We start with the coarsest lowpass subspace $V^{(0)} = \{c 1_{\mathcal{V}}, c \in \mathbb{R}\}$ and partition the largest piece $\mathcal{V}$ into two disjoint and connected child pieces $v_1^{(1)}, v_2^{(1)} \subseteq \mathcal{V}$; that is, $v_1^{(1)} \cup v_2^{(1)} = \mathcal{V}$, $v_1^{(1)} \cap v_2^{(1)} = \emptyset$, where the subscript denotes the index at each level. The lowpass/highpass basis vectors are, respectively,
$$v_1^{(1)} = g(v_1^{(1)}, v_2^{(1)}), \quad u_1^{(1)} = h(v_1^{(1)}, v_2^{(1)}),$$
where
$$g(S_1, S_2) = \sqrt{\frac{|S_1||S_2|}{|S_1| + |S_2|}} \left( \frac{1_{S_1}}{|S_1|} + \frac{1_{S_2}}{|S_2|} \right) \in \mathbb{R}^N,$$
$$h(S_1, S_2) = \sqrt{\frac{|S_1||S_2|}{|S_1| + |S_2|}} \left( \frac{1_{S_1}}{|S_1|} - \frac{1_{S_2}}{|S_2|} \right) \in \mathbb{R}^N,$$
with $S_1, S_2 \subset \mathcal{V}$ two nonoverlapping pieces. The normalization ensures that each basis vector is of unit norm and that $1^T u_1^{(1)} = 0$. The highpass subspace is $U^{(1)} = \{c u_1^{(1)}, c \in \mathbb{R}\}$.
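A small numpy sketch of the $g$ and $h$ formulas above, on hypothetical toy pieces of a 5-node graph; it checks unit norm, the zero mean of the highpass vector, and the Haar-like orthogonality of highpass vectors at nested levels (the child highpass sums to zero where the parent is constant):

```python
import numpy as np

def indicator(S, N):
    """One-piece signal 1_S in R^N."""
    v = np.zeros(N)
    v[list(S)] = 1.0
    return v

def g(S1, S2, N):
    """Lowpass vector for two disjoint pieces; unit norm by construction."""
    c = np.sqrt(len(S1) * len(S2) / (len(S1) + len(S2)))
    return c * (indicator(S1, N) / len(S1) + indicator(S2, N) / len(S2))

def h(S1, S2, N):
    """Highpass (Haar-like) vector: opposite signs on the two pieces."""
    c = np.sqrt(len(S1) * len(S2) / (len(S1) + len(S2)))
    return c * (indicator(S1, N) / len(S1) - indicator(S2, N) / len(S2))

N = 5
u1 = h({0, 1, 2}, {3, 4}, N)   # level-1 highpass over the whole node set
u2 = h({0, 1}, {2}, N)         # highpass for the children of piece {0, 1, 2}
```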
We now partition pieces $v_1^{(1)}$ and $v_2^{(1)}$ to obtain $v_1^{(2)}, v_2^{(2)}$ and $v_3^{(2)}, v_4^{(2)}$, respectively. The lowpass/highpass basis vectors are
$$v_k^{(2)} = g(v_{2k-1}^{(2)}, v_{2k}^{(2)}), \quad u_k^{(2)} = h(v_{2k-1}^{(2)}, v_{2k}^{(2)}), \quad \text{for } k = 1, 2.$$
The lowpass subspace is $V^{(2)} = \mathrm{span}\{v_k^{(2)}\}_{k=1}^{K^{(2)}}$ and the highpass subspace is $U^{(2)} = \mathrm{span}\{u_k^{(2)}\}_{k=1}^{K^{(2)}}$, where $K^{(2)} = 2$. We keep on partitioning, building the lowpass and highpass subspaces and their corresponding bases in the process. At the $\ell$th level, we partition $v_k^{(\ell+1)}$ to obtain $v_{2k-1}^{(\ell)}, v_{2k}^{(\ell)}$. When both $v_{2k-1}^{(\ell)}, v_{2k}^{(\ell)}$ are nonempty, $v_k^{(\ell)} = g(v_{2k-1}^{(\ell)}, v_{2k}^{(\ell)})$ is a lowpass basis vector and $u_k^{(\ell)} = h(v_{2k-1}^{(\ell)}, v_{2k}^{(\ell)})$ is a highpass basis vector; when one of them is empty, the cardinality of $v_k^{(\ell+1)}$ is 1 and we cease partitioning in this branch. At the finest resolution, each piece corresponds to an individual node. Since we promote bisection, the total decomposition depth $L$ is around $1 + \log_2 N$. At the end of the process, we collect all highpass basis vectors into a Haar-like graph wavelet basis (see Algorithm 1). A toy example is shown in Figure 2.

Algorithm 1: Haar-like Graph Wavelet Basis Construction
Input: graph $G(\mathcal{V}, \mathcal{E}, A)$
Output: wavelet basis $W$
1) initialize a stack of pieces $\mathcal{S}$ and a set of vectors $W$
2) push $S = \mathcal{V}$ into $\mathcal{S}$
3) add $w = \frac{1}{\sqrt{|S|}} 1_S$ to $W$
4) while the cardinality of the largest element in $\mathcal{S}$ is larger than 1:
   4.1) pop one element from $\mathcal{S}$ as $S$
   4.2) evenly partition $S$ into two disjoint connected pieces $S_1, S_2$
   4.3) push $S_1, S_2$ into $\mathcal{S}$
   4.4) add $w = \sqrt{\frac{|S_1||S_2|}{|S_1|+|S_2|}} \left( \frac{1_{S_1}}{|S_1|} - \frac{1_{S_2}}{|S_2|} \right)$ to $W$
return $W$

C. Graph Wavelet Basis Properties

1) Efficiency: This coarse-to-fine approach involves $(N - 1)$ partitions.
The overall computational complexity is approximately $\sum_{\ell=1}^{1+\log_2 N} 2^\ell f(N/2^\ell)$, where $f(N)$ is the computational complexity of partitioning an $N$-node graph. For a sparse graph ($E = O(N)$), when we use a standard graph partitioning algorithm, METIS [46], to partition the graph, $f(N) = O(N)$ and the overall computational complexity is $O(N \log_2 N)$.

2) Graph multiresolution: When the number of nodes is $N = 2^L$ for some $L \in \mathbb{Z}^+$, the proposed graph wavelet basis in Algorithm 1 satisfies the axioms of the graph multiresolution analysis. When the nodes cannot be partitioned equally, the proposed graph wavelet basis may not exactly satisfy the generalized shift and scale invariance axioms due to the residual condition, but it still comes close to the spirit of multiresolution.

3) Orthogonality: Orthogonality also implies efficient perfect reconstruction.

Theorem 1. The proposed graph wavelet basis $W \in \mathbb{R}^{N \times N}$ in Algorithm 1 is orthonormal; that is, for any graph signal $x \in \mathbb{R}^N$, we have $x = W W^T x$.

The proof is given in Appendix B.

D. Graph Partition Algorithm

An ideal graph partitioning results in two connected subgraphs with the same number of nodes; however, connectivity and bisection may conflict in practice. Many existing graph partition algorithms can be used here. For example, METIS provides an efficient bisection, but does not ensure that the two resulting subgraphs are connected. In (1), we consider a connectivity-first approach, as the constraints (1b) require that the resulting subgraphs be connected. The objective function (1a) promotes a bisection; that is, the two subgraphs have similar numbers of nodes. The optimization problem (1) is combinatorial, and we aim to obtain a suboptimal solution with a certain theoretical guarantee.

To solve (1), we consider finding the two nodes with the longest geodesic distance as two hubs and then computing the geodesic distances from each node to the two hubs.
We rank all the nodes based on the difference between their geodesic distances to the two hubs and record the median value. We partition the nodes according to this median value; all the nodes attaining the median value form the boundary set. We further partition the boundary set to ensure connectivity and promote bisection. The details are summarized in Algorithm 2.

Algorithm 2: Graph Partition with Connectivity Guarantee
Input: original graph $G_0$
Output: two node sets $\mathcal{V}_1, \mathcal{V}_2$
1) compute the geodesic distance matrix $D \in \mathbb{R}^{|\mathcal{V}_0| \times |\mathcal{V}_0|}$;
2) select $v_i, v_j \in \mathcal{V}_0$ such that $D_{v_i, v_j}$ is maximized;
3) let $p$ be the median value of $D_{v_i,:} - D_{v_j,:}$;
4) let $S_1 = \{v \mid D_{v_i,v} - D_{v_j,v} > p\}$ and the boundary set $S_2 = \{v \mid D_{v_i,v} - D_{v_j,v} = p\}$;
5) partition $S_2$ into connected components $C_1, C_2, \ldots, C_M$ with $|C_1| < |C_2| < \ldots < |C_M|$;
6) set $q_m = |S_1 \cup C_1 \cup \ldots \cup C_m|$ for $m = 1, 2, \cdots, M$;
7) set $m^* = \arg\min_m \big| q_m - |\mathcal{V}_0|/2 \big|$;
8) $\mathcal{V}_1 = S_1 \cup C_1 \cup \ldots \cup C_{m^*}$ and $\mathcal{V}_2 = \mathcal{V}_0 \setminus \mathcal{V}_1$
return $\mathcal{V}_1, \mathcal{V}_2$

We can show that Algorithm 2 provides a near-optimal solution; see Appendix C for the proof.

Theorem 2. Let $\hat{\mathcal{V}}_1, \hat{\mathcal{V}}_2$ be the solution given by Algorithm 2. Then, $\hat{\mathcal{V}}_1, \hat{\mathcal{V}}_2$ is a feasible solution of the optimization problem (1) and
$$\big| |\hat{\mathcal{V}}_1| - |\hat{\mathcal{V}}_2| \big| \le 2 |C_{m^*}|,$$
where $C_{m^*}$ is the $m^*$th smallest connected component in the boundary set, following from Steps 5–7 in Algorithm 2.

V. GRAPH DICTIONARIES

We now use the graph dictionary induced by the graph multiresolution analysis from the previous section to represent piecewise-smooth graph signals. As before, we start with piecewise-constant graph signals and then generalize to piecewise-smooth ones.

A. Piecewise-Constant Graph Dictionary

Representing piecewise-constant graph signals is difficult because the geometry of the pieces is arbitrary.
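As a sketch (not the paper's implementation), Algorithm 1 can be coded for a path graph, where splitting a piece at its midpoint is a valid connected bisection and stands in for the partitioner of Algorithm 2:

```python
import numpy as np

def haar_basis_path(N):
    """Haar-like wavelet basis (Algorithm 1 sketch) on an N-node path graph."""
    basis = [np.ones(N) / np.sqrt(N)]        # root scaling vector 1_V / sqrt(|V|)
    stack = [list(range(N))]                 # stack of pieces to partition
    while stack:
        S = stack.pop()
        if len(S) == 1:
            continue                         # individual node: stop this branch
        # Midpoint split keeps both child pieces of a path connected.
        S1, S2 = S[:len(S) // 2], S[len(S) // 2:]
        c = np.sqrt(len(S1) * len(S2) / (len(S1) + len(S2)))
        w = np.zeros(N)
        w[S1], w[S2] = c / len(S1), -c / len(S2)   # highpass vector h(S1, S2)
        basis.append(w)
        stack += [S1, S2]
    return np.column_stack(basis)

W = haar_basis_path(8)
x = np.array([2., 2., 2., 2., 7., 7., 7., 7.])     # piecewise-constant, one jump
coeffs = W.T @ x
```

For this toy signal with a single boundary, only two of the eight wavelet coefficients are nonzero, in line with the sparsity behavior analyzed next; on a general graph the midpoint split would be replaced by a connectivity-aware partitioner such as Algorithm 2.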
We now show that the graph wavelet basis in Algorithm 1 can effectively parse the pieces and promote sparse representations for piecewise-constant graph signals.

Theorem 3. Let $W \in \mathbb{R}^{N \times N}$ be the graph wavelet basis in Algorithm 1. For a piecewise-constant graph signal $x \in \mathbb{R}^N$,
$$\| W^T x \|_0 \le 1 + \|\Delta x\|_0 L,$$
where $L$ is the decomposition depth.

The proof is given in Appendix D. Since we promote the bisection scheme, $L$ is roughly $1 + \log_2 N$. Theorem 3 shows an upper bound on the sparsity of the graph wavelet coefficients, which depends on the cut cost $\|\Delta x\|_0$ and the size of the graph. As shown in Table I, $\|\Delta x\|_0$ is usually small when $x$ is a piecewise-constant signal. In [47], we also show that this graph wavelet basis can be used to detect localized graph signals.

We can expand the graph wavelet basis from Algorithm 1 into a redundant graph dictionary, allowing for more flexibility. Each piece $v_k^{(\ell)}$ obtained from the graph partition yields a column vector (called an atom) $1_{v_k^{(\ell)}}$ in the graph dictionary; we collect all the pieces at all levels to obtain a dictionary. In other words, the piecewise-constant graph dictionary is
$$D_{\mathrm{PC}} = \{ 1_{v_k^{(\ell)}} \}_{\ell=1, k=1}^{\ell=L, k=2^\ell}. \quad (2)$$
There are $2N - 1$ pieces in total; thus, $D_{\mathrm{PC}} \in \mathbb{R}^{N \times (2N-1)}$, and the proposed graph dictionary $D_{\mathrm{PC}}$ contains a series of atoms of different sizes activating different positions. Each graph wavelet basis vector in Algorithm 1 can be represented as a linear combination of two atoms in the piecewise-constant graph dictionary. Since most atoms are sparse, the number of nonzero elements in the piecewise-constant dictionary is small, allowing for efficient storage. For example, when $N = 2^L$ for some $L \in \mathbb{Z}^+$, the number of nonzero elements is exactly $NL$.

Corollary 1. Let $D_{\mathrm{PC}} \in \mathbb{R}^{N \times (2N-1)}$ be the piecewise-constant graph dictionary.
Let the sparse coefficients of a piecewise-constant graph signal $x \in \mathrm{PC}(C)$ be
$$a^* = \arg\min_{a \in \mathbb{R}^{2N-1}} \|a\|_0, \quad \text{subject to } x = D_{\mathrm{PC}} a. \quad (3)$$
Then, we have $\|a^*\|_0 \le 1 + \|\Delta x\|_0 L$.

Corollary 1 directly follows from Theorem 3, as each graph wavelet basis vector can be linearly represented by the piecewise-constant graph dictionary. We expect that the upper bound in Corollary 1 is not tight. In practice, the corresponding sparsity is usually even smaller than the sparsity provided by the graph wavelet basis because of the redundancy and flexibility of the piecewise-constant graph dictionary.

B. Piecewise-Smooth Graph Dictionary

We now generalize the piecewise-constant graph dictionary to the piecewise-smooth graph dictionary. In the piecewise-constant graph dictionary, we use a single one-piece graph signal to activate a certain subgraph; in the piecewise-smooth graph dictionary, we can use multiple localized and bandlimited graph signals to activate the same subgraph. Since localized and bandlimited graph signals are smooth on the corresponding subgraphs, the piecewise-smooth graph dictionary provides more redundancy and flexibility to capture the localized events within a graph signal.

Let $v_k^\ell$ be the $k$th piece at the $\ell$th decomposition level, $G_{v_k^\ell}$ the corresponding subgraph and $V_{v_k^\ell} \in \mathbb{R}^{|v_k^\ell| \times |v_k^\ell|}$ the corresponding graph Fourier basis. The subdictionary corresponding to the $k$th piece at the $\ell$th decomposition level is
$$D_{v_k^\ell} = V_{v_k^\ell (K)} \in \mathbb{R}^{|v_k^\ell| \times \min(K, |v_k^\ell|)},$$
which consists of the first $\min(K, |v_k^\ell|)$ columns of $V_{v_k^\ell}$. We collect all subdictionaries across all levels to obtain the piecewise-smooth graph dictionary,
$$D_{\mathrm{PS}} = \{ D_{v_k^\ell} \}_{\ell=1, k=1}^{\ell=L, k=2^\ell}. \quad (4)$$
The total number of atoms of $D_{\mathrm{PS}}$ is upper bounded by $(2N-1)K$ for bandwidth $K$. The total number of nonzero elements of $D_{\mathrm{PS}}$ is at most $O(NK \log_2 N)$, still storage-friendly.
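A sketch of one subdictionary $D_{v}$ for a single piece (a hypothetical 5-node path with piece {0, 1, 2} and bandwidth K = 2): the piece's Fourier basis is the eigenbasis of its subgraph Laplacian, with the selected atoms embedded into $\mathbb{R}^N$ at the piece's nodes.

```python
import numpy as np

def subdictionary(A, piece, K):
    """First min(K, |piece|) Fourier atoms of the subgraph on `piece`, zero elsewhere."""
    A_S = A[np.ix_(piece, piece)]              # subgraph adjacency
    L_S = np.diag(A_S.sum(axis=1)) - A_S       # subgraph Laplacian
    _, V_S = np.linalg.eigh(L_S)               # eigenvectors, eigenvalues ascending
    k = min(K, len(piece))
    D = np.zeros((A.shape[0], k))
    D[piece, :] = V_S[:, :k]                   # embed atoms at the piece's nodes
    return D

A = np.zeros((5, 5))                           # hypothetical 5-node path graph
for j in range(4):
    A[j, j + 1] = A[j + 1, j] = 1.0
D_sub = subdictionary(A, [0, 1, 2], K=2)
```

The first atom is constant on the piece (the lowest subgraph Laplacian eigenvector, up to sign), the columns are orthonormal, and both atoms vanish outside the piece, matching Definition 3.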
We now show that $D_{\rm PS}$ promotes sparsity for piecewise-bandlimited graph signals.

Theorem 4. Let $D_{\rm PS}$ be the piecewise-smooth graph dictionary. Let the sparse coefficients of a piecewise-bandlimited graph signal $x \in {\rm PBL}(C, K)$ be
$$a^* = \arg\min_{a} \|a\|_0, \quad \text{subject to } \|x - D_{\rm PS}\, a\|_2^2 \le \epsilon_{\rm par} \|x\|_2^2,$$
where $\epsilon_{\rm par}$ is a constant determined by the graph-partitioning algorithm. Then, we have
$$\|a^*\|_0 \le 1 + 2K \|\Delta x_{\rm PC}\|_0 \, L,$$
where $L$ is the decomposition depth in the coarse-to-fine approach and $x_{\rm PC}$ is a piecewise-constant signal that shares the same pieces as $x$. The proof is given in Appendix E.

VI. EXPERIMENTAL RESULTS

A good representation can be used in compression, approximation, inpainting, denoising and localization. Here we evaluate our proposed graph dictionaries on two tasks: approximation and localization.

A. Experimental Setup

We consider six datasets, summarized in Table II.
• Sensors. This is a simulated geometric graph with 500 nodes and 2,050 edges. We simulate a piecewise-smooth graph signal following [22].
• Minnesota. This is the Minnesota road network with 2,642 intersections and 3,304 road segments. We model each intersection as a node and each road segment as an edge. We simulate a localized smooth graph signal following [22].
• Manhattan. This is the Manhattan street network with 13,679 intersections and 34,326 road segments. We model each intersection as a node and each road segment as an edge. We model the restaurant distribution and the taxi-pickup positions as signals supported on the Manhattan street network.
• Kaggle 1968. This is a social network of Facebook users with 277 nodes and 2,321 edges. It also contains 14 social circles, each of which can be modeled as a binary piecewise-constant signal supported on this social network.
• Citeseer. This is a co-authorship network with 2,120 nodes and 3,705 edges.
It also contains 7 research groups, each of which can be modeled as a binary piecewise-constant signal supported on this co-authorship network.
• Teapot. This is a dataset with 7,999 3D points representing the surface of a teapot. We construct a 10-nearest-neighbor graph to capture the geometry. The 3D coordinates can be modeled as three piecewise-smooth signals supported on this generalized mesh.

Dataset       Type          # nodes   # edges   Signals
Sensors       Simulation    500       2,050     Simulation
Minnesota     Traffic net   2,642     3,304     Simulation
Manhattan     Traffic net   13,679    34,326    Taxi
Kaggle 1968   Social net    277       2,321     Circle
Citeseer      Citation net  2,120     3,705     Attribute
Teapot        Mesh          7,999     198,035   Coordinate

TABLE II: Dataset description.

We consider the following ten competing representation methods:
• PC (dashed dark red line). This is our piecewise-constant graph dictionary (2).
• PS (solid red line). This is our piecewise-smooth graph dictionary (4). The bandwidth in each piece is 10.
• Delta (solid dark yellow line). This is the basis of Kronecker deltas.
• GFT (dashed yellow line) [3]. This is the graph Fourier basis.
• SGWT (solid blue line) [30]. This is the spectral graph wavelet transform with five wavelet scales plus the scaling functions, for a total redundancy of 6.
• Pyramid (dashed light blue line) [22]. This is the multiscale pyramid transform.
• CKWT (solid grey line) [25]. These are spatial graph wavelets with wavelet functions based on the renormalized one-sided Mexican hat wavelet, also with five wavelet scales, concatenated with the dictionary of Kronecker deltas.
• DiffusionW (dashed purple line) [34]. These are the diffusion wavelets.
• QMF (solid pink line) [17]. This is the graph-QMF filter bank transform.
• CoSubFB (solid green line) [23]. This is the subgraph-based filter bank.

B. Approximation

Approximation is a standard task used to evaluate the quality of a representation.
The goal here is to use a few expansion coefficients to approximate a graph signal. We consider two approximation strategies: nonlinear approximation and orthogonal matching pursuit. Given a budget of $K$ expansion coefficients, nonlinear approximation chooses the $K$ largest-magnitude coefficients to minimize the approximation error, while orthogonal matching pursuit greedily and sequentially selects $K$ expansion coefficients to minimize the residual error. For each representation method, we use both approximation strategies and report the results of the better one. The evaluation metric is the normalized mean square error, defined as
$${\rm Error} = \frac{\| \widehat{x} - x \|_2^2}{\| x \|_2^2}. \quad (5)$$

Fig. 4: Piecewise-smooth graph dictionary (in red) outperforms the other competitive methods on five datasets: (a) Sensors, (b) Minnesota, (c) Kaggle 1968, (d) Citeseer, (e) Teapot. The x-axis is the number of coefficients used in the approximation and the y-axis is the approximation error (5), where lower means better.

Figure 4 compares the approximation performance on five datasets. The five columns in Figure 4 show Sensors, Minnesota, Kaggle 1968, Citeseer and Teapot, respectively.
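The two approximation strategies can be sketched generically, assuming the dictionary is stored as a matrix $D$ (this is our own illustrative code, not the paper's implementation). `nonlinear_approx` assumes an orthonormal basis, in which case keeping the largest-magnitude analysis coefficients minimizes the error (5); `omp_approx` also works for redundant dictionaries such as $D_{\rm PC}$ and $D_{\rm PS}$.

```python
import numpy as np

def nonlinear_approx(D, x, K):
    """Keep the K largest-magnitude expansion coefficients.
    Optimal for Eq. (5) when D is an orthonormal basis."""
    a = D.T @ x                            # analysis coefficients
    keep = np.argsort(np.abs(a))[-K:]
    a_K = np.zeros_like(a)
    a_K[keep] = a[keep]
    return D @ a_K

def omp_approx(D, x, K):
    """Greedy orthogonal matching pursuit: pick the atom most
    correlated with the residual, then least-squares refit."""
    residual, support = x.copy(), []
    for _ in range(K):
        j = int(np.argmax(np.abs(D.T @ residual)))
        support.append(j)
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coef
    return x - residual

def nmse(x_hat, x):
    """Normalized mean square error, Eq. (5)."""
    return np.linalg.norm(x_hat - x) ** 2 / np.linalg.norm(x) ** 2
```

For a signal that is exactly $K$-sparse in an orthonormal $D$, both strategies recover it with essentially zero error; for redundant dictionaries, only the greedy strategy applies directly.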
Each plot in the first row shows the visualization of the graph signal; each plot in the second row shows the approximation error on a logarithmic scale, where the x-axis is the number of expansion coefficients and the y-axis is the normalized mean square error. Overall, the proposed piecewise-smooth graph dictionary outperforms its competitors across various types of graphs and graph signals.
• Sensors. The graph signal is piecewise-smooth. The top three methods are the piecewise-smooth graph dictionary, the piecewise-constant graph dictionary and the diffusion wavelets; on the other end of the spectrum, the Kronecker deltas, which fit one signal coefficient at a time, fail.
• Minnesota. The graph signal is localized smooth. The top three methods are the piecewise-smooth graph dictionary, the diffusion wavelets and the spectral graph wavelet transform; on the other end of the spectrum, the spatial graph wavelets fail.
• Kaggle 1968. The graph signal is binary and piecewise-constant with a few pieces. The top three methods are the piecewise-smooth graph dictionary, the piecewise-constant graph dictionary and the spectral graph wavelet transform; on the other end of the spectrum, the multiscale pyramid transform fails.
• Citeseer. The graph signal is binary and piecewise-constant with a large number of pieces. None of the methods performs well, due to the noisy input signal. The top three methods are the piecewise-smooth graph dictionary, the subgraph-based filter bank and the graph-QMF filter bank transform; on the other end of the spectrum, the multiscale pyramid transform fails.
• Teapot. The graph signal is smooth. The top three methods are the piecewise-smooth graph dictionary, the subgraph-based filter bank and the graph Fourier basis; on the other end of the spectrum, the Kronecker deltas and the spatial graph wavelets fail.
To gain an illustrative understanding, we visualize the reconstructions in Figure 5, where each plot shows the reconstruction obtained with 100 expansion coefficients.

Fig. 5: Reconstruction visualization for Teapot: (a) Original, (b) PS, (c) PC, (d) Delta, (e) GFT, (f) SGWT, (g) Pyramid, (h) CKWT, (i) Diffusion wavelets, (j) CSFB.

Additionally, Figure 6 compares the approximations of urban data supported on the Manhattan street network. The two rows show the reconstructions of the taxi-pickup distribution and the restaurant distribution, respectively, each using 100 expansion coefficients. We see that these graph signals are nonsmooth and inhomogeneous. For each graph signal, the piecewise-smooth graph dictionary provides the largest signal-to-noise ratio (SNR) and the smallest normalized mean square error. The spectral graph wavelet transform is also competitive; the subgraph-based filter bank tends to oversmooth and the spatial graph wavelets tend to be less smooth.

C. Localization

One functionality of a graph dictionary is to detect localized graph signals [47]; applications include localizing virus attacks in cyber-physical systems, localizing stimuli in brain connectivity networks and mining traffic events in city street networks. Here we consider simulations on the Minnesota road network. We generate one-piece graph signals with Gaussian noise. Given the noisy graph signals, we use a graph dictionary to remove the noise and reconstruct a denoised graph signal that localizes the underlying activated pieces. We average over 20 random trials.

Figure 7 shows the localization performance, where the x-axis is the noise level and the y-axis is either SNR or correlation; in both cases, a higher value means better. The baseline (dark curve) naively uses the noisy graph signal as the reconstruction. We see that the piecewise-smooth graph dictionary outperforms the others in terms of both metrics,
especially when the noise level is low; when the noise level is high, the piecewise-constant graph dictionary, the piecewise-smooth graph dictionary and the multiscale pyramid transform perform similarly.

Fig. 6: Reconstruction visualization for urban data: (a) Taxi-pickup distribution, (b) PS, (c) CSFB, (d) SGWT, (e) CKWT; (f) Restaurant distribution, (g) PS, (h) CSFB, (i) SGWT, (j) CKWT.

Figure 8 compares reconstructions. Figure 8(a) shows the original one-piece graph signal and (b) shows the noisy graph signal, while (c), (d) and (e) show the graph signals denoised by the piecewise-smooth graph dictionary, the subgraph-based filter bank and the spectral graph wavelet transform, respectively. We see that the piecewise-smooth graph dictionary localizes the underlying piece well, the spectral graph wavelet transform does a reasonable job, but the subgraph-based filter bank provides an oversmooth reconstruction and fails.

VII. CONCLUSIONS AND FUTURE WORK

In this paper, we model complex and irregular data, such as urban data supported on city street networks and profile information supported on social networks, as piecewise-smooth graph signals. We propose a well-structured and storage-friendly graph dictionary to represent those graph signals. To ensure a good representation, we consider the graph multiresolution analysis. To implement it, we propose the coarse-to-fine approach, which iteratively partitions a graph into two subgraphs until we reach individual nodes. This approach efficiently implements the graph multiresolution analysis, and the induced graph dictionary promotes sparse representations for piecewise-smooth graph signals. Finally, we test the proposed graph dictionary on the tasks of approximation and localization.
The empirical results validate that the proposed graph dictionary outperforms eight other representation methods on various datasets. Future work includes developing sampling, recovery, denoising and detection strategies based on the proposed piecewise-smooth graph signal model.

Fig. 7: Localization performance as a function of noise level. The piecewise-smooth graph dictionary (in red) outperforms the other competitive methods. The x-axis is the noise level and the y-axis is the signal-to-noise ratio (SNR) or the correlation, where higher means better.

Fig. 8: Localization visualization: (a) Original, (b) Noisy, (c) PS, (d) CSFB, (e) SGWT.

REFERENCES

[1] M. Jackson, Social and Economic Networks, Princeton University Press, 2008.
[2] M. Newman, Networks: An Introduction, Oxford University Press, 2010.
[3] D. I. Shuman, S. K. Narang, P. Frossard, A. Ortega, and P. Vandergheynst, "The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains," IEEE Signal Process. Mag., vol. 30, pp. 83–98, May 2013.
[4] A. Sandryhaila and J. M. F. Moura, "Discrete signal processing on graphs," IEEE Trans. Signal Process., vol. 61, no. 7, pp. 1644–1656, Apr. 2013.
[5] P. Prandoni and M. Vetterli, "Approximation and compression of piecewise smooth functions," Phil. Trans. R. Soc. Lond. A, vol. 357, no. 1760, pp. 2573–2591, 1999.
[6] M. B. Wakin, J. K. Romberg, H. Choi, and R. G. Baraniuk, "Wavelet-domain approximation and compression of piecewise smooth images," IEEE Trans. Image Process., vol. 15, no. 5, pp. 1071–1087, 2006.
[7] V. Chandrasekaran, M. B. Wakin, D. Baron, and R. G. Baraniuk, "Representation and compression of multidimensional piecewise functions using surflets," IEEE Trans. Inf. Theory, vol. 55, no. 1, pp. 374–400, 2009.
[8] M. Unser, "Splines: A perfect fit for signal and image processing," IEEE Signal Process. Mag., vol. 16, no. 6, pp. 22–38, Nov. 1999.
[9] Y.-X. Wang, J. Sharpnack, A. Smola, and R. J. Tibshirani, "Trend filtering on graphs," in AISTATS, San Diego, CA, May 2015.
[10] M. Vetterli, J. Kovačević, and V. K. Goyal, Foundations of Signal Processing, Cambridge University Press, Cambridge, 2014, http://www.fourierandwavelets.org/.
[11] S. Mallat, A Wavelet Tour of Signal Processing, Academic Press, New York, NY, third edition, 2009.
[12] M. Vetterli and J. Kovačević, Wavelets and Subband Coding, Prentice Hall, Englewood Cliffs, NJ, 1995, http://waveletsandsubbandcoding.org/.
[13] M. Belkin and P. Niyogi, "Laplacian eigenmaps for dimensionality reduction and data representation," Neur. Comput., vol. 13, pp. 1373–1396, 2003.
[14] A. Sandryhaila and J. M. F. Moura, "Big data processing with signal processing on graphs," IEEE Signal Process. Mag., vol. 31, no. 5, pp. 80–90, 2014.
[15] A. Anis, A. Gadde, and A. Ortega, "Towards a sampling theorem for signals on arbitrary graphs," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., Florence, May 2014, pp. 3864–3868.
[16] S. Chen, R. Varma, A. Sandryhaila, and J. Kovačević, "Discrete signal processing on graphs: Sampling theory," IEEE Trans. Signal Process., vol. 63, no. 24, pp. 6510–6523, Dec. 2015.
[17] S. K. Narang and A. Ortega, "Perfect reconstruction two-channel wavelet filter banks for graph structured data," IEEE Trans. Signal Process., vol. 60, pp. 2786–2799, June 2012.
[18] S. K. Narang and A. Ortega, "Compact support biorthogonal wavelet filterbanks for arbitrary undirected graphs," IEEE Trans. Signal Process., vol. 61, no. 19, pp. 4673–4685, Oct. 2013.
[19] V. N. Ekambaram, G. C. Fanti, B. Ayazifar, and K. Ramchandran, "Critically-sampled perfect-reconstruction spline-wavelet filterbanks for graph signals," in GlobalSIP, Austin, TX, Dec. 2013, pp. 475–478.
[20] M. S. Kotzagiannidis and P. L. Dragotti, "The graph FRI framework: spline wavelet theory and sampling on circulant graphs," in ICASSP, Shanghai, China, Mar. 2016, pp. 6375–6379.
[21] Y. Tanaka and A. Sakiyama, "M-channel oversampled graph filter banks," IEEE Trans. Signal Process., vol. 62, no. 14, pp. 3578–3590, 2014.
[22] D. I. Shuman, M. J. Faraji, and P. Vandergheynst, "A multiscale pyramid transform for graph signals," IEEE Trans. Signal Process., vol. 64, no. 8, pp. 2119–2134, Apr. 2016.
[23] N. Tremblay and P. Borgnat, "Subgraph-based filterbanks for graph signals," IEEE Trans. Signal Process., vol. 64, no. 15, pp. 3827–3840, Aug. 2016.
[24] Y. Jin and D. I. Shuman, "An M-channel critically sampled filter bank for graph signals," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., Mar. 2017, pp. 3909–3913.
[25] M. Crovella and E. Kolaczyk, "Graph wavelets for spatial traffic analysis," in Proc. IEEE INFOCOM, Mar. 2003, vol. 3, pp. 1848–1857.
[26] A. D. Szlam, M. Maggioni, R. R. Coifman, and J. C. Bremer Jr., "Diffusion-driven multiscale analysis on manifolds and graphs: top-down and bottom-up constructions," in Proceedings of the SPIE, Wavelets XI, Aug. 2005, vol. 5914, pp. 445–455.
[27] M. Gavish, B. Nadler, and R. R. Coifman, "Multiscale wavelets on trees, graphs and high dimensional data: Theory and applications to semi supervised learning," in Proc. Int. Conf. Mach. Learn., Haifa, Israel, June 2010, pp. 367–374.
[28] R. M. Rustamov, "Average interpolating wavelets on point clouds and graphs," CoRR, vol. abs/1110.2227, 2011.
[29] J. Irion and N. Saito, "Hierarchical graph Laplacian eigen transforms," JSIAM Letters, vol. 6, pp. 21–24, Jan. 2014.
[30] D. K. Hammond, P. Vandergheynst, and R. Gribonval, "Wavelets on graphs via spectral graph theory," Appl. Comput. Harmon. Anal., vol. 30, pp. 129–150, Mar. 2011.
[31] N. Leonardi and D. Van De Ville, "Tight wavelet frames on multislice graphs," IEEE Trans. Signal Process., vol. 61, no. 13, pp. 3357–3367, 2013.
[32] D. I. Shuman, C. Wiesmeyr, N. Holighaus, and P. Vandergheynst, "Spectrum-adapted tight graph wavelet and vertex-frequency frames," IEEE Trans. Signal Process., vol. 63, no. 16, pp. 4223–4235, Aug. 2015.
[33] D. I. Shuman, B. Ricaud, and P. Vandergheynst, "Vertex-frequency analysis on graphs," Appl. Comput. Harmon. Anal., vol. 40, no. 2, pp. 260–291, Mar. 2016.
[34] R. R. Coifman and M. Maggioni, "Diffusion wavelets," Appl. Comput. Harmon. Anal., pp. 53–94, July 2006.
[35] J. Bremer, R. Coifman, M. Maggioni, and A. R. Szlam, "Diffusion wavelet packets," Appl. Comput. Harmon. Anal., vol. 21, pp. 95–112, July 2006.
[36] X. Zhang, X. Dong, and P. Frossard, "Learning of structured graph dictionaries," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., Kyoto, Japan, 2012, pp. 3373–3376.
[37] D. Thanou, D. I. Shuman, and P. Frossard, "Learning parametric dictionaries for signals on graphs," IEEE Trans. Signal Process., vol. 62, pp. 3849–3862, June 2014.
[38] X. Zhu and M. Rabbat, "Approximating signals supported on graphs," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., Kyoto, Japan, Mar. 2012, pp. 3921–3924.
[39] B. Ricaud, D. I. Shuman, and P. Vandergheynst, "On the sparsity of wavelet coefficients for signals on graphs," in Conference on Wavelets and Sparsity XV, 2013, vol. 8858.
[40] W. K. Allard, G. Chen, and M. Maggioni, "Multiscale geometric methods for data sets I: Multiscale SVD, noise and curvature," Appl. Comput. Harmon. Anal., no. 3, pp. 504–567, 2017.
[41] W. K. Allard, G. Chen, and M. Maggioni, "Multi-scale geometric methods for data sets II: Geometric multi-resolution analysis," Appl. Comput. Harmon. Anal., no. 3, pp. 435–462, 2012.
[42] W. Liao and M. Maggioni, "Adaptive geometric multiscale approximations for intrinsically low-dimensional data," arXiv preprint arXiv:1611.01179, 2016.
[43] U. V. Luxburg, "A tutorial on spectral clustering," Statistics and Computing, vol. 17, pp. 395–416, 2007.
[44] X. Wang, P. Liu, and Y. Gu, "Local-set-based graph signal reconstruction," IEEE Trans. Signal Process., vol. 63, no. 9, May 2015.
[45] M. Tsitsvero, S. Barbarossa, and P. D. Lorenzo, "Signals on graphs: Uncertainty principle and sampling," 2015.
[46] G. Karypis and V. Kumar, "A fast and high quality multilevel scheme for partitioning irregular graphs," SIAM J. Scientific Computing, vol. 20, no. 1, pp. 359–392, 1998.
[47] S. Chen, Y. Yang, S. Zong, A. Singh, and J. Kovačević, "Detecting localized categorical attributes on graphs," IEEE Trans. Signal Process., vol. 65, pp. 2725–2740, May 2017.

APPENDIX

A. Iterated Graph Filter Bank

In this section, we generalize classical filter banks to the graph domain and point out why graph filter banks are hard to implement. Suppose we have an ordering of the nodes $\{v_1, v_2, \ldots, v_N\}$ such that two consecutive nodes $v_{2k-1}, v_{2k}$ are connected for $k = 1, 2, \ldots, K$, where $K \le \lfloor N/2 \rfloor$. We group all pairs $v_{2k-1}, v_{2k}$ to form a series of connected and nonoverlapping subgraphs.
The basis vectors for the $k$th subgraph are
$$v_k^{(1)} = \frac{1}{\sqrt{2}} \left( 1_{v_{2k-1}} + 1_{v_{2k}} \right) \in \mathbb{R}^N, \quad (6)$$
$$u_k^{(1)} = \frac{1}{\sqrt{2}} \left( 1_{v_{2k-1}} - 1_{v_{2k}} \right) \in \mathbb{R}^N, \quad (7)$$
where the subscript $k$ is the index of the subgraph and the superscript $1$ indicates the root layer; the low-pass basis vector $v_k^{(1)}$ takes the average of the two nodes within the subgraph and the high-pass basis vector $u_k^{(1)}$ takes the difference between the two nodes within the subgraph. We collect all the low-pass basis vectors and all the high-pass basis vectors to form a low-pass subspace and a high-pass subspace, respectively,
$$V^{(1)} = {\rm span}\left( \{v_k^{(1)}\}_{k=1}^{K} \right) \quad \text{and} \quad U^{(1)} = {\rm span}\left( \{u_k^{(1)}\}_{k=1}^{K} \right).$$
Different from the discrete-time scenario, $V^{(1)} \oplus U^{(1)}$ may not span the entire space $\mathbb{R}^N$, as a few nodes may be left isolated by the ordering. Let the residual subspace be
$$R^{(1)} = {\rm span}\left( \{1_{v_k}\}_{k=2K+1}^{N} \right),$$
where each basis vector activates only an individual node. Now $V^{(1)} \oplus U^{(1)} \oplus R^{(1)} = \mathbb{R}^N$. For any graph signal $x \in \mathbb{R}^N$, the reconstruction is
$$x = \underbrace{\sum_{k=1}^{K} \langle x, v_k^{(1)} \rangle \, v_k^{(1)}}_{x_{V^{(1)}}} + \underbrace{\sum_{k=1}^{K} \langle x, u_k^{(1)} \rangle \, u_k^{(1)}}_{x_{U^{(1)}}} + \underbrace{\sum_{k=2K+1}^{N} \langle x, 1_{v_k} \rangle \, 1_{v_k}}_{x_{R^{(1)}}},$$
where $x_{V^{(1)}} \in V^{(1)}$ is the low-pass projection, $x_{U^{(1)}} \in U^{(1)}$ is the high-pass projection and $x_{R^{(1)}} \in R^{(1)}$ handles the residual.

Fig. 9: As a fine-to-coarse approach, the analysis part of an iterated graph filter bank implements the graph multiresolution analysis (Definition 5); there is no residual in this case.

To summarize, based on a well-designed ordering, we partition the entire graph into a series of nonoverlapping subgraphs and then design Haar-like basis vectors on graphs. For discrete-time signals, whose underlying graph is a directed line graph, the ordering is provided by time and each subgraph contains two consecutive time stamps. As described in Section ??, because of the nice ordering by time, all the basis vectors can be efficiently obtained by filtering followed by downsampling; however, this is not true for arbitrary graphs.

Following classical discrete-time signal processing, we can iteratively decompose the low-pass subspace and obtain smoother and smoother subspaces, which is equivalent to coarsening in the graph vertex domain. This iterated graph filter bank divides the vertex-spectrum plane into more tiles, approaching the limit of the uncertainty barrier. Here we show the second layer as an example. Let a supernode (a connected node set) be $v_k^{(2)} = v_{2k-1} \cup v_{2k}$ for $k = 1, 2, \ldots, K$, where the superscript of the supernode indicates the second layer. Two supernodes $v_i^{(2)}, v_j^{(2)}$ are connected when there exists a pair of nodes $p \in v_i^{(2)}, q \in v_j^{(2)}$ such that $p, q$ are connected. Similarly to the paradigm in Section A, suppose we have an ordering of the $K$ supernodes $\{v_1^{(2)}, v_2^{(2)}, \ldots, v_K^{(2)}\}$ such that two consecutive supernodes $v_{2k-1}^{(2)}, v_{2k}^{(2)}$ are connected for $k = 1, \ldots, K^{(2)}$, where $K^{(2)} \le \lfloor K/2 \rfloor$. We group all pairs $v_{2k-1}^{(2)}, v_{2k}^{(2)}$ to form a series of connected, yet nonoverlapping, subgraphs.

Let $S_1, S_2 \subset \mathcal{V}$ be two nonoverlapping supernodes. We define the low-pass and high-pass Haar template basis vectors as, respectively,
$$g(S_1, S_2) = \sqrt{\frac{|S_1||S_2|}{|S_1| + |S_2|}} \left( \frac{1_{S_1}}{|S_1|} + \frac{1_{S_2}}{|S_2|} \right) \in \mathbb{R}^N,$$
$$h(S_1, S_2) = \sqrt{\frac{|S_1||S_2|}{|S_1| + |S_2|}} \left( \frac{1_{S_1}}{|S_1|} - \frac{1_{S_2}}{|S_2|} \right) \in \mathbb{R}^N.$$
Following the template, the basis vectors of the $k$th subgraph in the second layer are
$$v_k^{(2)} = g\left( v_{2k-1}^{(2)}, v_{2k}^{(2)} \right), \quad u_k^{(2)} = h\left( v_{2k-1}^{(2)}, v_{2k}^{(2)} \right).$$
We collect all the low-pass and high-pass basis vectors in the second layer to form a low-pass subspace and a high-pass subspace, respectively,
$$V^{(2)} = {\rm span}\left( \{v_k^{(2)}\}_{k=1}^{K^{(2)}} \right) \quad \text{and} \quad U^{(2)} = {\rm span}\left( \{u_k^{(2)}\}_{k=1}^{K^{(2)}} \right).$$
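To make the Haar template concrete, here is a minimal numerical sketch (illustrative only; the helper name is ours, not from the paper). It builds $g(S_1, S_2)$ and $h(S_1, S_2)$ for disjoint node sets of unequal size, which lets one check the unit-norm and orthogonality-to-the-constant properties used in the proof of Theorem 1.

```python
import numpy as np

def haar_templates(S1, S2, N):
    """Low-pass g and high-pass h Haar template vectors for two
    disjoint node sets S1, S2 inside a graph with N nodes."""
    ind1, ind2 = np.zeros(N), np.zeros(N)
    ind1[list(S1)], ind2[list(S2)] = 1.0, 1.0
    scale = np.sqrt(len(S1) * len(S2) / (len(S1) + len(S2)))
    g = scale * (ind1 / len(S1) + ind2 / len(S2))
    h = scale * (ind1 / len(S1) - ind2 / len(S2))
    return g, h

# Unequal set sizes: both template vectors still have unit norm, and
# the high-pass vector h sums to zero, so it is orthogonal to the
# all-ones vector (and hence to every coarser constant atom).
g, h = haar_templates({0, 1, 2}, {3}, N=6)
```

Note that $g$ and $h$ are mutually orthogonal only when $|S_1| = |S_2|$; in the filter bank only the $h$-type vectors (plus one constant vector) enter the orthonormal basis, so this is not a problem.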
Let the residual subspace be
$$R^{(2)} = {\rm span}\left( \{1_{v_n^{(2)}}\}_{n=2K^{(2)}+1}^{K} \right),$$
where each basis vector activates only an individual supernode. Now $V^{(2)} \oplus U^{(2)} \oplus R^{(2)} = V^{(1)}$. For any graph signal $x \in \mathbb{R}^N$, the reconstruction is
$$x = \underbrace{\sum_{k=1}^{K^{(2)}} \langle x, v_k^{(2)} \rangle \, v_k^{(2)}}_{\in V^{(2)}} + \underbrace{\sum_{k=1}^{K^{(2)}} \langle x, u_k^{(2)} \rangle \, u_k^{(2)}}_{\in U^{(2)}} + \underbrace{\sum_{k=1}^{K} \langle x, u_k^{(1)} \rangle \, u_k^{(1)}}_{\in U^{(1)}} + \underbrace{\sum_{k=2K+1}^{N} \langle x, 1_{v_k} \rangle \, 1_{v_k}}_{\in R^{(1)}} + \underbrace{\sum_{k=2K^{(2)}+1}^{K} \langle x, 1_{v_k^{(2)}} \rangle \, 1_{v_k^{(2)}}}_{\in R^{(2)}}.$$
We can keep decomposing the low-pass subspace until only one constant basis vector remains. During the iterated decomposition, we keep coarsening in the graph vertex domain, leading to larger supernodes and more global basis vectors; we thus call this a fine-to-coarse approach; see Figure 9. Let the decomposition depth be $L$. By induction, the general reconstruction is
$$x = \underbrace{\sum_{k=1}^{K^{(L)}} \langle x, v_k^{(L)} \rangle \, v_k^{(L)}}_{\in V^{(L)}} + \sum_{\ell=1}^{L} \underbrace{\sum_{k=1}^{K^{(\ell)}} \langle x, u_k^{(\ell)} \rangle \, u_k^{(\ell)}}_{\in U^{(\ell)}} + \sum_{\ell=1}^{L} \underbrace{\sum_{k=2K^{(\ell)}+1}^{K^{(\ell-1)}} \langle x, 1_{v_k^{(\ell)}} \rangle \, 1_{v_k^{(\ell)}}}_{\in R^{(\ell)}},$$
where $v_k^{(1)} = v_k$, $K^{(1)} = K$ and $K^{(0)} = N$.

Note that for discrete-time signals, the ordering of time stamps is naturally provided by time, leading to straightforward downsampling and shifting, and iterated filter banks, as a fine-to-coarse approach, are efficient architectures for implementing the multiresolution analysis. For graph signals, the ordering at each multiresolution level is unknown, and an efficient fine-to-coarse implementation of the graph multiresolution analysis is no longer straightforward. This is why we consider the coarse-to-fine approach in this paper; in other words, we convert the problem of node ordering into the problem of graph partitioning, which is more efficient and straightforward.

B. Proof of Theorem 1

Proof. First, we show that each vector has norm one:
$$\left\| \sqrt{\frac{|S_1||S_2|}{|S_1| + |S_2|}} \left( \frac{1_{S_1}}{|S_1|} - \frac{1_{S_2}}{|S_2|} \right) \right\|_2^2 \stackrel{(a)}{=} \left\| \sqrt{\frac{|S_1||S_2|}{|S_1| + |S_2|}} \, \frac{1_{S_1}}{|S_1|} \right\|_2^2 + \left\| \sqrt{\frac{|S_1||S_2|}{|S_1| + |S_2|}} \, \frac{1_{S_2}}{|S_2|} \right\|_2^2 = 1,$$
where $(a)$ follows from $S_1 \cap S_2 = \emptyset$.

Second, we show that each vector is orthogonal to the other vectors. We have
$$1^T w = \sqrt{\frac{|S_1||S_2|}{|S_1| + |S_2|}} \left( \sum_{i \in S_1} \frac{1}{|S_1|} - \sum_{i \in S_2} \frac{1}{|S_2|} \right) = 0.$$
Thus, each vector is orthogonal to the first vector, $1_{\mathcal{V}} / \sqrt{|\mathcal{V}|}$. Every other individual vector is generated from two node sets. Let $S_1, S_2$ generate $w_i$ and $S_3, S_4$ generate $w_j$. Due to the construction, only two cases arise: the two node sets of one vector both belong to one node set of the other vector, or all four node sets are pairwise disjoint. For the first case, without loss of generality, let $(S_3 \cup S_4) \cap S_1 = S_3 \cup S_4$; since $w_i$ is constant on $S_1$,
$$w_i^T w_j = \sqrt{\frac{|S_1||S_2|}{|S_1| + |S_2|}} \, \frac{1}{|S_1|} \, \sqrt{\frac{|S_3||S_4|}{|S_3| + |S_4|}} \left( \sum_{i \in S_3} \frac{1}{|S_3|} - \sum_{i \in S_4} \frac{1}{|S_4|} \right) = 0.$$
For the second case, the inner product between $w_i$ and $w_j$ is zero because their supports do not overlap.

Third, we show that $W$ spans $\mathbb{R}^N$. Since we recursively partition the node set until the cardinalities of all node sets are smaller than 2, there are $N$ vectors in $W$.

C. Proof of Theorem 2

Proof. We first show that $\mathcal{V}_1, \mathcal{V}_2$ are connected and then bound the cardinality difference. Since the original graph is connected, $D_{v_i, v_j}$ is finite, where $v_i, v_j$ are the two hubs. In Step 4, we partition the nodes according to their distances to the two hubs. Every node in the node set $S_1$ is connected to $v_j$; thus, the subgraph induced by the node set $S_1$ is connected. In Step 5, we partition the boundary set $S_2$ into connected node sets $C_1, C_2, \cdots, C_M$, each of which connects to $S_1$; otherwise, the maximum element in the geodesic distance matrix $D$ would be infinite.
We thus have that $S_1 \cup C_1 \cup C_2 \cdots \cup C_m$ is connected for all $m = 1, \cdots, M$. When we set $m = m^*$ as obtained in Step 7, $\mathcal{V}_1 = S_1 \cup C_1 \cup C_2 \cdots \cup C_{m^*}$ is connected. Similarly, we can show that $\mathcal{V}_2$ is also connected.

In Step 3, we set $p$ as the median value of the differences of the distances to the two hubs, which sets $|S_1|$ to around $|\mathcal{V}_0|/2$. In Step 6, we sequentially add connected components to $S_1$ and finally choose the one whose cardinality is closest to $|\mathcal{V}_0|/2$. The last component added to $S_1$ is $C_{m^*}$, which ensures that $\big| |\mathcal{V}_1| - |\mathcal{V}_0|/2 \big| \le |C_{m^*}|$ and $\big| |\mathcal{V}_2| - |\mathcal{V}_0|/2 \big| \le |C_{m^*}|$.

D. Proof of Theorem 3

Proof. When an edge $e \in {\rm supp}(\Delta w)$, where $w$ is a basis vector in the graph wavelet basis $W$, $\Delta$ is the graph incidence matrix, and ${\rm supp}$ denotes the edge indices activated by the nonzero elements of $\Delta w$, we say that the edge $e$ is activated by the wavelet basis vector $w$. Since the pieces in each level are disjoint, each edge is activated at most once per level; in total, each edge is activated by at most $L$ wavelet basis vectors, where $L$ is the decomposition depth. Let ${\rm activations}(e)$ be the number of wavelet basis vectors in $W$ that activate $e$. Then
$$\left\| W^T x \right\|_0 \le 1 + \sum_{e \in {\rm supp}(\Delta x)} {\rm activations}(e) \le 1 + \|\Delta x\|_0 \, L,$$
where the $1$ comes from the activation of the first column vector, which is constant. Since we promote the bisection scheme, the decomposition depth $L$ is roughly $1 + \log_2 N$.

E. Proof of Theorem 4

Proof. The main idea is to approximate a bandlimited signal on the original graph by using bandlimited signals on the subgraphs. Based on the eigenvectors of the graph Laplacian matrix, we define the bandlimited space, where each signal can be represented as $x = V(K) a$, with $V(K)$ the submatrix containing the first $K$ columns of $V$. We can show that this bandlimited space is a subspace of the small-variation space $\{x : x^T L x \le \lambda_K x^T x\}$.
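The counting argument in the proof of Theorem 3 can be checked numerically. The sketch below is a toy stand-in for Algorithm 1 (the function name is ours): it builds a Haar wavelet basis by dyadic bisection of an 8-node path graph, applies it to a piecewise-constant signal with a single cut edge ($\|\Delta x\|_0 = 1$, $L = 1 + \log_2 8 = 4$), and confirms that the number of nonzero coefficients respects the bound $1 + \|\Delta x\|_0 L = 5$.

```python
import numpy as np

def haar_wavelet_basis(nodes, N):
    """Recursively bisect a node list and emit the unit-norm Haar
    vectors h(S1, S2); together with the constant vector 1/sqrt(N)
    they form an orthonormal basis W."""
    if len(nodes) < 2:
        return []
    mid = len(nodes) // 2
    S1, S2 = nodes[:mid], nodes[mid:]
    w = np.zeros(N)
    scale = np.sqrt(len(S1) * len(S2) / (len(S1) + len(S2)))
    w[S1], w[S2] = scale / len(S1), -scale / len(S2)
    return [w] + haar_wavelet_basis(S1, N) + haar_wavelet_basis(S2, N)

N = 8
W = np.column_stack([np.ones(N) / np.sqrt(N)]
                    + haar_wavelet_basis(list(range(N)), N))

# Piecewise-constant signal on a path with one cut edge (between
# nodes 3 and 4): Theorem 3 predicts at most 1 + 1 * 4 = 5 nonzero
# wavelet coefficients.
x = np.array([1.0] * 4 + [0.0] * 4)
nnz = int(np.sum(np.abs(W.T @ x) > 1e-10))
```

In this example only two coefficients are nonzero (the constant vector and the top-level difference vector), well below the worst-case bound, which matches the remark after Corollary 1 that the bound is not tight.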
$$x^T L x = \sum_{(i,j) \in \mathcal{E}} W_{i,j} (x_i - x_j)^2 = \sum_{S_c} \sum_{(i,j) \in \mathcal{E}_{S_c}} W_{i,j} (x_i - x_j)^2 + \sum_{(i,j) \in \mathcal{E} \setminus \cup_c \mathcal{E}_{S_c}} W_{i,j} (x_i - x_j)^2 = \sum_{S_c} x_{S_c}^T L_{S_c} x_{S_c} + x^T L_{\rm cut}\, x \le \lambda_K x^T x,$$
where $L_{S_c}$ is the graph Laplacian matrix of the subgraph $G_{S_c}$ and $L_{\rm cut}$ stores the residual edges, which are cut by the graph-partitioning algorithm. Thus, $\{x : x^T L x \le \lambda_K x^T x\}$ is a subset of $\bigcup_{S_c} \{x_{S_c} : x_{S_c}^T L_{S_c} x_{S_c} \le \lambda_K x^T x - x^T L_{\rm cut}\, x\}$; that is, any small-variation graph signal on the whole graph can be precisely represented by small-variation graph signals on the subgraphs.

In each local set, when we use the bandlimited space $\{x : x = V_{S_c}(K)\, a\}$ to approximate the space $\{x_{S_c} : x_{S_c}^T L_{S_c} x_{S_c} \le c\, x_{S_c}^T x_{S_c}\}$, the maximum error we suffer is $c\, x_{S_c}^T x_{S_c} / \lambda_{K+1}^{(S_c)}$, which is obtained from the following optimization problem:
$$\max_x \left\| x - V_{S_c}(K) V_{S_c}(K)^T x \right\|_2^2 \quad \text{subject to: } x^T L_{S_c} x \le c\, x^T x.$$
In other words, in each local set, the maximum error in representing $\{x_{S_c} : x_{S_c}^T L_{S_c} x_{S_c} \le \lambda_K x^T x - x^T L_{\rm cut}\, x\}$ is $(\lambda_K x^T x - x^T L_{\rm cut}\, x) / \lambda_{K+1}^{(S_c)}$. Since all the local sets share the variation budget $\lambda_K x^T x$ together, the maximum error we suffer is
$$\epsilon_{\rm par} = \frac{x^T (\lambda_K I - L_{\rm cut})\, x}{\min_{S_c} \lambda_{K+1}^{(S_c)} \, \|x\|_2^2},$$
which depends on the properties of the graph partitioning. In Corollary 1, we showed that we need at most $2L \|\Delta x_{\rm PC}\|_0$ local sets to represent the piecewise-constant template of $x$. Since we use at most $K$ eigenvectors in each local set, we obtain the result in Theorem 4.
