Extraction of hierarchical functional connectivity components in human brain using resting-state fMRI
Authors: Dushyant Sahoo, Theodore D. Satterthwaite, Christos Davatzikos
Dushyant Sahoo and Christos Davatzikos are with the Department of Electrical and Systems Engineering, University of Pennsylvania, PA, USA (e-mail: sadu@seas.upenn.edu; Christos.Davatzikos@uphs.upenn.edu). Theodore D. Satterthwaite is with the Department of Psychiatry, University of Pennsylvania, PA, USA (e-mail: sattertt@pennmedicine.upenn.edu).

Abstract—The study of functional networks of the human brain has been of significant interest in cognitive neuroscience for over two decades, albeit they are typically extracted at a single scale using various methods, including decompositions like ICA. However, since numerous studies have suggested that the functional organization of the brain is hierarchical, analogous decompositions might better capture functional connectivity patterns. Moreover, hierarchical decompositions can efficiently reduce the very high dimensionality of functional connectivity data. This paper provides a novel method for the extraction of hierarchical connectivity components in the human brain using resting-state fMRI. The method builds upon prior work on Sparse Connectivity Patterns (SCPs) by introducing a hierarchy of sparse, potentially overlapping patterns. The components are estimated by cascaded factorization of correlation matrices generated from fMRI. The goal of the paper is to extract sparse, interpretable, hierarchically-organized patterns from correlation matrices, where a low-rank decomposition is formed by a linear combination of a higher-rank decomposition. We formulate the decomposition as a non-convex optimization problem and solve it using gradient descent algorithms with adaptive step size. Along with the hierarchy, our method aims to capture the heterogeneity of the set of common patterns across individuals. We first validate our model through simulated experiments. We then demonstrate the effectiveness of the developed method on two different real-world datasets by showing that multi-scale hierarchical SCPs are reproducible between sub-samples and are more reproducible than single-scale patterns. We also compare our method with an existing hierarchical community detection approach.

Index Terms—Connectivity analysis, Matrix factorization, Hierarchical decomposition, fMRI

I. INTRODUCTION

It has been known that the human brain consists of spatially distinct regions which are functionally connected to form networks [1]. In addition, these networks are thought to be hierarchically organized in the brain [2], [3], [4], [5]. However, our understanding of the hierarchical nature of these networks is limited due to their complex nature. Most of the commonly used methods for the analysis of functional networks, such as Independent Component Analysis (ICA) [6], Sparse Dictionary Learning (DL) [7] and graph-theory-based network analysis [8], are focused on estimating a fixed number of networks with no hierarchy. If the assumption about the hierarchy is true, then the original data might contain complex hierarchical information with implicit lower-level hidden attributes that classical one-level connectivity methodologies would not be able to capture effectively and interpretably. Notably, existing methods can extract different numbers of components but cannot describe relationships between the components.
Most of the methods used for the estimation of hierarchical networks in fMRI data analysis are of the agglomerative ("bottom-up") type, such as hierarchical clustering [9], [10] and hierarchical community detection [11], where the method begins by regarding each element as a separate network and then successively merges elements into larger networks. Most of the hierarchical community detection approaches assume that the communities are independent [12], [13], [14]; these works have investigated multi-scale brain networks and conducted multi-scale community detection by manipulating the number of communities. But this is not the case in the human brain, where it is known that certain brain regions interact with multiple networks, i.e., the networks overlap [15]. Relatively few hierarchical community detection methods [16], [17], [18], [19] have been developed which find overlapping communities. Moreover, in community detection approaches, negative edge links are treated as repulsion. Previously, most approaches have applied thresholds before their analysis and estimated networks using sparse graphs. The reason for thresholding was that the strong edges contain the most relevant information, leading to the removal of negative edges. In contrast, in resting-state fMRI, a negative edge link carries essential information on functional co-variation with the opposing phase [20] and has a substantial physiological basis [21], [22]. These relations may play an important role in neuropsychiatric disorders and cognitive differentiation [23]. Some studies have recently shown that weak network edges contain unique information that cannot be revealed by analysis of only strong edges [24], [25]. Assigning anti-correlated and correlated regions to the same component can reveal more details about the organization of human brain patterns [26], [27], [28], as long as they are interpreted correctly.

Non-negative Matrix Factorization (NMF) [29] is one common matrix decomposition approach which many researchers use for obtaining information about community structure by analyzing a low-dimensional matrix. NMF has been used to find hierarchical structure [30]; recently, [31] used Deep Semi Non-negative Matrix Factorization [32] for estimating hierarchical, potentially overlapping, functional networks. The model given by [31] could only find networks containing regions with positive correlation between them, as the method is based on non-negative matrix factorization, thus limiting the model to positive matrices. Our work addresses the aforementioned limitations by modeling the fMRI data to capture essential properties of the network, namely: 1) Sparsity: only a small subset of nodes interact with other nodes in a given network; 2) Heterogeneity: some networks might be more prominent in particular individuals as compared to others; 3) Existence of positively and negatively correlated nodes in a network; 4) Overlapping networks, which is likely to reflect true brain organization, as brain networks might share certain regional components; and 5) Hierarchy: by adding extra layers of abstraction we can learn latent attributes and the hierarchy in the networks.
Our method is built upon Sparse Connectivity Patterns (SCPs) [26], which can be considered a symmetric CP decomposition for which an indirect fitting procedure makes the model structure equivalent to the PARAFAC2 model representation considered in [33], with the addition of sparsity rather than orthogonality. Our method aims to find Hierarchical Sparse Connectivity Patterns (hSCPs) by jointly decomposing correlation matrices into multiple components of different ranks using a cascaded framework for matrix factorization. We use gradient descent with adaptive step size for solving the resulting non-convex optimization problem, and we also introduce an initialization algorithm that makes the algorithm deterministic and faster. We evaluate the representation learned by the model on two different real datasets and compare it with EAGLE [17] and OSLOM [18], which are well-known hierarchical overlapping community detection algorithms. We also provide an extension of the model for clustering the data using hSCPs, which could help in understanding hSCPs and their distribution.

The organization of the remainder of the paper is as follows. In Section II, we present the method for the extraction of hSCPs shared between rs-fMRI scans. Section III presents experimental results for validation of the method on simulated datasets and its effectiveness on the rs-fMRI scans of the 100 unrelated HCP subjects [34] and 969 subjects from the Philadelphia Neurodevelopmental Cohort (PNC) dataset [35]. We conclude with a discussion.

II. METHOD

A. Sparse Connectivity Patterns

Let $X^i \in \mathbb{R}^{P \times T}$ be the fMRI data of the $i$-th subject having $P$ regions and $T$ time points, and let $\Theta^i \in \mathbb{S}^{P \times P}_{++}$ be the correlation matrix, where $\Theta^i_{m,o} = \mathrm{corr}(x^i_m, x^i_o)$ is the correlation between the time series of the $m$-th and $o$-th nodes. We first define the model for estimating the Sparse Connectivity Patterns (SCPs) [26] in the fMRI data, which decomposes the correlation matrices into a non-negative linear combination of sparse low-rank components such that for all $i = 1, \ldots, S$ we have $\Theta^i \approx W \Lambda^i W^T$, where $W \in \mathbb{R}^{P \times k}$ is a set of shared patterns across all subjects, $k < P$, and $\Lambda^i \succeq 0$ is a diagonal matrix storing the subject-specific information about the strength of each of the components. Let $w_l \in \mathbb{R}^P$ be the $l$-th column of $W$ such that $-1 \preceq w_l \preceq 1$, and let $w_{l,s}$ be the $s$-th element of $w_l$; then $w_l$ represents a component whose entries reflect the weights of the nodes in the component, and if $w_{l,s}$ is zero then the $s$-th node does not belong to the $l$-th component. If the weights of any two nodes in a component have the same sign, then the nodes are positively correlated; otherwise they are anti-correlated. To make the patterns sparse, each column of $W$ is subjected to an $L_1$ penalty, and the following optimization problem is solved to obtain the SCPs:

$$
\begin{aligned}
\underset{W, \Lambda}{\text{minimize}} \quad & \sum_{i=1}^{S} \| \Theta^i - W \Lambda^i W^T \|_F^2 \\
\text{subject to} \quad & \| w_l \|_1 \leq \lambda, \quad l = 1, \ldots, k \\
& \| w_l \|_\infty \leq 1, \quad l = 1, \ldots, k \\
& \Lambda^i \succeq 0, \quad i = 1, \ldots, S
\end{aligned}
\tag{1}
$$

where $S$ is the total number of subjects and $\lambda$ controls the sparsity of the components.
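To make the single-level model concrete, the following is a minimal numpy sketch (not the authors' MATLAB implementation) of how a set of SCPs $W$ and subject-specific strengths $\Lambda^i$ reconstruct each correlation matrix and how the fitting term in (1) is evaluated; the random data and dimensions are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
P, k, S = 50, 5, 10          # regions, number of SCPs, subjects (illustrative sizes)

# A sparse shared pattern matrix W with entries in [-1, 1] (hypothetical example).
W = rng.uniform(-1.0, 1.0, size=(P, k))
W[rng.random((P, k)) > 0.3] = 0.0        # enforce sparsity: most node weights are zero

# Subject-specific non-negative diagonal strengths, normalized to unit trace.
Lams = []
for _ in range(S):
    d = rng.random(k)
    Lams.append(np.diag(d / d.sum()))

def scp_objective(W, Lams, Thetas):
    """Sum of squared Frobenius reconstruction errors, as in the objective of Eq. (1)."""
    return sum(np.linalg.norm(Theta - W @ Lam @ W.T, 'fro')**2
               for Theta, Lam in zip(Thetas, Lams))

# Synthetic "observed" matrices: the model plus a small symmetric perturbation.
Thetas = []
for Lam in Lams:
    E = rng.normal(scale=1e-3, size=(P, P))
    Thetas.append(W @ Lam @ W.T + (E + E.T) / 2)

print("fit error:", scp_objective(W, Lams, Thetas))
```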
B. Hierarchical Sparse Connectivity Patterns

We extend the above work and introduce Hierarchical Sparse Connectivity Patterns (hSCPs) to estimate hierarchical sparse low-rank patterns in the correlation matrices. In our model, a correlation matrix is decomposed into $K$ levels as

$$
\begin{aligned}
\Theta^i &\approx W_1 \Lambda^i_1 W_1^T \\
\Theta^i &\approx W_1 W_2 \Lambda^i_2 W_2^T W_1^T \\
&\ \vdots \\
\Theta^i &\approx W_1 W_2 \cdots W_K \Lambda^i_K W_K^T W_{K-1}^T \cdots W_1^T
\end{aligned}
\tag{2}
$$

where $W_1 \in \mathbb{R}^{P \times k_1}$ and $W_q \in \mathbb{R}^{k_{q-1} \times k_q}$, $\Lambda^i_q \in \mathbb{R}^{k_q \times k_q}$ is a diagonal matrix storing subject-specific information of the patterns, $P \geq k_1 > k_2 > \ldots > k_K$, $P \gg K$, and $W^T$ is the transpose of $W$. Here $k_r$ is the number of components at the $r$-th level; note that $k_1$ is the number of components at the lowest level of the hierarchy. If we consider a two-layer hierarchical representation of a given correlation matrix, then we can define $Z_1 = W_1 W_2$, a $P \times k_2$ matrix; $Z_1$ is then a coarse network which consists of a weighted linear combination of the fine-level components in $W_1$, where the weights are stored in $W_2$. For better interpretability, for noise reduction in the model, but also because of our hypothesis that brain subnetworks are relatively sparse [36], we introduce sparsity constraints on the $W$ matrices. By making $W_1$ sparse we force the components to contain a small number of nodes, and by forcing the remaining $W$s to be sparse, we force the components at each subsequent level to be sparse linear combinations of the previous components. The hierarchical networks can be estimated by solving the following minimization procedures simultaneously under the constraints mentioned above:

$$
\begin{aligned}
\min_{W_1, \Lambda_1} \ & \sum_{i=1}^{S} \| \Theta^i - W_1 \Lambda^i_1 W_1^T \|_F^2 \\
\min_{W_1, W_2, \Lambda_2} \ & \sum_{i=1}^{S} \| \Theta^i - W_1 W_2 \Lambda^i_2 W_2^T W_1^T \|_F^2 \\
&\ \vdots \\
\min_{\mathcal{W}, \Lambda_K} \ & \sum_{i=1}^{S} \| \Theta^i - W_1 W_2 \cdots W_K \Lambda^i_K W_K^T W_{K-1}^T \cdots W_1^T \|_F^2
\end{aligned}
\tag{3}
$$

where $\mathcal{W} = \{W_1, \ldots, W_K\}$. As the above minimization procedures are inter-dependent, we need to solve them jointly. Let $\mathcal{L} = \{\Lambda_1, \ldots, \Lambda_K\}$ and

$$
H(\mathcal{W}, \mathcal{L}) = \sum_{i=1}^{S} \sum_{r=1}^{K} \Big\| \Theta^i - \Big(\textstyle\prod_{j=1}^{r} W_j\Big) \Lambda^i_r \Big(\textstyle\prod_{n=1}^{r} W_n\Big)^T \Big\|_F^2 .
$$

The joint minimization problem can be written as

$$
\begin{aligned}
\underset{\mathcal{W}, \mathcal{L}}{\text{minimize}} \quad & H(\mathcal{W}, \mathcal{L}) \\
\text{subject to} \quad & \| w^r_l \|_1 < \lambda_r, \quad l = 1, \ldots, k_r \ \text{and} \ r = 1, \ldots, K \\
& \| w^r_l \|_\infty \leq 1, \quad l = 1, \ldots, k_r \ \text{and} \ r = 1, \ldots, K \\
& W_j \geq 0, \quad j = 2, \ldots, K \\
& \Lambda^i_r \succeq 0, \quad i = 1, \ldots, S \ \text{and} \ r = 1, \ldots, K \\
& \mathrm{trace}(\Lambda^i_r) = 1, \quad i = 1, \ldots, S \ \text{and} \ r = 1, \ldots, K
\end{aligned}
\tag{4}
$$

where the trace operator computes the sum of the diagonal elements of a matrix. In the above minimization procedure, the sum of the diagonal values of $\Lambda^i_r$ is fixed to 1 so that the sparsity of $W$ is not trivially minimized. The optimization problem defined in (4) is non-convex, and we solve it using alternating minimization. Below are the gradients of $H$ with respect to $\mathcal{W}$ and $\mathcal{L}$. Let us first define the variables

$$
W_0 = I_P, \qquad Y_r = \prod_{j=0}^{r} W_j, \qquad T^{r}_{n,i} = \Big(\textstyle\prod_{j=1}^{n-r} W_j\Big) \Lambda^i_{n-r} \Big(\textstyle\prod_{j=1}^{n-r} W_j\Big)^T .
$$

The gradient of $H$ with respect to $\Lambda^i_r$ is

$$
\frac{\partial H}{\partial \Lambda^i_r} = \big(-2\, Y_r^T \Theta^i Y_r + 2\, Y_r^T Y_r \Lambda^i_r Y_r^T Y_r\big) \circ I_{k_r}
\tag{5}
$$

where $\circ$ is the entry-wise product. The gradient of $H$ with respect to $W_r$ is

$$
\frac{\partial H}{\partial W_r} = \sum_{i=1}^{S} \sum_{j=r}^{K} \Big( -4\, Y_{r-1}^T \Theta^i Y_{r-1} W_r T^{r}_{j,i} + 4\, Y_{r-1}^T Y_{r-1} W_r T^{r}_{j,i} W_r^T Y_{r-1}^T Y_{r-1} W_r T^{r}_{j,i} \Big) .
\tag{6}
$$

Algorithm 1 describes the complete alternating minimization procedure, where the $\mathrm{proj}_1(W, \lambda)$ operator projects each column of $W$ onto the intersection of the $L_1$ and $L_\infty$ balls [37], and $\mathrm{proj}_2$ projects a matrix onto $\mathbb{R}_+$ by setting all negative elements of the matrix to zero. As the gradients are not globally Lipschitz, we do not have bounds on the step size for gradient descent. For that reason, we use AMSGrad [38], ADAM [39] and NADAM [40], which are gradient descent algorithms with adaptive step size. The descent function in Algorithm 1 is the update rule used by the chosen gradient descent technique.
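As an illustration of the alternating updates, the sketch below implements the gradient of Eq. (5) for the subject-specific strengths at one level and applies the non-negativity projection $\mathrm{proj}_2$; it is a simplified single plain-gradient step in numpy rather than the adaptive AMSGrad update of the released MATLAB code, and the learning rate and toy data are assumptions.

```python
import numpy as np

def grad_Lambda(Y, Theta, Lam):
    """Gradient of the hSCP objective w.r.t. a diagonal strength matrix (Eq. 5)."""
    G = -2.0 * Y.T @ Theta @ Y + 2.0 * Y.T @ Y @ Lam @ Y.T @ Y
    return G * np.eye(Lam.shape[0])       # entry-wise product with I keeps only the diagonal

def proj_nonneg(M):
    """proj2: set negative entries to zero."""
    return np.maximum(M, 0.0)

# Illustrative two-level setup (assumed dimensions, not the paper's datasets).
rng = np.random.default_rng(1)
P, k1, k2 = 30, 8, 3
W1 = rng.uniform(-1, 1, (P, k1))
W2 = proj_nonneg(rng.uniform(-0.5, 1, (k1, k2)))
Theta = np.corrcoef(rng.normal(size=(P, 200)))   # a toy correlation matrix

# One plain (non-adaptive) projected gradient step on the level-2 strengths.
Y2 = W1 @ W2                                     # Y_r for r = 2
Lam2 = np.diag(np.full(k2, 1.0 / k2))            # unit-trace initialization
step = 1e-3                                      # assumed learning rate
Lam2 = proj_nonneg(Lam2 - step * grad_Lambda(Y2, Theta, Lam2))
print(np.diag(Lam2))
```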
All the code is implemented in MATLAB and will be released upon publication. The cost of computing the gradients of $\Lambda$ is $O(KSP^2k_1)$ and of $\mathcal{W}$ is $O(KSP^2k_1 + K^2SPk_1^2)$. The overall cost of Algorithm 1 is the number of iterations $\times\, O(KSP^2k_1 + K^2SPk_1^2)$. From our previous assumption that $P \gg K$, the final cost is the number of iterations $\times\, O(KSP^2k_1)$.

Algorithm 1 hSCP
1: Input: data $\Theta$, number of connectivity patterns $k_1, \ldots, k_K$ and sparsity $\lambda_1, \ldots, \lambda_K$ at the different levels
2: $\mathcal{W}$ and $\mathcal{L}$ = Initialization($\Theta$)
3: repeat
4:   for $r = 1$ to $K$ do
5:     $W_r \leftarrow$ descent($W_r$)
6:     if $r == 1$ then
7:       $W_r \leftarrow \mathrm{proj}_1(W_r, \lambda_r)$
8:     else
9:       $W_r \leftarrow \mathrm{proj}_2(W_r)$
10:    for $i = 1, \ldots, S$ do
11:      $\Lambda^i_r \leftarrow$ descent($\Lambda^i_r$)
12:      $\Lambda^i_r \leftarrow \mathrm{proj}_2(\Lambda^i_r)$
13: until stopping criterion is reached
14: Output: $\mathcal{W}$ and $\mathcal{L}$

In the above formulation, the last level has the highest number of components, $k_1$, and the level after that has $k_2$ components which are linear combinations of the components at the previous level, and so on. In this way, we build up a hierarchical model where each component is made up of a linear combination of components at the previous level of the hierarchy. Note that we cannot simply use the last decomposition in the above architecture to obtain the hierarchy, as different layers have different ranks and different approximations; hence we need all the approximations to build the hierarchical structure. In addition, one might expect $W_2$ and the subsequent $W$s to be degenerate, but that would be the case only when $W_1$ is an orthogonal matrix. Consider the case of a two-level hierarchy: we can obtain a better approximation by taking a linear combination of columns of $W_1$, which we have also observed empirically.

C. Initialization Procedure for Gradient Descent

The single-level matrix decomposition considered in hSCP is structurally similar to Singular Value Decomposition (SVD), but with dependent components and sparsity added. Hence, we believe that the final estimated components are a modification of singular vectors. Thus, we initialize $\mathcal{W}$ and $\mathcal{L}$ in Algorithm 1 by taking the SVD of the input data. This helps in making the algorithm deterministic. Define $\bar{\Theta}$ as the sample mean of the $\Theta^i$. We then take the $k$-rank SVD of $\bar{\Theta}$ and obtain $U$ and $V$ such that $UVU^T$ is the $k$-rank SVD of $\bar{\Theta}$. We then initialize $W_1$ by $U$ and $\Lambda^i_1$ by $V^i$, where $V^i$ is obtained by taking the $k$-rank SVD of $\Theta^i$, as described in Algorithm 2. For $r > 1$, $W_r$ can be initialized as a permutation matrix and $\Lambda_r$ by the top $k_r$ diagonal elements of $\Lambda_{r-1}$, so that we do not have to perform an SVD at each level. We empirically show in the next section that SVD initialization results in faster convergence.

Algorithm 2 Initialization
1: Input: data $\Theta$
2: for $r = 1$ to $K$ do
3:   for $i = 1$ to $S$ do
4:     if $r == 1$ then
5:       $U^i V^i (U^i)^T$ = $k_1$-rank SVD($\Theta^i$)
6:     else
7:       $V^i$ = top $k_r$ values of $\Lambda^i_{r-1}$
8:       $U^i$ = permutation matrix
9:     $\Lambda^i_r = V^i$
10:  $W_r = \frac{1}{S}\sum_{i=1}^{S} U^i$
11: Output: $\mathcal{W}$ and $\mathcal{L}$
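A minimal numpy sketch of the level-1 step of Algorithm 2 follows. Since the correlation matrices are symmetric, a truncated eigendecomposition stands in for the $k$-rank SVD, and the per-subject loop with averaging of the $U^i$ is a simplified illustrative reading of the algorithm, not the released implementation.

```python
import numpy as np

def rank_k_eig(Theta, k):
    """Top-k eigendecomposition of a symmetric matrix (stand-in for the k-rank SVD)."""
    vals, vecs = np.linalg.eigh(Theta)              # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:k]
    return vecs[:, idx], np.diag(vals[idx])

def initialize_level1(Thetas, k1):
    """Level-1 initialization: per-subject strengths and an averaged shared basis."""
    Us, Lams = [], []
    for Theta in Thetas:
        U, V = rank_k_eig(Theta, k1)
        Us.append(U)                                # eigenvector sign ambiguity ignored here
        Lams.append(V)                              # Lambda^i_1 initialized by V^i
    W1 = np.mean(Us, axis=0)                        # W_1 = (1/S) * sum_i U^i
    return W1, Lams

# Illustrative use on toy correlation matrices (assumed sizes).
rng = np.random.default_rng(2)
Thetas = [np.corrcoef(rng.normal(size=(40, 300))) for _ in range(5)]
W1, Lams = initialize_level1(Thetas, k1=6)
print(W1.shape, Lams[0].shape)
```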
III. EXPERIMENTS

A. Datasets

We used two real datasets to demonstrate the effectiveness of the method, described below.

• HCP: The Human Connectome Project (HCP) [34] dataset is one of the most widely used datasets for fMRI analysis. We used the fMRI scans of 100 unrelated subjects as provided in the HCP 900-subject data release [41], which were processed using the ICA+FIX pipeline with MSMAll registration [42]. Each subject has 4004 time points, and the time series were normalized to zero mean and unit L2 norm and averaged over the 360 nodes of the multimodal HCP parcellation [43].

• PNC: The Philadelphia Neurodevelopmental Cohort (PNC) [35] dataset contains 969 subjects (ages 8 to 22), each having 120 time points and 121 nodes, described in [44]. The data were preprocessed using an optimized procedure [45] which includes slice timing correction, confound regression, and band-pass filtering.

B. Convergence Analysis

We compare AMSGrad, ADAM, NADAM and vanilla gradient descent with SVD initialization and random initialization by measuring the percentage error, defined as

$$
\frac{\sum_{i=1}^{S} \sum_{r=1}^{K} \big\| \Theta^i - \big(\prod_{j=1}^{r} W_j\big) \Lambda^i_r \big(\prod_{n=1}^{r} W_n\big)^T \big\|_F^2}{\sum_{i=1}^{S} \sum_{r=1}^{K} \| \Theta^i \|_F^2} .
$$

For a fair comparison, we set $\beta_1 = 0.9$ and $\beta_2 = 0.99$ for the ADAM, NADAM and AMSGrad algorithms, where $\beta_1$ and $\beta_2$ are the hyperparameters used in the update rules of the gradient descent algorithms. These values are typically used as parameter settings for adaptive gradient descent algorithms [38]. Figure 1 shows the convergence of the algorithm on the complete HCP data for two different combinations of sparsity parameters at a particular choice of $k_1$ and $k_2$. From Figure 1 we can see that AMSGrad has the best convergence and that SVD initialization gives a better convergence rate. For the rest of the experiments we used the AMSGrad algorithm with SVD initialization to perform gradient descent.

[Fig. 1: Performance comparison of different gradient descent techniques; SVD corresponds to gradient descent with SVD initialization. (a) k1 = 15, k2 = 5, λ1 = 9, λ2 = 7.5; (b) k1 = 15, k2 = 5, λ1 = 3.6, λ2 = 3.]

C. Simulation

To evaluate the performance of the proposed model, we first use synthetic data. We compared the hierarchical components extracted by hSCP to the hierarchical overlapping communities obtained using EAGLE [17] and OSLOM [18]. Implementations of EAGLE and OSLOM were obtained from the authors. We randomly generate $V_1 \in \mathbb{R}^{p \times k_1}$ with a percentage of non-zeros equal to $\mu_1$, $W_2 \in \mathbb{R}^{k_1 \times k_2}$ with a percentage of non-zeros equal to $\mu_2$, and $\Lambda^i \in \mathbb{R}^{k_2 \times k_2}$ for $i = 1, \ldots, n$. The goal is to generate matrices $V_1 W_2 \Lambda^i W_2^T V_1^T$ which are close to correlation matrices. For this, we first take the mean of all $\Lambda^i$, $U = \frac{1}{n}\sum_{i=1}^{n} \Lambda^i$, and generate $T = V_1 W_2 U W_2^T V_1^T$. Now, let $D$ be the diagonal matrix containing the diagonal elements of $T$; to make $T$ a correlation matrix, we rescale $V_1$ by $D^{-1/2}$. Let $W_1 = D^{-1/2} V_1$; then $R = W_1 W_2 U W_2^T W_1^T$ is a correlation matrix. We then generate a correlation matrix for each subject as $\Theta^i = W_1 W_2 \Lambda^i W_2^T W_1^T + E^i$. As $W_1 W_2 \Lambda^i W_2^T W_1^T$ is close to, but not exactly, a correlation matrix, we add $E^i \in \mathbb{R}^{p \times p}$ such that the result is a correlation matrix. For the experiments, the parameters were set as follows: $n = 300$, $p = 100$, $k_1 = 20$, $k_2 = 10$, $\mu_1 = 0.4$ and $\mu_2 = 0.5$.
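A compact numpy sketch of this generation procedure is given below. The rescaling to unit diagonal and the realization of the correction term $E^i$ (implemented here by re-normalizing each matrix to unit diagonal) are our illustrative reading of the construction, not the authors' exact code.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, k1, k2, mu1, mu2 = 300, 100, 20, 10, 0.4, 0.5   # parameters used in the paper

# Sparse factors: keep roughly a mu1 / mu2 fraction of entries non-zero.
V1 = rng.uniform(-1, 1, (p, k1)) * (rng.random((p, k1)) < mu1)
W2 = rng.random((k1, k2)) * (rng.random((k1, k2)) < mu2)
Lams = [np.diag(rng.random(k2)) for _ in range(n)]

# Rescale so the mean reconstruction R has unit diagonal (a correlation matrix).
U = sum(Lams) / n
T = V1 @ W2 @ U @ W2.T @ V1.T
d = np.sqrt(np.diag(T))
d[d == 0] = 1.0                        # guard against all-zero rows in the sparse factors
W1 = np.diag(1.0 / d) @ V1             # W1 = D^{-1/2} V1

def to_correlation(M):
    """Re-normalize a PSD matrix to unit diagonal; the difference plays the role of E^i."""
    s = np.sqrt(np.clip(np.diag(M), 1e-12, None))
    return M / np.outer(s, s)

Thetas = [to_correlation(W1 @ W2 @ Lam @ W2.T @ W1.T) for Lam in Lams]
print(Thetas[0].shape, float(np.abs(np.diag(Thetas[0]) - 1).max()))
```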
We compare components derived from hSCP with $k_1 \in \{10, 15, 20\}$, $k_2 \in \{5, 6, 8\}$, $\lambda_1 \in P \times 5 \cdot 10^{[-3:-1]}$ and $\lambda_2 \in k_1 \times 10^{[-3:-1]}$. By varying the $\lambda$ values, we generate components with different sparsity. We first compare fine-scale and coarse-scale components separately to demonstrate the effect of sparsity on the performance. For a fixed $k_1$ and $\lambda_1$, we find the $k_2$ and $\lambda_2$ giving the maximum similarity with the ground truth, and for a fixed $k_2$ and $\lambda_2$, we find the $k_1$ and $\lambda_1$ giving the maximum similarity with the ground truth, over 10 runs. Here, similarity is defined as the average correlation between the extracted and the ground-truth components. Fig. 2 shows the similarity of the fine-scale and coarse-scale components with the ground truth. From the figure, we can see that hSCP can extract components that are highly similar to the ground truth. Also, as the fine-scale components become sparser, the similarity decreases.

[Fig. 2: Comparison between ground truth and extracted hSCP components on the simulated dataset; the x-axis corresponds to the proportion of non-zeros in the estimated components. (a) Similarity comparison at fine scale; (b) similarity comparison at coarse scale.]

Next, we compare hSCP to EAGLE and OSLOM. Hierarchical components and communities with $k_1 \in \{5, 6, 8\}$ and $k_2 \in \{10, 15, 20\}$ were extracted from hSCP, EAGLE and OSLOM. The correlation matrix averaged across all the subjects was used as the input to EAGLE and OSLOM. For hSCP, among the different values of $\lambda_1$ and $\lambda_2$, we extract the components at levels $k_1$ and $k_2$ which have the maximum similarity with the ground truth. Table I shows the similarity of the extracted components with the ground truth. Some cells in the table are empty because the EAGLE and OSLOM algorithms were not able to generate hierarchical structures for particular values of $k_1$ and $k_2$. It can be seen that the hSCP method extracts components which are closer to the ground truth than those of the other methods.

TABLE I: Similarity comparison (mean ± std) on the simulated dataset. The rows correspond to values of k1 and the columns correspond to values of k2.

             10                15                20
  5  hSCP    0.8293 ± 0.0467   0.8097 ± 0.0728   0.8305 ± 0.0614
     EAGLE   0.4051 ± 0.0304   0.4180 ± 0.0290   0.4068 ± 0.0070
     OSLOM   0.6866 ± 0.0442   0.6955 ± 0.0362   -
  6  hSCP    0.8421 ± 0.0585   0.8660 ± 0.0286   0.8497 ± 0.0292
     EAGLE   0.3867 ± 0.0141   0.4855 ± 0.0731   0.4463 ± 0.0334
     OSLOM   0.6249 ± 0.0554   0.7302 ± 0.0431   -
  8  hSCP    0.8350 ± 0.0666   0.8457 ± 0.0353   0.8454 ± 0.0385
     EAGLE   0.4408 ± 0.0857   0.5339 ± 0.0900   0.4099 ± 0.0274
     OSLOM   0.6610 ± 0.0540   -                 -

D. Comparison with Single-Scale Components

We also compared the reproducibility of the shared components extracted from the hierarchical model (hSCP) versus single-scale components (SCP). Reproducibility here is defined as the normalized inner product of components derived from two equal random sub-samples of the data, averaged across all the components. We decomposed the correlation matrices into two levels to demonstrate the advantages of hierarchical factorization and to show proof of concept. There might not be a single $K$ that best describes the data, and the algorithm allows us to investigate the continuum of functional connectivity patterns at different $K$s. We compare components derived from hSCP with $k_1 \in \{10, 15, 20, 25\}$, $k_2 \in \{4, 5, 6, 8\}$, $\lambda_1 \in P \times 5 \cdot 10^{[-3:-1]}$ and $\lambda_2 \in k_1 \times 10^{[-3:-1]}$, and from SCP with $k \in \{4, 5, 6, 8, 10, 15, 20, 25\}$ at $\lambda \in P \times 5 \cdot 10^{[-4:-1]}$.
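Since split-sample reproducibility drives the model selection described next, the following sketch shows one way to compute the measure defined above: components from two sub-samples are matched (here with the Hungarian algorithm, an assumption on our part since the paper does not specify the matching step) and the absolute normalized inner products of matched columns are averaged.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def reproducibility(Wa, Wb):
    """Average absolute normalized inner product between matched components (columns)."""
    Wa = Wa / np.linalg.norm(Wa, axis=0, keepdims=True)
    Wb = Wb / np.linalg.norm(Wb, axis=0, keepdims=True)
    sim = np.abs(Wa.T @ Wb)                  # k x k similarity of all column pairs
    row, col = linear_sum_assignment(-sim)   # one-to-one matching maximizing similarity
    return sim[row, col].mean()

# Illustrative check: a component matrix versus a noisy, column-permuted copy.
rng = np.random.default_rng(4)
W_split1 = rng.normal(size=(100, 8))
W_split2 = W_split1[:, rng.permutation(8)] + 0.1 * rng.normal(size=(100, 8))
print(round(reproducibility(W_split1, W_split2), 3))
```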
At a fixed $k_2$ and $\lambda_2$, we find the optimal $k_1$ and $\lambda_1$ by dividing the data into three equal parts (training, validation, and test data) and choosing the parameters corresponding to the maximum mean reproducibility over 20 runs on the training and validation sets. Figure 3 and Figure 4 show the reproducibility of the components averaged over 20 runs on training and test data. We can see that the same number of components extracted from the second level using hSCP are, on average, more reproducible than the components extracted using SCP.

[Fig. 3: Comparison of single-scale (SCP) and hierarchical (hSCP) components on the HCP dataset; the x-axis corresponds to the proportion of non-zeros in the components. All hSCP results are second-level components.]

[Fig. 4: Comparison of single-scale (SCP) and hierarchical (hSCP) components on the PNC dataset; the x-axis corresponds to the proportion of non-zeros in the components. All hSCP results are second-level components.]

E. Comparison of hSCP with Existing Approaches

We compared the reproducibility of the hierarchical components extracted by hSCP to the hierarchical overlapping communities obtained using EAGLE [17] and OSLOM [18]. Implementations of EAGLE and OSLOM were obtained from the authors. The correlation matrix averaged across all the subjects was used as the input to EAGLE and OSLOM. Hierarchical components and communities with $k_1 \in \{4, 5, 6, 8\}$ and $k_2 \in \{10, 15, 20, 25\}$ were generated from hSCP, EAGLE and OSLOM. The optimal $\lambda_1$ and $\lambda_2$ for hSCP were selected by dividing the data into three equal parts (training, validation and test sets) and performing the validation procedure described in Section III-D. Reproducibility was computed using the training and test sets for all the methods and all combinations of $k_1$ and $k_2$. Table II and Table III show the reproducibility results on the HCP and PNC datasets. For a particular $k_1$ and $k_2$, the reproducibility tables show the average of the two reproducibility values. The results clearly show that the hSCPs have better reproducibility than the communities derived using EAGLE and OSLOM.

F. Age Prediction

We compared the predictive power of the hierarchical components extracted by hSCP, EAGLE and OSLOM on the age prediction problem. Using the PNC dataset, we first extracted the components and their strengths ($\Lambda$) for each individual. These strength values were then used to predict the age of each individual using linear regression. The Pearson correlation coefficient and the mean absolute error (MAE) between the predicted brain age and the true age were used as the performance measures for comparison. Table IV summarizes the results obtained. To determine whether our results are significantly better, the Wilcoxon signed-rank test was performed, since the underlying distributions of the different performance measures are unknown. As a lower MAE is preferred, we performed a left-tailed hypothesis test when MAE is used as a performance measure. A right-tailed hypothesis test is performed when correlation is used as a performance measure, because a higher value of correlation is better. The null hypotheses in the two cases are:

• There is no difference between correlation values obtained from our method compared to other methods.
• There is no difference between MAE values obtained from our method compared to other methods.
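A minimal sketch of this evaluation pipeline is shown below, using scikit-learn's LinearRegression and SciPy's Wilcoxon signed-rank test as stand-ins; the feature matrix of component strengths, the train/test split, and the per-fold error lists are placeholder assumptions rather than the paper's actual data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from scipy.stats import pearsonr, wilcoxon

rng = np.random.default_rng(5)
S, k = 969, 20                                   # subjects and components (PNC-like sizes)
strengths = rng.random((S, k))                   # diag(Lambda^i) per subject (placeholder)
age = 8 + 14 * rng.random(S)                     # ages in the 8-22 range (placeholder)

Xtr, Xte, ytr, yte = train_test_split(strengths, age, test_size=0.3, random_state=0)
pred = LinearRegression().fit(Xtr, ytr).predict(Xte)

r, _ = pearsonr(pred, yte)
mae = np.mean(np.abs(pred - yte))
print(f"Pearson r = {r:.3f}, MAE = {mae:.2f} years")

# Paired one-sided Wilcoxon signed-rank test on per-fold MAEs of two methods
# (placeholder numbers; alternative='less' tests whether method A's MAE is smaller).
mae_a = np.array([3.10, 3.06, 3.02, 3.07, 3.05])
mae_b = np.array([3.20, 3.15, 3.10, 3.17, 3.13])
print(wilcoxon(mae_a, mae_b, alternative='less').pvalue)
```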
TABLE II: Reproducibility comparison (mean ± std) on the HCP dataset. The rows correspond to values of k1 and the columns correspond to values of k2.

             10                15                20
  4  hSCP    0.8885 ± 0.0441   0.8351 ± 0.0748   0.8507 ± 0.0635
     EAGLE   0.3077 ± 0.0981   0.4158 ± 0.1321   -
     OSLOM   0.7493 ± 0.0882   -                 -
  5  hSCP    0.8753 ± 0.0348   0.8356 ± 0.0591   0.8281 ± 0.0656
     EAGLE   0.2908 ± 0.0737   0.2664 ± 0.0333   0.0792 ± 0.1656
     OSLOM   0.6092 ± 0.0733   -                 -
  6  hSCP    0.8756 ± 0.0375   0.8461 ± 0.0486   0.8224 ± 0.0555
     EAGLE   0.2356 ± 0.0196   0.3209 ± 0.1206   0.3717 ± 0.1698
     OSLOM   0.5791 ± 0.0792   -                 -
  8  hSCP    0.8781 ± 0.0694   0.8389 ± 0.0479   0.8240 ± 0.0460
     EAGLE   -                 -                 0.3374 ± 0.1672
     OSLOM   -                 -                 -

TABLE III: Reproducibility comparison (mean ± std) on the PNC dataset. The rows correspond to values of k1 and the columns correspond to values of k2.

             10                15                20
  4  hSCP    0.8838 ± 0.0495   0.7998 ± 0.0766   0.8036 ± 0.0599
     EAGLE   0.6287 ± 0.3005   0.6433 ± 0.1321   0.6046 ± 0.2981
     OSLOM   0.6780 ± 0.0537   -                 -
  5  hSCP    0.8785 ± 0.0675   0.8379 ± 0.0704   0.8099 ± 0.0736
     EAGLE   0.6575 ± 0.1973   0.5327 ± 0.1828   0.5426 ± 0.1656
     OSLOM   0.5867 ± 0.0869   -                 -
  6  hSCP    0.8655 ± 0.0404   0.8364 ± 0.0649   0.8518 ± 0.0587
     EAGLE   0.7571 ± 0.2366   0.6279 ± 0.1011   0.6244 ± 0.2627
     OSLOM   0.6391 ± 0.1266   -                 -
  8  hSCP    0.8670 ± 0.0559   0.8347 ± 0.0517   0.8340 ± 0.0657
     EAGLE   -                 0.7451 ± 0.0319   0.5933 ± 0.2126
     OSLOM   0.5479 ± 0.0987   -                 -

Table V demonstrates that the prediction model built using hSCPs performed significantly better (p-value < 0.05) than the models built using EAGLE and OSLOM components in the majority of the cases. This indicates that the hSCPs were more informative for predicting brain age. One of the reasons for the poor performance of EAGLE is that it only estimates whether a region is or is not present in a component. In contrast, hSCP can determine the strength of the presence and thus has more degrees of freedom, resulting in better performance.

G. Clustering

An extension of the above method is presented below, which estimates hSCPs for better clustering of the data and capturing of heterogeneity. Data clustering is performed using the subject-specific information of the components. We add a penalty term for clustering to the objective function given in problem (4); the modified objective is given in problem (7). The joint minimization problem for estimating hSCPs and using their subject-specific information for clustering is:

$$
\begin{aligned}
\underset{\mathcal{W}, \mathcal{L}, C}{\text{minimize}} \quad & H(\mathcal{W}, \mathcal{L}) + \sum_{r=1}^{K} \sum_{l=1}^{L} \sum_{i \in \mathcal{M}_l} \sum_{d=1}^{k_r} \Big\| \frac{x^i_{d,r}}{\| x_{d,r} \|} - c^l_{d,r} \Big\|^2 \\
\text{subject to} \quad & \| w^r_l \|_1 < \lambda_r, \quad l = 1, \ldots, k_r \ \text{and} \ r = 1, \ldots, K \\
& \| w^r_l \|_\infty \leq 1, \quad l = 1, \ldots, k_r \ \text{and} \ r = 1, \ldots, K \\
& W_j \geq 0, \quad j = 2, \ldots, K \\
& \Lambda^i_r \succeq 0, \quad i = 1, \ldots, S \ \text{and} \ r = 1, \ldots, K \\
& \mathrm{trace}(\Lambda^i_r) = 1, \quad i = 1, \ldots, S \ \text{and} \ r = 1, \ldots, K
\end{aligned}
\tag{7}
$$

where $c^l_r$ is the $l$-th cluster center at the $r$-th level, $L$ is the number of clusters, $C = \{c^l_r\}_{l=1:L,\, r=1:K}$, $x^i_r$ stores the diagonal elements of $\Lambda^i_r$, and $\mathcal{M}_l$ stores whether the $i$-th subject belongs to the $l$-th cluster. In the above problem, the $\| x^i_{d,r}/\|x_{d,r}\| - c^l_{d,r} \|^2$ penalty incorporates clustering by penalizing the distance between the points in a cluster and the cluster center, and $x^i_{d,r}$ is divided by $\|x_{d,r}\|$ for normalization. The above non-convex problem can be solved in a similar way as problem (4) in Section II-B, using alternating minimization. Algorithm 3 provides the complete procedure for solving the problem. In Algorithm 3, k-means($\Lambda^i_r$) applies k-means clustering [46] to the $\Lambda^i_r$ and outputs $C$ as the cluster centers and $\{\mathcal{M}_l\}_{l=1:L}$ as the cluster assignments.

Algorithm 3 hSCP-clust
1: Input: data $\Theta$, number of connectivity patterns $k_1, \ldots, k_K$ and sparsity $\lambda_1, \ldots, \lambda_K$ at the different levels
2: $\mathcal{W}$ and $\mathcal{L}$ = Initialization($\Theta$)
3: random initialization of $C$ (uniform sampling in [0, 1])
4: repeat
5:   for $r = 1$ to $K$ do
6:     steps 5-9 from Algorithm 1
7:     for $i = 1, \ldots, S$ do
8:       $\Lambda^i_r \leftarrow$ descent($\Lambda^i_r$)
9:       $\Lambda^i_r \leftarrow \mathrm{proj}_2(\Lambda^i_r)$
10:    $C, \{\mathcal{M}_l\}_{l=1:L} \leftarrow$ k-means($\Lambda^i_r$)
11: until stopping criterion is reached
12: Output: $\mathcal{W}$, $\mathcal{L}$, $C$ and $\{\mathcal{M}_l\}_{l=1:L}$
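A small sketch of the clustering step on the normalized subject-specific strengths follows, using scikit-learn's KMeans as a stand-in for the constrained k-means of [46]; the per-component normalization mirrors the division by $\|x_{d,r}\|$ in (7), and the data are placeholders.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(6)
S, k_r, L = 100, 4, 2                      # subjects, components at this level, clusters

# Subject-specific strengths: rows are diag(Lambda^i_r), normalized to unit trace.
X = rng.random((S, k_r))
X = X / X.sum(axis=1, keepdims=True)

# Normalize each component (column), as in the x_{d,r} / ||x_{d,r}|| term of (7).
Xn = X / np.linalg.norm(X, axis=0, keepdims=True)

km = KMeans(n_clusters=L, n_init=10, random_state=0).fit(Xn)
centers = km.cluster_centers_              # C: one center per cluster at this level
assignments = km.labels_                   # membership sets M_l
print(centers.shape, np.bincount(assignments))
```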
We ran the algorithm on HCP data to extract the components and the clusters in the data. The number of clusters was selected by first extracting the hierarchical components without the penalty term and then clustering the data by applying k-means to $\mathcal{L}$. The number of clusters $L$ was set to 2 using the elbow method. The number of coarse-scale components was set to 4 and the number of fine-scale components to 10, since these exhibited the highest reproducibility between the training and test sets. Figure 5 and Figure 6 show the distribution of fine and coarse components in the two clusters. From Figure 6, we can see that components A and C are more prominent in cluster 2 than in cluster 1, and components B and D are more prominent in cluster 1 than in cluster 2. The algorithm has forced component A and its sub-components to have higher weights in one cluster. But for component B, sub-components 2 and 3 are prominent in cluster 1 and sub-component 1 is prominent in cluster 2, which can be seen in Figure 5.

TABLE IV: Prediction performance comparison of hSCP, EAGLE and OSLOM.

Correlation:
             10              15              20              25
  4  hSCP    0.259 ± 0.010   0.301 ± 0.014   0.319 ± 0.012   0.377 ± 0.010
     EAGLE   0.246 ± 0.004   0.298 ± 0.009   0.300 ± 0.004   0.347 ± 0.005
     OSLOM   0.209 ± 0.007   -               -               -
  5  hSCP    0.263 ± 0.010   0.327 ± 0.025   0.379 ± 0.028   0.403 ± 0.017
     EAGLE   0.259 ± 0.003   0.298 ± 0.002   0.301 ± 0.006   -
     OSLOM   0.217 ± 0.005   -               -               -
  6  hSCP    0.257 ± 0.013   0.342 ± 0.027   0.381 ± 0.021   0.407 ± 0.022
     EAGLE   0.281 ± 0.004   0.308 ± 0.005   0.321 ± 0.007   -
     OSLOM   0.236 ± 0.008   -               -               -
  8  hSCP    0.278 ± 0.022   0.372 ± 0.026   0.382 ± 0.023   0.409 ± 0.010
     EAGLE   -               0.311 ± 0.003   0.326 ± 0.007   -
     OSLOM   0.264 ± 0.007   -               -               -

MAE (years):
             10              15              20              25
  4  hSCP    3.20 ± 0.06     3.16 ± 0.10     3.13 ± 0.09     3.06 ± 0.14
     EAGLE   3.22 ± 0.01     3.20 ± 0.01     3.15 ± 0.01     3.10 ± 0.01
     OSLOM   3.25 ± 0.01     -               -               -
  5  hSCP    3.19 ± 0.07     3.10 ± 0.17     3.06 ± 0.21     3.03 ± 0.13
     EAGLE   3.21 ± 0.01     3.13 ± 0.01     3.09 ± 0.01     -
     OSLOM   3.24 ± 0.01     -               -               -
  6  hSCP    3.20 ± 0.08     3.11 ± 0.18     3.06 ± 0.16     3.02 ± 0.18
     EAGLE   3.18 ± 0.01     3.15 ± 0.01     3.14 ± 0.01     -
     OSLOM   3.20 ± 0.01     -               -               -
  8  hSCP    3.18 ± 0.14     3.07 ± 0.18     3.05 ± 0.17     3.02 ± 0.13
     EAGLE   -               3.17 ± 0.01     3.13 ± 0.02     -
     OSLOM   3.20 ± 0.01     -               -               -

TABLE V: p-values from the Wilcoxon signed-rank test on the different performance measures.

Correlation:
             10           15           20           25
  4  EAGLE   0.0034       0.0229       3.6 × 10⁻⁴   4.7 × 10⁻⁵
     OSLOM   4.7 × 10⁻⁵   -            -            -
  5  EAGLE   0.0447       1.1 × 10⁻⁴   4.7 × 10⁻⁵   -
     OSLOM   4.7 × 10⁻⁵   -            -            -
  6  EAGLE   1            6.2 × 10⁻⁴   4.7 × 10⁻⁵   -
     OSLOM   1.3 × 10⁻⁴   -            -            -
  8  EAGLE   -            4.7 × 10⁻⁵   4.7 × 10⁻⁵   -
     OSLOM   0.0175       -            -            -

MAE (years):
             10           15           20           25
  4  EAGLE   0.0159       0.0108       0.0323       0.0351
     OSLOM   0.0012       -            -            -
  5  EAGLE   0.0447       0.1198       0.0108       -
     OSLOM   0.0413       -            -            -
  6  EAGLE   0.9727       0.0653       0.0145       -
     OSLOM   0.778        -            -            -
  8  EAGLE   -            0.0447       0.0209       -
     OSLOM   0.0563       -            -            -
[Fig. 5: Heterogeneity captured by fine-scale components in HCP. (a) Heterogeneity captured by subjects of cluster 1; (b) heterogeneity captured by subjects of cluster 2. The color indicates the strength (Λ2) of each component present in a subject; the maximum strength of a component across subjects is fixed to 1 for comparison purposes.]

[Fig. 6: Heterogeneity captured by coarse-scale components in HCP. (a) Heterogeneity captured by subjects of cluster 1; (b) heterogeneity captured by subjects of cluster 2. The color indicates the strength (Λ1) of each component present in a subject; the maximum strength of a component across subjects is fixed to 1 for comparison purposes.]

From Figures 5 and 6, it can be seen that our method can reveal heterogeneity in the population by capturing the strength of each component's presence in each individual.

H. Results from Resting-State fMRI

Figure 7 displays the 10 fine-level components, the 4 coarse-level components and the hierarchical structure. It can be clearly seen from Figure 7 that the fine- and coarse-level components are overlapping and sparse, and the coarse components are comprised of sparse linear combinations of fine-level components, which helps in discovering the relations between different networks at different scales. The ten fine-level components show the relations between different functional networks and are similar to the SCPs extracted in [26]. From Fig. 7, it can be seen that our approach can separate task-positive regions and their associated task-negative regions into separate patterns without using traditional seed-based methods that require knowledge of a seed region of interest. Various studies have found that task-positive regions are positively correlated with each other and task-negative regions are positively correlated with each other, while the regions of the two networks are negatively correlated with each other, which aligns with our results [47], [48]. Component 2 covers the Default Mode Network and the Dorsal Attention Network, which are anti-correlated with each other. This is a well-known finding, previously described using the seed-based correlation method [22]. Anti-correlations between different brain regions can represent interactions that are dependent on the state of the brain. As our method does not capture dynamics, it has captured the interactions between different regions in different components. An example of this is the anti-correlation between the default mode network and the task-positive network; these interactions are thought to be facilitated by indirect anatomical connections between the regions of the two networks [49]. Component 8 shows different regions of the DMN anti-correlated with sensorimotor regions, as described in a separate study [50]. Component C comprises three connectivity patterns that involve the sensorimotor areas and their anti-correlations. Component A consists of the Visual Network and the Ventral Attention Network, which are anti-correlated with each other. From Fig. 7, we can see that parts of the sensorimotor and emotion networks [51] are anti-correlated with each other. These connections highlighted by our method are corroborated by the fact that these regions have direct anatomical pathways [52].
These negatively correlated networks can highlight different interactions between brain regions, such as suppression, inhibition, and neurofeedback. An extension of this method that estimates dynamic components could help us understand the different anti-correlation mechanisms between regions. Future research is needed to understand more about these anti-correlations and the source of these interactions. Our study also finds that, compared to the primary sensory cortex, the higher-order association cortex has more associations across different components, as shown in previous studies [53], [54]. Traditional seed-based approaches have been used to show that these regions have functional connectivity with more heterogeneous regions, implying that they receive input from and send output to more diverse brain regions [55], [56]. Thus, allowing overlapping components and positive and negative correlations within the same components provides additional insights. These features of the method make it possible to capture the relations of various overlapping regions within a functional system to other areas by assigning them to different components. Another important observation is that each of the coarse components comprises fine-level components covering major functional networks and their relations with other nodes. For instance, coarse component B consists mainly of components 2 and 3, which store the links between regions of the Default Mode Network and other nodes in the brain. Similarly, coarse component D captures the relationship between visual areas and the rest of the brain using components 4 and 5. Thus, hSCPs can provide novel insights into the functioning of the brain by jointly uncovering both fine- and coarse-level components, with the coarse components comprised of similarly functioning fine components.

IV. DISCUSSION AND CONCLUSIONS

In this work, we proposed a novel technique for hierarchical extraction of sparse components from correlation matrices, with application to rs-fMRI data. The proposed method is a cascaded joint matrix factorization in which the correlation matrix corresponding to each individual's data is considered an independent observation, thus allowing us to model inter-subject variability. We formulated the problem as a non-convex optimization with sparsity constraints. It is important to note that, as the decomposition is not by itself unique, the ability to reproducibly recover components hinges on imposing sparsity in the decomposition, which appears to provide useful and reproducible representations. We used an adaptive-learning-rate gradient descent algorithm to solve the optimization problem. Compared to the implementation of SCP, which used random initialization, we used SVD initialization, which made the complete algorithm both deterministic and faster. In addition to shared patterns, we are able to extract the 'strength' of these patterns in individual subjects, thus capturing heterogeneity across the data. Experimentally, we showed that our method is able to find a sparse, low-rank hierarchical decomposition using cascaded matrix factorization which is highly reproducible across datasets. Experimental results using the PNC dataset demonstrate that the hierarchical components extracted using our model could better predict brain age than EAGLE and OSLOM. We also show that our model is able to capture heterogeneity using the HCP dataset.
Our model computationally extracts a set of hierarchical components common across subjects, including resting-state networks. At the same time, we capture individual information about subjects as a linear combination of these hierarchical components, making it a useful measure for group studies. Importantly, our work provides a method to uncover hierarchical organization in the functioning of the human brain.

[Fig. 7: Hierarchical components derived from the HCP dataset showing the connections between 10 fine-scale components (W1), denoted 1 to 10, and 4 coarse-scale components (W1 W2), denoted A to D. (a) Component A comprises 1 and 9; (b) component B comprises 1, 2 and 3; (c) component C comprises 6, 7 and 8; (d) component D comprises 4, 5 and 6. Nodes with red and blue color are correlated among themselves but anticorrelated with each other; note that blue need not be associated with positive or negative correlation, because the colors can be flipped without affecting the solution. Components 2 and 3 show different regions of the Default Mode Network anti-correlated with the Dorsal Attention and Cingulo-Opercular systems; component 8 shows different regions of the Default Mode Network anti-correlated with the sensorimotor areas; components 4 and 5 show different regions of the Visual system anti-correlated with the Salience and fronto-parietal control systems.]

There are several directions for future work. First, it is possible to extend the idea to estimate dynamic hierarchical components, similar to [57], which could help reveal how the hierarchical networks vary over time. Second, generative-discriminative models can be built on top of hSCP to find the components which are most discriminative of particular groups. For example, such a model could estimate the hierarchical components that are most discriminative of a neurodegenerative disorder. Third, it would be interesting to derive guarantees on the estimation error of the hierarchical components; one possible approach is to adapt the proof techniques of [58]. Finally, future studies incorporating cognitive, clinical, and genetic data might elucidate the biological underpinnings and clinical significance of the heterogeneity captured by our approach. Such studies are beyond the scope of the current paper.

REFERENCES

[1] O. Sporns, Networks of the Brain. MIT Press, 2010.
[2] G. Doucet, M. Naveau, L. Petit, N. Delcroix, L. Zago, F. Crivello, G. Jobard, N. Tzourio-Mazoyer, B. Mazoyer, E. Mellet et al., "Brain activity at rest: a multiscale hierarchical functional organization," Journal of Neurophysiology, vol. 105, no. 6, pp. 2753–2763, 2011.
[3] H.-J. Park and K. Friston, "Structural and functional brain networks: from connections to cognition," Science, vol. 342, no. 6158, p. 1238411, 2013.
[4] L. Ferrarini, I. M. Veer, E. Baerends, M.-J. van Tol, R. J. Renken, N. J. van der Wee, D. J. Veltman, A. Aleman, F. G. Zitman, B. W. Penninx et al., "Hierarchical functional modularity in the resting-state human brain," Human Brain Mapping, vol. 30, no. 7, pp. 2220–2231, 2009.
[5] D. Meunier, R. Lambiotte, A. Fornito, K. Ersche, and E. T. Bullmore, "Hierarchical modularity in human brain functional networks," Frontiers in Neuroinformatics, vol. 3, p. 37, 2009.
[6] C. F. Beckmann, M. DeLuca, J. T. Devlin, and S. M. Smith, "Investigations into resting-state connectivity using independent component analysis," Philosophical Transactions of the Royal Society B: Biological Sciences, vol. 360, no. 1457, pp. 1001–1013, 2005.
[7] H. Eavani, R. Filipovych, C. Davatzikos, T. D. Satterthwaite, R. E. Gur, and R. C. Gur, "Sparse dictionary learning of resting state fMRI networks," in 2012 Second International Workshop on Pattern Recognition in NeuroImaging. IEEE, 2012, pp. 73–76.
[8] E. Bullmore and O. Sporns, "Complex brain networks: graph theoretical analysis of structural and functional systems," Nature Reviews Neuroscience, vol. 10, no. 3, p. 186, 2009.
[9] Y. Wang and T.-Q. Li, "Analysis of whole-brain resting-state fMRI data using hierarchical clustering approach," PLoS ONE, vol. 8, no. 10, p. e76315, 2013.
[10] X. Liu, X.-H. Zhu, P. Qiu, and W. Chen, "A correlation-matrix-based hierarchical clustering method for functional connectivity analysis," Journal of Neuroscience Methods, vol. 211, no. 1, pp. 94–102, 2012.
[11] A. Ashourvan, Q. K. Telesford, T. Verstynen, J. M. Vettel, and D. S. Bassett, "Multi-scale detection of hierarchical community architecture in structural and functional brain networks," arXiv preprint arXiv:1704.05826, 2017.
[12] R. F. Betzel and D. S. Bassett, "Multi-scale brain networks," NeuroImage, vol. 160, pp. 73–83, 2017.
[13] M. G. Puxeddu, J. Faskowitz, R. F. Betzel, M. Petti, L. Astolfi, and O. Sporns, "The modular organization of brain cortical connectivity across the human lifespan," NeuroImage, p. 116974, 2020.
[14] R. F. Betzel, B. Mišić, Y. He, J. Rumschlag, X.-N. Zuo, and O. Sporns, "Functional brain modules reconfigure at multiple scales across the human lifespan," arXiv preprint, 2015.
[15] J. Xu, M. N. Potenza, V. D. Calhoun, R. Zhang, S. W. Yip, J. T. Wall, G. D. Pearlson, P. D. Worhunsky, K. A. Garrison, and J. M. Moran, "Large-scale functional network overlap is a general property of brain functional organization: reconciling inconsistent fMRI findings from general-linear-model-based analyses," Neuroscience & Biobehavioral Reviews, vol. 71, pp. 83–100, 2016.
[16] A. Lancichinetti, S. Fortunato, and J. Kertesz, "Detecting the overlapping and hierarchical community structure in complex networks," New Journal of Physics, vol. 11, no. 3, p. 033015, 2009.
[17] H. Shen, X. Cheng, K. Cai, and M.-B. Hu, "Detect overlapping and hierarchical community structure in networks," Physica A: Statistical Mechanics and its Applications, vol. 388, no. 8, pp. 1706–1712, 2009.
[18] A. Lancichinetti, F. Radicchi, J. J. Ramasco, and S. Fortunato, "Finding statistically significant communities in networks," PLoS ONE, vol. 6, no. 4, p. e18961, 2011.
[19] Z. Zhang and Z. Wang, "Mining overlapping and hierarchical communities in complex networks," Physica A: Statistical Mechanics and its Applications, vol. 421, pp. 25–33, 2015.
[20] M. Rubinov and O. Sporns, "Weight-conserving characterization of complex functional brain networks," NeuroImage, vol. 56, no. 4, pp. 2068–2079, 2011.
[21] L. Zhan, L. M. Jenkins, O. E. Wolfson, J. J. GadElkarim, K. Nocito, P. M. Thompson, O. A. Ajilore, M. K. Chung, and A. D. Leow, "The significance of negative correlations in brain connectivity," Journal of Comparative Neurology, vol. 525, no. 15, pp. 3251–3265, 2017.
[22] M. D. Fox, A. Z. Snyder, J. L. Vincent, M. Corbetta, D. C. Van Essen, and M. E. Raichle, "The human brain is intrinsically organized into dynamic, anticorrelated functional networks," Proceedings of the National Academy of Sciences, vol. 102, no. 27, pp. 9673–9678, 2005.
[23] A. L. Fitzpatrick, C. K. Buchanan, R. L. Nahin, S. T. DeKosky, H. H. Atkinson, M. C. Carlson, and J. D. Williamson, "Associations of gait speed and other measures of physical function with cognition in a healthy cohort of elderly persons," The Journals of Gerontology Series A: Biological Sciences and Medical Sciences, vol. 62, no. 11, pp. 1244–1251, 2007.
[24] A. Goulas, A. Schaefer, and D. S. Margulies, "The strength of weak connections in the macaque cortico-cortical network," Brain Structure and Function, vol. 220, no. 5, pp. 2939–2951, 2015.
[25] E. Santarnecchi, G. Galli, N. R. Polizzotto, A. Rossi, and S. Rossi, "Efficiency of weak brain connections support general cognitive functioning," Human Brain Mapping, vol. 35, no. 9, pp. 4566–4582, 2014.
[26] H. Eavani, T. D. Satterthwaite, R. Filipovych, R. E. Gur, R. C. Gur, and C. Davatzikos, "Identifying sparse connectivity patterns in the brain using resting-state fMRI," NeuroImage, vol. 105, pp. 286–299, 2015.
[27] D. Sahoo, N. Honnorat, and C. Davatzikos, "Sparse low-dimensional causal modeling for the analysis of brain function," in Medical Imaging 2019: Image Processing, vol. 10949. International Society for Optics and Photonics, 2019, p. 109492R.
[28] ——, "GPU accelerated extraction of sparse Granger causality patterns," in 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018). IEEE, 2018, pp. 604–607.
[29] J. Yang and J. Leskovec, "Overlapping community detection at scale: a nonnegative matrix factorization approach," in Proceedings of the Sixth ACM International Conference on Web Search and Data Mining. ACM, 2013, pp. 587–596.
[30] H. A. Song and S.-Y. Lee, "Hierarchical representation using NMF," in International Conference on Neural Information Processing. Springer, 2013, pp. 466–473.
[31] H. Li, X. Zhu, and Y. Fan, "Identification of multi-scale hierarchical brain functional networks using deep matrix factorization," in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2018, pp. 223–231.
[32] G. Trigeorgis, K. Bousmalis, S. Zafeiriou, and B. W. Schuller, "A deep matrix factorization method for learning attribute representations," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 3, pp. 417–429, 2017.
[33] K. H. Madsen, N. W. Churchill, and M. Mørup, "Quantifying functional connectivity in multi-subject fMRI data using component models," Human Brain Mapping, vol. 38, no. 2, pp. 882–899, 2017.
[34] D. C. Van Essen, S. M. Smith, D. M. Barch, T. E. Behrens, E. Yacoub, K. Ugurbil, W.-M. H. Consortium et al., "The WU-Minn Human Connectome Project: an overview," NeuroImage, vol. 80, pp. 62–79, 2013.
[35] T. D. Satterthwaite, M. A. Elliott, K. Ruparel, J. Loughead, K. Prabhakaran, M. E. Calkins, R. Hopson, C. Jackson, J. Keefe, M. Riley et al., "Neuroimaging of the Philadelphia Neurodevelopmental Cohort," NeuroImage, vol. 86, pp. 544–553, 2014.
[36] S. Achard and E. Bullmore, "Efficiency and cost of economical brain functional networks," PLoS Computational Biology, vol. 3, no. 2, p. e17, 2007.
[37] A. Podosinnikova, M. Hein, and R. Gemulla, "Robust principal component analysis as a nonlinear eigenproblem," Ph.D. dissertation, Saarland University, 2013.
[38] S. J. Reddi, S. Kale, and S. Kumar, "On the convergence of Adam and beyond," arXiv preprint, 2019.
[39] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," arXiv preprint arXiv:1412.6980, 2014.
[40] T. Dozat, "Incorporating Nesterov momentum into Adam," 2016.
[41] D. C. Van Essen, K. Ugurbil, E. Auerbach, D. Barch, T. Behrens, R. Bucholz, A. Chang, L. Chen, M. Corbetta, S. W. Curtiss et al., "The Human Connectome Project: a data acquisition perspective," NeuroImage, vol. 62, no. 4, pp. 2222–2231, 2012.
[42] M. F. Glasser, S. N. Sotiropoulos, J. A. Wilson, T. S. Coalson, B. Fischl, J. L. Andersson, J. Xu, S. Jbabdi, M. Webster, J. R. Polimeni et al., "The minimal preprocessing pipelines for the Human Connectome Project," NeuroImage, vol. 80, pp. 105–124, 2013.
[43] M. F. Glasser, T. S. Coalson, E. C. Robinson, C. D. Hacker, J. Harwell, E. Yacoub, K. Ugurbil, J. Andersson, C. F. Beckmann, M. Jenkinson et al., "A multi-modal parcellation of human cerebral cortex," Nature, vol. 536, no. 7615, pp. 171–178, 2016.
[44] J. Doshi, G. Erus, Y. Ou, S. M. Resnick, R. C. Gur, R. E. Gur, T. D. Satterthwaite, S. Furth, C. Davatzikos, A. N. Initiative et al., "MUSE: Multi-atlas region segmentation utilizing ensembles of registration algorithms and parameters, and locally optimal atlas selection," NeuroImage, vol. 127, pp. 186–195, 2016.
[45] R. Ciric, D. H. Wolf, J. D. Power, D. R. Roalf, G. L. Baum, K. Ruparel, R. T. Shinohara, M. A. Elliott, S. B. Eickhoff, C. Davatzikos et al., "Benchmarking of participant-level confound regression strategies for the control of motion artifact in studies of functional connectivity," NeuroImage, vol. 154, pp. 174–187, 2017.
[46] K. Wagstaff, C. Cardie, S. Rogers, S. Schrödl et al., "Constrained k-means clustering with background knowledge," in ICML, vol. 1, 2001, pp. 577–584.
[47] M. E. Raichle, "The brain's default mode network," Annual Review of Neuroscience, vol. 38, pp. 433–447, 2015.
[48] B. T. Yeo, F. M. Krienen, M. W. Chee, and R. L. Buckner, "Estimates of segregation and overlap of functional connectivity networks in the human cerebral cortex," NeuroImage, vol. 88, pp. 212–227, 2014.
[49] R. L. Buckner, F. M. Krienen, and B. T. Yeo, "Opportunities and limitations of intrinsic functional connectivity MRI," Nature Neuroscience, vol. 16, no. 7, pp. 832–837, 2013.
[50] F. I. Karahanoğlu and D. Van De Ville, "Transient brain activity disentangles fMRI resting-state dynamics in terms of spatially and temporally overlapping networks," Nature Communications, vol. 6, no. 1, pp. 1–10, 2015.
[51] W. C. Drevets and M. E. Raichle, "Reciprocal suppression of regional cerebral blood flow during emotional versus higher cognitive processes: Implications for interactions between emotion and cognition," Cognition and Emotion, vol. 12, no. 3, pp. 353–385, 1998.
[52] F. Vergani, L. Lacerda, J. Martino, J. Attems, C. Morris, P. Mitchell, M. T. de Schotten, and F. Dell'Acqua, "White matter connections of the supplementary motor area in humans," Journal of Neurology, Neurosurgery & Psychiatry, vol. 85, no. 12, pp. 1377–1385, 2014.
[53] F. Geranmayeh, R. J. Wise, A. Mehta, and R. Leech, "Overlapping networks engaged during spoken language production and its cognitive control," Journal of Neuroscience, vol. 34, no. 26, pp. 8728–8740, 2014.
[54] E. Beldzik, A. Domagalik, S. Daselaar, M. Fafrowicz, W. Froncisz, H. Oginska, and T. Marek, "Contributive sources analysis: a measure of neural networks' contribution to brain activations," NeuroImage, vol. 76, pp. 304–312, 2013.
[55] F. Katsuki and C. Constantinidis, "Unique and shared roles of the posterior parietal and dorsolateral prefrontal cortex in cognitive functions," Frontiers in Integrative Neuroscience, vol. 6, p. 17, 2012.
[56] N. A. Crossley, A. Mechelli, J. Scott, F. Carletti, P. T. Fox, P. McGuire, and E. T. Bullmore, "The hubs of the human connectome are generally implicated in the anatomy of brain disorders," Brain, vol. 137, no. 8, pp. 2382–2395, 2014.
[57] B. Cai, P. Zille, J. M. Stephen, T. W. Wilson, V. D. Calhoun, and Y. P. Wang, "Estimation of dynamic sparse connectivity patterns from resting state fMRI," IEEE Transactions on Medical Imaging, vol. 37, no. 5, pp. 1224–1234, 2017.
[58] M. Yu, Z. Wang, V. Gupta, and M. Kolar, "Recovery of simultaneous low rank and two-way sparse coefficient matrices, a nonconvex approach," arXiv preprint arXiv:1802.06967, 2018.