Functional Brain Networks Discovery Using Dictionary Learning with Correlated Sparsity

Mohsen Joneidi
Department of Electrical Engineering and Computer Science
University of Central Florida
mohsen.joneidi@ucf.edu

COURSE PROJECT REPORT

Summary: Analysis of data from functional magnetic resonance imaging (fMRI) results in constructing functional brain networks. Principal component analysis (PCA) and independent component analysis (ICA) are widely used to generate functional brain networks. Moreover, dictionary learning and sparse representation provide latent patterns that rule brain activities and can be interpreted as brain networks. However, these methods lack modeling of dependencies among the discovered networks. In this study an alternative to these conventional methods is presented, in which dependencies of the networks are considered via correlated sparsity patterns. We formulate this challenge as a new dictionary learning problem and propose two approaches to solve the problem effectively.

I. MOTIVATION

Identifying brain networks and their interactions requires the analysis of the recorded signals over time. The correlation of functional brain networks (FBNs) extracted from fMRI is a powerful tool for diagnostic purposes. fMRI is based on blood oxygenation level-dependent (BOLD) contrast, which is captured using local fluctuations in the flow of oxygenated blood. To perform different brain tasks, specific functional brain networks may be activated; they are engaged collaboratively to execute a specific task. These networks are related to low-level brain functions, called segregated specialized small brain regions [1]. These regions collaborate to perform tasks; however, only a few regions are activated to execute a certain task. Activation of a small number of brain functions for each task implies a kind of sparsity in terms of the fundamental functional bases.
The recent promising results of dictionary learning (DL) based fMRI signal analysis confirm that the assumed sparsity is consistent with the pattern of brain activities. DL aims to decompose the observed signals in terms of some fundamental bases and their corresponding sparse coefficients. This method outperforms traditional methods, including principal component analysis (PCA) and independent component analysis (ICA), for extraction of the fundamental bases of activity patterns [2-4]. The traditional methods assume orthogonality or independence of the fundamental bases, which is an unnatural constraint [5-7]. DL is a promising alternative to ICA for fMRI signal decomposition. Recent studies have shown that DL outperforms ICA in this application due to its more relaxed assumptions on the bases [8]; moreover, it assumes a more flexible model for the data. Additionally, the underlying model of DL is consistent with sparse activation of FBNs: only a few FBNs are active at each instant. In addition to the temporal sparsity of FBNs, the spatial activity patterns are also sparse [9]. However, there exists a correlation between activation patterns in the spatial domain: the non-zero entries are not spread element-wise, and there is a piece-wise sparsity which causes smoothness in FBNs. These characteristics motivate us to impose the dictionary learning model on data extracted from the brain. Atoms of the dictionary span a union of reliable subspaces which is fitted to the training data. Wavelet decomposition, compressive sensing, and DL are some products of the union-of-subspaces model [10, 11]. fMRI data consist of a collection of time series, one for each brain voxel. These time series can be represented in terms of a collection of principal time series; however, the number of principal time series is limited.
The fluctuation of each voxel generates a certain pattern in time, and this time series can be represented in terms of the brain's principal time series. Different brain regions collaborate to perform a certain function, while execution of a certain brain function involves only a few principal time series. Therefore, there exists an inherent sparsity in the representation of brain activity signals. The general model of DL assumes that activations of the bases are independent of each other, i.e., activation of a principal basis does not affect activation of any other basis. We call this type of sparsity element-wise sparsity. In addition to element-wise sparsity, some structural sparsity also exists in the pattern of brain activities. The lack of regularization constraints, such as orthogonality or incoherency, on the basic time series yields a highly sensitive set of atoms; a merely sparse constraint results in inconsistent activation maps. To address this problem, we exploit unsupervised group sparsity constraints for dictionary learning. Our goal is to engage the correlation of the principal time series in their corresponding sparse coefficients. This paper has two main contributions: first, proposing a new dictionary learning problem for single-subject resting-state fMRI signal decomposition, and second, presenting two computational algorithms for solving the proposed problem. The rest of the paper is organized as follows. Section II reviews the related work in the literature. Section III presents some basic methods for estimation in terms of correlated bases. Inspired by Section III, a new dictionary learning problem is introduced in Section IV. Experimental results are presented in Section V, and Section VI concludes the paper.

II. RELATED WORKS

Multivariate statistical algorithms consider brain voxels' activity as a collaborative network and analyze the voxels' data jointly.
These methods include PCA, factor analysis, and ICA [12]. ICA is a data-driven method that can decompose the observed multivariate data into maximally independent sources. It needs no prior information on the sources or on the characteristics of the mixing. ICA has been widely used to reveal brain networks for both task-based and resting-state fMRI signals [13]. Although ICA exhibits a relatively fit model for fMRI signals, it cannot impose additional prior information such as sparsity. There exist many efforts to consider sparsity for fMRI signal analysis [7, 14]. DL is a sparsity-based method that has received much attention recently. The basic dictionary model is given by

    Y = DX + N,   ||x_i||_0 <= T,    (1)

in which Y ∈ R^{N×L} is the matrix of observed signals, D ∈ R^{N×K} is the dictionary, and X ∈ R^{K×L} is the coefficient matrix; N is the dimension of the observed data, L is the number of observed data, and K is the number of bases in the dictionary. x_i is the i-th column of X, and ||·||_0 denotes the ℓ0 pseudo-norm, which counts the number of nonzero elements of a vector; T is the maximum number of nonzero entries in each column. Each column of Y contains an observed signal, and the corresponding column of X is its sparse representation. Basic DL can be cast as the optimization problem

    min_{X,D} (1/2) ||Y − DX||_F^2   s.t.   ||x_i||_0 <= T.    (2)

A well-known solution to this problem is the K-SVD algorithm, an alternating optimization method: it initializes D and then optimizes with respect to X and D alternately [10]. Optimization with respect to X is called sparse coding, and optimization with respect to D is called dictionary updating. A sparse representation of Y is learned in X; moreover, this representation can be learned to be discriminative for Y when the input data are labeled [15]. K-SVD has been used for resting-state and task-based fMRI signal analysis.
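As a quick illustration of model (1), the sketch below synthesizes data that satisfy it: a unit-norm dictionary, a coefficient matrix with at most T nonzeros per column, and additive noise. All sizes (N, K, L, T) and the noise level are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K, L, T = 32, 50, 200, 3  # signal dim, atoms, samples, sparsity (assumed values)

# Dictionary with unit-norm columns (the only constraint K-SVD imposes).
D = rng.standard_normal((N, K))
D /= np.linalg.norm(D, axis=0)

# Coefficient matrix X: each column has at most T nonzero entries.
X = np.zeros((K, L))
for i in range(L):
    support = rng.choice(K, size=T, replace=False)
    X[support, i] = rng.standard_normal(T)

Noise = 0.01 * rng.standard_normal((N, L))
Y = D @ X + Noise  # observed signals, as in model (1)

assert Y.shape == (N, L)
assert np.all(np.count_nonzero(X, axis=0) <= T)
```

A DL algorithm is then judged by how well it recovers D and X from Y alone.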
The only constraint on the dictionary is the normalization of its columns; K-SVD does not consider any additional constraint on the dictionary. However, in the presence of highly correlated bases the corresponding coefficients are not consistent. In other words, distances in the original data space are not projected consistently into the sparse representation space.

Fig. 1: A simple example illustrating the inconsistency of element-wise sparsity. Four atoms d_1, ..., d_4 on the unit ball represent data y_1, ..., y_4 with 1-sparse codes x_1 = [2,0,0,0]^T, x_2 = [0,2,0,0]^T, x_3 = [0,0,2,0]^T, x_4 = [0,0,0,2]^T.

Fig. 1 illustrates this effect. Suppose we are given four bases, d_1, ..., d_4, distributed on the unit ball. Some data, y_1, ..., y_4, are approximated by the underlying bases, and their corresponding sparse representations are denoted x_1, ..., x_4. Only one basis is utilized to represent each datum, i.e., the sparsity is 1. The distance between pairs of data varies considerably, while the distance between each pair of sparse vectors is always 2√2. A small change in the data space may therefore cause an abrupt change in the sparse domain. Correlation between atoms of a dictionary should be considered in order to estimate a more accurate sparse representation [16].

To alleviate the inconsistency of the sparse representation, imposing an incoherency constraint on the dictionary bases is one remedy, suggested in [17] for analyzing fMRI signals. This idea learns incoherent dictionary bases by adding a new regularizer term:

    min_{X,D} (1/2) ||Y − DX||_F^2 + λ ||X||_1 + γ ||D^T D − I||_F^2,    (3)

where λ and γ encourage sparsity and incoherency, respectively. The last term penalizes the off-diagonal elements of the correlation matrix D^T D.
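The inconsistency shown in Fig. 1 can be checked numerically. Below, four nearby unit-norm atoms (the specific angles are illustrative assumptions) give data whose pairwise distances vary smoothly, while every pair of their 1-sparse codes sits exactly 2√2 apart.

```python
import numpy as np

# Four nearby unit-norm atoms on the unit circle (angles chosen for illustration).
angles = np.deg2rad([0.0, 5.0, 10.0, 15.0])
D = np.stack([np.cos(angles), np.sin(angles)])   # 2 x 4 dictionary

# Each datum y_i = 2 d_i, so its 1-sparse code is x_i = 2 e_i.
Y = 2.0 * D                                      # data, 2 x 4
X = 2.0 * np.eye(4)                              # sparse codes, 4 x 4

# Pairwise distances in the data space vary smoothly with the angle...
data_d01 = np.linalg.norm(Y[:, 0] - Y[:, 1])     # neighbors: small distance
data_d03 = np.linalg.norm(Y[:, 0] - Y[:, 3])     # far apart: larger distance
# ...but every pair of sparse codes is exactly 2*sqrt(2) apart.
code_d = np.linalg.norm(X[:, 0] - X[:, 1])

assert data_d01 < data_d03
assert np.isclose(code_d, 2 * np.sqrt(2))
```

So two nearly identical signals can map to maximally distant codes, which is exactly the instability the paper attributes to element-wise sparsity over correlated atoms.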
However, in this paper we do not assume any additional constraint on the dictionary; instead, the sparse coding is modified to compensate for the destructive effect of highly correlated dictionary bases on the sparse coefficients. In the next section the undesired effect of correlated bases is discussed and some existing solutions are explained.

III. DEALING WITH CORRELATED BASES

In some applications, atoms of the dictionary may be highly correlated. In this setting, we expect the coefficients corresponding to correlated atoms to be correlated with each other. For example, if two atoms are highly correlated, then they should be either both zero or both non-zero. Thus, the coefficients tend to appear in groups, i.e., the coefficients corresponding to a subset of correlated atoms are either all zero or all non-zero. This is known as the grouping effect, which is of high importance especially in linear regression [18]. Traditional ℓ1 regularization, as used in the LASSO [19], leads to sparse coefficients but fails to maintain the grouping effect with highly correlated bases: among the coefficients corresponding to the same group of correlated atoms, only one may be non-zero, due to the crisp sparsity behaviour of the ℓ1 norm. To remedy this problem, group-sparse regularization terms have been proposed [20-22]. The group structure has been considered both in predefined model-based and in unknown data-driven manners. One interesting data-driven solution is the elastic net regularization of Zou and Hastie [22]. The elastic net comprises a sparsity term, the ℓ1 norm, plus a grouping and stabilizing term, the ℓ2 norm; that is, the elastic net merges the benefits of the LASSO and ridge regression [18]. The resulting sparse coding (regression) problem is

    min_x (1/2) ||y − Dx||_2^2 + λ_1 ||x||_1 + λ_2 ||x||_2^2.
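The grouping effect can be seen directly from the objective above. With two perfectly correlated atoms, a one-sided solution and a weight-sharing solution have identical fit and identical ℓ1 cost, so the LASSO may pick either; the ℓ2 term strictly prefers the grouped one. The vectors and penalty weights below are illustrative assumptions.

```python
import numpy as np

d = np.array([1.0, 0.0])
D = np.column_stack([d, d])         # two perfectly correlated atoms
y = 2.0 * d

def objective(x, lam1, lam2):
    """Elastic-net sparse coding objective from Section III."""
    return (0.5 * np.sum((y - D @ x) ** 2)
            + lam1 * np.sum(np.abs(x))
            + lam2 * np.sum(x ** 2))

one_sided = np.array([2.0, 0.0])    # a valid LASSO solution: one atom takes all
grouped   = np.array([1.0, 1.0])    # weight shared across the correlated group

lam1 = 0.1
# Pure l1 (lam2 = 0): both solutions are equally good, so LASSO may pick either.
assert np.isclose(objective(one_sided, lam1, 0.0), objective(grouped, lam1, 0.0))
# With the l2 term, the grouped solution strictly wins: the grouping effect.
assert objective(grouped, lam1, 0.05) < objective(one_sided, lam1, 0.05)
```

This is why the ℓ2 term stabilizes the codes of correlated atoms rather than letting one coefficient arbitrarily absorb the whole group.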
The grouping effect is essential in functional brain analysis, where there may exist strong correlations among various activity areas of the brain.

IV. DICTIONARY LEARNING WITH CORRELATED SPARSITY CONSTRAINTS

Plain DL algorithms like K-SVD are not able to learn incoherent bases; moreover, their coefficients are not consistent in the presence of correlated bases. This is especially crucial in brain network analysis because the functional brain networks are not independent [5-7]. Herein, we propose an elastic-net-based [22] dictionary learning formulation to solve this problem:

    argmin_{X,D} (1/2) ||Y − DX||_F^2 + λ EN(X),    (4)

where EN(X) ≜ ||X||_1 + (γ/2) ||X||_F^2 is the matrix form of the elastic-net regularization. We use the proximal-splitting algorithm [23] to solve Problem (4). The proximal-splitting method targets the optimization problem

    argmin_x { f(x) = g(x) + h(x) },

where g(·) and h(·) are convex functions, with g(·) additionally differentiable. The idea is then to perform the following iterations to update x:

    x^{k+1} = Prox_h( x^k − μ ∇g(x^k) ),    (5)

where Prox_h(·) is the so-called proximal operator of h(·), defined as

    Prox_h(x) ≜ argmin_u { (1/2) ||x − u||_2^2 + h(u) }.

The step size is μ ∈ (0, 1/L], in which L is the Lipschitz constant of ∇g(·). In Problem (4) we have

    g(X) = (1/2) ||Y − DX||_F^2,   h(X) = λ EN(X).

It can be shown that L = ||D^T D||, where ||·|| denotes the spectral norm. The proximal operator of the elastic net term also has a simple closed-form formula:

    Prox_EN(X) = (1/(1 + λγ)) Soft(X, λ).    (6)

Algorithm 1: OMP sparse coding
1: Require: y, D, ε.
2: Initialization: r_0 = y, S = {}, x = 0, i = 0.
3: while Error > ε do
4:   S = S ∪ argmax_j |⟨r_i, d_j⟩|
5:   x_S = argmin_x ||y − D(:, S) x||_2^2
6:   i ← i + 1
7:   r_i = y − Dx
8:   Error = ||r_i||_2
9: end while
10: Output: x

Here Soft(·, ·) is the well-known element-wise soft-thresholding function, defined as

    Soft(X, λ) ≜ sign(X) ⊙ max(|X| − λ, 0),

in which ⊙ indicates element-wise multiplication. Our proposed scheme contains the same two main subroutines as conventional DL algorithms: sparse coding and dictionary updating. The steps of these subroutines are given in Alg. 1 and Alg. 2, and the overall iterative sparse coding algorithm is summarized in Alg. 3. In this algorithm, we use the relative change between consecutive solutions of the iterative algorithm as a stopping criterion. This sparse coding is then used as the sparse approximation stage of our proposed dictionary learning algorithm. For the dictionary update stage, any algorithm, such as the K-SVD atom-by-atom dictionary update, can be used. Moreover, to initialize the elastic-net sparse coding we leverage the iterative nature of the dictionary learning problem and use the coefficient matrix X of the previous DL iteration as a warm start. This greatly reduces the computational burden of the sparse coding stage.

To demonstrate the efficiency of the proposed grouped-variables approach in the DL problem, we have also simply modified the K-SVD algorithm. To this aim, in the sparse coding stage the correlations of the dictionary bases are taken into account to construct a set of group-wise coefficients. G is defined as the hard threshold (HT) of the bases' correlations; the HT function is illustrated in Fig. 2. Note that if the bases have low correlation, G will be close to the identity matrix and the algorithm works like the basic K-SVD. However, in the case of highly correlated bases, multiplication of G by the coefficients results in group sparsity of the highly correlated coefficients.
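A minimal NumPy sketch of the elastic-net sparse coding inner loop (the gradient step of Eq. (5) followed by the scaled soft-thresholding of Eq. (6), as in Alg. 3). One small deviation from the listing: the threshold is scaled by the step size μ, the standard proximal-gradient convention. The regularization weights, problem sizes, and test signal are illustrative assumptions.

```python
import numpy as np

def elastic_net_sparse_code(Y, D, lam=0.05, gamma=0.1, tol=1e-6, max_iter=500):
    """Proximal-gradient sparse coding for
    min_X 0.5*||Y - D X||_F^2 + lam*(||X||_1 + gamma/2*||X||_F^2)."""
    mu = 1.0 / np.linalg.norm(D.T @ D, 2)        # step size 1/L, L = ||D^T D||
    X = np.zeros((D.shape[1], Y.shape[1]))
    for _ in range(max_iter):
        X_old = X
        G = X - mu * (D.T @ (D @ X - Y))          # gradient step on g
        # proximal mapping of mu*lam*EN: scaled soft-thresholding, cf. Eq. (6)
        X = np.sign(G) * np.maximum(np.abs(G) - mu * lam, 0.0) / (1.0 + mu * lam * gamma)
        if np.linalg.norm(X - X_old) <= tol * max(np.linalg.norm(X_old), 1.0):
            break                                  # relative-change stopping rule
    return X

# Usage: recover the support of a 2-sparse code from a noisy observation.
rng = np.random.default_rng(1)
D = rng.standard_normal((20, 30))
D /= np.linalg.norm(D, axis=0)
x_true = np.zeros(30); x_true[[3, 7]] = [1.5, -2.0]
y = (D @ x_true + 0.01 * rng.standard_normal(20)).reshape(-1, 1)
X_hat = elastic_net_sparse_code(y, D, lam=0.02, gamma=0.1)
```

With a small λ the two largest-magnitude coefficients land on the true support, and most other entries are thresholded exactly to zero.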
At the end, the new coefficients should be normalized to minimize the reconstruction error. The scale can be calculated easily:

    Σ_ii = argmin_σ ||y_i − σ D x*_i||_2^2 = (y_i^T D x*_i) / (x*_i^T D^T D x*_i),    (7)

in which Σ is a diagonal normalization matrix. Algorithm 4 shows the steps of the modified algorithm; only the coding stage is modified, and the dictionary update stage remains as in the original K-SVD algorithm.

Algorithm 2: Updating the dictionary's atoms [10]
1: Require: Y, X.
2: for k = 1, ..., K
3:   Collect all the data that use d_k in the set S_k
4:   Set d_k = 0 and compute E_k = Y − DX
5:   Reduce E_k to its S_k columns → E_k^R
6:   Update d_k by the first left singular vector of E_k^R
7: end
8: Output: D

Algorithm 3: Elastic-Net Regularized DL
1: Require: Y, λ, γ, ε
2: Initialization: D = D_0 ∈ R^{N×K}, ξ = ∞, μ = 1/||D^T D||
3: while stopping criterion is not met do
4:   Sparse approximation (Alg. 1)
5:   while ξ > ε do
6:     X_o = X
7:     X = X − μ ∇g(X)             ▷ Gradient step
8:     X = (1/(1+λγ)) Soft(X, λ)   ▷ Proximal mapping
9:     ξ = ||X − X_o||_F / ||X_o||_F
10:  end while
11:  Dictionary update (Alg. 2)
12: end while
13: Output: X and D

Algorithm 4: Modified Group-wise K-SVD
1: Require: Y, λ, ε.
2: Initialization: D = D_0 ∈ R^{N×K}
3: while stopping criterion is not met do
4:   G = HT(D^T D, λ)
5:   Sparse approximation (Alg. 1)
6:   X ← G X
7:   X* ← X Σ using Eq. (7)
8:   Dictionary update (Alg. 2)
9: end while
10: Output: X*

Fig. 2: Soft threshold versus hard threshold.

V. EXPERIMENTAL RESULTS

The proposed algorithms are evaluated in two simulation scenarios: synthetic and real fMRI data.

A. Synthesized fMRI Data

This experiment examines the performance of the proposed methods in separating the sources of artificially generated fMRI data. Some 3D images are considered as functional brain networks; they are modulated by some principal time series, and additive Gaussian noise is added to generate the final synthetic data.
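The closed-form rescaling of Eq. (7) is just a one-dimensional least-squares fit along the direction D x*_i, which a quick check confirms against a brute-force search over σ. The dictionary, code, and true scale below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
D = rng.standard_normal((16, 24))
D /= np.linalg.norm(D, axis=0)
x = np.zeros(24); x[[2, 9]] = [0.8, -1.1]
y = 1.7 * (D @ x)                     # a code that is correct only up to scale

# Closed-form minimizer of ||y - sigma*D x||_2^2 over sigma, as in Eq. (7)
Dx = D @ x
sigma = (y @ Dx) / (Dx @ Dx)

# The closed form matches (or beats) every scale on a brute-force grid.
best = min(np.sum((y - s * Dx) ** 2) for s in np.linspace(0.0, 3.0, 301))
assert np.isclose(sigma, 1.7)
assert np.sum((y - sigma * Dx) ** 2) <= best + 1e-9
```

In Algorithm 4 this scalar is computed per column, which is why Σ is diagonal: each coefficient vector is rescaled independently after the group-wise thresholding by G.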
Figure 3 shows some underlying brain networks and their activation time series, as well as some generated 3D fMRI images. To evaluate the dictionary obtained by the different methods, the following dictionary distance is used [10]:

    d_d(D_0, D̂) = Σ_{k=1}^{K} min_j (1 − d̂_k^T d_{0j}).    (8)

Fig. 3: Illustration of synthetic fMRI data generation. (a) Two synthetic principal bases of FBNs. (b) Synthetic fMRI data in two time courses, obtained by modulating 10 FBNs with their corresponding time series plus additive Gaussian noise.

Fig. 4: Performance of the different DL algorithms over iterations.

Fig. 5: Performance of the proposed algorithm while it observes a portion of the whole data.

B. Real fMRI

A single-subject analysis is performed to compare the traditional dictionary learning with the modified one. Resting-state fMRI data are downloaded from a free-access online dataset (http://www.myconnectome.org). The SPM 12 Matlab toolbox is used to perform the needed pre-processing, such as normalization and registration.

The spatial resolution of the fMRI data is 160x160x36 voxels, and we have access to 50 time courses. Corresponding to each point there is a time series. All of these time series are collected as the columns of the matrix Y, which is then decomposed into bases and coefficients. The coefficient vector of each point is exploited to perform segmentation via clustering; a simple clustering-based segmentation is performed by K-means with an ℓ1 criterion.

As can be seen in Fig. 7, the modified coefficients are able to segment the volume of the brain into coherent regions, consistent with functional brain networks. Fig. 6 shows the segmentation of the 27th and 28th slices of the brain in more detail: the upper image shows segmentation using pure sparsity features, the middle one shows the results obtained with the proposed Modified K-SVD algorithm, and the bottom image results from the elastic-net dictionary learning.

Fig. 6: The segmentation results in two slices of the brain using a pure sparsity constraint (upper image) versus the proposed sparsity, solved with the two proposed algorithms: Modified K-SVD (middle) and EN-KSVD (bottom).

Fig. 7: Brain segmentation on coefficients extracted by K-SVD and by the proposed dictionary learning. (a) Resting-state fMRI data in which 50 time slots are observed. (b) Segmentation of region activities using K-SVD coefficients. (c) Segmentation of region activities using the proposed DL coefficients.

VI. CONCLUSION

Principal activity patterns of the brain were detected. The sparsity of the activation maps was utilized in a framework based on dictionary learning. The correlated sparsity pattern of the underlying data showed an advantage over a pure sparsity pattern, owing to taking the dependencies of functional brain networks into account.

REFERENCES

[1] V. Perlbarg and G. Marrelec, "Contribution of exploratory methods to the investigation of extended large-scale brain networks in functional MRI: methodologies, results, and challenges," Journal of Biomedical Imaging, vol. 2008, p. 4, 2008.
[2] S. Zhao, J. Han, J. Lv, X. Jiang, X. Hu, Y. Zhao, B. Ge, L. Guo, and T. Liu, "Supervised dictionary learning for inferring concurrent brain networks," 2015.
[3] J. Lv, X. Jiang, X. Li, D. Zhu, H. Chen, T. Zhang, S. Zhang, X. Hu, J. Han, H. Huang et al., "Sparse representation of whole-brain fMRI signals for identification of functional networks," Medical Image Analysis, vol. 20, no. 1, pp. 112-134, 2015.
[4] M. Ramezani, K. Marble, H. Trang, I. Johnsrude, and P. Abolmaesumi, "Joint sparse representation of brain activity patterns in multi-task fMRI data," IEEE Transactions on Medical Imaging, vol. 34, no. 1, pp. 2-12, 2015.
[5] M. J. McKeown, T. J. Sejnowski et al., "Independent component analysis of fMRI data: examining the assumptions," Human Brain Mapping, vol. 6, no. 5-6, pp. 368-372, 1998.
[6] I. Daubechies, E. Roussos, S. Takerkart, M. Benharrosh, C. Golden, K. D'ardenne, W. Richter, J. Cohen, and J. Haxby, "Independent component analysis for brain fMRI does not select for independence," Proceedings of the National Academy of Sciences, vol. 106, no. 26, pp. 10415-10422, 2009.
[7] K. Lee, S. Tak, and J. C. Ye, "A data-driven sparse GLM for fMRI analysis using sparse dictionary learning with MDL criterion," IEEE Transactions on Medical Imaging, vol. 30, no. 5, pp. 1076-1089, 2011.
[8] H. Eavani, R. Filipovych, C. Davatzikos, T. D. Satterthwaite, R. E. Gur, and R. C. Gur, "Sparse dictionary learning of resting state fMRI networks," in Pattern Recognition in NeuroImaging (PRNI), 2012 International Workshop on. IEEE, 2012, pp. 73-76.
[9] D. Papo, M. Zanin, and J. Martin Buldú, "Reconstructing functional brain networks: have we got the basics right?" Frontiers in Human Neuroscience, vol. 8, p. 107, 2014.
[10] M. Aharon, M. Elad, and A. Bruckstein, "K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation," IEEE Transactions on Signal Processing, vol. 54, no. 11, pp. 4311-4322, 2006.
[11] Y. C. Eldar and M. Mishali, "Robust recovery of signals from a structured union of subspaces," IEEE Transactions on Information Theory, vol. 55, no. 11, pp. 5302-5316, 2009.
[12] L. Ma, B. Wang, X. Chen, and J. Xiong, "Detecting functional connectivity in the resting brain: a comparison between ICA and CCA," Magnetic Resonance Imaging, vol. 25, no. 1, pp. 47-56, 2007.
[13] M. H. Lee, C. D. Smyser, and J. S. Shimony, "Resting-state fMRI: a review of methods and clinical applications," American Journal of Neuroradiology, vol. 34, no. 10, pp. 1866-1872, 2013.
[14] O. Yamashita, M.-a. Sato, T. Yoshioka, F. Tong, and Y. Kamitani, "Sparse estimation automatically selects voxels relevant for the decoding of fMRI activity patterns," NeuroImage, vol. 42, no. 4, pp. 1414-1429, 2008.
[15] J. Golmohammady, M. Joneidi, M. Sadeghi, M. Babaie-Zadeh, and C. Jutten, "K-LDA: An algorithm for learning jointly overcomplete and discriminative dictionaries," in 2014 22nd European Signal Processing Conference (EUSIPCO). IEEE, 2014, pp. 775-779.
[16] M. Joneidi, A. Zaeemzadeh, N. Rahnavard, and M. B. Khalilsarai, "Matrix coherency graph: A tool for improving sparse coding performance," in 2015 International Conference on Sampling Theory and Applications (SampTA). IEEE, 2015, pp. 168-172.
[17] V. Abolghasemi, S. Ferdowsi, and S. Sanei, "Fast and incoherent dictionary learning algorithms with application to fMRI," Signal, Image and Video Processing, vol. 9, no. 1, pp. 147-158, 2015.
[18] J. O. Ogutu, T. Schulz-Streeck, and H.-P. Piepho, "Genomic selection using regularized linear regression models: ridge regression, lasso, elastic net and their extensions," in BMC Proceedings, vol. 6, no. Suppl 2. BioMed Central Ltd, 2012, p. S10.
[19] R. Tibshirani, "Regression shrinkage and selection via the lasso," Journal of the Royal Statistical Society. Series B (Methodological), pp. 267-288, 1996.
[20] L. Meier, S. Van De Geer, and P. Bühlmann, "The group lasso for logistic regression," Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 70, no. 1, pp. 53-71, 2008.
[21] B. Efron, T. Hastie, I. Johnstone, R. Tibshirani et al., "Least angle regression," The Annals of Statistics, vol. 32, no. 2, pp. 407-499, 2004.
[22] H. Zou and T. Hastie, "Regularization and variable selection via the elastic net," Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 67, no. 2, pp. 301-320, 2005.
[23] N. Parikh and S. Boyd, "Proximal algorithms," Foundations and Trends in Optimization, vol. 1, no. 3, pp. 123-231, 2014.
