Local Discriminant Hyperalignment for multi-subject fMRI data alignment

Muhammad Yousefnezhad, Daoqiang Zhang
College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, China.
{myousefnezhad, dqzhang}@nuaa.edu.cn

Abstract

Multivariate Pattern (MVP) classification can map different cognitive states to the brain tasks. One of the main challenges in MVP analysis is validating the generated results across subjects. However, analyzing multi-subject fMRI data requires accurate functional alignments between neuronal activities of different subjects, which can considerably increase the performance and robustness of the final results. Hyperalignment (HA) is one of the most effective functional alignment methods, which can be mathematically formulated by Canonical Correlation Analysis (CCA) methods. Since HA mostly uses unsupervised CCA techniques, its solution may not be optimized for MVP analysis. By incorporating the idea of Local Discriminant Analysis (LDA) into CCA, this paper proposes Local Discriminant Hyperalignment (LDHA) as a novel supervised HA method, which can provide better functional alignment for MVP analysis. Indeed, the locality is defined based on the stimuli categories in the train-set, where the correlation between all stimuli in the same category is maximized and the correlation between distinct categories of stimuli approaches zero. Experimental studies on multi-subject MVP analysis confirm that the LDHA method achieves superior performance to other state-of-the-art HA algorithms.

Introduction

As an imaging technology, functional Magnetic Resonance Imaging (fMRI) measures neural activity by employing the Blood-Oxygen-Level-Dependent (BOLD) contrast as a proxy for neural activation. The main idea is utilizing these measurements of neural activities to shed light on cognitive processes.
Indeed, fMRI enables us to ask what information is represented in a region of the human brain and how that information is encoded, instead of asking what a region's function is (Haxby, Connolly, and Guntupalli 2014). Multivariate Pattern (MVP) classification is one of the main techniques in fMRI analysis, which can extract and decode brain patterns by applying classification methods (Haxby et al. 2011; Chen et al. 2015; Oswal et al. 2016). In fact, it can predict patterns of neural activities associated with different cognitive states (Mohr et al. 2015; Chen et al. 2015; Figueiredo and Nowak 2016) and can also define decision surfaces to distinguish different stimuli for decoding the brain and understanding how it works (Haxby et al. 2011; Haxby, Connolly, and Guntupalli 2014). MVP analysis can be used to find novel treatments for mental diseases or even to create a new generation of user interfaces.

One of the main challenges in fMRI studies, especially MVP analysis, is using multi-subject datasets. On the one hand, the multi-subject analysis is critical to figure out the generality and validity of the generated results across subjects. On the other hand, analyzing multi-subject fMRI data requires accurate functional and anatomical alignments between neuronal activities of different subjects in order to increase the performance of the final results (Haxby et al. 2011; Chen et al. 2014). Indeed, the fMRI datasets must be aligned across subjects in multi-subject studies in order to take between-subject variability into account. There are two main alignment approaches, i.e. anatomical alignment and functional alignment, which can work in unison. Anatomical alignment is the most common method for aligning fMRI images based on anatomical features by employing structural MRI images, e.g. Talairach alignment (Talairach and Tournoux 1988).
However, this method yields limited accuracy since the size, shape and anatomical location of functional loci differ across subjects (Watson et al. 1993; Rademacher et al. 1993). Indeed, anatomical alignment is just used in many fMRI studies as a preprocessing step. By contrast, functional alignment seeks to directly align the brain neural responses across subjects.

Hyperalignment (HA) (Haxby et al. 2011) is one of the most famous methods for functional alignment. HA can be mathematically formulated by Canonical Correlation Analysis (CCA). As a result, HA for multi-subject fMRI studies can be defined as a multiple-set CCA (Xu et al. 2012; Lorbert and Ramadge 2012; Chen et al. 2014). Since unsupervised CCA techniques are employed for solving HA problems, the solution may not be optimized for MVP analysis. In other words, CCA just finds a set of mappings to maximize the correlation between the same time points of functional activities (at the voxel level) for all subjects, while it must maximize the correlation between homogeneous stimuli (from the same category) and also remove the correlation between different categories of stimuli. Indeed, this is a common problem in Machine Learning. For instance, Linear Discriminant Analysis (LDA) is mostly used rather than Principal Component Analysis (PCA) in classification analysis, where LDA uses supervision information such as class labels or similarity between samples for improving the performance of classification methods.

As the main contribution of this paper, we introduce the Local Discriminant Hyperalignment (LDHA) method, which incorporates the idea of Local Discriminant Analysis (LDA) into CCA (Peng, Zhang, and Zhang 2010) in order to improve the performance of the hyperalignment solution.
In brief, the idea of locality is defined based on the stimuli categories (class labels) in the train-set, where the proposed method first generates two sets for each category of stimuli, i.e. the set of nearest homogeneous stimuli as within-class neighborhoods and the set of stimuli from distinct categories as between-class neighborhoods. Then, these two sets are used to provide a better HA solution, where the correlation between the within-class neighborhoods is maximized, and the correlation among between-class neighborhoods approaches zero.

The rest of this paper is organized as follows: Section 2 briefly reviews some related works. Section 3 introduces the proposed method. Experimental results are reported in Section 4; and finally, Section 5 presents the conclusion and points out some future work.

Related Works

There are several studies which used functional and anatomical features for alignment. Sabuncu et al. (2010) employed cortical warping for maximizing the inter-subject correlation between functional responses across subjects. Conroy et al. (2009) also developed a method to maximize the alignment of intra-subject patterns of cortical functional connectivity by using a cortical warping.

Hyperalignment (HA) was proposed by Haxby et al. (2011) as an 'anatomy free' alignment method based on functional features. HA utilizes the Procrustean transformation (Schönemann 1966) to map the functional responses of each subject into a common high-dimensional model (template). The performance of MVP analysis using Hyperalignment increases considerably in comparison with methods that just use anatomical alignment (Haxby et al. 2011; Haxby, Connolly, and Guntupalli 2014).

Lorbert et al. (2012) developed Kernel Hyperalignment (KHA) to conduct nonlinear hyperalignment in an embedding space. Xu et al.
(2012) introduced the regularized Hyperalignment, which makes connections to Canonical Correlation Analysis (CCA) (Gower and Dijksterhuis 2004). Dmochowski et al. (2012) applied correlated component analysis to maximize inter-subject correlation by aggregating the subjects' data into an individual matrix. Sui et al. (2011; 2013) proposed a 'multimodal CCA + joint independent component analysis (ICA)' on multimodal data to identify the unique and shared variance associated with each imaging modality. Chen et al. (2014) examined a two-phase joint SVD-Hyperalignment algorithm, where a joint Singular Value Decomposition (SVD) is employed to provide dimensionality reduction, then HA aligns the subjects' responses in the lower dimensional feature space.

Michael et al. (2015) developed the GICA and IVA algorithms, which do not assume time-synchronized stimuli. Hence, they concatenate data along the time dimension (implying spatial consistency) and learn spatial independent components. Recently, Guntupalli et al. (2016) proposed a linear model of shared representational spaces in human cortex. This model can capture fine-scale distinctions among population responses with response-tuning basis functions that are common across brains, and it models cortical patterns of neural responses with individual-specific topographic basis functions.

The Proposed Method

This paper proposes a supervised version of the hyperalignment method for MVP classification. The procedure is simple: as in all classification problems, there are two sets of data points, i.e. the train-set and the test-set. The train-set is used for generating the classification model. Then, the generated model is evaluated on the test-set.
Since functional activities in different brains are originally unaligned in comparison with each other, the previous HA methods just used the data points in the train-set to generate a template for functional aligning in both the train and test sets before the MVP analysis. By contrast, our proposed method uses data points as well as class labels in the train-set for generating the HA template (which will be denoted by $G$).

In the train-set, the preprocessed fMRI time series collected for $S$ subjects can be defined by $X^{(i)} = \{x_{mn}^{(i)}\} \in \mathbb{R}^{T \times V}$, $i = 1:S$, $m = 1:T$, $n = 1:V$, where $T$ denotes the number of time points in units of TRs (Time of Repetition), $V$ is the number of voxels, and $x_{mn}^{(i)} \in \mathbb{R}$ denotes the functional activity for the $i$-th subject at the $m$-th time point and the $n$-th voxel. In the test-set, we have similar notations where the data points are defined by $\bar{X}^{(i)} = \{\bar{x}_{mn}^{(i)}\} \in \mathbb{R}^{T \times V}$, $i = 1:\bar{S}$, $m = 1:T$, $n = 1:V$. Here, $\bar{S}$ is the number of subjects in the test-set. In addition, the class labels in the train-set are denoted by $Y = \{y_m\} \in \mathbb{N}^{T}$, $m = 1:T$. Since there are more voxels than TRs in most fMRI studies, $X^{(i)}$ and the voxel correlation map $(X^{(i)})^\top X^{(j)}$ may not be full rank (Chen et al. 2014; Conroy et al. 2009; Lorbert and Ramadge 2012; Xu et al. 2012). In addition, time-synchronized stimuli ensure temporal alignment, i.e. the $m$-th time point for all of the subjects represents the same stimulus (Xu et al. 2012; Lorbert and Ramadge 2012). Indeed, the main goal of HA methods is aligning the columns of $X^{(i)}$ across subjects (Conroy et al. 2009; Xu et al. 2012), where the column representation of the functional activities for the $i$-th subject and the $n$-th voxel can also be defined as follows:

$$x_{\cdot n}^{(i)} \in \mathbb{R}^{T} = \{\, x_{mn}^{(i)} \mid x_{mn}^{(i)} \in X^{(i)} \text{ and } m = 1:T \,\} \quad (1)$$

We first need a metric to quantify the functional alignment.
Inter-Subject Correlation (ISC) is a classical metric for functional alignment, which can be defined for two different subjects as follows (Haxby et al. 2011; Xu et al. 2012; Chen et al. 2014):

$$\mathrm{ISC}(X^{(i)}, X^{(j)}) = \frac{1}{V}\,\mathrm{tr}\big((X^{(i)})^\top X^{(j)}\big) = \frac{1}{V}\sum_{n=1}^{V} \big(x_{\cdot n}^{(i)}\big)^\top x_{\cdot n}^{(j)} = \frac{1}{V}\sum_{m=1}^{T}\sum_{n=1}^{V} x_{mn}^{(i)} x_{mn}^{(j)} \quad (2)$$

where $\mathrm{tr}()$ is the trace function. If the functional activities are column-wise standardized ($X^{(i)} \sim \mathcal{N}(0, 1)$), the ISC lies in $[-1, +1]$, where large values represent better alignment (Conroy et al. 2009; Lorbert and Ramadge 2012; Xu et al. 2012; Chen et al. 2014). Based on (2), the hyperalignment can be formulated as follows:

$$\rho = \arg\max_{i,j=1:S} \sum_{i < j} \mathrm{ISC}\big(X^{(i)} R^{(i)}, X^{(j)} R^{(j)}\big) \quad (3)$$

where $R^{(\ell)} \in \mathbb{R}^{V \times V}$, $\ell = 1:S$, denotes the HA mapping for the $\ell$-th subject. With constraints imposed on the mappings, this problem can be written as:

$$\rho = \arg\min_{i,j=1:S} \sum_{i < j} \big\| X^{(i)} R^{(i)} - X^{(j)} R^{(j)} \big\|_F^2 \quad \text{subject to } (R^{(\ell)})^\top A^{(\ell)} R^{(\ell)} = I,\ \ell = 1:S \quad (4)$$

where $I$ denotes the identity matrix and the matrices $A^{(\ell)} \in \mathbb{R}^{V \times V}$, $\ell = 1:S$, are symmetric and positive definite. Generally, if $A^{(\ell)} = I$, then we have hyperalignment or a multi-set orthogonal Procrustes problem, which is commonly used in shape analysis. In addition, if $A^{(\ell)} = (X^{(\ell)})^\top X^{(\ell)}$, then (4) denotes a form of multi-set Canonical Correlation Analysis (CCA) (Lorbert and Ramadge 2012; Xu et al. 2012; Chen et al. 2014).

Lemma 1. The equation (4) is equivalent to:

$$\rho = \arg\min \sum_{i=1}^{S} \big\| X^{(i)} R^{(i)} - G \big\|_F^2 \quad \text{subject to } (R^{(\ell)})^\top A^{(\ell)} R^{(\ell)} = I,\ \ell = 1:S \quad (5)$$

where $G \in \mathbb{R}^{T \times V}$ is the HA template:

$$G = \frac{1}{S}\sum_{j=1}^{S} X^{(j)} R^{(j)} \quad (6)$$

Proof. Please refer to (Gower and Dijksterhuis 2004; Lorbert and Ramadge 2012) for the proof.

Indeed, the HA template ($G$) can be used for functional alignment in the test-set before MVP analysis. Most previous studies have used CCA for finding this template (Xu et al. 2012; Haxby et al. 2011; Chen et al. 2014).

Lemma 2.
Canonical Correlation Analysis (CCA) finds an optimum solution for solving (4) by exploiting the objective function $\max_{i,j=1:S}\big((R^{(i)})^\top C^{(i,j)} R^{(j)}\big)$, and then $G$ can also be calculated based on (6). Briefly, the CCA solution can be formulated as follows:

$$\rho = \arg\max_{i,j=1:S} \left( \frac{(R^{(i)})^\top C^{(i,j)} R^{(j)}}{\sqrt{\big((R^{(i)})^\top C^{(i)} R^{(i)}\big)\big((R^{(j)})^\top C^{(j)} R^{(j)}\big)}} \right) \quad (7)$$

where $C^{(i)} \in \mathbb{R}^{V \times V} = \mathbb{E}\big[(X^{(i)})^\top X^{(i)}\big] = (X^{(i)})^\top X^{(i)}$, $C^{(j)} \in \mathbb{R}^{V \times V} = \mathbb{E}\big[(X^{(j)})^\top X^{(j)}\big] = (X^{(j)})^\top X^{(j)}$, and $C^{(i,j)} \in \mathbb{R}^{V \times V} = \mathbb{E}\big[(X^{(i)})^\top X^{(j)}\big] = (X^{(i)})^\top X^{(j)}$. The solution of CCA can be obtained by computing a generalized eigenvalue decomposition problem (Hardoon, Szedmak, and Shawe Taylor 2004; Peng, Zhang, and Zhang 2010).

Proof. Equation (4) can be written as follows:

$$\big\| X^{(i)} R^{(i)} - X^{(j)} R^{(j)} \big\|_F^2 = -2 (R^{(i)})^\top (X^{(i)})^\top X^{(j)} R^{(j)} + (R^{(i)})^\top (X^{(i)})^\top X^{(i)} R^{(i)} + (R^{(j)})^\top (X^{(j)})^\top X^{(j)} R^{(j)} \equiv \frac{(R^{(i)})^\top (X^{(i)})^\top X^{(j)} R^{(j)}}{\sqrt{\big((R^{(i)})^\top (X^{(i)})^\top X^{(i)} R^{(i)}\big)\big((R^{(j)})^\top (X^{(j)})^\top X^{(j)} R^{(j)}\big)}} \quad (8)$$

Remark 1. The HA solution generated by unsupervised CCA may not be optimum for MVP analysis. We explain two issues in the unsupervised solutions. Consider fMRI time series that include visual stimuli, where two subjects watch two photos of cats as well as two photos of human faces. In this example, the sequence of stimuli is [cat1, face1, cat2, face2] for each subject after the preprocessing steps. The unsupervised solution finds two mappings to maximize the correlation at the voxel level, where the voxels for each subject are only compared with the voxels for other subjects at the same locations.
As a result, the sequence of the mentioned comparisons at the stimulus level is shown by:

(S1: cat1 ↑ S2: cat1); (S1: face1 ↑ S2: face1); (S1: cat2 ↑ S2: cat2); (S1: face2 ↑ S2: face2)

where ↑ denotes the operator for maximizing correlation and S1 and S2 are the indices of subjects 1 and 2, respectively. Now, we can explain the two issues. Indeed, the CCA solution here just maximizes the correlation for the stimuli in the same locations, while it must also maximize the correlation between all stimuli in the same category and minimize the correlation between different categories of stimuli. Our approach for solving the mentioned issues can be illustrated by:

(S1: cat1,2 ↑ S2: cat1,2); (S1: face1,2 ↑ S2: face1,2); (S1: cat1,2 ↓ S2: face1,2); (S1: face1,2 ↓ S2: cat1,2)

where ↓ denotes the operator for minimizing correlation.

This paper proposes Local Discriminant Hyperalignment (LDHA), which incorporates the idea of locality into CCA (Peng, Zhang, and Zhang 2010) in order to provide a better HA solution for MVP analysis. Since unaligned functional activities (before applying the HA method) in different subjects cannot be directly compared with each other, the neighborhoods matrix $\alpha = \{\alpha_{mn}\} \in \mathbb{R}^{T \times T}$ is defined by using the class labels ($Y$) in the train-set as follows:

$$\alpha_{nm} = \alpha_{mn} = \begin{cases} 0 & y_m \neq y_n \\ 1 & y_m = y_n \end{cases}, \quad m, n = 1:T,\ m < n \quad (9)$$

where the number of within-class neighborhoods is the permutation of all stimuli in each category, and the number of between-class neighborhoods is the permutation of all stimuli in distinct categories.
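As a small illustration of (9), the sketch below builds the neighborhoods matrix from a list of class labels using NumPy. The function name `neighborhood_matrix` is hypothetical, and because (9) only specifies the entries with m < n, filling the diagonal with 1 is an assumption made here rather than something the text states.

```python
import numpy as np

def neighborhood_matrix(y):
    """Build the T x T matrix alpha of Eq. (9) from class labels y (length T)."""
    y = np.asarray(y)
    # alpha[m, n] = 1 when time points m and n share a label, otherwise 0.
    # Eq. (9) covers only m < n; the diagonal value of 1 is an assumption.
    return (y[:, None] == y[None, :]).astype(float)

# Stimulus sequence from Remark 1, reduced to its categories.
labels = ["cat", "face", "cat", "face"]
alpha = neighborhood_matrix(labels)
eta = int(np.count_nonzero(alpha))  # number of non-zero cells in alpha
```

With this construction the two cat time points are marked as within-class neighbors of each other, and every cat/face pair is marked as a between-class pair.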
The covariance matrices for within-class $W^{(i,j)} = \{w_{mn}^{(i,j)}\} \in \mathbb{R}^{V \times V}$ and between-class $B^{(i,j)} = \{b_{mn}^{(i,j)}\} \in \mathbb{R}^{V \times V}$ neighborhoods are defined as follows:

$$w_{mn}^{(i,j)} = \sum_{\ell=1}^{T}\sum_{k=1}^{T} \alpha_{\ell k}\, x_{\ell m}^{(i)} x_{k n}^{(j)} + \alpha_{\ell k}\, x_{\ell n}^{(i)} x_{k m}^{(j)} \quad (10)$$

$$b_{mn}^{(i,j)} = \sum_{\ell=1}^{T}\sum_{k=1}^{T} (1 - \alpha_{\ell k})\, x_{\ell m}^{(i)} x_{k n}^{(j)} + (1 - \alpha_{\ell k})\, x_{\ell n}^{(i)} x_{k m}^{(j)} \quad (11)$$

where $m, n = 1:V$. The Local Discriminant Hyperalignment (LDHA) objective function is defined by $\max_{i,j=1:S}\big((R^{(i)})^\top \tilde{C}^{(i,j)} R^{(j)}\big)$, where $\tilde{C}^{(i,j)} = W^{(i,j)} - (\eta / T^2) B^{(i,j)}$. Here, $\eta$ is the number of non-zero cells in the matrix $\alpha$, and $T$ is the number of time points in units of TRs. In addition, the solution of the LDHA can be reformulated as follows:

$$\rho = \arg\max_{i,j=1:S,\ i < j} \frac{(R^{(i)})^\top \tilde{C}^{(i,j)} R^{(j)}}{\sqrt{\big((R^{(i)})^\top C^{(i)} R^{(i)}\big)\big((R^{(j)})^\top C^{(j)} R^{(j)}\big)}} \quad \text{subject to } (R^{(\ell)})^\top C^{(\ell)} R^{(\ell)} = I,\ \ell = 1:S \quad (12)$$

The main difference between LDHA and HA is the supervised covariance matrix ($\tilde{C}^{(i,j)}$). Indeed, LDHA can be considered equivalent to the classical CCA (Lemma 2), where the correlations of non-homogeneous stimuli ($B^{(i,j)}$) enter the CCA problem with a negative sign and all of the homogeneous stimuli in each category are compared with each other ($W^{(i,j)}$). In addition, LDHA is related to the LDCCA method (Peng, Zhang, and Zhang 2010), where we have a mechanism in the LDCCA to manually select the relevant neighborhoods for each category of stimuli based on class labels, and the balance factor is dynamically assigned based on the data structure.

Lemma 3. Same as the classical CCA, LDHA can be solved as a generalized eigenvalue decomposition problem.

Proof.
$$\begin{cases} \tilde{C}^{(i,j)} \big(C^{(j)}\big)^{-1} \tilde{C}^{(j,i)} R^{(i)} = \big(\Lambda^{(i,j)}\big)^2 C^{(i)} R^{(i)} \\ \tilde{C}^{(j,i)} \big(C^{(i)}\big)^{-1} \tilde{C}^{(i,j)} R^{(j)} = \big(\Lambda^{(i,j)}\big)^2 C^{(j)} R^{(j)} \end{cases} \quad (13)$$

This paper uses the Singular Value Decomposition (SVD) to solve the LDHA problem, where $H^{(i,j)} = \big(C^{(i)}\big)^{-1/2} \tilde{C}^{(i,j)} \big(C^{(j)}\big)^{-1/2}$, $P^{(i,j)} = \big(C^{(i)}\big)^{1/2} R^{(i)}$, and $Q^{(i,j)} = \big(C^{(j)}\big)^{1/2} R^{(j)}$. By considering the SVD parameters, (13) is equivalent to:

$$\begin{cases} H^{(i,j)} \big(H^{(i,j)}\big)^\top P^{(i,j)} = \big(\Lambda^{(i,j)}\big)^2 P^{(i,j)} \\ \big(H^{(i,j)}\big)^\top H^{(i,j)} Q^{(i,j)} = \big(\Lambda^{(i,j)}\big)^2 Q^{(i,j)} \end{cases} \quad (14)$$

where $H^{(i,j)} = P^{(i,j)} \Lambda^{(i,j)} \big(Q^{(i,j)}\big)^\top$. The Hyperalignment solution can also be defined as follows:

$$\begin{cases} R^{(i)} = \big(C^{(i)}\big)^{-1/2} P^{(i,j)} \\ R^{(j)} = \big(C^{(j)}\big)^{-1/2} Q^{(i,j)} \end{cases} \quad (15)$$

Algorithm 1 Local Discriminant Hyperalignment (LDHA)
Input: Data points $X^{(i)}$ and $X^{(j)}$, class labels $Y$.
Output: Hyperalignment parameters $R^{(i)}$ and $R^{(j)}$.
Method:
1. Generate $\alpha$ by (9).
2. Calculate $W^{(i,j)}$, $B^{(i,j)}$ by using (10) and (11).
3. Calculate $\tilde{C}^{(i,j)}$.
4. Compute $H^{(i,j)} = \big(C^{(i)}\big)^{-1/2} \tilde{C}^{(i,j)} \big(C^{(j)}\big)^{-1/2}$.
5. Perform SVD: $H^{(i,j)} = P^{(i,j)} \Lambda^{(i,j)} \big(Q^{(i,j)}\big)^\top$.
6. Return $R^{(i)} = \big(C^{(i)}\big)^{-1/2} P^{(i,j)}$ and $R^{(j)} = \big(C^{(j)}\big)^{-1/2} Q^{(i,j)}$.

Algorithm 1 illustrates the LDHA procedure for solving the HA problem between two different subjects. As mentioned before, the LDHA is used for MVP analysis. Algorithm 2 demonstrates a general template for MVP analysis based on the LDHA method. As this algorithm depicts, the procedure for generating the HA template ($G$) in the train stage is changed, while the template is used in the test stage in the same way as in the unsupervised HA methods. Therefore, we do not need the class labels in the test stage.
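The six steps of Algorithm 1 can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation (the experiments in this paper were run in MATLAB). The names `ldha_pair` and `inv_sqrt` are hypothetical. The small ridge term `eps` added before taking the inverse square root is an assumption introduced here because the covariance matrices may not be full rank when there are more voxels than TRs, and the diagonal of the neighborhoods matrix is set to 1, which (9) leaves unspecified. The double sums in (10) and (11) are evaluated as a matrix product plus its transpose.

```python
import numpy as np

def inv_sqrt(C, eps):
    """Inverse square root of a symmetric PSD matrix.

    The ridge term eps * I is an assumption (not in the paper) that keeps
    the matrix invertible when it is rank deficient.
    """
    vals, vecs = np.linalg.eigh(C + eps * np.eye(C.shape[0]))
    return (vecs / np.sqrt(vals)) @ vecs.T

def ldha_pair(Xi, Xj, y, eps=1e-3):
    """Pairwise LDHA sketch following steps 1-6 of Algorithm 1."""
    y = np.asarray(y)
    T = Xi.shape[0]
    # Step 1: neighborhoods matrix, Eq. (9); diagonal = 1 is an assumption.
    alpha = (y[:, None] == y[None, :]).astype(float)
    eta = np.count_nonzero(alpha)
    # Step 2: Eqs. (10) and (11) written as M + M^T.
    M_w = Xi.T @ alpha @ Xj
    M_b = Xi.T @ (1.0 - alpha) @ Xj
    W = M_w + M_w.T
    B = M_b + M_b.T
    # Step 3: supervised covariance matrix.
    C_tilde = W - (eta / T**2) * B
    # Step 4: whiten with the (regularized) within-subject covariances.
    Ci_is = inv_sqrt(Xi.T @ Xi, eps)
    Cj_is = inv_sqrt(Xj.T @ Xj, eps)
    H = Ci_is @ C_tilde @ Cj_is
    # Step 5: SVD of H.
    P, lam, Qt = np.linalg.svd(H)
    # Step 6: mappings of Eq. (15).
    return Ci_is @ P, Cj_is @ Qt.T, lam

# Toy usage with random data: 12 time points, 5 voxels, 3 categories.
rng = np.random.default_rng(0)
y = np.repeat([0, 1, 2], 4)
Xi = rng.standard_normal((12, 5))
Xj = rng.standard_normal((12, 5))
Ri, Rj, lam = ldha_pair(Xi, Xj, y)
```

In this sketch the returned mappings satisfy the constraint of (12) with respect to the regularized covariance, i.e. `Ri.T @ (Xi.T @ Xi + eps * I) @ Ri` is the identity up to numerical error.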
Indeed, in comparison with the unsupervised solutions, the proposed method just generates a better HA template for aligning functional neural activities, where this template can maximize the correlation between all stimuli in the same category and minimize the correlation between different categories of stimuli.

Algorithm 2 A general template for MVP analysis by using Local Discriminant Hyperalignment (LDHA)
Input: Train Set $X^{(i)}$, $i = 1:S$, Test Set $\hat{X}^{(j)}$, $j = 1:\hat{S}$.
Output: Classification Performance (ACC, AUC).
Method:
01. Initiate $R^{(i)}$, $i = 1:S$.
02. Do
03.   For each subject $X^{(i)}$, $i = 1:S$:
04.     Update $R^{(i)}$ by Alg. 1 and $X^{(\ell)}$, $\ell = i+1:S$.
05.   End Foreach
06. Until $X^{(i)} R^{(i)}$, $i = 1:S$ do not change in this step.
07. Train a classifier by $X^{(i)} R^{(i)}$, $i = 1:S$.
08. Initiate $\hat{R}^{(j)}$, $j = 1:\hat{S}$.
09. Generate $G$ based on (6) by using $R^{(i)}$, $i = 1:S$.
10. For each subject $\hat{X}^{(j)}$, $j = 1:\hat{S}$:
11.   Compute $\hat{R}^{(j)}$ by classical HA (Eq. 5, 7) and $G$.
12. End Foreach
13. Evaluate the classifier by using $\hat{X}^{(j)} \hat{R}^{(j)}$, $j = 1:\hat{S}$.

Table 1: Accuracy of Classification Methods

Data Sets           ν-SVM          HA             KHA            SCCA           SVD-HA         LDHA
DS005 (2 classes)   71.65 ± 0.97   81.27 ± 0.59   83.06 ± 0.36   85.29 ± 0.49   90.82 ± 1.23   94.32 ± 0.16
DS105 (8 classes)   22.89 ± 1.02   30.03 ± 0.87   32.62 ± 0.52   37.14 ± 0.91   40.21 ± 0.83   54.04 ± 0.09
DS107 (4 classes)   38.84 ± 0.82   43.01 ± 0.56   46.82 ± 0.37   52.69 ± 0.69   59.54 ± 0.99   74.73 ± 0.19
DS117 (2 classes)   73.32 ± 1.67   77.93 ± 0.29   84.22 ± 0.44   83.32 ± 0.41   95.62 ± 0.83   95.07 ± 0.27

Table 2: Area Under the ROC Curve (AUC) of Classification Methods

Data Sets           ν-SVM          HA             KHA            SCCA           SVD-HA         LDHA
DS005 (2 classes)   68.37 ± 1.01   70.32 ± 0.92   82.22 ± 0.42   80.91 ± 0.21   88.54 ± 0.71   93.25 ± 0.92
DS105 (8 classes)   21.76 ± 0.91   28.91 ± 1.03   30.35 ± 0.39   36.23 ± 0.57   37.61 ± 0.62   53.86 ± 0.17
DS107 (4 classes)   36.84 ± 1.45   40.21 ± 0.33   43.63 ± 0.61   50.41 ± 0.92   57.54 ± 0.31   72.03 ± 0.37
DS117 (2 classes)   70.17 ± 0.59   76.14 ± 0.49   81.54 ± 0.92   80.92 ± 0.28   92.14 ± 0.42   94.23 ± 0.94

Experiments

The empirical studies are presented in this section. Same as previous studies (Chen et al. 2014; Xu et al. 2012; Lorbert and Ramadge 2012; Haxby et al. 2011), this paper generates the classification model by using the ν-SVM algorithms (Smola and Schölkopf 2004), i.e. the binary ν-SVM for datasets with just two categories of stimuli, and multi-label ν-SVM (Smola and Schölkopf 2004; Lorbert and Ramadge 2012) as the multi-class approach. All employed datasets in this paper are separately preprocessed by SPM 12 (6685) (www.fil.ion.ucl.ac.uk/spm/), i.e. slice timing, anatomical alignment, normalization, smoothing. Regions of Interest (ROIs) are also defined by using the main references of each dataset. The features (voxels in the ROIs) are partitioned into train set and test set by using Leave-One-Out (LOO) cross-validation across subjects (leave-one-subject-out). The HA methods are applied for functionally aligning the neural activities and generating the general template ($G$). Finally, the classification model is generated for evaluating the performance of different methods. Performance of LDHA is compared with the ν-SVM algorithm as the baseline (it just uses anatomical alignment without the hyperalignment mapping), the standard hyperalignment (HA) (Haxby et al. 2011; Guntupalli et al. 2016), Kernel-based hyperalignment (KHA) (Lorbert and Ramadge 2012), Regularized hyperalignment (SCCA) (Xu et al. 2012), and Joint SVD hyperalignment (SVD-HA) (Chen et al. 2014). Further, the KHA algorithm is employed with the Gaussian kernel, which generated the best results in the original paper (Lorbert and Ramadge 2012). In addition, the regularization parameters ($\alpha$, $\beta$) in SCCA are considered optimum based on (Xu et al. 2012).
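The evaluation protocol described above (leave-one-subject-out folds, a template from (6), and alignment of the held-out subject to that template) can be sketched as follows. This is an illustrative outline only. The function names are hypothetical, the random arrays stand in for real ROI data, and the training mappings are taken to be the identity as a placeholder for the output of the train stage. The test-stage mapping uses the orthogonal Procrustes solution, i.e. the special case $A^{(\ell)} = I$ of (5); treating step 11 of Algorithm 2 this way is an assumption.

```python
import numpy as np

def template(aligned):
    """Eq. (6): G is the mean of the aligned training matrices X^(j) R^(j)."""
    return np.mean(aligned, axis=0)

def procrustes_map(X, G):
    """Orthogonal R minimizing ||X R - G||_F (the A = I case of Eq. 5)."""
    U, _, Vt = np.linalg.svd(X.T @ G)
    return U @ Vt

def leave_one_subject_out(n_subjects):
    """Yield (train_indices, test_index), one fold per held-out subject."""
    for test in range(n_subjects):
        train = [s for s in range(n_subjects) if s != test]
        yield train, test

# Toy data: 4 subjects, 20 time points, 6 voxels (placeholder values).
rng = np.random.default_rng(0)
data = [rng.standard_normal((20, 6)) for _ in range(4)]

for train, test in leave_one_subject_out(len(data)):
    # Placeholder: identity mappings stand in for the trained R^(i).
    G = template([data[s] for s in train])
    R_test = procrustes_map(data[test], G)
    aligned_test = data[test] @ R_test  # would be passed to the classifier
```

Because the identity is itself an orthogonal mapping, the aligned test data can never be farther from the template (in Frobenius norm) than the unaligned data.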
All algorithms are implemented in MATLAB R2016b (9.1) on a PC with certain specifications¹ by the authors in order to generate experimental results.

Simple Tasks Analysis

This paper utilizes 4 datasets, shared by openfmri.org, for running the empirical studies of this section. These datasets contain simple tasks such as watching a gray-scale photo or tapping a key, etc. As the first dataset, 'Visual Object Recognition' (DS105) includes 6 subjects and 71 sessions. It also contains 8 classes (categories) of visual stimuli, i.e. gray-scale images of faces, houses, cats, bottles, scissors, shoes, chairs, and scrambles (nonsense patterns). Please see (Haxby et al. 2011; Haxby, Connolly, and Guntupalli 2014) for more information. As the second dataset, 'Multi-subject, multi-modal human neuroimaging dataset' (DS117) includes MEG and fMRI images for 19 subjects and 171 sessions. This paper just uses the fMRI images of this dataset. It also contains 2 classes of visual stimuli, i.e. human faces and scrambles. Please see (Wakeman and Henson 2015) for more information. The responses of voxels in the ventral temporal cortex (VT) are analyzed for these two datasets. As the third dataset, 'Word and Object Processing' (DS107) includes 49 subjects and 98 sessions. It contains 4 classes of visual stimuli, i.e. words, objects, scrambles, consonants. ROIs and technical information are defined based on (Duncan et al. 2009). As the last dataset, 'Mixed-gambles task' (DS005) includes 16 subjects and 48 sessions. It also contains 2 classes of risk tasks in the human brain, where the chance of selection is 50/50. Further, the ROIs for functional alignment are selected based on the original paper (Tom et al. 2007). Tables 1 and 2 respectively show the classification Accuracy and Area Under the ROC Curve (AUC) in percentage (%) for the predictors. These tables report the performance of predictors based on the categories of the stimuli.
¹ DEL, CPU = Intel Xeon E5-2630 v3 (8 × 2.4 GHz), RAM = 64GB, OS = Elementary OS 0.4 Loki

[Figure 1, panels (a)-(h). Each panel plots Classification Accuracy (%) against # of voxels per hemisphere (200 to 1200) for vSVM, HA, KHA, SCCA, SVD-HA, and LDHA. Panels: (a) Forrest Gump (TRs = 100); (b) Raiders of the Lost Ark (TRs = 100); (c) Forrest Gump (TRs = 200); (d) Raiders of the Lost Ark (TRs = 200); (e) Forrest Gump (TRs = 400); (f) Raiders of the Lost Ark (TRs = 400); (g) Forrest Gump (TRs = 2000); (h) Raiders of the Lost Ark (TRs = 2000).]
Figure 1: Comparison of 6 different HA algorithms by using 100 TRs, 400 TRs, 800 TRs, and 2000 TRs and 100 to 1200 ranked voxels.

As these tables demonstrate, the proposed algorithm has generated better performance in comparison with other methods because it provided a better functional alignment of neural activities by exploiting the locality properties of the fMRI datasets.
This issue is more significant when the number of classes is more than 2, such as in datasets DS105 and DS107.

Complex Tasks Analysis

This section employs two fMRI datasets, which are related to watching movies. As the first dataset, 'A high-resolution 7-Tesla fMRI dataset from complex natural stimulation with an audio movie' (DS113) includes the fMRI data of 20 subjects, who watched the 'Forrest Gump (1994)' movie during the experiment. This dataset is provided by www.openfmri.org. Please see (Hanke et al. 2014) for more information. In the second dataset, subjects watched 'Raiders of the Lost Ark (1981)', where the whole brain volumes are 48 and the number of subjects is 10. Please see (Chen et al. 2014; Chen et al. 2015; Lorbert and Ramadge 2012; Sabuncu et al. 2010) for more information. The responses of voxels in the ventral temporal cortex (VT) are collected to align the data while subjects watched the movie. Figure 1 illustrates the generated results for these two datasets. As depicted in this figure, we rank order the voxels by employing the voxel selection method in (Haxby et al. 2011; Chen et al. 2014); and the experiments are repeated by using different numbers of ranked voxels, i.e. 100, 200, 400, 600, 800, 1000, and 1200. In addition, the empirical studies are reported by using the first 100 TRs, 400 TRs, 800 TRs, and 2000 TRs in both datasets. Figure 1 illustrates that the LDHA achieves superior performance to other HA algorithms. As mentioned before, our method can generate a better functional alignment of neural activities when the concept of locality is used for functional alignment. This improvement is more significant when the number of TRs in Figure 1 is limited.

Conclusion

One of the main challenges in fMRI studies, especially MVP analysis, is using multi-subject datasets. On the one hand, the multi-subject analysis is necessary to estimate the validity of the generated results across subjects.
On the other hand, analyzing multi-subject fMRI data requires accurate functional alignment between neuronal activities of different subjects for improving the performance of the final results. Hyperalignment (HA) is one of the most effective functional alignment methods, which can be formulated as a CCA problem for aligning neural activities of different subjects to a common space. The HA solution in MVP analysis may not be optimum because it mostly utilizes unsupervised CCA techniques for functional alignment. This paper proposes the Local Discriminant Hyperalignment (LDHA) as a novel supervised HA solution, which employs the concept of locality in machine learning for improving the performance of both functional alignment and MVP analysis. Indeed, this paper defines the locality based on the stimuli categories (class labels) in the train-set. In a nutshell, the proposed method first generates two sets for each category of stimuli, i.e. the set of homogeneous stimuli as within-class neighborhoods and the set of stimuli from distinct categories as between-class neighborhoods. Then, these two sets are used to provide a better HA solution, where the correlation between the homogeneous stimuli is maximized, and the correlation between different categories of stimuli is near zero. Experimental studies on multi-subject MVP analysis demonstrate that the LDHA method achieves superior performance to other state-of-the-art HA algorithms. In the future, we plan to develop a kernel-based version of LDHA for improving its performance in non-linear problems.

Acknowledgment

We thank the anonymous reviewers for comments. This work was supported in part by the National Natural Science Foundation of China (61422204 and 61473149), Jiangsu Natural Science Foundation for Distinguished Young Scholar (BK20130034) and NUAA Fundamental Research Funds (NE2013105).

References

[Chen et al. 2014] Chen, P. H.; Guntupalli, J.
S.; Haxby, J. V.; and Ramadge, P. J. 2014. Joint SVD-hyperalignment for multi-subject fMRI data alignment. In IEEE International Workshop on Machine Learning for Signal Processing (MLSP), 1–6. IEEE.
[Chen et al. 2015] Chen, P. H. C.; Chen, J.; Yeshurun, Y.; Hasson, U.; Haxby, J.; and Ramadge, P. J. 2015. A reduced-dimension fMRI shared response model. In Advances in Neural Information Processing Systems, 460–468.
[Conroy et al. 2009] Conroy, B.; Singer, B.; Haxby, J.; and Ramadge, P. J. 2009. fMRI-based inter-subject cortical alignment using functional connectivity. In Advances in Neural Information Processing Systems, 378–386.
[Dmochowski et al. 2012] Dmochowski, J. P.; Sajda, P.; Dias, J.; and Parra, L. C. 2012. Correlated components of ongoing EEG point to emotionally laden attention–a possible marker of engagement? Frontiers in Human Neuroscience 6:112.
[Duncan et al. 2009] Duncan, K. J.; Pattamadilok, C.; Knierim, I.; and Devlin, J. T. 2009. Consistency and variability in functional localisers. NeuroImage 46(4):1018–1026.
[Figueiredo and Nowak 2016] Figueiredo, M. A., and Nowak, R. D. 2016. Ordered weighted l1 regularized regression with strongly correlated covariates: Theoretical aspects. In Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, 930–938.
[Gower and Dijksterhuis 2004] Gower, J. C., and Dijksterhuis, G. B. 2004. Procrustes problems, volume 30. Oxford University Press on Demand.
[Guntupalli et al. 2016] Guntupalli, J. S.; Hanke, M.; Halchenko, Y. O.; Connolly, A. C.; Ramadge, P. J.; and Haxby, J. V. 2016. A model of representational spaces in human cortex. Cerebral Cortex bhw068.
[Hanke et al. 2014] Hanke, M.; Baumgartner, F. J.; Ibe, P.; Kaule, F. R.; Pollmann, S.; Speck, O.; Zinke, W.; and Stadler, J. 2014. A high-resolution 7-Tesla fMRI dataset from complex natural stimulation with an audio movie. Scientific Data 1.
[Hardoon, Szedmak, and Shawe T aylor 2004] Hardoon, D. R.; Szedmak, S.; and Shawe T aylor , J. 2004. Canonical correlation analysis: An overvie w with application to learning methods. Neural Computation 16(12):2639–2664. [Haxby et al. 2011] Haxby , J. V .; Guntupalli, J. S.; Connolly , A. C.; Halchenko, Y . O.; Conroy , B. R.; Gobbini, M. I.; Hanke, M.; and Ramadge, P . J. 2011. A common, high- dimensional model of the representational space in human ventral temporal corte x. Neur on 72(2):404–416. [Haxby , Connolly , and Guntupalli 2014] Haxby , J. V .; Con- nolly , A. C.; and Guntupalli, J. S. 2014. Decoding neural representational spaces using multi variate pattern analysis. Annual Revie w of Neur oscience 37:435–456. [Lorbert and Ramadge 2012] Lorbert, A., and Ramadge, P . J. 2012. K ernel hyperalignment. In Advances in Neural Infor- mation Pr ocessing Systems , 1790–1798. [Michael et al. 2015] Michael, A. M.; Anderson, M.; Miller , R. L.; Adalı, T .; and Calhoun, V . D. 2015. Preserving subject variability in group fmri analysis: performance ev aluation of gica vs. iv a. Distributed Networks-New Outlooks on Cer e- bellar Function 106. [Mohr et al. 2015] Mohr , H.; W olfensteller , U.; Frimmel, S.; and Ruge, H. 2015. Sparse regularization techniques pro- vide novel insights into outcome integration processes. Neu- r oImag e 104:163–176. [Oswal et al. 2016] Oswal, U.; Cox, C.; Lambon-Ralph, M.; Rogers, T .; and Nowak, R. 2016. Representational similarity learning with application to brain netw orks. In Pr oceedings of The 33r d International Confer ence on Mac hine Learning , 1041–1049. [Peng, Zhang, and Zhang 2010] Peng, Y .; Zhang, D.; and Zhang, J. 2010. A ne w canonical correlation analysis algo- rithm with local discrimination. Neural Pr ocessing Letters 31(1):1–15. [Rademacher et al. 1993] Rademacher , J.; Caviness, V . S.; Steinmetz, H.; and Galab urda, A. 1993. 
T opographical v ari- ation of the human primary cortices: implications for neu- roimaging, brain mapping, and neurobiology . Cer ebr al Cor - tex 3(4):313–329. [Sabuncu et al. 2010] Sabuncu, M. R.; Singer , B. D.; Con- roy , B.; Bryan, R. E.; Ramadge, P . J.; and Haxby , J. V . 2010. Function-based intersubject alignment of human cor- tical anatomy . Cerebr al Cortex 20(1):130–140. [Sch ¨ onemann 1966] Sch ¨ onemann, P . H. 1966. A generalized solution of the orthogonal procrustes problem. Psychome- trika 31(1):1–10. [Smola and Sch ¨ olkopf 2004] Smola, A. J., and Sch ¨ olkopf, B. 2004. A tutorial on support v ector regression. Statistics and Computing 14(3):199–222. [Sui et al. 2011] Sui, J.; Pearlson, G.; Caprihan, A.; Adali, T .; Kiehl, K. A.; Liu, J.; Y amamoto, J.; and Calhoun, V . D. 2011. Discriminating schizophrenia and bipolar disorder by fusing fmri and dti in a multimodal cca+ joint ica model. Neur oImag e 57(3):839–855. [Sui et al. 2013] Sui, J.; He, H.; Pearlson, G. D.; Adali, T .; Kiehl, K. A.; Y u, Q.; Clark, V . P .; Castro, E.; White, T .; Mueller , B. A.; et al. 2013. Three-way (n-way) fusion of brain imaging data based on mcca+ jica and its application to discriminating schizophrenia. Neur oImag e 66:119–132. [T alairach and T ournoux 1988] T alairach, J., and T ournoux, P . 1988. Co-planar ster eotaxic atlas of the human br ain. 3- Dimensional pr oportional system: an appr oach to cerebr al imaging . Thieme. [T om et al. 2007] T om, S. M.; Fox, C. R.; Trepel, C.; and Poldrack, R. A. 2007. The neural basis of loss aversion in decision-making under risk. Science 315(5811):515–518. [W akeman and Henson 2015] W akeman, D. G., and Henson, R. N. 2015. A multi-subject, multi-modal human neu- roimaging dataset. Scientiﬁc Data 2. [W atson et al. 1993] W atson, J. D.; Myers, R.; Frackowiak, R. S. J.; Hajnal, J. V .; W oods, R. P .; Mazziotta, J. C.; Shipp, S.; and Zeki, S. 1993. 
Area v5 of the human brain: e vidence from a combined study using positron emission tomography and magnetic resonance imaging. Cerebr al Cortex 3(2):79– 94. [Xu et al. 2012] Xu, H.; Lorbert, A.; Ramadge, P . J.; Gun- tupalli, J. S.; and Haxby , J. V . 2012. Regularized hyper- alignment of multi-set fmri data. In IEEE Statistical Signal Pr ocessing W orkshop (SSP) , 229–232. IEEE.
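As an illustrative note on the within-class and between-class neighborhoods summarized in the conclusion, the following minimal sketch shows how the two sets could be built from class labels alone. It is not the authors' implementation: the function name `class_neighborhoods` and the toy labels are hypothetical, and the sketch stops at constructing the index sets without computing the alignment itself.

```python
import numpy as np


def class_neighborhoods(labels):
    """Map each stimulus category to (within-class, between-class) index arrays.

    Hypothetical helper for illustration only; it builds the two label-based
    sets and does not perform any hyperalignment.
    """
    labels = np.asarray(labels)
    sets = {}
    for category in np.unique(labels):
        # Stimuli sharing this category form the within-class neighborhood.
        within = np.flatnonzero(labels == category)
        # Stimuli from every other category form the between-class neighborhood.
        between = np.flatnonzero(labels != category)
        sets[category.item()] = (within, between)
    return sets


# Toy training-set labels (assumed for illustration).
labels = ["face", "house", "face", "chair", "house"]
sets = class_neighborhoods(labels)
within, between = sets["face"]
print(within.tolist(), between.tolist())  # [0, 2] [1, 3, 4]
```

In a supervised alignment of this kind, pairs drawn from `within` would be the ones whose correlation is pushed up, while pairs involving `between` would be the ones whose correlation is driven toward zero.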
