Classification of Major Depressive Disorder via Multi-Site Weighted LASSO Model

Large-scale collaborative analysis of brain imaging data, in psychiatry and neurology, offers a new source of statistical power to discover features that boost accuracy in disease classification, differential diagnosis, and outcome prediction. Howe…

Authors: Dajiang Zhu, Brandalyn C. Riedel

Dajiang Zhu 1, Brandalyn C. Riedel 1, Neda Jahanshad 1, Nynke A. Groenewold 2,3, Dan J. Stein 3, Ian H. Gotlib 4, Matthew D. Sacchet 5, Danai Dima 6,7, James H. Cole 8, Cynthia H.Y. Fu 9, Henrik Walter 10, Ilya M. Veer 11, Thomas Frodl 11,12, Lianne Schmaal 13,14,15, Dick J. Veltman 15, Paul M. Thompson 1

1 Imaging Genetics Center, USC Stevens Neuroimaging and Informatics Institute, Keck School of Medicine of the University of Southern California, CA, USA
2 BCN NeuroImaging Center and Department of Neuroscience of the University of Groningen, University Medical Center Groningen, The Netherlands
3 Dept of Psychiatry and Mental Health, University of Cape Town, South Africa
4 Neurosciences Program and Department of Psychology, Stanford University, CA, USA
5 Department of Psychiatry and Behavioral Sciences, Stanford University, CA, USA
6 Dept of Neuroimaging, Institute of Psychiatry, Psychology and Neuroscience, King’s College London, UK
7 Dept of Psychology, School of Arts and Social Science, City, University of London, UK
8 Department of Medicine, Imperial College London, UK
9 Department of Psychological Medicine, King’s College London, UK
10 Dept of Psychiatry and Psychotherapy, Charité Universitätsmedizin Berlin, Germany
11 Department of Psychiatry, Trinity College Dublin, Ireland
12 Dept of Psychiatry and Psychotherapy, Otto von Guericke University Magdeburg, Germany
13 Dept of Psychiatry and Neuroscience Campus Amsterdam, VU University Medical Center, The Netherlands
14 Orygen, The National Centre of Excellence in Youth Mental Health, Australia
15 Center for Youth Mental Health, The University of Melbourne, Australia

Abstract.
Large-scale collaborative analysis of brain imaging data, in psychiatry and neurology, offers a new source of statistical power to discover features that boost accuracy in disease classification, differential diagnosis, and outcome prediction. However, due to data privacy regulations or limited accessibility to large datasets across the world, it is challenging to efficiently integrate distributed information. Here we propose a novel classification framework through multi-site weighted LASSO: each site performs an iterative weighted LASSO for feature selection separately. Within each iteration, the classification result and the selected features are collected to update the weighting parameters for each feature. This new weight is used to guide the LASSO process at the next iteration. Only the features that help to improve the classification accuracy are preserved. In tests on data from five sites (299 patients with major depressive disorder (MDD) and 258 normal controls), our method boosted classification accuracy for MDD by 4.9% on average. This result shows the potential of the proposed new strategy as an effective and practical collaborative platform for machine learning on large scale distributed imaging and biobank data.

Keywords: MDD, weighted LASSO

1 Introduction

Major depressive disorder (MDD) affects over 350 million people worldwide [1] and takes an immense personal toll on patients and their families, placing a vast economic burden on society. MDD involves a wide spectrum of symptoms, varying risk factors, and varying response to treatment [2]. Unfortunately, early diagnosis of MDD is challenging and is based on behavioral criteria; consistent structural and functional brain abnormalities in MDD are just beginning to be understood.
Neuroimaging of large cohorts can identify characteristic correlates of depression, and may also help to detect modulatory effects of interventions, and environmental and genetic risk factors. Recent advances in brain imaging, such as magnetic resonance imaging (MRI) and its variants, allow researchers to investigate brain abnormalities and identify statistical factors that influence them, and how they relate to diagnosis and outcomes [12]. Researchers have reported brain structural and functional alterations in MDD using different modalities of MRI. Recently, the ENIGMA-MDD Working Group found that adults with MDD have thinner cortical gray matter in the orbitofrontal cortices, insula, anterior/posterior cingulate and temporal lobes compared to healthy adults without a diagnosis of MDD [3]. A subcortical study, the largest to date, showed that MDD patients tend to have smaller hippocampal volumes than controls [4]. Diffusion tensor imaging (DTI) [5] reveals, on average, lower fractional anisotropy in the frontal lobe and right occipital lobe of MDD patients. MDD patients may also show aberrant functional connectivity in the default mode network (DMN) and other task-related functional brain networks [6].

Fig. 1. Overview of our proposed framework.

Even so, classification of MDD is still challenging.
There are three major barriers: first, though significant differences have been found, these previously identified brain regions or brain measures are not always consistent markers for MDD classification [7]; second, besides T1 imaging, other modalities including DTI and functional magnetic resonance imaging (fMRI) are not commonly acquired in a clinical setting; last, it is not always easy for collaborating medical centers to perform an integrated data analysis, due to data privacy regulations that limit the exchange of individual raw data and due to large transfer times and storage requirements for thousands of images. As biobanks grow, we need an efficient platform to integrate predictive information from multiple centers; as the available datasets increase, this effort should increase the statistical power to identify predictors of disease diagnosis and future outcomes, beyond what each site could identify on its own.

In this study, we introduce a multi-site weighted LASSO (MSW-LASSO) model to boost classification performance for each individual participating site, by integrating their knowledge for feature selection and results from classification. As shown in Fig. 1, our proposed framework features the following characteristics: (1) each site retains its own data and performs weighted LASSO regression, for feature selection, locally; (2) only the selected brain measures and the classification results are shared with other sites; (3) information on the selected brain measures and the corresponding classification results is integrated to generate a unified weight vector across features; this is then sent to each site. This weight vector will be applied to the weighted LASSO in the next iteration; (4) if the new weight vector leads to a new set of brain measures and better classification performance, the new set of brain measures will be sent to other sites.
Otherwise, it is discarded and the old one is recovered.

2 Methods

2.1 Data and demographics

For this study, we used data from five sites across the world. The total number of participants is 557; all were older than 21 years. Demographic information for each site’s participants is summarized in Table 1.

Site          Total N   MDD patients, N (%)   Controls, N (%)   Age of controls (mean ± SD; y)   Age of MDD (mean ± SD; y)   % Female MDD   % Female total
1 Groningen   45        22 (48.89%)           23 (51.11%)       42.78 ± 14.36                    43.14 ± 13.8                72.73          73.33
2 Stanford    110       54 (49.09%)           56 (50.91%)       38.17 ± 9.97                     37.75 ± 9.78                57.41          60.00
3 BRCDECC     130       69 (53.08%)           61 (46.92%)       51.72 ± 7.94                     47.85 ± 8.91                68.12          60.77
4 Berlin      172       101 (58.72%)          71 (41.28%)       41.09 ± 12.85                    41.21 ± 11.82               64.36          60.47
5 Dublin      100       53 (53%)              47 (47%)          38.49 ± 12.37                    41.81 ± 10.76               62.26          57.00
Combined      557       299 (53.68%)          258 (46.32%)

Table 1. Demographics for the five sites participating in the current study.

2.2 Data preprocessing

As in most common clinical settings, only T1-weighted MRI brain scans were acquired at each site; quality control and analyses were performed locally. Sixty-eight (34 left/34 right) cortical gray matter regions, 7 subcortical gray matter regions and the lateral ventricles were segmented with FreeSurfer [8]. Detailed image acquisition, pre-processing, brain segmentation and quality control methods may be found in [3, 9]. Brain measures include cortical thickness and surface area for cortical regions and volume for subcortical regions and lateral ventricles. In total, 152 brain measures were considered in this study.

2.3 Algorithm overview

To better illustrate the algorithms, we define the following notations:

1. $F_i$: The selected brain measures (features) of Site-i;
2. $A_i$: The classification performance of Site-i;
3. $W$: The weight vector;
4. w-LASSO($W$, $D_i$): Performing weighted LASSO on the Site-i data $D_i$ with weight vector $W$;
5. SVM($F_i$, $D_i$): Performing SVM classification on $D_i$ using the feature set $F_i$.

The algorithms have two parts: one that runs at each site, and one that runs on an integration server. At first, the integration server initializes a weight vector with all ones and sends it to all sites. Each site uses this weight vector to conduct weighted LASSO (Section 2.6) on its own data locally. If the selected features give better classification performance, the site sends the new features and the corresponding classification result to the integration server. If there is no improvement in classification accuracy, it sends the old ones. After the integration server receives the updates from all sites, it generates a new weight vector according to the different feature sets and their classification performance. The detailed strategy is discussed in Section 2.5.

Algorithm 1 (Integration Server)
1. Initialize W (with all features weighted as one)
2. Send W to all sites
3. while at least one site has improvement on A
4.     update W (Section 2.5)
5.     Send W to all sites
6. end while
7. Send W with null to all sites

Table 2. Main steps of Algorithm 1.

Algorithm 2 (Site-i)
1. $F_i \leftarrow \emptyset$, $A_i \leftarrow 0$
2. while received W is not null
3.     $F_i' \leftarrow$ w-LASSO($W$, $D_i$) (Section 2.6)
4.     if $F_i' \neq F_i$
5.         $A_i' \leftarrow$ SVM($F_i'$, $D_i$)
6.         if $A_i' > A_i$
7.             send $F_i'$ and $A_i'$ to Integration Server
8.             $F_i \leftarrow F_i'$, $A_i \leftarrow A_i'$
9.         else send $F_i$ and $A_i$ to Integration Server
10.        end if
11.    end if
12. end while

Table 3. Main steps of Algorithm 2.

2.4 Ordinary LASSO and weighted LASSO

LASSO [11] is a shrinkage method for linear regression. The ordinary LASSO is defined as:

$$\hat{\beta}(\text{LASSO}) = \arg\min_{\beta} \left\| y - \sum_{i=1}^{n} x_i \beta_i \right\|^2 + \lambda \sum_{i=1}^{n} |\beta_i| \tag{1}$$

Here $y$ and $x$ are the observations and predictors, and $\lambda$ is known as the sparsity parameter. LASSO minimizes the sum of squared errors while penalizing the sum of the absolute values of the coefficients $\beta$.
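To make the objective in Eq. (1) concrete, the following is a minimal sketch of an ordinary LASSO solver in pure Python. It uses cyclic coordinate descent with soft-thresholding, which is one common way to minimize this objective; the paper does not state which solver was used, so the solver choice and the names `soft_threshold` and `lasso_cd` are illustrative assumptions rather than the authors' implementation.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly zero."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Minimise ||y - X b||^2 + lam * sum_i |b_i| by cyclic coordinate descent.

    X is a list of rows (samples), y a list of targets.
    Hypothetical helper for illustration only.
    """
    n_features = len(X[0])
    beta = [0.0] * n_features
    residual = list(y)  # equals y - X @ beta while beta is all zeros
    for _ in range(n_iter):
        for j in range(n_features):
            col = [row[j] for row in X]
            norm_sq = sum(v * v for v in col)
            if norm_sq == 0.0:
                continue  # an all-zero column cannot explain anything
            # Inner product of column j with the residual that ignores beta_j.
            rho = sum(c * (r + c * beta[j]) for c, r in zip(col, residual))
            # The factor 1/2 comes from the un-halved squared error in Eq. (1).
            new_bj = soft_threshold(rho, lam / 2.0) / norm_sq
            delta = new_bj - beta[j]
            if delta != 0.0:
                residual = [r - c * delta for c, r in zip(col, residual)]
                beta[j] = new_bj
    return beta


# Toy data: feature 0 carries the signal, feature 1 is nearly irrelevant.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
y = [2.0, 0.1, 2.0, 0.1]
beta = lasso_cd(X, y, lam=1.0)
print(beta)  # [1.75, 0.0]
```

In this toy example the weak second coefficient is set exactly to zero while the strong one is only shrunk (from 2.0 to 1.75), which is the behavior that makes LASSO usable for feature selection.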
Because LASSO regression forces many coefficients to be zero, it is widely used for variable selection [11]. However, the classical LASSO shrinkage procedure might be biased when estimating large coefficients [12]. To alleviate this risk, adaptive LASSO [12] was developed; it assigns a different penalty parameter to each predictor, and thus avoids penalizing larger coefficients more heavily than small coefficients. Similarly, the motivation of multi-site weighted LASSO (MSW-LASSO) is to penalize different predictors (brain measures) by assigning them different weights, according to their classification performance across all sites. Generating the weights for each brain measure (feature) and the MSW-LASSO model are discussed in Sections 2.5 and 2.6.

2.5 Generation of a Multi-Site Weight

In Algorithm 1, after the integration server receives the information on selected features (brain measures) and the corresponding classification performance of each site, it generates a new weight for each feature. The new weight for the $f$-th feature is:

$$W_f = \frac{\sum_{s=1}^{m} \Psi_{s,f} A_s P_s}{m} \tag{2}$$

$$\Psi_{s,f} = \begin{cases} 1, & \text{if the } f\text{-th feature was selected in site-}s \\ 0, & \text{otherwise} \end{cases} \tag{3}$$

Here $m$ is the number of sites, $A_s$ is the classification accuracy of site-$s$, and $P_s$ is the proportion of participants in site-$s$ relative to the total number of participants at all sites. Eq. (3) penalizes the features that only “survived” in a small number of sites. On the contrary, if a specific feature was selected by all sites, meaning all sites agree that this feature is important, it tends to have a larger weight. In Eq. (2) we consider both the classification performance and the proportion of samples. If a site has achieved very high classification accuracy but has a relatively small sample size compared to other sites, the features it selected will be conservatively “recommended” to other sites.
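The weight update performed by the integration server can be sketched in a few lines of Python. This is a minimal illustration that assumes the weight of a feature is the average over all sites of (site accuracy × site sample proportion), counted only for the sites that selected the feature. The function name `multi_site_weights` and the toy inputs are hypothetical and are not taken from the study.

```python
def multi_site_weights(selected, accuracy, n_participants, n_features):
    """Return one weight per feature from per-site selections and accuracies.

    selected       -- list of sets; selected[s] holds the feature indices kept by site s
    accuracy       -- list of floats; accuracy[s] is the classification accuracy of site s
    n_participants -- list of ints; sample size of each site
    n_features     -- total number of candidate features
    (All names are illustrative assumptions.)
    """
    m = len(selected)
    total = sum(n_participants)
    proportions = [n / total for n in n_participants]
    weights = [0.0] * n_features
    for feats, acc, prop in zip(selected, accuracy, proportions):
        for f in feats:
            # A site contributes only to the features it selected (indicator = 1).
            weights[f] += acc * prop
    # Average over all m sites, including those that did not select the feature.
    return [w / m for w in weights]


# Toy example with three sites and four candidate features (made-up numbers).
selected = [{0, 1}, {0}, {0, 2}]
accuracy = [0.6, 0.7, 0.8]
n_participants = [50, 30, 20]
weights = multi_site_weights(selected, accuracy, n_participants, n_features=4)
print([round(w, 4) for w in weights])
```

Feature 0, chosen by every site, receives the largest weight; feature 2, chosen only by the smallest site, receives a small weight even though that site has the highest accuracy; feature 3, chosen by no site, stays at zero.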
In general, if a feature was selected by more sites and resulted in higher classification accuracy, it has a larger weight.

2.6 Multi-Site weighted LASSO

In this section, we define the multi-site weighted LASSO (MSW-LASSO) model:

$$\hat{\beta}_{\text{MSW-LASSO}} = \arg\min_{\beta} \left\| y - \sum_{i=1}^{n} x_i \beta_i \right\|^2 + \lambda \sum_{i=1}^{n} \left( 1 - \frac{\sum_{s=1}^{m} \Psi_{s,i} A_s P_s}{m} \right) |\beta_i| \tag{4}$$

Here $x_i$ represents the MRI measures after controlling for the effects of age, sex and intracranial volume (ICV), which are managed within different sites. $y$ is the label indicating MDD patient or control. $n = 152$ is the number of brain measures (features) in this study. In our MSW-LASSO model, a feature with a larger weight implies higher classification performance and/or recognition by multiple sites. Hence it will be penalized less and has a greater chance of being selected by the sites that did not consider this feature in the previous iteration.

3 Results

3.1 Classification improvements through the MSW-LASSO model

In this study, we applied Algorithm 1 and Algorithm 2 to data from five sites across the world. In the first iteration, the integration server initialized a weight vector with all ones and sent it to all sites. Therefore, these five sites conducted regular LASSO regression in the first round. After a small set of features was selected within each site, using a strategy similar to that in [9], each site performed classification locally using a support vector machine (SVM) and shared its best classification accuracy, as well as the set of selected features, with the integration server. Then the integration server generated the new weight according to Eq. (2) and sent it back to all sites. From the second iteration, each site performed MSW-LASSO until none of them improved its classification result. In total, these five sites ran MSW-LASSO for six iterations; the classification performance for each round is summarized in Fig. 2 (a-e).

Fig. 2. Applying MSW-LASSO to the data coming from five sites (a-e). Each subfigure shows the classification accuracy (ACC), specificity (SPE) and sensitivity (SEN) at each iteration. (f) shows the improvement in classification accuracy at each site after performing MSW-LASSO.

Though the Stanford and Berlin sites did not show any improvements after the second iteration, the classification performance at the BRCDECC and Dublin sites continued improving until the sixth iteration. Hence our MSW-LASSO terminated at the sixth round. Fig. 2f shows the improvements in classification accuracy for all five sites; the average improvement is 4.9%. The sparsity level of the LASSO is set to 16%, which means that 16% of the 152 features tend to be selected in the LASSO process. Section 3.3 shows the reproducibility of results with different sparsity levels. When conducting SVM classification, the same kernel (RBF) was used, and we performed a grid search over possible parameters. Only the best classification results are adopted.

3.2 Analysis of MSW-LASSO features

In the process of MSW-LASSO, only a new set of features that results in improved classification is accepted. Otherwise, the prior set of features is preserved. The new features are also “recommended” to other sites by increasing the corresponding weights of the new features. Fig. 3 displays the changes of the involved features through six iterations and the top 5 features selected by the majority of sites.

Fig. 3. (a) Number of involved features through six iterations. (b-f) The top five consistently selected features across sites. Within each subfigure, the top shows the locations of the corresponding features and the bottom indicates how many sites selected this feature through the MSW-LASSO process. (b-c) are cortical thickness and (d-f) are surface area measures.

At the first iteration, there are 88 features selected by five sites.
This number decreases over MSW-LASSO iterations. Only 73 features are preserved after six iterations, but the average classification accuracy increased by 4.9%. Moreover, if a feature is originally selected by the majority of sites, it tends to be continually selected after multiple iterations (Fig. 3d-e). Those “promising” features that are accepted by fewer sites at first might be incorporated by more sites as the iterations progress (Fig. 3b-c, f).

3.3 Reproducibility of the MSW-LASSO

For LASSO-related problems, there is no closed-form solution for the selection of sparsity level; this is highly data dependent. To validate our MSW-LASSO model, we repeated Algorithm 1 and Algorithm 2 at different sparsity levels, which leads to preservation of different proportions of the features. The reproducibility performance of our proposed MSW-LASSO is summarized in Table 4.

Selected features   Improvement in ACC (%)   Improvement in SPE (%)   Improvement in SEN (%)
13%                 3.1                      1.8                      4.4
20%                 3.9                      1.4                      6.0
23%                 3.8                      2.9                      4.4
26%                 4.3                      3.4                      5.2
30%                 2.9                      3.0                      2.9
33%                 2.6                      3.1                      2.5
36%                 1.7                      2.1                      1.5
40%                 2.5                      4.1                      1.4
43%                 3.1                      1.1                      5.0
46%                 2.8                      3.9                      1.9

Table 4. Reproducibility results with different sparsity levels. The “Selected features” column gives the percentage of features preserved during the LASSO procedure; the remaining columns give the average improvement in accuracy (ACC), specificity (SPE) and sensitivity (SEN) at that sparsity level.

4 Conclusion and Discussion

Here we proposed a novel multi-site weighted LASSO model to heuristically improve classification performance for multiple sites. By sharing the knowledge of features that might help to improve classification accuracy with other sites, each site has multiple opportunities to reconsider its own set of selected features and strive to increase the accuracy at each iteration. In this study, the average improvement in classification accuracy is 4.9% for five sites.
We offer a proof of concept for distributed machine learning that may be scaled up to other disorders, modalities, and feature sets.

5 References

1. World Health Organization. World Health Organization Depression Fact sheet, No. 369. (2012). Available from: http://www.who.int/mediacentre/factsheets/fs369/en/.
2. Fried, E.I., et al. "Depression is more than the sum score of its parts: individual DSM symptoms have different risk factors." Psych Med. 2067-2076 (2014).
3. Schmaal, L., et al. "Cortical abnormalities in adults and adolescents with major depression based on brain scans from 20 cohorts worldwide in the ENIGMA Major Depressive Disorder Working Group." Mol Psych. doi: 10.1038/mp.2016.60 (2016).
4. Schmaal, L., et al. "Subcortical brain alterations in major depressive disorder: findings from the ENIGMA Major Depressive Disorder working group." Mol Psych. 806-812 (2016).
5. Liao, Y., et al. "Is depression a disconnection syndrome? Meta-analysis of diffusion tensor imaging studies in patients with MDD." J Psych & Neurosci. 49 (2013).
6. Sambataro, F., et al. "Revisiting default mode network function in major depression: evidence for disrupted subsystem connectivity." Psychol Med. 2041-2051 (2014).
7. Lo, A., et al. "Why significant variables aren’t automatically good predictors." PNAS. 13892-13897 (2015).
8. https://surfer.nmr.mgh.harvard.edu/
9. Zhu, D., et al. "Large-scale classification of major depressive disorder via distributed Lasso." Proc. of SPIE, 10160 (2017).
10. Tibshirani, R. "Regression shrinkage and selection via the LASSO." Journal of the Royal Statistical Society. 58: 267-288 (1996).
11. Li, Qingyang, et al. "Large-Scale Collaborative Imaging Genetics Studies of Risk Genetic Factors for Alzheimer’s Disease Across Multiple Institutions." MICCAI. 335-343 (2016).
12. Zou, H. "The adaptive LASSO and its oracle properties." J. Amer. Statist. Assoc. 101(476): 1418-1429 (2006).
13. Koutsouleris, N., et al. "Individualized differential diagnosis of schizophrenia and mood disorders using neuroanatomical biomarkers." Brain, 138(7), 2059-2073 (2015).

* Supported in part by NIH grant U54 EB020403; see ref. 3 for additional support to co-authors for cohort recruitment.
