Dilated deeply supervised networks for hippocampus segmentation in MRI

Lukas Folle¹, Sulaiman Vesal¹, Nishant Ravikumar¹, Andreas Maier¹
¹ Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany
lukas.folle@fau.de

Abstract. Tissue loss in the hippocampi has been heavily correlated with the progression of Alzheimer's Disease (AD). The shape and structure of the hippocampus are important factors in terms of early AD diagnosis and prognosis by clinicians. However, manual segmentation of such subcortical structures in MR studies is a challenging and subjective task. In this paper, we investigate variants of the well known 3D U-Net, a type of convolutional neural network (CNN) for semantic segmentation tasks. We propose an alternative form of the 3D U-Net, which uses dilated convolutions and deep supervision to incorporate multi-scale information into the model. The proposed method is evaluated on the task of hippocampus head and body segmentation in an MRI dataset, provided as part of the MICCAI 2018 segmentation decathlon challenge. The experimental results show that our approach outperforms other conventional methods in terms of different segmentation accuracy metrics.

1 Introduction

Neurodegenerative brain disorders are a major cause of disability and early mortality in many developed and developing countries worldwide. Alzheimer's disease is a type of dementia that affects 20% of the population over 80 years of age, worldwide [1]. Currently, AD is typically only diagnosed in patients presenting with symptoms of cognitive impairment and behavioural changes [2]. With high-resolution MRI, structural changes in the brain which accompany the onset of AD can be recognized in vivo [3].
Early disease stages classified as mild cognitive impairment, which occur prior to AD, can also be identified in some cases, and the associated structural changes within the brain can subsequently be used as biomarkers to predict the risk of conversion to AD. Additionally, the rate of tissue atrophy of the hippocampus can be used as a temporal marker to monitor the progression of AD. The current clinical protocol to detect volumetric changes in the hippocampus is manual segmentation, which is time-consuming, observer-dependent and challenging [2]. Consequently, an automated approach to hippocampus segmentation is imperative to improve the efficiency and accuracy of the diagnostic workflow. Several automatic and semi-automatic segmentation approaches have been proposed which utilize T1-weighted structural MRIs to segment the hippocampus. A multi-atlas segmentation approach was proposed in [4] to jointly localize and segment the hippocampi using the average of all registered atlases. In [5], a robust segmentation approach was proposed, using subject-specific 3D optimal local maps with a hybrid active contour model to automatically segment the hippocampus.

[Footnote to the title: Supported by the EFI project: BIG-THERA.]

In recent years, convolutional neural networks (CNNs) have achieved state-of-the-art performance in a variety of medical image segmentation tasks. Specifically, the U-Net [6], an encoder-decoder network, has received tremendous attention within the medical imaging community. Expanding the U-Net to process 3D volumes rather than 2D slices was proposed in [7] using 3D convolutions (3D U-Net). This was modified in [8] by increasing the channels in the center part of the network (V-Net). In this paper, we propose a CNN for automatic segmentation of the hippocampus.
Our network is based on the 3D U-Net, with dilated convolutions in the lowest layer between the encoder and decoder paths [9], residual connections between the convolution blocks in the encoder path, and residual connections between the convolution blocks and the final layer of the decoder path. A schematic of the network is presented in Fig. 1. The main contribution of this paper is the combination of dilated convolutions in the lowest part of the network with the ensemble of the decoder outputs for the final prediction, providing a mechanism for 'deep supervision'. We evaluated the performance of the network using the hippocampus dataset provided as part of the Medical Segmentation Decathlon challenge (http://medicaldecathlon.com/) hosted at MICCAI 2018, and compared it to different 3D U-Net based architectures.

Fig. 1. Network architecture with residual connections in the encoder path, dilated convolutions at the lowest layer and residual connections between the decoder stages and the final layer.

2 Methodology

Segmentation tasks often require integration of local and global context, in addition to learning multi-scale features. However, training segmentation networks that incorporate these properties and act directly on volumetric data is computationally intensive. We address this by including dilated convolutions within the network to imbue greater global context during feature extraction, and by combining the output of the decoder layers for the final mask prediction, thereby encouraging the learning of multi-scale features while providing a means for efficient backpropagation of gradients through the network. Beyond that, this modification yields the benefit of residual connections to the decoding part while retaining the same number of model parameters.
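To make the "greater global context" argument concrete, the following minimal sketch computes the per-axis receptive field of a stack of stride-1 dilated convolutions. It assumes four sequentially stacked layers with kernel size 3 and dilation rates 1, 2, 4 and 8, which is one reading of the bottleneck configuration described in this paper; the function name `receptive_field` is hypothetical and the exact wiring of the layers in the authors' network may differ.

```python
def receptive_field(kernel_size, dilation_rates):
    """Per-axis receptive field of sequentially stacked, stride-1 dilated convolutions.

    Each layer with dilation rate d widens the receptive field by (kernel_size - 1) * d.
    """
    rf = 1
    for rate in dilation_rates:
        rf += (kernel_size - 1) * rate
    return rf


# Assumed bottleneck configuration: dilation rate doubles from layer to layer.
dilated = receptive_field(3, [1, 2, 4, 8])
# Same depth without dilation, for comparison.
plain = receptive_field(3, [1, 1, 1, 1])

print(dilated, plain)  # 31 9
```

Under these assumptions, four dilated layers see 31 voxels along each axis instead of 9, with the same number of weights per layer, which is why dilation is an inexpensive way to add context at the coarsest resolution.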
The proposed network consists of four encoder and decoder blocks, each containing two 3D convolution layers with a kernel size of 3×3×3, batch normalization and leaky rectified linear units (leaky ReLU) as activation functions. The encoder blocks additionally use residual connections and 3D max-pooling operations, as illustrated in Fig. 1. The decoder blocks use 3D up-sampling with a factor of two. The four dilated convolution layers employed in the bottleneck of the network are configured such that the first layer uses a dilation rate of one, and each subsequent layer increases the dilation rate by a factor of two, as proposed in [9]. The output of each decoder block is up-sampled to match the dimensions of the final mask predicted by the network, following which they are all concatenated.

2.1 Data acquisition

Images from 263 subjects were provided as part of the Medical Decathlon challenge 2018, for hippocampus head and body segmentation. The subjects were scanned with a T1-weighted MPRAGE sequence (TI / TR / TE = 860 / 8.0 / 3.7 ms) and manually annotated with the left and right, anterior and posterior, hippocampus by Vanderbilt University Medical Center. We split the dataset such that 90% was used for training and validating the network, via nine-fold cross-validation, and 10% was used for testing. As the data provided was already truncated to the region of interest around the hippocampus, very little data pre-processing was necessary. Z-score normalization based on the mean and standard deviation of the intensities was applied to each patient scan.

2.2 Training procedures

Our model is trained from scratch and evaluated using the Dice similarity coefficient (DSC), Jaccard index (JI) and normalized surface distance (NSD). DSC and JI measure the overlap of the ground-truth and model-predicted segmentations, while NSD is computed between the reconstructed surfaces.
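As an illustration of the two overlap metrics named above, here is a minimal NumPy sketch for a single binary mask pair. The function name `dice_and_jaccard`, the handling of empty masks and the toy volumes are assumptions made for this example and are not taken from the challenge's evaluation code; for the two-label task (head and body) it would be applied to each label separately. NSD is omitted because it requires surface extraction.

```python
import numpy as np


def dice_and_jaccard(pred, truth):
    """Return (DSC, JI) for two binary masks of identical shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    intersection = np.logical_and(pred, truth).sum()
    total = pred.sum() + truth.sum()
    if total == 0:
        # Both masks empty: treated as perfect agreement (a convention assumed here).
        return 1.0, 1.0
    dsc = 2.0 * intersection / total
    ji = intersection / (total - intersection)  # |A and B| / |A or B|
    return float(dsc), float(ji)


# Toy 3D example (synthetic data, not from the paper).
truth = np.zeros((4, 4, 4), dtype=bool)
truth[1:3, 1:3, 1:4] = True   # 12 voxels
pred = np.zeros((4, 4, 4), dtype=bool)
pred[1:3, 1:3, 1:3] = True    # 8 voxels, all inside the ground truth

dsc, ji = dice_and_jaccard(pred, truth)
print(f"DSC = {dsc:.3f}, JI = {ji:.3f}")  # DSC = 0.800, JI = 0.667
```

For a single mask pair the two scores are linked by JI = DSC / (2 - DSC), so they always rank individual cases identically; averages over many cases need not satisfy this relation exactly.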
DSC, JI and NSD were also the official metrics used to assess segmentation accuracy in the decathlon challenge. The Dice coefficient loss is widely used for training segmentation networks [8]. We used a combination of binary cross entropy and DSC loss functions to train all networks investigated in this study, as proposed in [9]. This combined loss (Eq. 1) is less sensitive to class imbalance and leverages the advantages of both loss functions. Our experiments demonstrated better segmentation accuracy when using the combined loss in comparison to employing either individually.

$$\zeta(y, \hat{y}) = \zeta_{dc}(y, \hat{y}) + \zeta_{bce}(y, \hat{y}) \qquad (1)$$

In Eq. (1), $\hat{y}$ denotes the output of the model and $y$ denotes the ground truth labels. We use the two-class version of the DSC loss $\zeta_{dc}(y, \hat{y})$ proposed in [8, 9] and the Adam optimizer with a learning rate of 0.0005, and trained the network for 500 epochs. Additionally, the learning rate was reduced by a factor of 0.1 whenever the validation loss did not improve after 10 epochs. To prevent overfitting and improve the robustness of our approach to varied hippocampal shapes, we augmented the dataset with random rotations and flipping. In our experiments, augmenting with large rotation angles produced worse segmentation masks; consequently, we restricted the rotation angles to the range of ±10 degrees.

3 Results and Discussion

In order to assess the performance of the different networks, we used the Dice similarity coefficient (DSC), Jaccard index (JI) and normalized surface distance (NSD) with a 4 mm tolerance. The segmentation performance of our model, V-Net, 3D U-Net and 3D U-Net with dilated convolutions is compared in Table 1. The V-Net achieved mean DSC scores of 96.8%, 87.2% and 84.8% for the training, validation and test sets, respectively.
The performance of the 3D U-Net is close to the V-Net performance, with mean DSC scores of 96.5%, 85.5% and 86.5%, respectively. The 3D U-Net with dilated convolutions improved the scores to 97.7%, 87.8% and 87.9%, respectively. The proposed approach outperformed all of these, with scores of 98.4%, 88.2% and 88.2% for the training, validation and test sets, respectively. Additionally, our approach consistently outperformed the other state-of-the-art networks in terms of the JI and NSD metrics as well, as highlighted in Table 1.

In Fig. 2 the segmentation quality of the proposed method is visually compared with V-Net, 3D U-Net, and 3D U-Net with dilated convolutions. Here, red represents the ground-truth, while yellow, green and cyan represent the predictions of V-Net, dilated 3D U-Net and the proposed method, respectively. In the second column, the advantage of dilated convolutions is highlighted in comparison to the V-Net, which failed to segment the small disjoint parts of the mask in the top right. The dilated 3D U-Net and the proposed method, however, were able to capture those areas due to the increased global context imbued in the learned features.

Fig. 2. Each image represents a different MRI slice from a different patient. The corresponding segmentations are overlaid: red contour represents ground-truth, yellow V-Net, green 3D U-Net with dilated convolutions and cyan our proposed method.

Table 1. Segmentation accuracy evaluated in terms of DSC, JI and NSD for the V-Net, 3D U-Net, dilated 3D U-Net and the proposed method.

Method                 Training DSC   Validation DSC   Testing DSC   Testing JI   Testing NSD
V-Net                  0.968          0.872            0.848         0.736        0.954
3D U-Net               0.965          0.858            0.865         0.740        0.960
3D U-Net + Dilation    0.977          0.878            0.879         0.785        0.960
Proposed method        0.984          0.882            0.882         0.790        0.962

Fig. 3 depicts 3D surface meshes of two different patients. Columns two and three illustrate the outputs of V-Net and our method, respectively. The lower boundary of the red part (body) of the hippocampus in the ground-truth surfaces contains ridge-like structures which are typical of hippocampal structure. While the V-Net predicted surfaces are relatively smooth in this region, the proposed approach is more successful in capturing these subtle shape variations.

Fig. 3. Rows represent 3D surface visualizations for two different patients. Columns from left to right are the ground truth surfaces, and those predicted by the V-Net and our approach, respectively. Hippocampus head is visualized in green and body in red. Panels: Test case 1: (a) Ground truth, (b) V-Net, (c) Proposed method; Test case 2: (d) Ground truth, (e) V-Net, (f) Proposed method.

4 Conclusion

We proposed a 3D U-Net based segmentation framework with dilated convolutions in the deepest part of the network and deep supervision in the decoder part of the network. The dilated convolutions capture global context due to their larger receptive fields. Deep supervision helped further improve segmentation accuracy by incorporating multi-scale information more efficiently during the training process. We showed that our network consistently outperforms the V-Net, 3D U-Net, and 3D U-Net with dilated convolutions in terms of all metrics evaluated. Future work will aim to use the proposed framework for segmentation in whole brain MRI volumes, and on different segmentation tasks in medical imaging.

References

1. Ferri C, Prince M, Brayne C, Brodaty H, Fratiglioni L, Ganguli M, et al. Global prevalence of dementia: A Delphi consensus study. The Lancet. 2005 12;366(9503):2112–2117.
2. Hampel H, Teipel SJ, Buerger K. Neurobiologische Frühdiagnostik der Alzheimer-Krankheit. Der Nervenarzt. 2007;78:1310–1318.
3. Schuff N, Woerner N, Boreta L, Kornfield T, Shaw L, Trojanowski J, et al. MRI of hippocampal volume loss in early Alzheimer's disease in relation to ApoE genotype and biomarkers. Brain. 2009;132(4):1067–1077.
4. Plassard AJ, McHugo M, Heckers S, Landman BA. Multi-scale hippocampal parcellation improves atlas-based segmentation accuracy. In: Medical Imaging 2017: Image Processing. vol. 10133. International Society for Optics and Photonics; 2017. p. 101332D.
5. Zarpalas D, Gkontra P, Daras P, Maglaveras N. Accurate and Fully Automatic Hippocampus Segmentation Using Subject-Specific 3D Optimal Local Maps Into a Hybrid Active Contour Model. IEEE Journal of Translational Engineering in Health and Medicine. 2014;2:1–16.
6. Ronneberger O, Fischer P, Brox T. U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical image computing and computer-assisted intervention. Springer; 2015. p. 234–241.
7. Çiçek Ö, Abdulkadir A, Lienkamp SS, Brox T, Ronneberger O. 3D U-Net: Learning Dense Volumetric Segmentation from Sparse Annotation. In: Ourselin S, Joskowicz L, Sabuncu MR, Unal G, Wells W, editors. Medical Image Computing and Computer-Assisted Intervention (MICCAI). vol. 9901 of LNCS. Springer; 2016. p. 424–432.
8. Milletari F, Navab N, Ahmadi SA. V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation. In: 2016 Fourth International Conference on 3D Vision (3DV); 2016. p. 565–571.
9. Vesal S, Ravikumar N, Maier A. Dilated Convolutions in Neural Networks for Left Atrial Segmentation in 3D Gadolinium Enhanced-MRI. arXiv preprint arXiv:1808.01673. 2018.
