DeepMRSeg: A convolutional deep neural network for anatomy and abnormality segmentation on MR images
Jimit Doshi*, Guray Erus, Mohamad Habes, Christos Davatzikos

Center for Biomedical Image Computing and Analytics, University of Pennsylvania, Philadelphia, PA, USA

Abstract

Segmentation has been a major task in neuroimaging. A large number of automated methods have been developed for segmenting healthy and diseased brain tissues. In recent years, deep learning techniques have attracted a lot of attention as a result of their high accuracy in different segmentation problems. We present a new deep learning based segmentation method, DeepMRSeg, that can be applied in a generic way to a variety of segmentation tasks. The proposed architecture combines recent advances in the fields of biomedical image segmentation and computer vision. We use a modified UNet architecture that takes advantage of multiple convolution filter sizes to achieve multi-scale feature extraction adaptive to the desired segmentation task. Importantly, our method operates on minimally processed raw MRI scans. We validated our method on a wide range of segmentation tasks, including white matter lesion segmentation, segmentation of deep brain structures and hippocampus segmentation. We provide code and pre-trained models to allow researchers to apply our method on their own datasets.

Keywords: MRI, Segmentation, Deep Learning, Convolutional Neural Network, White Matter Lesions

*Corresponding author. Address: University of Pennsylvania, Richards Building, 3700 Hamilton Walk, 7th Floor, Philadelphia, PA 19104. Tel: +1 215 662 7362. E-mail: jimit.doshi@pennmedicine.upenn.edu

Preprint submitted to ArXiv, July 5, 2019

1. Introduction

Segmentation has been a major task in medical image analysis since the early years of the field, as it enables quantification of normal and abnormal anatomical regions, both for individual assessments and for comparative group analyses (Gonzalez-Villa et al., 2016). In neuroimaging, multiple automated methods have been developed for various problems, such as brain extraction, segmentation of anatomical regions of interest (ROIs), white matter lesion (WML) segmentation and segmentation of brain tumor sub-regions. Importantly, each of these problems has its own specific challenges, mainly due to variations in image modalities and in the imaging signatures that best characterize target regions. These variations have motivated the development of a large number of distinct task-specific segmentation methods (Kalavathi and Prasath, 2016; Anbeek et al., 2004; Eugenio Iglesias and Sabuncu, 2014; Gordillo et al., 2013; Despotovic et al., 2015).

Machine learning has played a key role in enabling novel methods that achieved accuracy comparable to, or surpassing, human raters. In the commonly used supervised learning framework, examples with ground-truth labels are presented to the learning algorithm in order to construct a model that learns the imaging patterns characterizing the target segmentations. The model is then applied to new scans to segment the target areas on them. Common supervised learning techniques, such as support vector machines, have obtained very promising results. However, they require a number of sophisticated preprocessing and feature elimination/selection steps, making them vulnerable to the negative effects of scanner variations and limiting their widespread usage in clinical settings.
In recent years, deep learning techniques have attracted a lot of attention as a result of their state-of-the-art performance in multiple major problems in computer vision and image analysis (Guo et al., 2018). In neuroimaging, convolutional neural networks (CNNs) have started to gain popularity, with successful applications on various image recognition tasks (Kamnitsas et al., 2016; Akkus et al., 2017; Anwar et al., 2018). CNNs are deep neural networks designed to take advantage of the 2D or 3D structure of the input images. An input image passes through a series of convolution layers followed by pooling layers, which act together as filters to extract multiple translation-invariant local features, without the need for the manual feature engineering traditionally required.

In this paper we present DeepMRSeg, a deep learning based segmentation method that can be applied in a generic way to a variety of segmentation tasks. Our method uses a modified version of the UNet architecture (Ronneberger et al., 2015), with ResNet (He et al., 2015) and modified Inception-ResNet-A (Szegedy et al., 2016) blocks in the encoding and decoding paths, taking advantage of recent advances in biomedical image segmentation and image classification. The residual connections allow the UNet architecture to learn residual mappings, while multiple branches of convolutional layers with different kernel sizes allow the network to learn multi-scale features that are adaptive to the segmentation task at hand. Also, we replace the max-pool operations in the UNet architecture with 1x1 convolutional filters to prevent the loss of fine boundary details that are important for a segmentation task. Importantly, our method operates on minimally processed raw MRI scans and can be directly applied to different segmentation problems after training a model with the specific type of training data.
We validated our method on a wide range of segmentation tasks, including WML segmentation, segmentation of deep brain structures, and hippocampus segmentation. The DeepMRSeg method and trained models are provided on our online platform, IPP (https://ipp.cbica.upenn.edu/), to allow users to apply our methods and models without the need to install any software packages.

2. Related Work

The UNet architecture, introduced by Ronneberger et al. (2015), has been an important advancement in the application of deep CNNs to the problem of biomedical image segmentation. CNNs were initially used on classification problems, mapping input images to output class labels. However, in segmentation tasks the desired output is an image, e.g. a binary segmentation map. UNet extended the CNN architecture by supplementing the usual contracting or encoding path with a symmetric expanding or decoding path, where pooling operators are replaced by upsampling operators. The encoding path allows the architecture to learn spatially relevant context information, and the decoding path adds precise localization back into the architecture, leading to a final segmentation image as the output of the model.

A straightforward way to improve the performance of deep neural networks is to increase the network size, either by increasing the depth (number of layers) or the width (number of units at each layer). The ResNet architecture (He et al., 2015) was introduced to address an important limitation of deep learning known as the degradation problem: as a network goes deeper, the gradient of the error function used for the weight updates may vanish, resulting in degrading accuracy. The main idea in ResNet is to address this problem by learning a "residual mapping" instead of directly learning the desired underlying mapping. For this purpose, shortcut connections that skip one or more layers are introduced.
These shortcut connections are identity mappings that simply add their outputs to the outputs of the stacked convolutional layers, thus making the layers learn the residual. ResNet allowed training deeper networks, while providing significantly faster convergence.

Large convolution operations are computationally expensive, as the network will have a larger number of parameters to learn. The Inception framework (Szegedy et al., 2014) was proposed to overcome this limitation by introducing sparsity in the network architecture. The main idea is to constrain the network to lower dimensions and group the units into highly correlated filter groups. The authors used different convolutional filter sizes (1x1, 3x3, 5x5) instead of a single filter size (3x3), allowing the network to learn different representations of the input image. This strategy helped to reduce the number of parameters/connections, thus reducing the complexity of the network and allowing it to go wider while keeping the computational budget constant.

As noted by Krizhevsky et al. (2012), when convolutional filters are arranged in different groups, the network can learn distinct features from each group, with low correlation between the features learned across groups. This was demonstrated in AlexNet, where the network consistently identified color-agnostic and color-specific features in different filter groups. The same concept also applies to the Inception network: through the use of different convolutional filters at a single layer, the network learns feature representations at different resolution levels.

3. Network Architecture

3.1. Overview

An overview of the proposed architecture is illustrated in figure 1. DeepMRSeg is built upon components that combine ideas from recent advances in the field, as described in the previous section.
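Before detailing the architecture, the two borrowed ideas reviewed above, residual shortcuts and multi-branch convolutions with several kernel sizes, can be illustrated with a minimal 1-D NumPy sketch. The kernels below are arbitrary placeholders for learned filters, not the network's actual weights:

```python
import numpy as np

def conv1d(x, w):
    # 'same'-padded 1-D convolution, a stand-in for a convolution layer.
    return np.convolve(x, w, mode="same")

def residual_multibranch(x, kernels):
    # Each branch filters the input with a different kernel size (the
    # Inception idea); the branch outputs are summed and added back onto
    # the identity shortcut (the ResNet idea), so the branches only need
    # to learn a residual correction to the input.
    residual = sum(conv1d(x, w) for w in kernels)
    return x + residual

x = np.ones(8)
kernels = [np.array([1.0]),              # 1x1-like branch
           np.array([0.25, 0.5, 0.25])]  # 3x3-like branch
y = residual_multibranch(x, kernels)
print(y.shape)  # (8,)
```

If all branch weights were zero, the block would reduce to the identity mapping, which is exactly why residual blocks are easier to optimize as networks grow deeper.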
The network architecture consists of an encoding path and a corresponding decoding path, as in UNet (Ronneberger et al., 2015), followed by a voxel-wise, multi-class softmax classifier that produces class probabilities for each voxel independently. An initial projection layer transforms the input feature maps (m) into the desired number of features (f). These features go through a pre-encoding block, consisting of ResNet blocks that extract various features from the input images, and form the input to the UNet. The encoding path of the network consists of encoder blocks that operate at different feature map resolutions. At each layer, the feature maps are subsampled using the "transition down" operation and fed into a ResInc block. The size of the feature maps decreases at each layer, while their receptive field increases, thus encoding more context information into the network. The decoding pathway includes up-sampling operations symmetric to the encoding blocks, coupled with ResInc blocks. Individual components of the DeepMRSeg architecture are explained in more detail below.

Figure 1: Overview of the DeepMRSeg network architecture.

3.2. Projection Layer

This layer is used at the start of the network to project the input image channels, or modalities (m), to a set of feature maps (f). This is accomplished with a convolution layer with a kernel size of 1x1 and stride 1. Intuitively, this layer learns a linear combination of the input channels and projects it onto the desired number of feature maps required by subsequent layers. The convolution in this layer is followed by batch normalization (Ioffe and Szegedy, 2015) and ReLU activation (Nair and Hinton, 2010).

3.3. ResNet Module

Following the general design principles described in (Szegedy et al., 2015), we avoid representational bottlenecks with extreme compression early in the network.
To achieve this, we add the traditional ResNet block (fig. 2.A) before the encoding pathway to ensure that the input features go through a few layers of convolutions before being fed into the ResInc block.

Figure 2: The ResNet and ResInc block architectures. The input data is the set of "f" feature maps obtained from the output of the previous layer.

3.4. ResInc Module

The ResInc module (fig. 2.B), modified from the Inception-ResNet-A module of the Inception-ResNet-v2 architecture, is used at every level of the UNet architecture, coupled with the "Transition Down" operation. This module splits the incoming input feature maps into 4 branches and an identity mapping that is added back to the final output. Each branch reduces the input dimensionality from "f" feature maps to "f/4" feature maps. This introduces a bottleneck by reducing the dimensionality of the incoming features, and also reduces the number of learnable parameters in the network. Each branch subsequently transforms the input features with a varying number of 3x3 convolution layers. This ensures that each branch learns a different representation of the input features, capturing shallow as well as deeper features, and allows the subsequent layer to abstract features from different scales simultaneously. This property can be extremely useful when dealing with segmentation tasks for more complex or heterogeneous structures.

The 4 branches are concatenated to form a single feature vector that then goes through a 1x1 convolution layer before the residual connection is added. This 1x1 convolution acts as a linear weighting of the features learned in each branch. Thus, if a certain representation is not useful, it can be assigned a lower weight.

This configuration of the ResInc module has less than one-third the number of learnable parameters of the traditional ResNet block.
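The parameter saving can be checked with a rough weights-only count. The specific branch depths (0 to 3 stacked 3x3 convolutions) and the two-convolution ResNet block are our illustrative assumptions, not exact specifications from the paper, and bias/batch-norm parameters are ignored:

```python
# Rough parameter count: ResNet block vs. ResInc module (weights only).
def resnet_block_params(f):
    # Assumed ResNet block: two 3x3 convolutions, f -> f feature maps each.
    return 2 * (3 * 3 * f * f)

def resinc_block_params(f, branch_depths=(0, 1, 2, 3)):
    r = f // 4  # per-branch bottleneck width (f/4 feature maps)
    params = 0
    for depth in branch_depths:
        params += 1 * 1 * f * r            # 1x1 reduction, f -> f/4
        params += depth * (3 * 3 * r * r)  # stacked 3x3 convs, f/4 -> f/4
    params += 1 * 1 * f * f                # final 1x1 after concatenation
    return params

f = 64
print(resnet_block_params(f), resinc_block_params(f))   # 73728 22016
print(resinc_block_params(f) / resnet_block_params(f))  # ~0.30, i.e. < 1/3
```

Under these assumptions the ratio comes out just under one-third, consistent with the claim above.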
This reduced parameter count allows us to increase the width and depth of the network while keeping the number of learnable parameters low.

3.5. Transition Down Blocks

In the traditional UNet architecture, a maxpool layer is used to reduce the dimensionality of the feature maps. This allows the subsequent convolutional layers to have a larger field of view and therefore more contextual information. This operation is used to achieve translation invariance over small spatial shifts in the input image. Adding several such operations in the network achieves more translation invariance for robust classification, but it can also lead to a loss of spatial resolution (boundary detail) in the input feature maps. This lossy representation of the boundary details is not desirable for segmentation tasks where boundary delineation is important.

Motivated by the work of Springenberg et al. (2014), we have replaced the maxpool layer with a 1x1 convolution layer with stride 2, with the intuition that we let the network learn the parameters required to downsample the feature maps. This adds a few more learnable parameters to the network, resulting in a larger model size compared to maxpooling. However, while maxpooling subsamples within each feature map independently, the proposed convolution operation also allows modeling inter-feature dependencies.

The choice of kernel size was based on minimizing the number of learnable parameters outside of blocks containing residual connections. This operation simultaneously serves the two tasks of subsampling the feature maps and doubling the number of feature maps. It is followed by batch normalization and ReLU activation.

3.6. Transition Up Blocks

For upsampling the feature maps, we use a transposed convolution layer with a kernel size of 1 and stride 2.
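The transition-down operation of Section 3.5 can be sketched in NumPy as follows. The weights are random placeholders (in the network they are learned), and the channel-doubling convention follows the description above:

```python
import numpy as np

def transition_down(x, w):
    # x: feature maps with shape (f, H, W); w: 1x1-conv weights (2f, f).
    # A 1x1 convolution with stride 2 subsamples each feature map while
    # mixing information across channels, unlike max-pooling, which acts
    # on each feature map independently.
    sub = x[:, ::2, ::2]                     # stride-2 spatial subsampling
    return np.einsum("of,fhw->ohw", w, sub)  # 1x1 channel mixing, f -> 2f

rng = np.random.default_rng(0)
f, H, W = 8, 16, 16
x = rng.standard_normal((f, H, W))
w = rng.standard_normal((2 * f, f))
y = transition_down(x, w)
print(y.shape)  # (16, 8, 8): spatial size halved, feature maps doubled
```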
Upsampling the feature maps allows the addition of the skip connection from the corresponding layer of the encoding path. These skip connections are essential, as they add the spatial localization information back into the network. Along with upsampling, the number of output feature maps is reduced to one-fourth.

3.7. Training Methodology

The cost function to be minimized was a combination of softmax cross-entropy (l_CE), mean squared error (l_MSE) and soft Intersection over Union (l_IOU). l_MSE and l_IOU were calculated between the one-hot encoded labels and the predicted softmax probabilities. The final loss, l_TOT, was an equal-weighted sum of l_CE, l_MSE and l_IOU. The Adam optimizer was used to minimize the loss function (l_TOT). We started with a learning rate of 0.05 and used a decayed learning rate schedule with a decay factor of 0.98 every epoch. Data augmentation was done using randomized left/right flipping, followed by random translation, rotation and brightness/contrast adjustment.

3.8. Evaluation Datasets

We performed validation experiments on 3 different segmentation problems, specifically WML, mid-brain and hippocampus segmentation. We used publicly available datasets with ground-truth labels for each problem.

WML segmentation: We used the training dataset that was provided as part of the MICCAI 2017 WML Segmentation Challenge (Kuijf et al., 2019). This dataset included 3D T1-weighted and 2D multi-slice FLAIR scans from 60 subjects, and manually delineated WML masks for these scans. The MRI scans were acquired at three different institutes/scanners: the University Medical Center (UMC), Utrecht, the VU University Medical Centre (VU), Amsterdam, and the National University Health System (NUHS), Singapore. Manual segmentations were generated by one expert rater, following the STandards for ReportIng Vascular changes on nEuroimaging (STRIVE) protocol (Wardlaw et al., 2013) (Figure 3).

Figure 3: Example ground-truth segmentation for the WML.

Deep brain segmentation: We applied DeepMRSeg for segmentation of deep brain structures using the publicly available dataset from the MICCAI 2013 segmentation challenge (Asman et al., 2013). This dataset included T1-weighted scans of 35 subjects from the OASIS project and corresponding manually created reference labels for 14 deep brain structures. The target regions of interest included the accumbens, amygdala, caudate, hippocampus, pallidum, putamen and thalamus, separately for the left and the right hemispheres (fig 4).

Figure 4: Example ground-truth segmentation for deep brain structures.

Hippocampus segmentation: We also applied DeepMRSeg for segmenting the hippocampus, a structure critical in learning and memory, and particularly vulnerable to damage in early stages of AD (Mu and Gage, 2011). We used the dataset provided as part of the Decathlon challenge, consisting of 3D T1-weighted MPRAGE scans of 195 subjects and manual hippocampus segmentations into two sub-regions, hippocampus tail and body (Simpson et al., 2019) (Figure 5).

Figure 5: Example ground-truth segmentation for the hippocampus.

4. Evaluation Experiments and Metrics

We compared DeepMRSeg against a modified UNet architecture in which a batch normalization layer was added between the convolution and ReLU layers. This was used as the benchmark method against which the proposed architecture was compared. The loss function, optimizer and data augmentation were the same as those used for the proposed architecture. The two network models were trained using the appropriate type of labeled data for each specific segmentation task. We performed four-fold cross-validation in all experiments, with 25% of the data left out for testing and the remaining data used for training and validation.
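The combined loss of Section 3.7 can be sketched in NumPy on one-hot labels and softmax probabilities. The equal weighting follows the text; the exact soft-IoU formulation is our assumption for illustration:

```python
import numpy as np

def total_loss(onehot, prob, eps=1e-7):
    # onehot: (N, C) one-hot labels; prob: (N, C) softmax probabilities.
    l_ce = -np.mean(np.sum(onehot * np.log(prob + eps), axis=1))  # cross-entropy
    l_mse = np.mean((onehot - prob) ** 2)                         # mean squared error
    inter = np.sum(onehot * prob)
    union = np.sum(onehot) + np.sum(prob) - inter
    l_iou = 1.0 - inter / (union + eps)   # soft IoU loss (assumed formulation)
    return l_ce + l_mse + l_iou           # equal-weighted sum, l_TOT

onehot = np.array([[1.0, 0.0], [0.0, 1.0]])
perfect = np.array([[1.0, 0.0], [0.0, 1.0]])
print(total_loss(onehot, perfect))  # ≈ 0: all three terms vanish
```

Because l_MSE and the soft IoU operate directly on the softmax probabilities, they remain differentiable, which is what makes them usable as training losses alongside cross-entropy.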
This was repeated 20 times with randomization, giving us a robust estimate of the performance of the networks.

We quantified the performance of the networks using three complementary metrics that are commonly used for measuring the overlap between binary segmentation masks. We report the F1 score (also known as the Dice coefficient), the F2 score and the balanced accuracy between automated and expert-delineated ground-truth segmentations. We calculated the scores individually for each subject and target ROI, and we report the mean and standard deviation of the scores across all subjects.

In neuroimaging analyses, rather than binary segmentation masks, total volumes of segmented regions are often used as primary variables of interest. For this reason, we also calculated metrics to evaluate the accuracy of total volume estimations from the segmentations. We used the concordance correlation coefficient (ρ_c), a metric that measures the agreement between two variables and that is commonly applied for evaluating reproducibility or inter-rater consistency. We report the ρ_c between automated and ground-truth segmentation volumes for all subjects.

The metrics used in our evaluations are briefly described below. The F1 score or Dice coefficient (Dice, 1945) is a spatial overlap statistic used to gauge the similarity of two segmentations. It is defined as:

DSC = F1 = 2|X ∩ Y| / (|X| + |Y|) = 2TP / (2TP + FP + FN)

where X and Y are the predicted and ground-truth labels, and TP, TN, FP and FN are the true positive, true negative, false positive and false negative rates, respectively.

The F2 score is commonly used in applications where recall is more important than precision (as compared to F1):

F2 = 5TP / (5TP + 4FN + FP)

Considering that our target datasets may typically be imbalanced, i.e.
the foreground (target segmentation) may be much smaller than the background, we also report the balanced accuracy (BACC), which is defined as:

BACC = (TPR + TNR) / 2

where TPR = TP / (TP + FN) and TNR = TN / (TN + FP).

The concordance correlation coefficient is defined as:

ρ_c = 2 ρ σ_x σ_y / (σ_x² + σ_y² + (µ_x − µ_y)²)

where µ_x and µ_y are the means and σ_x and σ_y are the standard deviations of the two variables, and ρ is the Pearson correlation coefficient between them.

5. Experimental Results

5.1. White matter lesion segmentation

The distributions of the balanced accuracy, F1 and F2 scores for the segmentations obtained using UNet and DeepMRSeg are shown in figure 6. DeepMRSeg obtained a significantly better balanced accuracy and F2 score. The mean Dice (F1) score for both methods was similar, with no significant differences. DeepMRSeg also obtained a significantly higher ρ_c score (Table 1).

Table 1: Scores for the 4 evaluation metrics for segmentation of WML using UNet and DeepMRSeg models. For the three overlap metrics, BACC, F1 and F2, we report mean and standard deviation across all subjects.

ROI | BACC UNet | BACC DeepMRSeg | F1 UNet | F1 DeepMRSeg | F2 UNet | F2 DeepMRSeg | ρ_c UNet | ρ_c DeepMRSeg
WML | 0.876 (0.06) | 0.889 (0.06) | 0.768 (0.10) | 0.765 (0.10) | 0.759 (0.10) | 0.769 (0.10) | 0.956 | 0.962

Figure 6: Distribution of scores for the 3 evaluation metrics for segmentation of WML using UNet and DeepMRSeg models.

5.2. Mid-brain segmentation

The UNet and DeepMRSeg networks were applied for segmenting each scan into 14 target ROIs. We calculated evaluation scores for each ROI independently. The distributions of overlap scores over all ROIs and subjects are shown in figure 7. DeepMRSeg obtained a significantly higher balanced accuracy for each ROI independently, as well as on average across all ROIs. DeepMRSeg also obtained a higher ρ_c for all ROIs except the left caudate and the left and right pallidum (table 2).
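The four evaluation metrics defined above can be computed directly from flattened binary masks and volume series; a NumPy sketch on toy data:

```python
import numpy as np

def overlap_scores(pred, gt):
    # pred, gt: flattened binary masks (0/1 arrays).
    tp = np.sum((pred == 1) & (gt == 1))
    tn = np.sum((pred == 0) & (gt == 0))
    fp = np.sum((pred == 1) & (gt == 0))
    fn = np.sum((pred == 0) & (gt == 1))
    f1 = 2 * tp / (2 * tp + fp + fn)              # Dice coefficient
    f2 = 5 * tp / (5 * tp + 4 * fn + fp)          # recall-weighted F-score
    bacc = (tp / (tp + fn) + tn / (tn + fp)) / 2  # balanced accuracy
    return f1, f2, bacc

def rho_c(x, y):
    # Concordance correlation coefficient between two volume series.
    mx, my = np.mean(x), np.mean(y)
    vx, vy = np.var(x), np.var(y)
    cov = np.mean((x - mx) * (y - my))
    return 2 * cov / (vx + vy + (mx - my) ** 2)

gt = np.array([1, 1, 1, 0, 0, 0, 0, 0])
pred = np.array([1, 1, 0, 1, 0, 0, 0, 0])
print(overlap_scores(pred, gt))  # (0.666..., 0.666..., 0.733...)
```

For identical volume series, rho_c returns exactly 1, matching its interpretation as perfect agreement.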
Figure 7: Distribution of scores for the 3 evaluation metrics for segmentation of deep brain structures using UNet and DeepMRSeg models.

Table 2: Scores for the 4 evaluation metrics for segmentation of deep brain structures using UNet and DeepMRSeg models. For the three overlap metrics, BACC, F1 and F2, we report mean and standard deviation across all subjects.

ROI | BACC UNet | BACC DeepMRSeg | F1 UNet | F1 DeepMRSeg | F2 UNet | F2 DeepMRSeg | ρ_c UNet | ρ_c DeepMRSeg
Right Accumbens Area | 0.867 (0.05) | 0.894 (0.05) | 0.762 (0.05) | 0.765 (0.07) | 0.743 (0.07) | 0.777 (0.09) | 0.682 | 0.764
Left Accumbens Area | 0.855 (0.05) | 0.896 (0.05) | 0.755 (0.06) | 0.775 (0.06) | 0.725 (0.08) | 0.784 (0.08) | 0.565 | 0.808
Right Amygdala | 0.845 (0.05) | 0.88 (0.03) | 0.751 (0.06) | 0.782 (0.04) | 0.712 (0.08) | 0.767 (0.05) | 0.266 | 0.466
Left Amygdala | 0.855 (0.04) | 0.889 (0.03) | 0.764 (0.05) | 0.798 (0.04) | 0.73 (0.07) | 0.785 (0.05) | 0.349 | 0.592
Right Caudate | 0.928 (0.04) | 0.945 (0.05) | 0.883 (0.06) | 0.893 (0.07) | 0.866 (0.08) | 0.89 (0.08) | 0.802 | 0.854
Left Caudate | 0.943 (0.02) | 0.948 (0.03) | 0.891 (0.04) | 0.897 (0.05) | 0.887 (0.04) | 0.896 (0.05) | 0.912 | 0.900
Right Hippocampus | 0.892 (0.03) | 0.924 (0.03) | 0.833 (0.03) | 0.858 (0.03) | 0.802 (0.05) | 0.852 (0.04) | 0.425 | 0.736
Left Hippocampus | 0.893 (0.03) | 0.921 (0.03) | 0.832 (0.03) | 0.856 (0.03) | 0.803 (0.05) | 0.847 (0.04) | 0.396 | 0.662
Right Pallidum | 0.916 (0.04) | 0.952 (0.03) | 0.854 (0.05) | 0.859 (0.04) | 0.84 (0.06) | 0.886 (0.04) | 0.778 | 0.687
Left Pallidum | 0.927 (0.05) | 0.955 (0.03) | 0.857 (0.06) | 0.856 (0.04) | 0.854 (0.08) | 0.886 (0.04) | 0.674 | 0.473
Right Putamen | 0.941 (0.02) | 0.953 (0.02) | 0.901 (0.03) | 0.908 (0.03) | 0.889 (0.04) | 0.906 (0.04) | 0.863 | 0.899
Left Putamen | 0.939 (0.03) | 0.955 (0.03) | 0.899 (0.04) | 0.907 (0.04) | 0.886 (0.05) | 0.908 (0.05) | 0.815 | 0.894
Right Thalamus Proper | 0.936 (0.02) | 0.959 (0.02) | 0.9 (0.02) | 0.914 (0.01) | 0.883 (0.04) | 0.916 (0.02) | 0.752 | 0.906
Left Thalamus Proper | 0.946 (0.02) | 0.954 (0.02) | 0.906 (0.01) | 0.912 (0.01) | 0.898 (0.03) | 0.91 (0.02) | 0.849 | 0.888
Average | 0.906 (0.02) | 0.93 (0.02) | 0.842 (0.03) | 0.856 (0.03) | 0.823 (0.04) | 0.858 (0.04) | 0.987 | 0.993

5.3. Hippocampus segmentation

The hippocampus was segmented into two sub-regions using UNet and DeepMRSeg. We calculated overlap scores for each sub-region independently. DeepMRSeg obtained a significantly better accuracy for both hippocampus sub-regions (Figure 8 and table 3).

Table 3: Scores for the 4 evaluation metrics for segmentation of hippocampus sub-regions using UNet and DeepMRSeg models. For the three overlap metrics, BACC, F1 and F2, we report mean and standard deviation across all subjects.

ROI | BACC UNet | BACC DeepMRSeg | F1 UNet | F1 DeepMRSeg | F2 UNet | F2 DeepMRSeg | ρ_c UNet | ρ_c DeepMRSeg
Anterior | 0.917 (0.03) | 0.925 (0.03) | 0.866 (0.04) | 0.869 (0.04) | 0.848 (0.06) | 0.858 (0.05) | 0.765 | 0.800
Posterior | 0.908 (0.03) | 0.920 (0.03) | 0.849 (0.05) | 0.858 (0.04) | 0.830 (0.06) | 0.847 (0.05) | 0.624 | 0.734
Average | 0.913 (0.03) | 0.922 (0.02) | 0.857 (0.04) | 0.862 (0.04) | 0.839 (0.05) | 0.852 (0.04) | 0.726 | 0.786

Figure 8: Distribution of scores for the 3 evaluation metrics for segmentation of hippocampus sub-regions using UNet and DeepMRSeg models.

6. Conclusions

We presented a novel deep learning based MRI segmentation method that combines elements from recent major advances in the field. The proposed network architecture was built with two main motivations: providing a generic tool that can be used for different types of segmentation problems, rather than being specific to a single type of target label; and allowing a wide range of users to easily segment their images directly from raw T1 scans, without the need for any pre-processing steps. In our validation experiments, we showed that DeepMRSeg outperformed a standard UNet implementation used as a benchmark on three different segmentation tasks, achieving highly accurate segmentations in all tasks. We provide code and pre-trained models that can be used to apply DeepMRSeg to new scans.
References

Akkus, Z., Galimzianova, A., Hoogi, A., Rubin, D. L., Erickson, B. J., 2017. Deep learning for brain MRI segmentation: State of the art and future directions. Journal of Digital Imaging 30 (4), 449–459.

Anbeek, P., Vincken, K. L., van Osch, M. J., Bisschops, R. H., van der Grond, J., 2004. Probabilistic segmentation of white matter lesions in MR imaging. NeuroImage 21 (3), 1037–1044.

Anwar, S. M., Majid, M., Qayyum, A., Awais, M., Alnowami, M., Khan, M. K., 2018. Medical image analysis using convolutional neural networks: A review. J. Med. Syst. 42 (11), 1–13. URL https://doi.org/10.1007/s10916-018-1088-1

Asman, A., Akhondi-Asl, A., Wang, H., Tustison, N., Avants, B., Warfield, S. K., Landman, B., 2013. MICCAI 2013 segmentation algorithms, theory and applications (SATA) challenge results summary. In: MICCAI Challenge Workshop on Segmentation: Algorithms, Theory and Applications (SATA).

Despotovic, I., Goossens, B., Philips, W., 2015. MRI segmentation of the human brain: Challenges, methods, and applications. Comp. Math. Methods in Medicine 2015, 450341:1–450341:23.

Dice, L. R., 1945. Measures of the amount of ecologic association between species. Ecology 26 (3), 297–302.

Eugenio Iglesias, J., Sabuncu, M., 2014. Multi-atlas segmentation of biomedical images: A survey. Medical Image Analysis 24.

Gonzalez-Villa, S., Oliver, A., Valverde, S., Wang, L., Zwiggelaar, R., Llado, X., 2016. A review on brain structures segmentation in magnetic resonance imaging. Artificial Intelligence in Medicine 73, 45–69.

Gordillo, N., Montseny, E., Sobrevilla, P., 2013. State of the art survey on MRI brain tumor segmentation. Magnetic Resonance Imaging 31 (8), 1426–1438.

Guo, Y., Liu, Y., Georgiou, T., Lew, M. S., 2018. A review of semantic segmentation using deep neural networks. International Journal of Multimedia Information Retrieval 7 (2), 87–93.

He, K., Zhang, X., Ren, S., Sun, J., 2015. Deep residual learning for image recognition. arXiv preprint abs/1512.03385.

Ioffe, S., Szegedy, C., 2015. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint abs/1502.03167.

Kalavathi, P., Prasath, V., 2016. Methods on skull stripping of MRI head scan images: a review. J Digit Imaging 29 (3), 365–379.

Kamnitsas, K., Ledig, C., Newcombe, V. F. J., Simpson, J. P., Kane, A. D., Menon, D. K., Rueckert, D., Glocker, B., 2016. Efficient multi-scale 3D CNN with fully connected CRF for accurate brain lesion segmentation. arXiv preprint abs/1603.05959.

Krizhevsky, A., Sutskever, I., Hinton, G. E., 2012. ImageNet classification with deep convolutional neural networks. In: Pereira, F., Burges, C. J. C., Bottou, L., Weinberger, K. Q. (Eds.), Advances in Neural Information Processing Systems 25. Curran Associates, Inc., pp. 1097–1105.

Kuijf, H. J., Biesbroek, J. M., de Bresser, J., Heinen, R., Andermatt, S., Bento, M., Berseth, M., Belyaev, M., Cardoso, M. J., Casamitjana, A., Collins, D. L., Dadar, M., Georgiou, A., Ghafoorian, M., Jin, D., Khademi, A., Knight, J., Li, H., Llado, X., Luna, M., Mahmood, Q., McKinley, R., Mehrtash, A., Ourselin, S., Park, B., Park, H., Park, S. H., Pezold, S., Puybareau, E., Rittner, L., Sudre, C. H., Valverde, S., Vilaplana, V., Wiest, R., Xu, Y., Xu, Z., Zeng, G., Zhang, J., Zheng, G., Chen, C., van der Flier, W., Barkhof, F., Viergever, M. A., Biessels, G. J., 2019. Standardized assessment of automatic segmentation of white matter hyperintensities; results of the WMH segmentation challenge. IEEE Transactions on Medical Imaging, 1–1.

Mu, Y., Gage, F. H., 2011. Adult hippocampal neurogenesis and its role in Alzheimer's disease. In: Molecular Neurodegeneration.

Nair, V., Hinton, G. E., 2010. Rectified linear units improve restricted Boltzmann machines. In: Proceedings of the 27th International Conference on International Conference on Machine Learning. ICML'10. Omnipress, USA, pp. 807–814.

Ronneberger, O., Fischer, P., Brox, T., 2015. U-Net: Convolutional networks for biomedical image segmentation. arXiv preprint.

Simpson, A. L., Antonelli, M., Bakas, S., Bilello, M., Farahani, K., van Ginneken, B., Kopp-Schneider, A., Landman, B. A., Litjens, G. J. S., Menze, B. H., Ronneberger, O., Summers, R. M., Bilic, P., Christ, P. F., Do, R. K. G., Gollub, M., Golia-Pernicka, J., Heckers, S., Jarnagin, W. R., McHugo, M., Napel, S., Vorontsov, E., Maier-Hein, L., Cardoso, M. J., 2019. A large annotated medical image dataset for the development and evaluation of segmentation algorithms. arXiv preprint abs/1902.09063.

Springenberg, J. T., Dosovitskiy, A., Brox, T., Riedmiller, M., 2014. Striving for simplicity: The all convolutional net. arXiv preprint.

Szegedy, C., Ioffe, S., Vanhoucke, V., 2016. Inception-v4, Inception-ResNet and the impact of residual connections on learning. arXiv preprint abs/1602.07261.

Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S. E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A., 2014. Going deeper with convolutions. arXiv preprint abs/1409.4842.

Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z., 2015. Rethinking the Inception architecture for computer vision. arXiv preprint abs/1512.00567.

Wardlaw, J., Smith, E., Biessels, G. J., Cordonnier, C., Fazekas, F., Frayne, R., Lindley, R., O'Brien, J., Barkhof, F., Benavente, O. R., Black, S. E., Brayne, C., Breteler, M., Chabriat, H., DeCarli, C., De Leeuw, F.-E., Doubal, F., Duering, M., Fox, N. C., STandards for ReportIng Vascular changes on nEuroimaging (STRIVE), 2013. Neuroimaging standards for research into small vessel disease and its contribution to ageing and neurodegeneration. The Lancet Neurology 12, 822–838.