Domain-specific cues improve robustness of deep learning-based segmentation of CT volumes


Authors: Marie Kloenne, Sebastian Niehaus, Leonie Lampe

Marie Kloenne 1,2,*, Sebastian Niehaus 1,3,*, Leonie Lampe 1, Alberto Merola 1, Janis Reinelt 1, Ingo Roeder 3,4, and Nico Scherf 4,5

1 AICURA medical, Bessemerstrasse 22, 12103 Berlin, Germany, firstname.lastname@aicura-medical.com
2 Technische Fakultät, Universität Bielefeld, Universitätsstrasse 25, 33615 Bielefeld, Germany
3 Institute for Medical Informatics and Biometry, Carl Gustav Carus Faculty of Medicine, Technische Universität Dresden, Fetscherstrasse 74, 01307 Dresden, Germany
4 National Center of Tumor Diseases (NCT), Partner Site Dresden, 01307 Dresden, Germany
5 Max Planck Institute for Human Cognitive and Brain Sciences, Stephanstrasse 1a, 04103 Leipzig, Germany

Abstract. Machine learning has considerably improved medical image analysis in the past years. Although data-driven approaches are intrinsically adaptive and thus generic, they often do not perform the same way on data from different imaging modalities. In particular, computed tomography (CT) data poses many challenges to medical image segmentation based on convolutional neural networks (CNNs), mostly due to the broad dynamic range of intensities and the varying number of recorded slices of CT volumes. In this paper, we address these issues with a framework that combines domain-specific data preprocessing and augmentation with state-of-the-art CNN architectures. The focus is not limited to optimising the score, but also to stabilising the prediction performance, since this is a mandatory requirement for use in automated and semi-automated workflows in the clinical environment. The framework is validated with an architecture comparison to show CNN-architecture-independent effects of our framework functionality.
We compare a modified U-Net and a modified Mixed-Scale Dense Network (MS-D Net) to contrast dilated convolutions for parallel multi-scale processing with the U-Net approach based on traditional scaling operations. Finally, we propose an ensemble model combining the strengths of different individual methods. The framework performs well on a range of tasks such as liver and kidney segmentation, without significant differences in prediction performance on strongly differing volume sizes and varying slice thickness. Thus our framework is an essential step towards performing robust segmentation of unknown real-world samples.

* The authors contributed equally to this paper.

Introduction

Spatial characteristics of tumours like size, shape, location or growth pattern are central clinical features. Changes in these characteristics are essential indicators of disease progression and treatment effects. Automated, quantitative assessment of these characteristics and their changes from radiological images would yield an efficient and objective tool for radiologists to monitor the course of the disease. Thus, a reliable and accurate automated segmentation method is desirable to extract spatial tumour and organ characteristics from computed tomography (CT) volumes. In recent years, convolutional neural networks (CNNs) (Krizhevsky, Sutskever, & Hinton, 2012) became the state-of-the-art method for image segmentation, as well as for many other tasks in computer vision (Voulodimos, Doulamis, Doulamis, & Protopapadakis, 2018), such as image classification, object detection and object tracking (Moen et al., 2019). The applications of CNNs are diverse, but the general data handling and preprocessing are often very similar in each case, since the feature extraction is performed internally by the CNN itself.
Improvements in the application of CNNs for medical image processing often address the neural network architecture, the training algorithm or the use case (Minnemaa et al., 2018; Chlebus, Schenk, Moltz, van Ginneken, & Hahn, 2018). At the same time, most authors tend to ignore the data handling itself, treating medical images such as CT volumes the same way as greyscale or RGB images, just with additional dimensions. However, this approach neglects prior information about the specific physical processes that underlie image acquisition and determine image contrast, possibly leading to suboptimal and sometimes inaccurate image analysis. For instance, while most image formats map pixels on relative scales of a few hundred values, voxels in CT volumes are mapped on the Hounsfield scale (Broder, 2011), a quantitative mapping of radiodensity calibrated such that the value for air is -1000 Hounsfield units (HU) and that for water is 0 HU, with values in the human body reaching up to about 2000 HU (cortical bone). Therefore, in contrast to most standard images, where pixel intensities themselves might not be meaningful, the actual grey values of CT volumes carry tissue-specific information (Brenner, 2007), and special consideration is required to leverage it. The tissue-specific information also means that CT data typically contains a range of values that are not necessarily relevant for a particular diagnostic question (Costelloe et al., 2013; Harris, Adams, Lloyd, & Harvey, 1993). Thus, when radiologists inspect CT volumes for diagnosis, they typically rely on windowing, i.e. they restrict the range of displayed grey values to focus the image information on relevant values. CNN-based image segmentation frameworks rarely include such potentially essential steps from the expert workflow.
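To make the windowing idea concrete, the level/width grey-value mapping radiologists use can be sketched as follows. This is a minimal illustration, not code from the paper; the concrete level and width values in the usage below are illustrative.

```python
def hu_window(hu_values, level, width):
    """Map Hounsfield units to a display window.

    Values below level - width/2 render black (0.0), values above
    level + width/2 render white (1.0); everything in between is
    scaled linearly.  Level/width is the standard radiological idiom
    for restricting the displayed grey-value range.
    """
    lo, hi = level - width / 2, level + width / 2
    return [min(max((v - lo) / (hi - lo), 0.0), 1.0) for v in hu_values]


# Illustrative soft-tissue window: air is crushed to black, dense
# structures to white, soft tissue spread over the full display range.
windowed = hu_window([-1000, 50, 2000], level=50, width=400)
```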
They assume that the data only has to be normalised and that the network will then learn by itself to focus on the relevant image regions. In this paper, we address the challenges of clinically meaningful CT volume processing and present a domain-specific framework for CNN-based image segmentation. The proposed framework is inspired by insights into both the data acquisition process and the diagnostic process performed by the radiologist, addressing in particular the spatial information in CT volumes and the use of the HU scale. Our focus is not on the optimisation of the loss function on the whole dataset, but instead on obtaining a robust segmentation quality, independent of the differences in size and shape of the input volumes. For this reason, we also consider the standard deviation of the Dice score as a measure of robustness for evaluation. If a segmentation model is used in an automated or semi-automated process in which the result of the segmentation is not directly analysed, particularly strong segmentation errors pose a problem, because the user tends to rely on the segmentation model and only analyse the final result of the process. Therefore, our goal is to specifically address the demands of algorithms for CT processing in the clinical environment, where we require algorithms to process each volume consistently and without significant differences in the quality of the output. We evaluated the framework with a mixed-scale dense convolutional neural network (MS-D Net) (Pelt & Sethian, 2017) with dilated convolutions and the nnU-Net (Isensee et al., 2018) with traditional scaling operations, which is a modified U-Net (Cicek, Abdulkadir, Lienkamp, Brox, & Ronneberger, 2015). We consider both a 2D-CNN and a 3D-CNN implementation for each architecture.
Finally, we show an ensemble CNN, which allows combining the longitudinal information leveraged in 3D-CNNs with the proportionally higher value of each segmented voxel in the 2D-CNNs' training process, resulting in more accurate results from a theoretical point of view. The typical assumption behind cross-validation is that the data set is representative of yet-to-be-seen real data, and the test or validation sample should also reflect this. Thus, we would usually balance all folds, so they contain typical samples as well as possible outliers. But we want to assess how robust the trained models are, and thus we do not randomly mix the folds. Instead, we assign each sample to a fold depending on the number and thickness of its slices. This way, we always have samples in the test set that are independent of the training data, and we simulate the worst-case scenario for the application in the clinical environment. In order to make the results reproducible, we use open datasets for training and evaluation. We train and validate the CNN models for kidney tumour segmentation on the dataset of the 2019 Kidney Tumor Segmentation Challenge (Heller et al., 2019). For the liver segmentation, we use the dataset of the CHAOS - Combined (CT-MR) Healthy Abdominal Organ Segmentation Challenge (Selver et al., 2019). It seems like the rise of deep learning methods in medical image analysis has split the community into two factions: those who embrace such methods and those who do not trust them. We think that to apply deep learning in a clinical setting, the CNN architectures and the entire workflow for data processing and augmentation need to be adapted, requiring considerable knowledge of the diagnostic question and the imaging modality at hand.
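The slice-based fold assignment described above could be sketched as follows. This is a hypothetical helper under our own simplification: the paper assigns folds by slice number and thickness, while this sketch sorts by slice count only.

```python
def assign_folds(volumes, n_folds=5):
    """Assign volumes to cross-validation folds by slice count.

    `volumes` maps a volume id to its number of slices.  Instead of
    random shuffling, ids are sorted by slice count and cut into
    contiguous blocks, so each held-out fold contains volume sizes
    unlike those seen in training -- a worst-case simulation of the
    clinical setting.
    """
    ordered = sorted(volumes, key=volumes.get)
    block = -(-len(ordered) // n_folds)  # ceiling division
    return [ordered[i * block:(i + 1) * block] for i in range(n_folds)]
```

With this split, the fold holding the thickest (or thinnest) volumes is never represented in the corresponding training set, which is exactly the robustness stress test the paper argues for.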
In this work, we want to show that in order to build clinically applicable CNN-based frameworks, we require different expertise and input from both technical and medical domain experts.

Method

In the following, we describe the data preprocessing and augmentation, the network architectures and the training procedure. The preprocessing includes volume shape reduction and grey-value windowing. The proposed augmentation addresses the scarcity of data, with the aim of providing additional samples for the training procedure. For the CNN architectures we consider two models: one with dilated convolutions (MS-D Net) and one with traditional scaling operations (U-Net). We further explain the construction of the stacked CNN model. Subsequently, the training procedure for the two considered architectures is described.

Preprocessing and Augmentation

In order to ensure adequate data quality in the training process for each model, we adapt the data preprocessing and augmentation to CT data. The following description of the preprocessing is tailored to the dataset of the KiTS Kidney Tumor Segmentation Challenge (Heller et al., 2019) and the dataset of the CHAOS - Combined (CT-MR) Healthy Abdominal Organ Segmentation Challenge (Selver et al., 2019), but can be applied to any other CT dataset with minor changes.

Image Preprocessing. We adapted the image normalisation from (Isensee et al., 2018) and extended it to make it more general and to better suit real-world applications. To reduce the complexity and optimise the dynamic range, we apply a windowing to each volume by clipping the voxels' grey-value range to a (0.6, 0.99) percentile range that corresponds to the window a radiologist would use for decision-making.
For other segmentation problems, the percentiles must be adjusted to fit the intensity distribution of the relevant body parts (we show examples in Figure 1). We then normalise the windowed data using the z-score, with intensity statistics (mean, standard deviation) computed from only a random sample of the data set. Using the statistical information from the full dataset would be better, but does not reflect the real conditions in a clinical environment. In order to save costs and time and reduce exposure to radiation in CT acquisition, the radiologist typically confines a CT acquisition to the region of interest (ROI) (Figure 2). This ROI is typically defined liberally so as not to miss an area that is potentially relevant to the diagnosis. Thus, in a clinical setting, the number of acquired slices in a CT volume varies considerably. This poses a challenge to the application of standard CNN pipelines, which often assume a regular data sampling. To standardise the data, we decided to reduce each volume to 16 slices, as we do not need to upsample volumes that contain only a few slices. Instead, our method selects slices at random from each volume, and by repeating the sampling process per volume, we also get a simultaneous data augmentation effect.

Fig. 1. Three examples of use-case-oriented windowing: (A) bone-oriented windowing, (B) organ-oriented windowing, (C) lung-oriented windowing. The organ-oriented windowing is applied in this work, while the other two examples would be used for the analysis of abnormalities in the lung or bony structures in CT.

We exclude background slices during the training phase, since these are also not considered in the test phase. We observed that increasing the number of slices did not yield better results, which is consistent with the observation that most CNNs only use a small semantic context for decision making (Hu, Shen, Albanie, Sun, & Wu, 2017; LaLonde & Bagci, 2018).
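The preprocessing steps above (percentile windowing, z-score normalisation from a random subsample, and random reduction to 16 slices) could be sketched as follows. The function names and the nearest-rank percentile computation are our own simplifications, not the paper's code.

```python
import random

def percentile_window(values, lo=0.6, hi=0.99):
    """Clip grey values to a (lo, hi) percentile window, computed by
    nearest rank.  The (0.6, 0.99) default mirrors the organ-oriented
    windowing used in this work."""
    s = sorted(values)
    lo_v = s[int(lo * (len(s) - 1))]
    hi_v = s[int(hi * (len(s) - 1))]
    return [min(max(v, lo_v), hi_v) for v in values]

def zscore(values, sample_fraction=0.1, seed=0):
    """z-score normalisation with mean/std estimated from a random
    subsample only, mimicking the clinically motivated choice of not
    using full-dataset statistics."""
    rng = random.Random(seed)
    n = max(2, int(sample_fraction * len(values)))
    sample = rng.sample(values, n)
    mean = sum(sample) / n
    std = (sum((v - mean) ** 2 for v in sample) / n) ** 0.5 or 1.0
    return [(v - mean) / std for v in values]

def sample_slices(volume, n_slices=16, seed=None):
    """Reduce a volume (a list of 2D slices) to n_slices randomly
    chosen slices, kept in axial order.  Repeating the draw per volume
    doubles as data augmentation."""
    rng = random.Random(seed)
    if len(volume) <= n_slices:
        return list(volume)
    idx = sorted(rng.sample(range(len(volume)), n_slices))
    return [volume[i] for i in idx]
```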
In order to save GPU memory, we downsampled each slice from 512 x 512 voxels to 128 x 128 voxels, as in our experiments larger slice sizes did not yield better segmentation performance.

Image Augmentation. As additional augmentation steps we used image noising with a normally distributed noise map, slice skipping, slice interpolation and a range shift to address potential variation in the CT acquisition process (Figure 2). We further rotated the images by a random angle (maximum of 16 degrees) to simulate the inevitable variability in patient positioning that occurs in clinical routine despite fixation. These augmentation steps should more realistically model the expected data variation when applying the deep learning models in clinical practice.

Fig. 2. CT scanning configuration, which poses challenges to the application of CNNs. The representation above shows the varying slice thickness, which allows mapping the same region of interest to a different number of slices. The representation below shows the varying size of volumes depending on the chosen region of interest.

Architecture

To demonstrate the independence of our preprocessing and augmentation framework from the concrete underlying neural network architecture, we compared two conceptually different CNN models. The first architecture we consider here is a modified version of the widely used U-Net called nnU-Net (Isensee et al., 2018). This architecture extends the original U-Net architecture (Cicek et al., 2015) by replacing batch normalization (Ioffe & Szegedy, 2015) with instance normalization (Ulyanov, Vedaldi, & Lempitsky, 2016) and ReLUs with LeakyReLU units of slope 1e-2 (Maas, Hannun, & Ng, 2013). As the second architecture, we chose the mixed-scale dense convolutional neural network (MS-D Net) (Pelt & Sethian, 2017). We modified it in the same way as the U-Net to remove the influence of the activation function in our comparison.
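The two modifications above (LeakyReLU with slope 1e-2 replacing ReLU, instance normalisation replacing batch normalisation) can be illustrated at the scalar level. Real implementations operate per channel on tensors; this is a didactic sketch only.

```python
def leaky_relu(x, slope=1e-2):
    """LeakyReLU with slope 1e-2: unlike plain ReLU, negative inputs
    keep a small gradient instead of being zeroed."""
    return x if x > 0 else slope * x

def instance_norm(feature_map, eps=1e-5):
    """Instance normalisation over one feature map (here a flat list
    of activations): statistics come from the single instance itself,
    not from the batch, so the result is batch-size independent --
    the property that motivates its use with very small CT batches."""
    mean = sum(feature_map) / len(feature_map)
    var = sum((v - mean) ** 2 for v in feature_map) / len(feature_map)
    return [(v - mean) / (var + eps) ** 0.5 for v in feature_map]
```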
We have chosen these two rather extreme variants of CNNs to compare the traditional down- and upscaling flow with the parallel multi-scale approach using dilated convolutions. In clinical diagnosis, the radiologist locates the tumour and relevant adjacent structures not only by examining the individual slice but also the adjacent slices. Thus, a 3D CNN might seem like the obvious choice in order not to lose the spatial information from the 3D context. However, previous work has clearly shown that 3D segmentation methods perform worse than 2D approaches when the data is anisotropic (Baumgartner, Koch, Pollefeys, & Konukoglu, 2017; Isensee et al., 2017), which is regularly the case in medical imaging. Another reason why medical image segmentation with 3D CNNs often proves challenging is the variable number of slices per volume. The slice number depends on various external factors like the body region under investigation, the diagnostic question, different sizes of the subjects and other trade-offs between data quality, minimal scanning time and radiation exposure. Thus, somewhat counterintuitively, 3D CNNs do not necessarily perform better than 2D versions in many circumstances, and robust models should consider both options. Here, we combined different models into a single, stacked CNN model to leverage the different strengths of each architecture, as ensemble methods have shown superior performance in several detection tasks (Dolz et al., 2017; Kamnitsas et al., 2018; Teramoto, Fujita, Yamamuro, & Tamaki, 2016). For the kidney-tumour segmentation we stacked a set of 3D MS-D Nets trained to classify voxels into kidney and background (without a distinction between healthy kidney tissue and tumour tissue), and a set of 2D nnU-Nets trained to perform classification into healthy tissue, tumour and background.
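One plausible reading of how the two component sets could be combined per voxel is a coarse-mask gating rule, sketched below. Note this fixed rule is our own illustrative assumption: the paper trains a stacked CNN to learn the combination rather than hand-coding it.

```python
def stacked_prediction(coarse_mask, fine_probs):
    """Fuse a binary organ mask (e.g. from the 3D networks) with
    per-voxel class probabilities (e.g. from the 2D networks).

    Voxels outside the coarse mask become background (0); inside it,
    the fine-grained foreground classes (1 = healthy tissue,
    2 = tumour) are decided by argmax.  Illustrative fusion rule only.
    """
    out = []
    for m, probs in zip(coarse_mask, fine_probs):
        if m == 0:
            out.append(0)                 # background by coarse mask
        else:
            fg = probs[1:]                # foreground class scores
            out.append(1 + fg.index(max(fg)))
    return out
```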
For the liver segmentation, both models perform binary classification of voxels into liver and background.

Training

We trained all networks independently from scratch. The overall training procedure shown in Algorithm 1 was implemented in Python with TensorFlow 1.14 and performed on an IBM Power System Accelerated Compute Server (AC922) with two NVIDIA Tesla V100 GPUs. This setup allowed us to parallelise the experiments, but our proposed approach also works on typical systems with an NVIDIA GTX 1080. In each epoch, the volumes of a randomly selected batch are preprocessed and augmented (lines 9-12). We used a batch size of 28 for the 2D networks, while we had to reduce the batch size to 1 (stochastic gradient descent) for the 3D versions of the modified architectures. We used data augmentation in 80 per cent of the training batches for 3D and 90 per cent of the training batches for 2D. We applied the intensity range shift to 20 per cent of the data in both cases. To update the weights θ_i of the neural network function f, we used ADAM optimisation with the parameter configuration proposed in (Kingma & Ba, 2014). Our loss function L (line 16 in Algorithm 1) is a combination of the Tanimoto loss L_Tanimoto and the categorical cross-entropy L_CE, weighted by α = 0.6 and β = 0.4, respectively. The Tanimoto loss is implemented as shown in equation 1, where ŷ ∈ Ŷ denotes the set of predicted voxel-wise annotations and y ∈ Y denotes the set of ground-truth voxel-wise annotations.
Algorithm 1: Training procedure
1: Initialize network f with random weights θ_0
2: Initialize validation data V_validate
3: Initialize batch size n
4: Assume standard deviation σ
5: Select windowing percentile P
6: repeat
7:   repeat
8:     Select random volume v
9:     Windowing(v, P_v)
10:    Normalization(v, σ)
11:    Augmentation of v
12:    Downsampling and slice reduction of v
13:    V_batch ← v
14:  until number of v in V_batch = n
15:  V_batch,ŷ = f(V_batch,x; θ_i)
16:  L_i = L_Tanimoto(V_batch,ŷ, V_batch,y) · α + L_CE(V_batch,ŷ, V_batch,y) · β
17:  θ_{i+1} = ADAM(L_i, θ_i)
18:  L_validation = Validate(f(V_validate,x; θ_{i+1}), V_validate,y)
19: until convergence of L_validation

The advantage of the Tanimoto coefficient is that it treats each class independently and is thus particularly suitable for problems with a high class imbalance, which is typically the case in medical imaging. However, this also leads to a maximum error if a particular class does not occur in the current sample. This effect is attenuated by the smooth factor "smooth". We empirically chose a small smooth of 1e-5. A more detailed discussion is given in (Kayalibay, Jensen, & van der Smagt, 2017).

L_Tanimoto(Ŷ, Y) = 1 - (Ŷ·Y + smooth) / (|Ŷ|² + |Y|² - Ŷ·Y + smooth)   (1)

Evaluation

We compared the augmentation of our framework to the multidimensional image augmentation method from (DeepMind Health Research Team, 2018) implemented in TensorFlow (an illustration of the different experiments is shown in Figure 3). Since the normalisation and the windowing of the CT volume have a strong influence on the cropping and selection of slices, we used the same preprocessing for both augmentation methods. We implemented both CNN architectures in a 2D and a 3D version and evaluated each model in a 5-fold cross-validation.
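The Tanimoto loss of equation 1 (and the Dice score of equation 2, used later for evaluation) could be implemented as follows for flat lists of voxel values, with the smooth factor of 1e-5 chosen in this work. This is a minimal sketch, not the paper's TensorFlow code.

```python
def tanimoto_loss(y_pred, y_true, smooth=1e-5):
    """Tanimoto loss (equation 1) for per-voxel predictions y_pred
    and binary ground truth y_true, given as flat lists.  The smooth
    term keeps the loss finite when a class is absent."""
    inter = sum(p * t for p, t in zip(y_pred, y_true))
    norms = sum(p * p for p in y_pred) + sum(t * t for t in y_true)
    return 1.0 - (inter + smooth) / (norms - inter + smooth)

def dice_score(y_pred, y_true):
    """Dice score (equation 2), using the same notation."""
    inter = sum(p * t for p, t in zip(y_pred, y_true))
    norms = sum(p * p for p in y_pred) + sum(t * t for t in y_true)
    return 2.0 * inter / norms
```

Note that for a perfect binary prediction the Tanimoto loss approaches 0 and the Dice score equals 1, while disjoint predictions drive the loss towards 1.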
To include the influence of edge cases in our validation, we sorted the data according to the number of slices, so the models were always validated on CT volumes that did not occur in the training data set in a similar form. We numerically evaluated the model predictions volume-wise using the Dice score, as shown in equation 2, using the same annotation as in equation 1. We report the resulting scores, averaged over volumes and cross-validation folds, for the kidney tumour segmentation in Table 1 and for the liver segmentation in Table 2.

Fig. 3. Overview of the workflows considered in the experiments. We switch three parts of the workflow: (A) input dimensionality, (B) augmentation toolkit, (C) convolutional neural network. This figure does not include the experiment with the ensemble model.

s_Dice(Ŷ, Y) = 2·Ŷ·Y / (|Ŷ|² + |Y|²)   (2)

The results show that the average prediction performance of models trained with CT-specific image augmentation is on par with the performance of models using multidimensional augmentation. However, the CT-specific preprocessing yields stable results whose standard deviation is an order of magnitude lower than for the state-of-the-art multidimensional approach from (DeepMind Health Research Team, 2018). Our results also confirm the empirical finding that including 3D spatial information in models does not necessarily lead to a better segmentation performance for anisotropic data. Regarding the different architectures, we found varying results.

Table 1. Results for the kidney tumor segmentation: Total Dice scores are reported (mean ± stdv.) for each segmentation class, the different architectures and input dimensionalities (2D and 3D). Each approach is validated with the multidimensional image augmentation (MIA) for TensorFlow and with our CT-specific image augmentation (CTIA).

                        Kidney           Tumor            Total
nnU-Net + MIA 2D        0.962 ± 0.006    0.840 ± 0.013    0.929 ± 0.009
nnU-Net + CTIA 2D       0.961 ± 0.001    0.844 ± 0.007    0.931 ± 0.002
nnU-Net + MIA 3D        0.960 ± 0.012    0.839 ± 0.021    0.929 ± 0.014
nnU-Net + CTIA 3D       0.960 ± 0.002    0.841 ± 0.008    0.925 ± 0.003
MS-D Net + MIA 2D       0.950 ± 0.011    0.774 ± 0.022    0.913 ± 0.014
MS-D Net + CTIA 2D      0.950 ± 0.001    0.779 ± 0.009    0.914 ± 0.003
MS-D Net + MIA 3D       0.947 ± 0.012    0.764 ± 0.024    0.906 ± 0.018
MS-D Net + CTIA 3D      0.948 ± 0.002    0.765 ± 0.009    0.907 ± 0.003
Stacked CNN             0.968 ± 0.001    0.845 ± 0.004    0.947 ± 0.002

For the kidney segmentation task, we found that the 3D MS-D Net shows fewer background errors in binary segmentation. These findings indicate that this multi-scale architecture can detect whole objects very well, but the finer distinction between foreground classes (kidney and tumour tissue) works comparatively poorly. For liver segmentation, we found that the MS-D Net generally led to more segmentation errors. However, the MS-D Net errors are typically independent of the segmentation errors of the U-Net approach. In particular, slices with only small regions of interest (shown in Figure 4) pose a challenge. Since the errors of the MS-D Net are complementary to the errors of the nnU-Net in both cases, a stacked CNN leads to consistently better results, as it can learn to balance the strengths and weaknesses of the different models. Here, we constructed a stacked CNN consisting of a set of 3D MS-D Nets and a set of 2D nnU-Nets trained with CT-specific image augmentation. For each set, we selected the top-5 models based on their validation score in the previous experiment.
The stacked ensemble of neural network predictors consistently delivered the most accurate and stable predictions by combining the different individual strengths of its members (see Tables 1 and 2).

Conclusion

In this work, we propose a robust machine learning framework for medical image segmentation addressing the specific demands of CT images for clinical applications. Our analysis focused on the often neglected influence of preprocessing and data augmentation on segmentation accuracy and stability. We systematically evaluated this framework for two different state-of-the-art CNN architectures with 2D and 3D input data, respectively. In line with previous findings (Baumgartner et al., 2017; Isensee et al., 2017), our results show that 3D spatial information does not necessarily lead to better segmentation performance, in particular concerning detailed, small-scale image structures.

Table 2. Results for liver segmentation: Total Dice score (mean ± stdv.) for the different architectures and input dimensionalities (2D and 3D). Each approach is validated with the multidimensional image augmentation (MIA) for TensorFlow and with our CT-specific image augmentation (CTIA).

                        Total
nnU-Net + MIA 2D        0.974 ± 0.031
nnU-Net + CTIA 2D       0.978 ± 0.001
nnU-Net + MIA 3D        0.941 ± 0.027
nnU-Net + CTIA 3D       0.944 ± 0.014
MS-D Net + MIA 2D       0.961 ± 0.032
MS-D Net + CTIA 2D      0.964 ± 0.002
MS-D Net + MIA 3D       0.942 ± 0.037
MS-D Net + CTIA 3D      0.942 ± 0.004
Stacked CNN             0.980 ± 0.001

Fig. 4. Examples of challenging 2D segmentation cases for liver segmentation (top) and kidney tumour segmentation (bottom).

In our experiments, the kind of segmentation errors varied between neural network models, and we showed that a stacked CNN model combining a top-n selection from each model indeed outperformed all other approaches considered in this work.
Thus, our findings clearly suggest an ensemble approach as an effective way to achieve more robust and thus reliable performance in a routine setting. Most importantly, our work shows that our domain-specific data preprocessing scheme yields highly robust segmentation results, with an order of magnitude lower variation between samples while maintaining the same average segmentation accuracy as the general-purpose approach, independent of the underlying CNN architecture. Existing clinical nephrometry scores have poor predictive power (Heller et al., 2019) and massively reduce the underlying information contained in CT volumes. The improved characterisation of kidney tumours through a more efficient, objective and reliable segmentation should yield better clinical evaluation, better prediction of clinical outcomes, and ultimately better treatment of the underlying pathology. In our view, to pave the way to routine clinical applications of machine learning methods for diagnostic decision support, we must focus on improving the robustness and reliability of our segmentation methods. As a first step, our work addresses fundamental methodological challenges in the automated segmentation of CT volumes for medical use, to yield reliable organ and tumour segmentation.

References

Baumgartner, C., Koch, L., Pollefeys, M., & Konukoglu, E. (2017). An exploration of 2D and 3D deep learning techniques for cardiac MR image segmentation. arXiv:1709.04496 [cs.CV].
Brenner, J. (2007). Computed tomography - an increasing source of radiation exposure. New England Journal of Medicine, 357(22), 2277-2284. doi:10.1056/nejmra072149
Broder, J. (2011). Chapter 9 - Imaging of nontraumatic abdominal conditions. Elsevier.
Chlebus, G., Schenk, A., Moltz, J., van Ginneken, B., & Hahn, H. (2018).
Automatic liver tumor segmentation in CT with fully convolutional neural networks and object-based postprocessing. Nature Scientific Reports, 8.
Cicek, O., Abdulkadir, A., Lienkamp, S., Brox, T., & Ronneberger, O. (2015). 3D U-Net: Learning dense volumetric segmentation from sparse annotation. Medical Image Computing and Computer-Assisted Intervention (MICCAI) - LNCS, 9901.
Costelloe, C. M., Chuang, H., Chasen, B., Pan, T., Fox, P., Bassett, R., & Madewell, J. (2013). Bone windows for distinguishing malignant from benign primary bone tumors on FDG PET/CT. J Cancer, 4(7), 524-530. doi:10.7150/jca.6259
DeepMind Health Research Team. (2018). Multidimensional (2D and 3D) image augmentation for TensorFlow. Retrieved 2019-04-30, from https://github.com/deepmind/multidim-image-augmentation/blob/master/doc/index.md
Dolz, J., Desrosiers, C., Wang, L., Yuan, J., Shen, D., & Ayed, I. (2017). Deep CNN ensembles and suggestive annotations for infant brain MRI segmentation. arXiv:1712.05319 [cs.CV].
Harris, K., Adams, H., Lloyd, D., & Harvey, D. (1993). The effect on apparent size of simulated pulmonary nodules of using three standard CT window settings. Clinical Radiology, 47, 241-244.
Heller, N., Sathianathen, N., Kalapara, A., Walczak, E., Moore, K., Kaluzniak, H., ... Weight, C. (2019). The KiTS19 challenge data: 300 kidney tumor cases with clinical context, CT semantic segmentations. arXiv:1904.00445 [q-bio.QM].
Hu, J., Shen, L., Albanie, S., Sun, G., & Wu, E. (2017). Squeeze-and-excitation networks. arXiv:1709.01507 [cs.CV].
Ioffe, S., & Szegedy, C. (2015). Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv:1502.03167 [cs.LG].
Isensee, F., Jaeger, P., Full, P., Wolf, I., Engelhardt, S., & Maier-Hein, K. (2017).
Automatic cardiac disease assessment on cine-MRI via time-series segmentation and domain specific features. arXiv:1707.00587 [cs.CV].
Isensee, F., Petersen, J., Klein, A., Zimmerer, D., Jaeger, P., Kohl, S., ... Maier-Hein, K. (2018). nnU-Net: Self-adapting framework for U-Net-based medical image segmentation. arXiv:1809.10486 [cs.CV].
Kamnitsas, K., Bai, W., Ferrante, E., McDonagh, S., Sinclair, M., Pawlowski, N., ... Glocker, B. (2018). Ensembles of multiple models and architectures for robust brain tumour segmentation. Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries.
Kayalibay, B., Jensen, G., & van der Smagt, P. (2017). CNN-based segmentation of medical imaging data. arXiv:1701.03056 [cs.CV].
Kingma, D., & Ba, J. (2014). Adam: A method for stochastic optimization. arXiv:1412.6980 [cs.LG].
Krizhevsky, A., Sutskever, I., & Hinton, G. (2012). ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems (NIPS 2012), 25, 1097-1105.
LaLonde, R., & Bagci, U. (2018). Capsules for object segmentation. arXiv:1804.04241 [stat.ML].
Maas, A., Hannun, A., & Ng, A. (2013). Rectifier nonlinearities improve neural network acoustic models. International Conference on Machine Learning (ICML).
Minnemaa, J., van Eijnatten, M., Kouw, W., Diblen, F., Mendrik, A., & Wolff, J. (2018). CT image segmentation of bone for medical additive manufacturing using a convolutional neural network. Computers in Biology and Medicine, 103, 130-139.
Moen, E., Bannon, D., Kudo, T., Graf, W., Covert, M., & Van Valen, D. (2019). Deep learning for cellular image analysis. Nature Methods. doi:10.1038/s41592-019-0403-1
Pelt, D., & Sethian, J. (2017). A mixed-scale dense convolutional neural network for image analysis. PNAS, 115(2), 254-259.
doi:10.1073/pnas.1715832114
Selver, A., Uenal, G., Dicle, O., Gezer, S., Baris, M., Aslan, S., ... Kazaz, E. (2019). CHAOS - Combined (CT-MR) healthy abdominal organ segmentation. In The IEEE International Symposium on Biomedical Imaging (ISBI).
Teramoto, A., Fujita, H., Yamamuro, O., & Tamaki, T. (2016). Automated detection of pulmonary nodules in PET/CT images: Ensemble false-positive reduction using a convolutional neural network technique. Medical Physics, 2821-2827.
Ulyanov, D., Vedaldi, A., & Lempitsky, V. (2016). Instance normalization: The missing ingredient for fast stylization. arXiv:1607.08022 [cs.CV].
Voulodimos, A., Doulamis, N., Doulamis, A., & Protopapadakis, E. (2018). Deep learning for computer vision: A brief review. Computational Intelligence and Neuroscience, 2018. doi:10.1155/2018/7068349
