Deep Transfer Learning Methods for Colon Cancer Classification in Confocal Laser Microscopy Images

Nils Gessert (1,∗), Marcel Bengs (1,∗), Lukas Wittig (2), Daniel Drömann (2), Tobias Keck (3), Alexander Schlaefer (1), David B. Ellebrecht (3)

Preprint. Accepted for publication in IJCARS.

Abstract

Purpose The gold standard for colorectal cancer metastases detection in the peritoneum is histological evaluation of a removed tissue sample. For feedback during interventions, real-time in-vivo imaging with confocal laser microscopy has been proposed for differentiation of benign and malignant tissue by manual expert evaluation. Automatic image classification could improve the surgical workflow further by providing immediate feedback.

Methods We analyze the feasibility of classifying tissue from confocal laser microscopy in the colon and peritoneum. For this purpose, we adopt both classical and state-of-the-art convolutional neural networks to learn directly from the images. As the available dataset is small, we investigate several transfer learning strategies, including partial freezing variants and full fine-tuning. We address the distinction of different tissue types, as well as of benign and malignant tissue.

Results We present a thorough analysis of transfer learning strategies for colorectal cancer with confocal laser microscopy. In the peritoneum, metastases are classified with an AUC of 97.1%, and in the colon, the primary tumor is classified with an AUC of 73.1%. In general, transfer learning substantially improves performance over training from scratch. We find that the optimal transfer learning strategy differs for models and classification tasks.

Conclusions We demonstrate that convolutional neural networks and transfer learning can be used to identify cancer tissue with confocal laser microscopy.
We show that there is no generally optimal transfer learning strategy and that model- and task-specific engineering is required. Given the high performance for the peritoneum, even with a small dataset, application for intraoperative decision support could be feasible.

Keywords Colon Cancer · Confocal Laser Microscopy · Transfer Learning · Convolutional Neural Network

Nils Gessert, E-mail: nils.gessert@tuhh.de, Tel.: +49 (0)40 42878 3389, https://orcid.org/0000-0001-6325-5092
∗ Authors contributed equally
1 Institute of Medical Technology, Hamburg University of Technology, Hamburg, Germany
2 Department of Pulmology, University Medical Centre Schleswig-Holstein, Lübeck, Germany
3 Department of Surgery, University Medical Centre Schleswig-Holstein, Lübeck, Germany

1 Introduction

Colorectal cancer is very common and is often associated with metastatic spread [1]. In particular, peritoneal carcinomatosis (PC) can arise in later stages of development, which often shortens patient survival times substantially [2, 3]. Thus, early and reliable detection of metastases is crucial. Diagnosing PC with typical external imaging techniques such as computed tomography (CT) and magnetic resonance imaging (MRI) is difficult, as a very high resolution is required. For example, preoperative CT has been shown to be ineffective at detecting individual peritoneal tumor deposits, and the interobserver variability among experts was significant [4]. Also, integrated PET/CT did not provide sufficient information for accurate assessment [5]. For MRI, studies have shown improvement over assessment with CT only [6, 7], but overall, its resolution is still a limitation [8]. Therefore, exploratory laparoscopy is generally employed to investigate the presence of PC [9].
Recently, a new intraoperative device using confocal laser microscopy (CLM) has been introduced, which provides submicrometer image resolution [10]. In that study, ten rats received colon carcinoma cell implants in the colon and peritoneum. After a growth period, laparotomy with in-vivo CLM was performed. CLM images of healthy and malignant colon tissue, as well as of healthy and malignant peritoneum, were acquired. It was shown that experts are able to distinguish different tissue types, as well as healthy and malignant tissue, from CLM. This raises the question whether image processing techniques can be used to classify different tissue types automatically, which could enable faster and improved intraoperative decision support with CLM.

Recently, automatic tissue characterization has been successfully addressed using deep learning methods such as convolutional neural networks (CNNs) for semantic segmentation and classification [11, 12]. For example, skin cancer classification at dermatologist-level performance was achieved [13]. However, the datasets for this and related studies are large, whereas datasets for medical learning tasks are commonly small [14]. This can be problematic, as insufficient training data might lead to overfitting and limited generalization. This is particularly important for deep learning models, which can be prone to overfitting due to their large number of trainable parameters. To overcome this issue, transfer learning methods have been proposed where a deep learning model is first pretrained on a different, large dataset [15]. Then, information from the source domain can be transferred to the (medical) target domain using strategies such as "off-the-shelf" features, partial layer freezing, or full fine-tuning [16].
While this has been successfully applied to medical learning tasks [17], there is no single solution for all problems, and the optimal transfer learning strategy is highly dependent on the imaging modality and dataset size [18].

Automatic analysis of CLM images has been proposed for different tissue types such as human skin [19], the cornea [20] or the oral cavity [21]. Recently, deep learning methods have been applied to CLM and similar modalities. For example, CNNs have been used for oral squamous cell carcinoma classification [21] and for motion artifact detection with CLM [22]. Similarly, skin images from CLM have been used for CNN-based classification [23]. For the gastrointestinal tract, CNNs have been used to distinguish three classes of Barrett's esophagus [24]. Also, brain tumor classification with CNNs and CLM has shown promising results [25]. For example, a CNN has been used to differentiate CLM images with and without diagnostic value for a physician during surgery [26]. Also, weakly-supervised localization has been used to derive local information in CLM images from image-level labels only [27].

So far, deep learning-based classification of colorectal cancer from CLM images has not been addressed. Also, while several approaches have used CLM and CNNs for other problems [28], there is no analysis of transfer learning properties for colorectal cancer with CLM. Therefore, we study deep learning-based colon cancer classification from CLM images with a variety of transfer learning methods from the ImageNet dataset. We consider training from scratch, partial layer freezing, "off-the-shelf" features and full fine-tuning to investigate how transferable ImageNet features are to CLM.
We perform this study with the classic models VGG-16 [29] and Inception-V3 [30], as well as the state-of-the-art architectures Densenet [31] and squeeze-and-excitation networks [32], to analyze the consistency of transfer strategies across architectures. We consider the classes healthy colon (HC), malignant colon (MC), healthy peritoneum (HP) and malignant peritoneum (MP). Based on these classes, we address three binary classification tasks with CLM. First, we consider the differentiation of organs (HP vs. HC). Then, we study the detection of malignant tissue in two types of organs (HP vs. MP and HC vs. MC). This allows us to study variations across different classification tasks for CLM.

A preliminary version of this paper was presented at the BVM Workshop 2019 [33]. We substantially revised the paper, extended the review of the literature and performed more experiments with additional transfer strategies and more architectures.

This paper is structured as follows. First, we describe our models, our transfer learning strategies and the dataset we use in Section 2. Then, we report our results in Section 3 and discuss them in Section 4. Last, we conclude in Section 5.

2 Methods

2.1 Model Architectures and Training Strategies

First, we consider the classic model VGG-16 [29] with the addition of batch normalization, which enables faster training of the architecture by reducing the internal covariate shift [34]. The model itself is simple, as it consists of several stacked convolutional layers without further architectural additions. Between blocks of two to three convolutional layers with a kernel size of 3 × 3, max pooling reduces the spatial dimensions, and the subsequent convolutions double the number of feature maps. A building block of the architecture is shown in Figure 1 (top left). Due to its simple structure, the architecture can serve as a baseline.

Second, we employ Inception-V3 [30].
The model consists of multiple Inception blocks, which follow two core design principles. First, the blocks have a multi-path structure, i.e., the input feature maps are processed in parallel by different convolution and pooling operations. At the block's output, the feature maps from all paths are concatenated.

[Figure 1 panels: (a) VGG-16 block, (b) Inception-V3 block, (c) Densenet121 DenseBlock, (d) SE-Resnext50 block]
Fig. 1: The building blocks of the models we use, corresponding to the model blocks indicated in Figure 2. We employ VGG-16 (a), Inception-V3 (b), Densenet121 (c) and SE-Resnext50 (d). F denotes the number of feature maps in each block. The Conv blocks also contain ReLU activations and batch normalization for VGG-16 (a). SE-Resnext50 is shown in simplified form without its bottleneck in the SE module. FC-σ is a fully-connected layer with sigmoid activation. C. is an abbreviation for convolutional layers. Note that Inception-V3 employs multiple block variants and we show one example.

Second, the convolutional paths perform a reduction operation that downsizes the feature map dimension with 1 × 1 kernels. Then, computationally more expensive 3 × 3 convolutions process the lower-dimensional representations. The number of output feature maps is increased if the spatial dimensions are reduced inside the block, which avoids representational bottlenecks. The idea of reduction and expansion has also found its way into the Resnet architecture [35], which is a core component of the next two models. Resnets learn a residual instead of a full feature transformation by using skip connections.
In detail, a Resnet block (ResBlock) computes

$$x^{(l)} = a\left(F\left(x^{(l-1)}, \theta^{(l)}\right) + x^{(l-1)}\right) \qquad (1)$$

where $x^{(l)}$ is the block output, $x^{(l-1)}$ is the block input, $a$ is a ReLU activation [36] and $F$ represents two convolutional layers with parameters $\theta^{(l)}$. The skip connection enables better gradient propagation for improved training.

Third, we consider Densenet121 [31], a state-of-the-art architecture that strives for more efficiency by introducing extensive feature reuse. In particular, within one DenseBlock, features computed in previous layers are also fed into the subsequent layers. To keep the feature map sizes moderate, compression blocks reduce the feature maps between DenseBlocks. The DenseBlock is shown in Figure 1 (bottom left).

Fourth, we adopt the architecture SE-Resnext50 [32]. At its core, the model uses Resnext blocks [37], which are an extension of Resnet. Here, the single convolutional path $F$ is split into multiple paths with individual layers, which increases representational power. The key addition in SE-Resnext50 is the use of squeeze-and-excitation (SE) modules, which recalibrate the feature maps learned by the Resnext blocks. These modules have shown improved performance with only a minimal increase in the number of parameters. The concept is shown in Figure 1 (bottom right).

[Figure 2 diagram: input crop → Model Block 1 … Model Block N−1 → Model Block N → FC layer (C = 2), shown for the scenarios Retrain Classifier, Partial Freeze 1, Partial Freeze 2 and Full Fine-Tuning]
Fig. 2: The different transfer learning scenarios we investigate. Model Block refers to one of the blocks shown in Figure 1. Green indicates that blocks are retrained. Red indicates that blocks are frozen with their weights having been trained on ImageNet.
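Figure 2 fixes which blocks are retrained in each scenario. The following framework-agnostic Python sketch expresses the five training types, which the next paragraph describes, as a per-block "trainable" mask. It is not the authors' implementation. The block count and both partial-freezing cut points are hypothetical defaults chosen for illustration, including the assumption that Partial Freeze 1 keeps more blocks frozen than Partial Freeze 2. In PyTorch, such a mask would typically be applied by calling `requires_grad_(False)` on each frozen block module. Only parameters that still require gradients would then be passed to `torch.optim.Adam`.

```python
# Hypothetical sketch (not the authors' code): which of a model's N pretrained
# blocks are updated in each training scenario. The new two-output
# fully-connected classifier is always trained and is not part of the mask.

def trainable_blocks(num_blocks, strategy, freeze_1=None, freeze_2=None):
    """Return one boolean per model block: True if the block is retrained.

    freeze_1 / freeze_2 are the numbers of leading blocks kept frozen in the two
    partial-freezing variants. The defaults are assumptions for illustration;
    the text only states that the cut points are chosen block-wise.
    """
    if freeze_1 is None:
        freeze_1 = (2 * num_blocks) // 3  # assumption: Freeze 1 keeps ~2/3 frozen
    if freeze_2 is None:
        freeze_2 = num_blocks // 3        # assumption: Freeze 2 keeps ~1/3 frozen
    frozen_prefix = {
        "retrain_classifier": num_blocks,  # "off-the-shelf" features: all blocks frozen
        "partial_freeze_1": freeze_1,
        "partial_freeze_2": freeze_2,
        "full_fine_tuning": 0,             # pretrained init, every block updated
        "from_scratch": 0,                 # random init, every block updated
    }[strategy]
    if not 0 <= frozen_prefix <= num_blocks:
        raise ValueError("freeze point must lie between 0 and num_blocks")
    # Blocks before the cut point stay frozen; the part closer to the classifier is retrained
    return [i >= frozen_prefix for i in range(num_blocks)]


if __name__ == "__main__":
    for name in ("retrain_classifier", "partial_freeze_1", "partial_freeze_2",
                 "full_fine_tuning", "from_scratch"):
        mask = trainable_blocks(6, name)
        print(f"{name:20s}", "".join("T" if m else "-" for m in mask))
```

Full fine-tuning and training from scratch produce the same mask and differ only in how the weights are initialized. This matches the note in Figure 6 that their training times are equivalent.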
Due to the small dataset size, we study several transfer learning strategies where the above-mentioned models are pretrained on ImageNet. We cut off the last layer of all models and replace it with a fully-connected layer with two outputs for binary classification. We apply a softmax layer on top, and the final classification output is the class with the highest probability. We train a separate model for each of our binary classification tasks. As a baseline, we consider training from scratch, i.e., all weights are randomly initialized. Then, we use several different transfer learning strategies, illustrated in Figure 2. The first transfer approach follows the "off-the-shelf" features idea. Here, only the new classifier is retrained on features extracted by the pretrained CNN. We also consider two partial freezing methods, where an initial part of the network remains frozen and the part closer to the classifier is retrained. We chose the freezing points block-wise, i.e., we do not cut into building blocks. Last, we consider full fine-tuning, where all weights in the network are retrained with a small learning rate. The different strategies represent different levels of abstraction at which features are transferred between ImageNet and CLM images.

To further improve generalization, we employ online data augmentation with random image flipping and random changes in brightness and contrast. Furthermore, we use random cropping with crops of size 224 × 224 (299 × 299 for Inception-V3) taken from the full images of size 384 × 384. We use the Adam algorithm for optimization. We adapt the learning rates and the number of training epochs for the different transfer scenarios. We use a cross-entropy loss function with additional weighting to account for the slight class imbalance.
In detail, we multiply the loss of a training example by $N/n_i$, where $N$ is the total number of training examples in the current fold and $n_i$ is the number of examples belonging to class $i$ in the current fold. In this way, underrepresented classes receive a higher weighting in the loss function. During evaluation, we use multi-crop evaluation with $N_c = 36$ evenly spread crops over the images. This ensures that all image regions are covered, with large overlaps between crops. The final predictions are averaged over the $N_c$ crops. We implement our models in PyTorch.

Fig. 3: Examples for the different classes. Malignant colon tissue, healthy colon tissue, malignant peritoneum tissue and healthy peritoneum tissue are shown from left to right.

2.2 Dataset and Experiments

The dataset was collected in a previous study conducted at the University Hospital Schleswig-Holstein in Lübeck, where expert assessment of CLM images in the colon area was evaluated [10]. A custom intraoperative device with integrated CLM (Karl Storz GmbH & Co KG, Tuttlingen, Germany) was built. The image resolution was 384 × 384 pixels, which covers a field of view of 300 µm × 300 µm. In the study, ten rats received colon adenocarcinoma cell implantation in the colon and peritoneum with a growth time of seven days. Then, laparotomy was conducted, and images of healthy colon tissue, malignant colon tissue, healthy peritoneum tissue and malignant peritoneum tissue were obtained. Example CLM images for each tissue type are shown in Figure 3. After removal of low-quality images, 1577 images remained, with 533 belonging to class HC, 309 to class MC, 343 to class HP and 392 to class MP. Note that some subjects are missing classes, such that, on average, six subjects per class remain.
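Using the class counts reported above, the loss weighting $N/n_i$ and the multi-crop evaluation from Section 2.1 can be illustrated with a short, standard-library-only Python sketch. It rests on two stated assumptions. First, the counts are pooled over all subjects rather than taken from a real training fold. Second, the 6 × 6 crop grid is our own choice, because the text states only that the $N_c = 36$ crops are evenly spread. All function names are hypothetical.

```python
import math
from collections import Counter

def class_weights(labels):
    """Loss weight N / n_i per class, computed from the training labels of one fold."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: total / n for cls, n in counts.items()}

def weighted_cross_entropy(probs, label, weights):
    """Cross-entropy of one example (label = class index), scaled by its class weight."""
    return -weights[label] * math.log(probs[label])

def crop_offsets(image_size=384, crop_size=224, per_axis=6):
    """Top-left corners of a per_axis x per_axis grid of evenly spread crops.

    Assumption: the 36 evaluation crops form a 6 x 6 grid.
    """
    step = (image_size - crop_size) / (per_axis - 1)
    coords = [round(i * step) for i in range(per_axis)]
    return [(top, left) for top in coords for left in coords]

def softmax(logits):
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

def predict_multi_crop(crop_logits):
    """Average softmax outputs over all crops; return (predicted class, mean probs)."""
    probs = [softmax(logits) for logits in crop_logits]
    mean = [sum(p[c] for p in probs) / len(probs) for c in range(len(probs[0]))]
    return max(range(len(mean)), key=lambda c: mean[c]), mean

if __name__ == "__main__":
    # Illustration only: all HC and MC images pooled, not an actual training fold
    labels = ["HC"] * 533 + ["MC"] * 309
    print(class_weights(labels))  # the underrepresented MC class gets the larger weight
    offsets = crop_offsets()
    print(len(offsets), offsets[0], offsets[-1])  # 36 (0, 0) (160, 160)
```

With 224 × 224 crops from a 384 × 384 image, the assumed grid has a stride of 32 pixels, so neighbouring crops overlap by 192 pixels. For Inception-V3, `crop_offsets(crop_size=299)` gives a stride of 17 pixels.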
Ground-truth annotation of all images was obtained by tissue removal of the scanned areas and subsequent histological evaluation.

Due to the small dataset size, we chose a cross-validation scheme where the images from one subject are held out for evaluation and training is performed on the remaining ones. Thus, all reported results are the mean over, on average, six training runs, one per fold. Based on the four classes, we address three binary classification problems. First, we consider HC vs. HP, i.e., we investigate the feasibility of distinguishing the different organs in CLM. Then, we consider the differentiation of healthy and malignant tissue with the two binary classification problems HP vs. MP and HC vs. MC. We report the accuracy, sensitivity, specificity, F1-score and AUC. We use the AUC as the main metric, as it is threshold-independent.

[Figure 4 panels: HC vs. HP, HP vs. MP and HC vs. MC; x-axis: training type (1 to 5); y-axis: AUC (0.20 to 1.00); legend: DenseNet, SE-RX, Inception, VGG16BN]
Fig. 4: AUC values of all applied architectures for the different classification problems. We evaluate the following training types: (1) retrain classifier, (2) partial freeze 1, (3) partial freeze 2, (4) full fine-tuning, (5) training from scratch. For each value, the standard deviation over multiple folds is represented by an error bar.
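The evaluation protocol described above can be sketched in plain Python as follows. It covers one fold per held-out subject, threshold-based metrics, and the AUC as the threshold-independent main metric. This is an illustrative sketch under our own assumptions, not the authors' evaluation code. Class 1 is taken as the positive (cancer or peritoneum) class, as in the Table 1 caption. The AUC uses the rank-based Mann-Whitney formulation, which equals the area under the ROC curve. Some subjects lack a class, so a held-out subject may contain only one class for a given task. The text does not say how such folds enter the averaged AUC, so the sketch simply raises an error in that case.

```python
def leave_one_subject_out(subject_ids):
    """Yield (subject, train_idx, test_idx): one fold per held-out subject."""
    for subject in sorted(set(subject_ids)):
        test_idx = [i for i, s in enumerate(subject_ids) if s == subject]
        train_idx = [i for i, s in enumerate(subject_ids) if s != subject]
        yield subject, train_idx, test_idx

def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity and F1-score; class 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    prec = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * prec * sens / (prec + sens) if prec + sens else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "sensitivity": sens,
            "specificity": spec, "f1": f1}

def auc(y_true, scores):
    """Threshold-independent ROC AUC via the rank (Mann-Whitney) formulation.

    Equals the probability that a random positive is scored above a random
    negative, with ties counted as 1/2. Both classes must be present.
    """
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC is undefined when only one class is present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

if __name__ == "__main__":
    subjects = ["rat1", "rat1", "rat2", "rat3"]  # hypothetical subject IDs
    for subject, train_idx, test_idx in leave_one_subject_out(subjects):
        print(subject, train_idx, test_idx)
    print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
    print(binary_metrics([0, 0, 1, 1], [0, 1, 1, 1]))
```

Each metric would then be averaged over the folds, since the reported results are means over, on average, six folds.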
3 Results

First, we compare the different transfer learning scenarios described in Section 2 across all architectures for each classification scenario, see Figure 4. In general, the AUC is very high for the differentiation of different healthy tissue types and of healthy and malignant peritoneum tissue. The AUC for classifying malignant colon tissue is substantially lower, and the standard deviation is higher for this task. Regarding the transfer learning scenarios, training from scratch performs worst for all architectures and classification scenarios. For two of the three scenarios, only retraining the classifier shows substantially lower performance than the other transfer scenarios. There are no clear trends between the partial freezing and fine-tuning scenarios.

Second, we go into more detail for the classification task HP vs. MP. Figure 5 shows the ROC curves for all models with all transfer learning scenarios for this task. The operating points with a good trade-off in the upper left corner vary for each model. For VGG-16, retraining only the classifier stands out. For Densenet121, partial freezing performs well. For Inception-V3 and SE-Resnext50, partial freezing and fine-tuning perform similarly.

Third, an overview of the best performing transfer strategies is shown in Table 1. Comparing the individual results for each architecture, no model consistently outperforms the others. In general, Densenet121 performs slightly better across the tasks. The optimal transfer strategy differs across models and classification tasks. For HC vs. HP and for Densenet121 in general, the partial freezing methods perform best.

[Figure 5 panels: Inception-V3, Densenet121, SE-Resnext50 and VGG-16; x-axis: false positive rate; y-axis: true positive rate; curves: Classifier, Freeze 1, Freeze 2, Fine-Tuning, From Scratch]
Fig. 5: ROC curves for the different architectures and training types, shown for the classification of HP vs. MP.

Last, we provide training times for all architectures and training scenarios, see Figure 6. In general, freezing more weights during training reduces the overall training time. Furthermore, training time loosely scales with the number of trainable parameters, as VGG-16 contains the most parameters and shows the longest training times, followed by SE-Resnext50.

4 Discussion

We study deep transfer learning methods for CLM images for three binary classification problems. Automatic decision support with CLM during interventions could improve the workflow with immediate feedback on the tissue properties. For this purpose, we investigate the use of CNNs with four different architectures and five training scenarios.

The three classification tasks. As a baseline, differentiating healthy colon and peritoneum tissue works well, with an AUC over 0.90 for partial freezing across all models, see Figure 4. This indicates that discriminative features for different organs can be learned from CLM images. Similarly, for the classification of metastases in the peritoneum, the AUC is around 0.90 for all transfer learning scenarios. However, classifying healthy and malignant colon tissue performs substantially worse, with an AUC of ≈ 0.70 for partial freezing and fine-tuning.
The task appears to be more difficult, which is also reflected in a slightly higher standard deviation. This indicates a higher uncertainty of the model predictions. This could be caused by the heterogeneous appearance of colon tissue in different parts of the colon, which complicates the learning task in conjunction with the small dataset size. Furthermore, during development, colon carcinoma cells transform from a healthy stage to adenoma and then carcinoma. At earlier stages, healthy and malignant cells can still have a similar appearance, which complicates the learning task.

Table 1: The best performing transfer learning method for each model and classification task. Densenet refers to the Densenet121 model, SE-RX50 refers to the SE-Resnext50 model. For each training scenario, the best performing configuration is marked bold. All values are given in percent. The sensitivity is given with respect to the cancer class and, for organ differentiation, with respect to the peritoneum class.

Task       Model      Type          Accuracy  Sensitivity  Specificity  F1-Score  AUC
HC vs. HP  Inception  Freeze 1      87.7      79.9         94.4         90.4      95.7
HC vs. HP  Densenet   Freeze 1      91.2      82.8         95.3         91.9      92.6
HC vs. HP  SE-RX50    Freeze 1      85.8      78.5         96.3         91.3      91.9
HC vs. HP  VGG-16     Freeze 1      82.5      74.9         91.8         87.2      91.6
HP vs. MP  Inception  Freeze 2      85.9      86.6         87.0         86.8      95.6
HP vs. MP  Densenet   Freeze 2      83.3      84.6         83.2         84.0      91.9
HP vs. MP  SE-RX50    Freeze 1      81.7      84.6         83.2         84.0      90.9
HP vs. MP  VGG-16     Classifier    88.0      91.0         84.6         87.9      97.1
HC vs. MC  Inception  Fine-Tuning   63.1      71.0         57.0         63.7      68.0
HC vs. MC  Densenet   Freeze 1      70.0      72.9         64.1         69.1      73.1
HC vs. MC  SE-RX50    Fine-Tuning   63.7      66.7         65.9         69.1      71.8
HC vs. MC  VGG-16     Freeze 2      63.5      67.6         64.2         68.1      72.0

Transfer learning scenarios. Figure 4 also provides an overview of the transfer strategies across all models.
Clearly, transfer learning substantially outperforms training from scratch across all classification tasks, which supports the effectiveness of transfer learning for medical image classification problems [38]. The results indicate that meaningful feature transfer from the natural image domain to CLM images is possible, although the images have a vastly different appearance. However, comparing transfer strategies, only retraining the classifier performs worse than the other scenarios in two out of three classification tasks. This agrees with the results of a previous study on transfer learning with CLM images in neurosurgery [28], where the authors found that full fine-tuning outperforms retraining only the classifier. However, in our case, retraining only the classifier also shows a high performance for the task HP vs. MP. This could be caused by fragile co-adaptation of weights [39], which leads to large performance differences between the different classification tasks. For some tasks (e.g., HP vs. MP), recovery and reuse of potentially co-adapted weights might be feasible, while reuse is impaired for other tasks (e.g., HC vs. MC). The partial freezing and fine-tuning strategies appear to be more consistent across tasks; however, the optimal strategy still differs. Overall, our results indicate that the transferability of features depends not only on the imaging modality but also on the classification task. This adds to previous insights on transfer learning in the medical domain, where the optimal transfer strategy was found to depend on the modality and dataset size [18]. Comparing the partial freezing and fine-tuning strategies, performance is very close, and no strategy is consistently optimal across the tasks. However, training time is also an aspect to consider for the different transfer learning strategies. As shown in Figure 6, freezing more parameters inside the architecture leads to reduced training times. Thus, partial freezing can generally be seen as advantageous, as it often achieves similar performance to full fine-tuning while requiring less training time. For application, this insight could be useful when adopting and retraining models for cancer classification in other organs or when newer architectures are introduced.

[Figure 6: training time in minutes (0 to 13.5) per training type (Classifier, Freeze 1, Freeze 2, Full Training); legend: DenseNet, SE-RX, Inception, VGG16BN]
Fig. 6: Training times for 90 epochs of all applied architectures for the different training scenarios for the classification task HC vs. HP. Note that for training from scratch, the same number of parameters is trained as for full fine-tuning. Thus, training times are equivalent for the two cases.

Different architectures for CLM. To analyze the different transfer strategies further, we consider the ROC curves of each architecture for the HP vs. MP task, see Figure 5. For this task, using "off-the-shelf" features and only retraining the classifier performed considerably better than for the other tasks. As discussed before, this indicates that the transfer learning scenarios are classification task-dependent. In detail, the ROC curves reveal that VGG-16 stands out, as retraining only the classifier performs best out of all its transfer strategies. In transfer learning research, VGG-16 is still a popular general-purpose feature extractor for numerous tasks [40, 11]. For the other architectures, the optimal strategy differs. For example, for Densenet121, the partial freezing methods show good operating points in the upper left corner of the ROC curve.
For Inception-V3 and SE-Resnext50, partial freezing and fine-tuning perform similarly, with no clearly superior method. This indicates that the choice of transfer learning method depends on the architecture. This should be expected, as the models have very different block types and each freezing type fixes a different number of parameters. The detailed results in Table 1, with additional metrics, underline this insight: there is no single optimal transfer learning strategy, and the best performing strategy varies for different architectures and classification tasks. Overall, we demonstrate that transfer learning has an impact on performance; however, there is no simple rule of thumb for optimal transfer learning with CLM. Our results show that examining different freezing strategies can considerably improve performance for individual models.

5 Conclusion

We investigate the feasibility of colon cancer classification in CLM images using CNNs and multiple transfer learning scenarios. Using in-vivo images of healthy and malignant colon and peritoneum tissue obtained from ten subjects, we adopt four architectures and five transfer learning scenarios for three classification problems with CLM. Our results show that different organs, as well as healthy and malignant peritoneum tissue, can be classified with deep transfer learning. We show that transfer learning from ImageNet is successful with CLM but that the transferability of features is limited. We find that there is no single optimal model or transfer strategy for all CLM classification problems and that task-specific engineering is likely required for application. In future work, our results could be extended to more classification problems with CLM.

Compliance with Ethical Standards

Funding: The authors have no funding to declare.
Conflict of Interest: The authors declare that they have no conflict of interest.
Ethical Approval: All procedures performed in studies involving animals were in accordance with the ethical standards of the institution or practice at which the studies were conducted.
Informed Consent: Informed consent was obtained from all individual participants included in the study.

References

1. Torre, L.A., Bray, F., Siegel, R.L., Ferlay, J., Lortet-Tieulent, J., Jemal, A. (2015) Global cancer statistics, 2012. CA: a cancer journal for clinicians 65 (2), 87–108
2. Verwaal, V.J., van Ruth, S., Witkamp, A., Boot, H., van Slooten, G., Zoetmulder, F.A. (2005) Long-term survival of peritoneal carcinomatosis of colorectal origin. Annals of surgical oncology 12 (1), 65–71
3. Franko, J., Shi, Q., Goldman, C.D., Pockaj, B.A., Nelson, G.D., Goldberg, R.M., Pitot, H.C., Grothey, A., Alberts, S.R., Sargent, D.J. (2012) Treatment of colorectal peritoneal carcinomatosis with systemic chemotherapy: a pooled analysis of north central cancer treatment group phase iii trials n9741 and n9841. Journal of Clinical Oncology 30 (3), 263
4. de Bree, E., Koops, W., Kröger, R., van Ruth, S., Witkamp, A.J., Zoetmulder, F.A. (2004) Peritoneal carcinomatosis from colorectal or appendiceal origin: correlation of preoperative ct with intraoperative findings and evaluation of interobserver agreement. Journal of surgical oncology 86 (2), 64–73
5. Dromain, C., Leboulleux, S., Auperin, A., Goere, D., Malka, D., Lumbroso, J., Schumberger, M., Sigal, R., Elias, D. (2008) Staging of peritoneal carcinomatosis: enhanced ct vs. pet/ct. Abdominal imaging 33 (1), 87–93
6. Low, R.N., Semelka, R.C., Worawattanakul, S., Alzate, G.D. (2000) Extrahepatic abdominal imaging in patients with malignancy: comparison of mr imaging and helical ct in 164 patients. Journal of Magnetic Resonance Imaging: An Official Journal of the International Society for Magnetic Resonance in Medicine 12 (2), 269–277
7. Iafrate, F., Ciolina, M., Sammartino, P., Baldassari, P., Rengo, M., Lucchesi, P., Sibio, S., Accarpio, F., Di Giorgio, A., Laghi, A. (2012) Peritoneal carcinomatosis: imaging with 64-mdct and 3t mri with diffusion-weighted imaging. Abdominal imaging 37 (4), 616–627
8. González-Moreno, S., González-Bayón, L., Ortega-Pérez, G., González-Hernando, C. (2009) Imaging of peritoneal carcinomatosis. The Cancer Journal 15 (3), 184–189
9. Ishigami, S., Uenosono, Y., Arigami, T., Yanagita, S., Okumura, H., Uchikado, Y., Kita, Y., Kurahara, H., Kijima, Y., Nakajo, A., Maemura, K., Natsugoe, S. (2014) Clinical utility of perioperative staging laparoscopy for advanced gastric cancer. World journal of surgical oncology 12 (1), 350
10. Ellebrecht, D.B., Kuempers, C., Horn, M., Keck, T., Kleemann, M. (2018) Confocal laser microscopy as novel approach for real-time and in-vivo tissue examination during minimal-invasive surgery in colon cancer. Surgical endoscopy pp. 1–7
11. Litjens, G., Kooi, T., Bejnordi, B.E., Setio, A.A.A., Ciompi, F., Ghafoorian, M., van der Laak, J.A., Van Ginneken, B., Sánchez, C.I. (2017) A survey on deep learning in medical image analysis. Medical image analysis 42, 60–88
12. Goceri, E., Goceri, N. (2017) Deep learning in medical image analysis: recent advances and future trends. In: International Conferences Computer Graphics, Visualization, Computer Vision and Image Processing, pp. 305–311
13. Esteva, A., Kuprel, B., Novoa, R.A., Ko, J., Swetter, S.M., Blau, H.M., Thrun, S. (2017) Dermatologist-level classification of skin cancer with deep neural networks. Nature 542 (7639), 115
14. Shen, D., Wu, G., Suk, H.I. (2017) Deep learning in medical image analysis. Annual review of biomedical engineering 19, 221–248
15. Bengio, Y. (2012) Deep learning of representations for unsupervised and transfer learning. In: Proceedings of ICML Workshop on Unsupervised and Transfer Learning, pp. 17–36
16. Hoo-Chang, S., Roth, H.R., Gao, M., Lu, L., Xu, Z., Nogues, I., Yao, J., Mollura, D., Summers, R.M. (2016) Deep convolutional neural networks for computer-aided detection: Cnn architectures, dataset characteristics and transfer learning. IEEE transactions on medical imaging 35 (5), 1285
17. Gessert, N., Lutz, M., Heyder, M., Latus, S., Leistner, D.M., Abdelwahed, Y.S., Schlaefer, A. (2019) Automatic plaque detection in ivoct pullbacks using convolutional neural networks. IEEE transactions on medical imaging 38 (2), 426–434
18. Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J. (2016) Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE transactions on medical imaging 35 (5), 1299–1312
19. Rajadhyaksha, M., Grossman, M., Esterowitz, D., Webb, R.H., Anderson, R.R. (1995) In vivo confocal scanning laser microscopy of human skin: melanin provides strong contrast. Journal of Investigative Dermatology 104 (6), 946–952
20. Niederer, R.L., Perumal, D., Sherwin, T., McGhee, C.N. (2007) Age-related differences in the normal human cornea: a laser scanning in vivo confocal microscopy study. British Journal of Ophthalmology
21. Aubreville, M., Knipfer, C., Oetter, N., Jaremenko, C., Rodner, E., Denzler, J., Bohr, C., Neumann, H., Stelzle, F., Maier, A. (2017) Automatic classification of cancerous tissue in laserendomicroscopy images of the oral cavity using deep learning. Scientific reports 7 (1), 11979
22. Aubreville, M., Stoeve, M., Oetter, N., Goncalves, M., Knipfer, C., Neumann, H., Bohr, C., Stelzle, F., Maier, A. (2018) Deep learning-based detection of motion artifacts in probe-based confocal laser endomicroscopy images. International journal of computer assisted radiology and surgery pp. 1–12
23. Wiltgen, M., Bloice, M. (2016) Automatic interpretation of melanocytic images in confocal laser scanning microscopy. In: Microscopy and Analysis. InTech
24. Hong, J., Park, B.y., Park, H. (2017) Convolutional neural network classifier for distinguishing barrett's esophagus and neoplasia endomicroscopy images. In: Engineering in Medicine and Biology Society (EMBC), 2017 39th Annual International Conference of the IEEE, pp. 2892–2895. IEEE
25. Izadyyazdanabadi, M., Belykh, E., Mooney, M.A., Eschbacher, J.M., Nakaji, P., Yang, Y., Preul, M.C. (2018) Prospects for theranostics in neurosurgical imaging: Empowering confocal laser endomicroscopy diagnostics via deep learning. Frontiers in Oncology 8, 240
26. Izadyyazdanabadi, M., Belykh, E., Martirosyan, N., Eschbacher, J., Nakaji, P., Yang, Y., Preul, M.C. (2017) Improving utility of brain tumor confocal laser endomicroscopy: objective value assessment and diagnostic frame detection with convolutional neural networks. In: Medical Imaging 2017: Computer-Aided Diagnosis, vol. 10134, p. 101342J. International Society for Optics and Photonics
27. Izadyyazdanabadi, M., Belykh, E., Cavallo, C., Zhao, X., Gandhi, S., Moreira, L.B., Eschbacher, J., Nakaji, P., Preul, M.C., Yang, Y. (2018) Weakly-supervised learning-based feature localization for confocal laser endomicroscopy glioma images. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 300–308. Springer
28. Izadyyazdanabadi, M., Belykh, E., Mooney, M., Martirosyan, N., Eschbacher, J., Nakaji, P., Preul, M.C., Yang, Y. (2018) Convolutional neural networks: ensemble modeling, fine-tuning and unsupervised semantic localization for neurosurgical cle images. Journal of Visual Communication and Image Representation 54, 10–20
29. Simonyan, K., Zisserman, A. (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint
30. Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z. (2016) Rethinking the inception architecture for computer vision. In: CVPR, pp. 2818–2826
31. Huang, G., Liu, Z., Weinberger, K.Q., van der Maaten, L. (2016) Densely connected convolutional networks. arXiv preprint
32. Hu, J., Shen, L., Sun, G. (2018) Squeeze-and-excitation networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7132–7141
33. Gessert, N., Wittig, L., Drömann, D., Keck, T., Schlaefer, A., Ellebrecht, D.B. (2019) Feasibility of colon cancer detection in confocal laser microscopy images using convolution neural networks. In: Bildverarbeitung für die Medizin 2019
34. Ioffe, S., Szegedy, C. (2015) Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: ICML
35. He, K., Zhang, X., Ren, S., Sun, J. (2016) Deep residual learning for image recognition. In: CVPR, pp. 770–778
36. Nair, V., Hinton, G.E. (2010) Rectified linear units improve restricted boltzmann machines. In: ICML, pp. 807–814
37. Xie, S., Girshick, R., Dollár, P., Tu, Z., He, K. (2017) Aggregated residual transformations for deep neural networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5987–5995. IEEE
38. Shin, H.C., Roth, H.R., Gao, M., Lu, L., Xu, Z., Nogues, I., Yao, J., Mollura, D., Summers, R.M. (2016) Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning. IEEE Transactions on Medical Imaging 35 (5), 1285–1298
39. Yosinski, J., Clune, J., Bengio, Y., Lipson, H. (2014) How transferable are features in deep neural networks? In: Advances in neural information processing systems, pp. 3320–3328
40. Herath, S., Harandi, M., Porikli, F. (2017) Going deeper into action recognition: A survey. Image and vision computing 60, 4–21
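The remark in the discussion that each freezing type fixes a different number of parameters can be made concrete with a small Python sketch. It assumes that partial freezing means keeping the first k stages of a pretrained network at their ImageNet weights and fine-tuning the remaining stages. The `Block` type, the helpers `split_parameters` and `frozen_fraction`, and the two architectures `NET_A` and `NET_B` with their per-stage parameter counts are all hypothetical; they do not describe the four architectures or the five transfer learning scenarios evaluated in this study. In a framework such as PyTorch, freezing a stage typically amounts to setting `requires_grad = False` on its parameters.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Block:
    """One stage of a pretrained CNN (hypothetical); `params` is its parameter count."""
    name: str
    params: int


def split_parameters(blocks: Sequence[Block], n_frozen: int) -> Tuple[int, int]:
    """Return (frozen, trainable) parameter counts when the first `n_frozen` stages are frozen.

    n_frozen == 0 corresponds to full fine-tuning of all stages. A newly attached
    classifier head is not modelled here; it would be trainable in every case.
    """
    if not 0 <= n_frozen <= len(blocks):
        raise ValueError("n_frozen must lie between 0 and the number of stages")
    frozen = sum(b.params for b in blocks[:n_frozen])     # kept at pretrained values
    trainable = sum(b.params for b in blocks[n_frozen:])  # updated during fine-tuning
    return frozen, trainable


def frozen_fraction(blocks: Sequence[Block], n_frozen: int) -> float:
    """Fraction of all stage parameters that stay fixed at their pretrained values."""
    frozen, trainable = split_parameters(blocks, n_frozen)
    total = frozen + trainable
    if total == 0:
        raise ValueError("architecture has no parameters")
    return frozen / total


# Hypothetical architectures with made-up parameter counts (not taken from the study).
NET_A = (Block("stage1", 1000), Block("stage2", 1000),
         Block("stage3", 1000), Block("stage4", 1000))   # evenly sized stages
NET_B = (Block("stage1", 100), Block("stage2", 400),
         Block("stage3", 1500), Block("stage4", 6000))   # parameters concentrated late

if __name__ == "__main__":
    for name, net in (("NET_A", NET_A), ("NET_B", NET_B)):
        # Freeze the same number of stages (two) in both networks.
        print(name, split_parameters(net, 2), round(frozen_fraction(net, 2), 4))
    # prints:
    # NET_A (2000, 2000) 0.5
    # NET_B (500, 7500) 0.0625
```

With evenly sized stages, freezing two of four stages fixes half of the parameters; when most parameters sit in the later stages, the same freezing depth fixes only about six percent of them. This is one plausible reason why a freezing strategy that works well for one architecture need not carry over to another.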
