Classification of glomerular hypercellularity using convolutional features and support vector machine

Paulo Chagas a,∗, Luiz Souza a,∗, Ikaro Araújo b, Nayze Aldeman c, Angelo Duarte d, Michele Angelo d, Washington LC dos-Santos e,∗∗, Luciano Oliveira a,∗∗

a IVISION Lab, Universidade Federal da Bahia, Bahia, Brazil
b PPGM, Universidade Federal da Bahia, Bahia, Brazil
c Departamento de Medicina Especializada - Universidade Federal do Piauí, Piauí, Brazil
d Universidade Estadual de Feira de Santana, Bahia, Brazil
e Fundação Oswaldo Cruz - Instituto Gonçalo Moniz, Bahia, Brazil

Abstract

Glomeruli are histological structures of the kidney cortex formed by interwoven blood capillaries, and are responsible for blood filtration. Glomerular lesions impair kidney filtration capability, leading to protein loss and metabolic waste retention. An example of lesion is the glomerular hypercellularity, which is characterized by an increase in the number of cell nuclei in different areas of the glomeruli. Glomerular hypercellularity is a frequent lesion present in different kidney diseases. Automatic detection of glomerular hypercellularity would accelerate the screening of scanned histological slides for the lesion, enhancing clinical diagnosis. Having this in mind, we propose a new approach for classification of hypercellularity in human kidney images. Our proposed method introduces a novel architecture of a convolutional neural network (CNN) along with a support vector machine, achieving near perfect average results with the FIOCRUZ data set in a binary classification (lesion or normal). Our deep-based classifier outperformed the state-of-the-art results on the same data set.
Additionally, classification of hypercellularity sub-lesions was also performed, considering mesangial, endocapillary and both lesions; in this multi-classification task, our proposed method failed in just 4% of the cases. To the best of our knowledge, this is the first study on deep learning over a data set of glomerular hypercellularity images of human kidney.

Keywords: hypercellularity, human kidney biopsy, convolutional neural network.

∗ First two authors contributed equally.
∗∗ Corresponding authors: Luciano Oliveira, email: lrebouca@ufba.br; Washington L-C dos Santos, email: wluis@bahia.fiocruz.br

Preprint submitted to arxiv.org, July 2, 2019

1. Introduction

Digital histopathology is a research field that exploits digital images for the analysis of tissue samples. The digital pictures are obtained either by scanning histological whole-slide images (WSIs) or by collecting snapshots of histological structures relevant for the diagnosis of diseases (Al-Janabi et al., 2012). This approach makes it easier to gather large-scale data sets of histological lesions, and to review or exchange information among pathologists without the inconvenience of working with the actual glass slides. The evolution of the computer vision field has impacted digital medicine as a whole, supporting pathologists in the automatic analysis of various types of medical images, as well as improving the accuracy of computer-aided diagnosis (Madabhushi and Lee, 2016; Litjens et al., 2017).

In the special case of renal histopathology, disease markers are mostly found in the glomeruli, presenting highly diverse and heterogeneous characteristics. The glomerulus is a histological structure from the kidney cortex, formed by a network of capillaries in charge of performing blood filtration. As an elementary filtering structure, it is targeted by many primary and systemic diseases, leading to different patterns of glomerular lesions.
Finding and classifying glomerular lesions are fundamental steps toward the diagnosis of many kidney diseases. These tasks rely on the expertise of pathologists, and much effort has been made to better define and create consensus about relevant lesions. In fact, after successive discussion and validation studies in the field, increased consistency has been achieved in the diagnosis and classification of glomerular renal diseases such as lupus nephritis, IgA nephropathy, and rejection of kidney transplant (Bajema et al., 2018; Trimarchi et al., 2017; Joosten et al., 2005). One limiting factor to the performance of histological diagnosis is the complexity of lesions, which, in some cases, may impair a clear definition in terms of criteria and, consequently, suitable agreement among pathologists (Barisoni et al., 2013).

Figure 1: Mark with an X the image with a hypercellularity lesion.

Particularly, glomerular hypercellularity is a frequent lesion found in kidney biopsies, defined by an increase in the number of cells in the glomeruli. This type of lesion is an integral component of many glomerular diseases such as proliferative and membranoproliferative glomerulonephritis, being a marker of activity in lupus and IgA nephropathy (Bajema et al., 2018; Trimarchi et al., 2017). Hypercellularity can be identified by a careful look at the histological sections from the glomeruli, searching for the presence of agglomerates formed by four or more cell nuclei in the mesangial area (mesangial hypercellularity), or for cell aggregates that fill the capillary lumen (endocapillary hypercellularity) (Churg et al., 1995; Fogo, 2003). Figure 1 shows the complexity of this problem, and the following question can be raised: Which image depicts a glomerulus with a hypercellularity lesion?
The answer to this question is the image on the right, due to the increased nuclei density; on the left, the image shows an example of a normal glomerulus with no significant number of cell clusters.

Although hypercellularity is easy to define and usually easy to assess in histological sections, agreement among pathologists may decrease for focal hypercellularity and for occurrences in specific regions of the glomerulus. For instance, a recent report from the IgA Nephropathy Classification Working Group showed inconsistencies among specialists even in the use of dichotomous MEST system scores such as E (endocapillary hypercellularity) and M (mesangial hypercellularity) (Trimarchi et al., 2017). Correct assessment of these scores is crucial for relevant clinical-pathological correlation and for predicting the patient outcome. A consistent glomerulus classification can be deemed an important and difficult step towards diagnosing a renal disease in a biopsy evaluation (Pedraza et al., 2017).

Some works have already approached the tasks of glomerulus identification and segmentation (Sarder et al., 2016; Kannan et al., 2019; Simon et al., 2018), which are useful in situations that require an analysis of the entire WSI. Barros et al. (2017) proposed a method relying on classical image pre-processing techniques and a k-nearest neighbors classifier to classify hypercellularity lesions; that work used 811 images of human glomeruli (referred to here as the FIOCRUZ data set) stained with hematoxylin-eosin (H&E) and periodic acid-Schiff (PAS) from a set of biopsy slides.
More recently, deep neural networks have outperformed handcrafted features for some tasks on histological images as well, achieving strong results in different scenarios (Janowczyk and Madabhushi, 2016; Xu et al., 2016; Sharma et al., 2017; Wahab et al., 2017; Spanhol et al., 2016; Hou et al., 2016; Zhang et al., 2018; Fabijańska, 2018; Gandomkar et al., 2018). Regarding glomerular detection with deep learning, Marsh et al. (2018) introduced a convolutional neural network for automatic localization of glomeruli, further classifying global glomerulosclerosis in donor kidney biopsies for transplantation. An automated process for glomerular lesion classification would have many applications, such as large-scale classification of cases based on histological images, consistency of morphological classification, and identification of tissue markers of disease progression.

1.1. Contributions

Three main contributions are brought here: (i) instead of using conventional classification methods as in Barros et al. (2017), we propose a CNN-based architecture to extract trainable features to represent a glomerulus; (ii) by using the proposed CNN as a feature extractor, an SVM classifies the CNN features as a normal or an injured glomerulus; (iii) we also extend the proposed model to the classification of specific hypercellularity lesions (endocapillary hypercellularity, mesangial hypercellularity, and both), providing an analysis of the generated features for both binary and multi-lesion classification. The final CNN-SVM classifier reached near perfect results in four different train/test splits of the data set introduced in Barros et al. (2017); in the multi-classification task, the same architecture failed in just 4% of the cases in a ten-fold cross-validation study. At the end, the misclassified images were analyzed by three pathologists, showing that there was no consensus for most of those images.

2. Classifying glomerular hypercellularity

The classification of a glomerular hypercellularity lesion could be tackled by defining areas and counting nuclei. If the number of nuclei per area surpasses a threshold, one can diagnose the glomerulus as having a hypercellularity lesion. Instead of following this pathologist-annotation approach, automatic classification uses examples of histological images to train a classifier. A histological image is a 2-dimensional grid of pixels that carries information such as colors, edges, shapes and textures, which can be general or specific to classifying a glomerular lesion. Consequently, conceiving a successful feature extractor demands some domain expertise, which brings us to the following question: What is the best feature set for classifying glomerular hypercellularity lesions?

Many feature extraction techniques are available in the literature, and a specific method could be designed as well. In contrast to conventional classifiers, deep learning aims to automatically learn hierarchical feature representations of the input data, without the need to create any particular feature extractor (LeCun et al., 2015). Our work proposes a novel CNN-based architecture for glomerular hypercellularity classification. After training a CNN, it is possible to use a strong classifier on the features produced by the convolutional backbone. This way, we propose to use a CNN architecture to extract trainable features, which ultimately feed an SVM that carries out the final classification. The proposed architecture is evaluated for both binary and multi-class classification. The first reason to use an SVM is that training this classifier is cast as a convex quadratic optimization problem. Ultimately, this property guarantees that the hyperplane found is the optimum one.
The second reason is to analyze the behavior of the feature space extracted from the CNN, which was empirically shown to be linearly separable in our experiments. Linearity in the feature space is expected to provide faster and better results.

2.1. Conceiving the proposed CNN architecture

There are several well-established CNN architectures available in the literature (Canziani et al., 2016), which were designed to be robust when dealing with hundreds of different classes. However, these models tend to overfit when trained on few data. Since the data set we used (Barros et al., 2017) provides only a small training set, we decided to build our own architecture from scratch, modifying it according to our needs. The ultimate goal is to achieve high accuracy while avoiding overfitting.

A CNN architecture is organized in layers, each one applying a specific operation. Although there are many variations of CNN architectures, they share some basic components, such as convolutional, pooling, and fully-connected layers (Gu et al., 2015). The convolutional layer is the fundamental building block of a CNN model, and comprises various learnable kernels (filters) followed by a nonlinear activation function. A pooling layer (usually applied after a convolutional layer) is used to condense feature maps into a smaller representation, with the goal of achieving some invariance. After some convolutional and pooling operations, the top of the network yields a high-level representation of the input image, which is more robust than the raw pixel information and, hopefully, than handcrafted features. This type of architecture requires a fully-connected layer to perform high-level classification using those features, working as a multilayer perceptron (MLP) on top of a CNN backbone.
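As an illustration of the building blocks just described, the following is a minimal pure-Python sketch of one convolution, a ReLU activation and a max-pooling step. It is not the network used in this work: the 5 x 4 input, the 2 x 2 kernel and the function names are made-up values chosen only to make the operations easy to follow by hand.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` with stride 1 and no padding.

    As in most CNN libraries, this is a cross-correlation (no kernel flip).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(fmap):
    """Rectified linear unit applied element-wise to a feature map."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool(fmap, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a
    complete window are dropped."""
    rows = len(fmap) // size
    cols = len(fmap[0]) // size
    return [
        [
            max(fmap[r * size + i][c * size + j]
                for i in range(size) for j in range(size))
            for c in range(cols)
        ]
        for r in range(rows)
    ]


# Hypothetical single-channel input (5 rows x 4 columns) and 2x2 kernel.
image = [
    [1, 2, 0, 1],
    [0, 1, 3, 1],
    [2, 1, 0, 0],
    [1, 0, 1, 2],
    [0, 2, 1, 1],
]
kernel = [[1, -1], [-1, 1]]

# convolution -> activation -> pooling, the pattern repeated in a CNN backbone
features = max_pool(relu(conv2d_valid(image, kernel)))
```

In a real network the kernel values are learned, many kernels are applied per layer, and the pooled maps of the last block are flattened into the feature vector that the fully-connected layers consume.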
Four architectures were initially implemented. Figure 2 highlights the convolutional blocks (CNN backbone) used for feature extraction, and the MLP blocks (fully-connected layers and final activation) used for classification. The first architecture was designed to investigate how the lesion classification behaved using fewer layers. In addition to the operations previously cited, batch normalization, regularization, and dropout operations were applied to reduce overfitting. The first architecture (Fig. 2a) is composed mainly of four convolutional layers, with the other operations applied between those layers, followed by one fully-connected layer. A rectified linear unit (ReLU) was used as the activation function, and the max function was used for pooling operations. For the calculation of the class probabilities after the fully-connected layers, a sigmoid function was first tried, and later replaced by a softmax function. With this first architecture in mind, updates were performed based on the stability of the accuracy curve on the validation set, and three other architectures were proposed (Figs. 2b, 2c and 2d).

In order to choose the best model among the candidate architectures, we randomly selected 90% of the data set for training the model, while using 10% for validation. To deal with the size of the data set in memory, we applied a mini-batch strategy, which consists of using several batches of N images to update the final model (instead of one single block of data). After each epoch, the proposed architecture was evaluated on the validation set. Since we focused on reducing overfitting, the architecture most likely to be selected would be the one with high accuracy and less oscillation in the accuracy. Figure 3 shows the accuracy curve for each architecture, illustrating the rise not only in the accuracy peak, but also in the stability of the curve after several epochs. Our final CNN architecture (Fig. 2d) consists mainly of six convolutional layers and five max-pooling layers, followed by three fully-connected layers and one softmax layer for classification. The training parameters were empirically obtained through several experiments on the four architectures. The best results using architecture 4 were achieved by training the deep network with the following parameters: 200 epochs, Adam training algorithm (Kingma and Ba, 2014), a decay rate of 10^-6, a batch size of 32, and a learning rate of 10^-4.

Figure 2: Four CNN architectures proposed here: (a) architecture 1 and (b) architecture 2, with four convolutional layers in the backbone; (c) architecture 3 and (d) architecture 4, with five and six convolutional layers, respectively, in the backbone.

Figure 3: Stability evaluation of CNN architectures. From 'Architecture 1' to 'Architecture 4', accuracy reaches stability as the number of training epochs increases. Best results were achieved with 'Architecture 4', which used a softmax layer at the top, resulting in a more stable accuracy on the validation set.

2.2. Classifying the CNN features with SVM

After choosing the best architecture, the trained CNN features fed an SVM, instead of the multilayer perceptron used for training the model. This CNN-SVM architecture was evaluated with four kernel functions: linear, radial basis function (RBF), polynomial and sigmoid (see Fig. 4).

[Figure 4 diagram: input image (RGB channels resized to 224x224); feature extractor with convolutional blocks 32,(7,7); 32,(7,7); 64,(5,5); 64,(5,5); 128,(3,3); 128,(3,3); feature vector (N=128); SVM with linear, polynomial, RBF and sigmoid kernels]

Figure 4: CNN-SVM architecture.
From left to right: a glomerulus as an input image in an RGB color space, resized to 224 × 224 pixels. After applying architecture 4 (Fig. 2d) for feature extraction, a feature vector with 128 features is generated. Finally, the resultant feature vector is classified by an SVM, evaluated by considering linear, polynomial, RBF and sigmoid kernel functions.

SVM is a supervised binary classifier, which finds an optimal hyperplane to separate the class of glomeruli with hypercellularity from that of normal glomeruli by using v support vectors. When these classes are not linearly separable, different kernel functions can be used to map the input vectors to a higher-dimensional space (the so-called feature space), in which the classes can be linearly separated. To classify an input feature vector x, SVM evaluates the sign of a function f(x), given by

$$ f(x) = \mathrm{sign}\left( \sum_{i=1}^{v} y_i \, \alpha_i \, N(x_i, x) + b \right), \qquad (1) $$

where the sum runs over the v support vectors x_i with class labels y_i, b is a bias parameter, and α_i are the Lagrange multipliers from the dual optimization problem. N(x_i, x_j) is a kernel function, evaluated in Eq. (1) between each support vector x_i and the input x.

3. Experiment analysis

3.1. Data set

In order to assess the performance of our proposed CNN architecture, the data set introduced in Barros et al. (2017) was used. The data set consists of 811 images, including 300 images of normal human glomeruli and 511 images of human glomeruli with hypercellularity. As the images originated from human kidneys with different diseases, the cellular component of the hypercellularity varies among the cases. The images were selected from the digital histological image library of the Gonçalo Moniz Institute (FIOCRUZ), including cases of all the kidney biopsies performed for the diagnosis of glomerular diseases in referral nephrology services of public hospitals in Bahia state, Brazil, between 2003 and 2015.
The tissue samples were fixed in Bouin's fixative or formalin-acetic acid-alcohol, and embedded in paraffin. Sections of 2-3 µm were stained with H&E and PAS. The images were captured using an Olympus QColor 3 digital camera attached to a Nikon E600 optical microscope (using ×200 magnification). Details of the clinical and demographic characteristics of the patients from whom kidney biopsies were collected are presented in Barros et al. (2017). Considering Oxford MEST, the former binary data set was relabeled into four classes: endocapillary (90 images with endocapillary hypercellularity), mesangial (238 images with mesangial hypercellularity), endoMes (179 images with both lesions) and normal (304 images with no lesion). In this re-evaluation process, using the MEST criteria for hypercellularity, it is noteworthy that four images were found to have been misclassified as lesioned glomeruli in the original binary data set used by Barros et al. (2017). This occurrence led to a difference between the number of normal glomeruli in the binary corpus (300 images) and in the 4-class data set (304 images).

3.2. Methodology

All images were resized to 224 × 224 pixels and normalized. For a comparative evaluation considering binary classification, K-fold cross-validation was applied, varying K as 2, 3, 5 and 10 folds. On each iteration, one different fold is used for validation, and the rest (K − 1 folds) is used for training the model. With the best CNN architecture, we compared the performance of two types of classifiers on top of the CNN backbone: CNN-MLP and CNN-SVM. Our methodology can be summarized in two steps:

• CNN-MLP: the best architecture is first found by using only a 90/10 split without cross-validation. Next, using different values of K, we applied K-fold cross-validation, analyzing the performance of the models using different sizes of training and validation data.
• CNN-SVM: for each value of K (folds), we selected the best CNN-MLP model. Then, we used the CNN features, obtained from the last layer before the fully-connected MLP, as the input of the SVM (see Fig. 4).

Finally, for the multi-class classification, we used the same approach as for the binary classification, but without varying the value of K. Since the 4-class data set is derived from the original data set used for binary classification, the number of images per class became smaller. For this reason, we decided to use K = 10 in order to avoid a very small number of training samples per class. The one-versus-all technique was used to achieve SVM multi-class outputs.

3.3. Evaluation metrics

Four metrics were used to evaluate our proposed method: precision (P), the ratio of correctly predicted glomerular hypercellularity (true positives) to the sum of true positive and false positive observations (high precision corresponds to a low false positive rate); recall (R), the ratio of correctly predicted glomerular hypercellularity to the sum of true positive and false negative observations (high recall corresponds to a low false negative rate); f1-score (F1), the harmonic mean of precision and recall (a high f1-score requires both high precision and high recall); and, finally, accuracy (ACC), the ratio of correctly predicted glomerular hypercellularity and normal glomeruli to the total number of positive and negative observations (accuracy grows with the true positive and true negative counts, and decreases with the false positive and false negative counts).

3.4. Evaluating the proposed CNN model for binary classification

The final CNN was evaluated by using the average of the chosen metrics, observing how the model generalized the classes as the size of the training and validation sets changed.
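The four metrics of Section 3.3 can be computed from the confusion-matrix counts of one validation fold, as in the minimal sketch below. The function name and the example counts are hypothetical and are not taken from the experiments; "positive" stands for hypercellularity and "negative" for a normal glomerulus.

```python
def binary_metrics(tp, fp, fn, tn):
    """Precision, recall, F1 and accuracy from confusion-matrix counts.

    tp/fp/fn/tn: true positives, false positives, false negatives and
    true negatives. Undefined ratios (zero denominators) are reported as 0.0.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        # harmonic mean of precision and recall
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


# Hypothetical counts for one 81-image validation fold (illustration only).
fold_metrics = binary_metrics(tp=50, fp=1, fn=2, tn=28)
```

Averaging such per-fold dictionaries over the K folds, and taking the standard deviation, gives values of the kind reported as µP, µR, µF1 and µAcc.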
It is noteworthy that K = 2 means a 50/50 split, while K = 3, 5 and 10 mean 67/33, 80/20 and 90/10 splits, respectively. Since the training set shrinks as K decreases, we used image augmentation, doubling the size of the original data set by applying pre-defined random modifications such as rotation, horizontal flip, zoom and shift. The training parameters were the same as the ones used to train the last architecture (see Section 2.1). For each value of K, there were K different validation sets, resulting in K training processes and K candidate models at the end of the training. For example, for K = 10 there is one model for each training set combination, resulting in 10 models. When we evaluate only the CNN-MLP approach, the averages of the metrics are computed over these 10 models. However, since the aim was to use the model as a feature extractor backbone, the best one out of the 10 candidates was selected, choosing the one with the highest accuracy over all epochs. Table 1 shows the results of training the proposed CNN-MLP model, displaying the average metrics and their standard deviations for each train/test split. In general, all the train/test splits returned top results, achieving accuracies between 98.8% (50/50 split) and 99.6% (90/10 split).

Table 1: Comparison between four different train/test splits with CNN-MLP on binary classification. The average metrics and their standard deviations are given for precision (µP), recall (µR), f1-score (µF1) and accuracy (µAcc).

| Split | µP | µR | µF1 | µAcc |
|---|---|---|---|---|
| 90/10 | 0.996 (±0.009) | 0.997 (±0.006) | 0.995 (±0.012) | 0.996 (±0.008) |
| 80/20 | 0.995 (±0.008) | 0.994 (±0.009) | 0.996 (±0.006) | 0.995 (±0.007) |
| 67/33 | 0.995 (±0.005) | 0.994 (±0.005) | 0.995 (±0.005) | 0.995 (±0.005) |
| 50/50 | 0.987 (±0.003) | 0.987 (±0.003) | 0.987 (±0.003) | 0.988 (±0.003) |
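The correspondence between K and the train/validation proportions, and the fact that each K yields K candidate models, can be sketched as follows. This is only an illustration under stated assumptions: the helper names are hypothetical, the folds are contiguous index ranges (the data are assumed to have been shuffled beforehand), and no augmentation or training is performed.

```python
def nominal_split(k):
    """Nominal train/validation percentages for K-fold cross-validation,
    e.g. k=10 gives (90, 10)."""
    val = round(100 / k)
    return 100 - val, val


def kfold_indices(n_samples, k):
    """Yield (train_indices, val_indices) for k contiguous folds.

    Fold sizes differ by at most one sample when n_samples is not a
    multiple of k.
    """
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, val
        start += size


# The K values used in the experiments and the 811-image data set size.
splits = {k: nominal_split(k) for k in (2, 3, 5, 10)}
folds = list(kfold_indices(811, 10))  # one (train, val) pair per candidate model
```

With K = 10 and 811 images, one fold holds 82 validation images and the other nine hold 81, so every image is used for validation exactly once.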
As expected, better results were achieved in the experiments using larger training sets (90/10 split), although the worst scenario (50/50 split) still showed superior values for all the proposed metrics (around 98%) in comparison with previous work (Barros et al., 2017) (85%). Another observation is the small standard deviation on all results, demonstrating the stability of the model.

3.5. Choosing the best SVM kernel for binary classification

Choosing optimal parameter values for the SVM kernel raises some questions about the interpretation of the model generated by this function and the results obtained. These questions were investigated in several works (Chapelle et al., 2002; Duan et al., 2003; Imbault and Lebart, 2004; Fu et al., 2004; de Souza et al., 2006).

Table 2: Range of parameters to be evaluated for each SVM kernel.

| Kernel | Function N(x_i, x_j) | Parameters |
|---|---|---|
| Linear | x_i^T x_j | 'C': [0.001, 0.01, 0.1, 1, 10, 100] |
| RBF | exp(−γ ||x_i − x_j||^2), where γ refers to gamma | 'C': [0.001, 0.01, 0.1, 1, 10, 100]; 'gamma': [0.001, 0.01, 1, 1.5, 2] |
| Polynomial | (γ (x_i^T x_j) + r)^d, where γ denotes gamma, r is specified by coef0, and d by degree | 'C': [0.001, 0.01, 0.1, 1, 10, 100]; 'gamma': [0.001, 0.01, 1, 1.5, 2]; 'degree': [1, 2, 3, 4] |
| Sigmoid | tanh(γ (x_i^T x_j) + r), where γ denotes gamma and r is specified by coef0 | 'C': [0.001, 0.01, 0.1, 1, 10, 100]; 'gamma': [0.001, 0.01, 1, 1.5, 2] |

As shown in Table 2, the CNN-SVM architecture was evaluated with three parameters of the kernel functions: 'C', 'gamma' and 'degree'. The regularization parameter 'C' is 1 by default and is common to all SVM kernels, trading off misclassification of training examples against flatness of the solution. A low 'C' makes the decision surface smooth, while a high one can lead to overfitting. The 'gamma' parameter is, by default, 1 divided by the number of features, and it is present in all SVM kernels except the linear one.
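The kernel functions and parameter ranges of Table 2 can be written down directly, as in the sketch below. It is a plain-Python illustration rather than the code used in the experiments: the function and variable names are hypothetical, and the offset r (coef0) is assumed to be 0 because Table 2 lists no range for it.

```python
import math
from itertools import product


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def linear_kernel(u, v):
    return dot(u, v)


def rbf_kernel(u, v, gamma):
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * squared_distance)


def polynomial_kernel(u, v, gamma, degree, coef0=0.0):
    # coef0 plays the role of r; 0.0 is an assumption, not a value from Table 2
    return (gamma * dot(u, v) + coef0) ** degree


def sigmoid_kernel(u, v, gamma, coef0=0.0):
    return math.tanh(gamma * dot(u, v) + coef0)


# Parameter ranges transcribed from Table 2.
C_VALUES = [0.001, 0.01, 0.1, 1, 10, 100]
GAMMA_VALUES = [0.001, 0.01, 1, 1.5, 2]
DEGREE_VALUES = [1, 2, 3, 4]

GRID = {
    "linear": {"C": C_VALUES},
    "rbf": {"C": C_VALUES, "gamma": GAMMA_VALUES},
    "polynomial": {"C": C_VALUES, "gamma": GAMMA_VALUES,
                   "degree": DEGREE_VALUES},
    "sigmoid": {"C": C_VALUES, "gamma": GAMMA_VALUES},
}


def combinations(kernel):
    """All parameter settings to evaluate for one kernel."""
    params = GRID[kernel]
    names = sorted(params)
    return [dict(zip(names, values))
            for values in product(*(params[name] for name in names))]


grid_sizes = {kernel: len(combinations(kernel)) for kernel in GRID}
```

Under the r = 0 assumption, a polynomial kernel with 'degree' 1 and 'gamma' 1 coincides with the linear kernel, which may be why the degree-1 polynomial rows in Table 3 report the same accuracies as the linear rows.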
A small 'gamma' value corresponds to a Gaussian kernel function with large variance, in such a way that the model might be too constrained to capture the "shape" of the data set. In this regime, the resulting model behaves similarly to a linear kernel, with a set of hyperplanes separating the points of the two classes; hence, small gamma leads to high-bias, low-variance models, and vice versa. The 'degree' parameter is 3 by default, and is used only in the polynomial kernel function. This parameter adjusts the feature space for higher-dimensional interactions. Larger 'degree' values tend to overfit the data.

The same range of K values applied to evaluate the CNN was also used to evaluate the SVM. Table 3 shows the best parameter combinations for each kernel at each split, using accuracy as the metric for optimization. It is noteworthy that the linear kernel achieving top results means that the feature space is linearly separable. The overall results of the CNN-SVM approach are summarized in Table 4, showing the performance of the proposed approach using the previously defined metrics. For the 90/10 split, all SVM kernels showed perfect ACC. For the 80/20 split, RBF and sigmoid kernels achieved the highest results. In the 67/33 split, the RBF kernel obtained the best result. In the 50/50 split, all SVM kernels achieved the same results.

Table 3: The best results per SVM kernel on binary classification.

| Split | Kernel | Parameters | µAcc |
|---|---|---|---|
| 90/10 | Linear | 'C': 1 | 1.000 (±0.000) |
| 90/10 | RBF | 'C': 0.1, 'gamma': 0.001 | 1.000 (±0.000) |
| 90/10 | Polynomial | 'C': 1, 'degree': 1, 'gamma': 1 | 1.000 (±0.000) |
| 90/10 | Sigmoid | 'C': 0.01, 'gamma': 0.01 | 1.000 (±0.000) |
| 80/20 | Linear | 'C': 0.001 | 0.994 (±0.011) |
| 80/20 | RBF | 'C': 0.1, 'gamma': 0.01 | 0.996 (±0.010) |
| 80/20 | Polynomial | 'C': 0.001, 'degree': 1, 'gamma': 1 | 0.994 (±0.011) |
| 80/20 | Sigmoid | 'C': 0.01, 'gamma': 0.01 | 0.996 (±0.010) |
| 67/33 | Linear | 'C': 10 | 0.993 (±0.006) |
| 67/33 | RBF | 'C': 1, 'gamma': 1 | 0.994 (±0.003) |
| 67/33 | Polynomial | 'C': 0.001, 'degree': 3, 'gamma': 2 | 0.994 (±0.007) |
| 67/33 | Sigmoid | 'C': 1, 'gamma': 0.01 | 0.991 (±0.009) |
| 50/50 | Linear | 'C': 0.01 | 0.988 (±0.005) |
| 50/50 | RBF | 'C': 10, 'gamma': 0.01 | 0.988 (±0.005) |
| 50/50 | Polynomial | 'C': 0.001, 'degree': 2, 'gamma': 1.5 | 0.988 (±0.005) |
| 50/50 | Sigmoid | 'C': 1, 'gamma': 0.01 | 0.988 (±0.005) |

Table 4: Comparison between four different train/test splits with CNN-SVM on binary classification. The average metrics and their standard deviations are given for precision (µP), recall (µR), f1-score (µF1) and accuracy (µAcc).

| Split | µP | µR | µF1 | µAcc |
|---|---|---|---|---|
| 90/10 | 1.000 (±0.000) | 1.000 (±0.000) | 1.000 (±0.000) | 1.000 (±0.000) |
| 80/20 | 0.996 (±0.006) | 0.996 (±0.007) | 0.996 (±0.003) | 0.996 (±0.010) |
| 67/33 | 0.996 (±0.004) | 0.996 (±0.004) | 0.996 (±0.001) | 0.994 (±0.007) |
| 50/50 | 0.988 (±0.008) | 0.983 (±0.015) | 0.985 (±0.004) | 0.988 (±0.005) |

Table 5: Comparison between CNN-MLP and CNN-SVM models on 4-class classification with a 90/10 split. The average metrics and their standard deviations are given for precision (µP), recall (µR), f1-score (µF1) and accuracy (µAcc).

| Method | µP | µR | µF1 | µAcc |
|---|---|---|---|---|
| CNN-MLP | 0.925 (±0.063) | 0.911 (±0.084) | 0.913 (±0.080) | 0.906 (±0.085) |
| CNN-SVM | 0.944 (±0.034) | 0.945 (±0.033) | 0.944 (±0.034) | 0.945 (±0.056) |

Table 6: The best parameters per SVM kernel on 4-class classification.

| Kernel | Parameters | µAcc |
|---|---|---|
| Linear | 'C': 0.01 | 0.945 (±0.056) |
| RBF | 'C': 10, 'gamma': 0.001 | 0.944 (±0.057) |
| Polynomial | 'C': 0.01, 'degree': 1, 'gamma': 1 | 0.945 (±0.056) |
| Sigmoid | 'C': 10, 'gamma': 0.001 | 0.945 (±0.056) |

3.6. Extending the proposed architecture to multi-classification

We also proposed the use of our CNN-SVM architecture for classification of a 4-class data set, including the following classes: endocapillary hypercellularity, mesangial hypercellularity, endoMes (both lesions) hypercellularity, and normal glomerulus.
The same binary classification methodology was followed, but now maintaining K = 10 in the cross-validation. As expected, the only modification to the CNN architecture was the number of outputs of the dense layer at the top of the model, since the number of classes was changed. At each fold of the cross-validation, weights from the best CNN-MLP model on binary classification were loaded, updating only the number of classes on the last layer. Then, the whole CNN-MLP model was retrained on the 4-class data set using the same former training parameters, achieving an average accuracy of 90.6%. Just as in the binary classification, the best model was selected among the 10 models from each fold of cross-validation, using the CNN backbone as a feature extractor feeding an SVM classifier. The kernel parameters were varied in the same way as in the former experiments, achieving, as the best result, an average accuracy of 94.5%. Table 5 displays the final results for CNN-MLP and CNN-SVM classification on the 4-class data set, while Table 6 shows the parameters of the best results for each SVM kernel. The linear kernel again achieved the overall best result, supporting the robustness of the CNN architecture for feature extraction.

Table 7: Comparative performance for glomerular hypercellularity on binary classification.

| Method | Precision | Recall | F1-Score | Accuracy |
|---|---|---|---|---|
| CNN-SVM | 1.00 | 1.00 | 1.00 | 1.00 |
| CNN-MLP | 0.99 | 0.99 | 0.99 | 0.99 |
| Barros et al. (2017) | 0.88 | 0.88 | 0.88 | 0.85 |

4. Discussion and conclusions

Overall, on binary classification, the two classification approaches (CNN-MLP and CNN-SVM) achieved high results on all metrics with low standard deviations, as shown in Tables 1 and 4. The two methods had close results, with the CNN-SVM approach showing a slightly better performance for every value of K, which supports the robustness of the final proposed model.
Despite the unbalanced data set (more samples for lesion than for normal glomeruli), we did not observe the models being heavily biased toward the class with more images. This behavior may be due to two factors: image augmentation and feature quality. Image augmentation helped to mitigate the imbalance by increasing the number of images through random modifications of the original training set. The features obtained from the CNN backbone proved to be highly suitable for classification with all kernels, achieving an average accuracy of 100% with the linear kernel. This outcome indicates that, despite the size of the CNN feature vector (128 dimensions), these features are linearly separable, which is a notable finding.

A summary of the binary classification results is presented in Table 7, which compares the best results of the CNN-MLP and CNN-SVM models with the method proposed by Barros et al. (2017). As that previous work did not use the F1-score for evaluation, we calculated this score from the reported precision and recall. Hence, we could compare the three results using all four metrics, considering 10-fold cross-validation (90/10 split). To the best of our knowledge, Barros et al. (2017) were the first to address the problem of glomerular hypercellularity lesion classification; compared with their results, our proposed deep learning-based model improves accuracy by 15 percentage points on the same data set.

Considering the 4-class classification, both CNN-MLP and CNN-SVM models achieved high results, even though the gap between the two approaches increased (around four percentage points), as shown in Table 5. This behavior may be due to the difficulty of differentiating the 4 classes, mainly with respect to the sub-lesions.
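As a worked check of the F1-score derivation and of the gaps quoted in this discussion, the short sketch below recomputes them from the numbers reported in Tables 5 and 7. The function name `f1_from_precision_recall` is introduced here for illustration only.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall reported for Barros et al. (2017) in Table 7.
barros_f1 = f1_from_precision_recall(0.88, 0.88)

# Accuracy gaps expressed in percentage points.
binary_gain_pp = (1.00 - 0.85) * 100        # CNN-SVM vs. Barros et al. (Table 7)
four_class_gap_pp = (0.945 - 0.906) * 100   # CNN-SVM vs. CNN-MLP (Table 5)

print(f"F1 (Barros et al.) = {barros_f1:.2f}")
print(f"binary gain = {binary_gain_pp:.0f} pp")
print(f"4-class gap = {four_class_gap_pp:.1f} pp")
```

When precision and recall are equal, as they are here, the harmonic mean reduces to that common value, which is why the F1-score in Table 7 matches the other two columns.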
Another relevant characteristic is the endoMes class, which contains features that can be confused with both endocapillary and mesangial hypercellularity. Figure 5 illustrates the feature space of the data set plotted using t-distributed stochastic neighbor embedding (t-SNE), a common technique for visualizing high-dimensional data in 2-dimensional plots. It is noteworthy that the "no lesion" class is well separated from the lesion classes, which explains the 100% accuracy of the binary classification. The three lesion classes have some well-defined groups, but they also have areas where instances overlap considerably, meaning that images containing endocapillary, mesangial and endoMes hypercellularity can be very similar.

Figure 6 shows six images misclassified by the CNN-SVM model, covering every possible error combination. These images depict complex lesions that may represent a challenge even for nephropathologists, which is consistent with the t-SNE visualization. Figure 6(a) represents a glomerulus with increased circularity caused by cell proliferation and influx of inflammatory cells with disruption of glomerular compartments. Figure 6(b) represents a glomerulus with hypercellularity combined with mesangial matrix expansion and capillary wall thickening, probably caused by immune complex deposition on the subendothelial and subepithelial aspects of the glomerular basement membrane, blurring the limits of glomerular compartments. In Figure 6(c), hypercellularity is combined with capillary wall thickening and partial mesangial dissolution. In Figures 6(d) and (f), mesangial and capillary lumen are not always well defined.

Figure 5: t-SNE visualization of the 4-class data set. The CNN feature extractor generates a 128-dimensional feature vector, and the t-SNE algorithm reduces the dimensionality to a 2-dimensional vector to help the analysis of clusters.
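The t-SNE projection behind Figure 5 can be sketched with scikit-learn's `TSNE`. This is an illustration under stated assumptions, not the authors' code: the feature matrix is random synthetic data standing in for the 128-dimensional CNN features, and the perplexity, initialization and random seed are arbitrary choices.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Synthetic stand-in: 4 classes x 40 samples of 128-dimensional features,
# each class drawn around its own random center (assumption, not real data).
n_classes, n_per_class, n_features = 4, 40, 128
centers = rng.normal(scale=5.0, size=(n_classes, n_features))
features = np.vstack([
    center + rng.normal(size=(n_per_class, n_features)) for center in centers
])
labels = np.repeat(np.arange(n_classes), n_per_class)

# Reduce the 128 dimensions to 2 so that clusters can be inspected visually.
tsne = TSNE(n_components=2, perplexity=30, init="pca", random_state=0)
embedding = tsne.fit_transform(features)

# `embedding` holds one 2-D point per sample; scattering it with any plotting
# library and coloring the points by `labels` gives a plot in the style of Figure 5.
print(embedding.shape)
```

Because t-SNE preserves local neighborhoods rather than global distances, overlapping regions in such a plot suggest similar feature vectors, while the spacing between distant clusters should not be over-interpreted.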
Figure 6: Six images of glomeruli misclassified by the CNN-SVM architecture. From left to right: (a) endocapillary hypercellularity misclassified as mesangial hypercellularity, (b) endocapillary hypercellularity misclassified as endoMes hypercellularity, (c) mesangial hypercellularity misclassified as endocapillary hypercellularity, (d) mesangial hypercellularity misclassified as endoMes hypercellularity, (e) endoMes hypercellularity misclassified as endocapillary hypercellularity, and (f) endoMes hypercellularity misclassified as mesangial hypercellularity.

We asked three pathologists to classify these six images independently. The results of this analysis are shown in Table 8. Complete agreement among the nephropathologists on the distribution of hypercellularity was achieved in only two of the six images. In diagnostic practice, most of the difficulties generated by these complex lesions are solved by examining contiguous tissue sections 2 to 10 µm apart, stained with a variety of techniques that highlight the basement membrane and mesangial matrix, such as PAS and periodic acid-methenamine silver (PAMS).

Although perfect results were achieved in binary classification on the FIOCRUZ data set, there is a considerable gap between academic research and practical computational systems that assist pathologists effectively.

Table 8: Comparison between the pathologists' labels and the results obtained by the trained CNN-SVM model. The Pool column represents the majority voting outcome: computer (COMP) or pathologist (PAT).

                     Classifier
Image (see Fig. 6)   Pathologist 1   Pathologist 2   Pathologist 3   CNN-SVM   Pool
a                    END             END             END             MES       PAT
b                    END             ENDOMES         ENDOMES         ENDOMES   COMP
c                    MES             END             ENDOMES         END       COMP
d                    MES             MES             MES             ENDOMES   PAT
e                    ENDOMES         MES             ENDOMES         END       PAT
f                    ENDOMES         ENDOMES         END             MES       PAT
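The Pool column of Table 8 is consistent with a simple tally: count the four labels given to each image (three pathologists plus the CNN-SVM) and report COMP when the most frequent label is the one chosen by the CNN-SVM, and PAT otherwise. The sketch below reproduces the column under that assumed rule. The rule, the tie handling, and the helper name `pool_outcome` are an inferred reading of the table, not a procedure stated in the text.

```python
from collections import Counter

# Rows of Table 8: image -> (pathologist 1, pathologist 2, pathologist 3, CNN-SVM).
TABLE_8 = {
    "a": ("END", "END", "END", "MES"),
    "b": ("END", "ENDOMES", "ENDOMES", "ENDOMES"),
    "c": ("MES", "END", "ENDOMES", "END"),
    "d": ("MES", "MES", "MES", "ENDOMES"),
    "e": ("ENDOMES", "MES", "ENDOMES", "END"),
    "f": ("ENDOMES", "ENDOMES", "END", "MES"),
}


def pool_outcome(votes):
    """Return 'COMP' if the CNN-SVM label is among the most voted labels, else 'PAT'.

    Assumed rule: all four votes are tallied; a tie that includes the
    CNN-SVM label counts as COMP (no such tie occurs in Table 8).
    """
    cnn_svm_label = votes[-1]
    counts = Counter(votes)
    top_count = max(counts.values())
    winners = {label for label, n in counts.items() if n == top_count}
    return "COMP" if cnn_svm_label in winners else "PAT"


pool = {image: pool_outcome(votes) for image, votes in TABLE_8.items()}

# Images on which the three pathologists fully agree with each other.
full_agreement = [
    image for image, votes in TABLE_8.items() if len(set(votes[:3])) == 1
]

print(pool)
print(full_agreement)
```

Under this reading, the pathologists agree completely on two of the six images, matching the count given in the text.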
For future work, we are investigating different ways of using transfer learning to initialize our network with better weights for generalizing across glomerulus image classes, where sufficient training data exists. Additionally, we plan to expand the number of samples in the data set (now around 31,000 unlabelled images), working with other types of lesions and histological stains used in the pathology laboratory for better data analysis. Another work in progress is automatic glomerulus segmentation in a WSI containing several glomeruli; the goal is to classify each glomerulus found, also considering the individual detection of each glomerulus component.

5. Ethical Considerations

This work was conducted in accordance with resolution No. 466/12 of the Brazilian National Health Council. To preserve confidentiality, the images (including those shown in the paper) were separated from other patient data. No data presented herein allows patient identification. All the procedures were approved by the Ethics Committee for Research Involving Human Subjects of the Gonçalo Moniz Institute from the Oswaldo Cruz Foundation (CPqGM/FIOCRUZ), Protocols No. 188/09 and No. 1817574.

Acknowledgment

The work was sponsored by Fundação de Amparo à Pesquisa do Estado da Bahia (FAPESB) grants TO-SUS0031/2018, TO-BOL0660/2018, TO-BOL0344/2018 and TO-PET0008/2015. Washington LC dos-Santos has a research scholarship from CNPq Proc. No. 306779/2017. Luciano Oliveira has a research scholarship from CNPq Proc. No. 307550/2018-4.

References

Al-Janabi S, Huisman A, Van Diest PJ. Digital pathology: current status and future perspectives. Histopathology 2012;61(1):1–9.
Bajema IM, Wilhelmus S, Alpers CE, Bruijn JA, Colvin RB, Cook HT, D'Agati VD, Ferrario F, Haas M, Jennette JC, et al.
Revision of the international society of nephrology/renal pathology society classification for lupus nephritis: clarification of definitions, and modified national institutes of health activity and chronicity indices. Kidney international 2018;93(4):789–96.
Barisoni L, Nast CC, Jennette JC, Hodgin JB, Herzenberg AM, Lemley KV, Conway CM, Kopp JB, Kretzler M, Lienczewski C, et al. Digital pathology evaluation in the multicenter nephrotic syndrome study network (neptune). Clinical Journal of the American Society of Nephrology 2013;8(8):1449–59.
Barros GO, Navarro B, Duarte A, Dos-Santos WL. Pathospotter-k: A computational tool for the automatic identification of glomerular lesions in histological images of kidneys. Scientific reports 2017;7:46769.
Canziani A, Paszke A, Culurciello E. An analysis of deep neural network models for practical applications. arXiv preprint arXiv:1605.07678 2016.
Chapelle O, Vapnik V, Bousquet O, Mukherjee S. Choosing multiple parameters for support vector machines. Machine learning 2002;46(1-3):131–59.
Churg J, Bernstein J, Glassock R. Renal disease: classification and atlas of glomerular diseases. Igaku-Shoin, New York, Tokyo 1995.
Duan K, Keerthi SS, Poo AN. Evaluation of simple performance measures for tuning svm hyperparameters. Neurocomputing 2003;51:41–59.
Fabijańska A. Segmentation of corneal endothelium images using a u-net-based convolutional neural network. Artificial Intelligence in Medicine 2018;88:1–13.
Fogo AB. Approach to renal biopsy. American Journal of Kidney Diseases 2003;42(4):826–36.
Fu X, Ong C, Keerthi S, Hung GG, Goh L. Extracting the knowledge embedded in support vector machines. In: IEEE International Joint Conference on Neural Networks. IEEE; volume 1; 2004. p. 291–6.
Gandomkar Z, Brennan PC, Mello-Thoms C. Mudern: Multi-category classification of breast histopathological image using deep residual networks. Artificial Intelligence in Medicine 2018;88:14–24.
Gu J, Wang Z, Kuen J, Ma L, Shahroudy A, Shuai B, Liu T, Wang X, Wang L, Wang G, et al. Recent advances in convolutional neural networks. arXiv preprint arXiv:1512.07108 2015.
Hou L, Samaras D, Kurc TM, Gao Y, Davis JE, Saltz JH. Patch-based convolutional neural network for whole slide tissue image classification. In: IEEE Conference on Computer Vision and Pattern Recognition. 2016. p. 2424–33.
Imbault F, Lebart K. A stochastic optimization approach for parameter tuning of support vector machines. In: International Conference on Pattern Recognition. IEEE; volume 4; 2004. p. 597–600.
Janowczyk A, Madabhushi A. Deep learning for digital pathology image analysis: A comprehensive tutorial with selected use cases. Journal of pathology informatics 2016;7.
Joosten SA, Sijpkens YW, van Kooten C, Paul LC. Chronic renal allograft rejection: Pathophysiologic considerations. Kidney International 2005;68(1):1–13.
Kannan S, Morgan LA, Liang B, Cheung MG, Lin CQ, Mun D, Nader RG, Belghasem ME, Henderson JM, Francis JM, Chitalia VC, Kolachalama VB. Segmentation of glomeruli within trichrome images using deep learning. Kidney International Reports 2019.
Kingma DP, Ba J. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 2014.
LeCun Y, Bengio Y, Hinton G. Deep learning. Nature 2015;521(7553):436–44.
Litjens G, Kooi T, Bejnordi BE, Setio AAA, Ciompi F, Ghafoorian M, van der Laak JA, van Ginneken B, Sánchez CI. A survey on deep learning in medical image analysis. Medical Image Analysis 2017;42:60–88.
Madabhushi A, Lee G. Image analysis and machine learning in digital pathology: Challenges and opportunities. Medical Image Analysis 2016;33:170–5.
Marsh JN, Matlock MK, Kudose S, Liu TC, Stappenbeck TS, Gaut JP, Swamidass SJ. Deep learning global glomerulosclerosis in transplant kidney frozen sections. bioRxiv 2018:292789.
Pedraza A, Gallego J, Lopez S, Gonzalez L, Laurinavicius A, Bueno G.
Glomerulus classification with convolutional neural networks. In: Annual Conference on Medical Image Understanding and Analysis. Springer; 2017. p. 839–49.
Sarder P, Ginley B, Tomaszewski JE. Automated renal histopathology: digital extraction and quantification of renal pathology. In: Medical Imaging 2016: Digital Pathology. International Society for Optics and Photonics; volume 9791; 2016. p. 97910F.
Sharma H, Zerbe N, Klempert I, Hellwich O, Hufnagl P. Deep convolutional neural networks for automatic classification of gastric carcinoma using whole slide images in digital histopathology. Computerized Medical Imaging and Graphics 2017;61:2–13.
Simon O, Yacoub R, Jain S, Tomaszewski JE, Sarder P. Multi-radial lbp features as a tool for rapid glomerular detection and assessment in whole slide histopathology images. Scientific reports 2018;8(1):2032.
de Souza BF, de Carvalho AC, Calvo R, Ishii RP. Multiclass svm model selection using particle swarm optimization. In: International Conference on Hybrid Intelligent Systems. IEEE; 2006. p. 31.
Spanhol FA, Oliveira LS, Petitjean C, Heutte L. Breast cancer histopathological image classification using convolutional neural networks. In: International Joint Conference on Neural Networks. IEEE; 2016. p. 2560–7.
Trimarchi H, Barratt J, Cattran DC, Cook HT, Coppo R, Haas M, Liu ZH, Roberts IS, Yuzawa Y, Zhang H, et al. Oxford classification of iga nephropathy 2016: an update from the iga nephropathy classification working group. Kidney international 2017;91(5):1014–21.
Wahab N, Khan A, Lee YS. Two-phase deep convolutional neural network for reducing class skewness in histopathological images based breast cancer detection. Computers in biology and medicine 2017;85:86–97.
Xu J, Luo X, Wang G, Gilmore H, Madabhushi A. A deep convolutional neural network for segmenting and classifying epithelial and stromal regions in histopathological images. Neurocomputing 2016;191:214–23.
Zhang G, Hsu CHR, Lai H, Zheng X. Deep learning based feature representation for automated skin histopathological image annotation. Multimedia Tools and Applications 2018;77(8):9849–69.
