Classification of glomerular hypercellularity using convolutional features and support vector machine
Authors: Paulo Chagas, Luiz Souza, Ikaro Araujo
Paulo Chagas (a,∗), Luiz Souza (a,∗), Ikaro Araújo (b), Nayze Aldeman (c), Angelo Duarte (d), Michele Angelo (d), Washington LC dos-Santos (e,∗∗), Luciano Oliveira (a,∗∗)
a IVISION Lab, Universidade Federal da Bahia, Bahia, Brazil
b PPGM, Universidade Federal da Bahia, Bahia, Brazil
c Departamento de Medicina Especializada - Universidade Federal do Piauí, Piauí, Brazil
d Universidade Estadual de Feira de Santana, Bahia, Brazil
e Fundação Oswaldo Cruz - Instituto Gonçalo Moniz, Bahia, Brazil

Abstract
Glomeruli are histological structures of the kidney cortex formed by interwoven blood capillaries, and are responsible for blood filtration. Glomerular lesions impair kidney filtration capability, leading to protein loss and metabolic waste retention. An example of such a lesion is glomerular hypercellularity, which is characterized by an increase in the number of cell nuclei in different areas of the glomeruli. Glomerular hypercellularity is a frequent lesion present in different kidney diseases. Automatic detection of glomerular hypercellularity would accelerate the screening of scanned histological slides for the lesion, enhancing clinical diagnosis. With this in mind, we propose a new approach for the classification of hypercellularity in human kidney images. Our proposed method introduces a novel convolutional neural network (CNN) architecture combined with a support vector machine, achieving near perfect average results on the FIOCRUZ data set in a binary classification (lesion or normal). Our deep learning-based classifier outperformed the state-of-the-art results on the same data set.
Additionally, classification of hypercellularity sub-lesions was also performed, considering mesangial, endocapillary and both lesions; in this multi-classification task, our proposed method failed in just 4% of the cases. To the best of our knowledge, this is the first study on deep learning over a data set of glomerular hypercellularity images of human kidney.

Keywords: hypercellularity, human kidney biopsy, convolutional neural network.

∗ First two authors contributed equally.
∗∗ Corresponding authors: Luciano Oliveira, email: lrebouca@ufba.br; Washington L-C dos Santos, email: wluis@bahia.fiocruz.br
Preprint submitted to arxiv.org, July 2, 2019

1. Introduction
Digital histopathology is a research field that exploits digital images for the analysis of tissue samples. The digital pictures are obtained either by scanning histological whole-slide images (WSIs) or by collecting snapshots of histological structures relevant for the diagnosis of diseases (Al-Janabi et al., 2012). This approach makes it easier to gather large-scale data sets of histological lesions, to review them, and to exchange information among pathologists without the inconvenience of working with the actual glass slides. The evolution of the computer vision field has impacted digital medicine as a whole, supporting pathologists in the automatic analysis of various types of medical images, as well as improving the accuracy of computer-aided diagnosis (Madabhushi and Lee, 2016; Litjens et al., 2017). In the special case of renal histopathology, disease markers are mostly found in the glomeruli, which present highly diverse and heterogeneous characteristics. The glomerulus is a histological structure of the kidney cortex, formed by a network of capillaries responsible for blood filtration. As an elementary filtering structure, it is targeted by many primary and systemic diseases, leading to different patterns of glomerular lesions.
Finding and classifying glomerular lesions are fundamental steps toward the diagnosis of many kidney diseases. These tasks rely on the expertise of pathologists, and much effort has been made to better define and create consensus about relevant lesions. In fact, after successive discussion and validation studies in the field, increased consistency has been achieved in the diagnosis and classification of glomerular renal diseases such as lupus nephritis, IgA nephropathy, and kidney transplant rejection (Bajema et al., 2018; Trimarchi et al., 2017; Joosten et al., 2005). One limiting factor to the performance of histological diagnosis is the complexity of lesions, which, in some cases, may impair a clear definition in terms of criteria and, consequently, a suitable agreement among pathologists (Barisoni et al., 2013). In particular, glomerular hypercellularity is a frequent lesion found in kidney biopsies, defined by an increase in the number of cells in the glomeruli. This type of lesion is an integral component of many glomerular diseases, such as proliferative and membranoproliferative glomerulonephritis, and is a marker of activity in lupus and IgA nephropathy (Bajema et al., 2018; Trimarchi et al., 2017). Hypercellularity can be identified by a careful look at the histological sections of the glomeruli, searching for agglomerates formed by four or more cell nuclei in the mesangial area (mesangial hypercellularity), or for cell aggregates that fill the capillary lumen (endocapillary hypercellularity) (Churg et al., 1995; Fogo, 2003). Figure 1 shows the complexity of this problem, and the following question can be raised: Which image depicts a glomerulus with a hypercellularity lesion?

Figure 1: Mark with an X the image with a hypercellularity lesion.
The answer to this question is the image on the right, due to its increased nuclei density; the image on the left shows a normal glomerulus with no significant number of cell clusters. Although hypercellularity is easy to define and usually easy to assess in histological sections, agreement among pathologists may decrease for focal hypercellularity and for occurrences in specific regions of the glomerulus. For instance, a recent report from the IgA Nephropathy Classification Working Group showed inconsistencies among specialists even in the use of dichotomous MEST system scores such as E (endocapillary hypercellularity) and M (mesangial hypercellularity) (Trimarchi et al., 2017). Correct assessment of these scores is crucial for relevant clinical-pathological correlation and for predicting patient outcome. A consistent glomerulus classification can be deemed an important and difficult step towards diagnosing a renal disease in a biopsy evaluation (Pedraza et al., 2017).
Some works have already approached the tasks of glomerulus identification and segmentation (Sarder et al., 2016; Kannan et al., 2019; Simon et al., 2018), which are useful in situations that require an analysis of the entire WSI. Barros et al. (2017) proposed a method relying on classical image pre-processing techniques and a k-nearest neighbors classifier to classify hypercellularity lesions; that work used 811 images of human glomeruli (referred to here as the FIOCRUZ data set) stained with hematoxylin-eosin (H&E) and periodic acid–Schiff (PAS) from a set of biopsy slides.
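The mesangial and endocapillary criteria quoted in this introduction can be made concrete with a toy rule, which also shows why a rigid threshold struggles with the focal and borderline cases on which pathologists disagree. This is a minimal illustrative sketch, not part of the proposed method (which never counts nuclei): the inputs, a nucleus count per annotated mesangial area and a count of capillary lumens filled by cell aggregates, are hypothetical annotations, and the label names anticipate the four classes used later in the paper.

```python
# Toy encoding of the histological criteria for hypercellularity.
# Inputs are hypothetical pathologist annotations, not image data.
MESANGIAL_NUCLEI_THRESHOLD = 4  # agglomerate of four or more nuclei


def classify_glomerulus(nuclei_per_mesangial_area, occluded_capillary_lumens):
    """Return 'normal', 'mesangial', 'endocapillary' or 'endoMes'.

    nuclei_per_mesangial_area: nucleus count for each annotated mesangial area.
    occluded_capillary_lumens: number of capillary lumens filled by cell aggregates.
    """
    mesangial = any(n >= MESANGIAL_NUCLEI_THRESHOLD for n in nuclei_per_mesangial_area)
    endocapillary = occluded_capillary_lumens > 0
    if mesangial and endocapillary:
        return "endoMes"
    if mesangial:
        return "mesangial"
    if endocapillary:
        return "endocapillary"
    return "normal"


print(classify_glomerulus([2, 3, 1], 0))  # normal
print(classify_glomerulus([2, 5, 1], 0))  # mesangial
print(classify_glomerulus([1, 2], 2))     # endocapillary
print(classify_glomerulus([4], 1))        # endoMes
```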
More recently, deep neural networks have also outperformed handcrafted features in some tasks on histological images, achieving strong results in different scenarios (Janowczyk and Madabhushi, 2016; Xu et al., 2016; Sharma et al., 2017; Wahab et al., 2017; Spanhol et al., 2016; Hou et al., 2016; Zhang et al., 2018; Fabijańska, 2018; Gandomkar et al., 2018). Regarding glomerular detection with deep learning, Marsh et al. (2018) introduced a convolutional neural network for the automatic localization of glomeruli, further classifying global glomerulosclerosis in donor kidney biopsies for transplantation. An automated process for glomerular lesion classification would have many applications, such as large-scale classification of cases based on histological images, consistency of morphological classification, and identification of tissue markers of disease progression.

1.1. Contributions
Three main contributions are brought here: (i) instead of using conventional classification methods as in (Barros et al., 2017), we propose a CNN-based architecture to extract trainable features that represent a glomerulus; (ii) by using the proposed CNN as a feature extractor, an SVM classifies the CNN features as a normal or an injured glomerulus; (iii) we also extend the proposed model to the classification of specific hypercellularity lesions (endocapillary hypercellularity, mesangial hypercellularity, and both), providing an analysis of the generated features for both binary and multi-lesion classification. The final CNN-SVM classifier reached near perfect results in four different train/test splits of the data set introduced in (Barros et al., 2017); in the multi-classification task, the same architecture failed in just 4% of the cases in a ten-fold cross-validation study. Finally, the misclassified images were analyzed by three pathologists, showing that there was no consensus for most of those images.

2. Classifying glomerular hypercellularity
The classification of a glomerular hypercellularity lesion could be tackled by delimiting areas and counting nuclei: if the number of nuclei per area surpasses a threshold, the glomerulus can be diagnosed with a hypercellularity lesion. Instead of following this pathologist-annotation approach, automatic classification uses examples of histological images to train a classifier. A histological image is a 2-dimensional grid of pixels that carries specific information, such as colors, edges, shapes and textures, which can be general-purpose or specific to the classification of a glomerular lesion. Consequently, conceiving a successful feature extractor demands some domain expertise, which brings us to the following question: What is the best feature set for classifying glomerular hypercellularity lesions?
Many feature extraction techniques are available in the literature, and a specific method could be designed as well. In contrast to conventional classifiers, deep learning aims to automatically learn hierarchical feature representations of the input data, without the need to design any particular feature extractor (LeCun et al., 2015). Our work proposes a novel CNN-based architecture for glomerular hypercellularity classification. After training a CNN, it is possible to apply a strong classifier on top of the features produced by its convolutional backbone. Accordingly, we propose to use a CNN architecture to extract trainable features, which ultimately feed an SVM that carries out the final classification. The proposed architecture is evaluated for both binary and multi-class classification. The first reason for using an SVM is that its training is cast as a convex quadratic optimization problem, which guarantees that the hyperplane found is the optimal one.
The second reason is to analyze the behavior of the feature space extracted from the CNN, which, in our experiments, proved empirically to be linearly separable. Linear separability of the feature space is expected to yield faster training and better results.

2.1. Conceiving the proposed CNN architecture
There are several well-established CNN architectures available in the literature (Canziani et al., 2016), which were designed to handle hundreds of different classes. However, these models tend to overfit when trained on little data. Since the data set we used (Barros et al., 2017) has a small training set, we decided to build our own architecture from scratch, modifying it according to our needs. The goal is to achieve high accuracy while avoiding overfitting.
A CNN architecture is organized in layers, each one applying a specific operation. Although there are many variations of CNN architectures, they share some basic components, such as convolutional, pooling, and fully-connected layers (Gu et al., 2015). The convolutional layer is the fundamental building block of a CNN model; it comprises various learnable kernels (filters) followed by a nonlinear activation function. A pooling layer (usually applied after a convolutional layer) condenses the feature maps into a smaller representation, with the goal of achieving some invariance. After several convolutional and pooling operations, the top of the network yields a high-level representation of the input image, which is more robust than the raw pixel information and, hopefully, than handcrafted features. This type of architecture requires a fully-connected layer to perform high-level classification using those features, working as a multilayer perceptron (MLP) on top of a CNN backbone.
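How stacked convolution and pooling layers shrink the spatial resolution follows the standard output-size formula floor((n + 2p − k) / s) + 1, for input size n, kernel size k, padding p and stride s. The sketch below traces a 224 × 224 input through six convolutional layers with the filter counts and kernel sizes shown in the Figure 4 diagram. The padding ('same', stride 1) and the placement of the five 2 × 2 max-pooling layers (after each of the first five convolutions) are assumptions, since the text does not specify them.

```python
def output_size(n, kernel, stride=1, padding=0):
    """Spatial size after a convolution or pooling window: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * padding - kernel) // stride + 1


# (filters, kernel size) per convolutional layer, as listed in Figure 4.
CONV_LAYERS = [(32, 7), (32, 7), (64, 5), (64, 5), (128, 3), (128, 3)]


def trace_shapes(input_size=224, pooled_layers=5):
    """Return (height, width, channels) after each conv(+pool) stage.

    Assumptions: 'same' padding with stride 1 for every convolution, and a
    2x2 max-pooling with stride 2 after each of the first `pooled_layers` convs.
    """
    size = input_size
    shapes = []
    for index, (filters, kernel) in enumerate(CONV_LAYERS):
        size = output_size(size, kernel, stride=1, padding=(kernel - 1) // 2)
        if index < pooled_layers:
            size = output_size(size, 2, stride=2)
        shapes.append((size, size, filters))
    return shapes


for shape in trace_shapes():
    print(shape)
```

Under these assumptions the resolution halves at each pooled stage, from (112, 112, 32) down to a final 7 × 7 × 128 map; how that map is reduced to the 128-dimensional feature vector of Figure 4 is not detailed in the text.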
Four architectures were initially implemented; Figure 2 highlights the convolutional blocks (CNN backbone) used for feature extraction and the MLP blocks (fully-connected layers and final activation) used for classification. The first architecture was designed to investigate how lesion classification behaves with fewer layers. In addition to the operations previously cited, batch normalization, regularization, and dropout were applied to reduce overfitting. The first architecture (Fig. 2a) is composed mainly of four convolutional layers, with the other operations applied between them, followed by one fully-connected layer. A rectified linear unit (ReLU) was used as the activation function, and the max function for pooling. To compute the class probabilities after the fully-connected layers, a sigmoid function was tried first and later replaced by a soft-max function. Starting from this first architecture, updates were made based on the stability of the accuracy curve on the validation set, and three other architectures were proposed (Figs. 2b, 2c and 2d).
To choose the best model among the candidate architectures, we randomly selected 90% of the data set for training the model and used the remaining 10% for validation. To deal with the size of the data set in memory, we applied a mini-batch strategy, which consists of using several batches of N images to update the model (instead of one single block of data). After each epoch, the proposed architecture was evaluated on the validation set. Since we focused on reducing overfitting, the architecture most likely to be selected would be the one with high accuracy and little oscillation in accuracy.
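The selection criterion above combines high validation accuracy with little oscillation across epochs, judged from the curves in Figure 3. One way to turn that judgement into a number is sketched below. The scoring rule (mean accuracy over the last epochs minus a penalty on its standard deviation), the window length, the penalty weight and the toy curves are all hypothetical choices for illustration, not the authors' procedure.

```python
from statistics import mean, pstdev


def stability_score(val_accuracy, window=10, penalty=1.0):
    """Hypothetical criterion: mean validation accuracy over the last
    `window` epochs minus `penalty` times its standard deviation, so that
    oscillating curves are penalized even when their peak is high."""
    tail = val_accuracy[-window:]
    return mean(tail) - penalty * pstdev(tail)


def select_architecture(curves, window=10, penalty=1.0):
    """Return the name of the curve with the highest stability score."""
    return max(curves, key=lambda name: stability_score(curves[name], window, penalty))


# Toy validation-accuracy curves (not the paper's data).
curves = {
    "oscillating": [0.90, 0.97, 0.88, 0.98, 0.89, 0.97],
    "stable": [0.90, 0.93, 0.94, 0.94, 0.95, 0.95],
}
print(select_architecture(curves, window=4))  # stable
```

The "oscillating" curve has the higher peak (0.98), but its last epochs swing by several points, so a stability-aware score prefers the flatter curve, which mirrors the preference stated in the text.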
Figure 3 shows the accuracy curve for each architecture, illustrating the increase not only in peak accuracy but also in the stability of the curve after several epochs. Our final CNN architecture (Fig. 2d) consists mainly of six convolutional layers and five max-pooling layers, followed by three fully-connected layers and one soft-max layer for classification.

Figure 2: Four CNN architectures proposed here: (a) Architecture 1 and (b) architecture 2, with four convolutional layers in the backbone; (c) architecture 3 and (d) architecture 4, with five and six convolutional layers, respectively, in the backbone.

Figure 3: Stability evaluation of CNN architectures. From 'Architecture 1' to 'Architecture 4', accuracy reaches stability as the number of training epochs increases. Best results were achieved with 'Architecture 4', which used a softmax layer at the top, resulting in a more stable accuracy on the validation set.

The training parameters were obtained empirically through several experiments on the four architectures. The best results with architecture 4 were achieved by training the deep network with the following parameters: 200 epochs, the Adam training algorithm (Kingma and Ba, 2014), a decay rate of 10^-6, a batch size of 32, and a learning rate of 10^-4.

2.2. Classifying the CNN features with SVM
After choosing the best architecture, the trained CNN features were fed to an SVM instead of the multilayer perceptron used for training the model. This CNN-SVM architecture was evaluated with four kernel functions: linear, radial basis function (RBF), polynomial and sigmoid (see Fig. 4).

[Figure 4 diagram: input image (RGB channels resized to 224 × 224) → feature extractor with convolutional layers 32 (7×7), 32 (7×7), 64 (5×5), 64 (5×5), 128 (3×3), 128 (3×3) → feature vector (N = 128) → SVM with linear, polynomial, RBF or sigmoid kernel]
Figure 4: CNN-SVM architecture.
From left to right: a glomerulus as an input image in the RGB color space, resized to 224 × 224 pixels. After applying architecture 4 (Fig. 2d) for feature extraction, a feature vector with 128 features is generated. Finally, the resulting feature vector is classified by an SVM, evaluated with linear, polynomial, RBF and sigmoid kernel functions.

SVM is a supervised binary classifier that finds an optimal hyperplane separating the hypercellularity class from the normal glomeruli by means of v support vectors. When these classes are not linearly separable, different kernel functions can be used to map the input vectors into a higher-dimensional space (the so-called feature space), in which the classes can become linearly separable. To classify an input feature vector x, SVM evaluates the sign of a function f(x), given by

$$ f(\mathbf{x}) = \operatorname{sign}\left( \sum_{i=1}^{v} y_i \, \alpha_i \, N(\mathbf{x}_i, \mathbf{x}) + b \right), \qquad (1) $$

where v is the number of support vectors x_i, y_i ∈ {−1, +1} are their class labels, α_i are the Lagrange multipliers obtained from the dual optimization problem, b is the bias parameter, and N(x_i, x) is a kernel function.

3. Experiment analysis
3.1. Data set
To assess the performance of our proposed CNN architecture, the data set introduced in (Barros et al., 2017) was used. The data set consists of 811 images: 300 images of normal human glomeruli and 511 images of human glomeruli with hypercellularity. As the images originated from human kidneys with different diseases, the cellular component of the hypercellularity varies among the cases. The images were selected from the digital histological image library of the Gonçalo Moniz Institute (FIOCRUZ), including cases of all the kidney biopsies performed for the diagnosis of glomerular diseases in referral nephrology services of public hospitals in Bahia state, Brazil, between 2003 and 2015.
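The class counts given above for the binary corpus, and those of the 4-class relabelling that follows, can be cross-checked with a few lines of arithmetic: both versions contain 811 images, and the four extra normal glomeruli in the 4-class version correspond exactly to the four images moved from the lesion class to the normal class. The sketch uses only counts stated in this section and also makes the lesion-to-normal imbalance of the binary set explicit.

```python
# Class counts reported for the FIOCRUZ data set.
binary = {"normal": 300, "hypercellularity": 511}
four_class = {"endocapillary": 90, "mesangial": 238, "endoMes": 179, "normal": 304}

total = sum(binary.values())
assert total == sum(four_class.values()) == 811

lesions_4class = total - four_class["normal"]               # lesion images after relabelling
relabelled = binary["hypercellularity"] - lesions_4class     # lesion -> normal moves
imbalance = binary["hypercellularity"] / binary["normal"]    # lesion:normal ratio

print(total, lesions_4class, relabelled, f"{imbalance:.2f}")  # 811 507 4 1.70
```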
The tissue samples were fixed in Bouin's fixative or formalin–acetic acid–alcohol and embedded in paraffin. Sections of 2-3 µm were stained with H&E and PAS. The images were captured using an Olympus QColor 3 digital camera attached to a Nikon E600 optical microscope (×200 magnification). Details of the clinical and demographic characteristics of the patients from whom the kidney biopsies were collected are presented in (Barros et al., 2017). Following the Oxford MEST classification, the former binary data set was relabeled into four classes: endocapillary (90 images with endocapillary hypercellularity), mesangial (238 images with mesangial hypercellularity), endoMes (179 images with both lesions) and normal (304 images with no lesion). In this re-evaluation process, using the MEST criteria for hypercellularity, it was found that four images had been misclassified as lesioned glomeruli in the original binary data set used by Barros et al. (2017). This led to a difference between the number of normal glomeruli in the binary corpus (300 images) and in the 4-class data set (304 images).

3.2. Methodology
All images were resized to 224 × 224 pixels and normalized. For a comparative evaluation of binary classification, K-fold cross-validation was applied, with K set to 2, 3, 5 and 10. On each iteration, a different fold is used for validation, and the remaining K − 1 folds are used for training the model. With the best CNN architecture, we compared the performance of two types of classifiers on top of the CNN backbone: CNN-MLP and CNN-SVM. Our methodology can be summarized in two steps:
• CNN-MLP: the best architecture is first found by using only a 90/10 split without cross-validation. Next, for different values of K, we applied K-fold cross-validation, analyzing the performance of the models with different sizes of training and validation data.
• CNN-SVM: For each value of K (folds), we selected the best CNN-MLP model. Then, we used the CNN features, obtained from the last layer before the fully-connected MLP, as input to the SVM (see Fig. 4).
Finally, for multi-class classification, we used the same approach as for binary classification, but without varying the value of K. Since the 4-class data set is derived from the original data set used for binary classification, the number of images per class became smaller. Thus, we decided to use K = 10 to avoid a very small number of training samples per class. The one-versus-all technique was used to obtain SVM multi-class outputs.

3.3. Evaluation metrics
Four metrics were used to evaluate our proposed method: precision (P), the ratio between correctly predicted glomerular hypercellularity cases (true positives) and all cases predicted as hypercellularity (true positives plus false positives), so that high precision corresponds to few false positives; recall (R), the ratio between true positives and the sum of true positives and false negatives, so that high recall corresponds to few false negatives; F1-score (F1), the harmonic mean of precision and recall, so that a high F1-score requires both high precision and high recall; and, finally, accuracy (ACC), the ratio between correctly predicted hypercellular and normal glomeruli and the total number of observations, so that accuracy increases with true positives and true negatives and decreases with false positives and false negatives.

3.4. Evaluating the proposed CNN model for binary classification
The final CNN was evaluated by using the average of the chosen metrics, observing how the model generalized the classes as the size of the training and validation sets changed. It is noteworthy that K = 2 means a 50/50 split, while K = 3, 5 and 10 mean 67/33, 80/20 and 90/10 splits, respectively. Since the training set shrinks as K decreases, we used image augmentation, doubling the original data set by applying pre-defined random modifications such as rotation, horizontal flip, zoom and shift. The training parameters were the same as those used to train the last architecture (see Section 2.1). For each value of K, there were K different validation sets, resulting in K training processes and K candidate models at the end of training. For example, for K = 10 there is one model for each training set combination, resulting in 10 models. When evaluating only the CNN-MLP approach, the averages of the metrics were computed over these 10 models. However, since the aim was to use the model as a feature extractor backbone, the best of the 10 candidates was selected, namely the one with the highest accuracy over all epochs. Table 1 shows the results of training the proposed CNN-MLP model, displaying the average metrics and their standard deviations for each train/test split. In general, all the train/test splits returned top results, with accuracies between 98.8% (50/50 split) and 99.6% (90/10 split).

Table 1: Comparison between four different train/test splits with CNN-MLP on binary classification. The average metrics and their standard deviations are given for precision (µP), recall (µR), f1-score (µF1) and accuracy (µACC).
Split | µP | µR | µF1 | µAcc
90/10 | 0.996 (±0.009) | 0.997 (±0.006) | 0.995 (±0.012) | 0.996 (±0.008)
80/20 | 0.995 (±0.008) | 0.994 (±0.009) | 0.996 (±0.006) | 0.995 (±0.007)
67/33 | 0.995 (±0.005) | 0.994 (±0.005) | 0.995 (±0.005) | 0.995 (±0.005)
50/50 | 0.987 (±0.003) | 0.987 (±0.003) | 0.987 (±0.003) | 0.988 (±0.003)
As expected, the experiments using larger training sets (90/10 split) achieved better results, although the worst scenario (50/50 split) still showed superior values for all the proposed metrics (around 98%) in comparison with previous work (Barros et al., 2017) (85%). Another observation is the small standard deviation of all results, demonstrating the stability of the model.

Table 2: Range of parameters evaluated for each SVM kernel.
Kernel | Function N(x_i, x_j) | Parameters
Linear | x_i^T x_j | 'C': [0.001, 0.01, 0.1, 1, 10, 100]
RBF | exp(−γ ||x_i − x_j||^2), where γ refers to gamma | 'C': [0.001, 0.01, 0.1, 1, 10, 100], 'gamma': [0.001, 0.01, 1, 1.5, 2]
Polynomial | (γ x_i^T x_j + r)^d, where γ denotes gamma, r denotes coef0, and d denotes degree | 'C': [0.001, 0.01, 0.1, 1, 10, 100], 'gamma': [0.001, 0.01, 1, 1.5, 2], 'degree': [1, 2, 3, 4]
Sigmoid | tanh(γ x_i^T x_j + r), where γ denotes gamma and r is specified by coef0 | 'C': [0.001, 0.01, 0.1, 1, 10, 100], 'gamma': [0.001, 0.01, 1, 1.5, 2]

3.5. Choosing the best SVM kernel for binary classification
Choosing optimal parameter values for the SVM kernel raises some questions about the interpretation of the resulting model and of the results obtained. These questions were investigated in several works (Chapelle et al., 2002; Duan et al., 2003; Imbault and Lebart, 2004; Fu et al., 2004; de Souza et al., 2006). As shown in Table 2, the CNN-SVM architecture was evaluated with three kernel parameters: 'C', 'gamma' and 'degree'. The regularization parameter 'C' is 1 by default and is common to all SVM kernels; it trades off misclassification of training examples against simplicity of the decision surface. A low 'C' makes the decision surface smooth, while a high one can lead to overfitting. The 'gamma' parameter is usually set by default to 1 divided by the number of features, and it is used by all SVM kernels except the linear one.
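Table 2 and Equation (1) can be read together as code. The sketch below implements the four kernel functions of Table 2 (with r written as coef0), the decision rule of Equation (1), and the parameter grid of Table 2, in plain Python for transparency. It is not the authors' implementation: the parameter names mirror those of scikit-learn's SVC, which the table appears to follow, but that is an assumption; coef0, for which Table 2 lists no range, is left at 0; and the support vectors, multipliers and bias in the usage lines are toy values rather than a trained model.

```python
import math
from itertools import product


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Kernel functions N(x_i, x_j) from Table 2 (r is written as coef0).
def linear(xi, xj):
    return dot(xi, xj)


def rbf(xi, xj, gamma):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(xi, xj)))


def polynomial(xi, xj, gamma, degree, coef0=0.0):
    return (gamma * dot(xi, xj) + coef0) ** degree


def sigmoid(xi, xj, gamma, coef0=0.0):
    return math.tanh(gamma * dot(xi, xj) + coef0)


def svm_decision(x, support_vectors, labels, alphas, b, kernel):
    """Equation (1): sign(sum_i y_i * alpha_i * N(x_i, x) + b).

    labels are +1/-1; a score of exactly 0 is mapped to +1 (a convention).
    """
    score = sum(y * a * kernel(sv, x)
                for sv, y, a in zip(support_vectors, labels, alphas)) + b
    return 1 if score >= 0 else -1


# Search grid of Table 2 (coef0 not searched).
C = [0.001, 0.01, 0.1, 1, 10, 100]
GAMMA = [0.001, 0.01, 1, 1.5, 2]
DEGREE = [1, 2, 3, 4]
GRID = {
    "linear": list(product(C)),
    "rbf": list(product(C, GAMMA)),
    "polynomial": list(product(C, GAMMA, DEGREE)),
    "sigmoid": list(product(C, GAMMA)),
}

# Toy model: two support vectors on either side of the vertical axis.
svs, ys, alphas, bias = [[1.0, 0.0], [-1.0, 0.0]], [1, -1], [0.5, 0.5], 0.0
print(svm_decision([2.0, 1.0], svs, ys, alphas, bias, linear))   # 1
print(svm_decision([-3.0, 5.0], svs, ys, alphas, bias, linear))  # -1
print(svm_decision([2.0, 1.0], svs, ys, alphas, bias,
                   lambda a, b: rbf(a, b, gamma=0.5)))            # 1
print({name: len(combos) for name, combos in GRID.items()})
# {'linear': 6, 'rbf': 30, 'polynomial': 120, 'sigmoid': 30}
```

The grid sizes show why the polynomial kernel dominates the search cost: 120 of the 186 combinations per split, each of which must be trained and evaluated.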
A small 'gamma' value corresponds to a Gaussian kernel with large variance, so that the model might not capture the "shape" of the data set; in this case, the resulting model behaves similarly to a linear one, with a set of hyperplanes separating the points of the two classes. When 'gamma' is high, each support vector influences only a small neighbourhood, and the model can fit individual training points; hence, small gamma values lead to high-bias, low-variance models, and large values to low-bias, high-variance ones. The 'degree' parameter is 3 by default and is used only by the polynomial kernel function. This parameter adjusts the feature space to higher-order interactions; larger degrees tend to overfit the data.
The same range of K values used to evaluate the CNN was also used to evaluate the SVM. Table 3 shows the best parameter combinations for each kernel at each split, using accuracy as the optimization metric. It is noteworthy that the linear kernel achieves top results, which indicates that the feature space is linearly separable.

Table 3: The best results per SVM kernel on binary classification.
Split | Kernel | Parameters | µAcc
90/10 | Linear | 'C': 1 | 1.000 (±0.000)
90/10 | RBF | 'C': 0.1, 'gamma': 0.001 | 1.000 (±0.000)
90/10 | Polynomial | 'C': 1, 'degree': 1, 'gamma': 1 | 1.000 (±0.000)
90/10 | Sigmoid | 'C': 0.01, 'gamma': 0.01 | 1.000 (±0.000)
80/20 | Linear | 'C': 0.001 | 0.994 (±0.011)
80/20 | RBF | 'C': 0.1, 'gamma': 0.01 | 0.996 (±0.010)
80/20 | Polynomial | 'C': 0.001, 'degree': 1, 'gamma': 1 | 0.994 (±0.011)
80/20 | Sigmoid | 'C': 0.01, 'gamma': 0.01 | 0.996 (±0.010)
67/33 | Linear | 'C': 10 | 0.993 (±0.006)
67/33 | RBF | 'C': 1, 'gamma': 1 | 0.994 (±0.003)
67/33 | Polynomial | 'C': 0.001, 'degree': 3, 'gamma': 2 | 0.994 (±0.007)
67/33 | Sigmoid | 'C': 1, 'gamma': 0.01 | 0.991 (±0.009)
50/50 | Linear | 'C': 0.01 | 0.988 (±0.005)
50/50 | RBF | 'C': 10, 'gamma': 0.01 | 0.988 (±0.005)
50/50 | Polynomial | 'C': 0.001, 'degree': 2, 'gamma': 1.5 | 0.988 (±0.005)
50/50 | Sigmoid | 'C': 1, 'gamma': 0.01 | 0.988 (±0.005)

Table 4: Comparison between four different train/test splits with CNN-SVM on binary classification. The average metrics and their standard deviations are given for precision (µP), recall (µR), f1-score (µF1) and accuracy (µACC).
Split | µP | µR | µF1 | µAcc
90/10 | 1.000 (±0.000) | 1.000 (±0.000) | 1.000 (±0.000) | 1.000 (±0.000)
80/20 | 0.996 (±0.006) | 0.996 (±0.007) | 0.996 (±0.003) | 0.996 (±0.010)
67/33 | 0.996 (±0.004) | 0.996 (±0.004) | 0.996 (±0.001) | 0.994 (±0.007)
50/50 | 0.988 (±0.008) | 0.983 (±0.015) | 0.985 (±0.004) | 0.988 (±0.005)

The overall results of the CNN-SVM approach are summarized in Table 4, showing the performance of the proposed approach with the previously defined metrics. For the 90/10 split, all SVM kernels showed perfect ACC. For the 80/20 split, the RBF and sigmoid kernels achieved the highest results. In the 67/33 split, the RBF kernel obtained the best result (the same mean accuracy as the polynomial kernel, with a lower standard deviation). In the 50/50 split, all SVM kernels achieved the same results.

Table 5: Comparison between CNN-MLP and CNN-SVM models on 4-class classification with 90/10 split. The average metrics and their standard deviations are given for precision (µP), recall (µR), f1-score (µF1) and accuracy (µACC).
Method | µP | µR | µF1 | µAcc
CNN-MLP | 0.925 (±0.063) | 0.911 (±0.084) | 0.913 (±0.080) | 0.906 (±0.085)
CNN-SVM | 0.944 (±0.034) | 0.945 (±0.033) | 0.944 (±0.034) | 0.945 (±0.056)

Table 6: The best parameters per SVM kernel on 4-class classification.
Kernel | Parameters | µAcc
Linear | 'C': 0.01 | 0.945 (±0.056)
RBF | 'C': 10, 'gamma': 0.001 | 0.944 (±0.057)
Polynomial | 'C': 0.01, 'degree': 1, 'gamma': 1 | 0.945 (±0.056)
Sigmoid | 'C': 10, 'gamma': 0.001 | 0.945 (±0.056)

3.6. Extending the proposed architecture to multi-classification
We also applied our CNN-SVM architecture to the classification of a 4-class data set, including the following classes: endocapillary hypercellularity, mesangial hypercellularity, endoMes (both lesions) hypercellularity, and normal glomerulus.
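For the four classes listed above, the one-versus-all strategy mentioned in Section 3.2 trains one binary SVM per class (that class versus the other three) and assigns an input to the class whose SVM returns the largest decision value. The sketch below shows only that final combination step. The per-class decision functions are hypothetical linear scorers on a 2-dimensional toy feature vector, standing in for SVMs trained on the 128 CNN features. For reference, scikit-learn's SVC handles multi-class problems with a one-versus-one scheme internally, while one-versus-rest is available through OneVsRestClassifier; which tool the authors used is not stated here.

```python
def one_vs_all_predict(x, binary_scorers):
    """Return the class whose 'class vs rest' decision value is largest.

    binary_scorers maps each class name to a callable returning the signed
    decision value of that class's binary classifier for input x.
    """
    scores = {label: scorer(x) for label, scorer in binary_scorers.items()}
    return max(scores, key=scores.get)


# Hypothetical per-class scorers on a toy 2-d feature vector
# (feature 0 ~ "endocapillary-like", feature 1 ~ "mesangial-like").
scorers = {
    "endocapillary": lambda x: x[0] - 1.0,
    "mesangial": lambda x: x[1] - 1.0,
    "endoMes": lambda x: min(x[0], x[1]) - 0.5,
    "normal": lambda x: 0.5 - x[0] - x[1],
}

print(one_vs_all_predict([1.5, 0.2], scorers))  # endocapillary
print(one_vs_all_predict([0.2, 1.5], scorers))  # mesangial
print(one_vs_all_predict([1.4, 1.4], scorers))  # endoMes
print(one_vs_all_predict([0.1, 0.1], scorers))  # normal
```

In the toy example, the endoMes input also yields positive scores for the endocapillary and mesangial classifiers, a small-scale picture of why the combined class is the hardest to separate.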
The same binary classification methodology was followed, but now keeping K = 10 in the cross-validation. As expected, the only modification to the CNN architecture was the number of output units of the last dense layer, since the number of classes changed. At each cross-validation fold, the weights of the best CNN-MLP model for binary classification were loaded, updating only the number of classes in the last layer. Then, the whole CNN-MLP model was retrained on the 4-class data set with the same training parameters as before, achieving an average accuracy of 90.6%. As in the binary classification, the best model was selected among the 10 models from each cross-validation fold, and its CNN backbone was used as a feature extractor feeding an SVM classifier. The kernel parameters were varied in the same way as in the former experiments, achieving, as the best result, an average accuracy of 94.5%. Table 5 displays the final results for CNN-MLP and CNN-SVM classification on the 4-class data set, while Table 6 shows the parameters of the best results for each SVM kernel. The linear kernel again achieved the overall best result (tied with the polynomial and sigmoid kernels), supporting the robustness of the CNN architecture for feature extraction.

Table 7: Comparative performance for glomerular hypercellularity on binary classification.
Method | Precision | Recall | F1-Score | Accuracy
CNN-SVM | 1.00 | 1.00 | 1.00 | 1.00
CNN-MLP | 0.99 | 0.99 | 0.99 | 0.99
Barros et al. (2017) | 0.88 | 0.88 | 0.88 | 0.85

4. Discussion and conclusions
Overall, on binary classification, the two classification approaches (CNN-MLP and CNN-SVM) achieved high results on all metrics with low standard deviations, as shown in Tables 1 and 4. The two methods had close results, with the CNN-SVM approach showing comparable or slightly better performance for most values of K, supporting the robustness of the final proposed model.
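The four metrics of Section 3.3 can be computed from the confusion counts of the binary task, with hypercellularity as the positive class, as sketched below. The same F1 formula, the harmonic mean of precision and recall, is what allows an F1-score to be derived from the precision and recall reported by Barros et al. (2017): with P = R = 0.88 it gives 0.88, as in Table 7. The confusion counts in the usage line are toy values, and returning 0 for an empty denominator is a convention chosen here.

```python
def safe_div(num, den):
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return safe_div(2 * precision * recall, precision + recall)


def binary_metrics(tp, fp, fn, tn):
    """Precision, recall, F1 and accuracy with hypercellularity as positive class."""
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    f1 = f1_score(precision, recall)
    accuracy = safe_div(tp + tn, tp + fp + fn + tn)
    return precision, recall, f1, accuracy


# Toy confusion counts for a validation fold (not the paper's data).
print(binary_metrics(tp=50, fp=1, fn=2, tn=29))

# F1 derived from the precision and recall reported for Barros et al. (2017).
print(round(f1_score(0.88, 0.88), 2))  # 0.88

# Accuracy gap to Barros et al. (2017) in Table 7, in percentage points.
print(round((1.00 - 0.85) * 100))  # 15
```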
Although the data set is unbalanced (more lesion samples than normal glomeruli), we did not observe the models being heavily biased toward the majority class. This may be due to two factors: image augmentation and feature quality. Image augmentation helped mitigate the imbalance by increasing the number of images through random modifications of the original training set. The features obtained from the CNN backbone proved highly suitable for classification with all kernels, and the linear kernel reached an average accuracy of 100%. This outcome indicates that the 128-dimensional CNN features are linearly separable on this data set, which is a notable finding.

A summary of the binary classification results is presented in Table 7, which displays the best results of the CNN-MLP and CNN-SVM models in comparison with the method proposed by Barros et al. (2017). As that previous work did not use the F1-score for evaluation, we calculated this score from the reported precision and recall. Hence, we could compare the three methods on all four metrics, considering 10-fold cross-validation (90/10 split). To the best of our knowledge, Barros et al. (2017) were the first to address the classification of glomerular hypercellularity lesions. Compared with their method on the same data set, our deep learning-based model improves accuracy by 15 percentage points (1.00 vs. 0.85).

Considering the 4-class classification, both the CNN-MLP and CNN-SVM models achieved high results, although the gap between the two approaches increased to around four percentage points, as shown in Table 5. This may be due to the difficulty of differentiating the four classes, mainly among the lesion subtypes.
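The F1-score listed for Barros et al. (2017) in Table 7 follows from the standard definition of F1 as the harmonic mean of precision and recall, F1 = 2PR / (P + R). The percentage-point gaps quoted above are simple differences between the accuracies in Tables 5 and 7. The arithmetic check below assumes the standard F1 formula was the one used; the helper names are illustrative.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """F1-score as the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def gap_in_points(score_a: float, score_b: float) -> float:
    """Difference between two scores in [0, 1], in percentage points (1 decimal)."""
    return round(100 * (score_a - score_b), 1)


# Precision and recall reported for Barros et al. (2017), as listed in Table 7.
barros_f1 = f1_from_precision_recall(0.88, 0.88)
print(f"{barros_f1:.2f}")  # 0.88

# Accuracy gaps: CNN-SVM vs. Barros et al. (binary, Table 7)
# and CNN-SVM vs. CNN-MLP (4-class, Table 5).
print(gap_in_points(1.00, 0.85))    # 15.0
print(gap_in_points(0.945, 0.906))  # 3.9
```

When precision equals recall, their harmonic mean equals both, which is why the derived F1 is 0.88.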
Another relevant characteristic is the endoMes class, which contains features that can be confused with both endocapillary and mesangial hypercellularity. Figure 5 illustrates the feature space of the data set plotted using t-distributed stochastic neighbor embedding (t-SNE), a common technique for visualizing high-dimensional data in 2-dimensional plots. It is noteworthy that the "no lesion" class is well separated from the lesion classes, which is consistent with the 100% accuracy of the binary classification. The three lesion classes form some well-defined groups, but they also have areas of considerable overlap. This means that images containing endocapillary, mesangial and endoMes hypercellularity can be very similar.

Figure 6 shows six images misclassified by the CNN-SVM model, covering every possible misclassification between the three lesion classes. These images depict complex lesions that may be challenging even for nephropathologists, which corroborates the t-SNE visualization. Figure 6(a) shows a glomerulus with increased cellularity caused by cell proliferation and influx of inflammatory cells, with disruption of glomerular compartments. Figure 6(b) shows a glomerulus with hypercellularity combined with mesangial matrix expansion and capillary wall thickening. The thickening is probably caused by immune complex deposition on the subendothelial and subepithelial aspects of the glomerular basement membrane, blurring the limits of glomerular compartments. In Figure 6(c), hypercellularity is combined with capillary wall thickening and partial mesangial dissolution. In Figures 6(d) and (f), the mesangium and capillary lumen are not always well defined.

Figure 5: t-SNE visualization of the 4-class data set. The CNN feature extractor generates a 128-dimensional feature vector, and the t-SNE algorithm reduces it to a 2-dimensional vector to help the analysis of clusters.
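A projection like the one in Figure 5 can be produced with scikit-learn's `TSNE`. This is a sketch under assumptions. The paper does not state which t-SNE implementation or settings were used, so the perplexity and initialization below are illustrative. The data from `make_blobs` are synthetic 128-dimensional vectors standing in for the CNN features of the four classes, and `embed_2d` is a hypothetical helper.

```python
from sklearn.datasets import make_blobs
from sklearn.manifold import TSNE

# Synthetic stand-in: 200 vectors in 4 groups, mimicking 128-D CNN features
# for the normal, endocapillary, mesangial and endoMes classes.
features, labels = make_blobs(n_samples=200, n_features=128, centers=4,
                              random_state=0)


def embed_2d(features, perplexity=30.0, seed=0):
    """Reduce feature vectors to 2-D with t-SNE (perplexity must be < n_samples)."""
    tsne = TSNE(n_components=2, perplexity=perplexity, init="pca",
                random_state=seed)
    return tsne.fit_transform(features)


points = embed_2d(features)
print(points.shape)  # (200, 2)
```

Each row of `points` can then be scattered and colored by `labels`. As with any t-SNE plot, distances between clusters are only qualitative, so overlap in the plot suggests, but does not prove, that the classes are hard to separate.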
We asked three pathologists to classify the six images of Figure 6 independently; the results of this analysis are shown in Table 8. Complete agreement among the nephropathologists on the distribution of hypercellularity was achieved in only two of the six images. In diagnostic practice, most of the difficulties posed by such complex lesions are usually resolved by examining contiguous tissue sections 2 to 10 µm apart. These sections are stained with a variety of techniques that highlight the basement membrane and mesangial matrix, such as PAS and periodic acid-methenamine silver (PAMS).

Although perfect results have been achieved on the FIOCRUZ data set, there is a considerable gap between academic research and practical computational systems that assist pathologists effectively.

Figure 6: Six images of glomeruli misclassified by the CNN-SVM architecture. From left to right: (a) endocapillary hypercellularity misclassified as mesangial hypercellularity, (b) endocapillary hypercellularity misclassified as endoMes hypercellularity, (c) mesangial hypercellularity misclassified as endocapillary hypercellularity, (d) mesangial hypercellularity misclassified as endoMes hypercellularity, (e) endoMes hypercellularity misclassified as endocapillary hypercellularity, and (f) endoMes hypercellularity misclassified as mesangial hypercellularity.

Table 8: Comparison between the pathologists' labels and the results obtained by the trained CNN-SVM model. The Pool column represents the majority voting outcome: computer (COMP) or pathologist (PAT).

Image (see Fig. 6) | Pathologist 1 | Pathologist 2 | Pathologist 3 | Classifier (CNN-SVM) | Pool
a                  | END           | END           | END           | MES                  | PAT
b                  | END           | ENDOMES       | ENDOMES       | ENDOMES              | COMP
c                  | MES           | END           | ENDOMES       | END                  | COMP
d                  | MES           | MES           | MES           | ENDOMES              | PAT
e                  | ENDOMES       | MES           | ENDOMES       | END                  | PAT
f                  | ENDOMES       | ENDOMES       | END           | MES                  | PAT
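The Pool column of Table 8 can be checked with a simple majority vote over the four labels in each row (three pathologists plus CNN-SVM). The rule used here is an assumed interpretation of the caption, not a procedure stated by the authors: COMP when the most frequent label matches the CNN-SVM prediction, PAT otherwise. It reproduces the column for these six rows. The same data also show that Figure 6 covers every ordered (true, predicted) pair of the three lesion classes, and that the pathologists fully agreed on two images. The names `TABLE_8`, `TRUE_LABELS` and `pool` are illustrative.

```python
from collections import Counter
from itertools import permutations

LESIONS = ("END", "MES", "ENDOMES")

# Table 8, one row per image in Figure 6: (pathologist 1, 2, 3, CNN-SVM).
TABLE_8 = {
    "a": ("END", "END", "END", "MES"),
    "b": ("END", "ENDOMES", "ENDOMES", "ENDOMES"),
    "c": ("MES", "END", "ENDOMES", "END"),
    "d": ("MES", "MES", "MES", "ENDOMES"),
    "e": ("ENDOMES", "MES", "ENDOMES", "END"),
    "f": ("ENDOMES", "ENDOMES", "END", "MES"),
}

# True classes as stated in the Figure 6 caption.
TRUE_LABELS = {"a": "END", "b": "END", "c": "MES",
               "d": "MES", "e": "ENDOMES", "f": "ENDOMES"}


def pool(votes):
    """Majority label over all four votes: COMP if it matches CNN-SVM, else PAT.

    Assumed rule; ties are not handled specially (none occur in Table 8).
    """
    majority, _ = Counter(votes).most_common(1)[0]
    return "COMP" if majority == votes[-1] else "PAT"


pools = {img: pool(votes) for img, votes in TABLE_8.items()}
full_agreement = [img for img, v in TABLE_8.items() if len(set(v[:3])) == 1]
error_pairs = {(TRUE_LABELS[img], v[-1]) for img, v in TABLE_8.items()}

print(pools)  # {'a': 'PAT', 'b': 'COMP', 'c': 'COMP', 'd': 'PAT', 'e': 'PAT', 'f': 'PAT'}
print(full_agreement)  # ['a', 'd']
print(error_pairs == set(permutations(LESIONS, 2)))  # True
```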
For future work, we are investigating different ways of using transfer learning to initialize our network with better weights, learned where sufficient training data exists, to improve generalization across glomerulus image classes. Additionally, we plan to expand the number of samples in the data set (currently around 31,000 unlabelled images), working with other types of lesions and histological stains used in the pathology laboratory for better data analysis. Another work in progress is automatic glomerulus segmentation in a WSI containing several glomeruli. The goal is to classify each detected glomerulus, also considering the individual detection of each glomerular component.

5. Ethical Considerations

This work was conducted in accordance with Resolution No. 466/12 of the Brazilian National Health Council. To preserve confidentiality, the images (including those shown in the paper) were separated from other patient data. No data presented herein allows patient identification. All procedures were approved by the Ethics Committee for Research Involving Human Subjects of the Gonçalo Moniz Institute, Oswaldo Cruz Foundation (CPqGM/FIOCRUZ), Protocols No. 188/09 and No. 1817574.

Acknowledgment

The work was sponsored by Fundação de Amparo à Pesquisa do Estado da Bahia (FAPESB) grants TO-SUS0031/2018, TO-BOL0660/2018, TO-BOL0344/2018 and TO-PET0008/2015. Washington LC dos-Santos has a research scholarship from CNPq Proc. No. 306779/2017. Luciano Oliveira has a research scholarship from CNPq Proc. No. 307550/2018-4.

References

Al-Janabi S, Huisman A, Van Diest PJ. Digital pathology: current status and future perspectives. Histopathology 2012;61(1):1–9.
Bajema IM, Wilhelmus S, Alpers CE, Bruijn JA, Colvin RB, Cook HT, D'Agati VD, Ferrario F, Haas M, Jennette JC, et al. Revision of the International Society of Nephrology/Renal Pathology Society classification for lupus nephritis: clarification of definitions, and modified National Institutes of Health activity and chronicity indices. Kidney International 2018;93(4):789–96.
Barisoni L, Nast CC, Jennette JC, Hodgin JB, Herzenberg AM, Lemley KV, Conway CM, Kopp JB, Kretzler M, Lienczewski C, et al. Digital pathology evaluation in the multicenter Nephrotic Syndrome Study Network (NEPTUNE). Clinical Journal of the American Society of Nephrology 2013;8(8):1449–59.
Barros GO, Navarro B, Duarte A, Dos-Santos WL. Pathospotter-k: A computational tool for the automatic identification of glomerular lesions in histological images of kidneys. Scientific Reports 2017;7:46769.
Canziani A, Paszke A, Culurciello E. An analysis of deep neural network models for practical applications. arXiv preprint arXiv:1605.07678 2016.
Chapelle O, Vapnik V, Bousquet O, Mukherjee S. Choosing multiple parameters for support vector machines. Machine Learning 2002;46(1-3):131–59.
Churg J, Bernstein J, Glassock R. Renal disease: classification and atlas of glomerular diseases. Igaku-Shoin, New York, Tokyo 1995.
Duan K, Keerthi SS, Poo AN. Evaluation of simple performance measures for tuning SVM hyperparameters. Neurocomputing 2003;51:41–59.
Fabijańska A. Segmentation of corneal endothelium images using a U-Net-based convolutional neural network. Artificial Intelligence in Medicine 2018;88:1–13.
Fogo AB. Approach to renal biopsy. American Journal of Kidney Diseases 2003;42(4):826–36.
Fu X, Ong C, Keerthi S, Hung GG, Goh L. Extracting the knowledge embedded in support vector machines. In: IEEE International Joint Conference on Neural Networks. IEEE; volume 1; 2004. p. 291–6.
Gandomkar Z, Brennan PC, Mello-Thoms C. Mudern: Multi-category classification of breast histopathological image using deep residual networks. Artificial Intelligence in Medicine 2018;88:14–24.
Gu J, Wang Z, Kuen J, Ma L, Shahroudy A, Shuai B, Liu T, Wang X, Wang L, Wang G, et al. Recent advances in convolutional neural networks. arXiv preprint arXiv:1512.07108 2015.
Hou L, Samaras D, Kurc TM, Gao Y, Davis JE, Saltz JH. Patch-based convolutional neural network for whole slide tissue image classification. In: IEEE Conference on Computer Vision and Pattern Recognition. 2016. p. 2424–33.
Imbault F, Lebart K. A stochastic optimization approach for parameter tuning of support vector machines. In: International Conference on Pattern Recognition. IEEE; volume 4; 2004. p. 597–600.
Janowczyk A, Madabhushi A. Deep learning for digital pathology image analysis: A comprehensive tutorial with selected use cases. Journal of Pathology Informatics 2016;7.
Joosten SA, Sijpkens YW, van Kooten C, Paul LC. Chronic renal allograft rejection: Pathophysiologic considerations. Kidney International 2005;68(1):1–13.
Kannan S, Morgan LA, Liang B, Cheung MG, Lin CQ, Mun D, Nader RG, Belghasem ME, Henderson JM, Francis JM, Chitalia VC, Kolachalama VB. Segmentation of glomeruli within trichrome images using deep learning. Kidney International Reports 2019.
Kingma DP, Ba J. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 2014.
LeCun Y, Bengio Y, Hinton G. Deep learning. Nature 2015;521(7553):436–44.
Litjens G, Kooi T, Bejnordi BE, Setio AAA, Ciompi F, Ghafoorian M, van der Laak JA, van Ginneken B, Sánchez CI. A survey on deep learning in medical image analysis. Medical Image Analysis 2017;42:60–88.
Madabhushi A, Lee G. Image analysis and machine learning in digital pathology: Challenges and opportunities. Medical Image Analysis 2016;33:170–5.
Marsh JN, Matlock MK, Kudose S, Liu TC, Stappenbeck TS, Gaut JP, Swamidass SJ. Deep learning global glomerulosclerosis in transplant kidney frozen sections. bioRxiv 2018:292789.
Pedraza A, Gallego J, Lopez S, Gonzalez L, Laurinavicius A, Bueno G. Glomerulus classification with convolutional neural networks. In: Annual Conference on Medical Image Understanding and Analysis. Springer; 2017. p. 839–49.
Sarder P, Ginley B, Tomaszewski JE. Automated renal histopathology: digital extraction and quantification of renal pathology. In: Medical Imaging 2016: Digital Pathology. International Society for Optics and Photonics; volume 9791; 2016. p. 97910F.
Sharma H, Zerbe N, Klempert I, Hellwich O, Hufnagl P. Deep convolutional neural networks for automatic classification of gastric carcinoma using whole slide images in digital histopathology. Computerized Medical Imaging and Graphics 2017;61:2–13.
Simon O, Yacoub R, Jain S, Tomaszewski JE, Sarder P. Multi-radial LBP features as a tool for rapid glomerular detection and assessment in whole slide histopathology images. Scientific Reports 2018;8(1):2032.
de Souza BF, de Carvalho AC, Calvo R, Ishii RP. Multiclass SVM model selection using particle swarm optimization. In: International Conference on Hybrid Intelligent Systems. IEEE; 2006. p. 31.
Spanhol FA, Oliveira LS, Petitjean C, Heutte L. Breast cancer histopathological image classification using convolutional neural networks. In: International Joint Conference on Neural Networks. IEEE; 2016. p. 2560–7.
Trimarchi H, Barratt J, Cattran DC, Cook HT, Coppo R, Haas M, Liu ZH, Roberts IS, Yuzawa Y, Zhang H, et al. Oxford classification of IgA nephropathy 2016: an update from the IgA Nephropathy Classification Working Group. Kidney International 2017;91(5):1014–21.
Wahab N, Khan A, Lee YS. Two-phase deep convolutional neural network for reducing class skewness in histopathological images based breast cancer detection. Computers in Biology and Medicine 2017;85:86–97.
Xu J, Luo X, Wang G, Gilmore H, Madabhushi A. A deep convolutional neural network for segmenting and classifying epithelial and stromal regions in histopathological images. Neurocomputing 2016;191:214–23.
Zhang G, Hsu CHR, Lai H, Zheng X. Deep learning based feature representation for automated skin histopathological image annotation. Multimedia Tools and Applications 2018;77(8):9849–69.