An Ensemble-based System for Microaneurysm Detection and Diabetic Retinopathy Grading

TRANSACTIONS ON BIOMEDICAL ENGINEERING

Bálint Antal, Student Member, IEEE, and András Hajdu, Member, IEEE

Abstract—Reliable microaneurysm detection in digital fundus images is still an open issue in medical image processing. We propose an ensemble-based framework to improve microaneurysm detection. Unlike the well-known approach of combining the outputs of multiple classifiers, we propose a combination of internal components of microaneurysm detectors, namely preprocessing methods and candidate extractors. We have evaluated our approach for microaneurysm detection in an online competition, where this algorithm is currently ranked first, and also on two other databases. Since microaneurysm detection is decisive in diabetic retinopathy grading, we also tested the proposed method for this task on the publicly available Messidor database, where a promising AUC of 0.90 with 0.01 uncertainty is achieved in a 'DR/non-DR'-type classification based on the presence or absence of microaneurysms.

Index Terms—Microaneurysm detection, Ensemble-based systems, Diabetic retinopathy grading, Fundus image processing.

I. INTRODUCTION

Diabetic retinopathy (DR) is a serious eye disease that originates from diabetes mellitus and is the most common cause of blindness in the developed countries. Early treatment can prevent patients from becoming affected by this condition, or at least slow down the progression of DR. Thus, mass screening of patients suffering from diabetes is highly desirable, but manual grading is slow and resource-demanding. Therefore, several efforts have been made to establish reliable computer-aided screening systems based on color fundus images [1]. The promising results reported by Fleming et al. [2] and Jelinek et al. [3] indicate that automatic DR screening systems are getting closer to being used in clinical settings.
A key feature for recognizing DR is the detection of microaneurysms (MAs) in the fundus of the eye. The importance of handling MAs is two-fold. First, they are normally the earliest sign of DR, hence their timely and precise detection is essential. Second, the grading performance of computer-aided DR screening systems depends strongly on MA detection [3], [4]. In this paper, we propose a microaneurysm detector which performs well in both respects. One way to ensure high reliability and raise accuracy in a detector is to consider ensemble-based systems, which have proven efficient in several fields. However, the usual ensemble techniques aim to combine class labels or real values, which cannot be adopted in our case. In MA detection, detectors provide spatial coordinates as centers of potential MA candidates. The use of well-known ensemble techniques would require a classification of each pixel, which can be misleading in our context, since different algorithms extract MAs with different approaches and the MA centers may not coincide exactly. To overcome this difficulty, we gather close MA candidates of the individual detectors and apply a voting scheme on them. In [5], Niemeijer et al. showed that fusing the results of several MA detectors leads to an increased average sensitivity measured at seven predefined false positive rates. In this paper, we propose a framework to build MA detector ensembles based on the combination of the internal components of the detectors, not only on their output as in [5]. Some of our earlier research on combining MA detectors did not provide reassuring results [6].

Bálint Antal and András Hajdu are with the University of Debrecen, Faculty of Informatics, POB 12, 4010, Debrecen, Hungary. E-mail: {antal.balint, hajdu.andras}@inf.unideb.hu. The corresponding author is Bálint Antal. Phone: +36 52 512-900/62830, Fax: +36 52 512-900/62822.
To increase the accuracy of such ensembles, we must identify the weak points of MA detection. The first difficulty originates from the shape characteristics of MAs. They appear as small, circular dark spots on the surface of the retina (see Figure 1), which can be hard to distinguish from fragments of the vascular system or from certain eye features. Most MA detectors tackle this problem as follows. First, the green channel of the fundus image is extracted and preprocessed to enhance MA-like characteristics. Then, in a coarse-level step (referred to as candidate extraction in the rest of the paper), all MA-like objects are detected in the image. Finally, a fine-level algorithm (usually a supervised classifier) removes the potentially false detections based on some assumptions about MAs. Our former investigations showed that the low sensitivity of MA detectors originates from the candidate extractor part [7]. However, we could increase the sensitivity by applying proper preprocessing methods before candidate extraction. This technique causes a slight increase in the number of false positives, but this increase can be reduced by classification or voting. In this paper, we propose an effective microaneurysm detector based on the combination of preprocessing methods and candidate extractors. We provide an ensemble creation framework to select the best combination. An exhaustive quantitative analysis is also given to demonstrate the superiority of our approach over individual algorithms. We also investigate the grading performance of our method, which proves to be competitive with other screening systems. The rest of the paper is organized as follows: the selected preprocessing methods and candidate extractors are presented in sections II and III, respectively. The details of the proposed ensemble creation framework are discussed in section IV. We present our evaluation methodology in section V.
In section VI, we summarize our experimental results. A detailed discussion is given in section VII to address several issues. Finally, we draw conclusions in section VIII.

Fig. 1. Sample digital fundus image with a microaneurysm.

II. PREPROCESSING METHODS

In this section, we present the selected preprocessing methods, which we apply before MA candidate extraction. The selection of the preprocessing method and candidate extractor components for this framework is a challenging task, and no comparison of preprocessing methods dedicated to microaneurysm detection has been published yet. Since preprocessing methods need to be highly interchangeable, we must select algorithms which can be used before any candidate extractor and do not change the characteristics of the original images (unlike, e.g., shade correction [8]). We also found that some techniques generate images that are too noisy for MA detection (histogram equalization [8], adaptive histogram equalization [8] or color normalization [8]). Thus, we have selected methods which are well known in medical image processing and preserve image characteristics. Naturally, the proposed system can be improved in the future by adding new methods. A summary of the key differences of the algorithms is given in Table I.

A. Walter-Klein contrast enhancement [9]

This preprocessing method aims to enhance the contrast of fundus images by applying a gray level transformation using the following operator:

$$
f' =
\begin{cases}
\dfrac{\frac{1}{2}\,(f'_{\max} - f'_{\min})}{(\mu - f_{\min})^{r}}\,(f - f_{\min})^{r} + f'_{\min}, & f \le \mu,\\[1.5ex]
-\dfrac{\frac{1}{2}\,(f'_{\max} - f'_{\min})}{(\mu - f_{\max})^{r}}\,(f - f_{\max})^{r} + f'_{\max}, & f \ge \mu,
\end{cases}
$$

where $\{f_{\min}, \ldots, f_{\max}\}$ and $\{f'_{\min}, \ldots, f'_{\max}\}$ are the intensity levels of the original and the enhanced image, respectively, $\mu$ is the mean value of the original grayscale image and $r \in \mathbb{R}$ is a transition parameter.

B. Contrast limited adaptive histogram equalization [10]

Contrast limited adaptive histogram equalization (CLAHE) is a popular technique in biomedical image processing, since it is very effective in making the usually interesting salient parts more visible. The image is split into disjoint regions, and a local histogram equalization is applied in each region. Then, the boundaries between the regions are eliminated with bilinear interpolation.

TABLE I
SUMMARY OF THE KEY DIFFERENCES OF THE PREPROCESSING METHODS.

Algorithm | Aim | Method
Walter-Klein | contrast enhancement | gray level transformation
CLAHE | salient object enhancement | local histogram equalization
Vessel removal | MA enhancement near vessels | vessel removal and inpainting
Illumination eq. | MA enhancement at the border of the ROI | vignette correction

C. Vessel removal and extrapolation [11]

We investigate the effect of processing images with the complete vessel system removed, based on the idea proposed in [11]. We extrapolate the missing parts to fill in the holes caused by the removal, using the inpainting algorithm presented in [12]. In this way, MAs appearing near vessels become more easily detectable.

D. Illumination equalization [8]

This preprocessing method aims to reduce the vignetting effect caused by uneven illumination of retinal images. Each pixel intensity is set according to the following formula:

$$
f' = f + \mu_d - \mu_l,
$$

where $f$ and $f'$ are the original and the new pixel intensity values, respectively, $\mu_d$ is the desired average intensity and $\mu_l$ is the local average intensity. MAs appearing on the border of the retina are enhanced by this step.

E. No preprocessing

We also consider the results of the candidate extractors obtained for the original images without any preprocessing. That is, we formally consider a "No preprocessing" operation as well.

III. MICROANEURYSM CANDIDATE EXTRACTORS

Candidate extraction is a process which aims to spot any objects in the image showing MA-like characteristics. Individual MA detectors follow different principles to extract MA candidates. In this section, we provide a brief overview of the candidate extractors involved in our analysis. Again, just as for preprocessing methods, adding new MA candidate extractors may lead to further improvement in the future. A summary of the key differences of the candidate extractor algorithms and their performance measured on the ROC training dataset [13] is shown in Table II.

A. Walter et al. [14]

Candidate extraction is accomplished by grayscale diameter closing. That is, this method aims to find all sufficiently small dark patterns on the green channel. Finally, a double threshold is applied.

TABLE II
SUMMARY OF THE KEY DIFFERENCES OF THE CANDIDATE EXTRACTORS. THE SENSITIVITY AND AVERAGE NUMBER OF FALSE POSITIVES PER IMAGE (FP/I) ARE MEASURED ON THE ROC TRAINING DATABASE WITH DEFAULT PARAMETER SETTINGS.

Algorithm | Method | Sensitivity | FP/I
Walter | diameter closing | 36% | 154.42
Spencer | top-hat transformation | 12% | 20.3
Hough | circular Hough transformation | 28% | 505.85
Zhang | matching multiple Gaussian masks | 33% | 328.3
Lazar | cross-section profile analysis | 48% | 73.94

B. Spencer et al. [15]

From the input fundus image, the vascular map is extracted by applying twelve morphological top-hat transformations with twelve rotated linear structuring elements (with a radial resolution of 15°). Then, the vascular map is subtracted from the input image, which is followed by the application of a Gaussian matched filter. The resulting image is then binarized with a fixed threshold.
Since the extracted candidates are not precise representations of the actual lesions, a region growing step is also applied to them. While the original paper [15] describes MA detection on fluorescein angiographic images, our implementation is based on the modified version published by Fleming et al. [16].

C. Circular Hough transformation [17]

Following the idea presented in [17], we established an approach based on the detection of small circular spots in the image. Candidates are obtained by detecting circles in the images using the circular Hough transformation. With this technique, a set of circular objects can be extracted from the image.

D. Zhang et al. [18]

In order to extract candidates, this method constructs a maximal correlation response image for the input retinal image. This is accomplished by considering, for each pixel, the maximal correlation coefficient with five Gaussian masks of different standard deviations. The maximal correlation response image is thresholded with a fixed threshold value to obtain the candidates. Vessel detection and region growing are applied to reduce the number of candidates and to determine their precise size, respectively.

E. Lazar et al. [19]

Pixel-wise cross-section profiles with multiple orientations are used to construct a multi-directional height map. This map assigns to each pixel a set of height values that describe the distinction of the pixel from its surroundings in a particular direction. In a modified multilevel attribute opening step, a score map is constructed, from which the MAs are extracted by thresholding.

IV. ENSEMBLE CREATION

In this section, we describe our ensemble creation approach. In our framework, an ensemble $E$ is a set of ⟨preprocessing method, candidate extractor⟩ or, shortly, ⟨PP, CE⟩ pairs. A ⟨preprocessing method, candidate extractor⟩ pair means that we first apply the preprocessing method to the input image and then apply the candidate extractor to the result.
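To make the ⟨PP, CE⟩ composition concrete, here is a minimal Python/NumPy sketch under stated assumptions. It pairs the Walter-Klein transform of section II-A, written in the equivalent ratio form $\left((f - f_{\min})/(\mu - f_{\min})\right)^r$ so that a real $r$ stays well defined, with a much simplified extractor in the spirit of Zhang et al. (section III-D): the maximal correlation coefficient with a few Gaussian masks on the inverted image, a fixed threshold, and local-maximum picking. The mask size, the three σ values, the 0.5 threshold, the 31-pixel window of the illumination equalization helper and the `Pair` class are illustrative assumptions, not the authors' settings; Zhang et al. use five masks, and their vessel detection and region growing steps are omitted here.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

Point = Tuple[float, float]  # (row, column) of a candidate centre


def walter_klein(f: np.ndarray, r: float = 2.0,
                 out_min: float = 0.0, out_max: float = 255.0) -> np.ndarray:
    """Walter-Klein gray-level transform (section II-A); assumes a non-constant image."""
    f = np.asarray(f, dtype=float)
    f_min, f_max, mu = f.min(), f.max(), f.mean()
    half = 0.5 * (out_max - out_min)
    out = np.empty_like(f)
    dark = f <= mu
    # Ratio form of both branches: same values as the operator, defined for real r.
    out[dark] = half * ((f[dark] - f_min) / (mu - f_min)) ** r + out_min
    out[~dark] = -half * ((f[~dark] - f_max) / (mu - f_max)) ** r + out_max
    return out


def illumination_equalization(f: np.ndarray, desired: float = 128.0,
                              size: int = 31) -> np.ndarray:
    """f' = f + mu_d - mu_l, with mu_l a size x size local mean (odd size assumed)."""
    f = np.asarray(f, dtype=float)
    windows = sliding_window_view(np.pad(f, size // 2, mode="reflect"), (size, size))
    return f + desired - windows.mean(axis=(-2, -1))


def gaussian_mask(sigma: float, size: int) -> np.ndarray:
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    return np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))


def max_correlation_candidates(img: np.ndarray,
                               sigmas: Tuple[float, ...] = (1.0, 1.5, 2.0),
                               size: int = 7, thresh: float = 0.5) -> List[Point]:
    """Simplified Zhang-style extractor: maximal correlation coefficient with
    Gaussian masks, a fixed threshold, then local-maximum picking."""
    inv = -np.asarray(img, dtype=float)  # MAs are dark: invert them into bright blobs
    patches = sliding_window_view(np.pad(inv, size // 2, mode="edge"), (size, size))
    p0 = patches - patches.mean(axis=(-2, -1), keepdims=True)
    p_norm = np.sqrt((p0 ** 2).sum(axis=(-2, -1)))
    response = np.full(inv.shape, -1.0)
    for sigma in sigmas:
        g0 = gaussian_mask(sigma, size)
        g0 -= g0.mean()
        num = (p0 * g0).sum(axis=(-2, -1))
        den = p_norm * np.sqrt((g0 ** 2).sum())
        # Flat patches (zero variance) get correlation 0 instead of NaN.
        corr = np.divide(num, den, out=np.zeros_like(num), where=den > 0)
        response = np.maximum(response, corr)
    neigh = sliding_window_view(np.pad(response, 1, mode="edge"), (3, 3)).max(axis=(-2, -1))
    ys, xs = np.nonzero((response == neigh) & (response > thresh))
    return [(float(y), float(x)) for y, x in zip(ys, xs)]


@dataclass(frozen=True)
class Pair:
    """A <preprocessing method, candidate extractor> pair."""
    preprocess: Callable[[np.ndarray], np.ndarray]
    extract: Callable[[np.ndarray], List[Point]]

    def __call__(self, green: np.ndarray) -> List[Point]:
        # Preprocess the green channel first, then run the extractor on the result.
        return self.extract(self.preprocess(green))
```

For a 21 × 21 image of intensity 100 with a single darker pixel at (10, 10), `Pair(walter_klein, max_correlation_candidates)` returns `[(10.0, 10.0)]`. A "No preprocessing" member would simply use `lambda f: f` as its preprocessing step, so all 25 pairs can share the same interface.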
Such a pair thus extracts a set of candidates $H_E$ from the original image. If an ensemble $E$ contains more than one ⟨preprocessing method, candidate extractor⟩ pair, their outputs are fused in the following way: for each candidate $c$, all candidates of the other participants whose Euclidean distance $d$ from $c$ is smaller than a predefined constant $r \in \mathbb{R}$ are collected. Let $I_c$ denote the set of these points, together with $c$ itself, collected for a candidate $c$. Then, the centroid calculated from $I_c$ is put into $H_E$. Ensemble creation is a process where all ensembles $E$ from an ensemble pool $\mathcal{E}$ are evaluated, and the best performing one, $E_{best} \in \mathcal{E}$, with respect to an evaluation function on a training set is selected. To evaluate an ensemble $E$, its output candidate set $H_E$ must be compared to the ground truth in the following way: if for a $c \in H_E$ there exists a point in the ground truth whose Euclidean distance $d$ from $c$ is smaller than a predefined constant $r \in \mathbb{R}$, then $c$ is considered a true positive. Otherwise, $c$ is a false positive, while each ground truth point that does not have a close candidate in $H_E$ is a false negative. Selecting the optimal ensemble $E_{best}$ would require every possible ensemble of ⟨preprocessing method, candidate extractor⟩ pairs to be evaluated. However, we currently consider $M = N = 5$ preprocessing methods and candidate extractors in our experiments. That is, we have 25 ⟨preprocessing method, candidate extractor⟩ pairs with $2^{25}$ possible combinations to form the ensemble. It would be very resource-demanding to evaluate such a large number of combinations, so we used simulated annealing [20] as a search algorithm to find the final ensemble, since it has proven effective in such large search spaces.
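The fusion and matching rules above can be sketched in plain Python as follows; the confidence $|I_c| / |E|$ follows step 9 of Algorithm 1. This is a sketch under assumptions: candidates are (row, column) tuples, identical centroids produced by different members of the same cluster are merged by keeping the larger confidence (Algorithm 1 only says the centroid is added to the set $H_E$), and the same radius `r` is used for fusion and for matching in the example.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def centroid(points: Sequence[Point]) -> Point:
    # Sorting first makes the centroid of the same point set bit-identical,
    # whichever member of the cluster triggered the computation.
    pts = sorted(points)
    return (sum(p[0] for p in pts) / len(pts), sum(p[1] for p in pts) / len(pts))


def fuse(outputs: List[List[Point]], r: float) -> Dict[Point, float]:
    """Fuse the candidate lists of the |E| pairs of an ensemble.

    For every candidate c of pair i, I_c holds c plus the candidates of the
    other pairs closer than r; centroid(I_c) enters H_E with confidence |I_c| / |E|.
    """
    n_pairs = len(outputs)
    fused: Dict[Point, float] = {}
    for i, cands in enumerate(outputs):
        for c in cands:
            close = [c2 for j, other in enumerate(outputs) if j != i
                     for c2 in other if math.dist(c, c2) < r]
            i_c = close + [c]
            key = centroid(i_c)
            fused[key] = max(fused.get(key, 0.0), len(i_c) / n_pairs)
    return fused


def evaluate(detections: Sequence[Point], ground_truth: Sequence[Point],
             r: float) -> Tuple[int, int, int]:
    """Return (tp, fp, fn) using the distance-r matching rule."""
    tp = sum(1 for c in detections if any(math.dist(c, g) < r for g in ground_truth))
    fn = sum(1 for g in ground_truth if not any(math.dist(c, g) < r for c in detections))
    return tp, len(detections) - tp, fn
```

With three pairs returning `[(10, 10)]`, `[(10, 12)]` and `[(50, 50)]` and `r = 5`, `fuse` yields `{(10.0, 11.0): 2/3, (50.0, 50.0): 1/3}`. Against the ground truth `[(10, 11), (30, 30)]`, `evaluate` gives `(1, 1, 1)`; keeping only candidates with confidence at least 0.5 gives `(1, 0, 1)`, which illustrates how voting suppresses isolated detections.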
Nevertheless, in the following we describe the selection procedure as an exhaustive search, since it is better to evaluate all configurations if enough resources are available, and several other choices of search algorithm are possible. As an energy function, we used the competition performance metric CPM [13], which is defined as the average sensitivity at seven predefined false positive per image rates (1/8, 1/4, 1/2, 1, 2, 4, 8) [13]. The process of ensemble creation is also shown in Figure 2.

Fig. 2. Flow chart of the ensemble-based framework.

The ensemble creation part results in a set of ⟨preprocessing method, candidate extractor⟩ pairs. This ensemble $E_{best}$ can then be used to detect MAs on unknown images. The final ensemble is applied in real detection in the same way as in the training phase. Namely, the final MAs are detected by the fusion of the MA candidates of the individual pairs building up the ensemble $E_{best}$. Similarly, every detected MA receives a confidence value, defined in Algorithm 1 below as $|I_c| / |E|$. Thus, for the final decision on the presence of MAs, the output MA set needs to be thresholded according to the assigned confidence values. The choice of the threshold value is discussed in detail in section VII. The proposed ensemble creation method can be summarized in the following steps:

Algorithm 1: Selection of the optimal combination of preprocessing methods and candidate extractors.
1. $\mathcal{E} \leftarrow \mathcal{P}(PP_i \times CE_j)$, $i = 1, \ldots, M$, $j = 1, \ldots, N$
2. $CPM_{best} \leftarrow 0$
3. $E_{best} \leftarrow$ NULL
4. for all $E \in \mathcal{E}$ do
5.   $H_E \leftarrow \emptyset$
6.   for all $p \in E$ do
7.     for all MA candidates $c$ detected by $p$ do
8.       $I_c \leftarrow \{c' \mid c'$ is an MA candidate found by a $p' \in E$ with $p \neq p'$ and $d(c, c') < r\} \cup \{c\}$
9.       $confidence(c) \leftarrow |I_c| / |E|$
10.      $H_E \leftarrow H_E \cup \{centroid(I_c)\}$
11.    end for
12.  end for
13.  if $CPM(H_E) > CPM_{best}$ then
14.    $CPM_{best} \leftarrow CPM(H_E)$
15.    $E_{best} \leftarrow E$
16.  end if
17. end for
18. return $E_{best}$

V. METHODOLOGY

We have evaluated the proposed approach for both MA detection and DR grading. In this section, we present the evaluation methodology used in each case.

A. MA detection

We have evaluated the MA detection capabilities of the proposed method in the ROC competition for MA detectors [13], as well as on a publicly available database [21] and a private one. In this section, we provide a brief overview of these databases and of the methodology we used to evaluate the MA detection performance of the proposed approach.

1) Retinopathy Online Challenge (ROC) [13]: ROC is a worldwide competition dedicated to measuring the accuracy of microaneurysm detectors. The ROC database consists of 50 training and 50 test images with different resolutions (768 × 576, 1058 × 1061 and 1389 × 1383), 45° FOV and JPEG compression. The average numbers of MAs for the training and test sets are 6.72 and 6.86, respectively. There are 13 and 10 images in the training and test sets, respectively, where no MAs are marked by the experts.

2) DiaretDB1 2.1 database [21]: The DiaretDB1 2.1 database contains 28 losslessly compressed training and 61 test images with a 1500 × 1152 resolution and 50° FOV. The average numbers of MAs for the training and test sets are 4.34 and 3.91, respectively. There are 15 and 39 images in the training and test sets, respectively, where no MAs are marked by the experts.

3) Private database provided by Moorfields Eye Hospital, UK: This database consists of 60 losslessly compressed images with a resolution of 3072 × 2048 and 45° FOV. The average numbers of MAs for the training and test sets are 8.67 and 8.87, respectively. There are 10 and 8 images in the training and test sets, respectively, where no MAs are marked by the experts.
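The CPM energy function used in section IV (and in step 13 of Algorithm 1) can be sketched as follows. Candidates are given as (confidence, lesion id) pairs pooled over all images, with `None` marking a false positive, so that several hits on one lesion count only once toward sensitivity. How the sensitivity is read off at each reference rate is an assumption here (the best sensitivity whose average FP count does not exceed the rate); the official ROC evaluation [13] may interpolate differently, so values from this sketch need not match the competition's CPMs exactly.

```python
from typing import List, Optional, Sequence, Tuple

REFERENCE_FP_RATES = (1 / 8, 1 / 4, 1 / 2, 1, 2, 4, 8)


def froc_points(cands: Sequence[Tuple[float, Optional[int]]],
                n_images: int, n_lesions: int) -> List[Tuple[float, float]]:
    """(average FP per image, sensitivity) for every confidence threshold."""
    points = []
    for t in sorted({conf for conf, _ in cands}, reverse=True):
        kept = [lid for conf, lid in cands if conf >= t]
        found = {lid for lid in kept if lid is not None}   # distinct lesions hit
        n_fp = sum(1 for lid in kept if lid is None)
        points.append((n_fp / n_images, len(found) / n_lesions))
    return points


def cpm(points: Sequence[Tuple[float, float]],
        rates: Sequence[float] = REFERENCE_FP_RATES) -> float:
    """Average sensitivity at the seven reference FP/image rates."""
    best = [max((s for fp, s in points if fp <= rate), default=0.0) for rate in rates]
    return sum(best) / len(best)
```

For example, the candidates `[(0.9, 0), (0.8, None), (0.7, 1), (0.6, None), (0.5, None)]` over 2 images with 4 lesions give the FROC points (0.0, 0.25), (0.5, 0.25), (0.5, 0.5), (1.0, 0.5), (1.5, 0.5) and a CPM of 3/7 ≈ 0.429.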
4) Testing: For each database, we provide the Free-response Receiver Operating Characteristic (FROC) curves [22], which plot the sensitivity against the average number of false positives per image. To measure the sensitivity at different average false positive per image levels, we thresholded the output set of the MA detector based on the confidence values assigned to each candidate. For the ROC dataset, we also provide the current ranking of the competition along with the CPM values (see section IV for details) that serve as the basis for the ranking. In addition, we calculated a partial AUC of the algorithms in the same range (between 1/8 and 8) by normalizing the average number of false positives per image through division by the maximum (8) and applying trapezoidal integration. The empirical AUC calculated this way is likely to underestimate the true AUC. Moreover, the uncertainty of the partial AUCs may be quite high due to the low number of images.

B. DR grading

We have also evaluated our ensemble-based approach to assess its grading performance in recognizing DR. For this aim, we determined the image-level classification rate of the ensemble on the Messidor¹ dataset containing 1200 images. That is, the presence of any MA means that the image contains signs of DR, while the absence of MAs indicates a healthy case. In other words, a pure yes/no decision of the system has been tested.

1) Ensemble creation: As no training set is provided for the Messidor database, we used an independent dataset (the ROC dataset) to train our algorithm. Note that this is quite a strong handicap in comparison with the usual approach of training on a part of the same database. However, we feel that in this way we can get much closer to measuring the true performance of our system under real circumstances.

2) Testing: We used the publicly available Messidor database for testing.
This database consists of 1200 losslessly compressed images with 45° FOV and different resolutions (1440 × 960, 2240 × 1488 and 2304 × 1536). For each image, a grading score ranging from R0 to R3 is provided. These grades correspond to the following clinical conditions: a patient with an R0 grade has no DR; R1 and R2 are mild and severe cases of non-proliferative retinopathy, respectively; finally, R3 is the most serious condition (proliferative retinopathy). The grading is based on the appearance of MAs, haemorrhages and neovascularization. The proportions of the images in the Messidor dataset are: 540 R0 (45%), 153 R1 (12.75%), 247 R2 (20.58%) and 260 R3 (21.67%).

¹ Kindly provided by the Messidor program partners (see http://messidor.crihan.fr).

TABLE III
⟨PREPROCESSING METHOD, CANDIDATE EXTRACTOR⟩ PAIRS SELECTED AS MEMBERS OF THE ENSEMBLE FOR THE THREE DATASETS. R, D AND M DENOTE WHETHER THE PAIR IS SELECTED FOR THE ROC, DIARETDB1 2.1 OR MOORFIELDS DATASET, RESPECTIVELY.

Candidate extractors (columns): Walter, Spencer, Hough, Lazar, Zhang
Walter-Klein: M R
CLAHE: R, D M R D
Vessel removal: D R, D, M R, D
Illumination eq.: R, M
No preprocessing: R M R, D

In our evaluation, we classified the retinal images according to whether they contain signs of DR (R1, R2, R3) or not (R0). The MA detector classifies an image as diseased if at least one MA was detected, and as healthy otherwise. We measured the sensitivity, specificity and accuracy of the detector at different levels by thresholding the confidence values assigned to the MA candidates as described in section IV, using the following formulas:

$$
\text{sensitivity} = \frac{tp}{tp + fn}, \qquad \text{specificity} = \frac{tn}{tn + fp}, \qquad \text{accuracy} = \frac{tp + tn}{tp + fn + tn + fp},
$$

where $tp$, $fn$, $tn$ and $fp$ are the numbers of true positive, false negative, true negative and false positive images, respectively. We also measured the percentage of correctly recognized cases for each grade.
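The image-level decision rule and the three formulas above can be sketched as follows; the per-grade rate is the fraction of images of a grade that are classified correctly, as in the R0 to R3 rows of Table V. The input format (a list of MA confidences plus a grade string per image) is a hypothetical choice for illustration, and both classes are assumed to be present so that no denominator is zero.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical input format: (confidences of the MAs found in the image, grade "R0".."R3").
Image = Tuple[Sequence[float], str]


def is_dr(confidences: Sequence[float], threshold: float) -> bool:
    """An image is graded 'DR' if at least one MA reaches the threshold."""
    return any(c >= threshold for c in confidences)


def grading_metrics(images: List[Image], threshold: float):
    """Return (sensitivity, specificity, accuracy, per-grade correct rate)."""
    tp = fn = tn = fp = 0
    hits: Dict[str, int] = {}
    totals: Dict[str, int] = {}
    for confidences, grade in images:
        predicted, diseased = is_dr(confidences, threshold), grade != "R0"
        if diseased and predicted:
            tp += 1
        elif diseased:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
        totals[grade] = totals.get(grade, 0) + 1
        hits[grade] = hits.get(grade, 0) + int(predicted == diseased)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    per_grade = {g: hits[g] / totals[g] for g in totals}
    return sensitivity, specificity, accuracy, per_grade
```

For six hypothetical images graded R3, R1, R2, R0, R0, R0 with maximal MA confidences 0.95, 0.85, none, 0.92, 0.5 and none, the threshold 0.9 gives a sensitivity of 1/3, a specificity of 2/3 and an accuracy of 0.5; lowering it to 0.8 raises the sensitivity to 2/3, mirroring the trade-off that the threshold sweep in Table V explores.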
We provide a fitted Receiver Operating Characteristic (ROC) curve along with the empirical and fitted AUC for the proposed method on the Messidor database. For curve fitting, we used JROCFIT [23].

VI. RESULTS

In this section, we present our experimental results for both MA detection and DR grading.

A. MA detection

In Table III, we list the ⟨preprocessing method, candidate extractor⟩ pairs included in the selected ensembles for the three datasets. The rows of the table show the preprocessing methods from section II, while the columns label the candidate extractor algorithms listed in section III. Table IV contains the ranked quantitative results of the participants in the ROC competition, with the proposed ensemble (DRSCREEN) highlighted as the current leader.

TABLE IV
QUANTITATIVE RESULTS OF THE ROC COMPETITION. FOR EACH PARTICIPATING TEAM, THE COMPETITION PERFORMANCE METRIC AND THE PARTIAL AUC ARE PRESENTED.

Team | CPM | AUC
DRSCREEN | 0.434 | 0.551
Niemeijer et al. | 0.395 | 0.469
LaTIM | 0.381 | 0.489
ISMV | 0.375 | 0.435
OKmedical II | 0.369 | 0.465
OKmedical | 0.357 | 0.430
Lazar et al. | 0.355 | 0.449
GIB | 0.322 | 0.399
Fujita | 0.310 | 0.378
IRIA | 0.264 | 0.368
Waikato | 0.206 | 0.273

The performance of the ensemble is also shown in Figure 3 in terms of an FROC curve. As we can see from Table IV, the proposed ensemble achieved both a higher CPM score and a higher partial AUC than all other participants, including individual algorithms. The FROC curves of the ensemble for the DiaretDB1 2.1 and the Moorfields databases are shown in Figures 4 and 5, respectively. To the best of our knowledge, no corresponding quantitative results have been published for these databases yet. Thus, we disclose the results of the ensemble-based method only.

Fig. 3. FROC curve of the ensemble on the ROC dataset (sensitivity against the average number of FPs per image).

Fig. 4. FROC curve of the ensemble on the DiaretDB1 2.1 dataset (sensitivity against the average number of FPs per image).

B. DR grading

In Table V, we provide the sensitivity, specificity and accuracy of our detector corresponding to different threshold values. The fitted ROC curve of the detector can be seen in Figure 6. The empirical area under the curve (AUC) is 0.875, while the AUC of the fitted curve is 0.90 ± 0.01. Table V also contains the percentage of correctly recognized cases for each class.

Fig. 5. FROC curve of the ensemble on the Moorfields dataset (sensitivity against the average number of FPs per image).

TABLE V
RESULTS ON THE MESSIDOR DATASET. FOR EACH THRESHOLD, SENSITIVITY, SPECIFICITY, ACCURACY AND THE PERCENTAGE OF CORRECTLY RECOGNIZED CASES FOR EACH GRADE ARE PRESENTED.

Threshold | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 | 1.0
Sensitivity | 1 | 1 | 1 | 0.99 | 0.96 | 0.76 | 0.31
Specificity | 0 | 0.01 | 0.03 | 0.14 | 0.51 | 0.88 | 0.98
Accuracy | 0.53 | 0.54 | 0.55 | 0.59 | 0.75 | 0.82 | 0.62
R0 | 0.00 | 0.01 | 0.03 | 0.14 | 0.51 | 0.88 | 0.98
R1 | 1.00 | 1.00 | 1.00 | 0.97 | 0.92 | 0.60 | 0.18
R2 | 1.00 | 1.00 | 1.00 | 1.00 | 0.96 | 0.72 | 0.29
R3 | 1.00 | 1.00 | 1.00 | 1.00 | 0.98 | 0.92 | 0.42

VII. DISCUSSION

A strong point of the proposed method is that it performs well under difficult circumstances. Figure 7 shows an example image where the application of CLAHE made it easier to distinguish the MAs from their background.
However, the vessel removal and inpainting preprocessing method caused a true MA to be missed, while the detection of the remaining MA is easier in the absence of thin retinal vessels. Thus, using different preprocessing methods with candidate extractors creates diversity among the members of the ensemble, which is desirable for systems using multiple estimators [24]. This diversity helps suppress false detections, since diverse detectors tend to make different mistakes. Consequently, false detections are likely to receive lower confidence values in the voting procedure.

Fig. 6. ROC curve of the ensemble on the Messidor dataset (sensitivity against 1 − specificity).

Fig. 7. The effect of different preprocessing methods where MAs are hard to detect: (a) original, (b) CLAHE, (c) vessel removal.

Our experimental results show that the proposed ensemble-based MA detector outperforms the current individual approaches in MA detection. They also indicate that the framework is highly flexible across different datasets. As can be seen in Table III, the ensemble members may vary, which suggests a relatively high variance among databases in this field. Despite this variability, the performance of the ensemble remained stable. In [13], the authors measured the average false positive rate of a human expert on the ROC dataset against the consensus of three human experts. This level is approximately 1 FP per image [13] for the ROC database, and at this level our ensemble achieved the best score in the competition. Thus, we recommend using this level for thresholding in the ensemble creation phase and when detecting MAs on unknown images.

As for DR grading, our ensemble also performed well. It is also important to see how the different classes (R0, R1, R2, R3) are recognized at different levels.
As could be expected, the severity of DR affects the performance of our detector: at each threshold level where the sensitivity is less than 1.0, more severe cases are recognized with higher probability. The selection of the appropriate threshold is also an important issue for our detector to provide sufficient sensitivity and specificity. In [4], the authors suggest that sensitivity is more important for a screening system than specificity. In contrast, the British Diabetic Association (BDA) recommends 80% sensitivity and 95% specificity for DR screening [25]. In Table V, we can see that the most accurate result is achieved with the threshold value 0.9. Following the first idea, we might consider the results corresponding to the threshold value 0.8 as the best in our experiment, where 96% sensitivity and 51% specificity are achieved. That is, we recognized almost all of the cases where DR is present, and half of the healthy ones. The performance closest to the second recommendation is achieved at the 0.9 level: 76% sensitivity and 88% specificity.

It is difficult to compare our method to other screening systems. First of all, to the best of our knowledge, no other results have been reported for the complete Messidor database. Other screening systems are tested on private images. Unfortunately, the proportion of non-DR/DR cases varies across these experiments. Abramoff et al. [4] reported an AUC of 0.86 on a population where 4.96% of the cases had at least minimal signs of DR. In the databases on which Agurto et al. [26] tested, 74.43% and 76.26% of the cases contained signs of DR, and they achieved AUCs of 0.81 and 0.89, respectively. The closest to matching the requirements of the BDA is the system of Jelinek et al. [3], with 85% sensitivity and 90% specificity, where approximately 30% of patients had DR. A similar proportion (35.88%) of patients having DR is reported by Fleming et al. [2] for their automatic screening system.
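The threshold choices discussed above can be checked directly against the operating points of Table V. The sketch below encodes the table as transcribed, computes the trapezoidal area under the (1 − specificity, sensitivity) points, and applies three selection rules: highest accuracy, closest to the BDA target of 80% sensitivity and 95% specificity, and highest specificity subject to a minimum sensitivity. The Euclidean distance in (sensitivity, specificity) space and the 0.95 sensitivity floor are illustrative assumptions, not criteria stated in [4] or [25].

```python
import math
from typing import Dict, Tuple

# Operating points transcribed from Table V:
# threshold -> (sensitivity, specificity, accuracy)
TABLE_V: Dict[float, Tuple[float, float, float]] = {
    0.4: (1.00, 0.00, 0.53),
    0.5: (1.00, 0.01, 0.54),
    0.6: (1.00, 0.03, 0.55),
    0.7: (0.99, 0.14, 0.59),
    0.8: (0.96, 0.51, 0.75),
    0.9: (0.76, 0.88, 0.82),
    1.0: (0.31, 0.98, 0.62),
}


def empirical_auc(table: Dict[float, Tuple[float, float, float]]) -> float:
    """Trapezoidal area under the (1 - specificity, sensitivity) points,
    closed with the (0, 0) and (1, 1) corners."""
    pts = sorted((1.0 - spec, sens) for sens, spec, _ in table.values())
    pts = [(0.0, 0.0)] + pts + [(1.0, 1.0)]
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


def most_accurate(table) -> float:
    return max(table, key=lambda t: table[t][2])


def closest_to_target(table, target=(0.80, 0.95)) -> float:
    # Euclidean distance to the BDA recommendation in (sensitivity, specificity) space.
    return min(table, key=lambda t: math.dist(table[t][:2], target))


def best_specificity_with_min_sensitivity(table, min_sens=0.95) -> float:
    # Sensitivity-first rule in the spirit of [4]; the 0.95 floor is an assumption.
    eligible = [t for t in table if table[t][0] >= min_sens]
    return max(eligible, key=lambda t: table[t][1])
```

On these values, `most_accurate` and `closest_to_target` both return 0.9, `best_specificity_with_min_sensitivity` returns 0.8, and `empirical_auc(TABLE_V)` evaluates to 0.875 up to floating-point rounding, the same value as the empirical AUC reported in section VI-B.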
Despite the promising results, our system still misclassifies some images in which a serious case of DR is present. To improve grading performance, we must take into account the presence or absence of more DR-specific lesions (e.g., exudates), which are essential in a clinical setting. However, our MA detector can serve as a main component of such a system.

VIII. CONCLUSION

In this paper, we have proposed an ensemble-based microaneurysm detector which has proven its high efficiency by taking first position in an open online challenge. Our novel framework relies on a set of ⟨preprocessing method, candidate extractor⟩ pairs, from which a search algorithm selects an optimal combination. Since our approach is modular, we can expect further improvements by adding more preprocessing methods and candidate extractors. We have also evaluated the grading performance of this detector on the 1200 images of the Messidor database. We have achieved an AUC of 0.90 ± 0.01, which is competitive with the previously reported results on other databases. The grading results presented in this paper are already promising. However, a proper screening system should contain other components, which are expected to further increase the performance of this approach.

ACKNOWLEDGEMENT

This work was supported in part by the János Bolyai grant of the Hungarian Academy of Sciences, and by the TECH08-2 project DRSCREEN - Developing a computer based image processing system for diabetic retinopathy screening of the National Office for Research and Technology of Hungary (contract no.: OM-00194/2008, OM-00195/2008, OM-00196/2008).

REFERENCES

[1] M. Abramoff, M. Niemeijer, M. Suttorp-Schulten, M. A. Viergever, S. R. Russel, and B. van Ginneken, "Evaluation of a system for automatic detection of diabetic retinopathy from color fundus photographs in a large population of patients with diabetes," Diabetes Care, vol. 31, pp. 193–198, 2008.
[2] A. D. Fleming, K. A. Goatman, S. Philip, G. J. Prescott, P. F. Sharp, and J. A. Olson, “Automated grading for diabetic retinopathy: a large-scale audit using arbitration by clinical experts,” British Journal of Ophthalmology, vol. 94, no. 12, pp. 1606–1610, 2010.
[3] H. J. Jelinek, M. J. Cree, D. Worsley, A. Luckie, and P. Nixon, “An automated microaneurysm detector as a tool for identification of diabetic retinopathy in rural optometric practice,” Clin. Exp. Optom., vol. 89, no. 5, pp. 299–305, 2006.
[4] M. Abramoff, J. Reinhardt, S. Russell, J. Folk, V. Mahajan, M. Niemeijer, and G. Quellec, “Automated early detection of diabetic retinopathy,” Ophthalmology, vol. 117, no. 6, pp. 1147–1154, 2010.
[5] M. Niemeijer, M. Loog, M. D. Abramoff, M. A. Viergever, M. Prokop, and B. van Ginneken, “On combining computer-aided detection systems,” IEEE Transactions on Medical Imaging, vol. 30, pp. 215–223, 2011.
[6] B. Antal, I. Lazar, A. Hajdu, Z. Torok, A. Csutak, and T. Peto, “A multi-level ensemble-based system for detecting microaneurysms in fundus images,” in 4th IEEE International Workshop on Soft Computing Applications (SOFA 2010), 2010, pp. 137–142.
[7] B. Antal and A. Hajdu, “Improving microaneurysm detection using an optimally selected subset of candidate extractors and preprocessing methods,” Pattern Recognition, vol. 45, no. 1, pp. 264–270, 2012.
[8] A. A. A. Youssif, A. Z. Ghalwash, and A. S. Ghoneim, “Comparative study of contrast enhancement and illumination equalization methods for retinal vasculature segmentation,” Proc. Cairo International Biomedical Engineering Conference, 2006.
[9] T. Walter and J. Klein, “Automatic detection of microaneurysms in color fundus images of the human retina by means of the bounding box closing,” Lecture Notes in Computer Science, vol. 2526, pp. 210–220, 2002.
[10] K. Zuiderveld, “Contrast limited adaptive histogram equalization,” Graphics Gems, vol. IV, pp. 474–485, 1994.
[11] S. Ravishankar, A. Jain, and A. Mittal, “Automated feature extraction for early detection of diabetic retinopathy in fundus images,” in Computer Vision and Pattern Recognition, 2009, pp. 210–217.
[12] A. Criminisi, P. Perez, and K. Toyama, “Object removal by exemplar-based inpainting,” in Computer Vision and Pattern Recognition, vol. 2, 2003, pp. II-721–II-728.
[13] M. Niemeijer, B. van Ginneken, M. Cree, A. Mizutani, G. Quellec, C. Sanchez, B. Zhang, R. Hornero, M. Lamard, C. Muramatsu, X. Wu, G. Cazuguel, J. You, A. Mayo, Q. Li, Y. Hatanaka, B. Cochener, C. Roux, F. Karray, M. Garcia, H. Fujita, and M. Abramoff, “Retinopathy online challenge: Automatic detection of microaneurysms in digital color fundus photographs,” IEEE Transactions on Medical Imaging, vol. 29, no. 1, pp. 185–195, 2010.
[14] T. Walter, P. Massin, A. Arginay, R. Ordonez, C. Jeulin, and J. C. Klein, “Automatic detection of microaneurysms in color fundus images,” Medical Image Analysis, vol. 11, pp. 555–566, 2007.
[15] T. Spencer, J. A. Olson, K. C. McHardy, P. F. Sharp, and J. V. Forrester, “An image-processing strategy for the segmentation and quantification of microaneurysms in fluorescein angiograms of the ocular fundus,” Computers and Biomedical Research, vol. 29, pp. 284–302, 1996.
[16] A. D. Fleming, S. Philip, and K. A. Goatman, “Automated microaneurysm detection using local contrast normalization and local vessel detection,” IEEE Transactions on Medical Imaging, vol. 25, no. 9, pp. 1223–1232, 2006.
[17] S. Abdelazeem, “Microaneurysm detection using vessels removal and circular Hough transform,” Proceedings of the Nineteenth National Radio Science Conference, pp. 421–426, 2002.
[18] B. Zhang, X. Wu, J. You, Q. Li, and F. Karray, “Detection of microaneurysms using multi-scale correlation coefficients,” Pattern Recognition, vol. 43, no. 6, pp. 2237–2248, 2010.
[19] I. Lazar and A. Hajdu, “Microaneurysm detection in retinal images using a rotating cross-section based model,” in 2011 IEEE International Symposium on Biomedical Imaging, 2011, pp. 1405–1409.
[20] S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi, “Optimization by simulated annealing,” Science, vol. 220, pp. 671–680, 1983.
[21] T. Kauppi, V. Kalesnykiene, J.-K. Kämäräinen, L. Lensu, I. Sorri, A. Raninen, R. Voutilainen, H. Uusitalo, H. Kälviäinen, and J. Pietilä, “DIARETDB1 diabetic retinopathy database and evaluation protocol,” Proc. of the 11th Conf. on Medical Image Understanding and Analysis (MIUA2007), pp. 61–65, 2007.
[22] D. Chakraborty, “Clinical relevance of the ROC and free-response paradigms for comparing imaging system efficacies,” Radiation Protection Dosimetry, vol. 139, no. 1-3, pp. 37–41, 2010.
[23] J. Eng. ROC analysis: web-based calculator for ROC curves. Johns Hopkins University, Baltimore. [Online]. Available: http://www.jrocfit.org
[24] L. I. Kuncheva, Combining Pattern Classifiers: Methods and Algorithms. Wiley, 2004.
[25] British Diabetic Association, “Retinal photography screening for diabetic eye disease,” Tech. Rep., 1997.
[26] C. Agurto, E. S. Barriga, V. Murray, S. Nemeth, R. Crammer, W. Bauman, G. Zamora, M. S. Pattichis, and P. Soliz, “Automatic detection of diabetic retinopathy and age-related macular degeneration in digital fundus images,” Investigative Ophthalmology & Visual Science, vol. 52, no. 8, pp. 5862–5871, 2011.
