Unsupervised edge map scoring: a statistical complexity approach

Javier Gimenez (a), Jorge Martinez (b), Ana Georgina Flesia (a)

(a) Facultad de Matemática, Astronomía y Física, Universidad Nacional de Córdoba. Ing. Medina Allende s/n, Ciudad Universitaria CP 5000, Córdoba, Argentina.
(b) Departamento de Matemática, Universidad Nacional del Sur, Avenida Alem 1253, 2do Piso, Bahía Blanca, B8000CPB, Argentina.

Abstract

We propose a new Statistical Complexity Measure (SCM) to qualify edge maps without Ground Truth (GT) knowledge. The measure is the product of two indices, an Equilibrium index $E$ obtained by projecting the edge map into a family of edge patterns, and an Entropy index $H$, defined as a function of the Kolmogorov-Smirnov (KS) statistic. This new measure can be used for performance characterization which includes: (i) the specific evaluation of an algorithm (intra-technique process) in order to identify its best parameters, and (ii) the comparison of different algorithms (inter-technique process) in order to classify them according to their quality. Results made over images of the South Florida and Berkeley databases show that our approach significantly improves over Pratt's Figure of Merit (PFoM), which is the objective reference-based edge map evaluation standard, as it takes into account more features in its evaluation.

Keywords: unsupervised quality measure, edge maps, statistical complexity, edge patterns, entropy, Kolmogorov-Smirnov statistic

1. Introduction

In most image processing techniques, the detection and handling of the edge structure of the input image is very important. From object detection to image transmission, the quality of edge manipulation takes great part in the success of the processing. Nevertheless, there is no universal definition of the notion of edge.
For Abdou and Pratt, an edge is defined as a local change in luminance or a discontinuity in the luminance intensity of the image [1], while Kitchen and Rosenfeld pointed out that the edge concept depends on the type of processing and analysis in which it is involved [2]. Therefore, many researchers have designed optimal Edge Detection Algorithms (EDA) related to different properties of the edge structure, but only a few have studied how to measure the edge strength and quality of general edge maps [3]. Effective and objective Edge Detection (ED) evaluation measures must be developed in order to assess EDA performance.

Email addresses: jgimenez@mate.uncor.edu (Javier Gimenez), martinez@uns.edu.ar (Jorge Martinez), flesia@famaf.unc.edu.ar (Ana Georgina Flesia)
Preprint submitted to Computer Vision and Image Understanding, February 11, 2014

In general, ED evaluation measures can be classified according to the need of a reference map called Ground Truth (GT) (supervised or unsupervised measures) and the type of score that they output, quantitative or qualitative. Some well-known examples of quantitative supervised measures, also called discrepancy measures, are Pratt's Figure of Merit (PFoM) [4], the Kappa index [5], and Baddeley's Delta Metric (BDM) [6]. A comparison between these discrepancy measures and some other supervised statistical measures was performed in [7]. Two main conclusions were drawn from those experiments: (i) to date, there is no convincing solution for edge image comparison or quality evaluation; (ii) the biases of the measures can be helpful in applications where there is a particular interest in penalizing or ignoring some specific kind of error. A supervised quality metric for binary documents based on structural pixel matching, which takes into account global edge structures by introducing a smoothness term in the matching function, was proposed in [8].
Examples of binary documents are text files either photocopied, faxed or scanned, with fast publishing resolution. In this kind of binary document, bad visual word understanding is not always related to classical low scoring. PFoM is known to give high scores to lighter maps, with a high rate of false negatives, but it does not agree with human perception [9].

Without the guide of a GT, assessing edge map quality is a more difficult task. The unsupervised ED measures found in the literature look for specific characteristics of the input edge map, such as coherence [10], continuity [2], smoothness and good continuation [11, 12], or a specific pattern identification [13], among others [14, 15]. Bower et al. studied the bias introduced by the search of only one characteristic [16]. They reported a conclusion similar to the one given in [7]: there is no unique solution; moreover, the selected best maps are qualitatively different, and the bias cannot be estimated without further assumptions about the error incurred.

Recently, Yitzhaky and Peli proposed an unsupervised evaluation procedure of ED techniques based on the consensus approach [17]. Using the correspondence between different standard EDA results, an estimated best edge map (consensus map) was obtained and later used as an estimated ground truth (EGT). Correspondence was computed by using both a receiver operating characteristics (ROC) analysis and a Chi-square test for standard binary outputs, considering a trade-off between structure and noisiness in the detection results. Fernandez-Garcia et al. provided a definition of consensus edge map that is close to the notion of confidence set.
They argued that in order to compare ED procedures, it is not essential to use the best and exact GT; rather, it is only necessary to use a reliable EGT that allows a correct classification or ranking of the EDA to be obtained [18]. They also noted that their approach may be used to evaluate detections from different EDA (inter-technique performance characterization) only if these detectors aim at the same output format. Our proposed evaluation methods also take this assumption into account.

The consensus approach suffers from bias regarding the generation of the candidate edge maps used to define the EGT. If the majority of the edge maps considered are not of adequate quality, or fail to extract certain edge structures which are detected by only a small selection of the edge maps, this will be reflected both in the consensus EGT and in the quality of the evaluation methodology derived from it. In a sense, it penalizes algorithms that do not agree with the failures of the other algorithms.

In this paper we define a new non-reference measure that does not depend, directly or indirectly, on GT data. Like the previous measures, it can be used for ED performance characterization which includes: (i) the specific evaluation of an algorithm (intra-technique process) in order to identify its best parameters, and (ii) the comparison of different algorithms with the same output format (inter-technique process) in order to classify them according to their quality.

Our proposal, denoted Statistical Complexity Measure (SCM), searches for a compromise between two extreme values in the space of edge maps: a map with few edge points in a perfect shape (Equilibrium) and a map with many edge points randomly located (Information).
The new measure is the product of two indices, an Equilibrium index $E$ obtained by combining local correlation between the edge map and a family of predetermined edge patterns, and an Entropy index $H$, defined as a function of the Kolmogorov-Smirnov (KS) statistic. SCM gives values between zero and one, zero being the minimum and one the maximum quality. Konishi et al. defined a statistical ED algorithm which relies on Chernoff information and entropy of probability distributions conditional on edge and non-edge states [15]. Their validation experiments studied elements similar to the indices that are part of our Complexity Measure. They also noted that maps with scattered random points may give high information regardless of the real structure of the image, but the combination of shape-seeking measures with entropy functionals greatly reduces the probability of such anomalies.

The paper is organized as follows. In Section 2, a cosine-based discrepancy measure $Q_B$ to score a map against a collection of hand-made GT (supervised case), or against a collection of fixed significant patterns (unsupervised case), is introduced. In Section 3 the concepts behind the Equilibrium index $E$ and the Entropy index $H$ are introduced, and the final SCM $C$ is defined as the product of both indices. Experiments and results are discussed in Section 4, and conclusions and comments are left for Section 5.

2. Cosine-based Similarity Measure (CSM)

In this section, a cosine-based similarity measure $Q_B$ is introduced as an intermediate step in the definition of the final measure $C$, along with the necessary notation.

Figure 1: Evaluation of CSM. (a) Original Large Building image from the South Florida database, (b) GT, (c) Sobel ED output (thresholding parameter $T = 0.068$) with $Q_B = 0.3976$.

Let $I$ be an
image, $b$ an edge map associated with $I$ (that is, $b$ is a binary image with the same size as $I$), and $g$ its GT (if it is available). Figure 1 shows an example of such images. A simple measure of similarity between $b$ and $g$ is the cosine of the angle between them [19], when they are seen as 1-D vectors (concatenating all columns one under another). We define the index $Q_B$ as the maximum of all similarity values

$$Q_B(b) = \max_{1 \le i \le n} Q(g_i, b) = \max_{1 \le i \le n} \frac{g_i^T b}{\|g_i\| \, \|b\|}, \tag{1}$$

where $\|x\| = \sqrt{x^T x}$ and $B = \{g_1, g_2, \ldots, g_n\}$ is a collection of GT images (that could have only one element). The Cauchy-Schwarz inequality implies that the index is upper bounded by one, a bound attained only when the map is optimal ($b \in B$). Since edge maps and GT images are binary images, the index is lower bounded by 0, attained only in the absence of any similarity (when $b$ is orthogonal to $B$). When no GT is available, predefined edge patterns may be locally sought in the edge map with this measure.

3. A statistical complexity measure (SCM)

In this section, a new SCM in the context of unsupervised evaluation of edge maps is introduced. This new measure can be used to identify the optimal parameters of a given algorithm, but also to compare and rank the results of different algorithms. In this framework, the concepts of Equilibrium and Information can be discussed and scoring indices for such qualities in edge maps can be proposed. Following the general structure of complexity measures described by [20], SCM is defined as

$$C(b) = E(b) \, H(b), \tag{2}$$

where $E$ is an Equilibrium index, $H$ is an Entropy index, and $b$ is an edge map. To define such indices, the concepts of Equilibrium and Information in the context of ED must be discussed.
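For binary maps, the inner product in Eq. (1) is simply the number of edge pixels shared by the two maps, and each squared norm is the number of edge pixels in that map. The following minimal sketch illustrates this, assuming that maps are stored as equally sized nested lists of 0/1 values. The names `cosine_similarity` and `q_b` are illustrative and do not come from the paper, and returning 0 for an empty map is a convention chosen here.

```python
import math


def cosine_similarity(g, b):
    """Cosine of the angle between two binary maps seen as flat vectors.

    The flattening order (rows or columns) does not change the result
    as long as both maps are flattened in the same way.
    """
    flat_g = [v for row in g for v in row]
    flat_b = [v for row in b for v in row]
    if len(flat_g) != len(flat_b):
        raise ValueError("maps must have the same size")
    shared = sum(1 for u, v in zip(flat_g, flat_b) if u and v)  # g^T b
    norm_g = math.sqrt(sum(1 for u in flat_g if u))             # ||g||
    norm_b = math.sqrt(sum(1 for v in flat_b if v))             # ||b||
    if norm_g == 0 or norm_b == 0:
        return 0.0  # assumption: an empty map is similar to nothing
    return shared / (norm_g * norm_b)


def q_b(b, references):
    """Eq. (1): best cosine similarity of b against a collection of maps."""
    return max(cosine_similarity(g, b) for g in references)


# Two hypothetical ground truths and one edge map on a 3 x 3 grid.
gt_a = [[1, 1, 0],
        [0, 0, 0],
        [0, 0, 0]]
gt_b = [[0, 0, 0],
        [0, 0, 0],
        [1, 1, 1]]
edge_map = [[1, 1, 0],
            [0, 0, 0],
            [0, 0, 1]]

score = q_b(edge_map, [gt_a, gt_b])  # gt_a is the closer reference here
```

Taking the maximum over the collection means that a map only needs to agree with one of the references to score well, which is how the index accommodates several GT images at once.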
An edge map is well balanced (has reached Equilibrium) if it is structurally simple. In this sense, the map in panel (d) of Figure 2 is better balanced than the edge map in panel (c) and, in turn, the one in (c) is better balanced than the one in (b). Regarding Entropy, one map has more information than another if the discontinuities, textures and shapes of the analyzed image are better characterized. An overabundance of information produces chaotic (cluttered) edge maps like the one in (b), but the absence of information produces poor edge maps like the one in (d). Thus, Equilibrium and Information are two complementary concepts, and the Complexity searches for a balance point between them.

Figure 2: (a) Original 215 image from the South Florida database; (b), (c) and (d) Canny ED outputs with high threshold $T_h = 0.01, 0.19, 0.99$, respectively, lower threshold $T_l = 0.4 \times T_h$ and standard deviation $\sigma = \sqrt{2}$.

To quantify the Equilibrium of an edge map, we measure local correlation against a family of specific edge patterns, assessing the correct identification and value of the usual local characteristics of edges. Since Entropy should measure the amount of information of a system, which is maximized when the system reaches a random state, we assess the randomness of an edge map with an index based on contrasting the statistical distribution of the spatial edge positions against the bidimensional uniform distribution.

3.1. Equilibrium index

Abdou and Pratt introduced in their seminal paper the notion of Figure of Merit in order to score fragmented, offset and smeared edge patterns in comparison with the ideal edges present in the GT [1]. The Equilibrium index should perform a similar task in the unsupervised case; thus, the GT will be replaced by a family $B$ of carefully chosen binary edge patterns.
Abusing notation, let $B = \{b_1, b_2, \ldots, b_n\}$ be a collection of $N \times N$ edge patterns transformed into column vectors. Sliding an $N \times N$ window over the edge map $b$, centered at each edge pixel position $k$, edge sub-maps $b(k)$ are extracted and transformed into column vectors. The CSM of each sub-map with respect to the family of edge patterns $B$ is computed by

$$Q_B\big(b(k)\big) = \max_{1 \le j \le n} \frac{b_j^T \, b(k)}{\|b_j\| \, \|b(k)\|}. \tag{3}$$

The Equilibrium of $b$ with respect to the family of edge patterns $B$ is defined as the average of the local CSM computed only on edge pixels $k$,

$$E(b) = \frac{1}{|E_b|} \sum_{k=1}^{|E_b|} Q_B\big(b(k)\big), \tag{4}$$

where $E_b$ is the set of all edge pixels in the binary edge map $b$, and $|E_b|$ is the cardinality of that set.

3.1.1. A family of edge patterns

The family $B$ of edge patterns could be very general, but in this paper, as in [2], only line-like edge patterns are considered (Figure 3). Since the line segment is an essential graphic primitive, it can be used to construct many other objects. Our line patterns are made with an accurate and efficient raster line-generating algorithm defined in [21]. Bresenham stated that his line algorithms provide the best-fit approximations to the true lines by minimizing the error (distance) to the true primitive. Beginning with ray traces that go through the origin, 140 edge patterns of size $7 \times 7$ were constructed and stored in the present database (Figure 3).

Figure 3: A family of linear $7 \times 7$ pixel edge patterns obtained with Bresenham's algorithm.

In Figure 4, the values of $Q_B$ on different patterns that appear in a Sobel edge map (computed from image block) are shown. The edge patterns (c), (f), (g) and (h) show the performance of the index when the edges are close to line segments. The maps (h)-(k) show the behavior of Eq. (4) in the presence of thick edges.
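Equations (3) and (4) can be sketched in a few lines once windows and patterns are represented as sets of pixel offsets, because the cosine between two binary vectors is the size of their intersection divided by the geometric mean of their sizes. The sketch below is only illustrative. It does not reproduce the 140 patterns of the paper; `line_patterns` builds a much smaller hypothetical family of Bresenham lines joining opposite border pixels of the window. Zero padding at the image border and all function names are assumptions made here.

```python
import math


def bresenham(r0, c0, r1, c1):
    """Integer Bresenham line between two pixels, endpoints included."""
    points = []
    dr, dc = abs(r1 - r0), abs(c1 - c0)
    step_r = 1 if r0 < r1 else -1
    step_c = 1 if c0 < c1 else -1
    err = dc - dr
    r, c = r0, c0
    while True:
        points.append((r, c))
        if (r, c) == (r1, c1):
            return points
        e2 = 2 * err
        if e2 > -dr:
            err -= dr
            c += step_c
        if e2 < dc:
            err += dc
            r += step_r


def line_patterns(size=7):
    """Hypothetical family: lines joining opposite border pixels of the window."""
    last = size - 1
    border = [(r, c) for r in range(size) for c in range(size)
              if r in (0, last) or c in (0, last)]
    family = {frozenset(bresenham(r, c, last - r, last - c)) for r, c in border}
    return list(family)


def local_csm(window, patterns):
    """Eq. (3) for binary data: windows and patterns are sets of pixel offsets."""
    return max(len(p & window) / math.sqrt(len(p) * len(window)) for p in patterns)


def equilibrium(edge_map, patterns, size=7):
    """Eq. (4): mean local CSM over edge pixels (zero padding is an assumption)."""
    half = size // 2
    rows, cols = len(edge_map), len(edge_map[0])
    scores = []
    for r in range(rows):
        for c in range(cols):
            if not edge_map[r][c]:
                continue
            # Offsets (i, j) of the edge pixels inside the window centred at (r, c).
            window = frozenset(
                (i, j)
                for i in range(size) for j in range(size)
                if 0 <= r - half + i < rows and 0 <= c - half + j < cols
                and edge_map[r - half + i][c - half + j]
            )
            scores.append(local_csm(window, patterns))
    return sum(scores) / len(scores) if scores else 0.0


patterns = line_patterns()
# A one-pixel-wide horizontal line crossing a 7 x 7 map.
thin = [[1 if r == 3 else 0 for _ in range(7)] for r in range(7)]
# A 7 x 7 window holding a two-pixel-thick horizontal band.
thick = frozenset((r, c) for r in (3, 4) for c in range(7))

e_thin = equilibrium(thin, patterns)
q_thick = local_csm(thick, patterns)
```

With this toy family, a window that contains the whole one-pixel line scores 1, windows near the border score less because padding truncates the line, and the two-pixel band scores $1/\sqrt{2} \approx 0.707$. That number coincides with one of the thick-edge values listed for Figure 4, although the text does not allow confirming that the corresponding panel is such a band.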
The maximum value is reached in (h), a pattern of a line of one pixel width. Noisy patterns (b)-(e) reach an index value lower than 0.54.

Figure 4: (a) Edge map of image block; (b)-(k) windows of size $7 \times 7$ extracted from (a), with scores $Q_B$ = 0.378 (b), 0.534 (c), 0.507 (d), 0.428 (e), 0.654 (f), 0.755 (g), 1 (h), 0.707 (i), 0.577 (j) and 0.378 (k).

3.2. Information and the Kolmogorov-Smirnov (KS) statistic

Shannon stated that the information provided by an observation is proportional to how improbable it was [22]. Relating this notion to the edge detection problem, having three points aligned in an edge map implies that the probability of having a fourth point next to them is higher than the probability of having a point further away. Therefore, observing a point in a place with low probability gives more information than observing a point in an expected place. Randomness in space positions is at the core of the notion of edge information.

Testing randomness in space is a task usually done by testing the null hypothesis of uniform distribution in the unit square with the use of the KS statistic. This statistic takes values between 0 and 1, rejecting the uniform hypothesis for values close to 1. For a given edge map $b$, let $\phi = (\phi_1, \phi_2)$ be an injective function that maps the edge positions $(i, j) \in E$ to the unit square $[0, 1] \times [0, 1]$, defined by $\phi(i, j) = \left( \frac{2i - 1}{2N}, \frac{2j - 1}{2M} \right)$, where $N \times M$ is the image size. Let $D$ be the bidimensional KS statistic defined by

$$D(b) = \max_{(x, y) \in \mathbb{R}^2} \left| F_b(x, y) - F(x, y) \right|, \tag{5}$$

where $F$ is the cumulative distribution function of a uniformly distributed bidimensional vector, and $F_b$ is the empirical distribution function of the sample $\phi(E)$ given by

$$F_b(x, y) = \frac{\left| \{ (i, j) \in E : \phi_1(i) \le x, \ \phi_2(j) \le y \} \right|}{|E|}. \tag{6}$$

The entropy measure $H$ is defined as

$$H(b) = 1 - D(b). \tag{7}$$

$D(b)$ was computed using the efficient algorithm by [23].

4. Results and Analysis

4.1. Aim of the experiments

The computational experiments shown in this section are designed to investigate both the edge discrimination power of the concepts of Information and Equilibrium implemented by $H$ and $E$, and the use of $C$ for ED performance characterization, which includes: (i) the specific evaluation of an algorithm (intra-technique process) in order to identify its best parameters, and (ii) the comparison of different algorithms (inter-technique process) in order to classify them according to their quality.

In order to do so, images from benchmark databases, compiled specifically for edge detection and object boundary detection, were selected, and for each image a database of edge maps was made by sampling the parameter space of several well-known gradient-based ED algorithms (EDA). On such a database, $C$ scoring is compared against reference-based measures, i.e. measures that take into account the GT provided by the image benchmark database. The reference-based measures considered are our $Q_B$, defined in Section 2, and the gold standard PFoM discrepancy measure given by

$$P_\alpha(g, b) = \frac{1}{\max\{|E_b|, |E_g|\}} \sum_{k \in E_b} \frac{1}{1 + \alpha \, d^2(k, E_g)},$$

where $\alpha = 1/9$, $d$ is the Euclidean distance, and $E_g$ and $E_b$ are the edge pixel subsets of maps $g$ and $b$, respectively [1, 7].

The performance of $C$ in the intra-technique evaluation process was assessed by studying the edge map selected by the maximum value of the scoring curves over the database, and the actual scoring value. The former gives visual evidence and the latter gives information about the balance between Equilibrium and Entropy (in the case of our unsupervised measure) and which of them is the closest to the GT in the case of PFoM and $Q_B$.
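Both the entropy index of Eqs. (5)-(7) and the PFoM formula can be evaluated by brute force on small maps. The sketch below is such an evaluation and is not the efficient algorithm of [23]. It relies on two facts: the uniform distribution function on the unit square is $F(x, y) = xy$, and $F_b$ is constant between consecutive pixel centres, so the largest deviation is reached at the corners of those cells. Maps are assumed to be nested lists of 0/1 values, all names are illustrative, and the handling of empty maps is a choice made here.

```python
def ks_uniform_2d(edge_map):
    """Brute-force D(b) of Eq. (5) for a binary map given as rows of 0/1."""
    n_rows, n_cols = len(edge_map), len(edge_map[0])
    n_edges = sum(sum(1 for v in row if v) for row in edge_map)
    if n_edges == 0:
        raise ValueError("the edge map has no edge pixels")
    # Breakpoints of F_b: 0, the pixel centres phi_1(i) and phi_2(j), and 1.
    xs = [0.0] + [(2 * i - 1) / (2 * n_rows) for i in range(1, n_rows + 1)] + [1.0]
    ys = [0.0] + [(2 * j - 1) / (2 * n_cols) for j in range(1, n_cols + 1)] + [1.0]
    # counts[a][b]: edge pixels among the first a rows and the first b columns.
    counts = [[0] * (n_cols + 1) for _ in range(n_rows + 1)]
    d = 0.0
    for a in range(n_rows + 1):
        for b in range(n_cols + 1):
            if a and b:
                counts[a][b] = (bool(edge_map[a - 1][b - 1]) + counts[a - 1][b]
                                + counts[a][b - 1] - counts[a - 1][b - 1])
            f_emp = counts[a][b] / n_edges
            # F_b is constant on [xs[a], xs[a+1]) x [ys[b], ys[b+1]) and F = x * y.
            d = max(d, f_emp - xs[a] * ys[b], xs[a + 1] * ys[b + 1] - f_emp)
    return d


def entropy_index(edge_map):
    """Eq. (7): H(b) = 1 - D(b)."""
    return 1.0 - ks_uniform_2d(edge_map)


def pratt_fom(gt, edge_map, alpha=1 / 9):
    """PFoM with squared Euclidean distance to the nearest GT edge pixel."""
    gt_pts = [(r, c) for r, row in enumerate(gt) for c, v in enumerate(row) if v]
    b_pts = [(r, c) for r, row in enumerate(edge_map) for c, v in enumerate(row) if v]
    if not gt_pts or not b_pts:
        return 0.0  # assumption: score 0 when either map is empty
    total = 0.0
    for r, c in b_pts:
        d2 = min((r - gr) ** 2 + (c - gc) ** 2 for gr, gc in gt_pts)
        total += 1.0 / (1.0 + alpha * d2)
    return total / max(len(gt_pts), len(b_pts))


# Entropy index on two extreme 20 x 20 maps.
full = [[1] * 20 for _ in range(20)]
corner = [[1 if (r, c) == (0, 0) else 0 for c in range(20)] for r in range(20)]
h_full = entropy_index(full)
h_corner = entropy_index(corner)

# PFoM on a single edge pixel displaced by three columns.
gt = [[0, 0, 0, 0], [1, 0, 0, 0]]
shifted = [[0, 0, 0, 0], [0, 0, 0, 1]]
p_same = pratt_fom(gt, gt)
p_shifted = pratt_fom(gt, shifted)
```

In this toy setting, a map whose edge pixels cover the grid evenly obtains an entropy index close to 1, whereas a single pixel in a corner obtains a value close to 0. With $\alpha = 1/9$, a displacement of three pixels halves the contribution of an edge pixel to PFoM.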
The scoring curves $C$, $E$ and $H$ also shed light on how each index reflects the degradations produced by excess or absence of edge pruning.

Several reference-based measures were compared in [7] by using a database of degraded images made by applying three different degradation operators to a single output of Canny EDA, with specific parameters selected to provide an overall good edge map related to the GT. The operators were addition of false positives, addition of false negatives, and diagonal displacements in a random fashion. The experiments considered in this paper do not include random modifications in the map; all edges are true edges if they are considered at an appropriate scale. Also, all other EDA considered here, besides Canny, aim at the same output format as in [17, 18]. They are all gradient-based EDA, thus they can be compared using quality curves [9].

Canny EDA was also used as a benchmark detector in [24]. The model presented as a baseline was Matlab's implementation of the Canny edge detector, with and without hysteresis. For both cases, the standard deviation $\sigma$ was the only parameter to fit, since the thresholding parameters were considered parameters of the Precision-Recall (ROC) curve defined as reference-based evaluation methodology.

We follow the method of [9] to describe the quality of an EDA that produces an edge map $b(p)$, $p$ being the parameters in a one-dimensional section $S$ of the EDA parametric space. Given the EDA, we produce an evaluation curve where each point on the curve is independently computed by first setting the parameters $p$ of the EDA to produce a binary map and then computing the evaluation measure on such map. When a single performance measure is required or is sufficient, the maximal value of the evaluation measure

$$M_S^b = \arg\max_{p \in S} M(b(p)), \tag{8}$$

is reported as a summary of the detector performance.

4.2. Edge map database

To construct the database, five gradient-based EDA were considered: Canny [25], Prewitt [26], Sobel [27], Roberts [28] and Laplacian of Gaussian (LoG) [29], all provided by the Matlab edge function from Matlab's Image Processing Toolbox. According to Matlab's help, Canny EDA has three parameters: $\sigma$, the Gaussian standard deviation of the derivative filter, and the low and high hysteresis thresholding parameters. Edge thinning by non-maxima suppression is performed prior to thresholding. The LoG EDA has two parameters: $\sigma$ (standard deviation of the Laplacian of Gaussian filter) and $T$ (thresholding parameter). Prewitt, Roberts and Sobel EDA are computed by convolving the image with their corresponding gradient operators along the x and y directions. Thresholding is later applied to the gradient magnitude. Images are not preprocessed with smoothing or denoising algorithms, since Matlab applies a private function to thin edges and clean spurious points after thresholding. For all images, a collection of 100 edge maps was generated by moving each EDA's parameters as follows:

• Canny EDA: standard deviation is set at $\sigma = \sqrt{2}$, the high hysteresis threshold parameter ($T_h$) is sampled 100 times (equally spaced) from zero to one, and the low threshold parameter ($T_l$) is set as $T_l = 0.4 \times T_h$.
• Sobel, Prewitt and Roberts EDA: the threshold parameter ($T$) is sampled 100 times from 0.004 to 0.396. Edge thinning is turned on.
• LoG EDA: standard deviation is set at $\sigma = \sqrt{2}$ and the threshold parameter ($T$) is sampled 100 times from 0.0004 to 0.0396.

The final database has 500 edge maps for each real image considered. Moving each EDA threshold from the smallest to the largest value generally produces edge maps with a varying number of featured points.
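The evaluation-curve procedure and the summary of Eq. (8) amount to a one-dimensional parameter sweep. The sketch below assumes a hypothetical `detector(p)` callable that returns a binary map and a hypothetical `measure(edge_map)` callable that returns a score (for instance $C$, or PFoM and $Q_B$ with a fixed GT). Neither stands for Matlab's edge function, and the toy detector and toy measure exist only to make the example runnable.

```python
def evaluation_curve(detector, measure, parameters):
    """Score the map produced at each parameter value, independently."""
    return [(p, measure(detector(p))) for p in parameters]


def best_parameter(curve):
    """Return (p, score) at the maximum of the curve; first maximum on ties."""
    return max(curve, key=lambda point: point[1])


# Toy stand-ins, purely illustrative: a 3 x 3 "gradient magnitude" image.
gradient = [[5, 30, 90],
            [10, 45, 80],
            [2, 35, 95]]


def toy_detector(threshold):
    """Binary map of the pixels whose gradient exceeds the threshold."""
    return [[1 if v > threshold else 0 for v in row] for row in gradient]


def toy_measure(edge_map):
    """Prefers maps where 4 of the 9 pixels are edges; NOT the paper's C."""
    density = sum(map(sum, edge_map)) / 9
    return 1.0 - abs(density - 4 / 9)


thresholds = list(range(0, 100, 10))
curve = evaluation_curve(toy_detector, toy_measure, thresholds)
best_t, best_score = best_parameter(curve)
```

In the setting of this section, `parameters` would be the sampled thresholds of one detector and `measure` would be $C$ or one of the reference-based measures; repeating the sweep for each detector and comparing the maxima corresponds to the inter-technique comparison.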
The maximum value of $C$ over such EDA outputs selects the edge map with optimal balance between Equilibrium and Information.

Figure 5: Comparison among GT and extreme edge maps. (a) Original image 109, (b) GT, (c)-(d) Canny's extreme edge maps with $T_h = 0.01, 0.99$; $T_l = 0.4 \times T_h$; and $\sigma = \sqrt{2}$.

4.3. First experiment: South Florida database

The South Florida Database is a public database provided by [16] specifically for edge detection assessment. It largely consists of indoor images with little background texture. It includes two collections: one of 50 natural images and another of 10 aerial images. Each of the fifty images of the first collection contains a single object approximately centered in the image and set against a natural background. The second collection has images of man-made constructions. For each of the 60 images, we constructed a database of 500 edge maps and all measures $E$, $H$, $C$, $Q_B$ and PFoM were computed on them, the last two by using the GT available in the database.

We selected two images for a qualitative performance discussion: image 109, a good quality grayscale indoor image with a central object (Figure 5 (a)), and image woods, a good quality outdoor aerial image which depicts several buildings surrounded by woods and country roads (Figure 8). The first image has little texture while the second contains a lot of vegetation (e.g., grass, shrubs, trees), which corresponds to texture in the image. The GT depicts only edges related to object boundaries, which do not include the trees present in the image. Overall, this image is a challenge for edge detectors, in particular for those which only use grayscale information.

In Figure 5, image 109 and its GT are shown along with two extreme edge maps computed with Canny EDA, with $T_h = 0.01$ and $T_h = 0.99$.
The first edge map has many texture details transformed into short edges, and the second has almost no edges. The other 98 edge maps lie between these two extreme edge maps.

The evaluation curves, constructed with the values of $E$, $H$ and $C$ over the collection of EDA outputs as a function of the $T_h$ values, are shown in Figure 6 along with a plot of PFoM and $Q_B$ over the same parameter range. The $C$ evaluation curve displays the usual behavior of complexity measures; it shows a peak when both measures are balanced. Qualitative comparisons between the best edge maps according to $C$ and PFoM are made, see Figure 7.

Figure 6: Canny EDA quality curves of image 109. (a) Plot of $E$, $H$ and $C$ vs high threshold $T_h$, (b) plot of PFoM and $Q_B$ vs $T_h$, (c) best map according to $C$ (score 0.714) and (d) best map according to PFoM (score 0.4784) and $Q_B$ (score 0.3068). Canny EDA parameters are high threshold $T_h = 0.08$ and $0.14$ respectively, low threshold $T_l = 0.4 \times T_h$ and $\sigma = \sqrt{2}$.

The boundary in the lower part of the couch is missing in the edge map selected by PFoM (Figure 7 (d)). Our measure selected a more defined edge map, i.e. with more edge points (Figure 7 (c)), thus showing a more defined contour around the couch. Also, comparing the threshold values, the $C$ measure selected a map with $T_h = 0.08$ and PFoM a map with a higher threshold, $T_h = 0.14$. PFoM was reported as a measure with a bias towards false negatives, i.e. it gives higher scores to edge maps with few edges [7]. This accounts for the missing edge boundary points around the main object in the PFoM edge map.

The second image, woods, provides a challenge to all the detectors. The edge maps selected by $C$, PFoM and $Q_B$ are shown in Figure 8.
Maps are qualitatively different, selected from different quartiles of the parameter range. The reference-based measures disagree on the best detector: PFoM selects Roberts EDA and $Q_B$ selects Sobel EDA. Table 1 shows their parameter values and scores. The scores are low, indicating low agreement between the outputs and the GT. The analysis of the scores given by PFoM and $Q_B$ to the edge maps selected by $C$ (Table 2, sixth and seventh columns) reveals that the reference-based measures heavily penalize the false positives that are introduced by outlining the woods and country road surrounding the buildings in the image, which are not depicted in the GT. Table 2, third to fifth columns, shows the $E$, $H$ and $C$ scores, revealing that the most balanced map related to $E$ is given by LoG EDA; the map with most edge entropy, i.e. related to $H$, is given by Canny EDA, and the one that shows better statistical complexity, i.e. related to $C$, is also given by Canny EDA. The scores given by PFoM and $Q_B$ to such maps select as the best map the one given by Prewitt EDA.

Figure 7: (a) Image 109; (b) enlarged view of the marked region in (a); (c) enlarged view of the region shown in (b), extracted from the best Canny map according to $C$; (d) enlarged view of the region shown in (b), extracted from the best Canny map according to PFoM.

An interesting comparison is given by the scores of $E$ and $Q_B$. The Equilibrium index is based on $Q_B$ by replacing the GT with a family of preselected local patterns, sought in the whole image. The provided GT, which does not outline all the objects in the image, misleads the reference-based measures towards lighter maps, while the use of predefined edge patterns helps to prevent such a problem. An enlarged view of the southeast corner of the image woods depicting a country road is shown in Figure 9 (a) with its corresponding GT in panel (b).
Panels (c) and (d) are Canny edge maps, (e) and (f) LoG edge maps, (g) and (h) Prewitt edge maps, (i) and (j) Sobel edge maps, (k) and (l) Roberts edge maps. The first of each pair was selected by $C$ and the second of each pair by PFoM. In each edge map selected by the reference-based measures, the country road is poorly defined. Instead, every edge map selected by $C$, Figure 9 (c), (e), (g), (i) and (k), has the country road well defined, as well as the vegetation surrounding it.

| EDA | PFoM-optimal parameters | PFoM | $Q_B$-optimal parameters | $Q_B$ |
|---|---|---|---|---|
| Canny | $T_l$ = 0.228, $T_h$ = 0.57 | 0.5082 | $T_l$ = 0.264, $T_h$ = 0.66 | 0.3257 |
| LoG | $T$ = 0.0096 | 0.4545 | $T$ = 0.01 | 0.2728 |
| Prewitt | $T$ = 0.0096 | 0.5479 | $T$ = 0.1 | 0.3961 |
| Roberts | $T$ = 0.0096 | **0.5565** | $T$ = 0.072 | 0.3583 |
| Sobel | $T$ = 0.1 | 0.5526 | $T$ = 0.096 | **0.3978** |

Table 1: Maximum scores of PFoM and $Q_B$ evaluation curves for all EDA, computed over image woods. (Maximum scores by column are highlighted in bold typeface.)

| EDA | $C$-based optimal parameters | $E$ | $H$ | $C$ | PFoM | $Q_B$ |
|---|---|---|---|---|---|---|
| Canny | $T_l$ = 0.076, $T_h$ = 0.19 | 0.7472 | **0.9646** | **0.7208** | 0.3061 | 0.2876 |
| LoG | $T$ = 0.0076 | **0.7760** | 0.9201 | 0.7139 | 0.3951 | 0.2654 |
| Prewitt | $T$ = 0.064 | 0.7384 | 0.9262 | 0.6839 | **0.4197** | **0.3787** |
| Roberts | $T$ = 0.0052 | 0.6515 | 0.9368 | 0.6104 | 0.3649 | 0.3348 |
| Sobel | $T$ = 0.064 | 0.7326 | 0.9293 | 0.6808 | 0.4137 | 0.3775 |

Table 2: $E$, $H$ and $C$ scores of the best EDA outputs of image woods. $Q_B$ and PFoM scores correspond to the best edge map according to $C$. (Maximum scores by column are highlighted in bold typeface.)

Finally, we discuss the empirical statistical distribution of the maximum $C$ scorings on all EDA maps computed with all South Florida images. Boxplots of such values are shown in Figure 10 along with boxplots of PFoM scores computed on the same edge maps, the ones selected by our $C$ measure. We infer from the comparison between all $C$ empirical distributions that Canny EDA produces slightly better maps than the other detectors.
PFoM scores such maps differently, giving the same moderately low mean (around 0.5) to all gradient detectors but LoG EDA. We show examples of such images and best EDA maps in Figure \ref{otras}. Each row displays outputs from a different EDA. The last panel of each row shows a plot of the $C$ curve as a function of the high threshold. The different shapes of the $C$ curves can be accounted for by the differences in the sampling of each EDA parameter space.

4.4. Second experiment: Image with multiple GT images

Some authors believe that the manual GT approach is essential for performance characterization because most researchers do not regard results on synthetic images as convincing and still wish to see results on real images, see [18] and references therein. Unfortunately, manual GT annotation is dubious, tedious and time-consuming. In addition, different annotators often give different GT, or the same annotator can give different GT to the same real image at different times [30]. The use of consensus GT for real images avoids both (1) the subjective generation of manual GT for real images and (2) the generation of artificial GT for artificial images, which probably do not faithfully represent the real scenes. On the other hand, the consensus GT allows many real images to be used for performance characterization [18, 17].

In this experiment, we show that the use of our measure gives results similar to the ones obtained when a pool of GT images is used in the intra-technique and in the inter-technique evaluation problem. We selected an image from the Berkeley Segmentation Database [30], a benchmark database for boundary detection algorithms that provides images with several hand-made segmentations offered as GT.
Thus, the level of detail of the different GT segmentations is diverse, and it represents the human opinion on what the boundary edges of the objects in the images are. Besides, edge detection is not the same as boundary detection; boundary maps show only the outline of the main objects, while edge maps show the whole structure of the image. Figure 11 shows the five different GT images available for image 86000. All supervised measures depend on the level of detail of the GT; thus any supervised measure computed with GT1 (highly detailed) will give high marks to a more cluttered edge map, but if computed with GT5 (little detail) it will certainly give the maximum score to a map with very few edge points.

Unsupervised measures score differently; they search for general characteristics in the map, determined in this case by the specific database of edge patterns and the KS statistic. This example aims at exploring the degree of agreement of C scoring with human observation and supervised measures. Thus we use only the gold standard supervised measure, PFoM, and the gold standard EDA, the Canny EDA, to elaborate the example.

We obtained C scores for the five GT images and for the 100 edge maps output by the Canny EDA, computed with the parameters described in the previous subsection. The same 100 Canny edge maps were scored with PFoM, using each of the different hand-made GT images, and with Q_B, using different collections of GT images. Our reference-based measure Q_B provides a consensus score in this framework.

In Figure 11 (a), the best map selected by C is shown. In panel (b), the optimal edge map selected using PFoM with the GT1 image is shown. That map was also selected with the supervised measure Q_B using the collection of all man-made GT images available. In panel (c), the edge map selected by Q_B using GT2, GT3, GT4 and GT5 is shown.
In panel (d), the edge map selected using PFoM with GT5 is shown. Visual inspection tells us that the maps selected by PFoM and C are almost identical when the GT is the most detailed one (GT1). But the differences are very striking when PFoM uses GT5 (the least detailed one) as GT: the selected map loses the structure of the building. In Table 3, the values of E, H and C over the collection of five GT images are shown. Our C measure gives the maximum score to the most detailed GT.

GT    E        H        C
GT1   0.8146*  0.8612*  0.7015*
GT2   0.7730   0.8292   0.6410
GT3   0.7599   0.7852   0.5966
GT4   0.7597   0.7817   0.5939
GT5   0.7778   0.8040   0.6253

Table 3: E, H and C scores of all GT images available for image 86000. (The best results are marked with *.)

In this example, three major conclusions are drawn:

• With supervised measures, the degree of detail of the GT impacts the quality of the edge map selected. PFoM selects a better map using a detailed GT than using a less detailed GT. Our reference-based measure Q_B selects edge maps that are as good as the ones selected by PFoM, and it accommodates the use of a whole collection of GT images to select the best edge map when moving parameters in a fixed range.

• C selects edge maps that are as good as the ones selected by PFoM at its best (when the GT is detailed enough), but selects better maps than PFoM when the GT is very simple (almost a boundary). The Equilibrium index E is based on the Q_B measure computed over a rich database of patterns; this operation is better than correlating with a simple (or inaccurate) GT.

5. Conclusions

In this paper, new ideas of edge Equilibrium and edge Information are discussed. They lead to the definition of a new SCM for scoring binary maps.
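Since the SCM is the product of the Equilibrium and Entropy indices, the C column of Table 3 can be checked by simple arithmetic. The sketch below is an illustration only: the E, H and C values are copied from Table 3, while the function names `scm` and `recomputed_scores` are hypothetical names introduced here. Small differences in the fourth decimal are expected because the published E and H values are already rounded.

```python
# (E, H, reported C) for each GT image, copied from Table 3 (image 86000).
TABLE_3 = {
    "GT1": (0.8146, 0.8612, 0.7015),
    "GT2": (0.7730, 0.8292, 0.6410),
    "GT3": (0.7599, 0.7852, 0.5966),
    "GT4": (0.7597, 0.7817, 0.5939),
    "GT5": (0.7778, 0.8040, 0.6253),
}


def scm(equilibrium, entropy):
    """Statistical complexity score: product of the two indices."""
    return equilibrium * entropy


def recomputed_scores(table):
    """Recompute C = E * H for every row, ignoring the reported C."""
    return {name: scm(e, h) for name, (e, h, _reported) in table.items()}


if __name__ == "__main__":
    scores = recomputed_scores(TABLE_3)
    for name, (_e, _h, reported) in TABLE_3.items():
        print(f"{name}: E*H = {scores[name]:.4f}, reported C = {reported:.4f}")
    # The GT image with the highest recomputed score.
    print("highest C:", max(scores, key=scores.get))
```

The recomputed products agree with the reported C values to within rounding, and GT1, the most detailed GT, receives the highest score.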
To measure edge Equilibrium, a similarity index was defined by projecting the edge map into a family of edge patterns that scores the continuity and width of edges in fixed-size windows of the edge map. To measure Information, a new Entropy index based on the KS statistic was defined. The SCM is the product of the Equilibrium and Entropy indices, and it is effectively used for performance characterization, which includes: (i) the specific evaluation of an algorithm (intra-technique process) in order to identify its best parameters, and (ii) the comparison of different algorithms (inter-technique process) in order to classify them according to their quality.

Our experiments were made with common edge detectors that are used by a large number of practitioners. More complex edge detectors aim at specific characteristics in the images; thus the measure should be modified accordingly, with a pattern database that accommodates those general characteristics. Active contour methods as applied in [31] are based on the statistical distribution of the noise present in PolSAR images. A measure like ours must be carefully modified to score such EDA outputs, which is the scope of another paper. We are also studying alternative definitions of the Entropy index based on edge map histogram functionals that could be tailored to measure the performance of boundary detection algorithms more accurately.

Acknowledgments

This paper has been partially supported by the Argentinean Grants PICT 2008-00291 and SeCyT-UNC. JM was partially supported by a SGCyT-UNS grad student travel fellowship. JM wants to thank Famaf-UNC for its hospitality during the preparation of this manuscript. JG was supported by a Conicet graduate student fellowship. The authors want to thank Prof. Alejandro Frery for enriching discussions that led to the definition of the measure.
All measure-related Matlab code was written by the authors and is available for download from the Reproducible Research repository of AGF at the University of Cordoba.

References

[1] I. Abdou, W. Pratt, Quantitative design and evaluation of enhancement/thresholding edge detectors, Proceedings of the IEEE 67 (5) (1979) 753–763.
[2] L. Kitchen, A. Rosenfeld, Edge evaluation using local edge coherence, IEEE Transactions on Systems, Man and Cybernetics 11 (9) (1981) 597–605.
[3] G. Papari, N. Petkov, Edge and line oriented contour detection: State of the art, Image and Vision Computing 29 (2-3) (2011) 79–103.
[4] W. K. Pratt, Digital Image Processing, 4th Edition, Wiley-Interscience, 1978.
[5] J. Cohen, A coefficient of agreement for nominal scales, Educational and Psychological Measurement 20 (1) (1960) 37–46.
[6] A. J. Baddeley, An error metric for binary images, Robust Computer Vision: Quality of Vision Algorithms (1992) 59–78.
[7] C. Lopez-Molina, B. De Baets, H. Bustince, Quantitative error measures for edge detection, Pattern Recognition 46 (24) (2013) 1125–1139.
[8] W. Jang, C. Kim, SEQM: Edge quality assessment based on structural pixel matching, IEEE Visual Communications and Image Processing Conference (VCIP) (2012) 1–6.
[9] N. Fernandez-García, R. Medina-Carnicer, A. Carmona-Poyato, F. Madrid-Cuevas, M. Prieto-Villegas, Characterization of empirical discrepancy evaluation measures, Pattern Recognition Letters 25 (1) (2004) 35–47.
[10] D. Bryant, D. Bouldin, Evaluation of edge operators using relative and absolute grading, IEEE Conference on Pattern Recognition and Image Processing (1979) 138–145.
[11] Q. Zhu, Efficient evaluations of edge connectivity and width uniformity, Image and Vision Computing 14 (1) (1996) 21–34.
[12] M. Heath, S. Sarkar, T. Sanocki, K. Bowyer, A robust visual method for assessing the relative performance of edge-detection algorithms, IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) 19 (12) (1997) 1338–1359.
[13] J. Bernsen, An objective and subjective evaluation of edge detection methods in images, Philips Journal of Research 46 (1991) 57–94.
[14] S. Nercessian, S. Agaian, K. Panetta, A new reference-based measure for objective edge map evaluation, Proceedings SPIE 7351 Mobile Multimedia/Image Processing, Security and Applications (2009).
[15] S. Konishi, A. Yuille, J. Coughlan, A statistical approach to multi-scale edge detection, Image and Vision Computing 21 (1) (2003) 37–48.
[16] K. Bowyer, C. Kranenburg, S. Dougherty, Edge detector evaluation using empirical ROC curves, IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 1 (1999) 354–359.
[17] Y. Yitzhaky, E. Peli, A method for objective edge detection evaluation and detector parameter selection, IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) 25 (8) (2003) 1027–1033.
[18] N. Fernandez-García, A. Carmona-Poyato, R. Medina-Carnicer, F. Madrid-Cuevas, Automatic generation of consensus ground truth for the comparison of edge detection techniques, Image and Vision Computing 26 (2008) 496–511.
[19] S. Theodoridis, K. Koutroumbas, Pattern Recognition, Elsevier Science, 2008.
[20] R. López-Ruiz, H. L. Mancini, X. Calbet, A statistical measure of complexity, Physics Letters A 209 (5–6) (1995) 321–326.
[21] J. E. Bresenham, Algorithm for computer control of a digital plotter, IBM Systems Journal 4 (1) (1965) 25–30.
[22] C. E. Shannon, A mathematical theory of communication, The Bell System Technical Journal 27 (3) (1948) 379–423.
[23] A. Justel, D. Peña, R. Zamar, A multivariate Kolmogorov-Smirnov test of goodness of fit, Statistics & Probability Letters 35 (3) (1997) 251–259.
[24] C. Martin, C. Fowlkes, J. Malik, Learning to detect natural image boundaries using local brightness, color and texture cues, IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) 26 (5) (2004) 530–549.
[25] J. Canny, A computational approach to edge detection, IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) 8 (6) (1986) 679–698.
[26] J. M. S. Prewitt, Object enhancement and extraction, B. Lipkin and A. Rosenfeld, Eds. New York: Academic, 1970.
[27] I. E. Sobel, Camera models and machine perception, Ph.D. thesis, Stanford University, Stanford, CA, USA (1970).
[28] L. G. Roberts, Machine Perception of Three-Dimensional Solids, Outstanding Dissertations in the Computer Sciences, Garland Publishing, New York, 1963.
[29] D. Marr, E. Hildreth, Theory of Edge Detection, Proceedings of the Royal Society of London, Series B, Biological Sciences 207 (1167) (1980) 187–217.
[30] D. Martin, C. Fowlkes, D. Tal, J. Malik, A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics, IEEE International Conference on Computer Vision (ICCV) 2 (2001) 416–423.
[31] E. Giron, A. Frery, F. Cribari-Neto, Nonparametric edge detection in speckled imagery, Mathematics and Computers in Simulation 82 (2012) 2182–2198.

Figure 8: Best EDA map selection according to C, PFoM and Q_B, from a collection of edge maps of image woods made with different EDA. First row: original image woods and GT. From the second to the last row, each column corresponds to the best edge map according to C, PFoM and Q_B obtained by a different EDA: Canny, LoG, Prewitt, Roberts and Sobel, respectively.

Figure 9: (a) Enlarged view of the southeast corner of the image woods depicting a country road. (b) GT from (a).
(c) and (d) are Canny edge maps, (e) and (f) LoG edge maps, (g) and (h) Prewitt edge maps, (i) and (j) Sobel edge maps, (k) and (l) Roberts edge maps. The first of each pair was selected by C and the second of each pair by PFoM.

Figure 10: (a) Boxplot of maximum C score values; (b) boxplot of PFoM scores computed on the edge maps selected by C. Each panel shows one box per detector (Canny, LoG, Prewitt, Roberts, Sobel). Scores are computed over all 50 images of the first collection of the South Florida database.

Figure 11: (a) Image 86000; (b) best map selected with measure C; (c) best map selected by PFoM with GT1; (d) best map selected by Q_B with all GT; (e) best map selected by PFoM with GT3. (f)-(j) are respectively the GT1 to GT5 images from the Berkeley database.

Figure 12: From top to bottom, images and detectors: 109, Canny; coffee, LoG; 218, Prewitt; egg, Roberts; parkingmeter, Sobel. The last panel of each row shows plots of the C measure vs. the threshold values of the corresponding detector. (Plot axes: high threshold on the horizontal axis; curves E, H and C.)
