Land Cover Mapping Using Ensemble Feature Selection Methods
Ensemble classification is an emerging approach to land cover mapping whereby the final classification output is a result of a consensus of classifiers. Intuitively, an ensemble system should consist of base classifiers which are diverse i.e. classif…
Authors: A. Gidudu, B. Abe, T. Marwala
Gidudu, A.*§, Abe, B.§ and Marwala, T.§
§School of Electrical and Information Engineering, University of the Witwatersrand, 2050, South Africa

Abstract

Ensemble classification is an emerging approach to land cover mapping whereby the final classification output is a result of a ‘consensus’ of classifiers. Intuitively, an ensemble system should consist of base classifiers which are diverse i.e. classifiers whose decision boundaries err differently. In this paper ensemble feature selection is used to impose diversity in ensembles. The features of the constituent base classifiers for each ensemble were created through an exhaustive search algorithm using different separability indices. For each ensemble, the classification accuracy was derived as well as a diversity measure purported to give a measure of the in-ensemble diversity. The correlation between ensemble classification accuracy and diversity measure was determined to establish the interplay between the two variables. From the findings of this paper, diversity measures as currently formulated do not provide an adequate means upon which to constitute ensembles for land cover mapping.

Keywords: Ensemble Feature Selection, Diversity, Diversity Measures, Land Cover Mapping

1.0 Introduction

Increasingly Earth observation has become a prime source of data in the geosciences and many related disciplines permitting research into the distant past, the present and into the future (Kramer, 2002). This has resulted in new clarity and better awareness of the earth’s dynamic nature.
Earth observation is based on the premise that information is available from the electromagnetic energy field arising from the earth’s surface (or atmosphere or both) and in particular from the spatial, spectral and temporal variations in that field (Kramer, 2002). One of the areas of research interest has always been how to relate Earth observation output, e.g. aerial photographs and satellite images, to known features (e.g. land cover). The leap from manual aerial photographic interpretation to ‘automatic’ classification was inspired by the availability of experimental data in various bands in the mid 1960’s as a prelude to the launch of the Earth Resources Technology Satellite (ERTS – which was later renamed Landsat 1). This necessitated the adoption of digital multivariate statistical methods for the extraction of land cover information (Landgrebe, 1997). Some of the earliest image classifiers at the time included maximum likelihood and minimum distance to means classifiers (Landgrebe, 2005; Wacker and Landgrebe, 1972). Artificial Neural Network analysis was popular at the time, however the then computational capacity inhibited its widespread use (Landgrebe, 1997). To date, image classification has benefitted from advancements in computational power and algorithm development. Examples of the subsequent algorithms that have taken root in image classification include k-Nearest Neighbours, Support Vector Machines, Self Organising Maps, Neural Networks, k-means clustering and object oriented classification.
In light of the improved computational power, variety of classification algorithms, datasets with increasing number of bands, one of the growing areas of interest has been how to ‘combine’ classifiers in a process better known as ensemble classification. Ensemble classification is premised on combining the outputs of a given number of classifiers in order to derive an accurate classification (Foody et al., 2007). In fields like computational intelligence, combining classifiers is now an established research area (Kuncheva and Whitaker, 2003) and goes by a variety of names such as multiple classifier systems, mixture of experts, committee of classifiers, and ensemble based systems (Polikar, 2006). One of the main prerequisites in building an ensemble system is ensuring that there is diversity among the base (constituent) classifiers (Yu and Cho, 2006). Diversity in ensemble systems may be ensured through the use of different: training datasets, classifiers, features or training parameters (Polikar, 2006). Previous work relating ensemble classification to land cover mapping has focused on investigating how combining different classifiers impacts on classification accuracy (Foody et al., 2007), how different types of ensembles can be applied to land cover mapping (Pal, 2007) and also enforcing diversity through bagging for land cover mapping (Steele and Patterson, 2001). In this paper, ensemble feature selection is investigated as a means of image classification whereby diversity is enforced through using different features. Another key aspect in this paper will be to establish if there is any correlation between classification accuracy and one of the common diversity measures.
The paper is arranged as follows: section 2 gives an overview of ensemble classification and diversity measures, section 3 presents the methodology developed to carry out the research, and section 4 presents and discusses the results accruing thereof.

2.0 Overview of Ensemble Classification

The main idea behind ensemble classification is that one is interested in taking advantage of the various classifiers at one's disposal to come up with a ‘consensus’ result. The challenge at hand involves deciding which classifiers to consider and how to combine their results. From the literature (e.g. Polikar, 2006) it is recommended that the constituent classifiers in the ensemble have different decision boundaries, because if they are identical there will be no gain in combining the classifiers (Shipp and Kuncheva, 2002). Such a set is considered to be diverse (Polikar, 2006). Diversity in ensemble systems has been more commonly explored by considering different classifiers; training a given classifier on different portions of the data; using a classifier with different parameter specifications and using different features. Two methods which have gained prominence in ensemble classification research include bagging or bootstrap aggregating (Breiman, 1996) and Adaboost or reweighting boosting (Freund and Schapire, 1996) which principally involve training a classifier on different training data. The focus of this paper is ensemble feature selection which entails ensuring diversity through training a given classifier on different features, which in remote sensing would be the different sensor bands.
By varying the feature subsets used to generate the ensemble classifier, diversity is ensured since the base classifiers tend to err in different subspaces of the instance space (Oza and Tumer, 2008; Tsymbal et al., 2005) as illustrated in Figure 1. Some of the techniques used to select features to be used in ensemble systems include genetic algorithms (Opitz, 1999), exhaustive search methods and random selection of feature subsets (Ho, 1998).

Of equal importance to ensemble classification is how to combine the results of the base classifiers (Foody et al., 2007). A number of approaches exist to combine information from multiple classifiers (Huang and Lees, 2004; Valentini and Masulli, 2002; Giacinto and Roli, 2001) such as majority voting (Chan and Paelinckx, 2008), weighted majority voting (Polikar, 2006) or more sophisticated methods like consensus theory (Benediktsson and Swain, 1992) and stacking (Džeroski and Zenko, 2004).

One of the emerging areas of research interest has been how to quantify diversity, as a result of which numerous diversity measures are under investigation in the literature. The main focus of investigation has centered on finding measures which can be used as a basis upon which to build diverse ensemble systems. In the literature (e.g. Polikar, 2006; Kuncheva and Whitaker, 2003), there are two categorizations of diversity measures namely: pair-wise and non-pair-wise diversity measures. Examples of pair-wise measures include: Q statistic, correlation coefficient, agreement measure, disagreement measure and double-fault measure.
The diversity measure for the ensemble is derived by calculating the average of the pair-wise measures of the constituent classifiers (Tsymbal et al., 2005; Shipp and Kuncheva, 2002). Non-pair-wise diversity measures include: the entropy measure, Kohavi-Wolpert variance and measurement of inter-rater agreement.

3.0 Methodology

The study area for this research was Kampala, the capital of Uganda. The optical bands of a 2001 Landsat image (column 171 and row 60) formed the dataset from which ensembles were created and investigated. There were five land cover classes of interest: water, built up areas, thick swamps, light swamps and other vegetation. Ten ensembles were created, each with five base classifiers, the number five having been arbitrarily chosen. For each ensemble, the base classifiers were made up of the bands which yielded the best separability indices (best five band combinations in this case). Three separability indices were used, namely: Bhattacharyya distance, divergence and transformed divergence. For each base classifier and corresponding ensemble, a land cover map was derived using Gaussian Support Vector Machines. The land cover map for each ensemble was consequently derived through majority voting, primarily due to its simplicity (Valentini and Masulli, 2002). Each of the derived land cover maps was compared with ground truth data to ascertain its classification accuracy. In order to determine the diversity of each ensemble, kappa analysis was used to give the measure of agreement between the constituent base maps and ultimately the overall ensemble diversity.
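The majority-voting step described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names, the flat list representation of a land cover map and the tie-breaking rule are all assumptions, since the paper does not specify them.

```python
from collections import Counter


def majority_vote(base_maps):
    """Fuse per-pixel labels from several base classifiers by majority vote.

    base_maps: list of equal-length label lists, one per base classifier.
    Ties go to the label predicted by the earliest classifier in the list
    (an assumed rule; the paper does not state how ties were resolved).
    """
    if not base_maps:
        raise ValueError("need at least one base map")
    n_pixels = len(base_maps[0])
    if any(len(m) != n_pixels for m in base_maps):
        raise ValueError("all base maps must have the same length")

    fused = []
    for labels in zip(*base_maps):  # labels for one pixel, one per classifier
        counts = Counter(labels)
        top = max(counts.values())
        # first label, in classifier order, that reaches the top count
        fused.append(next(lab for lab in labels if counts[lab] == top))
    return fused


def overall_accuracy(predicted, truth):
    """Fraction of pixels whose predicted label matches the ground truth."""
    if len(predicted) != len(truth) or not truth:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)


# Toy example with three hypothetical base maps over four pixels.
m1 = ["water", "built", "swamp", "veg"]
m2 = ["water", "built", "veg", "veg"]
m3 = ["built", "built", "swamp", "water"]
fused = majority_vote([m1, m2, m3])
# fused == ["water", "built", "swamp", "veg"]
```

With an odd number of base classifiers, as in the five-member ensembles used here, ties can still occur whenever more than two classes are involved, which is why the sketch makes its tie-breaking rule explicit.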
The influence of diversity on land cover classification accuracy for each ensemble was evaluated by comparing the derived land cover classification accuracies with the derived diversity measures.

4.0 Results, Discussion and Conclusions

Table 1 gives a summary of the results depicting the ensembles constituted depending on the separability index used, the respective base classifier classification accuracy assessment and the consequent ensemble classification accuracies. It also gives the diversity measure per ensemble according to degree of agreement and variance. From Table 1 it can be observed that for all ensembles, whereas the ensemble classification accuracy was better than many of the base classifiers, in no case was it better than the best classifier within the ensemble. It is, however, critical to note, and the possibility is indicated here and reported elsewhere (e.g. Bruzzone and Cossu, 2002), that whereas the ensemble classification may not be more accurate than all of the base classifiers used in its construction (Foody et al., 2007), it certainly reduces the risk of making a particularly poor selection (Polikar, 2006). Table 1 also shows that across all ensembles, the respective classification accuracy increased as the size of the base classifiers (the number of features each uses) increased. This is further confirmed from Table 2 depicting the binomial tests of significance of the between-ensemble classification accuracies. In the simple case of determining if there is a difference between two classifications (2 sided test), the null hypothesis (Ho) that there is no significant difference will be rejected at the 5% significance level if |Z| > 1.96 (Congalton and Green, 1998).
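As a rough illustration of the decision rule above, the sketch below computes a two-sided Z statistic for two classification accuracies. The paper does not give the formula behind Table 2, so the pooled-free, independent-samples form used here, the function names and the sample sizes are assumptions made only for illustration.

```python
import math


def z_statistic(acc1, n1, acc2, n2):
    """Z statistic for the difference between two accuracies (proportions).

    acc1, acc2: overall accuracies in [0, 1].
    n1, n2: number of test samples behind each accuracy.
    Assumes independent samples and a normal approximation to the binomial.
    """
    var1 = acc1 * (1.0 - acc1) / n1
    var2 = acc2 * (1.0 - acc2) / n2
    if var1 + var2 == 0.0:
        raise ValueError("both variances are zero; Z is undefined")
    return (acc1 - acc2) / math.sqrt(var1 + var2)


def significantly_different(acc1, n1, acc2, n2, critical=1.96):
    """Reject the null hypothesis of no difference when |Z| > critical."""
    return abs(z_statistic(acc1, n1, acc2, n2)) > critical


# Hypothetical accuracies, each assessed on 500 test pixels.
z = z_statistic(0.90, 500, 0.80, 500)          # about 4.47
reject = significantly_different(0.90, 500, 0.80, 500)  # True
```

If the two classifications are assessed on the same test pixels, a paired test would be the more appropriate choice; the independent form is used here only because it is the simplest to state.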
For each separability index used, increasing the number of features in the base classifiers in general significantly increased the ensemble classification accuracy. The ensemble (E) with five features per base classifier was seen to be significantly better than all the other ensembles apart from D3, where the difference was deemed insignificant. From the results, nothing conclusive can be deduced regarding which of the separability indices used is best suited as a basis upon which to build ensembles.

The relationship between ensemble classification accuracy and diversity was investigated by determining the correlation between ensemble classification accuracy and the agreement measure, which in this case was the Kappa value. This was computed by averaging the in-ensemble pair-wise kappa values of the base classifiers measured against each other. In order to get a better appreciation of the in-ensemble diversity, the variance of the pair-wise kappa values was also computed. Intuitively, the more diverse the ensemble, the lower the agreement between the classifiers and consequently the lower the resulting kappa values. By extension, the more diverse the ensemble, the bigger the variance between the in-ensemble pair-wise kappa values. Figure 2 depicts the interplay between the ensemble accuracy and diversity measure, which in this case is the mean of the in-ensemble measure of agreement computed from the in-ensemble pair-wise kappa values. Figure 3 gives a modification of the ensemble measure of agreement whereby instead of considering the mean of the in-ensemble pair-wise kappa values, their variance is considered.
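The two agreement-based diversity summaries described above (the mean and the variance of the in-ensemble pair-wise kappa values) can be sketched as follows. The code uses the standard Cohen's kappa between two label maps; the function names are hypothetical, and the use of the population variance is an assumption, since the paper does not say which variance estimator was applied.

```python
from itertools import combinations
from statistics import mean, pvariance


def cohen_kappa(a, b):
    """Cohen's kappa between two equal-length lists of class labels."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(a)
    labels = set(a) | set(b)
    observed = sum(x == y for x, y in zip(a, b)) / n
    # chance agreement from the marginal label frequencies of each map
    expected = sum((a.count(lab) / n) * (b.count(lab) / n) for lab in labels)
    if expected == 1.0:
        return 1.0  # both maps assign one identical label everywhere
    return (observed - expected) / (1.0 - expected)


def kappa_diversity(base_maps):
    """Mean and population variance of all pair-wise kappa values."""
    kappas = [cohen_kappa(a, b) for a, b in combinations(base_maps, 2)]
    if not kappas:
        raise ValueError("need at least two base maps")
    return mean(kappas), pvariance(kappas)


# Toy example: a and b agree fully, c agrees with them only at chance level.
a = [1, 1, 2, 2]
b = [1, 1, 2, 2]
c = [1, 2, 1, 2]
mean_kappa, var_kappa = kappa_diversity([a, b, c])
# cohen_kappa(a, b) == 1.0 and cohen_kappa(a, c) == 0.0
```

For a five-member ensemble this yields ten pair-wise kappa values per ensemble, from which one mean and one variance are reported.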
The coefficient of correlation of the line of best fit in Figure 2 is 0.83 while in Figure 3 it is -0.72. From Figures 2 and 3, respectively, it can be deduced that ensemble accuracy increases as the agreement between the base classifiers increases and as the variance between the base classifier outputs decreases. In effect, this would imply that the ensemble classification accuracy would increase if there is more agreement between the base classifier outputs. The contradiction this imputes is that to get higher ensemble classification accuracy there is need for less diversity among the base classifiers.

The results bring to the fore the challenge that comes with including diversity measures in ensemble classification research. Clearly their use in determining diversity for land cover mapping is counter-intuitive. The problem may stem from using classifier output as the basis upon which to measure diversity. Whereas diversity, as defined in ensemble classification research, is premised on having decision boundaries which err differently, using outputs to determine the measure of diversity presupposes that using different decision boundaries would yield different results. In the case of ensemble feature selection, base classifiers built from different features certainly result in decision boundaries which err differently (and hence exhibit diversity); however, their final classification outputs are similar, as the high coefficients of correlation depict. Hence relying on the outputs as a measure of diversity clearly gives a poor reflection of how diverse the ensemble is.
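The correlation coefficients quoted for Figures 2 and 3 relate each ensemble's accuracy to its diversity summary. A minimal sketch of that calculation, assuming the ordinary Pearson coefficient was used (the paper does not name the estimator), is given below. The function name is hypothetical and the numbers in the example are placeholders, not values from Table 1.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Placeholder inputs: one value per ensemble (not the paper's data).
mean_kappas = [1.0, 2.0, 3.0]
accuracies = [2.0, 4.0, 6.0]
r = pearson_r(mean_kappas, accuracies)  # perfectly linear, so r is 1 up to rounding
```

A positive coefficient against mean kappa together with a negative one against kappa variance is the pattern the discussion above interprets as accuracy rising with agreement rather than with diversity.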
In their concluding remarks, Shipp and Kuncheva (2002) posit that the quantification of diversity and its use in determining diversity in ensembles will only be possible when a more precise formulation of the notion of diversity is obtained. Until then different heuristics will have to be employed. Whereas ensemble classification presents a unique approach to land cover mapping, the quantification of diversity and its consequent influence in determining the type of ensembles is clearly still open for research.

Acknowledgements

The authors would like to acknowledge the support of the University of the Witwatersrand, Department of Science and Technology and reviewers.

References

Benediktsson, J.A. and Swain, P.H. 1992. Consensus theoretic classification methods. IEEE Trans. on Systems, Man and Cybernetics, vol. 22, pp. 688–704

Breiman, L. 1996. Bagging predictors. Machine Learning, 24(2), pp. 123–140

Bruzzone, L. and Cossu, R. 2002. A multiple-cascade-classifier system for a robust and partially unsupervised updating of land-cover maps. IEEE Transactions on Geoscience and Remote Sensing, 40, pp. 1984–1996

Chan, J.C. and Paelinckx, D. 2008. Evaluation of Random Forest and Adaboost tree-based ensemble classification and spectral band selection for ecotope mapping using airborne hyperspectral imagery. Remote Sensing of Environment, 112, pp. 2999–3011

Congalton, R.G. and Green, K. 1998. Assessing the Accuracy of Remotely Sensed Data: Principles and Practices. (Boca Raton, Florida: Lewis Publishers)

Džeroski, S. and Zenko, B. 2004. Is Combining Classifiers with Stacking Better than Selecting the Best One? Machine Learning, 54(3), pp. 255–273 (Hingham, MA, USA: Kluwer Academic Publishers)

Foody, G.M., Boyd, D.S. and Sanchez-Hernandez, C. 2007. Mapping a specific class with an ensemble of classifiers. International Journal of Remote Sensing, 28(8), pp. 1733–1746

Freund, Y. and Schapire, R. 1996. Experiments with a new boosting algorithm. In Proceedings of the 13th International Conference on Machine Learning, Bari, Italy, pp. 148–156. (Morgan Kaufmann)

Giacinto, G. and Roli, F. 2001. Design of effective neural network ensembles for image classification processes. Image Vision and Computing Journal, 19:9/10, pp. 699–707

Huang, Z. and Lees, B.G. 2004. Combining non-parametric models for multisource predictive forest mapping. Photogrammetric Engineering and Remote Sensing, vol. 70, pp. 415–425

Ho, T.K. 1998. The random subspace method for constructing decision forests. IEEE Transactions of Pattern Analysis and Machine Intelligence, 20(8), pp. 832–844

Kramer, J.H. 2002. Observation of the earth and its environment: Survey of missions and sensors (4th Edition). (Berlin: Springer)

Kuncheva, L. and Whitaker, C.J. 2003. Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine Learning, 51, pp. 181–207 (Hingham, MA, USA: Kluwer Academic Publishers)

Landgrebe, D. 2005. Multispectral Land Sensing: Where From, Where to? IEEE Transactions on Geoscience and Remote Sensing, 43, pp. 433–440

Landgrebe, D. 1997. The evolution of Landsat Data Analysis. Photogrammetric Engineering and Remote Sensing, 63, pp. 859–867

Opitz, D. 1999. Feature Selection for Ensembles. In Proceedings of the 16th National Conference on Artificial Intelligence (AAAI), Orlando, Florida, USA, pp. 379–384

Oza, C.N. and Tumer, K. 2008. Classifier ensemble: Select real-world applications. Information Fusion, vol. 9, pp. 4–20

Pal, M. 2007. Ensemble Learning with Decision Tree for Remote Sensing Classification. In Proceedings of the World Academy of Science, Engineering and Technology, Vol. 26, pp. 735–737

Parikh, D. and Polikar, R. 2007. An Ensemble-Based Incremental Learning Approach to Data Fusion. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, Vol. 37, No. 2

Polikar, R. 2006. Ensemble based systems in decision making. IEEE Circuits and Systems Magazine, pp. 21–44

Steele, B.M. and Patterson, D.A. 2001. Land Cover Mapping using Combination and Ensemble Classifiers. Computing Science and Statistics, vol. 33, pp. 236–247

Shipp, C.A. and Kuncheva, L. 2002. Relationship between combination methods and measures of diversity in combining classifiers. Information Fusion, 3, pp. 135–148

Tsymbal, A., Pechenizkiy, M. and Cunningham, P. 2005. Diversity in search strategies for ensemble feature selection. Information Fusion, 6(1), pp. 83–98

Valentini, G. and Masulli, F. 2002. Ensembles of learning machines. In: Neural Nets WIRN Vietri, Lecture Notes in Computer Sciences, edited by Tagliaferri, R. and Marinaro, M., vol. 2486, pp. 3–19

Wacker, A.G. and Landgrebe, D. 1972. Minimum Distance Classification in Remote Sensing. In Proceedings of the 1st Canadian Symposium for Remote Sensing, 7th–9th February 1972

Yu, E. and Cho, S. 2006. Ensemble Based on GA Wrapper Feature Selection. Computers and Industrial Engineering, 51, pp. 111–116

* Corresponding Author: Tel.: +27117177261; Fax: +27114031929; Email: Anthony.Gidudu@wits.ac.za; Anthony.Gidudu@gmail.com