Relationship between Diversity and Performance of Multiple Classifiers for Decision Support


Authors: R. Musehane, F. Netshiongolwe, F.V. Nelwamondo, L. Masisi and T. Marwala

R. Musehane, F. Netshiongolwe, F.V. Nelwamondo*, L. Masisi and T. Marwala
School of Electrical & Information Engineering, University of the Witwatersrand, Private Bag 3, 2050, Johannesburg, South Africa
* Graduate School of Arts and Sciences, Harvard University, GSAS Mail Center Child 412, 26 Everett Street, Cambridge, Massachusetts, 02138 USA

Abstract: The paper presents an investigation of the relationship between diversity and the performance of multiple classifiers, measured by classification accuracy. The study is critical in order to build classifiers that are strong and can generalize better. The parameters of the neural networks within the committee were varied to induce diversity; hence structural diversity is the focus of this study. The hidden nodes and the activation function are the parameters that were varied. Diversity measures adopted from ecology, such as the Shannon and Simpson indices, were used to quantify diversity. A genetic algorithm is used to find the optimal ensemble, using the accuracy as the cost function. The results show that there is a relationship between structural diversity and accuracy: the classification accuracy of an ensemble increases as the diversity increases, with an observed gain of 3%-6% in classification accuracy.

Key words: Classification, Diversity Measures, Genetic Algorithm, Multiple Classifiers, Structural Diversity.

1. INTRODUCTION

Computational intelligence techniques have been used in many classification problems. The literature emphasises that a group of classifiers is better than one classifier [1-5].
This is because the decision made by a committee of classifiers is better than the decision made by one classifier. In this paper the committee of classifiers will be referred to as an ensemble. The most popular way to gain confidence in the generalisation ability of an ensemble is by introducing diversity within the ensemble [1, 2, 5]. This has led to the development of measures of diversity and of various aggregation schemes for combining classifiers. However, diversity is not clearly defined [6, 7]. Thus, a proper measure of diversity that relates diversity to accuracy is to be adopted. Current methods commonly use the outcomes of the individual classifiers of an ensemble to measure diversity. Hence an ensemble is considered diverse if classifiers within the ensemble produce different outcomes, as opposed to having the same outcomes [1, 6, 7]. In this paper, as opposed to looking at the outcomes of the individual classifiers, ensemble diversity is viewed as the structural variation within the classifiers that form an ensemble [1, 5]. Thus, diversity will be induced by changing structural parameters of a neural network [5]. The paper investigates the relationship between structural diversity within an ensemble and the prediction accuracy of the ensemble. It has been intuitively accepted that the classifiers to be combined should be diverse [8]. This is because it has been found meaningless to combine identical classifiers, since no improvement can be achieved by combining them [8, 9]. Hence, measuring structural diversity and relating it to accuracy is crucial in order to build better learning machines. However, it is first necessary to find the optimal size of an ensemble that gives better generalization.
Therefore, a study on the size of the ensemble was done in order to find the optimal size that can be used for the investigation. The methods for measuring structural diversity are to be devised and implemented. Moreover, the outcome diversity of structurally different classifiers is critical to measure. This is because it is essential to show how correlated the outcomes of the structurally different classifiers are. Hence, the limitations of accuracy in the structural diversity are to be justified. Different methods for creating diversity, such as bagging and boosting, have been explored [1, 3]. However, aggregation methods are to be used to combine the ensemble predictions. Methods of voting and averaging have been found to be popular [9, 10] and hence are used in this study. The paper first discusses the background in section 2. Analysis of the data used for this study is presented in section 3. The accuracy measure and structural measures of diversity used are discussed in sections 4 and 5. The methodologies used in investigating the effect of diversity on generalization are presented in section 6. The results and future work are then discussed in section 7.

2. BACKGROUND

2.1. Neural Networks

Neural Networks (NN) are computational models that have the ability to learn and model linear and non-linear systems [11]. There are many types of neural networks, but the most common neural network architecture is the multilayer perceptron (MLP) [11]. The neural network architecture used in this paper is an MLP network, as shown in Figure 1. The MLP network has an input layer, a hidden layer and an output layer. An MLP network has parameters such as the learning rate, the number of hidden nodes and the activation function.
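As a concrete sketch of such a network (a minimal NumPy illustration, not the authors' implementation; the layer sizes and weight values below are arbitrary assumptions), a single forward pass through a one-hidden-layer MLP can be written as:

```python
import numpy as np

def mlp_forward(x, w1, b1, w2, b2, hidden_activation=np.tanh):
    """One forward pass of a single-hidden-layer MLP.

    x  : (N,) input vector          w1 : (M, N) input-to-hidden weights
    b1 : (M,) hidden biases         w2 : (K, M) hidden-to-output weights
    b2 : (K,) output biases
    """
    h = hidden_activation(w1 @ x + b1)          # hidden layer
    y = 1.0 / (1.0 + np.exp(-(w2 @ h + b2)))    # logistic output activation
    return y

# Hypothetical tiny network: 6 demographic inputs, 4 hidden nodes, 1 output.
rng = np.random.default_rng(0)
x = rng.random(6)
y = mlp_forward(x, rng.standard_normal((4, 6)), np.zeros(4),
                rng.standard_normal((1, 4)), np.zeros(1))
hiv_class = int(y[0] >= 0.5)   # 0.5 threshold, as used in this paper
```

Varying the number of hidden nodes (the shape of `w1`) and the `hidden_activation` argument is exactly the kind of structural variation this study uses to induce diversity.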
These parameters can be varied to induce structural diversity [5]. The general equation for the output of an MLP neural network is shown below:

y_k = f_outer( Σ_{j=1}^{M} w^{(2)}_{kj} f_inner( Σ_{i=1}^{N} w^{(1)}_{ji} x_i + w^{(1)}_{j0} ) + w^{(2)}_{k0} )    (1)

where: y_k is the output from the neural network, f_outer is the output activation function, which can be linear, softmax or logistic, f_inner is the hidden-layer tangential activation function, M is the number of hidden units, N is the number of input units, w^{(1)}_{ji} are the first-layer weights from input i to hidden unit j, w^{(2)}_{kj} are the second-layer weights from hidden unit j to output k, and w^{(1)}_{j0} and w^{(2)}_{k0} are the biases for hidden unit j and output k.

Figure 1: The MLP neural network architecture

The inputs into the neural network are the demographic data attributes from the HIV antenatal survey and the output is the HIV status of the individual, where 0 represents negative and 1 represents positive. The weights of the NN are updated using a back-propagation algorithm during the training stage [11]. A threshold of 0.5 is used in order to obtain a zero-or-one solution from the neural network. This means that any value less than 0.5 is converted to 0 and any value more than 0.5 is converted to 1.

2.2. Genetic Algorithm

Genetic algorithms (GA) are computational models that are based on the evolution of biological populations [2]. Potential solutions are encoded as the chromosomes of individuals. These individuals are initially generated randomly. The individuals are evaluated through a defined fitness function. Each succeeding generation is populated by the fittest solutions (members) of the previous generation and their offspring. The offspring are created through crossover and mutation.
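The generate-evaluate-select-recombine loop just described can be sketched as follows (a minimal illustration with an assumed bit-string encoding and a toy "one-max" fitness function, not the GA configuration used later in this paper):

```python
import random

random.seed(1)

def fitness(ind):
    # Toy fitness: count of 1-bits ("one-max"); a real GA plugs in a
    # problem-specific function here.
    return sum(ind)

def evolve(pop_size=20, n_genes=16, generations=30, p_mut=0.05):
    # Random initial population of bit-string individuals.
    pop = [[random.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]                 # keep the fittest half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, n_genes)         # one-point crossover
            child = a[:cut] + b[cut:]
            child = [g ^ (random.random() < p_mut) for g in child]  # mutation
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

best = evolve()
```

Because the fittest half survives each generation, the best fitness found is non-decreasing, so the initially random solutions improve over time.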
The crossover process combines the genetic information of the two fittest previous solutions to create new offspring. Mutation alters the genes of an individual to introduce more diversity into the population. In this way, the initially generated solution can be improved over time [2, 12].

3. DATA ANALYSIS

3.1. Data Collection

The dataset used for the study is from antenatal clinics in South Africa and was collected by the Department of Health in 2001. The features in the data include age, gravidity, parity, education, etc. The demographic data used in the study are shown in Table 1 below. The province was provided as a string, so it was converted to an integer from 1 to 9.

Table 1: The features from the survey

| # | Variable | Type | Range |
|---|----------|------|-------|
| 1 | Age | integer | 13-50 |
| 2 | Education | integer | 0-13 |
| 3 | Parity | integer | 0-9 |
| 4 | Gravidity | integer | 1-12 |
| 5 | Province | integer | 1-9 |
| 6 | Age of father | integer | 14-60 |
| 7 | HIV status | binary | 0-1 |

The age is that of the mother visiting the clinic. Education represents the level of education the mother has and ranges from 1-13, where 1-12 corresponds to grades 1 to 12 and 13 represents tertiary education. Parity is the number of times the mother has given birth, whilst gravidity is the number of times the mother has been pregnant. Both these quantities are important, as they show the reproductive activity as well as the reproductive health state of the women. The age of the father responsible for the current pregnancy is also given, and the province entry corresponds to the geographic area where the mother comes from. The last feature is the HIV status of the mother, where 0 represents a negative status whilst 1 represents a positive status.
3.2. Data Pre-Processing

Data pre-processing is necessary in order to eliminate impossible situations, such as parity being greater than gravidity, because it is not possible for the mother to give birth without falling pregnant. The pre-processing of the data resulted in a reduction of the data set. To use the dataset for training, it needs to be normalized, because data variables with larger variances would otherwise influence the result more than others. Normalization ensures that all variables can contribute to the final network weights of the prediction model [13]. Therefore, all the data are normalized between 0 and 1 using (2):

x_norm = (x_i - x_min) / (x_max - x_min)    (2)

where x_min and x_max are the minimum and maximum values of the features of the data samples respectively.

The data were divided into three sets: training, validation and testing data. This was done to avoid over-fitting of the network. The neural networks are trained with 60% of the data, validated with 20% and tested with 20%.

4. MEASUREMENT OF ACCURACY

Regression problems mostly focus on using the mean square error between the actual outcome and the predicted outcome as a measure of how well neural networks are performing. In classification problems, the accuracy can be measured using the confusion matrix [14]. Analysis of the dataset being used showed that the data are biased towards negative HIV status outcomes. Hence, the data were divided such that there is an equal number of HIV positive and negative cases. The accuracy measure used in this study is given by (3).
% Accuracy = (TP + TN) / (TP + TN + FP + FN) × 100%    (3)

where: TP is the number of true positives (a 1 classified as a 1), TN is the number of true negatives (a 0 classified as a 0), FN is the number of false negatives (a 1 classified as a 0), and FP is the number of false positives (a 0 classified as a 1).

5. MEASUREMENT OF DIVERSITY

5.1. Shannon-Wiener Diversity Measure

Shannon entropy is a diversity measure that was adopted from ecology and information theory to understand ensemble diversity [15]. This measure is implemented here to measure structural diversity. The Shannon-Wiener index is commonly used in information theory to quantify the uncertainty of a state [15, 16]. If the states are diverse, one becomes uncertain of the outcome. It is also used in ecology to measure the diversity of species. Instead of biological species, the species here are the individual base classifiers. The Shannon diversity measure is given by (4):

D = -(1 / ln N) Σ_{i=1}^{M} (n_i / N) ln(n_i / N)    (4)

where: n_i is the number of neural networks that have the same structure, N is the total number of neural networks in an ensemble, M is the total number of different neural networks (species), and D is the diversity index. The diversity ranges from 0 to 1, where 0 indicates low diversity and 1 indicates the highest diversity.

5.2. Simpson Diversity Measure

The other measure that was implemented is the Simpson diversity measure. This measure is also adopted from ecology to quantify diversity. It is quantified by (5):

D = 1 - Σ_{i=1}^{M} n_i(n_i - 1) / (N(N - 1))    (5)

where n_i, N and M are defined as in (4). The diversity index is given by D, and the diversity increases as the index increases.
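Both indices can be sketched directly from (4) and (5) (the ensemble composition below is a hypothetical example, not data from the paper):

```python
import math
from collections import Counter

def shannon_diversity(counts):
    """Normalized Shannon index, eq. (4): 0 (all identical) to 1 (all distinct)."""
    N = sum(counts)
    h = -sum((n / N) * math.log(n / N) for n in counts)
    return h / math.log(N)

def simpson_diversity(counts):
    """Gini-Simpson index, eq. (5): 1 minus the chance two draws share a structure."""
    N = sum(counts)
    return 1.0 - sum(n * (n - 1) for n in counts) / (N * (N - 1))

# Hypothetical ensemble of 21 networks built from 10 base structures:
structures = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1]
counts = list(Counter(structures).values())   # [3, 2, 2, ..., 2]
D_shannon = shannon_diversity(counts)
D_simpson = simpson_diversity(counts)
```

With all 21 networks structurally identical both indices are 0; with all 21 distinct both reach 1, the highest value of the Simpson diversity index.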
The Simpson index also ranges from 0 to 1, where 0 means there is no diversity and 1 indicates the highest diversity.

6. METHODOLOGY

6.1. Creation of Base Classifiers

Since the focus of the study is structural diversity, the activation function, learning rate and number of hidden nodes were varied in order to induce diversity. However, varying all the parameters was found to be ineffective, because the classifiers then tend to generalize in the same way. Therefore, only the hidden nodes and the activation function were varied for this investigation. The classifiers are trained individually using the back-propagation method, where the error is propagated back so as to adjust the weights accordingly. The data used for training, validation and testing are the HIV data. All the features of the input are fed to all the networks. Classifiers with a training accuracy of at least 60% were accepted; training accuracies between 60% and 63% were achieved. The hidden nodes were varied from 7 to 57, and the activation function was randomly varied between the logistic and the linear function. The classifiers were trained using the quasi-Newton algorithm for 100 cycles at the same learning rate of 0.01.

6.2. Committee of Classifiers

A committee of classifiers improves efficiency and classification accuracy [17, 18]. This ensures that the results are based on the consensus decision of the base classifiers. The base classifiers operate concurrently during classification and their outputs are integrated to obtain the final output [18]. The model for the committee of classifiers is shown in Figure 2.
Figure 2: The classifier ensemble of neural networks

There are many aggregation methods that can be used to combine the outcomes of classifiers; these were explored in the preliminary report. The ensemble outcomes were all aggregated using simple majority voting. This was chosen because it is popular and easy to implement [9]. The outcomes of each individual classifier in an ensemble are first converted to 0 or 1 using 0.5 as a threshold. The majority voting method then chooses the prediction that is made by most of the different classifiers [19]. The other method that was implemented was averaging, where the outcomes from all the classifiers are taken and averaged.

6.3. Evaluation of Optimal Ensemble Size

It is important to use the optimal size of an ensemble that results in better generalisation of the data [20]. The ensemble size is determined by the number of classifiers that belong to the ensemble. The created classifiers were used to carry out this experiment. The ensemble size was incremented by one from 1 to 50. The structure of the networks was made to be different by varying the hidden nodes as the ensemble size increases. Hence, the size of the network itself is increased as the number of classifiers in the ensemble increases [4]. Figure 3 below shows the results obtained. It was observed that the relationship between the size and accuracy of the ensemble depends on the accuracy of the individual classifiers that belong to the ensemble. Increasing the size of the neural network by increasing the hidden nodes tends to improve the classification accuracy as the number of classifiers in an ensemble increases.
An increase in ensemble size initially results in an increase in prediction accuracy; after a size of about 19 classifiers is reached, the accuracy tends to remain constant. Nevertheless, a size of 21 was found to be optimal, since it produced the best accuracy. The results obtained are concurrent with the literature, where the commonly cited optimal ensemble size is 25 [18, 20]. Therefore, an ensemble size of 21 is used for evaluating the relationship between the diversity and performance of classifiers on HIV classification.

Figure 3: The ensemble size and classification accuracy (averaging and voting)

6.4. Evaluation of Outcome Diversity

Currently, measuring outcome diversity is more popular than measuring structural diversity [6]. It was nevertheless necessary to measure the outcome diversity for this study, because it is essential to measure the degree of agreement and disagreement in the outcomes of the ensemble. This experiment was useful for analysing the limitations of the structural diversity results. The Q statistic was used to measure outcome diversity. The Q statistic evaluates the degree of similarity and dissimilarity in the outcomes of the classifiers within the ensemble [8]. The index ranges from -1 to 1, where 0 indicates the highest diversity and 1 indicates the lowest diversity [6]. For all 21 classifiers in an ensemble, each classifier is paired with every other classifier within the ensemble. The results from this study show that the outcomes of the structurally diverse classifiers within the ensemble are highly correlated.
This is indicated by a Q value close to 1; the obtained Q values range from 0.88 to 0.91.

6.5. Evaluation of Structural Diversity

The created classifiers were used to investigate the relationship between diversity and accuracy. Ten base classifiers, or species, were selected from the created classifiers; these are all structurally different based only on their hidden nodes and activation functions. The networks had different activation functions, and their hidden nodes were varied from 10 to 55 in steps of 5. The GA has the capability to search large spaces for a globally optimal solution [5]. A GA was therefore used to select 21 classifiers from the 10 base classifiers, using the accuracy as the fitness function. The fitness function is given by:

Fitness = -(T_Acc - Acc)^2    (6)

where T_Acc is the targeted accuracy and Acc is the obtained accuracy. The GA continues to search until the error between the targeted accuracy and the obtained accuracy is minimal. Firstly, it was necessary to determine the accuracies that could be attained, in order to minimize the computational cost. Thereafter, the attained accuracies were used in the second run as the target accuracy. The neural network committee used consists of 21 classifiers formed from a combination of the 10 unique base classifiers; hence, each ensemble will have a repetition of certain classifiers. Once the ensemble of 21 classifiers produces the targeted accuracy, the corresponding structural diversity is obtained using both the Simpson and Shannon diversity measures given in (4) and (5). The algorithm implemented is shown in Figure 4.

Figure 4: The algorithm used for evaluating diversity
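A simplified sketch of this selection loop (the simulated base-classifier outputs, the mutation-only search standing in for the full GA, and all names are illustrative assumptions, not the paper's trained networks):

```python
import random
import numpy as np

rng = np.random.default_rng(2)
n_samples, n_base, ens_size = 200, 10, 21

labels = rng.integers(0, 2, n_samples)
# Simulated base-classifier outputs: each agrees with the labels ~65% of the time.
base_preds = np.array([np.where(rng.random(n_samples) < 0.65, labels, 1 - labels)
                       for _ in range(n_base)])

def majority_vote(member_idx):
    votes = base_preds[member_idx].sum(axis=0)        # count of 1-votes per sample
    return (votes > len(member_idx) / 2).astype(int)

def accuracy(member_idx):
    return (majority_vote(member_idx) == labels).mean()

def fitness(member_idx, target_acc):
    return -(target_acc - accuracy(member_idx)) ** 2  # eq. (6): best at 0

# Mutation-only search standing in for the paper's GA; an ensemble is a
# multiset of 21 indices into the 10 base classifiers, so repeats are allowed.
random.seed(2)
ensemble = [random.randrange(n_base) for _ in range(ens_size)]
target = 1.0   # first run; the second run would target an attained accuracy
for _ in range(300):
    cand = ensemble[:]
    cand[random.randrange(ens_size)] = random.randrange(n_base)
    if fitness(cand, target) >= fitness(ensemble, target):
        ensemble = cand
```

Once the search stops, the counts of each base classifier in `ensemble` are exactly the n_i values fed into the Shannon and Simpson indices of (4) and (5).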
7. RESULTS ANALYSIS

7.1. Structural Diversity Analysis

In this study, diversity was induced by varying the parameters of the classifiers that form an ensemble [5, 16]. The investigation was done on an ensemble of 21 classifiers. Figure 5 shows the results obtained using the Shannon diversity measure, and Figure 6 shows the results obtained using the Simpson diversity measure.

Figure 5: The evaluation of the Shannon index against classification accuracy

Figure 6: The evaluation of the Simpson index against classification accuracy

The figures indicate that an increase in structural diversity results in an increase in accuracy, which is in agreement with [16]. The experiment was repeated several times, observing the relationship between diversity and accuracy using both the Simpson and Shannon diversity measures; the results shown above are therefore the average of ten different experiments. The results show that the two measures are concurrent. With the Shannon diversity measure, the GA was able to attain a wide range of diversity, whereas with the Simpson measure the range is limited to roughly 0.8 to 0.9. This is because the Shannon diversity index depends on the number of base classifiers, whereas the Simpson index depends on how evenly distributed the base classifiers are [15].
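This dependence can be checked numerically (hypothetical ensemble compositions; the indices from (4) and (5) are redefined here so the snippet stands alone):

```python
import math

def shannon(counts):
    # Normalized Shannon index, eq. (4).
    N = sum(counts)
    return -sum((n / N) * math.log(n / N) for n in counts) / math.log(N)

def simpson(counts):
    # Gini-Simpson index, eq. (5).
    N = sum(counts)
    return 1.0 - sum(n * (n - 1) for n in counts) / (N * (N - 1))

# Three hypothetical compositions of a 21-network ensemble:
two_even   = [11, 10]          # few base structures, even spread
ten_skewed = [12] + [1] * 9    # many base structures, uneven spread
ten_even   = [3] + [2] * 9     # many base structures, even spread

for c in (two_even, ten_skewed, ten_even):
    print(round(shannon(c), 3), round(simpson(c), 3))
```

Going from two to ten even base structures moves the Shannon index from about 0.23 to about 0.75, while the Simpson index is more strongly driven by the skew between the uneven and even ten-structure cases (about 0.69 versus 0.94).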
Shannon has shown that the more uncertain one is of the outcome, the more diverse an ensemble is. The results clearly show that structural variation of the parameters of the neural network (classifier) does have a relationship with classification accuracy: as the structural diversity increased, so did the accuracy.

7.2. Discussion and Recommendations

It was observed that the individual classifiers within the ensemble were highly correlated in their outcomes. This affected the results, because very low and very high accuracies could not be attained. It is therefore recommended that a strategy of adding classifiers to an ensemble such that only uncorrelated classifiers are accepted should be adopted. The experiment focused on training the classifiers using all the features of the data. It is, however, recommended that different networks could be fed different features of the data. This might ensure that the outcomes of the classifiers are not highly correlated; hence, a wider range of accuracy and diversity indices could be attained. During the training stage of the machine, the weights are normally randomly initialised, and it has been found that different initial weights induce diversity within the ensemble [1]. The Shannon and Simpson diversity measures focus on how structurally different the classifiers in an ensemble are; they do not consider diversity induced during the initialisation of weights. Therefore, it is recommended that for future work a better measure of structural diversity that incorporates the effect of weight initialisation should be developed.
8. CONCLUSION

The paper presented the relationship between structural diversity and generalization accuracy, using the Shannon and Simpson diversity measures to quantify diversity. The investigation is necessary in order to build learning machines, or committees of networks, that can generalize better. The results have clearly shown that as the structural diversity index based on the measures used increases, the ensemble accuracy increases. Hence, classifiers can be made structurally different in order to gain good classification accuracy; this brought an increase of 3% to 6% in the classification accuracy. The method used to compute the results was found to be computationally expensive due to the use of the GA. There are, however, limitations brought about by the individual classifiers producing similar outcomes even though they are structurally different. The use of structural diversity measurement in building good ensembles of classifiers is still to be explored.

ACKNOWLEDGEMENT

The author would like to thank Fulufhelo Netshiongolwe for his cooperation and contribution during the project as a project partner. Professor Tshilidzi Marwala is thanked for supervising the project, and additional thanks are extended to the postgraduate student Lesedi Masisi for his contribution during the implementation of the project.

REFERENCES

[1] G. Brown, J. Wyatt, R. Harris and X. Yao, "Diversity Creation Methods: A Survey and Categorisation", Journal of Information Fusion, Vol. 6, No. 1, pp 5-20, 2005.
[2] J. Sylvester, N.V. Chawla, "Evolutionary Ensemble Creation and Thinning", Proc. of International Joint Conference on Neural Networks, pp 5148-5155, 2006.
[3] N.V. Chawla, J. Sylvester, "Exploiting Diversity in Ensembles: Improving the Performance on Unbalanced Datasets", Multiple Classifier Systems, Lecture Notes in Computer Science, Springer, Vol. 4472, pp 397-406, 2007.
[4] Y. Kim, W.N. Street, F. Menczer, "Optimal ensemble construction via meta-evolutionary ensembles", Expert Systems with Applications, Vol. 30, No. 4, pp 705-714, 2006.
[5] L. Masisi, F.V. Nelwamondo, T. Marwala, "The effect of structural diversity of an ensemble of classifiers on classification accuracy", IASTED International Conference on Modelling and Simulation (Africa-MS), pp 1-6, 2008.
[6] L.I. Kuncheva, C.J. Whitaker, "Measures of Diversity in Classifier Ensembles and Their Relationship with the Ensemble Accuracy", Machine Learning, Vol. 51, No. 2, pp 181-207, 2003.
[7] L.I. Kuncheva, C.J. Whitaker, "Ten measures of diversity in classifier ensembles: limits for two classifiers", Proc. of IEE Workshop on Intelligent Sensor Processing, pp 1-10, 2001.
[8] R. Polikar, "Ensemble based systems in decision making", IEEE Circuits and Systems Magazine, pp 21-45.
[9] C.A. Shipp, L.I. Kuncheva, "Relationships between combination methods and measures of diversity in combining classifiers", Information Fusion, Vol. 3, No. 2, pp 135-148, 2002.
[10] A. Lipnickas, "Classifiers fusion with data dependent aggregation schemes", International Conference on Information Networks, Systems and Technologies, pp 147-153, 2001.
[11] C.M. Bishop, Pattern Recognition and Machine Learning, Springer Science and Business Media, 2006.
[12] T. Marwala, "Bayesian Training of Neural Networks Using Genetic Programming", Pattern Recognition Letters, Vol. 28, pp 1452-1458, 2007.
[13] I.T. Nabney, Netlab: Algorithms for Pattern Recognition, Springer, 2001.
[14] B.B. Leke, T. Marwala, T. Tettey, "Autoencoder networks for HIV classification", Current Science, Vol. 91, No. 11, 2006.
[15] D.G. McDonald, J. Dimmick, "The Conceptualization and Measurement of Diversity", Communication Research, SAGE Publications, Vol. 30, No. 1, pp 60-79, 2003.
[16] L. Masisi, F.V. Nelwamondo, T. Marwala, "The use of entropy measures to measure the structural diversity of an ensemble of classifiers via the use of genetic algorithm", School of Electrical and Information Engineering, University of the Witwatersrand, ICCC, 2008, accepted.
[17] J. Kittler, M. Hatef, R. Duin, J. Matas, "On Combining Classifiers", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 20, No. 3, pp 226-239, 1998.
[18] D. Opitz, R. Maclin, "Popular Ensemble Methods: An Empirical Study", Journal of Artificial Intelligence Research, Vol. 11, pp 169-198, 1999.
[19] A. Lipnickas, "Classifiers fusion with data dependent aggregation schemes", International Conference on Information Networks, Systems and Technologies (ICINASTe), pp 147-153, 2001.
[20] W.D. Penny, S.J. Roberts, "Bayesian Neural Networks for Classification: How Useful is the Evidence Framework?", Neural Networks, Vol. 12, No. 1, pp 877-892, 1999.
