Relationship between Diversity and Performance of Multiple Classifiers for Decision Support


Authors: R. Musehane, F. Netshiongolwe, F.V. Nelwamondo, L. Masisi and T. Marwala

R. Musehane, F. Netshiongolwe, F.V. Nelwamondo*, L. Masisi and T. Marwala
School of Electrical & Information Engineering, University of the Witwatersrand, Private Bag 3, 2050, Johannesburg, South Africa
* Graduate School of Arts and Sciences, Harvard University, GSAS Mail Center Child 412, 26 Everett Street, Cambridge, Massachusetts, 02138 USA

Abstract: The paper presents an investigation of the relationship between diversity and the performance of multiple classifiers, measured by classification accuracy. The study is critical in order to build classifiers that are strong and can generalize better. The parameters of the neural networks within the committee were varied to induce diversity; hence structural diversity is the focus of this study. The hidden nodes and the activation function are the parameters that were varied. Diversity measures adopted from ecology, such as the Shannon and Simpson indices, were used to quantify diversity. A genetic algorithm is used to find the optimal ensemble, using the accuracy as the cost function. The results show that there is a relationship between structural diversity and accuracy: the classification accuracy of an ensemble increases as the diversity increases, with an observed gain of 3%-6% in classification accuracy.

Key words: Classification, Diversity Measures, Genetic Algorithm, Multiple Classifiers, Structural Diversity.

1. INTRODUCTION

Computational intelligence techniques have been used in many classification problems. The literature emphasises that a group of classifiers is better than one classifier [1-5].
This is because the decision made by a committee of classifiers is better than the decision made by one classifier. In this paper the committee of classifiers will be referred to as an ensemble. The most popular way to gain confidence in the generalisation ability of an ensemble is by introducing diversity within the ensemble [1, 2, 5]. This has led to the development of measures of diversity and of various aggregation schemes for combining classifiers. However, diversity is not clearly defined [6, 7]. Thus, a proper measure of diversity that relates diversity to accuracy is to be adopted. Current methods commonly use the outcomes of the individual classifiers of an ensemble to measure diversity. Hence an ensemble is considered diverse if classifiers within the ensemble produce different outcomes, as opposed to having the same outcomes [1, 6, 7]. In this paper, as opposed to looking at the outcomes of the individual classifiers, ensemble diversity is viewed as the structural variation within the classifiers that form an ensemble [1, 5]. Thus, diversity will be induced by changing structural parameters of a neural network [5]. The paper investigates the relationship between structural diversity within an ensemble and the prediction accuracy of the ensemble. It has been intuitively accepted that the classifiers to be combined should be diverse [8]. This is because it has been found meaningless to combine identical classifiers, since no improvement can be achieved by combining them [8, 9]. Hence, measuring structural diversity and relating it to accuracy is crucial in order to build better learning machines. However, it is first necessary to find the optimal size of an ensemble that gives better generalization.
Therefore, a study on the size of the ensemble was done in order to find the optimal size that can be used for the investigation. The methods for measuring structural diversity are to be devised and implemented. Moreover, the outcome diversity of structurally different classifiers is critical to measure. This is because it is essential to show how correlated the outcomes of the structurally different classifiers are. Hence, the limitations of accuracy in the structural diversity are to be justified. Different methods for creating diversity, such as bagging and boosting, have been explored [1, 3]. However, aggregation methods are to be used to combine the ensemble predictions. Methods of voting and averaging have been found to be popular [9, 10] and hence are used in this study. The paper first discusses the background in section 2. Analysis of the data used for this study is presented in section 3. The accuracy measure and structural measures of diversity used are discussed in sections 4 and 5. The methodologies used in investigating the effect of diversity on generalization are presented in section 6. The results and future work are then discussed in section 7.

2. BACKGROUND

2.1. Neural Networks

Neural Networks (NN) are computational models that have the ability to learn and model linear and non-linear systems [11]. There are many types of neural networks, but the most common neural network architecture is the multilayer perceptron (MLP) [11]. The neural network architecture used in this paper is an MLP network, as shown in Figure 1. The MLP network has an input layer, a hidden layer and an output layer. An MLP network has parameters such as the learning rate, the number of hidden nodes and the activation function.
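As a concrete sketch of such a network (a minimal NumPy illustration, not the authors' implementation; the layer sizes and weight values below are arbitrary assumptions), a single forward pass through a one-hidden-layer MLP can be written as:

```python
import numpy as np

def mlp_forward(x, w1, b1, w2, b2, hidden_activation=np.tanh):
    """One forward pass of a single-hidden-layer MLP.

    x  : (N,) input vector          w1 : (M, N) input-to-hidden weights
    b1 : (M,) hidden biases         w2 : (K, M) hidden-to-output weights
    b2 : (K,) output biases
    """
    h = hidden_activation(w1 @ x + b1)          # hidden layer
    y = 1.0 / (1.0 + np.exp(-(w2 @ h + b2)))    # logistic output activation
    return y

# Hypothetical tiny network: 6 demographic inputs, 4 hidden nodes, 1 output.
rng = np.random.default_rng(0)
x = rng.random(6)
y = mlp_forward(x, rng.standard_normal((4, 6)), np.zeros(4),
                rng.standard_normal((1, 4)), np.zeros(1))
hiv_class = int(y[0] >= 0.5)   # 0.5 threshold, as used in this paper
```

Varying the number of hidden nodes (the shape of `w1`) and the `hidden_activation` argument is exactly the kind of structural variation this study uses to induce diversity.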
These parameters can be varied to induce structural diversity [5]. The general equation for the output of an MLP neural network is shown below:

y_k = f_outer( Σ_{j=1}^{M} w^{(2)}_{kj} f_inner( Σ_{i=1}^{N} w^{(1)}_{ji} x_i + w^{(1)}_{j0} ) + w^{(2)}_{k0} )    (1)

where: y_k is the output from the neural network, f_outer is the output activation function, which can be linear, softmax or logistic, f_inner is the hidden-layer tangential activation function, M is the number of hidden units, N is the number of input units, w^{(1)}_{ji} are the first-layer weights from input i to hidden unit j, w^{(2)}_{kj} are the second-layer weights from hidden unit j to output k, and w^{(1)}_{j0} and w^{(2)}_{k0} are the biases for hidden unit j and output k.

Figure 1: The MLP neural network architecture

The inputs into the neural network are the demographic data attributes from the HIV antenatal survey and the output is the HIV status of the individual, where 0 represents negative and 1 represents positive. The weights of the NN are updated using a back-propagation algorithm during the training stage [11]. A threshold of 0.5 is used in order to obtain a zero-or-one solution from the neural network. This means that any value less than 0.5 is converted to 0 and any value more than 0.5 is converted to 1.

2.2. Genetic Algorithm

Genetic algorithms (GA) are computational models that are based on the evolution of biological populations [2]. Potential solutions are encoded as the chromosomes of individuals. These individuals are initially generated randomly. The individuals are evaluated through a defined fitness function. Each succeeding generation is populated by the fittest solutions (members) of the previous generation and their offspring. The offspring are created through crossover and mutation.
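The generate-evaluate-select-recombine loop just described can be sketched as follows (a minimal illustration with an assumed bit-string encoding and a toy "one-max" fitness function, not the GA configuration used later in this paper):

```python
import random

random.seed(1)

def fitness(ind):
    # Toy fitness: count of 1-bits ("one-max"); a real GA plugs in a
    # problem-specific function here.
    return sum(ind)

def evolve(pop_size=20, n_genes=16, generations=30, p_mut=0.05):
    # Random initial population of bit-string individuals.
    pop = [[random.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]                 # keep the fittest half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, n_genes)         # one-point crossover
            child = a[:cut] + b[cut:]
            child = [g ^ (random.random() < p_mut) for g in child]  # mutation
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

best = evolve()
```

Because the fittest half survives each generation, the best fitness found is non-decreasing, so the initially random solutions improve over time.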
The crossover process combines the genetic information of the two fittest previous solutions to create new offspring. Mutation alters the genes of an individual to introduce more diversity into the population. In this way, the initially generated solution can be improved over time [2, 12].

3. DATA ANALYSIS

3.1. Data Collection

The dataset used for the study is from antenatal clinics in South Africa and was collected by the Department of Health in 2001. The features in the data include age, gravidity, parity, education, etc. The demographic data used in the study are shown in Table 1 below. The province was provided as a string, so it was converted to an integer from 1 to 9.

Table 1: The features from the survey

| # | Variable | Type | Range |
|---|----------|------|-------|
| 1 | Age | integer | 13-50 |
| 2 | Education | integer | 0-13 |
| 3 | Parity | integer | 0-9 |
| 4 | Gravidity | integer | 1-12 |
| 5 | Province | integer | 1-9 |
| 6 | Age of father | integer | 14-60 |
| 7 | HIV status | binary | 0-1 |

The age is that of the mother visiting the clinic. Education represents the level of education the mother has and ranges from 1-13, where 1-12 corresponds to grades 1 to 12 and 13 represents tertiary education. Parity is the number of times the mother has given birth, whilst gravidity is the number of times the mother has been pregnant. Both these quantities are important, as they show the reproductive activity as well as the reproductive health state of the women. The age of the father responsible for the current pregnancy is also given, and the province entry corresponds to the geographic area where the mother comes from. The last feature is the HIV status of the mother, where 0 represents a negative status whilst 1 represents a positive status.
3.2. Data Pre-Processing

Data pre-processing is necessary in order to eliminate impossible situations, such as parity being greater than gravidity, because it is not possible for the mother to give birth without falling pregnant. The pre-processing of the data resulted in a reduction of the data set. To use the dataset for training, it needs to be normalized, because data variables with larger variances would otherwise influence the result more than others. Normalization ensures that all variables can contribute to the final network weights of the prediction model [13]. Therefore, all the data are normalized between 0 and 1 using (2):

x_norm = (x_i - x_min) / (x_max - x_min)    (2)

where x_min and x_max are the minimum and maximum values of the features of the data samples respectively.

The data were divided into three sets: training, validation and testing data. This was done to avoid over-fitting of the network. The neural networks are trained with 60% of the data, validated with 20% and tested with 20%.

4. MEASUREMENT OF ACCURACY

Regression problems mostly focus on using the mean square error between the actual outcome and the predicted outcome as a measure of how well neural networks are performing. In classification problems, the accuracy can be measured using the confusion matrix [14]. Analysis of the dataset being used showed that the data are biased towards negative HIV status outcomes. Hence, the data were divided such that there is an equal number of HIV positive and negative cases. The accuracy measure used in this study is given by (3).
% Accuracy = (TP + TN) / (TP + TN + FP + FN) × 100%    (3)

where: TP is the number of true positives (a 1 classified as a 1), TN is the number of true negatives (a 0 classified as a 0), FN is the number of false negatives (a 1 classified as a 0), and FP is the number of false positives (a 0 classified as a 1).

5. MEASUREMENT OF DIVERSITY

5.1. Shannon-Wiener Diversity Measure

Shannon entropy is a diversity measure that was adopted from ecology and information theory to understand ensemble diversity [15]. This measure is implemented here to measure structural diversity. The Shannon-Wiener index is commonly used in information theory to quantify the uncertainty of a state [15, 16]. If the states are diverse, one becomes uncertain of the outcome. It is also used in ecology to measure the diversity of species. Instead of biological species, the species here are the individual base classifiers. The Shannon diversity measure is given by (4):

D = -(1 / ln N) Σ_{i=1}^{M} (n_i / N) ln(n_i / N)    (4)

where: n_i is the number of neural networks that have the same structure, N is the total number of neural networks in an ensemble, M is the total number of different neural networks (species), and D is the diversity index. The diversity ranges from 0 to 1, where 0 indicates low diversity and 1 indicates the highest diversity.

5.2. Simpson Diversity Measure

The other measure that was implemented is the Simpson diversity measure. This measure is also adopted from ecology to quantify diversity. It is quantified by (5):

D = 1 - Σ_{i=1}^{M} n_i(n_i - 1) / (N(N - 1))    (5)

where n_i, N and M are defined as in (4). The diversity index is given by D, and the diversity increases as the index increases.
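Both indices can be sketched directly from (4) and (5) (the ensemble composition below is a hypothetical example, not data from the paper):

```python
import math
from collections import Counter

def shannon_diversity(counts):
    """Normalized Shannon index, eq. (4): 0 (all identical) to 1 (all distinct)."""
    N = sum(counts)
    h = -sum((n / N) * math.log(n / N) for n in counts)
    return h / math.log(N)

def simpson_diversity(counts):
    """Gini-Simpson index, eq. (5): 1 minus the chance two draws share a structure."""
    N = sum(counts)
    return 1.0 - sum(n * (n - 1) for n in counts) / (N * (N - 1))

# Hypothetical ensemble of 21 networks built from 10 base structures:
structures = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1]
counts = list(Counter(structures).values())   # [3, 2, 2, ..., 2]
D_shannon = shannon_diversity(counts)
D_simpson = simpson_diversity(counts)
```

With all 21 networks structurally identical both indices are 0; with all 21 distinct both reach 1, the highest value of the Simpson diversity index.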
The Simpson index also ranges from 0 to 1, where 0 means there is no diversity and 1 indicates the highest diversity.

6. METHODOLOGY

6.1. Creation of Base Classifiers

Since the focus of the study is structural diversity, the activation function, learning rate and number of hidden nodes were varied in order to induce diversity. However, varying all the parameters was found to be ineffective, because the classifiers then tend to generalize in the same way. Therefore, only the hidden nodes and the activation function were varied for this investigation. The classifiers are trained individually using the back-propagation method, where the error is propagated back so as to adjust the weights accordingly. The data used for training, validation and testing are the HIV data. All the features of the input are fed to all the networks. Classifiers with a training accuracy of at least 60% were accepted; training accuracies between 60% and 63% were achieved. The hidden nodes were varied from 7 to 57, and the activation function was randomly varied between the logistic and the linear function. The classifiers were trained using the quasi-Newton algorithm for 100 cycles at the same learning rate of 0.01.

6.2. Committee of Classifiers

A committee of classifiers improves efficiency and classification accuracy [17, 18]. This ensures that the results are based on the consensus decision of the base classifiers. The base classifiers operate concurrently during classification and their outputs are integrated to obtain the final output [18]. The model for the committee of classifiers is shown in Figure 2.
Figure 2: The classifier ensemble of neural networks

There are many aggregation methods that can be used to combine the outcomes of classifiers; these were explored in the preliminary report. The ensemble outcomes were all aggregated using simple majority voting. This was chosen because it is popular and easy to implement [9]. The outcomes of each individual classifier in an ensemble are first converted to 0 or 1 using 0.5 as a threshold. The majority voting method then chooses the prediction that is made by most of the different classifiers [19]. The other method that was implemented was averaging, where the outcomes from all the classifiers are taken and averaged.

6.3. Evaluation of Optimal Ensemble Size

It is important to use the optimal size of an ensemble that results in better generalisation of the data [20]. The ensemble size is determined by the number of classifiers that belong to the ensemble. The created classifiers were used to carry out this experiment. The ensemble size was incremented by one from 1 to 50. The structure of the networks was made to be different by varying the hidden nodes as the ensemble size increases. Hence, the size of the network itself is increased as the number of classifiers in the ensemble increases [4]. Figure 3 below shows the results obtained. It was observed that the relationship between the size and accuracy of the ensemble depends on the accuracy of the individual classifiers that belong to the ensemble. Increasing the size of the neural network by increasing the hidden nodes tends to improve the classification accuracy as the number of classifiers in an ensemble increases.
An increase in ensemble size initially results in an increase in prediction accuracy; after a size of about 19 classifiers is reached, the accuracy tends to remain constant. Nevertheless, a size of 21 was found to be optimal, since it produced the best accuracy. The results obtained are concurrent with the literature, where the commonly cited optimal ensemble size is 25 [18, 20]. Therefore, an ensemble size of 21 is used for evaluating the relationship between the diversity and performance of classifiers on HIV classification.

Figure 3: The ensemble size and classification accuracy (averaging and voting)

6.4. Evaluation of Outcome Diversity

Currently, measuring outcome diversity is more popular than measuring structural diversity [6]. It was nevertheless necessary to measure the outcome diversity for this study, because it is essential to measure the degree of agreement and disagreement in the outcomes of the ensemble. This experiment was useful for analysing the limitations of the structural diversity results. The Q statistic was used to measure outcome diversity. The Q statistic evaluates the degree of similarity and dissimilarity in the outcomes of the classifiers within the ensemble [8]. The index ranges from -1 to 1, where 0 indicates the highest diversity and 1 indicates the lowest diversity [6]. For all 21 classifiers in an ensemble, each classifier is paired with every other classifier within the ensemble. The results from this study show that the outcomes of the structurally diverse classifiers within the ensemble are highly correlated.
This is indicated by a Q value close to 1; the obtained Q values range from 0.88 to 0.91.

6.5. Evaluation of Structural Diversity

The created classifiers were used to investigate the relationship between diversity and accuracy. Ten base classifiers, or species, were selected from the created classifiers; these are all structurally different based only on their hidden nodes and activation functions. The networks had different activation functions, and their hidden nodes were varied from 10 to 55 in steps of 5. The GA has the capability to search large spaces for a globally optimal solution [5]. A GA was therefore used to select 21 classifiers from the 10 base classifiers, using the accuracy as the fitness function. The fitness function is given by:

Fitness = -(T_Acc - Acc)^2    (6)

where T_Acc is the targeted accuracy and Acc is the obtained accuracy. The GA continues to search until the error between the targeted accuracy and the obtained accuracy is minimal. Firstly, it was necessary to determine the accuracies that could be attained, in order to minimize the computational cost. Thereafter, the attained accuracies were used in the second run as the target accuracy. The neural network committee used consists of 21 classifiers formed from a combination of the 10 unique base classifiers; hence, each ensemble will have a repetition of certain classifiers. Once the ensemble of 21 classifiers produces the targeted accuracy, the corresponding structural diversity is obtained using both the Simpson and Shannon diversity measures given in (4) and (5). The algorithm implemented is shown in Figure 4.

Figure 4: The algorithm used for evaluating diversity
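A simplified sketch of this selection loop (the simulated base-classifier outputs, the mutation-only search standing in for the full GA, and all names are illustrative assumptions, not the paper's trained networks):

```python
import random
import numpy as np

rng = np.random.default_rng(2)
n_samples, n_base, ens_size = 200, 10, 21

labels = rng.integers(0, 2, n_samples)
# Simulated base-classifier outputs: each agrees with the labels ~65% of the time.
base_preds = np.array([np.where(rng.random(n_samples) < 0.65, labels, 1 - labels)
                       for _ in range(n_base)])

def majority_vote(member_idx):
    votes = base_preds[member_idx].sum(axis=0)        # count of 1-votes per sample
    return (votes > len(member_idx) / 2).astype(int)

def accuracy(member_idx):
    return (majority_vote(member_idx) == labels).mean()

def fitness(member_idx, target_acc):
    return -(target_acc - accuracy(member_idx)) ** 2  # eq. (6): best at 0

# Mutation-only search standing in for the paper's GA; an ensemble is a
# multiset of 21 indices into the 10 base classifiers, so repeats are allowed.
random.seed(2)
ensemble = [random.randrange(n_base) for _ in range(ens_size)]
target = 1.0   # first run; the second run would target an attained accuracy
for _ in range(300):
    cand = ensemble[:]
    cand[random.randrange(ens_size)] = random.randrange(n_base)
    if fitness(cand, target) >= fitness(ensemble, target):
        ensemble = cand
```

Once the search stops, the counts of each base classifier in `ensemble` are exactly the n_i values fed into the Shannon and Simpson indices of (4) and (5).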
7. RESULTS ANALYSIS

7.1. Structural Diversity Analysis

In this study, diversity was induced by varying the parameters of the classifiers that form an ensemble [5, 16]. The investigation was done on an ensemble of 21 classifiers. Figure 5 shows the results obtained using the Shannon diversity measure, and Figure 6 shows the results obtained using the Simpson diversity measure.

Figure 5: The evaluation of the Shannon index against classification accuracy

Figure 6: The evaluation of the Simpson index against classification accuracy

The figures indicate that an increase in structural diversity results in an increase in accuracy, which is in agreement with [16]. The experiment was repeated several times, observing the relationship between diversity and accuracy using both the Simpson and Shannon diversity measures; the results shown above are therefore the average of ten different experiments. The results show that the two measures are concurrent. With the Shannon diversity measure, the GA was able to attain a wide range of diversity, whereas with the Simpson measure the range is limited to roughly 0.8 to 0.9. This is because the Shannon diversity index depends on the number of base classifiers, whereas the Simpson index depends on how evenly distributed the base classifiers are [15].
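This dependence can be checked numerically (hypothetical ensemble compositions; the indices from (4) and (5) are redefined here so the snippet stands alone):

```python
import math

def shannon(counts):
    # Normalized Shannon index, eq. (4).
    N = sum(counts)
    return -sum((n / N) * math.log(n / N) for n in counts) / math.log(N)

def simpson(counts):
    # Gini-Simpson index, eq. (5).
    N = sum(counts)
    return 1.0 - sum(n * (n - 1) for n in counts) / (N * (N - 1))

# Three hypothetical compositions of a 21-network ensemble:
two_even   = [11, 10]          # few base structures, even spread
ten_skewed = [12] + [1] * 9    # many base structures, uneven spread
ten_even   = [3] + [2] * 9     # many base structures, even spread

for c in (two_even, ten_skewed, ten_even):
    print(round(shannon(c), 3), round(simpson(c), 3))
```

Going from two to ten even base structures moves the Shannon index from about 0.23 to about 0.75, while the Simpson index is more strongly driven by the skew between the uneven and even ten-structure cases (about 0.69 versus 0.94).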
Shannon has shown that the more uncertain one is of the outcome, the more diverse an ensemble is. The results clearly show that structural variation of the parameters of the neural network (classifier) does have a relationship with classification accuracy: as the structural diversity increased, so did the accuracy.

7.2. Discussion and Recommendations

It was observed that the individual classifiers within the ensemble were highly correlated in their outcomes. This affected the results, because very low and very high accuracies could not be attained. It is therefore recommended that a strategy of adding classifiers to an ensemble such that only uncorrelated classifiers are accepted should be adopted. The experiment focused on training the classifiers using all the features of the data. It is, however, recommended that different networks could be fed different features of the data. This might ensure that the outcomes of the classifiers are not highly correlated; hence, a wider range of accuracy and diversity indices could be attained. During the training stage of the machine, the weights are normally randomly initialised, and it has been found that different initial weights induce diversity within the ensemble [1]. The Shannon and Simpson diversity measures focus on how structurally different the classifiers in an ensemble are; they do not consider diversity induced during the initialisation of weights. Therefore, it is recommended that for future work a better measure of structural diversity that incorporates the effect of weight initialisation should be developed.
8. CONCLUSION

The paper presented the relationship between structural diversity and generalization accuracy, using the Shannon and Simpson diversity measures to quantify diversity. The investigation is necessary in order to build learning machines, or committees of networks, that can generalize better. The results have clearly shown that as the structural diversity index based on the measures used increases, the ensemble accuracy increases. Hence, classifiers can be made structurally different in order to gain good classification accuracy; this brought an increase of 3% to 6% in the classification accuracy. The method used to compute the results was found to be computationally expensive due to the use of the GA. There are, however, limitations brought about by the individual classifiers producing similar outcomes even though they are structurally different. The use of structural diversity measurement in building good ensembles of classifiers is still to be explored.

ACKNOWLEDGEMENT

The author would like to thank Fulufhelo Netshiongolwe for his cooperation and contribution during the project as a project partner. Professor Tshilidzi Marwala is thanked for supervising the project, and additional thanks are extended to the postgraduate student Lesedi Masisi for his contribution during the implementation of the project.

REFERENCES

[1] G. Brown, J. Wyatt, R. Harris and X. Yao, "Diversity Creation Methods: A Survey and Categorisation", Journal of Information Fusion, Vol. 6, No. 1, pp 5-20, 2005.
[2] J. Sylvester, N.V. Chawla, "Evolutionary Ensemble Creation and Thinning", Proc. of International Joint Conference on Neural Networks, pp 5148-5155, 2006.
[3] N.V. Chawla, J. Sylvester, "Exploiting Diversity in Ensembles: Improving the Performance on Unbalanced Datasets", Multiple Classifier Systems, Lecture Notes in Computer Science, Springer, Vol. 4472, pp 397-406, 2007.
[4] Y. Kim, W.N. Street, F. Menczer, "Optimal ensemble construction via meta-evolutionary ensembles", Expert Systems with Applications, Vol. 30, No. 4, pp 705-714, 2006.
[5] L. Masisi, F.V. Nelwamondo, T. Marwala, "The effect of structural diversity of an ensemble of classifiers on classification accuracy", IASTED International Conference on Modelling and Simulation (Africa-MS), pp 1-6, 2008.
[6] L.I. Kuncheva, C.J. Whitaker, "Measures of Diversity in Classifier Ensembles and Their Relationship with the Ensemble Accuracy", Machine Learning, Vol. 51, No. 2, pp 181-207, 2003.
[7] L.I. Kuncheva, C.J. Whitaker, "Ten measures of diversity in classifier ensembles: limits for two classifiers", Proc. of IEE Workshop on Intelligent Sensor Processing, pp 1-10, 2001.
[8] R. Polikar, "Ensemble based systems in decision making", IEEE Circuits and Systems Magazine, pp 21-45.
[9] C.A. Shipp, L.I. Kuncheva, "Relationships between combination methods and measures of diversity in combining classifiers", Information Fusion, Vol. 3, No. 2, pp 135-148, 2002.
[10] A. Lipnickas, "Classifiers fusion with data dependent aggregation schemes", International Conference on Information Networks, Systems and Technologies, pp 147-153, 2001.
[11] C.M. Bishop, Pattern Recognition and Machine Learning, Springer Science and Business Media, 2006.
[12] T. Marwala, "Bayesian Training of Neural Networks Using Genetic Programming", Pattern Recognition Letters, Vol. 28, pp 1452-1458, 2007.
[13] I.T. Nabney, Netlab: Algorithms for Pattern Recognition, Springer, 2001.
[14] B.B. Leke, T. Marwala, T. Tettey, "Autoencoder networks for HIV classification", Current Science, Vol. 91, No. 11, 2006.
[15] D.G. McDonald, J. Dimmick, "The Conceptualization and Measurement of Diversity", Communication Research, SAGE Publications, Vol. 30, No. 1, pp 60-79, 2003.
[16] L. Masisi, F.V. Nelwamondo, T. Marwala, "The use of entropy measures to measure the structural diversity of an ensemble of classifiers via the use of genetic algorithm", School of Electrical and Information Engineering, University of the Witwatersrand, ICCC, 2008, accepted.
[17] J. Kittler, M. Hatef, R. Duin, J. Matas, "On Combining Classifiers", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 20, No. 3, pp 226-239, 1998.
[18] D. Opitz, R. Maclin, "Popular Ensemble Methods: An Empirical Study", Journal of Artificial Intelligence Research, Vol. 11, pp 169-198, 1999.
[19] A. Lipnickas, "Classifiers fusion with data dependent aggregation schemes", International Conference on Information Networks, Systems and Technologies (ICINASTe), pp 147-153, 2001.
[20] W.D. Penny, S.J. Roberts, "Bayesian Neural Networks for Classification: How Useful is the Evidence Framework?", Neural Networks, Vol. 12, No. 1, pp 877-892, 1999.
