Comparison of Selection Methods in On-line Distributed Evolutionary Robotics
Authors: Iñaki Fernández Pérez (INRIA Nancy - Grand Est / LORIA)
Iñaki Fernández Pérez, Université de Lorraine, LORIA (inaki.fernandez@loria.fr)
Amine Boumaza, Université de Lorraine, LORIA (amine.boumaza@loria.fr)
François Charpillet, INRIA Nancy Grand-Est, LORIA (francois.charpillet@loria.fr)
Campus scientifique BP 239, Vandoeuvre-lès-Nancy Cedex, F-54506

November 22, 2018

Pérez, I. F., Boumaza, A. and Charpillet, F. Comparison of Selection Methods in On-line Distributed Evolutionary Robotics. In Proceedings of the Int. Conf. on the Synthesis and Simulation of Living Systems (ALife'14), pages 282-289, MIT Press, 2014.

Abstract

In this paper, we study the impact of selection methods in the context of on-line on-board distributed evolutionary algorithms. We propose a variant of the mEDEA algorithm in which we add a selection operator, and we apply it in a task-driven scenario. We evaluate four selection methods that induce different intensities of selection pressure in a multi-robot navigation with obstacle avoidance task and a collective foraging task. Experiments show that a small intensity of selection pressure is sufficient to rapidly obtain good performances on the tasks at hand. We introduce different measures to compare the selection methods, and show that the higher the selection pressure, the better the performances obtained, especially for the more challenging food foraging task.

1 Introduction

Evolutionary robotics (ER) ([9]) aims to design robotic agents' behaviors using evolutionary algorithms (EAs) ([7]). In this context, EAs are traditionally seen as a tool to optimize agents' controllers w.r.t. an explicit objective function (fitness).
This process is carried out in an off-line fashion; once the behavior is learned and the controller optimized, the agents are deployed and their controllers' parameters remain fixed. On the other hand, on-line evolution ([15]) takes a different approach, in which behavior learning is performed during the actual execution of a task. In these algorithms, learning or optimization is a continuous process, i.e. robotic agents are constantly exploring new behaviors and adapting their controllers to new conditions in their environment. Usually, this is referred to as adaptation.

These two visions of ER can be related to on-line and off-line approaches in Machine Learning (ML). Off-line ML algorithms learn a specific task, and solutions should generalize to unseen situations after the learning process, whereas on-line ML algorithms progressively adapt solutions to newly presented situations while solving the task. In this sense, both on-line ML algorithms and on-line EAs perform lifelong adaptation or learning, to possibly changing environments or objectives.

In this paper, we focus on on-line distributed evolution of swarm robotic agents. We are interested in learning individual agents' behaviors in a distributed context where the agents adapt their controllers to environmental conditions independently while deployed. These agents may locally communicate with each other and do not have a global view of the swarm. In this sense, this approach finds many ties with Artificial Life, where the objective is to design autonomous organisms that adapt to their environment.

On-line distributed evolution may be viewed as distributing an EA over the swarm of agents. Traditional evolutionary operators (mutation, crossover, etc.) are performed on the agents, and local communication ensures the spread of genetic material in the population of agents.
In EAs, selection operators drive evolution toward fit individuals by controlling the intensity of selection pressure to solve the given task. These operators and their impact on evolutionary dynamics have been extensively studied in off-line contexts ([7]). In this paper, we study their impact in on-line distributed ER, where evolutionary dynamics differ from the off-line case: selection is performed locally on partial populations, and the fitness values on which selection is performed are not reliable. Our experiments show that, in this context, a strong selection pressure leads to the best performances, contrary to classical approaches in which lower selection pressure is preferred in order to maintain diversity in the population. This result suggests that, in distributed ER algorithms, diversity is already maintained by the disjoint sub-populations.

Several authors have addressed on-line evolution of robotic agent controllers in different contexts: adaptation to dynamically changing environments ([5]), parameter tuning ([6]), evolution of self-assembly ([2]), communication ([11]), phototaxis and navigation ([8], [12]). Some of this work is detailed in the next section. The authors use different selection mechanisms inducing different intensities of selection pressure to drive evolution. In this paper, we compare different selection operators and measure the impact they have on the performances of learning two swarm robotics tasks: navigation with obstacle avoidance and collective food foraging.

We begin by reviewing different selection schemes proposed in the context of on-line distributed ER, and then we present the algorithm that will serve as a test bed, along with the selection methods we compare. In the fourth section, we detail our experimental setting and discuss the results.
Finally, we close with some concluding remarks and future directions of research.

2 Related Work

In the following, we review several on-line distributed ER algorithms and discuss the selection mechanisms that were applied to ensure the desired intensity of selection pressure in order to drive evolution. A common characteristic of on-line distributed ER algorithms is that each agent has one controller at a time, which it executes (the active controller), and it locally spreads altered copies of this controller to other agents. In this sense, agents have only a partial view of the population in the swarm (a local repository). Fitness assignment or evaluation of individual chromosomes is performed by the agents themselves and is thus noisy, as different agents evaluate their active controllers in different conditions. Selection takes place when the active controller is to be replaced by a new one from the repository.

PGTA (Probabilistic Gene Transfer Algorithm), introduced by [15], is commonly cited as the first implementation of a distributed on-line ER algorithm. This algorithm evolves the weights of fixed-topology neural controllers, and agents exchange parts (genes) of their respective chromosomes using local broadcasts. The algorithm considers a virtual energy level that reflects the performance of the agent's controller. This energy level increases every time the agents reach an energy source and decreases whenever communication takes place. Furthermore, the rate at which the agents broadcast their genes is proportional to their energy level and, conversely, the rate at which they accept a received gene is inversely proportional to their energy level. This way, selection pressure is introduced in that fit agents transmit their genes to unfit ones.
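The energy-modulated exchange rule of PGTA can be illustrated with a small sketch. The linear rate model, the constant E_MAX and the function names below are our own simplification for illustration, not taken from [15]:

```python
import random

# Hypothetical sketch of a PGTA-style energy-modulated gene exchange.
# The linear rates and E_MAX normalization are illustrative assumptions.
E_MAX = 100.0  # assumed maximum virtual energy level

def broadcast_probability(energy):
    """Fit agents (high energy) broadcast their genes more often."""
    return energy / E_MAX

def acceptance_probability(energy):
    """Unfit agents (low energy) are more likely to accept received genes."""
    return 1.0 - energy / E_MAX

def maybe_exchange(sender_energy, receiver_energy, rng=random):
    """One sender/receiver interaction: True if a gene is transferred."""
    sends = rng.random() < broadcast_probability(sender_energy)
    accepts = rng.random() < acceptance_probability(receiver_energy)
    return sends and accepts
```

Under this model, a maximally fit sender paired with a minimally fit receiver always exchanges genes, which captures the paper's observation that fit agents transmit their genes to unfit ones.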
[12] introduced odNEAT, an on-line distributed version of NEAT (Neuro-Evolution of Augmenting Topologies) ([13]), where each agent has one active chromosome that is transmitted to nearby agents. Collected chromosomes from other agents are stored in a local repository within niches of species according to their topological similarities, as in NEAT. Each agent has a virtual energy level that increases when the task is performed correctly and decreases otherwise. This energy level is sampled periodically to measure fitness values and, whenever this level reaches zero, the active chromosome is replaced by one from the repository. At this point, a species is selected based on its average fitness value, then a chromosome is selected within this species using binary tournament. Each agent broadcasts its active chromosome at a rate proportional to the average fitness of the species it belongs to. This, added to the fact that the active chromosome is selected from fit niches, maintains a certain selection pressure toward fit individuals.

EDEA (Embodied Distributed Evolutionary Algorithm) ([8]) was applied to different swarm robotics tasks: phototaxis, navigation with obstacle avoidance and collective patrolling. In this algorithm, each agent possesses one chromosome, whose controller is executed and evaluated on a given task. At each iteration, agents broadcast their chromosomes along with their fitness to other nearby agents with a given probability (a fixed parameter). Upon reception, an agent selects a chromosome from those collected using binary tournament.
This last chromosome is then mutated and recombined (using crossover) with the current active chromosome with probability f(x′) / (s_c · f(x)), where f(x′) is the fitness of the selected chromosome, f(x) is the fitness of the agent's current chromosome and s_c is a scalar controlling the intensity of selection pressure. To ensure an accurate measure of fitness values, agents evaluate their controllers for at least a minimum period of time (maturation age), during which agents neither transmit nor receive other chromosomes.

With mEDEA (minimal Environment-driven Distributed Evolutionary Algorithm), [3] addresses evolutionary adaptation with implicit fitness, i.e. without a task-driven fitness function. The algorithm takes a gene perspective in which successful chromosomes are those that spread over the population of agents, and which requires: 1) maximizing mating opportunities and 2) minimizing the risk for agents (their vehicles).

At every time step, agents execute their respective active controllers and locally broadcast mutated copies of the corresponding chromosomes. Received chromosomes (transmitted by other agents) are stored in a local list. At the end of the execution period (lifetime), the active chromosome is replaced with a randomly selected one from the agent's list, and the list is emptied. An agent dies if there are no chromosomes in its list (if it did not meet other agents), and it remains dead until it receives a chromosome from another agent passing by.

The authors show that the number of living agents rises with time and remains at a sustained level. Furthermore, agents develop navigation and obstacle-avoidance capabilities that allow them to better spread their chromosomes.
This work shows that environment-driven selection pressure alone can maintain a certain level of adaptation in a swarm of robotic agents. A slightly modified version of this algorithm is used in this work and is detailed in the next section.

[10] proposed MONEE (Multi-Objective aNd open-Ended Evolution), an extension to mEDEA adding a task-driven pressure as well as a mechanism (called market) for balancing the distribution of tasks among the population of agents, if several tasks are to be tackled. Their experiments show that MONEE is capable of improving mEDEA's performances in a collective concurrent foraging task, in which agents have to collect items of several kinds.

The authors show that the swarm is able to adapt to the environment (as mEDEA ensures), while foraging different kinds of items (optimizing the task-solving behavior). In this context, each type of item is considered a different task. The algorithm uses an explicit fitness function in order to guide the search toward better performing solutions. The market mechanism, which takes into account the scarcity of items, ensures that agents do not focus on the most frequent kind of items (the easiest task), thus neglecting less frequent ones. In their paper, the agent's controller is selected using rank-based selection from the agent's list of chromosomes. The authors argue that when a specific task is to be addressed, a task-driven selection pressure is necessary. This idea is discussed in the remainder of this paper.

In the aforementioned works, the authors used different classical selection operators from evolutionary computation in on-line distributed ER algorithms. It is however not clear whether these operators perform in the same fashion as when they are used in an off-line non-distributed manner.
In an on-line and distributed context, evolutionary dynamics are different, since selection is performed locally at the agent level and over the individuals whose vehicles had the opportunity to meet. In addition, and this is not inherent to on-line distributed evolution but to many ER contexts, fitness evaluation is intrinsically noisy, as the agents evaluate their controllers in different conditions, which may have a great impact on their performance. A legitimate question one could ask is: does it still make sense to use selection?

In this paper, we compare different selection methods corresponding to different intensities of selection pressure in a task-driven context. We apply these methods in a modified version of mEDEA and measure their impact on two different swarm robotics tasks.

3 Algorithms

In this section, we describe the variant of mEDEA we used in our experiments (Algorithm 1). It is run by all the agents of the swarm independently, in a distributed manner. At any time, each agent possesses a single controller, which is randomly initialized at the beginning of evolution.

The main difference w.r.t. mEDEA is that the algorithm alternates between two phases, namely an evaluation phase, in which the agent runs, evaluates and transmits its controller to nearby listening agents, and a listening phase, in which the agent does not move and listens to incoming chromosomes sent by nearby agents. The evaluation and the listening phases last T_e and T_l respectively, and, for different robots, they take place at different moments. Since the different robots are desynchronized, robots in the evaluation phase are able to spread their genomes to other robots that are in the listening phase.
If only one common phase takes place, an agent that turns on the spot transmits its controller to any fitter agent crossing it, as broadcast and reception are simultaneous. This separation into two phases is inspired from MONEE, where it is argued that it lessens the spread of poorly achieving controllers. Also, task-driven selection was introduced in MONEE to simultaneously tackle several tasks.

The agent's controller is executed and evaluated during the evaluation phase. For each agent, this phase lasts at most T_e time-steps.¹ During this phase, at each time-step the agent executes its current controller by reading the sensors' inputs and computing the motors' outputs. The agent also updates the fitness value of the controller, based on the outcome of its actions, and locally broadcasts both the chromosome corresponding to its controller and its current fitness value.

Once the T_e evaluation steps have elapsed, the agent begins its listening phase, which lasts T_l time-steps. During this phase, the agent stops and listens for incoming chromosomes from nearby passing agents (agents that are in their evaluation phase). These chromosomes are transmitted along with their respective fitness values. Consequently, at the end of this phase, an agent has a local list of chromosomes and fitnesses, or local population.

Another difference w.r.t. mEDEA is that the local population also contains the agent's current chromosome. This is done to ensure that all agents always have at least one chromosome in their respective populations, which matters particularly when an agent is isolated during its listening phase and does not receive any other chromosome. In mEDEA, isolated agents stay inactive until they receive a chromosome from another agent passing by.
After the listening period, the agent needs to load a new controller for its next evaluation phase. To do so, it selects a chromosome from its list using one of the selection methods discussed further on. The selected chromosome is then mutated and becomes the agent's active controller. In this case, mutation consists in adding a normal random variable with mean 0 and variance σ² to each gene (each synaptic weight of the neuro-controller).

Once the next controller is chosen, the list is emptied. This means selection is performed on a list of chromosomes that have been collected by the agent during the previous listening phase. At this time, the new controller's evaluation phase begins. We consider one iteration of the algorithm (evaluation plus listening phase) as one generation.

¹ A small random number is subtracted from T_e so that the evaluation phases of the agents are not synchronized.

Algorithm 1 mEDEA
1: g_a := random()
2: while true do
3:   L := ∅
4:   // Evaluation phase
5:   for t = 1 to T_e do
6:     exec(g_a)
7:     broadcast(g_a)
8:   end for
9:   // Listening phase
10:  for t = 1 to T_l do
11:    L := L ∪ listen()
12:  end for
13:  L := L ∪ {g_a}
14:  selected := select(L)
15:  g_a := mutate(selected)
16: end while

The selection method selects the new chromosome among the collected ones based on their fitness. This can be done in different manners, depending on the desired intensity of selection pressure. In this paper we compare four different selection methods, each one defining a different intensity of task-driven selection pressure. The choice of these selection methods aims at giving a large span of intensities of selection pressure, from the strongest (Best) to the lowest (Random):

Best Selection: This method deterministically selects the controller with the highest fitness.
This is the selection method with the strongest selection pressure, as the agent will never be allowed to select a controller with a lower fitness than the previous one. Best selection can be compared to an elitist selection scheme where the best fit controllers are always kept.

Algorithm 2 Best selection
1: order the x_i and index them as x_{i:n} such that: f(x_{1:n}) ≥ f(x_{2:n}) ≥ … ≥ f(x_{n:n})
2: return x_{1:n}

Rank-Based Selection: In this case, selection probabilities are assigned to each controller according to its rank, i.e. the position of the controller in the list once sorted w.r.t. fitness values. The best controller has the highest probability of being selected; however, less fit controllers still have a positive probability of being selected. Traditionally, this method is preferred to Roulette Wheel selection, which assigns individuals probabilities proportional to their fitness values and thereby highly biases evolution toward the best individuals.

Algorithm 3 Rank-based selection
1: order the x_i and index them as x_{i:n} such that: f(x_{1:n}) ≥ f(x_{2:n}) ≥ … ≥ f(x_{n:n})
2: select x_{i:n} with probability Pr(x_{i:n}) = (n + 1 − i) / (1 + 2 + … + n)
3: return x_{i:n}

Binary Tournament: This method uniformly samples a number of controllers equal to the size of the tournament (two in our case) and selects the one with the highest fitness. Here, the selection pressure is adjusted through the size of the tournament: the higher the size, the higher the selection pressure, the extreme case being when the tournament size is equal to the size of the population. In this case, the best controller is chosen.² Conversely, when the size of the tournament is two, the induced selection pressure is the lowest.

Algorithm 4 k-Tournament selection
1: uniformly sample k of the x_i, noted {x_{1:k}, …, x_{k:k}}
2: order the x_{i:k} such that: f(x_{1:k}) ≥ f(x_{2:k}) ≥ …
≥ f(x_{k:k})
3: return x_{1:k}

² It is assumed that sampling is performed without replacement.

Random Selection: This method selects a controller from the local population at random, disregarding its fitness value and therefore inducing no task-driven selection pressure at all. Random selection is considered as a baseline for comparisons with the other methods, which effectively induce a certain task-driven selection pressure. As discussed in the previous section, this is the selection operator used by mEDEA for evolving survival capabilities of the swarm without any task-driven explicit goal. By considering Random in our experiments, we aim to compare the original mEDEA selection scheme with more selective operators.

Each one of these four selection methods induces a different intensity of selection pressure on the evolution of the swarm. In the next section, we describe our experiments comparing the impact of each one of these intensities.

4 Experiments

We compare these selection methods on a set of experiments in simulation for two different tasks, fast-forward navigation and collective foraging, which are two well-studied benchmarks in swarm robotics. All our experiments were performed on the RoboRobo simulator ([4]).

4.1 Description

In all experiments, a swarm of robotic agents is deployed in a bounded environment containing static obstacles (black lines in Figure 1). Agents also perceive other agents as obstacles. All the agents in the swarm are morphologically homogeneous, i.e. they have the same physical properties, sensors and motors, and only differ in the parameters of their respective controllers. Each agent has 8 obstacle proximity sensors evenly distributed around the agent, and 8 food item sensors are added in the case of the foraging task. An item sensor measures the distance to the closest item in the direction of the sensor.
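The four selection methods of Section 3 can be written compactly over a local population of (chromosome, fitness) pairs. The following is a minimal sketch; the function names and data layout are ours, not from the authors' implementation:

```python
import random

# Sketch of the four selection operators, each taking a local population
# given as a list of (chromosome, fitness) pairs.

def best_selection(pop):
    """Best: deterministically return the chromosome with the highest fitness."""
    return max(pop, key=lambda cf: cf[1])[0]

def rank_based_selection(pop, rng=random):
    """Rank-based: Pr(rank i of n, best first) = (n + 1 - i) / (1 + 2 + ... + n)."""
    ranked = sorted(pop, key=lambda cf: cf[1], reverse=True)
    n = len(ranked)
    weights = [n + 1 - i for i in range(1, n + 1)]  # n, n-1, ..., 1
    return rng.choices(ranked, weights=weights, k=1)[0][0]

def tournament_selection(pop, k=2, rng=random):
    """k-tournament: sample k individuals without replacement, keep the fittest."""
    contenders = rng.sample(pop, k)
    return max(contenders, key=lambda cf: cf[1])[0]

def random_selection(pop, rng=random):
    """Random: ignore fitness entirely (the original mEDEA scheme)."""
    return rng.choice(pop)[0]
```

Note that with k equal to the population size, tournament_selection reduces to best_selection, matching the extreme case discussed for the tournament operator.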
These simulated agents are similar to Khepera or e-puck robots. We use a recurrent neural network as the architecture of the neuro-controllers of the agents (Figure 1). The inputs of the network are the activation values of all sensors, and the 2 outputs correspond to the translational and rotational velocities of the agent. The activation function of the output neurons is a hyperbolic tangent, taking values in [−1, +1]. Two bias connections (one for each output neuron), as well as 4 recurrent connections (previous speed and previous rotation for both outputs), are added. This setup yields 22 connection weights in the neuro-controller for the navigation task and 38 for the foraging task. The chromosome of the controller is the vector of these weights. Table 1 summarizes the different parameters used in our experiments.

Figure 1: Left: the simulation environment containing agents (red dots with thin hairlines representing sensors), obstacles (dark lines) and food items (blue dots). Right: the architecture of the neuro-controller (inputs: bias, obstacle sensors, food sensors; outputs: v_r, v_t).

In the navigation task, agents must learn to move as fast and straight as possible in the environment while avoiding obstacles, whereas in the foraging task, agents must collect food items present in the environment (Figure 1). An item is collected when an agent passes over it, at which time it is replaced by another item at a random location.

Table 1: Experimental settings.

  Experiments
    Number of food items:  150
    Swarm size:            50 agents
    Exp. length:           5 × 10^5 sim. steps
    Number of runs:        30
  Evolution
    Evolution length:      ~250 generations
    T_e:                   2000 − rand(0, 500) sim. steps
    T_l:                   200 sim. steps
    Chromosome size:       Nav.: 22, Forag.: 38
    Mutation step-size:    σ = 0.5

We define the fitness function for the navigation task after the one introduced in ([9]). Each agent r computes its fitness at generation g as:

  f_r^g = Σ_{t=1}^{T_e} v_t(t) · (1 − |v_r(t)|) · min(a_s(t))    (1)

where v_t(t), v_r(t) and a_s(t) are respectively the translational velocity, the rotational velocity and the activations of the obstacle sensors of the agent at each time-step t of its evaluation phase. In the foraging task, a controller's fitness is computed as the number of items collected during its evaluation phase. Furthermore, since we are interested in the performance of the entire swarm, we define the swarm fitness as the sum of the individual fitnesses of all agents at each generation:

  F_s(g) = Σ_{r ∈ swarm} f_r^g    (2)

4.2 Measures

A characteristic of on-line ER is that agents learn as they are performing the actual task in an open-ended way. In this context, the best fitness ever reached by the swarm is not a reliable measure, since it only reflects a "good" performance at one point of the evolution. Furthermore, fitness evaluation is inherently noisy, due to the different evaluation conditions encountered by the agents. Therefore, we introduce four measures that will be used to compare the impact of the different selection methods. These measures summarize information on the swarm spanning over several generations. They are used only for evaluation and comparison of the selection methods and are computed once the evolution has ended. A pictorial description of these four measures is shown in Figure 2.

• Average accumulated swarm fitness (f_c): the average swarm fitness over the last generations. This metric reflects the performance of the swarm at the end of the evolution. In our experiments, we compute the average over the last 8% of generations.

• Fixed budget swarm fitness (f_b): the swarm fitness reached at a certain generation (computational budget).
This measure helps to compare different methods on the same grounds. In our experiments, we measure this value at 92% of the evolution, which corresponds to the first generation considered in the computation of f_c.

• Time to reach target (g_f): the first generation at which a predefined target fitness is reached. If this level is never reached, g_f corresponds to the last generation. We fixed the target at 80% of the maximum fitness reached over all runs and all selection methods. This metric reflects a certain convergence rate of the algorithms, i.e. how fast the swarm hits the target fitness on the task at hand.

• Accumulated fitness above target (f_a): the sum of all swarm fitness values above a predefined target fitness. It reflects to which extent the target level is exceeded and whether this performance is maintained over the long run. We used the same target fitness as with g_f.

Figure 2: A pictorial description of the four comparison measures. From top to bottom and left to right: the average accumulated swarm fitness, the fixed budget swarm fitness, the time to reach target and the accumulated fitness above target.

These comparison measures are not to be taken individually. For instance, f_c and f_b complement each other and give an indication of the level and stability of the performance reached by the swarm at the end of evolution. If f_b and f_c are close, then the performance of the swarm is stable. Also, g_f and f_a combined reflect how fast a given fitness level is reached and to which extent that level is exceeded. Adding the two latter measures to f_c shows whether that trend is maintained.

4.3 Results and discussion

For both the navigation and foraging tasks, we ran 30 independent runs for each selection method, and we measured F_s at each generation in all runs.
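The four measures above can be computed post hoc from the per-generation swarm-fitness series. A minimal sketch, assuming the series is stored as a plain list and following the paper's 8% / 92% conventions (the function name, and the reading of f_a as the accumulated excess over the target, are our assumptions):

```python
# Post-hoc computation of the four comparison measures from a series of
# swarm-fitness values F_s(g), one per generation.

def comparison_measures(fitness_per_gen, target):
    """Return (f_c, f_b, g_f, f_a) for one run."""
    n = len(fitness_per_gen)
    start_last = int(n * 0.92)  # first generation of the last 8%

    # f_c: average swarm fitness over the last 8% of generations.
    tail = fitness_per_gen[start_last:]
    f_c = sum(tail) / len(tail)

    # f_b: swarm fitness at the 92% budget generation,
    # i.e. the first generation considered in f_c.
    f_b = fitness_per_gen[start_last]

    # g_f: first generation at or above the target,
    # or the last generation if the target is never reached.
    g_f = next((g for g, f in enumerate(fitness_per_gen) if f >= target), n - 1)

    # f_a: accumulated fitness above the target (assumed here to be the
    # summed excess over the target level).
    f_a = sum(f - target for f in fitness_per_gen if f > target)

    return f_c, f_b, g_f, f_a
```

A run that never reaches the target thus gets g_f equal to the last generation and f_a equal to zero, consistent with how the measures are interpreted in the results below.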
Figures 3 and 4 show the median F_s per generation over the 30 runs for each task. We computed the four performance metrics in the case of navigation (Figure 5) and of foraging (Figure 6). For both tasks, we performed pairwise³ Mann-Whitney tests at 99% confidence on these measures, between the four selection methods.

Figure 3: Median swarm fitness per generation over the 30 runs for the navigation task.

On the one hand, upon analysis of Figure 3 and Figure 4, we observe that the swarm rapidly reaches a high fitness level in both tasks whenever there is a task-driven selection pressure, i.e. with Best, Rank-based or Binary tournament selection. On the other hand, without any selection pressure (Random), learning is much slower. Furthermore, for the three former selection methods the algorithm reaches comparable levels of performance in terms of median values of the swarm fitness. An exception can be noted for Best selection in the foraging task, which outperforms Rank-based and Binary tournament.

³ Pairwise in this context means all combinations of pairs of selection methods, six combinations in our case.

Figure 4: Median swarm fitness per generation over the 30 runs for the foraging task.

Despite the lower performances achieved by Random, the swarm still manages to learn behaviors for both tasks. This can be seen in the increasing trend of the median swarm fitness in Figure 3 and Figure 4. This result is expected on the navigation task. As is the case in ([3]), environmental pressure drives evolution toward behaviors that maximize mating opportunities, and thus behaviors that explore the environment, increasing the swarm fitness.
The same trend is also observed on the foraging task. The improvement is slower but still present with Random selection. This could be explained by the fact that collecting items is a byproduct of maximizing mating opportunities: agents collect items by chance while they navigate trying to mate.

When inspecting the swarm in the simulator, we observed that, when selection pressure is present, the evolved behaviors drive the agents toward food items, which means that the food sensors are in fact exploited. In other words, evolution drove the controllers to use these sensors. However, without any selection pressure (Random), there cannot be a similar drive. We also observed this in the simulator: agents were not attracted by food items with Random selection.

When we analyze the comparison measures we introduced earlier, similar trends are observed. Figure 5 (respectively Figure 6) shows the box-and-whisker plots of the four measures for each selection method over the 30 runs for the navigation task (respectively the foraging task).

On the navigation task, the pairwise comparisons of the four measures, using Mann-Whitney tests at the 99% confidence level, yield a significant statistical difference between all selection methods, except between Best and Rank-based (p-value = 0.0795) and between Rank-based and Binary tournament (p-value = 0.0116) in the case of the time to reach target (g_f).

Figure 5: Box-and-whisker plots (30 independent runs) of the comparison measures for the four selection methods on the navigation task. From top to bottom and left to right: f_c, f_b, g_f and f_a. The label p > 0.01 indicates no statistical difference for the corresponding two selection methods.
We also observe that Best reaches a higher swarm fitness for the fixed budget than the rest of the selection methods, and this level is maintained at the end of evolution, as shown in f_c and f_b (upper left and right in the figure). The target fitness level is rapidly reached by the three methods inducing selection pressure, and there is no significant difference between Best and Rank-based, nor between Rank-based and Binary tournament, w.r.t. g_f (lower left). Furthermore, in the case of Best, the required level is not only reached but surpassed during the entire evolution, leading to a value of f_a much higher than those of the rest of the selection methods (lower right). However, this is not the case for Random selection, which has a much lower f_b (upper right) and f_c (upper left), and does not reach the target fitness level on more than half the runs that were launched (lower left and right).

Figure 6: Box-and-whisker plots (30 independent runs) of the comparison measures for the four selection methods on the foraging task. From top to bottom and left to right: f_c, f_b, g_f and f_a. The label p > 0.01 indicates no statistical difference for the corresponding two selection methods.

On the foraging task, there is a significant difference for all pairwise comparisons, except between Binary tournament and Random in the case of the time to reach the target, g_f, and the accumulated fitness above the target, f_a (p-value = 0.0419 in both cases). This is explained by the fact that very few runs attained the target fitness (for both tasks, the target fitness is 80% of the highest fitness reached by all methods during all runs), in which case g_f is the last generation and f_a is almost zero. There is also no statistical difference between Rank-based and Binary tournament on the fixed-budget swarm fitness, f_b (p-value = 0.0105).
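The four comparison measures used above can be sketched from a single run's fitness curve. The definitions below are paraphrased from how the measures are described in this section (fixed-budget fitness f_b, final fitness f_c, time to reach the target g_f, accumulated fitness above the target f_a); the exact formulas are given earlier in the paper, so the convention used when the target is never reached, and all names, are assumptions.

```python
def comparison_measures(fitness_per_gen, target, budget):
    """Sketch of the four comparison measures for one run.

    fitness_per_gen: swarm fitness at each generation of the run.
    target: target fitness level (in the paper, 80% of the best fitness
    reached by any method over all runs).
    budget: the fixed-budget generation index.
    """
    f_b = fitness_per_gen[budget]       # swarm fitness at the fixed budget
    f_c = fitness_per_gen[-1]           # swarm fitness at the end of evolution
    reached = [g for g, f in enumerate(fitness_per_gen) if f >= target]
    # g_f: first generation reaching the target; if the run never reaches it,
    # we use the last generation, as described in the text.
    g_f = reached[0] if reached else len(fitness_per_gen) - 1
    # f_a: fitness accumulated above the target over the whole run
    # (almost zero when the target is barely or never reached).
    f_a = sum(max(0.0, f - target) for f in fitness_per_gen)
    return f_b, f_c, g_f, f_a
```

Under these definitions, a run that never attains the target gets g_f equal to the last generation and f_a near zero, which matches the explanation given above for the foraging task.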
This means that Binary tournament reaches a fitness at the given budget that is comparable to that of Rank-based, but does not maintain this level, since for these two methods the difference is significant on f_c. Best also gives better results on the foraging task: a high swarm fitness is reached and maintained at the end of evolution (f_b and f_c, upper left and right). It surpasses the target fitness level in almost all runs, much faster and to a larger extent than Rank-based, which also manages to reach the required level for most runs (g_f, lower left), although by a lower margin (f_a, lower right). This is not the case for Binary tournament and Random, which do not achieve the target fitness level for most runs (lower left and right).

We can observe that all task-driven selection pressures yield much better performances on both tasks compared to Random selection. Consequently, we may conclude that selection pressure has a positive impact on performances when solving a given task, and when the objective is not only to achieve adaptation of the swarm, as was the original motivation of mEDEA. Further, statistical tests show a direct correlation between the selection pressure and the performances achieved by the swarm on the two considered tasks. In other words, the stronger the selection pressure, the better the performances reached by the swarm.

In general, it has been argued that elitist strategies are not desirable in traditional EAs, and the same argument holds for traditional ER, since elitist strategies may lead to premature convergence at local optima.
There exists an extensive body of work, especially in non-convex optimization, where it is preferable to explicitly maintain a certain level of diversity in the population to escape local optima and to deal with the exploration vs. exploitation dilemma. This requirement is perhaps not as strong in the context of distributed ER, as our experiments show. Selection is performed among a portion of the population at the agent level; therefore, one might argue that these algorithms already maintain a certain level of diversity, inherent to the fact that sub-populations are distributed over the different agents. Comparisons with other approaches in which separated sub-populations are evolved, such as spatially structured EAs ([14]) and island models ([1]), could give further insights on the dynamics of this kind of evolution.

5 Conclusions

In this paper, we studied the impact of task-driven selection pressures in on-line distributed ER for swarm behavior learning. This kind of algorithm raises several questions concerning the usefulness of selection pressure (partial views of the population, noisy fitness values, etc.). We compared four selection methods inducing different intensities of selection pressure on two tasks: navigation with obstacle avoidance and collective foraging. Our experiments show that selection pressure largely improves performances, and that the intensity of the selection operator positively correlates with the performances of the swarm.

Foraging and navigation can be considered relatively simple tasks, and we believe that more complex and challenging ones, involving deceptive fitness functions, could give further insights on selection and evolution dynamics in the distributed case.

References

[1] Enrique Alba and Marco Tomassini. Parallelism and evolutionary algorithms.
Evolutionary Computation, IEEE Transactions on, 6(5):443–462, 2002.

[2] Raffaele Bianco and Stefano Nolfi. Toward open-ended evolutionary robotics: evolving elementary robotic units able to self-assemble and self-reproduce. Connection Science, 16(4):227–248, 2004.

[3] Nicolas Bredeche and Jean-Marc Montanier. Environment-driven Embodied Evolution in a Population of Autonomous Agents. In PPSN 2010, pages 290–299, Krakow, Poland, 2010.

[4] Nicolas Bredeche, Jean-Marc Montanier, Berend Weel, and Evert Haasdijk. Roborobo! A fast robot simulator for swarm and collective robotics. CoRR, abs/1304.2888, 2013.

[5] Cristian M. Dinu, Plamen Dimitrov, Berend Weel, and A.E. Eiben. Self-adapting fitness evaluation times for on-line evolution of simulated robots. In Proc. of GECCO'13, pages 191–198. ACM, 2013.

[6] A.E. Eiben, Giorgos Karafotias, and Evert Haasdijk. Self-adaptive mutation in on-line, on-board evolutionary robotics. In Fourth IEEE Int. Conf. on Self-Adaptive and Self-Organizing Systems Workshop (SASOW), 2010, pages 147–152. IEEE, 2010.

[7] Agoston E. Eiben and Jim E. Smith. Introduction to Evolutionary Computing. Springer, 2003.

[8] Giorgos Karafotias, Evert Haasdijk, and Agoston Endre Eiben. An algorithm for distributed on-line, on-board evolutionary robotics. In Proc. of GECCO'11, pages 171–178, New York, NY, 2011. ACM.

[9] Stefano Nolfi and Dario Floreano. Evolutionary Robotics: The Biology, Intelligence, and Technology. MIT Press, Cambridge, MA, USA, 2000.

[10] Nikita Noskov, Evert Haasdijk, Berend Weel, and A.E. Eiben. MONEE: Using parental investment to combine open-ended and task-driven evolution. In Esparcia-Alcázar, editor, App. of Evol. Comput., volume 7835 of LNCS. Springer Berlin, 2013.

[11] Luis E. Pineda, A.E. Eiben, and Maarten van Steen.
Evolving communication in robotic swarms using on-line, on-board, distributed evolutionary algorithms. In Cecilia Di Chio et al., editors, Applications of Evolutionary Computation, volume 7248 of LNCS, pages 529–538. Springer Berlin Heidelberg, 2012.

[12] Fernando Silva, Paulo Urbano, Sancho Oliveira, and Anders Lyhne Christensen. odNEAT: an algorithm for distributed online, onboard evolution of robot behaviours. In Artificial Life, volume 13, pages 251–258. MIT Press, 2012.

[13] Kenneth O. Stanley and Risto Miikkulainen. Evolving neural networks through augmenting topologies. Evol. Comput., 10(2):99–127, June 2002.

[14] Marco Tomassini. Spatially structured evolutionary algorithms. Springer Berlin, 2005.

[15] Richard A. Watson, Sevan G. Ficici, and Jordan B. Pollack. Embodied evolution: Distributing an evolutionary algorithm in a population of robots. Robotics and Autonomous Systems, 39:1–18, 2002.