Prediction of Platinum Prices Using Dynamically Weighted Mixture of Experts

Baruch Lubinsky, Bekir Genc and Tshilidzi Marwala
University of the Witwatersrand
Private Bag x3, Wits, 2050, South Africa

Abstract: Neural networks are powerful tools for classification and regression in static environments. This paper describes a technique for creating an ensemble of neural networks that adapts dynamically to changing conditions. The model separates the input space into four regions, and each network is given a weight in each region based on its performance on samples from that region. The ensemble adapts dynamically by constantly adjusting these weights based on the networks' current performance. The dataset used is a collection of financial indicators, with the goal of predicting the future platinum price. An ensemble with no weightings does not improve on the naive estimate of no weekly change; our weighting algorithm gives an average percentage error of 63% for twenty weeks of prediction.

Index Terms: Adaptive estimation, Multilayer perceptron, Neural networks, Platinum, Prediction methods

I. INTRODUCTION

Neural networks are known to be powerful tools for predicting future values of a time series [1]. Neural network regression is especially powerful in cases where simple linear regression is ineffective due to the nonlinear nature of the system [2]. One such system is the financial market, in which accurate predictions are difficult to make. This paper examines a neural network technique used to forecast the price of platinum. It demonstrates the advantage gained by spatially dividing an input space, that is, separating the input into different regions and giving each member of an ensemble a weight in each region.
These weights are then extended to be functions of time in the study of platinum price predictions. The approach taken gives positive results, with a relatively high accuracy of prediction on this notoriously difficult problem [3]. A mixture of experts is used in order to cover the diverse factors that affect the future price of platinum. The voting weight allocated to each hypothesis is updated after each test sample. This dynamic weighting is a novel approach to the problem and is shown to greatly increase the accuracy of the ensemble.

II. MIXTURE OF EXPERTS

The power of a neural network to make predictions can be greatly increased by combining the output of a number of networks collected in an ensemble [4]. For cases where the system is too complex to be learned by a single network, a mixture of experts can be used. Each network can correctly learn some feature of the system. These networks can then be combined to provide a model for the whole system. The method of combining the networks will depend on the nature of the data being modeled [5].

III. SPATIAL DIVISION OF INPUT SPACE

One method of combining the outputs of the networks in an ensemble is to simply take the mean of all the outputs. However, this does not take advantage of the fact that each network may have learned a different feature of the dataset. Any input space can be divided spatially along different features to create regions in the input space [6]. The performance of each expert can then be judged per region. Each network is assigned a numerical weight in each region, and the output of the ensemble is the weighted average of the networks' outputs [7]. Consider an ensemble made up of networks $f_k$ with corresponding weights $w_k(i)$ in each region $i$.
For a given input $\vec{x}$ in region $i$, the prediction of the ensemble is

$$y(\vec{x}) = \frac{\sum_k w_k(i)\, f_k(\vec{x})}{\sum_k w_k(i)} \qquad (1)$$

Thus the networks with the highest weights have the greatest impact on the value of the output. The weights are initially set to a default value of 1. When all the weights are 1, the output is simply the mean of the outputs of the networks.

A. Preliminary Testing

This method of dividing the input space into different regions is tested on a sample dataset created using PRTools [8]. Figure 1 shows an example of the "Banana" dataset generated by PRTools. The different markers represent the two classes to be classified. The classifier used is an ensemble of multilayer perceptron (MLP) neural networks. The dataset is divided into 150 samples for training and 50 for testing. Using different data points for testing than for training ensures that the generalization ability of the ensemble is what gets measured, so that overfitting to the training samples is not mistaken for good performance.

Figure 1 also shows the different regions of the input space, separated by the dashed lines. The regions are created by splitting both axes at the median of the corresponding feature. Each network has a vector of weights, one for each region. These weights are adjusted during training: for each sample, if the network classifies it correctly, the relevant weight is multiplied by 1.2; otherwise it is multiplied by 0.4. These values are found to keep the weights constrained to reasonable values. When the ensemble is tested, its output is the weighted average of the outputs of the networks, according to the weights calculated.

Fig. 1. Data used to demonstrate the power of dividing the input space

The accuracy of the ensemble is tested 100 times to obtain a meaningful average for four different weighting schemes. The test is first run for networks with no weights (all weights are 1), where the decision is simply the mean of all the outputs. A small improvement is gained by giving each network one weight corresponding to its overall performance. The input space is then divided into two regions by splitting on only one feature (two weights per network), and then into four regions. The results are shown in table I; accuracy is the number of correct classifications over the total number of test samples.

TABLE I
CLASSIFIER ACCURACY WITH DIFFERENT WEIGHTINGS

Weighting scheme    Accuracy
No weights          82.92%
1 weight            83.50%
2 weights           86.24%
4 weights           88.18%

These results show that the performance of an ensemble is improved by giving more strength to the output of a network that has better accuracy. The performance of the ensemble is improved even further when the input space is divided and weights are assigned for each region. These regions need not divide the different classes perfectly to be effective. The regions in figure 1 are separated along the median of each feature, which proves to be an adequate method for defining the regions. This test shows that the divisions in the input space need not represent any complex feature of the dataset. This relatively simple dataset demonstrates the power of weighting the different experts in an ensemble as a function of the position of the input.

IV. DYNAMIC WEIGHTING

The previous section describes an effective method of combining a number of neural networks into an ensemble with significantly improved performance.
However, the data in that example, or more importantly the function that generates them, is stationary. We are interested in data drawn from a dynamic environment. In this case, it is not sufficient to train the networks and combine them in an ensemble, because the factors which generated the training data are unlikely to be present at the time of testing. The ensemble needs to adapt to changing conditions. The concept of spatial weightings is therefore extended for time series such that the weightings change as a function of time [9]. The weight for each region is updated after each sample. This is possible for time series in which, as each sample is received, the correct output for the previous sample becomes known. Adjusting the weights in this manner is a powerful method of implementing an adaptive model of a system. At each time step, the ensemble is updated without retraining each network. If we assume that the factors governing a system vary slowly within some bounded space, an ensemble with dynamic weights can retain its accuracy over time even as the system changes. Such a model can adapt continuously, provided the conditions of the system were encountered in training. Thus effective adaptation is achieved without the cost of retraining the ensemble.

A. Platinum Price Case Study

An example of a system that displays the characteristics described above is the platinum price. This is a notoriously difficult system to model due to the vast range of factors that impact it [3]. However, these factors are limited within a reasonable scope. The price of platinum is used here as an example to illustrate the power of dynamic weighting to make predictions in a complex system.
1) Neural Network Structure: Each expert in the ensemble is an MLP neural network. The network takes as its input the current market trends, and its output is the prediction of the future change in the platinum price. The inputs are the prices of platinum, palladium, rhodium, gold and Brent Crude, and the South African Rand to US Dollar exchange rate, as these are considered to be the best indicators [3]. The data is smoothed by taking weekly averages and then normalized by considering the percentage change for each week. The inputs are the changes during the previous week; the correct output is the percentage change in the platinum price during the subsequent week.

Individual networks are trained by the Markov Chain Monte Carlo (MCMC) method [10]. The weights of the network are initialized randomly and then adjusted in small random steps in an attempt to reduce the mean square error over the set of training data. This ensures that the full weight space of the networks is explored and thus increases the diversity and generalization of the ensemble. The networks are found, heuristically, to perform best with two hidden nodes. The activation function at the hidden layer is a sigmoid, which is shifted to constrain the output to the range [−1, 1]. On the output node, the activation function, also a sigmoid, ensures that the output is in the range [−0.2, 0.2], as this is the range of actual weekly price changes.

2) Ensemble: An ensemble is initially created with one network trained for a short amount of time (10 epochs) on the full training dataset. This network becomes the initial benchmark for the ensemble. The accuracy of the ensemble is measured by taking the mean normalized square error over all the samples.
If $t_n$ is the correct output for input $n$ and $y(n)$ is the output of the ensemble, the error is

$$\text{error} = \frac{1}{N} \sum_{n} \left( \frac{t_n - y(n)}{t_n} \right)^2 \qquad (2)$$

for $N$ data points. If $t_n = 0$, the square error $(t_n - y(n))^2$ is used for that point. It is necessary to take the percentage error measurement so that an ensemble does not appear to be performing well simply because the price changes slowly and the absolute error is small.

An ensemble is trained on a full set of training data; 100 samples are sufficient to expose the ensemble to a large range of market forces. At each iteration, a new random network is created and trained on a subset of the training data (20 weeks in this study). This gives the network a chance to become specialized on a small portion of the data. The network is then added to the ensemble. If this decreases the error over the full training set, the network is retained; otherwise it is discarded. This method of selection leads to good generalization performance of the ensemble. Training continues in this manner for a fixed number of iterations or until some stopping criterion is met.

The input space is divided into four regions according to the two features which are considered most significant. The exchange rate and gold price are used, with the division along the zero line (which is close to the median for both features). Thus the regions are based on whether each value is increasing or decreasing. The specific selection of regions is not important, provided that the input space is divided fairly evenly. Each network within the ensemble is assigned a weight in each region corresponding to its performance on samples from that region. The output of the ensemble is the weighted average of the outputs of the experts.
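The building blocks described in 1) and 2) can be sketched in Python with NumPy as below: an expert with the shifted-sigmoid activations, the error of equation (2), the keep-if-better ensemble construction, the sign-based division into four regions, and the region-weighted output of equation (1). This is a minimal illustration under stated assumptions, not the authors' implementation: all function names are hypothetical, the MCMC training of each network is not reproduced, candidate networks are represented only by their predictions on the training set, and the mapping of the four (exchange rate, gold) sign combinations onto indices 0 to 3 is an arbitrary choice.

```python
import numpy as np

def shifted_sigmoid(z, low, high):
    """Logistic sigmoid rescaled onto the interval (low, high)."""
    return low + (high - low) / (1.0 + np.exp(-z))

def expert_output(x, W1, b1, w2, b2):
    """One MLP expert: inputs -> 2 hidden nodes on (-1, 1) -> 1 output on (-0.2, 0.2)."""
    hidden = shifted_sigmoid(W1 @ x + b1, -1.0, 1.0)
    return float(shifted_sigmoid(w2 @ hidden + b2, -0.2, 0.2))

def normalized_error(y, t):
    """Equation (2): mean squared relative error; plain squared error where t_n == 0."""
    y = np.asarray(y, dtype=float)
    t = np.asarray(t, dtype=float)
    safe_t = np.where(t == 0, 1.0, t)            # avoid dividing by zero
    rel = np.where(t == 0, t - y, (t - y) / safe_t)
    return float(np.mean(rel ** 2))

def greedy_select(candidate_preds, targets):
    """Keep a candidate only if adding it lowers the (unweighted) ensemble error
    on the full training set. candidate_preds[0] is the initial benchmark network."""
    kept = [np.asarray(candidate_preds[0], dtype=float)]
    best = normalized_error(np.mean(kept, axis=0), targets)
    for cand in candidate_preds[1:]:
        trial_set = kept + [np.asarray(cand, dtype=float)]
        trial = normalized_error(np.mean(trial_set, axis=0), targets)
        if trial < best:                         # retained only if the error drops
            kept, best = trial_set, trial
    return kept, best

def region_of(fx_change, gold_change):
    """Region 0..3 from the signs of last week's exchange-rate and gold changes."""
    return 2 * int(fx_change > 0) + int(gold_change > 0)

def ensemble_predict(expert_outputs, weights, region):
    """Equation (1): weights has shape (n_experts, 4); use this region's column."""
    w = np.asarray(weights, dtype=float)[:, region]
    return float(np.dot(w, expert_outputs) / w.sum())
```

With all weights equal to 1, `ensemble_predict` reduces to the plain mean of the expert outputs, and `normalized_error` of an all-zero prediction is exactly 1 whenever no target is zero, which is the naive benchmark used in the results.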
The power of the ensemble comes from the way that the weights of each network are set and updated. Following training, an ensemble is considered to contain experts on the varied range of factors that impact the market. Initially each expert is weighted equally by a vector of 1's, one entry for each portion of the input space. In order to make accurate predictions, the weights of the networks must be adjusted for the current market situation. This method relies on the assumption that the factors influencing the price of platinum exist in a bounded space and vary slowly. After each sample becomes known, the weights are recalculated over the 10 previous samples. It is not necessary to retain the past input data, as each network's output for those samples will not change. The most recent sample is given the most significance, as shown in figure 2.

Fig. 2. Decreasing contribution of older samples to the weight

The weight in each region tends exponentially towards 1, so that if a weight (positive or negative) is not being continuously reinforced, it loses its significance. This prevents any weight from dominating the ensemble after it has become irrelevant.

3) Results: The performance of an ensemble in predicting the future price of platinum is measured by equation 2. A naïve prediction would guess that there is no change in the price each week, leading to an error of 1, since with $y(n) = 0$ every term of equation 2 equals 1. This is the benchmark against which any prediction is compared. A number of tests are run to compare the performance of different models. In each case, 10 ensembles are trained and tested; the average performance is given in table II.
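The weight update described in 2) can be sketched as follows. The paper fixes only the ingredients (weights start at 1, are recalculated from the 10 most recent samples, older samples count less, and unreinforced weights relax towards 1), so the rest is a labeled assumption: the class name `DynamicRegionWeights`, the decay factor of 0.7, judging a network "correct" when it predicts the right direction of change, and reusing the 1.2 and 0.4 factors from the Section III test. Each factor is applied with an exponent that shrinks geometrically with the sample's age, so an unreinforced weight decays exponentially back to 1 and returns to exactly 1 once its sample leaves the 10-sample window. Only network outputs and targets are stored, not raw inputs.

```python
import math
from collections import deque

WINDOW = 10                 # from the paper: weights recomputed over the 10 previous samples
DECAY = 0.7                 # assumption: how quickly an older sample's influence fades
REWARD, PENALTY = 1.2, 0.4  # assumption: factors reused from the Section III test

class DynamicRegionWeights:
    """Hypothetical helper: per-region expert weights recomputed from recent samples."""

    def __init__(self, n_experts):
        self.n_experts = n_experts
        # Each entry: (region, expert outputs, target). Raw inputs are not kept.
        self.history = deque(maxlen=WINDOW)

    def observe(self, region, expert_outputs, target):
        """Record a sample once its correct output (next week's change) is known."""
        self.history.append((region, list(expert_outputs), target))

    def weights(self, region):
        """Weights for one region; all 1.0 when no recent sample fell in it."""
        log_w = [0.0] * self.n_experts
        # age 0 is the newest sample, so it carries the largest exponent
        for age, (r, outputs, target) in enumerate(reversed(self.history)):
            if r != region:
                continue
            for k, y in enumerate(outputs):
                correct = (y > 0) == (target > 0)  # assumption: direction-based score
                factor = REWARD if correct else PENALTY
                log_w[k] += (DECAY ** age) * math.log(factor)
        return [math.exp(v) for v in log_w]

    def predict(self, region, expert_outputs):
        """Equation (1) using the current dynamic weights of the input's region."""
        w = self.weights(region)
        return sum(wk * yk for wk, yk in zip(w, expert_outputs)) / sum(w)
```

Working in log space keeps every weight strictly positive, so the weighted average of equation (1) is always well defined, and a single exponent per sample gives the decreasing contribution of older samples in one place.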
TABLE II
PREDICTION ACCURACY WITH DIFFERENT WEIGHTING SCHEMES
(error of equation 2; the no-change benchmark scores 1)

Weighting scheme    4 weeks    10 weeks    20 weeks
Unweighted          1.1374     1.1155      1.0265
Static Weight       0.8602     0.9708      1.024
Dynamic Weight      0.4461     0.5223      0.7416

The ensembles with constantly updated weights (Dynamic Weight) clearly outperform the ensembles which are unweighted or statically weighted. The statically weighted ensembles are weighted at the start of the test period, but those weightings then remain fixed. This gives an advantage in the short term, but over a longer period it does not improve the performance at all.

The results of table II are achieved by ensembles in which each network is trained for 20 epochs. Increasing this period to 40 epochs improves the performance of the dynamically weighted ensembles to 0.4069 over 11 weeks. Taking the square root, this corresponds to an average error of about 63%, which is considered good for data of this type [11].

The predicted change is plotted along with the actual change in figure 3. This shows the accuracy of the predictions, which are close to the actual values and follow the shape well. The worst predictions are on samples with the largest change, and in these cases the predictor makes a smaller prediction in the correct direction. In addition to the accuracy of prediction, it is significant that the direction of the change is predicted correctly for each week in this example. This is a useful property for a financial predictor to have. The problem can be reduced to a binary classification of whether the price will increase or decrease during the following week; this model generally has an accuracy of about 90% for that binary classification.

Fig. 3. Predicted and actual weekly price change

V. CONCLUSION

Making predictions of commodity prices, such as platinum, is a difficult and attractive task. This neural network approach provides good predictions of the platinum price. This is achieved via a mixture of experts with dynamic weights. The weights attributed to each network are functions of both the region of the input and time. This is a novel approach which is well suited to this complex system, which changes over time. The ensemble is able to adapt without retraining, which saves both computational time and memory. The results show that this is a powerful method of using neural networks for time series prediction.

REFERENCES

[1] Georg Dorffner. Neural networks for time series processing. Neural Network World, 6:447–468, 1996.
[2] Madan M. Gupta, Liang Jin, and Noriyasu Homma. Static and Dynamic Neural Networks. IEEE Press, 2003.
[3] Bekir Genc. Where is platinum heading? School of Mining Engineering, University of the Witwatersrand, Johannesburg, South Africa.
[4] Robi Polikar. Ensemble based systems in decision making. IEEE Circuits and Systems Magazine, 2006.
[5] Kagan Tumer and Joydeep Ghosh. Analysis of decision boundaries in linearly combined neural classifiers. Pattern Recognition, 29(2):341–348, 1996.
[6] Robert A. Jacobs, Michael I. Jordan, Steven J. Nowlan, and Geoffrey E. Hinton. Adaptive mixtures of local experts. Neural Computation, 3(1):79–87, 1991.
[7] Nick Littlestone and Manfred K. Warmuth. The weighted majority algorithm. Inf. Comput., 108(2):212–261, 1994.
[8] R.P.W. Duin, P. Juszczak, P. Paclik, E. Pekalska, D. de Ridder, D.M.J. Tax, and S. Verzakov. PRTools4.1, a Matlab toolbox for pattern recognition. Technical report, Delft University of Technology, 2007.
[9] Matthew Karnick, Metin Ahiskali, Michael D. Muhlbaier, and Robi Polikar. Learning concept drift in nonstationary environments using an ensemble of classifiers based approach. International Joint Conference on Neural Networks, 2008.
[10] Christophe Andrieu, Nando de Freitas, Arnaud Doucet, and Michael I. Jordan. An introduction to MCMC for machine learning. Machine Learning, 2003.
[11] Chan Man-Chung, Wong Chi-Cheong, and Lam Chi-Chung. Financial time series forecasting by neural network using conjugate gradient learning algorithm and multiple linear regression weight initialization. Technical report, The Hong Kong Polytechnic University, 1997.
