STORM - A Novel Information Fusion and Cluster Interpretation Technique
Authors: Jan Feyereisl, Uwe Aickelin
School of Computer Science, The University of Nottingham, NG8 1BB, UK
{jqf,uxa}@cs.nott.ac.uk

Abstract. Analysis of data without labels is commonly subject to scrutiny by unsupervised machine learning techniques. Such techniques provide more meaningful representations, useful for better understanding of a problem at hand, than by looking only at the data itself. Although abundant expert knowledge exists in many areas where unlabelled data is examined, such knowledge is rarely incorporated into automatic analysis. Incorporation of expert knowledge is frequently a matter of combining multiple data sources from disparate hypothetical spaces. In cases where such spaces belong to different data types, this task becomes even more challenging. In this paper we present a novel immune-inspired method that enables the fusion of such disparate types of data for a specific set of problems. We show that our method provides a better visual understanding of one hypothetical space with the help of data from another hypothetical space. We believe that our model has implications for the field of exploratory data analysis and knowledge discovery.

1 Introduction

The machine learning community embraces two types of learning that encompass the majority of algorithms present within this field: supervised learning, where examples of data of interest exist, and unsupervised learning, where no explicit examples are available. When examples are present, a decision function can be found by exploiting the knowledge of such examples.
On the other hand, without such knowledge, only similarity between data can be exploited in order to find groups of data that share some common attributes [1]. The human immune system has inspired a number of algorithms that fall into these two categories [3], yet it does not simply operate only within these two realms. Knowledge is embedded within DNA passed down from generation to generation, eventually transforming into biological entities or functionalities that provide additional knowledge to what is learned during the lifetime of a living being. One example of such inherited knowledge can be found within Toll-like receptors (TLRs), present on several types of immune cells [4]. In this work we will show that an analogy of TLRs provides an insight into a third class of learning that encodes knowledge that is not within the same hypothetical space as knowledge encoded within a training or testing dataset. Such a type of learning becomes especially useful where no labelled examples exist, but some knowledge about classes of interest is acknowledged. We believe that such incorporation provides for better understanding of underlying data based on more than blind function approximation.

In the remaining sections of this paper we first outline the functionality of TLRs, followed by our hypothesis. A description of the underlying machine learning algorithm is then presented. A theoretical specification of our StOrM model is then described, outlining a cluster interpretation technique stemming from our model. This is followed by experimental evidence confirming our hypothesis.

2 Toll-Like Receptors

TLRs are a set of receptors on the surface of immune cells which act as sensors to foreign microbial products.
One interesting aspect of these receptors is that they act like piano keys: a different sound is played when a different key or combination of keys is pressed at once. In a similar way, when a single receptor senses a chemical, it results in a different action performed than when a number of receptors sense various chemicals within a specific time period [7]. A simple definition of TLRs is that they are the initial detectors of pathogens attacking a system. They sound an alarm when they encounter certain virus- or bacteria-specific chemicals, which triggers a cascade of events potentially resulting in an immune response. Unlike in many other parts of the immune system, all this is possible due to evolved knowledge passed down from parents to offspring over many generations. For a detailed description of research in the area of TLRs the reader is directed to [4].

2.1 TLRs and Learning

The idea of TLRs for the purpose of learning is a very simple one. It is a direct translation of the receptors' functionality within the human body. A TLR is a signature or a function, encoding some known truth. A set of TLRs, on the other hand, encodes a class of interest. Our hypothesis is that by formulating data from one hypothetical space as a set of TLRs, we will be able to combine information from two disparate spaces in a way that will give us a better understanding of the problem. This fusion of information is especially useful in cases where some knowledge about data of interest is known, yet few or no examples exist. Vapnik [10] also realised this missing area between supervised and unsupervised learning and proposed a related idea, which he terms "master-class learning".
In his work, however, he proposes an extension of a supervised learning setting, where a training dataset belonging to space χ, with labels, is supplemented with an additional description of this data in another space χ*. This description of data is called "hidden information", which can exist in the form of expert knowledge describing the underlying problem. Vapnik combines his model with his support vector machine algorithm and shows that a poetic description [10] of a set of images of numbers provides more useful knowledge for learning than a higher resolution image, which holds more "technical" information about the underlying digits. In Vapnik's work a poetic description is a poet's textual depiction of the underlying image. Vapnik's aim is to improve the classification performance of supervised function estimation based on this "hidden information". In contrast, we propose a method to fuse expert knowledge (one hypothetical space) with "technical" information (another hypothetical space) for the purpose of unsupervised analysis and visualisation for better exploratory data analysis.

2.2 TLR Model

Using Vapnik's notation we can formalise our TLR analogy. In supervised learning a pair (x_i, c_i) is given, where x_i denotes a vector of some dimensionality and c_i denotes a class label. In unsupervised learning only (x_i) is given. In our model a tuple (x_i, s_i) is given, where s_i denotes a data structure which encodes some additional knowledge or side information about a data instance. As our knowledge might be limited, s_i might be empty when no knowledge exists. Data in s_i belongs to a space χ*, which is related to χ. By this we mean that there exists a meaningful correlation between χ* and χ.
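As a minimal illustration, the tuple representation just described might be sketched in Python as follows; the dataclass layout, feature values and API-call strings are all invented for the example and are not taken from the paper:

```python
# Hypothetical sketch of the (x_i, s_i) tuples described above: each instance
# carries a feature vector x from space chi and optional side information s
# from the related space chi*. All concrete values here are made up.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Instance:
    x: List[float]                               # "technical" data from chi
    s: List[str] = field(default_factory=list)   # side information from chi*; may be empty

data = [
    Instance(x=[0.12, 0.80, 0.05], s=["WSAStartup", "InternetOpenA"]),
    Instance(x=[0.40, 0.10, 0.33]),              # no expert knowledge for this instance
]

# s_i may be empty, so any learning step must tolerate missing side information
print([len(inst.s) > 0 for inst in data])
```

The important design point, reflected in the optional `s` field, is that side information is per-instance and may be absent entirely.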
Without such correlation, we can state that the knowledge represented in χ* is not descriptive of χ. In order to incorporate expert knowledge as part of a learning mechanism, we propose the use of an established machine learning technique, which provides numerous features that are beneficial to our model. A description of this technique follows.

3 The Self-Organising Map

Self-organising network algorithms provide a number of mechanisms desirable for many computational tasks. Features such as manifold learning, dimensionality reduction, multidimensional scaling as well as clustering and visualisation through self-organisation are all mechanisms that in combination provide a suitable basis for the incorporation of our TLR analogy and for its better understanding. One type of self-organising network is the Self-Organising Map (SOM) algorithm developed by Teuvo Kohonen [5]. For a detailed description of the algorithm the reader is referred to Kohonen's extensive book on this topic [6]. It is important to note, however, that the SOM is only one of many types of algorithms that we believe our model can be applied to. In general, any topology-preserving and manifold learning algorithm could possibly be extended in order to achieve a comparable outcome. SOM was chosen due to its simplicity, speed and visualisation capabilities.

4 StOrM - TLR Enhanced SOM

4.1 Model Overview

In order to fuse knowledge from disparate hypothetical spaces with the unsupervised learning outcome of the SOM, we extend the original algorithm in a number of ways. These can be divided into two categories. Firstly, data fusion and correlation is performed through an extension of the original SOM.
Secondly, a cluster interpretation algorithm is devised, exploiting the extended SOM in order to provide a better visual representation of underlying data.

Fusion of Hypothetical Spaces - In the original SOM, the algorithm is presented with an input

    x = [ξ_1, ξ_2, ..., ξ_n]^T ∈ R^n    (1)

where ξ is an attribute of the input vector x. This is the "technical" information on which the SOM is trained. In our experiments this vector, for example, comprises normalised real-valued data describing a time-based snapshot of behaviour of one running process according to a number of host-based measures. In our model two additional inputs exist. First, an input for the separate hypothetical space, which is in the form

    s = [ς_1, ς_2, ..., ς_m]^T    (2)

where s is a vector, however this time comprising an arbitrary number, m, of variables ς that encode instance-specific information related to x, from space χ*. In our experiments this vector comprises all API calls that the process, whose snapshot is encoded by x, imports. Second, a vector e encoding our expert knowledge exists,

    e = [ε_1, ε_2, ..., ε_m]^T    (3)

Here vector e can, however, comprise not only fixed data, ε, but also functions, ε, that express the expert's set of knowledge that is desired to be observed and identified within s or x. In contrast to s, vector e is a global, rather than a per-instance vector. Returning to our immunological analogy, e is our repertoire of TLR receptors, where each ε is an individual receptor. In our experiments e is a vector of strings that is representative of the majority of API call names associated with networking functionality in the Windows OS. Each element ε is representative of a subclass of networking functions (e.g.
ε_1 = 'http'). Once the original SOM is presented with input x, it finds the most similar prototype vector and its associated node, also called the "winner node" c,

    ||x − m_c|| = min_i { ||x − m_i|| }    (4)

This node, along with all nodes in its immediate neighbourhood, is subsequently subject to a learning process, over a predefined time period, with a discrete time-coordinate t,

    m_i(t + 1) = m_i(t) + h_ci(t)[x(t) − m_i(t)]    (5)

Here the function h_ci denotes the neighbourhood function which determines the amount by which a prototype vector m_i is affected during the learning process. This depends on node i's distance from the "winner node" c. Generally, the following smoothing kernel, written in terms of the Gaussian function, is used,

    h_ci(t) = α(t) · exp(−||r_c − r_i||² / 2σ²(t))    (6)

where α denotes the learning rate and σ defines the width of the kernel. Variables r_c and r_i are location vectors of the winner node c and the currently observed node i in the output grid. For a more detailed explanation see Kohonen's book [6]. Once this learning terminates, the algorithm presents a discrete regular grid containing a lower dimensional representation of the input which preserves the topology of the learned data. This grid comprises nodes i, with which reference vectors m_i are associated, that hold the learned information. In our model an additional reference vector, l_i, exists. This vector learns information according to the following additional computation step,

    l_c(t + 1) = l_c(t) ∨ Λ(s(t), e(t), x(t))    (7)

where Λ is a matching function that evaluates which elements of s and x satisfy conditions specified within e. In other words, this function evaluates which TLRs have been activated for the currently observed winner node c.
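As a concrete illustration, the update steps of Eqs. (4)-(7) can be sketched in a few lines of Python. The grid size, learning rate, kernel width, receptor strings and input values below are all illustrative choices, not the paper's actual configuration, and the substring test used for Λ is merely one plausible matching function:

```python
# Illustrative single StOrM update, following Eqs. (4)-(7). The matching
# function below (substring test on imported API names) is one plausible
# choice of Lambda; the paper leaves its exact form to the expert.
import math
import random

random.seed(0)
GRID = [(i, j) for i in range(3) for j in range(3)]        # 3x3 output grid
m = {n: [random.random(), random.random()] for n in GRID}  # prototype vectors m_i
e = ["http", "wsa"]                                        # TLR repertoire (assumed)
l = {n: [False] * len(e) for n in GRID}                    # learned TLR vectors l_i

def dist(a, b):
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

def matching(s):
    # Lambda in Eq. (7): a receptor fires if its string occurs in an import name
    return [any(r in call.lower() for call in s) for r in e]

def som_step(x, s, alpha=0.5, sigma=1.0):
    c = min(GRID, key=lambda n: dist(x, m[n]))             # Eq. (4): winner node c
    for n in GRID:
        h = alpha * math.exp(-dist(c, n) ** 2 / (2 * sigma ** 2))   # Eq. (6)
        m[n] = [mi + h * (xi - mi) for mi, xi in zip(m[n], x)]      # Eq. (5)
    l[c] = [old or new for old, new in zip(l[c], matching(s))]      # Eq. (7)
    return c

winner = som_step(x=[0.9, 0.1], s=["HttpOpenRequestA", "CreateFileW"])
print(l[winner])
```

Note that only the winner's l_c is OR-updated, while every prototype m_i moves under the neighbourhood kernel, mirroring the asymmetry between Eqs. (5) and (7).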
Operator ∨ is a Boolean OR on elements of l and the output of Λ, i.e. l_c learns which known truth has been observed by a winner node c over the duration of the SOM learning process. The result of this additional step is that desired knowledge from the separate hypothetical space is correlated with the produced map. This correlation is exhibited by the enhanced SOM output containing nodes i which have two associated reference vectors: one "technical", m_i, from χ, and one of correlated expert knowledge, l_i, extracted from the separate hypothetical space χ*. In our experiments l_i learns whether node i has ever been deemed a winner for some input x associated with a process that uses Windows networking API functions. The encoding of information from χ* can now be used, for example, for cluster interpretation and labelling. In our experiments it is exploited to delineate a cluster of nodes responsible for networking behaviour. It is important to note that this information can, however, be exploited in other ways to enhance the output of the SOM, for example by affecting the actual SOM learning function in order to include information from χ* in the actual map generation process.

Cluster Interpretation - Due to the topology-preserving nature of the SOM, newly introduced expert knowledge can now be used to identify clusters of interest within the output map. Our proposed cluster interpretation technique comprises two steps. Firstly, an established algorithm called the Unified Distance Matrix (U-Matrix) [9] is exploited in order to find nodes which possibly lie on cluster boundaries.
This information is subsequently used by a step which connects nodes with similar TLR information, l_i, in order to delineate clusters of interest.

Cluster Boundary Search - The cluster boundary search algorithm exploits an idea incorporated within the U-Matrix visualisation technique. This method shows dissimilarities between neighbouring nodes in order to highlight where possible cluster boundaries lie. In order to find nodes which lie on such boundaries, we propose to collect information about all inter-node distances along both the i and j dimensions. Once this information is obtained, it is subjected to a variability function which identifies distances between nodes that significantly differ from distances between the majority of nodes on the map. This function is subject to further future research; however, here we provide some pointers on how it might operate and how it is employed in our experiments. A well trained SOM map can be thought of as comprising the clusters existing within the dataset on which it is trained. A careful examination of the proportion of clusters versus inter-cluster nodes needs to be performed in order to determine such a proportion correctly. An example of a quantitative measure that could be used to represent such a ratio is as follows. Assuming 25% of nodes within a generated map are inter-cluster nodes, we define any inter-node distance above the 3rd quartile of all inter-node distances as lying on a cluster boundary. Thus we can label all nodes whose dissimilarity is above the 3rd quartile threshold as being cluster boundary nodes.
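Under the 3rd-quartile rule just described, a boundary check can be sketched as follows over a toy row of prototype vectors; the prototype values and the one-dimensional layout are purely illustrative (the paper works over both grid dimensions):

```python
# Sketch of the boundary search described above: collect neighbour distances
# and flag gaps whose dissimilarity exceeds the 3rd quartile of all gaps.
# The prototype values are invented; a deliberate jump sits between
# indices 3 and 4, standing in for a cluster boundary.
import statistics

protos = [[0.0, 0.0], [0.1, 0.0], [0.2, 0.1], [0.2, 0.2], [1.0, 1.0], [1.1, 1.0]]

def d(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b)) ** 0.5

gaps = [d(protos[i], protos[i + 1]) for i in range(len(protos) - 1)]
q3 = statistics.quantiles(gaps, n=4)[2]          # 3rd quartile of all gaps
boundary = [i for i, g in enumerate(gaps) if g > q3]
print(boundary)
```

Only the large jump between the two groups of prototypes exceeds the quartile threshold, so a single boundary position is reported.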
Labelling using Node Connectivity - Once we obtain cluster boundary information, we can use our learned TLR knowledge in order to label clusters that exist within the SOM map. In order to achieve this, the preservation of topology within a SOM map is exploited by exploring neighbouring nodes within a segment of the map delineated by boundary nodes. The labelling algorithm traverses all nodes m_i within our map and connects nodes which lie within a neighbouring region. A region is usually bounded by the previously found boundary nodes. Once all possible nodes are connected, a connected region is evaluated for the most frequently occurring activated TLR type. This information then provides a label for all nodes within such a connected region. An example map where the result of these steps can be seen is in Fig. 1(c).

In the literature other SOM cluster interpretation techniques exist. These techniques exploit various additional machine learning methods to achieve their goal. For example, a two-stage procedure, where SOM output is fed into a traditional clustering technique, such as k-means [11] or hierarchical clustering [12], evaluated with the help of numerous cluster validity indices. Similarly to our work, Brugger et al. [2] also exploit the topographic surface of the SOM, however with the help of an algorithm called Clusot, rather than the U-Matrix method used in our work. It is important to note here that our model is not only a cluster interpretation or labelling technique. Our model provides a method for correlating data from disparate sources, which in this paper is used to identify and subsequently label clusters of interest, without traditionally labelled data.
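The connectivity-based labelling step described above can be sketched as a flood fill over a toy grid; the grid size, the boundary wall and the per-node TLR tags below are invented for illustration:

```python
# Sketch of connectivity-based labelling: flood-fill a grid segment bounded
# by boundary nodes, then label the whole region by its most frequently
# occurring activated TLR type. All concrete values here are made up.
from collections import Counter, deque

W, H = 4, 3
boundary = {(2, 0), (2, 1), (2, 2)}                    # a vertical boundary wall
tlr = {(0, 0): "http", (1, 1): "http", (0, 2): "dns",  # activated TLR per node
       (3, 0): "ftp", (3, 2): "ftp"}

def region(start):
    # connect all neighbouring nodes reachable without crossing a boundary node
    seen, todo = {start}, deque([start])
    while todo:
        i, j = todo.popleft()
        for n in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
            if 0 <= n[0] < W and 0 <= n[1] < H and n not in seen and n not in boundary:
                seen.add(n)
                todo.append(n)
    return seen

left = region((0, 0))
label = Counter(tlr[n] for n in left if n in tlr).most_common(1)[0][0]
print(label)
```

The region grown from (0, 0) stops at the boundary wall, so the "ftp"-tagged nodes on the far side do not influence its majority label.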
The model can, however, be used for other purposes which could benefit from the exploitation of the data fusion that results from our TLR functionality. As mentioned before, for example, the SOM learning function can be affected to take into account data from the separate hypothetical space. This is one of our future research goals.

5 Experiments

Two experiments were performed in order to validate our proposed model: one to validate the StOrM model and one to present it with a more complex dataset. Datasets comprising χ and χ* needed to be chosen. As χ* comprises expert knowledge which correlates with χ, such expert knowledge had to be found. Behavioural analysis of running processes was chosen as the target domain and discrimination of networking applications as the problem area. Abundant expert knowledge exists in this domain.

Technical Data - Behaviour of Running Processes: In order to collect "technical" data, χ, the Microsoft Performance Counters API [8] was used. The following seven process-specific attributes and one system-wide attribute were selected to be monitored: IO Write Operations/sec, IO Read Operations/sec, IO Other Operations/sec, IO Data Operations/sec, % Privileged Time, % Processor Time, % User Time, Datagrams Sent/sec. This set of eight features yields the ability to observe the behaviour of running processes based on their I/O activity, CPU and network usage on the Windows OS. For a detailed explanation of each attribute, the reader is referred to [8]. The data was normalised and transformed into an 8-dimensional input feature vector, x.
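The construction of x from these eight counters can be sketched as below. The counter readings and the min-max normalisation ranges are invented for the example, since the paper does not specify its normalisation scheme; in practice the readings would come from the Performance Counters API:

```python
# Hypothetical sketch of turning one snapshot of the eight monitored counters
# into the normalised feature vector x. Readings and ranges are made up.
counters = {
    "IO Write Operations/sec": 120.0, "IO Read Operations/sec": 300.0,
    "IO Other Operations/sec": 40.0,  "IO Data Operations/sec": 420.0,
    "% Privileged Time": 12.0,        "% Processor Time": 35.0,
    "% User Time": 23.0,              "Datagrams Sent/sec": 15.0,
}
ranges = {k: (0.0, 500.0) for k in counters}      # per-attribute min/max (assumed)
for k in ("% Privileged Time", "% Processor Time", "% User Time"):
    ranges[k] = (0.0, 100.0)                      # percentages already bounded

# min-max normalisation into [0, 1], one component per monitored attribute
x = [(counters[k] - lo) / (hi - lo) for k, (lo, hi) in ranges.items()]
print(len(x))
```

Each snapshot of one running process thus yields one 8-dimensional input vector for the SOM.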
TLR Knowledge - Static Analysis of Executables: As we desire to discriminate between processes that perform networking activity and non-networking applications, suitable expert knowledge from χ* had to be chosen. A set of Windows API calls used for network communication in the Windows OS was selected from the MSDN library [8]. This library is a resource where expert knowledge on various Windows-specific libraries is presented and categorised according to various system functions. The following set of strings, representative of numerous API calls, was chosen from a set of libraries that are used for networking within the operating system: Internet, Ftp, Http, WinHttp, WSA, Rpc, Uuid, Dns, Dhcp, Netbios, Net, Snmp, WNet. These strings, which represent more than 90% of Windows networking functions, were selected as our TLR knowledge and encoded in e. Static binary analysis of running processes was then performed in order to evaluate which API calls each process imports. This information was subsequently transformed into the input feature vector s, representing the API calls present in an executable.

5.1 Results

Experiment I - In the first experiment two running processes were observed for a duration of approximately 150 seconds.
Namely, the editor Notepad and the messaging client MSN Live Messenger.

Fig. 1. SOM and StOrM results for Experiment I ((a) Components, (b) U-Matrix, (c) StOrM, (d) StOrM + labels) - In (c) and (d) the "@" sign denotes nodes on cluster boundaries, lines connect nodes belonging to the cluster of interest and numbers show the amount of TLRs flagged during training.

These two applications were chosen due to their difference in terms of networking functionality. In this experiment we want to show that by incorporating extra knowledge as part of the SOM algorithm, we can identify which cluster within the SOM output corresponds to networking activity and thus identify the Messenger application.
If we provide any machine learning algorithm with our "technical" data then we can generate a set of clusters, but without labels, and therefore we are unable to determine which cluster belongs to which activity/process. Using our encoded TLR information we can provide enough information to help us distinguish between clusters that denote the activity of interest and clusters that are irrelevant to our problem. Results of this experiment can be seen in Figure 1. Figure 1(a) shows component planes [6] of the SOM. This is a standard method for visualising the reference vectors of each node in the map. Each component (attribute) is represented as a pie slice, where the size shows the magnitude of this attribute in a given node.

Table 1. Experimental Results

                Nodes (i)   Winner Nodes (m_c)    Mean Net. Level
  Experiment    Total       TLR.on    TLR.off     TLR.on    TLR.off
  I             100         35        14          0.9873    0.0029
  II            400         137       131         0.3151    0.1830

Figure 1(b) shows the map using the U-Matrix [9] method. These two visualisations are standard methods for presenting the output of a SOM. Even though we can see that possibly two clusters exist in our data, it is difficult to distinguish which one corresponds to networking activity and thus to Messenger. Without labels and much understanding of the data, such discrimination is difficult. On the other hand, with the help of our StOrM model and the incorporation of some expert knowledge, we can automatically delineate the cluster of interest, seen in Figure 1(c). This connected set of nodes highlights a cluster representative of the behaviour of the Messenger application. This result can be validated against a map with labels, showing the labelled true cluster region in Figure 1(d).
The experiment was run ten times, yielding the results seen in Table 1. Out of a total of 100 nodes, on average 35% of nodes were winner nodes correlated with given expert knowledge from χ*. Fourteen percent of all nodes were winner nodes that had no correlation at all. In order to confirm the correctness of the correlation we look at the mean networking level for correlated versus uncorrelated winner nodes. We perform this calculation as we are interested in finding groups or clusters of nodes which represent applications denoting networking behaviour. In this experiment correlated nodes, all identified correctly as belonging to Messenger, have an average networking level of almost 99% of the total networking activity present during the experiment, whereas uncorrelated nodes, all belonging to Notepad, have on average below 1%. From the above analysis and the four figures it can be seen that our StOrM model provides a way of correlating expert knowledge with standard "technical" information for the purpose of cluster identification and labelling. This correlation helps with the identification of data or clusters of interest which, without labels, would otherwise be difficult to identify.

Experiment II - In order to assess our StOrM model on a more complex dataset, a larger number of running processes were monitored and subsequently analysed. In total 33 running applications were monitored during a session of standard use of the host machine for approximately 200 seconds. Figure 2 and Table 1 show the results of this experiment. With a more complex dataset it is more difficult to interpret results using the standard techniques seen in Figures 2(a) and 2(b).
It is possible to deduce that a number of clusters exist within the underlying data; however, which of those are the clusters of interest is very difficult to discern. With the help of our StOrM model, however, the understanding of the SOM output becomes much easier; this can be seen in Figure 2(c). Connected regions of the map clearly highlight a cluster of nodes denoting applications that exhibit networking activity due to their implementation of networking functions present in the Windows OS. Labels in Figure 2(d) show that applications that one would intuitively regard as having networking functionality are grouped within the connected region and non-networking applications lie outside of this cluster. This is true for most cases, with a handful of exceptions. These are attributed to the fact that such applications either do not use Windows networking functions or have not been active during the session. To validate the correct delineation of the networking cluster, quantitative analysis was again performed on correlated versus uncorrelated node mean networking activity. In this case the discrimination between the two groups is not as apparent as in experiment one. This is due to two reasons. Firstly, the networking attribute used in our "technical" data is a global measure, rather than a per-process signal. Secondly, not all applications use Windows networking functions for communication. Again 10 runs were performed and analysed. The produced SOM output contains 400 nodes, out of which 137 winner nodes have correlated expert knowledge and 131 winner nodes have no correlation.
The average networking activity for correlated nodes (32% of total networking activity) is approximately 14% higher than that of uncorrelated nodes (18% of total networking activity). This result again confirms that the cluster highlighted by the StOrM model delineates nodes which are representative of the class of interest.

6 Conclusions

In this work we have proposed a novel immune-inspired idea that provides new possibilities for knowledge discovery and exploratory data analysis. The proposed StOrM model incorporates an analogy of the so-called Toll-like receptors from the human immune system. This model provides an insight into a new class of learning where additional knowledge can be fused with traditional "technical" data. This additional information does not need to belong to the same hypothetical space as knowledge encoded within a training or testing dataset. The proposed model is explored with the help of two experiments, grounded within behavioural analysis of running processes on a host system.
This experimental evidence confirms our hypothesis that formulating data from one hypothetical space as a set of TLRs allows information from two disparate spaces to be combined for a better understanding of the problem.

Fig. 2. SOM and StOrM results for Experiment II ((a) Components, (b) U-Matrix, (c) StOrM, (d) StOrM + labels).
a c c s t s t t t i i c t t t l 2 2 2 f f f e e m m m m e e a U s v v d o o o r s e p c h x x x n n n p a h d x x x q o r f a t i o c r l t t a h e l e s h y p r v q h r o t o r t a s o s s t y p s c e o o o o o o a o o o h r l l o s s c s G v v s h e t f o o o a i l o o t r e c g g r G h l l h e e f o i U l t s e p p g g g t r d l h e a a o U t t a s e e p t t G i d 2 a o o e t o o o o o o o o o v e x g x l e U p d a t e p y t h o p p n y t h o p p n y t h o n C i i n l a f o o m c T a W r r d d d d y i n d w w o v u u w m a a s u u s w S e c c W a s e G l l a t t e r a i r e o v n a c r s − m o d c a h r j s s e u c c c a h v h g o t p p p p s s n n t i a s c h m u q l w 2 f c r m m m a o o e e c m m m r f o t t e h a l i o o c s n h t U r h g l s s t d d d d t v a o o h r S l l o s a a d e e s s e s s p c h h x y s n p p s c t t e d r v v a h d x c q t h e r o o o a r f a o p p t o a i l o c r l h t t t t c s h r e l o s s s s s e o s c h t y y y p v p p v p p s c r v s h h c c c e r a o o o o t f o t t t h h h o a l i l o o o r l h t t a o o o s r l l l o s e o s s s c h t y s s s p s c r v v v s h q t t t h e o o t f t o a i l o r l h t a s r l s e o s c h t y p r v s h q h o t f t o o i r l s t a s l s e e s t y p r v a h o r o o c s s s h l e s h t p v a a q r r o t c c r s t a h h o s e h y p p s c a q e r o o r o t a l o c r h t a h r l o s o s c h y p p c c v s h q r o o t f o t i l o o o r l h t t a l o e o s y c r v s h o t t o l h s o t s t p y t h o p p n y t t t h o p p n y t h o p p n y t h o n s e a r M M c s h e O O i n a M M d r c e h x i e n C r d l a e m x e C T i r n l r a v a f o m m y c T w a W r a d d d r y i e n s − d e a o s a u w e r t a c s h s h r S s d e c h p p e a h q r o a r f o t i o c r r l t t a c h l o e s h y p c r v h r o o o l h s t s o e o p t c a s y o t t t r h l c h s o h e o p n p a s y r t t r o h c s t o h o e n p c a r o r o l c h t h o o p c s r o t o l h t o o c s o t l h o s t W i n S C P p y t h o p n y t t t t t 
t t h o p p n y t h o n 4 4 0 0 1 2 0 2 2 0 0 0 0 0 0 2 0 1 1 1 0 @ 0 R T H D C P s s e L a r M M c h O i n M i n d f e o x c e C a i r n l r a f d o m c T a W r W d d d y i n L d L W o o W w g i n L i s n d S L P W o o e r w g a o i n i s r x n d c S y P W h o e r w a o i n s r x d c S y W h o e w a i n s r d c S h o e w a s r c S i e h e x a p r l o c h r v v e m w a r e − a u t h s s d e r v i c e s p y t h o n 0 0 0 @ 2 0 0 0 0 0 0 0 0 0 2 0 1 1 1 1 1 C C C R T H s e D a C r M c P h O L i n M d M e x O e C M M i r n l a f o m c C T a i n l r a f d d d o m P y c a T a i r n d d d t y D o t N e t v m w a r e − a u t h s s d e r v i c s e r r r r v v i c s e e s s r v i c e s 0 0 @ 0 @ 1 2 2 2 0 0 0 0 0 0 2 0 @ 1 @ 0 0 0 1 C s s e C R a C T r r c H h D i n C d M P e x O L e M r C l a m C T l r a a m y C T l r a a a m y y C T l r a a m y T r v v a m y w a r e − a u t h s s d e r v i c s s e r v i c s s e e e r r r r v v v i i i c c c c s e e e s s s r v i c s s e r v i c e s 3 0 0 @ 1 1 0 2 2 0 0 0 0 0 0 0 0 @ 0 @ 0 @ 1 1 1 i e x p l o i i e r e x p l o r C C e C R C T H s s e e D a a C r r c c P s h h e L i i n a d r c e e s h x x e i e n a r d r c e h x i e n r d e x e r v m w a r e − a u t h s s d e r r r v i c e s s e r v i c e s 2 0 @ 1 1 0 2 2 0 0 0 0 0 0 @ 0 @ 0 @ 0 @ 0 @ 1 @ 0 @ 0 @ R T H D R R C T P H L D C P s s e L a a a a a a a a a a r c s h e i n a d r c e h x i e n r d e x e r M O M M P O a M i n M t D O o M t M N O e t M P a i n t D o t N e t s e r v i c s e e s r r r r r v i c s s e s r v i c e s 2 0 @ 0 @ 1 0 0 0 0 0 0 0 @ 0 @ 0 @ 0 @ 0 @ 0 @ 0 @ 1 @ 0 @ 0 @ S k y p e R i T e H x p D R l o C T r P H e L D C i e P x s p e L l a o r r c e s h e i n a d r c e h x i e n r d e P x e a r i n t D o t M N O e t M M O M M O M P a i n t P D a o i t n N t P D D e a t o o i t n N t D e e t t o t N e t s e r v i c e s 2 2 2 2 @ 0 @ 0 0 0 0 0 0 0 @ 0 @ 0 0 0 0 @ 0 @ 0 @ 0 @ 0 @ 0 w i n l o g o n i e x p R l o T T r H e D C P L w i n l o w w g o i n n l o g o n P 
a i n t D o t N e t C C C C C C P a i n t D o t N e t s k y p e s P k y M p e s P k y y M p e s P k y M p e s s P k y M p e P M s e r v i c e s p y t h o n 2 0 2 2 0 @ 0 @ 0 0 0 0 0 @ 0 0 0 0 @ 0 @ 0 @ 0 @ 0 0 w i n l o g o n S k y p e S k y p e i e x p l o r C C e C C C C C C C C s k y p e s s P k y y M p e s s P k y M p e s s P k y M p e s s P k y M p e P M n o t e p a d + + e x p l o r e r 2 0 2 2 0 @ 0 @ 0 @ 0 @ 0 @ 0 @ 0 @ 0 0 0 @ 0 @ 0 @ 0 @ 0 @ 0 @ 0 w i n l o g o n S k y p e S k y p e C C C C C C s k y p e s P k y y y y y y y y y M p e P M n o t e p n n a o d t e + p + a d + + e x p l o r e r 2 0 @ 2 2 2 @ 0 @ 0 0 @ 0 @ 0 @ 0 @ 0 @ 0 @ 0 @ 0 @ 0 0 0 @ 0 @ 0 1 1 w i n l o g o n S k y p e S k y p e S k y p e c s r s s n o t e p n n a o d t t t t t t t e + p + n a a o d t e + p + + n a o d t e + p + a d + + e x p l o r e r 0 0 @ 0 @ 0 @ 0 @ 0 @ 0 0 0 @ 0 @ 0 @ 0 @ 0 @ 0 @ 0 @ 0 0 0 0 0 c s r s s c s r s s l s a s s l s a s s l s a s s n o t e p n n a o d t t t t t t t t t t t t t t t e + p + a d + + e x p l o r e r 1 1 1 @ 1 @ 1 @ 0 @ 0 @ 0 0 @ 0 @ 0 @ 0 0 0 @ 0 @ 0 0 0 0 0 j q s j q s j q s j q s j q s c s r s s c s r s s c s r s s l s a s s l s a s s l s a s s n o t e p a d + + e x p l o e e r e x p r l o e e r e x p p p p p p p p p p p p p p p p p p p p p p p p p p p p p p p p p r l o r e r 1 1 1 1 0 0 0 0 0 @ 0 @ 0 @ 0 0 0 @ 0 @ 0 0 0 0 0 j q s j q s j q s j q s c s r s s c s r s s c s r s s l s a s s l s a s s l s a s s l s a s s n o t e p a d + + e x p l o r e r e x p l o r e r V 2 V 6 V 1 0 V 3 V 7 V 1 1 V 4 V 8 V 5 V 9 ( b ) U - M a t r i x ( c ) S t O r M ( d ) L a b e l s F i g . 2 . S O M a n d S t O r M r e s u l t s f o r E x p e r i m e n t I I domai n provides for meanin gful selec tion of e xp ert kn owledge e nco d ed with in our mo del. Such know le dge is i n t he fo rm o f exp e rt infor mat ion as provided b y existin g functiona l ca tegor isat ion of pro grammi ng metho d s imp lem ente d wi thi n the W indows O S. 
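The U-Matrix panel of Fig. 2 visualises inter-unit distances on a trained SOM, where high values mark cluster borders. As a rough illustrative sketch (not the authors' implementation), a basic U-Matrix can be computed from a SOM's codebook vectors as follows; the `weights` array layout and the 4-connected neighbourhood are assumptions made for this example:

```python
import numpy as np

def u_matrix(weights):
    """Compute a simple U-Matrix for a rectangular SOM.

    weights: array of shape (rows, cols, dim) holding each unit's
    codebook vector. Returns an array of shape (rows, cols) where each
    entry is the mean Euclidean distance from that unit to its
    4-connected grid neighbours. High values indicate cluster borders.
    """
    rows, cols, _ = weights.shape
    umat = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            dists = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    # Euclidean distance between neighbouring codebook vectors
                    dists.append(np.linalg.norm(weights[r, c] - weights[nr, nc]))
            umat[r, c] = np.mean(dists)
    return umat
```

Plotting the resulting matrix as a heat map yields the kind of border visualisation shown in panel (b), which StOrM then augments with expert information.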
Performed experiments show encouraging results in terms of improved visualisation for exploratory data analysis and knowledge discovery, owing to the proposed automatic cluster interpretation algorithm. Our model highlights a unique type of learning which becomes especially useful where no labelled examples exist, but some knowledge about classes of interest is acknowledged. From the field of information security all the way to the medical sciences, there exist many domains where expert knowledge is abundant, yet such knowledge is difficult to incorporate within traditional knowledge discovery techniques. We believe that our technique is a step forward towards combining such disparate knowledge with traditional sources of information, and that such fusion can greatly improve understanding of the problem at hand.