Real-Time Alert Correlation with Type Graphs

Gianni Tedesco and Uwe Aickelin
School of Computer Science, University of Nottingham, Nottingham NG8 1BB, United Kingdom

Abstract. The premise of automated alert correlation is to accept that false alerts from a low-level intrusion detection system are inevitable, and to use attack models to explain the output in an understandable way. Several algorithms exist for this purpose which use attack graphs to model the ways in which attacks can be combined. These algorithms can be classified into two broad categories: scenario-graph approaches, which create an attack model starting from a vulnerability assessment, and type-graph approaches, which rely on an abstract model of the relations between attack types. Some research into improving the efficiency of type-graph correlation has been carried out, but this research has ignored the hypothesizing of missing alerts. We present a novel type-graph algorithm which unifies correlation and hypothesizing into a single operation. Our experimental results indicate that the approach is extremely efficient in the face of intensive alerts and produces compact output graphs comparable to other techniques.

1 Introduction

The output of intrusion detection systems (IDS) is generally a time series of discrete events called “alerts”, each of which describes, at a low level, features of the network traffic. These alert attributes typically include the endpoints and communication channels implicated in an alert and the type of alert.
Arguably the most significant problem with analyzing IDS alerts is the high volume of false alarms. Even without false alarms, IDS alerts require some interpretation, because attacks are often split into several stages, each of which may generate many alerts. This observation has led to the proposal that alerts be automatically correlated using a model of attacks which encodes their prerequisites and consequences [10]. Typically these methods involve representing attack types as vertices in a directed acyclic graph which we shall call an “attack graph”. Edges in attack graphs represent the relationship between the prerequisites and consequences of attacks. Intuitively speaking, a directed edge will connect attack A to attack B if A prepares for B. Research has shown that such techniques are capable of:

1. Aggregating alerts which imply the same, or similar, consequences. An aggregated group of alerts is called a hyper-alert.
2. Ignoring extraneous alerts which do not correlate with anything.
3. Uncovering missing alerts in an alert stream and hypothesizing their attribute values where possible [8]. Hypotheses may optionally be compared against other evidence sources such as system logs [11].

These automated alert correlation techniques may be divided into two categories based on the type of attack model which is encoded in the attack graph. We shall refer to the two categories as type-graph and scenario-graph algorithms.
Scenario-graph algorithms rely on a complete and correct vulnerability assessment to generate a graph of attack sequences specific to the protected network [6]. While this approach allows for real-time automated correlation, it fails completely if network addresses are reassigned or if the vulnerability assessment is erroneous. Conversely, type-graph algorithms model only abstract attack types, which allows for more robust correlation but at a higher computational cost. In [10] correlation is performed in batch mode only, and in [15], where vulnerability assessment data is incorporated, a sliding correlation window is required to keep the problem manageable. We assert that real-time correlation is desirable because it allows for timely automated responses. If the time lag between detection and response is too great, then attacks such as rapidly spreading worms may become much more difficult to contain. Real-time operation also facilitates techniques such as [4], where correlation output is used to perform a targeted forensic analysis of network traffic for the purposes of discovering novel attacks and variations of known attacks. Our work is motivated by the need for a correlation algorithm with both the flexibility of an abstract attack type-graph and performance characteristics similar to state-of-the-art scenario-graph algorithms. Specifically, we wish to avoid relying on prior knowledge of the network topology and of the distribution of vulnerabilities in the protected network.
It is also desirable to avoid relying on a sliding correlation window, which would allow “low and slow” attacks to become lost. The aim of this paper is to develop an automated alert correlation algorithm using attack type graphs which is suitable for deployment in a real-time setting. A theoretical analysis of computational complexity will be provided. For verification, the algorithm will be experimentally evaluated in terms of performance and accuracy. Our proposed solution works by restructuring the type-graph correlation algorithm presented by Ning et al. such that it acts on individual alerts in sequence rather than on all alerts in batch. The basic approach is to keep an internal database of hyper-alerts of each type and to use in-memory indexes to efficiently find the prerequisites of each new hyper-alert. The size of the in-memory database is minimized by eliminating redundant information which does not contribute to the correlation process. Hypothesizing of missing alerts is a recursive special case of the correlation algorithm which can input hyper-alerts with wild-card attributes. The main contribution of this work is a type-graph correlation algorithm suitable for real-time use. The algorithm depends on a novel index structure and unifies the correlation and hypothesizing steps into a single algorithm. This paper is structured as follows. First, a brief discussion of related work is given in section 2. From here we present a description and formal definition of the problem in section 3.
Building on this definition, a solution is presented in section 4 which solves the minimal IAC problem, where there are no false negative alerts. This minimal algorithm is developed into the fuller solution presented and analyzed in section 5. Section 6 provides an empirical analysis of the algorithm. In the final section the results are discussed, conclusions drawn and future work proposed.

2 Related Work

Seminal works such as [5, 13] laid the groundwork for automated analysis of security-related facts and events. These works proposed a formal theory of computer attacks by modeling the prerequisites and consequences of vulnerabilities in attack graphs and formal grammars respectively. Wang et al. take a vulnerability-centric approach to alert correlation [6]. In this work an automated vulnerability analysis [12, 14] creates an attack graph consisting of two types of vertex, attacks and states. Only those attacks which have been found on the protected system are included. All attack vertices are bound with attribute values such as IP addresses and ports. The correlation algorithm works by performing a breadth-first search on the attack graph. High performance is achieved by enumerating all possible fact assignments for every attack type and pre-computing an optimized graph structure for correlation. Another important concept in this work is “implicit correlation”, whereby only the latest alert which satisfies an attack step is stored in memory.
However, we have asserted that it is undesirable to assume that the defender can reliably know of all vulnerabilities on the network. Therefore our work uses an abstract attack-type model, although we do use a similar hypothesizing technique and try to preserve the notion of implicit correlation as far as possible. Ning et al. take a logical approach to modeling attack sequences for automated correlation [2, 9, 10]. The technique is intended to be applied in batch to an off-line database of collected hyper-alerts. The fundamental building block of the approach is the definition of a “hyper-alert type”, which represents a type of attack together with its prerequisites and consequences. Each hyper-alert type consists of a triple of fact names, prerequisites and consequences, where the prerequisites and consequences are predicate expressions with free variables bound from the fact names. If a predicate appears in the consequence of one hyper-alert type and in the prerequisite of another, then the former “may prepare for” the latter. The assignments of facts to any such shared predicate are used to calculate equality constraints between the two types. A hyper-alert of a given type is simply a tuple of attribute values corresponding to the fact names for that type.
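To make the “may prepare for” relation and its equality constraints concrete, here is a minimal Python sketch; it is an illustration, not Ning et al.'s implementation. It assumes each predicate expression is encoded as a pair (predicate name, tuple of argument facts), and the class `HyperAlertType`, the helper functions, the example types `PortScan` and `Exploit`, and the predicates `ServiceExposed` and `Compromised` are all hypothetical.

```python
from dataclasses import dataclass
from itertools import product
from typing import FrozenSet, Set, Tuple

# Hypothetical encoding: a predicate expression is (name, tuple of fact names).
Pred = Tuple[str, Tuple[str, ...]]
# One equality constraint: pairs (u_i, v_i) meaning A.u_i == B.v_i, all conjoined.
Constraint = Tuple[Tuple[str, str], ...]


@dataclass(frozen=True)
class HyperAlertType:
    """A hyper-alert type as a triple (fact, prerequisite, consequence)."""
    name: str
    facts: Tuple[str, ...]
    prereq: FrozenSet[Pred]
    conseq: FrozenSet[Pred]


def equality_constraints(a: HyperAlertType, b: HyperAlertType) -> Set[Constraint]:
    """One constraint per predicate p shared by Conseq(a) and Prereq(b)."""
    constraints = set()
    for (p, us), (q, vs) in product(a.conseq, b.prereq):
        if p == q and len(us) == len(vs):
            constraints.add(tuple(zip(us, vs)))
    return constraints


def may_prepare_for(a: HyperAlertType, b: HyperAlertType) -> bool:
    """a may prepare for b iff the two types share at least one predicate."""
    return bool(equality_constraints(a, b))


FACTS = ("src_addr", "src_port", "dst_addr", "dst_port")
scan = HyperAlertType("PortScan", FACTS, frozenset(),
                      frozenset({("ServiceExposed", ("dst_addr", "dst_port"))}))
exploit = HyperAlertType("Exploit", FACTS,
                         frozenset({("ServiceExposed", ("dst_addr", "dst_port"))}),
                         frozenset({("Compromised", ("dst_addr",))}))

print(equality_constraints(scan, exploit))  # {(('dst_addr', 'dst_addr'), ('dst_port', 'dst_port'))}
print(may_prepare_for(exploit, scan))       # False
```

The scan's consequence `ServiceExposed(dst_addr, dst_port)` matches the exploit's prerequisite, which yields the single constraint dst_addr = dst_addr ∧ dst_port = dst_port; the reverse direction shares no predicate, so no edge is created.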
Correlation is performed in batch on a set of hyper-alerts: each hyper-alert is considered a potential vertex in the correlation graph, and if equality constraints are satisfied between it and other hyper-alerts then they are correlated by adding a directed edge between them, provided that their timestamps show the correct temporal order. Hypothesizing of missing alerts is treated in [8, 11]. The problem here is that when some steps in an attack have been missed by the underlying IDS, the resultant correlation graphs may be split and require additional processing to re-integrate them. The approach taken in their work involves four steps:

1. Subgraphs of the correlation graph are clustered according to the attribute values of their hyper-alerts.
2. Once candidate subgraphs have been selected for integration, a special hyper-alert type-graph is consulted which has had indirect edges added to it. Pairs of subgraphs are then correlated using these new edges to define the indirectly-prepares-for relation.
3. When an indirect correlation occurs, there are one or more paths in the type-graph connecting the two hyper-alert types. New hyper-alerts are created to connect the two correlation graphs, and their attribute values are inferred using the equality constraints in the graph.
4. Because the prior steps may generate many redundant hypotheses with equivalent fact values, a consolidation step reduces the size of the final correlation graph.
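The batch correlation step of Ning et al. described above can be illustrated with a naive sketch that tests every ordered pair of hyper-alerts. It assumes a hypothetical `HyperAlert` record and a hypothetical table of edge labels keyed by (type, type), where each label is a list of alternative conjunctive constraints; a pair is correlated if the earlier alert strictly precedes the later one and at least one constraint holds. This naive form is quadratic in the number of alerts.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

# One conjunctive constraint: pairs (u, v) meaning earlier.u == later.v.
Constraint = Tuple[Tuple[str, str], ...]
# Hypothetical edge labels: (type A, type B) -> alternative constraints (a DNF).
EdgeLabels = Dict[Tuple[str, str], List[Constraint]]


@dataclass
class HyperAlert:
    type: str
    timestamp: float
    attrs: Dict[str, str]


def prepares_for(h: HyperAlert, h2: HyperAlert, labels: EdgeLabels) -> bool:
    """True if h occurs before h2 and at least one constraint on the edge holds."""
    if not h.timestamp < h2.timestamp:
        return False
    return any(all(h.attrs[u] == h2.attrs[v] for u, v in c)
               for c in labels.get((h.type, h2.type), []))


def batch_correlate(alerts: Sequence[HyperAlert], labels: EdgeLabels) -> List[Tuple[int, int]]:
    """Naive batch pass: test every ordered pair and return edges (i, j)."""
    return [(i, j)
            for i, h in enumerate(alerts)
            for j, h2 in enumerate(alerts)
            if prepares_for(h, h2, labels)]


labels = {("PortScan", "Exploit"): [(("dst_addr", "dst_addr"), ("dst_port", "dst_port"))]}
alerts = [
    HyperAlert("PortScan", 1.0, {"dst_addr": "10.0.0.5", "dst_port": "80"}),
    HyperAlert("Exploit", 2.0, {"dst_addr": "10.0.0.5", "dst_port": "80"}),
    HyperAlert("Exploit", 3.0, {"dst_addr": "10.0.0.9", "dst_port": "80"}),
]
print(batch_correlate(alerts, labels))  # [(0, 1)]
```

Only the exploit against the scanned host is linked to the scan; the second exploit targets a different address, so its constraint fails.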
The work presented in this paper takes a different approach and simply relies on recursing backwards through the type-graph whenever a hyper-alert is input which has not had its prerequisites met by another hyper-alert in the system. Our method is at once more efficient and eliminates the consolidation step by terminating recursion as soon as a duplicate hypothesis is generated. In [15] correlation and hypothesizing are performed, again, in batch mode. However, in this case a state/event model is chosen so that evidence from complementary sources, such as vulnerability analysis and raw audit logs, can be incorporated. The attack model is converted into a Bayesian network where prior probabilities are assigned manually by human experts. A sliding time window is used to limit memory usage and prevent a combinatorial explosion in the run-time complexity associated with the Bayesian inference algorithm. Our work is most similar to [7], in which in-memory indexes are used to significantly speed up correlation, leaving the RDBMS just to store a log of hyper-alerts on disk. The most relevant contribution in their work is the proposal to index instances of predicates rather than hyper-alerts. Their results indicate that the algorithm would be suitable for real-time operation, but hypothesizing of missing alerts is not addressed and must presumably be performed as a post-processing step on the correlation graph. The work presented in this paper takes a different approach and instead indexes instances of the PrepareFor relation.
In summary, there are several automated correlation algorithms. Those which are suitable for real-time operation either rely on the defender being able to correctly and completely enumerate possible combinations of attacks on their protected network or, worse, rely on a sliding time window which opens up the correlator to “low and slow” or “alert injection” attacks. The abstract type-graph approach appears more promising and has been partly optimized for real-time deployment. Our work builds on prior techniques by using a novel indexing structure and unifying the correlation and hypothesizing steps into a single real-time algorithm.

3 Problem Definition

For the purposes of clarity, the intrusion alert correlation (IAC) problem will be solved in two steps: firstly the “minimal IAC problem”, in which a totally accurate alert stream is input and no alerts are hypothesized, and secondly the “extended IAC problem”, in which some alerts can be missing and the system must hypothesize alerts. The following problem definition is based on that proposed by Ning et al. [8–11].

Definition 1. An attack model consists of logical predicates, hyper-alert types and implication relations. A hyper-alert type T is a triple (fact, prerequisite, consequence) where fact is a set of attribute names associated with the type, and prerequisite and consequence are sets of predicate expressions with free variables bound from fact.
Prereq(T) and Conseq(T) denote the sets of predicate expressions from the prerequisite and consequence elements of T respectively. For brevity we assume all implied expressions to be included in Conseq(T). We shall refer to the set of all hyper-alert types in an attack model as τ. For the purpose of our examples we will assume that there are always 4 elements in fact (say, source address, source port, destination address and destination port).

Definition 2. Given an ordered pair of hyper-alert types (A, B), A may prepare for B if Conseq(A) and Prereq(B) share at least one predicate, possibly with different arguments.

Definition 3. Given an ordered pair of hyper-alert types (A, B) where A may prepare for B, a set of equality constraints may be computed. Each such constraint is a conjunction of equality comparisons between the attributes of the two types. Let the sequences $u_1, u_2, \dots, u_n$ and $v_1, v_2, \dots, v_n$ be distinct facts in types A and B respectively. Then each constraint takes the form

$$u_1 = v_1 \wedge u_2 = v_2 \wedge \dots \wedge u_n = v_n$$

such that there exist $p(u_1, u_2, \dots, u_n) \in \mathrm{Conseq}(A)$ and $p(v_1, v_2, \dots, v_n) \in \mathrm{Prereq}(B)$, where p is the same predicate with possibly different fact assignments.

Note that the only substantial difference between our definition and that of Ning et al. is the restriction that any given fact may appear at most once in the arguments of a predicate. The purpose of this restriction will become clear in the following sections.

Definition 4. Given an attack model, let us define an attack-type graph TG = (V, E, C, T).
Here (V, E) is a directed acyclic graph and T is a bijection of vertices onto hyper-alert types. An edge $e(v_0, v_1) \in E$ if and only if $T(v_0)$ may prepare for $T(v_1)$. C maps each edge to a set of constraints.

Definition 5. A hyper-alert h is simply a tuple of attribute values. Type(h) is a mapping onto the set of hyper-alert types. Prereq(h) and Conseq(h) denote the sets of predicates from the prerequisite and consequence of the hyper-alert type, with free variables bound from the attribute values of the hyper-alert. Timestamp(h) denotes the timestamp of the hyper-alert. A hyper-alert stream is any time-ordered series of hyper-alerts.

Definition 6. A hyper-alert h of type A is said to prepare for a hyper-alert h′ of type B if and only if Type(h) may prepare for Type(h′) and at least one equality constraint evaluates to true when fact names have been substituted with actual values from the hyper-alerts. Furthermore, since an event B can be the cause of an event C only if B occurs before C, an implicit time constraint ensures that forward causality holds. In other words, the directed edges in TG, like time, move from past to future.

Two hyper-alerts are said to be correlated if and only if the former prepares for the latter. Since all that is required to correlate two hyper-alerts is that any one of the constraints holds, we might say that each edge in TG is labeled with a predicate logic formula, consisting only of equality comparisons, in disjunctive normal form.

Definition 7. The output correlation graph CG is (V, E, H), where (V, E) is a DAG, H is a bijection of hyper-alerts to vertices, and an edge $e(v_0, v_1) \in E$ if and only if $H(v_0)$ prepares for $H(v_1)$.

Definition 8. If a hyper-alert h exists where Prereq(h) is non-empty and there does not exist a hyper-alert h′ such that h′ prepares for h, then h is said to be “unexplained”. An unexplained alert h may sometimes be explained by the construction of a sequence of hypothesized hyper-alerts $y_1, y_2, \dots, y_n$ such that $y_n$ prepares for h, $y_{n-1}$ prepares for $y_n$, ..., and a real (unhypothesized) hyper-alert h′ prepares for $y_1$. There may be several alternative explanations for any such hyper-alert. The extended correlation graph EG therefore consists of (V, E, H, Y), with the same definition as CG with the addition of Y, a mapping of vertices onto the set of hypothesized hyper-alerts which are required to explain any unexplained alerts in H. V is formed by the union of H and Y.

In summary, our problem is to propose an algorithm which:

1. Is initialized with TG and an empty CG.
2. At each time step:
   (a) Inputs a hyper-alert.
   (b) Constructs a correct and complete CG as per definition 7 or, for the extended IAC problem, definition 8.

4 A Minimal Solution

The inner loop of our proposed algorithm consists of two steps: firstly “searching for correlations” and secondly “marking of consequences”. When marking the consequences of a type T hyper-alert h, we find each type T′ such that T may prepare for T′.
Then the equality constraints between the two types are used to index every possible combination of hyper-alert attributes for T′ which should be considered prepared for by h. Each index entry created in this stage contains a pointer to h. Conversely, when searching for correlations, the indexes on type T are searched using the attributes of h. If an earlier hyper-alert h′ has been input and its consequences marked, it will be found during the searching-for-correlations stage if and only if h′ prepares for h. The structure of our index is novel and, by indexing each attribute combination separately, the IAC is reduced to a sufficiently small constant number of search and insert operations on balanced binary trees [1] rather than multi-dimensional searches with wild-cards. This approach exploits two properties of the structure of the problem. Firstly, time flows from past to future, meaning that prior alerts do not need to be checked and correlated twice. Secondly, although the number of possible constraints on a given edge is exponentially related to the number of facts, in practice the number of facts, and therefore the maximum number of indexes required, is small.

Lemma 1. Given a pair of hyper-alert types $(T_0, T_1)$, we take A and B to be their attribute sets. The sets of attributes are equipotent, each containing n elements. Each constraint may be represented as a set containing $0 \le m \le n$ ordered pairs of attributes (a, b) such that $a \in A$ and $b \in B$.
No element of A may appear as a left component more than once, and no element of B may appear as a right component more than once, since by definition 3 the problem is restricted to the simplified case in which each fact referred to in an equality constraint may make at most one appearance on each side of the equation. There are $P(n, m) \cdot C(n, m)$ ways to arrange m distinct pairs from the n elements of A and the n elements of B, where P and C are the permute and choose functions respectively. The number of possible equality constraints is therefore the sum of this quantity over all constraint lengths m.

Proof. Our problem is to construct two sequences $a_1, a_2, \dots, a_m$ and $b_1, b_2, \dots, b_m$ where $a_1$ is paired with $b_1$, $a_2$ is paired with $b_2$, and so on. We shall solve the problem in two separate steps. First we choose m elements of A and m elements of B, and secondly we arrange the pairs. There are $C(n, m)^2$ ways to select a pair $(A', B')$, where $A'$ is an m-combination of the elements of A and $B'$ is an m-combination of the elements of B. To pair them up, we keep the elements of $A'$ in a fixed order and simply count the ways to permute the elements of $B'$. Since there are $m!$ ways to permute m attributes:

$$C(n,m)^2 \cdot m! = \frac{n!}{m!\,(n-m)!} \cdot \frac{n!}{m!\,(n-m)!} \cdot m! = \frac{n!}{m!\,(n-m)!} \cdot \frac{n!}{(n-m)!} = C(n,m) \cdot P(n,m) \qquad \square$$

If we wish to count the maximum number of constraints when there is more than one type of attribute, then we can reuse the formula above to count the ways of comparing the attributes of each type and take the product:

$$\prod_{i=1}^{t} \sum_{j=0}^{c_i} P(c_i, j) \cdot C(c_i, j) \qquad (1)$$

where t is the number of attribute types and $c_i$ is the number of attributes of the i-th type. Therefore, if we choose 4 facts, the source and destination addresses and ports, where addresses and ports are not comparable with each other, then there are 49 possible constraints on an edge in TG: each attribute type has $c_i = 2$ and contributes $1 + 4 + 2 = 7$ terms to the sum, giving $7 \cdot 7 = 49$. Since there are fewer combinations than permutations, the idea is to create an index for each of the 16 ($2^4$) combinations of facts for each type. Permutations capture the possibly different orderings of the attributes in the equality constraints and will be used when inserting items into the indexes.

Algorithm 1 requires several further definitions to determine which combinations of fields must be indexed for each type, and how to evaluate the consequences of each hyper-alert so that they can be marked. A notion similar to implicit correlation in [6] is introduced. If two hyper-alerts have identical attribute values then they must also have identical consequences, meaning that the correlation procedure is redundant the second time around. We define an implicit correlation so that all hyper-alerts of a given type are indexed based on the combination of fact values which are used in marking of consequences.

Definition 9. The CorrelationSet is a relation on a given pair of types (T, T′), such that CorrelationSet(T, T′) is a set of pairs of the form (a, b), where a is a permutation of facts in T and b is a subset of facts in T′, such that a and b are equipotent and there exists an equality constraint of the form $u_1 = v_1 \wedge u_2 = v_2 \wedge \dots \wedge u_n = v_n$, where $u_1, u_2, \dots, u_n$ is the sequence a and $v_1, v_2, \dots, v_n$ are the elements of b arranged into a fixed order.

Definition 10. The IndexSet is a relation on a given type T and the set of all types τ which returns the subsets of facts in T which must be indexed. IndexSet(T, τ) returns every subset x of facts of T where there exists a T′ such that T′ may prepare for T and x is the right component of an element of CorrelationSet(T′, T).

Definition 11. The ImplicitSet is a relation on a given type T and the set of all types τ which returns a set of facts in T upon which future correlations may depend. ImplicitSet(T, τ) returns the union of every subset x of facts in T where there exists a T′ such that T may prepare for T′ and x is the left component of an element of CorrelationSet(T, T′).
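Definitions 9 to 11, together with Algorithm 1 which follows them, can be sketched in Python as below. This is an illustrative approximation under stated assumptions, not the authors' C implementation [3]: plain dictionaries stand in for the balanced binary trees, each constraint is canonicalized by sorting its pairs on the T′ fact (so b is the fixed-order subset and a the aligned permutation of T's facts), and all type and fact names are hypothetical. One deliberate deviation from the pseudocode: when an alert's ImplicitSet values have been seen before, this sketch still searches for correlations and only skips the redundant marking of consequences.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Facts = Tuple[str, ...]
# One conjunctive constraint on edge (T, T'): pairs (u, v) meaning T.u == T'.v.
Constraint = Tuple[Tuple[str, str], ...]
EdgeLabels = Dict[Tuple[str, str], List[Constraint]]


def correlation_set(labels: EdgeLabels, t: str, t2: str) -> Set[Tuple[Facts, Facts]]:
    """CorrelationSet(T, T'): pairs (a, b) where b is a fixed-order subset of
    the facts of T' and a lists the facts of T aligned with b."""
    result = set()
    for c in labels.get((t, t2), []):
        pairs = sorted(c, key=lambda uv: uv[1])  # fixed order on T' facts
        result.add((tuple(u for u, _ in pairs), tuple(v for _, v in pairs)))
    return result


class MinimalCorrelator:
    """Streaming correlation of a complete alert stream (no hypothesizing)."""

    def __init__(self, labels: EdgeLabels):
        self.index_set: Dict[str, Set[Facts]] = defaultdict(set)   # IndexSet(T)
        self.implicit_set: Dict[str, Set[str]] = defaultdict(set)  # ImplicitSet(T)
        self.marks: Dict[str, List[Tuple[str, Facts, Facts]]] = defaultdict(list)
        for t, t2 in labels:
            for a, b in correlation_set(labels, t, t2):
                self.index_set[t2].add(b)
                self.implicit_set[t].update(a)
                self.marks[t].append((t2, a, b))
        # index[(T', b)][values of b] -> ids of earlier alerts preparing for them
        self.index: Dict[Tuple[str, Facts], Dict[Facts, List[str]]] = defaultdict(dict)
        self.seen_implicit: Set[Tuple[str, Facts]] = set()

    def process(self, alert_id: str, alert_type: str,
                attrs: Dict[str, str]) -> List[Tuple[str, str]]:
        """Handle one alert (in time order) and return new pairs (h', h)."""
        out: List[Tuple[str, str]] = []
        # 1. Search for correlations using the indexes on this alert's type.
        for b in sorted(self.index_set[alert_type]):
            key = tuple(attrs[v] for v in b)
            for prior in self.index[(alert_type, b)].get(key, []):
                if (prior, alert_id) not in out:
                    out.append((prior, alert_id))
        # 2. Mark consequences, unless an earlier alert of this type with the
        #    same ImplicitSet values has already done so (implicit correlation).
        facts = sorted(self.implicit_set[alert_type])
        implicit_key = (alert_type, tuple(attrs[f] for f in facts))
        if implicit_key in self.seen_implicit:
            return out
        self.seen_implicit.add(implicit_key)
        for t2, a, b in self.marks[alert_type]:
            values = tuple(attrs[u] for u in a)  # permute h's values into b's order
            self.index[(t2, b)].setdefault(values, []).append(alert_id)
        return out


labels = {
    ("PortScan", "Exploit"): [(("dst_addr", "dst_addr"), ("dst_port", "dst_port"))],
    ("Exploit", "Install"): [(("dst_addr", "src_addr"),)],  # victim becomes source
}
cor = MinimalCorrelator(labels)
stream = [
    ("s1", "PortScan", {"src_addr": "A", "dst_addr": "V", "dst_port": "80"}),
    ("s2", "PortScan", {"src_addr": "B", "dst_addr": "V", "dst_port": "80"}),
    ("e1", "Exploit", {"src_addr": "A", "dst_addr": "V", "dst_port": "80"}),
    ("i1", "Install", {"src_addr": "V", "dst_addr": "C", "dst_port": "4444"}),
]
results = {}
for alert_id, alert_type, attrs in stream:
    results[alert_id] = cor.process(alert_id, alert_type, attrs)
    print(alert_id, results[alert_id])
```

Running the stream correlates s1 with e1 and e1 with i1. The second scan s2 carries the same ImplicitSet values (destination address and port) as s1, so it marks no new consequences and only s1 is kept as the explanation of e1. The Exploit-to-Install edge shows why permutations are needed: Exploit.dst_addr is inserted into the Install index on src_addr.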
   Input: hyper-alert stream H, hyper-alert types τ
   Output: all pairs (h′, h) such that h′ prepares for h and both are in H
   foreach h ∈ H (input in ascending time order) do
       Let T = Type(h);
       Let i be an index on ImplicitSet(T, τ);
       if Lookup(i, h) then
           Continue with next alert;
       end
       foreach index i on IndexSet(T, τ) do
           Let the set of hyper-alerts R = Lookup(i, h);
           foreach h′ ∈ R do
               Add the pair (h′, h) to Output;
           end
       end
       foreach type T′ where T may prepare for T′ do
           foreach permutation p, index i on CorrelationSet(T, T′) do
               Insert(i, Permute(h, p));
           end
       end
   end
Algorithm 1: The minimal ATG algorithm

5 Hypothesising Missing Alerts

Algorithm 1 does not attempt to deal with missing alerts in the input alert stream. What should happen is that, for any alert which arrives and is not explained by a prior alert, the missing alerts that would explain it are hypothesized with appropriate fact values. This is done recursively until either a hyper-alert type with in-degree zero is found, no facts can be inferred for a hypothesis, or a real alert is found. Only if a real alert is found will the hypothesized sequence be entered into the correlation graph. If no results are found in the “search for correlations” stage, then the current alert is unexplained. Alerts are hypothesized with attributes satisfying each constraint on each incoming edge. Often only a subset of the fact values may be inferred for a hypothesized alert, as not all values have to be referred to in the equality constraints from the attack model.
These partial fact values lead to a problem when we recurse more than one step. The recursion needs to terminate when a real hyper-alert may prepare for a hypothesized one, but there is no guarantee that an index exists for the subset of fact values in the hypothesized alert. Our approach leads us to consider the hypothesizing problem as identical to the correlation problem, except that our hyper-alerts may contain a partial set of attribute values. A pre-processing step is introduced in which an expanded version of the IndexSet is calculated so that all such partial sets of attribute values are indexed.

We also introduce the relation HypothesisSet(T, τ), where T is a type and τ is the set of all hyper-alert types. This relation maps onto a set of 5-tuples with the components (t, i, p, m, o):

1. t is a type which may prepare for T.
2. i is an element of the IndexSet of t.
3. p is a permutation to apply to the fact values of the current hyper-alert in order to query the index i of type t.
4. m is the combination of facts which appear in p.
5. o is the combination of facts that were originally required for the current constraint; in other words, all facts mentioned on the right-hand side of the equality comparisons for this constraint.

The hypothesizing algorithm is then a recursive procedure with two parameters, the first of which is a TG vertex v′ and the second a hyper-alert h. The function returns true if a real hyper-alert was correlated, and false otherwise. For each element (t, i, p, m, o) in the HypothesisSet associated with v′, the procedure is:

1. Let f be the set of hypothesized fact attributes in h. Continue the loop if the union of f and o is not equal to m. This avoids generating unnecessary hypotheses based on a strict subset of the actually available fact attributes.
2. Let h′ be a new hyper-alert. Use p to permute the facts in h and assign them to h′.
3. Create a key from h′ which combines the facts required for index i of t. Query i and, if a result is found, correlate the result with h′ and continue the loop.
4. Recurse to the vertex for type t, passing hyper-alert h′. If the recursion returns true, then correlate h′ with h.

With this procedure, hyper-alerts with identical attributes may be created in order to satisfy different paths through the attack graph, even though they may eventually lead to the same place. Such alerts add nothing to the intelligibility of the result, since one real alert could conceivably account for all such identical hypotheses. We define two hyper-alerts as strategically indistinguishable provided that they are of the same type, have the same combination of facts assigned with the same values, and appear before the hyper-alert they have been hypothesized to explain. Similarly to the implicit correlation step described in the previous section, a hypothesized alert database is added to each vertex in the type graph so that strategically indistinguishable hypotheses can be consolidated.

6 Empirical Results

To verify the theory, the algorithm is implemented in C [3].
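The hypothesizing procedure of Section 5 is the least obvious part of that implementation, so a minimal Python sketch of it follows; it is not the C code of [3], and several details are assumptions. The type graph is hypothetical and acyclic. A linear scan over stored real alerts replaces the expanded IndexSet, matching on whichever facts a partial hypothesis carries. Instead of iterating over every expanded subset and skipping mismatches, the sketch computes m directly as f ∩ o (the required facts that h actually carries), which is our reading of the step-1 filter's stated purpose. When step 3 finds real alerts, h′ is also linked to h, which the prose leaves implicit. Finally, `hyp_db`, keyed by type and fact values, consolidates strategically indistinguishable hypotheses; because alerts are processed in time order, a stored hypothesis always precedes the alert currently being explained.

```python
from dataclasses import dataclass
from itertools import count

# Hypothetical, acyclic type graph: EDGES[(t, T)] lists equality constraints
# (fact of t, fact of T) under which an alert of type t prepares for type T.
EDGES = {
    ("ping-sweep", "sadmind-ping"): [("dst", "dst")],
    ("sadmind-ping", "sadmind-exploit"): [("dst", "dst")],
    ("sadmind-exploit", "mstream-zombie"): [("dst", "src")],
}

@dataclass
class Alert:
    id: str
    type: str
    facts: dict  # a hypothesized alert may carry only some facts

class Hypothesizer:
    def __init__(self, edges):
        self.edges = edges
        self.real = {}     # type -> real alerts seen so far (replaces the indexes)
        self.hyp_db = {}   # (type, facts) -> grounded hypothesis, for consolidation
        self.graph = []    # correlation-graph edges as (earlier id, later id)
        self._ids = count(1)

    def add_real(self, alert):
        self.real.setdefault(alert.type, []).append(alert)

    def preparers(self, t, h2):
        """Real alerts that may prepare for the (possibly partial) alert h2 of type t."""
        hits = []
        for (s, target), cons in self.edges.items():
            usable = [(a, b) for a, b in cons if b in h2.facts]
            if target != t or not usable:
                continue  # wrong edge, or no facts to build a key from
            hits += [r for r in self.real.get(s, [])
                     if all(r.facts.get(a) == h2.facts[b] for a, b in usable)]
        return hits

    def hypothesize(self, T, h):
        """Return True if some chain of hypotheses links h to a real alert."""
        found = False
        for (t, target), cons in self.edges.items():
            if target != T:
                continue
            o = {b for _, b in cons}   # facts of h this constraint requires
            m = set(h.facts) & o       # f ∩ o: required facts h actually has
            if not m:
                continue               # no facts can be inferred
            facts = {a: h.facts[b] for a, b in cons if b in m}   # step 2
            key = (t, frozenset(facts.items()))
            if key in self.hyp_db:     # strategically indistinguishable: reuse
                self.graph.append((self.hyp_db[key].id, h.id))
                found = True
                continue
            h2 = Alert(f"hyp{next(self._ids)}", t, facts)
            hits = self.preparers(t, h2)                          # step 3
            if hits or self.hypothesize(t, h2):                   # step 4
                self.graph += [(r.id, h2.id) for r in hits]
                self.graph.append((h2.id, h.id))
                self.hyp_db[key] = h2
                found = True
        return found

# Usage: sadmind-ping and sadmind-exploit were missed, so the zombie alert
# is unexplained and the two missing steps are hypothesized.
hy = Hypothesizer(EDGES)
hy.add_real(Alert("r1", "ping-sweep", {"src": "10.0.0.9", "dst": "172.16.1.5"}))
explained = hy.hypothesize(
    "mstream-zombie",
    Alert("r2", "mstream-zombie", {"src": "172.16.1.5", "dst": "10.0.0.7"}))
```

After the call, `hy.graph` holds the chain r1 → hyp2 → hyp1 → r2. A second zombie alert with the same source reuses hyp1 instead of creating an identical hypothesis, and a zombie alert whose chain never reaches a real alert adds nothing to the graph.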
Trivial sub-graph elimination is implemented by keeping count of vertex degrees in CG as edges are added; only vertices with degree greater than zero are output. This small addition makes output graphs more manageable. The Lincoln Labs 1.0 data set is used in the experiments in order to generate results comparable to prior work. This data set includes labeling data, which allows the construction of a perfectly accurate series of alerts. An attack model almost identical to that in [11] is used. The only difference is the correction of an error in the original, in which UDP port-scans could be said to discover TCP services and vice versa, which is not the case. All experiments were run on a PC with a 1.6 GHz Intel Core 2 Duo CPU and 1 GB RAM running a contemporary Linux distribution.

Two experiments are proposed. Experiment #1 is designed to verify that the algorithm is suitable for application in a real-time correlation setting, as intended. Experiment #2 is designed to qualitatively assess the hypothesizing algorithm when a random subset of relevant alerts has been deleted from a perfectly accurate alert stream.

6.1 Performance

The aim of this experiment is to test the suitability of our algorithm for real-time correlation. The method is to intersperse a true scenario consisting of 855 alerts within a large number of randomly generated alerts such that there are 1,000,000 alerts in total.
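The interspersing step can be sketched with a Python generator. The paper does not say how positions were drawn, so sampling the scenario positions uniformly at random is our assumption, as are the names `intersperse` and `make_noise` (a hypothetical callable returning one randomized alert).

```python
import random

def intersperse(scenario, make_noise, total, rng=random):
    """Yield `total` alerts: the scenario alerts at random positions but in
    their original order, and a fresh noise alert everywhere else."""
    if len(scenario) > total:
        raise ValueError("scenario is longer than the requested stream")
    slots = set(rng.sample(range(total), len(scenario)))  # positions of real alerts
    real = iter(scenario)
    for pos in range(total):
        # Positions are visited in ascending order, so scenario order is kept.
        yield next(real) if pos in slots else make_noise()
```

Because it is a generator, `intersperse(scenario, make_noise, 1_000_000)` can feed a million alerts to the correlator without holding the whole stream in memory.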
No direct comparison with prior work is possible here, since comparable algorithms are either not intended for a real-time setting, do not perform the hypothesizing step, or use a different attack model. Instead, the time taken for the software to perform the work is recorded, and the number of alerts is divided by this time to give a correlation rate. As long as the correlation rate is higher than the rate at which we expect alerts to be produced by the underlying IDS, the algorithm ought to be suitable for real-time operation. The size of the output graphs, which represents the bulk of the memory utilization of the program, is also recorded.

There are several parameters in this experiment. Firstly, we run the experiment with variations of the algorithm so that we can get an idea of the costs and benefits of each:

1. Algorithm 1, minimal IAC problem:
   (a) with implicit correlations disabled;
   (b) with implicit correlations enabled.
2. Algorithm 2, extended IAC problem:
   (a) without consolidating strategically indistinguishable hypotheses;
   (b) with strategically indistinguishable hypotheses consolidated.

The question arises of how precisely to generate a large number of randomized false-positive alerts. The attribute space is 96 bits in total, based on two 32-bit IP addresses and two 16-bit port numbers. If all attributes are totally randomized, the probability of false correlations being generated is exceedingly small.
Conversely, if we devise a non-random worst-case data set in which false alarms are crafted specifically to generate correlations, then we are venturing into the area of specific attacks aimed at the correlator itself, which is a problem beyond the scope of this paper. The chosen solution is based on the observation that, in a real-world setting, the IDS is most often connected to a point in an IP network where it can observe all traffic entering or leaving that network. Therefore, while one of the source and destination addresses of a packet may be any of 2³² possible IP address values, the other will be one of the addresses on the monitored network, which is a small subset of that address space. Traffic that does not conform to these rules takes place outside the range of communication systems that the underlying IDS is placed to observe. Similarly, IP services tend to listen on well-known ports, typically those under 1024.

Two randomization methods are chosen, one based on a class C IP network and the other on a class B network. These types of network are defined as having 2⁸ and 2¹⁶ addresses respectively. The algorithm for generating the data is:

1. Pick a totally random IP address and port number.
2. Pick a random IP address within the allowable range of our network class.
3. Pick a random 10-bit port number.
4. Toss a coin; if heads, the fully random IP is the source address, otherwise it is the destination address.
5. Toss another coin; if heads, the fully random port number is the source port, otherwise vice versa.
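The five-step recipe can be sketched with Python's standard `random` and `ipaddress` modules. The function name, the dictionary keys and the example networks (192.168.1.0/24 as the class C network, 172.16.0.0/16 as the class B network) are illustrative assumptions; the alert type is left out.

```python
import ipaddress
import random

def random_alert(network, rng=random):
    """Endpoints of one randomized false-positive alert, following steps 1-5."""
    net = ipaddress.IPv4Network(network)
    wild_ip = str(ipaddress.IPv4Address(rng.getrandbits(32)))   # step 1
    wild_port = rng.getrandbits(16)                             # step 1
    local_ip = str(net[rng.randrange(net.num_addresses)])       # step 2
    local_port = rng.getrandbits(10)                            # step 3: 0..1023
    # Step 4: a coin decides which side gets the fully random address.
    src, dst = (wild_ip, local_ip) if rng.random() < 0.5 else (local_ip, wild_ip)
    # Step 5: an independent coin decides which side gets the random port.
    sport, dport = ((wild_port, local_port) if rng.random() < 0.5
                    else (local_port, wild_port))
    return {"src": src, "sport": sport, "dst": dst, "dport": dport}

CLASS_C = "192.168.1.0/24"   # 2**8 addresses (example network)
CLASS_B = "172.16.0.0/16"    # 2**16 addresses (example network)
```

Every generated alert therefore has at least one endpoint on the monitored network and at least one port below 1024, which is what keeps accidental correlations at a "realistic" rather than negligible level.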
Five versions of the random data set are created for each type of network, making ten data sets in total. Each of the four variations of the algorithm was run on each of the 10 data sets, making 40 runs in total. Each run is repeated three times and the mean CPU time is taken as the final result. The variation in run time of the program on the same data set turned out to be extremely low so, for the sake of concision, the individual run timings are not presented here. The 885 real alerts from the LLDOS labeling data are interspersed randomly, but correctly ordered, within each data set.

Table 1. CPU times for Class B and Class C respectively (a/s = alerts per second).

Class  Exp.  Min. (s)  Max. (s)  Mean (s)  Mean rate (a/s)
B      1(a)  7.47667   9.21      7.905     126,502
B      1(b)  5.31      6.26333   5.675     176,221
B      2(a)  6.55333   6.67667   6.599     151,541
B      2(b)  6.45333   6.48      6.461     154,772
C      1(a)  7.0533    7.14667   7.088     141,088
C      1(b)  6.79667   6.87      6.818     146,675
C      2(a)  11.46     11.6033   11.52     86,380
C      2(b)  10.9067   19.9233   12.43     80,440

If we look at the final column of Table 1, we can observe that the correlation rate is on the order of 100,000 alerts per second. This is likely to be much faster than the alert rate of an IDS, certainly for the majority of deployments.

In Table 2, the Hyper-alerts and Correlations columns give the total number of vertices and edges, respectively, in the output CG. The number of false alerts appearing in the output seems rather alarming, considering that only 885 alerts are part of our scenario.

Table 2. Output size for Class B and Class C respectively.

        Class B                       Class C
Exp.    Hyper-alerts   Correlations   Hyper-alerts   Correlations
1(a)    194,817        157,734        346,782        888,262
1(b)    182,727        148,457        129,220        641,115
2(a)    376,786        401,974        190,986        691,809
2(b)    299,395        302,553        190,417        691,112

Bear in mind, however, that our noise is distributed over only 20 alert types, which are quite highly connected. Further, we have opted to restrict alert values to "realistic" ranges. In practice, a million alerts do not occur over a few seconds but perhaps over days or weeks.

6.2 Quality of Output

The aim of this experiment is to take the same totally accurate data set, remove random alerts, and test the accuracy of hypothesizing by how accurate the graphs are as an increasing number of alerts are missed. Unfortunately, the number of ways of doing this with a data set of 855 alerts, such as ours, is astronomical, and our sample sizes would have to be inappropriately large to gain results which can be interpreted with any confidence. From experience, the algorithm is extremely robust both when all alerts of one or two types are removed and when scores of alerts are removed randomly. This intuition leads us to an alternative experimental setup. There are only four types of alerts in the experimental data set. At least two types are required for there to be correlations, and if all alerts are present then the output is ideal.
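Because there are only four alert types, the inputs that keep exactly two or three of them (the ten rows of Table 3) can be enumerated mechanically, and the hand calculation of error counts reduces to two set differences. The sketch assumes each hyper-alert is identified by a hashable key such as (type, facts); counting a complete-CG alert as a false negative when it is absent from the output graph altogether is our reading of the definition, and the function names are hypothetical.

```python
from itertools import combinations

# A, B, C, D = ping-sweep, sadmind-ping, sadmind-exploit, mstream-zombie
TYPES = "ABCD"

def input_configurations(types=TYPES):
    """Every input that keeps exactly three or exactly two of the alert types."""
    return ["".join(c) for k in (3, 2) for c in combinations(types, k)]

def error_counts(complete, output, hypotheses):
    """(false negatives, false positives) against the complete correlation graph.

    complete:   hyper-alert keys of the complete CG
    output:     hyper-alert keys in the output graph (real or hypothesized)
    hypotheses: keys of the hypothesized hyper-alerts in the output graph
    """
    false_negatives = len(complete - output)      # complete-CG alerts left unexplained
    false_positives = len(hypotheses - complete)  # hypotheses matching nothing real
    return false_negatives, false_positives
```

`input_configurations()` returns the four three-type inputs followed by the six two-type inputs, ten in all, matching the rows of Table 3.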
We therefore experiment with every input that keeps exactly two or three of the four alert types (that is, with one or two types removed entirely), and examine the false correlation rates, which are calculated by hand in this case. These experiments are run with Algorithm 2(b) only. To calculate false alert rates, the output graphs are compared against the complete correlation graph, which contains 58 hyper-alerts. A false negative is counted for every alert in the complete CG for which no hypothesis exists. Conversely, a false positive is counted for every hypothesis which does not correspond to a hyper-alert in the complete CG. For labeling purposes, alert types are named A, B, C and D, standing for ping-sweep, sadmind-ping, sadmind-exploit and mstream-zombie respectively.

The results in Table 3 are difficult to analyze without taking a closer look at the output graphs produced. For attack sequences which are short in length, missing alerts can have a drastic effect on the false negative correlation rates. False positive hypotheses are a slightly less serious problem and in this case would be entirely eliminated with existing audit-record correlation techniques, as proposed in [15].

Table 3. Hypothesis accuracy.

Input types   False negatives   False positives
ABD           3                 12
BCD           32                0
ACD           26                0
ABC           14                0
AC            37                0
BD            35                12
CD            41                0
BC            44                0
AD            35                12
AB            20                0

7 Conclusions and Future Work

In this paper, a real-time correlation algorithm using hyper-alert type graphs was proposed.
Our general approach was to reduce the minimal IAC problem to a series of insertions into and removals from a balanced binary tree. We proceeded from there to approach the extended (hypothesizing) problem by rephrasing the minimal problem such that we can recursively input hyper-alerts with unknown (or wild-card) attributes. It was shown that such algorithms are feasible provided a few conditions are met:

– The number of comparable facts in hyper-alerts is small.
– If hyper-alerts are to be hypothesized, then type graphs should be chosen carefully in order to prevent an exponential explosion in time complexity.

The algorithm was implemented and validated through a series of experiments, which showed that a good implementation is suitable for real-time correlation even in cases where the IDS alert rate is alarmingly high. In these cases the size of the output graph becomes the overriding factor in determining the practical utility of the algorithm. It was also confirmed that picking the right aggregation function is invaluable in this respect, by allowing many hyper-alerts to be merged into a single logical unit. However, it is not immediately clear how best to design these functions so as to reduce large output graphs to a satisfactory degree.

Although our approach does not require a vulnerability assessment, it has been shown that it is possible to make use of such information if it is available [15]. It appears that our algorithm could be modified for similar purposes.
The basic approach here would be to incorporate special constraints which depend on external evidence sources. These would be checked before correlating or hypothesizing an alert. However, this leads to the question of how to determine when to ignore false negative vulnerability assessments, for example when a successful attack of the relevant type has been observed. This may also be a fruitful direction for investigation.

Bibliography

[1] Rudolf Bayer. Symmetric Binary B-Trees: Data structure and maintenance algorithms. Acta Inf., 1:290–306, 1972.
[2] Dingbang Xu and Peng Ning. Alert Correlation through Triggering Events and Common Resources. In Proc. 20th Annual Computer Security Applications Conference, 2004.
[3] Gianni Tedesco. ATG correlator source code and documentation, 2008. URL http://www.scaramanga.co.uk/atg/.
[4] Gianni Tedesco, Jamie Twycross, and Uwe Aickelin. Integrating innate and adaptive immunity for intrusion detection. In Proc. International Conference on Artificial Immune Systems, 2006.
[5] Laura P. Swiler, Cynthia Phillips, David Ellis, and Stefan Chakerian. Computer Attack Graph Generation Tool. In Proc. DARPA Information Survivability Conference & Exposition II, 2000.
[6] Lingyu Wang, Anyi Liu, and Sushil Jajodia. An Efficient, Unified Approach to Correlating, Hypothesizing, and Predicting Intrusion Alerts. In Proc. European Symposium on Computer Security, 2005.
[7] Peng Ning and Dingbang Xu. Adapting Query Optimization Techniques for Efficient Intrusion Alert Correlation. Technical Report TR-2002-14, NCSU Dept. of Computer Science, 2002.
[8] Peng Ning and Dingbang Xu. Hypothesizing and Reasoning about Attacks Missed by Intrusion Detection Systems. ACM Transactions on Information and System Security, 7(4):591–627, November 2004.
[9] Peng Ning, Yun Cui, and Douglas S. Reeves. Analyzing Intensive Intrusion Alerts Via Correlation. In Recent Advances in Intrusion Detection, 2002.
[10] Peng Ning, Yun Cui, and Douglas S. Reeves. Constructing Attack Scenarios through Correlation of Intrusion Alerts. In Proc. 9th ACM Conference on Computer & Communications Security, pages 245–254, 2002.
[11] Peng Ning, Dingbang Xu, Christopher G. Healey, and Robert St. Amant. Building Attack Scenarios through Integration of Complementary Alert Correlation Methods. In Proc. 11th Annual Network and Distributed System Security Symposium, pages 97–111, 2004.
[12] Renaud Deraison. Nessus automated vulnerability scanner, 2008. URL http://www.nessus.org/.
[13] Steven J. Templeton and Karl Levitt. Requires/Provides Model for Computer Attacks. In Proc. Workshop on New Security Paradigms, 2000.
[14] Steven Noel, Sushil Jajodia, and Brian O'Berry. Managing Cyber Threats: Issues, Approaches and Challenges, chapter Topological Analysis of Network Vulnerability. 2005.
[15] Yan Zhai, Peng Ning, Purush Iyer, and Douglas S. Reeves. Reasoning about Complementary Intrusion Evidence. In Proc. 20th Annual Computer Security Applications Conference, 2004.
