Strategic Alert Throttling for Intrusion Detection Systems

Gianni Tedesco and Uwe Aickelin
The School of Computer Science & IT
The University of Nottingham
Jubilee Campus, Wollaton Road, Nottingham, United Kingdom
{gxt,uxa}@cs.nott.ac.uk

Abstract: Network intrusion detection systems are themselves becoming targets of attackers. Alert flood attacks may be used to conceal malicious activity by hiding it among a deluge of false alerts sent by the attacker. Although these types of attacks are very hard to stop completely, our aim is to present techniques that improve alert throughput and capacity to such an extent that the resources required to successfully mount the attack become prohibitive. The key idea presented is to combine a token bucket filter with a real-time correlation algorithm. The proposed algorithm throttles alert output from the IDS when an attack is detected. The attack graph used in the correlation algorithm is used to make sure that alerts crucial to forming strategies are not discarded by throttling.

Key-Words: Intrusion Detection Systems, Intrusion Alert Correlation, Attack Graphs, Denial of Service Attacks, Token Bucket Filter

1 Introduction

As global awareness of information security issues has increased, so has the proliferation of intrusion detection technology. Network intrusion detection systems (NIDSs, or simply IDSs) are quickly becoming a crucial part of the Internet security infrastructure. Back in March 2001, there was a media furore [1] when the FBI Internet crime division issued a warning concerning the then unreleased Stick [2] program, which "essentially disarms intrusion detection systems." The tool automated what we shall call the alert flood attack.
The attack works because each time an intrusion detection system raises an alert it must make some attempt to communicate the information to an operator. This communication channel can therefore become the target of a denial of service attack because, like all communication channels, it has a fixed capacity. If this channel can be overwhelmed with bogus data, an attacker can quickly achieve complete neutralization of intrusion detection capability. There are, in fact, numerous possible types of denial of service attack against a network IDS [3], but we will focus on this particular attack type.

A great deal of research has gone into techniques for reducing false positive alarms generally. One such technique is alert correlation. The aim of alert correlation is to analyse the alert stream and discover strategies or attack scenarios using some kind of model of possible attacker strategies [4]. One quite intuitive type of model is an attack graph [5,6,7]. The advantage of this kind of correlation is that alerts which do not (yet) conform to a threatening attack strategy are not displayed.

We propose a novel algorithm to protect NIDSs from alert flooding attacks. The algorithm combines a throttling algorithm, namely a token bucket filter, with an existing real-time alert correlation algorithm. The aim is to reduce alerting throughput in the face of an alert flood attack, while minimising the chances of missing important alerts. The key to our approach is using the attack graph to inform the throttling algorithm so that the key alerts which make up threatening strategies are not dropped by the sensor.

The next section of this paper presents the relevant background for the proposed techniques. The alert flood attack is defined and current approaches are examined. The real-time correlation algorithm on which our solution is based is also introduced.
In Section 3, a modified correlation algorithm is presented which uses throttling techniques to curb alert flood attacks. In Section 4, some experimental data is presented in order to demonstrate the effectiveness of our technique. We finish by presenting a summary and some concluding remarks.

2 Background

The pattern matching [8] model is currently the most commonly used methodology for detecting intrusion attempts. In this model the NIDS is configured with a database of known attack patterns (also called signatures). An example of a signature is shown in Listing 1. This signature alerts on traffic generated by the well-known "BackOrifice" trojan horse program: it detects any incoming packets destined for user datagram protocol (UDP) port 31337 that contain a specific sequence of bytes anywhere within their payload.

alert udp $EXTERNAL_NET any -> $HOME_NET 31337 (msg:"BACKDOOR BackOrifice access"; content: "|ce63 d1d2 16e7 13cf39a5 a586|";)

Listing 1: A sample rule, as used by e.g. Snort.

2.1 Alert Flooding

Alert flooding attacks are achieved by transmitting packets that simulate intrusion attempts and which the IDS will recognise as true attacks. Taking the example signature in Listing 1, an attacker must craft a UDP packet, set the destination port to 31337, include the sequence of bytes given in the signature, and flood the target network with these packets. The possible ramifications of this type of attack against an IDS are threefold:

1. Sensor storage becomes full, preventing further logging.
2. The sensor exceeds its maximum alert throughput, causing alerts to be lost or the sensor to cease functioning.
3. The analyst becomes deluged with false information and becomes unable to distinguish real attacks from the false ones.

Because of this, attackers may use the alert flood attack as a way to conceal genuine malicious activities.
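To make the mechanics concrete, the following Python sketch evaluates a stateless check modelled on Listing 1 against a batch of forged packets. The `Packet` class, the `matches_listing1` and `craft_flood` helpers, and all addresses are hypothetical; a real rule engine such as Snort's is far more involved, and `$EXTERNAL_NET` is assumed here to mean "any address outside `$HOME_NET`". The point it illustrates is the asymmetry of the attack: each forged datagram costs the attacker one packet and costs the IDS one alert.

```python
import random
from dataclasses import dataclass


@dataclass
class Packet:
    """Hypothetical, minimal packet model: only the fields Listing 1 inspects."""
    proto: str
    src: str
    dst: str
    dst_port: int
    payload: bytes


# Content bytes from Listing 1; bytes.fromhex ignores the embedded spaces.
BACKORIFICE_CONTENT = bytes.fromhex("ce63 d1d2 16e7 13cf39a5 a586")


def matches_listing1(pkt: Packet, home_net: set) -> bool:
    """Stateless check modelled on Listing 1.

    Assumption: $EXTERNAL_NET means "not in $HOME_NET"; the content
    may occur anywhere in the payload.
    """
    return (pkt.proto == "udp"
            and pkt.src not in home_net
            and pkt.dst in home_net
            and pkt.dst_port == 31337
            and BACKORIFICE_CONTENT in pkt.payload)


def craft_flood(n: int, target: str, rng: random.Random) -> list:
    """Attacker side: n forged packets, each satisfying the rule, with a
    randomised source address and random payload padding."""
    flood = []
    for _ in range(n):
        src = "10.%d.%d.%d" % (rng.randrange(256), rng.randrange(256), rng.randrange(1, 255))
        padding = bytes(rng.randrange(256) for _ in range(8))
        flood.append(Packet("udp", src, target, 31337, padding + BACKORIFICE_CONTENT))
    return flood


if __name__ == "__main__":
    home = {"192.168.1.10"}
    flood = craft_flood(1000, "192.168.1.10", random.Random(1))
    alerts = sum(matches_listing1(p, home) for p in flood)
    print(alerts)  # prints 1000: every forged packet raises an alert
```

Randomising the source address changes nothing from the rule's point of view, which is why such a flood is hard to block with a simple packet filter.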
The alert flooding technique has been automated, and hence popularised, by tools such as Stick and Snot [9], which read in signatures directly from the freely available Snort [10] IDS. Each packet sent could also have crucial fields such as source and destination address modulated by adding random data into them. This random noise makes it difficult to block the attack using a simple packet filter or firewall.

Alert floods can also be exacerbated by the poor alerting performance of IDS systems in general. A quick examination of the Snort system reveals that, in its preferred output mode (called "unified"), Snort flushes its buffers needlessly in at least two places. This reduces the effectiveness of the buffering and, on UNIX-like systems, adds system call overhead for every logged alert. Performance in this area can understandably be overlooked by the IDS designer. After all, good engineering practice tells us to optimise for the common case, and, in the world of intrusion detection, an alert is not usually the common case. In fact, on a high-speed network it should be a very rare event indeed.

Perhaps the simplest way to reduce data output while maintaining the same intrusion detection capability is to make minor modifications to the signatures so that the IDS is as terse as possible. Such modifications are often used to reduce the number of false positive alerts generated. Generally speaking, signatures are usually a subtle compromise between allowing false negative and false positive alerts. One way to make the IDS less verbose is to fine-tune signatures to examine only those packets destined for the relevant hosts. Let us consider BIND, DNS server software infamous for its security vulnerabilities.
In this situation, the signatures may be modified to look for BIND exploits only if the destination address on the packet matches a predefined list of DNS servers. Of course, the operator may actually be interested to know that someone is attempting a BIND exploit on a workstation or a web server. That is to say, this approach tips the false alarm compromise towards the false negative side. Interestingly, this problem also comes up when designing attack graphs for correlation algorithms.

The Snort team addressed the problem of the widespread proliferation of automated alert flooding tools like Stick and Snot in their 1.8 release. Their solution was to implement a Transmission Control Protocol (TCP) state tracking system which they called "stream4". By keeping track of TCP connection states, stream4 is able to ignore any segments which are not part of an established conversation. In order to make the IDS raise an alert, the attacker is now forced to transmit at least three segments, rather than just one. More importantly, because the three-way handshake requires two hosts to be communicating, the external attacker must find a host on the monitored network willing to participate. This might be prevented by a firewall blocking connections.

Currently most systems keep track of TCP states. This is mainly to protect against desynchronisation attacks such as those described by Ptacek and Newsham [3], but there is also the additional benefit of making sure that there is no such shortcut in carrying out an alert flooding attack. Further to performing TCP state tracking, it is also possible to track any application layer state, enabling us to remove shortcuts even for protocols running over stateless transports such as UDP.
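The sketch below illustrates why handshake tracking raises the cost of a flood. It is a much-simplified model written for this discussion, not Snort's stream4: the `Segment`, `HandshakeTracker` and `seg` names are hypothetical, and sequence numbers, timeouts and connection teardown are ignored. A segment is only handed to signature matching once its connection has completed the three-way handshake, so a lone forged segment raises no alert.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """Hypothetical, minimal TCP segment model."""
    src: str
    sport: int
    dst: str
    dport: int
    flags: frozenset          # e.g. frozenset({"SYN", "ACK"})
    payload: bytes = b""


def seg(src, sport, dst, dport, *flags, payload=b""):
    """Convenience constructor for the examples."""
    return Segment(src, sport, dst, dport, frozenset(flags), payload)


class HandshakeTracker:
    """Simplified sketch: only segments on connections that completed the
    three-way handshake are passed on to signature matching."""

    def __init__(self):
        self.state = {}   # (client, cport, server, sport) -> handshake state

    def inspectable(self, s: Segment) -> bool:
        fwd = (s.src, s.sport, s.dst, s.dport)
        rev = (s.dst, s.dport, s.src, s.sport)
        if s.flags == {"SYN"}:                    # client opens
            self.state[fwd] = "SYN_SENT"
            return False
        if s.flags == {"SYN", "ACK"}:             # server answers
            if self.state.get(rev) == "SYN_SENT":
                self.state[rev] = "SYN_RCVD"
            return False
        if "ACK" in s.flags and self.state.get(fwd) == "SYN_RCVD":
            self.state[fwd] = "ESTABLISHED"       # client completes handshake
        return "ESTABLISHED" in (self.state.get(fwd), self.state.get(rev))


if __name__ == "__main__":
    t = HandshakeTracker()
    forged = seg("10.0.0.9", 4444, "192.168.1.10", 80, "ACK", payload=b"attack")
    print(t.inspectable(forged))  # False: no handshake seen, so no alert
```

An attacker who wants an alert must now get a host on the monitored network to answer the SYN, which is exactly the extra cost described above.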
While this is a definite improvement, it cannot cover all cases. For example, some signatures must ignore state information, either because some exploits can exist as a single packet (i.e. statelessly) or because they work over inherently stateless protocols. As we describe in the following sections, token bucket filters combined with attack graph correlation can improve the situation.

2.2 Token Bucket Filter

A token bucket filter is an algorithm for controlling the rate of flow of data. Token bucket filters have traditionally been used in a number of applications where rate limiting has been needed. Some good examples are:

1. Network bandwidth management systems [11].
2. Flood protection in network chat / text conferencing systems such as Internet Relay Chat.
3. Flow control in network transport protocols [12].
4. Flood protection for programs that log externally generated events, such as UNIX syslog.

A token bucket filter has two parameters: bucket size and token rate [13]. Tokens are generated at the token rate and stored in a buffer called the "bucket". If the bucket becomes full, the extra tokens are simply discarded. Each alert that arrives must consume a token to pass through the filter. Any alert that cannot obtain a token is called "over-limit" and does not pass the filter. If the alert rate is less than the token rate then credit is allowed to accumulate in the bucket. This stored credit allows the alert rate to temporarily exceed the token rate (a "burst").

2.3 Attack Graph Correlation

Wang et al. provide a unified approach to correlating, predicting and reasoning about missed alerts in [14]. The approach works in real time and uses an in-memory data structure to perform the correlation. The correlation algorithm is robust in the face of missing alerts from the underlying IDS.
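As an illustration of the token bucket filter described in Section 2.2, here is a minimal Python sketch. The `TokenBucket` class and `replay` helper are hypothetical names, and the bucket is assumed to start full. Because the bucket holds at most `size` tokens and refills at `rate` tokens per second, at most size + rate * T alerts can pass in any interval of length T, however fast they arrive.

```python
class TokenBucket:
    """Minimal token bucket filter: tokens accrue at `rate` per second and
    are stored up to `size`; each alert must take one token to pass."""

    def __init__(self, rate: float, size: float, start: float = 0.0):
        self.rate = rate
        self.size = size
        self.tokens = size   # assumption: the bucket starts full
        self.last = start    # time of the previous update, in seconds

    def allow(self, now: float) -> bool:
        """True if an alert arriving at time `now` passes, False if over-limit."""
        # Credit the tokens generated since the last update; anything
        # beyond the bucket size is discarded.
        self.tokens = min(self.size, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def replay(rate: float, size: float, alerts_per_second: float, seconds: float) -> int:
    """Feed a constant-rate alert stream through one filter; count survivors."""
    bucket = TokenBucket(rate, size)
    n = int(alerts_per_second * seconds)
    return sum(bucket.allow(i / alerts_per_second) for i in range(n))


if __name__ == "__main__":
    print(replay(2, 20, 7343, 10))  # at most 20 + 2 * 10 = 40 of 73,430 alerts
    print(replay(2, 20, 1, 10))     # prints 10: a slow stream passes untouched
```

With the parameters later used in Section 4 (2 alerts per second, burst of 20), ten seconds of alerts at 7,343 per second are cut to no more than 40, while a stream below the token rate is never throttled.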
An in-memory data structure called a "queue graph" (QG) is introduced. In order to avoid keeping unnecessary alerts in memory, only the latest alert for a given exploit vertex is stored in this structure. That is to say, the correlation between such matching alerts is left implicit. This allows the algorithm to be run in real time without necessitating the usual sliding correlation window approach, which would allow an attacker to use an alert flood attack to introduce false negative correlations.

In this system, attack graphs are defined as directed acyclic graphs (DAGs) having two distinct types of vertices: security conditions and exploits (see Figure 1). Exploit vertices are (vuln, src, dst) tuples. The src and dst fields are used to tie the exploit to specific combinations of vulnerable and attacking hosts; wildcards may be used. These vertices may represent one or more possible alert types. A function "f" is introduced which maps alerts to exploit vertices in the attack graph. Security condition vertices refer to prerequisites and consequences of exploits. Thus edges connecting a condition to an exploit are prerequisite relations, and those connecting an exploit to a condition are consequence relations.

Figure 1: A Sample Attack Graph

Attack graphs are generated automatically with TVA, the topological vulnerability assessment tool [15], which links together the output of Nessus, IDS rules and a vulnerability database. In order to do this, a function which maps alerts to exploits is introduced. In this way the correlation algorithm is vulnerability-centric. That is to say, it will not correlate exploits against machines which are not defined as being vulnerable to them. These graphs are distinct from those used by Ning et al. [5,6] in that they contain not just the causal relationships between attacks but also a database of vulnerable hosts on the network.
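A toy version of this graph model may help. The sketch below is an illustration under stated assumptions, not TVA output or the data structure of [14]: `Exploit`, `AttackGraph`, `build_example`, the vulnerability names, condition strings and signature names are all hypothetical. It shows (vuln, src, dst) exploit tuples with wildcards, prerequisite and consequence edges expressed as condition sets, and a function `f` that maps an alert to exploit vertices. Because exploits are tied to vulnerable hosts, an alert against a host that is not listed as vulnerable maps to nothing, which is the vulnerability-centric behaviour described above.

```python
from dataclasses import dataclass

WILDCARD = "*"


@dataclass(frozen=True)
class Exploit:
    """Exploit vertex: a (vuln, src, dst) tuple; src/dst may be WILDCARD."""
    vuln: str
    src: str
    dst: str


class AttackGraph:
    """Toy attack graph. Prerequisite edges run condition -> exploit and
    consequence edges run exploit -> condition; both are stored as sets."""

    def __init__(self):
        self.requires = {}   # Exploit -> set of prerequisite conditions
        self.yields = {}     # Exploit -> set of consequence conditions
        self.alerts = {}     # Exploit -> set of signature names it represents

    def add_exploit(self, exploit, requires, yields, signatures):
        self.requires[exploit] = set(requires)
        self.yields[exploit] = set(yields)
        self.alerts[exploit] = set(signatures)

    def f(self, signature, src, dst):
        """Map an alert to the exploit vertices it instantiates (possibly none)."""
        return [e for e, sigs in self.alerts.items()
                if signature in sigs
                and e.src in (WILDCARD, src)
                and e.dst in (WILDCARD, dst)]

    def predecessors(self, exploit):
        """Exploits whose consequences satisfy a prerequisite of `exploit`."""
        need = self.requires[exploit]
        return [e for e, out in self.yields.items() if out & need]


def build_example():
    """Hypothetical two-step scenario against one vulnerable host."""
    g = AttackGraph()
    host = "192.168.1.10"
    g.add_exploit(Exploit("ftp_rhosts", WILDCARD, host),
                  requires={"ftp(%s)" % host}, yields={"trust(%s)" % host},
                  signatures={"FTP .rhosts write"})
    g.add_exploit(Exploit("rsh_login", WILDCARD, host),
                  requires={"trust(%s)" % host}, yields={"user(%s)" % host},
                  signatures={"RSH login"})
    return g


if __name__ == "__main__":
    g = build_example()
    print(g.f("RSH login", "10.0.0.9", "192.168.1.10"))  # the rsh_login vertex
    print(g.f("RSH login", "10.0.0.9", "192.168.1.99"))  # []: host not vulnerable
```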
An IDS (in this case Snort) is set up to send its alerts directly to the correlation component. The correlation component uses the attack graph by treating each exploit vertex in the graph as a queue. Alerts are placed in their respective queues and a breadth-first search is performed in the graph to find previous exploits which would correlate with the current one. If a queue is found and is non-empty then a correlation is generated. If a queue is empty, the algorithm can either stop or hypothesise a missing attack and carry on. If the edges in the graph are directed forwards in time, rather than backwards, predictions can be generated in much the same way as correlations.

The QG structure is actually an enhanced version of the attack graph. A tree is created for each exploit vertex in the graph. In these trees, the correlation and prediction edges are all precalculated. This effectively means that correlation and prediction can be done in linear time by searching in a tree, rather than in quadratic time by performing a breadth-first search in the attack graph, and this is what makes the algorithm suitable for real-time application. The output of the algorithm is a correlation graph which can contain a mix of real and hypothesised alerts and security conditions. Readers are urged to consult the original paper for the full details [14].

3 Strategic Data Reduction

We have described the alert flood attack in the previous sections as fundamentally a resource exhaustion attack. In this section we outline an approach to reduce exposure to the attack by combining alert throttling with attack graph correlation. Consider the case of a human IDS operator as a resource that cannot cope with having to examine many thousands of bogus alerts at the rate at which a sustained attack can produce them.
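The queue-based correlation step described in Section 2.3 can be sketched as follows. This is a simplified reading under stated assumptions: `QueueGraphSketch` is a hypothetical name, `preds` is assumed to list, for each exploit vertex, the exploits whose consequences satisfy its prerequisites, and the search is a plain breadth-first walk on every alert rather than the precomputed trees that give the original QG algorithm its linear-time behaviour. Each vertex keeps only its latest alert; an empty queue either ends the search or, if hypothesising is enabled, yields a hypothesised alert and the search carries on past it.

```python
from collections import deque


class QueueGraphSketch:
    """One slot per exploit vertex holding only the latest alert for it."""

    def __init__(self, preds):
        self.preds = preds   # exploit vertex -> list of predecessor exploits
        self.latest = {}     # exploit vertex -> latest real alert (its "queue")

    def add_alert(self, vertex, alert, hypothesise=True):
        """Correlate `alert` with earlier alerts; return (correlations, hypotheses)."""
        correlations, hypotheses = [], []
        seen = {vertex}
        frontier = deque([(vertex, alert)])
        while frontier:
            v, later = frontier.popleft()
            for p in self.preds.get(v, []):
                if p in seen:
                    continue
                seen.add(p)
                earlier = self.latest.get(p)
                if earlier is not None:
                    # Non-empty queue: correlate and stop on this branch,
                    # since `earlier` was correlated when it arrived.
                    correlations.append((earlier, later))
                elif hypothesise:
                    # Empty queue: hypothesise a missed alert and carry on.
                    hyp = ("hypothesised", p)
                    hypotheses.append(hyp)
                    frontier.append((p, hyp))
        # Only the newest alert is kept; older ones stay implicitly correlated.
        self.latest[vertex] = alert
        return correlations, hypotheses


if __name__ == "__main__":
    qg = QueueGraphSketch({"B": ["A"], "C": ["B"]})   # chain A -> B -> C
    qg.add_alert("A", "a1")
    print(qg.add_alert("C", "c1"))  # B was missed: hypothesise it, link to a1
```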
There are two approaches to solving this type of problem: one is to increase the amount of resources at your disposal, the other is to reduce the amount of resources required. While it is conceivable that one could scale the sensor hardware to cope fully with alert floods at a given rate for a given length of time, it seems rather more complex to scale the human operator. Taking the approach of minimising the resources required, alert data could be reduced by throttling the alert stream to a fixed rate. This could be achieved by applying a token bucket filter either per signature, per attack type, globally, or even in complex hierarchies as in HTB3 [16]. The burstiness feature of the TBF algorithm means that alerts are only discarded under a sustained high rate of alerts. However, such approaches run the risk of dropping important alerts, which can even assist an attacker in concealing their malicious activities.

The key to our approach is to allow the correlation algorithm to interpose between the signature matching and output components of the IDS. By doing this, a token bucket filter can be placed at each queue in the QG structure and over-limit alerts can be discarded. In order that the user may be informed of dropped alerts, we can use a kind of "run length encoding" (RLE) to represent a string of alerts. RLE is a simple compression technique which replaces each run of a repeated symbol with a single copy of the symbol and a run count N. To decompress, one simply copies the symbol into the output stream N times. This is an approach familiar to UNIX users who have ever tried to flood the syslog program and seen its "last message repeated N times" warning. To implement RLE compression in our case, we first assume that all alerts going through the same token bucket filter are identical.
Then all that is required is to add a counter to the queues in the QG data structure and increment that counter for all over-limit alerts. When there is enough credit in the token bucket to permit new alerts, we dequeue the alert and the counter, allowing them to add a node to the output graph and to be logged to permanent storage. This allows for some minimal reconstruction of dropped alerts by just using the information in the attack graph.

Two questions then arise: firstly, what to do with alerts not mapping to vertices in the queue graph; and secondly, what parameters to use for the token bucket filters. For those alerts which do not map onto exploit nodes, we cannot be sure that we are not missing alerts vital to some strategy. Since the QG algorithm assumes a complete attack graph anyway, we could discard all such alerts. A more prudent approach is taken in our case, and that is to apply a token bucket filter to such alerts on a per-signature basis.

As for the parameters of the TBFs, for those alerts which map to vertices in the attack graph, we could drop all implicitly correlating alerts and keep the same strategies. However, since it is seen as a benefit to keep alerts where possible, we envisage that token rates greater than one or two alerts per second need not be used. For other alerts, however, there is, of course, a trade-off between data fidelity and efficiency. In the next section, we will show that this technique scales up such that it effectively nullifies the computational effect of an alert flood attack.

4 Empirical Data

We can perform a simple test with the Firestorm [17] system running offline against a tcpdump [18] capture file containing an alert flood attack captured by the Shmoo Group at a DEFCON CTF event [19]. The attack consists of a repeated ICMP flood at a rate of around 7,343 packets per second.
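The reduction scheme of Section 3 can be sketched in a few lines. In this hypothetical `ThrottledLogger`, alerts that map to an exploit vertex share that vertex's token bucket, unmapped alerts fall back to a per-signature bucket, and over-limit alerts only increment a counter that is logged, run-length style, with the next alert that passes. The defaults mirror the 2 alerts per second and burst of 20 used in the tests described next; the flood in `replay_flood` is synthetic and merely borrows the 7,343 alerts per second quoted above, so it does not reproduce the Table 1 figures.

```python
class TokenBucket:
    """Compact token bucket: `rate` tokens/s, at most `size` stored."""

    def __init__(self, rate, size, start):
        self.rate, self.size = rate, size
        self.tokens, self.last = size, start   # assumption: starts full

    def allow(self, now):
        self.tokens = min(self.size, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class ThrottledLogger:
    """Per-vertex / per-signature throttling with run counts for drops."""

    def __init__(self, rate=2.0, size=20.0):
        self.rate, self.size = rate, size
        self.buckets = {}      # key -> TokenBucket
        self.suppressed = {}   # key -> over-limit alerts not yet reported
        self.log = []          # (time, signature, vertex, repeats) records

    def submit(self, now, signature, vertex=None):
        # Mapped alerts share their exploit vertex's bucket; unmapped
        # alerts fall back to a per-signature bucket.
        key = ("vertex", vertex) if vertex is not None else ("signature", signature)
        if key not in self.buckets:
            self.buckets[key] = TokenBucket(self.rate, self.size, now)
        if self.buckets[key].allow(now):
            # Log the alert with the number of identical alerts dropped
            # since the last one passed ("last message repeated N times").
            self.log.append((now, signature, vertex, self.suppressed.pop(key, 0)))
            return True
        self.suppressed[key] = self.suppressed.get(key, 0) + 1
        return False


def replay_flood(alerts_per_second=7343, seconds=10, strategic_at=5.0):
    """Synthetic scenario: a flood of one unmapped signature, plus one alert
    for a (hypothetical) exploit vertex arriving in the middle of it."""
    logger = ThrottledLogger()
    n = int(alerts_per_second * seconds)
    for i in range(n):
        now = i / alerts_per_second
        logger.submit(now, "ICMP flood")
        if i == int(strategic_at * alerts_per_second):
            logger.submit(now, "RSH login", vertex="rsh_login")
    return logger, n + 1


if __name__ == "__main__":
    logger, submitted = replay_flood()
    print(submitted, len(logger.log))  # tens of records stand in for 73,431 alerts
```

Every submitted alert is accounted for, either by a log record or by a run count, and the single `rsh_login` alert is logged even in the middle of the flood because it never competes with the flood for tokens.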
We perform two tests, and in both we have a full signature database loaded containing around 1,600 signatures, with the network data read directly from the hard disk. The test machine was a 3.2 GHz Pentium IV running Linux 2.6 with 1 GB of RAM. The results shown are an average of three iterations for both runs, to factor out random fluctuations such as may be caused by disk seek latency. The first run (#1) is a control run using Firestorm + QG algorithm. The second run (#2) is identical except for the addition of token bucket filtering. Two sets of filters are used:

1. The set of filters for each exploit vertex in the attack graph.
2. The set of filters for each rule which does not map to a vertex in the attack graph.

Each of these filters is set to 2 alerts per second and a burst of 20 alerts. These parameters are rather arbitrary, but are probably best set based on the operator's experience of the baseline alert rate for the network.

#    Data Size (KB)   Alerts    CPU Time   Run Time
1    475,229          300,741   13.131     18.476
2    1,092            696       12.153     12.817

Table 1: Experimental Results.

As we can see in Table 1, the amount of data logged and the number of alerts output were both reduced by more than two orders of magnitude, and the run time decreased disproportionately to the CPU time. While the run time was reduced by around 30%, the CPU time was reduced by less than 10%. This indicates that the Firestorm process is not wasting as much time waiting for I/O completion when the token bucket filter is enabled. In the experiment the communication channel between the IDS and the operator is simply an on-disk alert spool, so the available bandwidth is high. In a real world deployment, on the other hand, it is likely that alerts would be transmitted across a network, adding further latency and bandwidth constraints.
In these deployments we expect even greater gains in performance. These results show that we can effectively boost performance, and therefore sensor capacity, allowing the IDS to carry on working during an alert flood rather than becoming overwhelmed and possibly exhausting the storage on the sensor. Even if the attack contained twice as many packets in the same space of time, it would not double the amount of data logged, as the token rate is fixed.

5 Summary and Conclusions

Alert flooding is a problem that will probably always exist with intrusion detection systems, and one that cannot be eliminated entirely. However, we have shown that it is possible to drastically reduce its effects by recognising an attack and throttling excess alerts. We have further shown that real-time alert correlation algorithms can be used to provide a useful context for throttling alerts such that key attacks are not missed; such an approach solves problems with either technique used in isolation. Without the correlation system interceding between the signature matching and alerting components of the IDS, it is not possible to decide whether alerts may be logged or not, and without strategic information available to the throttling algorithm, it could drop crucial alerts. Further investigation is required into producing optimal token bucket filter configurations and into how best to handle those alerts which do not map onto any exploit vertices in the attack graph.

References:

[1] ZDNet UK News. http://news.zdnet.co.uk/internet/security/0,39020375,2085099,00.htm
[2] G. Coretex. "Fun With Packets: Designing a Stick." Endeavor Systems Inc., 2002.
[3] T. H. Ptacek and T. N. Newsham. "Insertion, Evasion and Denial of Service: Eluding Network Intrusion Detection." Secure Networks Inc., 1998.
[4] Xinzhou Qin and Wenke Lee. "Attack Plan Recognition and Prediction Using Causal Networks." Proceedings of the Annual Computer Security Applications Conference, 2004.
[5] Peng Ning, Yun Cui and Douglas S. Reeves. "Constructing Attack Scenarios through Correlation of Intrusion Alerts." Proceedings of the 9th ACM Conference on Computer & Communications Security, 2002, pp. 245-254.
[6] Peng Ning, Dingbang Xu, Christopher G. Healey and Robert St. Amant. "Building Attack Scenarios through Integration of Complementary Alert Methods." Proceedings of the 11th Annual Network and Distributed System Security Symposium, 2004, pp. 97-111.
[7] Oleg Sheyner, Joshua Haines and Somesh Jha. "Automated Generation and Analysis of Attack Graphs." Proceedings of the IEEE Symposium on Security and Privacy, 2002, p. 273.
[8] "The Science of Intrusion Detection System Attack Identification." Cisco Systems, 2002. http://www.cisco.com/warp/public/cc/pd/sqsw/sqidsz/prodlit/idssa_wp.htm
[9] Sniph. "snot." 2001.
[10] Marty Roesch. "Snort - Lightweight Intrusion Detection for Networks." USENIX 13th Systems Administration Conference, 1999.
[11] G. Woodruff, R. Rogers and P. Richards. "A congestion control framework for high-speed integrated packetized transport." IEEE Globecom, 1988.
[12] R. Wade, M. Kara and P. M. Dew. "Study of a Transport Protocol Employing Bottleneck Probing and Token Bucket Flow Control." Fifth IEEE Symposium on Computers and Communications, 2002.
[13] J. Turner. "New directions in communications (or which way to the information age?)." IEEE Communications Magazine, Vol. 24, No. 10, pp. 8-15.
[14] Lingyu Wang, Anyi Liu and Sushil Jajodia. "An Efficient Unified Approach to Correlating, Hypothesising, and Predicting Intrusion Alerts." Proceedings of the European Symposium on Research in Computer Security, 2005, pp. 247-266.
[15] Sushil Jajodia, Steve Noel and Brian O'Berry. "Topological analysis of network attack vulnerability." Managing Cyber Threats: Issues, Approaches and Challenges, Springer, 2005, pp. 248-266.
[16] Martin Devera. "Hierarchical token bucket theory." 2002. http://luxik.cdi.cz/~devik/qos/htb/manual/theory.htm
[17] Gianni Tedesco. Firestorm IDS. 2005. http://www.scaramanga.co.uk/firestorm/
[18] Van Jacobson, Craig Leres and Steven McCanne. "tcpdump." Lawrence Berkeley National Laboratory.
[19] Shmoo Group. "CCTF Defcon Data." 2001. http://www.shmoo.com/cctf/
