Strategic Alert Throttling for Intrusion Detection Systems

Network intrusion detection systems are themselves becoming targets of attackers. Alert flood attacks may be used to conceal malicious activity by hiding it among a deluge of false alerts sent by the attacker. Although these types of attacks are very…

Authors: Gianni Tedesco, Uwe Aickelin

Gianni Tedesco and Uwe Aickelin
The School of Computer Science & IT
The University of Nottingham
Jubilee Campus, Wollaton Road, Nottingham
United Kingdom
{gxt,uxa}@cs.nott.ac.uk

Abstract: Network intrusion detection systems are themselves becoming targets of attackers. Alert flood attacks may be used to conceal malicious activity by hiding it among a deluge of false alerts sent by the attacker. Although these types of attacks are very hard to stop completely, our aim is to present techniques that improve alert throughput and capacity to such an extent that the resources required to successfully mount the attack become prohibitive. The key idea presented is to combine a token bucket filter with a real-time correlation algorithm. The proposed algorithm throttles alert output from the IDS when an attack is detected. The attack graph used in the correlation algorithm is used to make sure that alerts crucial to forming strategies are not discarded by throttling.

Key-Words: Intrusion Detection Systems, Intrusion Alert Correlation, Attack Graphs, Denial of Service Attacks, Token Bucket Filter

1 Introduction

As global awareness of information security issues has increased, so has the proliferation of intrusion detection technology. Network intrusion detection systems (NIDSs or simply IDSs) are quickly becoming a crucial part of the Internet security infrastructure. Back in March 2001, there was a media furore [1] when the FBI Internet crime division issued a warning concerning the then unreleased Stick [2] program which “essentially disarms intrusion detection systems.” The tool automated what we shall call the alert flood attack.
The attack works because each time an intrusion detection system raises an alert it must make some attempt to communicate the information to an operator. This communication channel can therefore become the target of a denial of service attack because, like all communication channels, it has a fixed capacity. If this channel can be overwhelmed with bogus data, an attacker can quickly achieve complete neutralization of intrusion detection capability. There are, in fact, numerous possible types of denial of service attack against a network IDS [3], but we will focus on this particular attack type.

A great deal of research has gone into techniques for reducing false positive alarms generally. One such technique is alert correlation. The aim of alert correlation is to analyse the alert stream and discover strategies or attack scenarios using some kind of model of possible attacker strategies [4]. One quite intuitive type of model is an attack graph [5,6,7]. The advantage of this kind of correlation is that alerts which do not (yet) conform to a threatening attack strategy are not displayed.

We propose a novel algorithm to protect NIDSs from alert-flooding attacks. The algorithm combines a throttling algorithm, namely a token bucket filter, with an existing real-time alert correlation algorithm. The aim is to reduce alerting throughput in the face of an alert flood attack, while minimising the chances of missing important alerts. The key to our approach is using the attack graph to inform the throttling algorithm so that the key alerts which make up threatening strategies are not dropped by the sensor.

The next section of this paper will present the relevant background for the proposed techniques. The alert flood attack is defined and current approaches are examined. The real-time correlation algorithm our solution is based on is also introduced.
In Section 3, a modified correlation algorithm is presented which uses throttling techniques to curb alert flood attacks. In Section 4 some experimental data is presented in order to demonstrate the effectiveness of our technique. We finish by presenting a summary and some concluding remarks.

2 Background

The pattern matching [8] model is currently the most commonly used methodology for detecting intrusion attempts. In this model the NIDS is configured with a database of known attack patterns (also called signatures). An example of a signature is shown in Listing 1. This signature alerts on traffic generated by the well-known “BackOrifice” trojan horse program and detects any incoming packets destined to user datagram protocol (UDP) port 31337 containing a specific sequence of bytes anywhere within their payload.

    alert udp $EXTERNAL_NET any -> $HOME_NET 31337 (msg:"BACKDOOR BackOrifice access"; content: "|ce63 d1d2 16e7 13cf39a5 a586|";)

Listing 1: A Sample Rule as used e.g. by Snort.

2.1 Alert Flooding

Alert flooding attacks are achieved by transmitting packets that simulate intrusion attempts and which the IDS will recognise as true attacks. Taking the example signature in Listing 1, an attacker must craft a UDP packet, set the destination port to 31337, include the sequence of bytes given in the signature and flood the target network with these packets. The possible ramifications of this type of attack against an IDS are threefold:

1. Sensor storage becomes full, preventing further logging.
2. Sensor exceeds maximum alert throughput, causing alerts to be lost, or the sensor to cease functioning.
3. The analyst becomes deluged with false information and becomes unable to distinguish real attacks from the false ones.

Because of this, attackers may use the alert flood attack as a way to conceal genuine malicious activities.
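To illustrate why a stateless signature such as the one in Listing 1 is so easy to trigger, the following is a minimal sketch of the pattern-matching check on the detection side. It is not Snort's implementation: the `Packet` and `Signature` classes are hypothetical names introduced here, and the `$EXTERNAL_NET` / `$HOME_NET` address checks of the real rule are omitted for brevity. Only the protocol, port and byte sequence come from Listing 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    """Hypothetical, heavily simplified view of a captured packet."""
    proto: str       # e.g. "udp" or "tcp"
    dst_port: int
    payload: bytes


@dataclass(frozen=True)
class Signature:
    """Hypothetical stateless signature: protocol, port and a byte pattern."""
    msg: str
    proto: str
    dst_port: int
    content: bytes   # must appear anywhere within the payload

    def matches(self, pkt: Packet) -> bool:
        # A single packet is enough to decide: no connection state is consulted.
        return (pkt.proto == self.proto
                and pkt.dst_port == self.dst_port
                and self.content in pkt.payload)


# Protocol, port and byte sequence taken from Listing 1.
BACKORIFICE = Signature(
    msg="BACKDOOR BackOrifice access",
    proto="udp",
    dst_port=31337,
    content=bytes.fromhex("ce63d1d216e713cf39a5a586"),
)
```

Because the decision depends only on fields that the sender controls, every packet carrying the pattern raises one alert, which is what makes the flood possible.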
The alert flooding technique has been automated, and hence popularised, by tools such as Stick and Snot [9] which read in signatures directly from the freely available Snort [10] IDS. Each packet sent could also have crucial fields such as source and destination address modulated by adding random data into them. This random noise makes it difficult to block the attack using a simple packet filter or firewall.

Alert floods can also be exacerbated by the poor alerting performance of IDS systems in general. A quick examination of the Snort system reveals that, in its preferred output mode (called “unified”), Snort flushes its buffers needlessly in at least two places. This causes a reduction in the effectiveness of the buffering and on UNIX-like systems results in added system call overhead for every logged alert. Performance in this area can be understandably overlooked by the IDS system designer. After all, good engineering practice tells us to optimise for the common case and, in the world of intrusion detection, an alert is not usually the common case. In fact, on a high-speed network it should be a very rare event indeed.

Perhaps the simplest way to reduce data output while maintaining the same intrusion detection capability is to make minor modifications to the signatures to make sure that the IDS is as terse as possible. Such modifications are often used to reduce the number of false positive alerts generated. In fact, generally speaking, signatures are usually a subtle compromise between allowing false negative and false positive alerts. One way to make the IDS less verbose is to fine-tune signatures to examine only those packets destined for the relevant hosts. Let us consider BIND, DNS server software infamous for its security vulnerabilities.
In this situation, the signatures may be modified to only look for BIND exploits if the destination address on the packet matches a pre-defined list of DNS servers. Of course, the operator may actually be interested to know that someone is attempting a BIND exploit on a workstation or a web server. That is to say, this approach tips the false alarm compromise towards the false negative side. Interestingly, this problem also comes up when designing attack graphs for correlation algorithms.

The Snort team addressed the problem of the widespread proliferation of automated alert flooding tools like Stick and Snot in their 1.8 release. Their solution was to implement a Transmission Control Protocol (TCP) state tracking system which they called “stream4”. By keeping track of TCP connection states, stream4 is able to ignore any segments which are not part of an established conversation. In order to make the IDS raise an alert the attacker is now forced to transmit at least three segments, rather than just one. More importantly, because the three-way handshake requires two hosts to be communicating, the external attacker must find a host on the monitored network willing to participate. This might be prevented by a firewall blocking connections.

Currently most systems keep track of TCP states. This is mainly to protect against desynchronisation attacks such as those described by Ptacek and Newsham [3], but there is also the additional benefit of making sure that there is no such shortcut in carrying out an alert flooding attack. Further to performing TCP state tracking, it is also possible to track any application layer state, enabling us to remove shortcuts even for protocols running over stateless transports such as UDP.
While this is a definite improvement, it cannot cover all cases. For example, some signatures must ignore state information because some exploits can exist as a single packet (i.e. statelessly), or because they work over inherently stateless protocols. As we describe in the next section, token bucket filters combined with attack graph correlation can improve the situation.

2.2 Token Bucket Filter

A token bucket filter is an algorithm for controlling the rate of flow of data. Token bucket filters have traditionally been used in a number of applications where rate limiting has been needed. Some good examples are:

1. Network bandwidth management systems [11].
2. Flood protection in network chat / text conferencing systems such as Internet Relay Chat.
3. Flow control in network transport protocols [12].
4. Flood protection for programs that log externally generated events such as UNIX syslog.

A token bucket filter has two parameters, bucket size and token rate [13]. Tokens are generated at the token rate and stored in a buffer called the “bucket”. If the bucket becomes full, the extra tokens are just discarded. Each alert that arrives must have a token to pass through the filter. Any alert that does not have a token is called “over-limit” and does not pass the filter. If the alert rate is less than the token rate then credit is allowed to accumulate in the bucket. This stored credit allows the alert rate to temporarily exceed the token rate (a “burst”).

2.3 Attack Graph Correlation

Wang et al. provide a unified approach to correlating, predicting and reasoning about missed alerts in [14]. The approach works in real time and uses an in-memory data structure to perform the correlation. The correlation algorithm is robust in the face of missing alerts from the underlying IDS.
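As a concrete aside on the token bucket filter of Section 2.2, the two parameters and the over-limit rule can be sketched as below. This is an illustrative sketch rather than the filter used in any particular IDS: the class name `TokenBucket`, the explicit `now` argument (used instead of a wall clock so that the behaviour is reproducible) and the choice to start with a full bucket are all assumptions made here.

```python
class TokenBucket:
    """Minimal token bucket filter driven by an explicit clock (seconds)."""

    def __init__(self, rate: float, bucket_size: float, now: float = 0.0):
        self.rate = rate                # tokens generated per second
        self.bucket_size = bucket_size  # maximum stored credit, i.e. the burst
        self.tokens = bucket_size       # assumption: the bucket starts full
        self.last = now                 # time of the most recent update

    def allow(self, now: float) -> bool:
        """Return True if an alert arriving at time `now` may pass."""
        elapsed = max(0.0, now - self.last)
        # Credit accumulates at the token rate; tokens beyond the bucket
        # size are simply discarded.
        self.tokens = min(self.bucket_size, self.tokens + elapsed * self.rate)
        self.last = max(self.last, now)
        if self.tokens >= 1.0:
            self.tokens -= 1.0          # the alert consumes one token
            return True
        return False                    # over-limit: the alert does not pass


# A filter allowing 2 alerts per second with a burst of 20.
tb = TokenBucket(rate=2.0, bucket_size=20.0)
# Alerts passed out of 1000 arriving in the same instant.
burst_passed = sum(tb.allow(0.0) for _ in range(1000))
```

With these parameters only the stored burst of 20 alerts passes during the instantaneous flood; one second later, two more tokens are available.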
An in-memory data structure called a “queue graph” (QG) is introduced. In order to avoid keeping unnecessary alerts in memory, only the latest alert for a given exploit vertex is stored in this structure. That is to say that the correlation between such matching alerts is left implicit. This allows the algorithm to be run in real time without necessitating the usual sliding correlation window approach, which would allow an attacker to use an alert flood attack to introduce false negative correlations.

In this system, attack graphs are defined as directed acyclic graphs (DAGs) having two distinct types of vertices, security conditions and exploits (see Figure 1). Exploit vertices are (vuln,src,dst) tuples. The src and dst fields are used to tie the exploit to specific combinations of vulnerable and attacking hosts; wildcards may be used. These vertices may represent one or more possible alert types. A function “f” is introduced which maps alerts to exploit vertices in the attack graph. Security condition vertices refer to prerequisites and consequences of exploits. Thus edges connecting a condition to an exploit are prerequisite relations and those connecting an exploit to a condition are consequence relations.

Figure 1: A Sample Attack Graph

Attack graphs are generated automatically with TVA, the topological vulnerability assessment tool [15], which links together the output of Nessus, IDS rules and a vulnerability database. In order to do this a function which maps alerts to exploits is introduced. In this way the correlation algorithm is vulnerability-centric. That is to say, it will not correlate exploits against machines which are not defined as being vulnerable to them. These graphs are distinct from those used by Ning et al. in that they contain not just the causal relationships between attacks but also a database of vulnerable hosts on the network.
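The vertex types, the mapping function f and the "latest alert only" storage described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the data structure of Wang et al. [14]: the class names, the `"*"` wildcard notation, the representation of security conditions as plain strings and the `predecessors` helper are all hypothetical, and alerts are assumed to arrive already labelled with a vulnerability name.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

ANY = "*"  # wildcard marker for src/dst (notation assumed for this sketch)


@dataclass(frozen=True)
class Exploit:
    """Exploit vertex: a (vuln, src, dst) tuple; src and dst may be ANY."""
    vuln: str
    src: str
    dst: str


@dataclass(frozen=True)
class Alert:
    vuln: str    # assumption: the IDS rule is already mapped to a vulnerability
    src: str
    dst: str
    time: float


class AttackGraph:
    def __init__(self) -> None:
        self.requires: Dict[Exploit, List[str]] = {}  # condition -> exploit edges
        self.implies: Dict[Exploit, List[str]] = {}   # exploit -> condition edges
        self.latest: Dict[Exploit, Alert] = {}        # one-slot queue per vertex

    def add_exploit(self, e: Exploit, requires: List[str],
                    implies: List[str]) -> None:
        self.requires[e] = list(requires)
        self.implies[e] = list(implies)

    def f(self, alert: Alert) -> Optional[Exploit]:
        """Map an alert to the first exploit vertex it matches, if any."""
        for e in self.requires:
            if (e.vuln == alert.vuln
                    and e.src in (ANY, alert.src)
                    and e.dst in (ANY, alert.dst)):
                return e
        return None  # host not recorded as vulnerable: no vertex, no correlation

    def enqueue(self, alert: Alert) -> Optional[Exploit]:
        """Store the alert at its vertex, replacing any older matching alert."""
        e = self.f(alert)
        if e is not None:
            self.latest[e] = alert
        return e

    def predecessors(self, e: Exploit) -> List[Exploit]:
        """Exploits whose consequences satisfy a prerequisite of e."""
        needed = set(self.requires[e])
        return [p for p, out in self.implies.items() if needed & set(out)]
```

Because each vertex holds at most one alert, a flood of identical alerts cannot grow the structure, and an alert aimed at a host that is not listed as vulnerable maps to no vertex at all.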
An IDS (in this case Snort) is set up to send its alerts directly to the correlation component. The correlation component uses the attack graph by treating each exploit vertex in the graph as a queue. Alerts are placed in their requisite queue and a breadth-first search is performed in the graph to find previous exploits which would correlate with the current one. If a queue is found and is non-empty then a correlation is generated. If a queue is empty, the algorithm can either stop or hypothesise a missing attack and carry on. If the edges in the graph are directed forwards in time, rather than backwards, predictions can be generated in much the same way as correlations.

The QG structure is actually an enhanced version of the attack graph. A tree is created for each exploit vertex in the graph. In these trees, the correlation and prediction edges are all precalculated. This effectively means that correlation and prediction can be done in linear time by searching in a tree rather than in quadratic time by performing a breadth-first search in the attack graph, and this is what makes the algorithm suitable for real-time application. The output of the algorithm is a correlation graph which can contain a mix of real and hypothesised alerts and security conditions. Readers are urged to consult the original paper for the full details [14].

3 Strategic Data Reduction

We have described the alert flood attack in the previous sections as fundamentally a resource exhaustion attack. In this section we will outline an approach to reduce exposure to the attack by combining alert throttling with attack graph correlation. Consider the case of a human IDS operator as a resource that cannot cope with having to examine many thousands of bogus alerts at the rate at which a sustained attack can produce them.
There are two approaches to solving this type of problem: one is to increase the amount of resources at your disposal, the other is to reduce the amount of resources required. While it is conceivable that one could scale the sensor hardware to be fully able to cope with alert floods at a given rate for a given length of time, it seems rather more complex to scale the human operator.

Taking the approach of minimising the resources required, alert data could be reduced by throttling the alert stream to a fixed rate. This could be achieved by applying a token bucket filter either per signature, per attack type, globally, or even in complex hierarchies as in HTB3 [16]. The burstiness feature of the token bucket filter (TBF) algorithm means that alerts are only discarded under a sustained high rate of alerts. However, such approaches run the risk of dropping important alerts, which can even assist an attacker in concealing their malicious activities.

The key to our approach is to allow the correlation algorithm to interpose between the signature matching and output components of the IDS. By doing this, a token bucket filter can be placed at each queue in the QG structure and over-limit alerts can be discarded.

In order that the user may be informed of dropped alerts we can use a kind of “run length encoding” (RLE) to represent a string of alerts. RLE is a simple compression technique which replaces recurring sequences of symbols (called runs) with a single symbol and a run count N. To decompress, one simply copies the symbol into the output stream N times. This is an approach familiar to UNIX users who have ever tried to flood the syslog program and seen its “last message repeated N times” warning. To implement RLE compression in our case, we first assume that all alerts going through the same token bucket filter are identical.
Then all that is required is to add a counter to the queues in the QG data structure and increment that counter for all over-limit alerts. When there is enough credit in the token bucket to permit new alerts, we dequeue the alert and the counter, allowing them to add a node in the output graph and to be logged to permanent storage. This allows for some minimal reconstruction of lost packets by just using the information in the attack graph.

Two questions then arise: firstly, what to do with alerts not mapping to vertices in the queue graph; and secondly, what parameters to use for the token bucket filters. For those alerts which do not map on to exploit nodes, we cannot be sure that we are not missing alerts vital to some strategy. Since the QG algorithm assumes a complete attack graph anyway, we could discard all such alerts. A more prudent approach is taken in our case, and that is to apply a token bucket filter to such alerts on a per-signature basis.

As for the parameters of the TBFs, for those alerts which map to vertices in the attack graph, we could drop all implicitly correlating alerts and keep the same strategies. However, it is seen as a benefit to keep alerts where possible; here we envisage that token rates of greater than one or two alerts per second need not be used. For other alerts, however, there is, of course, a trade-off between data fidelity and efficiency. In the next section, we will show that this technique scales up such that it effectively nullifies the computational effect of an alert flood attack.

4 Empirical Data

We can perform a simple test with the Firestorm [17] system running off-line against a tcpdump [18] capture file containing an alert flood attack captured by the Shmoo Group at a Defcon CTF event [19]. The attack consists of a repeated ICMP flood at a rate of around 7,343 packets per second.
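Before the measurements, the per-queue counter described in Section 3 can be made concrete with a short sketch. It is one possible reading of that description, not the Firestorm implementation: the class name `ThrottledQueue` is hypothetical, alerts are represented as plain strings, and the throttle is passed in as any callable that answers "may an alert pass at this time?", for instance the decision method of a token bucket filter.

```python
from typing import Callable, Optional, Tuple


class ThrottledQueue:
    """One exploit-vertex queue with a throttle and a repeat counter."""

    def __init__(self, allow: Callable[[float], bool]):
        self.allow = allow     # throttle decision, e.g. from a token bucket filter
        self.suppressed = 0    # over-limit alerts since the last emitted alert

    def offer(self, alert: str, now: float) -> Optional[Tuple[str, int]]:
        """Return (alert, repeat_count) when the alert may be emitted, else None.

        As in the text, all alerts reaching the same queue are assumed to be
        identical, so a single counter is enough to summarise dropped ones.
        """
        if self.allow(now):
            # Enough credit: emit the alert together with the number of
            # alerts that were discarded since the previous emission.
            count, self.suppressed = self.suppressed, 0
            return alert, count
        self.suppressed += 1   # over-limit: discard, but remember that we did
        return None
```

An emitted pair such as `("ICMP flood", 3)` plays the same role as syslog's "last message repeated N times": the operator sees one alert plus the number of identical alerts that were throttled before it.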
We perform two tests and in both we have a full signature database loaded containing around 1,600 signatures, with the network data read directly from the hard disk. The test machine was a 3.2GHz Pentium-IV running Linux 2.6 with 1GB of RAM. The results shown are an average of three iterations for both runs to factor out any random fluctuations such as may be caused by disk seek latency.

The first run (#1) is a control run using Firestorm with the QG algorithm. The second run (#2) is identical except for the addition of token bucket filtering. Two sets of filters are used:

1. The set of filters for each exploit vertex in the attack graph.
2. The set of filters for each rule which does not map to a vertex in the attack graph.

Each of these filters is set to 2 alerts per second and a burst of 20 alerts. These parameters are rather arbitrary but are probably best set based on the operator's experience of the baseline alert rate for the network.

    #   Data Size (KB)   Alerts    CPU Time   Run Time
    1   475,229          300,741   13.131     18.476
    2   1,092            696       12.153     12.817

Table 1: Experimental Results.

As we can see in Table 1, the amount of data logged was reduced by more than two orders of magnitude and the run time decreased disproportionately to the CPU time. While the run time was reduced by around 30%, the CPU time fell by less than 10%. This indicates that the Firestorm process is not wasting as much time waiting for I/O completion when the token bucket filter is enabled. The number of alerts output is likewise reduced by more than two orders of magnitude.

In the experiment the communication channel between the IDS and the operator is simply an on-disk alert spool, so the available bandwidth is high. In a real world deployment, on the other hand, it is likely that alerts would be transmitted across a network, adding further latency and bandwidth constraints.
In these deployments we expect even greater gains in performance.

From these results it is shown that we can effectively boost performance and therefore sensor capacity, allowing the IDS to carry on working during an alert flood rather than becoming overwhelmed and possibly exhausting the storage on the sensor. Even if the attack contained twice as many packets in the same space of time, it would not double the amount of data logged as the token rate is fixed.

5 Summary and Conclusions

Alert flooding is a problem that will probably always exist with intrusion detection systems and one that cannot be eliminated entirely. However, we have shown that it is possible to drastically reduce the effects by recognising an attack and throttling excess alerts. We have further shown that real-time alert correlation algorithms can be used to provide a useful context for throttling alerts such that key attacks are not missed. Such an approach solves problems with either technique used in isolation. Without the correlation system interceding between the signature matching and alerting components of the IDS it is not possible for it to decide if alerts may be logged or not, and without having strategic information available to the throttling algorithm, it could drop crucial alerts.

Further investigation is required into producing optimal token bucket filter configurations and how best to handle those alerts which do not map on to any exploit vertices in the attack graph.

References:

[1] ZDNet UK News. http://news.zdnet.co.uk/internet/security/0,39020375,2085099,00.htm
[2] G. Coretex. “Fun With Packets: Designing a Stick.” Endeavor Systems Inc., 2002.
[3] T. H. Ptacek and T. N. Newsham. “Insertion, Evasion and Denial of Service: Eluding Network Intrusion Detection.” Secure Networks Inc., 1998.
[4] Xinzhou Qin and Wenke Lee. “Attack Plan Recognition and Prediction Using Causal Networks.” Proceedings of Annual Computer Security Applications Conference, 2004.
[5] Peng Ning, Yun Cui and Douglas S. Reeves. “Constructing Attack Scenarios through Correlation of Intrusion Alerts.” Proceedings of the 9th ACM Conference on Computer & Communications Security, 2002. pp. 245-254.
[6] Peng Ning, Dingbang Xu, Christopher G. Healey and Robert St. Amant. “Building Attack Scenarios through Integration of Complementary Alert Correlation Methods.” Proceedings of the 11th Annual Network and Distributed System Security Symposium, 2004. pp. 97-111.
[7] Oleg Sheyner, Joshua Haines and Somesh Jha. “Automated Generation and Analysis of Attack Graphs.” Proceedings of the IEEE Symposium on Security and Privacy, 2002. pp. 273.
[8] “The Science of Intrusion Detection System Attack Identification.” Cisco Systems, 2002. http://www.cisco.com/warp/public/cc/pd/sqsw/sqidsz/prodlit/idssa_wp.htm
[9] Sniph. “snot”. 2001.
[10] Marty Roesch. “Snort - Lightweight Intrusion Detection for Networks.” USENIX 13th Systems Administration Conference, 1999.
[11] G. Woodruff, R. Rogers and P. Richards. “A congestion control framework for high-speed integrated packetized transport.” IEEE Globecom 88, 1988.
[12] R. Wade, M. Kara and P. M. Dew. “Study of a Transport Protocol Employing Bottleneck Probing and Token Bucket Flow Control.” Fifth IEEE Symposium on Computers and Communications, 2002.
[13] J. Turner. “New directions in communications (or which way to the information age?)” IEEE Communications Magazine, Vol. 24, No. 10, pp. 8-15.
[14] Lingyu Wang, Anyi Liu and Sushil Jajodia. “An Efficient Unified Approach to Correlating, Hypothesising, and Predicting Intrusion Alerts.” Proceedings of European Symposium on Computer Security, 2005. pp. 247-266.
[15] Sushil Jajodia, Steve Noel and Brian O’Berry. “Topological analysis of network attack vulnerability.” Managing Cyber Threats: Issues, Approaches and Challenges, 2005. Springer. pp. 248-266.
[16] Martin Devera. “Hierarchical token bucket theory.” 2002. http://luxik.cdi.cz/~devik/qos/htb/manual/theory.htm
[17] Gianni Tedesco. Firestorm IDS. 2005. http://www.scaramanga.co.uk/firestorm/
[18] Van Jacobson, Craig Leres and Steven McCanne. “tcpdump”. Lawrence Berkeley National Laboratory.
[19] Shmoo Group. “CCTF Defcon Data”. 2001. http://www.shmoo.com/cctf/
