Optimal Filtering of Malicious IP Sources

Fabio Soldo, Athina Markopoulou
University of California, Irvine
{fsoldo, athina}@uci.edu

Katerina Argyraki
EPFL, Switzerland
katerina.argyraki@epfl.ch

Abstract: How can we protect the network infrastructure from malicious traffic, such as scanning, malicious code propagation, and distributed denial-of-service (DDoS) attacks? One mechanism for blocking malicious traffic is filtering: access control lists (ACLs) can selectively block traffic based on fields of the IP header. Filters (ACLs) are already available in routers today, but they are a scarce resource because they are stored in expensive ternary content addressable memory (TCAM). In this paper, we develop, for the first time, a framework for studying filter selection as a resource allocation problem. Within this framework, we study five practical cases of source address/prefix filtering, which correspond to different attack scenarios and operator's policies. We show that filter selection optimization leads to novel variations of the multidimensional knapsack problem, and we design optimal, yet computationally efficient, algorithms to solve them. We also evaluate our approach using data from Dshield.org and demonstrate that it brings significant benefits in practice. Our set of algorithms is a building block that can be immediately used by operators and manufacturers to block malicious traffic in a cost-efficient way.

I. INTRODUCTION

How can we protect our network infrastructure from malicious traffic, such as scanning, malicious code propagation, spam, and distributed denial-of-service (DDoS) attacks? These activities cause problems on a regular basis, ranging from simple annoyance to severe financial, operational and political damage to companies, organizations and critical infrastructure.
In recent years, they have increased in volume, sophistication, and automation, largely enabled by botnets that are used as the platform for launching these attacks.

Protecting a victim (host or network) from malicious traffic is a hard problem that requires the coordination of several complementary components, including non-technical (e.g. business and legal) and technical solutions (at the application and/or network level). Filtering support from the network is a fundamental building block in this effort. For example, the victim's ISP may install filters to react to an ongoing attack, by blocking malicious traffic before it reaches the victim. Another ISP may want to proactively identify and block the malicious traffic before it reaches and compromises vulnerable hosts in the first place. In either case, filtering is a necessary operation that must be performed within the network.

Filtering capabilities are already available at the routers today via access control lists (ACLs). ACLs allow a router to match a packet header against rules [1] and are currently used for enforcing a variety of policies, including infrastructure protection [2]. For the purpose of blocking malicious traffic, a filter is a simple ACL rule that denies access to a source IP address or prefix. To keep up with the high rates of modern routers, it is important that filtering is implemented in hardware: indeed, ACLs are stored in the Ternary Content Addressable Memory (TCAM), which allows for parallel access and reduces the number of lookups per forwarded packet. However, TCAM is more expensive and consumes more space and power than conventional memory. The size and cost of TCAM puts a limit on the number of filters, and this is not expected to change in the near future.¹ With thousands or tens of thousands of filters per path, an ISP alone cannot hope to block the currently witnessed attacks, not to mention attacks from multimillion-node botnets expected in the near future.

¹ A router linecard or supervisor-engine card typically supports a single TCAM chip with tens of thousands of entries. For example, the Cisco Catalyst 4500, a mid-range switch, provides a 64,000-entry TCAM to be shared among all its interfaces (48-384). Cisco 12000, a high-end router used at the Internet core, provides 20,000 entries that operate at line-speed per linecard (up to 4 Gigabit Ethernet interfaces). The Catalyst 6500 switch can fit 16K-32K patterns and 2K-4K masks in the TCAM. Depending on how an ISP connects to its clients, each individual client can typically use only part of these ACLs, i.e. a few hundred to a few thousand filters.

Consider the example shown in Fig. 1(a): an attacker commands a large number of compromised hosts to send traffic towards a victim V (say a webserver), thus exhausting the resources of V and preventing it from serving its legitimate clients; the ISP of V tries to protect its client from the attack by blocking the attack at the gateway router G. Ideally, G would like to assign a single filter to block each malicious IP source. However, there are fewer filters than attackers, so aggregation is typically used: a single filter blocks an entire source address prefix. This has the desired effect of reducing the number of filters but also the side-effect of blocking legitimate traffic originating from that prefix. Therefore, filter selection becomes an optimization problem that tries to block as many malicious and as few legitimate sources as possible, given a certain budget on the number of filters.

Fig. 1. Example of a distributed attack. (a) Actual network. (b) Hierarchy of source IP addresses and prefixes. Let's assume that the gateway router G has only two filters available to block malicious traffic and protect the victim V. It uses F1 to block a single malicious address (A) and F2 to block prefix a.b.c.*, which contains 3 malicious sources but also one legitimate source (B). Therefore, the selection of filter F2 trades off the collateral damage (blocking B) for the reduction in the number of filters (from 3 to 1).

In this paper, we formulate, for the first time, a general framework for studying filter selection as a resource allocation problem. To the best of our knowledge, the optimal filter selection aspect has not been explored so far, as most related work on filtering has focused on protocol and architectural aspects. Within this framework, we consider five practical source address filtering problems, depending on the attack scenario and the operator's policy and constraints. Our contributions are twofold. On the theoretical side, filter selection optimization leads to novel variations of the multidimensional knapsack problem, and we exploit the special structure of each problem to design optimal and computationally efficient algorithms. On the practical side, we provide a set of cost-efficient algorithms that can be used both by operators to block malicious traffic and by router manufacturers to optimize the use of their TCAM and eventually optimize the cost of the routers. We would like to emphasize that we do not propose a novel architecture for dealing with malicious traffic; instead, we optimize the use of an important mechanism that already exists on the Internet today and can be immediately used as a building block in larger defense systems, as discussed in detail in Section V-A.

The structure of the paper is as follows. In Section II-A, we formulate the general framework for studying filter selection.
In Section III, we study five specific problems that correspond to different attack scenarios and operator's policies: blocking all addresses in a blacklist (BLOCK-ALL); blocking some addresses in a blacklist (BLOCK-SOME); blocking all/some addresses in a time-varying blacklist (TIME-VARYING BLOCK-ALL/SOME); blocking flows during a DDoS flooding attack to meet bandwidth constraints (FLOODING); and distributed filtering across several routers during flooding (DIST-FLOODING). For each problem, we design an optimal, yet computationally efficient, algorithm to solve it. In Section IV, we use data from Dshield.org [3] to evaluate the performance of our algorithms in realistic attack scenarios and demonstrate that they bring significant benefit in practice. In Section V, we position our work within (a) the bigger picture of defense against malicious traffic and (b) related knapsack problems. Section VI concludes the paper.

II. PROBLEM FORMULATION AND FRAMEWORK

A. Definitions and Notation

Let us first define the notation used throughout the paper, also summarized in Table I.

Source IP addresses and prefixes. Every IPv4 address $i$ is a 32-bit sequence. Using the standard notation IP/mask, we use $p/l$ to denote a prefix $p$ of length $l$ bits; $l$ and $p$ can take values $l = 0, 1, \dots, 32$ and $p = 0, 1, \dots, 2^l - 1$, respectively. Sometimes, for brevity, we will write simply $p$ to indicate prefix $p/l$. We write $i \in p/l$ to indicate that address $i$ is within the $2^{32-l}$ addresses covered by prefix $p/l$.

Blacklists. A blacklist ($\mathcal{BL}$) is a list of $N$ unique malicious source IP addresses, which send malicious traffic towards the victim. Identifying which sources are malicious and should be blocked is a difficult problem in its own right, but orthogonal to the focus of this paper. We consider that the set of malicious IP sources is accurately identified by another module (e.g.
an intrusion detection system and/or historical data) in a pre-processing step and is given as input to our problem. (For a discussion of these assumptions, see Section V-A.) An address is considered "bad" if it appears in a blacklist or "good" if it belongs to a whitelist (a set of legitimate addresses) $\mathcal{G}$, which may or may not be explicitly given. In the latter case, $\mathcal{G}$ includes all addresses that are not in $\mathcal{BL}$.

Address Weight. In the simplest version of the problem, an address is simply either bad or good, depending on whether or not it appears in a blacklist. In a more general framework, a weight $w_i$ can be assigned to every address $i$ to indicate the importance of an address. We use $w_i \le 0$ for every bad address $i$ to indicate the benefit from blocking it; we use $w_i \ge 0$ for every good address $i$ to indicate the collateral damage from blocking it; $w_i = 0$ indicates indifference about whether address $i$ will be blocked or not. The weight $w_i$ can have different interpretations depending on the problem, as we will see later. First, it can capture the amount of bad/good traffic originating from an IP address and therefore the benefit/cost of blocking that address. Second, $w_i$ can express policy: e.g. depending on the amount of money gained/lost by the ISP when blocking address $i$, the operator can decide to assign large positive weights to its important customers that should not be blocked, or large negative weights to the worst attackers that must be blocked.²

Filters. In this paper, we focus on source address/prefix filtering. A filter is a simple ACL rule that specifies that all addresses in prefix $p/l$ should be blocked. $F_{\max}$ denotes the maximum number of filters available in TCAM and is given as input to our problem.
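
As an illustration of the notation above, the following minimal Python sketch (not part of the original paper) shows prefix membership $i \in p/l$, the prefix size $2^{32-l}$, and the sign convention for weights. The helper names `in_prefix` and `prefix_size`, the `bits` parameter, and the 4-bit toy weights are assumptions made for the example; $p$ is taken to be the integer value of the first $l$ bits, as in the definition above.

```python
def in_prefix(addr: int, p: int, l: int, bits: int = 32) -> bool:
    """Return True if address `addr` lies in prefix p/l (i.e. i ∈ p/l)."""
    # Keep only the top l bits of the address and compare them with p.
    return (addr >> (bits - l)) == p


def prefix_size(l: int, bits: int = 32) -> int:
    """Number of addresses covered by a prefix of length l: 2^(bits - l)."""
    return 1 << (bits - l)


# Hypothetical weights in a 4-bit toy address space (assumption, for illustration):
# negative weight = bad address (benefit of blocking it),
# positive weight = good address (collateral damage of blocking it).
weights = {0b0000: -1.0, 0b0011: -1.0, 0b0001: 2.0}
bad = [i for i, w in weights.items() if w < 0]
good = [i for i, w in weights.items() if w > 0]

# Addresses with a non-zero weight that fall inside prefix 00** (p = 0, l = 2).
covered = [i for i in weights if in_prefix(i, p=0b00, l=2, bits=4)]
```

In this toy setting, prefix 00** covers both bad addresses and the single good one, which is exactly the kind of trade-off the weights are meant to quantify.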
Notice that filter optimization is only meaningful when the number of available filters $F_{\max}$ is much smaller than the number of malicious sources $N$, which is indeed the case in practice (see introduction and [1], [2]).

² The higher the absolute value of the weight assigned to an individual bad/good address, the higher the preference to block/not block that address. If all good and bad addresses are assigned the same $w_g$ and $-w_b$ respectively, then the ratio $w_g / w_b$ is a parameter that the operator can tune to express how much she values low collateral damage vs. blocked malicious traffic. At the extreme, $w_i = \infty$ ($-\infty$) indicates that address $i$ must never (always) be blocked.

TABLE I
NOTATION
$i$: generic IP address
$w_i$: weight assigned to address $i$
$\mathcal{BL}$: blacklist, a list of "bad" addresses
$N$: number of unique addresses in $\mathcal{BL}$
$\mathcal{G}$: whitelist, a set of "good" addresses
$p/l$ (or "$p$" for short): prefix $p$ of length $l$ bits (IP/mask notation)
$i \in p/l$: address $i$ that belongs to prefix $p/l$
$x_{p/l} \in \{1, 0\}$: indicates if a filter blocks prefix $p/l$ or not
$g_{p/l} = \sum_{i \in p/l \cap \mathcal{G}} w_i$: collateral damage from filtering prefix $p/l$
$b_{p/l} = \left| \sum_{i \in p/l \cap \mathcal{BL}} w_i \right|$: bad traffic blocked by filtering prefix $p/l$
$F_{\max}$: maximum number of available filters
$z_p(F)$ (or $z_p(F, C)$): optimal solution of the subproblem considering only addresses in prefix $p$ and $F$ filters (and capacity $C$, in the case of FLOODING)

The decision variable $x_{p/l} \in \{1, 0\}$ is 1 if a filter is assigned to block prefix $p/l$, and 0 otherwise. A filter $p/l$ blocks all $2^{32-l}$ addresses in that range. This has the desired effect of blocking all bad traffic $b_{p/l} = \left| \sum_{i \in p/l \cap \mathcal{BL}} w_i \right|$ and the side-effect of blocking all legitimate traffic $g_{p/l} = \sum_{i \in p/l \cap \mathcal{G}} w_i$ originating from that prefix. An effective filter should have a large benefit $b_{p/l}$ and low "collateral damage" $g_{p/l}$.

B. Rationale and Overview of Filtering Problems

Given a set of malicious and legitimate sources, and a measure of their importance (the $w$'s), the goal of filter selection is the construction of filtering rules, so as to minimize the impact of malicious sources on the network using the available network resources (e.g. filters and link capacity). Depending on the attack scenario, and the operator's policy and constraints, different problems may arise. E.g. the operator might want to block all malicious sources, or might tolerate leaving some unblocked; the attack might be of a low rate or a flooding attack; the operator may control one or several routers. At the core of each filtering problem lies the following:

\begin{align}
\min \quad & \sum_{p/l} \sum_{i \in p/l} w_i \cdot x_{p/l} \tag{1} \\
\text{s.t.} \quad & \sum_{p/l} x_{p/l} \le F_{\max} \tag{2} \\
& \sum_{p/l \,:\, i \in p/l} x_{p/l} \le 1 \quad \forall i \in \mathcal{BL} \tag{3} \\
& x_{p/l} \in \{0, 1\} \quad \forall l = 0, \dots, 32, \; p = 0, \dots, 2^l - 1 \tag{4}
\end{align}

Eq. (1) expresses the objective to minimize the total cost for the network, which consists of two parts: the collateral damage (terms with $w_i > 0$) and the cost of leaving malicious traffic unblocked (terms with $w_i < 0$). We use the notation $\sum_{p/l}$ to denote summation over all possible prefixes $p/l$: $l = 0, \dots, 32$, $p = 0, \dots, 2^l - 1$. Eq. (2) expresses the constraint on the number of filters. Eq. (3) states that overlapping filters are mutually exclusive, i.e. each malicious address should be blocked at most once, otherwise filtering resources are wasted. Eq. (4) lists the decision variables $x_{p/l}$ corresponding to all possible prefixes; it is part of every optimization problem in this paper and will be omitted from now on for brevity.

Eq. (1)-(4) provide the general framework for filter selection optimization. Different filtering problems can be written as special cases within this framework, possibly with additional constraints. As we discuss in Section V-B, these are all multidimensional knapsack problems [4], which are, in general, NP-hard.
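
To make the general framework concrete, the following Python sketch evaluates the objective of Eq. (1) and checks constraints (2)-(3) for a candidate set of filters. It is an illustration only, not the paper's algorithm: the function names, the representation of a filter as a `(p, l)` tuple, the `bits` parameter, and the 4-bit toy data are all assumptions. Addresses missing from `weights` are treated as having weight 0.

```python
from typing import Dict, List, Set, Tuple

Prefix = Tuple[int, int]  # (p, l): integer value of the first l bits, and the length l


def covers(prefix: Prefix, addr: int, bits: int = 32) -> bool:
    """True if addr ∈ p/l."""
    p, l = prefix
    return (addr >> (bits - l)) == p


def collateral(prefix: Prefix, weights: Dict[int, float],
               blacklist: Set[int], bits: int = 32) -> float:
    """g_{p/l}: total weight of good (non-blacklisted) addresses in the prefix."""
    return sum(w for i, w in weights.items()
               if i not in blacklist and covers(prefix, i, bits))


def benefit(prefix: Prefix, weights: Dict[int, float],
            blacklist: Set[int], bits: int = 32) -> float:
    """b_{p/l}: absolute total weight of blacklisted addresses in the prefix."""
    return abs(sum(w for i, w in weights.items()
                   if i in blacklist and covers(prefix, i, bits)))


def total_cost(filters: List[Prefix], weights: Dict[int, float],
               bits: int = 32) -> float:
    """Objective of Eq. (1): sum of w_i over all addresses in selected prefixes."""
    return sum(w for f in filters for i, w in weights.items() if covers(f, i, bits))


def is_feasible(filters: List[Prefix], blacklist: Set[int],
                f_max: int, bits: int = 32) -> bool:
    """Eq. (2): at most f_max filters. Eq. (3): each bad address covered at most once."""
    if len(filters) > f_max:
        return False
    return all(sum(covers(f, i, bits) for f in filters) <= 1 for i in blacklist)


# Toy 4-bit example (assumed data): bad addresses 0000 and 0011, good ones 0001 and 0010.
blacklist = {0b0000, 0b0011}
weights = {0b0000: -1.0, 0b0011: -1.0, 0b0001: 1.0, 0b0010: 1.0}
one_prefix = [(0b00, 2)]                 # single filter 00**
two_hosts = [(0b0000, 4), (0b0011, 4)]   # one filter per bad address
```

With these toy weights, the single filter 00** has cost $g - b = 2 - 2 = 0$, while the two host filters have cost $-2$ but consume twice the filter budget; a set such as `[(0b00, 2), (0b0000, 4)]` violates Eq. (3) because address 0000 is covered twice.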
The specifics of each problem dramatically affect the complexity, which can vary from linear to NP-hard.

In this paper, we formulate five practical filtering problems, and we develop optimal, yet computationally efficient, algorithms to solve them. Here, we summarize the rationale behind each problem and our main results. The exact formulation and detailed solution for each problem are provided in Section III.

[P1] BLOCK-ALL: Assume that a blacklist $\mathcal{BL}$ and a whitelist $\mathcal{G}$ are given; a weight is also associated with every good address to indicate the amount of legitimate traffic originating from that address. The limit on the number of filters is $F_{\max}$. The first practical goal an operator may have is to choose a set of filters that block all malicious sources so as to minimize the collateral damage. We design an optimal algorithm that solves this problem at low complexity (linearly increasing with $N$, i.e. the lowest achievable complexity for this problem).

[P2] BLOCK-SOME: Assume that the same blacklist and whitelist are given, as in P1. However, the operator may be willing to block only some (instead of all) malicious addresses, so as to decrease the collateral damage, at the expense of leaving some malicious traffic unblocked. She can achieve this by assigning weights $w_i > 0$ and $w_i < 0$ to good and bad addresses, respectively, to express their relative "importance". The goal of P2 is to block only those subsets of malicious addresses that have the highest impact and are not co-located with important legitimate sources, so as to minimize the total cost in Eq. (1). We design an optimal, computationally efficient (linearly increasing with $N$) algorithm for this problem too.

[P3] TIME-VARYING BLOCK-ALL (SOME): Assume that a set of blacklists $\{\mathcal{BL}_{T_0}, \mathcal{BL}_{T_1}, \dots, \mathcal{BL}_{T_i}, \dots\}$ and a set of whitelists $\{\mathcal{G}_{T_0}, \mathcal{G}_{T_1}, \dots, \mathcal{G}_{T_i}, \dots\}$ are given at different times, $T_0 < T_1 < \dots < T_i < \dots$; a weight is also associated with every address; the limit on the number of filters is $F_{\max}$. The goal of P3 is to exploit temporal correlation between blacklists at successive times and, given the solution to BLOCK-ALL(SOME) for input blacklist $\mathcal{BL}_{T_{i-1}}$, to efficiently update the filtering rules and construct the solution to BLOCK-ALL(SOME) with input blacklist $\mathcal{BL}_{T_i}$.

[P4] FLOODING: In a distributed flooding attack, such as the one shown in Fig. 1, a large number of compromised hosts send traffic to the victim with the purpose of exhausting the victim's access bandwidth. The problem is well-known and increasingly frequent and severe. Our framework can be used to optimally select filters in this case, so as to minimize the collateral damage and meet the bandwidth constraint (i.e. the total bandwidth of the unblocked traffic should not exceed the bandwidth of the flooded link, e.g. link G-V in Fig. 1). The input is the same as in P1-P2, and the weights capture the traffic volume originating from each IP source. We prove that problem P4 is NP-hard, and we design a pseudo-polynomial algorithm that optimally solves P4 with complexity that grows linearly with the number of sources in the blacklist and the whitelist, $|\mathcal{BL}| + |\mathcal{G}|$.

[P5] DIST-FLOODING: All the above problems aim at selecting filters at a single router. However, a network administrator of an ISP or campus network may use the filtering resources collaboratively across several routers to better defend against an attack. (Distributed filtering may also be enabled by the cooperation across several ISPs against a common enemy.) The question then is not only which filters to select but also on which router to place them. Here, we focus on DIST-FLOODING, which is the practical case of distributed filtering, across several routers, against a flooding attack.
We prove that P5 can be decomposed into several FLOODING problems that can be solved independently and optimally, one at each router.

III. FILTERING PROBLEMS AND ALGORITHMS

In this section, we give the detailed formulation of each problem and the algorithm that solves it. But first, let us define a data structure that we use to represent the problem and to develop all the subsequent algorithms.

A. Data Structure for Representing the Problem

Definition 1 (LCP Tree): Given a set $A$ of $N$ IP addresses, we define the Longest Common Prefix tree of $A$, LCP($A$), as the binary tree whose leaves represent the $N$ IPs and all other nodes represent all and only the longest common prefixes between any pair of IPs in $A$. The prefixes are organized in the natural IP hierarchy, with shorter prefixes towards the root and longer prefixes towards the leaves, so that the prefix corresponding to a parent node includes the prefixes corresponding to its two children.

An example is shown and discussed in Fig. 2. The LCP tree can be constructed from the binary tree of all prefixes by removing the branches that do not have malicious IPs and then by removing nodes with a single child. It reduces the storage for representing candidate prefixes by encoding only those prefixes that are part of a feasible solution. The LCP tree is a variation of the binary (unibit) trie [5] but does not have nodes with a single child. We do not claim novelty in this data structure, but we describe it in detail because we use it extensively in the design of the algorithms.

Complexity: We can build the LCP tree from $N$ malicious addresses by performing $N$ insertions in a Patricia trie [5]. To insert a string of $m$ bits, we need at most $m$ comparisons. Thus, the worst case complexity is $O(mN)$, where $m = 32$ (bits) is the constant length of an IP address.
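
The following Python sketch shows one way to build an LCP tree, applied to the 4-bit blacklist of Fig. 2. It is an illustration under stated assumptions, not the authors' implementation: instead of the $N$ Patricia-trie insertions mentioned above, it sorts the addresses and recursively splits them on the first bit where they differ, which is simpler to write but not tuned for speed. The names `Node`, `build_lcp_tree`, `label`, and `all_nodes` are hypothetical, and the input is assumed to be non-empty.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    prefix: int                      # integer value of the first `length` bits
    length: int                      # prefix length l
    left: Optional["Node"] = None    # child whose next bit is 0
    right: Optional["Node"] = None   # child whose next bit is 1


def build_lcp_tree(addresses: List[int], bits: int = 32) -> Node:
    """Build the LCP tree of a non-empty list of addresses (sort-and-split sketch)."""
    addrs = sorted(set(addresses))

    def build(lo: int, hi: int) -> Node:
        if hi - lo == 1:                       # a single address: leaf node
            return Node(addrs[lo], bits)
        # In sorted order, the common prefix of the first and last address
        # is the longest common prefix of the whole group.
        diff = addrs[lo] ^ addrs[hi - 1]
        length = bits - diff.bit_length()
        prefix = addrs[lo] >> (bits - length)
        pivot = bits - length - 1              # first bit on which the group disagrees
        mid = next(k for k in range(lo, hi) if (addrs[k] >> pivot) & 1)
        return Node(prefix, length, build(lo, mid), build(mid, hi))

    return build(0, len(addrs))


def label(node: Node, bits: int = 32) -> str:
    """Render a node as a bit pattern, e.g. '00**' for prefix 0/2 with bits=4."""
    full = format(node.prefix << (bits - node.length), f"0{bits}b")
    return full[:node.length] + "*" * (bits - node.length)


def all_nodes(node: Optional[Node]) -> List[Node]:
    """Pre-order list of all nodes in the tree."""
    if node is None:
        return []
    return [node] + all_nodes(node.left) + all_nodes(node.right)


tree = build_lcp_tree([0, 3, 4, 5, 7, 8, 10, 11, 12], bits=4)
internal = {label(n, 4) for n in all_nodes(tree) if n.left is not None}
```

Because every internal node has exactly two children, a tree with $N$ leaves has $N - 1$ internal nodes, so the number of candidate prefixes is linear in the size of the blacklist.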

We will make extensive use of the LCP tree in all algorithms in the rest of this section, as it provides a compact way to represent feasible solutions and to efficiently select the optimal one. Note that every node in the LCP-tree is a candidate prefix $p/l$; for brevity of notation, we will use interchangeably the notation $p/l$ and its shorter version $p$.

Fig. 2. Example of LCP-tree used in BLOCK-ALL. For ease of illustration, consider a 4-bit (instead of 32-bit) address space, i.e. from 0000 to 1111. Let $\mathcal{BL} = \{0, 3, 4, 5, 7, 8, 10, 11, 12\}$ be the set of malicious IPs, corresponding to the leaves of the binary tree. All remaining IPs (1, 2, 6, 9, 13, 14, 15) are considered legitimate and not explicitly shown. Every intermediate node represents the longest common prefix (LCP) covering all malicious sources in that subtree; it is associated with a cost measuring the additional collateral damage caused when we filter that node instead of filtering each of its children. E.g. the LCP of malicious addresses 0=0000 and 3=0011 is prefix 00**; if filter 00** is chosen instead of filters 0000 and 0011, collateral damage of 2 is caused, because the legitimate addresses 1 and 2 are also blocked. Choosing a set of source prefixes to filter is equivalent to choosing a set of nodes in this LCP tree. E.g. a feasible solution to BLOCK-ALL consists of prefixes {0/2, 4/2, 8/2, 12/4}, which cover all malicious IPs.

B. BLOCK-ALL

Goal. Given: (i) a blacklist of malicious addresses $\mathcal{BL}$, (ii) a set of legitimate sources, (iii) weights assigned to each legitimate source address, indicating the amount of traffic from that address, and (iv) a limit on the number of filters $F_{\max}$; select source address prefixes so as to block all malicious sources and minimize the collateral damage.

Formulation.
This can be formulated within the general framework of Eq. (1)-(4) by assigning weight $w_i > 0$ to good addresses (the amount of legitimate traffic) and weight $w_i = 0$ to each malicious source. The goal is to minimize the total cost, which in this case is simply the total legitimate traffic blocked: for each selected prefix, $\sum_{i \in p/l} w_i = \sum_{i \in p/l \cap \mathcal{G}} w_i + 0 = g_{p/l}$. Constraint Eq. (7) enforces that every malicious source should be blocked by exactly one filter.

\begin{align}
\min \quad & \sum_{p/l} g_{p/l} \, x_{p/l} \tag{5} \\
\text{s.t.} \quad & \sum_{p/l} x_{p/l} \le F_{\max} \tag{6} \\
& \sum_{p/l \,:\, i \in p/l} x_{p/l} = 1 \quad \forall i \in \mathcal{BL} \tag{7}
\end{align}

Characterizing an Optimal Solution. In the algorithm, we search for solutions that can be represented as a subtree of the LCP tree structure, as described in the following:

Proposition 3.1: Given $\mathcal{BL}$ and $F_{\max}$, there exists an optimal solution of BLOCK-ALL that can be represented as a pruned subtree of LCP-tree($\mathcal{BL}$) with: the same root, up to $F_{\max}$ leaves, and non-leaf nodes having exactly two children.

Proof: We prove that every feasible solution of BLOCK-ALL can be reduced to another feasible solution that (i) corresponds to a subtree of LCP-tree($\mathcal{BL}$) as described in the proposition and (ii) has smaller or equal collateral damage. This is sufficient to prove Prop. 3.1, since an optimal solution is also a feasible one.

Clearly, every feasible solution $S$ of Eq. (5)-(7) can be represented as a pruned subtree of the binary tree of all possible IP prefixes, with the same root and with leaves being the prefixes used as filters. Assume that $S$ uses a prefix $\tilde{p}/\tilde{l}$ which is not in LCP-tree($\mathcal{BL}$). Then either $\tilde{p}/\tilde{l}$ does not contain any bad IPs, or one of its two branches does not. Indeed, if this were not the case, i.e. if there were at least one bad IP in both branches, then $\tilde{p}/\tilde{l}$ would be the longest common prefix of them, and as such it would be in LCP-tree($\mathcal{BL}$). If there are no bad IPs in prefix $\tilde{p}/\tilde{l}$, then we can safely remove the filter $\tilde{p}/\tilde{l}$, as it is not blocking any bad IPs.
Similarly, if bad IPs are concentrated only in one of the two branches, then we can move the filter from $\tilde{p}/\tilde{l}$ to its child that contains all bad IP(s). In both cases, we have constructed a new feasible solution, with smaller (or equal) collateral damage than the original solution. Iterating this process until all prefixes are in the LCP-tree shows that any feasible solution can be transformed into a feasible solution corresponding to a subtree of LCP-tree($\mathcal{BL}$), as described in the proposition and having smaller or equal collateral damage. Therefore, an optimal feasible solution can also be transformed to that form.

Finally, we note that every node of the subtree so constructed has two (or zero) child nodes. By contradiction, a set of filters that can be represented as a subtree of the LCP-tree with (at least) one node $p$ with exactly one child node corresponds to leaving unfiltered all bad IPs contained in the child node (prefix) of $p$ which is not selected in the subtree.³ This violates the constraint in Eq. (7), and thus corresponds to a non-feasible solution of problem BLOCK-ALL.

³ Note that in the LCP-tree every node/prefix contains at least one bad IP.

Algorithm. Algorithm 1, which solves BLOCK-ALL, consists of two main steps. First, we build the LCP-tree from the input blacklist. Second, in a bottom-up fashion, we compute $z_p(F)$ $\forall p, F$, i.e. the minimum collateral damage needed to block all malicious IPs in the subtree of prefix $p$ using at most $F$ filters. Following a dynamic programming (DP) formulation, we can find the optimal allocation of filters in the subtree rooted at prefix $p$ by finding a value $n$ and assigning $F - n$ filters to the left subtree and $n$ to the right subtree, so as to minimize the collateral damage. The fact that we need to filter all malicious addresses (leaves in the LCP tree) implies that at least one filter must be assigned to each of the left and right subtrees, i.e. $n = 1, 2, \dots, F - 1$.
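
The dynamic program described above can be sketched in Python as follows. This is an illustrative, unoptimized sketch rather than the authors' implementation: the name `block_all`, the `good_weights` dictionary (legitimate address to positive weight), and the `(p, l)` filter tuples are assumptions. The LCP tree is traversed implicitly by sorting the blacklist and splitting on the first differing bit, every table entry stores its full filter list, and $g_p$ is recomputed by scanning `good_weights`, so the running time is higher than the bound derived in the paper. The blacklist is assumed non-empty and `f_max >= 1`.

```python
from typing import Dict, List, Tuple

Prefix = Tuple[int, int]  # (p, l): value of the first l bits, and the length l


def block_all(blacklist: List[int], good_weights: Dict[int, float],
              f_max: int, bits: int = 32) -> Tuple[float, List[Prefix]]:
    """Return (minimum collateral damage, filters) blocking every bad address
    with at most f_max filters, using the recursion z_p(F) over the LCP tree."""
    bad = sorted(set(blacklist))

    def g(prefix: int, length: int) -> float:
        # Collateral damage g_p: total weight of good addresses inside the prefix.
        return sum(w for a, w in good_weights.items()
                   if (a >> (bits - length)) == prefix)

    def solve(lo: int, hi: int) -> list:
        # Returns a table z with z[F] = (cost, filters) for F = 1..f_max.
        if hi - lo == 1:                        # leaf: z_leaf(F) = 0 for all F >= 1
            leaf = (0.0, [(bad[lo], bits)])
            return [None] + [leaf] * f_max
        diff = bad[lo] ^ bad[hi - 1]
        length = bits - diff.bit_length()       # length of the longest common prefix
        prefix = bad[lo] >> (bits - length)
        pivot = bits - length - 1
        mid = next(k for k in range(lo, hi) if (bad[k] >> pivot) & 1)
        left, right = solve(lo, mid), solve(mid, hi)
        # Boundary condition: z_p(1) = g_p, i.e. filter the prefix p itself.
        z = [None, (g(prefix, length), [(prefix, length)])]
        for f in range(2, f_max + 1):
            best = None
            for n in range(1, f):               # n filters right, f - n filters left
                cost = left[f - n][0] + right[n][0]
                if best is None or cost < best[0]:
                    best = (cost, left[f - n][1] + right[n][1])
            z.append(best)
        return z

    return solve(0, len(bad))[f_max]


# Fig. 2 setting: 4-bit space, unit weight for every legitimate address (assumption).
bl = [0, 3, 4, 5, 7, 8, 10, 11, 12]
good = {a: 1.0 for a in (1, 2, 6, 9, 13, 14, 15)}
cost, filters = block_all(bl, good, f_max=4, bits=4)
```

In this toy setting, one can check by hand that four filters suffice to bring the collateral damage down to 3 (filters 0\*\*\*, 1000, 101\*, 1100), which is lower than the cost of 4 incurred by the feasible solution {0/2, 4/2, 8/2, 12/4} mentioned in Fig. 2.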

Algorithm 1 Algorithm for BLOCK-ALL
 1: build LCP-tree($\mathcal{BL}$)
 2: for all leaf nodes $leaf$ do
 3:     $z_{leaf}(F) = 0 \quad \forall F \in [1, F_{\max}]$
 4:     $X_{leaf}(F) = \{leaf\} \quad \forall F \in [1, F_{\max}]$
 5: end for
 6: $level$ = level($leaf$) - 1
 7: while $level \ge$ level($root$) do
 8:     for all nodes $p$ such that level($p$) == $level$ do
 9:         $z_p(1) = g_p$
10:         $X_p(1) = \{p\}$
11:         $z_p(F) = \min_{n=1,\dots,F-1} \{ z_{s_l}(F-n) + z_{s_r}(n) \} \quad \forall F \in [2, F_{\max}]$
12:         $X_p(F) = X_{s_l}(F-n) \cup X_{s_r}(n) \quad \forall F \in [2, F_{\max}]$
13:     end for
14:     $level = level - 1$
15: end while
16: return $z_{root}(F_{\max})$, $X_{root}(F_{\max})$

For every pair of sibling nodes, $s_l$ (left) and $s_r$ (right), with common parent node $p$, we have the DP recursive equation:

\begin{equation}
z_p(F) = \min_{n=1,\dots,F-1} \left\{ z_{s_l}(F-n) + z_{s_r}(n) \right\}, \quad F > 1 \tag{8}
\end{equation}

with boundary conditions for leaf and intermediate nodes:

\begin{equation}
z_{leaf}(F) = 0 \quad \forall F \ge 1, \qquad z_p(1) = g_p \quad \forall p \tag{9}
\end{equation}

Once we compute $z_p(F)$ for all prefixes in the LCP-tree, we simply read the value of the optimal solution, $z_{root}(F_{\max})$. We also use the variables $X_p(F)$ to keep track of the set of prefixes used in the optimal solution. In lines (4) and (10) of Algorithm 1, $X_p(F)$ is initialized to the single prefix used. In line (12), after computing the new cost, the corresponding set of prefixes is updated: $X_p(F) = X_{s_l}(F-n) \cup X_{s_r}(n)$, where $n$ is the value that attains the minimum in line (11).

Theorem 3.2: Alg. 1 computes the optimal solution of problem BLOCK-ALL: the prefixes that are contained in set $X_{root}(F_{\max})$ are the optimal $x_{p/l} = 1$ for Eq. (5)-(7).

Proof: Recall that $z_{root}(F_{\max})$ denotes the value of the optimal solution of BLOCK-ALL with $F_{\max}$ filters (i.e. the minimum amount of collateral damage), and $X_{root}(F_{\max})$ denotes the set of filters selected in the optimal solution. Let $s_l$ and $s_r$ denote the two child nodes (prefixes) of $root$ in the LCP-tree($\mathcal{BL}$).
Finding the optimal allocation of $F_{\max} > 1$ filters to block all bad IPs contained in $root$ (possibly the entire IP space) is equivalent to finding the optimal allocation of $x \ge 1$ filters to block all bad IPs in $s_l$, and $y \ge 1$ filters to block all bad IPs in $s_r$, such that $x + y = F_{\max}$. This is because prefixes $s_l$ and $s_r$ jointly contain all bad IPs. Moreover, both $s_l$ and $s_r$ contain at least one bad IP. Thus, at least one filter must be assigned to each of them. If $F_{\max} = 1$, i.e. there is only one filter available, the only feasible solution is to select $root$ as the prefix to filter out. The same argument recursively applies to descendant nodes, until either we reach a leaf node, or we have only one filter available. In these cases, the problem is trivially solved by the conditions in Eq. (9).

Complexity. Computing Eq. (8) for every node $p$ and for every $F \in [2, F_{\max}]$ involves $N(F_{\max} - 1)$ subproblems, one for every pair $(p, F)$, with complexity $F_{\max} - 1$ each. Computing $z_p(F)$ in Eq. (8) requires only the optimal solutions at the two sibling nodes below $p$, $z_{s_l}(F-n)$ and $z_{s_r}(n)$. Thus, proceeding from the leaves to the root, we can compute the optimal solution in $N(F_{\max} - 1)^2$ operations. This simple bound can be made tighter by observing that, at every node in the LCP-tree, we do not need to compute $z_p(F)$ for all values $F \le F_{\max}$, but only for $F \le \min\{|leaves(p)|, F_{\max}\}$, where $|leaves(p)|$ is the number of leaves under prefix $p$ in the LCP tree. Moreover, the complexity of computing every single entry $z_p(F)$ is $F$. Thus, the overall number of operations needed equals

\begin{equation}
\sum_{i \in Node} \frac{\Delta_i (\Delta_i + 1)}{2} \tag{10}
\end{equation}

where $\Delta_i = \min\{F_{\max}, |leaves(i)|\}$. Let $L_i$ denote the level of node $i$ in the LCP-tree, with the convention that we assign $L = 0$ to the root node. For every node $i$ such that $L_i \le \lfloor \log(N / F_{\max}) \rfloor$, $\Delta_i = F_{\max}$; otherwise, $\Delta_i = |leaves(i)| \le N / 2^{L_i}$, since the LCP-tree is a binary tree.
Thus, we have

\begin{align}
\sum_{L=0}^{\lceil \log N \rceil} \sum_{\substack{i \in Node \\ level(i) = L}} \frac{\Delta_i (\Delta_i + 1)}{2}
&\le \sum_{L=0}^{\lfloor \log(N/F_{\max}) \rfloor} \sum_{\substack{i \in Node \\ level(i) = L}} \frac{F_{\max}(F_{\max} + 1)}{2}
 + \sum_{L=\lfloor \log(N/F_{\max}) \rfloor + 1}^{\lceil \log N \rceil} \sum_{\substack{i \in Node \\ level(i) = L}} \frac{N}{2^{L_i}} \left( \frac{N}{2^{L_i}} + 1 \right) \notag \\
&= \sum_{L=0}^{\lfloor \log(N/F_{\max}) \rfloor} \sum_{\substack{i \in Node \\ level(i) = L}} \frac{F_{\max}(F_{\max} + 1)}{2}
 + \sum_{L=\lfloor \log(N/F_{\max}) \rfloor + 1}^{\lceil \log N \rceil} \sum_{\substack{i \in Node \\ level(i) = L}} \left( \frac{N^2}{2^{2 L_i}} + \frac{N}{2^{L_i}} \right) \notag \\
&\le \left( 2^{\lfloor \log(N/F_{\max}) \rfloor + 1} - 1 \right) \frac{F_{\max}(F_{\max} + 1)}{2}
 + \frac{F_{\max}}{2} \left( \frac{F_{\max}}{2} + 1 \right) \tag{11} \\
&\le N (F_{\max} + 1) + \frac{F_{\max}}{2} \left( \frac{F_{\max}}{2} + 1 \right) \notag
\end{align}

where Eq. (11) uses the fact that if $0 \le n_0 < n_1$, then $\sum_{h=n_0}^{n_1} \frac{1}{2^h} \le \frac{1}{2^{n_0 - 1}}$. Using this observation, the computation can be done in $O(N F_{\max})$, which is essentially $O(N)$, since $F_{\max} \ll N$ and $F_{\max}$ does not depend on $N$ but only on the TCAM size. Thus, the time complexity increases linearly with the number of malicious IPs $N$. This is the lowest achievable complexity, within a constant factor, since we need to read all $N$ malicious IPs at least once.

C. BLOCK-SOME

Goal. Given: (i) a blacklist of malicious addresses, (ii) a set of legitimate sources, (iii) weights assigned to all addresses, which express relative importance, and (iv) a limit on the number of filters $F_{\max}$; select some source address prefixes to block so as to minimize the total cost, including the cost of collateral damage and the benefit of blocking malicious addresses.

Formulation. This can be formulated within the general framework of Eq. (1)-(4), by assigning to good and bad addresses weights $w_i > 0$ and $w_i < 0$ respectively, to express their relative importance. The goal is to minimize the total cost, as in Eq. (1), which in this case includes both the collateral damage $g_{p/l}$ and the benefit $b_{p/l}$ of the malicious traffic that is blocked.

\begin{align}
\min \quad & \sum_{p/l} \left( g_{p/l} - b_{p/l} \right) x_{p/l} \tag{12} \\
\text{s.t.} \quad & \sum_{p/l} x_{p/l} \le F_{\max} \tag{13} \\
& \sum_{p/l \,:\, i \in p/l} x_{p/l} \le 1 \quad \forall i \in \mathcal{BL} \tag{14}
\end{align}

Another difference from BLOCK-ALL is Eq. (14), which dictates that every malicious source must be covered by at most one prefix, but does not necessarily have to be covered.

Characterizing an Optimal Solution. We can again leverage the structure of the LCP tree to characterize feasible and optimal solutions, with a proposition similar to Prop. 3.1. The difference from BLOCK-ALL is that, because some bad IPs can remain unfiltered, the pruned subtree corresponding to a feasible solution can now have nodes with a single descendant.

Proposition 3.3: Given $\mathcal{BL}$ and $F_{\max}$, there exists an optimal solution of BLOCK-SOME that can be represented as a pruned subtree of LCP-tree($\mathcal{BL}$) with the same root and up to $F_{\max}$ leaves.

Proof: In Prop. 3.1 we proved that any solution of Eq. (5)-(6) can be reduced to a (pruned) subtree of the LCP-tree with at most $F_{\max}$ leaves. Moreover, we note that the constraint in Eq. (14), which imposes the use of non-overlapping prefixes, is automatically satisfied by taking the leaves of the pruned subtree as the selected filters. This proves that any feasible solution of BLOCK-SOME can be transformed into a pruned subtree of the LCP-tree with at most $F_{\max}$ leaves, and thus so can an optimal solution.

Algorithm. The algorithm is similar to Algorithm 1 in that it uses the LCP-tree and a similar DP approach. The difference is that not all addresses need to be covered and, at each step, we can assign $n = 0$ filters to the left or right subtree, i.e. in line (11) of Algorithm 1: $n = 0, 1, \dots, F$. We can recursively compute the optimal solution as before:

\begin{equation}
z_p(F) = \min_{n=0,\dots,F} \left\{ z_{s_l}(F-n) + z_{s_r}(n) \right\} \tag{15}
\end{equation}

with boundary conditions for intermediate ($p$) and leaf nodes:

\begin{align}
z_p(0) &= 0 \quad \forall p \tag{16} \\
z_p(1) &= \min \left\{ g_p - b_p, \; \min_{n=0,1} \left\{ z_{s_l}(1-n) + z_{s_r}(n) \right\} \right\} \tag{17} \\
z_{leaf}(F) &= -b_{leaf} \quad \forall F \ge 1 \tag{18}
\end{align}

Complexity.
The analysis of BLOCK-ALL can be applied to this algorithm as well. The complexity turns out to be the same, i.e., it also increases linearly with $N$.

BLOCK-ALL vs. BLOCK-SOME. There is an interesting connection between the two problems. The latter can be regarded as an automatic way to select the best subset of $\mathcal{BL}$, in terms of the weights $w_i$, and run BLOCK-ALL only on that subset. The advantage is that we do not need to search for the optimal subset, which is automatically given by the final solution. In the extreme case where much more importance is given to the bad than to the good addresses, BLOCK-SOME degenerates to BLOCK-ALL.

D. TIME-VARYING BLOCK-ALL(SOME)

So far, we have considered the static problem of filtering a fixed set of source IP addresses. However, malicious source IPs appear, disappear, and reappear in a blacklist over time [9]. In this section, we consider the problem of filtering a dynamic set of source IPs, i.e., one that varies over time. This is equivalent to considering a different blacklist every time an IP is inserted into or deleted from the blacklist. Let $\{\mathcal{BL}_{T_0}, \mathcal{BL}_{T_1}, \ldots, \mathcal{BL}_{T_i}, \ldots\}$ denote the set of different blacklists sampled at times $T_0 < T_1 < \cdots < T_i < \ldots$, i.e., whenever a new IP is inserted in the blacklist or an old one is removed.

The trivial approach to the dynamic BLOCK-ALL problem is to run Alg.1 from scratch at every time instance. As noted, the computational complexity of Alg.1 is low: it grows linearly with the number of IP addresses in the blacklist, $N$. However, if the overlap between two successive blacklists is large enough, we can exploit the correlation between them to construct a more efficient scheme, which updates filters as needed while leaving most of them unchanged. More formally, consider the following problem.

Goal. Given a set of blacklists $\{\mathcal{BL}_{T_0}, \mathcal{BL}_{T_1}, \ldots, \mathcal{BL}_{T_i}, \ldots\}$ collected at different times, $T_0 < T_1 < \cdots < T_i < \ldots$, and $F_{\max}$ filters, find the sets of filtering rules $\{\mathcal{S}_{T_0}, \mathcal{S}_{T_1}, \ldots, \mathcal{S}_{T_i}, \ldots\}$ such that, for all $i = 0, 1, \ldots$, $\mathcal{S}_{T_i}$ solves BLOCK-ALL(SOME) for the input blacklist $\mathcal{BL}_{T_i}$.

Algorithm. As mentioned above, if there is little or no overlap between successive blacklists, the obvious solution to this problem is to run the BLOCK-ALL algorithm every time a new blacklist is provided. Otherwise, if only a few IPs are inserted or removed from one blacklist to the next, we can update all and only the filters affected by that change. For example, consider two blacklists, $\mathcal{BL}_{T_{i-1}}$ and $\mathcal{BL}_{T_i}$, which differ only in a single new IP inserted in $\mathcal{BL}_{T_i}$. Assume that $\mathcal{S}_{T_{i-1}}$, the solution to the BLOCK-ALL problem with blacklist $\mathcal{BL}_{T_{i-1}}$, has already been computed. We want an efficient algorithm that computes $\mathcal{S}_{T_i}$.

Fig. 3. As an example, assume a 6-bit IP space instead of the usual 32 bits. A new IP, corresponding to 37 in decimal notation, is inserted in the blacklist made up of the IPs 3, 10, 15, 17, 22, 31, 32, 33, 57, 58. Its insertion requires that all and only its predecessor nodes in the LCP-tree be updated according to Eq.(8) (or Eq.(15) if we are running BLOCK-SOME). Moreover, a new node, in gray, is created to denote the longest common prefix between 37 and 32 (or 33). Note that all other nodes corresponding to the longest common prefixes between 37 and other IPs in the blacklist are already in the initial LCP-tree.

Basically, there are two separate cases, depending on whether or not the new IP is covered by some prefix that is already filtered in $\mathcal{S}_{T_{i-1}}$. If it is, no further action is needed, and $\mathcal{S}_{T_i} = \mathcal{S}_{T_{i-1}}$. Otherwise, we need to modify the filters to also cover the new IP. An efficient way to do so is illustrated in Fig.3.
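The Fig. 3 example can be checked numerically. The Python sketch below is illustrative only: it represents an LCP-tree node as a (prefix value, prefix length) pair, and the helper names `lcp`, `internal_nodes`, and `predecessors` are hypothetical, not taken from the paper. It assumes that the intermediate nodes of the LCP-tree are exactly the longest common prefixes of pairs of blacklisted addresses, and it enumerates all pairs for clarity rather than efficiency.

```python
from itertools import combinations

BITS = 6  # toy address width from Fig. 3; IPv4 would use 32


def lcp(a: int, b: int, bits: int = BITS) -> tuple:
    """Longest common prefix of two addresses, as (prefix_value, prefix_length)."""
    length = bits - (a ^ b).bit_length()  # number of leading bits on which a and b agree
    return (a >> (bits - length), length)


def internal_nodes(ips, bits: int = BITS) -> set:
    """Intermediate LCP-tree nodes: the LCPs of all pairs of distinct addresses."""
    return {lcp(a, b, bits) for a, b in combinations(ips, 2)}


def predecessors(ip: int, nodes: set, bits: int = BITS) -> list:
    """Intermediate nodes whose prefix covers `ip`, ordered from the root down."""
    covering = [n for n in nodes if ip >> (bits - n[1]) == n[0]]
    return sorted(covering, key=lambda n: n[1])


blacklist = [3, 10, 15, 17, 22, 31, 32, 33, 57, 58]
new_ip = 37

before = internal_nodes(blacklist)
after = internal_nodes(blacklist + [new_ip])

new_nodes = after - before           # intermediate nodes created by the insertion
path = predecessors(new_ip, after)   # nodes whose z_p(f) would be recomputed

print("new intermediate nodes:", new_nodes)
print("predecessors of the new IP:", path)
```

Under these assumptions, the only new intermediate node is the 3-bit prefix 100 (value 4, length 3), which is the longest common prefix of 37 with both 32 and 33, and the nodes to update are the root, the prefix 1, and the prefix 100, i.e., the path from the new leaf up to the root.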
When a new IP appears in the blacklist, only one intermediate node needs to be added to the LCP-tree: the one corresponding to the longest common prefix between the new IP and its "closest" IP already in the blacklist (gray node in Fig.3). As learned in the previous sections, an optimal allocation of $f$ filters at prefix $p/l$ depends only on how these $f$ filters are allocated to the children of $p$ in the LCP-tree. Thus, the insertion of a new IP in the blacklist requires only the re-computation of $z_p(f)$ and $X_p(f)$, $\forall f$, through Eq.(8), for all and only the predecessors of the new node in the LCP-tree (nodes along the dashed path in Fig.3). Multiple insertions can be handled by repeating the above procedure for every insertion. Removals (i.e., IPs that are removed from a blacklist) are handled similarly: when removing an IP, we also remove its parent node, since it stops being the longest common prefix of two IPs, and we update all other predecessor nodes according to Eq.(8).

We note that since any LCP-tree is also a binary tree, there are at most $\log(N)$ predecessors of any leaf node; thus, the above procedure requires $O(\log(N) F_{\max})$ operations. This update scheme is more efficient than running Alg.1 from scratch if and only if the number of insert/remove operations needed to obtain $\mathcal{BL}_{T_i}$ from the previous blacklist, $\mathcal{BL}_{T_{i-1}}$, is less than $\frac{N}{\log N}$. Otherwise, it is less expensive to simply run Alg.1 on the new blacklist, $\mathcal{BL}_{T_i}$.

Finally, we note that we can use the same approach to solve the dynamic BLOCK-SOME problem. In that case as well, arrivals/departures of malicious addresses can be handled by insertions/deletions in the LCP-tree; the filters should be updated accordingly, so that they provide an optimal solution to the static BLOCK-SOME problem for the input blacklist at every time.

E. FLOODING

Goal.
Given: (i) a blacklist of malicious addresses, (ii) a set of legitimate sources, (iii) the amount of traffic that each source generates, (iv) a limit on the number of filters $F_{\max}$, and (v) a constraint on the link capacity (bandwidth) $C$; select some source address prefixes to block so as to minimize the collateral damage and make the total traffic fit within the link capacity.

Formulation.

$$\min \sum_{p/l} g_{p/l} \, x_{p/l} \tag{19}$$

$$\text{s.t.} \quad \sum_{p/l} x_{p/l} \le F_{\max} \tag{20}$$

$$\sum_{p/l} \left( g_{p/l} + b_{p/l} \right) (1 - x_{p/l}) \le C \tag{21}$$

$$\sum_{p/l \,:\, i \in p/l} x_{p/l} \le 1 \quad \forall i \in \mathcal{BL} \tag{22}$$

where $g_{p/l}$ and $b_{p/l}$ denote the amount of good and bad traffic from prefix $p/l$, respectively. Eq.(22) indicates that we are interested in blocking some, not all, malicious sources, and that we should not use overlapping prefixes. Before the attack, the good traffic could fit within the capacity; after flooding, the total traffic, $t_0 = \sum_{p/l} \left( g_{p/l} + b_{p/l} \right)$, exceeds the capacity. Eq.(21) says that the total traffic that remains unblocked after filtering should fit within the link capacity $C$.

Characterizing an Optimal Solution. We use the LCP-tree for all addresses in $\mathcal{BL} \cup \mathcal{G}$. Furthermore, to account for Eq.(21), we assign a cost $t_p$ to every node in the LCP-tree, representing the total traffic generated by prefix $p/l$: $t_p = g_p + b_p$.

Proposition 3.4: Given $\mathcal{BL}$, $\mathcal{G}$, $F_{\max}$, and $C$, there exists an optimal solution of the FLOODING problem that can be represented as a pruned subtree of LCP-tree($\mathcal{BL} \cup \mathcal{G}$), with the same root, up to $F_{\max}$ leaves, and such that the total cost of the leaves is $\ge t_0 - C$.

Proof: The proof follows the same lines as that of Prop.3.1. It can be shown that every feasible solution $\mathcal{S}$ of FLOODING can be mapped to another feasible solution $\mathcal{S}'$, which (i) corresponds to a subtree of LCP-tree($\mathcal{BL} \cup \mathcal{G}$) as described in Prop.3.4, and (ii) whose collateral damage is smaller than or equal to the collateral damage of $\mathcal{S}$. To see this, assume $\mathcal{S}$ uses a prefix $\tilde{p}/\tilde{l}$ that is not in LCP-tree($\mathcal{BL} \cup \mathcal{G}$).
There cannot be good or bad sources in both of the two sibling prefixes $\tilde{p}/(\tilde{l} + 1)$. If this were the case, $\tilde{p}/\tilde{l}$ would be their longest common prefix, and consequently it would appear in LCP-tree($\mathcal{BL} \cup \mathcal{G}$). Thus, there are two cases: if $\tilde{p}/\tilde{l}$ does not include any good or bad source, we can simply remove it; otherwise, we can filter only the branch that has some sources. Since the removed branch does not have active sources, the obtained solution is still feasible and the overall collateral damage is not increased (we are filtering a subset of what was already filtered). Iterating this process until all prefixes are in LCP-tree($\mathcal{BL} \cup \mathcal{G}$) proves that any feasible solution can be interpreted as a subtree of the LCP-tree, where the leaves are the actual filters used. Thus, an optimal feasible solution can also be represented in this way. Finally, in order for the allowed traffic to fit within the capacity $C$, the filtered traffic, represented by the sum of the costs $t_p$ at the subtree leaves, must be greater than or equal to $t_0 - C$.

Algorithm. FLOODING is a 2-dimensional knapsack problem (2KP), with an additional constraint, Eq.(22), that makes it harder. 2KP is a "very hard" problem: not only is it NP-hard, but a fully polynomial-time approximation scheme for it is also unlikely to exist, since its existence would imply that $P = NP$ [6]. For FLOODING we obtain the following hardness result:

Theorem 3.5: The optimization problem FLOODING, in Eq.(19)-(22), is NP-hard.

Proof: It is obvious that FLOODING is in NP. To prove that it is also NP-hard, we consider the KP problem with a cardinality constraint:

$$\max \sum_{i \in I} p_i x_i, \quad \text{s.t.} \quad \sum_{i \in I} w_i x_i \le C_1 \quad \text{and} \quad \sum_{i \in I} x_i = k \tag{23}$$

which is known to be NP-hard [4], and we show that it reduces to FLOODING.
First, note that any solution of FLOODING that uses $F < F_{\max}$ filters can be transformed into another feasible solution with exactly $F_{\max}$ filters, without increasing the collateral damage.$^{4}$ Therefore, the inequality in Eq.(20) can be replaced by an equality without affecting the collateral damage of the optimal solution. Second, we define $\bar{x}_{p/l} = 1 - x_{p/l}$ and $\bar{F}_{\max} = \left( \sum_{p/l} 1 \right) - F_{\max}$, and we rewrite the above problem:

$$\max \sum_{p/l} g_{p/l} \, \bar{x}_{p/l} \quad \text{s.t.} \quad \sum_{p/l} \bar{x}_{p/l} = \bar{F}_{\max}, \tag{24}$$

$$\sum_{p/l} \left( g_{p/l} + b_{p/l} \right) \bar{x}_{p/l} \le C, \qquad \sum_{p/l \,:\, i \in p/l} \bar{x}_{p/l} \le 1 \quad \forall i \in \mathcal{BL} \tag{25}$$

For a given instance of Problem (23), we construct an equivalent instance of Problem (24)-(25) by introducing the following mapping. For $i = 1, \ldots, N$: $-\bar{g}_{ii} = p_i$, $(g_{ii} + b_{ii}) = w_i$. For $p/l$ that is not in the blacklist: $\bar{g}_{p/l} = 0$ and $(g_{p/l} + b_{p/l}) = C + 1$. Moreover, we assign $\bar{F}_{\max} = k$ and $C = C_1$. With this assignment, a solution to the KP problem (23) can be obtained by solving FLOODING and then taking the values of the variables $x_{p/l}$ such that $p/l$ is in the blacklist.

Therefore, we do not look for a polynomial-time algorithm. Instead, we designed a pseudo-polynomial time algorithm that optimally solves FLOODING, and whose complexity grows linearly with the number of active sources (either good or bad). Let $z_p(F, c)$ be the minimum collateral damage when solving the FLOODING problem with $F$ filters and capacity $c$:

$$z_p(F, c) = \min_{\substack{n = 0, \ldots, F \\ m = 0, \ldots, c}} \left\{ z_{s_l}(F - n, c - m) + z_{s_r}(n, m) \right\} \tag{26}$$

$^{4}$ This can be proved using the LCP-tree structure. Given a solution $\mathcal{S}$ with $F < F_{\max}$ filters, (as long as $F < N$) there always exists a filter that can be replaced by two filters, corresponding to its children. The solution constructed in this way has $F + 1$ filters, keeps blocking all IPs blocked in $\mathcal{S}$, and has a value less than or equal to the value of $\mathcal{S}$.

Complexity. The DP approach computes $O(C F_{\max})$ entries for every node.
Moreover, the computation of a single entry, given the entries of the descendant nodes, requires $O(C F_{\max})$ operations, Eq.(26). We can again leverage the observation that we do not need to compute $C F_{\max}$ entries for all nodes in the LCP-tree. At a node $p$, it is sufficient to compute Eq.(26) only for $c = 0, \ldots, \tilde{C}$, with $\tilde{C} = \min\{C, \sum_{i \in p/l} w_i\} \le C$, and $f = 0, \ldots, \tilde{F}$. Therefore, the optimal solution to FLOODING, $z_{\mathrm{root}}(F_{\max}, C)$, can be computed in $O((N + |\mathcal{G}|) C^2)$ time. The algorithm has pseudo-polynomial complexity, since it is polynomial in $C$, which cannot be bounded by the input length. More importantly, its complexity increases linearly with the number of IP sources in $\mathcal{BL} \cup \mathcal{G}$.

FLOODING vs. BLOCK-SOME. To see the connection between FLOODING and BLOCK-SOME, let us consider a partial Lagrangian relaxation of (19)-(22):

$$\max_{\lambda \ge 0} \left\{ \min \sum_{p/l} \left[ (1 - \lambda) g_{p/l} - \lambda b_{p/l} \right] x_{p/l} + \sum_{p/l} \lambda \left( g_{p/l} + b_{p/l} \right) - \lambda C \right\} \tag{27}$$

$$\text{s.t.} \quad \sum_{p/l} x_{p/l} \le F_{\max} \tag{28}$$

$$\sum_{p/l \,:\, i \in p/l} x_{p/l} \le 1 \quad \forall i \in \mathcal{BL} \tag{29}$$

For every fixed $\lambda \ge 0$, problem (27)-(29) is equivalent to BLOCK-SOME, Eq.(12)-(14), for a specific assignment of the weights $w_i$. This shows that dual feasible solutions of FLOODING are instances of BLOCK-SOME for a particular assignment of weights. The dual problem, in the variable $\lambda$, aims exactly at tuning the Lagrangian multiplier to find the best assignment of weights.$^{5}$

F. DIST(RIBUTED)-FLOODING

Goal. Consider a victim $V$ that connects to the Internet through its ISP and is flooded by a set of attackers (listed in a blacklist $\mathcal{BL}$), as in Fig.1(a). To reach the victim, attack traffic has to pass through one or more ISP routers; let $\mathcal{R}$ be the set of unique routers from some attacker to the victim.
Let each router $u \in \mathcal{R}$ have capacity $C^{(u)}$ on the downstream link (towards $V$) and a limited number of filters $F^{(u)}_{\max}$. We assume that the volume of good/bad traffic through every router is known. Our goal is to allocate filters across all routers, in a distributed way, so as to minimize the total collateral damage and avoid congestion on all links of the ISP network.

$^{5}$ Problem (27)-(29) can be solved in a standard way with a projected subgradient method [4]:

$$x^{(k)}_{p/l} = x^{*}_{p/l}(\lambda^{(k)}), \quad \forall p, l \tag{30}$$

$$\lambda^{(k+1)} = \left[ \lambda^{(k)} + \alpha_k \left( \sum_{p/l} \left( g_{p/l} + b_{p/l} \right) \left( 1 - x^{(k)}_{p/l} \right) - C \right) \right]_{+} \tag{31}$$

where $x^{(k)}_{p/l}$ is the $k$-th iterate, $x^{*}_{p/l}(\lambda^{(k)})$ is the optimal solution of (27)-(29) for $\lambda = \lambda^{(k)}$, $\alpha_k > 0$ is the $k$-th step size, and $[\cdot]_{+}$ indicates the projection onto the set of non-negative numbers.

Formulation. Let the variables $x^{(u)}_{p/l} \in \{0, 1\}$ indicate whether or not filter $p/l$ is used at router $u$. Then the distributed filtering problem can be stated as:

$$\min \sum_{u \in \mathcal{R}} \sum_{p/l} g^{(u)}_{p/l} \, x^{(u)}_{p/l} \tag{32}$$

$$\text{s.t.} \quad \sum_{p/l} x^{(u)}_{p/l} \le F^{(u)}_{\max} \quad \forall u \in \mathcal{R} \tag{33}$$

$$\sum_{p/l} \left( g^{(u)}_{p/l} + b^{(u)}_{p/l} \right) \left( 1 - x^{(u)}_{p/l} \right) \le C^{(u)} \quad \forall u \in \mathcal{R} \tag{34}$$

$$\sum_{u \in \mathcal{R}} \sum_{p/l \,\ni\, i} x^{(u)}_{p/l} \le 1 \quad \forall i \in \mathcal{BL} \tag{35}$$

Characterizing an Optimal Solution. Given the sets $\mathcal{BL}$, $\mathcal{G}$, $\mathcal{R}$, and $F^{(u)}_{\max}$, $C^{(u)}$ at each router, we have:

Proposition 3.6: There exists an optimal solution of DIST-FLOODING that can be represented as a set of $|\mathcal{R}|$ different pruned subtrees of LCP-tree($\mathcal{BL} \cup \mathcal{G}$), each corresponding to a feasible solution of FLOODING for the same input, and such that every subtree leaf is not a node of another subtree.

Proof. Feasible solutions of DIST-FLOODING allocate filters on different routers such that Eq.(33) and (34) are satisfied independently at every router.
In the LCP-tree, this means having $|\mathcal{R}|$ subtrees, one for every router, each having at most $F^{(u)}_{\max}$ leaves and associated blocked traffic $\ge t^{(u)}_0 - C^{(u)}$, where $t^{(u)}_0$ is the total incoming traffic at router $u$. Each subtree on its own can be thought of as a feasible solution of a FLOODING problem. Eq.(35) ensures that the same address is not filtered multiple times at different routers, to avoid a redundant waste of filters. In the LCP-tree, this translates into every leaf of the different subtrees appearing in at most one subtree. $\square$

Algorithm. Constraint (35), which imposes that different routers do not block the same prefixes, prevents us from directly decomposing the problem. To decouple the problem, consider the following partial Lagrangian relaxation:

$$L(x, \lambda) = \sum_{u \in \mathcal{R}} \sum_{p/l} g^{(u)}_{p/l} \, x^{(u)}_{p/l} + \sum_{i \in \mathcal{BL}} \lambda_i \left( \sum_{u \in \mathcal{R}} \sum_{p/l \,\ni\, i} x^{(u)}_{p/l} - 1 \right) = \sum_{u \in \mathcal{R}} \left( \sum_{p/l} \left( g^{(u)}_{p/l} + \lambda_{p/l} \right) x^{(u)}_{p/l} \right) - \sum_{i \in \mathcal{BL}} \lambda_i \tag{36}$$

where $\lambda_i$ is the Lagrangian multiplier (price) for the constraint in Eq.(35), and $\lambda_{p/l} = \sum_{i \in p/l} \lambda_i$ is the price associated with prefix $p/l$. With this relaxation, both the objective function and the other constraints immediately decompose into $|\mathcal{R}|$ independent sub-problems, one per router $u$:

$$\min \sum_{p/l} \left( g^{(u)}_{p/l} + \lambda_{p/l} \right) x^{(u)}_{p/l} \tag{37}$$

$$\text{s.t.} \quad \sum_{p/l} x^{(u)}_{p/l} \le F^{(u)}_{\max} \tag{38}$$

$$\sum_{p/l} \left( g^{(u)}_{p/l} + b^{(u)}_{p/l} \right) \left( 1 - x^{(u)}_{p/l} \right) \le C^{(u)} \tag{39}$$

The dual problem is:

$$\max_{\lambda_i \ge 0} \ \sum_{u \in \mathcal{R}} h_u(\lambda) - \sum_{i \in \mathcal{BL}} \lambda_i \tag{40}$$

where $h_u(\lambda)$ is the optimal value of (37)-(39) for a given $\lambda$. Given the prices $\lambda_i$, every sub-problem (37)-(39) can be solved independently and optimally by router $u$ using, e.g., Eq.(26). Problem (40) can be solved using a projected subgradient method, similarly to Eq.(30)-(31), as discussed in [4]. Note, however, that since $x \in \{0, 1\}$, solving the dual problem is not always guaranteed to converge to a primal feasible solution [7], [8].

Distributed vs. Centralized Solution.
The above formulation lends itself naturally to a distributed implementation. Each router only needs to solve its own subproblem (37)-(39), independently of the others. A single machine (e.g., the victim's gateway or a dedicated node) should solve the master problem (40) to iteratively find the prices that coordinate all subproblems. Thus, at every iteration of the subgradient method, the new $\lambda_i$'s need to be broadcast to all routers. Given the $\lambda_i$'s, the routers each solve their sub-problem independently and return the computed $x^{(u)}_{p/l}$ to the node in charge of the master problem. Even in a centralized setting, our distributed scheme is efficient because it lends itself to parallel computation of Eq.(32)-(34).

IV. PRACTICAL EVALUATION

The focus of this paper is the design of optimal and computationally efficient algorithms for a variety of filter selection problems. In this section, we use real blacklists to demonstrate that filter optimization brings significant gains in practice. The reason is that, in practice, malicious sources appear clustered in the IP address space, a feature that is exploited by our algorithms. Due to lack of space, the simulations presented in this section are not exhaustive. However, they demonstrate the above point as well as some of the structural properties of the solution for BLOCK-ALL and BLOCK-SOME, which are at the heart of this framework. As discussed in Section III, FLOODING is essentially an instance of BLOCK-SOME for a particular assignment of weights, and DIST-FLOODING consists of several FLOODING problems.

A. Simulation Setup

We analyzed 61 days of traces from Dshield.org [3], a repository of firewall and intrusion detection logs from about 2,000 different organizations. The dataset includes 758,698,491 attack reports, from 32,950,391 different IP sources.
Each report includes, among other things, the malicious source IP and the victim's destination IP. By studying these logs, we verified that malicious sources are clustered in a few prefixes, rather than uniformly distributed over the IP space, which has also been observed by others [9]. This is an important observation in practice, because clustering in a blacklist means that a small number of filters is sufficient to block most malicious IPs at low collateral damage.

[Figure 4: plot of CD/N (y-axis) vs. Fmax (x-axis, logarithmic scale). Curves: High Clustering - kMeans, Low Clustering - optimal, High Clustering - optimal.]

Fig. 4. BLOCK-ALL: collateral damage (CD), normalized over the number of malicious sources $N$, vs. number of filters $F_{\max}$. We compare Algorithm 1 to K-means. (In particular, we simulated Lloyd's heuristic for K-means, which is NP-hard; we ran 50 runs to avoid local minima.) We also run Algorithm 1 on two traces, those with the highest and lowest degree of clustering.

We looked at each victim (individual IP destination) in the dataset; the set of sources attacking each victim is a blacklist for our simulations. This "view" varies considerably among victims. We also generated good traffic according to a realistic scenario: a domain hosting 20 servers, each server with an average rate of 1,000 incoming good connections per second, each connection generating 5 KB of traffic. We generated the good IP addresses according to the multifractal distribution in [10].

B. Simulation Results

BLOCK-ALL. In Fig.4, we chose two different victims, each attacked by a large number (up to 100,000) of malicious IPs in a single day. We picked these particular two because they have the highest and the lowest degree of attack source clustering observed in the entire dataset. We ran Algorithm 1 on these two blacklists and made several observations.
First, the optimal algorithm performs significantly better than a generic clustering algorithm that does not exploit the structure of IP prefixes. In particular, it reduces the collateral damage (CD) by up to 85% compared to K-means, when run on the same (high-clustering) blacklist. Second, as expected, the degree of clustering in a blacklist matters. The CD is lowest (highest) in the blacklist with the highest (lowest) degree of clustering. Results obtained for other victim destinations and days were similar and lay in between the two extremes. A few thousand filters were sufficient to significantly reduce collateral damage (CD) in all cases.

BLOCK-SOME. In Fig.5, we focus on the blacklist with the least clustering and thus the highest CD (dashed line in Fig.4). In this worst-case scenario, an alternative to BLOCK-ALL is BLOCK-SOME, which allows the operator to trade off lower CD for unblocked bad IPs (UBIP) by appropriately tuning the weights. For simplicity, in Fig.5, we assigned the same weights $w_g$ and $w_b$ to all good and bad sources, respectively; however, the framework has the flexibility to assign weights to specific IPs.

[Figure 5: three panels plotted against Fmax (x-axis, logarithmic scale), each with curves Low W and High W. Panel y-axes: CD/N; # unfiltered bad IPs/N; Tot. Cost.]

Fig. 5. BLOCK-SOME. (a) Collateral damage (CD). (b) Number of unblocked bad IPs (UBIP). (c) Total cost ($CD - W \cdot UBIP$). The operator expresses relative tolerance to UBIP vs. CD by tuning the weight $W = w_b / w_g$. We considered a higher ($2^{14}$) and a lower ($2^{10}$) value of $W$.

In Fig.5(a), the CD is always smaller than the corresponding CD in Fig.4; they become equal only when we block all bad IPs. In Fig.5(b), we see that BLOCK-SOME reduces the CD by 60% compared to BLOCK-ALL while leaving unblocked only 10% of bad IPs and using only a few hundred filters. In Fig.5(c), the total cost decreases as $F_{\max}$ increases.
As defined in Eq.(12), this is the weighted sum of CD and UBIP. However, the behavior of these two competing factors is more complicated and depends strongly on the input blacklist. In the data we analyzed, we observed that CD tends to first increase and then decrease with $F_{\max}$, while UBIP tends to decrease.$^{6}$ The ratio $w_b / w_g$ captures the effort made by BLOCK-SOME to block all bad IPs and become similar to BLOCK-ALL.$^{7}$

$^{6}$ We can explain this as follows. When a new filter is available, the new optimal solution can be constructed by (i) blocking a new cluster of bad IPs, (ii) splitting a blocked cluster into two filters, or (iii) a combination of (i), (ii), and merging of existing filters. For small $F_{\max}$, option (i) is dominant: the inherent clustering makes it possible to find a cluster that is not blocked yet; this increases CD and reduces UBIP. When this is not possible, option (ii) becomes dominant: CD decreases and UBIP remains constant or decreases slowly.

$^{7}$ Since we picked a ratio $w_b / w_g > 1$, bad IPs are more important. When $F_{\max}$ is high, the algorithm first tries to cover small clusters or single bad IPs. In the case of high $W$, this happens around 10,000 filters: CD remains almost constant in this phase, at the end of which all bad IPs are filtered (as in Fig.5(b)). In the final phase, the algorithm releases single good IPs, which are less important, and all bad IPs are blocked, similarly to BLOCK-ALL.

V. OUR WORK IN PERSPECTIVE

A. The Bigger Picture of Defense against Malicious Traffic

Dealing with malicious traffic is a hard problem that requires the cooperation of several components. In this paper, we did not propose a novel solution; instead, we optimized the use of filtering, a mechanism that already exists on the Internet today and is a necessary building block of any bigger solution. We focused on the optimal construction of filtering rules, which can then be installed and propagated by filtering protocols [11], [12].
We rely on a detection module, e.g., an intrusion detection system or historical data, to distinguish good from bad traffic and provide us with a blacklist. Detection is a difficult problem, but it is orthogonal to the contribution of this paper. The sources of legitimate traffic are also assumed known, for estimating the collateral damage.

Finally, we consider addresses in the blacklist to be true and not spoofed. This is reasonable today, when attackers have the luxury of using botnets and controlling a huge number of infected hosts for a short period of time, so that they do not even need to use spoofing. In 2005, less than 20% of addresses were spoofable [13], while in 2008, only 7% of addresses in Dshield logs were found likely to be spoofed [9]. Even if there is some amount of spoofed traffic, our algorithms treat it like the rest of the malicious traffic and weigh the cost vs. the benefit of blocking a source prefix (which may include both malicious spoofed and legitimate traffic). Looking into the future, there are also a number of proposals promising to enforce source accountability, including ingress filtering [14], self-certifying addresses [15], and packet passports [16]. To the extent that spoofing interferes with the ability to define blacklists, our algorithms work best together with an anti-spoofing mechanism, but they also do the best that can be done today without it.

A practical deployment scenario is that of a single network under the same administrative authority, such as an ISP or campus network. The operator can use our algorithms to create filtering rules, at a single router or at several routers, in order to optimize the use of its own resources and defend against an attack in a cost-efficient way. Our distributed algorithm may also prove useful, not only for a distributed protocol among routers within the same ISP, but also in the future, when different ISPs start cooperating against common enemies.
In a different context, our algorithms may also be applicable to configuring firewall rules to protect public-access networks, such as university campus networks or web-hosting networks; although firewalls are implemented in software, there is still an incentive to minimize the number of their rules for performance reasons.

The following papers are related to our work. In [17], source filtering via ACLs was studied against DDoS attacks; however, the filters were heuristically selected and the approach was entirely simulation-based. There is a body of work on firewall rule configuration [18], which focuses on management and misconfigurations, not on resource allocation. Furthermore, that work considers firewalls for enterprises, which are not supposed to be accessed from outside and thus can be protected without filtering rules. In our workshop paper [19], we also studied optimal source-based filtering by aggregating source addresses into continuous ranges (of numbers in $[0, 2^{32} - 1]$), not prefixes. This was an easier problem that allowed for greedy solutions. Unfortunately, ranges are not implementable in ACLs; furthermore, it is well known that ranges cannot be efficiently approximated by a combination of prefixes [5]. Therefore, despite the intuition we gained in [19], we had to solve the problem of prefix filtering from scratch in this paper.

B. Relation to Knapsack Problems

Optimal filter selection belongs to the family of multidimensional knapsack problems (dKP) [4]. The general dKP problem is well known to be NP-hard. The most relevant variation to us is the knapsack with cardinality constraint (1.5KP) [21], [22], which has $d = 2$ constraints, one of them being a limit on the number of items: $\sum_{j \in N} w_j x_j \le C$, $\sum_{j \in N} x_j \le k$. The 1.5KP problem is also NP-hard. These classic problems do not consider correlation between items.
However, in filtering, the selection of an item (prefix) rules out the selection of other items (all overlapping prefixes). dKP problems with correlation between items have been studied in [23], [24], where the items were partitioned into classes and up to one item per class was picked. In our case, a class is the set of all prefixes covering a certain address. Each item (prefix) can belong simultaneously to any number of classes, from one class (/32 address) to all classes (/0 prefix). To the best of our knowledge, we are the first to tackle the case where the items belong to classes that are not a partition of the set of items.

Finally, continuous relaxations do not help. Allowing $x_{p/l}$ to be fractional corresponds to rate-limiting of prefix $p/l$. However, there is no advantage either from a practical point of view (rate limiters are more expensive than ACLs because, in addition to looking up packets in TCAM, they also require rate and computation on the fast path) or from a theoretical point of view (the continuous 1.5KP is still NP-hard [25]).

In summary, the special structure of filtering problems, i.e., the hierarchy and overlap of candidate prefixes, leads to novel variations of dKP that could not be solved by existing methods.

VI. CONCLUSION

In this paper, we introduced a formal framework to study filtering problems. The framework is rooted in the theory of the knapsack problem and provides a novel extension of it. Within it, we formulated five practical problems, presented in increasing order of complexity. For each problem, we designed optimal algorithms that are also low-complexity (linear in the input size) in practical scenarios. We also highlighted connections between different problems: at the heart of all problems lies BLOCK-SOME; BLOCK-ALL and FLOODING are special instances for specific assignments of weights, and DIST-FLOODING decomposes into several independent FLOODING problems.
Finally, we ran simulations using Dshield traces; a key insight was that our algorithms can exploit the spatial clustering that is inherent in real blacklists.

There are several directions for future work. We plan to extend the framework to dynamically update the filtering rules as blacklists change over time, combine source-based with destination-based filtering, deal with adversarial scenarios, and study the interaction between filtering and detection mechanisms. We will also provide a more extensive experimental evaluation, which is not the focus of this paper.

REFERENCES

[1] Cisco Systems, White Paper, “Understanding ACL on Catalyst 6500 Series Switches,” www.cisco.com/en/US/products/hw/switches/ps708/products_white_paper09186a00800c9470.shtml.
[2] Cisco Systems, White Paper, “Protecting Your Core: Infrastructure Protection Access Control Lists,” http://www.cisco.com/en/US/tech/tk648/tk361/technologies_white_paper09186a00801a1a55.shtml.
[3] Dshield, http://www.dshield.org/.
[4] H. Kellerer, U. Pferschy, D. Pisinger, “Knapsack Problems,” Springer, 2004.
[5] G. Varghese, “Network Algorithmics,” Morgan Kaufmann, 2005.
[6] G.V. Gens and E.V. Levner, “Computational complexity of approximation algorithms for combinatorial problems,” in Mathematical Foundations of Computer Science, LNCS, vol. 74, pp. 292-300, Springer, 1979.
[7] D.P. Bertsekas, “Nonlinear Programming,” Athena Scientific, 2003.
[8] S. Boyd, L. Vandenberghe, “Convex Optimization,” Cambridge University Press, 2004.
[9] Z. Chen, C. Ji, P. Barford, “Spatial-Temporal Characteristics of Internet Malicious Sources,” in IEEE INFOCOM (Mini-Conf.), April 2008.
[10] E. Kohler, J. Li, V. Paxson, and S. Shenker, “Observed structure of addresses in IP traffic,” IEEE/ACM ToN 14(6), pp. 1207-1218, Dec. 2006.
[11] K. Argyraki, D.R. Cheriton, “Active Internet Traffic Filtering: Real-time Response to DoS Attacks,” in Proc. of USENIX Security 2005.
[12] X. Liu, X. Yang, Y. Lu, “To Filter or to Authorize: Network-Layer DoS Defense Against Multimillion-node Botnets,” in ACM SIGCOMM 2008.
[13] R. Beverly, S. Bauer, “The Spoofer Project: Inferring the Extent of Internet Source Address Filtering on the Internet,” in SRUTI 2005.
[14] P. Ferguson, D. Senie, “Network Ingress Filtering: Defeating DoS Attacks which employ IP Source Address Spoofing,” RFC 2827, May 2000.
[15] D. Andersen, H. Balakrishnan, N. Feamster, T. Koponen, D. Moon, S. Shenker, “Accountable Internet Protocol (AIP),” in ACM SIGCOMM 2008, Seattle, Aug. 2008.
[16] X. Liu, A. Li, X. Yang, D. Wetherall, “Passport: Secure and Adoptable Source Authentication,” in Proc. USENIX/ACM NSDI 2008.
[17] G. Pack, J. Yoon, E. Collins, C. Estan, “On Filtering of DDoS Attacks Based on Source Address Prefixes,” in Proc. of SecureComm, Aug. 2006.
[18] E. Al-Shaer et al., “Firewall Policy Advisor Project,” DePaul Univ., http://www.mnlab.cs.depaul.edu/projects/SPA/.
[19] F. Soldo, K. El Defrawy, A. Markopoulou, B. Krishnamurthy, and K. v.d. Merwe, “Filtering sources of unwanted traffic,” in Proc. of Inf. Theory and Applications Workshop, UCSD, Jan. 2008.
[20] A. Fréville, “The multidimensional 0-1 knapsack problem: An overview,” European Journal of Operational Research, Elsevier, 2004.
[21] A. Caprara, H. Kellerer, U. Pferschy, and D. Pisinger, “Approximation algorithms for knapsack problems with cardinality constraints,” European Journal of Operational Research, 123:333-345, 2000.
[22] P.C. Gilmore and R.E. Gomory, “A linear programming approach to the cutting stock problem, part II,” Operations Research, 1964.
[23] A. Bagchi, N. Bhattacharyya, N. Chakravarti, “LP relaxation of the two dimensional knapsack problem with box and GUB constraints,” European Journal of Operational Research, Elsevier, 1994.
[24] V.C. Li, G.L. Curry, “Solving multidimensional knapsack problems with generalized upper bound constraints using critical event tabu search,” Computers and Operations Research, Elsevier, 2005.
[25] I.R. de Farias, G.L. Nemhauser, “A polyhedral study of the cardinality constrained knapsack problem,” in Math. Progr., Springer, 2002.
[26] P. Barford, R. Nowak, R. Willett, V. Yegneswaran, “Toward a Model for Sources of Internet Background Radiation,” in Proc. of the Passive and Active Measurement Conference (PAM '06), March 2006.
[27] M.P. Collins, S. Faber, J. Janies, R. Weaver, M. De Shon, “Using Uncleanliness to Predict Future Botnet Addresses,” in Proc. of ACM IMC 2007, San Diego, CA, Oct. 2007.
[28] Z. Chen and C. Ji, “Measuring Network-Aware Worm Spreading Ability,” in Proc. of IEEE INFOCOM 2007.
[29] N. Megiddo and A. Tamir, “Linear Time Algorithms for Some Separable Quadratic Programming Problems,” Operations Research Letters, 1993.
[30] M.R. Garey, D.S. Johnson, “Computers and Intractability: A Guide to the Theory of NP-Completeness,” W.H. Freeman & Co Ltd, 1979.
