Optimal Data Placement on Networks With Constant Number of Clients

Eric Angel¹, Evripidis Bampis¹, Gerasimos G. Pollatos⋆², and Vassilis Zissimopoulos²

¹ Université d'Évry-Val d'Essonne, IBISC CNRS FRE 2873, 523 place des Terrasses, 91000 Évry, France, {angel,bampis}@ibisc.univ-evry.fr
² Department of Informatics and Telecommunications, University of Athens, Greece, {gpol,vassilis}@di.uoa.gr

Abstract. We introduce optimal algorithms for the problems of data placement (DP) and page placement (PP) in networks with a constant number of clients, each of which has limited storage availability and issues requests for data objects. The objective for both problems is to efficiently utilize each client's storage (deciding where to place replicas of objects) so that the total incurred access and installation cost over all clients is minimized. In the PP problem an extra constraint on the maximum number of clients served by a single client must be satisfied. Our algorithms solve both problems optimally when all objects have uniform lengths. When object lengths are non-uniform we also find the optimal solution, albeit with a small, asymptotically tight violation of each client's storage size by $\varepsilon l_{\max}$, where $l_{\max}$ is the maximum length of the objects and $\varepsilon$ some arbitrarily small positive constant. We make no assumption on the underlying topology of the network (metric, ultrametric etc.), thus obtaining the first non-trivial results for non-metric data placement problems.

1 Introduction

Peer-to-peer file sharing networks have become one of the most popular aspects of everyday internet usage. Users from all around the globe interact in an asynchronous manner, benefiting from the availability of the desired content in neighboring or more distant locations.
The success of such systems stems from the exploitation of a new resource, different from the traditional bandwidth-related resources, namely the distributed storage. Widespread utilization of this new resource is due to the fact that larger capacities have become cheaper, with significantly smaller data access times. Interacting users utilize this resource by installing local storage, replicating popular content and then making it available to neighboring users, thus dramatically decreasing the bandwidth consumption needed to access content from the origin servers at which it is available.

⋆ This work has been supported in part by the project BIONETS (BIOlogically inspired NETwork and Services) (FP6-027748).

A suitable abstract model describing the aforementioned situation is the data placement problem (DP) [1]. Under this model, a set of clients (equivalently users or machines) with an underlying topology is considered and each client has a local amount of storage (cache) installed. Given the set of available objects and the preference that each client has for each object, the objective is to decide a replication scheme, also referred to as a placement of objects to local caches, so as to minimize the total access cost among all clients and objects. The generalization of this model, under which each client's cache has an upper bound on the number of clients it can serve, is known as the page placement problem (PP) [14].

It should be noted that the term replication is used here instead of caching, because under the discussed model a client cannot change the contents of its local storage without reinvocation of a replication algorithm.
On the contrary, the term caching refers to the process of choosing objects to store locally so as to serve requests, and of using a replacement scheme to replace some of them with others on-the-fly according to popularity or other criteria.

Our contributions. We describe optimal algorithms, combining configuration generation and dynamic programming techniques, for the data placement and page placement problems when the number of clients is constant. This is a natural variation, interesting from both a theoretical and a practical point of view ([3, 4, 7]). Up to now, the only way to tackle these problems was the 10-approximation algorithm of [2] and the 13-approximation algorithm of [6], both designed for the general case and both based on rounding the solution of an appropriate linear program. When object lengths are uniform (or equivalently unit) our algorithm finds the optimum solution in polynomial time. When object lengths are non-uniform, our algorithm returns an optimum solution which violates the capacities of the clients' caches by a small, asymptotically tight additive factor. Our results, summarized in Table 1, can be modified to handle various extensions of the basic problems, such as the connected data placement problem ([2]), where object updates are frequent and consistency of all replicas of each object has to be guaranteed, and the k-median variant of DP, where bounds are imposed on the maximum number of replicas allowed for each object. Furthermore, our results are applicable with uniform and non-uniform object lengths and can be employed independently of the underlying topology of the network, thus giving the first non-trivial results for non-metric DP problems.

Related work.
The study of the data placement problem over an arbitrary network where all inter-client distances form a metric was initiated in [1], where the authors proved that the problem in the case of objects of uniform length is MAXSNP-hard. They also devised a polynomial 20.5-approximation algorithm based on the rounding of the optimal solution of a suitable linear program. In the case of objects of non-uniform length, the authors proved that the problem of deciding whether an instance admits a solution is NP-complete and provided a polynomial 20.5-approximation algorithm that produces a solution in which the capacity of each client's cache exceeds its capacity in the optimum solution by at most the length of the largest object. The approximation ratio for unit-sized objects was later improved to 10 in [16] and [2].

| Problem | Known results (arbitrary M), metric | Known results (arbitrary M), non-metric | In this paper (fixed M), metric and non-metric |
|---|---|---|---|
| uniform lengths DP | 10-approx [1, 2, 6] | - | optimal |
| non-uniform lengths DP | 10-approx with blow-up $l_{\max}$ [2] | - | optimal with blow-up $\varepsilon l_{\max}$ |
| page placement | 13-approx [6] (∗) | - | optimal with blow-up $\varepsilon l_{\max}$ (∗∗) |
| connected DP | 14-approx [2] | - | optimal |
| k-median DP | 10-approx [1, 2] | - | optimal |

Table 1. The main known results on data placement problems (∗ non-uniform lengths with constant blow-up on clients and cache capacities, ∗∗ non-uniform lengths with constant blow-up on cache capacities only).

Various previous works have also considered variants of the data placement problem in terms of the underlying topology. In [13] the authors consider the case of distances in the underlying topologies that form an ultrametric, i.e. are non-negative, symmetric and satisfy the strong triangle inequality, that is $d(i,j) \le \max\{d(i,k), d(k,j)\}$ for clients $i, j, k$.
The authors consider a simple hierarchical network consisting of three distances between the clients and devise a polynomial algorithm for the case of unit-sized objects by transforming it to a capacitated transportation problem [5]. For the case of general ultrametrics, an optimal polynomial algorithm is given in [9], based on a reduction to the min-cost flow problem.

The page placement problem is an important generalization of data placement and was proposed and studied in [14]. In this problem, each client has an extra constraint on the number of other clients it can serve, apart from the constraint on the capacity of its cache. In [14], the authors give a 5-approximation algorithm for the problem which violates both client and cache capacity constraints by at most a logarithmic factor. In [6], the logarithmic violation of both capacity constraints was improved to a constant with a 13-approximation algorithm.

Finally, in [11] and [15] a game-theoretic aspect of the data placement problem is studied, where clients are considered to be selfish agents. In both works, algorithms are provided which stabilize clients in equilibrium placements.

All previous results capture situations where write requests are rarely or never issued for the objects. In [2] the authors consider the case when write requests are common and formulate the connected data placement problem, in which it is required that all replicas of an object $o$ are connected via a Steiner tree $T_o$ to a root $r_o$, which can later be used as a multicast tree. The objective is the minimization of the total incurred access cost and the cost of building the Steiner tree. A 14-approximation algorithm for the problem is given in [2]. This problem is a generalization of the connected facility location problem, for which the best known approximation ratio is 8.55 [17].
In Section 2 we formally define the DP problem and introduce appropriate notation. In Section 3 we present our main results for the DP problem, whereas in Section 4 we present an algorithm for the page placement problem and also briefly discuss modifications for the other extensions.

2 Problem definition

The data placement problem we consider in this paper is identical to the one in [1] and is abstracted as follows.³ There is a network $\mathcal{N}$ consisting of a set $\mathcal{M}$ of $M = |\mathcal{M}|$ users (clients) and a universe $\mathcal{O}$ of $N = |\mathcal{O}|$ objects. In what follows we use the terms user and machine interchangeably. Each object $o \in \mathcal{O}$ has length $l_o$ and each user $j \in \mathcal{M}$ has a local capacity $C_j$ for the storage of objects. The distances between the users are represented by a distance matrix $D$ (not necessarily symmetric), where $d_{ij}$ denotes the distance from $j$ to $i$. The matrix $D$ models the underlying topology. We do not assume any restrictions (e.g. metric) on the distances.

Each user $i$ requests access to a set of objects $R_i \subseteq \mathcal{O}$, namely its request set. For each object $o$ in its request set, client $i$ has a demand of access $w_{io} > 0$. This demand can be interpreted as the frequency with which user $i$ requests object $o$. The subset $P_i$ of its request set that $i$ chooses to replicate locally is referred to as its placement. Obviously, $|P_i| \le C_i$ for unit-sized objects. We assume an installation cost $f_{oi}$ for each object $o$ and each cache $i$. The objective is to choose placements of objects for every client such that the total induced access and installation cost for all objects and all clients is minimized. In the following, we will assume without loss of generality that each object $o \in \mathcal{O}$ is requested by at least one user.

We define a configuration $c \subseteq \mathcal{M}$ as a (non-empty) subset of the $M$ machines. Thus, we have $2^M - 1$ distinct configurations and we denote by $\mathcal{C}$ the set of all configurations.
For a configuration $c \in \mathcal{C}$ and a user $j$ we say that $j$ is used with respect to $c$, denoted by $j \in c$, if the configuration $c$ contains $j$'s cache, i.e. machine $m_j \in c$. It will also be convenient to introduce the following notation: $p_{cj} = 1$ if $j \in c$, and $p_{cj} = 0$ otherwise. For an object $o$, we define a c-placement with respect to $o$ as a placement of object $o$ to the machines belonging to $c$.

Introducing binary variables $x_{oc}$ to denote whether or not we choose the c-placement with respect to $o$, we can formulate our problem as an integer linear program, denoted by ILP in the sequel, in the following way:

$$
\begin{aligned}
\text{minimize} \quad & \sum_{o \in \mathcal{O}} \sum_{c \in \mathcal{C}} \mathrm{cost}_{oc}\, x_{oc} \\
\text{subject to} \quad & \sum_{o \in \mathcal{O}} \sum_{c \in \mathcal{C}} l_o\, p_{cj}\, x_{oc} \le C_j, && j \in \mathcal{M} \\
& \sum_{c \in \mathcal{C}} x_{oc} = 1, && o \in \mathcal{O} \\
& x_{oc} \in \{0, 1\}, && o \in \mathcal{O},\ c \in \mathcal{C}
\end{aligned}
\tag{ILP}
$$

where $\mathrm{cost}_{oc}$ is the total cost induced if configuration $c$ is used for the placement of object $o$, that is

$$
\mathrm{cost}_{oc} = \sum_{j \in \mathcal{M}} (1 - p_{cj})\, w_{jo}\, l_o\, d_j(c) + \sum_{j \in \mathcal{M}} p_{cj}\, f_{oj},
$$

with $d_j(c) = \min_{j' : p_{cj'} = 1} d_{j'j}$ the nearest distance at which client $j$ can access object $o$ under configuration $c$. The first set of constraints essentially states that the set of objects that each user replicates must not violate the user's cache constraint, while the second set states that for each object exactly one configuration should be chosen. In what follows we denote by OPT the optimum solution of the previous program.

³ In [2] a seemingly different but essentially equivalent formulation of the problem is described.

Note that the problem, as defined above, does not always admit a feasible solution. In order to avoid trivial cases of infeasibility we assume in the sequel that $\sum_{i \in \mathcal{M}} C_i \ge \sum_{o \in \mathcal{O}} l_o$, which essentially states that all clients can collectively store the union of the requested objects.
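As an illustration of the definitions above, the following minimal Python sketch enumerates the configurations and evaluates $\mathrm{cost}_{oc}$ for one object and one configuration. The function names `configurations` and `placement_cost`, the 0-based indexing and the nested-list layout of `w`, `d` and `f` are assumptions made for this example only; a demand of 0 stands for an object that a machine does not request.

```python
from itertools import combinations


def configurations(M):
    """Yield every non-empty subset of machines {0, ..., M-1} as a frozenset."""
    for size in range(1, M + 1):
        for subset in combinations(range(M), size):
            yield frozenset(subset)


def placement_cost(o, c, w, length, d, f):
    """Return cost_oc for object o and configuration c.

    w[j][o]   : demand of machine j for object o (0 if j does not request o)
    length[o] : length l_o of object o
    d[i][j]   : distance from machine j to machine i (need not be symmetric)
    f[o][j]   : cost of installing object o on machine j
    """
    # Machines outside c fetch o from the nearest machine inside c.
    access = sum(
        w[j][o] * length[o] * min(d[i][j] for i in c)
        for j in range(len(w))
        if j not in c
    )
    # Machines inside c pay the installation cost instead.
    install = sum(f[o][j] for j in c)
    return access + install
```

With $M$ machines the generator yields $2^M - 1$ configurations, so evaluating every pair (object, configuration) takes a number of cost evaluations linear in $N$ once $M$ is fixed.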
Other works ([13, 9]) assume the existence of a distant server, that is, a user holding the universe of objects as a fixed placement, which essentially tackles the problem of infeasibility. For the case of uniform sized objects, this assumption has no effect on the problem's hardness, since the hardness result of Baev et al. [1] also holds in this case. However, in the case of non-uniform sized objects, their result does not hold immediately, since it relies on the fact that it is sometimes not possible to find any feasible solution. When a distant server exists, any instance always admits a feasible solution. Nevertheless, their proof of non-approximability can be adapted and thus the following result can be obtained. Due to space limitations, we defer the details of the proof to the full version of this paper.

Proposition 1. For any polynomial time computable function $\alpha(N)$, the data placement problem with non-uniform object lengths and without any augmentation in cache capacities cannot be approximated within a factor of $\alpha(N)$, unless P = NP.

The problem can also be stated as a constrained shortest path problem as follows: we introduce a node for each binary variable $x_{oc}$ and two nodes $s$ and $t$, and connect them as follows: for each $o_i$, $1 \le i \le N-1$, we connect the node that represents $x_{o_i c}$ with every node that represents $x_{o_{i+1} c'}$, for all $c, c'$. Furthermore we connect node $s$ with nodes $x_{o_1 c}$ and node $t$ with nodes $x_{o_N c}$ for all $c$. To each edge $(x_{o_i c}, x_{o_{i+1} c'})$ we assign a weight equal to $\mathrm{cost}_{o_i c}$ for $1 \le i \le N-1$. Edges $(x_{o_N c}, t)$ have weight $\mathrm{cost}_{o_N c}$ and edges $(s, x_{o_1 c})$ have a weight of 0. The objective is to find the shortest path between nodes $s$ and $t$ while respecting cache capacity constraints on each node.
These constraints are assigned to each node $x_{o_i c}$ by simply summing up, for each client contained in configuration $c$, the current cache contents up to object $o_i$. This constrained shortest path problem can be solved using dynamic programming. It leads to the algorithm presented in the next section.

3 Constant number of clients

In this section we focus on the case where the number of clients in the network (i.e. users) is a constant. To the best of our knowledge these are the first results for this natural variation. We show that the data placement problem can be solved optimally in polynomial time when all objects are unit-sized. When objects have different sizes we are still able to solve the problem optimally, with only a small and asymptotically tight violation of the cache capacities.

3.1 Uniform length objects

Let us define an available cache vector $r = (r_1, r_2, \ldots, r_M)$, where $r_j$ denotes the space currently available on the cache of user $j$, for $1 \le j \le M$. For $1 \le k \le N$, let us denote by $f_k(r)$ the cost associated with the optimal way of placing objects $o_1, \ldots, o_k$ on the clients' caches, assuming that the current available cache vector is $r$. For any configuration $c$, we denote by $\delta_c = (\delta_c^1, \ldots, \delta_c^M)$ its machine-profile vector, with $\delta_c^i = 1$ if configuration $c$ uses machine $m_i$, and $\delta_c^i = 0$ otherwise. We assume in this section that all lengths satisfy $l_o = 1$, but the following recurrence holds for the general case and will also be used in the next section. One has

$$
f_k(r) = \min_{c \,:\, r - l_{o_k} \delta_c \ge 0} \left( \mathrm{cost}_{o_k c} + f_{k-1}(r - l_{o_k} \delta_c) \right),
$$

with $f_0(r) = 0$ for any $r$. Finding the optimum cost of ILP reduces to the computation of $f_N(r)$ with $r = (C_1, C_2, \ldots, C_M)$.

Theorem 1. The non-metric data placement problem with uniform length objects and a fixed number of clients can be solved optimally in polynomial time.
Proof. By using standard techniques (see for example [8]), the above recurrence leads to an efficient dynamic programming algorithm to obtain the optimal cost and solution of ILP. The cache vectors $r$ can take values from a set of size $\prod_{j=1}^{M} C_j \le C_{\max}^M$, where $C_{\max}$ is the maximum cache size. Assuming the values $f_k(r)$ are stored in an array and computed from $k = 1$ to $k = N$, then for each $r$ the time needed to compute $f_k(r)$ is $O(2^M)$, i.e. constant, since at most $2^M$ configurations need to be checked. The total time complexity is therefore $O(N 2^M C_{\max}^M)$. Notice that since objects are unit-sized, i.e. $l_o = 1$ for all $o \in \mathcal{O}$, we can assume without loss of generality that $C_j \le N$ for every capacity. If this is not the case, by changing the capacity to $C_j := N$ we obtain an equivalent instance, because in the model considered a client has no incentive to replicate any object twice, since this would have no effect on the total access cost. Finally, the computation time becomes $O(N^{M+1})$. □

3.2 Non-uniform length objects

The previous dynamic programming algorithm is in fact pseudo-polynomial, since the complexity $O(N 2^M C_{\max}^M)$ depends on the maximum cache size $C_{\max}$. In the case of unit-sized objects we are able to bound $C_{\max}$ by the total number of objects and thus obtain a polynomial time algorithm. In the case of objects of arbitrary length the bound $C_{\max} \le N$ does not hold and the algorithm remains pseudo-polynomial. In what follows, we show how to design a polynomial time algorithm in the case of arbitrary-sized objects.

Algorithm 1: DP-NU(M, O, ε)
    α ← (ε l_max) / N
    // update object lengths
    foreach o ∈ O do
        l'_o ← ⌊l_o / α⌋
    // update cache sizes
    foreach j ∈ M do
        C'_j ← ⌊C_j / α⌋
    // use updated lengths and cache sizes with dynamic programming
    OPT_α ← optimum solution of ILP_α
    Output OPT_α
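The recurrence of Section 3.1 and the rounding step of Algorithm 1 can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not a reference implementation: the names `solve_dp` and `dp_nu` are hypothetical, the cost of a placement is supplied by the caller as a function `cost(k, c)` (computed from the original lengths), memoized recursion replaces the array filled from $k = 1$ to $k = N$, and lengths and capacities are assumed to be integers or `Fraction` values so that the floors are exact.

```python
from fractions import Fraction
from functools import lru_cache
from itertools import combinations
from math import floor, inf


def solve_dp(lengths, capacities, cost):
    """Evaluate f_N(C_1, ..., C_M) and return (optimal cost, configurations).

    lengths[k]    : integer length of object k
    capacities[j] : integer cache size of machine j
    cost(k, c)    : cost of placing object k on the machines in tuple c
    Returns (inf, None) when no feasible placement exists.
    """
    M, N = len(capacities), len(lengths)
    configs = [c for size in range(1, M + 1)
               for c in combinations(range(M), size)]

    @lru_cache(maxsize=None)
    def f(k, r):
        if k == 0:
            return 0, ()
        best = (inf, None)
        for c in configs:
            # Remaining space r - l_{o_k} * delta_c; skip c if it overflows.
            rest = tuple(r[j] - lengths[k - 1] * (j in c) for j in range(M))
            if min(rest) < 0:
                continue
            sub_cost, sub_choice = f(k - 1, rest)
            total = cost(k - 1, c) + sub_cost
            if total < best[0]:
                best = (total, sub_choice + (c,))
        return best

    value, choice = f(N, tuple(capacities))
    return value, (list(choice) if choice is not None else None)


def dp_nu(lengths, capacities, cost, eps):
    """Round lengths and capacities by alpha as in Algorithm 1, then run the DP."""
    alpha = Fraction(eps) * max(lengths) / len(lengths)
    scaled_lengths = [floor(l / alpha) for l in lengths]
    scaled_caps = [floor(C / alpha) for C in capacities]
    # The cost function is unchanged: it still refers to the original lengths.
    return solve_dp(scaled_lengths, scaled_caps, cost)
```

The recursion depth of this sketch equals the number of objects, so a bottom-up table would be preferable for large $N$; the memoized form is used here only because it mirrors the recurrence directly.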
We let $\alpha = \varepsilon l_{\max} / N$, where $\varepsilon$ is an arbitrarily small positive constant, and modify the object lengths and cache sizes appropriately. To compute a solution we use Algorithm 1, where ILP$_\alpha$ denotes the integer linear program obtained from ILP by using length $l'_o$ (resp. cache $C'_j$) instead of $l_o$ (resp. $C_j$) for all objects $o$ and clients $j$. Notice however that the cost function in ILP$_\alpha$ is the same as in ILP, i.e. the costs $\mathrm{cost}_{oc} = \sum_{j \in \mathcal{M}} (1 - p_{cj})\, w_{jo}\, l_o\, d_j(c) + \sum_{j \in \mathcal{M}} p_{cj}\, f_{oj}$ are calculated by using the initial lengths $l_o$. We have the following lemma.

Lemma 1. Given an $\alpha > 0$, any solution $x$ for ILP is a solution for ILP$_\alpha$.

Proof. Let $x$ be a solution of ILP. One has, for all $j \in \mathcal{M}$,

$$
\sum_{o \in \mathcal{O}} \sum_{c \in \mathcal{C}} \left\lfloor \frac{l_o}{\alpha} \right\rfloor p_{cj}\, x_{oc}
\le \sum_{o \in \mathcal{O}} \sum_{c \in \mathcal{C}} \left\lfloor \frac{l_o}{\alpha}\, p_{cj}\, x_{oc} \right\rfloor
\le \left\lfloor \sum_{o \in \mathcal{O}} \sum_{c \in \mathcal{C}} \frac{l_o}{\alpha}\, p_{cj}\, x_{oc} \right\rfloor
\le \left\lfloor \frac{C_j}{\alpha} \right\rfloor,
\tag{1}
$$

where the first inequality comes from the fact that $p_{cj}$ and $x_{oc}$ are integers, the second inequality is a standard one, and the last inequality comes from $\sum_{o \in \mathcal{O}} \sum_{c \in \mathcal{C}} l_o\, p_{cj}\, x_{oc} \le C_j$, since $x$ is a feasible solution of ILP. Therefore, $x$ satisfies $\sum_{o \in \mathcal{O}} \sum_{c \in \mathcal{C}} l'_o\, p_{cj}\, x_{oc} \le C'_j$, and $x$ is a feasible solution of ILP$_\alpha$. □

From the above lemma, we can immediately conclude that if ILP$_\alpha$ has no solutions, then the same holds for ILP. However, if ILP has no feasible solutions, ILP$_\alpha$ could have feasible solutions. In the following, we assume that ILP admits at least one feasible solution, in order to be able to define an optimal solution denoted by OPT.

Lemma 2. The algorithm DP-NU($\mathcal{M}, \mathcal{O}, \varepsilon$) returns an optimal solution for ILP using $\varepsilon l_{\max}$ blow-up in time polynomial in $N$ and $1/\varepsilon$, where $\varepsilon$ is an arbitrarily small positive constant and $l_{\max}$ is the length of the largest object.

Proof. First, notice that by Lemma 1 the cost of the solution OPT$_\alpha$ is not greater than the cost of the solution OPT.
Furthermore, we have that

$$
\sum_{o \in \mathcal{O}} \sum_{c \in \mathcal{C}} \left\lfloor \frac{l_o}{\alpha} \right\rfloor p_{cj}\, x_{oc}
\ge \sum_{o \in \mathcal{O}} \sum_{c \in \mathcal{C}} \left( \frac{l_o}{\alpha} - 1 \right) p_{cj}\, x_{oc}
\ge \sum_{o \in \mathcal{O}} \sum_{c \in \mathcal{C}} \frac{l_o}{\alpha}\, p_{cj}\, x_{oc} - N,
$$

which becomes

$$
\sum_{o \in \mathcal{O}} \sum_{c \in \mathcal{C}} l_o\, p_{cj}\, x_{oc}
\le \alpha \sum_{o \in \mathcal{O}} \sum_{c \in \mathcal{C}} \left\lfloor \frac{l_o}{\alpha} \right\rfloor p_{cj}\, x_{oc} + \alpha N
\le \alpha \left\lfloor \frac{C_j}{\alpha} \right\rfloor + N\alpha
\le C_j + N\alpha
\quad \text{(by using inequality (1)).}
$$

Putting $\alpha = \varepsilon l_{\max} / N$ we get for the initial instance

$$
\sum_{o \in \mathcal{O}} \sum_{c \in \mathcal{C}} l_o\, p_{cj}\, x_{oc} \le C_j + \varepsilon l_{\max},
\tag{2}
$$

thus, each cache size is violated by at most $\varepsilon l_{\max}$.

For the complexity, notice that for any user $j$'s cache we can assume without loss of generality that $C_j \le N l_{\max}$. If this is not the case, by changing the capacity to $C_j := N l_{\max}$ we obtain an equivalent instance, because in the model considered a client has no incentive to replicate any object twice, since this would have no effect on the total access cost. We have therefore

$$
C'_j = \left\lfloor \frac{C_j}{\alpha} \right\rfloor \le \frac{C_j}{\alpha} \le \frac{N l_{\max}}{\alpha} \le \frac{N^2}{\varepsilon}.
$$

Finally, we obtain $C'_{\max} = \max_{j \in \mathcal{M}} C'_j \le N^2 / \varepsilon$ and, by a similar analysis as in Theorem 1, the complexity of $O(N 2^M C_{\max}^M)$ becomes $O(N^{2M+1} \varepsilon^{-M})$. Notice that if $\alpha$ is large enough, some lengths $\lfloor l_o / \alpha \rfloor$ can become equal to zero. In that case, the dynamic programming algorithm can be accelerated for such objects, since an optimal placement is to put them on each machine. □

Using Lemma 2 we obtain the following theorem.

Theorem 2. The non-metric data placement problem with non-uniform object lengths and a fixed number of clients can be solved optimally in polynomial time using $\varepsilon l_{\max}$ blow-up on the machines' capacity, where $\varepsilon$ is an arbitrarily small positive constant and $l_{\max}$ is the length of the largest object.

The $\varepsilon l_{\max}$ blow-up stated in the previous theorem is asymptotically tight. To see this, consider an instance with $N$ objects and two clients $M_1$ and $M_2$. The lengths of the objects are $l_i = (1 - \delta)/N$ for $i = 1, \ldots, N-1$ and $l_N = l_{\max} = 1/\varepsilon$, where $0 < \varepsilon < 1$ and $0 < \delta < 1$. The cache capacities of the clients are $C_1 = \varepsilon l_{\max} = 1$ for $M_1$ and $C_2 = 1/\varepsilon$ for $M_2$. All installation costs are 0. Client $M_1$ has a demand equal to 1 for the first $N-1$ objects and no demand for object $N$. Client $M_2$ also has a demand of 1 for the first $N-1$ objects and a demand of $N$ for the $N$-th object. In the optimum solution OPT, $M_1$ replicates all the $N-1$ objects and $M_2$ replicates only object $N$.

When our algorithm is employed, we have $\alpha = \varepsilon l_{\max} / N = 1/N$, so the lengths of objects $i$, $1 \le i \le N-1$, become $l'_i = \lfloor l_i / \alpha \rfloor = \lfloor 1 - \delta \rfloor = 0$. The length of the $N$-th object becomes $l'_{\max} = \lfloor N/\varepsilon \rfloor$, while the cache sizes become $C'_1 = \lfloor 1/\alpha \rfloor = N$ and $C'_2 = \lfloor N/\varepsilon \rfloor$. In the optimum solution OPT$_\alpha$ client $M_1$ will again choose to replicate the $N-1$ objects it has demand for, but client $M_2$ can now choose all $N$ objects. After restoring the original object lengths and capacities, the total blow-up is only due to $M_2$ and is equal to $(N-1)\,(1-\delta)/N$. Choosing $\delta = 1/(N-1)$ we get $1 - 2/N$, the limit of which is $1 = \varepsilon \cdot \frac{1}{\varepsilon} = \varepsilon\, l_{\max}$ as $N$ approaches infinity.

4 The page placement problem

In the page placement problem, there are bounds imposed on the number of clients that can connect to a specified client's cache in order to access objects. We denote by $k_j$ the maximum number of users that can access a given user $j$'s cache. If the same user accesses cache $j$ for different objects it is counted only once. Clearly, in this problem a client requesting an object cannot always use the nearest machine which replicates that object to access it.

We need to introduce some terminology and notation. Let us define an available load vector $t = (t_1, t_2, \ldots, t_M)$, where $t_j$ denotes $k_j$ minus the current number of users connected to the cache $j$.
Notice that the number of load vectors is bounded by $\prod_{j=1}^{M} (k_j + 1) \le (M+1)^M$. For any configuration $c$, we denote as before by $\delta_c$ its machine-profile vector, i.e. for $1 \le i \le M$, $\delta_c^i = 1$ if configuration $c$ uses machine $m_i$, and $\delta_c^i = 0$ otherwise.

Given an object $o$ and a configuration $c$, a c-placement is a placement such that a machine $m$ receives the object $o$ if and only if $m \in c$. In a c-placement, the machines outside $c$ need a way to access the object $o$ they are requesting. We call such a way a connection pattern $\rho$ with respect to the configuration $c$, and we denote by $\Phi_c$ the set of all such possible connection patterns. Given $\rho \in \Phi_c$, for all $j \notin c$ and $i \in c$, we put $\rho_{ij} = 1$ if user $j$ accesses object $o$ from user $i$, and $\rho_{ij} = 0$ otherwise. Moreover, for all $j \in c$ and all $i$ we have $\rho_{ij} = 0$. Notice that $|\Phi_c|$ is bounded by $|c|^{M - |c|} \le M^M$.

Finally, we define a history pattern $s$ in the following way: $s_{ij} = 1$ if machine $m_j$ has previously used machine $m_i$ to access an object, and $s_{ij} = 0$ otherwise. The number of history patterns is equal to $2^{M(M-1)/2}$. Given $\rho$ and $s$ we denote by $s \vee \rho$ the updated history pattern taking into account the current connection pattern $\rho$. The updated pattern can be obtained by performing a logical or between $\rho$ and $s$, i.e. $(s \vee \rho)_{ij} = s_{ij} \vee \rho_{ij}$. For a connection pattern $\rho \in \Phi_c$ and a history pattern $s$, we denote by $\Delta_{\rho,s} = (\Delta_{\rho,s}^i)_{i=1}^{M}$ the vector which indicates for each machine $m_i$ the number of machines which are connected to $m_i$ for the first time. Such a vector can be obtained in the following way: for $1 \le i \le M$, one has $\Delta_{\rho,s}^i = \sum_{j=1}^{M} \rho_{ij} (1 - s_{ij})$. Finally, we define

$$
\mathrm{cost}_{o,c,\rho} = \sum_{i,j \in \mathcal{M}} d_{ij}\, w_{jo}\, l_o\, \rho_{ij} + \sum_{i \in c} f_{oi}.
$$

For $1 \le k \le N$, let us denote by $f_k(r, t, s)$ the cost associated with the optimal way of placing objects $o_1, \ldots, o_k$ on the clients' caches, assuming that the current available cache vector, load vector and history pattern are respectively $r$, $t$, $s$. One has

$$
f_k(r, t, s) = \min_{c \in \mathcal{C} \,:\, r - l_{o_k} \delta_c \ge 0} \ \min_{\rho \in \Phi_c \,:\, t - \Delta_{\rho,s} \ge 0} \left( \mathrm{cost}_{o_k,c,\rho} + f_{k-1}(r - l_{o_k} \delta_c,\ t - \Delta_{\rho,s},\ s \vee \rho) \right),
$$

with $f_0(r, t, s) = 0$ for any $r, t, s$.

Theorem 3. The non-metric page placement problem with uniform length objects and a fixed number of clients can be solved optimally in polynomial time.

Proof. Finding the optimum cost of the problem reduces to the computation of $f_N(r, t, s)$ with $r = (C_1, \ldots, C_M)$, $t = (k_1, \ldots, k_M)$ and $s = (0, \ldots, 0)$. The complexity for computing $f_k(r, t, s)$ is $O(2^M M^M M)$, and there are at most $N\, C_{\max}^M\, M^M\, 2^{M(M-1)/2}$ $(r, t, s)$ triplets. As in Section 3 (Theorem 1), we can assume that $C_{\max} \le N$ and obtain an overall complexity of $O(N^{M+1})$. □

For the non-uniform case, the same recurrence relation holds, and using a similar technique and analysis as in Section 3.2 (not repeated here due to space limitations) the complexity becomes $O(N^{2M+1} \varepsilon^{-M})$ and we obtain the following result:

Theorem 4. The non-metric page placement problem with non-uniform object lengths and a fixed number of clients can be solved optimally in polynomial time using $\varepsilon l_{\max}$ blow-up on the machines' capacity, where $\varepsilon$ is an arbitrarily small positive constant and $l_{\max}$ is the length of the largest object.

5 Concluding Remarks

In this paper, we addressed the problem of replicating data over a constant number of network clients and designed optimal algorithms using the notion of configurations. If all data objects are equal in size, our algorithm finds the optimum solution in polynomial time. When lengths of objects differ, a small violation of each client's cache capacity constraint is enough to be able to find the optimum solution.
Our technique constitutes a general framework that can also be used for solving optimally various common extensions of the problem, such as: (a) the k-median variant, in which an upper bound $k_o$ is imposed on the number of copies of each object $o$ that can be replicated in the network, and (b) the connected data placement problem [2], where apart from placing objects, all clients holding replicas of the same object should also be interconnected via a directed Steiner tree. Furthermore, our technique can be applied to other variants of data placement, for example the fault-tolerant data placement (derived from the fault-tolerant facility location problem [18]), where each client can be served by a given number of machines and the cost is obtained by summing the costs of access with respect to those machines. We defer the details for these and other extensions, due to space limitations, to the full version of this paper. The proposed algorithms remain polynomial independently of any metric.

An important aspect of further research is the modification of the described algorithm so as to be able to handle extensions involving payments. In such extensions, apart from object preferences, a client also has a budget to spend and pay other clients to convince them to replicate certain objects.

References

1. Ivan D. Baev and Rajmohan Rajaraman. Approximation algorithms for data placement in arbitrary networks. In Proceedings of the ACM-SIAM Annual Symposium on Discrete Algorithms (SODA), pages 661–670, 2001.
2. Ivan D. Baev, Rajmohan Rajaraman, and Chaitanya Swamy. Approximation algorithms for data placement problems. SIAM Journal on Computing, 38(4):1411–1429, 2008.
3. Jon Feldman and Matthias Ruhl. The directed Steiner network problem is tractable for a constant number of terminals. In 40th Annual Symposium on Foundations of Computer Science, pages 299–308, 1999.
4. Jon Feldman and Matthias Ruhl. The directed Steiner network problem is tractable for a constant number of terminals. SIAM Journal on Computing, 36(2):543–561, 2006.
5. R. Garfinkel and George L. Nemhauser. Integer Programming. John Wiley & Sons Inc, 1973.
6. Sudipto Guha and Kamesh Munagala. Improved algorithms for the data placement problem. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 106–107, 2002.
7. Benny Kimelfeld and Yehoshua Sagiv. New algorithms for computing Steiner trees for a fixed number of terminals. Unpublished manuscript, 2006.
8. Jon Kleinberg and Éva Tardos. Algorithm Design, chapter 6. Dynamic Programming. Addison Wesley, 2005.
9. Madhukar R. Korupolu, C. Greg Plaxton, and Rajmohan Rajaraman. Placement algorithms for hierarchical cooperative caching. Journal of Algorithms, 38(1):260–302, 2001.
10. Christof Krick, Harald Räcke, and Matthias Westermann. Approximation algorithms for data management in networks. In Proceedings of the 13th Annual ACM Symposium on Parallel Algorithms and Architectures, pages 237–246, 2001.
11. N. Laoutaris, O. A. Telelis, V. Zissimopoulos, and I. Stavrakakis. Distributed Selfish Replication. IEEE Transactions on Parallel and Distributed Systems, 17(12):1401–1413, 2006.
12. Nikolaos Laoutaris, Vassilios Zissimopoulos, and Ioannis Stavrakakis. Joint object placement and node dimensioning for internet content distribution. Information Processing Letters, 8(6):273–279, 2004.
13. Avraham Leff, Joel L. Wolf, and Philip S. Yu. Replication algorithms in a remote caching architecture. IEEE Transactions on Parallel and Distributed Systems, 4(11):1185–1204, 1993.
14. Adam Meyerson, Kamesh Munagala, and Serge Plotkin. Web caching using access statistics. In Proceedings of the ACM-SIAM Annual Symposium on Discrete Algorithms (SODA), pages 354–363, 2001.
15. Gerasimos G. Pollatos, Orestis Telelis, and Vassilis Zissimopoulos. On the social cost of distributed selfish content replication. In 7th International IFIP-TC6 Networking Conference, pages 195–206, 2008.
16. Chaitanya Swamy. Algorithms for the data placement problem. Unpublished manuscript, 2004.
17. Chaitanya Swamy and Amit Kumar. Primal-dual algorithms for connected facility location problems. Algorithmica, 40(4):245–269, 2004.
18. Chaitanya Swamy and David B. Shmoys. Fault-tolerant facility location. ACM Transactions on Algorithms, 4(4), 2008.
