Optimal Scheduling of File Transfers with Divisible Sizes on Multiple Disjoint Paths
In this paper I investigate several offline and online data transfer scheduling problems and propose efficient algorithms and techniques for addressing them. In the offline case, I present a novel, heuristic, algorithm for scheduling files with divis…
Authors: Mugurel Ionut Andreica
OPTIMAL SCHEDULING OF FILE TRANSFERS WITH DIVISIBLE SIZES ON MULTIPLE DISJOINT PATHS

Mugurel Ionut Andreica
Computer Science Department, Politehnica University of Bucharest
Splaiul Independentei 313, 060042, Bucharest, Romania
phone: +(40) 722803022, email: mugurel.andreica@cs.pub.ro
web: https://mail.cs.pub.ro/~mugurel.andreica

ABSTRACT
In this paper I investigate several offline and online data transfer scheduling problems and propose efficient algorithms and techniques for addressing them. In the offline case, I present a novel heuristic algorithm for scheduling files with divisible sizes on multiple disjoint paths, in order to maximize the total profit (the problem is equivalent to the multiple knapsack problem with divisible item sizes). I then consider a cost optimization problem for transferring a sequence of identical files, subject to time constraints imposed by the data transfer providers. For the online case I propose an algorithmic framework based on the block partitioning method, which can speed up the process of resource allocation and reservation.

1. INTRODUCTION
The importance of data transfer scheduling techniques in achieving good communication performance has increased recently, with the world-wide development and deployment of distributed systems, services and applications. In this paper I study several offline and online data transfer scheduling problems and propose novel, efficient techniques for addressing these problems. First, I present an efficient heuristic algorithm for scheduling files with divisible sizes on multiple disjoint paths, in order to maximize the total profit. This problem is equivalent to the multiple knapsack problem with divisible item sizes.
Then, I present an optimal algorithm for minimizing costs when a sequence of identical files must be transferred from a source to a destination, subject to time constraints imposed by the data transfer providers. I also propose an online algorithmic framework for the block partitioning method, which can be used to efficiently handle online resource allocation and reservation requests.

The rest of this paper is organized as follows: in Sections 2 and 3 I discuss the offline scheduling problems I mentioned above and present the developed solutions. In Section 4 I propose an algorithmic framework for online resource allocation and reservation. In Section 5 I discuss related work and in Section 6 I draw some conclusions.

2. MAXIMUM PROFIT DATA TRANSFERS
We are given n file transfer requests. For each request i, its file size (sz_i>0) and profit (p_i>0) are known. Each file must be transferred between the same source and destination. We consider the file sizes sorted in ascending order: sz_1 ≤ sz_2 ≤ … ≤ sz_n. The file sizes are integers and divisible, i.e. sz_i=q_i·sz_{i-1} (2≤i≤n), where q_i≥1 is an integer. Each file transfer must be scheduled non-preemptively on one of the k paths available. The paths are disjoint and identical, except that each path j is available only during a time interval [0,T_j]. All the paths have unit transfer rate, so the time taken to transfer a file with size sz_i is sz_i time units. A file transfer request may be accepted or rejected. Accepting a request i means assigning it a path j and a time interval [t,t+sz_i) fully included in [0,T_j]. At any moment, at most one file can be transferred on a path, i.e. the time intervals of the requests assigned to the same path must be disjoint. The total profit is the sum of the profits brought by each accepted request (if a request is rejected, it contributes nothing to the total profit).
Obviously, we would like to accept those requests which bring a maximum total profit. This problem is equivalent to the multiple knapsack problem with divisible item sizes. Each path j is a knapsack of a given capacity T_j. The file transfer requests are items whose sizes are divisible and we are interested in finding a maximum profit subset of items, such that each item in the set is placed in some knapsack and the sum of the item sizes in any knapsack does not exceed its capacity. The multiple knapsack problem is NP-hard, thus a polynomial time algorithm is unlikely to exist. Even for this particular case with divisible item sizes, we present only a pseudopolynomial O(n·S·min{n,S·log(S)}) time algorithm, where S is the maximum size of an item.

A direct solution obtained by extending the standard dynamic programming algorithm for the single knapsack case takes O(n·max{T_j}^k) time (where k is the number of knapsacks) and computes a multidimensional array P_m[i,s_1,s_2,…,s_k]=the maximum profit which can be achieved by choosing a subset of the first i items and filling each knapsack j up to size s_j (at most). We have P_m[0,s_1,…,s_k]=0 (for all the values s_j) and

\[
P_m[i,s_1,\ldots,s_k] = \max \left\{ P_m[i-1,s_1,\ldots,s_k],\; p_i + \max \left\{ \begin{array}{l} P_m[i-1,s_1-sz_i,s_2,\ldots,s_k] \\ P_m[i-1,s_1,s_2-sz_i,\ldots,s_k] \\ \ldots \\ P_m[i-1,s_1,s_2,\ldots,s_k-sz_i] \end{array} \right\} \right\}
\]

For P_m[i,s_1,…,s_k], the choices are to either ignore the i-th item or place it in one of the k knapsacks (the item can be placed in knapsack j if s_j ≥ sz_i). The maximum profit is given by P_m[n,T_1,…,T_k]. However, this solution is inefficient. Faster algorithms make use of heuristics.
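The direct dynamic programming solution above can be sketched in a few lines of Python. This is only an illustrative sketch, not the paper's implementation: the function name `max_profit_multi_knapsack` and the memoized, top-down formulation are my own assumptions, and the state space is still exponential in the number of knapsacks, so it is practical only for very small instances.

```python
from functools import lru_cache


def max_profit_multi_knapsack(sizes, profits, capacities):
    """Exact multiple knapsack by memoized recursion (hypothetical helper).

    best(i, free) plays the role of P_m[i, s_1, ..., s_k]: the maximum
    profit using the first i items when knapsack j may be filled up to
    free[j].
    """
    n = len(sizes)

    @lru_cache(maxsize=None)
    def best(i, free):
        if i == 0:
            return 0
        # Choice 1: ignore the i-th item.
        result = best(i - 1, free)
        size, profit = sizes[i - 1], profits[i - 1]
        # Choice 2: place the i-th item in any knapsack with enough room.
        for j, s in enumerate(free):
            if s >= size:
                smaller = free[:j] + (s - size,) + free[j + 1:]
                result = max(result, profit + best(i - 1, smaller))
        return result

    return best(n, tuple(capacities))
```

Such an exact solver is mainly useful as a reference against which heuristic solutions can be compared on small test cases.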
The most natural heuristic is the following one, based on a greedy algorithm:

Greedy1MultipleKnapsack(item_set, knapsack_set):
    k=|knapsack_set|
    fill the first knapsack optimally with a subset item_sol of the items
    if (k=1) then return profit(item_sol)
    else if (|item_set \ item_sol|>0) then
        return profit(item_sol) + Greedy1MultipleKnapsack(item_set \ item_sol, knapsack_set \ {first knapsack})
    else return profit(item_sol)

Other heuristic algorithms consist of sorting the items according to some criterion (e.g. profit/size) and inserting them using the First Fit heuristic. I will now present a very different approach, which provides the optimal solution in many cases.

We will split the items into groups: two items belong to the same group if they have the same size; thus, all the items in group i have size sg_i. We consider the groups sorted in decreasing order of the item sizes, i.e. sg_1>sg_2>…>sg_G (where G is the total number of distinct item sizes). Within a group i, the items are sorted in decreasing order of their profits, i.e. pr_{i,1} ≥ pr_{i,2} ≥ … ≥ pr_{i,n_i}, where n_i is the number of items in group i and pr_{i,j} is the profit of the j-th item in the i-th group.

In the first step of the algorithm, we will insert the items into the knapsacks using the First Fit heuristic. The items are traversed in increasing order of the group number and, within a group, in increasing order of the item number. For each item (i,j) (the j-th item in the i-th group), if it can be inserted into a knapsack p without exceeding its capacity, we will insert it into p. The knapsack index p is not important. Because the item sizes are divisible, we will be able to insert the same set of items during this first stage, no matter which knapsack p we choose for a specific item.
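A minimal Python sketch of this first stage is given below. The name `first_fit_stage` and the flat `(size, profit)` item representation are assumptions made for illustration; the grouping by size is obtained implicitly by sorting on decreasing size and, within equal sizes, decreasing profit.

```python
def first_fit_stage(items, capacities):
    """First Fit insertion in decreasing order of size, then of profit.

    items: list of (size, profit) pairs; capacities: list of T_p values.
    Returns (assignment, remaining), where assignment[idx] is the 1-based
    knapsack of item idx, or 0 if the item was never inserted.
    """
    remaining = list(capacities)
    # Larger sizes first (group order), larger profits first inside a group.
    order = sorted(range(len(items)),
                   key=lambda idx: (-items[idx][0], -items[idx][1]))
    assignment = [0] * len(items)
    for idx in order:
        size = items[idx][0]
        for p, free in enumerate(remaining):
            if free >= size:
                assignment[idx] = p + 1   # knapsacks are numbered from 1
                remaining[p] -= size
                break
    return assignment, remaining
```

The value 0 for "never inserted" mirrors the `knapsack[(i,j)]=0` convention of the pseudocode, so the result can be fed directly into a later improvement stage.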
We will then successively improve the initial solution, by replacing items with subsets of items which could not be inserted during the first stage and whose total profit is larger than the individual profit of the replaced item. The algorithm is sketched below:

MultipleKnapsackWithDivisibleItemSizes():
    for i=1 to G do
        for j=1 to n_i do
            knapsack[(i,j)]=0
            for p=1 to k do
                if (T_p≥sg_i) then
                    // insert item (i,j) into knapsack p
                    knapsack[(i,j)]=p; T_p=T_p-sg_i; break
    improved_solution=true
    while (improved_solution) do
        smax=the maximum size of an item inside a knapsack
        nitems=0
        for i=G downto 1 do
            if (sg_i … 0 }
        if (maxdif>0) then
            (i_r,j_r)=the item to be replaced (for which maxdif is maximum)
            Q=the subset of items in cand, corresponding to P_max[nitems, sg_{i_r}]
            for (i,j) in Q do knapsack[(i,j)]=knapsack[(i_r,j_r)]
            knapsack[(i_r,j_r)]=-1; improved_solution=true
        else improved_solution=false

At the end, for each item (i,j) we have three options:
    knapsack[(i,j)]>0: the value indicates the knapsack into which the item is placed
    knapsack[(i,j)]=-1: the item was inserted inside a knapsack during the first stage, but was replaced afterwards
    knapsack[(i,j)]=0: the item was never inserted inside any knapsack

During the second stage of the algorithm, we choose nitems items which have never been inserted into any knapsack and compute the maximum profit obtained by choosing a subset of these items whose sum is sum (for each sum=1 to smax); these values are stored in P_max[nitems, sum]. We then replace an item (i_r,j_r) from a knapsack for which the profit increase P_max[nitems, sg_{i_r}] - pr_{i_r,j_r} is maximum. The replaced item is ignored from now on, as it cannot be part of an optimal solution.
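One round of the replacement step described above can be sketched as follows. This is a hedged illustration rather than the paper's procedure: the helper names `exact_sum_profits` and `best_replacement` are hypothetical, items are plain `(size, profit)` pairs, and the sketch feeds all never-inserted items into the subset-sum table instead of the restricted candidate list of length nitems used by the algorithm.

```python
def exact_sum_profits(candidates, smax):
    """pmax[s] = best profit of a candidate subset with total size exactly s.

    candidates: list of (size, profit) of items never inserted so far.
    Unreachable sums keep the value -infinity.
    """
    neg_inf = float("-inf")
    pmax = [neg_inf] * (smax + 1)
    pmax[0] = 0
    for size, profit in candidates:
        # Iterate sums downwards so that each item is used at most once.
        for s in range(smax, size - 1, -1):
            if pmax[s - size] != neg_inf:
                pmax[s] = max(pmax[s], pmax[s - size] + profit)
    return pmax


def best_replacement(placed, candidates):
    """Find the placed item whose replacement gives the largest profit gain.

    placed: list of (size, profit) of items currently inside knapsacks.
    Returns (gain, index_in_placed), or None if no replacement improves.
    """
    if not placed:
        return None
    smax = max(size for size, _ in placed)
    pmax = exact_sum_profits(candidates, smax)
    gain, idx = max((pmax[size] - profit, i)
                    for i, (size, profit) in enumerate(placed))
    return (gain, idx) if gain > 0 else None
```

Because the replacing subset has exactly the size of the replaced item, the knapsack capacities are untouched and the round can simply be repeated until `best_replacement` returns `None`.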
By maintaining a linked list with the items in each group, from which we remove (in O(1) time) an item when it is inserted into a knapsack, we can implement the firstItem, nextItem and isValidItem functions in O(1) time.

The design of the algorithm is motivated by the following facts: any valid solution for the multiple knapsack problem can be successively improved to an optimal solution by replacing a subset of items S_1 in one of the knapsacks with a subset of items S_2 outside of any knapsack. Because the item sizes are divisible, the set S_1 can always contain only one item.

The first stage of the algorithm takes O(n·k) time and O(n) items can be inserted then. The while loop can be executed a number of times equal to the number of items inserted in the first stage. Each iteration of the while loop takes O(nitems·smax) time. Two upper limits for nitems are O(n) and

\[
\sum_{i=1}^{smax} \frac{smax}{i} = O(smax \cdot \log(smax)).
\]

Since smax is bounded by S, the largest size of an item, the overall time complexity is O(n·S·min{n,S·log(S)}).

I compared the proposed algorithm with three other algorithms: the single knapsack extension to multiple knapsacks, the Greedy1MultipleKnapsack algorithm and a greedy algorithm which sorted the items according to several criteria and then used the First Fit heuristic. I considered many test scenarios and most of them were solved optimally by the new algorithm. However, I was also able to find test cases where the algorithm could not find the optimal solution. Nevertheless, in terms of performance (quality of the obtained solution and running time), the algorithm I proposed is a clear winner, followed by the Greedy1MultipleKnapsack algorithm.

3. MINIMUM COST DATA TRANSFERS
We are given a sequence of n similar files, which need to be sent consecutively from a source to a destination.
The transfer of each file takes 1 time unit (thus, file i is transferred from time i-1 to time i). There are k data transfer providers; a provider j charges a fixed price C_j per time unit for transferring data and leases his services for at most T_{max,j} time units. Because of several factors, each provider j asks that the leased time interval includes a specified time interval [T_{1,j}, T_{2,j}) (T_{2,j}-T_{1,j} ≤ T_{max,j}). Since files cannot be transferred simultaneously, the time intervals rented from each provider will be disjoint. We may also use a default network link for transferring a file i, which would cost us L_i. Of course, we are interested in paying the minimum total cost for the file transfers.

We present here an O(k·n) dynamic programming algorithm for solving this problem. We will sort the data transfer providers in increasing order of T_{2,i}, i.e. T_{2,1} ≤ T_{2,2} ≤ … ≤ T_{2,k}. We will compute the values Cmin[i,j]=the minimum total cost for sending the first j files using a subset of the first i providers (in the sorted order). Initially, Cmin[0,0]=0 and Cmin[0,j]=Cmin[0,j-1]+L_j for j>0 (with no provider available, every file is sent over the default link, so that a later lease may start after such files). For i>0, we have:

\[
Cmin[i,j] = \min \begin{cases}
Cmin[i-1,j] \\
Cmin[i,j-1] + L_j, & \text{if } j>0 \\
+\infty, & \text{if } (j > T_{1,i}+T_{max,i}) \text{ or } (j < T_{2,i}) \\
(j-T_{1,i}) \cdot C_i + \min\limits_{j-T_{max,i} \le p \le T_{1,i}} \{ Cmin[i-1,p] + (T_{1,i}-p) \cdot C_i \}, & \text{otherwise}
\end{cases}
\]

When computing Cmin[i,j], we have the choice of using the services of the i-th data transfer provider or not. If we do not use them, then the cost is equal to min{Cmin[i-1,j], Cmin[i,j-1]+L_j}. If we want to use the i-th provider, but j violates the time constraints imposed by the provider ((j>T_{1,i}+T_{max,i}) or (j<T_{2,i})), then the cost is +∞; otherwise, j is the end time moment of the leased time interval and we need to choose the first time moment of the interval (p).
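The recurrence can be turned into a direct Python sketch that tries every start moment p for every pair (i,j). The function name `min_transfer_cost` and the `(C, T1, T2, Tmax)` tuple layout are assumptions made for this illustration. Two choices are mine and are marked in the comments: row 0 is initialized with the prefix sums of the default-link costs, so that a lease may begin after some files were sent on the default link, and the start moment p is clamped at 0.

```python
import math


def min_transfer_cost(n, link_cost, providers):
    """Direct evaluation of the Cmin recurrence (hypothetical helper).

    n: number of files; link_cost[j-1] is L_j;
    providers: list of (C, T1, T2, Tmax) with integer time moments.
    """
    providers = sorted(providers, key=lambda pr: pr[2])  # increasing T2
    # Assumption: Cmin[0][j] = L_1 + ... + L_j (default link only).
    prev = [0] * (n + 1)
    for j in range(1, n + 1):
        prev[j] = prev[j - 1] + link_cost[j - 1]
    for C, T1, T2, Tmax in providers:
        cur = [math.inf] * (n + 1)
        for j in range(n + 1):
            best = prev[j]                      # provider i is not used
            if j > 0:                           # file j on the default link
                best = min(best, cur[j - 1] + link_cost[j - 1])
            if T2 <= j <= T1 + Tmax:            # lease [p, j) is feasible
                # Assumption: p is clamped so that it never goes below 0.
                for p in range(max(0, j - Tmax), T1 + 1):
                    best = min(best, prev[p] + (j - p) * C)
            cur[j] = best
        prev = cur
    return prev[n]
```

Only two rows of the table are kept, since Cmin[i,·] depends on Cmin[i-1,·] and on earlier entries of the same row.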
Using the equation above, an O(k·n^2) algorithm can be implemented easily (taking O(n) time for each pair (i,j)). We will show how to compute all the values Cmin[i,j] in O(n) time for each value of i (thus, in O(1) time for every pair (i,j)). For each 1≤i≤k, we are only interested in the values of j within the interval [T_{2,i}, T_{1,i}+T_{max,i}] (the others are easy to handle); thus, we will compute an array minp_i, where

\[
minp_i[q] = \min_{q \le p \le T_{1,i}} \{ Cmin[i-1,p] + (T_{1,i}-p) \cdot C_i \}
\]

We have minp_i[T_{1,i}]=Cmin[i-1,T_{1,i}]. Each of the other values can be computed in O(1) time (in order, from T_{1,i}-1 down to T_{2,i}-T_{max,i}):

\[
minp_i[q] = \min \{ minp_i[q+1],\; Cmin[i-1,q] + (T_{1,i}-q) \cdot C_i \}
\]

After computing the array minp_i in O(n) time, we can compute in O(1) time each value Cmin[i,j], with j in [T_{2,i}, T_{1,i}+T_{max,i}]: Cmin[i,j]=min{Cmin[i-1,j], Cmin[i,j-1]+L_j, (j-T_{1,i})·C_i+minp_i[j-T_{max,i}]}. The total cost is Cmin[k,n].

4. ONLINE RESOURCE MANAGEMENT
We consider the following scenario: a resource manager receives resource allocation and reservation requests (data transfer requests) which need to be processed in real time (as soon as they arrive or in batches). A request asks for a certain amount of resources (e.g. bandwidth), subject to several types of time constraints (e.g. fixed duration, earliest start time, latest finish time). Many models and algorithms have been developed for online scheduling problems [1]. We consider here the following assumptions: time is divided into discrete, equally-sized time slots and the resource manager must handle many requests simultaneously, providing low response times.
Because of the stringent time constraints, the scheduler needs some efficient data structures to help it check if the request's constraints can be satisfied and to choose appropriate reservation parameters (if the request is accepted). In order to speed up the processing of requests, we introduce an algorithmic framework for the block partitioning method.

We have an array of n cells, where each cell has a value v_i (each cell corresponds to a time slot). We will divide the n cells into n/k blocks of size k (we assume that k is a divisor of n; if it is not, n can be extended to be a multiple of k or the last block may contain fewer cells). The blocks are numbered from 0 to (n/k)-1. The cells 0, …, k-1 belong to block 0, the cells k, …, 2·k-1 belong to block 1, …, the cells (i-1)·k, …, (i·k)-1 belong to block i-1. Thus, cell j belongs to block (j div k) (integer division). For simplicity, we store for each block B the first and last cells of the block (left[B] and right[B]). Using this partitioning, we can support several update and query functions in O(k+n/k) time. By choosing k=sqrt(n), we have O(k+n/k)=O(sqrt(n)).

Queries consist of computing a function on the values of a range of cells [a,b] (range query) or of retrieving the value of a single cell (point query).
    RangeQuery(a, b): compute qFunc(v_a, v_{a+1}, …, v_b).
Analogously, we have point and range updates:
    RangeUpdate(u, a, b): v_i=uFunc(u, v_i), a≤i≤b.
The qFunc function must be binary and associative, i.e. qFunc(v_a,..,v_b)=qFunc(v_a, qFunc(v_{a+1},.., qFunc(v_{b-1}, v_b)..)) and qFunc(a, qFunc(b,c))=qFunc(qFunc(a,b), c). We must also have uFunc(x,y)=uFunc(y,x). Only values v_i with O(1) size are considered (numbers and tuples with a fixed number of elements). uFunc and qFunc must be able to handle uninitialized arguments.
If one of their arguments is uninitialized, they must simply return the other argument; this part will be intentionally left out of the functions' descriptions. The algorithmic framework consists of the functions from Table 1.

Table 1. Algorithmic Framework Functions
    Update Functions              Query Functions
    BPpointUpdate                 BPpointQuery
    BPrangeUpdate                 BPrangeQuery
    BPrangeUpdatePoints           BPrangeQueryPoints
    BPrangeUpdatePartialBlock     BPrangeQueryPartialBlock
    BPrangeUpdateFullBlock        BPrangeQueryFullBlock

In order to perform a range update, we will call the BPrangeUpdate function with the corresponding parameters (the update value u and the update interval [a,b]). This function splits the update interval into three zones: the first block B_a intersected by the interval (containing the cell a), the last block B_b intersected by the interval (containing the cell b) and all the blocks in between B_a and B_b (the inner blocks). The blocks B_a and B_b may not be fully contained inside the interval: they will be updated in O(k) time (partial update). All the inner blocks are fully contained inside [a,b]: they will be updated in O(1) time each (full update). Since there are O(n/k) such blocks, the overall complexity of a range update is O(k+n/k). The range query function (BPrangeQuery) works similarly.

For each block B we will maintain two values: uagg and qagg. uagg is the aggregate of the update parameters of the function calls which updated all the elements of B (for which B was an inner block). uagg is reset to an uninitialized value on each partial update of the block. qagg is the answer to the query function called on all the elements of B. The point update and query functions are: BPpointUpdate and BPpointQuery.
The framework also uses a "multiplication" operator mop, which computes the effects of an update operation upon the query result on a range of cells. This operator must exist when range queries and range updates are used together, but can be ignored otherwise. When the data structure is initialized, the uagg value of each block is set to uninitialized (qagg is initialized with the query result on the range of the block's cells). This framework is similar to the segment tree framework introduced in [6] and can support all the combinations of point and range query and update functions mentioned there.

BPpointUpdate(u, i):
    v_i=uFunc(u, v_i)
    B=the block to which the cell i belongs
    qagg[B]=BPrangeQueryPoints(left[B], right[B])

BPrangeUpdate(u, a, b):
    B_a, B_b=the blocks of cells a and b
    if (B_a=B_b) then
        if ((a=left[B_a]) and (b=right[B_a])) then BPrangeUpdateFullBlock(B_a, u)
        else BPrangeUpdatePartialBlock(B_a, u, a, b)
    else
        BPrangeUpdatePartialBlock(B_a, u, a, right[B_a])
        BPrangeUpdatePartialBlock(B_b, u, left[B_b], b)
        for block=B_a+1 to B_b-1 do
            BPrangeUpdateFullBlock(block, u)

BPrangeUpdatePoints(u, a, b):
    for p=a to b do v_p=uFunc(u, v_p)

BPrangeUpdatePartialBlock(B, u, a, b):
    BPrangeUpdatePoints(uagg[B], left[B], right[B])
    uagg[B]=uninitialized
    BPrangeUpdatePoints(u, a, b)
    qagg[B]=BPrangeQueryPoints(left[B], right[B])

BPrangeUpdateFullBlock(B, u):
    uagg[B]=uFunc(u, uagg[B])
    qagg[B]=uFunc(mop(u, left[B], right[B]), qagg[B])

BPpointQuery(i):
    B=the block to which the cell i belongs
    return uFunc(uagg[B], v_i)

BPrangeQuery(a, b):
    B_a, B_b=the blocks of cells a and b
    if (B_a=B_b) then return BPrangeQueryPartialBlock(B_a, a, b)
    else
        q_a=BPrangeQueryPartialBlock(B_a, a, right[B_a])
        q_b=BPrangeQueryPartialBlock(B_b, left[B_b], b)
        q=uninitialized
        for block=B_a+1 to B_b-1 do
            q=qFunc(q, BPrangeQueryFullBlock(block))
        return qFunc(q_a, qFunc(q, q_b))

BPrangeQueryPoints(a, b):
    q=uninitialized
    for p=a to b do q=qFunc(q, v_p)
    return q

BPrangeQueryPartialBlock(B, a, b):
    BPrangeUpdatePoints(uagg[B], left[B], right[B])
    uagg[B]=uninitialized
    return BPrangeQueryPoints(a, b)

BPrangeQueryFullBlock(B):
    return qagg[B]

In the case of point queries with range updates, only the uagg values are meaningful; similarly, only the qagg values are meaningful in the case of point updates with range queries. Common update and query functions can be easily integrated into the framework. For example, with uFunc(x,y)=(x+y), qFunc(x,y)=(x+y) and mop(u,a,b)=u·(b-a+1), we can support point and range sum queries, together with point and range addition updates. For uFunc(x,y)=x+y, qFunc(x,y)=min(x,y) and mop(u,a,b)=u, we can support point and range minimum (or maximum) queries, together with point and range addition updates. We can also consider point and range multiplication updates, uFunc(x,y)=x·y, with point and range queries: qFunc(x,y)=x·y (with mop(u,a,b)=u^(b-a+1)), qFunc(x,y)=min(x,y) and qFunc(x,y)=(x+y) (with mop(u,a,b)=u).

With mop(u,a,b)=u, we can support range queries and updates for some bit functions (where v_i=0 or 1). For uFunc(x,y)=(x or y) and uFunc(x,y)=(x and y), we can have qFunc(x,y)=(x and y) and qFunc(x,y)=(x or y). For the and update, we can also have qFunc(x,y)=(x xor y). We can support range xor updates and queries (uFunc(x,y)=qFunc(x,y)=(x xor y)), but with mop(u,a,b)=(if (((b-a+1) mod 2)=0) then 0 else u).
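The pseudocode of the framework translates almost line by line into Python. The class below is a sketch under a few assumptions of mine: the name `BlockPartition` and its method names are hypothetical, `None` stands for the uninitialized value, the block size defaults to the integer square root of n, and the last block is allowed to be shorter.

```python
import math


class BlockPartition:
    """Block partitioning with pluggable uFunc, qFunc and mop (sketch)."""

    def __init__(self, values, u_func, q_func, mop, block_size=None):
        self.v = list(values)
        n = len(self.v)
        self.k = block_size or max(1, math.isqrt(n))
        self.nb = (n + self.k - 1) // self.k
        self.u_raw, self.q_raw, self.mop = u_func, q_func, mop
        self.uagg = [None] * self.nb           # None = uninitialized
        self.qagg = [self._query_points(self._left(B), self._right(B))
                     for B in range(self.nb)]

    def _u(self, x, y):                        # uFunc, tolerant of None
        return y if x is None else x if y is None else self.u_raw(x, y)

    def _q(self, x, y):                        # qFunc, tolerant of None
        return y if x is None else x if y is None else self.q_raw(x, y)

    def _left(self, B):
        return B * self.k

    def _right(self, B):
        return min(len(self.v), (B + 1) * self.k) - 1

    def _update_points(self, u, a, b):
        if u is None:
            return
        for p in range(a, b + 1):
            self.v[p] = self._u(u, self.v[p])

    def _query_points(self, a, b):
        q = None
        for p in range(a, b + 1):
            q = self._q(q, self.v[p])
        return q

    def _push(self, B):
        # Apply the pending full-block updates of B to its cells.
        self._update_points(self.uagg[B], self._left(B), self._right(B))
        self.uagg[B] = None

    def _update_partial(self, B, u, a, b):
        self._push(B)
        self._update_points(u, a, b)
        self.qagg[B] = self._query_points(self._left(B), self._right(B))

    def _update_full(self, B, u):
        self.uagg[B] = self._u(u, self.uagg[B])
        effect = self.mop(u, self._left(B), self._right(B))
        self.qagg[B] = self._u(effect, self.qagg[B])

    def range_update(self, u, a, b):
        Ba, Bb = a // self.k, b // self.k
        if Ba == Bb:
            if a == self._left(Ba) and b == self._right(Ba):
                self._update_full(Ba, u)
            else:
                self._update_partial(Ba, u, a, b)
            return
        self._update_partial(Ba, u, a, self._right(Ba))
        self._update_partial(Bb, u, self._left(Bb), b)
        for B in range(Ba + 1, Bb):
            self._update_full(B, u)

    def _query_partial(self, B, a, b):
        self._push(B)
        return self._query_points(a, b)

    def range_query(self, a, b):
        Ba, Bb = a // self.k, b // self.k
        if Ba == Bb:
            return self._query_partial(Ba, a, b)
        qa = self._query_partial(Ba, a, self._right(Ba))
        qb = self._query_partial(Bb, self._left(Bb), b)
        q = None
        for B in range(Ba + 1, Bb):
            q = self._q(q, self.qagg[B])
        return self._q(qa, self._q(q, qb))

    def point_query(self, i):
        return self._u(self.uagg[i // self.k], self.v[i])
```

For range addition with range sum, for instance, the structure would be created with addition as both `u_func` and `q_func` and with `mop = lambda u, a, b: u * (b - a + 1)`; passing `min` as `q_func` and `mop = lambda u, a, b: u` gives range addition with range minimum.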
In order to obtain any combination of bit functions, we notice that the result of a query depends only on the number of 0 and 1 values (cnt_0, cnt_1) in the query range: if (cnt_1>0) then or returns 1; if (cnt_1 mod 2=1) then xor returns 1; if (cnt_0=0) then and returns 1. Thus, we will work with (cnt_0, cnt_1) tuples as values. We will also consider the conceptual values cv_i, which are the numerical values we conceptually work with. We have v_i=(1-cv_i, cv_i). A query asks for the number of 0 and 1 conceptual values in the query range and an update changes this number according to the bit function used. Any combination of point and range queries and updates is supported with the functions below:

bitTupleQuery((cnt_{0,x}, cnt_{1,x}), (cnt_{0,y}, cnt_{1,y})):
    return (cnt_{0,x}+cnt_{0,y}, cnt_{1,x}+cnt_{1,y})

bitTupleUpdate((1-u, u), (cnt_0, cnt_1), func):
    if (func=and) and (u=0) then return (cnt_0+cnt_1, 0)
    else if (func=or) and (u=1) then return (0, cnt_0+cnt_1)
    else if (func=xor) and (u=1) then return (cnt_1, cnt_0)
    else return (cnt_0, cnt_1)

If the update function has the effect of setting all the values in a range to the same value (range set), we will again need to work with tuples: the values v_i and the update parameters u will have the form (numerical value, time_stamp). We need to have a timestamp() function which returns increasing values upon successive calls. We can use a global counter as a time stamp, which is incremented at every call. The initial numerical values are assigned an initial time stamp and every update parameter gets a more recent time stamp. The update function is:

uFunc((w_x, t_x), (w_y, t_y)):
    if (t_x>t_y) then return (w_x, t_x)
    else return (w_y, t_y)

With these definitions, a point query function call on a position i will return the last update parameter of an interval containing that position.
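The tuple-based functions above are short enough to sketch directly in Python. The names `bit_tuple_query`, `bit_tuple_update`, `timestamp` and `set_update` are my own, the bit update parameter is passed as a plain bit `u` instead of the tuple (1-u, u), and the global counter is implemented with `itertools.count`, which is only one possible choice.

```python
import itertools


def bit_tuple_query(x, y):
    """qFunc on (cnt0, cnt1) tuples: add the counts component-wise."""
    return (x[0] + y[0], x[1] + y[1])


def bit_tuple_update(u, counts, func):
    """Apply the bit function func with parameter u to a range summary."""
    cnt0, cnt1 = counts
    if func == "and" and u == 0:
        return (cnt0 + cnt1, 0)      # every value becomes 0
    if func == "or" and u == 1:
        return (0, cnt0 + cnt1)      # every value becomes 1
    if func == "xor" and u == 1:
        return (cnt1, cnt0)          # zeros and ones swap
    return (cnt0, cnt1)              # the update changes nothing


_clock = itertools.count(1)          # global counter used as time stamp


def timestamp():
    """Return a strictly increasing value on each call."""
    return next(_clock)


def set_update(x, y):
    """uFunc for range set on (value, time_stamp) tuples: newest wins."""
    return x if x[1] > y[1] else y
```

A range set of the value w would then pass `(w, timestamp())` as the update parameter, so that the most recent assignment wins regardless of the order in which the aggregates are combined.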
A useful range query function (used together with point updates) is finding the maximum sum segment (interval of consecutive cells) fully contained in a range of cells [a,b] (see [9] for this problem without updates). Conceptually, the value of a cell i is a number cv_i, but in the framework we will use tuples consisting of 4 values: (totalsum, maxlsum, maxrsum, maxsum). Assuming that these values correspond to an interval of cells [c,d], we have the following definitions:

\[
totalsum = \sum_{p=c}^{d} cv_p
\]
\[
maxlsum = \max_{c-1 \le q \le d} \sum_{p=c}^{q} cv_p
\]
\[
maxrsum = \max_{c \le q \le d+1} \sum_{p=q}^{d} cv_p
\]
\[
maxsum = \max_{c \le q \le d,\; q-1 \le r \le d} \sum_{p=q}^{r} cv_p
\]

In the framework, a value v_i will be a tuple corresponding to the interval [i,i]. If cv_i<0, then v_i=(cv_i, 0, 0, 0); otherwise, v_i=(cv_i, cv_i, cv_i, cv_i). The point update function changes the value of cv_i of a cell i and then recomputes v_i. The qFunc function is given below:

qFunc((t_x, ml_x, mr_x, m_x), (t_y, ml_y, mr_y, m_y)):
    return (t_x+t_y, max{ml_x, t_x+ml_y}, max{mr_y, t_y+mr_x}, max{m_x, m_y, mr_x+ml_y})

We can use the range set update together with the range maximum sum segment query; this combination is not supported by the framework in [6]. Conceptually, each cell has a numerical value cv_i. Practically, the framework's values v_i will be tuples of the following form: (totalsum, maxlsum, maxrsum, maxsum, time_stamp). The update, query and multiplication functions are given below. We must notice that the fundamental combination (range set update, range sum query) is also solved. However, I could not find suitable function definitions for the combination (range addition update, range maximum sum segment query).
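As a quick check of the four-value tuples (totalsum, maxlsum, maxrsum, maxsum) and of the associative qFunc defined for them, the combination rule can be sketched in Python. The helper names `leaf`, `q_func` and `max_sum_segment` are hypothetical, and folding the whole array with `functools.reduce` is used here only to illustrate the rule, not as a replacement for the block structure.

```python
from functools import reduce


def leaf(cv):
    """Tuple (totalsum, maxlsum, maxrsum, maxsum) for a single cell."""
    return (cv, cv, cv, cv) if cv >= 0 else (cv, 0, 0, 0)


def q_func(x, y):
    """Combine the tuples of two adjacent intervals (x to the left of y)."""
    tx, mlx, mrx, mx = x
    ty, mly, mry, my = y
    return (tx + ty,
            max(mlx, tx + mly),        # best prefix of the union
            max(mry, ty + mrx),        # best suffix of the union
            max(mx, my, mrx + mly))    # best segment, possibly crossing


def max_sum_segment(values):
    """Maximum sum of a (possibly empty) segment of consecutive values."""
    return reduce(q_func, map(leaf, values))[3]
```

Because empty segments are allowed by the definitions, an interval containing only negative values yields a maximum sum of 0.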
uFunc((total_x, ml_x, mr_x, m_x, t_x), (total_y, ml_y, mr_y, m_y, t_y)):
    if (t_x>t_y) then return (total_x, ml_x, mr_x, m_x, t_x)
    else return (total_y, ml_y, mr_y, m_y, t_y)

qFunc((total_x, ml_x, mr_x, m_x, t_x), (total_y, ml_y, mr_y, m_y, t_y)):
    return (total_x+total_y, max{ml_x, total_x+ml_y}, max{mr_y, total_y+mr_x}, max{m_x, m_y, mr_x+ml_y}, max{t_x, t_y})

mop((total_x, ml_x, mr_x, m_x, t_x), a, b):
    return ((b-a+1)·total_x, (b-a+1)·ml_x, (b-a+1)·mr_x, (b-a+1)·m_x, t_x)

The framework's behaviour can be improved by adding a dirty flag to each block. With the dirty flag, the qagg value will be recomputed only "on demand" and not after every point or partial block update. We only need to replace the functions BPpointUpdate, BPrangeUpdatePartialBlock and BPrangeQueryFullBlock with the following definitions:

BPpointUpdate(u, i):
    v_i=uFunc(u, v_i)
    B=the block to which the cell i belongs
    dirty[B]=true

BPrangeUpdatePartialBlock(B, u, a, b):
    BPrangeUpdatePoints(u, a, b)
    dirty[B]=true

BPrangeQueryFullBlock(B):
    if (dirty[B]) then
        BPrangeUpdatePoints(uagg[B], left[B], right[B])
        uagg[B]=uninitialized
        qagg[B]=BPrangeQueryPoints(left[B], right[B])
        dirty[B]=false
    return qagg[B]

5. RELATED WORK
Optimal high multiplicity scheduling algorithms for file transfers with divisible sizes, with the objective of minimizing the makespan, were presented in [2]. Related bin packing, knapsack and multiple knapsack problems were studied in [3,4,5]. Although the single knapsack problem with divisible item sizes was solved in [5], the corresponding multiple knapsack version does not seem to have been addressed so far.
The algorithmic framework for the block partitioning technique is based on a similar framework for the segment tree data structure, presented in [6]. The block partitioning technique has been used in order to enhance the performance of range queries and updates in many domains, particularly in dynamic OLAP data cubes [7,8].

6. CONCLUSIONS
In this paper I presented two efficient algorithms for two offline data transfer scheduling problems. The first one is equivalent to the multiple knapsack problem with divisible item sizes, for which I am unaware of any previous results. The second one is a minimum cost optimization problem, for which the proposed dynamic programming algorithm is optimal. For the online case I proposed an algorithmic framework for the block partitioning technique. The framework makes it possible to efficiently handle pairs of query and update operations which are useful in several classes of real-time resource managers and bandwidth brokers.

REFERENCES
[1] K. Pruhs, J. Sgall, and E. Torng, Online Scheduling, CRC Press, 2004.
[2] M. I. Andreica, and N. Tapus, "High Multiplicity Scheduling of File Transfers with Divisible Sizes", Proceedings of the International Symposium of Consumer Electronics, 2008.
[3] E. G. Coffman, Jr., M. R. Garey, and D. S. Johnson, "Bin packing with divisible item sizes", Journal of Complexity, pp. 406-428, 1987.
[4] C. Chekuri, and S. Khanna, "A PTAS for the Multiple Knapsack Problem", Proceedings of the 11th ACM-SIAM Symposium on Discrete Algorithms, pp. 213-222, 2000.
[5] W. F. J. Verhaegh, and E. H. L. Aarts, "A Polynomial-Time Algorithm for Knapsack with Divisible Item Sizes", Inf. Process. Lett., vol. 62(4), pp. 217-221, 1997.
[6] M. I. Andreica, and N. Tapus, "Optimal TCP Sender Buffer Management Strategy", Proceedings of the International Conference on Communication Theory, Reliability and Quality of Service, 2008.
[7] H.-G. Li, T. W. Ling, S. Y. Lee, and Z. X. Loh, "Range Sum Queries in Dynamic OLAP Data Cubes", Proceedings of the 3rd International Symposium on Cooperative Database Systems for Advanced Applications, pp. 74-81, 2001.
[8] C. K. Poon, "Dynamic Orthogonal Range Queries in OLAP", Theoretical Computer Science, vol. 296 (3), 2003.
[9] K.-Y. Chen, and K.-M. Chao, "On the range maximum-sum segment query problem", Discrete Applied Mathematics, vol. 155, pp. 2043-2052, Elsevier, 2007.