Geographic Trough Filling for Internet Datacenters

Dan Xu and Xin Liu
Computer Science Department, University of California, Davis
{danxu, xinliu}@ucdavis.edu

Abstract. To reduce datacenter energy consumption and cost, current practice has considered demand-proportional resource provisioning schemes, where servers are turned on or off according to the load of requests. Most existing work considers only instantaneous (Internet) requests, which are explicitly or implicitly assumed to be delay-sensitive. In datacenters, however, there is a vast amount of delay-tolerant jobs, such as background and maintenance jobs. In this paper, we explicitly differentiate delay-sensitive jobs from delay-tolerant jobs. We focus on the problem of using delay-tolerant jobs to fill the extra capacity of datacenters, referred to as trough/valley filling. Because they give a higher priority to delay-sensitive jobs, our schemes complement most existing demand-proportional resource provisioning schemes. Our goal is to design intelligent trough filling mechanisms that are energy efficient and also achieve good delay performance. Specifically, we propose two joint dynamic speed scaling and traffic shifting schemes, one subgradient-based and the other queue-based. Our schemes assume little statistical information about the system, which is usually difficult to obtain in practice. In both schemes, energy cost savings come from dynamic speed scaling, statistical multiplexing, electricity price diversity, and service efficiency diversity. In addition, the queue-based scheme achieves good delay performance via load shifting and capacity allocation based on queue conditions. We consider practical issues that may arise in datacenter networks, including capacity and bandwidth constraints, service agility constraints, and load shifting cost. We use both artificial and real datacenter traces to evaluate the proposed schemes.

I. INTRODUCTION

The fast proliferation of cloud computing has promoted rapid growth of large-scale commercial datacenters. Major service providers often deploy tens to hundreds of datacenters distributed nationwide or even worldwide, referred to as Internet-scale datacenters (IDCs). Because the electricity bill contributes a large portion of IDC operational expenditure, there have been many efforts towards reducing IDC energy consumption and cost.

Researchers have considered designing "load-aware" IDCs, e.g., in [1][2][4]. The key idea is to provision servers according to the load of Internet requests. Extra servers are shut down or put into sleep mode to save energy. In this paradigm, a major challenge is to properly size an IDC, i.e., to determine the number of active servers while guaranteeing the service requirement. For example, in [2], the authors propose to predict the load of Windows Live Messenger and provision servers accordingly. In [4], the authors estimate the current load and design online server provisioning schemes that reduce energy and server state transition cost, which is referred to as dynamic "right sizing".

In the above-mentioned work, service requests are typically delay-sensitive, i.e., they require a short delay and a low drop rate. Such applications include searching or signing in to a messenger. When the load is lower, more servers would be turned off to save energy. However, in practice, an IDC operator may be reluctant to turn off servers on a large scale even at a low load of requests. One reason is that turning servers on and off frequently affects QoS and long-term system reliability, as considered in [1]. But the foremost reason is that there is also a large amount of background or maintenance jobs to process in IDCs, e.g., a search engine tuning its ranking algorithms.
Thus, the "extra" capacity can be utilized to process background analytical jobs. This is referred to as trough/valley filling. Trough filling has not been studied thoroughly. In this paper, we focus on intelligent trough filling. We assume a given capacity provisioning and scheduling mechanism for delay-sensitive jobs (DSJs), e.g., those proposed in [1][2][4][15][34][38]. We decide how to use load shifting and dynamic speed scaling to control delay-tolerant jobs (DTJs), e.g., background analytical jobs. On one hand, the DTJ load is high, and thus its energy cost is considerable. On the other hand, it is desirable to ensure good delay performance for DTJs. The goal of intelligent trough filling is thus to achieve energy efficiency as well as good delay performance (or at least queue stability) for DTJs.

Intelligent trough filling needs to accommodate the following issues. First, the overall capacity of a datacenter is likely to be random, e.g., due to server failures. Second, the capacity demand of DSJs, such as Internet requests, varies due to dynamic load. Given the higher priority of DSJs, the capacity available for DTJs is random and hard to predict or learn statistically. Meanwhile, the demand of DTJs is also likely to be dynamic.

Further, when a set of geographically distributed IDCs is considered, there are additional constraints. First, load shifting is constrained by the bandwidth available between IDCs. In our setting, similar to capacity, bandwidth is prioritized for shifting DSJs, which results in a random "residual bandwidth" for DTJs. Second, the diversity and dynamics of electricity prices bring challenges as well as opportunities to trough filling, as in price-aware load shifting [32]-[39]. Third, due to heterogeneous service agility, different classes of DTJs may require different sets of IDCs.
Moreover, different IDCs may be heterogeneous in service rates and energy consumption for each type of DTJ. We consider these issues and address the above challenges in this paper.

In this paper, our goal is to design intelligent trough filling mechanisms that achieve both energy efficiency and good delay performance. We design joint dynamic speed scaling and load shifting schemes. Specifically, we make the following contributions:
• We focus on trough filling in distributed IDCs, which complements the current work on load-aware capacity provisioning and price-aware load shifting.
• We consider practical issues in IDCs, such as dynamic capacity and bandwidth constraints, dynamic demand, and heterogeneous service agility and service rates.
• We first propose a stochastic subgradient-based trough filling scheme, named SSTF, with the objective of minimizing energy and shifting cost while stabilizing the DTJ queues. The proposed algorithm does not need the underlying probability distribution of system states, which is usually difficult to estimate.
• We further propose a queue-based trough filling algorithm, called QTF, which does not need any statistical system information. We show that QTF achieves desirable performance in terms of cost and queueing delay.
• We discuss how to incorporate capacity provisioning and QoS assurance for DSJs into the proposed SSTF and QTF.
• We use both a synthetic traffic trace and a real datacenter traffic trace to evaluate our proposed schemes. Simulation results show that QTF significantly outperforms SSTF in both cost and queueing delay.

The rest of the paper is organized as follows. In Section II, we survey related work. In Section III, we describe the system model. In Section IV, we present a benchmark scheme. In Section V, we present the stochastic subgradient-based trough filling scheme. We further propose a queue-based trough filling scheme in Section VI.
We also discuss how to extend the schemes to DSJs, as well as implementation issues, in Section VII. We evaluate our proposed schemes in Section VIII and conclude in Section IX.

II. RELATED WORK

Industry and the academic research community have paid much attention to reducing datacenter energy consumption and cost. Solutions span the entire spectrum, including power-efficient chips, cooling systems, deployment, and many others. Our work complements load-aware server provisioning or power-proportional design [1]-[7]. Such works focus on server or resource provisioning based on the load of Internet requests, with a service level agreement (SLA) or other QoS metrics assured. For example, in [1], the authors propose server provisioning and dynamic speed/voltage scaling schemes for a datacenter through load prediction and feedback control. Load prediction-based server provisioning and load dispatching are proposed in [2] for a connection-intensive Microsoft datacenter. Online resource or server provisioning schemes are designed in [3][4]. In [4], the authors consider a relatively large time interval such that the current load of requests can be estimated. Server state transition cost is also considered. Furthermore, the authors evaluate the impact of trough filling on the energy saving of the proposed scheme through simulations. Queue-based server provisioning, with performance guarantees established by Lyapunov optimization, is proposed in [6]. Although we also use the Lyapunov optimization technique to analyze the performance of our queue-based scheme, our problem is different: we consider trough filling, with cross-datacenter load shifting and capacity provisioning. In [7], the authors propose an economic framework that maximizes the total profit of resource provisioning for all requests. Many other power management schemes for a datacenter have been proposed, e.g., in [8]-[30].
Dynamic speed/voltage scaling (DVS) reduces the power consumption of a processor by adjusting its frequency based on the instantaneous load demand, e.g., in [8]-[16]; it can also be considered a form of load-aware resource provisioning. However, most of this work only considers a single processor. In [15], the authors use a Markov decision process (MDP) to find the optimal stationary DVS and load balancing policy that reduces service cost. In this paper, we use DVS as part of the control mechanism for trough filling in IDCs. Another popular approach is virtualization and server consolidation, e.g., in [20]-[25], which can reduce traffic dynamics by consolidating applications, and thus reduce the number of active servers. There are also other works on datacenter-level power management, such as workload decomposition [26], optimal power allocation for servers under a total power budget [27], hierarchical power control based on model predictive control (MPC) theory [28], and other techniques [29][30][31].

Most recently, cross-IDC power and cost optimization that exploits geographic diversity has received significant attention, e.g., in [32]-[39]. The key idea is to shift requests to IDCs with lower electricity prices to reduce cost. The tradeoff is the extra delay caused by traffic shifting. Thus, in [34][38], the authors consider response time as a constraint. In [37][39], the authors model shifting cost as the revenue loss incurred by extra delay. Our work can also leverage price diversity, i.e., by filling the cheap troughs of IDCs. The difference is that, since background jobs are delay-tolerant, our capacity provisioning and load shifting schemes also exploit temporal price diversity, in addition to geographic diversity. In a recent work [51], the authors use energy storage systems to leverage temporal price dynamics to cut the energy cost, but only for a single datacenter.
We refer readers to [43] for a survey and to [44] for a discussion of challenges and issues in IDC power management.

III. SYSTEM MODELS

A. The IDC and server model

We consider one service provider with a set of $N$ IDCs in different locations. An IDC $i$ has $K_i^{max}$ homogeneous servers. We consider a time-slotted system, where the slot length can range from hundreds of milliseconds to minutes. We assume that in each slot $t$, the number of active servers of IDC $i$ is fixed and is denoted by $K_i^t$. Note that $K_i^t$ varies over time, due either to dynamic service provisioning (e.g., the schemes proposed in [2][4][34]) or to server failures. An active server operates at a CPU speed $s$. Following the models in [11][12][36], we normalize $s$, i.e., $0 \le s \le 1$, where 0 represents the idle state of an active server and 1 represents the maximum frequency. We define the capacity of an IDC $i$ as the sum of the speeds of all its active servers. If each server runs at the same speed $s$, the total capacity in slot $t$ is $K_i^t s$. Clearly, the maximum capacity with $K_i^t$ servers is $K_i^t$. In this paper, we consider the CPU as the main bottleneck resource and focus on CPU capacity scheduling. The impact of other equipment, e.g., memory and I/O, is captured by heterogeneous service rates, as discussed in Subsection III-C. Because scaling the speed $s$ of an active server up or down takes only several microseconds [12][18], which is negligible, dynamic speed scaling can be conducted instantaneously in each time slot.

B. Workload model

We consider two categories of demand: delay-sensitive jobs (DSJs), e.g., search, email login, or messenger sign-in, and delay-tolerant jobs (DTJs), e.g., background analytical jobs. DSJs enjoy a higher priority in capacity allocation. The remaining capacity can be utilized by the DTJs.
Since the load of DSJs is usually dynamic, the capacity demand of DSJs in an IDC $i$ in each slot is considered random. We use $S_{i0}^t$ to denote the capacity allocated to DSJs at IDC $i$ in slot $t$. We assume $S_{i0}^t$ is given, based on some existing load-aware capacity provisioning scheme. The capacity available for DTJs in IDC $i$ is thus $K_i^t - S_{i0}^t$.

DTJs can be further divided into different classes to capture their different resource requirements. We consider $M$ classes of DTJs in total across the $N$ IDCs. If the same kind of DTJ, e.g., tuning webpage ranking algorithms, originates (first arrives) at different IDCs, we treat the instances as different classes, because they may have different sets of IDCs to which they can be shifted due to distance constraints. Each DTJ $j$ originates at a single IDC $i$. Let $D_j^t$ denote the traffic or load size of DTJ $j$ in time slot $t$. $D_j^t$ is a random variable; we make no assumptions on its distribution.

C. Models for load shifting and service

Although a DTJ $j$ originates at an IDC $i$, we can shift its traffic to other IDCs, e.g., to exploit their available capacity or lower electricity prices. Note that cross-IDC load shifting is practically feasible due to the negligible shifting delay [36], and it has been widely considered, e.g., in [32]-[42].

Load shifting has practical constraints. First, due to the limited service agility of IDCs, a DTJ class $j$ can potentially be served by only a subset of IDCs. Let $\Gamma_j$ denote the set of IDCs that can serve DTJ $j$; this set differs across DTJ classes. DTJ $j$ can only be shifted to an IDC $i'$ with $i' \in \Gamma_j$. Second, the bandwidth between IDCs is limited. Moreover, because load shifting for DSJs also has a higher priority in bandwidth provisioning, the bandwidth available for DTJs is limited and dynamic.
This consideration is similar to that in a very recent work [41], where the authors develop a system that uses unutilized network bandwidth to shift non-real-time bulk data, e.g., backup data. We use $B_{ii'}^t$ to denote the bandwidth available for DTJs from IDC $i$ to $i'$ in slot $t$. $B_{ii'}^t$ varies over time and can be set to an appropriate value to prevent significant network delay.

TABLE I: Main Notations
$K_i^t$: number of active servers of IDC $i$ in slot $t$ ($K_i^\omega$ for state $\omega$)
$D_j^t$: traffic arrival of DTJ $j$ in slot $t$
$B_{ii'}^t$: bandwidth constraint for DTJs between IDCs $i$ and $i'$ in slot $t$
$\Upsilon_{ii'}$: set of DTJ types shifted from IDC $i$ to $i'$
$\Gamma_j$: set of IDCs that can serve DTJ $j$
$\Pi_i$: set of DTJ types served by IDC $i$
$S_{ij}^t$: capacity/speed allocated by IDC $i$ ($i \in \Gamma_j$) to DTJ $j$ in slot $t$
$S_{i0}^t$: capacity/speed allocated by IDC $i$ to DSJs in slot $t$ (given variable)
$\mathbf{S}^t$: capacity/speed matrix in slot $t$ ($\mathbf{S}^\omega$ for state $\omega$)
$r_{ij}$: unit service rate of IDC $i$ for DTJ $j$
$P_i^t$: power consumption of IDC $i$ in slot $t$
$\alpha_i^t$: electricity price of IDC $i$ in slot $t$ ($\alpha_i^\omega$ for state $\omega$)
$\phi_{ii'}^t$: load shifting cost between IDCs $i$ and $i'$ in slot $t$
$g^t(\cdot)$: total cost function of $\mathbf{S}^t$ in slot $t$ ($g_\omega(\cdot)$ for state $\omega$)
$\pi_\omega$: distribution of system state $\omega$ (unknown to SSTF)
DTJ (DSJ): delay-tolerant (delay-sensitive) jobs

When two IDCs have limited connections or are so far apart that load shifting is not desirable, $B_{ii'}^t$ can be set to 0 for all time slots. Let $D_{jii'}^t$ denote the traffic of DTJ $j$ shifted from IDC $i$ to $i'$. Further, let $\Upsilon_{ii'}$ denote the set of DTJs that first arrive at IDC $i$ and can be served by IDC $i'$. The load shifting constraint is
$$\sum_{j \in \Upsilon_{ii'}} D_{jii'}^t \le B_{ii'}^t.$$
An IDC $i \in \Gamma_j$ allocates a certain capacity to DTJ $j$ in time slot $t$, denoted by $S_{ij}^t$. We call $\mathbf{S}^t = \{S_{ij}^t \mid j = 1, \ldots, M,\ i \in \Gamma_j\}$ the capacity allocation matrix, which is our control variable.
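As a minimal illustration of the load shifting constraint, the following Python sketch checks $\sum_{j \in \Upsilon_{ii'}} D_{jii'}^t \le B_{ii'}^t$ on every link. The data layout (a dictionary keyed by the IDC pair) and the function name `bandwidth_feasible` are hypothetical choices for this sketch; a link with no bandwidth entry is treated as $B_{ii'}^t = 0$, mirroring the convention for IDC pairs between which shifting is undesirable.

```python
from typing import Dict, Tuple

Link = Tuple[int, int]  # (origin IDC i, destination IDC i')


def bandwidth_feasible(shifted: Dict[Link, Dict[int, float]],
                       bandwidth: Dict[Link, float],
                       tol: float = 1e-12) -> bool:
    """Return True if sum_{j in Upsilon_ii'} D_jii' <= B_ii' holds on every link.

    shifted[(i, i2)][j] is the traffic of DTJ j shifted from IDC i to IDC i2;
    a link absent from `bandwidth` is treated as having B = 0.
    """
    for link, per_class in shifted.items():
        total = sum(per_class.values())  # total DTJ traffic on this link
        if total > bandwidth.get(link, 0.0) + tol:
            return False
    return True


demand = {(0, 1): {0: 2.0, 1: 1.0}}  # two DTJ classes shifted from IDC 0 to IDC 1
print(bandwidth_feasible(demand, {(0, 1): 3.0}))  # True
print(bandwidth_feasible(demand, {(0, 1): 2.5}))  # False
```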
An IDC $i$ may serve multiple DTJs. Let $\Pi_i$ denote the set of all DTJs served by IDC $i$. The capacity allocation constraint is then
$$\sum_{j \in \Pi_i} S_{ij}^t \le K_i^t - S_{i0}^t.$$
With capacity $S_{ij}^t$, DTJ $j$ receives a certain service rate. We use $R_{ij}(S_{ij}^t)$ to denote the service rate as a function of the capacity. For simplicity, we take $R_{ij}(\cdot)$ to be linear in $S_{ij}^t$, i.e., $R_{ij}(S_{ij}^t) = r_{ij} S_{ij}^t$. The unit service rate $r_{ij}$ is heterogeneous across pairs of DTJ $j$ and IDC $i$, because different DTJs may require different memory, I/O resources, etc.

Load shifting and dynamic speed scaling are coupled. The amount of traffic of DTJ $j$ shifted from IDC $i$ to $i'$ is limited by the capacity allocated at IDC $i'$, so $D_{jii'}^t \le r_{i'j} S_{i'j}^t$. Since both the energy cost and the load shifting cost increase with $S_{i'j}^t$, allocating capacity that is not used is never beneficial, and we set $D_{jii'}^t = r_{i'j} S_{i'j}^t$.

The unfinished jobs of a DTJ $j$ are buffered in a queue at the IDC where DTJ $j$ originates. Let $Q_j(t)$ denote the queue backlog at time $t$. The queue dynamics of DTJ $j$ can be written as
$$Q_j(t+1) = \max\Big\{ Q_j(t) - \sum_{i \in \Gamma_j} r_{ij} S_{ij}^t,\ 0 \Big\} + D_j^t, \tag{1}$$
where $\sum_{i \in \Gamma_j} r_{ij} S_{ij}^t$ is the total service rate DTJ $j$ receives in slot $t$.

D. Power consumption and cost model

According to [11][12], the power consumption of a server (processor) running at speed $s \in [0, 1]$ is
$$P(s) = \rho s^\nu + 1 - \rho, \tag{2}$$
where the exponent $\nu \ge 1$, with a typical value of 2 [12], and $1 - \rho$ represents the power consumption in the idle state, which is around 0.6 and hardly lower than 0.5 [2]. In this paper, we choose $\nu = 2$, as in [12]. Note that our schemes can be extended to other values of $\nu$. Consider an IDC $i$. In a time slot $t$, there are $K_i^t$ active servers, and the total capacity demand is $S_i^t$.
It can be shown, by the convexity of $P(s)$, that the most energy-efficient operation is to let the servers share the demand evenly, i.e., each server runs at speed $S_i^t / K_i^t$. This results in a total power consumption in slot $t$ of
$$P_i^t = (1 - \rho) K_i^t + \rho \frac{(S_i^t)^2}{K_i^t}, \tag{3}$$
where $S_i^t = S_{i0}^t + \sum_{j \in \Pi_i} S_{ij}^t$. Because we focus on trough filling, we take $K_i^t$ and $S_{i0}^t$ as given constants in each time slot and control only $S_{ij}^t$. Note that $P_i^t$ is a convex function of $S_{ij}^t$.

Besides servers, other components in an IDC, e.g., memory, I/O, hard disks, and non-IT equipment such as cooling systems, also contribute to the total power consumption, which is roughly proportional to that of the servers [46]. Thus the total power consumption of an IDC can be obtained by scaling up $P_i^t$ by a constant factor. For notational brevity, we absorb this constant factor into the electricity price at IDC $i$.

Electricity prices exhibit significant diversity in both location and time. We use $\alpha_i^t$ to denote the price at IDC $i$ in time slot $t$. Although $\alpha_i^t$ is time-varying, it varies slowly. Typically, in a wholesale market, $\alpha_i^t$ is either determined day-ahead by the Regional Transmission Organization (RTO) based on expected load and changes hourly, or determined in real time (every 15 minutes) based on the actual load. We model the energy cost of an IDC as the product of its power consumption and its electricity price.

E. Load shifting cost

We also consider load shifting cost. In practice, datacenter operators may lease capacity from ISPs for data traffic among IDCs. Some large operators, such as Google and Microsoft, may even have their own backbone networks to interconnect their IDCs. In either case, the shifting cost is usually incurred during the acquisition or construction phase and depends little on the traffic volume that the internal links carry [45].
However, since DTJs have a lower priority, it is desirable to allocate only limited link bandwidth to them. For example, when the time slot is relatively long, a higher utilization of link capacity by DTJs makes the system more sensitive to bursts of DSJs, which enjoy a higher priority in load shifting. To counter this increasing sensitivity to DSJ bursts, we use a piecewise linear cost function with increasing slopes to model the shifting cost for DTJs. Let $\phi_{ii'}^t$ denote the shifting cost in slot $t$ between IDCs $i$ and $i'$. We have
$$\phi_{ii'}^t = \max_{\vartheta \in \{1, 2, \ldots, \theta\}} \left( a_{ii'}^{\vartheta} \frac{\sum_{j \in \Upsilon_{ii'}} D_{jii'}^t}{B_{ii'}^t} + b_{ii'}^{\vartheta} \right), \tag{4}$$
where $\sum_{j \in \Upsilon_{ii'}} D_{jii'}^t / B_{ii'}^t$ is the ratio of link capacity occupied by DTJs. We have $a_{ii'}^1 \le \cdots \le a_{ii'}^{\theta}$, which captures the increasing sensitivity to the capacity occupation ratio of DTJs. $\phi_{ii'}^t$ is a convex function of $D_{jii'}^t$, since it is the pointwise maximum of a set of affine functions, and hence also a convex function of $\mathbf{S}^t$, since $D_{jii'}^t$ is linear in $\mathbf{S}^t$. This model is also widely used in previous works, e.g., [47]. Note that our work can incorporate other shifting cost models with minor modifications.

IV. A BENCHMARK SCHEME

In this section, we first consider a benchmark scheme whose goal is to minimize the time average of the total cost of the $N$ IDCs, including energy cost and shifting cost, while stabilizing the $M$ DTJ queues. We name it stability-assured cost-optimal trough filling (SCOTF). In each time slot, both the energy cost and the shifting cost are functions of $\mathbf{S}^t$. The overall cost in each slot also depends on $K_i^t$, $\alpha_i^t$, and $S_{i0}^t$, $i = 1, \ldots, N$. Thus the overall cost is a time-varying function of $\mathbf{S}^t$, denoted by $g^t(\mathbf{S}^t)$. Besides, the capacity allocation and shifting constraints, which depend on $K_i^t$, $S_{i0}^t$, and $B_{ii'}^t$, are also time-varying. Thus $\mathbf{S}^t$ takes values in a time-varying set.
Let $\Lambda^t$ denote the set of $\mathbf{S}^t$ that satisfies the capacity allocation and shifting constraints in slot $t$. SCOTF is formulated as
$$\min_{\{\mathbf{S}^t\}} \ \liminf_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} g^t(\mathbf{S}^t)$$
$$\text{s.t.} \ \limsup_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} Q_j(t) < \infty, \tag{5}$$
$$\mathbf{S}^t \in \Lambda^t, \quad j = 1, \ldots, M, \tag{6}$$
where the first constraint guarantees the stability of each DTJ queue. Note that we use lim sup (lim inf) to ensure that the limits are well defined. The problem in (5)-(6) is difficult to solve directly in practice, because it requires prior system information for all time slots. We present SCOTF here as a cost benchmark. Our proposed schemes, one stochastic subgradient-based and one queue-based, require little statistical information about the system and are thus more practical. The objectives of the proposed schemes are not limited to guaranteeing DTJ queue stability as in SCOTF; good delay performance is also desired, especially for the queue-based scheme.

V. STOCHASTIC SUBGRADIENT BASED TROUGH FILLING

We first consider an ergodic scenario where the system state has a steady-state distribution. Here a state is a unique set of values of all variables involved in the system, including $K_i^t$, $\alpha_i^t$, $S_{i0}^t$, and $B_{ii'}^t$, $i, i' \in \{1, \ldots, N\}$. Let $\Omega$ denote the set of system states, $\omega \in \Omega$ a generic system state, $\pi_\omega$ the steady-state probability of $\omega$, and $g_\omega(\cdot)$ the cost function in state $\omega$. Let $\mathbf{S}^\omega$ denote the capacity allocation matrix in state $\omega$, which lies in the set $\Lambda^\omega$, and let $\vec{R}^\omega(\mathbf{S}^\omega)$ denote the vector of DTJ service rates, whose $j$-th entry is $\sum_{i \in \Gamma_j} r_{ij} S_{ij}^\omega$. Let $\vec{\lambda}$ denote the vector of mean arrival rates of DTJs. SCOTF can be rewritten as
$$\min \ g_e = \sum_{\omega \in \Omega} \pi_\omega g_\omega(\mathbf{S}^\omega) \quad \text{s.t.} \ \sum_{\omega \in \Omega} \pi_\omega \vec{R}^\omega(\mathbf{S}^\omega) \ge \vec{\lambda}, \quad \mathbf{S}^\omega \in \Lambda^\omega. \tag{7}$$
We use $g_e^*$ to denote the optimal value of the above problem, i.e., the optimal cost in the ergodic case with arrival rate $\vec{\lambda}$ stabilized. In practice, $\vec{\lambda}$ can be estimated from historical data or by prediction schemes.
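To make the objective of (7) concrete, the sketch below evaluates (it does not optimize) the per-state cost $g_\omega(\mathbf{S}^\omega)$, i.e., the energy cost from (3) plus the shifting cost from (4), and then the ergodic average $g_e$. All numbers are made-up assumptions for illustration: two IDCs, one DTJ class with $r = 1$ originating at IDC 0, $\rho = 0.4$, one link with $B = 4$ and hypothetical segments $a = (1, 3, 10)$, $b = (0, -1, -8)$, and two states with swapped prices. The function names and data layout are illustrative only.

```python
from typing import Dict, List, Sequence, Tuple

Link = Tuple[float, float, Sequence[float], Sequence[float]]  # (shifted traffic, B, a, b)


def energy_cost(alpha: float, K: float, S0: float, S_dtj: float, rho: float) -> float:
    """alpha_i * P_i from Eq. (3), with S_i = S_i0 + sum_j S_ij."""
    S_total = S0 + S_dtj
    return alpha * ((1 - rho) * K + rho * S_total ** 2 / K)


def shifting_cost(shifted: float, B: float, a: Sequence[float], b: Sequence[float]) -> float:
    """Eq. (4) for one link: max_k (a_k * shifted / B + b_k)."""
    ratio = shifted / B  # link capacity occupation ratio by DTJs
    return max(ak * ratio + bk for ak, bk in zip(a, b))


def slot_cost(idcs: List[Dict[str, float]], alloc: List[float],
              links: List[Link], rho: float) -> float:
    """g_omega(S): energy cost of every IDC plus shifting cost of every link.

    alloc[i] is the total DTJ capacity sum_j S_ij allocated at IDC i.
    """
    energy = sum(energy_cost(d["alpha"], d["K"], d["S0"], alloc[i], rho)
                 for i, d in enumerate(idcs))
    return energy + sum(shifting_cost(*link) for link in links)


def ergodic_cost(pi: Sequence[float], costs: Sequence[float]) -> float:
    """g_e = sum_omega pi_omega * g_omega(S^omega), the objective of (7)."""
    return sum(p * c for p, c in zip(pi, costs))


rho, a, b = 0.4, [1.0, 3.0, 10.0], [0.0, -1.0, -8.0]
state_A = [{"alpha": 1.0, "K": 10.0, "S0": 4.0}, {"alpha": 0.5, "K": 10.0, "S0": 2.0}]
state_B = [{"alpha": 0.5, "K": 10.0, "S0": 4.0}, {"alpha": 1.0, "K": 10.0, "S0": 2.0}]
# One DTJ (r = 1) originating at IDC 0 with 3 units of work per slot.
g_shift = slot_cost(state_A, [1.0, 2.0], [(2.0, 4.0, a, b)], rho)  # shift 2 units to IDC 1
g_local = slot_cost(state_A, [3.0, 0.0], [(0.0, 4.0, a, b)], rho)  # serve all at IDC 0
g_B = slot_cost(state_B, [3.0, 0.0], [(0.0, 4.0, a, b)], rho)
print(round(g_shift, 2), round(g_local, 2))                 # 10.82 11.04
print(round(ergodic_cost([0.6, 0.4], [g_shift, g_B]), 3))  # 10.548
```

In state A of this toy example, shifting two units to the cheaper IDC 1 costs less than serving everything locally, a small instance of the price-diversity gain that the schemes aim to exploit.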
If the steady-state distribution $\pi_\omega$ is available, then (7) is a deterministic convex optimization problem. However, in practice it may be difficult to obtain such statistical knowledge. We thus design a stochastic subgradient-based algorithm that solves (7) without prior information on $\pi_\omega$. Note that the scheme needs the average arrival rate $\vec{\lambda}$, or at least an upper bound on it, to guarantee stability.

We first define the Lagrangian function associated with problem (7) as
$$L(\vec{\mu}, \vec{\mathbf{S}}) = \sum_{\omega \in \Omega} \pi_\omega g_\omega(\mathbf{S}^\omega) - \sum_{j=1}^{M} \mu_j \Big( \sum_{\omega \in \Omega} \pi_\omega \sum_{i \in \Gamma_j} r_{ij} S_{ij}^\omega - \lambda_j \Big), \tag{8}$$
where $\vec{\mathbf{S}} = \{\mathbf{S}^\omega \mid \omega \in \Omega\}$, $\mathbf{S}^\omega \in \Lambda^\omega$, and $\vec{\mu} = (\mu_1, \ldots, \mu_M) \ge 0$ is the vector of Lagrange multipliers. The dual problem of (7) is defined as
$$\max_{\vec{\mu} \ge 0} F(\vec{\mu}), \tag{9}$$
where
$$F(\vec{\mu}) = \min_{\vec{\mathbf{S}}} L(\vec{\mu}, \vec{\mathbf{S}}). \tag{10}$$
To solve the dual problem, we first consider the inner minimization in (10). For a given multiplier $\vec{\mu}$, the problem is separable across states. Thus, we can solve the following problem for each state $\omega$:
$$\min_{\mathbf{S}^\omega} \ g_\omega(\mathbf{S}^\omega) - \sum_{j=1}^{M} \mu_j \Big( \sum_{i \in \Gamma_j} r_{ij} S_{ij}^\omega - \lambda_j \Big) \quad \text{s.t.} \ \mathbf{S}^\omega \in \Lambda^\omega. \tag{11}$$
Expanding $g_\omega(\cdot)$ and $\Lambda^\omega$ in (11) yields the following joint capacity allocation and load shifting problem, solved after observing the system state in the current slot:
$$\begin{aligned} \min_{\mathbf{S}^\omega} \ & \sum_{i=1}^{N} \alpha_i^\omega \Big[ (1-\rho) K_i^\omega + \rho \frac{(S_{i0}^\omega + \sum_{j \in \Pi_i} S_{ij}^\omega)^2}{K_i^\omega} \Big] + \sum_{i=1}^{N} \sum_{i' \ne i} \max_{1 \le \vartheta \le \theta} \Big( a_{ii'}^\vartheta \frac{\sum_{j \in \Upsilon_{ii'}} r_{i'j} S_{i'j}^\omega}{B_{ii'}^\omega} + b_{ii'}^\vartheta \Big) - \sum_{j=1}^{M} \mu_j \Big( \sum_{i \in \Gamma_j} r_{ij} S_{ij}^\omega - \lambda_j \Big) \\ \text{s.t.} \ & \sum_{j \in \Pi_i} S_{ij}^\omega \le K_i^\omega - S_{i0}^\omega, \quad i = 1, \ldots, N, \\ & \sum_{j \in \Upsilon_{ii'}} r_{i'j} S_{i'j}^\omega \le B_{ii'}^\omega, \quad i, i' = 1, \ldots, N,\ i \ne i'. \end{aligned} \tag{12}$$
In (12), the first term is the total energy cost, the second term is the shifting cost, and the third term is the multiplier-weighted service deficit; the first constraint is the capacity constraint for DTJs in IDC $i$, and the second is the bandwidth constraint between IDCs $i$ and $i'$. Clearly, (12) is a convex optimization problem in $\mathbf{S}^\omega$.
This is because the objective function is the sum of a set of convex and affine functions of $\mathbf{S}^\omega$, and the constraints are affine and thus convex. We can solve it efficiently for a given state $\omega$ in each time slot. Once the capacity allocation is determined, the load shifting policy is jointly determined, i.e., an amount $r_{i'j} S_{i'j}^\omega$ of DTJ $j$ is shifted from IDC $i$ to $i'$ if $j \in \Upsilon_{ii'}$.

The dual problem can be solved using a stochastic subgradient algorithm [49], with the following iterative step:
$$\mu_j^{n+1} = \big[ \mu_j^n + \beta^n \sigma_j^n \big]^+, \tag{13}$$
where $n$ denotes the $n$-th iteration, i.e., the $n$-th time slot in our case, $[x]^+ = \max\{x, 0\}$, and $\vec{\sigma}^n = (\sigma_1^n, \ldots, \sigma_M^n)$ is a stochastic subgradient chosen such that
$$E\big(\vec{\sigma}^n \mid \vec{\mu}^0, \ldots, \vec{\mu}^n\big) = \partial_{\vec{\mu}} F(\vec{\mu}^n), \tag{14}$$
where $\partial_{\vec{\mu}} F(\vec{\mu}^n)$ is a subgradient of $F(\vec{\mu})$ at $\vec{\mu}^n$. By updating $\vec{\mu}^n$ using (13), $\vec{\mu}^n$ converges to the optimal solution of the dual problem (9) with probability 1 if the following conditions are satisfied:
$$E\Big( \big( (\sigma_1^n)^2 + \cdots + (\sigma_M^n)^2 \big)^{1/2} \,\Big|\, \vec{\mu}^0, \ldots, \vec{\mu}^n \Big) \le c, \tag{15}$$
where $c$ is a constant, and $\sum_{n=0}^{\infty} \beta^n = \infty$, $\sum_{n=0}^{\infty} (\beta^n)^2 < \infty$. For example, $\beta^n = 1/(n+1)$ satisfies both step-size conditions. Since $F$ is not necessarily differentiable, its subdifferential can be a set; by Danskin's Theorem [50], we can choose a subgradient as
$$\partial_{\mu_j} F(\vec{\mu}) = -\sum_{\omega \in \Omega} \pi_\omega \sum_{i \in \Gamma_j} r_{ij} S_{ij}^{\omega*} + \lambda_j, \quad j = 1, \ldots, M, \tag{16}$$
where $S_{ij}^{\omega*}$ is the optimal solution to problem (12). Note that $\sigma_j^n$ is a stochastic subgradient if its expectation equals a subgradient. We can choose $\sigma_j^n$ as
$$\sigma_j^n = -\sum_{i \in \Gamma_j} r_{ij} S_{ij}^{\omega^n *} + \lambda_j, \quad j = 1, \ldots, M, \tag{17}$$
where $\omega^n$ is the system state at iteration $n$. Condition (15) is satisfied because $r_{ij} S_{ij}^{\omega^n *}$ is bounded for all $i, j$, which leads to bounded $\sigma_j^n$ for all $j$.
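The iteration (13) with the stochastic subgradient (17) can be sketched for the simplest special case of one IDC and one DTJ class, assumed here purely for illustration. In that case (12) has no shifting term, and its minimizer is obtained by setting the derivative $2\alpha\rho(S_0+S)/K - \mu r$ to zero and projecting onto $[0, K - S_0]$, which is valid because the objective is a convex function of a single variable. Further assumptions: system states $(\alpha, K, S_0)$ are drawn i.i.d. from a list that the algorithm never treats as a known distribution, and $\beta^n = 1/(n+1)$. With a single constant state ($\alpha = 1$, $K = 10$, $S_0 = 4$, $\rho = 0.4$, $r = 1$, $\lambda = 3$), the multiplier should settle where the allocated service rate equals $\lambda$, i.e., at $\mu = 0.56$. Function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

State = Tuple[float, float, float]  # (alpha, K, S0) observed at the start of a slot


def per_state_alloc(mu: float, r: float, alpha: float, K: float,
                    S0: float, rho: float) -> float:
    """Closed-form minimizer of the single-IDC, single-DTJ version of (12)."""
    unconstrained = mu * r * K / (2 * alpha * rho) - S0  # zero of the derivative
    return min(max(unconstrained, 0.0), K - S0)          # projection onto [0, K - S0]


def run_sstf(states: Sequence[State], lam: float, r: float, rho: float,
             n_iter: int, seed: int = 0) -> Tuple[float, List[float]]:
    """Stochastic subgradient updates (13) and (17); returns final mu and service trace."""
    rng = random.Random(seed)
    mu, served = 0.0, []
    for n in range(n_iter):
        alpha, K, S0 = rng.choice(states)      # observe state omega^n
        S = per_state_alloc(mu, r, alpha, K, S0, rho)
        sigma = lam - r * S                    # Eq. (17)
        mu = max(mu + sigma / (n + 1), 0.0)    # Eq. (13) with beta^n = 1/(n+1)
        served.append(r * S)
    return mu, served


mu, served = run_sstf([(1.0, 10.0, 4.0)], lam=3.0, r=1.0, rho=0.4, n_iter=2000)
print(round(mu, 4))  # 0.56
```

Passing several states in the list makes the per-slot allocation vary with the observed price and DSJ load, while the multiplier still adapts only through the observed service deficit.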
$\sigma_j^n$ defined in (17) is a stochastic subgradient because, in the ergodic setting, the time average of $\sigma_j^n$ equals the subgradient in (16). Further, since the original problem (7) is a convex optimization problem that satisfies Slater's condition, there is no duality gap. We name the above algorithm stochastic subgradient-based trough filling (SSTF). SSTF converges to the optimal solution of problem (7). Thus it achieves the optimal cost subject to a service rate that ensures queue stability. Note that SSTF can also work in non-ergodic settings.

The Lagrange multiplier $\vec{\mu}$ has a practical interpretation. It can be considered a price, which increases when the service rate is smaller than the average arrival rate, i.e., when capacity is under-provisioned. In practice, by updating $\vec{\mu}$, SSTF can achieve good cost performance. Moreover, the objective of SSTF is not limited to cost optimality. One can tune the average service rate of SSTF, i.e., by adjusting $\vec{\lambda}$ in (7), to control the DTJ delay. Thus, SSTF is more than a solver for SCOTF in the ergodic setting. Another benefit of SSTF is that it also exploits the temporal diversity of electricity prices. However, SSTF needs knowledge of the average DTJ arrival rate, which may not be available in practice. Further, it may converge slowly, and its delay performance is difficult to characterize. This motivates us to consider the following queue-based algorithm, which leverages queue information so that neither $\vec{\lambda}$ nor the system state distribution is required.

VI. QUEUE BASED TROUGH FILLING

A. Algorithm Design

In this section, we present a queue-based algorithm that explicitly considers the queue backlog of DTJs. The algorithm takes the instantaneous system state (i.e., queue lengths, available server capacity and bandwidth, and DSJ load demand) as input.
The algorithm also has a parameter that controls the tradeoff between cost and queueing delay. We will show that the algorithm achieves a bounded average queue backlog, so that the system is stabilized, while its cost can be arbitrarily close to the optimal cost achieved by (7).

In each time slot $t$, observe the current queue backlogs $Q_j(t)$, $j = 1, \ldots, M$, and $\alpha_i^t$, $K_i^t$, $S_{i0}^t$, and $B_{ii'}^t$, $i, i' = 1, \ldots, N$. Allocate the capacity at each IDC $i$ for each queue $j$ according to the following optimization, named queue-based trough filling (QTF):
$$\min_{\mathbf{S}^t} \ -\sum_{j=1}^{M} Q_j(t) \sum_{i \in \Gamma_j} r_{ij} S_{ij}^t + V \sum_{i=1}^{N} \alpha_i^t \Big[ (1-\rho) K_i^t + \rho \frac{(S_{i0}^t + \sum_{j \in \Pi_i} S_{ij}^t)^2}{K_i^t} \Big] + V \sum_{i=1}^{N} \sum_{i' \ne i} \max_{1 \le \vartheta \le \theta} \Big( a_{ii'}^\vartheta \frac{\sum_{j \in \Upsilon_{ii'}} r_{i'j} S_{i'j}^t}{B_{ii'}^t} + b_{ii'}^\vartheta \Big) \tag{18}$$
$$\begin{aligned} \text{s.t.} \ & \sum_{j \in \Pi_i} S_{ij}^t \le K_i^t - S_{i0}^t, \quad i = 1, \ldots, N, \\ & \sum_{j \in \Upsilon_{ii'}} r_{i'j} S_{i'j}^t \le B_{ii'}^t, \quad i, i' = 1, \ldots, N,\ i \ne i', \\ & \sum_{i \in \Gamma_j} r_{ij} S_{ij}^t \le Q_j(t), \quad j = 1, \ldots, M. \end{aligned} \tag{19}$$
Similar to (12), the problem in (18)-(19) is a convex optimization problem. Thus, at the beginning of each slot, the capacity allocation $\mathbf{S}^t$ can be determined efficiently.

The intuition behind QTF is clear. When the total queue length $\sum_{j=1}^{M} Q_j(t)$ is high, QTF has an incentive to allocate more capacity to reduce the queue length. When the cost is relatively large or the queue length is small, QTF is driven to allocate less capacity to reduce the cost. The control parameter $V$ balances queue length and cost: a larger $V$ results in lower cost but a longer average queueing delay.

To better illustrate the intuition of the algorithm, we further consider a special case with only one IDC and $M$ delay-tolerant queues. In the single-IDC case, we can simplify notation by removing the subscript $i$. The capacity vector becomes $\mathbf{S}^t = \{S_1^t, \ldots, S_M^t\}$.
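The cost-delay tradeoff controlled by $V$ can be seen in a toy simulation of (18)-(19) restricted to one IDC and one DTJ queue, where the slot problem has a closed form: the zero of the derivative $-Qr + 2V\alpha\rho(S_0+S)/K$, projected onto $[0, \min\{K^t - S_0^t, Q(t)/r\}]$. Everything except the QTF rule and the queue update (1) is an assumption made for illustration: prices drawn from $\{0.5, 1, 2\}$, DSJ capacity $S_0^t$ uniform on $[2, 6]$, $K^t = 10$, $\rho = 0.4$, $r = 1$, and DTJ arrivals uniform on $[0, 4]$. Function names are hypothetical.

```python
import random
from typing import Tuple


def qtf_single(Q: float, r: float, alpha: float, K: float, S0: float,
               rho: float, V: float) -> float:
    """Minimize -Q*r*S + V*alpha*((1-rho)K + rho(S0+S)^2/K) s.t. 0 <= S <= min(K-S0, Q/r)."""
    upper = min(K - S0, Q / r)                    # capacity and backlog constraints of (19)
    s = Q * r * K / (2 * V * alpha * rho) - S0    # zero of the derivative
    return min(max(s, 0.0), upper)


def simulate(V: float, T: int = 5000, seed: int = 1) -> Tuple[float, float]:
    """Return (average energy cost, average backlog) of QTF over T slots."""
    rng = random.Random(seed)
    K, rho, r = 10.0, 0.4, 1.0
    Q, cost_sum, q_sum = 0.0, 0.0, 0.0
    for _ in range(T):
        alpha = rng.choice([0.5, 1.0, 2.0])       # electricity price alpha^t
        S0 = rng.uniform(2.0, 6.0)                # capacity taken by DSJs
        S = qtf_single(Q, r, alpha, K, S0, rho, V)
        cost_sum += alpha * ((1 - rho) * K + rho * (S0 + S) ** 2 / K)
        q_sum += Q
        Q = max(Q - r * S, 0.0) + rng.uniform(0.0, 4.0)  # Eq. (1)
    return cost_sum / T, q_sum / T


for V in (0.1, 50.0):
    cost, backlog = simulate(V)
    print(f"V={V}: avg cost {cost:.3f}, avg backlog {backlog:.2f}")
```

Under these assumed settings, the larger $V$ should give a lower average cost (service is deferred to cheap slots) and a larger average backlog; the exact numbers depend on the seed and the assumed distributions.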
We have the following scheme for capacity allocation, named single-IDC queue-based trough filling (SQTF):

$$
\begin{aligned}
\min_{S^t}\;& -\sum_{j=1}^{M} Q_j(t)\,r_j S_j^t && (20)\\
&+ V\alpha^t\Big[(1-\rho)K^t + \frac{\rho\big(S_0^t+\sum_{j=1}^{M}S_j^t\big)^2}{K^t}\Big] && (21)\\
\text{s.t.}\;& \sum_{j=1}^{M} S_j^t \le K^t - S_0^t && (22)\\
& S_j^t \ge 0,\quad j=1,\dots,M. && (23)
\end{aligned}
$$

The solution $S^t$ takes the following form.

Observation 1: In each time slot $t$, SQTF chooses the queue with the maximum $Q_j(t)r_j$, denoted $j'$, and allocates

$$
S_{j'}^t = \begin{cases}
K^t - S_0^t, & \text{if } Q_{j'}(t)\,r_{j'} \ge 2V\rho\alpha^t,\\[4pt]
\dfrac{Q_{j'}(t)\,r_{j'}K^t}{2V\rho\alpha^t} - S_0^t, & \text{else if } Q_{j'}(t)\,r_{j'} \ge \dfrac{2V\rho\alpha^t S_0^t}{K^t},\\[4pt]
0, & \text{otherwise},
\end{cases}
\qquad S_j^t = 0 \ \text{ for } j \ne j'. \quad (24)
$$

This follows because, for a fixed total allocation $\sum_j S_j^t$, the linear term (20) is minimized by giving all capacity to $j'$; setting the derivative of the remaining one-dimensional objective to zero gives $S_{j'}^t = Q_{j'}(t)r_{j'}K^t/(2V\rho\alpha^t) - S_0^t$, which is then clipped to $[0, K^t - S_0^t]$. In other words, SQTF is a threshold-based policy: it serves only the queue with the largest $Q_j(t)r_j$, and only when that value is above a certain threshold.

B. Performance analysis

In this subsection, we analyze the performance of QTF in terms of cost and average delay. Our analysis is based on Lyapunov drift optimization [51]. Define $r_i = \max\{r_{ij} \mid j\in\Pi_i\}$, i.e., the maximum unit service rate over all DTJs in IDC $i$. Let $D_j^m$ denote the upper bound on the arrival traffic size of DTJ $j$ in each slot. We have the following proposition.

Proposition 1: Assuming the traffic of DTJs is i.i.d. across slots with mean $\vec{\lambda}$, the QTF algorithm stabilizes the system for a given parameter $V$. In addition, the average queue length is upper bounded as

$$\lim_{T\to\infty}\frac{1}{T}\sum_{t=1}^{T}\sum_{j=1}^{M}\mathbb{E}[Q_j(t)] \le \frac{\sum_{i\in\cup_j\Gamma_j} r_i^2 (K_i^{\max})^2 + \sum_j (D_j^m)^2 + V g_e^*(\epsilon)}{\epsilon}. \quad (25)$$

Further, denoting the cost of QTF in slot $t$ by $g_q^t(S^t)$, its average cost is upper bounded as

$$\lim_{T\to\infty}\frac{1}{T}\sum_{t=1}^{T}\mathbb{E}[g_q^t(S^t)] \le \frac{V g_e^* + \sum_{i\in\cup_j\Gamma_j} r_i^2 (K_i^{\max})^2 + \sum_j (D_j^m)^2}{V}, \quad (26)$$

where $g_e^*$ is the optimal value of problem (7), $\epsilon$ is a positive value, and $g_e^*(\epsilon)$ is the optimal value of (7) with $\vec{\lambda}$ replaced by $\vec{\lambda}+\epsilon\vec{1}$.

Proof: See the Appendix.

VII. DISCUSSIONS

A. Joint DSJ and DTJ design

Although SSTF and QTF are both proposed for trough filling, with some modifications they can be used for joint DSJ and DTJ capacity provisioning. First, $S_{i0}^t$ for DSJs becomes part of the decision variables, together with $S_{ij}^t$ for DTJs. An important issue is then how to guarantee the service requirements of DSJs.

For SSTF, we can simply introduce a QoS constraint for DSJs. For example, if the slot length is large, i.e., tens of seconds to minutes, we can follow [4] and estimate the mean DSJ rate for IDC $i$ at the beginning of the current slot, denoted $\lambda_{i0}^t$. Note that $\lambda_{i0}^t$ may include traffic from other IDCs due to certain traffic shifting schemes. Let $r_{i0}$ denote the unit service rate for DSJs in IDC $i$. Following [34], a delay constraint can be imposed, e.g., $\frac{1}{r_{i0}S_{i0}^t - \lambda_{i0}^t} \le \delta$. With $r_{i0}S_{i0}^t > \lambda_{i0}^t$, this is equivalent to the linear constraint $r_{i0}S_{i0}^t \ge \lambda_{i0}^t + 1/\delta$, so it can be easily incorporated into our convex optimization problem.

When the time slot is short, such as hundreds of milliseconds, the mean DSJ traffic in the current slot is unlikely to be estimable. In this case, one may assume that DSJ traffic follows a certain distribution based on past measurements and use an outage probability as the QoS constraint, i.e., the probability that the DSJ load in IDC $i$, $D_{i0}^t$, exceeds the capacity $S_{i0}^t$. The DSJ QoS constraint can then be expressed as $\Pr(D_{i0}^t > S_{i0}^t) \le \delta_i$. Given the traffic distribution, e.g., Gaussian or exponential, one can rewrite the constraint as a convex function of $S_{i0}^t$. Since the time slot is short, the outage probability can be easily measured, and adjusting $S_{i0}^t$ with stochastic approximation schemes is probably necessary to eliminate the discrepancy between the real distribution of $D_{i0}^t$ and the assumed one. Similar approaches can be applied to extend QTF.
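The SSTF constraints above both reduce to lower bounds on $S_{i0}^t$, which is why they fit into the convex program without difficulty. The sketch below computes these bounds; the function names are hypothetical, and parameterizing the exponential case by its mean is an assumption made for illustration.

```python
import math
from statistics import NormalDist


def min_capacity_delay(lam_dsj: float, r_dsj: float, delta: float) -> float:
    """Smallest S_i0 with 1 / (r_i0 * S_i0 - lambda_i0) <= delta.

    With r_i0 * S_i0 > lambda_i0 this is the linear bound
    r_i0 * S_i0 >= lambda_i0 + 1 / delta.
    """
    return (lam_dsj + 1.0 / delta) / r_dsj


def min_capacity_gaussian_outage(mean: float, std: float, delta: float) -> float:
    """Smallest S_i0 with Pr(D > S_i0) <= delta for D ~ N(mean, std^2).

    Pr(D > S) <= delta  <=>  S >= mean + std * z_{1 - delta}.
    """
    return mean + std * NormalDist().inv_cdf(1.0 - delta)


def min_capacity_exponential_outage(mean: float, delta: float) -> float:
    """Smallest S_i0 with Pr(D > S_i0) <= delta for exponential D (assumed mean).

    Pr(D > S) = exp(-S / mean) <= delta  <=>  S >= mean * ln(1 / delta).
    """
    return mean * math.log(1.0 / delta)


if __name__ == "__main__":
    print(min_capacity_delay(80.0, 1.0, 0.1))
    print(min_capacity_gaussian_outage(100.0, 10.0, 0.05))
    print(min_capacity_exponential_outage(100.0, 0.05))
```

For instance, with $\lambda_{i0}^t = 80$, $r_{i0} = 1$, and $\delta = 0.1$, the delay constraint requires $S_{i0}^t \ge 90$, while a Gaussian load with mean 100 and standard deviation 10 needs about 116.45 capacity units for a 5% outage target.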
For QTF, for example, one can also use the outage probability as a DSJ QoS constraint. Let $\delta_i$ denote the outage probability constraint. To enforce it, we can design a virtual outage queue. Let $I_i(\cdot)$ be an indicator function, with $I_i(t) = 1$ if there is an outage in slot $t$, i.e., $D_{i0}^t > S_{i0}^t$, and $I_i(t) = 0$ otherwise. We use $O_i(t)$ to denote the virtual outage queue backlog in slot $t$, which updates as $O_i(t+1) = \max\{O_i(t) - \delta_i, 0\} + I_i(t)$. It can be shown that the virtual queue is stable if $\lim_{T\to\infty}\frac{1}{T}\sum_{t=1}^{T} I_i(t) \le \delta_i$, i.e., if the outage probability constraint is satisfied. Note that $\delta_i$ can be viewed as the service rate of the virtual queue. Using the virtual outage queue, we can modify QTF to provide capacity provisioning for DSJs. Further investigation of the joint design of capacity provisioning and QoS assurance for DSJs together with trough filling is our future work.

B. Implementation issues and caveats

In our schemes, the decision-maker needs to gather its input at the beginning of each slot. The messaging delay is about tens of milliseconds [36], each IDC sends only a few parameters to the decision-maker, and each time slot lasts from several seconds to several minutes. Thus the messaging overhead is negligible. The decision overhead is also negligible, since the convex optimization problems can be solved efficiently. Load shifting overhead, i.e., network delay, can be easily constrained by controlling the bandwidth allotted to DTJs.

In this paper, we consider homogeneous servers for simplicity. However, servers in an IDC may differ in power consumption, maximum speed, and memory. To apply our schemes, we can further classify the servers into different units, where homogeneous or similar servers belong to one unit.
The input is then no longer IDC-based but unit-based. In practice, we can simply classify servers by age; typically, there are three stock-keeping units (SKUs) in an IDC, i.e., latest, one-year-old, and two-year-old.

In practice, some DTJs may need to be finished by a deadline, and different classes of DTJs may have different deadlines. Designing energy-efficient DTJ scheduling algorithms with heterogeneous deadlines for IDCs is an interesting open problem, which we will consider in the future. In this paper, we mainly focus on CPU-intensive DSJs; we will also extend our work to I/O-intensive DTJs. Besides, we will explicitly consider the effect of virtualization, under which the performance-versus-power curve may become more difficult to quantify [22][36].

VIII. PERFORMANCE EVALUATION

In this section, we evaluate the performance of SSTF and QTF using both synthetic and real traces.

[Fig. 1: Delay and cost of different schemes with different ratios between the load of DTJs and DSJs. Curves: cost and delay by OSSI, SSTF, QTF (V = 1000), and QTF (V = 1). x-axis: ratio between DTJ load and DSJ load (0.5 to 3.5); y-axes: average cost (thousands) and average delay (log scale).]

A. Synthetic traces based simulation

1) Simulation setup: We consider five IDCs in different locations. There are ten DTJ queues in total, each originating at a randomly chosen one of the five IDCs. The IDC set $\Gamma_j$ that can serve DTJ $j$ is chosen randomly. The idle power consumption $1-\rho$ is set to 0.5. To create an ergodic setting, we define 100 states, each with a different total capacity, load shifting constraint, DSJ demand, and set of electricity prices. The capacity of each IDC is uniformly distributed from 10k to 15k. The load shifting constraint is uniformly distributed from 3000 to 4000.
The load shifting cost parameters are set as in [47]. The electricity price is uniformly distributed from 1 to 10. DSJ demand, expressed as a fraction of the total capacity, is randomly distributed from 0 to 0.4, so the average DSJ demand is about 20% of the total capacity. We consider different ratios between the DTJ load and the DSJ load by setting different average DTJ arrival rates. The ratios are 0.5, 1, 1.5, 2, 2.5, 3, and 3.5, so the share of DTJ demand in the total capacity ranges from 10% to 70%. We simulate 100k time slots in each of the 30 simulation settings. In each time slot, a system state is chosen randomly according to a predefined probability.

[Fig. 2: Rate assignment by SSTF and QTF at different time resolutions, normalized by the average DTJ arrival rate. (a) One-slot rate by SSTF over a 100-slot snapshot; (b) one-slot rate by QTF over a 100-slot snapshot; (c) average rate over windows of 1000 slots by SSTF and QTF, for 100 windows.]

2) Simulation results: We first compute the optimal solution to (7) with system distribution information, which is difficult to obtain in practice.
We name it OSSI and compare it with SSTF and QTF. First, Fig. 1 shows that the cost of SSTF is very close to that of OSSI under all DTJ load ratios, and their queue delays are also very close. Since we also account for idle power consumption, i.e., $(1-\rho)K_i^t$, and DSJ power consumption, the costs of the different schemes are very close when the DTJ load is low, e.g., at ratios of 0.5 and 1, because the impact of DTJs is small. To study the convergence of SSTF, we also consider the DTJ power consumption separately. The results show that SSTF and OSSI achieve very close performance in terms of cost and delay; we do not plot them here due to the page limit.

In Fig. 1, we consider QTF with $V = 1$ and $V = 1000$. In both cases, QTF incurs a higher cost, but its queue delay is significantly smaller than that of OSSI and SSTF. QTF with $V = 1000$ has a slightly larger cost than OSSI and SSTF but a much smaller delay, even when the DTJ load is high, e.g., at a ratio of 3.5. In this case, QTF with $V = 1$ has a very small delay, i.e., almost 1, at a much higher cost. Thus, in practice, one can tune $V$ to obtain a desirable tradeoff between cost and delay, especially when the DTJ load is high.

In Fig. 1, the queue delay of OSSI and SSTF is very large, even when we set the average service rate (slightly) larger than the arrival rate. To find the reason, we examine the service rates of a DTJ queue at different time resolutions. We first consider the one-slot service rate, normalized by the average DTJ arrival rate, and plot it over 100 slots in Fig. 2(a) and (b) for SSTF and QTF, respectively (the service rate by OSSI is very similar to that by SSTF). In Fig. 2, the DTJ load ratio is 1 and $V = 1000$ for QTF. Rate assignment by SSTF is quite even across slots.
Under SSTF, the DTJs receive a nonzero service rate in every slot. In contrast, rate assignment by QTF is much more bursty: the service rate is nonzero only once every several slots. This is consistent with Observation 1, which shows that capacity allocation by QTF for a single IDC is a threshold-based policy driven by queue length. In slots without service, jobs accumulate and the queue delay increases. This is why QTF has a queue delay of about 5 in Fig. 1 ($V = 1000$ and a DTJ load ratio of 1). Nevertheless, queue stability is guaranteed, since the service rates are large enough once every several slots for the accumulated jobs to be finished.

[Fig. 3: Delay and cost by SSTF and QTF on the real traffic trace. Curves: cost and delay by BES, SSTF, and QTF. x-axis: percentage of DTJs in the total load (%); y-axes: average cost (thousands) and average delay (min).]

We also examine a coarser time resolution, i.e., the average rate over every 1000 time slots (normalized by the average arrival rate), and plot the results in Fig. 2(c). Interestingly, at this resolution the rate by SSTF is more bursty than that by QTF. During periods when the normalized service rate of SSTF is below 1, DTJs accumulate, so the queue length is fairly large in most slots. Although the jobs can be finished during periods when the service rate is above 1, significant delay cannot be avoided. One can increase the average service rate of SSTF to obtain a smaller delay, but much more capacity is then consumed, resulting in a much higher cost. In many cases, when the DTJ load is high, there is little room for SSTF to increase its service rates. QTF, in contrast, can control its delay by tuning $V$.
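Because Observation 1 gives SQTF in closed form, its threshold behavior and the role of $V$ can be reproduced in a few lines. The sketch below simulates one IDC with two DTJ queues. The unit rates, arrival rates, seed, and horizon are hypothetical values chosen for illustration, loosely echoing the synthetic setup ($1-\rho = 0.5$, prices in [1, 10], DSJ demand up to 40% of capacity); it is not the paper's simulator. Following Observation 1 literally, it does not impose QTF's third constraint in (19), so for small $V$ it may allocate more capacity than the queue needs.

```python
import random


def sqtf_allocate(Q, r, V, alpha, rho, K, S0):
    """Capacity allocation of Observation 1 (single-IDC SQTF)."""
    j_star = max(range(len(Q)), key=lambda j: Q[j] * r[j])  # largest Q_j r_j
    w = Q[j_star] * r[j_star]
    S = [0.0] * len(Q)
    if w >= 2 * V * rho * alpha:
        S[j_star] = K - S0  # serve with all spare capacity
    elif w >= 2 * V * rho * alpha * S0 / K:
        S[j_star] = w * K / (2 * V * rho * alpha) - S0  # interior stationary point
    return S  # otherwise no DTJ service this slot


def simulate_sqtf(V, T=20_000, seed=7, K=100.0, rho=0.5):
    """Run SQTF on hypothetical i.i.d. prices, DSJ loads, and DTJ arrivals."""
    rng = random.Random(seed)
    r = [1.0, 0.8]  # hypothetical unit service rates r_j
    lam = [10.0, 8.0]  # hypothetical mean DTJ arrivals per slot
    Q = [0.0, 0.0]
    backlog = cost = served_slots = delivered = arrived = 0.0
    for _ in range(T):
        alpha = rng.uniform(1.0, 10.0)  # electricity price alpha^t
        S0 = rng.uniform(0.0, 0.4) * K  # capacity used by DSJs, S_0^t
        D = [rng.uniform(0.0, 2.0 * l) for l in lam]  # DTJ arrivals
        S = sqtf_allocate(Q, r, V, alpha, rho, K, S0)
        # Per-slot cost term of (21).
        cost += alpha * ((1 - rho) * K + rho * (S0 + sum(S)) ** 2 / K)
        served_slots += any(s > 0 for s in S)
        for j in range(len(Q)):
            done = min(Q[j], r[j] * S[j])  # work actually completed
            delivered += done
            arrived += D[j]
            Q[j] = Q[j] - done + D[j]  # equals max(Q - rS, 0) + D
        backlog += sum(Q)
    return {
        "avg_backlog": backlog / T,
        "avg_cost": cost / T,
        "served_fraction": served_slots / T,
        "delivered_rate": delivered / T,
        "arrival_rate": arrived / T,
    }


if __name__ == "__main__":
    for V in (1, 100, 1000):
        print(V, simulate_sqtf(V))
```

With these settings, the average backlog grows with $V$ while the average cost falls, and with $V = 1000$ a noticeably larger fraction of slots receives no DTJ service than with $V = 1$, mirroring the bursty allocation discussed above. The delivered service rate still tracks the arrival rate, since the queues remain stable.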
One important property of QTF is that, whether $V$ is large or small, its average service rate stays close to the arrival rate, because it leverages queue information. Thus QTF provides a more efficient way to save cost and reduce delay. There are other findings, e.g., that load shifting also plays an important role in reducing cost and queue delay; we omit them here due to the page limit.

B. Real trace based simulation

In this subsection, we use a real datacenter traffic trace to study the performance of SSTF and QTF. Our trace comes from a commercial datacenter operated by a large cloud service provider in the U.S. We obtained a Hadoop distributed file system (HDFS) log for one datacenter covering thirty days. The HDFS log records information on all received packets, including packet size and timestamp. The original data does not differentiate DSJs and DTJs. (In fact, differentiating such traffic without application-layer information is itself a challenging issue in practical datacenter operations and an active research topic.) To address this issue, we adopt a simple threshold-based policy: we assume that a large packet is likely to be delay tolerant, and treat any packet larger than a certain threshold as a DTJ. This classification is reasonable, as the authors of [48] indicate that most Internet requests, such as search and web browsing, are only a few kB in size. We set the threshold to 10, 50, 100, and 150 Mb to obtain different ratios between DSJ load and DTJ load, giving DTJ shares of the total load of roughly 90%, 70%, 50%, and 10%, respectively. Here we assume that one unit (Mbit) of DSJ traffic requires one unit of capacity, and one unit of DTJ traffic requires 0.133 units of capacity on average, following the same rate setting as in the above simulations (the average unit rate $r_{ij}$ is roughly 7.5, and $1/7.5 \approx 0.133$).
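The threshold classification and the capacity conversion described above can be stated compactly. In this sketch, `split_load` and its inputs are hypothetical; packet sizes are in Mbit, and the DTJ conversion uses $1/7.5 \approx 0.133$ capacity units per Mbit from the stated average unit rate.

```python
def split_load(packet_sizes_mbit, threshold_mbit, r_avg=7.5):
    """Classify packets by size and convert each class to capacity demand.

    Packets strictly larger than the threshold are treated as DTJs.
    One Mbit of DSJ traffic needs 1 capacity unit; one Mbit of DTJ
    traffic needs 1 / r_avg units (about 0.133 when r_avg = 7.5).
    """
    dtj = sum(s for s in packet_sizes_mbit if s > threshold_mbit)
    dsj = sum(s for s in packet_sizes_mbit if s <= threshold_mbit)
    total = dsj + dtj
    return {
        "dsj_load": dsj,
        "dtj_load": dtj,
        "dtj_share": dtj / total if total else 0.0,
        "dsj_capacity": dsj,  # 1 capacity unit per Mbit
        "dtj_capacity": dtj / r_avg,
    }


if __name__ == "__main__":
    sizes = [0.01, 0.02, 12.0, 60.0, 0.05, 200.0]  # hypothetical packet sizes
    print(split_load(sizes, threshold_mbit=50.0))
```

Raising the threshold moves packets from the DTJ class to the DSJ class and lowers the DTJ share, which is how the 10 to 150 Mb sweep produces the different load mixes.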
To simulate multiple IDCs and multiple DTJ queues, we use twenty days of large-packet traces as ten DTJ traffic traces, so that each is a two-day trace. We use ten days of small-packet traces as the DSJ demand for the five IDCs considered. The time slot length is 20 seconds, so each two-day trace has 8640 time slots. Further, we use the electricity price data of five wholesale market regions on 02/22/2011: California (Hub SP15-EZ), Louisiana (Entergy), New England (NEPOOL Mass), Pennsylvania (PJM West), and Texas (ERCOT SOUTH). The capacity is uniformly distributed between 1000 and 1200, and the bandwidth constraint is uniformly distributed between 1000 and 1500. All other settings are the same as in the synthetic traffic case.

We compare SSTF and QTF with a best-effort service scheme (BES). In each slot, BES serves as much DTJ demand as possible, in a best-effort fashion. When the available capacity in an IDC is not enough to finish the current jobs, BES shares the capacity equally among all DTJ queues. In the simulation, we assume that SSTF knows the average DTJ arrival rate, and its average service rate is set equal to that rate. The control variable $V$ of QTF is set to 1000.

Fig. 3 shows that, for all percentages of DTJ load, BES always incurs the highest cost, while SSTF always has the lowest cost. The delay of SSTF is large, almost 5 hours; one reason is that it exploits temporal electricity price diversity on a large time scale. One might expect BES to yield the lowest delay, but in Fig. 3 the average delay of BES is always larger than that of QTF. The reason is that BES does not use load shifting, so queues suffer large delays in an IDC with less available capacity.
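The BES baseline can be sketched for a single IDC. We read "shares the capacity equally" as max-min fair sharing (water-filling), so capacity that a small queue cannot use is passed on to the others; that reading and the name `bes_allocate` are assumptions. Demands are in capacity units, i.e., $Q_j/r_{ij}$ for the DTJ queues local to the IDC, and because BES does no load shifting, each IDC runs this allocation on its own queues only.

```python
def bes_allocate(demands, capacity):
    """Best-effort capacity split in one IDC (assumed max-min fair).

    demands[j]: capacity needed to clear local DTJ queue j this slot.
    capacity:   spare capacity K - S_0 left after DSJs.
    """
    if sum(demands) <= capacity:
        return [float(d) for d in demands]  # enough capacity: finish everything
    alloc = [0.0] * len(demands)
    remaining = float(capacity)
    active = sorted(range(len(demands)), key=lambda j: demands[j])
    while active:
        share = remaining / len(active)  # equal share of what is left
        j = active[0]  # smallest unmet demand
        if demands[j] <= share:
            alloc[j] = float(demands[j])  # fully served; leftover is redistributed
            remaining -= demands[j]
            active.pop(0)
        else:
            for k in active:  # every remaining queue wants more than the share
                alloc[k] = share
            break
    return alloc


if __name__ == "__main__":
    print(bes_allocate([2.0, 10.0, 10.0], 12.0))
```

In the usage example, the small queue is cleared and the two large queues split the remaining 10 units evenly, giving allocations of 2, 5, and 5.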
The BES result illustrates that load shifting is not only necessary for reducing cost but also important for exploiting available capacity to improve delay performance. In summary, Fig. 3 shows that QTF is efficient in both saving cost and reducing delay. It is also observed that as the percentage of DTJ load increases, both the total cost and the average DTJ delay decrease. The reason is that DSJs require more capacity per unit of traffic, so when the DSJ load decreases, the total capacity demand decreases. More capacity is thus available for DTJs, which leads to a smaller DTJ delay and more room for energy saving.

IX. CONCLUSIONS

In this paper, we study intelligent trough filling that achieves both energy efficiency and good delay performance. We design joint dynamic speed scaling and load shifting schemes. We first present a stochastic subgradient-based trough filling algorithm, named SSTF, which solves a convex optimization problem for capacity allocation and load shifting in each slot. SSTF does not need information on the underlying distribution of the system state, and it converges to the optimal cost under a given service rate constraint. We further propose a queue-based trough filling algorithm, named QTF, which also solves a convex optimization problem for capacity allocation and load shifting in each slot. We show that QTF achieves a tunable tradeoff between queue delay and cost: its average cost is within $O(1/V)$ of the optimum while its average queue backlog is $O(V)$. Our extensive simulations based on both synthetic and real datacenter traces show that SSTF achieves the optimal cost but has a large delay, while QTF achieves both desirable cost and desirable delay. In practice, SSTF suits scenarios where DTJs can tolerate a large delay, e.g., half a day, and QTF suits cases where a smaller delay, e.g., tens of minutes, is desirable.

REFERENCES

[1] Y. Y. Chen et al., “Managing Server Energy and Operational Costs in Hosting Centers,” in ACM SIGMETRICS, 2005.
[2] G. Chen, W. He, J. Liu, S. Nath, L. Rigas, L. Xiao, and F. Zhao, “Energy-Aware Server Provisioning and Load Dispatching for Connection-Intensive Internet Services,” in USENIX NSDI, 2008.
[3] C. G. Plaxton, Y. Sun, M. Tiwari, and H. Vin, “Reconfigurable Resource Scheduling,” in ACM SPAA, 2006.
[4] M. Lin, A. Wierman, L. L. H. Andrew, and E. Thereska, “Dynamic Right-Sizing for Power-Proportional Data Centers,” in IEEE INFOCOM, 2011.
[5] H. Lim, A. Kansal, and J. Liu, “Power Budgeting for Virtualized Data Centers,” in USENIX ATC, 2011.
[6] R. Urgaonkar, U. C. Kozat, K. Igarashi, and M. J. Neely, “Dynamic Resource Allocation and Power Management in Virtualized Data Centers,” in IEEE NOMS, 2010.
[7] J. S. Chase et al., “Managing Energy and Server Resources in Hosting Centers,” in ACM SOSP, 2001.
[8] M. Weiser, B. Welch, A. Demers, and S. Shenker, “Scheduling for Reduced CPU Energy,” in USENIX OSDI, 1994.
[9] J. R. Lorch and A. J. Smith, “Improving Dynamic Voltage Scaling Algorithms with PACE,” in ACM SIGMETRICS/Performance, 2001.
[10] D. Grunwald et al., “Policies for Dynamic Clock Scheduling,” in USENIX OSDI, 2000.
[11] N. Bansal, K. Pruhs, and C. Stein, “Speed Scaling for Weighted Flow Times,” in ACM SODA, 2007.
[12] A. Wierman, L. L. H. Andrew, and A. Tang, “Power-Aware Speed Scaling in Processor Sharing Systems,” in IEEE INFOCOM, 2009.
[13] T. Horvath, T. Abdelzaher, K. Skadron, and X. Liu, “Dynamic Voltage Scaling in Multitier Web Servers with End-to-End Delay Control,” IEEE Transactions on Computers, vol. 56, no. 4, April 2007.
[14] L. L. Andrew, M. Lin, and A. Wierman, “Optimality, Fairness, and Robustness in Speed Scaling Designs,” in ACM SIGMETRICS, 2010.
[15] C. Lim and A. Tang, “Dynamic Speed Scaling and Load Balancing of Interconnected Queues,” in ITA Workshop, 2011.
[16] L. Chen, N. Li, and S. H. Low, “On the Interaction between Load Balancing and Speed Scaling,” in ITA Workshop, 2011.
[17] O. S. Unsal and I. Koren, “System-Level Power-Aware Design Techniques in Real-Time Systems,” Proceedings of the IEEE, vol. 91, no. 7, 2003.
[18] O. S. Unsal and I. Koren, “System-Level Power-Aware Design Techniques in Real-Time Systems,” Proceedings of the IEEE, vol. 91, no. 7, 2003.
[19] Q. Zhu et al., “Hibernator: Helping Disk Arrays Sleep through the Winter,” in ACM SOSP, 2005.
[20] J. Torres et al., “Reducing Wasted Resources to Help Achieve Green Data Centers,” in IEEE IPDPS, 2008.
[21] R. Nathuji and K. Schwan, “VirtualPower: Coordinated Power Management in Virtualized Enterprise Systems,” in ACM SOSP, 2007.
[22] S. Srikantaiah, A. Kansal, and F. Zhao, “Energy Aware Consolidation for Cloud Computing,” in USENIX HotPower, 2008.
[23] J. Berral et al., “Towards Energy-Aware Scheduling in Data Centers Using Machine Learning,” in e-Energy, 2010.
[24] X. Meng et al., “Efficient Resource Provisioning in Compute Clouds via VM Multiplexing,” in ICAC, 2010.
[25] M. Wang, X. Meng, and L. Zhang, “Consolidating Virtual Machines with Dynamic Bandwidth Demand in Data Centers,” in IEEE INFOCOM Mini-Conference, 2011.
[26] L. Lu and P. Varman, “Workload Decomposition for Power Efficient Storage Systems,” in USENIX HotPower, 2008.
[27] A. Gandhi et al., “Optimal Power Allocation in Server Farms,” in ACM SIGMETRICS, 2009.
[28] X. Wang, M. Chen, C. Lefurgy, and T. W. Keller, “SHIP: Scalable Hierarchical Power Control for Large-Scale Data Centers,” in PACT, 2009.
[29] C. Stewart and K. Shen, “Some Joules Are More Precious Than Others: Managing Renewable Energy in the DC,” in USENIX HotPower, 2009.
[30] X. Fan, W.-D. Weber, and L. A. Barroso, “Power Provisioning for a Warehouse-Sized Computer,” in ISCA, 2007.
[31] V. Valancius et al., “Greening the Internet with Nano Data Centers,” in ACM CoNEXT, 2009.
[32] A. Qureshi, R. Weber, H. Balakrishnan, J. Guttag, and B. Maggs, “Cutting the Electric Bill for Internet-Scale Systems,” in ACM SIGCOMM, 2009.
[33] K. Le, R. Bianchini, M. Martonosi, and T. D. Nguyen, “Cost- and Energy-Aware Load Distribution Across Data Centers,” in USENIX HotPower, 2009.
[34] L. Rao, X. Liu, L. Xie, and W. Liu, “Minimizing Electricity Cost: Optimization of Distributed Internet Data Centers in a Multi-Electricity-Market Environment,” in IEEE INFOCOM, 2010.
[35] L. Rao, X. Liu, M. Ilic, and J. Liu, “MEC-IDC: Joint Load Balancing and Power Control for Distributed Internet Data Centers,” in the First ACM/IEEE International Conference on Cyber-Physical Systems (ICCPS), 2010.
[36] R. Stanojevic and R. Shorten, “Distributed Dynamic Speed Scaling,” in IEEE INFOCOM, 2010.
[37] Z. Liu et al., “Greening Geographical Load Balancing,” in ACM SIGMETRICS, 2011.
[38] A. Narayan S, S. Sharangi, and A. Fedorova, “Global Cost Diversity Aware Dispatch Algorithm for Heterogeneous Data Centers,” in ICPE, 2011.
[39] N. Buchbinder, N. Jain, and I. Menache, “Online Job-Migration for Reducing the Electricity Bill in the Cloud,” in IEEE INFOCOM, 2011.
[40] R. Urgaonkar et al., “Optimal Power Cost Management Using Stored Energy in Data Centers,” in ACM SIGMETRICS, 2011.
[41] N. Laoutaris et al., “Inter-Datacenter Bulk Transfers with NetStitcher,” in ACM SIGCOMM, 2011.
[42] J. Hamilton, “Inter-datacenter replication & geo-redundancy,” blog post, perspectives.mvdirona.com/2010/05/10/InterDatacenterReplicationGeo-Redundancy.aspx.
[43] J. Liu, F. Zhao, X. Liu, and W. He, “Challenges Towards Elastic Power Management in Internet Data Centers,” in the 29th IEEE International Conference on Distributed Computing Systems Workshops (ICDCSW), 2009.
[44] J. Hamilton, “Where Does the Power Go in High-Scale Data Centers,” http://www.mvdirona.com/jrh/work/.
[45] Z. Zhang et al., “Optimizing Cost and Performance in Online Service Provider Networks,” in USENIX NSDI, 2010.
[46] N. Rasmussen, “Electrical Efficiency Modeling for Data Centers,” White Paper No. 113.
[47] B. Fortz and M. Thorup, “Internet TE by Optimizing OSPF Weights,” in IEEE INFOCOM, 2000.
[48] N. Jain et al., “VL2: A Scalable and Flexible Data Center Network,” in ACM SIGCOMM, 2009.
[49] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge University Press.
[50] D. P. Bertsekas, Nonlinear Programming. Athena Scientific, 1999.
[51] L. Georgiadis, M. J. Neely, and L. Tassiulas, “Resource Allocation and Cross-Layer Control in Wireless Networks,” Foundations and Trends in Networking, vol. 1, no. 1, pp. 1-144, 2006.
[52] D. Xu and X. Liu, “Geographic Trough Filling for Internet Datacenters,” http://csiflabs.cs.ucdavis.edu/~danxu/troughfilling.pdf.

APPENDIX

To prove Proposition 1, we first need Lemma 1.

Lemma 1: For the optimization problem (7) with $\vec{\lambda}$ replaced by $\vec{\lambda}+\epsilon\vec{1}$, the resulting optimal value $g_e^*(\epsilon)$ approaches $g_e^*$ as $\epsilon$ approaches 0.

Proof: We write the Lagrangian of problem (7) as

$$L(\vec{\mu},\vec{S}) = \sum_{\omega\in\Omega}\pi_\omega\, g_\omega(S^\omega) - \sum_{j=1}^{M}\mu_j\Big(\sum_{\omega\in\Omega}\pi_\omega\sum_{i\in\Gamma_j} r_{ij}S_{ij}^{\omega} - \lambda_j\Big). \quad (27)$$

When $\vec{\lambda}$ is replaced by $\vec{\lambda}+\epsilon\vec{1}$, we have $L(\vec{S},\vec{\lambda}+\epsilon\vec{1},\vec{\mu}) \to L(\vec{S},\vec{\lambda},\vec{\mu})$ as $\epsilon\to 0$. Since (7) is a convex optimization problem, $g_e^*(\epsilon)$ approaches $g_e^*$ as $\epsilon$ approaches 0.

We next present the proof of Proposition 1.

Proof: Consider the $M$ DTJ queues $\vec{Q}(t) = (Q_1(t),\dots,Q_M(t))$.
We introduce the non-negative Lyapunov function $L(\vec{Q}(t)) = \sum_{j=1}^{M} Q_j^2(t)$ and define the one-slot Lyapunov drift as

$$\Delta(t) = \mathbb{E}\Big\{L(\vec{Q}(t+1)) - L(\vec{Q}(t)) \,\Big|\, \vec{Q}(t)\Big\}. \quad (28)$$

Using the fact that $(\max[a-b,0]+c)^2 \le a^2+b^2+c^2+2a(c-b)$ for any $a,b,c\ge 0$, we have

$$Q_j^2(t+1) - Q_j^2(t) \le \sum_{i\in\Gamma_j}(r_{ij}S_{ij}^t)^2 + (D_j^t)^2 + 2Q_j(t)\Big(D_j^t - \sum_{i\in\Gamma_j} r_{ij}S_{ij}^t\Big),\quad \forall j. \quad (29)$$

Based on (29), we further have

$$\Delta(t) \le \mathbb{E}\Big[\sum_{j=1}^{M}\sum_{i\in\Gamma_j}(r_{ij}S_{ij}^t)^2 \,\Big|\, \vec{Q}(t)\Big] + \mathbb{E}\Big[\sum_{j=1}^{M}(D_j^t)^2 \,\Big|\, \vec{Q}(t)\Big] + 2\,\mathbb{E}\Big[\sum_{j=1}^{M}Q_j(t)\Big(D_j^t - \sum_{i\in\Gamma_j} r_{ij}S_{ij}^t\Big) \,\Big|\, \vec{Q}(t)\Big]. \quad (30)$$

Note that $\sum_{j=1}^{M}\sum_{i\in\Gamma_j} r_{ij}^2 (S_{ij}^t)^2$ is bounded by $\sum_{i\in\cup_j\Gamma_j} r_i^2 (K_i^{\max})^2$, where $r_i = \max\{r_{ij} \mid j\in\Pi_i\}$, i.e., the maximum service rate with full server capacity. We have also assumed that, in each slot, the arrival traffic size of each DTJ $j$ is bounded by $D_j^m$. For brevity, define $B = \sum_{i\in\cup_j\Gamma_j} r_i^2 (K_i^{\max})^2 + \sum_j (D_j^m)^2$. Since the DTJ traffic in each slot is independent of the queue backlog $\vec{Q}(t)$, we can rewrite (30) as

$$\Delta(t) \le B + 2\sum_{j=1}^{M}Q_j(t)\lambda_j - 2\,\mathbb{E}\Big[\sum_{j=1}^{M}Q_j(t)\sum_{i\in\Gamma_j} r_{ij}S_{ij}^t \,\Big|\, \vec{Q}(t)\Big]. \quad (31)$$

We consider the drift-plus-cost of the system, where the cost is incurred by QTF. The cost term is the expected cost conditional on the queue backlog in slot $t$, i.e., $\mathbb{E}[g_q^t(S^t) \mid \vec{Q}(t)]$. With $V$ as the control variable, we have

$$\Delta(t) + V\,\mathbb{E}[g_q^t(S^t) \mid \vec{Q}(t)] \le B + 2\sum_{j=1}^{M}Q_j(t)\lambda_j - 2\,\mathbb{E}\Big[\sum_{j=1}^{M}Q_j(t)\sum_{i\in\Gamma_j} r_{ij}S_{ij}^t \,\Big|\, \vec{Q}(t)\Big] + V\,\mathbb{E}[g_q^t(S^t) \mid \vec{Q}(t)]. \quad (32)$$

From (32), we can see that QTF minimizes the drift-plus-cost bound in each time slot.
Thus we have

$$2\sum_{j=1}^{M}Q_j(t)\lambda_j + \mathbb{E}\Big[V g_q^t(S^t) - 2\sum_{j=1}^{M}Q_j(t)\sum_{i\in\Gamma_j} r_{ij}S_{ij}^t \,\Big|\, \vec{Q}(t)\Big] \le 2\sum_{j=1}^{M}Q_j(t)\lambda_j + \mathbb{E}\Big[V g_e^*(\epsilon) - 2\sum_{j=1}^{M}Q_j(t)(\lambda_j+\epsilon) \,\Big|\, \vec{Q}(t)\Big] = -2\epsilon\sum_{j=1}^{M}Q_j(t) + V g_e^*(\epsilon). \quad (33)$$

By (32) and (33), we have

$$\Delta(t) \le B - 2\epsilon\sum_{j=1}^{M}Q_j(t) + V g_e^*(\epsilon) - V\,\mathbb{E}[g_q^t(S^t) \mid \vec{Q}(t)]. \quad (34)$$

Taking expectations of the drift $\Delta(t)$ with respect to the distribution of the random queue backlog $\vec{Q}(t)$ at time $t$, we have

$$\mathbb{E}\big[L(\vec{Q}(t+1)) - L(\vec{Q}(t))\big] \le B - 2\epsilon\sum_{j=1}^{M}\mathbb{E}[Q_j(t)] + V g_e^*(\epsilon) - V\,\mathbb{E}[g_q^t(S^t)]. \quad (35)$$

This inequality holds for every time slot $t$. Summing it over $t = 1,2,\dots,T$, we have

$$\mathbb{E}\big[L(\vec{Q}(T+1)) - L(\vec{Q}(1))\big] \le TB - 2\epsilon\sum_{t=1}^{T}\sum_{j=1}^{M}\mathbb{E}[Q_j(t)] + TV g_e^*(\epsilon) - V\sum_{t=1}^{T}\mathbb{E}[g_q^t(S^t)]. \quad (36)$$

From (36), dropping the non-negative terms $\mathbb{E}[L(\vec{Q}(T+1))]$ and $V\sum_t \mathbb{E}[g_q^t(S^t)]$, we get

$$\frac{1}{T}\sum_{t=1}^{T}\sum_{j=1}^{M}\mathbb{E}[Q_j(t)] \le \frac{B + V g_e^*(\epsilon)}{\epsilon} + \frac{L(\vec{Q}(1))}{T\epsilon}. \quad (37)$$

Letting $T\to\infty$, we have $\lim_{T\to\infty}\frac{1}{T}\sum_{t=1}^{T}\sum_{j=1}^{M}\mathbb{E}[Q_j(t)] \le \frac{B + V g_e^*(\epsilon)}{\epsilon}$. Thus the queue backlog is bounded and system stability holds. Further, dropping the non-negative queue terms in (36),

$$\frac{1}{T}\sum_{t=1}^{T}\mathbb{E}[g_q^t(S^t)] \le g_e^*(\epsilon) + \frac{B}{V} + \frac{L(\vec{Q}(1))}{TV}. \quad (38)$$

Letting $T\to\infty$, we have $\lim_{T\to\infty}\frac{1}{T}\sum_{t=1}^{T}\mathbb{E}[g_q^t(S^t)] \le g_e^*(\epsilon) + \frac{B}{V}$. The left-hand side does not depend on $\epsilon$, so this bound holds for every $\epsilon > 0$; since, by Lemma 1, $g_e^*(\epsilon)\to g_e^*$ as $\epsilon\to 0$, we obtain $\lim_{T\to\infty}\frac{1}{T}\sum_{t=1}^{T}\mathbb{E}[g_q^t(S^t)] \le g_e^* + \frac{B}{V}$.
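Two steps of the proof lend themselves to a quick numerical check: the elementary inequality behind (29), and how the right-hand sides of (25) and (26) scale with $V$. The sketch below spot-checks the inequality on random non-negative triples and evaluates both bounds. The names `r_max`, `K_max`, `D_max`, `g_eps`, `g_opt`, and `eps`, and all numeric inputs, are hypothetical illustrations rather than quantities from the paper's experiments.

```python
import random


def check_square_inequality(trials=100_000, seed=0):
    """Spot-check (max[a-b,0]+c)^2 <= a^2+b^2+c^2+2a(c-b) for a, b, c >= 0."""
    rng = random.Random(seed)
    for _ in range(trials):
        a, b, c = (rng.uniform(0.0, 100.0) for _ in range(3))
        lhs = (max(a - b, 0.0) + c) ** 2
        rhs = a * a + b * b + c * c + 2 * a * (c - b)
        if lhs > rhs + 1e-9 * max(1.0, rhs):  # small float tolerance
            return False
    return True


def qtf_bounds(r_max, K_max, D_max, V, g_eps, g_opt, eps):
    """Return (B, backlog bound of (25), cost bound of (26)).

    B = sum_i r_i^2 (K_i^max)^2 + sum_j (D_j^m)^2, as in the Appendix.
    """
    B = sum(r * r * k * k for r, k in zip(r_max, K_max)) + sum(d * d for d in D_max)
    queue_bound = (B + V * g_eps) / eps  # grows linearly in V
    cost_bound = g_opt + B / V  # approaches g_e^* as V grows
    return B, queue_bound, cost_bound


if __name__ == "__main__":
    print(check_square_inequality())
    for V in (10, 100, 1000):
        print(V, qtf_bounds([2.0, 1.0], [10.0, 20.0], [5.0], V, 50.0, 45.0, 0.5))
```

As $V$ grows, the backlog bound increases linearly while the cost bound shrinks toward $g_e^*$ at rate $B/V$, which is the cost-delay tradeoff that $V$ controls in QTF.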
