Flow Splitting with Fate Sharing in a Next Generation Transport Services Architecture

UNPUBLISHED DRAFT

Janardhan Iyengar, Franklin and Marshall College, jiyengar@fandm.edu
Bryan Ford, Max Planck Institute for Software Systems, baford@mpi-sws.org

ABSTRACT

The challenges of optimizing end-to-end performance over diverse Internet paths have driven widespread adoption of in-path optimizers, which can destructively interfere with TCP's end-to-end semantics and with each other, and are incompatible with end-to-end IPsec. We identify the architectural cause of these conflicts and resolve them in Tng, an experimental next-generation transport services architecture, by factoring congestion control from end-to-end semantic functions. Through a technique we call queue sharing, Tng enables in-path devices to interpose on, split, and optimize congestion-controlled flows without affecting or seeing the end-to-end content riding these flows. Simulations show that Tng's decoupling cleanly addresses several common performance problems, such as communication over lossy wireless links and reduction of buffering-induced latency on residential links. A working prototype and several incremental deployment paths suggest Tng's practicality.

1. INTRODUCTION

Ever since TCP congestion control was introduced [56], we have found reasons to tweak it within the network. Performance enhancing proxies (PEPs) [16] improve TCP's poor performance over loss-prone wireless links [109], intermittent mobile links [8], and high-latency satellite links [26]. Due to their effectiveness and ease of deployment, PEPs now form the technical foundation of a booming $1 billion WAN optimization market [71], and are joining the growing class of middleboxes such as firewalls [45], NATs [91], and flow-aware routers [84] pervading the Internet.
PEPs are in theory compatible with the end-to-end principle [86], which argues that reliability mechanisms need to be end-to-end but explicitly allows for in-network mechanisms to enhance performance as long as they do not replace end-to-end reliability checks. Because the Internet's architecture lumps congestion control with end-to-end reliability in the transport layer, however, PEPs in the path cannot affect one function without interfering with the other. Many PEPs violate fate-sharing [27] by introducing "hard state" in the network, causing application-visible failures if a PEP crashes. All PEPs are incompatible with transport-neutral security mechanisms such as end-to-end IPsec [63], which prevent the PEP from seeing the relevant transport headers.

Figure 1: Tng Architecture Layering

Our novel solution to this architectural dilemma is to refactor the transport layer so that PEPs can cleanly interpose on and optimize congestion control behavior, without interfering with, or even seeing the protocol headers for, end-to-end functions such as reliability. We develop this approach in the context of Tng, an experimental next-generation transport that builds on ideas introduced earlier [42, 44] to address a broader class of transport issues. Tng breaks transports into four layers, shown in Figure 1. Tng's Semantic Layer implements end-to-end abstractions such as reliable byte streams; its optional Isolation Layer protects upper end-to-end layers from in-path interference; its Flow Regulation Layer factors out performance concerns such as congestion control to enable performance management by PEPs; and its Endpoint Layer factors out endpoint naming concerns such as port numbers to enable clean NAT/firewall traversal [41].
We make no claim that Tng represents "the ideal architecture," but use it here only to develop a cleaner solution to the problem of PEPs.

In this paper, we develop Tng's Flow Layer to enable PEPs in the path to interpose on or split Flow Layer sessions, much like traditional PEPs often split TCP sessions [16]. Since Tng's end-to-end security and reliability functions are implemented separately in higher layers, this flow splitting avoids interfering with higher end-to-end functions. Tng's end-to-end layers treat Flow Layer sessions as "soft state," and can restart a flow that fails due to a PEP crash or network topology change, preserving end-to-end reliability and fate-sharing. A key technical challenge flow splitting presents is joining the congestion control loops of consecutive path sections to yield end-to-end congestion control over the full path, a challenge we solve via a simple but effective technique we call queue sharing.

Through simulations we demonstrate that flow splitting via queue sharing can effectively address a variety of common performance issues, such as optimizing the performance of lossy last-mile wireless links and reducing queueing latencies on residential broadband links. While our simulations do not attempt to analyze all relevant scenarios, they illustrate the potential uses of flow splitting and suggest the feasibility of implementing it via queue sharing. We also demonstrate the feasibility of the Tng architecture through a working user-space prototype that functions on both real and simulated networks.
Finally, we discuss approaches to incremental deployment, noting that with moderate costs, a Tng stack could be (1) built entirely by rearranging existing protocols without creating any new ones; (2) deployed at OS level transparently to existing applications; and (3) made compatible with, and even made to benefit from, existing PEPs by using legacy TCP as an imperfect but workable "Flow Layer."

This work makes the following contributions. First, we identify the Internet's architectural coupling of congestion control with end-to-end semantics in the transport layer as the source of many of the difficulties PEPs create, and present a clean solution based on decoupling these functions. Second, we introduce queue sharing as a simple but effective technique for joining congestion control loops at PEPs in the Flow Layer. Third, we demonstrate that the proposed decoupling is practical and addresses a variety of common performance issues that concern home and business users.

Section 2 of this paper examines congestion control challenges and existing solutions. Section 3 briefly summarizes the Tng architecture, and Section 4 details flow splitting via queue sharing in the context of Tng. Section 5 uses simulations to test the feasibility and efficacy of flow splitting and queue sharing, and Section 6 describes our prototype together with experiments confirming Tng's practicality. Section 7 discusses incremental deployment strategies, Section 8 reviews related work, and Section 9 concludes.

2. THE CONGESTION CONUNDRUM

This section first examines the origin of TCP congestion control and the challenges it encountered as the Internet diversified, then reviews the many approaches proposed to address these challenges and their technical tradeoffs.

2.1 Why is Congestion Control in TCP?
Though network congestion was a recognized problem [30, 46], TCP did not include congestion control when it was first specified and deployed [99]. Only after several years of debate about whether congestion control should be a network or transport layer function [36, 77, 80] did the transport layer approach take hold [17, 56] and eventually become officially sanctioned [7]. TCP congestion control [5] kept routers simple and performed well on typical networks of the time. To do so, TCP endpoints infer congestion information from nothing but the absence of timely packet arrival, using an implicit heuristic model of the way typical network components are expected to behave. But this inference approach assumes that all devices on the path behave consistently according to this model, an assumption somewhat contrary to the Internet's original purpose of making diverse physical networks interoperate [27], and soon proven inaccurate [12].

Arguments for end-to-end congestion control sometimes invoke the end-to-end principle, but the principle's original formulation [86] concerns reliability, and explicitly acknowledges that performance concerns may justify in-path mechanisms augmenting (but not replacing) end-to-end reliability checks. The inclusion of congestion control in TCP thus appears more a product of historical expedience than an application of deep internetworking principles.

2.2 Patching Up TCP Congestion Control

As the Internet grew to incorporate network technologies that violate the assumed model of network behavior underlying TCP's inferences, a vast array of techniques appeared to make TCP perform adequately over these new technologies. We classify these techniques into brute force, link-layer fixes, new inference schemes, explicit feedback, transport interposition, and mid-loop tuning.
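The loss-based inference described in Section 2.1 can be illustrated with a deliberately simplified model. The sketch below is a hypothetical illustration, not TCP's actual implementation: it models only the congestion-avoidance phase of a Reno-style sender (additive increase of one segment per RTT, halving on any detected loss) and omits slow start, timeouts, and fast recovery. The function name and the "ack"/"loss" event encoding are our own. The point it shows is that the sender reacts identically to a loss caused by a full router queue and to a loss caused by radio noise.

```python
def reno_like_cwnd(events, cwnd=10.0):
    """Return the congestion window (in segments) after each per-RTT event.

    Hypothetical, simplified model of Reno-style congestion avoidance:
    events is a sequence of "ack" (a full window was delivered) or "loss"
    (some loss was detected, whatever its cause).
    """
    trace = []
    for ev in events:
        if ev == "loss":
            cwnd = max(1.0, cwnd / 2)  # any loss is read as congestion
        else:
            cwnd += 1.0                # probe for more bandwidth
        trace.append(cwnd)
    return trace

# Two identical paths; on the second, one packet is lost to radio noise.
clean_path = reno_like_cwnd(["ack"] * 6)
noisy_wireless = reno_like_cwnd(["ack", "ack", "loss", "ack", "ack", "ack"])
print(clean_path)      # [11.0, 12.0, 13.0, 14.0, 15.0, 16.0]
print(noisy_wireless)  # [11.0, 12.0, 6.0, 7.0, 8.0, 9.0]
```

The single non-congestion loss costs the second sender several RTTs of reduced sending rate, which is the behavior the techniques below try to repair.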
Brute Force: A seductively easy "sledgehammer solution" to many TCP ills is simply to open parallel TCP streams over one path, either at transport [90] or application level [4]. This approach effectively amplifies TCP's aggressiveness, boosting throughput at the cost of fairness [39]. MulTCP [29] achieves the same effect in a single TCP stream.

Link-Layer Fixes: Most wireless networks perform link-layer retransmission to reduce TCP's misinterpretation of radio noise as congestion, at the costs of introducing delay variation and reordering, and/or risking redundant retransmissions by the two layers [55, 108]. Forward error correction can reduce losses while minimizing delay and reordering, but incurs bandwidth overhead on all packets, not just those affected [25]. While link-layer fixes are useful, they incur unnecessary costs to delay/jitter-sensitive and loss-tolerant non-TCP traffic, and cannot address other issues affecting TCP such as high end-to-end round-trip times.

New Inference Schemes: Each significant new networking technology has spawned efforts to modify TCP endpoints to make better congestion control inferences when run over that technology: e.g., for mobile [20], satellite [2], wide-area wireless [21, 89], high-speed [38, 62], and ad hoc [68] networks. But there is an elephant in the room: in a diverse internetwork, one path may cross several technologies in turn, e.g., a wired LAN, then a satellite uplink, a high-speed transatlantic cable, and finally a remote ad hoc network. Yet we can choose only one end-to-end scheme for any single path; separate schemes tuned to each technology are insufficient if none performs well on the combination.
The extensive parallel literatures on high-speed [6] and wireless [68] congestion control schemes rarely interact or experiment over diverse paths, giving us little optimism that any inference-based end-to-end scheme will perform well on all current, let alone future, network technologies.

New inference schemes also face the burden of competing fairly with legacy flows [58], a constraint that may be in conflict with the goals of the new scheme itself. TCP Vegas [18], for example, works well and minimizes end-to-end delay if run alone on a network, but cannot compete fairly with traditional TCP flows [73], because the signal Vegas responds to (queue build-up) is fundamental to prevailing loss-based congestion control. Vegas can be modified to compete fairly by adding a loss-based component [98], but doing so eliminates Vegas's benefit of low delay.

Explicit Feedback: Schemes like CSFQ [95] and XCP [59] for high-speed networks, and ATCP [67] and ATP [96] for wireless networks, require routers to provide more information, such as explicit notification of losses [9], congestion [81], or link failures [51], to the TCP endpoints. But Internet router upgrades are feasible today only if done incrementally, one administrative domain at a time. Since an end-to-end path may cross several domains, congestion control schemes requiring router upgrades cannot be deployed end-to-end but only in restricted domains.

Transport Layer Interposition: Network operators often do not control end hosts and have little leverage to make users adopt new end-to-end congestion control schemes; they must instead make prevalent TCP implementations perform well by managing heterogeneity within the network. TCP-splitting PEPs [16] interpose on transport connections as they cross specific links or administrative boundaries, e.g., optimizing loss-prone [109] or mobile [8] wireless links.
These PEPs "split" an end-to-end connection into multiple sections, applying specialized algorithms to network segments exhibiting non-traditional behavior. A PEP cannot interpose on the transport's congestion control loop without interposing on its semantic functions as well, however, breaking TCP's end-to-end reliability and fate-sharing [27]. Transport interposition also interferes with end-to-end IPsec [63], since interposition is effectively a "man-in-the-middle attack" [16].

Mid-loop Tuning: An alternative to interposition is for a PEP to manipulate a connection from the middle of a congestion control loop; we refer to this approach as mid-loop tuning. For mobile/wireless networks, Snoop [11] caches TCP segments and retransmits them when it detects non-congestion packet loss; M-TCP [19] manipulates TCP's receive window to trick the sender into throttling transmission without reducing its congestion window. PEPs for high-speed networks use ACK splitting [26, 57] to trick the sender into increasing its congestion window more quickly, and window stuffing [26] to compensate for end hosts with receive buffers too small for the bandwidth-delay product. While mid-loop tuning avoids violating TCP's end-to-end semantics, it is still incompatible with IPsec, as IPsec prevents PEPs from seeing or modifying the relevant transport headers. Mid-loop tuning may also interfere destructively with modifications to end host congestion control algorithms, as occurred between Snoop and SACK [106]. Multiple PEPs residing on one end-to-end path unbeknownst to each other can also interfere: e.g., if a TCP connection crosses k wide-area links, each with an ACK splitting PEP that multiplies the sender's congestion window increase rate by a factor of n, the combination may unexpectedly multiply the sender's aggressiveness by n^k.
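The n^k compounding is plain arithmetic, sketched below under stated assumptions. We assume, hypothetically, that each ACK-splitting PEP turns every ACK it forwards into n ACKs, and that the sender grows its window once per ACK received (as in classic slow start without byte counting), so its window growth rate scales with the number of ACKs it sees. The function name is illustrative only.

```python
def acks_seen_by_sender(k: int, n: int, acks_per_window: int = 1) -> int:
    """Count ACKs reaching the sender after k ACK-splitting PEPs in series.

    Hypothetical model: each PEP splits every ACK it forwards into n ACKs,
    so the count is multiplied by n once per PEP on the path.
    """
    acks = acks_per_window
    for _ in range(k):  # one multiplication per PEP
        acks *= n
    return acks

# One PEP tuned for a 2x boost behaves as its operator intended...
print(acks_seen_by_sender(k=1, n=2))  # 2
# ...but three such PEPs on one path silently yield an 8x boost.
print(acks_seen_by_sender(k=3, n=2))  # 8
```

Each PEP's operator sees only the local factor n; no single party configured or observes the combined factor.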
Finally, mid-loop tuning by definition exploits a transport's vulnerability to manipulation, and such vulnerabilities are exploitable for malicious purposes as well; parallel research efforts are now devoted to closing these same vulnerabilities [87, 92].

3. REFACTORING THE TRANSPORT

This section briefly describes Tng's overall architecture to provide context for exploring flow splitting in the rest of the paper. We focus on those aspects relevant to understanding how Tng supports flow splitting, omitting many other details of the architecture.

3.1 Architectural Goals

Tng's functional layering, illustrated in Figure 1, builds on previously proposed ideas [44] by decomposing the Internet's traditional transport layer with a goal of cleanly separating network-oriented from application-oriented functions. We define network-oriented functions to be those concerning reliable and efficient network operation: functions that network operators care about, such as who is using the network and how it is performing. We define application-oriented functions as those concerning only application endpoints, such as application content and the end-to-end transport abstractions that applications build on. Tng's lower Endpoint and Flow Regulation Layers implement what we consider the network-oriented functions of endpoint identification and congestion control, respectively, while Tng's Isolation and Semantic Layers implement the application-oriented functions of end-to-end security and reliability.

We acknowledge that the "correct" boundary between network-oriented and application-oriented functions is not clear-cut and may be a moving target.
Tng's contribution as an architecture is not to find a perfect or complete decomposition of the transport layer, but to identify specific transport functions that have proven in practice to be "network-oriented" contrary to their traditional placement in the transport layer, and to construct a new but incrementally deployable layering that reflects this reality and restores the "end-to-endness" of the remaining application-oriented functions. The following sections briefly outline each Tng layer.

3.2 The Endpoint Layer

As in the OSI model [113], TCP/IP breaks application endpoint identifiers into Network Layer (IP address) and Transport Layer (port number) components, including only the former in the IP header on the assumption that the network need know only how to route to a given host, and leaving port numbers to be parsed and demultiplexed by the transport. As the Internet's size and diversity exploded, however, network operators needed to enforce access policies that depend on exactly who is communicating: not just which hosts, but which applications and users. Now-ubiquitous middleboxes such as firewalls [45], traffic shapers [35], and NATs [91] must therefore understand transport headers in order to enforce these network policies. Since middleboxes cannot forward traffic for transports whose headers they do not understand, new transports have become effectively undeployable other than atop TCP or UDP [85].

Recognizing that communicating rich endpoint information is a network-oriented function relevant to in-network policy enforcement, Tng factors this function into its Endpoint Layer so that middleboxes can extract this information without having to understand application-oriented headers.
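As a rough illustration of this separation, assume UDP serves as the Endpoint Layer protocol. Its header is four 16-bit big-endian fields (source port, destination port, length, checksum), so a policy-enforcing middlebox can recover the endpoint pair from the first 8 bytes and treat everything after them as opaque. The function name and the `EndpointInfo` record below are hypothetical; the sketch only shows how little a middlebox needs to parse.

```python
import struct
from typing import NamedTuple

class EndpointInfo(NamedTuple):
    src_port: int
    dst_port: int
    opaque_len: int  # bytes above the Endpoint Layer, never parsed here

# UDP header layout: src port, dst port, length, checksum (network byte order).
UDP_HEADER = struct.Struct("!HHHH")

def extract_endpoint_info(udp_segment: bytes) -> EndpointInfo:
    """Hypothetical middlebox policy hook: read only the UDP header.

    Whatever the upper layers carry (possibly encrypted) is left opaque.
    """
    if len(udp_segment) < UDP_HEADER.size:
        raise ValueError("truncated UDP header")
    src, dst, _length, _checksum = UDP_HEADER.unpack_from(udp_segment)
    return EndpointInfo(src, dst, len(udp_segment) - UDP_HEADER.size)

# The payload stands in for upper-layer data the middlebox never inspects.
payload = b"\x8f\x13\xa0 opaque upper-layer bytes"
segment = UDP_HEADER.pack(40000, 443, UDP_HEADER.size + len(payload), 0) + payload
print(extract_endpoint_info(segment))
```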
Tng reinterprets UDP [79] as an initial Endpoint Layer protocol already supported by most middleboxes, but we are evolving Tng to incorporate ideas on richer endpoint identities [102], NAT traversal [14, 41, 47], middlebox signaling [24, 105], NAT-friendly routing [48, 107], and other related ideas outside the scope of this paper.

3.3 The Flow Regulation Layer

As Tng's Endpoint Layer factors out endpoint identification, the Flow Regulation Layer similarly factors out performance-related functions such as congestion control, with the recognition that these functions have likewise become "network-oriented" in practice as discussed in Section 2. The Flow Layer assumes that the underlying Endpoint Layer provides only best-effort packet delivery between application endpoints, and builds a flow-regulated best-effort delivery service for higher layers to build on. In particular, the Flow Layer's interface to higher layers includes an explicit signal indicating when the higher layer may transmit new packets.

To perform this flow regulation, the Flow Layer may either implement standard TCP-like congestion control [56], or, as we discuss in later sections, may use more specific knowledge of an underlying network technology or administrative domain. In the longer term, we envision Tng's Flow Layer incorporating additional performance-related mechanisms such as end-to-end multihoming [93], multipath transmission [69], and forward error correction.

3.4 The Isolation Layer

Having factored out network-oriented transport functions into the Endpoint and Flow Layers, the optional Isolation Layer "isolates" the application from the network, and protects the "end-to-endness" of higher layers. This isolation includes two elements.
First, the Isolation Layer protects the application's end-to-end communication from interference or eavesdropping within the path, via transport-neutral cryptographic security as in IPsec [63]. Second, the Isolation Layer protects the application and end-to-end transport from unnecessary exposure to details of network topology and attachment points, by implementing location-independent endpoint identities as in HIP [76] or UIA [43], which remain stable even as devices move or the network reconfigures. The Isolation Layer's interface to higher layers is functionally equivalent to the interface exported by the Flow Layer, but with transformed packet payloads and/or endpoint identities.

We believe the Isolation Layer represents a suitable location for end-to-end security precisely because it defines the boundary between network-oriented and application-oriented functions, thus ensuring integrity and security of the latter, while allowing middleboxes to interact with the former. In contrast with SSL/TLS [31], the Isolation Layer is neutral to transport semantics and does not need to be adapted to each transport [83]. In contrast with IPsec's standard location immediately above IP, the Isolation Layer does give up the ability to protect Endpoint and Flow Layer mechanisms from off-path DoS attacks as IPsec protects TCP's signaling mechanisms, but if standard non-cryptographic defenses against such attacks [13, 33] are deemed insufficient, then IPsec authentication can still be deployed in Tng underneath the Flow Layer, ideally via a delegation-friendly scheme [48, 107] permitting controlled interposition by middleboxes.

Figure 2: An end-to-end path composed of multiple Flow Layer segments. Flow middleboxes can optimize network performance based on the properties of a specific segment, such as a satellite link.
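The layer interfaces described in Sections 3.3 and 3.4 can be sketched as a small stack of objects. All class and method names here (`FlowLayer`, `may_transmit`, `send`, and so on) are hypothetical, and several simplifications are assumed: the Flow Layer's "congestion control" is just a fixed window of packets in flight, the Endpoint Layer is a list standing in for the wire, the Semantic Layer's reliability machinery is omitted, and the Isolation Layer's byte reversal is a placeholder, not cryptography. What the sketch does show is the explicit send signal flowing upward and the Isolation Layer exporting the same interface as the Flow Layer.

```python
from collections import deque
from typing import Callable, Deque, List

class FlowLayer:
    """Toy Flow Layer: a fixed window stands in for real congestion control."""

    def __init__(self, window: int) -> None:
        self.window = window
        self.in_flight = 0
        self.wire: List[bytes] = []  # packets handed down to the Endpoint Layer

    def may_transmit(self) -> bool:
        # The explicit "you may send now" signal offered to higher layers.
        return self.in_flight < self.window

    def send(self, payload: bytes) -> None:
        assert self.may_transmit(), "higher layer ignored the flow signal"
        self.in_flight += 1
        self.wire.append(payload)

    def on_ack(self) -> None:
        self.in_flight -= 1

class IsolationLayer:
    """Same interface as FlowLayer, but payloads are transformed on the way down.

    `protect` is a placeholder for a real cryptographic transform.
    """

    def __init__(self, lower: FlowLayer, protect: Callable[[bytes], bytes]) -> None:
        self.lower = lower
        self.protect = protect

    def may_transmit(self) -> bool:
        return self.lower.may_transmit()

    def send(self, payload: bytes) -> None:
        self.lower.send(self.protect(payload))

class SemanticLayer:
    """Queues application data but has no congestion control of its own."""

    def __init__(self, lower) -> None:
        self.lower = lower  # a FlowLayer or anything exporting its interface
        self.pending: Deque[bytes] = deque()

    def write(self, data: bytes) -> None:
        self.pending.append(data)
        self.pump()

    def pump(self) -> None:
        # Transmit only while the layer below says it is allowed.
        while self.pending and self.lower.may_transmit():
            self.lower.send(self.pending.popleft())

flow = FlowLayer(window=2)
stack = SemanticLayer(IsolationLayer(flow, protect=lambda b: b[::-1]))
for chunk in (b"one", b"two", b"three"):
    stack.write(chunk)
print(flow.wire)  # [b'eno', b'owt']  (the third chunk waits for the window)
flow.on_ack()
stack.pump()
print(flow.wire)  # [b'eno', b'owt', b'eerht']
```

Because `SemanticLayer` only calls `may_transmit` and `send`, it works unchanged whether or not an Isolation Layer sits beneath it, mirroring the optional nature of that layer.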
3.5 The Semantic Layer

Tng's Semantic Layer implements the remaining application-oriented end-to-end transport functions, particularly end-to-end reliability. In the case of TCP, these functions are all those in the original TCP protocol [99] except port numbers, including acknowledgment and retransmission, order preservation, and receive window management. Other application-visible semantics, such as RDP's reliable datagrams [78] and SCTP's message-based multi-streaming [93], could fit equally well into Tng's Semantic Layer as distinct protocols.

The Semantic Layer's interface to lower layers differs from that of traditional Internet transports in two ways. First, a Tng semantic protocol uses the Endpoint Layer's endpoint identities (possibly transformed by the Isolation Layer) instead of implementing its own port number demultiplexing. Second, a Tng semantic protocol implements no congestion control but relies on the underlying Flow Layer to signal when packets may be transmitted. The Semantic Layer's interface to higher layers (e.g., the application) depends on the transport semantics it implements, but need not differ in any application-visible way from existing transport APIs, a fact that could aid deployment as we discuss later in Section 7.

4. FLOW SPLITTING IN Tng

With the architectural context in place, we now focus on Tng's support for flow splitting at the Flow Regulation Layer, in order to support in-path congestion control specialization without interfering with end-to-end transport functions.

4.1 Flow Middleboxes

Tng enables network operators to specialize congestion control and other flow performance concerns by deploying devices we call flow middleboxes at network technology and administrative boundaries.
As illustrated in Figure 2, a flow middlebox interposes on a Flow Layer session, effectively terminating one congestion control loop and starting another for the next section of the path. Each section may consist of one or many Network Layer hops: flow splitting does not imply hop-by-hop congestion control [72], although the latter might be viewed as a limit case of flow splitting. Each flow section may use any congestion control scheme operating according to standard principles; the key technical challenge is joining these independent segments to form a single flow providing end-to-end congestion control to higher layers, a challenge we address in Section 4.3.

While flow middleboxes are similar to PEPs, they avoid the problems of PEPs discussed in Section 2.2. Since Tng's Flow Layer implements only performance-related functions, flow middleboxes interpose on only these functions without interfering with end-to-end functions. Flow middleboxes maintain only performance-related "soft state"; end-to-end functions can recover from a flow middlebox failure since reliability and connection-related "hard state" are located at the endpoints. We demonstrate this fate-sharing in Tng through experiments using our prototype implementation in Section 6.3.

4.2 Uses of Flow Splitting

Flow splitting can be used to improve communication performance in at least three ways, which we summarize here: reducing per-section RTT, specializing to network technology, and administrative isolation.

Reducing Per-Section RTT: A TCP flow's throughput is adversely affected by large round-trip time (RTT), especially in competition with flows of smaller RTT [37]. Further, since information takes one RTT to propagate around the control loop, any end-to-end scheme's responsiveness to changing conditions is limited by RTT.
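The throughput side of this argument can be made concrete with the well-known Mathis et al. square-root approximation for steady-state Reno throughput, rate = (MSS / RTT) * sqrt(3 / (2p)). This is an illustration under strong assumptions, not a model of queue sharing: section RTTs are assumed to add up to the end-to-end RTT, losses on different sections are assumed independent, the split path is assumed to run at the rate of its slowest section, and the segment size and the example RTT and loss values are hypothetical.

```python
import math

MSS_BYTES = 1460  # assumed segment size

def reno_rate(rtt_s: float, loss: float) -> float:
    """Mathis et al. approximation of steady-state Reno throughput, in bytes/s."""
    return (MSS_BYTES / rtt_s) * math.sqrt(1.5 / loss)

def end_to_end_rate(sections):
    """One control loop spanning all sections: RTTs add, losses compound."""
    rtt = sum(r for r, _ in sections)
    delivered = math.prod(1.0 - p for _, p in sections)
    return reno_rate(rtt, 1.0 - delivered)

def split_rate(sections):
    """One loop per section; assume the joined flow runs at its slowest section."""
    return min(reno_rate(r, p) for r, p in sections)

# Hypothetical path: a lossy 20 ms wireless hop, then a clean 80 ms wide-area hop.
path = [(0.020, 0.01), (0.080, 0.0001)]
print(f"end-to-end: {end_to_end_rate(path) / 1e3:.0f} kB/s")
print(f"split:      {split_rate(path) / 1e3:.0f} kB/s")
```

Under these assumptions the split path comes out roughly five times faster, because the lossy hop's losses are now paired with its own short RTT rather than the full path's RTT. The responsiveness benefit is separate: each loop also reacts within its own, shorter RTT.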
Subdividing a path into shorter sections reduces each section's RTT to a fraction of the path's RTT, which can improve both throughput and responsiveness. Proponents of hop-by-hop congestion control schemes for packet-switched [72], cell-switched [66], and wireless networks [110] have noted this benefit. The Logistical Session Layer [97] similarly leverages the reduced RTT of split paths to improve wide-area grid performance.

Specializing to Network Technology: The literature reviewed in Section 2 amply demonstrates that the best congestion control scheme for a communication path often depends on underlying network characteristics. Flow middleboxes deployed at the boundaries of a network domain can implement a congestion control scheme specialized to that domain, taking advantage of more precise knowledge of the domain's characteristics from which to make inferences, and/or leveraging explicit feedback mechanisms [9, 51, 59, 81, 95] supported only within that domain. Although one path may traverse many such boundaries, each middlebox need only understand the properties of the adjacent path sections, reducing the "end-to-end" challenge of managing flow performance across an arbitrary set of network technologies to the more tractable challenge of interfacing technologies in pairwise combinations. The fact that one "side" of each flow middlebox is usually a standard wired LAN simplifies the challenge further.

Figure 3: Joining Sections through Queue Sharing

Administrative Isolation: Flow splitting enables administrators to split a Flow Layer path at domain boundaries and deploy a new congestion control scheme within the domain under controlled conditions, while maintaining TCP-friendliness on other sections of paths crossing the domain.
Even for legacy flows not conforming to Tng's model (e.g., flows with congestion control embedded in the Transport Layer or no congestion control at all), administrators can enforce the use of a particular congestion control scheme within a domain by encapsulating legacy streams in a Flow Layer "tunnel," an approach akin to prior mechanisms that use per-flow state at border routers or flow middleboxes to deploy new congestion control schemes within a domain [95], or to enforce TCP-friendliness [82] or differential service agreements [49]. Flow splitting thus gives administrators the freedom to choose schemes like Vegas [18] for their desirable properties, while isolating the chosen scheme from competition with legacy Reno flows and avoiding the yoke of TCP-friendliness.

4.3 Joining Flow Sections

As mentioned earlier, the primary technical challenge in implementing flow splitting is joining multiple independently congestion-controlled sections to form an end-to-end congestion-controlled path. Existing TCP-splitting PEPs leverage the buffer management and receive window control that TCP's reliable byte stream abstraction provides, but these heavyweight abstractions are not well suited to Tng's best-effort, packet-oriented Flow Layer.

Tng addresses this challenge through a simple technique we call queue sharing. We assume each flow middlebox along a split path has a queue in which it holds packets it has received on one section but not yet forwarded on to the next section. With queue sharing, the middlebox treats this queue as the meeting point for the two sections, with each section's congestion control loop taking a role in the queue's management: the two adjacent sections thus "share" this queue.

Consider for example data sent from the source host across Section 1 and arriving at the flow middlebox in Figure 3.
Instead of acknowledging a data segment immediately upon reception as TCP would, the flow middlebox silently deposits the packet in its shared queue. The transmit side of the middlebox's congestion control logic for Section 2, meanwhile, determines when the middlebox may remove packets from the shared queue and transmit them over Section 2 to the target host. When Section 2's congestion control logic decides a packet may be transmitted, the middlebox removes and transmits a packet from the shared queue, and only then allows the receive-side logic for Section 1 to acknowledge the packet's receipt. The middlebox in effect treats the shared queue as if it were the last router in Section 1, including the queue in Section 1's congestion control loop so that the sender on Section 1 (the source host in this case) throttles its transmit rate if this or any other Section 1 router queue fills.

Suppose the path's bottleneck is one of the routers in Section 2. As the bottleneck router's queue fills, Section 2's congestion control scheme detects this bottleneck, typically by sensing either a packet loss or a delay increase depending on the congestion control scheme. The flow middlebox in response cuts its transmission rate over Section 2, thereby decreasing the rate at which it removes packets from the shared queue. As the shared queue fills, Section 1's transmitter (the source host) notices either a loss or a delay increase and cuts its transmission rate in turn.

Queue sharing is simple and works with any congestion control algorithm as long as the middlebox manages the shared queue in the proper fashion for routers in the section feeding the queue. If that section consists of standard Internet routers, then the shared queue may be a standard drop-tail queue, or a RED [40] or ECN-marking [81] queue to improve performance.
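The queue-sharing mechanism described above can be sketched in a few lines. This is a minimal, hypothetical model (the class and method names are ours, not the prototype's): it assumes Section 1 is loss-based, so the shared queue behaves as a drop-tail router queue; it abstracts Section 2's congestion control as a number of transmission "credits" granted per tick; and it does not model how Section 1's sender reacts to the drops and missing ACKs. The two properties it does capture are that a packet is acknowledged upstream only once it has been forwarded downstream, and that a full shared queue drops packets, which is the backpressure signal a loss-based Section 1 sender responds to.

```python
from collections import deque

class SharedQueueMiddlebox:
    """Hypothetical flow middlebox joining Section 1 (upstream) and Section 2.

    The shared queue acts as a drop-tail queue, i.e., the "last router" of
    Section 1. A packet is ACKed on Section 1 only when Section 2's congestion
    control actually sends it onward.
    """

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.queue = deque()
        self.acks_to_section1 = []  # ACKs returned upstream, in order
        self.sent_on_section2 = []  # packets forwarded downstream, in order

    def receive_from_section1(self, seq: int) -> bool:
        """Deposit a packet without ACKing it; drop (never ACK) when full."""
        if len(self.queue) >= self.capacity:
            return False  # Section 1's sender will infer congestion
        self.queue.append(seq)
        return True

    def section2_may_send(self, credits: int) -> None:
        """Section 2's congestion control grants `credits` transmissions."""
        for _ in range(min(credits, len(self.queue))):
            seq = self.queue.popleft()
            self.sent_on_section2.append(seq)
            self.acks_to_section1.append(seq)  # ACK only after forwarding

# Section 1 offers 4 packets per tick; a congested Section 2 grants only 1.
mb = SharedQueueMiddlebox(capacity=4)
drops = []
seq = 0
for _ in range(3):
    for _ in range(4):
        if not mb.receive_from_section1(seq):
            drops.append(seq)
        seq += 1
    mb.section2_may_send(credits=1)
print("forwarded and ACKed:", mb.sent_on_section2)  # [0, 1, 2]
print("dropped at shared queue:", drops)            # [5, 6, 7, 9, 10, 11]
```

Replacing the drop in `receive_from_section1` with ECN marking, or with XCP-style header tagging, would correspond to the alternative queue disciplines the text mentions for other kinds of feeding sections.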
If the feeding section uses XCP [59], then the shared queue must behave like an XCP router, tagging packets flowing through it with congestion information.

4.4 Limitations of Queue Sharing

Queue sharing is appealing due to its simplicity and practical applicability, as explored in the following sections, but it has at least two limitations that may suggest future refinements or alternative flow joining techniques.

First, queue sharing assumes that the middlebox maintains a separate queue per flow, which may be expensive in middleboxes supporting many flows. This situation is still an improvement over the per-flow state requirements of TCP splitting PEPs, however, which typically need two queues in each direction: a receive buffer for the previous TCP session and a transmit buffer for the next.

Second, since queue sharing essentially transforms a downstream section's congestion into "backpressure" on upstream middleboxes' shared queues, congestion-related overheads can accumulate across these queues. If all sections of a path use loss-based congestion control [5], for example, and the last section contains the bottleneck, then not only the bottleneck router queue but each upstream middlebox queue fills before this backpressure reaches the sending endpoint, exacerbating the loss-based scheme's delay-inducing effects.

A possible alternative to queue sharing is to layer one end-to-end congestion control loop atop a series of per-section control loops. The Flow Layer might use XCP [59] end-to-end, for example, treating the lower-level per-section congestion control loops as "virtual links" as seen by the upper-level XCP control loop. Such an approach might address the above issues, at the cost of requiring greater end-to-end coordination; we leave such alternatives to future work.

Figure 4: Network topology used in simulations.

5. SIMULATION EXPERIMENTS

To illustrate how flow splitting can address practical difficulties caused by network heterogeneity, we explore two simple but realistic scenarios via simulation. We implemented a prototype Flow Layer supporting flow splitting in the ns2 network simulator, building on TCP congestion control algorithms already supported by the simulator, and used it to compare relevant performance properties of flows employing flow splitting against pure end-to-end flows. These scenarios are intended to illustrate the benefits of architectural support for flow splitting, not to exhaustively analyze or quantitatively predict real network performance using particular protocols. We leave analysis of more diverse scenarios and implementation tradeoffs to future work.

5.1 Getting Low Delay from Residential DSL

We first explore a typical scenario in which a residential DSL connection is used concurrently for both delay-sensitive activities such as gaming and bandwidth-intensive activities such as web browsing or file downloads. The simulation uses the topology shown in Figure 4 (Topology 1), in which a gateway on the ISP's network separates the user's client from the Internet. The client communicates with the server on the far right, while a pair of hosts generates competing cross-traffic on an intermediate network link. We configured the ADSL link according to observed parameters [32].

The ISP in this scenario offers a premium "gaming service," in which the client's gateway acts as a flow middlebox helping the client maintain low delay. The client's end host or DSL modem negotiates with the flow middlebox the use of a delay-minimizing congestion control scheme over the DSL link (we use TCP Vegas [18]), while the rest of the path from the gateway to the server uses loss-based NewReno congestion control. The bottleneck for our observed flow is at the DSL link.
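To make the delay-based versus loss-based distinction concrete, the sketch below gives a simplified per-RTT TCP Vegas window update next to a loss-driven AIMD step of the kind NewReno uses in congestion avoidance. It is an illustration, not the ns2 implementation: slow start, fast recovery, and Vegas's exact bookkeeping are omitted, the function names are ours, and the thresholds `alpha=1` and `beta=3` (in packets) are assumed values commonly quoted for Vegas. Vegas shrinks its window as soon as its estimate of its own queued packets exceeds `beta`, before any loss occurs; this keeps the DSL queue short, but it is also why Vegas cedes bandwidth to loss-based flows that keep growing until a queue overflows.

```python
def vegas_update(cwnd, base_rtt, rtt, alpha=1.0, beta=3.0):
    """One simplified TCP Vegas congestion-avoidance step, applied once per RTT.

    diff estimates how many of this flow's packets are sitting in queues:
    (expected rate - actual rate) * base_rtt, in packets.
    """
    expected = cwnd / base_rtt        # rate if there were no queueing delay
    actual = cwnd / rtt               # rate actually achieved
    diff = (expected - actual) * base_rtt
    if diff < alpha:
        return cwnd + 1               # little queued: probe for more bandwidth
    if diff > beta:
        return cwnd - 1               # queue building: back off before any loss
    return cwnd                       # between alpha and beta: hold steady


def aimd_update(cwnd, loss):
    """Simplified loss-based step: +1 packet per RTT, halve the window on loss."""
    return max(cwnd / 2, 1.0) if loss else cwnd + 1


# Hypothetical numbers: 10-packet window, 20 ms base RTT (a DSL-only loop).
print(vegas_update(10, 0.020, 0.020))   # no queueing delay -> 11
print(vegas_update(10, 0.020, 0.030))   # 10 ms of queueing -> 9
print(aimd_update(10, loss=False))      # grows regardless of delay -> 11
```

Because the Vegas rule runs once per RTT, a section with a shorter RTT applies it more often, which is one way to read the shorter-feedback-loop effect discussed for Figure 5.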
Figure 5 compares the bandwidth and one-way end-to-end delay provided by this Tng-enabled "gaming service" against the performance of either NewReno or Vegas alone operating end-to-end, in the presence of a constant upload stream from the client to the server and a varying amount of competing cross-traffic on the core Internet. The simulation adds a new TCP-NewReno cross-traffic flow every 250 seconds. As the bandwidth graph shows, end-to-end Vegas performs well until the first competing NewReno flow appears, then quickly gives up bandwidth as NewReno cross-traffic increases. End-to-end NewReno, on the other hand, competes well with the cross-traffic in securing network bandwidth, but maintains a consistently high delay, a frequent problem for users of typical DSL modems [32]. With the Tng-enabled "gaming service," in contrast, the ISP's flow middlebox isolates the Vegas algorithm controlling the DSL link from the NewReno algorithm controlling the path across the Internet core. This isolation enables the Vegas section to provide low delay without competing with NewReno flows on the same link, and enables NewReno to compete effectively for bandwidth on the Internet. In addition to the main benefit of obtaining low delay while uploading, the split Tng flow experiences slightly lower delay than end-to-end Vegas even without cross-traffic. This effect results from the shorter feedback loop that the Vegas client experiences with Tng, operating over only the ADSL link's 20ms RTT instead of the full path's 120ms RTT, an example of the effects described in Section 4.2.

Figure 5: (a) Bandwidth obtained and (b) end-to-end delay during a DSL upload, measured at 2.5 second intervals over the flow's lifetime. One TCP-NewReno cross-traffic flow is added every 250 seconds. (Panels plot (a) flow bandwidth (bps) and (b) one-way end-to-end delay (msec) against simulation time (sec) for Tng (Vegas -> NewReno), TCP-Vegas, and TCP-NewReno.)

Figure 6 shows the corresponding results during a download from the server to the client. The results are similar overall, but the Tng flow does experience some increase in delay, though not as much as end-to-end NewReno. This increase is due to our use of queue sharing to join Flow Layer sections: packets crossing from the high-bandwidth NewReno core section to the lower-bandwidth DSL section build up in a NewReno-controlled queue at the flow middlebox, as described in Section 4.4. Since this queue is on the high-bandwidth side of the network and under control of the ISP, however, it can be made small to serve the low-delay demands of the client.

Figure 6: (a) Bandwidth obtained and (b) end-to-end delay during a DSL download, measured at 2.5 second intervals over the flow's lifetime. One TCP-NewReno cross-traffic flow is added every 250 seconds. (Panels plot (a) flow bandwidth (bps) and (b) one-way end-to-end delay (msec) against simulation time (sec) for Tng (Vegas -> NewReno), TCP-Vegas, and TCP-NewReno.)

Overall, this instantiation of Tng combines the strengths of the different TCP variants in their specific domains, and thus provides a high-bandwidth, low-delay service that none of the end-to-end schemes could manage alone.

5.2 A Lossy Wireless Network

The second topology in Figure 4 uses a wireless link at the last hop with a varying loss rate. This topology is motivated by a mobile/wireless end-user who is chiefly concerned with maximizing bandwidth.
We implemented TCP-SimpleELN, a TCP variant supporting Explicit Loss Notification (ELN) [9] signals from the TCP-SimpleELN receiver. The TCP-SimpleELN receiver accepts notifications of packet loss from the underlying wireless link layer. When such a notification is received, the TCP-SimpleELN receiver sends back a message to the sender explicitly indicating the packet(s) that were dropped by the link layer. The TCP-SimpleELN sender then retransmits the dropped packet(s) without modifying the congestion window.

Figure 7 shows the performance of end-to-end TCP-NewReno and of an instantiation of Tng composed of TCP-SimpleELN on the last wireless hop and TCP-NewReno in the wide area. The loss rate increases from 0 at the beginning to 0.1% at 250 seconds, then to 1% at 500 seconds, and finally to 3% at 750 seconds. Tng is able to leverage TCP-SimpleELN's strength on the wireless link, and maximizes bandwidth for both data uploads and downloads.

Figure 7: Bandwidth obtained by data (a) upload and (b) download flows over the lossy wireless topology, measured over 2.5 second intervals, over the flow's lifetime. (Panels plot flow bandwidth (bps) against simulation time (sec) through the no-loss, 0.1%, 1%, and 3% loss phases, comparing Tng (SimpleELN -> NewReno) in (a) and Tng (NewReno -> SimpleELN) in (b) against end-to-end TCP-NewReno.)

Since TCP-SimpleELN relies on a link-layer notification, the transport receiver must be co-located with the wireless link-layer receiver. Tng makes this possible for any end-to-end flow, since the lossy link can be managed by flow middleboxes running TCP-SimpleELN over it.

6. A PROTOTYPE Tng STACK

While Section 5's simulations suggest the feasibility of joining flow sections via queue sharing, we wish to evaluate flow splitting in the context of the overall Tng architecture, to validate our original goal of supporting in-path optimization without interfering with end-to-end transport functions. To do so, we built a prototype protocol suite demonstrating the proposed refactoring of transport services into Endpoint, Flow Regulation, Isolation, and Semantic Layers, thereby achieving Tng's main goals. This section describes relevant details of our current prototype, together with experiments using the prototype that confirm Tng's feasibility and illustrate the benefits of its clean support for flow splitting.

6.1 Organization of the Prototype

Figure 8 illustrates the overall structure of the prototype, which builds on a previous experimental prototype of the Structured Stream Transport (SST) protocol [42]. SST consists of two main components: a Channel Protocol and a Stream Protocol. The Channel Protocol implements a sequenced and congestion-controlled but unreliable and unordered packet delivery service, comparable to DCCP [64], but with optional cryptographic authentication and encryption similar to that of IPsec [63] and DTLS [83]. The Stream Protocol builds on the Channel Protocol's delivery service to provide reliable, ordered byte streams semantically equivalent to TCP's, but capable of being created and destroyed more efficiently, enabling fine-grained (e.g., transactional) use of these lightweight streams.

Figure 8: Protocol Design of the Prototype.
This separation of functions within SST is why we chose it as the basis of our prototype: SST's Stream Protocol fits the role of Tng's Semantic Layer; its Channel Protocol, though it needed to be reworked as described below, serves as a starting point for both Tng's Flow and Isolation Layers; and its Channel Protocol already builds atop UDP, which serves as a starting point for Tng's Endpoint Layer.

The main challenge was implementing the Flow Regulation and Isolation Layers. To do so, we borrowed a principle of the Recursive Network Architecture [103] and adapted the Channel Protocol so that this one protocol may be instantiated in different configurations to implement both the Flow Layer and the Isolation Layer. When implementing the Flow Layer, the Channel Protocol operates with congestion control enabled but cryptographic security disabled, and we modified the protocol to allow dividing an end-to-end path into segments, each running a separate instance of the Channel Protocol with an independent congestion control loop. When implementing the Isolation Layer, the Channel Protocol operates end-to-end, using self-certifying cryptographic identifiers as in HIP [76] to give hosts stable identities as they migrate among IP addresses, and using IPsec-like encryption and authentication to secure the end-to-end channel against interposition or eavesdropping. The end-to-end channel serving as the Isolation Layer runs with its own congestion control logic disabled, relying instead on the underlying, segmented Flow Layer instance(s) of the Channel Protocol to implement this function.

The Stream Protocol does not require a stream to be attached always to the same channel: instead, a stream can attach dynamically to any available channel between the appropriate pair of hosts, as identified cryptographically by the Isolation Layer.
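The recursive use of one Channel Protocol in two configurations can be summarized as a small configuration sketch. Everything here is hypothetical (the `ChannelInstance` record, its field names, and `build_stack` are not the prototype's C++ API); it only encodes the rules stated above: one congestion-controlled, unencrypted Flow Layer instance per path segment between consecutive nodes, plus a single end-to-end Isolation Layer instance with encryption on and its own congestion control off.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChannelInstance:
    """Hypothetical description of one Channel Protocol instance."""
    layer: str                 # "flow" or "isolation"
    ends: tuple                # the two nodes this instance connects
    congestion_control: bool
    crypto: bool


def build_stack(path):
    """Instantiate the Channel Protocol recursively over a path.

    path lists the source host, any flow middleboxes, and the target host.
    """
    # Flow Layer: one independent congestion control loop per segment, no crypto.
    flows = [
        ChannelInstance("flow", (a, b), congestion_control=True, crypto=False)
        for a, b in zip(path, path[1:])
    ]
    # Isolation Layer: end-to-end security; congestion control is left to the flows.
    isolation = ChannelInstance(
        "isolation", (path[0], path[-1]), congestion_control=False, crypto=True
    )
    return flows, isolation


flows, isolation = build_stack(["client", "middlebox", "server"])
for ch in flows + [isolation]:
    print(ch.layer, ch.ends,
          "cc" if ch.congestion_control else "no-cc",
          "crypto" if ch.crypto else "clear")
```

With no middlebox on the path (`["client", "server"]`), the same function yields a single Flow Layer segment, which corresponds to an unsplit end-to-end flow.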
Each Flow Layer channel monitors the channel's condition using the same packet-level acknowledgments it uses to implement congestion control, and reports its condition to higher layers. If a flow detects a stall or failure, the Isolation Layer channel atop that flow propagates this signal upward to the Semantic Layer, which attempts to construct Flow and Isolation Layer channels representing a new or alternative communication path. If a new, authenticated end-to-end channel comes online while the old one is still unusable, the Stream Protocol migrates existing streams to the new channel transparently to the application.

Figure 9: Experimental topology for long-delay inter-site link scenario.

Figure 10: End-to-end reliable transfer performance over a high-bandwidth-delay-product link with random loss, with and without flow splitting. (The plot shows cumulative MBytes transferred against time (seconds) as the loss rate steps through no loss, 0.1%, 0.4%, 0.8%, 1.6%, and 3.2%, for Segmented (Reno+Fixed), Segmented (Vegas+Fixed), End-to-End (Reno), and End-to-End (Vegas).)

Associated with the Channel Protocol, SST uses a separate Negotiation Protocol for key exchange, similar to IPsec's IKE [60] or HIP's key exchange mechanism [75] and based on Just Fast Keying [1]. Finally, to enable hosts to find each other after changing IP addresses, SST provides a simple Registration Protocol, analogous to a name service, through which hosts can register their cryptographic identities with a registration server and look up the current network endpoints of other hosts by their cryptographic identities.

The prototype protocol suite runs in user space and is implemented in C++ using the Qt event framework [104]. It includes an asynchronous networking framework that enables it, and applications using it, to run either on real networks or in a network simulation environment for development and testing purposes.
When used in the simulation environment, the protocol suite still implements complete, working protocols that exchange and process "real" packets containing user data, so it is more faithful in this respect than many simulation environments.

6.2 Validating Flow Splitting in the Prototype

To validate flow splitting via the prototype's Channel Protocol, we test a simple network scenario corresponding to a common use of PEPs: a high-bandwidth, long-distance link, such as a reserved-bandwidth link between two sites in an organization's private network. To simplify experimentation and provide exactly reproducible results, we run the protocol suite in the prototype's network simulation environment. The experiment uses the simulated network topology shown in Figure 9, consisting of two high-bandwidth, low-delay LAN links surrounding a medium-bandwidth, high-delay WAN link, with the WAN link incurring a variable random loss rate.

In the Tng version of the scenario, the flow middleboxes surrounding the link interpose on Flow Layer sessions traversing the link to optimize flow performance. Since this inter-site link provides fixed point-to-point bandwidth, we assume that the WAN link itself needs no congestion control; only the LANs on both ends do. The WAN section runs a trivial "congestion control" scheme that merely maintains an administratively fixed transmission rate corresponding to the link's bandwidth. This way a flow using the section takes no time to ramp up to full use of the section, and there is no need for special techniques to distinguish congestion from non-congestion losses, since there are no congestion losses. Of course, to share the link among multiple flows, the middlebox must divide the link's fixed congestion window among the flows, similar to XCP's fairness controller [59].
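The fixed-rate WAN section and the division of its window among flows can be sketched as follows. This is a hypothetical illustration: the function names are ours, the link numbers in the example are made up rather than taken from Figure 9, and the even split stands in for, but is much simpler than, XCP's fairness controller. The window is set to the link's bandwidth-delay product, so a flow can use the section at full rate immediately, and because the section assumes no congestion losses, nothing in it reacts to loss.

```python
def fixed_window(bandwidth_bps, rtt_ms, packet_bytes):
    """Administratively fixed window: the link's bandwidth-delay product, in packets."""
    bits_in_flight = bandwidth_bps * rtt_ms // 1000
    return bits_in_flight // (8 * packet_bytes)


def divide_window(total, flows):
    """Split a fixed window evenly among active flows.

    Any remainder goes to the first flows in list order, so the shares
    always sum to the full window. Unlike a real congestion controller,
    nothing here reacts to loss: the reserved link is assumed to have
    no congestion losses.
    """
    if not flows:
        return {}
    share, extra = divmod(total, len(flows))
    return {f: share + (1 if i < extra else 0) for i, f in enumerate(flows)}


# Hypothetical link: 10 Mbps, 100 ms RTT, 1250-byte packets.
window = fixed_window(10_000_000, 100, 1250)
print(window)                                  # 100
print(divide_window(window, ["a", "b", "c"]))  # {'a': 34, 'b': 33, 'c': 33}
```

A static split like this must be recomputed whenever flows arrive or leave; XCP instead converges toward fairness gradually through per-packet feedback, which is one reason the text describes the middlebox's task only as "similar to" XCP's fairness controller.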
Figure 10 plots cumulative bytes transferred over time by a long reliable data transfer using the Stream Protocol, over the Tng-split flow versus an equivalent end-to-end flow, using both Reno-like and Vegas-like congestion control schemes. We plot cumulative bytes in this experiment instead of average bandwidth because the Stream Protocol's byte stream reordering creates violent artificial spikes in a bandwidth plot. Every 10 seconds in the simulation, the WAN link's random loss rate increases. This loss quickly affects end-to-end throughput, as both Reno and Vegas misinterpret the random loss as congestion loss, but in the split scenario the flow middleboxes shield the endpoints and the LAN sections from these loss effects, resulting in good performance until the loss rate becomes very large.

6.3 Recovering from Flow Layer Failures

While conventional PEPs might implement the optimizations described in the previous experiments, Tng's key novelty is its support for such optimizations without their interfering with end-to-end security or reliability. Section 6.2 already offers "proof by example" of flow splitting coexisting with end-to-end security, as the Isolation Layer channel provides end-to-end security while running atop multiple per-section Flow Layer channel instances.

To demonstrate Tng's preservation of end-to-end reliability [86] and fate-sharing [27] despite Flow Layer failures or network reconfigurations, as argued in Section 4.1, we now test the prototype in a simple migration scenario. Figure 11 shows a trace of an end-to-end, application-level data transfer using the prototype over a simulated 10Mbps link, where the IP address of one of the endpoints (the sender in this case) changes 10 seconds into the trace. Once the Flow Layer's congestion control loop detects and reports a stall as described in Section 6.1, the Semantic Layer initiates the construction of a new set of Flow and Isolation Layer channels to the remote host, which includes a new Registration Protocol query to find the host's latest IP address. As the figure indicates, the prototype requires only a few round-trips after the stall to find the host's new IP address and negotiate new end-to-end encrypted and authenticated channels, before migrating and resuming the stream transparently to the application.

Figure 11: Bandwidth trace of an end-to-end data transfer across a migration event using the Tng prototype: the sending host changes its IP address at 10 seconds. (The plot shows bandwidth (KBytes/sec) against time (seconds) for Reno and Vegas.)

If the link or network layer could provide advance warning of an impending network reconfiguration, and permit simultaneous use of the new and old network configurations during a transition period, then Tng could mask even this temporary interruption by negotiating new channels while continuing to use the old ones.

Table 1: Protocols, per-packet header overhead, and approximate code size (semicolons) of the SST-based prototype versus comparable legacy protocols from Linux-2.6.28.2. IPsec/ESP and SST use AES-CTR encryption [52] with HMAC-SHA256-128 authentication [61].

                Protocols            Header size (bytes)   Code size (semicolons)
  Layer         SST       Legacy     SST       Legacy      SST       Legacy
  Semantic      Stream    TCP        8         20          1600      5300
  Isolation     Channel   ESP        24        32          930       5300
  Flow          Channel   DCCP       12        16          (shared)  2900
  Endpoint      UDP       UDP        8         8           600       600
  Total                              52        76          3130      14100

  (The SST Channel Protocol implements both the Isolation and Flow Layers, so its 930-semicolon code size covers both rows.)

7. DEPLOYMENT STRATEGIES

Any refactoring of existing Internet protocols faces major deployment hurdles due to the Internet's inertia, and Tng is no exception.
However, we find several reasons for optimism that an architecture incorporating the principles described here could overcome these deployment hurdles. Specific strategies that can facilitate Tng's deployment follow.

Existing Protocol Reuse: A protocol stack supporting clean flow splitting as in Tng could be composed entirely of existing protocols: TCP with congestion control disabled as the Semantic Layer, IPsec as the Isolation Layer, DCCP as the Flow Layer, and UDP as the Endpoint Layer. This approach may not yield the most far-reaching benefits, and may incur overheads due to redundancies between layers: for example, Table 1 compares the minimal per-packet overhead of this reuse approach against our Tng prototype for comparable functionality (76 versus 52 bytes), as well as approximate source code line counts. Nevertheless, reuse could mitigate the difficulty of new protocol development and standardization.

Application Transparency: Our Tng prototype's Semantic Layer already provides a reliable stream abstraction compatible with TCP's: with careful design, a kernel implementation of Tng could replace TCP completely transparently to applications, dynamically probing the network and remote host for Tng support and falling back on TCP if necessary.

Compatibility with Existing PEPs: While a DCCP-like protocol is best suited to Tng's Flow Layer, a Tng stack might support the use of standard TCP as a fallback "Flow Layer," atop which the Tng stack's true Isolation and Semantic Layer protocols would run as if they were a TCP "application." Although TCP's overhead and ordering constraints may incur a performance cost, encapsulation in legacy TCP flows would make the new stack even more compatible with existing networks and capable of benefiting from existing TCP-based PEPs, and could still restore end-to-end fate-sharing by ensuring that the new Semantic Layer retains all end-to-end "hard state" and can restart failed TCP flows.

8. RELATED WORK

Prior work has explored general protocol decomposition concepts, such as cross-layer protocol stack optimization [28], modular composition [54, 74], and protocol compilation [22]. We focus, in contrast, on leveraging protocol decomposition to address the specific problem of supporting in-path flow optimizers cleanly.

Flow splitting is closely related to TCP splitting [8, 16, 109], retaining the simplicity, generality, and modularity of TCP splitting without interfering with end-to-end security or semantics. Many optimization techniques attempt to avoid breaking TCP's end-to-end semantics by silently manipulating a congestion control loop "from the middle" [11, 19], but risk unexpected interactions with other PEPs on the path or with upgraded endpoints [106], and remain incompatible with end-to-end IPsec [16], as described in Section 2.2.

Like Tng's Flow Layer, prior work has factored out congestion control for other reasons: TCP control block interdependence [101], the Congestion Manager [10], and TCP/SPAND [112] aggregate congestion state across flows, and DCCP [64] provides an unreliable, congestion-controlled datagram transport. DCCP and CM have features that complement our Flow Layer, such as CM's support for state aggregation and application-layer framing [28], and DCCP's congestion control scheme negotiation.
Other experimental transports such as Split-TCP [65], pTCP [53], mTCP [111], LS-SCTP [3], and SST [42] have factored congestion control from transport semantics internally for other reasons.

Tng's Endpoint Layer, which factors out and exposes application endpoint identities to the network, has precedent in Xerox Pup [15] and AppleTalk [88], which include "socket numbers" in their network-layer addresses, and in Sirpent [23], which treats application-level endpoints as part of Network Layer source routes. While IP's splitting of endpoint identity across layers is consistent with the OSI model [113], Tennenhouse argued against layered multiplexing due to the difficulty it presents to real-time scheduling [100], and Feldmeier elaborated on related issues [34]. Much prior work has focused on firewalls and NATs, such as NAT traversal schemes [14, 41, 47], signaling protocols [24, 105], and NAT-friendly routing architectures [48, 107]. We expect that future work exploring Tng's Endpoint Layer will draw heavily from this body of work.

Tng's Isolation Layer is inspired by location-independent addressing systems such as SFS [70], i3 [94], HIP [76], and UIA [43], and by IPsec's application-transparent security [63]; Tng's contribution is to position such mechanisms so as to avoid interference with either the network-oriented or application-oriented functions of traditional transports.

9. CONCLUSION

Driven by the challenges of optimizing Internet performance over today's explosive diversity of network technologies, the booming network acceleration industry grew in the US from $236 million in 2005 [50] to $1 billion in 2009 [71], and now markets PEPs implementing a variety of transport- and higher-level acceleration techniques.
If conventional transport layer PEPs proliferate as firewalls and NATs already have, we predict that: (a) new transports and end-to-end IPsec will become practically undeployable, even with UDP encapsulation for NAT/firewall traversal, because they will perform poorly on heterogeneous paths that optimize only TCP and not UDP traffic; and (b) multiple independent mid-loop tuning PEPs will increasingly be found accidentally cohabiting the same TCP paths, causing unpredictable control interactions and mysterious network failures.

By factoring out congestion control to support flow splitting, Tng demonstrates an architecturally clean alternative to conventional PEPs, providing the simplicity and generality of TCP splitting, but without risking unpredictable interactions among mid-loop tuning PEPs, and without interfering with end-to-end transport-neutral security, end-to-end semantics, or fate-sharing. While we make no pretense that this paper defines a complete next-generation transport services architecture, or that flow splitting alone would drive the widespread deployment of such an architecture, we hope that the many benefits potentially achievable at once from a careful factoring of congestion control from transport semantics [3, 10, 42, 101, 112] will eventually drive the deployment of a next-generation architecture incorporating these ideas.

10. REFERENCES

[1] W. Aiello et al. Just fast keying: Key agreement in a hostile Internet. TISSEC, 7(2):1–32, May 2004.
[2] I. F. Akyildiz, G. Morabito, and S. Palazzo. TCP-Peach: A new congestion control scheme for satellite IP networks. Transactions on Networking, 9(3), June 2001.
[3] A. A. E. Al, T. Saadawi, and M. Lee. LS-SCTP: a bandwidth aggregation technique for stream control transmission protocol. Computer Communications, 27(10):1012–1024, June 2004.
[4] M. Allman, H. Kruse, and S. Ostermann. An application-level solution to TCP's satellite inefficiencies. In 1st WOSBIS, Nov. 1996.
[5] M. Allman, V. Paxson, and W. Stevens. TCP congestion control, Apr. 1999. RFC 2581.
[6] A. Baiocchi, A. P. Castellani, and F. Vacirca. YeAH-TCP: Yet another highspeed TCP. In 5th PFLDnet Workshop, Feb. 2007.
[7] F. Baker, ed. Requirements for IP version 4 routers, June 1995. RFC 1812.
[8] A. V. Bakre and B. Badrinath. Implementation and performance evaluation of indirect TCP. IEEE Transactions on Computers, 46(3):260–278, Mar. 1997.
[9] H. Balakrishnan and R. H. Katz. Explicit loss notification and wireless web performance. In IEEE Globecom Internet Mini-Conference, Nov. 1998.
[10] H. Balakrishnan, H. S. Rahul, and S. Seshan. An integrated congestion management architecture for Internet hosts. In SIGCOMM, Sept. 1999.
[11] H. Balakrishnan, S. Seshan, E. Amir, and R. H. Katz. Improving TCP/IP performance over wireless networks. In 1st MOBICOM, Nov. 1995.
[12] C. Barakat, E. Altman, and W. Dabbous. On TCP performance in an heterogeneous network: A survey. Technical Report 3737, INRIA, July 1999.
[13] S. Bellovin. Defending against sequence number attacks, May 1996. RFC 1948.
[14] A. Biggadike et al. NATBLASTER: Establishing TCP connections between hosts behind NATs. In ACM SIGCOMM Asia Workshop, Apr. 2005.
[15] D. R. Boggs, J. F. Shoch, E. A. Taft, and R. M. Metcalfe. Pup: An internetwork architecture. IEEE Transactions on Communications, 28(4):612–624, Apr. 1980.
[16] J. Border et al. Performance enhancing proxies intended to mitigate link-related degradations, June 2001. RFC 3135.
[17] R. Braden, ed. Requirements for Internet hosts - communication layers, Oct. 1989. RFC 1122.
[18] L. Brakmo and L. Peterson. TCP Vegas: End to end congestion avoidance on a global Internet. IEEE Journal on Selected Areas in Communications, 13(8):1465–1480, Oct. 1995.
[19] K. Brown and S. Singh. M-TCP: TCP for mobile cellular networks. Computer Communications Review, 27(5):19–43, Oct. 1997.
[20] R. Cáceres and L. Iftode. Improving the performance of reliable transport protocols in mobile computing environments. IEEE Journal on Selected Areas in Communications, 13(5):850–857, June 1995.
[21] C. Casetti, M. Gerla, S. Mascolo, M. Sanadidi, and R. Wang. TCP Westwood: End-to-end congestion control for wired/wireless networks. Wireless Networks, 8(5):467–479, Sept. 2002.
[22] C. Castelluccia and W. Dabbous. Generating efficient protocol code from an abstract specification. In SIGCOMM, Aug. 1996.
[23] D. R. Cheriton. Sirpent: A high-performance internetworking approach. In SIGCOMM, Sept. 1989.
[24] S. Cheshire, M. Krochmal, and K. Sekar. NAT port mapping protocol, June 2005. Internet-Draft (Work in Progress).
[25] A. Chockalingam, M. Zorzi, and V. Tralli. Wireless TCP performance with link layer FEC/ARQ. In ICC, June 1999.
[26] Cisco, Inc. Rate based satellite control protocol, 2004.
[27] D. D. Clark. The design philosophy of the DARPA Internet protocols. In SIGCOMM, Aug. 1988.
[28] D. D. Clark and D. L. Tennenhouse. Architectural considerations for a new generation of protocols. In SIGCOMM, pages 200–208, 1990.
[29] J. Crowcroft and P. Oechslin. Differentiated end-to-end internet services using a weighted proportional fair sharing TCP. ACM CCR, 28(3):53–69, July 1998.
[30] D. W. Davies. The control of congestion in packet switching networks. IEEE Transactions on Communications, 20(3):546–550, June 1972.
[31] T. Dierks and E. Rescorla. The transport layer security (TLS) protocol version 1.1, Apr. 2006. RFC 4346.
[32] M. Dischinger, A. Haeberlen, K. P. Gummadi, and S. Saroiu. Characterizing residential broadband networks. In IMC, Oct. 2007.
[33] W. Eddy. TCP SYN flooding attacks and common mitigations, Aug. 2007. RFC 4987.
[34] D. C. Feldmeier. Multiplexing issues in communication system design. In SIGCOMM, Sept. 1990.
[35] P. Ferrill. Network traffic shaping tools. Processor, 28(16):4, Apr. 2006.
[36] G. G. Finn. A connectionless congestion control algorithm. ACM CCR, 19(5):12–31, Oct. 1989.
[37] S. Floyd. Connections with multiple congested gateways in packet-switched networks, part 1: One-way traffic. ACM CCR, 21(5):30–47, Oct. 1991.
[38] S. Floyd. HighSpeed TCP for large congestion windows, Dec. 2003. RFC 3649.
[39] S. Floyd and K. Fall. Promoting the use of end-to-end congestion control in the Internet. Transactions on Networking, 7(4):458–472, Aug. 1999.
[40] S. Floyd and V. Jacobson. Random early detection gateways for congestion avoidance. Transactions on Networking, 1(4):1063–6692, Aug. 1993.
[41] B. Ford. Peer-to-peer communication across network address translators. In USENIX, Apr. 2005.
[42] B. Ford. Structured streams: a new transport abstraction. In SIGCOMM, Aug. 2007.
[43] B. Ford et al. Persistent personal names for globally connected mobile devices. In 7th OSDI, Nov. 2006.
[44] B. Ford and J. Iyengar. Breaking up the transport logjam. In HotNets-VII, Oct. 2008.
[45] N. Freed. Behavior of and requirements for Internet firewalls, Oct. 2000. RFC 2979.
[46] M. Gerla and L. Kleinrock. Flow control: A comparative survey. IEEE Transactions on Communications, 28(4):553–574, Apr. 1980.
[47] S. Guha and P. Francis. Characterization and measurement of TCP traversal through NATs and firewalls. In IMC, Oct. 2005.
[48] S. Guha, Y. Takeday, and P. Francis. NUTSS: A SIP-based approach to UDP and TCP network connectivity. In SIGCOMM 2004 Workshops, Aug. 2004.
[49] A. Habib and B. Bhargava. Unresponsive flow detection and control using the differentiated services framework. In PDCS, Aug. 2001.
[50] M. Hall. WAN optimization dominated by startups, growing fast. Enterprise Networking Planet, Apr. 2006.
[51] G. Holland and N. Vaidya. Analysis of TCP performance over mobile ad hoc networks. Wireless Networks, 8(2), Mar. 2002.
[52] R. Housley. Using advanced encryption standard (AES) counter mode with IPsec encapsulating security payload (ESP), Jan. 2004. RFC 3686.
[53] H.-Y. Hsieh and R. Sivakumar. pTCP: An end-to-end transport layer protocol for striped connections. In 10th ICNP, Nov. 2002.
[54] N. C. Hutchinson and L. L. Peterson. The x-Kernel: An architecture for implementing network protocols. IEEE Transactions on Software Engineering, 17(1), Jan. 1991.
[55] H. Inamura et al. Impact of layer two ARQ on TCP performance in W-CDMA networks. In ICDCS, Mar. 2004.
[56] V. Jacobson. Congestion avoidance and control. pages 314–329, Aug. 1988.
[57] K. Jin, K. Kim, and J. Lee. SPACK: rapid recovery of the TCP performance using split-ack in mobile communication environments. In IEEE Region 10 Conference, Sept. 1999.
[58] S. Jin et al. A spectrum of TCP-friendly window-based congestion control algorithms. Transactions on Networking, 11(3):341–355, June 2003.
[59] D. Katabi, M. Handley, and C. Rohrs. Internet congestion control for high bandwidth-delay product networks. In SIGCOMM, Aug. 2002.
[60] C. Kaufman, Ed. Internet key exchange (IKEv2) protocol, Dec. 2005. RFC 4306.
[61] S. Kelly and S. Frankel. Using HMAC-SHA-256, HMAC-SHA-384, and HMAC-SHA-512 with IPsec, May 2007. RFC 4868.
[62] T. Kelly. Scalable TCP: Improving performance in highspeed wide area networks. Computer Communications Review, 33(2):83–91, Apr. 2003.
[63] S. Kent and K. Seo. Security architecture for the Internet protocol, Dec. 2005. RFC 4301.
[64] E. Kohler, M. Handley, and S. Floyd. Datagram congestion control protocol (DCCP), Mar. 2006. RFC 4340.
[65] S. Kopparty, S. V. Krishnamurthy, M. Faloutsos, and S. K. Tripathi. Split TCP for mobile ad hoc networks. In IEEE GLOBECOM, Nov. 2002.
[66] H. T. Kung and A. Chapman. The FCVC (flow-controlled virtual channels) proposal for ATM networks: A summary. In 1st ICNP, Oct. 1993.
[67] J. Liu and S. Singh. ATCP: TCP for mobile ad hoc networks. IEEE Journal on Selected Areas in Communications, 19(7):1300–1315, July 2001.
[68] C. Lochert, B. Scheuermann, and M. Mauve. A survey on congestion control for mobile ad-hoc networks. WCMC, 7(5):655–676, June 2007.
[69] L. Magalhaes and R. Kravets. Transport level mechanisms for bandwidth aggregation on mobile hosts. In 9th ICNP, Nov. 2001.
[70] D. Mazières, M. Kaminsky, M. F. Kaashoek, and E. Witchel. Separating key management from file system security. In 17th SOSP, Dec. 1999.
[71] S. McGillicuddy. WAN optimization market passes $1 billion; Cisco takes the lead. SearchEnterpriseWAN.com, Mar. 2009.
[72] P. P. Mishra and H. Kanakia. A hop by hop rate-based congestion control scheme. In SIGCOMM, Aug. 1992.
[73] J. Mo, R. J. La, V. Anantharam, and J. Walrand. Analysis and comparison of TCP Reno and Vegas. In INFOCOM, Mar. 1999.
[74] R. Morris, E. Kohler, J. Jannotti, and M. F. Kaashoek. The Click modular router. In 17th SOSP, Dec. 1999.
[75] R. Moskowitz et al. Host identity protocol, Apr. 2008. RFC 5201.
[76] R. Moskowitz and P. Nikander. Host identity protocol (HIP) architecture, May 2006. RFC 4423.
[77] J. Nagle. Congestion Control in IP/TCP Internetworks, Jan. 1984. RFC 896.
[78] C. Partridge and R. Hinden. Version 2 of the reliable data protocol (RDP), Apr. 1990. RFC 1151.
[79] J. Postel. User datagram protocol, Aug. 1980. RFC 768.
[80] W. Prue and J. Postel. Something a host could do with source quench: The source quench introduced delay (SQuID), July 1987. RFC 1016.
[81] K. Ramakrishnan, S. Floyd, and D. Black. The addition of explicit congestion notification (ECN) to IP, Sept. 2001. RFC 3168.
[82] A. Rangarajan and A. Acharya. ERUF: Early regulation of unresponsive best-effort traffic. In 7th ICNP, Oct. 1999.
[83] E. Rescorla and N. Modadugu. Datagram transport layer security, Apr. 2006. RFC 4347.
[84] L. G. Roberts. The next generation of IP - flow routing.
In SSGRR , J uly 2003. [85] J. Rosenberg. UDP and TCP as the new waist of the Internet hourglass, Feb . 2008. Internet-Draft (W ork in Progress). [86] J. H. Saltzer , D. P . Reed, and D. D. Clark. End-to-end arguments in system design. TOCS , 2(4):277 –288, Nov . 1984. [87] S. Sav age et al. TCP congestion control with a misbehaving r e ceiver . Computer Communications Review , 29(5) , Oct. 1999. [88] G. S. Sidhu, R. F . Andrews, and A. B. Oppenheimer . Inside Appletalk . Addison-W esley , 2rd edition, 1990. [89] P . Sinha et al. WTCP: A reliable transport protocol for wireless wide-area networks. W ireless Networks , 8(2):301 –316, Mar. 2002. [90] H. Siv akumar, S. Bailey , and R. Grossman. PSockets: The cas e for application-lev el network striping for data intensive applications using high speed wide area networks. In SC2000 , Nov . 2000. [91] P . Srisuresh and K. Egevang. Tr aditional IP network address translator (Tr aditional NA T), Jan. 2001. RFC 3022. [92] M. Stanojevic, R. Mahajan, T . M illstein, and M. Musuvathi. Can you fool me? tow ards automatically checking prot ocol gullibility . In HotNets-VII , Oct. 2008. [93] R. Stew art, ed. Stream control transmission protocol, Sept. 2007. RFC 4960. [94] I. Stoica et al. Internet indirection infrastructure. In SIGCOMM , Aug. 2002. [95] I. Stoica, S. Shenker , and H. Z hang. Core-stateless fair queueing: A s calable architecture to approximate fair bandwidth allocations in high speed networks. In SIGCOMM , Aug. 1998. [96] K. Sundaresan, V . Anantharaman, H. Hsieh, and R. Siv akumar. A TP: A reliable transport protocol for ad-hoc networks. In ACM MOBIHOC , June 2003. [97] M. Swan y. Impro ving throughput for grid applications with network logistics. In SC2004 , Nov . 2004. [98] K. T an, J. Song, Q. Zhang, and M. Sridharan. Compound TCP: A scalable and TCP-friendly congestion control for high-speed networks. In INFOCOM , Apr . 2006. [99] T ransmission control protocol, Sept. 1981. RFC 793. [100] D. L. 
T ennenhouse. Layered multiplexing considered harmful. In 1st International W orkshop on Pr otocols for High-Speed Networks , May 1989. [101] J. T ouch. TCP control block interdependence, Apr . 1997. RFC 2140. [102] J. T ouch. A TCP option for port names, Apr . 2006. Internet-Draft (W ork in Progress). [103] J. D. T ouch, Y .-S. W ang, and V . Pingali. A recursi ve netw ork architecture. T echnical Report ISI-TR-2006-626, Univer sity of Southern California Information Sciences Institute, Oct. 2006. [104] T rolltech. Qt cross-platform application framew ork. http://trolltech .com/products/qt/ . [105] UPnP Forum. Internet gatew ay device (IGD) standardized device cont rol protocol, Nov . 2001. http://www.upn p.org/ . [106] S. V angala and M. A. Labrador . The TC P SA CK-aware snoop protocol for TCP ov er wireless networks. In V ehicular T echnology Confer ence , Oct. 2003. [107] M. W alﬁsh, J. Stribling, M. Krohn, H. Balakrishnan, R. Morris, and S. Shenker . Middleboxes no longer considered harmful. In USENIX Symposium on Operating Systems Design and Implementation , Dec. 2004. [108] J. W . W ong and V . C. Leung. Impro ving end-to-end performance of TCP using link-layer retransmissions over mobile internetw orks. In ICC , June 1999. [109] R. Y avatkar and N. Bhagawat. Impro ving end-to-end performance of TCP o ver mobile internetworks. In W orks hop on Mobile Computing Systems and Applications , Dec. 1994. [110] Y . Y i and S. Shakkottai. Hop-by-hop congestion control ov er a wireless multi-hop network. IEEE T ransactions on Networking , 15(1):133–144, Feb. 2007. [111] M. Zhang, J. Lai, A. Krishnamurthy , L. Peterson, and R. W ang. A transport layer approach for improving end-to-end performance and robustness using redundant paths. In USENIX , June 2004. [112] Y . Zhang, L. Qiu, and S. Keshav . Speeding up short data transfers: Theory , architectural support and simulation results. In 10th NOSSDA V , June 2000. [113] H. Zimmermann. 
OSI reference model—the ISO model of architecture for open systems interconnection. IEEE T ransactions on Communications , 28(4):425–432, Apr . 1980. 12
