Path Diversity over Packet Switched Networks: Performance Analysis and Rate Allocation

Path diversity works by setting up multiple parallel connections between the end points using the topological path redundancy of the network. In this paper, \textit{Forward Error Correction} (FEC) is applied across multiple independent paths to enhan…

Authors: Shervan Fashandi, Shahab Oveis Gharan

Shervan Fashandi, Shahab Oveis Gharan and Amir K. Khandani, Member, IEEE
Electrical and Computer Engineering Department, University of Waterloo, Waterloo, ON, Canada
E-mail: {sfashand,shahab,khandani}@cst.uwaterloo.ca
Technical Report UW-E&CE#2008-09, May 2008

Abstract

Path diversity works by setting up multiple parallel connections between the end points using the topological path redundancy of the network. In this paper, Forward Error Correction (FEC) is applied across multiple independent paths to enhance the end-to-end reliability. Network paths are modeled as erasure Gilbert-Elliot channels [1]–[5]. It is known that over any erasure channel, Maximum Distance Separable (MDS) codes achieve the minimum probability of irrecoverable loss among all block codes of the same size [6], [7]. Based on the adopted model for the error behavior, we prove that the probability of irrecoverable loss for MDS codes decays exponentially for an asymptotically large number of paths. Then, the optimal rate allocation problem is solved for the asymptotic case where the number of paths is large. Moreover, it is shown that in such an asymptotically optimal rate allocation, each path is assigned a positive rate iff its quality is above a certain threshold. The quality of a path is defined as the percentage of the time it spends in the bad state. Finally, using dynamic programming, a heuristic suboptimal algorithm with polynomial runtime is proposed for rate allocation over a finite number of paths. This algorithm converges to the asymptotically optimal rate allocation when the number of paths is large.
The simulation results show that the proposed algorithm approximates the optimal rate allocation (found by exhaustive search) very closely for a practical number of paths, and provides significant performance improvement compared to alternative schemes of rate allocation.¹

Index Terms: Path diversity, Internet, MDS codes, erasure, forward error correction, rate allocation, complexity.

¹ Financial support provided by Nortel and the corresponding matching funds by the Natural Sciences and Engineering Research Council of Canada (NSERC), and Ontario Centres of Excellence (OCE) are gratefully acknowledged.

I. INTRODUCTION

In recent years, path diversity over the Internet has received significant attention. It has been shown that path diversity has the ability to simultaneously improve the end-to-end rate and reliability [3], [8]–[10]. In a dense network like the Internet, it is usually possible to find multiple independent paths between most pairs of nodes [11]–[16]. A set of paths is defined to be independent if their corresponding packet loss and delay characteristics are independent. Clearly, disjoint paths are independent as well [3], [4], [8], [11], [12], [17]–[19]. Even when the paths are not completely disjoint, their loss and delay patterns may show a high degree of independence as long as the nodes and links they share are not congestion points or bottlenecks [3], [11], [12], [14], [16]–[19]. In this paper, Forward Error Correction (FEC) is applied across multiple independent paths. Based on this model, we show that path diversity significantly enhances the performance of FEC.

In order to apply path diversity over any packet switched network, two problems need to be addressed: i) setting up multiple independent paths between the end-nodes, and ii) utilizing the given independent paths to improve the end-to-end throughput and/or reliability.
In this paper, we focus on the second problem only. However, it should be noted that the first problem has also received significant attention in the literature (see [8], [11], [12], [16], [19]–[26]). In case the end-points have enough control over the path selection process, the centralized and distributed algorithms in references [27] and [28] can be used to find multiple disjoint paths over a large connected graph. However, applying such algorithms over the Internet requires modification of the IP routing protocol and extra signaling between the nodes (routers). Of course, modifying the traditional IP network is extremely costly. To avoid such an expense, overlay networks have been introduced [16], [19], [29]. The basic idea of overlay networks is to equip very few nodes (smart nodes) with the desired new functionalities while the rest remain unchanged. The smart nodes form a virtual network connected through virtual or logical links on top of the actual network. Thus, overlay nodes can be used as relays to set up independent paths between the end nodes [22], [24]–[26], [30]. Han et al. have experimentally studied the number of available disjoint paths in the Internet using overlay networks [11]. They have also discussed the impact of network path diversity on the performance of overlay networks [12], [21]. Reference [20] addresses the problem of distributed overlay network design based on a game theoretical approach. Many other researchers have tried to optimize the design of overlay networks such that they offer the maximum degree of path diversity [22], [25], [26], [30]. Moreover, the idea of multihoming has been proposed to set up extra independent paths between the end-points [23], [24]. In this technique, the end users are connected to more than one Internet Service Provider (ISP) simultaneously.
It is shown that combining multihoming with overlay-assisted routing can improve the end-to-end performance considerably [24]. In cases where the backbone network partially consists of optical links between the nodes, each optical fiber conveys tens of independent channels (tones). There have been efforts to take advantage of this inherent physical-layer diversity in optical networks [30].

Recently, path diversity has been utilized in many applications (see [4], [31]–[34]). Reference [32] combines multiple description coding and path diversity to improve quality of service (QoS) in video streaming. Packet scheduling over multiple paths is addressed in [35] to optimize the rate-distortion function of a video stream. Reference [34] utilizes path diversity to improve the quality of Voice over IP streams. According to [34], sending some redundant voice packets through an extra path helps the receiver buffer and the scheduler optimize the trade-off between the maximum tolerable delay and the packet loss ratio. In [8], multipath routing of TCP packets is applied to control congestion with minimum signaling overhead. Content Distribution Networks (CDNs) can also take advantage of path diversity for performance improvement. CDNs are a special type of overlay network consisting of Edge Servers (nodes) responsible for delivering the contents from an original server to the end users [29], [36]. Current commercial CDNs like Akamai use path-diversity-based techniques like SureRoute to ensure that the edge servers maintain reliable connections to the original server. Video server selection schemes are discussed in [22] to maximize path diversity in CDNs.

Moreover, references [9] and [3] study the problem of rate allocation over multiple paths. Assuming each path follows the leaky bucket model, reference [9] shows that a water-filling scheme provides the minimum end-to-end delay.
On the other hand, reference [3] considers a scenario of multiple senders and a single receiver, assuming all the senders share the same source of data. The connection between each sender and the receiver is assumed to follow the Gilbert-Elliot model. They propose a receiver-driven protocol for packet partitioning and rate allocation. The packet partitioning algorithm ensures that no two senders send the same packet, while the rate allocation algorithm minimizes the probability of irrecoverable loss in the FEC scheme [3]. They only address the rate allocation problem for the case of two paths, for which a brute-force search algorithm is proposed in [3]. Generalizing this algorithm to multiple paths results in a complexity that is exponential in the number of paths. Moreover, it should be noted that, since the senders in [3] share the same data, their scenario is equivalent, without any loss of generality, to the case in which multiple independent paths connect a pair of end-nodes.

Maximum Distance Separable (MDS) codes have been shown to be optimum in the sense that they achieve the maximum possible minimum distance ($d_{min}$) among all block codes of the same size [37]. Indeed, any $[N, K]$ MDS code (with block length $N$ and $K$ information symbols) can be successfully recovered from any subset of its entries of length $K$ or more. This property makes MDS codes favorable FEC schemes over erasure channels like the Internet [38]–[40]. However, the simple and practical encoding-decoding algorithms for such codes have quadratic time complexity in terms of the code size [41].
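The MDS property above (any $K$ of the $N$ entries suffice) can be illustrated with a small Reed-Solomon-style erasure code. The sketch below is an illustration under stated assumptions, not the construction of [41]: it works over the prime field GF(257) instead of GF($2^r$), treats the $K$ data symbols as the values of a polynomial of degree less than $K$ at $x = 0, \ldots, K-1$, transmits its values at $x = 0, \ldots, N-1$, and decodes by Lagrange interpolation. It favors readability over speed, and the names `encode`, `decode` and `lagrange_eval` are hypothetical.

```python
import random
from itertools import combinations

P = 257  # prime field size q (illustrative; practical codes use GF(2^r))


def lagrange_eval(points, x0):
    """Evaluate at x0 the unique polynomial of degree < len(points)
    through the (x, y) pairs in `points`, with arithmetic mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if j != i:
                num = num * (x0 - xj) % P
                den = den * (xi - xj) % P
        # Division mod P via Fermat's little theorem: den^(P-2) = den^(-1).
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total


def encode(data, n):
    """Systematic [n, k] encoding: data[i] is the polynomial's value at x = i,
    and the n - k parity symbols are its values at x = k, ..., n - 1."""
    k = len(data)
    assert n <= P, "evaluation points must be distinct field elements"
    points = list(enumerate(data))
    return list(data) + [lagrange_eval(points, x) for x in range(k, n)]


def decode(received, k):
    """`received` maps position -> symbol. Any k entries determine the
    polynomial, hence the data; fewer than k is an irrecoverable loss."""
    if len(received) < k:
        raise ValueError("irrecoverable loss: fewer than k symbols received")
    points = sorted(received.items())[:k]
    return [lagrange_eval(points, x) for x in range(k)]


random.seed(1)
K, N = 5, 8
data = [random.randrange(P) for _ in range(K)]
codeword = encode(data, N)

# Every one of the C(8, 5) = 56 patterns with N - K = 3 erasures is recoverable.
all_recovered = all(
    decode({i: codeword[i] for i in kept}, K) == data
    for kept in combinations(range(N), K)
)
print("all erasure patterns recovered:", all_recovered)
```

The requirement `n <= P` mirrors the alphabet-size discussion that follows: with evaluation-point codes, the block length cannot exceed the number of field elements.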
Theoretically, more efficient ($O(N \log^2 N)$) MDS codes can be constructed based on evaluating and interpolating polynomials over specially chosen finite fields using the Discrete Fourier Transform [42], but these methods are not competitive in practice with the simpler quadratic methods except for extremely large block sizes. Recently, a family of almost-MDS codes with low encoding-decoding time complexity (linear in terms of the code length) has been proposed and shown to be practical over erasure channels like the Internet [43], [44]. In these codes, any subset of symbols of size $K(1 + \epsilon)$ is sufficient to recover the original $K$ symbols with high probability [44]. MDS codes also require alphabets of a large size. Indeed, all the known MDS codes have alphabet sizes growing at least linearly with the block length $N$. There is a conjecture stating that all $[N, K]$ MDS codes over the Galois field $\mathbb{F}_q$ with $1 < K < N - 1$ have the property that $N \le q + 1$, with two exceptions [37]. However, this is not an issue in practical networking applications since the alphabet size is $q = 2^r$, where $r$ is the packet size in bits, i.e., the block size is much smaller than the alphabet size. Algebraic computation over Galois fields ($\mathbb{F}_q$) of such cardinalities is now practically possible with the increasing processing power of electronic circuits. Note that network coding schemes, recently proposed and applied for content distribution over large networks, have a comparable computational complexity [45]–[47].

In this work, we utilize path diversity to improve the performance of FEC between two end-nodes over a general packet switched network like the Internet. The details of the path setup process are not discussed here. More precisely, it is assumed that $L$ independent paths are set up by a smart overlay network or any other means [8], [11], [12], [16], [18]–[26].
Each path is modeled by a two-state continuous-time Markov process called the Gilbert-Elliot channel [1]–[5]. The probability of irrecoverable loss ($P_E$) is defined as the measure of FEC performance. It is known that MDS block codes have the minimum probability of error over our end-to-end channel model, and over any other erasure channel with or without memory [6], [7]. Applying MDS codes, our analysis shows an exponential decay of $P_E$ with respect to $L$ for the asymptotic case where the number of paths is large. Of course, in many practical cases, the number of disjoint or independent paths between the end nodes is limited. However, in our asymptotic analysis, we have assumed that it is possible to find $L$ independent paths between the end points even when $L$ is large. Moreover, the optimal rate allocation problem is solved in the asymptotic case. It is seen that in the asymptotically optimal rate allocation, each path is assigned a positive rate iff its quality is above a certain threshold. The quality of a path is defined as the percentage of the time it spends in the bad state. Furthermore, using dynamic programming, a heuristic suboptimal algorithm is proposed for rate allocation over a finite number of paths (limited $L$). Unlike the brute-force search, this algorithm has a polynomial complexity in terms of the number of paths. It is shown that the result of this algorithm converges to the asymptotically optimal solution for a large number of paths. Finally, the proposed algorithm is simulated and compared with the optimal rate allocation found by exhaustive search for a practical number of paths. Simulation results verify the near-optimal performance of the proposed suboptimal algorithm in practical scenarios.

The rest of this paper is organized as follows. Section II describes the system model. The probability distribution of the bad burst duration is discussed in Section III.
The performance of FEC in the three cases of a single path, multiple identical paths, and non-identical paths is analyzed in Section IV. Section V studies the rate allocation problem and proposes a suboptimal rate allocation algorithm. Finally, Section VI concludes the paper.

II. SYSTEM MODELING AND FORMULATION

A. End-to-End Channel Model

From an end-to-end protocol's perspective, the performance of the lower layers in the protocol stack can be modeled as a random channel called the end-to-end channel. Since each packet usually includes an internal error detection code (for instance, a Cyclic Redundancy Check), the end-to-end channel is satisfactorily modeled as an erasure channel. The delay of the end-to-end channel is strongly dependent on its packet loss pattern, and affects the QoS considerably [48], [49].

In this work, the model assumed for the end-to-end channel is a two-state Markov model called the Gilbert-Elliot channel, depicted in Fig. 1. The channel spends an exponentially distributed random amount of time with mean $1/\mu_g$ in the Good state. Then, it alternates to the Bad state and stays in that state for another random duration, exponentially distributed with mean $1/\mu_b$. It is assumed that the channel state does not change during the transmission of a given packet [4], [50], [51]. Hence, if a packet is transmitted from the source at any time during the good state, it will be received correctly. Otherwise, if it is transmitted during the bad state, it will eventually be lost before reaching the destination. Therefore, the average probability of error is equal to the steady-state probability of being in the bad state, $\pi_b = \mu_g / (\mu_g + \mu_b)$. To have a reasonably low probability of error, $\mu_g$ must be much smaller than $\mu_b$.

Fig. 1. Continuous-time two-state Markov model of the end-to-end channel.
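The claim that the average error probability equals the steady-state probability $\pi_b = \mu_g/(\mu_g + \mu_b)$ can be checked by simulating the two-state model of Fig. 1 directly. The sketch below is a minimal illustration with assumed rates ($\mu_g = 1$ s⁻¹, $\mu_b = 99$ s⁻¹, hence $\pi_b = 0.01$); the function names are hypothetical.

```python
import random


def steady_state_bad(mu_g, mu_b):
    """pi_b = mu_g / (mu_g + mu_b): long-run fraction of time in the Bad state."""
    return mu_g / (mu_g + mu_b)


def simulate_bad_fraction(mu_g, mu_b, horizon, seed=0):
    """Simulate the continuous-time Gilbert-Elliot channel on [0, horizon].
    Good-state sojourns are exponential with mean 1/mu_g, Bad-state sojourns
    exponential with mean 1/mu_b. Returns the fraction of time spent in Bad."""
    rng = random.Random(seed)
    t, bad_time, bad = 0.0, 0.0, False  # start in the Good state
    while t < horizon:
        rate = mu_b if bad else mu_g
        dwell = min(rng.expovariate(rate), horizon - t)  # truncate at horizon
        if bad:
            bad_time += dwell
        t += dwell
        bad = not bad  # the two states alternate
    return bad_time / horizon


mu_g, mu_b = 1.0, 99.0  # illustrative rates with mu_g << mu_b (low error rate)
pi_b = steady_state_bad(mu_g, mu_b)
frac = simulate_bad_fraction(mu_g, mu_b, horizon=200_000.0)
print(f"pi_b = {pi_b:.4f}, simulated bad-time fraction = {frac:.4f}")
```

Because packets see the state at their transmission instant, the long-run packet loss ratio matches this time fraction.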
This model is widely used in the literature for theoretical analysis where delay is not a significant factor [1]–[5], [50]–[52]. Despite its simplicity, this model satisfactorily captures the bursty error characteristic of the end-to-end channel. More comprehensive models like the hidden Markov model are introduced in [49], [53]. Although analytically cumbersome, such models express the dependency of loss and delay more accurately.

B. Typical FEC Model

Concatenated coding is used for packet transmission. The coding inside each packet can be a simple Cyclic Redundancy Check (CRC), which enables the receiver to detect an error inside each packet. Then, the receiver can consider the end-to-end channel as an erasure channel.

Fig. 2. Rate allocation problem: a block of $N$ packets is being sent from the source to the destination through $L$ independent paths over the network during the time interval $T$ with the required rate $S_{req} = N/T$. The block is distributed over the paths according to the vector $\mathbf{N} = (N_1, \ldots, N_L)$, which corresponds to the rate allocation vector $\mathbf{S} = (S_1, \ldots, S_L)$. (a) Source, traffic allocator, paths 1 to $L$ through the Internet with $S_i = N_i/T = (N_i/N) S_{req} \le W_i$, traffic reassembler, destination. (b) The $N_i$ packets of path $i$ within a block of duration $T$, spaced $1/S_i = T/N_i$ seconds apart.
Other than the coding inside each packet, a Forward Error Correction (FEC) scheme is applied between packets. Every $K$ packets are encoded into a block of $N$ packets, where $N > K$, to create some redundancy. The $N$ packets of each block are distributed across the $L$ available independent paths, and are received at the destination with some loss (erasure). The ratio $\alpha = (N - K)/N$ defines the FEC overhead. A Maximum Distance Separable (MDS) $[N, K]$ code, such as the Reed-Solomon code, can reconstruct the original $K$ data packets at the receiver side if $K$ or more of the $N$ packets are received correctly [54]. According to the following theorem, an MDS code is the optimum block code we can design over any erasure channel. Although FEC imposes some bandwidth overhead, it might be the only option when feedback and retransmission are not feasible or fast enough to provide the desired QoS.

Definition I. An erasure channel is one which maps every input symbol to either itself or to an erasure symbol $\xi$. More precisely, an arbitrary channel (memoryless or with memory) with the input vector $\mathbf{x} \in \mathcal{X}^N$, $|\mathcal{X}| = q$, the output vector $\mathbf{y} \in (\mathcal{X} \cup \{\xi\})^N$, and the transition probability $p(\mathbf{y} \mid \mathbf{x})$ is defined to be an erasure channel iff it satisfies the following conditions:
1) $p(y_j \notin \{x_j, \xi\} \mid x_j) = 0$, $\forall j$.
2) Defining the erasure identifier vector $\mathbf{e}$ as
$$e_j = \begin{cases} 1 & y_j = \xi \\ 0 & \text{otherwise} \end{cases}$$
$p(\mathbf{e} \mid \mathbf{x})$ is independent of $\mathbf{x}$.

Theorem I. A block code of size $[N, K]$ with equiprobable codewords over an arbitrary erasure channel (memoryless or with memory) has the minimum probability of error (assuming optimum, i.e., maximum likelihood decoding) among all block codes of the same size if that code is Maximum Distance Separable (MDS).

The proof is given in [6], [7].

C. Rate Allocation Problem

The network is modeled as follows. $L$ independent paths, $1, 2, \ldots, L$, connect the source to the destination, as indicated in Fig. 2(a). Information bits are transmitted as packets, each of a constant length $r$. Furthermore, there is a constraint on the maximum rate for each path, meaning that the $i$'th path can support a maximum rate of $W_i$ packets per second. This constraint can be considered as an upper bound imposed by the physical characteristics of the path. As an example, [55] introduces the concept of the maximum TCP-friendly bandwidth for the maximum capacity of an Internet path. The $W_i$'s are assumed to be known at the transmitter side.

For a specific application and FEC scheme, we require a rate of $S_{req}$ packets per second from the source to the destination. Clearly, a feasible solution requires $S_{req} \le \sum_{i=1}^{L} W_i$. The information packets are assumed to be coded in blocks of length $N$ packets. Hence, it takes $T = N/S_{req}$ seconds to transmit a block of packets. In practical scenarios with a finite number of paths, the end-to-end required rate ($S_{req}$) is given, and the values of $N$ and $T$ have to be chosen based on the feasible complexity of the MDS decoder and the delay constraint of the application, respectively.

According to the FEC model, we can send $N_i$ packets through path $i$ as long as $\sum_{i=1}^{L} N_i = N$ and $N_i/T \le W_i$. The rate assigned to path $i$ can be expressed as $S_i = N_i/T = (N_i/N) S_{req}$, since the transmission instants of the $N_i$ packets are distributed evenly over the block duration $T$ (see Fig. 2(b)). Obviously, we have $\sum_{i=1}^{L} S_i = S_{req}$. The objective of the rate allocation problem is to find the optimal rate allocation vector, or equivalently the vector $\mathbf{N} = (N_1, \ldots, N_L)$, which minimizes the probability of irrecoverable loss ($P_E$). The above formulation of the rate allocation problem is valid for any finite number of paths and any chosen values of $N$ and $T$.
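The bookkeeping of this subsection ($T = N/S_{req}$, $S_i = N_i/T = (N_i/N) S_{req}$, and the caps $N_i/T \le W_i$) can be stated compactly as code. The sketch below uses exact fractions; $S_{req} = 1000$ pkt/sec and $N = 200$ match the numerical example of subsection IV-A, while the split $(100, 60, 40)$ and the caps $W_i$ are hypothetical values chosen for illustration, as is the function name `allocate`.

```python
from fractions import Fraction


def allocate(n_alloc, s_req, w_caps):
    """Map a packet split N = (N_1, ..., N_L) to path rates.

    Returns (T, S) with block duration T = N / S_req seconds and per-path
    rates S_i = N_i / T packets per second. Raises ValueError if a path
    would exceed its maximum rate W_i.
    """
    n = sum(n_alloc)
    T = Fraction(n, s_req)                     # seconds per block
    rates = [Fraction(n_i) / T for n_i in n_alloc]
    for i, (s_i, w_i) in enumerate(zip(rates, w_caps), start=1):
        if s_i > w_i:
            raise ValueError(f"path {i}: S_i = {s_i} exceeds W_i = {w_i}")
    return T, rates


T, S = allocate([100, 60, 40], s_req=1000, w_caps=[600, 400, 400])
print(T, [int(s) for s in S])  # prints: 1/5 [500, 300, 200]
```

The rates always sum to $S_{req}$ because $\sum_i N_i / T = N/T$; only the per-path caps can make a split infeasible.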
However, in Section IV, where the performance of path diversity is studied for a large number of paths, and also in Theorem III, where the optimality of the proposed suboptimal algorithm is proved for the asymptotic case, we assume that $N$ grows linearly in terms of the number of paths, i.e., $N = n_0 L$ for a fixed $n_0$. The reason behind this assumption is that when $L$ grows asymptotically large, the number of paths eventually exceeds the block length if $N$ stays fixed. Thus, $L - N$ paths become useless for values of $L$ larger than $N$. At the same time, it is assumed that the delay imposed by FEC, $T$, stays fixed with respect to $L$. This model results in a linearly increasing rate as the number of paths grows. We will later show that, by utilizing multiple paths, it is possible to simultaneously achieve an exponential decay in $P_E$ and a linear increase in rate, while the delay stays constant.

In this work, an irrecoverable loss is defined as the event where more than $N - K$ packets are lost in a block of $N$ packets. $P_E$ denotes the probability of this event. It should be noted that this probability is different from the decoding error probability of a maximum likelihood decoder performed on an MDS $[N, K]$ code, denoted by $P\{\mathcal{E}\}$. Theoretically, an optimum maximum likelihood decoder of an MDS code may still decode the original codeword correctly with a positive, but very small, probability if it receives fewer than $K$ symbols (packets). More precisely, such a decoder is able to correctly decode an MDS code over $\mathbb{F}_q$ with probability $1/q^i$ after receiving $K - i$ correct symbols (see the proof of Theorem I in [6], [7] for more details). Of course, for Galois fields with a large cardinality, this probability is usually negligible.
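To see how small the gap between $P_E$ and $P\{\mathcal{E}\}$ is, the sketch below evaluates both from the distribution of the number $R$ of correctly received packets, using the fact just stated: with $K - i$ correct symbols the ML decoder still succeeds with probability $1/q^i$. The i.i.d. (binomial) erasure model for $R$, the values $N = 20$, $K = 18$, a 5% loss rate, and the small field sizes are assumptions made only to have a concrete example; the paper's channel is bursty, but the same accounting applies to any distribution of $R$. Function names are hypothetical.

```python
from math import comb


def binomial_received_pmf(n, p_loss):
    """Hypothetical i.i.d.-erasure model: P{R = r} for r correct packets out of n."""
    return lambda r: comb(n, r) * (1 - p_loss) ** r * p_loss ** (n - r)


def p_irrecoverable(k, pmf):
    """P_E: probability that fewer than k packets are received correctly."""
    return sum(pmf(r) for r in range(k))


def p_ml_error(k, q, pmf):
    """P{E} for ML decoding of an MDS code over F_q: receiving k - i correct
    symbols still leads to correct decoding with probability 1 / q**i."""
    return sum(pmf(k - i) * (1 - q ** -i) for i in range(1, k + 1))


N, K = 20, 18
pmf = binomial_received_pmf(N, p_loss=0.05)
pe = p_irrecoverable(K, pmf)
pml = {q: p_ml_error(K, q, pmf) for q in (2, 16, 256)}
for q, value in pml.items():
    print(f"q = {q:3d}: P_E = {pe:.6f}, P{{E}} = {value:.6f}, "
          f"P_E(1 - 1/q) = {pe * (1 - 1 / q):.6f}")
```

Already for $q = 256$ the two quantities differ by less than 1% of $P_E$; for $q = 2^r$ with realistic packet sizes the difference is far smaller.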
The relationship between $P_E$ and $P\{\mathcal{E}\}$ can be summarized as follows:
$$P\{\mathcal{E}\} = P_E - \sum_{i=1}^{K} \frac{P\{K - i \text{ packets received correctly}\}}{q^i} \ge P_E - \frac{1}{q} \sum_{i=1}^{K} P\{K - i \text{ packets received correctly}\} = P_E \left(1 - \frac{1}{q}\right) \tag{1}$$
where the last equality holds because the sum is the probability that fewer than $K$ packets are received correctly, i.e., $P_E$. Hence, $P\{\mathcal{E}\}$ is bounded as
$$P_E \left(1 - \frac{1}{q}\right) \le P\{\mathcal{E}\} \le P_E. \tag{2}$$
The reason $P_E$ is used as the measure of system performance is that, while many practical low-complexity decoders for MDS codes work perfectly if the number of correctly received symbols is at least $K$, their probability of correct decoding is much less than that of maximum likelihood decoders when the number of correctly received symbols is less than $K$ [54]. Thus, in the rest of this paper, $P_E$ is used as a close approximation of the decoding error probability.

III. PROBABILITY DISTRIBUTION OF BAD BURSTS

The continuous random variable $B_i$ is defined as the duration of time that path $i$ spends in the bad state during a block duration $T$. We denote the values of $B_i$ by the parameter $t$ to emphasize that they are expressed in units of time. In this section, we focus on one path, for example path 1. Therefore, the index $i$ can be temporarily dropped in analyzing the probability density function (pdf) of $B_i$. We define the events $g$ and $b$, respectively, as the channel being in the good or bad state at the start of a block. Then, the distribution of $B$ can be written as
$$f_B(t) = f_{B|b}(t)\,\pi_b + f_{B|g}(t)\,\pi_g. \tag{3}$$
To proceed further, two assumptions are made. First, it is assumed that $\pi_g \gg \pi_b$, or equivalently $1/\mu_g \gg 1/\mu_b$. This condition is valid for a channel with a reasonable quality. Second, the block time $T$ is assumed to be much shorter than the average good state duration $1/\mu_g$, i.e., $\mu_g T \ll 1$, such that a block of duration $T$ contains either no bad burst or a single one (see [1], [3], [4] for justification).
More precisely, the probability of having at least two bad bursts is negligible compared to the probability of having exactly one bad burst. However, it should be noted that all the results of this paper except subsection IV-A remain valid regardless of these two assumptions. Of course, in that case, the exact probability distribution function of $B_i$ should be used instead of the approximation used here (refer to Remark I in subsection IV-B). Hence, the pdf of $B$ conditioned on the event $b$ can be approximated as
$$f_{B|b}(t) = \mu_b e^{-\mu_b t} + \delta(t - T)\, e^{-\mu_b T}, \qquad 0 \le t \le T, \tag{4}$$
where $\delta(u)$ is the Dirac delta function. Equation (4) follows from the memoryless nature of the exponential distribution, the assumption that $T$ contains at most one bad burst, and the fact that any bad burst longer than $T$ has to be truncated at $B = T$. To compute $f_{B|g}(t)$, we have
$$f_{B|g}(t) = P\{B = 0 \mid g\}\,\delta(t) - \frac{\partial}{\partial t} P\{B > t \mid g\} \tag{5}$$
where
$$P\{B = 0 \mid g\} = e^{-\mu_g T} \approx 1 - \mu_g T \tag{6}$$
and
$$P\{B > t \mid g\} \overset{(a)}{=} \left(1 - e^{-\mu_g (T - t)}\right) e^{-\mu_b t} \approx \mu_g (T - t)\, e^{-\mu_b t} \tag{7}$$
where (a) results from the fact that the event $\{B > t \mid g\}$ is equivalent to the initial good burst being shorter than $T - t$, the following bad burst being longer than $t$, and the duration $T$ containing at most one bad burst. Now, combining (4), (5), (6), and (7), $f_B(t)$ can be computed.

A. Discrete to Continuous Approximation

To compute the probability of irrecoverable loss ($P_E$), we have to find the probability of $k_i$ packets being lost out of the $N_i$ packets transmitted through path $i$, for $i$ from 1 to $L$ and $k_i$ from 0 to $N_i$.

Fig. 3. A bad burst of duration $B_i$ happens in a block of length $T$. $E_i = 3$ packets are corrupted or lost during the interval $B_i$. Packets are transmitted every $1/S_i$ seconds, where $S_i$ is the rate of path $i$ in pkt/sec. (Legend: correctly received packet; lost or incorrect packet; bad burst.)
Let us denote the number of erroneous or lost packets over path $i$ by the random variable $E_i$. Any two subsequent packets transmitted over path $i$ are $1/S_i$ seconds apart in time, where $S_i$ is the transmission rate over the $i$'th path. We observe that the probability $P\{E_i \ge k_i\}$ can be approximated by its continuous counterpart $P\{B_i \ge k_i/S_i\}$ when the inter-packet interval is much shorter than the typical bad burst ($1/S_i \ll 1/\mu_b$, or equivalently $\mu_b \ll S_i$). The necessity of this condition can be intuitively justified as follows. If this condition does not hold, consecutive packets are transmitted in practically independent states of the channel. Thus, no gain would be achieved by applying diversity over multiple independent paths. Figure 3 shows an example of this approximation in detail. The continuous approximation simplifies the mathematical analysis, as discussed in Section IV.

Fig. 4. Probability of irrecoverable loss ($P_E$, logarithmic axis) versus $\mu_b T$ for one path with fixed $\mu_g$, $T$ and $\alpha$: simulation results and theoretical prediction.

IV. PERFORMANCE ANALYSIS OF FEC ON MULTIPLE PATHS

Assume that a rate allocation algorithm assigns $N_i$ packets to path $i$. According to the discrete to continuous approximation in subsection III-A, when the $N_i$ packets of the FEC block are sent over path $i$, the loss count can be written as $B_i N_i / T$. Hence, the total ratio of lost packets is equal to
$$\sum_{i=1}^{L} \frac{B_i N_i}{T N} = \sum_{i=1}^{L} \rho_i \frac{B_i}{T}$$
where $\rho_i = S_i / S_{req}$, $0 \le \rho_i \le 1$, denotes the portion of the bandwidth assigned to path $i$. Furthermore, $x_i = B_i / T$ is defined as the portion of time that path $i$ has been in the bad state ($0 \le x_i \le 1$). Hence, the probability of irrecoverable loss for an MDS code is equal to
$$P_E = P\left\{ \sum_{i=1}^{L} \rho_i x_i > \alpha \right\} \tag{8}$$
where $\alpha = (N - K)/N$.
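Equation (8) can be estimated directly by Monte Carlo simulation of the two-state channel on each path, which also gives a check, for a single path, of the bad-burst tail $P\{B > \alpha T\}$ implied by (4) and the exact form (a) of (7). The sketch below assumes identical paths with illustrative parameters $T = 200$ ms, $\mu_g T = 0.05$, $\mu_b T = 15$ (a regime where two bad bursts in one block are rare, so the one-burst approximation of Section III is accurate), $\alpha = 0.1$, and uniform allocation $\rho_i = 1/L$. It samples bad time rather than individual packets, in line with the discrete to continuous approximation; all function names are hypothetical.

```python
import math
import random


def bad_fraction(mu_g, mu_b, T, rng):
    """x = B / T: fraction of one block that a Gilbert-Elliot path spends in
    the Bad state, starting from the steady-state distribution
    (memorylessness allows a fresh exponential sojourn at time 0)."""
    bad = rng.random() < mu_g / (mu_g + mu_b)  # P{Bad at block start} = pi_b
    t = bad_time = 0.0
    while t < T:
        dwell = min(rng.expovariate(mu_b if bad else mu_g), T - t)
        if bad:
            bad_time += dwell
        t += dwell
        bad = not bad
    return bad_time / T


def estimate_pe(rho, alpha, mu_g, mu_b, T, trials, seed=0):
    """Monte Carlo estimate of P_E = P{sum_i rho_i x_i > alpha}, eq. (8)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        lost = sum(r * bad_fraction(mu_g, mu_b, T, rng) for r in rho)
        hits += lost > alpha
    return hits / trials


def single_path_tail(alpha, mu_g, mu_b, T):
    """P{B > alpha T} from (4) and the exact form (a) of (7), weighted by pi_b, pi_g."""
    pi_b = mu_g / (mu_g + mu_b)
    decay = math.exp(-mu_b * alpha * T)
    return pi_b * decay + (1 - pi_b) * (1 - math.exp(-mu_g * (1 - alpha) * T)) * decay


T, alpha = 0.2, 0.1
mu_g, mu_b = 0.05 / T, 15.0 / T  # mu_g T = 0.05, mu_b T = 15
pe_1 = estimate_pe([1.0], alpha, mu_g, mu_b, T, trials=200_000)
pe_4 = estimate_pe([0.25] * 4, alpha, mu_g, mu_b, T, trials=20_000, seed=1)
tail = single_path_tail(alpha, mu_g, mu_b, T)
print(f"L = 1: Monte Carlo {pe_1:.5f} vs. tail from (4) and (7) {tail:.5f}")
print(f"L = 4, uniform rho: Monte Carlo {pe_4:.5f}")
```

A small positive gap between the $L = 1$ estimate and the analytic tail is expected, since the simulation also counts the rare blocks with two bad bursts that (7) neglects.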
In order to find the optimum rate allocation, $P_E$ has to be minimized with respect to the allocation vector (the $\rho_i$'s), subject to the following constraints:
$$0 \le \rho_i \le \min\left(1, \frac{W_i}{S_{req}}\right), \qquad \sum_{i=1}^{L} \rho_i = 1 \tag{9}$$
where $W_i$ is the bandwidth constraint on path $i$ defined in subsection II-C. Note that since the $x_i$'s are proportional to the $B_i$'s, their pdf can be easily computed from the pdf of the $B_i$'s.

A. Performance of FEC on a Single Path

The probability of irrecoverable loss for one path is equal to
$$P_E = P\{B > \alpha T\} = P\{B > \alpha T \mid b\}\,\pi_b + P\{B > \alpha T \mid g\}\,\pi_g$$
where, using the assumptions in Section III and equations (4) and (7), $P\{B > \alpha T \mid b\}$ and $P\{B > \alpha T \mid g\}$ can be computed as
$$P\{B > \alpha T \mid b\} = \int_{\alpha T}^{T} f_{B|b}(t)\,dt = e^{-\mu_b \alpha T}, \qquad P\{B > \alpha T \mid g\} = \int_{\alpha T}^{T} f_{B|g}(t)\,dt = \mu_g (1 - \alpha) T\, e^{-\mu_b \alpha T}.$$
Thus, we have
$$P_E = \pi_b e^{-\mu_b \alpha T} \left(1 + \mu_b (1 - \alpha) T\right) \overset{(a)}{\approx} \left(\frac{1}{\mu_b} + (1 - \alpha) T\right) \mu_g e^{-\mu_b \alpha T} \tag{10}$$
where the first equality uses $\pi_g \mu_g = \pi_b \mu_b$, and (a) follows from the assumption that the end-to-end channel has a low probability of error ($1/\mu_g \gg 1/\mu_b$). As we observe, for large values of $\mu_b T$, $P_E$ decays exponentially with $\mu_b T$.

Figure 4 shows the results of simulating a typical scenario of streaming data between two end-points with the rate $S_{req} = 1000$ pkt/sec, the block length $N = 200$, and the number of information packets $K = 180$. These values result in a block transmission time of $T = 200$ ms. The average good burst duration of the end-to-end channel, $1/\mu_g$, is selected such that $\mu_g T = 1/5$. The average bad burst duration, $1/\mu_b$, varies such that $\mu_b T$ ranges from 8 to 40, in accordance with the values in [3], [4]. The slope of the best linear fit (in semilog scale) to the simulation points is 0.097, which agrees with the value of 0.100 resulting from the theoretical approximation in (10).

B. Identical Paths

When the paths are identical and have equal bandwidth constraints² ($W_i = W$ for all $1 \le i \le L$), due to the symmetry of the problem, the uniform rate allocation ($\rho_i = 1/L$) is the optimum solution. Of course, this solution is feasible only when $1/L \le W/S_{req}$. Then, the probability of irrecoverable loss can be simplified as
$$P_E = P\left\{ \frac{1}{L} \sum_{i=1}^{L} x_i > \alpha \right\}. \tag{11}$$

² The case where the $W_i$'s are different is discussed in Remark V of subsection IV-C.

Let us define $Q(x)$ as the probability density function of $x$. Since $x = B/T$, clearly $Q(x) = T f_B(xT)$. Defining $E\{\cdot\}$ as the expected value operator throughout this paper, $E\{x\}$ can be computed from $Q(x)$. We observe that in (11), the random variables $x_i$ are bounded and independent. Hence, the following well-known upper bound from large deviation theory [56] can be applied:
$$P_E \le e^{-u(\alpha) L}, \qquad u(\alpha) = \begin{cases} 0 & \alpha \le E\{x\} \\ \lambda \alpha - \log E\{e^{\lambda x}\} & \text{otherwise} \end{cases} \tag{12}$$
where the log function is computed in the natural (Neperian) base, and $\lambda$ is the solution of the following nonlinear equation, which is shown to be unique by Lemma I:
$$\alpha = \frac{E\{x e^{\lambda x}\}}{E\{e^{\lambda x}\}}. \tag{13}$$
Since $\lambda$ is unique, we can define $l(\alpha) = \lambda$. Even though it is an upper bound, inequality (12) is exponentially tight for large values of $L$ [56]. More precisely,
$$P_E \doteq e^{-u(\alpha) L} \tag{14}$$
where the notation $\doteq$ means $\lim_{L \to \infty} -\frac{\log P_E}{L} = u(\alpha)$. Now, we state two useful lemmas whose proofs can be found in Appendices A and B.

Lemma I. $u(\alpha)$ and $l(\alpha)$ have the following properties:
1) $\frac{\partial}{\partial \alpha} l(\alpha) > 0$
2) $l(\alpha = 0) = -\infty$
3) $l(\alpha = E\{x\}) = 0$
4) $l(\alpha = 1) = +\infty$
5) $\frac{\partial}{\partial \alpha} u(\alpha) = l(\alpha) > 0$ for $\alpha > E\{x\}$

Lemma II. Defining $y = \frac{1}{L} \sum_{i=1}^{L} x_i$, where the $x_i$'s are i.i.d. random variables as already defined, the probability density function of $y$ satisfies $f_y(\alpha) \doteq e^{-u(\alpha) L}$ for all $\alpha > E\{x\}$.
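Computing the exponent $u(\alpha)$ of (12) only requires solving the scalar equation (13) for $\lambda$. Because the tilted mean $E\{x e^{\lambda x}\}/E\{e^{\lambda x}\}$ increases with $\lambda$ (property 1 of Lemma I), bisection suffices. The sketch below is a hedged illustration, not the authors' code: it works with a discretized distribution of $x$, checks itself on the Bernoulli case (where $u(\alpha)$ is the Kullback-Leibler divergence between Bernoulli($\alpha$) and Bernoulli($p$)), and then discretizes $Q(x) = T f_B(xT)$ from (3)–(5) using the exact forms in (6) and (7)(a), with $\mu_g T = 1/5$ and $\pi_b = 0.015$ as in the Fig. 5 experiment. The grid size, the bisection bracket and the function names are arbitrary, hypothetical choices.

```python
import math


def rate_function(alpha, xs, ps, lam_hi=500.0, iters=100):
    """u(alpha) of eq. (12) for a discrete law P{x = xs[k]} = ps[k] on [0, 1].
    Solves eq. (13) for lambda by bisection on [0, lam_hi]."""
    if alpha <= sum(x * p for x, p in zip(xs, ps)):
        return 0.0  # alpha <= E{x}: no exponential decay

    def tilt(lam):
        # Weights p_k e^{lam x_k}, scaled by e^{-m} to avoid overflow.
        m = max(lam * x for x in xs)
        return m, [p * math.exp(lam * x - m) for x, p in zip(xs, ps)]

    lo, hi = 0.0, lam_hi
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        _, w = tilt(mid)
        tilted_mean = sum(x * wk for x, wk in zip(xs, w)) / sum(w)
        lo, hi = (mid, hi) if tilted_mean < alpha else (lo, mid)
    lam = 0.5 * (lo + hi)
    m, w = tilt(lam)
    log_mgf = m + math.log(sum(w))  # log E{e^{lam x}}
    return lam * alpha - log_mgf


def discretized_q(mu_g_T, pi_b, grid=1000):
    """Discretize Q(x) = T f_B(xT): atoms at x = 0 and x = 1 plus
    midpoint-rule masses for the continuous part on (0, 1)."""
    a = mu_g_T
    c = a * (1 - pi_b) / pi_b  # mu_b T, since pi_b = mu_g / (mu_g + mu_b)
    pi_g = 1 - pi_b
    xs = [0.0, 1.0]
    ps = [pi_g * math.exp(-a), pi_b * math.exp(-c)]  # no bad time; bad all block
    for k in range(grid):
        x = (k + 0.5) / grid
        dens_b = c * math.exp(-c * x)                      # from (4)
        g = math.exp(-a * (1 - x))
        dens_g = math.exp(-c * x) * (a * g + c * (1 - g))  # -d/dx of (7)(a)
        xs.append(x)
        ps.append((pi_b * dens_b + pi_g * dens_g) / grid)
    return xs, ps


# Self-check: for Bernoulli(p), u(alpha) equals the KL divergence D(alpha || p).
p, a0 = 0.1, 0.3
kl = a0 * math.log(a0 / p) + (1 - a0) * math.log((1 - a0) / (1 - p))
u_bern = rate_function(a0, [0.0, 1.0], [1 - p, p])

xs, ps = discretized_q(mu_g_T=0.2, pi_b=0.015)
alphas = [0.05, 0.075, 0.1, 0.125, 0.15, 0.175, 0.2]
exponents = [rate_function(al, xs, ps) for al in alphas]
print(f"Bernoulli check: {u_bern:.6f} vs KL {kl:.6f}")
for al, u in zip(alphas, exponents):
    print(f"alpha = {al:.3f}: u(alpha) = {u:.3f}")
```

Per (14), $L \cdot u(\alpha)$ approximates $-\log P_E$ for large $L$, so the printed exponents correspond to slopes of $\log P_E$ versus $L$.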
Figure 5 compares the theoretical and simulation results. We assume the block transmission time is $T = 200$ ms. The block length is proportional to the number of paths as $N = 20L$. The good-burst parameter of the end-to-end channel, $\mu_g$, is selected such that $\mu_g T = 1/5$. The end-to-end channel has the error probability $\pi_b = 0.015$. The coding overhead is varied from $\alpha = 0.05$ to $\alpha = 0.2$. The probability of irrecoverable loss is plotted versus the number of paths, $L$, in semilogarithmic scale in Fig. 5(a) for different values of $\alpha$. We observe that as $L$ increases, $\log P_E$ decays linearly, as expected from equation (12). Also, Fig. 5(b) compares the slope of each plot in Fig. 5(a) with $u(\alpha)$. Figure 5 shows good agreement between the theory and the simulation results, and also verifies that the stronger the FEC code (larger $\alpha$), the higher the gain achieved through path diversity (larger exponent).

Fig. 5. (a) $P_E$ vs. $L$ for different values of $\alpha$ (from 0.05 to 0.2). (b) The exponent (slope) of plot (a) for different values of $\alpha$: experimental versus theoretical values of $u(\alpha)$.

Remark I. Equation (14) is a direct result of the discrete to continuous approximation in subsection III-A. Therefore, it remains valid even if the other approximations in section III do not hold. For example, if the block time contains more than one bad burst, equations (4) and (7) are no longer valid. However, equation (14) is still valid as long as the discrete to continuous approximation is used.
Of course, in this case, the exact distributions of $B$ and $x$ should be used to compute $u(\alpha)$ and $\lambda$ instead of their simplified versions.

Remark II. A special case is when the block code uses all the bandwidth of the paths. In this case, we have $N = LWT$, where $W$ is the maximum bandwidth of each path, and $T$ is the block duration. Assuming $\alpha > E\{x\}$ is a constant independent of $L$, we observe that the information packet rate is equal to $K/T = (1-\alpha)WL$, and the error probability is $P_E \doteq e^{-u(\alpha)L}$. This shows that using MDS codes over multiple independent paths simultaneously provides an exponential decay in the irrecoverable loss probability and a linearly growing end-to-end rate in terms of the number of paths.

C. Non-Identical Paths

Now, let us assume there are $J$ types of paths between the source and the destination, consisting of $L_j$ identical paths of type $j$ ($\sum_{j=1}^{J} L_j = L$). Without loss of generality, we assume that the paths are ordered according to their associated type, i.e., the paths from $1 + \sum_{k=1}^{j-1} L_k$ to $\sum_{k=1}^{j} L_k$ are of type $j$. We denote $\gamma_j = L_j/L$. According to the i.i.d. assumption, $\rho_i$ has to be the same for all paths of the same type. $\eta_j$ and $y_j$ are defined as $\eta_j = \sum_{\sum_{k=1}^{j-1} L_k} \ldots \alpha\}$,
$$\mathcal{S}_O = \left\{(\beta_1, \beta_2, \ldots, \beta_J) \;\middle|\; 0 \le \beta_j \le 1,\ \sum_{j=1}^{J} \beta_j = \alpha\right\}, \qquad \mathcal{S}_T = \left\{(\beta_1, \beta_2, \ldots, \beta_J) \;\middle|\; \eta_j E\{x_j\} \le \beta_j,\ \sum_{j=1}^{J} \beta_j = \alpha\right\}$$
respectively. Hence, $P_E$ can be written as
$$\begin{aligned}
P_E &= P\left\{\sum_{j=1}^{J} y_j > \alpha\right\} = \int_{\mathcal{S}_I} \prod_{j=1}^{J} f_{y_j}(\beta_j)\, d\beta_j \doteq \int_{\mathcal{S}_I} e^{-L\sum_{j=1}^{J}\gamma_j u_j\left(\frac{\beta_j}{\eta_j}\right)}\, d\boldsymbol{\beta} \\
&\overset{(a)}{\doteq} e^{-L \min_{\boldsymbol\beta \in \mathcal{S}_I \cup \mathcal{S}_O} \sum_{j=1}^{J}\gamma_j u_j\left(\frac{\beta_j}{\eta_j}\right)}
\overset{(b)}{\doteq} e^{-L \min_{\boldsymbol\beta \in \mathcal{S}_O} \sum_{j=1}^{J}\gamma_j u_j\left(\frac{\beta_j}{\eta_j}\right)}
\overset{(c)}{\doteq} e^{-L \min_{\boldsymbol\beta \in \mathcal{S}_T} \sum_{j=1}^{J}\gamma_j u_j\left(\frac{\beta_j}{\eta_j}\right)}
\overset{(d)}{\doteq} e^{-L \sum_{j=1}^{J}\gamma_j u_j\left(\frac{\beta^\star_j}{\eta_j}\right)}
\end{aligned} \tag{16}$$
where (a) follows from Lemma III, (b) follows from the fact that $u_j(\alpha)$ is a strictly increasing function of $\alpha$ for $\alpha > E\{x_j\}$, and (c) can be proved as follows. Let us denote the vector which minimizes the exponent over the set $\mathcal{S}_O$ as $\hat{\boldsymbol\beta}^\star$. Since $\mathcal{S}_T$ is a subset of $\mathcal{S}_O$, $\hat{\boldsymbol\beta}^\star$ is either in $\mathcal{S}_T$ or in $\mathcal{S}_O - \mathcal{S}_T$. In the former case, (c) is obviously valid. When $\hat{\boldsymbol\beta}^\star \in \mathcal{S}_O - \mathcal{S}_T$, we can prove by contradiction that $0 \le \hat\beta^\star_j \le \eta_j E\{x_j\}$ for all $1 \le j \le J$. Let us assume the opposite is true, i.e., there is at least one index $1 \le j \le J$ such that $0 \le \hat\beta^\star_j \le \eta_j E\{x_j\}$, and at least one other index $1 \le k \le J$ such that $\eta_k E\{x_k\} < \hat\beta^\star_k$. Then, knowing that the derivative of $u_j(\alpha)$ is zero for $\alpha = E\{x_j\}$ and strictly positive for $\alpha > E\{x_j\}$, a small increase in $\hat\beta^\star_j$ and an equal decrease in $\hat\beta^\star_k$ reduces the objective function $\sum_{j=1}^{J}\gamma_j u_j(\beta_j/\eta_j)$, which contradicts the assumption that $\hat{\boldsymbol\beta}^\star$ is a minimum point. Knowing that $0 \le \hat\beta^\star_j < \eta_j E\{x_j\}$ for all $1 \le j \le J$, it is easy to show that the minimum value of the objective function over $\mathcal{S}_O$ is zero, and that $\mathcal{S}_T$ has to be an empty set. Defining the minimum value of the positive objective function over an empty set ($\mathcal{S}_T$) as zero makes (c) valid for the latter case where $\hat{\boldsymbol\beta}^\star \in \mathcal{S}_O - \mathcal{S}_T$. Finally, applying Lemma IV results in (d), where $\boldsymbol\beta^\star$ is defined in the lemma.

Lemma III. For any continuous positive function $h(\mathbf{x})$ over a convex set $\mathcal{S}$, and defining $H(L)$ as
$$H(L) = \int_{\mathcal{S}} e^{-h(\mathbf{x})L}\, d\mathbf{x},$$
we have
$$\lim_{L\to\infty} -\frac{\log H(L)}{L} = \inf_{\mathcal{S}} h(\mathbf{x}) = \min_{\mathrm{cl}(\mathcal{S})} h(\mathbf{x})$$
where $\mathrm{cl}(\mathcal{S})$ denotes the closure of $\mathcal{S}$ (refer to [57] for the definition of the closure operator). The proof of Lemma III can be found in appendix C.

Lemma IV.
There exists a unique vector $\boldsymbol\beta^\star$ with the elements $\beta^\star_j = \eta_j\, l_j^{-1}\!\left(\frac{\nu\eta_j}{\gamma_j}\right)$ which minimizes the convex function $\sum_{j=1}^{J}\gamma_j u_j(\beta_j/\eta_j)$ over the convex set $\mathcal{S}_T$, where $\nu$ satisfies the following condition:
$$\sum_{j=1}^{J} \eta_j\, l_j^{-1}\!\left(\frac{\nu\eta_j}{\gamma_j}\right) = \alpha. \tag{17}$$
Here $l^{-1}(\cdot)$ denotes the inverse of the function $l(\cdot)$ defined in subsection IV-B. The proof of Lemma IV can be found in appendix D.

Equation (16) is valid for any fixed value of $\boldsymbol\eta$. To achieve the most rapid decay of $P_E$, the exponent must be maximized over $\boldsymbol\eta$:
$$\lim_{L\to\infty} -\frac{\log P_E}{L} = \max_{0 \le \eta_j \le 1} \sum_{j=1}^{J} \gamma_j u_j\!\left(\frac{\beta^\star_j}{\eta_j}\right) \tag{18}$$
where $\boldsymbol\beta^\star$ is defined in Lemma IV for any value of the vector $\boldsymbol\eta$. Theorem II solves the maximization problem in (18) and identifies the asymptotically optimum rate allocation (for a large number of paths).

Theorem II. Consider a point-to-point connection over the network with $L$ independent paths from the source to the destination, each modeled as a Gilbert-Elliot cell, with a large enough bandwidth constraint³. The paths are of $J$ different types, with $L_j$ paths of type $j$. Assume a block FEC of size $[N, K]$ is sent during a time interval $T$. Let $N_j$ denote the number of packets in a block of size $N$ assigned to the paths of type $j$, such that $\sum_{j=1}^{J} N_j = N$. The rate allocation vector $\boldsymbol\eta$ is defined as $\eta_j = N_j/N$. For fixed values of $\gamma_j = L_j/L$, $n_0 = N/L$, $k_0 = K/L$, $T$, and an asymptotically large number of paths $L$, the optimum rate allocation vector $\boldsymbol\eta^\star$ can be found by solving the following optimization problem:
$$\max_{\boldsymbol\eta}\ g(\boldsymbol\eta) \quad \text{s.t.} \quad \sum_{j=1}^{J} \eta_j = 1,\quad 0 \le \eta_j \le 1$$
where $g(\boldsymbol\eta) = \sum_{j=1}^{J} \gamma_j u_j(\beta^\star_j/\eta_j)$, and $\boldsymbol\beta^\star$ is an implicit function of $\boldsymbol\eta$ defined in Lemma IV. The functions $u_j(\cdot)$ and $l_j(\cdot)$ are defined in subsections IV-B and IV-C. Solving the above optimization problem gives the unique solution $\boldsymbol\eta^\star$ as
$$\eta^\star_j = \begin{cases} 0 & \text{if } \alpha \le E\{x_j\} \\[4pt] \dfrac{\gamma_j\, l_j(\alpha)}{\sum_{i:\ \alpha > E\{x_i\}} \gamma_i\, l_i(\alpha)} & \text{otherwise} \end{cases} \tag{19}$$
if there is at least one $1 \le j \le J$ for which $\alpha > E\{x_j\}$. Otherwise, when $\alpha \le E\{x_j\}$ for all $1 \le j \le J$, the maximum value is zero for any arbitrary rate allocation vector $\boldsymbol\eta$. In any case, the maximum value of the objective function is $g(\boldsymbol\eta^\star) = \sum_{j=1}^{J} \gamma_j u_j(\alpha)$, which is indeed the exponent of $P_E$ versus $L$. The proof of the theorem can be found in appendix E.

³ By the term 'large enough', we mean that the bandwidth constraint on a path of type $j$, $W_j$, satisfies the condition $\frac{\eta_j n_0}{T\gamma_j} \le W_j$. The reason is that $\eta_j$ must satisfy both conditions $0 \le \eta_j \le 1$ and $\frac{N_j}{T L_j} = \frac{\eta_j n_0 L}{T\gamma_j L} \le W_j$ simultaneously. When $W_j$ is large enough such that $\frac{\eta_j n_0}{T\gamma_j} \le W_j$, the latter condition is automatically satisfied, and the optimization problem can be solved.

Fig. 6. (a) $P_E$ versus $L$ for the combination of two path types, one third of type I and the rest of type II. (b) The normalized aggregated weight of type I paths in the optimal rate allocation ($\eta^{opt}_1$), compared with the value of $\eta_1$ which maximizes the exponent of equation (18) ($\eta^\star_1$).

Remark III. Theorem II can be interpreted as follows. For large values of $L$, adding a new type of path contributes to the path diversity iff the path satisfies the quality constraint $\alpha > E\{x\}$, where $x$ is the fraction of time that the path spends in the bad state in the time interval $[0, T]$. Only in this case does adding the new type of path exponentially improve the performance of the system in terms of the probability of irrecoverable loss.

Remark IV.
Observing the exponent corresponding to the optimum allocation vector $\boldsymbol\eta^\star$, we can see that the typical error event occurs when the ratio of lost packets on every type of path is the same as the total fraction of lost packets, $\alpha$. However, this is not the case for an arbitrary rate allocation vector $\boldsymbol\eta$.

Remark V. An interesting extension of Theorem II is the case where all types have identical erasure patterns ($u_j(x) = u_k(x)$ for all $1 \le j, k \le J$ and all $x$), but different bandwidth constraints. Adopting the notation of Theorem II, the bandwidth constraint on $\eta_j$ can be written as $\frac{\eta_j n_0 L}{T\gamma_j L} \le W_j$, where $W_j$ is the maximum bandwidth for a path of type $j$. Let us define $\tilde{\boldsymbol\eta}^\star$ as the allocation vector which maximizes the objective function of Theorem II ($g(\boldsymbol\eta)$) and also satisfies the bandwidth constraints. $\boldsymbol\eta^\star$ is defined as the maximizing vector for the unconstrained problem in Theorem II. According to equation (19), we have $\eta^\star_j = \gamma_j$ for all $1 \le j \le J$. It is obvious that $\tilde{\boldsymbol\eta}^\star = \boldsymbol\eta^\star$ if $\eta^\star_j \le \gamma_j \frac{W_j T}{n_0}$ for all $j$. In case $\eta^\star_j$ does not satisfy the bandwidth constraint for some $j$, $\tilde{\boldsymbol\eta}^\star$ can be found by the water-filling algorithm. More precisely, we have
$$\tilde\eta^\star_j = \begin{cases} \gamma_j \dfrac{W_j T}{n_0} & \text{if } \dfrac{W_j T}{n_0} \le \Upsilon \\[4pt] \gamma_j \Upsilon & \text{if } \Upsilon < \dfrac{W_j T}{n_0} \end{cases} \tag{20}$$
i.e., $\tilde\eta^\star_j = \gamma_j \min\{W_j T/n_0, \Upsilon\}$, where $\Upsilon$ can be found by imposing the condition $\sum_{j=1}^{J} \tilde\eta^\star_j = 1$. Figure 7 depicts water-filling among identical paths with four different bandwidth constraints. The proof of equation (20) can be found in appendix F.

Figure 6(a) shows $P_E$ of the optimum rate allocation versus $L$ for a system consisting of two types of paths. The optimal rate allocation is found by exhaustive search among all possible allocation vectors. The block transmission time is $T = 200$ ms. The block length is proportional to the number of paths as $N = 20L$.
The good-burst parameter $\mu_g$ is selected such that $\mu_g T = 1/5$ for both types of paths. A fraction $\gamma_1 = 1/3$ of the paths (the first type) benefits from shorter bad bursts and a lower error probability of $\pi_{b,1} = 0.015$, and the rest (the second type) suffer from longer congestion bursts, resulting in a higher error probability of $\pi_{b,2} = 0.025$. The coding overhead is $\alpha = 0.1$. The figure depicts a linear behavior in semilogarithmic scale with an exponent of 0.403, which is comparable to the value of 0.389 resulting from (19). In the scenario of Fig. 6(a), let us denote by $\eta^\star_1$ the first element of $\boldsymbol\eta$ in equation (19). Obviously, $\eta^\star_1$ does not depend on $L$. Moreover, $\eta^{opt}_1$ is defined as the normalized aggregated weight of type I paths in the optimal rate allocation. Figure 6(b) compares $\eta^{opt}_1$ with $\eta^\star_1$ for different numbers of paths. It is observed that $\eta^{opt}_1$ converges rapidly to $\eta^\star_1$ as $L$ grows. Figure 6 thus also verifies that the allocation vector candidate $\boldsymbol\eta^\star$ proposed by Theorem II indeed matches the optimal allocation vector for large values of $L$.

Fig. 7. Water-filling algorithm over identical paths with four different bandwidth constraints (levels $W_j T/n_0$, water level $\Upsilon$).

V. SUBOPTIMAL RATE ALLOCATION

In order to compute the complexity of the rate allocation problem, we focus our attention on the original discrete formulation in subsection II-C. According to the model of subsection IV-C, we assume the available paths are of $J$ types, with $L_j$ paths of type $j$, such that $\sum_{j=1}^{J} L_j = L$. Obviously, all the paths of the same type should have equal rate. Therefore, the rate allocation problem is turned into finding the vector $\mathbf{N} = (N_1, \ldots, N_J)$ such that $\sum_{j=1}^{J} N_j = N$ and $0 \le N_j \le L_j W_j T$ for all $j$, where $N_j$ denotes the number of packets assigned to all the paths of type $j$.
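The search space just described is what an exhaustive (optimal) search has to visit. A minimal Python sketch that enumerates it, assuming integer per-type caps that stand in for $\lfloor L_j W_j T \rfloor$ (the function name `allocations` is hypothetical); when no cap binds, the number of vectors equals $\binom{N+J-1}{J-1}$.

```python
from math import comb

def allocations(total, caps):
    """Yield every (N_1, ..., N_J) with 0 <= N_j <= caps[j] and sum(N_j) == total."""
    if len(caps) == 1:
        if total <= caps[0]:
            yield (total,)
        return
    for first in range(min(total, caps[0]) + 1):
        for rest in allocations(total - first, caps[1:]):
            yield (first,) + rest

if __name__ == "__main__":
    N, J = 20, 4
    uncapped = list(allocations(N, [N] * J))
    print(len(uncapped), comb(N + J - 1, J - 1))  # both 1771
    capped = list(allocations(N, [3, 5, 8, 20]))  # hypothetical caps floor(L_j W_j T)
    print(len(capped))
```

The recursion simply fixes $N_1$ and recurses on the remaining types, so its running time is proportional to the (exponentially large in $J$) number of candidates.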
Let us temporarily assume that all paths have enough bandwidth such that $N_j$ can vary from 0 to $N$ for all $j$. There are $\binom{N+J-1}{J-1}$ $J$-dimensional non-negative vectors of the form $(N_1, \ldots, N_J)$ which satisfy the equation $\sum_{j=1}^{J} N_j = N$, each representing a distinct rate allocation. Hence, the number of candidates is exponential in terms of $J$.

First, we prove that the problem of rate allocation is NP [58], in the sense that $P_E$ can be computed in polynomial time for any candidate vector $\mathbf{N} = (N_1, \ldots, N_J)$. Let us define $P^{\mathbf N}_e(k, j)$ as the probability of having more than $k$ errors over the paths of types 1 to $j$ for a specific allocation vector $\mathbf{N}$. We also define $Q_j(n, k)$ as the probability of having exactly $k$ errors out of the $n$ packets sent over the paths of type $j$. $Q_j(n, k)$ can be computed and stored for all path types and values of $n$ and $k$ with polynomial complexity, as explained in appendices G and H. Then, the following recursive formula holds for $P^{\mathbf N}_e(k, j)$:
$$P^{\mathbf N}_e(k, j) = \begin{cases} \sum_{i=0}^{N_j} Q_j(N_j, i)\, P^{\mathbf N}_e(k-i, j-1) & \text{if } k \ge 0 \\ 1 & \text{if } k < 0 \end{cases}, \qquad P^{\mathbf N}_e(k, 1) = \sum_{i=k+1}^{N_1} Q_1(N_1, i). \tag{21}$$
To compute $P^{\mathbf N}_e(K, J)$ by the above recursive formula, we apply a well-known technique from the theory of algorithms called memoization [59]. Memoization works by storing the computed values of a recursive function in an array. By keeping this array in memory, memoization avoids recomputing the function for the same arguments when it is called later. To compute $P^{\mathbf N}_e(K, J)$, an array of size $O(KJ)$ is required. This array should be filled with the values of $P^{\mathbf N}_e(k, j)$ for $0 < k \le K$ and $1 \le j \le J$. Computing $P^{\mathbf N}_e(k, j)$ requires $O(K)$ operations, assuming the values of $P^{\mathbf N}_e(i, j-1)$, $Q_j(N_j, i)$, and $\sum_{i=k+1}^{N_j} Q_j(N_j, i)$ are already computed for $0 \le i \le k$.
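A minimal Python sketch of recursion (21) with memoization (via `functools.lru_cache`). It assumes a hypothetical stand-in for $Q_j(n, k)$, namely independent packet losses with a per-type probability $p_j$, rather than the Gilbert-Elliot-based $Q_j$ of appendices G and H; the function names are illustrative. For brevity it sums over all $i$ instead of using the precomputed tail sums, so each call costs $O(N_j)$ rather than $O(K)$.

```python
from functools import lru_cache
from math import comb

def binomial_q(p):
    """Hypothetical Q_j(n, k): each of n packets is lost independently with probability p."""
    def q(n, k):
        if k < 0 or k > n:
            return 0.0
        return comb(n, k) * p ** k * (1 - p) ** (n - k)
    return q

def prob_more_than(k_max, alloc, qs):
    """P_e^N(k_max, J) from recursion (21).

    alloc[j-1] is N_j, the number of packets on path type j; qs[j-1] is that type's Q_j(n, k).
    """
    J = len(alloc)

    @lru_cache(maxsize=None)
    def P(k, j):  # j is 1-based, as in (21)
        if k < 0:
            return 1.0  # more than a negative number of losses is certain
        n_j, q = alloc[j - 1], qs[j - 1]
        if j == 1:
            return sum(q(n_j, i) for i in range(k + 1, n_j + 1))
        # i losses on type j, more than k - i losses on types 1..j-1
        return sum(q(n_j, i) * P(k - i, j - 1) for i in range(n_j + 1))

    return P(k_max, J)

if __name__ == "__main__":
    qs = [binomial_q(0.01), binomial_q(0.05), binomial_q(0.1)]
    print(prob_more_than(2, (10, 6, 4), qs))
```

When all types share the same $p$, the total loss count is binomial over all $N$ packets, which gives a convenient sanity check for the recursion.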
Thus, $P^{\mathbf N}_e(K, J)$ can be computed with complexity $O(K^2 J)$ if the values of $Q_j(N_j, k)$ are given for all $N_j$ and $0 \le k \le K$. Following appendix H, we note that for each $j$, $Q_j(N_j, k)$ for $0 \le k \le K$ is computed offline with complexity $O(K^2 L_j) + O\!\left(\frac{N_j}{L_j} K\right)$. Hence, the total complexity of computing $P^{\mathbf N}_e(K, J)$ adds up to
$$O(K^2 J) + \sum_{j=1}^{J} O\!\left(K^2 L_j + \frac{N_j}{L_j} K\right) \overset{(a)}{=} O(K^2 J) + \sum_{j=1}^{J} O\!\left(K^2 L_j + N_j K\right) \overset{(b)}{=} O\!\left(K^2 L + K N\right) \tag{22}$$
where (a) follows from the fact that $N_j/L_j \le N_j$, and the term $O(K^2 J)$ is omitted in (b) since $J \le L$.

Now, we propose a suboptimal polynomial-time algorithm to estimate the best path allocation vector, $\mathbf{N}^{opt}$. Let us define $P^{opt}_e(n, k, j)$ as the probability of having more than $k$ errors for a block of length $n$ over the paths of types 1 to $j$, minimized over all possible rate allocations ($\mathbf{N} = \mathbf{N}^{opt}$). First, we find a lower bound $\hat P_e(n, k, j)$ for $P^{opt}_e(n, k, j)$ from the following recursive formula:
$$\hat P_e(n, k, j) = \begin{cases} \displaystyle\min_{0 \le n_j \le \min\{n, \lfloor L_j W_j T \rfloor\}} \sum_{i=0}^{n_j} Q_j(n_j, i)\, \hat P_e(n-n_j, k-i, j-1) & \text{if } k \ge 0 \\ 1 & \text{if } k < 0 \end{cases}, \qquad \hat P_e(n, k, 1) = \sum_{i=k+1}^{n} Q_1(n, i). \tag{23}$$
Using the memoization technique, we need an array of size $O(NKJ)$ to store the values of $\hat P_e(n, k, j)$ for $0 < n \le N$, $0 < k \le K$, and $1 \le j \le J$. According to the recursive definition above, computing $\hat P_e(n, k, j)$ requires $O(NK)$ operations, assuming the values of $Q_j(n_j, i)$, $\hat P_e(n-n_j, k-i, j-1)$, and $\sum_{i=k+1}^{n_j} Q_j(n_j, i)$ are already computed for all $i$ and $n_j$. Thus, it is easy to verify that $\hat P_e(N, K, J)$ can be computed with complexity $O(N^2 K^2 J)$ when the values of $Q_j(n_j, i)$ are given for all $0 < n_j \le n$ and $0 \le i \le K$.
According to appendix H, for each $1 \le j \le J$ and each $0 < n_j \le N$, $Q_j(n_j, i)$ for all $0 \le i \le n_j$ is computed offline with complexity $O(n_j^2 L_j) + O\!\left(\frac{n_j}{L_j} n_j\right) = O(n_j^2 L_j)$. Thus, computing $Q_j(n_j, i)$ for all $1 \le j \le J$, $0 < n_j \le N$, and $0 \le i \le n_j$ has complexity $\sum_{j=1}^{J}\sum_{n_j=1}^{N} O(n_j^2 L_j) = O(N^3 L)$. Finally, $\hat P_e(N, K, J)$ can be computed with a total complexity of $O(N^2 K^2 J + N^3 L)$. The following lemma guarantees that $\hat P_e(n, k, j)$ is in fact a lower bound for $P^{opt}_e(n, k, j)$.

Lemma V. $P^{opt}_e(n, k, j) \ge \hat P_e(n, k, j)$.

The proof is given in appendix I. The following algorithm recursively finds a suboptimal allocation vector $\hat{\mathbf N}$ based on the lower bound of Lemma V.

(1): Initialize $j \leftarrow J$, $n \leftarrow N$, $k \leftarrow K$.
(2): Set
$$\hat N_j = \operatorname*{argmin}_{0 \le n_j \le \min\{n, \lfloor L_j W_j T \rfloor\}} \sum_{i=0}^{n_j} Q_j(n_j, i)\, \hat P_e(n-n_j, k-i, j-1), \qquad K_j = \operatorname*{argmax}_{0 \le i \le \hat N_j} Q_j(\hat N_j, i)\, \hat P_e(n-\hat N_j, k-i, j-1).$$
(3): Update $n \leftarrow n - \hat N_j$, $k \leftarrow k - K_j$, $j \leftarrow j - 1$.
(4): If $j > 1$ and $k \ge 0$, go to (2).
(5): For $m = 1$ to $j$, set $\hat N_m \leftarrow \lfloor n/j \rfloor$.
(6): $\hat N_j \leftarrow \hat N_j + \mathrm{Rem}(n, j)$, where $\mathrm{Rem}(a, b)$ denotes the remainder of dividing $a$ by $b$.

Intuitively speaking, the above algorithm recursively finds the typical error event (the $K_j$'s) which has the maximum contribution to the error probability, and assigns the rate allocations (the $\hat N_j$'s) such that the estimated typical error probability ($\hat P_e$) is minimized. Indeed, Lemma V shows that the estimate used in the algorithm ($\hat P_e$) is a lower bound for the minimum achievable error probability ($P^{opt}_e$). Comparing (23) and step (2) of our algorithm, we observe that the values of $\hat N_j$ and $K_j$ can be found in $O(1)$ during the computation of $\hat P_e(N, K, J)$.
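The steps above can be sketched compactly in Python. This is a minimal illustration, not the authors' implementation: $Q_j$ is again replaced by a hypothetical independent-loss (binomial) model, the caps $\lfloor L_j W_j T \rfloor$ are assumed non-binding (so $0 \le n_j \le n$), and all names are illustrative. The helper `branch` is the quantity minimized in (23), so the argmin of step (2) reuses exactly the values computed for $\hat P_e$.

```python
from functools import lru_cache
from math import comb

def binomial_q(p):
    """Hypothetical Q_j(n, k): i.i.d. losses with probability p (not the Gilbert-Elliot Q_j)."""
    def q(n, k):
        return comb(n, k) * p ** k * (1 - p) ** (n - k) if 0 <= k <= n else 0.0
    return q

def suboptimal_allocation(N, k_max, qs):
    """Steps (1)-(6) driven by the lower bound (23); bandwidth caps assumed non-binding."""
    J = len(qs)

    @lru_cache(maxsize=None)
    def P_hat(n, k, j):
        if k < 0:
            return 1.0
        if j == 1:  # all remaining packets go to type 1
            return sum(qs[0](n, i) for i in range(k + 1, n + 1))
        return min(branch(n, k, j, nj) for nj in range(n + 1))

    def branch(n, k, j, nj):
        """sum_i Q_j(nj, i) * P_hat(n - nj, k - i, j - 1), the term minimized in (23)."""
        q = qs[j - 1]
        return sum(q(nj, i) * P_hat(n - nj, k - i, j - 1) for i in range(nj + 1))

    alloc = [0] * J
    j, n, k = J, N, k_max                                     # step (1)
    while j > 1 and k >= 0:                                   # loop test of step (4)
        nj = min(range(n + 1), key=lambda m: branch(n, k, j, m))              # step (2): N_hat_j
        kj = max(range(nj + 1),
                 key=lambda i: qs[j - 1](nj, i) * P_hat(n - nj, k - i, j - 1))  # step (2): K_j
        alloc[j - 1] = nj
        n, k, j = n - nj, k - kj, j - 1                       # step (3)
    for m in range(j):                                        # step (5): spread the rest evenly
        alloc[m] = n // j
    alloc[j - 1] += n % j                                     # step (6): remainder to type j
    return tuple(alloc), P_hat(N, k_max, J)

if __name__ == "__main__":
    qs = [binomial_q(0.02), binomial_q(0.1), binomial_q(0.3)]
    print(suboptimal_allocation(12, 2, qs))
```

The returned bound can be compared against an exhaustive search over all allocations as a numerical check of Lemma V under these toy assumptions.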
Since $\hat N_j$ and $K_j$ are by-products of evaluating (23), the complexity of the proposed algorithm is the same as that of computing $\hat P_e(N, K, J)$, namely $O(N^2 K^2 J + N^3 L)$. The following theorem guarantees that the output of the above algorithm converges to the asymptotically optimal rate allocation introduced in Theorem II of section IV-C, and accordingly, that it performs optimally for a large number of paths.

Theorem III. Consider a point-to-point connection over the network with $L$ independent paths from the source to the destination, each modeled as a Gilbert-Elliot cell with a large enough bandwidth constraint. The paths are of $J$ different types, with $L_j$ paths of type $j$. Assume a block FEC of size $[N, K]$ is sent during a time interval $T$. For fixed values of $\gamma_j = L_j/L$, $n_0 = N/L$, $k_0 = K/L$, $T$, and an asymptotically large number of paths ($L$), we have
1) $\hat P_e(N, K, J) \doteq P^{opt}_e(N, K, J) \doteq e^{-L\sum_{j=1}^{J}\gamma_j u_j(\alpha)}$
2) $\frac{\hat N_j}{N} = \eta^\star_j + o(1)$
3) $\frac{K_j}{\hat N_j} = \alpha + o(1)$ for $\alpha > E\{x_j\}$,
where $\alpha = k_0/n_0$ and the functions $u_j(\cdot)$ are defined in subsections IV-B and IV-C. $\hat P_e(N, K, J)$ is the lower bound for $P^{opt}_e(n, k, j)$ defined in equation (23). $\hat N_j$ is the total number of packets assigned to the paths of type $j$ by the suboptimal rate allocation algorithm. $\eta^\star_j$ is the asymptotically optimal rate allocation given in equation (19). $K_j$ is defined in step (2) of the algorithm. The notation $f(L) = o(g(L))$ means $\lim_{L\to\infty} f(L)/g(L) = 0$. The proof can be found in appendix J.

The proposed algorithm is compared with four other allocation schemes over $L = 6$ paths in Fig. 8. The optimal method uses exhaustive search over all possible allocations. 'Best Path Allocation' assigns everything to the best path only, ignoring the rest. The 'Equal Distribution' scheme distributes the packets equally among all paths. Finally, the 'Asymptotically Optimal' allocation assigns the rates based on equation (19).
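The 'Asymptotically Optimal' scheme only needs $l_j(\alpha)$ for each path type. A minimal Python sketch of equation (19), assuming hypothetical discrete pmfs for the $x_j$ (not the Gilbert-Elliot distributions of section III) and solving (13) by bisection, as justified by Lemma I. For a finite number of paths the fractions $\eta^\star_j$ would still need to be rounded to integer packet counts $N_j$, which this sketch does not attempt.

```python
import math

def make_pmf(points, weights):
    total = sum(weights)
    return [(x, w / total) for x, w in zip(points, weights)]

def mean(pmf):
    return sum(p * x for x, p in pmf)

def tilted_mean(lam, pmf):
    """v(lambda) = E{x e^{lambda x}} / E{e^{lambda x}} with a max-shift for stability."""
    shift = max(lam * x for x, _ in pmf)
    num = sum(p * x * math.exp(lam * x - shift) for x, p in pmf)
    den = sum(p * math.exp(lam * x - shift) for x, p in pmf)
    return num / den

def l_of_alpha(alpha, pmf, lo=-200.0, hi=200.0, iters=200):
    """l(alpha): root of v(lambda) = alpha, found by bisection (v is increasing)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if tilted_mean(mid, pmf) < alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def asymptotic_allocation(alpha, gammas, pmfs):
    """eta*_j of equation (19); None if alpha <= E{x_j} for every type (zero exponent)."""
    weights = [g * l_of_alpha(alpha, pmf) if alpha > mean(pmf) else 0.0
               for g, pmf in zip(gammas, pmfs)]
    total = sum(weights)
    if total == 0.0:
        return None
    return [w / total for w in weights]

def toy_pmf(p_burst):
    """Hypothetical Q_j(x): x = 0 w.p. 1 - p_burst, else uniform on {0.05, ..., 1.0}."""
    grid = [0.05 * i for i in range(1, 21)]
    return make_pmf([0.0] + grid, [1 - p_burst] + [p_burst / 20] * 20)

if __name__ == "__main__":
    pmfs = [toy_pmf(0.02), toy_pmf(0.05), toy_pmf(0.5)]
    print(asymptotic_allocation(0.1, [1 / 3, 1 / 3, 1 / 3], pmfs))
```

In this toy setup the third type has $E\{x_3\} > \alpha$ and therefore receives no packets, in line with Remark III, while the better of the two remaining types receives the larger share.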
The block length and the number of information packets are assumed to be $N = 100$ and $K = 90$, respectively. The overall rate is $S_{req} = 1000$ pkt/sec, which results in $T = 100$ ms. The good-burst parameter $\mu_g$ is selected such that $\mu_g T = 1/5$. However, the qualities of the paths differ, as they have different average bad-burst durations. The packet error probabilities of the paths are $[0.0175 \pm \Delta/2,\ 0.0175 \pm 3\Delta/2,\ 0.0175 \pm 5\Delta/2]$, such that the median is fixed at 0.0175, and $\Delta$ is a measure of deviation from this median. $\Delta = 0$ represents the case where all the paths are identical. The larger $\Delta$ is, the more variety we have among the paths and the more diversity gain might be achieved using a judicious rate allocation. As seen, our suboptimal algorithm tracks the optimal algorithm so closely that the corresponding curves are hard to distinguish over a wide range. However, the 'Asymptotically Optimal' rate allocation results in lower performance, since there is only one path of each type, which invalidates the assumptions of the asymptotic analysis. When $\Delta = 0$, the 'Equal Distribution' scheme obviously coincides with the optimal allocation. This scheme eventually diverges from the optimal algorithm as $\Delta$ grows. However, it still outperforms the best path allocation method as long as $\Delta$ is not too large. For very large values of $\Delta$, the best path dominates all the other ones, and we can ignore the rest of the paths. Hence, the best path allocation eventually converges to the optimal scheme as $\Delta$ increases.

VI. CONCLUSION

In this work, we have studied the performance of forward error correction over a block of packets sent through multiple independent paths.
It is known that Maximum Distance Separable (MDS) block codes are optimum over our End-to-End Channel model, and over any other erasure channel with or without memory, in the sense that their probability of error is minimum among all block codes of the same size [6], [7]. Adopting MDS codes, the probability of irrecoverable loss, $P_E$, is analyzed for the cases of a single path, multiple identical paths, and multiple non-identical paths, based on the discrete to continuous relaxation. When there are $L$ identical paths, $P_E$ is upper bounded using large deviation theory. This bound is shown to be exponentially tight in terms of $L$. The asymptotic analysis shows that the exponential decay of $P_E$ with $L$ remains valid in the case of non-identical paths. Furthermore, the optimal rate allocation problem is solved in the asymptotic case where $L$ is very large. It is seen that for the optimal rate allocation, each path is assigned a positive rate iff its quality is above a certain threshold. The quality of a path is defined as the percentage of the time it spends in the bad state. Finally, we focus on the problem of optimum rate allocation when $L$ is not necessarily large. A heuristic suboptimal algorithm is proposed which computes a near-optimal allocation in polynomial time. For large values of $L$, the result of this algorithm converges to the optimal solution. Moreover, simulation results are provided which verify the validity of our theoretical analyses in several practical scenarios, and also show that the proposed suboptimal algorithm approximates the optimal allocation very closely.

Fig. 8. Optimal and suboptimal rate allocations are compared with equal distribution and best path allocation schemes for different values of $\Delta$ ($P_E$ versus $\Delta$ for the optimal, suboptimal, asymptotically optimal, equal distribution, and best path allocations).
APPENDIX A
PROOF OF LEMMA I

1) We define the function $v(\lambda)$ as
$$v(\lambda) = \frac{E\{x e^{\lambda x}\}}{E\{e^{\lambda x}\}}. \tag{24}$$
Then, the first derivative of $v(\lambda)$ is
$$\frac{\partial}{\partial\lambda} v(\lambda) = \frac{E\{x^2 e^{\lambda x}\}\, E\{e^{\lambda x}\} - \left[E\{x e^{\lambda x}\}\right]^2}{\left[E\{e^{\lambda x}\}\right]^2}. \tag{25}$$
According to the Cauchy-Schwarz inequality, the following statement is always true for any two functions $f(\cdot)$ and $g(\cdot)$:
$$\left(\int_x f(x) g(x)\, dx\right)^2 < \int_x f^2(x)\, dx \int_x g^2(x)\, dx \tag{26}$$
unless $f(x) = K g(x)$ for a constant $K$ and all values of $x$. If we choose $f(x) = \sqrt{x^2 Q(x) e^{x\lambda}}$ and $g(x) = \sqrt{Q(x) e^{x\lambda}}$, they cannot be proportional to each other for all values of $x$. Therefore, the numerator of equation (25) has to be strictly positive for all $\lambda$. Since the function $v(\lambda)$ is strictly increasing, it has an inverse $v^{-1}(\alpha)$ which is also strictly increasing. Moreover, the nonlinear equation $v(\lambda) = \alpha$ has a unique solution of the form $\lambda = v^{-1}(\alpha) = l(\alpha)$.

2) To show that $l(\alpha = 0) = -\infty$, we prove the equivalent statement $\lim_{\lambda\to-\infty} v(\lambda) = 0$. Since $x$ is a random variable in the range $[0, 1]$ with probability density function $Q(x)$, for any $0 < \epsilon < 1$, we can write
$$\begin{aligned}
\lim_{\lambda\to-\infty} v(\lambda) &= \lim_{\lambda\to-\infty} \frac{\int_0^\epsilon xQ(x)e^{x\lambda}dx + \int_\epsilon^1 xQ(x)e^{x\lambda}dx}{\int_0^1 Q(x)e^{x\lambda}dx} \\
&\le \lim_{\lambda\to-\infty} \left[\frac{\int_0^\epsilon xQ(x)e^{x\lambda}dx}{\int_0^\epsilon Q(x)e^{x\lambda}dx} + \frac{\int_\epsilon^1 xQ(x)\,dx}{\int_0^\epsilon Q(x)e^{(x-\epsilon)\lambda}dx}\right]
\overset{(a)}{=} \lim_{\lambda\to-\infty} \frac{\int_0^\epsilon xQ(x)e^{x\lambda}dx}{\int_0^\epsilon Q(x)e^{x\lambda}dx}
\overset{(b)}{=} \lim_{\lambda\to-\infty} \frac{x_1 Q(x_1) e^{\lambda x_1}}{Q(x_2) e^{\lambda x_2}}
\end{aligned} \tag{27}$$
for some $x_1, x_2 \in [0, \epsilon]$. (a) follows from the fact that for $x \in [0, \epsilon)$, $(x - \epsilon)\lambda \to +\infty$ when $\lambda \to -\infty$, and (b) is a result of the mean value theorem for integration [60]. This theorem states that for every continuous function $f(x)$ on the interval $[a, b]$, we have
$$\exists\, x_0 \in [a, b] \ \text{s.t.}\ \int_a^b f(x)\, dx = f(x_0)\,[b - a]. \tag{28}$$
Equation (27) is valid for any arbitrary $0 < \epsilon < 1$.
If we choose $\epsilon \to 0$, $x_1$ and $x_2$ are both squeezed into the interval $[0, \epsilon]$. Thus, we have
$$\lim_{\lambda\to-\infty} v(\lambda) \le \lim_{\lambda\to-\infty}\lim_{\epsilon\to0} \frac{x_1 Q(x_1) e^{\lambda x_1}}{Q(x_2) e^{\lambda x_2}} = \lim_{\epsilon\to0} x_1 = 0. \tag{29}$$
Based on the distribution of $x$, $v(\lambda)$ is obviously non-negative for any $\lambda$. Hence, the inequality in (29) can be replaced by equality.

3) By observing that $v(\lambda = 0) = E\{x\}$, it is obvious that $l(\alpha = E\{x\}) = 0$.

4) To show that $l(\alpha = 1) = +\infty$, we prove the equivalent statement $\lim_{\lambda\to+\infty} v(\lambda) = 1$. For any $0 < \epsilon < 1$ and $x \in [1 - \epsilon, 1]$, $(x - 1 + \epsilon)\lambda \to +\infty$ when $\lambda \to +\infty$. Then, defining $\zeta = 1 - \epsilon$, we have
$$\lim_{\lambda\to+\infty} \frac{\int_0^\zeta xQ(x)e^{x\lambda}dx}{\int_0^1 Q(x)e^{x\lambda}dx} \le \lim_{\lambda\to+\infty} \frac{\int_0^\zeta xQ(x)\,dx}{\int_\zeta^1 Q(x)e^{(x-\zeta)\lambda}dx} = 0. \tag{30}$$
Since the fraction in (30) is obviously non-negative for all $\lambda$, this inequality can be replaced by an equality. Similarly, we have
$$\lim_{\lambda\to+\infty} \frac{\int_0^\zeta Q(x)e^{x\lambda}dx}{\int_\zeta^1 xQ(x)e^{x\lambda}dx} \le \lim_{\lambda\to+\infty} \frac{\int_0^\zeta Q(x)\,dx}{\int_\zeta^1 xQ(x)e^{(x-\zeta)\lambda}dx} = 0, \tag{31}$$
which can also be replaced by equality. Now, the limit of $v(\lambda)$ is written as
$$\begin{aligned}
\lim_{\lambda\to+\infty} v(\lambda) &= \lim_{\lambda\to+\infty} \frac{\int_0^\zeta xQ(x)e^{x\lambda}dx + \int_\zeta^1 xQ(x)e^{x\lambda}dx}{\int_0^1 Q(x)e^{x\lambda}dx}
\overset{(a)}{=} \lim_{\lambda\to+\infty} \frac{\int_\zeta^1 xQ(x)e^{x\lambda}dx}{\int_0^1 Q(x)e^{x\lambda}dx} \\
&\overset{(b)}{=} \lim_{\lambda\to+\infty} \left(\frac{\int_0^\zeta Q(x)e^{x\lambda}dx + \int_\zeta^1 Q(x)e^{x\lambda}dx}{\int_\zeta^1 xQ(x)e^{x\lambda}dx}\right)^{-1}
\overset{(c)}{=} \lim_{\lambda\to+\infty} \left(\frac{\int_\zeta^1 Q(x)e^{x\lambda}dx}{\int_\zeta^1 xQ(x)e^{x\lambda}dx}\right)^{-1}
\overset{(d)}{=} \left(\lim_{\lambda\to+\infty} \frac{Q(x_1)e^{x_1\lambda}}{x_2 Q(x_2)e^{x_2\lambda}}\right)^{-1}
\end{aligned} \tag{32}$$
for some $x_1, x_2 \in [1 - \epsilon, 1]$. (a) follows from equation (30), and (b) is valid since the final result shows that $\lim_{\lambda\to+\infty} v(\lambda)$ is finite and non-zero [60]. (c) follows from equation (31), and (d) is a result of the mean value theorem for integration. If we choose $\epsilon \to 0$, $x_1$ and $x_2$ are both squeezed into the interval $[1 - \epsilon, 1]$.
Then, equation (32) turns into
$$\lim_{\lambda\to+\infty} v(\lambda) = \left(\lim_{\lambda\to+\infty}\lim_{\epsilon\to0} \frac{Q(x_1)e^{x_1\lambda}}{x_2 Q(x_2)e^{x_2\lambda}}\right)^{-1} = \left(\lim_{\epsilon\to0} \frac{1}{x_2}\right)^{-1} = 1.$$

5) According to equations (12) and (13), the first derivative of $u(\alpha)$ is
$$\frac{\partial u(\alpha)}{\partial\alpha} = l(\alpha) + \alpha\frac{\partial l(\alpha)}{\partial\alpha} - \frac{E\{x e^{\lambda x}\}}{E\{e^{\lambda x}\}}\frac{\partial l(\alpha)}{\partial\alpha} = l(\alpha).$$

APPENDIX B
PROOF OF LEMMA II

Based on the definition of the probability density function, we have
$$\begin{aligned}
\lim_{L\to\infty} -\frac{1}{L}\log f_y(\alpha) &= \lim_{L\to\infty} -\frac{1}{L}\log\left(\lim_{\delta\to0}\frac{P\{y > \alpha\} - P\{y > \alpha+\delta\}}{\delta}\right)
\overset{(a)}{=} \lim_{\delta\to0}\lim_{L\to\infty} -\frac{1}{L}\log\left(\frac{P\{y > \alpha\} - P\{y > \alpha+\delta\}}{\delta}\right) \\
&\ge \lim_{\delta\to0}\lim_{L\to\infty} \frac{1}{L}\left(-\log P\{y > \alpha\} + \log\delta\right) \overset{(b)}{=} u(\alpha)
\end{aligned} \tag{33}$$
where (a) is valid since $\log$ is a continuous function, and both limits exist and are interchangeable. (b) follows from equation (14). The exponent of $f_y(\alpha)$ can be upper bounded as
$$\begin{aligned}
\lim_{L\to\infty} -\frac{1}{L}\log f_y(\alpha) &\overset{(a)}{=} \lim_{\delta\to0}\lim_{L\to\infty} \frac{-\log\left(P\{y > \alpha\} - P\{y > \alpha+\delta\}\right) + \log\delta}{L}
\overset{(b)}{\le} \lim_{\delta\to0}\lim_{L\to\infty} \frac{-\log\left(e^{-L(u(\alpha)+\epsilon)} - e^{-L(u(\alpha+\delta)-\epsilon)}\right) + \log\delta}{L} \\
&= \lim_{\delta\to0}\lim_{L\to\infty} \left(u(\alpha) + \epsilon - \frac{\log\left(1 - e^{-L\chi}\right)}{L}\right) \overset{(c)}{=} u(\alpha) + \epsilon
\end{aligned} \tag{34}$$
where $\chi = u(\alpha + \delta) - u(\alpha) - 2\epsilon$. Since $u(\alpha)$ is a strictly increasing function (Lemma I), we can make $\chi$ positive by choosing $\epsilon$ small enough. (a) is valid since $\log$ is a continuous function, and both limits exist and are interchangeable. (b) follows from the definition of the limit if $L$ is sufficiently large, and (c) is a result of $\chi$ being positive. Selecting $\epsilon$ arbitrarily small, results (33) and (34) prove the lemma.

APPENDIX C
PROOF OF LEMMA III

According to the definition of infimum, we have
$$\lim_{L\to\infty} -\frac{\log H(L)}{L} \ge \lim_{L\to\infty} -\frac{1}{L}\log\left(e^{-L\inf_{\mathcal S} h(\mathbf x)}\int_{\mathcal S} d\mathbf x\right) \overset{(a)}{=} \inf_{\mathcal S} h(\mathbf x) \tag{35}$$
where (a) follows from the fact that $\mathcal{S}$ is a bounded region.
Since $h(\mathbf x)$ is a continuous function, it has a minimum in the bounded closed set $\mathrm{cl}(\mathcal S)$, which is denoted by $\mathbf x^\star$. Due to the continuity of $h(\mathbf x)$ at $\mathbf x^\star$, for any $\epsilon > 0$, there is a neighborhood $\mathcal B(\epsilon)$ centered at $\mathbf x^\star$ such that any $\mathbf x \in \mathcal B(\epsilon)$ has the property $|h(\mathbf x) - h(\mathbf x^\star)| < \epsilon$. Moreover, since $\mathcal S$ is a convex set, we have $\mathrm{vol}(\mathcal B(\epsilon) \cap \mathcal S) > 0$. Now, we can write
$$\lim_{L\to\infty} -\frac{\log H(L)}{L} \le \lim_{L\to\infty} -\frac{1}{L}\log\left(\int_{\mathcal S \cap \mathcal B(\epsilon)} e^{-Lh(\mathbf x)}\, d\mathbf x\right) \le \lim_{L\to\infty} -\frac{1}{L}\log\left(e^{-L(h(\mathbf x^\star)+\epsilon)}\int_{\mathcal S \cap \mathcal B(\epsilon)} d\mathbf x\right) = h(\mathbf x^\star) + \epsilon. \tag{36}$$
Selecting $\epsilon$ to be arbitrarily small, (35) and (36) prove the lemma.

APPENDIX D
PROOF OF LEMMA IV

According to Lemma I, $u_j(x)$ is increasing and convex for all $1 \le j \le J$. Thus, the objective function $f(\boldsymbol\beta) = \sum_{j=1}^{J}\gamma_j u_j(\beta_j/\eta_j)$ is also convex, and the region $\mathcal S_T$ is determined by $J$ convex inequality constraints and one affine equality constraint. Hence, in this case, the KKT conditions are both necessary and sufficient for optimality [61]. In other words, if there exist constants $\phi_j$ and $\nu$ such that
$$\frac{\gamma_j}{\eta_j}\, l_j\!\left(\frac{\beta^\star_j}{\eta_j}\right) - \phi_j - \nu = 0 \quad \forall\, 1 \le j \le J \tag{37}$$
$$\phi_j\left(\eta_j E\{x_j\} - \beta^\star_j\right) = 0 \quad \forall\, 1 \le j \le J \tag{38}$$
then the point $\boldsymbol\beta^\star$ is a global minimum. Now, we prove that either $\beta^\star_j = \eta_j E\{x_j\}$ for all $1 \le j \le J$, or $\beta^\star_j > \eta_j E\{x_j\}$ for all $1 \le j \le J$. Let us assume the opposite is true, and there are at least two elements of the vector $\boldsymbol\beta^\star$, indexed with $k$ and $m$, which have the values $\beta^\star_k = \eta_k E\{x_k\}$ and $\beta^\star_m > \eta_m E\{x_m\}$, respectively. For any arbitrary $\epsilon > 0$, the vector $\boldsymbol\beta^{\star\star}$ can be defined as
$$\beta^{\star\star}_j = \begin{cases} \beta^\star_j + \epsilon & \text{if } j = k \\ \beta^\star_j - \epsilon & \text{if } j = m \\ \beta^\star_j & \text{otherwise.} \end{cases} \tag{39}$$
Then, we have
$$\begin{aligned}
\lim_{\epsilon\to0} \frac{f(\boldsymbol\beta^{\star\star}) - f(\boldsymbol\beta^\star)}{\epsilon} &= \lim_{\epsilon\to0} \frac{1}{\epsilon}\left[\gamma_k u_k\!\left(\frac{\beta^\star_k + \epsilon}{\eta_k}\right) + \gamma_m u_m\!\left(\frac{\beta^\star_m - \epsilon}{\eta_m}\right) - \gamma_m u_m\!\left(\frac{\beta^\star_m}{\eta_m}\right)\right] \\
&\overset{(a)}{=} \lim_{\epsilon\to0} \left[\frac{\gamma_k}{\eta_k}\, l_k\!\left(\frac{\beta^\star_k + \epsilon'}{\eta_k}\right) - \frac{\gamma_m}{\eta_m}\, l_m\!\left(\frac{\beta^\star_m - \epsilon''}{\eta_m}\right)\right] = -\frac{\gamma_m}{\eta_m}\, l_m\!\left(\frac{\beta^\star_m}{\eta_m}\right) < 0
\end{aligned} \tag{40}$$
where $\epsilon', \epsilon'' \in [0, \epsilon]$, and (a) follows from Taylor's theorem (the term $\gamma_k u_k(\beta^\star_k/\eta_k)$ vanishes because $\beta^\star_k/\eta_k = E\{x_k\}$). Thus, moving from $\boldsymbol\beta^\star$ to $\boldsymbol\beta^{\star\star}$ decreases the function, which contradicts the assumption of $\boldsymbol\beta^\star$ being the global minimum. Out of the remaining possibilities, the case where $\beta^\star_j = \eta_j E\{x_j\}$ (for all $1 \le j \le J$) obviously agrees with Lemma IV for the special case of $\nu = 0$. Therefore, the lemma can be proved assuming $\beta^\star_j > \eta_j E\{x_j\}$ (for all $1 \le j \le J$). Then, equation (38) turns into $\phi_j = 0$ (for all $1 \le j \le J$). By rearranging equation (37) and using the condition $\sum_{j=1}^{J}\beta_j = \alpha$, Lemma IV is proved.

APPENDIX E
PROOF OF THEOREM II

Sketch of the proof: First, it is proved that $\eta^\star_j > 0$ if $E\{x_j\} < \alpha$. At the second step, we prove that $\eta^\star_j = 0$ if $E\{x_j\} \ge \alpha$. Then, the KKT conditions [61] are applied for the indices $1 \le k \le J$ where $E\{x_k\} < \alpha$ to find the maximizing allocation vector $\boldsymbol\eta^\star$.

Proof: The parameter $\nu$ is obviously a function of the vector $\boldsymbol\eta$. Differentiating equation (17) with respect to $\eta_k$ results in
$$\frac{\partial\nu}{\partial\eta_k} = -\frac{v_k\!\left(\frac{\nu\eta_k}{\gamma_k}\right) + \frac{\nu\eta_k}{\gamma_k}\, v'_k\!\left(\frac{\nu\eta_k}{\gamma_k}\right)}{\sum_{j=1}^{J}\frac{\eta_j^2}{\gamma_j}\, v'_j\!\left(\frac{\nu\eta_j}{\gamma_j}\right)} \tag{41}$$
where $v_j(x) = l_j^{-1}(x)$, and $v'_j(x)$ denotes its derivative with respect to its argument. The objective function can be simplified as
$$g(\boldsymbol\eta) = \sum_{j=1}^{J}\gamma_j u_j\!\left(\frac{\beta^\star_j}{\eta_j}\right) = \sum_{j=1}^{J}\gamma_j u_j\!\left(v_j\!\left(\frac{\nu\eta_j}{\gamma_j}\right)\right). \tag{42}$$
$\nu^\star$ is defined as the value of $\nu$ corresponding to $\boldsymbol\eta^\star$. Next, we show that $\nu^\star > 0$. Let us assume the opposite is true, i.e., $\nu^\star \le 0$. Then, according to Lemma I, we have $v_j(\nu^\star\eta_j/\gamma_j) \le E\{x_j\}$ for all $j$, which results in $g(\boldsymbol\eta^\star) = 0$.
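However, it is possible to achieve a positive value of $g(\eta)$ by setting $\eta_j = 1$ for one index $j$ with $E\{x_j\} < \alpha$ and $\eta_j = 0$ for the rest. Thus, $\eta^\star$ cannot be the maximal point. This contradiction proves that $\nu^\star > 0$.

In the first step, we prove that $\eta^\star_j > 0$ if $E\{x_j\} < \alpha$. Assume the opposite is true for an index $1\le k\le J$, i.e., $\eta^\star_k = 0$ although $E\{x_k\} < \alpha$. Since $\sum_{j=1}^{J}\eta^\star_j = 1$, there is at least one index $m$ such that $\eta^\star_m > 0$. For any arbitrary $\epsilon > 0$, the vector $\eta^{\star\star}$ is defined as
$$\eta^{\star\star}_j = \begin{cases}\epsilon & \text{if } j = k\\ \eta^\star_j - \epsilon & \text{if } j = m\\ \eta^\star_j & \text{otherwise.}\end{cases} \tag{43}$$
Let $\nu^{\star\star}$ denote the corresponding value of $\nu$ for the vector $\eta^{\star\star}$. Based on equation (41), and noting that $v_k(0) = E\{x_k\}$, we can write
$$\Delta\nu = \nu^{\star\star} - \nu^\star = \frac{v_m\!\left(\frac{\nu^\star\eta^\star_m}{\gamma_m}\right) + \frac{\nu^\star\eta^\star_m}{\gamma_m}\, v'_m\!\left(\frac{\nu^\star\eta^\star_m}{\gamma_m}\right) - E\{x_k\}}{\sum_{j=1}^{J}\frac{\eta^{\star 2}_j}{\gamma_j}\, v'_j\!\left(\frac{\nu^\star\eta^\star_j}{\gamma_j}\right)}\,\epsilon + O(\epsilon^2). \tag{44}$$
Then, we have
$$\begin{aligned}
\lim_{\epsilon\to 0}\frac{g(\eta^{\star\star}) - g(\eta^\star)}{\epsilon} &= \lim_{\epsilon\to 0}\frac{1}{\epsilon}\Biggl\{\nu^{\star 2}\frac{\eta^\star_k}{\gamma_k} v'_k\!\left(\frac{\nu^\star\eta^\star_k}{\gamma_k}\right)\epsilon - \nu^{\star 2}\frac{\eta^\star_m}{\gamma_m} v'_m\!\left(\frac{\nu^\star\eta^\star_m}{\gamma_m}\right)\epsilon + \nu^\star\Delta\nu\sum_{j=1}^{J}\frac{\eta^{\star 2}_j}{\gamma_j} v'_j\!\left(\frac{\nu^\star\eta^\star_j}{\gamma_j}\right) + O(\epsilon^2)\Biggr\}\\
&\overset{(a)}{=} \nu^\star\left[v_m\!\left(\frac{\nu^\star\eta^\star_m}{\gamma_m}\right) - E\{x_k\}\right]
\end{aligned}\tag{45}$$
where (a) follows from (44), and the first term in the braces vanishes since $\eta^\star_k = 0$. If (45) is positive for some index $m$, moving in that direction increases the objective function. This contradicts the assumption that $\eta^\star$ is a maximal point. If (45) is non-positive for all indices $m$ with $\eta^\star_m > 0$, we can write
$$E\{x_k\} \ge \sum_{m=1}^{J}\eta^\star_m\, v_m\!\left(\frac{\nu^\star\eta^\star_m}{\gamma_m}\right) = \alpha \tag{46}$$
which contradicts the assumption $E\{x_k\} < \alpha$.

In the second step, we prove that $\eta^\star_j = 0$ if $E\{x_j\} \ge \alpha$. Assume the opposite is true for an index $1\le r\le J$, i.e., $\eta^\star_r > 0$ although $E\{x_r\}\ge\alpha$. Since $\sum_{j=1}^{J}\eta^\star_j = 1$, we have $\eta^\star_s < 1$ for all other indices $s$. For any arbitrary $\epsilon > 0$, the vector $\eta^{\star\star\star}$ is defined as
$$\eta^{\star\star\star}_j = \begin{cases}\eta^\star_j - \epsilon & \text{if } j = r\\ \eta^\star_j + \epsilon & \text{if } j = s\\ \eta^\star_j & \text{otherwise.}\end{cases} \tag{47}$$
Let $\nu^{\star\star\star}$ denote the corresponding value of $\nu$ for the vector $\eta^{\star\star\star}$. Based on equation (41), we can write
$$\Delta\nu = \nu^{\star\star\star} - \nu^\star = \epsilon\,\frac{v_r\!\left(\frac{\nu^\star\eta^\star_r}{\gamma_r}\right) + \frac{\nu^\star\eta^\star_r}{\gamma_r} v'_r\!\left(\frac{\nu^\star\eta^\star_r}{\gamma_r}\right) - v_s\!\left(\frac{\nu^\star\eta^\star_s}{\gamma_s}\right) - \frac{\nu^\star\eta^\star_s}{\gamma_s} v'_s\!\left(\frac{\nu^\star\eta^\star_s}{\gamma_s}\right)}{\sum_{j=1}^{J}\frac{\eta^{\star 2}_j}{\gamma_j} v'_j\!\left(\frac{\nu^\star\eta^\star_j}{\gamma_j}\right)} + O(\epsilon^2). \tag{48}$$
Then, we have
$$\begin{aligned}
\lim_{\epsilon\to 0}\frac{g(\eta^{\star\star\star}) - g(\eta^\star)}{\epsilon} &= \lim_{\epsilon\to 0}\frac{1}{\epsilon}\Biggl\{\nu^{\star 2}\frac{\eta^\star_s}{\gamma_s} v'_s\!\left(\frac{\nu^\star\eta^\star_s}{\gamma_s}\right)\epsilon - \nu^{\star 2}\frac{\eta^\star_r}{\gamma_r} v'_r\!\left(\frac{\nu^\star\eta^\star_r}{\gamma_r}\right)\epsilon + \nu^\star\Delta\nu\sum_{j=1}^{J}\frac{\eta^{\star 2}_j}{\gamma_j} v'_j\!\left(\frac{\nu^\star\eta^\star_j}{\gamma_j}\right) + O(\epsilon^2)\Biggr\}\\
&\overset{(a)}{=} \nu^\star\left[v_r\!\left(\frac{\nu^\star\eta^\star_r}{\gamma_r}\right) - v_s\!\left(\frac{\nu^\star\eta^\star_s}{\gamma_s}\right)\right]
\end{aligned}\tag{49}$$
where (a) follows from (48). If (49) is positive for some index $s$, moving in that direction increases the objective function. This contradicts the assumption that $\eta^\star$ is a maximal point. If (49) is non-positive for all indices $s$ with $\eta^\star_s > 0$, we can write
$$E\{x_r\} < v_r\!\left(\frac{\nu^\star\eta^\star_r}{\gamma_r}\right) \le \sum_{s=1}^{J}\eta^\star_s\, v_s\!\left(\frac{\nu^\star\eta^\star_s}{\gamma_s}\right) = \alpha \tag{50}$$
The first inequality holds because $\nu^\star > 0$, $\eta^\star_r > 0$, and $v_r$ is strictly increasing with $v_r(0) = E\{x_r\}$. Thus (50) contradicts the assumption $E\{x_r\}\ge\alpha$.

Now that the boundary points are checked, we can safely use the KKT conditions [61] for all $1\le k\le J$ with $E\{x_k\} < \alpha$ to find the maximizing allocation vector $\eta^\star$:
$$\zeta = \nu^{\star 2}\frac{\eta^\star_k}{\gamma_k} v'_k\!\left(\frac{\nu^\star\eta^\star_k}{\gamma_k}\right) + \nu^\star\sum_{j=1}^{J}\frac{\eta^{\star 2}_j}{\gamma_j} v'_j\!\left(\frac{\nu^\star\eta^\star_j}{\gamma_j}\right)\left.\frac{\partial\nu}{\partial\eta_k}\right|_{\nu=\nu^\star} \overset{(a)}{=} -\nu^\star v_k\!\left(\frac{\nu^\star\eta^\star_k}{\gamma_k}\right) \tag{51}$$
where $\zeta$ is a constant independent of $k$, and (a) follows from (41). Using the fact that $\sum_{j=1}^{J}\eta_j = 1$ together with equations (17) and (51) results in
$$\zeta = -\alpha\nu^\star, \qquad \nu^\star = \sum_{E\{x_j\}<\alpha}\gamma_j\, l_j(\alpha). \tag{52}$$
Combining equations (51) and (52) results in equation (19) and $g(\eta^\star) = \sum_{j=1}^{J}\gamma_j u_j(\alpha)$.

APPENDIX F
PROOF OF REMARK V

Based on arguments similar to those in Appendix E, it can be shown that $\tilde\eta^\star_j = 0$ iff $E\{x_j\}\ge\alpha$. Since all the types are identical here, this means $\tilde\eta^\star_j > 0$ for all $j$.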
Similar to equation (51), applying the KKT conditions [61] gives
$$v_j\!\left(\frac{\tilde\nu^\star\tilde\eta^\star_j}{\gamma_j}\right) = \begin{cases}-\zeta & \text{if } \tilde\eta^\star_j < \frac{\gamma_j W_j T}{n_0}\\ -\zeta - \sigma_j & \text{if } \tilde\eta^\star_j = \frac{\gamma_j W_j T}{n_0}\end{cases} \tag{53}$$
where the $\sigma_j$'s are non-negative parameters [61]. Setting $\Upsilon = l_j(-\zeta)/\tilde\nu^\star$ proves equation (20).

APPENDIX G
DISCRETE ANALYSIS OF ONE PATH

$Q(n,k,l)$ is defined as the probability of having exactly $k$ errors out of the $n$ packets sent over path $l$. Depending on the initial state of path $l$, $P_g(n,k,l)$ and $P_b(n,k,l)$ are defined as the probabilities of having $k$ errors out of the $n$ packets sent over this path when transmission starts in the good or in the bad state, respectively. It is easy to see that
$$Q(n,k,l) = \pi_g P_g(n,k,l) + \pi_b P_b(n,k,l). \tag{54}$$
$P_g(n,k,l)$ and $P_b(n,k,l)$ can be computed from the following recursive equations
$$\begin{aligned}
P_b(n,k,l) &= \pi_{b|b}P_b(n-1,k-1,l) + \pi_{g|b}P_g(n-1,k-1,l)\\
P_g(n,k,l) &= \pi_{b|g}P_b(n-1,k,l) + \pi_{g|g}P_g(n-1,k,l)
\end{aligned}\tag{55}$$
with the initial conditions
$$\begin{aligned}
P_g(n,k,l) &= 0 \quad \text{for } k \ge n, & P_b(n,k,l) &= 0 \quad \text{for } k > n,\\
P_g(n,k,l) &= 0 \quad \text{for } k < 0, & P_b(n,k,l) &= 0 \quad \text{for } k \le 0,
\end{aligned}\tag{56}$$
where $\pi_{s_2|s_1}$ is the probability of the channel being in state $s_2\in\{g,b\}$ given that it was in state $s_1\in\{g,b\}$ when the last packet was transmitted. $\pi_{s_2|s_1}$ takes the following values for the different combinations of $s_1$ and $s_2$ [1]:
$$\begin{aligned}
\pi_{g|g} &= \pi_g + \pi_b\, e^{-\frac{\mu_g+\mu_b}{S_l}}, & \pi_{b|g} &= 1 - \pi_{g|g},\\
\pi_{b|b} &= \pi_b + \pi_g\, e^{-\frac{\mu_g+\mu_b}{S_l}}, & \pi_{g|b} &= 1 - \pi_{b|b},
\end{aligned}\tag{57}$$
where $S_l$ denotes the transmission rate on path $l$, i.e., packets are transmitted on path $l$ every $1/S_l$ seconds.
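The following Python sketch shows how (54)–(57) can be evaluated. It computes the whole pmf $Q(n,k,l)$, $k = 0,\ldots,n$, for one path with memoization. It is a sketch under stated assumptions, not the authors' implementation:

- The function names are hypothetical.
- The path is parameterized by $\pi_g$, the sum $\mu_g+\mu_b$, and $S_l$.
- The single-packet base case $P_g(1,0,l) = P_b(1,1,l) = 1$ is our own addition. The recursion (55) implies that a packet sent in the good state is received and one sent in the bad state is lost, but it needs a nonzero starting value.

```python
import math
from functools import lru_cache


def transition_probs(pi_g, mu_sum, rate):
    """Return (pi_g|g, pi_b|g, pi_b|b, pi_g|b) as in eq. (57).

    pi_g   -- stationary probability of the good state (pi_b = 1 - pi_g)
    mu_sum -- mu_g + mu_b, the sum of the two state-transition rates
    rate   -- transmission rate S_l in packets per second
    """
    pi_b = 1.0 - pi_g
    decay = math.exp(-mu_sum / rate)
    p_gg = pi_g + pi_b * decay
    p_bb = pi_b + pi_g * decay
    return p_gg, 1.0 - p_gg, p_bb, 1.0 - p_bb


def path_loss_pmf(n, pi_g, mu_sum, rate):
    """Return [Q(n, k, l) for k = 0..n] using the recursion (55) with memoization."""
    if n == 0:
        return [1.0]
    p_gg, p_bg, p_bb, p_gb = transition_probs(pi_g, mu_sum, rate)

    @lru_cache(maxsize=None)
    def prob(state, m, k):
        # Exactly k losses among m packets when the first one is sent in `state`.
        if state == "g" and (k >= m or k < 0):   # initial conditions (56)
            return 0.0
        if state == "b" and (k > m or k <= 0):
            return 0.0
        if m == 1:                               # assumed base: P_g(1,0) = P_b(1,1) = 1
            return 1.0
        if state == "b":                         # first packet is lost
            return p_bb * prob("b", m - 1, k - 1) + p_gb * prob("g", m - 1, k - 1)
        return p_bg * prob("b", m - 1, k) + p_gg * prob("g", m - 1, k)

    # Eq. (54): average over the stationary initial state.
    return [pi_g * prob("g", n, k) + (1.0 - pi_g) * prob("b", n, k) for k in range(n + 1)]
```

A useful sanity check: when $\mu_g+\mu_b$ is very large relative to $S_l$, successive states are practically independent. The pmf then approaches a binomial with loss probability $\pi_b$. For example, `path_loss_pmf(3, 0.8, 1e9, 1.0)` agrees with the Binomial(3, 0.2) pmf up to rounding.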
According to the recursive equations in (55), to compute $P_b(n,k,l)$ and $P_g(n,k,l)$ by memoization, the functions $P_b(\cdot)$ and $P_g(\cdot)$ must be evaluated at the following set of points, denoted $\mathcal{S}(n,k)$:
$$\mathcal{S}(n,k) = \{(n',k') \mid 0\le k'\le k,\ n'-n+k\le k'\le n'\}.$$
The cardinality of $\mathcal{S}(n,k)$ is of order $|\mathcal{S}(n,k)| = O(k(n-k))$. Three operations are needed to compute $P_b(\cdot)$ and $P_g(\cdot)$ at each point. Hence $P_b(n,k,l)$ and $P_g(n,k,l)$, and through equation (54) also $Q(n,k,l)$, are computable with complexity $O(k(n-k))$.

APPENDIX H
DISCRETE ANALYSIS OF ONE TYPE

When $n$ packets are distributed over $L_j$ identical paths of type $j$, the uniform distribution is obviously optimal. However, the integer $n$ may not be divisible by $L_j$, so the $L_j$-dimensional vector $N$ is selected as
$$N_l = \begin{cases}\left\lfloor\frac{n}{L_j}\right\rfloor + 1 & \text{for } 1\le l\le \mathrm{Rem}(n,L_j)\\ \left\lfloor\frac{n}{L_j}\right\rfloor & \text{for } \mathrm{Rem}(n,L_j) < l\le L_j\end{cases} \tag{58}$$
where $\mathrm{Rem}(a,b)$ denotes the remainder of dividing $a$ by $b$. $N$ is the integer vector closest to a uniform distribution. $E_N(k,l)$ is defined as the probability of having exactly $k$ erasures among the packets transmitted over the identical paths $1$ to $l$ with the allocation vector $N$. By the definitions of $Q_j(n,k)$ and $E_N(k,l)$, it is obvious that $Q_j(n,k) = E_N(k,L_j)$. $E_N(k,l)$ can be computed recursively as
$$E_N(k,l) = \sum_{i=0}^{k}E_N(k-i,l-1)\,Q(N_l,i,l), \qquad E_N(k,1) = Q(N_1,k,1) \tag{59}$$
where $Q(N_l,i,l)$ is given in Appendix G. Since all the paths are assumed to be identical here, the function $Q(\cdot,\cdot,l)$ is the same for every path index $l$. According to the recursive equations in (55), the values of $Q(N_l,i,l)$ for all $0\le i\le k$ and $1\le l\le L_j$ can be calculated with complexity $O(N_l k) = O\!\left(\frac{n}{L_j}k\right)$.
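The allocation (58) and the convolution recursion (59) can be sketched in Python as follows. This is an illustrative sketch, not the authors' code:

- The names `uniform_allocation`, `type_loss_pmf` and `binomial_pmf` are hypothetical.
- The per-path pmf is passed in as a function, for instance one built from the recursion of Appendix G.
- Unlike the memoized evaluation described in the text, it computes the full pmf for every $k$ up to $n$ rather than a single $E_N(k, L_j)$.
- `binomial_pmf` is only a stand-in for paths with independent losses.

```python
from math import comb


def uniform_allocation(n, num_paths):
    """Allocation vector N of eq. (58): the first Rem(n, L_j) paths get one extra packet."""
    base, rem = divmod(n, num_paths)
    return [base + 1 if l < rem else base for l in range(num_paths)]


def type_loss_pmf(n, num_paths, path_pmf):
    """Return [Q_j(n, k) for k = 0..n] via E_N(k, l) in eq. (59).

    path_pmf(m) must return [Q(m, i, l) for i = 0..m]; since the paths are
    identical, one function serves every path index l.
    """
    alloc = uniform_allocation(n, num_paths)
    e = list(path_pmf(alloc[0]))              # E_N(k, 1) = Q(N_1, k, 1)
    for m in alloc[1:]:                       # add paths l = 2, ..., L_j
        q = path_pmf(m)
        nxt = [0.0] * (len(e) + m)
        for k_prev, e_val in enumerate(e):
            for i, q_val in enumerate(q):
                # E_N(k, l) = sum_i E_N(k - i, l - 1) * Q(N_l, i, l)
                nxt[k_prev + i] += e_val * q_val
        e = nxt
    return e


def binomial_pmf(m, p):
    """Stand-in per-path pmf: m packets with independent loss probability p."""
    return [comb(m, i) * p**i * (1 - p) ** (m - i) for i in range(m + 1)]
```

With independent losses the total over all paths is again binomial. So `type_loss_pmf(7, 3, lambda m: binomial_pmf(m, 0.1))` should match `binomial_pmf(7, 0.1)` up to rounding, and `uniform_allocation(7, 3)` gives `[3, 2, 2]`.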
According to the recursive equations in (59), computing $E_N(k,l)$ requires memoization over an array of size $O(kl)$, and each entry takes $O(k)$ operations. Thus, $E_N(k,l)$ is computable with complexity $O(k^2 l)$ if the $Q(N_l,i,l)$'s are already given. Finally, since $Q_j(n,k) = E_N(k,L_j)$, we can compute $Q_j(n,k)$ with overall complexity $O(k^2L_j) + O\!\left(\frac{n}{L_j}k\right)$.

APPENDIX I
PROOF OF LEMMA V

The lemma is proved by induction on $j$. The case $j = 1$ is obviously true, as $\hat P_e(n,k,1) = P^{\mathrm{opt}}_e(n,k,1)$. Assume the statement is true for $j = 1$ to $J-1$. Then, for $j = J$, we have
$$\begin{aligned}
\hat P_e(n,k,J) &\overset{(a)}{\le} \sum_{i=0}^{N^{\mathrm{opt}}_J} Q_J(N^{\mathrm{opt}}_J,i)\,\hat P_e(n-N^{\mathrm{opt}}_J,k-i,J-1)\\
&\overset{(b)}{\le} \sum_{i=0}^{N^{\mathrm{opt}}_J} Q_J(N^{\mathrm{opt}}_J,i)\,P^{\mathrm{opt}}_e(n-N^{\mathrm{opt}}_J,k-i,J-1)\\
&\overset{(c)}{\le} \sum_{i=0}^{N^{\mathrm{opt}}_J} Q_J(N^{\mathrm{opt}}_J,i)\,P^{N^{\mathrm{opt}}}_e(k-i,J-1)\\
&\overset{(d)}{=} P^{N^{\mathrm{opt}}}_e(k,J) = P^{\mathrm{opt}}_e(n,k,J)
\end{aligned}$$
where $N^{\mathrm{opt}}$ denotes the optimal allocation of $n$ packets among the $J$ types of paths, i.e., the one minimizing the probability of having more than $k$ lost packets. Here:

- (a) follows from the recursive equation (21);
- (b) is the induction assumption;
- (c) comes from the definition of $P^{\mathrm{opt}}_e(n,k,l)$;
- (d) is a result of equation (23).

APPENDIX J
PROOF OF THEOREM III

Sketch of the proof: First, the asymptotic behavior of $Q_j(n,k)$ is analyzed. It is shown that for large values of $L_j$ (or equivalently $L$), equation (63) gives the exponent of $Q_j(n,k)$ versus $L$. Next, we prove the first part of the theorem by induction on $J$. The proof of this part is divided into two cases, depending on whether $K/N$ is larger than $E\{x_J\}$ or vice versa. Finally, the second and third parts of the theorem are proved by induction on $j$ while the total number of path types, $J$, is fixed.
Again, the proof is divided into two cases, depending on whether $K/N$ is larger than $E\{x_j\}$ or vice versa.

Proof: First, we compute the asymptotic behavior of $Q_j(n,k)$ for $k > nE\{x_j\}$, with $n$ growing proportionally to $L_j$, i.e., $n = n'L_j$. Since $n$ and $k$ are discrete variables and $n'$ is a constant, Sanov's theorem [56], [62] applies.

Sanov's Theorem. Let $X_1, X_2, \ldots, X_L$ be i.i.d. discrete random variables from an alphabet $\mathcal{X}$ of size $|\mathcal{X}|$ with probability mass function (pmf) $Q(x)$. Let $\mathcal{P}$ denote the set of pmfs in $\mathbb{R}^{|\mathcal{X}|}$, i.e., $\mathcal{P} = \{P\in\mathbb{R}^{|\mathcal{X}|} \mid P(i)\ge 0,\ \sum_{i=1}^{|\mathcal{X}|}P(i) = 1\}$. Also, let $\mathcal{P}_L$ denote the subset of $\mathcal{P}$ corresponding to all possible empirical distributions of $X$ in $L$ observations [62], i.e., $\mathcal{P}_L = \{P\in\mathcal{P} \mid \forall i,\ LP(i)\in\mathbb{Z}\}$. For any dense and closed set [57] of pmfs $E\subseteq\mathcal{P}$, the probability that the empirical distribution of $L$ observations belongs to $E$ is
$$P\{E\} = P\{E\cap\mathcal{P}_L\} \doteq e^{-LD(P^\star\|Q)} \tag{60}$$
where $P^\star = \arg\min_{P\in E} D(P\|Q)$ and $D(P\|Q) = \sum_{i=1}^{|\mathcal{X}|}P(i)\log\frac{P(i)}{Q(i)}$.

Returning to the main problem, let $P$ be the empirical distribution of the number of errors on each path. That is, for $0\le i\le n'$, $P(i)$ is the fraction of the paths that contain exactly $i$ lost packets. Similarly, for $0\le i\le n'$, $Q(i)$ denotes the probability that exactly $i$ of the $n'$ packets transmitted on a path of type $j$ are lost. The sets $E$ and $E_{\mathrm{out}}$ are defined as follows
$$E = \Bigl\{P\in\mathcal{P} \Bigm| \sum_{i=0}^{n'} iP(i) \ge n'\beta\Bigr\}, \qquad E_{\mathrm{out}} = \Bigl\{P\in\mathcal{P} \Bigm| \sum_{i=0}^{n'} iP(i) = n'\beta\Bigr\} \tag{61}$$
where $\beta = k/n$, so that $L_j\sum_i iP(i)$ is the total number of lost packets. Noting that $E$ and $E_{\mathrm{out}}$ are dense sets, we can compute $Q_j(n,k)$ as
$$Q_j(n,k) \overset{(a)}{=} P\{E_{\mathrm{out}}\} \overset{(b)}{\doteq} e^{-L_j\min_{P\in E_{\mathrm{out}}} D(P\|Q)} \tag{62}$$
where (a) follows from the definition of $Q_j(n,k)$ in Section V as the probability of having exactly $k$ errors out of the $n$ packets sent over the paths of type $j$, and (b) results from Sanov's theorem.

The Kullback–Leibler distance $D(P\|Q)$ is a convex function of $P$ and $Q$ [63]. Hence its minimum over the convex set $E$ either lies at an interior point, which is then a global minimum over the whole set $\mathcal{P}$, or lies on the boundary of $E$. However, the global minimum of the Kullback–Leibler distance occurs at $P = Q\notin E$ (since $k > nE\{x_j\}$). Thus, the minimum of $D(P\|Q)$ over $E$ lies on its boundary. This results in
$$Q_j(n,k) \overset{(a)}{\doteq} e^{-L_j\min_{P\in E_{\mathrm{out}}}D(P\|Q)} = e^{-L_j\min_{P\in E}D(P\|Q)} \overset{(b)}{\doteq} e^{-\gamma_j L u_j\left(\frac{k}{n}\right)} \tag{63}$$
where (a) and (b) follow from equations (62) and (14), respectively.

1) We prove the first part of the theorem by induction on $J$. When $J = 1$, the statement is correct for both cases $K/N > E\{x_1\}$ and $K/N \le E\{x_1\}$, recalling that $\hat P_e(n,k,1) = P^{\mathrm{opt}}_e(n,k,1)$ and $u_1(x) = 0$ for $x\le E\{x_1\}$. Now, assume the first part of the theorem is true for $j = 1$ to $J-1$. We prove the same statement for $J$. The proof is divided into two cases, depending on whether $K/N$ is larger than $E\{x_J\}$ or vice versa.

1.1) $K/N > E\{x_J\}$

By definition, $\hat P_e(N,K,J)$ is computed by minimizing $\sum_{i=0}^{n_J}Q_J(n_J,i)\,\hat P_e(N-n_J,K-i,J-1)$ over $n_J$ (see equation (23)). We now show that for any value of $n_J$, the corresponding term in the minimization is asymptotically at least $P^{\mathrm{opt}}_e(N,K,J)$. $n_J$ can take integer values in the range $0\le n_J\le N$.
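The following Python sketch makes this minimization concrete. It evaluates $\hat P_e(N,K,J) = \min_{n_J}\sum_{i=0}^{n_J}Q_J(n_J,i)\,\hat P_e(N-n_J,K-i,J-1)$ by brute force with memoization. It rests on assumptions that are ours, not the paper's:

- The base case for $j = 1$ places all remaining packets on type 1 and returns the probability of at least $k$ losses. This is consistent with the boundary condition $\hat P_e(n,k,j) = 1$ for $k\le 0$ used in this proof.
- The per-type pmfs $Q_j(n,i)$ are supplied by the caller.
- The names `p_hat_e` and `iid_loss_pmf` are hypothetical.

It returns only the value and a minimizing $n_J$ for the last type. It does not reproduce the outputs $K_j$, $\hat N_j$ of the suboptimal algorithm.

```python
from functools import lru_cache
from math import comb


def p_hat_e(N, K, type_pmfs):
    """Evaluate P_hat_e(N, K, J) by the nested minimization over n_J (cf. eq. (23)).

    type_pmfs[j - 1](n) must return [Q_j(n, i) for i = 0..n] for path type j.
    Returns (value, n_J), where n_J is a minimizing packet count for type J.
    """

    @lru_cache(maxsize=None)
    def solve(n, k, j):
        if k <= 0:                      # P_hat_e(n, k, j) = 1 for k <= 0
            return 1.0, 0
        if j == 1:                      # assumed base case: all n packets on type 1,
            q = type_pmfs[0](n)         # probability of at least k losses
            return float(sum(q[k:])), n
        best_val, best_nj = float("inf"), 0
        for nj in range(n + 1):         # every split between type j and types 1..j-1
            q = type_pmfs[j - 1](nj)
            val = sum(q[i] * solve(n - nj, k - i, j - 1)[0] for i in range(nj + 1))
            if val < best_val:
                best_val, best_nj = val, nj
        return best_val, best_nj

    return solve(N, K, len(type_pmfs))


def iid_loss_pmf(p):
    """Illustrative Q_j: n packets with independent loss probability p (binomial)."""
    return lambda n: [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
```

For instance, take a lossy type ($p = 0.5$) followed by a lossless one ($p = 0$). Then `p_hat_e(3, 1, [iid_loss_pmf(0.5), iid_loss_pmf(0.0)])` places all three packets on the lossless type and returns `(0.0, 3)`. The brute-force search costs far more than necessary and is meant only for small $N$ and $K$.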
We split this range into three non-overlapping intervals: $0\le n_J\le\epsilon L$, $\epsilon L < n_J\le N(1-\epsilon)$, and $N(1-\epsilon) < n_J\le N$, for any arbitrary constant $0 < \epsilon < \min\{\gamma_J, 1 - K/N\}$. The reason is that equation (63) is valid in the second interval only, so the first and last intervals need separate analyses. First, we show the statement for $\epsilon L < n_J\le N(1-\epsilon)$. Defining $i_J = \lfloor n_J K/N\rfloor$, we have
$$\frac{i_J}{n_J} = \frac{K}{N} + O\!\left(\frac{1}{L}\right), \qquad \frac{K-i_J}{N-n_J} = \frac{K}{N} + O\!\left(\frac{1}{L}\right) \tag{64}$$
as $\epsilon$ is constant, and $K = O(L)$, $N = O(L)$. Hence, we have
$$\begin{aligned}
\sum_{i=0}^{n_J}Q_J(n_J,i)\,\hat P_e(N-n_J,K-i,J-1) &\ge Q_J(n_J,i_J)\,\hat P_e(N-n_J,K-i_J,J-1)\\
&\overset{(a)}{\doteq} e^{-L\sum_{j=1}^{J}\gamma_j u_j\left(\frac{K}{N}+O\left(\frac{1}{L}\right)\right)}\\
&\overset{(b)}{\doteq} e^{-L\sum_{j=1}^{J}\gamma_j u_j\left(\frac{K}{N}\right)}
\end{aligned}\tag{65}$$
where (a) follows from (63) and the induction assumption, and (b) follows from the fact that the $u_j(\cdot)$'s are differentiable functions according to Lemma I in Subsection IV-B.

For $0\le n_J\le\epsilon L$, since $\epsilon < \gamma_J$, the number of packets assigned to the paths of type $J$ is less than the number of such paths. Thus, one packet is allocated to each of $n_J$ of these paths, and the rest of the paths of type $J$ are not used. Let $\pi_{b,J}$ denote the probability that a path of type $J$ is in the bad state. Then we can write
$$Q_J(n_J,n_J) = \pi_{b,J}^{n_J} = e^{-n_J\log\left(\frac{1}{\pi_{b,J}}\right)}. \tag{66}$$
Therefore, for $0\le n_J\le\epsilon L$, we have
$$\begin{aligned}
\sum_{i=0}^{n_J}Q_J(n_J,i)\,\hat P_e(N-n_J,K-i,J-1) &\ge Q_J(n_J,n_J)\,\hat P_e(N-n_J,K-n_J,J-1)\\
&\doteq e^{-L\sum_{j=1}^{J-1}\gamma_j u_j\left(\frac{K-n_J}{N-n_J}\right) - n_J\log\left(\frac{1}{\pi_{b,J}}\right)}\\
&\overset{(a)}{\ge} e^{-L\sum_{j=1}^{J-1}\gamma_j u_j\left(\frac{K}{N}\right) - L\epsilon\log\left(\frac{1}{\pi_{b,J}}\right)}\\
&\overset{(b)}{\doteq} e^{-L\sum_{j=1}^{J-1}\gamma_j u_j\left(\frac{K}{N}\right)}\\
&\ge e^{-L\sum_{j=1}^{J}\gamma_j u_j\left(\frac{K}{N}\right)}
\end{aligned}\tag{67}$$
where (a) follows from $\frac{K-n_J}{N-n_J}\le\frac{K}{N}$ and the monotonicity of the $u_j$'s, and (b) holds because $\epsilon$ can be selected arbitrarily small.

Finally, we prove the statement for the case $n_J > N(1-\epsilon)$.
In this case, we have
$$\begin{aligned}
\sum_{i=0}^{n_J}Q_J(n_J,i)\,\hat P_e(N-n_J,K-i,J-1) &\ge Q_J(n_J,K)\,\hat P_e(N-n_J,0,J-1)\\
&\overset{(a)}{\ge} e^{-L\gamma_J u_J\left(\frac{K}{N(1-\epsilon)}\right)}\\
&\overset{(b)}{\dot{\ge}} e^{-L\sum_{j=1}^{J}\gamma_j u_j\left(\frac{K}{N}\right)}
\end{aligned}\tag{68}$$
where (a) follows from $\epsilon < 1 - K/N$ and $\hat P_e(n,0,j) = 1$ for all $n$ and $j$, and (b) follows by choosing $\epsilon$ small enough. Inequalities (65), (67), and (68) result in
$$\hat P_e(N,K,J)\ \dot{\ge}\ e^{-L\sum_{j=1}^{J}\gamma_j u_j(\alpha)} \tag{69}$$
Combining (69) with Lemma V proves the first part of Theorem III for the case $K/N > E\{x_J\}$.

1.2) $K/N \le E\{x_J\}$

As in Subsection 1.1, we show that for any value $0\le n_J\le N$, the corresponding term of the minimization in equation (23) is asymptotically at least $P^{\mathrm{opt}}_e(N,K,J)$. Again, the range of $n_J$ is partitioned into three non-overlapping intervals. Take any $0 < \epsilon < \min\{\gamma_J, 1 - K/N, 1/K\}$. For all $n_J$ in the range $\epsilon L < n_J\le N(1-\epsilon)$, define $i_J = \lceil n_J E\{x_J\}\rceil$. We have
$$\frac{i_J}{n_J} = E\{x_J\} + O\!\left(\frac{1}{L}\right) \ge E\{x_J\}, \qquad \frac{K-i_J}{N-n_J} < \frac{K}{N} + O\!\left(\frac{1}{L}\right). \tag{70}$$
Hence,
$$\begin{aligned}
\sum_{i=0}^{n_J}Q_J(n_J,i)\,\hat P_e(N-n_J,K-i,J-1) &\ge Q_J(n_J,i_J)\,\hat P_e(N-n_J,K-i_J,J-1)\\
&\overset{(a)}{\doteq} e^{-L\gamma_J u_J\left(\frac{i_J}{n_J}\right) - L\sum_{j=1}^{J-1}\gamma_j u_j\left(\frac{K-i_J}{N-n_J}\right)}\\
&\overset{(b)}{\ge} e^{-L\gamma_J u_J\left(E\{x_J\}+O\left(\frac{1}{L}\right)\right)}\cdot e^{-L\sum_{j=1}^{J-1}\gamma_j u_j\left(\frac{K}{N}+O\left(\frac{1}{L}\right)\right)}\\
&\overset{(c)}{\doteq} e^{-L\sum_{j=1}^{J}\gamma_j u_j\left(\frac{K}{N}\right)}
\end{aligned}\tag{71}$$
where (a) follows from (63) and the induction assumption, and (b) is based on (70). Step (c) uses two facts from Lemma I in Subsection IV-B: the $u_j(\cdot)$'s are differentiable, and $u_J(E\{x_J\}) = 0$. Since $K/N \le E\{x_J\}$, we also have $u_J(K/N) = 0$.

For $0\le n_J\le\epsilon L$, the analysis of Subsection 1.1 and inequality (67) are still valid. For $n_J > (1-\epsilon)N$, we set $i_J = \lceil E\{x_J\}n_J\rceil$. Now, we have
$$i_J \ge n_J E\{x_J\} > (1-\epsilon)NE\{x_J\} \ge (1-\epsilon)K. \tag{72}$$
The above inequality can be written as
$$K - i_J < \epsilon K < 1 \tag{73}$$
since $\epsilon < 1/K$. Since $K$ and $i_J$ are integers, it follows that $K\le i_J$. Now, we can write
$$\begin{aligned}
\sum_{i=0}^{n_J}Q_J(n_J,i)\,\hat P_e(N-n_J,K-i,J-1) &\ge Q_J(n_J,i_J)\,\hat P_e(N-n_J,K-i_J,J-1)\\
&\overset{(a)}{=} Q_J(n_J,i_J)\\
&\dot{\ge}\ e^{-L\gamma_J u_J\left(E\{x_J\}+\frac{1}{n_J}\right)}\\
&\overset{(b)}{\dot{\ge}} e^{-L\gamma_J u_J\left(E\{x_J\}+\frac{1}{(1-\epsilon)N}\right)}\\
&\doteq e^{-L\gamma_J u_J\left(E\{x_J\}+O\left(\frac{1}{L}\right)\right)}\\
&\overset{(c)}{\doteq} 1
\end{aligned}\tag{74}$$
where (a) follows from $K\le i_J$ and $\hat P_e(n,k,j) = 1$ for $k\le 0$, while (b) and (c) result from $n_J > (1-\epsilon)N$ and $u_J(E\{x_J\}) = 0$, respectively. Hence, inequalities (67), (71), and (74) result in
$$\hat P_e(N,K,J)\ \dot{\ge}\ e^{-L\sum_{j=1}^{J}\gamma_j u_j(\alpha)} \tag{75}$$
which, combined with Lemma V, proves the first part of Theorem III for the case $K/N\le E\{x_J\}$.

2) We prove the second and third parts of the theorem by induction on $j$, with the total number of types $J$ fixed. The proof for the base case $j = J$ is similar to the proof of the induction step from $j+1$ to $j$, so we only give the latter. Assume the second and third parts of the theorem are true for $m = J$ down to $j+1$. We prove the same statements for $j$. The proof is divided into two cases, depending on whether $K/N$ is larger than $E\{x_j\}$ or vice versa. Before proceeding, we introduce two new parameters $N'$ and $K'$:
$$N' = N - \sum_{m=j+1}^{J}\hat N_m, \qquad K' = K - \sum_{m=j+1}^{J}K_m.$$
From these definitions and the induction assumptions, it is obvious that
$$\frac{K'}{N'} = \frac{K}{N} + o(1) = \alpha + o(1). \tag{76}$$

2.1) $K/N > E\{x_j\}$

First, we show by contradiction that $\hat N_j > \epsilon N'$ for small enough $\epsilon > 0$. Assume the opposite is true, i.e., $\hat N_j\le\epsilon N'$.
Then, we can write
$$\begin{aligned}
\hat P_e(N',K',j) &\overset{(a)}{=} \sum_{i=0}^{\hat N_j}\hat P_e(N'-\hat N_j,K'-i,j-1)\,Q_j(\hat N_j,i)\\
&\ge \hat P_e(N'-\hat N_j,K'-\hat N_j,j-1)\,Q_j(\hat N_j,\hat N_j)\\
&\overset{(b)}{\doteq} Q_j(\hat N_j,\hat N_j)\,e^{-L\sum_{r=1}^{j-1}\gamma_r u_r\left(\frac{K'-\hat N_j}{N'-\hat N_j}\right)}\\
&\overset{(c)}{\ge} e^{-Ln_0\left(1-\sum_{r=j+1}^{J}\eta_r\right)\epsilon\log\left(\frac{1}{\pi_{b,j}}\right)}\cdot e^{-L\sum_{r=1}^{j-1}\gamma_r u_r\left(\frac{K'}{N'}\right)}\\
&\overset{(d)}{\dot{>}} e^{-L\sum_{r=1}^{j}\gamma_r u_r(\alpha)}
\end{aligned}\tag{77}$$
where (a) follows from equation (23) and step (2) of our suboptimal algorithm, (b) results from the first part of Theorem III, and (c) follows from arguments similar to those for inequality (67). For (d), we choose $\epsilon$ small enough that the corresponding term in the exponent is strictly less than $L\gamma_j u_j(K'/N')$, and we use $K'/N' = \alpha + o(1)$. The result in (77) contradicts the first part of Theorem III, proving that $\hat N_j > \epsilon N'$.

Now, we show that if $\hat N_j > (1-\epsilon)N'$ for arbitrarily small values of $\epsilon$, then $E\{x_r\} > \alpha$ for all $1\le r\le j-1$. In such a case, $\hat N_j/N' = 1 + o(1)$, which proves the second statement of Theorem III. To show this, assume $\hat N_j > (1-\epsilon)N'$. Hence,
$$\begin{aligned}
\hat P_e(N',K',j) &= \sum_{i=0}^{\hat N_j}\hat P_e(N'-\hat N_j,K'-i,j-1)\,Q_j(\hat N_j,i)\\
&\dot{\ge}\ \hat P_e(N'-\hat N_j,0,j-1)\,Q_j(\hat N_j,K')\\
&\overset{(a)}{\dot{\ge}} e^{-L\gamma_j u_j\left(\frac{K'}{(1-\epsilon)N'}\right)}\\
&\overset{(b)}{\doteq} e^{-L\gamma_j u_j(\alpha+o(1))}
\end{aligned}\tag{78}$$
where (a) follows from $\hat P_e(n,0,j) = 1$ for all $n$ and $j$, together with $\hat N_j\ge(1-\epsilon)N'$. Step (b) follows by making $\epsilon$ arbitrarily small and using equation (76). We also know that $\hat P_e(N',K',j)\doteq e^{-L\sum_{r=1}^{j}\gamma_r u_r(\alpha)}$. Combining this with (78), we conclude that $E\{x_r\} > \alpha$ for all $1\le r\le j-1$.
$\hat P_e(N',K',j)$ can be written as
$$\begin{aligned}
\hat P_e(N',K',j) &= \min_{0\le N_j\le N'}\sum_{i=0}^{N_j}\hat P_e(N'-N_j,K'-i,j-1)\,Q_j(N_j,i)\\
&\overset{(a)}{\doteq} \min_{\epsilon N'\le N_j\le(1-\epsilon)N'}\ \max_{0\le i\le N_j}\hat P_e(N'-N_j,K'-i,j-1)\,Q_j(N_j,i)\\
&\overset{(b)}{\doteq} \min_{\epsilon N'\le N_j\le(1-\epsilon)N'}\ \max_{E\{x_j\}N_j\ \ldots}
\end{aligned}$$
[…]
$$\ldots\; E\{x_r\}\, l_r(\alpha) \tag{80}$$
Hence, the integer parameters $K_j$ and $\hat N_j$ defined in the suboptimal algorithm must satisfy $K_j/N' = \beta^*_j + o(1)$ and $\hat N_j/N' = \lambda^*_j + o(1)$, respectively. Based on the induction assumption, it is easy to show that
$$\frac{N'}{N} = \frac{\sum_{r=1,\,E\{x_r\}<\alpha}^{j}\gamma_r\, l_r(\alpha)}{\sum_{r=1,\,E\{x_r\}<\alpha}^{J}\gamma_r\, l_r(\alpha)} \tag{81}$$
which completes the proof for the case $E\{x_j\} < K/N$.

2.2) $K/N\le E\{x_j\}$

In this case, we show that $\hat N_j/N = o(1)$. Defining $i_j = \lceil E\{x_j\}\hat N_j\rceil$ and using equation (76), we have
$$\frac{K'-i_j}{N'-\hat N_j} = \alpha - \left(E\{x_j\}-\alpha\right)\frac{\hat N_j}{N'-\hat N_j} + o(1). \tag{82}$$
Now, we have
$$\begin{aligned}
\hat P_e(N',K',j) &= \sum_{i=0}^{\hat N_j}\hat P_e(N'-\hat N_j,K'-i,j-1)\,Q_j(\hat N_j,i)\\
&\ge \hat P_e(N'-\hat N_j,K'-i_j,j-1)\,Q_j(\hat N_j,i_j)\\
&\overset{(a)}{\doteq} e^{-L\gamma_j u_j\left(E\{x_j\}+o(1)\right)}\cdot e^{-L\sum_{r=1}^{j-1}\gamma_r u_r\left(\alpha-\left(E\{x_j\}-\alpha\right)\frac{\hat N_j}{N'-\hat N_j}\right)}\\
&\doteq e^{-L\sum_{r=1}^{j-1}\gamma_r u_r\left(\alpha-\left(E\{x_j\}-\alpha\right)\frac{\hat N_j}{N'-\hat N_j}\right)}
\end{aligned}\tag{83}$$
where (a) follows from the first part of Theorem III and (63). On the other hand, according to the first part of Theorem III, we know that
$$\hat P_e(N',K',j)\doteq e^{-L\sum_{r=1}^{j-1}\gamma_r u_r(\alpha)}. \tag{84}$$
According to Lemma I, $u_r(\beta)$ is an increasing function of $\beta$ for all $1\le r\le j-1$. Thus, $\sum_{r=1}^{j-1}\gamma_r u_r(\beta)$ is also a one-to-one increasing function of $\beta$. Comparing (83) and (84) and noting that $E\{x_j\}-\alpha$ is strictly positive, we conclude that $\hat N_j/N' = o(1)$. By (81), we then have $\hat N_j/N = o(1)$, which proves the second part of Theorem III for the case $K/N\le E\{x_j\}$.

REFERENCES

[1] J. C. Bolot, S. Fosse-Parisis, and D. Towsley, "Adaptive FEC-based error control for Internet telephony," in Proc. IEEE INFOCOM, vol. 3, 1999, pp. 1453–1460.
[2] J. C. Bolot and T. Turletti, "Adaptive Error Control For Packet Video In The Internet," in Proc. IEEE International Conference on Image Processing, 1996, pp. 25–28.
[3] T. Nguyen and A. Zakhor, "Path diversity with forward error correction (PDF) system for packet switched networks," in Proc. IEEE INFOCOM, vol. 1, 2003, pp. 663–672.
[4] T. Nguyen and A. Zakhor, "Multiple Sender Distributed Video Streaming," IEEE Transactions on Multimedia, vol. 6, no. 2, pp. 315–326, 2004.
[5] F. L. Leannec, F. Toutain, and C. Guillemot, "Packet Loss Resilient MPEG-4 Compliant Video Coding for the Internet," Journal of Image Communication, Special Issue on Real-time Video over the Internet, no. 15, pp. 35–56, 1999.
[6] S. Fashandi, S. Oveisgharan, and A. K. Khandani, "Coding over an Erasure Channel with a Large Alphabet Size," in IEEE International Symposium on Information Theory (ISIT '08), 2008.
[7] S. Fashandi, S. Oveisgharan, and A. K. Khandani, "Coding over an Erasure Channel with a Large Alphabet Size," Library and Archives Canada Technical Report UW-ECE #2008-06, 2008, http://cst.uwaterloo.ca/r/2008-06 Shervan.pdf.
[8] H. Han, S. Shakkottai, C. V. Hollot, R. Srikant, and D. Towsley, "Multi-Path TCP: A Joint Congestion Control and Routing Scheme to Exploit Path Diversity in the Internet," IEEE/ACM Transactions on Networking, vol. 14, no. 6, pp. 1260–1271, 2006.
[9] S. Mao, S. S. Panwar, and Y. T. Hou, "On optimal partitioning of realtime traffic over multiple paths," in Proc. IEEE INFOCOM, vol. 4, 2005, pp. 2325–2336.
[10] S. Fashandi, S. Oveisgharan, and A. K. Khandani, "Path Diversity in Packet Switched Networks: Performance Analysis and Rate Allocation," in IEEE Global Telecommunications Conference (GLOBECOM '07), 2007, pp. 1840–1844.
[11] J. Han, D. Watson, and F. Jahanian, "An Experimental Study of Internet Path Diversity," IEEE Transactions on Dependable and Secure Computing, vol. 3, no. 4, pp. 273–288, 2006.
[12] J. Han and F. Jahanian, "Impact of Path Diversity on Multi-homed and Overlay Networks," in International Conference on Dependable Systems and Networks, 2004, pp. 29–38.
[13] N. Spring, R. Mahajan, D. Wetherall, and T. Anderson, "Measuring ISP Topologies with Rocketfuel," IEEE/ACM Transactions on Networking, vol. 12, no. 1, pp. 2–16, 2004.
[14] R. Teixeira, K. Marzullo, S. Savage, and G. M. Voelker, "In Search of Path Diversity in ISP Networks," in Proceedings of the 3rd ACM SIGCOMM Conference on Internet Measurement, 2003, pp. 313–318.
[15] A. L. Barabási and R. Albert, "Emergence of Scaling in Random Networks," Science, vol. 286, no. 5439, pp. 509–512, 1999.
[16] D. G. Andersen, Resilient Overlay Networks. Master's Thesis, Massachusetts Institute of Technology, 2001.
[17] Y. J. Liang, E. G. Steinbach, and B. Girod, "Multi-stream Voice over IP using Packet Path Diversity," in IEEE Fourth Workshop on Multimedia Signal Processing, 2001, pp. 555–560.
[18] S. Nelakuditi, Z. Zhang, and D. H. C. Du, "On Selection of Candidate Paths for Proportional Routing," Elsevier Computer Networks, vol. 44, no. 1, pp. 79–102, 2004.
[19] D. G. Andersen, A. C. Snoeren, and H. Balakrishnan, "Best-path vs. Multi-path Overlay Routing," in Proceedings of the 3rd ACM SIGCOMM Conference on Internet Measurement, 2003, pp. 91–100.
[20] B.-G. Chun, R. Fonseca, I. Stoica, and J. Kubiatowicz, "Characterizing Selfishly Constructed Overlay Routing Networks," in IEEE INFOCOM, 2004, pp. 1329–1339.
[21] J. Han, D. Watson, and F. Jahanian, "Topology Aware Overlay Networks," in IEEE INFOCOM, vol. 4, 2005, pp. 2554–2565.
[22] M. Guo, Q. Zhang, and W. Zhu, "Selecting Path-diversified Servers in Content Distribution Networks," in IEEE Global Telecommunications Conference (GLOBECOM '03), vol. 6, 2003, pp. 3181–3185.
[23] A. Akella, B. Maggs, S. Seshan, and A. Shaikh, "On the Performance Benefits of Multihoming Route Control," IEEE/ACM Transactions on Networking, vol. 16, no. 1, pp. 91–104, 2008.
[24] A. Akella, J. Pang, B. Maggs, S. Seshan, and A. Shaikh, "A Comparison of Overlay Routing and Multihoming Route Control," in ACM SIGCOMM, 2004, pp. 93–106.
[25] S. Srinivasan, Design and Use of Managed Overlay Networks. PhD Dissertation, Georgia Institute of Technology, 2007.
[26] M. Cha, S. Moon, C. D. Park, and A. Shaikh, "Placing Relay Nodes for Intra-Domain Path Diversity," in IEEE INFOCOM, 2006, pp. 1–12.
[27] D. Eppstein, "Finding the k shortest paths," in Proc. 35th Symp. Foundations of Computer Science, 1994, pp. 154–165.
[28] R. G. Ogier, V. Rutenburg, and N. Shacham, "Distributed Algorithms for Computing Shortest Pairs of Disjoint Paths," IEEE Transactions on Information Theory, vol. 39, no. 2, pp. 443–455, 1993.
[29] D. Clark, W. Lehr, S. Bauer, P. Faratin, R. Sami, and J. Wroclawski, "Overlay Networks and Future of the Internet," Journal of Communications and Strategies, vol. 3, no. 63, pp. 1–21, 2006.
[30] M. Cha, Network Support for Emerging Multimedia Streaming Services. PhD Dissertation, Korea Advanced Institute of Science and Technology, 2007.
[31] R. Karrer and T. Gross, "Multipath Streaming in Best-Effort Networks," in Proc. of the IEEE International Conference on Communications (ICC'03), 2003.
[32] J. G. Apostolopoulos, T. Wong, W. Tan, and S. J. Wee, "On Multiple Description Streaming with Content Delivery Networks," in Proc. IEEE INFOCOM, vol. 3, 2002, pp. 1736–1745.
[33] "Akamai SureRoute," http://www.akamai.com/dl/feature sheets/fs edge suite sureroute.pdf.
[34] M. Ghanassi and P. Kabal, "Optimizing Voice-over-IP Speech Quality Using Path Diversity," in IEEE 8th Workshop on Multimedia Signal Processing, 2006, pp. 155–160.
[35] J. Chakareski and B. Girod, "Rate-distortion optimized packet scheduling and routing for media streaming with path diversity," in Proc. IEEE Data Compression Conference, 2003, pp. 203–212.
[36] M. Afergan, J. Wein, and A. LaMeyer, "Experience with some Principles for Building an Internet-Scale Reliable System," in Proceedings of the Fifth IEEE International Symposium on Network Computing and Applications (NCA'06), 2006, p. 3.
[37] R. M. Roth, Introduction to Coding Theory, 1st ed. Cambridge University Press, 2006, pp. 333–351.
[38] W. T. Tan and A. Zakhor, "Video Multicast Using Layered FEC and Scalable Compression," IEEE Transactions on Circuits and Systems for Video Technology, vol. 11, no. 3, pp. 373–386, 2001.
[39] L. Dairaine, L. Lancérica, J. Lacan, and J. Fimes, "Content-Access QoS in Peer-to-Peer Networks Using a Fast MDS Erasure Code," Elsevier Computer Communications, vol. 28, no. 15, pp. 1778–1790, 2005.
[40] X. H. Peng, "Erasure-control Coding for Distributed Networks," IEE Proceedings on Communications, vol. 152, pp. 1075–1080, 2005.
[41] N. Alon, J. Edmonds, and M. Luby, "Linear Time Erasure Codes with Nearly Optimal Recovery," in Proc. IEEE Symposium on Foundations of Computer Science, vol. 3, 1995, pp. 512–519.
[42] J. Justesen, "On the complexity of decoding Reed-Solomon codes," IEEE Transactions on Information Theory, vol. 22, no. 2, pp. 237–238, 1993.
[43] M. G. Luby, M. Mitzenmacher, M. A. Shokrollahi, and D. A. Spielman, "Efficient Erasure Correcting Codes," IEEE Transactions on Information Theory, vol. 47, no. 2, pp. 569–584, 2001.
[44] A. Shokrollahi, "Raptor Codes," IEEE Transactions on Information Theory, vol. 52, no. 6, pp. 2551–2567, 2006.
[45] R. Koetter and M. Médard, "An algebraic approach to network coding," IEEE Transactions on Networking, vol. 11, no. 5, pp. 782–795, 2003.
[46] P. A. Chou, Y. Wu, and K. Jain, "Practical Network Coding," in 51st Allerton Conference on Communication, Control and Computing, 2003.
[47] C. Gkantsidis and P. R. Rodriguez, "Network coding for large scale content distribution," in Proc. IEEE INFOCOM, vol. 4, 2005, pp. 2235–2245.
[48] M. Yajnik, S. B. Moon, J. F. Kurose, and D. F. Towsley, "Measurement and Modeling of the Temporal Dependence in Packet Loss," in Proc. IEEE INFOCOM, vol. 1, 1999, pp. 345–352.
[49] P. Rossi, G. Romano, F. Palmieri, and G. Iannello, "A Hidden Markov Model for Internet Channels," in IEEE International Symposium on Signal Processing and Information Technology, 2003.
[50] W. Kellerer, E. Steinbach, P. Eisert, and B. Girod, "A Real-Time Internet Streaming Media Testbed," in Proc. IEEE International Conference on Multimedia and Expo, 2002, pp. 453–456.
[51] X. Henocq and C. Guillemot, "Source Adaptive Error Control for Real-time Video over the Internet," Numéro spécial image et vidéo. Hermès. Réseaux et système réparti. Calculateurs Parallèles, vol. 12, no. 3-4, 2000.
[52] F. L. Leannec and C. Guillemot, "Error Resilient Video Transmission over the Internet," in Proc. Visual Communication and Image Processing, 1999.
[53] K. Salamatian and Vaton, "Hidden Markov Modeling for Network Communication Channels," in Proc. ACM SIGMETRICS, 2001, pp. 92–101.
[54] R. M. Roth, Introduction to Coding Theory, 1st ed. Cambridge University Press, 2006, pp. 183–204.
[55] J. Padhye, V. Firoiu, D. F. Towsley, and J. F. Kurose, "Modeling TCP Reno performance: a simple model and its empirical validation," IEEE/ACM Transactions on Networking, vol. 8, no. 2, pp. 133–145, 2000.
[56] A. Dembo and O. Zeitouni, Large Deviations Techniques and Applications, 2nd ed. New York: Springer, 1998, pp. 11–43.
[57] J. L. Kelley, General Topology. Springer, 1975, pp. 40–43.
[58] C. H. Papadimitriou, Computational Complexity, 1st ed. New York: Addison Wesley, 1994.
[59] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to Algorithms, 2nd ed. MIT Press, 2001, pp. 347–349.
[60] W. Rudin, Principles of Mathematical Analysis, 3rd ed. McGraw-Hill, 1976.
[61] S. Boyd and L. Vandenberghe, Convex Optimization, 1st ed. Cambridge, UK: Cambridge University Press, 2004, pp. 243–245.
[62] T. Cover and J. Thomas, Elements of Information Theory, 1st ed. New York: Wiley, 1991, pp. 291–294.
[63] T. Cover and J. Thomas, Elements of Information Theory, 1st ed. New York: Wiley, 1991, pp. 30–31.
