Non-equilibrium Information Envelopes and the Capacity-Delay-Error-Tradeoff of Source Coding
Authors: Ralf L"ubben, Markus Fidler
Institute of Communications Technology, Leibniz Universität Hannover

Abstract—This paper develops an envelope-based approach to establish a link between information and queueing theory. Unlike classical, equilibrium information theory, information envelopes focus on the dynamics of sources and coders, using functions of time that bound the number of bits generated. In the limit the information envelopes converge to the average behavior and recover the entropy of a source, respectively, the average codeword length of a coder. In contrast, on short time scales and for sources with memory it is shown that large deviations from known equilibrium results occur with non-negligible probability. These can cause significant network delays. Compared to well-known traffic models from queueing theory, information envelopes consider the functioning of information sources and coders, avoiding a priori assumptions, such as exponential traffic, or empirical, trace-based traffic models. Using results from the stochastic network calculus, the envelopes yield a characterization of the operating points of source coders by the triplet of capacity, delay, and error. In the limit, assuming an optimal coder, the required capacity approaches the entropy with arbitrarily small probability of error if infinitely large delays are permitted. We derive a corresponding characterization of channels and prove that the model has the desirable property of additivity, which allows analyzing coders and channels separately.

I. INTRODUCTION

Originating from the seminal works by Shannon in 1948, the tremendous progress in information and coding theory has enabled numerous ground-breaking applications that range from digital communications to data storage and processing.
The fundamental results of information theory are asymptotic limits for the transmission of information by a source over a channel. Information theory defines the notions of entropy and channel capacity as the expected information of a source and the maximum expected transinformation of a channel. Coding theory devises practical source and channel codes for data compression and reliable transmission that seek to approach the limits established by the entropy and the channel capacity, respectively [11].

In networking, information theory has not become widely accepted yet. A major challenge for establishing a network information theory is due to the properties of network data traffic, which is highly variable and delay-sensitive [14]. In contrast, information theory mostly neglects the dynamics of information and capacity and focuses on averages, respectively, asymptotic limits. Typically, these limits can be achieved with arbitrarily small probability of error assuming, however, arbitrarily long codewords and, as a consequence, arbitrarily large coding delays [3]. In networking, however, delay is a key performance parameter that can be traded for capacity or loss using results from queueing theory. Moreover, considering the variability of sources is essential in packet data networks as it enables significant resource savings due to statistical multiplexing [14].

(This work has been funded by the German Research Foundation (DFG).)

The analytical cornerstone of networking is queueing theory, which dates back to the works on the dimensioning of circuit-switched networks by Erlang in 1909 and 1917. In 1962 Kleinrock advanced the theory and proved the resource efficiency of packet switching that is achieved by bursty sources due to resource sharing. For packet-switched networks, queueing theory can provide exact solutions for backlogs and delays that occur due to the variability of packet inter-arrival and service times.
Typically, the inter-arrival and service times obey a certain distribution by assumption, e.g., exponential. Recent approaches like the theory of effective bandwidths [8], [24], the deterministic network calculus [8], [12], [25], and the stochastic network calculus [8], [9], [13], [15], [22], [26] compute performance bounds for a wider range of stochastic processes. Despite the need, e.g., for joint coding and scheduling problems or for cross-layer optimization, a tight link between these models and information theory has not been established so far [3], [14], [22].

To bridge the gap towards queueing theory, a non-equilibrium information theory that can model the variability and delay-sensitivity of real sources is required [3], [14]. While [14] envisions "effective bandwidth versus distortion functions," [3] proposes the idea of "throughput-delay-reliability triplets" to characterize mobile ad-hoc networks. As potentially promising candidate theories, [3], [14], [22] mention effective bandwidths, large deviations, or the stochastic network calculus, however, without providing any details, and conclude that unifying information and queueing theory remains one of the most important challenges.

In this paper we formulate a non-equilibrium theory of information sources and source coders combining methods from information theory and effective bandwidths, respectively, the stochastic network calculus. We characterize information sources by envelope functions that are statistical bounds on the amount of information generated by the source in a time interval of defined width. While on short time scales the envelopes can exceed the entropy considerably, they approach the entropy on long time scales and converge in the limit. We derive such information envelopes for memoryless sources and develop a technique for the analysis of Markov sources.
We find that the memory of a source significantly increases the envelope compared to its entropy and that it leads to a slower convergence. Using a sample path argument for the envelopes, we derive a notion of the achievable capacity-delay-error-tradeoff of a coded source. We recover known asymptotic results if the capacity approaches the average codeword length, where the delay tends to infinity for any non-trivial probability of error. We show the capacity-delay-error-tradeoff for different coders, including Huffman, Shannon, and Lempel-Ziv. We find that the coder with the smallest average codeword length does not necessarily achieve the best delay performance. We prove that our model has the favorable property of additivity, permitting the independent analysis of sources and channels. We expect that our model enables further joint information- and queueing-theoretical investigations that have the potential to provide substantial new insights and applications from a holistic analysis of communications networks.

The remainder of this paper is structured as follows. In Sec. II we introduce envelope processes and develop the queueing model that we apply in Sec. III to characterize and analyze information sources and coders. In Sec. IV we show how to apply our model to analyze the transmission of coded sources via a Gilbert-Elliott channel, and in Sec. V we discuss related works. We provide brief conclusions in Sec. VI.

II. ENVELOPES AND PERFORMANCE BOUNDS

In this section we introduce the concept of statistical envelopes that are the basis of this work. We use the analytical framework of the stochastic network calculus established in [9], [13] to compute statistical performance bounds of the type P[backlog > y] ≤ ε or P[delay > y] ≤ ε from envelopes. For a broader overview see, e.g., [17], [22]. In Sec. II-A we develop our model of sources and channels and prove its additivity. In Sec. II-B we assemble a method for the construction of statistical envelopes from results on exponentially bounded burstiness [22], [39] and on envelopes [9], [26].

A. Legendre Transform Model

We use a discrete time model t ∈ ℕ₀. Denote A(t) the cumulative arrivals at a system, i.e., the cumulative number of bits generated by a source in the interval [0, t]. By definition A(t) is a non-negative and non-decreasing random process. By convention A(0) = 0. We use the shorthand notation A(τ, t) = A(t) − A(τ). Similarly, the cumulative departures from a system are denoted D(t). By definition A(t), D(t) ∈ F where

    F = {f : f(t) ≥ f(τ) ≥ 0 for all t ≥ τ ≥ 0, f(0) = 0}.

The service guarantee of a system, e.g., a communications link, a channel, or an entire network, is expressed by a statistical service curve that provides a lower bound for the departures that may be violated with a defined probability. A system has service curve S(t) ∈ F with deficit profile ε_S(σ), σ ≥ 0, if for all t ≥ 0 it holds that

    P[D(t) < A ⊗ S(t) − σ] ≤ ε_S(σ)   (1)

where ⊗ is the min-plus convolution defined for t ≥ 0 as f ⊗ g(t) := inf_{τ∈[0,t]} {f(τ) + g(t − τ)}. Similarly, statistical envelopes provide upper bounds for the arrivals. The arrivals have envelope E(t) ∈ F with overflow profile ε_E(σ), σ ≥ 0, if for all t ≥ 0 it holds that

    P[A(t) > A ⊗ E(t) + σ] ≤ ε_E(σ).   (2)

Using the definition of service curves and arrival envelopes, statistical backlog and delay bounds can be computed from the maximal vertical and horizontal deviation of E(t) and S(t), respectively. In this work we use the concave and convex Legendre transforms of E(t) and S(t), defined for c ≥ 0 as¹

    L_E(c) := sup_{t≥0} {E(t) − ct},   L_S(c) := sup_{t≥0} {ct − S(t)}

to model sources and channels, respectively.
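As a minimal numerical sketch of the two transforms (a toy example of our own, not from the paper): take an affine envelope E(t) = b + rt and a latency-rate service curve S(t) = R·max(t − T, 0) and evaluate both transforms on a finite time grid. For r < c < R one obtains L_E(c) = b and L_S(c) = cT.

```python
# Sketch: evaluate L_E(c) = sup_t {E(t) - c t} and L_S(c) = sup_t {c t - S(t)}
# on a finite time grid. Toy example (assumed values): affine envelope
# E(t) = b + r*t and latency-rate service curve S(t) = R*max(t - T, 0).

def legendre_envelope(E, c, t_max=10_000):
    """Concave transform of an arrival envelope E(t), t = 0..t_max."""
    return max(E(t) - c * t for t in range(t_max + 1))

def legendre_service(S, c, t_max=10_000):
    """Convex transform of a service curve S(t), t = 0..t_max."""
    return max(c * t - S(t) for t in range(t_max + 1))

b, r = 50.0, 2.0   # burst (bits) and rate (bits/slot), assumed values
R, T = 3.0, 4      # service rate and latency, assumed values

E = lambda t: b + r * t
S = lambda t: R * max(t - T, 0)

c = 2.5            # any rate with r < c < R keeps both transforms finite
print(legendre_envelope(E, c))                           # b = 50.0
print(legendre_service(S, c))                            # c*T = 10.0
print(legendre_envelope(E, c) + legendre_service(S, c))  # sum = 60.0
```

By the additivity property developed in this section, the sum of the two transforms is a backlog bound for the composition of source and system; the free rate parameter c can then be optimized to tighten the bound.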
Legendre transforms uniquely determine concave arrival envelopes and convex service curves and enjoy a number of useful properties in the network calculus [18]. The following Lem. 1 shows that backlog and delay bounds can be computed from L_E(c) and L_S(c) by a simple addition. The property of additivity is particularly useful as it allows composing results obtained for sources E(t) and systems S(t) independently. Lem. 1 extends an earlier deterministic result for backlogs from [18].

Lemma 1 (Additivity of Legendre Transforms): Given a system with service curve S(t) and deficit profile ε_S(σ) and arrivals with envelope E(t) and overflow profile ε_E(σ). For any c ≥ 0 and σ_E, σ_S ≥ 0 it holds for the backlog B that

    P[B > L_E(c) + L_S(c) + σ_E + σ_S] ≤ ε_E(σ_E) + ε_S(σ_S)

and assuming fcfs order it holds for the delay W that

    P[W > (L_E(c) + L_S(c) + σ_E + σ_S)/c] ≤ ε_E(σ_E) + ε_S(σ_S).

Letting σ = σ_E + σ_S, we refer to ε(σ) = ε_E(σ_E) + ε_S(σ_S) as the probability of error that can be minimized for σ_E, σ_S ≥ 0 as

    ε(σ) = inf_{σ_E+σ_S=σ} {ε_E(σ_E) + ε_S(σ_S)} = ε_E ⊗ ε_S(σ).

For the special case of a constant rate server with capacity c we have S(t) = ct with ε(σ) = 0 for σ ≥ 0 such that L_S(c) = 0. It follows from Lem. 1 that L_E(c) + σ_E is a backlog bound with probability of error ε_E(σ_E), i.e., L_E(c) has the intuitive interpretation of a backlog bound for arrivals with envelope E(t) at a constant rate server with capacity c. Similarly, L_S(c) is a backlog bound for constant rate arrivals with rate c at the system S(t).

Proof: Given arrivals A(t) and departures D(t). The backlog of the system is B(t) = A(t) − D(t). By substitution of (1) for D(t) and (2) for A(t) it follows for any t ≥ 0 that P[B > b] ≤ ε_E ⊗ ε_S(σ) where b = sup_{t≥0} {E(t) − S(t)} + σ [13].
We rewrite b = sup_{t≥0} {E(t) − ct + ct − S(t)} + σ where c ≥ 0. It follows that

    b ≤ sup_{t≥0} {E(t) − ct} + sup_{t≥0} {ct − S(t)} + σ,

which completes the proof of the backlog bound.

¹The Legendre transform is also referred to as Fenchel conjugate [32]. Strictly speaking, the concave conjugate is defined as inf_{t≥0} {ct − E(t)} = −sup_{t≥0} {E(t) − ct}. We slightly adapt the definition for ease of exposition.

The delay of the system is defined as the horizontal deviation W(t) = inf{τ ≥ 0 : A(t) ≤ D(t + τ)}. As above, it follows for any t ≥ 0 that P[W > d] ≤ ε_E ⊗ ε_S(σ) where d = inf{τ ≥ 0 : sup_{t≥0} {E(t) − S(t + τ) + σ} ≤ 0}. We rewrite d = inf{τ ≥ 0 : E(t) − c(t + ϑ) + c(t + ϑ) − S(t + τ) + σ ≤ 0, ∀t ≥ 0} where c ≥ 0. We choose ϑ = sup_{t≥0} {E(t) − ct}/c such that E(t) − c(t + ϑ) ≤ 0 for all t ≥ 0 and estimate d ≤ inf{τ ≥ 0 : c(t + ϑ) − S(t + τ) + σ ≤ 0, ∀t ≥ 0}. After some reordering, d ≤ inf{τ ≥ 0 : c(t + τ) − S(t + τ) + σ ≤ c(τ − ϑ), ∀t ≥ 0}, we arrive at d ≤ inf{τ ≥ 0 : (ct − S(t) + σ)/c + ϑ ≤ τ, ∀t ≥ 0}. It follows that τ = ϑ + sup_{t≥0} {ct − S(t)}/c + σ/c and

    d ≤ sup_{t≥0} {E(t) − ct}/c + sup_{t≥0} {ct − S(t)}/c + σ/c

completes the proof of the delay bound.

B. Construction of Envelopes

We construct statistical envelopes as defined in (2) from the moment generating function (MGF) of the arrivals. We assume stationary arrivals, i.e., P[A(τ, τ + t) > y] = P[A(t) > y] for any y and all τ, t ≥ 0. The MGF of the arrivals is M_A(θ, t) = E[e^{θA(t)}] where θ is a free parameter. Closely related is the concept of effective bandwidths defined for θ > 0 as [8], [24]

    α(θ, t) = (1/(θt)) ln M_A(θ, t).
(3)

The effective bandwidth increases in θ > 0 from the mean rate of the arrivals in an interval of length t to their peak rate, providing an estimate of their capacity requirements. Given an aggregate of independent arrivals A(t) = A₁(t) + A₂(t), the effective bandwidth α(θ, t) = α₁(θ, t) + α₂(θ, t) is additive, since for the sum of independent random processes it holds that M_A(θ, t) = M_{A₁}(θ, t) M_{A₂}(θ, t).

From Chernoff's theorem P[Y ≥ y] ≤ e^{−θy} M_Y(θ) for θ ≥ 0, an upper bound on the arrivals follows as

    P[A(t) > F(t) + ς] ≤ e^{−θ(F(t)+ς)} M_A(θ, t) = κe^{−θς}   (4)

where we chose to equate the right-hand side with κe^{−θς} with parameters κ ∈ (0, 1] and ς ≥ 0. We solve for F(t) and obtain

    F(t) = tα(θ, t) − ln κ/θ.   (5)

By construction, F(t) is an envelope for A(t) that is violated at most with probability κe^{−θς} for any t ≥ 0. It does, however, not satisfy the definition from (2) that requires a sample path argument for all t ≥ 0. We rewrite (2) as

    P[A(t) > A ⊗ E(t) + σ] = P[∃τ : A(τ, t) > E(t − τ) + σ]   (6)

and obtain from the union bound that

    P[A(t) > A ⊗ E(t) + σ] ≤ Σ_{τ=0}^{t−1} P[A(τ, t) > E(t − τ) + σ]

where we used that the addend at τ = t is zero since E(0) + σ ≥ 0 and by definition A(t, t) = 0. We select E(t) = F(t) + δt where F(t) is given in (5) and δ > 0 is a free parameter. By substitution of ς = σ + δt we obtain from (4) that P[A(t) ≥ E(t) + σ] ≤ κe^{−θ(σ+δt)} and for A(t) stationary

    P[A(t) > A ⊗ E(t) + σ] ≤ κe^{−θσ} Σ_{τ=0}^{t−1} e^{−θδ(t−τ)}.

For any t ≥ 0 we estimate Σ_{τ=0}^{t−1} e^{−θδ(t−τ)} ≤ Σ_{τ=1}^{∞} e^{−θδτ}. Since e^{−θδτ} is decreasing in τ, we can bound each summand by e^{−θδτ} ≤ ∫_{τ−1}^{τ} e^{−θδu} du to arrive at

    P[A(t) > A ⊗ E(t) + σ] ≤ κe^{−θσ} ∫_0^∞ e^{−θδu} du = κe^{−θσ}/(θδ).
Using the definition of envelope (2) we equate ε_E(σ) = κe^{−θσ}/(θδ). Without loss of generality we choose ε_E(0) = 1 and solve for κ = θδ where δ ≤ 1/θ such that κ ≤ 1. By insertion of κ into (5) we derive from E(t) = F(t) + δt that

    E(t) = (α(θ, t) + δ)t − ln(θδ)/θ

has overflow profile ε_E(σ) = e^{−θσ} and find the Legendre transform

    L_E(c) = sup_{t≥0} {(α(θ, t) + δ − c)t} − ln(θδ)/θ.   (7)

For a deterministic constant rate server with capacity c it holds that L_S(c) = 0 with deficit profile ε_S(σ) = 0 for σ ≥ 0. It follows from Lem. 1 that P[B ≥ L_E(c) + σ] ≤ e^{−θσ}, i.e., L_E(c) + σ is a backlog bound with exponentially decaying probability of error ε = e^{−θσ}. The parameters θ > 0 and δ ∈ (0, 1/θ] can be optimized to minimize backlog, respectively, delay bounds. Given ε we can solve ε = e^{−θσ} for σ = −ln ε/θ and derive the minimal backlog bound

    b = inf_{θ>0} {L_E(c) − ln ε/θ}.

A minimal delay bound follows as

    d = inf_{θ>0} {L_E(c)/c − ln ε/(θc)}.   (8)

Remark on Related Envelope Models: Using the Legendre transform (7) formalizes a backlog bound that can also be derived from the exponentially bounded burstiness model [39] P[A(t) > ρt + σ] ≤ κe^{−θσ} for t ≥ 0. By application of the union bound as above, a backlog bound for a constant rate server with capacity c is P[B ≥ σ] ≤ κe^{−θσ}/(θδ) where c = ρ + δ. Choosing κ = sup_{τ≥0} {M_A(θ, τ)e^{−θρτ}}, which is the optimal solution from Chernoff's theorem, the two backlog bounds can be converted into one another.

We note that a similar result can be obtained by approximation of (6) by the largest term P[A(t) > A ⊗ E(t) + σ] ≈ sup_{τ∈[0,t]} {P[A(τ, t) > E(t − τ) + σ]} that strictly provides only a lower bound.
Letting E(t) = F(t) from (5), where L_F(c) = sup_{t≥0} {(α(θ, t) − c)t} at κ = 1, yields that L_F(c) + σ is a backlog bound that is violated approximately with e^{−θσ}. In comparison, (7) trades the slack rate δ to achieve a true upper bound.

Fig. 1. Unified system model. A source generates symbols according to a defined random process. The symbols are encoded and transmitted as arrivals A(t) by a queueing network. The network departures D(t) are decoded and delivered to the sink.

III. SOURCE MODELS AND SOURCE CODERS

In this section we investigate the performance of a networked information source. An example of a relevant system is shown in Fig. 1, where the symbols of a source are encoded and transmitted by a network. Our aim is to combine information- and queueing-theoretic aspects to identify achievable operating points within the capacity-delay-error-space of the joint system, i.e., given a network with service curve S(t), e.g., in the most simple case S(t) = ct, can the system achieve a delay bound d with probability of error of at most ε? We specify the detailed system model below.

Consider a random variable X that can take any of the values, also called symbols, x_i with probability p_i. We also refer to X as the alphabet of the source and denote |X| its cardinality. Information theory defines that if the event X = x_i occurs, it provides information I(x_i) = −ld p_i bit, where ld denotes the logarithmus dualis, i.e., the logarithm with base 2. The expected information H_X := −Σ_i p_i ld p_i is defined as the entropy of X. We label successive symbols generated by a discrete source by n ∈ ℕ. The stochastic process X(n) has entropy rate ℋ_X = lim_{n→∞} H(X(1), X(2), ..., X(n))/n, i.e., ℋ_X is the entropy per symbol.
For stationary processes the entropy rate equals ℋ_X = lim_{n→∞} H(X(n) | X(n−1), X(n−2), ..., X(1)) [11]. We assign a number of bits l_i to each symbol x_i and define the function l to map x_i to l_i. Accordingly, L(n) = l(X(n)) defines a random process of bit lengths that are generated by the symbol process X(n). As L(n) is an increment process, we obtain the cumulative arrival process as A(n) = Σ_{ν=1}^{n} L(ν). We let A(0) = 0 by definition.

Shannon established the entropy of a source as a fundamental limit for lossless data compression. To this end, a code maps symbols x_i to unique codewords of length l_i, where the compression gain is due to assigning short codewords to frequent symbols. If no codeword is a prefix of any other codeword, the code is referred to as a prefix code, where each codeword can be decoded on its own. For an optimal code the expected codeword length l̄ = Σ_i p_i l_i is bounded in an interval of one bit width by the entropy as H_X ≤ l̄ < H_X + 1 [11].

In the next sections we investigate the non-equilibrium behavior of memoryless as well as Markov sources and show examples for finite and infinite alphabets. Secondly, we analyze the performance of well-known coders, such as the Huffman coder, Shannon coder, and Lempel-Ziv coder. Without loss of generality we restrict our investigation to binary codes.

A. Memoryless Sources

We start our investigation with the basic memoryless source where the symbols X(n) are independent and identically distributed (iid). From the memorylessness it follows that the entropy rate of the process equals the entropy of a single symbol, i.e., ℋ_X = H_X. We use the function l to assign a number of bits l_i to each symbol x_i. By definition L(n) = l(X(n)) has categorical distribution with MGF

    M_L(θ) = Σ_i p_i e^{θl_i}.   (9)

For the cumulative arrival process A(n) = Σ_{ν=1}^{n} L(ν) it follows that M_A(θ, n) = (M_L(θ))^n is multinomial.
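As a sketch of how the per-symbol MGF (9) combines with the envelope construction (5) for a memoryless source (the four-symbol alphabet and all parameter values are our own illustration, not from the paper):

```python
import math

# Memoryless source: MGF of per-symbol bit lengths, cf. (9), the resulting
# effective bandwidth, and the envelope F(t), cf. (5). Toy alphabet.
p = [0.5, 0.25, 0.125, 0.125]             # assumed symbol probabilities
l = [-math.log2(pi) for pi in p]          # ideal lengths l_i = -ld p_i

def mgf_L(theta):
    # M_L(theta) = sum_i p_i e^{theta l_i}
    return sum(pi * math.exp(theta * li) for pi, li in zip(p, l))

def alpha(theta):
    # effective bandwidth of the iid source; independent of t
    return math.log(mgf_L(theta)) / theta

def envelope_F(t, theta, kappa):
    # F(t) = t*alpha(theta) - ln(kappa)/theta
    return t * alpha(theta) - math.log(kappa) / theta

H = -sum(pi * math.log2(pi) for pi in p)  # entropy, here 1.75 bit

# alpha(theta) exceeds the entropy for theta > 0 and approaches it
# as theta -> 0, the equilibrium limit.
print(H, alpha(1e-6), alpha(1.0))
```

The printed values illustrate the non-equilibrium effect: for small θ the effective bandwidth is close to the entropy, while larger θ (tighter violation probabilities) pushes it toward the peak rate.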
Assuming a source that emits symbols at a constant rate of one symbol per timeslot, we substitute n = t. We relax this assumption in Sec. III-E. We equate l_i = −ld p_i such that M_A(θ, t) is the MGF of the number of information bits of all symbols generated up to time t. From (3) we derive

    α(θ) = (1/θ) ln Σ_i p_i^{1−θ/ln 2}   (10)

that does not depend on t due to the memorylessness of the source. An upper envelope on the number of information bits generated by the source up to time t that is violated at most with probability κ follows immediately from (5), where θ > 0 is a free parameter that can be optimized. The envelope provides a benchmark that can be interpreted as a statistical non-equilibrium bound on the number of bits generated by a (hypothetical) optimal coder that maps symbols x_i to codewords of lengths l_i = −ld p_i. The coder is optimal in the sense that its average codeword length equals the entropy of the source. In practice, this may not be achievable since −ld p_i typically is non-integer. For comparison, the Shannon code has l_i = ⌈−ld p_i⌉.

Geometrically Distributed Symbols: Assume an infinite alphabet with geometrically distributed symbols p_i = p(1 − p)^i for i ≥ 0. The entropy rate follows by insertion and application of the geometric sum as

    ℋ_X = (−p ld p − (1 − p) ld(1 − p))/p.

Similarly, α(θ) follows from (10) for 0 < θ < ln 2 as

    α(θ) = (1/θ) ln( p^{1−θ/ln 2} / (1 − (1 − p)^{1−θ/ln 2}) ).   (11)

We show respective envelopes from (5) for p = 0.25, 0.5, and 0.75 in Fig. 2. The corresponding entropy rates are ℋ_X ≈ 3.25, 2, and 1.08 bit, respectively. The violation probability of the information envelopes is κ = 10⁻⁶. We normalized the envelopes by the corresponding entropy, i.e., we plot F(t)/ℋ_X. Accordingly, the black line with slope one is the expected normalized information by time t. The non-equilibrium information envelopes show a significant deviation from the expected value.
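A quick numerical check of the closed forms above (the grid and tolerances are our own choices): the entropy rate formula reproduces the values ℋ_X ≈ 3.25, 2, and 1.08 bit quoted for Fig. 2, and the effective bandwidth (11) tends to ℋ_X as θ → 0.

```python
import math

# Geometric symbol distribution p_i = p(1-p)^i, i >= 0, cf. Sec. III-A.
def entropy_rate(p):
    # closed form: (-p ld p - (1-p) ld(1-p)) / p
    return (-p * math.log2(p) - (1 - p) * math.log2(1 - p)) / p

def alpha_geo(p, theta):
    # effective bandwidth (11), valid for 0 < theta < ln 2
    a = 1 - theta / math.log(2)
    return math.log(p ** a / (1 - (1 - p) ** a)) / theta

for p in (0.25, 0.5, 0.75):
    H = entropy_rate(p)
    # alpha exceeds H for theta > 0 and tends to H as theta -> 0
    print(p, H, alpha_geo(p, 1e-6), alpha_geo(p, 0.5))
```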
Fig. 2. Information envelopes of a memoryless source with geometrically distributed symbols with parameter p. The envelopes show that the actual information rate can be significantly larger than the entropy rate (slope one in the top figure, respectively, horizontal lines in the bottom figure). It converges, however, quickly if longer time intervals are considered.

The non-linearity of the envelopes arises after minimization of (5) over θ > 0 for any point in time t ≥ 0. To see the convergence in equilibrium we also depict the increments of the envelopes, which have the interpretation of an information rate, as well as the respective entropy rates ℋ_X. While the increments of the envelope deviate largely from the entropy on short time scales, they converge quickly if longer time intervals are considered.

B. Huffman Coding

Next, we consider envelopes for the number of bits generated by a Huffman coder and derive performance bounds. To construct a Huffman code, execute the following steps repeatedly until all symbols of the source have been processed:

• sort the symbols in decreasing order of probability,
• substitute the two least probable symbols by a new compound symbol, assign the sum of the two probabilities, and add one bit to the respective codewords to distinguish the individual symbols.

The Huffman prefix code achieves the minimal expected codeword length, hence H_X ≤ l̄ < H_X + 1. Regarding the individual codeword lengths l_i, however, no such simple upper bound exists. In fact, it is shown in [23] that individual codewords of a Huffman code can become as large as approximately 1.44 times the information of the corresponding symbol, i.e., l_i < −1.45 ld p_i. Compared to the information envelope where l_i = −ld p_i, e.g., Fig. 2, the actual codeword lengths of a Huffman coder may significantly increase the number of bits generated.

We characterize source coders by their capacity-delay-error-tradeoff, i.e., by (c, d, ε) where d = L_E(c)/c − ln ε/(θc) for any θ > 0 from (8). Assuming a memoryless source, we first obtain α(θ) for the coder from (9) as

    α(θ) = (1/θ) ln Σ_i p_i e^{θl_i}.

Since α(θ) does not depend on t, the condition c > α(θ) is sufficient to achieve finite L_E(c) from (7). We choose the free parameter δ ∈ (0, 1/θ] as δ = c − α(θ). It follows that L_E(c) = −ln(θδ)/θ and we obtain from (8) that

    d = inf_{θ>0} { −ln(θ(c − α(θ))ε) / (θc) }   (12)

is a delay bound with error probability ε. The (c, d, ε)-tradeoff expresses the capacity that is required to achieve a delay bound subject to a defined probability of error. The delays are due to the randomness that is introduced by variable codeword lengths. Depending on the amount of buffering in the network, the error can be a violation of the delay bound, or a loss of information due to buffer overflow. As an implementation option, the envelopes can be used to discard excess data, which can occur at most with probability ε, proactively by the coder itself, such that the delay bound is not violated. In the limit θ → 0, i.e., permitting arbitrarily large delays d → ∞, we recover that a capacity of c → l̄ bit per timeslot suffices to transmit the symbols of the source with arbitrarily small probability of error ε → 0.

Geometrically Distributed Symbols: As for Fig. 2, assume an infinite alphabet with geometrically distributed symbols p_i = p(1 − p)^i for i ≥ 0. We let p = 1/2 to obtain a dyadic source where −ld p_i = i + 1 is integer.
The respective Huffman code uses codewords of lengths l_i = i + 1 such that α(θ) for the Huffman coder is identical to (11) in this case. Given c and ε we compute d as described above and optimize θ ∈ (0, ln 2) numerically. The entropy rate of the source is ℋ_X = 2 and the expected codeword length is l̄ = 2. Fig. 3 depicts the (c, d, ε)-tradeoff of the Huffman coded source. For c > l̄ finite delay bounds can be computed, whereas the delay grows unbounded for c → l̄. Also, Fig. 3 shows the logarithmic growth of d for decaying ε that is characteristic of the approach.

C. Shannon Coding

Shannon coding works as follows. Assume all symbols x_i are ordered in decreasing order of their probabilities, i.e., p_i ≥ p_{i+1}. Denote F_i = Σ_{j<i} p_j …

… for c > α(θ, 1)/s a delay bound follows as

    d = inf_{θ>0} { α(θ, 1)(s − 1)/(cs) − ln(θ(c − α(θ, 1)/s)ε)/(θc) }.

Fig. 5 depicts the performance of the Lempel-Ziv encoder for different parameters s, see Tab. I. The maximum pointer length is limited to 8s bit. The Elias-delta coding of the pointer becomes more efficient with increasing s such that the window size w that can be addressed increases significantly. Accordingly, the probability p_hit that a random sequence of s symbols is found in w increases. Note that since the algorithm operates on sequences of s symbols, the unit of w is s symbols, too. Finally, the normalized average codeword length l̄/s shows the achievable compression gain. For comparison, the entropy rate of the source is ℋ_X ≈ 4.98 and the average codeword length that is achieved by the Huffman coder is l̄ ≈ 5.03, requiring, however, a priori knowledge of the symbol distribution. Moreover, Fig. 5 shows the (c, d, ε)-tradeoff of the Lempel-Ziv coder for different s compared to the Huffman coder. The capacity requirements of the Lempel-Ziv coder improve with increasing s, respectively, increasing window size, and approach the entropy eventually.
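For the dyadic Huffman example above (p = 1/2, codeword lengths l_i = i + 1, l̄ = 2), the delay bound (12) can be evaluated by a simple grid search over θ ∈ (0, ln 2); the capacity and error values below are our own illustrative choices, not points taken from Fig. 3.

```python
import math

# Delay bound (12) for a Huffman coded dyadic geometric source:
# d = inf_theta { -ln(theta (c - alpha(theta)) eps) / (theta c) }, c > alpha(theta).
def alpha(theta):
    # effective bandwidth (11) with p = 1/2; requires 0 < theta < ln 2
    a = 1 - theta / math.log(2)
    return math.log(0.5 ** a / (1 - 0.5 ** a)) / theta

def delay_bound(c, eps, steps=2000):
    best = float('inf')
    for k in range(1, steps):
        theta = k * math.log(2) / steps       # grid over (0, ln 2)
        slack = c - alpha(theta)              # delta = c - alpha(theta)
        if slack <= 0 or theta * slack >= 1:  # need delta in (0, 1/theta]
            continue
        d = -math.log(theta * slack * eps) / (theta * c)
        best = min(best, d)
    return best

# Delay grows without bound as c approaches the mean codeword length 2.
for c in (2.25, 2.5, 3.0, 4.0):
    print(c, delay_bound(c, eps=1e-6))
```

The printed delays shrink as c moves away from l̄ = 2 and blow up as c → l̄, the behavior described for Fig. 3.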
The encoding of sequences of s symbols introduces, however, an additional delay at the encoder. Beyond that, it makes the encoded sequence more bursty, i.e., the encoder emits a codeword for s symbols every s timeslots, which causes further delay.

TABLE I
PARAMETERS OF THE LEMPEL-ZIV CODER

    s   w          p_hit   l̄/s [bit]
    1   2^4 − 2    0.51    7.49
    2   2^10 − 2   0.75    7.08
    3   2^16 − 2   0.71    6.74
    4   2^24 − 2   0.85    6.71
    5   2^31 − 2   0.90    6.46
    6   2^38 − 2   0.91    6.34
    7   2^46 − 2   0.96    6.22
    8   2^54 − 2   0.98    6.16

Fig. 5. Lempel-Ziv coding with different window sizes w, compared to Huffman coding. The entropy rate of the source is ℋ_X ≈ 4.98. With increasing window size the Lempel-Ziv coder eventually approaches the entropy.

E. Variable Symbol Rate

So far, we assumed that sources generate symbols at a constant rate. Next, we show how sources with a variable symbol rate can be modeled using conditional MGFs and analyzed by unconditioning. Given a memoryless source, denote M_L(θ) the MGF of the increments (9). The conditional MGF of n arrivals becomes M_A(θ, n) = (M_L(θ))^n. Here, the count of arrivals N(t) is a random process with probability mass function p_N(n, t). The MGF M_A(θ, t) of the arrival process A(t) follows by unconditioning such that

    α(θ, t) = (1/(θt)) ln Σ_{n=0}^{∞} (M_L(θ))^n p_N(n, t).   (15)

Poisson Process: A Poisson process with mean rate λ has p_N(n, t) = e^{−λt}(λt)^n/n!. By insertion into (15) it follows that

    α_A(θ) = (λ/θ)(M_L(θ) − 1)

where we used that Σ_{n=0}^{∞} a^n/n! = e^a. Since α(θ) does not depend on t, delay bounds follow immediately from (12). We show an example for a source that generates eight different symbols with geometrically decreasing probability p_i = 1/2^i for 1 ≤ i ≤ 7 and p_8 = p_7 such that Σ_i p_i = 1.
Since the source is dyadic, the codeword lengths of the corresponding Huffman code (as well as the Shannon code) are l_i = −ld p_i bit. The entropy rate as well as the average codeword length are ℋ_X = l̄ ≈ 2 bit. The MGF of the increments M_L(θ) follows from (9). Fig. 6 shows the (c, d, ε)-tradeoff from (12) for the Huffman coded Poisson source. For comparison with this doubly random process, we show results for a Huffman coded constant rate source as well as a hypothetical Poisson source with constant length codewords of length ℋ_X bit. The average symbol rate of all sources is λ = 1. Clearly, the Huffman coded constant rate source achieves zero delay if c ≥ 7 since the codewords have at most seven bit length, whereas in case of the Poisson arrival process no such limit exists since an arbitrarily large number of symbols may arrive within a single timeslot. Finally, results for an uncoded Poisson source where each symbol is encoded using three bit are shown to depict the compression gain of the Huffman coder.

Fig. 6. Huffman coded Poisson source compared to an uncoded Poisson source, a hypothetical Poisson source with constant, entropy-sized codewords, and a Huffman coded constant rate source. The double randomness of the Huffman coded Poisson source causes noticeable delays. Compared to an uncoded Poisson source, the Huffman coder achieves, however, a significant improvement.

F. Markov Sources

In the following we relax the assumption of memoryless sources and consider discrete, stationary Markov sources, i.e., random processes X(n) with first order dependence where the symbol x_i that occurs in step n depends only on the previous symbol x_j in step n − 1.
The symbol x_i is also referred to as the state of the Markov chain, which can take any of the values i = 1, 2, ..., m. An example of a two-state Markov chain is shown in Fig. 7. We denote by p_i the stationary state distribution of the chain and by q_ij the transition probabilities from state i to state j. Define P to be the row vector (p_1, p_2, ..., p_m) and Q to be the state transition matrix. The stationary state distribution is the solution of P = PQ under the normalization condition P1 = 1, where 1 is a column vector of ones. Due to the first order dependence, the entropy rate of a Markov source becomes H_X = H(X(n) | X(n−1)) [11], and using the notation above H_X = −Σ_i Σ_j p_i q_ij ld q_ij.

Fig. 7. Example two-state Markov chain.

Next, we compute information envelopes for Markov sources. The MGF of a discrete Markov chain that produces a constant amount of data l_i if it is in state i is known from [8]. Let L be the diagonal matrix diag(e^{θl_1}, e^{θl_2}, ..., e^{θl_m}). As before, we substitute n = t assuming a source that emits symbols at a constant rate of one symbol per timeslot. The effective bandwidth of the Markov chain for t ≥ 1 is known as

α(θ, t) = (1/(θt)) ln(P (L(θ)Q)^{t−1} L(θ) 1).   (16)

Regarding (16), we can, however, not substitute l_i by the amount of information generated in state i, since the information provided by symbol x_i depends on the previous symbol x_j, i.e., each symbol has conditional information I(x_i | x_j) = −ld q_ji bit. Overall, for a Markov chain with m states we can distinguish m² distinct pairs of successive symbols. To solve the problem posed by the conditional information we extend the state space from m to m² states. We denote the states i|j, respectively, x_i|x_j, meaning that symbol x_i occurred in the current timeslot after symbol x_j occurred in the previous timeslot.
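A direct numerical evaluation of (16) can be sketched in a few lines (our own illustration; the per-state emissions l_i used here are placeholders, not values from the paper):

```python
import numpy as np

def effective_bandwidth(P, Q, lengths, theta, t):
    """Effective bandwidth (16): alpha(theta, t) = ln(P (L(theta)Q)^(t-1) L(theta) 1)/(theta t)."""
    L = np.diag(np.exp(theta * np.asarray(lengths, dtype=float)))
    M = np.linalg.matrix_power(L @ Q, t - 1) @ L
    return float(np.log(P @ M @ np.ones(len(lengths)))) / (theta * t)

# Two-state chain from Fig. 7 with Q = (4/5, 1/5; 1/3, 2/3) and hypothetical
# per-state emissions of 1 and 2 bit.
Q = np.array([[4/5, 1/5], [1/3, 2/3]])
P = np.array([5/8, 3/8])      # stationary distribution, P = P Q
lengths = [1.0, 2.0]
# For theta -> 0, alpha approaches the mean rate sum_i p_i l_i = 1.375 bit/timeslot.
print(round(effective_bandwidth(P, Q, lengths, 1e-6, 2000), 3))  # -> 1.375
print(round(effective_bandwidth(P, Q, lengths, 1.0, 10), 3))
```

For θ → 0 the effective bandwidth recovers the mean information rate, while larger θ weight unlikely, bursty sample paths more heavily.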
Due to this expansion, the information generated by a single symbol in any state of the chain is uniquely determined by the state itself, i.e., the information generated by symbol x_i in state i|j is I(x_i | x_j) = −ld q_ji. The transition probability from state j|k to state i|j is q_ji for any i, j, k, and zero otherwise. Fig. 8 shows the accordingly extended Markov model for the example from Fig. 7. Given the transition matrix of the extended Markov model, we compute the stationary state distribution and let l_{i|j} = −ld q_ji to compute α(θ, t) from (16). An information envelope follows from (5).

Two-state Markov Source: We show an example for a two-state Markov source as depicted in Fig. 7. The stationary state distribution follows from the balance equations as p_1 = q_21/(q_12 + q_21) and p_2 = q_12/(q_12 + q_21). As a measure of the burstiness of the source we use the average time to change state twice, T = 1/q_12 + 1/q_21. We choose p_1 = 5/8 and p_2 = 3/8 and use different burstiness parameters T ≈ 4.3, T = 8, and T = 16. The corresponding state transition matrices Q = (q_11, q_12; q_21, q_22) are Q = (5/8, 3/8; 5/8, 3/8), i.e., the source is memoryless, Q = (4/5, 1/5; 1/3, 2/3), and Q = (9/10, 1/10; 1/6, 5/6), respectively. The entropy of a single symbol follows as H(X) = −Σ_i p_i ld p_i ≈ 0.95 bit, and the entropy rate H_X = H(X(n) | X(n−1)) = −Σ_i Σ_j p_i q_ij ld q_ij becomes H_X ≈ 0.95, H_X ≈ 0.80, and H_X ≈ 0.54 bit, respectively.

Fig. 8. Extended Markov model for the example from Fig. 7, where the information generated by symbol x_i given the previous symbol was x_j is uniquely determined by the state x_i|x_j itself.

We use the extended model in Fig. 8 that has the stationary state distribution
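The state-space extension can be sketched compactly (our own illustration). For the transition matrix Q = (4/5, 1/5; 1/3, 2/3), the mean emission of the extended chain recovers the entropy rate H_X ≈ 0.80 bit:

```python
import numpy as np

# Two-state source with Q = (4/5, 1/5; 1/3, 2/3) from the example (T = 8).
Q = np.array([[4/5, 1/5], [1/3, 2/3]])
m = 2
# Extended chain: state (i|j), index i*m + j, means x_i followed x_j.
Qe = np.zeros((m * m, m * m))
lengths = np.zeros(m * m)
for i in range(m):
    for j in range(m):
        lengths[i * m + j] = -np.log2(Q[j, i])   # conditional information -ld q_ji
        for k in range(m):
            # transition (j|k) -> (i|j) with probability q_ji, zero otherwise
            Qe[j * m + k, i * m + j] = Q[j, i]

p = np.array([5/8, 3/8])                          # stationary distribution of Q
pe = np.array([p[j] * Q[j, i] for i in range(m) for j in range(m)])  # p_{i|j} = p_j q_ji

# The mean emission of the extended chain equals the entropy rate H(X(n)|X(n-1)).
print(round(float(pe @ lengths), 3))  # -> 0.796 bit
```

The stationary distribution of the extended chain, p_{i|j} = p_j q_ji, can be verified against the balance equations p_e = p_e Q_e.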
p_{1|1} = p_1 q_11, p_{1|2} = p_2 q_21, p_{2|1} = p_1 q_12, and p_{2|2} = p_2 q_22. We compute α(θ, t) from (16) and information envelopes F(t) from (5), which we minimize over θ > 0.

Fig. 9. Increments of the information envelopes of a two-state Markov source with different burstiness parameters T, where T ≈ 4.3 corresponds to a memoryless source. The upper figure zooms into the lower one. While increasing memory T reduces the entropy rate H_X, it causes a slower convergence of the information envelopes, i.e., the source can deviate significantly from its expected information rate with non-negligible probability.

Fig. 9 shows the increments of envelopes F(t) with violation probability κ = 10^−6. For small t ≲ 10 the envelopes are determined by the worst case, i.e., the maximal amount of information that can be emitted by the Markov source. For parameter T = 8 the occurrence of symbol x_2 after symbol x_1 has the largest information I(x_2 | x_1) = −ld q_12 ≈ 2.32 bit, followed by the occurrence of symbol x_1 after symbol x_2 with I(x_1 | x_2) = −ld q_21 ≈ 1.58 bit. Since direct transitions from state x_2|x_1 to state x_2|x_1 are not possible, the maximal information is achieved by a sequence of alternating x_1 and x_2, causing the zigzags between I(x_2 | x_1) and I(x_1 | x_2) for small t. The same argument applies for T = 16. In contrast, for T ≈ 4.3 the source is memoryless such that the information I(x_2 | x_1) = −ld q_12 equals I(x_2 | x_2) = −ld q_22 ≈ 1.42 bit, i.e., the maximum information is achieved by a sequence of all x_2 such that zigzags do not occur.
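The qualitative behavior in Fig. 9 can be reproduced with a small numerical sketch (ours, not the paper's code). Since (5) is not restated here, we use a generic Chernoff-type envelope F(t) = min_{θ>0} [α(θ, t) t − ln(κ)/θ], which may differ from the paper's exact constants:

```python
import numpy as np

def alpha(P, Q, lengths, theta, t):
    """Effective bandwidth (16) of a Markov chain emitting lengths[i] bits in state i."""
    L = np.diag(np.exp(theta * lengths))
    M = np.linalg.matrix_power(L @ Q, t - 1) @ L
    return float(np.log(P @ M @ np.ones(len(lengths)))) / (theta * t)

def envelope(P, Q, lengths, t, kappa=1e-6):
    """Chernoff-style envelope F(t) = min_theta [alpha(theta, t) t - ln(kappa)/theta];
    theta is capped so that the matrix powers cannot overflow."""
    upper = min(10.0, 600.0 / (max(lengths) * t))
    thetas = np.logspace(-3, np.log10(upper), 150)
    return min(alpha(P, Q, lengths, th, t) * t - np.log(kappa) / th for th in thetas)

# Extended 4-state model (Sec. III-F) of the two-state source with T = 16,
# i.e., Q = (9/10, 1/10; 1/6, 5/6); state (i|j) emits -ld q_ji bits.
q = np.array([[9/10, 1/10], [1/6, 5/6]])
Qe = np.zeros((4, 4))
lengths = np.zeros(4)
for i in range(2):
    for j in range(2):
        lengths[2 * i + j] = -np.log2(q[j, i])
        for k in range(2):
            Qe[2 * j + k, 2 * i + j] = q[j, i]
p = np.array([5/8, 3/8])                                   # stationary distribution of q
Pe = np.array([p[j] * q[j, i] for i in range(2) for j in range(2)])  # p_{i|j} = p_j q_ji

# Envelope increments F(t+1) - F(t) decay from the worst case toward the entropy rate.
incr = [envelope(Pe, Qe, lengths, t + 1) - envelope(Pe, Qe, lengths, t) for t in (1, 50, 500)]
print([round(float(x), 2) for x in incr])
```

As in Fig. 9, the increments start near the worst-case per-symbol information and decay slowly toward the entropy rate H_X ≈ 0.54 bit.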
Due to statistical effects, for t ≳ 10 the worst case occurs with probability less than κ = 10^−6 such that it does not dominate the envelopes, which approach the entropy rate for large t. While increasing memory T reduces the entropy rate, it causes, however, a significantly slower convergence of the envelope. This is due to unfavorable, high-information sequences of symbols that are not excluded from the envelope by the violation probability κ.

Fig. 10. Extended Markov model for the example from Fig. 7 where states correspond to the occurrence of supersymbols that are sequences of s symbols, here s = 2.

G. Coding Markov Sources

For our investigation of coded Markov sources we assign to each symbol x_i a codeword of length l_i without requiring further assumptions about the coder used. We compute α(θ, t) of a coded Markov source from (16). To compute a delay bound from (8) we require that L_E(c) from (7) is finite. Since α(θ, t) increases in t, it has to hold that c > α(θ, t) for all t ≥ 0. We choose the free parameter δ ∈ (0, 1/θ] as δ = c − sup_{t≥0}{α(θ, t)}. It follows that L_E(c) = −ln(θδ)/θ, and a delay bound with error probability ε is

d = inf_{θ>0} −ln(θ(c − sup_{t≥0}{α(θ, t)})ε) / (θc).   (17)

The compression gain of such a straightforward encoding of a Markov source is, however, limited by the entropy of a single symbol H(X), since the memory of the source is not utilized. To achieve further compression down to the entropy rate H_X, the coder has to be adapted. One approach is to encode sequences of s symbols instead of single symbols. In this case the average normalized codeword length is limited by H(X(1), X(2), ..., X(s))/s, which approaches H_X for s → ∞. Consider a Markov source with m distinct symbols, respectively, states. If we group s subsequent symbols we can distinguish m^s supersymbols.
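For a memoryless coded source, α(θ, t) does not depend on t, and (17) can be evaluated by a simple grid search over θ. The following sketch (our illustration) uses the dyadic eight-symbol source with Huffman codeword lengths l_i = −ld p_i:

```python
import math

# Dyadic eight-symbol source; Huffman codeword lengths l_i = -ld p_i.
p = [2.0**-i for i in range(1, 8)] + [2.0**-7]
l = [float(i) for i in range(1, 8)] + [7.0]

def alpha(theta):
    """Effective bandwidth of the coded constant-rate source; independent of t here."""
    return math.log(sum(pi * math.exp(theta * li) for pi, li in zip(p, l))) / theta

def delay_bound(c, eps=1e-6):
    """(17): d = inf_theta -ln(theta (c - sup_t alpha) eps)/(theta c), delta <= 1/theta."""
    best, theta = float("inf"), 1e-4
    while theta < 50.0:                        # crude grid search over theta
        delta = c - alpha(theta)
        if delta > 0.0:
            delta = min(delta, 1.0 / theta)    # enforce delta in (0, 1/theta]
            best = min(best, -math.log(theta * delta * eps) / (theta * c))
        theta *= 1.05
    return best

print(round(delay_bound(3.0), 1))  # the bound shrinks as the capacity c grows
print(round(delay_bound(5.0), 1))
```

For capacities approaching the entropy rate the bound grows without limit, reproducing the capacity-delay-error-tradeoff discussed above.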
To model such groups of symbols we extend the state space of the Markov chain to m^s states, accordingly. Fig. 10 shows the extended model for the Markov chain from Fig. 7 for s = 2. Here, states x_i, x_j, respectively, i, j denote the group of symbol x_j followed by symbol x_i. Hence, the state transition probabilities from state k, y to state i, j are q_kj q_ji for any i, j, k, y. We assign a unique codeword to each of the m^s groups of s symbols and use the codeword lengths to determine the diagonal matrix L. For a coder that encodes a group of s symbols every s timeslots, α(θ, t) follows from the extended Markov model for t ≥ 1 as α(θ, t) = ln(P (L(θ)Q)^{⌈t/s⌉−1} L(θ) 1)/(θt). To obtain the delay bound from (8) we choose the free parameter δ ∈ (0, 1/θ] as δ = c − sup_{t≥0}{α(θ, st)} and compute L_E(c) from (7). Grouping s symbols adds an additional delay of s − 1 timeslots.

Two-State Markov Source: As an example we employ the two-state Markov source as shown in Fig. 7 with transition matrix Q = (9/10, 1/10; 1/6, 5/6) and encode groups of s symbols using a Huffman coder. Tab. II shows the entropy and the average codeword length normalized by s for s = 1, ..., 8.

TABLE II
PARAMETERS OF THE HUFFMAN CODER

s | H(X(1), ..., X(s))/s [bit] | l/s [bit]
1 | 0.954 | 1.000
2 | 0.746 | 0.781
3 | 0.676 | 0.682
4 | 0.641 | 0.651
5 | 0.620 | 0.630
6 | 0.607 | 0.618
7 | 0.600 | 0.602
8 | 0.589 | 0.593

Fig. 11. Huffman coded Markov source. Due to the memory, the normalized entropy of groups of s symbols decreases with increasing s. The average codeword length of the Huffman code approaches the entropy rate with increasing s, resulting, however, in delays due to the variability of the codeword lengths and due to the grouping of symbols.
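The entropy column of Tab. II follows from the chain rule for a stationary Markov source, H(X(1), ..., X(s)) = H(X(1)) + (s − 1) H(X(n) | X(n−1)). A quick check (our sketch; the printed values agree with Tab. II up to rounding in the last digit):

```python
import math

# Two-state source with Q = (9/10, 1/10; 1/6, 5/6); stationary p = (5/8, 3/8).
Q = [[9/10, 1/10], [1/6, 5/6]]
p = [5/8, 3/8]

H1 = -sum(pi * math.log2(pi) for pi in p)              # entropy of a single symbol
Hrate = -sum(p[i] * Q[i][j] * math.log2(Q[i][j])
             for i in range(2) for j in range(2))      # entropy rate H(X(n)|X(n-1))

def normalized_entropy(s):
    """H(X(1), ..., X(s))/s = (H(X(1)) + (s - 1) H(X(n)|X(n-1)))/s by the chain rule."""
    return (H1 + (s - 1) * Hrate) / s

print([round(normalized_entropy(s), 3) for s in range(1, 9)])
# -> [0.954, 0.746, 0.676, 0.641, 0.620, 0.606, 0.597, 0.589]
```

The normalized entropy decreases monotonically in s toward the entropy rate H_X ≈ 0.537, which the Huffman codeword lengths in Tab. II then approach from above.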
Clearly, the entropy decreases with increasing s, and for s → ∞ we find the entropy rate H_X ≈ 0.537. As Tab. II confirms, the Huffman encoding of groups of symbols can approach the entropy rate quite well, however, at the cost of delays. In Fig. 11 we show a delay bound subject to an error probability of ε = 10^−6. The delays are due to the variability of the codeword lengths and due to the grouping of s symbols, which causes an additional delay of s − 1 timeslots. Moreover, the grouping makes the encoded sequence more bursty. Depending on c, different values of s are optimal, e.g., if c > 1 the delay is minimized for s = 1, whereas for smaller c larger s are advantageous. Certain parameters s, i.e., s = 3, ..., 6, marked by dotted lines, are outperformed for all c. This effect is caused by the individual Huffman codes for each s that are more or less efficient.

As an alternative to the grouping of symbols as described above, Markov sources can be encoded efficiently using individual codes for each of the states, i.e., the last symbol determines the code that is used to encode the next symbol. To model an encoder that chooses the code depending on the last symbol, we extend the Markov model as described in Sec. III-F, e.g., Fig. 8 for a two-state Markov chain. We denote by l_{i|j} the length of the codeword that is used for symbol x_i given the last symbol is x_j. Using the extended model, α(θ, t) follows from (16). A delay bound can be computed from (17).

Example for State Dependent Codes: Consider a three-state Markov source with transition matrix Q = (1/2, 1/4, 1/4; 1/4, 1/2, 1/4; 1/4, 1/4, 1/2). We construct an extended nine-state Markov model, as in Sec. III-F, where the code used in state i|j to encode symbol x_i is conditioned on the last symbol x_j.
Accordingly, if the last symbol was x_1, the optimal codeword lengths are l_{1|1} = 1 bit, l_{2|1} = 2 bit, and l_{3|1} = 2 bit, whereas l_{1|2} = 2 bit, l_{2|2} = 1 bit, and l_{3|2} = 2 bit apply if the last symbol was x_2, and l_{1|3} = 2 bit, l_{2|3} = 2 bit, and l_{3|3} = 1 bit if the last symbol was x_3.

IV. TRANSMISSION VIA A GILBERT-ELLIOTT CHANNEL

In this section, we show how our results on source coding from Sec. III can be composed with channel models, such as the Gilbert-Elliott channel. Key to this composition is the additivity established by Lem. 1. To this end, we require a service curve model of the channel. Service curves of wireless channels have been derived, e.g., in [2], [16], [22], [28], [36]. For ease of exposition, we resort to the impairment model from [22]. The model assumes a work-conserving channel, e.g., with peak rate R, that is impaired by a stationary random process I(τ, t). Given I(τ, t) has envelope E(t) with overflow profile ε_E(σ) (2), the channel has service curve S(t) = Rt − E(t) with deficit profile ε_S(σ) = ε_E(σ) (1) [22]. We assume a two-state Gilbert-Elliott channel that is either in the good state, i.e., data are transmitted error-free with rate R, or in the bad state, i.e., data cannot be decoded and are lost. The transition probabilities between the two states are first order dependent, i.e., the model is a Markov chain. Using the impairment model, the corresponding impairment process is a two-state Markov chain with rate zero in state 1 (good) and rate R in state 2 (bad) [16], i.e., it consumes no or all available resources, respectively. The effective bandwidth α(θ, t) of the impairment process is given by (16), and an envelope follows as E(t) = (α(θ, t) + δ)t − ln(θδ)/θ with ε_E(σ) = e^{−θσ}, where θ > 0 and δ ∈ (0, 1/θ], see Sec. II-B.
Putting all pieces together, we compute S(t) and obtain the delay bound (L_S(c) + σ_S)/c with error probability ε_S(σ_S) = e^{−θσ_S} for arrivals with constant rate c, where

L_S(c) = sup_{t≥0}{(c + α(θ, t) + δ − R)t} − ln(θδ)/θ.

As before, we let σ_S = −ln ε_S/θ and choose δ ∈ (0, 1/θ] as δ = R − c − sup_{t≥0}{α(θ, t)} such that L_S(c) = −ln(θδ)/θ, and a delay bound with error probability ε_S is

d = inf_{θ>0} −ln(θ(R − c − sup_{t≥0}{α(θ, t)})ε_S) / (θc).

A delay bound for variable rate arrivals from a source coder follows by a simple addition of the respective Legendre transforms, i.e., from Lem. 1, (L_E(c) + L_S(c) + σ_E + σ_S)/c is a delay bound for the composed system with error probability ε_E(σ_E) + ε_S(σ_S).

Fig. 12. Transmission of a Huffman coded source via a Gilbert-Elliott channel. The average codeword length of the source is 2 and the average rate of the channel is 4. The individual curves show the delay bound obtained for the Gilbert-Elliott channel given constant rate arrivals with rate c, respectively, obtained for the Huffman coded source given a channel with constant service rate c. The delay bound for the composite system is obtained from Lem. 1 by taking the minimum of the sum of the two curves, i.e., 53 timeslots.

Transmission of a Huffman Coded Source: We consider transmitting the source from Fig. 3 via a Gilbert-Elliott channel with peak rate R = 6 and two-state Markov impairment process with generator matrix Q = (7/8, 1/8; 1/4, 3/4). The state probabilities in equilibrium are P = (2/3, 1/3) and the average rate of the channel is 4. Fig. 12 shows the individual capacity-delay-error-tradeoffs of the source coder and the channel, each with probability of error ε_E = ε_S = 10^−6.
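The composition can be sketched end to end (our illustration; the grid searches over θ and c are crude and some constants are our reconstruction, so the printed number is indicative rather than a reproduction of Fig. 12):

```python
import math

# Source: Huffman coded dyadic eight-symbol source, mean codeword length ~2 bit.
p = [2.0**-i for i in range(1, 8)] + [2.0**-7]
l = [float(i) for i in range(1, 8)] + [7.0]

def sup_alpha_E(theta):
    # constant symbol rate: the effective bandwidth is independent of t
    return math.log(sum(pi * math.exp(theta * li) for pi, li in zip(p, l))) / theta

# Channel: Gilbert-Elliott with peak rate R = 6 and impairment chain
# Q = (7/8, 1/8; 1/4, 3/4) emitting 0 (good) or R (bad) bits per timeslot.
R = 6.0

def sup_alpha_S(theta):
    # sup_t alpha(theta, t) = ln(spectral radius of L(theta) Q)/theta for the
    # 2x2 impairment chain, since alpha(theta, t) increases in t
    e = math.exp(theta * R)
    tr = 7/8 + 3/4 * e
    det = (7/8 * 3/4 - 1/8 * 1/4) * e
    lam = (tr + math.sqrt(tr * tr - 4.0 * det)) / 2.0
    return math.log(lam) / theta

def delay(c, budget, eps=1e-6):
    """Individual delay bound -ln(theta delta eps)/(theta c) with delta = budget(theta)."""
    best, theta = float("inf"), 1e-4
    while theta < 20.0:
        delta = budget(theta)
        if delta > 0.0:
            delta = min(delta, 1.0 / theta)   # enforce delta in (0, 1/theta]
            best = min(best, -math.log(theta * delta * eps) / (theta * c))
        theta *= 1.05
    return best

def composite(c):
    """Lem. 1: the composite bound is the sum of the two individual curves at c."""
    return (delay(c, lambda th: c - sup_alpha_E(th))
            + delay(c, lambda th: R - c - sup_alpha_S(th)))

cs = [2.1 + 0.05 * k for k in range(36)]    # grid over c between 2 and 4
dmin = min(composite(c) for c in cs)
print(round(dmin, 1))
```

The source curve decreases and the channel curve increases in c, so their sum has an interior minimum, which is the composite delay bound of Lem. 1.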
Moreover, we show the sum of the two curves, which is a delay bound for the composite system consisting of the Huffman source coder and the Gilbert-Elliott channel for any c ≥ 0. While c has the interpretation of a constant arrival rate, respectively, constant service rate if we consider L_S(c) and L_E(c) in isolation, it does not have such a physical meaning for the composite system, where L_S(c) + L_E(c) can be minimized over c ≥ 0. The minimal delay bound of the composite system follows as 53 timeslots with probability of error ε = 2 · 10^−6.

V. RELATED WORK

Neglecting the variability and delay sensitivity of real sources, information theory has not become widely accepted in networking so far; see [14] for an excellent survey and a discussion of the gap between the respective theories. Recently, [3] proposed non-equilibrium information theory as a new paradigm and highlighted the potentialities, difficulties, and possible approaches. The authors envision a characterization of mobile ad-hoc networks by "throughput-delay-reliability triplets." In this paper we derived a feasible implementation and provided respective models for source coders and channels that complement the vision.

The variability of fading channels is considered already in [29], where a notion of outage capacity is defined. The outage capacity models the probability of errors that occur when the transmission rate is larger than the instantaneous capacity of the channel. A related concept, the delay-limited capacity [21], compensates fluctuations of the fading process using power control to achieve a constant transmission rate. Subsequent works use related concepts to implement power control subject to additional buffering constraints [5], [27]. Recently, the impact of finite blocklength codes on the variability of the channel has been investigated, e.g., in [4], [30], [31].
While the definition of outage capacity does not contain any queueing-theoretic aspects, it can be incorporated into a queueing analysis, as shown in [1] using the M|G|1 model. Markovian queues have also been parameterized to model fading channels in [6], [7]. While [6] models a block fading process by a variable rate server that is governed by an embedded Markov chain, [7] views fading outages as an impairment process that is modeled by high priority customers at an M|G|1 priority queue. The concept of an impairment process was also introduced to the stochastic network calculus to analyze outages of wireless channels [22]. Similar to the concept of effective bandwidth, [36] develops an effective capacity model to analyze delays due to fading. Multi-access channels are modeled in [34] as a processor sharing queueing system whose capacity is adapted according to the interference created by active stations.

Regarding traffic sources, networking research frequently assumes certain stochastic processes or employs traffic traces. In [20] it is shown how the effective bandwidth of traces, e.g., for MPEG video, can be computed, and in [35] empirical envelopes for variable bit rate traffic are derived. The models facilitate performance analysis of networks using respective queueing models. Information theoretic concepts themselves are, however, not used. Recent papers [10], [19] provide a framework that includes network elements that process and re-scale data into the analysis. In this work, we model the compression of data by source coders, which complements the approach. A calculus for so-called information-driven networks is introduced in [37], where the focus is on information instead of data traffic. To this end, the entropy function is employed to convert the data of a flow A(t) to its expected information H(A(t)).
By substitution of H(A(t)) for A(t), the framework of the network calculus is used to compute redefined metrics such as the information backlog and the information delay. Compared to [37], in this work we did not define envelopes for the expected information of a source. Instead, we derived envelopes for the actual amount of bits generated by memoryless as well as Markov sources and for different implementations of source coders.

VI. CONCLUSION

In this paper, we investigated a statistical envelope-based approach towards a non-equilibrium information theory. We applied Legendre transforms to characterize sources and systems by their achievable capacity-delay-error-tradeoff. The additivity of the model facilitates a separability of sources and systems that is comparable to the separation of entropy and channel capacity in information theory. In addition to the average behavior, statistical envelopes and their Legendre transforms consider non-negligible deviations that can cause significant network latencies. If arbitrary delays are permitted, our model recovers the entropy, respectively, the average codeword length in the limit. We provided information envelopes for memoryless as well as Markov sources, where we showed how the memory increases the variability. We derived the capacity-delay-error-tradeoff of Huffman, Shannon, and Lempel-Ziv coders as well as for Gilbert-Elliott channels. Our models are applicable in the frameworks of the theory of effective bandwidths and the stochastic network calculus, enabling joint information- and queueing-theoretic cross-layer research.

REFERENCES

[1] N. Ahmed and R. G. Baraniuk. Throughput measures for delay-constrained communications in fading channels. In Proc. Allerton Conference on Communication, Control and Computing, 2003.
[2] S. Akin and M. C. Gursoy. Effective capacity analysis of cognitive radio channels for quality of service provisioning. IEEE Trans.
Wireless Commun., 9(11):3354–3364, Nov. 2010.
[3] J. Andrews, S. Shakkottai, R. Heath, N. Jindal, M. Haenggi, R. Berry, D. Guo, M. Neely, S. Weber, S. Jafar, and A. Yener. Rethinking information theory for mobile ad hoc networks. IEEE Commun. Mag., 46(12):94–101, 2008.
[4] D. Baron, M. A. Khojastepour, and R. G. Baraniuk. How quickly can we approach channel capacity? In Proc. Asilomar Conference on Signals, Systems, and Computers, Nov. 2004.
[5] R. A. Berry and R. G. Gallager. Communication over fading channels with delay constraints. IEEE Trans. Inf. Theory, 48(5):1135–1149, 2002.
[6] I. Bettesh and S. Shamai. Queuing analysis of the single user fading channel. In Proc. IEEE Convention of the Electrical and Electronic Engineers in Israel, 2000.
[7] J. Burdin and R. Landry. Delay analysis of wireless Nakagami fading channels. In Proc. IEEE Globecom, 2008.
[8] C.-S. Chang. Performance Guarantees in Communication Networks. Springer-Verlag, 2000.
[9] F. Ciucu, A. Burchard, and J. Liebeherr. Scaling properties of statistical end-to-end bounds in the network calculus. IEEE/ACM Trans. Netw., 14(6):2300–2312, 2006.
[10] F. Ciucu, J. B. Schmitt, and H. Wang. On expressing networks with flow transformations in convolution-form. In Proc. IEEE INFOCOM, Apr. 2011.
[11] T. M. Cover and J. A. Thomas. Elements of Information Theory. Wiley-Interscience, second edition, 2006.
[12] R. L. Cruz. A calculus for network delay, part I and II: Network elements in isolation and network analysis. IEEE Trans. Inf. Theory, 37(1):114–141, 1991.
[13] R. L. Cruz. Quality of service management in Integrated Services networks. In Proc. Semi-Annual Research Review, Center of Wireless Communication, UCSD, June 1996.
[14] A. Ephremides and B. Hajek. Information theory and communications networks: An unconsummated union. IEEE Trans. Inf. Theory, 44(6):2416–2434, 1998.
[15] M. Fidler.
An end-to-end probabilistic network calculus with moment generating functions. In Proc. 14th IEEE International Workshop on Quality of Service (IWQoS), pages 261–270, 2006.
[16] M. Fidler. A network calculus approach to probabilistic quality of service analysis of fading channels. In Proc. IEEE Globecom, 2006.
[17] M. Fidler. Survey of deterministic and stochastic service curve models in the network calculus. IEEE Communications Surveys & Tutorials, 12(1):59–86, 2010.
[18] M. Fidler and S. Recker. Conjugate network calculus: A dual approach applying the Legendre transform. Computer Networks, 50(8):1026–1039, 2006.
[19] M. Fidler and J. B. Schmitt. On the way to a distributed systems calculus: An end-to-end network calculus with data scaling. In Proc. ACM SIGMETRICS/Performance, pages 287–298, 2006.
[20] R. J. Gibbens. Traffic characterisation and effective bandwidths for broadband network traces. In Stochastic Networks: Theory and Applications, number 4 in Royal Statistical Society Lecture Notes, pages 169–179. Oxford University Press, 1996.
[21] S. V. Hanly and D. N. Tse. Multiaccess fading channels—part II: Delay-limited capacities. IEEE Trans. Inf. Theory, 44(7):2816–2831, 1998.
[22] Y. Jiang and Y. Liu. Stochastic Network Calculus. Springer-Verlag, 2008.
[23] G. Katona and O. Nemetz. Huffman codes and self-information. IEEE Trans. Inf. Theory, 22(3):337–340, 1976.
[24] F. P. Kelly. Notes on effective bandwidths. In Stochastic Networks: Theory and Applications, number 4 in Royal Statistical Society Lecture Notes, pages 141–168. Oxford University Press, 1996.
[25] J.-Y. Le Boudec and P. Thiran. Network Calculus: A Theory of Deterministic Queuing Systems for the Internet. Springer-Verlag, 2001.
[26] C. Li, A. Burchard, and J. Liebeherr. A network calculus with effective bandwidth. IEEE/ACM Trans. Netw., 15(6):1442–1453, 2007.
[27] X. Li, X. Dong, and D. Wu.
Queue length aware power control for delay-constrained communication over fading channels. Wireless Communications and Mobile Computing. To appear.
[28] K. Mahmood, A. Rizk, and Y. Jiang. On the flow-level delay of a spatial multiplexing MIMO wireless channel. In Proc. IEEE ICC, June 2011.
[29] L. H. Ozarow, S. Shamai, and A. D. Wyner. Information theoretic considerations for cellular mobile radio. IEEE Trans. Veh. Technol., 43(2):359–378, 1994.
[30] Y. Polyanskiy, H. V. Poor, and S. Verdú. Dispersion of the Gilbert-Elliott channel. In Proc. IEEE ISIT, June 2009.
[31] Y. Polyanskiy, H. V. Poor, and S. Verdú. Channel coding rate in the finite blocklength regime. IEEE Trans. Inf. Theory, 56(5):2307–2359, May 2010.
[32] R. T. Rockafellar. Convex Analysis. Princeton University Press, 1972.
[33] D. Salomon. Variable-Length Codes for Data Compression. Springer-Verlag, 2007.
[34] E. Telatar and R. G. Gallager. Combining queueing theory with information theory for multiaccess. IEEE J. Sel. Areas Commun., 13(6):963–969, 1995.
[35] D. E. Wrege, E. W. Knightly, H. Zhang, and J. Liebeherr. Deterministic delay bounds for VBR video in packet-switching networks: Fundamental limits and practical trade-offs. IEEE/ACM Trans. Netw., 4(3):352–362, 1996.
[36] D. Wu and R. Negi. Effective capacity: A wireless link model for support of quality of service. IEEE Trans. Wireless Commun., 2(4):630–643, 2003.
[37] K. Wu, Y. Jiang, and G. Hu. A calculus for information-driven networks. In Proc. IEEE IWQoS, July 2009.
[38] A. Wyner and J. Ziv. The sliding-window Lempel-Ziv algorithm is asymptotically optimal. Proc. IEEE, 82(6):872–877, 1994.
[39] O. Yaron and M. Sidi. Performance and stability of communication networks via robust exponential bounds. IEEE/ACM Trans. Netw., 1(3):372–385, 1993.