An Implementable Scheme for Universal Lossy Compression of Discrete Markov Sources

We present a new lossy compressor for discrete sources. For coding a source sequence $x^n$, the encoder starts by assigning a certain cost to each reconstruction sequence. It then finds the reconstruction that minimizes this cost and describes it losslessly to the decoder via a universal lossless compressor. The cost of a sequence is given by a linear combination of its empirical probabilities of some order $k+1$ and its distortion relative to the source sequence. The linear structure of the cost in the empirical count matrix allows the encoder to employ a Viterbi-like algorithm for obtaining the minimizing reconstruction sequence simply. We identify a choice of coefficients for the linear combination in the cost function which ensures that the algorithm universally achieves the optimum rate-distortion performance of any Markov source in the limit of large $n$, provided $k$ is increased as $o(\log n)$.

Authors: Shirin Jalali, Andrea Montanari, Tsachy Weissman (Department of Electrical Engineering, Stanford University, Stanford, CA 94305; Weissman also with the Department of Electrical Engineering, Technion, Haifa 32000, Israel; {shjalali, montanar, tsachy}@stanford.edu)

I. INTRODUCTION

Let $\mathbf{X} = \{X_i : i \ge 1\}$ represent a discrete-valued stationary ergodic process with unknown statistics, and consider the problem of compressing $\mathbf{X}$ at rate $R$ such that the incurred distortion is minimized. Let $\mathcal{X}$ and $\hat{\mathcal{X}}$ denote the finite source and reconstruction alphabets respectively. The performance of a coding scheme is measured by its average expected distortion between source and reconstruction blocks, i.e.,

$$D = \mathrm{E}\, d_n(X^n, \hat{X}^n) \triangleq \frac{1}{n} \sum_{i=1}^{n} \mathrm{E}\, d(X_i, \hat{X}_i), \tag{1}$$

where $d : \mathcal{X} \times \hat{\mathcal{X}} \to \mathbb{R}^+$ is a single-letter distortion measure. For any $R \ge 0$, the minimum achievable distortion (cf. [4] for the exact definition of achievability) is characterized as [1], [2], [3]

$$D(\mathbf{X}, R) = \lim_{n \to \infty} \min_{p(\hat{X}^n \mid X^n):\, \frac{1}{n} I(X^n; \hat{X}^n) \le R} \mathrm{E}\, d_n(X^n, \hat{X}^n). \tag{2}$$

A sequence of codes at rate $R$ is called universal if for every stationary ergodic source $\mathbf{X}$ its asymptotic performance converges to $D(\mathbf{X}, R)$, i.e.,

$$\limsup_{n \to \infty} \mathrm{E}\, d_n(X^n, \hat{X}^n) \le D(\mathbf{X}, R). \tag{3}$$

For lossless compression, where the source is to be recovered without any error, there already exist well-known implementable universal schemes such as Lempel-Ziv coding [5] and arithmetic coding [6]. In contrast, for $D > 0$ there are no well-known practical schemes that universally achieve the rate-distortion curve. In recent years there has been progress towards designing universal lossy compressors, especially by tuning some of the existing universal lossless coders to work in the lossy case as well [7], [8], [9]. All of these algorithms are either provably suboptimal, or optimal but of exponential complexity. Another approach to lossy compression, well studied in the literature and even implemented in the JPEG 2000 image compression standard, is trellis-coded quantization, i.e., a trellis-structured code plus Viterbi encoding (cf. [10], [11] and references therein). This method is in general suboptimal for coding sources that have memory [11].
In [12], an algorithm for fixed-slope trellis source coding is proposed and shown to get arbitrarily close to the rate-distortion curve for continuous-valued stationary ergodic sources; the method is efficient in the low-rate region. In a recent work [13], a new implementable algorithm for lossy compression of discrete-valued stationary ergodic sources was proposed. Instead of fixing the rate (or distortion) and minimizing the distortion (or rate), the new algorithm fixes a Lagrangian coefficient $\alpha$ and minimizes $R + \alpha D$. This is done by assigning an energy $\mathcal{E}(y^n)$, representing $R + \alpha D$, to each possible reconstruction sequence and finding the sequence that minimizes it by simulated annealing. The algorithm starts by letting $y^n = x^n$; at each iteration it chooses an index $i \in \{1, \ldots, n\}$ uniformly at random and probabilistically changes $y_i$ to some $y \in \hat{\mathcal{X}}$, with a positive probability (which goes to zero as the number of iterations increases) that the resulting sequence has higher energy than the previous one. Allowing the energy to increase, especially in the initial steps, prevents the algorithm from being trapped in a local minimum. It was shown that using a universal lossless compressor to describe the reconstruction sequence resulting from this process to the decoder yields a scheme that is universal in the limit of many iterations and large block length. The drawback of that scheme is that, although its computational complexity per iteration is independent of the block length $n$ and linear in a parameter $k_n = o(\log n)$, there is no useful bound on the number of iterations required for convergence.

In this paper, inspired by the previous method, we propose yet another approach which universally achieves the optimum rate-distortion performance for any discrete Markov source. We start by assigning to each possible reconstruction sequence the same cost that was defined in [13]: a linear combination of two terms, the sequence's empirical conditional entropy and its distance to the source sequence to be coded. We show that there exists a proper linear approximation of the first term such that minimizing the linearized cost results in the same performance as minimizing the original cost. The advantage is that the modified cost can be minimized via the Viterbi algorithm, in lieu of the simulated annealing used for minimizing the original cost.

The organization of the paper is as follows. In Section II we set up the notation and define the count matrix and empirical conditional entropy of a sequence. Section III describes a new coding scheme for fixed-slope lossy compression which universally achieves the rate-distortion curve for any discrete Markov source, and Section IV describes how to compute the coefficients required by the algorithm outlined in the previous section. Section V explains how the Viterbi algorithm can be used for implementing the coding scheme of Section III. Section VI presents some simulation results, and finally Section VII concludes the paper with a discussion of some future directions. Proofs that are not presented in the paper will appear in the full version.

II. NOTATION AND REQUIRED DEFINITIONS

Let $\mathcal{X}$ and $\hat{\mathcal{X}}$ denote the source and reconstruction alphabets respectively.
Let the matrix $\mathbf{m}(y^n) \in \mathbb{R}^{|\hat{\mathcal{X}}| \times |\hat{\mathcal{X}}|^k}$ represent the $(k+1)$-th order empirical count of $y^n$, defined as

$$m_{\beta, \mathbf{b}}(y^n) = \frac{1}{n} \left| \left\{ 1 \le i \le n : y_{i-k}^{i-1} = \mathbf{b},\ y_i = \beta \right\} \right|. \tag{4}$$

In (4), and throughout, we assume a cyclic convention whereby $y_i \triangleq y_{n+i}$ for $i \le 0$. Let $H_k(y^n)$ denote the conditional empirical entropy of order $k$ induced by $y^n$, i.e.,

$$H_k(y^n) = H(Y_{k+1} \mid Y^k), \tag{5}$$

where $Y^{k+1}$ on the right-hand side of (5) is distributed according to

$$\mathrm{P}\left(Y^{k+1} = [\mathbf{b}, \beta]\right) = m_{\beta, \mathbf{b}}(y^n), \tag{6}$$

where $\beta \in \hat{\mathcal{X}}$, $\mathbf{b} \in \hat{\mathcal{X}}^k$, and $[\mathbf{b}, \beta]$ is the vector obtained by concatenating $\mathbf{b}$ and $\beta$. We use this notation throughout the paper, namely $\beta, \beta', \ldots \in \hat{\mathcal{X}}$ and $\mathbf{b}, \mathbf{b}', \ldots \in \hat{\mathcal{X}}^k$. The conditional empirical entropy in (5) can be expressed as a function of $\mathbf{m}(y^n)$ as follows (note that the $1/n$ normalization is already contained in (4)):

$$H_k(y^n) = H_k(\mathbf{m}(y^n)) := \sum_{\mathbf{b}} \left( \mathbf{1}^T \mathbf{m}_{\cdot, \mathbf{b}}(y^n) \right) H\!\left( \mathbf{m}_{\cdot, \mathbf{b}}(y^n) \right), \tag{7}$$

where $\mathbf{1}$ and $\mathbf{m}_{\cdot, \mathbf{b}}(y^n)$ denote the all-ones column vector of length $|\hat{\mathcal{X}}|$ and the column of $\mathbf{m}(y^n)$ corresponding to $\mathbf{b}$, respectively. For a vector $\mathbf{v} = (v_1, \ldots, v_\ell)^T$ with non-negative components, $H(\mathbf{v})$ denotes the entropy of the random variable whose probability mass function (pmf) is proportional to $\mathbf{v}$. Formally,

$$H(\mathbf{v}) = \begin{cases} \sum\limits_{i=1}^{\ell} \frac{v_i}{\|\mathbf{v}\|_1} \log \frac{\|\mathbf{v}\|_1}{v_i}, & \mathbf{v} \ne (0, \ldots, 0)^T, \\ 0, & \mathbf{v} = (0, \ldots, 0)^T. \end{cases} \tag{8}$$

III. LINEARIZED COST FUNCTION

Consider the following scheme for lossy source coding at a fixed slope $\alpha > 0$. For each source sequence $x^n$, let the reconstruction block $\hat{x}^n$ be

$$\hat{x}^n = \arg\min_{y^n \in \hat{\mathcal{X}}^n} \left[ H_k(y^n) + \alpha d_n(x^n, y^n) \right]. \tag{9}$$

The encoder, after computing $\hat{x}^n$, losslessly conveys it to the decoder using LZ compression. Let $k$ grow slowly enough with $n$ so that

$$\limsup_{n \to \infty} \max_{y^n} \left[ \frac{1}{n} \ell_{\mathrm{LZ}}(y^n) - H_k(y^n) \right] \le 0, \tag{10}$$

where $\ell_{\mathrm{LZ}}(y^n)$ denotes the length of the LZ representation of $y^n$. Note that Ziv's inequality guarantees that (10) holds whenever $k = k_n = o(\log n)$.

Theorem 1 ([13]): Let $\mathbf{X}$ be a stationary and ergodic source, let $R(\mathbf{X}, D)$ denote its rate-distortion function, and let $\hat{X}^n$ denote the reconstruction produced by the above scheme for coding $X^n$. Then

$$\mathrm{E}\left[ \frac{1}{n} \ell_{\mathrm{LZ}}(\hat{X}^n) + \alpha d_n(X^n, \hat{X}^n) \right] \xrightarrow{\,n \to \infty\,} \min_{D \ge 0} \left[ R(\mathbf{X}, D) + \alpha D \right]. \tag{11}$$

In other words, conveying the reconstruction sequence to the decoder via universal lossless compression (the LZ algorithm is chosen here for concreteness; other universal lossless methods can be used as well) achieves the optimum fixed-slope rate-distortion performance universally. As proposed in [13], the exhaustive search required by this scheme can be tackled through simulated annealing Gibbs sampling. Here, assuming the source is a discrete Markov source, we propose another method for finding a sequence achieving the minimum in (9), whose computational complexity is linear in $n$ for fixed $k$. Before describing the new scheme, consider the problems (P1) and (P2) below:

$$(\mathrm{P1}):\quad \min_{y^n} \left[ H_k(\mathbf{m}(y^n)) + \alpha d_n(x^n, y^n) \right], \tag{12}$$

and

$$(\mathrm{P2}):\quad \min_{y^n} \left[ \sum_{\beta} \sum_{\mathbf{b}} \lambda_{\beta, \mathbf{b}}\, m_{\beta, \mathbf{b}}(y^n) + \alpha d_n(x^n, y^n) \right]. \tag{13}$$

Comparing (P1) with (9) reveals that it is exactly the optimization required by the exhaustive-search coding scheme described above.
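To make the objects in (4)-(9) concrete, here is a minimal sketch, not from the paper, of the count matrix, the empirical conditional entropy, and the cost in (P1). It assumes an integer reconstruction alphabet $\{0, \ldots, A-1\}$ and Hamming distortion; all function names are ours, and later sketches reuse these helpers.

```python
import numpy as np
from itertools import product

def count_matrix(y, A, k):
    """(4): m[beta, b] = (1/n) #{ i : y_{i-k}^{i-1} = b, y_i = beta },
    with the cyclic convention y_i = y_{n+i} for i <= 0."""
    n = len(y)
    ctx_idx = {b: j for j, b in enumerate(product(range(A), repeat=k))}
    m = np.zeros((A, A ** k))
    for i in range(n):
        b = tuple(y[(i - k + t) % n] for t in range(k))  # cyclic context
        m[y[i], ctx_idx[b]] += 1.0 / n
    return m

def H_vec(v):
    """(8): entropy (in bits) of the pmf proportional to v; 0 for the zero vector."""
    s = v.sum()
    if s <= 0:
        return 0.0
    p = v[v > 0] / s
    return float(-(p * np.log2(p)).sum())

def H_k(m):
    """(7): H_k(m) = sum_b (1^T m_{.,b}) * H(m_{.,b})."""
    return sum(m[:, j].sum() * H_vec(m[:, j]) for j in range(m.shape[1]))

def cost(x, y, A, k, alpha):
    """The objective of (9)/(P1) under Hamming distortion."""
    d_n = float(np.mean(np.asarray(x) != np.asarray(y)))
    return H_k(count_matrix(y, A, k)) + alpha * d_n
```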
The question is whether it is possible to choose a set of coefficients $\{\lambda_{\beta, \mathbf{b}}\}$, $\beta \in \hat{\mathcal{X}}$, $\mathbf{b} \in \hat{\mathcal{X}}^k$, such that (P1) and (P2) have the same set of minimizers, or at least such that the set of minimizers of (P2) is a subset of the minimizers of (P1). If the answer is affirmative, then instead of solving (P1) one can solve (P2), which, as described in Section V, can be done simply via the Viterbi algorithm.

Let $\mathcal{S}_1$ and $\mathcal{S}_2$ denote the sets of minimizers of (P1) and (P2) respectively. Consider some $z^n \in \mathcal{S}_1$, and let $\mathbf{m}^*_n = \mathbf{m}(z^n)$. Since $H(\mathbf{m})$ is concave in $\mathbf{m}$ (as proved in Appendix B), for any empirical count matrix $\mathbf{m}$ we have

$$H(\mathbf{m}) \le H(\mathbf{m}^*_n) + \sum_{\beta, \mathbf{b}} \frac{\partial H(\mathbf{m})}{\partial m_{\beta, \mathbf{b}}} \bigg|_{\mathbf{m}^*_n} \left( m_{\beta, \mathbf{b}} - m^*_{\beta, \mathbf{b}} \right) \tag{14}$$

$$\triangleq \hat{H}(\mathbf{m}). \tag{15}$$

Now assume that in (P2) the coefficients are chosen as

$$\lambda_{\beta, \mathbf{b}} = \frac{\partial H(\mathbf{m})}{\partial m_{\beta, \mathbf{b}}} \bigg|_{\mathbf{m}^*_n}. \tag{16}$$

Lemma 1: If the coefficients are chosen according to (16), then (P1) and (P2) have the same minimum value. Moreover, if all the sequences in $\mathcal{S}_1$ have the same type, then $\mathcal{S}_1 = \mathcal{S}_2$.

Proof: For any $y^n \in \hat{\mathcal{X}}^n$,

$$H(\mathbf{m}(y^n)) + \alpha d_n(x^n, y^n) \le \hat{H}(\mathbf{m}(y^n)) + \alpha d_n(x^n, y^n). \tag{17}$$

Therefore,

$$\min_{y^n} \left[ H(\mathbf{m}(y^n)) + \alpha d_n(x^n, y^n) \right] \le \min_{y^n} \left[ \hat{H}(\mathbf{m}(y^n)) + \alpha d_n(x^n, y^n) \right] \tag{18}$$

$$\le \hat{H}(\mathbf{m}(z^n)) + \alpha d_n(x^n, z^n) \tag{19}$$

$$= \min_{y^n} \left[ H(\mathbf{m}(y^n)) + \alpha d_n(x^n, y^n) \right]. \tag{20}$$

This shows that (P1) and (P2) have the same minimum value. For any sequence $y^n$ with $\mathbf{m}(y^n) \ne \mathbf{m}^*_n$, strict concavity of $H(\mathbf{m})$ gives

$$\hat{H}(\mathbf{m}(y^n)) + \alpha d_n(x^n, y^n) > H(\mathbf{m}(y^n)) + \alpha d_n(x^n, y^n) \tag{21}$$

$$\ge \min_{y^n} \left[ H(\mathbf{m}(y^n)) + \alpha d_n(x^n, y^n) \right]. \tag{22}$$

As a result, every sequence in $\mathcal{S}_2$ must have empirical count matrix equal to $\mathbf{m}^*_n$. Since for these sequences $H(\mathbf{m}) = \hat{H}(\mathbf{m})$, we also conclude that $\mathcal{S}_2 \subset \mathcal{S}_1$. If there is a unique minimizing type $\mathbf{m}^*_n$, then $\mathcal{S}_1 = \mathcal{S}_2$.

This shows that if we knew the optimal type $\mathbf{m}^*_n$, we could compute the optimal coefficients via (16) and solve (P2) instead of (P1). The problem is that $\mathbf{m}^*_n$ is not known to the encoder, since knowledge of $\mathbf{m}^*_n$ requires solving (P1), which is the very problem we are trying to avoid. In the next section, we describe a method for approximating $\mathbf{m}^*_n$, and hence the coefficients $\{\lambda_{\beta, \mathbf{b}}\}$.
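For the entropy functional in (7)-(8), the derivative in (16) has a simple closed form: writing $H_k(\mathbf{m}) = \sum_{\beta, \mathbf{b}} m_{\beta, \mathbf{b}} \log \left( \|\mathbf{m}_{\cdot, \mathbf{b}}\|_1 / m_{\beta, \mathbf{b}} \right)$, one obtains $\partial H_k / \partial m_{\beta, \mathbf{b}} = \log \left( \|\mathbf{m}_{\cdot, \mathbf{b}}\|_1 / m_{\beta, \mathbf{b}} \right)$. The sketch below (ours, not from the paper) uses this closed form; the clipping constant guards against empty contexts and zero counts.

```python
import numpy as np

def lambda_coefficients(m_star, eps=1e-12):
    """(16): lambda[beta, b] = dH_k/dm_{beta,b} evaluated at m_star,
    i.e. log2(||m_{.,b}||_1 / m_{beta,b}), clipped to avoid log(0)."""
    col_sums = m_star.sum(axis=0, keepdims=True)   # ||m_{.,b}||_1 per context b
    return np.log2(np.maximum(col_sums, eps)) - np.log2(np.maximum(m_star, eps))
```

By (14)-(15), $H_k(\mathbf{m}) \le H_k(\mathbf{m}^*) + \langle \boldsymbol{\lambda}, \mathbf{m} - \mathbf{m}^* \rangle$ with equality at $\mathbf{m} = \mathbf{m}^*$, and, crucially, the linear surrogate $\langle \boldsymbol{\lambda}, \mathbf{m}(y^n) \rangle$ decomposes into a sum over positions $i$, which is what makes the Viterbi implementation of Section V possible (see (33)).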
IV. HOW TO CHOOSE THE COEFFICIENTS

For a given stationary ergodic source $\mathbf{X}$ and any given count matrix $\mathbf{m}$, define $D(\mathbf{m})$ to be the minimum average expected distortion among all processes $\mathbf{Y}$ that are jointly stationary and ergodic with $\mathbf{X}$ and whose $(k+1)$-th order stationary distribution is given by $\mathbf{m}$ (as discussed in Appendix A, the set of such processes is non-empty for any legitimate $\mathbf{m}$). $D(\mathbf{m})$ can equivalently be defined as

$$D(\mathbf{m}) = \lim_{k_1 \to \infty} \min_{p(x^{k_1}, y^{k_1}) \in \mathcal{M}(k_1)} \mathrm{E}_p\, d(x^{k_1}, y^{k_1}), \tag{23}$$

where $\mathcal{M}(k_1)$ is the set of all jointly stationary distributions $p(x^{k_1}, y^{k_1})$ of $(X^{k_1}, Y^{k_1})$ whose marginal with respect to $x$ coincides with the $k_1$-th order distribution of the process $\mathbf{X}$, and whose marginal with respect to $y$ coincides with $\mathbf{m}$, i.e., has the $(k+1)$-th order marginal distribution described by $\mathbf{m}$.

Lemma 2: If the source is $\ell$-th order Markov, then

$$D(\mathbf{m}) = \min_{p(x^{k_1}, y^{k_1}) \in \mathcal{M}(k_1)} \mathrm{E}_p\, d(x^{k_1}, y^{k_1}), \tag{24}$$

where $k_1 = \max(\ell, k+1)$.

Proof (outline): Using the technique described in Appendix A, for any legitimate joint distribution $p(x^{k_1}, y^{k_1})$ whose marginal with respect to $x$ coincides with the source distribution and whose marginal with respect to $y$ coincides with some given distribution $\mathbf{m}$, it is possible to construct a process which is jointly stationary and ergodic with our source process and has $p(x^{k_1}, y^{k_1})$ as its $(k+1)$-th order joint distribution. This gives an achievable distortion, i.e., an upper bound on $D(\mathbf{m})$. On the other hand, the limit in (23) approaches $D(\mathbf{m})$ from below. Combining the upper and lower bounds yields the desired equality.

Since by assumption the encoder does not know $\ell$, it cannot compute $\max(\ell, k+1)$. But letting $k_1 = k+1$, where $k = o(\log n)$, for any fixed order $\ell$, $k_1$ will eventually, for $n$ large enough, exceed $\ell$ and hence equal $\max(\ell, k+1)$. With this observation in mind, consider the following optimization problem:

$$\min\ H(\mathbf{m}) + \alpha D(\mathbf{m}) \quad \text{s.t.} \quad \mathbf{m} \in \mathcal{M}(k_1). \tag{25}$$

By Lemma 2, an equivalent representation of (25) is

$$\begin{aligned} \min\quad & H(\mathbf{m}) + \alpha \sum_{\beta, \beta', \mathbf{b}, \mathbf{b}'} d_{k_1}(\beta' \mathbf{b}', \beta \mathbf{b})\, p_x(\beta' \mathbf{b}')\, q_{y|x}(\beta \mathbf{b} \mid \beta' \mathbf{b}') \\ \text{s.t.}\quad & m_{\beta, \mathbf{b}} = \sum_{\beta', \mathbf{b}'} p_x(\beta' \mathbf{b}')\, q_{y|x}(\beta \mathbf{b} \mid \beta' \mathbf{b}'), && \forall\, \beta, \mathbf{b}, \\ & 0 \le q_{y|x}(\beta \mathbf{b} \mid \beta' \mathbf{b}') \le 1, && \forall\, \beta, \beta', \mathbf{b}, \mathbf{b}', \\ & \sum_{\beta, \mathbf{b}} q_{y|x}(\beta \mathbf{b} \mid \beta' \mathbf{b}') = 1, && \forall\, \beta', \mathbf{b}', \\ & \sum_{\beta, \beta'} p_x(\beta' \mathbf{b}')\, q_{y|x}(\beta \mathbf{b} \mid \beta' \mathbf{b}') = \sum_{\beta, \beta'} p_x(\mathbf{b}' \beta')\, q_{y|x}(\mathbf{b} \beta \mid \mathbf{b}' \beta'), && \forall\, \mathbf{b}, \mathbf{b}'. \end{aligned} \tag{26}$$

The last constraint in (26) is the stationarity condition defined in (A-1); it ensures that the joint distribution defined by $p_x(\beta' \mathbf{b}')\, q_{y|x}(\beta \mathbf{b} \mid \beta' \mathbf{b}')$ over $(x^{k+1}, y^{k+1})$ corresponds to the $(k+1)$-th order marginal distribution of some jointly stationary process $(\mathbf{X}, \mathbf{Y})$. Note that the variables in (26) are the conditional distributions $q_{y|x}(y^{k_1} \mid x^{k_1})$, but we are only interested in the $\mathbf{m}$ that they induce.

Lemma 3: If for each $n$ (P1) has a unique minimizing type $\mathbf{m}^*_n$, then

$$\| \mathbf{m}^*_n - \hat{\mathbf{m}}^*_n \|_{\mathrm{TV}} \to 0 \quad \text{a.s.}, \tag{27}$$

where $\hat{\mathbf{m}}^*_n$ is the solution of (26). Remark: in (26), the only dependence on $n$ is through $k_1$.

Therefore, if the encoder knew the distribution of the source, it could solve (26), find a good approximation of $\mathbf{m}^*_n$, and then use (16) to compute the coefficients required by (P2). The problem is that the encoder does not have this information; it only knows that the source is Markov, but not its order. To overcome this lack of information, a reasonable step is to use the empirical distribution of the source instead of the true unknown distribution in (26). For $a^{k_1} \in \mathcal{X}^{k_1}$, define the $k_1$-th order empirical distribution of the source as

$$\hat{p}^{(k_1)}_x(a^{k_1}) \triangleq \frac{\left| \left\{ i : (x_{i-k_1}, \ldots, x_{i-1}) = a^{k_1} \right\} \right|}{n}. \tag{28}$$

The following lemma shows that for $k_1 = o(\log n)$, $\hat{p}^{(k_1)}$ converges to the actual $k_1$-th order distribution of the source, and therefore can be considered a good approximation of it.

Lemma 4: For $k_1 = o(\log n)$ and any stationary ergodic Markov source,

$$\| \hat{p}^{(k_1)} - p^{(k_1)} \|_{\mathrm{TV}} \to 0 \quad \text{a.s.}, \tag{29}$$

where $p^{(k_1)}$ is the true $k_1$-th order distribution of the Markov source.
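A minimal sketch (ours) of the empirical distribution (28), reusing the cyclic indexing convention; here $A$ is the source alphabet size and the dictionary keys are the $k_1$-tuples $a^{k_1}$:

```python
from itertools import product

def empirical_dist(x, A, k1):
    """(28): p_hat[a^{k1}] = #{ i : (x_{i-k1}, ..., x_{i-1}) = a^{k1} } / n,
    computed cyclically. By Lemma 4 it converges a.s. to the true k1-th
    order distribution when k1 = o(log n)."""
    n = len(x)
    p_hat = {a: 0.0 for a in product(range(A), repeat=k1)}
    for i in range(n):
        a = tuple(x[(i - k1 + t) % n] for t in range(k1))
        p_hat[a] += 1.0 / n
    return p_hat
```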
Assume $x^n$ is generated by a discrete Markov source, let $\hat{p}^{(k_1)}_x$ be its empirical distribution as defined in (28), and consider the following optimization problem:

$$\begin{aligned} \min\quad & H(\mathbf{m}) + \alpha \sum_{\beta, \beta', \mathbf{b}, \mathbf{b}'} d_{k_1}(\beta' \mathbf{b}', \beta \mathbf{b})\, \hat{p}^{(k_1)}_x(\beta' \mathbf{b}')\, q_{y|x}(\beta \mathbf{b} \mid \beta' \mathbf{b}') \\ \text{s.t.}\quad & m_{\beta, \mathbf{b}} = \sum_{\beta', \mathbf{b}'} \hat{p}^{(k_1)}_x(\beta' \mathbf{b}')\, q_{y|x}(\beta \mathbf{b} \mid \beta' \mathbf{b}'), && \forall\, \beta, \mathbf{b}, \\ & 0 \le q_{y|x}(\beta \mathbf{b} \mid \beta' \mathbf{b}') \le 1, && \forall\, \beta, \beta', \mathbf{b}, \mathbf{b}', \\ & \sum_{\beta, \mathbf{b}} q_{y|x}(\beta \mathbf{b} \mid \beta' \mathbf{b}') = 1, && \forall\, \beta', \mathbf{b}', \\ & \sum_{\beta, \beta'} \hat{p}^{(k_1)}_x(\beta' \mathbf{b}')\, q_{y|x}(\beta \mathbf{b} \mid \beta' \mathbf{b}') = \sum_{\beta, \beta'} \hat{p}^{(k_1)}_x(\mathbf{b}' \beta')\, q_{y|x}(\mathbf{b} \beta \mid \mathbf{b}' \beta'), && \forall\, \mathbf{b}, \mathbf{b}', \end{aligned} \tag{30}$$

and let $\tilde{\mathbf{m}}^*_n$ denote the solution of this optimization problem.

Lemma 5: For $k_1 = k_1(n) = o(\log n)$, $\| \tilde{\mathbf{m}}^*_n - \hat{\mathbf{m}}^*_n \|_{\mathrm{TV}} \to 0$ a.s.

Proof (outline): The input parameters of the optimization problem (30) are $\{\hat{p}^{(k_1)}(a^{k_1})\}_{a^{k_1} \in \mathcal{X}^{k_1}}$; hence the solution of (30) is a function of these parameters. On the other hand, both the cost function and the constraints of (30) are continuous in the input parameters and in the optimization variables, so this solution is a continuous function of $\{\hat{p}^{(k_1)}(a^{k_1})\}_{a^{k_1} \in \mathcal{X}^{k_1}}$; combined with Lemma 4, the result follows.

Let $\{\lambda_{\beta, \mathbf{b}}(n)\}_{\beta, \mathbf{b}}$ denote the optimal values of the coefficients, defined via (16) at $\mathbf{m}^*_n$, and let $\{\hat{\lambda}_{\beta, \mathbf{b}}(n)\}_{\beta, \mathbf{b}}$ be the coefficients computed at $\tilde{\mathbf{m}}^*_n$. Then:

Lemma 6:

$$\max_{\beta, \mathbf{b}} \left| \lambda_{\beta, \mathbf{b}}(n) - \hat{\lambda}_{\beta, \mathbf{b}}(n) \right| \to 0 \quad \text{as } n \to \infty. \tag{31}$$

These results suggest that to compute the coefficients we can solve the optimization problem (30) (whose complexity can be controlled through the rate of increase of $k_1$), and then substitute the result in (16) to obtain approximate coefficients. After that, the problem (P2) defined by these coefficients can be solved using the Viterbi algorithm, in a way that will be detailed in the next section. The succession of lemmas above then allows us to prove the following theorem.

Theorem 2: Let $\mathbf{X}$ be a stationary and ergodic Markov source, and let $R(\mathbf{X}, D)$ denote its rate-distortion function. Let $\hat{X}^n$ be the reconstruction sequence obtained using the above scheme for coding $X^n$, choosing $k_1 = k+1$ where $k = o(\log n)$. Then

$$\mathrm{E}\left[ H_k(\mathbf{m}(\hat{X}^n)) + \alpha d_n(X^n, \hat{X}^n) \right] \xrightarrow{\,n \to \infty\,} \min_{D \ge 0} \left[ R(\mathbf{X}, D) + \alpha D \right]. \tag{32}$$

Remark: Theorem 2 implies the fixed-slope universality of the scheme which performs the lossless compression of the reconstruction by first describing its count matrix (costing a number of bits that is negligible for large $n$) and then doing conditional entropy coding.

V. VITERBI CODER

As proved in Section III, instead of solving (P1) one can solve (P2) for a proper choice of coefficients $\{\lambda_{\beta, \mathbf{b}}\}$. Note that

$$\sum_{\beta, \mathbf{b}} \lambda_{\beta, \mathbf{b}}\, m_{\beta, \mathbf{b}}(y^n) + \alpha d_n(x^n, y^n) = \frac{1}{n} \sum_{i=1}^{n} \left[ \lambda_{y_i, y_{i-k}^{i-1}} + \alpha d(x_i, y_i) \right]. \tag{33}$$

This alternative representation of the cost function suggests that, instead of using simulated annealing, we can find the minimizing sequence by the Viterbi algorithm. For $i = k+1, \ldots, n$, let $s_i = y_{i-k}^{i}$ be the state at time $i$, let $\mathcal{S}$ be the set of all $|\hat{\mathcal{X}}|^{k+1}$ possible states, and for $s = b^{k+1}$ define

$$w(s, i) := \lambda_{b_{k+1}, b^k} + \alpha d(x_i, b_{k+1}).$$

From our definition of the states, $s_i = g(s_{i-1}, y_i)$, where $g : \mathcal{S} \times \hat{\mathcal{X}} \to \mathcal{S}$. This representation leads to a trellis diagram corresponding to the evolution of the states $\{s_i\}_{i=k+1}^{n}$, in which each state has $|\hat{\mathcal{X}}|$ states leading to it and $|\hat{\mathcal{X}}|$ states branching from it.
Assume that weight $w(s_i, i)$ is assigned to the edge connecting states $s_{i-1}$ and $s_i$, i.e., the cost of each edge depends only on the tail state. It is clear that in this representation there is a one-to-one correspondence between sequences $y^n \in \hat{\mathcal{X}}^n$ and state sequences $\{s_i\}_{i=k+1}^{n}$, and minimizing (33) is equivalent to finding the path of minimum weight in the corresponding trellis diagram, i.e., the path $\{s_i\}_{i=k+1}^{n}$ that minimizes $\sum_{i=k+1}^{n} w(s_i, i)$. This minimization can readily be carried out by the Viterbi algorithm, which can be described as follows. For each state $s$, let $\mathcal{L}(s)$ be the set of $|\hat{\mathcal{X}}|$ states leading to it, and for any $i > k+1$ let

$$C(s, i) := \min_{s' \in \mathcal{L}(s)} \left[ w(s, i) + C(s', i-1) \right]. \tag{34}$$

For $i = k+1$ and $s = b^{k+1}$, let $C(s, k+1) := \lambda_{b_{k+1}, b^k} + \alpha d_{k+1}(x^{k+1}, b^{k+1})$. Using this procedure, each state $s$ at each time $j$ carries the minimum-weight path among all possible paths between $i = k+1$ and $i = j$ with $s_j = s$. After computing $\{C(s, i)\}_{s \in \mathcal{S},\, i \in \{k+1, \ldots, n\}}$, at time $i = n$ let

$$s^* = \arg\min_{s \in \mathcal{S}} C(s, n). \tag{35}$$

It is not hard to see that the path leading to $s^*$ is the minimum-weight path among all possible paths. Note that the computational complexity of this procedure is linear in $n$ but exponential in $k$, because the number of states grows exponentially with $k$.
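The following is a minimal sketch (ours, not the authors' code) of the Viterbi coder of this section, for an integer alphabet $\{0, \ldots, A-1\}$ and Hamming distortion. It reuses the hypothetical helpers `count_matrix` and `lambda_coefficients` from the earlier sketches, charges each starting state the full (unnormalized) distortion of its first $k+1$ positions in the spirit of the initialization of $C(s, k+1)$ above, and otherwise ignores the cyclic boundary, which is negligible for large $n$.

```python
from itertools import product

def viterbi_encode(x, lam, A, k, alpha):
    """Minimize n * (33) = sum_i [ lambda_{y_i, y_{i-k}^{i-1}} + alpha d(x_i, y_i) ]
    over y^n by dynamic programming on the states s_i = y_{i-k}^i."""
    n = len(x)
    states = list(product(range(A), repeat=k + 1))        # s = (y_{i-k}, ..., y_i)
    ctx_idx = {b: j for j, b in enumerate(product(range(A), repeat=k))}

    def w(s, i):                                          # edge weight w(s, i)
        return lam[s[-1], ctx_idx[s[:-1]]] + alpha * (x[i] != s[-1])

    # Initialization C(s, k+1): one lambda term plus the distortion of the
    # first k+1 positions (their boundary contexts are ignored).
    C = {s: lam[s[-1], ctx_idx[s[:-1]]]
            + alpha * sum(x[j] != s[j] for j in range(k + 1)) for s in states}
    back = []                                             # backpointers per step
    for i in range(k + 1, n):
        C_new, bp = {}, {}
        for s in states:
            # L(s): the A predecessor states s' = (y, s_1, ..., s_k)
            s_prev = min(((y,) + s[:-1] for y in range(A)), key=C.__getitem__)
            C_new[s] = C[s_prev] + w(s, i)                # recursion (34)
            bp[s] = s_prev
        C = C_new
        back.append(bp)

    s = min(states, key=C.__getitem__)                    # (35): best end state
    y = list(s)
    for bp in reversed(back):                             # trace the optimal path
        s = bp[s]
        y.insert(0, s[0])
    return y
```

In the simulations of the next section the coefficients are computed at the count matrix of the input itself, which in terms of these sketches would read `y = viterbi_encode(x, lambda_coefficients(count_matrix(x, A, k)), A, k, alpha)`.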
[Fig. 1: $(d_n(x^n, \hat{x}^n), H_k(\hat{x}^n))$ for the output points of the Viterbi encoder when the coefficients are computed at $\mathbf{m}(x^n)$, plotted against $R(D)$ and the Shannon lower bound for $\alpha \in \{2, 2.5, 3, 3.5, 4\}$. For each value of $\alpha$ the algorithm is run $L = 20$ times. Here $n = 5000$, $k = 7$, and the source is binary Markov with $q = 0.2$.]

VI. SIMULATION RESULTS

In this section, some preliminary simulation results for the Viterbi encoder described in the previous section are presented. In our simulations, instead of computing the coefficients $\{\lambda_{\beta, \mathbf{b}}\}$ from (16) at the optimal point $\mathbf{m}^*_n$, we compute them at the count matrix of the input sequence $x^n$, namely $\mathbf{m}(x^n)$. Fig. 1 shows $(d_n(x^n, y^n), H_k(\mathbf{m}(y^n)))$ for the output points of the described algorithm. The block length is $n = 5000$, $k = 7$, and the source is a first-order binary symmetric Markov source with transition probability $q = 0.2$. For each value of $\alpha$ the algorithm is applied to $L = 20$ different randomly generated sequences. The reason some points fall below the rate-distortion curve is that the actual number of bits required for describing $\hat{x}^n$ losslessly to the decoder is larger than $n H_k(\hat{x}^n)$, though it converges to it as $n$ grows. For example, for the simple scheme of separately describing the subsequences corresponding to different preceding contexts, this surplus is of order $2^k \log n / n$. The effect of this excess rate is not reflected in the figure, which explains why some points appear below the rate-distortion curve. It can also be observed that for larger values of $\alpha$ the output points are closer to the curve. The reason is that large values of $\alpha$ correspond to small values of distortion, and if the distortion is small then $\mathbf{m}(x^n)$ is a good approximation of $\mathbf{m}(y^n)$.

Finally, Fig. 2 compares the performance of the new Viterbi encoder with that of the MCMC encoder described in [13]. Here the source is again binary symmetric Markov with $q = 0.2$, and the other parameters are $k = 7$, $n = 5{,}000$, $\beta_t = n \log t$, and $r = 10n$, where $\beta_t$ determines the cooling schedule of the MCMC coder and $r$ is its number of iterations. Each point in the figure corresponds to the average performance over $L = 10$ random realizations of the source. It can be observed that even for this simplistic choice of the coefficients the performance of the two algorithms is comparable, while the Viterbi encoder in this example runs at least 40 times faster.

[Fig. 2: Comparison of the performance of the Viterbi encoder and the MCMC encoder proposed in [13], plotted against $R(D)$ and the Shannon lower bound for $\alpha \in \{2, 2.5, 3, 3.5, 4\}$.]

VII. CONCLUSIONS AND CURRENT DIRECTIONS

In this paper, a new method for universal fixed-slope lossy compression of discrete Markov sources was proposed; the new method achieves the rate-distortion curve for any discrete Markov source. Extending the algorithm to work on any stationary ergodic source is under current investigation. We believe that in fact the same algorithm works for the general class of stationary ergodic sources, and only the proof needs to be extended to cover this case. Another direction for future work is finding a simple method for approximating the optimal coefficients that would alleviate the need for solving the optimization problem (30).

APPENDIX A: STATIONARITY CONDITION

Assume that we are given an $|\hat{\mathcal{X}}| \times |\hat{\mathcal{X}}|^k$ matrix $\mathbf{m}$ with all elements positive and summing to one. The question is under what condition(s) this matrix can be the $(k+1)$-th order stationary distribution of a stationary process. For ease of notation, instead of the matrix $\mathbf{m}$, consider $p(x^{k+1})$ as a distribution defined on $\hat{\mathcal{X}}^{k+1}$. We show that a necessary and sufficient condition is the so-called stationarity condition

$$\sum_{\beta \in \hat{\mathcal{X}}} p(\beta x^k) = \sum_{\beta \in \hat{\mathcal{X}}} p(x^k \beta). \tag{A-1}$$

Necessity: The necessity of (A-1) is a direct consequence of the definition of stationarity. If $p(x^{k+1})$ is to represent the $(k+1)$-th order marginal distribution of a stationary process, then it should be consistent with the $k$-th order marginal distribution and satisfy (A-1).

Sufficiency: To prove sufficiency, we assume that (A-1) holds and build a stationary process with $(k+1)$-th order marginal distribution $p(x^{k+1})$. Consider a $k$-th order Markov chain with transition probabilities

$$q(x_{k+1} \mid x^k) = \frac{p(x^{k+1})}{p(x^k)}. \tag{A-2}$$

Note that $p(x^k)$ is well defined by (A-1). Moreover, again by (A-1), $p(x^{k+1})$ is the stationary distribution of the defined Markov chain, because

$$\sum_{x_1} q(x_{k+1} \mid x^k)\, p(x^k) = \sum_{x_1} p(x^{k+1}) = p(x_2^{k+1}). \tag{A-3}$$

Therefore we have found a stationary process with the desired marginal distribution. Finally, we show that if $\mathbf{m}$ is the count matrix of a sequence $y^n$, then there exists a stationary process whose marginal distribution coincides with $\mathbf{m}$. From what we just proved, we only need to show that (A-1) holds, i.e.,

$$\sum_{\beta} m_{\beta, \mathbf{b}} = \sum_{\beta} m_{b_k, [\beta, b_1, \ldots, b_{k-1}]}. \tag{A-4}$$

But this is true because both sides of (A-4) are equal to $|\{ i : y_{i+1}^{i+k} = \mathbf{b} \}| / n$.
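As a quick sanity check (ours, not from the paper), the following verifies (A-4) numerically for a count matrix produced by the hypothetical `count_matrix` helper from the earlier sketch; with the cyclic convention the two marginals agree exactly.

```python
from itertools import product

def stationarity_holds(m, A, k, tol=1e-9):
    """Check (A-4): sum_beta m[beta, b] == sum_beta m[b_k, (beta, b_1..b_{k-1})]
    for every context b; both sides equal |{ i : y_{i+1}^{i+k} = b }| / n."""
    ctxs = list(product(range(A), repeat=k))
    idx = {b: j for j, b in enumerate(ctxs)}
    for b in ctxs:
        lhs = sum(m[beta, idx[b]] for beta in range(A))
        rhs = sum(m[b[-1], idx[(beta,) + b[:-1]]] for beta in range(A))
        if abs(lhs - rhs) > tol:
            return False
    return True
```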
APPENDIX B: CONCAVITY OF $H(\mathbf{m})$

For simplicity assume that $\mathcal{X} = \hat{\mathcal{X}} = \{0, 1\}$. By definition,

$$H(\mathbf{m}) = \sum_{\mathbf{b} \in \{0,1\}^k} (m_{0,\mathbf{b}} + m_{1,\mathbf{b}})\, h\!\left( \frac{m_{0,\mathbf{b}}}{m_{0,\mathbf{b}} + m_{1,\mathbf{b}}} \right), \tag{B-1}$$

where $h(\alpha) = -\alpha \log \alpha - \bar{\alpha} \log \bar{\alpha}$ and $\bar{\alpha} = 1 - \alpha$. We need to show that for any $\theta \in [0, 1]$ and empirical count matrices $\mathbf{m}^{(1)}$ and $\mathbf{m}^{(2)}$,

$$\theta H(\mathbf{m}^{(1)}) + \bar{\theta} H(\mathbf{m}^{(2)}) \le H(\theta \mathbf{m}^{(1)} + \bar{\theta} \mathbf{m}^{(2)}). \tag{B-2}$$

From the concavity of $h$ it follows that

$$\begin{aligned} & \theta \left( m^{(1)}_{0,\mathbf{b}} + m^{(1)}_{1,\mathbf{b}} \right) h\!\left( \frac{m^{(1)}_{0,\mathbf{b}}}{m^{(1)}_{0,\mathbf{b}} + m^{(1)}_{1,\mathbf{b}}} \right) + \bar{\theta} \left( m^{(2)}_{0,\mathbf{b}} + m^{(2)}_{1,\mathbf{b}} \right) h\!\left( \frac{m^{(2)}_{0,\mathbf{b}}}{m^{(2)}_{0,\mathbf{b}} + m^{(2)}_{1,\mathbf{b}}} \right) \\ &\quad = \left( \theta (m^{(1)}_{0,\mathbf{b}} + m^{(1)}_{1,\mathbf{b}}) + \bar{\theta} (m^{(2)}_{0,\mathbf{b}} + m^{(2)}_{1,\mathbf{b}}) \right) \sum_{i \in \{1,2\}} \frac{\theta_i \left( m^{(i)}_{0,\mathbf{b}} + m^{(i)}_{1,\mathbf{b}} \right)}{\theta (m^{(1)}_{0,\mathbf{b}} + m^{(1)}_{1,\mathbf{b}}) + \bar{\theta} (m^{(2)}_{0,\mathbf{b}} + m^{(2)}_{1,\mathbf{b}})}\, h\!\left( \frac{m^{(i)}_{0,\mathbf{b}}}{m^{(i)}_{0,\mathbf{b}} + m^{(i)}_{1,\mathbf{b}}} \right) \\ &\quad \le \left( \theta (m^{(1)}_{0,\mathbf{b}} + m^{(1)}_{1,\mathbf{b}}) + \bar{\theta} (m^{(2)}_{0,\mathbf{b}} + m^{(2)}_{1,\mathbf{b}}) \right) h\!\left( \frac{\theta m^{(1)}_{0,\mathbf{b}} + \bar{\theta} m^{(2)}_{0,\mathbf{b}}}{\theta (m^{(1)}_{0,\mathbf{b}} + m^{(1)}_{1,\mathbf{b}}) + \bar{\theta} (m^{(2)}_{0,\mathbf{b}} + m^{(2)}_{1,\mathbf{b}})} \right), \end{aligned} \tag{B-3}$$

where $\theta_1 = 1 - \theta_2 = \theta$. Summing both sides of (B-3) over all $\mathbf{b} \in \hat{\mathcal{X}}^k$ yields the desired result.

REFERENCES

[1] C. E. Shannon, "Coding theorems for a discrete source with a fidelity criterion," IRE Nat. Conv. Rec., part 4, pp. 142-163, 1959.
[2] R. G. Gallager, Information Theory and Reliable Communication. New York, NY: John Wiley & Sons, 1968.
[3] T. Berger, Rate-Distortion Theory: A Mathematical Basis for Data Compression. Englewood Cliffs, NJ: Prentice-Hall, 1971.
[4] T. M. Cover and J. A. Thomas, Elements of Information Theory. New York: Wiley, 1991.
[5] J. Ziv and A. Lempel, "Compression of individual sequences via variable-rate coding," IEEE Trans. on Inform. Theory, vol. 24, no. 5, pp. 530-536, Sep. 1978.
[6] I. H. Witten, R. M. Neal, and J. G. Cleary, "Arithmetic coding for data compression," Commun. Assoc. Comp. Mach., vol. 30, no. 6, pp. 520-540, 1987.
[7] I. Kontoyiannis, "An implementable lossy version of the Lempel-Ziv algorithm, Part I: Optimality for memoryless sources," IEEE Trans. on Inform. Theory, vol. 45, pp. 2293-2305, Nov. 1999.
[8] E. Yang, Z. Zhang, and T. Berger, "Fixed-slope universal lossy data compression," IEEE Trans. on Inform. Theory, vol. 43, no. 5, pp. 1465-1476, Sep. 1997.
[9] E. H. Yang and J. Kieffer, "Simple universal lossy data compression schemes derived from the Lempel-Ziv algorithm," IEEE Trans. on Inform. Theory, vol. 42, no. 1, pp. 239-245, 1996.
[10] T. Berger and J. D. Gibson, "Lossy source coding," IEEE Trans. on Inform. Theory, vol. 44, no. 6, pp. 2693-2723, 1998.
[11] A. Gersho and R. M. Gray, Vector Quantization and Signal Compression. Springer, 1992.
[12] E. Yang and Z. Zhang, "Variable-rate trellis source encoding," IEEE Trans. on Inform. Theory, vol. 45, no. 2, pp. 586-608, 1999.
[13] S. Jalali and T. Weissman, "Lossy coding via Markov chain Monte Carlo," in Proc. IEEE International Symposium on Information Theory, Toronto, Canada, 2008.
