An Implementable Scheme for Universal Lossy Compression of Discrete Markov Sources

Reading time: 6 minutes

📝 Original Info

  • Title: An Implementable Scheme for Universal Lossy Compression of Discrete Markov Sources
  • ArXiv ID: 0901.2367
  • Date: 2009-01-15
  • Authors: Shirin Jalali, Andrea Montanari, Tsachy Weissman

📝 Abstract

We present a new lossy compressor for discrete sources. For coding a source sequence $x^n$, the encoder starts by assigning a certain cost to each reconstruction sequence. It then finds the reconstruction that minimizes this cost and describes it losslessly to the decoder via a universal lossless compressor. The cost of a sequence is given by a linear combination of its empirical probabilities of some order $k+1$ and its distortion relative to the source sequence. The linear structure of the cost in the empirical count matrix allows the encoder to employ a Viterbi-like algorithm for obtaining the minimizing reconstruction sequence simply. We identify a choice of coefficients for the linear combination in the cost function which ensures that the algorithm universally achieves the optimum rate-distortion performance of any Markov source in the limit of large $n$, provided $k$ is increased as $o(\log n)$.

📄 Full Content

Let $\mathbf{X} = \{X_i : i \ge 1\}$ represent a discrete-valued stationary ergodic process with unknown statistics, and consider the problem of compressing $\mathbf{X}$ at rate $R$ such that the incurred distortion is minimized. Let $\mathcal{X}$ and $\hat{\mathcal{X}}$ denote the finite source and reconstruction alphabets respectively. The performance of the described coding scheme is measured by its average expected distortion between source and reconstruction blocks, i.e.,

$$\mathbb{E}\,[d_n(X^n, \hat{X}^n)] = \frac{1}{n} \sum_{i=1}^{n} \mathbb{E}\,[d(X_i, \hat{X}_i)],$$

where $d : \mathcal{X} \times \hat{\mathcal{X}} \to \mathbb{R}^+$ is a single-letter distortion measure. For any $R \ge 0$, the minimum achievable distortion (cf. [4] for the exact definition of achievability) is characterized as [1], [2], [3]

$$D(\mathbf{X}, R) = \lim_{n \to \infty} \min_{p(\hat{X}^n | X^n) :\, I(X^n; \hat{X}^n) \le nR} \mathbb{E}\,[d_n(X^n, \hat{X}^n)].$$

A sequence of codes at rate $R$ is called universal if, for every stationary ergodic source $\mathbf{X}$, its asymptotic performance converges to $D(\mathbf{X}, R)$, i.e.,

$$\limsup_{n \to \infty} \mathbb{E}\,[d_n(X^n, \hat{X}^n)] \le D(\mathbf{X}, R).$$
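As a concrete illustration of the per-letter distortion above, here is a minimal sketch under Hamming distortion; the function names are ours, and the paper allows any single-letter measure $d$:

```python
def hamming(a, b):
    """Single-letter Hamming distortion d(a, b) = 1{a != b}."""
    return 0 if a == b else 1

def block_distortion(x, y, d=hamming):
    """Average per-letter distortion d_n(x^n, y^n) between two blocks."""
    assert len(x) == len(y)
    return sum(d(xi, yi) for xi, yi in zip(x, y)) / len(x)

print(block_distortion([0, 1, 1, 0], [0, 1, 0, 0]))  # one mismatch in four -> 0.25
```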

For lossless compression, where the source is to be recovered without any errors, well-known implementable universal schemes already exist, such as Lempel-Ziv coding [5] and arithmetic coding [6]. In contrast, for $D > 0$, there are no well-known practical schemes that universally achieve the rate-distortion curve. In recent years, there has been progress toward designing universal lossy compressors, especially in trying to tune some of the existing universal lossless coders to work in the lossy case as well [7], [8], [9]. All of these algorithms are either provably suboptimal, or optimal but with exponential complexity.

Another approach for lossy compression, which is very well studied in the literature and even implemented in the JPEG 2000 image compression standard, is trellis coded quantization, i.e., a trellis-structured code plus Viterbi encoding (cf. [10], [11] and references therein). This method is in general suboptimal for coding sources that have memory [11]. In [12], an algorithm for fixed-slope trellis source coding is proposed and shown to get arbitrarily close to the rate-distortion curve for continuous-valued stationary ergodic sources. The proposed method is efficient in the low-rate region.

In a recent work [13], a new implementable algorithm for lossy compression of discrete-valued stationary ergodic sources was proposed. Instead of fixing the rate (or distortion) and minimizing the distortion (or rate), the new algorithm fixes a Lagrangian coefficient $\alpha$ and minimizes $R + \alpha D$. This is done by assigning an energy $E(y^n)$ representing $R + \alpha D$ to each possible reconstruction sequence and finding the sequence that minimizes this cost by simulated annealing. The algorithm starts by letting $y^n = x^n$, and at each iteration chooses an index $i \in \{1, \ldots, n\}$ uniformly at random and probabilistically changes $y_i$ to some $y \in \hat{\mathcal{X}}$ such that there is a positive probability (which goes to zero as the number of iterations increases) that the resulting sequence has higher energy than the original sequence. Allowing the energy to increase, especially in the initial steps, prevents the algorithm from being trapped in a local minimum. It was shown that using a universal lossless compressor to describe the reconstruction sequence resulting from this process to the decoder yields a scheme which is universal in the limit of many iterations and large block length. The drawback of the proposed scheme is that, although its computational complexity per iteration is independent of the block length $n$ and linear in a parameter $k_n = o(\log n)$, there is no useful bound on the number of iterations required for convergence.

In this paper, inspired by the previous method, we propose yet another approach for lossy compression of discrete Markov sources which universally achieves the optimum rate-distortion performance for any discrete Markov source. We start by assigning the same cost that was defined for each possible reconstruction sequence in [13]. The cost of each sequence is a linear combination of two terms: its empirical conditional entropy and its distance to the source sequence to be coded.
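The energy function and the single-site annealing update of [13] can be sketched as follows. This is an illustrative reading, not the paper's exact algorithm: the cyclic context convention, the cooling schedule, and the full energy recomputation at each step (the paper's per-iteration cost is lower) are all simplifying assumptions.

```python
import math
import random
from collections import Counter

def empirical_cond_entropy(y, k):
    """Empirical conditional entropy H_k(y^n) in bits per symbol,
    using cyclic length-k contexts (a convenient convention; the
    paper's exact boundary handling may differ)."""
    n = len(y)
    counts, ctx_counts = Counter(), Counter()
    for i in range(n):
        ctx = tuple(y[(i - k + j) % n] for j in range(k))
        counts[(ctx, y[i])] += 1
        ctx_counts[ctx] += 1
    return -sum((c / n) * math.log2(c / ctx_counts[ctx])
                for (ctx, _), c in counts.items())

def energy(x, y, k, alpha):
    """Cost E(y^n) = H_k(y^n) + alpha * d_n(x^n, y^n), Hamming distortion."""
    dist = sum(xi != yi for xi, yi in zip(x, y)) / len(x)
    return empirical_cond_entropy(y, k) + alpha * dist

def anneal(x, alphabet, k, alpha, iters=2000, seed=0):
    """Single-site Metropolis-style search for a low-energy y^n,
    started at y^n = x^n; the cooling schedule is illustrative."""
    rng = random.Random(seed)
    y = list(x)
    e = energy(x, y, k, alpha)
    for t in range(1, iters + 1):
        i = rng.randrange(len(y))
        cand = y[:]
        cand[i] = rng.choice(alphabet)
        e_new = energy(x, cand, k, alpha)
        temp = 1.0 / math.log(t + 1)  # slowly vanishing chance of uphill moves
        if e_new <= e or rng.random() < math.exp((e - e_new) / temp):
            y, e = cand, e_new
    return y, e
```

Recomputing the full energy after each flip keeps the sketch short; the point of the present paper is precisely to avoid this iterative search altogether.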
We show that there exists a proper linear approximation of the first term such that minimizing the linearized cost yields the same performance as minimizing the original cost. The advantage is that the modified cost can be minimized via the Viterbi algorithm instead of the simulated annealing that was used for minimizing the original cost.
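A minimal sketch of the resulting Viterbi search, assuming the linear-cost coefficients are already given (computing them is the subject of Section IV; the map `lam` and the function names here are ours, with Hamming distortion for concreteness):

```python
from itertools import product

def viterbi_encode(x, alphabet, k, alpha, lam):
    """Minimize sum_i [ lam[(context_i, y_i)] + alpha * 1{x_i != y_i} ]
    over reconstruction sequences y^n by dynamic programming on the
    trellis whose states are length-k contexts."""
    states = list(product(alphabet, repeat=k))
    cost = {s: 0.0 for s in states}  # every starting context allowed for free
    back = []
    for xi in x:
        new_cost = {s: float("inf") for s in states}
        choice = {}
        for s in states:
            for b in alphabet:
                c = cost[s] + lam[(s, b)] + alpha * (xi != b)
                ns = s[1:] + (b,)  # context shifts by the emitted symbol
                if c < new_cost[ns]:
                    new_cost[ns], choice[ns] = c, (s, b)
        back.append(choice)
        cost = new_cost
    # backtrack from the cheapest final state
    s = min(cost, key=cost.get)
    total, y = cost[s], []
    for choice in reversed(back):
        s, b = choice[s]
        y.append(b)
    return y[::-1], total
```

Because the linearized cost decomposes into a per-position term depending only on the current context and emitted symbol, the search is a single forward pass over $|\hat{\mathcal{X}}|^k$ states, with no iterative refinement.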

The organization of the paper is as follows. Section II sets up the notation and defines the count matrix and empirical conditional entropy of a sequence. Section III describes a new coding scheme for fixed-slope lossy compression which universally achieves the rate-distortion curve for any discrete Markov source, and Section IV describes how to compute the coefficients required by the algorithm outlined in Section III. Section V explains how the Viterbi algorithm can be used to implement the coding scheme described in Section III. Section VI presents some simulation results, and finally, Section VII concludes the paper with a discussion of some future directions.

…(Full text truncated)…


Reference

This content is AI-processed based on ArXiv data.
