Source Coding with Mismatched Distortion Measures
Authors: Urs Niesen, Devavrat Shah, Gregory Wornell
Abstract—We consider the problem of lossy source coding with a mismatched distortion measure. That is, we investigate what distortion guarantees can be made with respect to distortion measure $\tilde\rho$, for a source code designed such that it achieves distortion less than $D$ with respect to distortion measure $\rho$. We find a single-letter characterization of this mismatch distortion and study properties of this quantity. These results give insight into the robustness of lossy source coding with respect to modeling errors in the distortion measure. They also provide guidelines on how to choose a good tractable approximation of an intractable distortion measure.

I. INTRODUCTION

A. Problem Formulation

Given a source alphabet $\mathcal{X}$, a reconstruction alphabet $\mathcal{Y}$, a source $\{X_i\}_{i\ge 1}$ with each $X_i$ taking values in $\mathcal{X}$, and two distortion measures $\rho_n, \tilde\rho_n : \mathcal{X}^n \times \mathcal{Y}^n \to \mathbb{R}_+$, assume we have access to an oracle that, when queried, produces a source code $f_n$ (i.e., a mapping $f_n : \mathcal{X}^n \to \mathcal{Y}^n$) such that
$$\mathbb{E}\,\rho_n(X^n, f_n(X^n)) \le D.$$
What guarantees can we make a priori (i.e., before querying the oracle) about $\mathbb{E}\,\tilde\rho_n(X^n, f_n(X^n))$?

As a second question, assume we have access to an oracle that, when queried, produces a source code $f_n$ such that¹
$$\frac{1}{n}\log|f_n(\mathcal{X}^n)| \le R, \qquad \mathbb{E}\,\rho_n(X^n, f_n(X^n)) \le D.$$
What guarantees can we make a priori about $\mathbb{E}\,\tilde\rho_n(X^n, f_n(X^n))$?

This problem has the following operational significance. Let a source code with expected distortion according to $\rho$ of at most $D$ be given. Assume that instead of using this source code with respect to $\rho$, we decide to use it with respect to $\tilde\rho$. Such a situation occurs if constructing a source code for $\tilde\rho$ is not feasible or if $\tilde\rho$ is not fully known when constructing the source code.
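The first question can be made concrete with a toy experiment (a sketch under assumptions not in the paper: a standard Gaussian source, a fixed 4-level uniform scalar quantizer designed with squared error in mind, and absolute error playing the role of the mismatched $\tilde\rho$):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)      # i.i.d. N(0,1) source samples

# 4-level uniform scalar quantizer with step 1, a stand-in for a code
# designed for squared-error distortion rho
y = np.clip(np.floor(x) + 0.5, -1.5, 1.5)   # levels -1.5, -0.5, 0.5, 1.5

mse = np.mean((x - y) ** 2)       # measured distortion under rho (design target)
mae = np.mean(np.abs(x - y))      # measured distortion under mismatched rho~
print(mse, mae)
```

The point is only that a guarantee stated for $\rho$ (here, the measured squared error) does not by itself bound the $\tilde\rho$ performance; the results below quantify the best a priori bound of this kind.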
We are then faced with a mismatch in the distortion measure, and the best distortion guarantee mentioned in the opening paragraph provides a measure for how severe this mismatch is.

As an example, for an image compression problem, $\tilde\rho$ is determined by the human visual system, and any tractable model $\rho$ of it can necessarily be only an approximation. To be more specific, assume $\rho$ is taken to be squared error. While it is well known that this is not a faithful model for the human visual system, it is nevertheless often used in practice due to its simplicity. Assume then that we choose one out of the many available source coding schemes for squared error distortion $\rho$. This source coding scheme will have some distortion guarantee for $\rho$ (the distortion measure it is designed for). The best performance guarantee mentioned in the opening paragraph then allows us to translate this distortion guarantee for $\rho$ into a distortion guarantee for $\tilde\rho$. If, in addition, we also fix the rate of the source coding scheme, we are able to obtain a tighter performance guarantee (the second question in the opening paragraph).

In other words, an answer to the above questions allows us to analyze the robustness of coding schemes to modeling errors (or mismatch in general) in the distortion measure.

This work was supported in part by NSF under Grant No. CCF-0515109, and by HP through the MIT/HP Alliance. The authors are with the Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, Cambridge, MA 02139, USA. Email: {uniesen,devavrat,gww}@mit.edu

¹$|f_n(\mathcal{X}^n)|$ denotes the cardinality of the range of the function $f_n$.

B. Related Work

The question of mismatched distortion measures in source coding has previously been considered in [1], [2], [3], [4], and [5].
In these works the mismatch is only with respect to the encoding part of the source code, whereas at least the decoder is matched to the proper distortion measure. This differs from the setup here, where the mismatch is with respect to both the encoder and the decoder. We comment on the precise differences in the following paragraphs.

In [1], a partial order among distortion measures is defined such that $\rho \ge \tilde\rho$ if for every source code (consisting of an encoder $g_n : \mathcal{X}^n \to \{1,\dots,\exp(nR)\}$ and a decoder $\phi_n : \{1,\dots,\exp(nR)\} \to \mathcal{Y}^n$) satisfying $\mathbb{E}\,\rho_n(X^n, \phi_n(g_n(X^n))) \le D$ there exists a second decoder $\tilde\phi_n$ satisfying $\mathbb{E}\,\tilde\rho_n(X^n, \tilde\phi_n(g_n(X^n))) \le D$. Thus, in this setup, the encoder $g_n$ is designed for a mismatched distortion measure $\rho$, whereas the decoder $\tilde\phi_n$ is matched to the distortion measure $\tilde\rho$.

In [2], the following problem is considered. Fix a codebook $\mathcal{C} \subset \mathcal{Y}^n$, and let $g_n : \mathcal{X}^n \to \mathcal{C}$ be an optimal encoder for this codebook $\mathcal{C}$ with respect to $\rho$. Find a codebook $\mathcal{C}$ and a decoder $\tilde\phi_n : \mathcal{C} \to \mathcal{Y}^n$ such that $\mathbb{E}\,\tilde\rho_n(X^n, \tilde\phi_n(g_n(X^n)))$ is minimized. Again, the mismatch is only with respect to the encoder $g_n$, whereas the decoder as well as the codebook $\mathcal{C}$ are matched to the distortion measure $\tilde\rho$.

In [3], the author considers the problem of finding an encoder $g_n : \mathcal{X}^n \to \{1,\dots,\exp(nR)\}$ such that there exists a decoder $\phi_n : \{1,\dots,\exp(nR)\} \to \mathcal{Y}^n$ satisfying $\mathbb{E}\,\rho_n(X^n, \phi_n(g_n(X^n))) \le D$ while maximizing $\inf_{\tilde\phi_n} \mathbb{E}\,\tilde\rho_n(X^n, \tilde\phi_n(g_n(X^n)))$. In other words, the goal is to find an encoder that guarantees distortion at most $D$ with respect to $\rho$, while making sure that this code has maximum possible distortion with respect to $\tilde\rho$. As in the previous cases, the mismatch is only with respect to the encoder, whereas the decoder $\tilde\phi_n$ is matched to the distortion measure $\tilde\rho$.
In [4, Problem 2.2.14] and [5], the problem of lossy source coding with respect to a class of distortion measures is considered: given a class of distortion measures $\Gamma$, we want to find a source code $f_n : \mathcal{X}^n \to \mathcal{Y}^n$ such that $\sup_{\rho\in\Gamma} \mathbb{E}\,\rho_n(X^n, f_n(X^n))$ is minimized. In other words, $f_n$ is now "matched" to all $\rho \in \Gamma$ simultaneously.

C. Modeling Perceptual Distortion Measures

In this section, we briefly review the typical structure of perceptual distortion measures. This will motivate the results presented in the main text. We focus here on distortion measures for image compression; the structure of perceptual distortion measures for speech, audio, or video compression is similar (see [6] for details on those distortion measures). The discussion here follows [7] and [8].

The typical structure of a perceptual distortion measure for image compression is depicted in Figure 1. Here $\mathbf{x}$ and $\mathbf{y}$ are the original and reconstructed image, respectively, represented, for example, as vectors of gray scale values.

Fig. 1. Typical structure of a perceptual distortion measure (front end, linear transform, masking, and error pooling stages). Adapted from [7].

The first block (termed front end) contains conversions from the image format to physical luminance observed by the human eye and other calibrations. The second block performs a linear transform of the two images, usually decomposing them into a number of spatial frequency bands with different orientations. In the next block, the coefficient of each band is weighted to account for masking effects. The resulting vectors of weighted coefficients of the original and reconstructed image are then subtracted. The last block takes this vector of weighted differences and pools it together into one real number. Usually this is done by computing the $\ell_p$ norm of the difference vector for some $p \ge 1$ or taking some power $r \ge 1$ of that norm.
Typical values of $p$ range from 2 to 4.

Formally, the source and reconstruction alphabets are $\mathcal{X} = \mathcal{Y} = \mathbb{R}^m$ or $\mathcal{X} = \mathcal{Y} = [0,1]^m$ for some finite $m$. In the following, we write $x, y$ for elements of general $\mathcal{X}, \mathcal{Y}$, and we write $\mathbf{x}, \mathbf{y}$ if we want to emphasize that $\mathcal{X} = \mathcal{Y} = \mathbb{R}^m$ or $\mathcal{X} = \mathcal{Y} = [0,1]^m$. This means that $\rho$ is of the form
$$\rho(\mathbf{x}, \mathbf{y}) = \big\| [v(x_1), \dots, v(x_m)] W_{\mathbf{x}} - [v(y_1), \dots, v(y_m)] W_{\mathbf{y}} \big\|_p^r,$$
and is sometimes simplified to
$$\rho(\mathbf{x}, \mathbf{y}) = \big\| \big( [v(x_1), \dots, v(x_m)] - [v(y_1), \dots, v(y_m)] \big) W_{\mathbf{x}} \big\|_p^r. \quad (1)$$
Here $v : \mathbb{R} \to \mathbb{R}$ accounts for the front end, and $W : \mathbb{R}^m \to \mathbb{R}^{m\times k}$ accounts for the linear transform and masking. Here (and in the following), we write for $a \in \mathbb{R}^k$ and $p \ge 1$
$$\|a\|_p \triangleq \begin{cases} \big(\sum_{i=1}^k |a_i|^p\big)^{1/p} & \text{if } p < \infty,\\ \max_{1\le i\le k} |a_i| & \text{if } p = \infty. \end{cases}$$

D. Outline of Results

We now discuss several questions that arise when trying to construct and use perceptual distortion measures for source coding. These questions motivate the results presented in this paper, and they are used as examples throughout.

• The choice of $r$ and $p$ for the error pooling seems to vary quite considerably across different perceptual distortion measures for image compression: [9] uses $p = 2$, $r = 1$; [10] uses $p = 2.4$, $r = 1$; [11] uses $p = 4$, $r = 1$; and [12], [13] use $p = 2$, $r = 2$. It is therefore of interest to know how distortion mismatch in these two parameters affects the performance of the source code. This is discussed in Example 2 (using Theorems 1, 2, 3, 4).

• Given a class of distortion measures $\Gamma$, [12] suggests the following approach to find the "best" approximation $\rho \in \Gamma$ to the distortion measure implemented by the human visual system: simulate the (information theoretically) optimal encoding scheme for all $\rho \in \Gamma$, and determine experimentally (i.e., by showing the original and distorted image to a human) the one yielding the smallest distortion.
This optimal distortion measure is then declared to be the best approximation. While this approach indeed yields the best approximation $\rho \in \Gamma$ when used with the optimal infinite-length source code, it is not clear a priori if this $\rho$ will also yield a good approximation when used with a suboptimal source code. Indeed, as we shall see in Example 2, there are situations in which the mismatch for the optimal and (even only slightly) suboptimal source codes are very different. In Example 3 (using Theorem 5), we provide conditions on $\Gamma$ and the source under which the $\rho$ found with this approach also yields a good approximation when used with good but not optimal source codes. These conditions hold for the model in [12] (with a few additional assumptions that are implicitly made there). Hence our results provide evidence that the optimal approximation $\rho \in \Gamma$ found in [12] will also be good for practical (and hence necessarily suboptimal) source codes.

• [13] proposes a vector quantizer design procedure for distortion measures of the form
$$\rho(\mathbf{x}, \mathbf{y}) = w_{\mathbf{x}} \|\mathbf{y} - \mathbf{x}\|_2^2, \quad (2)$$
where $w : \mathbb{R}^m \to \mathbb{R}$. Since this is considerably simpler than the standard model (1), the question arises of how to find the $w_{\mathbf{x}}$ such that the resulting $\rho$ in (2) is "close" to one of the more complicated form (1). Note that it is not immediately obvious what "close" should mean in this context. Indeed, there are several such notions that are reasonable. In Example 5, we show what properties such a notion should have. The problem posed by [13] discussed above is treated in detail in Example 1 (using Theorem 3) and Example 6 (using Corollaries 8 and 9).

• Essentially all models of perceptual distortion measures contain a number of parameters that are usually chosen to be in "close agreement" with the behavior of the human visual system. Again, it is not clear what "close agreement" should mean here.
In Example 7 (using Proposition 10), a simple such measure of closeness is proposed, providing a guideline for how to tune the parameters of a perceptual distortion model to be used for source coding.

E. Organization

The remainder of this paper is organized as follows. In Section II, we present our main results. Section III contains the corresponding proofs. Section IV contains concluding remarks.

II. MAIN RESULTS

In this section, we formally introduce the problem of source coding with distortion mismatch. To simplify the exposition, and since it represents the case of most practical interest, we assume in the following that $\mathcal{X} = \mathcal{Y} = \mathbb{R}^m$ for some finite $m$. Most of the results are, however, also valid if the alphabets are general Polish spaces (i.e., complete, separable, metric spaces).

We let $\mathcal{B}(\mathcal{X}\times\mathcal{Y})$ be the Borel sets of $\mathcal{X}\times\mathcal{Y}$. By $\mathcal{P}(\mathcal{X}\times\mathcal{Y})$, we denote the set of all probability measures on $(\mathcal{X}\times\mathcal{Y}, \mathcal{B}(\mathcal{X}\times\mathcal{Y}))$. For $Q \in \mathcal{P}(\mathcal{X}\times\mathcal{Y})$, $Q_X$ denotes the $\mathcal{X}$ marginal of $Q$. For a measurable function $g : \mathcal{X}\times\mathcal{Y} \to \mathbb{R}$, we denote by $\mathbb{E}_Q\, g(X,Y)$ or $\mathbb{E}_Q\, g$ the expectation of $g(X,Y)$ with respect to $Q$. For any $A \in \mathcal{B}(\mathcal{X}\times\mathcal{Y})$, we write $\mathbb{E}_Q(g; A)$ for $\mathbb{E}_Q\, g\,\mathbb{1}_A$. $I(Q)$ denotes the mutual information (in nats) between the random variables $(X,Y) \sim Q$.

Throughout this paper, we restrict attention to single-letter distortion measures, i.e., measurable functions $\rho : \mathcal{X}\times\mathcal{Y} \to \mathbb{R}_+$ with $\rho_n : \mathcal{X}^n\times\mathcal{Y}^n \to \mathbb{R}_+$ defined by
$$\rho_n(x^n, y^n) = \frac{1}{n}\sum_{i=1}^n \rho(x_i, y_i).$$
We also assume throughout that the source $\{X_i\}_{i\ge1}$ is i.i.d. with distribution $P \in \mathcal{P}(\mathcal{X})$. $R_\rho(D)$ and $D_\rho(R)$ denote the rate-distortion and the distortion-rate function for the source $\{X_i\}_{i\ge1}$ with respect to the single-letter distortion measure $\rho$, i.e.,
$$R_\rho(D) \triangleq \inf_{\substack{Q\in\mathcal{P}(\mathcal{X}\times\mathcal{Y}):\\ Q_X = P,\ \mathbb{E}_Q\rho \le D}} I(Q), \qquad D_\rho(R) \triangleq \inf_{\substack{Q\in\mathcal{P}(\mathcal{X}\times\mathcal{Y}):\\ Q_X = P,\ I(Q)\le R}} \mathbb{E}_Q\,\rho.$$
Our results are divided into several parts.
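As a brief numerical aside on these definitions, the sketch below evaluates one point of $R_\rho(D)$ with a basic Blahut–Arimoto iteration for a Bernoulli source with Hamming distortion, where the closed form $R(D) = H(p) - H(D)$ (in nats) is known; the helper names and parameter choices are ours, not the paper's:

```python
import numpy as np

def ba_point(p_x, rho, s, iters=5000):
    """One point of the rate-distortion curve for slope parameter s < 0.
    Returns (D, R) with R in nats, via a basic Blahut-Arimoto iteration."""
    ny = rho.shape[1]
    q = np.full(ny, 1.0 / ny)              # output marginal, initialized uniform
    A = np.exp(s * rho)                    # |X| x |Y| kernel exp(s * rho)
    for _ in range(iters):
        W = A * q
        W /= W.sum(axis=1, keepdims=True)  # W(y|x) proportional to q(y) exp(s rho(x,y))
        q = p_x @ W                        # update output marginal
    D = float(p_x @ (W * rho).sum(axis=1))
    R = float(np.sum(p_x[:, None] * W * np.log(W / q)))
    return D, R

def h2(u):
    """Binary entropy in nats."""
    return -u * np.log(u) - (1 - u) * np.log(1 - u)

p = 0.4
D, R = ba_point(np.array([1 - p, p]),
                np.array([[0.0, 1.0], [1.0, 0.0]]),  # Hamming distortion
                s=-2.0)
# Known closed form for a Bernoulli(p) source with Hamming distortion
assert abs(R - (h2(p) - h2(D))) < 1e-4
```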
In Section II-A, we provide single-letter characterizations of the mismatch distortion. In Section II-B, we investigate properties of these quantities. Section II-C contains information on how to evaluate the single-letter characterizations of the mismatch distortion. Section II-D considers the problem of finding a good representation of a distortion measure from a class of simpler ones.

A. Single-Letter Characterizations

In this section, we provide single-letter characterizations of the smallest distortion with respect to $\tilde\rho$ that can be guaranteed for any source code (either with or without a constraint on the rate $R$) designed for distortion $D_\rho$ with respect to $\rho$. Define
$$D_{\rho,\tilde\rho}(R, D_\rho) \triangleq \sup \mathbb{E}_Q\,\tilde\rho, \quad (3)$$
where the supremum is taken over all $Q \in \mathcal{P}(\mathcal{X}\times\mathcal{Y})$ such that $Q_X = P$, $\mathbb{E}_Q\rho \le D_\rho$, and $I(Q) \le R$. If the set over which this supremum is taken is empty, we define $D_{\rho,\tilde\rho}(R, D_\rho) \triangleq -\infty$.

Theorem 1. Let $\rho, \tilde\rho$ be distortion measures satisfying $\mathbb{E}_P\,\rho(X, y_0) < \infty$ for some $y_0 \in \mathcal{Y}$. For every $D_{\tilde\rho} < \infty$ such that
$$0 \le D_{\tilde\rho} < \lim_{\delta\downarrow 0} D_{\rho,\tilde\rho}(R - \delta, D_\rho - \delta),$$
there exists a sequence of source codes $\{f_n\}_{n\ge1}$ such that
$$\lim_{n\to\infty} \frac{1}{n}\log|f_n(\mathcal{X}^n)| \le R, \qquad \limsup_{n\to\infty} \mathbb{E}\,\rho_n(X^n, f_n(X^n)) \le D_\rho, \qquad \liminf_{n\to\infty} \mathbb{E}\,\tilde\rho_n(X^n, f_n(X^n)) \ge D_{\tilde\rho}.$$

Theorem 2. For any $n$ and any source code $f_n : \mathcal{X}^n \to \mathcal{Y}^n$ such that
$$\frac{1}{n}\log|f_n(\mathcal{X}^n)| = R, \qquad \mathbb{E}\,\rho_n(X^n, f_n(X^n)) \le D_\rho,$$
we have²
$$\mathbb{E}\,\tilde\rho_n(X^n, f_n(X^n)) \le D_{\rho,\tilde\rho}(R+, D_\rho).$$
If, moreover, $R > R_\rho(D_\rho)$, then
$$\mathbb{E}\,\tilde\rho_n(X^n, f_n(X^n)) \le D_{\rho,\tilde\rho}(R, D_\rho).$$

Theorems 1 and 2 allow us to make guarantees about the performance of a source code constructed with a mismatched distortion measure. Indeed, if $f_n : \mathcal{X}^n \to \mathcal{Y}^n$ is a source code of rate $R$ designed for a distortion measure $\rho$ and distortion level $D_\rho$, then by Theorem 2, $f_n$ is also a source code for any distortion measure $\tilde\rho$ and distortion level $D_{\rho,\tilde\rho}(R+, D_\rho)$.
Moreover, this is essentially the best guarantee one can make, since by Theorem 1 there exist source codes with the same blocklength $n$ and the same rate $R$ designed for distortion measure $\rho$ and distortion level $D_\rho$ that result in a distortion level of more than $D_{\rho,\tilde\rho}(R - \delta(n), D_\rho - \delta(n)) - \delta(n)$ for distortion measure $\tilde\rho$, with $\delta(n) \to 0$ as $n \to \infty$. This answers the second question posed in the introduction.

To answer the first question, we need to find the best distortion guarantee that is independent of the rate $R$ of the source code. From Theorems 1 and 2, this best distortion guarantee is given by $\sup_{R\ge 0} D_{\rho,\tilde\rho}(R, D_\rho)$.

²For a real valued function $g$, we write $g(x+) \triangleq \lim_{\delta\downarrow 0} g(x+\delta)$ and $g(x-) \triangleq \lim_{\delta\downarrow 0} g(x-\delta)$, assuming the limits exist.

Since $D_{\rho,\tilde\rho}(\cdot, D_\rho)$ is an increasing function, this is equal to $\lim_{R\to\infty} D_{\rho,\tilde\rho}(R, D_\rho)$. The next theorem considers this limit.

Theorem 3. If
(i) $\rho, \tilde\rho$ are continuous,
(ii) there exists $y_0 \in \mathcal{Y}$ such that $\mathbb{E}_P\,\rho(X, y_0) < \infty$,
(iii) $D_\rho(\infty) < D_\rho < \infty$,
then for any $\eta \ge 0$ the expectation $\mathbb{E}_P \sup_{y\in\mathcal{Y}} (\tilde\rho(X,y) - \eta\rho(X,y))$ is well defined and
$$D_{\rho,\tilde\rho}(\infty, D_\rho) = \min_{\eta\ge0} \Big( \eta D_\rho + \mathbb{E}_P \sup_{y\in\mathcal{Y}} \big( \tilde\rho(X,y) - \eta\rho(X,y) \big) \Big).$$
If, moreover,
(iv) $D_{\rho,\tilde\rho}(\infty, D_\rho) < \infty$,
then $\lim_{R\to\infty} D_{\rho,\tilde\rho}(R, D_\rho) = D_{\rho,\tilde\rho}(\infty, D_\rho)$.

Example 1. Let
$$\rho(\mathbf{x}, \mathbf{y}) = (\mathbf{y}-\mathbf{x})^T W_{\mathbf{x}} (\mathbf{y}-\mathbf{x}), \qquad \tilde\rho(\mathbf{x}, \mathbf{y}) = (\mathbf{y}-\mathbf{x})^T \widetilde W_{\mathbf{x}} (\mathbf{y}-\mathbf{x}),$$
where $W_{\mathbf{x}}$ and $\widetilde W_{\mathbf{x}}$ are positive definite for $P$ almost every $\mathbf{x}$. Let $P \in \mathcal{P}(\mathcal{X})$ be such that $\mathbb{E}_P\, \mathbf{X}^T W_{\mathbf{X}} \mathbf{X} < \infty$. With this, Assumptions (i) and (ii) of Theorem 3 are satisfied. Applying the theorem yields that for $D_\rho(\infty) < D_\rho < \infty$,
$$D_{\rho,\tilde\rho}(\infty, D_\rho) = \min_{\eta\ge0} \Big( \eta D_\rho + \mathbb{E}_P \sup_{\mathbf{y}\in\mathbb{R}^m} (\mathbf{y}-\mathbf{X})^T (\widetilde W_{\mathbf{X}} - \eta W_{\mathbf{X}}) (\mathbf{y}-\mathbf{X}) \Big), \quad (4)$$
and whenever this quantity is finite, then also $\lim_{R\to\infty} D_{\rho,\tilde\rho}(R, D_\rho) = D_{\rho,\tilde\rho}(\infty, D_\rho)$.
If $\widetilde W_{\mathbf{x}} - \eta W_{\mathbf{x}}$ in (4) is not negative semidefinite for some $\mathbf{x}$, then it has at least one strictly positive eigenvalue $\nu > 0$ with corresponding eigenvector $v$. Setting $\mathbf{y} = \mathbf{x} - a v$ yields
$$(\mathbf{y}-\mathbf{x})^T (\widetilde W_{\mathbf{x}} - \eta W_{\mathbf{x}}) (\mathbf{y}-\mathbf{x}) = a^2 \nu\, v^T v \to \infty$$
as $a \to \infty$. Hence the $\eta$ minimizing (4) is always such that $\widetilde W_{\mathbf{x}} - \eta W_{\mathbf{x}}$ is negative semidefinite for $P$ almost every $\mathbf{x}$. In this case $\sup_{\mathbf{y}\in\mathbb{R}^m} (\mathbf{y}-\mathbf{x})^T (\widetilde W_{\mathbf{x}} - \eta W_{\mathbf{x}}) (\mathbf{y}-\mathbf{x}) = 0$, and we obtain
$$\lim_{R\to\infty} D_{\rho,\tilde\rho}(R, D_\rho) = D_\rho \inf\big\{\eta \ge 0 : \widetilde W_{\mathbf{x}} - \eta W_{\mathbf{x}} \preceq 0 \ P\text{-a.e.}\big\}, \quad (5)$$
where $\widetilde W_{\mathbf{x}} - \eta W_{\mathbf{x}} \preceq 0$ means that the matrix on the left hand side is negative semidefinite. ♦

B. Properties of $D_{\rho,\tilde\rho}(R, D_\rho)$

The function $D_{\rho,\tilde\rho}(R, D_\rho)$ exhibits the following behavior:
$$D_{\rho,\tilde\rho}(R, D_\rho) \in \begin{cases} \{-\infty\} & \text{if } R < R_\rho(D_\rho),\\ \mathbb{R}_+ \cup \{\pm\infty\} & \text{if } R = R_\rho(D_\rho),\\ \mathbb{R}_+ \cup \{\infty\} & \text{if } R > R_\rho(D_\rho). \end{cases}$$
Moreover, a simple argument shows that $D_{\rho,\tilde\rho}(R, D_\rho)$ is concave and increasing in both its arguments, and continuous at all points $(R, D_\rho)$ such that $R > R_\rho(D_\rho)$. $D_{\rho,\tilde\rho}(R, D_\rho)$ is necessarily discontinuous at $(R_\rho(D_\rho), D_\rho)$, but could be either left- or right-continuous (as a function of either $R$ or $D_\rho$). This implies that the function either equals $\infty$ for all $(R, D_\rho)$ such that $R > R_\rho(D_\rho)$ or is finite on this whole range. The two types of possible behavior of $D_{\rho,\tilde\rho}(R, D_\rho)$ are depicted in Figure 2.

Fig. 2. Possible behaviors of $D_{\rho,\tilde\rho}(R, D_\rho)$.

The next two theorems describe the behavior of $D_{\rho,\tilde\rho}(R, D_\rho)$ in more detail. Theorem 4 provides conditions under which $D_{\rho,\tilde\rho}(R, D_\rho) = \infty$ for all $(R, D_\rho)$ such that $R > R_\rho(D_\rho)$. In these situations, we cannot make any guarantees about the performance of a source code of rate $R$ designed for distortion measure $\rho$ and distortion level $D_\rho$ when used for distortion measure $\tilde\rho$.
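The infimum in (5) from Example 1 above is a generalized eigenvalue problem and is easy to evaluate numerically. A minimal sketch (the helper name is ours, and we take constant matrices rather than $P$-a.e. families $W_{\mathbf{x}}, \widetilde W_{\mathbf{x}}$):

```python
import numpy as np

def mismatch_slope(W_tilde, W):
    """inf{eta >= 0 : W_tilde - eta*W negative semidefinite}, for W positive
    definite: the largest generalized eigenvalue of the pair (W_tilde, W)."""
    L = np.linalg.cholesky(W)
    Linv = np.linalg.inv(L)
    # Congruence transform: W_tilde - eta*W <= 0  iff  M - eta*I <= 0
    M = Linv @ W_tilde @ Linv.T
    return max(0.0, float(np.linalg.eigvalsh(M).max()))

# Diagonal sanity check: the slope is max_i wt_i / w_i
W = np.diag([1.0, 2.0])
W_tilde = np.diag([3.0, 2.0])
assert abs(mismatch_slope(W_tilde, W) - 3.0) < 1e-12
```

By (5), multiplying this slope by $D_\rho$ gives $\lim_{R\to\infty} D_{\rho,\tilde\rho}(R, D_\rho)$ in the constant-matrix case.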
Theorem 5 gives sufficient conditions such that $D_{\rho,\tilde\rho}(R, D_\rho) \ge 0$ for $(R, D_\rho)$ with $R = R_\rho(D_\rho)$, and conditions for $D_{\rho,\tilde\rho}(R, D_\rho)$ to be right-continuous in $R$ at those points.

Theorem 4. If
(i) $0 < R < \infty$,
(ii) $D_\rho > D_\rho(R)$,
(iii) there exists $y_0 \in \mathcal{Y}$ such that $\mathbb{E}_P\,\rho(X, y_0) \triangleq D_0 < \infty$,
(iv) there exist $\{A_k\}_{k\ge1} \subset \mathcal{B}(\mathcal{X})$, $\{y^*_k\}_{k\ge1} \subset \mathcal{Y}$ such that
$$\mathbb{E}_P\big(\rho(X, y^*_k); A_k\big) < \infty \ \text{ for all } k \ge 1,$$
$$P(A_k) \inf_{x\in A_k} \tilde\rho(x, y^*_k) \to \infty \ \text{ as } k \to \infty,$$
$$\sup_{x\in A_k} \rho(x, y^*_k)/\tilde\rho(x, y^*_k) \to 0 \ \text{ as } k \to \infty,$$
then $D_{\rho,\tilde\rho}(R, D_\rho) = \infty$.

Remark. The second and third parts of Assumption (iv) are satisfied, for example, if $\tilde\rho(x,y) \to \infty$ and $\rho(x,y)/\tilde\rho(x,y) \to 0$ as $\|y-x\|_2 \to \infty$. See also Example 2.

Example 2. Let $\rho(x,y) = d(y-x)^r$ and $\tilde\rho(x,y) = \tilde d(y-x)^{\tilde r}$ for arbitrary norms $d, \tilde d : \mathbb{R}^m \to \mathbb{R}_+$ and for $r, \tilde r \ge 1$. Let $P \in \mathcal{P}(\mathcal{X})$ be such that $\mathbb{E}_P\, d(X)^r < \infty$. With slight abuse of notation, we shall write $\rho(x-y)$ for $\rho(x,y)$, and similarly for $\tilde\rho$, in this example.

Case 1: $r < \tilde r$. We first show that the conditions of Theorem 4 are satisfied. Since all norms on a finite dimensional space are equivalent, there exist $a_1, a_2 > 0$ such that $a_1 d(z) \le \tilde d(z) \le a_2 d(z)$ for all $z \in \mathbb{R}^m$, and thus there exist $b_1, b_2 > 0$ such that
$$b_1 \rho(x-y)^{\tilde r/r} \le \tilde\rho(x-y) \le b_2 \rho(x-y)^{\tilde r/r}$$
for all $x \in \mathcal{X}$, $y \in \mathcal{Y}$. Hence, we have
$$\rho(x-y)/\tilde\rho(x-y) \le \tfrac{1}{b_1}\, \rho(x-y)^{(r-\tilde r)/r}$$
for all $x \in \mathcal{X}$, $y \in \mathcal{Y}$. Let $A \triangleq [-c, c]^m$, and choose $c$ such that $P(A) > 0$. Set $y^*_k \triangleq k\mathbb{1}$, where $\mathbb{1} = (1,\dots,1) \in \mathbb{R}^m$. With this,
$$\sup_{x\in A} \rho(x - y^*_k)/\tilde\rho(x - y^*_k) \le \sup_{x\in A} \tfrac{1}{b_1}\,\rho(x-y^*_k)^{(r-\tilde r)/r} = \max_{x\in A} \tfrac{1}{b_1}\, d(x - y^*_k)^{r-\tilde r} \to 0$$
as $k \to \infty$, satisfying Assumption (iv.3) of Theorem 4.
Moreover,
$$P(A) \inf_{x\in A} \tilde\rho(x - y^*_k) = P(A) \min_{x\in A} \tilde d(x - y^*_k)^{\tilde r} \to \infty$$
as $k \to \infty$, satisfying Assumption (iv.2) of Theorem 4. Finally,
$$\mathbb{E}_P\,\rho(X - y^*_k) \le \mathbb{E}_P \big( d(y^*_k) + d(X) \big)^r.$$
By Jensen's inequality,
$$\Big( \tfrac{1}{2} d(y^*_k) + \tfrac{1}{2} d(X) \Big)^r \le \tfrac{1}{2} d(y^*_k)^r + \tfrac{1}{2} d(X)^r,$$
and hence
$$\mathbb{E}_P\,\rho(X, y^*_k) \le 2^{r-1} \big( d(y^*_k)^r + \mathbb{E}_P\, d(X)^r \big) < \infty$$
for all $k \ge 0$. Therefore, with $y_0 = 0$, we have $\mathbb{E}_P\,\rho(X - y_0) < \infty$ and $\mathbb{E}_P(\rho(X - y^*_k); A) < \infty$, satisfying Assumptions (iii) and (iv.1) of Theorem 4. Thus applying the theorem with $A_k \triangleq A$ yields $D_{\rho,\tilde\rho}(R, D_\rho) = \infty$ for all $0 < R < \infty$ and $D_\rho > D_\rho(R)$.

Case 2: $r = \tilde r$. Clearly $\rho$ and $\tilde\rho$ are continuous, and $\mathbb{E}_P\,\rho(X) < \infty$. Hence Theorem 3 asserts that for $D_\rho(\infty) < D_\rho < \infty$,
$$D_{\rho,\tilde\rho}(\infty, D_\rho) = \min_{\eta\ge0} \Big( \eta D_\rho + \mathbb{E}_P \sup_{y\in\mathcal{Y}} \big( \tilde\rho(X,y) - \eta\rho(X,y) \big) \Big) = \min_{\eta\ge0} \Big( \eta D_\rho + \sup_{z\in\mathbb{R}^m} \big( \tilde\rho(z) - \eta\rho(z) \big) \Big), \quad (6)$$
and that this quantity is equal to $\lim_{R\to\infty} D_{\rho,\tilde\rho}(R, D_\rho)$ whenever it is finite. Set
$$v^* \in \arg\max_{v\in\mathbb{R}^m :\, d(v)=1} \tilde d(v).$$
Since $\tilde d$ is continuous and $\{v : d(v) = 1\}$ is compact, at least one such maximizer exists. It is easy to check that
$$\sup_{z\in\mathbb{R}^m} \big( \tilde\rho(z) - \eta\rho(z) \big) = \sup_{a\ge0} \big( a^{\tilde r}\, \tilde d(v^*)^{\tilde r} - \eta a^r \big) = \sup_{a\ge0} a^r \big( \tilde d(v^*)^r - \eta \big), \quad (7)$$
where we have used $r = \tilde r$. In other words, the maximizing $z$ is of the form $a v^*$ for some $a \ge 0$. If $\eta < \tilde d(v^*)^r$, then
$$\sup_{a\ge0} a^r \big( \tilde d(v^*)^r - \eta \big) = \lim_{a\to\infty} a^r \big( \tilde d(v^*)^r - \eta \big) = \infty.$$
On the other hand, if $\eta \ge \tilde d(v^*)^r$, then
$$\sup_{a\ge0} a^r \big( \tilde d(v^*)^r - \eta \big) = \lim_{a\to0} a^r \big( \tilde d(v^*)^r - \eta \big) = 0.$$
Therefore the minimizing $\eta \ge 0$ in (6) is equal to $\tilde d(v^*)^r$, and
$$\lim_{R\to\infty} D_{\rho,\tilde\rho}(R, D_\rho) = D_\rho\, \tilde d(v^*)^r.$$

Case 3: $r > \tilde r$.
Recall that by (7),
$$\sup_{z\in\mathbb{R}^m} \big( \tilde\rho(z) - \eta\rho(z) \big) = \sup_{a\ge0} \big( a^{\tilde r}\, \tilde d(v^*)^{\tilde r} - \eta a^r \big).$$
The optimal $a^* \ge 0$ maximizing this quantity is
$$a^* = \Big( \frac{\tilde r}{\eta r}\, \tilde d(v^*)^{\tilde r} \Big)^{1/(r-\tilde r)},$$
which by Theorem 3 implies that for $D_\rho(\infty) < D_\rho < \infty$,
$$D_{\rho,\tilde\rho}(\infty, D_\rho) = \min_{\eta\ge0} \Big( \eta D_\rho + \eta^{-\tilde r/(r-\tilde r)}\, b \Big) \triangleq \min_{\eta\ge0} g(\eta),$$
where
$$b \triangleq \tilde d(v^*)^{\tilde r r/(r-\tilde r)} \Big( \big(\tfrac{\tilde r}{r}\big)^{\tilde r/(r-\tilde r)} - \big(\tfrac{\tilde r}{r}\big)^{r/(r-\tilde r)} \Big) > 0.$$
The $\eta^*$ minimizing $g$ is
$$\eta^* = \Big( \frac{(r-\tilde r)\, D_\rho}{b\, \tilde r} \Big)^{(\tilde r - r)/r},$$
which finally yields
$$\lim_{R\to\infty} D_{\rho,\tilde\rho}(R, D_\rho) = D_\rho^{\tilde r/r}\, b^{(r-\tilde r)/r}\, (r-\tilde r)^{(\tilde r-r)/r}\, \tilde r^{-\tilde r/r}\, r.$$
For $d = \tilde d$, $r = 2$, $\tilde r = 1$, this reduces to
$$\lim_{R\to\infty} D_{\rho,\tilde\rho}(R, D_\rho) = \sqrt{D_\rho}. \ ♦$$

Theorem 4 characterizes the behavior of $D_{\rho,\tilde\rho}(R, D_\rho)$ for $(R, D_\rho)$ such that $R > R_\rho(D_\rho)$. The next theorem characterizes the behavior of $D_{\rho,\tilde\rho}(R, D_\rho)$ for $(R, D_\rho)$ such that $R = R_\rho(D_\rho)$.

Theorem 5. Let the distortion measure $\rho$ be continuous, and $D_\rho > 0$. If there exist compact sets $K_k \subset \mathcal{X}$, $M_k \subset \mathcal{Y}$ such that $P(K_k) \to 1$ as $k \to \infty$ and
$$\inf_{x\in K_k,\, y\in M_k^c} \rho(x, y) \to \infty \quad (8)$$
as $k \to \infty$, then $D_{\rho,\tilde\rho}(R_\rho(D_\rho), D_\rho) \ge 0$, i.e., the set over which we optimize in (3) is non-empty. If, in addition, $D_{\rho,\tilde\rho}(R_\rho(D_\rho) + r, D_\rho) < \infty$ for some $r > 0$, $\tilde\rho$ is continuous, and there exist $a > 1$ and $c \ge 0$ such that $\tilde\rho^a \le c + \rho$, then
$$D_{\rho,\tilde\rho}(R_\rho(D_\rho)+, D_\rho) = D_{\rho,\tilde\rho}(R_\rho(D_\rho), D_\rho).$$

Remark. Condition (8) is satisfied, for example, for $\rho$ such that $\rho(x,y) \to \infty$ as $\|y-x\|_2 \to \infty$. Indeed, for $K_k = [-k,k]^m$ and $M_k = [-2k,2k]^m$, $\lim_{k\to\infty} P(K_k) = 1$, and
$$\inf_{x\in K_k,\, y\in M_k^c} \rho(x,y) \ge \inf_{x,y :\, \|y-x\|_2 \ge k} \rho(x,y) \to \infty$$
as $k \to \infty$.

Example 3. Given a class of distortion measures $\Gamma$, the following approach is suggested in [12] to find the "closest" one to the $\tilde\rho$ implemented by the human visual system: determine $D_{\rho,\tilde\rho}(R, D_\rho(R))$ for each $\rho \in \Gamma$ and pick a minimizer $\rho^*$.
In situations where a unique distribution $Q$ with $Q_X = P$ achieving $D_\rho(R)$ exists, $D_{\rho,\tilde\rho}(R, D_\rho(R))$ can be found empirically by generating samples from $Q$ and having them evaluated by human subjects. The hope is that the distortion measure minimizing $D_{\rho,\tilde\rho}(R, D_\rho(R))$ should be a good approximation to $\tilde\rho$ also for non-optimal image compression schemes. Formally, this amounts to assuming that $D_{\rho,\tilde\rho}(R + r, D_\rho(R))$ is close to $D_{\rho,\tilde\rho}(R, D_\rho(R))$ (at least for small $r$). Hence this approach is only valid if $D_{\rho,\tilde\rho}(R + r, D_\rho(R))$ is right continuous in $r$ at $r = 0$. Theorem 5 gives conditions under which this is indeed the case.

In [12], $\mathcal{X} = \mathcal{Y} = \mathbb{R}^m_+$, and each $\rho \in \Gamma$ is of the form
$$\rho(\mathbf{x}, \mathbf{y}) = \big\| \big( [v(x_1),\dots,v(x_m)] - [v(y_1),\dots,v(y_m)] \big) W \big\|_2^2$$
for some monotonically increasing concave function $v : \mathbb{R}_+ \to \mathbb{R}$ and some matrix $W \in \mathbb{R}^{m\times m}$. In order to apply Theorem 5, we need the additional assumptions that $v$ is continuous at $0$, that $v(s) \to \infty$ as $s \to \infty$, that $W^T W$ is positive definite, and that the $\tilde\rho$ implemented by the human visual system is continuous and bounded. From Theorem 5, we obtain that under these conditions (implicitly made in [12]) $D_{\rho,\tilde\rho}(R + r, D_\rho(R))$ is indeed right continuous at $r = 0$, showing that $\rho^*$ should yield a good approximation to $\tilde\rho$ also for compression schemes that are only close to optimal. We consider the problem of finding an optimal $\rho \in \Gamma$ approximating a given $\tilde\rho$ in more detail in Section II-D. ♦

C. Computing $D_{\rho,\tilde\rho}(R, D_\rho)$

Define
$$R_{\rho,\tilde\rho}(D_\rho, D_{\tilde\rho}) \triangleq \inf I(Q),$$
where the infimum is taken over all $Q \in \mathcal{P}(\mathcal{X}\times\mathcal{Y})$ such that $Q_X = P$, $\mathbb{E}_Q\rho \le D_\rho$, and $\mathbb{E}_Q\tilde\rho \ge D_{\tilde\rho}$. Setting
$$S_1 \triangleq \big\{ (R, D_\rho, D_{\tilde\rho}) : D_{\tilde\rho} \le D_{\rho,\tilde\rho}(R, D_\rho) \big\}, \qquad S_2 \triangleq \big\{ (R, D_\rho, D_{\tilde\rho}) : R \ge R_{\rho,\tilde\rho}(D_\rho, D_{\tilde\rho}) \big\},$$
it is easy to show that the closures of $S_1$ and $S_2$ are identical.
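Closed forms such as those in Example 2 provide handy test cases when evaluating these quantities numerically. A brute-force sketch of Example 2, Case 3 with $d = \tilde d$, $r = 2$, $\tilde r = 1$ (the grid and the value of $D_\rho$ are our own choices), checking the $\sqrt{D_\rho}$ limit:

```python
import math

D_rho = 0.49
# For d = d~, r = 2, r~ = 1:
#   sup_z (rho~(z) - eta*rho(z)) = sup_{a>=0} (a - eta*a^2) = 1/(4*eta), eta > 0
g = lambda eta: eta * D_rho + 1.0 / (4.0 * eta)
best = min(g(k / 1e4) for k in range(1, 200_000))   # grid over eta in (0, 20]
assert abs(best - math.sqrt(D_rho)) < 1e-3          # limit from Theorem 3
```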
It is convenient in the following to analyze $R_{\rho,\tilde\rho}(D_\rho, D_{\tilde\rho})$ instead of $D_{\rho,\tilde\rho}(R, D_\rho)$. Define
$$Q_1(D_\rho, D_{\tilde\rho}) \triangleq \big\{ Q \in \mathcal{P}(\mathcal{X}\times\mathcal{Y}) : Q_X = P,\ \mathbb{E}_Q\rho \le D_\rho,\ \mathbb{E}_Q\tilde\rho \ge D_{\tilde\rho} \big\},$$
$$Q_2(D_\rho, D_{\tilde\rho}) \triangleq \big\{ Q \in Q_1(D_\rho, D_{\tilde\rho}) : Q \ll \lambda_{\mathbb{R}^m\times\mathbb{R}^m} \big\},$$
where $\lambda_{\mathbb{R}^m\times\mathbb{R}^m}$ is Lebesgue measure on $\mathbb{R}^m\times\mathbb{R}^m$. Note that if $Q \ll \lambda_{\mathbb{R}^m\times\mathbb{R}^m}$, i.e., $Q$ is absolutely continuous with respect to Lebesgue measure, then $Q$ admits a density.

The next theorem gives conditions under which we can restrict the minimization in the definition of $R_{\rho,\tilde\rho}(D_\rho, D_{\tilde\rho})$ to distributions admitting a density. We then use this result to find tighter bounds on $R_{\rho,\tilde\rho}(D_\rho, D_{\tilde\rho})$ for the important class of difference distortion measures.

Theorem 6. If
(i) $\rho, \tilde\rho$ are continuous,
(ii) there exist $a \ge 0$, $c \ge 0$, $\varepsilon > 0$ such that for all $(x,y) \in A \triangleq \{(x,y) : \rho(x,y) > a\}$,
$$\sup_{z :\, \|z\|_\infty \le \varepsilon} \rho(x, y+z) \le c\,\rho(x,y),$$
(iii) $P \ll \lambda_{\mathbb{R}^m}$,
then for all $\delta > 0$,
$$\inf_{Q\in Q_2(D_\rho+\delta,\, D_{\tilde\rho}-\delta)} I(Q) \le \inf_{Q\in Q_1(D_\rho, D_{\tilde\rho})} I(Q) \le \inf_{Q\in Q_2(D_\rho, D_{\tilde\rho})} I(Q).$$
If, in addition,
(iv) $\inf_{Q\in Q_2(D_\rho, D_{\tilde\rho})} I(Q)$ is continuous at $(D_\rho, D_{\tilde\rho})$ (as a function of $(D_\rho, D_{\tilde\rho})$),
then
$$\inf_{Q\in Q_1(D_\rho, D_{\tilde\rho})} I(Q) = \inf_{Q\in Q_2(D_\rho, D_{\tilde\rho})} I(Q).$$

We say that $\rho$ and $\tilde\rho$ are difference distortion measures if $\rho(x,y)$ and $\tilde\rho(x,y)$ are functions of $y - x$. With some abuse of notation, we shall write $\rho(y-x)$ and $\tilde\rho(y-x)$ in this case. The next theorem provides a lower bound on $R_{\rho,\tilde\rho}(D_\rho, D_{\tilde\rho})$, similar to the Shannon lower bound for $R_\rho(D_\rho)$.

Theorem 7. Let $\rho, \tilde\rho$ be difference distortion measures, and let $P \ll \lambda_{\mathbb{R}^m}$ have finite differential entropy.
If there exist $\eta, \tilde\eta \ge 0$ and $\alpha$ such that $f : \mathbb{R}^m \to \mathbb{R}_+$ defined by
$$f(z) \triangleq \exp\big( -\alpha - \eta\rho(z) + \tilde\eta\tilde\rho(z) \big)$$
satisfies
$$\int f(z)\,dz = 1, \qquad \int \rho(z) f(z)\,dz = D_\rho, \qquad \int \tilde\rho(z) f(z)\,dz = D_{\tilde\rho},$$
then
$$\inf_{Q\in Q_2(D_\rho, D_{\tilde\rho})} I(Q) \ge \max\big\{0,\, h(X) - h(Z)\big\} = \max\big\{0,\, h(X) - \alpha - \eta D_\rho + \tilde\eta D_{\tilde\rho}\big\}, \quad (9)$$
where $X \sim P$ and $Z$ has density $f$. If, in addition, there exists a random variable $Y$ independent of $Z$ such that $X = Y + Z$, then we have equality in (9).

Example 4. Let $\mathcal{X} = \mathcal{Y} = \mathbb{R}^2$,
$$\rho(\mathbf{x}, \mathbf{y}) = (\mathbf{y}-\mathbf{x})^T \begin{pmatrix} 1 & 0\\ 0 & 1 \end{pmatrix} (\mathbf{y}-\mathbf{x}), \qquad \tilde\rho(\mathbf{x}, \mathbf{y}) = (\mathbf{y}-\mathbf{x})^T \begin{pmatrix} a & 0\\ 0 & b \end{pmatrix} (\mathbf{y}-\mathbf{x}),$$
with $a \ge b > 0$, and let $\mathbf{X}$ be Gaussian with mean $0$ and covariance matrix $I$. The asymptotic expression (and upper bound) given by Theorem 3 is
$$D_{\rho,\tilde\rho}(\infty, D_\rho) = a D_\rho, \quad (10)$$
and on the boundary,
$$D_{\rho,\tilde\rho}(R_\rho(D_\rho), D_\rho) = \tfrac{1}{2}(a+b) D_\rho. \quad (11)$$
We now apply Theorems 6 and 7 to compute $D_{\rho,\tilde\rho}(R, D_\rho)$ for intermediate values of $R$. The density of $Z$ from Theorem 7 is given by
$$f(\mathbf{z}) = \exp\Big( -\alpha - \mathbf{z}^T \begin{pmatrix} \eta - a\tilde\eta & 0\\ 0 & \eta - b\tilde\eta \end{pmatrix} \mathbf{z} \Big).$$
Let $\sigma^2 \triangleq 1/(2\eta - 2a\tilde\eta)$ and $\tilde\sigma^2 \triangleq 1/(2\eta - 2b\tilde\eta)$, and note that $0 < \tilde\sigma^2 \le \sigma^2$. With this, $f$ is a Gaussian density with two independent components with mean zero and variances $\sigma^2$ and $\tilde\sigma^2$. For the bound on $\inf_{Q\in Q_2(D_\rho, D_{\tilde\rho})} I(Q)$ given by Theorem 7 to be tight, we need to show that $X = Y + Z$ for some independent random variable $Y$. This is the case if $\sigma^2 \le 1$ (and hence also $\tilde\sigma^2 \le 1$). In terms of $\sigma^2$ and $\tilde\sigma^2$, we have
$$\mathbb{E}\,\rho(Z) = \sigma^2 + \tilde\sigma^2, \qquad \mathbb{E}\,\tilde\rho(Z) = a\sigma^2 + b\tilde\sigma^2, \qquad h(X) - h(Z) = -\tfrac{1}{2}\log(\sigma^2) - \tfrac{1}{2}\log(\tilde\sigma^2).$$
A short computation reveals that for
$$\sigma^2 = \tfrac{1}{2}\Big(1 + \sqrt{1 - \exp(-2r)}\Big) D_\rho, \qquad \tilde\sigma^2 = \tfrac{1}{2}\Big(1 - \sqrt{1 - \exp(-2r)}\Big) D_\rho,$$
we have
$$\mathbb{E}\,\rho(Z) = D_\rho, \qquad \mathbb{E}\,\tilde\rho(Z) = \tfrac{1}{2}\Big( (a+b) + \sqrt{1 - \exp(-2r)}\,(a-b) \Big) D_\rho, \qquad h(X) - h(Z) = R_\rho(D_\rho) + r.$$
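The "short computation" above is easy to verify numerically. A sketch (parameter choices $a = 2$, $b = 0.5$, $D_\rho = 0.8$, $r = 0.7$ are ours; for this two-dimensional Gaussian source, $R_\rho(D_\rho) = \log(2/D_\rho)$ nats for $D_\rho \le 2$):

```python
import math

a, b = 2.0, 0.5
D_rho, r = 0.8, 0.7

q = math.sqrt(1.0 - math.exp(-2.0 * r))
s2 = 0.5 * (1.0 + q) * D_rho       # sigma^2
st2 = 0.5 * (1.0 - q) * D_rho      # sigma~^2

E_rho = s2 + st2                   # E rho(Z)
E_rho_t = a * s2 + b * st2         # E rho~(Z)
gap = -0.5 * math.log(s2) - 0.5 * math.log(st2)   # h(X) - h(Z), X ~ N(0, I_2)

assert math.isclose(E_rho, D_rho)
assert math.isclose(E_rho_t, 0.5 * ((a + b) + q * (a - b)) * D_rho)
assert math.isclose(gap, math.log(2.0 / D_rho) + r)   # = R_rho(D_rho) + r
```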
Thus, by Theorems 6 and 7,
$$D_{\rho,\tilde\rho}(R_\rho(D_\rho) + r, D_\rho) \le \tfrac{1}{2}\Big( (a+b) + \sqrt{1 - \exp(-2r)}\,(a-b) \Big) D_\rho.$$
And for
$$0 < D_\rho \le 2\Big/\Big(1 + \sqrt{1 - \exp(-2r)}\Big),$$
we have $\sigma^2 \le 1$, and hence this bound is tight. In particular, this is the case for $0 < D_\rho \le 1$.

Fig. 3. $D_{\rho,\tilde\rho}(R, D_\rho)$ from Example 4 with $a = 2$ and $b = 0.5$.

As a quick sanity check, we see that indeed
$$\lim_{r\to0} D_{\rho,\tilde\rho}(R_\rho(D_\rho) + r, D_\rho) = \tfrac{1}{2}(a+b) D_\rho, \qquad \lim_{r\to\infty} D_{\rho,\tilde\rho}(R_\rho(D_\rho) + r, D_\rho) = a D_\rho,$$
which are the values found in (11) and (10). For $0 < D_\rho \le 1$, the ratio between the value for finite $r$ and the limiting expression as $r \to \infty$ is independent of $D_\rho$ and given by
$$D_{\rho,\tilde\rho}(R_\rho(D_\rho) + r, D_\rho) \big/ D_{\rho,\tilde\rho}(\infty, D_\rho) = \Big( (a+b) + \sqrt{1 - \exp(-2r)}\,(a-b) \Big) \Big/ 2a.$$
We see that this converges to one quickly as $r \to \infty$, as is shown in Figure 4. Hence in this case the limiting expression found in Theorem 3 is approached rapidly, and is hence a fairly tight upper bound on $D_{\rho,\tilde\rho}(R_\rho(D_\rho) + r, D_\rho)$ even for small values of $r$. ♦

D. Choosing a "Representative" of a Class of Distortion Measures

Let $\Gamma$ and $\widetilde\Gamma$ denote classes of distortion measures. In this section, we consider the question of how a good "representative" $\rho \in \Gamma$ of $\widetilde\Gamma$ can be chosen (in a sense to be made precise). Consider again the oracle producing source codes as mentioned in the introduction, but assume this time that when queried, we can also supply the oracle with a distortion measure $\rho \in \Gamma$. The oracle then produces a source code $f_n$ such that
$$\frac{1}{n}\log|f_n(\mathcal{X}^n)| \le R, \qquad \mathbb{E}\,\rho_n(X^n, f_n(X^n)) \le D_\rho(R) + \Delta_\rho.$$
Knowing the set of all $\{\Delta_\rho\}_{\rho\in\Gamma}$, and given a $\widetilde\Gamma$, how should we choose the $\rho \in \Gamma$ to query the oracle with such that $f_n$ will "work well" for all $\tilde\rho \in \widetilde\Gamma$?

Fig. 4.
$D_{\rho,\tilde{\rho}}(R_\rho(D_\rho) + r, D_\rho)/D_{\rho,\tilde{\rho}}(\infty, D_\rho)$ from Example 4 as a function of $r$ with $a = 2$, $b = 0.5$, for all values $0 < D_\rho \leq 1$. Note that at an excess rate of $r = 0.5$ we are already at over 90% of the limiting value, and at an excess rate of $r = 1$ at over 97% of the limiting value.

This problem has the following operational significance. Assume we have a collection $\Gamma$ of tractable distortion measures (i.e., distortion measures for which we are able to design good source codes). Assume furthermore that we know that the true distortion measure lies in some class $\tilde{\Gamma}$. We can choose a source code designed for one of the tractable distortion measures in $\Gamma$, and then use this source code with respect to any of the distortion measures in $\tilde{\Gamma}$. While in the previous sections we only analyzed the performance guarantees under mismatched distortion measures, here we also get to choose $\rho \in \Gamma$ in order to minimize the mismatch. The parameters $\{\Delta_\rho\}_{\rho\in\Gamma}$ allow us to account for the difficulty of constructing a source code for distortion measure $\rho$ (see also Example 5 below).

Note, however, that there are several reasonable ways in which "work well" in the last paragraph can be defined. We will consider two such definitions in the following. For rate $R$, define
$$D_{\Gamma,\tilde{\Gamma}}\big(R, \{\Delta_\rho\}\big) \triangleq \inf_{\rho\in\Gamma} \sup_{\tilde{\rho}\in\tilde{\Gamma}} D_{\rho,\tilde{\rho}}\big(R, D_\rho(R) + \Delta_\rho\big),$$
$$\Delta_{\Gamma,\tilde{\Gamma}}\big(R, \{\Delta_\rho\}\big) \triangleq \inf_{\rho\in\Gamma} \sup_{\tilde{\rho}\in\tilde{\Gamma}} \Big(D_{\rho,\tilde{\rho}}\big(R, D_\rho(R) + \Delta_\rho\big) - D_{\tilde{\rho}}(R)\Big).$$
We assume throughout that the $\{\Delta_\rho\}_{\rho\in\Gamma}$ satisfy $\inf_{\rho\in\Gamma} \Delta_\rho > 0$. The next example illustrates why introducing $\{\Delta_\rho\}_{\rho\in\Gamma}$ is necessary.

Example 5. Fix distortion measures $\rho, \tilde{\rho}$, and let $\Gamma \triangleq \{a\rho\}_{a\geq 1}$. All distortion measures in $\Gamma$ are equivalent (in the sense that constructing source codes for $\rho$ is as difficult as constructing source codes for any $a\rho$).
So we should have that all $a\rho$ represent $\tilde{\rho}$ equally well (in the sense that for appropriately chosen $D_{a\rho}$, the quantity $D_{a\rho,\tilde{\rho}}(R, D_{a\rho})$ is the same for all $a \geq 1$). As we will see in a moment, this forces the introduction of the quantities $\{\Delta_{a\rho}\}_{a\geq 1}$. For any fixed $D_\rho$, we have
$$D_{a\rho,\tilde{\rho}}(R, D_\rho) = D_{\rho,\tilde{\rho}}(R, D_\rho/a),$$
which goes either to $0$ (if $R \geq R_\rho(0)$) or to $-\infty$ as $a \to \infty$. This shows that we should look at source codes constructed with a distortion level relative to $D_{a\rho}(R)$. Assume then that we try to minimize $D_{a\rho,\tilde{\rho}}(R, D_{a\rho}(R) + \Delta)$ for some fixed $\Delta > 0$. We have
$$D_{a\rho,\tilde{\rho}}\big(R, D_{a\rho}(R) + \Delta\big) = D_{\rho,\tilde{\rho}}\big(R, D_\rho(R) + \Delta/a\big).$$
Thus, again, the minimum is achieved as $a \to \infty$, irrespective of the choice of $\tilde{\rho}$. This shows that we should not choose $\Delta_{a\rho}$ to be a constant. The natural choice in this example is $\Delta_{a\rho} = a\Delta$, for which
$$D_{a\rho,\tilde{\rho}}\big(R, D_{a\rho}(R) + \Delta_{a\rho}\big) = D_{\rho,\tilde{\rho}}\big(R, D_\rho(R) + \Delta\big),$$
as expected. ♦

The following two corollaries of Theorems 1 and 2, respectively, establish the operational meaning of $D_{\Gamma,\tilde{\Gamma}}(R, \{\Delta_\rho\})$ and $\Delta_{\Gamma,\tilde{\Gamma}}(R, \{\Delta_\rho\})$.

Corollary 8. Let $\Gamma, \tilde{\Gamma}$ be classes of distortion measures such that for all $\rho \in \Gamma$ there exists a $y_0 = y_0(\rho) \in \mathcal{Y}$ satisfying $\mathbb{E}_P \rho(X, y_0) < \infty$. For every $\rho \in \Gamma$, $R > 0$, and $D_{\tilde{\Gamma}}, \Delta_{\tilde{\Gamma}}$ such that
$$0 \leq D_{\tilde{\Gamma}} < \lim_{\delta\downarrow 0} D_{\Gamma,\tilde{\Gamma}}\big(R, \{\Delta_\rho - \delta\}\big), \qquad 0 \leq \Delta_{\tilde{\Gamma}} < \lim_{\delta\downarrow 0} \Delta_{\Gamma,\tilde{\Gamma}}\big(R, \{\Delta_\rho - \delta\}\big),$$
a) there exist $\tilde{\rho} \in \tilde{\Gamma}$ and a sequence of source codes $\{f_n\}_{n\geq 1}$ such that
$$\lim_{n\to\infty} \frac{1}{n}\log|f_n(\mathcal{X}^n)| \leq R, \qquad \limsup_{n\to\infty} \mathbb{E}\rho_n(X^n, f_n(X^n)) \leq D_\rho(R) + \Delta_\rho, \qquad \liminf_{n\to\infty} \mathbb{E}\tilde{\rho}_n(X^n, f_n(X^n)) \geq D_{\tilde{\Gamma}};$$
b) there exist $\tilde{\rho} \in \tilde{\Gamma}$ and a sequence of source codes $\{f_n\}_{n\geq 1}$ such that
$$\lim_{n\to\infty} \frac{1}{n}\log|f_n(\mathcal{X}^n)| \leq R, \qquad \limsup_{n\to\infty} \mathbb{E}\rho_n(X^n, f_n(X^n)) \leq D_\rho(R) + \Delta_\rho, \qquad \liminf_{n\to\infty} \Big(\mathbb{E}\tilde{\rho}_n(X^n, f_n(X^n)) - D_{\tilde{\rho}}(R)\Big) \geq \Delta_{\tilde{\Gamma}}.$$

Corollary 9.
a) For every $\delta > 0$ there exists $\rho \in \Gamma$ such that if $f_n : \mathcal{X}^n \to \mathcal{Y}^n$ satisfies
$$\frac{1}{n}\log|f_n(\mathcal{X}^n)| = R, \qquad \mathbb{E}\rho_n(X^n, f_n(X^n)) \leq D_\rho(R) + \Delta_\rho,$$
then
$$\sup_{\tilde{\rho}\in\tilde{\Gamma}} \mathbb{E}\tilde{\rho}_n(X^n, f_n(X^n)) \leq D_{\Gamma,\tilde{\Gamma}}\big(R, \{\Delta_\rho\}\big) + \delta.$$
b) For every $\delta > 0$ there exists $\rho \in \Gamma$ such that if $f_n : \mathcal{X}^n \to \mathcal{Y}^n$ satisfies
$$\frac{1}{n}\log|f_n(\mathcal{X}^n)| = R, \qquad \mathbb{E}\rho_n(X^n, f_n(X^n)) \leq D_\rho(R) + \Delta_\rho,$$
then
$$\sup_{\tilde{\rho}\in\tilde{\Gamma}} \Big(\mathbb{E}\tilde{\rho}_n(X^n, f_n(X^n)) - D_{\tilde{\rho}}(R)\Big) \leq \Delta_{\Gamma,\tilde{\Gamma}}\big(R, \{\Delta_\rho\}\big) + \delta.$$

Corollaries 8 and 9 allow us to make guarantees about the performance of a source code constructed with respect to the best "representative" $\rho \in \Gamma$ of $\tilde{\Gamma}$. Indeed, by Corollary 9, there exists $\rho \in \Gamma$ such that if $f_n : \mathcal{X}^n \to \mathcal{Y}^n$ is a source code of rate $R$ designed for distortion measure $\rho$ and distortion level $D_\rho(R) + \Delta_\rho$, then $f_n$ is also a source code for any distortion measure $\tilde{\rho} \in \tilde{\Gamma}$ and distortion level $D_{\Gamma,\tilde{\Gamma}}(R, \{\Delta_\rho\}) + \delta$. Moreover, this is essentially the best guarantee one can make, since by Corollary 8 there exist source codes with the same blocklength $n$ and the same rate $R$, designed for any distortion measure $\rho \in \Gamma$ and distortion level $D_\rho(R) + \Delta_\rho$, that result in a distortion level of more than $D_{\Gamma,\tilde{\Gamma}}(R - \delta(n), \{\Delta_\rho - \tilde{\delta}(n)\}) - \delta(n)$ for some distortion measure $\tilde{\rho} \in \tilde{\Gamma}$, with $\delta(n), \tilde{\delta}(n) \to 0$ as $n \to \infty$.

Example 6. Let $\tilde{\Gamma} = \{\tilde{\rho}\}$, and
$$\Gamma \triangleq \big\{\rho(x,y) = w(x)\,\|y - x\|_2^2 : w \in \mathcal{W} \subset (\mathcal{X} \to \mathbb{R}_+)\big\},$$
and let $P \in \mathcal{P}(\mathcal{X}\times\mathcal{Y})$ be such that $\mathbb{E}_P\, w(X)\|X\|_2^2 < \infty$ for all $w \in \mathcal{W}$. In [13], the authors show how vector quantizers can be relatively easily constructed for distortion measures in the class $\Gamma$ defined here. Given a more sophisticated distortion measure $\tilde{\rho}$, it is thus of interest to find the "closest" $\rho \in \Gamma$ to $\tilde{\rho}$. In other words, for some $\delta > 0$, we want to find a $\rho \in \Gamma$ such that
$$D_{\rho,\tilde{\rho}}\big(R, D_\rho(R) + \Delta_\rho\big) \leq D_{\Gamma,\tilde{\Gamma}}\big(R, \{\Delta_\rho\}\big) + \delta.$$
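The optimization over $\eta$ appearing next in this example reduces to a largest-eigenvalue computation: for a positive definite matrix $\widetilde{W}$ and a weight $w > 0$, the smallest $\eta$ with $\widetilde{W} - \eta w I \preceq 0$ is $\lambda_1(\widetilde{W})/w$. A quick numeric sanity check of this linear-algebra fact (a sketch using `numpy`; the matrix and weight below are arbitrary):

```python
import numpy as np

# Arbitrary positive definite matrix W and weight w > 0 (illustration only).
W = np.array([[2.0, 0.3],
              [0.3, 1.0]])
w = 0.7

lam_max = np.linalg.eigvalsh(W).max()   # largest eigenvalue lambda_1(W)
eta_star = lam_max / w                  # claimed minimal feasible eta

def is_nsd(eta, tol=1e-9):
    """True iff W - eta*w*I is negative semidefinite (up to tol)."""
    return np.linalg.eigvalsh(W - eta * w * np.eye(2)).max() <= tol

assert is_nsd(eta_star)              # feasible at eta_star
assert not is_nsd(eta_star - 1e-3)   # infeasible just below eta_star
```

The same computation, taken pointwise in $x$ and combined with an essential supremum with respect to $P$, gives the expression used below.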
Computing $D_{\Gamma,\tilde{\Gamma}}(R, \{\Delta_\rho\})$ could be done numerically; to obtain some insight, we will instead minimize $D_{\rho,\tilde{\rho}}(\infty, D_\rho(R) + \Delta_\rho)$. This yields an upper bound on $D_{\Gamma,\tilde{\Gamma}}(R, \{\Delta_\rho\})$, and thus still allows us to make performance guarantees. Moreover, as we have seen in Example 4, this bound can be quite good even for finite values of $R$.

To be specific, let $\tilde{\rho}(x,y) = (y-x)^T \widetilde{W}_x (y-x)$ with $\widetilde{W}_x$ positive definite $P$-almost everywhere. Let $w_\rho \in \mathcal{W}$ be the weight function corresponding to distortion measure $\rho \in \Gamma$. Then, from Example 1,
$$D_{\rho,\tilde{\rho}}\big(\infty, D_\rho(R) + \Delta_\rho\big) = \big(D_\rho(R) + \Delta_\rho\big) \min\{\eta : \widetilde{W}_x - \eta\, w_\rho(x) I \preceq 0\ P\text{-a.e.}\} = \big(D_\rho(R) + \Delta_\rho\big)\, \operatorname*{ess\,sup}_{x\in\mathcal{X}} \lambda_1(\widetilde{W}_x)/w_\rho(x),$$
where $\lambda_1(\widetilde{W}_x)$ is the largest eigenvalue of $\widetilde{W}_x$, and where the essential supremum is with respect to $P$. Hence
$$D_{\Gamma,\tilde{\Gamma}}\big(R, \{\Delta_\rho\}\big) \leq \inf_{\rho\in\Gamma} \big(D_\rho(R) + \Delta_\rho\big)\, \operatorname*{ess\,sup}_{x\in\mathcal{X}} \lambda_1(\widetilde{W}_x)/w_\rho(x).$$
In other words, the optimal "representative" $\rho \in \Gamma$ of $\tilde{\rho}$ strikes the best tradeoff between the difficulty of constructing source codes for $\rho$ (captured by the term $D_\rho(R) + \Delta_\rho$) and the closeness to $\tilde{\rho}$ (captured by the term $\operatorname*{ess\,sup}_{x\in\mathcal{X}} \lambda_1(\widetilde{W}_x)/w_\rho(x)$). ♦

In the last example, we took a sophisticated distortion measure $\tilde{\rho}$ and found a good tractable approximation to it in $\Gamma$. This approach poses the following question. Even if $\tilde{\rho}$ is a very good model for (say) the human visual system, it will certainly be different from it. In this situation, it is not clear if minimizing $D_{\rho,\tilde{\rho}}(R, D_\rho(R) + \Delta_\rho)$ is meaningful. Indeed, if $\rho^*$ is the distortion measure implemented by the human visual system, we should really be minimizing $D_{\rho,\rho^*}(R, D_\rho(R) + \Delta_\rho)$ instead. The next proposition provides conditions under which $D_{\rho,\tilde{\rho}}(R, D_\rho(R) + \Delta_\rho)$ and $D_{\rho,\rho^*}(R, D_\rho(R) + \Delta_\rho)$ are close, and hence the approach of Example 6 is reasonable.

Proposition 10.
Let $\rho_1, \rho_2, \rho_3$ be continuous distortion measures. Then
$$D_{\rho_1,\rho_3}(R, D) \leq D_{\rho_1,\rho_2}(R, D) + \mathbb{E}_P \sup_{y\in\mathcal{Y}} \big(\rho_3(X,y) - \rho_2(X,y)\big)$$
and
$$D_{\rho_1,\rho_3}(R, D) \geq D_{\rho_1,\rho_2}(R, D) - \mathbb{E}_P \sup_{y\in\mathcal{Y}} |\rho_3(X,y) - \rho_2(X,y)|.$$

Example 7. Setting $\rho_1 = \rho_2$, Proposition 10 shows that
$$D_{\rho_2,\rho_3}\big(R, D_{\rho_2}(R)\big) - D_{\rho_2}(R) \leq \mathbb{E}_P \sup_{y\in\mathcal{Y}} |\rho_3(X,y) - \rho_2(X,y)|.$$
Thus if $\mathbb{E}_P \sup_{y\in\mathcal{Y}} |\rho_3(X,y) - \rho_2(X,y)|$ is small, then the distortion measures $\rho_2$ and $\rho_3$ are almost equivalent (from the point of view of source coding). Moreover, if $\rho_3$ is the actual distortion measure (implemented, e.g., by the human visual system), and $\rho_2$ is a sophisticated model for it (e.g., $\rho_2(x,y) = (y-x)^T \widetilde{W}_x (y-x)$ as in Example 6), then small $\mathbb{E}_P \sup_{y\in\mathcal{Y}} |\rho_3(X,y) - \rho_2(X,y)|$ guarantees that minimizing $D_{\rho_1,\rho_2}(R, D_{\rho_1} + \Delta_{\rho_1})$ over all $\rho_1 \in \Gamma$ (as is done in Example 6) is essentially equivalent to minimizing $D_{\rho_1,\rho_3}(R, D_{\rho_1} + \Delta_{\rho_1})$. Hence, when constructing a model $\rho_2$ for the distortion measure $\rho_3$ implemented by the human visual system, it is reasonable to choose the model parameters such that $\mathbb{E}_P \sup_{y\in\mathcal{Y}} |\rho_3(X,y) - \rho_2(X,y)|$ is minimized. ♦

III. PROOFS

A. Proof of Theorem 1

A slight modification of Lemma 9.3.1 and the first part of the proof of Theorem 9.6.2 in [14] shows that for every $\delta > 0$ there exists a sequence of source codes $\{\tilde{f}_n\}_{n\geq 1}$ such that
$$\lim_{n\to\infty} P^n(A_n) = 0, \qquad \lim_{n\to\infty} \frac{1}{n}\log|\tilde{f}_n(\mathcal{X}^n)| \leq R, \tag{12}$$
where
$$A_n \triangleq \{x^n : \rho_n(x^n, \tilde{f}_n(x^n)) > D_\rho - \delta/2\} \cup \{x^n : \tilde{\rho}_n(x^n, \tilde{f}_n(x^n)) < D_{\tilde{\rho}}\}.$$
Let
$$B_n \triangleq \{x^n : \rho_n(x^n, \tilde{f}_n(x^n)) > D_\rho - \delta/2\} \subset A_n,$$
and set
$$f_n(x^n) \triangleq \begin{cases} \mathbf{y}_0 & \text{if } x^n \in B_n, \\ \tilde{f}_n(x^n) & \text{else,} \end{cases}$$
where $\mathbf{y}_0 \triangleq (y_0, \ldots, y_0) \in \mathcal{Y}^n$. We have $|f_n(\mathcal{X}^n)| \leq |\tilde{f}_n(\mathcal{X}^n)| + 1$ and hence, by (12),
$$\lim_{n\to\infty} \frac{1}{n}\log|f_n(\mathcal{X}^n)| = \lim_{n\to\infty} \frac{1}{n}\log|\tilde{f}_n(\mathcal{X}^n)| \leq R.$$
Moreover,
$$\mathbb{E}\rho_n(X^n, f_n(X^n)) \leq D_\rho - \delta/2 + \mathbb{E}\big(\rho_n(X^n, f_n(X^n)); B_n\big),$$
and for any $b \geq 0$,
$$\mathbb{E}\big(\rho_n(X^n, f_n(X^n)); B_n\big) = \frac{1}{n}\sum_{i=1}^n \mathbb{E}\big(\rho(X_i, y_0); B_n\big) \leq \frac{1}{n}\sum_{i=1}^n \Big(\mathbb{E}\big(\rho(X_i, y_0); \{\rho(X_i, y_0) \leq b\} \cap B_n\big) + \mathbb{E}\big(\rho(X_i, y_0); \{\rho(X_i, y_0) > b\}\big)\Big) \leq b\,P^n(A_n) + \mathbb{E}_P\big(\rho(X, y_0); \{\rho(X, y_0) > b\}\big).$$
Since $\mathbb{E}_P \rho(X, y_0) < \infty$, there exists $b > 0$ such that $\mathbb{E}_P\big(\rho(X, y_0); \{\rho(X, y_0) > b\}\big) \leq \delta/2$. Hence, using (12),
$$\limsup_{n\to\infty} \mathbb{E}\rho_n(X^n, f_n(X^n)) \leq D_\rho.$$
Finally,
$$\mathbb{E}\tilde{\rho}_n(X^n, f_n(X^n)) \geq \mathbb{E}\big(\tilde{\rho}_n(X^n, f_n(X^n)); A_n^c\big) \geq D_{\tilde{\rho}}\, P^n(A_n^c),$$
and hence, by (12),
$$\liminf_{n\to\infty} \mathbb{E}\tilde{\rho}_n(X^n, f_n(X^n)) \geq D_{\tilde{\rho}}.$$

B. Proof of Theorem 2

Let $\tilde{\rho}' \triangleq -\tilde{\rho}$. If
$$\frac{1}{n}\log|f_n(\mathcal{X}^n)| = R, \qquad \mathbb{E}\rho_n(X^n, f_n(X^n)) \leq D_\rho, \qquad \mathbb{E}\tilde{\rho}_n(X^n, f_n(X^n)) \geq D_{\tilde{\rho}},$$
then we also have
$$\mathbb{E}\tilde{\rho}'_n(X^n, f_n(X^n)) \leq -D_{\tilde{\rho}} \triangleq D_{\tilde{\rho}'}.$$
By [5, Theorem 1.b], for every $\delta > 0$ there exists $Q \in \mathcal{P}(\mathcal{X}\times\mathcal{Y})$ such that $Q_X = P$ and
$$I(Q) \leq R + \delta, \qquad \mathbb{E}_Q\rho \leq D_\rho, \qquad \mathbb{E}_Q\tilde{\rho}' \leq D_{\tilde{\rho}'}.$$
Therefore
$$D_{\tilde{\rho}} \leq \mathbb{E}_Q\tilde{\rho} \leq D_{\rho,\tilde{\rho}}(R + \delta, D_\rho),$$
and maximizing over the choice of $D_{\tilde{\rho}}$ yields the first part of the theorem.

For the second part, we need to show that $D_{\rho,\tilde{\rho}}(\cdot, D_\rho)$ is continuous for $R > R_\rho(D_\rho)$. We first show that $D_{\rho,\tilde{\rho}}(\cdot, D_\rho)$ is concave. Fix $\delta > 0$. Let $Q_1, Q_2 \in \mathcal{P}(\mathcal{X}\times\mathcal{Y})$, both with $X$ marginal $P$, be such that $I(Q_i) \leq R_i$, $\mathbb{E}_{Q_i}\rho \leq D_\rho$, and $\mathbb{E}_{Q_i}\tilde{\rho} \geq D_{\rho,\tilde{\rho}}(R_i, D_\rho) - \delta$ for $i \in \{1,2\}$. Setting $Q \triangleq \alpha Q_1 + (1-\alpha) Q_2$, we have $\mathbb{E}_Q\rho \leq D_\rho$ and
$$\mathbb{E}_Q\tilde{\rho} = \alpha\,\mathbb{E}_{Q_1}\tilde{\rho} + (1-\alpha)\,\mathbb{E}_{Q_2}\tilde{\rho} \geq \alpha\, D_{\rho,\tilde{\rho}}(R_1, D_\rho) + (1-\alpha)\, D_{\rho,\tilde{\rho}}(R_2, D_\rho) - \delta.$$
Since mutual information is convex in the conditional distribution [15, Corollary 5.5.5],
$$I(Q) \leq \alpha I(Q_1) + (1-\alpha) I(Q_2) \leq \alpha R_1 + (1-\alpha) R_2.$$
Hence
$$D_{\rho,\tilde{\rho}}\big(\alpha R_1 + (1-\alpha) R_2, D_\rho\big) \geq \alpha\, D_{\rho,\tilde{\rho}}(R_1, D_\rho) + (1-\alpha)\, D_{\rho,\tilde{\rho}}(R_2, D_\rho) - \delta.$$
Since $\delta > 0$ is arbitrary, this proves concavity of $D_{\rho,\tilde{\rho}}(\cdot, D_\rho)$. Moreover, $D_{\rho,\tilde{\rho}}(\cdot, D_\rho)$ is increasing, and together with concavity this implies that it is right-continuous except possibly at the point $R_\rho(D_\rho)$. From this, the result follows.

C. Proof of Theorem 3

We first show that $\lim_{R\to\infty} D_{\rho,\tilde{\rho}}(R, D_\rho) = D_{\rho,\tilde{\rho}}(\infty, D_\rho)$. By Assumption (iv), $D_{\rho,\tilde{\rho}}(\infty, D_\rho) < \infty$, and therefore there exists $Q \in \mathcal{P}(\mathcal{X}\times\mathcal{Y})$ such that $Q_X = P$, $\mathbb{E}_Q\rho \leq D_\rho$, and $\mathbb{E}_Q\tilde{\rho} \geq D_{\rho,\tilde{\rho}}(\infty, D_\rho) - \varepsilon$. Let $K_i \subset \mathcal{X}\times\mathcal{Y}$ be compact and such that $Q(K_i) \geq 1 - 1/i$ for all $i \geq 1$. Thus $Q(\cup_{i\geq 1} K_i) = 1$, and therefore by dominated convergence (using Assumption (iv) for the first limit and Assumption (ii) for the second)
$$\lim_{I\to\infty} \mathbb{E}_Q\big(\tilde{\rho}; \cup_{i=1}^I K_i\big) = \mathbb{E}_Q\tilde{\rho}, \qquad \lim_{I\to\infty} \mathbb{E}_P\big(\rho(X, y_0); (\cup_{i=1}^I K_i)^c\big) = 0.$$
Hence there exists a compact $K \subset \mathcal{X}\times\mathcal{Y}$ such that
$$\mathbb{E}_Q(\tilde{\rho}; K) \geq \mathbb{E}_Q\tilde{\rho} - \varepsilon, \tag{13}$$
$$\mathbb{E}_P\big(\rho(X, y_0); K^c\big) \leq \varepsilon. \tag{14}$$
Since $\rho$ and $\tilde{\rho}$ are continuous by Assumption (i), they are uniformly continuous on the compact set $K$. Hence there exists $\delta > 0$ such that
$$|\rho(x,y) - \rho(\tilde{x},\tilde{y})| < \varepsilon, \qquad |\tilde{\rho}(x,y) - \tilde{\rho}(\tilde{x},\tilde{y})| < \varepsilon,$$
whenever $\|x - \tilde{x}\| + \|y - \tilde{y}\| < \delta$. Now, since $K$ is compact, there exist some finite $L$, points $\{x_\ell, y_\ell\}_{\ell=1}^L \subset K$, and a finite measurable partition $\{A_\ell\}_{\ell=1}^L$ of $K$ such that $\|x - x_\ell\| + \|y - y_\ell\| < \delta$ for all $(x,y) \in A_\ell$ and for all $\ell \in \{1,\ldots,L\}$. Define
$$\widetilde{Y} \triangleq \begin{cases} y_0 & \text{if } (X,Y) \in K^c, \\ y_\ell & \text{if } (X,Y) \in A_\ell. \end{cases}$$
Since $K$ and $\{A_\ell\}_{\ell=1}^L$ are measurable, $\widetilde{Y}$ is a random variable. Let $\widetilde{Q}$ be the distribution of $(X, \widetilde{Y})$ when $(X,Y) \sim Q$. Since $\widetilde{Y}$ takes on at most $L+1$ values, we have $I(\widetilde{Q}) \leq \log(L+1) < \infty$.
Moreover,
$$\mathbb{E}_{\widetilde{Q}}\rho = \sum_{\ell=1}^L \mathbb{E}_{\widetilde{Q}}(\rho; A_\ell) + \mathbb{E}_{\widetilde{Q}}(\rho; K^c) = \sum_{\ell=1}^L \mathbb{E}_P\big(\rho(X, y_\ell); A_\ell\big) + \mathbb{E}_P\big(\rho(X, y_0); K^c\big) \leq \sum_{\ell=1}^L \mathbb{E}_Q\big(\rho(X,Y) + \varepsilon; A_\ell\big) + \varepsilon \leq \mathbb{E}_Q\rho + 2\varepsilon \leq D_\rho + 2\varepsilon,$$
where the first inequality follows from the uniform continuity of $\rho$ on $K$ and from (14). And
$$\mathbb{E}_{\widetilde{Q}}\tilde{\rho} \geq \sum_{\ell=1}^L \mathbb{E}_{\widetilde{Q}}(\tilde{\rho}; A_\ell) = \sum_{\ell=1}^L \mathbb{E}_P\big(\tilde{\rho}(X, y_\ell); A_\ell\big) \geq \sum_{\ell=1}^L \mathbb{E}_Q\big(\tilde{\rho}(X,Y) - \varepsilon; A_\ell\big) = \mathbb{E}_Q(\tilde{\rho}; K) - \varepsilon \geq \mathbb{E}_Q\tilde{\rho} - 2\varepsilon \geq D_{\rho,\tilde{\rho}}(\infty, D_\rho) - 3\varepsilon,$$
where the second inequality follows from the uniform continuity of $\tilde{\rho}$ on $K$, and the third inequality from (13). Therefore
$$D_{\rho,\tilde{\rho}}(\infty, D_\rho) \leq \lim_{R\to\infty} D_{\rho,\tilde{\rho}}(R, D_\rho + 2\varepsilon) + 3\varepsilon \leq D_{\rho,\tilde{\rho}}(\infty, D_\rho + 2\varepsilon) + 3\varepsilon.$$
Since $D_{\rho,\tilde{\rho}}(\infty, \cdot)$ is concave, it is continuous at $D_\rho > D_\rho(\infty)$ (Assumption (iii)). Hence, taking the limit as $\varepsilon \to 0$ yields $\lim_{R\to\infty} D_{\rho,\tilde{\rho}}(R, D_\rho) = D_{\rho,\tilde{\rho}}(\infty, D_\rho)$.

We now show that
$$\mathbb{E}_P \sup_{y\in\mathcal{Y}} \big(\tilde{\rho}(X,y) - \eta\,\rho(X,y)\big) \tag{15}$$
is well defined. Let $\eta > 0$,
$$f(x,y) \triangleq \tilde{\rho}(x,y) - \eta\,\rho(x,y), \qquad g(x) \triangleq \sup_{y\in\mathcal{Y}} f(x,y).$$
By Assumption (i), $f$ is continuous, and hence $\sup_{y\in\mathbb{R}^m} f(x,y) = \sup_{y\in\mathbb{Q}^m} f(x,y)$. As $\mathbb{Q}^m$ is countable, this last supremum is measurable, and hence $g$ is a measurable function. Moreover,
$$g(x) \geq f(x, y_0) \geq -\eta\,\rho(x, y_0),$$
and hence, by Assumption (ii), $\mathbb{E}_P g^- < \infty$, where $g^- \triangleq \max\{0, -g\}$ is the negative part of $g$. Thus the expectation $\mathbb{E}_P g$ in (15) is well defined.

We next show that
$$D_{\rho,\tilde{\rho}}(\infty, D_\rho) \leq \min_{\eta\geq 0} \big(\eta D_\rho + \mathbb{E}_P g\big). \tag{16}$$
Consider
$$D_{\rho,\tilde{\rho}}(\infty, D_\rho) = \sup_{\substack{Q\in\mathcal{P}(\mathcal{X}\times\mathcal{Y}):\\ Q_X = P,\ \mathbb{E}_Q\rho \leq D_\rho}} \mathbb{E}_Q\tilde{\rho}.$$
The right-hand side is linear in $Q$ with linear constraints. Since $D_\rho > D_\rho(\infty)$ by Assumption (iii), a strictly feasible point exists. Hence, by strong duality (see, e.g., [16, Theorem 8.6.1]), we obtain
$$D_{\rho,\tilde{\rho}}(\infty, D_\rho) = \min_{\eta\geq 0}\Big(\eta D_\rho + \sup_{\substack{Q\in\mathcal{P}(\mathcal{X}\times\mathcal{Y}):\\ Q_X = P}} \mathbb{E}_Q(\tilde{\rho} - \eta\rho)\Big) \leq \min_{\eta\geq 0}\big(\eta D_\rho + \mathbb{E}_P g\big).$$
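For finite alphabets, the duality step behind (16) is ordinary linear-programming duality, and it can be illustrated by brute force. The sketch below uses toy alphabets and arbitrarily chosen distortion matrices; the grid resolutions are for illustration only.

```python
import numpy as np

# Toy finite-alphabet instance (all numbers chosen arbitrarily).
P = np.array([0.5, 0.5])                     # source distribution on X = {0, 1}
rho = np.array([[0.0, 1.0], [1.0, 0.0]])     # design distortion measure
trho = np.array([[0.0, 2.0], [3.0, 0.0]])    # mismatched distortion measure
D = 0.2

# Primal (the R = infinity case): maximize E_Q trho over conditionals with
# E_Q rho <= D. With binary alphabets the conditional is parametrized by
# p0 = Q(Y=1|X=0) and p1 = Q(Y=0|X=1); brute-force over a grid.
grid = np.linspace(0.0, 1.0, 1001)
p0, p1 = np.meshgrid(grid, grid)
cost = P[0] * p0 * rho[0, 1] + P[1] * p1 * rho[1, 0]     # E_Q rho
value = P[0] * p0 * trho[0, 1] + P[1] * p1 * trho[1, 0]  # E_Q trho
primal = value[cost <= D + 1e-12].max()

# Dual: min over eta >= 0 of eta*D + E_P max_y (trho(x,y) - eta*rho(x,y)).
etas = np.linspace(0.0, 6.0, 6001)
dual = min(eta * D + sum(P[x] * max(trho[x, y] - eta * rho[x, y]
                                    for y in (0, 1)) for x in (0, 1))
           for eta in etas)

assert abs(primal - dual) < 1e-9  # strong duality: both equal 0.6 here
```

Here the primal and dual values agree (both equal $0.6$), illustrating that the inequality in (16) can hold with equality, which is precisely what the remainder of the proof establishes in the general setting.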
As the last step, we show that we have equality in (16). To this end, we have to construct a $Q \in \mathcal{P}(\mathcal{X}\times\mathcal{Y})$ such that $Q_X = P$ and $\mathbb{E}_Q f$ is arbitrarily close to $\mathbb{E}_P g$. Consider any positive simple function $0 \leq s \leq g$, i.e., $s = \sum_{j=1}^J \beta_j \mathbb{1}_{B_j}$ for a finite measurable partition $\{B_j\}_{j=1}^J$ of $\mathcal{X}$. Let $C_j \subset B_j$ be compact and such that $P(C_j) \geq P(B_j) - \varepsilon/J$ for all $j \in \{1,\ldots,J\}$. Since the $\{B_j\}_{j=1}^J$ are disjoint, we have $P(\cup_{j=1}^J C_j) \geq 1 - \varepsilon$. For each $x \in C_j$ and any $\delta > 0$, there exists a $y(x)$ such that
$$f(x, y(x)) \geq g(x) - \delta \geq s(x) - \delta.$$
By continuity of $f$ and since $s$ is constant on $B_j$, for each $x \in C_j \cap B_j$ there exists an open neighborhood $G_j(x)$ of $x$ such that
$$f(\tilde{x}, y(x)) \geq s(x) - 2\delta = s(\tilde{x}) - 2\delta$$
for every $\tilde{x} \in G_j(x)$. Since $C_j \subset \cup_{x\in B_j} G_j(x)$, and since $C_j$ is compact, there exists a finite subcover, say $\{G_j(x)\}_{x\in\widetilde{C}_j}$ for some finite set $\widetilde{C}_j \subset C_j$. Construct a finite measurable partition $\{E_k\}_{k=1}^K$ of $\cup_{j=1}^J C_j$ such that for each $k$ we have $E_k \subset G_j(x) \cap B_j$ for some $j$ and some $x \in \widetilde{C}_j$. Call $x_k$ the $x \in \widetilde{C}_j$ corresponding to $E_k$. Define
$$Y \triangleq \begin{cases} y_0 & \text{if } X \in (\cup_{j=1}^J C_j)^c, \\ y(x_k) & \text{if } X \in E_k. \end{cases}$$
Since each $E_k$ is measurable, this is a random variable. Let $Q$ be the distribution of $(X,Y)$. We have
$$\mathbb{E}_Q f = \sum_{k=1}^K \mathbb{E}_Q(f; E_k) + \mathbb{E}_Q\big(f(X,Y); (\cup_{j=1}^J C_j)^c\big) \geq \sum_{k=1}^K \mathbb{E}_P\big(f(X, y(x_k)); E_k\big) - \eta\,\mathbb{E}_P\big(\rho(X, y_0); (\cup_{j=1}^J C_j)^c\big) \geq \sum_{k=1}^K \mathbb{E}_P\big(s(X) - 2\delta; E_k\big) - \eta\,\mathbb{E}_P\big(\rho(X, y_0); (\cup_{j=1}^J C_j)^c\big) = \mathbb{E}_P s(X) - \mathbb{E}_P\big(\eta\rho(X, y_0) + s(X); (\cup_{j=1}^J C_j)^c\big) - 2\delta.$$
Recall that $P(\cup_{j=1}^J C_j) \geq 1 - \varepsilon$. Since
$$0 \leq \mathbb{E}_P\big(\eta\rho(X, y_0) + s(X)\big) \leq \eta\,\mathbb{E}_P\rho(X, y_0) + \max_{j\in\{1,\ldots,J\}} \beta_j < \infty$$
by Assumption (ii), we can choose $\varepsilon$ small enough such that
$$\mathbb{E}_P\big(\eta\rho(X, y_0) + s(X); (\cup_{j=1}^J C_j)^c\big) \leq \delta.$$
With this,
$$\mathbb{E}_Q f \geq \mathbb{E}_P s - 3\delta.$$
(17)

Since $g$ is a measurable function, we can choose simple functions $s_i \leq g$ such that $\lim_{i\to\infty} \mathbb{E}_P s_i = \mathbb{E}_P g$. In light of (17), this implies that
$$\sup_{\substack{Q\in\mathcal{P}(\mathcal{X}\times\mathcal{Y}):\\ Q_X = P}} \mathbb{E}_Q f = \mathbb{E}_P g,$$
concluding the proof.

D. Proof of Theorem 4

$D_\rho(\cdot)$ is convex [15, Lemma 10.6.1] and hence continuous except possibly at the boundary. By Assumption (iii), $D_\rho(0) < \infty$, and by Assumption (i), $R > 0$. Thus $D_\rho(\cdot)$ is continuous at $R$. Therefore, since $D_\rho > D_\rho(R)$ by Assumption (ii), and since $0 < R < \infty$ by Assumption (i), there exists $\varepsilon > 0$ such that $D_\rho - 2\varepsilon \geq D_\rho(R - \varepsilon)$. Hence, by the definition of $D_\rho(R)$, there exists $Q \in \mathcal{P}(\mathcal{X}\times\mathcal{Y})$ such that $Q_X = P$, $I(Q) \leq R - \varepsilon$, and $\mathbb{E}_Q\rho \leq D_\rho - \varepsilon$.

Let $g_k : \mathcal{X} \to \mathcal{Y}$ be defined by
$$g_k(x) \triangleq \begin{cases} y_k^* & \text{if } x \in A_k, \\ y_0 & \text{else.} \end{cases}$$
Set $Y_k \triangleq g_k(X)$ and let $W_k$ be the distribution of $(X, Y_k)$. Set $\widetilde{Q}_k \triangleq (1-\alpha) Q + \alpha W_k$ for some $\alpha \in [0,1]$. Clearly, both $W_k$ and $\widetilde{Q}_k$ have $X$ marginal $P$. Mutual information is convex in the conditional distribution [15, Corollary 5.5.5], and thus
$$I(\widetilde{Q}_k) \leq (1-\alpha) I(Q) + \alpha I(W_k) \leq I(Q) + \alpha I(W_k).$$
We have $I(W_k) \leq \log(2) < 1$, and hence for $\alpha \leq \varepsilon$
$$I(\widetilde{Q}_k) \leq I(Q) + \varepsilon \leq R. \tag{18}$$
Moreover, by Assumption (iii),
$$\mathbb{E}_{\widetilde{Q}_k}\rho \leq \mathbb{E}_Q\rho + \alpha\,\mathbb{E}_{W_k}\rho \leq D_\rho - \varepsilon + \alpha\big(\mathbb{E}_P(\rho(X, y_0); A_k^c) + \mathbb{E}_P(\rho(X, y_k^*); A_k)\big) \leq D_\rho - \varepsilon + \alpha\big(D_0 + \mathbb{E}_P(\rho(X, y_k^*); A_k)\big).$$
Setting
$$\alpha \triangleq \frac{\varepsilon}{1 + D_0 + \mathbb{E}_P(\rho(X, y_k^*); A_k)},$$
this becomes
$$\mathbb{E}_{\widetilde{Q}_k}\rho \leq D_\rho. \tag{19}$$
Note that $\alpha \leq \varepsilon$ as needed in (18), and $\alpha > 0$ since $\mathbb{E}_P(\rho(X, y_k^*); A_k) < \infty$ by Assumption (iv). Finally,
$$\mathbb{E}_{\widetilde{Q}_k}\tilde{\rho} \geq \alpha\,\mathbb{E}_{W_k}\tilde{\rho} \geq \alpha\,\mathbb{E}_P\big(\tilde{\rho}(X, y_k^*); A_k\big) = \frac{\varepsilon\,\mathbb{E}_P(\tilde{\rho}(X, y_k^*); A_k)}{1 + D_0 + \mathbb{E}_P(\rho(X, y_k^*); A_k)},$$
and
$$\mathbb{E}_P\big(\rho(X, y_k^*); A_k\big) = \mathbb{E}_P\Big(\frac{\rho(X, y_k^*)}{\tilde{\rho}(X, y_k^*)}\,\tilde{\rho}(X, y_k^*); A_k\Big) \leq \mathbb{E}_P\big(\tilde{\rho}(X, y_k^*); A_k\big)\, \sup_{x\in A_k} \frac{\rho(x, y_k^*)}{\tilde{\rho}(x, y_k^*)}.$$
Hence, by Assumption (iv),
$$\mathbb{E}_{\widetilde{Q}_k}\tilde{\rho} \geq \varepsilon\left(\frac{1 + D_0}{\mathbb{E}_P(\tilde{\rho}(X, y_k^*); A_k)} + \sup_{x\in A_k}\frac{\rho(x, y_k^*)}{\tilde{\rho}(x, y_k^*)}\right)^{-1} \geq \varepsilon\left(\frac{1 + D_0}{P(A_k)\inf_{x\in A_k}\tilde{\rho}(x, y_k^*)} + \sup_{x\in A_k}\frac{\rho(x, y_k^*)}{\tilde{\rho}(x, y_k^*)}\right)^{-1} \to \infty \tag{20}$$
as $k \to \infty$. Combining (18), (19), and (20), we get $D_{\rho,\tilde{\rho}}(R, D_\rho) = \infty$.

E. Proof of Theorem 5

$D_{\rho,\tilde{\rho}}(R_\rho(D_\rho), D_\rho) \geq 0$ if and only if the set of all $Q \in \mathcal{P}(\mathcal{X}\times\mathcal{Y})$ such that $Q_X = P$, $I(Q) \leq R_\rho(D_\rho)$, and $\mathbb{E}_Q\rho \leq D_\rho$ is nonempty. By definition, $R_\rho(D_\rho) = \inf I(Q)$, where the infimum is taken over all $Q \in \mathcal{P}(\mathcal{X}\times\mathcal{Y})$ such that $Q_X = P$ and $\mathbb{E}_Q\rho \leq D_\rho$. Hence $D_{\rho,\tilde{\rho}}(R_\rho(D_\rho), D_\rho) \geq 0$ if and only if this last infimum is attained (i.e., a minimizing $Q$ exists). By Theorem 2.2 (and the remark following its proof) in [17], this is the case when $\rho$ is continuous, $D_\rho > 0$, and the set of all $Q$ over which the infimum is taken is tight³. Thus we only have to show tightness to prove the first part of the theorem. $\mathbb{E}_Q\rho \leq D_\rho$ implies that
$$D_\rho \geq \mathbb{E}_Q\rho \geq Q(K_k \times M_k^c) \inf_{x\in K_k,\, y\in M_k^c} \rho(x,y),$$
and thus
$$Q(K_k \times M_k) = P(K_k) - Q(K_k \times M_k^c) \geq P(K_k) - D_\rho\Big/\inf_{x\in K_k,\, y\in M_k^c}\rho(x,y) \to 1$$
as $k \to \infty$. Since the sets $K_k \times M_k$ are compact, this shows tightness and proves the first part of the theorem.

The proof of the second part adapts an argument from [17, Theorem 2.2]. Note that since $D_{\rho,\tilde{\rho}}(R_\rho(D_\rho) + r, D_\rho) < \infty$ for some $r > 0$ (and hence, by concavity, for all $r > 0$), for every $\varepsilon > 0$ and all $i \geq 1$ there exists $Q_i \in \mathcal{P}(\mathcal{X}\times\mathcal{Y})$ with $X$ marginal $P$ such that
$$I(Q_i) \leq R_\rho(D_\rho) + 1/i, \qquad \mathbb{E}_{Q_i}\rho \leq D_\rho, \qquad \mathbb{E}_{Q_i}\tilde{\rho} \geq D_{\rho,\tilde{\rho}}\big(R_\rho(D_\rho) + 1/i, D_\rho\big) - \varepsilon.$$
Since the set of all feasible distributions is tight, as shown above, $\{Q_i\}_{i\geq 1}$ contains a weakly convergent subsequence⁴, and we may assume without loss of generality that $Q_i \Rightarrow Q$ for some $Q \in \mathcal{P}(\mathcal{X}\times\mathcal{Y})$.
Using exactly the same argument as in [17, Theorem 2.2], we have
$$R_\rho(D_\rho) \geq \liminf_{i\to\infty} I(Q_i) \geq I(Q) \tag{21}$$
and
$$D_\rho \geq \liminf_{i\to\infty} \mathbb{E}_{Q_i}\rho \geq \mathbb{E}_Q\rho. \tag{22}$$
Finally, since $\tilde{\rho}$ is continuous, we have $\tilde{\rho}(X, Y_i) \Rightarrow \tilde{\rho}(X,Y)$, where $(X, Y_i) \sim Q_i$ and $(X,Y) \sim Q$. As
$$\sup_{i\geq 1} \mathbb{E}\,\tilde{\rho}(X, Y_i)^a \leq c + \sup_{i\geq 1} \mathbb{E}\rho(X, Y_i) \leq c + D_\rho < \infty,$$
$\{\tilde{\rho}(X, Y_i)\}_{i\geq 1}$ is uniformly integrable. Therefore, by [18, Theorem 3.5],
$$\lim_{i\to\infty} D_{\rho,\tilde{\rho}}\big(R_\rho(D_\rho) + 1/i, D_\rho\big) - \varepsilon \leq \lim_{i\to\infty} \mathbb{E}\tilde{\rho}(X, Y_i) = \mathbb{E}\tilde{\rho}(X,Y) = \mathbb{E}_Q\tilde{\rho}. \tag{23}$$
Since $\varepsilon > 0$ is arbitrary, (21), (22), and (23) imply that $D_{\rho,\tilde{\rho}}(R_\rho(D_\rho), D_\rho) \geq D_{\rho,\tilde{\rho}}(R_\rho(D_\rho)+, D_\rho)$. As $D_{\rho,\tilde{\rho}}(\cdot, D_\rho)$ is increasing, we also have $D_{\rho,\tilde{\rho}}(R_\rho(D_\rho), D_\rho) \leq D_{\rho,\tilde{\rho}}(R_\rho(D_\rho)+, D_\rho)$, concluding the proof of the second part of the theorem.

³ The set of distributions $\mathcal{Q} \subset \mathcal{P}(\mathcal{X}\times\mathcal{Y})$ is tight if there exist compact sets $A_k \subset \mathcal{X}\times\mathcal{Y}$ such that $\sup_{Q\in\mathcal{Q}} Q(A_k^c) \to 0$ as $k \to \infty$.

⁴ $Q_i$ converges weakly to $Q$ (denoted by $Q_i \Rightarrow Q$) if $\lim_{i\to\infty}\mathbb{E}_{Q_i} g = \mathbb{E}_Q g$ for all bounded and continuous functions $g : \mathcal{X}\times\mathcal{Y} \to \mathbb{R}$. An equivalent definition of $Q_i \Rightarrow Q$ is that $\liminf_{i\to\infty} Q_i(A) \geq Q(A)$ for all open sets $A \subset \mathcal{X}\times\mathcal{Y}$ (see [18, Theorem 2.1]). If $Z_i \sim Q_i$ and $Z \sim Q$, we write $Z_i \Rightarrow Z$ if $Q_i \Rightarrow Q$.

F. Proof of Theorem 6

Since $\mathcal{Q}_2(D_\rho, D_{\tilde{\rho}}) \subset \mathcal{Q}_1(D_\rho, D_{\tilde{\rho}})$, it is enough to show that for every $\delta > 0$
$$\inf_{Q\in\mathcal{Q}_1(D_\rho, D_{\tilde{\rho}})} I(Q) \geq \inf_{Q\in\mathcal{Q}_2(D_\rho + \delta, D_{\tilde{\rho}} - \delta)} I(Q).$$
For some $\nu > 0$, choose $Q \in \mathcal{Q}_1(D_\rho, D_{\tilde{\rho}})$ such that
$$I(Q) \leq \inf_{Q'\in\mathcal{Q}_1(D_\rho, D_{\tilde{\rho}})} I(Q') + \nu.$$
Fix $\varepsilon > 0$ and let $Z$ be uniformly distributed on $(-\varepsilon, \varepsilon)^m$ and independent of $X, Y$. Define $\widetilde{Y} \triangleq Y + Z$ and let $Q_\varepsilon$ be the distribution of $(X, \widetilde{Y})$ when $(X,Y) \sim Q$. Note that, by Assumption (iii), $Q_\varepsilon \ll \lambda_{\mathbb{R}^m\times\mathbb{R}^m}$ whenever $\varepsilon > 0$, and that $Q_0 = Q$.
By the data processing inequality,
$$I(Q_\varepsilon) \leq I(Q) \leq \inf_{Q'\in\mathcal{Q}_1(D_\rho, D_{\tilde{\rho}})} I(Q') + \nu. \tag{24}$$
We now show that $Q_\varepsilon \Rightarrow Q$ as $\varepsilon \to 0$ (i.e., that $Q_\varepsilon$ converges weakly⁴ to $Q$). For this, it suffices to show that for every open $G \subset \mathcal{X}\times\mathcal{Y}$ we have $\liminf_{\varepsilon\to 0} Q_\varepsilon(G) \geq Q(G)$. Define
$$G_\varepsilon \triangleq \{(x,y) \in \mathcal{X}\times\mathcal{Y} : (x, y+z) \in G\ \ \forall z \in \mathbb{R}^m \text{ with } \|z\|_\infty < \varepsilon\}.$$
Since $(X,Y) \in G_\varepsilon$ implies $(X, \widetilde{Y}) \in G$, we have $Q_\varepsilon(G) \geq Q(G_\varepsilon)$. Since $G$ is open, we have $\mathbb{1}_{G_\varepsilon} \to \mathbb{1}_G$ pointwise as $\varepsilon \to 0$, and hence, by Fatou's lemma,
$$\liminf_{\varepsilon\to 0} Q_\varepsilon(G) \geq \liminf_{\varepsilon\to 0} Q(G_\varepsilon) \geq Q(G).$$
Thus $Q_\varepsilon \Rightarrow Q$ as $\varepsilon \to 0$.

By continuity of $\tilde{\rho}$ (Assumption (i)), we get by weak convergence, for every $b \geq 0$,
$$\mathbb{E}_{Q_\varepsilon}\tilde{\rho} \geq \mathbb{E}_{Q_\varepsilon}\min\{\tilde{\rho}, b\} \to \mathbb{E}_Q\min\{\tilde{\rho}, b\} \tag{25}$$
as $\varepsilon \to 0$. Assuming $\mathbb{E}_Q\tilde{\rho} < \infty$, choose $b$ such that $\mathbb{E}_Q\min\{\tilde{\rho}, b\} \geq \mathbb{E}_Q\tilde{\rho} - \delta/2$. Then there exists $\varepsilon_1 > 0$ such that for $\varepsilon \leq \varepsilon_1$ we have, by (25),
$$\mathbb{E}_{Q_\varepsilon}\tilde{\rho} \geq \mathbb{E}_Q\min\{\tilde{\rho}, b\} - \delta/2 \geq \mathbb{E}_Q\tilde{\rho} - \delta \geq D_{\tilde{\rho}} - \delta. \tag{26}$$
Since $D_{\tilde{\rho}} < \infty$, this last conclusion follows by a similar argument if $\mathbb{E}_Q\tilde{\rho} = \infty$. Moreover, by Assumption (ii),
$$\mathbb{E}_{Q_\varepsilon}\rho = \mathbb{E}_{Q_\varepsilon}(\rho; A^c) + \mathbb{E}_{Q_\varepsilon}(\rho; A) \leq \mathbb{E}_{Q_\varepsilon}(\rho; A^c) + \mathbb{E}_Q\Big(\sup_{z:\|z\|_\infty\leq\varepsilon}\rho(X, Y+z); A\Big) \leq \mathbb{E}_{Q_\varepsilon}(\rho; A^c) + c\,\mathbb{E}_Q(\rho; A). \tag{27}$$
Now note that Assumption (ii) continues to hold as we increase $a$. Since $\mathbb{E}_Q\rho \leq D_\rho$, we have $\mathbb{E}_Q(\rho; A) \to 0$ as $a \to \infty$. Hence there exists an $a$ such that Assumption (ii) holds and $c\,\mathbb{E}_Q(\rho; A) \leq \delta/2$. For this $a$, we can continue (27) as
$$\mathbb{E}_{Q_\varepsilon}\rho \leq \mathbb{E}_{Q_\varepsilon}(\rho; A^c) + \delta/2 \leq \mathbb{E}_{Q_\varepsilon}\min\{\rho, a\} + \delta/2.$$
By continuity of $\rho$ (Assumption (i)) and weak convergence of $Q_\varepsilon$, $\mathbb{E}_{Q_\varepsilon}\min\{\rho, a\} \to \mathbb{E}_Q\min\{\rho, a\}$ as $\varepsilon \to 0$. Hence there exists $0 < \varepsilon_2 \leq \varepsilon_1$ such that for $0 < \varepsilon \leq \varepsilon_2$ we have
$$\mathbb{E}_{Q_\varepsilon}\rho \leq \mathbb{E}_Q\min\{\rho, a\} + \delta \leq \mathbb{E}_Q\rho + \delta \leq D_\rho + \delta.$$
(28)

Combining (24), (26), and (28), we obtain for $0 < \varepsilon \leq \varepsilon_2$ that $Q_\varepsilon \in \mathcal{Q}_2(D_\rho + \delta, D_{\tilde{\rho}} - \delta)$ and
$$I(Q_\varepsilon) \leq \inf_{Q\in\mathcal{Q}_1(D_\rho, D_{\tilde{\rho}})} I(Q) + \nu.$$
Since $\nu > 0$ is arbitrary, this shows that
$$\inf_{Q\in\mathcal{Q}_2(D_\rho + \delta, D_{\tilde{\rho}} - \delta)} I(Q) \leq \inf_{Q\in\mathcal{Q}_1(D_\rho, D_{\tilde{\rho}})} I(Q),$$
proving the first part of the theorem. The second part follows directly from continuity of $\inf_{Q\in\mathcal{Q}_2(D_\rho, D_{\tilde{\rho}})} I(Q)$.

G. Proof of Theorem 7

Let $Q \in \mathcal{Q}_2(D_\rho, D_{\tilde{\rho}})$ and $(X,Y) \sim Q$. Then $h(X)$ and $h(X|Y)$ are well defined, and since $h(X)$ is finite, we have $I(Q) = h(X) - h(X|Y)$. Therefore
$$I(Q) = h(X) - h(X|Y) = h(X) - h(X - Y|Y) \geq h(X) - h(X - Y) \geq h(X) - \sup_{Z:\ \mathbb{E}\rho(Z)\leq D_\rho,\ \mathbb{E}\tilde{\rho}(Z)\geq D_{\tilde{\rho}}} h(Z),$$
with equality if there exists $Z = X - Y$ independent of $Y$ and achieving the supremum. By [19, Theorem 3.2], if there exist $\alpha \in \mathbb{R}$ and $\eta, \tilde{\eta} \in \mathbb{R}_+$ such that $f : \mathbb{R}^m \to \mathbb{R}_+$ defined by
$$f(z) \triangleq \exp\big({-\alpha - \eta\,\rho(z) + \tilde{\eta}\,\tilde{\rho}(z)}\big)$$
satisfies
$$\int f(z)\,dz = 1, \qquad \int \rho(z)f(z)\,dz = D_\rho, \qquad \int \tilde{\rho}(z)f(z)\,dz = D_{\tilde{\rho}},$$
then $f$ is the density of the maximizing $Z$, and $h(Z) = \alpha + \eta D_\rho - \tilde{\eta} D_{\tilde{\rho}}$. Thus, in this case,
$$\inf_{Q\in\mathcal{Q}_2(D_\rho, D_{\tilde{\rho}})} I(Q) \geq \max\{0,\ h(X) - \alpha - \eta D_\rho + \tilde{\eta} D_{\tilde{\rho}}\}.$$

H. Proof of Corollary 8

Let $\varepsilon, \delta > 0$ be small enough such that
$$D_{\tilde{\Gamma}} < D_{\Gamma,\tilde{\Gamma}}\big(R, \{\Delta_\rho - \delta\}\big) - 2\varepsilon \quad \text{and} \quad \inf_{\rho\in\Gamma}\Delta_\rho > \delta. \tag{29}$$
For every $\rho \in \Gamma$, we have
$$D_{\tilde{\Gamma}} < D_{\Gamma,\tilde{\Gamma}}\big(R, \{\Delta_\rho - \delta\}\big) - 2\varepsilon \leq \sup_{\tilde{\rho}\in\tilde{\Gamma}} D_{\rho,\tilde{\rho}}\big(R, D_\rho(R) + \Delta_\rho - \delta\big) - 2\varepsilon \leq D_{\rho,\tilde{\rho}}\big(R, D_\rho(R) + \Delta_\rho - \delta\big) - \varepsilon$$
for some $\tilde{\rho} \in \tilde{\Gamma}$. By (29), $\Delta_\rho - \delta > 0$, and hence $D_{\rho,\tilde{\rho}}(\cdot, D_\rho(R) + \Delta_\rho - \delta)$ is continuous at $R$. Therefore, by choosing $\delta$ small enough, we have
$$D_{\rho,\tilde{\rho}}\big(R, D_\rho(R) + \Delta_\rho - \delta\big) - \varepsilon \leq D_{\rho,\tilde{\rho}}\big(R - \delta, D_\rho(R) + \Delta_\rho - \delta\big).$$
Hence, for this $\tilde{\rho}$, Theorem 1 guarantees the existence of a sequence of source codes $\{f_n\}_{n\geq 1}$ such that
$$\lim_{n\to\infty}\frac{1}{n}\log|f_n(\mathcal{X}^n)| \leq R, \qquad \limsup_{n\to\infty}\mathbb{E}\rho_n(X^n, f_n(X^n)) \leq D_\rho(R) + \Delta_\rho, \qquad \liminf_{n\to\infty}\mathbb{E}\tilde{\rho}_n(X^n, f_n(X^n)) \geq D_{\tilde{\Gamma}}.$$
This proves part a) of the corollary. Part b) follows similarly.

I. Proof of Corollary 9

Choose $\rho \in \Gamma$ such that
$$\sup_{\tilde{\rho}\in\tilde{\Gamma}} D_{\rho,\tilde{\rho}}\big(R, D_\rho(R) + \Delta_\rho\big) \leq D_{\Gamma,\tilde{\Gamma}}\big(R, \{\Delta_\rho\}\big) + \delta.$$
For any $\tilde{\rho} \in \tilde{\Gamma}$, we have by Theorem 2
$$\mathbb{E}\tilde{\rho}_n(X^n, f_n(X^n)) \leq D_{\rho,\tilde{\rho}}\big(R+, D_\rho(R) + \Delta_\rho\big). \tag{30}$$
Since $\Delta_\rho > 0$, $D_{\rho,\tilde{\rho}}(\cdot, D_\rho(R) + \Delta_\rho)$ is continuous at $R$. Hence
$$D_{\rho,\tilde{\rho}}\big(R+, D_\rho(R) + \Delta_\rho\big) = D_{\rho,\tilde{\rho}}\big(R, D_\rho(R) + \Delta_\rho\big) \leq \sup_{\tilde{\rho}\in\tilde{\Gamma}} D_{\rho,\tilde{\rho}}\big(R, D_\rho(R) + \Delta_\rho\big) \leq D_{\Gamma,\tilde{\Gamma}}\big(R, \{\Delta_\rho\}\big) + \delta.$$
This proves part a) of the corollary. Part b) follows similarly.

J. Proof of Proposition 10

$$D_{\rho_1,\rho_3}(R, D) = \sup_{\substack{Q\in\mathcal{P}(\mathcal{X}\times\mathcal{Y}):\\ Q_X = P,\ I(Q)\leq R,\ \mathbb{E}_Q\rho_1\leq D}} \mathbb{E}_Q\rho_3 \leq D_{\rho_1,\rho_2}(R, D) + \sup_{\substack{Q\in\mathcal{P}(\mathcal{X}\times\mathcal{Y}):\\ Q_X = P}} \mathbb{E}_Q(\rho_3 - \rho_2) \leq D_{\rho_1,\rho_2}(R, D) + \mathbb{E}_P\sup_{y\in\mathcal{Y}}\big(\rho_3(X,y) - \rho_2(X,y)\big).$$
And
$$D_{\rho_1,\rho_3}(R, D) = \sup_{\substack{Q\in\mathcal{P}(\mathcal{X}\times\mathcal{Y}):\\ Q_X = P,\ I(Q)\leq R,\ \mathbb{E}_Q\rho_1\leq D}} \mathbb{E}_Q\rho_3 \geq D_{\rho_1,\rho_2}(R, D) - \sup_{\substack{Q\in\mathcal{P}(\mathcal{X}\times\mathcal{Y}):\\ Q_X = P}} \mathbb{E}_Q|\rho_3 - \rho_2| \geq D_{\rho_1,\rho_2}(R, D) - \mathbb{E}_P\sup_{y\in\mathcal{Y}}|\rho_3(X,y) - \rho_2(X,y)|.$$

IV. CONCLUSION

In this paper, we investigated the problem of source coding with mismatched distortion measures. We derived a single-letter characterization $D_{\rho,\tilde{\rho}}(R, D_\rho)$ of the best distortion level with respect to $\tilde{\rho}$ that can be guaranteed for any source code of rate $R$ designed for distortion level $D_\rho$ with respect to $\rho$. We also derived a single-letter characterization $D_{\rho,\tilde{\rho}}(\infty, D_\rho)$ of the best distortion guarantee independent of the rate $R$ of the source code. We then looked at properties of $D_{\rho,\tilde{\rho}}(R, D_\rho)$, characterizing its behavior for $R > R_\rho(D_\rho)$ and on the boundary.
We finally considered the problem of choosing a representative $\rho \in \Gamma$ of $\tilde{\rho}$.

V. ACKNOWLEDGMENT

The authors would like to thank Ram Zamir for helpful discussions.

REFERENCES

[1] J. Stjernvall. Dominance—a relation between distortion measures. IEEE Transactions on Information Theory, 29(6):798–807, November 1983.
[2] A. Lapidoth. On the role of mismatch in rate distortion theory. IEEE Transactions on Information Theory, 43(1):38–47, January 1997.
[3] H. Yamamoto. A rate-distortion problem for a communication system with a secondary decoder to be hindered. IEEE Transactions on Information Theory, 33(4):835–842, July 1988.
[4] I. Csiszár and J. Körner. Information Theory: Coding Theorems for Discrete Memoryless Systems. Academic Press, 1981.
[5] A. Dembo and T. Weissman. The minimax distortion redundancy in noisy source coding. IEEE Transactions on Information Theory, 49(11):3020–3030, November 2003.
[6] N. Jayant, J. Johnston, and R. Safranek. Signal compression based on models of human perception. Proceedings of the IEEE, 81(10):1385–1422, October 1993.
[7] N. Pappas, R. J. Safranek, and J. Chen. Perceptual criteria for image quality evaluation. In A. C. Bovik, editor, Handbook of Image and Video Processing, pages 939–959. Academic Press, 2005.
[8] M. P. Eckert and A. P. Bradley. Perceptual quality metrics applied to still image compression. Signal Processing, 70:177–200, November 1998.
[9] H. de Ridder. Minkowski-metrics as a combination rule for digital-image-coding impairments. In SPIE Conference, volume 1666, pages 16–26, August 1992.
[10] J. Lubin. The use of psychophysical data and models in the analysis of display system performance. In B. Watson, editor, Digital Images and Human Vision, pages 163–178. MIT Press, 1993.
[11] A. B. Watson. DCT quantization matrices visually optimized for individual images.
In SPIE Conference, volume 1913, pages 202–216, February 1993.
[12] J. L. Mannos and D. J. Sakrison. The effects of a visual fidelity criterion on the encoding of images. IEEE Transactions on Information Theory, 20(4):525–536, July 1974.
[13] R. M. Gray, P. C. Cosman, and K. L. Oehler. Incorporating visual factors into vector quantizers for image compression. In B. Watson, editor, Digital Images and Human Vision, pages 35–52. MIT Press, 1993.
[14] R. G. Gallager. Information Theory and Reliable Communication. John Wiley & Sons, 1968.
[15] R. M. Gray. Entropy and Information Theory. Springer, 1990.
[16] D. G. Luenberger. Optimization by Vector Space Methods. Wiley, 1969.
[17] I. Csiszár. On an extremum problem of information theory. Studia Scientiarum Mathematicarum Hungarica, 9:57–71, 1974.
[18] P. Billingsley. Convergence of Probability Measures. Wiley Interscience, second edition, 1999.
[19] P. Ishwar and P. Moulin. On the existence and characterization of the maxent distribution under general moment inequality constraints. IEEE Transactions on Information Theory, 51(9):3322–3333, September 2005.