The Quadratic Gaussian Rate-Distortion Function for Source Uncorrelated Distortions

Milan S. Derpich, Jan Østergaard, and Graham C. Goodwin
The University of Newcastle, NSW, Australia
milan.derpich@studentmail.newcastle.edu.au, {jan.ostergaard,graham.goodwin}@newcastle.edu.au

Abstract

We characterize the rate-distortion function for zero-mean stationary Gaussian sources under the MSE fidelity criterion and subject to the additional constraint that the distortion is uncorrelated to the input. The solution is given by two equations coupled through a single scalar parameter. This has a structure similar to the well-known water-filling solution obtained without the uncorrelated-distortion restriction. Our results fully characterize the unique statistics of the optimal distortion. We also show that, for all positive distortions, the minimum achievable rate subject to the uncorrelation constraint is strictly larger than that given by the unconstrained rate-distortion function. This gap increases with the distortion, vanishing as the distortion tends to zero and growing without bound as the distortion tends to infinity.

1 Introduction

Many lossy source coding schemes have the property that the end-to-end reconstruction error is uncorrelated with the source. We refer to such schemes as uncorrelated distortion (UD) coders. As an example, consider a typical transform coder, as depicted in Fig. 1. Here, a random vector $X \in \mathbb{R}^N$ is first transformed by an analysis transform $\mathbf{T} \in \mathbb{R}^{N\times N}$ to yield $U = \mathbf{T}X$. Then $U$ is quantized, yielding the vector $\hat{U} = Q(U)$. The input signal is finally approximated by $Y = \tilde{\mathbf{T}}\hat{U}$, where $\tilde{\mathbf{T}} \in \mathbb{R}^{N\times N}$ is the synthesis transform, cf. [1, 2].

[Figure 1: Transform coder ($X \to \mathbf{T} \to U \to Q \to \hat{U} \to \tilde{\mathbf{T}} \to Y$).]

If the quantization error $E \triangleq \hat{U} - U$ is
uncorrelated to $U$, and if $\mathbf{T}\tilde{\mathbf{T}} = \mathbf{I}$, then it is easy to show that $Y - X$ is uncorrelated to $X$, thus yielding a UD coder. More generally, any quantization scheme satisfying the following two properties constitutes a UD coder: a) the error introduced by the quantizer is uncorrelated to its input; b) the linear processing (if any) before and after the quantizer yields perfect reconstruction (PR) in the absence of quantization errors. Property a) is satisfied in many cases, e.g., in high-resolution coding [3] or when a quantizer with dither (either subtractive [4] or non-subtractive [5]) is employed. On the other hand, the PR condition (property b)) is often imposed (sometimes implicitly) in the design of filter banks [6], transform coders [1, 2], and feedback quantizers [7, 8]. Thus, any PR source coder using, for example, subtractively dithered quantization, is a UD coder.

The rate-distortion performance of any UD coder can be compared to the underlying Shannon rate-distortion function $R(D)$ of the source, for a given distortion metric. One may question whether such a comparison is, in fact, fair. After all, the additional constraint that the end-to-end distortion is uncorrelated with the source is not imposed upon $R(D)$. With this in mind, let $R^\perp(D)$ denote the rate-distortion function with the additional constraint that the end-to-end distortion be uncorrelated to the source. (A formal definition of $R^\perp(D)$ is given in Section 2.) Clearly, $R^\perp(D) \ge R(D)$.¹ However, to the best of the authors' knowledge, the problem of characterizing $R^\perp(D)$ has not been formally addressed before. Therefore, questions such as in which cases (if any) $R^\perp(D)$ equals $R(D)$, and how $R^\perp(D)$ can be achieved, appear to be unanswered.
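Property a) can be checked empirically for a subtractively dithered uniform quantizer: with dither uniform over one quantization bin, the end-to-end error is uniform and statistically independent of the input, cf. [4]. The sketch below (the step size `delta` and sample count are arbitrary choices for the demonstration, not values from the paper) verifies that the empirical correlation between error and source vanishes:

```python
import numpy as np

rng = np.random.default_rng(0)
n, delta = 200_000, 0.5                       # sample size and quantizer step (arbitrary)

x = rng.normal(0.0, 1.0, n)                   # zero-mean Gaussian source
d = rng.uniform(-delta / 2, delta / 2, n)     # dither, uniform over one quantization bin

# Subtractively dithered uniform quantizer: quantize x + d, then subtract d.
y = delta * np.round((x + d) / delta) - d
e = y - x                                     # end-to-end reconstruction error

# The error is uniform on (-delta/2, delta/2) and uncorrelated with x.
print(abs(np.mean(e * x)))                    # empirical correlation, close to 0
print(np.var(e), delta**2 / 12)               # error variance, close to delta^2/12
```

This is exactly the mechanism by which a PR coder with subtractive dither becomes a UD coder.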
In this paper, we not only give conclusive answers to the above questions but, more importantly, we completely characterize $R^\perp(D)$ for the quadratic Gaussian case² as a lower bound for the rate achievable under the uncorrelated distortion constraint.³ We show, in Section 2, that $R^\perp(D)$ can be parameterized through a single scalar variable $\alpha > 0$. This result parallels the conventional water-filling equations that describe $R(D)$. We characterize the unique optimal statistics that the reconstruction error $Y - X$ needs to have in order to achieve $R^\perp(D)$, for a given Gaussian source $X$. In particular, we show that $Y - X$ must be Gaussian. In addition, we recast the results in a transform coding setting. More precisely, we show that if the quantization errors are Gaussian, independent both mutually and from the source, then the Karhunen-Loève Transform (KLT) is optimal among all perfect reconstruction transforms, at all rates.⁴ A comparative analysis between $R^\perp(D)$ and $R(D)$ is then presented in Section 3. There we show that $R^\perp(D)$ is convex and monotonically decreasing in $D$, and that $R^\perp(D) > R(D)$, $\forall D > 0$, the two functions converging in the limit as $R \to \infty$. Furthermore, we show that $R^\perp(D) \to 0 \Leftrightarrow D \to \infty$, which is different from the well-known result $R(D) = 0 \Leftrightarrow D \ge \sigma_X^2$.⁵ It is worth emphasizing that our results are not tied to any particular source coding architecture, but are general in the sense that any coding scheme in which the end-to-end distortion is uncorrelated with the source can do no better than $R^\perp(D)$.

Notation

We use uppercase letters to represent random vectors, adding a subscript when referring to one of their elements, i.e., $X_i$ is the $i$-th element of the random vector $X$. The expectation operator is denoted by $E[\cdot]$. Uppercase bold letters are used for matrices.
The positive-definite square root of a positive-definite matrix $\mathbf{M}$ is denoted by $\sqrt{\mathbf{M}}$. We write $|\mathbf{M}|$ and $\mathrm{tr}(\mathbf{M})$ for the determinant and the trace of a matrix $\mathbf{M}$, respectively. The probability density function (PDF) and covariance matrix of a random (column) vector $X$ are denoted respectively by $f_X$ and $\mathbf{K}_X \triangleq E[XX^T]$, where $X^T$ is the transpose of $X$. We write $\mathbf{K}_{X,Y}$ for the cross-covariance matrix between two random vectors $X$ and $Y$. The spectrum of a w.s.s. random process $Z$ with autocorrelation function $R_Z[k] \triangleq E[Z_i Z_{i+k}]$ is denoted by $S_Z(\omega) \triangleq \sum_{k=-\infty}^{\infty} R_Z[k]\, e^{-jk\omega}$, $\forall\omega \in [-\pi,\pi]$. The differential entropy and the differential entropy per dimension of an $N$-length random vector $X$ are denoted, respectively, by $h(X)$ and $\bar h(X) \triangleq \frac{1}{N}h(X)$. When $X$ is a random process, $\bar h(X) \triangleq \lim_{N\to\infty}\frac{1}{N}h(X_1, X_2, \ldots, X_N)$ denotes the differential entropy rate of $X$. We use $I(X;Y)$ and $\bar I(X;Y) \triangleq \frac{1}{N}I(X;Y)$ to refer, respectively, to the mutual information and the mutual information per dimension between two random vectors $X$ and $Y$. When $X$ and $Y$ are random processes, $\bar I(X;Y) \triangleq \lim_{N\to\infty}\frac{1}{N}I(X_1,\ldots,X_N; Y_1,\ldots,Y_N)$ denotes the mutual information rate between $X$ and $Y$. We write a.e. for "almost everywhere".

¹ Enforcing additional constraints can never increase the achievable rate region for a given $D$. Thus, since the achievable rate region is lower bounded by $R^\perp(D)$, we must have $R^\perp(D) \ge R(D)$.
² By quadratic Gaussian we refer to the case of Gaussian sources with the MSE fidelity criterion. Moreover, we restrict our attention to zero-mean sources.
³ A proof of achievability has recently been derived by the authors in [9].
⁴ The optimality of a KLT has previously been established by a number of authors in a variety of settings, cf. [1, 10, 11]. However, this appears to be the first time that this result is proven explicitly for $R^\perp(D)$.
⁵ Notice that, in the case of a vanishingly small positive coding rate, we cannot simply reconstruct the source using its statistical mean (as we would do in a traditional water-filling solution to $R(D)$), since this would lead to linear distortion, clearly correlated with $X$ (since it is a linear function of $X$).

2 Rate-Distortion Function with Uncorrelated Distortion

We begin by formalizing the definition of the quadratic rate-distortion function under the constraint that the end-to-end distortion be uncorrelated with the source. Then, in Section 2.1, we characterize this function for Gaussian random vectors, deferring the case of Gaussian stationary processes to Section 2.2.

Definition 1. The uncorrelated quadratic rate-distortion function $R^\perp(D)$ for a random vector (source) $X \in \mathbb{R}^N$ is defined as
$$R^\perp(D) \triangleq \min_{\substack{Y:\; E[X(Y-X)^T]=\mathbf{0},\\ \frac{1}{N}\mathrm{tr}(\mathbf{K}_{Y-X}) \le D,\; |\mathbf{K}_{Y-X}| > 0}} \bar I(X;Y), \qquad (1)$$
where $Y$ is an $N$-length random vector.

2.1 $R^\perp(D)$ for Gaussian Random Vector Sources

We now present one of the main results of this paper, namely that, for Gaussian vector sources, $R^\perp(D)$ is given by two equations linked through a single scalar parameter. This resembles the "water-filling" equations that describe $R(D)$. The proof of this result, which is presented in Theorem 1, makes use of the following lemma.

Lemma 1. Let $X \in \mathbb{R}^N$, $X \sim \mathcal{N}(\mathbf{0}, \mathbf{K}_X)$. Let $Z \in \mathbb{R}^N$ and $Z_G \in \mathbb{R}^N$ be two random vectors with zero mean and the same covariance matrix, i.e., $\mathbf{K}_Z = \mathbf{K}_{Z_G}$, and having the same cross-covariance matrix with respect to $X$, that is, $\mathbf{K}_{X,Z} = \mathbf{K}_{X,Z_G}$.
If $Z_G$ and $X$ are jointly Gaussian, and if $Z$ has any distribution, then
$$I(X; X+Z) \ge I(X; X+Z_G). \qquad (2)$$
If, furthermore, $|\mathbf{K}_{X+Z}| = |\mathbf{K}_{X+Z_G}| > 0$, then equality is achieved in (2) iff $Z \sim \mathcal{N}(\mathbf{0},\mathbf{K}_Z)$ with $Z$ and $X$ being jointly Gaussian.

Proof. Define $Y \triangleq X + Z$ and $Y_G \triangleq X + Z_G$. Then
$$\begin{aligned}
I(X;X+Z) - I(X;X+Z_G) &= h(X|Y_G) - h(X|Y) = h(Z_G|Y_G) - h(Z|Y)\\
&= -\iint f_{Z_G,Y_G}(z,y)\log f_{Z_G|Y_G}(z|y)\,dz\,dy + \iint f_{Z,Y}(z,y)\log f_{Z|Y}(z|y)\,dz\,dy\\
&\overset{(a)}{=} -\iint f_{Z,Y}(z,y)\log f_{Z_G|Y_G}(z|y)\,dz\,dy + \iint f_{Z,Y}(z,y)\log f_{Z|Y}(z|y)\,dz\,dy\\
&= \int f_Y(y)\int f_{Z|Y}(z|y)\log\frac{f_{Z|Y}(z|y)}{f_{Z_G|Y_G}(z|y)}\,dz\,dy = \int f_Y(y)\, D\!\left(f_{Z|Y=y}\,\middle\|\, f_{Z_G|Y_G=y}\right)dy \ge 0, \qquad (3)
\end{aligned}$$
where $D(f\|g)$ is the relative entropy (or Kullback-Leibler distance) between the two probability density functions $f$ and $g$. The equality $(a)$ follows from the fact that $\log f_{Z_G|Y_G}(z|y)$ is a quadratic form in $z$ and $y$, and from the fact that $\mathbf{K}_{Z,Y} = \mathbf{K}_{Z_G,Y_G}$. The inequality follows from the fact that $D(f\|g) \ge 0$, with equality iff $f = g$. Thus, equality is achieved iff $f_{Z_G|Y_G=y} = f_{Z|Y=y}$ for all $y$ such that $f_Y(y) > 0$. The proof is completed by noting that $|\mathbf{K}_{X+Z}| = |\mathbf{K}_{X+Z_G}| > 0$ implies $f_Y(y) > 0$ for all $y \in \mathbb{R}^N$. ∎

Remark 1. We note that the above lemma generalizes Lemma II.2 in [12] by relaxing the requirement that $f_{Z|X} = f_Z$ and $f_{Z_G|X} = f_{Z_G}$ to the requirement $\mathbf{K}_{X,Z} = \mathbf{K}_{X,Z_G}$.

We are now in a position to present the main result of this section:

Theorem 1. Let the source $X \in \mathbb{R}^N$ be a zero-mean Gaussian random vector with positive-definite covariance matrix $\mathbf{K}_X$, having eigenvalues $\{\lambda_k\}_{k=1}^N$.
Then:
(i) For any positive $D$,
$$R^\perp(D) = \frac{1}{N}\sum_{k=1}^N \log\frac{\sqrt{\lambda_k+\alpha} + \sqrt{\lambda_k}}{\sqrt{\alpha}}, \qquad (4)$$
where the scalar parameter $\alpha \in \mathbb{R}^+$ is such that
$$D = \frac{1}{2N}\sum_{k=1}^N\left(\sqrt{\lambda_k^2 + \lambda_k\alpha} - \lambda_k\right), \quad \forall D > 0. \qquad (5)$$
(ii) For each $D > 0$, the value of $\alpha$ that satisfies (5) is unique.
(iii) Let $Y$ satisfy $\mathbf{K}_{X,Y-X} = \mathbf{0}$, $\mathrm{tr}(\mathbf{K}_{Y-X})/N \le D$ and $|\mathbf{K}_{Y-X}| > 0$. Then $\bar I(X;Y) = R^\perp(D)$ iff $Z \triangleq (Y-X) \sim \mathcal{N}(\mathbf{0}, \mathbf{K}_{Z^\star})$, with
$$\mathbf{K}_{Z^\star} \triangleq \tfrac{1}{2}\sqrt{\mathbf{K}_X^2 + \alpha\mathbf{K}_X} - \tfrac{1}{2}\mathbf{K}_X, \qquad (6)$$
and where $\alpha$ satisfies (5).

Proof. Let $\mathcal{U}$ denote the set of all $N$-length random vectors uncorrelated to $X$, and define the sets $\mathcal{G}_D \subset \mathcal{B}_D \subset \mathcal{U}$ as
$$\mathcal{G}_D \triangleq \{Z \in \mathcal{B}_D : Z \sim \mathcal{N}(\mathbf{0},\mathbf{K}_Z)\}; \qquad \mathcal{B}_D \triangleq \{Z \in \mathcal{U} : \mathrm{tr}(\mathbf{K}_Z)/N \le D,\ |\mathbf{K}_Z| > 0\}. \qquad (7)$$
With the above definitions, (1) can be written as
$$R^\perp(D) = \min_{Z\in\mathcal{B}_D} \bar I(X;X+Z) \overset{(a)}{=} \min_{Z\in\mathcal{G}_D} \bar I(X;X+Z) \overset{(b)}{=} \min_{Z\in\mathcal{G}_D}\left[\bar h(X+Z) - \bar h(Z)\right] = \frac{1}{2N}\min_{Z\in\mathcal{G}_D}\left[\log|\mathbf{K}_X + \mathbf{K}_Z| - \log|\mathbf{K}_Z|\right] = \frac{1}{2N}\min_{Z\in\mathcal{G}_D}\log\left|\mathbf{K}_Z^{-1}\mathbf{K}_X + \mathbf{I}\right|, \qquad (8)$$
where $(a)$ follows directly from Lemma 1 and where $(b)$ holds since the definition of $\mathcal{B}_D$ (see (7)) guarantees that both $\bar h(X+Z)$ and $\bar h(Z)$ exist.

We now prove, by contradiction, that the minimizer of $\log|\mathbf{K}_Z^{-1}\mathbf{K}_X + \mathbf{I}|$ in $\mathcal{G}_D$, namely $Z^\star$, is such that $\mathrm{tr}(\mathbf{K}_{Z^\star})/N = D$. For this purpose, suppose that $b \triangleq ND/\mathrm{tr}(\mathbf{K}_{Z^\star}) > 1$, and let $\{\zeta_k\}_{k=1}^N$ be the eigenvalues of $\mathbf{K}_{Z^\star}^{-1}\mathbf{K}_X$. Let $Z' \in \mathcal{G}_D$ be a Gaussian random vector with covariance matrix $\mathbf{K}_{Z'} = b\,\mathbf{K}_{Z^\star}$. We then have that $\mathrm{tr}(\mathbf{K}_{Z'})/N = D$, and that
$$\log\left|\mathbf{K}_{Z^\star}^{-1}\mathbf{K}_X + \mathbf{I}\right| = \sum_{k=1}^N \log(\zeta_k + 1) > \sum_{k=1}^N \log\!\left(\frac{\zeta_k}{b} + 1\right) = \log\left|\mathbf{K}_{Z'}^{-1}\mathbf{K}_X + \mathbf{I}\right|, \qquad (9)$$
since $b > 1$ and because $\log(\cdot)$ is a strictly increasing function. Thus $Z^\star$ cannot be a minimizer of $\log|\mathbf{K}_Z^{-1}\mathbf{K}_X + \mathbf{I}|$ in $\mathcal{G}_D$ unless $\mathrm{tr}(\mathbf{K}_{Z^\star}) = ND$.

The minimizer of $\log(|\mathbf{K}_X + \mathbf{K}_Z|/|\mathbf{K}_Z|)$ subject to $\mathrm{tr}(\mathbf{K}_Z)/N = D$ can be found using a variational approach.
More precisely, the covariance matrix of the minimizer, $\mathbf{K}_{Z^\star}$, must necessarily be such that the derivative of the Lagrangian
$$\mathcal{L}(\mathbf{K}_Z) \triangleq \log|\mathbf{K}_X + \mathbf{K}_Z| - \log|\mathbf{K}_Z| + \beta\,\mathrm{tr}(\mathbf{K}_Z) \qquad (10)$$
with respect to $\mathbf{K}_Z$ is zero at $\mathbf{K}_Z = \mathbf{K}_{Z^\star}$, for some $\beta \in \mathbb{R}$, which is equivalent to the condition that the matrix differential satisfies $\partial\mathcal{L}(\mathbf{K}_Z) = 0$, $\forall\,\partial\mathbf{K}_Z$. Using the fact that $\partial\log|\mathbf{M}| = \mathrm{tr}(\mathbf{M}^{-1}\partial\mathbf{M})$ for any positive-definite matrix $\mathbf{M}$, the necessary condition for $Z^\star$ to be a minimizer takes the form
$$\begin{aligned}
\partial\mathcal{L}(\mathbf{K}_Z)\Big|_{\mathbf{K}_Z=\mathbf{K}_{Z^\star}} &= \mathrm{tr}\!\left[(\mathbf{K}_X + \mathbf{K}_{Z^\star})^{-1}\partial\mathbf{K}_Z\right] - \mathrm{tr}\!\left[\mathbf{K}_{Z^\star}^{-1}\partial\mathbf{K}_Z\right] + \beta\,\mathrm{tr}(\partial\mathbf{K}_Z) = 0, \quad \forall\,\partial\mathbf{K}_Z\\
&\iff \mathrm{tr}\!\left\{\left[(\mathbf{K}_X + \mathbf{K}_{Z^\star})^{-1} - \mathbf{K}_{Z^\star}^{-1} + \beta\mathbf{I}\right]\partial\mathbf{K}_Z\right\} = 0, \quad \forall\,\partial\mathbf{K}_Z\\
&\iff (\mathbf{K}_X + \mathbf{K}_{Z^\star})^{-1} - \mathbf{K}_{Z^\star}^{-1} + \beta\mathbf{I} = \mathbf{0}\\
&\iff \mathbf{K}_{Z^\star} = \pm\tfrac{1}{2}\sqrt{\mathbf{K}_X^2 + \tfrac{4}{\beta}\mathbf{K}_X} - \tfrac{1}{2}\mathbf{K}_X. \qquad (11)
\end{aligned}$$
The fact that $\mathbf{K}_{Z^\star}$ needs to be positive definite implies that $\beta > 0$ and that it is infeasible to have a negative sign before the square root in (11). This, together with the change of variable $\alpha \triangleq 4/\beta$, leads directly to (6), with $\alpha > 0$. On the other hand, the value of $\alpha$ must be such that the equality constraint $\mathrm{tr}(\mathbf{K}_{Z^\star})/N = D$ is satisfied. From (11), and applying Lemma 2 (see the Appendix), this requirement is equivalent to $D = \frac{1}{N}\mathrm{tr}(\mathbf{K}_{Z^\star}) = \frac{1}{2N}\sum_{k=1}^N\left(\sqrt{\lambda_k^2+\lambda_k\alpha} - \lambda_k\right)$, which proves (5). Similarly, (4) is obtained by substituting (11) into (8) and then applying Lemma 2, which yields
$$R^\perp(D) = \frac{1}{2N}\log\left|\mathbf{K}_{Z^\star}^{-1}(\mathbf{K}_X + \mathbf{K}_{Z^\star})\right| = \frac{1}{2N}\sum_{k=1}^N \log\frac{\sqrt{\lambda_k^2+\lambda_k\alpha}+\lambda_k}{\sqrt{\lambda_k^2+\lambda_k\alpha}-\lambda_k} = \frac{1}{2N}\sum_{k=1}^N \log\frac{\left(\sqrt{\lambda_k^2+\lambda_k\alpha}+\lambda_k\right)^2}{\lambda_k\alpha} = \frac{1}{N}\sum_{k=1}^N \log\frac{\sqrt{\lambda_k+\alpha}+\sqrt{\lambda_k}}{\sqrt{\alpha}}.$$
The uniqueness of $\alpha$ is easily verified by noting that the right-hand side of (5) is monotonically increasing in $\alpha$. Since $\alpha = 4/\beta$ is unique, it follows from (11) that the covariance matrix of $Z^\star = \arg\min_{Z\in\mathcal{B}_D} \bar I(X;X+Z)$ is unique,⁶ completing the proof. ∎
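Equations (4) and (5) are straightforward to evaluate numerically: since the right-hand side of (5) is monotonically increasing in $\alpha$ (the fact used to prove part (ii)), the parameter can be found by bisection for any target $D$. The sketch below computes $R^\perp(D)$ in nats; the eigenvalues are an arbitrary example, not values from the paper:

```python
import numpy as np

def distortion(alpha, lam):
    """Right-hand side of (5): D as a function of alpha, for eigenvalues lam."""
    return np.mean(np.sqrt(lam**2 + lam * alpha) - lam) / 2.0

def r_perp(D, lam, tol=1e-12):
    """R_perp(D) in nats via (4)-(5); alpha found by bisection on (5)."""
    lo, hi = 0.0, 1.0
    while distortion(hi, lam) < D:           # bracket the unique root of (5)
        hi *= 2.0
    while hi - lo > tol * max(hi, 1.0):      # bisection: RHS of (5) is increasing
        mid = (lo + hi) / 2.0
        lo, hi = (mid, hi) if distortion(mid, lam) < D else (lo, mid)
    alpha = (lo + hi) / 2.0
    # (4): R_perp = (1/N) sum_k log((sqrt(lam_k + alpha) + sqrt(lam_k)) / sqrt(alpha))
    rate = np.mean(np.log((np.sqrt(lam + alpha) + np.sqrt(lam)) / np.sqrt(alpha)))
    return rate, alpha

lam = np.array([4.0, 2.0, 1.0, 0.5])         # example eigenvalues of K_X (arbitrary)
for D in (0.1, 0.5, 2.0):
    rate, alpha = r_perp(D, lam)
    print(D, rate)                           # rate decreases as D grows
```

Substituting the computed $\alpha$ back into (5) recovers $D$ to machine precision, illustrating the uniqueness claimed in part (ii).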
Transform Coding Realization of $R^\perp(D)$: Closer examination of Lemma 2, when used in (6), suggests that, for a Gaussian source $X$, $R^\perp(D)$ can be achieved by the transform coding architecture shown in Fig. 1. More precisely, an end-to-end distortion having the optimal covariance matrix $\mathbf{K}_{Z^\star}$ given by (6) is obtained by choosing the transform $\mathbf{T}$ such that $\mathbf{T}\boldsymbol{\Lambda}\mathbf{T}^T = \mathbf{K}_X$, where $\boldsymbol{\Lambda} \triangleq \mathrm{diag}(\lambda_1,\ldots,\lambda_N)$ (i.e., the KLT for $X$), and by having a Gaussian random vector of quantization errors $E$ with $E[EE^T] = \frac{1}{2}\,\mathrm{diag}\!\left(\left\{\sqrt{\lambda_k^2+\lambda_k\alpha} - \lambda_k\right\}_{k=1}^N\right)$.⁷ Interestingly, here $E[EE^T]$ is not a scaled identity matrix, as is usually the case in KLT transform coding, cf. [13]. This discrepancy arises from the approximation $E[E_k^2] = c\,E[U_k^2]\,2^{-2b_k}$, commonly used to link the variance of $E_k$ to the bit-rate $b_k$ at which each $k$-th transform coefficient is quantized. In this expression, $c > 0$ is a constant that depends on the PDF of the source and on the type of quantizer. The well-known optimal bit allocation analyzed, e.g., in [13], is based upon this formula, and thus minimizes the total bit-rate $r \triangleq \frac{1}{2N}\sum_{k=1}^N \log_2\!\left(E[U_k^2]/E[E_k^2]\right) + \frac{1}{2}\log_2(c)$. On the other hand, the optimal quantization errors $E_k$ implied by Theorem 1 need to be Gaussian, their variances being such that the end-to-end mutual information $\bar I(X;Y) = \frac{1}{2N}\sum_{k=1}^N \log_2\!\left(E[U_k^2]/E[E_k^2] + 1\right)$ is minimized.⁸

⁶ This is in agreement with the fact that $\log(|\mathbf{K}_X + \mathbf{K}_Z|/|\mathbf{K}_Z|)$ is strictly convex in $\mathbf{K}_Z$ for $|\mathbf{K}_X| > 0$, as shown in [12, Lemma II.3], together with the fact that the set $\{\mathbf{K}_Z : \mathrm{tr}(\mathbf{K}_Z) \le D,\ |\mathbf{K}_Z| > 0\}$ is convex.
⁷ It is easy to show that these noise variances, namely $\sigma_{E_k}^2$, are such that the derivatives $\partial I(\hat U_k; V_k)/\partial\sigma_{E_k}^2$ are the same for all $k$.
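The claim that the KLT-domain noise variances $\sigma_{E_k}^2 = \frac{1}{2}\left(\sqrt{\lambda_k^2+\lambda_k\alpha} - \lambda_k\right)$ implied by (6) achieve $R^\perp(D)$ can be checked numerically: the per-dimension mutual information $\frac{1}{2N}\sum_k \log(1 + \lambda_k/\sigma_{E_k}^2)$ of the $N$ parallel Gaussian channels coincides exactly with the closed form (4). A sketch, with an arbitrary example choice of eigenvalues and $\alpha$ (the distortion $D$ then follows from (5)):

```python
import numpy as np

lam = np.array([4.0, 2.0, 1.0, 0.5])   # eigenvalues of K_X (arbitrary example)
alpha = 0.8                            # any alpha > 0; D is then fixed by (5)

# Optimal Gaussian noise variances in the KLT domain, from (6).
var_e = (np.sqrt(lam**2 + lam * alpha) - lam) / 2.0

# Per-dimension mutual information of the N parallel Gaussian channels.
mi = np.mean(np.log(1.0 + lam / var_e)) / 2.0

# Closed form (4) for the same alpha.
r_perp = np.mean(np.log((np.sqrt(lam + alpha) + np.sqrt(lam)) / np.sqrt(alpha)))

print(mi, r_perp)                      # the two expressions agree
```

The agreement is algebraic, not approximate: $1 + \lambda_k/\sigma_{E_k}^2 = \big(\sqrt{\lambda_k+\alpha}+\sqrt{\lambda_k}\big)^2/\alpha$ term by term.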
⁸ The mutual information per dimension between $X$ and $Y$ in this case equals the sum of the mutual informations between each pair $U_k$, $\hat U_k$, since all these scalars are mutually independent and $\mathbf{T}$ is invertible.

Thus, the difference in the optimal values for $\{E[E_k^2]\}_{k=1}^N$ obtained in each case is due to the fact that $r \ne \bar I(X;Y)$.

2.2 $R^\perp(D)$ for Gaussian Stationary Random Processes

The $R^\perp(D)$ function defined in (1) can be extended to random processes as follows:

Definition 2. The uncorrelated quadratic rate-distortion function $R^\perp(D)$ for a random process $X$ is defined as
$$R^\perp(D) = \min_{\substack{Y:\; E[X(Y-X)^T]=\mathbf{0},\\ \lim_{N\to\infty}\frac{1}{N}\mathrm{tr}(\mathbf{K}_{Y-X}) \le D,\; \lim_{N\to\infty}|\mathbf{K}_{Y-X}|^{1/N} > 0}} \bar I(X;Y), \qquad (12)$$
where $Y$ is a random process.

The $R^\perp(D)$ function for stationary Gaussian random processes can be derived from the results obtained in Section 2.1 by restricting attention to random vectors $X \in \mathbb{R}^N$ having a Toeplitz covariance matrix, and then letting $N \to \infty$. More precisely, we have the following result:

Theorem 2. Let the source $X$ be a Gaussian stationary random process with spectrum $S_X(\omega)$ such that $S_X(\omega) > 0$ a.e. on $[-\pi,\pi]$. Then:
(i) For any $D > 0$,
$$R^\perp(D) = \frac{1}{2\pi}\int_{-\pi}^{\pi} \log\!\left(\frac{\sqrt{S_X(\omega)+\alpha} + \sqrt{S_X(\omega)}}{\sqrt{\alpha}}\right)d\omega, \qquad (13)$$
where the scalar parameter $\alpha \in \mathbb{R}^+$ is such that
$$D = \frac{1}{4\pi}\int_{-\pi}^{\pi}\left(\sqrt{S_X(\omega)+\alpha} - \sqrt{S_X(\omega)}\right)\sqrt{S_X(\omega)}\,d\omega. \qquad (14)$$
(ii) For each $D > 0$, the value of $\alpha$ that satisfies (14) is unique.
(iii) Let $Y$ satisfy $E[X(Y-X)^T] = \mathbf{0}$, $\lim_{N\to\infty}\frac{1}{N}\mathrm{tr}(\mathbf{K}_{Y-X}) \le D$, $\lim_{N\to\infty}|\mathbf{K}_{Y-X}|^{1/N} > 0$. Then $\bar I(X;Y) = R^\perp(D)$ iff $Z \triangleq Y - X$ is a Gaussian stationary random process with spectrum
$$S_{Z^\star}(\omega) \triangleq \tfrac{1}{2}\left(\sqrt{S_X(\omega)+\alpha} - \sqrt{S_X(\omega)}\right)\sqrt{S_X(\omega)}, \quad \text{a.e. on } [-\pi,\pi]. \qquad (15)$$

Proof. Define, from the random processes $X$ and $Y$, the vectors $X^{(N)} \triangleq [X_1 \cdots X_N]^T$, $Y^{(N)} \triangleq [Y_1 \cdots Y_N]^T$, $N \in \mathbb{N}$. It is known that $\breve\lambda^{(N)} \ge \breve\lambda^{(N+1)}$, $\forall N \in \mathbb{N}$, where $\breve\lambda^{(N)}$ and $\breve\lambda^{(N+1)}$ are the smallest eigenvalues of $\mathbf{K}_{X^{(N)}}$ and $\mathbf{K}_{X^{(N+1)}}$, respectively (see, e.g., [14, Theorem 4.3.8]). This result, together with Lemma 3 in the Appendix, and the fact that $S_X(\omega) > 0$ a.e. on $[-\pi,\pi]$, implies that $|\mathbf{K}_{X^{(N)}}| > 0$ for all $N \in \mathbb{N}$. We can then apply Theorem 1 to each $X^{(N)}$, $\forall N \in \mathbb{N}$, obtaining
$$R^{\perp(N)}(D) \triangleq \frac{1}{N}\sum_{k=1}^N \log\!\left(\frac{\sqrt{\lambda_k^{(N)}+\alpha^{(N)}} + \sqrt{\lambda_k^{(N)}}}{\sqrt{\alpha^{(N)}}}\right), \quad \forall N \in \mathbb{N}, \qquad (16)$$
where $\alpha^{(N)}$ satisfies
$$D = \frac{1}{2N}\sum_{k=1}^N\left(\sqrt{\left[\lambda_k^{(N)}\right]^2 + \lambda_k^{(N)}\alpha^{(N)}} - \lambda_k^{(N)}\right), \quad \forall N \in \mathbb{N}, \qquad (17)$$
and where $\{\lambda_k^{(N)}\}_{k=1}^N$ denotes the set of eigenvalues of $\mathbf{K}_{X^{(N)}}$. From (6), the optimal distortion covariance matrix for $X^{(N)}$ is
$$\mathbf{K}_{Z^\star}^{(N)} \triangleq \tfrac{1}{2}\sqrt{\mathbf{K}_{X^{(N)}}^2 + \alpha^{(N)}\mathbf{K}_{X^{(N)}}} - \tfrac{1}{2}\mathbf{K}_{X^{(N)}}. \qquad (18)$$
Direct application of Lemma 3 (see the Appendix) to (16) and (17) yields
$$R^\perp(D) = \lim_{N\to\infty} R^{\perp(N)}(D) = \frac{1}{2\pi}\int_{-\pi}^{\pi}\log\!\left(\frac{\sqrt{S_X(\omega)+\alpha} + \sqrt{S_X(\omega)}}{\sqrt{\alpha}}\right)d\omega, \qquad (19)$$
wherein $\alpha \triangleq \lim_{N\to\infty}\alpha^{(N)}$ is the only scalar that satisfies
$$D = \frac{1}{4\pi}\int_{-\pi}^{\pi}\left(\sqrt{S_X(\omega)+\alpha} - \sqrt{S_X(\omega)}\right)\sqrt{S_X(\omega)}\,d\omega. \qquad (20)$$
Similarly, applying Lemma 3 to (18) we obtain (15). This completes the proof. ∎

Remark 2. It is interesting to note that the equations characterizing the optimal UD feedback converters derived in [8] achieve an end-to-end distortion whose spectrum is given precisely by (15). Furthermore, it is easy to show that such converters would achieve the $R^\perp(D)$ function if the noise due to the scalar quantization within the feedback loop were white Gaussian noise uncorrelated with the input.

3 Comparison with $R(D)$

The next theorem shows that $R^\perp(D)$ shares strict monotonicity and convexity with $R(D)$, but deviates from $R(D)$ in the asymptotic limit of large distortions.

Theorem 3.
For any Gaussian random vector (stationary random process) $X$ with positive-definite covariance matrix $\mathbf{K}_X$ (with $S_X(\omega) > 0$ a.e. on $[-\pi,\pi]$), the function $R^\perp(D)$ is monotonically decreasing and convex. In addition, $D \to \infty \iff R^\perp \to 0$, and $D \to 0 \iff R^\perp \to \infty$.

Proof. We present here only the proof for the case of Gaussian random vectors. The proof for Gaussian stationary processes proceeds in an analogous fashion.

Monotonicity: We have that $\frac{\partial R^\perp}{\partial D} = \frac{\partial R^\perp}{\partial\alpha}\big/\frac{\partial D}{\partial\alpha}$, provided that $\frac{\partial R^\perp}{\partial\alpha}$ and $\frac{\partial D}{\partial\alpha}$ exist and that the latter derivative is non-zero. From (4), we obtain
$$\frac{\partial R^\perp}{\partial\alpha} = \frac{1}{N}\sum_{k=1}^N \frac{\partial}{\partial\alpha}\left[\log\!\left(\sqrt{\lambda_k+\alpha} + \sqrt{\lambda_k}\right) - \tfrac{1}{2}\log\alpha\right] = \frac{1}{2N}\sum_{k=1}^N\left[\frac{\sqrt{\lambda_k+\alpha} - \sqrt{\lambda_k}}{\alpha\sqrt{\lambda_k+\alpha}} - \frac{1}{\alpha}\right] = -\frac{1}{2N\alpha}\sum_{k=1}^N \frac{\sqrt{\lambda_k}}{\sqrt{\lambda_k+\alpha}}. \qquad (21)$$
On the other hand, from (5),
$$\frac{\partial D}{\partial\alpha} = \frac{1}{2N}\sum_{k=1}^N \frac{\partial}{\partial\alpha}\left(\sqrt{\lambda_k^2+\lambda_k\alpha} - \lambda_k\right) = \frac{1}{4N}\sum_{k=1}^N \frac{\lambda_k}{\sqrt{\lambda_k^2+\lambda_k\alpha}} = \frac{1}{4N}\sum_{k=1}^N \frac{\sqrt{\lambda_k}}{\sqrt{\lambda_k+\alpha}},$$
and thus
$$\frac{\partial R^\perp(D)}{\partial D} = -\frac{2}{\alpha}, \quad \forall D > 0, \qquad (22)$$
proving that $R^\perp(\cdot)$ is a strictly decreasing function (since $\alpha > 0$).

Convexity: The fact that $\alpha$ grows monotonically with increasing $D$, together with (22), implies that
$$\frac{\partial R^\perp(D)}{\partial D}\bigg|_{D=D_1} > \frac{\partial R^\perp(D)}{\partial D}\bigg|_{D=D_2} \iff D_1 > D_2,$$
and thus $R^\perp(\cdot)$ is convex.

Limits: It is clear from (4) that $\lim_{\alpha\to\infty} R^\perp = 0$ and $\lim_{\alpha\to 0} R^\perp = \infty$. Since, as can be seen from (21), $R^\perp$ decreases monotonically with increasing $\alpha$, $\forall\alpha \in (0,\infty)$, it follows that $R^\perp \to 0 \iff \alpha \to \infty$, and that $R^\perp \to \infty \iff \alpha \to 0$. Similarly, it follows from (5) and the monotonicity of $D$ with respect to $\alpha$ that, for fixed $\{\lambda_k\}_{k=1}^N$, $D \to \infty \iff \alpha \to \infty$ and $D \to 0 \iff \alpha \to 0$. We then have that $D \to \infty \iff R^\perp \to 0$ and $D \to 0 \iff R^\perp \to \infty$, completing the proof. ∎

We next show that $R^\perp(D) > R(D)$ for all $D > 0$, the two functions converging asymptotically as $D \to 0$.

Theorem 4. For any Gaussian random vector (stationary random process) $X$ with positive-definite covariance matrix $\mathbf{K}_X$ (with $S_X(\omega) > 0$ a.e.
on $[-\pi,\pi]$), the following holds:
$$\text{(i)}\;\; R^\perp(D) - R(D) > 0, \quad \forall D > 0; \qquad \text{(ii)}\;\; \lim_{D\to 0}\left[R^\perp(D) - R(D)\right] = 0. \qquad (23)$$

Proof. We present here only the proof for the case of Gaussian random vectors. The proof for Gaussian stationary processes proceeds in an analogous fashion. Recall that for a Gaussian random vector $X$ having positive-definite covariance matrix with eigenvalues $\{\lambda_k\}_{k=1}^N$ one has $R(D) \ge \frac{1}{2N}\sum_{k=1}^N \log(\lambda_k/D)$, with equality iff $D \le \min_k\{\lambda_k\}_{k=1}^N$. As a consequence,
$$R^\perp(D) - R(D) \le R^\perp(D) - \frac{1}{2N}\sum_{k=1}^N \log(\lambda_k/D) \qquad (24)$$
$$\overset{(a)}{=} \frac{1}{2N}\sum_{k=1}^N \log\!\left[\frac{\left(\sqrt{\lambda_k+\alpha}+\sqrt{\lambda_k}\right)^2}{\alpha}\cdot\frac{1}{2N\lambda_k}\sum_{i=1}^N\left(\sqrt{\lambda_i^2+\lambda_i\alpha} - \lambda_i\right)\right] = \frac{1}{N}\sum_{k=1}^N \log\frac{\sqrt{\lambda_k+\alpha}+\sqrt{\lambda_k}}{\sqrt{\lambda_k}} + \frac{1}{2}\log\!\left(\frac{1}{2N}\sum_{i=1}^N \frac{\lambda_i}{\sqrt{\lambda_i^2+\lambda_i\alpha}+\lambda_i}\right). \qquad (25)$$
Equality $(a)$ is obtained by substituting (4) and (5) into the right-hand side of (24). The validity of (23)-(ii) then follows directly by taking the limit of the right-hand side of (25) as $\alpha \to 0$.

In order to prove (23)-(i), we will show that $D^\perp(R) > D(R)$, where the function $D^\perp(\cdot)$ is the inverse of $R^\perp(\cdot)$ and $D(R)$ is Shannon's distortion-rate function. For this purpose, consider the random vector $Y' \triangleq \mathbf{W}(X + Z^\star)$, where $\mathbf{W} \in \mathbb{R}^{N\times N}$ is a symmetric, positive-definite matrix, and $Z^\star$ is as in Theorem 1-(iii). Notice that $Y' - X$ and $X$ are not uncorrelated unless $\mathbf{W} = \mathbf{I}$. Writing $Y \triangleq X + Z^\star$, the mutual information per dimension between $Y'$ and $X$ is given by
$$\bar I(X;Y') = \bar h(\mathbf{W}Y) - \bar h(\mathbf{W}Y|X) = \bar h(\mathbf{W}Y) - \bar h(\mathbf{W}X + \mathbf{W}Z^\star|X) = \bar h(\mathbf{W}Y) - \bar h(\mathbf{W}Z^\star) = \bar h(Y) - \bar h(Z^\star) = \bar I(X;Y).$$
Thus, for any positive-definite $\mathbf{W}$, $\bar I(X;Y') = R^\perp(D)$. We next show that for any $D$ (and corresponding $\mathbf{K}_{Z^\star}$), choosing an optimal matrix $\mathbf{W}$ yields a $Y'$ whose distortion $\frac{1}{N}\mathrm{tr}(\mathbf{K}_{Y'-X})$ is strictly smaller than $D$.
It is easy to show that $\mathrm{tr}(\mathbf{K}_{Y'-X})$ is minimized by choosing $\mathbf{W} = \mathbf{W}^\star$, where $\mathbf{W}^\star \triangleq (\mathbf{K}_X + \mathbf{K}_{Z^\star})^{-1}\mathbf{K}_X$ is the Wiener filter (matrix) for $X + Z^\star$. From this equation, we define the function $D'(R) \triangleq \frac{1}{N}\mathrm{tr}(\mathbf{K}_{\mathbf{W}^\star(X+Z^\star)-X})$, describing the distortion associated with $Y' = \mathbf{W}^\star(X + Z^\star)$, with the covariance matrix of $Z^\star$ as in (6) when $R^\perp = R$. Since $D(R) \le D'(R)$, we obtain, from applying Lemma 2, and after some algebraic manipulation, that
$$\frac{D^\perp(R)}{D(R)} \ge \frac{D^\perp(R)}{D'(R)} = \frac{\mathrm{tr}[\mathbf{K}_{Z^\star}]}{\mathrm{tr}\!\left[(\mathbf{K}_X + \mathbf{K}_{Z^\star})^{-1}\mathbf{K}_{Z^\star}\mathbf{K}_X\right]} = \frac{\alpha}{2}\cdot\frac{\sum_{k=1}^N\left(\sqrt{\lambda_k^2+\lambda_k\alpha} - \lambda_k\right)}{\sum_{k=1}^N\left(\sqrt{\lambda_k^2+\lambda_k\alpha} - \lambda_k\right)^2} > 1, \qquad (26)$$
where the last inequality follows from the fact that $\alpha/2 > \sqrt{\lambda_k^2+\lambda_k\alpha} - \lambda_k$, $\forall\alpha > 0$. Finally, the fact that both $R(D)$ and $R^\perp(D)$ are monotonically decreasing functions, together with (26), implies (23)-(i). This completes the proof. ∎

The summation term in (25) describes exactly the rate loss $R_L(D) \triangleq R^\perp(D) - R(D)$ for all $D \le \min_k\{\lambda_k\}_{k=1}^N$. For $D$ within this range, it can be shown, using Chebyshev's sum inequality, that $\partial R_L(D)/\partial\alpha > 0$, which implies that $R_L(D)$ increases monotonically with increasing $D$. On the other hand, for all $D > 0$, the ratio $D^\perp(R)/D(R)$ can be lower bounded by (26). It can be shown that this bound increases with $\alpha$ (and thus with $D$ as well), tending to $\infty$ as $\alpha \to \infty$, which is in agreement with Theorem 3.

4 Concluding Remarks

In this work we have completely characterized $R^\perp(D)$, the quadratic Gaussian rate-distortion function subject to the constraint that the end-to-end distortion be uncorrelated with the source. We have further proved that this function shares convexity and monotonicity with Shannon's rate-distortion function $R(D)$, but $R^\perp(D)$ is positively bounded away from the latter, converging to $R(D)$ only in the limit as the distortion tends to zero.
We showed that the uncorrelation constraint causes the distortion to grow unboundedly as the rate tends to zero. We also discussed the achievability of $R^\perp(D)$ for random vectors and stationary random processes through transform coding and feedback quantization architectures.

5 Appendix

Lemma 2 (adapted from Corollary 11.1.2 in [15]). Let
$$\mathbf{A} = \mathbf{Q}\,\mathrm{diag}(\lambda_1,\ldots,\lambda_N)\,\mathbf{Q}^{-1} = \sum_{k=1}^N \lambda_k\,\mathbf{Q}_{:,k}\,\mathbf{Q}^{-1}_{k,:},$$
with $\mathbf{Q} \in \mathbb{C}^{N\times N}$, and where $\mathbf{Q}_{:,k}$ and $\mathbf{Q}^{-1}_{k,:}$ denote the $k$-th column of $\mathbf{Q}$ and the $k$-th row of $\mathbf{Q}^{-1}$, respectively. If $f(\cdot)$ is analytic in a neighbourhood of each $\lambda_k$, $k = 1,\ldots,N$, then
$$f(\mathbf{A}) = \mathbf{Q}\,\mathrm{diag}(f(\lambda_1),\ldots,f(\lambda_N))\,\mathbf{Q}^{-1} = \sum_{k=1}^N f(\lambda_k)\,\mathbf{Q}_{:,k}\,\mathbf{Q}^{-1}_{k,:}. \qquad (27)$$

Lemma 3 (Theorem 4.5.2 in [16]). Let $\mathbf{A}_\infty$ be an infinite Toeplitz matrix with entry $a_k \in \mathbb{R}$ on the $k$-th diagonal. Then the eigenvalues of $\mathbf{A}_\infty$ are contained in the interval $m \le \lambda \le M$, where $m$ and $M$ denote the essential infimum and supremum, respectively, of the function $f(\omega) \triangleq \sum_{k=-\infty}^{\infty} a_k e^{-jk\omega}$. Moreover, if both $m$ and $M$ are finite and $G(\lambda)$ is any continuous function of $\lambda \in [m,M]$, then
$$\lim_{N\to\infty}\frac{1}{N}\sum_{k=1}^N G\!\left(\lambda_k^{(N)}\right) = \frac{1}{2\pi}\int_{-\pi}^{\pi} G[f(\omega)]\,d\omega, \qquad (28)$$
where the $\lambda_k^{(N)}$ are the eigenvalues of the $N\times N$ sub-matrix $\mathbf{A}^{(N)} \in \mathbb{R}^{N\times N}$ of $\mathbf{A}_\infty$ centered about its main diagonal.

References

[1] A. Gersho and R. M. Gray, Vector Quantization and Signal Compression. Kluwer Academic Publishers, 2001.
[2] V. K. Goyal, "Theoretical foundations of transform coding," IEEE Signal Processing Magazine, vol. 18, no. 5, pp. 9-21, September 2001.
[3] D. Marco and D. L. Neuhoff, "The validity of the additive noise model for uniform scalar quantizers," IEEE Trans. Inform. Theory, vol. 51, no. 5, pp. 1739-1755, May 2005.
[4] R. Zamir and M. Feder, "On lattice quantization noise," IEEE Trans. Inform. Theory, vol. 42, no. 4, pp. 1152-1159, July 1996.
[5] R. M. Gray and T. G. Stockham, "Dithered quantizers," IEEE Trans. Inform. Theory, vol. 39, no. 3, pp. 805-812, May 1993.
[6] P. P. Vaidyanathan, Multirate Systems and Filter Banks. Englewood Cliffs, New Jersey: Prentice-Hall, 1993.
[7] S. R. Norsworthy, R. Schreier, and G. C. Temes, Eds., Delta-Sigma Data Converters: Theory, Design and Simulation. Piscataway, N.J.: IEEE Press, 1997.
[8] M. S. Derpich, E. Silva, D. E. Quevedo, and G. C. Goodwin, "On optimal perfect reconstruction feedback quantizers," submitted to IEEE Trans. Signal Processing.
[9] M. Derpich, J. Østergaard, and D. Quevedo, "Achieving the quadratic Gaussian rate-distortion function for source uncorrelated distortions," submitted to ISIT 2008 (available at http://arXiv.org).
[10] J. J. Y. Huang and P. M. Schultheiss, "Block quantization of correlated Gaussian random variables," IEEE Trans. Commun. Syst., vol. COM-11, pp. 289-296, September 1963.
[11] V. K. Goyal, J. Zhuang, and M. Vetterli, "Transform coding with backward adaptive updates," IEEE Trans. Inform. Theory, no. 4, pp. 1623-1633, July 2000.
[12] S. N. Diggavi and T. M. Cover, "The worst additive noise under a covariance constraint," IEEE Trans. Inform. Theory, vol. 47, pp. 3072-3081, 2001.
[13] N. Jayant and P. Noll, Digital Coding of Waveforms: Principles and Applications to Speech and Video. Englewood Cliffs, NJ: Prentice-Hall, 1984.
[14] R. A. Horn and C. R. Johnson, Matrix Analysis. Cambridge, UK: Cambridge University Press, 1985.
[15] G. H. Golub and C. F. Van Loan, Matrix Computations, 3rd ed. Baltimore, Maryland: The Johns Hopkins University Press, 1996.
[16] T. Berger, Rate Distortion Theory: A Mathematical Basis for Data Compression. Englewood Cliffs, N.J.: Prentice-Hall, 1971.