Communication Requirements for Generating Correlated Random Variables
Authors: Paul Cuff (Stanford University)
Abstract: Two familiar notions of correlation are rediscovered as extreme operating points for simulating a discrete memoryless channel, in which a channel output is generated based only on a description of the channel input. Wyner's "common information" coincides with the minimum description rate needed. However, when common randomness independent of the input is available, the necessary description rate reduces to Shannon's mutual information. This work characterizes the optimal tradeoff between the amount of common randomness used and the required rate of description.

I. INTRODUCTION

What is the intrinsic connection between correlated random variables? How much interaction is necessary to create correlation?

Many fruitful efforts have been made to quantify correlation between two random variables. Each quantity is justified by the operational questions that it answers. Covariance dictates the mean squared error in linear estimation. Shannon's mutual information is the descriptive savings from side information in lossless source coding and the additional growth rate of wealth due to side information in investing. Gács and Körner's common information [1] is the number of common random bits that can be extracted from correlated random variables; it is less than mutual information. Wyner's common information [2] is the number of common random bits needed to generate correlated random variables; it is greater than mutual information.

This work provides a fresh look at two of these quantities: mutual information and Wyner's common information (herein simply "common information"). Both are extreme points of the channel simulation problem, introduced as follows. An observer (encoder) of an i.i.d. source $X_1, X_2, \ldots$ describes the sequence to a distant random number generator (decoder) that produces $Y_1, Y_2, \ldots$ (see Figure 1). What is the minimum rate of description needed to achieve a joint distribution that is statistically indistinguishable (as measured by total variation) from the distribution induced by passing the source through a memoryless channel?

[Figure 1: A discrete memoryless channel $q(y|x)$ is simulated by two separate processors. The first processor, $F$, observes $X^n$ and sends a message of $nR$ bits to the second processor, $G$, which generates $Y^n$. The minimum rate needed is the common information of $X$ and $Y$.]

Channel simulation is a form of random number generation. The variables $X^n$ come from an external source, and $Y^n$ is generated to be correlated with $X^n$. The channel simulation is successful if the total variation between the resulting distribution of $(X^n, Y^n)$ and the i.i.d. distribution that would result from passing $X^n$ through a memoryless channel is small. This is a strong requirement; it is stricter than requiring that $(X^n, Y^n)$ be jointly typical, as in the coordinated action work of Cover and Permuter [3].
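As a concrete illustration (ours, not the paper's), the following minimal Python sketch computes the total variation distance, defined in Section II as half the L1 distance, between a hypothetical target joint distribution and a slightly-off simulator output. Both distributions are made up for illustration.

```python
import numpy as np

def total_variation(p: np.ndarray, q: np.ndarray) -> float:
    """Total variation distance: half the L1 distance between two pmfs."""
    return 0.5 * np.abs(p - q).sum()

# Hypothetical target: X ~ Bernoulli(1/2) passed through a BSC(0.1),
# giving the joint pmf p(x, y).
target = np.array([[0.45, 0.05],
                   [0.05, 0.45]])

# A hypothetical simulator output that is slightly off.
simulated = np.array([[0.44, 0.06],
                      [0.07, 0.43]])

print(total_variation(target.ravel(), simulated.ravel()))  # 0.03
```

A statistician handed samples from `simulated` instead of `target` can distinguish the two with advantage at most the total variation, which is the sense in which a successful simulation is "virtually useless" to test against.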
This total variation requirement means that any hypothesis test a statistician might devise to determine whether $X^n$ was passed through a real memoryless channel or through the channel simulator will be virtually useless.

Wyner's result implies that in order to generate $X^n$ and $Y^n$ separately as an i.i.d. source pair, they must share bits at a rate of at least the common information $C(X;Y)$ of the joint distribution. In the channel simulation problem these shared bits come in the form of the description of $X^n$. (To achieve channel simulation with a rate as low as the common information, one must change Wyner's relative entropy requirement in [2] to the total variation requirement used in this work.) However, the "reverse Shannon theorem" of Bennett and Shor [4] suggests that a description rate equal to the mutual information $I(X;Y)$ of the joint distribution is all that is needed to successfully simulate a channel. How can we resolve this apparent contradiction?

The work of Bennett and Shor assumes that common random bits, or common randomness, independent of the source $X^n$ are available to the encoder and decoder. In that setting, the common randomness provides a second connection between the source $X^n$ and the output $Y^n$, in addition to the description of $X^n$. Remarkably, even though it is independent of the source $X^n$, the common randomness assists in generating correlated random numbers and allows for description rates smaller than the common information $C(X;Y)$.

[Figure 2: A discrete memoryless channel $q(y|x)$ is simulated by two separate processors. The first processor, $F$, observes $X^n$ and common randomness, independent of $X^n$, at rate $R_2$. The second processor, $G$, generates $Y^n$ based on the common randomness and a message of $nR_1$ bits from $F$.]

In this work we characterize the tradeoff between the rate of available common randomness and the required description rate for simulating a discrete memoryless channel with a fixed input distribution, as in Figure 2. Indeed, the tradeoff region of Section III confirms the two extreme cases: if the encoder and decoder are provided with enough common randomness, sending $I(X;Y)$ bits per symbol suffices, while in the absence of common randomness one must spend $C(X;Y)$ bits per symbol.

This result has implications in cooperative game theory, reminiscent of the framework investigated in [5]. Suppose a team shares the same payoff in a repeated game setting. An opponent tries to anticipate and exploit patterns in the team's combined actions, but a secure line of communication is available to help the team coordinate. Of course, each player could communicate his randomized actions to the other players, but this is an excessive use of communication. A memoryless channel is a useful way to coordinate their random actions. Accordingly, common information is found in Section VII to be the significant quantity in this setting.
II. PRELIMINARIES AND PROBLEM DEFINITION

A. Notation

We represent random variables as capital letters, $X$, and their alphabets in script, $\mathcal{X}$. Sequences $X_1, \ldots, X_n$ are indicated with a superscript, $X^n$. Distribution functions $p_X(x)$ are usually abbreviated as $p(x)$ when there is no confusion. Accented variables, such as $\hat{X}$, indicate different variables for each accent, but their alphabets are all the same, $\mathcal{X}$. Similarly, distribution functions written with an accent or a different letter, such as $p(x)$ versus $\hat{p}(x)$, represent different distributions. Markov chains, satisfying $p(x,y,z) = p(x,y)\,p(z|y)$, are represented with dashes, $X - Y - Z$.

(Wyner's) common information:
$$C(X;Y) \triangleq \min_{X - U - Y} I(X,Y;U).$$

Conditional common information:
$$C(X;Y|W) \triangleq \min_{X - (U,W) - Y} I(X,Y;U|W).$$

Total variation distance:
$$\|p - q\|_1 \triangleq \frac{1}{2} \sum_x |p(x) - q(x)|.$$

B. Problem Specific Definitions

A source $X^n$ is distributed i.i.d. according to $\check{p}(x)$. A description of the source at rate $R_1$ is represented by $I \in \{1, \ldots, 2^{nR_1}\}$. A random variable $J$, uniformly distributed on $\{1, \ldots, 2^{nR_2}\}$ and independent of $X^n$, represents the common random bits at rate $R_2$ known at both the encoder and decoder. The decoder generates a channel output $Y^n$ based only on $I$ and $J$. The channel being simulated has conditional distribution $q(y|x)$; thus the desired joint distribution is $\check{p}(x)\,q(y|x)$.

Definition 1: A $(2^{nR_1}, 2^{nR_2}, n)$ channel simulation code consists of a randomized encoding function,
$$F_n : \mathcal{X}^n \times \{1, \ldots, 2^{nR_2}\} \to \{1, \ldots, 2^{nR_1}\},$$
and a randomized decoding function,
$$G_n : \{1, \ldots, 2^{nR_1}\} \times \{1, \ldots, 2^{nR_2}\} \to \mathcal{Y}^n.$$
The description $I$ equals $F_n(X^n, J)$, and the channel output $Y^n$ equals $G_n(I, J)$.

Since randomized functions are specified by conditional probability distributions, it is equivalent to say that a $(2^{nR_1}, 2^{nR_2}, n)$ channel simulation code consists of a conditional probability mass function $p(i, y^n | x^n, j)$ with the properties that $p(y^n | i, j, x^n) = p(y^n | i, j)$, $|\mathcal{I}| = 2^{nR_1}$, and $|\mathcal{J}| = 2^{nR_2}$.

The induced joint distribution of a $(2^{nR_1}, 2^{nR_2}, n)$ channel simulation code is the joint distribution on the quadruple $(X^n, Y^n, I, J)$. In other words, it is the probability mass function
$$p(x^n, y^n, i, j) = p(i, y^n | x^n, j)\, p(x^n, j), \tag{1}$$
where $p(x^n, j) = p(j) \prod_{k=1}^{n} \check{p}(x_k)$ by construction.

Definition 2: A sequence of $(2^{nR_1}, 2^{nR_2}, n)$ channel simulation codes for $n = 1, 2, \ldots$ is said to achieve $q(y|x)$ if the induced joint distributions have marginal distributions $p(x^n, y^n)$ that satisfy
$$\lim_{n \to \infty} \left\| p(x^n, y^n) - \prod_{k=1}^{n} \check{p}(x_k)\, q(y_k | x_k) \right\|_1 = 0.$$

Definition 3: A rate pair $(R_1, R_2)$ is said to be achievable if there exists a sequence of $(2^{nR_1}, 2^{nR_2}, n)$ channel simulation codes that achieves $q(y|x)$.

Definition 4: The simulation rate region is the closure of the set of achievable rate pairs $(R_1, R_2)$.

III. MAIN RESULT

Theorem 3.1: For an i.i.d. source with distribution $\check{p}(x)$ and a desired memoryless channel with conditional distribution $q(y|x)$, the simulation rate region is the set
$$S \triangleq \left\{ (R_1, R_2) \in \mathbb{R}^2 : \exists\, p(x,y,u) \in D \ \text{s.t.}\ R_1 \ge I(X;U),\ R_1 + R_2 \ge I(X,Y;U) \right\}, \tag{2}$$
where
$$D \triangleq \left\{ p(x,y,u) : (X,Y) \sim \check{p}(x)\, q(y|x),\ X - U - Y \text{ form a Markov chain},\ |\mathcal{U}| \le |\mathcal{X}||\mathcal{Y}| + 1 \right\}. \tag{3}$$
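To make the region concrete, the following sketch (ours, not part of the paper) evaluates the two rate bounds of (2) for one candidate distribution $p(x,y,u)$ built as $p(u)\,p(x|u)\,p(y|u)$, so the Markov chain $X - U - Y$ holds by construction. The particular matrices `pu`, `px_u`, and `py_u` are made-up assumptions; the induced marginal $p(x,y)$ determines which source and channel pair this distribution pertains to.

```python
import numpy as np

def mutual_information(pab: np.ndarray) -> float:
    """I(A;B) in bits for a joint pmf indexed [a, b]."""
    pa = pab.sum(axis=1, keepdims=True)
    pb = pab.sum(axis=0, keepdims=True)
    mask = pab > 0
    return float((pab[mask] * np.log2(pab[mask] / (pa @ pb)[mask])).sum())

# Hypothetical Markov chain X - U - Y: p(x, y, u) = p(u) p(x|u) p(y|u).
pu = np.array([0.5, 0.5])
px_u = np.array([[0.9, 0.1],    # p(x|u=0)
                 [0.1, 0.9]])   # p(x|u=1)
py_u = np.array([[0.8, 0.2],
                 [0.2, 0.8]])

pxyu = np.einsum('u,ux,uy->xyu', pu, px_u, py_u)  # indexed [x, y, u]

# R1 bound of (2): I(X;U), after marginalizing out Y.
R1 = mutual_information(pxyu.sum(axis=1))

# R1 + R2 bound of (2): I(X,Y;U), treating (X,Y) as a single variable.
Rsum = mutual_information(pxyu.reshape(4, 2))

print(f"R1 >= I(X;U)      = {R1:.3f} bits")
print(f"R1+R2 >= I(X,Y;U) = {Rsum:.3f} bits")
```

Sweeping such distributions over all of $D$ and taking the lower envelope of the resulting pairs traces out the boundary of $S$; the binary erasure channel example in the next section does exactly this in closed form.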
IV. OBSERVATIONS AND EXAMPLES

Two extreme points of the simulation rate region $S$ fall directly out of its definition. If $R_2 = 0$, the second inequality in (2) dominates; thus the minimum rate $R_1$ is the common information $C(X;Y)$. This coincides with the intuition provided by Wyner's result in [2]. At the other extreme, applying the data processing inequality to the first inequality of (2) yields $R_1 \ge I(X;Y)$ no matter how much common randomness is available, and this rate is achieved when $R_2 \ge H(Y|X)$. ($R_2$ does not necessarily have to be as large as $H(Y|X)$ for $(I(X;Y), R_2)$ to be in the simulation rate region.)

Source coding results and the coordinated action work of Cover and Permuter [3] illustrate that with a description rate of $I(X;Y)$ we can create a codebook of output sequences in such a way that we will likely be able to find a jointly typical output sequence for each input sequence from the source. Consequently, we can then randomize the codebook using common randomness to actually simulate the channel, as Bennett and Shor proved in [4].

A. Binary Erasure Channel

For a Bernoulli-half source $X$, let us demonstrate the simulation rate region for the binary erasure channel: $Y$ is an erasure with probability $P_e$ and is equal to $X$ otherwise. The distributions in $D$ that produce the boundary of the simulation rate region are formed by cascading two binary erasure channels as shown in Figure 3, where
$$p_2 \in \left[ 0, \min\left\{ \tfrac{1}{2}, P_e \right\} \right], \qquad p_1 = 1 - \frac{1 - P_e}{1 - p_2}.$$
The mutual information terms in (2) become
$$I(X;U) = 1 - p_1, \qquad I(X,Y;U) = h(P_e) + (1 - p_1)\left(1 - h(p_2)\right),$$
where $h$ is the binary entropy function.

[Figure 3: The Markov chains $X - U - Y$ that give the boundary of the simulation rate region for the binary erasure channel with a Bernoulli-half input are formed by cascading two erasure channels, the first with erasure probability $p_1$ and the second with erasure probability $p_2$.]

[Figure 4: Boundary of the simulation rate region for a binary erasure channel with erasure probability $P_e = 0.75$ and a Bernoulli-half input, where $R_1$ is the description rate and $R_2$ is the rate of common randomness. Without common randomness, a description rate of $C(X;Y)$ is required to simulate the channel. With unlimited common randomness, a description rate of $I(X;Y)$ suffices.]

Figure 4 shows the boundary of the simulation rate region for erasure probability $P_e = 0.75$. The required description rate $R_1$ varies from $C(X;Y) = h(0.75) = 0.811$ bits down to $I(X;Y) = 0.25$ bits as the rate of common randomness runs between $0$ and $H(Y|X) = h(0.75) = 0.811$ bits.
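This boundary is easy to reproduce numerically. The sketch below (our illustration) sweeps $p_2$ over $[0, \min\{1/2, P_e\}]$, evaluates the two mutual information terms above, and recovers the corner values quoted in the text: $R_1 = I(X;Y) = 0.25$ bits with $R_2 = h(0.75) \approx 0.811$ bits at one end, and a minimum sum rate of $C(X;Y) = h(0.75) \approx 0.811$ bits at the other.

```python
import numpy as np

def h(p):
    """Binary entropy in bits, with h(0) = h(1) = 0."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

Pe = 0.75
p2 = np.linspace(0.0, min(0.5, Pe), 201)
p1 = 1 - (1 - Pe) / (1 - p2)           # so that Pe = p1 + (1 - p1) * p2

R1   = 1 - p1                          # I(X;U)
Rsum = h(Pe) + (1 - p1) * (1 - h(p2))  # I(X,Y;U), the R1 + R2 bound
R2   = Rsum - R1

# p2 = 0 makes U = Y: the unlimited-common-randomness corner,
# R1 = I(X;Y) = 0.25 with R2 = H(Y|X) = h(0.75).
print(f"p2 = 0:   R1 = {R1[0]:.3f}, R2 = {R2[0]:.3f}")

# p2 = 1/2 minimizes the sum rate: R1 + R2 = C(X;Y) = h(Pe), which is
# the description rate required when no common randomness is available.
print(f"p2 = 1/2: R1 + R2 = {Rsum[-1]:.3f}  (C(X;Y) = h(Pe) = {h(Pe):.3f})")
```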
V. SKETCH OF CONVERSE

Let $(R_1, R_2)$ be an achievable rate pair. Then for each $\epsilon \in (0, 1/4)$ there exists a $(2^{nR_1}, 2^{nR_2}, n)$ channel simulation code with an induced joint distribution $p(x^n, y^n, i, j)$ such that
$$\left\| p(x^n, y^n) - \prod_{k=1}^{n} \check{p}(x_k)\, q(y_k | x_k) \right\|_1 < \epsilon.$$
Let the random variable $K$ be uniformly distributed over the set $\{1, \ldots, n\}$, independent of everything else. The variable $K$ will serve as a random time index.

A. Entropy Bounds

The joint distribution of the sequences $(X^n, Y^n)$ is close in total variation to an i.i.d. distribution, so we can extend Lemma 2.7 of [6] to obtain two bounds:
$$\left| H(X^n, Y^n) - \sum_{k=1}^{n} H(X_k, Y_k) \right| \le n\, g(\epsilon), \tag{4}$$
$$I(X_K, Y_K; K) \le g(\epsilon), \tag{5}$$
where
$$g(\epsilon) \triangleq 4\epsilon \left( \log|\mathcal{X}| + \log|\mathcal{Y}| + \log\frac{1}{\epsilon} \right). \tag{6}$$
Notice that $\lim_{\epsilon \downarrow 0} g(\epsilon) = 0$.

B. Epsilon Rate Region

Define an epsilon rate region,
$$S_\epsilon \triangleq \left\{ (R_1, R_2) \in \mathbb{R}^2 : \exists\, p(x,y,u) \in D_\epsilon \ \text{s.t.}\ R_1 \ge I(X;U) - 2g(\epsilon),\ R_1 + R_2 \ge I(X,Y;U) - 2g(\epsilon) \right\},$$
where
$$D_\epsilon \triangleq \left\{ p(x,y,u) : \| p(x,y) - \check{p}(x)\, q(y|x) \|_1 < \epsilon,\ X - U - Y \text{ form a Markov chain},\ |\mathcal{U}| \le |\mathcal{X}||\mathcal{Y}| + 1 \right\}. \tag{7}$$

Lemma 5.1: $(R_1, R_2) \in S_\epsilon$.

Proof: We use familiar information theoretic inequalities, and the fact that $X^n$ and $J$ are independent, to bound $R_1$ and the sum rate $R_1 + R_2$:
$$nR_1 \ge H(I) \ge H(I|J) \ge I(X^n; I | J) = I(X^n; I, J), \tag{8}$$
$$n(R_1 + R_2) \ge H(I, J) \ge I(X^n, Y^n; I, J). \tag{9}$$
We then lower bound the right-hand sides of (8) and (9) using similar steps. Here we proceed from (9):
$$\begin{aligned}
I(X^n, Y^n; I, J) &= H(X^n, Y^n) - H(X^n, Y^n | I, J) \\
&\ge H(X^n, Y^n) - \sum_{k=1}^{n} H(X_k, Y_k | I, J) \\
&\ge \sum_{k=1}^{n} I(X_k, Y_k; I, J) - n\, g(\epsilon) \\
&= n\, I(X_K, Y_K; I, J | K) - n\, g(\epsilon) \\
&\ge n\, I(X_K, Y_K; I, J, K) - 2n\, g(\epsilon).
\end{aligned}$$
The second inequality comes from (4), and the last inequality comes from (5). The joint distribution of the pair $(X_K, Y_K)$ can be shown to satisfy the total variation constraint in (7). Finally, we acknowledge the Markovity of the triple $X_K - (I, J, K) - Y_K$, identifying $U = (I, J, K)$, to complete the proof of the lemma. (The cardinality bound on $U$ in (7) is shown to be satisfiable via a generalized Carathéodory theorem.)

C. Lower Semi-Continuity

The epsilon rate regions decrease to the simulation rate region as epsilon decreases to zero.

Lemma 5.2:
$$\bigcap_{\epsilon \in (0, 1/2)} S_\epsilon \subset S.$$

VI. SKETCH OF ACHIEVABILITY

A. Resolvability

One key tool for the achievability proof is summarized in Lemma 6.1. This lemma is implied by the resolvability work of Han and Verdú [7], but the concept was first introduced by Wyner in Theorem 6.3 of [2].

Lemma 6.1: For any discrete distribution $p(u, v)$ and each $n$, let $\mathcal{C}^{(n)} = \{U^n(m)\}_{m=1}^{2^{nR}}$ be a "codebook" of sequences, each independently drawn according to $\prod_{k=1}^{n} p_U(u_k)$. For a fixed codebook, define the distribution
$$Q(v^n) = 2^{-nR} \sum_{m=1}^{2^{nR}} \prod_{k=1}^{n} p_{V|U}(v_k | U_k(m)).$$
Then if $R > I(V; U)$,
$$\lim_{n \to \infty} E \left\| Q(v^n) - \prod_{k=1}^{n} p_V(v_k) \right\|_1 = 0,$$
where the expectation is with respect to the randomly constructed codebooks $\mathcal{C}^{(n)}$.
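The soft-covering behavior in Lemma 6.1 can be observed directly for small $n$. In the sketch below (our construction, not the paper's), $U$ is Bernoulli(1/2), $V$ is the output of a binary symmetric channel with crossover 0.2 driven by $U$ (so $I(V;U) = 1 - h(0.2) \approx 0.278$ bits), and the codebook rate $R = 0.6$ exceeds $I(V;U)$. The exact total variation between the induced $Q(v^n)$ and the i.i.d. distribution of $V^n$ is computed by enumeration and should shrink as $n$ grows.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
delta = 0.2   # BSC crossover: p(v != u) = delta
R = 0.6       # codebook rate, chosen above I(V;U) = 1 - h(0.2) ~ 0.278 bits

def induced_tv(n: int) -> float:
    """Exact TV between Q(v^n) and the product distribution of V^n."""
    codebook = rng.integers(0, 2, size=(2 ** int(np.ceil(n * R)), n))
    Q = np.zeros(2 ** n)
    for idx, v in enumerate(itertools.product([0, 1], repeat=n)):
        # p(v^n | u^n) is a product of BSC transition probabilities;
        # Q averages it over the uniformly chosen codeword.
        flips = (codebook != np.array(v)).sum(axis=1)
        Q[idx] = np.mean(delta ** flips * (1 - delta) ** (n - flips))
    # U is uniform and the BSC is symmetric, so V^n is i.i.d. uniform.
    target = np.full(2 ** n, 2.0 ** -n)
    return 0.5 * np.abs(Q - target).sum()

for n in [4, 8, 12]:
    print(f"n = {n:2d}: TV = {induced_tv(n):.4f}")
```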
B. Existence of Achievable Codes

Assume that $(R_1, R_2)$ is in the interior of $S$. Then there exists a distribution $p^*(x,y,u) \in D$ such that $R_1 > I(X;U)$ and $R_1 + R_2 > I(X,Y;U)$.

For each $n$, let $(I, J)$ be uniformly distributed on $\{1, \ldots, 2^{nR_1}\} \times \{1, \ldots, 2^{nR_2}\}$. We apply Lemma 6.1 twice, once with $V = (X, Y)$ and again with $V = X$, to assert that there exists a sequence of "codebooks" $\mathcal{C}^{(n)} = \{U^n(i,j)\}_{(i,j) \in \mathcal{I} \times \mathcal{J}}$, $n = 1, 2, \ldots$, with the properties
$$\lim_{n \to \infty} \left\| Q(x^n, y^n) - \prod_{k=1}^{n} p^*_{X,Y}(x_k, y_k) \right\|_1 = 0, \tag{10}$$
$$\lim_{n \to \infty} \left\| Q(x^n, j) - p(j) \prod_{k=1}^{n} p^*_X(x_k) \right\|_1 = 0, \tag{11}$$
where $Q(x^n, y^n)$ and $Q(x^n, j)$ are marginal distributions derived from the joint distribution
$$Q(x^n, y^n, i, j) = p(i, j) \prod_{k=1}^{n} p^*_{X,Y|U}(x_k, y_k | U_k(i,j)).$$
In an indirect way, we have constructed a sequence of joint distributions $Q(x^n, y^n, i, j)$ from which we can derive channel simulation codes that achieve $q(y|x)$. The Markovity of $p^*$ implies the Markov property
$$Q(x^n, y^n | i, j) = Q(x^n | i, j)\, Q(y^n | i, j).$$
Let
$$\hat{p}(i | x^n, j) = Q(i | x^n, j), \qquad \hat{p}(y^n | i, j) = Q(y^n | i, j).$$
Considering (10) and (11) together with the properties of total variation and of $p^*$, it can be shown that $\hat{p}(i, y^n | x^n, j) = \hat{p}(i | x^n, j)\, \hat{p}(y^n | i, j)$ defines a sequence of channel simulation codes that achieves $q(y|x)$.

C. Comment on Achievability Scheme

This channel simulation scheme requires randomization at both the encoder and decoder. In essence, a codebook of independently drawn $U^n$ sequences is overpopulated so that the encoder can choose one randomly from the many that are jointly typical with $X^n$. The decoder then randomly generates $Y^n$ conditioned on $U^n$.

VII. GAME THEORY

Our framework finds motivation in a game theoretic setting. Consider a zero-sum repeated game between two teams. Team A consists of two players who on the $i$th iteration take actions $X_i \in \mathcal{X}$ and $Y_i \in \mathcal{Y}$. The opponents on team B take a combined action $Z_i \in \mathcal{Z}$. All action spaces $\mathcal{X}$, $\mathcal{Y}$, and $\mathcal{Z}$ are finite. The payoff for team A at each iteration is a time-invariant finite function $\Pi(X_i, Y_i, Z_i)$ and is the loss for team B. Each team wishes to maximize its time-averaged expected payoff.

Assume that team A plays conservatively, attempting to maximize the expected payoff under the worst-case actions of team B. Then the payoff at the $i$th iteration is
$$\Theta_i \triangleq \min_{z \in \mathcal{Z}} E\left[ \Pi(X_i, Y_i, z) \mid X^{i-1}, Y^{i-1} \right]. \tag{12}$$
Clearly, (12) could be maximized by finding an optimal mixed strategy $p^*(x,y)$ that maximizes $\min_{z \in \mathcal{Z}} E[\Pi(X, Y, z)]$ and choosing independent actions at each iteration. This would correspond to the minimax strategy.

However, we now introduce a new constraint: the players on team A have a limited secure channel of communication. Player 1, who chooses the actions $X^n$, communicates at rate $R$ to Player 2, who chooses $Y^n$. Let $U$ be the message passed from Player 1 to Player 2. We say a rate $R$ is achievable for payoff $\Theta$ if there exists a sequence of random variable triples $(X^n, Y^n, U)$ that each form a Markov chain $X^n - U - Y^n$ (this requirement can be relaxed to the more physically relevant requirement that $X_k - (U, X^{k-1}, Y^{k-1}) - Y_k$ for all $k$) and such that $|\mathcal{U}| \le 2^{nR}$ and
$$\lim_{n \to \infty} E\left[ \frac{1}{n} \sum_{i=1}^{n} \Theta_i \right] > \Theta. \tag{13}$$
Let $R(\Theta)$ be the infimum of achievable rates for payoff $\Theta$. We claim that $R(\Theta)$ is the least average common information over all combinations of strategies that achieve average payoff $\Theta$. Define
$$R_0(\Theta) \triangleq \min C(X; Y | W) \quad \text{s.t.} \quad E\left[ \min_{z \in \mathcal{Z}} E[\Pi(X, Y, z) \mid W] \right] \ge \Theta.$$

Theorem 7.1: $R(\Theta) = R_0(\Theta)$.

Converse Sketch: The important elements of the converse are the inequalities
$$n(R(\Theta) + \epsilon) > H(U) \ge I(X^n, Y^n; U) = \sum_{i=1}^{n} I(X_i, Y_i; U \mid X^{i-1}, Y^{i-1}) = n\, I(X_K, Y_K; U \mid X^{K-1}, Y^{K-1}, K),$$
for all $\epsilon > 0$, where $K$ is uniformly distributed on $\{1, \ldots, n\}$. Now identify the tuple $(X^{K-1}, Y^{K-1}, K)$ as the auxiliary random variable $W$.

Achievability Comment: The random variable $W$ serves as a time sharing variable to combine strategies of high and low correlation.
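To see why coordination pays, consider a toy instance of (12) (our example, with a made-up payoff): team A scores when its two actions agree and differ from the opponent's guess $z$. Independent uniform actions guarantee only $1/4$, while perfectly correlated uniform actions guarantee $1/2$. Generating the correlated strategy $X = Y$ costs $C(X;Y) = 1$ bit per action over the secure channel, which is the kind of rate versus payoff tradeoff that Theorem 7.1 quantifies.

```python
import itertools
import numpy as np

# Hypothetical payoff for team A: score when the two actions agree
# and differ from the opponent's action z.
def payoff(x, y, z):
    return 1.0 if (x == y and x != z) else 0.0

def guaranteed_payoff(pxy: np.ndarray) -> float:
    """min over z of E[payoff(X, Y, z)] for a joint action pmf p(x, y)."""
    return min(
        sum(pxy[x, y] * payoff(x, y, z)
            for x, y in itertools.product(range(2), repeat=2))
        for z in range(2)
    )

independent = np.full((2, 2), 0.25)    # X, Y uniform and independent
correlated  = np.array([[0.5, 0.0],
                        [0.0, 0.5]])   # X = Y, uniform

print(guaranteed_payoff(independent))  # 0.25
print(guaranteed_payoff(correlated))   # 0.50
```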
VIII. ACKNOWLEDGMENT

The author would like to thank his advisor, Tom Cover, for encouraging the study of coordination via communication, and Young-Han Kim for his suggestions and encouragement. This work is supported by the National Science Foundation through grants CCF-0515303 and CCF-0635318.

REFERENCES

[1] P. Gács and J. Körner, "Common Information is Far Less Than Mutual Information," Problems of Control and Information Theory, vol. 2, pp. 149-162, 1973.
[2] A. Wyner, "The Common Information of Two Dependent Random Variables," IEEE Trans. Inf. Theory, vol. IT-21, no. 2, March 1975.
[3] T. Cover and H. Permuter, "Capacity of Coordinated Actions," ISIT 2007, Nice, France.
[4] C. H. Bennett and P. W. Shor, "Entanglement-Assisted Capacity of a Quantum Channel and the Reverse Shannon Theorem," IEEE Trans. Inf. Theory, vol. 48, no. 10, Oct. 2002.
[5] V. Anantharam and V. Borkar, "Common Randomness and Distributed Control: A Counterexample," Systems and Control Letters, vol. 56, no. 7-8, July 2007.
[6] I. Csiszár and J. Körner, Information Theory: Coding Theorems for Discrete Memoryless Systems. New York: Academic, 1981.
[7] T. S. Han and S. Verdú, "Approximation Theory of Output Statistics," IEEE Trans. Inf. Theory, vol. 39, no. 3, May 1993.