The finite-dimensional Witsenhausen counterexample
Authors: Pulkit Grover, Se Yong Park, Anant Sahai
Department of EECS, University of California at Berkeley, CA-94720, USA
{pulkit, sahai, separk}@eecs.berkeley.edu

Abstract — Recently, a vector version of Witsenhausen's counterexample was considered, and it was shown that in the limit of infinite vector length, certain quantization-based control strategies are provably within a constant factor of the optimal cost for all possible problem parameters. In this paper, finite vector lengths are considered, with the dimension viewed as an additional problem parameter. By applying a large-deviation "sphere-packing" philosophy, a lower bound to the optimal cost for the finite-dimensional case is derived that uses appropriate shadows of the infinite-length bound. Using the new lower bound, we show that good lattice-based control strategies achieve within a constant factor of the optimal cost uniformly over all possible problem parameters, including the vector length. For Witsenhausen's original problem — the scalar case — the gap between regular lattice-based strategies and the lower bound is numerically never more than a factor of 8.

I. INTRODUCTION

Distributed control problems have long proved challenging for control engineers. In 1968, Witsenhausen [1] gave a counterexample showing that even a seemingly simple distributed control problem can be hard to solve. For the counterexample, Witsenhausen chose a two-stage distributed LQG system and provided a nonlinear control strategy that outperforms all linear laws. It is now clear that the non-classical information pattern of Witsenhausen's problem makes it quite challenging¹; the optimal strategy and the optimal costs for the problem are still unknown — non-convexity makes the search for an optimal strategy hard [3]–[5].

¹ In the words of Yu-Chi Ho [2], "the simplest problem becomes the hardest problem."
Discrete approximations of the problem [6] are even NP-complete² [7]. In the absence of a solution, research on the counterexample has bifurcated into two different directions. Since there is no known systematic approach to obtaining provably optimal solutions, a body of literature (e.g. [4], [5], [8] and the references therein) applies search heuristics to explore the space of possible control actions and obtain intuition into the structure of good strategies. Work in this direction has also yielded considerable insight into addressing non-convex problems in general. In the other direction, the emphasis is on understanding the role of implicit communication in the counterexample. In distributed control, control actions not only attempt to reduce the immediate control costs; they can also communicate relevant information to other controllers to help them reduce costs. Witsenhausen [1, Section 6] and Mitter and Sahai [9] aim at developing systematic constructions based on implicit communication. Witsenhausen's two-point quantization strategy is motivated by the optimal strategy for two-point symmetric distributions of the initial state [1, Section 5], and it outperforms linear strategies for certain parameter choices. Mitter and Sahai [9] propose multipoint-quantization strategies that, depending on the problem parameters, can outperform linear strategies by an arbitrarily large factor. Various modifications to the counterexample investigate whether misalignment of the two goals of control and implicit communication makes the problems hard [3], [10]–[14] (see [15] for a survey of other such modifications). Of particular interest are two works, those of Rotkowitz and Lall [12], and Rotkowitz [14].
The first work [12] shows that with extremely fast, infinite-capacity, and perfectly reliable external channels, the optimal controllers are linear not just for Witsenhausen's counterexample (which is a simple observation), but for more general problems as well. This suggests that allowing for an external channel between the two controllers in Witsenhausen's counterexample might simplify the problem. However, when the channel is not perfect, Martins [16] shows that finding optimal solutions can be hard³. A closer inspection of the problem in [16] reveals that nonlinear strategies can outperform linear ones by an arbitrarily large factor for any fixed SNR on the external channel. Even to make good use of the external channel resource, one needs nonlinear strategies. The second work [14] shows that if one considers the induced norm instead of the original expected quadratic cost, linear control laws are optimal and easy to find. The induced-norm formulation is therefore easy to solve, and at the same time, it makes no assumptions on the state and the noise distributions. This led Doyle to ask if Witsenhausen's counterexample (with expected quadratic cost) is at all relevant [21] — after all, not only is the LQG formulation more constrained, it is also harder to solve. The question thus becomes which norm is more appropriate, and the answer must come from what is relevant in practical situations. In practice, one usually knows the "typical" amplitude of the noise and the initial state, or at least rough bounds on them.

² More precisely, results in [7] imply that the discrete counterparts to the Witsenhausen counterexample are NP-complete if the assumption of Gaussianity of the primitive random variables is relaxed. Further, it is also shown in [7] that with this relaxation, a polynomial-time solution to the original continuous problem would imply P = NP, and thus conceptually the relaxed continuous problem is also hard.
The induced-norm formulation may therefore be quite conservative: since no assumptions are made on the state and the noise, it requires budgeting for completely arbitrary behavior of state and noise — they can even collude to raise the costs for the chosen strategy. To see how conservative the induced-norm formulation can be, notice the following: even allowing for colluding state and noise, mere knowledge of a bound on the noise amplitude suffices to have quantization-based nonlinear strategies outperform linear strategies by an arbitrarily large factor (with the expected cost replaced by a hard budget; the proof is simpler than that in [9], and is left as an exercise to the interested reader for reasons of limited space). Conceptually, the LQG formulation is only abstracting some knowledge of noise and initial-state behavior. In practical situations where such knowledge exists, designs based on an induced-norm formulation (and linear strategies) may be needlessly expensive because they budget for impossible events. A similar problem is considered by Shoarinejad et al. in [20], where noisy side information about the source is available at the receiver. Since this formulation is even more constrained than that in [16], it is clear that nonlinear strategies outperform linear ones for this problem as well.

³ Martins shows that nonlinear strategies that do not even use the external channel can outperform linear ones that do use the channel when the external channel SNR is high. As is suggested by what David Tse calls the "deterministic perspective" (along the lines of [17]–[19]), linear strategies do not make good use of the external channel because they only communicate the "most significant bits" — which can anyway be estimated reliably at the second controller. So if the uncertainty in the initial state is large, the external channel is only of limited help, and there may be substantial advantage in having the controllers talk through the plant.
The fact that nonlinear strategies can be arbitrarily better brings us to a question that has received little attention in the literature — how far are the proposed nonlinear strategies from the optimal? It is believed that the strategies of Lee, Lau and Ho [5] are close to optimal. In Section VI, we will see that these strategies can be viewed as an instance of the "dirty-paper coding" strategy in information theory, and we quantify their advantage over pure quantization-based strategies. Despite their improved performance, there was no guarantee that these strategies are indeed close to optimal⁴. Witsenhausen [1, Section 7] derived a lower bound on the costs that is loose in the interesting regimes of small $k$ and large $\sigma_0^2$ [15], [22], and hence is insufficient to obtain any guarantee on the gap from optimality.

Towards obtaining such a guarantee, a strategic simplification of the problem was introduced in [15], [23], where we consider an asymptotically-long vector version of the problem. This problem is related to a toy communication problem that we call "Assisted Interference Suppression" (AIS), which is an extension of the dirty-paper coding (DPC) [24] model in information theory. There has been a burst of interest in extensions to DPC in information theory, mainly along two lines of work — multi-antenna Gaussian channels, and the "cognitive-radio channel." For multi-antenna Gaussian channels, a problem of much theoretical and practical interest, DPC turns out to be the optimal strategy (see [25] and the references therein). The "cognitive radio channel" problem was formulated by Devroye et al. [26]. This inspired much work on asymmetric cooperation between nodes [27]–[31]. In our work [15], [23], we developed a new lower bound on the optimal performance of the vector Witsenhausen problem. Using this bound, we show that vector-quantization-based strategies attain within a factor of 4.45 of the optimal cost for all problem parameters in the limit of infinite vector length. Further, combinations of linear and DPC-based strategies attain within a factor of 2 of the optimal cost. This factor was later improved to 1.3 in [32] by improving the lower bound. While a constant-factor result does not establish true optimality, such results are often helpful in the face of intractable problems like those that are otherwise NP-hard [33]. This constant-factor spirit has also been useful in understanding other stochastic control problems [34], [35] and in the asymptotic analysis of problems in multiuser wireless communication [17], [36].

While the lower bound in [15] holds for all vector lengths, and hence for the scalar counterexample as well, the ratio of the costs attained by the strategies of [9] and the lower bound diverges in the limit $k \to 0$ and $\sigma_0 \to \infty$. This suggests that there is a significant finite-dimensional aspect of the problem that is being lost in the infinite-dimensional limit: either quantization-based strategies are bad, or the lower bound of [15] is very loose. This effect is elucidated in [22] by deriving a different lower bound showing that quantization-based strategies indeed attain within a constant⁵ factor of the optimal cost for Witsenhausen's original problem. The bound in [22] is in the spirit of Witsenhausen's original lower bound, but is more intricate. It captures the idea that observation noise can force a second-stage cost to be incurred unless the first-stage cost is large. In this paper, we revert to the line of attack initiated by the vector simplification of [15].

⁴ The search in [5] is not exhaustive. The authors first find a good quantization-based solution. Inspired by piecewise-linear strategies (from the neural-networks-based search of Baglietto et al. [4]), each quantization step is broken into several small sub-steps to approximate a piecewise-linear curve.
In Section II, we formally state the vector version of the counterexample. For obtaining good control strategies, we observe that the action of the first controller in the quantization-based strategy of [9] can be thought of as forcing the state to a point on a one-dimensional lattice. Extending this idea, in Section III, we provide lattice-based quantization strategies for finite-dimensional spaces and analyze their performance. Building upon the vector lower bound of [15], a new lower bound is derived in Section IV which is in the spirit of large-deviations-based information-theoretic bounds for finite-length communication problems⁶ (e.g. [40]–[43]). In particular, our new bound extends the tools in [43] to a setting with an unbounded distortion measure. In Section V, we combine the lattice-based upper bound (Section III) and the large-deviations lower bound (Section IV) to show that lattice-based quantization strategies attain within a constant factor of the optimal cost for any finite length, uniformly over all problem parameters. For example, this constant factor is numerically found to be smaller than 8 for the original scalar problem. We also provide a constant factor that holds uniformly over all vector lengths. To understand the significance of the result, consider the following. At $k = 0.01$ and $\sigma_0 = 500$, the cost attained by the optimal linear scheme is close to 1. The cost attained by a quantization-based⁷ scheme is $8.894 \times 10^{-4}$. Our new lower bound on the cost is $3.170 \times 10^{-4}$.

⁵ The constant is large in [22], but as this paper shows, this is an artifact of the proof rather than reality.
⁶ An alternative Central Limit Theorem (CLT)-based approach has also been used in the information-theory literature [37]–[39]. In [38], [39], the approach is used to obtain extremely tight approximations at moderate blocklengths for Shannon's noisy communication problem.
⁷ The quantization points are regularly spaced about 9.92 units apart. This results in a first-stage cost of about $8.2 \times 10^{-4}$ and a second-stage cost of about $6.7 \times 10^{-5}$.
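The first-stage number in the footnote can be reproduced approximately. Below is a minimal sketch (the uniform-quantization-error approximation $\Delta^2/12$ is our own gloss; the second-stage cost is not simulated here because decoding errors at this spacing are far too rare for naive Monte Carlo):

```python
import numpy as np

# Parameters from the text: k = 0.01, sigma_0 = 500, quantizer spacing ~9.92.
k, sigma0, delta = 0.01, 500.0, 9.92

rng = np.random.default_rng(0)
x0 = rng.normal(0.0, sigma0, size=1_000_000)

# First controller: force the state to the nearest quantization point.
u1 = np.round(x0 / delta) * delta - x0
first_stage = k**2 * np.mean(u1**2)

# Since sigma0 >> delta, the quantization error is nearly uniform on
# (-delta/2, delta/2), so the first-stage cost is close to k^2 * delta^2 / 12.
approx = k**2 * delta**2 / 12
print(first_stage, approx)  # both close to 8.2e-4, as in the footnote
```

Both numbers agree with the footnote's first-stage cost of about $8.2 \times 10^{-4}$.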
Despite the small value of the lower bound, the ratio of the quantization-based upper bound and the lower bound for this choice of parameters is less than three! We conclude in Section VI by outlining directions for future research and speculating on the form of finite-dimensional strategies (following [15]) that we conjecture might be optimal.

II. NOTATION AND PROBLEM STATEMENT

Fig. 1. Block diagram for the vector version of Witsenhausen's counterexample of length $m$.

Vectors are denoted in bold. Upper case tends to be used for random variables, while lower-case symbols represent their realizations. $W(m, k^2, \sigma_0^2)$ denotes the vector version of Witsenhausen's problem of length $m$, defined as follows (shown in Fig. 1):
• The initial state $X_0^m$ is Gaussian, distributed $\mathcal{N}(0, \sigma_0^2 I_m)$, where $I_m$ is the identity matrix of size $m \times m$.
• The state transition functions describe the state evolution with time. The state transitions are linear: $X_1^m = X_0^m + U_1^m$, and $X_2^m = X_1^m - U_2^m$.
• The outputs observed by the controllers are
$Y_1^m = X_0^m$, and $Y_2^m = X_1^m + Z^m$, (1)
where $Z^m \sim \mathcal{N}(0, \sigma_Z^2 I_m)$ is Gaussian-distributed observation noise.
• The control objective is to minimize the expected cost, averaged over the random realizations of $X_0^m$ and $Z^m$. The total cost is a quadratic function of the state and the input, given by the sum of two terms: $J_1(x_1^m, u_1^m) = \frac{1}{m} k^2 \|u_1^m\|^2$ and $J_2(x_2^m, u_2^m) = \frac{1}{m} \|x_2^m\|^2$, where $\|\cdot\|$ denotes the usual Euclidean 2-norm.
The cost expressions are normalized by the vector length $m$ to allow for natural comparisons between different vector lengths.
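The setup above is straightforward to simulate. The following is a minimal Monte Carlo sketch of the normalized two-stage cost (the function names and the zero-input/LLSE example are ours, not from the paper; $\sigma_Z^2 = 1$ as assumed below):

```python
import numpy as np

def avg_cost(gamma1, gamma2, m, k2, sigma0_sq, n_trials=20_000, seed=0):
    """Monte Carlo estimate of the normalized total cost of W(m, k^2, sigma_0^2).

    gamma1 maps y1 = x0 to u1; gamma2 maps y2 = x1 + z to the estimate u2.
    The observation noise variance is normalized to sigma_Z^2 = 1.
    """
    rng = np.random.default_rng(seed)
    x0 = rng.normal(0.0, np.sqrt(sigma0_sq), size=(n_trials, m))
    u1 = gamma1(x0)
    x1 = x0 + u1
    y2 = x1 + rng.normal(0.0, 1.0, size=(n_trials, m))
    u2 = gamma2(y2)
    x2 = x1 - u2
    J1 = k2 * np.sum(u1**2, axis=1) / m
    J2 = np.sum(x2**2, axis=1) / m
    return float(np.mean(J1 + J2))

# Zero-input strategy with LLSE estimation at C2: cost = sigma0^2/(sigma0^2 + 1).
s2 = 4.0
cost = avg_cost(lambda y: np.zeros_like(y),
                lambda y: (s2 / (s2 + 1.0)) * y, m=1, k2=0.01, sigma0_sq=s2)
print(cost)  # close to 4/5
```

The zero-input strategy used here is one of the two linear baselines invoked in Section V.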
A control strategy is denoted by $\gamma = (\gamma_1, \gamma_2)$, where $\gamma_i$ is the function that maps the observation $y_i^m$ at $C_i$ to the control input $u_i^m$. For a fixed $\gamma$, $x_1^m = x_0^m + \gamma_1(x_0^m)$ is a function of $x_0^m$. Thus the first-stage cost can instead be written as a function $J_1^{(\gamma)}(x_0^m) = J_1(x_0^m + \gamma_1(x_0^m), \gamma_1(x_0^m))$, and the second-stage cost can be written as $J_2^{(\gamma)}(x_0^m, z^m) = J_2(x_0^m + \gamma_1(x_0^m) - \gamma_2(x_0^m + \gamma_1(x_0^m) + z^m),\ \gamma_2(x_0^m + \gamma_1(x_0^m) + z^m))$. For a given $\gamma$, the expected costs (averaged over $x_0^m$ and $z^m$) are denoted by $\bar{J}^{(\gamma)}(m, k^2, \sigma_0^2)$ and $\bar{J}_i^{(\gamma)}(m, k^2, \sigma_0^2)$ for $i = 1, 2$. We define the optimal cost $\bar{J}_{\min}(m, k^2, \sigma_0^2)$ as follows:

$\bar{J}_{\min}(m, k^2, \sigma_0^2) := \inf_{\gamma} \bar{J}^{(\gamma)}(m, k^2, \sigma_0^2)$. (2)

We note that for the scalar case of $m = 1$, the problem is Witsenhausen's original counterexample [1]. Observe that scaling $\sigma_0$ and $\sigma_Z$ by the same factor essentially does not change the problem — the solution can also be scaled by the same factor (with the resulting cost scaling quadratically with it). Thus, without loss of generality, we assume that the variance of the Gaussian observation noise is $\sigma_Z^2 = 1$ (as is also assumed in [1]). The pdf of the noise $Z^m$ is denoted by $f_Z(\cdot)$. In our proof techniques, we also consider a hypothetical observation noise $Z_G^m \sim \mathcal{N}(0, \sigma_G^2 I_m)$ with variance $\sigma_G^2 \ge 1$. The pdf of this test noise is denoted by $f_G(\cdot)$. We use $\psi(m, r)$ to denote $\Pr(\|Z^m\| \ge r)$ for $Z^m \sim \mathcal{N}(0, I_m)$. Subscripts in expectation expressions denote the random variable being averaged over (e.g. $\mathbb{E}_{X_0^m, Z_G^m}[\cdot]$ denotes averaging over the initial state $X_0^m$ and the test noise $Z_G^m$).

Fig. 2. Covering and packing for the 2-dimensional hexagonal lattice. The packing-covering ratio for this lattice is $\xi = \frac{2}{\sqrt{3}} \approx 1.15$ [44, Appendix C]. The first controller forces the initial state $x_0^m$ to the lattice point nearest to it. The second controller estimates $\hat{x}_1^m$ to be a lattice point at the centre of a sphere if $y_2^m$ falls in one of the packing spheres. Else it essentially gives up and estimates $\hat{x}_1^m = y_2^m$, the received output itself. A hexagonal lattice-based scheme would perform better for the 2-D Witsenhausen problem than the square lattice (of $\xi = \sqrt{2} \approx 1.41$ [44, Appendix C]) because it has a smaller $\xi$.

III. LATTICE-BASED QUANTIZATION STRATEGIES

Lattice-based quantization strategies are the natural generalizations of scalar quantization-based strategies [9]. An introduction to lattices can be found in [45], [46]. Relevant definitions are reviewed below. $\mathcal{B}$ denotes the unit ball in $\mathbb{R}^m$.

Definition 1 (Lattice): An $m$-dimensional lattice $\Lambda$ is a set of points in $\mathbb{R}^m$ such that if $x^m, y^m \in \Lambda$, then $x^m + y^m \in \Lambda$, and if $x^m \in \Lambda$, then $-x^m \in \Lambda$.

Definition 2 (Packing and packing radius): Given an $m$-dimensional lattice $\Lambda$ and a radius $r$, the set $\Lambda + r\mathcal{B}$ is a packing of Euclidean $m$-space if for all distinct points $x^m, y^m \in \Lambda$, $(x^m + r\mathcal{B}) \cap (y^m + r\mathcal{B}) = \emptyset$. The packing radius $r_p$ is defined as $r_p := \sup\{r : \Lambda + r\mathcal{B} \text{ is a packing}\}$.

Definition 3 (Covering and covering radius): Given an $m$-dimensional lattice $\Lambda$ and a radius $r$, the set $\Lambda + r\mathcal{B}$ is a covering of Euclidean $m$-space if $\mathbb{R}^m \subseteq \Lambda + r\mathcal{B}$. The covering radius $r_c$ is defined as $r_c := \inf\{r : \Lambda + r\mathcal{B} \text{ is a covering}\}$.

Definition 4 (Packing-covering ratio): The packing-covering ratio (denoted by $\xi$) of a lattice $\Lambda$ is the ratio of its covering radius to its packing radius, $\xi = \frac{r_c}{r_p}$. Because it creates no ambiguity, we do not include the dimension $m$ and the choice of lattice $\Lambda$ in the notation for $r_c$, $r_p$ and $\xi$, though these quantities depend on $m$ and $\Lambda$.
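The packing-covering ratios quoted above ($\sqrt{2}$ for the square lattice, $2/\sqrt{3}$ for the hexagonal lattice) can be checked numerically. The following is a rough grid-search sketch (the helper function and its tolerances are ours):

```python
import numpy as np
from itertools import product

def packing_covering_ratio(basis, grid=200, shells=3):
    """Approximate r_c / r_p for a 2-D lattice given by 'basis' (rows are generators).

    Packing radius: half the shortest nonzero lattice vector.
    Covering radius: farthest distance from any point of the fundamental
    parallelogram to its nearest lattice point (grid-search approximation).
    """
    pts = np.array([i * basis[0] + j * basis[1]
                    for i, j in product(range(-shells, shells + 1), repeat=2)])
    norms = np.linalg.norm(pts, axis=1)
    r_p = norms[norms > 1e-12].min() / 2
    t = np.linspace(0, 1, grid, endpoint=False)
    samples = np.array([a * basis[0] + b * basis[1] for a in t for b in t])
    d = np.linalg.norm(samples[:, None, :] - pts[None, :, :], axis=2).min(axis=1)
    r_c = d.max()
    return r_c / r_p

square = np.array([[1.0, 0.0], [0.0, 1.0]])
hexagonal = np.array([[1.0, 0.0], [0.5, np.sqrt(3) / 2]])
r_sq = packing_covering_ratio(square)       # ~ sqrt(2)  ~ 1.414
r_hex = packing_covering_ratio(hexagonal)   # ~ 2/sqrt(3) ~ 1.155
print(r_sq, r_hex)
```

The smaller ratio of the hexagonal lattice is exactly why the caption above prefers it for the 2-D problem.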
For a given dimension $m$, a natural control strategy that uses a lattice $\Lambda$ of covering radius $r_c$ and packing radius $r_p$ is as follows. The first controller uses the input $u_1^m$ to force the state $x_0^m$ to the lattice point nearest to $x_0^m$. The second controller estimates $x_1^m$ to be the lattice point nearest to $y_2^m$. For analytical ease, we instead consider an inferior strategy where the second controller estimates $x_1^m$ to be a lattice point only if the lattice point lies within the sphere of radius $r_p$ around $y_2^m$. If no lattice point exists in the sphere, the second controller estimates $x_1^m$ to be $y_2^m$, the received vector itself. The actions $\gamma_1(\cdot)$ of $C_1$ and $\gamma_2(\cdot)$ of $C_2$ are therefore given by

$\gamma_1(x_0^m) = -x_0^m + \arg\min_{x_1^m \in \Lambda} \|x_1^m - x_0^m\|^2$,

$\gamma_2(y_2^m) = \tilde{x}_1^m$ if $\exists\, \tilde{x}_1^m \in \Lambda$ s.t. $\|y_2^m - \tilde{x}_1^m\|^2 < r_p^2$, and $\gamma_2(y_2^m) = y_2^m$ otherwise.

The event where there exists no such $\tilde{x}_1^m \in \Lambda$ is referred to as a decoding failure. In the following, we denote $\gamma_2(y_2^m)$ by $\hat{x}_1^m$, the estimate of $x_1^m$.

Theorem 1: Using a lattice-based strategy (as described above) for $W(m, k^2, \sigma_0^2)$, with $r_c$ and $r_p$ the covering and packing radii of the lattice, the total average cost is upper bounded by

$\bar{J}^{(\gamma)}(m, k^2, \sigma_0^2) \le \inf_{P \ge 0}\, k^2 P + \left( \sqrt{\psi(m+2, r_p)} + \sqrt{\tfrac{P}{\xi^2}} \sqrt{\psi(m, r_p)} \right)^2$,

where $\xi = \frac{r_c}{r_p}$ is the packing-covering ratio of the lattice, $P = \frac{r_c^2}{m}$, and $\psi(m, r) = \Pr(\|Z^m\| \ge r)$. The following looser bound also holds:

$\bar{J}^{(\gamma)}(m, k^2, \sigma_0^2) \le \inf_{P > \xi^2}\, k^2 P + \left(1 + \sqrt{\tfrac{P}{\xi^2}}\right)^2 e^{-\frac{mP}{2\xi^2} + \frac{m+2}{2}\left(1 + \ln \frac{P}{\xi^2}\right)}$.

Remark: The latter, looser bound is useful for analytical manipulations when proving explicit bounds on the ratio of the upper and lower bounds in Section V.

Proof: Note that because $\Lambda$ has covering radius $r_c$, $\|x_1^m - x_0^m\|^2 \le r_c^2$. Thus the first-stage cost is bounded above by $\frac{1}{m} k^2 r_c^2$.
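For $m = 1$ the lattice is a scaled integer grid $\delta\mathbb{Z}$ (with $r_p = r_c = \delta/2$, so $\xi = 1$, and the nearest lattice point to $y_2$ is always within $r_p$, so decoding failure never occurs). A hedged sketch comparing a Monte Carlo estimate of the strategy's cost against the first bound of Theorem 1 at $P = r_c^2$ (the parameter values are illustrative; the closed forms for $\psi(1, r)$ and $\psi(3, r)$ are the standard chi-distribution tail formulas):

```python
import math
import numpy as np

def psi(m, r):
    """psi(m, r) = Pr(||Z^m|| >= r) for Z^m ~ N(0, I_m); closed forms for m = 1, 3."""
    if m == 1:
        return math.erfc(r / math.sqrt(2))
    if m == 3:
        return math.erfc(r / math.sqrt(2)) + math.sqrt(2 / math.pi) * r * math.exp(-r**2 / 2)
    raise NotImplementedError

# Scalar lattice delta*Z: r_p = r_c = delta/2, so xi = 1 and P = r_c^2.
k, sigma0, delta = 0.2, 5.0, 3.0
r_p = delta / 2
P = r_p**2  # xi = 1

# Theorem 1 upper bound evaluated at this P (no infimum taken).
bound = k**2 * P + (math.sqrt(psi(3, r_p)) + math.sqrt(P) * math.sqrt(psi(1, r_p)))**2

# Monte Carlo cost of the strategy itself.
rng = np.random.default_rng(1)
x0 = rng.normal(0, sigma0, 200_000)
x1 = np.round(x0 / delta) * delta        # first controller: quantize to the lattice
y2 = x1 + rng.normal(0, 1, x0.size)
xhat1 = np.round(y2 / delta) * delta     # nearest lattice point (always within r_p in 1-D)
cost = float(np.mean(k**2 * (x1 - x0)**2 + (x1 - xhat1)**2))
print(cost, bound)  # the simulated cost stays below the bound
```

As expected, the simulated cost lies below the Theorem 1 bound, which is loose because it charges the worst-case error $(\|z\| + r_p)^2$ on every error event.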
A tighter bound can be derived for a specific lattice and finite $m$ (for example, for $m = 1$, the first-stage cost is approximately $\frac{k^2 r_c^2}{3}$ if $r_c^2 \ll \sigma_0^2$, because the distribution of $x_0^m$ conditioned on it lying in any of the quantization bins is approximately uniform, at least for the most likely bins). For the second stage, observe that

$\mathbb{E}_{X_1^m, Z^m}\left[\|X_1^m - \hat{X}_1^m\|^2\right] = \mathbb{E}_{X_1^m}\left[\mathbb{E}_{Z^m}\left[\|X_1^m - \hat{X}_1^m\|^2 \,\middle|\, X_1^m\right]\right]$. (3)

Denote by $\mathcal{E}_m$ the event $\{\|Z^m\|^2 \ge r_p^2\}$. Observe that under the event $\mathcal{E}_m^c$, $\hat{X}_1^m = X_1^m$, resulting in zero second-stage cost. Thus,

$\mathbb{E}_{Z^m}\left[\|X_1^m - \hat{X}_1^m\|^2 \,\middle|\, X_1^m\right] = \mathbb{E}_{Z^m}\left[\|X_1^m - \hat{X}_1^m\|^2 \mathbb{1}_{\{\mathcal{E}_m\}} \,\middle|\, X_1^m\right] + \mathbb{E}_{Z^m}\left[\|X_1^m - \hat{X}_1^m\|^2 \mathbb{1}_{\{\mathcal{E}_m^c\}} \,\middle|\, X_1^m\right] = \mathbb{E}_{Z^m}\left[\|X_1^m - \hat{X}_1^m\|^2 \mathbb{1}_{\{\mathcal{E}_m\}} \,\middle|\, X_1^m\right]$.

We now bound the squared error under the error event $\mathcal{E}_m$, when either $x_1^m$ is decoded erroneously, or there is a decoding failure. If $x_1^m$ is decoded erroneously to a lattice point $\tilde{x}_1^m \ne x_1^m$, the squared error can be bounded as follows:

$\|x_1^m - \tilde{x}_1^m\|^2 = \|x_1^m - y_2^m + y_2^m - \tilde{x}_1^m\|^2 \le \left(\|x_1^m - y_2^m\| + \|y_2^m - \tilde{x}_1^m\|\right)^2 \le \left(\|z^m\| + r_p\right)^2$.

If $x_1^m$ is decoded as $y_2^m$, the squared error is simply $\|z^m\|^2$, which we also upper bound by $(\|z^m\| + r_p)^2$. Thus, under event $\mathcal{E}_m$, the squared error $\|x_1^m - \hat{x}_1^m\|^2$ is bounded above by $(\|z^m\| + r_p)^2$, and hence

$\mathbb{E}_{Z^m}\left[\|X_1^m - \hat{X}_1^m\|^2 \,\middle|\, X_1^m\right] \le \mathbb{E}_{Z^m}\left[(\|Z^m\| + r_p)^2 \mathbb{1}_{\{\mathcal{E}_m\}} \,\middle|\, X_1^m\right] \overset{(a)}{=} \mathbb{E}_{Z^m}\left[(\|Z^m\| + r_p)^2 \mathbb{1}_{\{\mathcal{E}_m\}}\right]$, (4)

where $(a)$ uses the fact that the pair $(Z^m, \mathbb{1}_{\{\mathcal{E}_m\}})$ is independent of $X_1^m$. Now, let $P = \frac{r_c^2}{m}$, so that the first-stage cost is at most $k^2 P$. The following lemma helps us derive the upper bound.

Fig. 3. A pictorial representation of the proof of the lower bound, assuming $\sigma_0^2 = 30$ (axes: power $P$ vs. MMSE). The solid curves show the vector lower bound of [15] for various values of the observation noise variance $\sigma_G^2 \in \{1, 1.25^2, 2.1^2, 3.0^2, 3.9^2\}$. Conceptually, multiplying these curves by the probability of that channel behavior yields the shadow curves for the particular $\sigma_G^2$, shown by dashed curves. The scalar lower bound is then obtained by taking the maximum of these shadow curves. The circles at points along the scalar bound curve indicate the optimizing value of $\sigma_G$ for obtaining that point on the bound.

Lemma 1: For a given lattice with $r_p^2 = \frac{r_c^2}{\xi^2} = \frac{mP}{\xi^2}$, the following bound holds:

$\frac{1}{m} \mathbb{E}_{Z^m}\left[(\|Z^m\| + r_p)^2 \mathbb{1}_{\{\mathcal{E}_m\}}\right] \le \left( \sqrt{\psi(m+2, r_p)} + \sqrt{\tfrac{P}{\xi^2}} \sqrt{\psi(m, r_p)} \right)^2$.

The following (looser) bound also holds as long as $P > \xi^2$:

$\frac{1}{m} \mathbb{E}_{Z^m}\left[(\|Z^m\| + r_p)^2 \mathbb{1}_{\{\mathcal{E}_m\}}\right] \le \left(1 + \sqrt{\tfrac{P}{\xi^2}}\right)^2 e^{-\frac{mP}{2\xi^2} + \frac{m+2}{2}\left(1 + \ln \frac{P}{\xi^2}\right)}$.

Proof: See Appendix I.

The theorem now follows from (3), (4) and Lemma 1.

IV. LOWER BOUNDS ON THE COST

Bansal and Basar [3] use information-theoretic techniques related to rate-distortion and channel capacity to show the optimality of linear strategies in a modified version of Witsenhausen's counterexample where the cost function does not contain a product of two decision variables. Following the same spirit, in [15] we derive the following lower bound for Witsenhausen's counterexample itself.

Theorem 2: For $W(m, k^2, \sigma_0^2)$, if a strategy $\gamma(\cdot)$ has average power $\frac{1}{m}\mathbb{E}_{X_0^m}\left[\|U_1^m\|^2\right] = P$, the following lower bound holds on the second-stage cost:

$\bar{J}_2^{(\gamma)}(m, k^2, \sigma_0^2) \ge \left( \left( \sqrt{\kappa(P, \sigma_0^2)} - \sqrt{P} \right)^+ \right)^2$,

where $(\cdot)^+$ is shorthand for $\max(\cdot, 0)$ and

$\kappa(P, \sigma_0^2) = \frac{\sigma_0^2}{\sigma_0^2 + P + 2\sigma_0\sqrt{P} + 1}$. (5)

The following lower bound thus holds on the total cost:

$\bar{J}^{(\gamma)}(m, k^2, \sigma_0^2) \ge \inf_{P \ge 0}\, k^2 P + \left( \left( \sqrt{\kappa(P, \sigma_0^2)} - \sqrt{P} \right)^+ \right)^2$.

Proof: We refer the reader to [15] for the full proof.
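Theorem 2's bound is straightforward to evaluate numerically. A sketch via grid search over $P$, evaluated at the introduction's example point $k = 0.01$, $\sigma_0 = 500$ (the grid and tolerances are ours; note that this dimension-independent bound is looser than the finite-dimensional value $3.170 \times 10^{-4}$ reported earlier, consistent with Theorem 3 below being tighter):

```python
import numpy as np

def kappa(P, s0_sq):
    # MMSE of the optimal scaling scheme: sigma0^2 / ((sigma0 + sqrt(P))^2 + 1),
    # i.e. equation (5) with the denominator expanded.
    return s0_sq / ((np.sqrt(s0_sq) + np.sqrt(P))**2 + 1.0)

def theorem2_lower_bound(k2, s0_sq):
    """inf_P k^2 P + ((sqrt(kappa) - sqrt(P))^+)^2, by grid search over P."""
    P = np.concatenate(([0.0], np.logspace(-8, 4, 4000)))
    second = np.maximum(np.sqrt(kappa(P, s0_sq)) - np.sqrt(P), 0.0)**2
    return float(np.min(k2 * P + second))

lb = theorem2_lower_bound(k2=1e-4, s0_sq=500.0**2)
print(lb)  # a dimension-independent lower bound on the optimal cost
```

The infimum is attained near $P \approx \kappa(P, \sigma_0^2)$, where the second-stage term vanishes and only the first-stage term $k^2 P$ remains.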
We outline it here because these ideas are used in the derivation of the new lower bound in Theorem 3. Using a triangle-inequality argument, we show

$\sqrt{\tfrac{1}{m}\mathbb{E}_{X_0^m, Z^m}\left[\|X_0^m - \hat{X}_1^m\|^2\right]} \le \sqrt{\tfrac{1}{m}\mathbb{E}_{X_0^m, Z^m}\left[\|X_0^m - X_1^m\|^2\right]} + \sqrt{\tfrac{1}{m}\mathbb{E}_{X_0^m, Z^m}\left[\|X_1^m - \hat{X}_1^m\|^2\right]}$. (6)

The first term on the RHS is $\sqrt{P}$. It therefore suffices to lower bound the term on the LHS to obtain a lower bound on $\mathbb{E}_{X_0^m, Z^m}\left[\|X_1^m - \hat{X}_1^m\|^2\right]$. To that end, we interpret $\hat{X}_1^m$ as an estimate of $X_0^m$, which makes this a problem of transmitting a source across a channel. For an iid Gaussian source transmitted across a memoryless power-constrained additive-noise Gaussian channel (with one channel use per source symbol), the optimal strategy that minimizes the mean-square error is merely scaling the source symbol so that the average power constraint is met [47]. The estimation at the second controller is then merely the linear MMSE estimation of $X_0^m$, and the obtained MMSE is $\kappa(P, \sigma_0^2)$. The theorem now follows from (6).

Observe that the lower-bound expression is the same for all vector lengths. In the following, large-deviation arguments [48], [49] (called sphere-packing-style arguments for historical reasons) are extended following [41]–[43] to a joint source-channel setting where the distortion measure is unbounded. The obtained bounds are tighter than those in Theorem 2 and depend explicitly on the vector length $m$.

Theorem 3: For $W(m, k^2, \sigma_0^2)$, if a strategy $\gamma(\cdot)$ has average power $\frac{1}{m}\mathbb{E}_{X_0^m}\left[\|U_1^m\|^2\right] = P$, the following lower bound holds on the second-stage cost for any choice of $\sigma_G^2 \ge 1$ and $L > 0$:

$\bar{J}_2^{(\gamma)}(m, k^2, \sigma_0^2) \ge \eta(P, \sigma_0^2, \sigma_G^2, L)$,

where

$\eta(P, \sigma_0^2, \sigma_G^2, L) = \frac{\sigma_G^m}{c_m(L)} \exp\left(-\frac{mL^2(\sigma_G^2 - 1)}{2}\right) \left( \left( \sqrt{\kappa_2(P, \sigma_0^2, \sigma_G^2, L)} - \sqrt{P} \right)^+ \right)^2$,

$\kappa_2(P, \sigma_0^2, \sigma_G^2, L) := \frac{\sigma_0^2 \sigma_G^2}{c_m^{\frac{2}{m}}(L)\, e^{1 - d_m(L)} \left( (\sigma_0 + \sqrt{P})^2 + d_m(L)\,\sigma_G^2 \right)}$,

$c_m(L) := \frac{1}{\Pr(\|Z^m\|^2 \le mL^2)} = \left(1 - \psi(m, L\sqrt{m})\right)^{-1}$,

$d_m(L) := \frac{\Pr(\|Z^{m+2}\|^2 \le mL^2)}{\Pr(\|Z^m\|^2 \le mL^2)} = \frac{1 - \psi(m+2, L\sqrt{m})}{1 - \psi(m, L\sqrt{m})}$, with $0 < d_m(L) < 1$,

and $\psi(m, r) = \Pr(\|Z^m\| \ge r)$. Thus the following lower bound holds on the total cost:

$\bar{J}_{\min}(m, k^2, \sigma_0^2) \ge \inf_{P \ge 0}\, k^2 P + \eta(P, \sigma_0^2, \sigma_G^2, L)$, (7)

for any choice of $\sigma_G^2 \ge 1$ and $L > 0$ (the choice can depend on $P$). Further, these bounds are at least as tight as those of Theorem 2 for all values of $k$ and $\sigma_0^2$.

Proof: From Theorem 2, for a given $P$, a lower bound on the average second-stage cost is $\left(\left(\sqrt{\kappa} - \sqrt{P}\right)^+\right)^2$. We derive another lower bound, equal to the expression for $\eta(P, \sigma_0^2, \sigma_G^2, L)$. The high-level intuition behind this lower bound is presented in Fig. 3. Define $\mathcal{S}_L^G := \{z^m : \|z^m\|^2 \le mL^2\sigma_G^2\}$, and use subscripts to denote which probability model is being used for the second-stage observation noise: $Z$ denotes white Gaussian noise of variance 1, while $G$ denotes white Gaussian noise of variance $\sigma_G^2 \ge 1$. Then

$\mathbb{E}_{X_0^m, Z^m}\left[J_2^{(\gamma)}(X_0^m, Z^m)\right] = \int_{z^m}\int_{x_0^m} J_2^{(\gamma)}(x_0^m, z^m) f_0(x_0^m) f_Z(z^m)\, dx_0^m\, dz^m \ge \int_{z^m \in \mathcal{S}_L^G} \left( \int_{x_0^m} J_2^{(\gamma)}(x_0^m, z^m) f_0(x_0^m)\, dx_0^m \right) f_Z(z^m)\, dz^m = \int_{z^m \in \mathcal{S}_L^G} \left( \int_{x_0^m} J_2^{(\gamma)}(x_0^m, z^m) f_0(x_0^m)\, dx_0^m \right) \frac{f_Z(z^m)}{f_G(z^m)}\, f_G(z^m)\, dz^m$. (8)

The ratio of the two probability density functions is given by

$\frac{f_Z(z^m)}{f_G(z^m)} = \frac{(2\pi)^{-\frac{m}{2}} e^{-\frac{\|z^m\|^2}{2}}}{(2\pi\sigma_G^2)^{-\frac{m}{2}} e^{-\frac{\|z^m\|^2}{2\sigma_G^2}}} = \sigma_G^m\, e^{-\frac{\|z^m\|^2}{2}\left(1 - \frac{1}{\sigma_G^2}\right)}$.

Observe that for $z^m \in \mathcal{S}_L^G$, $\|z^m\|^2 \le mL^2\sigma_G^2$. Using $\sigma_G^2 \ge 1$, we obtain

$\frac{f_Z(z^m)}{f_G(z^m)} \ge \sigma_G^m\, e^{-\frac{mL^2\sigma_G^2}{2}\left(1 - \frac{1}{\sigma_G^2}\right)} = \sigma_G^m\, e^{-\frac{mL^2(\sigma_G^2 - 1)}{2}}$. (9)

Using (8) and (9),

$\mathbb{E}_{X_0^m, Z^m}\left[J_2^{(\gamma)}(X_0^m, Z^m)\right] \ge \sigma_G^m e^{-\frac{mL^2(\sigma_G^2 - 1)}{2}} \int_{z^m \in \mathcal{S}_L^G} \left( \int_{x_0^m} J_2^{(\gamma)}(x_0^m, z^m) f_0(x_0^m)\, dx_0^m \right) f_G(z^m)\, dz^m = \sigma_G^m e^{-\frac{mL^2(\sigma_G^2 - 1)}{2}}\, \mathbb{E}_{X_0^m, Z_G^m}\left[J_2^{(\gamma)}(X_0^m, Z_G^m) \mathbb{1}_{\{Z_G^m \in \mathcal{S}_L^G\}}\right] = \sigma_G^m e^{-\frac{mL^2(\sigma_G^2 - 1)}{2}}\, \mathbb{E}_{X_0^m, Z_G^m}\left[J_2^{(\gamma)}(X_0^m, Z_G^m) \,\middle|\, Z_G^m \in \mathcal{S}_L^G\right] \Pr(Z_G^m \in \mathcal{S}_L^G)$. (10)

Analyzing the probability term in (10),

$\Pr(Z_G^m \in \mathcal{S}_L^G) = \Pr\left(\|Z_G^m\|^2 \le mL^2\sigma_G^2\right) = \Pr\left(\left\|\tfrac{Z_G^m}{\sigma_G}\right\|^2 \le mL^2\right) = 1 - \Pr\left(\left\|\tfrac{Z_G^m}{\sigma_G}\right\|^2 > mL^2\right) = 1 - \psi(m, L\sqrt{m}) = \frac{1}{c_m(L)}$, (11)

because $\frac{Z_G^m}{\sigma_G} \sim \mathcal{N}(0, I_m)$. From (10) and (11),

$\mathbb{E}_{X_0^m, Z^m}\left[J_2^{(\gamma)}(X_0^m, Z^m)\right] \ge \frac{\sigma_G^m}{c_m(L)}\, e^{-\frac{mL^2(\sigma_G^2 - 1)}{2}}\, \mathbb{E}_{X_0^m, Z_G^m}\left[J_2^{(\gamma)}(X_0^m, Z_G^m) \,\middle|\, Z_G^m \in \mathcal{S}_L^G\right]$. (12)

We now need the following lemma, which connects the new finite-length lower bound to the infinite-length lower bound of [15].

Lemma 2:

$\mathbb{E}_{X_0^m, Z_G^m}\left[J_2^{(\gamma)}(X_0^m, Z_G^m) \,\middle|\, Z_G^m \in \mathcal{S}_L^G\right] \ge \left( \left( \sqrt{\kappa_2(P, \sigma_0^2, \sigma_G^2, L)} - \sqrt{P} \right)^+ \right)^2$, for any $L > 0$.

Proof: See Appendix II.

The lower bound on the total average cost now follows from (12) and Lemma 2.

We now verify that $d_m(L) \in (0, 1)$. That $d_m(L) > 0$ is clear from the definition. That $d_m(L) < 1$ holds because $\{z^{m+2} : \|z^{m+2}\|^2 \le mL^2\} \subset \{z^{m+2} : \|z^m\|^2 \le mL^2\}$, i.e., a sphere sits inside a cylinder.

Finally, we verify that this new lower bound is at least as tight as the one in Theorem 2. Choosing $\sigma_G^2 = 1$ in the expression for $\eta(P, \sigma_0^2, \sigma_G^2, L)$,

$\sup_{\sigma_G^2 \ge 1,\, L > 0} \eta(P, \sigma_0^2, \sigma_G^2, L) \ge \sup_{L > 0} \frac{1}{c_m(L)} \left( \left( \sqrt{\kappa_2(P, \sigma_0^2, 1, L)} - \sqrt{P} \right)^+ \right)^2$.

Now notice that $c_m(L)$ and $d_m(L)$ converge to 1 as $L \to \infty$.
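The quantities $c_m(L)$ and $d_m(L)$ are ratios of chi-square CDFs and are easy to compute. A sketch checking $0 < d_m(L) < 1$ and the convergence of both quantities to 1 as $L \to \infty$ (SciPy's `chi2` is used; the test values of $m$ and $L$ are ours):

```python
from scipy.stats import chi2

def c_m(m, L):
    # c_m(L) = 1 / Pr(||Z^m||^2 <= m L^2); ||Z^m||^2 is chi-square with m dof.
    return 1.0 / chi2.cdf(m * L**2, df=m)

def d_m(m, L):
    # d_m(L) = Pr(||Z^{m+2}||^2 <= m L^2) / Pr(||Z^m||^2 <= m L^2)
    return chi2.cdf(m * L**2, df=m + 2) / chi2.cdf(m * L**2, df=m)

for m in (1, 2, 10):
    assert 0.0 < d_m(m, 2.0) < 1.0           # as verified in the text
    assert c_m(m, 2.0) >= 1.0
    # both converge to 1 as L grows
    assert abs(c_m(m, 8.0) - 1.0) < 1e-6
    assert abs(d_m(m, 8.0) - 1.0) < 1e-6

print(c_m(1, 2.0), d_m(1, 2.0))
```

In the $\sigma_G^2 = 1$, $L \to \infty$ limit these factors disappear and $\kappa_2$ reduces to $\kappa$, recovering Theorem 2, as argued above.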
Thus κ₂(P, σ_0², 1, L) → κ(P, σ_0²) as L → ∞, and therefore η(P, σ_0², σ_G², L) is lower bounded by ((√κ − √P)⁺)², the lower bound in Theorem 2.

V. COMBINATION OF LINEAR AND LATTICE-BASED STRATEGIES ATTAINS WITHIN A CONSTANT FACTOR OF THE OPTIMAL COST

Theorem 4 (Constant-factor optimality): The costs for W(m, k², σ_0²) are bounded as follows:

  inf_{P≥0} sup_{σ_G²≥1, L>0} [ k²P + η(P, σ_0², σ_G², L) ]
    ≤ J̄_min(m, k², σ_0²)
    ≤ μ ( inf_{P≥0} sup_{σ_G²≥1, L>0} [ k²P + η(P, σ_0², σ_G², L) ] ),

where μ = 100ξ², ξ is the packing-covering ratio of any lattice in R^m, and η(·) is as defined in Theorem 3. For any m, μ < 1600. Further, depending on the (m, k², σ_0²) values, the upper bound can be attained by lattice-based quantization strategies or by linear strategies. For m = 1, a numerical calculation (MATLAB code available at [50]) shows that μ < 8 (see Fig. 5).

Proof: Let P* denote the power P in the lower bound in Theorem 3. We show here that for any choice of P*, the ratio of the upper and lower bounds is bounded. Consider the two simple linear strategies of zero-forcing (u_1^m = −x_0^m) and zero-input (u_1^m = 0), each followed by LLSE estimation at C_2. It is easy to see [15] that the average costs attained by these two strategies are k²σ_0² and σ_0²/(σ_0² + 1) < 1, respectively. An upper bound is obtained using the best among the two linear strategies and the lattice-based quantization strategy.

Fig. 4. The ratio of the upper and lower bounds for the scalar Witsenhausen problem (top), and the 2-D Witsenhausen problem (bottom, using the hexagonal lattice with ξ = 2/√3), for a range of values of k and σ_0. The ratio is bounded above by 17 for the scalar problem, and by 14.75 for the 2-D problem.

Case 1: P* ≥ σ_0²/100. The first-stage cost is larger than k²σ_0²/100. Consider the upper bound of k²σ_0² obtained by zero-forcing. The ratio of the upper and the lower bound is then no larger than 100.

Case 2: P* < σ_0²/100 and σ_0² < 16. Using the bound from Theorem 2 (which is a special case of the bound in Theorem 3),

  κ = σ_0² / ((σ_0 + √P*)² + 1)
    (P* < σ_0²/100) ≥ σ_0² / (σ_0²(1 + 1/√100)² + 1)
    (σ_0² < 16)     ≥ σ_0² / (16(1 + 1/√100)² + 1) = σ_0²/20.36 ≥ σ_0²/21.

Thus, for σ_0² < 16 and P* ≤ σ_0²/100,

  J̄_min ≥ ( (√κ − √P*)⁺ )² ≥ σ_0² (1/√21 − 1/√100)² ≈ 0.014 σ_0² ≥ σ_0²/72.

Using the zero-input upper bound of σ_0²/(σ_0² + 1), the ratio of the upper and lower bounds is at most 72/(σ_0² + 1) ≤ 72.

Case 3: P* ≤ σ_0²/100, σ_0² ≥ 16, and P* ≤ 1/2. In this case,

  κ = σ_0² / ((σ_0 + √P*)² + 1)
    (P* ≤ 1/2) ≥ σ_0² / ((σ_0 + √0.5)² + 1)
    (a)        ≥ 16 / ((√16 + √0.5)² + 1) ≈ 0.6909 ≥ 0.69,

where (a) uses σ_0² ≥ 16 and the observation that x²/((x + b)² + 1) = 1/((1 + b/x)² + 1/x²) is an increasing function of x for x, b > 0. Thus,

  ( (√κ − √P*)⁺ )² ≥ ( (√0.69 − √0.5)⁺ )² ≈ 0.0153 ≥ 0.015.

Using the upper bound of σ_0²/(σ_0² + 1) < 1, the ratio of the upper and lower bounds is smaller than 1/0.015 < 67.

Fig. 5. An exact calculation of the first- and second-stage costs yields an improved maximum ratio smaller than 8 for the scalar Witsenhausen problem.

Case 4: σ_0² > 16 and 1/2 < P* ≤ σ_0²/100. Using L = 2 in the lower bound,

  c_m(2) = 1/Pr(||Z^m||² ≤ mL²) = 1/(1 − Pr(||Z^m||² > mL²))   (Markov's ineq.)
    ≤ 1/(1 − m/(mL²)) = 4/3   (L = 2).

Similarly,

  d_m(2) = Pr(||Z^{m+2}||² ≤ mL²) / Pr(||Z^m||² ≤ mL²) ≥ Pr(||Z^{m+2}||² ≤ mL²)
    = 1 − Pr(||Z^{m+2}||² > mL²)
    (Markov's ineq.) ≥ 1 − (m + 2)/(mL²) = 1 − (1 + 2/m)/4
    (m ≥ 1) ≥ 1 − 3/4 = 1/4.

In the bound, we are free to use any σ_G² ≥ 1. Using σ_G² = 6P* > 1,

  κ₂ = σ_G² σ_0² / ( ((σ_0 + √P*)² + d_m(2)σ_G²) c_m^{2/m}(2) e^{1−d_m(2)} )
    (a) ≥ 6P* σ_0² / ( ((σ_0 + σ_0/10)² + 6σ_0²/100) (4/3)^{2/m} e^{3/4} )
    (m ≥ 1) ≥ 1.255 P*,

where (a) uses σ_G² = 6P*, P* < σ_0²/100, c_m(2) ≤ 4/3 and 1 > d_m(2) ≥ 1/4. Thus,

  ( (√κ₂ − √P*)⁺ )² ≥ P* (√1.255 − 1)² ≥ P*/70.   (13)

Now, using the lower bound on the total cost from Theorem 3 and substituting L = 2,

  J̄_min(m, k², σ_0²)
    ≥ k²P* + (σ_G^m/c_m(2)) exp(−mL²(σ_G² − 1)/2) ( (√κ₂ − √P*)⁺ )²
    (σ_G² = 6P*) ≥ k²P* + ((6P*)^{m/2}/c_m(2)) exp(−4m(6P* − 1)/2) (P*/70)
    (a) ≥ k²P* + 3^{m/2} (3/4) e^{2m} e^{−12P*m} (1/(70·2))
    (m ≥ 1) ≥ k²P* + (3·3·e²/(4·70·2)) e^{−12mP*} > k²P* + (1/9) e^{−12mP*},   (14)

where (a) uses c_m(2) ≤ 4/3 and P* ≥ 1/2.

We loosen the lattice-based upper bound from Theorem 1 and bring it into a form similar to (14). Here, P is part of the optimization:

  J̄_min(m, k², σ_0²)
    ≤ inf_{P>ξ²} k²P + (1 + √(P/ξ²))² e^{−mP/(2ξ²) + ((m+2)/2)(1 + ln(P/ξ²))}
    ≤ inf_{P>ξ²} k²P + (1/9) e^{−0.5mP/ξ² + ((m+2)/2)(1 + ln(P/ξ²)) + 2 ln(1 + √(P/ξ²)) + ln 9}
    = inf_{P>ξ²} k²P + (1/9) e^{−m [ 0.5P/ξ² − ((m+2)/(2m))(1 + ln(P/ξ²)) − (2/m) ln(1 + √(P/ξ²)) − (ln 9)/m ]}
    = inf_{P>ξ²} k²P + (1/9) e^{−0.12mP/ξ²} e^{−m [ 0.38P/ξ² − ((1 + 2/m)/2)(1 + ln(P/ξ²)) − (2/m) ln(1 + √(P/ξ²)) − (ln 9)/m ]}
    (m ≥ 1) ≤ inf_{P>ξ²} k²P + (1/9) e^{−0.12mP/ξ²} e^{−m [ 0.38P/ξ² − (3/2)(1 + ln(P/ξ²)) − 2 ln(1 + √(P/ξ²)) − ln 9 ]}
    ≤ inf_{P≥34ξ²} k²P + (1/9) e^{−0.12mP/ξ²},   (15)

where the last inequality follows from the fact that 0.38P/ξ² > (3/2)(1 + ln(P/ξ²)) + 2 ln(1 + √(P/ξ²)) + ln 9 for P/ξ² > 34. This can be checked easily by plotting it.
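The inequality behind the last step of (15) can indeed be checked numerically. The short script below (an illustration, not part of the proof) evaluates g(b) = 0.38b² − (3/2)(1 + ln b²) − 2 ln(1 + b) − ln 9 with b = √(P/ξ²), and confirms it is positive from b = √34 onward, matching the symbolic argument of footnote 8.

```python
import math

def g(b):
    # g(b) = 0.38 b^2 - (3/2)(1 + ln b^2) - 2 ln(1 + b) - ln 9, with b = sqrt(P / xi^2)
    return 0.38 * b * b - 1.5 * (1 + math.log(b * b)) - 2 * math.log(1 + b) - math.log(9)

def g_prime(b):
    # g'(b) = 0.76 b - 3/b - 2/(1 + b)
    return 0.76 * b - 3.0 / b - 2.0 / (1 + b)

b0 = math.sqrt(34)
print(f"g(sqrt(34))  = {g(b0):.4f}")       # ~ 0.09, positive
print(f"g'(sqrt(34)) = {g_prime(b0):.4f}") # ~ 3.62, positive
# g''(b) = 0.76 + 3/b^2 + 2/(1+b)^2 > 0, so g is convex; with g'(b0) > 0 it is
# increasing beyond b0. A grid check confirms positivity on [sqrt(34), ~60]:
assert g(b0) > 0 and g_prime(b0) > 0
assert min(g(b0 + 0.01 * k) for k in range(5500)) > 0
```

Since g is convex with positive value and positive derivative at √34, the grid check is just a belt-and-braces confirmation of the symbolic argument.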
Using P = 100ξ²P* ≥ 50ξ² > 34ξ² (since P* ≥ 1/2) in (15),

  J̄_min(m, k², σ_0²) ≤ k²·100ξ²P* + (1/9) e^{−m·0.12·100ξ²P*/ξ²} = k²·100ξ²P* + (1/9) e^{−12mP*}.   (16)

Using (14) and (16), the ratio of the upper and lower bounds is bounded for all m, since

  μ ≤ ( k²·100ξ²P* + (1/9)e^{−12mP*} ) / ( k²P* + (1/9)e^{−12mP*} ) ≤ (k²·100ξ²P*)/(k²P*) = 100ξ².   (17)

For m = 1, ξ = 1, and thus in the proof the ratio μ ≤ 100. For large m, ξ ≈ 2 [46], and μ ≲ 400. For arbitrary m, using the recursive construction in [51, Theorem 8.18], ξ ≤ 4, and thus μ ≤ 1600 regardless of m.

Though the proof above succeeds in showing that the ratio is uniformly bounded by a constant, it is not very insightful and the constant is large. However, since the underlying vector bound can be tightened (as shown in [32]), it is not worth improving the proof for increased elegance at this time. The important thing is that such a uniform constant exists. A numerical evaluation of the upper and lower bounds (of Theorems 1 and 3, respectively) shows that the ratio is smaller than 17 for m = 1 (see Fig. 4). A precise calculation of the cost of the quantization strategy improves the upper bound to yield a maximum ratio smaller than 8 (see Fig. 5).

A simple grid lattice has a packing-covering ratio ξ = √m. Therefore, while the grid lattice has the best possible packing-covering ratio of 1 in the scalar case, it has a rather large packing-covering ratio of √2 (≈ 1.41) for m = 2. On the other hand, a hexagonal lattice (for m = 2) has an improved packing-covering ratio of 2/√3 ≈ 1.15. In contrast with m = 1, where the ratio of the upper and lower bounds of Theorems 1 and 3 is approximately 17, a hexagonal lattice yields a ratio smaller than 14.75, despite having a larger packing-covering ratio. This is a consequence of the tightening of the sphere-packing lower bound (Theorem 3) as m gets large (see footnote 9).

Footnote 8: It can also be verified symbolically by examining the expression g(b) = 0.38b² − (3/2)(1 + ln b²) − 2 ln(1 + b) − ln 9, whose derivative is g′(b) = 0.76b − 3/b − 2/(1 + b) and whose second derivative is g″(b) = 0.76 + 3/b² + 2/(1 + b)² > 0. Thus g(·) is convex-∪. Further, g′(√34) ≈ 3.62 > 0 and g(√34) ≈ 0.09, and so g(b) > 0 whenever b ≥ √34.

VI. DISCUSSION OF NUMERICAL EXPLORATIONS AND CONCLUSIONS

Though lattice-based quantization strategies allow us to get within a constant factor of the optimal cost for the vector Witsenhausen problem, they are not optimal. This is known for both the scalar case [5] and the infinite-length case [15]. It is shown in [15] that the "slopey-quantization" strategy of Lee, Lau and Ho [5], which is believed to be very close to optimal in the scalar case, can be viewed as an instance of a linear scaling followed by a dirty-paper-coding (DPC) strategy. Such DPC-based strategies are also the best known strategies in the asymptotic infinite-dimensional case, requiring optimal power P to attain 0 asymptotic mean-square error in the estimation of x_1^m, and attaining costs within a factor of 1.3 of the optimal [32] for all (k, σ_0²). This leads us to conjecture that a DPC-like strategy might be optimal for finite vector lengths as well. In the following, we numerically explore the performance of DPC-like strategies.

It is natural to ask how much there is to gain by using a DPC-based strategy over a simple quantization strategy. Notice that the DPC strategy gains not only from the slopey quantization, but also from the MMSE estimation at the second controller. In Fig. 6, we eliminate the latter advantage by considering first a uniform quantization-based strategy with an appropriate scaling of the MLE so that it approximates the MMSE-estimation performance, and then the actual MMSE-estimation strategy for uniform quantization.
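The MLE-versus-MMSE comparison can be illustrated in a self-contained way. The sketch below simulates a scalar uniform-quantization first stage and compares nearest-point (MLE-style) decoding with the exact posterior mean (MMSE) at the second controller; the step size and σ_0 are hypothetical illustration parameters, not the values used to produce Fig. 6.

```python
import math
import random

def second_stage_costs(sigma0=5.0, step=4.0, n=20_000, seed=0):
    """Compare E[(x1 - x1_hat)^2] for nearest-point (MLE-style) and
    posterior-mean (MMSE) estimation of a uniformly quantized state
    observed through unit-variance Gaussian noise."""
    rng = random.Random(seed)
    Phi = lambda t: 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))
    # Quantizer cells are [c - step/2, c + step/2) around centers c = j*step;
    # the prior mass of each cell under x0 ~ N(0, sigma0^2) is computed exactly.
    J = int(6 * sigma0 / step) + 2
    centers = [j * step for j in range(-J, J + 1)]
    prior = [Phi((c + step / 2) / sigma0) - Phi((c - step / 2) / sigma0)
             for c in centers]
    se_mle = se_mmse = 0.0
    for _ in range(n):
        x0 = rng.gauss(0.0, sigma0)
        x1 = step * round(x0 / step)       # quantized state after u1 = x1 - x0
        y = x1 + rng.gauss(0.0, 1.0)       # second controller's observation
        mle = step * round(y / step)       # nearest quantization point
        w = [p * math.exp(-0.5 * (y - c) ** 2) for p, c in zip(prior, centers)]
        mmse = sum(wi * c for wi, c in zip(w, centers)) / sum(w)
        se_mle += (x1 - mle) ** 2
        se_mmse += (x1 - mmse) ** 2
    return se_mle / n, se_mmse / n

mle_cost, mmse_cost = second_stage_costs()
print(f"MLE cost:  {mle_cost:.4f}")
print(f"MMSE cost: {mmse_cost:.4f}")
```

With these parameters the posterior-mean decoder attains a strictly lower second-stage cost than nearest-point decoding, mirroring the ordering seen in Fig. 6; the tradeoff is the extra computation at the second controller.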
Along the curve kσ_0 = √10, there is significant gain in using this approximate-MMSE estimation over MLE, and further gain in using MMSE estimation itself. This also shows that there is an interesting tradeoff between the complexity of the second controller and the system performance.

From Fig. 6, along the curve kσ_0 = √10, the DPC-based strategy performs only negligibly better than a quantization-based strategy with MMSE estimation. Fig. 7(a) shows that this is not true in general.

Footnote 9: Indeed, in the limit m → ∞, the ratio of the asymptotic average costs attained by a vector-quantization strategy and the vector lower bound of Theorem 2 is bounded by 4.45 [15].

Fig. 6. Ratio of the achievable costs to the scalar lower bound along kσ_0 = 10^{0.5} for various strategies: linear + slopey quantization with the optimal DPC parameter, linear + slopey quantization with a heuristic DPC parameter, quantization + MLE, linear + quantization + MLE, optimal linear, and linear + quantization + MMSE. Quantization with MMSE estimation at the second controller outperforms quantization with MLE, or even scaled MLE. For slopey quantization with the heuristic DPC parameter, the parameter α in the DPC-based scheme is borrowed from the infinite-length analysis. The figure suggests that along this path (kσ_0 = √10), the difference between optimal DPC and heuristic DPC is not substantial. However, Fig. 7(b) shows that this is not true in general.

Fig. 7. (a) shows the ratio of the cost attained by linear + quantization (with MMSE decoding) to DPC with parameter α obtained by brute-force optimization. DPC can do up to 15% better than the optimal quantization strategy.
Also, the maximum is attained along k ≈ 0.6, which is different from the k = 0.2 of the benchmark problem [5]. (b) shows the ratio of the cost attained by linear + quantization to DPC with α borrowed from the infinite-length optimization. Heuristic DPC does not outperform linear + quantization (with MMSE estimation) substantially.

A DPC-based strategy can perform up to 15% better than a simple quantization-based scheme, depending on the problem parameters. Interestingly, the advantage of using a DPC-based strategy for the case of k = 0.2, σ_0 = 5 (which is used as the benchmark case in many papers, e.g. [5], [8]) is quite small. The maximum gain of about 15% is obtained at k ≈ 10^{−0.2} ≈ 0.63 and σ_0 = 1 (and indeed, any σ_0 > 1). In the future, we suggest the community use the point (0.63, 1) as the benchmark case.

Given that there is an advantage in using a DPC-like strategy, an interesting question is whether the DPC parameter α that optimizes the DPC-based strategy's performance at infinite lengths (in [15]) gives good performance for the scalar case as well. Fig. 7(b) answers this question, at least partially, in the negative. This heuristic DPC does only slightly better than a quantization strategy with MMSE estimation, whereas other values of α do significantly better.

Finally, we observe that while uniform bin-size quantization and DPC-based strategies are designed for atypical noise behavior, atypical behavior of the initial state is better accommodated by using nonuniform bin sizes (such as those in [5], [8]). Table I compares the two. Clearly, the advantage of nonuniform slopey quantization is small, but not negligible. It would be interesting to calibrate the advantage of nonuniform bin sizes at (k, σ_0) = (0.63, 1), a maximum-gain point for uniform-bin-size slopey-quantization strategies.

TABLE I
COSTS ATTAINED FOR THE BENCHMARK CASE OF k = 0.2, σ_0 = 5.
                               linear + quantization    slopey quantization
  Lee, Lau and Ho [5]          0.1713946                0.1673132
  Li, Marden and Shamma [8]    —                        0.1670790
  This paper                   0.1715335                0.1673654

There are plenty of open problems that arise naturally. Both the lower and the upper bounds have room for improvement. The lower bound can be improved by tightening the vector lower bound of [15] (one such tightening is performed in [32]) and obtaining corresponding finite-length results using the sphere-packing tools developed here. Tightening the upper bound can be performed by using DPC-based techniques over lattices. Further, an exact analysis of the required first-stage power when using a lattice would yield an improvement (as pointed out earlier, for m = 1, (1/m)k²r_c² overestimates the required first-stage cost), especially for small m. Improved lattice designs with better packing-covering ratios would also improve the upper bound.

Perhaps a more significant set of open problems are the next steps in understanding more realistic versions of Witsenhausen's problem, specifically those that include costs on all the inputs and all the states [13], with noisy state evolution and noisy observations at both controllers. The hope is that solutions to these problems can then be used as the basis for provably good nonlinear controller synthesis for larger distributed systems. Further, the tools developed for solving these problems might help address multiuser problems in information theory, in the spirit of [52], [53].

ACKNOWLEDGMENTS

We gratefully acknowledge the support of the National Science Foundation (CNS-403427, CNS-093240, CCF-0917212 and CCF-729122), Sumitomo Electric and Samsung. We thank Amin Gohari, Bobak Nazer and Anand Sarwate for helpful discussions, and Gireeja Ranade for suggesting improvements in the paper.
APPENDIX I
PROOF OF LEMMA 1

  E_{Z^m}[ (||Z^m|| + r_p)² 1{E_m} ]
    = E[ ||Z^m||² 1{E_m} ] + r_p² Pr(E_m) + 2r_p E[ 1{E_m} ||Z^m|| ]
    (a) ≤ E[ ||Z^m||² 1{E_m} ] + r_p² Pr(E_m) + 2r_p √(E[1{E_m}]) √(E[||Z^m||² 1{E_m}])
    = ( √(E[||Z^m||² 1{E_m}]) + r_p √(Pr(E_m)) )²,   (18)

where (a) uses the Cauchy-Schwarz inequality [54, pg. 13]. We wish to express E[||Z^m||² 1{E_m}] in terms of

  ψ(m, r_p) := Pr(||Z^m|| ≥ r_p) = ∫_{||z^m|| ≥ r_p} e^{−||z^m||²/2}/(√(2π))^m dz^m.

Denote by A_m(r) := 2π^{m/2} r^{m−1}/Γ(m/2) the surface area of a sphere of radius r in R^m [55, pg. 458], where Γ(·) is the Gamma function satisfying Γ(m) = (m − 1)Γ(m − 1), Γ(1) = 1, and Γ(1/2) = √π. Dividing the space R^m into shells of thickness dr and radii r,

  E[ ||Z^m||² 1{E_m} ] = ∫_{||z^m|| ≥ r_p} ||z^m||² e^{−||z^m||²/2}/(√(2π))^m dz^m
    = ∫_{r ≥ r_p} r² e^{−r²/2}/(√(2π))^m A_m(r) dr
    = ∫_{r ≥ r_p} r² e^{−r²/2}/(√(2π))^m · 2π^{m/2} r^{m−1}/Γ(m/2) dr
    = m ∫_{r ≥ r_p} e^{−r²/2}/(√(2π))^{m+2} · 2π^{(m+2)/2} r^{m+1}/Γ((m+2)/2) dr
    = m ψ(m + 2, r_p),   (19)

where the factor m arises because Γ((m+2)/2) = (m/2)Γ(m/2). Using (18), (19), and r_p = √(mP/ξ²),

  E[ (||Z^m|| + r_p)² 1{E_m} ] ≤ m ( √(ψ(m+2, r_p)) + √(P/ξ²) √(ψ(m, r_p)) )²,

which yields the first part of Lemma 1. To obtain a closed-form upper bound, we consider P > ξ². It suffices to bound ψ(·, ·):

  ψ(m, r_p) = Pr(||Z^m||² ≥ r_p²) = Pr( exp(ρ Σ_{i=1}^m Z_i²) ≥ exp(ρ r_p²) )
    (a) ≤ E[ exp(ρ Σ_{i=1}^m Z_i²) ] e^{−ρ r_p²} = ( E[exp(ρ Z_1²)] )^m e^{−ρ r_p²}
    = e^{−ρ r_p²}/(1 − 2ρ)^{m/2}   (for 0 < ρ < 0.5),

where (a) follows from Markov's inequality, and the last equality follows from the fact that the moment generating function of a χ² random variable with one degree of freedom is 1/(1 − 2ρ)^{1/2} for ρ ∈ (0, 0.5) [56, pg. 375]. Since this bound holds for any ρ ∈ (0, 0.5), we choose the minimizing ρ* = (1/2)(1 − m/r_p²). Since r_p² = mP/ξ², ρ* is indeed in (0, 0.5) as long as P > ξ². Thus,

  ψ(m, r_p) ≤ e^{−ρ* r_p²}/(1 − 2ρ*)^{m/2} = (r_p²/m)^{m/2} e^{−(1/2)(1 − m/r_p²) r_p²} = e^{−r_p²/2 + m/2 + (m/2) ln(r_p²/m)}.

Using the substitutions r_c² = mP, ξ = r_c/r_p and r_p² = mP/ξ²,

  Pr(E_m) = ψ(m, √(mP/ξ²)) ≤ e^{−mP/(2ξ²) + m/2 + (m/2) ln(P/ξ²)},   (20)
  E[ ||Z^m||² 1{E_m} ] ≤ m ψ(m + 2, √(mP/ξ²)) ≤ m e^{−mP/(2ξ²) + (m+2)/2 + ((m+2)/2) ln(mP/((m+2)ξ²))}.   (21)

From (18), (20) and (21),

  E[ (||Z^m|| + r_p)² 1{E_m} ]
    ≤ ( √m e^{−mP/(4ξ²) + (m+2)/4 + ((m+2)/4) ln(mP/((m+2)ξ²))} + √(mP/ξ²) e^{−mP/(4ξ²) + m/4 + (m/4) ln(P/ξ²)} )²
    (since P > ξ²) < ( √m (1 + √(P/ξ²)) e^{−mP/(4ξ²) + (m+2)/4 + ((m+2)/4) ln(P/ξ²)} )²
    = m (1 + √(P/ξ²))² e^{−mP/(2ξ²) + (m+2)/2 + ((m+2)/2) ln(P/ξ²)}.

APPENDIX II
PROOF OF LEMMA 2

The following lemma is taken from [15].

Lemma 3: For any three random variables A, B and C,

  E[||B − C||²] ≥ ( (√(E[||A − C||²]) − √(E[||A − B||²]))⁺ )².

Proof: See [15, Appendix II].

Choosing A = X_0^m, B = X_1^m and C = X̂_1^m,

  E_{X_0^m, Z_G^m}[ J_2^{(γ)}(X_0^m, Z_G^m) | Z_G^m ∈ S_L^G ]
    = (1/m) E[ ||X_1^m − X̂_1^m||² | Z_G^m ∈ S_L^G ]
    ≥ ( ( √((1/m)E[||X_0^m − X̂_1^m||² | Z_G^m ∈ S_L^G]) − √((1/m)E[||X_0^m − X_1^m||² | Z_G^m ∈ S_L^G]) )⁺ )²
    = ( ( √((1/m)E[||X_0^m − X̂_1^m||² | Z_G^m ∈ S_L^G]) − √P )⁺ )²,   (22)

since X_1^m − X_0^m = U_1^m is independent of Z_G^m and E[||U_1^m||²] = mP.

Define Y_L^m := X_1^m + Z_L^m to be the output when the observation noise Z_L^m is distributed as a truncated Gaussian:

  f_{Z_L}(z_L^m) = c_m(L) e^{−||z_L^m||²/(2σ_G²)} / (√(2πσ_G²))^m   for z_L^m ∈ S_L^G, and 0 otherwise.   (23)

Let the estimate at the second controller on observing y_L^m be denoted by X̂_L^m. Then, by the definition of conditional expectations,

  E[ ||X_0^m − X̂_1^m||² | Z_G^m ∈ S_L^G ] = E[ ||X_0^m − X̂_L^m||² ].
(24)

To get a lower bound, we now allow the controllers to optimize themselves with the additional knowledge that the observation noise z^m must fall in S_L^G. In order to prevent the first controller from "cheating" and allocating different powers to the two events (i.e., z^m falling or not falling in S_L^G), we enforce the constraint that the power P must not change with this additional knowledge. Since the controller's observation X_0^m is independent of Z^m, this constraint is satisfied by the original controller (without the additional knowledge) as well, and hence the cost for the system with the additional knowledge is still a valid lower bound on that of the original system.

The rest of the proof uses ideas from channel coding and the rate-distortion theorem [57, Ch. 13] from information theory. We view the problem as a problem of implicit communication from the first controller to the second. Notice that, for a given γ(·), X_1^m is a function of X_0^m; Y_L^m = X_1^m + Z_L^m is conditionally independent of X_0^m given X_1^m (since the noise Z_L^m is additive and independent of X_1^m and X_0^m); and X̂_L^m is a function of Y_L^m. Thus X_0^m − X_1^m − Y_L^m − X̂_L^m form a Markov chain. Using the data-processing inequality [57, pg. 33],

  I(X_0^m; X̂_L^m) ≤ I(X_1^m; Y_L^m),   (25)

where I(A; B) denotes the mutual information between two random variables A and B (see, for example, [57, pg. 18, pg. 231]). To estimate the distortion to which X_0^m can be communicated across this truncated Gaussian channel (which, in turn, helps us lower bound the MMSE in estimating X_1^m), we need to upper bound the term on the RHS of (25).

Lemma 4:

  (1/m) I(X_1^m; Y_L^m) ≤ (1/2) log₂( e^{1−d_m(L)} (P̄ + d_m(L)σ_G²) c_m^{2/m}(L) / σ_G² ).
Proof: We first obtain an upper bound on the power of X_1^m (this bound is the same as that used in [15]):

  E[||X_1^m||²] = E[||X_0^m + U_1^m||²] = E[||X_0^m||²] + E[||U_1^m||²] + 2 E[(X_0^m)ᵀ U_1^m]
    (a) ≤ E[||X_0^m||²] + E[||U_1^m||²] + 2 √(E[||X_0^m||²]) √(E[||U_1^m||²]) ≤ m(σ_0 + √P)²,

where (a) follows from the Cauchy-Schwarz inequality. We use the following definition of the differential entropy h(A) of a continuous random variable A [57, pg. 224]:

  h(A) = −∫_S f_A(a) log₂(f_A(a)) da,   (26)

where f_A(a) is the pdf of A, and S is the support set of A. Conditional differential entropy is defined similarly [57, pg. 229]. Let P̄ := (σ_0 + √P)². Now, E[Y_{L,i}²] = E[X_{1,i}²] + E[Z_{L,i}²] (since X_{1,i} is independent of Z_{L,i}, and by symmetry the Z_{L,i} are zero-mean random variables). Denote P̄_i = E[X_{1,i}²] and σ_{G,i}² = E[Z_{L,i}²]. In the following, we derive an upper bound C_{G,L}^{(m)} on (1/m)I(X_1^m; Y_L^m), where each supremum below is over distributions p(X_1^m) with E[||X_1^m||²] ≤ mP̄:

  C_{G,L}^{(m)} := sup (1/m) I(X_1^m; Y_L^m)
    (a) = sup (1/m) h(Y_L^m) − (1/m) h(Y_L^m | X_1^m)
    = sup (1/m) h(Y_L^m) − (1/m) h(X_1^m + Z_L^m | X_1^m)
    (b) = sup (1/m) h(Y_L^m) − (1/m) h(Z_L^m | X_1^m)
    (c) = sup (1/m) h(Y_L^m) − (1/m) h(Z_L^m)
    (d) ≤ sup (1/m) Σ_{i=1}^m h(Y_{L,i}) − (1/m) h(Z_L^m)
    (e) ≤ sup_{P̄_i: Σ P̄_i ≤ mP̄} (1/m) Σ_{i=1}^m (1/2) log₂(2πe(P̄_i + σ_{G,i}²)) − (1/m) h(Z_L^m)
    (f) ≤ (1/2) log₂(2πe(P̄ + d_m(L)σ_G²)) − (1/m) h(Z_L^m).   (27)

Here, (a) follows from the definition of mutual information [57, pg. 231]; (b) follows from the fact that translation does not change differential entropy [57, pg. 233]; (c) uses the independence of Z_L^m and X_1^m; and (d) uses the chain rule for differential entropy [57, pg. 232] and the fact that conditioning reduces entropy [57, pg. 232]. In (e), we used the fact that Gaussian random variables maximize differential entropy. The inequality (f) follows from the concavity-∩ of the log(·) function and an application of Jensen's inequality [57, pg. 25]. We also use the fact that (1/m) Σ_{i=1}^m σ_{G,i}² = d_m(L) σ_G², which can be proven as follows:

  (1/m) E[ Σ_{i=1}^m Z_{L,i}² ]
    (using (23)) = (σ_G²/m) ∫_{z^m ∈ S_L^G} (||z^m||²/σ_G²) c_m(L) e^{−||z^m||²/(2σ_G²)}/(√(2πσ_G²))^m dz^m
    = (c_m(L) σ_G²/m) E[ ||Z̃^m||² 1{||Z̃^m|| ≤ √(mL²)} ]   (Z̃^m := Z_G^m/σ_G)
    = (c_m(L) σ_G²/m) ( E[||Z̃^m||²] − E[ ||Z̃^m||² 1{||Z̃^m|| > √(mL²)} ] )
    (using (19)) = (c_m(L) σ_G²/m) ( m − m ψ(m + 2, √(mL²)) )
    = c_m(L) (1 − ψ(m + 2, L√m)) σ_G² = d_m(L) σ_G².   (28)

We now compute h(Z_L^m):

  h(Z_L^m) = ∫_{z^m ∈ S_L^G} f_{Z_L}(z^m) log₂(1/f_{Z_L}(z^m)) dz^m
    = ∫_{z^m ∈ S_L^G} f_{Z_L}(z^m) log₂( (√(2πσ_G²))^m / (c_m(L) e^{−||z^m||²/(2σ_G²)}) ) dz^m
    = −log₂(c_m(L)) + (m/2) log₂(2πσ_G²) + ∫_{z^m ∈ S_L^G} c_m(L) f_G(z^m) (||z^m||²/(2σ_G²)) log₂(e) dz^m.   (29)

Analyzing the last term of (29),

  ∫_{z^m ∈ S_L^G} c_m(L) f_G(z^m) (||z^m||²/(2σ_G²)) log₂(e) dz^m
    (using (23)) = (log₂(e)/(2σ_G²)) ∫_{z^m} f_{Z_L}(z^m) ||z^m||² dz^m
    = (log₂(e)/(2σ_G²)) E[||Z_L^m||²] = (log₂(e)/(2σ_G²)) E[ Σ_{i=1}^m Z_{L,i}² ]
    (using (28)) = (log₂(e)/(2σ_G²)) m d_m(L) σ_G² = m log₂(e) d_m(L)/2.   (30)

The expression C_{G,L}^{(m)} can now be upper bounded using (27), (29) and (30) as follows.
  C_{G,L}^{(m)} ≤ (1/2) log₂(2πe(P̄ + d_m(L)σ_G²)) + (1/m) log₂(c_m(L)) − (1/2) log₂(2πσ_G²) − (1/2) log₂(e^{d_m(L)})
    = (1/2) log₂(2πe(P̄ + d_m(L)σ_G²)) + (1/2) log₂(c_m^{2/m}(L)) − (1/2) log₂(2πσ_G²) − (1/2) log₂(e^{d_m(L)})
    = (1/2) log₂( 2πe(P̄ + d_m(L)σ_G²) c_m^{2/m}(L) / (2πσ_G² e^{d_m(L)}) )
    = (1/2) log₂( e^{1−d_m(L)} (P̄ + d_m(L)σ_G²) c_m^{2/m}(L) / σ_G² ).   (31)

Now, recall that the distortion-rate function D_m(R) for squared-error distortion, source X_0^m, and reconstruction X̂_L^m is

  D_m(R) := inf_{p(X̂_L^m | X_0^m): (1/m)I(X_0^m; X̂_L^m) ≤ R} (1/m) E[ ||X_0^m − X̂_L^m||² ],   (32)

which is the dual of the rate-distortion function [57, pg. 341]. Since I(X_0^m; X̂_L^m) ≤ m C_{G,L}^{(m)}, using the converse to the rate-distortion theorem [57, pg. 349] and the upper bound on the mutual information represented by C_{G,L}^{(m)},

  (1/m) E[ ||X_0^m − X̂_L^m||² ] ≥ D_m(C_{G,L}^{(m)}).   (33)

Since the Gaussian source is iid, D_m(R) = D(R), where D(R) = σ_0² 2^{−2R} is the distortion-rate function for a Gaussian source of variance σ_0² [57, pg. 346]. Thus, using (22), (24) and (33),

  E[ J_2^{(γ)}(X_0^m, Z^m) | Z^m ∈ S_L^G ] ≥ ( (√(D(C_{G,L}^{(m)})) − √P)⁺ )².

Substituting the bound on C_{G,L}^{(m)} from (31),

  D(C_{G,L}^{(m)}) = σ_0² 2^{−2C_{G,L}^{(m)}} ≥ σ_0² σ_G² / ( c_m^{2/m}(L) e^{1−d_m(L)} (P̄ + d_m(L)σ_G²) ).

Using (22), this completes the proof of the lemma.

Notice that c_m(L) → 1 and d_m(L) → 1 for fixed m as L → ∞, as well as for fixed L > 1 as m → ∞. So the lower bound on D(C_{G,L}^{(m)}) approaches κ of Theorem 2 in both of these limits.

REFERENCES

[1] H. S. Witsenhausen, "A counterexample in stochastic optimum control," SIAM Journal on Control, vol. 6, no. 1, pp. 131–147, Jan. 1968.
[2] Y.-C.
Ho, "Review of the Witsenhausen problem," Proceedings of the 47th IEEE Conference on Decision and Control (CDC), pp. 1611–1613, 2008.
[3] R. Bansal and T. Basar, "Stochastic teams with nonclassical information revisited: When is an affine control optimal?" IEEE Trans. Automat. Contr., vol. 32, pp. 554–559, Jun. 1987.
[4] M. Baglietto, T. Parisini, and R. Zoppoli, "Nonlinear approximations for the solution of team optimal control problems," Proceedings of the IEEE Conference on Decision and Control (CDC), pp. 4592–4594, 1997.
[5] J. T. Lee, E. Lau, and Y.-C. L. Ho, "The Witsenhausen counterexample: A hierarchical search approach for nonconvex optimization problems," IEEE Trans. Automat. Contr., vol. 46, no. 3, pp. 382–397, 2001.
[6] Y.-C. Ho and T. Chang, "Another look at the nonclassical information structure problem," IEEE Trans. Automat. Contr., vol. 25, no. 3, pp. 537–540, 1980.
[7] C. H. Papadimitriou and J. N. Tsitsiklis, "Intractable problems in control theory," SIAM Journal on Control and Optimization, vol. 24, no. 4, pp. 639–654, 1986.
[8] N. Li, J. R. Marden, and J. S. Shamma, "Learning approaches to the Witsenhausen counterexample from a view of potential games," Proceedings of the 48th IEEE Conference on Decision and Control (CDC), 2009.
[9] S. K. Mitter and A. Sahai, "Information and control: Witsenhausen revisited," in Learning, Control and Hybrid Systems: Lecture Notes in Control and Information Sciences 241, Y. Yamamoto and S. Hara, Eds. New York, NY: Springer, 1999, pp. 281–293.
[10] T. Basar, "Variations on the theme of the Witsenhausen counterexample," Proceedings of the 47th IEEE Conference on Decision and Control (CDC), pp. 1614–1619, 2008.
[11] M. Rotkowitz, "On information structures, convexity, and linear optimality," Proceedings of the 47th IEEE Conference on Decision and Control (CDC), pp. 1642–1647, 2008.
[12] M. Rotkowitz and S.
Lall, "A characterization of convex problems in decentralized control," IEEE Trans. Automat. Contr., vol. 51, no. 2, pp. 1984–1996, Feb. 2006.
[13] P. Grover, S. Y. Park, and A. Sahai, "On the generalized Witsenhausen counterexample," in Proceedings of the Allerton Conference on Communication, Control, and Computing, Monticello, IL, Oct. 2009.
[14] M. Rotkowitz, "Linear controllers are uniformly optimal for the Witsenhausen counterexample," Proceedings of the 45th IEEE Conference on Decision and Control (CDC), pp. 553–558, Dec. 2006.
[15] P. Grover and A. Sahai, "Vector Witsenhausen counterexample as assisted interference suppression," to appear in the special issue on Information Processing and Decision Making in Distributed Control Systems of the International Journal on Systems, Control and Communications (IJSCC), Sep. 2009. [Online]. Available: http://www.eecs.berkeley.edu/~sahai/
[16] N. C. Martins, "Witsenhausen's counterexample holds in the presence of side information," Proceedings of the 45th IEEE Conference on Decision and Control (CDC), pp. 1111–1116, 2006.
[17] A. S. Avestimehr, S. Diggavi, and D. N. C. Tse, "A deterministic approach to wireless relay networks," in Proc. of the Allerton Conference on Communications, Control and Computing, October 2007.
[18] A. S. Avestimehr, "Wireless network information flow: A deterministic approach," Ph.D. dissertation, UC Berkeley, Berkeley, CA, 2008.
[19] A. S. Avestimehr, S. Diggavi, and D. N. C. Tse, "Wireless network information flow: a deterministic approach," submitted to IEEE Transactions on Information Theory, Jul. 2009.
[20] K. Shoarinejad, J. L. Speyer, and I. Kanellakopoulos, "A stochastic decentralized control problem with noisy communication," SIAM Journal on Control and Optimization, vol. 41, no. 3, pp. 975–990, 2002.
[21] J.
Doyle, panel discussions at Paths Ahead in the Science of Information and Decision Systems, Cambridge, MA, Nov. 2009.
[22] S. Y. Park, P. Grover, and A. Sahai, "A constant-factor approximately optimal solution to the Witsenhausen counterexample," Proceedings of the 48th IEEE Conference on Decision and Control (CDC), Dec. 2009.
[23] P. Grover and A. Sahai, "A vector version of Witsenhausen's counterexample: Towards convergence of control, communication and computation," Proceedings of the 47th IEEE Conference on Decision and Control (CDC), Dec. 2008.
[24] M. Costa, "Writing on dirty paper," IEEE Trans. Inform. Theory, vol. 29, no. 3, pp. 439–441, May 1983.
[25] H. Weingarten, Y. Steinberg, and S. Shamai, "The capacity region of the Gaussian multiple-input multiple-output broadcast channel," IEEE Transactions on Information Theory, vol. 52, no. 9, pp. 3936–3964, 2006.
[26] N. Devroye, P. Mitran, and V. Tarokh, "Achievable rates in cognitive radio channels," IEEE Trans. Inform. Theory, vol. 52, no. 5, pp. 1813–1827, May 2006.
[27] A. Jovicic and P. Viswanath, "Cognitive radio: An information-theoretic perspective," in Proceedings of the 2006 International Symposium on Information Theory, Seattle, WA, Jul. 2006, pp. 2413–2417.
[28] Y.-H. Kim, A. Sutivong, and T. M. Cover, "State amplification," IEEE Trans. Inform. Theory, vol. 54, no. 5, pp. 1850–1859, May 2008.
[29] N. Merhav and S. Shamai, "Information rates subject to state masking," IEEE Trans. Inform. Theory, vol. 53, no. 6, pp. 2254–2261, Jun. 2007.
[30] T. Philosof, A. Khisti, U. Erez, and R. Zamir, "Lattice strategies for the dirty multiple access channel," in Proceedings of the IEEE Symposium on Information Theory, Nice, France, Jul. 2007, pp. 386–390.
[31] S. Kotagiri and J. Laneman, "Multiaccess channels with state known to some encoders and independent messages," EURASIP Journal on Wireless Communications and Networking, no.
450680, 2008.
[32] P. Grover, A. B. Wagner, and A. Sahai, "Information embedding meets distributed control," in preparation for submission to IEEE Transactions on Information Theory, 2009.
[33] G. Ausiello, P. Crescenzi, G. Gambosi, V. Kann, A. Marchetti-Spaccamela, and M. Protasi, Complexity and Approximation: Combinatorial Optimization Problems and Their Approximability Properties. Springer-Verlag, 1999.
[34] R. Cogill and S. Lall, "Suboptimality bounds in stochastic control: A queueing example," in American Control Conference, 2006, Jun. 2006, pp. 1642–1647.
[35] R. Cogill, S. Lall, and J. P. Hespanha, "A constant factor approximation algorithm for event-based sampling," in American Control Conference, 2007. ACC '07, Jul. 2007, pp. 305–311.
[36] R. Etkin, D. Tse, and H. Wang, "Gaussian interference channel capacity to within one bit," IEEE Trans. Inform. Theory, vol. 54, no. 12, Dec. 2008.
[37] D. Baron, M. A. Khojastepour, and R. G. Baraniuk, "Non-asymptotic performance of symmetric Slepian-Wolf coding," in 39th Conference on Information Sciences and Systems, Princeton, NJ, Mar. 2005.
[38] Y. Polyanskiy, H. V. Poor, and S. Verdu, "Dispersion of Gaussian channels," in IEEE International Symposium on Information Theory, Seoul, Korea, 2009.
[39] ——, "New channel coding achievability bounds," in IEEE International Symposium on Information Theory, Toronto, Canada, 2008.
[40] R. G. Gallager, Information Theory and Reliable Communication. New York, NY: John Wiley, 1971.
[41] M. S. Pinsker, "Bounds on the probability and of the number of correctable errors for nonblock codes," Problemy Peredachi Informatsii, vol. 3, no. 4, pp. 44–55, Oct./Dec. 1967.
[42] A. Sahai, "Why block-length and delay behave differently if feedback is present," IEEE Trans. Inform. Theory, no. 5, pp. 1860–1886, May 2008.
[43] A. Sahai and P. Grover, "The price of certainty: 'waterslide curves' and the gap to capacity," Dec.
2007. [Online]. A vailable: http://arXiv .org/abs/0801.0352v1 [44] R. F . H. Fisher , Pr ecoding and Signal Shaping for Digital T ransmission . New Y ork, NY : John Wile y , 2002. [45] J. H. Conway and N. J. A. Sloane, Sphere packings, lattices and groups . Ne w Y ork: Springer-V erlag, 1988. [46] U. Erez, S. Litsyn, and R. Zamir, “Lattices which are good for (almost) everything, ” IEEE T rans. Inform. Theory , vol. 51, no. 10, pp. 3401–3416, Oct. 2005. [47] T . J. Goblick, “Theoretical limitations on the transmission of data from analog sources, ” IEEE T rans. Inform. Theory , vol. 11, no. 4, Oct. 1965. [48] R. Blahut, “A hypothesis testing approach to information theory, ” Ph.D. dissertation, Cornell University , Ithaca, NY , 1972. [49] I. Csisz ´ ar and J. K ¨ orner , Information Theory: Coding Theorems for Discrete Memoryless Systems . Ne w Y ork: Academic Press, 1981. [50] “Code for performance of lattice-based strategies for Witsenhausen’ s counterexample. ” [Online]. A v ailable: http://www .eecs.berkeley . edu/$ \ sim$pulkit/FiniteW itsenhausenCode.htm [51] D. Micciancio and S. Goldwasser , Complexity of Lattice Pr oblems: A Cryptographic P erspective . Springer, 2002. [52] W . W u, S. V ishwanath, and A. Arapostathis, “Gaussian interference networks with feedback: Duality , sum capacity and dynamic team problems, ” in Pr oceedings of the Allerton Confer ence on Communication, Contr ol, and Computing , Monticello, IL, Oct. 2005. [53] N. Elia, “When Bode meets Shannon: control-oriented feedback communication schemes, ” IEEE T rans. Automat. Contr . , vol. 49, no. 9, pp. 1477–1488, Sep. 2004. [54] R. Durrett, Probability: Theory and Examples , 1st ed. Belmont, CA: Brooks/Cole, 2005. [55] R. Courant, F . John, A. A. Blank, and A. Solomon, Intr oduction to Calculus and Analysis . Springer, 2000. [56] S. M. Ross, A first course in probability , 6th ed. Prentice Hall, 2001. [57] T . M. Cover and J. A. Thomas, Elements of Information Theory , 1st ed. 
New Y ork: Wile y , 1991.