Power-Controlled Feedback and Training for Two-way MIMO Channels

Most communication systems use some form of feedback, often related to channel state information. The common models used in analyses either assume perfect channel state information at the receiver and/or noiseless state feedback links. However, in pr…

Authors: Vaneet Aggarwal, Ashutosh Sabharwal

Vaneet Aggarwal and Ashutosh Sabharwal, Senior Member, IEEE

Abstract: Most communication systems use some form of feedback, often related to channel state information. The common models used in analyses assume perfect channel state information at the receiver and/or noiseless state feedback links. However, in practical systems, the channel estimate at the receiver is not known perfectly, nor is the feedback link perfect. In this paper, we study the achievable diversity-multiplexing tradeoff using i.i.d. Gaussian codebooks, considering the errors in training the receiver and the errors in the feedback link for FDD systems, where the forward and the feedback links are independent MIMO channels. Our key result is that the maximum diversity order with one bit of feedback information is identical to that of systems with more feedback bits. Thus, asymptotically in SNR, more than one bit of feedback does not improve the system performance at constant rates. Furthermore, the one-bit diversity-multiplexing performance is identical to that of the system which has perfect channel state information at the receiver along with a noiseless feedback link. This achievability uses the novel concepts of power-controlled feedback and training, which naturally surface when we consider imperfect channel estimation and noisy feedback links. In the process of evaluating the proposed training and feedback protocols, we find an asymptotic expression for the joint probability of the SNR exponents of the eigenvalues of the actual channel and the estimated channel, which may be of independent interest.

Index Terms: Channel state information, diversity-multiplexing tradeoff, feedback, multiple access channel, outage probability, power-controlled, training.

I. INTRODUCTION

Channel state information at the transmitter has been well established to improve communication performance, measured either as increased capacity (see e.g. [1–6]), improved diversity-multiplexing performance [7–18], or higher signal-to-noise ratio at the receiver [2, 19, 20], among many possible metrics. Several of the aforementioned works have considered the impact of incomplete channel knowledge at the transmitter by considering quantized channel information at the transmitter, which can be visualized as being made available by the receiver through a noiseless finite-capacity feedback link. While the model and subsequent analysis clearly show that reduced channel information at the transmitter can lead to significant performance gains, a key requirement is that the receiver knows what the transmitter knows (even if there is an error in the feedback link). In practice, the communicating nodes are distributed and have no way of aligning their channel knowledge perfectly.

V. Aggarwal is with the Department of Electrical Engineering, Princeton University, Princeton, NJ 08544, USA (email: vaggarwa@princeton.edu). A. Sabharwal is with the Department of Electrical and Computer Engineering, Rice University, Houston, TX, 77005, USA (email: ashu@rice.edu).

In this paper, we systematically analyze the impact of mismatch in channel knowledge at the transmitter and receiver. For clarity of presentation, we will largely focus on a single-user point-to-point link (one transmitter and one receiver) and, only at the end of the paper, extend the results to the case of multiple-access channels (many transmitters and one receiver). The key departure from prior work is that we explicitly model both the forward and feedback links as fading wireless channels.
A little thought immediately shows that all practical wireless networks have "two-way" communication links, that is, the nodes are transceivers such that all the received and transmitted packets (control, data, feedback) travel over noisy fading channels. The two-way model to analyze feedback was first proposed in [10] for TDD systems, where it was shown to enable accurate accounting of the feedback link resources (power and spectrum) and analysis of the important case of transmitter and receiver channel knowledge mismatch. A key message from [10] was that in the general case when the transmitter and receiver knowledge is mismatched, both feedback and forward communication have to be jointly designed. The simple two-way training protocol proposed in [10] led to the concept of power-controlled training, which enabled joint estimation of the transmitter's intended power control and the actual channel realization. We will continue the line of thought initiated in [10] for the case of FDD (Frequency Division Duplex) systems and, more importantly, systematically show how mismatch between information at the transmitter and receiver impacts the overall system performance.

For the case of FDD systems, we will continue to model the forward and feedback links as fading channels. However, unlike [10], the forward and feedback channels will be assumed to be completely uncorrelated since they use different frequency bands. In this case, the two-way training protocol proposed in [10] is not applicable, and we will use the quantized feedback model, in which the receiver sends a quantized version of its own channel information over the feedback link. Our main result is an achievable diversity-multiplexing tradeoff for an $m \times n$ MIMO (Multiple Input Multiple Output) system using i.i.d. Gaussian codebooks, with measured channel information at the receiver which is used to send quantized channel information via a noisy link.
We show that, within the scope of our modeling assumptions, the diversity-multiplexing tradeoff with one bit of noisy feedback and estimated CSIR (channel state information at the receiver) is identical to the diversity-multiplexing tradeoff with perfect CSIR and noiseless one-bit feedback. Even more importantly, more bits of feedback do not improve the maximum diversity order in the imperfect system (estimated CSIR with noisy feedback), in contrast to the idealized system where the maximum diversity order increases exponentially [8, 13, 17]. The conclusion holds for multiuser systems in general.

The encouraging news from our analysis is two-fold. First, even noisy and mismatched information about the channel at both transmitter and receiver is sufficient to improve the whole diversity-multiplexing tradeoff when compared to the case with no feedback. Second, very few bits of feedback, in fact one bit, are all one needs to build into a practical system.

However, to achieve any diversity order gain over the no-feedback system, a straightforward application of the Genie-aided feedback analysis [8, 13, 17] fails. That is, conventional training followed by a conventionally coded quantized feedback signal leads to no improvement in diversity order. A combination of power-controlled training (much like in [10]) and the new concept of power-controlled feedback (proposed in this paper) appears essential for significant improvement in diversity-multiplexing performance compared to the no-feedback system.

We will build our main result in three major steps. In the first step, we assume that the receiver knowledge of the channel is perfect and the feedback channel is error-prone. Thus, the transmitter information about the channel is potentially different from that sent by the receiver. In this case, the coding of the feedback signal becomes important.
If the feedback information about the channel is sent using a codebook where each codeword has equal power, then the maximum diversity order is $2mn$ irrespective of the amount of feedback information. In contrast, $b$-bit noiseless feedback leads to a maximum diversity order that grows exponentially in the number of bits [8, 13]. The reason for such a dramatic decrease (from unbounded growth with $b$ in the noiseless case to a maximum of $2mn$ in the noisy case) is that the transmitter cannot distinguish between rare channel states, especially those where it is supposed to send large power to overcome poor channel conditions. As a result, it becomes conservative in its power allocation, and the gain from power control does not increase unboundedly as in the case of noiseless feedback. A careful examination of the outage events on the feedback path naturally points to unequal error protection in the form of a power-controlled feedback coding scheme, where the rare states are encoded with higher power codewords. With power-controlled feedback, the diversity order can be improved to $(mn)^2 + 2mn$, a substantial increase compared to constant power feedback.

As our second step, we assume that the channel knowledge at the receiver is imperfect but the feedback path is error-free. We first analyze the commonly used protocol setup, where the receiver is trained at the average available power SNR (assuming the noise variance is one) to obtain a channel estimate, and then a quantized version of the estimated channel is fed back to the transmitter. While both the transmitter and receiver have identical information, the error in receiver information leads to no gain in the diversity order compared to a no-feedback system. We take a cue from the earlier work in [10] and the power-controlled feedback mechanism, and propose a two-round training protocol, where the receiver is trained twice.
The first round of training uses an average power SNR, and then, after obtaining the channel feedback, the receiver is trained again with a training power dependent on the channel estimate. This implies that in poor estimated channel conditions, the second-round training power is higher than the first, and in good conditions, it is lower. The adaptation of training power is labeled power-controlled training and allows a substantial increase in maximum diversity order ($(mn)^2 + mn$) compared to the no-feedback case ($mn$). For general multiplexing gains, the achievable diversity order is identical to that obtained when the receiver knows perfect channel state information with 1 bit of perfect feedback in [13]. Further, additional bits of feedback do not help in improving the maximum diversity order at constant rates when the receiver is trained to obtain a channel estimate.

Finally, we put the first two steps together to analyze the general case of noisy channel estimates with noisy feedback. The key conclusion is that the receiver channel estimate errors are the bottleneck in the maximum achievable diversity order. Thus, the maximum achievable diversity order at multiplexing gain $r = 0$ is $(mn)^2 + mn$, which is less than the $(mn)^2 + 2mn$ obtained in the case of perfect receiver information. For general multiplexing gains, we get the same tradeoff as with 1 bit of perfect feedback and the receiver having perfect channel state information, as in [13]. This diversity-multiplexing tradeoff can be achieved with a single bit of noisy feedback from the receiver.

The results on MIMO point-to-point channels are extended to the multiple access channel in which all the transmitters have $m$ antennas each and the receiver has $n$ antennas, and all the conclusions drawn earlier for the single user also hold in a multiple access system.
The channel information for each transmitter is measured at the receiver, and a global quantized feedback is sent from the receiver to all the transmitters. We use a similar combination of power-controlled training and feedback as in single-user systems to achieve a diversity order of $(mn)^2 + mn$ with a single bit of power-controlled feedback in the main case in which the errors in both the channel estimate and the feedback link are accounted for.

We note that the improvement in diversity order is completely dependent on our use of power control as the adaptation mechanism. For example, in MIMO systems, feedback can be used for beamforming (e.g. [19, 20]). However, beamforming does not lead to any change in the diversity order and only increases the receiver SNR by a constant amount. Since we have focused on the asymptotic regime, we do not consider schemes like beamforming which have identical diversity-multiplexing performance as a non-feedback based system. In this paper, we find new achievable schemes that improve the diversity-multiplexing tradeoff over the traditional approaches, but we do not claim global optimality of these schemes.

The rest of the paper is organized as follows. Section II describes preliminaries on the channel model and the diversity-multiplexing tradeoff. Section III summarizes the known results for the case when the receiver knows perfect channel state information and the feedback is sent over a noiseless channel [13]. Section IV describes the diversity order when the receiver knows the channel perfectly while the transmitter receives feedback on a noisy channel. Section V describes the diversity-multiplexing tradeoff when the receiver is trained to get a channel estimate while the feedback link to the transmitter is noiseless. Section VI considers both of the above errors, i.e., the receiver estimates the channel and the feedback link is also noisy. Section VII presents some numerical results.
We consider the extension to multiple access channels in Section VIII. Section IX concludes the paper.

II. PRELIMINARIES

A. Two-way Channel Model

We will primarily focus on a single-user multiple-input multiple-output channel with the transmitting node denoted by $\mathsf{T}$ and the receiving node denoted by $\mathsf{R}$. Later, we will extend the main result to the multiple access channel in Section VIII. For the single-user channel, we will assume that there are $m$ transmit antennas at the source node and $n$ receive antennas at the destination node, such that the input-output relation is given by

$$\mathsf{T} \to \mathsf{R}: \quad Y = HX + W, \tag{1}$$

where the elements of $H$ and $W$ are assumed to be i.i.d. with complex Normal distribution of zero mean and unit variance, $\mathcal{CN}(0,1)$. The matrices $Y$, $H$, $X$ and $W$ are of dimension $n \times T_{coh}$, $n \times m$, $m \times T_{coh}$ and $n \times T_{coh}$, respectively. Here $T_{coh}$ is the coherence interval such that the channel $H$ is fixed during a fading block of $T_{coh}$ consecutive channel uses, and statistically independent from one block to another. The transmitter is assumed to be power-limited, such that the long-term power is upper bounded, i.e., $\frac{1}{T_{coh}} \mathrm{trace}\left(\mathbb{E}\left[XX^\dagger\right]\right) \le \mathsf{SNR}$.

Since our focus will be studying feedback over noisy channels, we assume that the same multiple antennas at the transmitter and receiver are available to send feedback on an orthogonal frequency band. For the feedback path, the receiver will act as a transmitter and the transmitter as a receiver. As a result, the feedback source (which is the destination for data bits) will have $n$ transmit antennas and the feedback destination (which is the source of data bits) will be assumed to have $m$ receive antennas. Furthermore, a block fading channel model is assumed,

$$\mathsf{R} \to \mathsf{T}: \quad Y_f = H_f X_f + W_f, \tag{2}$$

where $H_f$ is the MIMO fading channel for the feedback link, and $W_f$ is the additive noise at the receiver of the feedback; both are assumed to have i.i.d. $\mathcal{CN}(0,1)$ elements.
The feedback transmissions are also assumed to be power-limited with a long-term power constraint given by $\frac{1}{T_{coh}} \mathrm{trace}\left(\mathbb{E}\left[X_f X_f^\dagger\right]\right) \le \mathsf{SNR}_f$. Without loss of generality, we will assume the case where the transmitter and receiver have symmetric resources, such that $\mathsf{SNR} = \mathsf{SNR}_f$.

We note that a phase-symmetric two-way channel model with $H = H_f^T$ was studied earlier in [10, 11]. The phase-symmetric two-way channel is a good model for slow-fading time division duplex (TDD) systems. On the other hand, the above SNR-symmetric ($\mathsf{SNR} = \mathsf{SNR}_f$) model is well-suited for symmetrically resourced FDD systems.

[Fig. 1. Two-way MIMO channel between a sender and a destination, showing the forward channel and the feedback channel. The forward and feedback channels use the same antennas. If the forward and feedback channels are frequency-division duplex (FDD), then $H$ and $H_f$ are not equal. On the other hand, if the forward and feedback channels are time-division duplex (TDD), then $H = H_f^T$ is often a reasonable assumption.]

B. Obtaining Channel State Information

The two-way channel model allows two-way protocols, as depicted in Figure 2, where the transmitter and receiver can conduct multiple back-and-forth transactions to complete the transmission of one codeword. The model was used in [10, 11] to study the diversity-multiplexing performance of a two-way training method. In this paper, we will focus on another instance of the two-way protocols, which will involve channel estimation and quantized feedback. However, unlike [10, 11], we will not account for resources spent on the feedback path since the methods developed in [10] directly apply to the current case. Instead, we will focus on the more important issue of understanding how the mismatch in the transmitter and receiver channel information affects system performance.

To develop a systematic understanding of the two-way fading channel described in Section II-A, we will consider two forms of receiver knowledge about the channel $H$.
The receiver will either be assumed to know the channel $H$ perfectly or to have a noisy estimate $\hat{H}$ obtained as a minimum mean-squared error (MMSE) channel estimate via a training sequence (described in detail in Section V).

For the transmitter knowledge, the receiver will quantize its own knowledge ($H$ or $\hat{H}$) and map it to an index $J_R \in \{0, 1, \ldots, K-1\}$, where $K \ge 1$ is the number of feedback levels used by the receiver for feedback. The receiver then transmits the quantized channel information $J_R$ over the feedback channel, and the transmitter knowledge of the index is denoted by $J_T$. For the case when the feedback channel is assumed to be noiseless, $J_T = J_R$; otherwise $J_T \ne J_R$ with finite probability. Here the error probability will depend on the channel signal-to-noise ratio SNR and the transmission scheme.

Based on the quantized information about the channel, in the form of $J_T$, the transmitter adapts its codeword to minimize the probability of outage (defined in the next section).

[Fig. 2. Two-way protocols: In a two-way channel, the protocols can involve many exchanges between the transmitter (sender) and receiver (destination). The arrows indicate the direction of transmission.]

C. Diversity-Multiplexing Tradeoff

We will consider the case when the codeword $X$ spans a single fading block. Based on the transmitter channel knowledge $J_T = i$, the transmitted codeword is chosen from the codebook $C_i = \{X_i(1), X_i(2), \cdots, X_i(2^{RT_{coh}})\}$, where the codebook rate is $R$. All $X_i(k)$'s (for $1 \le k \le 2^{RT_{coh}}$) are matrices of size $m \times T_{coh}$. We assume that $T_{coh}$ is finite and does not scale with SNR. In this paper, we will only consider single rate transmission, where the rate of the codebooks does not depend on the feedback index. Furthermore, we will assume that the codebooks $C_i$ are derived from the same base codebook $C$ by power scaling of the codewords.
In other words, $C_i = \sqrt{P_i}\, C$, where the product implies that each element of every codeword is multiplied by $\sqrt{P_i}$, and where each element of a codeword $X(k) \in C$ has unit power. Thus, $P_i$ is the power of the transmitted codewords. Recall that there is an average transmit power constraint, such that $\mathbb{E}(P_i) \le \mathsf{SNR}$. We further assume that the base codebook $C$ consists of Gaussian entries; in other words, we only focus on Gaussian inputs in this paper.

In point-to-point channels, outage is defined as the event that the mutual information of the channel, $I_\xi(X;Y)$, for a channel knowledge $H \in \xi$ with some distribution $P_{H|\xi}$ at the receiver (for $\xi$ a subset of all $n \times m$ matrices), is less than the desired rate $R$ [21]. If $\xi$ is a singleton set containing $H$, the channel $H$ is known at the receiver, in which case

$$I_\xi(X;Y) = \log\det\left(I + \frac{P(J_T)}{m} H Q H^\dagger\right)$$

is the mutual information of a point-to-point link with $m$ transmit and $n$ receive antennas, transmit signal-to-noise ratio $P(J_T)$, and Gaussian input distribution with covariance matrix $Q$ [22]. The dependence on the index at the transmitter is made explicit by writing the transmit SNR as a function of $J_T$. Note that $P_i$ and $P(i)$ will mean the same thing in this paper.

Let $\Pi(\mathcal{O})$ denote the probability of outage, where $\mathcal{O}$ is the set of all the channels where the maximum supportable rate $I_\xi(X;Y)$ is less than the transmitted rate $R$ [21]. The system is said to have a diversity order of $d$ if $\Pi(\mathcal{O}) \doteq \mathsf{SNR}^{-d}$.¹ Note that all the index mappings, codebooks, rates, and powers are dependent on the average signal-to-noise ratio, SNR. Specifically, the dependence of the rate $R$ on SNR is explicitly given by $R = r \log \mathsf{SNR}$, where $r$ is labeled as the multiplexing gain. The diversity-multiplexing tradeoff is then described as the maximum diversity order $d(r)$ that can be achieved for a given multiplexing gain $r$.
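The definitions of outage and diversity order can be made concrete in the simplest setting. The following minimal Python sketch covers only the scalar case $m = n = 1$ with $Q = 1$, where $|H|^2$ is exponentially distributed with unit mean for $H \sim \mathcal{CN}(0,1)$, so the outage probability has a closed form. The function names, the base-2 logarithm for the rate, and the two SNR values used to estimate the slope are illustrative assumptions and do not come from the paper.

```python
import math

def siso_outage_probability(snr: float, r: float) -> float:
    """Exact outage probability for m = n = 1 at rate R = r * log2(snr).

    Outage is the event log2(1 + snr * |H|^2) < R, i.e. |H|^2 < (2^R - 1) / snr.
    For H ~ CN(0, 1), |H|^2 is exponential with unit mean.
    """
    threshold = (snr ** r - 1.0) / snr   # 2^R equals snr^r when R = r * log2(snr)
    return -math.expm1(-threshold)       # 1 - exp(-threshold), stable for small values

def diversity_slope(r: float, snr_lo: float = 1e4, snr_hi: float = 1e6) -> float:
    """Finite-SNR estimate of d(r): negative log-log slope of the outage curve."""
    p_lo = siso_outage_probability(snr_lo, r)
    p_hi = siso_outage_probability(snr_hi, r)
    return -(math.log(p_hi) - math.log(p_lo)) / (math.log(snr_hi) - math.log(snr_lo))
```

For a multiplexing gain such as `r = 0.5`, `diversity_slope(0.5)` comes out close to $1 - r$, the known no-feedback tradeoff for a scalar Rayleigh fading channel; the estimate is only approximate because the diversity order is an asymptotic quantity.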
The following result captures the diversity-multiplexing tradeoff for the case of perfect receiver information and no transmitter information. For a given rate $R \doteq r \log \mathsf{SNR}$ and power $P \doteq \mathsf{SNR}^p$, define the outage set $\mathcal{O}(R,P) = \left\{H : \log\det\left(I + \frac{P}{m} H Q H^\dagger\right) < R\right\}$. Denote the resulting diversity order by $G(r,p)$, that is, $\Pi(\mathcal{O}(R,P)) \doteq \mathsf{SNR}^{-G(r,p)}$. The following result completely characterizes $G(r,p)$ and is a straightforward extension of the result in [22].

¹ We adopt the notation of [22] and use $\doteq$ to represent exponential equality. We similarly use $\dot{<}$, $\dot{>}$, $\dot{\le}$, $\dot{\ge}$ to denote exponential inequalities.

Lemma 1 ([16]). The diversity-multiplexing tradeoff for $P \doteq \mathsf{SNR}^p$ and $r \le p \min(m,n)$ is given by the pairs $(r, G(r,p))$, where

$$G(r,p) = \inf \sum_{i=1}^{\min(m,n)} \left(2i - 1 + \max(m,n) - \min(m,n)\right) \alpha_i,$$

and the infimum is over all $\alpha_1, \cdots, \alpha_{\min(m,n)}$ satisfying

$$\left\{\alpha_1 \ge \ldots \ge \alpha_{\min(m,n)} \ge 0, \quad \sum_{i=1}^{\min(m,n)} (p - \alpha_i)^+ < r\right\}.$$

Remark 1. $G(r,p)$ is a piecewise linear curve connecting the points $(r, G(r,p)) = (kp,\, p(m-k)(n-k))$, $k = 0, 1, \ldots, \min(m,n)$, for fixed $m$, $n$ and $p > 0$. This follows directly from Lemma 2 of [13]. $G(r,1)$ is the diversity-multiplexing tradeoff with perfect CSIR and transmit signal-to-noise ratio $\doteq \mathsf{SNR}$ [22]. Further, $G(r,p) = p\, G\!\left(\frac{r}{p}, 1\right)$.

D. Summary of Results

We will study the following four systems with different accuracy of channel state information (CSI) at the transmitter and receiver.

1) CSIRT$_q$: In this case, the receiver knowledge about $H$ is assumed to be perfect and the transmitter is assumed to receive noiseless quantized feedback from the receiver about the channel $H$. This case was first studied in [23], and the diversity order increase was first proved in [17], with later extensions in [13]. The channel quantizer maps $H$ to an index $J_R$, which is then communicated over the noiseless feedback channel.
Since the feedback is noiseless, $J_T = J_R$. Based on the index $J_T$, the transmitter adapts its transmission power as described in Section III. The diversity order for $K$ levels of feedback is defined recursively as $d_{RT_q}(r,K) = G(r, 1 + d_{RT_q}(r,K-1))$, where $d_{RT_q}(r,0) = 0$, and it grows exponentially in the number of feedback bits.

2) CSIR$\hat{\mathrm{T}}_q$: Our first set of results analyzes the case of imperfect channel knowledge at the transmitter, where the errors are caused by errors in the received quantized feedback index. In this case, $J_T \ne J_R$ with finite probability. We show that if a MIMO scheme optimized for equally likely input messages is used to send the feedback information, then the diversity order is limited to the constant power feedback value of $d_{R\hat{T}_q}(r,K)$ given in Table I. However, the feedback information is not equally likely. Hence, we propose a natural unequal error protection method labeled power-controlled feedback, where the power control is performed based on the input probabilities. The power-controlled feedback results in the diversity order $d_{R\hat{T}_q}(r,K)$ specified in the corresponding row of Table I. With power-controlled feedback, the maximum diversity order increases from $d_{R\hat{T}_q}(0,K) = 2mn$ to $d_{R\hat{T}_q}(0,K) = mn(mn+2)$ for all $K > 2$. For $K = 2$, the maximum diversity order increases from $d_{R\hat{T}_q}(0,K) = 2mn$ to $d_{R\hat{T}_q}(0,K) = mn(mn+1)$.

3) CSI$\hat{\mathrm{R}}$T$_q$: Our next step will be to isolate the effect of errors in the receiver knowledge of the channel. Thus, the receiver will estimate $\hat{H}$, which will be mapped to $J_R$. Since the feedback is assumed to be perfect, the transmitter knowledge is the same as that of the receiver, $J_T = J_R$. We will show that a three-phase power-controlled training based protocol can achieve a diversity order of $d_{\hat{R}T_q}(r,K) = G(r, 1 + G(r,1))$.
In fact, if the power-controlled training is not performed, then channel estimation errors completely dominate and feedback is rendered useless; the resultant tradeoff collapses to $d_{\hat{R}T_q}(r,K) = G(r,1)$ for all $K$. So, analogous to power-controlled feedback, power-controlled training appears essential to improve the diversity order in this case.

4) CSI$\hat{\mathrm{R}}\hat{\mathrm{T}}_q$: Finally, we put the above two cases together and derive the maximum achievable diversity order as $d_{\hat{R}\hat{T}_q}(r,K) = G(r, 1 + G(r,1))$, which is equivalent to the case with only receiver errors. Thus, we conclude that the receiver errors dominate the achievable diversity order.

Remark 2. An important word of caution applies to all the results (new and known) summarized in Table I. Unlike the previous work in [10, 21], we do not account for the resources spent in channel training and feedback in this paper. Resource accounting can be performed using the procedure developed in [10], by scaling the multiplexing gain $r$ appropriately. More precisely, the multiplexing gain $r$ should be replaced by $rT_{coh}/(T_{coh}-1)$ for CSIR$\hat{\mathrm{T}}_q$, by $rT_{coh}/(T_{coh}-2m)$ for the three-phase protocol in power-controlled training of CSI$\hat{\mathrm{R}}$T$_q$, by $rT_{coh}/(T_{coh}-m)$ for the two-phase protocol in constant power training of CSI$\hat{\mathrm{R}}$T$_q$, and by $rT_{coh}/(T_{coh}-2m-1)$ for the three-phase protocol of CSI$\hat{\mathrm{R}}\hat{\mathrm{T}}_q$. The resource accounting multipliers assume that the feedback requires one channel use and the training requires $m$ channel uses. The details are further explained in Remarks 3 and 5. Further, the number of antennas used and the protocols would need to be optimized for each multiplexing gain as in [10, 21]; this is omitted in this paper for readability.

III. CSIRT$_q$: PERFECT CSIR WITH NOISELESS QUANTIZED FEEDBACK

The diversity-multiplexing tradeoff for the case of a receiver with perfect information and noiseless quantized feedback has been extensively studied in [8, 13, 14]. In this section, we will discuss the main ideas for the single-user MIMO channel model stated in Section II-A. We start off with an example to illustrate the main idea.

Example 1 (SISO): Consider the case where $m = n = 1$. Without any feedback, the maximum diversity (at $r \to 0$) is 1. The space of channels $\{H\} = \mathbb{C}$ can be divided into two sets: the outage set $\mathcal{O}_0 = \left\{H : \|H\|^2 < \frac{2^R - 1}{\mathsf{SNR}}\right\}$ and its complement $\overline{\mathcal{O}}_0 = \{H\} \setminus \mathcal{O}_0$; see Figure 3(a). The probabilities of these sets are $\Pi(\mathcal{O}_0) \approx \mathsf{SNR}^{-1}$ and $\Pi(\overline{\mathcal{O}}_0) \approx (1 - \mathsf{SNR}^{-1})$.

With one bit of noiseless feedback, the receiver can convey whether the current channel $H$ belongs to $\mathcal{O}_0$ or $\overline{\mathcal{O}}_0$. Since the outage event is rare, the transmitter can send much larger power than usual to reduce the outage probability, as follows. When the feedback index is $J_R = 1$, representing the event $\mathcal{O}_0$, the transmitter will use transmit power $\frac{1}{2}\mathsf{SNR}/\Pi(\mathcal{O}_0) \approx \frac{1}{2}\mathsf{SNR}^2$. For the feedback index $J_R = 0$, representing $\overline{\mathcal{O}}_0$, power $\frac{1}{2}\mathsf{SNR}$ is used. The average power used in the above two-level power control is $\frac{1}{2}\mathsf{SNR}/\Pi(\mathcal{O}_0) \cdot \Pi(\mathcal{O}_0) + \frac{1}{2}\mathsf{SNR} \cdot (1 - \mathsf{SNR}^{-1}) \le \mathsf{SNR}$. With the above one-bit power control, the outage probability is $\approx \mathsf{SNR}^{-2}$ since the set of channels in outage is reduced to

$$\mathcal{O}_1 = \left\{H : \|H\|^2 < \frac{(2^R - 1) \cdot 2 \cdot \Pi(\mathcal{O}_0)}{\mathsf{SNR}}\right\} \approx \left\{H : \|H\|^2 < \frac{(2^R - 1) \cdot 2}{\mathsf{SNR}^2}\right\};$$

see Figure 3(b).

If the feedback rate were $\log_2(3)$ bits per channel state, then the receiver could convey information about three events $\{\mathcal{O}_1, \mathcal{O}_0 \setminus \mathcal{O}_1, \overline{\mathcal{O}}_0\}$. In state $\mathcal{O}_1$, the transmitter can send power $\approx \mathsf{SNR}^3$ since $\Pi(\mathcal{O}_1) \approx \mathsf{SNR}^{-2}$, and thus reduce the outage probability to $\mathsf{SNR}^{-3}$.
If we had two bits of feedback, or equivalently $K = 4$ levels, then the diversity order would be 4 and the outage region would be given by $\mathcal{O}_3$, the set of all those channels which could not support rate $R$ with power $\approx \mathsf{SNR}^4$; see Figure 3(c). Using the above recursive argument, for $K$ levels of feedback, a diversity order of $K = 2^b$ can be achieved, where $b$ is the number of feedback bits per channel realization. □

The above example captures the essence of the general result for MIMO channels, given by the following theorem.

Theorem 1 ([13]). Suppose that $K \ge 1$ and $r < \min(m,n)$. Then, the diversity-multiplexing tradeoff for the case of perfect CSIR with noiseless quantized feedback is $d_{RT_q}(r,K)$, defined recursively as $d_{RT_q}(r,K) = G(r, 1 + d_{RT_q}(r,K-1))$, where $d_{RT_q}(r,0) = 0$.

The main idea of the proof is along the lines of Example 1 and is summarized as follows. Based on its knowledge of $H$, the receiver decides the feedback index $J_R = J_T \in \{0, 1, \cdots, K-1\}$, where $K = 2^b$ is the number of quantization levels and $b$ is the number of feedback bits. If the transmitter receives feedback index $J_T$, it sends data at power level $P_{J_T}$. Without loss of generality, $P_0 \le P_1 \le \cdots \le P_{K-1}$. The optimal deterministic index mapping has the following form [13]:

$$J_R = \begin{cases} \arg\min_{i \in \mathcal{I}} i, & \mathcal{I} = \left\{k : \log\det(I + HH^\dagger P_k) \ge R,\ k \in \{0, \cdots, K-1\}\right\}, \\ 0, & \text{if the set } \mathcal{I} \text{ is empty.} \end{cases}$$

Based on the above assignment of feedback indices, the optimal power levels can be found as $P_i \doteq \mathsf{SNR}^{1+p_i}$, where the $p_i$ are recursively defined as $p_0 = 0$, $p_j = G(r, 1 + p_{j-1})$ for all $j \ge 1$. Using the above recursion, the optimal diversity is given by $G(r, 1 + p_{K-1})$, which reduces to $d_{RT_q}(r,K)$, thus proving Theorem 1. We note that special cases of Theorem 1 were also proved in [8].
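The recursion in Theorem 1 is straightforward to evaluate numerically. The sketch below is a minimal illustration that assumes the piecewise-linear characterization of $G(r,p)$ through the points $(kp,\, p(m-k)(n-k))$ from Remark 1; the function names `G` and `d_rtq` are hypothetical and chosen here only for illustration.

```python
def G(r: float, p: float, m: int, n: int) -> float:
    """Piecewise-linear G(r, p) through the points (k*p, p*(m-k)*(n-k)), k = 0..min(m, n)."""
    k_max = min(m, n)
    if p <= 0:
        raise ValueError("p must be positive")
    x = r / p  # uses the scaling G(r, p) = p * G(r / p, 1)
    if x < 0 or x > k_max:
        raise ValueError("need 0 <= r <= p * min(m, n)")
    k = min(int(x), k_max - 1)            # index of the linear segment [k, k + 1]
    d_lo = (m - k) * (n - k)              # value of G(., 1) at the left corner point
    d_hi = (m - k - 1) * (n - k - 1)      # value of G(., 1) at the right corner point
    return p * (d_lo + (x - k) * (d_hi - d_lo))

def d_rtq(r: float, K: int, m: int, n: int) -> float:
    """Recursion d(r, K) = G(r, 1 + d(r, K - 1)) with d(r, 0) = 0 (perfect CSIR, noiseless feedback)."""
    d = 0.0
    for _ in range(K):
        d = G(r, 1.0 + d, m, n)
    return d
```

As a sanity check against Example 1, `d_rtq(0, K, 1, 1)` evaluates to `K` for the SISO channel. For $m = n = 2$ and $K = 3$ at $r = 0$, the recursion gives $4 + 16 + 64 = 84$, that is, $mn + (mn)^2 + (mn)^3$.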
We note that the event $\{H : \log\det(I + HH^\dagger P_{K-1}) < R\}$ corresponds to the outage event, because none of the power levels can be used to reach a mutual information of $R$. Thus, any state can be assigned to the outage event and, in fact, the diversity-multiplexing tradeoff is unaffected by which index is used to represent this state. Since the receiver knowledge is perfect, using the index $J_R = 0$ ensures that the overall power consumption is minimized. However, when the receiver or transmitter knowledge is not perfect (which will be the case in the rest of the paper), we will assign the perceived outage state the highest power level $P_{K-1}$, which helps reduce the outage probability due to misestimation of the channel.

TABLE I: SUMMARY OF DIVERSITY-MULTIPLEXING TRADEOFFS. SEE REMARK 2 FOR A CAUTION IN USING THIS TABLE.

| Case | Main Characteristic | D-M Tradeoff |
|---|---|---|
| CSIRT | Perfect information at T and R | $d_{RT}(r) = \infty \ \forall\, r < \min(m,n)$ |
| CSIRT$_q$ | Quantized information at T | $d_{RT_q}(r,K) = G(r, 1 + d_{RT_q}(r,K-1))$, $d_{RT_q}(r,0) = 0$ |
| CSIR$\hat{\mathrm{T}}_q$ (Constant Power Feedback) | Noisy information at T | $d_{R\hat{T}_q}(r,K) = \min(B_K(r),\, mn + G(r,1))$ |
| CSIR$\hat{\mathrm{T}}_q$ (Power-controlled Feedback) | Noisy information at T | $d_{R\hat{T}_q}(r,K) = \min\left(d_{RT_q}(r,K),\ \max_{q_j \le 1 + d_{RT_q}(r,j)} \min_{i=1}^{K-1} \left(mn\left((q_i)^+ - (q_{i-1})^+\right) + d_{RT_q}(r,i)\right)\right)$ |
| CSI$\hat{\mathrm{R}}$T$_q$ (Constant Power Training) | Noisy information at R | $d_{\hat{R}T_q}(r,K) = G(r,1)$ |
| CSI$\hat{\mathrm{R}}$T$_q$ (Power-controlled Training) | Noisy information at R | $d_{\hat{R}T_q}(r,K) = G(r, 1 + G(r,1))$ |
| CSI$\hat{\mathrm{R}}\hat{\mathrm{T}}_q$ (Power-controlled Training & Feedback) | No Genie-aided information | $d_{\hat{R}\hat{T}_q}(r,K) = G(r, 1 + G(r,1))$ |

[Fig. 3. Example 1: Channel events for the case of (a) no feedback ($\mathcal{O}_0$ is the set of channels in outage), (b) one-bit feedback ($\mathcal{O}_1$ is the outage set), and (c) two-bit feedback ($\mathcal{O}_3$ is the outage set).]
The maximum diversity order in Theorem 1 increases very rapidly with the number of feedback levels $K$. The maximum diversity order [13] is $d_{RT_q}(0, K) = \sum_{g=1}^{K} (mn)^g$, which grows exponentially fast in the number of levels $K$. In fact, as $K \to \infty$, the diversity order increases without bound, since the feedback then approaches perfect feedback and, as a result, perfect channel inversion becomes possible. In [3], it was shown that for perfect channel state information at the transmitter and receiver, zero outage can be obtained at finite SNR if $\max(m, n) > 1$. Perfect channel inversion essentially converts the fading channel into a vector Gaussian channel, whose error probability goes to zero for all rates less than its capacity. The unbounded growth of diversity order is rather unsettling and is, in fact, a fragile result, as shown by our results in the following sections.

IV. CSIR$\hat{T}_q$: PERFECT CSIR WITH NOISY QUANTIZED FEEDBACK

In this section, we analyze the case when the receiver knows the channel $H$ perfectly but the quantization index $J_R$ is conveyed to the transmitter over a noisy feedback link, resulting in the event $J_T \ne J_R$ with non-zero probability. We consider two feedback designs. In the first design, the feedback channel uses a constant-power transmission scheme designed for equally likely symbols (which is the commonly studied case for i.i.d. data). Learning from the limitations of constant-power feedback transmission, we then construct a new power-controlled feedback strategy that exploits the unequal probability of different channel events quantized at the receiver.

A. Constant Power Feedback Transmission

First, we observe the impact of using the power control described in Section III by considering Example 1.

Example 2 (Impact of Feedback Errors, SISO): Consider the case of $b = 2$ feedback bits, which allows $K = 4$ feedback indices.
In this case, the receiver can convey four events, which we choose to be $\mathcal{O} = \{\overline{\mathcal{O}}_0,\ \mathcal{O}_0 \setminus \mathcal{O}_1,\ \mathcal{O}_1 \setminus \mathcal{O}_2,\ \mathcal{O}_2\}$ (shown in Figure 3(c)), where the events are described as
$$\mathcal{O}_i = \left\{ H : \log\left(1 + |H|^2\, \mathrm{SNR}^{1+i}\right) < R \right\}, \quad i = 0, 1, 2. \tag{3}$$
As discussed in the previous section, the power control in the forward channel uses four different power levels, $P_{J_T} \doteq \mathrm{SNR}^{1+J_T}$ for $J_T = 0, 1, 2, 3$. The probability of each of the above events and the associated power control are given in Table II. The last column in Table II shows the probability of the events as seen by the transmitter when there are errors in the feedback link such that $\Pi(J_T = j \mid J_R = i) \doteq \mathrm{SNR}^{-1}$ for $j \ne i$, as shown in Figure 4(b). For example, with noisy feedback,
$$\Pi(J_T = 3) = \sum_{j=0}^{3} \Pi(J_T = 3 \mid J_R = j)\, \Pi(J_R = j) \doteq \mathrm{SNR}^{-1} + \mathrm{SNR}^{-1}\mathrm{SNR}^{-1} + \mathrm{SNR}^{-1}\mathrm{SNR}^{-2} + \mathrm{SNR}^{-3} \doteq \mathrm{SNR}^{-1}.$$
In fact, there are two dominant error events. The first is $\overline{\mathcal{O}}_0$ ($J_R = 0$) being confused as either $\mathcal{O}_0 \setminus \mathcal{O}_1$, $\mathcal{O}_1 \setminus \mathcal{O}_2$ or $\mathcal{O}_2$ ($J_T = 1, 2$ or $3$), which amounts to limiting the maximum power that can be used. The second is $\mathcal{O}_1$ ($J_R = 1$) being confused as $\mathcal{O}_0$ ($J_T = 0$). The second error event has approximate probability $\mathrm{SNR}^{-2}$, which limits the maximum diversity to 2.

Thus, the event probabilities at the transmitter are no longer the same in the presence of feedback errors. As a result, the transmitter cannot use the power control $P_{J_T} = \mathrm{SNR}^{1+J_T}$ without exceeding the average power constraint. In fact, the highest power the transmitter can send is of the order $\doteq \mathrm{SNR}^2$, and the maximum diversity order of the constant-power feedback mechanism is limited to 2 irrespective of the number of feedback bits. $\square$

We now generalize the above example to MIMO for arbitrary multiplexing gain. Let $B_j(r)$ be defined by the recursive equation
$$B_j(r) = \begin{cases} 0, & j = 0, \\ G(r, 1 + \min(mn, B_{j-1}(r))), & j \ge 1. \end{cases}$$

Theorem 2.
[16] Suppose that $K > 1$ and $r < \min(m, n)$. Then the diversity-multiplexing tradeoff for constant-power feedback transmission with $K$ indices of feedback is given by
$$d_{R\hat{T}_q}(r, K) = \min(B_K(r),\ mn + G(r, 1)).$$

Remark 3. Accounting for the feedback resources can be done as follows. In the limit of high SNR, the feedback will consume one channel use, and hence to get the rate $R \doteq r \log(\mathrm{SNR})$, $r$ should be replaced by $r T_{coh}/(T_{coh} - 1)$ in the above expression. However, the reader must note that there are implicit time-factor terms which can be easily integrated, and one may carry out an optimization on the diversity obtained as a function of these time-loss terms for each multiplexing gain. For example, for $r > \min(m, n)(T_{coh} - 1)/T_{coh}$, the feedback would not be useful due to the time spent, and it is better to use a non-feedback based strategy.

Much like in Example 2, the maximum diversity order for $r \to 0$ is $2mn$ for all $K \ge 2$. That is, one bit of feedback is sufficient to achieve the maximum diversity order with constant-power feedback. However, for $r > 0$, as the number of feedback levels $K$ increases, the diversity order $d_{R\hat{T}_q}(r, K)$ also increases, such that $d_{R\hat{T}_q}(r, K) \le G(r, 1) + mn$ for all $r$. Note that $G(r, 1)$ is the diversity order without any feedback. Coincidentally, the diversity order of the feedback link is $mn$ (the feedback link is non-coherent, where the error probability can decay no faster than $\mathrm{SNR}^{-mn}$), and it ends up determining the maximum gain possible beyond $G(r, 1)$.

The key bottleneck in the above result is that the feedback link uses a transmission scheme optimized for equally likely signals, which would be appropriate if the feedback link were being used to send equally likely messages such as data packets. However, the information being conveyed in the feedback link is not equiprobable, and hence the usual MIMO schemes optimized for equally likely messages are not well suited.
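The exponent bookkeeping of Example 2 can be reproduced with min-plus arithmetic on SNR exponents: a sum of terms $\mathrm{SNR}^{-a_i}$ is dominated by the smallest $a_i$. The sketch below is only illustrative; the function names are hypothetical, and the confusion exponent of 1 for every $j \ne i$ is the assumption made in Example 2.

```python
def transmitter_exponents(rx_exponents, confusion_exponent=1):
    """Negative SNR exponents of Pi(J_T = j) given those of Pi(J_R = i).

    Assumes Pi(J_T = j | J_R = i) ~ SNR^(-confusion_exponent) for j != i
    and ~ SNR^0 for j == i, as in Example 2.
    """
    K = len(rx_exponents)
    result = []
    for j in range(K):
        terms = [rx_exponents[i] + (0 if i == j else confusion_exponent)
                 for i in range(K)]
        result.append(min(terms))  # the dominant term has the smallest exponent
    return result


def average_power_exponent(power_exponents, prob_exponents):
    """SNR exponent of the average power sum_j P_j * Pi(J_T = j)."""
    return max(p - e for p, e in zip(power_exponents, prob_exponents))


if __name__ == "__main__":
    rx = [0, 1, 2, 3]            # receiver-side exponents from Table II
    tx = transmitter_exponents(rx)
    powers = [1, 2, 3, 4]        # P_j ~ SNR^(1+j)
    print(tx)
    print(average_power_exponent(powers, rx), average_power_exponent(powers, tx))
```

With noiseless feedback the average power exponent stays at 1, while with the noisy-feedback probabilities the same power levels push it above 1, which is the power-constraint violation noted in Example 2.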
Again in the context of Example 2, we will first show in the next section that an alternate design, power-controlled feedback, can improve the diversity order. Before we proceed, we note that our analysis in [10] for a TDD two-way channel yields the same maximum diversity order of $2mn$. It is satisfying to see how two different feedback methods have identical behavior; their relationship is further explored in [24].

B. Power-controlled Feedback

In this section, we exploit the unequal probabilities of the outage events at different power levels to develop a power-controlled feedback scheme that reduces overall outage probabilities. Our feedback transmission is designed for a non-coherent channel, since the feedback channel $H_f$ is not known at the transmitter or the receiver.

Example 3 (Power-controlled Feedback): For the constant-power feedback transmission scheme described in Section IV-A, each input is mapped to a codeword with the same power, as depicted in Figure 5(a). However, since the events in the set $\mathcal{O}$ are not equally likely, we assign different power levels to each event. Now consider the power assignments shown in Table III, also depicted in Figure 5(b) for the feedback channel. The average power over the feedback channel is
$$\mathrm{SNR}^{0}(1 - \mathrm{SNR}^{-1}) + \mathrm{SNR}^{2}\,\mathrm{SNR}^{-1} + \mathrm{SNR}^{3}\,\mathrm{SNR}^{-2} + \mathrm{SNR}^{-3}\,\mathrm{SNR}^{4} \doteq \mathrm{SNR}.$$
Here the rare events are conveyed with more power, which allows a more reliable delivery of the feedback information without violating the feedback power constraint. Thus, the power-controlled scheme resembles amplitude shift keying.

The reason that the probabilities at the transmitter and the receiver are the same for power-controlled feedback is that the events $J_T > J_R$ occur with exponentially small probability. The outage probability can be lower bounded by $\Pi(J_T = 0, J_R = 1)$, the event that the receiver requested a higher power level than what it received. Note that $\Pi(J_R = 1) \doteq \mathrm{SNR}^{-1}$ and $\Pi(J_T = 0 \mid J_R = 1) \doteq \mathrm{SNR}^{0-2}$. Hence, $\Pi(J_T = 0, J_R = 1) \doteq \mathrm{SNR}^{-1}\,\mathrm{SNR}^{-2} = \mathrm{SNR}^{-3}$ is the probability of this dominant error event. Note that both the forward and feedback channel power constraints are satisfied. Hence, we can obtain a diversity order of 3, which is higher than the diversity order of 2 obtained via the constant-power feedback design (Example 2) but lower than the 4 that can be achieved with a noiseless feedback channel (Example 1). $\square$

TABLE II. Example 2: Probability of events at the transmitter and receiver with and without noise in the feedback channel. Receiver knowledge is assumed to be perfect. (Caution: probabilities are only reported up to their order, and constants such that they sum to one are omitted.)

| Event | Feedback index ($J_R$) | Prob. of event at receiver | Prob. at transmitter (noiseless feedback) | Prob. at transmitter (noisy feedback) |
|---|---|---|---|---|
| $\overline{\mathcal{O}}_0$ | 0 | $1 - \mathrm{SNR}^{-1}$ | $1 - \mathrm{SNR}^{-1}$ | $1 - \mathrm{SNR}^{-1}$ |
| $\mathcal{O}_0 \setminus \mathcal{O}_1$ | 1 | $\mathrm{SNR}^{-1}$ | $\mathrm{SNR}^{-1}$ | $\mathrm{SNR}^{-1}$ |
| $\mathcal{O}_1 \setminus \mathcal{O}_2$ | 2 | $\mathrm{SNR}^{-2}$ | $\mathrm{SNR}^{-2}$ | $\mathrm{SNR}^{-1}$ |
| $\mathcal{O}_2$ | 3 | $\mathrm{SNR}^{-3}$ | $\mathrm{SNR}^{-3}$ | $\mathrm{SNR}^{-1}$ |

Fig. 4. Example 2: Input-output probabilities of the feedback channel when (a) the channel is noiseless and (b) the channel is noisy, causing a mismatch between transmitter and receiver knowledge, assuming that a code optimized for an equally likely source is used. The noise in the feedback channel changes the output probabilities, as shown by the circled output probabilities. (Note that the sum of probabilities is more than one since we have omitted the constants for ease of understanding.)

TABLE III. Example 3: Power assignment for the noisy feedback channel. (Caution: probabilities are only reported up to their order, and constants such that they sum to one are omitted.)

| Event | Feedback index | Prob. at receiver | Feedback transmit power | Prob. at transmitter (noisy feedback) |
|---|---|---|---|---|
| $\overline{\mathcal{O}}_0$ | 0 | $1 - \mathrm{SNR}^{-1}$ | $\mathrm{SNR}^{0}$ | $1 - \mathrm{SNR}^{-1}$ |
| $\mathcal{O}_0 \setminus \mathcal{O}_1$ | 1 | $\mathrm{SNR}^{-1}$ | $\mathrm{SNR}^{2}$ | $\mathrm{SNR}^{-1}$ |
| $\mathcal{O}_1 \setminus \mathcal{O}_2$ | 2 | $\mathrm{SNR}^{-2}$ | $\mathrm{SNR}^{3}$ | $\mathrm{SNR}^{-2}$ |
| $\mathcal{O}_2$ | 3 | $\mathrm{SNR}^{-3}$ | $\mathrm{SNR}^{4}$ | $\mathrm{SNR}^{-3}$ |

Fig. 5. Examples 2 and 3: Dominant error events when (a) the feedback channel uses a transmission suited for equally likely messages and (b) a power-controlled feedback design is used, where codewords are power-controlled based on their probability of occurrence.

We generalize the above example to general MIMO channels by using the following power control for the feedback channel. Let the feedback power levels for the different feedback messages be $Q_0 \doteq \mathrm{SNR}^{q_0}$, $Q_1 \doteq \mathrm{SNR}^{q_1}$, $\dots$, $Q_{K-1} \doteq \mathrm{SNR}^{q_{K-1}}$ with $q_j > q_{j-1}$. Assume for now that the set of powers $\{Q_i\}$ satisfies the feedback power constraint of $\mathrm{SNR}$.

We use the maximum a posteriori probability (MAP) detection rule to detect the message transmitted by the receiver on the feedback channel. If the transmitted signal is at signal-to-noise ratio level $Q$, the received power is $S_f \doteq \mathrm{trace}(HH^\dagger Q + WW^\dagger + \sqrt{Q}(HW^\dagger + WH^\dagger))$. Since we encode the feedback index information in the signal power, we compare the received power $S_f$ with different threshold power levels. We first note that for all $q_i \le 0$, the received power is dominated by the noise term $\mathrm{trace}(WW^\dagger)$, which is $\doteq 1$ with high probability. Thus, we assume that $q_j > 0$ for $j > 0$.

Suppose the eigenvalues of $HH^\dagger$ are $(\lambda_1, \dots, \lambda_{m_N})$ and $\lambda_i \doteq \mathrm{SNR}^{-\alpha_i}$, where the $\alpha_i$ are the negative SNR exponents of the $\lambda_i$, $m_N = \min(m, n)$ and $\alpha = (\alpha_1, \dots, \alpha_{m_N})$. Then the distribution of the $\alpha_i$ is given by the following result.

Lemma 2.
[22] Assume $\alpha_1 \ge \alpha_2 \ge \dots \ge \alpha_{m_N}$ are the power exponents as described above. In the limit of high SNR, the probability density function of the SNR exponents, $\alpha$, of the eigenvalues of $HH^\dagger$ is given by
$$p(\alpha) \doteq \prod_{i=1}^{m_N} \mathrm{SNR}^{-(2i - 1 + |n - m|)\alpha_i}\, \mathbf{1}_{\min(\alpha) \ge 0}. \tag{4}$$

We now show that the thresholds for MAP decoding are $\mathrm{SNR}^{\max(q_0, 0) + \epsilon}, \mathrm{SNR}^{q_1 + \epsilon}, \dots, \mathrm{SNR}^{q_{K-2} + \epsilon}$ for some small $\epsilon > 0$. The reason is that $\Pi(J_T > i \mid J_R = i)$ decays faster than polynomially$^2$ in SNR and is hence $\doteq 0$ as long as the threshold for detecting $J_T = i$ is above $\mathrm{SNR}^{q_i + \epsilon}$ for any $\epsilon > 0$. To see this, for $0 \le j < K - 1$,
$$\begin{aligned}
\Pi(J_T > j \mid J_R = j) &= \Pi\left(\mathrm{trace}(HH^\dagger\, \mathrm{SNR}^{\max(q_j, 0)}) + 1 \ge \mathrm{SNR}^{\max(q_j, 0) + \epsilon}\right) \\
&\doteq \Pi\left(\sum_i \mathrm{SNR}^{q_j - \alpha_i} + 1 \ge \mathrm{SNR}^{\max(q_j, 0) + \epsilon}\right) \\
&\doteq \Pi\left(\mathrm{SNR}^{\max(q_j - \min \alpha_i,\, 0)} \ge \mathrm{SNR}^{\max(q_j, 0) + \epsilon}\right) \\
&\doteq \Pi\left(\max(q_j - \min \alpha_i,\, 0) \ge \max(q_j, 0) + \epsilon\right).
\end{aligned} \tag{5}$$
For the last expression to occur with finite probability, we need $\min \alpha_i < 0$, which cannot happen with polynomially decreasing probability by Lemma 2, and hence $\Pi(J_T > j \mid J_R = j) \doteq 0$ irrespective of $\epsilon > 0$. Now, for MAP detection, we would find a threshold between $q_i$ and $q_{i+1}$ so as to minimize $\Pi(J_T = i + 1, J_R = i) + \Pi(J_T = i, J_R = i + 1)$. Since the first term is $\doteq 0$ and the second term is minimized by choosing $\epsilon$ as small as possible, choosing $\epsilon$ small enough gives the desired threshold for MAP decoding.

Footnote 2: Any function $f(\mathrm{SNR})$ that decays faster than any polynomial in SNR, such as an exponential or super-exponential function of SNR, is $\doteq 0$, since $\lim_{\mathrm{SNR} \to \infty} \frac{\log f(\mathrm{SNR})}{\log(\mathrm{SNR})} = -\infty$.

Further, for all $Q \,\dot{>}\, \mathrm{SNR}$, $\mathrm{trace}(HH^\dagger Q + WW^\dagger + \sqrt{Q}(HW^\dagger + WH^\dagger)) \doteq \mathrm{trace}(HH^\dagger) Q + 1$. Let $\lambda_i$ be the eigenvalues of $HH^\dagger$ and $\lambda_i \doteq \mathrm{SNR}^{-\alpha_i}$. Then
$$\begin{aligned}
\Pi(J_T = 0 \mid J_R = 1) &= \Pi\left(\mathrm{trace}(HH^\dagger\, \mathrm{SNR}^{q_1}) + 1 \,\dot{<}\, \mathrm{SNR}^{\max(q_0, 0) + \epsilon}\right) \\
&\doteq \Pi\left(\sum_{i=1}^{\min(m, n)} \mathrm{SNR}^{q_1 - \alpha_i} + 1 < \mathrm{SNR}^{\max(q_0, 0) + \epsilon}\right) \\
&\doteq \Pi\left(\mathrm{SNR}^{\max(q_1 - \min \alpha_i,\, 0)} \,\dot{<}\, \mathrm{SNR}^{\max(q_0, 0) + \epsilon}\right) \\
&\doteq \Pi\left(\max(q_1 - \min \alpha_i,\, 0) < \max(q_0, 0) + \epsilon\right) \\
&\doteq \Pi\left(\min \alpha_i > q_1 - \max(q_0, 0) - \epsilon\right) \\
&\doteq \mathrm{SNR}^{-mn(q_1 - \max(q_0, 0))},
\end{aligned}$$
where the last step follows from Lemma 2. Since $q_1 \le 1 + G(r, 1)$ (as $\Pi(J_R = 1) \,\dot{\ge}\, \mathrm{SNR}^{-G(r, 1)}$ and the feedback link has a power constraint of $\mathrm{SNR}$), the outage probability is lower bounded by $\Pi(J_T = 0, J_R = 1) \ge \mathrm{SNR}^{-G(r, 1)}\, \mathrm{SNR}^{-mn(1 + G(r, 1) - 0)} \doteq \mathrm{SNR}^{-mn(1 + G(r, 1)) - G(r, 1)}$. Further, we can use the same technique to find that for any $j < i$,
$$\Pi(J_T = j \mid J_R = i) \doteq \mathrm{SNR}^{-mn(q_i - \max(q_j, 0))}. \tag{6}$$
Thus we obtain the following result.

Theorem 3 (Power-controlled Feedback). Suppose that $K > 1$ and $r < \min(m, n)$. Then the following diversity-multiplexing tradeoff can be achieved with power-controlled feedback:
$$d_{R\hat{T}_q}(r, K) = \min\left( d_{RT_q}(r, K),\ \max_{q_j \le 1 + d_{RT_q}(r, j)}\ \min_{i = 1, \dots, K-1} \left( mn\left((q_i)^+ - (q_{i-1})^+\right) + d_{RT_q}(r, i) \right) \right).$$

Proof: The proof is provided in Appendix B.

Corollary 1. The diversity-multiplexing tradeoff with one bit of imperfect feedback is the same as the diversity-multiplexing tradeoff with one bit of perfect feedback. In other words, for $K = 2$, $d_{R\hat{T}_q}(r, 2) = G(r, 1 + G(r, 1)) = d_{RT_q}(r, 2)$.

Proof: For $K = 2$, the optimal choice of $q$ maximizes $q_1^+ - q_0^+$, which gives $q_0 = 0$ and $q_1 = 1 + G(r, 1)$. Using this for one bit of feedback, a diversity of $G(r, 1 + G(r, 1))$ can be achieved with power-controlled feedback, which is optimal since the upper bound from perfect feedback is $d_{RT_q}(r, 2) = G(r, 1 + G(r, 1))$.

Thus, we find that there is no loss of diversity with one imperfect bit of feedback compared to the case of perfect feedback.
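To see how the expression in Theorem 3 behaves numerically, the following sketch evaluates it as $r \to 0$ for one particular exponent vector, $q_0 = 0$ and $q_i = 1 + d_{RT_q}(0, i)$, rather than carrying out the maximization over $q$. It relies on the closed form $d_{RT_q}(0, i) = \sum_{g=1}^{i} (mn)^g$ from [13] and on the assumption, consistent with Table III, that $\Pi(J_R = i) \doteq \mathrm{SNR}^{-d_{RT_q}(0, i)}$. All function names are hypothetical.

```python
def d_noiseless(i, mn):
    """d_RTq(0, i) = sum_{g=1}^{i} (mn)^g, the r -> 0 value quoted from [13]."""
    return sum(mn ** g for g in range(1, i + 1))


def default_exponents(K, mn):
    """Feedback power exponents q_0 = 0 and q_i = 1 + d_RTq(0, i)."""
    return [0] + [1 + d_noiseless(i, mn) for i in range(1, K)]


def power_controlled_diversity(K, mn, q=None):
    """Theorem 3 expression at r -> 0 for a fixed q (no maximization over q)."""
    if q is None:
        q = default_exponents(K, mn)
    inner = min(mn * (max(q[i], 0) - max(q[i - 1], 0)) + d_noiseless(i, mn)
                for i in range(1, K))
    return min(d_noiseless(K, mn), inner)


def feedback_power_exponent(q, mn):
    """SNR exponent of the average feedback power, assuming
    Pi(J_R = i) ~ SNR^(-d_RTq(0, i))."""
    return max(q[i] - d_noiseless(i, mn) for i in range(len(q)))


if __name__ == "__main__":
    print(power_controlled_diversity(4, 1))   # SISO, two bits (Example 3)
    print(power_controlled_diversity(2, 4))   # 2x2 channel, one bit
    print(feedback_power_exponent(default_exponents(4, 1), 1))
```

For the SISO case with $K = 4$ this reproduces the exponents $(0, 2, 3, 4)$ of Table III and a diversity of 3, and the average feedback power exponent stays at 1.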
The power-controlled feedback scheme is therefore better than the constant-power scheme, which was limited to a maximum diversity of $2mn$ even as $r \to 0$.

Corollary 2. As $K \to \infty$ and $r \to 0$, the maximum diversity that can be obtained with power-controlled feedback is $mn(mn + 2)$, i.e., $\lim_{K \to \infty,\, r \to 0} d_{R\hat{T}_q}(r, K) = mn(mn + 2)$.

Proof: Note that in this case, choosing $q_0 = 0$ and
$$q_i = 1 + d_{RT_q}(r, i) \approx 1 + mn\,\frac{(mn)^i - 1}{mn - 1}\,\mathbf{1}_{mn > 1} + i\,\mathbf{1}_{mn = 1}$$
gives the optimal diversity-multiplexing point for $r \to 0$, and thus the diversity is $mn(mn + 2)$. We further note that $K = 3$ is enough to get this point.

Thus, even with power-controlled feedback, an arbitrary number of feedback bits does not yield an unbounded increase in diversity order, unlike the case of CSIRT$_q$, where the diversity order increases without bound with $K$.

Note, however, that we restricted our attention to an ordering $q_j > q_{j-1}$, which can be relaxed to give better results, as in the following lemma. For our objective of achievability with imperfect channel state at the receiver and imperfect channel state at the transmitter, the achievability strategy in Theorem 3 is enough. Further note that the new achievability strategy, which relaxes the assumption $q_j > q_{j-1}$, does give improved diversity, but the diversity still remains bounded as the number of feedback levels increases.

Lemma 3. Let $q_i \ge 0$ and $q_i \ne q_j$ for any $i \ne j$, $0 \le i, j \le K - 1$. Then the following diversity can be achieved by power-controlled feedback:
$$\min\Big( d_{RT_q}(r, K),\ \max_{(q_0, \dots, q_{K-1}) \in \mathcal{Q}}\ \min_{i, j \in \{0, \dots, K-1\}:\ j \,\ldots\, q_j} \big( d_{RT_q}(r, i) + (q_i - q_j)\, mn \big) \Big) \ \ldots\ \forall\, 0 \le j \le K - 1 \}.$$

Proof: The proof is a simple generalization of the proof of Theorem 3. The constraint on $\mathcal{Q}$ keeps $\Pi(J_T = i)\, \mathrm{SNR}^{q_i} \,\dot{\le}\, \mathrm{SNR}$ for the power constraint, and the diversity expression records the possible dominant outage events.

C. A Source Coding Interpretation

The idea of power-controlled feedback transmission is akin to source-coding the feedback information. In conventional lossless source coding (e.g., Huffman coding), the currency of representation is bits. There, to minimize the average code length, the codeword length is approximately the negative logarithm of the probability of an event. Rare events are represented by longer codewords while more frequent events are represented by shorter codewords, thereby minimizing the average code length. Analogously, our currency is average transmit power and the objective is to minimize the average error probability. Thus rare events get higher power and frequent events get lower transmit power.

Note that there are many power allocations which meet the power constraint, but they result in different error probabilities. Our proposed feedback transmit power allocation minimizes the error probability (in an asymptotic sense).

V. CSI$\hat{R}T_q$: ESTIMATED CSIR WITH NOISELESS QUANTIZED FEEDBACK

In this section, we consider the case when the receiver obtains its channel information from an MMSE estimate, $\hat{H}$, using training. As a result, the receiver index $J_R$ is not always equal to the optimal index, $J$, based on the actual channel $H$. We model the relationship between $J$ and $J_R$ via an effective channel (described in Section V-C). The feedback link, on the other hand, is assumed to be noiseless. As a result, the transmitter index is $J_T = J_R$.

We analyze two protocols. The first protocol, labeled constant-power training, trains the receiver once at the beginning using a constant power level. We show that feedback, even if noiseless, is completely useless in providing any gains in diversity order compared to a no-feedback system.
Inspired by our understanding of the noisy feedback channel in Section IV, we propose a second protocol, labeled power-controlled training, which utilizes the feedback and trains the receiver twice, where the second training is power-controlled based on the feedback information. The second protocol has a higher diversity order than any non-feedback system.

A. Training the receiver

In this subsection, we consider MMSE channel estimation for a single-user MIMO channel. The channel is estimated using a training signal that is known at the receiver. From the received signal, MMSE estimation is done as in [25] to get an estimate $\hat{H}$ of the original channel $H$. Let $X_T$ be the training signal of size $m \times N$, for some $m \le N < T_{coh}$, that is known at the receiver and the transmitter. The transmitter sends $X_T$ and the destination receives
$$Y_T = H X_T + W_T,$$
where $Y_T$ is the $n \times N$ received signal and $W_T$ is the additive Gaussian noise with each entry drawn from $\mathcal{CN}(0, 1)$.

Following [25], the optimal training signal is
$$X_T = \begin{bmatrix} \sqrt{(n\mu_0 - 1)^+}\; I_m & 0_{m \times (N - m)} \end{bmatrix},$$
where $\mu_0$ is tuned to satisfy the power constraint, $I_m$ denotes the $m \times m$ identity matrix and $0_{m \times (N - m)}$ denotes the $m \times (N - m)$ matrix with all entries 0. Hence, $n\mu_0 - 1 = \frac{N\, \mathrm{SNR}}{m}$. Further, the channel estimate is given by
$$\hat{H} = Y_T (X_T^\dagger X_T + I_N)^{-1} X_T^\dagger = Y_T \begin{bmatrix} \dfrac{\sqrt{N\, \mathrm{SNR}/m}}{1 + N\, \mathrm{SNR}/m}\, I_m \\ 0_{(N - m) \times m} \end{bmatrix},$$
which can be rewritten as
$$\hat{H} = H\, \frac{1}{1 + \frac{m}{N\, \mathrm{SNR}}} + W_2\, \frac{\sqrt{N\, \mathrm{SNR}/m}}{1 + N\, \mathrm{SNR}/m}, \tag{7}$$
where $W_2$ is the left $n \times m$ submatrix of $W_T$.

We now note some properties of the MMSE estimate. First, it is easy to see that the expected value of $(H - \hat{H})\hat{H}^\dagger$ is zero, confirming the orthogonality of the error with the unbiased estimate. Further, the variance of any entry of $(H - \hat{H})$ is $\frac{1}{1 + N\, \mathrm{SNR}/m}$. Thus, corresponding elements of $H$ and $\hat{H}$ are highly correlated, with correlation coefficient $\rho = \frac{1}{\sqrt{1 + \frac{m}{N\, \mathrm{SNR}}}}$.
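The error variance and correlation coefficient stated above can be checked with a small Monte Carlo simulation of the estimator in (7). This is a sketch under stated assumptions: `snr` is the linear-scale training SNR, the channel and noise entries are i.i.d. $\mathcal{CN}(0, 1)$, and the function names, trial count and seed are hypothetical choices made only for illustration.

```python
import numpy as np


def complex_gaussian(shape, rng):
    """i.i.d. CN(0, 1) entries."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)


def mmse_estimate(H, snr, N, rng):
    """Channel estimate Y_T (X_T^dagger X_T + I_N)^{-1} X_T^dagger."""
    n, m = H.shape
    a = N * snr / m                                   # equals n*mu_0 - 1
    X = np.hstack([np.sqrt(a) * np.eye(m), np.zeros((m, N - m))])
    Y = H @ X + complex_gaussian((n, N), rng)         # received training block
    return Y @ np.linalg.inv(X.conj().T @ X + np.eye(N)) @ X.conj().T


def empirical_stats(snr, m=2, n=2, N=3, trials=5000, seed=0):
    """Empirical per-entry error variance and correlation of H and its estimate."""
    rng = np.random.default_rng(seed)
    err_pow = cross = h_pow = est_pow = 0.0
    for _ in range(trials):
        H = complex_gaussian((n, m), rng)
        H_hat = mmse_estimate(H, snr, N, rng)
        err_pow += np.mean(np.abs(H - H_hat) ** 2)
        cross += np.mean(H * H_hat.conj()).real
        h_pow += np.mean(np.abs(H) ** 2)
        est_pow += np.mean(np.abs(H_hat) ** 2)
    return err_pow / trials, cross / np.sqrt(h_pow * est_pow)


if __name__ == "__main__":
    snr, m, N = 10.0, 2, 3
    err_var, rho = empirical_stats(snr, m=m, N=N)
    print(err_var, 1.0 / (1.0 + N * snr / m))
    print(rho, 1.0 / np.sqrt(1.0 + m / (N * snr)))
```

The printed pairs compare the empirical values with the closed-form expressions; they should agree up to Monte Carlo error.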
Further note that any $N \ge m$ channel uses are equivalent for analyzing the asymptotic performance.

In general, if $G$ and $H$ are correlated with correlation coefficient $\rho$, the joint probability distribution function of the eigenvalues of $HH^\dagger$ and $GG^\dagger$ is given by the following result.

Lemma 4. [26] Consider two $n \times m$ random matrices $H = (h_{ij})$ and $G = (g_{ij})$, $i \in [1, n]$, $j \in [1, m]$, each with i.i.d. complex zero-mean unit-variance Gaussian entries, i.e., $E[h_{ij}] = E[g_{ij}] = 0\ \forall i, j$, and $E[h_{ij} h_{pq}^\dagger] = E[g_{ij} g_{pq}^\dagger] = \delta_{ip}\delta_{jq}$, where the Kronecker symbol $\delta_{ij}$ is 1 or 0 when $i = j$ or $i \ne j$ respectively. Moreover, the correlation among the two random matrices is given by $E[h_{ij} g_{pq}^\dagger] = \rho\, \delta_{ip}\delta_{jq}\ \forall i, j, p, q$, where $\rho = |\rho| e^{j\theta}$ is a complex number with $|\rho| < 1$. Let $n \le m$ and $\nu = m - n$. The joint probability distribution function of the unordered eigenvalues of $HH^\dagger$ and $GG^\dagger$ is
$$p(\lambda, \hat{\lambda}) = \frac{\exp\left(-\sum_{k=1}^{n} \frac{\lambda_k + \hat{\lambda}_k}{1 - |\rho|^2}\right) \Delta(\lambda)\, \Delta(\hat{\lambda})}{n!\, n!\, \prod_{j=0}^{n-1} j!\,(j + \nu)!\ |\rho|^{mn - n}\, (1 - |\rho|^2)^n} \times \prod_{k=1}^{n} \left(\sqrt{\lambda_k \hat{\lambda}_k}\right)^{\nu} \det\left[ I_\nu\left( \frac{2|\rho| \sqrt{\lambda_k \hat{\lambda}_l}}{1 - |\rho|^2} \right) \right], \tag{8}$$
where $\Delta(\cdot)$ represents the $n$-dimensional Vandermonde determinant, $I_k(\cdot)$ denotes the $k$-th order modified Bessel function of the first kind, and the eigenvalues of $HH^\dagger$ and $GG^\dagger$ are given by $\lambda = (\lambda_1, \dots, \lambda_n)$ and $\hat{\lambda} = (\hat{\lambda}_1, \dots, \hat{\lambda}_n)$ respectively.

Note that although Lemma 4 assumes $n \le m$, it can be extended to the other case of $n > m$, since the nonzero eigenvalues of $HH^\dagger$ and $H^\dagger H$ are the same. Hence for all $n$ and $m$, let $m_N = \min(m, n)$ and $\nu = |m - n|$. Then the joint probability density function of the unordered eigenvalues of $HH^\dagger$ and $GG^\dagger$ is
$$p(\lambda, \hat{\lambda}) = \frac{\exp\left(-\sum_{k=1}^{m_N} \frac{\lambda_k + \hat{\lambda}_k}{1 - |\rho|^2}\right) \Delta(\lambda)\, \Delta(\hat{\lambda})}{m_N!\, m_N!\, \prod_{j=0}^{n-1} j!\,(j + \nu)!\ |\rho|^{mn - m_N}\, (1 - |\rho|^2)^{m_N}} \times \prod_{k=1}^{m_N} \left(\sqrt{\lambda_k \hat{\lambda}_k}\right)^{\nu} \det\left[ I_\nu\left( \frac{2|\rho| \sqrt{\lambda_k \hat{\lambda}_l}}{1 - |\rho|^2} \right) \right]. \tag{9}$$

Recall that the eigenvalues of $HH^\dagger$ are $(\lambda_1, \dots, \lambda_{m_N})$, with $\lambda_i \doteq \mathrm{SNR}^{-\alpha_i}$ and $\alpha = (\alpha_1, \dots, \alpha_{m_N})$. Similarly, let the eigenvalues of $\hat{H}\hat{H}^\dagger$ be $(\hat{\lambda}_1, \dots, \hat{\lambda}_{m_N})$, with $\hat{\lambda}_i \doteq \mathrm{SNR}^{-\hat{\alpha}_i}$ and $\hat{\alpha} = (\hat{\alpha}_1, \dots, \hat{\alpha}_{m_N})$. The distribution of the $\alpha_i$'s was given earlier in Lemma 2. We now find the joint distribution of the $\alpha_i$'s and $\hat{\alpha}_i$'s. Let $\alpha_1 \ge \alpha_2 \ge \dots \ge \alpha_{m_N}$ and $\hat{\alpha}_1 \ge \hat{\alpha}_2 \ge \dots \ge \hat{\alpha}_{m_N}$. Further, we define
$$E_k = \{ (\alpha, \hat{\alpha}) : \min(\alpha_i, \hat{\alpha}_i) \ge 1\ \ \forall i = 1, \dots, k, \text{ and } 0 \le \alpha_i = \hat{\alpha}_i < 1\ \ \forall i > k \} \tag{10}$$
for all $0 \le k \le m_N$.

Theorem 4. Let $H$ be the channel and $\hat{H}$ be the estimated channel. In the limit of high SNR, the probability density function of the SNR exponents of the eigenvalues of $HH^\dagger$ and $\hat{H}\hat{H}^\dagger$ is given by
$$p(\alpha, \hat{\alpha}) \doteq \sum_{k=0}^{m_N} e_k\, \mathbf{1}_{E_k}, \tag{11}$$
where $\alpha_1 \ge \alpha_2 \ge \dots \ge \alpha_{m_N}$, $\hat{\alpha}_1 \ge \hat{\alpha}_2 \ge \dots \ge \hat{\alpha}_{m_N}$, and
$$e_k = \mathrm{SNR}^{k(|n - m| + k)} \prod_{i=1}^{k} \mathrm{SNR}^{-(2i - 1 + |n - m|)\hat{\alpha}_i} \times \prod_{i=1}^{m_N} \mathrm{SNR}^{-(2i - 1 + |n - m|)\alpha_i}. \tag{12}$$

Proof: Since all the results are symmetric under interchanging $m$ and $n$, without loss of generality we take $m \le n$ for the purpose of this proof and the rest of the paper. The proof is provided in Appendix A.

Remark 4. Since the receiver is trained at $\mathrm{SNR}$, we note that for all $\alpha_i < 1$, $\alpha_i = \hat{\alpha}_i$ with probability 1, which means that the channel estimate is a reliable proxy for the actual channel. On the other hand, if $\alpha_i \ge 1$, all we can state is that $\hat{\alpha}_i \ge 1$ with probability 1. For example, if $\hat{\alpha}_i \ge 100$, all we can reliably say about $\alpha_i$ is that it is $\ge 1$. In the SISO case, the above property of $\hat{\alpha}_i$ implies that the channel cannot be resolved below the noise floor, since the noise dominates the training signal.
The interesting implication in MIMO is that this result of noise dominance holds for all the eigenvalues. None of the eigenvalues of the channel can be resolved beyond $\alpha_i \ge 1$ if $\hat{\alpha}_i \ge 1$.

Example 4 (Asymptotic Distribution): For the case of $m = n = 1$, which is our running example,
$$p(\alpha, \hat{\alpha}) \doteq \mathrm{SNR}^{-\alpha}\, \mathbf{1}_{0 \le \alpha = \hat{\alpha} < 1} + \mathrm{SNR}^{1 - \alpha - \hat{\alpha}}\, \mathbf{1}_{\min(\alpha, \hat{\alpha}) \ge 1}.$$
This density function will be used to analyze the diversity tradeoffs in this section.

B. Constant-power Training

We first note the decoding scheme with a trained channel estimate at the receiver. If the receiver has an estimate $\hat{H}$ trained with power $\mathrm{SNR}^p$ for some $p \ge 0$, the estimation error variance is $\mathrm{SNR}^{-p}$. As $Y = HX + W = \hat{H}X + (H - \hat{H})X + W$, a lower bound on the mutual information can be obtained by treating $(H - \hat{H})X + W$ as Gaussian noise, and hence the expression for the coherent channel with actual channel $\hat{H}$ can be used as a lower bound [21].

The protocol is divided into three phases as described below:

Phase 1: Training from Tx to Rx: The training is done using transmit power $\mathrm{SNR}$ to obtain the channel estimate $\hat{H}$. On the basis of this training, the receiver decides a feedback level $J_R$ in the following way. Suppose that there are $K$ power levels $P_i \doteq \mathrm{SNR}^{1 + p_i}$, $i \in \{0, \dots, K-1\}$. The feedback index chosen at the receiver is given by
$$J_R = \begin{cases} \arg\min_{i \in \mathcal{I}} i, & \mathcal{I} = \{k : \log\det(I + \hat{H}\hat{H}^\dagger P_k) \ge R,\ k \in \{0, \dots, K-1\}\}, \\ K - 1, & \text{if the set } \mathcal{I} \text{ is empty.} \end{cases}$$

Phase 2: The feedback index $J_R \in \{0, \dots, K-1\}$ is sent to the transmitter via the noiseless feedback channel. Thus, the feedback index received by the transmitter is $J_T = J_R$.

Phase 3: The transmitter receives a feedback power level $J_T$ and sends data at power level $P_{J_T} \doteq \mathrm{SNR}^{1 + p_{J_T}}$.

Example 5 (Constant-power training): The only error in channel knowledge appears in the first phase, where the receiver is trained.
Note that for MMSE estimation, $\hat{H}$ and $\tilde{H} = H - \hat{H}$ are uncorrelated, and since the receiver is trained with power $\mathrm{SNR}$, the variance of $\tilde{H}$ is $\doteq 1/\mathrm{SNR}$. Let $|\hat{H}|^2 = \mathrm{SNR}^{-\hat{\alpha}}$ and $|\tilde{H}|^2 = \mathrm{SNR}^{-1}\, \mathrm{SNR}^{-\tilde{\alpha}}$, where the probability distribution function of $\hat{\alpha}$ and $\tilde{\alpha}$ is $p(\hat{\alpha}, \tilde{\alpha}) = \mathrm{SNR}^{-\hat{\alpha} - \tilde{\alpha}}\, \mathbf{1}_{\hat{\alpha} \ge 0}\, \mathbf{1}_{\tilde{\alpha} \ge 0}$. The probability of outage is
$$\begin{aligned}
\Pi(\mathcal{O}) &= \sum_{j=0}^{K-1} \Pi(\mathcal{O}, J_T = j) \ge \Pi(\mathcal{O}, J_T = 1) = \Pi(\mathcal{O}, J_R = 1) \\
&\dot{\ge}\ \Pi\left(\mathcal{O},\ \log(1 + |\hat{H}|^2\, \mathrm{SNR}^2) \ge r \log \mathrm{SNR},\ \log(1 + |\hat{H}|^2\, \mathrm{SNR}) < r \log \mathrm{SNR}\right) \\
&\doteq \Pi\left(\log\left(1 + \frac{|\hat{H}|^2\, \mathrm{SNR}^2}{1 + \mathrm{SNR}^2 |\tilde{H}|^2}\right) < r \log \mathrm{SNR},\ \log(1 + |\hat{H}|^2\, \mathrm{SNR}^2) \ge r \log \mathrm{SNR},\ \log(1 + |\hat{H}|^2\, \mathrm{SNR}) < r \log \mathrm{SNR}\right) \\
&\doteq \Pi\left((2 - \hat{\alpha} - (1 - \tilde{\alpha})^+)^+ < r,\ (2 - \hat{\alpha})^+ \ge r,\ (1 - \hat{\alpha})^+ < r\right) \\
&\doteq \mathrm{SNR}^{-\min_{(\hat{\alpha}, \tilde{\alpha}) \in A} (\hat{\alpha} + \tilde{\alpha})},
\end{aligned} \tag{13}$$
where $A = \{(\hat{\alpha}, \tilde{\alpha}) : \hat{\alpha} \ge 0,\ \tilde{\alpha} \ge 0,\ (2 - \hat{\alpha} - (1 - \tilde{\alpha})^+)^+ < r,\ (2 - \hat{\alpha})^+ \ge r,\ (1 - \hat{\alpha})^+ < r\}$. Hence, substituting $\tilde{\alpha} = 0$ and $\hat{\alpha} = 1 - r + \delta$ for $\delta$ small enough (note that this choice of $(\hat{\alpha}, \tilde{\alpha})$ is in $A$) gives a bound on the above probability:
$$\Pi(\mathcal{O})\ \dot{\ge}\ \mathrm{SNR}^{-\min_{(\hat{\alpha}, \tilde{\alpha}) \in A} (\hat{\alpha} + \tilde{\alpha})}\ \dot{\ge}\ \mathrm{SNR}^{-(1 - r)}. \tag{14}$$
Thus, we see that the diversity of this scheme is at most the same as the one without feedback. Hence, there is no advantage of feedback. $\square$

Theorem 5. Suppose that $K > 1$ and $r < \min(m, n)$. Then the diversity-multiplexing tradeoff is given by $d_{\hat{R}T_q}(r, K) = G(r, 1)$.

Proof: We consider the third phase in this case, where we see that any increase in power levels does not help to increase the diversity. Achievability follows by not using any feedback. We prove the converse here. Note that for MMSE estimation, $\hat{H}$ and $\tilde{H} = H - \hat{H}$ are uncorrelated, and since the receiver is trained with power $\mathrm{SNR}$, the variance of $\tilde{H}$ is $\doteq 1/\mathrm{SNR}$. Let $\hat{\lambda}_i$ be the eigenvalues of $\hat{H}\hat{H}^\dagger$ and $\hat{\lambda}_i \doteq \mathrm{SNR}^{-\hat{\alpha}_i}$. Further, let $\tilde{\lambda}_i$ be the eigenvalues of $\tilde{H}\tilde{H}^\dagger$ and $\tilde{\lambda}_i \doteq \mathrm{SNR}^{-1}\, \mathrm{SNR}^{-\tilde{\alpha}_i}$. The probability distribution function of $\hat{\alpha} = (\hat{\alpha}_1, \dots, \hat{\alpha}_m)$ and $\tilde{\alpha} = (\tilde{\alpha}_1, \dots, \tilde{\alpha}_m)$ is
$$p(\hat{\alpha}, \tilde{\alpha}) = \prod_{i=1}^{m_N} \mathrm{SNR}^{-(2i - 1 + |n - m|)(\hat{\alpha}_i + \tilde{\alpha}_i)}\, \mathbf{1}_{\min(\hat{\alpha}) \ge 0}\, \mathbf{1}_{\min(\tilde{\alpha}) \ge 0}.$$
The probability of outage is
$$\begin{aligned}
\Pi(\mathcal{O}) &= \sum_{j=0}^{K-1} \Pi(\mathcal{O}, J_T = j) \ge \Pi(\mathcal{O}, J_T = 1) = \Pi(\mathcal{O}, J_R = 1) \\
&\dot{\ge}\ \Pi\left(\mathcal{O},\ \log\det(I + \hat{H}\hat{H}^\dagger P_1) \ge r \log \mathrm{SNR},\ \log\det(I + \hat{H}\hat{H}^\dagger P_0) < r \log \mathrm{SNR}\right) \\
&\doteq \Pi\left(\log\det\left(I + \frac{\hat{H}\hat{H}^\dagger P_1}{1 + P_1\, \mathrm{trace}(\tilde{H}\tilde{H}^\dagger)}\right) < r \log \mathrm{SNR},\ \log\det(I + \hat{H}\hat{H}^\dagger P_1) \ge r \log \mathrm{SNR},\ \log\det(I + \hat{H}\hat{H}^\dagger P_0) < r \log \mathrm{SNR}\right) \\
&\doteq \Pi\left(\sum_{i=1}^{m} (1 + p_1 - \hat{\alpha}_i - (p_1 - \min \tilde{\alpha})^+)^+ < r,\ \sum_{i=1}^{m} (1 + p_1 - \hat{\alpha}_i)^+ \ge r,\ \sum_{i=1}^{m} (1 + p_0 - \hat{\alpha}_i)^+ < r\right) \\
&\doteq \mathrm{SNR}^{-\min_{(\hat{\alpha}, \tilde{\alpha}) \in A} \sum_{i=1}^{m} (2i - 1 + |n - m|)(\hat{\alpha}_i + \tilde{\alpha}_i)},
\end{aligned} \tag{15}$$
where $A = \{(\hat{\alpha}, \tilde{\alpha}) : \min \hat{\alpha} \ge 0,\ \tilde{\alpha} \ge 0,\ \sum_{i=1}^{m} (1 + p_1 - \hat{\alpha}_i - (p_1 - \min \tilde{\alpha})^+)^+ < r,\ \sum_{i=1}^{m} (1 + p_1 - \hat{\alpha}_i)^+ \ge r,\ \sum_{i=1}^{m} (1 + p_0 - \hat{\alpha}_i)^+ < r\}$. Hence, substituting $\tilde{\alpha}_i = 0$ and allocating $\hat{\alpha}$ as in [13] so that $\sum_{i=1}^{m} (1 + p_0 - \hat{\alpha}_i)^+ = r - \delta$ for $\delta$ small enough gives a bound on the above probability:
$$\Pi(\mathcal{O})\ \dot{\ge}\ \mathrm{SNR}^{-\min_{(\hat{\alpha}, \tilde{\alpha}) \in A} \sum_{i=1}^{m} (2i - 1 + |n - m|)(\hat{\alpha}_i + \tilde{\alpha}_i)}\ \dot{\ge}\ \mathrm{SNR}^{-G(r, 1 + p_0)}\ \dot{\ge}\ \mathrm{SNR}^{-G(r, 1)}. \tag{16}$$
In the last step, $p_0 \le 0$, since otherwise the transmit power constraint cannot be satisfied. Thus, we see that the diversity of this scheme is at most the same as the one without feedback. Hence, there is no advantage of feedback for constant-power training. This was observed for $r \to 0$ in [27].

Remark 5. Analogous to Remark 3, the training resources can be accounted for. In the limit of high SNR, the training will consume $m$ channel uses to train $m$ antennas (as also seen in Section V-A), and hence to get the rate $R \doteq r \log(\mathrm{SNR})$, $r$ should be replaced by $r T_{coh}/(T_{coh} - m)$ in the above expression.
However, these terms will be omitted in the sequel and can be similarly integrated. Much like in Remark 3, one may carry out an optimization on the diversity obtained as a function of these time-loss terms for each multiplexing gain, to optimize over the number of antennas that need to be trained, as in [10, 21].

C. Power-controlled Training

As we showed in the previous section, the receiver estimate was not good enough to help improve the diversity with feedback. We now propose a power-controlled training protocol which can improve the outage performance using feedback. The protocol is again divided into three phases as described below:

Phase 1: The transmitter sends the training signal using power $\mathrm{SNR}$, which is used at the receiver to obtain the channel estimate $\hat{H}$. On the basis of this training, the receiver decides a feedback level $J_R$ in the following way. Suppose that there are $K$ power levels $P_i \doteq \mathrm{SNR}^{1 + p_i}$, $i \in \{0, \dots, K-1\}$. (We will give the exact constants in front while proving achievability.) The feedback index is
$$J_R = \begin{cases} \arg\min_{i \in \mathcal{I}} i, & \mathcal{I} = \{k : \log\det(I + \hat{H}\hat{H}^\dagger P_k) \ge R + \epsilon \log(\mathrm{SNR}),\ k \in \{0, \dots, K-1\}\}, \\ K - 1, & \text{if the set } \mathcal{I} \text{ is empty,} \end{cases}$$
where $\epsilon > 0$ is some small constant. We will later substitute $\epsilon \to 0$.

Phase 2: A feedback level $J_R \in \{0, \dots, K-1\}$ is fed back over the noiseless feedback channel to the transmitter, which implies $J_T = J_R$.

Phase 3: If the feedback index $J_T$ is received, the transmitter trains the receiver again at power level $P_{J_T}$, which is followed by the data at power level $P_{J_T}$. The trained channel estimate is denoted by $\hat{H}_2$, and let $\tilde{H} = H - \hat{H}_2$. The outage probability is defined as
$$\Pi(\mathcal{O}) \triangleq \Pi\left( \log\det\left( I + \frac{\frac{P(J_T)}{m}\, \hat{H}_2 Q \hat{H}_2^\dagger}{1 + \frac{\mathrm{SNR}}{mn}\, \mathrm{trace}(\tilde{H}\tilde{H}^\dagger)} \right) < R \right). \tag{17}$$
This is the effective outage probability using Gaussian codebooks and considering the channel estimation error as noise [10].

TABLE IV. Example 6: Power assignment for CSI$\hat{R}T_q$. (Caution: probabilities are only reported up to their order, and constants such that they sum to one are omitted.)

| Event | Prob. at receiver | Training and transmit power |
|---|---|---|
| $\widehat{\overline{\mathcal{O}}_0}$ | $1 - \mathrm{SNR}^{-1}$ | $\mathrm{SNR}^{1}$ |
| $\widehat{\mathcal{O}_0 \setminus \mathcal{O}_1}$ | $\mathrm{SNR}^{-1}$ | $\mathrm{SNR}^{2}$ |
| $\widehat{\mathcal{O}_1 \setminus \mathcal{O}_2}$ | $\mathrm{SNR}^{-2}$ | $\mathrm{SNR}^{3}$ |
| $\hat{\mathcal{O}}_2$ | $\mathrm{SNR}^{-3}$ | $\mathrm{SNR}^{4}$ |

Example 6 (Power-controlled Training): Since the receiver now does not know the value of the channel $H$, but only an estimate $\hat{H}$, the events $\hat{\mathcal{O}}_i$ can in this case be defined as
$$\hat{\mathcal{O}}_i = \left\{ \hat{H} : \log\left(1 + |\hat{H}|^2\, \mathrm{SNR}^{1+i}\right) < R \right\}, \quad i = 0, 1, 2. \tag{18}$$
Figure 6 depicts the relation between $J$ and $J_R$. Let $\alpha$ be the negative SNR exponent of $|H|^2$ and $\hat{\alpha}$ be the negative SNR exponent of $|\hat{H}|^2$. Let
$$J = \begin{cases} \arg\min_{i \in \mathcal{I}} i, & \mathcal{I} = \{k : \log(1 + |H|^2 P_k) \ge R,\ k \in \{0, \dots, 3\}\}, \\ 0, & \text{if the set } \mathcal{I} \text{ is empty.} \end{cases}$$
We assume that the third phase is perfect for this example, and hence $\tilde{H} = 0$. We will prove later that the interference error in (17) due to $\tilde{H}$ does not make a difference asymptotically. Consider the event $(J_R < 2, J = 2)$, which results in outage. The powers and the probabilities can be seen in Table IV. The outage probability for $r \to 0$ is then
$$\begin{aligned}
\Pi(\mathcal{O}) &\ge \Pi(J_R < 2, J = 2) \\
&= \Pi\left(\log\det(I + \hat{H}\hat{H}^\dagger P_1) \ge R,\ \log(1 + |H|^2 P_1) < R,\ \log(1 + |H|^2 P_2) \ge R\right) \\
&\ge \Pi(2 - \hat{\alpha} > 0,\ 2 - \alpha < 0,\ 3 - \alpha > 0) \\
&\doteq \mathrm{SNR}^{\max_{2 < \alpha < 3,\ 1 < \hat{\alpha} < 2} (1 - \alpha - \hat{\alpha})} \doteq \mathrm{SNR}^{1 - 2 - 1} \doteq \mathrm{SNR}^{-2}.
\end{aligned} \tag{19}$$
Thus, the maximum diversity order is 2 with any number of feedback levels. Note that this is more than constant-power training, which was limited to 1. We will show later in this section that this can be achieved with a single feedback bit. $\square$

The above example can be generalized to MIMO systems as follows.

Theorem 6.
For $K > 1$ and $r < \min(m, n)$, the diversity-multiplexing tradeoff of $d_{\widehat{R}T_q}(r, K) = G(r, 1 + G(r, 1))$ can be achieved with power-controlled training. Further, the above is optimal for zero multiplexing.

Proof: The proof of this Theorem is provided in Appendix C. We will first show that the diversity cannot be greater than $mn(mn + 1)$ and later prove that the diversity multiplexing tradeoff of $G(r, 1 + G(r, 1))$ can be achieved for $K = 2$.

Note that $d_{\widehat{R}T_q}(r, K) = d_{RT_q}(r, 2)$, which means that the diversity with imperfect receiver information and any number of feedback levels $K \ge 2$ is the same as the diversity with perfect receiver information and 1 bit of feedback.

VI. CSI$\widehat{R}\widehat{T}_q$: ESTIMATED CSIR WITH NOISY QUANTIZED FEEDBACK

We observed in Section V that with CSIR obtained by MMSE training and perfect feedback, the diversity-multiplexing tradeoff of $d_{\widehat{R}T_q}(r, K) = G(r, 1 + G(r, 1))$ can be achieved with 1 bit of noiseless feedback. We also observed that the diversity-multiplexing tradeoff of $d_{R\widehat{T}_q}(r, K) = G(r, 1 + G(r, 1))$ is achievable with 1 bit of noisy feedback with perfect CSIR. In this section, we will show that the diversity-multiplexing tradeoff $d_{\widehat{R}T_q}(r, 2)$ can be achieved when both imperfections are present simultaneously: 1 bit of imperfect feedback based on noisy training-based receiver information. Thus, our main result is

Theorem 7. For $K = 2$ and $r < \min(m, n)$, an achievable diversity-multiplexing tradeoff is given by $d_{\widehat{R}\widehat{T}_q}(r, 2) = G(r, 1 + G(r, 1))$.

A. Protocol

The complete protocol with both power-controlled training and power-controlled feedback consists of three phases as described below.

Phase 1: The training is done using power SNR to get the channel estimate $\widehat{H}$. On the basis of this training, the receiver decides a feedback level $J_R$ in the following way.
Since the feedback is assumed to be only one bit, there are only two power levels at the transmitter. Denote the two power levels as $P_i \doteq \mathrm{SNR}^{1+p_i}$, $i \in \{0, 1\}$, with $p_0 = 0$ and $p_1 = G(r, 1)$. (We will give the exact constants in front while proving that the average power constraint will be satisfied.) The feedback index is
\[
J_R = \begin{cases} 1 & \text{if } \log\det\big(I + \widehat{H}\widehat{H}^{\dagger} P_0\big) < R + \epsilon \log(\mathrm{SNR}) \\ 0 & \text{otherwise,} \end{cases}
\]
where $\epsilon > 0$ is an arbitrarily small constant chosen as before. The above index assignment simply chooses the higher power level if the lower power level is estimated to be too low.

Phase 2: A feedback level $J_R \in \{0, 1\}$ is transmitted from the receiver, which is received at the transmitter as $J_T \in \{0, 1\}$. The receiver employs the power-controlled encoding scheme described in Section IV to send the feedback index. The power levels used for sending the feedback information are denoted as $Q_0 = 0$ and $Q_1 \doteq \mathrm{SNR}^{1+G(r,1)}$.

Phase 3: The transmitter gets a feedback index $J_T$, which is used to train the receiver at power level $P_{J_T}$ and then send data at power level $P_{J_T}$. The trained channel estimate is denoted $\widehat{H}_2$, and let $\widetilde{H} = H - \widehat{H}_2$.

Fig. 6. Example 6: Obtaining channel information at the receiver: (a) if the receiver has perfect information, then it knows the correct feedback index, else (b) the receiver index is not known perfectly and can be viewed as an output of a noisy channel.

The outage probability of the above three-phase protocol is upper bounded by
\[
\Pi(O) = \Pi\left(\log\det\left(I + \frac{\frac{P_{J_T}}{m}\,\widehat{H}_2 Q \widehat{H}_2^{\dagger}}{1 + \frac{P_{J_T}}{mn}\,\mathrm{trace}(\widetilde{H}\widetilde{H}^{\dagger})}\right) < R\right). \tag{20}
\]
The above expression for the effective outage probability considers the channel estimation error as noise [10] and hence is only an upper bound for the optimal scheme.
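The one-bit feedback rule of Phase 1 can be sketched numerically as follows. This is only an illustrative sketch: the function name `feedback_index`, its argument names, and the use of natural logarithms for the rate are assumptions made for this example and are not part of the protocol description.

```python
import numpy as np

def feedback_index(H_hat, P0, R, eps, snr):
    """Illustrative one-bit feedback rule (hypothetical helper).

    Returns 1 when the estimated channel H_hat cannot support rate R
    (plus the margin eps*log(snr)) at the lower power level P0,
    and 0 otherwise. Rates are measured in nats here (an assumption).
    """
    n = H_hat.shape[0]
    gram = H_hat @ H_hat.conj().T
    # I + P0 * H_hat H_hat^dagger is Hermitian positive definite,
    # so the log-determinant is real; slogdet avoids overflow.
    _, logdet = np.linalg.slogdet(np.eye(n) + P0 * gram)
    return 1 if logdet < R + eps * np.log(snr) else 0

# Example usage with a scalar channel (m = n = 1)
snr = 100.0
R = 0.5 * np.log(snr)          # multiplexing gain r = 0.5
strong = np.array([[1.0 + 0.0j]])
weak = np.array([[0.01 + 0.0j]])
j_strong = feedback_index(strong, P0=snr, R=R, eps=0.01, snr=snr)
j_weak = feedback_index(weak, P0=snr, R=R, eps=0.01, snr=snr)
```

In this sketch a strong estimated channel keeps the lower power level, while a weak one requests the higher level.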
Following identical steps as in Section V, we find that the interference error due to $\widetilde{H}$ does not impact the analysis asymptotically and can thus be ignored. Hence,
\[
\Pi(O) \doteq \Pi\big(\log\det\big(I + P(J_T)\,\widehat{H}_2\widehat{H}_2^{\dagger}\big) < R\big). \tag{21}
\]
Now define $J$ as
\[
J = \begin{cases} 1 & \text{if } \log\det\big(I + \widehat{H}_2\widehat{H}_2^{\dagger} P_0\big) < R \text{ and } \log\det\big(I + \widehat{H}_2\widehat{H}_2^{\dagger} P_1\big) \ge R \\ 0 & \text{otherwise.} \end{cases}
\]
Using the analysis as in Sections III and IV, we observe that $\Pi(J_T = 0 \mid J_R = 1) \doteq \mathrm{SNR}^{-mn(1+G(r,1))}$, $\Pi(J_T = 1 \mid J_R = 0) \doteq 0$, and $\Pi(J_R = 0, J = 1) \doteq 0$. We now split $\Pi(O)$ in (21) into 8 terms depending on the values of $J$, $J_R$ and $J_T$ as follows,
\[
\begin{aligned}
\Pi(O) &\doteq \Pi\big(\log\det(I + P(J_T)\,\widehat{H}_2\widehat{H}_2^{\dagger}) < R\big) \\
&\doteq \sum_{i=0}^{1} \sum_{j=0}^{1} \sum_{k=0}^{1} \Pi\big(\log\det(I + P(J_T)\,\widehat{H}_2\widehat{H}_2^{\dagger}) < R,\ J = i,\ J_R = j,\ J_T = k\big).
\end{aligned}
\tag{22}
\]
Note that the terms with $J_R \ne J_T$ can be asymptotically upper bounded by $\mathrm{SNR}^{-G(r, 1+G(r,1))}$. Also, the term corresponding to $i = j = k = 0$ can be upper bounded by $\mathrm{SNR}^{-G(r, 1+G(r,1))}$, since when $J = 0$, the event $\log\det(I + P(0)\,\widehat{H}_2\widehat{H}_2^{\dagger}) < R$ implies $\log\det(I + P_1\,\widehat{H}_2\widehat{H}_2^{\dagger}) < R$. Further, $J_R = 0$ and $J = 1$ happens with probability $\doteq 0$. Thus, the only remaining case is when $J_T = J_R = 1$. Hence,
\[
\begin{aligned}
\Pi(O) &\doteq \sum_{i=0}^{1} \sum_{j=0}^{1} \sum_{k=0}^{1} \Pi\big(\log\det(I + P(J_T)\,\widehat{H}_2\widehat{H}_2^{\dagger}) < R,\ J = i,\ J_R = j,\ J_T = k\big) \\
&\dot{\le}\ \mathrm{SNR}^{-G(r, 1+G(r,1))} + \Pi\big(\log\det(I + P(J_T)\,\widehat{H}_2\widehat{H}_2^{\dagger}) < R,\ J_T = 1,\ J_R = 1\big) \\
&\dot{\le}\ \mathrm{SNR}^{-G(r, 1+G(r,1))} + \Pi\big(\log\det(I + P_1\,\widehat{H}_2\widehat{H}_2^{\dagger}) < R\big) \\
&\doteq \mathrm{SNR}^{-G(r, 1+G(r,1))}.
\end{aligned}
\tag{23}
\]
Hence, the diversity order of $G(r, 1 + G(r, 1))$ for multiplexing gain $r$ can be achieved with only one bit of feedback. We now show that the power constraint is also satisfied, which completes the proof. Recall that the power levels at the transmitter are denoted by $P_0$ and $P_1$, while those at the receiver are denoted by $Q_0$ and $Q_1$. Let $P_0 = \frac{\mathrm{SNR}}{2}$ and $P_1 = \frac{\mathrm{SNR}}{4\,\Pi(J_R = 1)}$.
The average power used at the transmitter is $P_0\,\Pi(J_T = 0) + P_1\,\Pi(J_T = 1)$. Since $P_0\,\Pi(J_T = 0) \le \mathrm{SNR}/2$, to show that this average power is $\le \mathrm{SNR}$ it is enough to prove that
\[
\Pi(J_T = 1) \le 2\,\Pi(J_R = 1), \tag{24}
\]
because then $P_1\,\Pi(J_T = 1) \le \mathrm{SNR}/2$ as well. The left hand side satisfies
\[
\Pi(J_T = 1) = \Pi(J_T = 1, J_R = 0) + \Pi(J_T = 1, J_R = 1) \le \Pi(J_T = 1 \mid J_R = 0) + \Pi(J_R = 1). \tag{25}
\]
Note that the first term decays faster with SNR than $\Pi(J_R = 1)$; thus the above is $\le 2\,\Pi(J_R = 1)$. Further, let $Q_0 = 0$ and $Q_1 = \frac{\mathrm{SNR}}{2\,\Pi(J_R = 1)}$. The average power used at the receiver is $Q_0\,\Pi(J_R = 0) + Q_1\,\Pi(J_R = 1) \le \mathrm{SNR}$. Thus, the diversity order of $G(r, 1 + G(r, 1))$ can be achieved with imperfect feedback and imperfect CSIR. Hence, one bit of imperfect feedback with imperfect CSIR is equivalent to one bit of perfect feedback with perfect CSIR, except for the time losses in the trainings and the feedback.

VII. NUMERICAL RESULTS

First consider the case of $m = 1$ and $n = 2$. The diversity multiplexing tradeoff in the various cases for 1 or 2 bits of feedback can be seen in Figure 7. We will now go through all the different tradeoff curves in the order of the legend from top to bottom. The first line, $G(r, 1)$, represents the diversity obtained with no feedback (CSIR and CSI$\widehat{R}$ have identical performance since the time lost in training is not accounted for in our expressions) and also the diversity obtained in the case when the feedback is perfect and the receiver is trained through a constant power symbol (CSI$\widehat{R}T_q$ with constant power training). The second, piecewise linear curve represents the diversity obtained when the receiver knows perfect channel state information but one bit of constant power feedback is sent over a noisy feedback channel (CSIR$\widehat{T}_q$ with constant power feedback). The third line, $G(r, 1 + G(r, 1))$, represents the diversity obtained with 1 bit of perfect feedback (CSIRT$_q$).
This also represents the diversity obtained when the receiver knows the channel perfectly while 1 bit of power-controlled feedback is provided on a feedback channel (CSIR$\widehat{T}_q$ with power-controlled feedback). This diversity is also obtained when the feedback link is perfect while the receiver is trained using power-controlled training symbols (CSI$\widehat{R}T_q$ with power-controlled training). Further, the diversity obtained when the receiver does not know the channel state information and the feedback link is noisy is also $G(r, 1 + G(r, 1))$ (CSI$\widehat{R}\widehat{T}_q$ with power-controlled training and power-controlled feedback). Note that the second curve starts from a diversity of 4 at zero multiplexing, but then follows the third line after $r \approx 0.5$, since it cannot perform better than the perfect feedback case.

Next, we consider three levels of feedback. More bits of feedback do not increase the diversity in any case except when the receiver knows the channel perfectly. In this case, the next two curves show the effect of constant power feedback and power-controlled feedback on the diversity. The last line shows the performance with 3 levels of perfect feedback. The diversity multiplexing tradeoff curve achieved with constant power starts from 4 and hits the line of 3 levels of perfect feedback.

We will now see the effect of increasing SNR on the outage probabilities at a constant multiplexing gain. We will focus on 1 bit of feedback. In Figure 8, $m = n = 1$ and $r = 0.2$. Thus, the theoretical diversity for the case of no feedback is 0.8, while for all other cases considered in the Figure it is 1.6, and we find that we obtain close to the expected diversity order at an SNR of about 20 dB. Note that the differences between the higher diversity order curves are small, and they appear on top of each other in the plot. In Figure 9, $m = 1$, $n = 2$ and $r = 0.5$.
In this case, the theoretical diversity for the case of no feedback is 1, while with feedback it is 3, which can be noted from the slope.

VIII. EXTENSION TO MULTIUSER MIMO

In this section, we extend our main result for point-to-point systems, i.e., Theorem 7, to the case of the multiple access channel under the model of common feedback to all transmitters.

Fig. 7. Diversity multiplexing tradeoff in various scenarios. (Axes: Multiplexing versus Diversity. Curves: G(r,1); 1 bit of constant power feedback with perfect CSIR; G(r,1+G(r,1)); 3 levels of constant power feedback with perfect CSIR; 3 levels of power-controlled feedback with perfect CSIR; 3 level perfect feedback with perfect CSIR.)

Fig. 8. Diversity multiplexing tradeoff in various scenarios. The various parameters chosen are: m=1, n=1, r=0.2. The channel estimates are obtained using a training time of 10 channel uses. (Axes: SNR in dB versus Outage Probability. Curves: Perfect CSIR, No feedback; Imperfect CSIR, Perfect Feedback; Perfect CSIR, Perfect Feedback; Imperfect CSIR, Imperfect Feedback; Perfect CSIR, Imperfect Feedback.)

A. Multiuser Channel Model

Consider a multiple access channel with $L$ transmitters $T_i$, where each transmitter has $m$ transmit antennas and the receiver $R$ has $n$ receive antennas. The channel is constant during a fading block of $T_{coh}$ channel uses, but changes independently from one block to the next. The received signal can be written in matrix form as
\[
\{T_i\} \to R : \quad Y = \sum_{1 \le i \le L} H_i X_i + W. \tag{26}
\]
Here, $W$, of size $n \times T_{coh}$, represents additive white Gaussian noise at the receiver with all entries i.i.d. $\mathcal{CN}(0, 1)$. We consider a Rayleigh fading environment, i.e., elements of $H_i$ are assumed to be i.i.d. $\mathcal{CN}(0, 1)$.
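As a rough illustration of the multiple access model in (26), one fading block can be simulated as below. This is a sketch under stated assumptions: the helper names `crandn` and `mac_received_signal` are hypothetical, and the inputs are drawn as i.i.d. Gaussian symbols with a per-user power of `snr` spread evenly over the $m$ antennas, which is one choice among many.

```python
import numpy as np

def crandn(rng, *shape):
    """i.i.d. circularly symmetric complex Gaussian CN(0, 1) entries."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def mac_received_signal(rng, L, m, n, T_coh, snr):
    """One fading block of Y = sum_i H_i X_i + W (hypothetical helper).

    H_i is n x m (Rayleigh fading), X_i is m x T_coh, W is n x T_coh.
    """
    H = [crandn(rng, n, m) for _ in range(L)]
    # Assumed input scaling: each entry has variance snr / m, so that
    # trace(E[X_i X_i^dagger]) / T_coh equals snr for every user.
    X = [np.sqrt(snr / m) * crandn(rng, m, T_coh) for _ in range(L)]
    W = crandn(rng, n, T_coh)
    Y = sum(H_i @ X_i for H_i, X_i in zip(H, X)) + W
    return Y, H, X

rng = np.random.default_rng(0)
Y, H, X = mac_received_signal(rng, L=3, m=2, n=4, T_coh=10, snr=100.0)
```

The channel matrices are held fixed for the whole block of `T_coh` channel uses, matching the block-fading assumption; a new call draws an independent block.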
The transmitters are subject to an average power constraint such that the long-term power is upper bounded, i.e., $\frac{1}{T_{coh}}\,\mathrm{trace}(E[X_i X_i^{\dagger}]) \le \mathrm{SNR}$ for $1 \le i \le L$.

Fig. 9. Diversity multiplexing tradeoff in various scenarios. The various parameters chosen are: m=1, n=2, r=0.5. The channel estimates are obtained using a training time of 10 channel uses. (Axes: SNR in dB versus Outage Probability. Curves: Perfect CSIR, No feedback; Imperfect CSIR, Perfect Feedback; Perfect CSIR, Perfect Feedback; Imperfect CSIR, Imperfect Feedback; Perfect CSIR, Imperfect Feedback.)

The feedback path to the transmitter in an orthogonal frequency band is given by
\[
R \to T_i : \quad Y_{f,i} = H_{f,i} X_{f,i} + W_f, \tag{27}
\]
where $H_{f,i}$ is the MIMO fading channel for the feedback link to the $i$th user, normalized much like the forward link. The feedback transmissions are also assumed to be power-limited, that is, the reverse link has a power budget of $\frac{1}{T_{coh}}\,\mathrm{trace}(E[X_{f,i} X_{f,i}^{\dagger}]) \le \mathrm{SNR}_f$. Without loss of generality, we will assume a symmetry in resources, such that $\mathrm{SNR} = \mathrm{SNR}_f$. Finally, we assume that the receiver computes the common feedback index $J_R \in \{0, \cdots, K-1\}$, which is broadcast over the downlink and received at transmitter $i$ as $J_{T_i} \in \{0, \cdots, K-1\}$.

B. Diversity-Multiplexing Tradeoff

The diversity multiplexing tradeoff for single user MIMO channels was described in Section II.C. Here, we extend that discussion to MIMO MAC channels. As before, we concentrate on single rate transmission. The dependence of the rates on the SNRs is explicitly given by $R_s = r_s \log \mathrm{SNR}_s$. We refer to $r \triangleq (r_s)_{1 \le s \le L}$ as the multiplexing gains. Let $H = \{H_1, \cdots, H_L\}$. Further, let the channel estimates at the receiver be $\widehat{H} = \{\widehat{H}_1, \cdots, \widehat{H}_L\}$.
In a multiple access channel, the outage event is defined as the union, over all subsets of the users, of the events that the channel cannot support the target data rate for that subset [28]. Hence, for a multiple access channel with $L$ users, each equipped with $m$ transmit antennas, and a receiver with $n$ receive antennas, the outage event is $O \triangleq \bigcup_S O_S$. The union is taken over all subsets $S \subseteq \{1, 2, \cdots, L\}$, and $O_S$ is the set of all the channels where the sum transmitted rate by these $|S|$ users is less than the maximum supportable rate by the MIMO link from these $|S|$ users to the destination. The system is said to have a diversity order of $d$ if $\Pi(O) \doteq \mathrm{SNR}^{-d}$. The diversity multiplexing tradeoff for the multiple access channel can be described as follows: given the multiplexing gains $r$ for all the users, the diversity order that can be achieved describes the diversity-multiplexing tradeoff region.

The probability of outage with rate $R = (R_1, R_2, \cdots, R_L)$, transmit power $P(J_{T_i}) = P_i\ \forall J_{T_i}$ and perfect channel state information $H$ at the receiver is denoted by $S(R, P) \triangleq \Pi\big(\bigcup_S O_S(R, P)\big)$. If the receiver knows the channel $H$ perfectly, we denote the event $\bigcup_S O_S(R, P)$ by $U_H(R, P)$, where the subscript $H$ is the channel knowledge at the receiver. Let $D(r, p)$ be defined by $S(R, P) \doteq \mathrm{SNR}^{-D(r, p)}$, where $r = (r_1, r_2, \cdots, r_L)$ and $p = (p_1, p_2, \cdots, p_L)$. We further denote the function $G(r, p)$ in Section II by $G_{m,n}(r, p)$ to explicitly depict that this is for $m$ transmit and $n$ receive antennas.

Lemma 5. [16] Let $p_s = p$ for all $1 \le s \le L$. Also, let $\sum_{i \in S} r_i \le \min(|S| m, n)$ for all non-empty subsets $S$ of $\{1, 2, \cdots, L\}$. Then,
\[
D(r, p) = \min_{S} G_{|S| m, n}\Big(\sum_{i \in S} r_i,\ p\Big). \tag{28}
\]

C. CSI$\widehat{R}\widehat{T}_q$: Estimated CSIR with Noisy Quantized Feedback

All the results related to quantized feedback in this paper can be extended to the multiple access channel, where a feedback level is sent from the receiver and all the transmitters receive this signal and adjust their power accordingly. To demonstrate the extension, we consider a symmetric system where all transmitters have a statistically identical channel to the receiver with identical average SNR and employ the same power control thresholds. Furthermore, we will only consider single-bit feedback, which implies that all transmitters will simultaneously be instructed to use either the low power level or the high power level. However, since we assume independent errors in the feedback links, each transmitter may or may not transmit at the right power level. Under the above conditions, an achievable diversity-multiplexing tradeoff is given by

Theorem 8. For $K = 2$ and $r = (r_1, \cdots, r_L)$ with $\sum_{i \in S} r_i \le \min(|S| m, n)$ for all non-empty subsets $S$ of $\{1, 2, \cdots, L\}$, an achievable diversity-multiplexing tradeoff for a multiple-access channel is given by $d_{\widehat{R}\widehat{T}_q}(r, 2) = D(r, \mathbf{1}(1 + D(r, \mathbf{1})))$, where $\mathbf{1}x$ denotes $(x, x, \cdots, x)$.

Proof: We will provide the main steps to prove the above result based on the following three-phase protocol. First define $p_0 = 0$ and $p_j = D(r, \mathbf{1}(1 + p_{j-1}))\ \forall j \ge 1$.

Phase 1: Each transmitter trains the receiver using power SNR to get the channel estimate $\widehat{H}_i$ at the receiver. On the basis of this training, the receiver decides a feedback level $J_R$ in the following way. We consider two power levels $P_i \doteq \mathrm{SNR}^{1+p_i}$, $i \in \{0, 1\}$. We will state the exact constants for power control while proving that the average power constraint will be satisfied. The feedback index is
\[
J_R = \begin{cases} 1 & \text{if } U_{\widehat{H}}\big(R + \mathbf{1}\epsilon \log(\mathrm{SNR}),\ \mathbf{1}P_0\big) = 1 \\ 0 & \text{otherwise.} \end{cases}
\]
Intuitively, we choose the higher of the two power levels if the lower power level is not sufficient to avoid outage (even for one of the users) based on the estimated channel.

Phase 2: A feedback level $J_R \in \{0, 1\}$ is sent from the receiver, but each transmitter receives $J_{T_i} \in \{0, 1\}$ according to the power-controlled feedback scheme in Section IV-B. Since the feedback links have i.i.d. errors, different transmitters may receive different feedback indices.

Phase 3: Transmitter $k$ gets a feedback power level $J_{T_k}$ and sends a training signal to the receiver at power level $P_{J_{T_k}}$, followed by data at power level $P_{J_{T_k}}$. The channel estimate based on this power-controlled training is denoted $\widehat{H}_{2,k}$. Let $\widetilde{H}_k = H_k - \widehat{H}_{2,k}$. Further, $\widehat{H}_2 = \{\widehat{H}_{2,1}, \cdots, \widehat{H}_{2,L}\}$. Also, denote by $H_S$ the $|S| m \times n$ matrix formed by concatenation of $H_i$, $i \in S$. Similarly, define $\widehat{H}_{2S}$ and $\widehat{H}_S$.

The outage probability is bounded from above by the sum of outage probabilities for each transmitter. The analysis of Phase 3 is similar to Appendix C-B1, since the estimation error in the third phase can be neglected for diversity multiplexing tradeoff purposes. Now define $J$ as
\[
J = \begin{cases} 1 & \text{if } U_{\widehat{H}_2}(R, \mathbf{1}P_0) = 1 \text{ and } U_{\widehat{H}_2}(R, \mathbf{1}P_1) = 0 \\ 0 & \text{otherwise.} \end{cases}
\]
Since the third phase estimation error can be neglected, the outage probability is $\Pi(O) \doteq \Pi(U_{\widehat{H}_2}(R, P(J_T)) = 1)$. Hence, repeating the analysis of single user systems in Section VI and using union bounds, we get the same results as in single user systems, but with $G$ replaced by $D$ and the single multiplexing gain replaced by the multiplexing gain vector. We show by example how to extend all the steps. $\Pi(J_R = 0, J = 1)$ can be written as
\[
\begin{aligned}
\Pi(J_R = 0, J = 1) = \Pi\Big(&\log\det(I + \mathrm{SNR}\,\widehat{H}_S\widehat{H}_S^{\dagger}) \ge \sum_{i \in S} R_i\ \ \forall S \text{ and } \\
&\log\det(I + \mathrm{SNR}\,\widehat{H}_{2S}\widehat{H}_{2S}^{\dagger}) < \sum_{i \in S} R_i \text{ for some } S \text{ and } \\
&\log\det(I + P_1\,\widehat{H}_{2S}\widehat{H}_{2S}^{\dagger}) \ge \sum_{i \in S} R_i\ \ \forall S\Big).
\end{aligned}
\tag{29}
\]
We will now bound the above probability as follows. First define the following events:
\[
A_S = \Big\{\log\det(I + \mathrm{SNR}\,\widehat{H}_S\widehat{H}_S^{\dagger}) \ge \sum_{i \in S} R_i\Big\}, \tag{30}
\]
\[
C_S = \Big\{\log\det(I + \mathrm{SNR}\,\widehat{H}_{2S}\widehat{H}_{2S}^{\dagger}) < \sum_{i \in S} R_i\Big\}, \tag{31}
\]
\[
D_S = \Big\{\log\det(I + \mathrm{SNR}^{1+p_1}\,\widehat{H}_{2S}\widehat{H}_{2S}^{\dagger}) \ge \sum_{i \in S} R_i\Big\}. \tag{32}
\]
We note that (29) is the probability of $\big(\bigcap_S A_S\big) \cap \big(\bigcup_S C_S\big) \cap \big(\bigcap_S D_S\big) \subseteq \bigcup_S (A_S \cap C_S)$. Hence,
\[
\Pi(J_R = 0, J = 1)\ \dot{\le}\ \sum_S \Pi\Big(\log\det(I + \mathrm{SNR}\,\widehat{H}_S\widehat{H}_S^{\dagger}) \ge \sum_{i \in S} R_i \text{ and } \log\det(I + \mathrm{SNR}\,\widehat{H}_{2S}\widehat{H}_{2S}^{\dagger}) < \sum_{i \in S} R_i\Big). \tag{33}
\]
Note that each term is similar to that in the single user case, and for each $S$ it is $\doteq 0$. Thus, the above probability is $\doteq 0$. Similarly, all other steps for single user MIMO channels can be extended to the MIMO MAC system. Hence, the diversity order of $D(r, \mathbf{1}(1 + D(r, \mathbf{1})))$ can be achieved with imperfect feedback and imperfect CSIR for $L$ transmitters.

In [28], the diversity-multiplexing tradeoff for the multiple-access case was considered without feedback. The achievable diversity multiplexing tradeoff without feedback is $D(r, \mathbf{1})$. In [14], it was shown that with 1 bit of perfect feedback, the diversity multiplexing tradeoff of $D(r, \mathbf{1}(1 + D(r, \mathbf{1})))$ can be achieved when the receiver knows perfect channel state information. In this paper, we show that the diversity multiplexing tradeoff of $D(r, \mathbf{1}(1 + D(r, \mathbf{1})))$ can be achieved, using power-controlled training and feedback, even when the receiver is trained on a noisy channel and the feedback index is also sent on an orthogonal noisy channel.

IX. CONCLUSIONS

In this paper, we find the diversity tradeoff for a non-symmetric FDD system in which the errors in the MMSE channel estimate and the quantized feedback channel are accounted for, for a single user and a multiple access channel.
W e find that div ersity multiplexing tradeof f of a system with 1 bit of feedback over a noisy channel and MMSE channel estimate at the receiv er is the same as that of a system with 1 bit of perfect feedback and perfect channel estimate at the recei ver . More importantly , we show that additional bits of feedback do not increase the diversity order of the system at constant rates. The approach in this paper has also been used to improv e the performance of a TDD system as is summarized in [24]. The two models, FDD and TDD, consider the two extreme cases of the correlations between the forward and the backward channel. As a next step, one can consider what happens if the forward and the feedback channel are correlated, but not exactly the same. This paper suggests that one round of training provides a certain resolution to the channel gain which limits the div ersity multiplexing tradeoff performance. The strate gies can be extended to a multi-round communication between the sender and the receiv er that allo ws better channel resolution at the nodes. This multi-round extension for both FDD and TDD models can be seen in [32]. Also, this paper assumes a Rayleigh fading channel model. The authors of [29] consider a general model for fading which includes Rayleigh, Rician, Nakagami and W eibull distributions to find the diversity multiplexing tradeoff for a system with no feedback and perfect channel estimate at the receiv er . The 18 extension of the feedback cases to general fading models is still open. Finally , the two way channel model can be extended to consider delays in the feedback channel. If there is a delay in the feedback process, the transmitter can decide to send some data as if there is no feedback till it receives feedback and then try to use the feedback to improve the diversity by sending po wer controlled data (possibly correlated with the data transmitted before feedback is recei ved) in the remaining time. X . 
A C K N OW L E D G E M E N T S The authors wish to thank Gajanana Krishna and Srikrishna Bhashyam for useful discussions related to this paper . W e would also like to thank the anonymous revie wers for many suggestions that improv ed this paper . A P P E N D I X A P RO O F O F T H E O R E M 4 W e first note some properties of I ν ( x ) , modified Bessel function of first kind, that will be used in the proof. The series expansion of I ν ( x ) is giv en as [31, Equation 9.6.10][30], I ν ( x ) = ∞ X i =0 1 i !( i + ν )!  x 2  2 i + ν . (34) When | x | is large and | arg( x ) | < π 2 , asymptotic expansion of I ν ( x ) is giv en by [31, Equation 9.7.1] I ν ( x ) . = e x √ 2 π x  1 − µ − 1 x + ( µ − 1)( µ − 9) 2!(8 x ) 2 − ( µ − 1)( µ − 9)( µ − 25) 3!(8 x ) 3 + · · ·  , (35) where µ = 4 ν 2 . Now , for the proof of Theorem 4, we will use Lemma 4. W e will further suppose that m N = m without loss of generality . Let A = { i : α i + b α i ≤ 2 } . W e will e valuate p ( α , b α ) in the following fi ve disjoint cases which comprise the whole space of possibilities. 1) min( α m , b α m ) ≥ 1 ( or ( α , b α ) ∈ E m ). 2) min( α m , b α m ) < 1 , α m + b α m ≥ 2 . 3) min( α m , b α m ) < 1 , α m + b α m < 2 , α i 6 = b α i for some i ∈ A . 4) min( α m , b α m ) < 1 , α m + b α m < 2 , α i = b α i for all i ∈ A , ( α , b α ) / ∈ S m − 1 k =0 ( E k ) . 5) ( α , b α ) ∈ S m − 1 k =0 ( E k ) . Now , we consider all of the cases one by one as follows. 1) min( α m , b α m ) ≥ 1 : Using Equation (9), p ( λ, b λ ) = exp  − P m k =1 λ k + b λ k 1 −| ρ | 2  4 ( λ ) 4 ( b λ ) m ! m !Π m − 1 j =0 j !( j + ν )! | ρ | mn − m (1 − | ρ | 2 ) m Π m k =1 ( q λ k b λ k ) ν det       I ν   2 | ρ | q λ k b λ l 1 − | ρ | 2         Since ρ = 1 √ 1+ m SNR , we substitute ρ . = 1 and 1 − ρ 2 . = 1 / SNR to obtain p ( α, b α ) . = exp − P m i =1 SNR − α i + SNR − b α i 1 / SNR ! 
4 ( SNR − α ) 4 ( SNR − b α ) SNR − P m i =1 ( α i + b α i ) ( p SNR − P m i =1 α i SNR − P m i =1 b α i ) ( n − m ) det     I n − m  2 √ SNR − α k SNR − c α l 1 / SNR      m ! m !Π m − 1 j =0 j !( j + n − m )! 1 SNR m . (36) As 4 ( SNR − α ) . = SNR − P m i =1 ( i − 1) α i , we get p ( α, b α ) . = SNR m SNR − P m i =1 ( i + n − m 2 )( α i + b α i ) det    I n − m  SNR 1 − α k + b α l 2     . (37) W e will now find det    I ν  SNR 1 − α k + b α l 2     . Using (34), we get I ν  SNR 1 − α k + b α l 2  = ∞ X i =0 1 i !( i + ν )! SNR 1 − α k + b α l 2 2 ! 2 i + ν = SNR 1 − α k + b α l 2 2 ! ν ∞ X i =0 1 i !( i + ν )! SNR 1 − α k + b α l 2 2 ! 2 i (38) Let p er ( k 1 , k 2 , · · · , k m ) for ( k 1 , k 2 , · · · , k m ) a permu- tation of (1 , · · · , m ) be defined as follo ws p er ( k 1 , k 2 , · · · , k m ) ,          0 if ( k 1 , k 2 , · · · , k m ) is an e ven permutation of (1 , · · · , m ) 1 if ( k 1 , k 2 , · · · , k m ) is an odd permutation of (1 , · · · , m ) . (39) Thus, det    I ν  SNR 1 − α k + b α l 2     = X k ( − 1) per ( k 1 ,k 2 , ··· ,k m ) Π m l =1   SNR 1 − α k l + b α l 2 2   ν ∞ X i =0 1 i !( i + ν )!   SNR 1 − α k l + b α l 2 2   2 i . = SNR ν P m i =1 (1 − α i + b α i 2 ) X k ( − 1) per ( k 1 ,k 2 , ··· ,k m ) Π m l =1 ∞ X i =0 1 i !( i + ν )! SNR 2 − α k l + b α l 4 ! i (40) The abov e equation is same as Equation (56) in [33] 19 with K = √ 2 − 1 2 , φ i = SNR 1 − α i and λ i = SNR 1 − b α i , and hence det    I ν  SNR 1 − α k + b α l 2     . = SNR ν P m i =1 (1 − α i + b α i 2 ) 4 ( SNR 1 − α i ) 4 ( SNR 1 − b α i ) . = SNR ν P m i =1 (1 − α i + b α i 2 ) SNR 2 P m i =1 ( i − 1)(1 − α i + b α i 2 ) . = SNR P m i =1 (2 i − 2+ ν )  1 − α i + b α i 2  . (41) Substituting in Equation (37), we get p ( α, b α ) . = SNR m SNR − P m i =1 ( i + n − m 2 )( α i + b α i ) SNR P m i =1 (2 i − 2+ ν )  1 − α i + b α i 2  . 
$= \mathrm{SNR}^{mn}\, \mathrm{SNR}^{-\sum_{i=1}^{m} \left(i + \frac{n-m}{2}\right)(\alpha_i + \hat{\alpha}_i)}\, \mathrm{SNR}^{-\sum_{i=1}^{m} \left(i - 1 + \frac{n-m}{2}\right)(\alpha_i + \hat{\alpha}_i)} \doteq \mathrm{SNR}^{mn}\, \mathrm{SNR}^{-\sum_{i=1}^{m} (2i + n - m - 1)(\alpha_i + \hat{\alpha}_i)} = e_m. \quad (42)$

2) $\min(\alpha_m, \hat{\alpha}_m) < 1$, $\alpha_m + \hat{\alpha}_m \ge 2$: Using Equation (36), we see that all terms except $\exp\left(-\sum_{i=1}^{m} \frac{\mathrm{SNR}^{-\alpha_i} + \mathrm{SNR}^{-\hat{\alpha}_i}}{1/\mathrm{SNR}}\right)$ remain the same and hence are polynomial in SNR, while this exponential term decreases exponentially in SNR. Hence $p(\alpha, \hat{\alpha}) \doteq 0$.

3) $\min(\alpha_m, \hat{\alpha}_m) < 1$, $\alpha_m + \hat{\alpha}_m < 2$, $\alpha_i \ne \hat{\alpha}_i$ for some $i \in A$: We will prove that $p(\alpha, \hat{\alpha}) \doteq 0$ in this case. For this, we consider
\[
\exp\left(-\sum_{k=1}^{m} \frac{\lambda_k + \hat{\lambda}_k}{1 - |\rho|^2}\right) \det\left( I_\nu\left( \frac{2|\rho|\sqrt{\lambda_k \hat{\lambda}_l}}{1 - |\rho|^2} \right) \right)
\]
and prove that this part decreases exponentially; this suffices since the rest of the terms are polynomial in SNR. Using (35),
\[
\det\left( I_\nu\left( \frac{2|\rho|\sqrt{\lambda_k \hat{\lambda}_l}}{1 - |\rho|^2} \right) \right) \doteq \sum_{k} (-1)^{\mathrm{per}(k_1, k_2, \cdots, k_m)} \left[ \prod_{\substack{l=1 \\ \alpha_{k_l} + \hat{\alpha}_l \le 2}}^{m} \frac{\exp\left( \frac{2|\rho|\sqrt{\lambda_{k_l} \hat{\lambda}_l}}{1 - |\rho|^2} \right)}{\sqrt{\frac{2\sqrt{\lambda_{k_l} \hat{\lambda}_l}}{1 - |\rho|^2}}} \left( 1 - \frac{\mu - 1}{\frac{16\sqrt{\lambda_{k_l} \hat{\lambda}_l}}{1 - |\rho|^2}} + \cdots \right) \prod_{\substack{l=1 \\ \alpha_{k_l} + \hat{\alpha}_l > 2}}^{m} \mathrm{Poly}(\mathrm{SNR}) \right], \tag{43}
\]
where $\mathrm{Poly}(\mathrm{SNR})$ represents the terms that are polynomial in SNR. First observe that
\[
\exp\left( -\frac{\lambda_{k_l} + \hat{\lambda}_l}{1 - |\rho|^2} \right) \exp\left( \frac{2|\rho|\sqrt{\lambda_{k_l} \hat{\lambda}_l}}{1 - |\rho|^2} \right) \le 1, \tag{44}
\]
and hence the product of such terms cannot increase exponentially with SNR. Therefore,
\[
\begin{aligned}
&\exp\left(-\sum_{k=1}^{m} \frac{\lambda_k + \hat{\lambda}_k}{1 - |\rho|^2}\right) \det\left( I_\nu\left( \frac{2|\rho|\sqrt{\lambda_k \hat{\lambda}_l}}{1 - |\rho|^2} \right) \right) \\
&\quad \doteq \sum_{k} (-1)^{\mathrm{per}(k_1, k_2, \cdots, k_m)} \prod_{l=1}^{m} \exp\left(-\mathrm{SNR}^{1 - \min(\alpha_{k_l}, \hat{\alpha}_l)}\right) \prod_{\substack{l=1 \\ \alpha_{k_l} + \hat{\alpha}_l \le 2}}^{m} \exp\left(\mathrm{SNR}^{1 - \frac{\alpha_{k_l} + \hat{\alpha}_l}{2}}\right) \mathrm{Poly}(\mathrm{SNR}) \\
&\quad \doteq \sum_{k} (-1)^{\mathrm{per}(k_1, k_2, \cdots, k_m)} \prod_{\substack{l=1 \\ \alpha_{k_l} + \hat{\alpha}_l > 2}}^{m} \exp\left(-\mathrm{SNR}^{1 - \min(\alpha_{k_l}, \hat{\alpha}_l)}\right) \prod_{\substack{l=1 \\ \alpha_{k_l} + \hat{\alpha}_l \le 2}}^{m} \left[ \exp\left(-\mathrm{SNR}^{1 - \min(\alpha_{k_l}, \hat{\alpha}_l)}\right) \exp\left(\mathrm{SNR}^{1 - \frac{\alpha_{k_l} + \hat{\alpha}_l}{2}}\right) \right] \mathrm{Poly}(\mathrm{SNR}) \\
&\quad \mathrel{\dot{\le}} \sum_{k} (-1)^{\mathrm{per}(k_1, k_2, \cdots, k_m)} \prod_{\substack{l=1 \\ \alpha_{k_l} + \hat{\alpha}_l > 2}}^{m} \mathbf{1}_{\min(\alpha_{k_l}, \hat{\alpha}_l) \ge 1} \prod_{\substack{l=1 \\ \alpha_{k_l} + \hat{\alpha}_l \le 2}}^{m} \mathbf{1}_{\alpha_{k_l} = \hat{\alpha}_l}\, \mathrm{Poly}(\mathrm{SNR}).
\end{aligned} \tag{45}
\]
Now, we will show that each term under the sum decays exponentially with SNR. For this, the product of the above indicators should be 0. Let us consider all the scenarios in which the above product of indicators is 1. If some $\hat{\alpha}_l < 1$ pairs with $\alpha_{k_l}$ to give $\hat{\alpha}_l + \alpha_{k_l} \le 2$, then the two must be equal, and if it pairs to give $\hat{\alpha}_l + \alpha_{k_l} > 2$, the product of indicators will always be 0. If $\hat{\alpha}_l \ge 1$ pairs with $\alpha_{k_l}$ to give $\hat{\alpha}_l + \alpha_{k_l} \le 2$, this can only happen when the two are equal, and if it pairs to give $\hat{\alpha}_l + \alpha_{k_l} > 2$, then $\alpha_{k_l} \ge 1$. Hence the only pairing that works is one in which all elements $\alpha_i < 1$ are matched to elements $\hat{\alpha}_j < 1$ and no $\alpha_i \ge 1$ is mapped to $\hat{\alpha}_i < 1$. This can happen only when $\alpha_i = \hat{\alpha}_i$ whenever $\alpha_i < 1$ or $\hat{\alpha}_i < 1$, and all the rest are $\ge 1$. If $\alpha_i \ne \hat{\alpha}_i$ for some $i \in A$, the above condition does not hold. This proves Case 3.

4) $\min(\alpha_m, \hat{\alpha}_m) < 1$, $\alpha_m + \hat{\alpha}_m < 2$, $\alpha_i = \hat{\alpha}_i$ for all $i \in A$, $(\alpha, \hat{\alpha}) \notin \bigcup_{k=0}^{m-1} E_k$: All the analysis of Case 3 holds for this case. Hence, if $\alpha_i = \hat{\alpha}_i$ for all $i \in A$ and $(\alpha, \hat{\alpha}) \notin \bigcup_{k=0}^{m-1} E_k$, an element $\ge 1$ is mapped to an element $< 1$, which makes the product of indicators zero, and hence the probability decreases exponentially with SNR.

5) $(\alpha, \hat{\alpha}) \in \bigcup_{k=0}^{m-1} E_k$: These are the cases we sum over in the statement of the Lemma. In each of these cases, $p(\alpha, \hat{\alpha})$ exists. When we integrate over $\hat{\alpha}$, we find that the integral of $\sum_k e_k \mathbf{1}_{E_k}$ with respect to $\hat{\alpha}$ is
\[
\prod_{i=1}^{m} \mathrm{SNR}^{-(2i - 1 + n - m)\alpha_i}\, \mathbf{1}_{\alpha_1, \cdots, \alpha_k \ge 1}\, \mathbf{1}_{0 \le \alpha_{k+1}, \cdots, \alpha_m < 1}.
\]
We next note that, in order to find the associated polynomial expressions, the product of the two exponentials must not decrease with SNR, for which we need $k_l = l$ in (43) (since all the $\lambda$'s and $\hat{\lambda}$'s are ordered). Hence, from all the expressions, we find that $p(\alpha, \hat{\alpha}) = p_1(\alpha)\, p_2(\hat{\alpha})$ for any $E_k$. Using the separability, we find that
\[
e_k = \mathrm{SNR}^{k(n - m + k)} \prod_{i=1}^{k} \mathrm{SNR}^{-(2i - 1 + |n - m|)\hat{\alpha}_i} \prod_{i=1}^{m} \mathrm{SNR}^{-(2i - 1 + |n - m|)\alpha_i}. \tag{46}
\]

APPENDIX B
PROOF OF THEOREM 3

In this appendix, we prove that the diversity in the statement of Theorem 3 can be achieved. Let $S(R, P)$ be the probability of outage when the transmitter uses power level $P$ and rate $R$ is required. If $P \doteq \mathrm{SNR}^{p}$, then $S(R, P) \doteq \mathrm{SNR}^{-G(r, p)}$ and
\[
\Pi(\mathcal{O}) \mathrel{\dot{\le}} S(R, P_{K-1}) + \sum_{i=1}^{K-1} \Pi(J_T < i, J_R = i).
\]
We assign the feedback index $J_R$ at the receiver as
\[
J_R = \begin{cases} \arg\min_{i \in I} i, & I = \{ k : \log\det(I + H H^{\dagger} P_k) \ge R,\ k \in \{0, \cdots, K-1\} \}, \\ K - 1, & \text{if the set } I \text{ is empty}. \end{cases}
\]
We will now find $\Pi(J_T = i)$ for $i \ge 1$ as
\[
\begin{aligned}
\Pi(J_T = i) &= \sum_{j=0}^{K-1} \Pi(J_T = i, J_R = j) = \sum_{j=0}^{K-1} \Pi(J_T = i \mid J_R = j)\, \Pi(J_R = j) \\
&\doteq \Pi(J_R = i) + \sum_{j=i+1}^{K-1} \Pi(J_T = i \mid J_R = j)\, \Pi(J_R = j) \\
&\doteq \Pi(J_R = i).
\end{aligned} \tag{47}
\]
The above steps follow from the fact that $\Pi(J_T = i \mid J_R = j) \doteq 0$ if $j < i$. Further,
\[
\sum_{j=i+1}^{K-1} \Pi(J_T = i \mid J_R = j)\, \Pi(J_R = j) \mathrel{\dot{\le}} \sum_{j=i+1}^{K-1} \Pi(J_R = j) \mathrel{\dot{\le}} \Pi(J_R = i). \tag{48}
\]
Let the power levels be chosen as
\[
P_i = \begin{cases} \dfrac{\mathrm{SNR}}{K}, & \text{when } i = 0, \\[2mm] \dfrac{\mathrm{SNR}}{K\, S(R, P_{i-1})}, & \text{when } i > 0. \end{cases}
\]
Further, $Q_i \le P_i$. We first note that the power constraints are satisfied. Note that, as before, $P_i = \mathrm{SNR}^{1 + p_i}$ and $Q_i = \mathrm{SNR}^{q_i}$. Also, $P_i \doteq \mathrm{SNR}^{1 + d_{RT_q}(r, i)}$ for all $i$ and $S(R, P_i) \doteq \mathrm{SNR}^{-d_{RT_q}(r, i+1)}$. Thus, $\Pi(J_R = i) = \Pi(J_T = i) \doteq \mathrm{SNR}^{-d_{RT_q}(r, i)}$.
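The recursive power assignment above can be traced numerically at the level of SNR exponents. The following is a minimal sketch, not the authors' code: it assumes the exponent recursion $d(r, i+1) = G(r, 1 + d(r, i))$ with $d(r, 0) = 0$ implied by $S(R, P) \doteq \mathrm{SNR}^{-G(r,p)}$, and it restricts attention to zero multiplexing gain, where $G(0, p) = mn\,p$ is taken as an assumption (it matches the $mn(mn+1)$ value that appears in Appendix C). The function names are hypothetical.

```python
from typing import Callable, List


def diversity_exponents(G: Callable[[float], float], K: int) -> List[float]:
    """Return [d_0, d_1, ..., d_K] with d_0 = 0 and d_{i+1} = G(1 + d_i).

    G maps the power exponent p (P = SNR^p) to the outage exponent,
    i.e. S(R, P) ~ SNR^(-G(p)) at a fixed multiplexing gain.
    """
    d = [0.0]
    for _ in range(K):
        d.append(G(1.0 + d[-1]))
    return d


def power_exponents(d: List[float]) -> List[float]:
    """Exponents of P_0, ..., P_{K-1}, using P_i ~ SNR^(1 + d_i)."""
    return [1.0 + d_i for d_i in d[:-1]]


def zero_multiplexing_G(m: int, n: int) -> Callable[[float], float]:
    """Assumed form of G at r = 0 for an m x n channel: G(0, p) = m*n*p."""
    return lambda p: float(m * n) * p


if __name__ == "__main__":
    m, n, K = 2, 2, 2
    d = diversity_exponents(zero_multiplexing_G(m, n), K)
    p = power_exponents(d)
    print("outage exponents d_0..d_K:", d)
    print("power exponents of P_0..P_{K-1}:", p)
    # Each level contributes SNR^(1 + d_i) * SNR^(-d_i) = SNR^1 to the average power.
    print("average-power exponents:", [pi - di for pi, di in zip(p, d[:-1])])
```

For $m = n = 2$ and $K = 2$ the last exponent is $mn(mn+1) = 20$, and every level contributes an average-power exponent of 1, which is the reason the power constraint holds in the exponent sense.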
We will now show that, using $P_i$ as given above, we achieve the desired diversity-multiplexing tradeoff by the following computation:
\[
\begin{aligned}
\Pi(\mathcal{O}) &\mathrel{\dot{\le}} S(R, P_{K-1}) + \sum_{i=1}^{K-1} \Pi(J_T < i, J_R = i) \\
&\doteq \mathrm{SNR}^{-d_{RT_q}(r, K)} + \sum_{i=1}^{K-1} \Pi(J_T < i \mid J_R = i)\, \Pi(J_R = i) \\
&\doteq \mathrm{SNR}^{-d_{RT_q}(r, K)} + \sum_{i=1}^{K-1} \Pi(J_T < i \mid J_R = i)\, \mathrm{SNR}^{-d_{RT_q}(r, i)} \\
&\doteq \mathrm{SNR}^{-d_{RT_q}(r, K)} + \sum_{i=1}^{K-1} \mathrm{SNR}^{-mn(\max(q_i, 0) - \max(q_{i-1}, 0))}\, \mathrm{SNR}^{-d_{RT_q}(r, i)} \\
&\doteq \mathrm{SNR}^{-d_{R\hat{T}_q}(r, K)}.
\end{aligned} \tag{49}
\]

APPENDIX C
PROOF OF THEOREM 6

We will first show that the diversity cannot be greater than $mn(mn + 1)$, and then show that a diversity of $G(r, 1 + G(r, 1))$ can be achieved with 1 bit of feedback.

A. Converse

In this subsection, we will prove that, at zero multiplexing, we cannot get more diversity with more than 2 levels of feedback with imperfect CSIR than with 1 bit of feedback with perfect CSIR. For this, we assume that the third phase is perfect and thus $\tilde{H} = 0$ and $\hat{H}_2 = H$. Thus, $\Pi(\mathcal{O}) \doteq \Pi(\log\det(I + P(J_R) H H^{\dagger}) < R)$. Define $J$ as
\[
J = \begin{cases} \arg\min_{i \in I} i, & I = \{ k : \log\det(I + H H^{\dagger} P_k) \ge R,\ k \in \{0, \cdots, K-1\} \}, \\ 0, & \text{if the set } I \text{ is empty}. \end{cases}
\]
The probability of outage is then
\[
\begin{aligned}
\Pi(\mathcal{O}) &\doteq \Pi(\log\det(I + P(J_R) H H^{\dagger}) < R) \\
&\mathrel{\dot{\ge}} \Pi(\log\det(I + P(J_R) H H^{\dagger}) < R,\ J_R = 1) \\
&\doteq \Pi\big(\log\det(I + P_1 H H^{\dagger}) < R,\ \log\det(I + P_0 \hat{H} \hat{H}^{\dagger}) < R + \epsilon \log(\mathrm{SNR}),\ \log\det(I + P_1 \hat{H} \hat{H}^{\dagger}) \ge R + \epsilon \log(\mathrm{SNR})\big).
\end{aligned} \tag{50}
\]
Let $\lambda_j$ and $\hat{\lambda}_j$ be the eigenvalues of $H H^{\dagger}$ and $\hat{H} \hat{H}^{\dagger}$, respectively. Further, let $\alpha_j$ and $\hat{\alpha}_j$ be the negative SNR exponents of the corresponding eigenvalues. Then, Equation (50) reduces to
\[
\begin{aligned}
\Pi(\mathcal{O}) &\mathrel{\dot{\ge}} \Pi\big(\log\det(I + P_1 H H^{\dagger}) < R,\ \log\det(I + P_0 \hat{H} \hat{H}^{\dagger}) < R + \epsilon \log(\mathrm{SNR}),\ \log\det(I + P_1 \hat{H} \hat{H}^{\dagger}) \ge R + \epsilon \log(\mathrm{SNR})\big) \\
&\mathrel{\dot{\ge}} \Pi\Big(\prod_{j=1}^{m} (1 + \lambda_j \mathrm{SNR}^{1 + p_1}) < \mathrm{SNR}^{r} \text{ and } \prod_{j=1}^{m} (1 + \hat{\lambda}_j \mathrm{SNR}) < \mathrm{SNR}^{r + \epsilon} \text{ and } \prod_{j=1}^{m} (1 + \hat{\lambda}_j \mathrm{SNR}^{1 + p_1}) > \mathrm{SNR}^{r + \epsilon}\Big) \\
&\doteq \Pi\Big(\sum_{j=1}^{m} (1 + p_1 - \alpha_j)^{+} < r \text{ and } \sum_{j=1}^{m} (1 - \hat{\alpha}_j)^{+} < r + \epsilon \text{ and } \sum_{j=1}^{m} (1 + p_1 - \hat{\alpha}_j)^{+} > r + \epsilon\Big).
\end{aligned} \tag{51}
\]
For $r \to 0$, the above reduces to
\[
\Pi(\mathcal{O}) \mathrel{\dot{\ge}} \Pi\Big(\sum_{j=1}^{m} (1 + p_1 - \alpha_j)^{+} \le 0 \text{ and } \sum_{j=1}^{m} (1 - \hat{\alpha}_j)^{+} < \epsilon \text{ and } \sum_{j=1}^{m} (1 + p_1 - \hat{\alpha}_j)^{+} > \epsilon\Big). \tag{52}
\]
Thus, the choice of all $\alpha_j = 1 + p_1$ and all $\hat{\alpha}_j = 1$ for $\epsilon \to 0$ can be used for the outer bound on the outage probability. This gives $\Pi(\mathcal{O}) \mathrel{\dot{\ge}} \mathrm{SNR}^{-mn(p_1 + 1)}$. As $p_1 \le mn$, the diversity is at most $mn(mn + 1)$.

B. Achievability

In this subsection, we will show that the diversity gain of $G(r, 1 + G(r, 1))$ can be achieved with 1 bit of feedback with imperfect CSIR. We will first prove that the interference error due to $\tilde{H}$ in the third phase does not make a difference and thus can be removed.

1) Analysis of Phase 3: Let the actual channel be $H$ and let $\tilde{H}$ be the error in the estimate of $H$ in the third phase of training, which is power controlled. As a result, we obtain
\[
\Pi(\mathcal{O}) = \sum_{j=0}^{K-1} \Pi(\mathcal{O}, J_T = j) \doteq \sum_{j=0}^{K-1} \Pi\left(\log\det\left(I + \frac{P_j \hat{H}_2 \hat{H}_2^{\dagger}}{1 + P_j\, \mathrm{trace}(\tilde{H} \tilde{H}^{\dagger})}\right) < R,\ J_T = j\right).
\]
Consider any term in the sum above. The receiver is trained with power $P_j$; hence let the eigenvalues of $P_j \tilde{H} \tilde{H}^{\dagger}$ be $\tilde{\lambda}_i \doteq \mathrm{SNR}^{-\tilde{\alpha}_i}$, and let $\tilde{\alpha} = (\tilde{\alpha}_1, \cdots, \tilde{\alpha}_m)$. Then,
\[
\begin{aligned}
\Pi(\mathcal{O}) &\doteq \sum_{j=0}^{K-1} \Pi\left(\log\det\left(I + \frac{P_j \hat{H}_2 \hat{H}_2^{\dagger}}{1 + P_j\, \mathrm{trace}(\tilde{H} \tilde{H}^{\dagger})}\right) < R,\ J_T = j\right) \\
&\doteq \sum_{j=0}^{K-1} \Pi\left(\log\det\left(I + P_j \hat{H}_2 \hat{H}_2^{\dagger}\, \mathrm{SNR}^{-(-\min \tilde{\alpha})^{+}}\right) < R,\ J_T = j\right).
\end{aligned}
\]
Now, since $\hat{H}_2$ and $\tilde{H}$ are uncorrelated, the probability that $(-\min \tilde{\alpha})^{+} > 0$ decays faster than any polynomial in SNR, and hence
\[
\begin{aligned}
\Pi(\mathcal{O}) &\doteq \sum_{j=0}^{K-1} \Pi\left(\log\det\left(I + P_j \hat{H}_2 \hat{H}_2^{\dagger}\, \mathrm{SNR}^{-(-\min \tilde{\alpha})^{+}}\right) < R,\ J_T = j\right) \\
&\doteq \sum_{j=0}^{K-1} \Pi\left(\log\det\left(I + P_j \hat{H}_2 \hat{H}_2^{\dagger}\right) < R,\ J_T = j\right) \\
&\doteq \Pi\left(\log\det\left(I + P(J_T) \hat{H}_2 \hat{H}_2^{\dagger}\right) < R\right).
\end{aligned} \tag{53}
\]
Thus, $\Pi(\mathcal{O}) \doteq \Pi(\log\det(I + P(J_T) \hat{H}_2 \hat{H}_2^{\dagger}) < R)$. Note that the correlations between $\hat{H}_2$ and $\hat{H}$ are the same as those between $H$ and $\hat{H}$, and thus there is no difference in using $H$ in place of $\hat{H}_2$ for the purpose of calculating the diversity-multiplexing tradeoff. We will now focus only on $K = 2$.

2) Analysis of Phase 1: In this section, we see how the feedback error in the first phase decays with SNR. The receiver sets
\[
J_R = \begin{cases} 1, & \text{if } \log\det(I + \hat{H} \hat{H}^{\dagger} P_0) < R + \epsilon \log(\mathrm{SNR}), \\ 0, & \text{if } \log\det(I + \hat{H} \hat{H}^{\dagger} P_0) \ge R + \epsilon \log(\mathrm{SNR}). \end{cases}
\]
Let us define $J$ as
\[
J = \begin{cases} 1, & \text{if } \log\det(I + \hat{H}_2 \hat{H}_2^{\dagger} P_0) < R \text{ and } \log\det(I + \hat{H}_2 \hat{H}_2^{\dagger} P_1) \ge R, \\ 0, & \text{otherwise}. \end{cases}
\]
The probability of outage is then
\[
\begin{aligned}
\Pi(\mathcal{O}) &\doteq \Pi(\log\det(I + P(J_T) \hat{H}_2 \hat{H}_2^{\dagger}) < R) \\
&\doteq \Pi(\log\det(I + P(J_T) \hat{H}_2 \hat{H}_2^{\dagger}) < R,\ J = 0,\ J_R = 0) \\
&\quad + \Pi(\log\det(I + P(J_T) \hat{H}_2 \hat{H}_2^{\dagger}) < R,\ J_R = 1) \\
&\quad + \Pi(\log\det(I + P(J_T) \hat{H}_2 \hat{H}_2^{\dagger}) < R,\ J = 1,\ J_R = 0) \\
&\mathrel{\dot{\le}} \Pi(\log\det(I + P_1 \hat{H}_2 \hat{H}_2^{\dagger}) < R,\ J_R = 0) + \Pi(\log\det(I + P_1 \hat{H}_2 \hat{H}_2^{\dagger}) < R,\ J_R = 1) + \Pi(J_R = 0,\ J = 1) \\
&\doteq \Pi(\log\det(I + P_1 \hat{H}_2 \hat{H}_2^{\dagger}) < R) + \Pi(J_R = 0,\ J = 1).
\end{aligned} \tag{54}
\]
Denote the eigenvalues of $\hat{H} \hat{H}^{\dagger}$ by $\hat{\lambda}_i$ and the negative SNR exponents of $\hat{\lambda}_i$ by $\hat{\alpha}_i$. Also denote the eigenvalues of $\hat{H}_2 \hat{H}_2^{\dagger}$ by $\lambda_i$ and the negative SNR exponents of $\lambda_i$ by $\alpha_i$. Then, $\Pi(J_R = 0, J = 1)$ can be bounded as
\[
\begin{aligned}
\Pi(J_R = 0,\ J = 1) &= \Pi\big(\log\det(I + \hat{H}_2 \hat{H}_2^{\dagger} \mathrm{SNR}) < R \text{ and } \log\det(I + \hat{H}_2 \hat{H}_2^{\dagger} \mathrm{SNR}^{1 + p_1}) \ge R \text{ and } \log\det(I + \hat{H} \hat{H}^{\dagger} \mathrm{SNR}) \ge R + \epsilon \log(\mathrm{SNR})\big) \\
&\doteq \Pi\Big(\prod_{i=1}^{m} (1 + \lambda_i \mathrm{SNR}) < \mathrm{SNR}^{r} \text{ and } \prod_{i=1}^{m} (1 + \lambda_i \mathrm{SNR}^{1 + p_1}) \ge \mathrm{SNR}^{r} \text{ and } \prod_{i=1}^{m} (1 + \hat{\lambda}_i \mathrm{SNR}) \ge \mathrm{SNR}^{r + \epsilon}\Big) \\
&\mathrel{\dot{\le}} \Pi\Big(\sum_{i=1}^{m} (1 - \alpha_i)^{+} \le r \text{ and } \sum_{i=1}^{m} (1 + p_1 - \alpha_i)^{+} \ge r \text{ and } \sum_{i=1}^{m} (1 - \hat{\alpha}_i)^{+} \ge r + \epsilon\Big) \\
&\doteq 0.
\end{aligned} \tag{55}
\]
The last step follows since expanding in terms of $E_k$ gives $\sum_{i=1}^{m} (1 - \alpha_i)^{+} = \sum_{i=1}^{m} (1 - \hat{\alpha}_i)^{+}$ for all cases in which the probability does not decrease exponentially.
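The last step of (55) can be illustrated with a small numerical check. This is only a sketch under a stated assumption: exponent pairs with non-vanishing probability are modeled, following the description in Case 3 of Appendix A, as vectors whose entries below 1 coincide ($\alpha_i = \hat{\alpha}_i$) while the remaining $k$ entries of both vectors are at least 1. The helper names `pos_sum`, `feedback_error_event` and `sample_matched_exponents` are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def pos_sum(exponents: Sequence[float], level: float) -> float:
    """Compute sum_i (level - a_i)^+ for a vector of SNR exponents."""
    return sum(max(level - a, 0.0) for a in exponents)


def feedback_error_event(alpha: Sequence[float], alpha_hat: Sequence[float],
                         r: float, p1: float, eps: float) -> bool:
    """Exponent-domain event on the right-hand side of (55)."""
    return (pos_sum(alpha, 1.0) <= r
            and pos_sum(alpha, 1.0 + p1) >= r
            and pos_sum(alpha_hat, 1.0) >= r + eps)


def sample_matched_exponents(m: int, k: int,
                             rng: random.Random) -> Tuple[List[float], List[float]]:
    """Draw a pair with k entries >= 1 (chosen independently for each vector)
    and m - k shared entries in [0, 1)."""
    shared = [rng.random() for _ in range(m - k)]
    alpha = [1.0 + rng.random() for _ in range(k)] + shared
    alpha_hat = [1.0 + rng.random() for _ in range(k)] + shared
    return alpha, alpha_hat


if __name__ == "__main__":
    rng = random.Random(0)
    m, r, p1, eps = 3, 0.5, 1.0, 0.01
    hits = 0
    for _ in range(10000):
        a, a_hat = sample_matched_exponents(m, rng.randrange(0, m), rng)
        hits += feedback_error_event(a, a_hat, r, p1, eps)
    print("matched pairs inside the error event:", hits)
    # A mismatched pair, by contrast, can fall inside the event.
    print(feedback_error_event([1.0], [0.0], r, p1, 0.1))
```

Because the two sums $\sum_i (1 - \alpha_i)^{+}$ and $\sum_i (1 - \hat{\alpha}_i)^{+}$ coincide for matched pairs, the first and third conditions cannot hold together for any $\epsilon > 0$, so no matched pair lands in the event.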
We now show that the diversity order of $G(r, 1 + G(r, 1))$ can be achieved. From Equation (54), the probability of outage for $K = 2$ is
\[
\Pi(\mathcal{O}) \mathrel{\dot{\le}} \Pi(\log\det(I + P_1 \hat{H}_2 \hat{H}_2^{\dagger}) < R) + \Pi(J_R = 0,\ J = 1) \mathrel{\dot{\le}} \mathrm{SNR}^{-G(r, 1 + p_1)}. \tag{56}
\]
Thus, a diversity order of $G(r, 1 + G(r, 1))$ can be achieved, which is the optimal diversity order for perfect training in [13]. Lastly, we show that the power constraint can be satisfied with the above choice of powers.

Lemma 6. The power constraint is satisfied for $K = 2$.

Proof: Let $P_0 = \frac{\mathrm{SNR}}{2}$ and $P_1 = \frac{\mathrm{SNR}}{2\,\Pi(J_R = 1)}$. The power constraint is $P_0\, \Pi(J_T = 0) + P_1\, \Pi(J_T = 1) \le \mathrm{SNR}$, which trivially holds.

REFERENCES

[1] A. J. Goldsmith and P. P. Varaiya, "Capacity of fading channels with channel side information," IEEE Transactions on Information Theory, vol. 43(6), pp. 1986-1992, Nov. 1997.
[2] V. K. N. Lau and Y. R. Kwok, Channel-Adaptive Technologies and Cross-Layer Designs for Wireless Systems with Multiple Antennas, John Wiley & Sons, Inc., 2006.
[3] G. Caire and S. Shamai, "On the capacity of some channels with channel state information," IEEE Transactions on Information Theory, vol. 45(6), pp. 2007-2019, Sep. 1999.
[4] A. Kuhne and A. Klein, "Throughput Analysis of Multi-user OFDMA-Systems using Imperfect CQI Feedback and Diversity Techniques," IEEE Journal on Selected Areas in Communications, vol. 26, no. 8, pp. 1440-1450, Oct. 2008.
[5] E. Biglieri, G. Caire and G. Taricco, "Limiting performance for block-fading channels with multiple antennas," IEEE Trans. on Inform. Theory, vol. 47, no. 4, pp. 1273-1289, May 2001.
[6] M. Agarwal, D. Guo and M. Honig, "Limited-Rate Channel State Feedback for Multicarrier Block Fading Channels," submitted to IEEE Transactions on Information Theory, Jan. 2009.
[7] W. Shin, S. Chung and Y. H. Keem, "Outage analysis for MIMO Rician channels and channels with partial CSI," in Proc. International Symposium on Information Theory, Jul. 2006.
[8] A. Khoshnevis and A. Sabharwal, "On the asymptotic performance of multiple antenna channels with quantized feedback," IEEE Transactions on Wireless Communications, 10 (7), pp. 3869-3877, October 2008.
[9] V. Sharma, K. Premkumar and R. N. Swamy, "Exponential diversity achieving spatio-temporal power allocation scheme for fading channels," IEEE Transactions on Information Theory, vol. 54, no. 1, Jan. 2008.
[10] C. Steger and A. Sabharwal, "Single-Input Two-Way SIMO Channel: Diversity-Multiplexing Tradeoff with Two-Way Training," to appear in IEEE Transactions on Wireless Communications, December 2008.
[11] G. G. Krishna, S. Bhashyam and A. Sabharwal, "Decentralized power control with two-way training for multiple access," in Proc. International Symposium on Information Theory, July 2008, Toronto.
[12] S. Ekbatani, F. Etemadi and H. Jafarkhani, "Outage behavior of slow fading channels with power control using noisy quantized CSIT," arXiv:0804.0790v1, Apr. 2008.
[13] T. T. Kim and M. Skoglund, "Diversity-Multiplexing tradeoff in MIMO channels with partial CSIT," IEEE Transactions on Information Theory, vol. 53, Issue 8, pp. 2743-2759, Aug. 2007.
[14] V. Aggarwal and A. Sabharwal, "Performance of multiple access channels with asymmetric feedback," IEEE Journal on Selected Areas in Communication, vol. 26, no. 8, pp. 1516-1525, Oct. 2008.
[15] T. T. Kim and G. Caire, "Diversity gains of power control with noisy CSIT in MIMO channels," IEEE Trans. Inf. Th., accepted for publication.
[16] V. Aggarwal and A. Sabharwal, "Diversity order gain with noisy feedback in multiple access channels," in Proc. International Symposium on Information Theory, July 2008, Toronto.
[17] A. Khoshnevis and A. Sabharwal, "Achievable diversity and multiplexing in multiple antenna systems with quantized power control," in Proc. IEEE Intl. Conference on Communications, May 2005.
[18] H. El Gamal, G. Caire and M. O. Damen, "The MIMO ARQ Channel: Diversity-Multiplexing-Delay Tradeoff," IEEE Transactions on Information Theory, vol. 52, pp. 3601-3621, Aug. 2006.
[19] A. Narula, M. J. Lopez, M. D. Trott and G. W. Wornell, "Efficient use of side information in multiple-antenna data transmission over fading channels," IEEE JSAC, vol. 16, pp. 1423-1436, Oct. 1998.
[20] K. K. Mukkavilli, A. Sabharwal, E. Erkip and B. Aazhang, "On beamforming with finite rate feedback in multiple-antenna systems," IEEE Transactions on Information Theory, vol. 49, pp. 2562-2579, Oct. 2003.
[21] L. Zheng, "Diversity-Multiplexing Tradeoff: A Comprehensive View of Multiple Antenna Systems," PhD Thesis, University of California at Berkeley, 2002.
[22] L. Zheng and D. N. C. Tse, "Diversity and multiplexing: a fundamental tradeoff in multiple-antenna channels," IEEE Transactions on Information Theory, vol. 49, Issue 5, pp. 1073-1096, May 2003.
[23] S. Bhashyam, A. Sabharwal and B. Aazhang, "Feedback gain in multiple antenna systems," IEEE Transactions on Comm., vol. 50, Issue 5, pp. 785-798, May 2002.
[24] V. Aggarwal, G. G. Krishna, S. Bhashyam and A. Sabharwal, "Two Models for Noisy Feedback in MIMO Channels," in Proc. Asilomar Conference on Signals, Systems and Computers, Oct. 2008, Pacific Grove, CA.
[25] M. Biguesh and A. B. Gershman, "Training-based MIMO channel estimation: a study of estimator tradeoffs and optimal training signals," IEEE Transactions on Signal Processing, vol. 54, no. 3, pp. 884-893, Mar. 2006.
[26] S. Wang and A. Abdi, "Joint singular value distribution of two correlated rectangular Gaussian matrices and its application," SIAM Journal on Matrix Analysis and Applications, vol. 29, issue 3, pp. 972-981, Oct. 2007.
[27] G. G. Krishna, "Feedback with resource accounting in MIMO systems," B. Tech. Project Report, Indian Institute of Technology Madras, May 2008.
[28] D. N. C. Tse, P. Viswanath and L. Zheng, "Diversity-Multiplexing tradeoff in multiple-access channels," IEEE Transactions on Information Theory, vol. 50, Issue 9, pp. 1859-1874, Sept. 2004.
[29] L. Zhao, W. Mo, Y. Ma and Z. Wang, "Diversity and multiplexing tradeoff in general fading channels," IEEE Transactions on Information Theory, vol. 53(4), pp. 1549-1557, Apr. 2007.
[30] C. M. Bender, D. C. Brody and B. K. Meister, "On powers of Bessel functions," Journal of Mathematical Physics, vol. 44, no. 1, Jan. 2003.
[31] M. Abramowitz and I. A. Stegun, "Handbook of Mathematical Functions," Courier Dover Publications, 1995.
[32] V. Aggarwal and A. Sabharwal, "Bits About the Channel: Multi-round Protocols for Two-way Fading Channels," submitted to IEEE Trans. Inf. Th., Sept. 2009, available at
[33] Won-Yong Shin, Sae-Young Chung and Yong H. Lee, "Diversity-Multiplexing Tradeoff and Outage Performance for Rician MIMO Channels," IEEE Transactions on Information Theory, Mar. 2008.

Vaneet Aggarwal received the B.Tech. degree in 2005 from the Indian Institute of Technology, Kanpur, India, and the M.A. degree in 2007 from Princeton University, Princeton, NJ, USA, both in Electrical Engineering. He is currently pursuing the Ph.D. degree in Electrical Engineering at Princeton University, Princeton, NJ, USA. His research interests are in applications of information and coding theory to wireless systems and quantum error correction. He was the recipient of Princeton University's Porter Ogden Jacobus Honorific Fellowship in 2009.

Ashutosh Sabharwal (S'91-M'99-SM'04) received the B.Tech. degree from the Indian Institute of Technology, New Delhi, in 1993 and the M.S. and Ph.D. degrees from The Ohio State University, Columbus, in 1995 and 1999, respectively. He is currently an Assistant Professor in the Department of Electrical and Computer Engineering and also the Director of the Center for Multimedia Communications at Rice University, Houston, TX. His research interests are in the areas of information theory and communication algorithms for wireless systems. Dr. Sabharwal was the recipient of the Presidential Dissertation Fellowship award in 1998.
