On the Capacity and Energy Efficiency of Training-Based Transmissions over Fading Channels

Mustafa Cenk Gursoy
Department of Electrical Engineering, University of Nebraska-Lincoln, Lincoln, NE 68588
Email: gursoy@engr.unl.edu

Abstract (see Footnote 1)

In this paper, the capacity and energy efficiency of training-based communication schemes employed for transmission over a-priori unknown Rayleigh block fading channels are studied. In these schemes, periodically transmitted training symbols are used at the receiver to obtain the minimum mean-square-error (MMSE) estimate of the channel fading coefficients. Initially, the case in which the product of the estimate error and transmitted signal is assumed to be Gaussian noise is considered. In this case, it is shown that bit energy requirements grow without bound as the signal-to-noise ratio (SNR) goes to zero, and the minimum bit energy is achieved at a nonzero SNR value below which one should not operate. The effect of the block length on both the minimum bit energy and the SNR value at which the minimum is achieved is investigated. Energy efficiency analysis is also carried out when peak power constraints are imposed on pilot signals. Flash training and transmission schemes are analyzed and shown to improve the energy efficiency in the low-SNR regime. In the second part of the paper, the capacity and energy efficiency of training-based schemes are investigated when the channel input is subject to peak power constraints. The capacity-achieving input structure is characterized and the magnitude distribution of the optimal input is shown to be discrete with a finite number of mass points. The capacity, bit energy requirements, and optimal resource allocation strategies are obtained through numerical analysis. The bit energy is again shown to grow without bound as SNR decreases to zero due to the presence of peakedness constraints.
Capacity and energy-per-bit are also analyzed under the assumptions that the transmitter interleaves the data symbols before transmission over the channel, and per-symbol peak power constraints are imposed. The improvements in energy efficiency when on-off keying with fixed peak power and vanishing duty cycle is employed are studied. Comparisons of the performances of training-based and noncoherent transmission schemes are provided.

Index Terms: Channel capacity, energy-per-bit, energy efficiency, training-based transmission, capacity-achieving input distribution, optimal resource allocation, Rayleigh block fading channels, channel estimation.

Footnote 1: This work was supported in part by the NSF CAREER Grant CCF-0546384. The material in this paper was presented in part at the IEEE International Symposium on Information Theory (ISIT), Nice, France, June 2007.

October 26, 2018 DRAFT

I. INTRODUCTION

In wireless communications, channel conditions vary randomly over time due to mobility and changing environment, and the degree of channel side information (CSI) assumed to be available at the receiver and transmitter is a key assumption in the study of wireless fading channels. The case in which the channel is assumed to be perfectly known at the receiver and/or transmitter has been extensively studied. In an early work, Ericsson [1] obtained the capacity of flat fading channels with perfect receiver CSI. More recently, Ozarow et al. [2] studied the average and outage capacity values in the cellular mobile radio setting assuming perfect channel knowledge at the receiver. Goldsmith and Varaiya [3] analyzed the capacity of flat fading channels with perfect CSI at the transmitter and/or receiver. The assumption of perfect channel knowledge is unwarranted when communication is to be established in a highly mobile environment.
This consideration has led to another line of work where both the receiver and transmitter are assumed to be completely uninformed of the channel conditions. Abou-Faycal et al. [4] studied the capacity of the unknown Rayleigh fading channel and showed that the optimal input amplitude has a discrete structure. This is in stark contrast to the optimality of a continuous Gaussian input in known channels. In [16] and [18], the discreteness of the capacity-achieving amplitude distribution is proven for noncoherent Rician fading channels under input peakedness constraints. When the input is subject to peak power constraints, the discrete nature of the optimal input is shown for a general class of single-input single-output channels in [7]. Marzetta and Hochwald [5] gave a characterization of the optimal input structure for unknown multiple-antenna Rayleigh fading channels. This analysis subsequently led to the proposal of unitary space-time modulation techniques [6]. Chan et al. [8] considered conditionally Gaussian multiple-input multiple-output (MIMO) channels with bounded inputs and proved the discreteness of the optimal input under certain conditions. Zheng and Tse [10] analyzed the multiple-antenna Rayleigh channels and identified the high signal-to-noise ratio (SNR) behavior of the channel capacity.

Heretofore, the two extreme assumptions of having either perfect CSI or no CSI have been discussed. Practical wireless systems live in between these two extremes. Unless there is very high mobility, wireless systems generally employ estimation techniques to learn the channel conditions, albeit with errors. Hence, it is of utmost interest to analyze fading channels with imperfect CSI. Médard [13] investigated the effect upon channel capacity of imperfect channel knowledge and obtained upper and lower bounds on the input-output mutual information.
Lapidoth and Shamai [12] analyzed the effects of channel estimation errors on the performance if Gaussian codebooks are used and nearest neighbor decoding is employed. The capacity of imperfectly-known fading channels is characterized in the low-SNR regime in [14] and in the high-SNR regime in [9].

The aforementioned studies have not considered explicit training and estimation techniques, and the resources allocated to them. Recently, Hassibi and Hochwald [23] studied training schemes to learn the multiple-antenna channels. In this work, the power and time dedicated to training are optimized by maximizing a lower bound on the capacity. Similar training techniques are also discussed in [10]. Due to its practical significance, the information-theoretic analysis of training schemes has attracted much interest (see e.g., [24]-[35]). Since exact capacity expressions are difficult to find, these studies have optimized the training signal power, duration, and placement using capacity bounds. Since Gaussian noise is the worst-case uncorrelated additive noise in a Gaussian setting [23], a capacity lower bound is generally obtained by treating the product of the estimate error and the transmitted signal as another source of Gaussian noise. In the above cited work, training symbols are employed solely to facilitate channel estimation. However, we note that training symbols can also be used for timing- and frequency-offset synchronization, and channel equalization [36]-[38]. Tong et al. in [22] present an overview of pilot-assisted wireless transmissions and discuss design issues from both information-theoretic and signal processing perspectives.

Another important concern in wireless communications is the efficient use of limited energy resources. In systems where energy is at a premium, minimizing the energy cost per unit transmitted information will improve the efficiency.
Hence, the energy required to reliably send one bit is a metric that can be adopted to measure the performance. Generally, the energy-per-bit requirement is minimized, and hence the energy efficiency is maximized, if the system operates in the low-SNR regime. In [14], Verdú has analyzed the tradeoff between the spectral efficiency and bit energy in the low-SNR regime for a general class of channels and shown that the normalized received minimum bit energy of −1.59 dB is achieved as SNR → 0 in average power limited channels regardless of the availability of CSI at the receiver. On the other hand, [14] has proven that if the receiver has imperfect CSI, the wideband slope, which is the slope of the spectral efficiency curve at zero spectral efficiency, is zero. Hence, approaching the minimum bit energy of −1.59 dB is extremely slow, and moreover it requires input signals with increasingly higher peak-to-average power ratios. The impact upon the energy efficiency of limiting the peakedness of signals is analyzed in [17]. The wideband channel capacity in the presence of input peakedness constraints is investigated in [15], [19], and [20].

Energy efficiency, which is of paramount importance in many wireless systems, has not been the core focus of the aforementioned work on training schemes. Moreover, previous studies optimized the training parameters by using capacity lower bounds. These achievable rate expressions are relevant for systems in which the channel estimate is assumed to be perfect and transmission and reception are designed for a known channel. Note that these assumptions will lead to poor performance unless the SNR is high or the channel coherence time is long. The contributions of this paper are the following:

• We provide an energy efficiency perspective by analyzing the performance of training techniques in the low-SNR regime.
Note that at low SNR levels, the quality of the channel estimate is far from being perfect. We quantify the performance losses in terms of energy efficiency in the worst-case scenario where the estimate is assumed to be perfect. We identify an SNR level below which one should avoid operating. We consider flash training and transmission techniques to improve the performance.
• We obtain the exact capacity of training-based schemes by characterizing the structure of the capacity-achieving input distribution under input peak power constraints, which are highly relevant in practical applications. Optimal resource allocation is performed using the exact capacity values. Improvements in energy efficiency with respect to the worst-case scenario are shown.
• We compare the performances of untrained noncoherent and training-based communication schemes under peak power constraints and show through numerical results that the performance loss experienced by training-based schemes is small even at low SNR levels and small values of coherence time. On the other hand, if data symbols are interleaved and experience independent fading, we show that training-based schemes outperform noncoherent techniques.
• We find the attainable bit energy levels in the low-SNR regime when limitations on the peak-to-average power ratio are relaxed and on-off keying with fixed power and vanishing duty cycle is used to transmit information.

The organization of the paper is as follows. Section II provides the channel model. In Section III, training-based transmission and reception is described. In Section IV, we study the achievable rates and energy efficiency in the case where the product of the channel estimate error and the transmitted signal is assumed to be Gaussian noise.
In Section V, we analyze the capacity and the energy efficiency of training-based schemes when the input is subject to peak power limitations. Section VI includes our conclusions. Proofs of several results are relegated to the Appendix.

II. CHANNEL MODEL

We consider Rayleigh block-fading channels where the input-output relationship within a block of $m$ symbols is given by

$$\mathbf{y} = h\mathbf{x} + \mathbf{n} \tag{1}$$

where $h \sim \mathcal{CN}(0, \gamma^2)$ (see Footnote 2) is a zero-mean circularly symmetric complex Gaussian random variable with variance $E\{|h|^2\} = \gamma^2$, and $\mathbf{n}$ is a zero-mean, $m$ complex-dimensional Gaussian random vector (see Footnote 3) with covariance matrix $E\{\mathbf{n}\mathbf{n}^\dagger\} = N_0 \mathbf{I}$. $\mathbf{x}$ and $\mathbf{y}$ are the $m$ complex-dimensional channel input and output vectors, respectively. It is assumed that the fading coefficients stay constant for a block of $m$ symbols and have independent realizations for each block. It is further assumed that neither the transmitter nor the receiver has prior knowledge of the realizations of the fading coefficients.

Footnote 2: $\mathbf{x} \sim \mathcal{CN}(\mathbf{d}, \mathbf{\Sigma})$ is used to denote that $\mathbf{x}$ is a complex Gaussian random vector with mean $E\{\mathbf{x}\} = \mathbf{d}$ and covariance $E\{(\mathbf{x} - \mathbf{d})(\mathbf{x} - \mathbf{d})^\dagger\} = \mathbf{\Sigma}$.

Footnote 3: Note that in the channel model (1), $\mathbf{y}$, $\mathbf{x}$, and $\mathbf{n}$ are column vectors.

III. TRAINING-BASED TRANSMISSION AND RECEPTION

We assume that pilot symbols are employed in the system to facilitate channel estimation at the receiver. Hence, the system operates in two phases, namely training and data transmission. In the training phase, pilot symbols known at the receiver are sent from the transmitter and the received signal is

$$\mathbf{y}_t = h\mathbf{x}_t + \mathbf{n}_t \tag{2}$$

where $\mathbf{y}_t$, $\mathbf{x}_t$, and $\mathbf{n}_t$ are $l$-dimensional vectors, signifying the fact that $l$ out of $m$ input symbols are devoted to training. It is assumed that the receiver employs minimum mean-square error (MMSE) estimation to obtain the estimate

$$\hat{h} = E\{h \mid \mathbf{y}_t\} = \frac{\gamma^2}{\gamma^2 \|\mathbf{x}_t\|^2 + N_0}\, \mathbf{x}_t^\dagger \mathbf{y}_t. \tag{3}$$

With this estimate, the fading coefficient can now be expressed as

$$h = \hat{h} + \tilde{h} \tag{4}$$

where

$$\hat{h} \sim \mathcal{CN}\left(0, \frac{\gamma^4 \|\mathbf{x}_t\|^2}{\gamma^2 \|\mathbf{x}_t\|^2 + N_0}\right) \quad \text{and} \quad \tilde{h} \sim \mathcal{CN}\left(0, \frac{\gamma^2 N_0}{\gamma^2 \|\mathbf{x}_t\|^2 + N_0}\right). \tag{5}$$

Note that $\tilde{h}$ denotes the error in the channel estimate. Following the training phase, the transmitter sends the $(m - l)$-dimensional data vector $\mathbf{x}_d$, and the receiver, equipped with the knowledge of the channel estimate, operates on the received signal

$$\mathbf{y}_d = \hat{h}\mathbf{x}_d + \tilde{h}\mathbf{x}_d + \mathbf{n}_d \tag{6}$$

to recover the transmitted information. We note that since training-based schemes are studied in this paper, memoryless fading channels in which $m = 1$ are not considered, and it is assumed throughout the paper that the block length satisfies $m \ge 2$.

IV. ACHIEVABLE RATES AND ENERGY EFFICIENCY IN THE WORST CASE SCENARIO

A. Average Power Limited Case

In this section, we assume that the input is subject to an average power constraint

$$E\{\|\mathbf{x}\|^2\} \le mP. \tag{7}$$

Our overall goal is to identify the bit energy values that can be attained with optimized training parameters such as the power and duration of pilot symbols. The least amount of energy required to send one information bit reliably is given by (see Footnote 4)

$$\frac{E_b}{N_0} = \frac{\text{SNR}}{C(\text{SNR})} \tag{8}$$

where $C(\text{SNR})$ is the channel capacity in bits/symbol. In this section, we follow the general approach in the literature and consider a lower bound on the channel capacity by assuming that

$$\mathbf{z} = \tilde{h}\mathbf{x}_d + \mathbf{n}_d \tag{9}$$

is a Gaussian noise vector that has a covariance of

$$E\{\mathbf{z}\mathbf{z}^\dagger\} = \sigma_{\tilde{h}}^2\, E\{\mathbf{x}_d \mathbf{x}_d^\dagger\} + N_0 \mathbf{I}, \tag{10}$$

and is uncorrelated with the input signal $\mathbf{x}_d$. With this assumption, the channel model becomes

$$\mathbf{y}_d = \hat{h}\mathbf{x}_d + \mathbf{z}. \tag{11}$$

This model is called the worst-case scenario since the channel estimate is assumed to be perfect, and the noise is modeled as Gaussian, which presents the worst case [23].
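As a concrete illustration of the estimation step in Section III, the following is a minimal Python sketch of the MMSE estimate (3) and the two variances in (5). The function names `mmse_estimate` and `estimate_variances` are hypothetical and chosen only for this example; pilots and observations are assumed to be passed as plain Python sequences of complex numbers.

```python
def mmse_estimate(x_t, y_t, gamma2, n0):
    """MMSE estimate of the fading coefficient h, following (3).

    x_t: known pilot symbols, y_t: received training symbols,
    gamma2: fading variance gamma^2, n0: noise level N_0.
    """
    pilot_energy = sum(abs(x) ** 2 for x in x_t)                      # ||x_t||^2
    correlation = sum(x.conjugate() * y for x, y in zip(x_t, y_t))    # x_t^dagger y_t
    return gamma2 / (gamma2 * pilot_energy + n0) * correlation


def estimate_variances(pilot_energy, gamma2, n0):
    """Variances of the estimate and of the estimation error, following (5)."""
    denom = gamma2 * pilot_energy + n0
    var_estimate = gamma2 ** 2 * pilot_energy / denom
    var_error = gamma2 * n0 / denom
    return var_estimate, var_error


# Example: pilot energy 4, unit fading variance, unit noise level.
var_hat, var_err = estimate_variances(pilot_energy=4.0, gamma2=1.0, n0=1.0)
# The two variances add up to gamma^2, consistent with h = h_hat + h_tilde in (4).
```

The sketch makes the trade-off visible: increasing the pilot energy moves variance from the error term to the estimate, which is what the power allocation in Section IV optimizes.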
The capacity of the channel in (11), which acts as a lower bound on the capacity of the channel in (6), is achieved by a Gaussian input with

$$E\{\mathbf{x}_d \mathbf{x}_d^\dagger\} = \frac{(1 - \delta^*)\, mP}{m - 1}\, \mathbf{I} \tag{12}$$

where $\delta^*$ is the optimal fraction of power allocated to the pilot symbol, i.e., $|x_t|^2 = \delta^* mP$. The optimal value is given by

$$\delta^* = \sqrt{\eta(\eta + 1)} - \eta \tag{13}$$

where

$$\eta = \frac{m\,\text{SNR} + (m - 1)}{m(m - 2)\,\text{SNR}} \quad \text{and} \quad \text{SNR} = \frac{\gamma^2 P}{N_0}. \tag{14}$$

Note that SNR in (14) is the received signal-to-noise ratio. In the average power limited case, sending a single pilot is optimal because instead of increasing the number of pilot symbols, a single pilot with higher power can be used and a decrease in the duration of the data transmission can be avoided. Hence, the optimal $\mathbf{x}_d$ is an $(m - 1)$-dimensional Gaussian vector. Since the above results are indeed special cases of those in [23], the details are omitted. The resulting capacity expression (see Footnote 5) is

$$C_L(\text{SNR}) = \frac{m - 1}{m}\, E_w\left\{\log\left(1 + \frac{\phi(\text{SNR})\,\text{SNR}^2}{\psi(\text{SNR})\,\text{SNR} + (m - 1)}\, |w|^2\right)\right\} = \frac{m - 1}{m}\, E_w\left\{\log\left(1 + f(\text{SNR})\, |w|^2\right)\right\} \ \text{nats/symbol} \tag{15}$$

where

$$\phi(\text{SNR}) = \delta^*(1 - \delta^*)\, m^2, \quad \text{and} \quad \psi(\text{SNR}) = (1 + (m - 2)\delta^*)\, m, \tag{16}$$

and $w \sim \mathcal{CN}(0, 1)$. Note also that the expectation in (15) is with respect to the random variable $w$. The bit energy values in this setting are given by

$$\frac{E_{b,U}}{N_0} = \frac{\text{SNR}}{C_L(\text{SNR})} \log 2 \tag{17}$$

where $C_L$ is in nats/symbol. $\frac{E_{b,U}}{N_0}$ provides the least amount of normalized bit energy values in the worst-case scenario and also serves as an upper bound on the achievable bit energy levels of channel (6).

Footnote 4: Note that $\frac{E_b}{N_0}$ is the bit energy normalized by the noise power spectral level $N_0$.
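The expressions (13)-(17) are straightforward to evaluate numerically. The following Python sketch is one possible way to do so, under stated assumptions: the function names are hypothetical, the expectation over $|w|^2$ is computed with a composite Simpson rule using the fact that $|w|^2$ is exponentially distributed with unit mean when $w \sim \mathcal{CN}(0, 1)$, and the case $m = 2$ (where $\eta$ in (14) is undefined) is handled by setting $\delta^* = 1/2$, which maximizes $\phi$ when $\psi$ does not depend on $\delta^*$.

```python
import math


def optimal_pilot_fraction(snr, m):
    """delta* from (13)-(14). For m = 2, eta is undefined and delta* = 1/2 is used."""
    if m == 2:
        return 0.5
    eta = (m * snr + (m - 1)) / (m * (m - 2) * snr)
    return math.sqrt(eta * (eta + 1)) - eta


def effective_snr(snr, m):
    """f(SNR) in (15), built from phi and psi in (16)."""
    d = optimal_pilot_fraction(snr, m)
    phi = d * (1 - d) * m ** 2
    psi = (1 + (m - 2) * d) * m
    return phi * snr ** 2 / (psi * snr + (m - 1))


def mean_log1p_exp(f, steps=4000, u_max=60.0):
    """E{log(1 + f*u)} for u ~ Exp(1), via composite Simpson's rule on [0, u_max]."""
    h = u_max / steps  # steps must be even for Simpson's rule
    total = 0.0
    for k in range(steps + 1):
        u = k * h
        weight = 1 if k in (0, steps) else (4 if k % 2 else 2)
        total += weight * math.exp(-u) * math.log1p(f * u)
    return total * h / 3


def capacity_lower_bound(snr, m):
    """C_L(SNR) in nats/symbol, following (15)."""
    return (m - 1) / m * mean_log1p_exp(effective_snr(snr, m))


def bit_energy_db(snr, m):
    """Normalized bit energy (17), expressed in dB."""
    return 10 * math.log10(snr * math.log(2) / capacity_lower_bound(snr, m))


# Coarse grid search for the most energy-efficient SNR at block length m = 10.
snr_grid = [0.05 * k for k in range(1, 61)]
best_snr = min(snr_grid, key=lambda s: bit_energy_db(s, 10))
```

A grid search is used here only for simplicity; any one-dimensional minimizer would serve, since the curves in Fig. 1 show a single minimum per block length.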
It is shown in [12] that if the channel estimate is assumed to be perfect, and Gaussian codebooks designed for known channels are used, and scaled nearest neighbor decoding is employed at the receiver, then the generalized mutual information has an expression similar to (15) (see [12, Corollary 3.0.1]). Hence $\frac{E_{b,U}}{N_0}$ also gives a good indication of the energy requirements of a system operating in this fashion. The next result provides the asymptotic behavior of the bit energy as SNR decreases to zero.

Proposition 1: The normalized bit energy (17) grows without bound as the signal-to-noise ratio decreases to zero, i.e.,

$$\left.\frac{E_{b,U}}{N_0}\right|_{C_L = 0} = \lim_{\text{SNR} \to 0} \frac{\text{SNR}}{C_L(\text{SNR})} \log 2 = \frac{\log 2}{\dot{C}_L(0)} = \infty. \tag{18}$$

Proof: In the low SNR regime, we have

\begin{align}
C_L(\text{SNR}) &= \frac{m - 1}{m}\left(f(\text{SNR})\, E\{|w|^2\} + o(f(\text{SNR}))\right) \tag{19} \\
&= \frac{m - 1}{m}\left(f(\text{SNR}) + o(f(\text{SNR}))\right). \tag{20}
\end{align}

As $\text{SNR} \to 0$, $\delta^* \to 1/2$, and hence $\phi(\text{SNR}) \to m^2/4$ and $\psi(\text{SNR}) \to m + m(m - 2)/2$. Therefore, it can easily be seen that

$$f(\text{SNR}) = \frac{m^2}{4(m - 1)}\, \text{SNR}^2 + o(\text{SNR}^2) \tag{21}$$

from which we have $\dot{C}_L(0) = 0$. □

Footnote 5: Unless specified otherwise, all logarithms are to the base $e$.

[Figure 1: $E_b/N_0$ (dB) versus SNR, with curves for $m = 3, 5, 10, 20, 50, 100, 200, 10^4$.]
Fig. 1. Energy per bit $E_{b,U}/N_0$ vs. SNR in the worst-case scenario.

The fact that $C_L$ decreases as $\text{SNR}^2$ as SNR goes to zero has already been pointed out in [23]. The reason for this behavior is that as SNR decreases, the power of $\hat{h}$ in (5) decreases linearly with SNR and hence the quality of the channel estimate deteriorates. Since the channel estimate is assumed to be perfect, the effective signal-to-noise ratio decays as $\text{SNR}^2$, leading to the observed result. Proposition 1 shows the impact of this behavior on the energy-per-bit, and indicates that it is extremely energy-inefficient to operate at very low SNR values.
The result holds regardless of the size of the block length $m$ as long as it is finite.

We further conclude that in a training-based scheme where the channel estimate is assumed to be perfect, the minimum energy per bit is achieved at a nonzero SNR value. This most energy-efficient operating point can be obtained by numerical analysis. We can easily compute $C_L(\text{SNR})$ in (15), and hence the bit energy values. Figure 1 plots the normalized bit energy curves as a function of SNR for block lengths of $m = 3, 5, 10, 20, 50, 100, 200, 10^4$. As predicted, for each block length value, the minimum bit energy is achieved at nonzero SNR, and the bit energy requirement increases as $\text{SNR} \to 0$. It has been noted in [23] that training-based schemes, which assume the channel estimate to be perfect, perform poorly at very low SNR values, and the exact transition point below which one should not operate in this fashion is deemed as not clear. Here, we propose the SNR level at which the minimum bit energy is achieved as a transition point, since operating below this point results in higher bit energy requirements. It is further seen in Fig. 1 that the minimum bit energy is attained at an SNR value that satisfies

$$\frac{d}{d\,\text{SNR}}\left(\frac{E_{b,U}}{N_0}\right) = \frac{d}{d\,\text{SNR}}\left(\frac{\text{SNR}\, \log 2}{C_L(\text{SNR})}\right) = 0. \tag{22}$$

Another observation from Fig. 1 is that the minimum bit energy decreases with increasing $m$ and is achieved at a lower SNR value. The following result sheds light on the asymptotic behavior of the capacity as $m \to \infty$.

Theorem 1: As the block length $m$ increases, $C_L$ approaches the capacity of the perfectly known channel, i.e.,

$$\lim_{m \to \infty} C_L(\text{SNR}) = E_w\{\log(1 + \text{SNR}\, |w|^2)\}. \tag{23}$$

Moreover, define $\chi = 1/m$. Then

$$\left.\frac{dC_L(\text{SNR})}{d\chi}\right|_{\chi = 0} = -\infty. \tag{24}$$

Proof: We have

\begin{align}
\lim_{m \to \infty} C_L(\text{SNR}) &= \lim_{m \to \infty} E_w\left\{\log\left(1 + f(\text{SNR})\, |w|^2\right)\right\} \tag{25} \\
&= E_w\left\{\lim_{m \to \infty} \log\left(1 + f(\text{SNR})\, |w|^2\right)\right\} \tag{26} \\
&= E_w\left\{\log\left(1 + |w|^2 \lim_{m \to \infty} f(\text{SNR})\right)\right\} \tag{27} \\
&= E_w\left\{\log\left(1 + \text{SNR}\, |w|^2\right)\right\}. \tag{28}
\end{align}

(25) follows from the fact that $(m - 1)/m \to 1$ as $m \to \infty$. For (26) to hold, we invoke the Dominated Convergence Theorem [40]. Note that

\begin{align}
\left|\log(1 + f(\text{SNR})\, |w|^2)\right| &\le f(\text{SNR})\, |w|^2 \tag{29} \\
&= \frac{\phi(\text{SNR})\,\text{SNR}^2}{\psi(\text{SNR})\,\text{SNR} + (m - 1)}\, |w|^2 \tag{30} \\
&\le \frac{\phi(\text{SNR})}{\psi(\text{SNR})}\, \text{SNR}\, |w|^2 \tag{31} \\
&= \frac{\delta^*(1 - \delta^*)\, m^2}{m + m(m - 2)\delta^*}\, \text{SNR}\, |w|^2 \tag{32} \\
&= \frac{(1 - \delta^*)\, m^2}{\frac{m}{\delta^*} + m(m - 2)}\, \text{SNR}\, |w|^2 \tag{33} \\
&\le \frac{(1 - \delta^*)\, m}{m - 2}\, \text{SNR}\, |w|^2 \tag{34} \\
&\le 3\, \text{SNR}\, |w|^2 \quad \text{for } m \ge 3 \tag{35}
\end{align}

where (34) is obtained by removing $\frac{m}{\delta^*}$ in the denominator and (35) follows from the facts that $1 - \delta^* \le 1$ and $\frac{m}{m - 2} \le 3$ for all $m \ge 3$. If $m = 2$, we have $\phi(\text{SNR}) = 1$, $\psi(\text{SNR}) = 2$, and hence

$$\left|\log(1 + f(\text{SNR})\, |w|^2)\right| = \log\left(1 + \frac{\text{SNR}^2}{2\,\text{SNR} + 1}\, |w|^2\right) \le \frac{1}{2}\, \text{SNR}\, |w|^2. \tag{36}$$

Therefore, $3\, \text{SNR}\, |w|^2$ is an upper bound that applies for all integer values $m \ge 2$. Furthermore, the upper bound does not depend on $m$ and is integrable, i.e., $E_w\{3\, \text{SNR}\, |w|^2\} = 3\, \text{SNR} < \infty$. Hence, the Dominated Convergence Theorem applies and (26) is justified. (27) is due to the fact that the logarithm is a continuous function. (28) can easily be verified by noting that $m^2 \delta^*$ is the fastest growing component, increasing as $m^{3/2}$ with increasing $m$. (24) follows again from the application of the Dominated Convergence Theorem and the fact that the derivative of $f(\text{SNR})$ with respect to $\chi = 1/m$ at $\chi = 0$ is $-\infty$. □

[Figure 2: minimum $E_b/N_0$ (dB) versus 1/blocklength ($1/m$).]
Fig. 2. Minimum energy per bit $\left.\frac{E_{b,U}}{N_0}\right|_{\min}$ vs. $\frac{1}{m}$ in the worst-case scenario.

The first part of Theorem 1 is not surprising and is expected because reference [5] has already shown that as the block length grows, the perfect knowledge capacity is achieved even if no channel estimation is performed. This result agrees with our observation in Fig. 1 that −1.59 dB is approached at lower SNR values as $m$ increases. However, the rate of approach is very slow in terms of the block size, as proven in the second part of Theorem 1 and evidenced in Fig. 2. Due to the infinite slope (see Footnote 6) observed in the figure, approaching −1.59 dB is very demanding in block length.

Footnote 6: Note that Theorem 1 implies that the slope of $\frac{\text{SNR}}{C_L(\text{SNR})}$ at $\chi = \frac{1}{m} = 0$ is $\infty$.

B. Peak Power Constraint on the Pilot

Heretofore, we have assumed that there are no peak power constraints imposed on either the data or pilot symbols. Recall that the power of the pilot symbol is given by

$$|x_t|^2 = \delta^* mP = \sqrt{\xi(\xi + mP)} - \xi \tag{37}$$

where $\xi = \frac{m\gamma^2 P + (m - 1)N_0}{(m - 2)\gamma^2}$. We immediately observe from (37) that the pilot power increases at least as $\sqrt{m}$ as $m$ increases. For large block sizes, such an increase in the pilot power may be prohibitive in practical systems. Therefore, it is of interest to impose a peak power constraint on the pilot in the following form:

$$|x_t|^2 \le \kappa P. \tag{38}$$

Since the average power is uniformly distributed over the data symbols, the average power of a data symbol is proportional to $P$ and is at most $(1 - \delta^*)2P$ for any block size. Therefore, $\kappa$ can be seen as a limitation on the peak-to-average power ratio. Note that we will allow Gaussian signaling for data transmission. Hence, there are no hard peak power limitations on data signals. This approach will enable us to work with a closed-form capacity expression. Although Gaussian signals can theoretically assume large values, the probability of such values decreases exponentially.
The case in which a peak power constraint is imposed on both the training and data symbols is treated in Section V.

If the optimal power allocated to a single pilot exceeds $\kappa P$, i.e., $\delta^* mP > \kappa P \Rightarrow \delta^* m > \kappa$, the peak power constraint on the pilot becomes active. In this case, more than just a single pilot may be needed for optimal performance. In this section, we address the optimization of the number of pilot symbols when each pilot symbol has fixed power $|x_{t,i}|^2 = \kappa P \ \forall i$. If the number of pilot symbols is $l < m$, then $\|\mathbf{x}_t\|^2 = l\kappa P$ and, as we know from Section III,

$$\hat{h} \sim \mathcal{CN}\left(0, \frac{\gamma^4 l\kappa P}{\gamma^2 l\kappa P + N_0}\right) \quad \text{and} \quad \tilde{h} \sim \mathcal{CN}\left(0, \frac{\gamma^2 N_0}{\gamma^2 l\kappa P + N_0}\right).$$

Similarly as before, when the estimate error is assumed to be another source of additive noise and the overall additive noise is assumed to be Gaussian, the input-output mutual information achieved by Gaussian signaling is given by

$$I_{L,p} = \frac{m - l}{m}\, E_w\left\{\log\left(1 + g(\text{SNR}, l)\, |w|^2\right)\right\} \tag{39}$$

where $w \sim \mathcal{CN}(0, 1)$ and

$$g(\text{SNR}, l) = \frac{l\kappa(m - l\kappa)\, \text{SNR}^2}{(m - l\kappa + (m - l)l\kappa)\, \text{SNR} + m - l}. \tag{40}$$

The optimal value of the training duration $l$ that maximizes $I_{L,p}$ can be obtained through numerical optimization.

[Figure 3: $E_b/N_0$ (dB) versus SNR, with curves for $m = 50, 100, 200, 500, 10^3, 10^4$.]
Fig. 3. Energy per bit $E_{b,U}/N_0$ vs. SNR for block sizes of $m = 50, 100, 200, 500, 10^3, 10^4$. The pilot peak power constraint is $|x_t|^2 \le 10P$.

Fig. 3 plots the normalized bit energy values $\frac{\text{SNR}\, \log 2}{I_{L,p}}$ in dB obtained with optimal training duration for different block lengths. The peak power constraint imposed on a pilot symbol is $|x_t|^2 \le 10P$. Fig. 4 gives the optimal number of pilot symbols per block. From Fig. 3, we observe that the minimum bit energy, which is again achieved at a nonzero value of the SNR, decreases with increasing block length and approaches the fundamental limit of −1.59 dB.
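The numerical optimization over the training duration $l$ can be sketched as a direct search using (39) and (40). The Python code below is a minimal illustration, not the procedure used to generate the figures: the function names are hypothetical, the search range is restricted to $l\kappa < m$ on the assumption that the data symbols must retain positive power, and the expectation is evaluated with the same Simpson-rule approach over $|w|^2 \sim \text{Exp}(1)$.

```python
import math


def mean_log1p_exp(f, steps=4000, u_max=60.0):
    """E{log(1 + f*u)} for u ~ Exp(1), via composite Simpson's rule on [0, u_max]."""
    h = u_max / steps  # steps must be even for Simpson's rule
    total = 0.0
    for k in range(steps + 1):
        u = k * h
        weight = 1 if k in (0, steps) else (4 if k % 2 else 2)
        total += weight * math.exp(-u) * math.log1p(f * u)
    return total * h / 3


def effective_snr_pilots(snr, l, m, kappa):
    """g(SNR, l) in (40) for l pilots, each with power kappa * P."""
    num = l * kappa * (m - l * kappa) * snr ** 2
    den = (m - l * kappa + (m - l) * l * kappa) * snr + (m - l)
    return num / den


def rate_with_pilots(snr, l, m, kappa):
    """I_{L,p} in nats/symbol, following (39)."""
    return (m - l) / m * mean_log1p_exp(effective_snr_pilots(snr, l, m, kappa))


def best_training_length(snr, m, kappa):
    """Exhaustive search over l; assumes l * kappa < m (positive data power)."""
    candidates = [l for l in range(1, m) if l * kappa < m]
    return max(candidates, key=lambda l: rate_with_pilots(snr, l, m, kappa))


# Example operating point: m = 50, kappa = 10, SNR = 0.41.
l_opt = best_training_length(0.41, 50, 10)
energy_db = 10 * math.log10(0.41 * math.log(2) / rate_with_pilots(0.41, l_opt, 50, 10))
```

The example operating point corresponds to the first row of Table I, which reports a single pilot and a minimum bit energy of 1.441 dB, so it can serve as a sanity check. For large $m$, the exhaustive search becomes slow and a bounded search around the expected optimum would be preferable.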
We note from Fig. 4 that the number of pilot symbols per block increases as the block length increases or as SNR decreases to zero. When there are no peak constraints, $\delta^* \to 1/2$ as $\text{SNR} \to 0$. Hence, we need to allocate approximately half of the available total power $mP$ to the single pilot signal in the low-power regime, increasing the peak-to-average power ratio. In the limited peak power case, this requirement is translated to the requirement of more pilot symbols per block at low SNR values.

Table I lists, for different values of $m$, the minimum bit energy values, the required number of pilot symbols at this level, and the SNR at which the minimum bit energy is achieved. It is again assumed that $\kappa = 10$. The last column of the table provides the minimum bit energy attained when there are no peak power constraints on the pilot signal. As the block size increases, the minimum bit energy is achieved at a lower SNR value while a longer training duration is required. Furthermore, comparison with the last column indicates that the loss in minimum bit energy incurred by the presence of peak power constraints is negligible.

[Figure 4: number of pilot symbols versus SNR, with curves for $m = 50, 100, 200, 500, 10^3, 10^4$.]
Fig. 4. Number of pilot symbols per block vs. SNR.

The following result shows that the capacity of the perfectly known channel, and hence the minimum bit energy of −1.59 dB, is approached with simultaneous growth of training duration and block length. Note that this result conforms with the results in Table I.

Proposition 2: Assume that the training duration $l(m, \text{SNR})$ increases as $m$ increases and satisfies

$$\lim_{m \to \infty} \frac{l(m, \text{SNR})}{m} = 0. \tag{41}$$

Then,

$$\lim_{m \to \infty} I_{L,p} = E_w\{\log(1 + \text{SNR}\, |w|^2)\}.$$
Proof: We have

\begin{align}
\lim_{m \to \infty} I_{L,p} &= \lim_{m \to \infty} \left(1 - \frac{l}{m}\right) E_w\left\{\log\left(1 + g(\text{SNR}, l)\, |w|^2\right)\right\} \notag \\
&= \lim_{m \to \infty} E_w\left\{\log\left(1 + g(\text{SNR}, l)\, |w|^2\right)\right\} \tag{42} \\
&= E_w\left\{\lim_{m \to \infty} \log\left(1 + g(\text{SNR}, l)\, |w|^2\right)\right\} \tag{43} \\
&= E_w\{\log(1 + \text{SNR}\, |w|^2)\}. \tag{44}
\end{align}

(42) follows from the condition (41). (43) can be justified by invoking the Dominated Convergence Theorem [40] similarly as in the proof of Theorem 1. (44) follows from

$$\lim_{m \to \infty} g(\text{SNR}, l) = \text{SNR}, \tag{45}$$

which holds if the conditions of the proposition are met. □

TABLE I

m        min E_b,U/N_0 (dB)   # of pilots   SNR    min E_b,U/N_0 (dB), no peak constraints
50        1.441                1            0.41    1.440
100       0.897                2            0.28    0.871
200       0.413                3            0.22    0.404
500      -0.079                5            0.16   -0.085
10^3     -0.375                9            0.12   -0.378
10^4     -1.007               44            0.05   -1.008

C. Flash Training and Transmission

One approach to improve the energy efficiency in the low SNR regime is to increase the peak power of the transmitted signals. This can be achieved by transmitting $\nu$ fraction of the time with power $P/\nu$. Note that training also needs to be performed only $\nu$ fraction of the time. In this section, no peak power constraints are imposed on pilot symbols. This type of training and communication, called flash training and transmission scheme, is analyzed in [11], where it is shown that the minimum bit energy of −1.59 dB can be achieved if the block length $m$ increases at a certain rate as SNR decreases. In the setting we consider, the flash transmission scheme achieves the following rate:

$$C_{fL}(\text{SNR}, \nu) = \nu(\text{SNR})\, C_L\left(\frac{\text{SNR}}{\nu(\text{SNR})}\right) \tag{46}$$

where $0 < \nu(\cdot) \le 1$ is the duty cycle, which in general is a function of the SNR. First, we show that flash transmission using peaky Gaussian signals does not improve the minimum bit energy.

Proposition 3: For any duty cycle function $\nu(\cdot)$,

$$\inf_{\text{SNR}} \frac{\text{SNR}}{C_{fL}(\text{SNR}, \nu)} \ge \inf_{\text{SNR}} \frac{\text{SNR}}{C_L(\text{SNR})}. \tag{47}$$

Proof: Note that for any SNR and $\nu(\text{SNR})$,

$$\frac{\text{SNR}}{C_{fL}(\text{SNR}, \nu)} = \frac{\frac{\text{SNR}}{\nu(\text{SNR})}}{C_L\left(\frac{\text{SNR}}{\nu(\text{SNR})}\right)} = \frac{\widetilde{\text{SNR}}}{C_L(\widetilde{\text{SNR}})} \ge \inf_{\text{SNR}} \frac{\text{SNR}}{C_L(\text{SNR})} \tag{48}$$

where $\widetilde{\text{SNR}} = \frac{\text{SNR}}{\nu(\text{SNR})}$ is defined as the new SNR level. Since the inequality in (48) holds for any SNR and $\nu(\cdot)$, it also holds for the infimum of the left-hand side of (48), and hence the result follows. □

We classify the duty cycle function into three categories:
1) $\nu(\cdot)$ that satisfies $\lim_{\text{SNR} \to 0} \frac{\text{SNR}}{\nu(\text{SNR})} = 0$
2) $\nu(\cdot)$ that satisfies $\lim_{\text{SNR} \to 0} \frac{\text{SNR}}{\nu(\text{SNR})} = \infty$
3) $\nu(\cdot)$ that satisfies $\lim_{\text{SNR} \to 0} \frac{\text{SNR}}{\nu(\text{SNR})} = a$ for some constant $a > 0$.

Next, we analyze the performance of each category of duty cycle functions in the low-SNR regime.

Theorem 2: If $\nu(\cdot)$ is chosen from either Category 1 or 2,

$$\left.\frac{E_{b,U}}{N_0}\right|_{C_{fL} = 0} = \lim_{\text{SNR} \to 0} \frac{\text{SNR}}{C_{fL}(\text{SNR}, \nu)} \log 2 = \infty. \tag{49}$$

If $\nu(\cdot)$ is chosen from Category 3,

$$\left.\frac{E_{b,U}}{N_0}\right|_{C_{fL} = 0} = \frac{m}{m - 1}\, \frac{a}{E_w\{\log_2(1 + f(a)\, |w|^2)\}}. \tag{50}$$

Proof: We first note that by Jensen's inequality,

\begin{align}
\frac{C_{fL}(\text{SNR}, \nu)}{\text{SNR}} &\le \frac{m - 1}{m}\, \frac{\nu(\text{SNR})}{\text{SNR}} \log\left(1 + f\left(\frac{\text{SNR}}{\nu(\text{SNR})}\right)\right) \tag{51} \\
&\overset{\text{def}}{=} \zeta(\text{SNR}, \nu). \tag{52}
\end{align}

First, we consider Category 1. In this case, as $\text{SNR} \to 0$, $\frac{\text{SNR}}{\nu(\text{SNR})} \to 0$. As shown before, the logarithm in (51) scales as $\frac{\text{SNR}^2}{\nu^2(\text{SNR})}$ as $\text{SNR} \to 0$, and hence $\zeta(\text{SNR}, \nu)$ scales as $\frac{\text{SNR}}{\nu(\text{SNR})}$, leading to

$$\lim_{\text{SNR} \to 0} \frac{C_{fL}(\text{SNR}, \nu)}{\text{SNR}} \le \lim_{\text{SNR} \to 0} \zeta(\text{SNR}, \nu) = 0. \tag{53}$$

In Category 2, $\frac{\text{SNR}}{\nu(\text{SNR})}$ grows to infinity as $\text{SNR} \to 0$. Since the $\log(\cdot)$ function on the right-hand side of (51) increases only logarithmically as $\frac{\text{SNR}}{\nu(\text{SNR})} \to \infty$, we can easily verify that

$$\lim_{\text{SNR} \to 0} \frac{C_{fL}(\text{SNR}, \nu)}{\text{SNR}} \le \lim_{\text{SNR} \to 0} \zeta(\text{SNR}, \nu) = 0. \tag{54}$$

In Category 3, $\nu(\text{SNR})$ decreases at the same rate as SNR.
In this case, we have

$$\lim_{\mathrm{SNR} \to 0} \frac{C_{fL}(\mathrm{SNR}, \nu)}{\mathrm{SNR}} = \lim_{n \to \infty} \frac{C_{fL}\!\left(\frac{1}{n}, \nu\right)}{\frac{1}{n}} \tag{55}$$

$$= \frac{m-1}{m}\, \frac{E_w\left\{\lim_{n \to \infty} \log\left(1 + f\!\left(\frac{1}{n\nu}\right)|w|^2\right)\right\}}{a} \tag{56}$$

$$= \frac{m-1}{m}\, \frac{E_w\left\{\log\left(1 + f(a)\,|w|^2\right)\right\}}{a}. \tag{57}$$

(56) is justified by invoking the Dominated Convergence Theorem and noting the integrable upper bound

$$\left|\log\left(1 + f\!\left(\frac{1}{n\nu}\right)|w|^2\right)\right| \le \frac{3}{n\nu}\,|w|^2 \le \frac{3}{\nu}\,|w|^2 \quad \text{for } n \ge 1.$$

The above upper bound is given in the proof of Theorem 1. Finally, (57) follows from the continuity of the logarithm. $\square$

Theorem 2 shows that if the rate of the decrease of the duty cycle is faster or slower than SNR as $\mathrm{SNR} \to 0$, the bit energy requirement still increases without bound in the low-SNR regime. This observation is tightly linked to the fact that the capacity curve $C_L$ has a zero slope as both $\mathrm{SNR} \to 0$ and $\mathrm{SNR} \to \infty$. For improved performance in the low-SNR regime, it is required that the duty cycle scale as SNR. A particularly good choice is $\nu(\mathrm{SNR}) = \frac{1}{a^*}\mathrm{SNR}$, where $a^*$ is equal to the SNR level at which the minimum bit energy is achieved in a non-flashy transmission scheme. With this choice, we basically perform time-sharing between $\mathrm{SNR} = 0$ and $\mathrm{SNR} = a^*$. Fig. 5 plots the normalized bit energy $\frac{E_{b,U}}{N_0}$ as a function of SNR for block size $m = 10$. The minimum bit energy is achieved at $\mathrm{SNR} = 0.8$. For $\mathrm{SNR} < 0.8$, flash transmission is employed with $\nu(\mathrm{SNR}) = \frac{1}{0.8}\mathrm{SNR}$. As observed in the figure, the minimum bit energy level can be maintained for lower values of SNR at the cost of increased peak-to-average power ratio. It should be noted that the optimal point of operation is still at $\mathrm{SNR} = 0.8$, since operating at $\mathrm{SNR} < 0.8$ will result in reduced data rates without any improvements in the bit energy. From a different perspective, if SNR is the signal-to-noise ratio per unit bandwidth, then increasing the bandwidth so that $\mathrm{SNR} < 0.8$ will not produce any energy savings. However, in circumstances in which regulations or device properties dictate operation at SNR values lower than the minimum bit energy point, flash transmission can be adopted to improve the energy efficiency.

Fig. 5. Energy per bit $E_{b,U}/N_0$ (dB) vs. SNR for non-flashy and flash transmissions ($m = 10$).

V. CAPACITY AND ENERGY EFFICIENCY IN THE PRESENCE OF PEAK POWER LIMITATIONS

In this section, we consider the channel

$$\mathbf{y}_d = \hat{h}\,\mathbf{x}_d + \tilde{h}\,\mathbf{x}_d + \mathbf{n}_d \tag{58}$$

and assume that the channel input is subject to the following peak power constraint

$$\|\mathbf{x}\|^2 \overset{\text{a.s.}}{\le} mP. \tag{59}$$

In this setting, it is again easy to see that the transmission of a single pilot is optimal. Since the peak power constraint is imposed on the input vector $\mathbf{x}$, the pilot power can be varied instead of increasing the number of pilot symbols. Similarly as before, we assume that the pilot symbol power is

$$|x_t|^2 = \delta m P. \tag{60}$$

Therefore, the $(m-1)$-dimensional data vector $\mathbf{x}_d$ is subject to

$$\|\mathbf{x}_d\|^2 \overset{\text{a.s.}}{\le} (1 - \delta) m P. \tag{61}$$

Our goal is to solve the maximization problem

$$C = \sup_{\delta \in (0,1)}\ \sup_{\substack{\mathbf{x}_d \\ \|\mathbf{x}_d\|^2 \overset{\text{a.s.}}{\le} (1-\delta) m P}} \frac{1}{m}\, I(\mathbf{x}_d; \mathbf{y}_d \mid \hat{h}) \tag{62}$$

and obtain the channel capacity, and identify the capacity-achieving input distribution and the optimal value of the power allocation coefficient $\delta$. The input-output mutual information is

$$I(\mathbf{x}_d; \mathbf{y}_d \mid \hat{h}) = E_{\hat{h}} E_{\mathbf{x}_d} \int f_{\mathbf{y}|\mathbf{x}_d,\hat{h}}(\mathbf{y} \mid \mathbf{x}_d, \hat{h}) \log \frac{f_{\mathbf{y}|\mathbf{x}_d,\hat{h}}(\mathbf{y} \mid \mathbf{x}_d, \hat{h})}{f_{\mathbf{y}|\hat{h}}(\mathbf{y} \mid \hat{h})}\, d\mathbf{y} \tag{63}$$

where

$$f_{\mathbf{y}|\mathbf{x}_d,\hat{h}}(\mathbf{y} \mid \mathbf{x}_d, \hat{h}) = \frac{\exp\left(-(\mathbf{y} - \hat{h}\mathbf{x}_d)^\dagger \left(\tilde{\gamma}^2 \mathbf{x}_d \mathbf{x}_d^\dagger + N_0 \mathbf{I}\right)^{-1} (\mathbf{y} - \hat{h}\mathbf{x}_d)\right)}{\pi^{m-1} N_0^{m-2} \left(\tilde{\gamma}^2 \|\mathbf{x}_d\|^2 + N_0\right)} \tag{64}$$

and

$$\tilde{\gamma}^2 = E\{|\tilde{h}|^2\} = \frac{\gamma^2 N_0}{\gamma^2 \delta m P + N_0}. \tag{65}$$

First, we have the following preliminary result on the structure of the capacity-achieving input distribution.

Theorem 3: For the block fading channel (58) where the input is subject to a peak power limitation (61), the capacity-achieving input vector can be written as $\mathbf{x}_d = \|\mathbf{x}_d\|\,\mathbf{v}$, where $\|\mathbf{x}_d\|$ is a nonnegative real random variable and $\mathbf{v}$ is an independent isotropically distributed unit random vector.

Proof: The proof follows primarily from the same techniques developed in [5]. First note the invariance of the peak constraint (61) to rotations of the input. Since $f_{\mathbf{y}|\mathbf{x}_d,\hat{h}}(\Phi\mathbf{y} \mid \Phi\mathbf{x}_d, \hat{h}) = f_{\mathbf{y}|\mathbf{x}_d,\hat{h}}(\mathbf{y} \mid \mathbf{x}_d, \hat{h})$ for any $(m-1) \times (m-1)$ dimensional deterministic unitary matrix $\Phi$, it can be easily seen that the mutual information is also invariant to deterministic rotations of the input, and the result follows from the concavity of the mutual information, which implies that there is no loss in optimality if one uses circularly symmetric input distributions. $\square$

With this characterization, the problem has been reduced to the optimization of the input magnitude distribution, $F_{\mathbf{x}_d}$. We first obtain an equivalent expression for the mutual information when the input vector has the structure described in Theorem 3.

Theorem 4: When the input is $\mathbf{x}_d = \|\mathbf{x}_d\|\,\mathbf{v}$, where $\mathbf{v}$ is an isotropically distributed unit vector that is independent of the magnitude $\|\mathbf{x}_d\|$, the input-output mutual information of the channel (58) can be expressed as

$$I(\mathbf{x}_d; \mathbf{y}_d \mid \hat{h}) = I(F_r \mid \hat{h}) = -E_{K,r}\left[\int_0^\infty f_{R|r,K}(R \mid r, K) \log g(R, F_r, K)\, dR\right] - E_r\{\log(1 + r^2)\} - (m-1) \tag{66}$$

where

$$f_{R|r,K}(R \mid r, K) = \begin{cases} \dfrac{R^{m-2}}{(m-3)!}\, \dfrac{e^{-R - \frac{K r^2}{1+r^2}}}{1+r^2} \displaystyle\int_0^1 (1-a)^{m-3}\, e^{\frac{a r^2 R}{1+r^2}}\, I_0\!\left(\dfrac{2\sqrt{K R}\, r \sqrt{a}}{1+r^2}\right) da & m \ge 3 \\[2ex] \dfrac{e^{-\frac{R + K r^2}{1+r^2}}}{1+r^2}\, I_0\!\left(\dfrac{2\sqrt{K R}\, r}{1+r^2}\right) & m = 2 \end{cases} \tag{67}$$

and

$$g(R, F_r, K) = \frac{(m-2)!}{R^{m-2}} \int_0^\infty f_{R|r,K}(R \mid r, K)\, dF_r. \tag{68}$$

In the above formulations, $R = \frac{\|\mathbf{y}\|^2}{N_0}$, $r = \frac{\tilde{\gamma}\|\mathbf{x}_d\|}{\sqrt{N_0}}$, and $K = \frac{|\hat{h}|^2}{\tilde{\gamma}^2}$. Furthermore, $F_r$ denotes the distribution function of $r$. $K$ is an exponential random variable with mean $E\{K\} = \frac{E\{|\hat{h}|^2\}}{\tilde{\gamma}^2} = \frac{\gamma^2 \delta m P}{N_0}$. $E_{K,r}$ denotes the expectation with respect to $K$ and $r$.

Proof: See Appendix A.

Note that the integral in the mutual information expression in (63) is in general a $2(m-1)$-fold integral. In (66), this has been reduced to a double integral, providing a significant simplification especially for numerical analysis. With this result, the channel capacity in nats per symbol can now be reformulated as

$$C = \sup_{\delta \in (0,1)} C_\delta = \sup_{\delta \in (0,1)}\ \sup_{\substack{F_r \\ r \overset{\text{a.s.}}{\le} \sqrt{L}}} \frac{1}{m}\, I(F_r \mid \hat{h}) \tag{69}$$

where $L = \frac{\gamma^2 (1-\delta) m P}{\gamma^2 \delta m P + N_0}$. Hence, the capacity is obtained through the optimal choices of the power allocation coefficient $\delta$ and normalized input magnitude distribution $F_r$. Since the inner maximization is over a continuous alphabet, the existence of the capacity-achieving distribution $F_r$ is not guaranteed. Next, we prove the existence of a capacity-achieving input distribution and provide a sufficient and necessary condition for an input to be optimal.

Theorem 5: Fix the value of $\delta \in (0,1)$ and consider the inner maximization in (69). There exists an input distribution $F_r$ that maximizes the mutual information $I(F_r \mid \hat{h})$. Moreover, an input distribution $F_r$ is capacity-achieving if and only if the following Kuhn-Tucker condition is satisfied:

$$\Phi(r) = E_K\left[\int_0^\infty f_{R|r,K}(R \mid r, K) \log g(R, F_r, K)\, dR\right] + \log(1 + r^2) + m C_\delta + (m-1) \ge 0 \quad \forall r \in [0, \sqrt{L}] \tag{70}$$

with equality at the points of increase of $F_r$ (see footnote 7). In the above condition, $C_\delta$ denotes the result of the inner maximization in (69).

Proof: See Appendix B.
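As a numerical sanity check on the quantities above, the following minimal Python sketch evaluates the normalized peak level $L$ from (69), written in terms of $\mathrm{SNR} = \gamma^2 P / N_0$, and integrates the $m = 2$ branch of (67) over $R$. The helper names (`normalized_peak`, `bessel_i0`, `density_m2`, `simpson`), the integration range, and the test values $r = 1$, $K = 1$ are illustrative assumptions and do not come from the paper. Under these assumptions the density should integrate to one and have mean $(1 + K) r^2 + 1$, as expected for a noncentral chi-square variable with two real degrees of freedom.

```python
import math

def normalized_peak(delta, m, snr):
    """L from (69), rewritten with SNR = gamma^2 P / N0 (assumed normalization)."""
    return (1.0 - delta) * m * snr / (delta * m * snr + 1.0)

def bessel_i0(x, max_terms=200):
    """Modified Bessel function I_0(x) via its power series sum_k (x^2/4)^k / (k!)^2."""
    q = 0.25 * x * x
    term, total = 1.0, 1.0
    for k in range(1, max_terms):
        term *= q / (k * k)
        total += term
        if term < 1e-17 * total:
            break
    return total

def density_m2(R, r, K):
    """The m = 2 branch of the conditional density in (67)."""
    s = 1.0 + r * r
    return math.exp(-(R + K * r * r) / s) / s * bessel_i0(2.0 * math.sqrt(K * R) * r / s)

def simpson(f, a, b, n=20000):
    """Composite Simpson rule; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return total * h / 3.0

r, K = 1.0, 1.0  # illustrative values only
mass = simpson(lambda R: density_m2(R, r, K), 0.0, 100.0)
mean = simpson(lambda R: R * density_m2(R, r, K), 0.0, 100.0)
print(mass, mean, normalized_peak(0.1, 10, 1.0))
```

The upper integration limit of 100 is chosen only because the tail of the density is negligible there for these test values; larger $r$ or $K$ would need a wider range.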
Having shown the existence of the capacity-achieving input distribution and a sufficient and necessary condition for an input distribution to be optimal, we turn our attention to the characterization of the optimal input.

Theorem 6: Fix the value of $\delta \in (0,1)$. The input distribution that maximizes the mutual information $I(F_r \mid \hat{h})$ is discrete with a finite number of mass points.

Proof: The following upper bound is obtained in Appendix B:

$$g(R, F, K) \le (m-2)\, e^{-\frac{R}{1+L} + \sqrt{K R}}. \tag{71}$$

Using this upper bound, we have

$$E_K\left[\int_0^\infty f_{R|r,K}(R \mid r, K) \log g(R, F_r, K)\, dR\right] = E_K E_{R|r,K}\{\log g(R, F_r, K)\} \tag{72}$$

$$\le \log(m-2) - E_K E_{R|r,K}\left\{\frac{R}{1+L}\right\} + E_K E_{R|r,K}\left\{\sqrt{K R}\right\} \tag{73}$$

$$\le \log(m-2) - E_K E_{R|r,K}\left\{\frac{R}{1+L}\right\} + E_K\left\{\sqrt{K}\sqrt{E_{R|r,K}\{R\}}\right\} \tag{74}$$

$$\le \log(m-2) - \frac{(1 + E_K\{K\})\, r^2 + m - 1}{1+L} + E_K\left\{\sqrt{K}\sqrt{(1+K)\, r^2 + m - 1}\right\}. \tag{75}$$

(73) follows from (71), and (74) follows from the fact that $E\{\sqrt{R}\} \le \sqrt{E\{R\}}$. Finally, (75) is obtained by noting that $E_{R|r,K}\{R\} = (1+K)\, r^2 + m - 1$. Note that the upper bound in (75), and hence the left-hand side of (70), decreases to $-\infty$ as $r \to \infty$ due to the presence of $-r^2$ in the second term.

Footnote 7: The set of points of increase of a distribution function $F$ is $\{r : F(r - \epsilon) < F(r + \epsilon)\ \forall \epsilon > 0\}$.

We prove the result by contradiction. Hence, we now assume that the optimal input distribution $F_0$ has an infinite number of points of increase on a bounded interval. Next, we extend the $\Phi(\cdot)$ in (70) to the complex domain:

$$\Phi(z) = E_K\left[\int_0^\infty f_{R|r,K}(R \mid z, K) \log g(R, F_r, K)\, dR\right] + \log(1 + z^2) + C + (m-1) \tag{76}$$

where $z \in \mathbb{C}$ and $\log$ is the principal branch of the logarithm. The Identity Theorem for analytic functions [41] states that if two functions are analytic in a region and if they coincide for an infinite number of distinct points having a limiting point, they are equal everywhere in that region.
It is shown in Appendix C that $\Phi(z)$ is analytic in a region $D$ that includes the positive real line. By the above assumption on the optimal input distribution, $\Phi(z) = 0$ for an infinite number of points having a limiting point (see footnote 8) in region $D$. Therefore, by the Identity Theorem, we should have $\Phi(r) = 0$ for all $r \ge 0$. Clearly, this is not possible from the upper bound in (74), which diverges to $-\infty$ as $r \to \infty$. Hence, the optimal input cannot have an infinite number of points of increase on a bounded interval, from which we conclude that the optimal input distribution is discrete with a finite number of mass points. $\square$

After the characterization of the discrete nature of the optimal input, the optimization problem in (69) can be solved using vector optimization techniques. Numerical results indicate that the optimal magnitude distribution $F_r$ has a single mass at the peak level $r = \sqrt{L}$ for low-to-medium received peak $\mathrm{SNR} = \frac{\gamma^2 P}{N_0}$ levels. Hence, all the information is carried by the isotropically distributed directional unit vector. Therefore, information transmission is achieved by sending points on the surface of an $(m-1)$-dimensional complex sphere with radius $\frac{\sqrt{L N_0}}{\tilde{\gamma}}$. Note that the mutual information (in nats per $m$ symbols) achieved by having a single mass at $r = \sqrt{L}$ is

$$I_{cm} = -E_K\left[\int_0^\infty f_{R|r,K}(R \mid r = \sqrt{L}, K) \log g(R, F_r, K)\, dR\right] - \log(1 + L) - (m-1). \tag{77}$$

Figure 6 plots the capacity values as a function of SNR for block lengths of $m = 10, 20, 30$ and $40$. These capacity values are achieved with optimal power allocation. The optimal fractions of power allocated to the pilot symbol are plotted in Fig. 7. Note that for the range of SNR values considered in the figure, the optimal value of $\delta$ is slightly smaller than $1/m$ and approaches $1/m$ as SNR tends to 0.
This power allocation strategy is significantly different from that of the worst-case scenario, in which $\delta^* \to 1/2$ with decreasing SNR. In the low-SNR regime, the tradeoff between spectral efficiency and energy per bit obtained from $\frac{E_b}{N_0} = \frac{\mathrm{SNR} \log 2}{C(\mathrm{SNR})}$ is the key performance measure [14]. If we assume, without loss of generality, that one symbol occupies a 1 s $\times$ 1 Hz time-frequency slot, then the maximum spectral efficiency is $\mathsf{C}(E_b/N_0) = C(\mathrm{SNR}) \log_2 e$ bits/s/Hz, where we have assumed that $C(\mathrm{SNR})$ is in nats/symbol. Fig. 8 plots the bit energy values as a function of the spectral efficiency. It is again observed that the minimum bit energy is achieved at a nonzero spectral efficiency and the required bit energy values grow without bound as SNR, and hence the spectral efficiency, is further decreased.

Footnote 8: The Bolzano-Weierstrass Theorem [40] states that every bounded infinite set of real numbers has a limit point.

Fig. 6. Capacity (nats/symbol) vs. SNR for block lengths of $m = 10, 20, 30$ and $40$ when the input is subject to peak power limitations.

Fig. 7. Optimal fraction of power $\delta$ allocated to the pilot symbol vs. SNR for block lengths of $m = 10, 20, 30$ and $40$.

Indeed, we can show the following result.

Theorem 7: Assume that the normalized input magnitude distribution has a single mass and hence the magnitude is fixed at $r = \sqrt{L}$. For any value of $\delta \in (0,1)$, the normalized bit energy required by this input grows without bound as the signal-to-noise ratio decreases to zero, i.e.,

$$\left.\frac{E_{b,cm}}{N_0}\right|_{I_{cm} = 0} = \lim_{\mathrm{SNR} \to 0} \frac{m\, \mathrm{SNR}}{I_{cm}(\mathrm{SNR})} \log 2 = \frac{m \log 2}{\dot{I}_{cm}(0)} = \infty. \tag{78}$$

Proof: Recall that $L = \frac{(1-\delta)\, m\, \mathrm{SNR}}{\delta m\, \mathrm{SNR} + 1}$ and $\mathrm{SNR} = \frac{\gamma^2 P}{N_0}$.
Also, note that an expression for $I_{cm}$ is given in (77). By making a change of variables, we have the following equivalent expression:

$$I_{cm} = -E_K\left[\int_0^\infty f_{R|r,K}(R \mid \sqrt{L}, K \delta m\, \mathrm{SNR}) \log g(R, F_r, K \delta m\, \mathrm{SNR})\, dR\right] - \log(1 + L) - (m-1) \tag{79}$$

where $K$ is now an exponential random variable with $E\{K\} = 1$, and hence is independent of SNR. We can easily show that

$$\left.\frac{\partial}{\partial\, \mathrm{SNR}} f_{R|r,K}(R \mid \sqrt{L}, K \delta m\, \mathrm{SNR})\right|_{\mathrm{SNR} = 0} = -(1-\delta)\, m\, \frac{R^{m-2}}{(m-2)!}\, e^{-R} + (1-\delta)\, m\, \frac{R^{m-1}}{(m-1)!}\, e^{-R}. \tag{80}$$

Note that $g(R, F_r, K \delta m\, \mathrm{SNR}) = \frac{(m-2)!}{R^{m-2}} f_{R|r,K}(R \mid \sqrt{L}, K \delta m\, \mathrm{SNR})$. Using these facts, we can easily prove that

$$\dot{I}_{cm}(0) = \left.\frac{\partial I_{cm}}{\partial\, \mathrm{SNR}}\right|_{\mathrm{SNR} = 0} = 0. \tag{81}$$

$\square$

In the very low SNR regime, the channel estimate deteriorates and the performance approaches that of noncoherent Rayleigh block fading channels. As shown in [18], bit energy values required in these channels grow without bound as $\mathrm{SNR} \to 0$, and the same phenomenon is observed here as well. In the worst-case scenario treated in Section IV, the performance deterioration at very low SNR levels is due to the fact that poor channel estimates are assumed to be perfect. In this section, similar observations are the result of the limitations on the peakedness of the signal. Nevertheless, designing the transmission and reception for the channel in (58) rather than that in (11) leads to energy gains in the low-SNR regime. Fig. 9 provides a comparison of the bit energy values required in the worst-case scenario and the scenario where peak power constraints are imposed and optimal signaling and decoding are employed. In the worst-case scenario, the channel estimate is assumed to be perfect, and transmission and reception are designed for a known channel. This is obviously a poor assumption in the low-SNR regime, and in Fig. 9 we observe bit energy gains of approximately 1.5 dB when optimal techniques are employed in the case of $m = 10$. Note that these gains are achieved when the input is subject to more stringent peak power constraints. From Fig. 9, we also conclude that in the low-SNR regime, the achievable rate expression in (15) is a lower bound to the peak-power limited capacity of the channel in (58). Note that (15) will eventually exceed this capacity value at high SNR levels, as it is obtained under less strict average power constraints.

Fig. 8. Bit energy $\frac{E_b}{N_0}$ (dB) vs. spectral efficiency $\mathsf{C}\!\left(\frac{E_b}{N_0}\right)$ (bits/s/Hz) in pilot-assisted systems with block lengths $m = 10, 20, 30$ and $40$.

In training-based systems, a certain fraction of time and power which otherwise would be used for data transmission is allocated to the pilot symbols to facilitate channel estimation. Hence, there is a potential for performance loss in terms of data rates. However, at the same time, the availability of channel estimates at the receiver tends to improve the performance. On the other hand, in noncoherent communications, there is no attempt at channel estimation, and communication is performed over unknown channels. The analysis presented in this paper can be applied to noncoherent communications in a straightforward manner by choosing $\delta = 0$ and replacing $m$ in the equations by $m + 1$, as no time is allocated to pilot symbols. Hence, for instance, the discrete nature of the optimal input under peak power constraints can easily be shown for the noncoherent Rayleigh channel as well. However, the details of this analysis are omitted because the discreteness results are proven for noncoherent Rician fading channels in [18] and for more general noncoherent MIMO channels in [8]. Here, we present numerical results.
Figures 10 and 11 compare the performances of training-based and noncoherent communication systems. In Fig. 10, the bit energy values are plotted for both schemes when the block length is $m = 20$. It is observed that for this relatively small value of the block length, both schemes achieve almost the same minimum bit energy value, and therefore, the training-based performance is surprisingly rather close to that of the noncoherent scheme even in the low-SNR regime. Fig. 11 plots the capacity values as a function of the block length at $\mathrm{SNR} = 5$ dB. Here, we also observe that the performance of training-based schemes comes very close to that of the noncoherent scheme. Therefore, if having the channel estimate reduces the complexity of the receiver and/or pilot signals are additionally used for timing and frequency-offset synchronization or channel equalization, training-based schemes can be preferred over noncoherent communications with small loss in data rates.

Fig. 9. Bit energy $\frac{E_b}{N_0}$ (dB) vs. spectral efficiency $\mathsf{C}\!\left(\frac{E_b}{N_0}\right)$ (bits/s/Hz) in the worst-case scenario and the scenario of optimal coding-decoding under input peak-power constraints. The block length is $m = 10$.

A. Capacity with Ideal Interleaving and Per-symbol Peak Power Constraints

Since most of the well-known codes are designed to correct errors that occur independently from the location of other errors [43], practical communication systems employ interleavers at the transmitters to gain protection against error bursts. Deinterleavers are used at the receiver to reverse the interleaving operation. In this section, we consider such systems and assume that ideal interleaving is used so that each data symbol experiences independent channel conditions. Pilot symbols are inserted periodically after the interleaver.
We note that a pilot-assisted transmission with ideal interleaving is also studied in [24] and [25], where achievable rates are considered.

Fig. 10. Bit energy $\frac{E_b}{N_0}$ (dB) vs. spectral efficiency $\mathsf{C}\!\left(\frac{E_b}{N_0}\right)$ (bits/s/Hz) for training-based and noncoherent communication systems when $m = 20$.

Fig. 11. Capacity (nats/symbol) vs. block length $m$ for training-based and noncoherent communication systems. $\mathrm{SNR} = 5$ dB.

Since interleaving breaks the channel correlation seen by the data symbols, channel memory can no longer be taken advantage of in the transmission. Hence, interleaving in general decreases the capacity. Therefore, the capacity results in this section can also be regarded as lower bounds on the capacity of a non-interleaved system. On the other hand, one advantage of interleaving is the simplification of signaling schemes. We continue considering the block fading channel model. Hence, the channel stays constant for a block of $m$ symbols. However, after deinterleaving, the channel output can be expressed as

$$y_{d,i} = \hat{h}_i x_{d,i} + \tilde{h}_i x_{d,i} + n_i, \quad i = 1, 2, 3, \ldots \tag{82}$$

Note that due to interleaving, each data symbol $x_{d,i}$ is affected by independent and identically distributed fading coefficients $h_i = \hat{h}_i + \tilde{h}_i$. In this section, we consider per-symbol peak power constraints, $|x_i|^2 \overset{\text{a.s.}}{\le} P\ \forall i$. Therefore, the pilot symbol power is $|x_t|^2 = P$. Note that the use of more than one pilot may be optimal. The channel capacity in this setting is formulated as follows:

$$C = \sup_{1 \le l \le m}\ \sup_{\substack{x_d \\ |x_d|^2 \overset{\text{a.s.}}{\le} P}} \frac{m - l}{m}\, I(x_d; y_d \mid \hat{h}) \tag{83}$$

where $l$ denotes the number of pilot symbols per $m$ symbols, and $\hat{h} \sim \mathcal{CN}\!\left(0, \frac{\gamma^4 l P}{\gamma^2 l P + N_0}\right)$ and $\tilde{h} \sim \mathcal{CN}\!\left(0, \frac{\gamma^2 N_0}{\gamma^2 l P + N_0}\right)$. The inner maximization in (83) becomes a special case of the inner maximization in (62) when we reduce the dimensionality of the optimization problem in (62) (see footnote 9) by choosing $m = 2$. Therefore, the results on the structure of the capacity-achieving input immediately apply to the setting we consider in this section. The optimal input has a uniformly distributed phase. With this characterization, the capacity is

$$C = \sup_{1 \le l \le m}\ \sup_{\substack{F_r \\ r \overset{\text{a.s.}}{\le} \sqrt{L}}} \frac{m - l}{m}\, I(F_r \mid \hat{h}) \tag{84}$$

where

$$I(F_r \mid \hat{h}) = -E_{K,r}\left[\int_0^\infty f_{R|r,K}(R \mid r, K) \log g(R, F_r, K)\, dR\right] - E_r\{\log(1 + r^2)\} - 1, \tag{85}$$

$$f_{R|r,K}(R \mid r, K) = \frac{e^{-\frac{R + K r^2}{1 + r^2}}}{1 + r^2}\, I_0\!\left(\frac{2\sqrt{K R}\, r}{1 + r^2}\right), \tag{86}$$

$$g(R, F_r, K) = \int_0^\infty f_{R|r,K}(R \mid r, K)\, dF_r, \tag{87}$$

and $R = \frac{|y_d|^2}{N_0}$, $r = \frac{\tilde{\gamma}|x_d|}{\sqrt{N_0}}$, $K = \frac{|\hat{h}|^2}{\tilde{\gamma}^2}$, $\tilde{\gamma}^2 = \frac{\gamma^2 N_0}{\gamma^2 l P + N_0}$, and $L = \frac{\gamma^2 P / N_0}{l \gamma^2 P / N_0 + 1} = \frac{\mathrm{SNR}}{l\, \mathrm{SNR} + 1}$. Note that $K$ is an exponential random variable with mean $E\{K\} = \frac{E\{|\hat{h}|^2\}}{\tilde{\gamma}^2} = \frac{l \gamma^2 P}{N_0} = l\, \mathrm{SNR}$. Since the inner maximization in (84) is a special case of that in (69), we immediately have the following result.

Footnote 9: Note that the input constraints, error variances, and the constants multiplying the mutual information expressions will be different in the specialized case of (62) and in (83). But the general structures of the two optimization problems are the same.

Theorem 8: Fix the value of $1 \le l \le m$. The input distribution that maximizes the mutual information $I(F_r \mid \hat{h})$ in (84) is discrete with a finite number of mass points.

Next, we present numerical results. Fig. 12 plots, for different values of the block length, the capacity curves as a function of SNR for training-based schemes.
We observe that the capacity values increase with the block length even though the channel in (82) is memoryless. This performance gain should be attributed to the fact that the channel estimate improves with increasing block length. Fig. 12 also plots the capacity of the interleaved noncoherent communications in which no attempt is made to learn the channel. From the comparison of the capacity curves, we observe that training significantly enhances the data rates when data symbols are interleaved at the transmitter. In Fig. 13, bit energy curves as a function of the spectral efficiency are plotted. Again, we see that training-based schemes perform much better in terms of energy efficiency than the noncoherent scheme. In all cases, the minimum bit energy is achieved at a nonzero spectral efficiency level below which one should not operate. The bit energy requirement increases without bound as spectral efficiency decreases to zero. When we compare Figs. 8 and 13, we note that while simplifying the system design, interleaving also incurs a penalty in energy efficiency. Finally, in Fig. 14, we provide the optimal resource allocations by plotting the optimal number of pilot symbols per block as a function of SNR for different block length values. We realize that the optimal number of pilots tends to increase as SNR decreases and approaches $m/2$. Hence, as in Section IV-A, asymptotically half of the available power in each block should be allocated to the training symbols.

B. Achievable Rates and Bit Energies of On-Off Keying

In this section, we relax the input constraints and assume that the input is subject to an average power constraint

$$E\{\|\mathbf{x}\|^2\} \le mP. \tag{88}$$

We consider the channel model (58) where there is no interleaving. Akin to Section IV-C, our goal is to obtain the attainable bit energy levels when signals with high peak-to-average power ratios are employed.
As before, a single pilot symbol with power $|x_t|^2 = \delta m P$ is used, and hence the data vector is subject to $E\{\|\mathbf{x}_d\|^2\} \le (1 - \delta) m P$. The data vector is again assumed to have an isotropically distributed directional vector $\mathbf{v}$, and hence $\mathbf{x}_d = \|\mathbf{x}_d\|\,\mathbf{v}$. We further assume that on-off keying is used for magnitude modulation, and therefore

$$\frac{1}{\sqrt{m}}\|\mathbf{x}_d\| = \begin{cases} A & \text{with prob. } p_0 \\ 0 & \text{with prob. } 1 - p_0 \end{cases} \tag{89}$$

where $A$ is a fixed magnitude level that does not vary with the power $P$. In order to satisfy the average power constraint we should have

$$A^2 p_0 = (1 - \delta) P \ \Rightarrow\ p_0 = \frac{(1 - \delta) P}{A^2}. \tag{90}$$

Therefore, in this signaling scheme, the peak power of the transmitted data signal is kept constant while its probability vanishes as $P \to 0$. Hence, while the peak power is fixed, the peak-to-average power ratio grows without bound as $P \to 0$. Similarly as before, we define $r = \frac{\tilde{\gamma}\|\mathbf{x}_d\|}{\sqrt{N_0}}$. With this definition, the distribution of $r$ is

$$r = \begin{cases} \dfrac{\gamma \sqrt{m}\, A}{\sqrt{\gamma^2 \delta m P + N_0}} & \text{with prob. } p_0 = \dfrac{(1 - \delta) P}{A^2} \\[1.5ex] 0 & \text{with prob. } 1 - p_0. \end{cases} \tag{91}$$

We further define $\nu = \frac{A^2 \gamma^2}{(1 - \delta) N_0}$, which does not depend on $P$, and $\mathrm{SNR} = \frac{\gamma^2 P}{N_0}$. Now, we can write

$$r = \begin{cases} r_0 = \dfrac{\sqrt{(1 - \delta)\, m \nu}}{\sqrt{\delta m\, \mathrm{SNR} + 1}} & \text{with prob. } p_0 = \dfrac{\mathrm{SNR}}{\nu} \\[1.5ex] 0 & \text{with prob. } 1 - p_0. \end{cases} \tag{92}$$

Fig. 12. Capacity (nats/symbol) vs. SNR for interleaved, training-based transmissions when block lengths are $m = 10, 20, 30, 40$ and $50$, and for interleaved noncoherent transmission over the unknown Rayleigh fading channel.

Fig. 13. Bit energy $\frac{E_b}{N_0}$ (dB) vs. spectral efficiency $\mathsf{C}\!\left(\frac{E_b}{N_0}\right)$ (bits/s/Hz) for interleaved, training-based transmissions when block lengths are $m = 10, 20, 30, 40$ and $50$, and for interleaved noncoherent transmission over the unknown Rayleigh fading channel.

For a given value of $\delta$, the mutual information achieved by the isotropically distributed directional vector $\mathbf{v}$ and $r$ whose distribution is given in (92) is

$$I_{ook} = -E_K\left[\int_0^\infty f_{R|K}(R \mid K) \log\left(\frac{(m-2)!}{R^{m-2}} f_{R|K}(R \mid K)\right) dR\right] - p_0 \log(1 + r_0^2) - (m-1) \tag{93}$$

where $f_{R|K}(R \mid K) = (1 - p_0)\, f_{R|r,K}(R \mid r = 0, K) + p_0\, f_{R|r,K}(R \mid r = r_0, K)$ and $f_{R|r,K}(R \mid r, K)$ is given in (67). Note that $K$ is an exponential random variable with mean $E\{K\} = \frac{\delta m \gamma^2 P}{N_0} = \delta m\, \mathrm{SNR}$. Next, we obtain the bit energy required for reliable communications with OOK as $\mathrm{SNR} \to 0$.

Theorem 9: Assume that the normalized input magnitude distribution is given by (92). For a given value of $\delta \in (0,1)$, the normalized bit energy required by this input as $P \to 0$ is

$$\left.\frac{E_{b,ook}}{N_0}\right|_{I_{ook} = 0} = \lim_{\mathrm{SNR} \to 0} \frac{m\, \mathrm{SNR}}{I_{ook}(\mathrm{SNR})} \log 2 = \frac{m \log 2}{\dot{I}_{ook}(0)} = \frac{\log 2}{(1 - \delta) - \frac{1}{m\nu} \log\left(1 + (1 - \delta)\, m \nu\right)}. \tag{94}$$

Proof: As in the proof of Theorem 7, we apply a change of variables and express the mutual information as

$$I_{ook} = -E_K\left[\int_0^\infty f_{R|K}(R \mid K \delta m\, \mathrm{SNR}) \log\left(\frac{(m-2)!}{R^{m-2}} f_{R|K}(R \mid K \delta m\, \mathrm{SNR})\right) dR\right] - p_0 \log(1 + r_0^2) - (m-1) \tag{95}$$

where $K$ is now an exponential random variable with mean $E\{K\} = 1$. It can be easily seen that

$$\left.\frac{\partial}{\partial\, \mathrm{SNR}}\, p_0 \log(1 + r_0^2)\right|_{\mathrm{SNR} = 0} = \frac{1}{\nu} \log\left(1 + (1 - \delta)\, m \nu\right). \tag{96}$$

We can also show that

$$\left.\frac{\partial}{\partial\, \mathrm{SNR}} f_{R|K}(R \mid K \delta m\, \mathrm{SNR})\right|_{\mathrm{SNR} = 0} = -\frac{1}{\nu} f_{R|r,K}(R \mid r = 0, K = 0) + \frac{1}{\nu} f_{R|r,K}\!\left(R \mid r = \sqrt{(1 - \delta)\, m \nu}, K = 0\right). \tag{97}$$

Using (97), we can prove that the derivative of the first term on the right-hand side of (95) with respect to SNR at $\mathrm{SNR} = 0$ is $(1 - \delta)\, m$. Combining this result with (96), we arrive at

$$\dot{I}_{ook}(0) = (1 - \delta)\, m - \frac{1}{\nu} \log\left(1 + (1 - \delta)\, m \nu\right) \tag{98}$$

which concludes the proof. $\square$

Fig. 14. Number of pilot symbols per block vs. SNR for interleaved, training-based transmissions when block lengths are $m = 10, 20, 30, 40$ and $50$.

Theorem 9 shows that, unlike the previously treated cases, reliable communication with OOK modulation with fixed peak power requires finite bit energy as $P \to 0$. Hence, OOK provides significant improvements in energy efficiency in the low-SNR regime at the cost of high peak-to-average power ratio. Since $\nu = \frac{A^2 \gamma^2}{(1 - \delta) N_0}$, we can also express the asymptotic bit energy level as

$$\left.\frac{E_{b,ook}}{N_0}\right|_{I_{ook} = 0} = \frac{\log 2}{(1 - \delta)\left[1 - \dfrac{1}{m \frac{A^2 \gamma^2}{N_0}} \log\left(1 + m \dfrac{A^2 \gamma^2}{N_0}\right)\right]}. \tag{99}$$

It has been shown in [18] that, if noncoherent communication with no channel estimation is performed and the input is subject to $E\{\|\mathbf{x}\|^2\} \le mP$ and $\|\mathbf{x}\|^2 \overset{\text{a.s.}}{\le} mA^2$, then optimal signaling requires the following bit energy value as $P \to 0$:

$$\left.\frac{E_{b,noncoh}}{N_0}\right|_{C = 0} = \frac{\log 2}{1 - \dfrac{1}{m \frac{A^2 \gamma^2}{N_0}} \log\left(1 + m \dfrac{A^2 \gamma^2}{N_0}\right)}. \tag{100}$$

We note that similar results for fading channels with memory are obtained in [21] through the analysis of capacity per unit cost. Comparing (99) and (100), we find that training-based schemes suffer an energy penalty due to the presence of the term $1/(1 - \delta)$, and this penalty vanishes if $\delta \to 0$. Therefore, if OOK with fixed power is employed, the power of the training symbols should be decreased to zero as $P \to 0$ to match the noncoherent performance. This power allocation policy is in stark contrast to the results in the previous sections. Note that as SNR decreases, data transmission occurs extremely infrequently.
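The comparison between (99) and (100) can be made concrete with a short numerical sketch. The Python code below evaluates both asymptotic bit energies in dB, assuming the shorthand `kappa` for the ratio $A^2 \gamma^2 / N_0$. The function names and the sample values of $m$, $\delta$ and `kappa` are hypothetical choices for illustration and are not taken from the paper.

```python
import math

def ook_bit_energy_db(m, delta, kappa):
    """Asymptotic E_b/N_0 in dB from (99); kappa stands for A^2 gamma^2 / N_0."""
    shaping = 1.0 - math.log1p(m * kappa) / (m * kappa)
    return 10.0 * math.log10(math.log(2.0) / ((1.0 - delta) * shaping))

def noncoherent_bit_energy_db(m, kappa):
    """Noncoherent counterpart (100), i.e. (99) without the 1/(1 - delta) factor."""
    return ook_bit_energy_db(m, 0.0, kappa)

def training_penalty_db(delta):
    """Energy penalty caused by the factor 1/(1 - delta), expressed in dB."""
    return -10.0 * math.log10(1.0 - delta)

m, delta, kappa = 10, 0.1, 1.0  # illustrative values only
gap = ook_bit_energy_db(m, delta, kappa) - noncoherent_bit_energy_db(m, kappa)
print(gap, training_penalty_db(delta))
```

Under these assumptions the gap between the two curves equals $-10 \log_{10}(1 - \delta)$ dB for any $m$ and `kappa`, and letting $m$ grow with $\delta = 0$ drives the result toward $10 \log_{10}(\log 2) \approx -1.59$ dB.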
In such a case, performing channel estimation all the time for each $m$-block, irrespective of whether or not data transmission takes place, is not a good design choice. Hence, a gradual decrease in the power allocated to training should also be intuitively expected. We further remark that as $m \to \infty$ and $\delta \to 0$, $\left.\frac{E_{b,ook}}{N_0}\right|_{I_{ook} = 0} \to -1.59$ dB. Fig. 15 plots the bit energy levels as a function of spectral efficiency for training-based OOK with fixed peak power and for training-based optimal signaling under input peak power constraints in the form $\|\mathbf{x}\|^2 \overset{\text{a.s.}}{\le} (1 - \delta) m P$. In this figure, the block length is $m = 10$, and for OOK, $\nu = 1$. As predicted, below the spectral efficiency of approximately 0.4 bits/s/Hz, OOK provides better energy efficiency. The bit energy requirements of OOK decrease as spectral efficiency decreases, as opposed to the behavior presented in the peak-power-limited case. Numerical analysis has also shown that the fraction of power allocated to training, $\delta$, in OOK decreases as SNR decreases, conforming with the discussion in the previous paragraph.

Fig. 15. Bit energy $\frac{E_b}{N_0}$ (dB) vs. spectral efficiency $\mathsf{C}\!\left(\frac{E_b}{N_0}\right)$ (bits/s/Hz) for training-based OOK signaling and training-based optimal signaling under input peak power constraints. The block length is $m = 10$.

VI. CONCLUSION

In this paper, we have studied the energy efficiency and capacity of training-based communication schemes employed for the transmission of information over a-priori unknown Rayleigh block fading channels. We have initially considered the worst-case scenario in which the product of the estimate error and transmitted signal is assumed to be Gaussian noise. The capacity expression obtained under this assumption is a lower bound to the true capacity of the channel, and provides the achievable rates when the communication system is designed as if the channel estimate were perfect. We have investigated the bit energy levels required for reliable communications and quantified the penalty in energy efficiency incurred due to regarding the imperfect channel estimate as perfect in the low-SNR regime. We have shown that the bit energy requirements grow without bound as $\mathrm{SNR} \to 0$ regardless of the size of the block length $m$. Hence, the minimum bit energy is achieved at a nonzero SNR value below which one should not operate under the aforementioned assumptions. We have also shown that approaching the minimum bit energy level of $-1.59$ dB is extremely slow in terms of block length as $m \to \infty$. Similar results are obtained if peak power limitations are imposed on training symbols. We have also investigated flash training and transmission schemes to improve the energy efficiency at low SNR levels. We have shown that in order for the bit energy requirement not to grow as $\mathrm{SNR} \to 0$, the duty cycle in flash transmission should vanish linearly with decreasing SNR.

Next, we have analyzed the capacity and energy efficiency of training-based schemes when the input is subject to peak power constraints. We have shown that the capacity-achieving input has a discrete magnitude and an isotropically distributed unit directional vector. Using this characterization, we have obtained the capacity expressions, optimal training power allocations, and bit energy levels required for reliable communications. We have noted that at low SNRs, the optimal input magnitude is fixed at a constant level. Due to the presence of the peak power constraints, the bit energy requirements are again shown to increase without bound as $\mathrm{SNR} \to 0$. However, we have seen that gains in energy efficiency are obtained when optimal signaling and decoding are employed.
We have compared the performances of training-based and noncoherent transmission schemes. Although training-based schemes dedicate a certain amount of time and power to training symbols and as a result are expected to suffer in terms of data rates, we have observed that the performance loss is small even at relatively small block lengths and small SNR levels. We have also considered the case in which interleaving is used at the transmitter for protection against error bursts and per-symbol peak power constraints are imposed. We have obtained the channel capacity and the optimal training duration, and analyzed the energy efficiency. In this case, training is shown to improve the performance with respect to noncoherent communications. We have also investigated the improvements in energy efficiency in the low-SNR regime if OOK with fixed peak power and vanishing duty cycle is employed at the transmitter.

Finally, we note that this work has primarily focused on block fading channels. Recently, in [33], [34], and [35], we have considered more general fading processes with memory. Since the exact capacity is rather difficult to obtain in such cases, achievable rate expressions are analyzed, and subsequently energy efficiency and optimal resource allocations are studied.

APPENDIX

A.
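Derivation of the Mutual Information Expression in Theorem 4

The input-output mutual information expression for channel (58) is

$$I(\mathbf{x}_d; \mathbf{y}_d \mid \hat{h}) = E_{\hat{h}} E_{\mathbf{x}_d} \int f_{\mathbf{y}|\mathbf{x}_d,\hat{h}}(\mathbf{y}|\mathbf{x}_d,\hat{h}) \log \frac{f_{\mathbf{y}|\mathbf{x}_d,\hat{h}}(\mathbf{y}|\mathbf{x}_d,\hat{h})}{f_{\mathbf{y}|\hat{h}}(\mathbf{y}|\hat{h})}\, d\mathbf{y} \tag{101}$$

$$= -E_{\hat{h}} E_{\mathbf{x}_d} \int f_{\mathbf{y}|\mathbf{x}_d,\hat{h}}(\mathbf{y}|\mathbf{x}_d,\hat{h}) \log f_{\mathbf{y}|\hat{h}}(\mathbf{y}|\hat{h})\, d\mathbf{y} - E_{\mathbf{x}_d}\left\{\log\left(\pi^{m-1} N_0^{m-2} e^{m-1} (\tilde{\gamma}^2\|\mathbf{x}_d\|^2 + N_0)\right)\right\} \tag{102}$$

$$= -E_{\hat{h}} E_{\mathbf{x}_d} \int f_{\mathbf{y}|\mathbf{x}_d,\hat{h}}(\mathbf{y}|\mathbf{x}_d,\hat{h}) \log f_{\mathbf{y}|\hat{h}}(\mathbf{y}|\hat{h})\, d\mathbf{y} - E_r\{\log(1+r^2)\} - \log(\pi^{m-1} N_0^{m-1}) - (m-1). \tag{103}$$

Note that the second term of (102) is the negative of the conditional differential entropy of $\mathbf{y}$ given $\mathbf{x}_d$ and $\hat{h}$. (103) follows from the definition $r = \frac{\tilde{\gamma}\|\mathbf{x}_d\|}{\sqrt{N_0}}$. The main difficulty is to simplify

$$\chi(\mathbf{x}_d, \hat{h}) = \int f_{\mathbf{y}|\mathbf{x}_d,\hat{h}}(\mathbf{y}|\mathbf{x}_d,\hat{h}) \log f_{\mathbf{y}|\hat{h}}(\mathbf{y}|\hat{h})\, d\mathbf{y} \tag{104}$$

which, in general, is a $2(m-1)$-fold integral. Note that

$$f_{\mathbf{y}|\hat{h}}(\mathbf{y}|\hat{h}) = \int f_{\mathbf{y}|\mathbf{x}_d,\hat{h}}(\mathbf{y}|\mathbf{x}_d,\hat{h})\, dF_{\mathbf{x}_d}. \tag{105}$$

Using the facts that $f_{\mathbf{y}|\mathbf{x}_d,\hat{h}}(\Phi\mathbf{y}|\Phi\mathbf{x}_d,\hat{h}) = f_{\mathbf{y}|\mathbf{x}_d,\hat{h}}(\mathbf{y}|\mathbf{x}_d,\hat{h})$ and that the input has circular symmetry, we can easily see that for any fixed unitary matrix $\Phi$

$$f_{\mathbf{y}|\hat{h}}(\Phi\mathbf{y}|\hat{h}) = f_{\mathbf{y}|\hat{h}}(\mathbf{y}|\hat{h}) = f_{\mathbf{y}|\hat{h}}(\|\mathbf{y}\| \mid \hat{h}) \tag{106}$$

and hence

$$\chi(\Phi\mathbf{x}_d, \hat{h}) = \chi(\mathbf{x}_d, \hat{h}) = \chi(\|\mathbf{x}_d\|, \hat{h}). \tag{107}$$

Therefore, $f_{\mathbf{y}|\hat{h}}(\mathbf{y}|\hat{h})$ and $\chi(\mathbf{x}_d,\hat{h})$ are circularly symmetric functions depending only on $\|\mathbf{y}\|$ and $\|\mathbf{x}_d\|$, respectively.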
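Noting that

$$(\tilde{\gamma}^2 \mathbf{x}_d\mathbf{x}_d^\dagger + N_0\mathbf{I})^{-1} = \frac{\mathbf{I}}{N_0} - \frac{\tilde{\gamma}^2 \mathbf{x}_d\mathbf{x}_d^\dagger}{N_0(\tilde{\gamma}^2\|\mathbf{x}_d\|^2 + N_0)}, \tag{108}$$

and defining $\mathbf{x}_d = \|\mathbf{x}_d\|\mathbf{v}$ and $\mathbf{y} = \|\mathbf{y}\|\mathbf{w}$, we can, after some algebraic steps, rewrite the conditional density function in (64) as

$$f_{\mathbf{y}|\mathbf{x}_d,\hat{h}}(\mathbf{y}|\mathbf{x}_d,\hat{h}) = \frac{\exp\left\{-\frac{\|\mathbf{y}\|^2}{N_0} - \frac{|\hat{h}|^2\|\mathbf{x}_d\|^2}{\tilde{\gamma}^2\|\mathbf{x}_d\|^2+N_0} + \frac{\tilde{\gamma}^2\|\mathbf{x}_d\|^2\|\mathbf{y}\|^2|\mathbf{w}^\dagger\mathbf{v}|^2}{N_0(\tilde{\gamma}^2\|\mathbf{x}_d\|^2+N_0)} + \frac{2\|\mathbf{x}_d\|\|\mathbf{y}\||\hat{h}|\,\Re(e^{j\theta_{\hat{h}}}\mathbf{w}^\dagger\mathbf{v})}{\tilde{\gamma}^2\|\mathbf{x}_d\|^2+N_0}\right\}}{\pi^{m-1}N_0^{m-2}(\tilde{\gamma}^2\|\mathbf{x}_d\|^2+N_0)} \tag{109}$$

where $\Re(z)$ denotes the real part of the complex number $z$, and $\theta_{\hat{h}}$ is the phase of $\hat{h}$. The usefulness of (109) comes from the property that the magnitude $\|\mathbf{x}_d\|$ and the directional unit vector $\mathbf{v}$ are separated. We know from Theorem 3 that $\mathbf{v}$ is isotropically distributed and independent of $\|\mathbf{x}_d\|$. Hence, we now have

$$f_{\mathbf{y}|\hat{h}}(\mathbf{y}|\hat{h}) = \int \frac{\exp\left\{-\frac{\|\mathbf{y}\|^2}{N_0} - \frac{|\hat{h}|^2\|\mathbf{x}_d\|^2}{\tilde{\gamma}^2\|\mathbf{x}_d\|^2+N_0} + \frac{\tilde{\gamma}^2\|\mathbf{x}_d\|^2\|\mathbf{y}\|^2|\mathbf{w}^\dagger\mathbf{v}|^2}{N_0(\tilde{\gamma}^2\|\mathbf{x}_d\|^2+N_0)} + \frac{2\|\mathbf{x}_d\|\|\mathbf{y}\||\hat{h}|\,\Re(e^{j\theta_{\hat{h}}}\mathbf{w}^\dagger\mathbf{v})}{\tilde{\gamma}^2\|\mathbf{x}_d\|^2+N_0}\right\}}{\pi^{m-1}N_0^{m-2}(\tilde{\gamma}^2\|\mathbf{x}_d\|^2+N_0)}\, f_{\mathbf{v}}(\mathbf{v})\, d\mathbf{v}\, dF_{\|\mathbf{x}_d\|} \tag{110}$$

where $f_{\mathbf{v}}$ is the probability density function of $\mathbf{v}$. Since $f_{\mathbf{y}|\hat{h}}$ is a function of only $\|\mathbf{y}\|$, we can, without loss of generality, assume that $\mathbf{w}^\dagger = [1, 0, 0, \ldots, 0]$. In such a case,

$$f_{\mathbf{y}|\hat{h}}(\mathbf{y}|\hat{h}) = \int \frac{\exp\left\{-\frac{\|\mathbf{y}\|^2}{N_0} - \frac{|\hat{h}|^2\|\mathbf{x}_d\|^2}{\tilde{\gamma}^2\|\mathbf{x}_d\|^2+N_0} + \frac{\tilde{\gamma}^2\|\mathbf{x}_d\|^2\|\mathbf{y}\|^2|v_1|^2}{N_0(\tilde{\gamma}^2\|\mathbf{x}_d\|^2+N_0)} + \frac{2\|\mathbf{x}_d\|\|\mathbf{y}\||\hat{h}|\,\Re(e^{j\theta_{\hat{h}}}v_1)}{\tilde{\gamma}^2\|\mathbf{x}_d\|^2+N_0}\right\}}{\pi^{m-1}N_0^{m-2}(\tilde{\gamma}^2\|\mathbf{x}_d\|^2+N_0)}\, f_{v_1}(v_1)\, dv_1\, dF_{\|\mathbf{x}_d\|} \tag{111}$$

where $v_1$ is the first component of $\mathbf{v}$ and $f_{v_1}$ is the corresponding density function. From [5], we have for $m \ge 3$

$$f_{v_1}(v_1) = \frac{1}{2\pi}\, 2(m-2)(1-|v_1|^2)^{m-3}, \qquad |v_1| \le 1.$$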
(112)

Hence, $v_1$ has a uniform phase and a magnitude whose density function is

$$f_{|v_1|}(|v_1|) = 2(m-2)|v_1|(1-|v_1|^2)^{m-3}. \tag{113}$$

Note that if $m = 2$, then $\mathbf{x}_d$ is one-dimensional and hence $\mathbf{x}_d = \|\mathbf{x}_d\|\mathbf{v} = \|\mathbf{x}_d\|e^{j\theta_{x_d}}$. Therefore, in this case, $|\mathbf{v}| = |v_1| = 1$ with probability one. Using these facts and defining $r = \frac{\tilde{\gamma}\|\mathbf{x}_d\|}{\sqrt{N_0}}$, $R = \frac{\|\mathbf{y}\|^2}{N_0}$, $K = \frac{|\hat{h}|^2}{\tilde{\gamma}^2}$, and $a = |v_1|^2$, we obtain

$$f_{\mathbf{y}|\hat{h}}(\mathbf{y}|\hat{h}) = \begin{cases} \displaystyle\int_0^\infty dF_r\, \frac{(m-2)\,e^{-R-\frac{Kr^2}{1+r^2}}}{\pi^{m-1}N_0^{m-1}(1+r^2)} \int_0^1 (1-a)^{m-3}\, e^{\frac{ar^2R}{1+r^2}}\, I_0\!\left(\frac{2\sqrt{KR}\,r\sqrt{a}}{1+r^2}\right) da, & m \ge 3 \\[2ex] \displaystyle\int_0^\infty dF_r\, \frac{e^{-\frac{R+Kr^2}{1+r^2}}}{\pi^{m-1}N_0^{m-1}(1+r^2)}\, I_0\!\left(\frac{2\sqrt{KR}\,r}{1+r^2}\right), & m = 2 \end{cases} \tag{114}$$

$$= \frac{g(R, F_r, K)}{\pi^{m-1}N_0^{m-1}} \tag{115}$$

where $g(R, F_r, K)$ is defined in (68). Therefore, we have

$$\chi(\mathbf{x}_d, \hat{h}) = \int f_{\mathbf{y}|\mathbf{x}_d,\hat{h}}(\mathbf{y}|\mathbf{x}_d,\hat{h}) \log f_{\mathbf{y}|\hat{h}}(\mathbf{y}|\hat{h})\, d\mathbf{y} \tag{116}$$

$$= E_{\mathbf{y}|\mathbf{x}_d,\hat{h}}\left\{\log f_{\mathbf{y}|\hat{h}}(\mathbf{y}|\hat{h})\right\} \tag{117}$$

$$= E_{R|r,K}\left\{\log\left(\frac{g(R, F_r, K)}{\pi^{m-1}N_0^{m-1}}\right)\right\} \tag{118}$$

$$= -\log(\pi^{m-1}N_0^{m-1}) + E_{R|r,K}\{\log g(R, F_r, K)\} \tag{119}$$

$$= -\log(\pi^{m-1}N_0^{m-1}) + \int_0^\infty f_{R|r,K}(R|r,K) \log g(R, F_r, K)\, dR \tag{120}$$

where $f_{R|r,K}(R|r,K)$ is the conditional density function of $R$ given $r$ and $K$. Combining (103) and (120), we get

$$I(\mathbf{x}_d; \mathbf{y}_d \mid \hat{h}) = -E_{K,r}\left\{\int_0^\infty f_{R|r,K}(R|r,K) \log g(R, F_r, K)\, dR\right\} - E_r\{\log(1+r^2)\} - (m-1) \tag{121}$$

which is the mutual information expression provided in Theorem 4. The proof will be completed by showing that $f_{R|r,K}(R|r,K)$ has the expression given in (67). From the previous development, we can easily verify that

$$f_{R_1,\ldots,R_{m-1}|r,K}(R_1,\ldots,R_{m-1}|r,K) = \begin{cases} \displaystyle\frac{(m-2)\,e^{-R-\frac{Kr^2}{1+r^2}}}{1+r^2} \int_0^1 (1-a)^{m-3}\, e^{\frac{ar^2R}{1+r^2}}\, I_0\!\left(\frac{2\sqrt{KR}\,r\sqrt{a}}{1+r^2}\right) da, & m \ge 3 \\[2ex] \displaystyle\frac{e^{-\frac{R+Kr^2}{1+r^2}}}{1+r^2}\, I_0\!\left(\frac{2\sqrt{KR}\,r}{1+r^2}\right), & m = 2 \end{cases} \tag{122}$$

where $f_{R_1,\ldots,R_{m-1}|r,K}$ is the conditional joint density function of $R_1,\ldots,R_{m-1}$ given $r$ and $K$. Note that above we have defined $R_i = \frac{|y_i|^2}{N_0}$ and hence $R = \frac{\|\mathbf{y}\|^2}{N_0} = R_1 + R_2 + \ldots + R_{m-1}$. Note that the joint probability density function depends only on the sum $R$. We have the following relationship

$$\int_0^\infty f_R(R|r,K)\, dR = \int f_{R_1,\ldots,R_{m-1}|r,K}(R_1,\ldots,R_{m-1}|r,K)\, dR_1 \ldots dR_{m-1} \tag{123}$$

$$= \int dR_2 \ldots dR_{m-1} \int_{R_2+\ldots+R_{m-1}}^\infty f_{R_1,\ldots,R_{m-1}|r,K}(R|r,K)\, dR \tag{124}$$

$$= \int dR_3 \ldots dR_{m-1} \int_{R_3+\ldots+R_{m-1}}^\infty f_{R_1,\ldots,R_{m-1}|r,K}(R|r,K)\, dR \int_0^{R-(R_3+\ldots+R_{m-1})} dR_2 \tag{125}$$

$$= \int dR_3 \ldots dR_{m-1} \int_{R_3+\ldots+R_{m-1}}^\infty \left(R-(R_3+\ldots+R_{m-1})\right) f_{R_1,\ldots,R_{m-1}|r,K}(R|r,K)\, dR \tag{126}$$

$$= \int_0^\infty \frac{R^{m-2}}{(m-2)!}\, f_{R_1,\ldots,R_{m-1}|r,K}(R|r,K)\, dR. \tag{127}$$

(124) follows by applying the change of variables $R = R_1 + R_2 + \ldots + R_{m-1}$ in the integral with respect to $R_1$. (125) is obtained by interchanging the integrals with respect to $R_2$ and $R$. (126) follows by evaluating the rightmost integral in (125). Finally, (127) is obtained through the repeated application of this procedure. From (127), we have

$$f_R(R|r,K) = \frac{R^{m-2}}{(m-2)!}\, f_{R_1,\ldots,R_{m-1}|r,K}(R|r,K) \tag{128}$$

$$= \begin{cases} \displaystyle\frac{R^{m-2}}{(m-3)!}\, \frac{e^{-R-\frac{Kr^2}{1+r^2}}}{1+r^2} \int_0^1 (1-a)^{m-3}\, e^{\frac{ar^2R}{1+r^2}}\, I_0\!\left(\frac{2\sqrt{KR}\,r\sqrt{a}}{1+r^2}\right) da, & m \ge 3 \\[2ex] \displaystyle\frac{e^{-\frac{R+Kr^2}{1+r^2}}}{1+r^2}\, I_0\!\left(\frac{2\sqrt{KR}\,r}{1+r^2}\right), & m = 2 \end{cases} \tag{129}$$

which is the same as the expression in (67).

B.
Proof of Theorem 5

1) Existence of the Capacity-Achieving Input Distribution: An optimal distribution exists if the space of input distribution functions over which the optimization is performed is compact, and the objective functional is weak* continuous [39]. The compactness of the space of input distributions with second moment constraints is shown in [4]. The compactness for the more stringent case of peak-limited inputs follows immediately from this result. Therefore, we need only show the weak* continuity of $I(\cdot|\hat{h})$. The weak* continuity of the functional $I(\cdot|\hat{h})$ is equivalent to

$$F_n \xrightarrow{w^*} F \;\Rightarrow\; I(F_n|\hat{h}) \to I(F|\hat{h}). \tag{130}$$

We first note the upper bound

$$f_R(R|r,K) \le \frac{R^{m-2}}{(m-3)!}\, \frac{e^{-\frac{R+Kr^2}{1+r^2}}}{1+r^2}\, I_0\!\left(\frac{2\sqrt{KR}\,r}{1+r^2}\right) \tag{131}$$

which is obtained from the bound $(1-a)^{m-3}\, e^{\frac{ar^2R}{1+r^2}}\, I_0\!\left(\frac{2\sqrt{KR}\,r\sqrt{a}}{1+r^2}\right) \le e^{\frac{r^2R}{1+r^2}}\, I_0\!\left(\frac{2\sqrt{KR}\,r}{1+r^2}\right)$ for all $a \in [0,1]$. The upper bound in (131) is bounded for all $r \in [0,\sqrt{L}]$ and also for all $R \ge 0$ due to the exponential decrease in $R$ in the second term. Since $f_R(R|r,K)$ and $\log(1+r^2)$ are continuous and bounded functions for all $r \in [0,\sqrt{L}]$ and $R \ge 0$, by the definition of weak convergence [39],

$$F_n \xrightarrow{w^*} F \;\Rightarrow\; \int_0^\infty \log(1+r^2)\, dF_n(r) \to \int_0^\infty \log(1+r^2)\, dF(r) \tag{132}$$

and

$$F_n \xrightarrow{w^*} F \;\Rightarrow\; \int_0^\infty f_R(R|r,K)\, dF_n(r) \to \int_0^\infty f_R(R|r,K)\, dF(r) \tag{133}$$

for all $R \ge 0$. Therefore, we have

$$F_n \xrightarrow{w^*} F \;\Rightarrow\; g(R, F_n, K) \to g(R, F, K) \quad \forall R \ge 0. \tag{134}$$

Note that the mutual information in (66) can also be written as

$$I(F_r|\hat{h}) = -\int_0^\infty dK\, f_K(K) \int_0^\infty dR\, \frac{R^{m-2}}{(m-2)!}\, g(R, F_r, K) \log g(R, F_r, K) - E_r\{\log(1+r^2)\} - (m-1). \tag{135}$$

The weak* continuity of the second term on the right-hand side of (135) follows from (132).
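The convergence in (132) can be illustrated with a simple sequence of discrete distributions. The sketch below is only an illustration under assumptions made here: $F_n$ is taken to place equal mass on the $n$ midpoints of a uniform partition of $[0,\sqrt{L}]$, which converges weakly to the uniform distribution on that interval, and $L = 4$ is an arbitrary choice. The function names are hypothetical.

```python
import math


def expected_log_term(n, L=4.0):
    """Integral of log(1 + r^2) dF_n(r), with F_n uniform on n midpoints of [0, sqrt(L)]."""
    width = math.sqrt(L)
    total = 0.0
    for i in range(n):
        r = width * (i + 0.5) / n  # i-th mass point of F_n
        total += math.log1p(r * r)
    return total / n


def limiting_log_term(L=4.0):
    """Same integral under the weak limit F (uniform on [0, sqrt(L)])."""
    b = math.sqrt(L)
    # An antiderivative of log(1 + r^2) is r*log(1 + r^2) - 2r + 2*atan(r).
    return (b * math.log1p(b * b) - 2.0 * b + 2.0 * math.atan(b)) / b


if __name__ == "__main__":
    limit = limiting_log_term()
    for n in (10, 100, 1000):
        print(n, abs(expected_log_term(n) - limit))
```

Because $\log(1+r^2)$ is bounded and continuous on $[0,\sqrt{L}]$, the gap printed above shrinks as $n$ grows, which is the behavior (132) asserts for any weakly convergent sequence.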
In order to show (130) and hence the weak* continuity of the mutual information, we need to prove

$$\lim_{n\to\infty} \int_0^\infty dK\, f_K(K) \int_0^\infty dR\, \frac{R^{m-2}}{(m-2)!}\, g(R, F_n, K) \log g(R, F_n, K) \tag{136}$$

$$= \int_0^\infty dK\, f_K(K) \lim_{n\to\infty} \int_0^\infty dR\, \frac{R^{m-2}}{(m-2)!}\, g(R, F_n, K) \log g(R, F_n, K) \tag{137}$$

$$= \int_0^\infty dK\, f_K(K) \int_0^\infty dR\, \frac{R^{m-2}}{(m-2)!} \lim_{n\to\infty} g(R, F_n, K) \log g(R, F_n, K) \tag{138}$$

$$= \int_0^\infty dK\, f_K(K) \int_0^\infty dR\, \frac{R^{m-2}}{(m-2)!}\, g(R, F, K) \log g(R, F, K). \tag{139}$$

(139) follows from (134) and the continuity of the function $x \log x$. In order to justify the interchanges of the limit and integral in (137) and (138), we invoke the Dominated Convergence Theorem [40], which requires an integrable upper bound on the integrand. We first find the following upper bound on the function $g$:

$$g(R, F_n, K) \le (m-2) \int_0^{\sqrt{L}} \frac{e^{-\frac{R+Kr^2}{1+r^2}}}{1+r^2}\, I_0\!\left(\frac{2\sqrt{KR}\,r}{1+r^2}\right) dF_n(r) \tag{140}$$

$$\le (m-2)\, e^{-\frac{R}{1+L}+\sqrt{KR}} \int_0^{\sqrt{L}} \frac{e^{-\frac{Kr^2}{1+r^2}}}{1+r^2}\, dF_n(r) \tag{141}$$

$$\le (m-2)\, e^{-\frac{R}{1+L}+\sqrt{KR}} \tag{142}$$

$$\triangleq u(R, K) \quad \forall n,\ \forall R, K \ge 0. \tag{143}$$

(140) follows from the upper bound in (131). (141) is obtained by noting that $e^{-\frac{R}{1+r^2}} \le e^{-\frac{R}{1+L}}$ for all $r \in [0,\sqrt{L}]$ and $R \ge 0$, and $I_0\!\left(\frac{2\sqrt{KR}\,r}{1+r^2}\right) \le I_0(\sqrt{KR}) \le e^{\sqrt{KR}}$ for all $R, r \ge 0$. Finally, (142) follows from the observation that the integrand in (141) is at most 1 for all $r, K \ge 0$. Note that the upper bound $u(R,K)$ is not a function of $F_n$ and decreases exponentially in $R$ for sufficiently large values of $R$. Next, we find the following upper bound:

$$\left|\frac{R^{m-2}}{(m-2)!}\, g(R, F_n, K) \log g(R, F_n, K)\right| \le \frac{R^{m-2}}{(m-2)!} \left(4g^{0.9}(R, F_n, K) + g^2(R, F_n, K)\right) \tag{144}$$

$$\le \frac{R^{m-2}}{(m-2)!} \left(4u^{0.9}(R, K) + u^2(R, K)\right) \quad \forall R, K \ge 0. \tag{145}$$

(144) follows from the fact that $|x \log x| \le 4x^{0.9} + x^2$ for all $x \ge 0$, and (145) follows from (143).
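The elementary inequality behind (144) can be spot-checked numerically. The following Python sketch is an illustration rather than a proof: it assumes the natural logarithm, uses the convention $0 \log 0 = 0$, and scans a logarithmically spaced grid whose range and density are arbitrary choices made here. The function names are hypothetical.

```python
import math


def abs_x_log_x(x):
    """|x log x| with the convention 0 * log 0 = 0 (natural logarithm assumed)."""
    return 0.0 if x == 0.0 else abs(x * math.log(x))


def envelope(x):
    """Right-hand side of the inequality used in (144)."""
    return 4.0 * x ** 0.9 + x ** 2


def worst_ratio(num_points=2001, lo_exp=-12.0, hi_exp=6.0):
    """Largest value of |x log x| / (4 x^0.9 + x^2) on a log-spaced grid."""
    worst = 0.0
    for i in range(num_points):
        exponent = lo_exp + (hi_exp - lo_exp) * i / (num_points - 1)
        x = 10.0 ** exponent
        worst = max(worst, abs_x_log_x(x) / envelope(x))
    return worst


if __name__ == "__main__":
    print(worst_ratio())
```

On this grid the ratio stays strictly below 1, with its largest values occurring for small $x$, where the $4x^{0.9}$ term is the one that dominates $x|\log x|$.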
Note that the upper bound in (145) does not depend on $F_n$ and is integrable due to the exponential decay of $u(R,K)$ in $R$ for sufficiently large values of $R$. Applying the Dominated Convergence Theorem with the upper bound in (145) justifies (138). We further consider

$$\left|f_K(K) \int_0^\infty dR\, \frac{R^{m-2}}{(m-2)!}\, g(R, F_n, K) \log g(R, F_n, K)\right| \le f_K(K) \int_0^\infty dR\, \frac{R^{m-2}}{(m-2)!} \left|g(R, F_n, K) \log g(R, F_n, K)\right| \tag{146}$$

$$\le f_K(K) \int_0^\infty dR\, \frac{R^{m-2}}{(m-2)!} \left(4g^{0.9}(R, F_n, K) + g^2(R, F_n, K)\right). \tag{147}$$

Note that $f_K(K) = \frac{1}{E\{K\}} e^{-\frac{K}{E\{K\}}}$ where $E\{K\} = \frac{\gamma^2\delta mP}{N_0}$. The integral of the upper bound $u(R,K)$ with respect to $R$ increases exponentially with $K$. Hence, we need to find a tighter upper bound. We have

$$g(R, F_n, K) \le (m-2) \int_0^{\sqrt{L}} \frac{e^{-\frac{R+Kr^2}{1+r^2}}}{1+r^2}\, I_0\!\left(\frac{2\sqrt{KR}\,r}{1+r^2}\right) dF_n(r) \tag{148}$$

$$\le (m-2) \int_0^{\sqrt{L}} e^{-\frac{R+Kr^2-2\sqrt{KR}\,r}{1+r^2}}\, dF_n(r) \tag{149}$$

$$\le (m-2) \int_0^{\sqrt{L}} e^{-\frac{(\sqrt{R}-\sqrt{K}r)^2}{1+L}}\, dF_n(r) \tag{150}$$

$$\le \begin{cases} (m-2), & R \le KL \\ (m-2)\, e^{-\frac{(\sqrt{R}-\sqrt{KL})^2}{1+L}}, & R > KL \end{cases} \tag{151}$$

$$\triangleq v(R, K) \quad \forall n,\ \forall R, K \ge 0 \tag{152}$$

where (149) follows from the facts that $I_0(x) \le e^x$ and $\frac{1}{1+r^2} \le 1$, and (150) follows by choosing the largest value $r = \sqrt{L}$ in the denominator of the exponent. (151) is obtained by noting that $(\sqrt{R}-\sqrt{K}r)^2$ is a nonnegative quadratic function of $r$, minimized at $r = \sqrt{\frac{R}{K}}$. Hence, if $L \ge \frac{R}{K}$, the minimum value of the quadratic function over $r \in [0,\sqrt{L}]$ is zero. Otherwise, it is $(\sqrt{R}-\sqrt{KL})^2$. From (147) and (152), we have

$$\left|f_K(K) \int_0^\infty dR\, \frac{R^{m-2}}{(m-2)!}\, g(R, F_n, K) \log g(R, F_n, K)\right| \le f_K(K) \int_0^\infty dR\, \frac{R^{m-2}}{(m-2)!} \left(4v^{0.9}(R, K) + v^2(R, K)\right). \tag{153}$$

Note that the upper bound in (153) is independent of $F_n$.
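The chain (148)-(151) can be sanity-checked pointwise in $r$, since a bound that holds for every $r \in [0,\sqrt{L}]$ also holds after integrating against any $F_n$ supported on that interval. The sketch below is a hedged illustration: the power-series evaluation of $I_0$, the grid ranges, the choice $L = 4$, and all function names are assumptions made here, and both sides are divided by the common factor $(m-2)$.

```python
import math


def bessel_i0(x, max_terms=200):
    """Modified Bessel function I_0 via its power series (adequate for moderate x)."""
    total, term = 1.0, 1.0
    q = (x * x) / 4.0
    for k in range(1, max_terms):
        term *= q / (k * k)  # term_k = (x^2 / 4)^k / (k!)^2
        total += term
        if term < 1e-17 * total:
            break
    return total


def integrand(R, K, r):
    """Integrand of (148), without the (m - 2) factor."""
    d = 1.0 + r * r
    return math.exp(-(R + K * r * r) / d) / d * bessel_i0(2.0 * math.sqrt(K * R) * r / d)


def v_normalized(R, K, L):
    """v(R, K) of (151)-(152), divided by (m - 2)."""
    if R <= K * L:
        return 1.0
    return math.exp(-((math.sqrt(R) - math.sqrt(K * L)) ** 2) / (1.0 + L))


def bound_holds(L=4.0, grid=40, R_max=30.0, K_max=10.0):
    """Check integrand <= v / (m - 2) on a grid of (R, K, r) values."""
    for i in range(grid + 1):
        R = R_max * i / grid
        for j in range(grid + 1):
            K = K_max * j / grid
            bound = v_normalized(R, K, L) * (1.0 + 1e-12)  # small numerical slack
            for k in range(grid + 1):
                r = math.sqrt(L) * k / grid
                if integrand(R, K, r) > bound:
                    return False
    return True


if __name__ == "__main__":
    print(bound_holds())
```

A grid check of this kind does not replace the argument in the text, but it is a quick way to catch sign or exponent slips when transcribing the bounds.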
It can also be verified easily that this upper bound is integrable with respect to $K$ due to the facts that $f_K$ decreases exponentially with $K$ while the integral in the upper bound produces a result that is at most polynomial in $K$. Applying the Dominated Convergence Theorem with the integrable upper bound in (153) justifies (137). Hence, the proof is complete.

2) Sufficient and Necessary Kuhn-Tucker Condition: The proof of the sufficient and necessary condition in (70) follows along the same lines as those in [4] and [16]. The weak derivative of $I(\cdot|\hat{h})$ at $F_0$ is defined as

$$I'_{F_0}(F|\hat{h}) \triangleq \lim_{\theta\to 0} \frac{I[(1-\theta)F_0 + \theta F \mid \hat{h}] - I(F_0|\hat{h})}{\theta}. \tag{154}$$

The weak derivative of the mutual information in (66) is obtained as

$$I'_{F_0}(F|\hat{h}) = E_K\left\{\int dF_0(r) \int_0^\infty f_{R|r,K}(R|r,K) \log g(R, F_0, K)\, dR\right\} - E_K\left\{\int dF(r) \int_0^\infty f_{R|r,K}(R|r,K) \log g(R, F_0, K)\, dR\right\} + \int dF_0(r) \log(1+r^2) - \int dF(r) \log(1+r^2). \tag{155}$$

Note that if $F_0$ is indeed the maximizing distribution and hence capacity achieving, then $I'_{F_0}(F|\hat{h}) \le 0$ for all $F$ satisfying the peak power constraint. Then, using the same steps as in [4, Appendix II, Theorem 4], we can show that $F_0$ is a capacity-achieving input distribution if and only if

$$E_K\left\{\int_0^\infty f_{R|r,K}(R|r,K) \log g(R, F_0, K)\, dR\right\} + \log(1+r^2) + mC_\delta + m - 1 \ge 0 \quad \forall r \in [0,\sqrt{L}] \tag{156}$$

with equality at the points of increase of the distribution $F_0$.

C. Analyticity of the Kuhn-Tucker Condition in the Complex Domain

We consider the following function, which is the left-hand side of the Kuhn-Tucker condition (70) in the complex domain:

$$\Phi(z) = E_K\left\{\int_0^\infty f_{R|r,K}(R|z,K) \log g(R, F_r, K)\, dR\right\} + \log(1+z) + mC_\delta + (m-1).$$
(157)

Note that $\log(1+z)$ is an analytic function of $z = z_r + jz_i$ in the entire complex plane excluding the real axis with $z_r \le -1$, because the principal branch of the logarithm fails to be analytic only on the negative real line. Next, we investigate the region in which the first term of (157) is analytic. We first note the Differentiation Lemma.

Differentiation Lemma 1: [42, Sec. XII] Let $I$ be an interval of real numbers, possibly infinite. Let $U$ be an open set of complex numbers. Let $f = f(t,z)$ be a continuous function on $I \times U$. Assume: (i) For each compact subset $K$ of $U$ the integral $\int_I f(t,z)\,dt$ is uniformly convergent for $z \in K$. (ii) For each $t$ the function $z \mapsto f(t,z)$ is analytic. Let $F(z) = \int_I f(t,z)\,dt$. Then $F$ is analytic on $U$ and $F'(z) = \int_I Df(t,z)\,dt$ where $D$ is the differentiation operator. Furthermore, $Df(t,z)$ satisfies the same hypotheses as $f$. $\square$

The integral $\int_0^\infty f(t,z)\,dt$ is said to be uniformly convergent [42] for $z \in K$ if, given $\epsilon > 0$, there exists $B_0$ such that if $B_0 < B_1 < B_2$, then $\left|\int_{B_1}^{B_2} f(t,z)\,dt\right| < \epsilon$. From this definition it can be easily shown that if $\int_0^\infty |f(t,z)|\,dt < \infty$, then $\int_0^\infty f(t,z)\,dt$ is uniformly convergent. The function

$$f_{R|r,K}(R|z,K) = \frac{R^{m-2}}{(m-3)!}\, \frac{e^{-R-\frac{Kz^2}{1+z^2}}}{1+z^2} \int_0^1 (1-a)^{m-3}\, e^{\frac{az^2R}{1+z^2}}\, I_0\!\left(\frac{2\sqrt{KR}\,z\sqrt{a}}{1+z^2}\right) da \tag{158}$$

is analytic in the entire complex plane excluding the points $z = \pm j$ because rational functions are analytic everywhere except at the points that make the denominator zero; the exponential function and $I_0$ are analytic everywhere because they can be expanded as power series; and if $g$ and $f$ are analytic then $g \circ f$ is also analytic in the corresponding region. The analyticity of the integral in (158) can also be easily verified using the Differentiation Lemma since the integration is over a finite interval.
In order to find the region in which the first term on the right-hand side of (157) is analytic, we need to find the region $D$ that satisfies, for all $z \in D$,

$$\int_0^\infty f_K(K) \int_0^\infty \left|f_{R|r,K}(R|z,K)\right| \left|\log g(R, F_r, K)\right| dR\, dK < \infty. \tag{159}$$

We consider

$$\left|f_{R|r,K}(R|z,K)\right| = \left|\frac{R^{m-2}}{(m-3)!}\, \frac{e^{-\frac{Kz^2}{1+z^2}}}{1+z^2} \int_0^1 (1-a)^{m-3}\, e^{-R\frac{1+(1-a)z^2}{1+z^2}}\, I_0\!\left(\frac{2\sqrt{KR}\,z\sqrt{a}}{1+z^2}\right) da\right| \tag{160}$$

$$\le \frac{R^{m-2}}{(m-3)!}\, \frac{\left|e^{-\frac{Kz^2}{1+z^2}}\right|}{|1+z^2|} \int_0^1 (1-a)^{m-3} \left|e^{-R\frac{1+(1-a)z^2}{1+z^2}}\right| \left|I_0\!\left(\frac{2\sqrt{KR}\,z\sqrt{a}}{1+z^2}\right)\right| da \tag{161}$$

$$\le \frac{R^{m-2}}{(m-3)!}\, \frac{e^{-\Re\left\{\frac{Kz^2}{1+z^2}\right\}}}{|1+z^2|} \int_0^1 (1-a)^{m-3}\, e^{-R\,\Re\left\{\frac{1+(1-a)z^2}{1+z^2}\right\}}\, I_0\!\left(2\sqrt{KR}\sqrt{a}\,\Re\left\{\frac{z}{1+z^2}\right\}\right) da \tag{162}$$

$$\le \frac{R^{m-2}}{(m-3)!}\, \frac{e^{-\Re\left\{\frac{Kz^2}{1+z^2}\right\}}}{|1+z^2|} \int_0^1 (1-a)^{m-3}\, e^{-R\,\Re\left\{\frac{1+(1-a)z^2}{1+z^2}\right\}}\, e^{2\sqrt{KR}\sqrt{a}\left|\Re\left\{\frac{z}{1+z^2}\right\}\right|}\, da \tag{163}$$

$$\le \frac{R^{m-2}}{(m-3)!}\, \frac{e^{-\Re\left\{\frac{Kz^2}{1+z^2}\right\}}}{|1+z^2|} \int_0^1 (1-a)^{m-3}\, e^{-R\frac{1+z_r^2-z_i^2}{|1+z^2|^2}}\, e^{2\sqrt{KR}\left|\Re\left\{\frac{z}{1+z^2}\right\}\right|}\, da \tag{164}$$

$$= \frac{R^{m-2}}{(m-2)!}\, \frac{e^{-\Re\left\{\frac{Kz^2}{1+z^2}\right\}}}{|1+z^2|}\, e^{-R\frac{1+z_r^2-z_i^2}{|1+z^2|^2}}\, e^{2\sqrt{KR}\left|\Re\left\{\frac{z}{1+z^2}\right\}\right|} \tag{165}$$

$$= \frac{R^{m-2}}{(m-2)!}\, \frac{e^{-K\frac{(z_r^2-z_i^2)(1+z_r^2-z_i^2)+4z_r^2z_i^2}{|1+z^2|^2}}}{|1+z^2|}\, e^{-R\frac{1+z_r^2-z_i^2}{|1+z^2|^2}}\, e^{2\sqrt{KR}\frac{\left|z_r(1+z_r^2-z_i^2)+2z_rz_i^2\right|}{|1+z^2|^2}} \tag{166}$$

$$= \frac{R^{m-2}}{(m-2)!}\, \frac{e^{-K\frac{(z_r^2-z_i^2)(1+z_r^2-z_i^2)+4z_r^2z_i^2}{|1+z^2|^2}}}{|1+z^2|}\, \exp\left\{-\frac{\left(\sqrt{R(1+z_r^2-z_i^2)} - \frac{\sqrt{K}\left|z_r(1+z_r^2-z_i^2)+2z_rz_i^2\right|}{\sqrt{1+z_r^2-z_i^2}}\right)^2}{|1+z^2|^2}\right\} \exp\left\{\frac{K\left(z_r(1+z_r^2-z_i^2)+2z_rz_i^2\right)^2}{(1+z_r^2-z_i^2)\,|1+z^2|^2}\right\} \tag{167}$$

$$= \frac{R^{m-2}}{(m-2)!}\, \frac{\exp\left\{-K\left(\frac{(z_r^2-z_i^2)(1+z_r^2-z_i^2)+4z_r^2z_i^2}{|1+z^2|^2} - \frac{\left(z_r(1+z_r^2-z_i^2)+2z_rz_i^2\right)^2}{(1+z_r^2-z_i^2)\,|1+z^2|^2}\right)\right\}}{|1+z^2|}\, \exp\left\{-\frac{\left(\sqrt{R(1+z_r^2-z_i^2)} - \frac{\sqrt{K}\left|z_r(1+z_r^2-z_i^2)+2z_rz_i^2\right|}{\sqrt{1+z_r^2-z_i^2}}\right)^2}{|1+z^2|^2}\right\} \tag{168}$$

$$= \frac{R^{m-2}}{(m-2)!}\, \frac{\exp\left\{K\,\frac{z_i^2(1+z_r^2-z_i^2)+\frac{4z_r^2z_i^4}{1+z_r^2-z_i^2}}{|1+z^2|^2}\right\}}{|1+z^2|}\, \exp\left\{-\frac{\left(\sqrt{R(1+z_r^2-z_i^2)} - \frac{\sqrt{K}\left|z_r(1+z_r^2-z_i^2)+2z_rz_i^2\right|}{\sqrt{1+z_r^2-z_i^2}}\right)^2}{|1+z^2|^2}\right\}. \tag{169}$$

In the above formulations, $\Re(z)$ denotes the real part of the complex-valued number $z = z_r + jz_i$, whose real and imaginary components are denoted by $z_r$ and $z_i$, respectively. (161) follows by taking the absolute value of the integrand instead of the absolute value of the integral. (162) follows from the facts that $|e^z| = e^{\Re(z)}$ and $|I_0(z)| \le I_0(\Re(z))$. (163) is due to $I_0(x) \le e^{|x|}$ for a real number $x$. (164) is obtained from the bounds $\Re\left\{\frac{1+(1-a)z^2}{1+z^2}\right\} \ge \frac{1+z_r^2-z_i^2}{|1+z^2|^2}$, which holds for all $a \in [0,1]$ and $|z_r| \ge |z_i|$, and $e^{2\sqrt{KR}\sqrt{a}\left|\Re\left\{\frac{z}{1+z^2}\right\}\right|} \le e^{2\sqrt{KR}\left|\Re\left\{\frac{z}{1+z^2}\right\}\right|}$ for all $a \in [0,1]$. (165) follows by evaluating the integral in (164), in which the only term that depends on $a$ is $(1-a)^{m-3}$. (166) is obtained by explicitly expressing $\Re\left\{\frac{Kz^2}{1+z^2}\right\}$ and $\Re\left\{\frac{z}{1+z^2}\right\}$ in terms of $z_r$ and $z_i$, the real and imaginary components of $z$. (167) follows by expressing the exponents of the second and third exponential functions as a quadratic function of $\sqrt{R}$. Eventually, (169) is obtained from straightforward algebraic computations. The following lower bound on $g(R, F_r, K)$ can easily be verified by noting that $e^x \ge 1$ and $I_0(x) \ge 1$ for all $x \ge 0$:

$$g(R, F_r, K) \ge e^{-R} \int_0^{\sqrt{L}} \frac{e^{-\frac{Kr^2}{1+r^2}}}{1+r^2}\, dF_r \ge e^{-R}\, \frac{e^{-\frac{KL}{1+L}}}{1+L}.$$
(170)

From the above lower bound, we see that $|\log g(R, F_r, K)|$ increases at most linearly in both $R$ and $K$ for sufficiently large values of $R$ and $K$. Therefore, if $(1+z_r^2-z_i^2) > 0$, then the upper bound in (169) decreases exponentially in $R$, and as a result, the inner integral in (159) converges. This condition is satisfied in the region where $|z_r| \ge |z_i|$. Note that the upper bound in (169) increases exponentially in $K$. However, the value of the function

$$c(z_r, z_i) = \frac{z_i^2(1+z_r^2-z_i^2) + \frac{4z_r^2z_i^4}{1+z_r^2-z_i^2}}{|1+z^2|^2} \tag{171}$$

can be made arbitrarily small by choosing arbitrarily small values for $|z_i|$. Note also that $f_K(K) = \frac{1}{E\{K\}} e^{-\frac{K}{E\{K\}}}$ where $E\{K\} = \frac{\gamma^2\delta mP}{N_0}$. Hence, in the region where $c(z_r, z_i) < \frac{N_0}{\gamma^2\delta mP}$, the integrand in (159) decreases exponentially in $K$ and as a result the integral converges. It can be shown that for a fixed $|z_i| < 1$, $c(z_r, z_i)$ is a monotonically decreasing function of $z_r \ge 0$, achieving its maximum of $\frac{z_i^2}{1-z_i^2}$ at $z_r = 0$. Hence, we consider the following region in the complex domain:

$$D = \left\{(z_r, z_i) : 0 \le z_r \le \min\left(\frac{1}{\sqrt{2}}, \sqrt{\frac{N_0}{2\gamma^2\delta mP}}\right) \text{ and } |z_i| \le z_r\right\} \cup \left\{(z_r, z_i) : z_r > \min\left(\frac{1}{\sqrt{2}}, \sqrt{\frac{N_0}{2\gamma^2\delta mP}}\right) \text{ and } |z_i| \le \min\left(\frac{1}{\sqrt{2}}, \sqrt{\frac{N_0}{2\gamma^2\delta mP}}\right)\right\} \tag{172}$$

In region $D$, $c(z_r, z_i) < \frac{N_0}{\gamma^2\delta mP}$ and $|z_r| \ge |z_i|$. Hence, the integral in (159) converges in this region. Moreover, this region includes the positive real line.

REFERENCES

[1] T. Ericsson, “A Gaussian channel with slow fading,” IEEE Trans. Inform. Theory, vol. 16, pp. 353-355, May 1970.
[2] L. H. Ozarow, S. Shamai, and A. D. Wyner, “Information theoretic considerations for cellular mobile radio,” IEEE Trans. Vehicular Technology, vol. 43, pp. 359-377, May 1994.
[3] A. J. Goldsmith and P. P. Varaiya, “Capacity of fading channels with channel side information,” IEEE Trans. Inform. Theory, vol. 43, pp. 1986-1992, Nov. 1997.
[4] I. Abou-Faycal, M. D. Trott, and S. Shamai (Shitz), “The capacity of discrete-time memoryless Rayleigh fading channels,” IEEE Trans. Inform. Theory, vol. 47, pp. 1290-1301, May 2001.
[5] T. L. Marzetta and B. M. Hochwald, “Capacity of a mobile multiple-antenna communication link in Rayleigh flat fading,” IEEE Trans. Inform. Theory, vol. 45, pp. 139-157, Jan. 1999.
[6] T. L. Marzetta and B. M. Hochwald, “Unitary space-time modulation for multiple-antenna communications in Rayleigh flat fading,” IEEE Trans. Inform. Theory, vol. 46, pp. 543-564, Mar. 2000.
[7] J. Huang and S. P. Meyn, “Characterization and computation of optimal distributions for channel coding,” IEEE Trans. Inform. Theory, vol. 51, pp. 2336-2351, July 2005.
[8] T. H. Chan, S. Hranilovic, and F. R. Kschischang, “Capacity-achieving probability measure for conditionally Gaussian channels with bounded inputs,” IEEE Trans. Inform. Theory, vol. 51, pp. 2073-2088, June 2005.
[9] A. Lapidoth and S. M. Moser, “Capacity bounds via duality with applications to multiple-antenna systems on flat-fading channels,” IEEE Trans. Inform. Theory, vol. 49, pp. 2426-2467, Oct. 2003.
[10] L. Zheng and D. N. C. Tse, “Communication on the Grassmann manifold: A geometric approach to the noncoherent multiple-antenna channel,” IEEE Trans. Inform. Theory, vol. 48, pp. 359-383, Feb. 2002.
[11] L. Zheng, D. N. C. Tse, and M. Médard, “Channel coherence in the low SNR regime,” IEEE Trans. Inform. Theory, vol. 53, pp. 976-997, March 2007.
[12] A. Lapidoth and S. Shamai (Shitz), “Fading channels: How perfect need ‘perfect side information’ be?,” IEEE Trans. Inform. Theory, vol. 48, pp. 1118-1134, May 2002.
[13] M. Médard, “The effect upon channel capacity in wireless communications of perfect and imperfect knowledge of channel,” IEEE Trans. Inform. Theory, vol. 46, pp. 933-946, May 2000.
[14] S. Verdú, “Spectral efficiency in the wideband regime,” IEEE Trans. Inform. Theory, vol. 48, pp. 1319-1343, June 2002.
[15] M. Médard and R. G. Gallager, “Bandwidth scaling for fading multipath channels,” IEEE Trans. Inform. Theory, vol. 48, pp. 840-852, Apr. 2002.
[16] M. C. Gursoy, H. V. Poor, and S. Verdú, “The noncoherent Rician fading channel – Part I: Structure of the capacity-achieving input,” IEEE Trans. Wireless Commun., vol. 4, no. 5, pp. 2193-2206, Sept. 2005.
[17] M. C. Gursoy, H. V. Poor, and S. Verdú, “The noncoherent Rician fading channel – Part II: Spectral efficiency in the low power regime,” IEEE Trans. Wireless Commun., vol. 4, no. 5, pp. 2207-2221, Sept. 2005.
[18] M. C. Gursoy, H. V. Poor, and S. Verdú, “Spectral Efficiency of Peak Power Limited Rician Block-Fading Channels,” Proc. 2004 IEEE Int'l. Symp. Inform. Theory, Chicago, IL, June 27 - July 2, 2004.
[19] V. G. Subramanian and B. Hajek, “Broad-band fading channels: signal burstiness and capacity,” IEEE Trans. Inform. Theory, vol. 48, pp. 809-827, Apr. 2002.
[20] I. E. Telatar and D. N. C. Tse, “Capacity and mutual information of wideband multipath fading channels,” IEEE Trans. Inform. Theory, vol. 46, pp. 1384-1400, July 2000.
[21] V. Sethuraman and B. Hajek, “Capacity Per Unit Energy of Fading Channels With a Peak Constraint,” IEEE Trans. Inform. Theory, vol. 51, pp. 3102-3120, Sept. 2005.
[22] L. Tong, B. M. Sadler, and M. Dong, “Pilot-assisted wireless transmission,” IEEE Signal Processing Mag., pp. 12-25, Nov. 2004.
[23] B. Hassibi and B. M. Hochwald, “How much training is needed in multiple-antenna wireless links,” IEEE Trans. Inform. Theory, vol. 49, pp. 951-963, Apr. 2003.
[24] J. Baltersee, G. Fock, and H. Meyr, “An information theoretic foundation of synchronized detection,” IEEE Trans. Commun., vol. 49, pp. 2115-2123, Dec. 2001.
[25] J. Baltersee, G. Fock, and H. Meyr, “Achievable rate of MIMO Channels with data-aided channel estimation and perfect interleaving,” IEEE J. Select. Areas Commun., vol. 19, pp. 2358-2368, Dec. 2001.
[26] D. Samardzija and N. Mandayam, “Pilot-assisted estimation of MIMO fading channel response and achievable data rates,” IEEE Trans. Signal Process., vol. 51, pp. 2882-2890, Nov. 2003.
[27] S. Ohno and G. B. Giannakis, “Average-rate optimal PSAM transmissions over time-selective fading channels,” IEEE Trans. Wireless Commun., vol. 1, pp. 712-720, Oct. 2002.
[28] S. Misra, A. Swami, and L. Tong, “Optimal training for time-selective wireless fading channels using cutoff rate,” EURASIP J. Appl. Signal Process., vol. 2006, pp. 1-15, 2006.
[29] X. Deng and A. M. Haimovich, “Achievable rates over time-varying Rayleigh fading channels,” IEEE Trans. Commun., vol. 55, pp. 1397-1406, July 2007.
[30] I. Abou-Faycal, M. Médard, and U. Madhow, “Binary adaptive coded pilot symbol assisted modulation over Rayleigh fading channels without feedback,” IEEE Trans. Commun., vol. 53, pp. 1036-1046, June 2005.
[31] S. Furrer and D. Dahlhaus, “Multiple-antenna signaling over fading channels with estimated channel state information: Capacity Analysis,” IEEE Trans. Inform. Theory, vol. 53, pp. 2028-2043, June 2007.
[32] S. Adireddy, L. Tong, and H. Viswanathan, “Optimal placement of training for frequency-selective block-fading channels,” IEEE Trans. Inform. Theory, vol. 48, pp. 2338-2353, Aug. 2002.
[33] M. F. Sencan and M. C. Gursoy, “Achievable rates for pilot-assisted transmission over Rayleigh fading channels,” Proceedings of the 40th Annual Conference on Information Sciences and Systems, Princeton University, Princeton, NJ, March 22-24, 2006.
[34] S. Akin and M. C. Gursoy, “Training optimization for Gauss-Markov Rayleigh fading channels,” Proc. of the IEEE International Conference on Communications (ICC), Glasgow, 2007.
[35] S. Akin and M. C. Gursoy, “Pilot-symbol-assisted communications with noncausal and causal Wiener filters,” submitted to the IEEE International Conference on Communications (ICC), Beijing, 2008.
[36] J. A. Gansman, M. P. Fitz, and J. V. Krogmeier, “Optimum and suboptimum frame synchronization for pilot-symbol-assisted modulation,” IEEE Trans. Inform. Theory, vol. 45, pp. 1327-1337, Oct. 1997.
[37] W.-Y. Kuo and M. P. Fitz, “Frequency offset compensation of pilot symbol assisted modulation in frequency flat fading,” IEEE Trans. Commun., vol. 45, pp. 1412-1416, Nov. 1997.
[38] G. K. Kaleh, “Channel equalization for block transmission systems,” IEEE J. Select. Areas Commun., vol. 13, pp. 1728-1736, Jan. 1995.
[39] D. G. Luenberger, Optimization by Vector Space Methods. Wiley: New York, 1969.
[40] W. Rudin, Principles of Mathematical Analysis. McGraw Hill: New York, 1964.
[41] K. Knopp, Theory of Functions. Dover: New York, 1945.
[42] S. Lang, Complex Analysis, 2nd ed. Springer-Verlag: New York, 1985.
[43] J. G. Proakis, Digital Communications, 3rd ed. McGraw-Hill: New York, 1995.
