Hessian and concavity of mutual information, differential entropy, and entropy power in linear vector Gaussian channels
Miquel Payaró and Daniel P. Palomar

Abstract—Within the framework of linear vector Gaussian channels with arbitrary signaling, closed-form expressions for the Jacobian of the minimum mean square error and Fisher information matrices with respect to arbitrary parameters of the system are calculated in this paper. Capitalizing on prior research where the minimum mean square error and Fisher information matrices were linked to information-theoretic quantities through differentiation, closed-form expressions for the Hessian of the mutual information and the differential entropy are derived. These expressions are then used to assess the concavity properties of mutual information and differential entropy under different channel conditions and also to derive a multivariate version of the entropy power inequality due to Costa.

A shorter version of this paper is to appear in IEEE Transactions on Information Theory. This work was supported by the RGC 618008 research grant. M. Payaró conducted his part of this research while he was with the Department of Electronic and Computer Engineering, Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong. He is now with the Centre Tecnològic de Telecomunicacions de Catalunya (CTTC), Barcelona, Spain (e-mail: miquel.payaro@cttc.es). D. P. Palomar is with the Department of Electronic and Computer Engineering, Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong (e-mail: palomar@ust.hk).

I. INTRODUCTION AND MOTIVATION

Closed-form expressions for the Hessian matrix of the mutual information with respect to arbitrary parameters of the system are useful from a theoretical perspective but also from a practical standpoint. In system design, if the mutual information is to be optimized through a gradient algorithm as in [1], the Hessian matrix may be used alongside the gradient in Newton's method to speed up the convergence of the algorithm. Additionally, from a system analysis perspective, the Hessian matrix can also complement the gradient in studying the sensitivity of the mutual information to variations of the system parameters and, more importantly, in the cases where the mutual information is concave with respect to the system design parameters, it can also be used to guarantee the global optimality of a given design.

In this sense, and within the framework of linear vector Gaussian channels with arbitrary signaling, the purpose of this work is twofold. First, we find closed-form expressions for the Hessian matrix of the mutual information, differential entropy, and entropy power with respect to arbitrary parameters of the system and, second, we study the concavity properties of these quantities. Both goals are intimately related since concavity can be assessed through the negative definiteness of the Hessian matrix. As intermediate results of our study, we derive closed-form expressions for the Jacobian of the minimum mean-square error (MMSE) and Fisher information matrices, which are interesting results in their own right and contribute to the exploration of the fundamental links between information theory and estimation theory.

Initial connections between information- and estimation-theoretic quantities for linear channels with additive Gaussian noise date back to the late fifties: in the proof of Shannon's entropy power inequality [2], Stam used the fact that the derivative of the output differential entropy with respect to the added noise power is equal to the Fisher information of the channel output and attributed this identity to De Bruijn. More than a decade later, the links between both worlds strengthened when Duncan [3] and Kadota, Zakai, and Ziv [4] independently represented mutual information as a function of the error in causal filtering.
Much more recently, in [5], Guo, Shamai, and Verdú fruitfully explored these connections further and, as their main result, proved that the derivative of the mutual information (and differential entropy) with respect to the signal-to-noise ratio (SNR) is equal to half the MMSE regardless of the input statistics. The main result in [5] was generalized to the abstract Wiener space by Zakai in [6] and by Palomar and Verdú in two different directions: in [1] they calculated the partial derivatives of the mutual information and differential entropy with respect to arbitrary parameters of the system, rather than with respect to the SNR alone, and in [7] they represented the derivative of mutual information as a function of the conditional marginal input given the output for channels where the noise is not constrained to be Gaussian.

In this paper we build upon the setting of [1], where, loosely speaking, it was proved that, for the linear vector Gaussian channel

    Y = G S + C N,    (1)

i) the gradients of the differential entropy h(Y) and the mutual information I(S; Y) with respect to functions of the linear transformation undergone by the input, G, are linear functions of the MMSE matrix E_S, and ii) the gradient of the differential entropy h(Y) with respect to the linear transformation undergone by the noise, C, is a linear function of the Fisher information matrix J_Y.

In this work, we show that the previous two key quantities E_S and J_Y, which completely characterize the first-order derivatives, are not enough to describe the second-order derivatives. For that purpose, we introduce the more refined conditional MMSE matrix Φ_S(y) and conditional Fisher information matrix Γ_Y(y) (note that when these quantities are averaged with respect to the distribution of the output y, we recover E_S = E{Φ_S(Y)} and J_Y = E{Γ_Y(Y)}). In particular, the second-order derivatives depend on Φ_S(y) and Γ_Y(y) through the following terms: E{Φ_S(Y) ⊗ Φ_S(Y)} and E{Γ_Y(Y) ⊗ Γ_Y(Y)}. See Fig. 1 for a schematic representation of these relations.

[Fig. 1. Simplified representation of the relations between the quantities dealt with in this work: h(Y) and I(S; Y), the MMSE matrix E_S = E{Φ_S(Y)}, the Fisher information matrix J_Y = E{Γ_Y(Y)}, and the second-order terms E{Φ_S(Y) ⊗ Φ_S(Y)} and E{Γ_Y(Y) ⊗ Γ_Y(Y)}, connected through differentiation with respect to the signal parameters G and the noise parameters C of the channel Y = G S + C N. The Jacobian, D, and Hessian, H, operators represent first- and second-order differentiation, respectively.]
Analogous results to some of the expressions presented in this paper, particularized to the scalar Gaussian channel, were simultaneously derived in [8], [9], where the second and third derivatives of the mutual information with respect to the SNR were calculated.

As an application of the obtained expressions, we show concavity properties of the mutual information and the differential entropy and derive a multivariate generalization of the entropy power inequality (EPI) due to Costa in [10]. Our multivariate EPI has already found an application in [11] to derive outer bounds on the capacity region in multiuser channels with feedback.

This paper is organized as follows. In Section II, the model for the linear vector Gaussian channel is given and the differential entropy, mutual information, minimum mean-square error, and Fisher information quantities, as well as the relationships among them, are introduced. The main results of the paper are given in Section III, where we present the expressions for the Jacobian matrix of the MMSE and Fisher information and also for the Hessian matrix of the mutual information and differential entropy. In Section IV the concavity properties of the mutual information are studied, and in Section V a multivariate generalization of Costa's EPI in [10] is given. Finally, an extension of some of the obtained results to the complex-valued case is considered in Section VI.

Notation: Straight boldface denotes multivariate quantities such as vectors (lowercase) and matrices (uppercase). Uppercase italics denote random variables, and their realizations are represented by lowercase italics. The sets of q-dimensional symmetric, positive semidefinite, and positive definite matrices are denoted by S^q, S^q_+, and S^q_++, respectively. The elements of a matrix A are represented by A_ij or [A]_ij interchangeably, whereas the elements of a vector a are represented by a_i. The operator diag(A) represents a column vector with the diagonal entries of matrix A, Diag(A) and Diag(a) represent a diagonal matrix whose non-zero elements are given by the diagonal elements of matrix A and by the elements of vector a, respectively, and vec A represents the vector obtained by stacking the columns of A. For symmetric matrices, vech A is obtained from vec A by eliminating the repeated elements located above the main diagonal of A. The Kronecker matrix product is represented by A ⊗ B and the Schur (or Hadamard) element-wise matrix product is denoted by A ∘ B. The superscripts (·)^T, (·)†, and (·)^+ denote transpose, Hermitian, and Moore-Penrose pseudo-inverse operations, respectively. With a slight abuse of notation, we consider that when the square root or multiplicative inverse is applied to a vector, it acts upon the entries of the vector; we thus have [√a]_i = √(a_i) and [1/a]_i = 1/a_i.

II. SIGNAL MODEL

We consider a general discrete-time linear vector Gaussian channel, whose output Y ∈ R^n is represented by the following signal model

    Y = G S + Z,    (2)

where S ∈ R^m is the zero-mean channel input vector with covariance matrix R_S, the matrix G ∈ R^{n×m} specifies the linear transformation undergone by the input vector, and Z ∈ R^n represents zero-mean Gaussian noise with non-singular covariance matrix R_Z.
The channel transition probability density function corresponding to the channel model in (2) is

    P_{Y|S}(y|s) = P_Z(y − G s) = 1/√((2π)^n det(R_Z)) · exp( −(1/2) (y − G s)^T R_Z^{-1} (y − G s) ),    (3)

and the marginal probability density function of the output is given by¹

    P_Y(y) = E{ P_{Y|S}(y|S) },    (4)

which is an infinitely differentiable continuous function of y regardless of the distribution of the input vector S, thanks to the smoothing properties of the added noise [10, Section II].

¹ We highlight that every expression involving integrals, expectation operators, or even a density should include the statement "if it exists".

At some points, it may be convenient to define the random vector X = G S with covariance matrix given by R_X = G R_S G^T and also to express the noise vector as Z = C N, where C ∈ R^{n×n'}, with n' ≥ n, such that the noise covariance matrix R_Z = C R_N C^T has an inverse so that (3) is meaningful. With this notation, P_{Y|X}(y|x) can be obtained by replacing G s by x in (3), and the channel model (2) can be alternatively rewritten as

    Y = G S + C N = X + C N = G S + Z = X + Z.    (5)

In the following subsections we describe the information- and estimation-theoretic quantities whose relations we are interested in.

A. Differential entropy and mutual information

The differential entropy² of the continuous random vector Y is defined as [12, Chapter 9]

    h(Y) = − E{ log P_Y(Y) }.    (6)

For the case where the distribution of Y assigns positive mass to one or more singletons in R^n, the above definition is usually extended with h(Y) = −∞. For the linear vector Gaussian channel in (5), the input-output mutual information is [12, Chapter 10]

    I(S; Y) = h(Y) − h(Z) = h(Y) − (1/2) logdet(2πe R_Z) = h(Y) − (1/2) logdet(2πe C R_N C^T).    (7)

² Throughout this paper we work with natural logarithms and thus nats are used as information units.

B. MMSE matrix

We consider the estimation of the input signal S based on the observation of a realization of the output Y = y. The mean square error (MSE) matrix of an estimate Ŝ(y) of the input S given the realization of the output Y = y is defined as E{(S − Ŝ(Y))(S − Ŝ(Y))^T}, and it gives us a description of the performance of the estimator. The estimator that simultaneously achieves the minimum MSE for all the components of the estimation error vector is given by the conditional mean estimator Ŝ(y) = E{S|y}, and the corresponding MSE matrix, referred to as the MMSE matrix, is

    E_S = E{ (S − E{S|Y})(S − E{S|Y})^T }.    (8)

An alternative and useful expression for the MMSE matrix can be obtained by considering first the MMSE matrix conditioned on a specific realization of the output Y = y, which is denoted by Φ_S(y) and defined as

    Φ_S(y) = E{ (S − E{S|y})(S − E{S|y})^T | y }.    (9)

Observe from (9) that Φ_S(y) is a positive semidefinite matrix. Finally, the MMSE matrix in (8) can be obtained by taking the expectation in (9) with respect to the distribution of the output:

    E_S = E{ Φ_S(Y) }.    (10)
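For readers who want to experiment with these quantities, the following short sketch (Python; the input mass points, G, and R_Z are arbitrary choices made only for illustration) computes the conditional MMSE matrix Φ_S(y) of (9) from the posterior of a discrete input and estimates E_S = E{Φ_S(Y)} of (10) by Monte Carlo.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical small example: a two-point discrete input S, channel Y = G S + Z.
G = np.array([[1.0, 0.5], [0.2, 1.0]])
R_Z = np.array([[1.0, 0.3], [0.3, 1.0]])
R_Z_inv = np.linalg.inv(R_Z)
S_points = np.array([[1.0, -1.0], [-1.0, 1.0]])   # rows are the mass points of S
p_S = np.array([0.5, 0.5])

def p_y_given_s(y, s):
    """Gaussian transition density P_{Y|S}(y|s) as in (3)."""
    d = y - G @ s
    return np.exp(-0.5 * d @ R_Z_inv @ d) / np.sqrt((2 * np.pi) ** 2 * np.linalg.det(R_Z))

def phi_S(y):
    """Conditional MMSE matrix Phi_S(y) = Cov(S | Y = y) for a discrete input, as in (9)."""
    w = np.array([p * p_y_given_s(y, s) for p, s in zip(p_S, S_points)])
    w /= w.sum()                       # posterior P(S = s_k | y)
    mean = w @ S_points                # conditional mean E{S | y}
    diffs = S_points - mean
    return (w[:, None, None] * diffs[:, :, None] * diffs[:, None, :]).sum(axis=0)

# MMSE matrix E_S = E{Phi_S(Y)} of (10), estimated by Monte Carlo over Y = G S + Z.
N = 20000
idx = rng.choice(len(p_S), size=N, p=p_S)
Z = rng.multivariate_normal(np.zeros(2), R_Z, size=N)
Y = S_points[idx] @ G.T + Z
E_S = np.mean([phi_S(y) for y in Y], axis=0)
print(E_S)
```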
C. Fisher information matrix

Besides the MMSE matrix, another quantity that is closely related to the differential entropy is the Fisher information matrix with respect to a translation parameter, which is a special case of the Fisher information matrix [13]. The Fisher information is a measure of the minimum error in estimating a parameter of a distribution and is closely related to the Cramér-Rao lower bound [14]. For an arbitrary random vector Y, the Fisher information matrix with respect to a translation parameter is defined as

    J_Y = E{ D_y^T log P_Y(Y) · D_y log P_Y(Y) },    (11)

where D is the Jacobian operator. This operator, together with the Hessian operator H and other definitions and conventions used for differentiation with respect to multidimensional parameters, is described in Appendices A and B.

The expression of the Fisher information in (11) in terms of the Jacobian of log P_Y(y) can be transformed into an expression in terms of its Hessian matrix thanks to the logarithmic identity

    H_y log P_Y(y) = H_y P_Y(y) / P_Y(y) − D_y^T log P_Y(y) · D_y log P_Y(y),    (12)

together with the fact that E{ H_y P_Y(Y) / P_Y(Y) } = ∫ H_y P_Y(y) dy = 0, which follows directly from the expression for H_y P_Y(y) in (154) in Appendix C. The alternative expression for the Fisher information matrix in terms of the Hessian is then

    J_Y = − E{ H_y log P_Y(Y) }.    (13)

Similarly to the previous section with the MMSE matrix, it will be useful to define a conditional form of the Fisher information matrix Γ_Y(y), in such a way that J_Y = E{Γ_Y(Y)}. At this point, it may not be clear which of the two forms (11) or (13) will be more useful for the rest of the paper; we advance that defining Γ_Y(y) based on (13) will prove more convenient:

    Γ_Y(y) = − H_y log P_Y(y) = R_Z^{-1} − R_Z^{-1} Φ_X(y) R_Z^{-1},    (14)

where the second equality is proved in Lemma C.4 in Appendix C³ and where Φ_X(y) = G Φ_S(y) G^T.

³ Note that the lemmas placed in the appendices have a prefix indicating the appendix they belong to, to ease their localization. From this point on we omit the explicit reference to the appendix.

D. Prior known relations among information- and estimation-theoretic quantities

The first known relation between the above described quantities is the De Bruijn identity [2] (see also the alternative derivation in [5]), which couples the Fisher information with the differential entropy according to

    d/dt h(X + √t Z) = (1/2) Tr(J_Y),    (15)

where, in this case, Y = X + √t Z. A multivariate extension of the De Bruijn identity was found in [1] as

    ∇_C h(X + C N) = J_Y C R_N.    (16)

In [5], the more canonical operational measures of mutual information and MMSE were coupled through the identity

    d/dsnr I(S; √snr S + Z) = (1/2) Tr(E_S).    (17)

This result was generalized in [1] to the multivariate case, yielding

    ∇_G I(S; G S + Z) = R_Z^{-1} G E_S.    (18)

Note that the simple dependence of mutual information on differential entropy established in (7) implies that ∇_G I(S; G S + Z) = ∇_G h(G S + Z).

From these previous existing results, we realize that the output differential entropy function h(G S + C N) is related to the MMSE matrix E_S through differentiation with respect to the transformation G undergone by the signal S (see (18)) and is related to the Fisher information matrix J_Y through differentiation with respect to the transformation C undergone by the Gaussian noise N (see (16)). This is illustrated in Fig. 1. A comprehensive account of other relations can be found in [5].
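As a quick numerical illustration of the scalar relation (17), the sketch below (a minimal example assuming equiprobable binary signaling and unit-variance noise, not part of the paper) estimates I(S; √snr S + Z) and the MMSE by Monte Carlo with common random numbers and compares a finite-difference derivative of the mutual information with half the MMSE.

```python
import numpy as np

rng = np.random.default_rng(2)

# Common random numbers: the same input and noise samples are reused for every snr.
N = 400000
S = rng.choice([-1.0, 1.0], size=N)
Z = rng.standard_normal(N)

def I_and_mmse(snr):
    """Monte Carlo estimates of I(S; sqrt(snr) S + Z) (nats) and the MMSE."""
    Y = np.sqrt(snr) * S + Z
    s_hat = np.tanh(np.sqrt(snr) * Y)            # E{S | Y} for equiprobable +/-1 input
    mmse = np.mean((S - s_hat) ** 2)
    # p_Y(y) = 0.5 N(y; +sqrt(snr), 1) + 0.5 N(y; -sqrt(snr), 1)
    log_pY = np.logaddexp(-0.5 * (Y - np.sqrt(snr)) ** 2,
                          -0.5 * (Y + np.sqrt(snr)) ** 2) - np.log(2) - 0.5 * np.log(2 * np.pi)
    h_Y = -log_pY.mean()
    h_Z = 0.5 * np.log(2 * np.pi * np.e)
    return h_Y - h_Z, mmse

snr, d = 1.0, 1e-2
I_plus, _ = I_and_mmse(snr + d)
I_minus, _ = I_and_mmse(snr - d)
_, mmse = I_and_mmse(snr)
print((I_plus - I_minus) / (2 * d), 0.5 * mmse)   # the two values should be close
```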
Since we are interested in calculating the Hessian matrix of the differential entropy and mutual information quantities, in the light of the results in (16) and (18) it is instrumental to first calculate the Jacobian matrix of the MMSE and Fisher information matrices, as considered in the next section.

III. JACOBIAN AND HESSIAN RESULTS

In order to derive the Hessian of the differential entropy and the mutual information, we start by obtaining the Jacobians of the Fisher information matrix and the MMSE matrix.

A. Jacobian of the Fisher information matrix

As a warm-up, consider first the signal model in (5) with Gaussian signaling, Y_G = X_G + C N. In this case, the conditional Fisher information matrix defined in (14) does not depend on the realization of the received vector y and is (e.g., [14, Appendix 3C])

    Γ_{Y_G} = (R_{X_G} + R_Z)^{-1} = (R_{X_G} + C R_N C^T)^{-1}.    (19)

Consequently, we have that J_{Y_G} = E{Γ_{Y_G}} = Γ_{Y_G}. The Jacobian matrix of the Fisher information matrix with respect to the noise transformation C can be readily obtained as

    D_C J_{Y_G} = D_{R_Z} J_{Y_G} · D_C R_Z = D_{R_Z} (R_{X_G} + R_Z)^{-1} · D_C (C R_N C^T)    (20)
    = − D_n^+ (J_{Y_G} ⊗ J_{Y_G}) D_n · 2 D_n^+ (C R_N ⊗ I_n)    (21)
    = − 2 D_n^+ (J_{Y_G} ⊗ J_{Y_G}) (C R_N ⊗ I_n)    (22)
    = − 2 D_n^+ E{Γ_{Y_G} ⊗ Γ_{Y_G}} (C R_N ⊗ I_n),    (23)

where (20) follows from the Jacobian chain rule in Lemma B.5; in (21) we have applied Lemmas B.7.6 and B.7.7, with D_n^+ being the Moore-Penrose inverse of the duplication matrix D_n defined in Appendix A⁴; and finally (22) follows from the facts that D_n D_n^+ = N_n, D_n^+ N_n = D_n^+, and (A ⊗ A) N_n = N_n (A ⊗ A), which are given in (132) and (129) in Appendix A, respectively.

⁴ The matrix D_n appears in (23) and in many successive expressions because we are explicitly taking into account the fact that J_Y is a symmetric matrix. The reader is referred to Appendices A and B for more details on the conventions used in this paper.

In the following theorem we generalize (23) for the case of arbitrary signaling.

Theorem 1 (Jacobian of the Fisher information matrix): Consider the signal model Y = X + C N, where C is an arbitrary deterministic matrix, the signaling X is arbitrarily distributed, and the noise vector N is Gaussian and independent of the input X. Then, the Jacobian of the Fisher information matrix of the n-dimensional output vector Y is

    D_C J_Y = − 2 D_n^+ E{Γ_Y(Y) ⊗ Γ_Y(Y)} (C R_N ⊗ I_n),    (24)

where Γ_Y(y) is defined in (14).

Proof: Since J_Y is a symmetric matrix, its Jacobian can be written as

    D_C J_Y = D_C vech J_Y    (25)
    = D_C (D_n^+ vec J_Y)    (26)
    = D_n^+ D_C vec J_Y    (27)
    = D_n^+ ( − 2 N_n E{Γ_Y(Y) C R_N ⊗ Γ_Y(Y)} )    (28)
    = − 2 D_n^+ E{Γ_Y(Y) ⊗ Γ_Y(Y)} (C R_N ⊗ I_n),    (29)

where (26) follows from (131) in Appendix A and (27) follows from Lemma B.7.2. The expression for D_C vec J_Y is derived in Appendix D, which yields (28), and (29) follows from Lemma A.3 and D_n^+ N_n = D_n^+ as detailed in Appendix A.

Remark 1: Due to the fact that, in general, the conditional Fisher information matrix Γ_Y(y) does depend on the particular value of the observation y, it is not possible to express the expectation of the Kronecker product as the Kronecker product of the expectations, as in (22) for the Gaussian signaling case, where Γ_{Y_G} does not depend on the particular value of the observation y.
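Theorem 1 is expressed through the conditional Fisher information matrix Γ_Y(y). The following sketch (an illustrative setup with an assumed discrete input; it also relies on the standard Gaussian-channel identity D_y^T log P_Y(y) = R_Z^{-1}(E{X|y} − y), which is not derived in this paper) evaluates Γ_Y(y) via (14) and cross-checks E{Γ_Y(Y)} against the score-based definition (11).

```python
import numpy as np

rng = np.random.default_rng(5)

# Assumed toy setup: X = G S with a discrete S, Gaussian noise Z ~ N(0, R_Z).
G = np.array([[1.0, 0.4], [0.0, 1.0]])
R_Z = np.array([[0.8, 0.2], [0.2, 0.6]])
Ri = np.linalg.inv(R_Z)
S_pts = np.array([[1.0, 1.0], [-1.0, 1.0], [0.5, -1.5]])
p_S = np.array([0.5, 0.3, 0.2])
X_pts = S_pts @ G.T

def posterior(y):
    d = y - X_pts
    logw = -0.5 * np.einsum('ki,ij,kj->k', d, Ri, d) + np.log(p_S)
    w = np.exp(logw - logw.max())
    return w / w.sum()

def gamma_Y(y):
    """Conditional Fisher information via (14): R_Z^{-1} - R_Z^{-1} Phi_X(y) R_Z^{-1}."""
    w = posterior(y)
    dx = X_pts - w @ X_pts
    Phi_X = (w[:, None, None] * dx[:, :, None] * dx[:, None, :]).sum(axis=0)
    return Ri - Ri @ Phi_X @ Ri

def score(y):
    """Score of the output density: D_y^T log P_Y(y) = R_Z^{-1} (E{X|y} - y)."""
    return Ri @ (posterior(y) @ X_pts - y)

N = 30000
idx = rng.choice(len(p_S), size=N, p=p_S)
Y = X_pts[idx] + rng.multivariate_normal(np.zeros(2), R_Z, size=N)

J_from_score = np.mean([np.outer(score(y), score(y)) for y in Y], axis=0)   # definition (11)
J_from_gamma = np.mean([gamma_Y(y) for y in Y], axis=0)                     # via (13)-(14)
print(J_from_score, J_from_gamma, sep='\n')   # the two estimates should agree
```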
B. Jacobian of the MMSE matrix

Again, as a warm-up before dealing with the arbitrary signaling case, we consider first the signal model in (5) with Gaussian signaling, Y_G = G S_G + Z, and study the properties of the conditional MMSE matrix Φ_S(y), which does not depend on the particular realization of the observed vector y. Precisely, we have [14, Chapter 11]

    Φ_{S_G} = (R_S^{-1} + G^T R_Z^{-1} G)^{-1}    (30)

and thus E_{S_G} = E{Φ_{S_G}} = Φ_{S_G}. Following similar steps as in (20)-(23) for the Fisher information matrix, the Jacobian matrix of the MMSE matrix with respect to the signal transformation G can be readily obtained as

    D_G E_{S_G} = − 2 D_m^+ E{Φ_{S_G} ⊗ Φ_{S_G}} (I_m ⊗ G^T R_Z^{-1}).    (31)

Note that the expression in (31) for the Jacobian of the MMSE matrix has a very similar structure to the Jacobian of the Fisher information matrix in (23). The following theorem formalizes the fact that the Gaussian assumption is unnecessary for (31) to hold.

Theorem 2 (Jacobian of the MMSE matrix): Consider the signal model Y = G S + Z, where G is an arbitrary deterministic matrix, the m-dimensional signaling S is arbitrarily distributed, and the noise vector Z is Gaussian and independent of the input S. Then, the Jacobian of the MMSE matrix of the input vector S is

    D_G E_S = − 2 D_m^+ E{Φ_S(Y) ⊗ Φ_S(Y)} (I_m ⊗ G^T R_Z^{-1}),    (32)

where Φ_S(y) is defined in (9).

Proof: The proof is analogous to that of Theorem 1 with the appropriate notation adaptation. The calculation of D_G vec E_S can be found in Appendix E.

Remark 2: In light of the two results in Theorems 1 and 2, it is now apparent that Γ_Y(y) plays an analogous role in the differentiation of the Fisher information matrix to the one played by the conditional MMSE matrix Φ_S(y) when differentiating the MMSE matrix, which justifies the choice made in Section II-C of identifying Γ_Y(y) with the expression in (13) and not with the expression in (11).

C. Jacobians with respect to arbitrary parameters

With the basic results for the Jacobian of the MMSE and Fisher information matrices in Theorems 1 and 2, one can easily find the Jacobian with respect to arbitrary parameters of the system through the chain rule for differentiation (see Lemma B.5). Precisely, we are interested in considering the case where the linear transformation undergone by the signal is decomposed as the product of two linear transformations, G = H P, where H represents the channel, which is externally determined by the propagation environment conditions, and P represents the linear precoder, which is specified by the system designer.

Theorem 3 (Jacobians with respect to arbitrary parameters): Consider the signal model Y = H P S + C N, where H ∈ R^{n×p}, P ∈ R^{p×m}, and C ∈ R^{n×n'}, with n' ≥ n, are arbitrary deterministic matrices, the signaling S ∈ R^m is arbitrarily distributed, the noise N ∈ R^{n'} is Gaussian, independent of the input S, and has covariance matrix R_N, and the total noise, defined as Z = C N ∈ R^n, has a positive definite covariance matrix given by R_Z = C R_N C^T. Then, the MMSE and Fisher information matrices satisfy

    D_P E_S = − 2 D_m^+ E{Φ_S(Y) ⊗ Φ_S(Y)} (I_m ⊗ P^T H^T R_Z^{-1} H)    (33)
    D_H E_S = − 2 D_m^+ E{Φ_S(Y) ⊗ Φ_S(Y)} (P^T ⊗ P^T H^T R_Z^{-1})    (34)
    D_{R_Z} J_Y = − D_n^+ E{Γ_Y(Y) ⊗ Γ_Y(Y)} D_n    (35)
    D_{R_N} J_Y = − D_n^+ E{Γ_Y(Y) ⊗ Γ_Y(Y)} (C ⊗ C) D_{n'}.    (36)
Proof: The Jacobians D_P E_S and D_H E_S follow from the Jacobian D_G E_S calculated in Theorem 2 by applying the following chain rules (from Lemma B.5):

    D_P E_S = D_G E_S · D_P G    (37)
    D_H E_S = D_G E_S · D_H G,    (38)

where G = H P and where D_P G = I_m ⊗ H and D_H G = P^T ⊗ I_n can be found in Lemma B.7.1. Similarly, the Jacobian D_{R_Z} J_Y can be calculated by applying

    D_C J_Y = D_{R_Z} J_Y · D_C R_Z,    (39)

where D_C R_Z = 2 D_n^+ (C R_N ⊗ I_n) as in Lemma B.7.7. Recalling that, in this case, the matrix C is a dummy variable that is used only to obtain D_{R_Z} J_Y through the chain rule, the factor (C R_N ⊗ I_n) can be eliminated from both sides of the equation. Using D_n^+ D_n = I, the result follows. Finally, the Jacobian D_{R_N} J_Y follows from the chain rule

    D_{R_N} J_Y = D_{R_Z} J_Y · D_{R_N} R_Z = D_{R_Z} J_Y · D_n^+ (C ⊗ C) D_{n'},    (40)

where the expression for D_{R_N} R_Z is obtained from Lemma B.7.3 and where we have used that D_n^+ (A ⊗ A) D_n D_n^+ = D_n^+ (A ⊗ A) N_n = D_n^+ N_n (A ⊗ A) = D_n^+ (A ⊗ A).

D. Hessian of differential entropy and mutual information

Now that we have obtained the Jacobians of the MMSE and Fisher matrices, we will capitalize on the results in [1] to obtain the Hessians of the mutual information I(S; Y) and the differential entropy h(Y). We start by recalling the results that will be used.

Lemma 1 (Differential entropy Jacobians [1]): Consider the setting of Theorem 3. Then, the differential entropy of the output vector Y, h(Y), satisfies

    D_P h(Y) = vec^T( H^T R_Z^{-1} H P E_S )    (41)
    D_H h(Y) = vec^T( R_Z^{-1} H P E_S P^T )    (42)
    D_C h(Y) = vec^T( J_Y C R_N )    (43)
    D_{R_Z} h(Y) = (1/2) vec^T( J_Y ) D_n    (44)
    D_{R_N} h(Y) = (1/2) vec^T( C^T J_Y C ) D_{n'}.    (45)

Remark 3: Note that in [1] the authors gave the expressions (41) and (42) for the mutual information. Recalling the simple relation (7) between mutual information and differential entropy for the linear vector Gaussian channel, it becomes easy to see that (41) and (42) remain valid when the differential entropy is replaced by the mutual information, because the differential entropy of the noise vector is independent of P and H.

Remark 4: In contrast, the expressions (43), (44), and (45) do not hold verbatim for the mutual information because, in that case, the differential entropy of the noise vector does depend on C, R_Z, and R_N and has to be taken into account. Then, from (7) and applying basic Jacobian results from [15, Chapter 9], we have

    D_C I(S; Y) = D_C h(Y) − vec^T( (C R_N C^T)^{-1} C R_N )    (46)
    D_{R_Z} I(S; Y) = D_{R_Z} h(Y) − (1/2) vec^T( R_Z^{-1} ) D_n    (47)
    D_{R_N} I(S; Y) = D_{R_N} h(Y) − (1/2) vec^T( C^T (C R_N C^T)^{-1} C ) D_{n'}.    (48)

With Lemma 1 at hand, and the expressions obtained in the previous section for the Jacobian matrices of the Fisher information and the MMSE matrices, we are ready to calculate the Hessian matrix with respect to all the parameters of interest.

Theorem 4 (Differential entropy Hessians): Consider the setting of Theorem 3.
Then, the differential entropy of the output vector Y, h(Y), satisfies

    H_P h(Y) = E_S ⊗ H^T R_Z^{-1} H − 2 (I_m ⊗ H^T R_Z^{-1} H P) N_m E{Φ_S(Y) ⊗ Φ_S(Y)} (I_m ⊗ P^T H^T R_Z^{-1} H)
    H_H h(Y) = P E_S P^T ⊗ R_Z^{-1} − 2 (P ⊗ R_Z^{-1} H P) N_m E{Φ_S(Y) ⊗ Φ_S(Y)} (P^T ⊗ P^T H^T R_Z^{-1})
             = E_{PS} ⊗ R_Z^{-1} − 2 (I_p ⊗ R_Z^{-1} H) N_p E{Φ_{PS}(Y) ⊗ Φ_{PS}(Y)} (I_p ⊗ H^T R_Z^{-1})    (49)
    H_C h(Y) = (R_N ⊗ J_Y) − 2 (R_N C^T ⊗ I_n) N_n E{Γ_Y(Y) ⊗ Γ_Y(Y)} (C R_N ⊗ I_n)    (50)
    H_{R_Z} h(Y) = − (1/2) D_n^T E{Γ_Y(Y) ⊗ Γ_Y(Y)} D_n    (51)
    H_{R_N} h(Y) = − (1/2) D_{n'}^T (C^T ⊗ C^T) E{Γ_Y(Y) ⊗ Γ_Y(Y)} (C ⊗ C) D_{n'}.    (52)

Proof: See Appendix F.

Remark 5: The Hessian results in Theorem 4 are given for the differential entropy. The Hessian matrices for the mutual information can be found straightforwardly from (7) and Remarks 3 and 4 as H_P I(S; Y) = H_P h(Y), H_H I(S; Y) = H_H h(Y), and

    H_C I(S; Y) = H_C h(Y) + 2 (R_N C^T ⊗ I_n) N_n ( (C R_N C^T)^{-1} ⊗ (C R_N C^T)^{-1} ) (C R_N ⊗ I_n) − R_N ⊗ (C R_N C^T)^{-1}    (53)
    H_{R_Z} I(S; Y) = H_{R_Z} h(Y) + (1/2) D_n^T ( R_Z^{-1} ⊗ R_Z^{-1} ) D_n    (54)
    H_{R_N} I(S; Y) = H_{R_N} h(Y) + (1/2) D_{n'}^T ( C^T (C R_N C^T)^{-1} C ) ⊗ ( C^T (C R_N C^T)^{-1} C ) D_{n'}.    (55)

E. Hessian of mutual information with respect to the transmitted signal covariance

While in the previous sections we have obtained expressions for the Jacobian of the MMSE and the Hessian of the mutual information and differential entropy with respect to the noise covariances R_Z and R_N, among others, we have purposely avoided calculating these Jacobian and Hessian matrices with respect to covariance matrices of the signal such as the squared precoder Q_P = P P^T, the transmitted signal covariance Q = P R_S P^T, or the input signal covariance R_S. The reason is that, in general, the mutual information, the differential entropy, and the MMSE are not functions of Q_P, Q, or R_S alone. This can be seen, for example, by noting that, given Q_P, the corresponding precoder matrix P is specified up to an arbitrary orthonormal transformation, as both P and P V, with V being orthonormal, yield the same squared precoder Q_P. Now, it is easy to see that the two precoders P and P V need not yield the same mutual information and, thus, the mutual information is not well defined as a function of Q_P alone because the mutual information cannot be uniquely determined from Q_P. The same reasoning applies to the differential entropy and the MMSE matrix.

There are, however, some particular cases where the quantities of mutual information and differential entropy are indeed functions of Q_P, Q, or R_S. We have, for example, the particular case where the signaling is Gaussian, S = S_G. In this case, the mutual information is given by

    I(S_G; Y_G) = (1/2) logdet( I_n + R_Z^{-1} H P R_S P^T H^T ),    (56)

which is, of course, a function of the transmitted signal covariance Q = P R_S P^T, a function of the input signal covariance R_S, and also a function of the squared precoder Q_P = P P^T when R_S = I_m. Upon direct differentiation with respect to, e.g., Q we obtain [15, Chapter 9]

    D_Q I(S_G; Y_G) = (1/2) vec^T( H^T R_Z^{-1} H (I_p + Q H^T R_Z^{-1} H)^{-1} ) D_p,    (57)

which, after some algebra, agrees with the result in [1, Theorem 2, Eq. (23)] adapted to our notation,
    D_Q I(S_G; Y_G) = (1/2) vec^T( H^T R_Z^{-1} H P E_{S_G} R_S^{-1} P^{-1} ) D_p,    (58)

where, for the sake of simplicity, we have assumed that the inverses of P and R_S exist and where the MMSE matrix is given by E_{S_G} = (R_S^{-1} + P^T H^T R_Z^{-1} H P)^{-1}.

Note now that the MMSE matrix is not a function of Q and, consequently, it cannot be used to derive the Hessian of the mutual information with respect to Q as we have done in Section III-D for other variables such as P or C. Therefore, the Hessian of the mutual information for the Gaussian signaling case has to be obtained by direct differentiation of the expression in (57) with respect to Q, yielding [15, Chapter 10]

    H_Q I(S_G; Y_G) = − (1/2) D_p^T ( (I_p + H^T R_Z^{-1} H Q)^{-1} H^T R_Z^{-1} H ⊗ H^T R_Z^{-1} H (I_p + H^T R_Z^{-1} H Q)^{-1} ) D_p.    (59)

Another particular case where the mutual information is a function of the transmit covariance matrices is the low-SNR regime [16]. Assuming that R_Z = N_0 I, Prelov and Verdú showed that [16, Theorem 3]

    I(S; Y) = 1/(2 N_0) Tr( H P R_S P^T H^T ) − 1/(4 N_0²) Tr( (H P R_S P^T H^T)² ) + o(N_0^{-2}),    (60)

where the dependence (up to terms o(N_0^{-2})) of the mutual information on Q = P R_S P^T is explicitly shown. The Jacobian and Hessian of the mutual information become, for this particular case [15, Chapters 9 and 10],

    D_Q I(S; Y) = 1/(2 N_0) vec^T( H^T H ) D_p − 1/(2 N_0²) vec^T( H^T H Q H^T H ) D_p + o(N_0^{-2})    (61)
    H_Q I(S; Y) = − 1/(2 N_0²) D_p^T ( H^T H ⊗ H^T H ) D_p + o(N_0^{-2}).    (62)

Even though we have shown two particular cases where the mutual information is a function of the transmitted signal covariance matrix Q = P R_S P^T, it is important to highlight that care must be taken when calculating the Jacobian matrix of the MMSE and the Hessian matrix of the mutual information or differential entropy as, in general, these quantities are not functions of Q_P, Q, nor R_S. In this sense, the results in [1, Theorem 2, Eqs. (23), (24), (25); Corollary 2, Eq. (49); Theorem 4, Eq. (56)] only make sense when the mutual information is well defined as a function of the signal covariance matrix (such as the cases seen above where the signaling is Gaussian or the SNR is low).

IV. MUTUAL INFORMATION CONCAVITY RESULTS

As we have mentioned in the introduction, studying the concavity of the mutual information with respect to design parameters of the system is important from both analysis and design perspectives. The first candidate as a system parameter of interest that naturally arises is the precoder matrix P in the signal model Y = H P S + Z. However, one realizes from the expression for H_P I(S; Y) in Remark 5 of Theorem 4 that, for a sufficiently small P, the Hessian is approximately H_P I(S; Y) ≈ E_S ⊗ H^T R_Z^{-1} H, which, from Lemma G.3, is
S imilarly , in the lo w SNR regime we ha ve that, from (62), the mutual information is also a con cave function with respect to Q . Since in this work we are intereste d in the properties of the mutual information for a ll the SNR range and for arbitrary signaling, we wish to study if the above results can be gen eralized. Un fortunately , as discusse d in the previous section, the first d if ference o f the ge neral case with res pect to the particular ca ses of Gaus sian s ignaling and low SNR is that the mutual information is not well de fined as a func tion of the trans mitted s ignal covariance Q on ly . Having discarded the co ncavity of the mutual information with res pect to P a nd Q , in the following subs ections we s tudy the c oncavity of the mu tual information with respe ct to other parameters of the s ystem. For the sake of no tation we d efine the channel covariance ma trix as R H = H T R − 1 Z H , which will be used in the remainder of the pa per . A. The scalar case : c oncavity in the S NR The concavity of the mutual information with respe ct to the SNR for arbitrary inpu t distributions can be deriv ed as a corollary from Costa’ s resu lts in [10], where he proved the co ncavity of the entropy power of a rand om variable consisting of the sum of a sign al a nd Ga ussian no ise with res pect to the power of the signa l. As a direct conseq uence, the concavity of the e ntropy power implies the con cavity of the mutual information in the signal power , or , equiv alently , in the SNR. In this section, we g i ve an explicit express ion of the Hes sian of the mutua l information with respect to the SNR, which was previously unav ailable for vector Gaus sian channe ls. Cor ollary 1 (Mutua l information Hes sian with respect to the SNR): Con sider the signa l model Y = √ snr H S + Z , with snr > 0 and where all the terms are define d as in T heorem 3. The n, H snr I ( S ; Y ) = d 2 I ( S ; Y ) d snr 2 = − 1 2 T rE ( R H Φ S ( Y )) 2 . (63) Moreover , H snr I ( S ; Y ) ≤ 0 for all snr , which implies that the mutual information is a c oncave fun ction w ith respect to snr . Pr oof: First, we con sider the result in [1, Co rollary 1], D snr I ( S ; Y ) = 1 2 T r R H E S . (64) Now , we only need to choose P = √ snr I p , which implies m = p , and apply the results in Theorem 4 and the chain rule in L emma B.5 to obtain H snr I ( S ; Y ) = 1 2 D snr T r R H E S (65) = 1 2 D E S T r R H E S · D P E S · D snr P (66) = 1 2 vec T ( R H ) D p ( − 2 D + p E { Φ S ( Y ) ⊗ Φ S ( Y ) } ( I p ⊗ √ snr R H )) 1 2 √ snr vec I p (67) = − 1 2 vec T ( R H ) E { Φ S ( Y ) ⊗ Φ S ( Y ) } vec R H , (68) where in last equa lity we have use d Le mma A.4, the equality D p D + p = N p , and the fact that, for symmetric matrices, vec T ( R H ) N p = vec T R H as in (12 8) in App endix A. From the expression in (68), it readily follo ws that the mutual information is a c oncave function of the snr parameter beca use, fr om Lemma G.3 we have that Φ S ( y ) ⊗ Φ S ( y ) ≥ 0 , ∀ y , and, cons equently , H snr I ( S ; Y ) ≤ 0 . Finally , applying again Lemma A.4 a nd vec T A vec B = T r A T B , the expression f or the Hessian in the corollary follo ws. Remark 6 : Obs erve that (63 ) ag rees with [9, Prp. 5] for sca lar Gauss ian c hannels. 11 W e now wonder if the con cavity result in Corollary 1 can be extended to more general q uantities tha n the scalar SNR. 
We now wonder whether the concavity result in Corollary 1 can be extended to more general quantities than the scalar SNR. In the following subsection we study the concavity of the mutual information with respect to the squared singular values of the precoder for the simple case where the left singular vectors of the precoder coincide with the eigenvectors of the channel covariance matrix R_H, which is commonly referred to as the case where the precoder diagonalizes the channel.

B. Concavity in the squared singular values of the precoder when the precoder diagonalizes the channel

Consider the eigendecomposition of the p × p channel covariance matrix R_H = U_H Diag(σ) U_H^T, where U_H ∈ R^{p×p} is an orthonormal matrix and the vector σ ∈ R^p contains non-negative entries in decreasing order. Note that in the case where rank(R_H) = p' < p, the last p − rank(R_H) elements of the vector σ are zero.

Let us now consider the singular value decomposition (SVD) of the p × m precoder matrix P = U_P Diag(√λ) V_P^T. For the case where m ≥ p, we have that U_P ∈ R^{p×p} is an orthonormal matrix, the vector λ is p-dimensional, and the matrix V_P ∈ R^{m×p} contains orthonormal columns such that V_P^T V_P = I_p. For the case m < p, the matrix U_P ∈ R^{p×m} contains orthonormal columns such that U_P^T U_P = I_m, the vector λ is m-dimensional, and V_P ∈ R^{m×m} is an orthonormal matrix.

In the following theorem we assume m ≥ p for the sake of simplicity, and we characterize the concavity properties of the mutual information with respect to the entries of the squared singular values vector λ for the particular case where the left singular vectors of the precoder coincide with the eigenvectors of the channel covariance matrix, U_P = U_H. The result for the case m < p is stated after the following theorem and is left without proof because it follows similar steps.

Theorem 5 (Mutual information Hessian with respect to the squared singular values of the precoder): Consider Y = H P S + Z, where all the terms are defined as in Theorem 3, for the particular case where the eigenvectors of the channel covariance matrix R_H and the left singular vectors of the precoder P ∈ R^{p×m} coincide, i.e., U_P = U_H, and where we have m ≥ p. Then, the Hessian of the mutual information with respect to the squared singular values of the precoder λ is

    H_λ I(S; Y) = − (1/2) Diag(σ) E{ Φ_{V_P^T S}(Y) ∘ Φ_{V_P^T S}(Y) } Diag(σ),    (69)

where we recall that A ∘ B denotes the Schur (or Hadamard) product. Moreover, the Hessian matrix H_λ I(S; Y) is negative semidefinite, which implies that the mutual information is a concave function of the squared singular values of the precoder.

Proof: The Hessian of the mutual information H_λ I(S; Y) can be obtained from the Hessian chain rule in Lemma B.5 as

    H_λ I(S; Y) = (D_λ P)^T · H_P I(S; Y) · D_λ P + ( D_P I(S; Y) ⊗ I_p ) H_λ P.    (70)

Now we need to calculate D_λ P and H_λ P. The expression for D_λ P follows as

    D_λ P = D_λ vec( U_H Diag(√λ) V_P^T )    (71)
    = ( V_P ⊗ U_H ) D_λ vec Diag(√λ)    (72)
    = (1/2) ( V_P ⊗ U_H ) S_p Diag(√λ)^{-1},    (73)

where, in (72), we have used Lemmas A.4 and B.7.2 and where the last step follows from

    [ D_λ vec Diag(√λ) ]_{i+(j−1)p, k} = ∂√(λ_i)/∂λ_k · δ_ij = (1/(2√λ_k)) δ_ij δ_ik,   {i, j, k} ∈ [1, p],    (74)

and recalling the definition of the reduction matrix S_p in (133), [S_p]_{i+(j−1)p, k} = δ_ij δ_ik.
Following steps similar to the derivation of D_λ P, the Hessian matrix H_λ P is obtained according to

    H_λ P = D_λ( (D_λ P)^T ) = (1/2) D_λ( Diag(√λ)^{-1} S_p^T (V_P^T ⊗ U_H^T) )    (75)
    = (1/2) ( (V_P ⊗ U_H) S_p ⊗ I_p ) D_λ Diag(√λ)^{-1}    (76)
    = − (1/4) ( (V_P ⊗ U_H) S_p ⊗ I_p ) S_p Diag(√λ)^{-3}.    (77)

Plugging (73) and (77) into (70) and operating, together with the expressions for the Jacobian matrix D_P I(S; Y) and the Hessian matrix H_P I(S; Y) given in Remark 3 of Lemma 1 and in Remark 5 of Theorem 4, respectively, we obtain

    H_λ I(S; Y) = (1/4) Diag(√λ)^{-1} S_p^T [ E_{V_P^T S} ⊗ Diag(σ) − 2 ( I_p ⊗ Diag(σ ∘ √λ) ) N_p E{ V_P^T Φ_S(Y) V_P ⊗ V_P^T Φ_S(Y) V_P } ( I_p ⊗ Diag(σ ∘ √λ) ) ] S_p Diag(√λ)^{-1}
    − (1/4) vec^T( Diag(σ ∘ √λ) E_{V_P^T S} ) ( S_p ⊗ I_p ) S_p Diag(√λ)^{-3},    (78)

where it can be noted that the dependence of H_λ I(S; Y) on U_H has disappeared. Now, applying Lemma A.2, the first term in the last equation becomes

    Diag(√λ)^{-1} S_p^T ( E_{V_P^T S} ⊗ Diag(σ) ) S_p Diag(√λ)^{-1} = Diag(√λ)^{-1} ( E_{V_P^T S} ∘ Diag(σ) ) Diag(√λ)^{-1}    (79)
    = E_{V_P^T S} ∘ Diag( σ ∘ (1/λ) ),    (80)

whereas the third term in (78) can be expressed as

    vec^T( Diag(σ ∘ √λ) E_{V_P^T S} ) ( S_p ⊗ I_p ) S_p Diag(√λ)^{-3} = Diag( Diag(σ ∘ √λ) E_{V_P^T S} ) Diag(√λ)^{-3}    (81)
    = E_{V_P^T S} ∘ Diag( σ ∘ (1/λ) ),    (82)

where in (81) we have used that, for any square matrix A ∈ R^{p×p},

    [ vec^T(A) S_p ]_k = Σ_{i,j=1}^p A_ij δ_ij δ_ik = A_kk    (83)
    [ ( diag(A)^T ⊗ I_p ) S_p ]_{kl} = Σ_{i,j} A_jj δ_ki δ_ij δ_il = A_ll δ_kl.    (84)

Now, from (80) and (82) we see that the first and third terms in (78) cancel out and, recalling that V_P^T Φ_S(Y) V_P = Φ_{V_P^T S}(Y), the expression for the Hessian matrix H_λ I(S; Y) simplifies to

    H_λ I(S; Y) = − (1/4) Diag(√λ)^{-1} E{ Φ_{V_P^T S}(Y) ∘ ( Diag(σ ∘ √λ) Φ_{V_P^T S}(Y) Diag(σ ∘ √λ) ) + ( Diag(σ ∘ √λ) Φ_{V_P^T S}(Y) ) ∘ ( Φ_{V_P^T S}(Y) Diag(σ ∘ √λ) ) } Diag(√λ)^{-1},    (85)

where we have applied Lemma A.2 and have taken into account that

    2 ( I_p ⊗ Diag(σ ∘ √λ) ) N_p = I_p ⊗ Diag(σ ∘ √λ) + K_p ( Diag(σ ∘ √λ) ⊗ I_p ),    (86)

together with S_p^T K_p = S_p^T. Now, from simple inspection of the expression in (85) and recalling the properties of the Schur product, the desired result follows.

Remark 7: Observe from the expression for the Hessian in (69) that, for the case where the channel covariance matrix R_H is rank deficient, rank(R_H) = p' < p, the last p − p' entries of the vector σ are zero, which implies that the last p − p' rows and columns of the Hessian matrix are also zero.

Remark 8: For the case where m < p, note that the matrix U_P ∈ R^{p×m} with the left singular vectors of the precoder P is not square. We thus consider that it contains the m eigenvectors in U_H associated with the m largest eigenvalues of R_H. In this case, the Hessian matrix of the mutual information with respect to the squared singular values λ ∈ R^m is also negative semidefinite and its expression becomes

    H_λ I(S; Y) = − (1/2) Diag(σ̃) E{ Φ_{V_P^T S}(Y) ∘ Φ_{V_P^T S}(Y) } Diag(σ̃),    (87)

where we have defined σ̃ = (σ_1, σ_2, ..., σ_m)^T and where we recall that, in this case, V_P ∈ R^{m×m}.

We now recover a result obtained in [17], where it was proved that the mutual information is concave in the power allocation for the case of parallel channels. Note, however, that [17] considered independence of the elements in the signaling vector S, whereas the following result shows that this is not necessary.
Corollary 2 (Mutual information concavity with respect to the power allocation in parallel channels): Particularizing Theorem 5 to the case where the channel H, the precoder P, and the noise covariance R_Z are diagonal matrices, which implies that U_P = U_H = I_p, it follows that the mutual information is a concave function of the power allocation for parallel non-interacting channels for an arbitrary distribution of the signaling vector S.

C. General negative results

In the previous section we proved that the mutual information is a concave function of the squared singular values of the precoder matrix P for the case where the left singular vectors of the precoder P coincide with the eigenvectors of the channel correlation matrix R_H. For the general case where these vectors do not coincide, the mutual information is not a concave function of the squared singular values of the precoder. This fact is formally established in the following theorem through a counterexample.

Theorem 6 (General non-concavity of the mutual information): Consider Y = H P S + Z, where all the terms are defined as in Theorem 3. It then follows that, in general, the mutual information is not a concave function with respect to the squared singular values of the precoder λ.

Proof: We present a simple two-dimensional counterexample. Assume that the noise is white, R_Z = I_2, and consider the following channel matrix and precoder structure

    H_ce = [ 1  β ; β  1 ],   P_ce = Diag(√λ) = [ √λ_1  0 ; 0  √λ_2 ],    (88)

where β ∈ (0, 1], and assume that the distribution of the signal vector S has two equally likely mass points at the following positions

    s(1) = (2, 0)^T,   s(2) = (0, 2)^T.    (89)

Accordingly, we define the noiseless received vector as r(k) = H_ce P_ce s(k), for k ∈ {1, 2}, which yields

    r(1) = 2 (√λ_1, β √λ_1)^T,   r(2) = 2 (β √λ_2, √λ_2)^T.    (90)

We now define the mutual information for this counterexample as

    I_ce(λ_1, λ_2, β) = I(S; H_ce P_ce S + Z).    (91)

Since there are only two possible signals to be transmitted, s(1) and s(2), it is clear that 0 ≤ I_ce ≤ log 2. Moreover, we will use the fact that, as R_Z = I_2, the mutual information is an increasing function of the squared distance between the two only possible received vectors, d²(λ_1, λ_2, β) = ||r(1) − r(2)||², which is denoted by I_ce(λ_1, λ_2, β) = f(d²(λ_1, λ_2, β)), where f is an increasing function.

For a fixed β, we want to study the concavity of I_ce(λ_1, λ_2, β) with respect to (λ_1, λ_2). In order to do so, we restrict ourselves to the study of concavity along straight lines of the type λ_1 + λ_2 = ρ, with ρ > 0, which is sufficient to disprove the concavity. Given three aligned points such that the point in between is at the same distance from the other two, if a function is concave in (λ_1, λ_2), then the average of the function evaluated at the two extreme points is smaller than or equal to the function evaluated at the midpoint. Consequently, concavity can be disproved by finding three aligned points such that the aforementioned concavity property is violated. Our three aligned points will be (ρ, 0), (0, ρ), and (ρ/2, ρ/2), and instead of working with the mutual information we will work with the squared distance between the received points because closed-form expressions are available.
Operating with the received vectors and recalling that β ∈ (0, 1], we can easily obtain

    d²(ρ, 0, β) = d²(0, ρ, β) = 4ρ (1 + β²) > 4ρ    (92)
    d²(ρ/2, ρ/2, β) = 4ρ (1 − β)² < 4ρ.    (93)

The first equality means that the mutual information evaluated at the two extreme points takes the same value and is always above a certain threshold, f(4ρ), independently of the value of β. Consequently, the mean of the mutual information evaluated at the two extreme points is equal to the value at either of the extreme points. The second expression means that the function evaluated at the point in between is always below this same threshold. Now it is clear that, given any ρ > 0, we can always find β such that 0 < β ≤ 1 and

    I_ce(ρ, 0, β) = I_ce(0, ρ, β) > I_ce(ρ/2, ρ/2, β),    (94)

which contradicts the concavity hypothesis.

For illustrative purposes, in Fig. 2 we have depicted the mutual information for different values of the channel parameter β for the counterexample in the proof of Theorem 6. Note that the function is only concave (and, in fact, linear) in the case where the channel is diagonal, β = 0, which agrees with the results in Theorem 5.

[Fig. 2. Graphical representation of the mutual information I_ce(λ_1, λ_2, β) (nats) in the counterexample, evaluated along the line λ_1 + λ_2 = ρ = 10 and plotted versus θ with λ_1 = 10θ and λ_2 = 10(1 − θ), for β ∈ {0, 0.4, 0.6, 0.8, 1}. It can be readily seen that, except for the case β = 0, the function is not concave.]
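The counterexample of Theorem 6 is easy to reproduce numerically; the sketch below (with β = 0.8 and ρ = 10 as assumed illustrative values) estimates I_ce at the three aligned points used in the proof and exhibits the violation of midpoint concavity in (94).

```python
import numpy as np

rng = np.random.default_rng(3)

# Reproduce the two-point counterexample of Theorem 6 numerically (a sketch).
s = np.array([[2.0, 0.0], [0.0, 2.0]])        # the two equiprobable mass points s(1), s(2)
beta, rho = 0.8, 10.0                         # assumed illustrative values
H = np.array([[1.0, beta], [beta, 1.0]])

N = 200000
idx = rng.integers(0, 2, size=N)
Z = rng.standard_normal((N, 2))               # white noise, R_Z = I_2

def I_ce(lam1, lam2):
    """Monte Carlo estimate of I_ce(lam1, lam2, beta) in nats."""
    P = np.diag([np.sqrt(lam1), np.sqrt(lam2)])
    X = s @ (H @ P).T                          # the two noiseless received points r(1), r(2)
    Y = X[idx] + Z
    D = Y[:, None, :] - X[None, :, :]
    q = -0.5 * (D ** 2).sum(axis=2)
    log_pY = np.logaddexp(q[:, 0], q[:, 1]) - np.log(2) - np.log(2 * np.pi)
    h_Y = -log_pY.mean()
    h_Z = np.log(2 * np.pi * np.e)             # differential entropy of the white noise
    return h_Y - h_Z

ends = 0.5 * (I_ce(rho, 0.0) + I_ce(0.0, rho))
mid = I_ce(rho / 2, rho / 2)
print(ends, mid)   # ends > mid, so I_ce is not concave along lambda_1 + lambda_2 = rho
```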
D. Concavity results summary

At the beginning of Section IV we argued that the mutual information is concave with respect to the full transmitted signal covariance matrix Q for the case where the signaling is Gaussian and also in the low-SNR regime. Next, we discussed that this result cannot be generalized to arbitrary signaling distributions because, in the general case, the mutual information is not well defined as a function of Q alone. In Sections IV-A and IV-B, we have encountered two particular cases where the mutual information is a concave function. In the first case, we have seen that the mutual information is concave with respect to the SNR and, in the second, that the mutual information is a concave function of the squared singular values of the precoder, provided that the eigenvectors of the channel covariance R_H and the left singular vectors of the precoder P coincide. For the general case where these vectors do not coincide, we have shown in Section IV-C that the mutual information is not concave in the squared singular values. A summary of the different concavity results for the mutual information as a function of the configuration of the linear vector Gaussian channel can be found in Table I.

TABLE I. Summary of the concavity type of the mutual information (✓ indicates concavity, × indicates non-concavity, and − indicates that it does not apply). The three columns correspond to the scalar parameter (power snr, with P = √snr I), the vector parameter (squared singular values λ, with P = U_P Diag(√λ) V_P^T), and the matrix parameter (transmit covariance Q = P R_S P^T).

- General case, Y = H P S + Z:  ✓ [5], [10] (Section IV-A)  |  × (Section IV-C)  |  − (Section III-E)
- Channel covariance R_H and precoder P share singular/eigenvectors, U_P = U_H:  ✓  |  ✓ (Section IV-B)  |  − (Section III-E)
- Independent parallel communication, R_H = U_P = V_P = I, P_S = ∏_i P_{S_i} (note that Q is diagonal):  ✓  |  ✓ [17]  |  ✓ [17]
- Low-SNR regime, R_Z = N_0 I_n, N_0 ≫ 1:  ✓  |  ✓  |  ✓ [16]
- Gaussian signaling, S = S_G:  ✓  |  ✓  |  ✓

V. MULTIVARIATE EXTENSION OF COSTA'S ENTROPY POWER INEQUALITY

Having proved that the mutual information and, hence, the differential entropy are concave functions of the squared singular values λ of the precoder P for the case where the left singular vectors of the precoder coincide with the eigenvectors of the channel covariance R_H, H_λ I(S; Y) = H_λ h(Y) ≤ 0, we want to study whether this last result can be strengthened by proving the concavity in λ of the entropy power. The entropy power of the random vector Y ∈ R^n was first introduced by Shannon in his seminal work [18] and is, since then, defined as

    N(Y) = 1/(2πe) · exp( (2/n) h(Y) ),    (95)

where h(Y) represents the differential entropy as defined in (6). The entropy power of a random vector Y represents the variance (or power) of a standard Gaussian random vector Y_G ∼ N(0, σ² I_n) such that both Y and Y_G have identical differential entropy, h(Y_G) = h(Y).

Costa proved in [10] that, provided that the random vector W is white Gaussian distributed, then

    N(X + √t W) ≥ (1 − t) N(X) + t N(X + W),    (96)

where t ∈ [0, 1]. As Costa noted, the above entropy power inequality (EPI) is equivalent to the concavity of the entropy power function N(X + √t W) with respect to the parameter t, or, formally,⁵

    d²/dt² N(X + √t W) = H_t N(X + √t W) ≤ 0.    (97)

⁵ The equivalence between (97) and (96) is due to the fact that the function N(X + √t W) is twice differentiable almost everywhere thanks to the smoothing properties of the added Gaussian noise.

Due to its inherent interest and to the fact that the proof by Costa was rather involved, simplified proofs of his result have been subsequently given in [19]–[22]. Additionally, in his paper Costa presented two extensions of his main result in (97). Precisely, he showed that the EPI is also valid when the Gaussian vector W is not white, and also for the case where the t parameter multiplies the arbitrarily distributed random vector X instead:

    H_t N(√t X + W) ≤ 0.    (98)

Observe that √t in (98) plays the role of a scalar precoder.
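Before stating the multivariate result, the scalar form (97) can be illustrated numerically. The following sketch (with an assumed binary-valued X and standard Gaussian W, chosen only for illustration) estimates the entropy power N(X + √t W) by Monte Carlo and checks midpoint concavity in t.

```python
import numpy as np

rng = np.random.default_rng(7)

# Scalar sketch of Costa's EPI in the form (97): t -> N(X + sqrt(t) W) is concave.
# Assumed non-Gaussian X (equiprobable +/-1) and standard Gaussian W.
N_samp = 400000
X = rng.choice([-1.0, 1.0], size=N_samp)
W = rng.standard_normal(N_samp)

def entropy_power(t):
    """Monte Carlo estimate of N(X + sqrt(t) W) as defined in (95), with n = 1."""
    Y = X + np.sqrt(t) * W
    # p_Y(y) = 0.5 N(y; +1, t) + 0.5 N(y; -1, t)
    log_pY = np.logaddexp(-0.5 * (Y - 1) ** 2 / t, -0.5 * (Y + 1) ** 2 / t) \
             - np.log(2) - 0.5 * np.log(2 * np.pi * t)
    h = -log_pY.mean()
    return np.exp(2 * h) / (2 * np.pi * np.e)

t1, t2 = 0.2, 0.8
lhs = entropy_power(0.5 * (t1 + t2))
rhs = 0.5 * (entropy_power(t1) + entropy_power(t2))
print(lhs, rhs)   # midpoint concavity: lhs >= rhs is expected
```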
We next consider an extension of (98) to the case where the scalar precoder √t is replaced by a multivariate precoder P ∈ R^{p×m} and a channel H ∈ R^{n×p}, for the particular case where the precoder left singular vectors coincide with the channel covariance eigenvectors. Similarly as in Section IV-B, we assume that m ≥ p. The case m < p is presented after the proof of the following theorem.

Theorem 7 (Costa's multivariate EPI): Consider Y = H P S + Z, where all the terms are defined as in Theorem 3, for the particular case where the eigenvectors of the channel covariance matrix R_H coincide with the left singular vectors of the precoder P ∈ R^{p×m} and where we assume that m ≥ p. It then follows that the entropy power N(Y) is a concave function of λ, i.e., H_λ N(Y) ≤ 0. Moreover, the Hessian matrix of the entropy power function N(Y) with respect to λ is given by

    H_λ N(Y) = (N(Y)/n) Diag(σ) ( diag(E_{V_P^T S}) diag(E_{V_P^T S})^T / n − E{ Φ_{V_P^T S}(Y) ∘ Φ_{V_P^T S}(Y) } ) Diag(σ),    (99)

where we recall that diag(E_{V_P^T S}) is a column vector with the diagonal entries of the matrix E_{V_P^T S}.

Proof: First, let us prove (99). From the definition of the entropy power in (95) and applying the chain rule for Hessians in (143), we obtain

    H_λ N(Y) = (D_λ h(Y))^T · H_{h(Y)} N(Y) · D_λ h(Y) + D_{h(Y)} N(Y) · H_λ h(Y)
             = (2 N(Y)/n) ( (2/n) (D_λ h(Y))^T D_λ h(Y) + H_λ h(Y) ).    (100)

Now, recalling from [5, Eq. (61)] that (D_λ h(Y))^T = (1/2) Diag(σ) diag(E_{V_P^T S}) and incorporating the expression for H_λ h(Y) calculated in Theorem 5, the result in (99) follows.

Now that an explicit expression for the Hessian matrix has been obtained, we wish to prove that it is negative semidefinite. Note from (100) that, except for the positive factor 2 N(Y)/n, the Hessian matrix H_λ N(Y) is the sum of a rank-one positive semidefinite matrix and the Hessian matrix of the differential entropy, which is negative semidefinite according to Theorem 5. Consequently, the definiteness of H_λ N(Y) is, a priori, undetermined, and some further developments are needed to determine it, which is what we do next.

First, consider the positive semidefinite matrix A(y) ∈ S^{p'}_+, which is obtained by selecting the first p' = rank(R_H) columns and rows of the positive semidefinite matrix Diag(√σ) Φ_{V_P^T S}(y) Diag(√σ),

    [A(y)]_ij = [ Diag(√σ) Φ_{V_P^T S}(y) Diag(√σ) ]_ij,   {i, j} = 1, ..., p'.    (101)

With this definition, it is now easy to see that the expression

    E{ diag(A(Y)) } E{ diag(A(Y)) }^T / n − E{ A(Y) ∘ A(Y) }    (102)

coincides (up to the factor N(Y)/n) with the first p' rows and columns of the Hessian matrix H_λ N(Y) in (99). Recalling that the remaining elements of the Hessian matrix H_λ N(Y) are zero due to the presence of the matrix Diag(σ), it is sufficient to show that the expression in (102) is negative semidefinite to prove the negative semidefiniteness of H_λ N(Y). Now, we apply Proposition G.9 to A(y), yielding

    A(y) ∘ A(y) ≥ diag(A(y)) diag(A(y))^T / p'.    (103)

Taking the expectation on both sides of (103), we have

    E{ A(Y) ∘ A(Y) } ≥ E{ diag(A(Y)) diag(A(Y))^T } / p'.    (104)

From Lemma G.10 we know that E{ diag(A(Y)) diag(A(Y))^T } ≥ E{ diag(A(Y)) } E{ diag(A(Y)) }^T, from which it follows that E{ A(Y) ∘ A(Y) } ≥ E{ diag(A(Y)) } E{ diag(A(Y)) }^T / p'.
Since the diag(·) operator and the expectation commute, we finally obtain the desired result as

E{ A(Y) ∘ A(Y) } ≥ diag(E{A(Y)}) diag(E{A(Y)})^T / p' ≥ diag(E{A(Y)}) diag(E{A(Y)})^T / n,

where in the last inequality we have used that p' = rank(R_H) ≤ min{n, p} ≤ n, as R_H = H^T R_Z^{−1} H and H ∈ R^{n×p}.

Remark 9: For the case where m < p, we assume that the matrix U_P ∈ R^{p×m} contains the m eigenvectors in U_H associated with the m largest eigenvalues of R_H. It then follows that the Hessian matrix H_λ N(Y) is also negative semidefinite and its expression is the same as in (99), simply replacing σ by σ̃ = (σ_1, ..., σ_m)^T.

Remark 10: For the case where R_H = I_p and p = m we recover our results in [23].

Another possible multivariate generalization of Costa's EPI would be to study the concavity of N(X + Z) with respect to the covariance of the noise vector, R_Z. Numerical computations seem to indicate that the entropy power is indeed concave in R_Z. However, a proof has been elusive, mainly because, differently from the conditional MSE matrix Φ_S(y), the conditional Fisher information matrix Γ_Y(y), which appears when differentiating with respect to R_Z, is not a positive definite function for all y.

VI. EXTENSIONS TO THE COMPLEX FIELD

So far, the presented results hold for the case where all the variables and parameters take values from the field of real numbers. Due to the simplicity of working with baseband equivalent models, it is common practice when studying communication systems to model the parameters and random variables in the complex field and to work with the following complex linear vector Gaussian channel:

Y_c = G_c S_c + Z_c,   (105)

where G_c ∈ C^{n×m}, all the other dimensions are defined accordingly, and the noise Z_c is a zero-mean circularly symmetric (or proper [24]) complex Gaussian random vector with covariance E{Z_c Z_c^†} = R_{Z_c}.

The complex model in (105) can be equivalently rewritten by defining an extended double-dimensional real model of (105). We consider the extended vectors and matrices

Y_r = [ ℜe Y_c ; ℑm Y_c ],   G_r = [ ℜe G_c, −ℑm G_c ; ℑm G_c, ℜe G_c ],   S_r = [ ℜe S_c ; ℑm S_c ],   Z_r = [ ℜe Z_c ; ℑm Z_c ],   (106)

and then rewrite the input-output relation in (105) according to the real model

Y_r = G_r S_r + Z_r.   (107)

With these definitions, we have that, for example, h(Y_c) = h(Y_r) or I(S_c; Y_c) = I(S_r; Y_r) [24]. Working with the real model in (107), it is possible to calculate, for example, the Jacobian of the mutual information with respect to the complex precoder G_c by using the results for the real case and the chain rule as

D_{G_c} I(S_c; Y_c) = D_{G_c} I(S_r; Y_r) ≜ (1/2) ( D_{ℜe G_c} I(S_r; Y_r) − j D_{ℑm G_c} I(S_r; Y_r) )   (108)
                   = (1/2) ( D_{G_r} I(S_r; Y_r) D_{ℜe G_c} G_r − j D_{G_r} I(S_r; Y_r) D_{ℑm G_c} G_r ),   (109)

where we have used the convention for the complex derivative defined in [25] and where the Jacobians D_{ℜe G_c} G_r and D_{ℑm G_c} G_r can be found using the definition in (106) and the results in [15, Chapter 9]. Similarly, expressions for H_{G_c} I(S_c; Y_c) or H_{G_c^*} I(S_c; Y_c) can also be obtained by successive application of the complex derivative definition and the chain rule.
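As a quick sanity check of the extended real model (106)-(107), the following sketch (an illustration added here, not part of the paper; the random realizations of G_c, S_c, and Z_c are arbitrary) builds G_r, S_r, Z_r from a complex triple and verifies that stacking the real and imaginary parts of Y_c = G_c S_c + Z_c reproduces Y_r = G_r S_r + Z_r.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 3, 2

# Arbitrary complex channel, signal, and noise realizations (illustrative only).
Gc = rng.standard_normal((n, m)) + 1j * rng.standard_normal((n, m))
Sc = rng.standard_normal(m) + 1j * rng.standard_normal(m)
Zc = rng.standard_normal(n) + 1j * rng.standard_normal(n)
Yc = Gc @ Sc + Zc                                     # complex model (105)

# Extended double-dimensional real model (106)-(107).
Gr = np.block([[Gc.real, -Gc.imag],
               [Gc.imag,  Gc.real]])
Sr = np.concatenate([Sc.real, Sc.imag])
Zr = np.concatenate([Zc.real, Zc.imag])
Yr = Gr @ Sr + Zr

# The stacked real/imaginary parts of Yc must coincide with Yr.
assert np.allclose(Yr, np.concatenate([Yc.real, Yc.imag]))
print("real extended model consistent with the complex model")
```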
In the following we present a simplified complex counterpart of the Hessian result in Theorem 5 for the real case, which, despite its simplicity, illustrates the particularities of the complex case.

Theorem 8 (Mutual information Hessian in the complex case): Consider the complex signal model Y_c = Diag(√λ_c) S_c + W_c, where Diag(√λ_c) ∈ S_+^n is an arbitrary deterministic diagonal matrix (λ_c ∈ R^n), the signaling S_c ∈ C^n is arbitrarily distributed, and the noise vector W_c ∈ C^n follows a white proper Gaussian distribution and is independent of the input S_c. Then, the differential entropy of the output vector Y_c, h(Y_c), satisfies

H_{λ_c} h(Y_c) = − E{ Φ_{S_c}(Y) ∘ Φ_{S_c}^*(Y) + Φ̃_{S_c}(Y) ∘ Φ̃_{S_c}^*(Y) },   (110)

where we have defined

Φ_{S_c}(y) = E{ (S_c − E{S_c | y})(S_c − E{S_c | y})^† | y }   (111)
Φ̃_{S_c}(y) = E{ (S_c − E{S_c | y})(S_c − E{S_c | y})^T | y }.   (112)

Proof: The real extended model of Y_c = Diag(√λ_c) S_c + W_c is readily obtained as

Y_r = Diag(√λ_r) S_r + N_r = [ Diag(√λ_c), 0 ; 0, Diag(√λ_c) ] [ ℜe S_c ; ℑm S_c ] + (1/√2) W_r,   (113)

where now we have E{W_r W_r^T} = I_{2n}. Now, applying the chain rule for λ_r^T = (λ_c^T, λ_c^T), the elements of the Hessian matrix read as

[H_{λ_c} h(Y_c)]_{ij} = ∂^2 h(Y_c)/∂λ_{c,i}∂λ_{c,j} = ∂^2 h(Y_r)/∂λ_{r,i}∂λ_{r,j} + ∂^2 h(Y_r)/∂λ_{r,i+n}∂λ_{r,j} + ∂^2 h(Y_r)/∂λ_{r,i}∂λ_{r,j+n} + ∂^2 h(Y_r)/∂λ_{r,i+n}∂λ_{r,j+n}.

The four terms in the complex Hessian can be identified with elements of the Hessian for the real case, which, thanks to Theorem 5, can be written as

∂^2 h(Y_r)/∂λ_{r,i}∂λ_{r,j} = −2 E{ ( E{ (S_{r,i} − E{S_{r,i} | Y})(S_{r,j} − E{S_{r,j} | Y}) | Y } )^2 }.   (114)

Noting that S_{r,i} = ℜe S_{c,i} and S_{r,i+n} = ℑm S_{c,i}, we can finally write

[H_{λ_c} h(Y_c)]_{ij} = −2 E{ ( E{ ℜe(S_{c,i} − E{S_{c,i}|Y}) ℜe(S_{c,j} − E{S_{c,j}|Y}) | Y } )^2 }   (115)
                       −2 E{ ( E{ ℜe(S_{c,i} − E{S_{c,i}|Y}) ℑm(S_{c,j} − E{S_{c,j}|Y}) | Y } )^2 }   (116)
                       −2 E{ ( E{ ℑm(S_{c,i} − E{S_{c,i}|Y}) ℜe(S_{c,j} − E{S_{c,j}|Y}) | Y } )^2 }   (117)
                       −2 E{ ( E{ ℑm(S_{c,i} − E{S_{c,i}|Y}) ℑm(S_{c,j} − E{S_{c,j}|Y}) | Y } )^2 }.   (118)

Now, with the definitions in (111) and (112) and a slight amount of algebra, the result follows.

It is important to highlight that, whereas in the real case the conditional MMSE matrix Φ_S(y) was enough to compute the Hessian, in the complex case, in addition to the conditional MMSE matrix defined in (111), there is an extra matrix Φ̃_{S_c}(y), defined in (112), which is referred to as the conditional pseudo-MMSE matrix.

APPENDIX

A. The commutation K_{q,r}, symmetrization N_q, duplication D_q, and reduction S_q matrices

In this appendix we present four matrices that are very important when calculating Hessian matrices. The definitions of the commutation K_{q,r}, symmetrization N_q, and duplication D_q matrices have been taken from [15], whereas the reduction matrix S_q has been defined by the authors of the present work.

Given any matrix A ∈ R^{q×r}, the two vectors vec A and vec A^T contain the same elements but arranged in a different order. Consequently, there exists a unique permutation matrix K_{q,r} ∈ R^{qr×qr}, independent of A and called the commutation matrix, that satisfies

vec A^T = K_{q,r} vec A,   with   K_{q,r}^T = K_{q,r}^{−1} = K_{r,q}.   (119)
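As an aside (an illustration added here, not part of the original appendix), the commutation matrix is straightforward to construct and test numerically: the sketch below builds K_{q,r} directly as the permutation that maps vec A to vec A^T and verifies (119) together with the Kronecker commutation property (121) stated next. The dimensions q, r, s, t and the test matrices are arbitrary choices.

```python
import numpy as np

def vec(A):
    """Column-wise stacking vec(A), matching the convention of the appendix."""
    return A.reshape(-1, order="F")

def commutation(q, r):
    """Commutation matrix K_{q,r}: the unique permutation with K_{q,r} vec(A) = vec(A^T)."""
    K = np.zeros((q * r, q * r))
    for j in range(r):
        for i in range(q):
            # A_{ij} sits at position i + j*q in vec(A) and at j + i*r in vec(A^T) (0-based).
            K[j + i * r, i + j * q] = 1.0
    return K

rng = np.random.default_rng(2)
q, r, s, t = 3, 2, 4, 5
A = rng.standard_normal((q, r))
B = rng.standard_normal((s, t))

K_qr = commutation(q, r)
assert np.allclose(K_qr @ vec(A), vec(A.T))                      # property (119)
assert np.allclose(K_qr.T, np.linalg.inv(K_qr))                  # K^T = K^{-1}
assert np.allclose(commutation(s, q) @ np.kron(A, B),
                   np.kron(B, A) @ commutation(t, r))            # property (121) below
print("commutation matrix identities verified")
```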
It is easy to verify that the entries of the commutation matrix satisfy

[K_{q,r}]_{i+(j−1)r, i'+(j'−1)q} = δ_{i'j} δ_{j'i},   {i', j} ∈ [1, q],   {i, j'} ∈ [1, r].   (120)

The main reason why we have introduced the commutation matrix is the property from which it obtains its name, as it enables us to commute the two factors of a Kronecker product [15, Theorem 3.9],

K_{s,q} (A ⊗ B) = (B ⊗ A) K_{t,r},   (121)

where we have considered A ∈ R^{q×r} and B ∈ R^{s×t}.

We also define K_q = K_{q,q} for the case where the commutation matrix is square. An important property of the square matrix K_q is given in the following lemma.

Lemma A.1: Let A ∈ R^{q×r} and B ∈ R^{q×t}. Then,

[A ⊗ B]_{i+(j−1)q, k+(l−1)t} = A_{jl} B_{ik},
[K_q (A ⊗ B)]_{i+(j−1)q, k+(l−1)t} = A_{il} B_{jk},   {i, j} ∈ [1, q], k ∈ [1, t], l ∈ [1, r].   (122)

Proof: The equality for the entries of the product A ⊗ B follows straightforwardly from the definition [26, Section 4.2]. Concerning the entries of K_q(A ⊗ B), we have

[K_q (A ⊗ B)]_{i+(j−1)q, k+(l−1)t} = Σ_{i'=1}^{q} Σ_{j'=1}^{q} [K_q]_{i+(j−1)q, i'+(j'−1)q} [A ⊗ B]_{i'+(j'−1)q, k+(l−1)t}   (123)
                                   = Σ_{i'=1}^{q} Σ_{j'=1}^{q} δ_{i'j} δ_{j'i} A_{j'l} B_{i'k}   (124)
                                   = A_{il} B_{jk},   (125)

where we have used the expression for the elements of K_q in (120).

When calculating Jacobian and Hessian matrices, the form I_{q^2} + K_q is usually encountered. Hence, we define the symmetrization matrix N_q = (1/2)(I_{q^2} + K_q), which is singular and has the following properties:

N_q = N_q^T = N_q^2   (126)
N_q = N_q K_q = K_q N_q.   (127)

The name of the symmetrization matrix comes from the fact that, given any square matrix A ∈ R^{q×q},

N_q vec A = (1/2)(vec A + vec A^T) = (1/2) vec(A + A^T).   (128)

The last important property of the symmetrization matrix is

N_q (A ⊗ A) = (A ⊗ A) N_q,   (129)

which follows from the definition N_q = (1/2)(I_{q^2} + K_q) together with (121).

Another important matrix related to the calculation of Jacobian and Hessian matrices, especially when symmetric matrices are involved, is the duplication matrix D_q. Given any symmetric matrix R ∈ S^q, we denote by vech R the (1/2)q(q+1)-dimensional vector obtained from vec R by eliminating all the repeated elements that lie strictly above the diagonal of R. Then, the duplication matrix D_q ∈ R^{q^2 × q(q+1)/2} fulfills [15, Section 3.8]

vec R = D_q vech R   (130)

for any q-dimensional symmetric matrix R. The duplication matrix takes its name from the fact that it duplicates the entries of vech R corresponding to off-diagonal elements of R to produce the elements of vec R. Since D_q has full column rank, it is possible to invert the transformation in (130) to obtain

vech R = D_q^+ vec R = (D_q^T D_q)^{−1} D_q^T vec R.   (131)

The most important properties of the duplication matrix are [15, Theorem 3.12]

K_q D_q = D_q,   N_q D_q = D_q,   D_q D_q^+ = N_q,   D_q^+ N_q = D_q^+.   (132)

The last of the matrices introduced in this appendix is the reduction matrix S_q ∈ R^{q^2 × q}. The entries of the reduction matrix are defined as

[S_q]_{i+(j−1)q, k} = δ_{ij} δ_{ik} = δ_{ijk},   {i, j, k} ∈ [1, q],   (133)

from which it is easy to verify that the reduction matrix fulfills

K_q S_q = S_q,   N_q S_q = S_q.   (134)
However, the most important property of the reduction matrix is that it can be used to reduce the Kronecker product of two matrices to their Schur product, as detailed in the next lemma.

Lemma A.2: Let A ∈ R^{q×r} and B ∈ R^{q×r}. Then,

S_q^T (A ⊗ B) S_r = S_q^T (B ⊗ A) S_r = A ∘ B.   (135)

Proof: From the expression for the elements of the Kronecker product in Lemma A.1 and the expression for the elements of the reduction matrix, we have that, for any i ∈ [1, q] and j ∈ [1, r],

[S_q^T (A ⊗ B) S_r]_{i,j} = Σ_{k,l=1}^{q} Σ_{k',l'=1}^{r} [S_q]_{k+(l−1)q, i} [A ⊗ B]_{k+(l−1)q, k'+(l'−1)r} [S_r]_{k'+(l'−1)r, j}   (136)
                          = Σ_{k,l=1}^{q} Σ_{k',l'=1}^{r} A_{ll'} B_{kk'} δ_{kli} δ_{k'l'j}   (137)
                          = A_{ij} B_{ij},   (138)

from which the result in the lemma follows.

Finally, to conclude this appendix, we present two basic lemmas concerning the Kronecker product and the vec operator.

Lemma A.3: Let A, B, F, and T be four matrices such that the products AB and FT are defined. Then, (A ⊗ F)(B ⊗ T) = AB ⊗ FT.

Proof: See [15, Chapter 2].

Lemma A.4: Let A, T, and B be three matrices such that the product ATB is defined. Then,

vec(ATB) = (B^T ⊗ A) vec T.   (139)

Proof: See [15, Theorem 2.2] or [27, Proposition 7.1.6].

B. Conventions used for Jacobian and Hessian matrices

In this work we make extensive use of differentiation of matrix functions Ψ with respect to a matrix argument T. From the many possibilities of displaying the partial derivatives ∂^r Ψ_{st} / ∂T_{ij} ··· ∂T_{kl}, we will stick to the "good notation" introduced by Magnus and Neudecker in [15, Section 9.4], which is briefly reproduced next for the sake of completeness.

Definition B.1: Let Ψ be a differentiable q × t real matrix function of an r × s matrix of real variables T. The Jacobian matrix of Ψ at T = T_0 is the qt × rs matrix

D_T Ψ(T_0) = ∂ vec Ψ(T) / ∂ (vec T)^T |_{T = T_0}.   (140)

Remark B.2: To properly deal with the case where Ψ is a symmetric matrix, the vec operator in the numerator of (140) has to be replaced by a vech operator to avoid obtaining repeated elements. Similarly, vech has to replace vec in the denominator of (140) for the case where T is a symmetric matrix. For practical purposes, it is enough to calculate the Jacobian without taking into account any symmetry properties and then add a left factor D_q^+ to the obtained Jacobian when Ψ is symmetric and/or a right factor D_r when T is symmetric. This procedure will become clearer in the examples given below.

Definition B.3: Let Ψ be a twice differentiable q × t real matrix function of an r × s matrix of real variables T. The Hessian matrix of Ψ at T = T_0 is the qtrs × rs matrix

H_T Ψ(T_0) = D_T ( D_T^T Ψ(T) ) |_{T = T_0} = ∂/∂(vec T)^T vec( ∂ vec Ψ(T) / ∂ (vec T)^T )^T |_{T = T_0}.   (141)

One can verify that the obtained Hessian matrix of the matrix function Ψ is the stacking of the qt Hessian matrices corresponding to each individual element of the vector vec Ψ.

Remark B.4: Similarly to the Jacobian case, when Ψ or T are symmetric matrices, the vech operator has to replace the vec operator where appropriate in (141).

One of the major advantages of using the notation of [15] is that a simple chain rule can be applied for both the Jacobian and Hessian matrices, as detailed in the following lemma.
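Before stating that chain-rule lemma, Lemma A.2 can also be checked numerically (an illustrative sketch added here, not part of the original appendix; the reduction matrices are built entry-wise from (133), and the dimensions and test matrices are arbitrary).

```python
import numpy as np

def reduction(q):
    """Reduction matrix S_q from (133): [S_q]_{i+(j-1)q, k} = delta_{ijk}."""
    S = np.zeros((q * q, q))
    for k in range(q):
        S[k + k * q, k] = 1.0          # the only nonzero entries have i = j = k
    return S

q, r = 4, 3
rng = np.random.default_rng(3)
A = rng.standard_normal((q, r))
B = rng.standard_normal((q, r))

S_q, S_r = reduction(q), reduction(r)
lhs = S_q.T @ np.kron(A, B) @ S_r
# Lemma A.2: the Kronecker product collapses to the Schur (entry-wise) product.
assert np.allclose(lhs, A * B)
assert np.allclose(S_q.T @ np.kron(B, A) @ S_r, A * B)
print("Lemma A.2 verified on a", lhs.shape, "example")
```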
Lemma B.5 ([15, Theorems 5.8 and 6.9]): Let Υ be a twice differentiable u × v real matrix function of a q × t real matrix argument. Let Ψ be a twice differentiable q × t real matrix function of an r × s matrix of real variables T. Define Ω(T) = Υ(Ψ(T)). The Jacobian and Hessian matrices of Ω(T) at T = T_0 are

D_T Ω(T_0) = ( D_Ψ Υ(Ψ_0) ) ( D_T Ψ(T_0) )   (142)
H_T Ω(T_0) = ( I_{uv} ⊗ D_T Ψ(T_0) )^T H_Ψ Υ(Ψ_0) ( D_T Ψ(T_0) ) + ( D_Ψ Υ(Ψ_0) ⊗ I_{rs} ) H_T Ψ(T_0),   (143)

where Ψ_0 = Ψ(T_0).

The notation introduced above unifies the study of scalar (q = t = 1), vector (t = 1), and matrix functions of scalar (r = s = 1), vector (s = 1), or matrix arguments into the study of vector functions of vector arguments through the use of the vec and vech operators. However, the idea of arranging the partial derivatives of a scalar function of a matrix argument ψ(T) into a matrix rather than a vector is quite appealing and sometimes useful, so we will also make use of the notation described next.

Definition B.6: Let ψ be a differentiable scalar function of an r × s matrix of real variables T. The gradient of ψ at T = T_0 is the r × s matrix

∇_T ψ(T_0) = ∂ψ/∂T |_{T = T_0}.   (144)

It is easy to verify that D_T ψ(T_0) = vec^T ∇_T ψ(T_0).

We now give expressions for the most common Jacobian and Hessian matrices encountered in our developments.

Lemma B.7: Consider A ∈ R^{q×r}, T ∈ R^{r×s}, B ∈ R^{s×t}, R ∈ S_+^s, and f ∈ R^{r×1}, such that f is a function of T. Then the following holds:
1) If Ψ = ATB, we have D_T Ψ = B^T ⊗ A. If, in addition, B is a function of T, then we have D_T Ψ = B^T ⊗ A + (I_t ⊗ AT) D_T B.
2) If Ψ = Af, we have D_T Ψ = A D_T f.
3) If Ψ = ATA^T, with T being a symmetric matrix, we have D_T Ψ = D_q^+ (A ⊗ A) D_r.
4) If Ψ = B^T T^T A^T, we have D_T Ψ = (A ⊗ B^T) K_{r,s}.
5) If Ψ = (T ⊗ A), we have D_T Ψ = (I_s ⊗ K_{t,r} ⊗ I_q)(I_{rs} ⊗ vec A), and if Ψ = (A ⊗ T), we have D_T Ψ = (I_t ⊗ K_{s,q} ⊗ I_r)(vec A ⊗ I_{rs}), where in this case we have assumed that A ∈ R^{q×t}.
6) If Ψ = T^{−1}, we have D_T Ψ = −(T^{−T} ⊗ T^{−1}), where T is a square invertible matrix.
7) If Ψ = ATRT^T A^T, we have D_T Ψ = 2 D_q^+ (ATR ⊗ A).
8) If Ψ = ATRT^T A^T, we have H_T Ψ = 2 (D_q^+ ⊗ I_{rs}) (I_q ⊗ K_{q,s} ⊗ I_r) (A ⊗ R ⊗ vec(A^T)) K_{r,s}.

Proof: The identities 1) to 7) can be found in [15, Chapter 9]. Concerning identity 8), it can be calculated through the definition of the Hessian as

H_T (ATRT^T A^T) = 2 D_T ( ( D_q^+ (ATR ⊗ A) )^T ) = 2 D_T ( (RT^T A^T ⊗ A^T) D_q^{+T} ).   (145)

Now, we define Υ = RT^T A^T and Ω = Υ ⊗ A^T and apply the chain rule twice to obtain

D_T ( (RT^T A^T ⊗ A^T) D_q^{+T} ) = D_Ω ( Ω D_q^{+T} ) · D_Υ Ω · D_T Υ,   (146)

from which the result in 8) follows by application of identities 1), 5), and 4) and from Lemma A.3.

C. Differential properties of the quantities P_Y(y), P_{Y|S}(y|s), and E{S|y}

In this appendix we present a number of lemmas which are used in the proofs of Theorems 1 and 2 in Appendices D and E. In the proofs of the following lemmas we interchange the order of differentiation and expectation, which can be justified following similar steps as in [1, Appendix B].

Lemma C.1: Let Y = X + CN, where X is arbitrarily distributed and where N is a zero-mean Gaussian random variable with covariance matrix R_N, independent of X.
Then, the probability d ensity function P Y ( y ) satisfies ∇ C P Y ( y ) = H y P Y ( y ) CR N . (147) Pr oof: First, we recall that P Y ( y ) = E P Y | X ( y | X ) . Thus the matrix g radient of the den sity P Y ( y ) with respect to C is ∇ C P Y ( y ) = E ∇ C P Y | X ( y | X ) . Th e compu tation of the inner the gra dient ∇ C P Y | X ( y | X ) ca n be p erformed by rep lacing G s by x in (3) togethe r with ∇ C a T CR N C T − 1 a = − 2 CR N C T − 1 aa T CR N C T − 1 CR N (148) ∇ C det CR N C T = 2 det CR N C T CR N C T − 1 CR N , (149) where a is a fixed vector of the app ropriate dimension and wh ere we have used [15, p. 178, Exercise 9.9.3] and the ch ain rule in Lemma B.5 in (148) and, [15, p. 180, Exercise 9.10.2 ] in (149). W ith thes e expressions at hand and recalling that R Z = C R N C T , the expres sion for the grad ient ∇ C P Y ( y ) is equ al to ∇ C P Y ( y ) = E n P Y | X ( y | X ) R − 1 Z ( y − X )( y − X ) T R − 1 Z − R − 1 Z o CR N . (150) T o complete the p roof, we n eed to calcula te the Hess ian matrix, H y P Y ( y ) . First co nsider the following two Jacobia ns D y ( y − x ) T R − 1 Z ( y − x ) = 2( y − x ) T R − 1 Z (151) D y R − 1 Z ( y − x ) = R − 1 Z , (152) which follow directly from [15, Sec tion 9.9, T able 3] an d [15, Section 9.1 2]. Now , from (151), we c an first obtain the Jacob ian row vec tor D y P Y ( y ) as D y P Y ( y ) = − E n P Y | X ( y | X )( y − X ) T R − 1 Z o . (153) Recalling the expression in (152) and tha t H y P Y ( y ) = D y D T y P Y ( y ) the Hes sian matrix be comes H y P Y ( y ) = E n P Y | X ( y | X ) R − 1 Z ( y − X )( y − X ) T R − 1 Z − R − 1 Z o . (154) By simple ins pection from (150) an d (154) the result in (147) follows. Lemma C.2: Le t Y = G S + Z , where S is arbitrarily distributed and where Z is a zero-mean Ga ussian random variable with c ov ariance matrix R Z and independ ent o f S . Then, P Y ( y ) sa tisfies ∇ G P Y ( y ) = − E n D T y P Y | S ( y | S ) S T o . (155) Pr oof: First we write D y P Y | S ( y | s ) = − P Y | S ( y | s )( y − G s ) T R − 1 Z , (156) where we have us ed (151). Now , we simply need to notice tha t ∇ G P Y | S ( y | s ) = P Y | S ( y | s ) R − 1 Z ( y − G s ) s T = − D T y P Y | S ( y | s ) s T , (157) where we h av e us ed ∇ G ( y − G s ) T R − 1 Z ( y − G s ) = − 2 R − 1 Z ( y − G s ) s T , which follows from [15, Section 9 .9, T ab le 4]. Recalling tha t ∇ G P Y ( y ) = E ∇ G P Y | S ( y | S ) the result follo ws. Lemma C.3: Le t Y = G S + Z , where S is arbitrarily distributed and where Z is a zero-mean Ga ussian random variable with c ov ariance matrix R Z and independ ent o f S . Then, D y E { S | y } = Φ S ( y ) GR − 1 Z . (158) 23 Pr oof: D y E { S | y } = D y E S P Y | S ( y | S ) P Y ( y ) (159) = E S P Y ( y ) D y P Y | S ( y | S ) − P Y | S ( y | S ) D y P Y ( y ) P Y ( y ) 2 (160) = E ( S − P Y | S ( y | S )( y − G S ) T R − 1 Z + P Y | S ( y | S )( y − G E { S | y } ) T R − 1 Z P Y ( y ) ) (161) = E S S T y − E { S | y } E S T y G T R − 1 Z . (162) Now , from the definition in (9) the result in the lemma follows. Note tha t we have use d the express ion in (15 6) for D y P Y | S ( y | S ) and that from (153 ) we c an write D y P Y ( y ) = − E n P Y | X ( y | X )( y − X ) T R − 1 Z o = − P Y ( y )( y − G E { S | y } ) T R − 1 Z . (163) Lemma C.4: Le t Y = G S + Z , where S is arbitrarily distributed and where Z is a zero-mean Ga ussian random variable with c ov ariance matrix R Z and independ ent o f S . 
Then, the Jacob ian a nd Hes sian of log P Y ( y ) sa tisfy D y log P Y ( y ) = ( E { X | y } − y ) T R − 1 Z (164) H y log P Y ( y ) = R − 1 Z Φ X ( y ) R − 1 Z − R − 1 Z . (165) Pr oof: Rec alling the express ion in (153) we can write D y log P Y ( y ) = 1 P Y ( y ) D y P Y ( y ) (166) = − 1 P Y ( y ) E P Y | X ( y | X )( y − X ) T R − 1 Z (167) = ( E { X | y } − y ) T R − 1 Z . (16 8) From the Ja cobian expres sion, the Hes sian c an be computed as H y log P Y ( y ) = D y R − 1 Z ( E { X | y } − y ) (169) = R − 1 Z ( G D y E { S | y } − I n ) (170) = R − 1 Z ( GΦ S ( y ) G T R − 1 Z − I n ) , (171) where the expression for D y E { S | y } follows from Lemma C.3. Lemma C.5: Let Y = G S + Z , wh ere S is arbitrarily distrib uted (with i -th eleme nt deno ted by S i ) a nd whe re Z is a zero-mean Ga ussian rand om variable with covariance ma trix R Z and independ ent o f S . Then, ∇ G E { S i | y } = 1 P Y ( y ) E { S i | y } E D T y P Y | S ( y | S ) S T − E S i D T y P Y | S ( y | S ) S T . Pr oof: The proof follows from this chain of equa lities ∇ G E { S i | y } = ∇ G E S i P Y | S ( y | S ) P Y ( y ) = E S i P Y ( y ) ∇ G P Y | S ( y | S ) − P Y | S ( y | S ) ∇ G P Y ( y ) P Y ( y ) 2 = E ( S i − D T y P Y | S ( y | S ) S T P Y ( y ) + P Y | S ( y | S ) E D T y P Y | S ( y | S ) S T P Y ( y ) 2 !) (172) = 1 P Y ( y ) − E S i D T y P Y | S ( y | S ) S T + E S i P Y | S ( y | S ) P Y ( y ) E D T y P Y | S ( y | S ) S T , where (17 2) follows from Lemma C.2 and from (157). 24 D. Pr oof of The or em 1 Let us begin by co nsidering the expres sion for the entries of the Jacobian of the vector vec J Y , wh ich is [ D C vec J Y ] i +( j − 1) n,k +( l − 1) n = D C kl [ J Y ] ij , w here throughout this proof { i, j, k } ∈ [1 , n ] and l ∈ [1 , n ′ ] . From (13) a nd (14) we have that the e ntries of the Fishe r information matrix are given b y [ J Y ] ij = E [ Γ Y ( Y )] ij = − Z P Y ( y ) ∂ 2 log P Y ( y ) ∂ y i ∂ y j d y . (173) W e now dif ferentiate the expres sion above with res pect to the entries of the matrix C and we get D C kl [ J Y ] ij = − ∂ ∂ C k l Z P Y ( y ) ∂ 2 log P Y ( y ) ∂ y i ∂ y j d y (174) = − Z ∂ P Y ( y ) ∂ C k l ∂ 2 log P Y ( y ) ∂ y i ∂ y j d y − Z P Y ( y ) ∂ 2 ∂ y i ∂ y j 1 P Y ( y ) ∂ P Y ( y ) ∂ C k l d y , (175) where the interchange of the order of integration and dif ferentiation can be justified f rom the Dominated Con vergence Theorem follo wing similar s teps a s in [1, Appendix B]. Now , u sing Lemma C.1 we transform the p artial de ri vati ves with res pect to C into deriv ativ es with res pect to the entries of vector y , yielding D C kl [ J Y ] ij = − Z [ H y P Y ( y ) CR N ] k l ∂ 2 log P Y ( y ) ∂ y i ∂ y j d y − Z P Y ( y ) ∂ 2 ∂ y i ∂ y j 1 P Y ( y ) [ H y P Y ( y ) CR N ] k l d y . (176) Expressing the e lements of [ H y P Y ( y ) CR N ] k l as the sum of the product of the elemen ts of H y P Y ( y ) and CR N we ge t D C kl [ J Y ] ij = − n X r =1 [ CR N ] r l Z ∂ 2 P Y ( y ) ∂ y k ∂ y r ∂ 2 log P Y ( y ) ∂ y i ∂ y j d y + Z P Y ( y ) ∂ 2 ∂ y i ∂ y j 1 P Y ( y ) ∂ 2 P Y ( y ) ∂ y k ∂ y r d y . (177) W e can n ow c ombine the integral identities (221) an d (222) deriv ed in P roposition H.4 to rewrite the first term in the last eq uation a s Z ∂ 2 P Y ( y ) ∂ y k ∂ y r ∂ 2 log P Y ( y ) ∂ y i ∂ y j d y = Z P Y ( y ) ∂ 4 log P Y ( y ) ∂ y i ∂ y j ∂ y k ∂ y r d y . 
(178) Now , a pplying a scalar version of the logarithm iden tity in (12), 1 P Y ( y ) ∂ 2 P Y ( y ) ∂ y k ∂ y r = ∂ 2 log P Y ( y ) ∂ y k ∂ y r + ∂ log P Y ( y ) ∂ y k ∂ log P Y ( y ) ∂ y r , (179) the secon d term in the right h and side of (177) bec omes Z P Y ( y ) ∂ 2 ∂ y i ∂ y j 1 P Y ( y ) ∂ 2 P Y ( y ) ∂ y k ∂ y r d y = Z P Y ( y ) ∂ 4 log P Y ( y ) ∂ y i ∂ y j ∂ y k ∂ y r d y + Z P Y ( y ) ∂ 2 log P Y ( y ) ∂ y j ∂ y r ∂ 2 log P Y ( y ) ∂ y i ∂ y k + ∂ 2 log P Y ( y ) ∂ y i ∂ y r ∂ 2 log P Y ( y ) ∂ y j ∂ y k d y + Z P Y ( y ) ∂ 3 log P Y ( y ) ∂ y i ∂ y j ∂ y k ∂ log P Y ( y ) ∂ y r + ∂ 3 log P Y ( y ) ∂ y i ∂ y j ∂ y r ∂ log P Y ( y ) ∂ y k d y . (180) Using the regularity con dition (226) in Co rollary H.5, we fin ally o btain the desired result D C kl [ J Y ] ij = − Z P Y ( y ) n X r =1 ∂ 2 log P Y ( y ) ∂ y j ∂ y r [ CR N ] r l ! ∂ 2 log P Y ( y ) ∂ y i ∂ y k d y − Z P Y ( y ) n X r =1 ∂ 2 log P Y ( y ) ∂ y i ∂ y r [ CR N ] r l ! ∂ 2 log P Y ( y ) ∂ y j ∂ y k d y . (181) 25 Now , rec alling that Γ Y ( y ) = H y log P Y ( y ) and ide ntifying the elements o f the two matrices Γ Y ( y ) CR N and Γ Y ( y ) with the terms in (181), we o btain D C kl [ J Y ] ij = − E { [ Γ Y ( Y ) C R N ] j l [ Γ Y ( Y )] ik + [ Γ Y ( Y ) CR N ] il [ Γ Y ( Y )] j k } . (182) Finally , taking into a ccount that [ D C vec J Y ] i +( j − 1) n,k +( l − 1) n = D C kl [ J Y ] ij and applying Lemma A.1 wit h A = Γ Y ( Y ) C R N and B = Γ Y ( Y ) it can b e ea sily shown tha t D C vec J Y = − E { Γ Y ( Y ) CR N ⊗ Γ Y ( Y ) + K n ( Γ Y ( Y ) CR N ⊗ Γ Y ( Y )) } (183) = − 2 N n E { Γ Y ( Y ) CR N ⊗ Γ Y ( Y ) } . (184) E. Pr oof of The or em 2 Throughou t t his proo f we assume that { i, j, l } ∈ [1 , m ] and k ∈ [1 , n ] . Now , let us begin by considering the expression for the entries of the ma trix E S : [ E S ] ij = E { ( S i − E { S i | Y } )( S j − E { S j | Y } ) } = E { S i S j } − E { E { S i | Y } E { S j | Y }} . (185) Since the first term in las t expres sion d oes not de pend on G , we have that D G kl [ E S ] ij = − D G kl E { E { S i | Y } E { S j | Y }} = − ∂ ∂ G k l Z P Y ( y ) E { S i | y } E { S j | y } d y = − Z ∂ P Y ( y ) ∂ G k l E { S i | y } E { S j | y } d y − Z P Y ( y ) ∂ E { S i | y } ∂ G k l E { S j | y } d y − Z P Y ( y ) E { S i | y } ∂ E { S j | y } ∂ G k l d y , (186) where, a s in Appendix D, the justification of this interc hange of the order of deriv ation and integration and tw o other interchange s b elow follow similar ste ps as in [1, Append ix B]. Note that the sec ond and third terms in (186) have the s ame s tructure and , thus , we will deal with them jointly . The first term in (186) ca n be rewritt en a s − Z ∂ P Y ( y ) ∂ G k l E { S i | y } E { S j | y } d y = Z E S l ∂ P Y | S ( y | S ) ∂ y k E { S i | y } E { S j | y } d y , (187) where we have us ed Lemma C. 2 to transform the de ri vati ve with res pect to G into a de ri vati ve with res pect to y . Using L emma C.5, the se cond term in (186) can be computed as − Z P Y ( y ) ∂ E { S i | y } ∂ G k l E { S j | y } d y = − Z E { S i | y } E S l ∂ P Y | S ( y | S ) ∂ y k E { S j | y } d y + Z E S i S l ∂ P Y | S ( y | S ) ∂ y k E { S j | y } d y . (188) Note that the third term in (186) can be obtaine d by interchang ing the roles of i and j in last eq uation. 
Plug ging the expressions (187) and (188) into (186) we can write D G kl [ E S ] ij = − Z E { S i | y } E S l ∂ P Y | S ( y | S ) ∂ y k E { S j | y } d y + Z E S i S l ∂ P Y | S ( y | S ) ∂ y k E { S j | y } d y + Z E S j S l ∂ P Y | S ( y | S ) ∂ y k E { S i | y } d y . (189) W e now simplify the obtaine d expression. The first term can be reformulated as − Z E S l ∂ P Y | S ( y | S ) ∂ y k E { S i | y } E { S j | y } d y = − Z ∂ E S l P Y | S ( y | S ) ∂ y k E { S i | y } E { S j | y } d y = − Z ∂ P Y ( y ) E { S l | y } ∂ y k E { S i | y } E { S j | y } d y = Z P Y ( y ) E { S l | y } ∂ E { S i | y } E { S j | y } ∂ y k d y , 26 where in the last s tep we have integrated by parts as deta iled in Proposition I.2. W e now make use of Le mma C.3 to s implify the de ri vati ve inside the integration sign in last equation to ob tain Z P Y ( y ) E { S l | y } ∂ E { S i | y } E { S j | y } ∂ y k d y = Z P Y ( y ) E { S l | y } E { S j | y } Φ S ( y ) G T R − 1 Z ik + E { S i | y } Φ S ( y ) G T R − 1 Z j k d y . (190) W e now proce ed to the comp utation of the se cond and third terms in (189) (note that they a re in fact the same term with the roles o f i and j interchang ed). W e have Z E S i S l ∂ P Y | S ( y | S ) ∂ y k E { S j | y } d y = Z ∂ E S i S l P Y | S ( y | S ) ∂ y k E { S j | y } d y (191) = Z ∂ P Y ( y ) E { S i S l | y } ∂ y k E { S j | y } d y (192) = − Z P Y ( y ) E { S i S l | y } ∂ E { S j | y } ∂ y k d y , (193) where last equality follows by integrating by parts as Proposition I.2. W e are now ready to apply Lemma C.3 again to o btain − Z P Y ( y ) E { S i S l | y } ∂ E { S j | y } ∂ y k d y = − Z P Y ( y ) E { S i S l | y } Φ S ( y ) G T R − 1 Z j k d y . (194) Plugging (190) a nd (19 4) into (189) and rec alling tha t [ Φ S ( y )] j l = E { S j S l | y } − E { S j | y } E { S l | y } , we finally have D G kl [ E S ] ij = − Z P Y ( y ) [ Φ S ( y )] j l Φ S ( y ) G T R − 1 Z ik + [ Φ S ( y )] il Φ S ( y ) G T R − 1 Z j k d y . (195) T ak ing into acc ount that D G kl [ E S ] ij = [ D G vec E S ] i +( j − 1) n,k +( l − 1) m and app lying L emma A.1 with A = Φ S ( y ) and B = Φ S ( y ) G T R − 1 Z we obtain D G vec E S = − E Φ S ( Y ) ⊗ Φ S ( Y ) G T R − 1 Z + K m Φ S ( Y ) ⊗ Φ S ( Y ) G T R − 1 Z (196) = − 2 N m E Φ S ( Y ) ⊗ Φ S ( Y ) G T R − 1 Z . (197) F . Proof of Th eorem 4 The developments lead ing to the expressions for the He ssian matrices H P h ( Y ) , H H h ( Y ) , an d H C h ( Y ) follow a very similar pattern. C onseque ntly , we will pres ent on ly one of them here. Consider the Hessian H P h ( Y ) , from the express ion for the Ja cobian D P h ( Y ) in (41) it follo ws that H P h ( Y ) = D P ( D T P h ( Y )) = D P vec H T R − 1 Z HPE S (198) = E S ⊗ H T R − 1 Z H + ( I m ⊗ H T R − 1 Z HP ) D m D P E S , (199) where in (199) we h av e used Lemma B.7 .1 adding the matrix D m becaus e E S is a symmetric matrix. The fina l expression f or H P h ( Y ) is obtained by plugging in (199) the exp ression for D P E S obtained in Theorem 3 and recalling that D m D + m = N m . The calculation of the Hessian matrix H R Z h ( Y ) from its Ja cobian D R Z h ( Y ) in (44) follows: H R Z h ( Y ) = 1 2 D R Z D T n vec J Y = 1 2 D R Z D T n D n vech J Y = 1 2 D T n D n D R Z J Y , (200) where, in last equality , we have us ed Lemma B.7.2. Now , we only n eed to plug i n the expression for D R Z J Y , which c an b e found in Theo rem 3 and note tha t D T n D n D + n = D T n N n = D T n . 
Finally, the Hessian matrix H_{R_N} h(Y) can be computed from its Jacobian D_{R_N} h(Y) in (45) as

H_{R_N} h(Y) = (1/2) D_{R_N} ( D_{n'}^T vec(C^T J_Y C) )   (201)
             = (1/2) D_{n'}^T (C^T ⊗ C^T) D_n D_{R_N} J_Y,   (202)

where we have used Lemmas A.4 and B.7.2 and also that D_{R_N} vec J_Y = D_n D_{R_N} vech J_Y = D_n D_{R_N} J_Y, similarly as in (200). Recalling the expression for D_{R_N} J_Y in Theorem 3, the result follows.

G. Matrix algebra results for the proof of the multidimensional EPI in Theorem 7

In this appendix we present a number of lemmas and propositions that are used in the proof of our multidimensional EPI in Section V.

Lemma G.1 (Bhatia [28, p. 15]): Let R ∈ S_+^s be a positive semidefinite matrix, R ≥ 0. Then [ R, R ; R, R ] ≥ 0.

Proof: Since R ≥ 0, consider R = AA^T and write [ R, R ; R, R ] = [ A ; A ] [ A^T, A^T ].

Lemma G.2 (Bhatia [28, Exercise 1.3.10]): Let R ∈ S_{++}^s be a positive definite matrix, R > 0. Then,

[ R, I_s ; I_s, R^{−1} ] ≥ 0.   (203)

Proof: Consider again R = AA^T; then R^{−1} = A^{−T} A^{−1}. Now, simply write (203) as

[ R, I_s ; I_s, R^{−1} ] = [ A, 0 ; 0, A^{−T} ] [ I_s, I_s ; I_s, I_s ] [ A^T, 0 ; 0, A^{−1} ],

which, from Sylvester's law of inertia for congruent matrices [28, p. 5] and Lemma G.1, is positive semidefinite.

Lemma G.3: If the matrices R and T are positive (semi)definite, then so is the product R ⊗ T. In other words, the class of positive (semi)definite matrices is closed under the Kronecker product.

Proof: See [27, p. 254, Fact 7.4.15].

Corollary G.4 (Schur theorem): The class of positive (semi)definite matrices is also closed under the Schur matrix product, R ∘ T.

Proof: The proof follows from Lemma G.3 by noting that the Schur product R ∘ T is a principal submatrix of the Kronecker product R ⊗ T, as in [27, Proposition 7.3.1], and that any principal submatrix of a positive (semi)definite matrix is also positive (semi)definite [27, Propositions 8.2.6 and 8.2.7]. Alternatively, see [29, Theorem 7.5.3] or [26, Theorem 5.2.1] for a completely different proof.

Lemma G.5 (Schur complement): Let the matrices R ∈ S_{++}^s and T ∈ S_{++}^q be positive definite, R > 0 and T > 0, not necessarily of the same dimension. Then the following statements are equivalent:
1) [ R, A ; A^T, T ] ≥ 0,
2) T ≥ A^T R^{−1} A,
3) R ≥ A T^{−1} A^T,
where A ∈ R^{s×q} is any arbitrary matrix.

Proof: See [29, Theorem 7.7.6] and the second exercise following it, or [27, Proposition 8.2.3].

With the above lemmas at hand, we are now ready to prove the following proposition.

Proposition G.6: Consider two positive definite matrices R ∈ S_{++}^s and T ∈ S_{++}^s of the same dimension. Then it follows that

R ∘ T^{−1} ≥ Diag(R) (R ∘ T)^{−1} Diag(R).   (204)

Proof: From Lemmas G.1, G.2, and G.4, it follows that

[ R, R ; R, R ] ∘ [ T, I_s ; I_s, T^{−1} ] = [ R ∘ T, Diag(R) ; Diag(R), R ∘ T^{−1} ] ≥ 0.

Now, from Lemma G.5, the result follows directly.

Corollary G.7: Let R ∈ S_{++}^s be a positive definite matrix. Then,

diag(R)^T (R ∘ R)^{−1} diag(R) ≤ s.   (205)

Proof: Particularizing the result in Proposition G.6 with T = R and pre- and post-multiplying it by 1^T and 1, we obtain

1^T (R ∘ R^{−1}) 1 ≥ 1^T Diag(R) (R ∘ R)^{−1} Diag(R) 1.

The result in (205) now follows straightforwardly from the fact that 1^T (R ∘ R^{−T}) 1 = s [30] (see also [27, Fact 7.6.10], [26, Lemma 5.4.2(a)]). Note that R is symmetric and thus R^T = R and R^{−T} = R^{−1}.
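As a plausibility check of Proposition G.6 and Corollary G.7 (an illustration added here, not part of the proof; R and T are arbitrary randomly generated positive definite matrices), the sketch below verifies that R ∘ T^{-1} − Diag(R)(R ∘ T)^{-1}Diag(R) has nonnegative eigenvalues and that diag(R)^T (R ∘ R)^{-1} diag(R) does not exceed s.

```python
import numpy as np

rng = np.random.default_rng(4)
s = 5
A = rng.standard_normal((s, s))
R = A @ A.T + s * np.eye(s)          # a generic positive definite matrix
B = rng.standard_normal((s, s))
T = B @ B.T + s * np.eye(s)          # a second positive definite matrix

DiagR = np.diag(np.diag(R))          # Diag(R): diagonal matrix with the diagonal of R
diagR = np.diag(R)                   # diag(R): vector of diagonal entries

# Proposition G.6: R o T^{-1} >= Diag(R) (R o T)^{-1} Diag(R)   (o = Schur product).
gap = R * np.linalg.inv(T) - DiagR @ np.linalg.inv(R * T) @ DiagR
print("min eigenvalue of the G.6 gap:", np.linalg.eigvalsh(gap).min())   # expected >= 0

# Corollary G.7: diag(R)^T (R o R)^{-1} diag(R) <= s.
val = diagR @ np.linalg.inv(R * R) @ diagR
print("diag(R)^T (R o R)^{-1} diag(R) =", val, "<= s =", s)
```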
Remark G.8: Note that the proof of Corollary G.7 is based on the result of Proposition G.6 in (204). An alternative proof could follow similarly from a different inequality by Styan in [31],

R ∘ R^{−1} + I_s ≥ 2 (R ∘ R)^{−1},

where, in this case, R is constrained to have ones in its main diagonal, i.e., R ∘ I_s = I_s.

Proposition G.9: Consider now the positive semidefinite matrix R ∈ S_+^s. Then,

R ∘ R ≥ diag(R) diag(R)^T / s.

Proof: For the case where R ∈ S_{++}^s is positive definite, from (205) in Corollary G.7 and Lemma G.5, it follows that

[ R ∘ R, diag(R) ; diag(R)^T, s ] ≥ 0.

Applying again Lemma G.5, we get

R ∘ R ≥ diag(R) diag(R)^T / s.   (206)

Now, assume that R ∈ S_+^s is positive semidefinite. We thus define ε > 0 and consider the positive definite matrix R + ε I_s. From (206), we know that

(R + ε I_s) ∘ (R + ε I_s) ≥ diag(R + ε I_s) diag(R + ε I_s)^T / s.

Taking the limit as ε tends to 0, the validity of (206) for positive semidefinite matrices follows by continuity.

The last lemma in this section follows.

Lemma G.10: For a given random vector X, it follows that E{X X^T} ≥ E{X} E{X}^T.

Proof: Simply note that E{X X^T} − E{X} E{X}^T = E{ (X − E{X})(X − E{X})^T } ≥ 0, where the last inequality follows from the fact that the expectation preserves positive semidefiniteness.

H. Integral identities involving functions and derivatives of P_Y(y)

The integral identities presented in this section are derived through a sequence of lemmas which lead to the main proposition containing the identities. First, we present a lemma, which is a straightforward generalization to non-white Gaussian random variables of [32, Lemma 4.1].

Lemma H.1: Assume Y = X + Z is an n-dimensional random vector, where X is arbitrarily distributed and Z is distributed following a zero-mean Gaussian distribution with covariance R_Z, and consider a non-empty set of natural numbers I whose elements range from 1 to n. Then, given ϕ > 1, there exists a finite positive constant κ not depending on y such that

| ∂^{|I|} P_Y(y) / ∏_{i∈I} ∂y_i | ≤ κ(n, I, ϕ, R_Z) ( P_Y(y) )^{1/ϕ},   (207)

where we use the notation ∏_{i∈I} ∂y_i to denote, e.g., ∏_{i∈{3,1,3,5}} ∂y_i = ∂y_1 ∂y_3^2 ∂y_5.

Proof: This proof follows the guidelines of the proof of [32, Lemma 4.1]. For any R_Z > 0, which implies that R_Z^{−1} exists, we have that P_Y(y) is continuously differentiable in y, and

∂P_Y(y)/∂y_i = ∂/∂y_i E{ P_{Y|X}(y|X) } = E{ ∂/∂y_i P_Z(y − X) }   (208)
             = − ∫ P_X(x) P_Z(y − x) [ (y − x)^T R_Z^{−1} ]_i dx.   (209)

Now, using Hölder's inequality, for ϕ > 1 and 1/ϕ + 1/ψ = 1, we have

| ∂P_Y(y)/∂y_i | ≤ ∫ ( P_X(x) P_Z(y − x) )^{1/ϕ} ( P_X(x) P_Z(y − x) | [ (y − x)^T R_Z^{−1} ]_i |^ψ )^{1/ψ} dx
               ≤ ( P_Y(y) )^{1/ϕ} ( ∫ P_X(x) P_Z(y − x) | [ (y − x)^T R_Z^{−1} ]_i |^ψ dx )^{1/ψ},   (210)

from which the result for I = {i}, such that |I| = 1, follows, since P_Z(y) | [y^T R_Z^{−1}]_i |^ψ is bounded above by a constant depending only on R_Z and ψ = ϕ/(ϕ − 1). The inequalities for |I| > 1 follow in a similar fashion from the fact that, for any R_Z > 0,

∂^{|I|} P_Z(y) / ∂y_i^{|I|} = (−1)^{|I|} ( [R_Z^{−1}]_{ii} / 2 )^{|I|/2} H_{|I|}( [R_Z^{−1} y]_i / √(2 [R_Z^{−1}]_{ii}) ) P_Z(y),   (211)

where H_{|I|}(x) is the |I|-th order Hermite polynomial defined following the convention in [33, p. 817], and noting that the partial derivatives of the Hermite polynomial are other polynomials.
Lemma H.2: Assume Y = X + Z is a n n -dimens ional random vector , where X is a rbitraril y distributed a nd Z is dis trib uted following a zero-mean Ga ussian distribution with covariance R Z . The n, given ϕ > 1 , there exist a set o f fin ite po siti ve constan ts ξ not depe nding on y such that for all |I | ≥ 1 ∂ |I | log P Y ( y ) Q i ∈I ∂ y i ≤ X π ξ ( n, π , ϕ, R Z )( P Y ( y )) | π | (1 − ϕ ) ϕ , (212) where the sum is over the pa rtitions π of the s et I . Pr oof: W e recall the Arbog ast-Fa ` a di Bruno’ s formula in its mos t gen eral form as given in [34, Eq. (5)] for the partial deri vati ve of a composite function, ∂ |I | f ( g ) Q i ∈I ∂ z i = X π d | π | f ( g ) d g | π | Y B ∈ π ∂ | B | g Q j ∈ B ∂ z j , (213) where, a s explained in [34 ], the sum is over the parti tions π of the s et I , a nd whe re B represe nts an element of the partition π . 6 Thus, for e ach given p artition π , B c an take | π | dif feren t values. Cons equently the order of the deriv ati ve with respec t to f , | π | , c oincides with the nu mber of factors in the p roduct indexed by B . Particularizing (213) for our cas e we obtain ∂ |I | log P Y ( y ) Q i ∈I ∂ y i = X π ( − 1) | π | | π | ! ( P Y ( y )) −| π | Y B ∈ π ∂ | B | P Y ( y ) Q j ∈ B ∂ y j . (214) Now , le t us fi x ϕ > 1 and R Z > 0 an d app ly the bound fou nd in Le mma H.1 to ea ch factor ∂ | B | P Y ( y ) Q j ∈ B ∂ y j . Recalling that there are | π | of thes e factors, the bo und bec omes ∂ |I | log P Y ( y ) Q i ∈I ∂ y i ≤ X π | π | ! ( P Y ( y )) −| π | Y B ∈ π κ ( n, B , ϕ, R Z )( P Y ( y )) 1 /ϕ (215) = X π ξ ( n, π , ϕ, R Z )( P Y ( y )) | π | /ϕ −| π | (216) 6 Note that B is simply a set of indices. 30 where we have de fined ξ ( n, π , ϕ, R Z ) = | π | ! Q B ∈ π κ ( n, B , ϕ, R Z ) Next, we prese nt the lemma which is key in the p roof of the propos ition that contains the integral iden tities. Lemma H.3: Assume Y = X + Z is a n n -dimens ional random vector , where X is a rbitraril y distributed a nd Z is dis trib uted following a zero-mea n Gau ssian distrib ution with covariance R Z . Let us c onsider a se t ω whose elements I are sets of ind ices ran ging from 1 to n . Then , given φ > 0 , it follows that lim | y k |→∞ ( P Y ( y )) φ Y I ∈ ω ∂ |I | log P Y ( y ) Q j ∈I ∂ y i j = 0 , (217) where k is an arbitrary index for the entries of vector y . Pr oof: Applying Lemma H.2 to e ach one of the individual factors inside the product in (217), yields, for any ϕ > 1 , tha t ( P Y ( y )) φ Y I ∈ ω ∂ |I | log P Y ( y ) Q j ∈I ∂ y i j ≤ ( P Y ( y )) φ Y I ∈ ω X π ( I ) ξ ( n, π ( I ) , ϕ, R Z )( P Y ( y )) | π ( I ) | /ϕ −| π ( I ) | (218) = Y I ∈ ω X π ( I ) ξ ( n, π ( I ) , ϕ, R Z )( P Y ( y )) | π ( I ) | /ϕ −| π ( I ) | + φ/ | ω | , (219) where we have mad e explicit the d ependen ce of the partit ion π on the current value o f the s et o f indices I . No w w e consider a g eneric term ξ ( n, π ( I ) , ϕ, R Z )( P Y ( y )) | π ( I ) | /ϕ −| π ( I ) | + φ/ | ω | and we no te that for all φ > 0 , there exists a value ϕ max ( φ, | π ( I ) | , | ω | ) > 1 such that the expo nent | π ( I ) | /ϕ − | π ( I ) | + φ/ | ω | is pos iti ve for a ny ϕ ins ide the interval ϕ ∈ (1 , ϕ max ( φ, | π ( I ) | , | ω | )) . Note that, if φ > | π || ω | , then ϕ max ( φ, | π ( I ) | , | ω | ) = ∞ . 
Next, simply taking ϕ min ( φ, | ω | ) = min π ϕ max ( φ, | π ( I ) | , | ω | ) , (220) which fulfills that ϕ min ( φ, | ω | ) > 1 , we h av e tha t, for all ϕ inside the non-empty interval (1 , ϕ min ( φ, | ω | )) , all the exponents of P Y ( y ) in (219) a re positiv e. Since w e have that lim | y k |→∞ P Y ( y ) = 0 a nd the produc t and sum hav e a finite nu mber of factors and terms, res pectiv ely , we rea dily ob tain the resu lt of the lemma. W ith this las t lemma at hand we are ready to p rove the following proposition, which is the main purpose of this append ix. Pr oposition H.4: Assume Y = X + Z is an n -dimensional rand om vector , where X is arbitrarily distributed and Z is distributed following a zero-mea n Gauss ian distribution with covariance R Z . Th en the following integral identities hold Z ∂ 2 P Y ( y ) ∂ y k ∂ y l ∂ 2 log P Y ( y ) ∂ y i ∂ y j d y = − Z ∂ P Y ( y ) ∂ y l ∂ 3 log P Y ( y ) ∂ y i ∂ y j ∂ y k d y , (221) Z ∂ P Y ( y ) ∂ y l ∂ 3 log P Y ( y ) ∂ y i ∂ y j ∂ y k d y = − Z P Y ( y ) ∂ 4 log P Y ( y ) ∂ y i ∂ y j ∂ y k ∂ y l d y , (222) Pr oof: Th e p roof is based in integrating by parts the left hand side of (221)-(222) and showing that the re is a term tha t vanishes. Integrating by parts the left hand side of (221) we obtain Z ∂ 2 P Y ( y ) ∂ y k ∂ y l ∂ 2 log P Y ( y ) ∂ y i ∂ y j d y = ∂ P Y ( y ) ∂ y l ∂ 2 log P Y ( y ) ∂ y i ∂ y j y k = ∞ y k = −∞ − Z ∂ P Y ( y ) ∂ y l ∂ 3 log P Y ( y ) ∂ y i ∂ y j ∂ y k d y . (223) Casting Lemma H.1 with I = { l } to bound the first factor in the term inside the ev alua tion limits in last equation yields ∂ P Y ( y ) ∂ y l ∂ 2 log P Y ( y ) ∂ y i ∂ y j ≤ κ ( n, { l } , ϕ, R Z ) ( P Y ( y )) 1 /ϕ ∂ 2 log P Y ( y ) ∂ y i ∂ y j . (224) According to Lemma H.3 with φ = 1 /ϕ a nd ω = {{ i, j }} , the right han d side of last equation vanishes in the limi t as | y k | → ∞ , which implies the identity in (221). Repeating the procedure for (222), the res ulting term whe n integrating b y parts is P Y ( y ) ∂ 3 log P Y ( y ) ∂ y i ∂ y j ∂ y k y l = ∞ y l = −∞ , (225) 31 which is e asily sh own that it vanishes applying Lemma H.3 with φ = 1 a nd ω = { { i, j, k }} . A simple corollary res ults from Propo sition H.4. Cor ollary H.5: Using that ∂ P Y ( y ) ∂ y l = P Y ( y ) ∂ log P Y ( y ) ∂ y l in the left ha nd side of (222), it readily follows that Z P Y ( y ) ∂ log P Y ( y ) ∂ y l ∂ 3 log P Y ( y ) ∂ y i ∂ y j ∂ y k d y = − Z P Y ( y ) ∂ 4 log P Y ( y ) ∂ y i ∂ y j ∂ y k ∂ y l d y , (226) E ∂ log P Y ( Y ) ∂ y l ∂ 3 log P Y ( Y ) ∂ y i ∂ y j ∂ y k = − E ∂ 4 log P Y ( Y ) ∂ y i ∂ y j ∂ y k ∂ y l , (227) which is a higher-dimensional version of the regularity co ndition E ∂ log P Y ( Y ) ∂ y l ∂ log P Y ( Y ) ∂ y k = − E ∂ 2 log P Y ( Y ) ∂ y k ∂ y l , (228) which is u sed for examp le in the deriv ation of the CRLB in [14 ]. I. Inte gral identities in volving functions of E { S | y } . Similarly as in the p re vious appendix, the integral identities presented in this se ction are deri ved throug h a lemma which lea ds to the main propo sition containing the identities. Lemma I.1: Ass ume Y = G S + Z is a n n -dimensiona l random vector , where G is a d eterministic matrix, S is a n arbitrarily d istrib uted rando m vector , and Z is distributed following a zero-mea n Gaussian distrib ution with covari an ce R Z . Consider a set of M functions f i ( S ) , which have polyno mial depen dence on the elements of S . 
Then, given ϕ > 1, there exists a finite positive constant κ not depending on y such that

| P_Y(y) ∏_{i=1}^{M} E{ f_i(S) | y } | ≤ κ(n, {f_i}, ϕ, R_Z) ( P_Y(y) )^{(M − ϕ(M−1))/ϕ}.   (229)

Proof: The proof follows by first noticing that

| E{ f_i(S) | y } | ≤ (1/P_Y(y)) ∫ |f_i(s)| P_{Y|S}(y|s) P_S(s) ds = (1/P_Y(y)) ∫ |f_i(s)| P_Z(y − Gs) P_S(s) ds,

and then using Hölder's inequality with 1/ϕ + 1/ψ = 1, in an analogous way as we have done in (210), to obtain

∫ |f_i(s)| P_Z(y − Gs) P_S(s) ds = ∫ ( P_S(s) P_Z(y − Gs) )^{1/ϕ} ( P_S(s) P_Z(y − Gs) |f_i(s)|^ψ )^{1/ψ} ds
                                 ≤ ( P_Y(y) )^{1/ϕ} ( ∫ P_S(s) P_Z(y − Gs) |f_i(s)|^ψ ds )^{1/ψ}   (230)
                                 ≤ ξ(n, f_i, ϕ, R_Z) ( P_Y(y) )^{1/ϕ},   (231)

where the last inequality follows from the fact that P_Z(y − Gs) |f_i(s)|^ψ is bounded above by the constant ξ(n, f_i, ϕ, R_Z), not depending on y, because f_i(s) is a polynomial in the entries of s. Considering the product P_Y(y) ∏_{i=1}^{M} E{ f_i(S) | y }, the result of the lemma follows by noting that the new constant becomes κ(n, {f_i}, ϕ, R_Z) = ∏_{i=1}^{M} ξ(n, f_i, ϕ, R_Z).

Proposition I.2: Assume Y = GS + Z is an n-dimensional random vector, where G is a deterministic matrix, S is an arbitrarily distributed random vector, and Z is distributed following a zero-mean Gaussian distribution with covariance R_Z. Then, the following integral identities hold:

∫ ∂( P_Y(y) E{S_i S_l | y} )/∂y_k · E{S_j | y} dy = − ∫ P_Y(y) E{S_i S_l | y} ∂E{S_j | y}/∂y_k dy   (232)
− ∫ ∂( P_Y(y) E{S_l | y} )/∂y_k · E{S_i | y} E{S_j | y} dy = ∫ P_Y(y) E{S_l | y} ∂( E{S_i | y} E{S_j | y} )/∂y_k dy.

Proof: Integrating by parts the left-hand side of (232), we have

∫ ∂( P_Y(y) E{S_i S_l | y} )/∂y_k · E{S_j | y} dy = [ P_Y(y) E{S_i S_l | y} E{S_j | y} ]_{y_k = −∞}^{y_k = ∞} − ∫ P_Y(y) E{S_i S_l | y} ∂E{S_j | y}/∂y_k dy.   (233)

Using Lemma I.1 with M = 2, f_1(S) = S_i S_l, and f_2(S) = S_j, we have that

| P_Y(y) E{S_i S_l | y} E{S_j | y} | ≤ κ(n, {f_i}, ϕ, R_Z) ( P_Y(y) )^{(2−ϕ)/ϕ}.   (234)

Now, choosing 1 < ϕ < 2, it is easy to see that the first term in the right-hand side of (233) vanishes, as lim_{|y_k|→∞} P_Y(y) = 0. Proceeding similarly with the second integral identity, with M = 3, f_1(S) = S_l, f_2(S) = S_i, and f_3(S) = S_j, and choosing 1 < ϕ < 3/2, the result in the proposition follows.

REFERENCES

[1] D. P. Palomar and S. Verdú, "Gradient of mutual information in linear vector Gaussian channels," IEEE Trans. on Information Theory, vol. 52, no. 1, pp. 141–154, 2006.
[2] A. J. Stam, "Some inequalities satisfied by the quantities of information of Fisher and Shannon," Inform. and Control, vol. 2, pp. 101–112, 1959.
[3] T. E. Duncan, "On the calculation of mutual information," SIAM Journal on Applied Math., vol. 19, pp. 215–220, July 1970.
[4] T. T. Kadota, M. Zakai, and J. Ziv, "Mutual information of the white Gaussian channel with and without feedback," IEEE Trans. on Information Theory, vol. 17, pp. 368–371, July 1971.
[5] D. Guo, S. Shamai, and S. Verdú, "Mutual information and minimum mean-square error in Gaussian channels," IEEE Trans. on Information Theory, vol. 51, no. 4, pp. 1261–1282, 2005.
[6] M. Zakai, "On mutual information, likelihood ratios, and estimation error for the additive Gaussian channel," IEEE Trans. on Information Theory, vol. 51, no. 9, pp. 3017–3024, Sep. 2005.
[7] D. P. Palomar and S. Verdú, "Representation of mutual information via input estimates," IEEE Trans. on Information Theory, vol. 53, no. 2, pp. 453–470, Feb. 2007.
[8] D. Guo, S. Shamai, and S. Verdú, "Estimation in Gaussian noise: Properties of the minimum mean-square error," submitted to IEEE Trans. on Information Theory, 2008.
[9] ——, "Estimation of non-Gaussian random variables in Gaussian noise: Properties of the MMSE," in IEEE International Symposium on Information Theory (ISIT'08), 2008, pp. 1083–1087.
[10] M. H. M. Costa, "A new entropy power inequality," IEEE Trans. on Information Theory, vol. 31, no. 6, pp. 751–760, 1985.
[11] R. Tandon and S. Ulukus, "Dependence balance based outer bounds for Gaussian networks with cooperation and feedback," submitted to IEEE Trans. on Information Theory, 2008.
[12] T. M. Cover and J. A. Thomas, Elements of Information Theory. New York: John Wiley & Sons, 1991.
[13] A. Dembo, T. M. Cover, and J. A. Thomas, "Information theoretic inequalities," IEEE Trans. on Information Theory, vol. 37, pp. 1501–1518, 1991.
[14] S. M. Kay, Fundamentals of Statistical Signal Processing: Estimation Theory. Englewood Cliffs, NJ: Prentice-Hall, 1993.
[15] J. Magnus and H. Neudecker, Matrix Differential Calculus with Applications in Statistics and Econometrics, 3rd ed. New York: Wiley, 2007.
[16] V. Prelov and S. Verdú, "Second-order asymptotics of mutual information," IEEE Trans. on Information Theory, vol. 50, no. 8, pp. 1567–1580, Aug. 2004.
[17] A. Lozano, A. Tulino, and S. Verdú, "Optimum power allocation for parallel Gaussian channels with arbitrary input distributions," IEEE Trans. on Information Theory, vol. 52, no. 7, pp. 3033–3051, July 2006.
[18] C. E. Shannon, "A mathematical theory of communication," Bell Syst. Tech. J., vol. 27, no. 3, pp. 379–423, 1948.
[19] A. Dembo, "Simple proof of the concavity of the entropy power with respect to added Gaussian noise," IEEE Trans. on Information Theory, vol. 35, no. 4, pp. 887–888, 1989.
[20] C. Villani, "A short proof of the 'concavity of entropy power'," IEEE Trans. on Information Theory, vol. 46, no. 4, pp. 1695–1696, 2000.
[21] D. Guo, S. Shamai, and S. Verdú, "Proof of entropy power inequalities via MMSE," in Proc. IEEE International Symposium on Information Theory (ISIT'06), July 2006.
[22] O. Rioul, "Information theoretic proofs of entropy power inequalities," arXiv:0704.1751v1 [cs.IT], 2007.
[23] M. Payaró and D. P. Palomar, "A multivariate generalization of Costa's entropy power inequality," in Proc. IEEE Intl. Symp. on Information Theory (ISIT'08), Toronto, Canada, July 2008, pp. 1088–1092.
[24] F. D. Neeser and J. L. Massey, "Proper complex random processes with applications to information theory," IEEE Trans. on Information Theory, vol. 39, no. 4, pp. 1293–1302, Jul. 1993.
[25] D. H. Brandwood, "A complex gradient operator and its application in adaptive array theory," Proc. Inst. Elec. Eng., vol. 130, pp. 11–16, 1983.
[26] R. A. Horn and C. R. Johnson, Topics in Matrix Analysis. Cambridge University Press, 1991.
[27] D. S. Bernstein, Matrix Mathematics. Princeton University Press, 2005.
[28] R. Bhatia, Positive Definite Matrices. Princeton University Press, 2007.
[29] R. A. Horn and C. R. Johnson, Matrix Analysis. Cambridge University Press, 1985.
[30] C. R. Johnson and H. M. Shapiro, "Mathematical aspects of the relative gain array A ∘ A^{−T}," SIAM Journal on Algebraic and Discrete Methods, vol. 7, p. 627, 1986.
[31] G. P. H. Styan, "Hadamard products and multivariate statistical analysis," Linear Algebra Appl., vol. 6, pp. 217–240, 1973.
[32] G. Toscani, "Entropy production and the rate of convergence to equilibrium for the Fokker-Planck equation," Quart. of Applied Math., vol. 57, no. 3, pp. 521–541, Sep. 1999.
[33] G. B. Arfken and H.-J. Weber, Mathematical Methods for Physicists, 6th ed. Amsterdam: Elsevier, 2005.
[34] M. Hardy, "Combinatorics of partial derivatives," The Electronic Journal of Combinatorics, vol. 13, 2006.