Inference with minimal Gibbs free energy in information field theory


Authors: Torsten A. Ensslin, Cornelius Weig

Max-Planck-Institut für Astrophysik, Karl-Schwarzschild-Str. 1, 85741 Garching, Germany
(Dated: October 23, 2018)

Non-linear and non-Gaussian signal inference problems are difficult to tackle. Renormalization techniques permit us to construct good estimators for the posterior signal mean within information field theory (IFT), but the approximations and assumptions made are not very obvious. Here we introduce the simple concept of minimal Gibbs free energy to IFT, and show that previous renormalization results emerge naturally. They can be understood as being the Gaussian approximation to the full posterior probability which has maximal cross information with it. We derive optimized estimators for three applications, to illustrate the usage of the framework: (i) reconstruction of a log-normal signal from Poissonian data with background counts and point spread function, as it is needed for gamma-ray astronomy and for cosmography using photometric galaxy redshifts, (ii) inference of a Gaussian signal with unknown spectrum, and (iii) inference of a Poissonian log-normal signal with unknown spectrum, the combination of (i) and (ii). Finally, we explain how Gaussian knowledge states constructed by the minimal Gibbs free energy principle at different temperatures can be combined into a more accurate surrogate of the non-Gaussian posterior.

I. INTRODUCTION

A. Abstract inference problem

Measurements provide information on the signals we are interested in, encoded in the delivered data. How can this information be best retrieved? Is there a generic and simple principle from which optimal data analysis strategies derive? Can an information energy be constructed which, if minimized, provides us with the correct knowledge state given the data and prior information?
And if this exists, how can this information ground state be found, at least approximately?

An information energy to be minimized would be very useful to have, since many of the existing minimization techniques, analytical and numerical, can then be applied to it. A number of such functions to be extremized in order to solve inference problems were proposed in the literature, like the likelihood, the posterior, or the entropy. The likelihood is the probability that the data has resulted from some signal. The posterior is the reverse: it is the probability that, given the data, some signal was the origin of it. Extremizing either of them certainly makes sense, but often ignores the presence of slightly less probable, but much more numerous possibilities in the signal phase space. Those have a much larger entropy and are therefore favored by maximum entropy methods. However, maximum entropy alone cannot be the inference-determining criterion, since it favors states of complete lack of knowledge, irrespective of the data. Thus some counteracting energy is required which provides the right amount of force to the inference solution. Here, we argue that the ideal information energy is provided by the Gibbs free energy, which combines both the maximum entropy and the maximum a posteriori (MAP) principles.

The Gibbs free energy has to be regarded as a functional over the space of possible probability density functions (PDFs) of the signal given the data. The result of the minimization is therefore a PDF itself, and not a single signal estimate. Minimizing the Gibbs free energy maximizes the entropy within the constraints given by the internal energy. The latter is understood as the average of the negative logarithm of the joint probability function of signal and data, weighted with the PDF.
The usage of thermodynamical concepts for inference problems is not new, see e.g. [1, 2]. What is new here is that we develop this for signals which are fields, spatially distributed quantities with an infinite number of degrees of freedom, while using an approximate Gaussian ansatz for the PDF to be inferred. We thereby connect information field theory (IFT) [3-10], as a statistical field theory dealing with a huge number of microscopic degrees of freedom, to thermodynamics, as a means to generate simplified, but macroscopic descriptions of our knowledge. Thereby we find that former IFT results obtained with complex renormalization schemes in [9, 11] can easily be reproduced, and even be extended to more complicated measurement situations.

In the remainder of Sect. I we briefly introduce IFT, MAP, and maximum entropy. This motivates the minimal Gibbs free energy principle, which we formally derive in Sect. II, where we also show its equivalence to maximal cross information. The application of this principle to optimize approximations of the posterior of concrete inference problems is provided in Sect. III. There, the log-normal Poisson problem (Sect. III A) and the problem of reconstruction without a known signal power spectrum (Sect. III B), as well as their combination (Sect. III C), are addressed. Finally, we show how approximate posteriors obtained at different temperatures can be combined into a better posterior surrogate in Sect. IV, before we conclude in Sect. V.

B. Information field theory

Information theory describes knowledge states with probabilities.
If Ω is the complete set of possibilities, and A ⊂ Ω is a subset, then P(A) ∈ [0, 1] describes the plausibility of A being the case, with P(A) = 1 denoting A being assumed to be sure, P(A) = 0 denoting A being (assumed to be) impossible, and 0 < P(A) < 1 describing uncertainty about the truth of A. Obviously P(Ω) = 1 and P(∅) = 0. The usual rules of probability theory apply, and generalize the binary logic of Aristotle to different degrees of certainty or uncertainty [12, 13].

In case the set of possibilities is a continuum, it makes sense to introduce a PDF P(ψ) over Ω, so that P(A) = ∫_A dψ P(ψ). Each possible state ψ can be a multi-component vector, containing all aspects of reality which are in the focus of our inference problem. We might be interested in a sub-aspect of ψ which we call our signal, s = s(ψ). The induced signal PDF is retrieved from a functional or path integral over all the phase space of the possibilities of ψ via P(s) = ∫ Dψ P(ψ) δ(s - s(ψ)). If s is a field, a function over a physical space V, then s = (s_x)_{x∈V} might be a vector in the Hilbert space Ω of all L²-integrable functions over V, and P(s) is then a probability density functional. Information theory for s becomes IFT, which is a statistical field theory.

Inference on the signal s from data d is done from the posterior probability P(s|d), which can be constructed from the joint PDF of signal and data, P(d, s), via

$$P(s|d) = \frac{P(d,s)}{P(d)} = \left.\frac{e^{-\beta H(d,s)}}{Z_\beta}\right|_{\beta=1}, \qquad (1)$$

where P(d, s) = ∫_Ω Dψ P(d|ψ) δ(s - s(ψ)) P(ψ) = P(d|s) P(s) and P(d) = ∫ Ds P(d, s). The second equality in (1) is just a renaming of the numerator and denominator of the first fraction, which highlights the connection to statistical mechanics.
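In a discrete toy setting the identity (1) is elementary but worth stating in code form; the 3×4 joint probability table below is an arbitrary illustrative assumption:

```python
import numpy as np

# Discrete sanity check of Eq. (1): the posterior P(s|d) = P(d,s)/P(d)
# coincides with exp(-H(d,s))/Z for H = -log P(d,s), Eq. (2).
rng = np.random.default_rng(0)
joint = rng.random((3, 4))
joint /= joint.sum()                      # joint PDF P(d, s) on a 3 x 4 grid
d = 1                                     # the observed datum (row index)

posterior_bayes = joint[d] / joint[d].sum()
H = -np.log(joint)                        # information Hamiltonian
posterior_gibbs = np.exp(-H[d]) / np.exp(-H[d]).sum()
print(posterior_bayes, posterior_gibbs)   # identical up to rounding
```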
Thus we define the information Hamiltonian

$$H(d, s) = -\log P(d, s), \qquad (2)$$

the partition function including a moment generating source term J,

$$Z_\beta(d, J) = \int \mathcal{D}s\; e^{-\beta \left(H(d,s) + J^\dagger s\right)}, \qquad (3)$$

and the inverse temperature β = 1/T as usual in statistical mechanics. Here s† is the transposed and complex conjugated signal vector s, leading to a scalar product j†s = ∫_V dx j̄_x s_x. The ad-hoc notion of temperature is as in standard simulated-annealing practice. It permits us to narrow (for T < 1) or widen (for T > 1) the explored phase space region with respect to the one of the joint PDF, and is therefore a useful auxiliary parameter. We show in Sect. II A that the well-known thermodynamical equipartition theorem holds:

$$\langle H(s,d)\rangle_{(s|d)} - H(m,d) \approx \tfrac{1}{2}\, N_{\mathrm{dgf}}\, T, \qquad (4)$$

where N_dgf is the number of degrees of freedom and m is the mean signal field as defined below in (5), which defines the ground state energy. This relation can e.g. be used to check the correctness of an implementation of a signal phase-space sampling algorithm.

C. Maximum a posteriori

The first guess for a suitable energy to be minimized to obtain the information state might be the Hamiltonian. Minimizing the Hamiltonian with respect to s, while keeping d at its observed values, is equivalent to maximizing the joint probability P(d, s) and also the posterior P(s|d). The classical field emerging from this is called the MAP signal reconstruction in signal processing. For a detailed discussion of the usage of the MAP principle in IFT see [10]. The MAP field is often a very good approximation of the mean field

$$m = \langle s \rangle_{(s|d)} \equiv \int \mathcal{D}s\; s\, P(s|d), \qquad (5)$$

which is the optimal estimator of the signal in a statistical L² error norm sense:

$$m = \underset{\tilde{s}}{\operatorname{argmin}} \left\langle \int_V dx\, (s_x - \tilde{s}_x)^2 \right\rangle_{(s|d)}. \qquad (6)$$
The MAP estimator, on the other hand, can be shown to optimize the statistical L⁰ norm¹, the result of which may strongly deviate from the mean m if the posterior is highly asymmetric around its maximum. Thus we can regard the MAP estimator as a good reference point, but not as the solution we are seeking in general. It is, however, accurate (in the L² error norm sense) in case the posterior around its maximum is close to a Gaussian. In this case, the MAP field can easily be augmented with some uncertainty information from the Hessian of the Hamiltonian,

$$H = \left.\frac{\delta^2 H(d,s)}{\delta s\, \delta s^\dagger}\right|_{s=m}, \qquad (7)$$

as an approximation of the two-point function of the signal uncertainty,

$$D \equiv \left\langle (s-m)(s-m)^\dagger \right\rangle_{(s|d)}. \qquad (8)$$

Thus we set D ≈ H⁻¹ in

$$P(s|d) \approx \tilde{P}(s|d) = \mathcal{G}(s-m, D), \qquad (9)$$

where we introduced the Gaussian

$$\mathcal{G}(\phi, D) \equiv \frac{1}{|2\pi D|^{1/2}}\; e^{-\frac{1}{2}\phi^\dagger D^{-1}\phi}. \qquad (10)$$

Unfortunately, the MAP estimator can perform suboptimally in cases where the Gaussian approximation does not hold, see e.g. [11].

¹ The L⁰ norm measures the amount of exact agreement via ||f||₀ = lim_{ε→0} (1/ε) ∫ dx θ(f²(x) - ε²), with θ denoting the Heaviside function.

D. Maximum Entropy

1. Image entropy

Another quantity often extremized in image reconstruction problems is the so-called image entropy (iE) [14-25]. In classical maximum (image) entropy (MiE here, usually ME) methods, the iE is defined for a strictly positive signal via

$$S_{\mathrm{iE}}(s) = -\int_V dx\; s_x \log(s_x/\tilde{s}_x) \equiv -s^\dagger \log(s/\tilde{s}), \qquad (11)$$

where s̃_x is the reference image, which is used to model some prior information. In the second equality we have defined the component-wise application of functions on fields, e.g. (f(s))_x = f(s_x), which we use throughout this work. We note that the iE is actually not a physical entropy.
Usually its usage is argued for by ad hoc assumptions on the distribution entropy of photon packages in the image plane, rather than being a well-motivated description of the signal prior knowledge (or lack thereof). In the following we will reveal the implicitly assumed prior of MiE methods.

The data enter the MiE method in the form of an image energy, which is ideally chosen to be the negative log-likelihood,

$$E(d|s) = -\log P(d|s), \qquad (12)$$

in order to ensure the best imprint of the data on the reconstruction. The entropy is then maximized under the energy constraint by minimizing

$$E_{\mathrm{iE}}(d, s) = E(d|s) - T\, S_{\mathrm{iE}}(s) \qquad (13)$$

with respect to s. Here T is some adjustable temperature-like parameter, permitting us to choose the relative weight of image entropy and image energy. Low temperature means that the MiE map follows the data closely; high temperature means that the map space wants to be more uniformly occupied by the signal reconstruction.

The prior information on the signal, P(s), does not enter the MiE formalism explicitly. Actually, an implicit prior can be identified, assuming that MiE is actually a MAP principle. In that case the implicitly assumed Hamiltonian is H_iE(d, s) ≅ E_iE(d, s), where ≅ denotes equality up to an irrelevant, since s-independent, additive constant, and we find

$$P_{\mathrm{iE}}(s) \propto e^{T S_{\mathrm{iE}}(s)} \propto \prod_x \left(\frac{s_x}{\tilde{s}_x}\right)^{-T s_x}. \qquad (14)$$

This is not a general prior, but a very specific PDF. Although there is some flexibility to adapt its functional form by choosing s̃, T, and the image space (pixel space, Fourier space, wavelet space, etc.) in which (11) holds, P_iE(s) cannot be regarded as being generic. The MiE prior strongly suppresses large values in the MiE map.
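The suppression of concentrated flux by the implicit prior (14) can be made concrete with a small numerical comparison; the pixel values and the flat reference image below are arbitrary illustrative choices:

```python
import numpy as np

def image_entropy(s, s_ref):
    """Image entropy S_iE(s) = -sum_x s_x log(s_x / s_ref_x), Eq. (11)."""
    return -np.sum(s * np.log(s / s_ref))

s_ref = np.ones(10)                       # flat reference image
peaked = np.array([10.0] + [1e-12] * 9)   # all flux in a single pixel
spread = np.ones(10)                      # same total flux, spread out

S_peaked = image_entropy(peaked, s_ref)   # close to -10 log 10
S_spread = image_entropy(spread, s_ref)   # zero
print(S_peaked, S_spread)
```

Since the implicit prior weight is e^{T S_iE}, the spread configuration is favored over the peaked one by an enormous factor at T = 1, which is the origin of the blurring discussed next.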
If a data feature can either be explained by a single map pixel exhibiting a peak value, or by several pixels dividing that value among themselves, MiE will usually prefer the second option, leading to blurred reconstructed images. We conclude that the term maximum entropy commonly used in image reconstruction is very misleading. A more accurate term would be minimal dynamical range, since the implicitly assumed prior states that pixels carrying a larger than average signal s_x are extremely unlikely.

2. Physical entropy

A physical entropy should measure the distribution spread of a PDF using an integral over its phase space. In fact, the latter is given by the Boltzmann entropy, given by the negative Shannon information,

$$S_{\mathrm{B}} = -\int \mathcal{D}s\; P(s|d) \log P(s|d), \qquad (15)$$

which is a functional of the signal posterior, S_B = S_B[P(s|d)], and not of the signal map. Inserting (1) yields

$$S_{\mathrm{B}} = \langle H(d,s)\rangle_{(s|d)} + \log Z_1(d, 0) = U - F, \qquad (16)$$

where we introduced the internal energy U = ⟨H(d,s)⟩_(s|d) and the Helmholtz free energy F = F₁(d, 0) with

$$F_\beta(d, J) = -\frac{1}{\beta}\, \log Z_\beta(d, J). \qquad (17)$$

The fully J-dependent Helmholtz free energy provides the field expectation value via

$$m = \langle s \rangle_{(s|d)} = \left.\frac{\partial F_\beta(d,J)}{\partial J}\right|_{\beta=1,\; J=0}. \qquad (18)$$

The entropy is also given in terms of the free energy via

$$S_{\mathrm{B}} = \left.\frac{\partial F_\beta(d,J)}{\partial \beta}\right|_{\beta=1,\; J=0}. \qquad (19)$$

The entropy as well as the free energy are functionals of the posterior and not of the signal. Maximizing or minimizing them does not provide a signal estimator, but singles out a PDF. If we restrict the space of PDFs to the ones we can handle analytically, namely Gaussians as given in (9) and (10), we might obtain a suitable approximation scheme to the full field-theoretical inference problem.
Maximizing the entropy alone does not lead to a suitable algorithm, since the maximal entropy state is that of complete lack of knowledge, with a uniform probability for every signal possibility. The internal energy, however, favors knowledge states close to the posterior maximum and would return the MAP solution if extremized alone. Thus the right combination of entropy and internal energy is to be extremized. We would expect a free energy of the form U - T S_B to be this function, in analogy to the energy (13) used in MiE methods. Thermodynamics teaches us that the Gibbs free energy is the quantity to be minimized (which is identical to the Helmholtz free energy in case J = 0). Since we are going to calculate this for an approximation of the real PDF, it is necessary to go through the derivation in order to make sure we do this in the right fashion and understand all implications.

II. THERMODYNAMICAL INFERENCE

A. Tempered posterior

In order to take full advantage of the existing thermodynamical machinery, we want to construct the Gibbs free energy for information problems. To this end, we introduce a temperature and a source function into the PDF of the signal posterior, as suggested by the definition of the partition function (3), by defining

$$P(s|d, T, J) = \frac{e^{-\beta \left(H(s,d) + J^\dagger s\right)}}{Z_\beta(d, J)} = \frac{\left(P(d,s)\, e^{-J^\dagger s}\right)^\beta}{\int \mathcal{D}s'\, \left(P(d,s')\, e^{-J^\dagger s'}\right)^\beta}. \qquad (20)$$

With the temperature we can broaden (for T > 1) or narrow (for T < 1) the posterior. Three temperature values are of special importance, namely T = 0, which modifies the PDF into a delta peak located at the posterior maximum, T = 1, which returns the original posterior, and T = ∞, leading to the maximum entropy state of a uniform PDF. The source function J permits us to shift the mean of the PDF to any possible signal configuration m = m(d, T, J).
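The effect of the temperature can be illustrated in one dimension by tempering a posterior on a grid; the anharmonic Hamiltonian below is an arbitrary choice for this sketch:

```python
import numpy as np

# Tempered 1-d posterior P(s|d,T) ~ exp(-H(s)/T): lowering T narrows the
# PDF around its maximum, raising T broadens it toward uniformity.
s = np.linspace(-8.0, 8.0, 8001)
ds = s[1] - s[0]
H = 0.5 * s**2 + 0.05 * s**4           # illustrative anharmonic Hamiltonian

def tempered_variance(T):
    """Variance of the tempered posterior at temperature T (grid quadrature)."""
    w = np.exp(-H / T)
    w /= w.sum() * ds                  # normalize the PDF on the grid
    mean = (s * w).sum() * ds
    return ((s - mean) ** 2 * w).sum() * ds

print([round(tempered_variance(T), 3) for T in (0.5, 1.0, 2.0)])
```

The variances come out strictly increasing in T, in line with the delta-peak limit T → 0 and the maximum entropy limit T → ∞.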
The modified PDF will be approximated by a Gaussian with identical mean and variance:

$$P(s|d, T, J) \approx \mathcal{G}(s-m, D) = \tilde{P}(s|m, D), \qquad (21)$$

where also D = D(d, T, J). We will see that the width D of this Gaussian approximation of the PDF increases with increasing temperature. At low temperature (T ≪ 1) the center of the PDF is probed and modeled, while at large temperatures (T ≫ 1) the focus is on its asymptotic tails. Since the Gaussian in (21) is an approximation, it is not even guaranteed that T = 1 provides the best recipe for signal reconstruction. E.g. in [9] a case is shown where signal reconstruction using T = 0.5 slightly outperforms both T = 0 and T = 1. Since working at multiple temperatures can reveal different aspects of the same non-Gaussian PDF (i.e. its central or asymptotic behavior), the question appears how the differently retrieved Gaussian approximations can be combined into a single and more accurate representation of the original PDF. This will be addressed in Sect. IV. For the moment we approximate our posterior by a single Gaussian as in (21).

In this case, the partition function can be calculated explicitly and reads

$$\tilde{Z}_\beta(d, J) = \left|2\pi \frac{D}{\beta}\right|^{1/2} \exp\left[\beta\left(\frac{J^\dagger D J}{2} - J^\dagger m - H(m, d)\right)\right].$$

With the standard thermodynamics procedure we calculate

$$\langle H \rangle_{(s|d)} \approx -\left.\frac{\delta}{\delta\beta}\, \log \tilde{Z}_\beta(d, J)\right|_{J=0} = \frac{N_{\mathrm{dgf}}}{2}\, T + H(m, d), \qquad (22)$$

where N_dgf is the dimension of the signal vector. This result is the re-phrased equipartition theorem (4) from classical thermodynamics and further motivates the notion of temperature in IFT.

B. Internal, Helmholtz and Gibbs energy

The next step is to calculate the Helmholtz free energy.
In case it can be calculated explicitly from (17), the inference problem is basically solved, since any (connected) moment of the signal posterior can be calculated directly from it by taking derivatives with respect to the moment generating function J, see e.g. (18). This will, however, only be the case for a very restricted class of Hamiltonians, like the free ones, which are only quadratic in s. In the more interesting case that the Helmholtz free energy cannot be calculated explicitly, we can use the thermodynamical relation of the Helmholtz free energy with the internal energy and entropy.

First, we note that the internal energy of the modified posterior is given by

$$U(d, T, J) = \langle H(s,d)\rangle_{(s|d,T,J)} \approx \langle H(s,d)\rangle_{(s|m,D)} = \tilde{U}(d, m, D), \qquad (23)$$

where m and D are still functions of d, T, and J. The average in the second step has to be understood to be performed over a Gaussian with mean m and dispersion D: ⟨f(s)⟩_(s|m,D) = ∫ Ds f(s) G(s-m, D).

Further, we need to calculate the entropy for the modified PDF, which for a Gaussian depends only on D:

$$S_{\mathrm{B}}[\mathcal{G}(s-m, D)] = \frac{1}{2}\, \mathrm{Tr}\left[1 + \log(2\pi D)\right] = \tilde{S}_{\mathrm{B}}(D). \qquad (24)$$

For the full modified posterior (20), the entropy is calculated via (15) to be

$$S_{\mathrm{B}} = \beta\left(U + J^\dagger m - F\right), \qquad (25)$$

where m = m(d, T, J) = ⟨s⟩_(s|d,T,J), U is given by (23), and F by (17). Solving (25) for the Helmholtz free energy yields

$$F_\beta(d, J) = U - T S_{\mathrm{B}} + J^\dagger m. \qquad (26)$$

This expresses the Helmholtz free energy in terms of internal energy and entropy. Unfortunately, this expression contains the term J†m, where m depends on J implicitly through (18). In order to get rid of this term, we Legendre transform with respect to J and thereby use (18), which provides us with the Gibbs free energy

$$G_\beta(d, m) = F - J^\dagger \frac{\delta F}{\delta J} = U - T S_{\mathrm{B}}. \qquad (27)$$

The Gibbs energy depends solely on m and not on J.
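For a one-dimensional Gaussian, the closed form (24) can be checked against a Monte Carlo estimate of the defining integral (15); the dispersion value below is an arbitrary choice for this sanity-check sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
D = 0.8                                    # arbitrary 1-d dispersion
samples = rng.normal(0.0, np.sqrt(D), size=1_000_000)

# S_B = -<log P> estimated from samples of P = G(s, D), Eq. (15) ...
log_p = -0.5 * samples**2 / D - 0.5 * np.log(2.0 * np.pi * D)
S_monte_carlo = -np.mean(log_p)

# ... versus the closed form S_B(D) = (1/2) Tr[1 + log(2 pi D)], Eq. (24)
S_closed = 0.5 * (1.0 + np.log(2.0 * np.pi * D))
print(S_monte_carlo, S_closed)
```

Both the entropy (24) and the internal energy (23) are thus directly computable for a Gaussian surrogate, which is what makes the Gibbs free energy (27) a practical objective.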
It can be constructed approximately in case approximations of the internal energy and the entropy are available. For our Gaussian approximation of the modified posterior we therefore write

$$\tilde{G}_\beta(d, m, D) = \tilde{U}(d, m, D) - T\, \tilde{S}_{\mathrm{B}}(D). \qquad (28)$$

We know from thermodynamics that the minimum of the Gibbs free energy with respect to variations in m provides the expectation value ⟨s⟩_(s|d) of our field:

$$\left.\frac{\delta G(d, m, D)}{\delta m}\right|_{m = \langle s\rangle_{(s|d)}} = 0. \qquad (29)$$

Thus, the Gibbs energy is the information energy we were looking for in the introduction. Minimizing the Gibbs free energy for a Gaussian PDF with respect to m yields

$$0 = \frac{\delta \tilde{G}}{\delta m} = \int \mathcal{D}s\; H(d, s)\, \frac{\delta \mathcal{G}(s-m, D)}{\delta m} = -D^{-1} \left\langle \phi\, H_m(d, \phi) \right\rangle_{(\phi|D)}, \qquad (30)$$

with H_m(d, φ) = H(d, m + φ), which implies

$$m = \frac{\langle s\, H(d,s)\rangle_{(s|m,D)}}{\langle H(d,s)\rangle_{(s|m,D)}} = \frac{\langle s\, H(d,s)\rangle_{(s|m,D)}}{\tilde{U}(m, D)}. \qquad (31)$$

The optimal map is therefore the first signal moment of the full Hamiltonian weighted with the approximating Gaussian.

Thermodynamics teaches us further that the propagator, the uncertainty dispersion of the field, is provided by the second derivative of the Gibbs free energy around this location, thanks to the well-known relation

$$\left.\left(\frac{\delta^2 G}{\delta m\, \delta m^\dagger}\right)^{-1}\right|_{m=\langle s\rangle_{(s|d)}} = -\left.\frac{\delta^2 F}{\delta J\, \delta J^\dagger}\right|_{J=0} = \beta D. \qquad (32)$$

This relation closes the set of equations by providing D. Evaluating (32) with our approximate Gibbs energy (28) and using (29) yields

$$T D^{-1} = \left.\frac{\delta^2 \tilde{G}}{\delta m\, \delta m^\dagger}\right|_{m=\langle s\rangle_{(s|d)}} = -D^{-1}\, \tilde{U}(d, m, D) + D^{-1} \left\langle \phi\, \phi^\dagger H_m(d, \phi) \right\rangle_{(\phi|D)} D^{-1}.$$

Thus the propagator is the second moment of the Gaussian-weighted Hamiltonian,

$$D = \frac{\left\langle \phi\, \phi^\dagger H_m(\phi) \right\rangle_{(\phi|D)}}{\tilde{U}(d, m, D) + T}. \qquad (33)$$

This equation seems to suggest that the propagator evaluated at higher temperature is narrower, since T appears in the denominator.
However, the opposite is the case, due to the presence of D in all terms, as a test with a free Hamiltonian will show in (44).

C. Cross information

The Gibbs free energy at T = 1 is directly related to the cross information between the posterior and its Gaussian approximation. The cross information (or negative relative entropy) of a PDF P̃ with respect to another one P is measured by the so-called Kullback-Leibler divergence [26]:

$$d_{\mathrm{KL}}[\tilde{P}, P] = \int \mathcal{D}s\; \tilde{P}(s|d) \log\left(\frac{\tilde{P}(s|d)}{P(s|d)}\right). \qquad (34)$$

The Kullback-Leibler divergence characterizes the distance between a surrogate and a target PDF in an information-theoretical sense. It is an asymmetric distance measure, reflecting that the roles of the two involved PDFs differ. The equivalence of Gibbs free energy and cross information with respect to inference problems can easily be seen:

$$\tilde{G}(m, D) = \left\langle H(d,s) + \log \mathcal{G}(s-m, D) \right\rangle_{(s|m,D)} = \int \mathcal{D}s\; \mathcal{G}(s-m, D) \log\left(\frac{\mathcal{G}(s-m, D)}{P(s, d)}\right) \cong \int \mathcal{D}s\; \mathcal{G}(s-m, D) \log\left(\frac{\mathcal{G}(s-m, D)}{P(s|d)}\right) = d_{\mathrm{KL}}[\tilde{P}, P]. \qquad (35)$$

In the second-to-last step we added the term log P(d), which is irrelevant here since it is m- and D-independent, and in the last step we introduced the Kullback-Leibler divergence between the posterior P(s|d) and its Gaussian surrogate P̃(s|d) = G(s-m, D). Minimal Gibbs free energy therefore seems to correspond to minimal Kullback-Leibler divergence, and therefore to maximal cross information of the surrogate with the exact posterior.

However, we have so far only minimized the Gibbs free energy with respect to m, the mean field degrees of freedom of our Gaussian, not with respect to the ones parameterizing the uncertainty dispersion D. We have determined the latter using the thermodynamical relation (32).
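A one-dimensional sketch of this correspondence: fitting a Gaussian surrogate to a non-Gaussian posterior by directly minimizing the Kullback-Leibler divergence (34) on a grid. The anharmonic Hamiltonian and the brute-force search grids are arbitrary illustrative choices:

```python
import numpy as np

s = np.linspace(-6.0, 6.0, 4001)
ds = s[1] - s[0]
H = 0.5 * s**2 + 0.3 * s**3 + 0.1 * s**4   # skewed, non-Gaussian Hamiltonian
P = np.exp(-H)
P /= P.sum() * ds                           # exact posterior on the grid

def kl(m, D):
    """d_KL between the Gaussian surrogate G(s - m, D) and P, Eq. (34)."""
    G = np.exp(-0.5 * (s - m) ** 2 / D) / np.sqrt(2.0 * np.pi * D)
    return np.sum(G * np.log((G + 1e-300) / (P + 1e-300))) * ds

# Brute-force search over the Gaussian's two degrees of freedom (m, D)
grid = [(kl(m, D), m, D)
        for m in np.linspace(-2.0, 1.0, 61)
        for D in np.linspace(0.1, 2.0, 39)]
kl_min, m_opt, D_opt = min(grid)
print(m_opt, D_opt, kl_min)
```

The optimal mean comes out negative, tracking the skew of the posterior; minimizing over D as well as m is exactly the maximal cross information principle discussed next.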
If we want our surrogate PDF to have maximal cross information with the posterior with respect to all degrees of freedom of our Gaussian, we also have to minimize the Gibbs energy with respect to D. A short calculation shows that this actually yields a result which is equivalent to the thermodynamical relation (32):

$$0 = \frac{\delta \tilde{G}}{\delta D} = \int \mathcal{D}\phi\; H_m(d, \phi)\, \frac{\delta \mathcal{G}(\phi, D)}{\delta D} - T\, \frac{\delta \tilde{S}_{\mathrm{B}}(D)}{\delta D} = \frac{D^{-1}}{2} \left[ \left\langle \phi\, \phi^\dagger H_m(d, \phi) \right\rangle_{(\phi|D)} - D \left( \tilde{U}(m, D) + T \right) \right] D^{-1},$$

from which (33) also follows. Thus, we can regard both the map m and its uncertainty covariance D as parameters for which the Gibbs energy should be minimized. We will refer to this as the maximal cross information principle.

We further note that the maximal cross information principle also holds if the Gaussian is replaced by some other model function, G[P̃(s|d)] ≅ d_KL[P̃, P], a property we will use later in Sect. IV. Note that the minimal cross information and the thermodynamical relations yield exactly the same results for m and D only if G̃(m, D) is calculated exactly. In case there are approximations involved, the resulting algorithms differ slightly, and this difference can be used to monitor the impact of the approximations made. In the following, we use the minimal cross information principle for our examples.

D. Calculating the internal energy

In order to calculate the approximate Gibbs energy, we need to estimate the internal energy, for which we have to specify the exact Hamiltonian. We assume that it can be Taylor-Fréchet expanded as

$$H(d, s) = \sum_{n=0}^{\infty} \frac{1}{n!}\, \underbrace{\Lambda^{(n)}_{x_1 \ldots x_n}\, s_{x_1} \cdots s_{x_n}}_{\Lambda^{(n)}(s, \ldots, s)}, \qquad (36)$$

where repeated coordinates are thought to be integrated or summed over. The approximate internal energy is then

$$\tilde{U}(m, D) = U[\tilde{P}(s|d)] = \int \mathcal{D}s\; H(d, s)\, \tilde{P}(s|d) = \sum_{n=0}^{\infty} \frac{1}{n!} \left\langle \Lambda^{(n)}(s, \ldots, s) \right\rangle_{(s|m,D)}. \qquad (37)$$
The Gaussian n-point correlation functions in this equation can actually be calculated analytically. For this, we again use the shifted field φ = s - m, which has the Hamiltonian

$$H_m(d, \phi) = \sum_{n=0}^{\infty} \frac{1}{n!}\, \Lambda^{(n)}_m(\phi, \ldots, \phi), \quad \text{with} \qquad (38)$$

$$\Lambda^{(n)}_m(\phi, \ldots, \phi) = \sum_{k=0}^{\infty} \frac{1}{k!}\, \Lambda^{(n+k)}(\underbrace{\phi, \ldots, \phi}_{n}, \underbrace{m, \ldots, m}_{k}).$$

We assume that the interaction coefficients Λ^(n)_{x₁...x_n} are symmetric with respect to index permutations, since they resulted from a Taylor-Fréchet expansion. The internal energy can then be calculated via the Wick theorem and the fact that all odd moments of φ vanish:

$$\tilde{U}(m, D) = \sum_{n=0}^{\infty} \frac{1}{n!} \left\langle \Lambda^{(n)}_m(\phi, \ldots, \phi) \right\rangle_{(\phi|D)} = \sum_{n=0}^{\infty} \frac{1}{2^n\, n!}\, \Lambda^{(2n)}_m(\overbrace{D \otimes \cdots \otimes D}^{n}) = \sum_{n,k=0}^{\infty} \frac{\Lambda^{(2n+k)}(\overbrace{D \otimes \cdots \otimes D}^{n} \otimes \overbrace{m \otimes \cdots \otimes m}^{k})}{2^n\, n!\, k!}. \qquad (39)$$

Here we defined the symmetrized tensor product (T ⊗ T')_{x₁...x_n} ≡ (1/n!) Σ_{π ∈ S_n} T_{x_{π(1)}...x_{π(k)}} T'_{x_{π(k+1)}...x_{π(n)}} by averaging over all permutations in S_n, the symmetric group. Having obtained the internal energy with (39), and the entropy with (24), approximately, we can construct the Gibbs free energy according to (28), which we use for our inference.

E. Minimizing

In order to get our optimal Gaussian approximation to the posterior, we have to minimize G̃_β(m, D) with respect to m and D. Minimizing for m is equivalent to minimizing the internal energy, since the entropy does not depend on m. This yields

$$0 = \frac{\delta \tilde{U}(m, D)}{\delta m} = \sum_{n,k=0}^{\infty} \frac{\Lambda^{(2n+k+1)}(\overbrace{D \otimes \cdots \otimes D}^{n} \otimes \overbrace{m \otimes \cdots \otimes m}^{k},\, \cdot\,)}{2^n\, n!\, k!}, \qquad (40)$$

which has to be solved for m for any given D. The propagator derives from (32) or from

$$0 = \frac{\delta \tilde{G}(m, D)}{\delta D} \;\Rightarrow\; T D^{-1} = \sum_{n,k=0}^{\infty} \frac{\Lambda^{(2n+k+2)}(\,\cdot\,,\, \cdot\,,\, \overbrace{D \otimes \cdots \otimes D}^{n} \otimes \overbrace{m \otimes \cdots \otimes m}^{k})}{2^n\, n!\, k!}, \qquad (41)$$

which also depends on m. Thus, (40) and (41) have to be solved simultaneously.
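The Wick-theorem ingredients used above, vanishing odd moments and pairing-counted even moments, can be sanity-checked by Monte Carlo in one dimension:

```python
import numpy as np

# For a zero-mean Gaussian phi with propagator D: <phi^3> = 0 and
# <phi^4> = 3 D^2 (three Wick pairings of the two-point function).
rng = np.random.default_rng(0)
D = 1.7
phi = rng.normal(0.0, np.sqrt(D), size=2_000_000)
m3 = np.mean(phi**3)
m4 = np.mean(phi**4)
print(m3, m4, 3.0 * D**2)
```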
A simple example should be in order. The simplest case is that of the original Hamiltonian being quadratic. The approximated one should then match it exactly. A quadratic or free Hamiltonian is equivalent to a Gaussian posterior, P(s|d) = G(s - m_*, D_*). We get

$$H(d, s) \cong \frac{1}{2}\, (s - m_*)^\dagger D_*^{-1} (s - m_*) \cong \Lambda^{(1)}_x s_x + \frac{1}{2}\, \Lambda^{(2)}_{xy}\, s_x s_y, \quad \text{with} \qquad (42)$$

$$\Lambda^{(1)} = -D_*^{-1} m_*, \quad \text{and} \quad \Lambda^{(2)} = D_*^{-1}.$$

Inserting this into (40) and (41) yields

$$0 = \Lambda^{(1)}(\cdot) + \Lambda^{(2)}(m, \cdot) = D_*^{-1}(m - m_*) \;\Rightarrow\; m = m_*, \qquad (43)$$

$$T D^{-1} = \Lambda^{(2)}(\cdot, \cdot) = D_*^{-1} \;\Rightarrow\; D = T D_*, \qquad (44)$$

which indeed recovers the original coefficients for T = 1, and a narrower or wider uncertainty dispersion for T < 1 or T > 1, respectively. In the following, we will see that also in the case of interacting Hamiltonians the minimal free energy principle provides the correct results. We show this by reproducing (and extending) signal estimators derived previously in IFT using renormalization techniques.

III. APPLICATION EXAMPLES

A. Poissonian log-normal data

1. Separable case

Many inference problems have to deal with Poissonian noise, like X-ray and γ-ray astronomy as well as the reconstruction of the cosmic large-scale structure from galaxy counts. Let us assume that the mean count rate λ of photons or galaxies is proportional to an exponentiated Gaussian random field s with covariance S = ⟨s s†⟩_(s) according to

$$\lambda(s) = \kappa\, e^{b s}. \qquad (45)$$

Here, κ is the expected count rate for s = 0, which may depend on the spatial position. The scalar b permits us to conveniently change the strength of the non-linearity of the problem without changing the signal statistics. This log-normal model for the cosmic large-scale structure as an approximate description is actually supported observationally [27, 28] and theoretically [29-34].
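A log-normal field is strongly asymmetric pointwise, which is the regime where MAP estimate and posterior mean separate (cf. Sect. I C). A one-pixel reminder, for s ~ N(0, σ²) with unit κ and b:

```python
import numpy as np

# For lambda = exp(s) with s ~ N(0, sigma^2):
#   mode of lambda = exp(-sigma^2)      (what a MAP search locks onto)
#   mean of lambda = exp(+sigma^2 / 2)  (the L2-optimal estimate)
sigma = 1.5
mode = np.exp(-sigma**2)
mean = np.exp(0.5 * sigma**2)
print(mode, mean, mean / mode)   # the ratio is exp(1.5 sigma^2)
```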
As a starting point, we assume a local response, so that the Poisson statistics for the actual counts d_x at location x are

$$P(d_x|\lambda_x) = \frac{\lambda_x^{d_x}}{d_x!}\; e^{-\lambda_x}, \qquad (46)$$

and the full likelihood is well separable into local ones:

$$P(d|s) = \prod_x P(d_x|\lambda_x(s_x)). \qquad (47)$$

The corresponding Hamiltonian was shown in [9] to be

$$H(d, s) \cong \frac{1}{2}\, s^\dagger S^{-1} s - d^\dagger b\, s + \kappa^\dagger e^{b s}. \qquad (48)$$

Reconstruction methods for this data model were developed by [9, 35-37]. The internal energy of our Gaussian approximation can be calculated analytically,

$$\tilde{U}(m, D) \cong \frac{1}{2}\, m^\dagger S^{-1} m + \frac{1}{2}\, \mathrm{Tr}(D\, S^{-1}) - d^\dagger b\, m + \kappa^\dagger e^{b m + \frac{b^2}{2} \widehat{D}}, \qquad (49)$$

where D̂ denotes the vector of diagonal elements of D. Minimizing G̃(m, D) = Ũ(m, D) - T S̃_B(D) with respect to m and D yields

$$m = S\, b \left( d - \kappa_{m + \frac{b}{2} \widehat{D}} \right), \quad \text{and} \quad D = T \left( S^{-1} + b^2\, \widehat{\kappa_{m + \frac{b}{2} \widehat{D}}} \right)^{-1}, \qquad (50)$$

respectively. Here we have defined κ_t = κ exp(b t) and denote a diagonal matrix by putting a hat onto the vector of its diagonal elements, (λ̂)_{xy} = λ_x δ_{xy}. This result is identical to the one found in [9] using a lengthy renormalization calculation. There it was found by numerical experiment that using T = 0.5 in (50) seems to produce slightly better results than T = 0 and T = 1.
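A minimal numerical sketch of the separable estimator (50), iterating the two coupled equations to a fixed point. Everything below (pixel number, diagonal prior covariance, unit κ, the damping factor, the synthetic data) is an illustrative assumption, not part of the original derivation:

```python
import numpy as np

rng = np.random.default_rng(0)
npix, b, T = 8, 1.0, 1.0
S = 0.1 * np.eye(npix)                 # toy diagonal prior signal covariance
kappa = np.ones(npix)

# Synthetic log-normal Poisson data, Eqs. (45)-(47)
s_true = rng.multivariate_normal(np.zeros(npix), S)
d = rng.poisson(kappa * np.exp(b * s_true)).astype(float)

# Damped fixed-point iteration of Eq. (50)
m = np.zeros(npix)
D = S.copy()
for _ in range(500):
    lam = kappa * np.exp(b * m + 0.5 * b**2 * np.diag(D))   # kappa_{m + b D/2}
    D = T * np.linalg.inv(np.linalg.inv(S) + b**2 * np.diag(lam))
    m_new = S @ (b * (d - lam))
    if np.max(np.abs(m_new - m)) < 1e-12:
        break
    m = 0.5 * (m + m_new)              # damping stabilizes the iteration
print(m)
```

At the fixed point the posterior dispersion D comes out narrower than the prior S, as the Poissonian data add information in every pixel.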
In the following, we generalize to the case of a known and fixed, but non-local measurement response. Fixed means that the response is independent of the signal. This excludes the treatment of galaxy redshift distortions (e.g. see [38] for this), but still includes photometric redshift errors of galaxy catalogs as well as X- and γ-ray telescope data. Such problems have been approached in the past via the MAP principle [39–42]. The point spread function is modeled by the response matrix R = (R_{ix}), which describes how emissivity at location x is expected to be observed in data channel i. The expected count rate is now

\lambda(s) = R \, e^{bs},   (51)

and the likelihood does not separate any more with respect to x,

P(d|s) = \prod_i P(d_i | \lambda_i(s)),   (52)

since \lambda_i(s) entangles the signal from several locations, whereas in (47) it depends only on the local signal value. We recover the former case for a diagonal response R_{ix} = \kappa_x \delta_{ix}. The resulting Hamiltonian

H(d,s) \simeq \frac{1}{2} s^\dagger S^{-1} s + 1^\dagger R \, e^{bs} - d^\dagger \log(R \, e^{bs})   (53)

reduces to (48) for R being diagonal. The internal energy of our surrogate Gaussian \tilde{P}(s|d) = \mathcal{G}(s - m, D) is then

\tilde{U}(m,D) = \frac{1}{2} m^\dagger S^{-1} m + \frac{1}{2} \mathrm{Tr}(D S^{-1}) + 1^\dagger R \, e^{b m + \frac{b^2}{2} \hat{D}} - \sum_i d_i \underbrace{\int \mathcal{D}\varphi \, \log\!\left( R_i^\dagger e^{b(m+\varphi)} \right) \mathcal{G}(\varphi, D)}_{I_i}.   (54)

The integral I_i cannot be calculated in closed form due to the logarithm in the integrand. We expand the logarithm around R_i^\dagger e^{bm}, since we will see that this recovers the result of the separable case most easily for R being diagonal. We get

I_i = \log\!\left( R_i^\dagger e^{bm} \right) + \left\langle \log\!\left( \frac{R_i^\dagger e^{b(m+\varphi)}}{R_i^\dagger e^{bm}} \right) \right\rangle_{(\varphi|D)}.   (55)

In case R is diagonal, the first term reduces to b m_i + \log R_{ii}, the second vanishes since \langle \log(\exp(b\varphi)) \rangle_{(\varphi|D)} = \langle b\varphi \rangle_{(\varphi|D)} = 0, and we recover the Hamiltonian of the separable case.
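The stated reduction of (53) to (48) for a diagonal response holds up to the s-independent constant -d^\dagger \log \kappa, which is absorbed into the posterior normalization. This can be checked numerically; a small sketch with toy numbers of our own choosing:

```python
import numpy as np

rng = np.random.default_rng(1)
n, b = 8, 1.0
kappa = rng.uniform(0.5, 2.0, n)
S_inv = np.eye(n)   # any prior covariance works; identity keeps the check simple
d = rng.poisson(2.0, n).astype(float)

def H_separable(s):   # eq. (48)
    return 0.5 * s @ S_inv @ s - d @ (b * s) + kappa @ np.exp(b * s)

R = np.diag(kappa)    # diagonal response R_ix = kappa_x delta_ix

def H_entangled(s):   # eq. (53)
    lam = R @ np.exp(b * s)
    return 0.5 * s @ S_inv @ s + lam.sum() - d @ np.log(lam)

# the two Hamiltonians differ only by the s-independent constant -d.log(kappa)
s1, s2 = rng.standard_normal(n), rng.standard_normal(n)
diff1 = H_entangled(s1) - H_separable(s1)
diff2 = H_entangled(s2) - H_separable(s2)
```

Since additive constants in the Hamiltonian drop out of the posterior, both forms describe the same inference problem.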
In the general case of an entangling response we Taylor expand the logarithm of the second term,

I_i = \log\!\left( R_i^\dagger e^{bm} \right) - \sum_{n=1}^{\infty} \frac{(-1)^n}{n} \underbrace{\left\langle \left( r_i^\dagger \left( e^{b\varphi} - 1 \right) \right)^n \right\rangle_{(\varphi|D)}}_{II_{in}},  with   (56)

r_i = \frac{R_i \, e^{bm}}{R_i^\dagger e^{bm}}  or  r_i(x) = \frac{R_i(x) \, e^{b m(x)}}{\int dx' \, R_i(x') \, e^{b m(x')}}.

We note that r_i^\dagger 1 = \int dx \, r_{ix} = 1 by construction. The expansion coefficients II_{in} can be worked out one by one. We provide here the first few, namely

II_{i1} = r_i^\dagger e^{\frac{1}{2} b^2 \hat{D}} - 1,

II_{i2} = r_{ix} \, r_{iy} \, e^{\frac{1}{2} b^2 (D_{xx} + D_{yy} + 2 D_{xy})} - 2 \, r_i^\dagger e^{\frac{1}{2} b^2 \hat{D}} + 1,

II_{i3} = r_{ix} \, r_{iy} \, r_{iz} \exp\!\left( \frac{b^2}{2} \sum_{a,b \in \{x,y,z\}} D_{ab} \right) - 3 \, r_{ix} \, r_{iy} \exp\!\left( \frac{b^2}{2} \sum_{a,b \in \{x,y\}} D_{ab} \right) + 3 \, r_i^\dagger e^{\frac{1}{2} b^2 \hat{D}} - 1,   (57)

where repeated position indices are integrated over. These coefficients stay small if b^2 D \ll 1, which means that the expansion can be truncated if the signal is known to within a few ten percent or if the non-Gaussianity is small. Large uncertainties in the signal strength do not necessarily lead to large coefficients if they are located at positions without instrumental sensitivity (R_{ix} small) or with much lower expected count rates (m_x small). In both cases mostly prior information and extrapolation from regions with more informative data will determine the solution at such locations.

In case some of these coefficients are large, substantial signal uncertainty at the locations to which they are sensitive must be present. In this case an accurate reconstruction for these locations cannot be expected. Thus, if we simplify the Hamiltonian by dropping such terms, even if they are relatively large, the quality of the reconstruction will not suffer too much, since only regions are affected which are poorly constrained by the data anyway. Therefore, truncating the expansion should already provide usable algorithms.

3.
Zeroth order solution

To zeroth order, we ignore all II_{in}-terms and find for the approximative free energy

\tilde{G}(m,D) \approx \frac{1}{2} m^\dagger S^{-1} m + \frac{1}{2} \mathrm{Tr}(D S^{-1}) + \sum_i \left[ R_i^\dagger e^{b m + \frac{b^2}{2} \hat{D}} - d_i \log\!\left( R_i^\dagger e^{bm} \right) \right] - \frac{T}{2} \mathrm{Tr}\!\left( 1 + \log(2\pi D) \right).   (58)

Minimizing this with respect to m and D yields

m = S b \sum_i R_i \, e^{bm} \left( \frac{d_i}{R_i^\dagger e^{bm}} - e^{\frac{1}{2} b^2 \hat{D}} \right) = S b \left( d^\dagger r - \kappa'(m + b\hat{D}/2) \right),  and

D = T \left( S^{-1} + b^2 \, \widehat{\kappa'(m + b\hat{D}/2)} \right)^{-1},  with  \kappa'(t) = \sum_i R_i \, e^{bt}.   (59)

This is very similar to (50) and reduces to it for a diagonal response.

4. First order correction

First-order corrections are included by keeping the II_{i1}-term in the approximative free energy, but ignoring higher terms. The resulting equations are

m = S b \left( \sum_i d_i \left( 1 + r_i^\dagger e^{\frac{b^2}{2} \hat{D}} \right) r_i - \kappa''(m + b\hat{D}/2) \right),

D = T \left( S^{-1} + b^2 \, \widehat{\kappa''(m + b\hat{D}/2)} \right)^{-1},  with

\kappa''(t) = \sum_i R_i \, e^{bt} \left( 1 + \frac{d_i}{R_i^\dagger e^{bm}} \right).   (60)

This is a slight modification with respect to (59) in two aspects. The map changes a bit, but the sign of the change depends on the details of the point spread function, since there are two new terms of similar order, but with opposite signs. The uncertainty variance is reduced, since the term added to the inverse propagator is always positive.

5. Observation with background

The observation may suffer from a background: events in data space which do not contribute to our signal knowledge. For example, γ-ray astronomy has to suppress cosmic ray events as much as possible, since charged particles do not point back to the same sources as neutral photons, due to cosmic magnetic fields. Fortunately, cosmic rays have different signatures in data space due to the differences between hadronic and electromagnetic interactions. However, not for all measured events is the distinction clear-cut, and we have to use prior knowledge to suppress the background events.
Therefore we should extend our formalism to also take such unwanted backgrounds into account. Actually, a reinterpretation of the above formulae will do. We extend our signal space by the quantity f determining the logarithm of the background count rate, s \to s' = (s, f). Here f_z might be a field over the same physical space as s_x, or just a single number such as a total isotropic cosmic ray flux. In any case, the x- and z-coordinates are regarded to be over different spaces, or distinct areas of the joint space over which f and s live. The joint covariance reads

S' = \begin{pmatrix} S & 0 \\ 0 & F \end{pmatrix}   (61)

due to the independence of signal and background. Here, F = \langle f f^\dagger \rangle_{(f)} is the log-background covariance. The response R \to R' has to be extended to map also the background space into the data space. Whether the response images of signal and background events in data space are well separated or whether they overlap decides about the background-discriminating power of the instrument.

The combined map and covariance of signal and log-background can now be obtained, e.g. from (59) or (60), with the appropriate replacements S, R, m, D \to S', R', m', D'. Our joint map can be split into a signal and a log-background part, m' = (\tilde{s}, \tilde{f}). Since we are usually not interested in the background properties, we marginalize over it. This is especially simple in the Gaussian approximation of our joint posterior P(s'|d) \approx \mathcal{G}(s' - m', D'), with s' = (s, f) and m' = (\tilde{s}, \tilde{f}):

m \approx \int \mathcal{D}s' \, s \, \mathcal{G}(s' - m', D') = \tilde{s},  and   (62)

D_{xy} \approx \int \mathcal{D}s' \, (s - \tilde{s})_x (s - \tilde{s})_y \, \mathcal{G}(s' - m', D') = D'_{xy}.

Although this does not look too different from the formulae for the case without background, the effect of the background entered through the joint covariance matrix D', which mixes the contributions from the signal and background events appropriately.

B.
Reconstruction without spectral knowledge

1. Effective theory

The reconstruction of the signal in the Poisson log-normal model in the previous section assumed that the signal covariance is known a priori. In case it is unknown, it has to be extracted from the same data used for the signal inference [43–47]. However, the optimal way to do this was usually not derived from first principles, maybe except in [48–50]. A rigorous approach to such problems is given by the computationally expensive Gibbs-sampling technique, which investigates the joint space of signal realizations and power spectra [51–54]; this joint distribution can then easily be marginalized over the power spectra to obtain a generic signal reconstruction. This problem was also addressed approximatively in [11] for the case of linear-response data from a Gaussian signal subject to Gaussian noise, using the MAP principle as well as parameter-uncertainty renormalized estimation. We re-address this problem here using the minimal free energy approach.

We assume the covariance S = \langle s s^\dagger \rangle_{(s)} of our Gaussian signal s to be diagonal within some known function basis O_{kx}, e.g. the Fourier basis with O_{kx} = e^{ikx}. We model the power spectrum (in this basis) as a linear combination of a number of positive basis functions f_i(k) with disjoint supports (the spectral bands), so that

P_s(k) = \sum_i p_i \, f_i(k)   (63)

is positive for all k (all coefficients of p = (p_i)_i are positive and the spectral bands cover the full k-space domain). We define

(S_i)_{xy} = (O^\dagger \hat{f}_i \, O)_{xy} = \overline{O}_{kx} \, f_i(k) \, O_{ky}   (64)

to be the i-th spectral band matrix and S_i^{-1} to be its pseudo-inverse. Thus, we write our signal covariance as

S = \sum_i p_i \, S_i,   (65)

with p = (p_i) the vector of unknown spectral parameters.
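In the basis where S is diagonal (e.g. Fourier space), with trivial response R = 1 and white noise, the band matrices S_i of (64)–(65) become simple projections onto the bands, and the zeroth-order fixed point derived in the following subsections, eqs. (73)–(75) (the "critical" estimator with Jeffreys prior, δ = 1, T = 1), can be iterated directly. A minimal sketch with our own toy numbers; with a non-trivial response or colored noise, M = R^† N^{-1} R is no longer diagonal and the traces must be evaluated with the full operators:

```python
import numpy as np

rng = np.random.default_rng(3)

# n spectral modes, grouped into bands of 16 modes; f_i(k) = 1 on band i,
# so S = sum_i p_i S_i is diagonal with P_s(k) = p_i for k in band i, eq. (63)
n, band_size = 64, 16
bands = [np.arange(i, i + band_size) for i in range(0, n, band_size)]
k = np.arange(n)
P_true = (1.0 + k / 8.0) ** -1        # toy spectrum, well above the noise
sigma = 0.1                            # white noise level (high signal-to-noise)

s_true = rng.standard_normal(n) * np.sqrt(P_true)
d = s_true + sigma * rng.standard_normal(n)   # d = R s + n with R = 1

p = np.ones(len(bands))                # initial band powers p_i
for _ in range(500):
    P = np.repeat(p, band_size)        # assemble spectrum from band powers
    D = 1.0 / (1.0 / sigma**2 + 1.0 / P)  # eq. (73), all operators diagonal here
    m = D * d / sigma**2               # eq. (73): m = D' j with j = d / sigma^2
    # eq. (75): p_i = Tr(B_i) / rho_i with B_i = (m m^dag + D) S_i^-1
    p = np.array([np.mean(m[idx] ** 2 + D[idx]) for idx in bands])

# at convergence the spectrum update (75) reproduces itself
p_new = np.array([np.mean(m[idx] ** 2 + D[idx]) for idx in bands])
```

The alternation of the Wiener-filter step and the band-power update is the practical core of spectrum-free reconstruction; the band powers p_i are auxiliary variables of the algorithm rather than optimal spectrum estimates.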
We further assume that the individual signal-band amplitudes p_i have independent prior distributions,

P(p) = \prod_i P(p_i),   (66)

with the individual priors being inverse-gamma distributions, power laws with an exponential low-amplitude cutoff at q_i:

P(p_i) = \frac{1}{q_i \, \Gamma(\alpha_i - 1)} \left( \frac{p_i}{q_i} \right)^{-\alpha_i} \exp\!\left( -\frac{q_i}{p_i} \right).   (67)

For \alpha_i \gg 1 this is an informative prior, where q_i/\alpha_i determines the preferred value. A non-informative prior would be given by Jeffreys prior with \alpha_i = 1 and q_i = 0. (Since this would result in an improperly normalized prior, we understand this as \alpha_i = 1 + \epsilon, q_i = \epsilon, with \lim_{\epsilon \to 0} taken at the end of the calculation.)

For a linear data model

d = R s + n,   (68)

with Gaussian noise with covariance N = \langle n n^\dagger \rangle_{(n)}, the parameter-marginalized effective Hamiltonian is, according to [11],

H(d,s) \simeq \frac{1}{2} s^\dagger M s - j^\dagger s + \sum_i \gamma_i \log\!\left( q_i + \frac{1}{2} s^\dagger S_i^{-1} s \right).   (69)

Here M = R^\dagger N^{-1} R, j = R^\dagger N^{-1} d, \gamma_i = \alpha_i - 1 + \varrho_i/2, and \varrho_i = \mathrm{Tr}[S_i^{-1} S_i] is the number of spectral degrees of freedom within band i.

2. Free energy expansion

The internal energy of a Gaussian posterior ansatz is then

\tilde{U}(m,D) \simeq \frac{1}{2} m^\dagger M m + \frac{1}{2} \mathrm{Tr}(D M) - j^\dagger m + \sum_i \gamma_i \underbrace{\left\langle \log\!\left( q_i + \frac{1}{2} s^\dagger S_i^{-1} s \right) \right\rangle_{(s|m,D)}}_{I_i}.   (70)

Again we have to deal with a Gaussian average over a logarithm, which we expand as

I_i = \log(\tilde{q}_i) - \sum_{k=1}^{\infty} \frac{(-1)^k}{k \, \tilde{q}_i^{\,k}} \underbrace{\left\langle \left( q_i + \frac{1}{2} s^\dagger S_i^{-1} s - \tilde{q}_i \right)^k \right\rangle_{(s|m,D)}}_{II_{ik}},  with   (71)

\tilde{q}_i = q_i + \frac{1}{2} \mathrm{Tr}\!\left( (m m^\dagger + \delta D) \, S_i^{-1} \right).

Here we have introduced a parameter δ, to be fixed soon. The first two expansion coefficients are

II_{i1} = \frac{1}{2} (1 - \delta) \, \mathrm{Tr}(D S_i^{-1}),

II_{i2} = II_{i1}^2 + \mathrm{Tr}\!\left( \left( m m^\dagger + \frac{1}{2} D \right) S_i^{-1} D S_i^{-1} \right).   (72)

3. Zeroth order solution

To zeroth order we find, by minimizing the free energy while ignoring the II-corrections,

m = D' j,  D = T D',  and  D' = \left( M + \sum_i p_i^{-1} S_i^{-1} \right)^{-1}.   (73)

This means that the map is the Wiener-filtered data, where the spectral coefficients are assumed to be

p_i = \frac{\tilde{q}_i}{\gamma_i \, \delta} = \frac{1}{\gamma_i \, \delta} \left( q_i + \frac{1}{2} \mathrm{Tr}\!\left( (m m^\dagger + \delta D) \, S_i^{-1} \right) \right).   (74)

For δ = 0 this yields p_i = \infty and therefore D = M^{-1} if M is (pseudo-)invertible. The resulting filter provides a noise-weighted deconvolution; however, it is unable to extrapolate into unobserved regions of the signal space. It is widely used for map making in the field of cosmic microwave background observations. For δ = 1 we recover the critical estimator of [11]. Since it was shown there that the latter performs significantly better than the former, and also since II_{i1} = 0 and II_{i2} is minimal for δ = 1, we adopt this value in the following. For Jeffreys prior we find

p_i = \frac{\mathrm{Tr}(B_i)}{\varrho_i},  with  B_i = (m m^\dagger + D) \, S_i^{-1}.   (75)

4. Second order correction

Including higher-order corrections should improve the reconstruction. The first-order corrections vanish for δ = 1. The second-order correction yields

m = D' j,

D = T \left[ D'^{-1} - \sum_i \frac{\gamma_i}{\tilde{q}_i^{\,2}} \, S_i^{-1} m m^\dagger S_i^{-1} \right]^{-1},

D' = \left[ M + \sum_i \frac{\gamma_i}{\tilde{q}_i} \, X_i \, S_i^{-1} \right]^{-1},   (76)

X_i = 1 + \frac{1}{\tilde{q}_i^{\,2}} \mathrm{Tr}\!\left( \left( m m^\dagger + \frac{1}{2} D \right) S_i^{-1} D S_i^{-1} \right) - \frac{1}{\tilde{q}_i} \, S_i^{-1} D.

The operator D', which is applied to j to generate the map, and the uncertainty dispersion D are not identical any more. Neither of them can still be expressed as (M + \sum_i p_i^{-1} S_i^{-1})^{-1}, due to the operator structure of the S_i^{-1} D and S_i^{-1} m m^\dagger terms. This was also found in [11]. However, if we can assume that this operator processes any channel in the i-th band in a similar way, we can replace S_i^{-1} D S_i^{-1} and S_i^{-1} m m^\dagger S_i^{-1} by their channel-averaged values \mathrm{Tr}(D S_i^{-1}) S_i^{-1}/\varrho_i and \mathrm{Tr}(m m^\dagger S_i^{-1}) S_i^{-1}/\varrho_i, respectively.
This permits us to identify the spectral coefficients of D = (M + \sum_i p_i^{-1} S_i^{-1})^{-1} and D' = (M + \sum_i p_i'^{-1} S_i^{-1})^{-1}. For Jeffreys prior they become

p_i = \frac{\mathrm{Tr}(B_i)}{\varrho_i} \left[ 1 - \frac{2}{\varrho_i} \left( \frac{\mathrm{Tr}(m m^\dagger S_i^{-1})}{\mathrm{Tr}(B_i)} \right)^2 \right]^{-1},  and

p_i' = \frac{\mathrm{Tr}(B_i)}{\varrho_i} \left[ 1 + \frac{2}{\varrho_i} \frac{\mathrm{Tr}(m m^\dagger S_i^{-1}) \, \mathrm{Tr}(D S_i^{-1})}{(\mathrm{Tr}(B_i))^2} \right]^{-1},   (77)

where m, D, and B_i = (m m^\dagger + D) S_i^{-1} all depend on p. It is obvious that the second-order correction increases p_i by some margin compared to (75), meaning that the reconstruction uncertainty increases. It is less obvious how p_i' develops, since at first glance it seems to be corrected downwards. Note, however, that an increased p_i implies an increased \mathrm{Tr}(B_i), since D grows (spectrally) with increasing p_i.

The fact that we get two differing sets of spectral coefficients, p_i and p_i', reminds us to regard them as auxiliary variables of our signal reconstruction algorithm, rather than as optimal spectrum estimates.

C. Poisson log-normal distribution with unknown spectrum

The combined problem, reconstructing a Poisson log-normal signal with unknown spectrum, can now be treated approximatively. The combined free energy for the Gaussian posterior approximation to zeroth order is

\tilde{G}(m,D) \approx \sum_i \left[ R_i^\dagger e^{b m + \frac{b^2}{2} \hat{D}} - d_i \log\!\left( R_i^\dagger e^{bm} \right) \right] + \sum_i \gamma_i \log\!\left( q_i + \frac{1}{2} \mathrm{Tr}\!\left( (m m^\dagger + D) S_i^{-1} \right) \right) - \frac{T}{2} \mathrm{Tr}\!\left( 1 + \log(2\pi D) \right).   (78)

The resulting map and uncertainty dispersion are provided by (59), with the addition that S = \sum_i p_i S_i and the p_i are provided by (74). Higher-order corrections can be included in a similar way as in the individual problems. Also background counts with known or unknown covariance structure can be included in the same way they were treated in Sect. III A 5.

IV. INFORMATION SYNTHESIS

A.
Multi-temperature posterior

Although the Gaussian knowledge states obtained from minimal free energy estimation are approximative and therefore of limited accuracy, they might permit us to construct more accurate models of the posterior. The idea is to combine several Gaussian distributions into a more accurate approximation of the true non-Gaussian posterior probability, and to measure the mean map and its uncertainty dispersion from this combination.

We recall that Gaussian approximations of the posterior obtained at low temperatures (T \ll 1) mostly carry information on its peak region, while those obtained at large temperatures (T \gg 1) carry information on its asymptotics. Also the canonical T = 1 case does not provide a perfect representation of the posterior, as a Gaussian approximation of a non-Gaussian PDF never can. However, by combining such different approximations in an appropriate way, we should obtain an improved representation of the correct PDF, which permits much easier calculation of moments like the signal mean and its uncertainty variance. To this end we postulate the existence of a temperature distribution function P(T), such that

P(s|d) = \int_0^\infty dT \, \mathcal{G}\!\left( s - m_{(d,T)}, \, D_{(d,T)} \right) P(T)   (79)

combines the different Gaussians with means m_{(d,T)} and dispersions D_{(d,T)} to synthesize the right posterior probability. A formal proof of the existence of P(T), and of the necessary conditions for it, is beyond the scope of this work. It should be noted that e.g. multi-peaked distributions cannot accurately be represented by approximate Gaussians obtained at different temperatures. They can, however, often be well approximated by Gaussians centered on those peaks. The recipes described below do not depend on the way the different Gaussians used in the mixture model were obtained, and can therefore also be used in such cases.
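A worked example where P(T) exists in closed form: for a quadratic Hamiltonian, (44) gives D_{(d,T)} = T D_*, so (79) becomes a Gaussian scale mixture. Choosing P(T) as an inverse-gamma distribution then yields a Student-t posterior, by a standard identity. The following 1-d numerical sketch checks this (the choice ν = 3, D_* = 1, m_* = 0 and the quadrature grid are our own; this illustrates the existence of a P(T) for one particular non-Gaussian target, not the paper's general construction):

```python
import math
import numpy as np

nu = 3.0  # degrees of freedom of the target Student-t "posterior"

def student_t_pdf(x, nu):
    c = math.gamma((nu + 1) / 2) / (math.gamma(nu / 2) * math.sqrt(nu * math.pi))
    return c * (1 + x**2 / nu) ** (-(nu + 1) / 2)

def inv_gamma_pdf(T, a, b):
    return b**a / math.gamma(a) * T ** (-a - 1) * np.exp(-b / T)

def gauss_pdf(x, var):
    return np.exp(-0.5 * x**2 / var) / np.sqrt(2 * np.pi * var)

# eq. (79) with m_(d,T) = 0 and D_(d,T) = T (i.e. D_* = 1, cf. eq. (44)):
#   P(s|d) = int dT  G(s, T) P(T),   P(T) = InvGamma(T; nu/2, nu/2)
T = np.logspace(-4, 4, 20000)   # quadrature grid in temperature
w = np.gradient(T)              # trapezoid-like quadrature weights
x = np.array([-3.0, -1.0, 0.0, 0.5, 2.0])
mixture = np.array([np.sum(gauss_pdf(xi, T) * inv_gamma_pdf(T, nu/2, nu/2) * w)
                    for xi in x])
target = student_t_pdf(x, nu)
err = np.max(np.abs(mixture - target))   # quadrature error only
```

The heavy tails of the Student-t are built from the high-temperature (wide) Gaussians, its peak from the low-temperature (narrow) ones, exactly the division of labor described above.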
In the following we provide a recipe to construct P(T) in practice. We assume that m_i = m_{(d,T_i)} and D_i = D_{(d,T_i)} have been computed for a number N_T of temperatures T_i. The temperatures are best chosen to sample well the different parts of the posterior: its peak by having some T_i \ll 1, the bulk of the PDF with T_i = 1, and the PDF tails with T_i \gg 1.

The surrogate probability function we want to construct, and which should resemble the exact one as closely as possible, is therefore of the form

\tilde{P}(s|d) = \sum_{i=1}^{N_T} \mathcal{G}(s - m_i, D_i) \, P_i.   (80)

\tilde{P}(s|d) should be as close as possible to P(s|d) in an information-theoretical sense. The natural choice for the distance measure is the Kullback-Leibler divergence, which measures the cross-information of \tilde{P}(s|d) on P(s|d), and which is practically identical to the free energy \tilde{G}[\tilde{P}(s|d)] of our surrogate posterior according to (34). Introducing un-normalized probabilities p_i as our degrees of freedom, and setting P_i = p_i/Z_p with Z_p = \sum_j p_j in order to enforce the proper normalization \sum_i P_i = 1, this reads

\tilde{G}(p) = \sum_i \frac{p_i}{Z_p} \left( U_i - \tilde{U}_i(p) \right) - F.   (81)

We have introduced the free energy F = -\log Z_d of the original problem, which is irrelevant here since it is p-independent, and the energies U_i and \tilde{U}_i(p) with respect to the template distributions \mathcal{G}_i(s) = \mathcal{G}(s - m_i, D_i):

U_i = \langle H(d,s) \rangle_{\mathcal{G}_i} = \int \mathcal{D}s \, \mathcal{G}_i(s) \, H(d,s)  and

\tilde{U}_i(p) = \left\langle \tilde{H}_p(s) \right\rangle_{\mathcal{G}_i},  with   (82)

\tilde{H}_p(s) = -\log\!\left( \sum_i p_i \, \mathcal{G}_i(s) / Z_p \right).

B. Minimizing the Gibbs energy

1. Analytical scheme

Now one has to minimize \tilde{G}(p) with respect to p. The problem of calculating the path integrals defining the energies was already addressed in this work.
A systematic way is to Taylor-Fréchet expand the Hamiltonians around the centers of the Gaussians m_i and then use the known moments of \mathcal{G}_i(s) to approximate the energies. For the surrogate energies this yields, up to second order in \varphi_i = s - m_i,

\tilde{U}_i(p) = -\log g_i + \frac{1}{2} \sum_j \frac{g_{ij}}{g_i} \mathrm{Tr}(D_j^{-1} D_i) + \frac{1}{2} \sum_{jk} \frac{g_{ij}}{g_i} \left( \frac{g_{ik}}{g_i} - \delta_{jk} \right) m_{ij}^\dagger D_j^{-1} D_i D_k^{-1} m_{ik},   (83)

with

g_{ij} = p_j \, \mathcal{G}_j(m_i)/Z_p,  g_i = \sum_{j=1}^{N_T} g_{ij},  and  m_{ij} = m_i - m_j.   (84)

2. Monte-Carlo scheme

Alternatively, one can approximate the average \langle X[s] \rangle_{\mathcal{G}_i} of a quantity X[s] by sums over N_i sampling points \{s_i^{(j)}\}_j, which can easily be drawn from \mathcal{G}_i(s):

\langle X[s] \rangle_{\mathcal{G}_i} \approx \frac{1}{N_i} \sum_j X[s_i^{(j)}].   (85)

This way, \tilde{G}(p) can be approximated and minimized with a suitable optimization scheme. The sampling points, their Gaussian probabilities G_{ik}^{(j)} = \mathcal{G}_k(s_i^{(j)}), as well as the energies U_i need only be calculated once, but the surrogate energies \tilde{U}_i(p) = \log Z_p - \frac{1}{N_i} \sum_j \log( \sum_k p_k \, G_{ik}^{(j)} ) have to be updated at every step of the scheme.

One might argue that if we use stochastic methods to build \tilde{P}(s|d), one could have used a Markov-Chain Monte-Carlo (MCMC) method right from the beginning for the signal inference problem. However, we expect that the posterior synthesis method described here should reproduce the correct posterior better than a sample point cloud, since we are using well-adapted Gaussians as our building blocks and not the delta functions that a direct MCMC approach uses. Furthermore, the analytical and sampling methods can be combined, in that the analytical estimates are supplemented by sampling estimates of the contributions of the neglected terms in the Taylor-Fréchet expansions of (83).
And finally, since our scheme draws samples from Gaussians, it can be trivially parallelized, which is not easily possible with MCMC schemes.

C. Maps and moments

Once the minimum of \tilde{G}(p) with respect to p is found, one has synthesized a posterior approximation with a Gaussian mixture model. From this, any moment of the distribution function can easily be calculated. The mean map can be expressed as

m \approx \langle s \rangle_{\tilde{P}(s)} = \sum_i P_i \, \langle s \rangle_{\mathcal{G}_i(s)} = \sum_i P_i \, m_i,   (86)

as well as the uncertainty dispersion as

D \approx \left\langle (s - m)(s - m)^\dagger \right\rangle_{\tilde{P}(s)} = \sum_i P_i \left( D_i + m_i m_i^\dagger \right) - m m^\dagger.   (87)

We leave the verification and application of the information synthesis method for future work.

V. CONCLUSIONS

We have shown that the minimal Gibbs free energy principle in information field theory can be used to obtain approximate knowledge states with maximal cross-information to the exact posterior. The construction of such knowledge states with Gaussian PDFs is relatively straightforward:

1. The joint PDF of signal and data, P(d,s), has to be specified, e.g. by specifying a data likelihood P(d|s) and signal prior P(s), and using P(d,s) = P(d|s) P(s).

2. The information Hamiltonian H is the negative logarithm of this, H(d,s) = -\log P(d,s).

3. A suitably parametrized PDF as a surrogate for the posterior has to be specified, e.g. a Gaussian with its mean and dispersion as degrees of freedom.

4. The internal energy U and entropy S_B of this PDF have to be calculated as the PDF-average of the Hamiltonian and of the negative log-PDF, respectively.

5. The Gibbs free energy, G = U - T S_B, has then to be minimized with respect to all degrees of freedom of the surrogate PDF.

6. Any statistical summary like mean and variance can now be extracted from the surrogate PDF.
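The moment extraction of step 6 is particularly simple for the Gaussian mixture surrogate of Sect. IV: eqs. (86) and (87) give the mean and dispersion in closed form. A small numerical sketch (the two component means, dispersions, and weights are arbitrary toy values of our own choosing), cross-checked against direct sampling:

```python
import numpy as np

rng = np.random.default_rng(2)

# two Gaussian mixture components (means m_i, dispersions D_i, weights P_i)
ms = [np.array([0.0, 0.0]), np.array([2.0, 1.0])]
Ds = [np.array([[1.0, 0.3], [0.3, 1.0]]), np.array([[0.5, 0.0], [0.0, 2.0]])]
Ps = [0.3, 0.7]

# eq. (86): mixture mean
m = sum(P * mi for P, mi in zip(Ps, ms))
# eq. (87): mixture uncertainty dispersion
D = sum(P * (Di + np.outer(mi, mi)) for P, mi, Di in zip(Ps, ms, Ds)) \
    - np.outer(m, m)

# Monte-Carlo cross-check by sampling directly from the mixture
N = 400_000
which = rng.choice(2, size=N, p=Ps)
samples = np.vstack([
    rng.multivariate_normal(ms[i], Ds[i], size=(which == i).sum())
    for i in range(2)
])
m_mc = samples.mean(axis=0)
D_mc = np.cov(samples.T)
```

Note that D picks up the spread of the component means through the np.outer terms: the mixture dispersion exceeds any weighted average of the individual D_i whenever the m_i differ.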
The minimal free energy principle is therefore well suited to tackle statistical inference problems. We have demonstrated this with two different problems and their combination: reconstructing a log-normal field from Poisson data subject to a point spread function, and reconstruction without prior knowledge of the signal power spectrum. Earlier results from renormalization calculations in [9, 11] have been reproduced. The renormalization schemes used there can therefore be understood as aiming for a surrogate Gaussian PDF which has maximal cross-information with the correct posterior. Since these results were previously shown to reconstruct well, the method proposed here for the more complicated combined case can also be expected to work. However, a detailed implementation and verification of this is left for future work.

Finally, we have sketched how Gaussian knowledge states obtained at different thermodynamical temperatures can be combined into a more accurate representation of the posterior, from which moments of the signal uncertainty distributions can easily be extracted.

The minimal Gibbs energy and maximal cross-information principle introduced here to IFT should allow the construction of novel reconstruction schemes for statistical inference problems on spatially distributed signals. The thermodynamical language may help to clarify concepts and to simplify applications of IFT, since it permits us to tackle non-linear inverse problems without the need to use diagrammatic perturbation theory and renormalization schemes.

Acknowledgments

We thank Mona Frommert, Jens Jasche, Niels Oppermann, Gerhard Börner, and three referees for discussions and comments on the manuscript.

[1] E. T. Jaynes, Physical Review 106, 620 (1957).
[2] E. T. Jaynes, Physical Review 108, 171 (1957).
[3] J. N. Fry, ApJ 289, 10 (1985).
[4] E. Bertschinger, ApJ 323, L103 (1987).
[5] W. Bialek and A. Zee, Physical Review Letters 58, 741 (1987).
[6] W. Bialek and A. Zee, Physical Review Letters 61, 1512 (1988).
[7] W. Bialek, C. G. Callan, and S. P. Strong, Physical Review Letters 77, 4693 (1996), arXiv:cond-mat/9607180.
[8] J. C. Lemm, ArXiv Physics e-prints (1999), physics/9912005.
[9] T. A. Enßlin, M. Frommert, and F. S. Kitaura, Phys. Rev. D 80, 105005 (2009), 0806.3474.
[10] J. C. Lemm, Bayesian Field Theory (Johns Hopkins University Press, 2003).
[11] T. A. Enßlin and M. Frommert, ArXiv e-prints (2010), 1002.2928.
[12] R. T. Cox, American Journal of Physics 14, 1 (1946).
[13] R. T. Cox, American Journal of Physics 31, 66 (1963).
[14] S. F. Gull and G. J. Daniell, Nature 272, 686 (1978).
[15] J. Skilling, A. W. Strong, and K. Bennett, MNRAS 187, 145 (1979).
[16] R. K. Bryan and J. Skilling, MNRAS 191, 69 (1980).
[17] S. F. Burch, S. F. Gull, and J. Skilling, Computer Vision Graphics and Image Processing 23, 113 (1983).
[18] S. F. Gull and J. Skilling, in Indirect Imaging. Measurement and Processing for Indirect Imaging (1983), p. 267.
[19] S. Sibisi, J. Skilling, R. G. Brereton, E. D. Laue, and J. Staunton, Nature 311, 446 (1984).
[20] D. M. Titterington and J. Skilling, Nature 312, 381 (1984).
[21] J. Skilling and R. K. Bryan, MNRAS 211, 111 (1984).
[22] R. K. Bryan and J. Skilling, Journal of Modern Optics 33, 287 (1986).
[23] S. F. Gull, in Maximum Entropy and Bayesian Methods, edited by J. Skilling (Kluwer Academic Publishers, Dordrecht, 1989), pp. 53–71.
[24] S. F. Gull and J. Skilling, The MEMSYS5 User's Manual (Maximum Entropy Data Consultants Ltd, Royston, 1990).
[25] J. Skilling, in Maximum Entropy and Bayesian Methods, edited by G. J. Erickson, J. T. Rychert, and C. R. Smith (1998), p. 1.
[26] S. Kullback and R. Leibler, Annals of Mathematical Statistics 22(1), 79 (1951).
[27] E. Hubble, ApJ 79, 8 (1934).
[28] F. S. Kitaura, J. Jasche, C. Li, T. A. Enßlin, R. B. Metcalf, B. D. Wandelt, G. Lemson, and S. D. M. White, MNRAS 400, 183 (2009), 0906.3978.
[29] D. Layzer, AJ 61, 383 (1956).
[30] P. Coles and B. Jones, MNRAS 248, 1 (1991).
[31] R. K. Sheth, MNRAS 277, 933 (1995), astro-ph/9511096.
[32] I. Kayo, A. Taruya, and Y. Suto, ApJ 561, 22 (2001), arXiv:astro-ph/0105218.
[33] R. Vio, P. Andreani, and W. Wamsteker, PASP 113, 1009 (2001), arXiv:astro-ph/0105107.
[34] M. C. Neyrinck, I. Szapudi, and A. S. Szalay, ApJ 698, L90 (2009), 0903.4693.
[35] J. Jasche and F. S. Kitaura, ArXiv e-prints (2009), 0911.2496.
[36] J. Jasche, F. S. Kitaura, C. Li, and T. A. Enßlin, ArXiv e-prints (2009), 0911.2498.
[37] F. Kitaura, J. Jasche, and R. B. Metcalf, MNRAS 403, 589 (2010), 0911.1407.
[38] C. Weig and T. A. Enßlin, ArXiv e-prints (2010), 1003.1311.
[39] T. J. Hebert and R. Leahy, IEEE Transactions on Signal Processing 40, 2290 (1992).
[40] T. J. Cornwell and K. F. Evans, A&A 143, 77 (1985).
[41] G. Wang, L. Fu, and J. Qi, Physics in Medicine and Biology 53, 593 (2008).
[42] C. Oh and B. Roy Frieden, Optics Communications 282, 2489 (2009).
[43] D. H. Roberts, J. Lehar, and J. W. Dreher, AJ 93, 968 (1987).
[44] G. B. Rybicki and W. H. Press, ApJ 398, 169 (1992).
[45] J. Li and P. Stoica, IEEE Transactions on Signal Processing 44, 1469 (1996).
[46] P. Stoica, H. Li, and J. Li, IEEE Signal Processing Letters 6, 205 (1999).
[47] P. Stoica, E. G. Larsson, and J. Li, AJ 120, 2163 (2000).
[48] P. K. Kitanidis, Water Resources Research 22, 499 (1986).
[49] G. Rydbeck, ApJ 675, 1304 (2008).
[50] U. Seljak, ApJ 503, 492 (1998), astro-ph/9710269.
[51] B. D. Wandelt, D. L. Larson, and A. Lakshminarayanan, Phys. Rev. D 70, 083511 (2004), arXiv:astro-ph/0310080.
[52] H. K. Eriksen, I. J. O'Dwyer, J. B. Jewell, B. D. Wandelt, D. L. Larson, K. M. Górski, S. Levin, A. J. Banday, and P. B. Lilje, ApJS 155, 227 (2004), arXiv:astro-ph/0407028.
[53] J. Jewell, S. Levin, and C. H. Anderson, ApJ 609, 1 (2004), arXiv:astro-ph/0209560.
[54] J. Jasche, F. S. Kitaura, B. D. Wandelt, and T. A. Enßlin, MNRAS 406, 60 (2010), 0911.2493.
