Astrophysical data analysis with information field theory
Astrophysical data analysis with information field theory

Torsten Enßlin

Max Planck Institut für Astrophysik (Karl-Schwarzschild-Straße 1, D-85748 Garching, Germany), and Ludwig-Maximilians-Universität München (Geschwister-Scholl-Platz 1, D-80539 München, Germany)

Abstract. Non-parametric imaging and data analysis in astrophysics and cosmology can be addressed by information field theory (IFT), a means of Bayesian, data-based inference on spatially distributed signal fields. IFT is a statistical field theory, which permits the construction of optimal signal recovery algorithms. It exploits spatial correlations of the signal fields even for nonlinear and non-Gaussian signal inference problems. The alleviation of a perception threshold for recovering signals of unknown correlation structure by using IFT is discussed in particular, as well as a novel improvement on instrumental self-calibration schemes. IFT can be applied to many areas. Here, applications in cosmology (cosmic microwave background, large-scale structure) and astrophysics (galactic magnetism, radio interferometry) are presented.

INFORMATION FIELD THEORY

Information field theory (IFT) is information theory for fields. It describes in mathematical language how information on spatially distributed quantities following some physical laws can optimally be extracted from data. IFT exploits known or inferred correlation structures of the field of interest $s$ ($s$ is regarded both as a function $s = s(x)$ and as a vector $s = (s_x)_{x \in \Omega}$ in a function space) over some domain $\Omega = \{x\}$, in order to regularize the otherwise ill-posed inverse problem of determining virtually infinitely many field degrees of freedom from a finite dataset $d = (d_1, \ldots, d_n)^{\mathrm{T}} = (d_i)_i$, $n \in \mathbb{N}$.
What distinguishes IFT from many non-parametric inference methods is that it is defined over continuous spaces; any pixelization of the field used in actual computations must preserve this continuum limit and recover it for infinitely small pixels. A concise introduction to IFT can be found in Ref. [1], exhaustive ones in Refs. [2, 3], and the numerical issues of properly discretized fields are addressed in Refs. [4, 5].

Signal field estimation in IFT relies on Bayes' theorem, recast into a (statistical) field theoretical language:

  $P(s|d) = \dfrac{P(d|s)\, P(s)}{P(d)} \equiv \dfrac{e^{-H(d,s)}}{Z_d}.$   (1)

This defines the information Hamiltonian and its partition function,

  $H(d,s) \equiv -\ln P(d,s) = -\ln P(d|s) - \ln P(s)$   (2)

  $Z_d \equiv \int \mathcal{D}s\, e^{-H(d,s)} = \int \mathcal{D}s\, P(d,s) = P(d),$   (3)

and permits the usage of any appropriate field theoretical technique for signal inference. These include, among others, classical solutions (or Maximum A Posteriori (MAP) approximation), Feynman diagrams, renormalization and re-summation techniques, Gibbs free energy minimization (or Maximum Entropy), and mean field approaches. See Refs. [1, 3] for an overview. IFT can be applied in scientific areas that rely on imaging, like astronomy and astrophysics; examples of such applications can be found at https://www.mpa-garching.mpg.de/ift. Useful IFT approaches to some generic astrophysical signal inference problems are reviewed below.

THE MEASUREMENT PROBLEM

In a generic linear measurement the data depend linearly on the signal,

  $d = Rs + n = \left( \int_\Omega dx\, R_{ix}\, s_x + n_i \right)_i,$   (4)

with $R$ the instrument response operator, which describes how the instrument senses the signal, and $n$ the noise.
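For a discretized field, Eq. (4) becomes ordinary matrix algebra. A minimal sketch of such a measurement (the grid size, the window-averaging response, and the noise level are illustrative assumptions, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(42)
npix, ndata = 128, 64                 # field pixels vs. number of data points
x = np.linspace(0.0, 1.0, npix, endpoint=False)
s = np.sin(2 * np.pi * x) + 0.5 * np.sin(6 * np.pi * x)   # toy signal field

# Response R: each data point averages the field over a small window
# (a crude point spread function), sampling every second pixel.
R = np.zeros((ndata, npix))
for i in range(ndata):
    window = np.arange(2 * i - 2, 2 * i + 3) % npix       # periodic domain
    R[i, window] = 1.0 / window.size

n = 0.1 * rng.normal(size=ndata)      # white Gaussian noise
d = R @ s + n                         # the data, Eq. (4): d = R s + n
```

Each row of `R` plays the role of the kernel $R_{ix}$; refining the grid while keeping the averaging windows fixed in $x$ approaches the continuum integral of Eq. (4).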
In the simplest case, signal and noise priors are independent Gaussian distributions, $P(n,s) = \mathcal{G}(s,S)\,\mathcal{G}(n,N)$, with

  $\mathcal{G}(\phi,\Phi) = \dfrac{1}{\sqrt{|2\pi\Phi|}} \exp\!\left( -\dfrac{1}{2}\, \phi^\dagger \Phi^{-1} \phi \right),$   (5)

with $S$ and $N$ the known signal and noise covariances, respectively, and $\dagger$ denoting the adjoint of a vector or field. Then the signal posterior is Gaussian as well, $P(s|d) = \mathcal{G}(s-m, D)$, with

  $m = D j, \qquad D = \left( S^{-1} + R^\dagger N^{-1} R \right)^{-1}, \qquad j = R^\dagger N^{-1} d.$   (6)

Here, $m$ is the signal posterior mean or Wiener filter estimate, $D$ the signal uncertainty covariance or information propagator, and $j$ the information source [3].

A typical signal in astronomy is the intensity distribution $I_x$ of light of a certain wavelength on the celestial sphere $\Omega = \mathcal{S}^2$. Since $I_x$ is strictly positive and can vary over orders of magnitude, the logarithmic brightness $s_x = \log(I_x/I_0)$ would be a natural variable, which can be positive as well as negative. The Maximum Entropy principle singles out a Gaussian prior for $s$ if only information up to the second moment of the $s$-distribution is available a priori or should be taken into account. Furthermore, the likelihood might be a Poisson distribution if individual photons are counted. All this complicates the inference problem and introduces so-called interaction terms into the information Hamiltonian, but can in principle be dealt with using the toolbox of IFT [3, 6, 7]. For simplicity, we assume in the following the measurement Eq. (4).

In practical applications of IFT, it is often the case that the signal covariance $S$, the noise covariance $N$, the response $R$, or combinations thereof are not sufficiently known for applying the Wiener filter given by Eq. (6). We denote all such undetermined parameters by the parameter vector $p$. Its uncertainty has to be taken into account in the signal inference. A $p$-marginalized joint probability of data and signal might be calculated analytically,

  $P(d,s) = \int \mathcal{D}p\, P(d,s,p).$   (7)
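The Wiener filter of Eq. (6) can be evaluated directly with dense linear algebra. A minimal sketch (the pixel count, power-law prior spectrum, and noise level are illustrative assumptions):

```python
import numpy as np

npix = 64
x = np.linspace(0.0, 1.0, npix, endpoint=False)
rng = np.random.default_rng(0)

# Signal covariance S: stationary, with an assumed falling power spectrum,
# hence diagonal in Fourier space (built here explicitly as a matrix).
k = np.fft.fftfreq(npix, d=1.0 / npix)
P_s = 1.0 / (1.0 + np.abs(k)) ** 3
F = np.fft.fft(np.eye(npix)) / np.sqrt(npix)       # unitary DFT matrix
S = (F.conj().T @ np.diag(P_s) @ F).real

R = np.eye(npix)                  # trivial response: direct pixel observation
N = 0.1 ** 2 * np.eye(npix)       # noise covariance

s = np.sin(2 * np.pi * x)         # toy signal
d = R @ s + 0.1 * rng.normal(size=npix)

# Wiener filter, Eq. (6): j = R† N^-1 d,  D = (S^-1 + R† N^-1 R)^-1,  m = D j
j = R.T @ np.linalg.solve(N, d)
D = np.linalg.inv(np.linalg.inv(S) + R.T @ np.linalg.solve(N, R))
m = D @ j                         # posterior mean / Wiener filter estimate
```

Since $S$ assigns little power to high wave numbers, the filter suppresses the noise there while passing the low-frequency signal nearly unchanged. For realistic pixel counts one would apply $D$ implicitly (e.g. via conjugate gradients) rather than inverting matrices explicitly.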
However, the corresponding information Hamiltonian $H(d,s) = -\ln P(d,s)$ is usually a complicated function of the signal, preventing an easy calculation of the posterior mean $m$. Simple MAP signal estimators, resulting from solving $\partial H(d,s)/\partial s = 0$ for $s$, can be highly biased due to the strong skewness of the marginalized posterior [8]. The Empirical Bayes approach performs much better. There, the $s$-marginalized parameter posterior $P(d,p) = \int \mathcal{D}s\, P(d,s,p)$ is used to choose a good point estimate $p^\star$, which is then assumed to be correct in a signal mean inference based on $P(s|d,p^\star)$. The reason why Empirical Bayes is a logically consistent approximation can be seen in the following reformulation of the posterior mean signal,

  $m \equiv \langle s \rangle_{(s,p|d)} \equiv \int \mathcal{D}p \int \mathcal{D}s\, P(s,p|d)\, s = \int \mathcal{D}p\, P(p|d) \int \mathcal{D}s\, P(s|d,p)\, s = \big\langle \langle s \rangle_{(s|d,p)} \big\rangle_{(p|d)} \approx \langle s \rangle_{(s|d,p^\star)}.$   (8)

In the last step a delta approximation, $P(p|d) \approx \delta(p - p^\star)$, was used. Thus, Empirical Bayes can be regarded as the zeroth-order term of a perturbation expansion in the posterior parameter uncertainty. The next-order term might be obtained by using a Gaussian approximation of $P(p|d)$. A more systematic expansion can be obtained using the minimal Gibbs free energy approach [6]. However, already the Empirical Bayes approximation leads to well-working signal inference recipes in the case of unknown hyper-parameters, for the problems of unknown covariances and/or response.

UNKNOWN SIGNAL COVARIANCE

The signal covariance $S$ might be unknown; however, statistical homogeneity can often be assumed, i.e., the correlation $\langle s_x \bar{s}_y \rangle_{(s)} = S_{xy} = C_s(x-y)$ is only a function of the distance between $x$ and $y$. This implies that $S$ is diagonal in Fourier space,

  $S_{kk'} = (2\pi)^u\, \delta(k-k')\, P_s(k),$   (9)

with $k$, $k'$ being $u$-dimensional Fourier wave-vectors.
In this case, a suitable hyper-prior for the unknown signal power spectrum $P_s(k)$ might be given by a term penalizing deviations from power-law spectra, combined with inverse-gamma priors for the individual Fourier modes,

  $P(P_s) \propto \exp\!\left[ -\dfrac{1}{2} \int dk \left( \dfrac{\partial \log P_s(k)}{\sigma_k\, \partial \log k} \right)^{\!2} \right] \prod_k \left[ P_s(k) \right]^{-\alpha_k} \exp\!\left( -\dfrac{q_k}{P_s(k)} \right),$   (10)

where the first factor defines the spectral smoothness term $-\frac{1}{2} (\log P_s)^\dagger\, T\, (\log P_s)$. The hyper-parameters $\sigma_k$, $\alpha_k$, and $q_k$ determine how strongly spectral curvature is penalized and how informative the spectral prior is for the individual modes. $\alpha_k = 1$ and $q_k = 0$ correspond to Jeffreys' non-informative prior.

[Figures from Ref. [8]: left, the reconstructed power of several filters as a function of the data-to-noise power ratio $(P_d/P_N)(k_i)$; right, the positions of the MAP spectrum, PURE, classical joint MAP, and critical filters in the $(\delta, \varepsilon)$ plane, separating the regions with and without a perception threshold.]

For the Empirical Bayes approximation we have to deduce a point estimate of the spectrum. Using the MAP approximation in $\log P_s(k)$ for this [7], we obtain

  $P_s^\star(k) = \dfrac{q_k + \frac{1}{2} \mathrm{Tr}\!\left[ \left( m^\star m^{\star\dagger} + D^\star \right) \mathbb{1}_{(k)} \right]}{\alpha_k - 1 + \frac{1}{2}\rho_k + (T \log P_s^\star)_k},$   (11)

with $\mathbb{1}_{(k)}$ a projection onto all Fourier modes with the same power spectrum as $k$ (e.g., onto spheres in Fourier space, in the case of statistical isotropy) and $\rho_k = \mathrm{Tr}[\mathbb{1}_{(k)}]$ the number of such modes. $m^\star$ and $D^\star$ are the mean and uncertainty covariance, respectively, calculated for the Gaussian inference problem assuming $P_s^\star(k)$ to be self-consistently the correct power spectrum.

It is instructive to investigate this signal filter operation (the self-consistent calculation of $m^\star$ and $P_s^\star$) in the context of similar filters. For this, we simplify the problem by assuming that signal and noise covariances, as well as the response, are diagonal in Fourier space, and that our spectral prior is completely non-informative ($\sigma_k = \infty$, $\alpha_k = 1$, and $q_k = 0$ for all $k$).
A generic spectral estimator (to be calculated consistently with the corresponding signal estimator $m^\star = D^\star j$) is then

  $P_s^\star(k) = \dfrac{\mathrm{Tr}\!\left[ \left( m^\star m^{\star\dagger} + \delta\, D^\star \right) \mathbb{1}_{(k)} \right]}{\rho_k + \varepsilon},$   (12)

where the parameters $\delta$ and $\varepsilon$ have been introduced in order to see how differently signal and spectrum estimators perform. The non-informative filter derived above corresponds to $(\delta, \varepsilon) = (1, 0)$, but other assumptions or approximations might lead to different values. These are depicted in the right panel of the figure from Ref. [8]. For example, the MAP estimator using the power-spectrum-marginalized posterior leads to $(\delta, \varepsilon) = (0, 0)$ (labeled "classical"), and a parameter uncertainty renormalized estimator (PURE) can be projected to $(\delta, \varepsilon) = (1, -\rho/(2\rho+4))$. The estimator with $(\delta, \varepsilon) = (1, 0)$ is called the critical filter, because it resides on a critical line for perception thresholds. All the estimators to the left of this line exhibit a perception threshold: they do not respond at all to data modes with power below some critical value. This is shown in the left panel, where the power in the resulting reconstructions of these filters is plotted as a function of the ratio of data power to noise power. Filters with a perception threshold require the data to have more variance than expected from the noise. For example, the classical filter with $(\delta, \varepsilon) = (0, 0)$ does not account for the missing power in the reconstruction. As a consequence it has a strong perception threshold: it requires the data to have four times the variance of the noise before it recognizes a signal. The critical filter has only a marginal perception threshold, at the point where the data variance exactly equals the noise variance. The PURE filter even tries to extract information from data with less variance than expected from the noise level.
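In this diagonal setting the critical filter $(\delta, \varepsilon) = (1, 0)$ reduces to a per-mode fixed-point iteration between Eq. (6) and Eq. (12). The sketch below treats Fourier bands abstractly; the band count, the multiplicity $\rho_k$, the noise level, and the "true" spectrum are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(7)
nbands = 64                         # number of Fourier bands k
rho = 20                            # hypothetical mode multiplicity per band
sigma2 = 1.0                        # known noise variance per mode

kk = np.arange(nbands, dtype=float)
P_true = 100.0 / (1.0 + kk) ** 3    # "unknown" true signal power spectrum

# Draw rho signal coefficients per band, add white noise -> data coefficients.
s = rng.normal(0.0, np.sqrt(P_true)[:, None], (nbands, rho))
d = s + rng.normal(0.0, np.sqrt(sigma2), (nbands, rho))

# Critical filter: iterate the Wiener estimate, Eq. (6), and the spectrum
# estimate, Eq. (12) with (delta, eps) = (1, 0) (equivalently Eq. (11) with
# alpha_k = 1, q_k = 0, sigma_k = infinity), until self-consistency.
P = np.mean(d ** 2, axis=1)                   # start from the data power
for _ in range(200):
    W = P / (P + sigma2)                      # per-mode Wiener gain
    D = P * sigma2 / (P + sigma2)             # per-mode uncertainty variance
    P = np.mean((W[:, None] * d) ** 2, axis=1) + D
```

Per band, this iteration converges monotonically towards $P^\star = \max(0,\, \overline{d^2} - \sigma^2)$, which makes the marginal perception threshold explicit: bands whose data variance does not exceed the noise variance are driven to zero power, while bands just above it are only slowly resolved.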
UNKNOWN NOISE COVARIANCE

The critical filter needs to know the noise level of the data precisely in order to tune itself optimally. However, in many measurement situations this is also not reliably known, and the data have to provide this information as well. As noise and signal variance are somewhat degenerately encoded in the data, an informative prior for the noise level is mandatory whenever a non-informative prior for the signal power spectrum is used. The noise covariance might be decomposed as $N = \sum_i \eta_i N_i$, where the $N_i$ denote block matrices in data space that contain the best available guesses for the error covariances. We model our noise prior knowledge with a multivariate inverse-gamma prior,

  $P(\eta) \propto \prod_i \eta_i^{-\beta_i} \exp\!\left( -\dfrac{r_i}{\eta_i} \right),$   (13)

by choosing $\beta_i > 1$ and $r_i > 0$ to appropriately represent our confidence in the reported or assumed error covariances $N_i$. The resulting signal estimation scheme is as before, just with an additional estimator for the noise parameters, given by

  $\eta_i^\star = \dfrac{r_i + \frac{1}{2} \mathrm{Tr}\!\left[ \left( (d - R m^\star)(d - R m^\star)^\dagger + R D^\star R^\dagger \right) N_i^{-1} \right]}{\beta_i - 1 + \frac{1}{2}\mu_i},$   (14)

where $N_i^{-1}$ is the pseudo-inverse of $N_i$ and $\mu_i = \mathrm{Tr}(N_i N_i^{-1})$ the dimensionality of the noise blocks [9]. This extended critical filter has successfully been used to reconstruct an all-sky map of the galactic magnetic field [10].

UNKNOWN CALIBRATION

Finally, it might also well be that the response in Eq. (4) is not fully known. The process and the result of determining the response is called calibration. Ideally, the measurement of a well-known external calibration signal is used. However, often this is not sufficient, since the instrument response might change with time. In this case, the scientific signal of interest might be used as a calibrator itself.
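To make the noise-parameter update of Eq. (14) above concrete, here is a toy evaluation for a single noise block $N_1 = \mathbb{1}$, so that $\mu_1$ equals the data dimension and the pseudo-inverse is trivial. The signal estimate and its uncertainty are stand-ins, and all numbers are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
ndata = 100
beta, r = 2.0, 1.0                  # hypothetical inverse-gamma prior parameters

# Data with true noise standard deviation 2, i.e. eta_true = 4 for N_1 = 1.
d = rng.normal(0.0, 2.0, ndata)
R = np.eye(ndata)
m_star = np.zeros(ndata)            # stand-in: converged signal mean
D_star = np.zeros((ndata, ndata))   # stand-in: negligible signal uncertainty

# Eq. (14) for the single block N_1 = identity:
resid = d - R @ m_star
mu = ndata                          # Tr(N_1 N_1^+) = data dimension
eta_star = (r + 0.5 * (resid @ resid + np.trace(R @ D_star @ R.T))) \
           / (beta - 1.0 + 0.5 * mu)
```

With an informative prior (larger $\beta_i$ and $r_i$), $\eta^\star$ is pulled towards the reported noise level; here the weak prior lets the residuals dominate, and $\eta^\star$ should come out near the true variance of 4.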
The resulting self-calibration scheme (selfcal) assumes some calibration, infers the signal with a reconstruction method, assumes this signal to be correct in order to calibrate on it, and iterates until convergence. Although the selfcal scheme is widely used, in particular in radio astronomy, an information theoretical investigation and a proof of its convergence were only provided recently [11]. There it has been shown that if the imaging and calibration algorithms used can be regarded as MAP estimators, the classical selfcal scheme corresponds to a joint MAP estimation of signal and calibration. Similar to the case of unknown signal covariance, we expect such a joint MAP estimation to be sub-optimal in an $L_2$-error-norm sense.

Specifically, we assume the unknown calibration parameters $\gamma = (\gamma_a)_a$ to affect the response linearly,

  $R_\gamma = B_0 + \sum_a \gamma_a B_a,$   (15)

with $B_0$ and $B_a$ known and $\gamma$-independent, and $P(\gamma) = \mathcal{G}(\gamma, \Gamma)$ the calibration prior, with some known calibration uncertainty matrix $\Gamma$. Using the Empirical Bayes approach, we find that the signal reconstruction $m^\star$ should self-consistently use the calibration point estimate given by Ref. [11],

  $\gamma^\star = \Delta^\star h^\star,$ with   (16)

  $\left( \Delta^{\star\,-1} \right)_{ab} = \Gamma^{-1}_{ab} + \mathrm{Tr}\!\left[ \left( m^\star m^{\star\dagger} + D^\star \right) B_a^\dagger N^{-1} B_b \right],$ and

  $h^\star_b = m^{\star\dagger} B_b^\dagger N^{-1} d - \mathrm{Tr}\!\left[ \left( m^\star m^{\star\dagger} + D^\star \right) B_0^\dagger N^{-1} B_b \right].$

[Figures from Ref. [11]: signal and reconstruction with their errors over the signal domain (position), and gains and calibration with their errors over the data domain (time), comparing the original signal and gains with the classical and new selfcal solutions.]

This is a new selfcal scheme, which improves over classical selfcal by taking the posterior signal uncertainty $D^\star = \langle (s - m^\star)(s - m^\star)^\dagger \rangle_{(s|d,\gamma^\star)}$ into account. An example of selfcal can be seen in the figure from Ref. [11].
There, an unknown signal is shown, which was scanned three times with an instrument with varying gains (also shown), according to the measurement equation $d_t = (1 + \gamma_t)\, s_{x_t} + n_t$ with $x_t = t \bmod 1$. The solutions of the classical and the new selfcal schemes are compared to this. Both solve Eq. (6) and Eq. (16) simultaneously, while classical selfcal ignores $D^\star$ in Eq. (16). The figure shows that the new selfcal scheme partly alleviates a bias of classical selfcal towards high gain solutions.

ASTROPHYSICAL APPLICATIONS

The versatility of IFT in dealing with various measurement problems, taking uncertainties on covariances and responses into account, and providing implementable algorithms has led to a number of concrete IFT applications in astronomy and astrophysics. Many of them build on NIFTY, Numerical Information Field Theory [4, 5], a publicly available, object-oriented library that permits the user to program IFT algorithms abstractly, irrespective of the space discretization used. IFT has already been applied to a number of areas, which we briefly discuss in the following.

The statistics of the cosmic microwave background are nearly Gaussian. However, different inflationary scenarios of cosmology predict different levels of non-Gaussian signatures. Optimal non-Gaussianity estimators could be constructed using Feynman diagrams [3] and by a saddle point approximation [12].

The cosmic large-scale structure of the matter distribution in the Universe is traced by galaxies and can therefore be reconstructed from galaxy surveys [13, 14, 15, and others]. The power spectrum of the large-scale structure can be measured as well [14] and analyzed for its information content on cosmology. Although the cosmic density field is a strictly positive quantity, it is often treated as a Gaussian random field. A log-normal model would be more accurate [3, 16, 17]. For its usage, it is necessary to translate between linear and logarithmic power spectra [18].
Galactic magnetism can be studied using the Faraday rotation of extragalactic radio sources. The construction of all-sky Faraday maps from such individual measurements towards the directions of these sources was possible thanks to the critical filter [19] and the extended critical filter [20].

Interferometric radio astronomy faces the problem that an interferometer array measures only some part of the Fourier-transformed sky brightness. The other part has to be reconstructed by the imaging algorithm. RESOLVE, a novel IFT-based imaging method, which assumes the sky brightness to follow a log-normal distribution with unknown power spectrum, seems to be superior in reconstructing diffuse emission compared to classical algorithms (e.g., CLEAN and MAXENT) [21].

Gamma- and X-ray astronomy have to deal with extended emission, point sources, Poisson statistics, inhomogeneous sky exposure, and complex point spread functions. The IFT-based D3PO algorithm for denoising, deconvolving, and decomposing photon observations handles these and produces maps of the extended emission, catalogs of the point sources, and angular power spectra of the diffuse flux [22].

These applications demonstrate that IFT is a versatile framework to develop and analyze measurement and imaging problems not only in astronomy, but also in cosmology and other areas.

ACKNOWLEDGMENTS

First of all, I should apologize for exclusively reviewing IFT work I was involved in; there is a large amount of related work I had to ignore for brevity. I want to thank my IFT co-investigators of the works discussed here: Michael Bell, Sebastian Dorn, Mona Frommert, Maksim Greiner, Jens Jasche, Henrik Junklewitz, Rishi Khatri, Francisco Shu Kitaura, Niels Oppermann, Martin Reinecke, Georg Robbers, Marco Selig, and Cornelius Weig. Furthermore, I acknowledge helpful comments on the manuscript by Vanessa Böhm, Sebastian Dorn, and Marco Selig.

REFERENCES

1. T. Enßlin, "Information field theory," in American Institute of Physics Conference Series, edited by U. von Toussaint, 2013, vol. 1553, pp. 184–191, arXiv:1301.2556.
2. J. C. Lemm, Bayesian Field Theory, Johns Hopkins University Press, 2003.
3. T. A. Enßlin, M. Frommert, and F. S. Kitaura, Phys. Rev. D 80, 105005 (2009), arXiv:0806.3474.
4. M. Selig, M. R. Bell, H. Junklewitz, N. Oppermann, M. Reinecke, M. Greiner, C. Pachajoa, and T. A. Enßlin, A&A 554, A26 (2013), arXiv:1301.4499.
5. M. Selig, "The NIFTY way of Bayesian signal inference," in Bayesian Inference and Maximum Entropy Methods in Science and Engineering, 2013, these proceedings.
6. T. A. Enßlin and C. Weig, Phys. Rev. E 82, 051112 (2010), arXiv:1004.2868.
7. N. Oppermann, M. Selig, M. R. Bell, and T. A. Enßlin, Phys. Rev. E 87, 032136 (2013), arXiv:1210.6866.
8. T. A. Enßlin and M. Frommert, Phys. Rev. D 83, 105014 (2011), arXiv:1002.2928.
9. N. Oppermann, G. Robbers, and T. A. Enßlin, Phys. Rev. E 84, 041118 (2011), arXiv:1107.2384.
10. N. Oppermann et al., A&A 542, A93 (2012), arXiv:1111.6186.
11. T. A. Enßlin, H. Junklewitz, L. Winderling, and M. Selig, ArXiv e-prints (2013), arXiv:1312.1349.
12. S. Dorn, N. Oppermann, R. Khatri, M. Selig, and T. A. Enßlin, Phys. Rev. D 88, 103516 (2013), arXiv:1307.3884.
13. F. S. Kitaura and T. A. Enßlin, MNRAS 389, 497–544 (2008), arXiv:0705.0429.
14. J. Jasche, F. S. Kitaura, B. D. Wandelt, and T. A. Enßlin, MNRAS 406, 60–85 (2010), arXiv:0911.2493.
15. F.-S. Kitaura, P. Erdoğdu, S. E. Nuza, A. Khalatyan, R. E. Angulo, Y. Hoffman, and S. Gottlöber, MNRAS 427, L35–L39 (2012), arXiv:1205.5560.
16. F. S. Kitaura et al., MNRAS 400, 183–203 (2009), arXiv:0906.3978.
17. C. Weig and T. A. Enßlin, MNRAS 409, 1393–1411 (2010), arXiv:1003.1311.
18. M. Greiner and T. A. Enßlin, ArXiv e-prints (2013), arXiv:1312.1354.
19. N. Oppermann, H. Junklewitz, G. Robbers, and T. A. Enßlin, A&A 530, A89 (2011), arXiv:1008.1246.
20. N. Oppermann, H. Junklewitz, G. Robbers, M. R. Bell, T. A. Enßlin, A. Bonafede, R. Braun, J. C. Brown, T. E. Clarke, I. J. Feain, B. M. Gaensler, A. Hammond, L. Harvey-Smith, G. Heald, M. Johnston-Hollitt, U. Klein, P. P. Kronberg, S. A. Mao, N. M. McClure-Griffiths, S. P. O'Sullivan, L. Pratley, T. Robishaw, S. Roy, D. H. F. M. Schnitzeler, C. Sotomayor-Beltran, J. Stevens, J. M. Stil, C. Sunstrum, A. Tanna, A. R. Taylor, and C. L. Van Eck, A&A 542, A93 (2012), arXiv:1111.6186.
21. H. Junklewitz, M. R. Bell, M. Selig, and T. A. Enßlin, ArXiv e-prints (2013), arXiv:1311.5282.
22. M. Selig and T. Enßlin, ArXiv e-prints (2013), arXiv:1311.1888.