Bayesian semiparametric power spectral density estimation with applications in gravitational wave data analysis
The standard noise model in gravitational wave (GW) data analysis assumes detector noise is stationary and Gaussian distributed, with a known power spectral density (PSD) that is usually estimated using clean off-source data. Real GW data often depar…
Authors: Matthew C. Edwards, Renate Meyer, Nelson Christensen
Ba y esian semiparametric p o w er sp ectral densit y estimation with applications in gra vitational w a v e data analysis Matthew C. Edw ards, 1, 2 , ∗ Renate Meyer, 1 , † and Nelson Christensen 2 , ‡ 1 Dep artment of Statistics, University of Auckland, Auckland 1142, New Ze aland 2 Physics and Astr onomy, Carleton Col le ge, Northfield, Minnesota 55057, USA The standard noise model in gravitational w a v e (GW) data analysis assumes detector noise is stationary and Gaussian distributed, with a known p ow er sp ectral density (PSD) that is usually estimated using clean off-source data. Real GW data often depart from these assumptions, and missp ecified parametric mo dels of the PSD could result in misleading inferences. W e prop ose a Ba yesian semiparametric approach to improv e this. W e use a nonparametric Bernstein p olynomial prior on the PSD, with weigh ts attained via a Dirichlet pro cess distribution, and up date this using the Whittle likelihoo d. Posterior samples are obtained using a blo c ked Metrop olis-within-Gibbs sampler. W e simultaneously estimate the reconstruction parameters of a rotating core collapse sup erno v a GW burst that has been embedded in simulated Adv anced LIGO noise. W e also discuss an approach to deal with non-stationary data by breaking longer data streams into smaller and lo cally stationary comp onen ts. P ACS num bers: 04.30.-w, 02.50.-r, 05.45.Tp, 97.60.Bw I. INTR ODUCTION Astronom y is entering a new and exciting era, with the second generation of ground-based gra vitational wa v e (GW) in terferometers (Adv anced LIGO [1], Adv anced Virgo [2], and KAGRA [3]) expected to reach design sen- sitivit y in the next few years. Throughout history , de- v elopments in astronomy ha ve led to deep er understand- ings of the univ erse. Eac h time w e prob e the univ erse with new sensors, we discov er exciting and unexpected phenomena, that challenge our current b eliefs in astro- ph ysics and cosmology . GW astronomy promises to do the same, providing a new set of ears to listen to (p oten- tially unanticipated) cataclysmic even ts in the cosmos. Apart from the first direct observ ation of GWs, ex- tracting astroph ysical information enco ded in GW sig- nals is one of the primary goals in GW data analysis. Since observ ations are sub ject to noise, accurate astro- ph ysical predictions rely on an honest characterization of these noise sources. A t its design sensitivity , Ad- v anced LIGO will b e sensitiv e to GWs in the frequency band from 10 Hz to 8 kHz. The main noise sources for ground-based interferometers include seismic noise, ther- mal noise, and photon shot (quantum) noise [1]. Seismic noise limits the low frequency sensitivit y of the detec- tors. Thermal noise is the predominate noise source in the most sensitive frequency band of Adv anced LIGO (around 100 Hz), and arises from the test mass mirror susp ensions and the Brownian motion of the mirror coat- ings. Photon shot noise is due to quan tum uncertainties in the detected photon arriv al rate, and dominates the high frequency sensitivity of the detectors. ∗ matt.edwards@auc kland.ac.nz † renate.meyer@auc kland.ac.nz ‡ nchriste@carleton.edu Standard assumptions ab out the noise mo del in the GW data analysis communit y rely on detector noise b e- ing stationary and Gaussian distributed, with a known p o w er sp ectral density (PSD) that is usually estimated using off-source data (not on a candidate signal) [4]. Real GW data often depart from these assumptions [5]. It w as demonstrated in [6] that fluctuations in the PSD can mo derately bias parameter estimates of compact binary coalescence GW signals embedded in LIGO data from the sixth science run (S6). High amplitude non-Gaussian transien ts (or “glitc hes”) in real detector data inv alidate the Gaussian noise assumption, and missp ecifications of the paramet- ric noise model could result in misleading inferences and predictions. A more sophisticated approach would b e to make no assumptions ab out the underlying noise distribution b y using nonparametric techniques. Unlik e parametric statistical mo dels, which ha v e a fixed and finite set of parameters (e.g., the Gaussian distribution has tw o parameters: µ and σ 2 represen ting the mean and v ariance resp ectively), nonparametric mo dels hav e a p oten tially infinite set of parameters, allowing for m uch greater flexibility . The theory of sp ectral density e stimation requires a time series to b e a stationary pro cess. If data is not sta- tionary (which is often the case for real LIGO data), it is imp ortan t to adjust for this b y introducing a time- v arying PSD. It was demonstrated in [7] that the noise PSD in real S6 LIGO data is in fact time-v arying. V ari- ation in detector sensitivity was also shown in [8]. Other GW literature that discusses non-stationary noise include [9, 10]. It would b e an o v er-simplification to assume the Adv anced LIGO PSD is constant ov er time, and to use off-source data in c haracterizing this. On-source estima- tion of the PSD w ould therefore be preferable to mitigate the time-v arying nature of the PSD. There ha ve b een attempts rep orted in the literature to improv e the mo delling of noise present in GW data, 2 primarily concen trating on noise with embedded signals from w ell-mo delled GW sources, suc h as binary inspi- rals [4, 11 – 14], and more recently from GW bursts (un- mo delled and t ypically short duration ev en ts) [7, 15]. Under the Ba yesian framew ork, R¨ ov er et al [11] used a Student- t lik eliho o d as a generalization to the com- monly used Whittle (approximate Gaussian) lik eliho od [16]. The b enefit of the Student- t set-up is tw o-fold: un- certain ty in the noise sp ectrum can b e accoun ted for via marginalization of nuisance parameters; and outliers can b e accommo dated due to the heavy-tail nature of the Studen t- t probabilit y densit y . A dra wback of this method is that the c hoice of h yp erparameters can unduly influ- ence p osterior inferences. Using the maximum likelihoo d approach, R¨ over [12] later demonstrated that the Student- t likelihoo d could b e used as a generalization to the matc hed-filtering detection method commonly used in the analysis of GW signals from w ell-mo delled sources. This approach w ould not b e appropriate for GW bursts, since matched- filtering requires accurate signal models with w ell-defined parameter spaces. Litten b erg and Cornish [13] used Bay esian mo del se- lection to determine the b est noise likelihoo d function in non-Gaussian noise. They considered Gaussian noise with a time v arying mean, noise from a weigh ted sum of t wo Gaussian distributions (non-Gaussian tails), and a com bination of Gaussian noise and glitc hes (mo delled as a linear combination of w a velets). Litten b erg et al [4] demonstrated how one can incor- p orate additional scale parameters in the Gaussian likeli- ho od, and marginalize o v er the uncertain t y in the PSD to reduce systematic biases in parameter estimates of com- pact binary mergers in S5 LIGO data. This metho d re- quires an initial estimate of the PSD. On a related note, Vitale et al [14] highlighted a Ba yesian method, similar to iteratively rew eighted least squares, that analytically marginalizes out background noise and requires no a pri- ori knowledge of the PSD. They applied this to sim ulated data from LISA Pathfinder. More recently , Littenberg and Cornish [7] in tro duced the Bay esLine algorithm in conjunction with Bay esW av e [15] to estimate the underlying PSD of GW detector noise. Ba yesLine is used to mo del the Gaussian noise comp onen t. They use a cubic spline to mo del the smo oth c hanging broadband noise and Lorentzians (Cauc hy den- sities) to mo del wandering spectral lines (due to AC sup- ply , violin mo des, etc.). Bay esW av e, on the other hand, mo dels the non-Gaussian instrumen t “glitc hes” and burst sources with a con tinuous wa v elet basis. Both meth- o ds make use of the trans-dimensional reversible jump Mark ov chain Monte Carlo (RJMCMC) algorithm of Green [17]. Bay esLine is v ery pragmatic and works re- mark ably well on real Adv anced LIGO data. Ho wev er, the authors did not consider statistically imp ortant no- tions such as the posterior consistency of the PSD [18]. Our approach to impro ving the GW noise mo del re- lies on developmen ts ov er the past decade in the area of Bay esian nonparametrics. Since parametric mo delling can lead to biased estimates when the underlying para- metric assumptions are in v alid, we prefer nonparametric tec hniques to estimate the PSD of a stationary noise time series. A common nonparametric estimate of the spectral den- sit y of a stationary time series is the p erio dogram, calcu- lated using the (normalized) squared mo dulus of F ourier co efficien ts. That is, I n ( λ ) = 1 2 π n n X t =1 X t exp( − i tλ ) 2 , λ ∈ ( − π , π ] , (1) where λ is the frequency , and { X t } is a stationary time series, where t = 1 , 2 , . . . , n represen ts discretized time. The p eriodogram randomly fluctuates ab out the true sp ectral densit y of a time series, but is not a consis- ten t estimator, motiv ating metho ds such as p eriodogram smo othing and av eraging [19]. Averaging of off-source p eriodograms from T ukey windo wed sim ulated Adv anced LIGO noise has b een used in GW literature relating to reconstructing rotating core collapse GWs [20], and predicting the important astrophysical parameters from these even ts [21]. In this pap er, w e implemen t the nonparamet- ric Ba yesian spectral density estimation method and Metrop olis-within-Gibbs Mark ov chain Monte Carlo (MCMC) sampler presented b y Choudh uri et al [22], whic h up dates a nonparametric Bernstein polynomial prior [23, 24] on the sp ectral density using the Whit- tle likelihoo d to make p osterior inferences. A Bernstein p olynomial prior is essen tially a finite mixture of Beta probabilit y densities (see Section I I C and App endix A). It w as pro ved that this metho d yields a consisten t estima- tor for the true sp ectral densit y of a (short-term memory) stationary time series [22] — an attractiv e feature, absen t in the perio dogram. Posterior consistency in this con text essen tially means that the p osterior probability of an ar- bitrary neighbourho o d around the true PSD go es to 1 as the length of the time series increases to infinity . Th us, as the sample size increases, the p osterior distribution of the PSD will ev en tually concentrate in a neighbourho o d of the true PSD [18]. This is an imp ortan t asymptotic robustness qualit y of the p osterior distribution in that the choice of prior parameters should not influence the p osterior distribution too muc h. Esp ecially in Bay esian nonparametrics, b ecause of the high dimension of the pa- rameter space, many posterior distributions do not auto- matically p ossess this qualit y [18]. W e refer the reader to App endix C for a visual demonstration of p osterior consistency . Unlik e references [4, 11, 14], we do not treat noise as a nuisance parameter to b e analytically integrated out. Although the signal parameters are our primary concern, w e are also interested in quantifying our uncertain ty in the underlying PSD of the noise in terms of p osterior probabilities and credible interv als. Knowledge of this uncertain ty will allow us to make honest astrophysical 3 statemen ts. In this study , we assume that data is the sum of a GW signal embedded in noise (from all noise sources), suc h that y = s ( β ) + ( θ ) , (2) where y is the (coincident) time-domain GW data v ector, s is a GW signal parameterized b y β , and is noise parameterized by θ . Notation with a tilde on top, suc h as ˜ y , refers to the frequency-domain equiv alen t of the same quan tit y , obtained b y the discrete F ourier transform (DFT). Note that w e are treating noise in this set-up as the conglomeration of detector noise (suc h as thermal noise and photon shot noise), background noise (suc h as seismic noise), and residual errors due to parametric statistical mo delling of GW signals. An important ca v eat is to ensure the magnitude of the errors in the statistical mo del of the signal are minimized, so as to not artificially dominate the noise. Estimation of sp ectral lines (as done b y the Bay esLine algorithm [7]) is left out of the scop e of this pap er. The GW signal could essentially come from any source, but in this pap er we will restrict our concentration to those from rotating core collapse sup ernov ae to simplify the problem and demonstrate the p o w er of the metho d. Using the recent w a veform catalogue of Ab dik amalov et al [25], we conduct principal comp onent analysis (PCA), and fit a principal comp onen t regression (PCR) mo del of the most imp ortant principal comp onen ts (PCs) on an arbitrary rotating core collapse GW signal [20, 21, 26]. The (parametric) signal component is easily em b edded as an additional Gibbs step in the Metrop olis-within-Gibbs MCMC sampler of Choudhuri et al [22]. That is, we utilize a blo cke d Gibbs approach to sequentially sample the signal parameters β given the noise parameters θ , and vice versa. As the mo del no w contains a paramet- ric signal comp onen t as w ell as a nonparametric noise comp onen t, it is “semiparametric”. T o accommo date for non-stationary noise, w e adapt an idea presented b y Rosen et al [27], and assume a non- stationary time series can b e broken down into smaller lo cally stationary segmen ts. F or eac h segmen t, w e sepa- rately estimate the PSD using the metho d of Choudhuri et al [22], and look at the time-v arying sp ectrum. W e see this work as b eing a complement to existing metho ds, with the following b enefits: • A Bay esian framework, allowing us to up date prior kno wledge based on observed data, as w ell as quan- tify uncertaint y in terms of probabilistic state- men ts; • Posterior consistency of the PSD, i.e., the p osterior distribution will concen trate around the true PSD as the sample size increases; • No parametric assumptions ab out the underlying noise distribution (parametric mo dels are very sen- sitiv e to missp ecifications), and high amplitude non-Gaussian transients in the noise can b e han- dled; • Non-stationarities can b e taken into accoun t b y splitting the data into smaller lo cally stationary segmen ts; • Estimation of noise and signal parameters are done sim ultaneously using Gibbs sampling; • Uncertaint y in astrophysically meaningful parame- ter estimates are honest, with less systematic bias presen t; • Non-informative priors can be c hosen, and the PSD do es not need to b e known a priori ; • Useful for any signal with a parametric statistical mo del (including rotating core collapse sup ernov a GWs). The pap er is structured as follows: Section II outlines the metho ds and mo dels used to simultaneously esti- mate signal and noise parameters in GW data; results for toy mo dels and simulated Adv anced LIGO data are presen ted in Section I I I; and in Section IV, we discuss the consequences of this work, as well as future initia- tiv es. Supplemen tary material can b e found in the three app endices. I I. METHODS AND MODELS A. P arametric, Nonparametric, and Semiparametric Models Statistical mo dels can b e classified into tw o groups — parametric and nonparametric. Par ametric mo dels hav e a fixed and finite set of parameters, are relatively easy to analyze, and are p o werful when their underlying as- sumptions are correctly sp ecified. Ho wev er, if the mo del is missp ecified, inferences will b e unreliable. Nonp ar a- metric models hav e far fewer restrictions, but are less ef- ficien t and p ow erful than their parametric coun terparts. No assumption about the underlying distribution of the data is made in nonparametric mo delling, and the num- b er of parameters are not fixed (and p otentially infinite dimensional). Instead, the effective n umber of parame- ters increases with more data, pro viding the model struc- ture. F or example, parametric regression (including linear mo dels, nonlinear mo dels, and generalized linear models) uses the following equation: y = g ( x 1 , x 2 , . . . , x k | β ) + , (3) where y is the resp onse v ariable, g ( x 1 , x 2 , . . . , x k | β ) is a function of k explanatory v ariables (that aim to explain the v ariability in y ) given some mo del parameters β , and is the statistical error, usually assumed to b e indep en- den t and identically distributed (iid) Gaussian random 4 v ariables, with 0 mean and constan t v ariance σ 2 . Here, the functional form of g ( . ) is known in adv ance, suc h as in linear regression, where w e hav e g ( x 1 , x 2 , . . . , x k | β ) = β 0 + β 1 x 1 + . . . + β k x k . (4) Nonparametric regression has a similar set-up, but as- sumes that the functional form of g ( . ) is unknown and to b e learnt from the data. F unction g ( . ) could b e thought of as an uncountably infinite-dimensional parameter in a nonparametric setting. Semip ar ametric mo dels contain b oth parametric and nonparametric comp onents. The parametric regression mo del presented in Equations (3) and (4) is essentially the same parametric mo del used in this pap er for GW signal reconstruction, where ( x 1 , x 2 , . . . , x k ) are princi- pal comp onent (PC) basis functions. How ev er, we model the noise nonparametrically , rather than assuming iid Gaussian noise. Since we hav e parametric and nonpara- metric comp onents, our mo del is semiparametric in na- ture. B. Ba yesian Nonparametrics Ba yesian nonparametrics contains the set of mo dels on the in terface b etw een the Bay esian framework and nonparametric statistics, and is characterized by large parameter spaces and probabilit y measures ov er these spaces [18]. The Ba yesian statistical framework is use- ful for incorp orating prior kno wledge, and is particularly p o w erful when these priors accurately represent our b e- liefs. As mentioned in the previous section, nonparamet- ric methods are useful for constructing flexible and robust alternativ es to parametric mo dels. A b enefit of Bay esian nonparametric mo dels is that they automatically infer mo del complexit y from the data, without explicitly con- ducting mo del comparison. Ba yesian nonparametrics is a relativ ely nascent field in statistics, and faces many c hallenges. The most obvi- ous one is the mathematical difficulty in specifying well- defined probabilit y distributions on infinite-dimensional function spaces. Constructing a prior on these spaces can b e arduous, and in the case of non-informativ e pri- ors, one should ensure large topological support so as not to put to o muc h mass on a small region. F urther, cre- ating computationally con venien t algorithms to sample from complicated posterior distributions presen ts its o wn set of challenges. It is also imp ortan t to ensure that a Ba yesian nonparametric model is statistically consistent (the truth is unco vered asymptotically), as some pro ce- dures do not automatically possess this quality [18]. Ba yesian nonparametric priors (and p osteriors) are sto c hastic processes rather than parametric distributions. F erguson [28] provided the seminal pap er for the field of Ba yesian nonparametrics, introducing the Dirichlet pro- cess, an infinite-dimensional generalization of the Dirich- let distribution, now commonly used as a prior in infinite mixture mo dels. This is a p opular mo del (often called the Chinese Restaurant Process) for classification prob- lems where the n umber of classes is unkno wn and to b e inferred from the data. A formal definition of the Dirich- let distribution and Dirichlet pro cess can be found in App endix B. Another p opular prior in Bay esian nonparametrics is the Gaussian process prior, which is often used in nonlin- ear regression contexts. In fact, one could extend the re- gression example in the previous section into the realm of Ba yesian nonparametrics b y putting a Gaussian pro cess prior on the function g . Compare this to the Ba yesian parametric counterpart, which puts a prior on the mo del parameters β . F or further discussion on Bay esian nonparametrics, we refer the reader to [18]. C. Sp ectral Density Estimation A w eakly (or second order) stationary time series { X t } is a sto c hastic pro cess that has constant and finite mean and v ariance ov er time, and an auto co v ariance function γ ( h ) that dep ends only on the time lag h . That is, for a zero-mean weakly stationary pro cess, the auto cov ariance function has the form γ ( h ) = E[ X t X t + h ] , ∀ t, (5) where E[ . ] is the expected v alue op erator, and t represents time. Assuming an absolutely summable auto cov ariance function ( P ∞ h = −∞ | γ ( h ) | < ∞ ), the (real-v alued) sp ectral densit y function f ( λ ) of a zero-mean weakly stationary time series is defined as f ( λ ) = 1 2 π ∞ X h = −∞ γ ( h ) exp( − i hλ ) , λ ∈ ( − π , π ] , (6) where λ is the angular frequency . Note that the sp ec- tral density function and auto cov ariance function are a F ourier transform pair. In this pap er, w e will also call this the p ower sp ectral densit y (PSD) function, although this term is sometimes reserved for the empirical spec- trum (p erio dogram) in the GW literature. F or a mean-centered w eakly stationary time series { X t } of length n , with sp ectral density f ( λ ), the Whittle appro ximation to the Gaussian lik eliho od, or simply the Whittle likeliho o d [16] is defined as L n ( x | f ) ∝ exp − b u c X l =1 log f ( λ l ) + I n ( λ l ) f ( λ l ) , (7) where λ l = 2 π l /n are the p ositive F ourier frequencies, u = ( n − 1) / 2, b u c is the greatest integer v alue less than or equal to u , and I n ( . ) is the p eriodogram defined in Equa- tion (1). If the PSD is known, the log f term in Equa- tion (7) is a constant and can b e ignored. The Whittle 5 lik eliho o d has an adv antage o ver the true Gaussian like- liho od as it has a direct dep endence on the PSD rather than the autocov ariance function. The Whittle lik eli- ho od is only exact for Gaussian white noise but works w ell under certain conditions, ev en when the data is not Gaussian [29]. More information ab out these concepts can be found in any go o d time series analysis textb o ok, suc h as Brockw ell and Davis [30]. W e no w need to sp ecify a nonparametric prior for the PSD. W e will briefly in tro duce the sp ectral densit y esti- mation technique of Choudhuri et al [22], which is based on the Bernstein p olynomial prior of Petrone [23, 24]. The Bernstein p olynomial prior is a nonparametric prior for a probability density on [0, 1], and is based on the W eierstrass appro ximation theorem that states that any con tinuous function on [0, 1] can b e uniformly approxi- mated to any desired degree by a Bernstein p olynomial. If this function is a density on [0, 1], this Bernstein p oly- nomial is essentially a finite mixture of Beta densities. W e refer the reader to App endix A for a definition of the Bernstein p olynomial and Beta density . Instead of putting a Dirichlet prior on the mixture weigh t vector, the weigh ts are defined via a probabilit y distribution G on [0, 1] and a Dirichlet pro cess prior is put on the space of probability distributions on [0, 1]. App endix B con- tains supplementary material on the Diric hlet pro cess. Since the sp ectral density is not defined on the unit in terv al, we reparameterize f ( λ ), such that f ( π ω ) = τ q ( ω ) , ω ∈ [0 , 1] , (8) where τ = R 1 0 f ( π ω )d ω is the normalization constant. T o sp ecify a prior on sp ectral density f ( π ω ), we put a Bernstein p olynomial prior on q ( ω ), using the following hierarc hical scheme: • q ( ω ) = P k j =1 G j − 1 k , j k β ( ω | j, k − j + 1), where G is a cumulativ e distribution function, and β ( ω | a, b ) is a Beta probabilit y density with parameters a and b . • G is a Dirichlet pro cess distributed random proba- bilit y measure with base measure G 0 and precision parameter M . • k has a discrete probabilit y mass function suc h that p ( k ) ∝ exp( − θ k k 2 ) , k = 1 , 2 , . . . . • τ has an Inv erse-Gamma( α τ , β τ ) distribution. • G , k , and τ are a priori indep enden t. W e use the stick-breaking construction of the Diric h- let pro cess by Sethuraman [31], which is an infinite- dimensional mixture mo del (defined in Appendix B). F or computational purp oses, we need to truncate the num- b er of mixture distributions to a large but finite num b er L . The choice of a large L will provide a more accurate appro ximation but also increase the computation time. Here, w e choose L = max { 20 , n 1 / 3 } . W e therefore repa- rameterize G to ( Z 0 , Z 1 , . . . , Z L , V 1 , . . . , V L ) such that G = L X l =1 p l δ Z l ! + 1 − L X l =1 p l ! δ Z 0 , (9) where p 1 = V 1 , p l = Q l − 1 j =1 (1 − V j ) V l for l ≥ 2, V l ∼ Beta(1 , M ) for l = 1 , . . . , L , and Z l ∼ G 0 for l = 0 , 1 , . . . , L . Note that δ a is a probabilit y density , degenerate at a . That is, δ a = 1 at a and 0 otherwise. This yields the prior mixture of the PSD f ( π ω ) = τ k X j =1 w j,k β ( ω | j, k − j + 1) , (10) with weigh ts w j,k = P L l =0 p l I { j − 1 k < Z l ≤ j k } and p 0 = 1 − P L l =1 p l . Abbreviating the vector of noise parameters as θ = ( v , z , k, τ ), the joint prior is therefore p ( θ ) ∝ L Y l =1 M (1 − v l ) M − 1 ! L Y l =0 g 0 ( z l ) ! p ( k ) p ( τ ) , (11) and is up dated using the Whittle lik eliho od to pro duce the unnormalized joint p osterior. This metho d is implemented as a Metrop olis-within- Gibbs MCMC sampler. In Choudhuri et al [22], param- eters k and τ are readily sampled from their full condi- tional p osteriors, while V and Z require the Metrop olis algorithm with Uniform prop osals. Our only v ariation on this implementation is our sampling of the smo oth- ness parameter k . W e found that a Metrop olis step is faster than sampling from the full conditional. The orig- inal implemen tation contains a for() lo op that ev aluates the log p osterior k max n umber of times, where k max is c hosen (during pilot runs) to b e large enough to cater for the roughness of the PSD. F or most w ell-b eha ved cases, k max = 50 will suffice, but the Adv anced LIGO PSD re- quires man y more mixture distributions (b y one to t wo orders of magnitude) due its steepness at lo w frequen- cies. This is a significan t computational burden, and a w ell-tuned Metrop olis step can therefore outp erform the original implementation. A discussion of the Diric hlet process and stick-breaking represen tation can be found in App endix B. D. Signal Reconstruction T o reconstruct a rotating core collapse GW signal that is embedded in noise, we use the (parametric) principal comp onen t regression (PCR) metho d describ ed in [20, 21, 26]. That is, let ˜ y = ˜ X β + ˜ , (12) 6 where ˜ y is the length n frequency-domain GW data vec- tor, ˜ X is the n × d matrix of the d frequency-domain principal comp onen t basis v ectors, β is the vector of sig- nal reconstruction parameters (PC co efficients), and ˜ is the frequency-domain noise vector with a known PSD. W e assume flat priors on β . It is imp ortan t to highligh t that useful astroph ysical information (suc h as the ratio of kinetic to gravitational potential energy of the inner core at b ounce, and precollapse differential rotation) can b e extracted b y regressing the posterior means of the PC co- efficien ts β on the kno wn astrophysical parameters from the w a veform catalogue, and sampling from the p osterior predictiv e distribution [21]. W e include an additional Gibbs step in the MCMC sampler describ ed in the previous section to simultane- ously reconstruct a rotating core collapse GW signal, whilst also estimating the noise p ow er spectrum. Omit- ting the conditioning on the data for clarity , we sequen- tially sample the full set of conditional posterior densities p ( θ | β ) and p ( β | θ ), where θ = ( v , z , k , τ ) are the noise pa- rameters defined in the previous section, and β are the signal reconstruction parameters. That is, w e sample in a cycle from the full conditional p osterior distribution of the signal parameters, given the PSD parameters, and the full conditionals of the PSD parameters, given the signal parameters. This set-up is called a blo cke d Gibbs sampler. T o sample the signal parameters, w e fix the most recen t MCMC sample of the PSD parameters. The conditional p osterior of β is P( β | θ ) = N( µ , Σ ) (13) where Σ = ( ˜ X 0 D − 1 ˜ X ) − 1 and µ = Σ ˜ X 0 D − 1 ˜ y . Here D = 2 π × diag ( f ( λ )) is the noise cov ariance matrix, and f ( λ ) is the most recent estimate of the PSD. More explicitly , at iteration i +1 in the blo c ked Gibbs sampling algorithm: 1. Create time-domain noise vector: ( i +1) = y − X β ( i ) . Due to the linearit y of the F ourier trans- form, β will b e the same no matter if we are in the time-domain or frequency-domain. 2. Sample the PSD parameters θ ( i +1) | β ( i ) using the metho d of Section I I C. 3. Sample the signal parameters β ( i +1) | θ ( i +1) using Equation (13) (since the PSD in iteration i + 1 is no w known). E. Non-stationary Noise As mentioned in Section I I C, stationary noise has a constan t and finite mean and v ariance o ver time, and an auto cov ariance function that dep ends only on the time lag. Non-stationary noise do es not meet these re- quiremen ts, and has a time-v arying sp ectrum. Station- arit y of a time series can b e tested using classical hy- p othesis tests suc h as the Augmented Dick ey-F uller test [32], Phillips-Perron unit ro ot test [33], and Kwiatw oski- Phillips-Sc hmidt-Shin (KPSS) tes t [34]. T o accommodate non-stationary noise, w e adapt an idea presen ted by Rosen et al [27], that assumes a time se- ries can be brok en down into lo cally stationary segmen ts. In their pap er, they treat the num ber of stationary com- p onen ts of a non-stationary time series as unknown, and use RJMCMC [17] to estimate the segment breaks. In a similar fashion, we break a non-stationary time series (or GW data stream) in to J equal segments. W e ha ve tw o requiremen ts for the length of these segments: the segment length is large enough for the Whittle ap- pro ximation to b e v alid; and the segments are lo cally stationary according to heuristics or formal stationarit y h yp othesis tests. This approach fits nicely into our cur- ren t MCMC framework. F or each segment, w e estimate the PSD using the nonparametric metho d introduced in Section I I C. A b enefit of this approach is that c hange- p oin ts in the PSD can b e detected without using RJM- CMC. The conditional p osterior density for all noise mo del parameters θ is the follo wing pro duct π ( θ | β , ˜ y ) = J Y j =1 π j ( θ j | β , ˜ y j ) , (14) where π j ( θ j | β , ˜ y j ) is the conditional p osterior density of the mo del parameters θ j in the j th segmen t given the signal parameters β and the j th segmen t of data ˜ y j . Note that under this set-up, the PC co efficien ts β do not dep end on segments j = 1 , 2 , . . . , J , since w e require one set of PC co efficien ts (not J sets) to reconstruct a rotating core collapse GW signal. T o sample β | θ , we use the same approach presen ted in Section I I D. The only difference is in the construction of the noise cov ariance matrix. This is constructed as D = 2 π × diag( f 1 ( λ ) , f 2 ( λ ) , . . . , f J ( λ )), where f j ( λ ) is the PSD of the j th noise segment. I II. RESUL TS F or the following examples, we set L = max { 20 , n 1 / 3 } , and use the non-informative prior set-up of Choudhuri et al [22]. That is, let G 0 ∼ Uniform[0 , 1] , M = 1 , α τ = β τ = 0 . 001, and θ k = 0 . 01. W e use k max = 50 for most examples, and k max = 400 for the example with simu- lated Adv anced LIGO noise to cater for the steep drop in the PSD at lo w frequencies. F or the examples with a signal em b edded in noise, w e use a Uniform( −∞ , ∞ ) prior on the signal reconstruction parameters β , and let d = 25 PCs. F or a discussion on the optimal choice of PCs, we refer the reader to [21]. W e also scale the signals to a signal-to-noise ratio (SNR) of % = 50. Here SNR (for n even) is defined as % = v u u t 2 n/ 2+1 X j =0 | ˜ s ( λ j ) | 2 | ˜ ( λ j ) | 2 , (15) 7 where λ j are the p ositive F ourier frequencies, ˜ s ( . ) is the F ourier transformed signal, and ˜ ( . ) is the F ourier trans- formed noise series. Note that for the zero and Nyquist frequencies, the factor of 2 in Equation (15) b ecomes a factor of 4. The v alue of % = 50 is physically motiv ated, as we w ould expect to see an SNR of approximately 50 to 170 for rotating core collapse sup ernov a GWs at a distance of 10 kp c. W e therefore demonstrate ho w the metho d w orks for the low er end of this range. The units for frequency in most examples are radians p er second (rad / s). In the example using simulated Ad- v anced LIGO noise, we rescale to kilohertz (kHz). PSD units are the inv erse of the frequency units, and the PSD figures are scaled logarithmically . GW strain amplitude is unitless. F or all examples, w e run the MCMC sampler for 150 , 000 iterations, with a burn-in p erio d of 50 , 000 and thinning factor of 10. This results in 10 , 000 samples re- tained. A. Estimating the PSD of Non-Gaussian Coloured Noise T o demonstrate how our mo del is capable of dealing with non-Gaussian transien ts in the data (or glitches as they are sometimes called in GW data analysis), we pro- vide an illustrative to y example, using coloured noise gen- erated from a first order autoregressive pro cess, abbrevi- ated to AR(1). A mean-centered AR(1) pro cess { X t } is defined as X t = ρX t − 1 + t , t = 1 , 2 , . . . , n, (16) where ρ is the first order auto correlation, and t is a white noise pro cess (not necessarily Gaussian), with zero mean and constant v ariance σ 2 . With this formulation, we see ho w the current observ ation at time t dep ends on the previous observ ation at time t − 1 through ρ , as w ell as some white noise t , often referred to as innovations or the innovation pr o c ess in time series literature. The AR(1) model is a useful example here since it has a w ell-defined theoretical spectral densit y that w e can com- pare our results against. Assuming | ρ | < 1, the AR(1) pro cess is stationary and has sp ectral density f ( λ ) = σ 2 1 + ρ 2 − 2 ρ cos 2 π λ , λ ∈ ( − π , π ] . (17) As seen in Equation (17), the AR(1) process has a PSD that is not flat, and the noise in our toy example is coloured (non-white), with correlations b etw een frequen- cies — typical of what we would exp ect with real Ad- v anced LIGO noise. As the AR(1) pro cess has a coloured sp ectrum, and white noise has a flat sp ectrum, w e will call use the term innovations to refer to the white noise comp onen t of the mo del to a void confusion. F or our example, rather than using Gaussian innov a- tions, which is the most common inno v ation pro cess used in autoregressiv e mo dels, we use Student- t innov ations with ν = 3 degrees of freedom. The choice of ν = 3 de- grees of freedom is the smallest integer that results in a Studen t- t mo del with finite v ariance (a requirement for the innov ation pro cess { t } of an AR(1) mo del). This mo del has wider tails than that of the Gaussian mo del (and in fact the widest tails possible whilst maintaining the finite v ariance requirement), meaning we can exp ect extreme v alues in the tails of the distribution to o ccur more often. This will b e our proxy for glitches. W e refer the reader to a relev an t time series analysis textb ook such as Bro c kwell and Davis [30] for further information on AR(1) pro cesses. F or this example, we generate a length n = 2 12 AR(1) pro cess with ρ = − 0 . 9 and Student- t inno v ations with ν = 3 degrees of freedom. Let this (stationary) time series hav e sampling in terv al ∆ t = 1 / 2 14 (the same as Adv anced LIGO). The data set-up can b e seen in Fig- ure 1. −20 −10 0 10 20 0 50 100 150 200 250 Time [ms] Amplitude FIG. 1: Simulated stationary AR(1) pro cess with first- order auto correlation ρ = − 0 . 9, and Student- t inno v a- tions ( ν = 3 degrees of freedom). W e can see the effect of using ν = 3 degrees of freedom in Figure 1. Notice how there are transient high am- plitude non-Gaussian even ts. These are a result of the wide-tailed nature of the Student- t density . It w ould b e v ery unlik ely to see these high amplitude even ts if the inno v ation pro cess was Gaussian. W e now run the noise-only algorithm of Section I I C to demonstrate that we can accurately c haracterize a non- Gaussian noise PSD. 8 −2 0 2 4 0 1 2 3 Frequency [rad/s] log(PSD) [log(s/rad)] FIG. 2: Estimated log PSD of the AR(1) time series in Figure 1. 90% credible region (shaded pink) and p oste- rior median log PSD (dashed blue), sup erimp osed with the true log PSD (solid black). The estimated p oin t-wise p osterior median log PSD in Figure 2 is very close to the true log PSD, and the 90% credible region generally contains the true log PSD. This demonstrates that even if there are non-Gaussian transien ts in the data (which is certainly the case for real LIGO data), this PSD estimation metho d p erforms well. This is ho wev er not surprising as the Whittle likelihoo d giv es a goo d approximation to Gaussian and some non- Gaussian likelihoo ds [29]. B. Extracting a Rotating Core Collapse Signal in Stationary Coloured Noise In this example, we aim to extract a rotating GW sig- nal from noisy data using the blo c ked Gibbs sampler de- scrib ed in Section II D. W e em bed the A 1 O 12 . 25 rotating core collapse GW signal from the Ab dik amalo v et al [25] test catalogue (i.e., a signal not part of the base cata- logue used to create the PC basis functions) in AR(1) noise with ρ = 0 . 9. F or clarity , let this pro cess hav e a Gaussian white noise inno v ation pro cess with σ 2 = 1. Let the time series b e length n = 2 12 , which corresp onds to 1 / 4 s of data at the Adv anced LIGO sampling rate. The signal is scaled to hav e a SNR of % = 50. The recon- structed signal can b e seen in Figure 3. The rotating core collapse GW signal in Figure 3 is reconstructed particularly well during the collapse and b ounce phases (the first few p eaks/troughs). The p ost- b ounce ring-down oscillations are usually p oorly esti- mated due to sto c hastic dynamics [21, 25], but are ac- ceptable for this particular example. −30 −20 −10 0 10 110 120 130 140 Time [ms] GW Strain Amplitude FIG. 3: Reconstructed rotating core collapse GW signal. 90% credible region (shaded pink) and posterior median signal (dashed blue), sup erimp osed with true A 1 O 12 . 25 GW signal from the Abdik amalo v et al [25] test catalogue (solid black). In this example, the signal parameters were simulta- neously estimated with the noise PSD using the blo c ked Gibbs sampler describ ed in Section I I D. W e no w com- pare the p erformance of the estimated noise PSD with and without a signal present. That is, we compare the noise PSD estimates betw een the algorithms presen ted in Section I I C (noise-only mo del) and Section I I D (signal- plus-noise mo del), using the same noise series for b oth mo dels. ● ● ● ● ● ● ● ● ● ● ● −2 0 2 0 1 2 3 Frequency [rad/s] log(PSD) [log(s/rad)] ● T rue Noise Only Signal + Noise FIG. 4: Comparison of the noise PSD estimates for the noise-only and signal-plus-noise models. Plotted are the p oin t-wise p osterior median log noise PSDs with and without a GW signal. The true log PSD of the AR(1) noise series is ov erlaid. W e can see in Figure 4 that b oth mo dels (noise-only and signal-plus-noise) p erform similarly when estimat- ing the PSD of coloured Gaussian noise. The poste- rior median log PSDs are approximately equal, and are v ery close to the true log PSD of an AR(1) pro cess with ρ = 0 . 9. This is a useful robustness c heck, and demon- strates that we are successfully decoupling the signal from the noise. 9 C. Comparing Input and Reconstruction P arameters As there is no analytic form linking the astroph ysical parameters of a rotating core collapse stellar ev en t to its GW signal, we can only appro ximate the GW signal using statistical metho ds. W e do this using PCR, but this means that there are no true input parameters that w e can compare with the estimated signal reconstruction parameters. How ev er, if one were to create a fictitious signal as a known linear combination of PCs, we could demonstrate the algorithm’s p erformance in estimating the signal reconstruction parameters. Consider the following fictitious rotating core collapse GW signal y = d X i =1 α i x i , (18) where y is the length n signal, ( x 1 , x 2 , . . . , x d ) are the d PC basis vectors of length n , and ( α 1 , α 2 , . . . , α d ) are the “true” w eights, or PC coefficients. T o randomize the w eights, w e randomly sample each from the standard normal distribution. In this example, w e embed the fictitious length n = 2 12 GW signal in AR(1) noise with ρ = 0 . 9 and Gaussian inno v ations with σ = 1. W e set d = 10. W e rescale the signal to ha ve SNR % = 50, and after the algorithm has run, we rescale our estimated PC co efficients back to the original level for comparison. −2 −1 0 1 2 1 2 3 4 5 6 7 8 9 10 PC PC Coefficient FIG. 5: Posterior median PC co efficients (blue square) and “true” PC co efficients (orange triangle) for the 10 PCs of a fictitious GW signal embedded in AR(1) noise. The error bands are the 95% credible interv als. It can b e seen in Figure 5 that the “true” PC coef- ficien ts are generally con tained within the 95% credible in terv als, demonstrating that the algorithm can estimate a signal’s input parameters well in the presence of station- ary coloured noise. Notice also that the credible inter- v als widen as the principal comp onent n umber increases. This is due to the fact that higher n um bered PCs explain lo wer amoun ts of v ariation in the wa v eform catalogue, re- sulting in low er amplitude wa v es. W e would therefore b e more uncertain ab out these PCs em b edded in noise. D. Extracting a Rotating Core Collapse Signal in Time-V arying Coloured Noise Non-stationary noise has a time-v arying sp ectrum. T o illustrate ho w our metho d can handle non-stationarities (or c hange-p oin ts in the sp ectral structure), we simulate a noise series with J = 2 lo cally stationary comp onents of equal length n 1 = n 2 = 2 12 . The first segment of the noise series is generated from an AR(1) pro cess with ρ = 0 . 5. The second noise segment comes from an AR(1) pro cess with ρ = − 0 . 75. Both segmen ts use a Gaussian inno v ation pro cess with v ariance σ 2 = 1 for clarit y . W e em b ed part of the A 1 O 8 . 25 wa veform from the Ab dik a- malo v et al catalogue [25]. This w av eform is in the test set, not included in the construction of PC basis func- tions. The data set-up can b e seen in Figure 6. −10 −5 0 5 220 240 260 280 Time [ms] GW Strain Amplitude Signal + Noise Signal FIG. 6: Snapshot of the signal sup erimp osed on signal plus noise. The noise series has length n = n 1 + n 2 = 2 13 and is segmented into t wo equal parts. The first half of the noise is generated from an AR(1) with ρ = 0 . 5, and the second half is generated from an AR(1) with ρ = − 0 . 75. Both segments use a Gaussian innov ation pro cess with v ariance σ 2 = 1. The A 1 O 8 . 25 rotating core collapse GW signal from the Ab dik amalo v et al test catalogue [25] is embedded in this noise with a SNR of % = 50. The aim here is to simultaneously estimate b oth noise PSDs, as w ell as reconstructing the embedded GW signal using the metho d describ ed in Section I I E. Here w e are assuming the c hange-p oin t b etw een the tw o noise series is kno wn, though we will demonstrate in the next section that our metho d can lo cate unknown c hange-p oin ts. Notice the difference b etw een the first half of the noise series compared with the second half. Eac h segmen t has a differen t dep endence structure, and are therefore coloured differently in the frequency-domain. This re- sults in a differen t time-domain morphology . Estimates of the noise PSDs can b e seen in Figures 7 and 8. 10 −3 −2 −1 0 1 2 3 Frequency [rad/s] log(PSD) [log(s/rad)] FIG. 7: Sp ectral density estimate of the first noise seg- men t ( ρ = 0 . 5) from Figure 6. 90% credible region (shaded pink), p osterior median log PSD (dashed blue), and theoretical log PSD (solid black). −3 −2 −1 0 1 0 1 2 3 Frequency [rad/s] log(PSD) [log(s/rad)] FIG. 8: Sp ectral density estimate of the second noise segmen t ( ρ = − 0 . 75) from Figure 6. 90% credible region (shaded pink), p osterior median log PSD (dashed blue), and theoretical log PSD (solid black). Figures 7 and 8 show the estimated log PSDs for the t wo noise segments. The p oin t-wise p osterior median log PSDs are close to the true log PSDs, and the 90% credi- ble regions for b oth segments mostly con tain the true log PSDs, but veer slightly off to wards the low frequencies. Due to p osterior consistency of the PSD, these estimates will only get b etter as the sample size increases. Slight imp erfections in the PSD estimates may not b e such a problem if the embedded GW signal is extracted well, whic h happ ens to b e the case in this example. The ex- tracted signal can b e seen in Figure 9. −5 0 220 240 260 280 Time [ms] GW Strain Amplitude FIG. 9: Reconstructed rotating core collapse GW. 90% credible region (shaded pink) and p osterior median sig- nal (dashed blue), sup erimp osed with true A 1 O 8 . 25 GW signal from the Ab dik amalov et al [25] test catalogue (solid black). The first half of the signal was embedded in AR(1) noise with ρ = 0 . 5, and the second half had AR(1) noise with ρ = − 0 . 75. Both noise segments had Gaussian white noise with σ 2 = 1. The 90% credible region for the reconstructed GW sig- nal in Figure 9 generally contains the true signal, and has p erformed particularly well during collapse and bounce. Again, the p ost-b ounce ring-down oscillations usually ha ve the po orest reconstruction through the time series, but has p erformed remark ably well in this example, re- gardless of the sligh t imp erfections of the PSD estimates. E. Detecting a Sp ectral Change-Poin t Consider a change-point problem similar to that of the previous section, where a time series exhibits a change in its sp ectral structure somewhere in the series. A v aluable consequence of the algorithm presented in Sec- tion I I E is its ability to detect change-points regardless of whether the change-point occurs within a segment or on the b oundary . F or the following examples, let n = 2 12 and break this in to J = 32 equal length segments. F or clarit y , assume the time series do es not contain an em- b edded GW signal. First consider the case where the change-point o ccurs on the b oundary of t w o noise series. Let n 1 = n 2 = 2 11 b e the lengths of eac h noise series, and let the first half of the time series be generated from an AR(1) with ρ = 0 . 5, and the second half from an AR(1) with ρ = − 0 . 75. Both AR(1) p ro cesses ha v e additive Gaussian innov ations with σ 2 = 1. In this example, the c hange-p oin t o ccurs ex- actly halfw a y through the series. Figure 10 shows a time- frequency map of the estimated log PSDs for eac h seg- men t. 11 0 1 2 3 0 50 100 150 200 250 Time [ms] Frequency [rad/s] −4 −3 −2 −1 0 log(PSD) FIG. 10: Time-frequency map showing the estimated p osterior median log PSDs for 32 segments of 1 / 4 s of AR(1) noise. The c hange-p oin t in sp ectral structure o c- curs exactly halfwa y through the series. It is ob vious that a change-point o ccurs halfwa y through Figure 10, as there is a sheer change in the sp ec- tral structure at this p oint b et ween segments 16 and 17. The first half of the time-frequency map exhibits stronger lo w-frequency b eha viour, whereas the second half has more p o wer in the higher frequencies. No w consider the case where the c hange-p oin t o ccurs during a segmen t rather than on the b oundary . Here, let each segment hav e the same set-up as b efore, but instead set n 1 = 2 11 − 2 6 and n 2 = 2 11 + 2 6 suc h that a change-point o ccurs halfw ay through segment 16. A time-frequency map of the estimated log PSDs can b e seen in Figure 11. 0 1 2 3 0 50 100 150 200 250 Time [ms] Frequency [rad/s] −5 −4 −3 −2 −1 0 log(PSD) FIG. 11: Time-frequency map showing the estimated p osterior median log PSDs for 32 segments of 1 / 4 s of AR(1) noise. The c hange-p oin t in sp ectral structure o c- curs in the middle of segment 16 just b efore the halfwa y p oin t. Figure 11 demonstrates that there is a noticeable c hange-p oin t roughly halfwa y through the series. There is a smo other transition from one PSD structure to the other than in the previous example since the true c hange- p oin t o ccurs in the middle of a segment rather than on the b oundary . These examples demonstrate that w e can detect p oten- tially unknown change-points in a time series. It is im- p ortan t to note that if more segments are used, the time duration within each segment b ecomes smaller, and our accuracy in detecting the change-point increases. That is, the time at which the c hange-p oin t o ccurs b ecomes more resolv ed if the segmen t durations are smaller. How ev er, one m ust also ensure that the segmen t durations are long enough for the Whittle appro ximation to b e v alid. F. Sim ulated Adv anced LIGO Noise In this example, w e simulate Adv anced LIGO noise and em b ed the A 1 O 10 . 25 rotating core collapse GW signal from the Ab dik amalov et al [25] catalogue in it, scaled to an SNR of % = 50. W e assume a one detector set-up, with linearly p olarized GW signal (zero cross p olarization). The Adv anced LIGO sampling rate is r s = 2 14 Hz, with a Nyquist frequency of r ∗ = 2 13 Hz. Let n = 2 12 , whic h corresp onds to quarter of a second of data. The sim ulated noise is Gaussian, and coloured by the Adv anced LIGO design sensitivity PSD. Generating this noise blindly results in a p erfect matching of the end-p oin ts and their deriv ativ es, due to the simplified frequency-domain mo del. This is not realistic, since real data will often not hav e matc hing end-p oints. In order to mak e the noise generation more realistic, w e in ternally generated a longer frequency-domain series (ten times longer), inv erse discrete F ourier transformed it, and re- turned a fraction of it with a random starting p oin t. This is referred to as “padding” the data. −105 −100 −95 −90 0 2 4 6 8 Frequency [kHz] log(PSD) [−log(Hz)] FIG. 12: Estimated log PSD for simulated Adv anced LIGO noise. 90% credible region (shaded pink) and p osterior median (dashed blue) ov erlaid with log p eri- o dogram (solid grey). Figure 12 shows the estimated log PSD and the 90% credible region, ov erlaid with the log p eriodogram. The metho d p erforms remark ably well, particularly at higher frequencies. Ev en though w e will not b e able to resolve frequencies b elow ∼ 10–20 Hz at the Adv anced LIGO design sensitivity , it is still interesting to see how this metho d p erforms at low er frequencies. Here, the lo w 12 frequency estimates are sligh tly off, but not b y muc h. W e b elieve this to b e due to tw o factors: 1 / 4 s of simu- lated Adv anced LIGO noise is actually a non-stationary series, and we did not adjust for non-stationarities (sim- ulated Adv anced LIGO data is not stationary for more than 1 / 16 s based on the Augmented Dic key-F uller test, Phillips-P erron unit ro ot test, and KPSS test); and the Bernstein p olynomial basis functions are notoriously slo w to con v erge to a true function [35, 36]. These factors con- sidered, the metho d still pro vides a reasonable approxi- mation. −2 −1 0 1 90 110 130 150 170 Time [ms] GW Strain × 10 − 20 FIG. 13: Reconstructed rotating core collapse GW signal. 90% credible interv al (shaded pink) and p osterior mean (dashed blue) o v erlaid with true A 1 O 10 . 25 signal (solid blac k) from Ab dik amalov et al [25] test catalogue. Signal is scaled to a SNR of % = 50. The resultant reconstructed GW signal can b e seen in Figure 13. The estimated signal here is very close to the true signal during the the collapse and b ounce phases, as w ell as during the ring-down oscillations. The 90% credible region contains most of the true GW signal. W e chose d = 25 PCs to reconstruct a rotating core collapse GW signal, but this could b e to o man y or to o few basis functions. Mo del selection metho ds similar to [21] were not inv estigated in the curren t study , and even though Figures 3, 9, and 13 demonstrated goo d estimates during all phases (including ring-down), there is a de- mand for improv ed reconstruction metho ds. W e then accommo dated for non-stationarities in de- tector noise b y breaking the series into smaller and lo- cally stationary comp onen ts, and lo oked at the res ulting time-v arying sp ectrum. This can b e seen in Figure 14. Rather than choosing J = 32 as in Section I I I E, non- stationarities in the Adv anced LIGO noise b ecome more apparen t if we slice the noise series into fewer segmen ts, eac h with longer duration. Instead, consider splitting the data into J = 8 equal length segments ( n j = 2 9 ). Here, the Whittle appro ximation is v alid, and the seg- men ts lo ok lo cally stationary . 0 2 4 6 8 0 50 100 150 200 250 Time [ms] Frequency [kHz] −99 −98 −97 −96 −95 −94 −93 log(PSD) FIG. 14: Time-frequency map of the estimated time- v arying noise sp ectrum for 8 segments of 1 / 4 s simulated Adv anced LIGO noise. The p osterior median log PSDs for each noise segment are used. Figure 14 illustrates that the Adv anced LIGO PSD is c hanging o ver time. Notice that lo wer frequencies are gaining more p ow er ov er time. Assuming that each seg- men t is locally stationary (whic h should be the case since the duration of eac h segmen t is less than 1 / 16 s), it is im- p ortan t to accommo date for the changing nature of the PSD since the Choudh uri et al [22] PSD estimation tech- nique is based on the theory of stationary processes. If we did not adjust for non-stationarities, estimates of astro- ph ysically meaningful parameters could become biased. IV. DISCUSSION AND OUTLOOK This study was motiv ated by the need for an improv ed mo del for PSD estimation in GW data analysis. The assumptions of the standard GW noise mo del are to o restrictiv e for Adv anced LIGO data. GW data is sub- ject to high amplitude non-Gaussian transients, meaning that the Gaussian assumption is not v alid. If the noise mo del is incorrectly sp ecified, we could make misleading inferences. The stationarity assumption is also not v alid, as simulated Adv anced LIGO noise is not stationary for m uch longer than 1 / 16 s according to classical statistical h yp othesis tests. Using off-source data to estimate the PSD is problematic since the PSD will naturally drift o ver time, and is not necessarily the same as on the GW source. The primary goal of this study was to develop a sta- tistical mo del that allows for on-source estimation of the PSD, while making no assumptions ab out the underly- ing noise distribution. W e also wan ted a metho d capable of accoun ting for non-stationary noise. Although we re- stricted our attention to GWs from rotating core collapse stellar ev en ts in this pap er, our approach is p erfectly v alid for any GW signal em b edded in noise. A secondary goal of this pap er w as to highlight to the GW communit y the rich and activ e area of Bay esian nonparametrics (and semiparametrics). W e b eliev e this 13 framew ork will b e a v ery p o werful toolb o x going for- w ard, particularly in the analysis of GW bursts, since accurate parametric mo dels for these types of signals are limited. F urther, our future research efforts regarding rotating core collapse even ts inv olv es Bay esian nonpara- metric regression mo dels to construct GWs from their initial conditions. Regularization metho ds, such as the Ba yesian LASSO [37], are also b eing considered. In this pap er, w e ha v e assumed linearly polarized GWs to b e detected by one interferometer. A relatively simple extension of this work is to include a netw ork of detec- tors, as well as GWs with non-zero cross p olarization. Another extension would b e to assume an unknown sig- nal arriv al time, as done in [20, 21]. These extensions can b e exp ected in the second generation of the algorithm. The noise in our model w as assumed to come from all sources, including detector noise, en vironmental noise, and statistical noise from parametric modelling of the signal. The statistical noise is the residual difference b e- t ween the true and fitted signals. An imp ortant factor to consider was whether statistical noise artificially dom- inated the noise. W e do not b elieve this to b e a domi- nating contributor to the o v erall noise. Since the “theoretical” PSD of Adv anced LIGO at its design sensitivity has a very steep decrease at low fre- quencies un til it reac hes a minim um at roughly 230 Hz, it is difficult for our algorithm to p erfectly characterize the shap e at low frequencies without increasing computation significan tly . This is due to the well-kno wn slow conv er- gence of Bernstein basis functions to a true curve. That is, man y Bernstein p olynomials (on order k = 1000) are required to accurately characterize the PSD of Adv anced LIGO. Compare this to more well-behav ed noise sources, suc h as those from autoregressive pro cesses, whic h re- quire k < 50. W e are currently developing a second generation of this algorithm, using a mixture of B-spline densities (normalized to the unit interv al), rather than Beta densities. B-splines hav e muc h faster conv ergence rates than Bernstein p olynomials [35, 36]. An additional b enefit of c hanging the basis functions to B-splines is that, lik e Bay esLine [7], we will be able to account for sp ectral lines b y p eak-loading knots at a priori kno wn frequencies that these o ccur at. W e ha ve left estimation of sp ectral lines out of the scope of this paper, but believe that a change of basis functions from Bernstein p olyno- mials to normalized B-spline densities could work well. Another in teresting approac h w ould be to mo del spectral lines with informative priors using a similar approach to Macaro [38]. W e used non-informativ e priors in this analysis. It may b e p ossible to translate the kno wn shap e of the Adv anced LIGO design sensitivit y PSD into a prior. This may also aid in improving PSD estimates at low er frequencies. W e discussed a simplified metho d for estimating the time-v arying PSD of non-stationary noise. Our approach assumed that a time series is split into equal length seg- men ts, and at known times. W e demonstrated that it is p ossible to identify change-points in a time series and its sp ectrum using this metho d, and that there is no need to estimate the lo cations of the segment splits. Thus, a fixed grid of kno wn segmen t placemen ts suffices, and no RJMCMC is required. RJMCMC would hav e slow ed the algorithm down significan tly , and created an entire new set of complications. There is m uch w ork to be done on PSD estimation. As the Adv anced LIGO and Adv anced Virgo in terferometers swiftly approach design sensitivity , it is imp ortant that w e contin ue to fo cus not only on parameter estimation tec hniques, but also on modelling detector noise. PSD es- timation is as imp ortan t as parameter estimation, since w e w ant to mak e honest statements ab out our observ a- tions based on rigorous statistical theory . It is hop ed that in the near future, w e can conv erge on a PSD estimation metho d that is less strict than the standard noise mo del, w orks well on real detector data, and is based on go o d statistical theory . W e believe that the methods presen ted in this pap er are definitely a step in the righ t direction. App endix A: Bernstein Polynomials and the Beta Densit y T o define the Bernstein p olynomial, we first need to discuss the Bernstein b asis p olynomials. There are k + 1 Bernstein basis p olynomials of degree k , having the follo wing form b j,k ( x ) = k j x j (1 − x ) k − j , j = 0 , 1 , . . . , k . (A1) A Bernstein p olynomial is the follo wing linear combi- nation of Bernstein basis polynomials B k ( x ) = k X j =0 β j b j,k ( x ) , (A2) where β j are called the Bernstein coefficients. As mentioned in Section I I C, the Bernstein p olyno- mial prior is a finite mixture of Beta probability densi- ties. W e use the following parameterization for the Beta probabilit y density function f ( x | α, β ) = Γ( α + β ) Γ( α )Γ( β ) x α − 1 (1 − x ) β − 1 , (A3) ∝ x α − 1 (1 − x ) β − 1 , (A4) where x ∈ (0 , 1), the shap e parameters are p ositive real n umbers (i.e., α > 0 and β > 0), and Γ( . ) is the gamma function defined as the follo wing improp er integral Γ( u ) = Z ∞ 0 x u − 1 exp( − x )d x. (A5) App endix B: The Dirichlet Distribution, Diric hlet Pro cess, and Stick-breaking Construction The Dirichlet distribution is a multiv ariate generaliza- tion of the Beta distribution (defined in App endix A) 14 with a probability density function defined on the K - dimensional simplex ∆ K = ( ( x 1 , . . . , x K ) : x i > 0 , K X i =1 x i = 1 ) . (B1) The probability densit y function of the Dirichlet dis- tribution is defined as f ( x | α ) = Γ P K i =1 α i Q K i =1 Γ( α i ) K Y i =1 x α i − 1 i , (B2) where α i > 0 , i = 1 , . . . , K . The Dirichlet pro cess is an infinite-dimensional gener- alization of the Dirichlet distribution. It is a probabil- it y distribution on the space of probability distributions, and is often used in Bay esian inference as a prior for infi- nite mixture mo dels. One of the many representati ons of the Dirichlet proce ss is Sethuraman’s stic k-breaking con- struction [18, 31]. This is useful for implementing MCMC sampling algorithms. Let G ∼ DP( M , G 0 ), where G 0 is the center measure, and M is the precision parameter (larger M implies a more precise prior). The Sethuraman representation is G = ∞ X i =1 p i δ Z i , (B3) p i = i − 1 Y j =1 (1 − V j ) V i , (B4) Z i ∼ G 0 , (B5) V i ∼ Beta(1 , M ) . (B6) Consider a stic k of unit length. The weigh ts p i asso- ciated with p oints Z i can b e thought of as breaking this stic k randomly into infinite segments. Break the stick at lo cation V 1 ∼ Beta(1 , M ), assigning the mass V 1 to the random p oin t Z 1 ∼ G 0 . Break the remaining length of the stic k 1 − V 1 b y the prop ortion V 2 ∼ Beta(1 , M ), as- signing the mass (1 − V 1 ) V 2 to the random p oin t Z 2 ∼ G 0 . A t the i th step, break the remaining length of the stick Q i − 1 j =1 (1 − V j ) by the prop ortion V i ∼ Beta(1 , M ), as- signing the mass Q i − 1 j =1 (1 − V j ) V i to the random point Z i ∼ G 0 . This pro cess is rep eated infinitely many times. App endix C: Demonstration of P osterior Consistency It was prov ed in [22] that under very general condi- tions on the prior, the PSD estimation metho d used in this pap er has the prop erty of p osterior consistency . W e pro vide an illustrativ e example of this in Figure 15. ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● −2 0 2 0 1 2 3 Frequency [rad/s] log(PSD) [log(s/rad)] Sample size ● ● ● ● ● ● 2 8 2 9 2 10 2 11 2 12 2 13 FIG. 15: Illustration of p osterior consistency . The true log PSD (dashed blac k) o v erlaid with point-wise p osterior median log PSDs of v arying sample sizes. W e generated AR(1) processes (with ρ = 0 . 9 and Gaus- sian white noise) of v arying sample sizes, and compared their p erformance. It can b e seen in Figure 15 that as the sample size of the time series increases, the p oint- wise p osterior median log PSD gets closer to the true log PSD, thus demonstrating p osterior consistency . A CKNOWLEDGMENTS W e thank Neil Cornish for a thorough reading of the man uscript and for providing insigh tful commen ts, Clau- dia Kirch and Sally W o od for helpful discussions on sta- tionary and lo cally stationary time series, and Blake Seers for advice on data visualization. W e also thank the New Zealand eScience Infrastructure (NeSI) for their high p erformance computing facilities, and the Cen tre for eResearc h at the Universit y of Auckland for their tech- nical supp ort. RM’s and ME’s w ork is supp orted by the Univ ersity of Auckland staff research gran t 420048358, and NC’s w ork b y NSF grants PHY-1204371 and PHY- 1505373. This pap er has b een given LIGO Do cumen t Num b er P1500057. All analysis w as conducted in R , an open-source statistical soft ware av ailable on CRAN (cran.r-pro ject.org). W e ackno wledge the ggplot2 , grid , co da , and bspec pac k ages. [1] The LIGO Scientific Collab oration, Adv anced LIGO, Classic al and Quantum Gr avity , 32 , 074001 (2015). [2] F. Acernese et al , Adv anced Virgo: a second-generation in terferometric gra vitational w a ve detector, Classic al and 15 Quantum Gr avity , 32 , 024001 (2015). [3] Y. Aso, Y. Michim ura, K. Somiya, M. Ando, O. Miyak aw a, T. Sekiguc hi, D. T atsumi, and H. Y a- mamoto, In terferometer design of the KAGRA gravi- tational wa ve detector, Physic al R eview D , 88 , 043007 (2013). [4] T. B. Litten b erg, M. Coughlin, B. F arr, and W. M. F arr, F ortifying the characterization of binary mergers in LIGO data, Physic al R eview D , 88 , 084044 (2013). [5] N. Christensen (for the LIGO Scientific Collab oration and the Virgo Collab oration), LIGO S6 detector char- acterization studies, Classical and Quantum Gr avity , 27 , 194010 (2010). [6] J. Aasi et al , Parameter estimation for compact binary coalescence signals with the first generation gra vitational-wa v e detector netw ork, Physic al Review D , 88 , 062001 (2013). [7] T. B. Littenberg, and N. J. Cornish, Ba y esian infer- ence for sp ectral estimation of gravitational wa v e detec- tor noise, Physic al R eview D , 91 , 084034 (2015). [8] The LIGO Scientific Collab oration and the Virgo Col- lab oration, Sensitivity achiev ed by the LIGO and Virgo gra vitational wa ve detectors during LIGO’s sixth and Virgo’s second and third science runs, [gr-qc] (2012). [9] A. M. Sintes, and B. F. Sch utz, Coherent line remov al: Filtering out harmonic related line interference from ex- p erimen tal data, with application to gra vitational wa v e detectors, Physic al R eview D , 58 , 122003 (1998). [10] L. S. Finn, and S. Mukherjee, Data conditioning for gra v- itational w av e detectors: A Kalman filter for regressing susp ension violin mo des, Physic al R eview D , 63 , 062004 (2001). [11] C. R¨ ov er, R. Meyer, and N. Christensen, Mo delling coloured residual noise in gravitational-w a ve signal pro- cessing, Classic al and Quantum Gr avity , 28 , 015010 (2011). [12] C. R¨ ov er, Studen t- t based filter for robust signal detec- tion, Physic al R eview D , 84 , 122004 (2011). [13] T. B. Littenberg, and N. J. Cornish, Separating gravi- tational wa v e signals from instrument artifacts, Physic al R eview D , 82 , 103007 (2010). [14] S. Vitale, G. Congedo, R. Dolesi, V. F erroni, M. Hueller, D. V etrugno, W. J. W eb er, H. Audley , K. Danzmann, I. Diepholz, M. Hewitson, N. Korsako v a, L. F erraioli, F. Gib ert, N. Karnesis, M. Nofrarias, H. Inchauspe, E. Plagnol, O. Jennrich, P . W. McNamara, M. Armano, J. I. Thorpe, and P . W ass, Data series subtraction with unkno wn and unmo deled background noise, Physic al R e- view D , 90 , 042003 (2014). [15] N. J. Cornish, and T. B. Littenberg, Bay esW av e: Ba yesian Inference for Gravitational W av e Bursts and Instrumen t Glitches, arXiv:1410.3835 [gr-qc] (2014). [16] P . Whittle, Curve and p eriodogram smoothing, Journal of the R oyal Statistic al So ciety: Series B (Metho dolo gi- c al) , 19 , 38–63 (1957). [17] P . J. Green, Reversible jump Marko v chain Monte Carlo computation and Bay esian mo del determination Biometrika , 82 , 711–732 (1995). [18] S. Ghosal, in Bayesian Nonp ar ametrics , N. L. Hjort, C. Holmes, P . M ¨ uller, and S. G. W alker, (Cambridge Univ ersity Press, Cambridge, 2010), pp. 35–79. [19] P . D. W elch, The use of fast F ourier transform for the es- timation of p ow er sp ectra: A metho d based on time av- eraging o ver short, mo dified perio dograms, IEEE T rans- actions on A udio and Ele ctr o ac oustics , 15 , 70–73 (1967). [20] C. R¨ ov er, M.-A. Bizouard, N. Christensen, H. Dim- melmeier, I. S. Heng, and R. Mey er, Ba yesian reconstruc- tion of gra vitational wa v e burst signals from simulations of rotating stellar core collapse and b ounce, Physical R e- view D , 80 , 102004 (2009). [21] M. C. Edwards, R. Meyer, and N. Christensen, Bay esian parameter estimation of core collapse sup erno v ae using gra vitational wa v e signals, Inverse Pr oblems , 30 , 114008 (2014). [22] N. Choudhuri, S. Ghosal, and A. Roy , Bay esian Estima- tion of the Sp ectral Density of a Time Series, Journal of the Americ an Statistic al Association , 99 (468), 1050–1059 (2004). [23] S. Petrone, Random Bernstein Polynomials, Sc andana- vian Journal of Statistics , 26 , 373–393 (1999). [24] S. Petrone, Bay esian density estimation using Bernstein p olynomials, The Canadian Journal of Statistics , 27 , 105–126 (1999). [25] E. Ab dik amalo v, S. Gossan, A. M. DeMaio, and C. D. Ott, Measuring the angular momentum distribu- tion in core-collapse supernov a progenitors with gravita- tional wa v es, Physic al R eview D , 90 , 044001 (2014). [26] I. S. Heng, Rotating stellar core-collapse w av eform de- comp osition: a principal comp onent analysis approac h, Classic al and Quantum Gr avity , 26 , 105005 (2009). [27] O. Rosen, S. W o o d, and D. S. Stoffer, AdaptSPEC: Adaptiv e Sp ectral Estimation for Nonstationary Time Series, Journal of the A meric an Statistic al Asso ciation , 107 (500), 1575–1589 (2012). [28] T. S. F erguson, A Ba y esian analysis of some nonparamet- ric problems, Annals of Statistics , 1 (2), 209–230 (1973). [29] X. Shao, and B. W. W u, Asymptotic sp ectral theory for nonlinear time series, Annals of Statistics , 35 (4), 1773– 1801 (2007). [30] P . J. Bro ckw ell, and R. A. Davis, Time Series: The- ory and Metho ds , (Second Edition, Springer, New Y ork, 1991). [31] J. Sethuraman, A constructiv e definition of Dirichlet pri- ors, Statistic a Sinic a , 4 , 639–650 (1994). [32] S. E. Said, and D. A. Dick ey , T esting for unit ro ots in autoregressiv e-moving a v erage mo dels of unkno wn order, Biometrika , 71 , 599–607 (1984). [33] P . C. B. Phillips, and P . P erron, T esting for a unit ro ot in time series regression, Biometrika , 75 , 335–346 (1988). [34] D. Kwiatko wski, P . C. B. Phillips, P . Schmidt, and Y. Shin, T esting the null h ypothesis of stationarity against the alternative of a unit ro ot, Journal of Ec ono- metrics , 54 , 159–178 (1992). [35] M. J. D. P ow ell, Appr oximation the ory and metho ds , (Cam bridge Universit y Press, Cambridge, 1981). [36] W. Shen, and S. Ghosal, Adaptive Bay esian procedures using random series priors, arXiv:1403.0625 [math.ST] (2014). [37] T. Park, and G. Casella, The Bay esian Lasso, Journal of the Americ an Statistic al Asso ciation , 103 (482), 681–686 (2008). [38] C. Macaro, Bay esian non-parametric signal extraction for Gaussian time series, Journal of Ec onometrics , 157 , 381– 395 (2010).
Original Paper
Loading high-quality paper...
Comments & Academic Discussion
Loading comments...
Leave a Comment