Summarizing posterior distributions in signal decomposition problems when the number of components is unknown
This paper addresses the problem of summarizing the posterior distributions that typically arise, in a Bayesian framework, when dealing with signal decomposition problems with an unknown number of components. Such posterior distributions are defined over a union of subspaces of differing dimensionality and can be sampled from using modern Monte Carlo techniques, for instance the increasingly popular RJ-MCMC method. No generic approach is available, however, to summarize the resulting variable-dimensional samples and extract from them component-specific parameters. We propose a novel approach to this problem, which consists in approximating the complex posterior of interest by a “simple”—but still variable-dimensional—parametric distribution. The distance between the two distributions is measured using the Kullback-Leibler divergence, and a Stochastic EM-type algorithm, driven by the RJ-MCMC sampler, is proposed to estimate the parameters. The proposed algorithm is illustrated on the fundamental signal processing example of joint detection and estimation of sinusoids in white Gaussian noise.
💡 Research Summary
The paper tackles a fundamental yet under‑addressed problem in Bayesian signal decomposition: how to summarize posterior distributions that live on a union of subspaces of varying dimensionality when the number of signal components is unknown. In such trans‑dimensional settings, reversible‑jump MCMC (RJ‑MCMC) provides samples from the joint posterior over the model index k and the component parameters θₖ, but these samples are difficult to interpret: different samples may lie in spaces of different dimensions, and the classic label‑switching phenomenon obscures component‑specific information.
To overcome these difficulties, the authors introduce a novel parametric approximation q₍η₎ that retains the variable‑dimensional nature of the target posterior while being simple enough to allow tractable inference. The approximation consists of L latent Gaussian “components”. Each component l is associated with a mean vector µₗ, covariance matrix Σₗ, and a Bernoulli presence indicator ξₗ with probability πₗ. The total number of active components for a generated sample is k = Σₗ ξₗ, and the active Gaussian vectors are randomly ordered to form the parameter vector θₖ. This construction generalizes the standard Gaussian mixture model (GMM) by allowing multiple components to be present simultaneously, thereby naturally handling the trans‑dimensional label‑switching problem.
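The generative side of this model can be made concrete with a minimal sketch. Here the component parameters are assumed scalar (e.g. radial frequencies), and the values of (πₗ, µₗ, Σₗ) are purely illustrative, not fitted and not taken from the paper:

```python
import numpy as np

def sample_q(pi, mu, sigma, rng):
    """Draw one variable-dimensional sample x = (k, theta_k) from the
    approximating model: Bernoulli presence indicators select which of
    the L Gaussian components are active, and the active draws are then
    put in random order (emulating label switching)."""
    xi = rng.random(len(pi)) < pi                   # presence indicators
    active = np.flatnonzero(xi)
    theta = rng.normal(mu[active], sigma[active])   # active parameters
    rng.shuffle(theta)                              # random ordering
    return len(active), theta

# Illustrative parameter values (assumed for this sketch):
pi = np.array([0.97, 0.50, 0.97])
mu = np.array([0.63, 0.68, 0.73])
sigma = np.array([0.005, 0.010, 0.005])
rng = np.random.default_rng(0)
k, theta = sample_q(pi, mu, sigma, rng)
```

Note that, unlike in a standard GMM where each draw activates exactly one component, here any subset of the L components may be active at once, so k varies from sample to sample.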
The fitting criterion is the Kullback‑Leibler (KL) divergence D_KL(f‖q₍η₎) between the true posterior density f and the approximating density q₍η₎. Since the KL divergence cannot be evaluated analytically, the authors replace it with a Monte‑Carlo estimate based on M RJ‑MCMC draws {x⁽ⁱ⁾ = (k⁽ⁱ⁾,θₖ⁽ⁱ⁾)}. Minimizing this estimate leads to a maximum‑likelihood‑type problem for η. Direct EM is infeasible because the E‑step would require summing over all possible allocation vectors z, whose number grows as k! and quickly becomes intractable.
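The reduction to a likelihood problem follows because D_KL(f‖q₍η₎) = E_f[log f] − E_f[log q₍η₎], and the first term does not depend on η; minimizing the Monte Carlo estimate therefore amounts to maximizing the average log q₍η₎ over the draws. A brute-force sketch of this objective for scalar component parameters, with the density of q₍η₎ obtained by enumerating all injective allocations (feasible only for small L and k, which is exactly the intractability the paper works around):

```python
import itertools
import math
import numpy as np

def normal_pdf(x, m, s):
    return math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))

def log_q(k, theta, pi, mu, sigma):
    """log-density of the variable-dimensional model at x = (k, theta),
    by explicit enumeration of all injective allocations of the k
    observed parameters to the L latent components (small L only)."""
    L = len(pi)
    total = 0.0
    for perm in itertools.permutations(range(L), k):
        p = 1.0
        for j, l in enumerate(perm):                 # active components
            p *= pi[l] * normal_pdf(theta[j], mu[l], sigma[l])
        for l in set(range(L)) - set(perm):          # absent components
            p *= 1.0 - pi[l]
        total += p
    return math.log(total / math.factorial(k))       # 1/k!: random ordering

def mc_objective(samples, pi, mu, sigma):
    """Monte Carlo surrogate for D_KL(f || q_eta), up to an additive
    constant independent of eta: the negative average log-likelihood."""
    return -np.mean([log_q(k, th, pi, mu, sigma) for k, th in samples])

samples = [(1, [0.68]), (2, [0.63, 0.73])]           # toy RJ-MCMC draws
J = mc_objective(samples, np.array([0.9, 0.5, 0.9]),
                 np.array([0.63, 0.68, 0.73]), np.array([0.01, 0.01, 0.01]))
```

The number of terms in the enumeration, L!/(L−k)!, is what makes a direct E-step intractable as k grows.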
Consequently, the paper adopts a Stochastic EM (SEM) scheme. In each iteration r, a stochastic “S‑step” draws allocation vectors z⁽ⁱ⁾ from the conditional distribution p(z|x⁽ⁱ⁾, η^{(r‑1)}) using an independent Metropolis‑Hastings (I‑MH) kernel, because exact sampling is not possible. The drawn allocations are then used in a “pseudo‑complete‑data” likelihood to perform an M‑step, updating η^{(r)} by maximizing the expected log‑likelihood given the sampled allocations. This stochastic approximation sidesteps the combinatorial explosion of the classic E‑step while preserving the EM spirit.
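One iteration of such a scheme might be sketched as follows. This is an illustrative reconstruction, not the paper's code: the S-step here draws the allocation vector z exactly by enumerating all injective allocations, which is feasible only for small L and k and stands in for the paper's I-MH kernel, and the M-step uses plain (non-robust) updates:

```python
import itertools
import math
import numpy as np

def normal_pdf(x, m, s):
    return math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))

def sem_iteration(samples, pi, mu, sigma, rng):
    """One SEM iteration on RJ-MCMC draws `samples` = [(k, theta_k), ...].

    S-step: draw an allocation z ~ p(z | x, eta) for each draw (exact
    sampling by enumeration; the paper uses an I-MH kernel instead).
    M-step: re-estimate eta from the pseudo-complete data.
    """
    L = len(pi)
    alloc = [[] for _ in range(L)]          # parameters allocated to each l
    for k, theta in samples:
        perms = list(itertools.permutations(range(L), k))
        w = np.empty(len(perms))
        for i, perm in enumerate(perms):    # unnormalized allocation weights
            p = 1.0
            for j, l in enumerate(perm):
                p *= pi[l] * normal_pdf(theta[j], mu[l], sigma[l])
            for l in set(range(L)) - set(perm):
                p *= 1.0 - pi[l]
            w[i] = p
        z = perms[rng.choice(len(perms), p=w / w.sum())]   # S-step draw
        for j, l in enumerate(z):
            alloc[l].append(theta[j])
    M = len(samples)                                       # M-step updates
    pi_new = np.array([len(a) / M for a in alloc])
    mu_new = np.array([np.mean(a) if a else mu[l] for l, a in enumerate(alloc)])
    sig_new = np.array([np.std(a) if len(a) > 1 else sigma[l]
                        for l, a in enumerate(alloc)])
    return pi_new, mu_new, sig_new
```

Iterating this map from an initial η⁽⁰⁾ mirrors the SEM recursion η⁽ʳ⁻¹⁾ → η⁽ʳ⁾ described above.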
Two robustness enhancements are introduced. First, because the parametric model cannot capture the diffuse tail of the posterior that appears for large k, the authors replace ordinary sample means and covariances in the M‑step with robust estimators based on medians and inter‑quartile ranges. This reduces the influence of outlying RJ‑MCMC draws that would otherwise bias the maximum‑likelihood update. Second, a uniform‑intensity Poisson point‑process component is added to the model to absorb samples that do not fit any Gaussian component, effectively allowing L to be smaller than the maximum observed k.
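A standard way to build such median/IQR estimators, plausibly close to what is meant here, rescales the interquartile range by the normal interquartile distance Φ⁻¹(0.75) − Φ⁻¹(0.25) ≈ 1.349, so the scale estimate is consistent for Gaussian data while resisting outlying draws (the contaminated example below is ours, not the paper's):

```python
import numpy as np

def robust_location_scale(x):
    """Robust stand-ins for the sample mean and standard deviation:
    the median, and the IQR divided by ~1.349 (the IQR of a standard
    normal), which is consistent for sigma under Gaussianity."""
    q25, q50, q75 = np.percentile(x, [25, 50, 75])
    return q50, (q75 - q25) / 1.349

# Tight Gaussian cluster contaminated by 5% diffuse outliers:
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0.68, 0.01, 950), rng.uniform(0, np.pi, 50)])
loc, scale = robust_location_scale(x)
```

With ordinary moments, the 5% of diffuse draws would inflate the standard deviation severely; the median and rescaled IQR stay close to the clean cluster's location and spread.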
The methodology is demonstrated on a classic sinusoid detection problem. A synthetic signal of length N=64 contains three sinusoids with amplitudes (20, 6.32, 20) and radial frequencies (0.63, 0.68, 0.73) in white Gaussian noise (SNR≈7 dB). RJ‑MCMC sampling yields a posterior over k (mostly k=2, 3, 4) and sorted frequencies for each k. Traditional Bayesian model selection (BMS) would pick the most probable model (k=2) and discard the weaker middle component. In contrast, the proposed approach sets L to the 90th percentile of the posterior distribution of k (here L=3) and runs the robust SEM for 50 iterations.
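Under the stated settings, the synthetic signal could be generated as follows. The phases and the exact SNR convention are not given in the summary, so both are assumed here (random phases; SNR defined as mean signal power over noise variance):

```python
import numpy as np

rng = np.random.default_rng(42)
N = 64
amps = np.array([20.0, 6.32, 20.0])       # sinusoid amplitudes
omegas = np.array([0.63, 0.68, 0.73])     # radial frequencies (rad/sample)
t = np.arange(N)
phases = rng.uniform(0, 2 * np.pi, 3)     # assumed: not given in the summary
clean = sum(a * np.cos(w * t + p) for a, w, p in zip(amps, omegas, phases))
# Assumed SNR convention: mean signal power / noise variance, in dB.
snr_db = 7.0
noise_var = np.mean(clean ** 2) / 10.0 ** (snr_db / 10.0)
y = clean + rng.normal(0.0, np.sqrt(noise_var), N)
```

The frequencies 0.63 and 0.73 are separated from 0.68 by only 0.05 rad/sample, i.e. roughly half a Fourier bin at N=64 (2π/64 ≈ 0.098), which is what makes the middle, weaker sinusoid genuinely ambiguous in the posterior.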
Results show rapid convergence: the KL‑based objective J stabilizes after about ten iterations, and the estimated means µₗ converge to values close to the true frequencies. The estimated presence probabilities πₗ are ≈0.97 for the two strong sinusoids and ≈0.02 for the weaker middle one, reflecting the uncertainty observed in the RJ‑MCMC posterior. The standard deviations of the Gaussian components are small, indicating that the robustification successfully mitigated the effect of outliers. Visual comparison of the fitted Gaussian densities with the RJ‑MCMC frequency histograms confirms that the method effectively resolves the trans‑dimensional label‑switching issue.
In summary, the paper contributes a principled framework for summarizing variable‑dimensional Bayesian posteriors: a flexible parametric model that respects the underlying trans‑dimensional structure, a stochastic EM algorithm that can be driven by RJ‑MCMC samples, and practical robustness tricks to handle diffuse tails and outliers. This approach bridges the gap between Bayesian model selection (which discards information) and Bayesian model averaging (which cannot provide component‑specific summaries), delivering interpretable component‑wise estimates together with quantified existence probabilities. Future work may extend the method to multivariate component parameters, more complex point‑process priors, and real‑time implementations for large‑scale signal processing applications.