Bayes and empirical Bayes changepoint problems


We generalize the approach of Liu and Lawrence (1999) to multiple changepoint problems in which the number of changepoints is unknown. The approach is based on a dynamic programming recursion for efficient calculation of the marginal probability of the data, with the hidden parameters integrated out. For estimation of the hyperparameters, we propose to use Monte Carlo EM when training data are available. We argue that there are advantages to using samples from the posterior, which take into account the uncertainty of the changepoints, over the traditional MAP estimator, which is also more expensive to compute in this context. The samples from the posterior obtained by our algorithm are independent, avoiding the convergence issues associated with MCMC approaches. We illustrate our approach on limited simulations and a real data set.


💡 Research Summary

The paper extends the Bayesian framework for multiple changepoint detection to the realistic setting where the number of changepoints is unknown. Building on Liu and Lawrence (1999), the authors replace the exhaustive enumeration of changepoint configurations with a dynamic programming (DP) recursion that integrates out segment‑specific parameters. This DP computes the marginal likelihood of the entire data sequence in O(N·K) time, where N is the length of the series and K is a user‑defined upper bound on the number of changepoints, thereby keeping computational cost independent of the actual number of changepoints.
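The summary does not spell out the exact recursion, so the following is a minimal sketch under assumed choices, not the authors' exact model: i.i.d. Normal observations within each segment, a conjugate Normal prior on each segment mean (integrated out in closed form), and a geometric segment-length prior with per-position changepoint probability `p`. A backward recursion over segment start points then yields the marginal probability of the whole series; this naive version is O(N²), and all function names and hyperparameter values are illustrative:

```python
import math

def seg_loglik(y, i, j, sigma2=1.0, mu0=0.0, tau2=1.0):
    """Log marginal likelihood of segment y[i..j] (inclusive) under an
    i.i.d. Normal(mu, sigma2) model with conjugate prior mu ~ Normal(mu0, tau2),
    the segment mean mu integrated out in closed form."""
    seg = y[i:j + 1]
    n = len(seg)
    ybar = sum(seg) / n
    ss = sum((v - ybar) ** 2 for v in seg)  # within-segment sum of squares
    return (-0.5 * n * math.log(2 * math.pi * sigma2)
            + 0.5 * math.log(sigma2 / (sigma2 + n * tau2))
            - ss / (2 * sigma2)
            - n * (ybar - mu0) ** 2 / (2 * (sigma2 + n * tau2)))

def logsumexp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def backward_recursion(y, p=0.05):
    """B[t] = log P(y[t:] | a new segment starts at t), under a geometric
    segment-length prior: changepoint probability p after each observation.
    B[0] is the log marginal probability of the whole series, with both
    segment parameters and changepoint positions summed out."""
    N = len(y)
    B = [0.0] * (N + 1)
    for t in range(N - 1, -1, -1):
        terms = []
        for s in range(t, N):  # s = last index of the segment starting at t
            lp = seg_loglik(y, t, s)
            if s < N - 1:
                lp += (s - t) * math.log(1 - p) + math.log(p) + B[s + 1]
            else:
                lp += (N - 1 - t) * math.log(1 - p)
            terms.append(lp)
        B[t] = logsumexp(terms)
    return B
```

Besides returning the marginal likelihood in `B[0]`, the same table supports exact sampling of segmentations, which is what makes the independent posterior draws described below possible.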

Hyper‑parameter estimation (e.g., prior variance, concentration parameters) is tackled via a Monte Carlo Expectation‑Maximization (MC‑EM) algorithm. In the E‑step, given current hyper‑parameters, the algorithm draws independent samples from the posterior distribution of changepoint locations and segment parameters. Crucially, the DP machinery enables direct sampling of independent posterior configurations, eliminating the autocorrelation and convergence diagnostics that plague traditional Markov chain Monte Carlo (MCMC) approaches. The M‑step then maximizes the expected complete‑data log‑likelihood using these samples, yielding updated hyper‑parameters. This EM loop converges under the usual monotonicity guarantees while preserving the full Bayesian treatment of uncertainty.
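As an illustrative sketch of this loop (again not the authors' exact scheme): once a backward table of segment-onwards probabilities is available, an exact independent segmentation can be drawn by sampling each segment's endpoint from its posterior, and one MC-EM iteration for the geometric changepoint probability `p` has a closed-form M-step, namely the sampled changepoint rate. The segment model, hyperparameters, and helper names are all assumptions, and the helpers are repeated so the sketch is self-contained:

```python
import math
import random

SIGMA2, TAU2, MU0 = 1.0, 1.0, 0.0  # assumed segment-model hyperparameters

def seg_loglik(y, i, j):
    # Normal(mu, SIGMA2) observations, mu ~ Normal(MU0, TAU2) integrated out.
    seg = y[i:j + 1]
    n = len(seg)
    ybar = sum(seg) / n
    ss = sum((v - ybar) ** 2 for v in seg)
    return (-0.5 * n * math.log(2 * math.pi * SIGMA2)
            + 0.5 * math.log(SIGMA2 / (SIGMA2 + n * TAU2))
            - ss / (2 * SIGMA2)
            - n * (ybar - MU0) ** 2 / (2 * (SIGMA2 + n * TAU2)))

def backward_table(y, p):
    # B[t] = log P(y[t:] | new segment starts at t), geometric length prior.
    N = len(y)
    B = [0.0] * (N + 1)
    for t in range(N - 1, -1, -1):
        terms = []
        for s in range(t, N):
            lp = seg_loglik(y, t, s)
            lp += ((s - t) * math.log(1 - p) + math.log(p) + B[s + 1]
                   if s < N - 1 else (N - 1 - t) * math.log(1 - p))
            terms.append(lp)
        m = max(terms)
        B[t] = m + math.log(sum(math.exp(x - m) for x in terms))
    return B

def sample_segmentation(y, p, B, rng):
    """Draw one exact, independent posterior sample of changepoint positions
    by walking the backward table forward (no Markov chain, no burn-in)."""
    N, t, cps = len(y), 0, []
    while t < N:
        weights = []  # posterior over the endpoint s of the segment at t
        for s in range(t, N):
            lp = seg_loglik(y, t, s) - B[t]
            lp += ((s - t) * math.log(1 - p) + math.log(p) + B[s + 1]
                   if s < N - 1 else (N - 1 - t) * math.log(1 - p))
            weights.append(math.exp(lp))
        s = rng.choices(range(t, N), weights=weights)[0]
        if s < N - 1:
            cps.append(s + 1)  # next segment starts at s + 1
        t = s + 1
    return cps

def mcem_update_p(y, p, n_samples=50, seed=0):
    """One MC-EM iteration for p: the E-step draws independent segmentations;
    the M-step for the geometric prior is the sampled changepoint rate."""
    rng = random.Random(seed)
    B = backward_table(y, p)
    samples = [sample_segmentation(y, p, B, rng) for _ in range(n_samples)]
    mean_cps = sum(len(c) for c in samples) / n_samples
    return mean_cps / (len(y) - 1)  # closed-form maximizer of the prior term
```

Because every draw is an exact sample from the posterior, the Monte Carlo E-step needs no convergence diagnostics; iterating `mcem_update_p` until `p` stabilizes gives the empirical Bayes estimate under this toy prior.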

A central argument of the work is that posterior sampling offers substantive advantages over the conventional maximum a posteriori (MAP) estimator. MAP collapses the posterior to a single point estimate, ignoring the inherent uncertainty about changepoint positions—particularly problematic when changepoints are closely spaced or the data are noisy. By contrast, the posterior samples provide a full distribution over changepoint configurations, allowing practitioners to compute credible intervals, perform Bayesian model averaging, and assess the probability that a changepoint occurs within any given window.
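With independent posterior samples in hand, these summaries reduce to simple counting. A small self-contained sketch (the sample format, where each draw is a set of changepoint indices, and the function names are illustrative):

```python
from collections import Counter

def changepoint_marginals(samples, N):
    """Per-position posterior probability of a changepoint, estimated from
    independent posterior samples (each a set of changepoint indices)."""
    counts = Counter()
    for cps in samples:
        counts.update(cps)
    return [counts[t] / len(samples) for t in range(N)]

def window_probability(samples, lo, hi):
    """Posterior probability that at least one changepoint lies in [lo, hi)."""
    hits = sum(any(lo <= t < hi for t in cps) for cps in samples)
    return hits / len(samples)

# Toy usage with three hand-made posterior draws over a series of length 10:
samples = [{3}, {3, 7}, {4}]
marginals = changepoint_marginals(samples, 10)   # marginals[3] == 2/3
in_window = window_probability(samples, 3, 5)    # == 1.0
```

A MAP segmentation would report a single changepoint set; the per-position marginals instead expose exactly the closely-spaced or noisy regions where the location is uncertain.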

Empirical validation is performed on two fronts. First, synthetic experiments systematically vary the true number of changepoints, inter‑changepoint distances, and noise levels. The proposed DP‑based marginal likelihood computation combined with MC‑EM accurately recovers both the number and locations of changepoints, while achieving speed‑ups of one to two orders of magnitude compared with MCMC‑based Bayesian methods. Second, the approach is applied to real‑world datasets—a genomics copy‑number variation series and an economic time series exhibiting structural breaks. In both cases the algorithm identifies changepoints that correspond to known biological events or economic regime shifts, and the posterior distributions highlight regions of high uncertainty, offering actionable insight for domain experts.

Overall, the contribution lies in (1) a scalable DP recursion that integrates out nuisance parameters, (2) a Monte Carlo EM scheme that yields independent posterior samples and automatically tunes hyper‑parameters, and (3) a demonstration that Bayesian posterior sampling can be both computationally feasible and statistically superior to MAP estimation in changepoint problems. The paper suggests future extensions to multivariate series, non‑Gaussian observation models, and online changepoint detection, where the same DP‑EM machinery could provide real‑time, uncertainty‑aware inference.

