Changepoints are abrupt variations in the generative parameters of a data sequence. Online detection of changepoints is useful in modelling and prediction of time series in application areas such as finance, biometrics, and robotics. While frequentist methods have yielded online filtering and prediction techniques, most Bayesian papers have focused on the retrospective segmentation problem. Here we examine the case where the model parameters before and after the changepoint are independent and we derive an online algorithm for exact inference of the most recent changepoint. We compute the probability distribution of the length of the current ``run,'' or time since the last changepoint, using a simple message-passing algorithm. Our implementation is highly modular so that the algorithm may be applied to a variety of types of data. We illustrate this modularity by demonstrating the algorithm on three different real-world data sets.
Bayesian Online Changepoint Detection
Changepoint detection is the identification of abrupt changes in the generative parameters of sequential data. As an online and offline signal processing tool, it has proven to be useful in applications such as process control [1], EEG analysis [5,2,17], DNA segmentation [6], econometrics [7,18], and disease demographics [9].
Frequentist approaches to changepoint detection, from the pioneering work of Page [22,23] and Lorden [19] to recent work using support vector machines [10], offer online changepoint detectors. Most Bayesian approaches to changepoint detection, in contrast, have been offline and retrospective [24,4,26,13,8]. With a few exceptions [16,20], the Bayesian papers on changepoint detection focus on segmentation and techniques to generate samples from the posterior distribution over changepoint locations.
In this paper, we present a Bayesian changepoint detection algorithm for online inference. Rather than retrospective segmentation, we focus on causal predictive filtering: generating an accurate distribution of the next unseen datum in the sequence, given only data already observed. For many applications in machine intelligence, this is a natural requirement. Robots must navigate based on past sensor data from an environment that may have abruptly changed: a door may be closed now, for example, or the furniture may have been moved. In vision systems, the brightness changes abruptly when a light switch is flipped or when the sun comes out.
We assume that a sequence of observations $x_1, x_2, \ldots, x_T$ may be divided into non-overlapping product partitions [3].
The delineations between partitions are called the changepoints. We further assume that for each partition $\rho$, the data within it are i.i.d. from some probability distribution $P(x_t \mid \eta_\rho)$. The parameters $\eta_\rho$, $\rho = 1, 2, \ldots$, are taken to be i.i.d. as well. We denote the contiguous set of observations between times $a$ and $b$ inclusive as $x_{a:b}$. The discrete a priori probability distribution over the interval between changepoints is denoted $P_{\text{gap}}(g)$.
We are concerned with estimating the posterior distribution over the current "run length," or time since the last changepoint, given the data so far observed. We denote the length of the current run at time $t$ by $r_t$, and use the notation $x_t^{(r)}$ to indicate the set of observations associated with the run $r_t$. As $r$ may be zero, the set $x^{(r)}$ may be empty. Figure 1 illustrates the relationship between the run length and some hypothetical univariate data: Figure 1(a) shows data divided by changepoints on the mean into three segments of lengths $g_1 = 4$, $g_2 = 6$, and an undetermined length $g_3$; Figure 1(b) shows the run length $r_t$ as a function of time, dropping to zero when a changepoint occurs; and Figure 1(c) shows the trellis on which the message-passing algorithm lives, where solid lines indicate that probability mass is passed "upwards," causing the run length to grow at the next time step, and dotted lines indicate the possibility that the current run is truncated and the run length drops to zero.
We assume that we can compute the predictive distribution conditional on a given run length $r_t$. We then integrate over the posterior distribution on the current run length to find the marginal predictive distribution:

$$P(x_{t+1} \mid x_{1:t}) = \sum_{r_t} P(x_{t+1} \mid r_t, x_t^{(r)})\, P(r_t \mid x_{1:t}).$$
To find the posterior distribution

$$P(r_t \mid x_{1:t}) = \frac{P(r_t, x_{1:t})}{P(x_{1:t})},$$

we write the joint distribution over run length and observed data recursively:

$$P(r_t, x_{1:t}) = \sum_{r_{t-1}} P(r_t, r_{t-1}, x_{1:t}) = \sum_{r_{t-1}} P(r_t \mid r_{t-1})\, P(x_t \mid r_{t-1}, x_t^{(r)})\, P(r_{t-1}, x_{1:t-1}).$$
Note that the predictive distribution $P(x_t \mid r_{t-1}, x_{1:t})$ depends only on the recent data $x_t^{(r)}$. We can thus generate a recursive message-passing algorithm for the joint distribution over the current run length and the data, based on two calculations: 1) the prior over $r_t$ given $r_{t-1}$, and 2) the predictive distribution over the newly-observed datum, given the data since the last changepoint.
The conditional prior on the changepoint $P(r_t \mid r_{t-1})$ gives this algorithm its computational efficiency, as it has nonzero mass at only two outcomes: either the run length continues to grow and $r_t = r_{t-1} + 1$, or a changepoint occurs and $r_t = 0$:

$$P(r_t \mid r_{t-1}) = \begin{cases} H(r_{t-1}+1) & \text{if } r_t = 0 \\ 1 - H(r_{t-1}+1) & \text{if } r_t = r_{t-1}+1 \\ 0 & \text{otherwise.} \end{cases}$$

The function $H(\tau)$ is the hazard function [11]:

$$H(\tau) = \frac{P_{\text{gap}}(g = \tau)}{\sum_{t=\tau}^{\infty} P_{\text{gap}}(g = t)}.$$
In the special case where $P_{\text{gap}}(g)$ is a discrete exponential (geometric) distribution with timescale $\lambda$, the process is memoryless and the hazard function is constant at $H(\tau) = 1/\lambda$.
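As a concrete illustration, the run-length recursion can be sketched in a few lines of Python. This is a minimal sketch, not the paper's reference implementation: it assumes the memoryless case with constant hazard $H = 1/\lambda$, and a Gaussian observation model with known variance and a conjugate Normal prior on the mean. All names and default parameter values (`bocpd`, `lam`, `mu0`, `var0`, `varx`) are illustrative.

```python
import numpy as np

def bocpd(data, lam=250.0, mu0=0.0, var0=1.0, varx=1.0):
    """Sketch of Bayesian online changepoint detection for a Gaussian
    mean with known observation variance. Returns the run-length
    posterior matrix R, where R[r, t] = P(r_t = r | x_1:t)."""
    T = len(data)
    H = 1.0 / lam                      # constant hazard (geometric gaps)
    R = np.zeros((T + 1, T + 1))
    R[0, 0] = 1.0                      # all initial mass at run length 0
    # per-run-length posterior parameters of the mean
    mu, var = np.array([mu0]), np.array([var0])
    for t, x in enumerate(data):
        # 1) predictive probability of x under each current run length:
        #    posterior predictive is N(mu, var + varx)
        pv = var + varx
        pred = np.exp(-0.5 * (x - mu) ** 2 / pv) / np.sqrt(2 * np.pi * pv)
        # 2) growth: run continues, r_t = r_{t-1} + 1
        R[1:t + 2, t + 1] = R[:t + 1, t] * pred * (1 - H)
        # 3) changepoint: mass from every run collapses to r_t = 0
        R[0, t + 1] = np.sum(R[:t + 1, t] * pred * H)
        # 4) normalize to obtain the posterior P(r_t | x_1:t)
        R[:, t + 1] /= np.sum(R[:, t + 1])
        # 5) conjugate Normal update of each surviving run's parameters,
        #    prepending the prior for the new run of length zero
        new_var = 1.0 / (1.0 / var + 1.0 / varx)
        new_mu = new_var * (mu / var + x / varx)
        mu = np.concatenate(([mu0], new_mu))
        var = np.concatenate(([var0], new_var))
    return R
```

Keeping one predictive-parameter entry per possible run length is what makes the message passing exact: the vector of sufficient statistics grows by one slot per time step, mirroring the trellis of Figure 1(c).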
A recursive algorithm must not only define the recurrence relation, but also the initialization conditions. We consider two cases: 1) a changepoint occurred a priori before the first datum, such as when observing a game; in this case we place all of the probability mass for the initial run length at zero, i.e. $P(r_0 = 0) = 1$. 2) We observe some recent subset of the data, such as when modelling climate change; in this case the prior over the initial run length is the normalized survival function [11],

$$P(r_0 = \tau) = \frac{1}{Z} S(\tau),$$

where $Z$ is an appropriate normalizing constant, and

$$S(\tau) = \sum_{t=\tau+1}^{\infty} P_{\text{gap}}(g = t).$$
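For the geometric gap distribution, the survival function has the closed form $S(\tau) = (1 - 1/\lambda)^{\tau}$, so the normalized prior over the initial run length can be computed directly. The following sketch assumes a geometric gap prior truncated at a maximum run length; the function name and arguments are illustrative.

```python
import numpy as np

def survival_prior(lam, max_len):
    """Normalized survival-function prior P(r_0 = tau) for tau = 0..max_len,
    assuming geometric gaps with timescale lam (so S(tau) = (1 - 1/lam)^tau),
    truncated at max_len and renormalized."""
    tau = np.arange(max_len + 1)
    S = (1.0 - 1.0 / lam) ** tau   # survival function of the gap distribution
    return S / S.sum()             # divide by the normalizing constant Z
```

The prior decays geometrically in $\tau$, so longer initial runs are a priori less likely, as one would expect when the observation window begins mid-run.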
Conjugate-exponential models are particularly convenient for integrating with the changepoint detection scheme.
…(Full text truncated)…