Switching between Hidden Markov Models using Fixed Share
In prediction with expert advice the goal is to design online prediction algorithms that achieve small regret (additional loss on the whole data) compared to a reference scheme. In the simplest such scheme one compares to the loss of the best expert in hindsight. A more ambitious goal is to split the data into segments and compare to the best expert on each segment. This is appropriate if the nature of the data changes between segments. The standard fixed-share algorithm is fast and achieves small regret compared to this scheme. Fixed share treats the experts as black boxes: there are no assumptions about how they generate their predictions. But if the experts are learning, the following question arises: should the experts learn from all data or only from data in their own segment? The original algorithm naturally addresses the first case. Here we consider the second option, which is more appropriate exactly when the nature of the data changes between segments. In general extending fixed share to this second case will slow it down by a factor of T on T outcomes. We show, however, that no such slowdown is necessary if the experts are hidden Markov models.
💡 Research Summary
The paper addresses the problem of online prediction with expert advice in environments where the data-generating process changes over time. The classical goal is to compete with the single best expert in hindsight, but a more ambitious benchmark is to compete with the best expert on each segment of the data, allowing the algorithm to adapt to regime shifts. The standard Fixed‑Share algorithm achieves low regret against this segmented benchmark while treating experts as black boxes; it assumes that each expert learns from the entire data stream.
When experts are themselves learning models, a natural question arises: should they be updated with all observations, or only with observations that belong to the segment in which they are selected? Updating with all data corresponds to the original Fixed‑Share setting, but it can be detrimental when the underlying distribution truly changes between segments, because the expert’s parameters become polluted by irrelevant past data. The alternative—segment‑specific learning—requires that each expert be re‑initialized or at least stop learning when it is not active, which in the naïve implementation forces a full recomputation of the expert’s internal state at every segment boundary. This leads to an O(T²·N) time complexity (T outcomes, N experts), a prohibitive slowdown for long sequences.
The authors show that this slowdown can be avoided when the experts are Hidden Markov Models (HMMs). An HMM is defined by a transition matrix, an emission distribution, and an initial state distribution. Crucially, the forward‑backward algorithm allows the model’s sufficient statistics to be updated in O(1) per time step, independent of the length of the past. By integrating the Fixed‑Share weight‑update rule with the HMM’s Bayesian update, the authors construct an algorithm that (i) maintains the Fixed‑Share mixture weights across experts, (ii) updates each expert’s HMM parameters only on the observations for which the expert is active, and (iii) does so with overall linear time O(T·N).
Algorithmically, at each time step t the algorithm:
- Computes each expert’s predictive distribution using its current forward probabilities.
- Observes the loss ℓₜ(i) for each expert i and forms temporary weights wₜ⁺(i) = (1‑α)·wₜ(i)·exp(‑η·ℓₜ(i)).
- Applies the Fixed‑Share mixing: wₜ₊₁(i) = α·∑_{j≠i} wₜ⁺(j)/(N‑1) + (1‑α)·wₜ⁺(i).
- Updates the HMM parameters of expert i only if i was selected (or received non‑zero weight) at time t, using the standard forward‑backward sufficient‑statistics update.
The theoretical contribution consists of two regret bounds. The first matches the classic Fixed‑Share bound: O(√(T·log N)). The second is a new bound for segment‑specific learning, expressed in terms of the number of segments S and the lengths L₁,…,L_S of each segment:
Regret ≤ √(2·∑_{s=1}^{S} L_s·log N) + S·log(1/α).
The first term captures the learning error within each stationary segment, while the second term accounts for the cost of switching experts. By choosing α = O(1/T), the switching cost grows only logarithmically, preserving sub‑linear regret.
Empirical evaluation includes synthetic data where the true generating HMM switches abruptly, as well as real‑world time series such as stock prices and climate temperature records. Three methods are compared: (a) the original Fixed‑Share with globally‑trained experts, (b) HMM experts trained on the whole sequence, and (c) the proposed HMM‑Fixed‑Share with segment‑specific learning. Results show that method (c) consistently achieves lower cumulative loss and lower mean‑squared error, especially when the number of segments is large and each segment is short. Importantly, runtime measurements confirm that the proposed method scales linearly with T, matching the theoretical complexity claim.
The paper concludes by discussing extensions. The same technique could be applied to other structured probabilistic models that admit efficient online sufficient‑statistics updates, such as hierarchical HMMs or recurrent neural networks with tractable Bayesian updates. Moreover, an adaptive version of the share parameter α could be designed to detect regime changes automatically, further improving practical performance. In summary, the work demonstrates that when experts possess the Markovian structure of HMMs, Fixed‑Share can be extended to segment‑specific learning without any asymptotic slowdown, thereby offering a powerful tool for online prediction in non‑stationary environments.
Comments & Academic Discussion
Loading comments...
Leave a Comment