Online Expectation-Maximisation

Reading time: 4 minutes
...

📝 Original Info

  • Title: Online Expectation-Maximisation
  • ArXiv ID: 1011.1745
  • Date: 2010-11
  • Author: Olivier Cappé
  • Editors (volume): Kerrie Mengersen, Mike Titterington, Christian P. Robert

📝 Abstract

Tutorial chapter on the Online EM algorithm to appear in the volume 'Mixtures' edited by Kerrie Mengersen, Mike Titterington and Christian P. Robert.

📄 Full Content

Before entering into any more details about the methodological aspects, let's discuss the motivations behind the association of the two phrases "online (estimation)" and "Expectation-Maximisation (algorithm)" as well as their pertinence in the context of mixtures and more general models involving latent variables.

The adjective online refers to the idea of computing estimates of model parameters on-the-fly, without storing the data, by continuously updating the estimates as more observations become available. In the machine learning literature, the phrase online learning has recently been used mostly to refer to a specific way of analysing the performance of algorithms that incorporate observations progressively (Cesa-Bianchi and Lugosi, 2006). We do not refer here to this approach and will only consider the more traditional setup in which the objective is to estimate fixed parameters of a statistical model and the performance is quantified by the proximity between the estimates and the parameter to be estimated. In signal processing and control, the sort of algorithms considered in the following is often referred to as adaptive or recursive (Ljung and Söderström, 1983; Benveniste et al., 1990). The word recursive is so ubiquitous in computer science that its use may be somewhat ambiguous and is not recommended. The term adaptive may refer to the type of algorithms considered in this chapter but is also often used in contexts where the focus is on the ability to track slow drifts or abrupt changes in the model parameters, which will not be our primary concern.
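To fix ideas before any EM machinery, here is a minimal Python sketch of an online estimate in the above sense: a running mean that processes each observation once, in constant memory, and updates the current estimate recursively. The function name and step-size choice are illustrative, not from the chapter (with step size 1/n the recursion reproduces the exact sample mean).

```python
import random

# Minimal online (recursive) estimator: the running mean. Each
# observation is used once and then discarded; the update is O(1)
# in both time and memory, in contrast with a batch average that
# would require storing the whole sample.
def online_mean(stream):
    theta = 0.0
    for n, y in enumerate(stream, start=1):
        gamma = 1.0 / n               # step size; 1/n gives the exact sample mean
        theta += gamma * (y - theta)  # recursive update of the current estimate
    return theta

# Usage: the estimate approaches the true mean (2.0) as data keep arriving.
random.seed(0)
print(online_mean(random.gauss(2.0, 1.0) for _ in range(100_000)))
```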

Traditional applications of online algorithms involve situations in which the data cannot be stored, due to their volume and rate of sampling, as in real-time signal processing or stream mining. The wide availability of very large datasets involving thousands or millions of examples is also at the origin of the current renewed interest in online algorithms. In this context, online algorithms are often more efficient, i.e., they converge faster towards the target parameter value, and need fewer computing resources, in terms of memory or disk access, than their batch counterparts (Neal and Hinton, 1999). In this chapter, we are interested in both contexts: when the online algorithm is used to process on-the-fly a potentially unlimited amount of data, and when it is applied to a fixed but large dataset. We will refer to the latter context as the batch estimation mode.

Our main interest is maximum likelihood estimation and, although we may consider adding a penalty term (i.e., Maximum A Posteriori estimation), we will not consider “fully Bayesian” methods, which aim at sequentially simulating from the parameter posterior. The main motivation for this restriction is to stick to computationally simple iterations, which is an essential requirement of successful online methods. In particular, when online algorithms are used for batch estimation, each parameter update must be carried out very efficiently for the method to be computationally competitive with traditional batch estimation algorithms. Fully Bayesian approaches (see, e.g., Chopin, 2002) typically require Monte Carlo simulations even in simple models and raise some challenging stability issues when used on very long data records (Kantas et al., 2009).

This quest for simplicity of each parameter update is also the reason for focussing on the EM (Expectation-Maximisation) algorithm. Ever since its introduction by Dempster et al. (1977), the EM algorithm has been criticised for its often sub-optimal convergence behaviour, and many variants have been proposed by, among others, Lange (1995) and Meng and Van Dyk (1997). This being said, thirty years after the seminal paper by Dempster and his coauthors, the EM algorithm is still, by far, the most widely used inference tool for latent variable models, due to its numerical stability and ease of implementation. Our main point here is not to argue that the EM algorithm is always preferable to other options. But the EM algorithm, which does not rely on fine numerical tuning involving, for instance, line searches, re-projections or pre-conditioning, is a perfect candidate for developing online versions with very simple updates. We hope to convince the reader in the rest of this chapter that the online version of EM described here shares many of the attractive properties of the original proposal of Dempster et al. (1977) and provides an easy-to-implement and robust solution for online estimation in latent variable models.
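To give a concrete sense of how simple each update can be, the sketch below implements one standard form of online EM for a one-dimensional Gaussian mixture, in the spirit of the stochastic-approximation approach of Cappé and Moulines (2009): the E-step sufficient statistics are tracked recursively and the M-step re-parameterisation remains in closed form. All names, the step-size schedule gamma_n = gamma0 * n^(-alpha) and the burn-in n_min are illustrative choices, not prescriptions from the text.

```python
import math
import random

def online_em_gmm(stream, w, mu, var, gamma0=1.0, alpha=0.6, n_min=20):
    """Illustrative online EM pass for a 1-D Gaussian mixture (sketch)."""
    K = len(w)
    # Running estimates of the complete-data sufficient statistics,
    # initialised to be consistent with the starting parameters.
    s0 = list(w)                                           # E[z_k]
    s1 = [w[k] * mu[k] for k in range(K)]                  # E[z_k * y]
    s2 = [w[k] * (var[k] + mu[k] ** 2) for k in range(K)]  # E[z_k * y^2]
    for n, y in enumerate(stream, start=1):
        # E-step: posterior responsibilities under the current parameters.
        p = [w[k] / math.sqrt(2 * math.pi * var[k])
             * math.exp(-0.5 * (y - mu[k]) ** 2 / var[k]) for k in range(K)]
        tot = sum(p)
        r = [pk / tot for pk in p]
        # Stochastic-approximation update of the sufficient statistics.
        g = gamma0 * n ** (-alpha)      # slowly decreasing step size
        for k in range(K):
            s0[k] += g * (r[k] - s0[k])
            s1[k] += g * (r[k] * y - s1[k])
            s2[k] += g * (r[k] * y * y - s2[k])
        # M-step: closed-form re-parameterisation (after a short burn-in).
        if n >= n_min:
            for k in range(K):
                w[k] = s0[k]
                mu[k] = s1[k] / s0[k]
                var[k] = max(s2[k] / s0[k] - mu[k] ** 2, 1e-8)
    return w, mu, var

# Usage: a stream from a 0.3/0.7 mixture of N(0, 1) and N(4, 1).
random.seed(0)
data = (random.gauss(0.0, 1.0) if random.random() < 0.3
        else random.gauss(4.0, 1.0) for _ in range(200_000))
print(online_em_gmm(data, w=[0.5, 0.5], mu=[-1.0, 1.0], var=[2.0, 2.0]))
```

Note how the per-observation cost is a single E-step on one data point plus a constant-time statistics update; no likelihood over the full dataset is ever evaluated, which is what makes the method attractive in both the streaming and the large fixed-dataset settings.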

Quite obviously, guaranteeing the strict likelihood ascent property of the original EM algorithm is hardly feasible in an online context. On the other hand, a remarkable property of the online EM algorithm is that it can reach asymptotic Fisher efficiency by converging towards the actual parameter value at a rate equivalent to that of the Maximum Likelihood Estimator (MLE). Hence, when the number of observations …
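To state the efficiency property above in symbols, here is the standard formulation (under the usual regularity conditions; the notation, with true parameter θ⋆ and Fisher information I(θ⋆), is ours rather than from this excerpt):

```latex
% Asymptotic Fisher efficiency: the online estimate \hat{\theta}_n
% obeys the same central limit theorem as the MLE, attaining the
% Cramér-Rao lower bound asymptotically.
\sqrt{n}\,\bigl(\hat{\theta}_n - \theta_\star\bigr)
  \;\xrightarrow{d}\;
  \mathcal{N}\!\bigl(0,\; I(\theta_\star)^{-1}\bigr),
\qquad \text{where } I(\theta_\star) \text{ is the Fisher information matrix at } \theta_\star.
```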

Reference

This content is AI-processed based on open access ArXiv data.
