Variational Gaussian Process Dynamical Systems
High dimensional time series are endemic in applications of machine learning such as robotics (sensor data), computational biology (gene expression data), vision (video sequences) and graphics (motion capture data). Practical nonlinear probabilistic approaches to this data are required. In this paper we introduce the variational Gaussian process dynamical system. Our work builds on recent variational approximations for Gaussian process latent variable models to allow for nonlinear dimensionality reduction simultaneously with learning a dynamical prior in the latent space. The approach also allows for the appropriate dimensionality of the latent space to be automatically determined. We demonstrate the model on a human motion capture data set and a series of high resolution video sequences.
💡 Research Summary
The paper introduces a novel Bayesian framework for modeling high‑dimensional time‑series data called the Variational Gaussian Process Dynamical System (VGPDS). Traditional approaches that combine Gaussian Process Latent Variable Models (GP‑LVM) with a dynamical prior (often referred to as Gaussian Process Dynamical Systems, GP‑DS) typically rely on a maximum‑a‑posteriori (MAP) estimate of the latent trajectories. This MAP treatment leaves the latent variables un‑marginalised, which makes it difficult to optimise the hyper‑parameters of the dynamical prior without over‑fitting and prevents the model from automatically determining the appropriate latent dimensionality.
VGPDS resolves these issues by applying a variational inference scheme that marginalises the latent variables X approximately but rigorously, using a lower bound on the marginal likelihood. The model consists of two independent Gaussian processes: (i) a temporal GP that generates a Q‑dimensional latent trajectory X(t) from time stamps, and (ii) a mapping GP that maps each latent point X_n to the observed D‑dimensional data y_n. Different kernel families can be chosen for each GP. For the temporal kernel k_x the authors experiment with Ornstein‑Uhlenbeck, squared‑exponential (RBF), Matérn‑3/2, and periodic kernels, allowing control over smoothness and Markovian properties. For the mapping kernel k_f they use an RBF with Automatic Relevance Determination (ARD) weights (inverse squared length‑scales), which automatically "switch off" irrelevant latent dimensions by driving the corresponding weights to zero.
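To make the ARD mechanism concrete, here is a minimal numpy sketch of an ARD squared‑exponential kernel (the function name and parameterisation are illustrative, not taken from the paper's code): each latent dimension gets its own weight, and a weight near zero removes that dimension's influence on the kernel entirely.

```python
import numpy as np

def ard_rbf_kernel(X1, X2, variance=1.0, weights=None):
    """ARD squared-exponential kernel:
    k(x, x') = variance * exp(-0.5 * sum_q w_q * (x_q - x'_q)^2).
    A weight w_q -> 0 "switches off" latent dimension q."""
    if weights is None:
        weights = np.ones(X1.shape[1])
    diff = X1[:, None, :] - X2[None, :, :]        # shape (N1, N2, Q)
    sq = np.sum(weights * diff**2, axis=-1)       # weighted squared distances
    return variance * np.exp(-0.5 * sq)

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))

# A dimension with near-zero ARD weight has no effect on the kernel:
K_full = ard_rbf_kernel(X, X, weights=np.array([1.0, 1.0, 1e-12]))
K_2d   = ard_rbf_kernel(X[:, :2], X[:, :2], weights=np.array([1.0, 1.0]))
assert np.allclose(K_full, K_2d)   # third dimension is effectively pruned
```

During optimisation of the variational bound, these weights are learned jointly with the other hyper‑parameters, which is how the model infers the effective latent dimensionality.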
To make inference tractable, the authors introduce a set of M inducing points u evaluated at pseudo‑inputs X_u. This sparse GP construction reduces the dominant computational cost of the mapping GP from O(N³) to O(NM²), with M ≪ N, while preserving accuracy. The variational distribution is factorised as q(X)=∏_q N(μ_q, S_q) for the latent trajectories and q(u)=∏_d N(m_d, S_u) for the inducing variables. Crucially, S_q is allowed to be a full N×N covariance matrix, capturing the strong correlations between time points that are typical in dynamical systems.
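The computational saving comes from the low‑rank (Nyström‑type) term that replaces the full N×N kernel matrix in variational sparse‑GP bounds of this kind. A rough sketch, with illustrative kernel and sizes (not the paper's actual implementation):

```python
import numpy as np

def rbf(X1, X2, var=1.0, ls=1.0):
    """Plain squared-exponential kernel (illustrative)."""
    d2 = np.sum((X1[:, None, :] - X2[None, :, :])**2, axis=-1)
    return var * np.exp(-0.5 * d2 / ls**2)

rng = np.random.default_rng(1)
N, M, Q = 200, 15, 2
X   = rng.standard_normal((N, Q))   # latent points (fixed here for illustration)
X_u = rng.standard_normal((M, Q))   # pseudo-inputs of the M inducing points

K_mm = rbf(X_u, X_u) + 1e-6 * np.eye(M)   # M x M inducing covariance (+ jitter)
K_nm = rbf(X, X_u)                        # N x M cross-covariance

# Low-rank Nystrom term used in place of the full N x N kernel matrix;
# forming and solving with it costs O(N M^2) rather than O(N^3).
Q_nn = K_nm @ np.linalg.solve(K_mm, K_nm.T)
```

Because K_nn − Q_nn is positive semi‑definite for valid kernels, the approximation never over‑states the prior variance, which is what makes the resulting expression a proper lower bound.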
The variational lower bound consists of a data‑fit term and a KL‑divergence term. By taking derivatives of the bound with respect to μ_q and S_q and setting them to zero, the authors obtain stationary conditions that involve diagonal matrices Λ_q. They then re‑parameterise the O(N²) variational parameters (μ_q, S_q) of each latent dimension using only O(N) quantities (the diagonal λ_q of Λ_q and an N‑vector determining μ_q), dramatically reducing the optimisation burden.
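In symbols, the re‑parameterisation described above takes roughly the following form (a hedged reconstruction in the summary's notation, with K_t the temporal covariance, Λ_q = diag(λ_q), and μ̄_q an auxiliary N‑vector; not verbatim from the paper):

$$
S_q = \left(K_t^{-1} + \Lambda_q\right)^{-1},
\qquad
\mu_q = K_t\,\bar{\mu}_q .
$$

Each latent dimension q is then parameterised by the two N‑vectors λ_q and μ̄_q, i.e. O(N) free parameters, instead of the O(N²) entries of a full S_q.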
The framework naturally extends to multiple independent sequences. Each sequence receives its own block‑diagonal temporal covariance K_t, while the mapping GP is shared across all sequences, enabling a common latent representation of diverse motions.
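The block‑diagonal structure of the temporal covariance can be sketched directly (time stamps, lengths, and the RBF temporal kernel below are illustrative choices, not the paper's exact settings):

```python
import numpy as np
from scipy.linalg import block_diag

def rbf_time(t1, t2, var=1.0, ls=0.3):
    """Squared-exponential kernel over 1-D time stamps (illustrative)."""
    d2 = (t1[:, None] - t2[None, :])**2
    return var * np.exp(-0.5 * d2 / ls**2)

# Two independent sequences, each with its own time stamps:
t_seq1 = np.linspace(0.0, 1.0, 50)
t_seq2 = np.linspace(0.0, 2.0, 80)

# Block-diagonal temporal covariance: zero prior correlation across
# sequences, while the mapping GP (not shown) is shared by both.
K_t = block_diag(rbf_time(t_seq1, t_seq1), rbf_time(t_seq2, t_seq2))
```

The zero off‑diagonal blocks encode independence between sequences a priori; any sharing of structure across motions happens through the common mapping GP and its kernel hyper‑parameters.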
For prediction and reconstruction, the model introduces latent variables X* for new time stamps t* and computes a variational posterior q(X*). The predictive distribution p(y*|y, t, t*) is approximated by integrating over q(X*) and the conditional GP mapping. Although the resulting integral is non‑Gaussian, its mean and covariance can be derived analytically, yielding closed‑form expressions analogous to standard GP regression.
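As a point of reference for those closed‑form moments, the analogous standard GP‑regression predictive equations look like this. Note this sketch is ordinary GP regression on a 1‑D signal, not the full VGPDS predictive integral over q(X*); kernel, length‑scale, and noise level are illustrative assumptions.

```python
import numpy as np

def rbf(t1, t2, var=1.0, ls=0.3):
    """Squared-exponential kernel over time stamps (illustrative)."""
    d2 = (t1[:, None] - t2[None, :])**2
    return var * np.exp(-0.5 * d2 / ls**2)

# Training signal y(t) and new time stamps t*:
t      = np.linspace(0.0, 1.0, 30)
y      = np.sin(2 * np.pi * t)
t_star = np.linspace(0.0, 1.0, 100)

noise = 1e-4
K    = rbf(t, t) + noise * np.eye(len(t))   # training covariance + noise
K_s  = rbf(t_star, t)                       # test-train cross-covariance
K_ss = rbf(t_star, t_star)                  # test covariance

alpha = np.linalg.solve(K, y)
mean  = K_s @ alpha                                # predictive mean
cov   = K_ss - K_s @ np.linalg.solve(K, K_s.T)     # predictive covariance
```

In VGPDS the role of the test inputs t* is played by the variational posterior q(X*) over new latent points, and the expectations over q(X*) are what make the derivation non‑trivial, yet still analytic for the kernels used.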
Empirical evaluation is performed on two challenging domains. First, a motion‑capture dataset from the CMU database comprising 2,613 frames of 59‑dimensional joint angles, split into 31 training sequences. VGPDS automatically inferred 3–4 effective latent dimensions (depending on the temporal kernel) and outperformed MAP‑based GP‑DS, a binary latent variable model, and k‑nearest‑neighbour baselines in both scaled‑space and angle‑space error metrics. The Matérn kernel gave the best reconstruction for leg data, while the RBF kernel excelled for body data.
Second, the authors applied VGPDS directly to raw pixel values of high‑resolution video sequences (up to ~1 million dimensions per frame). They reconstructed frames with 40–50 % missing pixels, achieving lower mean‑squared error than k‑NN across three video datasets (a talking‑woman clip, an artificial ocean‑wave scene, and a dog‑running clip). Moreover, by sampling from the learned latent space they generated plausible new video sequences, demonstrating the model’s generative capability.
Overall, VGPDS offers three major advantages: (1) it quantifies uncertainty in the latent trajectories through variational marginalisation, (2) it automatically selects both the dynamical hyper‑parameters and the latent dimensionality via ARD and the variational bound, and (3) it scales to thousands of time points and millions of observed dimensions thanks to inducing‑point sparsity and O(N) re‑parameterisation. These properties make VGPDS a powerful and flexible tool for a wide range of applications involving high‑dimensional temporal data, such as robotics, computational biology, and computer vision.