Reconstruction of sequential data with density models


We introduce the problem of reconstructing a sequence of multidimensional real vectors where some of the data are missing. This problem contains regression and mapping inversion as particular cases where the pattern of missing data is independent of the sequence index. The problem is hard because it involves possibly multivalued mappings at each vector in the sequence, where the missing variables can take more than one value given the present variables; and the set of missing variables can vary from one vector to the next. To solve this problem, we propose an algorithm based on two redundancy assumptions: vector redundancy (the data live in a low-dimensional manifold), so that the present variables constrain the missing ones; and sequence redundancy (e.g. continuity), so that consecutive vectors constrain each other. We capture the low-dimensional nature of the data in a probabilistic way with a joint density model, here the generative topographic mapping, which results in a Gaussian mixture. Candidate reconstructions at each vector are obtained as all the modes of the conditional distribution of missing variables given present variables. The reconstructed sequence is obtained by minimising a global constraint, here the sequence length, by dynamic programming. We present experimental results for a toy problem and for inverse kinematics of a robot arm.

💡 Research Summary

The paper tackles the problem of reconstructing a sequence of multidimensional real‑valued vectors when some components are missing. The problem includes regression and mapping inversion as special cases, namely those where the pattern of missing data is the same at every position in the sequence. It is harder than either in two respects: the set of missing variables can vary from one vector to the next, and the relationship between observed and missing variables may be multivalued (one‑to‑many). The authors build their method on two redundancy assumptions: (1) Vector redundancy – the data lie on a low‑dimensional manifold, so the observed components constrain the missing ones; and (2) Sequence redundancy – consecutive vectors constrain each other (e.g., through continuity in time).

To exploit vector redundancy they adopt a probabilistic joint density model, specifically the Generative Topographic Mapping (GTM). GTM defines a latent low‑dimensional space and maps it non‑linearly to the observed space, yielding a Gaussian mixture representation of the data distribution. Once GTM has been trained, the conditional distribution p(y|x) of missing variables y given observed variables x can be evaluated analytically for each time step. Because p(y|x) may be multimodal, the authors extract all modes of this conditional distribution; each mode becomes a candidate reconstruction for that time step. This step converts a potentially one‑to‑many inverse problem into a finite set of pointwise candidate solutions, preserving the multivalued nature of the underlying mapping instead of averaging over it.
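The candidate-generation step can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the mixture has equal mixing proportions and one shared isotropic variance `sigma2` (the form a GTM produces), and it locates modes with a fixed-point (mean-shift style) iteration started from every component centre. The function names, the tolerances, and the toy curve x = y² are all hypothetical choices made for this example.

```python
import math

def _sqdist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def _softmax(logs):
    """Normalise log-weights stably (subtract the maximum before exponentiating)."""
    m = max(logs)
    e = [math.exp(v - m) for v in logs]
    s = sum(e)
    return [v / s for v in e]

def conditional_weights(x_obs, centers_x, sigma2):
    """Mixing weights of p(y | x) for an equal-weight isotropic Gaussian mixture.

    Each component keeps its y-centre and variance; only its weight changes,
    in proportion to how well its x-centre explains the observed x.
    """
    return _softmax([-_sqdist(x_obs, c) / (2.0 * sigma2) for c in centers_x])

def find_modes(weights, centers_y, sigma2, tol=1e-9, max_iter=1000, merge_dist=1e-4):
    """Fixed-point mode search started from every centre; near-duplicates are merged."""
    modes = []
    for start in centers_y:
        y = list(start)
        for _ in range(max_iter):
            logs = [(math.log(w) if w > 0 else -math.inf) - _sqdist(y, c) / (2.0 * sigma2)
                    for w, c in zip(weights, centers_y)]
            r = _softmax(logs)  # responsibilities of each component at the current y
            y_new = [sum(rk * c[d] for rk, c in zip(r, centers_y)) for d in range(len(y))]
            moved = _sqdist(y, y_new)
            y = y_new
            if moved < tol ** 2:
                break
        if all(_sqdist(y, m) > merge_dist ** 2 for m in modes):
            modes.append(y)
    return modes

# Toy data (assumed for illustration): centres lie on the curve x = y**2,
# so a given x > 0 is compatible with two values of y.
ys = [-0.95 + 0.1 * i for i in range(20)]
centers_y = [[y] for y in ys]
centers_x = [[y * y] for y in ys]
sigma2 = 0.01

w = conditional_weights([0.49], centers_x, sigma2)
modes = find_modes(w, centers_y, sigma2)
print(sorted(m[0] for m in modes))  # two candidates, one on each branch near +/-0.7
```

Starting the search from every centre is a design choice that trades computation for coverage; a conditional mean would instead return a value near 0 here, which lies on neither branch.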

The second part of the method enforces sequence redundancy by selecting, among all candidate reconstructions, a globally optimal sequence that minimizes a continuity‑based cost. The cost used here is the length of the sequence, i.e. the sum of distances between consecutive candidate points (other physically motivated penalties could be substituted). The optimal path through the lattice of candidates is found by dynamic programming (DP), analogous to the Viterbi algorithm for hidden Markov models. DP computes the minimal‑cost trajectory in O(N·K²) time, where N is the number of time steps and K is the maximum number of candidates at any step.
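The dynamic-programming selection can be sketched as a shortest-path search through the lattice of candidates. The code below is an illustrative sketch under stated assumptions, not the paper's code: `candidates[t]` is assumed to hold the candidate vectors for step t (a fully observed step simply has one candidate), the cost is plain Euclidean sequence length, and the function name `shortest_sequence` is hypothetical.

```python
import math

def shortest_sequence(candidates):
    """Pick one candidate per step so that the summed Euclidean distance
    between consecutive picks is minimal. Returns (sequence, total_length)."""
    cost = [0.0] * len(candidates[0])   # best cost of reaching each first-step candidate
    back = []                           # back[t][i]: best predecessor of candidate i at step t+1
    for prev, cur in zip(candidates, candidates[1:]):
        new_cost, pointers = [], []
        for c in cur:
            # Best predecessor for candidate c among all candidates of the previous step.
            j = min(range(len(prev)), key=lambda k: cost[k] + math.dist(prev[k], c))
            new_cost.append(cost[j] + math.dist(prev[j], c))
            pointers.append(j)
        cost = new_cost
        back.append(pointers)
    # Trace the cheapest final candidate back to the first step.
    i = min(range(len(cost)), key=cost.__getitem__)
    total = cost[i]
    picks = [i]
    for pointers in reversed(back):
        i = pointers[i]
        picks.append(i)
    picks.reverse()
    return [candidates[t][k] for t, k in enumerate(picks)], total

# Toy lattice (assumed): step 0 is fully observed, steps 1 and 2 each have two modes.
candidates = [
    [[0.70]],
    [[0.72], [-0.72]],
    [[-0.75], [0.75]],
]
path, total = shortest_sequence(candidates)
print(path, total)  # the path stays on the positive branch: 0.70, 0.72, 0.75
```

The two nested loops over `cur` and `prev` give the K² factor per step; repeating them for each of the N − 1 transitions gives the O(N·K²) total.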

The authors validate the approach on two domains. The first is a synthetic toy problem where data are generated on a 2‑D manifold embedded in 3‑D space, with random missing entries. The DP‑selected sequence recovers the missing dimensions accurately while preserving smoothness. The second, more realistic, application is inverse kinematics of a six‑degree‑of‑freedom robot arm. Here the forward mapping from joint angles (hidden variables) to end‑effector position (observed) is single‑valued, but its inverse is inherently multivalued: several joint configurations can place the end effector at the same position. When some joint angles are missing, the proposed method recovers them with significantly lower mean‑square error than standard regression or neural‑network baselines, and it avoids physically implausible sudden joint jumps.
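To see why inverse kinematics is multivalued, consider a planar two‑link arm. This arm is chosen only because its inverse has a closed form; it is not the arm used in the paper, and the link lengths, target point, and function names below are assumptions made for illustration.

```python
import math

def forward(theta1, theta2, l1=1.0, l2=1.0):
    """End-effector position of a planar two-link arm (single-valued)."""
    x = l1 * math.cos(theta1) + l2 * math.cos(theta1 + theta2)
    y = l1 * math.sin(theta1) + l2 * math.sin(theta1 + theta2)
    return x, y

def inverse(x, y, l1=1.0, l2=1.0):
    """All (theta1, theta2) pairs that reach (x, y); generally two of them.

    The degenerate target at the base (x = y = 0 with l1 == l2), which has
    infinitely many solutions, is not handled specially in this sketch.
    """
    c2 = (x * x + y * y - l1 * l1 - l2 * l2) / (2.0 * l1 * l2)
    if abs(c2) > 1.0:
        return []                      # target out of reach
    t = math.acos(c2)
    solutions = []
    for theta2 in ([t] if t == 0.0 else [t, -t]):   # the two elbow configurations
        theta1 = math.atan2(y, x) - math.atan2(l2 * math.sin(theta2),
                                               l1 + l2 * math.cos(theta2))
        solutions.append((theta1, theta2))
    return solutions

solutions = inverse(1.0, 1.0)
for th1, th2 in solutions:
    print((th1, th2), forward(th1, th2))  # both configurations reach (1, 1)
```

A regression model fitted to such data would tend to average the two elbow configurations, and the average generally does not reach the target; keeping each solution as a separate candidate avoids this.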

Key contributions of the paper are:

(1) Mode‑based candidate generation from conditional densities, allowing natural handling of one‑to‑many relationships without collapsing them to a single expectation.
(2) Integration of manifold‑based vector redundancy (via GTM) with temporal continuity, yielding a principled probabilistic model of the data.
(3) Global sequence optimization using dynamic programming, which selects a coherent reconstruction across the entire sequence rather than independent pointwise estimates.

The work also discusses limitations. The number of modes per time step can grow, and since DP cost is quadratic in the number of candidates per step, scalable approximations or pruning strategies are needed for high‑dimensional data or long sequences. Moreover, the quality of the GTM model depends on sufficient training data and an appropriate choice of latent dimensionality. Nonetheless, the paper provides a solid framework for missing‑data reconstruction in settings where both multivalued mappings and temporal continuity are essential, with potential applications in speech processing, computer vision, robotics, and other domains where sequential sensor data are partially missing or corrupted.
