Unsupervised discovery of the shared and private geometry in multi-view data


Studying complex real-world phenomena often involves data from multiple views (e.g. sensor modalities or brain regions), each capturing different aspects of the underlying system. Within neuroscience, there is growing interest in large-scale simultaneous recordings across multiple brain regions. Understanding the relationship between views (e.g., the neural activity recorded in each region) can reveal fundamental insights into each view and the system as a whole. However, existing methods to characterize such relationships lack the expressivity required to capture nonlinear relationships, describe only shared sources of variance, or discard geometric information that is crucial to drawing insights from data. Here, we present SPLICE: a neural network-based method that infers disentangled, interpretable representations of private and shared latent variables from paired samples of high-dimensional views. Compared to competing methods, we demonstrate that SPLICE 1) disentangles shared and private representations more effectively, 2) yields more interpretable representations by preserving geometry, and 3) is more robust to incorrect a priori estimates of latent dimensionality. We propose our approach as a general-purpose method for finding succinct and interpretable descriptions of paired data sets in terms of disentangled shared and private latent variables.


💡 Research Summary

The paper introduces SPLICE, a novel unsupervised neural‑network framework for jointly extracting shared and private latent representations from paired high‑dimensional views while preserving the intrinsic geometry of each latent submanifold. The authors motivate the problem with examples from multimodal sensor fusion, image‑text alignment, and especially large‑scale simultaneous neural recordings across brain regions, where each region can be viewed as a distinct “view” of the same underlying brain state. Existing multi‑view methods either assume linear relationships (e.g., CCA, reduced‑rank regression), focus on generative modeling without explicit disentanglement, or discard geometric information that is crucial for scientific interpretation.

Problem formulation
Given paired samples $(x_A, x_B)$ from views $A$ and $B$, the generative model is assumed to be
$$x_A = g_A(s, z_A), \qquad x_B = g_B(s, z_B),$$
where $s$ denotes the shared latent variables and $z_A, z_B$ are the private latents of each view. The three latent groups are statistically independent, and the functions $g_A, g_B$ are nonlinear. The goal is to recover $g_A, g_B$ together with the three latent sets.
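To make this generative model concrete, here is a minimal numpy sketch of synthetic paired views. The latent dimensionalities, the tanh observation maps, and all variable names are illustrative assumptions, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1000

# One shared latent s, one private latent per view (dimensions are
# hypothetical; the model allows arbitrary latent dimensionalities).
s = rng.uniform(-1, 1, size=(N, 1))    # shared across both views
z_A = rng.uniform(-1, 1, size=(N, 1))  # private to view A
z_B = rng.uniform(-1, 1, size=(N, 1))  # private to view B

# Random nonlinear observation maps g_A, g_B into 20-D observations.
W_A = rng.normal(size=(2, 20))
W_B = rng.normal(size=(2, 20))

def g(latents, W):
    """Stand-in nonlinear map: linear mixing followed by tanh."""
    return np.tanh(latents @ W)

x_A = g(np.hstack([s, z_A]), W_A)  # view A depends only on (s, z_A)
x_B = g(np.hstack([s, z_B]), W_B)  # view B depends only on (s, z_B)
```

The key structural assumption is visible in the last two lines: each view sees the shared latent plus its own private latent, and neither view sees the other's private latent.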

SPLICE architecture – two‑step approach

  1. Step 1: Disentangling

    • Each view is processed by two encoders: one produces a private code ($F_A(x_A) = \hat z_A$, $F_B(x_B) = \hat z_B$), the other produces a shared code from the opposite view ($F_{B \to A}(x_B) = \hat s_{B \to A}$, $F_{A \to B}(x_A) = \hat s_{A \to B}$).
    • Reconstruction is performed cross-view: $\hat x_A = G_A(\hat s_{B \to A}, \hat z_A)$ and $\hat x_B = G_B(\hat s_{A \to B}, \hat z_B)$. This forces the shared code used for a view to come exclusively from the other view, preventing private information from leaking into the shared representation.
    • To prevent the opposite leakage (shared information entering the private codes), the authors adopt predictability minimization (Schmidhuber, 1992). Two auxiliary "measurement" networks $M_{B \to A}$ and $M_{A \to B}$ try to predict the opposite view's raw data from the current view's private code. The private encoders are trained to maximize this prediction error, i.e., to minimize the variance of the measurement networks' outputs: when the private code contains no shared information, the best prediction is the data mean, corresponding to zero mutual information.
    • The total loss for the autoencoder part combines the reconstruction MSE with the variance penalties, weighted by $\lambda_{\mathrm{dis}}$. The measurement networks are trained simultaneously to minimize their own prediction MSE, yielding an alternating, adversarial-style optimization (Algorithm 1). This adversarial term is distinct from a typical GAN because the "generator" (the encoder) tries to hide information rather than generate realistic samples.
  2. Step 2: Geometry preservation

    • After successful disentanglement, the model projects data onto each latent submanifold. For example, to isolate the private submanifold of view A, a random sample from view B provides a fixed shared code $\hat s_{B \to A}$; all training points are then decoded with this fixed shared code while their private codes vary, yielding a set of points that lie on the private submanifold in observation space.
    • Standard manifold-learning techniques (Isomap, LLE, etc.) are applied to these projected points to estimate geodesic distances $D^{\mathrm{geo}}$. A landmark-based scheme reduces the computational cost from $O(N^2)$ to $O(N \log N)$.
    • A geometry-preserving loss aligns the Euclidean distances in each latent space ($D^z$, $D^s$) with the estimated geodesic distances, weighted by $\lambda_{\mathrm{geo}}$; the final objective is $L_{\mathrm{SPLICE}} + \lambda_{\mathrm{geo}} \sum L_{\mathrm{geo}}$. During fine-tuning, the latent encoders are updated so that the learned embeddings faithfully reflect the intrinsic curvature of the underlying data manifolds, yielding interpretable structures such as circles or tori.
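The Step 1 objective can be sketched with linear stand-ins for the encoders, decoder, and measurement network (shown for view A only). Everything here, from the matrix shapes to the loss weighting, is an illustrative assumption rather than the paper's implementation; in practice these maps are neural networks updated in the alternating scheme of Algorithm 1:

```python
import numpy as np

rng = np.random.default_rng(1)
N, D, d_s, d_z = 256, 20, 2, 2  # hypothetical data and latent sizes

x_A = rng.normal(size=(N, D))
x_B = rng.normal(size=(N, D))

# Linear stand-ins for the trained networks (view A side only).
F_A = rng.normal(size=(D, d_z)) * 0.1        # private encoder of A
F_B_to_A = rng.normal(size=(D, d_s)) * 0.1   # shared encoder: B -> A
G_A = rng.normal(size=(d_s + d_z, D)) * 0.1  # decoder for view A
M_B_to_A = rng.normal(size=(d_z, D)) * 0.1   # measurement network

z_A_hat = x_A @ F_A        # private code of A
s_BA_hat = x_B @ F_B_to_A  # shared code inferred from B

# Cross-view reconstruction: the shared code for A comes only from B.
x_A_hat = np.hstack([s_BA_hat, z_A_hat]) @ G_A
L_rec = np.mean((x_A - x_A_hat) ** 2)

# Predictability-minimization penalty: the measurement network tries to
# predict x_B from z_A; the encoder is penalized by the variance of that
# prediction (zero variance = constant mean prediction = no leakage).
pred_B = z_A_hat @ M_B_to_A
L_dis = np.mean(np.var(pred_B, axis=0))

lam_dis = 1.0  # hypothetical weight for the disentangling term
L_splice = L_rec + lam_dis * L_dis
```

The symmetric terms for view B (reconstructing $x_B$ from $\hat s_{A \to B}$ and $\hat z_B$, with its own measurement network) would be added analogously, and the measurement networks themselves would be trained on their own prediction MSE in alternation.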

Key contributions and insights

  • Explicit disentanglement via cross‑view reconstruction and predictability minimization yields statistically independent shared and private latents, even when the assumed dimensionalities are misspecified.
  • Geometry preservation bridges the gap between deep representation learning and classical manifold analysis, allowing scientists to read off meaningful topological features directly from the latent space.
  • Robustness: The adversarial predictability term makes the method less sensitive to over‑ or under‑estimation of latent dimensions, a common failure mode of recent nonlinear multi‑view models that rely on implicit architectural biases.
  • Scalability: The two‑step pipeline isolates the expensive manifold‑learning step to a post‑hoc operation on projected points, keeping the overall training cost comparable to standard autoencoders.
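The geometry-preservation step (estimating geodesics on the projected points, then matching latent distances to them) can be sketched as follows. The noisy circle stands in for decoder outputs obtained with a fixed shared code, and the k-NN graph, landmark-free Isomap-style shortest paths, and 1-D latent are all illustrative assumptions:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import shortest_path
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(2)
n, k = 200, 8

# Hypothetical "projected points": a noisy circle embedded in 3-D,
# standing in for decodings with the shared code held fixed.
theta = np.sort(rng.uniform(0, 2 * np.pi, n))
pts = np.c_[np.cos(theta), np.sin(theta), 0.05 * rng.normal(size=n)]

# k-nearest-neighbor graph, then graph shortest paths (Isomap-style).
D_eucl = squareform(pdist(pts))
nbrs = np.argsort(D_eucl, axis=1)[:, 1:k + 1]
rows = np.repeat(np.arange(n), k)
cols = nbrs.ravel()
graph = csr_matrix((D_eucl[rows, cols], (rows, cols)), shape=(n, n))
D_geo = shortest_path(graph, directed=False)  # estimated geodesics

# Geometry-preserving loss: mean squared mismatch between Euclidean
# distances in a (hypothetical) 1-D latent embedding and the geodesics.
latent = theta[:, None]
D_lat = squareform(pdist(latent))
L_geo = np.mean((D_lat - D_geo) ** 2)
```

This dense all-pairs version is the $O(N^2)$ baseline; the landmark scheme described above would compute shortest paths only from a subset of anchor points to reach the stated $O(N \log N)$ cost.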

Experimental validation

  • Rotated MNIST: Paired data consist of an original digit (view A) and a randomly rotated version (view B). Shared information = digit identity; private information = rotation angle (only in view B). SPLICE perfectly isolates the rotation angle in the private latent of view B and learns a rotation‑invariant shared code. Competing methods either leak rotation into the shared code or fail to recover the angle accurately.
  • Neural recordings: Simultaneous recordings from two brain regions (e.g., visual cortex and prefrontal cortex) are treated as the two views. Known common stimulus features (e.g., visual stimulus orientation) are captured by the shared latent, while region‑specific firing patterns occupy the private latents. The learned shared manifold exhibits a circular geometry consistent with head‑direction encoding, while the private manifolds display region‑specific topologies (e.g., toroidal structure in entorhinal recordings). Traditional CCA or reduced‑rank regression miss these nonlinear structures.
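One way leakage of the kind tested on rotated MNIST could be quantified (a hedged sketch, not necessarily the paper's metric) is to regress the known private factor on the shared code: if the shared representation is truly rotation-invariant, a linear readout of the rotation angle should achieve an $R^2$ near zero. All names and sizes below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(3)
n, d_s = 500, 4  # hypothetical sample count and shared-code dimension

# Stand-in shared codes drawn independently of the angle, mimicking a
# perfectly rotation-invariant shared representation.
angle = rng.uniform(0, 2 * np.pi, size=n)
shared = rng.normal(size=(n, d_s))

# Least-squares linear readout of the angle with an intercept term.
X = np.c_[shared, np.ones(n)]
coef, *_ = np.linalg.lstsq(X, angle, rcond=None)
resid = angle - X @ coef
r2 = 1 - resid @ resid / np.sum((angle - angle.mean()) ** 2)
```

A shared code that leaks rotation information would instead yield a clearly positive $R^2$ under the same readout.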

Limitations and future directions

  • The current formulation handles exactly two views; extending to more than two would require additional cross‑encoding pathways and a more complex adversarial scheme.
  • Training stability of the measurement networks can be challenging; the authors suggest possible regularization (spectral normalization, gradient penalty) to improve convergence.
  • Landmark‑based geodesic estimation, while faster than full pairwise computation, still scales with the number of landmarks; scalable approximations (e.g., stochastic neighbor graphs) could further reduce runtime for very large datasets.

Conclusion
SPLICE offers a principled, unsupervised solution for disentangling shared and private latent factors while preserving the geometric essence of each factor’s manifold. By combining cross‑view autoencoding, predictability minimization, and post‑hoc manifold alignment, it delivers representations that are both statistically independent and geometrically interpretable. The method is especially promising for neuroscience, where understanding how different brain regions jointly encode information requires exactly this blend of nonlinear disentanglement and geometry preservation.

