Latent Diffusion-Based 3D Molecular Recovery from Vibrational Spectra
Infrared (IR) spectroscopy, a type of vibrational spectroscopy, is widely used for molecular structure determination and provides critical structural information for chemists. However, existing approaches for recovering molecular structures from IR spectra typically rely on one-dimensional SMILES strings or two-dimensional molecular graphs, which fail to capture the intricate relationship between spectral features and three-dimensional molecular geometry. Recent advances in diffusion models have greatly enhanced the ability to generate molecular structures in 3D space. Yet, no existing model has explored the distribution of 3D molecular geometries corresponding to a single IR spectrum. In this work, we introduce IR-GeoDiff, a latent diffusion model that recovers 3D molecular geometries from IR spectra by integrating spectral information into both node and edge representations of molecular structures. We evaluate IR-GeoDiff from both spectral and structural perspectives, demonstrating its ability to recover the molecular distribution corresponding to a given IR spectrum. Furthermore, an attention-based analysis reveals that the model is able to focus on characteristic functional group regions in IR spectra, qualitatively consistent with common chemical interpretation practices.
💡 Research Summary
The paper introduces IR‑GeoDiff, a conditional latent diffusion model that recovers three‑dimensional molecular geometries directly from infrared (IR) spectra. The authors argue that existing spectrum‑to‑structure approaches rely on one‑dimensional SMILES strings or two‑dimensional molecular graphs, which discard the spatial information intrinsic to vibrational spectra. To bridge this gap, IR‑GeoDiff integrates spectral cues into both the node (atom) and edge (bond) representations of a molecule and generates 3D coordinates conditioned on the spectrum.
The problem is formally defined as learning the conditional distribution pθ(x | S, h), where x denotes the atomic coordinates, S is the IR spectrum, and h (the atom types, and hence the atom count) is assumed known. This is a realistic assumption, because elemental analysis or complementary techniques usually provide the molecular formula before IR interpretation. The model therefore focuses on reconstructing the geometry given the spectrum and composition.
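Because h is given, sampling from pθ(x | S, h) means drawing only coordinates, so one spectrum and one formula can yield many candidate geometries. The minimal sketch below illustrates this with standard DDPM ancestral sampling. It is not the authors' sampler: `eps_model` is a hypothetical placeholder for the trained noise predictor, the schedule and step count are arbitrary, and for brevity it samples coordinates directly instead of a latent that a decoder would map back to x.

```python
import numpy as np

T = 50                                   # hypothetical number of diffusion steps
betas = np.linspace(1e-4, 0.05, T)       # hypothetical linear noise schedule
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)

def remove_com(x):
    """Project per-atom 3-D vectors onto the zero-center-of-mass subspace."""
    return x - x.mean(axis=0, keepdims=True)

def eps_model(z_t, t, spectrum, atom_types):
    """Hypothetical placeholder for a trained, spectrum-conditioned noise predictor."""
    return np.zeros_like(z_t)

def sample_geometries(spectrum, atom_types, n_samples, rng):
    """Draw n_samples coordinate sets for one spectrum; atom_types (h) stay fixed."""
    n_atoms = len(atom_types)
    samples = []
    for _ in range(n_samples):
        z = remove_com(rng.normal(size=(n_atoms, 3)))  # start from zero-CoM Gaussian noise
        for t in range(T - 1, -1, -1):                 # reverse process, 0-indexed steps
            eps_hat = eps_model(z, t, spectrum, atom_types)
            # DDPM posterior mean: (z - beta_t / sqrt(1 - alpha_bar_t) * eps_hat) / sqrt(alpha_t)
            mean = (z - betas[t] / np.sqrt(1.0 - alpha_bar[t]) * eps_hat) / np.sqrt(alphas[t])
            if t > 0:
                z = mean + np.sqrt(betas[t]) * remove_com(rng.normal(size=z.shape))
            else:
                z = mean
        samples.append(z)
    return np.stack(samples)  # (n_samples, n_atoms, 3)

# Usage: ethanol (C2H6O) has 9 atoms; the spectrum is a dummy placeholder.
atom_types = ["C", "C", "O", "H", "H", "H", "H", "H", "H"]
spectrum = np.zeros(1800)
geoms = sample_geometries(spectrum, atom_types, n_samples=4, rng=np.random.default_rng(0))
print(geoms.shape)  # (4, 9, 3)
```

The atom list is only used for its length and as conditioning; it is never noised or resampled, which is what fixing h in pθ(x | S, h) amounts to.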
The architecture consists of two main components. First, a Transformer‑based spectral classifier τθ extracts high‑level features from the raw 1‑D IR spectrum, using a patch‑embedding scheme that captures local peak patterns. Second, a latent diffusion framework built upon GeoLDM (Geometric Latent Diffusion Model) encodes the molecular geometry into a latent vector z_x while keeping the atom‑type latent z_h fixed. During the forward diffusion process, Gaussian noise is added only to z_x across T timesteps, preserving roto‑translational equivariance. The reverse denoising network εθ is an E(3)‑equivariant graph neural network that receives the noised latent z_{x,t}, the timestep t, the spectral features from τθ, and the fixed z_h. Spectral information is injected via cross‑attention applied to both node and edge embeddings, allowing the model to learn how specific spectral regions (e.g., C=O, N‑H, and O‑H stretches) influence bond lengths and angles.
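To make the conditioning path more concrete, here is a minimal NumPy sketch of ViT‑style patch embedding of a 1‑D spectrum followed by single‑head cross‑attention from atom (node) and bond (edge) features to the spectral tokens. The helper names (`patch_embed`, `cross_attention`, `init`), all sizes, the random weights, and the additive construction of edge features are hypothetical illustrations, not IR‑GeoDiff's actual layers.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def patch_embed(spectrum, patch_size, w_patch):
    """Split a 1-D spectrum into non-overlapping patches and linearly project each one."""
    n_patches = spectrum.shape[0] // patch_size
    patches = spectrum[: n_patches * patch_size].reshape(n_patches, patch_size)
    return patches @ w_patch  # (n_patches, d_model)

def cross_attention(queries, tokens, w_q, w_k, w_v):
    """Let each query (an atom or a bond) attend over the spectral patch tokens."""
    q, k, v = queries @ w_q, tokens @ w_k, tokens @ w_v
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)  # (n_queries, n_patches)
    return queries + weights @ v, weights  # residual update of the invariant features

def init(shape):
    """Hypothetical random initialization scaled by fan-in."""
    return rng.normal(scale=shape[0] ** -0.5, size=shape)

# Hypothetical sizes: 1800-point spectrum, 50-point patches, 16-dim features, 5 atoms.
spec_len, patch_size, d, n_atoms = 1800, 50, 16, 5
spectrum = rng.random(spec_len)
tokens = patch_embed(spectrum, patch_size, init((patch_size, d)))

node_feats = rng.normal(size=(n_atoms, d))
# Hypothetical edge features for every ordered atom pair (i, j), i != j.
pairs = [(i, j) for i in range(n_atoms) for j in range(n_atoms) if i != j]
edge_feats = np.stack([node_feats[i] + node_feats[j] for i, j in pairs])

# Separate projection weights for the node branch and the edge branch.
node_out, node_w = cross_attention(node_feats, tokens, init((d, d)), init((d, d)), init((d, d)))
edge_out, edge_w = cross_attention(edge_feats, tokens, init((d, d)), init((d, d)), init((d, d)))

print(tokens.shape, node_out.shape, edge_out.shape)  # (36, 16) (5, 16) (20, 16)
```

Only invariant node and edge features are updated here, and coordinates are untouched. Since an IR spectrum does not change under rotation or translation of the molecule, this is one common way to add conditioning to an E(3)‑equivariant GNN without breaking its equivariance. The rows of `node_w` and `edge_w` are the kind of attention maps an attention‑based analysis of spectral regions would inspect.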
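Training minimizes the standard diffusion loss in latent space, $\mathcal{L} = \mathbb{E}_{z_x,\, \epsilon \sim \mathcal{N}(0, I),\, t}\big[\lVert \epsilon - \epsilon_\theta(z_{x,t}, t, \tau_\theta(S), z_h) \rVert^2\big]$, where $\epsilon$ is the Gaussian noise added to $z_x$ to produce $z_{x,t}$.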
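A minimal NumPy sketch of one Monte Carlo estimate of this latent training objective follows. It assumes, as in GeoLDM, that z_x holds one 3‑D vector per atom and that its noise is projected onto the zero‑center‑of‑mass subspace. The linear β schedule, T = 1000, and the `toy_denoiser` stand‑in for εθ are hypothetical. Only z_x is noised, while z_h and the spectral features enter purely as conditioning.

```python
import numpy as np

rng = np.random.default_rng(0)

T = 1000                                  # hypothetical number of diffusion steps
betas = np.linspace(1e-4, 2e-2, T)        # hypothetical linear noise schedule
alpha_bar = np.cumprod(1.0 - betas)       # alpha_bar[t - 1] is the cumulative product at step t

def remove_com(x):
    """Project per-atom 3-D vectors onto the zero-center-of-mass subspace."""
    return x - x.mean(axis=0, keepdims=True)

def q_sample(z_x, t, eps):
    """Forward process: noise only the geometric latent z_x at step t (1-indexed)."""
    a = alpha_bar[t - 1]
    return np.sqrt(a) * z_x + np.sqrt(1.0 - a) * eps

def toy_denoiser(z_xt, t, spec_feat, z_h):
    """Hypothetical stand-in for eps_theta; the real network is an E(3)-equivariant GNN."""
    return np.zeros_like(z_xt)

def diffusion_loss(z_x, z_h, spec_feat, denoiser):
    """One Monte Carlo estimate of the epsilon-prediction loss."""
    t = int(rng.integers(1, T + 1))                 # t ~ Uniform{1, ..., T}
    eps = remove_com(rng.normal(size=z_x.shape))    # zero-CoM Gaussian noise
    z_xt = q_sample(z_x, t, eps)                    # z_h is never noised
    eps_hat = denoiser(z_xt, t, spec_feat, z_h)     # spectrum and z_h act only as conditioning
    return float(np.mean((eps - eps_hat) ** 2)), z_xt

# Usage with random placeholders: 5 atoms, 3-D geometric latent, 8-dim atom-type latent.
z_x = remove_com(rng.normal(size=(5, 3)))
z_h = rng.normal(size=(5, 8))
spec_feat = rng.normal(size=(36, 16))
loss, z_xt = diffusion_loss(z_x, z_h, spec_feat, toy_denoiser)
print(np.allclose(z_xt.mean(axis=0), 0.0))  # True
```

The squared norm is averaged rather than summed, which only rescales the loss. Keeping both z_x and the noise at zero center of mass removes the translational degree of freedom, so the noised latent stays in the same subspace throughout the forward process.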