MEIDNet: Multimodal generative AI framework for inverse materials design

Notice: This research summary and analysis were automatically generated using AI technology. For full accuracy, please refer to the original arXiv source.

In this work, we present the Multimodal Equivariant Inverse Design Network (MEIDNet), a framework that jointly learns structural information and materials properties through contrastive learning, while encoding structures via an equivariant graph neural network (EGNN). By combining generative inverse design with multimodal learning, our approach accelerates the exploration of chemical-structural space and facilitates the discovery of materials that satisfy predefined property targets. MEIDNet exhibits strong latent-space alignment (cosine similarity 0.96) by fusing three modalities through cross-modal learning. By employing curriculum learning strategies, MEIDNet achieves roughly 60-fold higher learning efficiency than conventional training techniques. The potential of our multimodal approach is demonstrated by generating low-bandgap perovskite structures at a stable, unique, and novel (SUN) rate of 13.6%, which are further validated by ab initio methods. Our inverse design framework demonstrates both scalability and adaptability, paving the way for the universal learning of chemical space across diverse modalities.


💡 Research Summary

MEIDNet (Multimodal Equivariant Inverse Design Network) is a novel generative AI framework that simultaneously incorporates three distinct modalities—crystal structure, electronic bandgap, and thermodynamic formation enthalpy—into a unified latent space. The structural encoder is built on an E(3)-equivariant graph neural network (EGNN) that respects rotations, reflections, and node permutations, producing a 128‑dimensional embedding that balances reconstruction fidelity and computational cost. Scalar properties are projected into the same dimensionality by lightweight multilayer perceptrons (MLPs).
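The core idea, an encoder whose node updates depend only on E(3)-invariant quantities plus small MLP projectors for the scalar properties, can be sketched in a few lines. This is an illustrative single-layer toy (the paper's actual architecture, feature sizes, and weights are not reproduced here); only the 128-dimensional shared latent size is taken from the summary, and all weight matrices and the 1.2 eV example bandgap are made-up placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 128  # shared latent dimensionality reported in the summary

def egnn_layer(h, x, W_e, W_h):
    """One simplified EGNN-style message-passing step (illustrative).

    h: (N, F) node features, x: (N, 3) Cartesian coordinates.
    Messages depend on node features and squared pairwise distances only,
    so the output is invariant to rotations, reflections, and translations;
    sum-pooling over neighbors also makes it permutation-invariant.
    """
    N, F = h.shape
    d2 = ((x[:, None, :] - x[None, :, :]) ** 2).sum(-1, keepdims=True)  # (N, N, 1)
    pair = np.concatenate(
        [np.broadcast_to(h[:, None, :], (N, N, F)),
         np.broadcast_to(h[None, :, :], (N, N, F)),
         d2], axis=-1)
    m = np.tanh(pair @ W_e)                      # edge messages, (N, N, F)
    agg = m.sum(axis=1)                          # aggregate over neighbors
    return np.tanh(np.concatenate([h, agg], axis=-1) @ W_h)

# Toy "crystal": 4 atoms with 8 raw features each (placeholder numbers).
h = rng.normal(size=(4, 8))
x = rng.normal(size=(4, 3))
W_e = 0.1 * rng.normal(size=(2 * 8 + 1, 8))
W_h = 0.1 * rng.normal(size=(2 * 8, 8))
W_out = 0.1 * rng.normal(size=(8, D))            # readout into the shared space

z_struct = egnn_layer(h, x, W_e, W_h).mean(axis=0) @ W_out   # (128,)

# Invariance check: rotating the coordinates leaves the embedding unchanged.
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
z_rot = egnn_layer(h, x @ Q, W_e, W_h).mean(axis=0) @ W_out
assert np.allclose(z_struct, z_rot)

# A scalar property (here a hypothetical 1.2 eV bandgap) is lifted into the
# same 128-d space by a lightweight MLP projector.
W1, W2 = 0.1 * rng.normal(size=(1, 64)), 0.1 * rng.normal(size=(64, D))
z_prop = np.tanh(np.array([[1.2]]) @ W1) @ W2    # (1, 128)
```

The invariance assertion passes because the only geometric input to the messages is the squared interatomic distance, which any rotation or reflection preserves.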

Alignment of the three modality embeddings is achieved through contrastive learning using the InfoNCE loss. To avoid the common “modal collapse” and to accelerate convergence, the authors introduce a curriculum learning schedule that linearly ramps the contrastive loss weight from zero to one over the first two‑thirds of training epochs. This schedule yields a roughly 60‑fold speed‑up compared with conventional training while preserving high structure‑matching (SM) rates.
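The two ingredients above, an InfoNCE loss over paired embeddings and a weight that ramps linearly over the first two-thirds of training, can be sketched as follows. The temperature `tau`, batch size, and 300-epoch horizon are illustrative assumptions, not values from the paper.

```python
import numpy as np

def info_nce(z_a, z_b, tau=0.1):
    """InfoNCE over a batch of paired embeddings (row i of z_a matches row i of z_b)."""
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / tau                            # (B, B) cosine similarities
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))                    # positives sit on the diagonal

def contrastive_weight(epoch, total_epochs):
    """Linear ramp from 0 to 1 over the first two-thirds of training, then flat."""
    return min(1.0, epoch / (2 * total_epochs / 3))

rng = np.random.default_rng(1)
z_struct = rng.normal(size=(16, 128))
z_prop = z_struct + 0.05 * rng.normal(size=(16, 128))     # well-aligned pairs

aligned = info_nce(z_struct, z_prop)
shuffled = info_nce(z_struct, z_prop[::-1])               # mismatched pairs
assert aligned < shuffled                                 # aligned pairs score lower loss

# The curriculum starts purely reconstructive and phases the contrastive term in:
#   total_loss = recon_loss + contrastive_weight(epoch, total_epochs) * info_nce(...)
assert contrastive_weight(0, 300) == 0.0
assert contrastive_weight(100, 300) == 0.5
assert contrastive_weight(250, 300) == 1.0
```

Starting at weight zero lets the structural autoencoder settle before the contrastive term begins pulling the modality embeddings together, which is the mechanism the authors credit for avoiding modal collapse.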

Three fusion strategies are evaluated: late fusion (LF), early fusion (EF), and early fusion combined with curriculum learning (EF+CL):

| Strategy | Structure matching (SM) | Cosine similarity (CS) | L2 distance |
|----------|------------------------|------------------------|-------------|
| LF       | ~66 %                  | ≈0.31                  | ≈1.06       |
| EF       | ~32 %                  | ≈0.89                  | ≈0.45       |
| EF+CL    | ≈66 %                  | ≈0.91                  | ≈0.40       |

LF reconstructs structures well but aligns the latent space poorly; EF improves alignment at the cost of structure matching; EF+CL delivers the best of both, demonstrating that early integration of modalities, combined with a curriculum that first focuses on structural learning, mitigates feature interference.

With a well‑aligned latent space, the authors implement an inverse‑design pipeline. Target properties (low bandgap of 0.8–1.5 eV and negative formation enthalpy) are encoded as constraints, and gradient‑based optimization navigates the latent vector toward these targets. The optimized latent code is decoded back into a crystal structure, which is first screened for thermodynamic stability using the eSEN‑30M‑OAM machine‑learning potential (energy above the convex hull < 100 meV/atom). Out of 140 generated perovskite candidates, 19 are both stable and novel (i.e., not present in the training set), yielding a SUN (Stable‑Unique‑Novel) rate of 13.6 %. This surpasses FTCP (< 5 %), is comparable to CDVAE (≈ 18 %), and approaches the performance of state‑of‑the‑art single‑modality generators such as MatterGen (~39 %).
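The latent-space search in this pipeline can be sketched with a toy example: descend the gradient of a squared error between a surrogate property prediction and the target, then apply the stability cut. Everything here except the 100 meV/atom hull threshold and the 0.8–1.5 eV target window is a made-up stand-in; in particular, the linear property head `W` replaces whatever learned predictor the real framework uses, and no decoding step is shown.

```python
import numpy as np

rng = np.random.default_rng(3)
D = 128
W = 0.1 * rng.normal(size=D)   # stand-in linear property head (hypothetical)

def predict_bandgap(z):
    """Surrogate mapping from a latent vector to a predicted bandgap (illustrative)."""
    return float(z @ W)

def optimize_latent(z0, target=1.15, lr=0.05, steps=300):
    """Gradient descent on (prediction - target)^2 in latent space.

    target=1.15 eV sits in the middle of the 0.8-1.5 eV window from the summary.
    """
    z = z0.copy()
    for _ in range(steps):
        err = z @ W - target
        z -= lr * 2.0 * err * W          # analytic gradient of the squared error
    return z

z = optimize_latent(rng.normal(size=D), target=1.15)
assert abs(predict_bandgap(z) - 1.15) < 1e-6

# The decoded candidate would then be screened for stability; schematically:
def passes_stability_screen(energy_above_hull_ev_per_atom):
    """Keep candidates within 100 meV/atom of the convex hull (per the summary)."""
    return energy_above_hull_ev_per_atom < 0.100

assert passes_stability_screen(0.042)
assert not passes_stability_screen(0.250)
```

In the real pipeline the gradient flows through the learned property predictors and the decoder, and the hull energies come from the eSEN-30M-OAM potential rather than being handed in as numbers.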

Bandgap predictions for the generated structures, evaluated with a pretrained CGCNN, show a mean absolute error of ~0.02 eV, comparable to leading property predictors. However, DFT calculations at the PBE level tend to underestimate the target bandgap, a discrepancy attributed to (i) a slight bias toward smaller gaps in the training data, (ii) limited coverage of novel compositions, and (iii) the known underestimation of bandgaps by PBE without +U or spin‑orbit coupling.

The paper also demonstrates the generality of the EGNN encoder by testing on two additional datasets—MP‑20 and Carbon‑24—where MEIDNet outperforms prior unimodal autoencoders (FTCP, CDVAE) in structure‑matching rates.

In summary, MEIDNet advances materials inverse design by: (1) leveraging equivariant graph networks to preserve physical symmetries, (2) employing contrastive learning with a curriculum to efficiently align multimodal embeddings, (3) showing that early fusion with curriculum yields superior reconstruction and alignment, (4) providing a complete inverse‑design workflow that translates target properties into novel, stable crystal structures, and (5) achieving a record SUN rate for multimodal generative models without post‑hoc filtering. The work underscores the promise of multimodal learning as a universal platform for exploring the vast chemical‑structural space, and it sets the stage for future extensions that could incorporate textual descriptions, spectroscopic data, or experimental images into a unified materials‑design latent space.

