DistMLIP: A Distributed Inference Platform for Machine Learning Interatomic Potentials

Notice: This research summary and analysis were generated automatically using AI. For authoritative details, please refer to the original arXiv source.

Large-scale atomistic simulations are essential to bridge computational materials science and chemistry to realistic materials and drug discovery applications. In the past few years, the rapid development of machine learning interatomic potentials (MLIPs) has offered a way to scale up quantum mechanical calculations. Parallelizing these interatomic potentials across multiple devices is a challenging but promising approach to further extending simulation scales toward real-world applications. In this work, we present DistMLIP, an efficient distributed inference platform for MLIPs based on zero-redundancy, graph-level parallelization. In contrast to conventional spatial partitioning, DistMLIP enables efficient MLIP parallelization through graph partitioning, allowing multi-device inference on flexible MLIP model architectures such as multi-layer graph neural networks. DistMLIP presents an easy-to-use, flexible, plug-in interface that enables distributed inference of pre-existing MLIPs. We demonstrate DistMLIP on four widely used, state-of-the-art MLIPs: CHGNet, MACE, TensorNet, and eSEN. We show that DistMLIP can simulate atomic systems 3.4x larger and up to 8x faster than previous multi-GPU methods, and that existing foundation potentials can perform near-million-atom calculations in a few seconds on 8 GPUs with DistMLIP.


💡 Research Summary

DistMLIP is a distributed inference platform designed to scale machine‑learning interatomic potentials (MLIPs) across multiple GPUs without requiring any changes to the underlying model architecture. The authors identify the fundamental limitation of existing multi‑GPU solutions, which rely on spatial partitioning (as implemented in LAMMPS). Spatial partitioning pads each sub‑domain with “ghost” atoms to account for interactions across cell boundaries, leading to massive redundant computation and memory usage, especially for long‑range graph neural network (GNN) based MLIPs.

To overcome this, DistMLIP adopts a graph‑level partitioning strategy. An atomic system is first converted into a graph where nodes represent atoms and edges encode pairwise distances and chemical species. The graph is then split vertically (along the longest cell dimension) into disjoint partitions. For each partition i, a “border” set of 1‑hop neighbor nodes (H_i) that have edges pointing into the partition is identified, and the sub‑graph G′_i is constructed by augmenting the local node set G_i with these border nodes. This ensures that every message‑passing layer has all required incoming information locally.
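The partitioning step described above can be sketched in a few lines. The sketch below is illustrative only (the function name and data layout are assumptions, not DistMLIP's actual API): it slices atoms into slabs along the longest axis and collects each slab's 1-hop border nodes, i.e. off-partition sources of edges that point into the slab.

```python
import numpy as np

def partition_with_halo(positions, edges, n_parts):
    """Split atoms into n_parts slabs along the longest axis and
    augment each slab with its 1-hop border ("halo") nodes.

    positions: (N, 3) array of atom coordinates
    edges: (E, 2) array of directed edges (src, dst) from the neighbor graph
    Returns a list of (local_ids, halo_ids) per partition.
    """
    # The bounding box's longest axis stands in for the longest cell dimension.
    axis = int(np.ptp(positions, axis=0).argmax())
    # Assign each atom to a slab by its rank along that axis.
    order = positions[:, axis].argsort()
    part_of = np.empty(len(positions), dtype=int)
    for p, chunk in enumerate(np.array_split(order, n_parts)):
        part_of[chunk] = p

    partitions = []
    for p in range(n_parts):
        local = np.flatnonzero(part_of == p)
        # Border nodes: sources of edges whose destination is local
        # but which themselves live on another partition.
        incoming = edges[part_of[edges[:, 1]] == p]
        halo = np.unique(incoming[part_of[incoming[:, 0]] != p, 0])
        partitions.append((local, halo))
    return partitions
```

Because every edge into a local node then has its source either local or in the halo, one message-passing layer can run on each sub-graph without further communication.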

Data exchange between GPUs is minimized by classifying node IDs into three buckets: TO, FROM, and PURE. For partition i, TO_i holds the local nodes whose features other devices need, FROM_i holds the border nodes whose features are received from other devices, and PURE_i holds the nodes that participate only in local computation. Inter-GPU communication is thereby restricted to the TO/FROM sets.
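A minimal sketch of that bucketing, assuming the summary's TO/FROM/PURE semantics (the function and field names here are illustrative, not DistMLIP's actual interface): a local node goes into TO if it appears in some other partition's halo, the halo itself is FROM, and the remainder is PURE.

```python
def to_from_pure(partitions):
    """Bucket each partition's node IDs into TO / FROM / PURE.

    partitions: list of (local_ids, halo_ids) per device, where halo_ids
    are the 1-hop border nodes that partition receives from elsewhere.
    """
    buckets = []
    for i, (local, halo) in enumerate(partitions):
        # Nodes that show up in any other partition's halo must be sent out.
        needed_elsewhere = set()
        for j, (_, other_halo) in enumerate(partitions):
            if j != i:
                needed_elsewhere |= set(other_halo)
        to = sorted(set(local) & needed_elsewhere)
        pure = sorted(set(local) - set(to))
        buckets.append({"TO": to, "FROM": sorted(halo), "PURE": pure})
    return buckets
```

Only the TO/FROM features cross the device boundary; PURE nodes never leave their GPU, which is what keeps the scheme zero-redundancy compared with ghost-atom replication.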

