A Log-Probability-Based Unified Model for Phase Classification and Defect Quantification in Noisy Atomistic Simulations

📝 Abstract

Atomistic simulations generate large volumes of noisy structural data, but extracting phase labels, order parameters (OPs), and defect information in a way that is universal, robust, and interpretable remains challenging. Existing tools such as PTM and CNA are restricted to a small set of hand-crafted lattices (e.g., FCC/BCC/HCP), degrade under strong thermal disorder or defects, and produce hard, template-based labels without per-atom probability or confidence scores. Here we introduce a log-probability foundation model that unifies denoising, phase classification, and OP extraction within a single probabilistic framework. We reuse the MACE-MP foundation interatomic potential on crystal structures mapped to AFLOW prototypes, training it to predict per-atom, per-phase logits $l_{ac}$ and to aggregate them into a global log-density $\log P_\theta(r)$ whose gradient defines a conservative score field. Denoising corresponds to gradient ascent on this learned log-density, phase labels follow from $\arg\max_c l_{ac}$, and the $l_{ac}$ values act as continuous, defect-sensitive, and interpretable OPs quantifying the Euclidean distance to ideal phases. We demonstrate universality across hundreds of prototypes, robustness under strong thermal and defect-induced disorder, and accurate treatment of complex systems such as ice polymorphs, ice-water interfaces, and shock-compressed Ti.
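The chain from logits to log-density, score, denoising, and labels can be illustrated with a minimal toy sketch in Python. Everything in it is an assumption made for illustration and is not the paper's model: each atom is reduced to a single scalar feature, each phase to a hypothetical ideal value in `IDEAL`, the logits are negative scaled squared distances, and the global log-density is taken to be a sum of per-atom log-sum-exp terms. The actual model operates on atomic coordinates through MACE-MP, and its aggregation rule is not given in this summary.

```python
import math

# Hypothetical ideal feature values for three toy "phases" (illustration only).
IDEAL = {"A": 1.0, "B": 2.0, "C": 3.5}
SIGMA = 0.3  # assumed width of each phase basin


def logits(x):
    """Per-phase logits for one atom: negative scaled squared distance."""
    return {c: -((x - mu) ** 2) / (2 * SIGMA**2) for c, mu in IDEAL.items()}


def log_density(xs):
    """Toy global log-density: sum over atoms of log-sum-exp over phases."""
    total = 0.0
    for x in xs:
        l = logits(x)
        m = max(l.values())
        total += m + math.log(sum(math.exp(v - m) for v in l.values()))
    return total


def score(x):
    """Analytic gradient of the toy log-density w.r.t. one atom's feature."""
    l = logits(x)
    m = max(l.values())
    w = {c: math.exp(v - m) for c, v in l.items()}
    z = sum(w.values())
    return sum(w[c] / z * (IDEAL[c] - x) / SIGMA**2 for c in IDEAL)


def denoise(xs, step=0.02, n_steps=200):
    """Denoising as plain gradient ascent on the toy log-density."""
    xs = list(xs)
    for _ in range(n_steps):
        xs = [x + step * score(x) for x in xs]
    return xs


def classify(x):
    """Phase label as the argmax over per-phase logits."""
    l = logits(x)
    return max(l, key=l.get)


noisy = [1.2, 1.8, 3.1, 0.7]
clean = denoise(noisy)
labels = [classify(x) for x in clean]
```

Because the score here is the exact gradient of a scalar function, the field is conservative by construction, which is the property the abstract attributes to the learned model. In this toy the logit itself measures how far an atom sits from each ideal value, loosely mirroring the role of the logits as continuous OPs.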

📄 Content

Atomistic simulations are central tools for studying solid-solid and solid-liquid phase transitions, defect formation, and microstructural evolution in materials [1,2,3,4]. Advances in first-principles calculations, machine-learning interatomic potentials (MLIPs), and high-performance computing now enable routine multi-million-atom simulations over long timescales. However, extracting physical insight from such datasets still hinges on two challenging analysis tasks: (i) assigning crystalline phase labels to individual atoms, and (ii) defining continuous order parameters (OPs) that quantify the degree of structural order and track phase transformations. For realistic, thermally perturbed configurations with defects, surfaces, grain boundaries, or partial melting, systematic and universally applicable tools for these tasks are still lacking.

Significant progress has been made on crystal structure classification for ideal or weakly perturbed unit cells. The Curtarolo group, for example, has curated the AFLOW Encyclopedia of structural prototypes [5,6,7,8] and developed tools such as XtalFinder [9], which efficiently match relaxed primitive cells to known prototypes. For large-scale atomistic configurations, a range of local structural descriptors is widely used, including common neighbor analysis (CNA) [10], bond-orientational OPs [11,12], centrosymmetry analysis [13], and polyhedral template matching (PTM) [14]. These methods are highly effective for a handful of well-studied lattices such as BCC, FCC, and HCP, and have become standard in analysis packages like OVITO [15]. Yet, they typically rely on hand-crafted geometric thresholds and domain-specific heuristics, limiting their transferability to complex or less common prototypes. Under strong thermal distortions, disorder, or coexistence of multiple phases, they often mislabel atoms or return ambiguous classifications [16].

Continuous OPs provide complementary scalar measures of structural order. Classical examples include Steinhardt-type bond-order parameters and related metrics for liquid-solid transitions [11,12]. However, unlike the AFLOW prototype catalog for crystal structures, no analogous, systematic “encyclopedia” of OPs exists. Instead, OPs are typically designed on a case-by-case basis, tailored to particular polymorphs or specific transitions (e.g., FCC/BCC/HCP). This lack of a general framework hinders automated analysis of large, heterogeneous datasets and complicates thermodynamic characterization of complex phase behavior.
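As a concrete example of a classical hand-designed OP, the sketch below computes the local Steinhardt parameter $q_l$ for one atom from its neighbor bond vectors. It is a minimal illustration, not code from the paper: the function names are hypothetical, the neighbor list is assumed to be given, and the spherical-harmonic sum is replaced by the equivalent Legendre form $q_l^2 = \frac{1}{N^2}\sum_{i,j} P_l(\cos\theta_{ij})$ that follows from the addition theorem.

```python
import math


def legendre(l, x):
    """Legendre polynomial P_l(x) via the Bonnet recurrence."""
    p_prev, p = 1.0, x
    if l == 0:
        return p_prev
    for n in range(1, l):
        p_prev, p = p, ((2 * n + 1) * x * p - n * p_prev) / (n + 1)
    return p


def steinhardt_q(l, neighbors):
    """Local q_l from neighbor bond vectors (any length, nonzero)."""
    units = []
    for v in neighbors:
        norm = math.sqrt(sum(c * c for c in v))
        units.append(tuple(c / norm for c in v))
    total = 0.0
    for u in units:
        for w in units:
            # Clamp to guard against rounding slightly outside [-1, 1].
            cos = max(-1.0, min(1.0, sum(a * b for a, b in zip(u, w))))
            total += legendre(l, cos)
    return math.sqrt(max(total, 0.0)) / len(units)


# Twelve nearest-neighbor directions of an ideal FCC site.
fcc = [p for s1 in (1, -1) for s2 in (1, -1)
       for p in ((s1, s2, 0), (s1, 0, s2), (0, s1, s2))]

q4_fcc = steinhardt_q(4, fcc)
q6_fcc = steinhardt_q(6, fcc)
```

For the ideal FCC shell this yields $q_4 \approx 0.191$ and $q_6 \approx 0.575$. Which $l$ values to use, which neighbor cutoff to apply, and where to place thresholds all have to be chosen per system, which is the case-by-case design burden the paragraph above describes.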

Machine learning (ML) offers an attractive path toward more general structure characterization. Early work combined symmetry-invariant descriptors (e.g., SOAP, bispectrum) with neural networks to classify crystal structures or detect phase transitions [17,18,19,20,21,22,23,24,25,26,27,28,29,30,31]. In our previous work [16,32], we adapted score-based diffusion models [33,34,35] from generative AI to atomistic systems, treating thermal noise removal as a statistical inference problem. A machine-learned denoiser model approximates the non-conservative score (nominally the gradient of a log-density) of ideal crystalline configurations and uses it to iteratively remove thermal perturbations from noisy structures [16,32]. Coupled with conventional classifiers (e.g., CNA and PTM), this two-stage pipeline achieved near-perfect phase classification for a few familiar phases up to the melting point, while preserving physically meaningful disorder such as defects.

Despite these successes, existing scientific ML approaches still exhibit several severe limitations for broad application. First, denoising and classification have typically been viewed and designed as separate tasks: a denoising model is trained for the purpose of either noise removal [16,32] or featurization/pretraining [36,37,38], without explicit knowledge of classification objectives, and a downstream classifier operates only on the cleaned structures. This separation complicates training and may discard subtle structural information useful for discrimination between closely related phases (e.g., HCP vs. ω). Second, most methods focus on producing discrete labels, with limited use of per-atom probabilities or confidence scores to expose ambiguity. This is particularly problematic near phase boundaries, in highly disordered regions, or for structures outside the training distribution. Third, many models are system-specific, specialized to a small set of phases or chemistries, and not readily generalizable to arbitrary crystalline prototypes.
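To make the second limitation concrete, the following minimal sketch shows how per-atom logits could be turned into probabilities and a confidence measure instead of a bare label. The logit values are invented for illustration, and the choice of softmax plus normalized entropy is an assumption, not a procedure stated in the source.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of per-phase logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    z = sum(exps)
    return [e / z for e in exps]


def confidence(logits):
    """Return (argmax index, max probability, normalized entropy in [0, 1])."""
    p = softmax(logits)
    entropy = -sum(q * math.log(q) for q in p if q > 0.0)
    return p.index(max(p)), max(p), entropy / math.log(len(p))


# Hypothetical logits for two atoms over three candidate phases.
clear_atom = confidence([8.0, 0.5, -1.0])      # deep inside one phase
ambiguous_atom = confidence([2.0, 1.9, -3.0])  # e.g. near a phase boundary
```

Both atoms receive the same hard label (index 0), but the second has a much lower maximum probability and a higher entropy, which is the kind of ambiguity a discrete classifier hides.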

In the face of these limitations, an ideal framework for structural analysis should satisfy three criteria simultaneously. First, it should be universal, operating across a wide range of crystal prototypes and chemistries rather than being restricted to a few hand-tuned lattices such as FCC/BCC/HCP. Second, it must be robust to realistic perturbations such as thermal noise, defects, interfaces, and out-of-equili

This content is AI-processed based on ArXiv data.
