Database and deep-learning scalability of anharmonic phonon properties by automated brute-force first-principles calculations
Understanding the anharmonic phonon properties of crystal compounds – such as phonon lifetimes and thermal conductivities – is essential for investigating and optimizing their thermal transport behaviors. These properties also impact optical, electronic, and magnetic characteristics through interactions between phonons and other quasiparticles and fields. In this study, we develop an automated first-principles workflow to calculate anharmonic phonon properties and build a comprehensive database encompassing more than 6,000 inorganic compounds. Utilizing this dataset, we train a graph neural network model to predict thermal conductivity values and spectra from structural parameters, demonstrating a scaling law in which prediction accuracy improves with increasing training data size. High-throughput screening with the model enables the identification of materials exhibiting extreme thermal conductivities – both high and low. The resulting database offers valuable insights into the anharmonic behavior of phonons, thereby accelerating the design and development of advanced functional materials.
💡 Research Summary
In this work, the authors address a long‑standing gap in materials informatics: the lack of a large, first‑principles‑based database of intrinsic anharmonic phonon properties, which are essential for accurate prediction of lattice thermal conductivity and related phenomena. They develop an automated workflow, named “auto‑kappa”, that couples VASP density‑functional theory calculations with the ALAMODE phonon package. The workflow handles the challenging steps required for reliable anharmonic phonon analysis: (i) robust structural optimization using an equation‑of‑state approach, (ii) automatic detection and elimination of imaginary phonon modes by enlarging supercells, (iii) systematic generation of second‑ and third‑order force constants with adaptive cutoff distances, and (iv) solution of the Boltzmann transport equation under the relaxation‑time approximation (RTA). By fully automating these tasks, the authors compute a comprehensive set of anharmonic properties for more than 6,000 inorganic, non‑metallic, non‑magnetic compounds drawn from the Phonondb and Materials Project repositories. For each material, the database stores phonon dispersions, participation ratios, density of states, Grüneisen parameters, temperature‑ and grain‑size‑dependent lattice thermal conductivity, mode‑resolved scattering rates and lifetimes, cumulative and spectral conductivity as functions of mean free path and frequency, as well as all intermediate files (displacement‑force datasets, force‑constant tensors, input/output scripts). Materials that exhibit unstable phonon modes are still represented by their harmonic data, which keeps the overall dataset internally consistent.
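Step (iv) can be illustrated with the standard RTA expression for the lattice thermal conductivity tensor, κ^{αβ} = (1 / (V N_q)) Σ_m c_m v_m^α v_m^β τ_m, where the sum runs over phonon modes m. The sketch below is a minimal NumPy version of that textbook formula, not the ALAMODE or auto‑kappa implementation; the function names, argument layout, and the single-mode input values are assumptions made for illustration only.

```python
import numpy as np

HBAR = 1.054571817e-34  # reduced Planck constant, J s
KB = 1.380649e-23       # Boltzmann constant, J/K


def mode_heat_capacity(omega, temperature):
    """Harmonic heat capacity per phonon mode in J/K (omega in rad/s, > 0)."""
    x = HBAR * np.asarray(omega, dtype=float) / (KB * temperature)
    return KB * x**2 * np.exp(x) / np.expm1(x) ** 2


def kappa_rta(omega, velocity, lifetime, volume, n_qpoints, temperature):
    """Lattice thermal conductivity tensor (W/m/K) in the RTA.

    omega:     (n_modes,)   angular frequencies, rad/s
    velocity:  (n_modes, 3) group velocities, m/s
    lifetime:  (n_modes,)   phonon lifetimes, s
    volume:    primitive-cell volume, m^3
    n_qpoints: number of q points sampled in the Brillouin zone
    """
    # Per-mode weight c_m * tau_m, then sum c_m * tau_m * v_m^a * v_m^b over modes.
    weight = mode_heat_capacity(omega, temperature) * np.asarray(lifetime, dtype=float)
    v = np.asarray(velocity, dtype=float)
    return np.einsum("m,ma,mb->ab", weight, v, v) / (volume * n_qpoints)


# Toy single-mode input (made-up numbers, for illustration only).
kappa = kappa_rta(
    omega=[1.0e12],
    velocity=[[1000.0, 0.0, 0.0]],
    lifetime=[1.0e-12],
    volume=1.0e-28,
    n_qpoints=1,
    temperature=300.0,
)
print(kappa[0, 0])  # xx component in W/m/K
```

In a real calculation the mode arrays would be flattened over all q points and branches, and the lifetimes would come from the three-phonon scattering rates stored in the database.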
Having assembled this “Phonix” database, the authors turn to machine learning (ML) to demonstrate its utility. They construct a graph neural network (GNN) that encodes crystal structures as graphs (atoms as nodes, bonds as edges) and train it to predict both the scalar lattice thermal conductivity κ and the full frequency‑dependent conductivity spectrum κ(ω). Crucially, they investigate how prediction accuracy scales with the size of the training set. By incrementally increasing the number of training examples from 500 to 6,000, they observe a clear improvement in mean absolute error (MAE) and coefficient of determination (R²), establishing a quantitative scaling law: larger, high‑quality anharmonic datasets systematically enhance model performance. This result supports the premise that first‑principles anharmonic data contain rich, transferable physical information that can be learned by deep‑learning architectures.
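A scaling law of this kind is often summarized as a power law, error ≈ a · N^(−b), fitted by linear regression in log-log space. The sketch below shows that fit under stated assumptions: the functional form is assumed rather than taken from the paper, the function name is hypothetical, and the error values are synthetic numbers generated from a known power law, not results from the study.

```python
import numpy as np


def fit_power_law(train_sizes, errors):
    """Fit errors ≈ a * N**(-b) by least squares on log(N) vs log(error)."""
    slope, intercept = np.polyfit(np.log(train_sizes), np.log(errors), 1)
    return np.exp(intercept), -slope


# Training-set sizes spanning the range quoted above.
sizes = np.array([500.0, 1000.0, 2000.0, 4000.0, 6000.0])

# SYNTHETIC errors from an assumed power law (a = 2.0, b = 0.3); not paper data.
synthetic_mae = 2.0 * sizes ** -0.3

a, b = fit_power_law(sizes, synthetic_mae)
print(f"prefactor a = {a:.3f}, exponent b = {b:.3f}")
```

With real learning-curve data, a positive fitted exponent b indicates that the error keeps decreasing as training data are added, and its magnitude gives the rate.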
The trained GNN is then applied to the entire database for high‑throughput screening. The model efficiently identifies materials with extreme thermal conductivities: (a) ultra‑high κ (> 200 W m⁻¹ K⁻¹) candidates such as certain dense oxides and nitrides, and (b) ultra‑low κ (< 1 W m⁻¹ K⁻¹) candidates including complex oxides and low‑density halide‑based compounds. These predictions provide immediate targets for experimental verification and potential applications in thermal management, thermoelectrics, and thermal barrier coatings.
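The screening step amounts to thresholding the predicted conductivities. A minimal sketch using the two cutoffs quoted above (200 and 1 W m⁻¹ K⁻¹) follows; the function name, the dictionary layout, and the compound labels and values are hypothetical placeholders, not entries from the database.

```python
def screen_extremes(predicted_kappa, high=200.0, low=1.0):
    """Return (ultra_high, ultra_low) lists of (name, kappa), most extreme first."""
    ultra_high = sorted(
        ((name, k) for name, k in predicted_kappa.items() if k > high),
        key=lambda item: item[1],
        reverse=True,
    )
    ultra_low = sorted(
        ((name, k) for name, k in predicted_kappa.items() if k < low),
        key=lambda item: item[1],
    )
    return ultra_high, ultra_low


# Placeholder predictions in W/m/K (hypothetical labels and values).
predictions = {
    "compound_A": 350.0,
    "compound_B": 12.0,
    "compound_C": 0.4,
    "compound_D": 0.9,
}
ultra_high, ultra_low = screen_extremes(predictions)
print(ultra_high)
print(ultra_low)
```

Candidates flagged this way would still need a first-principles calculation or experiment to confirm the predicted value.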
From a technical perspective, the paper makes several notable contributions. First, it demonstrates that fully automated, high‑throughput anharmonic phonon calculations are feasible on modern supercomputing resources when the workflow adapts VASP and ALAMODE parameters (e.g., parallelization scheme, force‑constant cutoff) to each system. Second, the dataset expands the coverage of anharmonic properties beyond existing resources such as Phonondb (≈ 10,000 entries with only harmonic data) by adding three‑phonon RTA results for more than 6,000 compounds, including structurally complex ones. Third, the authors provide the entire software stack and the database (hosted on ARI‑mdx) as open‑source resources, encouraging community‑driven extensions to include four‑phonon scattering, self‑consistent phonon renormalization, electron‑phonon coupling, and magnon‑phonon interactions. Finally, the demonstrated scaling law suggests that, as computational power continues to grow, future versions of the database that incorporate higher‑order anharmonicity and broader material classes will enable even more accurate, multi‑physics property predictions.
In summary, this study delivers a robust, automated pipeline for generating large‑scale anharmonic phonon data, constructs the first substantial first‑principles anharmonic phonon database, and shows that deep‑learning models trained on this data can predict lattice thermal conductivity with accuracy that improves systematically with dataset size. The combined workflow and ML framework constitute a powerful platform for accelerating the discovery of materials with tailored thermal transport properties, and lay the groundwork for future integration of additional quasiparticle interactions into a unified materials informatics ecosystem.