The Northeast Materials Database for Magnetic Materials

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

The discovery of magnetic materials with high operating temperature ranges and optimized performance is essential for advanced applications. Current data-driven approaches are limited by the lack of accurate, comprehensive, and feature-rich databases. This study addresses this challenge by using Large Language Models (LLMs) to create a comprehensive, experiment-based magnetic materials database, the Northeast Materials Database (NEMAD), which consists of 67,573 magnetic material entries (www.nemad.org). The database incorporates chemical composition, magnetic phase transition temperatures, structural details, and magnetic properties. Enabled by NEMAD, we trained machine learning models to classify materials and predict transition temperatures. Our classification model achieved an accuracy of 90% in categorizing materials as ferromagnetic (FM), antiferromagnetic (AFM), or non-magnetic (NM). The regression models predict the Curie (Néel) temperature with a coefficient of determination (R²) of 0.87 (0.83) and a mean absolute error (MAE) of 56 K (38 K). These models identified 25 (13) FM (AFM) candidates from the Materials Project with a predicted Curie (Néel) temperature above 500 K (100 K). This work shows the feasibility of combining LLMs for automated data extraction with machine learning models to accelerate the discovery of magnetic materials.


💡 Research Summary

The paper addresses a critical bottleneck in magnetic materials research: the scarcity of accurate, comprehensive, experimentally‑derived datasets for data‑driven discovery. By leveraging large language models (LLMs), the authors automatically extract and curate information from a wide variety of scholarly sources, including XML articles, PDFs, scanned handbooks, and image‑based PDFs, yielding 67,573 magnetic material entries. The extraction pipeline combines text and table parsers for XML, a PDF‑to‑Markdown converter for standard PDFs, and Google Gemini OCR for scanned documents. All parsed content is fed to GPT‑4o with structured prompts, producing a uniform JSON record that contains fifteen key fields: chemical composition, Curie temperature, Néel temperature, Curie‑Weiss temperature, crystal structure, lattice structure, lattice parameters, space group, coercivity, magnetization, magnetic moment, remanence, susceptibility, DOI, and an experimental flag.
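
To make the fifteen-field record concrete, here is a minimal Python sketch of how such a JSON response might be typed and parsed. The class name, the key names, the value types, and the `parse_record` helper are all assumptions for illustration; the summary does not give NEMAD's actual schema or key spelling.

```python
import json
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class MagneticRecord:
    """One extracted entry. Field names and types are illustrative, not NEMAD's real keys."""
    chemical_composition: Optional[str] = None
    curie_temperature: Optional[float] = None
    neel_temperature: Optional[float] = None
    curie_weiss_temperature: Optional[float] = None
    crystal_structure: Optional[str] = None
    lattice_structure: Optional[str] = None
    lattice_parameters: Optional[str] = None
    space_group: Optional[str] = None
    coercivity: Optional[str] = None
    magnetization: Optional[str] = None
    magnetic_moment: Optional[str] = None
    remanence: Optional[str] = None
    susceptibility: Optional[str] = None
    doi: Optional[str] = None
    experimental: Optional[bool] = None


def parse_record(raw_json: str) -> MagneticRecord:
    """Parse one LLM response: unknown keys are dropped, missing keys stay None."""
    data = json.loads(raw_json)
    known = {f.name for f in fields(MagneticRecord)}
    return MagneticRecord(**{k: v for k, v in data.items() if k in known})


# Hypothetical response; absent fields default to None.
record = parse_record('{"chemical_composition": "Fe2O3", "experimental": true}')
```

Defaulting every field to `None` reflects the fact that most papers report only a subset of the properties, so a record with gaps is expected and need not be treated as an error.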

Quality control is performed by an independent LLM (Google Gemini 2.5) on a random sample of 5,015 entries, achieving a median field‑wise accuracy of 94 %. Manual spot checks confirm the reliability of the validation model. The resulting Northeast Materials Database (NEMAD) therefore offers a rich, high‑quality resource that surpasses existing magnetic‑materials databases both in size (over thirty times larger than MAGDAT) and in feature depth (including structural and magnetic property details).
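
The summary does not spell out how the median field‑wise accuracy is computed. One plausible reading, sketched below, is to compute the fraction of correct values separately for each field and then take the median across fields. The input layout (one dictionary of per‑field verdicts per sampled entry) and both function names are assumptions.

```python
from statistics import median


def fieldwise_accuracy(judgements):
    """Return {field: fraction judged correct} from per-entry {field: bool} verdicts."""
    totals, correct = {}, {}
    for verdicts in judgements:
        for field, ok in verdicts.items():
            totals[field] = totals.get(field, 0) + 1
            correct[field] = correct.get(field, 0) + int(ok)
    return {field: correct[field] / totals[field] for field in totals}


def median_fieldwise_accuracy(judgements):
    """Median of the per-field accuracies (raises StatisticsError on empty input)."""
    return median(fieldwise_accuracy(judgements).values())


# Toy verdicts from a validator for two sampled entries (made-up values).
sample = [
    {"curie_temperature": True, "neel_temperature": True, "space_group": False},
    {"curie_temperature": True, "neel_temperature": False, "space_group": False},
]
per_field = fieldwise_accuracy(sample)
overall = median_fieldwise_accuracy(sample)
```

Reporting a median rather than a mean keeps one hard‑to‑extract field from dominating the headline number, which may be why a median is quoted.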

Using NEMAD, the authors develop machine‑learning models for two tasks: (1) classification of materials into non‑magnetic (NM), ferromagnetic (FM), and antiferromagnetic (AFM) categories, and (2) regression of transition temperatures (Curie for FM, Néel for AFM). Feature engineering transforms chemical formulas into numerical descriptors (elemental fractions, statistical properties of constituent elements) and incorporates structural descriptors (crystal system, lattice parameters, space group). A Random Forest classifier, complemented by an XGBoost classifier, attains 90 % accuracy on a held‑out test set, with balanced precision, recall, and F1‑scores across classes. The AFM class shows slightly lower metrics due to fewer training examples, but overall bias is minimal.
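
As an illustration of the composition descriptors mentioned above, the following sketch converts a formula string into elemental fractions and a fraction‑weighted mean of an elemental property. It is not the authors' featurization code: the parser is a simplification that handles only flat formulas (no parentheses or hydrates), and the function names are hypothetical.

```python
import re

# Element symbol followed by an optional integer or decimal amount.
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?")


def elemental_fractions(formula):
    """Map each element in a flat formula (e.g. 'Nd2Fe14B') to its atomic fraction."""
    counts = {}
    for symbol, amount in _TOKEN.findall(formula):
        counts[symbol] = counts.get(symbol, 0.0) + (float(amount) if amount else 1.0)
    total = sum(counts.values())
    if total == 0:
        raise ValueError(f"no elements found in {formula!r}")
    return {symbol: n / total for symbol, n in counts.items()}


def weighted_mean(fractions, element_property):
    """Fraction-weighted mean of a per-element property such as atomic number."""
    return sum(frac * element_property[symbol] for symbol, frac in fractions.items())


fractions = elemental_fractions("Fe2O3")
mean_atomic_number = weighted_mean(fractions, {"Fe": 26, "O": 8})
```

Other statistics over the constituent elements (minimum, maximum, variance) can be built the same way from the fraction dictionary.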

For temperature prediction, Random Forest and XGBoost regressors are trained separately for Curie and Néel temperatures. The best Curie model (chemical composition + structure features) yields R² = 0.87 and MAE = 56 K; the best Néel model (chemical composition only) achieves R² = 0.83 and MAE = 38 K. These performance figures exceed those reported in prior literature, where R² values typically range from 0.70 to 0.80 for similar tasks.
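
For readers unfamiliar with the two reported metrics, the sketch below implements their standard definitions in plain Python: MAE is the mean of the absolute errors, and R² is one minus the ratio of the residual sum of squares to the total sum of squares. The toy temperatures are made up for illustration and are not values from the paper.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between measured and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up transition temperatures in kelvin.
measured = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 300.0]
mae = mean_absolute_error(measured, predicted)
r2 = r_squared(measured, predicted)
```

Because MAE is expressed in kelvin while R² is dimensionless, the two numbers are complementary: a model can have a high R² over a wide temperature range and still be off by tens of kelvin for an individual compound.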

The trained models are then applied to the Materials Project repository, screening more than 200,000 candidate compounds. The workflow identifies 25 ferromagnetic candidates with predicted Curie temperatures above 500 K and 13 antiferromagnetic candidates with predicted Néel temperatures above 100 K. Notably, many of these candidates contain little or no rare‑earth elements, highlighting the database’s potential to accelerate the discovery of cost‑effective permanent magnets.
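
The screening step amounts to a class‑dependent threshold filter on the model outputs. A minimal sketch follows, using the 500 K and 100 K cut‑offs stated above. The candidate dictionaries, their key names, and the placeholder formulas are assumptions; how the authors actually store or rank predictions is not described here.

```python
# Thresholds taken from the text: FM above 500 K, AFM above 100 K.
THRESHOLDS_K = {"FM": 500.0, "AFM": 100.0}


def screen(candidates):
    """Keep candidates whose predicted transition temperature exceeds the class threshold.

    Each candidate is a dict with hypothetical keys 'formula', 'predicted_class',
    and 'predicted_temperature' (in kelvin). Non-magnetic entries are dropped.
    """
    return [
        c for c in candidates
        if c["predicted_class"] in THRESHOLDS_K
        and c["predicted_temperature"] > THRESHOLDS_K[c["predicted_class"]]
    ]


# Placeholder predictions, not real compounds or results.
predictions = [
    {"formula": "A", "predicted_class": "FM", "predicted_temperature": 620.0},
    {"formula": "B", "predicted_class": "FM", "predicted_temperature": 480.0},
    {"formula": "C", "predicted_class": "AFM", "predicted_temperature": 150.0},
    {"formula": "D", "predicted_class": "NM", "predicted_temperature": 0.0},
]
shortlist = screen(predictions)
```

Given the reported MAE of 56 K for Curie temperatures, candidates predicted just above a threshold should be treated as borderline rather than confirmed.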

Key contributions of the work include: (i) a scalable, LLM‑driven pipeline that automates data extraction from heterogeneous document formats; (ii) the creation of NEMAD, a large, feature‑rich, experiment‑based magnetic‑materials database; (iii) a demonstration that high‑quality tabular data enable robust classification and accurate temperature regression; and (iv) the practical identification of high‑temperature magnetic candidates for future experimental validation.

Limitations are acknowledged. The current dataset is dominated by experimentally reported compounds; integration with high‑throughput DFT data could further enrich the chemical‑space coverage. The AFM class remains under‑represented, leading to modest performance for that category. Future work may involve hybrid training with computational data, deployment of graph neural networks to exploit crystal‑graph information, and active‑learning loops to iteratively improve the database with targeted experiments.

In summary, this study showcases how modern LLMs can be harnessed to build a comprehensive magnetic‑materials database and, when coupled with conventional machine‑learning models, can substantially accelerate the discovery of high‑performance magnetic materials, especially those free of scarce rare‑earth elements.

