"DIVE" into Hydrogen Storage Materials Discovery with AI Agents
Data-driven artificial intelligence (AI) approaches are fundamentally transforming the discovery of new materials. Despite the unprecedented availability of materials data in the scientific literature, much of this information remains trapped in unstructured figures and tables, hindering the construction of large language model (LLM)-based AI agents for automated materials design. Here, we present the Descriptive Interpretation of Visual Expression (DIVE) multi-agent workflow, which systematically reads and organizes experimental data from graphical elements in the scientific literature. We focus on solid-state hydrogen storage materials, a class of materials central to future clean-energy technologies, and demonstrate that DIVE markedly improves the accuracy and coverage of data extraction compared with direct extraction by multimodal models, with gains of 10-15% over commercial models and over 30% relative to open-source models. Building on a curated database of over 30,000 entries from 4,000 publications, we establish a rapid inverse-design workflow capable of identifying previously unreported hydrogen storage compositions within two minutes. The proposed AI workflow and agent design are broadly transferable across diverse materials classes, providing a paradigm for AI-driven materials discovery.
💡 Research Summary
The paper introduces DIVE (Descriptive Interpretation of Visual Expression), a multi‑agent workflow designed to automatically extract and structure experimental data that are embedded in figures, tables, and graphs within scientific publications. Traditional multimodal large language models (LLMs) convert PDFs into text and images and then attempt a single‑shot extraction, but they struggle with quantitative information encoded in plots such as pressure‑composition‑temperature (PCT) curves, temperature‑programmed desorption (TPD) curves, and discharge curves. DIVE overcomes this limitation by chaining three specialized agents.
First, a lightweight inference model scans figure captions to detect whether target visual elements are present. If a relevant figure is found, the figure itself, its caption, and surrounding text are fed into a second multimodal LLM. Carefully engineered prompts instruct this model to translate every data point on the curve into descriptive text and to place the values into a predefined key‑value schema. The output replaces the original image with a textual representation, effectively converting visual information into a machine‑readable format. In the third stage, a separate LLM parses the fully text‑based article and outputs a structured JSON database containing all extracted material properties.
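The three-stage chain described above can be sketched in Python. This is an illustrative skeleton only: the functions that would call LLMs are stubbed out, and all names (`caption_filter`, `describe_figure`, `dive_pipeline`, the keyword list) are hypothetical, not taken from the authors' released code.

```python
# Hypothetical sketch of the three-stage DIVE workflow: (1) caption screening,
# (2) figure-to-text description, (3) text-to-JSON structuring.
# LLM calls are replaced by stubs; names and keywords are illustrative.
import json
from dataclasses import dataclass


@dataclass
class Figure:
    caption: str
    image_path: str
    surrounding_text: str


def caption_filter(fig: Figure, keywords=("PCT", "TPD", "discharge")) -> bool:
    """Stage 1: lightweight check whether a figure likely contains target curves.

    In DIVE this is an inference model scanning captions; here it is a
    simple keyword match for illustration.
    """
    return any(k.lower() in fig.caption.lower() for k in keywords)


def describe_figure(fig: Figure) -> dict:
    """Stage 2 stub: stands in for the multimodal LLM that translates every
    data point on a curve into a predefined key-value schema."""
    return {"figure_type": "curve", "caption": fig.caption, "data_points": []}


def structure_article(text_blocks: list) -> str:
    """Stage 3 stub: stands in for the text-only LLM that parses the fully
    textual article and emits a structured JSON database."""
    return json.dumps({"materials": text_blocks})


def dive_pipeline(article_text: str, figures: list) -> str:
    """Chain the three agents: filter figures, replace images with text
    descriptions, then structure the now text-only article."""
    descriptions = [describe_figure(f) for f in figures if caption_filter(f)]
    blocks = [article_text] + [json.dumps(d) for d in descriptions]
    return structure_article(blocks)
```

The key design point the sketch preserves is that the multimodal model is only ever asked to describe one figure at a time, while the final structuring step sees a purely textual document.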
To quantify performance, the authors manually curated a benchmark set from 100 papers (≈10 000 data points). Using Gemini 2.5 Flash alone for direct extraction yielded a total score of 77.89 (out of 100, split equally between accuracy and completeness). When Gemini 2.5 Flash was combined with DeepSeek R1 in the DIVE pipeline, the score rose to 87.21 – an improvement of roughly 12 %. Open‑source models (LLaMA‑4‑Scout, LLaMA‑4‑Maverick, Qwen2.5‑VL‑72B) lagged behind by more than 30 % in the same benchmark. Even a lightweight 8‑billion‑parameter model, DeepSeek‑Qwen3‑8B, achieved 84.6 points when used as the post‑embedding LLM, demonstrating that the workflow’s gains are not limited to the largest commercial systems.
Applying DIVE to over 4 000 hydrogen‑storage publications, the authors built a curated database of more than 30 000 entries (the DigHyd platform). The dataset includes publication trends, gravimetric hydrogen capacities, elemental distributions across capacity ranges, and classifications of material families (interstitial hydrides, ionic hydrides, complex borohydrides, porous frameworks, high‑entropy alloys, superhydrides, etc.). Notably, the analysis reveals that Ni dominates the 0‑4 wt % range, Mg the 4‑8 wt % range, and Li the 8‑12 wt % range, reflecting the historical shift from interstitial to ionic and complex hydrides.
Leveraging this database, the authors constructed a GPT‑4‑based design agent called DigHyd. Users can pose natural‑language queries (e.g., “Find compositions with >5 wt % hydrogen capacity and dehydrogenation temperature below 600 K”). DigHyd first filters the database according to the criteria, then employs a machine‑learning verifier trained on the extracted data to predict thermodynamic stability and performance of candidate compositions. The inverse‑design loop iteratively proposes new alloy or dopant combinations, evaluates them with the verifier, and returns a ranked list of novel candidates—all within two minutes, as demonstrated in supplementary videos.
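The filter-then-verify loop can be summarized in a few lines. This is a minimal sketch under stated assumptions: the record fields (`capacity_wt_pct`, `desorption_T_K`) and the `verifier` callable are illustrative stand-ins for the DigHyd schema and its trained ML verifier.

```python
# Minimal sketch of the inverse-design loop: filter the database on the
# user's criteria, score surviving candidates with a verifier model, and
# return them ranked. Field names and the verifier are hypothetical.

def filter_database(db: list, min_capacity: float, max_temp: float) -> list:
    """Keep entries exceeding the capacity floor and below the temperature cap
    (e.g. >5 wt% hydrogen and dehydrogenation below 600 K)."""
    return [
        r for r in db
        if r["capacity_wt_pct"] > min_capacity and r["desorption_T_K"] < max_temp
    ]


def rank_candidates(candidates: list, verifier) -> list:
    """Score each candidate with the (here: user-supplied) verifier and
    return them best-first."""
    scored = sorted(candidates, key=verifier, reverse=True)
    return scored
```

In the actual agent, the loop would additionally propose new alloy or dopant combinations between the filter and verify steps; the sketch only shows the ranking core.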
The paper also details a rigorous evaluation methodology for extraction quality. Because simple true/false scoring is insufficient for continuous material properties, the authors use an embedding‑based matching algorithm to align AI‑generated JSON entries with human‑curated ones, standardize units, and compute relative errors. The resulting composite score (accuracy + completeness) serves both as a benchmark and as a potential reward signal for future reinforcement‑learning fine‑tuning of LLMs.
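The composite score can be illustrated with a small function. Assumptions are flagged in the comments: real DIVE aligns entries via embedding-based matching and standardizes units first, whereas this sketch matches on a shared key and treats any relative error within a threshold as correct; the 5% threshold and 50/50 weighting are illustrative choices, not values from the paper.

```python
# Illustrative scoring sketch. Assumes entries are already unit-standardized
# and keyed identically on both sides (the paper instead uses embedding-based
# matching). Threshold and equal accuracy/completeness weights are assumptions.

def relative_error(pred: float, true: float, tol: float = 1e-9) -> float:
    """Relative error, guarded against division by zero."""
    return abs(pred - true) / max(abs(true), tol)


def composite_score(ai_entries: dict, ref_entries: dict,
                    err_threshold: float = 0.05) -> float:
    """Score out of 100, split equally between accuracy (matched values
    within the error threshold) and completeness (fraction of reference
    entries the AI recovered at all)."""
    matched = {k: v for k, v in ai_entries.items() if k in ref_entries}
    correct = sum(
        1 for k, v in matched.items()
        if relative_error(v, ref_entries[k]) <= err_threshold
    )
    accuracy = correct / len(matched) if matched else 0.0
    completeness = len(matched) / len(ref_entries) if ref_entries else 0.0
    return 50.0 * accuracy + 50.0 * completeness
```

Because the score is a single bounded scalar, it could indeed double as a reward signal for reinforcement-learning fine-tuning, as the authors suggest.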
Finally, the authors provide open‑source code, the full DigHyd database, and an online data‑checking platform (datachecking.dighyd.org) for community verification and correction of AI‑extracted entries. They argue that the DIVE workflow is broadly transferable to other material domains, offering a scalable solution for turning the wealth of unstructured scientific figures into actionable knowledge and enabling rapid, AI‑driven materials discovery.