From Literature to Lab: Closed-Loop Advancement of Perovskite Solar Cells via Domain Knowledge Guided LLM

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

Perovskite solar cells (PSCs) are regarded as a disruptive next-generation photovoltaic technology, yet their advancement is constrained by the complexity of perovskite recipes, which span a high-dimensional material and process design space. Despite their impressive general reasoning, Large Language Models (LLMs) face two limitations when applied to PSCs: an inability to align general semantics with perovskite domain knowledge, and inefficiency in navigating high-dimensional perovskite material and recipe design spaces. To address these limitations, we introduce a domain-knowledge-guided framework built around PVK-LLM, a specialized model that serves as an expert bridging general semantics and perovskite domain knowledge. By integrating this domain knowledge into a hierarchical Bayesian Optimization workflow, our approach efficiently navigates the high-dimensional design space on a solar cell simulator platform. The domain knowledge resolves the cold-start problem while dynamically adapting to simulator feedback. Moreover, in a separate wet-lab experiment aimed at maximizing power conversion efficiency (PCE), our framework autonomously proposes a novel synergistic four-component organic passivation recipe (3MTPAI, PDAI2, EDAI2, and PipDI) that has not been reported in existing literature. This AI-designed recipe achieves a champion PCE of over 26.0%, approaching world records obtained through extensive expert trial-and-error. Our approach enables LLMs to comprehend domain knowledge and to navigate high-dimensional design spaces efficiently, and it can accelerate real-world development of perovskites as well as other materials.


💡 Research Summary

The paper presents a novel closed‑loop framework that integrates a domain‑knowledge‑guided large language model (PVK‑LLM) with hierarchical Bayesian optimization (PVK‑BO) to accelerate the design of perovskite solar cells (PSCs). Conventional LLMs excel at general language reasoning but fall short when applied to PSC research because they cannot align generic semantics with the highly specialized knowledge required for perovskite materials, and they are inefficient at navigating the enormous combinatorial space of precursors, solvents, processing parameters, and interface treatments.

To overcome these challenges, the authors construct PVK‑LLM by fine‑tuning the Qwen2.5‑32B backbone through a three‑stage curriculum learning pipeline. Stage I (Knowledge Injection) uses a curated corpus of over 4 000 recent high‑impact papers and a 55 k‑question‑answer dataset (PVK‑Sci) covering seven research themes (device architecture, performance enhancement, interface engineering, stability, materials, defects & recombination, metrics). Stage II (Instruction Alignment) adds two specialized QA sets: PVK‑Cite (22 k citations) to teach the model to ground answers in specific literature, and PVK‑Exp (10 k experimental records) to enable quantitative interpretation of tables and performance metrics. Stage III introduces a Retrieval‑Augmented Generation (RAG) mechanism that continuously updates a Perovskite Knowledge Graph (PVK‑KG) containing ~24 k entities and ~22 k triples, ensuring that the model stays current without frequent re‑training.
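The Stage III idea of grounding answers in a knowledge graph that can be extended without retraining can be illustrated with a minimal sketch. This is not the authors' PVK‑KG or their retrieval code: the triple store `TOY_KG`, the functions `retrieve_triples`, `add_triple`, and `build_prompt`, and the substring-matching rule are all hypothetical, chosen only to show the general pattern of retrieval-augmented prompting over (subject, relation, object) triples.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)

# Toy stand-in for a knowledge graph; the real PVK-KG is far larger.
TOY_KG: List[Triple] = [
    ("Spiro-OMeTAD", "used_as", "hole-transport layer"),
    ("SnO2", "used_as", "electron-transport layer"),
    ("PDAI2", "used_as", "passivation agent"),
]


def add_triple(kg: List[Triple], triple: Triple) -> None:
    """Extend the graph in place; no model weights are touched."""
    if triple not in kg:
        kg.append(triple)


def retrieve_triples(question: str, kg: List[Triple], top_k: int = 2) -> List[Triple]:
    """Rank triples by how many of their entities appear in the question."""
    q = question.lower()
    scored = []
    for s, r, o in kg:
        score = int(s.lower() in q) + int(o.lower() in q)
        if score > 0:
            scored.append((score, (s, r, o)))
    # sort() is stable, so ties keep their insertion order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [triple for _, triple in scored[:top_k]]


def build_prompt(question: str, triples: List[Triple]) -> str:
    """Prepend retrieved facts so a language model can ground its answer."""
    facts = "\n".join(f"- {s} {r.replace('_', ' ')} {o}" for s, r, o in triples)
    return f"Known facts:\n{facts}\n\nQuestion: {question}"


if __name__ == "__main__":
    question = "What is SnO2 used for?"
    print(build_prompt(question, retrieve_triples(question, TOY_KG)))
```

A production system would more plausibly use embedding-based retrieval and entity linking rather than substring matching; the sketch only shows why adding a triple changes what the model sees at inference time.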

PVK‑LLM’s internal representations are shown to be semantically organized: t‑SNE visualizations reveal clear clustering of functional categories (e.g., hole‑transport layers) that are absent in generic LLM embeddings. Benchmarking on a domain‑specific test set (PVK‑QA and PVK‑MCQ, 2 206 items) demonstrates state‑of‑the‑art performance, with 87.25 % accuracy on multiple‑choice questions and superior scores on open‑ended generation evaluated by both LLM‑as‑judge and human experts. Human pairwise comparisons yield win rates of 65–70 % over strong baselines such as GPT‑4 and the base Qwen model.

The model is then embedded in a closed‑loop active learning workflow. PVK‑LLM proposes initial candidate recipes, which are used to fit a probabilistic surrogate model that predicts device performance. An acquisition function then selects the most promising candidates for evaluation in the SCAPS‑1D physics‑based simulator, and the simulator results are fed back to update the surrogate. Two independent optimization tasks are explored: (1) band‑alignment tuning of the perovskite, electron‑transport layer (ETL), and hole‑transport layer (HTL) to minimize energy barriers, and (2) doping concentration optimization in the transport layers. Across five repeated runs, PVK‑BO consistently outperforms baselines that rely on a generic LLM (Qwen2.5‑32B) or standard Bayesian optimization algorithms (Standard BO, HEBO, TuRBO). PVK‑BO achieves a higher initial PCE (demonstrating an effective cold start) and reaches a final simulated efficiency of 25.44 % with lower variance, indicating stable convergence toward global optima.
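The propose, predict, select, evaluate cycle described above can be sketched as a plain, single-level Bayesian optimization loop with a warm start. This is an illustration under stated assumptions, not the authors' hierarchical PVK‑BO: `toy_simulator` is a hypothetical one-dimensional stand-in for SCAPS‑1D, the initial points passed to `optimize` play the role of LLM-proposed candidates, and the surrogate is a small Gaussian process with an RBF kernel and an expected-improvement acquisition function, written with the standard library only.

```python
import math


def toy_simulator(x):
    """Hypothetical stand-in for a device simulator: normalized score peaking at x = 0.62."""
    return 1.0 - 4.0 * (x - 0.62) ** 2


def rbf(a, b, length=0.15):
    """Squared-exponential kernel with unit variance."""
    return math.exp(-((a - b) ** 2) / (2.0 * length ** 2))


def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(A[i][i] - s) if i == j else (A[i][j] - s) / L[j][j]
    return L


def forward_sub(L, b):
    """Solve L y = b."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y


def backward_sub(L, y):
    """Solve L^T x = y."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_posterior(xs, ys, x_new, jitter=1e-4):
    """Zero-mean GP posterior mean and standard deviation at x_new."""
    K = [[rbf(a, b) + (jitter if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    L = cholesky(K)
    alpha = backward_sub(L, forward_sub(L, ys))
    k_star = [rbf(a, x_new) for a in xs]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = forward_sub(L, k_star)
    var = max(1.0 - sum(t * t for t in v), 1e-12)
    return mean, math.sqrt(var)


def expected_improvement(mean, std, best):
    """Expected improvement over the best observed value (maximization)."""
    z = (mean - best) / std
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (mean - best) * cdf + std * pdf


def optimize(initial_xs, n_iter=8):
    """Run the loop from the given starting points; returns xs, ys, best-so-far trace."""
    grid = [i / 50 for i in range(51)]  # candidate settings in [0, 1]
    xs = list(initial_xs)
    ys = [toy_simulator(x) for x in xs]
    best_trace = [max(ys)]
    for _ in range(n_iter):
        offset = sum(ys) / len(ys)  # center targets for the zero-mean GP
        centered = [y - offset for y in ys]
        best = max(centered)
        candidates = [x for x in grid if x not in xs]
        # The GP is refit per candidate for brevity, which is wasteful but simple.
        scores = [expected_improvement(*gp_posterior(xs, centered, x), best)
                  for x in candidates]
        x_next = candidates[scores.index(max(scores))]
        xs.append(x_next)
        ys.append(toy_simulator(x_next))
        best_trace.append(max(ys))
    return xs, ys, best_trace


if __name__ == "__main__":
    # Warm start: points near the optimum, as a domain expert might suggest.
    print("warm:", optimize([0.5, 0.6, 0.7])[2])
    # Cold start: uninformed points far from the optimum.
    print("cold:", optimize([0.0, 0.1, 0.2])[2])
```

Comparing the two printed traces shows the role of the starting points: the warm-started run begins from a higher best-so-far value, which is the cold-start benefit the summary attributes to domain knowledge. How the remaining iterations compare depends on the kernel length scale and the acquisition function, both of which are arbitrary choices here.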

Finally, a real‑world wet‑lab experiment validates the framework. PVK‑LLM autonomously identifies interface passivation as the performance bottleneck and proposes a novel four‑component organic passivation mixture (3MTPAI, PDAI₂, EDAI₂, PipDI) that has not been reported in the literature. Implementing this recipe in a p‑i‑n PSC yields a champion power conversion efficiency of >26 %, rivaling the best reported values that typically require extensive expert trial‑and‑error.

In summary, the study demonstrates that (i) domain‑specific fine‑tuning can endow LLMs with deep scientific expertise, (ii) a continuously refreshed knowledge graph enables up‑to‑date reasoning, and (iii) coupling such an LLM with Bayesian optimization creates a powerful closed‑loop system that efficiently explores high‑dimensional material‑process spaces. The approach is generalizable to other materials challenges, such as battery electrolyte formulation or organic photovoltaic active‑layer screening, and represents a significant step toward AI‑driven autonomous discovery in materials science.

