OSCAgent: Accelerating the Discovery of Organic Solar Cells with LLM Agents

Notice: This research summary and analysis were automatically generated using AI technology. For accuracy, please refer to the original arXiv paper.

Organic solar cells (OSCs) hold great promise for sustainable energy, but discovering high-performance materials is time-consuming and costly. Existing molecular generation methods can aid the design of OSC molecules, but they are mostly confined to optimizing known backbones and lack effective use of domain-specific chemical knowledge, often leading to unrealistic candidates. In this paper, we introduce OSCAgent, a multi-agent framework for OSC molecular discovery that unifies retrieval-augmented design, molecular generation, and systematic evaluation into a continuously improving pipeline, without requiring additional human intervention. OSCAgent comprises three collaborative agents. The Planner retrieves knowledge from literature-curated molecules and prior candidates to guide design directions. The Generator proposes new OSC acceptors aligned with these plans. The Experimenter performs comprehensive evaluation of candidate molecules and provides feedback for refinement. Experiments show that OSCAgent produces chemically valid, synthetically accessible OSC molecules and achieves superior predicted performance compared to both traditional and large language model (LLM)-only baselines. Representative results demonstrate that some candidates achieve predicted efficiencies approaching 18%. The code will be publicly available.

💡 Research Summary

OSCAgent is a novel multi‑agent framework that automates the discovery of high‑performance organic solar‑cell (OSC) acceptor molecules by integrating large language models (LLMs) with domain‑specific retrieval, generation, and evaluation components. The system consists of three collaborative agents: a Planner, a Generator, and an Experimenter.

The Planner adopts a retrieval‑augmented strategy, pulling experimentally validated high‑efficiency OSC molecules from the literature and dynamically updating top‑performing candidates from previous cycles. By extracting structural motifs, electronic property trends, and synthetic feasibility patterns, the Planner synthesizes this knowledge into concise prompts that guide the downstream design process.
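
The retrieval step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Record` type, the functions `retrieve_top_k` and `build_prompt`, and the prompt wording are all hypothetical, and duplicates are detected by raw SMILES string rather than by canonical structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    smiles: str   # molecule as a SMILES string
    pce: float    # reported or predicted PCE, in percent
    source: str   # "literature" or "candidate"


def retrieve_top_k(literature, candidates, k=3):
    """Merge both pools, keep the best record per SMILES, return the k highest-PCE records."""
    best = {}
    for rec in list(literature) + list(candidates):
        kept = best.get(rec.smiles)
        if kept is None or rec.pce > kept.pce:
            best[rec.smiles] = rec
    return sorted(best.values(), key=lambda r: r.pce, reverse=True)[:k]


def build_prompt(records):
    """Turn retrieved records into a short design prompt for the Generator."""
    lines = ["Reference acceptors (highest PCE first):"]
    for rec in records:
        lines.append(f"- {rec.smiles} | PCE {rec.pce:.1f}% | {rec.source}")
    lines.append("Propose new acceptors that reuse motifs shared by these molecules.")
    return "\n".join(lines)
```

Merging the literature pool with prior candidates before ranking is one simple way to let strong molecules from earlier cycles influence the next prompt.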

Guided by the Planner’s prompts, the Generator—implemented with a GPT‑4‑Turbo‑based LLM—produces novel SMILES strings for acceptor candidates. Generation is constrained by built‑in chemical validity checks (valence, aromaticity) and a pre‑filter on synthetic accessibility scores (SAscore), ensuring that only chemically plausible molecules are proposed.
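
A pre-filter of this kind might look like the sketch below. The validity check and the SA scorer are passed in as functions because the summary does not say which toolkit is used; with RDKit, for example, `Chem.MolFromSmiles` returns `None` for a SMILES string it cannot parse. The function names, the `sa_max=6.0` threshold, and the `balanced_brackets` stand-in are assumptions made only for illustration. The stand-in is not a real chemistry check.

```python
def balanced_brackets(smiles):
    """Stand-in validity check (NOT real chemistry): parentheses must balance."""
    depth = 0
    for ch in smiles:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


def filter_candidates(smiles_list, is_valid, sa_score, sa_max=6.0):
    """Keep unique SMILES that pass the validity check and the SA threshold.

    sa_score is expected to return a synthetic accessibility score where
    lower means easier to synthesize; sa_max is a hypothetical cutoff.
    """
    kept, seen = [], set()
    for smi in smiles_list:
        smi = smi.strip()
        if not smi or smi in seen:
            continue                      # skip blanks and repeated proposals
        seen.add(smi)
        if not is_valid(smi):
            continue                      # reject strings that fail the validity check
        if sa_score(smi) > sa_max:
            continue                      # reject molecules judged too hard to make
        kept.append(smi)
    return kept
```

Running the cheap validity check before the SA scorer avoids scoring strings that would be discarded anyway.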

The Experimenter evaluates each candidate on three complementary axes: (1) predicted power conversion efficiency (PCE), using a multimodal predictor that fuses graph neural network embeddings, SMILES transformer embeddings, and Morgan fingerprint features via a Mixture‑of‑Experts encoder; (2) synthetic accessibility; and (3) electronic feasibility (predicted HOMO/LUMO levels). The PCE predictor is first pretrained on the large Lopez computational dataset (≈51k molecules) with contrastive learning and an auxiliary LUMO prediction task, then fine‑tuned on a curated experimental set of 1,027 OSC acceptors. Crucially, the predictor outputs both a mean and a variance for PCE; it is trained with a heteroscedastic Gaussian loss, which quantifies uncertainty and avoids over‑reliance on noisy measurements.
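
To make the heteroscedastic Gaussian loss concrete, the sketch below computes the standard Gaussian negative log-likelihood for a model that predicts a mean and a variance per molecule. Predicting the log-variance rather than the variance is an assumption here (a common choice for numerical stability), and the function names are hypothetical; the summary does not give the exact form used in the paper.

```python
import math


def gaussian_nll(y, mean, log_var):
    """Per-sample negative log-likelihood of y under N(mean, exp(log_var)).

    The constant 0.5 * log(2 * pi) is dropped because it does not depend
    on the model outputs.
    """
    return 0.5 * (log_var + (y - mean) ** 2 / math.exp(log_var))


def batch_loss(targets, means, log_vars):
    """Mean NLL over a batch of targets, predicted means, and predicted log-variances."""
    losses = [gaussian_nll(y, m, s) for y, m, s in zip(targets, means, log_vars)]
    return sum(losses) / len(losses)
```

The squared error is divided by the predicted variance, so a sample the model flags as noisy contributes less to the fit, while the `log_var` term penalizes claiming high uncertainty everywhere.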

After evaluation, the Experimenter compiles a structured report, adds promising molecules to the candidate database, and feeds the results back to the Planner. This closed loop enables continuous learning: the Planner refines its retrieval queries and design directives based on the latest performance data, the Generator adapts its creative direction, and the Experimenter iteratively improves its assessment models.
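
The plan, generate, evaluate cycle can be sketched as a simple loop. The three agents are modeled as plain callables, and `run_cycles`, `keep_threshold`, and the dictionary-based candidate database are hypothetical simplifications; the actual agents are LLM-driven and exchange richer reports than a single PCE value.

```python
def run_cycles(planner, generator, experimenter, seed_db, n_cycles=3, keep_threshold=10.0):
    """Run plan -> generate -> evaluate cycles, growing the candidate database.

    planner(db) returns a design plan, generator(plan) returns a list of
    SMILES strings, and experimenter(proposals) returns a dict mapping each
    SMILES string to its predicted PCE in percent.
    """
    candidate_db = dict(seed_db)                 # SMILES -> predicted PCE (%)
    history = []
    for cycle in range(n_cycles):
        plan = planner(candidate_db)             # design directions from current knowledge
        proposals = generator(plan)              # new candidate molecules
        report = experimenter(proposals)         # evaluation results
        for smi, pce in report.items():
            if pce >= keep_threshold:
                candidate_db[smi] = pce          # promising molecules inform the next plan
        history.append(
            {"cycle": cycle, "evaluated": len(report), "db_size": len(candidate_db)}
        )
    return candidate_db, history
```

Because the Planner reads the database at the start of every cycle, any molecule kept in one cycle is visible when the next plan is drawn up.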

Benchmarked against the VAE‑based DeepAcceptor, a genetic‑algorithm (GA) design method, and an LLM‑only prompting baseline, OSCAgent performs best. It achieves a higher average predicted PCE (an increase of about 12 percentage points), produces chemically valid and synthetically accessible structures, and discovers candidates with predicted efficiencies approaching 18%, a level comparable to state‑of‑the‑art experimental OSCs. The uncertainty‑aware predictor filters out over‑optimistic predictions, improving the reliability of the search.
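
One way a predicted variance can be used to screen out over-optimistic predictions is to rank candidates by a pessimistic score instead of the raw mean. The sketch below uses a lower confidence bound (mean minus `k` standard deviations). This is an assumed mechanism chosen for illustration, since the summary does not state how OSCAgent applies the uncertainty, and the function names and example values are hypothetical.

```python
import math


def lower_confidence_bound(mean, variance, k=1.0):
    """Pessimistic score: predicted mean minus k predicted standard deviations."""
    return mean - k * math.sqrt(variance)


def rank_by_lcb(predictions, k=1.0):
    """Sort (smiles, mean, variance) tuples by pessimistic score, best first."""
    return sorted(
        predictions,
        key=lambda p: lower_confidence_bound(p[1], p[2], k),
        reverse=True,
    )
```

With `k=1.0`, a candidate predicted at 18% with variance 16 scores 14, so it ranks below a candidate predicted at 16% with variance 0.25, which scores 15.5. Setting `k=0` recovers ranking by the mean alone.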

In summary, OSCAgent showcases how LLM‑driven agents, when combined with retrieval‑augmented planning and rigorous multimodal evaluation, can autonomously explore vast chemical spaces, generate novel, feasible OSC acceptors, and accelerate materials discovery without continuous human intervention. Future work will focus on experimental synthesis and validation of top candidates, extending the framework to other photovoltaic materials, and enhancing the Planner with meta‑learning capabilities to further reduce reliance on handcrafted prompts.
