Crystal structure prediction (CSP), which aims to predict the three-dimensional atomic arrangement of a crystal from its composition, is central to materials discovery and mechanistic understanding. Existing deep learning models often treat crystallographic symmetry only as a soft heuristic or rely on space group and Wyckoff templates retrieved from known structures, which limits both physical fidelity and the ability to discover genuinely new material structures. In contrast to retrieval-based methods, our approach leverages large language models to encode chemical semantics and directly generate fine-grained Wyckoff patterns from composition, effectively circumventing the limitations inherent to database lookups. Crucially, we incorporate domain knowledge into the generative process through an efficient constrained-optimization search that rigorously enforces algebraic consistency between site multiplicities and atomic stoichiometry. By integrating this symmetry-consistent template into a diffusion backbone, our approach constrains the stochastic generative trajectory to a physically valid geometric manifold. This framework achieves state-of-the-art performance across stability, uniqueness, and novelty (SUN) benchmarks, alongside superior matching performance, thereby establishing a new paradigm for the rigorous exploration of targeted crystallographic space: efficient expansion into previously uncharted materials space without reliance on existing databases or a priori structural knowledge.
Crystalline materials are fundamental to modern technologies spanning energy, electronics, medicine, and aerospace, and advancing these sectors relies on identifying novel crystal structures with tailored properties [1][2][3][4][5]. The vastness of the compositional and structural parameter space presents a fundamental bottleneck for materials discovery. Conventional pipelines, guided by chemical intuition and constrained by costly computations and experiments [6,7], are intrinsically inefficient at exploring this space at scale. In recent years, artificial intelligence (AI), and deep learning-based generative models in particular, has emerged as a promising route to proposing candidate materials or predicting their properties at scale, demonstrating superior generalization capabilities and significant advantages in computational efficiency [8][9][10][11][12][13][14][15]. Within this landscape, the crystal structure prediction (CSP) task [8,10,13], which seeks the three-dimensional arrangement of atoms in the unit cell from a given composition, plays a pivotal role in AI-driven materials design workflows [11]. Diffusion models have become the dominant architecture for 3D structure generation [16,17], and, for the CSP task, diffusion-based approaches also deliver state-of-the-art performance [10,14].
However, the CSP task imposes particularly stringent requirements on the design of AI methods. Physically plausible crystal structures typically obey the symmetry constraints imposed by specific space groups. Learning and explicitly enforcing these symmetries can substantially reduce the search space of CSP, thereby improving both the efficiency and success rate of structure generation. Crystal symmetry strictly constrains the allowed Wyckoff positions, determining atomic multiplicities, site symmetries, and the relative arrangement of atoms within the unit cell. Only when the given chemical composition is mathematically consistent with the corresponding Wyckoff positions can physically reasonable crystal structures be obtained.
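The algebraic consistency requirement can be made concrete: a composition is realizable under a space group only if each element's atom count decomposes into a sum of allowed Wyckoff multiplicities. A minimal backtracking sketch is given below; the multiplicity list [1, 2, 4, 8] is illustrative rather than tied to any particular space group, and a complete check would additionally respect that special positions with fixed coordinates may be occupied at most once.

```python
def multiplicity_partitions(count, multiplicities):
    """Enumerate all ways to write `count` atoms as a sum of Wyckoff
    multiplicities (with repetition allowed).  Each result is a tuple of
    multiplicities in non-increasing order."""
    mults = sorted(set(multiplicities), reverse=True)
    results = []

    def backtrack(remaining, start, chosen):
        if remaining == 0:
            results.append(tuple(chosen))
            return
        for i in range(start, len(mults)):
            m = mults[i]
            if m <= remaining:
                chosen.append(m)
                backtrack(remaining - m, i, chosen)
                chosen.pop()

    backtrack(count, 0, [])
    return results

# e.g. 6 atoms of one element against hypothetical multiplicities {1, 2, 4, 8}:
partitions = multiplicity_partitions(6, [1, 2, 4, 8])
# a composition is symmetry-consistent only if such a partition exists
# simultaneously for every element in the formula unit
```

An empty result for any element rules out that space group immediately, which is what makes this kind of feasibility test usable as a hard filter inside a constrained search.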
Existing AI methods, however, only partially address this requirement. Some recent approaches to generating atomic structures condition the generator on a space group label [11,14] but do not explicitly represent or enforce Wyckoff positions, so symmetry is only captured in a coarse global sense. Other works attempt to encode Wyckoff templates into a latent representation that modulates the generative process [12,18], yet their modulation mechanisms do not guarantee that the final atomic coordinates satisfy the exact Wyckoff constraints. DiffCSP++ [10] goes further by enforcing strict space group symmetry defined by Wyckoff templates, but it assumes that a suitable space group and Wyckoff template are already known in the given lookup database. In practice, these templates are retrieved from existing structures via metric learning methods, such as CSPML [19]. However, restricting generation to retrieved templates confines the model to known symmetry patterns, thereby limiting its ability to discover genuinely new space-group and Wyckoff configurations for a given composition.
To address these fundamental limitations, we propose a symmetry-driven generative framework, which enables the ab initio generation of fine-grained Wyckoff site assignments directly from composition and atom counts, independent of any structural priors, as depicted in Fig. 1. Crucially, the predicted symmetries are imposed as hard geometric constraints to guide and modulate the diffusion process for three-dimensional crystal geometries, thereby rectifying the denoising trajectory. By ensuring strict compliance with Wyckoff symmetry, our approach confines the generative process to a physically valid manifold, guaranteeing both structural plausibility and computational precision. This yields significant gains in the stability, uniqueness, and novelty (SUN) [14] of the discovered materials, as well as in matching-rate metrics [20]. Below, we outline the core methodology, with full implementation details and training configurations provided in Supplementary Information A and B, respectively.
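One simple way to impose a Wyckoff constraint as a hard geometric operation, shown here as an illustrative sketch rather than the paper's exact mechanism, is group-averaging: replacing a coordinate with its mean over the site-symmetry operations produces a point that is exactly invariant under those operations, i.e. a point lying precisely on the Wyckoff position.

```python
import numpy as np

def project_to_site_symmetry(frac, ops):
    """Project a fractional coordinate onto the invariant subspace of a
    site-symmetry group by group-averaging:
        x* = (1/|G|) * sum_g (W_g @ x + t_g).
    Because the ops form a group, x* satisfies g(x*) = x* for every g."""
    frac = np.asarray(frac, dtype=float)
    images = [W @ frac + t for W, t in ops]
    return np.mean(images, axis=0)

# Hypothetical site symmetry: a mirror exchanging x and y, whose fixed
# points form the Wyckoff orbit (x, x, z).
identity = (np.eye(3), np.zeros(3))
mirror = (np.array([[0., 1., 0.],
                    [1., 0., 0.],
                    [0., 0., 1.]]), np.zeros(3))

noisy = np.array([0.30, 0.34, 0.25])            # off-symmetry coordinate
fixed = project_to_site_symmetry(noisy, [identity, mirror])
# fixed = [0.32, 0.32, 0.25], an exact point of the (x, x, z) orbit
```

Applying such a projection after each denoising step keeps the trajectory on the symmetry-consistent manifold by construction, rather than relying on the network to learn the constraint softly. (Coordinates near the cell boundary would additionally need periodic wrapping, omitted here for clarity.)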
First, we use two large language models (LLMs) to directly infer fine-grained crystallographic symmetry from compositional inputs. Both models are built upon the Transformer architecture [21], and we replace the standard feed-forward networks within each Transformer block with soft mixture-of-experts (SoftMoE) layers [22] to enhance model capacity. This design enables multiple experts to specialize in distinct compositional patterns, substantially boosting expressiveness without a commensurate rise in computational cost, as only a small, softly weighted subset of experts is activated at each position.
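The SoftMoE routing of [22] can be sketched in a few lines: a per-slot softmax over tokens builds each expert slot's input as a soft mixture of all tokens, each expert transforms its slots, and a per-token softmax over slots recombines the outputs. The expert functions and shapes below are toy placeholders, not the paper's configuration.

```python
import numpy as np

def softmax(x, axis):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def soft_moe(X, phi, experts):
    """SoftMoE layer: dense, fully differentiable routing with no dropped
    tokens.  X: (tokens, d); phi: (d, n_experts * slots_per_expert)."""
    n_experts = len(experts)
    logits = X @ phi                       # (tokens, n_experts * slots)
    dispatch = softmax(logits, axis=0)     # per slot: weights over tokens
    combine = softmax(logits, axis=1)      # per token: weights over slots
    slots_in = dispatch.T @ X              # soft mixture of tokens per slot
    per_expert = np.split(slots_in, n_experts)
    slots_out = np.concatenate([f(s) for f, s in zip(experts, per_expert)])
    return combine @ slots_out             # (tokens, d)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))                       # 5 tokens, d = 8
phi = rng.normal(size=(8, 2))                     # 2 experts x 1 slot each
experts = [lambda s: np.tanh(s), lambda s: 0.5 * s]  # toy expert functions
Y = soft_moe(X, phi, experts)                     # shape (5, 8)
```

Because every token contributes to every slot with a soft weight, the cost per layer is governed by the (small) number of slots rather than the number of experts, which is the source of the capacity-versus-compute advantage described above.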
The first LLM, denoted as LLM_g, takes as input an atomic sequence representation of the composition, expanded to explicitly reflect the atom counts N in the unit cell, and outputs a probability distribution
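One plausible form of this expanded input (a hypothetical tokenization; the text specifies only that atom counts are made explicit) is to repeat each element symbol according to its count in the unit cell:

```python
def expand_composition(formula_counts):
    """Expand a composition dict into an explicit atomic token sequence,
    e.g. {'Fe': 2, 'O': 3} -> ['Fe', 'Fe', 'O', 'O', 'O'].
    The sequence length equals the total atom count N in the unit cell."""
    return [el for el, n in formula_counts.items() for _ in range(n)]
```

Making each atom an explicit token lets the model's output align position-wise with per-atom symmetry assignments, rather than predicting counts separately.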