Generative AI for Quantum Circuits and Quantum Code: A Technical Review and Taxonomy

Juhani Merilehto
University of Vaasa & University of Turku
merilehto@pm.me

Abstract

We review thirteen generative systems and five supporting datasets for quantum circuit and quantum code generation, identified through a structured scoping review of Hugging Face, arXiv, and provenance tracing (January–February 2026). We organize the field along two axes—artifact type (Qiskit code, OpenQASM programs, circuit graphs) crossed with training regime (supervised fine-tuning, verifier-in-the-loop RL, diffusion/graph generation, agentic optimization)—and systematically apply a three-layer evaluation framework covering syntactic validity, semantic correctness, and hardware executability. The central finding is that while all reviewed systems address syntax and most address semantics to some degree, none reports end-to-end evaluation on quantum hardware (Layer 3b), leaving a significant gap between generated circuits and practical deployment.

Scope note: "quantum code" refers throughout to quantum program artifacts (QASM, Qiskit); we do not cover generation of quantum error-correcting codes (QEC).

1 Introduction

Generative AI for quantum software has diversified from quantum-aware code assistants into multiple technical families that synthesize quantum artifacts at different abstraction levels. The important axis of differentiation across these systems is not "LLM vs. non-LLM," but how semantic correctness is defined and enforced: unit tests, fidelity proxies, objective-function scores, or entanglement proxies. This review imposes structure on this fragmented landscape.

Scope.
We focus on generative systems that output quantum artifacts intended to be executed or compiled: (i) quantum circuits as gate sequences or graphs; (ii) OpenQASM (2.0 and 3.0) programs; and (iii) Qiskit (Python) code that constructs circuits. We exclude systems where quantum circuits are internal components but outputs are non-circuit data ("quantum-enhanced" generative modelling).

We use the following terminology throughout:

• Syntactic validity: the output parses/compiles under the target grammar/toolchain.
• Semantic correctness: the generated artifact implements the intended unitary, algorithm, or task objective.
• Hardware executability: the artifact transpiles and runs under realistic device constraints (connectivity, gate set, noise) with acceptable resource usage.

OpenQASM 2.0 versus 3.0. Several reviewed systems target OpenQASM 2.0 [1] while others target OpenQASM 3.0 [2]. OpenQASM 2.0 is a straight-line gate-sequence language; 3.0 introduces classical control flow (for, while, if-else), typed variables, subroutine definitions, and timing instructions. For generative models, this distinction matters in three ways: (1) the grammar space is considerably larger, increasing the probability of syntactically invalid output; (2) semantic correctness becomes harder to verify because classical control flow creates path-dependent behaviour; and (3) generated circuits may exploit features (e.g., mid-circuit measurement and feed-forward) that current simulators and hardware support unevenly. Table 2 identifies which QASM version each system targets; systems operating on 2.0 and 3.0 are not directly comparable in generation difficulty or evaluation complexity.

Positioning against classical code generation. Classical code LLMs such as Codex [3], AlphaCode [4], and CodeBERT [5] generate programs evaluated primarily via unit tests and execution-based feedback.
Unit-test evaluation transfers directly to Qiskit code generation, as demonstrated by QiskitHumanEval [6]. However, quantum semantic equivalence checking—verifying that two circuits implement the same unitary—is fundamentally more expensive: statevector simulation requires O(2^n) memory for n qubits, and full unitary comparison costs O(4^n). No general polynomial-time equivalence checker is known for arbitrary unitaries at scale. Hardware executability as a first-class constraint—connectivity maps, native gate sets, and coherence-time budgets—has no classical analogue. These differences motivate the three-layer framework developed in §5.

2 Review Methodology

2.1 Search and Screening

We conducted a scoping review following a structured search protocol modelled on PRISMA-ScR reporting guidelines. All screening and inclusion decisions were performed by a single reviewer (the author); no independent second screening was conducted. This is acknowledged as a limitation in §7.5. Sources were assembled between January 1 and February 15, 2026 via three channels:

1. Model-hub search on Hugging Face: four keyword queries against model-card content—"QASM" (11 hits), "quantum circuit" (40 hits), "OpenQASM" (3 hits), "Qiskit generator" (0 hits)—yielding 35 unique base model cards after deduplication and collapsing quantized redistributions.

2. Paper search on arXiv (categories cs.AI, cs.LG, quant-ph; submissions ≤ 2026-02-15): five keyword queries yielding 193 unique papers after cross-query deduplication. The largest result set (185 papers for RL + quantum circuit) is dominated by quantum-enhanced RL works that do not generate circuits; these were excluded during screening.

3. Provenance follow-up via GitHub repositories, Hugging Face organization pages, and backward/forward citation tracing.
This channel recovered three systems (Granite-3.2-8b-Qiskit, Qwen2.5-14B-Qiskit, KetGPT) whose model cards did not match keyword queries.

Screening flow. The combined pool of 228 unique candidates (35 HF model cards + 193 arXiv papers) was screened in two stages: (i) title/abstract screening removed 190 candidates (172 quantum-enhanced RL/VQC papers + 18 non-generative HF model cards); (ii) full-text screening of the remaining 38 candidates removed 16 (11 for insufficient technical disclosure, 5 for producing outputs outside scope). After deduplication across channels, 13 generative systems and 5 datasets were retained.

Inclusion criteria. A system was included if public artifacts (paper, model card, or repository) jointly disclosed at least two of: (i) model architecture or parameter count, (ii) training data source and approximate scale, (iii) at least one quantitative evaluation metric. Systems with partial disclosure were included and annotated with "Unspecified in source" for missing fields. Fully closed systems with no public disclosure were excluded.

Treatment of partial disclosure. "Insufficient technical disclosure" was applied when a system's public artifacts did not meet the two-of-three criterion above and the missing information could not be inferred from the repository. Systems excluded for this reason are noted in footnotes rather than listed individually, as their omission reflects disclosure limitations rather than technical inadequacy.

3 Background and Timeline

The systems reviewed here belong to a broader continuum of automated quantum circuit construction. Before the current wave of generative-model-based approaches, the field developed substantial foundations in evolutionary and reinforcement-learning-based circuit synthesis.
Genetic algorithms have been applied to quantum circuit compilation since at least 2019 [7], and deep RL was demonstrated for quantum compiling by Moro et al. [8]. Multi-objective evolutionary architecture search [9] further matured the space. These pre-LLM methods typically operate by sequential gate placement guided by heuristic or learned value functions, and remain competitive for structured synthesis tasks. The generative-model wave reviewed here (2024–2026) differs primarily in its use of large pre-trained language or diffusion models and in its ambition to generalize across task families rather than optimize for a single target unitary.

Table 1 provides a chronological overview. The field has progressed from benchmark and dataset construction (2020–2024) through supervised generation models (2024–2026), with verifier-in-the-loop and agentic systems emerging prominently in 2025.

Year | System / Dataset | Type | Key Innovation
2020 | QASMBench [10, 11] | Benchmark | Curated low-level OpenQASM 2.0 benchmark suite for NISQ evaluation
2024 | genQC [12, 13] | Model | First diffusion model for quantum circuit synthesis; text-conditioned denoising over discrete gate tokens
2024 | KetGPT [14, 15] | Model | GPT-based transformer for generating realistic OpenQASM 2.0 circuits (dataset augmentation)
2024 | AltGraph [16] | Model | Generative graph models (D-VAE, DeepGMG) for circuit DAG rewriting and optimization
2024 | QiskitHumanEval [6] | Benchmark | Unit-test benchmark (101 tasks) for Qiskit code generation
2024 | quantum-circuits-8k [17] | Dataset | Synthetic text→QASM 2.0 pairs with paraphrase augmentation
2024 | QuantumLLMInstruct [18, 19] | Dataset | 500k+ claimed instruction-tuning pairs across 90+ quantum domains
2024 | QCircuitBench [20] | Benchmark | 120k+ algorithm-design instances with verification oracles spanning 25 algorithms
2025 | Granite-3.2-8b / Qwen2.5-14B Qiskit [21, 22, 23] | Model | Industrial Qiskit code LLMs with GRPO post-training using quantum verifiable rewards
2025 | UDiTQC [24] | Model | U-Net-style diffusion transformer; outperforms genQC on entanglement and compilation
2025 | Agent-Q (SFT) [25] | Model | SFT on 14k optimization circuits in OpenQASM 3.0
2025 | Barta et al. [26] | Model | Diffusion for parameterized quantum circuits; extends to continuous gate parameters
2025 | QUASAR (SFT+RL) [27] | Model | Agentic RL with hierarchical 4-level reward; tool-augmented LLM
2025 | Q-Fusion [28] | Model | LayerDAG-based diffusion over circuit DAGs; 100% syntactic validity in tested regimes
2025 | genQC v2 [29] | Model | Multimodal diffusion generating discrete structure and continuous parameters simultaneously
2025 | QAgent [30] | Model | Multi-agent LLM for autonomous OpenQASM programming; RAG + CoT + tool augmentation
2025 | graph-data-quantum-rl [31] | Dataset | 14.5k rows with prompts, graphs, Hamiltonians, OpenQASM 3.0 circuits
2026 | QuantumGPT-124M [32] | Model | Small specialist GPT-2 for OpenQASM 2.0; task-specific tiny LM feasibility

Table 1: Chronological milestones in generative AI for quantum circuits and code. Years reflect first public appearance (preprint, model card, or repository).

4 Taxonomy of Generative Systems

We organize reviewed systems along two axes: artifact type (Qiskit code vs. QASM vs. circuit graph) crossed with training regime (static SFT, verifier-in-the-loop RL, diffusion/graph generation, agentic optimization). This pair maximises separation among reviewed systems and aligns with the two practical questions a practitioner faces: "What do I want the system to output?" and "How is correctness enforced during training?" Alternative axis choices—qubit regime or verification cost—were considered but rejected as either degenerate (most systems operate in the small-qubit regime) or conflating distinct model designs that share a cost profile.
The six families are:

• Qiskit code assistants: general code LLMs adapted to Qiskit APIs, evaluated by executable unit tests.
• OpenQASM generators (static SFT): supervised fine-tuned LMs producing OpenQASM for specific domains.
• Specialist small LMs: small models (~100M parameters) trained on text→QASM instruction pairs.
• Verifier-in-the-loop alignment: RL/preference optimization with simulator-based rewards.
• Graph and diffusion generators: models operating on circuit DAGs or discrete tokenizations of gates and parameters.
• Agentic systems: multi-step generation with external tools (simulators, compilers) used for scoring and iterative improvement.

4.1 Qiskit Code Assistants

Granite-3.2-8b-Qiskit [21] and Qwen2.5-Coder-14B-Qiskit [22] are general-purpose code LLMs with extended pre-training on a curated Qiskit corpus (approximately 50M tokens of Qiskit v2.0 API code), fine-tuned with supervised instruction tuning. Evaluation uses QiskitHumanEval [6], a benchmark of 101 tasks where the metric is pass@k: the probability that at least one of k generated completions passes all unit tests. More recent work [23] adds GRPO (Group Relative Policy Optimization) [33] post-training with quantum verifiable rewards. GRPO eliminates the critic network by estimating advantages relative to the group mean of sampled completions, reducing memory overhead. In the quantum setting, the reward function checks both syntactic correctness and functional equivalence via Qiskit Aer simulation.

4.2 OpenQASM Generators and Specialist Small LMs

Agent-Q [25, 34] is a Qwen-based model fine-tuned on approximately 14,000 parameterized optimization circuits (QAOA, VQE, adaptive VQE) in OpenQASM 3.0. The released Hugging Face checkpoint is 4B parameters, though the paper does not clearly specify the base-model size.
Evaluation measures objective alignment: Jensen–Shannon divergence between the output distribution of the generated circuit and the ground-truth distribution, as well as expectation-value discrepancy under problem-specific cost Hamiltonians.

QuantumGPT-124M [32, 17] is a GPT-2-scale (124M-parameter) model trained on approximately 8,000 synthetic text→OpenQASM 2.0 pairs with paraphrase augmentation. It targets small circuits (≤ 5 qubits) and evaluates syntactic validity via parser checks and approximate task-type success via manual inspection.

4.3 Verifier-in-the-Loop Alignment

QUASAR [27, 35] extends Agent-Q's SFT foundation with agentic reinforcement learning using GRPO. The key innovation is a hierarchical four-level reward computed by an external quantum simulation tool: (1) a syntax reward for successful OpenQASM 3.0 parsing; (2) a distributional alignment term (Jensen–Shannon divergence); (3) an expectation-value alignment term comparing cost-Hamiltonian expectation values; and (4) an optimization usability term assessing whether the generated circuit converges efficiently under further classical parameter optimization. The model interacts with a quantum tool server via HTTP, receiving structured feedback at each RL step.

4.4 Graph and Diffusion Generators

genQC [12, 13, 36] employs a denoising diffusion model on discrete circuit tokens. Circuits are represented as 2D tensors (rows = qubits, columns = time steps, cells = gate identities). The reverse process uses a conditional U-Net with text conditioning via frozen OpenCLIP embeddings. Evaluation uses process fidelity (F = |Tr(U†_gen U_target)|² / d²) and compilation success rate (typically 3–5 qubits). Model size is not reported as a single count due to the U-Net + frozen CLIP architecture.
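The process-fidelity metric quoted above for genQC (and the PF entries in Table 2) is compact enough to state in executable form. The following is a minimal pure-Python sketch with no dependence on any reviewed codebase; the helper names and the single-qubit example matrices are illustrative only.

```python
# Minimal sketch of process fidelity F = |Tr(U_gen^† U_target)|^2 / d^2,
# using plain Python complex arithmetic (no external dependencies).
# Function and matrix names are illustrative, not from any reviewed system.

def dagger(u):
    """Conjugate transpose of a square matrix given as a list of lists."""
    n = len(u)
    return [[u[j][i].conjugate() for j in range(n)] for i in range(n)]

def matmul(a, b):
    """Naive square-matrix product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def process_fidelity(u_gen, u_target):
    """F = |Tr(U_gen^† U_target)|^2 / d^2; equals 1 iff the two unitaries
    agree up to a global phase."""
    d = len(u_gen)
    m = matmul(dagger(u_gen), u_target)
    trace = sum(m[i][i] for i in range(d))
    return abs(trace) ** 2 / d ** 2

I = [[1, 0], [0, 1]]
Z = [[1, 0], [0, -1]]  # differs from I only by a relative phase on |1>
H = [[2 ** -0.5, 2 ** -0.5], [2 ** -0.5, -(2 ** -0.5)]]

print(round(process_fidelity(H, H), 10))  # 1.0
print(round(process_fidelity(I, Z), 10))  # 0.0
```

The I-vs-Z case shows why fidelity-style metrics catch relative-phase errors that computational-basis statistics miss: both unitaries leave |0⟩ and |1⟩ populations unchanged, yet Tr(Z) = 0 gives F = 0.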
AltGraph [16] uses three generative graph models—D-VAE (GRU and GCN variants) and DeepGMG—to transform quantum circuit DAGs. The models learn a latent space from which perturbations produce functionally equivalent circuits with reduced depth and gate count. Evaluation measures density-matrix MSE (0.0074 average) and post-transpilation gate count and depth reduction (37.55% and 37.75%). Model sizes are unspecified in source.

Q-Fusion [28] adapts the LayerDAG diffusion framework to quantum circuit DAGs. It reports 100% syntactic validity in tested regimes (small random circuits), though semantic evaluation beyond validity is limited. Model size is unspecified in source.

UDiTQC [24] replaces genQC's U-Net backbone with a U-Net-style Diffusion Transformer (UDiT) combining multi-scale feature extraction with global self-attention. Evaluated on entanglement generation and unitary compilation (up to 8 qubits), it reports higher accuracy than genQC. The framework supports masked circuit editing and constrained generation. Model size is unspecified in source.

Barta et al. [26] extend diffusion to parameterized quantum circuits, generating both discrete gate structure and continuous rotation angles—addressing a limitation of earlier discrete-token diffusion models. Accepted at QCE 2025. Model size is unspecified in source.

genQC v2 [29] introduces a multimodal denoising diffusion model that simultaneously generates circuit structure and continuous parameters using two independent noise processes with a shared conditioning mechanism. Model size is unspecified in source; evaluation disclosure in the public preprint is limited.

4.5 Agentic Systems

QAgent [30] is a multi-agent LLM system for autonomous OpenQASM programming.
Given a natural language task description, it decomposes into sub-tasks dispatched to a Dynamic-few-shot Coder (in-context learning for regular circuits) and a Tools-augmented Coder (simulation tools for complex parameterized tasks). Both incorporate multi-round self-reflection with chain-of-thought reasoning and RAG. The system reports 71.6% improvement over baseline LLMs on OpenQASM generation. Unlike QUASAR, QAgent uses prompt engineering and tool augmentation over a frozen base LLM rather than fine-tuning or RL. Model size depends on the pluggable base LLM.

4.6 Dataset Augmentation Models

KetGPT [14, 15] uses a GPT-based transformer to generate synthetic OpenQASM 2.0 circuits, trained on algorithm-derived circuits from MQTBench. Its purpose is dataset augmentation rather than task-directed generation: a three-fold verification process (manual inspection, transformer-based real-vs-random classification, and structural analysis) validates that generated circuits resemble real algorithm-based circuits. Model size is unspecified in source.

4.7 Model Comparison

Table 2 summarizes the reviewed generative systems. The Syn., Sem., and HW columns encode evaluation coverage using compact labels rather than binary checkmarks, reflecting that semantic evaluation methods differ fundamentally across model families and are not directly interchangeable.

System | Family | Output | Size | Syn. | Sem. | HW | Evaluation Notes
QuantumGPT-124M [32, 17] | Spec. small LM | QASM 2.0 | 124M | ✓ | Lim | — | Parser validation; manual inspection on ≤ 5 qubit circuits (no oracle)
Granite-3.2-8b-Qiskit [21, 6] | Qiskit LLM | Qiskit (Py) | 8B | ✓ | UT | — | QiskitHumanEval unit tests (pass@k); coding benchmarks
Qwen2.5-14B-Qiskit [22, 6] | Qiskit LLM | Qiskit (Py) | 14.7B | ✓ | UT | — | Similar unit-test-driven evaluation
Agent-Q (SFT) [25, 34, 31] | Optim-LLM (SFT) | QASM 3.0 | Unspec.^b | ✓ | DA | — | Distribution and expectation-value alignment
QUASAR (SFT+RL) [27, 35, 31] | Verifier RL | QASM 3.0 | 4B | ✓ | DA | — | Hierarchical 4-level reward; pass@k on syntax + objective alignment
genQC [12, 13, 36] | Diffusion | Circuit tok. | Unspec.^a | ✓ | PF | — | Process fidelity; compilation metrics (3–5 qubits)
KetGPT [14, 15] | Trans. gen. | QASM 2.0 | Unspec.^a | ✓ | RP | — | Real-vs.-random classification + structural analysis (realism proxy, not task semantics)
AltGraph [16] | Graph rewr. | Circ. DAG | Unspec.^a | ✓ | PF | 3a | Density-matrix MSE; depth/gate reduction measured post-transpilation (L3a)
Q-Fusion [28] | Graph diff. | Circ. DAG | Unspec.^a | ✓ | Lim | — | Validity rate in tested regimes; limited semantic eval
UDiTQC [24] | Diff. transf. | Circuit tok. | Unspec.^a | ✓ | PF | — | Process fidelity on entanglement/compilation; outperforms genQC
Barta et al. [26] | Diff. (PQC) | Param. circ. | Unspec.^a | ✓ | Lim | — | Diffusion for parameterized circuits; QCE 2025
genQC v2 [29] | Diff. (multi) | Param. circ. | Unspec.^a | ✓ | Lim | — | Multimodal diffusion over discrete structure and continuous parameters; limited evaluation disclosure
QAgent [30] | Agentic LLM | OpenQASM | Base LLM | ✓ | DA | — | Multi-agent RAG+CoT; 71.6% improvement over baselines

Evaluation layers: Syn. = Syntactic validity (L1); Sem. = Semantic method (L2); HW = Hardware (L3). Semantic codes: UT = unit tests; PF = process fidelity / density-matrix distance; DA = distributional + expectation-value alignment; RP = realism proxy (structural similarity); Lim = limited or manual only. Hardware codes: 3a = post-transpilation resource metrics reported; — = no hardware-level evaluation.
^a Model size unspecified in source; architecture described qualitatively.
^b Paper text does not clearly specify the base-model size; the released Hugging Face implementation is 4B.

Table 2: Reviewed generative systems with artifact types, training regimes, and evaluation coverage.
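Several entries in Table 2 report pass@k. For reference, the standard unbiased estimator introduced with Codex [3] can be written in a few lines; this is a sketch, and the function name below is illustrative.

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: given n sampled completions of which c
    pass all unit tests, estimate P(at least one of k samples passes).
    pass@k = 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        return 1.0  # every size-k subset must contain a passing sample
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)

# With 2 samples, 1 of which passes, pass@1 averages over which sample is drawn:
print(pass_at_k(2, 1, 1))    # 0.5
print(pass_at_k(10, 0, 5))   # 0.0 (no passing samples)
print(pass_at_k(10, 3, 10))  # 1.0 (k = n and at least one sample passes)
```

Averaging this estimator over benchmark tasks gives the headline pass@k figure; it is deliberately computed from more samples than k to reduce variance.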
4.8 Supporting Datasets and Benchmarks

Table 3 summarizes key datasets and benchmarks that support the generative systems reviewed above. No single dataset currently addresses all three evaluation layers, and schema differences between OpenQASM 2.0 and 3.0 datasets remain a practical barrier to cross-system benchmarking. Benchmark suites such as QASMBench [10, 11] and QCircuitBench [20] are not generative models themselves but provide essential evaluation infrastructure: QASMBench is associated with execution fidelity measurements on real devices (IBM, IonQ, Rigetti), while QCircuitBench supplies 120,290 algorithm-design instances with automatic verification oracles spanning 25 algorithms in both OpenQASM 3.0 and Qiskit/Cirq formats.

Dataset | Primary use | Scale | Notes
quantum-circuits-8k [17] | Text→OpenQASM 2.0 SFT | ~8k | Synthetic with paraphrase augmentation; small-circuit emphasis
graph-data-quantum-rl [31] | Optimization-circuit generation and RL | 14.5k rows | Prompts, graphs, Hamiltonians, OpenQASM 3.0 circuits, solutions
QASMBench [11, 10] | OpenQASM 2.0 benchmark suite | diverse | Curated benchmark circuits and circuit-level metrics
QCircuitBench [20] | Algorithm design benchmarking | 120,290 | QASM 3.0 + code (Qiskit/Cirq) + oracles / verification functions
QuantumLLMInstruct [18, 19] | Broad quantum instruction data | 500k+ claimed | Paper/model card claim 500k+ instruction-tuning pairs across 90+ quantum domains; current public HF viewer exposes 5.15k rows

Table 3: Datasets and benchmarks supporting quantum circuit/code generation and evaluation.

5 Evaluation Framework

Across model families, evaluation decomposes into three layers:

1. Syntax: parsing, compilation, or import success. For graph- and DAG-based generators, "syntactic validity" means structural well-formedness (valid DAG topology, legal gate placements) rather than parser-valid program text.

2. Semantics: the method used to assess whether the generated artifact is correct. This varies fundamentally across families: unit tests for code generation; process fidelity or density-matrix distances for compilation; expectation-value and distribution alignment for optimization tasks; and realism proxies (e.g., real-vs-random classification) for dataset augmentation. These methods are not interchangeable, and a system evaluated by one method cannot be directly ranked against a system evaluated by another.

3. Hardware/resources, decomposed into two sublayers:

3a. Compilability and resource realism: transpilation to a target device's native gate set and connectivity succeeds; resulting circuit depth, SWAP count, and two-qubit gate overhead are acceptable.

3b. Empirical execution: the transpiled circuit is executed on a real QPU; measured output distributions are compared to ideal simulation using metrics such as Hellinger fidelity or total variation distance.

This sublayer distinction is diagnostic: a system may address 3a (AltGraph measures post-transpilation depth and gate counts) without addressing 3b (no system in the reviewed corpus reports QPU execution results as part of model evaluation).

Benchmark suites such as QiskitHumanEval [6] formalize unit-test evaluation for Qiskit code generation. Optimization-focused systems such as QUASAR emphasize simulator-driven objective metrics and pass@k variants over multiple correctness criteria [27, 35].

Systematic application. Table 2 applies this framework to all reviewed systems through the Sem. and HW columns. The pattern is clear: every reviewed system addresses Layer 1 (syntax), most address Layer 2 (semantics) to some degree, and none addresses Layer 3b (empirical hardware execution). Only AltGraph partially addresses Layer 3a. This observation is elaborated in §6.

Task-objective-to-evaluator mapping.
A practitioner selecting a semantic evaluator must match the task objective to an appropriate metric. Table 4 provides concrete guidance.

Metric gaming and composite evaluation. Any fixed evaluation metric is susceptible to gaming. Distribution-matching metrics can be satisfied by circuits that reproduce correct measurement statistics while implementing an incorrect unitary. Unit-test evaluation can be gamed by overfitting to test-case structure. Fidelity metrics are robust against such shortcuts but exponentially expensive at scale. These failure modes motivate composite evaluation protocols combining metrics from different paradigms, as well as adversarial test suites targeting common evaluator blind spots.

Task Objective | Recommended Evaluator | Known Failure Modes
Compilation to target unitary | Process fidelity, diamond norm proxy, or equivalence checking on a basis subset | Relative phase errors invisible to basis-restricted measurement; partial basis checking misses errors on untested inputs; ancilla garbage passes fidelity but fails full equivalence
Optimization ansatz generation | Energy expectation value + convergence speed + robustness under re-optimization | Low-energy ansatz may be a local minimum; convergence speed conflated with initial parameter sensitivity
Algorithm design tasks | Oracle-based functional checks (as in QCircuitBench [20]) | Oracle leakage if test structure correlates with training data; hard to verify beyond provided oracles
Code assistants (Qiskit) | Unit tests + execution traces (QiskitHumanEval [6]) | Tests check observable behaviour, not internal correctness; ancilla state and relative phase may be ignored
Dataset augmentation | Real-vs-random classification + structural analysis (KetGPT [14]) | Distribution matching without semantic grounding; generated circuits may be syntactically realistic but computationally trivial

Table 4: Mapping task objectives to recommended semantic evaluators and known failure modes.

6 Hardware Gap and Transpilation

Hardware evaluation as a field-wide gap. A substantive finding of this review is that none of the thirteen reviewed generative systems reports end-to-end hardware execution results (generation → transpile → execute → compare) as part of model evaluation. Layer 3b is absent from the generative model corpus. Benchmark suites such as QASMBench [10] are associated with hardware execution fidelity measurements, but QASMBench is evaluation infrastructure rather than a generative system. Among the generative systems, only AltGraph partially addresses Layer 3a by measuring post-transpilation depth and gate counts; no system closes the loop to Layer 3b.

The following protocol is proposed by this review as a direction for future evaluation practice; it is not established in the reviewed corpus. A hardware evaluation protocol for generative quantum circuits might include: (a) transpilation to a specific device's native gate set and connectivity (e.g., IBM Eagle 127-qubit heavy-hex), (b) execution with multiple shot counts, (c) comparison of measured distributions against ideal simulation using Hellinger fidelity or total variation distance, and (d) resource accounting (SWAP insertions, final circuit depth, execution time relative to T1/T2 coherence times).

Transpilation constraints. All generated quantum circuits must pass through transpilation before hardware execution. Transpilers such as the Qiskit transpiler [37], BQSKit [38], and SABRE [39] perform gate decomposition into native gate sets, qubit routing, and optimization passes. In the public artifacts reviewed here, none of the generative systems explicitly accounts for hardware connectivity constraints or native gate sets during generation.
Agent-Q, QUASAR, and QAgent generate circuits using abstract gate sets that assume all-to-all connectivity. genQC generates from a fixed gate pool that may not align with target hardware. AltGraph is closest to hardware awareness via post-transpilation metrics, but transpilation is applied after generation rather than constrained during generation. This means generative models currently solve a subset of the full circuit design problem: they produce logically correct circuits that may require substantial transpilation overhead. We propose that future systems incorporate transpilation constraints during generation—e.g., conditioning on device connectivity graphs or penalizing SWAP-heavy circuits during RL—to address a significant practical gap.

Minimal reporting baseline. As a recommendation from this review: even without transpilation-aware generation, a straightforward improvement would be to always report post-transpilation metrics under a standard reference backend, including at minimum: (i) SWAP overhead ratio, (ii) depth blow-up factor, and (iii) the target backend topology. These require only a single transpiler call and would enable cross-system comparison on a hardware-realism dimension currently absent from all reviewed evaluations.

7 Discussion

7.1 Evaluation Standardization

The primary bottleneck remains comparability: unit-test pass rates, distributional alignment, and fidelity proxies are not directly interchangeable. Benchmarks may be gamed if they measure proxies rather than the task objective [6, 27, 12].

As a concrete illustration, consider a 5-qubit circuit evaluated by both pass@k and process fidelity. A circuit implementing the correct computational-basis mapping (passing the unit test) may achieve F < 0.8 if it agrees on tested observable behaviour but differs in relative phases, ancilla state, or behaviour on untested inputs.
Conversely, a circuit with F = 0.99 may fail a unit test that checks a side-effect (e.g., qubit ordering convention) the fidelity metric ignores. These divergences reflect structural differences between evaluation paradigms that prevent cross-system ranking.

7.2 Data Provenance and Reproducibility

Schema mismatches impede dataset reuse. The quantum-circuits-8k dataset uses OpenQASM 2.0 syntax, while graph-data-quantum-rl uses OpenQASM 3.0 with typed variables and parameterized gates. A model trained on one format cannot be directly evaluated on the other without a translation layer, and automated QASM 2.0 → 3.0 conversion is not lossless [17].

7.3 Scaling and Verification Cost

Scaling beyond small-qubit compilation is constrained by classical verification cost. Statevector simulation requires O(2^n) memory; full unitary reconstruction costs O(4^n). For n = 50, statevector storage alone demands approximately 18 petabytes, and full unitary equivalence checking is doubly intractable. Tensor-network and stabilizer-rank methods offer partial relief for structured circuits but do not generalize to arbitrary unitaries. This simulation wall is a fundamental barrier to scaling verifier-in-the-loop training beyond the 30–50 qubit regime [12, 27, 25].

7.4 Future Evaluation Directions

The following strategies are proposed by this review as future evaluation directions; they are not established practice in the reviewed corpus. The path-dependent semantics of OpenQASM 3.0 create evaluation challenges beyond those of 2.0's straight-line circuits.
Three strategies merit consideration: (i) bounded-path execution: enumerate all classical branch paths up to a coverage bound and verify each path's unitary independently; (ii) trace-based unit testing: specify expected measurement and classical-variable traces for representative inputs; and (iii) symbolic execution: propagate symbolic states through classical branches to derive path conditions and verify equivalence on each feasible path. None of the reviewed systems currently employs these strategies.

7.5 Threats to Validity

Several limitations should be considered when interpreting this review:

• Single-reviewer process. All screening, inclusion, and coding decisions were made by one author. No inter-rater reliability measure was computed. While appropriate for a scoping review of a small, emerging corpus, this introduces the possibility of systematic screening bias.

• Corpus dependence on public disclosure. The review is limited to systems with publicly available papers, model cards, or repositories. Closed-source industrial systems are excluded by design, which may omit significant work.

• Provenance-based inclusion. Three systems were identified through organization pages and citation tracing rather than keyword search. This reflects the limitations of keyword discovery in a fast-moving field but introduces discretionary inclusion that is not fully reproducible from the keyword protocol alone.

• Mixed evidence quality. Reviewed systems range from peer-reviewed publications to model cards with minimal documentation. Evaluation claims are taken at face value where replication was not feasible.

• Non-comparable metrics. The heterogeneity of evaluation methods across families means that cross-system ranking is not possible from the evidence base alone, despite the tabular presentation.
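Before concluding, the bounded-path execution strategy of Section 7.4 can be made concrete with a toy sketch (our illustration under an assumed program representation; none of the reviewed systems implements this): enumerate classical branch outcomes up to a coverage bound and flatten each path into a straight-line gate sequence that a conventional QASM 2.0-style verifier can check independently.

```python
# Toy sketch of bounded-path execution (illustrative only). A program is
# modeled as a list of ops: plain gates, or ("branch", gates_if_1,
# gates_if_0) conditioned on a classical bit. All branch-outcome
# assignments are enumerated up to a path bound, and each path is emitted
# as a straight-line gate sequence suitable for independent verification.
from itertools import product

def enumerate_paths(program, max_paths=64):
    """Yield (branch_outcomes, gate_sequence) for each classical path."""
    n_branches = sum(1 for op in program if op[0] == "branch")
    if 2 ** n_branches > max_paths:
        raise ValueError("path count exceeds coverage bound")
    for outcomes in product([0, 1], repeat=n_branches):
        gates, idx = [], 0
        for op in program:
            if op[0] == "branch":
                _, if_one, if_zero = op
                gates.extend(if_one if outcomes[idx] else if_zero)
                idx += 1
            else:
                gates.append(op)
        yield outcomes, gates

# Hypothetical OpenQASM-3-style fragment: H, mid-circuit measurement,
# then a classically controlled X vs. Z on a second qubit.
program = [("h", 0), ("measure", 0), ("branch", [("x", 1)], [("z", 1)])]
for outcomes, gates in enumerate_paths(program):
    print(outcomes, gates)  # one straight-line sequence per branch path
```

Each emitted path is straight-line, so per-path unitary comparison or statevector simulation applies unchanged; the exponential growth in paths is exactly why a coverage bound is needed.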
8 Conclusion

Generative AI for quantum circuits and code spans multiple model families unified by one central problem: enforcing semantic correctness under expensive verification. This review contributes a taxonomy grounded in artifact type × training regime, a three-layer evaluation framework (with Layer 3 decomposed into compilability and empirical execution sublayers) revealing that no reviewed generative model closes the loop to hardware execution, and a positioning against classical code generation that clarifies the unique challenges of the quantum setting. Future progress likely hinges on standardized evaluation protocols that separate syntax, semantics, and hardware realism; improved dataset provenance with attention to QASM version interoperability; transpilation-aware generation; and scalable verifier-in-the-loop methods that generalize beyond narrow problem families and small qubit counts.

References

[1] Andrew W. Cross, Lev S. Bishop, John A. Smolin, and Jay M. Gambetta. Open quantum assembly language. arXiv preprint arXiv:1707.03429, 2017.

[2] Andrew W. Cross, Ali Javadi-Abhari, Thomas Alexander, Niel de Beaudrap, Lev S. Bishop, Stefan Heidel, Colm A. Ryan, Prasahnt Sivarajah, John Smolin, Jay M. Gambetta, and Blake R. Johnson. OpenQASM 3: A broader and deeper quantum assembly language. ACM Transactions on Quantum Computing, 3(3):1–50, 2022.

[3] Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde de Oliveira Pinto, Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, et al. Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374, 2021.

[4] Yujia Li, David Choi, Junyoung Chung, Nate Kushman, Julian Schrittwieser, Rémi Leblond, Tom Eccles, James Keeling, Felix Gimeno, Agustin Dal Lago, et al. Competition-level code generation with AlphaCode. Science, 378(6624):1092–1097, 2022.
[5] Zhangyin Feng, Daya Guo, Duyu Tang, Nan Duan, Xiaocheng Feng, Ming Gong, Linjun Shou, Bing Qin, Ting Liu, Daxin Jiang, et al. CodeBERT: A pre-trained model for programming and natural languages. In Findings of EMNLP, 2020.

[6] Sanjay Vishwakarma, Francis Harkins, Siddharth Golecha, Vishal Sharathchandra Bajpe, Nicolas Dupuis, Luca Buratti, David Kremer, Ismael Faro, Ruchir Puri, and Juan Cruz-Benito. Qiskit HumanEval: An evaluation benchmark for quantum code generative models. arXiv preprint arXiv:2406.14712, 2024.

[7] Riccardo Rasconi and Angelo Oddi. An innovative genetic algorithm for the quantum circuit compilation problem. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 7707–7714, 2019.

[8] Lorenzo Moro, Matteo G. A. Paris, Marcello Restelli, and Enrico Prati. Quantum compiling by deep reinforcement learning. Communications Physics, 4(1):178, 2021.

[9] Xudong Lu, Kaisen Pan, Ge Yan, Jiaming Shan, Wenjie Wu, and Junchi Yan. QAS-Bench: Rethinking quantum architecture search and a benchmark. In Proceedings of the 40th International Conference on Machine Learning, pages 22880–22898. PMLR, 2023.

[10] Ang Li, Samuel Stein, Sriram Krishnamoorthy, and James Ang. QASMBench: A low-level quantum benchmark suite for NISQ evaluation and simulation. https://doi.org/10.1145/3550488, 2023. Accessed 2026-02-27.

[11] pnnl. QASMBench (GitHub repository). https://github.com/pnnl/QASMBench. Accessed 2026-02-27.

[12] Florian Fürrutter, Gorka Muñoz-Gil, and Hans J. Briegel. Quantum circuit synthesis with diffusion models. Nature Machine Intelligence, 6:512–524, 2024.

[13] Florian Fürrutter and collaborators. genQC (GitHub repository). https://github.com/FlorianFuerrutter/genQC. Accessed 2026-02-27.

[14] Boran Apak, Medina Bandic, Aritra Sarkar, and Sebastian Feld. KetGPT – dataset augmentation of quantum circuits using transformers.
In International Conference on Computational Science (ICCS). Springer, 2024.

[15] QML-Group. KetGPT (GitHub repository). https://github.com/QML-Group/KetGPT. Accessed 2026-02-27.

[16] Collin Beaudoin, Koustubh Phalak, and Swaroop Ghosh. AltGraph: Redesigning quantum circuits using generative graph models for efficient optimization. In Proceedings of the Great Lakes Symposium on VLSI (GLSVLSI), pages 44–49. ACM, 2024.

[17] merileijona. quantum-circuits-8k (Hugging Face dataset card). https://huggingface.co/datasets/merileijona/quantum-circuits-8k. Accessed 2026-02-27.

[18] Shlomo Kashani. QuantumLLMInstruct: A 500k LLM instruction-tuning dataset with problem-solution pairs for quantum computing. arXiv preprint arXiv:2412.20956, 2024.

[19] BoltzmannEntropy. QuantumLLMInstruct (Hugging Face dataset card). https://huggingface.co/datasets/BoltzmannEntropy/QuantumLLMInstruct. Accessed 2026-02-27.

[20] Rui Yang, Yue Gu, Zihao Wang, Ye Liang, Tongyang Li, et al. QCircuitBench: A large-scale dataset for benchmarking quantum algorithm design. In NeurIPS 2025 Datasets and Benchmarks Track, 2025.

[21] Qiskit. granite-3.2-8b-qiskit (Hugging Face model card). https://huggingface.co/Qiskit/granite-3.2-8b-qiskit. Accessed 2026-02-27.

[22] Qiskit. Qwen2.5-Coder-14B-Qiskit (Hugging Face model card). https://huggingface.co/Qiskit/Qwen2.5-Coder-14B-Qiskit. Accessed 2026-02-27.

[23] Nicolas Dupuis, Atin Tiwari, Youssef Mroueh, David Kremer, Ismael Faro, and Juan Cruz-Benito. Quantum verifiable rewards for post-training Qiskit code assistant. arXiv preprint arXiv:2508.20907, 2025.

[24] Zhiwei Chen and Hao Tang. UDiTQC: U-Net-style diffusion transformer for quantum circuit synthesis. arXiv preprint arXiv:2501.16380, 2025.

[25] Linus Jern, Valter Uotila, Cong Yu, and Bo Zhao. Agent-Q: Fine-tuning large language models for quantum circuit generation and optimization.
In 2025 IEEE International Conference on Quantum Computing and Engineering (QCE). IEEE, 2025.

[26] Daniel Barta, Darya Martyniuk, Johannes Jung, and Adrian Paschke. Leveraging diffusion models for parameterized quantum circuit generation. In 2025 IEEE International Conference on Quantum Computing and Engineering (QCE). IEEE, 2025.

[27] Cong Yu, Linus Jern, Valter Uotila, Bo Zhao, et al. QUASAR: Quantum assembly code generation using tool-augmented LLMs via agentic RL. arXiv preprint arXiv:2510.00967, 2025.

[28] Collin Beaudoin and Swaroop Ghosh. Q-Fusion: Diffusing quantum circuits. arXiv preprint arXiv:2504.20794, 2025.

[29] Florian Fürrutter, Zohim Chandani, Ikko Hamamura, Hans J. Briegel, and Gorka Muñoz-Gil. Synthesis of discrete-continuous quantum circuits with multimodal diffusion models. arXiv preprint arXiv:2506.01666, 2025.

[30] Zhenxiao Fu, Fan Chen, and Lei Jiang. QAgent: An LLM-based multi-agent system for autonomous OpenQASM programming. arXiv preprint arXiv:2508.20134, 2025.

[31] Benyucong. graph-data-quantum-rl (Hugging Face dataset card). https://huggingface.co/datasets/Benyucong/graph-data-quantum-rl. Accessed 2026-02-27.

[32] merileijona. quantumgpt-124m (Hugging Face model card). https://huggingface.co/merileijona/quantumgpt-124m. Accessed 2026-02-27.

[33] Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Mingchuan Zhang, Y. K. Li, Y. Wu, and Daya Guo. DeepSeekMath: Pushing the limits of mathematical reasoning in open language models. arXiv preprint arXiv:2402.03300, 2024.

[34] Benyucong. sft_quantum_circuit_gen_4B (Hugging Face model card). https://huggingface.co/Benyucong/sft_quantum_circuit_gen_4B. Accessed 2026-02-27.

[35] Benyucong. rl_quantum_4b (Hugging Face model card). https://huggingface.co/Benyucong/rl_quantum_4b. Accessed 2026-02-27.

[36] Floki00. qc_unitary_3qubit (Hugging Face model card). https://huggingface.co/Floki00/qc_unitary_3qubit.
Accessed 2026-02-27.

[37] Ali Javadi-Abhari, Matthew Treinish, Kevin Krsulich, Christopher J. Wood, Jake Lishman, Julien Gacon, Simon Martiel, Paul D. Nation, Lev S. Bishop, Andrew W. Cross, Blake R. Johnson, and Jay M. Gambetta. Quantum computing with Qiskit. arXiv preprint arXiv:2405.08810, 2024.

[38] Ed Younis, Costin Iancu, Wim Lavrijsen, Marc Davis, Ethan Smith, and USDOE. BQSKit: Berkeley Quantum Synthesis Toolkit. https://bqskit.lbl.gov, 2021. Lawrence Berkeley National Laboratory. OSTI:1785933.

[39] Gushu Li, Yufei Ding, and Yuan Xie. Tackling the qubit mapping problem for NISQ-era quantum devices. In Proceedings of the 24th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), pages 1001–1014, 2019.