Generative AI for Quantum Circuits and Quantum Code: A Technical Review and Taxonomy

Juhani Merilehto
University of Vaasa & University of Turku
merilehto@pm.me

Abstract

We review thirteen generative systems and five supporting datasets for quantum circuit and quantum code generation, identified through a structured scoping review of Hugging Face, arXiv, and provenance tracing (January–February 2026). We organize the field along two axes—artifact type (Qiskit code, OpenQASM programs, circuit graphs) crossed with training regime (supervised fine-tuning, verifier-in-the-loop RL, diffusion/graph generation, agentic optimization)—and systematically apply a three-layer evaluation framework covering syntactic validity, semantic correctness, and hardware executability. The central finding is that while all reviewed systems address syntax and most address semantics to some degree, none reports end-to-end evaluation on quantum hardware (Layer 3b), leaving a significant gap between generated circuits and practical deployment.

Scope note: "quantum code" refers throughout to quantum program artifacts (QASM, Qiskit); we do not cover generation of quantum error-correcting codes (QEC).

1 Introduction

Generative AI for quantum software has diversified from quantum-aware code assistants into multiple technical families that synthesize quantum artifacts at different abstraction levels. The important axis of differentiation across these systems is not "LLM vs. non-LLM," but how semantic correctness is defined and enforced: unit tests, fidelity proxies, objective-function scores, or entanglement proxies. This review imposes structure on this fragmented landscape.

Scope.
We focus on generative systems that output quantum artifacts intended to be executed or compiled: (i) quantum circuits as gate sequences or graphs; (ii) OpenQASM (2.0 and 3.0) programs; and (iii) Qiskit (Python) code that constructs circuits. We exclude systems where quantum circuits are internal components but outputs are non-circuit data ("quantum-enhanced" generative modelling).

We use the following terminology throughout:

• Syntactic validity: the output parses/compiles under the target grammar/toolchain.
• Semantic correctness: the generated artifact implements the intended unitary, algorithm, or task objective.
• Hardware executability: the artifact transpiles and runs under realistic device constraints (connectivity, gate set, noise) with acceptable resource usage.

OpenQASM 2.0 versus 3.0. Several reviewed systems target OpenQASM 2.0 [1] while others target OpenQASM 3.0 [2]. OpenQASM 2.0 is a straight-line gate-sequence language; 3.0 introduces classical control flow (for, while, if-else), typed variables, subroutine definitions, and timing instructions. For generative models, this distinction matters in three ways: (1) the grammar space is considerably larger, increasing the probability of syntactically invalid output; (2) semantic correctness becomes harder to verify because classical control flow creates path-dependent behaviour; and (3) generated circuits may exploit features (e.g., mid-circuit measurement and feed-forward) that current simulators and hardware support unevenly. Table 2 identifies which QASM version each system targets; systems operating on 2.0 and 3.0 are not directly comparable in generation difficulty or evaluation complexity.

Positioning against classical code generation. Classical code LLMs such as Codex [3], AlphaCode [4], and CodeBERT [5] generate programs evaluated primarily via unit tests and execution-based feedback.
Unit-test evaluation transfers directly to Qiskit code generation, as demonstrated by QiskitHumanEval [6]. However, quantum semantic equivalence checking—verifying that two circuits implement the same unitary—is fundamentally more expensive: statevector simulation requires O(2^n) memory for n qubits, and full unitary comparison costs O(4^n). No general polynomial-time equivalence checker is known for arbitrary unitaries at scale. Hardware executability as a first-class constraint—connectivity maps, native gate sets, and coherence-time budgets—has no classical analogue. These differences motivate the three-layer framework developed in §5.

2 Review Methodology

2.1 Search and Screening

We conducted a scoping review following a structured search protocol modelled on PRISMA-ScR reporting guidelines. All screening and inclusion decisions were performed by a single reviewer (the author); no independent second screening was conducted. This is acknowledged as a limitation in §7.5. Sources were assembled between January 1 and February 15, 2026 via three channels:

1. Model-hub search on Hugging Face: four keyword queries against model-card content—"QASM" (11 hits), "quantum circuit" (40 hits), "OpenQASM" (3 hits), "Qiskit generator" (0 hits)—yielding 35 unique base model cards after deduplication and collapsing quantized redistributions.

2. Paper search on arXiv (categories cs.AI, cs.LG, quant-ph; submissions ≤ 2026-02-15): five keyword queries yielding 193 unique papers after cross-query deduplication. The largest result set (185 papers for RL + quantum circuit) is dominated by quantum-enhanced RL works that do not generate circuits; these were excluded during screening.

3. Provenance follow-up via GitHub repositories, Hugging Face organization pages, and backward/forward citation tracing.
This channel recovered three systems (Granite-3.2-8b-Qiskit, Qwen2.5-14B-Qiskit, KetGPT) whose model cards did not match keyword queries.

Screening flow. The combined pool of 228 unique candidates (35 HF model cards + 193 arXiv papers) was screened in two stages: (i) title/abstract screening removed 190 candidates (172 quantum-enhanced RL/VQC papers + 18 non-generative HF model cards); (ii) full-text screening of the remaining 38 candidates removed 16 (11 for insufficient technical disclosure, 5 for producing outputs outside scope). After deduplication across channels, 13 generative systems and 5 datasets were retained.

Inclusion criteria. A system was included if public artifacts (paper, model card, or repository) jointly disclosed at least two of: (i) model architecture or parameter count, (ii) training data source and approximate scale, (iii) at least one quantitative evaluation metric. Systems with partial disclosure were included and annotated with "Unspecified in source" for missing fields. Fully closed systems with no public disclosure were excluded.

Treatment of partial disclosure. "Insufficient technical disclosure" was applied when a system's public artifacts did not meet the two-of-three criterion above and the missing information could not be inferred from the repository. Systems excluded for this reason are noted in footnotes rather than listed individually, as their omission reflects disclosure limitations rather than technical inadequacy.

3 Background and Timeline

The systems reviewed here belong to a broader continuum of automated quantum circuit construction. Before the current wave of generative-model-based approaches, the field developed substantial foundations in evolutionary and reinforcement-learning-based circuit synthesis.
Genetic algorithms have been applied to quantum circuit compilation since at least 2019 [7], and deep RL was demonstrated for quantum compiling by Moro et al. [8]. Multi-objective evolutionary architecture search [9] further matured the space. These pre-LLM methods typically operate by sequential gate placement guided by heuristic or learned value functions, and remain competitive for structured synthesis tasks. The generative-model wave reviewed here (2024–2026) differs primarily in its use of large pre-trained language or diffusion models and in its ambition to generalize across task families rather than optimize for a single target unitary.

Table 1 provides a chronological overview. The field has progressed from benchmark and dataset construction (2020–2024) through supervised generation models (2024–2026), with verifier-in-the-loop and agentic systems emerging prominently in 2025.

Year | System / Dataset | Type | Key Innovation
2020 | QASMBench [10, 11] | Benchmark | Curated low-level OpenQASM 2.0 benchmark suite for NISQ evaluation
2024 | genQC [12, 13] | Model | First diffusion model for quantum circuit synthesis; text-conditioned denoising over discrete gate tokens
2024 | KetGPT [14, 15] | Model | GPT-based transformer for generating realistic OpenQASM 2.0 circuits (dataset augmentation)
2024 | AltGraph [16] | Model | Generative graph models (D-VAE, DeepGMG) for circuit DAG rewriting and optimization
2024 | QiskitHumanEval [6] | Benchmark | Unit-test benchmark (101 tasks) for Qiskit code generation
2024 | quantum-circuits-8k [17] | Dataset | Synthetic text→QASM 2.0 pairs with paraphrase augmentation
2024 | QuantumLLMInstruct [18, 19] | Dataset | 500k+ claimed instruction-tuning pairs across 90+ quantum domains
2024 | QCircuitBench [20] | Benchmark | 120k+ algorithm-design instances with verification oracles spanning 25 algorithms
2025 | Granite-3.2-8b / Qwen2.5-14B Qiskit [21, 22, 23] | Model | Industrial Qiskit code LLMs with GRPO post-training using quantum verifiable rewards
2025 | UDiTQC [24] | Model | U-Net-style diffusion transformer; outperforms genQC on entanglement and compilation
2025 | Agent-Q (SFT) [25] | Model | SFT on 14k optimization circuits in OpenQASM 3.0
2025 | Barta et al. [26] | Model | Diffusion for parameterized quantum circuits; extends to continuous gate parameters
2025 | QUASAR (SFT+RL) [27] | Model | Agentic RL with hierarchical 4-level reward; tool-augmented LLM
2025 | Q-Fusion [28] | Model | LayerDAG-based diffusion over circuit DAGs; 100% syntactic validity in tested regimes
2025 | genQC v2 [29] | Model | Multimodal diffusion generating discrete structure and continuous parameters simultaneously
2025 | QAgent [30] | Model | Multi-agent LLM for autonomous OpenQASM programming; RAG + CoT + tool augmentation
2025 | graph-data-quantum-rl [31] | Dataset | 14.5k rows with prompts, graphs, Hamiltonians, OpenQASM 3.0 circuits
2026 | QuantumGPT-124M [32] | Model | Small specialist GPT-2 for OpenQASM 2.0; task-specific tiny LM feasibility

Table 1: Chronological milestones in generative AI for quantum circuits and code. Years reflect first public appearance (preprint, model card, or repository).

4 Taxonomy of Generative Systems

We organize reviewed systems along two axes: artifact type (Qiskit code vs. QASM vs. circuit graph) crossed with training regime (static SFT, verifier-in-the-loop RL, diffusion/graph generation, agentic optimization). This pair maximises separation among reviewed systems and aligns with the two practical questions a practitioner faces: "What do I want the system to output?" and "How is correctness enforced during training?" Alternative axis choices—qubit regime or verification cost—were considered but rejected as either degenerate (most systems operate in the small-qubit regime) or conflating distinct model designs that share a cost profile.
The six families are:

• Qiskit code assistants: general code LLMs adapted to Qiskit APIs, evaluated by executable unit tests.
• OpenQASM generators (static SFT): supervised fine-tuned LMs producing OpenQASM for specific domains.
• Specialist small LMs: small models (~100M parameters) trained on text→QASM instruction pairs.
• Verifier-in-the-loop alignment: RL/preference optimization with simulator-based rewards.
• Graph and diffusion generators: models operating on circuit DAGs or discrete tokenizations of gates and parameters.
• Agentic systems: multi-step generation with external tools (simulators, compilers) used for scoring and iterative improvement.

4.1 Qiskit Code Assistants

Granite-3.2-8b-Qiskit [21] and Qwen2.5-Coder-14B-Qiskit [22] are general-purpose code LLMs with extended pre-training on a curated Qiskit corpus (approximately 50M tokens of Qiskit v2.0 API code), fine-tuned with supervised instruction tuning. Evaluation uses QiskitHumanEval [6], a benchmark of 101 tasks where the metric is pass@k: the probability that at least one of k generated completions passes all unit tests. More recent work [23] adds GRPO (Group Relative Policy Optimization) [33] post-training with quantum verifiable rewards. GRPO eliminates the critic network by estimating advantages relative to the group mean of sampled completions, reducing memory overhead. In the quantum setting, the reward function checks both syntactic correctness and functional equivalence via Qiskit Aer simulation.

4.2 OpenQASM Generators and Specialist Small LMs

Agent-Q [25, 34] is a Qwen-based model fine-tuned on approximately 14,000 parameterized optimization circuits (QAOA, VQE, adaptive VQE) in OpenQASM 3.0. The released Hugging Face checkpoint is 4B parameters, though the paper does not clearly specify the base-model size.
Evaluation measures objective alignment: Jensen–Shannon divergence between the output distribution of the generated circuit and the ground-truth distribution, as well as expectation-value discrepancy under problem-specific cost Hamiltonians.

QuantumGPT-124M [32, 17] is a GPT-2-scale (124M-parameter) model trained on approximately 8,000 synthetic text→OpenQASM 2.0 pairs with paraphrase augmentation. It targets small circuits (≤ 5 qubits) and evaluates syntactic validity via parser checks and approximate task-type success via manual inspection.

4.3 Verifier-in-the-Loop Alignment

QUASAR [27, 35] extends Agent-Q's SFT foundation with agentic reinforcement learning using GRPO. The key innovation is a hierarchical four-level reward computed by an external quantum simulation tool: (1) a syntax reward for successful OpenQASM 3.0 parsing; (2) a distributional alignment term (Jensen–Shannon divergence); (3) an expectation-value alignment term comparing cost-Hamiltonian expectation values; and (4) an optimization usability term assessing whether the generated circuit converges efficiently under further classical parameter optimization. The model interacts with a quantum tool server via HTTP, receiving structured feedback at each RL step.

4.4 Graph and Diffusion Generators

genQC [12, 13, 36] employs a denoising diffusion model on discrete circuit tokens. Circuits are represented as 2D tensors (rows = qubits, columns = time steps, cells = gate identities). The reverse process uses a conditional U-Net with text conditioning via frozen OpenCLIP embeddings. Evaluation uses process fidelity (F = |Tr(U†_gen U_target)|² / d²) and compilation success rate (typically 3–5 qubits). Model size is not reported as a single count due to the U-Net + frozen CLIP architecture.
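The process-fidelity metric quoted above for genQC (and the PF entries in Table 2) is compact enough to state in executable form. The following is a minimal pure-Python sketch with no dependence on any reviewed codebase; the helper names and the single-qubit example matrices are illustrative only.

```python
# Minimal sketch of process fidelity F = |Tr(U_gen^† U_target)|^2 / d^2,
# using plain Python complex arithmetic (no external dependencies).
# Function and matrix names are illustrative, not from any reviewed system.

def dagger(u):
    """Conjugate transpose of a square matrix given as a list of lists."""
    n = len(u)
    return [[u[j][i].conjugate() for j in range(n)] for i in range(n)]

def matmul(a, b):
    """Naive square-matrix product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def process_fidelity(u_gen, u_target):
    """F = |Tr(U_gen^† U_target)|^2 / d^2; equals 1 iff the two unitaries
    agree up to a global phase."""
    d = len(u_gen)
    m = matmul(dagger(u_gen), u_target)
    trace = sum(m[i][i] for i in range(d))
    return abs(trace) ** 2 / d ** 2

I = [[1, 0], [0, 1]]
Z = [[1, 0], [0, -1]]  # differs from I only by a relative phase on |1>
H = [[2 ** -0.5, 2 ** -0.5], [2 ** -0.5, -(2 ** -0.5)]]

print(round(process_fidelity(H, H), 10))  # 1.0
print(round(process_fidelity(I, Z), 10))  # 0.0
```

The I-vs-Z case shows why fidelity-style metrics catch relative-phase errors that computational-basis statistics miss: both unitaries leave |0⟩ and |1⟩ populations unchanged, yet Tr(Z) = 0 gives F = 0.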
AltGraph [16] uses three generative graph models—D-VAE (GRU and GCN variants) and DeepGMG—to transform quantum circuit DAGs. The models learn a latent space from which perturbations produce functionally equivalent circuits with reduced depth and gate count. Evaluation measures density-matrix MSE (0.0074 average) and post-transpilation gate count and depth reduction (37.55% and 37.75%). Model sizes are unspecified in source.

Q-Fusion [28] adapts the LayerDAG diffusion framework to quantum circuit DAGs. It reports 100% syntactic validity in tested regimes (small random circuits), though semantic evaluation beyond validity is limited. Model size is unspecified in source.

UDiTQC [24] replaces genQC's U-Net backbone with a U-Net-style Diffusion Transformer (UDiT) combining multi-scale feature extraction with global self-attention. Evaluated on entanglement generation and unitary compilation (up to 8 qubits), it reports higher accuracy than genQC. The framework supports masked circuit editing and constrained generation. Model size is unspecified in source.

Barta et al. [26] extend diffusion to parameterized quantum circuits, generating both discrete gate structure and continuous rotation angles—addressing a limitation of earlier discrete-token diffusion models. Accepted at QCE 2025. Model size is unspecified in source.

genQC v2 [29] introduces a multimodal denoising diffusion model that simultaneously generates circuit structure and continuous parameters using two independent noise processes with a shared conditioning mechanism. Model size is unspecified in source; evaluation disclosure in the public preprint is limited.

4.5 Agentic Systems

QAgent [30] is a multi-agent LLM system for autonomous OpenQASM programming.
Given a natural language task description, it decomposes into sub-tasks dispatched to a Dynamic-few-shot Coder (in-context learning for regular circuits) and a Tools-augmented Coder (simulation tools for complex parameterized tasks). Both incorporate multi-round self-reflection with chain-of-thought reasoning and RAG. The system reports 71.6% improvement over baseline LLMs on OpenQASM generation. Unlike QUASAR, QAgent uses prompt engineering and tool augmentation over a frozen base LLM rather than fine-tuning or RL. Model size depends on the pluggable base LLM.

4.6 Dataset Augmentation Models

KetGPT [14, 15] uses a GPT-based transformer to generate synthetic OpenQASM 2.0 circuits, trained on algorithm-derived circuits from MQTBench. Its purpose is dataset augmentation rather than task-directed generation: a three-fold verification process (manual inspection, transformer-based real-vs-random classification, and structural analysis) validates that generated circuits resemble real algorithm-based circuits. Model size is unspecified in source.

4.7 Model Comparison

Table 2 summarizes the reviewed generative systems. The Syn., Sem., and HW columns encode evaluation coverage using compact labels rather than binary checkmarks, reflecting that semantic evaluation methods differ fundamentally across model families and are not directly interchangeable.

System | Family | Output | Size | Syn. | Sem. | HW | Evaluation Notes
QuantumGPT-124M [32, 17] | Spec. small LM | QASM 2.0 | 124M | ✓ | Lim | — | Parser validation; manual inspection on ≤ 5 qubit circuits (no oracle)
Granite-3.2-8b-Qiskit [21, 6] | Qiskit LLM | Qiskit (Py) | 8B | ✓ | UT | — | QiskitHumanEval unit tests (pass@k); coding benchmarks
Qwen2.5-14B-Qiskit [22, 6] | Qiskit LLM | Qiskit (Py) | 14.7B | ✓ | UT | — | Similar unit-test-driven evaluation
Agent-Q (SFT) [25, 34, 31] | Optim-LLM (SFT) | QASM 3.0 | Unspec.^b | ✓ | DA | — | Distribution and expectation-value alignment
QUASAR (SFT+RL) [27, 35, 31] | Verifier RL | QASM 3.0 | 4B | ✓ | DA | — | Hierarchical 4-level reward; pass@k on syntax + objective alignment
genQC [12, 13, 36] | Diffusion | Circuit tok. | Unspec.^a | ✓ | PF | — | Process fidelity; compilation metrics (3–5 qubits)
KetGPT [14, 15] | Trans. gen. | QASM 2.0 | Unspec.^a | ✓ | RP | — | Real-vs.-random classification + structural analysis (realism proxy, not task semantics)
AltGraph [16] | Graph rewr. | Circ. DAG | Unspec.^a | ✓ | PF | 3a | Density-matrix MSE; depth/gate reduction measured post-transpilation (L3a)
Q-Fusion [28] | Graph diff. | Circ. DAG | Unspec.^a | ✓ | Lim | — | Validity rate in tested regimes; limited semantic eval
UDiTQC [24] | Diff. transf. | Circuit tok. | Unspec.^a | ✓ | PF | — | Process fidelity on entanglement/compilation; outperforms genQC
Barta et al. [26] | Diff. (PQC) | Param. circ. | Unspec.^a | ✓ | Lim | — | Diffusion for parameterized circuits; QCE 2025
genQC v2 [29] | Diff. (multi) | Param. circ. | Unspec.^a | ✓ | Lim | — | Multimodal diffusion over discrete structure and continuous parameters; limited evaluation disclosure
QAgent [30] | Agentic LLM | OpenQASM | Base LLM | ✓ | DA | — | Multi-agent RAG+CoT; 71.6% improvement over baselines

Evaluation layers: Syn. = Syntactic validity (L1); Sem. = Semantic method (L2); HW = Hardware (L3). Semantic codes: UT = unit tests; PF = process fidelity / density-matrix distance; DA = distributional + expectation-value alignment; RP = realism proxy (structural similarity); Lim = limited or manual only. Hardware codes: 3a = post-transpilation resource metrics reported; — = no hardware-level evaluation.
^a Model size unspecified in source; architecture described qualitatively.
^b Paper text does not clearly specify the base-model size; the released Hugging Face implementation is 4B.

Table 2: Reviewed generative systems with artifact types, training regimes, and evaluation coverage.
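Several entries in Table 2 report pass@k. For reference, the standard unbiased estimator introduced with Codex [3] can be written in a few lines; this is a sketch, and the function name below is illustrative.

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: given n sampled completions of which c
    pass all unit tests, estimate P(at least one of k samples passes).
    pass@k = 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        return 1.0  # every size-k subset must contain a passing sample
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)

# With 2 samples, 1 of which passes, pass@1 averages over which sample is drawn:
print(pass_at_k(2, 1, 1))    # 0.5
print(pass_at_k(10, 0, 5))   # 0.0 (no passing samples)
print(pass_at_k(10, 3, 10))  # 1.0 (k = n and at least one sample passes)
```

Averaging this estimator over benchmark tasks gives the headline pass@k figure; it is deliberately computed from more samples than k to reduce variance.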
4.8 Supporting Datasets and Benchmarks

Table 3 summarizes key datasets and benchmarks that support the generative systems reviewed above. No single dataset currently addresses all three evaluation layers, and schema differences between OpenQASM 2.0 and 3.0 datasets remain a practical barrier to cross-system benchmarking. Benchmark suites such as QASMBench [10, 11] and QCircuitBench [20] are not generative models themselves but provide essential evaluation infrastructure: QASMBench is associated with execution fidelity measurements on real devices (IBM, IonQ, Rigetti), while QCircuitBench supplies 120,290 algorithm-design instances with automatic verification oracles spanning 25 algorithms in both OpenQASM 3.0 and Qiskit/Cirq formats.

Dataset | Primary use | Scale | Notes
quantum-circuits-8k [17] | Text→OpenQASM 2.0 SFT | ~8k | Synthetic with paraphrase augmentation; small-circuit emphasis
graph-data-quantum-rl [31] | Optimization-circuit generation and RL | 14.5k rows | Prompts, graphs, Hamiltonians, OpenQASM 3.0 circuits, solutions
QASMBench [11, 10] | OpenQASM 2.0 benchmark suite | diverse | Curated benchmark circuits and circuit-level metrics
QCircuitBench [20] | Algorithm design benchmarking | 120,290 | QASM 3.0 + code (Qiskit/Cirq) + oracles / verification functions
QuantumLLMInstruct [18, 19] | Broad quantum instruction data | 500k+ claimed | Paper/model card claim 500k+ instruction-tuning pairs across 90+ quantum domains; current public HF viewer exposes 5.15k rows

Table 3: Datasets and benchmarks supporting quantum circuit/code generation and evaluation.

5 Evaluation Framework

Across model families, evaluation decomposes into three layers:

1. Syntax: parsing, compilation, or import success. For graph- and DAG-based generators, "syntactic validity" means structural well-formedness (valid DAG topology, legal gate placements) rather than parser-valid program text.

2. Semantics: the method used to assess whether the generated artifact is correct. This varies fundamentally across families: unit tests for code generation; process fidelity or density-matrix distances for compilation; expectation-value and distribution alignment for optimization tasks; and realism proxies (e.g., real-vs-random classification) for dataset augmentation. These methods are not interchangeable, and a system evaluated by one method cannot be directly ranked against a system evaluated by another.

3. Hardware/resources, decomposed into two sublayers:

3a. Compilability and resource realism: transpilation to a target device's native gate set and connectivity succeeds; resulting circuit depth, SWAP count, and two-qubit gate overhead are acceptable.

3b. Empirical execution: the transpiled circuit is executed on a real QPU; measured output distributions are compared to ideal simulation using metrics such as Hellinger fidelity or total variation distance.

This sublayer distinction is diagnostic: a system may address 3a (AltGraph measures post-transpilation depth and gate counts) without addressing 3b (no system in the reviewed corpus reports QPU execution results as part of model evaluation).

Benchmark suites such as QiskitHumanEval [6] formalize unit-test evaluation for Qiskit code generation. Optimization-focused systems such as QUASAR emphasize simulator-driven objective metrics and pass@k variants over multiple correctness criteria [27, 35].

Systematic application. Table 2 applies this framework to all reviewed systems through the Sem. and HW columns. The pattern is clear: every reviewed system addresses Layer 1 (syntax), most address Layer 2 (semantics) to some degree, and none addresses Layer 3b (empirical hardware execution). Only AltGraph partially addresses Layer 3a. This observation is elaborated in §6.

Task-objective-to-evaluator mapping.
A practitioner selecting a semantic evaluator must match the task objective to an appropriate metric. Table 4 provides concrete guidance.

Metric gaming and composite evaluation. Any fixed evaluation metric is susceptible to gaming. Distribution-matching metrics can be satisfied by circuits that reproduce correct measurement statistics while implementing an incorrect unitary. Unit-test evaluation can be gamed by overfitting to test-case structure. Fidelity metrics are robust against such shortcuts but exponentially expensive at scale. These failure modes motivate composite evaluation protocols combining metrics from different paradigms, as well as adversarial test suites targeting common evaluator blind spots.

Task Objective | Recommended Evaluator | Known Failure Modes
Compilation to target unitary | Process fidelity, diamond norm proxy, or equivalence checking on a basis subset | Relative phase errors invisible to basis-restricted measurement; partial basis checking misses errors on untested inputs; ancilla garbage passes fidelity but fails full equivalence
Optimization ansatz generation | Energy expectation value + convergence speed + robustness under re-optimization | Low-energy ansatz may be a local minimum; convergence speed conflated with initial parameter sensitivity
Algorithm design tasks | Oracle-based functional checks (as in QCircuitBench [20]) | Oracle leakage if test structure correlates with training data; hard to verify beyond provided oracles
Code assistants (Qiskit) | Unit tests + execution traces (QiskitHumanEval [6]) | Tests check observable behaviour, not internal correctness; ancilla state and relative phase may be ignored
Dataset augmentation | Real-vs-random classification + structural analysis (KetGPT [14]) | Distribution matching without semantic grounding; generated circuits may be syntactically realistic but computationally trivial

Table 4: Mapping task objectives to recommended semantic evaluators and known failure modes.

6 Hardware Gap and Transpilation

Hardware evaluation as a field-wide gap. A substantive finding of this review is that none of the thirteen reviewed generative systems reports end-to-end hardware execution results (generation → transpile → execute → compare) as part of model evaluation. Layer 3b is absent from the generative model corpus. Benchmark suites such as QASMBench [10] are associated with hardware execution fidelity measurements, but QASMBench is evaluation infrastructure rather than a generative system. Among the generative systems, only AltGraph partially addresses Layer 3a by measuring post-transpilation depth and gate counts; no system closes the loop to Layer 3b.

The following protocol is proposed by this review as a direction for future evaluation practice; it is not established in the reviewed corpus. A hardware evaluation protocol for generative quantum circuits might include: (a) transpilation to a specific device's native gate set and connectivity (e.g., IBM Eagle 127-qubit heavy-hex), (b) execution with multiple shot counts, (c) comparison of measured distributions against ideal simulation using Hellinger fidelity or total variation distance, and (d) resource accounting (SWAP insertions, final circuit depth, execution time relative to T1/T2 coherence times).

Transpilation constraints. All generated quantum circuits must pass through transpilation before hardware execution. Transpilers such as the Qiskit transpiler [37], BQSKit [38], and SABRE [39] perform gate decomposition into native gate sets, qubit routing, and optimization passes. In the public artifacts reviewed here, none of the generative systems explicitly accounts for hardware connectivity constraints or native gate sets during generation.
Agent-Q, QUASAR, and QAgent generate circuits using abstract gate sets that assume all-to-all connectivity. genQC generates from a fixed gate pool that may not align with target hardware. AltGraph is closest to hardware awareness via post-transpilation metrics, but transpilation is applied after generation rather than constrained during generation. This means generative models currently solve a subset of the full circuit design problem: they produce logically correct circuits that may require substantial transpilation overhead. We propose that future systems incorporate transpilation constraints during generation—e.g., conditioning on device connectivity graphs or penalizing SWAP-heavy circuits during RL—to address a significant practical gap.

Minimal reporting baseline. As a recommendation from this review: even without transpilation-aware generation, a straightforward improvement would be to always report post-transpilation metrics under a standard reference backend, including at minimum: (i) SWAP overhead ratio, (ii) depth blow-up factor, and (iii) the target backend topology. These require only a single transpiler call and would enable cross-system comparison on a hardware-realism dimension currently absent from all reviewed evaluations.

7 Discussion

7.1 Evaluation Standardization

The primary bottleneck remains comparability: unit-test pass rates, distributional alignment, and fidelity proxies are not directly interchangeable. Benchmarks may be gamed if they measure proxies rather than the task objective [6, 27, 12].

As a concrete illustration, consider a 5-qubit circuit evaluated by both pass@k and process fidelity. A circuit implementing the correct computational-basis mapping (passing the unit test) may achieve F < 0.8 if it agrees on tested observable behaviour but differs in relative phases, ancilla state, or behaviour on untested inputs.
Conversely, a circuit with F = 0.99 may fail a unit test that checks a side-effect (e.g., qubit ordering convention) the fidelity metric ignores. These divergences reflect structural differences between evaluation paradigms that prevent cross-system ranking.

7.2 Data Provenance and Reproducibility

Schema mismatches impede dataset reuse. The quantum-circuits-8k dataset uses OpenQASM 2.0 syntax, while graph-data-quantum-rl uses OpenQASM 3.0 with typed variables and parameterized gates. A model trained on one format cannot be directly evaluated on the other without a translation layer, and automated QASM 2.0 → 3.0 conversion is not lossless [17].

7.3 Scaling and Verification Cost

Scaling beyond small-qubit compilation is constrained by classical verification cost. Statevector simulation requires O(2^n) memory; full unitary reconstruction costs O(4^n). For n = 50, statevector storage alone demands approximately 18 petabytes, and full unitary equivalence checking is doubly intractable. Tensor-network and stabilizer-rank methods offer partial relief for structured circuits but do not generalize to arbitrary unitaries. This simulation wall is a fundamental barrier to scaling verifier-in-the-loop training beyond the 30–50 qubit regime [12, 27, 25].

7.4 Future Evaluation Directions

The following strategies are proposed by this review as future evaluation directions; they are not established practice in the reviewed corpus. The path-dependent semantics of OpenQASM 3.0 create evaluation challenges beyond those of 2.0's straight-line circuits.
Three strategies merit consideration: (i) bounded-path execution: enumerate all classical branch paths up to a coverage bound and verify each path's unitary independently; (ii) trace-based unit testing: specify expected measurement and classical-variable traces for representative inputs; and (iii) symbolic execution: propagate symbolic states through classical branches to derive path conditions and verify equivalence on each feasible path. None of the reviewed systems currently employs these strategies.

7.5 Threats to Validity

Several limitations should be considered when interpreting this review:

• Single-reviewer process. All screening, inclusion, and coding decisions were made by one author. No inter-rater reliability measure was computed. While appropriate for a scoping review of a small, emerging corpus, this introduces the possibility of systematic screening bias.

• Corpus dependence on public disclosure. The review is limited to systems with publicly available papers, model cards, or repositories. Closed-source industrial systems are excluded by design, which may omit significant work.

• Provenance-based inclusion. Three systems were identified through organization pages and citation tracing rather than keyword search. This reflects the limitations of keyword discovery in a fast-moving field but introduces discretionary inclusion that is not fully reproducible from the keyword protocol alone.

• Mixed evidence quality. Reviewed systems range from peer-reviewed publications to model cards with minimal documentation. Evaluation claims are taken at face value where replication was not feasible.

• Non-comparable metrics. The heterogeneity of evaluation methods across families means that cross-system ranking is not possible from the evidence base alone, despite the tabular presentation.
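Before concluding, the bounded-path execution strategy of Section 7.4 can be made concrete with a toy sketch (our illustration under an assumed program representation; none of the reviewed systems implements this): enumerate classical branch outcomes up to a coverage bound and flatten each path into a straight-line gate sequence that a conventional QASM 2.0-style verifier can check independently.

```python
# Toy sketch of bounded-path execution (illustrative only). A program is
# modeled as a list of ops: plain gates, or ("branch", gates_if_1,
# gates_if_0) conditioned on a classical bit. All branch-outcome
# assignments are enumerated up to a path bound, and each path is emitted
# as a straight-line gate sequence suitable for independent verification.
from itertools import product

def enumerate_paths(program, max_paths=64):
    """Yield (branch_outcomes, gate_sequence) for each classical path."""
    n_branches = sum(1 for op in program if op[0] == "branch")
    if 2 ** n_branches > max_paths:
        raise ValueError("path count exceeds coverage bound")
    for outcomes in product([0, 1], repeat=n_branches):
        gates, idx = [], 0
        for op in program:
            if op[0] == "branch":
                _, if_one, if_zero = op
                gates.extend(if_one if outcomes[idx] else if_zero)
                idx += 1
            else:
                gates.append(op)
        yield outcomes, gates

# Hypothetical OpenQASM-3-style fragment: H, mid-circuit measurement,
# then a classically controlled X vs. Z on a second qubit.
program = [("h", 0), ("measure", 0), ("branch", [("x", 1)], [("z", 1)])]
for outcomes, gates in enumerate_paths(program):
    print(outcomes, gates)  # one straight-line sequence per branch path
```

Each emitted path is straight-line, so per-path unitary comparison or statevector simulation applies unchanged; the exponential growth in paths is exactly why a coverage bound is needed.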
8 Conclusion

Generative AI for quantum circuits and code spans multiple model families unified by one central problem: enforcing semantic correctness under expensive verification. This review contributes a taxonomy grounded in artifact type × training regime, a three-layer evaluation framework (with Layer 3 decomposed into compilability and empirical execution sublayers) revealing that no reviewed generative model closes the loop to hardware execution, and a positioning against classical code generation that clarifies the unique challenges of the quantum setting. Future progress likely hinges on standardized evaluation protocols that separate syntax, semantics, and hardware realism; improved dataset provenance with attention to QASM version interoperability; transpilation-aware generation; and scalable verifier-in-the-loop methods that generalize beyond narrow problem families and small qubit counts.

References

[1] Andrew W. Cross, Lev S. Bishop, John A. Smolin, and Jay M. Gambetta. Open quantum assembly language. arXiv preprint arXiv:1707.03429, 2017.

[2] Andrew W. Cross, Ali Javadi-Abhari, Thomas Alexander, Niel de Beaudrap, Lev S. Bishop, Stefan Heidel, Colm A. Ryan, Prasahnt Sivarajah, John Smolin, Jay M. Gambetta, and Blake R. Johnson. OpenQASM 3: A broader and deeper quantum assembly language. ACM Transactions on Quantum Computing, 3(3):1–50, 2022.

[3] Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde de Oliveira Pinto, Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, et al. Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374, 2021.

[4] Yujia Li, David Choi, Junyoung Chung, Nate Kushman, Julian Schrittwieser, Rémi Leblond, Tom Eccles, James Keeling, Felix Gimeno, Agustin Dal Lago, et al. Competition-level code generation with AlphaCode. Science, 378(6624):1092–1097, 2022.
[5] Zhangyin Feng, Daya Guo, Duyu Tang, Nan Duan, Xiaocheng Feng, Ming Gong, Linjun Shou, Bing Qin, Ting Liu, Daxin Jiang, et al. CodeBERT: A pre-trained model for programming and natural languages. In Findings of EMNLP, 2020.

[6] Sanjay Vishwakarma, Francis Harkins, Siddharth Golecha, Vishal Sharathchandra Bajpe, Nicolas Dupuis, Luca Buratti, David Kremer, Ismael Faro, Ruchir Puri, and Juan Cruz-Benito. Qiskit HumanEval: An evaluation benchmark for quantum code generative models. arXiv preprint arXiv:2406.14712, 2024.

[7] Riccardo Rasconi and Angelo Oddi. An innovative genetic algorithm for the quantum circuit compilation problem. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 7707–7714, 2019.

[8] Lorenzo Moro, Matteo G. A. Paris, Marcello Restelli, and Enrico Prati. Quantum compiling by deep reinforcement learning. Communications Physics, 4(1):178, 2021.

[9] Xudong Lu, Kaisen Pan, Ge Yan, Jiaming Shan, Wenjie Wu, and Junchi Yan. QAS-Bench: Rethinking quantum architecture search and a benchmark. In Proceedings of the 40th International Conference on Machine Learning, pages 22880–22898. PMLR, 2023.

[10] Ang Li, Samuel Stein, Sriram Krishnamoorthy, and James Ang. QASMBench: A low-level quantum benchmark suite for NISQ evaluation and simulation. https://doi.org/10.1145/3550488, 2023. Accessed 2026-02-27.

[11] pnnl. QASMBench (GitHub repository). https://github.com/pnnl/QASMBench. Accessed 2026-02-27.

[12] Florian Fürrutter, Gorka Muñoz-Gil, and Hans J. Briegel. Quantum circuit synthesis with diffusion models. Nature Machine Intelligence, 6:512–524, 2024.

[13] Florian Fürrutter and collaborators. genQC (GitHub repository). https://github.com/FlorianFuerrutter/genQC. Accessed 2026-02-27.

[14] Boran Apak, Medina Bandic, Aritra Sarkar, and Sebastian Feld. KetGPT – dataset augmentation of quantum circuits using transformers.
In International Conference on Computational Science (ICCS). Springer, 2024.

[15] QML-Group. KetGPT (GitHub repository). https://github.com/QML-Group/KetGPT. Accessed 2026-02-27.

[16] Collin Beaudoin, Koustubh Phalak, and Swaroop Ghosh. AltGraph: Redesigning quantum circuits using generative graph models for efficient optimization. In Proceedings of the Great Lakes Symposium on VLSI (GLSVLSI), pages 44–49. ACM, 2024.

[17] merileijona. quantum-circuits-8k (Hugging Face dataset card). https://huggingface.co/datasets/merileijona/quantum-circuits-8k. Accessed 2026-02-27.

[18] Shlomo Kashani. QuantumLLMInstruct: A 500k LLM instruction-tuning dataset with problem-solution pairs for quantum computing. arXiv preprint arXiv:2412.20956, 2024.

[19] BoltzmannEntropy. QuantumLLMInstruct (Hugging Face dataset card). https://huggingface.co/datasets/BoltzmannEntropy/QuantumLLMInstruct. Accessed 2026-02-27.

[20] Rui Yang, Yue Gu, Zihao Wang, Ye Liang, Tongyang Li, et al. QCircuitBench: A large-scale dataset for benchmarking quantum algorithm design. In NeurIPS 2025 Datasets and Benchmarks Track, 2025.

[21] Qiskit. granite-3.2-8b-qiskit (Hugging Face model card). https://huggingface.co/Qiskit/granite-3.2-8b-qiskit. Accessed 2026-02-27.

[22] Qiskit. Qwen2.5-Coder-14B-Qiskit (Hugging Face model card). https://huggingface.co/Qiskit/Qwen2.5-Coder-14B-Qiskit. Accessed 2026-02-27.

[23] Nicolas Dupuis, Atin Tiwari, Youssef Mroueh, David Kremer, Ismael Faro, and Juan Cruz-Benito. Quantum verifiable rewards for post-training Qiskit code assistant. arXiv preprint arXiv:2508.20907, 2025.

[24] Zhiwei Chen and Hao Tang. UDiTQC: U-Net-style diffusion transformer for quantum circuit synthesis. arXiv preprint arXiv:2501.16380, 2025.

[25] Linus Jern, Valter Uotila, Cong Yu, and Bo Zhao. Agent-Q: Fine-tuning large language models for quantum circuit generation and optimization.
In 2025 IEEE International Conference on Quantum Computing and Engineering (QCE). IEEE, 2025.

[26] Daniel Barta, Darya Martyniuk, Johannes Jung, and Adrian Paschke. Leveraging diffusion models for parameterized quantum circuit generation. In 2025 IEEE International Conference on Quantum Computing and Engineering (QCE). IEEE, 2025.

[27] Cong Yu, Linus Jern, Valter Uotila, Bo Zhao, et al. QUASAR: Quantum assembly code generation using tool-augmented LLMs via agentic RL. arXiv preprint arXiv:2510.00967, 2025.

[28] Collin Beaudoin and Swaroop Ghosh. Q-Fusion: Diffusing quantum circuits. arXiv preprint arXiv:2504.20794, 2025.

[29] Florian Fürrutter, Zohim Chandani, Ikko Hamamura, Hans J. Briegel, and Gorka Muñoz-Gil. Synthesis of discrete-continuous quantum circuits with multimodal diffusion models. arXiv preprint arXiv:2506.01666, 2025.

[30] Zhenxiao Fu, Fan Chen, and Lei Jiang. QAgent: An LLM-based multi-agent system for autonomous OpenQASM programming. arXiv preprint arXiv:2508.20134, 2025.

[31] Benyucong. graph-data-quantum-rl (Hugging Face dataset card). https://huggingface.co/datasets/Benyucong/graph-data-quantum-rl. Accessed 2026-02-27.

[32] merileijona. quantumgpt-124m (Hugging Face model card). https://huggingface.co/merileijona/quantumgpt-124m. Accessed 2026-02-27.

[33] Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Mingchuan Zhang, Y. K. Li, Y. Wu, and Daya Guo. DeepSeekMath: Pushing the limits of mathematical reasoning in open language models. arXiv preprint arXiv:2402.03300, 2024.

[34] Benyucong. sft_quantum_circuit_gen_4B (Hugging Face model card). https://huggingface.co/Benyucong/sft_quantum_circuit_gen_4B. Accessed 2026-02-27.

[35] Benyucong. rl_quantum_4b (Hugging Face model card). https://huggingface.co/Benyucong/rl_quantum_4b. Accessed 2026-02-27.

[36] Floki00. qc_unitary_3qubit (Hugging Face model card). https://huggingface.co/Floki00/qc_unitary_3qubit.
Accessed 2026-02-27.

[37] Ali Javadi-Abhari, Matthew Treinish, Kevin Krsulich, Christopher J. Wood, Jake Lishman, Julien Gacon, Simon Martiel, Paul D. Nation, Lev S. Bishop, Andrew W. Cross, Blake R. Johnson, and Jay M. Gambetta. Quantum computing with Qiskit. arXiv preprint arXiv:2405.08810, 2024.

[38] Ed Younis, Costin Iancu, Wim Lavrijsen, Marc Davis, Ethan Smith, and USDOE. BQSKit: Berkeley Quantum Synthesis Toolkit. https://bqskit.lbl.gov, 2021. Lawrence Berkeley National Laboratory. OSTI:1785933.

[39] Gushu Li, Yufei Ding, and Yuan Xie. Tackling the qubit mapping problem for NISQ-era quantum devices. In Proceedings of the 24th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), pages 1001–1014, 2019.