QEMI: A Quantum Software Stacks Testing Framework via Equivalence Modulo Inputs
As quantum algorithms and hardware continue to evolve, ensuring the correctness of the quantum software stack (QSS) has become increasingly important. However, testing QSSes remains challenging due to the oracle problem, i.e., the lack of a reliable ground truth for expected program behavior. Existing metamorphic testing approaches often rely on equivalent circuit transformations, backend modifications, or parameter tuning to address this issue. In this work, inspired by Equivalence Modulo Inputs (EMI), we propose Quantum EMI (QEMI), a new testing approach for QSSes. Our key contributions include: (1) a random quantum program generator that produces code with dead code based on quantum control-flow structures, and (2) an adaptation of the EMI technique from classical compiler testing to generate variants by removing dead code. By comparing the behavior of these variants, we can detect potential bugs in QSS implementations. We applied QEMI to Qiskit, Q#, and Cirq, and successfully identified 11 crash bugs and 1 behavioral inconsistency. QEMI expands the limited set of testing techniques available for quantum software stacks by going beyond structural transformations and incorporating semantics-preserving ones into quantum program analysis.
💡 Research Summary
The paper introduces QEMI, a novel testing framework for quantum software stacks (QSS) that adapts the classical Equivalence Modulo Inputs (EMI) technique to the quantum domain. The authors identify two fundamental obstacles in quantum stack testing: a shortage of realistic test programs and the oracle problem, i.e., the absence of a reliable ground truth for program behavior. Existing metamorphic testing tools (e.g., QuteFuzz, QDiff, MorphQ) largely rely on circuit‑level transformations or parameter tweaks, which do not fully exercise the high‑level APIs and language semantics of QSSes.
QEMI’s core idea is to generate random quantum programs that deliberately contain “dead code” – code that is guaranteed never to affect the observable output for a given initial quantum state. The dead code is inserted into quantum control‑flow constructs such as conditional branches, while loops, and switch‑case statements. By carefully preparing the quantum registers (e.g., setting a control qubit to |0⟩ when a branch executes only on |1⟩) the framework ensures that the inserted blocks are semantically irrelevant for the chosen input state.
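The dead-code idea can be illustrated with a toy two-qubit state-vector simulation. This is a minimal sketch and not QEMI's generator: the tuple-based program format, the helper names (`apply_h`, `apply_cx`, `run`, `same_state`), and the use of a controlled gate as the "guarded block" are all assumptions made for illustration. It shows that a block guarded by a control qubit is dead only for inputs where that qubit is |0⟩.

```python
import math

# Toy 2-qubit state vector: index bit 0 is the control qubit, bit 1 the target.
# All names below are hypothetical and chosen for this illustration only.

def apply_h(state, q):
    """Apply a Hadamard gate to qubit q."""
    s = 1 / math.sqrt(2)
    out = [0j] * 4
    for i, amp in enumerate(state):
        partner = i ^ (1 << q)
        sign = -1 if (i >> q) & 1 else 1
        out[i] += sign * s * amp      # H|0> keeps +|0>, H|1> keeps -|1>
        out[partner] += s * amp
    return out

def apply_cx(state, control, target):
    """Flip the target qubit on basis states where the control qubit is 1."""
    out = list(state)
    for i in range(4):
        if (i >> control) & 1:
            out[i ^ (1 << target)] = state[i]
    return out

GATES = {"h": apply_h, "cx": apply_cx}

def run(program, state):
    """Apply each (gate, *qubits) instruction in order."""
    for name, *qubits in program:
        state = GATES[name](state, *qubits)
    return state

def same_state(a, b, tol=1e-12):
    return all(abs(x - y) < tol for x, y in zip(a, b))

# The guarded block cx(0, 1) only acts when qubit 0 is |1>.
ORIGINAL = [("cx", 0, 1), ("h", 1)]
VARIANT = [("h", 1)]                  # same program with the guarded block removed

CONTROL_ZERO = [1, 0, 0, 0]           # qubit 0 prepared in |0>: the block is dead
CONTROL_ONE = [0, 1, 0, 0]            # qubit 0 prepared in |1>: the block is live
```

With `CONTROL_ZERO` as input, `run(ORIGINAL, ...)` and `run(VARIANT, ...)` yield the same state, so any observable difference between the two programs on a real stack would point at the stack rather than the program. With `CONTROL_ONE` the two states differ, which is why the equivalence holds only modulo the chosen input.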
The workflow consists of three stages:
- Random Program Generation – A seed program is built using the APIs of Qiskit, Q#, and Cirq. The generator mixes classical and quantum control flow, gate operations, and library calls, thereby covering a broad portion of the stack’s surface area.
- EMI Variant Creation – A static analysis pass identifies the pre‑defined dead‑code patterns (e.g., if_test_dead, while_dead, switch_dead) and removes them, producing an EMI‑equivalent variant. Because the removal is based on static patterns rather than dynamic execution, the approach sidesteps the probabilistic nature of quantum measurement that would otherwise make dead‑code detection unreliable.
- Execution and Behaviour Checking – Both the original program and its EMI variant are run on the same backend with identical configuration (shots, random seed, etc.). The resulting measurement distributions are compared using statistical tests such as chi‑square, KL‑divergence, or Kolmogorov‑Smirnov. A statistically significant divergence indicates a bug in the stack (crash, incorrect state handling, or distributional error).
To reduce the cost of repeated quantum executions, QEMI incorporates an early‑stop strategy: after each batch of shots the statistical distance is recomputed, and once a predefined confidence threshold is reached the experiment terminates. This yields an average speed‑up of about 53 % compared with a fixed‑shot baseline.
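One way such an early-stop loop could look is sketched below. This is an assumption-laden illustration, not the paper's algorithm: the function `run_with_early_stop`, its parameters, the batch size, and the choice of a chi-square statistic with a fixed threshold as the stopping rule are all hypothetical. The samplers stand in for one shot of the original program and one shot of its variant.

```python
def chi_square_equal_shots(counts_a, counts_b):
    """Two-sample chi-square statistic, assuming both samples have the same size."""
    stat = 0.0
    for outcome in set(counts_a) | set(counts_b):
        a, b = counts_a.get(outcome, 0), counts_b.get(outcome, 0)
        if a + b > 0:
            stat += (a - b) ** 2 / (a + b)
    return stat

def run_with_early_stop(sample_a, sample_b, batch=100, max_shots=1000, threshold=3.841):
    """Draw shots in batches and stop once the statistic exceeds `threshold`.

    sample_a / sample_b: zero-argument callables returning one measurement
    outcome each (hypothetical stand-ins for executing a program once).
    Returns (diverged, shots_used_per_program).
    """
    counts_a, counts_b = {}, {}
    shots = 0
    while shots < max_shots:
        for _ in range(batch):
            a, b = sample_a(), sample_b()
            counts_a[a] = counts_a.get(a, 0) + 1
            counts_b[b] = counts_b.get(b, 0) + 1
        shots += batch
        # Recompute the distance after every batch, as described above.
        if chi_square_equal_shots(counts_a, counts_b) > threshold:
            return True, shots
    return False, shots
```

A caveat on this design: checking the statistic after every batch against a fixed threshold raises the false-positive rate compared with a single fixed-shot test, so a practical stopping rule would need a stricter or batch-adjusted threshold. The default of 3.841 corresponds to the 5% level with one degree of freedom and is only a placeholder.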
The authors evaluated QEMI on three widely used QSSes—IBM’s Qiskit, Microsoft’s Q#, and Google’s Cirq. Across multiple versions they discovered twelve distinct bugs: eleven crashes (e.g., null‑reference exceptions, improper resource deallocation) and one subtle behavioral inconsistency where the output distribution deviated from the expected theoretical distribution. Notably, several of these bugs were not detected by prior tools, demonstrating that semantics‑preserving dead‑code removal uncovers a complementary class of defects.
The paper also discusses threats to validity. The current implementation only supports a limited set of dead‑code patterns; new language constructs would require manual pattern extensions. Because quantum programs are inherently probabilistic, extremely low‑probability paths could occasionally activate “dead” code, leading to false positives. Moreover, differences between simulators and real quantum hardware (noise, decoherence) may affect the statistical comparison.
In related work, the authors position QEMI alongside existing quantum testing approaches. While tools like QuteFuzz focus on differential testing of randomly generated circuits, and MorphQ leverages circuit equivalence transformations, QEMI is the first to bring program‑level semantic equivalence (via EMI) into quantum testing. This expands the toolbox for QSS verification and opens avenues for future research, such as automated discovery of dead‑code patterns, integration of dynamic probabilistic models, and extension to fault‑injection or error‑mitigation scenarios.
In conclusion, QEMI demonstrates that EMI can be successfully adapted to quantum software stacks, providing a practical, statistically sound method for uncovering both crash‑type and subtle distributional bugs. The framework’s early‑stop optimization makes it efficient enough for regular regression testing, and its open‑source implementation (available on GitHub) invites the community to extend and refine the approach.