Benchmarking Quantum Computers: Towards a Standard Performance Evaluation Approach
The technological development of increasingly larger quantum processors on different quantum platforms raises the problem of how to compare their performance fairly, a problem known as quantum benchmarking. Computer scientists have already faced this challenge when comparing classical processors, developing various mathematical tools to address it but also identifying the limits of the problem. In this work, we briefly review the most important aspects of both classical processor benchmarks and the metrics comprising them, providing precise definitions and analyzing the quality attributes they should exhibit. Subsequently, we analyze the intrinsic properties that characterize the quantum computing paradigm and hinder the naive transfer of strategies from classical benchmarking. Nevertheless, we can still leverage some of the lessons learned, such as the quality attributes of a "good" benchmark. Additionally, we review some of the most important metrics and benchmarks for quantum processors proposed in the literature, assessing which quality attributes they fulfill. Finally, we propose general guidelines for quantum benchmarking. These guidelines aim to pave the way for a roadmap towards standardizing the performance evaluation of quantum devices, ultimately leading to the creation of an organization akin to the Standard Performance Evaluation Corporation (SPEC).
💡 Research Summary
The paper tackles the pressing need for a systematic way to compare the performance of quantum processors, drawing lessons from the long‑standing field of classical computer benchmarking. It begins with a concise history of classical benchmarks, highlighting how organizations such as SPEC and TPC established a set of quality attributes—relevance, reproducibility, fairness, verifiability, and usability—that define a “good” benchmark. These attributes ensure that a benchmark faithfully reflects real‑world workloads, yields consistent results, avoids bias toward particular architectures, can be independently checked, and is practical to run.
The authors then turn to quantum computing and enumerate the intrinsic properties that make a naïve transfer of classical benchmarking strategies impossible. Quantum devices are fundamentally probabilistic, suffer from decoherence and gate errors, have hardware‑specific gate sets, and are limited in qubit count. Consequently, any benchmark must explicitly model noise, provide statistical confidence intervals, and remain meaningful as the number of qubits scales. Moreover, quantum workloads need to capture circuit depth, entanglement, and other uniquely quantum resources.
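For example, because every benchmark score is estimated from a finite number of shots, a report should attach a confidence interval to it. A minimal sketch in Python of one standard choice, the Wilson score interval (the function name and the numbers are illustrative assumptions, not from the paper):

```python
import math

def wilson_interval(successes: int, shots: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    Gives an approximate 95% confidence interval (z = 1.96) on the true
    success probability underlying `successes` out of `shots` circuit runs.
    """
    if shots == 0:
        raise ValueError("need at least one shot")
    p_hat = successes / shots
    denom = 1 + z**2 / shots
    center = (p_hat + z**2 / (2 * shots)) / denom
    half_width = (z / denom) * math.sqrt(
        p_hat * (1 - p_hat) / shots + z**2 / (4 * shots**2)
    )
    return center - half_width, center + half_width

# Illustrative numbers: 812 correct outcomes in 1024 shots of a benchmark circuit.
low, high = wilson_interval(812, 1024)
print(f"success probability: {812 / 1024:.3f}, 95% CI [{low:.3f}, {high:.3f}]")
```

Reporting the interval rather than a bare point estimate makes results from devices with different shot budgets comparable, which is exactly the reproducibility concern the summary raises.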
A survey of existing quantum performance metrics follows. The authors examine Quantum Volume (QV), random circuit sampling with cross‑entropy benchmarking (XEB), CLOPS (Circuit Layer Operations Per Second), and various error‑mitigation benchmarks. Each is evaluated against the five quality attributes. QV scores well on relevance and scalability but struggles with reproducibility under varying noise conditions and with fairness across different gate sets. XEB offers excellent verifiability and explicit noise modeling, yet its high computational cost and complex post‑processing hurt usability. CLOPS is easy to use and fairly hardware‑agnostic, but it does not directly reflect algorithmic usefulness, limiting relevance. Error‑mitigation benchmarks improve verifiability and reproducibility but add implementation complexity that reduces usability.
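For concreteness, both the XEB and QV metrics reduce to simple estimators once the ideal output distribution of a benchmark circuit has been computed classically. A minimal sketch in Python (the function names and the assumption that `ideal_probs` comes from classical simulation are ours, not the paper's):

```python
import numpy as np

def linear_xeb_fidelity(ideal_probs: dict[str, float],
                        samples: list[str],
                        n_qubits: int) -> float:
    """Linear cross-entropy benchmarking estimator: F = 2^n * <p_ideal(x_i)> - 1.

    The average runs over the bitstrings actually measured on the device;
    ideal_probs holds the classically simulated output distribution of the
    circuit. F is near 1 for a noiseless device and near 0 for fully
    depolarized (uniformly random) output.
    """
    mean_p = float(np.mean([ideal_probs.get(x, 0.0) for x in samples]))
    return (2 ** n_qubits) * mean_p - 1.0

def heavy_output_fraction(ideal_probs: dict[str, float],
                          samples: list[str]) -> float:
    """Heavy-output fraction used in the Quantum Volume protocol.

    'Heavy' outputs are bitstrings whose ideal probability exceeds the
    median of the ideal distribution (ideal_probs must cover all 2^n
    bitstrings); the QV test requires this fraction to exceed 2/3 with
    statistical confidence.
    """
    median = float(np.median(list(ideal_probs.values())))
    heavy = {x for x, p in ideal_probs.items() if p > median}
    return sum(x in heavy for x in samples) / len(samples)
```

The classical-simulation step behind `ideal_probs` is what makes XEB expensive at scale, which is the usability cost noted above.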
Based on this analysis, the paper proposes a set of concrete guidelines for designing quantum benchmarks (a sketch of how such a specification might look in code follows the list):

1. Clear objective definition – explicitly state which performance aspect (speed, accuracy, energy, etc.) is being measured and map it to a concrete metric.
2. Hardware‑agnostic workload design – define a canonical circuit or task that can be translated to any gate set, with an optional noise‑profile layer.
3. Standardized initialization and measurement protocols – fix state preparation, measurement basis, and sample size to guarantee reproducibility.
4. Transparent verification procedures – publish reference data, independent verification tools, and detailed reporting formats.
5. Usability focus – provide open‑source implementations, automation scripts, and cloud‑based execution environments to lower the entry barrier.
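As referenced above, guidelines (1)–(3) could be captured in a single declarative, hardware‑agnostic descriptor that any backend consumes. A hypothetical sketch in Python; every field name and default here is an illustrative assumption, not taken from the paper:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BenchmarkSpec:
    """Hypothetical declarative benchmark descriptor.

    Pins down the objective, the canonical workload, and the execution
    protocol so that any backend running the spec produces comparable,
    reproducible results.
    """
    name: str                          # e.g. "ghz-fidelity-5q"
    objective: str                     # measured aspect: "accuracy", "speed", ...
    metric: str                        # concrete metric mapped to the objective
    workload_qasm: str                 # canonical circuit, transpiled per backend
    n_qubits: int
    shots: int = 4096                  # fixed sample size (guideline 3)
    initial_state: str = "|0...0>"     # standardized state preparation
    measurement_basis: str = "Z"       # standardized measurement protocol
    noise_profile: str | None = None   # optional noise-model layer (guideline 2)

spec = BenchmarkSpec(
    name="ghz-fidelity-5q",
    objective="accuracy",
    metric="state fidelity",
    workload_qasm="OPENQASM 3.0; ...",  # canonical circuit body elided here
    n_qubits=5,
)
```

Freezing the dataclass mirrors the standardization intent: once published, a benchmark definition should be immutable so that results collected against it remain comparable over time.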
Finally, the authors advocate for the creation of an international standardization body, the Standard Performance Evaluation for Quantum Computers (SPEQC), modeled after SPEC. SPEQC would be responsible for maintaining a curated suite of quantum benchmarks, issuing certifications, managing a public repository of benchmark results, and fostering collaboration among academia, industry, and government. By establishing such an organization, the quantum computing community can avoid the pitfalls of ad‑hoc, vendor‑specific “benchmark wars,” ensure that performance claims are comparable and trustworthy, and ultimately accelerate the maturation of quantum technologies.