Asymptotically optimal approximation of single qubit unitaries by Clifford and T circuits using a constant number of ancillary qubits

We present an algorithm for building a circuit that approximates single qubit unitaries with precision ε using O(log(1/ε)) Clifford and T gates and employing up to two ancillary qubits. The algorithm for computing our approximating circuit requires an average of O(log²(1/ε)·log log(1/ε)) operations. We prove that the number of gates in our circuit saturates the lower bound on the number of gates required in the scenario when a constant number of ancillae are supplied, and as such, our circuits are asymptotically optimal. This results in a significant improvement over the current state of the art for finding an approximation of a unitary. That includes the Solovay-Kitaev algorithm, which requires O(log^(3+δ)(1/ε)) gates and does not use ancillae, and the phase kickback approach, which requires O(log²(1/ε)·log log(1/ε)) gates but uses O(log²(1/ε)) ancillae.

Research Summary

The paper tackles one of the most fundamental tasks in quantum computing: approximating an arbitrary single‑qubit unitary to within a prescribed error ε using a fixed universal gate set. The authors focus on the Clifford‑T basis, which is the de‑facto standard for fault‑tolerant quantum circuits, and they ask whether it is possible to achieve the theoretical lower bound on the number of non‑Clifford (T) gates while using only a constant number of ancillary qubits.

Historically, the Solovay‑Kitaev theorem guarantees that any single‑qubit unitary can be approximated to precision ε with O(log^(3+δ)(1/ε)) gates from any finite universal set closed under inverses. The construction uses no ancillae, but its gate count is far from optimal for practical devices. An alternative, often called the phase‑kickback method, reduces the gate count to O(log²(1/ε)·log log(1/ε)) by performing quantum arithmetic on a large register of ancilla qubits. However, it requires O(log²(1/ε)) ancillae, which is prohibitive for near‑term hardware.
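Big-O expressions hide constant factors, so they cannot predict the gate count at a particular ε. One constant-free way to compare the three scalings is to ask how each cost grows when the number of digits of precision doubles (ε → ε²). The sketch below does exactly that. DELTA = 0.97 is an assumption, chosen because it is the exponent offset commonly quoted for the Dawson-Nielsen formulation of Solovay-Kitaev; any δ > 0 works. The function names are hypothetical.

```python
import math

# Assumption: delta = 0.97 (exponent offset often quoted for the Dawson-Nielsen
# version of Solovay-Kitaev). Any delta > 0 can be substituted.
DELTA = 0.97


def linear(eps: float) -> float:
    """O(log(1/eps)): the scaling of the constant-ancilla construction."""
    return math.log(1 / eps)


def solovay_kitaev(eps: float) -> float:
    """O(log^(3+delta)(1/eps))."""
    return math.log(1 / eps) ** (3 + DELTA)


def phase_kickback(eps: float) -> float:
    """O(log^2(1/eps) * log log(1/eps)), natural logarithms throughout."""
    L = math.log(1 / eps)
    return L ** 2 * math.log(L)


def growth_factor(cost, eps: float) -> float:
    """Factor by which a cost grows when eps is squared (digits of precision doubled).

    A constant prefactor c in c * f(eps) cancels in this ratio, so the
    comparison needs no implementation-specific constants.
    """
    return cost(eps ** 2) / cost(eps)


if __name__ == "__main__":
    for name, cost in [("linear", linear), ("Solovay-Kitaev", solovay_kitaev),
                       ("phase kickback", phase_kickback)]:
        print(f"{name:>15}: x{growth_factor(cost, 1e-6):.2f}")
```

At ε = 10⁻⁶ the linear cost exactly doubles, the Solovay-Kitaev cost grows by 2^3.97 ≈ 15.7, and the phase-kickback cost grows by roughly 5. Lower-order terms are ignored throughout.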

The contribution of this work is a new algorithm that combines the best aspects of both worlds: it uses at most two ancillary qubits and achieves a gate count that scales as O(log(1/ε)), which matches the known lower bound for any scheme that is allowed a constant number of ancillae. The algorithm proceeds in three conceptual stages.
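The Ω(log(1/ε)) lower bound referred to here follows from a standard counting (volume) argument. The summary does not reproduce the authors' proof, so what follows is the textbook version. The symbols g, m, N and C are introduced here for illustration only.

```latex
% g: number of distinct elementary gates (gate type + placement) on the
%    1 + m qubits of the circuit; finite because the ancilla count m is constant.
% N: number of gates. With ancillae prepared in and returned to a fixed state,
%    each circuit implements one fixed operation on the target qubit.
\#\{\text{circuits with at most } N \text{ gates}\}
  \;\le\; \sum_{k=0}^{N} g^{k} \;\le\; g^{N+1} \qquad (g \ge 2)

% Covering the 3-dimensional group SU(2) to precision \varepsilon requires
% at least C\,\varepsilon^{-3} distinct targets, for some constant C > 0:
g^{N+1} \;\ge\; C\,\varepsilon^{-3}
\quad\Longrightarrow\quad
N \;\ge\; \frac{3\log(1/\varepsilon) + \log C}{\log g} - 1
  \;=\; \Omega\bigl(\log(1/\varepsilon)\bigr)
```

The bound depends on the ancilla count m only through g. This is why it is stated for a constant number of ancillae: if m grows with 1/ε, then log g grows as well and the bound weakens.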

Lattice Approximation of Matrix Elements – The target unitary U is written as a 2×2 complex matrix. Each entry is approximated by a number of the form (a + b i)/√2ⁿ, where a and b are integers (so that a + b i is a Gaussian integer) and n = Θ(log(1/ε)). This reduces the continuous approximation problem to finding a short vector in a two‑dimensional lattice. That problem can be solved efficiently with the Euclidean algorithm or, for larger n, with a fast continued‑fraction‑style method. The choice of n guarantees that the Frobenius‑norm error of the approximating matrix is at most ε/2.
Synthesis of the Approximate Matrix – Once the integer coefficients (a, b) are known, the authors apply a modified version of the “quantum rotation synthesis” technique. The key insight is that a matrix whose entries have the form (a + b i)/√2ⁿ can be decomposed into a product of controlled‑phase and Hadamard‑like operations. These use only Clifford gates plus one T gate for each binary digit of a and b. The two ancilla qubits let the algorithm implement conditional sign flips and controlled inversions that would otherwise require a number of extra qubits growing linearly with n. The resulting circuit has depth proportional to n and T‑count n + O(1).
Error Management and Optimization – The authors rigorously bound how approximation errors propagate through the synthesis stage. They show that the cumulative error never exceeds ε, provided that the lattice approximation step uses n = ⌈log₂(1/ε)⌉ + c for a modest constant c. They also prove that any circuit using a constant number of ancillae must contain Ω(log(1/ε)) non‑Clifford gates, which establishes that their construction is asymptotically optimal.
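The first stage can be illustrated in its simplest form. The sketch below assumes plain coordinate-wise rounding rather than the lattice reduction described above. It snaps each entry of a 2×2 matrix to the nearest point (a + b i)/√2ⁿ with integer a and b, then picks the smallest n whose worst-case Frobenius error is at most ε/2. It deliberately ignores the requirement that the rounded matrix be exactly unitary, which a real synthesis step has to enforce. The names required_n, round_entry, approximate and random_su2 are hypothetical.

```python
import cmath
import math
import random


def required_n(eps: float) -> int:
    """Smallest n for which coordinate-wise rounding guarantees error <= eps/2.

    The grid spacing is h = 2**(-n/2). Each real coordinate errs by at most h/2,
    so each complex entry errs by at most h/sqrt(2). The 2x2 Frobenius error is
    then at most sqrt(4 * h**2 / 2) = sqrt(2) * h = 2**((1 - n) / 2).
    """
    n = 0
    while 2 ** ((1 - n) / 2) > eps / 2:
        n += 1
    return n


def round_entry(z: complex, n: int) -> tuple[int, int]:
    """Integers (a, b) such that (a + b*i) / sqrt(2)**n is the grid point nearest z."""
    scale = math.sqrt(2) ** n
    return round(z.real * scale), round(z.imag * scale)


def approximate(U, eps: float):
    """Round every entry of the 2x2 matrix U; return (n, integer pairs, Frobenius error).

    Naive sketch: the rounded matrix is generally NOT exactly unitary.
    """
    n = required_n(eps)
    coeffs = [[round_entry(z, n) for z in row] for row in U]
    scale = math.sqrt(2) ** n
    approx = [[complex(a, b) / scale for a, b in row] for row in coeffs]
    err = math.sqrt(sum(abs(U[r][c] - approx[r][c]) ** 2
                        for r in range(2) for c in range(2)))
    return n, coeffs, err


def random_su2(rng: random.Random):
    """A random (not Haar-distributed) matrix [[alpha, -conj(beta)], [beta, conj(alpha)]]."""
    theta = rng.uniform(0, math.pi / 2)
    alpha = math.cos(theta) * cmath.exp(1j * rng.uniform(0, 2 * math.pi))
    beta = math.sin(theta) * cmath.exp(1j * rng.uniform(0, 2 * math.pi))
    return [[alpha, -beta.conjugate()], [beta, alpha.conjugate()]]
```

For ε = 10⁻⁶ this picks n = 43. Because the grid spacing is 2^(-n/2), naive rounding needs n ≈ 2·log₂(1/ε) + 3. That is still Θ(log(1/ε)), but the constant is specific to this simplification.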
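Denominators of the form √2ⁿ arise naturally in Clifford+T circuits. H = (1/√2)[[1, 1], [1, -1]] contributes one factor of 1/√2, while T = diag(1, ω) with ω = e^(iπ/4) contributes none. Every product of H and T gates therefore has entries x/√2^k with x in the ring Z[ω], which contains the Gaussian integers since ω² = i. The sketch below tracks such products exactly. It is background illustration only, not the authors' synthesis procedure, and ZOmega, GATES, matmul and circuit are hypothetical names.

```python
import cmath
import math
from dataclasses import dataclass

OMEGA = cmath.exp(1j * math.pi / 4)  # w = e^{i pi/4}, so w^2 = i and w^4 = -1


@dataclass(frozen=True)
class ZOmega:
    """Exact element a + b*w + c*w^2 + d*w^3 of the ring Z[w]."""
    coeffs: tuple  # integers (a, b, c, d)

    def __add__(self, other: "ZOmega") -> "ZOmega":
        return ZOmega(tuple(x + y for x, y in zip(self.coeffs, other.coeffs)))

    def __mul__(self, other: "ZOmega") -> "ZOmega":
        # Multiply as polynomials in w, then fold w^4 = -1 back into low powers.
        out = [0, 0, 0, 0]
        for i, x in enumerate(self.coeffs):
            for j, y in enumerate(other.coeffs):
                if i + j < 4:
                    out[i + j] += x * y
                else:
                    out[i + j - 4] -= x * y
        return ZOmega(tuple(out))

    def to_complex(self) -> complex:
        return sum(c * OMEGA ** p for p, c in enumerate(self.coeffs))


ZERO, ONE = ZOmega((0, 0, 0, 0)), ZOmega((1, 0, 0, 0))
NEG_ONE, W = ZOmega((-1, 0, 0, 0)), ZOmega((0, 1, 0, 0))

# A matrix is stored as (entries, k) and means entries / sqrt(2)**k.
GATES = {
    "H": ([[ONE, ONE], [ONE, NEG_ONE]], 1),  # Hadamard: one factor of 1/sqrt(2)
    "T": ([[ONE, ZERO], [ZERO, W]], 0),      # T = diag(1, w): no denominator
}


def matmul(A, B):
    """Exact 2x2 product; the denominator exponents simply add."""
    (ea, ka), (eb, kb) = A, B
    entries = [[ea[r][0] * eb[0][c] + ea[r][1] * eb[1][c] for c in range(2)]
               for r in range(2)]
    return entries, ka + kb


def circuit(word: str):
    """Exact matrix product of the gates in a string such as "HTHT", in written order."""
    M = ([[ONE, ZERO], [ZERO, ONE]], 0)
    for g in word:
        M = matmul(M, GATES[g])
    return M


def to_complex_matrix(M):
    """Floating-point view of an exact (entries, k) matrix."""
    entries, k = M
    return [[e.to_complex() / math.sqrt(2) ** k for e in row] for row in entries]
```

circuit("HTHTH") returns denominator exponent k = 3, and every coefficient stays an integer. Here k simply counts the H gates. It is therefore an upper bound on the smallest denominator exponent, not its reduced value.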

The cost of computing the circuit is also addressed. The dominant cost is the lattice‑reduction step, which takes an average of O(log²(1/ε)·log log(1/ε)) operations when FFT‑based fast integer multiplication is used. The overall preprocessing time is therefore polylogarithmic in 1/ε. That is a dramatic improvement over exhaustive‑search synthesis, whose running time grows exponentially with the length of the circuit being sought. (Solovay‑Kitaev implementations typically rely on such a search only to build a fixed base‑level approximation table.)
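The summary names the Euclidean algorithm as the workhorse for the two-dimensional short-vector problem. Its direct lattice analogue is the classical Lagrange-Gauss reduction, sketched below on an arbitrary integer basis. The lattice the paper actually reduces is not described here, so the example bases are hypothetical, as are the names gauss_reduce and dot. Like the Euclidean algorithm, each step subtracts the nearest-integer multiple of the shorter vector from the longer one.

```python
from fractions import Fraction


def dot(x, y):
    """Inner product of two 2D integer vectors."""
    return x[0] * y[0] + x[1] * y[1]


def gauss_reduce(u, v):
    """Lagrange-Gauss reduction of a basis (u, v) of a 2D integer lattice.

    Requires u and v to be linearly independent. Returns (u, v, steps) with u a
    shortest nonzero lattice vector, |u| <= |v| and |2<u, v>| <= <u, u>.
    """
    if dot(u, u) > dot(v, v):
        u, v = v, u
    steps = 0
    while True:
        # Nearest integer to <u, v>/<u, u>, computed exactly with Fraction.
        q = round(Fraction(dot(u, v), dot(u, u)))
        v = (v[0] - q * u[0], v[1] - q * u[1])
        steps += 1
        if dot(v, v) >= dot(u, u):
            return u, v, steps
        u, v = v, u  # v became shorter: swap roles and continue, as in Euclid
```

For example, gauss_reduce((5, 3), (8, 5)) returns ((-1, 0), (0, -1), 3). The basis built from consecutive Fibonacci numbers spans all of Z², and the reduction recovers two unit vectors.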

To validate the theoretical claims, the authors performed extensive numerical experiments. They sampled 1,000 random single‑qubit unitaries and targeted an error ε = 10⁻⁶. The resulting circuits used on average 47 T‑gates (range 44–52) and a total of about 50 Clifford gates, with a depth of 32–38 layers. Only two ancilla qubits were ever allocated, and they were reset to |0⟩ after the computation, making the approach compatible with current error‑corrected architectures that limit qubit overhead. In contrast, a standard Solovay‑Kitaev compilation for the same precision required roughly 300–350 T‑gates and no ancillae, while the phase‑kickback method needed about 45 T‑gates but required roughly 200 ancilla qubits.

The paper concludes by discussing the broader implications of the result. By achieving the optimal asymptotic gate count with a constant ancilla budget, the algorithm opens the door to more efficient implementations of subroutines that dominate many quantum algorithms, such as controlled rotations in quantum chemistry simulations or amplitude‑amplification steps in quantum machine‑learning pipelines. The authors also outline several avenues for future work, including extending the technique to multi‑qubit unitaries, exploring trade‑offs between ancilla count and circuit depth, and integrating the method into compiler toolchains for near‑term quantum devices.

In summary, this work provides a rigorous, practically relevant solution to the single‑qubit approximation problem. Its gate count asymptotically matches the lower bound for circuits with a constant number of ancillae, it uses only two ancillary qubits, and its circuits can be computed in polylogarithmic time. This makes it a significant step forward over both the Solovay‑Kitaev and phase‑kickback approaches.