Scalable Multi-QPU Circuit Design for Dicke State Preparation: Optimizing Communication Complexity and Local Circuit Costs

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

Preparing large-qubit Dicke states is of broad interest in quantum computing and quantum metrology. However, the number of qubits available on a single quantum processing unit (QPU) is limited – motivating the distributed preparation of such states across multiple QPUs as a practical approach to scalability. In this article, we investigate the distributed preparation of $n$-qubit $k$-excitation Dicke states $D(n,k)$ across a general number $p$ of QPUs, presenting a distributed quantum circuit (each QPU hosting approximately $\lceil n/p \rceil$ qubits) that prepares the state with communication complexity $O(p \log k)$, circuit size $O(nk)$, and circuit depth $O\left(p^2 k + \log k \log (n/k)\right)$. To the best of our knowledge, this is the first construction to simultaneously achieve logarithmic communication complexity and polynomial circuit size and depth. We also establish a lower bound on the communication complexity of $p$-QPU distributed state preparation for a general target state. This lower bound is formulated in terms of the canonical polyadic rank (CP-rank) of a tensor associated with the target state. For the special case $p = 2$, we explicitly compute the CP-rank corresponding to the Dicke state $D(n,k)$ and derive a lower bound of $\lceil\log (k + 1)\rceil$, which shows that the communication complexity of our construction matches this fundamental limit.
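For $p = 2$ the CP-rank of the associated tensor reduces to the ordinary matrix rank of the amplitude matrix, that is, the Schmidt rank across the two QPUs. The sketch below is an illustration and not the paper's proof: it builds the 0/1 support pattern of $D(n_1 + n_2, k)$ for small sizes, computes its exact rank, and compares it with the closed form $\min(k, n_1) - \max(0, k - n_2) + 1$, which equals $k + 1$ whenever both QPUs hold at least $k$ qubits. The function names are hypothetical, and reading the logarithm as base 2 is an assumption.

```python
from fractions import Fraction
from math import ceil, log2


def dicke_bipartite_matrix(n1, n2, k):
    """Support pattern of D(n1 + n2, k): rows index QPU A bitstrings, columns QPU B.

    The global factor 1/sqrt(C(n, k)) is dropped because it does not affect the rank.
    """
    return [
        [1 if bin(x).count("1") + bin(y).count("1") == k else 0
         for y in range(2 ** n2)]
        for x in range(2 ** n1)
    ]


def matrix_rank(rows):
    """Exact rank via Gaussian elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        # Find a row at or below the current rank with a nonzero entry in this column.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            if m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def schmidt_rank_closed_form(n1, n2, k):
    """Number of ways to split k excitations as j on QPU A and k - j on QPU B."""
    return min(k, n1) - max(0, k - n2) + 1


def communication_lower_bound(n1, n2, k):
    """ceil(log2(Schmidt rank)); base 2 is assumed here."""
    return ceil(log2(schmidt_rank_closed_form(n1, n2, k)))


if __name__ == "__main__":
    for n1, n2, k in [(3, 3, 2), (2, 4, 3), (4, 4, 3)]:
        r = matrix_rank(dicke_bipartite_matrix(n1, n2, k))
        print(n1, n2, k, r, schmidt_rank_closed_form(n1, n2, k),
              communication_lower_bound(n1, n2, k))
```

Rows with the same Hamming weight are identical and rows with different weights have disjoint supports, so the rank simply counts the admissible excitation splits between the two QPUs.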


💡 Research Summary

The paper tackles the problem of preparing large‑scale Dicke states D(n,k) in a distributed quantum computing (DQC) setting, where the limited qubit capacity of a single quantum processing unit (QPU) makes monolithic preparation infeasible. Existing approaches either minimise inter‑QPU communication at the expense of exponential local circuit size, or keep local circuits polynomial while incurring polynomial communication cost. Neither is suitable for practical large‑scale implementations: the first makes the local circuits intractable, and the second is costly because inter‑QPU communication is the dominant bottleneck in current hardware.

The authors propose a new distributed circuit that simultaneously achieves logarithmic communication complexity and polynomial local resources. The system consists of p QPUs, each allocated roughly ⌈n/p⌉ data qubits (plus a small number of ancillas). The construction proceeds in two phases:

  1. Local Dicke preparation – On each QPU i, a local Dicke unitary U_{n_i,k_i} is applied, generating a partial Dicke state D(n_i,k_i). The unitary is implemented using the optimal algorithm from Lemma 1 (cited from
