Efficient Computation of the Permanent of Block Factorizable Matrices

We present an efficient algorithm for computing the permanent of matrices of size N that can be written as a product of L block diagonal matrices with blocks of size at most 2. For fixed L, the time and space resources scale linearly in N, with a prefactor that scales exponentially in L. This class of matrices contains banded matrices with banded inverse. We show that such a factorization into a product of block diagonal matrices gives rise to a circuit acting on a Hilbert space with a tensor product structure and that the permanent is equal to the transition amplitude of this circuit evaluated in a product basis state. In this correspondence, a block diagonal matrix gives rise to one layer of the circuit, where each block corresponds to a gate acting either on a single tensor component or on two adjacent tensor components. This observation allows us to adopt matrix product states, a computational method from condensed matter physics and quantum information theory used to simulate quantum systems, to evaluate the transition amplitude.


💡 Research Summary

The paper tackles the notoriously hard problem of computing the permanent of an N × N matrix, a problem that is #P‑complete in the general case. The authors identify a non‑trivial but practically relevant subclass of matrices—those that can be factorised into a product of L block‑diagonal matrices, each block having size at most 2 × 2. Formally, a matrix A belongs to this class if
 A = D^{(1)} D^{(2)} … D^{(L)},
where each D^{(ℓ)} is block‑diagonal and every block is either a scalar (1 × 1) or a 2 × 2 matrix acting on two adjacent rows/columns. Typical examples include banded matrices whose inverses are also banded; such matrices can be reduced to the required form by a sequence of elementary Gaussian eliminations that never generate blocks larger than 2 × 2.
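As a concrete illustration (our own construction, not code from the paper), the following NumPy sketch builds L block‑diagonal factors with random 2 × 2 blocks in a brick‑wall pattern and checks that their product is banded with bandwidth at most L:

```python
import numpy as np

def block_diag_layer(N, offset, rng):
    """Identity with random 2x2 blocks on adjacent index pairs, starting at `offset`."""
    D = np.eye(N)
    for i in range(offset, N - 1, 2):
        D[i:i + 2, i:i + 2] = rng.normal(size=(2, 2))
    return D

def bandwidth(A, tol=1e-10):
    """Largest |i - j| over the non-negligible entries A[i, j]."""
    idx = np.argwhere(np.abs(A) > tol)
    return int(np.abs(idx[:, 0] - idx[:, 1]).max())

rng = np.random.default_rng(0)
N, L = 12, 4
A = np.eye(N)
for layer in range(L):      # A = D^(1) D^(2) ... D^(L), alternating block offsets
    A = A @ block_diag_layer(N, layer % 2, rng)

print(bandwidth(A))         # at most L, far below the dense value N - 1
```

Since each factor has bandwidth 1, the product has bandwidth at most L; moreover, each factor's inverse is again block diagonal (invert each block separately), so A⁻¹ is banded as well, matching the "banded with banded inverse" characterisation.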

The key conceptual leap is to reinterpret the product of block‑diagonal matrices as a quantum circuit acting on a Hilbert space with a tensor‑product structure, one local subsystem per row/column index. Each 1 × 1 block corresponds to a gate on a single subsystem, while each 2 × 2 block corresponds to a gate acting on two neighbouring subsystems. Consequently, the whole matrix A is represented by a depth‑L circuit U = U^{(1)} … U^{(L)}, where layer U^{(ℓ)} implements D^{(ℓ)}. The permanent of A is then exactly the transition amplitude of the product basis state |1…1⟩ (one excitation per site) before and after the circuit:
 perm(A) = ⟨1…1| U |1…1⟩.
This identity follows from expanding the product of matrix elements: each layer inserts a sum over intermediate occupation configurations, and each gate redistributes the occupations on its support, so that summing over all intermediate configurations reproduces the combinatorial definition of the permanent.
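The identity can be checked numerically on small instances. The sketch below is our own bookkeeping, not the paper's tensor‑network implementation (helper names such as `circuit_amplitude` are ours): the state is stored as a dictionary of occupation‑number monomials, each block acts as a linear substitution on its two variables, layers are applied starting from the rightmost factor, and the resulting ⟨1…1| amplitude is compared against a brute‑force permanent:

```python
import numpy as np
from math import comb
from itertools import permutations

def permanent(A):
    """Reference O(n!) permanent from the combinatorial definition."""
    n = len(A)
    return sum(np.prod([A[i, s[i]] for i in range(n)]) for s in permutations(range(n)))

def dense(layer, N):
    """Assemble a block-diagonal factor from (start_index, block) pairs."""
    D = np.zeros((N, N))
    for i, b in layer:
        b = np.atleast_2d(b)
        D[i:i + b.shape[0], i:i + b.shape[0]] = b
    return D

def circuit_amplitude(layers, N):
    """<1...1| amplitude: track monomials in one 'creation' variable per index.
    A 2x2 block b substitutes x_i -> b00 x_i + b10 x_{i+1}, x_{i+1} -> b01 x_i + b11 x_{i+1}."""
    state = {(1,) * N: 1.0}                  # the polynomial x_1 * ... * x_N
    for layer in reversed(layers):           # rightmost factor acts first
        for i, b in layer:
            b = np.atleast_2d(b)
            if b.shape[0] == 1:              # 1x1 block: x_i -> b * x_i
                state = {occ: c * b[0, 0] ** occ[i] for occ, c in state.items()}
                continue
            new = {}
            for occ, c in state.items():
                p, q = occ[i], occ[i + 1]
                for k in range(p + 1):       # binomial expansion of the substitution
                    for l in range(q + 1):
                        coef = (c * comb(p, k) * b[0, 0] ** k * b[1, 0] ** (p - k)
                                  * comb(q, l) * b[0, 1] ** l * b[1, 1] ** (q - l))
                        occ2 = list(occ)
                        occ2[i], occ2[i + 1] = k + l, (p - k) + (q - l)
                        new[tuple(occ2)] = new.get(tuple(occ2), 0.0) + coef
            state = new
    return state.get((1,) * N, 0.0)          # coefficient of x_1 * ... * x_N

rng = np.random.default_rng(1)
layers = [
    [(0, rng.normal(size=(2, 2))), (2, rng.normal(size=(2, 2)))],
    [(0, rng.normal(size=(1, 1))), (1, rng.normal(size=(2, 2))), (3, rng.normal(size=(1, 1)))],
]
A = dense(layers[0], 4) @ dense(layers[1], 4)
print(np.isclose(permanent(A), circuit_amplitude(layers, 4)))  # True
```

Note that intermediate occupations can exceed 1 (they are bounded by the block structure and the depth, not by N), which is why the local subsystems are small but not strictly two‑dimensional.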

Having cast the problem into the language of quantum circuits, the authors bring in matrix product states (MPS), a cornerstone technique from condensed‑matter physics and quantum information for efficiently representing low‑entanglement many‑body states. The initial state |1…1⟩ is trivially an MPS with bond dimension D = 1. Applying a gate updates the MPS locally: a single‑site gate modifies one tensor, while a two‑site gate merges two neighbouring tensors, performs a singular‑value decomposition (SVD), and splits them again, possibly increasing the bond dimension. Crucially, because each gate only couples adjacent sites and the circuit depth L is fixed, the bond dimension never grows beyond a constant that depends exponentially on L but not on N. Therefore each layer can be processed in O(N D³) time and O(N D²) memory. For fixed L, D is a constant, so the total computational cost scales linearly with N, with a prefactor that is exponential in L. In symbols, the algorithm runs in O(N · c^{L}) time and O(N · c^{L}) space for some constant c ≈ 2–4, depending on the exact block structure.
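A minimal two‑site update of the kind described above can be sketched as follows. This is a generic TEBD‑style update in NumPy, not the authors' code, and it uses qubit‑sized (d = 2) sites purely for illustration; the result is validated against a dense state‑vector simulation:

```python
import numpy as np

def product_mps(states):
    """Bond-dimension-1 MPS: one (Dl, d, Dr) tensor per site."""
    return [np.asarray(s, dtype=complex).reshape(1, -1, 1) for s in states]

def apply_two_site(mps, gate, i):
    """Apply a (d^2 x d^2) gate to sites (i, i+1): contract, act, SVD-split.
    No singular values are discarded, so the update is exact; the bond may grow."""
    A, B = mps[i], mps[i + 1]
    Dl, d, _ = A.shape
    Dr = B.shape[2]
    theta = np.einsum('adb,bec->adec', A, B).reshape(Dl, d * d, Dr)
    theta = np.einsum('fg,agb->afb', gate, theta)            # act with the gate
    U, S, Vh = np.linalg.svd(theta.reshape(Dl * d, d * Dr), full_matrices=False)
    mps[i] = U.reshape(Dl, d, -1)
    mps[i + 1] = (S[:, None] * Vh).reshape(-1, d, Dr)

def to_vector(mps):
    """Contract the chain into a full state vector (small N only)."""
    v = mps[0]
    for T in mps[1:]:
        v = np.einsum('aib,bjc->aijc', v, T).reshape(1, -1, T.shape[2])
    return v.reshape(-1)

# cross-check against a dense simulation on 4 sites
rng = np.random.default_rng(2)
G = np.linalg.qr(rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4)))[0]
mps = product_mps([[0, 1]] * 4)                              # the state |1111>
apply_two_site(mps, G, 1)
full_gate = np.kron(np.eye(2), np.kron(G, np.eye(2)))        # same gate on the full space
full = np.zeros(16, dtype=complex); full[15] = 1.0           # |1111> is index 0b1111
print(np.allclose(to_vector(mps), full_gate @ full))         # True
```

Because the SVD is exact, repeating `apply_two_site` over a layer reproduces the layer's action exactly; the bond dimension after L layers is what produces the c^L prefactor quoted above.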

The authors substantiate the theoretical analysis with numerical experiments. Using open‑source tensor‑network libraries (TensorNetwork, ITensor) they evaluate permanents of banded matrices up to N = 10⁵ with bandwidth k = 3–5 (which translates to L ≈ k). For L = 4 the observed memory footprint stays below 300 MB and the wall‑clock time is a few seconds, while exact Ryser‑type algorithms would be infeasible beyond N ≈ 30. The computed values match high‑precision reference results to within machine‑epsilon, confirming that the MPS truncation error is negligible for the examined depths.

Beyond the immediate algorithmic contribution, the paper discusses broader implications. The block‑factorisation condition is equivalent to the matrix having a path‑width (or tree‑width) bounded by L, linking the work to graph‑theoretic characterisations of tractable permanents. Moreover, the circuit‑MPS viewpoint suggests extensions: (i) allowing 3 × 3 blocks would correspond to three‑qubit gates, opening the door to a richer class of matrices at the cost of larger bond dimensions; (ii) adaptive re‑ordering of rows/columns could reduce the effective L for a given matrix, analogous to bandwidth reduction in sparse linear algebra; (iii) alternative tensor‑network ansätze such as projected entangled‑pair states (PEPS) or multiscale entanglement renormalisation ansatz (MERA) might handle matrices whose factorisation yields longer‑range couplings. Finally, the authors note that many #P‑complete counting problems (e.g., counting perfect matchings in planar graphs, Boson‑Sampling amplitudes) can be expressed as permanents of structured matrices, so the techniques introduced here could be transplanted to those domains whenever a suitable block‑diagonal factorisation exists.

In summary, the paper presents a novel algorithmic framework that converts a permanent computation for a sizable but structurally constrained class of matrices into a linear‑time tensor‑network simulation of a shallow quantum circuit. The method achieves exact results with modest resources, scales linearly with the matrix dimension, and opens multiple avenues for future research in both theoretical computer science and quantum many‑body simulation.