Some linear-time algorithms for systolic arrays

We survey some results on linear-time algorithms for systolic arrays. In particular, we show how the greatest common divisor (GCD) of two polynomials of degree n over a finite field can be computed in time O(n) on a linear systolic array of O(n) cells; similarly for the GCD of two n-bit binary numbers. We show how n × n Toeplitz systems of linear equations can be solved in time O(n) on a linear array of O(n) cells, each of which has constant memory size (independent of n). Finally, we outline how a two-dimensional square array of O(n) × O(n) cells can be used to solve (to working accuracy) the eigenvalue problem for a symmetric real n × n matrix in time O(n S(n)). Here S(n) is a slowly growing function of n; for practical purposes S(n) can be regarded as a constant. In addition to their theoretical interest, these results have potential applications in the areas of error-correcting codes, symbolic and algebraic computations, signal processing and image processing.


💡 Research Summary

The paper presents a unified framework for achieving linear‑time solutions to several classic computational problems by exploiting the regular, pipelined nature of systolic arrays. A systolic array consists of a regular grid of simple processing elements (cells) that communicate only with their immediate neighbors, passing data in a fixed direction while performing local arithmetic. Because each cell stores only a constant amount of state, the total hardware footprint scales linearly with the number of cells, and the overall execution time scales with the length of the input stream rather than with the square or higher powers that dominate conventional sequential algorithms.

The first major contribution is an O(n)‑time algorithm for computing the greatest common divisor (GCD) of two polynomials of degree n over a finite field. The authors map the Euclidean algorithm onto a one‑dimensional systolic array: each cell holds the current remainder coefficients and the divisor’s leading coefficient, while the polynomial coefficients flow through the array from left to right. At each clock cycle a subtraction (or field division) is performed locally, and the updated remainder is passed to the next cell. Since the Euclidean algorithm’s number of division steps is bounded by the degree, the pipeline completes after O(n) cycles, yielding a linear‑time GCD computation with O(n) cells and O(1) memory per cell.
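The cell-level systolic pipeline is a hardware design, but the Euclidean recurrence it implements can be sketched sequentially. The following is a minimal illustration, not the authors' formulation, assuming the field is GF(p) for a prime p and polynomials are stored as coefficient lists with the lowest-degree term first:

```python
def poly_gcd(a, b, p):
    """Euclidean GCD of two polynomials over GF(p), p prime.
    Polynomials are lists of coefficients, lowest degree first."""
    def trim(c):
        # drop trailing (leading-term) zeros, keeping at least one entry
        while len(c) > 1 and c[-1] == 0:
            c.pop()
        return c

    def poly_mod(a, b):
        a = trim(a[:])
        inv_lead = pow(b[-1], p - 2, p)      # inverse of b's leading coeff
        while len(a) >= len(b) and any(a):
            q = (a[-1] * inv_lead) % p       # quotient of the leading terms
            shift = len(a) - len(b)
            for i, c in enumerate(b):        # subtract q * x^shift * b(x)
                a[shift + i] = (a[shift + i] - q * c) % p
            trim(a)
        return a

    a, b = trim(a[:]), trim(b[:])
    while any(b):
        a, b = b, poly_mod(a, b)
    inv = pow(a[-1], p - 2, p)               # normalize the GCD to be monic
    return [(c * inv) % p for c in a]
```

For example, `poly_gcd([6, 0, 1], [1, 1], 7)` computes gcd(x² − 1, x + 1) over GF(7) and returns `[1, 1]`, i.e. x + 1. The systolic version distributes exactly these subtract-and-scale steps across the array, one cell per remainder update.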

A parallel result is obtained for the GCD of two n‑bit binary integers. By treating each bit as a datum that streams through the same one‑dimensional array, the classic binary Euclidean algorithm is turned into a bit‑wise pipeline. Each cell stores the current carry/borrow information and executes the subtraction required by the Euclidean step. The bit stream traverses the array once, so the integer GCD also finishes in O(n) time with the same linear hardware budget.
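The binary Euclidean (Stein) algorithm referenced here uses only shifts and subtractions, which is precisely what makes it attractive for a bit-serial pipeline. A plain sequential sketch, for reference rather than as the paper's systolic design:

```python
def binary_gcd(u, v):
    """Binary (Stein) GCD: shifts and subtractions only, no division."""
    if u == 0:
        return v
    if v == 0:
        return u
    shift = 0
    while (u | v) & 1 == 0:   # factor out powers of two common to both
        u >>= 1
        v >>= 1
        shift += 1
    while u & 1 == 0:         # u can shed its remaining factors of two
        u >>= 1
    while v:
        while v & 1 == 0:
            v >>= 1
        if u > v:             # keep u <= v so the subtraction is nonnegative
            u, v = v, u
        v -= u
    return u << shift         # restore the common power of two
```

Each iteration either halves an operand or reduces the larger one, so the number of steps is O(n) for n-bit inputs; the systolic array realizes these steps as a bit-wise pipeline.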

The third contribution addresses the solution of n × n Toeplitz linear systems. Toeplitz matrices have constant diagonals, which permits a representation where each row (or column) can be generated by shifting a short vector. The authors construct a linear systolic array where each cell represents one diagonal element and maintains a running partial sum. As the right‑hand side vector and the matrix diagonals flow through the array, each cell updates its local contribution to the solution via simple add‑and‑multiply operations. Because each diagonal is processed exactly once, the whole system is solved after O(n) clock cycles, again using O(n) cells with constant per‑cell storage.
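The structure being exploited can be seen in the classical Levinson recursion, which solves a symmetric Toeplitz system using only the first column of the matrix. The sketch below is the sequential O(n²) recursion, not the paper's O(n) systolic pipeline, and it assumes a symmetric Toeplitz matrix whose leading principal minors are all nonsingular:

```python
def levinson_solve(t, y):
    """Solve T x = y where T is symmetric Toeplitz with first column t,
    via the Levinson recursion. T[i][j] = t[|i - j|]."""
    n = len(y)
    f = [1.0 / t[0]]                     # forward vector: T_k f = e_0
    x = [y[0] / t[0]]                    # solution of the leading 1x1 system
    for k in range(1, n):
        # mismatch produced by extending f with a zero
        eps = sum(t[k - i] * f[i] for i in range(k))
        denom = 1.0 - eps * eps
        f_ext = f + [0.0]
        b_ext = [0.0] + f[::-1]          # symmetric case: backward = reversed f
        f = [(fe - eps * be) / denom for fe, be in zip(f_ext, b_ext)]
        b = f[::-1]                      # T_{k+1} b = e_k
        # correct the extended solution in its last coordinate
        err = y[k] - sum(t[k - i] * x[i] for i in range(k))
        x = [xe + err * be for xe, be in zip(x + [0.0], b)]
    return x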

Finally, the paper sketches how a two‑dimensional square systolic array of size O(n) × O(n) can be employed to compute eigenvalues of a symmetric real n × n matrix to working accuracy. The authors outline a systolic implementation of a QR‑iteration or a Jacobi‑type method: each cell holds a small block of the matrix and performs the orthogonal transformations required for a single iteration step. The transformations are propagated across rows and columns in a wave‑front fashion, and after a modest number S(n) of global sweeps (where S(n) grows very slowly, effectively constant for practical sizes) the matrix converges to a near‑diagonal form, from which eigenvalues are read off. The overall time is O(n S(n)), dramatically better than the O(n³) cost of conventional algorithms, while still using only O(n²) processing elements, each with constant memory.
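A Jacobi-type sweep of the kind mentioned above repeatedly applies 2×2 plane rotations to annihilate off-diagonal entries. The following sequential sketch of cyclic Jacobi (an illustration of the method family, not the paper's systolic layout) shows the per-rotation work that each cell of the array would perform locally:

```python
import math

def jacobi_eigenvalues(A, sweeps=10):
    """Cyclic Jacobi iteration for a symmetric matrix A (list of lists):
    sweep over all (p, q) pairs, zeroing A[p][q] with a plane rotation.
    Returns approximate eigenvalues in ascending order."""
    n = len(A)
    A = [row[:] for row in A]
    for _ in range(sweeps):
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(A[p][q]) < 1e-12:
                    continue
                # rotation angle that zeroes the (p, q) entry
                theta = 0.5 * math.atan2(2 * A[p][q], A[q][q] - A[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):       # rotate rows p and q
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k] = c * apk - s * aqk
                    A[q][k] = s * apk + c * aqk
                for k in range(n):       # rotate columns p and q
                    akp, akq = A[k][p], A[k][q]
                    A[k][p] = c * akp - s * akq
                    A[k][q] = s * akp + c * akq
    return sorted(A[i][i] for i in range(n))
```

Each sweep costs O(n) parallel time on an O(n) × O(n) array because rotations for disjoint (p, q) pairs can proceed simultaneously in a wave front; S(n) sweeps give the stated O(n S(n)) total.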

Across all four problem domains, the key insight is that the regular data‑flow of systolic arrays eliminates the need for random memory accesses and global control, allowing the algorithmic “depth” to be proportional to the input size rather than its square. This yields hardware designs that are area‑efficient, power‑efficient, and highly amenable to VLSI implementation. The authors emphasize practical relevance: the polynomial GCD algorithm can accelerate decoding of Reed–Solomon and other algebraic codes; the binary GCD is useful in cryptographic key generation; the Toeplitz solver maps directly to digital filter design and real‑time signal processing; and the eigenvalue engine can support image compression, principal‑component analysis, and scientific computing tasks where moderate‑size symmetric matrices arise. In summary, the paper demonstrates that systolic arrays, when carefully matched to the algebraic structure of a problem, can break traditional complexity barriers and deliver true linear‑time performance for a broad class of computationally intensive tasks.