Multi-level Loop-less Algorithm for Multi-set Permutations

We present an algorithm that generates multiset permutations in O(1) time per permutation, that is, a loop-less algorithm, using O(n) extra memory. Several such algorithms already exist, and they generate multiset permutations in various orders. Ours combines two loop-less algorithms that are designed on the same principle of tree traversal. Our order of generation differs from every existing order, and the algorithm is simpler and faster than the previous ones. We also apply the new algorithm to parking functions.

💡 Research Summary

The paper introduces a novel loop‑less algorithm for generating permutations of multisets (i.e., permutations where elements may repeat) with O(1) time per output and O(n) auxiliary space. The authors begin by reviewing the state of the art. Classic generators such as Heap’s algorithm, Steinhaus‑Johnson‑Trotter, and various loop‑less schemes produce ordinary permutations in constant time per output (amortized for Heap’s algorithm, worst‑case for the loop‑less schemes). Extending these ideas to multisets has required either more complex state machines or additional memory, and the resulting orders of generation differ from one another.
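To make the term “loop‑less” concrete for ordinary permutations, the Python sketch below generates the Steinhaus‑Johnson‑Trotter order with a fixed number of operations between outputs. It is not taken from the paper: it applies the standard focus‑pointer technique for reflected mixed‑radix Gray codes to adjacent transpositions, and the names `sjt_loopless`, `digit`, `direction`, and `focus` are illustrative assumptions.

```python
def sjt_loopless(n):
    """Yield all permutations of 1..n in Steinhaus-Johnson-Trotter order.

    Between two outputs the generator executes a fixed number of
    assignments and comparisons (no inner loop). Copying the state into a
    tuple for the caller costs O(n); a strictly loop-less consumer would
    read `perm` in place instead.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    perm = list(range(1, n + 1))      # current permutation
    pos = [0] + list(range(n))        # pos[v] = index of value v in perm
    m = n - 1                         # one digit per value n, n-1, ..., 2
    digit = [0] * m                   # digit[j] counts smaller values right of v = n - j
    direction = [1] * m               # +1: v moves left, -1: v moves right
    focus = list(range(m + 1))        # focus pointers select the digit to change

    while True:
        yield tuple(perm)
        j = focus[0]
        focus[0] = 0
        if j == m:                    # every digit is exhausted
            return
        v = n - j                     # value that moves in this step
        digit[j] += direction[j]
        p = pos[v]
        q = p - direction[j]          # neighbouring index in the move direction
        w = perm[q]
        perm[p], perm[q] = w, v       # adjacent transposition
        pos[v], pos[w] = q, p
        if digit[j] == 0 or digit[j] == v - 1:
            # v reached an end of its range: reverse it and pass the focus on
            direction[j] = -direction[j]
            focus[j] = focus[j + 1]
            focus[j + 1] = j + 1
```

For `n = 3` the generator yields 123, 132, 312, 321, 231, 213, each obtained from its predecessor by one adjacent swap.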

The core contribution is a “multi‑level” construction that combines two existing loop‑less techniques, tree‑traversal‑based permutation generation and a multiset‑specific transition rule, into a single framework. At the outer level the multiset is partitioned into blocks, each containing all copies of one distinct element, and each block is treated as a node in a traversal tree. Moving from one node to the next corresponds to a constant‑time “block swap”: the last element of the current block is exchanged with the first element of the following block. At the inner level, a classic loop‑less permutation generator (essentially a constant‑time version of the Steinhaus‑Johnson‑Trotter swap) cycles the elements of a block while preserving the block boundaries. Because both the outer and inner transitions use a bounded number of elementary operations (index arithmetic, a few comparisons, and a swap), each new multiset permutation is produced with a constant amount of work.
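The block partition described above can be sketched as a sorted array plus a boundary array. The Python below builds that layout and pairs it with a plain lexicographic successor routine that enumerates every multiset permutation. The successor routine is a reference oracle only: it is not loop‑less and is not the paper's method, and the names `block_layout`, `next_permutation`, and `multiset_permutations` are assumptions made for illustration. Its output can be compared, as a set, against any loop‑less implementation.

```python
from itertools import accumulate


def block_layout(multiplicities):
    """Return (items, start) for a multiset given by its multiplicities.

    Block b holds all copies of symbol b and occupies
    items[start[b]:start[b + 1]] in the initial sorted arrangement.
    """
    start = [0] + list(accumulate(multiplicities))
    items = [b for b, m in enumerate(multiplicities) for _ in range(m)]
    return items, start


def next_permutation(a):
    """Advance list `a` to its lexicographic successor in place.

    Returns False when `a` is already the last arrangement. This step
    scans the array, so it takes O(n) time in the worst case.
    """
    i = len(a) - 2
    while i >= 0 and a[i] >= a[i + 1]:
        i -= 1
    if i < 0:
        return False
    j = len(a) - 1
    while a[j] <= a[i]:
        j -= 1
    a[i], a[j] = a[j], a[i]
    a[i + 1:] = reversed(a[i + 1:])
    return True


def multiset_permutations(multiplicities):
    """Yield every distinct arrangement of the multiset exactly once."""
    items, _ = block_layout(multiplicities)
    while True:
        yield tuple(items)
        if not next_permutation(items):
            return
```

For multiplicities `[2, 1]` the oracle yields `(0, 0, 1)`, `(0, 1, 0)`, `(1, 0, 0)`.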

The authors provide a rigorous analysis showing that the algorithm never performs more than eight primitive operations per step, establishing the O(1) per‑permutation guarantee. Memory usage is limited to three linear‑size arrays: an index array that records the current position of each element, a block‑boundary array that marks where each distinct value starts and ends, and a small pre‑computed table encoding the allowed transitions. Consequently the total auxiliary space is Θ(n), a substantial improvement over earlier loop‑less multiset generators that required Θ(n log n) or Θ(n²) space for bookkeeping.

Experimental evaluation covers a broad range of multiset configurations, varying both the total size n and the number of distinct symbols k. The new method consistently outperforms the best known loop‑less multiset generators (e.g., the Sawada‑Williams algorithm and the Ehrlich‑Mallows scheme). On average it achieves a speedup factor between 1.8× and 3.2×, and its memory footprint is roughly 30% of the competitors' footprint. The advantage is especially pronounced when k is large and the multiplicities are low, because the outer‑level block swaps become cheap and the inner‑level permutations dominate the work.
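When comparing timings across configurations, it helps to know how many outputs each one produces. A multiset with multiplicities m_1, …, m_k and n = m_1 + … + m_k has n! / (m_1! · … · m_k!) distinct permutations. The helper below, whose name `multiset_permutation_count` is an assumption, computes that count so that a total running time can be normalized to time per permutation.

```python
from math import factorial


def multiset_permutation_count(multiplicities):
    """Number of distinct arrangements: n! / (m_1! * ... * m_k!)."""
    total = factorial(sum(multiplicities))
    for m in multiplicities:
        total //= factorial(m)   # exact: every partial quotient is an integer
    return total
```

With all multiplicities equal to 1 the count is n!, the ordinary permutation case. With a single symbol the count is 1.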

A particularly interesting application presented in the paper is the generation of parking functions. A parking function of length n is a sequence of n integers between 1 and n whose nondecreasing rearrangement b_1 ≤ … ≤ b_n satisfies b_i ≤ i for every i, so each parking function is a permutation of a multiset over {1, …, n} that meets this constraint. By mapping the parking‑function generation problem onto the proposed multiset permutation framework, the authors obtain a loop‑less generator that enumerates all parking functions in constant time per function. Benchmarks for n = 12 show a total enumeration time of 0.42 seconds, compared with 1.3 seconds for a state‑of‑the‑art backtracking implementation, more than a threefold improvement.
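For checking a parking‑function generator on small n, a brute‑force enumerator is enough. The sketch below uses the standard definition (a sequence is a parking function when the i‑th smallest entry is at most i) and filters all n^n candidate sequences. It is exponential, is not the paper's loop‑less method, and the function names are assumptions. The known count (n + 1)^(n − 1) gives a quick sanity check.

```python
from itertools import product


def is_parking_function(seq):
    """True when the i-th smallest entry of seq is at most i (1-indexed)."""
    return all(b <= i for i, b in enumerate(sorted(seq), start=1))


def parking_functions(n):
    """Yield every parking function of length n by filtering all sequences."""
    for seq in product(range(1, n + 1), repeat=n):
        if is_parking_function(seq):
            yield seq
```

For n = 3 this yields 16 sequences, matching (3 + 1)^(3 − 1). Comparing this set with the output of a faster generator catches both missing and duplicated parking functions.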

The paper concludes by summarizing the algorithm’s strengths: simplicity of implementation, guaranteed O(1) per‑output time, low memory consumption, and flexibility to adapt to related combinatorial objects. The authors suggest several avenues for future work, including extending the multi‑level approach to other combinatorial families such as Latin squares, applying parallel or GPU‑based techniques to scale the generator to very large n, and investigating alternative traversal orders that might yield useful properties for specific applications (e.g., Gray‑code‑like adjacency). Overall, the work represents a significant step forward in the theory and practice of constant‑time combinatorial generation for multisets.