Semi-canonical binary matrices

In this paper, we define the concepts of semi-canonical and canonical binary matrices and prove, in a strictly mathematical way, that these definitions are correct. We describe and implement an algorithm for finding all semi-canonical binary matrices, taking into account the number of 1s in each of them. This problem is related to the combinatorial problem of finding all pairs of disjoint S-permutation matrices. The described algorithm makes substantial use of bitwise operations.


💡 Research Summary

The paper introduces a novel classification for binary matrices called “semi‑canonical” and rigorously establishes its mathematical foundations. Traditional canonical binary matrices are obtained by fixing both row and column permutations, which often leads to an unnecessarily large search space when one is only interested in structural equivalence. The semi‑canonical notion relaxes this requirement: a binary matrix is semi‑canonical if its rows are sorted lexicographically and, simultaneously, its columns are also sorted lexicographically. This double‑sorted condition eliminates duplicate representatives while still covering all equivalence classes under the action of the symmetric group Sₙ×Sₙ.
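The double-sorted condition can be checked directly. The following is a minimal sketch, assuming that "sorted" means non-decreasing lexicographic order, with rows read left to right and columns read top to bottom; the function names are hypothetical and the paper may use a different ordering convention.

```python
from typing import List, Sequence


def is_sorted(items: Sequence[tuple]) -> bool:
    """True if consecutive items are in non-decreasing lexicographic order."""
    return all(a <= b for a, b in zip(items, items[1:]))


def is_semi_canonical(matrix: List[List[int]]) -> bool:
    """Check the double-sorted condition on a 0-1 matrix.

    Assumption: rows are compared left to right and columns top to bottom,
    both in non-decreasing order.
    """
    rows = [tuple(r) for r in matrix]
    cols = list(zip(*matrix))  # transpose: one tuple per column
    return is_sorted(rows) and is_sorted(cols)
```

For example, under this convention `[[0, 1], [1, 0]]` passes the check, while `[[1, 0], [0, 1]]` does not because its first row is lexicographically larger than its second.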

The authors first formalize binary matrices as sets of 0‑1 vectors and define the group action of row and column permutations. They then prove three central theorems: (1) any binary matrix can be transformed into a semi‑canonical form by an appropriate pair of permutations; (2) two distinct semi‑canonical matrices belong to different equivalence classes, guaranteeing uniqueness of representation; (3) for a fixed total number of ones k, the existence and count of semi‑canonical matrices can be expressed combinatorially. The proofs rely on the properties of lexicographic ordering and on constructing a canonical ordering of rows and columns that is invariant under permissible swaps. Moreover, they show that the set of canonical matrices is a strict subset of the semi‑canonical set, establishing that semi‑canonical matrices form a complete system of representatives for the equivalence relation.
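Theorem (1) can be illustrated with a simple procedure. The sketch below is not taken from the paper: it alternately sorts rows and columns until nothing changes, assuming non-decreasing lexicographic order and a non-empty matrix. Each sort can only lower the matrix read in row-major order, so the loop stops, and it stops at a double-sorted matrix. The name `to_semi_canonical` is hypothetical.

```python
from typing import List


def to_semi_canonical(matrix: List[List[int]]) -> List[List[int]]:
    """Permute rows and columns until both are lexicographically sorted.

    Assumption: a non-empty 0-1 matrix; order is non-decreasing, rows read
    left to right and columns read top to bottom.
    """
    current = [tuple(row) for row in matrix]
    while True:
        # Step 1: sort the rows (a row permutation).
        by_rows = sorted(current)
        # Step 2: sort the columns (a column permutation), then transpose back.
        by_cols = [tuple(row) for row in zip(*sorted(zip(*by_rows)))]
        if by_cols == current:
            # Neither sort changed anything, so rows and columns are sorted.
            return [list(row) for row in current]
        current = by_cols


example = [[1, 0, 1],
           [0, 1, 0],
           [1, 1, 0]]
result = to_semi_canonical(example)
```

Because only row and column permutations are applied, the result lies in the same equivalence class as the input and has the same number of 1s.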

The core contribution is an algorithm that enumerates all semi‑canonical binary matrices for given dimensions n and a prescribed number of ones k, without redundancy. Each row is represented as an n‑bit integer; rows are generated in lexicographic order, and a running column‑sum vector is maintained using bitwise operations. When a new row is added, the algorithm checks in O(1) time whether any column sum would exceed k, allowing immediate pruning of infeasible branches. The back‑tracking procedure thus explores only viable partial matrices, dramatically reducing the combinatorial explosion. Implementation details include the use of 64‑bit words to handle up to 64 columns, efficient bit‑mask manipulations for column updates, and a depth‑first search that records complete matrices when the total number of ones reaches k. Experimental evaluation on matrices up to n=8 demonstrates an average speed‑up factor of ten over naïve exhaustive search, with memory consumption reduced by an order of magnitude. The algorithm also scales to larger n when combined with external bit‑set libraries, and the authors discuss potential parallelization strategies.
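The general shape of such a search can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: rows are n-bit integers chosen in non-decreasing order, a branch is pruned as soon as it would use more than k ones, and column order is verified with shifts and masks once a matrix is complete. The names `semi_canonical_matrices` and `column_codes` are hypothetical.

```python
from typing import Iterator, List, Tuple


def semi_canonical_matrices(n: int, k: int) -> Iterator[Tuple[int, ...]]:
    """Yield n x n 0-1 matrices with exactly k ones, rows and columns sorted.

    Each matrix is a tuple of n integers; bit (n - 1 - j) of a row integer
    is the entry in column j, so the leftmost column is the highest bit.
    """

    def column_codes(rows: List[int]) -> List[int]:
        # Read each column top to bottom as an n-bit integer.
        codes = []
        for j in range(n):
            shift = n - 1 - j
            code = 0
            for r in rows:
                code = (code << 1) | ((r >> shift) & 1)
            codes.append(code)
        return codes

    def extend(rows: List[int], ones_left: int) -> Iterator[Tuple[int, ...]]:
        if len(rows) == n:
            if ones_left == 0:
                cols = column_codes(rows)
                if all(a <= b for a, b in zip(cols, cols[1:])):
                    yield tuple(rows)
            return
        # Rows are generated in non-decreasing order, so start at the last row.
        start = rows[-1] if rows else 0
        for r in range(start, 1 << n):
            weight = bin(r).count("1")
            if weight <= ones_left:  # prune branches that exceed k ones
                rows.append(r)
                yield from extend(rows, ones_left - weight)
                rows.pop()

    yield from extend([], k)


found = list(semi_canonical_matrices(2, 2))
```

For n = 2 and k = 2 this yields the row-code tuples (0, 3), (1, 1) and (1, 2), that is, the matrices with rows 00/11, 01/01 and 01/10. A production version would prune on column order during the search rather than only at the leaves.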

A significant application discussed is the enumeration of disjoint S‑permutation matrices, described here as binary matrices in which each row and each column contains exactly one 1. Two such matrices A and B are disjoint if they share no common 1 positions; their sum C = A + B is then a 0‑1 matrix with exactly two 1s in every row and every column. The authors prove that C is semi‑canonical if and only if the pair (A, B) is a unique representative of its equivalence class. Consequently, the proposed enumeration algorithm can be used to generate all unordered pairs of disjoint S‑permutation matrices by simply filtering the semi‑canonical sums for the appropriate row/column weight pattern. This connection opens avenues in combinatorial design (e.g., constructing Latin squares), cryptographic key schedule generation, and experimental design where orthogonal binary structures are required.
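The disjointness condition and the weight pattern of the sum can be illustrated with a small sketch. It takes the description in this summary at face value (one 1 per row and per column) and encodes each matrix as a tuple p, where row i has its 1 in column p[i]. Any additional structure the paper requires of S‑permutation matrices is not modeled, and all function names are hypothetical.

```python
from itertools import permutations
from typing import List, Tuple

Perm = Tuple[int, ...]


def perm_matrix(p: Perm) -> List[List[int]]:
    """Build the 0-1 matrix with a 1 in row i, column p[i]."""
    n = len(p)
    return [[1 if p[i] == j else 0 for j in range(n)] for i in range(n)]


def are_disjoint(p: Perm, q: Perm) -> bool:
    """Two permutation matrices are disjoint if no 1 positions coincide."""
    return all(a != b for a, b in zip(p, q))


def matrix_sum(a: List[List[int]], b: List[List[int]]) -> List[List[int]]:
    """Entry-wise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def disjoint_pairs(n: int) -> List[Tuple[Perm, Perm]]:
    """All unordered pairs of disjoint n x n permutation matrices."""
    perms = list(permutations(range(n)))
    return [(p, q)
            for i, p in enumerate(perms)
            for q in perms[i + 1:]
            if are_disjoint(p, q)]


pairs = disjoint_pairs(3)
sums = [matrix_sum(perm_matrix(p), perm_matrix(q)) for p, q in pairs]
```

Every matrix in `sums` is a 0‑1 matrix with exactly two 1s in each row and each column, which is the weight pattern one would filter for among the enumerated matrices.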

In the concluding section, the paper outlines future work: extending the semi‑canonical framework to rectangular matrices, handling dimensions beyond 64 using multi‑word bitsets, exploiting GPU parallelism for massive enumeration tasks, and investigating the algebraic properties of the semi‑canonical class (such as closure under matrix addition modulo 2). Overall, the study delivers a theoretically sound definition, a provably correct enumeration algorithm, and concrete applications, thereby advancing both the combinatorial theory of binary matrices and practical techniques for related computational problems.