Exploiting Hidden Structure in Selecting Dimensions that Distinguish Vectors

Notice: This research summary and analysis were automatically generated using AI technology. For full accuracy, please refer to the original arXiv paper.

The NP-hard Distinct Vectors problem asks to delete as many columns as possible from a matrix such that all rows in the resulting matrix are still pairwise distinct. Our main result is that, for binary matrices, there is a complexity dichotomy for Distinct Vectors based on the maximum (H) and the minimum (h) pairwise Hamming distance between matrix rows: Distinct Vectors can be solved in polynomial time if H ≤ 2⌈h/2⌉ + 1, and is NP-complete otherwise. Moreover, we explore connections of Distinct Vectors to hitting sets, thereby providing several fixed-parameter tractability and intractability results also for general matrices.


💡 Research Summary

The paper investigates the Distinct Vectors problem, a combinatorial feature‑selection task where one must delete as many columns as possible from a data matrix while keeping all rows pairwise distinct. Formally, given an n × d matrix S over a finite alphabet Σ and an integer k, the question is whether there exists a set K of at most k columns such that the submatrix S|K still has n distinct rows. The problem is known to be NP‑hard, but the authors provide a fine‑grained analysis based on two structural parameters: the minimum (h) and maximum (H) pairwise Hamming distances among the rows.
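
The decision question above can be made concrete with a small sketch. The helper names `is_distinguishing` and `smallest_distinguishing_set` are illustrative and do not come from the paper, and the exhaustive search takes time exponential in d, so it only serves to pin down the definition on tiny inputs.

```python
from itertools import combinations

def is_distinguishing(rows, cols):
    """Return True if the rows stay pairwise distinct when restricted to cols."""
    projected = [tuple(row[c] for c in cols) for row in rows]
    return len(set(projected)) == len(projected)

def smallest_distinguishing_set(rows):
    """Exhaustively search for a minimum-size column set keeping all rows distinct.

    Assumes a non-empty list of equal-length rows. Returns None if the
    input rows are not pairwise distinct to begin with.
    """
    d = len(rows[0])
    for size in range(d + 1):
        for cols in combinations(range(d), size):
            if is_distinguishing(rows, cols):
                return cols
    return None

# Hypothetical 4 x 4 binary instance; each string is one row of S.
S = ["0011", "0101", "1001", "0000"]
K = smallest_distinguishing_set(S)
print(K, is_distinguishing(S, K))
```

Under this reading, an instance (S, k) is a yes-instance exactly when the returned set has at most k columns.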

Binary matrices (|Σ| = 2).
The central contribution is a complete complexity dichotomy for binary inputs. If the maximum distance satisfies H ≤ 2⌈h/2⌉ + 1, the instance can be solved in polynomial time; otherwise it is NP‑complete. The polynomial‑time algorithm exploits extremal set‑theoretic results (Sperner’s theorem, Erdős–Ko–Rado) to show that when H is close to h the rows form a highly regular family of subsets. By iteratively selecting columns that appear in the smallest symmetric differences, the algorithm constructs a distinguishing column set in O(n·d) time. Conversely, when H exceeds the bound, the authors give a reduction from Distance‑3 Independent Set. They encode a graph’s incidence matrix (each row has exactly two 1’s) and add a zero row, forcing h = 2 and H = 4. The reduction preserves the parameter k′ = n − k (the number of kept columns), establishing NP‑completeness and W
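
As a rough illustration of the dichotomy condition, the following sketch computes h and H for a binary matrix and tests whether H ≤ 2⌈h/2⌉ + 1. The function names are assumptions made for this example. The sample matrix is a hypothetical instance in the style described above (one row per edge of a path on four vertices, plus a zero row); it only reproduces the stated distances and is not claimed to be the paper's exact construction.

```python
from itertools import combinations

def hamming(u, v):
    """Number of positions in which two equal-length rows differ."""
    return sum(a != b for a, b in zip(u, v))

def distance_range(rows):
    """Return (h, H): the minimum and maximum pairwise Hamming distance.

    Assumes at least two rows.
    """
    dists = [hamming(u, v) for u, v in combinations(rows, 2)]
    return min(dists), max(dists)

def in_polynomial_case(rows):
    """Check the dichotomy condition H <= 2 * ceil(h / 2) + 1."""
    h, H = distance_range(rows)
    return H <= 2 * ((h + 1) // 2) + 1  # (h + 1) // 2 equals ceil(h / 2)

# Edges {0,1}, {1,2}, {2,3} of a path as rows with two 1's each, plus a zero row.
rows = ["1100", "0110", "0011", "0000"]
h, H = distance_range(rows)
print(h, H, in_polynomial_case(rows))
```

For this matrix the bound evaluates to 2⌈2/2⌉ + 1 = 3, so H = 4 falls on the hard side of the dichotomy, whereas a matrix such as `["100", "010", "001"]` (all pairwise distances equal to 2) satisfies the condition.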

