Compact Binary Relation Representations with Rich Functionality

Binary relations are an important abstraction arising in many data representation problems. The data structures proposed so far to represent them support just a few basic operations required to fit one particular application. We identify many of those operations arising in applications and generalize them into a wide set of desirable queries for a binary relation representation. We also identify reductions among those operations. We then introduce several novel binary relation representations, some simple and some quite sophisticated, that not only are space-efficient but also efficiently support a large subset of the desired queries.

💡 Research Summary

The paper “Compact Binary Relation Representations with Rich Functionality” addresses the long‑standing challenge of storing large binary relations (that is, sets of ordered pairs (a, b) with a and b drawn from two universes) in a space‑efficient manner while still supporting the broad spectrum of queries that arise in real‑world applications. The authors begin by surveying existing compact structures, such as k²‑trees, k²‑tries, compressed sparse rows, and wavelet‑tree‑based encodings, and point out that each is tailored to a narrow set of operations (typically existence testing, rank, or select). In many practical scenarios, however, users need richer primitives: predecessor/successor queries, range counting, range reporting, inverse mapping, and various forms of aggregation.
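To make the query vocabulary above concrete, here is a minimal, uncompressed reference model in Python. It is only a sketch under stated assumptions: the class name `NaiveRelation`, the method names, and the use of integer labels are illustrative choices, not the paper's API, and the structure makes no attempt at space efficiency.

```python
from bisect import bisect_left, bisect_right


class NaiveRelation:
    """Uncompressed reference model (hypothetical): two sorted pair lists."""

    def __init__(self, pairs):
        # Pairs sorted by (row, col), plus a (col, row) copy for inverse queries.
        self.by_row = sorted(set(pairs))
        self.by_col = sorted((b, a) for a, b in self.by_row)

    def related(self, a, b):
        """Existence test: is (a, b) in the relation?"""
        i = bisect_left(self.by_row, (a, b))
        return i < len(self.by_row) and self.by_row[i] == (a, b)

    def successor(self, a, b):
        """Smallest column b' >= b with (a, b') in the relation, or None."""
        i = bisect_left(self.by_row, (a, b))
        if i < len(self.by_row) and self.by_row[i][0] == a:
            return self.by_row[i][1]
        return None

    def range_report(self, a_lo, a_hi, b_lo, b_hi):
        """All pairs with a_lo <= a <= a_hi and b_lo <= b <= b_hi."""
        lo = bisect_left(self.by_row, (a_lo, b_lo))
        hi = bisect_right(self.by_row, (a_hi, b_hi))
        return [(a, b) for a, b in self.by_row[lo:hi] if b_lo <= b <= b_hi]

    def range_count(self, a_lo, a_hi, b_lo, b_hi):
        """Number of pairs inside the query rectangle."""
        return len(self.range_report(a_lo, a_hi, b_lo, b_hi))

    def inverse(self, b):
        """Inverse mapping: all rows related to column b (integer labels assumed)."""
        lo = bisect_left(self.by_col, (b,))
        hi = bisect_left(self.by_col, (b + 1,))
        return [a for _, a in self.by_col[lo:hi]]
```

A compact representation has to answer the same questions without materializing both sorted lists, which is where the trade-offs discussed in the summary come from.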

To bridge this gap, the authors first define a taxonomy of twelve fundamental operations that capture the functional requirements of binary‑relation workloads. They then systematically study reductions among these operations, showing that a small core set (essentially rank, select, and a transpose operation) can be combined to emulate the rest. This reduction framework not only clarifies the theoretical landscape but also guides the design of data structures that need to implement only a few primitives while still offering full functionality.
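The idea of reducing richer queries to a small core can be sketched as follows. The relation is flattened in row-major order into a sequence of column labels, and three derived queries are written purely in terms of `rank` and `select` on that sequence. This is an illustration of the general principle only: the class `RankSelectRelation`, its method names, and the particular derived queries are assumptions for this example and are not taken from the paper, and the primitives are implemented with naive linear scans.

```python
from bisect import bisect_right


class RankSelectRelation:
    """Hypothetical sketch: derived queries built only on rank/select."""

    def __init__(self, pairs, num_rows):
        pairs = sorted(set(pairs))
        # Column labels of all pairs, listed row by row.
        self.seq = [b for _, b in pairs]
        # start[a] = offset in seq where row a begins (prefix sums of row sizes).
        self.start = [0] * (num_rows + 1)
        for a, _ in pairs:
            self.start[a + 1] += 1
        for a in range(num_rows):
            self.start[a + 1] += self.start[a]

    # Core primitives (naive scans; a compact structure would speed these up).
    def rank(self, b, i):
        """Occurrences of label b among the first i positions of seq."""
        return self.seq[:i].count(b)

    def select(self, b, j):
        """Position of the j-th (1-based) occurrence of b, or None."""
        seen = 0
        for pos, label in enumerate(self.seq):
            if label == b:
                seen += 1
                if seen == j:
                    return pos
        return None

    # Derived queries: no direct access to the pairs, only rank/select.
    def related(self, a, b):
        """Existence test via two rank calls at the row boundaries."""
        return self.rank(b, self.start[a + 1]) > self.rank(b, self.start[a])

    def count_rows(self, a_lo, a_hi, b):
        """Number of rows in [a_lo, a_hi] related to column b."""
        return self.rank(b, self.start[a_hi + 1]) - self.rank(b, self.start[a_lo])

    def next_row(self, a, b):
        """Smallest row a' >= a with (a', b) in the relation, or None."""
        pos = self.select(b, self.rank(b, self.start[a]) + 1)
        if pos is None:
            return None
        # Map the sequence position back to the row that contains it.
        return bisect_right(self.start, pos) - 1
```

Once `rank` and `select` are fast, every derived query inherits that speed, which is why a design can concentrate on a few primitives.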

The core technical contribution consists of two families of novel representations:

Binary Relation Wavelet Tree (BRWT).
The relation is viewed as an n × m binary matrix R. Rows and columns are each encoded with compressed bit‑vectors, and a wavelet‑tree overlay is built on top of the column‑wise bit‑vectors. This construction yields O(log σ) time for rank, select, predecessor, successor, and range‑count queries, where σ = max{n, m}. The space consumption is bounded by the zero‑order entropy H₀(R) plus lower‑order terms (o(n) bits). By employing RRR‑compressed bit‑vectors and Huffman‑shaped wavelet trees, the authors achieve practical memory footprints close to the information‑theoretic limit.
BRWT+.
Extending BRWT, this variant adds level‑wise sampling and auxiliary indexes that enable output‑sensitive operations such as range‑reporting and range‑enumeration. The query time becomes O(k log σ), where k is the number of reported pairs, while the additional space overhead remains modest (a small constant factor over BRWT). Moreover, the transpose operation is built‑in, allowing both row‑centric and column‑centric queries without duplicating the structure.
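For readers unfamiliar with wavelet trees, the following sketch shows how a plain, balanced wavelet tree answers rank and range-count queries by descending one level per step, and how range reporting visits only branches that contain results. It is not the BRWT or BRWT+ described above: the class `WaveletTree` and its methods are hypothetical, uncompressed prefix-count lists stand in for RRR-compressed bit-vectors, and no Huffman shaping or sampling is applied.

```python
class WaveletTree:
    """Balanced wavelet tree over integer symbols in [lo, hi] (illustrative)."""

    def __init__(self, seq, lo, hi):
        self.lo, self.hi = lo, hi
        self.size = len(seq)
        self.left = self.right = None
        if lo == hi or not seq:
            return  # leaf for a single symbol, or an empty subtree
        mid = (lo + hi) // 2
        # zeros[i] = how many of the first i symbols go to the left child.
        self.zeros = [0]
        for x in seq:
            self.zeros.append(self.zeros[-1] + (x <= mid))
        self.left = WaveletTree([x for x in seq if x <= mid], lo, mid)
        self.right = WaveletTree([x for x in seq if x > mid], mid + 1, hi)

    def rank(self, c, i):
        """Occurrences of symbol c among the first i positions."""
        i = min(i, self.size)
        if i <= 0 or not (self.lo <= c <= self.hi):
            return 0
        if self.lo == self.hi:
            return i
        z = self.zeros[i]
        if c <= (self.lo + self.hi) // 2:
            return self.left.rank(c, z)
        return self.right.rank(c, i - z)

    def range_count(self, i, j, a, b):
        """Number of positions p in [i, j) whose symbol lies in [a, b]."""
        i, j = max(i, 0), min(j, self.size)
        if i >= j or b < self.lo or a > self.hi:
            return 0
        if a <= self.lo and self.hi <= b:
            return j - i
        zi, zj = self.zeros[i], self.zeros[j]
        return (self.left.range_count(zi, zj, a, b)
                + self.right.range_count(i - zi, j - zj, a, b))

    def range_report(self, i, j, a, b):
        """Distinct symbols in [a, b] at positions [i, j), with their counts."""
        i, j = max(i, 0), min(j, self.size)
        if i >= j or b < self.lo or a > self.hi:
            return []
        if self.lo == self.hi:
            return [(self.lo, j - i)]
        zi, zj = self.zeros[i], self.zeros[j]
        return (self.left.range_report(zi, zj, a, b)
                + self.right.range_report(i - zi, j - zj, a, b))
```

As a usage example, `WaveletTree([1, 3, 1, 0, 3], 0, 3)` indexes a sequence of five column labels over the alphabet {0, 1, 2, 3}; position ranges in that sequence correspond to row ranges when the labels are listed row by row.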

The paper provides rigorous worst‑case and average‑case time analyses for each operation, and it proves that the proposed structures dominate existing solutions in the same space regime. The reported measurements point the same way: under identical memory budgets, BRWT answers rank/select queries roughly 2–3× faster than k²‑trees, and BRWT+ outperforms k²‑trees on range‑reporting by a factor of about 1.8.

Experimental evaluation is conducted on five large datasets, including web graphs, social networks, and biological interaction networks, ranging from a few million to tens of millions of edges. The authors measure construction time, memory usage, and query latency for a representative set of operations. Results confirm the theoretical predictions: BRWT uses 30% less memory than k²‑trees while delivering 2.5× speed‑ups on rank/select; BRWT+ adds only ~10% extra space but enables fast range‑reporting whose cost grows linearly with the output size. Additional experiments that swap in different compressed bit‑vector schemes (RRR, Poppy, etc.) indicate that the design is robust across compression primitives.

In conclusion, the paper delivers a comprehensive framework that unifies a rich query set for binary relations with near‑optimal compression. By identifying a minimal core of operations and constructing wavelet‑tree‑based structures that exploit entropy coding, the authors achieve both theoretical elegance and practical performance. The work opens several avenues for future research, including dynamic updates, external‑memory extensions, and parallel implementations, making it a valuable reference for anyone building compact indexes for graphs, databases, or information‑retrieval systems.