Reducing Reconciliation Communication Cost with Compressed Sensing


We consider a reconciliation problem in which two hosts wish to synchronize their respective sets. Efficient solutions for minimizing the communication cost between the two hosts have been proposed in the literature, but they rely on prior knowledge of the size of the set difference between the two sets to be reconciled. In this paper, we propose a method that achieves comparable efficiency without this prior knowledge. Our method uses compressive sensing techniques that leverage the expected sparsity of set differences. We study the performance of the method via theoretical analysis and numerical simulations.


💡 Research Summary

The paper tackles the classic set reconciliation problem, where two parties each hold a large set of items and wish to make their sets identical while exchanging as little data as possible. Traditional reconciliation protocols—such as those based on Bloom filters, Invertible Bloom Lookup Tables (IBLT), or other hash‑based sketches—achieve low communication cost only when the size of the symmetric difference (denoted k) is known in advance. In many real‑world scenarios, however, the magnitude of the difference is unknown or highly variable, making prior‑knowledge‑dependent schemes inefficient or even inapplicable.

To overcome this limitation, the authors propose a novel approach that leverages compressive sensing (CS), a signal‑processing technique that exploits sparsity to recover high‑dimensional signals from a small number of linear measurements. The key observation is that the difference between the two sets can be represented as a sparse vector d over the universe U of size N: each coordinate of d is +1, −1, or 0, depending on whether an element appears only in one set, only in the other, or in both. If the number of differing elements is small relative to N, then d is s‑sparse (with s = k, since each differing element contributes exactly one non‑zero coordinate). By applying a random measurement matrix Φ ∈ ℝ^{m×N} to d, the parties can exchange a compressed measurement z = Φd instead of the full difference vector.
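To make the representation concrete, here is a minimal Python sketch of the signed difference vector; the universe size and the two sets are invented for illustration:

```python
# Toy sketch: the set difference as a sparse signed vector over
# a small universe U = {0, ..., N-1} (all values here are illustrative).
N = 10
A = {1, 3, 7}           # Party A's set
B = {3, 5, 7, 8}        # Party B's set

x = [1 if i in A else 0 for i in range(N)]   # indicator vector of A
y = [1 if i in B else 0 for i in range(N)]   # indicator vector of B
d = [xi - yi for xi, yi in zip(x, y)]        # signed difference x - y

# The non-zero coordinates are exactly the symmetric difference:
# +1 -> element only in A, -1 -> element only in B.
only_A = [i for i, v in enumerate(d) if v == +1]
only_B = [i for i, v in enumerate(d) if v == -1]
print(only_A, only_B)   # [1] [5, 8]
```

Note that d has exactly k = |A △ B| non-zero entries, which is what makes it a natural target for sparse recovery.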

The protocol works as follows. Both parties agree on a common measurement matrix Φ (e.g., a Gaussian random matrix or a structured sparse matrix that is easy to generate on low‑power devices). Party A sends z_A = Φx, where x is the indicator vector of its set. Party B computes z_B = Φy for its own indicator vector y and forms the difference z = z_A − z_B = Φ(x − y) = Φd. Party B then runs a standard CS recovery algorithm—ℓ₁‑minimization, Basis Pursuit, Orthogonal Matching Pursuit (OMP), etc.—to reconstruct d from (z, Φ). Once d is recovered, the non‑zero positions directly reveal which items are missing from each side, and a second short exchange can be used to transmit the actual missing elements.
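The measurement-exchange step relies only on the linearity of Φ. A minimal sketch, with a shared seeded ±1 matrix standing in for the Gaussian Φ and toy sets of my own choosing (the CS recovery step itself is elided):

```python
import random

random.seed(0)
N, m = 12, 6
# Shared random +/-1 measurement matrix: both parties derive it from a
# common seed, so only the m-entry sketch needs to be transmitted.
Phi = [[random.choice((-1, 1)) for _ in range(N)] for _ in range(m)]

def measure(vec):
    """Compute Phi @ vec."""
    return [sum(p * v for p, v in zip(row, vec)) for row in Phi]

A = {2, 4, 9}
B = {4, 7, 9, 11}
x = [1 if i in A else 0 for i in range(N)]
y = [1 if i in B else 0 for i in range(N)]

z_A = measure(x)                          # A transmits m numbers, not N
z_B = measure(y)
z = [a - b for a, b in zip(z_A, z_B)]     # = Phi @ (x - y) by linearity

d = [xi - yi for xi, yi in zip(x, y)]
# z equals measure(d), so Party B can feed (z, Phi) to any standard
# sparse-recovery routine (l1-minimization, OMP, ...) to reconstruct d.
```

The point of the identity z_A − z_B = Φ(x − y) = Φd is that neither party ever materializes the other's full indicator vector; only the m-dimensional sketches cross the wire.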

A major contribution of the work is the handling of the unknown k scenario. Rather than fixing the number of measurements m based on a presumed sparsity level, the authors introduce an adaptive scheme: start with a modest number of measurements (e.g., m₀ ≈ C·log N) and attempt recovery. If the ℓ₁ solution fails to satisfy the expected sparsity or the residual is too large, additional measurement rows are appended and the recovery is retried. This “progressive measurement” strategy typically requires only one or two extra rounds, keeping the average communication overhead close to the theoretical minimum O(s log (N/s)).
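The progressive-measurement loop can be sketched as follows. This is a toy illustration under stated assumptions: a brute-force ℓ0 search (feasible only for tiny N) stands in for a real ℓ1/OMP solver, the success test is "the consistent sparse solution is unique" rather than the paper's residual/sparsity check, and all sizes and seeds are invented:

```python
import itertools
import random

random.seed(1)
N = 12
d = [0] * N
d[3], d[8] = 1, -1          # a 2-sparse signed difference (illustrative)

def fresh_row():
    """One additional +/-1 measurement row."""
    return [random.choice((-1, 1)) for _ in range(N)]

def matches(z, Phi, max_s=3):
    """All signed vectors v with sparsity <= max_s satisfying Phi v = z
    (toy exhaustive l0 search; a real system would use l1/OMP)."""
    found = []
    for s in range(max_s + 1):
        for supp in itertools.combinations(range(N), s):
            for signs in itertools.product((1, -1), repeat=s):
                v = [0] * N
                for i, sg in zip(supp, signs):
                    v[i] = sg
                if all(sum(p * c for p, c in zip(row, v)) == zi
                       for row, zi in zip(Phi, z)):
                    found.append(v)
    return found

# Adaptive loop: start with a modest budget, then append measurement
# rows until recovery is unambiguous.
Phi = [fresh_row() for _ in range(4)]
while len(Phi) < 40:
    z = [sum(p * v for p, v in zip(row, d)) for row in Phi]
    cands = matches(z, Phi)
    if len(cands) == 1:      # unique consistent solution -> accept it
        break
    Phi.append(fresh_row())  # "request more rows" round
```

Because d itself always satisfies Φd = z, a unique consistent candidate is necessarily d; each appended row eliminates, in expectation, a constant fraction of the spurious candidates, which mirrors the paper's claim that only a few extra rounds are needed.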

Theoretical analysis is provided to justify the approach. Using the Restricted Isometry Property (RIP) framework, the authors show that if Φ satisfies the (2s, δ)‑RIP with δ < √2 − 1, then ℓ₁‑minimization recovers d exactly. They derive probabilistic bounds on the required m as a function of s and N, confirming that the measurement budget grows only logarithmically with the universe size and linearly with the sparsity. Moreover, they quantify the probability of successful recovery under the adaptive scheme, demonstrating that the expected number of additional measurement rounds remains bounded by a small constant.
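In standard CS notation, the decoding step and the guarantee paraphrased above read as follows (this restates the textbook results the paper builds on; the constants are the usual ones, not taken from this paper):

```latex
% RIP of order 2s: for every 2s-sparse vector v,
%   (1 - \delta_{2s}) \|v\|_2^2 \le \|\Phi v\|_2^2 \le (1 + \delta_{2s}) \|v\|_2^2.
\[
  \hat{d} \;=\; \arg\min_{v \in \mathbb{R}^N} \|v\|_1
  \quad \text{subject to} \quad \Phi v = z ,
\]
\[
  \delta_{2s} < \sqrt{2} - 1 \;\Longrightarrow\; \hat{d} = d ,
  \qquad
  m = O\!\bigl(s \log (N/s)\bigr) \ \text{rows suffice w.h.p. for Gaussian } \Phi .
\]
```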

Extensive simulations validate the theory. Experiments cover synthetic universes with N = 10⁴, 10⁵, 10⁶ and varying difference sizes k from 1 to 10³, as well as real‑world log data sets. The proposed CS‑based method is compared against Bloom‑filter sketches, IBLT, and recent Set‑Sketch techniques. Results show a consistent 30 %–50 % reduction in total transmitted bits when k is small (high sparsity), while the computational cost of recovery stays within practical limits (on the order of O(N log N) or O(N s) depending on the algorithm). The adaptive measurement protocol averages 1.2–1.5 communication rounds, confirming that the overhead of “guess‑and‑check” is negligible.

The paper also discusses practical considerations. It examines the impact of different measurement matrix constructions, including structured matrices (Toeplitz, Count‑Sketch) that are more amenable to hardware implementation on low‑power IoT devices. Security implications are briefly addressed: because the transmitted measurements are linear combinations of the original indicator vectors, they may leak information about the underlying sets; the authors suggest integrating lightweight encryption or secret‑sharing of the measurement matrix to mitigate privacy concerns.
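As one illustration of a hardware-friendly construction, a Count-Sketch-style matrix places a single ±1 entry in each column, so measuring one set element costs a single O(1) update. A hedged sketch (seeded RNG stands in for the pairwise-independent hash functions a real Count-Sketch would use; sizes are illustrative):

```python
import random

random.seed(2)
N, m = 16, 8
# Count-Sketch-style implicit matrix: column i has one nonzero entry,
# sign_of[i], located in row row_of[i]. The matrix is never stored
# densely, which suits low-power IoT devices.
row_of = [random.randrange(m) for _ in range(N)]
sign_of = [random.choice((-1, 1)) for _ in range(N)]

def sketch(vec):
    """Compute Phi @ vec for the implicit sparse Phi: one update per
    non-zero coordinate of vec."""
    z = [0] * m
    for i, v in enumerate(vec):
        if v:
            z[row_of[i]] += sign_of[i] * v
    return z

A = {0, 5, 11}
B = {5, 9, 11, 14}
x = [1 if i in A else 0 for i in range(N)]
y = [1 if i in B else 0 for i in range(N)]
d = [xi - yi for xi, yi in zip(x, y)]
# Linearity still holds: sketch(x) - sketch(y) == sketch(d).
```

The same subtraction-then-recover protocol applies unchanged; only the cost of computing and updating the sketch drops.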

In conclusion, the authors present a compelling alternative to traditional set reconciliation protocols that eliminates the need for prior knowledge of the difference size. By exploiting the inherent sparsity of set differences through compressive sensing, they achieve near‑optimal communication efficiency, robust recovery, and adaptability to dynamic environments. The approach is especially attractive for bandwidth‑constrained scenarios such as sensor networks, mobile edge computing, and distributed databases where frequent synchronization is required but communication resources are scarce. Future work may explore extensions to multi‑party reconciliation, weighted differences, and tighter integration with cryptographic primitives to provide both efficiency and strong privacy guarantees.