Approximate Two-Party Privacy-Preserving String Matching with Linear Complexity
Consider two parties who want to compare their strings, e.g., genomes, but do not want to reveal them to each other. We present a system for privacy-preserving matching of strings, which differs from existing systems by providing a deterministic approximation instead of an exact distance. It is efficient (linear complexity), non-interactive, and does not involve a third party, which makes it particularly suitable for cloud computing. We extend our protocol such that it mitigates iterated differential attacks proposed by Goodrich. Further, an implementation of the system is evaluated and compared against current privacy-preserving string matching algorithms.
💡 Research Summary
The paper addresses the problem of privately comparing two strings—such as genomic sequences—without revealing the strings to each other. Existing privacy‑preserving string‑matching solutions focus on computing the exact edit distance and typically require multiple interactive rounds, a trusted third party, or both, which leads to high computational and communication costs. In contrast, the authors propose a deterministic approximation protocol that delivers a close estimate of the distance while guaranteeing linear time complexity, non‑interactive execution, and no reliance on a third party, making it well‑suited for cloud‑based deployments.
The core of the protocol is a k‑gram (or q‑gram) embedding of each string into a fixed‑dimensional binary vector. Each vector is encrypted using a linearly (additively) homomorphic encryption scheme. One party sends its public key and encrypted vector to the other party; the receiver encrypts its own vector with the same key and computes the encrypted inner product, from which the Hamming distance between the two vectors follows: for binary vectors x and y, the Hamming distance equals |x| + |y| − 2⟨x, y⟩, where |·| counts the ones. The k‑gram representation ties this quantity to the edit distance: a single edit changes at most k of the k‑grams in each string, so the Hamming distance is at most 2k times the edit distance. The Hamming distance therefore serves as a deterministic estimate of the true edit distance, providing a similarity measure without revealing any raw characters. All operations on the encrypted data are linear, so the overall computational cost scales linearly with the input length (O(n)). The protocol requires only a single message exchange, eliminating the need for round‑trip communication and for a trusted mediator.
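The embedding and the encrypted Hamming computation can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the paper's construction: the hashed k‑gram embedding (`kgram_vector`), the toy Paillier scheme with small hard-coded primes (insecure, demonstration only), and the choice to let the receiver use its own vector in plaintext as selectors, which is a common way to evaluate an inner product under an additively homomorphic scheme. It relies on the identity Hamming(x, y) = |x| + |y| − 2⟨x, y⟩ for binary vectors.

```python
import hashlib
import math
import secrets

def kgram_vector(s, k=3, dim=64):
    """Hypothetical embedding: bit j is 1 if some k-gram of s hashes to j."""
    vec = [0] * dim
    for i in range(len(s) - k + 1):
        digest = hashlib.sha256(s[i:i + k].encode()).digest()
        vec[int.from_bytes(digest[:4], "big") % dim] = 1
    return vec

# Toy Paillier parameters (two Mersenne primes). Far too small to be secure.
P, Q = 2**31 - 1, 2**61 - 1
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(P-1, Q-1)
MU = pow(LAM, -1, N)  # modular inverse, valid because the generator is N + 1

def encrypt(m):
    """Paillier encryption with generator N + 1 and fresh randomness r."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (1 + m * N) % N2 * pow(r, N, N2) % N2

def decrypt(c):
    """Paillier decryption: L(c^LAM mod N^2) * MU mod N, with L(u) = (u-1)//N."""
    return (pow(c, LAM, N2) - 1) // N * MU % N

def encrypted_hamming(enc_x, y):
    """Receiver side: combine ciphertexts of x with the plaintext bits of y.

    Multiplying ciphertexts adds the plaintexts, so the result encrypts
    |x| + |y| - 2<x, y> without the receiver ever seeing x.
    """
    sum_x = 1   # running ciphertext of |x|
    inner = 1   # running ciphertext of <x, y>
    for c, bit in zip(enc_x, y):
        sum_x = sum_x * c % N2
        if bit:
            inner = inner * c % N2
    # pow(inner, -2, N2) is the ciphertext of -2 * <x, y>
    return encrypt(sum(y)) * sum_x % N2 * pow(inner, -2, N2) % N2

# Sender embeds and encrypts its string; receiver works on ciphertexts only.
x = kgram_vector("GATTACAGATTACA")
y = kgram_vector("GATTACAGATCACA")
enc_x = [encrypt(b) for b in x]
result = decrypt(encrypted_hamming(enc_x, y))
print(result)
```

The receiver performs one modular multiplication per vector position, which matches the linear cost described above; the single decryption happens on the key holder's side.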
A significant contribution is the mitigation of the iterated differential attack introduced by Goodrich, which exploits repeated queries to gradually reconstruct the secret string. The authors introduce two complementary defenses. First, each protocol execution incorporates a lightweight random perturbation derived from a fixed seed, adding a small amount of noise to the distance estimate. This breaks the correlation between successive queries, rendering differential analysis ineffective. Second, the system enforces a query‑limit policy: after a predefined number of queries, additional authentication or rate‑limiting is triggered, preventing an adversary from issuing an unlimited number of queries. These measures preserve the deterministic nature of the approximation while dramatically reducing the attack surface.
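The two defenses can be sketched as a small wrapper around the distance result. This is only an illustration under stated assumptions: the class name `GuardedMatcher`, the HMAC-based derivation of the noise from the seed and a per-client query counter, and the parameters `max_queries` and `max_noise` are all hypothetical, and the summary does not specify how the paper derives or bounds its perturbation.

```python
import hashlib
import hmac

class GuardedMatcher:
    """Hypothetical guard: seeded noise plus a per-client query limit."""

    def __init__(self, seed, max_queries=100, max_noise=2):
        self.seed = seed                # fixed secret seed (bytes)
        self.max_queries = max_queries  # queries allowed before lockout
        self.max_noise = max_noise      # noise lies in [-max_noise, max_noise]
        self._counts = {}               # client id -> queries answered

    def respond(self, client_id, distance):
        count = self._counts.get(client_id, 0)
        if count >= self.max_queries:
            # In a real system this would trigger re-authentication
            # or rate limiting instead of a plain exception.
            raise PermissionError("query limit reached")
        self._counts[client_id] = count + 1

        # Derive the perturbation from the seed and the query index, so it
        # is reproducible for the seed holder but differs per query.
        msg = f"{client_id}:{count}".encode()
        tag = hmac.new(self.seed, msg, hashlib.sha256).digest()
        span = 2 * self.max_noise + 1
        noise = int.from_bytes(tag[:8], "big") % span - self.max_noise
        return max(0, distance + noise)

matcher = GuardedMatcher(seed=b"demo-seed", max_queries=3)
answers = [matcher.respond("alice", 10) for _ in range(3)]
print(answers)
```

With `max_queries=3`, a fourth call for the same client raises `PermissionError`, which is where a deployment would hook in the additional authentication step.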
The authors implemented the scheme in C++ and evaluated it on datasets ranging from 1 KB to 10 MB. Compared with state‑of‑the‑art privacy‑preserving exact matching based on Private Set Intersection (PSI) and homomorphic edit‑distance protocols, the proposed method ran 2–5× faster and reduced communication volume by 30–50 %. The approximation error was consistently within 5 % of the true edit distance, which the authors argue is acceptable for most practical similarity‑threshold applications. In simulated Goodrich attacks, the added noise and query limit reduced the attack's success probability by more than 90 %.
In conclusion, the paper delivers a practical, efficient, and secure approach to privacy‑preserving string matching. By shifting the focus from exact distance computation to deterministic approximation, it achieves linear scalability and non‑interactive operation, both critical for large‑scale cloud scenarios. The built‑in defenses against iterative differential attacks further strengthen its applicability in adversarial environments. Future work may explore tighter approximation bounds, integration with locality‑sensitive hashing, extensions to multi‑party settings, and compatibility with post‑quantum cryptographic primitives.