Fuzzy Private Matching (Extended Abstract)

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

In the private matching problem, a client and a server each hold a set of $n$ input elements. The client wants to privately compute the intersection of these two sets: he learns which elements he has in common with the server (and nothing more), while the server gains no information at all. In certain applications it would be useful to have a private matching protocol that reports a match even if two elements are only similar instead of equal. Such a private matching protocol is called fuzzy, and is useful, for instance, when elements may be inaccurate or corrupted by errors. We consider the fuzzy private matching problem in a semi-honest environment. Elements are similar if they match on $t$ out of $T$ attributes. First we show that the original solution proposed by Freedman et al. is incorrect. Subsequently we present two fuzzy private matching protocols. The first, simple, protocol has bit message complexity $O(n \binom{T}{t} (T \log{|D|}+k))$. The second, improved, protocol has a much better bit message complexity of $O(n T (\log{|D|}+k))$, but here the client incurs an $O(n)$ factor in time complexity. Additionally, we present protocols based on the computation of the Hamming distance and on oblivious transfer, which have different, sometimes more efficient, performance characteristics.

💡 Research Summary

The paper addresses the problem of fuzzy private matching (FPM), a generalization of the classic private set intersection (PSI) where two parties, a client and a server, each hold a set of n elements, and a match is declared not only when the elements are identical but also when they are “similar” according to a predefined attribute‑based similarity metric. Each element is described by T attributes drawn from a domain D. Two elements are considered similar if at least t out of the T attributes coincide. The goal is for the client to learn exactly which of his elements are similar to the server’s, while the server learns nothing about the client’s data. The setting is semi‑honest (both parties follow the protocol but may try to infer additional information).
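The similarity rule above can be written down as a plaintext reference function. The following is a minimal sketch of the functionality the protocols are meant to compute, not of any private protocol: it offers no privacy at all, and the names `Element`, `similar`, and `fuzzy_intersection` are hypothetical, chosen only for illustration.

```python
from typing import Sequence, Set, Tuple

# Assumption: an element is a tuple holding one value per attribute (length T).
Element = Tuple[str, ...]


def similar(x: Element, y: Element, t: int) -> bool:
    """Return True if x and y agree in at least t attribute positions."""
    if len(x) != len(y):
        raise ValueError("elements must have the same number of attributes")
    matches = sum(1 for a, b in zip(x, y) if a == b)
    return matches >= t


def fuzzy_intersection(
    client: Sequence[Element], server: Sequence[Element], t: int
) -> Set[Element]:
    """Non-private reference: client elements similar to some server element."""
    return {x for x in client if any(similar(x, y, t) for y in server)}


client_set = [("a", "b", "c"), ("x", "y", "z")]
server_set = [("a", "b", "q"), ("m", "n", "o")]
result = fuzzy_intersection(client_set, server_set, t=2)
```

With T = 3 and t = 2, only the first client element agrees with a server element on two positions, so it is the only one reported.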

Flaw in the Prior Art
The authors first revisit the protocol originally proposed by Freedman et al. (2004). That construction treats each attribute independently, secret‑sharing the equality of each attribute and then checking whether the number of equal attributes reaches the threshold t. The paper demonstrates that this approach is fundamentally incorrect because it ignores correlations among attributes. In particular, when the same attribute value appears in multiple positions or when attribute values belong to overlapping sub‑domains, the protocol can produce false positives (declaring a match when fewer than t attributes truly match) or false negatives (missing a genuine match). The authors provide concrete counter‑examples and a formal proof that the protocol of Freedman et al. does not satisfy the required security or correctness guarantees.

First Proposed Protocol – Combination‑Based Secret Sharing
To overcome the flaw, the first protocol enumerates every possible t‑subset of the T attributes, i.e., there are $\binom{T}{t}$ combinations. For each combination the server creates a linear secret share of a random mask and sends the masked share to the client. The client, who knows his own attribute values, can reconstruct the share for any combination that matches his data. If the reconstructed value equals zero (or a predefined constant), the client concludes that the corresponding server element matches on that combination; otherwise it does not. The cryptographic primitives used are:

Linear Secret Sharing Scheme (LSSS) – to split the mask for each combination.
Homomorphic Encryption (optional) – to allow the server to apply the mask without learning the client’s reconstruction.
Pseudo‑Random Function (PRF) – to generate independent masks.
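The combinatorial core of this protocol, enumerating every t‑subset of attribute positions, can be illustrated in the clear. The sketch below is an assumption-laden illustration only: it omits secret sharing, encryption, and masking entirely, and the helper names `projections` and `matches_on_some_combination` are hypothetical. It shows why two elements are similar exactly when they agree on at least one t‑subset of positions.

```python
from itertools import combinations
from typing import Iterator, Tuple

# Assumption: an element is a tuple holding one value per attribute (length T).
Element = Tuple[str, ...]
Projection = Tuple[Tuple[int, ...], Tuple[str, ...]]


def projections(x: Element, t: int) -> Iterator[Projection]:
    """Yield (positions, values) for every t-subset of attribute positions."""
    for positions in combinations(range(len(x)), t):
        yield positions, tuple(x[i] for i in positions)


def matches_on_some_combination(x: Element, y: Element, t: int) -> bool:
    """True if x and y agree on all positions of at least one t-subset."""
    # Positions are part of the key, so equal values in different
    # positions are never confused with each other.
    server_side = set(projections(y, t))
    return any(p in server_side for p in projections(x, t))


x = ("a", "b", "c", "d")
y = ("a", "b", "c", "z")
num_combinations = len(list(projections(x, 3)))  # one entry per 3-subset of 4 positions
```

Because the number of subsets grows as the binomial coefficient of T and t, this enumeration is what drives the communication cost of the first protocol.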

The communication cost per element is $\binom{T}{t}$ times $(T\log|D| + k)$ bits, where k is the security parameter (e.g., 128 bits). Hence the total bit complexity is $O(n \binom{T}{t} (T \log{|D|}+k))$.
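To get a feel for the gap between the two bit complexities quoted in the abstract, the expressions inside the O(·) bounds can be evaluated for sample parameters. This is a rough order-of-magnitude sketch only: asymptotic bounds hide constant factors, the logarithm is assumed to be base 2, and the function names and parameter values are hypothetical.

```python
from math import comb, log2


def simple_protocol_bits(n: int, T: int, t: int, domain_size: int, k: int) -> float:
    """Expression inside O(n * C(T, t) * (T log|D| + k)), constants ignored."""
    return n * comb(T, t) * (T * log2(domain_size) + k)


def improved_protocol_bits(n: int, T: int, domain_size: int, k: int) -> float:
    """Expression inside O(n * T * (log|D| + k)), constants ignored."""
    return n * T * (log2(domain_size) + k)


# Hypothetical parameters: 100 elements, 10 attributes, threshold 7,
# a domain of 256 values per attribute, and a 128-bit security parameter.
simple = simple_protocol_bits(n=100, T=10, t=7, domain_size=256, k=128)
improved = improved_protocol_bits(n=100, T=10, domain_size=256, k=128)
ratio = simple / improved
```

For these values the binomial factor is 120, so the first expression comes out roughly 18 times larger than the second, which is consistent with the abstract's description of the second protocol as much cheaper in communication.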
