Bifrost: A Much Simpler Secure Two-Party Data Join Protocol for Secure Data Analytics

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

Secure data join enables two parties with vertically distributed data to securely compute the joined table, allowing the parties to perform downstream secure multi-party computation-based data analytics (SDA), such as training machine learning models, based on the joined table. While Circuit-based Private Set Intersection (CPSI) can be used for secure data join, it introduces redundant dummy rows in the joined table, which results in high overhead in the downstream SDA tasks. iPrivJoin addresses this issue but introduces significant communication overhead in the redundancy removal process, as it relies on the cryptographic primitive OPPRF for data encoding and multiple rounds of oblivious shuffles. In this paper, we propose a much simpler secure data join protocol, Bifrost, which outputs (the secret shares of) a redundancy-free joined table. The highlight of Bifrost lies in its simplicity: it builds upon two conceptually simple building blocks, an ECDH-PSI protocol and a two-party oblivious shuffle protocol. The lightweight protocol design allows Bifrost to avoid the need for OPPRF. We also propose a simple optimization named dual mapping that reduces the rounds of oblivious shuffle needed from two to one. Experiments on datasets of up to 100 GB show that Bifrost achieves $2.54 \sim 22.32\times$ speedup and reduces the communication by $84.15\% \sim 88.97\%$ compared to the SOTA redundancy-free secure data join protocol iPrivJoin. Notably, the communication size of Bifrost is nearly equal to the size of the input data. In the two-step SDA pipeline evaluation (secure join and SDA), the redundancy-free property of Bifrost not only avoids the catastrophic error rate blowup in the downstream tasks caused by the dummy rows in the joined table (as introduced in CPSI), but also shows up to $2.80\times$ speedup in the SDA process with up to $73.15\%$ communication reduction.


💡 Research Summary

The paper addresses the problem of securely joining two vertically partitioned datasets held by different parties, a prerequisite for downstream secure multi‑party computation (SMPC) analytics such as statistical queries or machine‑learning model training. Existing solutions fall into two categories. Circuit‑based Private Set Intersection (CPSI) can compute the join but pads the result with dummy rows up to the maximum possible size, which dramatically inflates the cost of any subsequent SMPC task. The more recent iPrivJoin eliminates dummy rows by employing an Oblivious Programmable Pseudorandom Function (OPPRF) together with multiple rounds of oblivious shuffling, but this costs 11 communication rounds with heavy communication overhead and requires complex data structures such as cuckoo hashing.

Bifrost proposes a dramatically simpler construction that still outputs a redundancy‑free joined table in secret‑shared form. Its design rests on two well‑studied primitives:

  1. ECDH‑based Private Set Intersection (ECDH‑PSI) – Both parties mask their identifiers with random elliptic‑curve scalars, exchange the masked values, and recover the intersection. Importantly, the protocol also returns the intersection indices after they have been permuted by a secret mapping π, which is the composition of two private permutations π_a (known only to Party A) and π_b (known only to Party B). Consequently, each party learns only the size of the intersection; the actual matching positions remain hidden.

  2. Two‑party Oblivious Shuffle – Given a data column and a permutation, the protocol outputs secret‑shared, shuffled data to both parties without revealing the permutation. By applying the permutation π obtained from the PSI step, the parties obtain secret‑shared, shuffled feature columns ⟨π(F_a)⟩ and ⟨π(F_b)⟩. Local extraction of the rows indexed by the permuted intersection yields the final secret‑shared join table ⟨D⟩.
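The commutative-masking idea behind the PSI step can be sketched in a few lines. This is a toy illustration only, not the paper's implementation: it swaps the elliptic-curve group for modular exponentiation over a Mersenne prime (an assumption made so the example needs no external library), runs both parties in one process, and returns plaintext index pairs rather than the permuted indices that Bifrost produces. The names `hash_to_group`, `random_scalar`, and `toy_dh_psi` are hypothetical.

```python
import hashlib
import math
import secrets

# Toy stand-in for an elliptic-curve group: integers modulo a Mersenne prime.
# ASSUMPTION for illustration only; this is not a secure parameter choice.
P = 2**127 - 1


def hash_to_group(identifier: str) -> int:
    """Map an identifier to a nonzero element of Z_P^* (toy hash-to-group)."""
    digest = hashlib.sha256(identifier.encode()).digest()
    return int.from_bytes(digest, "big") % P or 1


def random_scalar() -> int:
    """Pick a secret exponent coprime to P - 1, so x -> x^k is a bijection."""
    while True:
        k = secrets.randbelow(P - 3) + 2
        if math.gcd(k, P - 1) == 1:
            return k


def mask(values, k):
    """Raise every group element to the secret exponent k."""
    return [pow(v, k, P) for v in values]


def toy_dh_psi(ids_a, ids_b):
    """Return (i, j) pairs with ids_a[i] == ids_b[j] via double masking."""
    a, b = random_scalar(), random_scalar()
    once_a = mask([hash_to_group(x) for x in ids_a], a)  # A would send this to B
    once_b = mask([hash_to_group(x) for x in ids_b], b)  # B would send this to A
    twice_a = mask(once_a, b)  # B re-masks A's values: H(x)^(a*b)
    twice_b = mask(once_b, a)  # A re-masks B's values: H(y)^(b*a)
    lookup = {v: j for j, v in enumerate(twice_b)}
    return [(i, lookup[v]) for i, v in enumerate(twice_a) if v in lookup]


matches = toy_dh_psi(["u1", "u2", "u3", "u4"], ["u3", "u9", "u1"])
print(matches)  # [(0, 2), (2, 0)]
```

Because exponentiation commutes, equal identifiers yield equal doubly masked values regardless of masking order, while neither party sees the other's raw identifiers. In the protocol described above, the positions would additionally be hidden behind the private permutations.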

The key engineering contribution is the Dual Mapping optimization. In the naïve approach, two oblivious shuffles would be required, one for each party’s feature column, because composing the private permutations is not commutative (π_a ∘ π_b generally differs from π_b ∘ π_a). Dual Mapping modifies the PSI phase to simultaneously produce two distinct permutations: π₁ = π_a1 ∘ π_b1 and π₂ = π_b2 ∘ π_a2, with each party knowing only its own component permutations. Party A shuffles its features locally with π₁, Party B with π₂, and a single oblivious shuffle is then executed on the already locally‑shuffled data. This reduces the online communication by O(m_b) (where m_b is Party B’s feature dimension) and cuts the number of shuffle rounds from two to one.
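As a small illustration of why the order of composition matters, the following sketch composes two example permutations in both orders. The list-based representation and the helper names `compose` and `apply_perm` are assumptions chosen for this example and are not taken from the paper.

```python
def compose(p, q):
    """Return the permutation i -> p[q[i]] (apply q first, then p)."""
    return [p[q[i]] for i in range(len(q))]


def apply_perm(perm, rows):
    """Move rows[i] to position perm[i]."""
    out = [None] * len(rows)
    for i, row in enumerate(rows):
        out[perm[i]] = row
    return out


pi_a = [1, 2, 0]  # example permutation held by Party A
pi_b = [0, 2, 1]  # example permutation held by Party B

a_after_b = compose(pi_a, pi_b)  # [1, 0, 2]
b_after_a = compose(pi_b, pi_a)  # [2, 1, 0]

rows = ["r0", "r1", "r2"]
# Shuffling with pi_b and then pi_a equals one shuffle with the composition.
sequential = apply_perm(pi_a, apply_perm(pi_b, rows))
combined = apply_perm(a_after_b, rows)
print(a_after_b, b_after_a, sequential == combined)
```

The two compositions differ, so a table shuffled under one order cannot simply be indexed with positions computed under the other.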

Bifrost also naturally handles the general, non‑aligned case without resorting to cuckoo hashing. Instead of forcing rows with the same identifier to occupy the same position in a hash table, the protocol directly outputs pairs of permuted indices (π₁(i), π₂(j)) for matching identifiers and proceeds with the same shuffle‑and‑extract pipeline. This eliminates the O(h · m · κ) overhead (h = number of hash functions, κ = security parameter) present in prior work.
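The shuffle-and-extract data flow can be simulated in the clear to see why extraction at the permuted index pairs recovers exactly the matching rows. This is a plaintext sketch under stated assumptions: nothing here is oblivious, the shuffle is performed by a single process, additive sharing modulo 2^64 is an assumed sharing scheme, and all names (`share`, `shuffle_and_share`, `extract_join`, the example permutations) are hypothetical.

```python
import secrets

MOD = 2**64  # ASSUMED ring for additive secret sharing


def share(value):
    """Split value into two additive shares modulo MOD."""
    r = secrets.randbelow(MOD)
    return r, (value - r) % MOD


def shuffle_and_share(rows, perm):
    """Plaintext stand-in for an oblivious shuffle.

    Places rows[i] at position perm[i], then secret-shares every cell.
    Returns one share table per party.
    """
    shuffled = [None] * len(rows)
    for i, row in enumerate(rows):
        shuffled[perm[i]] = row
    cells = [[share(v) for v in row] for row in shuffled]
    table_0 = [[c[0] for c in row] for row in cells]
    table_1 = [[c[1] for c in row] for row in cells]
    return table_0, table_1


def extract_join(shared_fa, shared_fb, permuted_pairs):
    """Local step: concatenate the share rows at each permuted index pair."""
    return [shared_fa[p] + shared_fb[q] for p, q in permuted_pairs]


features_a = [[10, 11], [20, 21], [30, 31]]
features_b = [[7], [8], [9]]
matched = [(0, 2), (2, 0)]  # row i of A matches row j of B
pi1, pi2 = [2, 0, 1], [1, 2, 0]  # example permutations

permuted_pairs = [(pi1[i], pi2[j]) for i, j in matched]
fa_0, fa_1 = shuffle_and_share(features_a, pi1)
fb_0, fb_1 = shuffle_and_share(features_b, pi2)

join_0 = extract_join(fa_0, fb_0, permuted_pairs)  # first party's shares
join_1 = extract_join(fa_1, fb_1, permuted_pairs)  # second party's shares

# Reconstruction is done here only to check the result.
joined = [[(x + y) % MOD for x, y in zip(r0, r1)]
          for r0, r1 in zip(join_0, join_1)]
print(joined)  # [[10, 11, 9], [30, 31, 7]]
```

Each party only ever indexes its own share tables with the permuted pairs, so the output has one row per match and no dummy rows.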

Theoretical analysis shows that Bifrost’s computational complexity is O(n log p) for the PSI step and O(m log p) for the shuffle, where p is the size of the underlying elliptic‑curve group. The total round complexity is three (two for PSI, one for shuffle), a substantial reduction compared with iPrivJoin’s eleven rounds. A simulation‑based security proof demonstrates that neither party learns any information beyond the intersection size; the secret‑shared output is information‑theoretically indistinguishable from random.

Experimental evaluation uses real‑world datasets up to 100 GB. Compared with iPrivJoin, Bifrost achieves 2.54× to 22.32× speed‑up and reduces communication by 84.15% to 88.97%. The advantage grows with feature dimensionality: when the number of columns increases from 100 to 6,400, runtime improvements rise from 9.58× to 21.46×. In a full two‑step pipeline (secure join followed by downstream analytics such as secure statistical queries and secure model training), the redundancy‑free join prevents the catastrophic error‑rate blow‑up observed with CPSI and yields up to 2.80× faster downstream SMPC and 73.15% less communication.

In summary, Bifrost delivers a lightweight, low‑round, low‑communication, and high‑performance solution for secure two‑party data joining. By discarding OPPRF and complex hashing, and by cleverly combining ECDH‑PSI with a single oblivious shuffle (enhanced by dual mapping), it produces a clean, dummy‑free secret‑shared join table that integrates seamlessly with downstream SMPC analytics. This makes Bifrost a practical building block for privacy‑preserving data analytics across domains such as finance, healthcare, and e‑commerce.

