Secure Mining of Association Rules in Horizontally Distributed Databases


We propose a protocol for secure mining of association rules in horizontally distributed databases. The current leading protocol is that of Kantarcioglu and Clifton (TKDE 2004). Our protocol, like theirs, is based on the Fast Distributed Mining (FDM) algorithm of Cheung et al. (PDIS 1996), which is an unsecured distributed version of the Apriori algorithm. The main ingredients in our protocol are two novel secure multi-party algorithms: one that computes the union of private subsets that each of the interacting players holds, and another that tests the inclusion of an element held by one player in a subset held by another. Our protocol offers enhanced privacy with respect to the protocol of Kantarcioglu and Clifton. In addition, it is simpler and is significantly more efficient in terms of communication rounds, communication cost and computational cost.


💡 Research Summary

The paper addresses the problem of privately mining association rules from horizontally partitioned databases, where multiple sites hold the same schema but disjoint transaction sets. The goal is to discover all global s‑frequent itemsets and the corresponding (s, c) association rules without revealing any more information than what is unavoidable from the final output. This setting naturally leads to a secure multi‑party computation (SMC) problem: each party Pi holds a private input xi (its local database Di) and all parties wish to compute y = f(x1,…,xM) where f is the mining function.
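To make the terms concrete, here is a minimal sketch of support and confidence on a toy horizontally partitioned database. The function names and the sample transactions are illustrative assumptions, not taken from the paper. An itemset is s‑frequent when its support is at least s, and a rule X ⇒ Y is an (s, c) rule when the support of X ∪ Y is at least s and its confidence is at least c.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Support of (antecedent ∪ consequent) divided by support of antecedent."""
    a = frozenset(antecedent)
    both = a | frozenset(consequent)
    return support(both, transactions) / support(a, transactions)


# Hypothetical toy data: two sites, same schema, disjoint transaction sets.
D1 = [frozenset("ab"), frozenset("abc")]
D2 = [frozenset("ac"), frozenset("b")]
D = D1 + D2  # the unified database, which no single site may see

if __name__ == "__main__":
    print(support("ab", D))          # global support of {a, b}
    print(confidence("a", "b", D))   # confidence of the rule a => b
```

In the secure setting the unified list `D` is never materialized; the sketch only shows what the mining function f is meant to compute.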

The authors build upon the Fast Distributed Mining (FDM) algorithm of Cheung et al., an unsecured distributed version of Apriori. FDM proceeds in rounds k = 1,…,L, generating candidate k‑itemsets, locally pruning them, broadcasting locally frequent sets, and finally aggregating local supports to decide global frequency. The only stages that leak private information are (i) the broadcast of locally frequent itemsets (Stage 4) and (ii) the broadcast of local support counts (Stage 6). Kantarcioglu and Clifton (TKDE 2004) proposed a secure implementation of Stage 4 (called UniFI‑KC) that hides the size of each local set by padding with fake itemsets and hides the content by applying a commutative encryption scheme together with oblivious transfer (OT) and hash functions. Their protocol requires multiple communication rounds and heavy cryptographic primitives, and it leaks some auxiliary information to individual parties or small coalitions.
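One FDM round, as described above, can be sketched in plain, unsecured Python. This is a simplified illustration under stated assumptions: the function names are hypothetical, candidate generation is done by brute force over item combinations rather than the usual Apriori join, and all sites are simulated in one process. The two lines marked as leaks correspond to the broadcasts that the secure protocols are meant to protect.

```python
from itertools import combinations


def apriori_gen(prev_frequent, k):
    """Candidate k-itemsets all of whose (k-1)-subsets are in prev_frequent."""
    items = sorted(set().union(*prev_frequent))
    return {
        frozenset(c)
        for c in combinations(items, k)
        if all(frozenset(sub) in prev_frequent for sub in combinations(c, k - 1))
    }


def count(itemset, db):
    """Number of transactions in `db` that contain `itemset`."""
    return sum(itemset <= t for t in db)


def fdm_round(prev_frequent, sites, s, k):
    """Return the globally s-frequent k-itemsets (unsecured simulation)."""
    candidates = apriori_gen(prev_frequent, k)
    # Local pruning: each site keeps candidates that are locally s-frequent.
    local = [{c for c in candidates if count(c, db) >= s * len(db)} for db in sites]
    # Leak 1: broadcasting each site's locally frequent sets to form the union.
    union = set().union(*local)
    # Leak 2: broadcasting local support counts to decide global frequency.
    n = sum(len(db) for db in sites)
    return {c for c in union if sum(count(c, db) for db in sites) >= s * n}


if __name__ == "__main__":
    sites = [[frozenset("ab"), frozenset("abc")], [frozenset("ac"), frozenset("b")]]
    f1 = {frozenset("a"), frozenset("b"), frozenset("c")}
    print(fdm_round(f1, sites, 0.5, 2))
```

Taking the union is safe for correctness because an itemset that is globally s‑frequent must be locally s‑frequent at one site or more, so no globally frequent itemset is lost by local pruning.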

The contribution of this paper is a new, simpler protocol for the same stage, together with a complementary protocol for testing set inclusion. The key idea is to view the union of private subsets as a bitwise OR of binary vectors. More generally, the authors define a family of “threshold functions” Tt(b1,…,bM) that output 1 if at least t of the input bits are 1, and 0 otherwise. The OR corresponds to t = 1, the AND to t = M. They design Protocol 2 (Threshold) that securely computes any Tt using only additive secret sharing and linear operations, following the approach of Ben‑Or et al. for secure sum. Each party splits its binary vector into random shares, distributes them, and the parties jointly reconstruct the sum vector modulo an integer larger than M, so that a count of up to M ones never wraps around. By locally comparing each component of the sum with the threshold t, they obtain the desired OR/AND result without revealing individual bits.
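The threshold idea can be sketched as a single-process simulation of additive secret sharing. This is not the paper's Protocol 2: the function names, the choice of modulus q = M + 1, and the open reconstruction of the per-position counts are assumptions made for illustration. In particular, this toy version reveals the full count at each position, which is more than the threshold bit alone.

```python
import secrets


def share(vector, m, q):
    """Split `vector` into m additive shares that sum to it component-wise mod q."""
    shares = [[secrets.randbelow(q) for _ in vector] for _ in range(m - 1)]
    last = [(v - sum(s[i] for s in shares)) % q for i, v in enumerate(vector)]
    return shares + [last]


def threshold(bit_vectors, t):
    """Component-wise T_t: 1 where at least t of the parties hold a 1."""
    m = len(bit_vectors)
    n = len(bit_vectors[0])
    q = m + 1  # assumed modulus: larger than the maximum possible count m
    # shares_of[i][j] is the share that party i sends to party j.
    shares_of = [share(v, m, q) for v in bit_vectors]
    # Each party j adds up the shares it received; alone these look random.
    partial = [
        [sum(shares_of[i][j][c] for i in range(m)) % q for c in range(n)]
        for j in range(m)
    ]
    # Combining all partial sums reconstructs the count of ones per position.
    totals = [sum(p[c] for p in partial) % q for c in range(n)]
    return [int(x >= t) for x in totals]


if __name__ == "__main__":
    vectors = [[1, 0, 0, 1], [0, 0, 1, 1], [0, 0, 0, 1]]
    print(threshold(vectors, 1))  # OR of the three vectors
    print(threshold(vectors, 3))  # AND of the three vectors
```

Each position of a bit vector stands for one candidate itemset, so the OR (t = 1) marks exactly the itemsets that appear in the union of the parties' private subsets.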

Because the protocol relies solely on hashing and additive secret sharing, it eliminates the need for commutative encryption and OT. Consequently, the number of communication rounds drops from O(M) to a constant (two rounds for share exchange and one round for reconstruction), and the total transmitted data is reduced from O(|Ap(Fk‑1^s)|·M) to O(|Ap(Fk‑1^s)|). The privacy analysis shows that any extra information potentially leaked is confined to coalitions of at most three parties, whereas the Kantarcioglu‑Clifton protocol may leak to single parties. The authors argue that this leakage is less sensitive and acceptable in practice.

The paper also presents a secure inclusion test: given an element held by one party, the protocol determines whether it belongs to the private subset of another party using the same threshold machinery (with t = 1 on a suitably encoded vector). This operation is needed in later stages of FDM when checking whether candidate itemsets are globally frequent.

Performance evaluation compares the new protocol with UniFI‑KC in terms of communication rounds, bandwidth, and CPU time. Experiments on synthetic datasets with thousands of items and up to ten parties demonstrate 30‑70 % reductions in all metrics, especially as the number of parties grows.

The remainder of the paper (Sections 3‑4) briefly recaps the secure handling of the remaining FDM stages (global support verification and rule generation) as described in the original Kantarcioglu‑Clifton work, and Section 5 surveys related literature on privacy‑preserving data mining.

In summary, the authors provide a more efficient and privacy‑enhanced solution for the critical union step in distributed association‑rule mining. By abstracting the problem to threshold functions and employing lightweight secret‑sharing techniques, they achieve significant reductions in cryptographic overhead while limiting information leakage to small, well‑defined coalitions. The approach is modular and can be adapted to other privacy‑preserving mining tasks, offering a valuable contribution to the field of secure multi‑party data analysis.

