An Improved Approach to High Level Privacy Preserving Itemset Mining

Privacy preserving association rule mining has triggered the development of many privacy preserving data mining techniques. A large fraction of them use randomized data distortion techniques to mask the data and preserve privacy. This paper proposes a new transaction randomization method that combines the fake transaction randomization method with a new per-transaction randomization method. The method distorts the items within each transaction and ensures a higher level of data privacy than previous approaches. The per-transaction randomization method uses a randomization function to replace each item with a random number, guaranteeing privacy within the transaction as well. A tool has also been developed that implements the proposed approach to mine frequent itemsets and association rules from the data while guaranteeing the anti-monotonic property.

💡 Research Summary

The paper addresses the critical challenge of preserving individual privacy while mining association rules from transactional data. Traditional privacy‑preserving data mining (PPDM) techniques often rely on either adding fake transactions to obscure real purchase patterns or applying a uniform randomization to all items in the dataset. Both approaches suffer from a trade‑off: stronger privacy typically leads to a loss of data utility, and many methods break the anti‑monotonic property that underlies efficient frequent‑itemset mining algorithms such as Apriori and FP‑Growth.

To overcome these limitations, the authors propose a novel two‑stage randomization framework that synergistically combines Fake Transaction Randomization with a newly introduced Per‑Transaction Randomization. In the first stage, a controlled proportion (p) of synthetic transactions is generated and inserted into the original database. These fake transactions are constructed to mimic the statistical characteristics (average length, item distribution) of genuine transactions, thereby preventing an adversary from distinguishing real from synthetic records based solely on aggregate statistics.
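The first stage can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `make_fake_transactions` and `randomize_database` are hypothetical, p is interpreted as the ratio of fake to real transactions, and "mimicking the statistical characteristics" is approximated by sampling transaction lengths and items from their empirical distributions in the real data.

```python
import random
from collections import Counter


def make_fake_transactions(real, p, seed=0):
    """Generate round(p * len(real)) synthetic transactions (hypothetical helper).

    Assumption: p is the ratio of fake to real transactions.
    """
    rng = random.Random(seed)
    lengths = [len(t) for t in real]                  # empirical length distribution
    freq = Counter(item for t in real for item in t)  # empirical item frequencies
    items = sorted(freq)                              # fixed order for reproducibility
    weights = [freq[i] for i in items]

    fakes = []
    for _ in range(round(p * len(real))):
        # Draw a length seen in the real data, capped by the number of distinct items.
        k = min(rng.choice(lengths), len(items))
        chosen = set()
        while len(chosen) < k:
            # Frequency-weighted draw; duplicates are discarded by the set.
            chosen.add(rng.choices(items, weights=weights, k=1)[0])
        fakes.append(frozenset(chosen))
    return fakes


def randomize_database(real, p, seed=0):
    """Mix real and fake transactions and shuffle so position reveals nothing."""
    rng = random.Random(seed)
    mixed = [frozenset(t) for t in real] + make_fake_transactions(real, p, seed)
    rng.shuffle(mixed)
    return mixed


real = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"milk"},
    {"bread", "milk", "butter"},
]
mixed = randomize_database(real, p=0.5, seed=1)
```

With four real transactions and p = 0.5, the sketch adds two synthetic transactions, so `mixed` holds six records. Sampling lengths and items from the observed data is only one way to match the aggregate statistics; the paper's actual generation procedure may differ.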

In the second stage, each genuine transaction undergoes an item‑level transformation using a simple modular function:


📜 Original Paper Content