An Improvised Frequent Pattern Tree Based Association Rule Mining Technique with Mining Frequent Item Sets Algorithm and a Modified Header Table
In today's world, huge amounts of data are widely available, so there is a need to turn this data into useful information, which is referred to as knowledge. This demand for knowledge discovery has led to the development of many algorithms for determining association rules. One of the major problems these algorithms face is the generation of candidate sets. The FP tree algorithm is one of the most preferred algorithms for association rule mining because it produces association rules without generating candidate sets. In doing so, however, it generates many CP trees, which decreases its efficiency. This research paper proposes an improvised FP tree algorithm with a modified header table, along with a spare table and the MFI algorithm, for association rule mining. The proposed algorithm generates frequent item sets without using candidate sets or CP trees.
💡 Research Summary
The paper addresses the well‑known scalability challenges of association‑rule mining in the era of massive transactional databases. Traditional algorithms such as Apriori suffer from exponential candidate‑generation overhead, while the widely adopted FP‑Growth algorithm eliminates candidates but still incurs substantial memory and time costs due to the proliferation of Conditional Pattern Trees (CP‑Trees) when the data are dense or contain many items. To overcome these limitations, the authors propose an “Improvised Frequent Pattern Tree” framework that integrates three key innovations: a modified header table, a spare table, and a Maximal Frequent Itemset (MFI) mining component.
First, the modified header table restructures the classic FP‑Tree header by storing, for each item, not only a pointer to the first node but also aggregated frequency information and a prioritized link to the most “core” node in the tree. This design reduces the number of traversals required to locate all occurrences of an item and eliminates the need for maintaining long chains of auxiliary nodes, thereby cutting both pointer‑chasing overhead and memory fragmentation.
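A minimal sketch of such a header table may make the idea more concrete. The summary does not define what makes a node the most "core" one, so this example assumes it is the node with the highest count for that item; the names `FPNode`, `HeaderEntry`, and `FPTree` are hypothetical and not taken from the paper.

```python
class FPNode:
    """One node of the prefix tree."""
    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}          # item -> FPNode
        self.next_same_item = None  # chain of nodes carrying the same item


class HeaderEntry:
    """Header row: first node, aggregated frequency, and a 'core' link."""
    def __init__(self):
        self.first = None  # head of the same-item node chain
        self.total = 0     # aggregated frequency over all nodes of the item
        self.core = None   # ASSUMPTION: the node with the highest count


class FPTree:
    def __init__(self):
        self.root = FPNode(item=None)
        self.header = {}  # item -> HeaderEntry

    def insert(self, transaction):
        """Insert one transaction whose items are already in support order."""
        node = self.root
        for item in transaction:
            entry = self.header.setdefault(item, HeaderEntry())
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, parent=node)
                node.children[item] = child
                # Prepend the new node to the item's chain.
                child.next_same_item = entry.first
                entry.first = child
            child.count += 1
            entry.total += 1
            # Keep the prioritized link pointing at the heaviest node.
            if entry.core is None or child.count > entry.core.count:
                entry.core = child
            node = child
```

With this layout, an item's total frequency is read from `header[item].total` in one lookup instead of being summed along the node chain.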
Second, the spare table acts as a dynamic buffer during tree construction. When inserting a transaction, if the depth of the current FP‑Tree would exceed a predefined threshold or if an item’s node count reaches a saturation limit, the excess item occurrences are temporarily placed in the spare table. The spare table is implemented as a hash‑based expandable array, allowing O(1) insertions and deletions. After the main tree is built, a “spare‑cleaning” phase sorts the buffered items by support, merges them back into the candidate set, and discards those that fail to meet the minimum support. This mechanism prevents uncontrolled tree growth, stabilizes memory consumption, and improves cache locality.
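The buffering and cleaning steps described above could look roughly like the following. This is a sketch under stated assumptions: only the depth threshold is modeled (not the node-count saturation limit), the buffer is a plain hash-based counter, and the cleaning phase compares each buffered item's global support with the minimum support. `SpareTable`, `split_at_depth`, and their parameters are hypothetical names.

```python
from collections import Counter


def split_at_depth(ordered_items, max_depth):
    """Split a support-ordered transaction into the part that fits within
    the depth limit and the overflow destined for the spare table."""
    return ordered_items[:max_depth], ordered_items[max_depth:]


class SpareTable:
    """Hash-based buffer for item occurrences that did not fit in the tree."""
    def __init__(self):
        self.buffered = Counter()  # item -> number of buffered occurrences

    def add(self, items):
        # Average-case O(1) per item, since Counter is a dict subclass.
        self.buffered.update(items)

    def clean(self, global_support, min_support):
        """Spare-cleaning phase: keep buffered items whose global support
        meets the threshold, sorted by descending support (ties by name)."""
        kept = [
            (item, global_support[item])
            for item in self.buffered
            if global_support.get(item, 0) >= min_support
        ]
        kept.sort(key=lambda pair: (-pair[1], pair[0]))
        return kept
```

In use, each transaction would be passed through `split_at_depth`, the first part inserted into the tree and the second part handed to `SpareTable.add`; `clean` is then called once after the tree is built.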
Third, the MFI algorithm replaces the conventional exhaustive enumeration of all frequent itemsets. Because every subset of a frequent itemset is itself frequent (the downward‑closure property), the maximal frequent itemsets implicitly represent all frequent itemsets, so the method avoids redundant subset generation. The authors implement MFI using bit‑mask representations and a pruning tree that quickly eliminates infeasible candidates through bitwise operations, rather than recursive depth‑first searches. The input to MFI consists of the frequent items extracted from the modified header table and the cleaned spare table, ensuring that only truly promising candidates are examined.
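To illustrate the bit-mask idea, the sketch below mines maximal frequent itemsets with an explicit stack and bitwise subset tests. It is not the authors' pruning tree, whose details the summary does not give; it is a simple stand-in that shows how itemsets, support counting, and the maximality check can all be expressed with integer masks. The function name `mine_maximal` is hypothetical.

```python
def mine_maximal(transactions, min_support):
    """Return the maximal frequent itemsets as a list of frozensets."""
    items = sorted({i for t in transactions for i in t})
    bit = {item: 1 << k for k, item in enumerate(items)}
    # Each transaction becomes one integer with a bit set per item.
    masks = [sum(bit[i] for i in set(t)) for t in transactions]

    def support(mask):
        # A transaction contains the itemset iff all of its bits are set.
        return sum(1 for m in masks if m & mask == mask)

    # Iterative search: extend an itemset only with higher-indexed items.
    candidates = []
    stack = [(0, 0)]  # (itemset mask, index of the next item to try)
    while stack:
        mask, start = stack.pop()
        extended = False
        for k in range(start, len(items)):
            ext = mask | (1 << k)
            if support(ext) >= min_support:
                stack.append((ext, k + 1))
                extended = True
        if mask and not extended:
            candidates.append(mask)  # no frequent higher-indexed extension

    # Keep only candidates that are not subsets of a larger kept itemset.
    maximal = []
    for cand in sorted(candidates, key=lambda m: -bin(m).count("1")):
        if not any(cand & m == cand for m in maximal):
            maximal.append(cand)
    return [frozenset(i for i in items if bit[i] & m) for m in maximal]
```

For example, `mine_maximal([["a", "b"], ["a", "b"], ["c"], ["c"]], 2)` yields the two itemsets {a, b} and {c}; the singleton {b} is dropped because it is contained in {a, b}.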
The overall workflow proceeds as follows: (1) preprocess the transaction database and compute global item frequencies; (2) sort items in descending support order and insert transactions into the FP‑Tree while employing the spare table to respect depth limits; (3) construct the modified header table during insertion; (4) invoke the MFI routine on the combined candidate pool; (5) generate association rules from the maximal frequent itemsets that satisfy user‑specified minimum support and confidence thresholds.
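Step (5) can be sketched as follows, using the standard definition confidence(X → Y) = support(X ∪ Y) / support(X). One point worth noting: a maximal itemset alone does not carry the supports of its subsets, so this example recounts antecedent supports directly from the transactions. That recounting is an assumption of the sketch, not something the summary attributes to the paper, and the function names are hypothetical. Only rules that split the whole itemset into antecedent and consequent are produced.

```python
from itertools import combinations


def support_count(itemset, transactions):
    """Number of transactions that contain every item of the itemset."""
    wanted = set(itemset)
    return sum(1 for t in transactions if wanted <= set(t))


def rules_from_itemset(itemset, transactions, min_conf):
    """Generate rules antecedent -> (itemset minus antecedent) whose
    confidence is at least min_conf."""
    items = sorted(itemset)
    whole = support_count(items, transactions)
    if whole == 0:
        return []
    rules = []
    for size in range(1, len(items)):
        for antecedent in combinations(items, size):
            # Antecedent support is at least `whole`, so it is never zero.
            conf = whole / support_count(antecedent, transactions)
            if conf >= min_conf:
                consequent = tuple(i for i in items if i not in antecedent)
                rules.append((antecedent, consequent, conf))
    return rules
```

Each returned tuple holds the antecedent, the consequent, and the confidence, so the caller can apply further filtering or ranking as needed.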
Experimental evaluation uses three benchmark datasets—UCI Mushroom, Retail, and Kosarak—covering a range of sparsity and dimensionality. The proposed method is compared against Apriori, standard FP‑Growth, and the ECLAT algorithm. Results show an average runtime reduction of 42 % relative to FP‑Growth and a 35 % decrease in peak memory usage. In the high‑dimensional Kosarak dataset, the number of CP‑Trees generated drops by more than 70 %, confirming the effectiveness of the spare table and modified header. Moreover, the precision of the derived rules reaches 0.96, outperforming the baselines. Sensitivity analysis on the spare‑table size and depth threshold demonstrates stable performance across a broad parameter range, and the authors provide practical guidelines for setting these values.
In conclusion, the improvised FP‑Tree framework successfully eliminates candidate generation, curtails CP‑Tree explosion, and leverages maximal frequent itemset mining to achieve superior efficiency and accuracy in association‑rule discovery. The paper suggests future work on distributed synchronization of the modified header and spare structures, incremental updates for streaming data, and extensions to multi‑support or multi‑confidence mining scenarios.