An Improved UP-Growth High Utility Itemset Mining

Efficient discovery of frequent itemsets in large datasets is a crucial task in data mining. In recent years, several approaches have been proposed for generating high utility patterns, but they produce a large number of candidate itemsets for high utility itemsets, which can degrade mining performance in terms of speed and space. A recently proposed compact tree structure, the UP-Tree, maintains the information of transactions and itemsets, improves mining performance, and avoids repeated scans of the original database. In this paper, the UP-Tree (Utility Pattern Tree) is adopted; it scans the database only twice to obtain candidate items and manages them in an efficient data structure. Applying the UP-Tree to UP-Growth, however, takes more execution time in Phase II. This paper therefore presents a modified algorithm that aims to reduce execution time by identifying high utility itemsets effectively.

💡 Research Summary

The paper addresses performance bottlenecks in the widely used UP‑Growth algorithm for high‑utility itemset mining. While UP‑Growth can discover itemsets whose utility exceeds a user‑defined threshold, it suffers from an explosion of candidate itemsets and repeated scans of the original database, leading to high time and space consumption. To mitigate these issues, the authors adopt the Utility Pattern Tree (UP‑Tree) structure, which compactly stores transaction information and item‑level utility data. Their approach scans the database only twice: the first pass computes the total utility of each item and discards those whose utility is below the minimum utility threshold; the second pass inserts the remaining transactions into the UP‑Tree in descending utility order, while each node records cumulative utility, remaining utility, and support count.
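The two passes described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the per-item measure in the first pass is the transaction-weighted utility (TWU) used by the original UP-Growth, since the summary says only "total utility", and it stores just a support count and a prefix-sum utility in each node. The names `first_pass`, `reorganize`, `build_tree`, and `Node`, as well as the toy transactions, are hypothetical.

```python
from collections import defaultdict


def first_pass(transactions, min_util):
    """Scan 1: compute each item's TWU and keep items that reach min_util."""
    twu = defaultdict(int)
    for tx in transactions:
        tx_utility = sum(tx.values())  # utility of the whole transaction
        for item in tx:
            twu[item] += tx_utility
    return {item: u for item, u in twu.items() if u >= min_util}


def reorganize(transactions, promising):
    """Drop discarded items and sort the rest by descending TWU (ties by name)."""
    result = []
    for tx in transactions:
        kept = [i for i in tx if i in promising]
        kept.sort(key=lambda i: (-promising[i], i))
        if kept:
            result.append([(i, tx[i]) for i in kept])
    return result


class Node:
    """Simplified tree node (assumption: only count and utility are stored)."""

    def __init__(self, item):
        self.item = item
        self.count = 0      # number of transactions sharing this prefix
        self.utility = 0    # summed prefix utility of those transactions
        self.children = {}


def build_tree(reorganized):
    """Scan 2: insert reorganized transactions into a shared-prefix tree."""
    root = Node(None)
    for tx in reorganized:
        node, prefix_utility = root, 0
        for item, util in tx:
            prefix_utility += util
            node = node.children.setdefault(item, Node(item))
            node.count += 1
            node.utility += prefix_utility
    return root


# Toy database: each transaction maps an item to its utility in that transaction.
transactions = [
    {"a": 5, "b": 2, "c": 1},
    {"a": 10, "c": 6},
    {"b": 4, "d": 3},
]
promising = first_pass(transactions, min_util=10)
reorganized = reorganize(transactions, promising)
root = build_tree(reorganized)
```

In this toy run item `d` is discarded after the first pass, and the two transactions that start with `a` share a single prefix path in the tree, which is the compaction effect the summary refers to.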

The core contribution lies in a modified Phase II that dramatically reduces execution time. The authors augment each tree node with a “remaining utility” label and introduce a dynamic pruning rule: during depth‑first traversal, if the sum of the current path’s accumulated utility and its remaining utility falls below the minimum threshold, the entire subtree is pruned immediately. This eliminates unnecessary candidate verification and avoids the costly rescanning of the database that characterizes the original UP‑Growth. Additionally, the tree is further compressed by merging identical prefix paths and ordering items by descending utility, which shortens tree depth and reduces the number of node comparisons during mining.
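The pruning rule can be sketched in a few lines of Python. This is an illustrative reading of the rule as summarized here, not the paper's code: it assumes each node already carries the utility accumulated along its path (`path_utility`) and an upper bound on what its descendants can still add (`remaining_utility`). The `Node` fields, the `mine` function, and the toy tree are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    item: str
    path_utility: int        # utility accumulated from the root to this node
    remaining_utility: int   # assumed upper bound on utility added below this node
    children: list = field(default_factory=list)


def mine(root, min_util):
    """Depth-first traversal that skips any subtree failing the bound."""
    found, pruned = [], []
    stack = [(root, ())]
    while stack:
        node, prefix = stack.pop()
        path = prefix + (node.item,)
        # Pruning rule: no extension of this path can reach min_util.
        if node.path_utility + node.remaining_utility < min_util:
            pruned.append(path)
            continue  # the whole subtree is skipped
        if node.path_utility >= min_util:
            found.append(path)
        # Push children in reverse so they are visited in listed order.
        for child in reversed(node.children):
            stack.append((child, path))
    return found, pruned


# Toy tree: a -> c and a -> b -> d.
tree = Node("a", 10, 12, [
    Node("c", 16, 3),
    Node("b", 12, 2, [Node("d", 14, 0)]),
])
found, pruned = mine(tree, min_util=15)
```

With a threshold of 15, the path `a, b` is cut because 12 + 2 falls below the threshold, so node `d` is never visited, while `a, c` is reported because its accumulated utility of 16 already meets the threshold.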

Complexity analysis shows that the number of database scans drops from three to two, and the candidate generation cost is reduced from O(|I|·|D|) to roughly O(|I|·log|I|), where |I| is the number of distinct items and |D| the number of transactions. Empirical evaluation on three real‑world datasets—Retail (≈88 K transactions), Kosarak (≈990 K transactions), and Mushroom (≈8 K transactions)—demonstrates substantial gains. Across minimum utility thresholds ranging from 1 % to 5 %, the proposed method achieves an average runtime reduction of 30 %–45 % and a memory usage decrease of over 20 % compared with the original UP‑Growth. Moreover, the number of generated candidate itemsets is cut by roughly 40 %, simplifying downstream analysis.

The paper concludes that integrating UP‑Tree with a carefully designed pruning strategy yields a more scalable high‑utility itemset mining algorithm, suitable for large‑scale transactional databases. Future work is suggested in the direction of distributed implementations of the UP‑Tree, adaptive threshold selection, and extensions to streaming data environments, which could further broaden the applicability of high‑utility mining in real‑time analytics.