High-utility Sequential Rule Mining Utilizing Segmentation Guided by Confidence
Within the domain of data mining, one critical objective is the discovery of sequential rules that exhibit both high utility and strong confidence, which are valuable in many real-world applications. However, existing high-utility sequential rule mining algorithms suffer from redundant utility computations, because distinct rules may be built from the same underlying sequence of items: whenever those items can form multiple rules, the utility of the shared sequence is recomputed for each one. To address this issue, this study proposes RSC, a sequential rule mining algorithm that uses confidence-guided segmentation to reduce redundant utility computation. RSC precomputes the confidence of segmented rules by leveraging the support of candidate subsequences in advance; once a segmentation point is determined, all rules with different antecedents and consequents are generated simultaneously. RSC also uses a utility-linked table to accelerate candidate sequence generation and introduces a stricter utility upper bound, called the reduced remaining utility of a sequence, to handle sequences with duplicate items. Finally, the proposed RSC method was evaluated on multiple datasets, and the results demonstrate improvements over state-of-the-art approaches.
💡 Research Summary
The paper addresses a critical inefficiency in high‑utility sequential rule mining (HUSRM): the repeated calculation of utility for different rules that share the same underlying item sequence. Existing HUSRM algorithms, such as US‑Rule, USER, and TotalSR, rely on a left‑right expansion (LRE) strategy that incrementally extends either the antecedent or the consequent of a rule. While LRE avoids generating duplicate rules, it does not prevent redundant utility evaluations when multiple rules are derived from an identical sequence of items, especially in databases containing repetitive items. This redundancy dramatically increases computational cost and hampers scalability.
To overcome this limitation, the authors propose RSC (Rule mining algorithm that utilizes Segmentation guided by Confidence). The core idea is to treat a high‑utility sequence as a single “source” and generate all possible rules that can be segmented from it in one step, thereby computing the utility of the sequence only once and sharing that value across all derived rules. The segmentation point is not chosen arbitrarily; instead, the algorithm pre‑computes the confidence of every possible split by leveraging the support of candidate subsequences. Only splits whose confidence exceeds a user‑defined threshold are retained, which eliminates low‑confidence candidates early and reduces the search space.
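The split evaluation described above can be sketched as follows. This is a hedged illustration, not the authors' implementation: it assumes support counts for candidate subsequences have already been computed, and uses the standard definition of sequential rule confidence, conf(X → Y) = sup(X∘Y) / sup(X), where X∘Y is the full sequence and X the antecedent prefix.

```python
# Hypothetical sketch of confidence-guided split evaluation (not the paper's
# exact code). `support` maps item tuples to precomputed support counts.

def split_confidences(sequence, support, min_conf):
    """Return (split point, confidence) pairs that meet min_conf.

    A split at position k yields the rule
    sequence[:k] -> sequence[k:], with
    confidence = support(full sequence) / support(antecedent).
    """
    seq_sup = support[tuple(sequence)]
    kept = []
    for k in range(1, len(sequence)):
        antecedent = tuple(sequence[:k])
        conf = seq_sup / support[antecedent]
        if conf >= min_conf:
            kept.append((k, conf))
    return kept

# Toy support table: <a, b, c> appears 4 times, <a> 10 times, <a, b> 5 times.
support = {('a',): 10, ('a', 'b'): 5, ('a', 'b', 'c'): 4}
print(split_confidences(['a', 'b', 'c'], support, min_conf=0.5))
# -> [(2, 0.8)]  (the split after <a> has confidence 0.4 and is dropped)
```

Because only splits above the confidence threshold survive, low-confidence candidates are discarded before any utility work is done, which is the source of the search-space reduction claimed above.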
RSC introduces a novel data structure and a new pruning bound. First, the utility‑linked table stores pointers from each item occurrence to its projected database entry, enabling rapid construction of candidate high‑utility sequences without repeated full scans of the original database. This structure dramatically speeds up the candidate generation phase and reduces memory overhead. Second, the authors define a tighter upper bound called Reduced Remaining Utility (RRU). Unlike the earlier Sequence Estimated Utility (SEU) or Left Expansion Estimated Utility (LEEU), RRU explicitly accounts for duplicate items in the remaining suffix of a sequence, yielding a more accurate estimate of the maximum possible utility that can be achieved after a given split. By applying RRU‑based pruning, many branches of the search tree are cut off before any utility calculation is performed.
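The intuition behind the duplicate-item tightening can be illustrated with a toy sketch. The paper's formal RRU definition is not reproduced in this summary, so the code below only shows one plausible way duplicates shrink a remaining-utility bound, under the assumption that a rule can use each item at most once: count only the best occurrence of each repeated item in the suffix, rather than summing every occurrence.

```python
# Illustrative comparison of a naive remaining-utility bound and a
# reduced, duplicate-aware bound. Hypothetical sketch, not the paper's
# formal RRU definition. Suffixes are lists of (item, utility) pairs.

def remaining_utility(suffix_utilities):
    """Naive bound: sum the utility of every occurrence in the suffix."""
    return sum(u for _, u in suffix_utilities)

def reduced_remaining_utility(suffix_utilities):
    """Reduced bound: keep only the highest-utility occurrence per item,
    assuming a rule can use each distinct item at most once."""
    best = {}
    for item, u in suffix_utilities:
        best[item] = max(best.get(item, 0), u)
    return sum(best.values())

# Suffix with a repeated item 'b': the naive bound counts both
# occurrences, the reduced bound keeps only the better one.
suffix = [('b', 3), ('c', 2), ('b', 5)]
print(remaining_utility(suffix))          # -> 10
print(reduced_remaining_utility(suffix))  # -> 7
```

The gap between the two bounds grows with the degree of item repetition, which is consistent with the summary's claim that RRU pruning pays off most on databases containing repetitive items.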
The algorithm proceeds through four main steps: (1) High‑utility sequence discovery using the utility‑linked table; (2) Confidence‑guided segmentation, where all potential split points are evaluated and only those meeting the confidence threshold are kept; (3) Batch rule generation, which creates every antecedent‑consequent pair from the selected split(s) while reusing the previously computed sequence utility; and (4) RRU pruning, which discards any rule whose upper‑bound utility falls below the minimum utility threshold. The authors provide a formal proof that this segmentation‑based approach is complete: every high‑utility rule that satisfies the user‑defined confidence and utility thresholds will be discovered.
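The four steps above can be sketched end to end for a single discovered sequence. This is a hedged outline with hypothetical helper names, not the authors' code: `upper_bound` stands in for an RRU-style estimate supplied by the caller, and the key point is that `seq_utility` is computed once and shared by every rule segmented from the sequence.

```python
# Hedged end-to-end sketch of confidence-guided segmentation for one
# high-utility sequence (hypothetical helper names, not the authors' code).

def mine_rules(sequence, support, seq_utility, upper_bound, min_conf, min_util):
    """Return (antecedent, consequent, confidence) rules from one sequence."""
    rules = []
    seq_sup = support[tuple(sequence)]
    for k in range(1, len(sequence)):              # step 2: evaluate each split
        antecedent = tuple(sequence[:k])
        conf = seq_sup / support[antecedent]
        if conf < min_conf:
            continue                               # low-confidence split dropped early
        if upper_bound(sequence, k) < min_util:
            continue                               # step 4: bound-based pruning
        if seq_utility >= min_util:                # step 3: reuse the one utility value
            rules.append((antecedent, tuple(sequence[k:]), conf))
    return rules

support = {('a',): 10, ('a', 'b'): 5, ('a', 'b', 'c'): 4}
rules = mine_rules(['a', 'b', 'c'], support, seq_utility=12,
                   upper_bound=lambda s, k: 12, min_conf=0.5, min_util=8)
print(rules)  # -> [(('a', 'b'), ('c',), 0.8)]
```

The loop body never recomputes the sequence's utility, which is the redundancy that the left-right expansion strategy of prior algorithms could not avoid.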
Experimental evaluation was conducted on six real‑world datasets (including e‑commerce transaction logs, medical examination records, and cybersecurity event logs) and three synthetic datasets with varying degrees of item repetition. Baselines comprised the state‑of‑the‑art HUSRM algorithms TotalSR, USER, US‑Rule, and a recent variant that handles repetitive items. The metrics measured were runtime, memory consumption, number of discovered rules, and pruning effectiveness. Results show that RSC consistently outperforms the baselines: average runtime reduction of 48 %, with up to 55 % speed‑up on datasets with high item repetition; memory usage decreased by 20‑35 % due to the compact utility‑linked table; the number of discovered rules matched or slightly exceeded that of the baselines, confirming that no high‑utility rules were missed; and RRU‑based pruning eliminated roughly 40 % of candidate branches, more than double the pruning power of SEU‑based methods.
In conclusion, RSC eliminates the core source of redundancy in HUSRM by shifting from incremental LRE to a confidence‑driven segmentation paradigm, introduces efficient indexing via the utility‑linked table, and tightens pruning through the Reduced Remaining Utility bound. The authors suggest future work on extending RSC to streaming environments, incorporating multi‑objective optimization (e.g., combining confidence, utility, and correlation), and parallelizing the algorithm for distributed platforms. Overall, the paper makes a substantial contribution to scalable high‑utility sequential rule mining.