A Frequent Closed Itemsets Lattice-based Approach for Mining Minimal Non-Redundant Association Rules

📝 Original Info

  • Title: A Frequent Closed Itemsets Lattice-based Approach for Mining Minimal Non-Redundant Association Rules
  • ArXiv ID: 1108.5253
  • Date: 2011-08-29
  • Authors: Bay Vo, Bac Le

📝 Abstract

Many algorithms have been developed to speed up the mining of frequent itemsets (FI) or frequent closed itemsets (FCI). However, algorithms that address the time needed to generate association rules have not been studied in depth. In practice, when a database contains many FI/FCI (from tens of thousands up to millions), generating association rules takes much longer than mining the FI/FCI. This paper therefore applies the frequent closed itemsets lattice (FCIL) to mining minimal non-redundant association rules (MNAR) in order to reduce the rule generation time substantially. First, we use CHARM-L to build the FCIL. Then, based on the FCIL, we propose an algorithm for fast generation of MNAR. Experimental results show that the proposed algorithm is much faster in mining time than the frequent itemsets lattice-based algorithm.

📄 Full Content

Mining association rules is divided into two phases: i) mining FI/FCI and ii) generating association rules from FI/FCI. Many algorithms have been developed for phase i), such as Apriori-based [2, 14, 15], FP-tree-based [5-7, 16, 23], and IT-tree-based [25-27] algorithms. However, algorithms that deal with phase ii) have received little attention. In 1993, Agrawal et al. developed a method for mining traditional association rules (TAR) [1]. After that, the Apriori algorithm was proposed [2]. Because TAR contains many redundancies, the concept of minimal non-redundant association rules (MNAR) was proposed [3, 14, 15]. The set of MNAR is more compact than the set of TAR in terms of the number of generated rules. Moreover, the number of FCI is often much smaller than the number of FI, so the time for generating rules from FCI is reduced significantly.

In recent years, lattice-based approaches for fast mining of association rules have been proposed. In 2009, we proposed an algorithm for mining TAR based on the frequent itemsets lattice (FIL) [20]. This approach saves much of the time needed to generate association rules: because it is based on the lattice, we can determine all child nodes of a given node without traversing all FI. After that, a modification of FIL (MFIL) for generating MNAR was proposed in [22]. MNAR mining only generates rules from X to Y, where X is a minimal generator, Y is a frequent closed itemset, and X ⊂ Y. FIL is modified by adding one field that indicates whether a lattice node is a minimal generator and one field that indicates whether it is a closed itemset. After building the lattice, we can generate MNAR easily.
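
A lattice node with the two extra fields could look roughly like the sketch below. This is only an illustration, not the data structure used in [22]: the class name `LatticeNode`, its field names, and the helper `descendants` are hypothetical, and the three nodes built at the end hold made-up values.

```python
from dataclasses import dataclass, field


@dataclass
class LatticeNode:
    """Hypothetical lattice node; all field names are assumptions."""
    itemset: frozenset
    support: int
    is_minimal_generator: bool = False  # extra field 1
    is_closed: bool = False             # extra field 2
    children: list = field(default_factory=list)  # direct supersets


def descendants(node):
    """Collect every node reachable through child links, each one once."""
    seen, stack, found = set(), list(node.children), []
    while stack:
        current = stack.pop()
        if current.itemset in seen:
            continue
        seen.add(current.itemset)
        found.append(current)
        stack.extend(current.children)
    return found


# Made-up three-node chain: W -> CW -> ACW.
acw = LatticeNode(frozenset("ACW"), 4, is_closed=True)
cw = LatticeNode(frozenset("CW"), 5, is_closed=True, children=[acw])
w = LatticeNode(frozenset("W"), 5, is_minimal_generator=True, children=[cw])

reachable = descendants(w)
```

Following child links from a node visits only its supersets, which is why a lattice can avoid scanning the full list of frequent itemsets.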

The purpose of this paper is to mine MNAR based on the frequent closed itemsets lattice and to compare the result with the algorithm based on MFIL. Section 2 introduces basic concepts and related work. Section 3 presents an algorithm for mining MNAR using FCIL. Section 4 discusses our experimental results. Section 5 gives the conclusion and future work.

Let I = {i_1, i_2, …, i_n} be a set of items and T = {t_1, t_2, …, t_m} be a set of transaction identifiers (tids, or tidset) in a database D. The input database is a binary relation δ ⊆ I × T. If an item i occurs in a transaction t, we write (i, t) ∈ δ or iδt.

Example: In the example database, the second transaction can be represented as {Cδ2, Dδ2, Wδ2}.
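
The binary relation between items and transactions can be sketched in a few lines of Python. The toy database below is hypothetical: only the second transaction, {C, D, W}, is taken from the text, and the other rows, along with the names `DB`, `to_relation`, and `transaction`, are assumptions made for illustration.

```python
# Hypothetical toy database (tid -> items). Only transaction 2 = {C, D, W}
# is stated in the text; the other rows are made up for illustration.
DB = {
    1: {"A", "C", "T", "W"},
    2: {"C", "D", "W"},
    3: {"A", "C", "T", "W"},
    4: {"A", "C", "D", "W"},
    5: {"A", "C", "D", "T", "W"},
    6: {"C", "D", "T"},
}


def to_relation(db):
    """Return the set of (item, tid) pairs, one pair per occurrence."""
    return {(item, tid) for tid, items in db.items() for item in items}


def transaction(relation, tid):
    """Recover the items of one transaction from the relation."""
    return {item for item, t in relation if t == tid}


relation = to_relation(DB)
second = transaction(relation, 2)
```

Here `("C", 2) in relation` plays the role of the statement that item C occurs in transaction 2.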

Let D be a transaction database and X ⊆ I an itemset. The support of X, denoted σ(X), is the number of transactions in D that contain X.

An itemset X ⊆ I is called frequent if σ(X) ≥ minSup, where minSup is a minimum support threshold. A frequent itemset X is called a frequent closed itemset if there is no frequent itemset Y such that X ⊂ Y and σ(X) = σ(Y).
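
The definitions of support, frequent itemset, and frequent closed itemset can be checked directly on a small example. The sketch below is a brute-force illustration, not CHARM or any algorithm from the paper: it enumerates every candidate itemset, which is only feasible for tiny inputs. The toy database, the threshold of 3, and all function names are assumptions.

```python
from itertools import combinations

# Hypothetical toy database (tid -> items), made up for illustration.
DB = {
    1: {"A", "C", "T", "W"},
    2: {"C", "D", "W"},
    3: {"A", "C", "T", "W"},
    4: {"A", "C", "D", "W"},
    5: {"A", "C", "D", "T", "W"},
    6: {"C", "D", "T"},
}


def support(db, itemset):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for items in db.values() if itemset <= items)


def frequent_itemsets(db, min_sup):
    """Brute force: test every non-empty combination of items."""
    all_items = sorted(set().union(*db.values()))
    result = {}
    for size in range(1, len(all_items) + 1):
        for combo in combinations(all_items, size):
            candidate = frozenset(combo)
            sup = support(db, candidate)
            if sup >= min_sup:
                result[candidate] = sup
    return result


def closed_itemsets(fi):
    """Keep itemsets with no frequent proper superset of equal support."""
    return {
        x: sup_x
        for x, sup_x in fi.items()
        if not any(x < y and sup_x == sup_y for y, sup_y in fi.items())
    }


fi = frequent_itemsets(DB, min_sup=3)
fci = closed_itemsets(fi)
```

On this toy input the closed itemsets are far fewer than the frequent ones, which is the effect the text relies on when it generates rules from FCI instead of FI.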

Let X be a frequent closed itemset. X' ≠ ∅ is called a generator of X if and only if i) X' ⊆ X and ii) σ(X) = σ(X'). Let G(X) denote the set of generators of X. We say that X' ∈ G(X) is a minimal generator if it has no proper subset in G(X). Let mGs(X) denote the set of all minimal generators of X. By definition, mGs(X) ≠ ∅, since if X has no proper generator then X itself is a minimal generator (mG) of X.
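
Generators and minimal generators can be computed by brute force on a small example. This sketch follows the definition literally and is not how the paper or CHARM-L obtains them. The toy database and the names `generators` and `minimal_generators` are assumptions.

```python
from itertools import combinations

# Hypothetical toy database (tid -> items), made up for illustration.
DB = {
    1: {"A", "C", "T", "W"},
    2: {"C", "D", "W"},
    3: {"A", "C", "T", "W"},
    4: {"A", "C", "D", "W"},
    5: {"A", "C", "D", "T", "W"},
    6: {"C", "D", "T"},
}


def support(db, itemset):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for items in db.values() if itemset <= items)


def generators(db, closed):
    """All non-empty subsets of `closed` with the same support as `closed`."""
    target = support(db, closed)
    found = []
    for size in range(1, len(closed) + 1):
        for combo in combinations(sorted(closed), size):
            candidate = frozenset(combo)
            if support(db, candidate) == target:
                found.append(candidate)
    return found


def minimal_generators(db, closed):
    """Generators that have no proper subset among the generators."""
    gens = generators(db, closed)
    return [g for g in gens if not any(h < g for h in gens)]


mgs_actw = minimal_generators(DB, frozenset("ACTW"))
mgs_c = minimal_generators(DB, frozenset("C"))
```

For the single-item closed itemset {C}, the only generator is {C} itself, matching the remark that the set of minimal generators is never empty.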

Approaches to mining FCI fall into four categories [9, 24]: i) Test-and-generate (Close [15], A-Close [14]): these use a level-wise approach to discover FCI and are all based on the Apriori algorithm. ii) Divide-and-conquer (Closet [16], Closet+ [23], FPClose [6]): these use a compact data structure (extended from the FP-tree) to mine FCI. iii) Hybrid (CHARM [27], CloseMiner [18]): these use both test-and-generate and divide-and-conquer to mine FCI. They rely on the vertical data format, transforming the database into item-tidlists, and develop properties to quickly prune non-closed itemsets. iv) Hybrid without duplication (DCI-Close [12], LCM [19], PGMiner [13]): these differ from the hybrid methods in that they do not use “subsume checking”. Therefore, they do not need to store FCI in main memory and, unlike CHARM, do not need hash tables.
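
The vertical data format used by the hybrid methods can be illustrated as follows. This is a minimal sketch of the representation only, not of CHARM or its pruning properties. The toy database and the names `to_tidlists` and `tidset` are assumptions.

```python
# Hypothetical toy database (tid -> items), made up for illustration.
DB = {
    1: {"A", "C", "T", "W"},
    2: {"C", "D", "W"},
    3: {"A", "C", "T", "W"},
    4: {"A", "C", "D", "W"},
    5: {"A", "C", "D", "T", "W"},
    6: {"C", "D", "T"},
}


def to_tidlists(db):
    """Vertical format: map each item to the set of tids that contain it."""
    tidlists = {}
    for tid, items in db.items():
        for item in items:
            tidlists.setdefault(item, set()).add(tid)
    return tidlists


def tidset(tidlists, itemset):
    """Tidset of a non-empty itemset: intersection of its items' tidlists."""
    return set.intersection(*(tidlists[item] for item in itemset))


tidlists = to_tidlists(DB)
cw_tids = tidset(tidlists, frozenset("CW"))
```

In this format the support of an itemset is simply the size of its tidset, so no further pass over the transactions is needed.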

Mining MNAR was proposed in 1999 by Pasquier et al. [14, 15]. First, the authors mined all FCI by computing the closures of minimal generators. Then they mined all MNAR by generating rules with confidence = 100% from mGs(X) to X (where X is a frequent closed itemset) and rules with confidence < 100% from mGs(X) to Y (where X and Y are frequent closed itemsets and X ⊂ Y). In 2000, Zaki proposed a method for mining non-redundant association rules (NARs) [25]. It uses FCI and their mGs to mine NARs. This approach mines only the rules whose left-hand side and right-hand side are minimal among the rules that have the same support and confidence. In 2004, Zaki published an extended version of this work [26].
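
The two kinds of rules described above can be sketched as a pair of nested loops over the closed itemsets. This is a naive illustration under stated assumptions, not the algorithm of Pasquier et al. or the lattice-based algorithm of this paper. The consequent is taken to be the closed itemset minus the generator, the confidence of a rule from a generator of X to Y is taken to be support(Y) / support(X), and the `min_conf` parameter, the function name `generate_mnar`, and the hardcoded inputs (derived from a hypothetical toy database) are all assumptions.

```python
# Closed itemsets with supports, for a hypothetical toy database.
FCI = {
    frozenset("C"): 6,
    frozenset("CW"): 5,
    frozenset("CD"): 4,
    frozenset("CT"): 4,
    frozenset("ACW"): 4,
    frozenset("CDW"): 3,
    frozenset("ACTW"): 3,
}

# Minimal generators of each closed itemset in the same toy database.
MGS = {
    frozenset("C"): [frozenset("C")],
    frozenset("CW"): [frozenset("W")],
    frozenset("CD"): [frozenset("D")],
    frozenset("CT"): [frozenset("T")],
    frozenset("ACW"): [frozenset("A")],
    frozenset("CDW"): [frozenset("DW")],
    frozenset("ACTW"): [frozenset("AT"), frozenset("TW")],
}


def generate_mnar(fci, mgs, min_conf=0.0):
    """Return rules as (antecedent, consequent, support, confidence)."""
    rules = []
    for x, sup_x in fci.items():
        for g in mgs[x]:
            # Exact rule (confidence 100%): g -> X \ g, skipped when g == X
            # because the consequent would be empty.
            if g != x:
                rules.append((g, x - g, sup_x, 1.0))
            # Approximate rules: g -> Y \ g for closed Y strictly above X.
            for y, sup_y in fci.items():
                if x < y:
                    conf = sup_y / sup_x
                    if conf >= min_conf:
                        rules.append((g, y - g, sup_y, conf))
    return rules


rules = generate_mnar(FCI, MGS)
exact_rules = [r for r in rules if r[3] == 1.0]
```

The inner loop scans every closed itemset to find the supersets of X. A lattice replaces that scan with a walk over the descendants of X, which is the saving that lattice-based approaches aim for.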

Zaki and Hsiao proposed CHARM-L [27], which is an extension of CHARM, for building a frequent closed itemset lattice. We presented an extension of the Eclat algorithm [27] for building a frequent itemset lattice (FIL) [20]. A modification of the frequent itemset lattice for mining MNAR was also presented in [22].

In this paper, we extend the lattice-based approach for quickly

Reference

This content is AI-processed based on open access ArXiv data.
