A novel approach for fast mining frequent itemsets use N-list structure based on MapReduce
📝 Abstract
Frequent pattern mining is one of the most significant topics in data mining. In recent years, many algorithms have been proposed for mining frequent itemsets. One of them, the PrePost algorithm, mines frequent itemsets using the N-list data structure. PrePost is enhanced by implementing a compact PPC-tree as a general tree, and it can find frequent itemsets only once every node carries the required pre-order and post-order codes. In this chapter, we improve the PrePost algorithm on the Hadoop platform (HPrepost) using the MapReduce programming model. The main goal of the proposed method is to mine frequent itemsets efficiently, with less running time and memory usage. We have conducted experiments comparing the proposed scheme with other algorithms. On dense datasets, which have a large average transaction length, HPrepost is more effective than the compared frequent itemset algorithms in execution time and memory usage for all minimum support (min-sup) values. In general, our algorithm outperforms the compared algorithms in runtime and memory usage for small thresholds and large datasets.
📄 Content
A novel approach for fast mining frequent itemsets use N-list structure based on MapReduce
ARKAN A. G. AL-HAMODI1*, SONGFENG LU2
1 Research scholar, School of Computer Science, Huazhong University of Science and Technology, Wuhan 430074, PRC
2 Associate professor, School of Computer Science, Huazhong University of Science and Technology, Wuhan 430074, PRC
E-mail: arkan_almalky@yahoo.com, lusongfeng@hust.edu.cn
Keywords Data mining, Frequent itemsets, N-list, MapReduce
1 Introduction
Frequent pattern mining is one of the most important and popular research areas in association rule mining and data mining [1,2], and finding frequent itemsets has become a hot topic. Most of the proposed frequent itemset algorithms can be grouped into Apriori-based methods and FP-growth-based methods. The Apriori method scans the database repeatedly to find frequent itemsets and generates a large set of candidates [3]. The FP-growth method scans the database twice and mines frequent itemsets without generating candidates [4]. FP-growth uses the FP-tree data structure to store the database and employs a divide-and-conquer strategy to find frequent itemsets, which is much more efficient than the Apriori method.
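The level-wise behaviour attributed to the Apriori method above can be illustrated with a minimal sketch. This is not code from the paper or from [3]; the function name `apriori` and its parameters are hypothetical, and the sketch favours readability over speed. It shows the two costs named in the text: one database scan per itemset length, and an explicit candidate set at every level.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Illustrative level-wise miner (hypothetical helper, not the paper's code).

    Returns a dict mapping each frequent itemset (a frozenset) to its
    absolute support count.
    """
    transactions = [frozenset(t) for t in transactions]

    # First scan: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        # Candidate generation: join frequent (k-1)-itemsets and keep only
        # size-k unions whose (k-1)-subsets are all frequent.
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in frequent for sub in combinations(union, k - 1)
            ):
                candidates.add(union)

        # One additional database scan per level to count the candidates.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1

        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result
```

Calling `apriori([["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]], 3)` scans the database three times (for itemsets of length 1, 2, and 3) and builds a candidate set at each of the two later levels, which is the overhead that tree-based methods such as FP-growth avoid.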
For frequent itemset mining, Deng et al. [5,6] proposed two data structures, Node-list and N-list, to reduce mining time and memory usage. Both data structures are based on a prefix tree with encoded nodes, the PPC-tree, and both consume memory because every node must be encoded with its pre-order and post-order. The PrePost algorithm is based on the N-list. In this chapter, we present a new method, the HPrepost algorithm, which is based on the PPC-tree and runs under the MapReduce framework on the Hadoop platform, in order to mine frequent itemsets more efficiently and to reduce running time and memory usage.
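To make the pre-order and post-order encoding concrete, the following is a minimal single-machine sketch of a PPC-tree and the N-lists derived from it, following the usual description of the structures in [5,6]. It is not the HPrepost implementation and contains no MapReduce logic. All names (`Node`, `build_ppc_tree`, `n_lists`, `intersect`, `support`) are hypothetical, and `intersect` uses a simple quadratic loop where a linear merge would normally be used.

```python
from collections import Counter


class Node:
    """PPC-tree node: item, count, children, and pre/post-order ranks."""

    def __init__(self, item):
        self.item = item
        self.count = 0
        self.children = {}
        self.pre = None
        self.post = None


def build_ppc_tree(transactions, min_support):
    """Build the prefix tree from frequent items, then number its nodes."""
    freq = Counter(item for t in transactions for item in set(t))
    keep = {i for i, c in freq.items() if c >= min_support}
    root = Node(None)
    for t in transactions:
        node = root
        # Insert items in descending support order (ties broken by name).
        for item in sorted(set(t) & keep, key=lambda i: (-freq[i], i)):
            node = node.children.setdefault(item, Node(item))
            node.count += 1

    ranks = {"pre": 0, "post": 0}

    def number(node):
        node.pre = ranks["pre"]
        ranks["pre"] += 1
        for child in node.children.values():
            number(child)
        node.post = ranks["post"]
        ranks["post"] += 1

    number(root)
    return root


def n_lists(root):
    """Map each item to its (pre, post, count) codes in pre-order."""
    lists = {}

    def visit(node):
        if node.item is not None:
            lists.setdefault(node.item, []).append(
                (node.pre, node.post, node.count)
            )
        for child in node.children.values():
            visit(child)

    visit(root)
    return lists


def intersect(ancestor_list, descendant_list):
    """N-list of a 2-itemset: A is an ancestor of D iff
    A.pre < D.pre and A.post > D.post. Counts of matching descendants
    are accumulated on the ancestor's code."""
    merged = {}
    for pre_a, post_a, _ in ancestor_list:
        for pre_d, post_d, count_d in descendant_list:
            if pre_a < pre_d and post_a > post_d:
                key = (pre_a, post_a)
                merged[key] = merged.get(key, 0) + count_d
    return [(pre, post, count) for (pre, post), count in merged.items()]


def support(n_list):
    """Support of an itemset is the sum of the counts in its N-list."""
    return sum(count for _, _, count in n_list)
```

Under these assumptions, the support of a 2-itemset is obtained from the two N-lists alone, for example `support(intersect(lists["a"], lists["b"]))`, without rescanning the transactions. The memory cost mentioned above comes from storing a (pre, post, count) triple for every tree node.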
2 Related work
Previously proposed algorithms for mining frequent itemsets fall into three groups: candidate generation, frequent pattern growth, and hybrid approaches. In recent years, three kinds of structures have been proposed for finding frequent itemsets efficiently. The Node-list structure, proposed by Deng et al. [5], is based on the PPC-tree (pre-order post-order code tree). The N-list structure, proposed by Deng et al. [6], also requires each node of the PPC-tree to be encoded with its pre-order and post-order. Both structures are based on a prefix tree called the PPC-tree, and both are memory consuming because each node must be encoded with its pre-order and post-order.

An N-list-based mining algorithm called NAFCP was proposed by Tuong Le and Bay Vo [7]. Bay Vo et al. [8] presented NSFI, an enhanced N-list and subsume-based algorithm for mining frequent itemsets, which uses a hash table to improve the creation of the N-lists associated with 1-itemsets, together with an enhanced N-list intersection algorithm. The new algorithm is more effective at reducing memory usage and mining time. An improved version of the top-rank-k frequent pattern mining algorithm (NTK) was presented by Huynh et al. [9]. Vo et al. [10] proposed a hybrid algorithm based on PrePost; this improved PrePost algorithm uses a hash table to enhance the creation of the N-list data structure.

The MapReduce programming framework is a well-known technique for processing massive amounts of data [11,12]. Liao et al. [13] presented MRPrepost, a parallel algorithm adapted for mining big data on the Hadoop platform under MapReduce. The algorithm employs the N-list data structure and improves PrePost by adding a prefix pattern. An improved PrePost algorithm on the Hadoop platform was proposed by Thakare et al. [14]. The