A Prefixed-Itemset-Based Improvement For Apriori Algorithm
📝 Abstract
Association rule mining is a very important part of data mining. It is used to find interesting patterns in transaction databases. The Apriori algorithm is one of the most classical association rule algorithms, but its efficiency is a bottleneck. In this article, we propose a prefixed-itemset-based data structure for candidate itemset generation and use it to improve the efficiency of the classical Apriori algorithm.
📄 Content
Yu Shoujian1, Zhou Yiyang2
1 College of Computer Science and Technology, Donghua University, Shanghai, 201600, China (jackyysj@dhu.edu.cn)
2 College of Computer Science and Technology, Donghua University, Shanghai, 201600, China (yiyang0203@foxmail.com)
KEYWORDS: Data mining, association rules, Apriori algorithm, prefixed-itemset, hash map
1. INTRODUCTION
With the rapid adoption of computer technology across sectors, industries generate ever larger volumes of data, and extracting valuable information from such big data has become a new problem. Data mining, also known as knowledge discovery in databases, emerged against this backdrop. Data mining extracts implicit, previously unknown, and interesting knowledge and rules from large amounts of data [1]. Association rule mining is an important part of data mining. It was first proposed by R. Agrawal, mainly to find association rules between sets of items in customer transaction databases [2]. The following year, R. Agrawal proposed the most classical algorithm for computing association rules, the Apriori algorithm [3], which derives candidate (k+1)-itemsets from frequent k-itemsets.
However, because the Apriori algorithm suffers from a computational bottleneck when generating candidate sets, many improvements to the traditional algorithm have been proposed in recent years from different angles. Chun-Sheng Z proposed an improved Apriori algorithm based on classification [4]. Jia Y improved the algorithm through transaction database partitioning and dynamic itemset planning [5]. Shuangyue L proposed an improved algorithm based on a matrix representation of the database to speed up computation [6]. Wang P proposed an optimization that reduces the number of scans of the transaction database [7]. Vaithiyanathan V compressed transactions with similar interests in the database to improve the efficiency of the algorithm [8]. Lin X implemented the Apriori algorithm on MapReduce to generate candidate sets more efficiently for large amounts of data [9]. Zhang first analyzed the characteristics of medical data and then used them to improve the Apriori algorithm [10]. Wu Huan proposed an improved algorithm, IAA, which adopts a new count-based method to prune candidate itemsets and uses generation records to reduce the total amount of data scanned [11]. Wang Yuan proposed an improved item-constrained association rule mining algorithm that improves the traditional algorithm in two aspects: trimming frequent itemsets and calculating candidate itemsets [12]. Lin Ming-Yen proposed three algorithms, SPC, FPC, and DPC, to investigate effective implementations of the Apriori algorithm in the MapReduce framework [13]. Chai Sheng proposed a novel algorithm called Reduced Apriori Algorithm with Tag (RAAT), which eliminates one redundant pruning operation on C2 [14].
This article focuses on two concrete steps of the classical Apriori algorithm, namely the connecting step and the pruning step, and improves their efficiency by using a new prefixed-itemset-based storage structure combined with the fast lookup of hash tables. The paper first describes the classical Apriori algorithm and its shortcomings, then details the improvements, and finally compares the efficiency of the classical and the improved Apriori algorithms on specific data sets.
2. APRIORI ALGORITHM
2.1. Apriori algorithm introduction
The Apriori algorithm is a classical algorithm for mining the frequent itemsets from which association rules are derived. Its basic idea is a layer-by-layer iterative search for frequent itemsets: the algorithm first obtains the frequent k-itemsets and then uses them to explore the (k+1)-itemsets. The search relies on a priori knowledge about frequent itemsets, namely that every subset of a frequent itemset is also frequent. Using this property, the algorithm first finds the set of frequent 1-itemsets, denoted L1. It then uses L1 to find L2, L2 to find L3, and so on, until no more frequent k-itemsets can be found. The Apriori algorithm mainly consists of the following three steps:
(1) Connecting step: frequent k-itemsets are joined to generate candidate (k+1)-itemsets, denoted by C_{k+1}. The connection condition
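The level-wise search of the classical Apriori algorithm (L1, then L2, and so on, with subset-based pruning) can be sketched in Python as follows. This is a minimal illustration of the classical algorithm only, not the improved version proposed in this article; the function name `apriori`, the representation of itemsets as sorted tuples, and treating `min_support` as an absolute transaction count are assumptions made for the example. The join here simply unions every pair of frequent k-itemsets, which is easy to follow but does redundant work.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Classical level-wise Apriori (illustrative sketch, hypothetical API).

    Itemsets are sorted tuples; min_support is an absolute count.
    Returns a dict mapping every frequent itemset to its support count.
    """
    transactions = [frozenset(t) for t in transactions]

    # First pass: count single items to obtain L1, the frequent 1-itemsets.
    counts = {}
    for t in transactions:
        for item in t:
            counts[(item,)] = counts.get((item,), 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 1
    while current:
        prev = set(current)
        candidates = set()
        # Join: union pairs of frequent k-itemsets that form a (k+1)-itemset.
        for a, b in combinations(sorted(prev), 2):
            union = tuple(sorted(set(a) | set(b)))
            if len(union) == k + 1:
                # Prune: by the a priori property, every k-subset must be frequent.
                if all(sub in prev for sub in combinations(union, k)):
                    candidates.add(union)
        # One database scan counts the surviving candidates.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if t.issuperset(c):
                    counts[c] += 1
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1
    return frequent


tx = [
    ["bread", "milk"],
    ["bread", "diaper", "beer", "eggs"],
    ["milk", "diaper", "beer", "cola"],
    ["bread", "milk", "diaper", "beer"],
    ["bread", "milk", "diaper", "cola"],
]
result = apriori(tx, 3)
```

On these five toy transactions with `min_support=3`, the sketch returns eight itemsets: four frequent 1-itemsets and four frequent 2-itemsets such as `('bread', 'milk')`. The only 3-itemset that survives pruning, `('bread', 'diaper', 'milk')`, occurs in just two transactions and is therefore discarded.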
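In the classical formulation of the connecting step, two frequent k-itemsets (kept in sorted order) are joined only when their first k-1 items are equal and their last items differ. A candidate is then kept only if all of its k-subsets are frequent. Grouping itemsets in a hash map keyed by their (k-1)-item prefix makes that join condition cheap to check, because only itemsets in the same bucket can be joined. The sketch below is a hypothetical illustration in the spirit of the prefixed-itemset storage and hash-table lookup this article describes, not a reproduction of the authors' data structure; the function `generate_candidates` and its bucket layout are our own assumptions.

```python
from collections import defaultdict
from itertools import combinations


def generate_candidates(frequent_k):
    """Build C_{k+1} from frequent k-itemsets given as sorted tuples (k >= 1).

    Hypothetical sketch: itemsets are bucketed by their (k-1)-prefix so the
    join condition is evaluated only inside a bucket.
    """
    frequent_k = set(frequent_k)
    if not frequent_k:
        return set()
    k = len(next(iter(frequent_k)))

    # Hash map: (k-1)-item prefix -> last items of itemsets sharing that prefix.
    buckets = defaultdict(list)
    for itemset in frequent_k:
        buckets[itemset[:-1]].append(itemset[-1])

    candidates = set()
    for prefix, last_items in buckets.items():
        last_items.sort()
        # Connecting step: any two itemsets with the same prefix join into a (k+1)-itemset.
        for x, y in combinations(last_items, 2):
            candidate = prefix + (x, y)
            # Pruning step: each k-subset must be frequent (average O(1) set lookup).
            if all(sub in frequent_k for sub in combinations(candidate, k)):
                candidates.add(candidate)
    return candidates


l2 = {("beer", "diaper"), ("bread", "diaper"), ("bread", "milk"), ("diaper", "milk")}
c3 = generate_candidates(l2)
```

Here only the bucket for the prefix `('bread',)` holds two itemsets, so a single join is attempted, producing `('bread', 'diaper', 'milk')`; all three of its 2-subsets are in `l2`, so it survives pruning. The design choice is that pairs from different buckets are never compared, whereas the naive approach tests every pair of frequent k-itemsets.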
This content is AI-processed based on ArXiv data.