Deep Dive into Similarity Data Item Set Approach: An Encoded Temporal Data Base Technique.
JOURNAL OF COMPUTING, VOLUME 2, ISSUE 3, MARCH 2010, ISSN 2151-9617
HTTPS://SITES.GOOGLE.COM/SITE/JOURNALOFCOMPUTING/
Similarity Data Item Set Approach: An
Encoded Temporal Data Base Technique
M.S.Danessh, C. Balasubramanian and K. Duraiswamy
Abstract: Data mining has been widely recognized as a powerful tool to explore added value from large-scale databases.
Finding frequent item sets in databases is a crucial step in the data mining process of extracting association rules. Many
algorithms have been developed to find frequent item sets. This paper presents a summary and a comparative study of the
available FP-growth algorithm variations for mining frequent item sets, showing their capabilities and efficiency in terms of
time and memory consumption on association rule mining by taking application-specific information into account. It
proposes an FP-tree growth algorithm based on the pattern-growth mining paradigm, which employs a tree structure to
compress the database. The performance study shows that the anti-FP-growth method is efficient and scalable for mining
both long and short frequent patterns, is about an order of magnitude faster than the Apriori algorithm, and is also faster
than some recently reported frequent-pattern mining methods.
Keywords: Encoding method, frequent pattern mining, FP growth, FP tax, anti FP growth algorithm
1 INTRODUCTION
One of the currently fastest and most popular algorithms
for frequent item set mining is the FP-growth algorithm.
It is based on a prefix tree representation of the given
database of transactions (called an FP-tree), which can
save considerable amounts of memory for storing the
transactions. The basic idea of the FP-growth algorithm
can be described as a recursive elimination scheme: in a
preprocessing step, delete all items from the transactions
that are not frequent individually, i.e., that do not appear
in a user-specified minimum number of transactions. Then
select all transactions that contain the least frequent item
(least frequent among those that are frequent) and delete
this item from them. Recurse to process the obtained
reduced (also known as projected) database, remembering
that the item sets found in the recursion share the deleted
item as a prefix. On return, remove the processed item from
the database of all transactions as well and start over, i.e.,
process the second frequent item, etc. In these processing
steps the prefix tree, which is enhanced by links between
the branches, is exploited to quickly find the transactions
containing a given item and also to remove this item from
the transactions after it has been processed [4][7].
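The recursive elimination scheme described above can be sketched in a few lines of Python. This is only an illustrative sketch under simplifying assumptions: it works on plain sets of items rather than on an FP-tree, so it shows the recursion but not the memory savings, and the name frequent_item_sets and its parameters are hypothetical, not taken from the paper.

```python
from collections import Counter


def frequent_item_sets(transactions, min_support):
    """Return {frozenset(items): support} by recursive elimination.

    Hypothetical helper for illustration; it uses plain sets instead of
    the FP-tree that the real algorithm relies on.
    """
    results = {}

    def recurse(db, prefix):
        # Count in how many transactions of this (projected) database
        # each item occurs.
        counts = Counter(item for t in db for item in t)
        frequent = {i for i, c in counts.items() if c >= min_support}
        # Preprocessing: delete items that are not frequent individually.
        db = [t & frequent for t in db]
        db = [t for t in db if t]
        # Process items from least to most frequent (ties broken by name).
        for item in sorted(frequent, key=lambda i: (counts[i], i)):
            results[frozenset(prefix | {item})] = counts[item]
            # Projected database: transactions containing `item`,
            # with `item` itself deleted.
            projected = [t - {item} for t in db if item in t]
            # Item sets found in the recursion share `item` as a prefix.
            recurse(projected, prefix | {item})
            # On return, remove the processed item from all transactions.
            db = [t - {item} for t in db]
            db = [t for t in db if t]

    recurse([frozenset(t) for t in transactions], frozenset())
    return results
```

For the toy transactions {a,b,c}, {a,b}, {a,c}, {b,c}, {a} and a minimum support of 2, this sketch reports the three single items and the three pairs as frequent, while {a,b,c} (support 1) is eliminated.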

The Apriori heuristic achieves good performance by
(possibly significantly) reducing the size of candidate
sets [3]. However, in situations with a large number of
frequent patterns, long patterns, or quite low minimum
support thresholds, candidate generation can still be
costly. For such situations a compact data structure,
called the frequent-pattern tree, or FP-tree for short, is
constructed, which is an extended prefix-tree structure
storing crucial, quantitative information about frequent
patterns. To ensure that the tree structure is compact and
informative, only frequent length-1 items will have nodes
in the tree, and the tree nodes are arranged in such a way
that more frequently occurring nodes will have better
chances of node sharing than less frequently occurring
ones. Experiments show that such a tree is compact, and
it is sometimes orders of magnitude smaller than the
original database [7]. Subsequent frequent-pattern mining
will only need to work on the FP-tree instead of the whole
data set. The properties of the FP-tree are thoroughly
studied in [10], which also points out that, although it is
often compact, the FP-tree may not always be minimal.
Some optimizations are proposed to speed up FP-growth;
for example, a technique to handle single-path FP-trees
has been further developed for performance improvements.
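To make the two construction rules above concrete (only frequent length-1 items get nodes, and more frequent items are placed closer to the root so that they share nodes), the following minimal Python sketch builds such a tree. The names FPNode, build_fp_tree and count_nodes, and the alphabetical tie-breaking rule, are assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter


class FPNode:
    """One tree node: an item, a counter, and links to parent and children."""

    def __init__(self, item, parent):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children = {}


def build_fp_tree(transactions, min_support):
    """Build a prefix tree over the frequent items (illustrative sketch)."""
    # First scan: support of every single item.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in counts.items() if c >= min_support}
    root = FPNode(None, None)
    # Header table: item -> all nodes carrying that item (the "links
    # between the branches").
    header = {item: [] for item in frequent}
    # Second scan: insert each transaction, most frequent item first.
    for t in transactions:
        ordered = sorted((i for i in set(t) if i in frequent),
                         key=lambda i: (-frequent[i], i))  # assumed tie-break
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += 1  # shared prefixes only bump a counter
            node = child
    return root, header


def count_nodes(node):
    """Number of nodes below `node` (the root itself is not counted)."""
    return sum(1 + count_nodes(c) for c in node.children.values())
```

On the toy transactions {a,b,c}, {a,b}, {a,c}, {b,c}, {a} with a minimum support of 2, the tree holds 6 nodes for 10 item occurrences, which illustrates on a very small scale the compression effect the text refers to.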

A database projection method has been developed in
Section 2 to cope with the situation when an FP-tree
cannot be held in main memory, a case that may happen
in a very large database.
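The excerpt does not show the projection method itself, so the following Python sketch is only one plausible reading of the idea: split the database into smaller per-item projected databases, each of which could then be mined on its own. The function name project_database, the ordering rule and the use of in-memory lists (rather than files on disk) are all assumptions for illustration.

```python
from collections import Counter, defaultdict


def project_database(transactions, min_support):
    """Split a database into per-item projected databases (sketch only).

    For every frequent item, collect the frequent items that precede it
    in each frequency-ordered transaction. Hypothetical helper; the
    paper's own method may differ.
    """
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in counts.items() if c >= min_support}
    projected = defaultdict(list)
    for t in transactions:
        # Keep frequent items only, most frequent first (assumed tie-break
        # by name).
        ordered = sorted((i for i in set(t) if i in frequent),
                         key=lambda i: (-frequent[i], i))
        # The prefix before each item goes to that item's projected database.
        for pos in range(1, len(ordered)):
            projected[ordered[pos]].append(ordered[:pos])
    return dict(projected)
```

Each projected database is smaller than the original one, so a tree built from it is more likely to fit in main memory; in a real setting the lists would presumably be written to disk and processed one at a time.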

Extensive experimental results have been reported.
They examine the size of the FP-tree as well as the
turning point of FP-growth on data projection to
building an FP-tree [9]. The main step is described in
Section 3, namely how an FP-tree is projected in order
- PG student, Dept. of CSE, K.S.R. College of Technology, Tiruchengode, Tamilnadu, India.
- Asst. Professor, Dept. of CSE, K.S.R. College of Technology, Tiruchengode, Tamilnadu, India.
- Dean (Academic), Dept. of CSE, K.S.R. College of Technology, Tiruchengode, Tamilnadu, India.
to obtain an FP-tree of the (sub)database containing the
transactions with a specific item (though with
…(Full text truncated)…