Similarity Data Item Set Approach: An Encoded Temporal Data Base Technique


šŸ“ Original Info

  • Title: Similarity Data Item Set Approach: An Encoded Temporal Data Base Technique
  • ArXiv ID: 1003.4076
  • Date: 2010-03-23
  • Authors: M. S. Danessh, C. Balasubramanian, K. Duraiswamy

šŸ“ Abstract

Data mining has been widely recognized as a powerful tool for extracting added value from large-scale databases. Finding frequent item sets is a crucial step in the data mining process of extracting association rules, and many algorithms have been developed for it. This paper presents a summary and a comparative study of the available FP-growth algorithm variants for mining frequent item sets, showing their capabilities and efficiency in terms of time and memory consumption in association rule mining, taking application-specific information into account. It proposes an FP-tree growth algorithm based on the pattern-growth mining paradigm, which employs a tree structure to compress the database. The performance study shows that the anti-FP-growth method is efficient and scalable for mining both long and short frequent patterns, is about an order of magnitude faster than the Apriori algorithm, and is also faster than some recently reported frequent-pattern mining methods.

Deep Analysis

Deep Dive into Similarity Data Item Set Approach: An Encoded Temporal Data Base Technique.


Full Content

JOURNAL OF COMPUTING, VOLUME 2, ISSUE 3, MARCH 2010, ISSN 2151-9617 HTTPS://SITES.GOOGLE.COM/SITE/JOURNALOFCOMPUTING/

Similarity Data Item Set Approach: An Encoded Temporal Data Base Technique
M. S. Danessh, C. Balasubramanian and K. Duraiswamy

Keywords: Encoding method, frequent pattern mining, FP growth, FP tax, anti FP growth algorithm

1 INTRODUCTION

One of the currently fastest and most popular algorithms for frequent item set mining is the FP-growth algorithm. It is based on a prefix-tree representation of the given database of transactions (called an FP-tree), which can save considerable amounts of memory for storing the transactions. The basic idea of the FP-growth algorithm can be described as a recursive elimination scheme. In a preprocessing step, delete from the transactions all items that are not individually frequent, i.e., that do not appear in a user-specified minimum number of transactions. Then select all transactions that contain the least frequent item (least frequent among those that are frequent) and delete this item from them. Recurse to process the obtained reduced (also known as projected) database, remembering that the item sets found in the recursion share the deleted item as a prefix. On return, remove the processed item from the database of all transactions as well and start over, i.e., process the second frequent item, and so on. In these processing steps the prefix tree, which is enhanced by links between the branches, is exploited to quickly find the transactions containing a given item and also to remove this item from the transactions after it has been processed [4][7].
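The recursive elimination scheme above can be sketched in Python. This is a minimal sketch, not the authors' implementation, assuming transactions are lists of hashable item labels and `min_support` is an absolute transaction count; the function name `mine_recursive` is hypothetical. For clarity it works on plain sets of items rather than on the FP-tree, so it shows the preprocessing, the projected databases and the "remove the processed item and start over" step, but not the memory savings of the prefix tree.

```python
from collections import Counter
from typing import Dict, FrozenSet, List


def mine_recursive(transactions: List[List[str]], min_support: int,
                   prefix: FrozenSet[str] = frozenset()) -> Dict[FrozenSet[str], int]:
    """Hypothetical sketch of recursive elimination over projected databases."""
    result: Dict[FrozenSet[str], int] = {}
    # Preprocessing: count each item once per transaction, keep frequent ones.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {item: c for item, c in counts.items() if c >= min_support}
    # Delete individually infrequent items from every transaction.
    db = [{item for item in t if item in frequent} for t in transactions]
    # Process items from least to most frequent (ties broken by label).
    for item in sorted(frequent, key=lambda i: (frequent[i], i)):
        itemset = prefix | {item}
        # Removing other items never changes how many transactions hold `item`.
        result[itemset] = frequent[item]
        # Projected database: transactions containing `item`, with `item` deleted.
        projected = [list(t - {item}) for t in db if item in t]
        # Item sets found in the recursion share `itemset` as a prefix.
        result.update(mine_recursive(projected, min_support, itemset))
        # On return, remove the processed item from all transactions.
        db = [t - {item} for t in db]
    return result


if __name__ == "__main__":
    db = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
    found = mine_recursive(db, min_support=2)
    for itemset, support in sorted(found.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), support)
```

Because each processed item is removed before the next one is handled, every frequent item set is reported exactly once, under its least frequent item.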
The Apriori heuristic achieves good performance by (possibly significantly) reducing the size of candidate sets [3]. However, in situations with a large number of frequent patterns, long patterns, or quite low minimum support thresholds, an Apriori-like algorithm can still be costly. For such cases a compact data structure, called a frequent-pattern tree (FP-tree for short), is constructed; it is an extended prefix-tree structure that stores crucial, quantitative information about frequent patterns. To ensure that the tree structure is compact and informative, only frequent length-1 items have nodes in the tree, and the tree nodes are arranged so that more frequently occurring nodes have better chances of node sharing than less frequently occurring ones. Experiments show that such a tree is compact and is sometimes orders of magnitude smaller than the original database [7]. Subsequent frequent-pattern mining then only needs to work on the FP-tree instead of the whole data set. The properties of the FP-tree have been studied thoroughly [10], and it has also been pointed out that, although it is often compact, an FP-tree may not always be minimal. Some optimizations have been proposed to speed up FP-growth; for example, a technique for handling a single-path FP-tree has been further developed for performance improvements.

A database projection method is developed in Section 2 to cope with the situation in which an FP-tree cannot be held in main memory, a case that may happen with a very large database.

Extensive experimental results have been reported that examine the size of the FP-tree as well as the turning point at which FP-growth switches from database projection to building an FP-tree [9]. The main step is described in Section 3, namely how an FP-tree is projected in order to obtain an FP-tree of the (sub)database containing the transactions with a specific item (though with

  1. PG student, Dept. of CSE, K. S. R. College of Technology, Tiruchengode, Tamilnadu, India.
  2. Asst. Professor, Dept. of CSE, K. S. R. College of Technology, Tiruchengode, Tamilnadu, India.
  3. Dean (Academic), Dept. of CSE, K. S. R. College of Technology, Tiruchengode, Tamilnadu, India.

…(Full text truncated)…
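The FP-tree construction described in the introduction (only frequent length-1 items get nodes, items are ordered by descending support so that frequent items share prefix nodes, and nodes of the same item are linked) can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the authors' code: transactions are lists of string labels, support ties are broken alphabetically, and the names `FPNode`, `build_fp_tree`, `conditional_pattern_base` and `count_nodes` are hypothetical. It covers tree construction and node-link traversal only, not the full mining recursion, the database projection method or the anti-FP-growth variant.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


class FPNode:
    """One prefix-tree node: an item, its count, its parent and children."""

    def __init__(self, item: Optional[str], parent: Optional["FPNode"]):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children: Dict[str, "FPNode"] = {}
        self.next: Optional["FPNode"] = None  # node-link to next node of same item


def build_fp_tree(transactions: List[List[str]], min_support: int):
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in counts.items() if c >= min_support}
    root = FPNode(None, None)
    header: Dict[str, FPNode] = {}  # item -> head of its node-link chain
    for t in transactions:
        # Keep only frequent length-1 items, most frequent first, so that
        # frequently occurring items sit near the root and share nodes.
        path = sorted({i for i in t if i in frequent}, key=lambda i: (-frequent[i], i))
        node = root
        for item in path:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                child.next = header.get(item)  # prepend to the node-link chain
                header[item] = child
            child.count += 1
            node = child
    return root, header, frequent


def conditional_pattern_base(header: Dict[str, FPNode], item: str) -> List[Tuple[List[str], int]]:
    """Follow node-links for `item`; collect (prefix path, count) pairs."""
    base = []
    node = header.get(item)
    while node is not None:
        path = []
        parent = node.parent
        while parent is not None and parent.item is not None:
            path.append(parent.item)
            parent = parent.parent
        if path:
            base.append((list(reversed(path)), node.count))
        node = node.next
    return base


def count_nodes(node: FPNode) -> int:
    """Number of item nodes below `node` (the root itself is not counted)."""
    return sum(1 + count_nodes(child) for child in node.children.values())


if __name__ == "__main__":
    db = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"], ["a", "d"]]
    root, header, frequent = build_fp_tree(db, min_support=2)
    print(count_nodes(root))  # 6 nodes for 13 filtered item occurrences
    print(conditional_pattern_base(header, "c"))
```

In this toy database the infrequent item `d` gets no node, and the 13 remaining item occurrences are stored in 6 shared nodes. The conditional pattern base gathered through the node-links for `c` is exactly the projected database for `c` that the recursive scheme would process next.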

Reference

This content is AI-processed based on ArXiv data.
