Comparative Study Of Data Mining Query Languages

📝 Original Info

  • Title: Comparative Study Of Data Mining Query Languages
  • ArXiv ID: 1701.08190
  • Date: 2017-01-31
  • Authors: Mohamed Anis Bach Tobji

📝 Abstract

Since the formulation of the Inductive Database (IDB) problem, several Data Mining (DM) languages have been proposed, confirming that the KDD process can be supported by answering inductive queries (IQ). This paper reviews the existing DM languages. We present the important primitives of a DM language and classify the languages according to how well they satisfy these primitives. In addition, we present each language's syntax and apply it to a sample database to test a set of KDD operations. This study highlights the languages' capabilities and limits, which is useful for future work.

📄 Full Content

COMPARATIVE STUDY OF DATA MINING QUERY LANGUAGES

Mohamed Anis Bach Tobji
LARODEC Laboratory, Institut Supérieur de Gestion de Tunis
41 rue de la liberté Bouchoucha, 2000, Tunis - Tunisia

ABSTRACT

Since the formulation of the Inductive Database (IDB) problem, several Data Mining (DM) languages have been proposed, confirming that the KDD process can be supported by answering inductive queries (IQ). This paper reviews the existing DM languages. We present the important primitives of a DM language and classify the languages according to how well they satisfy these primitives. In addition, we present each language's syntax and apply it to a sample database to test a set of KDD operations. This study highlights the languages' capabilities and limits, which is useful for future work.

KEYWORDS

Knowledge Discovery from Databases, Inductive Database, Data Mining Languages.

1. INTRODUCTION

The IDB is a new generation of database introduced in (Imielinski and Mannila, 1996) as a framework for KDD (Fayyad et al, 1996). An inductive database contains both data and the patterns extracted from that data. Conventional databases are generally supported by the SQL language, whereas IDBs are supported by a DM query language, which allows KDD operations (mainly data selection, data preprocessing, pattern mining and pattern post-processing).

The development of a theoretical framework is interesting and has been the subject of much research (Boulicaut et al, 1999), (De Raedt, 2003), (Dan Lee and De Raedt, 2003), (De Raedt et al, 2004). However, there is still no clear definition or formalization, such as an algebra, that could serve as the basis for a standard DM query language. In fact, the KDD community would like to reproduce the success of SQL, which is based on Codd's algebra (Codd, 1970). In this paper we study the existing DM languages to identify their advantages and limits.

The paper is organized as follows. In section 2 we present the essential DM query language primitives.
In section 3 we compare six existing DM query languages using a taxonomy based on how well they satisfy these primitives. In section 4, we show the languages in action: we give a small database and perform some data mining operations using queries in each language. Finally, in section 5, we discuss the study and give some perspectives related to the weaknesses of the existing languages.

2. INDUCTIVE QUERY LANGUAGE PRIMITIVES

Defining the primitives of a data mining query language is a basic problem. Once the primitives are defined, designing a good DM query language becomes easier. In this section we give the primitives as defined in (Han and Kamber, 2000), (Botta et al, 2004), and the language papers (Imielinski and Virmani, 1999), (Meo et al, 2002), (Han et al, 1996), (Morzy and Zakrzewic, 1997), (Netz et al, 2000) and (Elfeky et al, 2000).

A data mining query language must offer:

- Data selection: the language must provide data selection queries. This is naturally satisfied if the language nests SQL.
- Pre-processing task: providing pre-processing operations (sampling, discretization, data cleaning etc).
- Specifying the data mining task: mining several kinds of patterns (decision trees, sequential rules, association rules etc).
- Specification of background knowledge: background knowledge is information about the application field. This primitive lets the data miner specify domain knowledge, which improves the quality of the mined knowledge. The concept hierarchy is the most widely used form of background knowledge (Han and Kamber, 2000).
- Specification of mining constraints: specifying the set of constraints that the patterns must satisfy.
- Closure property: the result of a data mining query can itself be queried again, as in SQL.
- Post-processing task: the user should be able to query the extracted patterns, cross patterns with data etc.

3. THE COMPARATIVE TABLE

In this section, we study six DM query languages.
We present these languages according to a set of properties corresponding to the primitives defined in the previous section. We classify the languages in a table whose rows correspond to properties and whose columns correspond to languages. Each cell (at the intersection of a property Pi and a language Lj) gives the degree to which language Lj satisfies property Pi (see table 1). Table 1 contains two parts. The first describes each language in general terms (authors, design, year etc). The second presents the functionalities provided by each language, as explained at the top.

4. DATA MINING QUERY LANGUAGES IN ACTION

In this section, we explore the capabilities of the DM query languages and present the syntax of each language. In addition, we set up an example database of supermarket sales (see table 2) and try to write queries that perform the KDD operations revolving around the DM step, mainly to extract association rules, since all the languages support mining them.

4.1. MSQL

MSQL has four main queries:
- Create

…(Full text truncated)…
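
Section 4 above names association rule extraction from a supermarket sales database as the task that all the surveyed languages support. As a language-neutral illustration of what such an inductive query computes, the following is a minimal Python sketch, not the syntax of MSQL or of any other surveyed language. The transactions are hypothetical (they are not the paper's table 2, which is not reproduced here), and the function names and the thresholds `min_support` and `min_confidence` are assumptions made for the example. The thresholds play the role of the mining constraints primitive from section 2.

```python
from itertools import combinations

# Hypothetical supermarket baskets (NOT the paper's table 2).
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread", "milk", "butter", "jam"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) search; returns {itemset: support}."""
    items = sorted(set().union(*transactions))
    level = [frozenset([item]) for item in items]
    frequent = {}
    while level:
        # Keep only the candidates that meet the support constraint.
        kept = {}
        for candidate in level:
            s = support(candidate, transactions)
            if s >= min_support:
                kept[candidate] = s
        frequent.update(kept)
        # Join step: merge pairs of frequent k-itemsets into (k+1)-itemsets.
        joined = {a | b for a, b in combinations(list(kept), 2)
                  if len(a | b) == len(a) + 1}
        # Prune step: every k-subset of a candidate must itself be frequent.
        level = [c for c in joined
                 if all(frozenset(sub) in kept
                        for sub in combinations(c, len(c) - 1))]
    return frequent


def association_rules(transactions, min_support, min_confidence):
    """Return (body, head, support, confidence) for every rule body => head."""
    frequent = frequent_itemsets(transactions, min_support)
    rules = []
    for itemset, supp in frequent.items():
        if len(itemset) < 2:
            continue
        for size in range(1, len(itemset)):
            for body in combinations(sorted(itemset), size):
                body = frozenset(body)
                # Subsets of a frequent itemset are frequent, so the lookup is safe.
                confidence = supp / frequent[body]
                if confidence >= min_confidence:
                    rules.append((body, itemset - body, supp, confidence))
    return rules


if __name__ == "__main__":
    for body, head, supp, conf in association_rules(TRANSACTIONS, 0.5, 0.7):
        print(f"{sorted(body)} => {sorted(head)} "
              f"(support={supp:.2f}, confidence={conf:.2f})")
```

Returning the rules as plain tuples keeps the result queryable by ordinary Python code, which loosely mirrors the closure and post-processing primitives: the output of the mining step can be filtered or joined with the data again.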

Reference

This content is AI-processed based on ArXiv data.
