Ontology Based Pivoted Normalization Using Vector-Based Approach for Information Retrieval

📝 Abstract

The proposed methodology is procedural, i.e. it follows a finite number of steps to extract the documents relevant to a user's query. It is based on the principles of Data Mining for analyzing web data: Data Mining first integrates data to build a data warehouse and then extracts useful information from it with the help of an algorithm. The extracted documents are represented using a Vector Based Statistical Approach, which represents each document as a set of terms.

📄 Content

Ontology Based Pivoted Normalization Using Vector-Based Approach for Information Retrieval

Vishal Jain¹ and Dr. Mayank Singh²
¹Research Scholar, Computer Science and Engineering Department, Lingaya's University, Faridabad
²Associate Professor, Computer Science and Engineering Department, Lingaya's University, Faridabad
¹vishaljain83@ymail.com, ²mayanksingh2005@gmail.com

ABSTRACT
The sheer number of documents on the web leaves users in a dilemma: they cannot tell which documents are relevant. Relevance measures how closely a document matches the given query. Many information extraction techniques have been used to extract documents, but none of them has solved this problem satisfactorily. This paper deals with the classification, analysis and extraction of web documents using an information extraction method called the Ontology Based Web Content Mining Methodology. We have evaluated the proposed methodology in two specific domains: the weather domain (web pages containing information about weather forecasting and analysis) and the Google™ collection (web pages containing news). The proposed methodology is procedural, i.e. it follows a finite number of steps to extract the documents relevant to a user's query. It is based on the principles of Data Mining for analyzing web data: Data Mining first integrates data to build a data warehouse and then extracts useful information from it with the help of an algorithm. The extracted documents are represented using a Vector Based Statistical Approach, which represents each document as a set of terms.

Keywords: Data Mining, Ontology, Ontology Based Web Content Mining Methodology, WordNet, Vector Based Approach
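The abstract says extracted documents are represented with a Vector Based Statistical Approach, each document being a set of terms. The sketch below illustrates that idea under simple assumptions: whitespace tokenization, raw term counts as weights, and cosine similarity as the relevance score. It is not the paper's implementation; the exact term weighting (including the pivoted normalization named in the title) is not described in this excerpt, and the function names and sample documents are hypothetical.

```python
import math
from collections import Counter


def term_vector(text: str) -> Counter:
    """Represent a document as a bag of lower-cased terms with raw counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items())  # Counter returns 0 for missing terms
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query: str, docs: dict[str, str]) -> list[tuple[str, float]]:
    """Return (doc_id, score) pairs sorted from most to least relevant."""
    q = term_vector(query)
    scores = [(doc_id, cosine(q, term_vector(text))) for doc_id, text in docs.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical pages from the two evaluation domains.
docs = {
    "weather-1": "rain expected tomorrow with heavy rain in the north",
    "news-1": "election results announced in the capital",
    "weather-2": "sunny weather forecast for the weekend",
}
for doc_id, score in rank("rain forecast", docs):
    print(f"{doc_id}: {score:.3f}")
```

With raw counts, `weather-1` outranks `weather-2` partly because the query term "rain" occurs twice in it; weighting schemes such as TF-IDF or document-length normalization exist to adjust exactly this kind of effect.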

1. INTRODUCTION
Data Mining is also known as Knowledge Discovery in Databases (KDD) [1]. It is a multidisciplinary field that draws on areas such as Database Systems, Information Retrieval (IR) and Machine Learning.
Prediction and Description are considered the two goals of Data Mining. Prediction uses some variables or records in a database to predict the future values of other variables, while Description finds useful patterns that describe the given data.

Figure 1: KDD Process [2] (stages: Initial Data, Target Data, Pre-Processed Data, Final Data, Model; phases P1 to P5)

Building an ontology requires the attention of a domain expert, because the ontology represents the concepts of a given domain and the relations between them. Many algorithms, such as Naïve Bayes and K-Means, are used to extract and discover knowledge from structured data. The proposed methodology builds an ontology for a given domain using the phases of data mining, such as data preparation, data mapping and extracting knowledge from the mapped data. A classification algorithm is then used to write the generated ontology, which is expressed in OWL and XML.
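The paragraph above says the generated ontology is expressed in OWL and XML. The sketch below shows what such output can look like: it serializes a small, hypothetical weather-domain ontology (classes linked by `rdfs:subClassOf` plus one object property) into OWL's RDF/XML syntax using only Python's standard `xml.etree.ElementTree`. The class names, the `observedAt` property, the `to_owl` helper and the `http://example.org/weather` namespace are illustrative assumptions; the paper's own ontology and classification step are not reproduced here.

```python
import xml.etree.ElementTree as ET
from typing import Optional

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
BASE = "http://example.org/weather#"  # hypothetical ontology namespace

# Use the conventional prefixes (rdf:, rdfs:, owl:) in the serialized output.
for prefix, uri in (("rdf", RDF), ("rdfs", RDFS), ("owl", OWL)):
    ET.register_namespace(prefix, uri)


def to_owl(classes: dict[str, Optional[str]],
           properties: dict[str, tuple[str, str]]) -> str:
    """Serialize classes (name -> parent name or None) and object properties
    (name -> (domain, range)) as OWL in RDF/XML syntax."""
    root = ET.Element(f"{{{RDF}}}RDF")
    ET.SubElement(root, f"{{{OWL}}}Ontology", {f"{{{RDF}}}about": BASE.rstrip("#")})
    for name, parent in classes.items():
        cls = ET.SubElement(root, f"{{{OWL}}}Class", {f"{{{RDF}}}about": BASE + name})
        if parent is not None:  # top-level classes get no subClassOf link
            ET.SubElement(cls, f"{{{RDFS}}}subClassOf", {f"{{{RDF}}}resource": BASE + parent})
    for name, (domain, range_) in properties.items():
        prop = ET.SubElement(root, f"{{{OWL}}}ObjectProperty", {f"{{{RDF}}}about": BASE + name})
        ET.SubElement(prop, f"{{{RDFS}}}domain", {f"{{{RDF}}}resource": BASE + domain})
        ET.SubElement(prop, f"{{{RDFS}}}range", {f"{{{RDF}}}resource": BASE + range_})
    return ET.tostring(root, encoding="unicode")


owl_xml = to_owl(
    classes={"WeatherPhenomenon": None, "Rainfall": "WeatherPhenomenon",
             "Temperature": "WeatherPhenomenon", "Location": None},
    properties={"observedAt": ("WeatherPhenomenon", "Location")},
)
print(owl_xml)
```

A dedicated RDF library such as rdflib would handle IRIs and richer OWL constructs more robustly; plain ElementTree simply keeps the example dependency-free.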
There are various uses of ontology:
• It is used for knowledge sharing and reuse.
• It can improve understanding between concepts.
• It is useful in the Semantic Web, where information is represented in machine-processable form.
• Some search engines use ontology to find pages relevant to a given query.

The paper is divided into the following sections. Section 2 covers the following concepts:
• Domain Ontology
• Stages of the Ontology Based Web Content Mining Methodology
• Increasing the accuracy of web document classification using WordNet
Section 3 describes how the documents extracted by the Ontology Based Phrase Extractor are represented using the Statistical Vector Based Approach. Section 4 concludes the paper.

2. CLASSIFYING AND ANALYSING WEB DOCUMENTS
Many information extraction methods and techniques have been used, but none of them has proved satisfactory. We therefore need a more intelligent system to gather useful information from huge amounts of data.
Problem: To find meaningful and informative documents with the help of Data Mining algorithms and then interpret the mining results in an expressive way.
Solution: The Ontology Based Web Content Mining Methodology.
Approach involved: The proposed methodology uses the concept of Domain Ontology [3]. A domain ontology organizes the concepts, relations and instances of a given domain. This approach is used because it resolves synonyms and reduces confusion among agents.
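To make the synonym-resolution claim concrete, here is a minimal in-memory domain ontology that stores concepts with their synonyms, relations as (subject, relation, object) triples, and instances. Two agents that use different words ("rain", "precipitation") resolve to the same concept. The `DomainOntology` class, its methods and the weather terms are hypothetical and chosen for illustration; they are not the paper's data model.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DomainOntology:
    """Hypothetical container for the concepts, relations and instances of one domain."""

    # Lower-cased surface term (concept name or synonym) -> canonical concept name.
    synonyms: dict[str, str] = field(default_factory=dict)
    # Relations between concepts as (subject, relation, object) triples.
    relations: set[tuple[str, str, str]] = field(default_factory=set)
    # Instance name -> the concept it belongs to.
    instances: dict[str, str] = field(default_factory=dict)

    def add_concept(self, concept: str, *aliases: str) -> None:
        """Register a concept together with the synonyms that should map to it."""
        for term in (concept, *aliases):
            self.synonyms[term.lower()] = concept

    def resolve(self, term: str) -> Optional[str]:
        """Map a term or one of its synonyms to its canonical concept (None if unknown)."""
        return self.synonyms.get(term.lower())

    def related(self, concept: str) -> set[tuple[str, str]]:
        """Return (relation, object) pairs whose subject is the given concept."""
        return {(rel, obj) for subj, rel, obj in self.relations if subj == concept}


onto = DomainOntology()
onto.add_concept("Rainfall", "rain", "precipitation", "showers")
onto.add_concept("Temperature", "heat", "warmth")
onto.relations.add(("Rainfall", "lowers", "Temperature"))
onto.instances["MonsoonRain"] = "Rainfall"

# Two agents using different words end up at the same concept.
print(onto.resolve("Precipitation"), onto.resolve("rain"))  # Rainfall Rainfall
print(onto.related(onto.resolve("showers")))
```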

2.1 Ontology Based Web Content Mining
Ontology Based Web Content Mining represents conceptual information about a given domain. It covers document representation, the extraction of relevant information from text documents, and the creation of classification models. The methodology uses the ideas and principles of Data Mining to analyze web data.
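Section 2.1 says the methodology creates classification models, and Naïve Bayes is one of the algorithms named earlier. The following is a generic textbook multinomial Naïve Bayes text classifier, offered as a sketch rather than the paper's model: it assumes whitespace tokenization and add-one (Laplace) smoothing, and the training pages, labelled with the paper's two evaluation domains (weather and news), are hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes for bag-of-words documents, with add-one (Laplace) smoothing."""

    def fit(self, docs: list[str], labels: list[str]) -> "NaiveBayes":
        self.class_counts = Counter(labels)        # documents per class (for priors)
        self.word_counts = defaultdict(Counter)    # class -> term frequencies
        for text, label in zip(docs, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text: str) -> str:
        """Return the class with the highest log posterior for the given text."""
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)     # log prior
            for word in text.lower().split():
                if word in self.vocab:                # skip words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training pages from the two evaluation domains.
train_docs = [
    "heavy rain and thunderstorm forecast",
    "temperature drops with cold wind",
    "sunny weather expected tomorrow",
    "parliament passes new budget",
    "election campaign enters final week",
    "markets react to budget news",
]
labels = ["weather"] * 3 + ["news"] * 3

model = NaiveBayes().fit(train_docs, labels)
print(model.predict("rain forecast for tomorrow"))  # weather
print(model.predict("budget news today"))           # news
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied together.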


Figure 2: Stages of On

This content is AI-processed based on ArXiv data.
