Semantic Information Retrieval from Distributed Heterogeneous Data Sources

February 23, 2026


📝 Original Info

Title: Semantic Information Retrieval from Distributed Heterogeneous Data Sources
ArXiv ID: 0707.0745
Date: 2007-07-06
Authors: Researchers from original ArXiv paper

📝 Abstract

Information retrieval from distributed heterogeneous data sources remains a challenging issue. As the number of data sources increases, more intelligent retrieval techniques, focusing on information content and semantics, are required. Currently, ontologies are being widely used for managing semantic knowledge, especially in the field of bioinformatics. In this paper we describe an ontology-assisted system that allows users to query distributed heterogeneous data sources by hiding details like location, information structure, access pattern and semantic structure of the data. Our goal is to provide an integrated view on biomedical information sources for the Health-e-Child project, with the aim of overcoming the lack of sufficient semantic-based reformulation techniques for querying distributed data sources. In particular, this paper examines the problem of query reformulation across biomedical data sources, based on merged ontologies and the underlying heterogeneous descriptions of the respective data sources.


📄 Full Content

Over the past few years, the biomedical domain has witnessed a tremendous increase in the number of data providers and in the volume and heterogeneity of generated data. To enable knowledge discovery, clinicians' queries generally require an integrated and merged view of the data available across distributed data sources. The associated query processing is therefore based on searching for information in documents, searching within (often very heterogeneous) databases and searching for metadata or descriptions of data. Query reformulation is the part of this query processing whose main objective is to extend a user query in order to retrieve additional meaningful results and to access data from data source(s) according to user needs. In recent years, several methods have been proposed that use semantic knowledge and mapping details to reformulate a user query in order to provide quick and intelligent answers to the queries.

[1] See www.Health-e-Child.org
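The idea of extending a user query to retrieve additional meaningful results can be illustrated with a very small term-expansion sketch. This is not the reformulation method of the paper, which is not specified in this excerpt; the synonym table, its terms and the function name `expand_query` are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set

# Hypothetical synonym table; the terms are illustrative, not taken from the paper.
SYNONYMS: Dict[str, Set[str]] = {
    "tumour": {"tumor", "neoplasm"},
    "heart": {"cardiac"},
}


def expand_query(terms: List[str], synonyms: Dict[str, Set[str]]) -> List[str]:
    """Return the lower-cased query terms, each followed by its known synonyms."""
    expanded: List[str] = []
    seen: Set[str] = set()
    for term in terms:
        key = term.lower()
        # Keep the user's own term first, then its synonyms in sorted order
        # so that the result is deterministic.
        for candidate in [key, *sorted(synonyms.get(key, set()))]:
            if candidate not in seen:
                seen.add(candidate)
                expanded.append(candidate)
    return expanded
```

Calling `expand_query(["Tumour", "child"], SYNONYMS)` keeps both user terms and adds the two assumed synonyms of "tumour". A real system would presumably draw these relations from an ontology rather than a hand-written table.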

Ontology integration and merging approaches are being widely used to integrate information from distributed heterogeneous data sources [F. Hakimpour 2001]. To make effective use of an integrated or merged ontology, intelligent query reformulation techniques are often required. The challenge is that query reformulation ought to be based on both the merged ontology and the descriptions of the underlying heterogeneous data sources, so as to overcome the lack of sufficient semantic-based reformulation techniques for querying such sources.

In this position paper we describe a framework for a data integration system that provides access to distributed heterogeneous data sources. Our aim is to demonstrate how a merged ontology, constructed over the ontologies of the distributed information sources, can be exploited effectively to reformulate a user query so that it suits the needs of the user.

The data integration and semantic information retrieval concept presented in this paper will lead towards the construction of powerful query reformulation rules to be utilized in the European Health-e-Child (HeC) [J. Freund 2006] project. The Health-e-Child project aims to develop an integrated healthcare platform for European paediatrics, providing seamless integration of traditional and emerging sources of biomedical information. The long-term goal of the project is to provide uninhibited access to universal biomedical knowledge repositories for personalised and preventive healthcare, large-scale information-based biomedical research and training, and informed policy making.

In the remainder of this paper we begin by discussing and analyzing the feasibility of using existing biomedical information integration systems. We have closely analyzed the two most cited ontology-based information integration approaches, namely data warehousing and mediation. Finally, after presenting related work in sections 2 and 3, we outline our methodology for semantic data retrieval from distributed heterogeneous data sources, in which we utilise a merged ontology, and identify the challenging problem of query reformulation on the basis of the merged ontology and data source descriptions.

Today, we are faced with the challenging problems of dealing with distributed and heterogeneous data sources containing huge amounts of data in a variety of semantic structures. Designing a data integration system is a complex task involving major issues that include the heterogeneity of the underlying data sources, differences in access mechanisms, support for query languages, and semantic heterogeneity in their data models. Currently, ontologies are being widely used to overcome the problem of semantic heterogeneity. In this paper we introduce an architecture that utilises ontologies for data integration to provide access to distributed heterogeneous data sources. We use a merged ontology and an associated mapping of information, which will enable us to construct query reformulation rules for semantic information retrieval to be utilised in the HeC project.
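How a merged ontology plus a mapping of information could drive query reformulation can be sketched as follows. This is a minimal illustration under stated assumptions, not the HeC architecture: the concept names, source names, table and column names, and the function `reformulate` are all hypothetical, and the sketch assumes each source is a relational database queried with parameterised SQL.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class SourceMapping:
    """Where one merged-ontology concept lives in one data source (hypothetical)."""
    table: str
    column: str


# Assumed mapping: merged-ontology concept -> source name -> location in that source.
MAPPINGS: Dict[str, Dict[str, SourceMapping]] = {
    "Patient.age": {
        "hospital_a": SourceMapping("patients", "age_years"),
        "hospital_b": SourceMapping("person", "age"),
    },
    "Patient.diagnosis": {
        "hospital_a": SourceMapping("patients", "dx_code"),
    },
}

ALLOWED_OPERATORS = {"=", "!=", "<", "<=", ">", ">="}


def reformulate(
    concept: str,
    operator: str,
    value: object,
    mappings: Dict[str, Dict[str, SourceMapping]],
) -> Dict[str, Tuple[str, tuple]]:
    """Rewrite one condition on a merged-ontology concept into one
    parameterised SQL query per source that holds data for the concept."""
    if operator not in ALLOWED_OPERATORS:
        raise ValueError(f"unsupported operator: {operator}")
    queries: Dict[str, Tuple[str, tuple]] = {}
    for source, mapping in mappings.get(concept, {}).items():
        # The value is passed as a bound parameter, never spliced into the SQL text.
        sql = f"SELECT * FROM {mapping.table} WHERE {mapping.column} {operator} ?"
        queries[source] = (sql, (value,))
    return queries
```

In this sketch the user states a condition once, against the merged vocabulary, and the mapping hides the location and structure of each source. Sources that have no mapping for the concept are simply skipped.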

Currently, ontologies are being used as the basis for communication, for representing and storing data, for knowledge sharing, for the classification and organization of data resources, and for policy enforcement. The term ‘ontology’ has been defined in many different ways [B. Chandrasekaran 1999, M. Uschold 1996 and C. Wroe 2003]. The simplest definition of an ontology is that “it describes the logical structure of a domain, its concepts and the relationships”. A domain ontology is an ontology that has been built for a particular subject or for a specific problem in the domain, e.g. bioinformatics, geophysics, brain tumors or cardiac disease, or for any sub-type of a particular subject, e.g. neurons. A number of ontologies have been developed for the purposes of managing and extracting semantic knowledge from on-line literature and databases.
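The definition of an ontology as concepts and the relationships between them can be made concrete with a toy structure. The sketch below stores an assumed domain ontology as subject, relation, object triples and follows `is_a` links transitively. The concept names and the `ancestors` helper are hypothetical; real ontologies are normally expressed in a dedicated language such as OWL rather than in Python lists.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject concept, relation, object concept)

# Hypothetical fragment of a biomedical domain ontology.
TRIPLES: List[Triple] = [
    ("BrainTumour", "is_a", "Tumour"),
    ("Tumour", "is_a", "Disease"),
    ("CardiacDisease", "is_a", "Disease"),
    ("BrainTumour", "located_in", "Brain"),
]


def ancestors(concept: str, triples: List[Triple]) -> Set[str]:
    """Return every concept reachable from `concept` by following is_a links."""
    found: Set[str] = set()
    frontier = [concept]
    while frontier:
        current = frontier.pop()
        for subject, relation, obj in triples:
            if subject == current and relation == "is_a" and obj not in found:
                found.add(obj)
                frontier.append(obj)
    return found
```

With this fragment, `ancestors("BrainTumour", TRIPLES)` yields both `Tumour` and `Disease`, while the `located_in` relation is ignored because only `is_a` links are followed.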

Biomedical information sour

…(Full text truncated)…


Reference

This content is AI-processed based on ArXiv data.
