Query Routing and Processing in Peer-To-Peer Data Sharing Systems

📝 Abstract

Sharing music files via the Internet was the main motivation of early P2P systems. Despite the great success of P2P file sharing systems, these systems support only “simple” queries. Their focus is on efficient query routing, that is, on finding the nodes that store a desired file. Recently, several research efforts have extended P2P systems to share data at a fine granularity (i.e. the atomic attribute) and to process queries written in a highly expressive language (e.g. SQL). These efforts have led to the emergence of P2P data sharing systems, which represent both a new generation of P2P systems and the next stage in the long history of database research. The characteristics of P2P systems (e.g. large scale, node autonomy and instability) make it impractical to maintain a global catalog, which is often an essential component of traditional database systems. Such a catalog usually stores information about data, schemas and data sources. Query routing and query processing are two problems affected by the absence of a global catalog: locating relevant data sources and generating a close-to-optimal execution plan both become more difficult. In this paper, we focus on the solutions proposed for both problems. Furthermore, selected case studies of the main P2P data sharing systems are analyzed and compared.

📄 Content

                

DOI: 10.5121/ijdms.2010.2208
Raddad Al King, Abdelkader Hameurlain, Franck Morvan
Institut de Recherche en Informatique de Toulouse (IRIT), Université Paul Sabatier
118, route de Narbonne, F-31062 Toulouse Cedex 9, France
E-mail: {alking, hameur, morvan}@irit.fr
KEYWORDS P2P Databases, Query Routing, Schema Matching, Query Processing and Optimization.

1. INTRODUCTION

Nowadays, Peer-to-Peer (hereafter P2P) systems have become very popular. This popularity can be seen as a result of features such as scalability, node autonomy, self-configuration and decentralized control. P2P systems offer a good opportunity to overcome the limitations of Client/Server based systems. By avoiding bottlenecks and being fault tolerant, P2P systems are suitable for large-scale distributed environments in which nodes (interchangeably called peers) share their resources (e.g. computing power, storage capacity, network bandwidth) in an autonomous and decentralized manner. The more resources are available in a P2P system, the greater its aggregate computing power and storage capacity. This advantage enables P2P systems to perform complex tasks at relatively low cost, without any need for powerful servers. In the next subsection, we highlight the notion of “P2P Systems”.

1.1. P2P Systems

There is no agreement about what P2P systems are. In the literature, we find several definitions of these systems [40, 52, 57]. The definition of [52] includes systems having one or more servers, while the definition of [49] excludes this type of system. We therefore adopt the definition of Milojicic et al. [52]: “The term ‘peer-to-peer’ (P2P) refers to a class of systems and applications that employ distributed resources to perform a function in a decentralized manner. The resources encompass computing power, data (storage and content), network bandwidth, and presence (computers, human, and other resources). The critical function can be distributed computing, data/content sharing, communication and collaboration, or platform services. Decentralized may apply to algorithms, data, and metadata or to all of them”.

Although there is no standard definition of P2P systems, most researchers characterize them by: (i) scalability in terms of the number of nodes and the number of resources; (ii) node autonomy; (iii) dynamicity; (iv) resource heterogeneity; (v) decentralized control; and (vi) self-configuration. In such systems, each node can act as: (i) a server when it offers its resources to other nodes; (ii) a client when it uses the resources of other nodes; (iii) a router when it propagates incoming queries and messages to other nodes; and (iv) a data source¹ when it shares its own data with the other nodes of the system. Research on P2P systems is growing steadily, and so is the range of contexts in which these systems are used. In this paper, we focus our study on the P2P database context.
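To make the four node roles concrete, here is a minimal Python sketch. It assumes a Gnutella-style unstructured overlay in which, lacking a global catalog, a query is flooded to neighbors up to a hop limit (TTL). Flooding is only one illustrative routing strategy, not a method described in this section, and the names `Peer`, `answer_locally`, `search` and `ttl` are hypothetical. The code is a single-process simulation of message propagation, not a networked implementation.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Peer:
    """Hypothetical peer model in which one node plays all four roles."""

    name: str
    # Data source role: the data this node shares with the system.
    shared_files: set[str] = field(default_factory=set)
    # Overlay links; repr=False avoids infinite recursion on cyclic overlays.
    neighbors: list["Peer"] = field(default_factory=list, repr=False)

    def answer_locally(self, filename: str) -> bool:
        """Server role: answer a request from this node's own shared data."""
        return filename in self.shared_files

    def search(self, filename: str, ttl: int = 3) -> list[str]:
        """Client role: issue a query and collect the names of matching peers.

        With no global catalog to consult, the query is flooded through the
        overlay and stops after `ttl` hops. This is a single-process,
        breadth-first simulation of the message propagation.
        """
        hits: list[str] = []
        visited = {self.name}  # each peer handles a query once (names assumed unique)
        queue = deque([(self, ttl)])
        while queue:
            peer, remaining = queue.popleft()
            if peer.answer_locally(filename):
                hits.append(peer.name)
            if remaining == 0:
                continue  # hop limit reached: do not forward further
            # Router role: propagate the incoming query to unvisited neighbors.
            for nxt in peer.neighbors:
                if nxt.name not in visited:
                    visited.add(nxt.name)
                    queue.append((nxt, remaining - 1))
        return hits


# Usage: A is linked to B and C, and C is linked to D.
a, b, c, d = Peer("A"), Peer("B", {"song.mp3"}), Peer("C"), Peer("D", {"song.mp3"})
a.neighbors = [b, c]
c.neighbors = [d]
print(a.search("song.mp3", ttl=1))  # ['B']
print(a.search("song.mp3", ttl=2))  # ['B', 'D']
```

With `ttl=1` the query never reaches D, which is two hops away from A. A larger TTL finds more matching peers but generates more messages, a trade-off that illustrates why routing without a global catalog is difficult.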
1.2. P2P Systems and Database Systems

P2P systems are successfully used in several domains such as file sharing, computing power sharing and instant message exchange. Due to their ...
