In this study, we address the problem of answering queries over a peer-to-peer system of taxonomy-based sources. A taxonomy states subsumption relationships between negation-free DNF formulas on terms and negation-free conjunctions of terms. To lay the foundations of our study, we first consider the centralized case, deriving the complexity of the decision problem and of query evaluation. We conclude the centralized case by presenting a hypergraph-based algorithm that is efficient in data complexity. More expressive forms of taxonomies are also investigated; these, however, lead to intractability. We then move to the distributed case and introduce a logical model of a network of taxonomy-based sources. On such a network, a distributed version of the centralized algorithm, based on a message-passing paradigm, is then presented and its correctness is proved. We finally discuss optimization issues and relate our work to the literature.
Consider a tetrad (T, ⪯, Obj, I) where T is a set of terms, ⪯ is a subsumption relation over concepts expressed using T (e.g. (Animal ∧ FlyingObject) ∨ Penguin ⪯ Bird), Obj is a set of objects and I is a function from T to P(Obj), which in effect assigns a description (i.e., a set of terms) to each object. Now assume that these are not all stored in a single place but are distributed over a set N = {S_1, ..., S_n} of independent peers. Moreover, assume that each peer S_i can have zero, one or more ⪯-relationships between its terms (i.e. T_i) and some concepts over the terminologies of other peers (e.g. Parrot_j ⪯ Birds_i and Animal_k ∧ Flying_k ⪯ Birds_i).
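To make the tetrad concrete, the following is a minimal sketch of a single (centralized) source, assuming that I maps each term to a set of objects and that each subsumption relationship has a negation-free DNF on the left and a conjunction of terms on the right. It computes term extensions with a naive fixpoint iteration, which is only an illustration and not the hypergraph-based procedure developed in this paper. The function name `least_model`, the type aliases and the objects `tweety`, `rex`, `kite` and `pingu` are all hypothetical.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Conj = FrozenSet[str]       # a conjunction of terms
DNF = FrozenSet[Conj]       # a disjunction of conjunctions
Rel = Tuple[DNF, Conj]      # (lhs, rhs), read as "lhs is subsumed by rhs"


def least_model(interp: Dict[str, Set[str]], rels: List[Rel]) -> Dict[str, Set[str]]:
    """Extend the stored interpretation until every relationship holds.

    Naive fixpoint iteration (an assumption made for illustration only):
    objects in the extension of a left-hand side are added to every term
    of the right-hand side until nothing changes.
    """
    ext = {t: set(objs) for t, objs in interp.items()}
    # Make sure every term mentioned in a relationship has an extension.
    for lhs, rhs in rels:
        for conj in lhs:
            for t in conj:
                ext.setdefault(t, set())
        for t in rhs:
            ext.setdefault(t, set())

    changed = True
    while changed:
        changed = False
        for lhs, rhs in rels:
            lhs_objs: Set[str] = set()
            for conj in lhs:
                sets = [ext[t] for t in conj]
                if sets:  # skip an empty conjunction
                    lhs_objs |= set.intersection(*sets)
            for t in rhs:
                if not lhs_objs <= ext[t]:
                    ext[t] |= lhs_objs
                    changed = True
    return ext


# Hypothetical data for the example (Animal ∧ FlyingObject) ∨ Penguin ⪯ Bird.
interp = {
    "Animal": {"tweety", "rex"},
    "FlyingObject": {"tweety", "kite"},
    "Penguin": {"pingu"},
    "Bird": set(),
}
rels = [
    (frozenset({frozenset({"Animal", "FlyingObject"}), frozenset({"Penguin"})}),
     frozenset({"Bird"})),
]
model = least_model(interp, rels)
```

In this toy source nothing is indexed under Bird directly, yet the query Bird returns `tweety` (an Animal that is also a FlyingObject) and `pingu` (a Penguin), while `kite` and `rex` are excluded because each satisfies only one conjunct.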
In this paper we address the problem of answering Boolean queries over this kind of system. Some parts of the work reported in this paper have already been published. Namely, [40] presents a first model of a network of articulated sources, while [39] studies query evaluation on taxonomies including only term-to-term subsumption relationships. Finally, [30] presents a procedure for evaluating queries over centralized sources supporting term-to-query subsumption relationships, as well as hardness results for extensions. In this paper:
- we consider from the start the most complex type of subsumption for which we can propose an efficient query evaluation procedure, allowing subsumption relationships between negation-free DNF combinations of terms and negation-free conjunctions of terms. We then place the hardness results presented in [30] in context, thus showing that any Boolean extension of the expressive power of subsumption leads to intractability of the query answering problem;
- we ground the centralized query evaluation procedure for this kind of source, presented in [30], on a solid theoretical basis, proving its correctness and linking it to the existing algorithmic and complexity literature;
- we present a distributed query evaluation procedure, based on a functional model of a peer; the correctness and complexity of this procedure are given;
- we describe optimization techniques that can be used for improving the efficiency of query evaluation;
- we relate our work to the existing literature on peer-to-peer systems.
The paper is structured as follows: Section 2 gives the background on peer-to-peer systems, while Section 3 introduces sources, presenting the centralized query evaluation procedure. Networks of sources are considered in Section 4, where our algorithm for query evaluation on networks is presented, and Section 5 discusses optimization issues. Section 6 compares our work with related work and Section 7 concludes the paper.
A peer-to-peer (P2P) system is a distributed system in which participants (the peers) rely on one another for service, rather than solely relying on dedicated and often centralized servers. The most popular P2P systems have focused on specific application domains like music file sharing [3,1,2] or on providing file-system-like capabilities [8]. In most cases, these systems do not provide semantic-based retrieval services, as the name of an object (e.g. the title of a music file) is the only means of describing its contents.
Semantic-based retrieval in P2P systems is a great challenge that raises questions about data models, conceptual modeling, query languages, algorithms and data structures for query evaluation, and techniques for dynamic schema mapping. Roughly, the language used for indexing the objects of the domain and for formulating semantic-based queries can be free (e.g. natural language) or controlled, i.e. object descriptions and queries may have to conform to a specific vocabulary and syntax. The former case resembles distributed Information Retrieval (IR) systems, and this approach is applicable where the objects of the domain have textual content (e.g. [29,27,14,37]). In the latter case, the objects of a peer are indexed according to a specific conceptual model represented in a particular data model (e.g. relational, object-oriented, logic-based, etc.), and content searches are formulated using a specific query language. Of course, a P2P system might impose a single conceptual model on all participants to enforce uniform, global access, but this would be too restrictive. Alternatively, a limited number of conceptual models may be allowed, so that traditional information mediation and integration techniques are likely to apply (with the restriction that there is no central authority), e.g. see [32,31].
The case of fully heterogeneous conceptual models makes uniform global access extremely challenging, and this is the focus of this paper. From a data modeling point of view, several approaches for P2P systems have been proposed recently, including relational-based approaches [7], XML-based approaches [24] and RDF-based approaches [31].
In this paper we consider the fully heterogeneous conceptual model approach (where each peer can have its own schema), with the only restriction that eac