Query Evaluation and Optimization in the Semantic Web

Reading time: 5 minutes

📝 Original Info

  • Title: Query Evaluation and Optimization in the Semantic Web
  • ArXiv ID: 0711.2087
  • Date: 2007-11-15
  • Authors: not stated in the paper body (the paper is generally estimated to have been published around 2006)

📝 Abstract

We address the problem of answering Web ontology queries efficiently. An ontology is formalized as a Deductive Ontology Base (DOB), a deductive database that comprises the ontology's inference axioms and facts. A cost-based query optimization technique for DOB is presented. A hybrid cost model is proposed to estimate the cost and cardinality of basic and inferred facts. Cardinality and cost of inferred facts are estimated using an adaptive sampling technique, while techniques of traditional relational cost models are used for estimating the cost of basic facts and conjunctive ontology queries. Finally, we implement a dynamic-programming optimization algorithm to identify query evaluation plans that minimize the number of intermediate inferred facts. We modeled a subset of the Web ontology language OWL Lite as a DOB, and performed an experimental study to analyze the predictive capacity of our cost model and the benefits of the query optimization technique. Our study has been conducted over synthetic and real-world OWL ontologies, and shows that the techniques are accurate and improve query performance. To appear in Theory and Practice of Logic Programming (TPLP).

📄 Full Content

Ontology systems usually provide reasoning and retrieval services that identify the basic facts that satisfy a requirement, and derive implicit knowledge using the ontology's inference axioms. In the context of the Semantic Web, the number of inferred facts can be extremely large. On one hand, the amount of basic ontology facts (domain concepts and Web source annotations) can be considerable; on the other hand, Open World reasoning in Web ontologies may yield a large space of choices. Therefore, efficient evaluation strategies are needed in Web ontology inference engines.

In our approach, ontologies are formalized as a deductive database called a Deductive Ontology Base (DOB). The extensional database comprises all the ontology language’s statements that represent the explicit ontology knowledge. The intensional database corresponds to the set of deductive rules which define the semantics of the ontology language. We provide a cost-based optimization technique for Web ontologies represented as a DOB.
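The EDB/IDB split described above can be illustrated with a minimal sketch. The vocabulary and the two rules below are illustrative stand-ins in the spirit of OWL Lite semantics, not the paper's exact formalization; the fixpoint loop is the textbook naive bottom-up evaluation of a deductive database.

```python
# Extensional database (EDB): explicit ontology statements stored as facts.
edb = {
    ("subClassOf", "Student", "Person"),
    ("subClassOf", "PhDStudent", "Student"),
    ("type", "alice", "PhDStudent"),
}

def infer(facts):
    """Naive bottom-up fixpoint over two intensional (IDB) rules:
    1. subClassOf(A,B), subClassOf(B,C) -> subClassOf(A,C)   (transitivity)
    2. type(X,A), subClassOf(A,B)       -> type(X,B)          (inheritance)
    """
    facts = set(facts)
    while True:
        new = set()
        for (p1, a, b) in facts:
            for (p2, c, d) in facts:
                if p1 == "subClassOf" and p2 == "subClassOf" and b == c:
                    new.add(("subClassOf", a, d))
                if p1 == "type" and p2 == "subClassOf" and b == c:
                    new.add(("type", a, d))
        if new <= facts:          # nothing new derived: fixpoint reached
            return facts
        facts |= new

idb_closure = infer(edb)
print(("type", "alice", "Person") in idb_closure)   # True (inferred fact)
```

Note that `("type", "alice", "Person")` exists only after inference, which is exactly why its cardinality cannot be read from stored statistics, motivating the sampling-based estimation discussed next.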

Traditional query optimization techniques for deductive database systems include join-ordering strategies, and techniques that combine a bottom-up evaluation with top-down propagation of query variable bindings in the spirit of the Magic-Sets algorithm (Ramakrishnan and Ullman 1993). Join-ordering strategies may be heuristic-based or cost-based; some cost-based approaches depend on the estimation of the join selectivity, while others rely on the fan-out of a literal (Staudt et al. 1999). Cost-based query optimization has been used successfully by relational database management systems; however, these optimizers are not able to estimate the cost or cardinality of data that do not exist a priori, as is the case for intensional predicates in a DOB.

We propose a hybrid cost model that combines two techniques for cardinality and cost estimation: (1) the sampling technique proposed in (Lipton and Naughton 1990; Lipton et al. 1990) is applied for the estimation of the evaluation cost and cardinality of intensional predicates, and (2) a cost model à la System R is used for the estimation of the cost and cardinality of extensional predicates and the cost of conjunctive queries.
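The adaptive-sampling idea can be sketched as follows. An intensional predicate's answer set is viewed as a union of partitions (one per sampled binding); partitions are evaluated at random until the accumulated total crosses a threshold, and the total size is extrapolated from the running mean. The `sizes` list, the threshold value, and the stopping rule here are illustrative simplifications, not the exact constants and error bounds of Lipton and Naughton's analysis.

```python
import random

def adaptive_sample(sizes, threshold):
    """Sketch of Lipton/Naughton-style adaptive sampling.
    sizes[i] is the (unknown ahead of time, only observable by
    evaluating) size of the i-th partition of the answer set."""
    n = len(sizes)
    total, m = 0, 0
    while total <= threshold:
        total += random.choice(sizes)   # "evaluate" one random partition
        m += 1
    return n * total / m                # extrapolate: n * running mean

random.seed(0)
sizes = [3] * 100                       # 100 partitions, 3 answers each
est = adaptive_sample(sizes, threshold=30)
print(est)                              # 300.0, the true total cardinality
```

The key property, reflected in the stopping rule, is that the amount of sampling work adapts to the data: skewed partitions push the total past the threshold quickly, so large answer sets do not have to be materialized to be estimated.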

Three evaluation strategies are considered for “joining” predicates in conjunctive queries. They are based on the Nested-Loop, Block Nested-Loop, and Hash Join operators of relational databases (Ramakrishnan and Gehrke 2003). To identify a good evaluation plan, we provide a dynamic-programming optimization algorithm that orders subgoals in a query, considering estimates of the subgoals’ evaluation costs.
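A dynamic-programming subgoal ordering in this spirit can be sketched as follows. The `card` and `sel` inputs are hypothetical placeholders for the estimates the hybrid cost model would supply, and the cost metric is the total number of intermediate facts produced, as described above; this is a System-R-style enumeration over subsets, not the paper's exact algorithm.

```python
from itertools import combinations

def order_subgoals(subgoals, card, sel):
    """best maps a frozenset of subgoals to the cheapest way to join
    them: (cost, output_cardinality, subgoal_order)."""
    best = {frozenset([g]): (0.0, card[g], [g]) for g in subgoals}
    for k in range(2, len(subgoals) + 1):
        for combo in combinations(subgoals, k):
            s = frozenset(combo)
            choices = []
            for g in combo:                     # g joins last
                cost, out, order = best[s - {g}]
                f = 1.0                          # combined join selectivity
                for h in order:
                    f *= sel.get(tuple(sorted((h, g))), 1.0)
                new_out = out * card[g] * f      # intermediate facts produced
                choices.append((cost + new_out, new_out, order + [g]))
            best[s] = min(choices)               # keep the cheapest plan
    return best[frozenset(subgoals)]

cost, out, order = order_subgoals(
    ["p", "q", "r"],
    card={"p": 100, "q": 10, "r": 1000},
    sel={("p", "q"): 0.1, ("q", "r"): 0.01, ("p", "r"): 1.0},
)
print(round(cost), order)   # the plan joining p-r first (cost 101000) is avoided
```

Even on three subgoals the effect is visible: starting from the selective joins keeps every intermediate result at 1000 facts or fewer, whereas joining the two weakly correlated subgoals first materializes 100,000 intermediate facts.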

We modeled a subset of the Web ontology language OWL Lite (McGuinness and Harmelen 2004) as a DOB, and performed experiments to study the predictive capacity of the cost model and the benefits of the ontology query optimization techniques. The study has been conducted over synthetic and real-world OWL ontologies. Preliminary results show that the cost-model estimates are accurate and that optimized queries are significantly less expensive than non-optimized ones.

Our current formalism does not represent the OWL built-in constructor ComplementOf. We stress that in practice this is not a severe limitation. For example, this operator is not used in any of the three real-world ontologies that we have studied in our experiments; and in the survey reported in (Wang 2006), only 21 ontologies out of 688 contain this constructor.

Our work differs from other systems in the Semantic Web that combine a Description Logics (DL) reasoner with a relational DBMS in order to solve the scalability problems for reasoning with individuals (Calvanese et al. 2005; Haarslev and Moller 2004; Horrocks and Turi 2005; Pan and Hefflin 2003). Clearly, all of these systems use the query optimization component embedded in the relational DBMS; however, they do not develop cost-based optimization for the implicit knowledge, that is, there is no estimation of the cost of data not known a priori.

Other systems use Logic Programming (LP) to reason on large-scale ontologies. This is the case of the projects described in (Grosof et al. 2003; Hustadt and Motik 2005; Motik et al. 2003). In Description Logic Programs (DLP) (Grosof et al. 2003), the expressive intersection between DL and LP without function symbols is defined; DL queries are reduced to LP queries and efficient LP algorithms are explored. The project described in (Hustadt and Motik 2005; Motik et al. 2003) reduces a SHIQ knowledge base to a Disjunctive Datalog program. Both projects apply Magic-Sets rewriting techniques, but to the best of our knowledge, no cost-based optimization techniques have been developed. The OWL Lite⁻ species of the OWL language proposed in (Bruijn et al. 2004) is based on the DLP project; it corresponds to the portion of the OWL Lite language that can be translated to Datalog. All of these systems develop LP reasoning with individuals, whereas in the DOB model we develop Datalog reasoning.

Reference

This content is AI-processed based on open access ArXiv data.
