Querying XML Documents in Logic Programming

📝 Original Info

  • Title: Querying XML Documents in Logic Programming
  • ArXiv ID: 0710.4780
  • Date: 2007-10-25
  • Authors: Researchers from original ArXiv paper

📝 Abstract

Extensible Markup Language (XML) is a simple, very flexible text format derived from SGML. Originally designed to meet the challenges of large-scale electronic publishing, XML is also playing an increasingly important role in the exchange of a wide variety of data on the Web and elsewhere. The XPath language is the result of an effort to provide a way to address parts of an XML document. In support of this primary purpose, it has also become a query language for XML documents. In this paper we present a proposal for implementing the XPath language in logic programming. To this end, we describe the representation of XML documents by means of a logic program: rules and facts can be used to represent the document schema and the XML document itself. In particular, we present how to index XML documents in logic programs: rules are assumed to be stored in main memory, whereas facts are stored in secondary memory using two kinds of indexes: one for each XML tag, and another for each group of terminal items. In addition, we study how to query a logic program representing an XML document by means of the XPath language. This involves specializing the logic program with respect to the XPath expression. Finally, we explain how to combine the indexing with the top-down evaluation of the logic program. To appear in Theory and Practice of Logic Programming (TPLP).

📄 Full Content

Extensible Markup Language (XML) (W3C 2007a) is a simple, very flexible text format derived from SGML. Originally designed to meet the challenges of large-scale electronic publishing, XML is also playing an increasingly important role in the exchange of a wide variety of data on the Web and elsewhere.

The XPath language (W3C 2007b) is the result of an effort to provide a way to address parts of an XML document. In support of this primary purpose, it has also become a query language for XML documents, providing basic facilities for manipulating strings, numbers and booleans. XPath uses a compact, non-XML syntax to facilitate its use within URIs and XML attribute values. It operates on the abstract, logical structure of an XML document rather than on its surface syntax. XPath gets its name from its use of a path notation, as in URLs, for navigating through the hierarchical structure of an XML document.

Essential to semi-structured data (Abiteboul et al. 2000) is the selection of data from incompletely specified data items, as in an XML document. For such data selection, XPath is a path language that provides constructs similar to regular expressions and “wildcards”, allowing flexible node retrieval. An XML schema (W3C 2001), which is itself an XML document, defines the structure of well-formed documents and can therefore be seen as a type definition.
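To make the path notation, wildcards and boolean conditions concrete, here is a small sketch using Python's standard `xml.etree.ElementTree`, which supports only a limited subset of XPath. The document and the variable names are made up for illustration; the paths passed to `findall` are relative to the root element, so the leading `library` step is implicit.

```python
import xml.etree.ElementTree as ET

# A made-up document used only for illustration.
DOC = """<library>
  <book year="1990">
    <title>First Title</title>
    <author>Ann</author>
  </book>
  <book year="2007">
    <title>Second Title</title>
    <author>Bob</author>
    <author>Cid</author>
  </book>
</library>"""

root = ET.fromstring(DOC)  # the <library> element; paths below are relative to it

# Child steps, as in a URL path: library/book/title
titles = [t.text for t in root.findall("./book/title")]

# Wildcard step: every child element of every book, whatever its tag
child_tags = [e.tag for e in root.findall("./book/*")]

# Descendant axis (//): every author element at any depth
authors = [a.text for a in root.findall(".//author")]

# Boolean condition on an attribute: books published in 1990
titles_1990 = [b.findtext("title") for b in root.findall("./book[@year='1990']")]

# Boolean condition on a child's text: books having an author equal to 'Bob'
titles_by_bob = [b.findtext("title") for b in root.findall("./book[author='Bob']")]

print(titles)         # ['First Title', 'Second Title']
print(child_tags)     # ['title', 'author', 'title', 'author', 'author']
print(authors)        # ['Ann', 'Bob', 'Cid']
print(titles_1990)    # ['First Title']
print(titles_by_bob)  # ['Second Title']
```

Full XPath offers many more axes and functions; this subset is only meant to show the flavour of the queries discussed here.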

The integration of logic programming languages and web technologies, in particular XML data processing, is of interest from the standpoint of the applicability of logic programming.

On the one hand, XML documents are the standard format for exchanging information between applications; therefore, logic languages should be able to handle and query such documents.

On the other hand, logic languages could be used to extract and infer semantic information from XML documents, in line with the requirements of the “Semantic Web” (Berners-Lee et al. 2001). Logic languages can therefore find a natural and interesting field of application in this area.

In this paper, we are interested in the use of logic programming for handling XML documents and XPath queries. In this context, our contributions can be summarized as follows:

  1. An XML document can be seen as a logic program, using facts and rules to express both the XML schema and the document.

On one hand, rules can describe the schema of an XML document in which a (possibly recursive) definition specifies the well-formed documents.

On the other hand, each XML document can be described by means of facts, one for each terminal item (i.e. each leaf of the XML tree). Although an XML schema is usually available for XML documents, our method has been designed to extract the XML schema from the XML document itself, which can be considered, in a certain sense, a form of type inference. As future work, we plan to adapt our technique to translate XML schemas directly into logic rules.

  2. Our second contribution is the following: once XML documents are described by means of a logic program, evaluating an XPath expression against a document requires obtaining a subset of the Herbrand model (Apt 1990) represented by the logic program. In other words, only a subset of the facts representing the XML document is required for each XPath query.
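The representation in contribution 1 (one fact per terminal item, plus schema rules extracted from the document itself) and the observation that a query needs only some of those facts can be sketched as follows. This is a minimal sketch under stated assumptions, not the paper's encoding: the fact shape `(tag, text, position)` with a Dewey-style position, the dictionary form of the rules, and the helper names `leaf_facts`, `infer_rules` and `facts_for_tag` are all hypothetical, and attributes are ignored.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional, Tuple

# Hypothetical fact shape, not the paper's encoding: (tag, text, position),
# where position is a Dewey-style tuple of child indexes from the root.
Fact = Tuple[str, str, Tuple[int, ...]]


def leaf_facts(elem: ET.Element, pos: Tuple[int, ...] = (1,)) -> List[Fact]:
    """Emit one fact per terminal item (leaf element); attributes are ignored here."""
    children = list(elem)
    if not children:
        return [(elem.tag, (elem.text or "").strip(), pos)]
    facts: List[Fact] = []
    for i, child in enumerate(children, start=1):
        facts.extend(leaf_facts(child, pos + (i,)))
    return facts


def infer_rules(elem: ET.Element,
                rules: Optional[Dict[str, List[str]]] = None) -> Dict[str, List[str]]:
    """Crude schema inference: for each inner tag, the child tags seen, in first-seen order.
    An entry such as book -> [title, author] plays the role of a rule book :- title, author."""
    if rules is None:
        rules = {}
    children = list(elem)
    if children:
        body = rules.setdefault(elem.tag, [])
        for child in children:
            if child.tag not in body:
                body.append(child.tag)
            infer_rules(child, rules)
    return rules


def facts_for_tag(facts: List[Fact], tag: str) -> List[Fact]:
    """A query such as //title needs only the subset of facts with that tag."""
    return [f for f in facts if f[0] == tag]


DOC = """<library>
  <book><title>First Title</title><author>Ann</author></book>
  <book><title>Second Title</title><author>Bob</author></book>
</library>"""

root = ET.fromstring(DOC)
facts = leaf_facts(root)
rules = infer_rules(root)
print(rules)                         # {'library': ['book'], 'book': ['title', 'author']}
print(facts_for_tag(facts, "title"))  # [('title', 'First Title', (1, 1, 1)), ('title', 'Second Title', (1, 2, 1))]
```

Only four facts exist here, but the same filtering idea is what makes it worthwhile to retrieve a subset of the facts instead of the whole model for each query.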

Our idea is to provide a program specialization method in order to retrieve only the subset of the Herbrand model required to answer the query. That is, we specialize the logic program representing an XML document with respect to an XPath expression in order to obtain the answer, namely the XML data relevant to the query.
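The specialization idea can be illustrated with a toy model, which is our own simplification and not the paper's algorithm. Schema rules are encoded (hypothetically) as a dictionary from a head tag to its body atoms; for a path query, `specialize` keeps only the rules for the inner steps of the path, removes body atoms the query does not need, and moves a condition atom in front of the atom the path descends into, so the condition is tested first. `condition_goals` shows how a boolean condition turns into a goal with an instantiated argument. Conditions on the final step are not handled by `specialize` in this sketch.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical encoding of schema rules: head tag -> ordered body atoms (child tags).
# Read as:  library :- book.   book :- title, author, year.
Rules = Dict[str, List[str]]
SCHEMA: Rules = {
    "library": ["book"],
    "book": ["title", "author", "year"],
}

# A path step is (tag, condition); a condition (child_tag, value) stands for tag[child_tag='value'].
Step = Tuple[str, Optional[Tuple[str, str]]]


def specialize(rules: Rules, path: List[Step]) -> Rules:
    """Keep only the rules for inner steps of the path; in each one, remove the atoms the
    query does not need and move the condition atom before the atom the path descends into."""
    specialized: Rules = {}
    for (tag, cond), (next_tag, _) in zip(path, path[1:]):
        body = rules.get(tag, [])
        kept: List[str] = []
        if cond is not None and cond[0] in body:
            kept.append(cond[0])  # reordered: test the condition first
        if next_tag in body and next_tag not in kept:
            kept.append(next_tag)  # the atom the path continues through
        specialized[tag] = kept
    return specialized


def condition_goals(path: List[Step]) -> List[Tuple[str, str]]:
    """Boolean conditions become goals with instantiated arguments, e.g. ('author', 'Bob')."""
    return [cond for _, cond in path if cond is not None]


# /library/book[author='Bob']/title
query: List[Step] = [("library", None), ("book", ("author", "Bob")), ("title", None)]
print(specialize(SCHEMA, query))  # {'library': ['book'], 'book': ['author', 'title']}
print(condition_goals(query))     # [('author', 'Bob')]
```

Note how the `year` atom disappears and `author` moves ahead of `title`: the specialized rules touch only the facts the query can use.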

Basically, the specialization technique consists of specializing rules by removing and reordering predicates. It is applied to the rules for the schema of the XML document, which can then be used to retrieve a subset of the set of facts representing the XML document. In addition, for each XPath query, a specific goal (or goals) is called, in which appropriate arguments can be instantiated, depending on the occurrences of boolean conditions in the XPath expression.

  3. Our technique allows the handling of XML documents as follows.

Firstly, the XML document is loaded, which involves translating it into a logic program. For efficiency reasons, the rules corresponding to the XML schema are loaded into main memory, but the facts, which basically represent the XML document, are stored in secondary memory (using appropriate indexing techniques) whenever they do not fit in main memory.
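The loading step and the two kinds of indexes mentioned in the abstract (one for each XML tag, one for each group of terminal items) can be approximated with Python's standard `sqlite3` module. This is a sketch under explicit assumptions: SQLite stands in for the paper's secondary-memory storage, a single index on the tag column stands in for per-tag indexes, and we read "group of terminal items" as the leaves sharing a parent element, which is our interpretation and not confirmed by the text. The table layout and the helpers `load`, `texts_for_tag` and `titles_by_author` are hypothetical.

```python
import sqlite3
from typing import List

# Leaf facts as (tag, value, parent), where parent is the Dewey-style position of the
# element containing the leaf. Made-up data for illustration.
FACTS = [
    ("title", "First Title", "1.1"),
    ("author", "Ann", "1.1"),
    ("title", "Second Title", "1.2"),
    ("author", "Bob", "1.2"),
]


def load(db_path: str = ":memory:") -> sqlite3.Connection:
    """Store the facts in SQLite; pass a path to a new database file to keep them on disk."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE facts (tag TEXT, value TEXT, parent TEXT)")
    conn.executemany("INSERT INTO facts VALUES (?, ?, ?)", FACTS)
    # Index by tag (and value): serves lookups of the facts for one XML tag.
    conn.execute("CREATE INDEX idx_tag ON facts (tag, value)")
    # Index by parent: serves lookups of one group of terminal items
    # (assumption: a group = the leaves sharing a parent element).
    conn.execute("CREATE INDEX idx_group ON facts (parent)")
    conn.commit()
    return conn


def texts_for_tag(conn: sqlite3.Connection, tag: str) -> List[str]:
    rows = conn.execute("SELECT value FROM facts WHERE tag = ? ORDER BY parent", (tag,))
    return [value for (value,) in rows]


def titles_by_author(conn: sqlite3.Connection, author: str) -> List[str]:
    """Evaluate book[author=...]/title: condition goal first (tag index), then the group."""
    parents = [p for (p,) in conn.execute(
        "SELECT parent FROM facts WHERE tag = 'author' AND value = ?", (author,))]
    titles: List[str] = []
    for parent in parents:
        titles += [t for (t,) in conn.execute(
            "SELECT value FROM facts WHERE parent = ? AND tag = 'title'", (parent,))]
    return titles


conn = load()
print(texts_for_tag(conn, "title"))   # ['First Title', 'Second Title']
print(titles_by_author(conn, "Bob"))  # ['Second Title']
```

The order of the two lookups in `titles_by_author` mirrors the reordering idea: the instantiated condition goal narrows the search before the remaining facts are fetched.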

Secondly, the user can write queries against the loaded document. To solve a query, the logic program (corresponding to the XML schema) is specialized for that query, and the top-down evaluation of the specialized program computes the answer. The indexing technique makes query solving more efficient, since indexes are used to retrieve the facts required for the answer.

  4. We have developed a prototype called XIndalog which impleme

…(Full text truncated)…

Reference

This content is AI-processed based on ArXiv data.
