A Join Index for XML Data Warehouses

📝 Original Info

  • Title: A Join Index for XML Data Warehouses
  • ArXiv ID: 0809.1981
  • Date: 2008-09-12
  • Authors: Hadj Mahboubi, Kamel Aouiche and Jérôme Darmont

📝 Abstract

XML data warehouses form an interesting basis for decision-support applications that exploit complex data. However, native XML database management systems (DBMSs) currently offer limited performance, and ways to optimize them need to be investigated. In this paper, we propose a new join index that is specifically adapted to the multidimensional architecture of XML warehouses. It eliminates join operations while preserving the information contained in the original warehouse. A theoretical study and experimental results demonstrate the efficiency of our join index. They also show that native XML DBMSs can compete with XML-compatible relational DBMSs when warehousing and analyzing XML data.

📄 Full Content

A Join Index for XML Data Warehouses

Hadj Mahboubi, Kamel Aouiche and Jérôme Darmont

University of Lyon (ERIC Lyon 2), 5 avenue Pierre Mendès-France, 69676 Bron Cedex, France, {first-name.last-name}@eric.univ-lyon2.fr

Abstract: XML data warehouses form an interesting basis for decision-support applications that exploit complex data. However, native XML database management systems (DBMSs) currently offer limited performance, and ways to optimize them need to be investigated. In this paper, we propose a new join index that is specifically adapted to the multidimensional architecture of XML warehouses. It eliminates join operations while preserving the information contained in the original warehouse. A theoretical study and experimental results demonstrate the efficiency of our join index. They also show that native XML DBMSs can compete with XML-compatible relational DBMSs when warehousing and analyzing XML data.

Keywords: XML data warehouses, performance, XML join index, XQuery.

  1. Introduction

Decision-support applications nowadays exploit heterogeneous data from various sources. Furthermore, the development of Web 2.0 and the proliferation of multimedia documents mean that the data to be analyzed are no longer only numerical or symbolic. These so-called complex data may fall into several of the following categories (Darmont et al. 2005):

  • data represented in various formats (databases, texts, images, sounds, videos…);
  • diversely structured data (relational databases, XML document repositories…);
  • data originating from several different sources (distributed databases, the Web…);
  • data described through several channels or points of view (radiographies and audio diagnosis of a physician, data expressed in different scales or languages…);
  • data that change in terms of definition or value (temporal databases, periodical surveys…).

For example, analyzing medical data may require jointly exploiting information in various forms: patient records (classical database), medical history (text), radiographies, echographs (multimedia documents), physician diagnoses (texts or audio recordings), etc.

In this context, XML proves to be a very interesting tool for integrating and warehousing complex data for analysis (Darmont et al. 2003). However, decision-support queries are generally complex because they involve several join and aggregation operations. In addition, many native XML database management systems (DBMSs) perform relatively poorly when the data volume is very large and queries are complex. Thus, it is crucial to design XML data warehouses that guarantee the best performance when accessing data. Indexing is one of the most frequently used techniques to achieve this goal.

Several solutions have been proposed for XML data indexing in the literature. However, existing techniques only support single labeled path expressions within one XML document. A path expression helps explore an XML document and extract a specific node (element) or subtree (sub-document). It cannot perform a join operation over several XML documents. In the context of XML data warehouses, decision-support queries are complex and involve several path expressions. Data are also generally distributed over several XML documents due to their large volume. Hence, XML queries should use specific indices to navigate these documents.
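To make the distinction concrete, the following minimal Python sketch contrasts a path expression evaluated within one document with a join across two documents. The documents, element names (`fact`, `customer`, `city`) and attribute names (`customer-id`, `amount`) are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import xml.etree.ElementTree as ET

# Hypothetical documents: one holds facts, the other a customer dimension.
FACTS_XML = """
<facts>
  <fact customer-id="c1" amount="10"/>
  <fact customer-id="c2" amount="5"/>
  <fact customer-id="c1" amount="7"/>
</facts>
"""
CUSTOMERS_XML = """
<customers>
  <customer id="c1"><city>Lyon</city></customer>
  <customer id="c2"><city>Bron</city></customer>
</customers>
"""

facts = ET.fromstring(FACTS_XML)
customers = ET.fromstring(CUSTOMERS_XML)

# A single path expression navigates ONE document and extracts nodes from it.
cities = [c.text for c in customers.findall("./customer/city")]

# Relating the two documents requires an explicit join on the shared key,
# which a path expression over a single document cannot express.
city_by_id = {c.get("id"): c.findtext("city") for c in customers.findall("customer")}
total_by_city = {}
for fact in facts.findall("fact"):
    city = city_by_id[fact.get("customer-id")]
    total_by_city[city] = total_by_city.get(city, 0) + int(fact.get("amount"))
```

In this sketch the join is done by hand with a dictionary lookup. In an XQuery setting the same work would typically appear as a nested iteration over both documents, which is the cost a join index aims to avoid.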

In this paper, we propose a new join index structure that is specifically adapted to multidimensional XML data warehouses. This structure is able to maintain a star schema of several XML documents and to preserve the information contained in these documents. It is actually a join index that ensures a faster execution of XQuery decision-support queries by eliminating join costs.
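The general idea of trading a run-time join for a precomputed structure can be sketched as follows. This is only an illustration under stated assumptions, not the authors' index structure, whose details lie in the truncated part of the text. The helper `build_join_index`, the `index` root element and the sample documents are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical star schema: one fact document and one dimension document.
FACTS_XML = """
<facts>
  <fact customer-id="c1" amount="10"/>
  <fact customer-id="c2" amount="5"/>
  <fact customer-id="c1" amount="7"/>
</facts>
"""
CUSTOMERS_XML = """
<customers>
  <customer id="c1"><city>Lyon</city></customer>
  <customer id="c2"><city>Bron</city></customer>
</customers>
"""

def build_join_index(facts, dimensions):
    """Precompute one <fact> entry per fact with its dimension members embedded.

    `dimensions` maps a foreign-key attribute name on <fact> to the root
    element of the matching dimension document (an assumed layout).
    """
    index = ET.Element("index")
    lookups = {
        key: {member.get("id"): member for member in root}
        for key, root in dimensions.items()
    }
    for fact in facts:
        entry = ET.SubElement(index, "fact", fact.attrib)
        for key, members in lookups.items():
            member = members[fact.get(key)]
            copy = ET.SubElement(entry, member.tag, member.attrib)
            for child in member:
                ET.SubElement(copy, child.tag).text = child.text
    return index

index = build_join_index(
    ET.fromstring(FACTS_XML),
    {"customer-id": ET.fromstring(CUSTOMERS_XML)},
)

# The query now reads a single document through path expressions only;
# no join is evaluated at query time.
total_by_city = {}
for entry in index.findall("fact"):
    city = entry.findtext("customer/city")
    total_by_city[city] = total_by_city.get(city, 0) + int(entry.get("amount"))
```

The join is paid once, when the index is built, and the fact and dimension values remain available in the index, which mirrors the stated goal of preserving the warehouse's information. The price in this naive sketch is duplicated dimension data.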

We demonstrate, both theoretically and experimentally, that the use of our index significantly reduces the execution time of XQuery decision-support queries expressed over a data warehouse. In addition, our experiments show that, in our context, native XML DBMSs are competitive with XML-compatible relational DBMSs.

The remainder of this paper is organized as follows. We first present the state of the art on XML indexing in Section 2. Then, we outline the context of this study in Section 3. We detail our join index structure in Section 4. In order to validate our proposal, we present a theoretical study in Section 5 and some experimental results in Section 6. Finally, we conclude the paper and discuss research perspectives in Section 7.

  2. Related work

In this section, we assume that an XML document is defined as a labeled graph whose nodes represent document elements or attributes, and whose edges represent the element-subelement (or parent-child) relationship. Edges are labeled with element or attribute names.
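As a small illustration of this model, the sketch below turns an XML document into a list of labeled edges. The function name `to_labeled_graph`, the integer node identifiers, the virtual document node `0` and the `@` prefix for attribute labels are conventions assumed here for readability; they do not come from the paper.

```python
import xml.etree.ElementTree as ET

def to_labeled_graph(xml_text):
    """Return edges (parent_id, label, child_id) for an XML document.

    Node 0 is a virtual document node. Every element and attribute gets its
    own node; the edge leading to it carries the element or attribute name.
    """
    root = ET.fromstring(xml_text)
    edges = []
    counter = 0  # last node identifier handed out

    def visit(elem, node_id):
        nonlocal counter
        for name in elem.attrib:
            counter += 1
            edges.append((node_id, "@" + name, counter))
        for child in elem:
            counter += 1
            child_id = counter
            edges.append((node_id, child.tag, child_id))
            visit(child, child_id)

    counter += 1
    edges.append((0, root.tag, counter))
    visit(root, counter)
    return edges

edges = to_labeled_graph('<customer id="c1"><city>Lyon</city></customer>')
# edges == [(0, "customer", 1), (1, "@id", 2), (1, "city", 3)]
```

A labeled path expression such as `customer/city` then corresponds to following the edges labeled `customer` and `city` in sequence from the document node.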

Several studies address the issue of XML data indexing (Goldman & Widom 1997; Milo & Suciu 1999; Cooper et al. 2001; Chung et al. 2002).

…(Full text truncated)…

Reference

This content is AI-processed based on ArXiv data.
