A Join Index for XML Data Warehouses
Hadj Mahboubi, Kamel Aouiche and Jérôme Darmont
University of Lyon (ERIC Lyon 2)
5 avenue Pierre Mendès-France
69676 Bron Cedex
France
{first-name.last-name}@eric.univ-lyon2.fr
Abstract
XML data warehouses form an interesting basis for decision-support applications that exploit
complex data. However, native XML database management systems (DBMSs) currently offer
limited performance, and ways to optimize them must be investigated. In this paper,
we propose a new join index that is specifically adapted to the multidimensional architecture
of XML warehouses. It eliminates join operations while preserving the information contained
in the original warehouse. A theoretical study and experimental results demonstrate the
efficiency of our join index. They also show that native XML DBMSs can compete with
XML-compatible, relational DBMSs when warehousing and analyzing XML data.
Keywords
XML data warehouses, performance, XML join index, XQuery.
1. Introduction
Decision-support applications nowadays exploit heterogeneous data from various sources.
Furthermore, the development of Web 2.0 and the proliferation of multimedia documents
have led to the analysis of data that are not only numerical or symbolic. These so-called
complex data may indeed fall in several of the following categories (Darmont et al. 2005):
data represented in various formats (databases, texts, images, sounds, videos…); diversely
structured data (relational databases, XML document repositories…); data originating from
several different sources (distributed databases, the Web…); data described through several
channels or points of view (radiographies and audio diagnosis of a physician, data expressed
in different scales or languages…); data that change in terms of definition or value (temporal
databases, periodical surveys…). For example, analyzing medical data may require jointly
exploiting information in various forms: patient records (classical database), medical history
(text), radiographies, echographs (multimedia documents), physician diagnoses (texts or audio
recordings), etc.
In this context, XML proves to be a very interesting tool for integrating and
warehousing complex data for analysis (Darmont et al. 2003). However, decision-support
queries are generally complex because they involve several join and aggregation operations.
In addition, many native XML database management systems (DBMSs) exhibit relatively
poor performance when data volumes are very large and queries are complex. Thus, it is crucial
to design XML data warehouses that guarantee the best performance when accessing data.
Indexing is one of the most frequently used techniques to achieve this goal.
Several solutions have been proposed for XML data indexing in the literature. However,
existing techniques only support single labeled path expressions within one XML document. A
path expression helps explore an XML document and extract a specific node (element) or
subtree (sub-document). On its own, it cannot perform a join operation over several XML documents. In the
context of XML data warehouses, decision-support queries are complex and involve several
path expressions. Data are also generally distributed into several XML documents due to their
large volume. Hence, XML queries should use specific indices to navigate these documents.
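The difference between a single path expression and a join over several documents can be illustrated with a minimal Python sketch. It is not taken from the paper: the two tiny documents, their element names (`fact`, `customer_id`, `amount`, `customer`, `city`) and the helper `total_by_city` are hypothetical, and Python's standard `xml.etree.ElementTree` module stands in for an XQuery engine.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature warehouse: one facts document, one dimension document.
FACTS_XML = """
<facts>
  <fact><customer_id>c1</customer_id><amount>10</amount></fact>
  <fact><customer_id>c2</customer_id><amount>25</amount></fact>
  <fact><customer_id>c1</customer_id><amount>5</amount></fact>
</facts>
"""
CUSTOMERS_XML = """
<customers>
  <customer id="c1"><city>Lyon</city></customer>
  <customer id="c2"><city>Bron</city></customer>
</customers>
"""

facts = ET.fromstring(FACTS_XML)
customers = ET.fromstring(CUSTOMERS_XML)

# A single path expression navigates ONE document and extracts nodes.
amounts = [int(e.text) for e in facts.findall("./fact/amount")]


def total_by_city(facts_root, customers_root):
    """Join facts with the customer dimension, then aggregate amounts per city."""
    # The join itself is not a path expression: the dimension document must be
    # scanned and matched against every fact through the shared identifier.
    city_of = {
        c.get("id"): c.findtext("city")
        for c in customers_root.findall("./customer")
    }
    totals = {}
    for fact in facts_root.findall("./fact"):
        city = city_of[fact.findtext("customer_id")]
        totals[city] = totals.get(city, 0) + int(fact.findtext("amount"))
    return totals
```

The path expression yields the amounts directly, whereas the per-city total needs matching logic across both documents. That matching is the join cost that grows with data volume.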
In this paper, we propose a new join index structure that is specifically adapted to
multidimensional XML data warehouses. This structure is able to maintain a star schema of
several XML documents and to preserve the information contained in these documents. It is
actually a join index that ensures a faster execution of XQuery decision-support queries by
eliminating join costs.
We theoretically and experimentally demonstrate that the use of our index significantly
reduces the execution time of XQuery decision-support queries expressed on a data
warehouse. In addition, our experiments show that, in our context, native XML DBMSs are
competitive with relational, XML-compatible DBMSs.
The remainder of this paper is organized as follows. We first present the state of the art on
XML indexing in Section 2. Then, we outline the context of this study in Section 3. We detail
our join index structure in Section 4. In order to validate our proposal, we present a theoretical
study in Section 5 and some experimental results in Section 6. Finally, we conclude the paper
and discuss research perspectives in Section 7.
2. Related work
In this section, we assume that an XML document is defined as a labeled graph whose nodes
represent document elements or attributes, and edges represent the element-subelement (or
parent-child) relationship. Edges are labeled with element or attribute names.
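As an illustration of this graph view, the following Python sketch converts a small XML document into a set of nodes and labeled parent-child edges. The function name `to_labeled_graph`, the virtual root node `0` and the `@` prefix for attribute labels are assumptions made for this example, not conventions stated in the paper.

```python
import xml.etree.ElementTree as ET


def to_labeled_graph(xml_text):
    """Return (nodes, edges) for an XML document.

    nodes maps a node id to its text value (or None);
    edges is a list of (parent_id, label, child_id) triples.
    """
    root = ET.fromstring(xml_text)
    nodes = {0: None}  # node 0 is a virtual document root (assumption)
    edges = []

    def add_node(value):
        node_id = len(nodes)
        nodes[node_id] = value
        return node_id

    def visit(element, parent_id):
        text = (element.text or "").strip() or None
        element_id = add_node(text)
        # The edge from parent to child carries the element name as its label.
        edges.append((parent_id, element.tag, element_id))
        # Attributes become child nodes; "@" marks attribute labels (assumption).
        for name, value in element.attrib.items():
            edges.append((element_id, "@" + name, add_node(value)))
        for child in element:
            visit(child, element_id)

    visit(root, 0)
    return nodes, edges


nodes, edges = to_labeled_graph('<sale id="s1"><amount>10</amount></sale>')
```

In this representation, a labeled path expression corresponds to following a sequence of edge labels from the root, for example `sale` then `amount`.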
Several studies address the issue of XML data indexing (Goldman & Widom 1997; Milo &
Suciu 1999; Cooper et al. 2001; Chung et al. 2002).
…(Full text truncated)…