Data Mining-based Materialized View and Index Selection in Data Warehouses

Reading time: 5 minute
...

📝 Original Info

  • Title: Data Mining-based Materialized View and Index Selection in Data Warehouses
  • ArXiv ID: 0707.1548
  • Date: 2007-07-12
  • Authors: ** 논문에 명시된 저자 정보가 제공되지 않았습니다. (가능하면 원문에서 확인 필요) **

📝 Abstract

Materialized views and indexes are physical structures for accelerating data access that are casually used in data warehouses. However, these data structures generate some maintenance overhead. They also share the same storage space. Most existing studies about materialized view and index selection consider these structures separately. In this paper, we adopt the opposite stance and couple materialized view and index selection to take view-index interactions into account and achieve efficient storage space sharing. Candidate materialized views and indexes are selected through a data mining process. We also exploit cost models that evaluate the respective benefit of indexing and view materialization, and help select a relevant configuration of indexes and materialized views among the candidates. Experimental results show that our strategy performs better than an independent selection of materialized views and indexes.

💡 Deep Analysis

📄 Full Content

Large-scale usage of databases in general and data warehouses in particular requires an administrator whose principal role is data management, both at the logical level (schema definition) and physical level (files and disk storage), as well as performance optimization. With the wide development of Database Management Systems (DBMSs), minimizing the administration function has become crucial (Chaudhuri & Narasayya, 1997). One important administration task is the selection of suitable physical structures to improve system performance by minimizing data access time (Finkelstein, Schkolnick, & Tiberio, 1988).

Among techniques adopted in data warehouse relational implementations for improving query performance, view materialization and indexing are presumably the most effective (Rizzi & Saltarelli, 2003). Materialized views are physical structures that improve data access time by precomputing intermediary results. Therefore, end-user queries can be efficiently processed through data stored in views and do not need to access the original data. Indexes are also physical structures that allow direct data access. They avoid sequential scans and thereby reduce query response time. Nevertheless, exploiting either materialized views or indexes requires additional storage space and entails maintenance overhead when refreshing the data warehouse. The issue is thus to select an appropriate configuration (set) of materialized views and indexes that minimizes query response time and the selected data structures’ maintenance cost, given a limited storage space.

The literature regarding materialized view and index selection in relational databases and data warehouses is quite abundant. However, we have identified two key issues requiring enhancements. First, the actual selection of suitable candidate materialized views and indexes is rarely addressed in existing approaches. Most of them indeed present scaling problems at this level. Second, none of these approaches takes into account the interactions that may exist between materialized views, between indexes, and between indexes and materialized views (including the approaches that simultaneously select both materialized views and indexes).

In this paper, we present a novel strategy for optimizing data warehouse performance that aims at addressing both these issues. We have indeed designed a generic approach whose objective is to automatically propose solutions to data warehouse administrators for optimizing data access time. The principle of this approach is to apply data mining techniques on a workload (set of queries) that is representative of data warehouse usage in order to deduce a quasi-optimal configuration of materialized views and/or indexes. Data mining actually helps reduce the selection problem’s complexity and improves scalability. Then, cost models help select among the selected materialized views and indexes the most efficient in terms of performance gain/overhead ratio. We have applied our approach on three related problems: isolate materialized view selection, isolate index selection and joint materialized view and index selection. In the last case, we included index-view interactions in our cost models.

The remainder of this paper is organized as follows. Section 2 presents and discusses the state of the art regarding materialized view and index selection. Section 3 motivates and presents the principle of our performance optimization approach. Section 4 further details how we apply this approach to isolate materialized view selection, isolate index selection and joint materialized view and index selection, respectively. We particularly focus on joint materialized view and index selection, which is our latest development. Section 5 presents the experimental results we achieved to illustrate our approach’s relevance. Finally, we conclude this paper and provide research perspectives in Section 6.

In this section, we first formalize the materialized view and index selection problem, and then detail and discuss the state of the art regarding materialized view selection, index selection and joint index and materialized view selection, respectively.

The materialized view and index selection problem consists in building a set of materialized views and indexes that optimizes the execution cost of a given workload. This optimization may be realized under constraints, typically the storage space available for storing these physical data structures.

Let VC and IC be two sets of materialized views and indexes, respectively, that are termed candidate and are susceptible to reduce the execution cost of a given query set Q (generally supposed representative of system workload). Let OC = VC ∪ IC . Let S be the storage space allotted by the data warehouse administrator to build objects (materialized views or indexes) from set OC . The joint materialized view and index selection problem consists in building an object configuration O ⊆ OC that minimizes the execution c

Reference

This content is AI-processed based on open access ArXiv data.

Start searching

Enter keywords to search articles

↑↓
ESC
⌘K Shortcut