Frequent Query Matching in Dynamic Data Warehousing

Reading time: 7 minute
...

📝 Abstract

With the need for flexible and on-demand decision support, Dynamic Data Warehouses (DDW) provide benefits over traditional data warehouses due to their dynamic characteristics in structuring and access mechanism. A DDW is a data framework that accommodates data source changes easily to allow seamless querying to users. Materialized Views (MV) are proven to be an effective methodology to enhance the process of retrieving data from a DDW as results are pre-computed and stored in it. However, due to the static nature of materialized views, the level of dynamicity that can be provided at the MV access layer is restricted. As a result, the collection of materialized views is not compatible with ever-changing reporting requirements. It is important that the MV collection is consistent with current and upcoming queries. The solution to the above problem must consider the following aspects: (a) MV must be matched against an OLAP query in order to recognize whether the MV can answer the query, (b) enable scalability in the MV collection, an intuitive mechanism to prune it and retrieve closely matching MVs must be incorporated, (c) MV collection must be able to evolve in correspondence to the regularly changing user query patterns. Therefore, the primary objective of this paper is to explore these aspects and provide a well-rounded solution for the MV access layer to remove the mismatch between the MV collection and reporting requirements. Our contribution to solve the problem includes a Query Matching Technique, a Domain Matching Technique and Maintenance of the MV collection. We developed an experimental platform using real data-sets to evaluate the effectiveness in terms of performance and precision of the proposed techniques.

💡 Analysis

With the need for flexible and on-demand decision support, Dynamic Data Warehouses (DDW) provide benefits over traditional data warehouses due to their dynamic characteristics in structuring and access mechanism. A DDW is a data framework that accommodates data source changes easily to allow seamless querying to users. Materialized Views (MV) are proven to be an effective methodology to enhance the process of retrieving data from a DDW as results are pre-computed and stored in it. However, due to the static nature of materialized views, the level of dynamicity that can be provided at the MV access layer is restricted. As a result, the collection of materialized views is not compatible with ever-changing reporting requirements. It is important that the MV collection is consistent with current and upcoming queries. The solution to the above problem must consider the following aspects: (a) MV must be matched against an OLAP query in order to recognize whether the MV can answer the query, (b) enable scalability in the MV collection, an intuitive mechanism to prune it and retrieve closely matching MVs must be incorporated, (c) MV collection must be able to evolve in correspondence to the regularly changing user query patterns. Therefore, the primary objective of this paper is to explore these aspects and provide a well-rounded solution for the MV access layer to remove the mismatch between the MV collection and reporting requirements. Our contribution to solve the problem includes a Query Matching Technique, a Domain Matching Technique and Maintenance of the MV collection. We developed an experimental platform using real data-sets to evaluate the effectiveness in terms of performance and precision of the proposed techniques.

📄 Content

Frequent Query Matching in Dynamic Data Warehousing Charles H. Goonetilleke, J. Wenny Rahayu, and Md. Saiful Islam La Trobe University, Melbourne, Australia  Corresponding author. Tel.: +61-3-9479-1282; fax: +61-3-9479-3060; e-mail: w.rahayu@latrobe.edu.au

  1. Introduction Businesses collect large amounts of data from transactions in order to gain insight into their performance and growth. Decisions must be made based on this data to support the ever- changing needs of customers and business requirements. In present days’ the data warehouse has become a popular tool for decision making among businesses. Traditional data warehouses (DW) are subject-oriented, time variant, non- volatile and integrated data repositories that are capable of providing decision support as data is kept in an aggregated form [16]. Commonly, the data from transactional data sources are extracted and transformed into an aggregated form via the Extract-Transform-Load (ETL) process to be stored in the data warehouse and then onwards users can either execute On-Line Analytical Processing [3, 5, 18, 19] (OLAP) queries on it or create smaller data marts from it. Several data sources may have to be integrated to create the data warehouse. The query results can then be used for analytical purposes, for example to identify customer trends, so that business owners or management can make important business decisions based on this information.
    In order to retrieve information from a data warehouse, users can either directly submit an OLAP query or use an existing Materialized View (MV), which contains the pre-computed result for a particular query. This type of information is mainly used for compiling reports for decision making purposes. Reports may be produced on a daily, weekly, monthly and/or yearly basis, and results from one or more queries would be required. Therefore, storing a number of MVs for this type of frequently used queries would be beneficial as query results need not be computed at runtime. This would mean that the utilization of MVs would reduce query execution time, as opposed to direct querying that requires expensive database operations such as joins, grouping, nesting to be performed at runtime. For example, if the cost of accessing the query result is C1, then directly querying would have a total cost of C1 + O2, where O2 is the cost of performing the aforementioned operations in DW. Contrary to this, the total cost of using an MV to get the result is just C1, as the result is already computed. Hence MVs are an effective solution to faster reporting. The presence of an MV collection in a data warehouse is advantageous to the reporting process. However, the following issues exist from the user perspective:  Users are required to determine which MV(s) can answer their query; and  Users must explicitly reference the MV(s) in their queries. This means that users are required to know intricate details about the collection and re-write their queries, if they wish to utilize MVs as illustrated in left hand side of Figure 1. Furthermore, from a DW perspective, the following issues are observed:  The MV collection must be intuitive so that majority of the queries are answered using MVs;  When the MV collection is large, finding a match by a linear scan is time consuming; and

ABSTRACT With the need for flexible and on-demand decision support, Dynamic Data Warehouses (DDW) provide benefits over traditional data warehouses due to their dynamic characteristics in structuring and access mechanism. A DDW is a data framework that accommodates data source changes easily to allow seamless querying to users. Materialized Views (MV) are proven to be an effective methodology to enhance the process of retrieving data from a DDW as results are pre-computed and stored in it. However, due to the static nature of materialized views, the level of dynamicity that can be provided at the MV access layer is restricted. As a result, the collection of materialized views is not compatible with ever-changing reporting requirements. It is important that the MV collection is consistent with current and upcoming queries. The solution to the above problem must consider the following aspects: (a) MV must be matched against an OLAP query in order to recognize whether the MV can answer the query, (b) enable scalability in the MV collection, an intuitive mechanism to prune it and retrieve closely matching MVs must be incorporated, (c) MV collection must be able to evolve in correspondence to the regularly changing user query patterns. Therefore, the primary objective of this paper is to explore these aspects and provide a well-rounded solution for the MV access layer to remove the mismatch between the MV collection and reporting requirements. Our contribution to solve the problem includes a Query Matching Technique, a Domain Matching Technique and Maintenance of the MV collection. We deve

This content is AI-processed based on ArXiv data.

Start searching

Enter keywords to search articles

↑↓
ESC
⌘K Shortcut