A Chase-based Approach to Consistent Answers of Analytic Queries in Star Schemas

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

We present an approach to computing consistent answers to queries possibly involving an aggregation operator in databases operating under a star schema and possibly containing missing values and inconsistent data. Our approach is based on earlier work concerning consistent query answering for standard queries (with no aggregate operator) in multi-table databases. In that work, we presented polynomial algorithms for computing either the exact consistent answer to a query or bounds of the exact answer, depending on whether the query involves a selection condition or not. In the present work, we consider databases operating under a star schema. Calling such databases data warehouses, we extend our previous work to queries involving aggregate operators, called analytic queries. In this context, we propose specific algorithms for computing exact consistent answers to queries, whether analytic or not, provided that the selection condition in the query satisfies the property of independency (i.e., the condition can be expressed as a conjunction of conditions each involving a single attribute). We show that the overall time complexity of these specific algorithms is in O(W·log W), where W is the size of the data warehouse. Moreover, the case of analytic queries involving a having clause associated with a group-by clause is discussed in the context of our approach.


💡 Research Summary

The paper addresses the problem of providing consistent answers to analytic (i.e., GROUP‑BY with aggregation) queries over data warehouses that are organized according to a star schema and may contain inconsistencies and missing values. Building on the authors’ earlier work on consistent query answering for standard (non‑analytic) queries, the authors adapt the classical Chase procedure and its extended version, called m‑Chase, to the star‑schema setting.

In a traditional Chase, the algorithm works on the actual tuples of the database and stops as soon as a functional‑dependency violation (conflict) is detected. Consequently, it cannot distinguish between “true” data and “conflicting” data when the database is inconsistent. The m‑Chase, by contrast, operates over the set 𝑇 of all possible tuples that can be formed from the active domains of the attributes. It continues applying functional dependencies even after conflicts appear, thereby classifying every tuple in 𝑇 along two dimensions: true/false and conflicting/non‑conflicting. This richer classification enables the extraction of a consistent subset of tuples (those that are true and non‑conflicting) that can be safely used for query answering.
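The idea of continuing past a conflict and labelling tuples, rather than stopping, can be illustrated with a deliberately simplified sketch. It is not the m‑Chase itself: it checks a single functional dependency over stored rows only, assumes no missing values, and the function name `classify_by_fd` and the sample `customers` table are hypothetical.

```python
from collections import defaultdict

def classify_by_fd(rows, lhs, rhs):
    """Split rows into (non_conflicting, conflicting) for the FD lhs -> rhs.

    A row is conflicting when another row agrees with it on every
    attribute in lhs but carries a different value for rhs.
    """
    values_seen = defaultdict(set)
    for row in rows:
        key = tuple(row[a] for a in lhs)
        values_seen[key].add(row[rhs])

    non_conflicting, conflicting = [], []
    for row in rows:
        key = tuple(row[a] for a in lhs)
        if len(values_seen[key]) > 1:
            conflicting.append(row)
        else:
            non_conflicting.append(row)
    return non_conflicting, conflicting

# Hypothetical dimension table with the FD cust_id -> city.
customers = [
    {"cust_id": 1, "city": "Paris"},
    {"cust_id": 1, "city": "Lyon"},   # violates the FD together with the row above
    {"cust_id": 2, "city": "Rome"},
]
ok, bad = classify_by_fd(customers, ["cust_id"], "city")
print(ok)   # rows that can be used safely
print(bad)  # rows involved in a violation
```

A stop-at-first-conflict procedure would report failure on this input; the labelling variant still returns the row for customer 2 as usable.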

A star schema consists of a single fact table and several dimension tables, each with its own primary key and functional dependencies. The authors focus on analytic queries whose selection predicates satisfy an “independency” property: the predicate can be expressed as a conjunction of single‑attribute conditions. Under this restriction, the selection can be applied independently to each dimension table, and the m‑Chase can be used to prune away conflicting tuples before any join with the fact table is performed.
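One way to picture the independency property is to store the selection as one test per attribute, so that each test can be routed to the table owning that attribute. The sketch below assumes a hypothetical schema (`ATTRIBUTE_OWNER`) and hypothetical helper names; a condition that compares two attributes with each other could not be represented this way.

```python
# Hypothetical star schema: which table owns which attribute.
ATTRIBUTE_OWNER = {"country": "store", "category": "product", "year": "date"}

# An independent selection: one test per attribute, implicitly AND-ed together.
conditions = {
    "country": lambda c: c in {"FR", "IT"},
    "year": lambda y: y >= 2020,
}

def split_by_table(conditions, attribute_owner):
    """Group the single-attribute tests by the table that owns each attribute."""
    per_table = {}
    for attr, test in conditions.items():
        per_table.setdefault(attribute_owner[attr], {})[attr] = test
    return per_table

def satisfies(row, table_conditions):
    """True when the row passes every test assigned to its table."""
    return all(test(row[attr]) for attr, test in table_conditions.items())

per_table = split_by_table(conditions, ATTRIBUTE_OWNER)
store_row = {"store_id": 1, "country": "FR"}
print(sorted(per_table))                          # tables that received a test
print(satisfies(store_row, per_table["store"]))   # evaluated on the dimension alone
```

Because every test touches a single attribute, each dimension table can be filtered without looking at any other table.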

The core algorithm proceeds in two phases. First, for each dimension table, the independent selection predicates are evaluated and the m‑Chase is run to obtain the set of non‑conflicting tuples. Second, the fact table is joined with the filtered dimension tables, and the GROUP‑BY aggregation (SUM, COUNT, MIN, MAX, etc.) is computed. By using sorting and heap‑based aggregation, the overall time complexity is O(W·log W), where W denotes the total size of the data warehouse.
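The two phases can be sketched as follows, under strong simplifying assumptions: one dimension table, one functional dependency (key → grouping attribute), no missing values, and conflicting dimension keys simply dropped. The paper's notion of a consistent answer is more refined than this, and all names here (`consistent_keys`, `sum_by_group`, the sample tables) are hypothetical.

```python
from collections import defaultdict

def consistent_keys(dim_rows, key, attr, test):
    """Phase 1: map each dimension key to its value for attr, keeping only
    keys whose rows all agree on attr and whose value passes the selection."""
    values = defaultdict(set)
    for row in dim_rows:
        values[row[key]].add(row[attr])
    kept = {}
    for k, seen in values.items():
        if len(seen) == 1:
            (value,) = seen
            if test(value):
                kept[k] = value
    return kept

def sum_by_group(fact_rows, fk, measure, key_to_group):
    """Phase 2: join fact rows to the kept keys and SUM the measure per group."""
    totals = defaultdict(int)
    for row in fact_rows:
        group = key_to_group.get(row[fk])
        if group is not None:
            totals[group] += row[measure]
    return dict(totals)

stores = [
    {"store_id": 1, "country": "FR"},
    {"store_id": 2, "country": "IT"},
    {"store_id": 2, "country": "ES"},  # conflicts with the previous row
    {"store_id": 3, "country": "FR"},
    {"store_id": 4, "country": "DE"},  # fails the selection below
]
sales = [
    {"store_id": 1, "amount": 10},
    {"store_id": 2, "amount": 99},
    {"store_id": 3, "amount": 5},
    {"store_id": 4, "amount": 7},
]

kept = consistent_keys(stores, "store_id", "country", lambda c: c in {"FR", "IT"})
totals = sum_by_group(sales, "store_id", "amount", kept)
print(totals)  # {'FR': 15}
```

This sketch aggregates with hash maps for brevity; a sort-based grouping would be one way to obtain the O(W·log W) behaviour mentioned above.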

The paper also discusses analytic queries that include a HAVING clause. After the GROUP‑BY and aggregation are performed, the HAVING condition is applied to the aggregated results. The same m‑Chase‑based consistency check is then used to discard any groups that contain conflicting tuples. If the HAVING condition itself violates the independency property, the algorithm may only provide an approximation or require additional repair steps.
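A naive reading of this post-aggregation step is shown below. It assumes the set of groups containing conflicting tuples has already been computed by some earlier consistency check; the function `apply_having` and the sample values are hypothetical and are not taken from the paper.

```python
def apply_having(group_totals, groups_with_conflicts, having):
    """Keep groups whose aggregate passes `having` and that hold no conflicting tuple."""
    return {
        group: total
        for group, total in group_totals.items()
        if having(total) and group not in groups_with_conflicts
    }

totals = {"FR": 15, "IT": 99, "DE": 7}
result = apply_having(totals, groups_with_conflicts={"IT"}, having=lambda t: t > 10)
print(result)  # {'FR': 15}
```

Here "IT" is rejected because it is flagged as containing a conflict, and "DE" because its total does not satisfy the HAVING condition.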

Key contributions are: (1) extending the m‑Chase framework to star‑schema data warehouses; (2) presenting polynomial‑time algorithms (O(W·log W)) that return exact consistent answers for analytic queries when the selection predicates are independent; (3) analyzing the handling of HAVING clauses within this framework. The authors compare their approach with existing methods for standard queries, highlighting that their technique avoids costly full‑schema joins and remains robust in the presence of data inconsistencies. Limitations include the requirement that selection predicates be independent; more complex predicates or multi‑attribute dependencies are left for future work, as is the extension to queries with multiple aggregations or nested HAVING conditions.

