Equivalence of SQL Queries in Presence of Embedded Dependencies


📝 Original Info

  • Title: Equivalence of SQL Queries in Presence of Embedded Dependencies
  • ArXiv ID: 0812.2195
  • Date: 2009-06-27
  • Authors: Rada Chirkova, Michael R. Genesereth

📝 Abstract

We consider the problem of finding equivalent minimal-size reformulations of SQL queries in presence of embedded dependencies [1]. Our focus is on select-project-join (SPJ) queries with equality comparisons, also known as safe conjunctive (CQ) queries, possibly with grouping and aggregation. For SPJ queries, the semantics of the SQL standard treat query answers as multisets (a.k.a. bags), whereas the stored relations may be treated either as sets, which is called bag-set semantics for query evaluation, or as bags, which is called bag semantics. (Under set semantics, both query answers and stored relations are treated as sets.) In the context of the above Query-Reformulation Problem, we develop a comprehensive framework for equivalence of CQ queries under bag and bag-set semantics in presence of embedded dependencies, and make a number of conceptual and technical contributions. Specifically, we develop equivalence tests for CQ queries in presence of arbitrary sets of embedded dependencies under bag and bag-set semantics, under the condition that chase [9] under set semantics (set-chase) on the inputs terminates. We also present equivalence tests for aggregate CQ queries in presence of embedded dependencies. We use our equivalence tests to develop sound and complete (whenever set-chase on the inputs terminates) algorithms for solving instances of the Query-Reformulation Problem with CQ queries under each of bag and bag-set semantics, as well as for instances of the problem with aggregate queries.
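A small Python sketch may help make the three semantics concrete. It evaluates one CQ over a toy database in which relation R stores a duplicate tuple. The query encoding (a head tuple plus a list of `(predicate, args)` atoms, with uppercase strings as variables) and the functions `evaluate` and `_match` are hypothetical, chosen for illustration only; they are not notation or algorithms from the paper.

```python
from collections import Counter
from itertools import product

# Hypothetical encoding: a CQ is a head tuple of variables plus a body, a list of
# (predicate, args) atoms. Uppercase strings are variables; other terms are constants.

def is_var(term):
    return isinstance(term, str) and term[:1].isupper()

def _match(binding, args, row):
    """Try to extend binding so that the atom's args match a stored row."""
    if len(args) != len(row):
        return False
    for term, value in zip(args, row):
        if is_var(term):
            if binding.setdefault(term, value) != value:
                return False
        elif term != value:  # constants must match exactly
            return False
    return True

def evaluate(head, body, db, semantics):
    """Evaluate a CQ over db (dict: predicate -> list of tuples, duplicates allowed).

    semantics: 'bag'     - stored relations are bags, answer is a bag
               'bag-set' - stored relations are sets, answer is a bag
               'set'     - stored relations and answer are sets
    """
    if semantics in ("bag-set", "set"):
        # Treat each stored relation as a set by dropping duplicate tuples.
        db = {pred: list(dict.fromkeys(rows)) for pred, rows in db.items()}
    answer = Counter()
    # Pick one stored tuple copy per subgoal; duplicate copies yield separate
    # derivations, which is where bag multiplicities come from.
    for rows in product(*(db.get(pred, []) for pred, _ in body)):
        binding = {}
        if all(_match(binding, args, row) for (_, args), row in zip(body, rows)):
            answer[tuple(binding[v] for v in head)] += 1
    if semantics == "set":
        answer = Counter(set(answer))  # every answer tuple keeps multiplicity 1
    return answer

db = {"R": [(1, "a"), (1, "a"), (1, "b")], "S": [("a",), ("b",)]}
head, body = ("X",), [("R", ("X", "Y")), ("S", ("Y",))]
for sem in ("bag", "bag-set", "set"):
    print(sem, evaluate(head, body, db, sem))
# bag Counter({(1,): 3})
# bag-set Counter({(1,): 2})
# set Counter({(1,): 1})
```

Under bag semantics the duplicate copy of R(1, 'a') contributes a second derivation of the answer (1,); bag-set semantics discards that copy before evaluation, and set semantics additionally collapses the answer to a single copy.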

📄 Full Content

arXiv:0812.2195v3 [cs.DB] 26 Jun 2009

Equivalence of SQL Queries In Presence of Embedded Dependencies

Rada Chirkova, Department of Computer Science, NC State University, Raleigh, NC 27695, USA (chirkova@csc.ncsu.edu)
Michael R. Genesereth, Department of Computer Science, Stanford University, Stanford, CA 94305, USA (genesereth@stanford.edu)

ABSTRACT

We consider the problem of finding equivalent minimal-size reformulations of SQL queries in presence of embedded dependencies [1]. Our focus is on select-project-join (SPJ) queries with equality comparisons, also known as safe conjunctive (CQ) queries, possibly with grouping and aggregation. For SPJ queries, the semantics of the SQL standard treat query answers as multisets (a.k.a. bags), whereas the stored relations may be treated either as sets, which is called bag-set semantics for query evaluation, or as bags, which is called bag semantics. (Under set semantics, both query answers and stored relations are treated as sets.) In the context of the above Query-Reformulation Problem, we develop a comprehensive framework for equivalence of CQ queries under bag and bag-set semantics in presence of embedded dependencies, and make a number of conceptual and technical contributions. Specifically, we develop equivalence tests for CQ queries in presence of arbitrary sets of embedded dependencies under bag and bag-set semantics, under the condition that chase [10] under set semantics (set-chase) on the inputs terminates. We also present equivalence tests for aggregate CQ queries in presence of embedded dependencies. We use our equivalence tests to develop sound and complete (whenever set-chase on the inputs terminates) algorithms for solving instances of the Query-Reformulation Problem with CQ queries under each of bag and bag-set semantics, as well as for instances of the problem with aggregate queries. Some of our results are of independent interest.
In particular, it is known that constraints that force some relations to be sets on all instances of a given database schema arise naturally in the context of sound (i.e., correct) chase [9] under bag semantics. We develop a formal framework for defining such constraints as embedded dependencies, provided that row (tuple) IDs, commonly used in commercial database-management systems, are defined for the respective relations. We also extend the condition of [4] for bag equivalence of CQ queries to those cases where some relations are set valued in all instances of the given schema. Our proof of this nontrivial result includes reasoning involving bag (non)containment. In particular, we provide an original proof (adapted to our context) of the result of [4] that CQ query Q1 is bag contained in CQ query Q2 only if, for each predicate used in Q1, Q2 has at least as many subgoals with this predicate as Q1 does.

Our contributions are clearly applicable beyond the Query-Reformulation Problem considered in this paper. Specifically, the results of this paper can be used in developing algorithms for rewriting CQ queries and queries in more expressive languages (e.g., including grouping and aggregation, or arithmetic comparisons) using views in presence of embedded dependencies, under bag or bag-set semantics for query evaluation.

This text contains corrections to Sections 2.4 and 4 of [5].

1. INTRODUCTION

Query containment and equivalence were recognized fairly early as fundamental problems in database query evaluation and optimization. The reason is that, for conjunctive queries (CQ queries), a broad class of frequently used queries whose expressive power is equivalent to that of select-project-join queries in relational algebra, query equivalence can be used as a tool in query optimization.
Specifically, to find a more efficient and answer-preserving formulation of a given CQ query, it is enough to “try all ways” of arriving at a “shorter” query formulation by removing query subgoals, in a process called query minimization [2]. A subgoal-removal step succeeds only if equivalence (via containment) of the “original” and “shorter” query formulations can be ensured. The equivalence test of [2] for CQ queries is known to be NP-complete, whereas equivalence of general relational queries is undecidable. In recent years, there has been renewed interest in the study of query containment and equivalence because of their close relationship to the problem of answering queries using views [17]. In particular, the problem of rewriting relational queries equivalently using views has been the subject of extensive rigorous investigations. Please see [11, 17, 21, 23] for discussions of the state of the art and of the numerous practical applications of the problem. A test for equivalence of a CQ query to its candidate CQ rewriting in terms of CQ views uses an equivalent transformation of the rewriting to its CQ expansion, which (informally speaking) replaces references to views in the rewriting by their definitions [17, 23].
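The subgoal-removal process described above can be sketched in Python for set semantics, using containment mappings in the style of [2]. The query encoding and every function name here (`containment_mapping`, `minimize`, `passes_bag_count_test`) are hypothetical illustrations, not the paper's algorithms. The last helper checks the necessary condition for bag containment attributed to [4] earlier in the text, and shows why set-semantics minimization is not safe under bag semantics: the minimized query has fewer R-subgoals, so the original query cannot be bag contained in it.

```python
from collections import Counter

# Hypothetical encoding: a CQ is (head, body), body is a list of (predicate, args);
# uppercase strings are variables, all other terms are constants.

def is_var(term):
    return isinstance(term, str) and term[:1].isupper()

def _unify(h, src_args, dst_args):
    """Extend mapping h so that src_args map onto dst_args, or return None."""
    if len(src_args) != len(dst_args):
        return None
    h = dict(h)
    for s, d in zip(src_args, dst_args):
        if is_var(s):
            if h.setdefault(s, d) != d:
                return None
        elif s != d:  # constants must map to themselves
            return None
    return h

def containment_mapping(src, dst):
    """Backtracking search for a containment mapping from src to dst.

    If one exists, dst is contained in src under set semantics.
    """
    (src_head, src_body), (dst_head, dst_body) = src, dst
    start = _unify({}, src_head, dst_head)
    if start is None:
        return None

    def extend(h, i):
        if i == len(src_body):
            return h
        pred, args = src_body[i]
        for dpred, dargs in dst_body:
            if dpred == pred:
                h2 = _unify(h, args, dargs)
                if h2 is not None:
                    found = extend(h2, i + 1)
                    if found is not None:
                        return found
        return None

    return extend(start, 0)

def minimize(query):
    """Greedy subgoal removal that preserves set-semantics equivalence."""
    head, body = query[0], list(query[1])
    changed = True
    while changed:
        changed = False
        for i in range(len(body)):
            shorter = body[:i] + body[i + 1:]
            body_vars = {t for _, args in shorter for t in args if is_var(t)}
            if not all(v in body_vars for v in head if is_var(v)):
                continue  # the shorter query would be unsafe
            # Q is always contained in Q' (fewer subgoals), so Q' is equivalent
            # to Q iff Q' is contained in Q, i.e. a mapping Q -> Q' exists.
            if containment_mapping((head, body), (head, shorter)) is not None:
                body, changed = shorter, True
                break
    return head, body

def passes_bag_count_test(q1_body, q2_body):
    """Necessary (not sufficient) condition for Q1 bag contained in Q2: each
    predicate of Q1 labels at least as many subgoals in Q2 as in Q1."""
    c1 = Counter(p for p, _ in q1_body)
    c2 = Counter(p for p, _ in q2_body)
    return all(c2[p] >= n for p, n in c1.items())

q = (("X",), [("R", ("X", "Y")), ("R", ("X", "Z")), ("S", ("Y",))])
q_min = minimize(q)
print(q_min)  # (('X',), [('R', ('X', 'Y')), ('S', ('Y',))])
print(passes_bag_count_test(q[1], q_min[1]))  # False
```

The search is exponential in the worst case, which is consistent with the NP-completeness of the test of [2]; it is only meant for small examples.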

…(Full text truncated)…


Reference

This content is AI-processed based on ArXiv data.
