📝 Original Info
- Title: Equivalence of SQL Queries in Presence of Embedded Dependencies
- ArXiv ID: 0812.2195
- Date: 2009-06-27
- Authors: Rada Chirkova, Michael R. Genesereth
📝 Abstract
We consider the problem of finding equivalent minimal-size reformulations of SQL queries in presence of embedded dependencies [1]. Our focus is on select-project-join (SPJ) queries with equality comparisons, also known as safe conjunctive (CQ) queries, possibly with grouping and aggregation. For SPJ queries, the semantics of the SQL standard treat query answers as multisets (a.k.a. bags), whereas the stored relations may be treated either as sets, which is called bag-set semantics for query evaluation, or as bags, which is called bag semantics. (Under set semantics, both query answers and stored relations are treated as sets.) In the context of the above Query-Reformulation Problem, we develop a comprehensive framework for equivalence of CQ queries under bag and bag-set semantics in presence of embedded dependencies, and make a number of conceptual and technical contributions. Specifically, we develop equivalence tests for CQ queries in presence of arbitrary sets of embedded dependencies under bag and bag-set semantics, under the condition that chase [9] under set semantics (set-chase) on the inputs terminates. We also present equivalence tests for aggregate CQ queries in presence of embedded dependencies. We use our equivalence tests to develop sound and complete (whenever set-chase on the inputs terminates) algorithms for solving instances of the Query-Reformulation Problem with CQ queries under each of bag and bag-set semantics, as well as for instances of the problem with aggregate queries.
📄 Full Content
arXiv:0812.2195v3 [cs.DB] 26 Jun 2009
Equivalence of SQL Queries
In Presence of Embedded Dependencies
Rada Chirkova
Department of Computer Science
NC State University, Raleigh, NC 27695, USA
chirkova@csc.ncsu.edu
Michael R. Genesereth
Department of Computer Science
Stanford University, Stanford, CA 94305, USA
genesereth@stanford.edu
ABSTRACT
We consider the problem of finding equivalent minimal-size reformulations of SQL queries in presence of embedded dependencies [1]. Our focus is on select-project-join (SPJ) queries with equality comparisons, also known as safe conjunctive (CQ) queries, possibly with grouping and aggregation. For SPJ queries, the semantics of the SQL standard treat query answers as multisets (a.k.a. bags), whereas the stored relations may be treated either as sets, which is called bag-set semantics for query evaluation, or as bags, which is called bag semantics. (Under set semantics, both query answers and stored relations are treated as sets.)
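The three semantics can be contrasted on a tiny example. The following is a minimal sketch, not the paper's formalism: the function names, the encoding of a stored relation as a `Counter` of tuples, and the restriction to subgoals whose arguments are all variables are assumptions made for illustration.

```python
from collections import Counter


def satisfying_assignments(body, db):
    """Yield (assignment, multiplicity) for each distinct satisfying assignment.

    body: list of (predicate, variables) subgoals; all arguments are variables.
    db:   dict mapping a predicate name to a Counter of stored tuples.
    The multiplicity is the product of the stored multiplicities of the
    tuples that the subgoals are mapped to.
    """
    def extend(i, env, mult):
        if i == len(body):
            yield dict(env), mult
            return
        pred, args = body[i]
        for tup, copies in db.get(pred, Counter()).items():
            new_env = dict(env)
            # Keep the tuple only if it agrees with the bindings made so far.
            if all(new_env.setdefault(var, val) == val
                   for var, val in zip(args, tup)):
                yield from extend(i + 1, new_env, mult * copies)

    yield from extend(0, {}, 1)


def evaluate(head, body, db, semantics):
    """Evaluate a CQ under 'set', 'bag-set', or 'bag' semantics."""
    answer = Counter()
    for env, mult in satisfying_assignments(body, db):
        row = tuple(env[v] for v in head)
        # Bag semantics honors stored duplicates; the other two ignore them.
        answer[row] += mult if semantics == "bag" else 1
    if semantics == "set":
        # Set semantics additionally drops duplicates from the answer.
        return Counter(dict.fromkeys(answer, 1))
    return answer
```

For the query Q(X) :- P(X, Y) over a relation P that stores two copies of (1, 2) and one copy of (1, 3), this sketch returns the answer tuple (1) three times under bag semantics, twice under bag-set semantics, and once under set semantics.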
In the context of the above Query-Reformulation Problem, we develop a comprehensive framework for equivalence of CQ queries under bag and bag-set semantics in presence of embedded dependencies, and make a number of conceptual and technical contributions. Specifically, we develop equivalence tests for CQ queries in presence of arbitrary sets of embedded dependencies under bag and bag-set semantics, under the condition that chase [10] under set semantics (set-chase) on the inputs terminates. We also present equivalence tests for aggregate CQ queries in presence of embedded dependencies. We use our equivalence tests to develop sound and complete (whenever set-chase on the inputs terminates) algorithms for solving instances of the Query-Reformulation Problem with CQ queries under each of bag and bag-set semantics, as well as for instances of the problem with aggregate queries.
Some of our results are of independent interest. In particular, it is known that constraints that force some relations to be sets on all instances of a given database schema arise naturally in the context of sound (i.e., correct) chase [9] under bag semantics. We develop a formal framework for defining such constraints as embedded dependencies, provided that row (tuple) IDs, commonly used in commercial database-management systems, are defined for the respective relations.
We also extend the condition of [4] for bag equivalence of CQ queries, to those cases where some relations are set valued in all instances of the given schema. Our proof of this nontrivial result includes reasoning involving bag (non)containment. In particular, we provide an original proof (adapted to our context) of the result of [4] that CQ query Q1 is bag contained in CQ query Q2 only if, for each predicate used in Q1, Q2 has at least as many subgoals with this predicate as Q1 does.
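The subgoal-counting condition above is easy to check mechanically. The sketch below is illustrative only: the function name and the encoding of a query body as a list of (predicate, variables) pairs are assumptions. The condition is necessary but not sufficient, so a `True` result does not establish bag containment.

```python
from collections import Counter


def passes_subgoal_count_test(q1_body, q2_body):
    """Necessary condition for Q1 to be bag contained in Q2.

    Returns False when some predicate occurs in more subgoals of Q1 than of
    Q2, in which case Q1 cannot be bag contained in Q2. A True result is
    inconclusive.
    """
    count1 = Counter(pred for pred, _ in q1_body)
    count2 = Counter(pred for pred, _ in q2_body)
    return all(count2[pred] >= n for pred, n in count1.items())
```

For example, with Q1(X) :- P(X, Y), P(X, Z) and Q2(X) :- P(X, Y), the test fails for Q1 against Q2, because Q2 has only one P-subgoal. Intuitively, on a relation P holding m copies of a single tuple, Q1 returns m * m copies of the answer while Q2 returns only m.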
Our contributions are clearly applicable beyond the Query-Reformulation Problem considered in this paper. Specifically, the results of this paper can be used in developing algorithms for rewriting CQ queries and queries in more expressive languages (e.g., including grouping and aggregation, or arithmetic comparisons) using views in presence of embedded dependencies, under bag or bag-set semantics for query evaluation.
This text contains corrections to Sections 2.4 and 4 of [5].
1. INTRODUCTION
Query containment and equivalence were recognized fairly early as fundamental problems in database query evaluation and optimization. The reason is that for conjunctive queries (CQ queries), a broad class of frequently used queries whose expressive power is equivalent to that of select-project-join queries in relational algebra, query equivalence can be used as a tool in query optimization. Specifically, to find a more efficient and answer-preserving formulation of a given CQ query, it is enough to “try all ways” of arriving at a “shorter” query formulation, by removing query subgoals, in a process called query minimization [2]. A subgoal-removal step succeeds only if equivalence (via containment) of the “original” and “shorter” query formulations can be ensured. The equivalence test of [2] for CQ queries is known to be NP-complete, whereas equivalence of general relational queries is undecidable.
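The minimization process described above can be sketched for set semantics with a brute-force search for containment mappings. This is a simplified illustration, not the algorithm of [2] as published: the function names and the query encoding (a head tuple of variables plus a list of (predicate, variables) subgoals, with no constants) are assumptions. The backtracking search is exponential in the worst case, as expected for an NP-complete test.

```python
def find_containment_mapping(src, dst):
    """Search for a containment mapping from query src to query dst.

    A query is (head_vars, body), where body is a list of
    (predicate, variables) subgoals. The mapping must send the head of src
    to the head of dst and every subgoal of src onto some subgoal of dst.
    Under set semantics, such a mapping exists iff dst is contained in src.
    Returns the mapping as a dict, or None if there is none.
    """
    (src_head, src_body), (dst_head, dst_body) = src, dst
    if len(src_head) != len(dst_head):
        return None
    mapping = {}
    for s, d in zip(src_head, dst_head):
        if mapping.setdefault(s, d) != d:
            return None

    def extend(i, current):
        if i == len(src_body):
            return current
        pred, args = src_body[i]
        for dpred, dargs in dst_body:
            if dpred != pred or len(dargs) != len(args):
                continue
            candidate = dict(current)
            if all(candidate.setdefault(a, b) == b
                   for a, b in zip(args, dargs)):
                result = extend(i + 1, candidate)
                if result is not None:
                    return result
        return None

    return extend(0, mapping)


def minimize(query):
    """Repeatedly drop a subgoal while set-equivalence is preserved."""
    head, body = query[0], list(query[1])
    changed = True
    while changed:
        changed = False
        for i in range(len(body)):
            shorter = body[:i] + body[i + 1:]
            # The shorter query always contains the original one; it is
            # equivalent iff the original also maps into the shorter query.
            if find_containment_mapping((head, body), (head, shorter)) is not None:
                body, changed = shorter, True
                break
    return head, body
```

On Q(X) :- P(X, Y), P(X, Z), the sketch removes one of the two P-subgoals and then stops. The removal is valid under set semantics only: under bag and bag-set semantics the two formulations can return different numbers of copies of an answer tuple, which is why the paper develops separate equivalence tests for those semantics.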
In recent years, there has been renewed interest in the study of query containment and equivalence, because of their close relationship to the problem of answering queries using views [17]. In particular, the problem of rewriting relational queries equivalently using views has been the subject of extensive rigorous investigations. Please see [11, 17, 21, 23] for discussions of the state of the art and of the numerous practical applications of the problem. A test for equivalence of a CQ query to its candidate CQ rewriting in terms of CQ views uses an equivalent transformation of the rewriting to its CQ expansion, which (informally speaking) replaces references to views in the rewriting by their definitions [17, 23].
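The expansion step can be illustrated with a short sketch. The function name, the dictionary of view definitions, and the fresh-variable naming scheme are hypothetical choices made for this example. The sketch assumes that each view head lists distinct variables and that generated names such as `C_0` do not clash with variables already used in the rewriting.

```python
from itertools import count


def expand(rewriting_body, views):
    """Replace each view subgoal in a rewriting by the view's definition.

    rewriting_body: list of (predicate, variables) subgoals.
    views: dict mapping a view name to (head_vars, body).
    Head variables of the view are replaced by the arguments of the view
    subgoal; each nonhead variable gets a fresh name per view occurrence.
    """
    fresh = count()
    expanded = []
    for pred, args in rewriting_body:
        if pred not in views:
            # Subgoals over stored relations are kept unchanged.
            expanded.append((pred, tuple(args)))
            continue
        view_head, view_body = views[pred]
        subst = dict(zip(view_head, args))
        suffix = next(fresh)
        for vpred, vargs in view_body:
            renamed = tuple(subst.setdefault(v, f"{v}_{suffix}") for v in vargs)
            expanded.append((vpred, renamed))
    return expanded
```

For instance, with the view V(A, B) :- E(A, C), E(C, B) and the rewriting R(X, Z) :- V(X, Y), V(Y, Z), the expansion has the four subgoals E(X, C_0), E(C_0, Y), E(Y, C_1), E(C_1, Z). The expansion, rather than the rewriting itself, is what gets compared with the original query over the stored relations.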
…(Full text truncated)…