📝 Original Info
- Title: Dynamic Indexability: The Query-Update Tradeoff for One-Dimensional Range Queries
- ArXiv ID: 0811.4346
- Date: 2008-11-27
- Authors: Ke Yi
📝 Abstract
The B-tree is a fundamental secondary index structure that is widely used for answering one-dimensional range reporting queries. Given a set of $N$ keys, a range query can be answered in $O(\log_B \frac{N}{M} + \frac{K}{B})$ I/Os, where $B$ is the disk block size, $K$ the output size, and $M$ the size of the main memory buffer. When keys are inserted or deleted, the B-tree is updated in $O(\log_B N)$ I/Os, if we require the resulting changes to be committed to disk right away. Otherwise, the memory buffer can be used to buffer the recent updates, and changes can be written to disk in batches, which significantly lowers the amortized update cost. A systematic way of batching up updates is to use the logarithmic method, combined with fractional cascading, resulting in a dynamic B-tree that supports insertions in $O(\frac{1}{B}\log\frac{N}{M})$ I/Os and queries in $O(\log\frac{N}{M} + \frac{K}{B})$ I/Os. Such bounds have also been matched by several known dynamic B-tree variants in the database literature. In this paper, we prove that for any dynamic one-dimensional range query index structure with query cost $O(q+\frac{K}{B})$ and amortized insertion cost $O(u/B)$, the tradeoff $q\cdot \log(u/q) = \Omega(\log B)$ must hold if $q=O(\log B)$. For most reasonable values of the parameters, we have $\frac{N}{M} = B^{O(1)}$, in which case our query-insertion tradeoff implies that the bounds mentioned above are already optimal. Our lower bounds hold in a dynamic version of the indexability model, which is of independent interest.
📄 Full Content
arXiv:0811.4346v1 [cs.DS] 26 Nov 2008
Dynamic Indexability: The Query-Update Tradeoff for
One-Dimensional Range Queries
Ke Yi
Department of Computer Science and Engineering
Hong Kong University of Science and Technology
Hong Kong, China
Abstract
The B-tree is a fundamental secondary index structure that is widely used for answering one-dimensional range reporting queries. Given a set of $N$ keys, a range query can be answered in $O(\log_B \frac{N}{M} + \frac{K}{B})$ I/Os, where $B$ is the disk block size, $K$ the output size, and $M$ the size of the main memory buffer. When keys are inserted or deleted, the B-tree is updated in $O(\log_B N)$ I/Os, if we require the resulting changes to be committed to disk right away. Otherwise, the memory buffer can be used to buffer the recent updates, and changes can be written to disk in batches, which significantly lowers the amortized update cost. A systematic way of batching up updates is to use the logarithmic method, combined with fractional cascading, resulting in a dynamic B-tree that supports insertions in $O(\frac{1}{B}\log\frac{N}{M})$ I/Os and queries in $O(\log\frac{N}{M} + \frac{K}{B})$ I/Os. Such bounds have also been matched by several known dynamic B-tree variants in the database literature. Note, however, that the query cost of these dynamic B-trees is substantially worse than the $O(\log_B \frac{N}{M} + \frac{K}{B})$ bound of the static B-tree by a factor of $\Theta(\log B)$.
In this paper, we prove that for any dynamic one-dimensional range query index structure with query cost $O(q + \frac{K}{B})$ and amortized insertion cost $O(u/B)$, the tradeoff $q \cdot \log(u/q) = \Omega(\log B)$ must hold if $q = O(\log B)$. For most reasonable values of the parameters, we have $\frac{N}{M} = B^{O(1)}$, in which case our query-insertion tradeoff implies that the bounds mentioned above are already optimal. We also prove a lower bound of $u \cdot \log q = \Omega(\log B)$, which is relevant for larger values of $q$. Our lower bounds hold in a dynamic version of the indexability model, which is of independent interest. Dynamic indexability is a clean yet powerful model for studying dynamic indexing problems, and can potentially lead to more interesting complexity results.
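To see why the tradeoff makes the logarithmic-method bounds optimal, here is a short worked sketch. The constants $c_1$, $c_2$ and the parameter $k$ are introduced here purely for illustration and do not appear in the abstract; logarithms are taken base 2. Assume $\frac{N}{M} = B^{O(1)}$, so that the logarithmic method gives $u = O(\log B)$ and $q = O(\log B)$. Write the lower bound as $q \log(u/q) \ge c_2 \log B$, suppose some structure achieves $u \le c_1 \log B$, and set $q = \frac{\log B}{k}$ for some $k \ge 1$:

$$
\frac{u}{q} \le c_1 k
\quad\Longrightarrow\quad
\frac{\log B}{k}\,\log(c_1 k) \;\ge\; q \log\frac{u}{q} \;\ge\; c_2 \log B
\quad\Longrightarrow\quad
\frac{\log(c_1 k)}{k} \;\ge\; c_2 .
$$

Since $\log(c_1 k)/k \to 0$ as $k$ grows, $k$ must be bounded by a constant, that is, $q = \Omega(\log B)$. Under these assumptions, no structure with insertion cost $O(\frac{1}{B}\log B)$ can answer queries asymptotically faster than the logarithmic method does in this regime.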
1 Introduction
The B-tree [5] is a fundamental secondary index structure used in nearly all database systems. It has both very good space utilization and query performance: Assuming each disk block can store $B$ data records, the B-tree occupies $O(\frac{N}{B})$ disk blocks for $N$ data records, and supports one-dimensional range reporting queries in $O(\log_B N + \frac{K}{B})$ I/Os (or page accesses) where $K$ is the output size. Due to the large fanout of the B-tree, for most practical values of $N$ and $B$, the B-tree is very shallow and $\log_B N$ is essentially a constant. Very often we also have a memory buffer of size $M$, which can be used to store the top $\Theta(\log_B M)$ levels of the B-tree, further lowering the effective height of the B-tree to $O(\log_B \frac{N}{M})$, meaning that we can usually get to the desired leaf with merely one or two I/Os, and then start pulling out results.
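The "one or two I/Os" claim can be made concrete with a small calculation. The following is a minimal sketch, not taken from the paper: the function names `levels_needed` and `btree_search_ios` are hypothetical, and the model ignores the constants hidden in the $O(\cdot)$ and $\Theta(\cdot)$ bounds by treating the height as exactly $\lceil \log_B N \rceil$ and the on-disk part as $\lceil \log_B \frac{N}{M} \rceil$ levels.

```python
def levels_needed(n: int, fanout: int) -> int:
    """Return the smallest h >= 0 with fanout**h >= n, using integers only."""
    if n < 1 or fanout < 2:
        raise ValueError("need n >= 1 and fanout >= 2")
    h, reach = 0, 1
    while reach < n:
        reach *= fanout  # one more level multiplies the reachable keys by the fanout
        h += 1
    return h


def btree_search_ios(n_keys: int, block_size: int, memory_size: int = 1) -> int:
    """Idealized number of levels left on disk: ceil(log_B(N / M)).

    memory_size = 1 models the case with no useful memory buffer,
    which reduces to ceil(log_B N).
    """
    ratio = -(-n_keys // memory_size)  # ceiling of N / M
    return levels_needed(ratio, block_size)


if __name__ == "__main__":
    # Illustrative parameters only: a billion keys, 1000 records per block.
    print(btree_search_ios(10**9, 1000))          # no buffer
    print(btree_search_ios(10**9, 1000, 10**6))   # a million records cached
```

With these illustrative parameters the idealized height drops from three levels to one once the top of the tree is cached, which is the effect the paragraph above describes.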
If one wants to update the B-tree directly on disk, it is also well known that it takes $O(\log_B N)$ I/Os. Things become much more interesting if we make use of the main memory buffer to collect a number of updates and then perform the updates in batches, lowering the amortized update cost significantly. For now let us focus on insertions only; deletions are in general much less frequent than insertions, and there are some generic methods for dealing with deletions by converting them into insertions of “delete signals” [2, 17]. The idea of using a buffer space to batch up insertions has been well exploited in the literature, especially for the purpose of managing historical data, where there are many more insertions than queries. The LSM-tree [17] was the first along this line of research, by applying the logarithmic method [7] to the B-tree. Fix a parameter $2 \le \ell \le B$. It builds a collection of B-trees of sizes up to $M, \ell M, \ell^2 M, \ldots$, respectively, where the first one always resides in memory. An insertion always goes to the memory-resident tree; if the first $i$ trees are full, they are merged together with the $(i+1)$-th tree by rebuilding. Standard analysis shows that the amortized insertion cost is $O(\frac{\ell}{B} \log_\ell \frac{N}{M})$. A query takes $O(\log_B N \log_\ell \frac{N}{M} + \frac{K}{B})$ I/Os since $O(\log_\ell \frac{N}{M})$ trees need to be queried. Using fractional cascading [10], the query cost can be improved to $O(\log_\ell \frac{N}{M} + \frac{K}{B})$ without affecting the (asymptotic) size of the index and the update cost, but this result appears to be folklore. Later Jermaine et al. [14] proposed the Y-tree as “yet” another B-tree structure for the purpose of lowering the insertion cost. The Y-tree is an $\ell$-ary tree, where each internal node is associated with a bucket storing all the elements to be pushed down to its subtree. The bucket is emptied only when it has accumulated $\Omega(B)$ elements. Although [14] did not give a rigorous analysis, it is not difficult to derive that its insertion cost is $O(\frac{\ell}{B} \log_\ell \frac{N}{M})$ and query cost $O(\log_\ell \frac{N}{M} + \frac{K}{B})$, namely, the
same
…(Full text truncated)…
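The logarithmic method described in the introduction can be illustrated with a toy in-memory model. This is a minimal sketch under several assumptions, not the LSM-tree itself: each "tree" is replaced by a sorted Python list, the class name `LogMethodIndex` and its methods are hypothetical, merging is done as a cascade (an overflowing level is merged into the next one), and fractional cascading is omitted, so a query simply searches every level.

```python
import bisect
from heapq import merge


class LogMethodIndex:
    """Toy logarithmic-method index: level i holds at most M * ell**i sorted keys."""

    def __init__(self, memory_size: int, ell: int = 2):
        if memory_size < 1 or ell < 2:
            raise ValueError("need memory_size >= 1 and ell >= 2")
        self.m = memory_size
        self.ell = ell
        self.levels = [[]]  # levels[0] stands in for the memory-resident tree

    def capacity(self, i: int) -> int:
        return self.m * self.ell ** i

    def insert(self, key) -> None:
        # Every insertion first goes to the memory-resident level.
        bisect.insort(self.levels[0], key)
        i = 0
        # Cascade: while a level overflows, rebuild the next level by merging.
        while len(self.levels[i]) > self.capacity(i):
            if i + 1 == len(self.levels):
                self.levels.append([])
            self.levels[i + 1] = list(merge(self.levels[i], self.levels[i + 1]))
            self.levels[i] = []
            i += 1

    def range_query(self, lo, hi) -> list:
        # Without fractional cascading, each level is searched independently.
        out = []
        for level in self.levels:
            start = bisect.bisect_left(level, lo)
            stop = bisect.bisect_right(level, hi)
            out.extend(level[start:stop])
        return sorted(out)


if __name__ == "__main__":
    idx = LogMethodIndex(memory_size=4, ell=2)
    for k in range(100):
        idx.insert((k * 37) % 100)  # a fixed permutation of 0..99
    print(idx.range_query(10, 20))
    print([len(level) for level in idx.levels])
```

The number of levels grows roughly like $\log_\ell \frac{N}{M}$, which is why a larger $\ell$ means fewer levels to query but more rewriting per insertion in this model.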
Reference
This content is AI-processed based on ArXiv data.