Dynamic Indexability: The Query-Update Tradeoff for One-Dimensional Range Queries

📝 Original Info

  • Title: Dynamic Indexability: The Query-Update Tradeoff for One-Dimensional Range Queries
  • ArXiv ID: 0811.4346
  • Date: 2008-11-27
  • Authors: Ke Yi (Department of Computer Science and Engineering, Hong Kong University of Science and Technology)

📝 Abstract

The B-tree is a fundamental secondary index structure that is widely used for answering one-dimensional range reporting queries. Given a set of $N$ keys, a range query can be answered in $O(\log_B \frac{N}{M} + \frac{K}{B})$ I/Os, where $B$ is the disk block size, $K$ the output size, and $M$ the size of the main memory buffer. When keys are inserted or deleted, the B-tree is updated in $O(\log_B N)$ I/Os, if we require the resulting changes to be committed to disk right away. Otherwise, the memory buffer can be used to buffer the recent updates, and changes can be written to disk in batches, which significantly lowers the amortized update cost. A systematic way of batching up updates is to use the logarithmic method, combined with fractional cascading, resulting in a dynamic B-tree that supports insertions in $O(\frac{1}{B}\log\frac{N}{M})$ I/Os and queries in $O(\log\frac{N}{M} + \frac{K}{B})$ I/Os. Such bounds have also been matched by several known dynamic B-tree variants in the database literature. In this paper, we prove that for any dynamic one-dimensional range query index structure with query cost $O(q+\frac{K}{B})$ and amortized insertion cost $O(u/B)$, the tradeoff $q\cdot \log(u/q) = \Omega(\log B)$ must hold if $q=O(\log B)$. For most reasonable values of the parameters, we have $\frac{N}{M} = B^{O(1)}$, in which case our query-insertion tradeoff implies that the bounds mentioned above are already optimal. Our lower bounds hold in a dynamic version of the indexability model, which is of independent interest.
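
As a quick consistency check of the optimality claim, the following worked calculation is a sketch under two assumptions: that $\frac{N}{M} = B^{c}$ for some constant $c > 0$ (one illustrative instance of $\frac{N}{M} = B^{O(1)}$), and that the upper bounds are read with the parameter $\ell$ ($2 \le \ell \le B$) used in the paper's introduction, i.e. $q = \log_\ell \frac{N}{M}$ and $u = \ell \log_\ell \frac{N}{M}$. Constant factors are ignored throughout.

$$
\frac{u}{q} = \ell,
\qquad
q \cdot \log\frac{u}{q} \;=\; \log_\ell\frac{N}{M} \cdot \log \ell \;=\; \log\frac{N}{M} \;=\; c \log B \;=\; \Theta(\log B).
$$

Since $q = \frac{c \log B}{\log \ell} = O(\log B)$, the precondition of the tradeoff is satisfied, and the product $q \cdot \log(u/q)$ sits at the $\Omega(\log B)$ lower bound up to a constant factor. Setting $\ell = 2$ recovers the $O(\frac{1}{B}\log\frac{N}{M})$ insertion and $O(\log\frac{N}{M} + \frac{K}{B})$ query bounds quoted in the abstract.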

📄 Full Content

arXiv:0811.4346v1 [cs.DS] 26 Nov 2008

Dynamic Indexability: The Query-Update Tradeoff for One-Dimensional Range Queries

Ke Yi
Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Hong Kong, China

Abstract

The B-tree is a fundamental secondary index structure that is widely used for answering one-dimensional range reporting queries. Given a set of $N$ keys, a range query can be answered in $O(\log_B \frac{N}{M} + \frac{K}{B})$ I/Os, where $B$ is the disk block size, $K$ the output size, and $M$ the size of the main memory buffer. When keys are inserted or deleted, the B-tree is updated in $O(\log_B N)$ I/Os, if we require the resulting changes to be committed to disk right away. Otherwise, the memory buffer can be used to buffer the recent updates, and changes can be written to disk in batches, which significantly lowers the amortized update cost. A systematic way of batching up updates is to use the logarithmic method, combined with fractional cascading, resulting in a dynamic B-tree that supports insertions in $O(\frac{1}{B} \log \frac{N}{M})$ I/Os and queries in $O(\log \frac{N}{M} + \frac{K}{B})$ I/Os. Such bounds have also been matched by several known dynamic B-tree variants in the database literature. Note, however, that the query cost of these dynamic B-trees is substantially worse than the $O(\log_B \frac{N}{M} + \frac{K}{B})$ bound of the static B-tree, by a factor of $\Theta(\log B)$.

In this paper, we prove that for any dynamic one-dimensional range query index structure with query cost $O(q + \frac{K}{B})$ and amortized insertion cost $O(u/B)$, the tradeoff $q \cdot \log(u/q) = \Omega(\log B)$ must hold if $q = O(\log B)$. For most reasonable values of the parameters, we have $\frac{N}{M} = B^{O(1)}$, in which case our query-insertion tradeoff implies that the bounds mentioned above are already optimal. We also prove a lower bound of $u \cdot \log q = \Omega(\log B)$, which is relevant for larger values of $q$. Our lower bounds hold in a dynamic version of the indexability model, which is of independent interest. Dynamic indexability is a clean yet powerful model for studying dynamic indexing problems, and can potentially lead to more interesting complexity results.

1 Introduction

The B-tree [5] is a fundamental secondary index structure used in nearly all database systems. It has both very good space utilization and query performance: assuming each disk block can store $B$ data records, the B-tree occupies $O(\frac{N}{B})$ disk blocks for $N$ data records, and supports one-dimensional range reporting queries in $O(\log_B N + \frac{K}{B})$ I/Os (or page accesses), where $K$ is the output size. Due to the large fanout of the B-tree, for most practical values of $N$ and $B$, the B-tree is very shallow and $\log_B N$ is essentially a constant. Very often we also have a memory buffer of size $M$, which can be used to store the top $\Theta(\log_B M)$ levels of the B-tree, further lowering the effective height of the B-tree to $O(\log_B \frac{N}{M})$, meaning that we can usually get to the desired leaf with merely one or two I/Os, and then start pulling out results.

If one wants to update the B-tree directly on disk, it is also well known that it takes $O(\log_B N)$ I/Os. Things become much more interesting if we make use of the main memory buffer to collect a number of updates and then perform the updates in batches, lowering the amortized update cost significantly. For now let us focus on insertions only; deletions are in general much less frequent than insertions, and there are some generic methods for dealing with deletions by converting them into insertions of "delete signals" [2, 17].

The idea of using a buffer space to batch up insertions has been well exploited in the literature, especially for the purpose of managing historical data, where there are many more insertions than queries. The LSM-tree [17] was the first along this line of research, by applying the logarithmic method [7] to the B-tree. Fix a parameter $2 \le \ell \le B$. It builds a collection of B-trees of sizes up to $M, \ell M, \ell^2 M, \ldots$, respectively, where the first one always resides in memory. An insertion always goes to the memory-resident tree; if the first $i$ trees are full, they are merged together with the $(i+1)$-th tree by rebuilding. Standard analysis shows that the amortized insertion cost is $O(\frac{\ell}{B} \log_\ell \frac{N}{M})$. A query takes $O(\log_B N \log_\ell \frac{N}{M} + \frac{K}{B})$ I/Os since $O(\log_\ell \frac{N}{M})$ trees need to be queried. Using fractional cascading [10], the query cost can be improved to $O(\log_\ell \frac{N}{M} + \frac{K}{B})$ without affecting the (asymptotic) size of the index and the update cost, but this result appears to be folklore.

Later Jermaine et al. [14] proposed the Y-tree as "yet" another B-tree structure for the purpose of lowering the insertion cost. The Y-tree is an $\ell$-ary tree, where each internal node is associated with a bucket storing all the elements to be pushed down to its subtree. The bucket is emptied only when it has accumulated $\Omega(B)$ elements. Although [14] did not give a rigorous analysis, it is not difficult to derive that its insertion cost is $O(\frac{\ell}{B} \log_\ell \frac{N}{M})$ and query cost $O(\log_\ell \frac{N}{M} + \frac{K}{B})$, namely, the same

…(Full text truncated)…
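
The logarithmic method described in the introduction can be sketched in a few lines of Python. This is a toy, in-memory illustration under stated assumptions, not the LSM-tree's actual implementation: sorted lists stand in for B-trees, no I/Os are counted, fractional cascading is omitted, and the merge rule is a simplified cascade (an overflowing level is merged wholesale into the next one) rather than the exact "first $i$ trees are full" rule. The names `LogMethodIndex`, `memory_capacity`, and `growth` are hypothetical; they play the roles of $M$ and $\ell$.

```python
import bisect
from heapq import merge


class LogMethodIndex:
    """Toy model of the logarithmic method; sorted lists stand in for B-trees."""

    def __init__(self, memory_capacity: int, growth: int):
        if memory_capacity < 1 or growth < 2:
            raise ValueError("need memory_capacity >= 1 and growth >= 2")
        self.m = memory_capacity   # M: capacity of the memory-resident level
        self.ell = growth          # ell: size ratio between consecutive levels
        self.levels = [[]]         # levels[i] is sorted, at most ell**i * M keys

    def capacity(self, i: int) -> int:
        """Maximum number of keys level i may hold."""
        return self.m * self.ell ** i

    def insert(self, key) -> None:
        """Insert into level 0, then cascade any overflow downwards."""
        bisect.insort(self.levels[0], key)
        i = 0
        # Assumed simplification: an overflowing level is emptied into the
        # next level by a full merge (a "rebuild" of that level).
        while len(self.levels[i]) > self.capacity(i):
            if i + 1 == len(self.levels):
                self.levels.append([])
            self.levels[i + 1] = list(merge(self.levels[i + 1], self.levels[i]))
            self.levels[i] = []
            i += 1

    def range_query(self, lo, hi) -> list:
        """Report all keys k with lo <= k <= hi by probing every level."""
        out = []
        for level in self.levels:
            left = bisect.bisect_left(level, lo)
            right = bisect.bisect_right(level, hi)
            out.extend(level[left:right])
        return sorted(out)
```

A query has to probe every level, which is where the $O(\log_\ell \frac{N}{M})$ factor in the query cost comes from; a larger `growth` means fewer levels to probe but more rebuilding work per inserted key, mirroring the query-insertion tradeoff studied in the paper.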

Reference

This content is AI-processed based on ArXiv data.
