Dynamic Indexability: The Query-Update Tradeoff for One-Dimensional Range Queries

February 23, 2026

📝 Original Info

Title: Dynamic Indexability: The Query-Update Tradeoff for One-Dimensional Range Queries
ArXiv ID: 0811.4346
Date: 2008-11-27
Authors: Ke Yi

📝 Abstract

The B-tree is a fundamental secondary index structure that is widely used for answering one-dimensional range reporting queries. Given a set of $N$ keys, a range query can be answered in $O(\log_B \frac{N}{M} + \frac{K}{B})$ I/Os, where $B$ is the disk block size, $K$ the output size, and $M$ the size of the main memory buffer. When keys are inserted or deleted, the B-tree is updated in $O(\log_B N)$ I/Os, if we require the resulting changes to be committed to disk right away. Otherwise, the memory buffer can be used to buffer the recent updates, and changes can be written to disk in batches, which significantly lowers the amortized update cost. A systematic way of batching up updates is to use the logarithmic method, combined with fractional cascading, resulting in a dynamic B-tree that supports insertions in $O(\frac{1}{B}\log\frac{N}{M})$ I/Os and queries in $O(\log\frac{N}{M} + \frac{K}{B})$ I/Os. Such bounds have also been matched by several known dynamic B-tree variants in the database literature. In this paper, we prove that for any dynamic one-dimensional range query index structure with query cost $O(q+\frac{K}{B})$ and amortized insertion cost $O(u/B)$, the tradeoff $q\cdot \log(u/q) = \Omega(\log B)$ must hold if $q=O(\log B)$. For most reasonable values of the parameters, we have $\frac{N}{M} = B^{O(1)}$, in which case our query-insertion tradeoff implies that the bounds mentioned above are already optimal. Our lower bounds hold in a dynamic version of the indexability model, which is of independent interest.
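
As a rough check of the optimality claim in the abstract, suppose $\frac{N}{M} = B^{\Theta(1)}$ and a structure answers queries with $q = c_1 \log B$. The constants $c$ and $c_1$ are hypothetical names introduced only for this sketch, with $c$ standing for the constant hidden in the $\Omega(\log B)$ bound and logarithms taken base 2. Under these assumptions the tradeoff gives:

$$
q \cdot \log\frac{u}{q} \ge c \log B
\;\Longrightarrow\;
\log\frac{u}{q} \ge \frac{c}{c_1}
\;\Longrightarrow\;
u \ge 2^{c/c_1}\, q = \Omega(\log B).
$$

So the amortized insertion cost $O(u/B)$ cannot fall below $\Omega(\frac{1}{B}\log B)$, which agrees with the $O(\frac{1}{B}\log\frac{N}{M})$ insertion bound of the logarithmic method whenever $\log\frac{N}{M} = \Theta(\log B)$.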

💡 Deep Analysis

Deep Dive into Dynamic Indexability: The Query-Update Tradeoff for One-Dimensional Range Queries.

📄 Full Content

arXiv:0811.4346v1 [cs.DS] 26 Nov 2008

Dynamic Indexability: The Query-Update Tradeoff for One-Dimensional Range Queries

Ke Yi
Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Hong Kong, China

Abstract

The B-tree is a fundamental secondary index structure that is widely used for answering one-dimensional range reporting queries. Given a set of $N$ keys, a range query can be answered in $O(\log_B \frac{N}{M} + \frac{K}{B})$ I/Os, where $B$ is the disk block size, $K$ the output size, and $M$ the size of the main memory buffer. When keys are inserted or deleted, the B-tree is updated in $O(\log_B N)$ I/Os, if we require the resulting changes to be committed to disk right away. Otherwise, the memory buffer can be used to buffer the recent updates, and changes can be written to disk in batches, which significantly lowers the amortized update cost. A systematic way of batching up updates is to use the logarithmic method, combined with fractional cascading, resulting in a dynamic B-tree that supports insertions in $O(\frac{1}{B}\log\frac{N}{M})$ I/Os and queries in $O(\log\frac{N}{M} + \frac{K}{B})$ I/Os. Such bounds have also been matched by several known dynamic B-tree variants in the database literature. Note, however, that the query cost of these dynamic B-trees is substantially worse than the $O(\log_B \frac{N}{M} + \frac{K}{B})$ bound of the static B-tree by a factor of $\Theta(\log B)$. In this paper, we prove that for any dynamic one-dimensional range query index structure with query cost $O(q + \frac{K}{B})$ and amortized insertion cost $O(u/B)$, the tradeoff $q \cdot \log(u/q) = \Omega(\log B)$ must hold if $q = O(\log B)$. For most reasonable values of the parameters, we have $\frac{N}{M} = B^{O(1)}$, in which case our query-insertion tradeoff implies that the bounds mentioned above are already optimal. We also prove a lower bound of $u \cdot \log q = \Omega(\log B)$, which is relevant for larger values of $q$. Our lower bounds hold in a dynamic version of the indexability model, which is of independent interest. Dynamic indexability is a clean yet powerful model for studying dynamic indexing problems, and can potentially lead to more interesting complexity results.

1 Introduction

The B-tree [5] is a fundamental secondary index structure used in nearly all database systems. It has both very good space utilization and query performance: assuming each disk block can store $B$ data records, the B-tree occupies $O(\frac{N}{B})$ disk blocks for $N$ data records, and supports one-dimensional range reporting queries in $O(\log_B N + \frac{K}{B})$ I/Os (or page accesses), where $K$ is the output size. Due to the large fanout of the B-tree, for most practical values of $N$ and $B$, the B-tree is very shallow and $\log_B N$ is essentially a constant. Very often we also have a memory buffer of size $M$, which can be used to store the top $\Theta(\log_B M)$ levels of the B-tree, further lowering the effective height of the B-tree to $O(\log_B \frac{N}{M})$, meaning that we can usually get to the desired leaf with merely one or two I/Os, and then start pulling out results. If one wants to update the B-tree directly on disk, it is also well known that it takes $O(\log_B N)$ I/Os.

Things become much more interesting if we make use of the main memory buffer to collect a number of updates and then perform the updates in batches, lowering the amortized update cost significantly. For now let us focus on insertions only; deletions are in general much less frequent than insertions, and there are some generic methods for dealing with deletions by converting them into insertions of "delete signals" [2, 17].

The idea of using a buffer space to batch up insertions has been well exploited in the literature, especially for the purpose of managing historical data, where there are many more insertions than queries. The LSM-tree [17] was the first along this line of research, by applying the logarithmic method [7] to the B-tree. Fix a parameter $2 \le \ell \le B$. It builds a collection of B-trees of sizes up to $M, \ell M, \ell^2 M, \ldots$, respectively, where the first one always resides in memory. An insertion always goes to the memory-resident tree; if the first $i$ trees are full, they are merged together with the $(i+1)$-th tree by rebuilding. Standard analysis shows that the amortized insertion cost is $O(\frac{\ell}{B}\log_\ell \frac{N}{M})$. A query takes $O(\log_B N \log_\ell \frac{N}{M} + \frac{K}{B})$ I/Os since $O(\log_\ell \frac{N}{M})$ trees need to be queried. Using fractional cascading [10], the query cost can be improved to $O(\log_\ell \frac{N}{M} + \frac{K}{B})$ without affecting the (asymptotic) size of the index and the update cost, but this result appears to be folklore.

Later Jermaine et al. [14] proposed the Y-tree as "yet" another B-tree structure for the purpose of lowering the insertion cost. The Y-tree is an $\ell$-ary tree, where each internal node is associated with a bucket storing all the elements to be pushed down to its subtree. The bucket is emptied only when it has accumulated $\Omega(B)$ elements. Although [14] did not give a rigorous analysis, it is not difficult to derive that its insertion cost is $O(\frac{\ell}{B}\log_\ell \frac{N}{M})$ and query cost $O(\log_\ell \frac{N}{M} + \frac{K}{B})$, namely, the same

…(Full text truncated)…
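
The logarithmic method described in the introduction above can be sketched in a few lines. This is a toy model under stated assumptions, not the LSM-tree of [17]: sorted Python lists stand in for the B-trees, everything lives in memory, there is no fractional cascading, and `disk_writes` counts elements rewritten into levels 1 and above rather than block I/Os. The class name `LogMethodIndex` and its parameters `mem_capacity` (playing the role of $M$) and `growth` (playing the role of $\ell$) are hypothetical names chosen for illustration.

```python
import bisect
from heapq import merge


class LogMethodIndex:
    """Toy model of the logarithmic method; sorted lists stand in for B-trees."""

    def __init__(self, mem_capacity, growth):
        self.M = mem_capacity      # assumed size of the memory-resident level
        self.ell = growth          # assumed growth factor, 2 <= ell
        self.levels = [[]]         # levels[0] models the memory-resident tree
        self.disk_writes = 0       # elements rewritten into levels >= 1

    def capacity(self, i):
        # Level i may hold up to M * ell^i keys.
        return self.M * self.ell ** i

    def insert(self, key):
        # Every insertion goes to the memory-resident level first.
        bisect.insort(self.levels[0], key)
        if len(self.levels[0]) < self.capacity(0):
            return
        # Level 0 is full: merge levels downward until the merged run fits.
        carry, i = [], 0
        while True:
            if i == len(self.levels):
                self.levels.append([])
            carry = list(merge(carry, self.levels[i]))
            self.levels[i] = []
            if i > 0 and len(carry) <= self.capacity(i):
                break
            i += 1
        self.levels[i] = carry             # rebuild level i from the merged run
        self.disk_writes += len(carry)

    def range_query(self, lo, hi):
        # Every level must be searched, which is why the query cost
        # grows with the number of levels.
        out = []
        for level in self.levels:
            a = bisect.bisect_left(level, lo)
            b = bisect.bisect_right(level, hi)
            out.extend(level[a:b])
        return sorted(out)


idx = LogMethodIndex(mem_capacity=4, growth=2)
for i in range(100):
    idx.insert((37 * i) % 100)             # a fixed permutation of 0..99
print(idx.range_query(10, 20))
print(len(idx.levels), idx.disk_writes / 100)
```

The final line prints the number of levels and the average number of element rewrites per insertion, which gives a feel for the two sides of the tradeoff: a larger `growth` means fewer levels to search per query but more rewriting per inserted key.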

Reference

This content is AI-processed based on ArXiv data.
