Optimal Succinctness for Range Minimum Queries

Reading time: 5 minutes

📝 Abstract

For a static array A of n ordered objects, a range minimum query asks for the position of the minimum between two specified array indices. We show how to preprocess A into a scheme of size 2n + o(n) bits that answers range minimum queries on A in constant time. This space is asymptotically optimal in the important setting where access to A is not permitted after the preprocessing step. Our scheme can be computed in linear time, using only n + o(n) additional bits at construction time. An interesting by-product is that we also improve on LCA-computation in BPS- or DFUDS-encoded trees.

📄 Content

For an array A[1, n] of n natural numbers or other objects from a totally ordered universe, a range minimum query rmq_A(i, j) for i ≤ j returns the position of a minimum element in the sub-array A[i, j]; i.e., rmq_A(i, j) = argmin_{i ≤ k ≤ j} A[k]. This fundamental algorithmic problem has numerous applications, e.g., in text indexing [1,15,36], text compression [7], document retrieval [31,37,42], flowgraphs [19], range queries [40], and position-restricted pattern matching [8], to mention just a few.
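The definition can be made concrete with a direct linear scan. The following is only a reference sketch of the query semantics, not a preprocessing scheme; the function name `rmq_naive` and the leftmost tie-breaking rule are assumptions made for illustration (the definition only asks for the position of a minimum).

```python
def rmq_naive(A, i, j):
    """Return the position of a minimum in A[i..j] (1-based, inclusive).

    Assumption: ties are broken toward the leftmost position.
    """
    if not (1 <= i <= j <= len(A)):
        raise ValueError("need 1 <= i <= j <= n")
    best = i
    for k in range(i + 1, j + 1):
        # A is a 0-based Python list, so position k lives at A[k - 1].
        if A[k - 1] < A[best - 1]:
            best = k
    return best


A = [3, 1, 4, 1, 5, 9, 2, 6]
print(rmq_naive(A, 3, 7))  # 4: the sub-array is 4, 1, 5, 9, 2
```

Each query costs time linear in j - i here, which is what a preprocessing scheme is meant to avoid.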

In all of these applications, the array A in which the range minimum queries (RMQs) are performed is static and known in advance, which is also the scenario considered in this article. In this case it makes sense to preprocess A into a (preprocessing-) scheme such that future RMQs can be answered quickly. We can hence formulate the following problem.

Given: a static array A[1, n] of n totally ordered objects.
Compute: an (ideally small) data structure, called scheme, that answers RMQs on A in constant time.
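To illustrate what such a scheme can look like, here is a minimal sketch of the well-known sparse-table technique. It is not the scheme of this article: it uses O(n log n) words rather than 2n + o(n) bits, and its query routine reads the array A. The class name `SparseTableRMQ` and the leftmost tie-breaking are assumptions chosen for the example.

```python
class SparseTableRMQ:
    """Sparse table: O(n log n) words of space, constant-time queries."""

    def __init__(self, A):
        self.A = A  # the query consults A, so the array must be kept
        n = len(A)
        # M[k][i] = 0-based position of a minimum in A[i .. i + 2^k - 1]
        self.M = [list(range(n))]
        k = 1
        while (1 << k) <= n:
            prev, half, row = self.M[k - 1], 1 << (k - 1), []
            for i in range(n - (1 << k) + 1):
                l, r = prev[i], prev[i + half]
                row.append(l if A[l] <= A[r] else r)
            self.M.append(row)
            k += 1

    def query(self, i, j):
        """Position (1-based) of a minimum in A[i..j], inclusive."""
        if not (1 <= i <= j <= len(self.A)):
            raise ValueError("need 1 <= i <= j <= n")
        lo, hi = i - 1, j - 1
        k = (hi - lo + 1).bit_length() - 1  # largest k with 2^k <= length
        # Two overlapping blocks of length 2^k cover A[lo..hi].
        l = self.M[k][lo]
        r = self.M[k][hi - (1 << k) + 1]
        return (l if self.A[l] <= self.A[r] else r) + 1
```

For example, `SparseTableRMQ([3, 1, 4, 1, 5, 9, 2, 6]).query(3, 7)` returns 4. The final comparison `self.A[l] <= self.A[r]` is the step that requires access to the input array at query time.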

The historically first such scheme due to Gabow et al. [16] is based on the following idea: because an RMQ-instance can be transformed into an instance of lowest common ancestors (LCAs) in the Cartesian Tree [43], one can use any linear-time preprocessing scheme for O(1)-LCAs [3,5,23,41] in order to answer RMQs in constant time.
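The reduction can be sketched as follows, under simplifying assumptions: the Cartesian Tree is built in linear time with the usual stack-based method, but the LCA is found by a naive walk over parent pointers instead of one of the constant-time LCA schemes cited above. All function names are hypothetical.

```python
def cartesian_tree(A):
    """Return (root, parent) of the min-rooted Cartesian Tree of A.

    Nodes are 0-based array positions; parent[root] == -1.
    Among equal elements the leftmost becomes the ancestor.
    """
    parent = [-1] * len(A)
    stack = []  # rightmost path of the tree built so far
    for i in range(len(A)):
        last = -1
        while stack and A[stack[-1]] > A[i]:
            last = stack.pop()
        if last != -1:
            parent[last] = i       # popped subtree becomes left child of i
        if stack:
            parent[i] = stack[-1]  # i becomes right child of the stack top
        stack.append(i)
    return stack[0], parent


def lca_naive(parent, u, v):
    """Lowest common ancestor by walking up (not constant time)."""
    ancestors = set()
    while u != -1:
        ancestors.add(u)
        u = parent[u]
    while v not in ancestors:
        v = parent[v]
    return v


def rmq_via_lca(parent, i, j):
    """rmq_A(i, j), 1-based, as the LCA of nodes i and j."""
    return lca_naive(parent, i - 1, j - 1) + 1
```

With `root, parent = cartesian_tree([3, 1, 4, 1, 5, 9, 2, 6])`, the call `rmq_via_lca(parent, 3, 7)` returns 4. Note that this sketch stores the tree with explicit pointers indexed by array position, which is exactly the O(n log n)-bit representation.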

The problem with this transformation [16], both in theory and in practice, is the following dilemma: storing the Cartesian Tree explicitly (i.e., with labels and pointers) needs O(n log n) bits of space, while storing it succinctly in 2n + o(n) bits [4,30] does not allow mapping the array indices to the corresponding nodes (see Sect. 1.1 for more details on why this is difficult).

A succinct data structure uses space that is close to the information-theoretic lower bound, in the sense that objects from a universe of cardinality L are stored in (1 + o(1)) log L bits. Research on succinct data structures is very active, and we just mention some examples from the realm of trees [4,9,18,26,30,39], dictionaries [33,34], and strings [10,11,21,22,35,38], being well aware of the fact that this list is far from complete. This article presents the first succinct data structure for O(1)-RMQs in the standard word-RAM model of computation (which is also the model used in all LCA- and RMQ-schemes cited in this article).
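As a rough numerical illustration of the log L bound, assume (as is standard, though not spelled out in this excerpt) that two arrays answer all RMQs identically exactly when their Cartesian Trees have the same shape. Then L is the number of binary trees on n nodes, i.e. the n-th Catalan number, and log L is about 2n - Θ(log n). The helper names below are hypothetical.

```python
import math


def catalan(n):
    """n-th Catalan number: the number of binary trees with n nodes."""
    return math.comb(2 * n, n) // (n + 1)


def lower_bound_bits(n):
    """ceil(log2(catalan(n))): bits needed to tell all tree shapes apart."""
    return math.ceil(math.log2(catalan(n)))


for n in (10, 100, 1000):
    print(n, lower_bound_bits(n), 2 * n)
```

Under this assumption the bound approaches 2n as n grows, which is the sense in which 2n + o(n) bits is asymptotically optimal when A may not be accessed.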

Table 1. Preprocessing schemes for O(1)-RMQs, where |A| denotes the space for the (read-only) input array.

reference    final space         construction space   comments
[5,23,41]    O(n log n) + |A|    O(n log n) + |A|     originally devised for LCA, but solve RMQ via Cartesian Tree
[3]          O(n log n) + |A|    O(n log n) + |A|     significantly simpler than previous schemes
[2]          O(n log n) + |A|    O(n log n) + |A|     only solution not based on Cartesian Trees
[13]         2n …                …                    …
…            …                   …                    only for ±1rmq; A must be encoded as an n-bit-vector
[37]         4n + o(n)           O(n log n) + |A|     only non-systematic data structure so far

Before detailing our contribution, we first classify and summarize existing solutions for O(1)-RMQs.

In accordance with common nomenclature [17], preprocessing schemes for O(1)-RMQs can be classified into two different types: systematic and non-systematic. Systematic schemes must store the input array A verbatim along with the additional information for answering the queries. In such a case the query algorithm can consult A when answering the queries, and all systematic schemes indeed make heavy use of this. In contrast, non-systematic schemes must be able to obtain their final answer without consulting the array. This second type is important for at least two reasons:

  1. In some applications, e.g., in algorithms for document retrieval [31,37] or position-restricted substring matching [8], only the position of the minimum matters, but not the value of this minimum. In such cases it would be a waste of space (both in theory and in practice) to keep the input array in memory, just for obtaining the final answer to the RMQs, as in the case of systematic schemes.
  2. If the time to access the elements in A is ω(1), this slowed-down access time propagates to the time for answering RMQs if the query algorithm consults the input array. As a prominent example, in string processing RMQ is often used in conjunction with the array of longest common prefixes of lexicographically consecutive suffixes, the so-called LCP-array [27]. However, storing the LCP-array efficiently in 2n + o(n) bits [36] increases the access time to the time needed to retrieve an entry from the corresponding suffix array [27], which is Ω(log^ε n) (constant ε > 0) at the very best if the suffix array is also stored in compressed form [21,35]. Hence, with a systematic scheme the time needed for answering RMQs on LCP could never be O(1) in this case. But exactly this would be needed for constant-time navigation in RMQ-based compressed suffix trees [15] (where for different reasons the LCP-array is still needed, so this is not the same as the

This content is AI-processed based on ArXiv data.
