Succinct Indexable Dictionaries with Applications to Encoding $k$-ary Trees, Prefix Sums and Multisets

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

We consider the indexable dictionary problem, which consists of storing a set $S \subseteq \{0,\ldots,m-1\}$ for some integer $m$, while supporting the operations $\mathrm{Rank}(x)$, which returns the number of elements in $S$ that are less than $x$ if $x \in S$, and $-1$ otherwise, and $\mathrm{Select}(i)$, which returns the $i$-th smallest element in $S$. We give a data structure that supports both operations in $O(1)$ time on the RAM model and requires $\mathcal{B}(n,m) + o(n) + O(\lg \lg m)$ bits to store a set of size $n$, where $\mathcal{B}(n,m) = \lceil \lg \binom{m}{n} \rceil$ is the minimum number of bits required to store any $n$-element subset from a universe of size $m$. Previous dictionaries taking this space only supported (yes/no) membership queries in $O(1)$ time. In the cell probe model we can remove the $O(\lg \lg m)$ additive term in the space bound, answering a question raised by Fich and Miltersen, and Pagh. We present extensions and applications of our indexable dictionary data structure, including: an information-theoretically optimal representation of a $k$-ary cardinal tree that supports standard operations in constant time; a representation of a multiset of size $n$ from $\{0,\ldots,m-1\}$ in $\mathcal{B}(n,m+n) + o(n)$ bits that supports (appropriate generalizations of) $\mathrm{Rank}$ and $\mathrm{Select}$ operations in constant time; and a representation of a sequence of $n$ non-negative integers summing up to $m$ in $\mathcal{B}(n,m+n) + o(n)$ bits that supports prefix sum queries in constant time.


💡 Research Summary

The paper studies the “indexable dictionary” problem: given a universe {0,…,m-1} and a set S of size n, store S in as few bits as possible while supporting two order-based queries in constant time. Rank(x) returns the number of elements of S smaller than x, or -1 if x∉S; Select(i) returns the i-th smallest element of S. The information-theoretic lower bound for representing an n-element subset of an m-element universe is B(n,m)=⌈log₂ (m choose n)⌉ bits. Earlier dictionaries within the same space bound answered only membership (yes/no) queries in constant time.
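
The semantics of the two queries can be pinned down with a deliberately naive reference sketch. It stores S as a sorted Python list, so it uses far more than B(n,m) bits and answers Rank in O(log n) time rather than O(1); it is not the paper's construction. The class name `NaiveIndexableDictionary` and the 1-based indexing of `select` are assumptions made for illustration.

```python
from bisect import bisect_left


class NaiveIndexableDictionary:
    """Reference semantics only: a sorted list, not a succinct structure."""

    def __init__(self, elements, m):
        self.m = m
        self.items = sorted(set(elements))
        if self.items and not (0 <= self.items[0] and self.items[-1] < m):
            raise ValueError("elements must lie in {0, ..., m-1}")

    def rank(self, x):
        """Number of elements smaller than x if x is in S, else -1."""
        pos = bisect_left(self.items, x)
        if pos < len(self.items) and self.items[pos] == x:
            return pos
        return -1

    def select(self, i):
        """The i-th smallest element, counting from 1 (assumed convention)."""
        if not 1 <= i <= len(self.items):
            raise IndexError("i must be in 1..n")
        return self.items[i - 1]
```

Under the 1-based convention assumed here, `select(rank(x) + 1)` returns `x` for every x in S.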

The authors present a RAM-model data structure that uses B(n,m)+o(n)+O(log log m) bits and answers both Rank and Select in O(1) time. In the cell-probe model the O(log log m) additive term can be eliminated, leaving B(n,m)+o(n) bits, which answers a question raised by Fich and Miltersen and by Pagh.
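
To get a feel for the B(n,m) term in the space bound, the following sketch computes it exactly and compares it with two plain encodings: a length-m bit vector and a sorted array of n fixed-width integers. The function names are hypothetical, and the o(n) and O(log log m) terms are asymptotic and not modeled.

```python
from math import comb


def information_theoretic_bits(n, m):
    """B(n, m) = ceil(log2(C(m, n))), computed with exact integers."""
    count = comb(m, n)
    # For count >= 1, (count - 1).bit_length() equals ceil(log2(count)).
    return (count - 1).bit_length()


def bit_vector_bits(m):
    """One bit per universe element."""
    return m


def sorted_array_bits(n, m):
    """n entries of ceil(log2(m)) bits each."""
    return n * (m - 1).bit_length()


if __name__ == "__main__":
    n, m = 1000, 1_000_000
    print("B(n, m)      :", information_theoretic_bits(n, m))
    print("bit vector   :", bit_vector_bits(m))
    print("sorted array :", sorted_array_bits(n, m))
```

For sparse sets (n much smaller than m), B(n,m) is well below both plain encodings, which is why matching it up to lower-order terms while keeping constant-time queries is the interesting regime.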

Core construction

  1. Bit‑vector representation – The set S is encoded as a length‑m bit vector B where B
