Time-Space Trade-Offs for Longest Common Extensions
We revisit the longest common extension (LCE) problem, that is, preprocess a string $T$ into a compact data structure that supports fast LCE queries. An LCE query takes a pair $(i,j)$ of indices in $T$ and returns the length of the longest common prefix of the suffixes of $T$ starting at positions $i$ and $j$. We study the time-space trade-offs for the problem, that is, the space used for the data structure vs. the worst-case time for answering an LCE query. Let $n$ be the length of $T$. Given a parameter $\tau$, $1 \leq \tau \leq n$, we show how to achieve either $O(n/\sqrt{\tau})$ space and $O(\tau)$ query time, or $O(n/\tau)$ space and $O(\tau \log(|\mathrm{LCE}(i,j)|/\tau))$ query time, where $|\mathrm{LCE}(i,j)|$ denotes the length of the LCE returned by the query. These bounds provide the first smooth trade-offs for the LCE problem and almost match the previously known bounds at the extremes when $\tau=1$ or $\tau=n$. We apply the result to obtain improved bounds for several applications where the LCE problem is the computational bottleneck, including approximate string matching and computing palindromes. We also present an efficient technique to reduce LCE queries on two strings to one string. Finally, we give a lower bound on the time-space product for LCE data structures in the non-uniform cell probe model showing that our second trade-off is nearly optimal.
💡 Research Summary
The paper revisits the classic Longest Common Extension (LCE) problem, which asks for a data structure that, after preprocessing a string T of length n, can answer queries of the form LCE(i, j): the length of the longest common prefix of the suffixes of T starting at positions i and j. While the extremes of this problem are well‑studied (O(1) query time with O(n) space via a suffix tree with constant‑time LCA queries, or O(n) query time with O(1) space by direct character comparison), no smooth trade‑off between time and space was previously known. The authors fill this gap by introducing a tunable parameter τ (1 ≤ τ ≤ n) and presenting two complementary families of solutions.
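To make the query semantics concrete, here is the trivial O(n)-time, O(1)-space baseline that the data structures in the paper are designed to beat; the function name is illustrative, not from the paper:

```python
def lce(T, i, j):
    """Length of the longest common prefix of the suffixes T[i:] and T[j:].
    This is the O(1)-space, O(n)-time extreme of the trade-off."""
    n = len(T)
    k = 0
    while i + k < n and j + k < n and T[i + k] == T[j + k]:
        k += 1
    return k

# Example: in "banana", the suffixes "anana" (i=1) and "ana" (j=3) share "ana".
assert lce("banana", 1, 3) == 3
```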
The first family achieves space O(n/√τ) and query time O(τ). The idea is to store LCE information only for a sparse set of sampled positions; choosing the samples via a difference cover modulo τ guarantees that any pair of query positions aligns with two samples after a shift of fewer than τ characters. A query therefore compares at most O(τ) characters directly and then finishes with a single lookup on the samples. The larger τ is, the fewer samples are stored, which reduces space at the cost of a linearly larger query time; this regime is attractive when memory is scarce and queries are relatively infrequent.
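The shape of this query can be sketched in a few lines. The sketch below is illustrative, not the paper's implementation: `naive_lce` stands in for the constant-time oracle the real structure provides on sampled positions, and all function names are invented for the example:

```python
def naive_lce(T, i, j):
    """Character-by-character LCE; stands in for the O(1)-time
    oracle the real structure provides on sampled positions."""
    k = 0
    while i + k < len(T) and j + k < len(T) and T[i + k] == T[j + k]:
        k += 1
    return k

def cover_shift(D, tau, i, j):
    """Smallest shift delta < tau aligning both i and j with sampled residues."""
    for delta in range(tau):
        if (i + delta) % tau in D and (j + delta) % tau in D:
            return delta
    raise ValueError("D is not a difference cover modulo tau")

def lce_sampled(T, i, j, tau, D, sparse_lce):
    """Query sketch: verify at most tau - 1 characters directly, then
    finish with one lookup on the sampled positions."""
    n = len(T)
    delta = cover_shift(D, tau, i, j)
    for k in range(delta):
        if i + k >= n or j + k >= n or T[i + k] != T[j + k]:
            return k
    return delta + sparse_lce(i + delta, j + delta)
```

With τ = 4, the set {0, 1, 2} is a difference cover, so every query aligns with samples within at most three character comparisons before the single oracle call.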
The second family uses space O(n/τ) and query time O(τ·log(|LCE(i,j)|/τ)). Here the structure stores compact summaries (fingerprints) of substrings whose lengths double from level to level, starting at τ. A query grows the compared blocks geometrically until they first disagree, then narrows down the exact mismatch with a binary search. The logarithmic factor thus depends on the actual LCE length, so short matches are answered almost as fast as a single block comparison, while long matches incur an extra logarithmic number of comparisons. This trade‑off is nearly optimal: the authors prove a lower bound on the time‑space product of any LCE data structure in the non‑uniform cell‑probe model, showing that the second trade‑off cannot be improved by more than logarithmic factors.
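The doubling-then-binary-search query pattern can be sketched generically. In this illustrative sketch, `blocks_equal(a, b, m)` stands in for the paper's fingerprint-based block comparison and is instantiated naively here; the function names are invented for the example:

```python
def lce_doubling(T, i, j, tau, blocks_equal):
    """Sketch of the output-sensitive query: grow the compared block
    geometrically, then binary-search inside the first disagreeing block.
    `blocks_equal(a, b, m)` abstracts the structure's block comparison."""
    n = len(T)
    low, step = 0, tau
    # Phase 1: double the block size until the blocks disagree
    # (or run off the end of the string).
    while i + low + step <= n and j + low + step <= n \
            and blocks_equal(i + low, j + low, step):
        low += step
        step *= 2
    # Phase 2: the LCE extends by some r < step beyond `low`;
    # binary-search for the largest matching extension.
    lo, hi = 0, step
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if i + low + mid <= n and j + low + mid <= n \
                and blocks_equal(i + low, j + low, mid):
            lo = mid
        else:
            hi = mid - 1
    return low + lo
```

Both phases make a logarithmic number of block comparisons in |LCE(i, j)|/τ, which is where the output-sensitive factor in the query bound comes from.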
A noteworthy technical contribution is an efficient reduction from LCE queries on two separate strings to LCE queries on a single string: any cross‑string suffix comparison is simulated by a suffix comparison within one combined string, so the single‑string trade‑offs above carry over to the two‑string setting without increasing the asymptotic space.
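The textbook version of such a reduction simply concatenates the two strings with a sentinel; the paper's technique is more refined, but the naive version below conveys the idea (all names are illustrative):

```python
def make_two_string_lce(A, B):
    """Naive two-string reduction: concatenate A and B with a sentinel
    occurring in neither string, then answer cross-string LCE queries
    as single-string queries. The sentinel stops matches from running
    across the boundary between the two strings."""
    S = A + "\x00" + B        # assumes "\x00" occurs in neither A nor B
    offset = len(A) + 1       # start of B inside S

    def lce_one(i, j):
        # Stand-in for any single-string LCE structure over S.
        k = 0
        while i + k < len(S) and j + k < len(S) and S[i + k] == S[j + k]:
            k += 1
        return k

    def lce_two(i, j):
        """LCE between suffix i of A and suffix j of B."""
        return lce_one(i, offset + j)

    return lce_two
```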
The paper then demonstrates how these trade‑offs improve several classic applications in which LCE queries are the computational bottleneck. In approximate string matching with k allowed errors, the standard Landau–Vishkin approach reduces the problem to a sequence of LCE queries; plugging in the new structures lets the working space shrink to O(n/τ) at the price of a roughly τ‑factor slowdown per query. For palindrome detection, each potential center is resolved by a constant number of LCE queries, so the same structures yield sub‑linear extra space while keeping the total time near‑linear. Similar gains carry over to other string‑processing tasks dominated by LCE queries.
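The palindrome application rests on a classic reduction: concatenating T with its reverse turns "extend around a center" into a single LCE query per center. The sketch below uses a naive LCE for clarity; the point is that the paper's trade-off structures can be plugged in for `lce` unchanged (function name and the `#` separator are assumptions of the example):

```python
def longest_palindrome_via_lce(T):
    """Longest palindromic substring via one LCE query per center,
    on S = T + "#" + reverse(T). Assumes '#' does not occur in T."""
    n = len(T)
    S = T + "#" + T[::-1]     # S[n+1+p] == T[n-1-p]

    def lce(i, j):
        # Naive stand-in; any LCE structure over S works here.
        k = 0
        while i + k < len(S) and j + k < len(S) and S[i + k] == S[j + k]:
            k += 1
        return k

    best_start, best_len = 0, min(n, 1)
    for c in range(n):
        # Odd-length palindrome centered at c: radius in one query.
        r = lce(c + 1, 2 * n + 1 - c)
        if 2 * r + 1 > best_len:
            best_start, best_len = c - r, 2 * r + 1
        # Even-length palindrome between positions c-1 and c.
        r = lce(c, 2 * n + 1 - c)
        if 2 * r > best_len:
            best_start, best_len = c - r, 2 * r
    return T[best_start:best_start + best_len]
```

With n centers and O(1) queries per center, the total work is n queries against whichever time-space point of the trade-off fits the memory budget.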
In summary, the authors provide the first smooth, parameterized time–space trade‑offs for LCE, backed by a near‑tight lower bound, a practical reduction for two‑string queries, and concrete algorithmic improvements for multiple string‑processing tasks. The work bridges the gap between theoretical limits and practical needs, offering a versatile toolkit for developers facing memory‑constrained environments or workloads with highly variable LCE lengths.