A Unified Algorithm for Accelerating Edit-Distance Computation via Text-Compression
We present a unified framework for accelerating edit-distance computation between two compressible strings using straight-line programs. For two strings of total length $N$ having straight-line program representations of total size $n$, we provide an algorithm running in $O(n^{1.4}N^{1.2})$ time for computing the edit-distance of these two strings under any rational scoring function, and an $O(n^{1.34}N^{1.34})$ time algorithm for arbitrary scoring functions. This improves on a recent algorithm of Tiskin that runs in $O(nN^{1.5})$ time and works only for rational scoring functions. In the last part of the paper, we also show how the classical Four-Russians technique can be incorporated into our SLP edit-distance scheme, giving a simple $\Omega(\lg N)$ speed-up in the case of arbitrary scoring functions, for any pair of strings.
Research Summary
The paper introduces a unified algorithmic framework for accelerating edit‑distance computation between two strings that are given in compressed form as straight‑line programs (SLPs). An SLP is a context‑free grammar that generates a single string; each non‑terminal corresponds to a substring, and the total size of the grammar, denoted by n, can be dramatically smaller than the actual length N of the strings. The authors exploit this hierarchical structure to replace the classic O(N²) dynamic‑programming (DP) table with a set of “compressed interval matrices” that are computed recursively along the SLP derivation tree.
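To make the SLP notion concrete, here is a minimal Python sketch. The encoding is an assumption made for illustration, not the paper's data structure: the rule table `rules`, the helpers `slp_lengths` and `expand`, and the nonterminal names `X1` to `X6` are all hypothetical. It shows that the expanded length of every nonterminal can be read off the grammar without decompressing, and it includes the classic quadratic DP as the uncompressed baseline that the compressed scheme aims to beat.

```python
# Hypothetical SLP encoding (an assumption for illustration): each rule maps a
# nonterminal either to a single terminal character or to a pair of
# nonterminals defined earlier in the table.
rules = {
    "X1": "b",
    "X2": "a",
    "X3": ("X2", "X1"),  # ab
    "X4": ("X3", "X2"),  # aba
    "X5": ("X4", "X3"),  # abaab
    "X6": ("X5", "X4"),  # abaababa
}


def slp_lengths(rules):
    """Expanded length of every nonterminal, computed without expanding.

    Relies on rules being listed bottom-up (dicts keep insertion order).
    """
    lengths = {}
    for name, rhs in rules.items():
        if isinstance(rhs, str):
            lengths[name] = 1
        else:
            left, right = rhs
            lengths[name] = lengths[left] + lengths[right]
    return lengths


def expand(rules, name):
    """Fully decompress a nonterminal (takes time linear in the output)."""
    rhs = rules[name]
    if isinstance(rhs, str):
        return rhs
    left, right = rhs
    return expand(rules, left) + expand(rules, right)


def edit_distance(a, b, ins=1, dele=1, sub=1):
    """Classic O(|a| * |b|) DP baseline on uncompressed strings."""
    prev = [j * ins for j in range(len(b) + 1)]
    for i, ca in enumerate(a, 1):
        cur = [i * dele]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + dele,                         # delete ca
                cur[j - 1] + ins,                       # insert cb
                prev[j - 1] + (0 if ca == cb else sub)  # match or substitute
            ))
        prev = cur
    return prev[-1]
```

In this toy grammar six rules generate a string of length 8, and each further rule of the same shape grows the length as a Fibonacci number, which is the kind of gap between n and N that the framework exploits. The baseline above ignores that structure entirely; it decompresses first and then fills the full table.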
For rational scoring functions (where insertion, deletion, and substitution costs are rational numbers), the algorithm computes each matrix product using fast matrix multiplication techniques, achieving a total running time of $O(n^{1.4}N^{1.2})$. For arbitrary scoring functions, where fast multiplication cannot be directly applied, the authors devise a balanced approach that yields $O(n^{1.34}N^{1.34})$ time. Both bounds improve upon the previous best result by Tiskin, $O(nN^{1.5})$, which handled only rational scores and could not reduce the exponent on $N$ below 1.5.
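As a rough guide to when the new bounds are actually smaller than Tiskin's, the stated exponents can be compared directly. The thresholds below are derived here from those exponents alone, ignoring constant factors; they are not quoted from the paper.

$$
n^{1.4}N^{1.2} \le nN^{1.5} \iff n^{0.4} \le N^{0.3} \iff n \le N^{3/4}
$$

$$
n^{1.34}N^{1.34} \le nN^{1.5} \iff n^{0.34} \le N^{0.16} \iff n \le N^{8/17} \approx N^{0.47}
$$

Under this reading, the gain over $O(nN^{1.5})$ appears once the strings compress to roughly $N^{3/4}$ or below in the rational case. The bound for arbitrary scoring functions has no direct counterpart in Tiskin's result, which applies only to rational scores, so the second threshold is only indicative.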
A key technical contribution is the definition of the “compressed interval matrix” M_X for each non‑terminal X. M_X