A Unified Algorithm for Accelerating Edit-Distance Computation via Text-Compression

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

We present a unified framework for accelerating edit-distance computation between two compressible strings using straight-line programs. For two strings of total length $N$ having straight-line program representations of total size $n$, we provide an algorithm running in $O(n^{1.4}N^{1.2})$ time for computing the edit-distance of these two strings under any rational scoring function, and an $O(n^{1.34}N^{1.34})$ time algorithm for arbitrary scoring functions. This improves on a recent algorithm of Tiskin that runs in $O(nN^{1.5})$ time, and works only for rational scoring functions. Also, in the last part of the paper, we show how the classical Four-Russians technique can be incorporated into our SLP edit-distance scheme, giving us a simple $\Omega(\lg N)$ speed-up in the case of arbitrary scoring functions, for any pair of strings.


💡 Research Summary

The paper introduces a unified algorithmic framework for accelerating edit‑distance computation between two strings that are given in compressed form as straight‑line programs (SLPs). An SLP is a context‑free grammar that generates a single string; each non‑terminal corresponds to a substring, and the total size of the grammar, denoted by n, can be dramatically smaller than the actual length N of the strings. The authors exploit this hierarchical structure to replace the classic O(N²) dynamic‑programming (DP) table with a set of “compressed interval matrices” that are computed recursively along the SLP derivation tree.
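To make the SLP notion concrete, here is a minimal Python sketch of a toy grammar. The rule set `RULES` and the helper names `length`, `char_at`, and `expand` are hypothetical and chosen only for illustration; they are not taken from the paper. The sketch shows that substring lengths and individual characters can be read off the grammar without materializing the full string.

```python
from functools import lru_cache

# Hypothetical toy SLP: each non-terminal derives either a single terminal
# character or the concatenation of two previously defined non-terminals.
RULES = {
    "A": "a",
    "B": "b",
    "X1": ("A", "B"),    # derives "ab"
    "X2": ("X1", "X1"),  # derives "abab"
    "X3": ("X2", "X2"),  # derives "abababab"
    "X4": ("X3", "X1"),  # derives "ababababab"
}


@lru_cache(maxsize=None)
def length(symbol: str) -> int:
    """Length of the string derived from `symbol`, computed without expanding it."""
    rule = RULES[symbol]
    if isinstance(rule, str):
        return 1
    left, right = rule
    return length(left) + length(right)


def char_at(symbol: str, i: int) -> str:
    """Character at 0-based position i, found by walking down the derivation tree."""
    rule = RULES[symbol]
    while not isinstance(rule, str):
        left, right = rule
        if i < length(left):
            symbol = left
        else:
            i -= length(left)
            symbol = right
        rule = RULES[symbol]
    return rule


def expand(symbol: str) -> str:
    """Full decompression; takes time proportional to the derived length."""
    rule = RULES[symbol]
    if isinstance(rule, str):
        return rule
    left, right = rule
    return expand(left) + expand(right)


n = len(RULES)      # grammar size: number of rules
N = length("X4")    # length of the derived string
```

In this toy grammar the size is n = 6 while the derived string has length N = 10. Repeated doubling rules such as `X2` and `X3` are what allow N to grow much faster than n on highly repetitive inputs.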

For rational scoring functions (where insertion, deletion, and substitution costs are rational numbers), the algorithm computes each matrix product using fast matrix multiplication techniques, achieving a total running time of $O(n^{1.4}N^{1.2})$. For arbitrary scoring functions, where fast multiplication cannot be directly applied, the authors devise a balanced approach that yields $O(n^{1.34}N^{1.34})$ time. Both bounds improve upon the previous best result by Tiskin ($O(nN^{1.5})$), which handled only rational scores and could not reduce the exponent on N below 1.5.
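For reference, the quadratic baseline that these bounds are measured against can be sketched as follows. This is the textbook dynamic program on uncompressed strings, not the paper's SLP-based algorithm. The function name `edit_distance` and the cost helpers `unit_sub` and `rational_sub` are assumptions made for illustration. Passing `Fraction` costs gives an instance of a rational scoring function; passing arbitrary floats would correspond to the arbitrary-scoring case.

```python
from fractions import Fraction
from typing import Callable, Union

Cost = Union[int, float, Fraction]


def edit_distance(a: str, b: str,
                  sub: Callable[[str, str], Cost],
                  indel: Cost) -> Cost:
    """Textbook O(len(a) * len(b)) DP that keeps only two rows of the table."""
    # Row 0: turning the empty prefix of `a` into b[:j] takes j insertions.
    prev = [j * indel for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        # Column 0: turning a[:i] into the empty string takes i deletions.
        curr = [i * indel]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + indel,            # delete ca
                curr[j - 1] + indel,        # insert cb
                prev[j - 1] + sub(ca, cb),  # match or substitute
            ))
        prev = curr
    return prev[len(b)]


def unit_sub(x: str, y: str) -> int:
    """Unit-cost (Levenshtein) substitution: 0 on a match, 1 otherwise."""
    return 0 if x == y else 1


def rational_sub(x: str, y: str) -> Fraction:
    """Hypothetical rational scoring: a mismatch costs 3/2."""
    return Fraction(0) if x == y else Fraction(3, 2)


levenshtein = edit_distance("kitten", "sitting", unit_sub, 1)
weighted = edit_distance("kitten", "sitting", rational_sub, Fraction(1))
```

With unit costs, `levenshtein` is the familiar distance of 3 between "kitten" and "sitting". The scoring function enters only through `sub` and `indel`, which is why the same baseline covers both the rational and the arbitrary case; the distinction matters only for the accelerated algorithms.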

A key technical contribution is the definition of the “compressed interval matrix” M_X for each non‑terminal X. M_X

