Function-Correcting Codes for Insertion-Deletion Channel
In coding theory, handling errors that occur when symbols are inserted or deleted from a transmitted message is a long-standing challenge. Optimising redundancy for insertion and deletion channels remains a key open problem with significant importance for applications in DNA data storage and document exchange. Recently, a coding framework known as function-correcting codes has been proposed to address the challenge of minimising redundancy while preserving specific functions of the message. This framework has gained attention due to its potential applications in machine learning systems and long-term archival data storage. Motivated by the problem of redundancy optimisation for insertion and deletion channels, we propose a new framework called function-correcting codes for insdel channels. In this paper, we introduce the notions of function-correcting insertion codes, function-correcting deletion codes, and function-correcting insdel codes, and we show that these three formulations are equivalent. We then define insdel distance matrices and irregular insdel-distance codes, and derive lower and upper bounds on the optimal redundancy achievable by function-correcting codes for insdel channels. In addition, we establish Gilbert-Varshamov and Plotkin-like bounds on the length of irregular insdel-distance codes. Using the relation between optimal redundancy and the length of such codes, we obtain a simplified lower bound on optimal redundancy. Finally, we derive bounds on the optimal redundancy of function-correcting insdel codes for several classes of functions, including locally bounded functions, VT syndrome functions, the number-of-runs function, and the maximum-run-length function.
💡 Research Summary
The paper tackles the problem of designing codes that enable the reliable computation of a target function from a transmitted message that passes through an insertion‑deletion (insdel) channel, rather than requiring full message recovery. This “function‑correcting” paradigm, originally introduced for Hamming‑type channels, promises substantially lower redundancy when only a function of the data matters. The authors extend this paradigm to the insdel setting, which is fundamentally more challenging because insertions and deletions destroy symbol alignment and are measured by edit distance (or equivalently by the length of the longest common subsequence).
First, three notions are defined: function‑correcting insertion codes (FCI), function‑correcting deletion codes (FCD), and function‑correcting insdel codes (FCIDC). The paper proves that these formulations are equivalent: a code that corrects t insertions can be transformed into a code that corrects t deletions or a combination of t insertions and deletions with the same redundancy. This equivalence allows the authors to focus on a single unified model.
To analyse redundancy, the authors introduce irregular insdel‑distance codes. Unlike classical minimum‑distance codes, an irregular code permits different distance requirements for different pairs of codewords. These requirements are captured in an “insdel distance matrix” I∈ℕ^{M×M}, where each entry I_{ij} specifies the minimum insdel distance that the pair (c_i,c_j) must satisfy. By linking the structure of the target function f (e.g., its sensitivity, locality, or run‑based properties) to constraints on I, they establish a precise relationship: the optimal redundancy r*(f) equals the smallest possible code length n_min minus the message length k, i.e., r*(f)=n_min−k.
Using this bridge, the paper derives both lower and upper bounds on r*(f).
Lower bound: A Gilbert‑Varshamov‑type existential argument shows that for any function f,
r*(f) ≥ log₂(2^k / A_{ID}(t)),
where A_{ID}(t) is the maximum size of a code with insdel distance at least 2t+1. This bound mirrors the classic GV bound but is adapted to the edit‑distance metric, yielding a factor of roughly 2t compared with Hamming‑based results.
Upper bound: A Plotkin‑like inequality is proved for irregular insdel‑distance codes. If the average row sum of I exceeds a certain threshold, the code length cannot exceed a linear function of the redundancy. This leads to concrete upper bounds for function classes whose “local boundedness” limits how much the function value can change when a few symbols are altered. For such functions the redundancy scales only as Θ(log n).
The authors then apply the general theory to several concrete function families:
-
VT syndrome function – The classic Varshamov‑Tenengolts (VT) syndrome is itself a function that can be recovered with a single‑deletion correcting code. By reusing VT codes, the paper shows that the optimal redundancy for this function is exactly ⌈log₂(n+1)⌉ bits, matching the known VT redundancy.
-
Number‑of‑runs function r(x) – This function counts the number of runs (maximal constant blocks) in a binary string. By constructing an insdel‑distance matrix that respects run structure, the authors achieve redundancy O(log n) and prove a matching lower bound.
-
Maximum‑run‑length function – Similar techniques yield O(log n) redundancy, with the bound reflecting the fact that the longest run can only change by a limited amount under a bounded number of insertions/deletions.
-
Locally bounded functions – For functions where changing any t symbols can alter the output by at most Δ_f(t), the paper derives a general lower bound r ≥ Δ_f(t)·log₂ n and provides explicit constructions that meet this bound up to constant factors.
Overall, the paper establishes a comprehensive theoretical framework for function‑correcting codes over insdel channels, introduces the novel concept of irregular insdel‑distance codes, and demonstrates that, for many practically relevant functions, the redundancy required is dramatically lower than that of full‑message error‑correcting codes. The work bridges a gap between synchronization‑error coding and function‑level protection, opening avenues for DNA storage, asynchronous communication, and edge‑computing scenarios where only specific computations need to survive channel noise.
Future directions suggested include extending the framework to non‑binary alphabets, handling multiple functions simultaneously, and developing efficient decoding algorithms that operate close to the theoretical redundancy limits.
Comments & Academic Discussion
Loading comments...
Leave a Comment