Variants of Constrained Longest Common Subsequence
In this work, we consider a variant of the classical Longest Common Subsequence problem called Doubly-Constrained Longest Common Subsequence (DC-LCS). Given two strings s1 and s2 over an alphabet A, a set C_s of strings, and a function Co from A to N, the DC-LCS problem asks for a longest common subsequence s of s1 and s2 such that s is a supersequence of all the strings in C_s and the number of occurrences in s of each symbol a in A is at most Co(a). The DC-LCS problem provides a clear mathematical formulation of a sequence comparison problem in Computational Biology and generalizes two other constrained variants of the LCS problem: the Constrained LCS and the Repetition-Free LCS. We present two results for the DC-LCS problem. First, we give a fixed-parameter algorithm where the parameter is the length of the solution. Second, we prove a parameterized hardness result for the Constrained LCS problem when the parameters are the number of constraint strings and the size of the alphabet A. This hardness result also implies the parameterized hardness of the DC-LCS problem (with the same parameters) and its NP-hardness when the size of the alphabet is constant.
💡 Research Summary
The paper introduces a variant of the classic Longest Common Subsequence (LCS) problem called Doubly‑Constrained LCS (DC‑LCS). Given two strings s₁ and s₂ over an alphabet A, a set Cₛ of “must‑contain” strings, and a function Co : A → ℕ that bounds the number of occurrences of each symbol, the task is to find a longest common subsequence s of s₁ and s₂ such that (i) every c ∈ Cₛ appears as a subsequence of s (so s is a supersequence of every string in the constraint set) and (ii) for each a ∈ A the number of occurrences of a in s does not exceed Co(a). This formulation generalises two previously studied constrained LCS variants: the Constrained LCS (which enforces only condition (i)) and the Repetition‑Free LCS (which enforces only condition (ii), with every symbol allowed at most once). The authors argue that DC‑LCS captures a realistic scenario in computational biology where a researcher wants to guarantee the presence of several conserved motifs while also preventing excessive repetition of particular residues.
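To make the two conditions concrete, here is a minimal Python sketch of a feasibility check for a candidate solution, together with an exponential-time brute-force search that is only meant for toy inputs. It is an illustration of the problem definition, not the paper's algorithm. The names `is_feasible`, `brute_force_dc_lcs`, `constraints` and `max_occ` are hypothetical, and treating symbols missing from `max_occ` as unbounded is an assumption made here for convenience.

```python
from collections import Counter
from itertools import combinations


def is_subsequence(small, big):
    """Return True if `small` can be obtained from `big` by deleting symbols."""
    it = iter(big)
    return all(ch in it for ch in small)


def is_feasible(s, s1, s2, constraints, max_occ):
    """Check the DC-LCS conditions for a candidate string s.

    constraints: the set C_s of must-contain strings (hypothetical name).
    max_occ: dict playing the role of Co; symbols missing from the dict
             are treated as unbounded (an assumption of this sketch).
    """
    # s must be a common subsequence of both input strings.
    if not (is_subsequence(s, s1) and is_subsequence(s, s2)):
        return False
    # Condition (i): every constraint string is a subsequence of s.
    if not all(is_subsequence(c, s) for c in constraints):
        return False
    # Condition (ii): each symbol occurs at most Co(a) times.
    counts = Counter(s)
    return all(counts[a] <= max_occ.get(a, float("inf")) for a in counts)


def brute_force_dc_lcs(s1, s2, constraints, max_occ):
    """Try every subsequence of s1, longest first. Exponential time."""
    for length in range(len(s1), -1, -1):
        for idx in combinations(range(len(s1)), length):
            cand = "".join(s1[i] for i in idx)
            if is_feasible(cand, s1, s2, constraints, max_occ):
                return cand
    return None  # no common subsequence satisfies both conditions


if __name__ == "__main__":
    limits = {"a": 1, "b": 1, "c": 1}
    print(brute_force_dc_lcs("abcab", "acbab", ["ab"], limits))
```

With every bound set to 1 and an empty constraint set, the same check describes the Repetition‑Free LCS; with no bounds at all, it describes the Constrained LCS.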
The technical contributions are twofold. First, the authors present a fixed‑parameter tractable (FPT) algorithm when the parameter is the length ℓ of the desired solution. The algorithm builds on the colour‑coding technique: each of the ℓ positions of a prospective solution is assigned a distinct colour, and a mapping from colours to alphabet symbols respects the upper‑bound function Co. A dynamic‑programming table indexed by (i, j, S), where i and j are positions in s₁ and s₂ and S is the set of colours already used, is filled in O*(2^ℓ) time (the O* notation hides factors polynomial in the input size n). The DP transition checks both subsequence compatibility with s₁ and s₂ and the Co‑constraints, and a final check verifies that all strings in Cₛ are embedded. Consequently, when the sought subsequence is short (as is often the case for conserved motifs), the algorithm runs efficiently despite the double constraints.
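The (i, j, S) table can be illustrated on a simplified special case. The sketch below is not the authors' algorithm: it assumes Co(a) = 1 for every symbol and an empty constraint set (the Repetition‑Free setting), and it applies textbook colour coding by colouring alphabet symbols at random rather than solution positions. The function names, the `trials` and `seed` parameters, and the set-of-bitmasks representation are all assumptions of this sketch.

```python
import random


def colourful_common_subsequence(s1, s2, colour, ell):
    """Decide whether s1 and s2 share a common subsequence of length ell
    whose symbols all have pairwise distinct colours.

    colour: dict mapping each symbol to a colour in range(ell).
    Runs in O(len(s1) * len(s2) * 2**ell) time.
    """
    n, m = len(s1), len(s2)
    full = (1 << ell) - 1
    # reach[i][j] holds the colour sets S (as bitmasks) realised by some
    # colourful common subsequence of s1[:i] and s2[:j].
    reach = [[{0} for _ in range(m + 1)] for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cur = reach[i - 1][j] | reach[i][j - 1]
            if s1[i - 1] == s2[j - 1]:
                bit = 1 << colour[s1[i - 1]]
                # Extend only subsequences that have not used this colour.
                cur |= {mask | bit
                        for mask in reach[i - 1][j - 1]
                        if not mask & bit}
            reach[i][j] = cur
    return full in reach[n][m]


def repetition_free_cs_exists(s1, s2, ell, trials=200, seed=0):
    """Randomised test for a repetition-free common subsequence of length ell.

    One-sided error: True is always correct; False can be wrong if every
    random colouring happened to give two solution symbols the same colour.
    """
    alphabet = sorted(set(s1) & set(s2))
    if ell == 0:
        return True
    if ell > len(alphabet):
        return False  # not enough distinct shared symbols
    rng = random.Random(seed)
    for _ in range(trials):
        colour = {a: rng.randrange(ell) for a in alphabet}
        if colourful_common_subsequence(s1, s2, colour, ell):
            return True
    return False


if __name__ == "__main__":
    print(repetition_free_cs_exists("abcab", "acbab", 3))
```

Because distinct colours force distinct symbols, any colourful subsequence is repetition‑free. A fixed solution of length ℓ becomes colourful under a uniformly random colouring with probability ℓ!/ℓ^ℓ, which is why the number of random trials needed depends only on ℓ. Handling general bounds Co and the constraint set Cₛ requires additional bookkeeping that this sketch omits.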
Second, the paper establishes a parameterised hardness result for the original Constrained LCS problem when the parameters are (i) the number k = |Cₛ| of constraint strings and (ii) the alphabet size σ = |A|. The reduction is from the Multi‑Colored Clique problem, which is W