(Sets of) Complement Scattered Factors
Starting in the 1970s with the fundamental work of Imre Simon, \emph{scattered factors} (also known as subsequences or scattered subwords) have remained consistently and heavily studied objects. The majority of work on scattered factors can be split into two broad classes of problems: given a word, which information, in the form of scattered factors, is contained in it, and which is not. In this paper, we consider an intermediary problem, introducing the notion of \emph{complement scattered factors}. Given a word $w$ and a scattered factor $u$ of $w$, the set of complement scattered factors of $w$ with regard to $u$, $C(w, u)$, is the set of scattered factors of $w$ that can be formed by removing an embedding of $u$ from $w$. This is closely related to the \emph{shuffle} operation, in which two words are intertwined; we thus extend previous work on the shuffle operator using knowledge about scattered factors. Alongside introducing these sets, we provide combinatorial results on the size of the set $C(w, u)$, an algorithm to compute the set $C(w, u)$ from $w$ and $u$ in $O(\vert w \vert \cdot \vert u \vert \binom{w}{u})$ time, where $\binom{w}{u}$ denotes the number of embeddings of $u$ into $w$, an algorithm to construct $u$ from $w$ and $C(w, u)$ in $O(\vert w \vert^2 \binom{\vert w \vert}{\vert w \vert - \vert u \vert})$ time, and an algorithm to construct $w$ from $u$ and $C(w, u)$ in $O(\vert u \vert \cdot \vert w \vert^{\vert u \vert + 1})$ time.
💡 Research Summary
The paper introduces the notion of complement scattered factors, a concept that fills a gap in the study of subsequences (scattered factors) of words. Given a word w and a scattered factor u of w, the complement set C(w,u) consists of all words v such that w can be obtained by shuffling u and v; equivalently, v is the word that remains after deleting a single embedding of u from w. This definition ties directly to the classical shuffle operation, but from the opposite perspective: instead of interleaving two given words, one removes a chosen subsequence and records what is left.
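The definition above can be made concrete with a small brute-force sketch. This is not the paper's Algorithm 1; it is only an illustration that enumerates every way of deleting one embedding of u from w. The function names `complement_set` and `count_embeddings` are hypothetical, chosen for this example.

```python
from functools import lru_cache


def complement_set(w: str, u: str) -> set[str]:
    """All words left after deleting one embedding of u from w (illustrative brute force)."""

    @lru_cache(maxsize=None)
    def rec(i: int, j: int) -> frozenset:
        # Complements obtained by removing u[j:] from the suffix w[i:].
        if j == len(u):
            return frozenset({w[i:]})  # u fully matched: the rest of w is kept
        if len(w) - i < len(u) - j:
            return frozenset()  # too few letters left to embed u[j:]
        # Option 1: w[i] stays in the complement.
        out = {w[i] + v for v in rec(i + 1, j)}
        # Option 2: w[i] is consumed as the next letter of u.
        if w[i] == u[j]:
            out |= rec(i + 1, j + 1)
        return frozenset(out)

    return set(rec(0, 0))


def count_embeddings(w: str, u: str) -> int:
    """Number of embeddings of u into w (the quantity in the running-time bound)."""
    # ways[j] = number of embeddings of u[:j] into the prefix of w read so far.
    ways = [1] + [0] * len(u)
    for c in w:
        # Iterate j downwards so each letter of w is used at most once per embedding.
        for j in range(len(u), 0, -1):
            if u[j - 1] == c:
                ways[j] += ways[j - 1]
    return ways[len(u)]


if __name__ == "__main__":
    print(sorted(complement_set("abab", "ab")))  # ['ab', 'ba']
    print(count_embeddings("abab", "ab"))        # 3
```

For w = abab and u = ab there are three embeddings of u, but two of them leave the same remainder, so C(w,u) = {ab, ba}. This is why the size of C(w,u) can be strictly smaller than the number of embeddings.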
The authors first present a dynamic‑programming algorithm (Algorithm 1) that computes C(w,u) in $O(\vert w \vert \cdot \vert u \vert \binom{w}{u})$ time, where $\binom{w}{u}$ denotes the number of embeddings of u in w. The DP table P