Reconstructing Words from Right-Bounded-Block Words
Let $`\N`$ be the set of natural numbers, $`\N_0 = \N\cup\{0\}`$, and let $`\N_{\geq k}`$ be the set of all natural numbers greater than or equal to $`k`$. Let $`[n]`$ denote the set $`\{1,\ldots, n\}`$ and $`[n]_0 = [n]\cup\{0\}`$ for an $`n\in\N`$.
An alphabet $`\Sigma=\{\ta,\tb,\tc,\dots\}`$ is a finite set of letters and a word is a finite sequence of letters. We let $`\Sigma^*`$ denote the set of all finite words over $`\Sigma`$. The empty word is denoted by $`\varepsilon`$ and $`\Sigma^+`$ is the free semigroup $`\Sigma^*\backslash\{\varepsilon\}`$. The length of a word $`w`$ is denoted by $`|w|`$. Let $`\Sigma^{\leq k}:=\{w\in\Sigma^{\ast}|\,|w|\leq k\}`$ and $`\Sigma^k`$ be the set of all words of length exactly $`k\in\N`$. The number of occurrences of a letter $`\ta\in\Sigma`$ in a word $`w\in\Sigma^{\ast}`$ is denoted by $`|w|_\ta`$. The letter of a word $`w`$ is given by $`w[i]`$ for $`i\in[|w|]`$. The powers of $`w\in\Sigma^{\ast}`$ are defined recursively by $`w^0=\varepsilon`$, $`w^n=ww^{n-1}`$ for $`n\in\N`$. A word $`u\in\Sigma^{\ast}`$ is a factor of $`w\in\Sigma^{\ast}`$, if $`w=xuy`$ holds for some words $`x,y\in\Sigma^{\ast}`$. Moreover, $`u`$ is a prefix of $`w`$ if $`x=\varepsilon`$ holds and a suffix if $`y=\varepsilon`$ holds. The factor of $`w`$ from the to the letter will be denoted by $`w[i..j]`$ for $`1\leq i\leq j\leq |w|`$. Two words $`u,v\in\Sigma^{\ast}`$ are called conjugates or rotations of each other if there exist $`x,y\in\Sigma^{\ast}`$ with $`u=xy`$ and $`v=yx`$. Additional basic information about combinatorics on words can be found in .
Let $`<`$ be a total ordering on $`\Sigma`$. A word $`w\in\Sigma^{\ast}`$ is called right-bounded-block word if there exist $`\tx,\ty\in\Sigma`$ with $`\tx<\ty`$ and $`\ell\in\N_0`$ with $`w=\tx^{\ell}\ty`$.
A word $`u=\mathtt{a_1} \cdots \mathtt{a_n}\in\Sigma^n`$, for $`n\in\N`$, is a scattered factor of a word $`w\in\Sigma^+`$ if there exist $`v_0,\ldots,v_n\in\Sigma^{\ast}`$ with $`w=v_0\mathtt{a_1}v_1 \cdots v_{n-1}\mathtt{a_n}v_n`$. For words $`w,u\in\Sigma^{\ast}`$, define $`\binom{w}{u}`$ as the number of occurrences of $`u`$ as a scattered factor of $`w`$.
Notice that $`|w|_{\tx}=\binom{w}{\tx}`$ for all $`\tx\in\Sigma`$.
The following definition addresses Problem [reconstproblem].
A word $`w\in\Sigma^n`$ is called uniquely reconstructible/determined by the set $`S \subset \Sigma^{\ast}`$ if for all words $`v\in\Sigma^n\backslash\{w\}`$ there exists a word $`u\in S`$ with $`\binom{w}{u}\neq\binom{v}{u}`$.
Consider $`S=\{\ta\tb,\tb\ta\}`$. Then $`w=\ta\tb\tb\ta`$ is not uniquely reconstructible by $`S`$ since $`\left[\binom{w}{\ta\tb},\binom{w}{\tb\ta}\right]=[2,2]`$ is also the $`2`$-vector of binomial coefficients of $`\tb\ta\ta\tb`$. On the other hand $`S=\{\ta,\ta\tb,\ta\tb^2\}`$ reconstructs $`w`$ uniquely. The following remark gives immediate results for binary alphabets.
Let $`\Sigma=\{\ta,\tb\}`$ and $`w\in\Sigma^{n}`$. If $`|w|_\ta\in\{0,n\}`$ then $`w`$ contains either only $`\tb`$ or $`\ta`$ and by the given length $`n`$ of $`w`$, $`w`$ is uniquely determined by $`S=\{\ta\}`$. This fact is in particular an equivalence: $`w\in\Sigma^{n}`$ can be uniquely determined by $`\{\ta\}`$ iff $`|w|_\ta\in\{0,n\}`$. If $`|w|_\ta\in\{1,n-1\}`$, $`w`$ is not uniquely determined by $`\{\ta\}`$ as witnessed by $`\ta\tb`$ and $`\tb\ta`$ for $`n=2`$. It is immediately clear that the additional information $`\binom{w}{\ta\tb}`$ leads to unique determinism of $`w`$.
Lyndon words play an important role regarding the reconstruction problem. As shown in only scattered factors which are Lyndon words are necessary to determine a word uniquely, i.e., $`S`$ can always be assumed to be a set of Lyndon words.
Let $`<`$ be a total ordering on $`\Sigma`$. A word $`w\in\Sigma^{\ast}`$ is a Lyndon word iff for all $`u,v \in \Sigma^{+}`$ with $`w = uv`$, we have $`w <_{lex} vu`$ where $`<_{lex}`$ is the lexicographical ordering on words induced by $`<`$.
Let $`w`$ and $`u`$ be two words. The binomial coefficient $`\binom{w}{u}`$ can be computed using only binomial coefficients of the type $`\binom{w}{v}`$ where $`v`$ is a Lyndon word of length up to $`|u|`$ such that $`v \leq_{lex} u`$.
To obtain a formula to compute the binomial coefficient $`\binom{w}{u}`$ for $`w,u\in\Sigma^{\ast}`$ by binomial coefficients $`\binom{w}{v_i}`$ for Lyndon words $`v_1,\dots,v_k`$ with $`v_i\in\Sigma^{\leq|u|}`$, $`i\in[k]`$, and $`k\in\N`$ the definitions of shuffle and infiltration are necessary .
Let $`n_1,n_2\in\N`$, $`u_1 \in \Sigma^{n_1}`$, and $`u_2 \in \Sigma^{n_2}`$. Set $`n=n_1+n_2`$. The shuffle of $`u_1`$ and $`u_2`$ is the polynomial $`u_1 \shuffle u_2 = \sum_{I_1,I_2} w(I_1,I_2)`$ where the sum has to be taken over all pairs $`(I_1,I_2)`$ of sets that are partitions of $`[n]`$ such that $`|I_1| = n_1`$ and $`|I_2| = n_2`$. If $`I_1 = \{ i_{1,1} < \ldots < i_{1,n_1}\}`$ and $`I_2 = \{ i_{2,1} < \ldots < i_{2,n_2}\}`$, then the word $`w(I_1,I_2)`$ is defined such that $`w[i_{1,1}] w[i_{1,2}] \cdots w[i_{1,n_1}] = u_1`$ and $`w[i_{2,1}] w[i_{2,2}] \cdots w[i_{2,n_2}] = u_2`$ hold.
The infiltration is a variant of the shuffle in which equal letters can be merged.
Let $`n_1,n_2\in\N`$, $`u_1 \in \Sigma^{n_1}`$, and $`u_2 \in \Sigma^{n_2}`$. Set $`n=n_1+n_2`$. The infiltration of $`u_1`$ and $`u_2`$ is the polynomial $`u_1 \downarrow u_2 = \sum_{I_1,I_2} w(I_1,I_2)`$, where the sum has to be taken over all pairs $`(I_1,I_2)`$ of sets of cardinality $`n_1`$ and $`n_2`$ respectively, for which the union is equal to the set $`[n']`$ for some $`n' \leq n`$. Words $`w(I_1,I_2)`$ are defined as in the previous definition. Note that some $`w(I_1,I_2)`$ are not well defined if $`i_{1,j} = i_{2,k}`$ but $`u_1[j] \neq u_2[k]`$. In that case they do not appear in the previous sum.
Considering for instance $`u_1=\ta\tb\ta`$ and $`u_2=\ta\tb`$ gives the polynomials
\begin{align*}
u_1 \shuffle u_2 & = 2\ta\tb\ta\tb\ta+4\ta\ta\tb\tb\ta+2\ta\ta\tb\ta\tb+2\ta\tb\ta\ta\tb,\\
u_1 \downarrow u_2 &= \ta\tb\ta \shuffle \ta\tb + \ta\tb\ta + 2 \ta\tb\tb\ta + 2 \ta\ta\tb\ta + 2 \ta\tb\ta\tb.
\end{align*}
Based on Definitions [shuffle] and [infiltration], we are able to give a formula to compute a binomial coefficient from the ones making use of Lyndon words. This formula is given implicitely in : Let $`u\in\Sigma^{\ast}`$ be a non-Lyndon word. By there exist non-empty words $`x,y\in\Sigma^{\ast}`$ and with $`u = xy`$ and such that every word appearing in the polynomial $`x \shuffle y`$ is lexicographically less than or equal to $`u`$. Then, for all word $`w \in \Sigma^*`$, we have
\binom{w}{u} = \frac{1}{(x \shuffle y, u)} \left[ \binom{w}{x} \binom{w}{y} - \sum_{v \in \Sigma^{\ast}, v\neq u} (x \downarrow y, v) \binom{w}{v}\right],
where $`(P, v)`$ is a notation giving the coefficient of the word $`v`$ in the polynomial $`P`$. One may apply recursively this formula until only Lyndon factors are considered.
Considering $`\Sigma=\{\ta,\tb\}`$ the binomial coefficient $`\binom{w}{\tb \ta}`$ can be computed using the Lyndon words $`\ta`$, and $`\tb`$ by
\begin{align*}
\binom{w}{\tb\ta} &= \frac{1}{(\tb \shuffle \ta, \tb\ta)} \left[ \binom{w}{\tb} \binom{w}{\ta} - (\tb \downarrow \ta, \ta\tb) \binom{w}{\ta\tb} \right]
= \binom{w}{\tb} \binom{w}{\ta} - \binom{w}{\ta\tb}.
\end{align*}
Regarding word length three, the Lyndon words are $`\ta\ta\tb`$ and $`\ta\tb\tb`$. Let us give formulas to compute $`\binom{w}{\ta\tb\ta}`$, $`\binom{w}{\tb\ta\ta}`$, $`\binom{w}{\tb\ta\tb}`$ and $`\binom{w}{\tb\tb\ta}`$. Having $`x=\ta\tb`$ and $`y=\ta`$, we obtain
\binom{w}{\ta\tb\ta} = \binom{w}{\ta\tb} \left[ \binom{w}{\ta} - 1 \right] - 2 \binom{w}{\ta\ta\tb}.
For $`u = \tb\ta\ta`$, we can either choose $`x = \tb`$ and $`y = \ta\ta`$ or $`x = \tb\ta`$ and $`y = \ta`$. In the first case, we get
\binom{w}{\tb\ta\ta} = \binom{w}{\tb}\binom{w}{\ta\ta} - \binom{w}{\ta\tb\ta} - \binom{w}{\ta\ta\tb}
and by reinjecting formulas for $`\binom{w}{\ta\ta}`$ and $`\binom{w}{\ta\tb\ta}`$, obtained recursively,
\binom{w}{\tb\ta\ta} = \left[ \binom{w}{\ta} - 1 \right] \left[ \frac{1}{2} \binom{w}{\ta} \binom{w}{\tb} - \binom{w}{\ta\tb} \right] + \binom{w}{\ta\ta\tb}.
Finally, the last two formulas are quite similar to what we already had:
\binom{w}{\tb\ta\tb} = \binom{w}{\ta\tb} \left[ \binom{w}{\tb} - 1 \right] - 2 \binom{w}{\ta\tb\tb}
and
\binom{w}{\tb\tb\ta} = \left[ \binom{w}{\tb} - 1 \right] \left[ \frac{1}{2} \binom{w}{\ta} \binom{w}{\tb} - \binom{w}{\ta\tb} \right] + \binom{w}{\ta\tb\tb}.
Let $`w_1,\ldots, w_k\in\Sigma^{\ast}`$ for $`k\in\N`$, and $`K=(k_\ta)_{\ta\in \Sigma}`$ a sequence of $`|\Sigma|`$ natural numbers. A $`K-`$valid marking of $`w_1,\ldots, w_k`$ is a mapping $`\psi: [k]\times \N \rightarrow \N`$ such that for all $`j\in [k]`$, $`i,\ell\in [|w_j|]`$, and $`\ta\in \Sigma`$ there holds
-
if $`w_j[i]=\ta`$ then $`\psi(j,i)\leq k_\ta`$,
-
if $`i<\ell\leq |w_j|`$ and $`w_j[i]=w_j[\ell]=\ta`$ then $`\psi(j,i)<\psi(j,\ell)`$.
A $`K`$-valid marking of $`w_1,\ldots, w_k`$ is represented as the string $`w_1^\psi`$, $`w_2^\psi, \ldots`$, $`w_k^\psi`$, where $`w_j^\psi [i]=(w_j[i])_{\psi(j,i)}`$ for fresh letters $`(w_j[i])_{\psi(j,i)}`$.
For instance, let $`k=2`$, $`\Sigma=\{\ta,\tb\}`$, and $`w_1=\ta\ta\tb`$, $`w_2=\ta\tb\tb`$. Let $`k_\ta=3, k_\tb=2`$ define the sequence $`K`$. A $`K`$-valid marking of $`w_1, w_2`$ would be $`w_1^\psi=(\ta)_1(\ta)_3(\tb)_1, w_2^\psi =(\ta)_2 (\tb)_1(\tb)_2`$ defining $`\psi`$ implicitly by the indices. We used parentheses in the marking of the letters in order to avoid confusions.
We recall that a topological sorting of a directed graph $`G=(V,E)`$, with $`V=\{v_1,\ldots,v_n\}`$, is a linear ordering $`v_{\sigma(1)} < v_{\sigma(2)} < \ldots < v_{\sigma(n)}`$ of the nodes, defined by the permutation $`\sigma:[n]\rightarrow [n]`$, such that there exists no edge in $`E`$ from $`v_{\sigma(i)}`$ to $`v_{\sigma(j)}`$ for any $`i>j`$ (i.e., if $`v_{a}`$ comes after $`v_b`$ in the linear ordering, for some $`a=\sigma(i)`$ and $`b=\sigma(j)`$, then we have $`i>j`$ and there should be no edge between $`v_a`$ and $`v_b`$). It is a folklore result that any directed graph $`G`$ has a topological sorting if and only if $`G`$ is acyclic.
Let $`w_1,\ldots, w_k\in\Sigma^{\ast}`$ for $`k\in\N`$, $`K=(k_\ta)_{\ta\in \Sigma}`$ a sequence of $`|\Sigma|`$ natural numbers, and $`\psi`$ a $`K-`$valid marking of $`w_1,\ldots, w_k`$. Let $`G_\psi`$ be the graph that has $`\sum_{\ta\in \Sigma} k_\ta`$ nodes, labelled with the letters $`(\ta)_1,\ldots,(\ta)_{k_\ta}`$, for all $`\ta\in \Sigma`$, and the directed edges $`((w_j[i])_{\psi(j,i)},(w_j[i+1])_{\psi(j,i+1)})`$, for all $`j\in[k]`$, $`i\in[|w_j|]`$, and $`((\ta)_i,(\ta)_{i+1})`$, for all occuring $`i`$ and $`\ta\in\Sigma`$. We say that there exists a valid topological sorting of the $`\psi`$-marked letters of the words $`w_1,\ldots,w_k`$ if there exists a topological sorting of the nodes of $`G_{\psi}`$, i.e., $`G_{\psi}`$ is a directed acyclic graph.
The graph associated with the $`K`$-valid marking of $`w_1, w_2`$ from above would have the five nodes $`(\ta)_1,(\ta)_2,(\ta)_3,(\tb)_1,(\tb)_2`$ and the six directed edges $`((\ta)_1,(\ta)_3)`$, $`((\ta)_3,(\tb)_1)`$, $`((\ta)_2,(\tb)_1)`$, $`((\tb)_1,(\tb)_2)`$, $`((\ta)_1,(\ta)_2), ((\ta)_2,(\ta)_3)`$ (where the direction of the edge is from the left node to the right node of the pair defining it). This graph has the topological sorting $`(\ta)_1(\ta)_2(\ta)_3(\tb)_1(\tb)_2`$.
For $`w_1, \ldots, w_k\in \Sigma^*`$ and a sequence $`K=(k_\ta)_{\ta\in \Sigma}`$ of $`|\Sigma|`$ natural numbers, there exists a word $`w`$ such that $`w_i`$ is a scattered factor of $`w`$ with $`|w|_{\ta}=k_\ta`$, for all $`i\in [k]`$ and all $`\ta\in\Sigma`$, if and only if there exist a $`K`$-valid marking $`\psi`$ of the words $`w_1,\ldots,w_k`$ and a valid topological sorting of the $`\psi`$-marked letters of the words $`w_1,\ldots,w_k`$.
Proof. If $`w`$ is such that $`w_i`$ is a scattered factor of $`w`$, for all $`i\in [k]`$, and $`|w|_\ta=k_\ta`$, for all $`\ta\in \Sigma`$, then we can mark the $`i^{th}`$ occurrence of $`\ta`$ as $`(\ta)_i`$, for all $`\ta\in \Sigma`$ and $`i\in [k_\ta]`$. This induces a $`K`$-valid marking $`\psi`$ of the words $`w_i`$, and, moreover, the linear ordering of the nodes of $`G_\psi`$ induced by the order in which the marked letters (i.e., nodes of $`G_\psi`$) occur in $`w`$ is a topological sorting of $`G_{\psi}`$.
Let us now assume that there exists a $`K`$-valid marking $`\psi`$ of the words $`w_1,\ldots,w_k`$, and there exists a valid topological sorting of the $`\psi`$-marked letters of the words $`w_1,\ldots,w_k`$. Let $`w'`$ be the word obtained by writing the nodes of $`G_{\psi}`$ in the order given by its topological sorting and removing their markings. It is clear that $`w'`$ has $`w_i`$ as a scattered factor, for all $`i\in [k]`$, and that $`|w'|_\ta\leq k_\ta`$, for all $`\ta\in \Sigma`$. Let now $`w=w' \prod_{\ta\in \Sigma}\ta^{k_\ta-|w'|_\ta}`$, where $`\prod_{\ta\in \Sigma}\ta^{k_\ta-|w'|_\ta}`$ is the concatenation of the factors $`\ta^{k_\ta-|w'|_\ta}`$, for $`\ta\in \Sigma`$ in some fixed order. Now $`w`$ has $`w_i`$ as a scattered factor, for all $`i\in [k]`$, and $`|w|_\ta= k_\ta`$, for all $`\ta\in \Sigma`$. 0◻ ◻
Next we show that in Theorem [recon1] uniqueness propagates in the $`\Leftarrow`$-direction.
Let $`w_1, \ldots, w_k\in \Sigma^*`$ and $`K=(k_\ta)_{\ta\in \Sigma}`$ a sequence of $`|\Sigma|`$ natural numbers. If the following hold
-
there exists a unique $`K`$-valid marking $`\psi`$ of the words $`w_1,\ldots,w_k`$,
-
in the unique $`K`$-valid marking $`\psi`$ we have that for each $`\ta\in \Sigma`$ and $`\ell \in [k_\ta]`$ there exists $`i\in [k]`$ and $`j\in [|w_i|]`$ with $`\psi(i,j)=\ell`$, and
-
there exists a unique valid topological sorting of the $`\psi`$-marked letters of the words $`w_1,\ldots,w_k`$
then there exists a unique word $`w`$ such that $`w_i`$ is a scattered factor of $`w`$, for all $`i\in [k]`$ and $`|w|_\ta=k_\ta`$ for all $`\ta\in \Sigma`$.
Proof. Let $`w`$ be the word obtained by writing in order the letters of the unique valid topological sorting of the $`\psi`$-marked letters of the words $`w_1,\ldots,w_k`$ and removing their markings. It is clear that $`w'`$ has $`w_i`$ as a scattered factor, for all $`i\in [k]`$, and that $`|w|_\ta= k_\ta`$, for all $`\ta\in \Sigma`$. The word $`w`$ is uniquely defined (as there is no other $`K`$-valid marking nor valid topological sorting of the $`\psi`$-marked letters), and $`|w|_\ta=k_\ta`$, for all $`\ta\in \Sigma`$. 0◻ ◻
In order to state the second result, we need the projection $`\pi_S(w)`$ of a word $`w\in\Sigma^{\ast}`$ on $`S\subseteq\Sigma`$: $`\pi_S(w)`$ is obtained from $`w`$ by removing all letters from $`\Sigma \setminus S`$.
Set $`W=\{w_{\ta,\tb}\mid \ta<\tb\in \Sigma \}`$ such that
-
$`w_{\ta,\tb}\in \{\ta,\tb\}^*`$ for all $`\ta,\tb\in \Sigma`$,
-
for all $`w,w'\in W`$ and all $`\ta \in \Sigma`$, if $`|w|_{\ta} \cdot |w'|_{\ta} > 0`$, then $`|w|_{\ta} = |w'|_{\ta}`$.
Then there exists at most one $`w\in \Sigma^*`$ such that $`w_{\ta,\tb}`$ is $`\pi_{\{\ta,\tb\}}(w)`$ for all $`\ta,\tb\in \Sigma`$.
Proof. Notice firstly $`|W|=\frac{q(q-1)}{2}`$. Let $`k_\ta=|w_{\ta,\tb}|_\ta`$, for $`\ta<\tb\in \Sigma`$. These numbers are clearly well defined, by the second item in our hypothesis. Let $`K=(k_\ta)_{\ta\in \Sigma}`$. It is immediate that there exists a unique $`K`$-valid marking $`\psi`$ of the words $`(w_{\ta,\tb})_{\ta<\tb\in \Sigma}`$. As each two marked letters $`(\ta)_i`$ and $`(\tb)_j`$ (i.e., each two nodes $`(\ta)_i`$ and $`(\tb)_j`$ of $`G_\psi`$) appear in the marked word $`w^\psi_{\ta,\tb}`$, we know the order in which these two nodes should occur in a topological sorting of $`G_{\psi}`$. This means that, if $`G_\psi`$ is acyclic, then it has a unique topological sorting. Our statement follows now from Corollary [reconst_topo]. 0◻ ◻
Given the set $`W=\{w_{\ta,\tb}\mid \ta<\tb\in \Sigma \}`$ as in the statement of Theorem [recon2], with $`k_\ta=|w_{\ta,\tb}|_\ta`$, for $`\ta<\tb\in \Sigma`$, and $`K=(k_\ta)_{\ta\in \Sigma}`$, we can produce the unique $`K`$-valid marking $`\psi`$ of the words $`(w_{\ta,\tb})_{\ta<\tb\in \Sigma}`$ in linear time $`O(\sum_{\ta<\tb\in \Sigma} |w_{\ta,\tb}|)=O((q-1)\sum_{\ta \in \Sigma} k_\ta)`$: just replace the letter $`\ta`$ of $`w_{\ta,\tb}`$ by $`(\ta)_i`$, for all $`\ta`$ and $`i`$. The graph $`G_{\psi}`$ has $`O((q-1)\sum k_{\ta})`$ edges and $`O(\sum k_\ta)`$ vertices and can be constructed in linear time $`O((q-1)\sum k_\ta)`$. Sorting $`G_\psi`$ topologically takes $`O((q-1)\sum k_\ta)`$ time (see, e.g., the handbook ). As such, we conclude that reconstructing a word $`w\in \Sigma^*`$ from its projections over all two-letter-subsets of $`\Sigma`$ can be done in linear time w.r.t. the total length of the respective projections.
Theorem [recon2] is in a sense optimal: in order to reconstruct a word over $`\Sigma`$ uniquely, we need all its projections on two-letter-subsets of $`\Sigma`$. That is, it is always the case that for a strict subset $`U`$ of $`\{\{\ta,\tb\}\mid \ta<\tb\in \Sigma\}`$, with $`|U|=\frac{q(q-1)}{2} -1`$, there exist two words $`w'\neq w`$ such that $`\{\pi_p(w')\mid p\in U\}=\{\pi_p(w)\mid p\in U\}`$. We can, in fact, show the following results:
Let $`S_1,\ldots, S_k`$ be subsets of $`\Sigma`$. The following hold:
-
If each pair $`\{\ta,\tb\}\subseteq \Sigma`$ is included in at least one of the sets $`S_i`$, then we can reconstruct any word uniquely from its projections $`\pi_{S_1}(\cdot), \ldots, \pi_{S_k}(\cdot)`$.
-
If there exists a pair $`\{\ta,\tb\}`$ that is not contained in any of the sets $`S_1,\ldots, S_k`$, then there exist two words $`w`$ and $`w'`$ such that $`w\neq w'`$ and $`\pi_{S_1}(w)=\pi_{S_1}(w'), \ldots,`$ $`\pi_{S_k}(w)=\pi_{S_k}(w')`$.
Proof. The first part is, once again, a consequence of Corollary [reconst_topo]. The second part can be shown by assuming that $`\Sigma=\{\ta_1,\ldots,\ta_q\}`$ and the pair $`\{\ta_1,\ta_2\}`$ is not contained in any of the sets $`S_1,\ldots,S_k`$. Then, for $`w=\ta_1\ta_3 \ta_4\ldots \ta_q`$ and $`w'=\ta_2\ta_3 \ta_4\ldots \ta_q`$, we have that $`\pi_{S_1}(w)=\pi_{S_1}(w'), \ldots,`$ $`\pi_{S_k}(w)=\pi_{S_k}(w')`$. 0◻ ◻
In this context, we can ask how efficiently can we decide if a word is uniquely reconstructible from the projections $`\pi_{S_1}(\cdot), \ldots`$, $`\pi_{S_k}(\cdot)`$ for $`S_1,\dots,S_k\subset\Sigma`$.
Given the sets $`S_1,\ldots, S_k\subset \Sigma`$, we decide whether we can reconstruct any word uniquely from its projections $`\pi_{S_1}(\cdot), \ldots, \pi_{S_k}(\cdot)`$ in $`O(q^2 k)`$ time. Moreover, under the Strong Exponential Time Hypothesis (see the survey and the references therein), there is no $`O(q^{2-d} k^c)`$ algorithm for solving the above decision problem, for any $`d,c>0`$.
Proof. We begin with a series of preliminaries. Let us recall the Orthogonal Vectors problem: Given sets $`A, B`$ consisting of $`n`$ vectors in $`\{0,1\}^k`$, decide whether there are vectors $`a\in A`$ and $`b\in B`$ which are orthogonal (i.e., for any $`i\in [k]`$ we have $`a[i]b[i] = 0`$). This problem can be solved naïvely in $`O(n^2k)`$ time, but under the Strong Exponential Time Hypothesis there is no $`O(n^{2-d} k^c)`$ algorithm for solving it, for any $`d,c>0`$ (once more, see the survey and the references therein).
We show that our problem is equivalent to the Orthogonal Vectors problem.
Let us first assume that we are given the sets $`S_1,\ldots, S_k\subset \Sigma`$, and we want to decide whether we can reconstruct any word uniquely from its projections $`\pi_{S_1}(\cdot), \ldots, \pi_{S_k}(\cdot)`$. This is equivalent, according to Theorem [char_proj], to checking whether each pair $`\{\ta,\tb\}\subseteq \Sigma`$ is included in at least one of the sets $`S_i`$. For each letter $`\ta`$ of $`\Sigma`$ we define the $`k`$-dimensional vectors $`x_\ta`$ where $`x_\ta[i]=1`$ if $`\ta \in S_i`$ and $`x_\ta[i]=0`$ if $`\ta \not\in S_i`$. This can be clearly done in $`O(q k)`$ time. Now, there exists a pair $`\{\ta,\tb\}`$ that is not contained in any of the sets $`S_1,\ldots, S_k`$ if and only if there exists a pair of vectors $`\{x_\ta,x_\tb\}`$ such that $`x_\ta[i] x_\tb[i]=0`$ for all $`i\in [k]`$. We can check whether there exists a pair of vectors $`\{x_\ta,x_\tb\}`$ such that $`x_\ta[i] x_\tb[i]=0`$ for all $`i\in [k]`$ by solving the Orthogonal Vectors problem by using for both input sets of vectors the set $`\{x_\ta\mid \ta \in \Sigma\}`$. As such, we can check whether there exists a pair $`\{\ta,\tb\}`$ that is not contained in any of the sets $`S_1,\ldots, S_k`$ in $`O(q^2 k)`$ time.
Let us now assume that we are given two sets $`A, B`$ consisting of $`n`$ vectors in $`\{0,1\}^k`$, and we want to decide whether there are vectors $`a\in A`$ and $`b\in B`$ which are orthogonal (i.e., for any $`i\in [k]`$ we have $`a[i]b[i] = 0`$). We can compute the set of $`(k+2)`$-dimensional vectors $`A'`$ containing the vectors of $`A`$ extended with two new positions (position $`k+1`$ and position $`k+2`$) set to $`10`$ and the vectors of $`B`$ extended with two new positions (position $`k+1`$ and position $`k+2`$) set to $`01`$. To decide whether there are vectors $`a\in A`$ and $`b\in B`$ which are orthogonal is equivalent to decide whether there are vectors $`a,b\in A'`$ which are orthogonal (if two such vectors exist, they must be different on their last two positions, so one must come from $`A`$ and one from $`B`$). Assume that $`A'=\{x_1,x_2,\ldots,x_{2n}\}`$. Now we define an alphabet $`\Sigma=\{\ta_1,\ldots,\ta_{2n}\}`$ of size $`2n`$ and the sets $`S_1,\ldots, S_k`$, where $`\ta_j\in S_i`$ if and only if $`x_j[i]=1`$. Computing $`A'`$ and then the alphabet $`\Sigma`$ and the sets $`S_i`$, for $`i\in [k]`$, takes $`O(nk)`$ time. Now, to decide whether there are vectors $`x_i,x_j\in A'`$ which are orthogonal is equivalent to decide whether there exists a pair of letters $`\{\ta_i, \ta_j\}`$ of $`\Sigma`$ that is not contained in any of the sets $`S_1,\ldots, S_k`$. The conclusion of the theorem now follows. 0◻ ◻
In this section we present a method to reconstruct a binary word uniquely from binomial coefficients of right-bounded-block words. Let $`n\in\N`$ be a natural number and $`w\in\{\ta,\tb\}^n`$ a word. Since the word length $`n`$ is assumed to be known, $`|w|_\ta`$ is known if $`|w|_\tb`$ is given and vice versa. Set for abbreviation $`k_u=\binom{w}{u}`$ for $`u\in \Sigma^{\ast}`$. Moreover we assume w.l.o.g. $`k_\ta\leq k_\tb`$ and that $`k_\ta`$ is known (otherwise substitute each $`\ta`$ by $`\tb`$ and each $`\tb`$ by $`\ta`$, apply the following reconstruction method and revert the substitution). This implies that $`w`$ is of the form
\begin{align}
\label{basisword}
\tb^{s_1}\ta\tb^{s_2}\dots \tb^{s_{k_\ta}}\ta\tb^{s_{k_\ta+1}}
\end{align}
for $`s_i\in\N_0`$ and $`i\in[|w|_{\ta}+1]`$ with $`\sum_{i\in[k_\ta +1]}s_i=n-k_\ta=k_\tb`$ and thus we get for $`\ell\in[k_\ta]_0`$
\begin{equation}
\label{alb}
k_{\ta^{\ell}\tb}=\binom{w}{\ta^{\ell}\tb}=\sum_{i=\ell+1}^{k_\ta+1}\binom{i-1}{\ell}s_i.
\end{equation}
Notice that for fixed
$`\ell\in[k_\ta]_0`$ and $`c_i=\binom{i-1}{\ell}`$ for
$`i\in[k_\ta+1]\backslash[\ell]`$, we have $`c_i
Equation ([alb]) shows that reconstructing a word uniquely from binomial coefficients of right-bounded-block words equates to solve a system of Diophantine equations. The knowledge of $`k_{\tb}, \ldots, k_{\ta^{\ell} \tb}`$ provides $`\ell + 1`$ equations. If the equation of $`k_{\ta^{\ell} \tb}`$ has a unique solution for $`\{ s_{\ell + 1}, \ldots, s_{k_{\ta} + 1} \}`$ (in this case we say, by language abuse, that $`k_{\ta^{\ell} \tb}`$ is unique), then the system in row echelon form has a unique solution and thus the binary word is uniquely reconstructible. Notice that $`k_{\ta^{k_\ta}\tb}`$ is always unique since $`k_{\ta^{k_\ta}\tb}=s_{k_{\ta}+1}`$.
Consider $`n=10`$ and $`k_\ta=4`$. This leads to $`w=\tb^{s_1}\ta \tb^{s_2}\ta \tb^{s_3}\ta \tb^{s_4}\ta\tb^{s_5}`$ with $`\sum_{i\in[5]}s_i=6`$. Given $`k_{\ta\tb}=4`$ we get $`4=s_2+2s_3+3s_4+4s_5`$. The $`s_i`$ are not uniquely determined. If $`k_{\ta^2\tb}=2`$ is also given, we obtain the equation $`2=s_3+3s_4+6s_5`$ and thus $`s_3=2`$ and $`s_4=s_5=0`$ is the only solution. Substituting these results in the previous equation leads to $`s_2=0`$ and since we only have six $`\tb`$, we get $`s_1=4`$. Hence $`w=\tb^4\ta^2\tb^2\ta^2`$ is uniquely reconstructed by $`S=\{\ta,\ta\tb,\ta^2\tb\}`$.
The following definition captures all solutions for the equation defined by $`k_{\ta^{\ell}\tb}`$ for $`\ell\in[k_\ta]_0`$.
Set $`M(k_{\ta^{\ell}\tb})=\{(r_{\ell+1},\dots,r_{k_\ta +1})|\,k_{\ta^{\ell}\tb}=\sum_{i=\ell+1}^{k_{\ta}+1}\binom{i-1}{\ell}r_i\}`$ for fixed $`\ell\in[k_\ta]_0`$. We call $`k_{\ta^{\ell}\tb}`$ unique if $`|M(k_{\ta^{\ell}\tb})|=1`$.
By Remark [ci] the coefficients of each equation of the form ([alb]) are strictly increasing. The next lemma provides the range each $`k_{\ta^{\ell}\tb}`$ may take under the constraint $`\sum_{i=1}^{k_\ta+1}s_i=n-k_{\ta}`$.
Let $`n\in\N`$,
$`k\in[n]_0`$, $`j\in[k+1]`$ and
$`c_1,\dots,c_{k+1},s_1,\dots,s_{k+1}\in\N_0`$ with $`c_i
Proof. The case $`k=0`$ is trivial. Consider the case $`n=k`$, i.e.,
$`\sum_{i=1}^{k+1}s_i=0`$. This implies immediately $`s_i=0`$ for all
$`i\in[k+1]`$ and the equivalence holds. Assume for the rest of the
proof $`k This implies $`\sum_{i=j}^{k}c_i s_i' \geq c_{k+1}\ell`$. Since the
coefficients are strictly increasing we get
$`\sum_{i=j}^{k}c_i s_i' \leq c_k\sum_{i=j}^{k}
s_i' c_{k+1}(n-k)\leq
\sum_{i=j}^{k+1}c_i s_i' =\left(\sum_{i=j}^{k}c_i s_i' \right)+c_{k+1}(n-k-\ell).
Let $`k_{\ta}\in[n]_0`$, $`\ell\in[k_\ta]_0`$, and $`s_1,\dots,s_{k_{\ta}+1}\in\N_0`$ with $`\sum_{i=1}^{k_\ta+1}s_i=n-k_\ta`$. Then $`\binom{w}{\ta^{\ell}\tb}\in\left[\binom{k_\ta}{\ell}(n-k_\ta)\right]_0`$.
Proof. It follows directly from Equation [alb] and Lemma [maxvalues].0◻ ◻
The following lemma shows some cases in which $`k_{\ta^{\ell}\tb}`$ is unique.
Let $`k_{\ta}\in[n]`$, $`\ell\in[k_\ta]_0`$ and $`s_1,\dots,s_{k_\ta+1}\in\N_0`$ with $`\sum_{i=1}^{k_\ta+1}s_i=n-k_\ta`$. If $`k_{\ta^{\ell}\tb}\in[\ell]_0\cup\{\binom{k_\ta}{\ell}(n-k_\ta)\}`$ or $`k_{\ta^{\ell}\tb}=\binom{k_\ta-1}{\ell}r+\binom{k_\ta}{\ell}(n-k_{\ta}-r)`$ for $`r\in[k_\tb]_0`$ then $`k_{\ta^{\ell}\tb}`$ is unique.
Proof. Consider firstly $`k_{\ta^{\ell}\tb}\in[\ell]_0`$. By
Remark [ci]
we have $`c_{\ell+1}=1`$ and $`c_{\ell+2}=\ell+1`$. By $`c_i
Since we are not able to fully characterise the uniquely determined values for each $`k_{\ta^{\ell}\tb}`$ for arbitrary $`n`$ and $`\ell`$, the following proposition gives the characterisation for $`\ell\in\{0,1\}`$. Notice that we use $`k_{\ta}`$ immediately since it is determinable by $`n`$ and $`k_{\ta^0\tb}=k_{\tb}`$.
The word $`w \in \Sigma^{n}`$ is uniquely determined by $`k_\ta`$ and $`k_{\ta\tb}`$ iff one of the following occurs
-
$`k_\ta = 0`$ or $`k_\ta = n`$ (and obviously $`k_{\ta\tb} = 0`$),
-
$`k_\ta = 1`$ or $`k_\ta = n-1`$ and $`k_{\ta\tb}`$ is arbitrary,
-
$`k_\ta \in [n-2]_{\geq 2}`$ and $`k_{\ta\tb} \in \{0,1,k_\ta(n-k_\ta) - 1, k_\ta(n-k_\ta) \}`$.
Proof. Let us first prove that $`w`$ is uniquely determined in these cases. It is obvious if $`k_\ta = 0`$ or $`k_\ta = n`$ since the word is composed of the same letter repeated $`n`$ times. If $`k_\ta = 1`$, then $`w = \tb^{s_1} \ta \tb^{n-1-s_1}`$ and $`\binom{w}{\ta\tb} = n-1-s_1 = k_{\ta\tb}`$. Therefore $`w`$ is uniquely determined. If $`k_\ta = n-1`$, then $`w = \tb^{s_1} \ta \tb^{s_2} \cdots \ta \tb^{s_n}`$ with exactly one of the $`s_i`$ being non zero and, in fact, equal to one. We have $`\binom{w}{\ta\tb} = \sum_{i=2}^{n} (i-1) s_i`$ and, if $`k_{\ta\tb}`$ is given (between $`0`$ and $`n-1`$), then $`s_{k_{\ta\tb}+1} = 1`$ is the only non zero exponent. Consider now $`k_\ta\in[n-2]_{\geq 2}`$, i.e. $`w=\tb^{s_1}\ta\tb^{s_2}\dots \tb^{s_{k_\ta}}\ta\tb^{s_{k_\ta+1}}`$. Thus $`k_{\ta\tb} = 0`$ implies $`s_1 = n - k_\ta`$ and $`s_2 = 0, \ldots, s_{k_\ta + 1} = 0`$ while $`k_{\ta\tb} = 1`$ implies $`s_2 = 1`$, $`s_1 = n - k_\ta - 1`$ and $`s_3 =0, \ldots, s_{k_\ta + 1} = 0`$. By Lemma [maxvalues], we know that [alb] is maximal if and only if $`s_{k_\ta +1 } = n - k_{\ta}`$ and all the other $`s_i`$ are equal to zero. In that case, the value of the sum equals $`k_{\ta}(n - k_{\ta})`$. Therefore, if $`\binom{w}{\ta\tb} = k_{\ta}(n - k_{\ta})`$, the word $`w`$ is uniquely determined. Finally, if $`k_{\ta\tb} = k_{\ta}(n - k_{\ta}) -1`$, we must have $`s_{k_\ta + 1} \leq n - k_\ta - 1`$. If we choose $`s_{k_\ta + 1} = n - k_\ta - 1`$, it remains that $`\sum_{i=1}^{k_\ta} s_i = 1`$ and $`\sum_{i=2}^{k_\ta} (i-1)s_i = k_\ta -1`$. We must have $`s_{k_\ta} = 1`$ and the other ones equal to zero. In fact, choosing $`s_{k_\ta + 1} = n - k_\ta - 1`$ is the only possibility: if otherwise $`s_{k_\ta + 1} = n - k_\ta - \ell`$ with $`\ell > 1`$, we obtain that $`\sum_{i=2}^{k_\ta} (i-1) s_i \geq \ell k_\ta
- 1
$ with $\sum_{i=1}^{k_\ta} s_i = \ell`$. It is easy to check with Lemma [maxvalues] that these conditions are incompatible.
We now need to prove that $`w`$ cannot be uniquely determined if $`k_\ta \in [n-2]_{\geq 2}`$ and $`k_{\ta\tb} \in [k_\ta (n-k_{\ta}) - 2]_{\geq 2}`$. To this aim we will give two different sets of values for the $`s_i`$. The first decomposition is the greedy one. Let us put $`s_{k_\ta + 1} = \lfloor \frac{k_{\ta\tb}}{k_\ta} \rfloor`$, $`s_{(k_{\ta\tb} \bmod k_\ta) + 1} = 1`$ and the other $`s_i`$ equal to $`0`$. Let us finally modify the value of $`s_1`$ (which is, at this stage, equal to $`0`$ or $`1`$) by adding the value needed. By $`\sum_{i=1}^{k_{\ta}+1} s_i=n-k_\ta`$ we get $`s_1 \leftarrow s_1 + (n-k_\ta) -\lfloor \frac{k_{\ta\tb}}{k_\ta} \rfloor - 1`$. This implies $`\sum_{i=1}^{k_\ta + 1} s_i = 1 + (n-k_\ta) - \left \lfloor \frac{k_{\ta\tb}}{k_\ta} \right \rfloor - 1 + \left \lfloor \frac{k_{\ta\tb}}{k_\ta} \right \rfloor = n-k_\ta`$ and $`s_i \geq 0`$ for all $`i`$. Moreover we have $`\sum_{i=2}^{k_\ta + 1} (i-1) s_i = (k_{\ta\tb} \bmod k_\ta) + k_\ta \left \lfloor \frac{k_{\ta\tb}}{k_\ta} \right \rfloor = k_{\ta\tb}`$.
Now we provide a second decomposition for the $`s_i`$. First, let us assume that $`2 \leq k_{\ta\tb} < k_{\ta}`$. In that case, the greedy algorithm sets $`s_{k_{\ta\tb} + 1} = 1`$, $`s_1 = n - k_\ta - 1`$ and the other $`s_i`$ to $`0`$. Let us now set $`s_1 = n - k_{\ta} - 2`$ and all the other $`s_i`$ to $`0`$. Then, update $`s_{k_{\ta\tb}} \leftarrow s_{k_{\ta\tb}} + 1`$ and $`s_2 \leftarrow s_2 + 1`$ (in the case where $`k_{\ta\tb} = 2`$, $`s_2`$ will be equal to $`2`$ after these manipulations). We have that the sum in [alb] is equal to $`1 + (k_{\ta\tb} - 1)`$ as needed. Finally, if $`k_{\ta\tb} \geq k_\ta`$, then $`s_{k_\ta + 1}`$ was non zero in the greedy decomposition, and the idea is to reduce it of a value $`1`$. Let us set $`s_{k_\ta + 1} =\lfloor \frac{k_{\ta\tb}}{k_\ta} \rfloor - 1`$ and the other $`s_i`$ to $`0`$. Then, let us update some values: $`s_{(k_{\ta\tb} \bmod k_\ta) + 2} \leftarrow s_{(k_{\ta\tb} \bmod k_\ta) + 2} + 1`$ and $`s_{k_\ta} \leftarrow s_{k_\ta} + 1`$ if $`(k_{\ta\tb} \bmod k_\ta) \neq k_\ta - 1`$, and $`s_{k_\ta} = 2`$, $`s_2 = 1`$ otherwise. Finally, set $`s_1`$ to the right value, i.e., $`n-k_\ta - \sum_{i=2}^{k_\ta +1} s_i`$. It can be easily checked that, in both cases, $`s_1 \geq 0`$ (notice that $`(k_{\ta\tb} \bmod k_\ta) = k_\ta - 1`$ implies that $`\lfloor \frac{k_{\ta\tb}}{k_\ta} \rfloor \leq n - k_\ta -2`$) and that all $`s_i`$ sum up to $`n - k_\ta`$. Similarly, we can check that $`\sum_{i=2}^{k_\ta + 1} (i-1) s_i`$ is equal to $`k_{\ta\tb}`$ in both cases.
To sum up, we gave two different decompositions for the $`s_i`$ in cases where $`k_\ta \in [n-2]_{\geq 2}`$ and $`k_{\ta\tb} \in [k_\ta (n-k_{\ta}) - 2]_{\geq 2}`$. That implies that $`w`$ cannot be uniquely determined in those cases.0◻ ◻
In all cases not covered by Proposition [smallell] the word cannot be uniquely determined by $`\binom{w}{\ta}`$ and $`\binom{w}{\ta\tb}`$. The following theorem combines the reconstruction of a word with the binomial coefficients of right-bounded-block words.
Let $`j\in[k_\ta]_0`$. If $`k_{\ta^{j}\tb}`$ is unique, then the word $`w\in\Sigma^n`$ is uniquely determined by $`\{\tb,\ta\tb,\ta^2\tb,\dots,\ta^{j}\tb\}`$.
Proof. If $`k_{\ta^{j}\tb}`$ is unique, the coefficients $`s_{j+1},\dots,s_{k_{\ta}+1}`$ are uniquely determined. Substituting backwards the known values in the first $`j-1`$ equations ([alb]) (for $`\ell = 1, \ldots, j-1`$) we can now obtain successively the values for $`s_{j},\dots,s_1`$.0◻ ◻
Let $`\ell`$ be minimal such that $`k_{\ta^{\ell}\tb}`$ is unique. Then $`w`$ is uniquely determined by $`\{\ta,\ta\tb,\ta^2\tb,\dots,\ta^{\ell}\tb\}`$ and not uniquely determined by any $`\{\ta,\ta\tb,\ta^2\tb,\dots,\ta^{i}\tb\}`$ for $`i<\ell`$.
Proof. It follows directly from Theorem [uniquerecon].0◻ ◻
By an upper bound on the number of binomial coefficients to uniquely reconstruct the word $`w\in\Sigma^n`$ is given by the amount of the binomial coefficients of the $`(\lfloor \frac{16}{7}\sqrt{n}\rfloor +5)`$-spectrum. Notice that implicitly the full spectrum is assumed to be known. As proven in Section 2, Lyndon words up to this length suffice. Since there are $`\frac{1}{n}\sum_{d|n}\mu(d)\cdot 2^{\frac{n}{d}}`$ Lyndon words of length $`n`$, the combination of both results presented in states that, for $`n > 6`$,
\begin{align}
\label{eqform}
\sum_{i=1}^{\lfloor
\frac{16}{7}\sqrt{n}\rfloor +5} \frac{1}{i}\sum_{d|i}\mu(d)\cdot 2^{\frac{i}{d}}
\end{align}
binomial coefficients are sufficient for a unique reconstruction with the Möbius function $`\mu`$. Up to now, it was the best known upper bound.
Theorem [uniquerecon] shows that $`\min\{k_\ta,k_\tb\}+1`$ binomial coefficients are enough for reconstructing a binary word uniquely. By Proposition [smallell] we need exactly one binomial coefficient if $`n\in[3]`$ and at most two if $`n=4`$. For $`n\in\{5,6\}`$ we need at most $`n-2`$ different binomial coefficients. The following theorem shows that by Theorem [uniquerecon] we need strictly less binomial coefficients for $`n > 6`$.
Let $`w\in\Sigma^n`$. We have that $`\min\{k_\ta,k_\tb\}+1`$ binomial coefficients suffice to uniquely reconstruct $`w`$. If $`k_{\ta} \leq k_{\tb}`$, then the set of sufficient binomial coefficients is $`S = \{\tb,\ta\tb,\ta^2\tb,...,\ta^h \tb\}`$ where $`h=\lfloor\frac{n}{2}\rfloor`$. If $`k_{\ta} > k_{\tb}`$, then the set is $`S = \{\ta,\tb\ta,\tb^2\ta,...,\tb^h \ta\}`$. This bound is strictly smaller than ([eqform]).
Proof. Assume w.l.o.g. $`k_\ta\leq k_\tb`$. Then $`k_{\ta} \leq \frac{n}{2}`$ and Theorem [uniquerecon] shows that words in the set $`\{ \tb, \ta \tb, \ldots, \ta^{\lfloor \frac{n}{2} \rfloor} \tb \}`$ can reconstruct $`w`$ uniquely. If $`k_{\ta} > k_{\tb}`$, the set $`S`$ is obtained by replacing the letter $`\ta`$ by $`\tb`$ and vice-versa.
Set $`N_2(i):=\frac{1}{i}\sum_{d|i}\mu(d)2^{\frac{i}{d}}`$ for all $`i\in[\lfloor\frac{16}{7}\sqrt{n}\rfloor+5]`$, i.e., $`\sum_{i=1}^{\lfloor \frac{16}{7}\sqrt{n}\rfloor +5} N_2(i)`$, which is Equation [eqform], binomial coefficients suffice. By we have
\begin{align*}
N_2(i) & \geq
\frac{1}{i}\left(2^i-\frac{2^{\frac{i}{2}}-1}{2-1}\right)
= \frac{1}{i}\left(2^i-2^{\frac{i}{2}}+1\right)
= \frac{1}{i}\left(2^{\frac{i}{2}} (2^{\frac{i}{2}}-1)+1\right)
\geq \frac{2^{\frac{i}{2}}}{i}.
\end{align*}
This results in
\begin{align*}
\sum_{i=1}^{\lfloor
\frac{16}{7}\sqrt{n}\rfloor +5} N_2(i) &\geq \sum_{i=1}^{\lfloor
\frac{16}{7}\sqrt{n}\rfloor +5} \frac{2^{\frac{i}{2}}}{i}
\geq
\frac{1}{\frac{16}{7}\sqrt{n} +5} \, \frac{\sqrt{2}^{\frac{16}{7} \sqrt{n}+5}-1}{\sqrt{2}-1}.
\end{align*}
We want to show that this quantity is at least equal to $`\frac{n+1}{2}`$. Let us define
f(x) = \frac{1}{\frac{16}{7}\sqrt{x} +5} \, \frac{\sqrt{2}^{\frac{16}{7} \sqrt{x}+5}-1}{\sqrt{2}-1}
- \frac{x+1}{2}
for all $`x > 0`$, which is the continuous extension on $`\mathbb{R}^{+}`$ of the quantity we are interested in. It is easy to verify by hand that $`f(1), f(2), f(3)`$ and $`f(4)`$ are positive. Let us formally show that $`f(x) > 0`$ for all $`x \geq 5`$. Since this function is differentiable, we get with $`y = \frac{16}{7} \sqrt{x} + 5`$
f'(x) = \frac{1}{y} \, \sqrt{2}^{y} \,
\frac{\ln(\sqrt{2})}{\sqrt{2}-1} \, \frac{8}{7 \sqrt{x}} - \frac{1}{y^2}
\, \frac{8}{7 \sqrt{x}} \, \frac{\sqrt{2}^{y}-1}{\sqrt{2}-1} - \frac{1}{2}.
We thus have $`\sqrt{x} = \frac{7y-35}{16}`$ and $`y \geq 7`$ for all $`x \geq 1`$. By injecting $`y`$ in the previous expression, and reducing to the common denominator, we have to show that
\begin{align*}
& \, 2y \sqrt{2}^y 128 \ln(\sqrt{2}) - 256 (\sqrt{2}^y - 1) - 7(7y-35) y^2(\sqrt{2}-1) \\
= & \,\sqrt{2}^y (128 \ln(2) y - 256) + 256 - 49y^3 (\sqrt{2} - 1) + 245 y^2 (\sqrt{2} - 1) \\
\geq & \, \sqrt{2}^y 365 + 256 - 49y^3 (\sqrt{2} - 1) + 245 y^2 (\sqrt{2} - 1)
\end{align*}
is strictly positive. Let us call the last quantity $`g(y)`$. We will show that it is positive for all $`y \geq 10.05`$, which means that $`f(x)`$ is positive for all $`x`$ such that $`\frac{16}{7} \sqrt{x}
- 5 \geq 10.05
$, i.e., for all $x \geq 5`$. We have
\begin{align*}
g'(y) &= 365 \sqrt{2}^y \ln(\sqrt{2}) - 147(\sqrt{2}-1) y^2 + 490(\sqrt{2}-1)y, \\
g''(y) &= 365 \sqrt{2}^y (\ln(\sqrt{2}))^2 - 294(\sqrt{2}-1) y + 490(\sqrt{2}-1), \\
g'''(y) &= 365 \sqrt{2}^y (\ln(\sqrt{2}))^3 - 294(\sqrt{2}-1),
\end{align*}
and $`g'''(7) > 50`$, $`g''(8.5) > 2`$, $`g'(10.05) > 8`$ and finally $`g(10.05) > 1787`$. Since $`g'''(y)`$ is increasing and positive in $`7`$, $`g''(y)`$ is increasing for $`y \geq 7`$. Therefore $`g'(y)`$ is increasing for $`y \geq 8.5`$ and finally $`g(y)`$ is increasing for $`y \geq 10.05`$ and positive.0◻ ◻
By Lemma [uniquekalb] we know that $`k_{\ta^{\ell}\tb}`$ is unique if it is in $`[\ell]_0`$ or exactly $`\binom{k_\ta}{\ell}(n-k_\ta)`$. The probability for the latter is $`\frac{1}{2^n}`$ for $`w\in\{\ta,\tb\}^n`$. If $`k_{\ta^{\ell}\tb}=m\in[\ell]_0`$ we get by ([alb]) immediately $`s_{\ell+1}=m`$ and $`s_{i}=0`$ for $`\ell+2\leq i\leq k_\ta+1`$. Hence, the values for $`s_j`$ for $`j\in[\ell]`$ are not determined. By $`\sum_{i\in[\ell]}s_i=n-k_\ta-m`$ there are $`d=\sum_{i\in[\ell]_0}\binom{\ell}{\ell-i}\binom{n-k_\ta-m-1}{i-1}`$ possibilities to fulfill the constraints, i.e., we have a probability of $`\frac{d}{2^n}`$ to have such a word.
In this section we address the problem of reconstructing words over
arbitrary alphabets from their scattered factors. We begin with a series
of results of algorithmic nature. Let $`\Sigma=\{\ta_1,\dots,\ta_q\}`$
be an alphabet equipped with the ordering $`\ta_i<\ta_j`$ for
$`1\leq i Let $`w_1,\ldots, w_k\in\Sigma^{\ast}`$ for $`k\in\N`$, and
$`K=(k_\ta)_{\ta\in \Sigma}`$ a sequence of $`|\Sigma|`$ natural
numbers. A $`K-`$valid marking of $`w_1,\ldots, w_k`$ is a mapping
$`\psi: [k]\times \N \rightarrow \N`$ such that for all $`j\in [k]`$,
$`i,\ell\in [|w_j|]`$, and $`\ta\in \Sigma`$ there holds if $`w_j[i]=\ta`$ then $`\psi(j,i)\leq k_\ta`$, if $`i<\ell\leq |w_j|`$ and $`w_j[i]=w_j[\ell]=\ta`$ then
$`\psi(j,i)<\psi(j,\ell)`$. A $`K`$-valid marking of $`w_1,\ldots, w_k`$ is represented as the
string $`w_1^\psi`$, $`w_2^\psi, \ldots`$, $`w_k^\psi`$, where
$`w_j^\psi [i]=(w_j[i])_{\psi(j,i)}`$ for fresh letters
$`(w_j[i])_{\psi(j,i)}`$. For instance, let $`k=2`$, $`\Sigma=\{\ta,\tb\}`$, and
$`w_1=\ta\ta\tb`$, $`w_2=\ta\tb\tb`$. Let $`k_\ta=3, k_\tb=2`$ define
the sequence $`K`$. A $`K`$-valid marking of $`w_1, w_2`$ would be
$`w_1^\psi=(\ta)_1(\ta)_3(\tb)_1, w_2^\psi =(\ta)_2 (\tb)_1(\tb)_2`$
defining $`\psi`$ implicitly by the indices. We used parentheses in the
marking of the letters in order to avoid confusions. We recall that a topological sorting of a directed graph $`G=(V,E)`$,
with $`V=\{v_1,\ldots,v_n\}`$, is a linear ordering
$`v_{\sigma(1)} < v_{\sigma(2)} < \ldots < v_{\sigma(n)}`$ of the
nodes, defined by the permutation $`\sigma:[n]\rightarrow [n]`$, such
that there exists no edge in $`E`$ from $`v_{\sigma(i)}`$ to
$`v_{\sigma(j)}`$ for any $`i>j`$ (i.e., if $`v_{a}`$ comes after
$`v_b`$ in the linear ordering, for some $`a=\sigma(i)`$ and
$`b=\sigma(j)`$, then we have $`i>j`$ and there should be no edge
between $`v_a`$ and $`v_b`$). It is a folklore result that any directed
graph $`G`$ has a topological sorting if and only if $`G`$ is acyclic. Let $`w_1,\ldots, w_k\in\Sigma^{\ast}`$ for $`k\in\N`$,
$`K=(k_\ta)_{\ta\in \Sigma}`$ a sequence of $`|\Sigma|`$ natural
numbers, and $`\psi`$ a $`K-`$valid marking of $`w_1,\ldots, w_k`$.
Let $`G_\psi`$ be the graph that has $`\sum_{\ta\in \Sigma} k_\ta`$
nodes, labelled with the letters $`(\ta)_1,\ldots,(\ta)_{k_\ta}`$, for
all $`\ta\in \Sigma`$, and the directed edges
$`((w_j[i])_{\psi(j,i)},(w_j[i+1])_{\psi(j,i+1)})`$, for all
$`j\in[k]`$, $`i\in[|w_j|]`$, and $`((\ta)_i,(\ta)_{i+1})`$, for all
occuring $`i`$ and $`\ta\in\Sigma`$. We say that there exists a valid
topological sorting of the $`\psi`$-marked letters of the words
$`w_1,\ldots,w_k`$ if there exists a topological sorting of the nodes of
$`G_{\psi}`$, i.e., $`G_{\psi}`$ is a directed acyclic graph. The graph associated with the $`K`$-valid marking of $`w_1, w_2`$ from
above would have the five nodes
$`(\ta)_1,(\ta)_2,(\ta)_3,(\tb)_1,(\tb)_2`$ and the six directed edges
$`((\ta)_1,(\ta)_3)`$, $`((\ta)_3,(\tb)_1)`$, $`((\ta)_2,(\tb)_1)`$,
$`((\tb)_1,(\tb)_2)`$, $`((\ta)_1,(\ta)_2), ((\ta)_2,(\ta)_3)`$ (where
the direction of the edge is from the left node to the right node of the
pair defining it). This graph has the topological sorting
$`(\ta)_1(\ta)_2(\ta)_3(\tb)_1(\tb)_2`$. For
$`w_1, \ldots, w_k\in \Sigma^*`$ and a sequence
$`K=(k_\ta)_{\ta\in \Sigma}`$ of $`|\Sigma|`$ natural numbers, there
exists a word $`w`$ such that $`w_i`$ is a scattered factor of $`w`$
with $`|w|_{\ta}=k_\ta`$, for all $`i\in [k]`$ and all $`\ta\in\Sigma`$,
if and only if there exist a $`K`$-valid marking $`\psi`$ of the words
$`w_1,\ldots,w_k`$ and a valid topological sorting of the
$`\psi`$-marked letters of the words $`w_1,\ldots,w_k`$. Proof. If $`w`$ is such that $`w_i`$ is a scattered factor of $`w`$,
for all $`i\in [k]`$, and $`|w|_\ta=k_\ta`$, for all $`\ta\in \Sigma`$,
then we can mark the $`i^{th}`$ occurrence of $`\ta`$ as $`(\ta)_i`$,
for all $`\ta\in \Sigma`$ and $`i\in [k_\ta]`$. This induces a
$`K`$-valid marking $`\psi`$ of the words $`w_i`$, and, moreover, the
linear ordering of the nodes of $`G_\psi`$ induced by the order in which
the marked letters (i.e., nodes of $`G_\psi`$) occur in $`w`$ is a
topological sorting of $`G_{\psi}`$. Let us now assume that there exists a $`K`$-valid marking $`\psi`$ of
the words $`w_1,\ldots,w_k`$, and there exists a valid topological
sorting of the $`\psi`$-marked letters of the words $`w_1,\ldots,w_k`$.
Let $`w'`$ be the word obtained by writing the nodes of $`G_{\psi}`$ in
the order given by its topological sorting and removing their markings.
It is clear that $`w'`$ has $`w_i`$ as a scattered factor, for all
$`i\in [k]`$, and that $`|w'|_\ta\leq k_\ta`$, for all
$`\ta\in \Sigma`$. Let now
$`w=w' \prod_{\ta\in \Sigma}\ta^{k_\ta-|w'|_\ta}`$, where
$`\prod_{\ta\in \Sigma}\ta^{k_\ta-|w'|_\ta}`$ is the concatenation of
the factors $`\ta^{k_\ta-|w'|_\ta}`$, for $`\ta\in \Sigma`$ in some
fixed order. Now $`w`$ has $`w_i`$ as a scattered factor, for all
$`i\in [k]`$, and $`|w|_\ta= k_\ta`$, for all $`\ta\in \Sigma`$. 0◻ ◻ Next we show that in Theorem [recon1] uniqueness propagates in the
$`\Leftarrow`$-direction. Let
$`w_1, \ldots, w_k\in \Sigma^*`$ and $`K=(k_\ta)_{\ta\in \Sigma}`$ a
sequence of $`|\Sigma|`$ natural numbers. If the following hold there exists a unique $`K`$-valid marking $`\psi`$ of the words
$`w_1,\ldots,w_k`$, in the unique $`K`$-valid marking $`\psi`$ we have that for each
$`\ta\in \Sigma`$ and $`\ell \in [k_\ta]`$ there exists $`i\in [k]`$
and $`j\in [|w_i|]`$ with $`\psi(i,j)=\ell`$, and there exists a unique valid topological sorting of the $`\psi`$-marked
letters of the words $`w_1,\ldots,w_k`$ then there exists a unique word $`w`$ such that $`w_i`$ is a scattered
factor of $`w`$, for all $`i\in [k]`$ and $`|w|_\ta=k_\ta`$ for all
$`\ta\in \Sigma`$. Proof. Let $`w`$ be the word obtained by writing in order the letters
of the unique valid topological sorting of the $`\psi`$-marked letters
of the words $`w_1,\ldots,w_k`$ and removing their markings. It is clear
that $`w'`$ has $`w_i`$ as a scattered factor, for all $`i\in [k]`$, and
that $`|w|_\ta= k_\ta`$, for all $`\ta\in \Sigma`$. The word $`w`$ is
uniquely defined (as there is no other $`K`$-valid marking nor valid
topological sorting of the $`\psi`$-marked letters), and
$`|w|_\ta=k_\ta`$, for all $`\ta\in \Sigma`$. 0◻ ◻ In order to state the second result, we need the projection $`\pi_S(w)`$
of a word $`w\in\Sigma^{\ast}`$ on $`S\subseteq\Sigma`$: $`\pi_S(w)`$ is
obtained from $`w`$ by removing all letters from $`\Sigma \setminus S`$. Set
$`W=\{w_{\ta,\tb}\mid \ta<\tb\in \Sigma \}`$ such that $`w_{\ta,\tb}\in \{\ta,\tb\}^*`$ for all $`\ta,\tb\in \Sigma`$, for all $`w,w'\in W`$ and all $`\ta \in \Sigma`$, if
$`|w|_{\ta} \cdot |w'|_{\ta} > 0`$, then $`|w|_{\ta} = |w'|_{\ta}`$. Then there exists at most one $`w\in \Sigma^*`$ such that
$`w_{\ta,\tb}`$ is $`\pi_{\{\ta,\tb\}}(w)`$ for all
$`\ta,\tb\in \Sigma`$. Proof. Notice firstly $`|W|=\frac{q(q-1)}{2}`$. Let
$`k_\ta=|w_{\ta,\tb}|_\ta`$, for $`\ta<\tb\in \Sigma`$. These numbers
are clearly well defined, by the second item in our hypothesis. Let
$`K=(k_\ta)_{\ta\in \Sigma}`$. It is immediate that there exists a
unique $`K`$-valid marking $`\psi`$ of the words
$`(w_{\ta,\tb})_{\ta<\tb\in \Sigma}`$. As each two marked letters
$`(\ta)_i`$ and $`(\tb)_j`$ (i.e., each two nodes $`(\ta)_i`$ and
$`(\tb)_j`$ of $`G_\psi`$) appear in the marked word
$`w^\psi_{\ta,\tb}`$, we know the order in which these two nodes should
occur in a topological sorting of $`G_{\psi}`$. This means that, if
$`G_\psi`$ is acyclic, then it has a unique topological sorting. Our
statement follows now from Corollary
[reconst_topo]. 0◻ ◻ Given the set
$`W=\{w_{\ta,\tb}\mid \ta<\tb\in \Sigma \}`$ as in the statement of
Theorem [recon2], with $`k_\ta=|w_{\ta,\tb}|_\ta`$,
for $`\ta<\tb\in \Sigma`$, and $`K=(k_\ta)_{\ta\in \Sigma}`$, we can
produce the unique $`K`$-valid marking $`\psi`$ of the words
$`(w_{\ta,\tb})_{\ta<\tb\in \Sigma}`$ in linear time
$`O(\sum_{\ta<\tb\in \Sigma} |w_{\ta,\tb}|)=O((q-1)\sum_{\ta \in \Sigma} k_\ta)`$:
just replace the letter $`\ta`$ of $`w_{\ta,\tb}`$ by $`(\ta)_i`$, for
all $`\ta`$ and $`i`$. The graph $`G_{\psi}`$ has
$`O((q-1)\sum k_{\ta})`$ edges and $`O(\sum k_\ta)`$ vertices and can be
constructed in linear time $`O((q-1)\sum k_\ta)`$. Sorting $`G_\psi`$
topologically takes $`O((q-1)\sum k_\ta)`$ time (see, e.g., the handbook
). As such, we conclude that reconstructing a word $`w\in \Sigma^*`$
from its projections over all two-letter-subsets of $`\Sigma`$ can be
done in linear time w.r.t. the total length of the respective
projections. Theorem [recon2] is in a sense optimal: in order to
reconstruct a word over $`\Sigma`$ uniquely, we need all its projections
on two-letter-subsets of $`\Sigma`$. That is, it is always the case that
for a strict subset $`U`$ of $`\{\{\ta,\tb\}\mid \ta<\tb\in \Sigma\}`$,
with $`|U|=\frac{q(q-1)}{2} -1`$, there exist two words $`w'\neq w`$
such that $`\{\pi_p(w')\mid p\in U\}=\{\pi_p(w)\mid p\in U\}`$. We can,
in fact, show the following results: Let $`S_1,\ldots, S_k`$
be subsets of $`\Sigma`$. The following hold: If each pair $`\{\ta,\tb\}\subseteq \Sigma`$ is included in at least
one of the sets $`S_i`$, then we can reconstruct any word uniquely
from its projections $`\pi_{S_1}(\cdot), \ldots, \pi_{S_k}(\cdot)`$. If there exists a pair $`\{\ta,\tb\}`$ that is not contained in any
of the sets $`S_1,\ldots, S_k`$, then there exist two words $`w`$
and $`w'`$ such that $`w\neq w'`$ and
$`\pi_{S_1}(w)=\pi_{S_1}(w'), \ldots,`$
$`\pi_{S_k}(w)=\pi_{S_k}(w')`$. Proof. The first part is, once again, a consequence of Corollary
[reconst_topo]. The second part can be
shown by assuming that $`\Sigma=\{\ta_1,\ldots,\ta_q\}`$ and the pair
$`\{\ta_1,\ta_2\}`$ is not contained in any of the sets
$`S_1,\ldots,S_k`$. Then, for $`w=\ta_1\ta_3 \ta_4\ldots \ta_q`$ and
$`w'=\ta_2\ta_3 \ta_4\ldots \ta_q`$, we have that
$`\pi_{S_1}(w)=\pi_{S_1}(w'), \ldots,`$ $`\pi_{S_k}(w)=\pi_{S_k}(w')`$.
0◻ ◻ In this context, we can ask how efficiently can we decide if a word is
uniquely reconstructible from the projections
$`\pi_{S_1}(\cdot), \ldots`$, $`\pi_{S_k}(\cdot)`$ for
$`S_1,\dots,S_k\subset\Sigma`$. Given the sets
$`S_1,\ldots, S_k\subset \Sigma`$, we decide whether we can reconstruct
any word uniquely from its projections
$`\pi_{S_1}(\cdot), \ldots, \pi_{S_k}(\cdot)`$ in $`O(q^2 k)`$ time.
Moreover, under the Strong Exponential Time Hypothesis (see the
survey and the references therein), there is no $`O(q^{2-d} k^c)`$
algorithm for solving the above decision problem, for any $`d,c>0`$. Proof. We begin with a series of preliminaries. Let us recall the
Orthogonal Vectors problem: Given sets $`A, B`$ consisting of $`n`$
vectors in $`\{0,1\}^k`$, decide whether there are vectors $`a\in A`$
and $`b\in B`$ which are orthogonal (i.e., for any $`i\in [k]`$ we have
$`a[i]b[i] = 0`$). This problem can be solved naïvely in $`O(n^2k)`$
time, but under the Strong Exponential Time Hypothesis there is no
$`O(n^{2-d} k^c)`$ algorithm for solving it, for any $`d,c>0`$ (once
more, see the survey and the references therein). We show that our problem is equivalent to the Orthogonal Vectors
problem. Let us first assume that we are given the sets
$`S_1,\ldots, S_k\subset \Sigma`$, and we want to decide whether we can
reconstruct any word uniquely from its projections
$`\pi_{S_1}(\cdot), \ldots, \pi_{S_k}(\cdot)`$. This is equivalent,
according to Theorem [char_proj], to checking whether each
pair $`\{\ta,\tb\}\subseteq \Sigma`$ is included in at least one of the
sets $`S_i`$. For each letter $`\ta`$ of $`\Sigma`$ we define the
$`k`$-dimensional vectors $`x_\ta`$ where $`x_\ta[i]=1`$ if
$`\ta \in S_i`$ and $`x_\ta[i]=0`$ if $`\ta \not\in S_i`$. This can be
clearly done in $`O(q k)`$ time. Now, there exists a pair
$`\{\ta,\tb\}`$ that is not contained in any of the sets
$`S_1,\ldots, S_k`$ if and only if there exists a pair of vectors
$`\{x_\ta,x_\tb\}`$ such that $`x_\ta[i] x_\tb[i]=0`$ for all
$`i\in [k]`$. We can check whether there exists a pair of vectors
$`\{x_\ta,x_\tb\}`$ such that $`x_\ta[i] x_\tb[i]=0`$ for all
$`i\in [k]`$ by solving the Orthogonal Vectors problem by using for
both input sets of vectors the set $`\{x_\ta\mid \ta \in \Sigma\}`$. As
such, we can check whether there exists a pair $`\{\ta,\tb\}`$ that is
not contained in any of the sets $`S_1,\ldots, S_k`$ in $`O(q^2 k)`$
time. Let us now assume that we are given two sets $`A, B`$ consisting of
$`n`$ vectors in $`\{0,1\}^k`$, and we want to decide whether there are
vectors $`a\in A`$ and $`b\in B`$ which are orthogonal (i.e., for any
$`i\in [k]`$ we have $`a[i]b[i] = 0`$). We can compute the set of
$`(k+2)`$-dimensional vectors $`A'`$ containing the vectors of $`A`$
extended with two new positions (position $`k+1`$ and position $`k+2`$)
set to $`10`$ and the vectors of $`B`$ extended with two new positions
(position $`k+1`$ and position $`k+2`$) set to $`01`$. To decide whether
there are vectors $`a\in A`$ and $`b\in B`$ which are orthogonal is
equivalent to decide whether there are vectors $`a,b\in A'`$ which are
orthogonal (if two such vectors exist, they must be different on their
last two positions, so one must come from $`A`$ and one from $`B`$).
Assume that $`A'=\{x_1,x_2,\ldots,x_{2n}\}`$. Now we define an alphabet
$`\Sigma=\{\ta_1,\ldots,\ta_{2n}\}`$ of size $`2n`$ and the sets
$`S_1,\ldots, S_k`$, where $`\ta_j\in S_i`$ if and only if $`x_j[i]=1`$.
Computing $`A'`$ and then the alphabet $`\Sigma`$ and the sets $`S_i`$,
for $`i\in [k]`$, takes $`O(nk)`$ time. Now, to decide whether there are
vectors $`x_i,x_j\in A'`$ which are orthogonal is equivalent to decide
whether there exists a pair of letters $`\{\ta_i, \ta_j\}`$ of
$`\Sigma`$ that is not contained in any of the sets $`S_1,\ldots, S_k`$.
The conclusion of the theorem now follows. 0◻ ◻ Coming now back to combinatorial results, we use the method developed in
Section 3 to reconstruct a word over an arbitrary
alphabet. We show that we need at most $`\sum_{i\in[q]}|w|_i(q+1-i)`$
different binomial coefficients to reconstruct $`w`$ uniquely for the
alphabet $`\Sigma=\{1,\dots,q\}`$. In fact, following the results from
the first part of this section, we apply this method on all combinations
of two letters. Consider for an example that for
$`w\in\{\ta,\tb,\tn\}^{6}`$ the following binomial coefficients
$`\binom{w}{\ta^0\tb}=1`$, $`\binom{w}{\ta^0\tn}=2`$,
$`\binom{w}{\ta^1\tb}=0`$, $`\binom{w}{\ta^1\tn}=3`$,
$`\binom{w}{\tb^1\tn}=2`$, and $`\binom{w}{\ta^2\tn}=1`$ are given. By
$`|w|=6`$, $`|w|_\tb=1`$, and $`|w|_\tn=2`$, we get $`|w|_\ta=3`$.
Applying the method from
Section 3 for $`\{\ta,\tb\}`$, $`\{\ta,\tn\}`$, and
$`\{\tb,\tn\}`$ we obtain the scattered factors $`\tb\ta^3`$,
$`\ta\tn\ta\tn\ta`$, and $`\tb\tn^2`$. Combining all these three
scattered factors gives us uniquely $`\tb\ta\tn\ta\tn\ta`$. Notice that
in this example we only needed six binomial coefficients instead of ten,
which is the worst case. As seen in the example we have not only the word length but also
$`\binom{w}{\tx}`$ for all $`\tx\in\Sigma`$ but one. Both information
give us the remaining single letter binomial coefficient and hence we
will assume that we know all of them. For convenience in the following theorem consider
$`\Sigma=\{1,\dots,q\}`$ for $`q>2`$ and set
$`\alpha:=\lfloor\frac{16}{7}\sqrt{n}\rfloor +5`$. In the general case
the results by and yield that is smaller than the best known upper bound on the number of binomial
coefficients sufficient to reconstruct a word uniquely. The following theorem generalises
Theorem [binarybound] on an arbitrary alphabet. For
uniquely reconstructing a word $`w\in\Sigma^{\ast}`$ of length at least
$`q-1`$, $`\sum_{i\in[q]}|w|_i(q+1-i)`$ binomial coefficients suffice,
which is strictly smaller than
([generalnumber]). Proof. The claim that $`\sum_{i\in[q]}|w|_i(q+1-i)`$ binomial
coefficients suffice to reconstruct $`w`$ uniquely follows by
Theorem [recon2]: for each pair of letters we apply
the method of the binary case. We are thus going to reconstruct words
$`w_{\ta,\tb}`$ for all pairs of letters $`\ta < \tb`$. If $`\ta`$ is
the $`i^\text{th}`$ letter in the alphabet, there are $`q-i`$ such
pairs. To determine $`w_{\ta,\tb}`$ uniquely,
$`\min(k_{\ta}, k_{\tb}) + 1 \leq k_{\ta} + 1`$ binomial coefficients
from the set
$`\{k_{\tb}, k_{\ta \tb} ,\ldots, k_{\ta^{|w|_\ta} \tb} \}`$ suffice.
In total, we thus need the binomial coefficients of the set There are $`\sum_{i \in [q]} |w|_i (q-i) + (q-1)`$ such coefficients.
This quantity is less than or equal to $`\sum_{i \in [q]} |w|_i(q+1-i)`$
for every $`w`$ of length at least $`q-1`$. We show the second claim about the bound by induction on $`q`$ where the
binary case in Theorem [binarybound] serves as induction
basis. This implies and therefore On the other hand, we have to compare this quantity with
[generalnumber] which can be
rewritten as Thus the claim is proven, if the substraction of the latter one and the
previous one is greater than zero, i.e., we show that With
$`f(i)=\frac{(q-1)(q+1)^{\frac{i}{2}}-qq^{\frac{i}{2}}+1}{iq(q-1)}`$ for
all $`i\in[\alpha]`$, the proof of
([eqfinal]) contains the following steps For all $`i\geq 2`$ we have
$`f(i)\geq 0`$, $`f(5)+f(1)\geq 0`$, $`f(\alpha)-n > 0`$. Step [item1].: For $`i=2`$ we have For $`i=3`$ we have Consider the function $`g:\R\rightarrow\R;q\mapsto q^4-2q^3-2q^2+q+1`$.
This function has two minima (between $`-0.75`$ and $`-0.5`$ as well as
between $`1.75`$ and $`2`$) and one maximum (between $`0.125`$ and
$`0.25`$). Since $`g`$ has only two inflexion points and $`g`$ is
strictly greater than zero at the first minima, $`g`$ has only two
roots. The first root is between $`0.7`$ and $`0.8`$ and the second root
is between $`2.5`$ and $`2.75`$. Thus for all $`q\geq 2.75`$ we have
$`g(q)>0`$. This implies $`q^5+q^4-2q^3-2q^2+q+1>q^5`$. Hence
equivalently we get $`(q+1)(q^4-2q^2+1)>q^5`$, i.e.,
$`(q+1)(q^2-1)^2>qq^4`$. This implies $`\sqrt{q+1}(q^2-1)>\sqrt{q}q^2`$
which proves that the numerator of $`f(3)`$ is positive and hence
$`f(3)>0`$. Before we prove the claim for $`i\geq 4`$, we will prove
that $`(q-1)(q+1)^j\geq q^{j+1}`$ for $`j\geq 2`$. Firstly we get Due to the central symmetry of each row of the Pascal triangle and since
the distribution of the binomial coefficient is unimodal, for
$`k\le \lfloor j/2\rfloor`$, we have and thus Since $`k\le \lfloor j/2\rfloor`$, we have $`j-k+1>k`$ and each term of
the above sum is thus positive. This shows that
$`(q-1)(q+1)^j\ge q^{j+1}`$. This leads to the following estimations for
$`f(i)`$. For $`i=2j`$ and $`j\geq 2`$ we get Finally for $`i=2j+1`$ and $`j\geq 2`$ we get Step [item2].: Notice that $`\alpha\geq 7`$ holds
and thus $`f(5)`$ is always a summand. For $`f(5)+f(1)`$ we have to
prove Thus we get for the numerator We have $`q^3 \sqrt{q+1} > q^3 \sqrt{q}`$ and, since $`q \geq 3`$, Therefore
$`q^2 \sqrt{q+1} + 4q \sqrt{q+1} \geq 6 \sqrt{q+1} + 5q \sqrt{q}`$ and
the numerator is positive. Step [item3].: Notice that for fixed $`i`$,
$`f(i)`$ is monotonically increasing for increasing $`q`$. This implies We are going to prove that Recall that $`\alpha`$ is a function of $`n`$, given by
$`\alpha = \lfloor \frac{16}{7} \sqrt{n} \rfloor + 5`$. First, we have Indeed, this inequality is equivalent to We can check that this last inequality is true by taking the logarithm
of both sides, since $`\alpha > 5`$. Therefore, it is sufficient for
[lastEquation] to show that Note that $`2^{\frac{\alpha}{2}} > n`$ (indeed,
$`\lfloor \frac{16}{7} \sqrt{n} \rfloor+5 > 2 \sqrt{n}+5`$, thus
$`2^{\frac{\alpha}{2}} > 2^{\frac{5}{2}} \cdot 2^{ \sqrt{n}}`$). Once
again, taking the logarithms, one can check that
$`2^{\frac{5}{2}} \cdot 2^{ \sqrt{n}} > n`$ holds. To verify [lastEquation], it remains to show
that $`2^{\frac{\alpha}{2} - 1} -1 \geq 6 \alpha`$ or that
$`2^{\frac{\alpha}{2} - 1} > 6 \alpha`$. Taking the logarithms, it is
equivalent to which is true for $`\alpha \geq 15`$, that is for $`n \geq 16`$.
Equation [lastEquation] can be verified by a
computer for $`q-1 \leq n < 16`$. By [item1].,
[item2]., and
[item3]. Equation
([eqfinal]) is proven and this proves the
claim.0◻ ◻ Since the estimation in
Theorem [reconstructgeneral] depends on
the distribution of the letters in contrast to the method of
reconstruction, it is wise to choose an order $`<`$ on $`\Sigma`$ such
that $`x Let’s note that the number of binomial coefficients we need is at most
$`qn`$. Indeed, we will prove that
$`\sum_{i\in[q]}|w|_i(q+1-i)\leq qn`$. We have $`qn
=qn+n-n
=q\sum_{i\in[q]}|w|_i+\sum_{i\in[q]}|w|_i-\sum_{i\in[q]}|w|_i
\geq q\sum_{i\in[q]}|w|_i+\sum_{i\in[q]}|w|_i-\sum_{i\in[q]}(|w|_ii)
=\sum_{i\in[q]}|w|_i(q+1-i)`$. A reconstruction problem of words from scattered factors asks for the
minimal information, like multisets of scattered factors of a given
length or the number of occurrences of scattered factors from a given
set, necessary to uniquely determine a word. We show that a word
$`w\in\{\ta,\tb\}^*`$ can be reconstructed from the number of
occurrences of at most $`\min(|w|_\ta,|w|_\tb)+1`$ scattered factors of
the form $`\ta^i \tb`$, where $`|w|_{\ta}`$ is the number of occurrences
of the letter $`\ta`$ in $`w`$. Moreover, we generalize the result to
alphabets of the form $`\{1, \ldots, q\}`$ by showing that at most
$`\sum_{i=1}^{q-1} |w|_i \, (q-i+1)`$ scattered factors suffices to
reconstruct $`w`$. Both results improve on the upper bounds known so
far. Complexity time bounds on reconstruction algorithms are also
considered here. The general scheme for a so-called reconstruction problem is the
following one: given a sufficient amount of information about
substructures of a hidden discrete structure, can one uniquely determine
this structure? In particular, what are the fragments about the
structure needed to recover it all. For instance, a square matrix of
size at least $`5`$ can be reconstructed from its principal minors given
in any order . In graph theory, given some subgraphs of a graph (these subgraphs may
share some common vertices and edges), can one uniquely rebuild the
original graph? Given a finite undirected graph $`G=(V,E)`$ with $`n`$
vertices, consider the multiset made of the $`n`$ induced subgraphs of
$`G`$ obtained by deleting exactly one vertex from $`G`$. In particular,
one knows how many isomorphic subgraphs of a given class appear. Two
graphs leading to the same multiset (generally called a deck) are said
to be hypomorphic. A conjecture due to Kelly and Ulam states that two
hypomorphic graphs with at least three vertices are isomorphic . A
similar conjecture in terms of edge-deleted subgraphs has been proposed
by Harary . These conjectures are known to hold true for several
families of graphs. A finite word, i.e., a finite sequence of letters of some given
alphabet, can be seen as an edge- or vertex-labeled linear tree. So
variants of the graph reconstruction problem can be considered and are
of independent interest. Participants of the Oberwolfach meeting on
Combinatorics on Words in 2010 gave a list of 18 important open problems
in the field. Amongst them, the twelfth problem is stated as
reconstruction from subwords of given length. In the following
statement and all along the paper, a subword of a word is understood
as a subsequence of not necessarily contiguous letters from this word,
i.e., subwords can be obtained by deleting letters from the given word.
To highlight this latter property, they are often called scattered
subwords or scattered factors, which is the notion we are going to
use. Let $`k,n`$ be natural numbers. Words of length $`n`$ over a given
alphabet are said to be $`k`$-reconstructible whenever the multiset of
scattered factors of length $`k`$ (or $`k`$-deck) uniquely determines
any word of length $`n`$. Notice that the definition requires multisets to store the information
how often a scattered factor occurs in the words. For instance, the
scattered factor $`\tb\ta`$ occurs three times in $`\tb\ta\tb\ta`$ which
provides more information for the reconstruction than the mere fact that
$`\tb\ta`$ is a scattered factor. The challenge is to determine the function $`f(n)=k`$ where $`k`$ is the
least integer for which words of length $`n`$ are $`k`$-reconstructible.
This problem has been studied by several authors and one of the first
trace goes back to 1973 . Results in that direction have been obtained
by M.-P. Schützenberger (with the so-called Schützenberger’s Guessing
game) and L. Simon . They show that words of length $`n`$ sharing the
same multiset of scattered factors of length up to
$`\lfloor n/2 \rfloor +1`$ are the same. Consequently, words of
length $`n`$ are $`(\lfloor n/2 \rfloor +1)`$-reconstructible. In , this
upper bound has been improved: Krasikov and Roditty have shown that
words of length $`n`$ are $`k`$-reconstructible for
$`k\ge\lfloor 16\sqrt{n} /7\rfloor+5`$. On the other hand Dudik and
Schulmann provide a lower bound: if words of length $`n`$ are
$`k`$-reconstructible, then $`k\ge 3^{(\sqrt{2/3}-o(1))\log_3^{1/2}n}`$.
Bounds were also considered in . Algorithmic complexity of the
reconstruction problem is discussed, for instance, in . Note that the
different types of reconstruction problems have application in
philogenetic networks, see, e.g., , or in the context of molecular
genetics and coding theory . Another motivation, close to combinatorics on words, stems from the
study of $`k`$-binomial equivalence of finite words and $`k`$-binomial
complexity of infinite words (see for more details). Given two words of
the same length, they are $`k`$-binomially equivalent if they have the
same multiset of scattered factors of length $`k`$, also known as
$`k`$-spectrum (, , ). Given two words $`x`$ and $`y`$ of the same
length, one can address the following problem: decide whether or not
$`x`$ and $`y`$ are $`k`$-binomially equivalent? A polynomial time
decision algorithm based on automata and a probabilistic algorithm have
been addressed in . A variation of our work would be to find, given
$`k`$ and $`n`$, a minimal set of scattered factors for which the
knowledge of the number of occurrences in $`x`$ and $`y`$ permits to
decide $`k`$-binomial equivalence. Over an alphabet of size $`q`$, there are $`q^k`$ pairwise distinct
length-$`k`$ factors. If we relax the requirement of only considering
scattered factors of the same length, another interesting question is to
look for a minimal (in terms of cardinality) multiset of scattered
factors to reconstruct entirely a word. Let the binomial coefficient
$`\binom{u}{x}`$ be the number of occurrences of $`x`$ as a scattered
factor of $`u`$. The general problem addressed in this paper is
therefore the following one. Let $`\Sigma`$
be a given alphabet and $`n`$ a natural number. We want to reconstruct a
hidden word $`w \in \Sigma^n`$. To that aim, we are allowed to pick a
word $`u_i`$ and ask questions of the type “What is the value of
$`\binom{w}{u_i}`$?". Based on the answers to questions related to
$`\binom{w}{u_1}, \ldots, \binom{w}{u_i}`$, we can decide which will be
the next question (i.e. decide which word will be $`u_{i+1}`$). We want
to have the shortest sequence $`(u_1, \ldots, u_k)`$ uniquely
determining $`w`$ by knowing the values of
$`\binom{w}{u_1}, \ldots, \binom{w}{u_k}`$. We naturally look for a value of $`k`$ less than the upper bound for
$`k`$-reconstructibility. In this paper, we firstly recall the use of Lyndon words in the context
of reconstructibility. A word $`w`$ over a totally ordered alphabet is
called Lyndon word if it is the lexicographically smallest amongst all
its rotations, i.e., $`w=xy`$ is smaller than $`yx`$ for all non trivial
factorisations $`w=xy`$. Every binomial coefficient $`\binom{w}{x}`$ for
arbitrary words $`w`$ and $`x`$ over the same alphabet can be deduced
from the values of the coefficients $`\binom{w}{u}`$ for Lyndon words
$`u`$ that are lexicographically less than or equal to $`x`$. This
result is presented in
Section 2
along with the basic definitions. We consider an alphabet equipped with
a total order on the letters. Words of the form $`\ta^n \tb`$ with
letters $`\ta<\tb`$ and a natural number $`n`$ are a special form of
Lyndon words, the so-called right-bounded-block words. We consider the reconstruction problem from the information given by the
occurrences of right-bounded-block words as scattered factors of a word
of length $`n`$. In Section 3 we show how to reconstruct a word uniquely
from $`m+1`$ binomial coefficients of right-bounded-block words where
$`m`$ is the minimum number of occurrences of $`\ta`$ and $`\tb`$ in the
word. We also prove that this is less than the upper bound given in . In
Section 4 we reduce the problem for arbitrary
finite alphabets $`\{1,\dots,q\}`$ to the binary case. Here we show that
at most $`\sum_{i=1}^{q-1} |w|_i \, (q-i+1) \le q|w|`$ binomial
coefficients suffice to uniquely reconstruct $`w`$ with $`|w|_i`$ being
the number of occurrences of letter $`i`$ in $`w`$. Again, we compare
this bound to the best known one for the classical reconstruction
problem (from words of a given length). In the last section of the paper
we also propose several results of algorithmic nature regarding the
efficient reconstruction of words from given scattered factors. In this paper we have proven that a relaxation of the so far
investigated reconstruction problem from scattered factors from
$`k`$-spectra to arbitrary sets yields that less scattered factors than
the best known upper bound are sufficient to reconstruct a word
uniquely. Not only in the binary but also in the general case the
distribution of the letters plays an important role: in the binary case
the amount of necessary binomial coefficients is smaller the larger
$`|w|_\ta-|w|_\tb`$ is. The same observation results from the general
case - if all letters are equally distributed in $`w`$ then we need more
binomial coefficients than in the case where some letters rarely occur
and others occur much more often. Nevertheless the restriction to
right-bounded-block words (that are intrinsically Lyndon words) shows
that a word can be reconstructed by fewer binomial coefficients if
scattered factors from different spectra are taken. Further
investigations may lead into two directions: firstly a better
characterisation of the uniqueness of the $`k_{\ta^{\ell}\tb}`$ would be
helpful to understand better in which cases less than the worst case
amount of binomial coefficients suffices and secondly other sets than
the right-bounded-block words could be investigated for the
reconstruction problem.
\begin{align}
\label{generalnumber}
\sum_{i\in[\alpha]}\frac{1}{i}\frac{(q+1)^{\frac{i}{2}}-1}{q}
\end{align}
\{ k_{\ta^j \tb} : \ta < \tb, j \in [|w|_{\ta}] \} \cup \{k_\tb : \tb \in \Sigma \backslash \{1\} \}.
\begin{align*}
\sum_{i\in[q]}|w|_i(q+1-i)
&=\left(\sum_{i\in[q-1]}|w|_i(q+1-i)\right)+|w|_q(q+1-q)\\
&=\left(\sum_{i\in[q-1]}|w|_i(q-i)\right)+\sum_{i\in[q-1]}|w|_i+|w|_q
\end{align*}
\begin{align*}
\sum_{i\in[q]}|w|_i(q+1-i) &\leq \left(\sum_{i\in[\alpha]}\frac{1}{i}\frac{q^{\frac{i}{2}}-1}{q-1}\right)+n\\
&=\left(\sum_{i\in[\alpha]}\frac{1}{i}\frac{q(q^{\frac{i}{2}}-1)}{q(q-1)}\right)+n.
\end{align*}
\sum_{i\in[\alpha]}\frac{1}{i}\frac{(q+1)^{\frac{i}{2}}-1}{q}
=\sum_{i\in[\alpha]}\frac{1}{i}\frac{(q-1)((q+1)^{\frac{i}{2}}-1)}{q(q-1)}.
\begin{align}
\left(\sum_{i\in[\alpha]}\frac{1}{i} \frac{(q-1)((q+1)^{\frac{i}{2}}-1)-q(q^{\frac{i}{2}}-1)}{q(q-1)}\right)-n &>0,\mbox{ i.e.} \\
\left(\sum_{i\in[\alpha]}\frac{1}{i}
\frac{(q-1)(q+1)^{\frac{i}{2}}-qq^{\frac{i}{2}}+1}{q(q-1)}\right)-n &> 0.\label{eqfinal}
\end{align}
\begin{align*}
f(2)
=\frac{1}{2}\frac{(q-1)(q+1)-q^2+1}{q(q-1)}
=\frac{q^2-1-q^2+1}{2q(q-1)}
=0.
\end{align*}
\begin{align*}
f(3)
&=\frac{1}{3}\frac{(q-1)(q+1)\sqrt{q+1}-q^2\sqrt{q}+1}{q(q-1)}.
\end{align*}
(q-1)(q+1)^j
=\left(\sum_{k\in [j]}\left (\binom{j}{k-1}-\binom{j}{k}\right) q^k\right) +q^{j+1}-1.
\binom{j}{j-k}-\binom{j}{j-k+1}=-\left(\binom{j}{k-1}-\binom{j}{k}\right)>0
(q-1)(q+1)^j=\left(\sum_{k\in [\lfloor j/2\rfloor]} \left (\binom{j}{k}-\binom{j}{k-1}\right) (q^{j-k+1}-q^k)\right) +q^{j+1}-1.
\begin{align*}
f(i)
&= \frac{(q-1)(q+1)^j-qq^j+1}{iq(q-1)}\geq \frac{q^{j+1}-q^{j+1}+1}{iq(q-1)}>0.
\end{align*}
\begin{align*}
f(i)
&=\frac{(q-1)(q+1)^j\sqrt{q+1}-qq^j\sqrt{q}+1}{iq(q-1)}
&\geq \frac{q^{j+1}(\sqrt{q+1}-\sqrt{q})+1}{iq(q-1)}>0.
\end{align*}
\begin{align*}
\frac{(q-1)(q+1)^2\sqrt{q+1}-qq^2\sqrt{q}+1}{5q(q-1)} + \frac{(q-1)\sqrt{q+1}-q\sqrt{q}+1}{q(q-1)}\geq 0
\end{align*}
\begin{align*}
&(q-1)(q+1)^2\sqrt{q+1}-qq^2\sqrt{q}+1+5(q-1)\sqrt{q+1}-5q\sqrt{q}+5\\
&= (q-1)\sqrt{q+1}((q+1)^2+5)-q\sqrt{q}(q^2+5)+6\\
&=(q-1)\sqrt{q+1}(q^2+2q+6)-q\sqrt{q}(q^2+5)+6\\
&=q^3\sqrt{q+1}+q^2\sqrt{q+1}+4q\sqrt{q+1}-6\sqrt{q+1}-q^3\sqrt{q}-5q\sqrt{q}+6.
\end{align*}
q^2 \sqrt{q+1} \geq (6+q) \sqrt{q+1}.
f(\alpha)\geq \frac{2\cdot 4^{\frac{\alpha}{2}}-3\cdot 3^{\frac{\alpha}{2}}+1}{6\alpha}
=\frac{2^{\alpha+1}-3^{\frac{\alpha}{2}+1}+1}{6\alpha}.
\begin{align}
\label{lastEquation}
2^{\alpha+1}-3^{\frac{\alpha}{2}+1}+1 > 6\alpha n.
\end{align}
2^{\alpha+1}-3^{\frac{\alpha}{2}+1} > 2^{\alpha-1} - 2^{\frac{\alpha}{2}}.
\begin{align*}
2^{\frac{\alpha}{2}} \left(3 \cdot 2^{\frac{\alpha}{2}-1} + 1 \right) > 3^{\frac{\alpha}{2}+1} \quad \Leftrightarrow \quad
2^{\frac{\alpha}{2}-1} + \frac{1}{3} > \left(\frac{3}{2}\right)^{\frac{\alpha}{2}}.
\end{align*}
2^{\alpha-1} - 2^{\frac{\alpha}{2}} = 2^{\frac{\alpha}{2}} (2^{\frac{\alpha}{2}-1} -1) > 6 \alpha n.
\begin{align*}
&\frac{\alpha}{2} - 1 > \log(6) + \log(\alpha) \\
\Leftrightarrow &\alpha - 2 \log(\alpha) > 2 \log(6) + 2,
\end{align*}