Note on the Infiniteness and Equivalence Problems for Word-MIX Languages

Decidability and Examples

The problem of deciding whether a given trace $`T \subseteq {\mathcal P}({\mathcal G}^{N}_A) \cup {\mathcal C}({\mathcal G}^{N}_A)`$ in $`{\mathcal G}^{N}_A`$ satisfies both the balance and pumping conditions can be reduced to the validity of a $`\mathrm{\Sigma}_1`$-formula (existential formula) of Presburger arithmetic (see the examples below). In addition, the set of traces in $`{\mathcal G}^{N}_A`$ is clearly finite and effectively enumerable (by Lemma [lem:trace]). Thus we obtain the following corollary.

For all words $`w_1, \ldots, w_k \in A^*`$, it is decidable whether $`L(w_1, \ldots, w_k)`$ is infinite.

Proof. Enumerate all traces in $`{\mathcal G}^{N}_A`$ and check whether there is a trace that satisfies both the balance and pumping conditions. ◻

Consider the language $`L(ab, ba, a)`$ over $`A = \{a, b\}`$. Here $`\max(|ab|, |ba|, |a|) = 2`$, so we work in the 2-dimensional de Bruijn graph $`{\mathcal G}^{2}_A`$ shown in Fig. 1. We claim that the trace $`T_1 = \{\pi_1 = (ba, ab)\} \cup \{ \gamma_1 = (ba, ab, ba)\}`$ satisfies both the balance and pumping conditions. One can easily observe that

\begin{align*}
  |ba|_{(ab,ba,a)} = (0,1,1) \quad
  |\pi_1|_{(ab,ba,a)} = (1,0,0) \quad
  |\gamma_1|_{(ab,ba,a)} = (1,1,1)
\end{align*}

and hence the coefficient $`x_1 = 1`$ simultaneously satisfies the two conditions stated in (2) of Theorem [thm]. For each $`n \geq 1`$, by Proposition [prop:dg] the word $`ba (ba)^n b = \mathtt{word}_{{\mathcal G}^{N}_A}(\gamma_1^n \odot\pi_1)`$ is in $`L(ab, ba, a)`$. Hence $`ba (ba)^+ b \subseteq L(ab, ba, a)`$ and $`\#\!\left(L(ab, ba, a)\right) = \infty`$.
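The membership claim for the pumped words can be checked mechanically for small $`n`$. The following is a minimal sketch; the helper names `occ`, `in_wmix` and `pumped` are introduced here for illustration only and do not come from the text.

```python
def occ(w, v):
    """|w|_v: number of (possibly overlapping) occurrences of v in w."""
    return sum(1 for i in range(len(w) - len(v) + 1) if w[i:i + len(v)] == v)


def in_wmix(w, params):
    """True iff every parameter word occurs equally often in w."""
    return len({occ(w, p) for p in params}) <= 1


def pumped(n):
    """The word ba (ba)^n b read off the walk gamma_1^n followed by pi_1."""
    return "ba" + "ba" * n + "b"


PARAMS = ("ab", "ba", "a")
# Occurrence vectors (|w|_ab, |w|_ba, |w|_a) for n = 1, 2, 3.
counts = [tuple(occ(pumped(n), p) for p in PARAMS) for n in range(1, 4)]
# Membership in L(ab, ba, a) for n = 1, ..., 99 (a finite check, not a proof).
all_members = all(in_wmix(pumped(n), PARAMS) for n in range(1, 100))
```

For $`n = 1`$ the word is $`babab`$, in which each of $`ab`$, $`ba`$ and $`a`$ occurs exactly twice.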

Next consider another language $`L(ab, ba, a, b)`$ over $`A = \{a, b\}`$. Again $`\max(|ab|, |ba|, |a|, |b|) = 2`$, so we use the 2-dimensional de Bruijn graph $`{\mathcal G}^{2}_A`$ shown in Fig. 1. In contrast with Example [ex1], the trace $`T_1 = \{\pi_1 = (ba, ab)\} \cup \{\gamma_1 = (ba, ab, ba)\}`$ no longer satisfies the balance condition (even though it still satisfies the pumping condition). We have

\begin{align*}
  |ba|_{(ab,ba,a,b)} =& (0,1,1,1) \qquad
  |\pi_1|_{(ab,ba,a,b)} = (1,0,0,1) \\
  |\gamma_1|_{(ab,ba,a,b)} =& (1,1,1,1)
\end{align*}

We can formally prove that there is no positive coefficient $`x_1 \in \mathbb{N}\, (x_1 > 0)`$ that satisfies the balance condition, since the existence of such a coefficient is expressed by the following $`\mathrm{\Sigma}_1`$-formula of Presburger arithmetic

\begin{align*}
   \phi_{T_1} \xlongequal{\!\!\!\mathtt{def}\!\!\!}
   \exists c \Bigl( \exists x_1 \bigl(& x_1 > 0 \, \land \theta_{T_1}^{ab} = c \land \theta_{T_1}^{ba} = c
   \land \theta_{T_1}^{a} = c \land
   \theta_{T_1}^{b} = c \bigr) \Bigr)\\
   \equiv
 \exists c \Bigl( \exists x_1 \bigl(& x_1 > 0 \, \land \\
  & (0+1+x_1) = c \land (1+0+x_1) = c \, \land\\
  & (1+0+x_1) = c \land (1+1+x_1) = c \quad \bigr) \Bigr)
\end{align*}

where $`\theta_{T_1}^{w}`$ is a subexpression defined by

\theta_{T_1}^{w} \xlongequal{\!\!\!\mathtt{def}\!\!\!}|ba|_{(w)} + |\pi_1|_{(w)} +
 \underbrace{x_1 + \cdots + x_1}_{|\gamma_1|_{(w)}
 \text{ times}}.

$`\phi_{T_1}`$ can be algorithmically verified to be false, since the validity of a first-order formula of Presburger arithmetic is decidable (cf. Section 6.2 of ). Indeed, the constraints for $`a`$ and $`b`$ would force $`1 + x_1 = c = 2 + x_1`$. By the same reduction to $`\mathrm{\Sigma}_1`$-formulae of Presburger arithmetic, we can algorithmically verify that no trace in $`{\mathcal G}^{2}_A`$ satisfies both the balance and pumping conditions. Thus $`\#\!\left(L(ab,ba,a,b)\right) < \infty`$ by Theorem [thm].
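To make the shape of $`\phi_{T_1}`$ concrete, the sketch below evaluates the subexpressions $`\theta_{T_1}^{w}`$ for the trace $`T_1`$ and searches for a witness $`(x_1, c)`$ up to a bound. This is only a bounded search, not a decision procedure for Presburger arithmetic; the function names and the bound are assumptions made for illustration.

```python
def occ(w, v):
    """|w|_v: number of (possibly overlapping) occurrences of v in w."""
    return sum(1 for i in range(len(w) - len(v) + 1) if w[i:i + len(v)] == v)


def walk_count(walk, w):
    """Component of |omega| for w: vertices after the first that have w as a suffix."""
    return sum(1 for vertex in walk[1:] if vertex.endswith(w))


def theta(w, path, cycle, x1):
    """theta_{T_1}^w = |from(pi_1)|_w + |pi_1|_w + x1 * |gamma_1|_w."""
    return occ(path[0], w) + walk_count(path, w) + x1 * walk_count(cycle, w)


def find_witness(params, path, cycle, bound):
    """Smallest x1 in 1..bound making all theta values equal, with the common value c."""
    for x1 in range(1, bound + 1):
        values = {theta(w, path, cycle, x1) for w in params}
        if len(values) == 1:
            return x1, values.pop()
    return None  # no witness up to the bound; by itself this proves nothing


PI_1 = ("ba", "ab")
GAMMA_1 = ("ba", "ab", "ba")
witness_3 = find_witness(("ab", "ba", "a"), PI_1, GAMMA_1, 1000)
witness_4 = find_witness(("ab", "ba", "a", "b"), PI_1, GAMMA_1, 1000)
```

For $`L(ab, ba, a)`$ the search returns $`x_1 = 1`$ with $`c = 2`$, while for $`L(ab, ba, a, b)`$ it finds nothing, in line with the constraints $`1 + x_1 = c`$ and $`2 + x_1 = c`$ being incompatible. An actual implementation would hand the formula to a solver for linear integer arithmetic instead of enumerating values.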

Preliminaries

For a set $`X`$, we denote by $`\#\!\left(X\right)`$ the cardinality of $`X`$. We write $`\#\!\left(X\right) = \infty`$ if $`X`$ is an infinite set, and write $`\#\!\left(X\right) < \infty`$ otherwise. We denote by $`\mathbb{N}`$ the set of natural numbers including $`0`$. We call a mapping $`M: X \rightarrow \mathbb{N}`$ a multiset over $`X`$.

Words and Orders

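For an alphabet $`A`$, we denote the set of all (resp. non-empty) words over $`A`$ by $`A^*`$ (resp. $`A^+`$). We write $`A^n`$ (resp. $`A^{<n}`$) for the set of all words of length $`n`$ (resp. less than $`n`$). For words $`v, w \in A^*`$, we denote by $`|w|`$ the length of $`w`$ and by $`|w|_v`$ the number of occurrences of $`v`$ in $`w`$: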

|w|_v \xlongequal{\!\!\!\mathtt{def}\!\!\!}\#\!\left(\{ (w_1, w_2) \in A^* \times
 A^* \mid w_1 v w_2 = w \}\right).

For words $`w_1, \ldots, w_k \in A^*`$, we define

\begin{align*}
L(w_1, \ldots, w_k) \xlongequal{\!\!\!\mathtt{def}\!\!\!}\{ w \in A^* \mid |w|_{w_1} = \cdots =
 |w|_{w_k} \}
\end{align*}

and call it the Word-MIX language of $`k`$ parameter words $`w_1, \ldots, w_k`$ ((k-)WMIX for short). For a word $`w \in A^*`$, we denote the set of prefixes and suffixes of $`w`$ by

\begin{align*}
\mathrm{pref}(w) & \xlongequal{\!\!\!\mathtt{def}\!\!\!}\{ u \in A^* \mid
 uv = w \text{ for some } v \in A^*\} \\
\mathrm{suff}(w) & \xlongequal{\!\!\!\mathtt{def}\!\!\!}\{ v \in A^*
\mid uv = w \text{ for some } u \in A^* \}
\end{align*}

and denote the length-$`n`$ ($`n \leq |w|`$) prefix and suffix of $`w`$ by $`\mathrm{pref}_n(w)`$ and $`\mathrm{suff}_n(w)`$, respectively.
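The definitions above translate almost literally into code. The following sketch (all function names are illustrative assumptions) counts occurrences by enumerating factorisations $`w_1 v w_2 = w`$, which makes two consequences of the definition visible: overlapping occurrences are counted separately, and $`|w|_{\varepsilon} = |w| + 1`$ for the empty word $`\varepsilon`$.

```python
from itertools import product


def occ(w, v):
    """|w|_v, counting pairs (w1, w2) with w1 + v + w2 == w literally."""
    return sum(
        1
        for i in range(len(w) + 1)      # w1 must be the prefix w[:i]
        for j in range(len(w) + 1)      # w2 must be the suffix w[j:]
        if w[:i] + v + w[j:] == w
    )


def prefixes(w):
    """pref(w) as a set of words."""
    return {w[:i] for i in range(len(w) + 1)}


def suffixes(w):
    """suff(w) as a set of words."""
    return {w[i:] for i in range(len(w) + 1)}


def wmix_up_to(params, alphabet, max_len):
    """All words of L(params) of length at most max_len, shortest first."""
    words = ("".join(t) for n in range(max_len + 1) for t in product(alphabet, repeat=n))
    return [w for w in words if len({occ(w, p) for p in params}) <= 1]


short_words = wmix_up_to(("a", "b"), "ab", 2)
```

Here `short_words` lists the words of $`L(a, b)`$ of length at most 2, namely the empty word, $`ab`$ and $`ba`$. The enumeration is exponential in `max_len` and is meant only for experimenting with small cases.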

A quasi order $`\leq`$ on a set $`X`$ is called a well-quasi-order (wqo for short) if any infinite sequence $`(x_i)_{i \in \mathbb{N}} \, (x_i \in X)`$ contains an increasing pair $`x_i \leq x_j`$ with $`i < j`$. Let $`\leq_1`$ be a quasi order on a set $`X_1`$ and $`\leq_2`$ be a quasi order on a set $`X_2`$. The product order $`\leq_{1,2}`$ is a quasi order on $`X_1 \times X_2`$ defined by

(x_1, y_1) \leq_{1,2} (x_2, y_2) \stackrel{\mathtt{def}}{\Longleftrightarrow}x_1 \leq_1
 x_2 \text{ and } y_1 \leq_2 y_2.

Let $`\leq_1`$ be a wqo on a set $`X_1`$ and $`\leq_2`$ be a wqo on a set $`X_2`$. The product order $`\leq_{1,2}`$ is again a wqo on $`X_1 \times X_2`$.

We list some examples of wqos below:

  1. The identity relation $`=`$ on any finite set $`X`$ is a wqo (the pigeonhole principle).

  2. The usual order $`\leq`$ on $`\mathbb{N}`$ is a wqo.

  3. The product order $`\leq_m`$ on $`\mathbb{N}^m`$ is a wqo for any $`m \geq 1`$ (Dickson’s lemma), which is a direct corollary of Lemma [wqo].

  4. The point-wise order $`\leq_{\mathtt{pt}}`$ on the multisets $`\mathbb{N}^X`$ ($`M \leq_{\mathtt{pt}}M' \stackrel{\mathtt{def}}{\Longleftrightarrow}M(x) \leq M'(x)`$ for all $`x \in X`$) over a finite set $`X`$ is a wqo (just a paraphrase of Dickson’s lemma).
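As a small illustration of the product order on $`\mathbb{N}^m`$, the sketch below searches a finite sequence of vectors for an increasing pair. Finite sequences may of course be antichains; Dickson's lemma only says that this cannot continue forever. The function names are assumptions made for this example.

```python
def leq_product(x, y):
    """Product order on N^m: x <= y componentwise."""
    return len(x) == len(y) and all(a <= b for a, b in zip(x, y))


def first_increasing_pair(seq):
    """Indices (i, j) with i < j and seq[i] <= seq[j], minimising j first, or None."""
    for j in range(len(seq)):
        for i in range(j):
            if leq_product(seq[i], seq[j]):
                return i, j
    return None


antichain = [(0, 3), (1, 2), (2, 1), (3, 0)]
extended = antichain + [(2, 2)]

no_pair = first_increasing_pair(antichain)
some_pair = first_increasing_pair(extended)
```

The four vectors in `antichain` are pairwise incomparable, so no pair is found; appending $`(2, 2)`$ creates the increasing pair $`(1, 2) \leq (2, 2)`$ at indices 1 and 4.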

Graphs and Walks

Let $`{\mathcal G}= (V, E)`$ be a (directed) graph. We call a sequence of vertices $`\omega= (v_1, \ldots, v_n) \in V^n \, (n \geq 1)`$ a walk (from $`v_1`$ into $`v_n`$ in $`{\mathcal G}`$) if $`(v_i, v_{i+1}) \in E`$ for each $`i \in \{1, \ldots, n-1\}`$; we define the length of $`\omega`$ as $`n-1`$ and denote it by $`|\omega|`$. We denote by $`\mathtt{from}(\omega) \xlongequal{\!\!\!\mathtt{def}\!\!\!}v_1`$ and $`\mathtt{into}(\omega) \xlongequal{\!\!\!\mathtt{def}\!\!\!}v_n`$ the source and the target of $`\omega`$, respectively. A walk $`\omega`$ is called an empty walk if $`|\omega| = 0`$. If two walks $`\omega_1 = (v_1, \ldots, v_m), \omega_2 = (v'_1, \ldots, v'_n)`$ are connectable (i.e., $`\mathtt{into}(\omega_1) = \mathtt{from}(\omega_2)`$), we write $`\omega_1 \odot\omega_2`$ for the connecting walk $`\omega_1 \odot\omega_2 \xlongequal{\!\!\!\mathtt{def}\!\!\!}(v_1, \ldots, v_m, v'_2, \ldots, v'_n)`$. A non-empty walk $`\omega`$ is called a loop (on $`\mathtt{from}(\omega)`$) if $`\mathtt{from}(\omega) = \mathtt{into}(\omega)`$. A walk $`(v_1, \ldots, v_n)`$ is called a path if $`v_i \neq v_j`$ for every $`i, j \in \{1, \ldots, n\}`$ with $`i \neq j`$. A loop $`(v, v_1, \ldots, v_n, v)`$ is called a cycle if $`(v, v_1, \ldots, v_n)`$ is a path. We use the metavariable $`\pi`$ for a path, and the metavariable $`\gamma`$ for a cycle. For a cycle $`\gamma`$ and $`n \geq 1`$, we write $`\gamma^n`$ for the loop obtained by repeating $`\gamma`$ $`n`$ times. We denote by $`{\mathcal W}({\mathcal G})`$, $`{\mathcal P}({\mathcal G})`$ and $`{\mathcal C}({\mathcal G})`$ the sets of all walks, paths and cycles in $`{\mathcal G}`$, respectively. Note that $`{\mathcal W}({\mathcal G})`$ is infinite in general, but $`{\mathcal P}({\mathcal G})`$ and $`{\mathcal C}({\mathcal G})`$ are both finite if $`{\mathcal G}`$ is finite (i.e., $`\#\!\left(V\right) < \infty`$).

The $`N`$-dimensional de Bruijn graph $`{\mathcal G}^{N}_A= (A^N, E)`$ over $`A`$ is a graph whose vertex set $`A^N`$ is the set of words of length $`N`$ and the edge set $`E`$ is defined by

E \xlongequal{\!\!\!\mathtt{def}\!\!\!}\{ (a v, v b) \mid a,b \in A, v \in A^{N-1} \}.

The case $`N = 2`$ is depicted in Fig. 1.

Fig. 1: The 2-dimensional de Bruijn graph $`{\mathcal G}^{2}_A`$ over $`A = \{a, b\}`$, a walk $`(ba, aa, aa, ab, bb, ba)`$ (dotted red arrow) on $`{\mathcal G}^{2}_A`$ and its corresponding word $`baaabba`$.
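The edge set of $`{\mathcal G}^{N}_A`$ is easy to generate, which gives a quick way to check that a sequence of vertices such as the one in Fig. 1 is a walk. The following is a minimal sketch with illustrative function names, representing vertices as Python strings and assuming $`N \geq 1`$.

```python
from itertools import product


def de_bruijn_graph(alphabet, n):
    """Vertex set A^n and edge set {(a v, v b)} of the n-dimensional de Bruijn graph."""
    vertices = {"".join(t) for t in product(alphabet, repeat=n)}
    # From u = a v, drop the first letter and append any letter b to get v b.
    edges = {(u, u[1:] + b) for u in vertices for b in alphabet}
    return vertices, edges


def is_walk(seq, vertices, edges):
    """True iff seq is a non-empty vertex sequence whose consecutive pairs are edges."""
    return (
        len(seq) >= 1
        and all(v in vertices for v in seq)
        and all((u, v) in edges for u, v in zip(seq, seq[1:]))
    )


VERTICES, EDGES = de_bruijn_graph("ab", 2)
FIG1_WALK = ["ba", "aa", "aa", "ab", "bb", "ba"]
fig1_ok = is_walk(FIG1_WALK, VERTICES, EDGES)
```

Each edge $`(av, vb)`$ corresponds to the word $`avb`$ of length $`N + 1`$, so for $`N = 2`$ and $`A = \{a, b\}`$ there are 4 vertices and 8 edges.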

Let $`v`$ be a vertex of $`{\mathcal G}^{N}_A`$. A word $`w = a_1 \cdots a_m \in A^+`$ induces the walk $`(v, v_1, \ldots, v_m)`$ (where $`v_i = \mathrm{suff}_N(v \, \mathrm{pref}_i(w))`$) in $`{\mathcal G}^{N}_A`$, and we denote it by $`\mathtt{walk}_{{\mathcal G}^{N}_A}(v, w)`$. Conversely, a walk $`\omega= (v_1, \ldots, v_n)`$ in $`{\mathcal G}^{N}_A`$ induces the word $`v_1 \mathrm{suff}_1(v_2) \cdots \mathrm{suff}_1(v_n) \in A^*`$, and we denote it by $`\mathtt{word}_{{\mathcal G}^{N}_A}(\omega)`$ (see Fig. 1). For words $`w, w_1, \ldots, w_k \in A^*`$ and a walk $`\omega= (v_0, v_1, \ldots, v_n) \in {\mathcal W}({\mathcal G}^{N}_A)`$, we define the following vectors in $`\mathbb{N}^k`$:

\begin{align*}
|w|_{(w_1, \ldots, w_k)}^{\mathtt{suff}} \xlongequal{\!\!\!\mathtt{def}\!\!\!}\, &
 (c_1, \ldots, c_k)
 \text{ where } c_i = 1 \text{ if }
 w_i \in \mathrm{suff}(w), c_i = 0 \text{ otherwise},\\
|w|_{(w_1, \ldots, w_k)} \xlongequal{\!\!\!\mathtt{def}\!\!\!}\, &
 (|w|_{w_1}, \ldots, |w|_{w_k})
 \qquad
 |\omega|_{(w_1, \ldots, w_k)} \xlongequal{\!\!\!\mathtt{def}\!\!\!}
 \sum_{i = 1}^{n} |v_i|_{(w_1, \ldots, w_k)}^{\mathtt{suff}}.
\end{align*}

We notice that the range of the summation in the above definition of $`|\omega|_{(w_1, \ldots, w_k)}`$ does not contain $`0`$, hence $`|\omega|_{(w_1, \ldots, w_k)} = (0, \ldots, 0)`$ if $`\omega`$ is an empty walk $`\omega= (v_0)`$. The next proposition states a basic property of $`{\mathcal G}^{N}_A`$.

Let $`w_1, \ldots, w_k \in A^*`$ and $`N = \max(|w_1|, \ldots, |w_k|)`$. For any pair of words $`v,w \in A^*`$ such that $`|v| = N`$, we have

|vw|_{(w_1, \ldots, w_k)}
=
|v|_{(w_1, \ldots, w_k)}
 + |\omega|_{(w_1, \ldots, w_k)}

where $`\omega= \mathtt{walk}_{{\mathcal G}^{N}_A}(v, w)`$.

Proof. Straightforward induction on the length of $`w`$. ◻
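The proposition can be sanity-checked exhaustively on small instances. The sketch below implements $`\mathtt{walk}_{{\mathcal G}^{N}_A}`$, $`\mathtt{word}_{{\mathcal G}^{N}_A}`$ and the two occurrence vectors for vertices given as strings, and compares both sides of the identity; the function names are illustrative, and a finite check is of course no substitute for the induction.

```python
from itertools import product


def occ(w, v):
    """|w|_v: number of (possibly overlapping) occurrences of v in w."""
    return sum(1 for i in range(len(w) - len(v) + 1) if w[i:i + len(v)] == v)


def walk_of(v, w):
    """walk(v, w): the i-th vertex is the length-|v| suffix of v + pref_i(w)."""
    n = len(v)
    steps = [v + w[:i] for i in range(1, len(w) + 1)]
    return [v] + [s[len(s) - n:] for s in steps]


def word_of(walk):
    """word(omega): first vertex followed by the last letter of every later vertex."""
    return walk[0] + "".join(u[-1] for u in walk[1:])


def occ_vector(w, params):
    """(|w|_{w_1}, ..., |w|_{w_k})."""
    return tuple(occ(w, p) for p in params)


def walk_vector(walk, params):
    """|omega|: sum of the suffix indicator vectors of all vertices but the first."""
    return tuple(sum(1 for u in walk[1:] if u.endswith(p)) for p in params)


def proposition_holds(v, w, params):
    """Compare |vw| with |v| + |walk(v, w)| componentwise."""
    rhs = tuple(a + b for a, b in zip(occ_vector(v, params),
                                      walk_vector(walk_of(v, w), params)))
    return occ_vector(v + w, params) == rhs


PARAMS = ("ab", "ba", "a")  # N = max length = 2
checked = all(
    proposition_holds("".join(v), "".join(w), PARAMS)
    for v in product("ab", repeat=2)
    for m in range(0, 6)
    for w in product("ab", repeat=m)
)
```

With $`v = ba`$ and $`w = aabba`$ the function `walk_of` reproduces the walk of Fig. 1, and `word_of` maps it back to $`baaabba`$.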

Characterisation of the Equivalence

In the previous section, multi-traces and traces played a crucial role in the characterisation of finiteness. Multi-traces are also important for the characterisation of the equivalence of WMIX languages, which is given here. Before stating the main result, we lift the notion of traces of walks to that of languages. For a language $`L \subseteq A^*`$, we define the multi-trace $`\mathop{\mathrm{\mathbb{N}Tr^N}}(L)`$ of $`L`$ (of order $`N`$) as

\mathop{\mathrm{\mathbb{N}Tr^N}}(L) \xlongequal{\!\!\!\mathtt{def}\!\!\!}\{ \mathop{\mathrm{\mathbb{N}Tr}}(\omega) \mid \omega= \mathtt{walk}_{{\mathcal G}^{N}_A}(v, u), |v| = N, vu \in L\}.

The following theorem states that any WMIX language is completely determined by its multi-trace (excluding the shorter part $`A^{<N}`$).

  Let $`w_1, \ldots, w_k, w'_1, \ldots, w'_{k'} \in A^*`$ and $`N = \max(|w_1|, \ldots, |w_k|, |w'_1|, \ldots, |w'_{k'}|)`$. Then $`L(w_1, \ldots, w_k) = L(w'_1, \ldots, w'_{k'})`$ if and only if

L(w_1, \ldots, w_k) \cap A^{<N} = L(w'_1, \ldots, w'_{k'}) \cap
 A^{<N}

and

\mathop{\mathrm{\mathbb{N}Tr^N}}(L(w_1, \ldots, w_k)) = \mathop{\mathrm{\mathbb{N}Tr^N}}(L(w'_1, \ldots, w'_{k'})).

Proof. The “only-if”-part is trivial. We prove the “if”-part by contraposition. Assume $`L(w_1, \ldots, w_k) \neq L(w'_1, \ldots, w'_{k'})`$. Then, without loss of generality, there is some word $`w`$ such that $`w \in L(w_1, \ldots, w_k)`$ but $`w \notin L(w'_1, \ldots, w'_{k'})`$. If $`|w| < N`$, it is clear that

w \in  L(w_1, \ldots, w_k) \cap A^{<N} \neq L(w'_1, \ldots, w'_{k'}) \cap
 A^{<N} \not\ni w

and the “if”-part holds. Thus we consider the case $`|w| \geq N`$. Let $`w = vu`$ such that $`|v| = N`$ and $`M = \mathop{\mathrm{\mathbb{N}Tr}}(\mathtt{walk}_{{\mathcal G}^{N}_A}(v, u))`$. We now prove that $`L(w'_1, \ldots, w'_{k'})`$ does not contain any word $`w' = v'u' \, (|v'| = N)`$ that has the same multi-trace as $`w`$ (i.e., $`\mathop{\mathrm{\mathbb{N}Tr}}(\mathtt{walk}_{{\mathcal G}^{N}_A}(v', u')) = M = \mathop{\mathrm{\mathbb{N}Tr}}(\mathtt{walk}_{{\mathcal G}^{N}_A}(v, u))`$; $`v' = v`$ holds in this case). By Proposition [prop:dg] and Proposition [prop:mtrace], the subword occurrences in a word are completely determined by its multi-trace. Thus if there is a word $`w' = vu'`$ in $`L(w'_1, \ldots, w'_{k'})`$ such that $`\mathop{\mathrm{\mathbb{N}Tr}}(\mathtt{walk}_{{\mathcal G}^{N}_A}(v, u')) = M`$, then

\begin{align*}
|w'|_{(w'_1, \ldots, w'_{k'})} = &\, |v|_{(w'_1, \ldots, w'_{k'})} + \!\!\!\!\!\!
 \sum_{\pi\in {\mathcal P}({\mathcal G}^{N}_A)} \!\!\!\!\!
 M(\pi) \cdot |\pi|_{(w'_1, \ldots, w'_{k'})}
 + \!\!\!\!\!\! \sum_{\gamma\in {\mathcal C}({\mathcal G}^{N}_A)} \!\!\!\!\!
 M(\gamma) \cdot |\gamma|_{(w'_1, \ldots, w'_{k'})}
 \\
 = &\, |w|_{(w'_1, \ldots, w'_{k'})}
\end{align*}

from which we obtain $`w \in L(w'_1, \ldots, w'_{k'})`$; this contradicts the assumption. Therefore we can conclude that

M \in \mathop{\mathrm{\mathbb{N}Tr^N}}(L(w_1, \ldots, w_k)) \neq
 \mathop{\mathrm{\mathbb{N}Tr^N}}(L(w'_1, \ldots, w'_{k'})) \not\ni M.

 ◻

Decidability

By using Theorem [thm:equiv], we can obtain an algorithm for deciding the equivalence of two WMIX languages. Like the previous algorithm for infiniteness, this algorithm uses the decidability of Presburger arithmetic; in contrast to the case of infiniteness, however, the problem is reduced to a $`\mathrm{\Pi}_1`$-formula of Presburger arithmetic.

For any words $`w_1, \ldots, w_k, w'_1, \ldots, w'_{k'} \in A^*`$, it is decidable whether $`L(w_1, \ldots, w_k) = L(w'_1, \ldots, w'_{k'})`$ or not.

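Proof. Let $`N = \max(|w_1|, \ldots, |w_k|, |w'_1|, \ldots, |w'_{k'}|)`$. We can effectively check whether $`L(w_1, \ldots, w_k) \cap A^{<N} = L(w'_1, \ldots, w'_{k'}) \cap A^{<N}`$ holds, since $`A^{<N}`$ is finite. By Theorem [thm:equiv], it remains to decide whether every multi-trace $`M`$ satisfies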

\begin{align}
 M \in \mathop{\mathrm{\mathbb{N}Tr^N}}(L(w_1, \ldots, w_k)) \text{ if and only if }
 M \in \mathop{\mathrm{\mathbb{N}Tr^N}}(L(w'_1, \ldots, w'_{k'})) \tag{$\blacklozenge$}\label{lozen}
\end{align}

or not. If there is some multi-trace that does not satisfy Condition [lozen] then $`L(w_1, \ldots, w_k) \neq L(w'_1, \ldots, w'_{k'})`$, otherwise $`L(w_1, \ldots, w_k) = L(w'_1, \ldots, w'_{k'})`$ holds. Since every multi-trace can be represented by a corresponding trace and its multiplicity (positive coefficients), for a trace $`T = \{\pi\} \cup \{\gamma_1, \ldots, \gamma_m\}`$, the statement “every multi-trace $`M`$ with $`T = \{ \omega\in {\mathcal P}({\mathcal G}^{N}_A) \cup {\mathcal C}({\mathcal G}^{N}_A) \mid M(\omega) \neq 0\}`$ satisfies Condition [lozen]” can be represented by the following $`\mathrm{\Pi}_1`$-formula of Presburger arithmetic $`\psi_T`$:

\begin{align*}
 \psi_T \xlongequal{\!\!\!\mathtt{def}\!\!\!}\, & \forall x_1, \ldots, x_m \,
 (x_1 > 0 \land \cdots \land x_m > 0)\\
& \Rightarrow \left( \Bigl(\theta_{T}^{w_1} =
 \cdots = \theta_{T}^{w_k}
 \Bigr) \Leftrightarrow
 \Bigl( \theta_{T}^{w'_1} =
 \cdots = \theta_{T}^{w'_{k'}}
 \Bigr) \right)
\end{align*}

where $`\theta_T^{w}`$ is a subexpression defined by

\theta_T^{w} \xlongequal{\!\!\!\mathtt{def}\!\!\!}|\mathtt{from}(\pi)|_{(w)} + |\pi|_{(w)} +
 \sum_{i = 1}^{m} \underbrace{x_i + \cdots + x_i}_{|\gamma_i|_{(w)}
 \text{ times}}.

 ◻
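The formula $`\psi_T`$ can be explored on concrete traces with a bounded search for a falsifying assignment. The sketch below is an illustration under stated assumptions, not the decision procedure: it only tries coefficients up to a bound, so it can refute $`\psi_T`$ but never establish it, and all function names are hypothetical. It uses the trace $`T_1 = \{(ba, ab)\} \cup \{(ba, ab, ba)\}`$ in $`{\mathcal G}^{2}_A`$.

```python
from itertools import product


def occ(w, v):
    """|w|_v: number of (possibly overlapping) occurrences of v in w."""
    return sum(1 for i in range(len(w) - len(v) + 1) if w[i:i + len(v)] == v)


def walk_count(walk, w):
    """Component of |omega| for w: vertices after the first that have w as a suffix."""
    return sum(1 for vertex in walk[1:] if vertex.endswith(w))


def theta(w, path, cycles, xs):
    """theta_T^w = |from(pi)|_w + |pi|_w + sum_i x_i * |gamma_i|_w."""
    value = occ(path[0], w) + walk_count(path, w)
    for x, cycle in zip(xs, cycles):
        value += x * walk_count(cycle, w)
    return value


def balanced(params, path, cycles, xs):
    """True iff theta_T^w takes the same value for every parameter word w."""
    return len({theta(w, path, cycles, xs) for w in params}) <= 1


def find_counterexample(params1, params2, path, cycles, bound):
    """Positive coefficients up to bound for which the equivalence in psi_T fails."""
    for xs in product(range(1, bound + 1), repeat=len(cycles)):
        if balanced(params1, path, cycles, xs) != balanced(params2, path, cycles, xs):
            return xs
    return None  # nothing found up to the bound; psi_T may still be invalid


PATH = ("ba", "ab")
CYCLES = [("ba", "ab", "ba")]
differs = find_counterexample(("ab", "ba", "a"), ("ab", "ba", "a", "b"), PATH, CYCLES, 50)
agrees = find_counterexample(("ab", "ba", "a"), ("ab", "ba"), PATH, CYCLES, 50)
```

For the pair $`L(ab, ba, a)`$ and $`L(ab, ba, a, b)`$ the search already fails at $`x_1 = 1`$, matching the fact that $`babab`$ belongs to the first language but not to the second. For $`L(ab, ba, a)`$ and $`L(ab, ba)`$ no counterexample is found on this particular trace, which says nothing about the remaining traces.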