Note on the Infiniteness and Equivalence Problems for Word-MIX Languages
Decidability and Examples
Whether a given trace $`T \subseteq {\mathcal P}({\mathcal G}^{N}_A) \cup {\mathcal C}({\mathcal G}^{N}_A)`$ in $`{\mathcal G}^{N}_A`$ satisfies both the balance and the pumping conditions can be reduced to the validity of a $`\mathrm{\Sigma}_1`$-formula (an existential formula) of Presburger arithmetic (see the examples below). Moreover, the set of traces in $`{\mathcal G}^{N}_A`$ is finite and effectively enumerable (Lemma [lem:trace]). Thus we obtain the following corollary.
For all words $`w_1, \cdots, w_k \in A^*`$, it is decidable whether $`L(w_1, \ldots, w_k)`$ is infinite or not.
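Proof. Let $`N = \max(|w_1|, \ldots, |w_k|)`$. Enumerate the possible traces in $`{\mathcal G}^{N}_A`$ and check, by deciding the corresponding $`\mathrm{\Sigma}_1`$-formulae, whether some trace satisfies both the balance and the pumping conditions. ◻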
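The enumeration in this proof ranges over finite objects only. The following Python code is a minimal sketch, assuming the definitions of paths and cycles given in the Preliminaries and representing walks as tuples of vertex strings. It builds $`{\mathcal G}^{N}_A`$ and lists $`{\mathcal P}({\mathcal G}^{N}_A)`$ and $`{\mathcal C}({\mathcal G}^{N}_A)`$, from which candidate traces are formed. The helper names `de_bruijn` and `paths_and_cycles` are hypothetical, and the balance and pumping checks of Theorem [thm] are not reproduced.

```python
from itertools import product

def de_bruijn(alphabet, n):
    """Vertices A^N and successor lists of G^N_A: (a v, v b) is an edge
    for all letters a, b and every word v of length N-1."""
    vertices = ["".join(p) for p in product(alphabet, repeat=n)]
    succ = {u: [u[1:] + b for b in alphabet] for u in vertices}
    return vertices, succ

def paths_and_cycles(vertices, succ):
    """Enumerate all paths (walks without repeated vertices, including the
    empty walk (v)) and all cycles (a path closed by one edge back to its
    first vertex). Cycles are rooted, so rotations count separately."""
    paths, cycles = [], []

    def extend(walk):
        paths.append(walk)
        if walk[0] in succ[walk[-1]]:          # closing edge exists
            cycles.append(walk + (walk[0],))
        for nxt in succ[walk[-1]]:
            if nxt not in walk:                # keep the walk a path
                extend(walk + (nxt,))

    for v in vertices:
        extend((v,))
    return paths, cycles

V, S = de_bruijn("ab", 2)
P, C = paths_and_cycles(V, S)
print(len(P), len(C))  # 22 14
```

Cycles are kept as rooted sequences, so $`(ab, ba, ab)`$ and $`(ba, ab, ba)`$ are different cycles, matching the definition of a loop on a fixed vertex.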
Consider the language $`L(ab, ba, a)`$ over $`A = \{a, b\}`$. Since $`\max(|ab|, |ba|, |a|) = 2`$, we work in the 2-dimensional de Bruijn graph $`{\mathcal G}^{2}_A`$ shown in Fig. 1. We claim that the trace $`T_1 = \{\pi_1 = (ba, ab)\} \cup \{ \gamma_1 = (ba, ab, ba)\}`$ satisfies both the balance and the pumping conditions. One can easily observe that
\begin{align*}
|ba|_{(ab,ba,a)} = (0,1,1) \quad
|\pi_1|_{(ab,ba,a)} = (1,0,0) \quad
|\gamma_1|_{(ab,ba,a)} = (1,1,1)
\end{align*}
and hence the coefficient $`x_1 = 1`$ simultaneously satisfies both conditions stated in (2) of Theorem [thm] (in fact every $`x_1 \geq 1`$ balances, since each component of $`|ba|_{(ab,ba,a)} + |\pi_1|_{(ab,ba,a)} + x_1 |\gamma_1|_{(ab,ba,a)}`$ equals $`1 + x_1`$). For each $`n \geq 1`$, Proposition [prop:dg] shows that the word $`ba (ba)^n b = \mathtt{word}_{{\mathcal G}^{2}_A}(\gamma_1^n \odot\pi_1)`$ is in $`L(ab, ba, a)`$. Hence $`ba (ba)^+ b \subseteq L(ab, ba, a)`$ and $`\#\!\left(L(ab, ba, a)\right) = \infty`$.
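The vectors and pumped words of this example can be checked mechanically. The following Python sketch uses hypothetical helper names and implements, directly from the Preliminaries, the occurrence count $`|w|_v`$ (overlapping occurrences), membership in $`L(w_1, \ldots, w_k)`$, the walk/word correspondence in $`{\mathcal G}^{N}_A`$, the connection $`\odot`$, and the vector $`|\omega|_{(w_1, \ldots, w_k)}`$. It then rebuilds $`\gamma_1^n \odot \pi_1`$ for small $`n`$.

```python
def occ(w, v):
    """|w|_v: number of (possibly overlapping) occurrences of v in w."""
    return sum(1 for i in range(len(w) - len(v) + 1) if w[i:i + len(v)] == v)

def in_wmix(w, params):
    """w is in L(w_1, ..., w_k) iff all counts |w|_{w_i} coincide."""
    return len({occ(w, p) for p in params}) <= 1

def walk(v, w):
    """walk(v, w) in G^N_A with N = |v|: i-th vertex is suff_N(v pref_i(w))."""
    n, s = len(v), v + w
    return tuple(s[i:i + n] for i in range(len(w) + 1))

def word(omega):
    """word(omega) = v_1 suff_1(v_2) ... suff_1(v_n)."""
    return omega[0] + "".join(u[-1] for u in omega[1:])

def connect(omega1, omega2):
    """omega1 (.) omega2, defined only if into(omega1) = from(omega2)."""
    assert omega1[-1] == omega2[0], "walks are not connectable"
    return omega1 + omega2[1:]

def walk_vec(omega, params):
    """|omega|_(w_1..w_k): sum of suffix vectors of v_1..v_n (v_0 excluded)."""
    return tuple(sum(1 for u in omega[1:] if u.endswith(p)) for p in params)

PARAMS = ("ab", "ba", "a")
PI1 = ("ba", "ab")
GAMMA1 = ("ba", "ab", "ba")

print(tuple(occ("ba", p) for p in PARAMS))  # (0, 1, 1)
print(walk_vec(PI1, PARAMS))                # (1, 0, 0)
print(walk_vec(GAMMA1, PARAMS))             # (1, 1, 1)

for n in range(1, 4):
    omega = ("ba",)
    for _ in range(n):
        omega = connect(omega, GAMMA1)      # gamma_1^n
    omega = connect(omega, PI1)             # gamma_1^n (.) pi_1
    w = word(omega)
    print(w, in_wmix(w, PARAMS))            # babab True, bababab True, babababab True
```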
Next, consider the language $`L(ab, ba, a, b)`$ over $`A = \{a, b\}`$. Again $`\max(|ab|, |ba|, |a|, |b|) = 2`$, so we use the same 2-dimensional de Bruijn graph $`{\mathcal G}^{2}_A`$ shown in Fig. 1. In contrast with Example [ex1], the trace $`T_1 = \{\pi_1 = (ba, ab)\} \cup \{\gamma_1 = (ba, ab, ba)\}`$ no longer satisfies the balance condition (although it still satisfies the pumping condition). We have
\begin{align*}
|ba|_{(ab,ba,a,b)} =& (0,1,1,1) \qquad
|\pi_1|_{(ab,ba,a,b)} = (1,0,0,1) \\
|\gamma_1|_{(ab,ba,a,b)} =& (1,1,1,1)
\end{align*}
We can formally prove that no positive coefficient $`x_1 \in \mathbb{N}\, (x_1 > 0)`$ satisfies the balance condition, since the existence of such a coefficient is expressed by the following $`\mathrm{\Sigma}_1`$-formula of Presburger arithmetic:
\begin{align*}
\phi_{T_1} \xlongequal{\!\!\!\mathtt{def}\!\!\!}
\exists c \Bigl( \exists x_1 \bigl(& x_1 > 0 \, \land \theta_{T_1}^{ab} = c \land \theta_{T_1}^{ba} = c
\land \theta_{T_1}^{a} = c \land
\theta_{T_1}^{b} = c \bigr) \Bigr)\\
\equiv
\exists c \Bigl( \exists x_1 \bigl(& x_1 > 0 \, \land \\
& (0+1+x_1) = c \land (1+0+x_1) = c \, \land\\
& (1+0+x_1) = c \land (1+1+x_1) = c \quad \bigr) \Bigr)
\end{align*}
where $`\theta_{T_1}^{w}`$ is a subexpression defined by
\begin{align*}
\theta_{T_1}^{w} \xlongequal{\!\!\!\mathtt{def}\!\!\!}|ba|_{(w)} + |\pi_1|_{(w)} +
\underbrace{x_1 + \cdots + x_1}_{|\gamma_1|_{(w)} \text{ times}}.
\end{align*}
Since the validity of first-order formulae of Presburger arithmetic is decidable (cf. Section 6.2 of ), $`\phi_{T_1}`$ can be algorithmically verified to be false; here this is also immediate, because the conjuncts $`(1+0+x_1) = c`$ and $`(1+1+x_1) = c`$ cannot hold simultaneously. Using the same reduction to $`\mathrm{\Sigma}_1`$-formulae of Presburger arithmetic, we can algorithmically verify that no trace in $`{\mathcal G}^{2}_A`$ satisfies both the balance and the pumping conditions. Thus $`\#\!\left(L(ab,ba,a,b)\right) < \infty`$ by Theorem [thm].
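For a trace with a single cycle, as in both examples above, the balance part of such a formula is a system of linear equations in the one unknown $`x_1`$ and can be solved exactly. The following Python sketch covers only this one-cycle special case. It is not a Presburger decision procedure, and it ignores the pumping condition. The name `balancing_coefficients` is hypothetical. Its inputs are the vectors $`|\mathtt{from}(\pi_1)|_{(w_1,\ldots,w_k)} + |\pi_1|_{(w_1,\ldots,w_k)}`$ and $`|\gamma_1|_{(w_1,\ldots,w_k)}`$.

```python
from fractions import Fraction

def balancing_coefficients(base, gamma):
    """Positive integers x with base[i] + x * gamma[i] equal for every i.

    Returns the string "all" if every x > 0 works; otherwise a set that is
    either empty or contains the unique solution.
    """
    forced = None  # the value of x pinned down by some equation, if any
    for b, g in zip(base, gamma):
        # Equate component i with component 0: b + x*g == base[0] + x*gamma[0],
        # i.e. (b - base[0]) == x * (gamma[0] - g).
        db, dg = b - base[0], gamma[0] - g
        if dg == 0:
            if db != 0:
                return set()  # constant offset that no x can remove
            continue
        x = Fraction(db, dg)
        if x.denominator != 1 or x <= 0 or (forced is not None and x != forced):
            return set()
        forced = x
    return "all" if forced is None else {int(forced)}

# Example [ex1]: |ba| + |pi_1| = (1, 1, 1), |gamma_1| = (1, 1, 1)
print(balancing_coefficients((1, 1, 1), (1, 1, 1)))        # all
# Example [ex2]: |ba| + |pi_1| = (1, 1, 1, 2), |gamma_1| = (1, 1, 1, 1)
print(balancing_coefficients((1, 1, 1, 2), (1, 1, 1, 1)))  # set()
```

Comparing every component with the first one suffices, because equality with a common reference component implies that all components are equal.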
Preliminaries
For a set $`X`$, we denote by $`\#\!\left(X\right)`$ the cardinality of $`X`$. We write $`\#\!\left(X\right) = \infty`$ if $`X`$ is an infinite set, and $`\#\!\left(X\right) < \infty`$ otherwise. We denote by $`\mathbb{N}`$ the set of natural numbers including $`0`$. We call a mapping $`M: X \rightarrow \mathbb{N}`$ a multiset over $`X`$.
Words and Orders
For an alphabet $`A`$, we denote the set of all (resp. non-empty) words over $`A`$ by $`A^*`$ (resp. $`A^+`$). We write $`A^n`$ (resp. $`A^{<n}`$) for the set of words of length $`n`$ (resp. of length less than $`n`$). For words $`v, w \in A^*`$, we denote by $`|w|_v`$ the number of occurrences of $`v`$ in $`w`$:
\begin{align*}
|w|_v \xlongequal{\!\!\!\mathtt{def}\!\!\!}\#\!\left(\{ (w_1, w_2) \in A^* \times A^* \mid w_1 v w_2 = w \}\right).
\end{align*}
For words $`w_1, \ldots, w_k \in A^*`$, we define
\begin{align*}
L(w_1, \ldots, w_k) \xlongequal{\!\!\!\mathtt{def}\!\!\!}\{ w \in A^* \mid |w|_{w_1} = \cdots = |w|_{w_k} \}
\end{align*}
and call it the Word-MIX language of $`k`$ parameter words $`w_1, \ldots, w_k`$ (($`k`$-)WMIX for short). For a word $`w \in A^*`$, we denote the sets of prefixes and suffixes of $`w`$ by
\begin{align*}
\mathrm{pref}(w) & \xlongequal{\!\!\!\mathtt{def}\!\!\!}\{ u \in A^* \mid uv = w \text{ for some } v \in A^*\} \\
\mathrm{suff}(w) & \xlongequal{\!\!\!\mathtt{def}\!\!\!}\{ v \in A^* \mid uv = w \text{ for some } u \in A^* \}
\end{align*}
and denote the length-$`n`$ ($`n \leq |w|`$) prefix and suffix of $`w`$ by $`\mathrm{pref}_n(w)`$ and $`\mathrm{suff}_n(w)`$, respectively.
A quasi order $`\leq`$ on a set $`X`$ is called a well-quasi-order (wqo for short) if every infinite sequence $`(x_i)_{i \in \mathbb{N}} \, (x_i \in X)`$ contains an increasing pair $`x_i \leq x_j`$ with $`i < j`$. Let $`\leq_1`$ be a quasi order on a set $`X_1`$ and $`\leq_2`$ a quasi order on a set $`X_2`$. The product order $`\leq_{1,2}`$ is the quasi order on $`X_1 \times X_2`$ defined by
\begin{align*}
(x_1, y_1) \leq_{1,2} (x_2, y_2) \stackrel{\mathtt{def}}{\Longleftrightarrow}x_1 \leq_1 x_2 \text{ and } y_1 \leq_2 y_2.
\end{align*}
Let $`\leq_1`$ be a wqo on a set $`X_1`$ and $`\leq_2`$ a wqo on a set $`X_2`$. Then the product order $`\leq_{1,2}`$ is again a wqo on $`X_1 \times X_2`$.
We list some examples of wqos below:
- The identity relation $`=`$ on any finite set $`X`$ is a wqo (the pigeonhole principle).
- The usual order $`\leq`$ on $`\mathbb{N}`$ is a wqo.
- The product order $`\leq_m`$ on $`\mathbb{N}^m`$ is a wqo for any $`m \geq 1`$ (Dickson’s lemma), which is a direct corollary of Lemma [wqo].
- The point-wise order $`\leq_{\mathtt{pt}}`$ on the multisets $`\mathbb{N}^X`$ over a finite set $`X`$, defined by $`M \leq_{\mathtt{pt}}M' \stackrel{\mathtt{def}}{\Longleftrightarrow}M(x) \leq M'(x)`$ for all $`x \in X`$, is a wqo (just a paraphrase of Dickson’s lemma).
Graphs and Walks
Let $`{\mathcal G}= (V, E)`$ be a (directed) graph. We call a sequence of vertices $`\omega= (v_1, \ldots, v_n) \in V^n \, (n \geq 1)`$ a walk (from $`v_1`$ to $`v_n`$ in $`{\mathcal G}`$) if $`(v_i, v_{i+1}) \in E`$ for each $`i \in \{1, \ldots, n-1\}`$; we define the length of $`\omega`$ as $`n-1`$ and denote it by $`|\omega|`$. We denote by $`\mathtt{from}(\omega)`$ and $`\mathtt{into}(\omega)`$ the source $`\mathtt{from}(\omega) \xlongequal{\!\!\!\mathtt{def}\!\!\!}v_1`$ and the target $`\mathtt{into}(\omega) \xlongequal{\!\!\!\mathtt{def}\!\!\!}v_n`$ of $`\omega`$. A walk $`\omega`$ is called an empty walk if $`|\omega| = 0`$. If two walks $`\omega_1 = (v_1, \ldots, v_m)`$ and $`\omega_2 = (v'_1, \ldots, v'_n)`$ are connectable (i.e., $`\mathtt{into}(\omega_1) = \mathtt{from}(\omega_2)`$), we write $`\omega_1 \odot\omega_2`$ for the connected walk $`\omega_1 \odot\omega_2 \xlongequal{\!\!\!\mathtt{def}\!\!\!}(v_1, \ldots, v_m, v'_2, \ldots, v'_n)`$.
A non-empty walk $`\omega`$ is called a loop (on $`\mathtt{from}(\omega)`$) if $`\mathtt{from}(\omega) = \mathtt{into}(\omega)`$. A walk $`(v_1, \ldots, v_n)`$ is called a path if $`v_i \neq v_j`$ for all $`i, j \in \{1, \ldots, n\}`$ with $`i \neq j`$. A loop $`(v, v_1, \ldots, v_n, v)`$ is called a cycle if $`(v, v_1, \ldots, v_n)`$ is a path. We use the metavariable $`\pi`$ for a path and the metavariable $`\gamma`$ for a cycle. For a cycle $`\gamma`$ and $`n \geq 1`$, we write $`\gamma^n`$ for the loop obtained by repeating $`\gamma`$ $`n`$ times. We denote by $`{\mathcal W}({\mathcal G})`$, $`{\mathcal P}({\mathcal G})`$, and $`{\mathcal C}({\mathcal G})`$ the sets of all walks, paths, and cycles in $`{\mathcal G}`$, respectively. Note that $`{\mathcal W}({\mathcal G})`$ is infinite in general, but $`{\mathcal P}({\mathcal G})`$ and $`{\mathcal C}({\mathcal G})`$ are both finite if $`{\mathcal G}`$ is finite (i.e., $`\#\!\left(V\right) < \infty`$).
The $`N`$-dimensional de Bruijn graph $`{\mathcal G}^{N}_A= (A^N, E)`$ over $`A`$ is the graph whose vertex set $`A^N`$ is the set of words of length $`N`$ and whose edge set $`E`$ is defined by
\begin{align*}
E \xlongequal{\!\!\!\mathtt{def}\!\!\!}\{ (a v, v b) \mid a,b \in A, v \in A^{N-1} \}.
\end{align*}
The case $`N = 2`$ is depicted in Fig. 1. Let $`v`$ be a vertex of $`{\mathcal G}^{N}_A`$. A word $`w = a_1 \cdots a_m \in A^*`$ induces the walk $`(v, v_1, \ldots, v_m)`$ (where $`v_i = \mathrm{suff}_N(v \, \mathrm{pref}_i(w))`$) in $`{\mathcal G}^{N}_A`$, which we denote by $`\mathtt{walk}_{{\mathcal G}^{N}_A}(v, w)`$; for $`w = \varepsilon`$ this is the empty walk $`(v)`$. Conversely, a walk $`\omega= (v_1, \ldots, v_n)`$ in $`{\mathcal G}^{N}_A`$ induces the word $`v_1 \mathrm{suff}_1(v_2) \cdots \mathrm{suff}_1(v_n) \in A^*`$, which we denote by $`\mathtt{word}_{{\mathcal G}^{N}_A}(\omega)`$ (see Fig. 1).
For words $`w, w_1, \ldots, w_k \in A^*`$ and a walk $`\omega= (v_0, v_1, \ldots, v_n) \in {\mathcal W}({\mathcal G}^{N}_A)`$, we define the following vectors in $`\mathbb{N}^k`$:
\begin{align*}
|w|_{(w_1, \ldots, w_k)}^{\mathtt{suff}} \xlongequal{\!\!\!\mathtt{def}\!\!\!}\, &
(c_1, \ldots, c_k)
\text{ where } c_i = 1 \text{ if } w_i \in \mathrm{suff}(w), c_i = 0 \text{ otherwise},\\
|w|_{(w_1, \ldots, w_k)} \xlongequal{\!\!\!\mathtt{def}\!\!\!}\, &
(|w|_{w_1}, \ldots, |w|_{w_k}),\\
|\omega|_{(w_1, \ldots, w_k)} \xlongequal{\!\!\!\mathtt{def}\!\!\!}\, &
\sum_{i = 1}^{n} |v_i|_{(w_1, \ldots, w_k)}^{\mathtt{suff}}.
\end{align*}
Note that the range of the summation in the above definition of $`|\omega|_{(w_1, \ldots, w_k)}`$ does not contain $`0`$; hence $`|\omega|_{(w_1, \ldots, w_k)} = (0, \ldots, 0)`$ if $`\omega`$ is an empty walk $`\omega= (v_0)`$. The next proposition states a basic property of $`{\mathcal G}^{N}_A`$.
Let $`w_1, \ldots, w_k \in A^*`$ and $`N = \max(|w_1|, \ldots, |w_k|)`$. For any pair of words $`v,w \in A^*`$ such that $`|v| = N`$, we have
\begin{align*}
|vw|_{(w_1, \ldots, w_k)} = |v|_{(w_1, \ldots, w_k)} + |\omega|_{(w_1, \ldots, w_k)}
\end{align*}
where $`\omega= \mathtt{walk}_{{\mathcal G}^{N}_A}(v, w)`$.
Proof. Straightforward induction on the length of $`w`$. ◻
Characterisation of the Equivalence
In the previous section, multi-traces and traces play a crucial role in the characterisation of finiteness. Multi-traces are also important for the characterisation of the equivalence of WMIX languages, which is given here. Before stating the main result, we lift the notion of multi-traces from walks to languages. For a language $`L \subseteq A^*`$, we define the multi-trace $`\mathop{\mathrm{\mathbb{N}Tr^N}}(L)`$ of $`L`$ (of order $`N`$) as
\begin{align*}
\mathop{\mathrm{\mathbb{N}Tr^N}}(L) \xlongequal{\!\!\!\mathtt{def}\!\!\!}\{ \mathop{\mathrm{\mathbb{N}Tr}}(\omega) \mid \omega= \mathtt{walk}_{{\mathcal G}^{N}_A}(v, u), |v| = N, vu \in L\}.
\end{align*}
The following theorem states that any WMIX language is completely determined by its multi-trace together with its short part, i.e., its words in $`A^{<N}`$.
Let $`w_1, \ldots, w_k, w'_1, \ldots, w'_{k'} \in A^*`$ and $`N = \max(|w_1|, \ldots, |w_k|, |w'_1|, \ldots, |w'_{k'}|)`$. Then $`L(w_1, \ldots, w_k) = L(w'_1, \ldots, w'_{k'})`$ if and only if
\begin{align*}
L(w_1, \ldots, w_k) \cap A^{<N} = L(w'_1, \ldots, w'_{k'}) \cap A^{<N}
\end{align*}
and
\begin{align*}
\mathop{\mathrm{\mathbb{N}Tr^N}}(L(w_1, \ldots, w_k)) = \mathop{\mathrm{\mathbb{N}Tr^N}}(L(w'_1, \ldots, w'_{k'})).
\end{align*}
Proof. The “only-if” part is trivial. We prove the “if” part by contraposition. Assume $`L(w_1, \ldots, w_k) \neq L(w'_1, \ldots, w'_{k'})`$. Without loss of generality, we can assume that there is some word $`w`$ such that $`w \in L(w_1, \ldots, w_k)`$ but $`w \notin L(w'_1, \ldots, w'_{k'})`$. If $`|w| < N`$, it is clear that
\begin{align*}
w \in L(w_1, \ldots, w_k) \cap A^{<N} \neq L(w'_1, \ldots, w'_{k'}) \cap A^{<N} \not\ni w
\end{align*}
and hence the first condition is violated. Thus we consider the case $`|w| \geq N`$. Let $`w = vu`$ with $`|v| = N`$, and let $`M = \mathop{\mathrm{\mathbb{N}Tr}}(\mathtt{walk}_{{\mathcal G}^{N}_A}(v, u))`$. We now prove that $`L(w'_1, \ldots, w'_{k'})`$ does not contain any word $`w' = v'u' \, (|v'| = N)`$ that has the same multi-trace as $`w`$ (i.e., $`\mathop{\mathrm{\mathbb{N}Tr}}(\mathtt{walk}_{{\mathcal G}^{N}_A}(v', u')) = M = \mathop{\mathrm{\mathbb{N}Tr}}(\mathtt{walk}_{{\mathcal G}^{N}_A}(v, u))`$; $`v' = v`$ holds in this case). By Proposition [prop:dg] and Proposition [prop:mtrace], the occurrences of subwords in a word are completely determined by its multi-trace. Thus, if there were a word $`w' = vu'`$ in $`L(w'_1, \ldots, w'_{k'})`$ such that $`\mathop{\mathrm{\mathbb{N}Tr}}(\mathtt{walk}_{{\mathcal G}^{N}_A}(v, u')) = M`$, then
\begin{align*}
|w'|_{(w'_1, \ldots, w'_{k'})} = &\, |v|_{(w'_1, \ldots, w'_{k'})} + \!\!\!\!\!\!
\sum_{\pi\in {\mathcal P}({\mathcal G}^{N}_A)} \!\!\!\!\!
M(\pi) \cdot |\pi|_{(w'_1, \ldots, w'_{k'})}
+ \!\!\!\!\!\! \sum_{\gamma\in {\mathcal C}({\mathcal G}^{N}_A)} \!\!\!\!\!
M(\gamma) \cdot |\gamma|_{(w'_1, \ldots, w'_{k'})}
\\
= &\, |w|_{(w'_1, \ldots, w'_{k'})},
\end{align*}
from which we would obtain $`w \in L(w'_1, \ldots, w'_{k'})`$; this contradicts the assumption. Therefore we can conclude that
\begin{align*}
M \in \mathop{\mathrm{\mathbb{N}Tr^N}}(L(w_1, \ldots, w_k)) \neq
\mathop{\mathrm{\mathbb{N}Tr^N}}(L(w'_1, \ldots, w'_{k'})) \not\ni M.
\end{align*}
◻
Decidability
By using Theorem [thm:equiv], we obtain an algorithm for deciding the equivalence of two WMIX languages. Like the previous algorithm for infiniteness, this algorithm relies on the decidability of Presburger arithmetic; in contrast to the case of infiniteness, however, the problem is reduced to $`\mathrm{\Pi}_1`$-formulae (universal formulae) of Presburger arithmetic.
For any words $`w_1, \ldots, w_k, w'_1, \ldots, w'_{k'} \in A^*`$, it is decidable whether $`L(w_1, \ldots, w_k) = L(w'_1, \ldots, w'_{k'})`$ or not.
Proof. Let $`N = \max(|w_1|, \ldots, |w_k|, |w'_1|, \ldots, |w'_{k'}|)`$. Since $`A^{<N}`$ is finite, we can effectively check whether $`L(w_1, \ldots, w_k) \cap A^{<N} = L(w'_1, \ldots, w'_{k'}) \cap A^{<N}`$. It remains to check whether every multi-trace $`M`$ satisfies
\begin{align}
M \in \mathop{\mathrm{\mathbb{N}Tr^N}}(L(w_1, \ldots, w_k)) \text{ if and only if }
M \in \mathop{\mathrm{\mathbb{N}Tr^N}}(L(w'_1, \ldots, w'_{k'})) \tag{$\blacklozenge$}\label{lozen}
\end{align}
or not. If the first check fails or some multi-trace does not satisfy Condition [lozen], then $`L(w_1, \ldots, w_k) \neq L(w'_1, \ldots, w'_{k'})`$; otherwise $`L(w_1, \ldots, w_k) = L(w'_1, \ldots, w'_{k'})`$ holds by Theorem [thm:equiv]. Since every multi-trace can be represented by its trace together with multiplicities (positive coefficients), for a trace $`T = \{\pi\} \cup \{\gamma_1, \ldots, \gamma_m\}`$ the statement “every multi-trace $`M`$ with $`T = \{ \omega\in {\mathcal P}({\mathcal G}^{N}_A) \cup {\mathcal C}({\mathcal G}^{N}_A) \mid M(\omega) \neq 0\}`$ satisfies Condition [lozen]” can be expressed by the following $`\mathrm{\Pi}_1`$-formula $`\psi_T`$ of Presburger arithmetic:
\begin{align*}
\psi_T \xlongequal{\!\!\!\mathtt{def}\!\!\!}\, & \forall x_1, \ldots, x_m \, \Bigl(
(x_1 > 0 \land \cdots \land x_m > 0)\\
& \Rightarrow \left( \Bigl(\theta_{T}^{w_1} = \cdots = \theta_{T}^{w_k} \Bigr) \Leftrightarrow
\Bigl( \theta_{T}^{w'_1} = \cdots = \theta_{T}^{w'_{k'}} \Bigr) \right) \Bigr)
\end{align*}
where $`\theta_T^{w}`$ is a subexpression defined by
\begin{align*}
\theta_T^{w} \xlongequal{\!\!\!\mathtt{def}\!\!\!}|\mathtt{from}(\pi)|_{(w)} + |\pi|_{(w)} +
\sum_{i = 1}^{m} \underbrace{x_i + \cdots + x_i}_{|\gamma_i|_{(w)} \text{ times}}.
\end{align*}
Since the traces in $`{\mathcal G}^{N}_A`$ are finitely many and effectively enumerable, and the validity of each $`\psi_T`$ is decidable, Condition [lozen] can be checked effectively for all multi-traces. ◻
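Deciding the $`\mathrm{\Pi}_1`$-formulae $`\psi_T`$ requires a Presburger arithmetic procedure. A much weaker but easy sanity check is to search for a shortest word, up to a length bound, on which two WMIX languages differ. The following Python sketch does this. `distinguishing_word` is a hypothetical helper and not part of the algorithm above. It can refute equivalence but never prove it.

```python
from itertools import product

def occ(w, v):
    """|w|_v: number of (possibly overlapping) occurrences of v in w."""
    return sum(1 for i in range(len(w) - len(v) + 1) if w[i:i + len(v)] == v)

def in_wmix(w, params):
    """Membership of w in L(params): all counts |w|_p coincide."""
    return len({occ(w, p) for p in params}) <= 1

def distinguishing_word(alphabet, params1, params2, max_len):
    """Shortest word of length <= max_len in exactly one of L(params1), L(params2).

    Returns None if there is none up to max_len; this does NOT prove equality.
    """
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):   # words in length order
            w = "".join(letters)
            if in_wmix(w, params1) != in_wmix(w, params2):
                return w
    return None

print(distinguishing_word("ab", ("ab", "ba"), ("ab", "ba", "a"), 6))  # a
print(distinguishing_word("ab", ("a", "aa"), ("a", "aaa"), 8))        # None
```

In the second call the languages are in fact equal: over $`A = \{a, b\}`$ both are $`b^*`$, since a maximal block $`a^m`$ contributes $`m`$ to $`|w|_a`$ but only $`m-1`$ to $`|w|_{aa}`$ and $`\max(m-2, 0)`$ to $`|w|_{aaa}`$. Establishing such an equality in general requires the decision procedure above.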