An undecidable property of context-free languages

We prove that there exists no algorithm to decide whether the language generated by a context-free grammar is dense with respect to the lexicographic ordering. As a corollary to this result, we show that it is undecidable whether the lexicographic orderings of the languages generated by two context-free grammars have the same order type.


💡 Research Summary

The paper investigates the order‑theoretic properties of languages generated by context‑free grammars (CFGs) when these languages are equipped with the standard lexicographic ordering <ℓ on Σ*. After recalling that any countable linear order can be represented as (L,<ℓ) for some language L over an alphabet of size at least two, the authors define “regular”, “context‑free”, and “deterministic context‑free” linear orders as those isomorphic to the lexicographic orderings of regular, CFL, and deterministic CFL languages respectively. Known characterisations (e.g., regular orders correspond to order‑0 recursion schemes, deterministic CFL orders to order‑1 schemes) are briefly surveyed.

The main contributions are two undecidability results. The first shows that there is no algorithm that, given a CFG (even a prefix grammar) G over a binary alphabet, decides whether the linear order (L(G),<ℓ) is dense. The proof proceeds by reduction from the Post Correspondence Problem (PCP). For an arbitrary PCP instance (α,β), with α = (α_1,…,α_n) and β = (β_1,…,β_n), the authors construct a finite alphabet Δ consisting of the symbols 1,…,n, a, b, ¢, $ together with auxiliary symbols d_{j0}, d_{j1}, d_{j2} (j = 1…n+2). Over Δ they define a prefix grammar G_{α,β} whose language L is the union of the following families:

  1. L_α = { i_1…i_m (α_{i_1}…α_{i_m})^{-1} ¢ | m≥1, each i_k∈{1,…,n} }
  2. L_β = { i_1…i_m (β_{i_1}…β_{i_m})^{-1} $ | m≥1, each i_k∈{1,…,n} }
  3. For each j, L_j = Σ* Q_j where Q_j = {d_{j0}, d_{j2}}* d_{j1}.

Each Q_j (and consequently each L_j) is a regular language whose lexicographic order type is η, the order type of the rationals; the finite sum of η‑type orders is again η. The grammar G_{α,β} is explicitly given in BNF and is clearly a prefix grammar.

If (α,β) has a solution i_1…i_m, then α_{i_1}…α_{i_m} = β_{i_1}…β_{i_m}. Writing u for the reversal of this common word, L contains the two words u_α = i_1…i_m u ¢ and u_β = i_1…i_m u $, which differ only in their last symbol. No word of L lies strictly between them, because any such word would have to begin with i_1…i_m u and continue with a symbol strictly between ¢ and $, which does not exist. Hence L is not dense.

Conversely, if the PCP instance has no solution, the authors show that for any two distinct words u <ℓ v in L one can always find a third word w in L with u <ℓ w <ℓ v. The argument distinguishes three cases depending on whether the first differing symbols belong to the same Δ_j, to different Δ_j, or are numeric symbols c_i, c_j. In each case the dense regular sublanguages Q_j (or the concatenations Σ* Q_j) provide a suitable intermediate word. Thus L is dense precisely when the PCP instance is unsolvable.

By encoding the symbols of Δ as fixed‑length binary strings, the same construction yields a CFG over {0,1} with the same property, establishing the undecidability of density for CFGs.
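The gap in the solvable case can be illustrated on a small, well-known PCP instance. The sketch below is only an illustration under stated assumptions: the instance, the helper names `marked_word` and `lex_key`, and the symbol order `ORDER` (in particular, that ¢ comes immediately before $) are not taken from the paper, and the auxiliary d-symbols are left out.

```python
# Hypothetical symbol order on part of Δ: indices < letters < ¢ < $.
# That ¢ and $ are adjacent is an assumption drawn from the gap argument.
ORDER = "123ab¢$"

def marked_word(tiles, indices, marker):
    """Build i_1…i_m, then the reversal of tiles[i_1]…tiles[i_m], then marker."""
    body = "".join(tiles[i - 1] for i in indices)
    prefix = "".join(str(i) for i in indices)
    return prefix + body[::-1] + marker

def lex_key(word):
    """Sort key for lexicographic comparison under ORDER."""
    return [ORDER.index(symbol) for symbol in word]

# Illustrative PCP instance with solution 3, 2, 3, 1 (an assumption, not the paper's).
alpha = ("a", "ab", "bba")
beta = ("baa", "aa", "bb")
solution = (3, 2, 3, 1)

u_alpha = marked_word(alpha, solution, "¢")
u_beta = marked_word(beta, solution, "$")

# For a solution, the two words agree everywhere except on the final marker.
same_stem = u_alpha[:-1] == u_beta[:-1]
# No symbol sits between ¢ and $ in the assumed order.
adjacent_markers = ORDER.index("$") - ORDER.index("¢") == 1
print(u_alpha, u_beta, same_stem, adjacent_markers)
```

For an index sequence that is not a solution, the two stems differ before the marker, so the final symbols no longer decide the comparison and the pair does not form this kind of gap.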

The second result leverages the first to prove that the isomorphism problem for lexicographic orderings of CFLs is undecidable. Using the same grammar G_{α,β} and a simple right‑linear (hence regular) grammar G′ that generates the language {00,11}*01, whose lexicographic order type is also η, the authors observe that (L(G_{α,β}),<ℓ) and (L(G′),<ℓ) are isomorphic iff the PCP instance has no solution. Consequently, deciding whether two CFGs (or a CFG and a regular grammar) generate linearly ordered languages of the same order type is as hard as PCP and therefore undecidable.
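The claim that {00,11}*01 has order type η can be sanity-checked on a finite sample. The following is a bounded experiment, not a proof: it enumerates words with a limited number of 00/11 blocks and searches for witnesses among words that are one block longer. The function names and the block bounds are illustrative choices.

```python
from itertools import product

def words(max_blocks):
    """All words of {00,11}*01 with at most max_blocks blocks before the final 01."""
    out = []
    for k in range(max_blocks + 1):
        for blocks in product(("00", "11"), repeat=k):
            out.append("".join(blocks) + "01")
    return sorted(out)  # Python compares strings lexicographically, with '0' < '1'

def has_witness(u, v, pool):
    """True if some word of pool lies strictly between u and v."""
    return any(u < w < v for w in pool)

sample = words(3)  # the words being tested
pool = words(4)    # candidate witnesses, allowing one extra block

# Density on the sample: every pair u < v has a word strictly in between.
dense_on_sample = all(
    has_witness(u, v, pool)
    for i, u in enumerate(sample)
    for v in sample[i + 1:]
)

# No endpoints on the sample: every word has a smaller and a larger word in pool.
no_endpoints = all(
    any(w < u for w in pool) and any(w > u for w in pool)
    for u in sample
)
print(dense_on_sample, no_endpoints)
```

One extra block is enough here because a witness can always be built from the two words themselves: if they first differ inside a block position, the common prefix followed by 01 lies between them; otherwise appending one further block before the final 01 of the appropriate word does.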

The paper also discusses related work: the isomorphism problem for regular linear orders is known to be decidable (Thomas 1986), while its status for deterministic CFL orders remains open. The authors note that while checking whether a CFG’s lexicographic order is scattered or a well‑ordering can be done in polynomial time, density is undecidable. They conjecture that a PTIME algorithm might exist for the scattered/well‑ordering decision for arbitrary CFGs, but leave this as future work.

In conclusion, the authors have established two fundamental limits: (1) there is no algorithm to decide density of the lexicographic ordering of a context‑free language, and (2) there is no algorithm to decide whether two context‑free (or one context‑free and one regular) languages have lexicographic orderings of the same order type. These results deepen our understanding of the expressive power of CFLs from an order‑theoretic perspective and open new avenues for exploring decidability boundaries in formal language theory.

