Context-Free Recognition with Transformers
Transformers excel empirically on tasks that process well-formed inputs according to some grammar, such as natural language and code. However, it remains unclear how they process grammatical syntax. In fact, under standard complexity conjectures, standard transformers cannot recognize context-free languages (CFLs), a canonical formalism for describing syntax, or even regular languages, a subclass of CFLs. Past work proves that $\mathcal{O}(\log(n))$ looping layers (w.r.t. input length n) allow transformers to recognize regular languages, but the question of context-free recognition remained open. In this work, we show that looped transformers with $\mathcal{O}(\log(n))$ looping layers and $\mathcal{O}(n^6)$ padding tokens can recognize all CFLs. However, training and inference with $\mathcal{O}(n^6)$ padding tokens is potentially impractical. Fortunately, we show that, for natural subclasses such as unambiguous CFLs, the recognition problem on transformers becomes more tractable, requiring $\mathcal{O}(n^3)$ padding. We empirically validate our results and show that looping helps on a language that provably requires logarithmic depth. Overall, our results shed light on the intricacy of CFL recognition by transformers: While general recognition may require an intractable amount of padding, natural constraints such as unambiguity yield efficient recognition algorithms.
💡 Research Summary
This paper investigates the ability of transformer architectures to recognize context‑free languages (CFLs), a fundamental class of formal languages that captures hierarchical syntactic structure. While empirical evidence shows that transformers excel on natural language and code—both of which exhibit well‑formed grammatical patterns—theoretical work has established that standard, fixed‑depth transformers reside in the complexity class TC⁰. Under the widely accepted conjecture TC⁰ ⊊ NC¹, fixed‑depth transformers cannot even recognize regular languages, let alone CFLs. Recent research demonstrated that augmenting transformers with a logarithmic number of “looped” layers (i.e., repeating a block of layers O(log n) times for an input of length n) suffices to recognize regular languages. The open question, which this paper resolves, is whether the same looping mechanism enables recognition of the richer CFL class.
The authors introduce the notion of a looped transformer: a transformer whose middle block of layers is repeated O(d(n)) times, where d(n) is a function of the input length. They focus on the case d(n)=log n. To prove that such a model can recognize any CFL, they construct a parallel parsing algorithm based on items of the form (
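For reference, the classic sequential analogue of item-based CFL recognition is the CYK algorithm, which fills a chart of items "(nonterminal A derives the span word[i:j])" in O(n³) time for a grammar in Chomsky normal form. The sketch below is this textbook algorithm, not the paper's parallel construction, and the small aⁿbⁿ grammar in the usage note is an illustrative example of our own.

```python
def cyk_recognize(word, start, unit_rules, binary_rules):
    """Textbook CYK recognition for a grammar in Chomsky normal form.

    unit_rules:   dict terminal -> set of nonterminals  (rules A -> a)
    binary_rules: dict (B, C)   -> set of nonterminals  (rules A -> B C)
    Chart items are effectively triples (A, i, j): A derives word[i:j]."""
    n = len(word)
    if n == 0:
        return False  # CNF grammars here derive no empty string
    # chart[i][j] = set of nonterminals deriving word[i:j]
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, a in enumerate(word):
        chart[i][i + 1] = set(unit_rules.get(a, ()))
    for span in range(2, n + 1):          # span length
        for i in range(n - span + 1):     # span start
            j = i + span
            for k in range(i + 1, j):     # split point
                for B in chart[i][k]:
                    for C in chart[k][j]:
                        chart[i][j] |= binary_rules.get((B, C), set())
    return start in chart[0][n]
```

For example, with the CNF grammar S → AB | AT, T → SB, A → a, B → b (which generates aⁿbⁿ for n ≥ 1), `cyk_recognize('aabb', 'S', ...)` accepts while `'aab'` is rejected. The paper's contribution is showing how a looped transformer with polynomial padding can simulate this kind of chart computation in logarithmic depth.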