Modeling Language as a Sequence of Thoughts

Reading time: 1 minute

📝 Original Info

  • Title: Modeling Language as a Sequence of Thoughts
  • ArXiv ID: 2512.25026
  • Date: 2025-12-31
  • Authors: Nasim Borazjanizadeh, James McClelland

📝 Abstract

Transformer language models can generate strikingly natural text by modeling language as a sequence of tokens. Yet, by relying primarily on surface-level co-occurrence statistics, they fail to form globally consistent latent representations of entities and events, a gap that contributes to poor relational generalization (the reversal curse), contextualization errors, and data inefficiency. On the other hand, cognitive science shows that human comprehension involves converting the input linguistic stream into compact, event-like representations that persist in memory while verbatim form is short-lived. Motivated by these cognitive findings, we introduce the Thought Gestalt (TG) Model, a recurrent transformer that models language at two levels of abstraction: tokens and sentence-level "thought" states. TG generates the tokens of one sentence at a time while cross-attending to a working memory of prior sentence representations. In TG, token and sentence representations are generated using a shared stack of transformer blocks and trained with a single objective, the next-token prediction loss: by retaining the computation graph of sentence representations written to the working memory, gradients from future token losses flow backward through cross-attention to optimize the pa...
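
To make the two-level scheme in the abstract concrete, here is a minimal sketch, not the authors' code, of a shared transformer stack that generates the tokens of one sentence at a time while cross-attending to a working memory of prior sentence "thought" states, with the memory's computation graph retained so later token losses backpropagate into earlier sentence representations. The class name ThoughtGestaltSketch, the mean-pooling used to form each thought vector, and all hyperparameters are illustrative assumptions; the paper's actual pooling and memory mechanics may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ThoughtGestaltSketch(nn.Module):
    """Hypothetical sketch of a two-level (token / sentence) recurrent transformer."""

    def __init__(self, vocab_size=1000, d_model=256, n_heads=4, n_layers=4, max_len=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.pos = nn.Embedding(max_len, d_model)
        # A single shared stack: self-attention over the current sentence's tokens,
        # cross-attention over the working memory of prior sentence representations.
        layer = nn.TransformerDecoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True
        )
        self.blocks = nn.TransformerDecoder(layer, num_layers=n_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, sentences):
        """sentences: list of LongTensors, each (batch, sent_len), one per sentence."""
        batch = sentences[0].size(0)
        device = sentences[0].device
        d_model = self.embed.embedding_dim
        # Working memory starts with a single zero "thought". No .detach() is applied,
        # so gradients from future sentences' token losses flow back through it.
        memory = torch.zeros(batch, 1, d_model, device=device)
        logits_per_sentence = []
        for tokens in sentences:
            L = tokens.size(1)
            pos_ids = torch.arange(L, device=device)
            h = self.embed(tokens) + self.pos(pos_ids)
            # Standard causal mask so each token only sees earlier tokens of its sentence.
            causal = torch.triu(
                torch.full((L, L), float("-inf"), device=device), diagonal=1
            )
            h = self.blocks(tgt=h, memory=memory, tgt_mask=causal)
            logits_per_sentence.append(self.lm_head(h))
            # One assumed pooling choice: mean-pool token states into a sentence-level
            # "thought" vector and append it to the working memory.
            thought = h.mean(dim=1, keepdim=True)
            memory = torch.cat([memory, thought], dim=1)
        return logits_per_sentence


# Assumed usage: a single next-token prediction loss summed over sentences.
model = ThoughtGestaltSketch()
sents = [torch.randint(0, 1000, (2, 12)) for _ in range(3)]
logits = model(sents)
loss = sum(
    F.cross_entropy(l[:, :-1].reshape(-1, 1000), s[:, 1:].reshape(-1))
    for l, s in zip(logits, sents)
)
loss.backward()  # gradients reach earlier sentences' thought vectors via the memory
```

The key design point this sketch illustrates is that the sentence-level states are produced by the same blocks that produce token states and are trained only through the next-token loss of later sentences, rather than by a separate objective.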

📄 Full Content

(The full text is omitted here due to length. Please see the complete article on the site.)
