Teaching Pretrained Language Models to Think Deeper with Retrofitted Recurrence

Reading time: 1 minute

📝 Original Info

  • Title: Teaching Pretrained Language Models to Think Deeper with Retrofitted Recurrence
  • ArXiv ID: 2511.07384
  • Date: 2025-11-10
  • Authors: Not provided in the source metadata (see the original paper for author names, affiliations, and contact information).

📝 Abstract

Recent advances in depth-recurrent language models show that recurrence can decouple train-time compute and parameter count from test-time compute. In this work, we study how to convert existing pretrained non-recurrent language models into depth-recurrent models. We find that using a curriculum of recurrences to increase the effective depth of the model over the course of training preserves performance while reducing total computational cost. In our experiments on mathematics, we observe that converting pretrained models to recurrent ones results in better performance at a given compute budget than simply post-training the original non-recurrent language model.
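
The abstract combines two ideas: a depth-recurrent model reuses a shared block several times so that effective depth (and test-time compute) is decoupled from parameter count, and a curriculum gradually increases the number of recurrences during training. The sketch below illustrates both ideas in PyTorch; the `DepthRecurrentLM` class, the linear `recurrence_curriculum` schedule, and all hyperparameters are illustrative assumptions, not the architecture or schedule used in the paper.

```python
import torch
import torch.nn as nn


class DepthRecurrentLM(nn.Module):
    """Illustrative depth-recurrent wrapper: one shared block is applied
    `num_recurrences` times, so effective depth scales with the loop count
    rather than with the number of parameters."""

    def __init__(self, d_model=512, n_heads=8, vocab_size=32000):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # A single shared layer reused at every recurrence step.
        self.shared_block = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True
        )
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, input_ids, num_recurrences=4):
        h = self.embed(input_ids)
        for _ in range(num_recurrences):
            h = self.shared_block(h)  # effective depth grows with each pass
        return self.lm_head(h)


def recurrence_curriculum(step, total_steps, min_rec=1, max_rec=8):
    """Hypothetical linear curriculum: start shallow, end deep.
    The paper only states that a curriculum of recurrences is used;
    the linear ramp here is an assumption for illustration."""
    frac = step / max(total_steps - 1, 1)
    return min_rec + round(frac * (max_rec - min_rec))


if __name__ == "__main__":
    model = DepthRecurrentLM()
    tokens = torch.randint(0, 32000, (2, 16))  # dummy token batch
    for step in (0, 500, 999):
        r = recurrence_curriculum(step, total_steps=1000)
        logits = model(tokens, num_recurrences=r)
        print(f"step {step}: recurrences={r}, logits shape={tuple(logits.shape)}")
```

At inference time the same loop allows spending more or less compute on a query by raising or lowering `num_recurrences`, which is the decoupling of test-time compute from parameter count that the abstract highlights.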


Reference

This content is AI-processed from open-access arXiv data.
