Teaching Pretrained Language Models to Think Deeper with Retrofitted Recurrence
📝 Original Info
- Title: Teaching Pretrained Language Models to Think Deeper with Retrofitted Recurrence
- ArXiv ID: 2511.07384
- Date: 2025-11-10
- Authors: Not listed in the provided metadata. (See the original paper for author names, affiliations, and contact information.)
📝 Abstract
Recent advances in depth-recurrent language models show that recurrence can decouple train-time compute and parameter count from test-time compute. In this work, we study how to convert existing pretrained non-recurrent language models into depth-recurrent models. We find that using a curriculum of recurrences to increase the effective depth of the model over the course of training preserves performance while reducing total computational cost. In our experiments on mathematics, we observe that converting pretrained models to recurrent ones results in better performance at a given compute budget than simply post-training the original non-recurrent language model.
💡 Deep Analysis
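The abstract describes the core idea: a shared block of layers is applied repeatedly, so effective depth (and test-time compute) can be varied without changing the parameter count, and a curriculum ramps the recurrence count up during training. The paper's actual conversion recipe and architecture are not included in this summary, so the sketch below is a minimal, illustrative PyTorch toy only; the names `RecurrentBlock`, `DepthRecurrentLM`, and `recurrence_for_step`, the layer choices, and the linear curriculum schedule are all assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class RecurrentBlock(nn.Module):
    """Hypothetical weight-shared transformer block (causal masking omitted for brevity)."""

    def __init__(self, d_model: int, n_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        x = x + attn_out
        return x + self.mlp(self.norm2(x))


class DepthRecurrentLM(nn.Module):
    """Applies the same block `num_recurrences` times: test-time depth and compute
    can be scaled without adding parameters."""

    def __init__(self, vocab_size: int, d_model: int = 256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.block = RecurrentBlock(d_model)  # one set of weights, reused every iteration
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens: torch.Tensor, num_recurrences: int) -> torch.Tensor:
        x = self.embed(tokens)
        for _ in range(num_recurrences):  # effective depth = num_recurrences
            x = self.block(x)
        return self.head(x)


def recurrence_for_step(step: int, total_steps: int, max_recurrences: int) -> int:
    """Toy linear curriculum (assumed, not the paper's schedule): start shallow and
    increase the recurrence count over training, so early steps are cheaper."""
    frac = step / max(total_steps - 1, 1)
    return 1 + round(frac * (max_recurrences - 1))


if __name__ == "__main__":
    model = DepthRecurrentLM(vocab_size=1000)
    tokens = torch.randint(0, 1000, (2, 16))  # (batch, sequence)
    for step in (0, 500, 999):
        r = recurrence_for_step(step, total_steps=1000, max_recurrences=8)
        logits = model(tokens, num_recurrences=r)
        print(f"step={step:4d} recurrences={r} logits shape={tuple(logits.shape)}")
```

In this toy setup, retrofitting a pretrained non-recurrent model would amount to initializing the shared block from existing pretrained layers and then continuing training under the recurrence curriculum; how that initialization and schedule are actually done is exactly what the paper studies.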
Reference
- arXiv:2511.07384 — https://arxiv.org/abs/2511.07384
- This content is AI-processed based on open-access ArXiv data.