Infini-Attention: Pushing the Limits of Small-Scale Pretraining

Reading time: 2 minutes
...

📝 Original Paper Info

- Title: Probing the Limits of Compressive Memory: A Study of Infini-Attention in Small-Scale Pretraining
- ArXiv ID: 2512.23862
- Date: 2025-12-29
- Authors: Ruizhe Huang, Kexuan Zhang, Yihao Fang, Baifeng Yu

📝 Abstract

This study investigates small-scale pretraining for Small Language Models (SLMs) to enable efficient use of limited data and compute, improve accessibility in low-resource settings, and reduce costs. To enhance long-context extrapolation in compact models, we focus on Infini-attention, which builds a compressed memory from past segments while preserving local attention. In our work, we conduct an empirical study using 300M-parameter LLaMA models pretrained with Infini-attention. The model demonstrates training stability and outperforms the baseline in long-context retrieval. We identify the balance factor as a key determinant of model performance, and we find that retrieval accuracy drops with repeated memory compressions over long sequences. Even so, Infini-attention still effectively compensates for the SLM's limited parameters. In particular, despite performance degradation at a 16,384-token context length, the Infini-attention model achieves up to 31% higher accuracy than the baseline. Our findings suggest that achieving robust long-context capability in SLMs benefits from architectural memory like Infini-attention.
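
As a rough illustration of the mechanism described in the abstract, the sketch below implements a simplified, single-head Infini-attention step in PyTorch: a linear-attention-style compressive memory is read with a non-negative feature map, mixed with ordinary causal attention through a sigmoid-gated balance factor, and then updated with the current segment's keys and values. The function names, the ELU + 1 feature map, and all dimensions are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn.functional as F


def elu_plus_one(x):
    # Non-negative feature map commonly used with linear attention (ELU + 1).
    return F.elu(x) + 1.0


def infini_attention_segment(q, k, v, memory, z, beta):
    """Simplified single-head Infini-attention over one segment (illustrative sketch).

    q, k, v : (seg_len, d) query/key/value projections for the current segment
    memory  : (d, d) compressive memory accumulated from past segments
    z       : (d, 1) normalization term accumulated alongside the memory
    beta    : scalar balance factor (learned in practice; plain tensor here)
    """
    seg_len, d = q.shape

    # 1) Read from the compressive memory built over previous segments.
    sigma_q = elu_plus_one(q)                              # (seg_len, d)
    a_mem = (sigma_q @ memory) / (sigma_q @ z + 1e-6)      # (seg_len, d)

    # 2) Ordinary causal dot-product attention within the segment.
    scores = (q @ k.T) / d ** 0.5                          # (seg_len, seg_len)
    causal_mask = torch.triu(
        torch.ones(seg_len, seg_len, dtype=torch.bool), diagonal=1
    )
    scores = scores.masked_fill(causal_mask, float("-inf"))
    a_local = torch.softmax(scores, dim=-1) @ v            # (seg_len, d)

    # 3) Mix the memory read-out and local attention via the gated balance factor.
    gate = torch.sigmoid(beta)
    out = gate * a_mem + (1.0 - gate) * a_local

    # 4) Compress the current segment into the memory for future segments.
    sigma_k = elu_plus_one(k)                              # (seg_len, d)
    memory = memory + sigma_k.T @ v                        # (d, d)
    z = z + sigma_k.sum(dim=0, keepdim=True).T             # (d, 1)
    return out, memory, z
```

The gate in step 3 is the "balance factor" the abstract highlights: it decides how much each token relies on the compressed summary of earlier segments versus fresh local attention.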

💡 Summary & Analysis

1. Contribution 1: An empirical study of Infini-attention in small-scale pretraining with 300M-parameter LLaMA models. The compressive memory works like a running notebook that lets a small model keep consulting earlier segments, yielding stable training and better long-context retrieval than the baseline.
2. Contribution 2: The balance factor, the gate that mixes the memory read-out with local attention, is identified as a key driver of model performance.
3. Contribution 3: Retrieval accuracy degrades as the memory is compressed again and again over long sequences, yet Infini-attention still compensates for the SLM's limited parameters, reaching up to 31% higher accuracy than the baseline at a 16,384-token context (see the segment-by-segment sketch after this list).
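
To make the "repeated memory compressions" point concrete, the hypothetical driver below (reusing infini_attention_segment from the sketch above) feeds a 16,384-token context through the layer as eight 2,048-token segments, carrying the compressive memory across segments. The segment length, dimensions, and random inputs are placeholder assumptions, not the paper's settings.

```python
# Hypothetical driver: a 16,384-token context processed as 8 x 2,048-token
# segments, with the compressive memory carried across segments.
d, seg_len, n_segments = 64, 2048, 8
memory = torch.zeros(d, d)
z = torch.zeros(d, 1)
beta = torch.tensor(0.0)  # the balance factor is learned during pretraining

for _ in range(n_segments):
    q = torch.randn(seg_len, d)
    k = torch.randn(seg_len, d)
    v = torch.randn(seg_len, d)
    out, memory, z = infini_attention_segment(q, k, v, memory, z, beta)
```

Each pass through the loop overwrites more of the fixed-size memory with new segment statistics, which is exactly where the paper observes retrieval accuracy starting to erode.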

A Note of Gratitude

The copyright of this content belongs to the respective researchers. We deeply appreciate their hard work and contribution to the advancement of human civilization.
