AdmTree: Compressing Lengthy Context with Adaptive Semantic Trees
Reading time: 5 minutes
...
📝 Original Info
Title: AdmTree: Compressing Lengthy Context with Adaptive Semantic Trees
ArXiv ID: 2512.04550
Date: 2025-12-04
Authors: Yangning Li, Shaoshen Chen, Yinghui Li, Yankai Chen, Hai-Tao Zheng, Hui Wang, Wenhao Jiang, Philip S. Yu
📝 Abstract
The quadratic complexity of self-attention constrains Large Language Models (LLMs) in processing long contexts, a capability essential for many advanced applications. Context compression aims to alleviate this computational bottleneck while retaining critical semantic information. However, existing approaches often fall short: explicit methods may compromise local detail, whereas implicit methods can suffer from positional biases, information degradation, or an inability to capture long-range semantic dependencies. We propose AdmTree, a novel framework for adaptive, hierarchical context compression with a central focus on preserving high semantic fidelity while maintaining efficiency. AdmTree dynamically segments input based on information density, utilizing gist tokens to summarize variable-length segments as the leaves of a semantic binary tree. This structure, together with a lightweight aggregation mechanism and a frozen backbone LLM (thereby minimizing new trainable parameters), enables efficient hierarchical abstraction of the context. By preserving fine-grained details alongside global semantic coherence, mitigating positional bias, and dynamically adapting to content, AdmTree robustly retains the semantic information of long contexts.
💡 Deep Analysis
📄 Full Content
AdmTree: Compressing Lengthy Context with Adaptive Semantic Trees
Yangning Li1,2∗, Shaoshen Chen1∗, Yinghui Li1‡, Yankai Chen3, Hai-Tao Zheng1,2‡, Hui Wang2, Wenhao Jiang4‡, Philip S. Yu3
1Shenzhen International Graduate School, Tsinghua University
2Peng Cheng Laboratory
3University of Illinois Chicago
4Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ)
Abstract
The quadratic complexity of self-attention constrains Large Language Models (LLMs) in processing long contexts, a capability essential for many advanced applications. Context compression aims to alleviate this computational bottleneck while retaining critical semantic information. However, existing approaches often fall short: explicit methods may compromise local detail, whereas implicit methods can suffer from positional biases, information degradation, or an inability to capture long-range semantic dependencies. We propose AdmTree, a novel framework for adaptive, hierarchical context compression with a central focus on preserving high semantic fidelity while maintaining efficiency. AdmTree dynamically segments input based on information density, utilizing gist tokens to summarize variable-length segments as the leaves of a semantic binary tree. This structure, together with a lightweight aggregation mechanism and a frozen backbone LLM (thereby minimizing new trainable parameters), enables efficient hierarchical abstraction of the context. By preserving fine-grained details alongside global semantic coherence, mitigating positional bias, and dynamically adapting to content, AdmTree robustly retains the semantic information of long contexts.
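To make the data structure concrete, here is a minimal, self-contained Python sketch of a semantic binary tree whose leaves hold gist vectors for variable-length segments and whose parents aggregate their children. All names (`SemanticNode`, `build_tree`, `aggregate`) and the element-wise mean aggregation are illustrative assumptions, not the paper's implementation, which uses learned gist tokens produced with a frozen backbone LLM.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class SemanticNode:
    gist: List[float]                      # latent summary vector for this span
    left: Optional["SemanticNode"] = None
    right: Optional["SemanticNode"] = None

def aggregate(a: List[float], b: List[float]) -> List[float]:
    """Stand-in for the paper's lightweight aggregation: an element-wise mean."""
    return [(x + y) / 2 for x, y in zip(a, b)]

def build_tree(leaf_gists: List[List[float]]) -> SemanticNode:
    """Merge leaf gists pairwise, bottom-up, into a roughly balanced binary tree."""
    nodes = [SemanticNode(g) for g in leaf_gists]
    while len(nodes) > 1:
        merged = []
        for i in range(0, len(nodes) - 1, 2):
            l, r = nodes[i], nodes[i + 1]
            merged.append(SemanticNode(aggregate(l.gist, r.gist), l, r))
        if len(nodes) % 2:                 # odd node out is carried up unchanged
            merged.append(nodes[-1])
        nodes = merged
    return nodes[0]

# Four variable-length segments, each already summarized into a 2-d gist vector.
root = build_tree([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [1.0, 1.0]])
print(root.gist)                           # root = coarsest abstraction of the context
```

Keeping intermediate nodes means a reader of the tree can answer global questions from upper levels while descending to leaves for fine-grained detail, which is the intuition behind the hierarchy.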
1 Introduction
Large Language Models (LLMs) [2, 3, 37, 38, 46, 50, 68, 75] have demonstrated remarkable proficiency in processing and understanding long contexts [9, 10, 59, 60], enabling advances in retrieval-augmented generation [47, 61, 78], agentic systems [39, 77], and beyond. However, handling long contexts remains computationally intensive because the complexity of self-attention grows quadratically with input token length, leading to high memory consumption and inference latency. Consequently, context compression has emerged as a critical technique, aiming to reduce input token length while preserving maximal semantic integrity.
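To see why the quadratic term bites, the following back-of-the-envelope sketch (not from the paper) tallies the size of the n × n attention-score matrix at a few context lengths, assuming a single fp16 attention head:

```python
# Self-attention materializes an n x n score matrix, so cost grows
# quadratically with input length n. Illustrative numbers only.
for n in (4_096, 32_768, 131_072):
    scores = n * n              # one attention-score entry per token pair
    bytes_fp16 = scores * 2     # 2 bytes per fp16 entry
    print(f"n={n:>7,}: {scores:>17,} entries = {bytes_fp16 / 2**30:8.2f} GiB")
```

A 32x increase in context length (4K to 131K tokens) inflates the score matrix roughly 1,000x, which is the bottleneck context compression targets.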
Despite promising results, most existing methods fail to simultaneously preserve information across multiple semantic dimensions, such as global versus local semantics, or information across positions. This failure to preserve different dimensions of information can lead to poor generalization across real-world tasks, which demand distinct types of semantic information. Specifically, existing methods fall into two main categories. Explicit methods [30, 79, 82] directly shorten text by removing content deemed less essential for overall understanding. Although these methods effectively capture global meaning, they often disrupt local coherence through excessive omission, losing fine-grained details. In contrast, implicit methods [18, 23, 42, 62] encode long contexts into compact latent vectors (also called "gist tokens") in a flat manner. These methods achieve higher compression ratios, but their compression effectiveness varies with the position of content in the context.
∗Equal Contribution. ‡: Corresponding Author. This work was primarily conducted at GML under the
leadership of Wenhao.
39th Conference on Neural Information Processing Systems (NeurIPS 2025).
[Figure 1: Pre-experiments demonstrate that existing methods struggle to balance semantic information of different dimensions for different types of tasks. (a) Multi-granularity summarization (book, chapter, and paragraph levels): implicit methods tend to prioritize the preservation of global information. (b) Multi-document question answering: explicit methods exhibit varying compression effectiveness across content at different positions. Methods compared: LongLLMLingua, SnapKV, Beacon, AdmTree (Ours).]
As evidenced by [7, 27, 57], implicit compression methods are prone to positional bias, often overlooking information from the earlier or middle parts of the context; in other words, less salient information is easily overshadowed by more prominent content.
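The flat compression pattern described above can be sketched as follows. The random-vector encoder and mean pooling are toy stand-ins for a real LLM (the shape of the computation, not its quality, is the point): every region of the context receives the same gist budget regardless of its information density or position.

```python
import random

def encode(tokens: list[str]) -> list[list[float]]:
    """Hypothetical encoder: one 4-d hidden vector per token (toy stand-in)."""
    random.seed(0)
    return [[random.random() for _ in range(4)] for _ in tokens]

def flat_gist_compress(tokens: list[str], num_gists: int) -> list[list[float]]:
    """Compress the full sequence into `num_gists` latent vectors by uniform
    chunking + pooling. Each region gets an identical budget, so dense or
    early/middle content is as compressed as everything else -- the
    weakness the positional-bias results above point at."""
    hidden = encode(tokens)
    chunk = max(1, len(hidden) // num_gists)
    gists = []
    for i in range(0, len(hidden), chunk):
        block = hidden[i:i + chunk]
        gists.append([sum(col) / len(block) for col in zip(*block)])
    return gists[:num_gists]

context = ["tok"] * 1000
print(len(flat_gist_compress(context, num_gists=16)))  # 16 latent vectors
```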
To mitigate the semantic loss caused by positional bias, some implicit methods have also explored recursive compression [14, 24, 83]. These methods progressively condense segmented contexts into serialized gist tokens. However, they often rely on fixed-size segments without considering variations in information density, leading to imbalanced compression loads across different input regions. Moreover, such a linear scheme still causes semantic information to degrade progressively during recursive compression, making it difficult to capture long-range dependencies and maintain global coherence.
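The contrast between the fixed-size segmentation criticized here and a density-adaptive alternative in the spirit of AdmTree's dynamic segmentation can be sketched as follows. The per-token density scores and the accumulation-budget rule are illustrative assumptions, not the paper's actual segmentation criterion:

```python
def fixed_segments(n_tokens: int, size: int) -> list[tuple[int, int]]:
    """Uniform windows: every region bears the same compression load."""
    return [(i, min(i + size, n_tokens)) for i in range(0, n_tokens, size)]

def adaptive_segments(density: list[float], budget: float) -> list[tuple[int, int]]:
    """Cut a new segment whenever accumulated density exceeds `budget`,
    so information-dense regions get shorter segments (less loss)."""
    segments, start, acc = [], 0, 0.0
    for i, d in enumerate(density):
        acc += d
        if acc >= budget:
            segments.append((start, i + 1))
            start, acc = i + 1, 0.0
    if start < len(density):
        segments.append((start, len(density)))
    return segments

# A dense middle region -> the adaptive policy cuts it more finely.
density = [0.1] * 40 + [0.9] * 20 + [0.1] * 40
print(fixed_segments(100, 25))           # four uniform 25-token windows
print(adaptive_segments(density, 4.0))   # short segments over the dense middle
```

Under this toy policy the dense middle 20 tokens are split into four short segments while each sparse flank remains one long segment, balancing the compression load that uniform chunking distributes evenly.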
To overcome these limitations, we draw inspiration from cognitive science,