A memory versus compression ratio trade-off in PPM via compressed context modeling
Since its introduction, prediction by partial matching (PPM) has been a de facto gold standard in lossless text compression, and many variants improving its compression ratio and speed have been proposed. However, reducing the high space requirement of PPM schemes has received far less attention. This study focuses on reducing the memory consumption of PPM via the recently proposed compressed context modeling (CCM), which uses compressed representations of contexts in the statistical model. Unlike the classical definition of context as the string of characters preceding a given position, CCM treats context as the amount of preceding information, namely the bit stream produced by compressing the previous symbols. We observe that with CCM the data structures, particularly the context trees, can be implemented in less space, and we present a trade-off between compression ratio and space requirement. The experiments conducted show that this trade-off is especially beneficial at low orders, yielding approximately a 20–25 percent memory saving at the cost of up to nearly 7 percent loss in compression ratio.
💡 Research Summary
The paper addresses the long‑standing memory consumption problem of Prediction by Partial Matching (PPM), a benchmark lossless text compressor known for its excellent compression ratios but also for the rapid growth of its context trees. Traditional PPM defines a context as the raw string of the preceding symbols; as the order increases, the number of distinct contexts explodes, leading to large hash tables or tree structures that dominate memory usage. While many works have focused on speeding up PPM or improving its compression ratio, relatively few have tackled memory reduction directly.
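To make the context explosion concrete, a toy count of distinct order-k contexts can be sketched as follows (an illustration only, not code from the paper; `distinct_contexts` is a hypothetical helper):

```python
def distinct_contexts(text: str, order: int) -> int:
    """Count the distinct order-k contexts, i.e. the distinct
    k-character strings that precede some position in the text."""
    return len({text[i - order:i] for i in range(order, len(text) + 1)})

# Even a short, repetitive toy corpus shows rapid growth with the order.
sample = "the theme of the thesis is the theory of the thing " * 20
for k in range(6):
    print(k, distinct_contexts(sample, k))
```

In a classical PPM implementation, each of these distinct contexts requires its own node (or hash-table entry) with frequency counts, which is why memory grows so quickly with the order.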
The authors introduce Compressed Context Modeling (CCM), a novel way of representing contexts. Instead of storing the literal preceding characters, CCM first compresses the sequence of previous symbols using a lightweight compressor (in the experiments, an LZ78‑style encoder) and then treats the resulting bitstream as the context identifier. Consequently, the “length” of a context is measured in bits rather than characters, and because compression reduces the number of bits needed to convey the same information, the branching factor of the context tree shrinks dramatically. This redefinition allows the same statistical model to be stored in a much smaller data structure without changing the underlying probability estimation mechanism of PPM.
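The idea of deriving a context identifier from a compressed history can be sketched with a greedy LZ78-style parse (a minimal sketch under assumed encoding details; the function names and the exact index/literal bit widths are illustrative, not the authors' encoder):

```python
def lz78_bitstream(s: str) -> str:
    """Greedy LZ78 parse of s; returns the emitted codes as a '0'/'1' string.
    Each phrase emits a dictionary index (in just enough bits to address the
    current dictionary) followed by an 8-bit literal for the new character."""
    dictionary = {"": 0}
    out, phrase = [], ""
    for ch in s:
        if phrase + ch in dictionary:
            phrase += ch
            continue
        width = max(1, (len(dictionary) - 1).bit_length())
        out.append(format(dictionary[phrase], f"0{width}b"))
        out.append(format(ord(ch), "08b"))
        dictionary[phrase + ch] = len(dictionary)
        phrase = ""
    if phrase:  # flush a trailing phrase that matched the dictionary
        width = max(1, (len(dictionary) - 1).bit_length())
        out.append(format(dictionary[phrase], f"0{width}b"))
    return "".join(out)

def compressed_context(history: str, max_bits: int) -> str:
    """CCM-style context key: the most recent max_bits bits of the
    compressed history, rather than the raw preceding characters."""
    return lz78_bitstream(history)[-max_bits:]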
Implementation-wise, the authors augment a standard PPM implementation with two additional modules: (1) a real‑time compressor that processes the already‑encoded symbols and emits a bitstream, and (2) a decoder that can reconstruct the original symbols when needed for probability updates. The context tree nodes are indexed by the compressed bit patterns instead of raw character strings, and a dynamic depth limit is imposed: when the compressed bit length exceeds a predefined threshold, the algorithm stops extending the context, thereby preventing uncontrolled growth of the tree. This approach not only reduces memory but also improves cache locality because the tree becomes shallower, especially at low orders.
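A context trie keyed by compressed bits with a hard bit-length threshold could look like the following sketch (an illustration of the indexing idea under assumed data-structure choices, not the authors' implementation; the class and method names are hypothetical):

```python
class CCMContextTree:
    """Binary trie keyed by the most recent bits of the compressed history.
    The bit-length threshold max_bits caps the trie depth."""

    def __init__(self, max_bits: int):
        self.max_bits = max_bits
        self.root = {"counts": {}, "children": {}}

    def update(self, context_bits: str, symbol: str) -> None:
        """Record symbol under every context prefix up to the threshold,
        mirroring how PPM updates all orders from 0 upward."""
        node = self.root
        for bit in reversed(context_bits[-self.max_bits:]):  # nearest bits first
            node["counts"][symbol] = node["counts"].get(symbol, 0) + 1
            node = node["children"].setdefault(bit, {"counts": {}, "children": {}})
        node["counts"][symbol] = node["counts"].get(symbol, 0) + 1

    def counts(self, context_bits: str) -> dict:
        """Frequency counts for the longest matching compressed context."""
        node = self.root
        for bit in reversed(context_bits[-self.max_bits:]):
            if bit not in node["children"]:
                return node["counts"]  # back off to the longest match seen
            node = node["children"][bit]
        return node["counts"]
```

Capping the depth at a fixed number of bits, rather than a fixed number of characters, is what makes the tree shallower on redundant data and improves cache locality.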
Experimental evaluation uses several heterogeneous corpora—English novels, news articles, and source code—covering a range of entropy characteristics. For each corpus the authors run both the classic PPM and the CCM‑augmented version at orders 0 through 5. The results show a clear trade‑off. At low orders (0–2) the CCM version saves roughly 20–25 % of the memory required by the conventional implementation. The compression ratio penalty is modest: about 2 % loss at order 0, rising to a maximum of roughly 7 % at order 2. At higher orders (3–5) the memory savings diminish to around 15 % but remain significant, while the compression loss stays within 3–5 %.
Speed measurements reveal an additional benefit: because the context tree is smaller and shallower, cache miss rates drop, leading to a 5–10 % reduction in overall compression/decompression time at low orders. At higher orders the advantage narrows, yet the memory advantage persists.
The paper’s contributions can be summarized as follows: (1) a new definition of context based on compressed bitstreams, enabling a more compact representation of PPM’s statistical model; (2) a quantitative analysis of the memory‑vs‑compression trade‑off, showing that substantial memory reductions are achievable with only modest degradation in compression quality, especially in low‑order settings; (3) empirical evidence that the compact model also yields modest speed improvements due to better cache behavior.
Future work suggested by the authors includes exploring more sophisticated compressors (e.g., BWT‑based or adaptive arithmetic coders) to generate richer compressed contexts, developing adaptive mechanisms that automatically select the optimal compressed‑context length based on data characteristics, and extending the approach to non‑text domains such as binary executables or metadata streams. By addressing the memory bottleneck, CCM makes PPM‑style compressors more viable for memory‑constrained environments such as embedded devices, smartphones, and edge computing platforms, while preserving the algorithm’s hallmark compression performance.