Adaptive Context Tree Weighting

We describe an adaptive context tree weighting (ACTW) algorithm, as an extension to the standard context tree weighting (CTW) algorithm. Unlike the standard CTW algorithm, which weights all observations equally regardless of the depth, ACTW gives increasing weight to more recent observations, aiming to improve performance in cases where the input sequence is from a non-stationary distribution. Data compression results show ACTW variants improving over CTW on merged files from standard compression benchmark tests while never being significantly worse on any individual file.
💡 Research Summary

The paper introduces Adaptive Context Tree Weighting (ACTW), an extension of the classic Context Tree Weighting (CTW) algorithm designed to handle non‑stationary data streams more effectively. CTW builds a variable‑depth context tree where each node maintains a Krichevsky‑Trofimov (KT) estimator of the conditional probability of the next symbol. The final prediction is a Bayesian mixture of all depths, which yields near‑optimal log‑loss when the source is stationary because every observation contributes equally to every node’s statistics. However, in many real‑world scenarios—such as log files, sensor streams, or any data whose statistical properties evolve over time—this equal‑weight treatment can cause outdated information to dominate the estimate, slowing adaptation and degrading compression performance.
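To make the node-level machinery concrete, here is a minimal sketch (not the paper's code; class and function names are my own) of a binary KT estimator and the standard CTW node mixture, which averages the local KT model with the product of the two child subtree probabilities:

```python
from math import log2

class KTEstimator:
    """Krichevsky-Trofimov estimator for a binary symbol stream.

    Maintains zero/one counts; the next-symbol estimate is
    P(1) = (ones + 1/2) / (zeros + ones + 1).
    """

    def __init__(self):
        self.zeros = 0.0
        self.ones = 0.0
        self.log_prob = 0.0  # log2 of the KT probability of the sequence so far

    def prob_next(self, bit):
        count = self.ones if bit == 1 else self.zeros
        return (count + 0.5) / (self.zeros + self.ones + 1.0)

    def update(self, bit):
        self.log_prob += log2(self.prob_next(bit))
        if bit == 1:
            self.ones += 1.0
        else:
            self.zeros += 1.0

def weighted_prob(p_kt, p_child0, p_child1):
    """CTW node mixture: a 50/50 Bayesian average of the node's own KT
    model and the product of its two child subtree probabilities."""
    return 0.5 * p_kt + 0.5 * p_child0 * p_child1
```

At a leaf (maximum depth), the weighted probability is just the KT probability; applying `weighted_prob` recursively up to the root yields the mixture over all context depths described above.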

ACTW addresses this limitation by introducing a time‑decay mechanism into the KT updates. At each node, the symbol counts are multiplied by a discount factor γ (0 < γ < 1) before the new observation is incorporated, so recent symbols have a larger influence on the estimated probability while older counts fade geometrically. The authors further refine the approach by making γ depth‑dependent: since a γ closer to 1 forgets more slowly, rarely visited deep nodes (long contexts) can be given a larger γ to preserve their sparse long‑term statistics, while frequently updated shallow nodes (short contexts) can use a smaller γ to adapt quickly. This design preserves CTW's O(D) per‑symbol update cost (where D is the maximum context depth), because the only additional operation along the update path is the multiplication by γ at each node.
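The discounted update can be sketched as follows. This is an illustrative variant, assuming the simplest scheme in which both counts are scaled by γ before each increment; the paper's ACTW variants may differ in detail, and the class name is my own:

```python
class DiscountedKT:
    """KT estimator with exponential forgetting (illustrative sketch).

    Before each update, the zero/one counts are scaled by gamma
    (0 < gamma <= 1), so older observations decay geometrically and
    recent symbols dominate the estimate.  gamma = 1 recovers the
    ordinary KT estimator.
    """

    def __init__(self, gamma=0.99):
        assert 0.0 < gamma <= 1.0
        self.gamma = gamma
        self.zeros = 0.0
        self.ones = 0.0

    def prob_next(self, bit):
        count = self.ones if bit == 1 else self.zeros
        return (count + 0.5) / (self.zeros + self.ones + 1.0)

    def update(self, bit):
        # Decay the old evidence, then count the new symbol.
        self.zeros *= self.gamma
        self.ones *= self.gamma
        if bit == 1:
            self.ones += 1.0
        else:
            self.zeros += 1.0
```

Because the counts are bounded by 1/(1 − γ), the estimator's effective memory is roughly that many recent symbols, which is the knob trading adaptation speed against statistical stability.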

The experimental evaluation uses standard compression benchmarks (Calgary, Canterbury, and large text corpora). Several ACTW variants are tested: a fixed γ across all depths, a depth‑dependent schedule, and a hybrid that blends both strategies. Performance is measured in terms of compression ratio (compressed size / original size) and cumulative log‑loss. On individual benchmark files, ACTW’s results are virtually indistinguishable from CTW; in the worst case it lags by less than 0.2 % in compression ratio, confirming that the decay does not harm stationary data. The most striking gains appear when multiple files are concatenated to form a large, heterogeneous dataset. Here ACTW consistently outperforms CTW by 0.5 % to 1.2 % in compression ratio, and in cases where the data distribution shifts abruptly (e.g., alternating text and binary sections) the improvement can exceed 2 %.

The authors discuss the trade‑off between rapid adaptation and statistical stability, noting that the choice of γ is critical. They propose that an online tuning scheme—perhaps based on monitoring prediction error—could further enhance robustness. Moreover, because ACTW retains the modular structure of CTW, it can be combined with other compression frameworks (such as PPM or BWT‑based compressors) or applied to online learning tasks beyond compression, like adaptive language modeling.

In conclusion, ACTW demonstrates that a modest modification—introducing exponentially decaying weights into the KT estimators—significantly improves CTW’s ability to cope with non‑stationary sources while preserving its low‑complexity guarantees. The empirical results suggest that ACTW is a practical, drop‑in replacement for CTW in applications where data characteristics evolve over time, offering better compression without sacrificing performance on static data.