An Empirical Study of World Model Quantization


World models learn an internal representation of environment dynamics, enabling agents to simulate and reason about future states within a compact latent space for tasks such as planning, prediction, and inference. However, running world models incurs heavy computational cost and a large memory footprint, making model quantization essential for efficient deployment. To date, the effects of post-training quantization (PTQ) on world models remain largely unexamined. In this work, we present a systematic empirical study of world model quantization using DINO-WM as a representative case, evaluating diverse PTQ methods under both weight-only and joint weight-activation settings. We conduct extensive experiments on different visual planning tasks across a wide range of bit-widths, quantization granularities, and planning horizons up to 50 iterations. Our results show that quantization effects in world models extend beyond standard accuracy and bit-width trade-offs: group-wise weight quantization can stabilize low-bit rollouts, activation quantization granularity yields inconsistent benefits, and quantization sensitivity is highly asymmetric between encoder and predictor modules. Moreover, aggressive low-bit quantization significantly degrades the alignment between the planning objective and task success, leading to failures that cannot be remedied by additional optimization. These findings reveal distinct quantization-induced failure modes in world model-based planning and provide practical guidance for deploying quantized world models under strict computational constraints. The code will be available at https://github.com/huawei-noah/noah-research/tree/master/QuantWM.


💡 Research Summary

This paper presents the first comprehensive empirical investigation of post‑training quantization (PTQ) applied to world models used for visual planning, focusing on the DINO‑WM transformer‑based architecture. World models repeatedly simulate environment dynamics during planning, so quantization errors can accumulate over long horizons, potentially degrading performance in ways that differ from standard classification or language tasks. To address this gap, the authors evaluate a diverse set of PTQ techniques—R‑TN (uniform rounding), OMSE (mean‑squared‑error calibration), AWQ (activation‑aware weight quantization), SmoothQuant (joint weight‑activation smoothing), and OmniQuant (global optimization)—under both weight‑only and joint weight‑activation configurations.
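To make the contrast between the first two PTQ families concrete, here is a minimal NumPy sketch of symmetric round-to-nearest quantization (in the spirit of R-TN) and an MSE-calibrated variant that searches for a clipping scale (in the spirit of OMSE). This is an illustrative simplification, not the paper's implementation; function names and the search grid are assumptions.

```python
import numpy as np

def rtn_quantize(w, bits=4):
    """Round-to-nearest symmetric fake-quantization of a weight tensor."""
    qmax = 2 ** (bits - 1) - 1           # e.g. 7 for signed 4-bit
    scale = np.abs(w).max() / qmax       # one scale for the whole tensor
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale                     # dequantized weights

def omse_quantize(w, bits=4, grid=100):
    """MSE-calibrated variant: search over clipping scales and keep the
    one that minimizes reconstruction error, instead of using the raw max."""
    qmax = 2 ** (bits - 1) - 1
    best, best_err = None, np.inf
    for frac in np.linspace(0.5, 1.0, grid):   # candidate clip fractions
        scale = frac * np.abs(w).max() / qmax
        deq = np.clip(np.round(w / scale), -qmax - 1, qmax) * scale
        err = np.mean((w - deq) ** 2)
        if err < best_err:
            best, best_err = deq, err
    return best
```

Because the search grid includes the un-clipped scale (frac = 1.0), the MSE-calibrated result can never reconstruct the weights worse than plain rounding on the same tensor.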

Experiments are conducted on two benchmark environments: Wall (navigation) and PushT (manipulation). The authors measure success rates across planning horizons ranging from 0 to 50 iterations, thereby capturing both short‑term and long‑term effects. Quantization is explored at 8‑bit, 4‑bit, and 3‑bit precision for weights, and at several joint settings (W8A8, W6A6, W4A8, W4A4). Granularity variations include per‑channel vs. per‑group (group size 128) for weights and per‑tensor vs. per‑token for activations. Calibration data are collected from short random rollouts that are strictly separated from test data.
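The weight-granularity distinction above (per-channel vs. per-group with group size 128) can be sketched as follows. Each group of 128 consecutive weights gets its own scale, so a single outlier only coarsens the resolution of its own group rather than an entire output channel. This is a hedged illustration of the general technique, not the paper's code.

```python
import numpy as np

def group_quantize(w, bits=4, group_size=128):
    """Per-group symmetric quantization: one scale per group of
    `group_size` consecutive weights (assumes w.size is divisible)."""
    qmax = 2 ** (bits - 1) - 1
    flat = w.reshape(-1, group_size)
    scales = np.abs(flat).max(axis=1, keepdims=True) / qmax
    scales = np.maximum(scales, 1e-8)          # guard all-zero groups
    q = np.clip(np.round(flat / scales), -qmax - 1, qmax)
    return (q * scales).reshape(w.shape)

def channel_quantize(w, bits=4):
    """Per-channel baseline: one scale per output row of a weight matrix."""
    qmax = 2 ** (bits - 1) - 1
    scales = np.abs(w).max(axis=1, keepdims=True) / qmax
    scales = np.maximum(scales, 1e-8)
    q = np.clip(np.round(w / scales), -qmax - 1, qmax)
    return q * scales
```

With a planted outlier in one row, the per-group scheme reconstructs the matrix with lower mean-squared error, which is the intuition behind its greater low-bit stability reported later in the findings.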

Key findings are:

  1. 8‑bit quantization is essentially lossless – all PTQ methods achieve success rates comparable to full‑precision (≥ 0.90) across the entire horizon, confirming that modern integer hardware can support world‑model inference without noticeable degradation.

  2. 4‑bit and lower expose substantial sensitivity – weight‑only 4‑bit quantization leads to severe drops in early‑horizon success (≈ 0.04–0.10). However, when weights are quantized in groups of 128, the planner can partially compensate: success rates improve dramatically as the number of iterations grows, reaching > 0.9 at 50 steps for OmniQuant‑group. This suggests that the multiple candidate evaluations inherent in planning act as a form of error averaging.

  3. Activation quantization granularity is inconsistent – per‑tensor activation quantization is more stable than per‑token at low bit‑widths; per‑token often produces extreme clipping of outlier tokens, leading to near‑zero success. At 8‑bit the difference disappears, indicating that granularity matters mainly in aggressive quantization regimes.

  4. Asymmetric module sensitivity – the encoder (visual feature extractor) is far more vulnerable to low‑bit quantization than the predictor (latent dynamics). The encoder’s performance collapses already at 4‑bit, while the predictor remains usable down to 6‑bit. Consequently, allocating higher precision to the encoder while applying aggressive group‑wise quantization to the predictor yields the best trade‑off.

  5. Goal‑success alignment degrades under aggressive quantization – low‑bit models may still report low planning loss (e.g., L2 distance between predicted and goal observations) yet fail to achieve the actual goal. This decoupling indicates that traditional loss‑based metrics are insufficient for evaluating quantized world models; direct success‑rate measurement is essential.

The paper also documents that 3‑bit quantization is currently impractical for these models, as all methods produce near‑zero success.
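The two activation granularities compared in finding 3 can be sketched with the same symmetric scheme: per-tensor uses one scale for the whole activation matrix, while per-token gives each token row its own scale. This is a generic illustration under assumed shapes (tokens × channels), not DINO-WM's actual quantizer; which granularity is more stable at low bit-widths is the paper's empirical finding, not a property of this code.

```python
import numpy as np

def quantize_activations(x, bits=8, granularity="per_tensor"):
    """Symmetric activation fake-quantization at two granularities.
    x: (tokens, channels) activation matrix from one layer."""
    qmax = 2 ** (bits - 1) - 1
    if granularity == "per_tensor":
        scale = np.abs(x).max() / qmax                        # one global scale
    elif granularity == "per_token":
        scale = np.abs(x).max(axis=1, keepdims=True) / qmax   # one scale per token
    else:
        raise ValueError(f"unknown granularity: {granularity}")
    scale = np.maximum(scale, 1e-8)
    return np.clip(np.round(x / scale), -qmax - 1, qmax) * scale
```

At 8-bit both variants reconstruct typical activations almost exactly, consistent with the observation that granularity only matters in aggressive low-bit regimes.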

From these observations, the authors derive practical deployment guidelines: use 8‑bit quantization whenever possible; if memory constraints demand 4‑bit, prefer group‑wise weight quantization and avoid aggressive activation quantization; assign higher bit‑width to the encoder and apply more aggressive compression to the predictor; calibrate with data that reflect the intended planning horizon; and always validate with success‑rate metrics rather than solely relying on loss.
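The guidelines above amount to a per-module precision plan. A minimal sketch of such a plan is shown below; the module names and config keys are hypothetical and do not correspond to DINO-WM's API.

```python
# Hypothetical per-module precision plan following the guidelines:
# keep the (more sensitive) encoder at 8-bit, and compress the predictor
# with 4-bit group-wise weights while leaving its activations at 8-bit.
PRECISION_PLAN = {
    "encoder":   {"w_bits": 8, "a_bits": 8, "w_granularity": "per_channel"},
    "predictor": {"w_bits": 4, "a_bits": 8, "w_granularity": "per_group_128"},
}

DEFAULT = {"w_bits": 8, "a_bits": 8, "w_granularity": "per_channel"}

def plan_for(module_name):
    """Look up the quantization setting for a module, defaulting to 8-bit."""
    return PRECISION_PLAN.get(module_name, DEFAULT)
```

Any module not explicitly listed falls back to the safe 8-bit default, matching the "use 8-bit whenever possible" recommendation.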

In conclusion, this study illuminates distinct failure modes introduced by quantization in world‑model‑based planning, quantifies the interaction between bit‑width, granularity, and planning horizon, and offers actionable recommendations for deploying quantized world models in real‑time robotics, autonomous driving, and large‑scale simulation. Future work is suggested in designing quantization‑aware world‑model architectures, PTQ‑aware fine‑tuning, and extending the analysis to multimodal sensor inputs.

