Compressed Causal Reasoning: Quantization and GraphRAG Effects on Interventional and Counterfactual Accuracy

Reading time: 5 minutes
...

📝 Original Info

  • Title: Compressed Causal Reasoning: Quantization and GraphRAG Effects on Interventional and Counterfactual Accuracy
  • ArXiv ID: 2512.13725
  • Date: 2025-12-13
  • Authors: Steve Nwaiwu, Nipat Jongsawat, Anucha Tungkasthan

📝 Abstract

Causal reasoning in Large Language Models, spanning association, intervention, and counterfactual inference, is essential for reliable decision making in high-stakes settings. As deployment shifts toward edge and resource-constrained environments, quantized models such as INT8 and NF4 are becoming standard. Yet the impact of precision reduction on formal causal reasoning is poorly understood. To our knowledge, this is the first study to systematically evaluate quantization effects across all three levels of Pearl's Causal Ladder. Using a 3,000-sample stratified CLadder benchmark, we find that rung-level accuracy in Llama-3-8B remains broadly stable under quantization, with NF4 showing less than one percent overall degradation. Interventional queries at Rung 2 are the most sensitive to precision loss, whereas counterfactual reasoning at Rung 3 is comparatively stable but exhibits heterogeneous weaknesses across query types such as collider bias and backdoor adjustment. Experiments on the CRASS benchmark show near-identical performance across precisions, indicating that existing commonsense counterfactual datasets lack the structural sensitivity needed to reveal quantization-induced reasoning drift. We further evaluate Graph Retrieval-Augmented Generation (GraphRAG) using ground-truth causal graphs and observe a consistent improvement in NF4 interventional accuracy of +1.7%, partially offsetting compression-related degradation. These results suggest that causal reasoning is unexpectedly robust to 4-bit quantization, graph-structured augmentation can selectively reinforce interventional reasoning, and current counterfactual benchmarks fail to capture deeper causal brittleness. This work provides an initial empirical map of compressed causal reasoning and practical guidance for deploying efficient and structurally supported causal-AI systems.
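For orientation, the three rungs of Pearl's Causal Ladder referenced throughout correspond to the following standard query forms (a textbook formulation, not notation taken from this paper):

```latex
% Rung 1 (association, "seeing"): conditioning on observed evidence
P(y \mid x)
% Rung 2 (intervention, "doing"): the effect of forcing X to take the value x
P(y \mid \mathrm{do}(x))
% Rung 3 (counterfactual, "imagining"): what Y would have been under X = x,
% given that we actually observed X = x' and Y = y'
P(Y_{x} = y \mid X = x', Y = y')
```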

💡 Deep Analysis

Figure 1 (see figure_1.png in the Image Gallery below)

📄 Full Content

COMPRESSED CAUSAL REASONING: QUANTIZATION AND GRAPHRAG EFFECTS ON INTERVENTIONAL AND COUNTERFACTUAL ACCURACY

Steve Nwaiwu, Nipat Jongsawat, Anucha Tungkasthan
School of Data and Information, Rajamangala University of Technology, Pathum Thani, Thailand

ABSTRACT

Causal reasoning in Large Language Models (LLMs), spanning association, intervention, and counterfactual inference, is essential for reliable decision making in high-stakes settings. As deployment shifts toward edge and resource-constrained environments, quantized models (e.g., INT8, NF4) are becoming standard. Yet the impact of precision reduction on formal causal reasoning is poorly understood. To our knowledge, this is the first study to systematically evaluate quantization effects across all three levels of Pearl's Causal Ladder. Using a 3,000-sample stratified CLadder benchmark, we find that rung-level accuracy in Llama-3-8B remains broadly stable under quantization, with NF4 showing less than 1% overall degradation. Interventional queries (Rung 2) are the most sensitive to precision loss, whereas counterfactual reasoning (Rung 3) is comparatively stable but exhibits heterogeneous weaknesses across query types such as collider bias and backdoor adjustment. Experiments on the CRASS benchmark show near-identical performance across precisions, indicating that existing commonsense counterfactual datasets lack the structural sensitivity to reveal quantization-induced reasoning drift. We further evaluate Graph Retrieval-Augmented Generation (GraphRAG) using ground-truth causal graphs and observe a consistent improvement in NF4 interventional accuracy (∆ = +1.7%), partially offsetting compression-related degradation. These results suggest that (a) causal reasoning is unexpectedly robust to 4-bit quantization, (b) graph-structured augmentation can selectively reinforce interventional reasoning, and (c) current counterfactual benchmarks fail to capture deeper causal brittleness. This work provides an initial empirical map of "compressed causal reasoning" and practical guidance for deploying efficient, structurally supported causal-AI systems.

Keywords: Causal Reasoning · Quantization · Large Language Models · Counterfactual Inference · Retrieval-Augmented Generation

1 Introduction

Causal reasoning, the ability to infer how changes in one variable propagate through a system, underpins reliable decision making in domains such as scientific discovery, social policy, risk analysis, and autonomous agents. Formal frameworks such as Pearl's Causal Ladder distinguish three qualitatively different forms of reasoning: association ("seeing"), intervention ("doing"), and counterfactual inference ("imagining"). Together, these levels define a principled hierarchy of causal abstraction, with increasing structural and compositional demands. While modern Large Language Models (LLMs) demonstrate impressive performance on a wide range of language tasks, their capacity to perform such structured causal transformations remains an open and actively debated question. Recent studies show that even state-of-the-art LLMs frequently rely on surface heuristics or correlational shortcuts, leading to systematic failures on intervention and counterfactual queries that require explicit causal reasoning [1, 2]. At the same time, practical deployment constraints are reshaping how LLMs are used in real-world systems.
The rapid expansion of LLM applications to edge devices, offline agents, and cost-sensitive production environments has accelerated the adoption of low-precision quantization. By compressing model weights to 8-bit or 4-bit representations (e.g., INT8, NF4), quantization substantially reduces memory footprint and inference latency, enabling efficient deployment at scale. Weight-only and activation-aware quantization methods have become standard tools for efficient inference [3]. However, existing evaluations of quantized LLMs focus almost exclusively on perplexity, classification accuracy, or generic language understanding tasks. Whether precision reduction preserves the internal representations required for structured causal reasoning, particularly for interventions and counterfactuals, remains largely unexplored.

In parallel, retrieval-augmented generation (RAG) has emerged as a prominent strategy for improving factual accuracy and long-context reasoning in LLMs by incorporating external knowledge at inference time [4, 5]. Despite its success in factual and multi-hop reasoning tasks, relatively little attention has been paid to the role of retrieval in supporting causal reasoning. This omission is notable because causal knowledge is inherently relational: causal dependencies, interventions, and counterfactual contrasts are naturally represented as directed graphs. Graph-structured retrieval, therefore, offers a promising mechanism for injecting explicit causal structure into model inference, potentially compensating for representational di…
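As a concrete illustration of the two precision settings discussed above, the sketch below loads Llama-3-8B with 4-bit NF4 (or, alternatively, 8-bit INT8) weights via Hugging Face transformers and bitsandbytes. The model identifier and configuration values are illustrative assumptions, not the authors' exact setup.

```python
# Minimal sketch: loading Llama-3-8B under NF4 (or INT8) quantization.
# Requires: transformers, accelerate, bitsandbytes. The model ID and
# config values are assumptions for illustration only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"  # assumed checkpoint

# 4-bit NormalFloat (NF4) weights with bfloat16 compute.
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
# 8-bit alternative: BitsAndBytesConfig(load_in_8bit=True)

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    quantization_config=nf4_config,
    device_map="auto",
)
```

Similarly, a minimal way to realize the graph-structured augmentation idea, assuming the ground-truth causal graph of a benchmark item is available, is to serialize the graph's directed edges into the prompt ahead of the causal query. This is only a sketch of the general GraphRAG mechanism; the paper's actual retrieval pipeline may differ, and the example graph and variable names below are hypothetical.

```python
# Sketch: prepending a ground-truth causal graph to a causal query (GraphRAG-style).
import networkx as nx

def graph_context(causal_graph: nx.DiGraph) -> str:
    """Serialize the directed causal edges as plain text for the prompt."""
    edges = ", ".join(f"{u} -> {v}" for u, v in causal_graph.edges())
    return f"Ground-truth causal graph (directed edges): {edges}."

# Hypothetical example: smoking -> tar -> cancer, with a common cause.
g = nx.DiGraph()
g.add_edges_from([
    ("smoking", "tar"),
    ("tar", "cancer"),
    ("genotype", "smoking"),
    ("genotype", "cancer"),
])

query = "If we intervene and set smoking to false, does the probability of cancer decrease?"
prompt = graph_context(g) + "\n" + query
print(prompt)
```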

📸 Image Gallery

figure_1.png

Reference

This content is AI-processed based on open access ArXiv data.
