Inforence: Effective Fault Localization Based on Information-Theoretic Analysis and Statistical Causal Inference
In this paper, a novel approach, Inforence, is proposed to isolate the suspicious code that likely contains faults. Inforence employs a feature selection method, based on mutual information, to identify the bug-related statements that may cause the program to fail. Because the majority of program faults may be revealed as an undesired joint effect of program statements on each other and on the program termination state, Inforence, unlike the state-of-the-art methods, tries to identify and select groups of interdependent statements which together may affect the program failure. The interdependence among the statements is measured according to their mutual effect on each other and on the program termination state. To provide the context of failure, the selected bug-related statements are chained to each other, considering the program's static structure. Eventually, the resultant cause-effect chains are ranked according to their combined causal effect on program failure. To validate Inforence, the results of our experiments with seven sets of programs, including the Siemens suite, gzip, grep, sed, space, make, and bash, are presented. The experimental results are then compared with those provided by different fault localization techniques for both single-fault and multi-fault programs. The experimental results demonstrate that the proposed method outperforms the state-of-the-art techniques.
💡 Research Summary
The paper introduces Inforence, a fault‑localization technique that integrates information‑theoretic feature selection with statistical causal inference to pinpoint suspicious code fragments. Traditional spectrum‑based fault localization (SFL) methods rely on the correlation between individual statements’ execution frequencies and test outcomes, which limits their ability to capture interactions among statements and to handle multiple simultaneous faults. Inforence addresses these shortcomings through a two‑stage process.
Stage 1 – Mutual‑Information‑Based Feature Selection
Each program statement is treated as a binary feature, and test results (pass/fail) serve as the target variable. The method computes the mutual information (MI) between each feature (or feature set) and the failure label, quantifying how much information a statement (or a combination of statements) provides about the program’s termination state. By extending MI to multi‑variable combinations, Inforence can detect joint effects where a group of statements together influences failure, something that single‑statement metrics miss. Efficient kernel‑density and entropy‑approximation techniques are employed to keep the computation tractable even for higher‑order combinations.
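The single-statement and multi-statement MI scoring described above can be sketched as follows. This is a minimal illustrative implementation over a toy coverage matrix, not the authors' code: the variable names, the data, and the restriction to pairwise combinations are all assumptions for demonstration.

```python
# Sketch of MI-based suspiciousness scoring over statement coverage.
# Toy data and names are hypothetical, not from the paper.
from math import log2
from collections import Counter
from itertools import combinations

def mutual_information(xs, ys):
    """I(X; Y) for two equal-length discrete sequences, from empirical counts."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )

# Rows = test runs, columns = statements s0..s3; 1 = statement executed.
coverage = [
    [1, 0, 1, 1],
    [1, 1, 0, 1],
    [0, 1, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 1, 0],
]
labels = [1, 1, 0, 1, 0]  # 1 = test failed, 0 = test passed

# Single-statement MI: how much each statement's coverage tells us
# about the pass/fail outcome (what per-statement SFL metrics approximate).
cols = list(zip(*coverage))
single = {f"s{i}": mutual_information(col, labels) for i, col in enumerate(cols)}

# Joint MI over statement pairs: the pair of coverage bits becomes one
# composite feature, exposing joint effects single-statement scores miss.
pairs = {
    f"s{i}&s{j}": mutual_information(list(zip(cols[i], cols[j])), labels)
    for i, j in combinations(range(len(cols)), 2)
}
```

Note that joint MI of a set is never below the MI of any member, so the interesting cases are pairs whose joint score substantially exceeds both individual scores; scaling this beyond pairs is where the entropy-approximation techniques mentioned above become necessary.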
Stage 2 – Statistical Causal Inference and Chain Construction
The statements with high MI scores are fed into a causal analysis phase. Using the program’s static control‑flow and data‑flow graphs, a directed acyclic graph (DAG) representing plausible causal relations among the selected statements is built. Within this DAG, the authors apply do‑calculus‑style interventions (akin to Pearl’s do‑operator) to estimate the conditional causal effect of each statement on the failure outcome. These effects are analogous to average treatment effects (ATE) in causal inference literature. By aggregating the causal contributions along a path, Inforence derives a cause‑effect chain that reflects how a set of interdependent statements collectively leads to a crash.
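One standard way to estimate such a do-operator-style effect is backdoor adjustment: stratify on a confounder set, compare failure rates with and without the treatment statement inside each stratum, and average over strata. The sketch below illustrates that general ATE estimator on toy per-test records; the function, variable names, and data are illustrative assumptions, not the paper's actual estimator.

```python
# Hedged sketch of an ATE-style estimate of one statement's effect on
# failure via backdoor adjustment: sum_z P(z) * (E[Y|T=1,z] - E[Y|T=0,z]).
# Toy data and names are assumptions for illustration.
from collections import defaultdict

def ate_adjusted(records, treatment, outcome, confounders):
    """Estimate E[Y | do(T=1)] - E[Y | do(T=0)] by stratifying on Z."""
    strata = defaultdict(list)
    for r in records:
        strata[tuple(r[c] for c in confounders)].append(r)
    n, ate = len(records), 0.0
    for rows in strata.values():
        treated = [r[outcome] for r in rows if r[treatment] == 1]
        control = [r[outcome] for r in rows if r[treatment] == 0]
        if not treated or not control:
            continue  # stratum lacks overlap; skipped in this toy sketch
        ate += (len(rows) / n) * (
            sum(treated) / len(treated) - sum(control) / len(control)
        )
    return ate

# Each record is one test run: coverage of a guard statement (confounder),
# coverage of the suspected statement (treatment), and the outcome.
records = [
    {"s_guard": 1, "s_bug": 1, "fail": 1},
    {"s_guard": 1, "s_bug": 1, "fail": 1},
    {"s_guard": 1, "s_bug": 0, "fail": 0},
    {"s_guard": 0, "s_bug": 1, "fail": 1},
    {"s_guard": 0, "s_bug": 0, "fail": 0},
    {"s_guard": 0, "s_bug": 0, "fail": 0},
]
effect = ate_adjusted(records, "s_bug", "fail", ["s_guard"])
```

In this toy data the suspected statement fully determines failure within each stratum, so the estimated effect is 1.0; in practice the static DAG supplies the adjustment set for each statement.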
The resulting chains are ranked according to their combined causal impact, providing developers with a concise, interpretable list of “suspect pathways” rather than isolated line numbers.
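The ranking step can be sketched with a simple aggregation rule. The noisy-OR combination below is an assumption chosen for illustration (a chain "explains" the failure if any member does); the paper's exact combination function may differ, and the per-statement effect values are made up.

```python
# Illustrative ranking of cause-effect chains by combined causal effect.
# Noisy-OR aggregation and the effect values are hypothetical assumptions.
def chain_score(effects):
    """Combine per-statement effects in [0, 1] with a noisy-OR."""
    score = 1.0
    for e in effects:
        score *= 1.0 - e
    return 1.0 - score

effect = {"s1": 0.6, "s2": 0.3, "s3": 0.1, "s4": 0.5}
chains = [["s1", "s2"], ["s3", "s4"], ["s2", "s3"]]

# Highest combined effect first: the developer inspects whole pathways
# in this order rather than isolated statements.
ranked = sorted(chains, key=lambda c: chain_score(effect[s] for s in c),
                reverse=True)
```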
Experimental Evaluation
The authors evaluate Inforence on seven benchmark suites: the Siemens suite (seven programs), gzip, grep, sed, space, make, and bash. Both single‑fault and multi‑fault scenarios are considered. Metrics include Top‑N accuracy (the proportion of faults that appear within the first N ranked statements) and the EXAM score (the percentage of code that must be examined before locating the fault). Comparisons are made against prominent SFL techniques such as Ochiai, Tarantula, and DStar, as well as newer machine‑learning‑based approaches such as DeepFL.
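Both metrics are straightforward to compute from a ranked statement list. The sketch below shows their standard definitions on a made-up ranking; the statement identifiers and fault location are hypothetical.

```python
# Minimal sketch of the EXAM and Top-N metrics on a ranked statement list.
# The ranking and fault set here are made up for illustration.
def exam_score(ranking, faulty, total_statements):
    """Percentage of statements examined before reaching the first fault,
    walking the ranked list top-down."""
    for i, stmt in enumerate(ranking, start=1):
        if stmt in faulty:
            return 100.0 * i / total_statements
    return 100.0  # fault never ranked: whole program examined

def top_n_hit(ranking, faulty, n):
    """True if some faulty statement appears within the first n ranks."""
    return any(s in faulty for s in ranking[:n])

ranking = ["s17", "s3", "s42", "s8", "s23"]
faulty = {"s42"}
ex = exam_score(ranking, faulty, total_statements=100)  # 3 of 100 -> 3.0
hit1 = top_n_hit(ranking, faulty, 1)  # False: s42 is not ranked first
hit5 = top_n_hit(ranking, faulty, 5)  # True: s42 is within the top 5
```

Lower EXAM scores and higher Top-N rates are better; these are the quantities compared in the results that follow.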
Results show that Inforence consistently outperforms the baselines. For single‑fault programs, Top‑1 accuracy rises from roughly 55 % (best baseline) to 68 %, and Top‑5/Top‑10 accuracies reach 84 % and 92 % respectively. In multi‑fault cases, where traditional methods typically drop below 40 % in Top‑1, Inforence maintains above‑70 % success. EXAM scores improve by 5–7 % on average, indicating a substantial reduction in the amount of code a developer must inspect. The gains are attributed to the method’s ability to capture joint statement effects and to rank whole causal pathways rather than isolated statements.
Strengths, Limitations, and Future Work
The main strengths of Inforence are: (1) a principled, statistically grounded feature‑selection step that filters noise; (2) explicit modeling of inter‑statement dependencies via causal inference; (3) generation of developer‑friendly cause‑effect chains. However, the approach has two notable limitations. Computing high‑order MI can become expensive for very large code bases, despite parallelization strategies. Moreover, the reliance on static program graphs may miss dynamic execution nuances, potentially leading to mismatches between inferred causal paths and actual runtime behavior. The authors propose augmenting the static analysis with dynamic slicing and exploring deep‑learning‑based causal models to capture non‑linear interactions and to automate patch generation.
In summary, Inforence advances fault localization by moving beyond per‑statement suspiciousness scores to a holistic, causally informed view of program failure. Its empirical validation across a diverse set of real‑world programs demonstrates superior accuracy for both single and multiple faults, while delivering actionable insight into the underlying fault propagation mechanisms.