Fault Localization for Java Programs using Probabilistic Program Dependence Graph

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

Fault localization is the process of finding the location of faults in a program. It identifies the root cause of a failure, that is, the code responsible for the abnormal behaviour of a faulty program. Existing fault localization techniques include slice-based, program-spectrum-based, statistics-based, program-state-based, machine-learning-based, and similarity-based techniques. The proposed method uses a model-based fault localization technique called the Probabilistic Program Dependence Graph (PPDG), a model that captures the internal behaviour of the program. The PPDG is built on the Program Dependence Graph (PDG), which in turn is derived from the Control Flow Graph (CFG). PPDG construction augments the structural dependences represented by a program dependence graph with estimates of statistical dependences between node states, which are computed from the test set. The PPDG is based on the established framework of probabilistic graphical models. This work presents algorithms for constructing PPDGs and applying them to fault localization.


💡 Research Summary

The paper introduces a novel model‑based fault‑localization technique for Java programs called the Probabilistic Program Dependence Graph (PPDG). It begins by reviewing existing fault‑localization approaches—slice‑based, spectrum‑based, statistical, state‑based, machine‑learning, and similarity‑based methods—and points out that most of them either ignore the program’s control‑ and data‑flow structure or rely solely on execution frequency information, which limits their precision, especially for complex bugs.

To overcome these limitations, the authors construct a hierarchical representation of the program. First, a Control Flow Graph (CFG) is generated from the source code, capturing the flow of execution between basic blocks. Next, data and control dependencies are added to the CFG to produce a Program Dependence Graph (PDG), which explicitly models structural relationships among statements.
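
As a rough illustration of the CFG-to-PDG step, the following Python sketch derives control dependences from a toy CFG using post-dominator sets. The graph, the node names, and the helpers `post_dominators` and `control_dependences` are hypothetical and not taken from the paper; Python is used only for compactness, and data dependences (the other half of a PDG) are omitted.

```python
def post_dominators(succ, exit_node):
    """Iteratively compute the post-dominator set of every CFG node."""
    nodes = set(succ)
    pdom = {n: set(nodes) for n in nodes}
    pdom[exit_node] = {exit_node}
    changed = True
    while changed:
        changed = False
        for n in nodes - {exit_node}:
            succ_sets = [pdom[s] for s in succ[n]]
            common = set.intersection(*succ_sets) if succ_sets else set()
            new = {n} | common
            if new != pdom[n]:
                pdom[n] = new
                changed = True
    return pdom


def control_dependences(succ, exit_node):
    """Map each node to the set of nodes it is control dependent on.

    For every CFG edge x -> s, each post-dominator of s (including s)
    that does not strictly post-dominate x is control dependent on x.
    """
    pdom = post_dominators(succ, exit_node)
    deps = {n: set() for n in succ}
    for x, targets in succ.items():
        strict_pdom_x = pdom[x] - {x}
        for s in targets:
            for y in pdom[s] - strict_pdom_x:
                deps[y].add(x)
    return deps


# Hypothetical CFG for: if (x > 0) y = 1; else y = -1; print(y);
cfg = {
    "entry": ["if"],
    "if": ["then", "else"],
    "then": ["join"],
    "else": ["join"],
    "join": ["exit"],
    "exit": [],
}
deps = control_dependences(cfg, "exit")
print(deps["then"], deps["else"], deps["join"])
```

In this toy graph only the two branch bodies depend on the predicate node; the join node runs on every path and so has no control dependence. A full PDG builder would usually also add an edge from the entry node to the exit node so that top-level statements become dependent on the entry.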

The core contribution lies in augmenting the deterministic PDG with statistical information derived from test executions. For each node in the PDG, the runtime state (e.g., variable values, predicate outcomes) is abstracted into a discrete random variable. By executing a test suite and recording these states, the authors estimate conditional probability tables that quantify how the state of one node statistically depends on its predecessors. This probabilistic enrichment transforms the PDG into a Bayesian network‑like structure—the Probabilistic Program Dependence Graph.
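
The estimation step can be sketched as simple frequency counting. The example below is a minimal sketch under stated assumptions: the node names, the state labels, the `"not_executed"` placeholder, and the function `estimate_cpts` are all hypothetical, and the paper's actual state abstraction may differ.

```python
from collections import Counter, defaultdict


def estimate_cpts(parents, executions):
    """Estimate P(node state | parent states) by counting over test runs.

    parents:    dict mapping each node to a tuple of its PDG predecessors
    executions: list of dicts, one per test run, mapping node -> observed state
    Returns a dict keyed by (node, parent_states) holding a state -> probability map.
    """
    counts = defaultdict(Counter)
    for run in executions:
        for node, node_parents in parents.items():
            if node not in run:
                continue  # node was not reached in this run
            parent_states = tuple(run.get(p, "not_executed") for p in node_parents)
            counts[(node, parent_states)][run[node]] += 1

    cpts = {}
    for key, counter in counts.items():
        total = sum(counter.values())
        cpts[key] = {state: c / total for state, c in counter.items()}
    return cpts


# Hypothetical dependence structure: the predicate node feeds the assignment to y.
parents = {"pred": (), "y": ("pred",)}
runs = [
    {"pred": "true", "y": "pos"},
    {"pred": "true", "y": "pos"},
    {"pred": "true", "y": "pos"},
    {"pred": "false", "y": "neg"},
]
cpts = estimate_cpts(parents, runs)
print(cpts[("pred", ())])
print(cpts[("y", ("true",))])
```

Plain relative frequencies are used here for clarity; a practical implementation would probably smooth the counts so that states unseen during testing do not receive a hard zero probability.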

The fault‑localization algorithm proceeds as follows: (1) Identify the failing test case(s) and the terminal node(s) that produced the failure. (2) Perform a backward traversal of the PPDG, propagating “suspiciousness” scores from the failure node toward its ancestors. The suspiciousness of a node is defined as the posterior probability that the node is the root cause of the observed failure, computed using Bayes’ rule and the learned conditional probabilities. (3) Rank all nodes by their suspiciousness scores and present the ordered list to the developer.
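
The ranking step can be approximated with a much simpler scoring rule than full Bayesian inference. The sketch below is a stand-in, not the authors' algorithm: instead of computing a posterior, it scores each node in a failing run by how improbable its observed state is given its parents' states under previously learned tables. The tables, node names, and the function `rank_nodes` are hypothetical.

```python
def rank_nodes(parents, cpts, failing_run):
    """Rank nodes of one failing run from most to least suspicious.

    Suspiciousness is taken to be 1 - P(observed state | observed parent states);
    combinations never seen in the training runs get probability 0.
    """
    scored = []
    for node, state in failing_run.items():
        parent_states = tuple(
            failing_run.get(p, "not_executed") for p in parents[node]
        )
        prob = cpts.get((node, parent_states), {}).get(state, 0.0)
        scored.append((node, 1.0 - prob))
    # Highest suspiciousness first; ties broken alphabetically for determinism.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored


# Hypothetical learned tables for a three-node chain pred -> y -> out.
parents = {"pred": (), "y": ("pred",), "out": ("y",)}
cpts = {
    ("pred", ()): {"true": 0.75, "false": 0.25},
    ("y", ("true",)): {"pos": 1.0},
    ("y", ("false",)): {"neg": 1.0},
    ("out", ("pos",)): {"ok": 1.0},
    ("out", ("neg",)): {"ok": 0.5, "bad": 0.5},
}
failing_run = {"pred": "true", "y": "neg", "out": "bad"}
ranking = rank_nodes(parents, cpts, failing_run)
for node, score in ranking:
    print(node, score)
```

Here the assignment to `y` ranks first because the state `neg` was never observed when the predicate was true, which is the kind of structural-plus-statistical anomaly the PPDG is meant to expose.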

The authors evaluate the approach on several open‑source Java projects (including JFreeChart, JUnit, and Apache Commons) and on synthetic faults injected into these programs. They compare PPDG‑based localization against well‑known spectrum‑based techniques such as Tarantula, Ochiai, and DStar. Metrics include Top‑1 accuracy (the buggy statement appears first), Top‑5 coverage, and Mean Reciprocal Rank (MRR). Results show that PPDG consistently outperforms the baselines: average Top‑1 accuracy improves by roughly 12 percentage points, Top‑5 coverage by about 8 points, and MRR is significantly higher. The advantage is most pronounced for faults that involve intricate control‑data interactions, where pure frequency‑based methods struggle.
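
For reference, the baseline formulas and ranking metrics named above can be written out as follows. This is a minimal sketch using the commonly cited definitions of Tarantula, Ochiai, and DStar (with the exponent defaulting to 2); the function names, parameter names, and sample numbers are illustrative and do not come from the paper's evaluation.

```python
import math


def tarantula(ef, ep, total_failed, total_passed):
    """ef/ep: failing/passing tests that execute the statement."""
    fail_ratio = ef / total_failed if total_failed else 0.0
    pass_ratio = ep / total_passed if total_passed else 0.0
    denom = fail_ratio + pass_ratio
    return fail_ratio / denom if denom else 0.0


def ochiai(ef, ep, total_failed):
    denom = math.sqrt(total_failed * (ef + ep))
    return ef / denom if denom else 0.0


def dstar(ef, ep, total_failed, star=2):
    denom = ep + (total_failed - ef)
    if denom == 0:
        return float("inf") if ef > 0 else 0.0
    return ef ** star / denom


def top_k_accuracy(ranks, k):
    """Fraction of faults whose buggy statement is ranked within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all faults (ranks start at 1)."""
    return sum(1.0 / r for r in ranks) / len(ranks)


# Illustrative numbers only: a statement run by 2 of 2 failing and 1 of 8 passing tests.
print(tarantula(2, 1, 2, 8), ochiai(2, 1, 2), dstar(2, 1, 2))

# Illustrative ranks of the buggy statement for four hypothetical faults.
ranks = [1, 3, 10, 2]
print(top_k_accuracy(ranks, 1), top_k_accuracy(ranks, 5), mean_reciprocal_rank(ranks))
```

All three baselines depend only on how often a statement is covered by passing and failing tests, which is why they carry no information about control or data dependences between statements.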

The paper also discusses practical challenges. Building the PPDG requires extensive instrumentation to capture variable states during test execution, which incurs non‑trivial runtime and storage overhead. Discretizing continuous variables can lead to information loss, and exact Bayesian inference becomes computationally expensive as the graph grows. To mitigate these issues, the authors propose future work in three directions: (1) employing sampling‑based approximate inference (e.g., Monte Carlo or variational methods) to scale to larger programs, (2) designing smarter discretization schemes that preserve salient statistical patterns, and (3) integrating dynamic test generation or test‑selection strategies to reduce the amount of required execution data.
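
To give a sense of what sampling-based approximate inference looks like, the sketch below estimates a posterior by rejection sampling on a two-node network and compares it with the exact Bayes' rule answer. The network, the probabilities, and the function `estimate_posterior` are invented for illustration; the paper only names this direction as future work and does not specify a method.

```python
import random


def estimate_posterior(prior_faulty, p_fail_if_faulty, p_fail_if_ok,
                       samples=100_000, seed=0):
    """Estimate P(node faulty | test failed) by rejection sampling.

    Each sample draws the node's fault status, then the test outcome;
    samples where the test did not fail are rejected.
    """
    rng = random.Random(seed)
    accepted = 0
    faulty_and_failed = 0
    for _ in range(samples):
        faulty = rng.random() < prior_faulty
        p_fail = p_fail_if_faulty if faulty else p_fail_if_ok
        failed = rng.random() < p_fail
        if failed:
            accepted += 1
            if faulty:
                faulty_and_failed += 1
    return faulty_and_failed / accepted if accepted else float("nan")


# Hypothetical parameters for a single suspect node and one observed failure.
estimate = estimate_posterior(0.2, 0.9, 0.1)
exact = (0.2 * 0.9) / (0.2 * 0.9 + 0.8 * 0.1)  # Bayes' rule on the same numbers
print(estimate, exact)
```

On a graph this small exact inference is trivial; the point of sampling is that its cost grows with the number of samples rather than with the size of the joint state space, at the price of some estimation error.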

In conclusion, the study demonstrates that enriching a program’s dependence graph with probabilistic information derived from test runs yields a powerful fault‑localization tool. By jointly exploiting structural dependencies and empirical statistical correlations, PPDG achieves higher precision and robustness than traditional spectrum‑based approaches. The authors envision that, when incorporated into automated debugging environments, PPDG could substantially lower the cost of software maintenance and accelerate the debugging process.

