Multi-View Adaptive Contrastive Learning for Information Retrieval Based Fault Localization

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv paper.

Most existing studies on fault localization focus on information retrieval-based techniques, which build representations for bug reports and source code files and match their semantic vectors through similarity measurement. However, such approaches often ignore useful information that could improve localization performance, such as 1) the interaction relationship between bug reports and source code files; 2) the similarity relationship between bug reports; and 3) the co-citation relationship between source code files. In this paper, we propose a novel approach named Multi-View Adaptive Contrastive Learning for Information Retrieval Fault Localization (MACL-IRFL) to learn these relationships for software fault localization. Specifically, we first generate data augmentations separately from the report-code interaction view, the report-report similarity view and the code-code co-citation view, and adopt a graph neural network to aggregate the information of bug reports or source code files from the three views in the embedding process. Moreover, we perform contrastive learning across these views. Our contrastive learning task forces the bug report representations to encode information shared by the report-report and report-code views, and the source code file representations to encode information shared by the code-code and report-code views, thereby alleviating the noise from auxiliary information. Finally, to evaluate our approach, we conduct extensive experiments on five open-source Java projects. The results show that our model improves over the best baseline by up to 28.93%, 25.57% and 20.35% on Accuracy@1, MAP and MRR, respectively.


💡 Research Summary

The paper addresses a critical limitation of information‑retrieval (IR) based fault localization: reliance on textual similarity between bug reports and source code, which suffers from lexical gaps and insufficient bug descriptions. To overcome this, the authors propose MACL‑IRFL (Multi‑View Adaptive Contrastive Learning for Information Retrieval‑based Fault Localization), a framework that jointly exploits three complementary sources of auxiliary information. First, a bug‑report‑to‑code interaction view models historical bug‑fixing records as a bipartite graph linking each report with the files that were modified to fix it. Second, a report‑report similarity view captures relationships among bug reports, assuming that similar reports tend to involve the same faulty files. Third, a code‑code co‑citation view connects source files that are jointly cited in the same bug report, reflecting implicit code dependencies.
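The construction of the three views can be sketched as adjacency matrices. This is a minimal illustration under stated assumptions, not the authors' implementation. The function name `build_views` is hypothetical. So are the top-k cosine rule for linking similar reports and the use of shared-report counts as co-citation weights.

```python
import numpy as np


def build_views(fix_links, n_reports, n_files, report_vecs, k=2):
    """Hypothetical construction of the three views as adjacency matrices.

    fix_links:   (report_id, file_id) pairs from historical bug fixes
    report_vecs: (n_reports, d) textual embeddings of the bug reports
    k:           similar reports linked per report (assumed rule; k < n_reports)
    """
    # Report-code interaction view: bipartite incidence matrix (reports x files).
    R = np.zeros((n_reports, n_files), dtype=np.float32)
    for r, f in fix_links:
        R[r, f] = 1.0

    # Report-report similarity view: link each report to its top-k cosine neighbours.
    vecs = np.asarray(report_vecs, dtype=float)
    unit = vecs / np.clip(np.linalg.norm(vecs, axis=1, keepdims=True), 1e-12, None)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)  # a report is not its own neighbour
    top = np.argsort(-sim, axis=1)[:, :k]
    S = np.zeros((n_reports, n_reports), dtype=np.float32)
    S[np.repeat(np.arange(n_reports), k), top.ravel()] = 1.0
    S = np.maximum(S, S.T)  # make the similarity graph undirected

    # Code-code co-citation view: weight = number of reports that fixed both files.
    C = R.T @ R
    np.fill_diagonal(C, 0.0)
    return R, S, C


# Toy usage: 3 reports, 3 files, 2-dimensional report embeddings.
links = [(0, 0), (0, 1), (1, 1), (1, 2), (2, 0)]
vecs = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
R, S, C = build_views(links, 3, 3, vecs, k=1)
```

In this toy history, file 1 is co-cited with file 0 (both fixed for report 0) and with file 2 (both fixed for report 1). The code-code view links those pairs even though no textual similarity is involved.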

The three views involve two node types (bug reports and code files) and three edge types (interaction, similarity, co‑citation). The report‑code view is bipartite, while the report‑report and code‑code views each contain a single node type. The authors employ relational graph convolutional networks (R‑GCNs) to aggregate node features (textual embeddings for reports, token/AST embeddings for code) within each view, producing view‑specific node representations. To fuse these multi‑view embeddings while suppressing noisy auxiliary data, they introduce an adaptive contrastive learning objective. Positive pairs are formed between the same bug report's embeddings from the report‑report and report‑code views (or between the same code file's embeddings from the code‑code and report‑code views). Negative pairs consist of embeddings of different reports or different code files. Minimizing an InfoNCE loss maximizes a lower bound on the mutual information between the views of the same entity. This pulls together the signal the views share and pushes apart unrelated information. The “adaptive” aspect learns view‑specific weighting, allowing the model to down‑weight a view that contributes more noise (e.g., a similarity graph with many spurious links).
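The cross-view contrastive objective for bug reports can be sketched in NumPy. This illustration rests on several assumptions and is not the paper's model. A parameter-free, GCN-style normalized propagation stands in for the trained R-GCN encoder. The temperature `tau=0.2` is arbitrary, and the adaptive view weighting is omitted. The helper names `propagate`, `report_views` and `info_nce` are hypothetical.

```python
import numpy as np


def propagate(adj, feats):
    """One parameter-free GCN-style step: D^-1/2 (A + I) D^-1/2 X.

    Stands in for a trained R-GCN layer (an assumption for illustration).
    """
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return (d_inv_sqrt[:, None] * a_hat * d_inv_sqrt[None, :]) @ feats


def report_views(S, R, report_feats, file_feats):
    """Report embeddings from the report-report and report-code views."""
    n_r, n_f = R.shape
    rr = propagate(S, report_feats)  # report-report similarity view
    bipartite = np.block([
        [np.zeros((n_r, n_r)), R],
        [R.T, np.zeros((n_f, n_f))],
    ])
    nodes = np.vstack([report_feats, file_feats])
    rc = propagate(bipartite, nodes)[:n_r]  # keep the report rows of the report-code view
    return rr, rc


def info_nce(z1, z2, tau=0.2):
    """Cross-view InfoNCE: row i of z1 and row i of z2 form the positive pair;
    every other row of z2 acts as a negative for row i of z1."""
    z1 = z1 / np.clip(np.linalg.norm(z1, axis=1, keepdims=True), 1e-12, None)
    z2 = z2 / np.clip(np.linalg.norm(z2, axis=1, keepdims=True), 1e-12, None)
    logits = (z1 @ z2.T) / tau
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))


# Toy usage: 3 reports, 2 files, 4-dimensional random features.
rng = np.random.default_rng(0)
S = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 0]], dtype=float)
R = np.array([[1, 0], [1, 1], [0, 1]], dtype=float)
rr, rc = report_views(S, R, rng.normal(size=(3, 4)), rng.normal(size=(2, 4)))
loss = info_nce(rr, rc)
```

The sketch shows only how row-aligned embeddings of the same report from two views become positive pairs. The code-file term (code-code vs. report-code) would follow the same pattern on the file rows. A trainable version would add learned layer weights and backpropagate through `info_nce`.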

The approach is evaluated on five open‑source Java projects (including Eclipse components). Using standard fault‑localization metrics—Accuracy@1, Mean Average Precision (MAP), and Mean Reciprocal Rank (MRR)—MACL‑IRFL outperforms the strongest baselines by up to 28.93 % (Accuracy@1), 25.57 % (MAP), and 20.35 % (MRR). Ablation studies demonstrate that each view independently improves performance and that removing the contrastive loss leads to a noticeable drop, confirming the effectiveness of the multi‑view and contrastive components. Qualitative analysis on bug reports with noisy stack traces shows that the auxiliary views help the model infer the correct faulty files despite poor textual overlap.
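The three evaluation metrics can be computed as below. This sketch uses common IR definitions. A report counts as a hit for Accuracy@1 if its top-ranked file is buggy, and average precision is normalized by the number of buggy files for that report. The paper's exact conventions, such as tie handling, may differ, and the function names are hypothetical.

```python
def accuracy_at_k(rankings, relevant, k=1):
    """Fraction of bug reports with at least one buggy file in the top k."""
    hits = sum(1 for ranking, rel in zip(rankings, relevant) if rel & set(ranking[:k]))
    return hits / len(rankings)


def reciprocal_rank(ranking, rel):
    """1 / rank of the first buggy file, or 0 if none is retrieved."""
    for rank, f in enumerate(ranking, start=1):
        if f in rel:
            return 1.0 / rank
    return 0.0


def average_precision(ranking, rel):
    """Mean of precision@rank taken at each rank where a buggy file appears."""
    hits, total = 0, 0.0
    for rank, f in enumerate(ranking, start=1):
        if f in rel:
            hits += 1
            total += hits / rank
    return total / len(rel) if rel else 0.0


def evaluate(rankings, relevant):
    """Return (Accuracy@1, MAP, MRR) averaged over all bug reports."""
    n = len(rankings)
    acc1 = accuracy_at_k(rankings, relevant, k=1)
    map_ = sum(average_precision(r, g) for r, g in zip(rankings, relevant)) / n
    mrr = sum(reciprocal_rank(r, g) for r, g in zip(rankings, relevant)) / n
    return acc1, map_, mrr


# Two bug reports: the first ranks its buggy file first; the second finds
# its buggy files at ranks 2 and 3.
rankings = [["A.java", "B.java", "C.java"], ["C.java", "A.java", "B.java"]]
relevant = [{"A.java"}, {"A.java", "B.java"}]
acc1, map_, mrr = evaluate(rankings, relevant)
# Accuracy@1 = 0.5, MAP = 19/24 (about 0.792), MRR = 0.75
```

Because MRR only looks at the first buggy file while MAP rewards ranking every buggy file highly, the two can move differently. This is one reason the paper reports them side by side.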

Key contributions are: (1) formalizing three heterogeneous graph views that capture interaction, similarity, and co‑citation relationships; (2) designing an adaptive contrastive learning scheme that aligns representations across views while filtering out irrelevant information; (3) providing extensive empirical evidence of superior fault‑localization performance. The paper suggests future extensions such as incorporating dynamic execution traces, scaling graph processing via sampling techniques, and applying online learning for streaming bug‑report environments. Overall, MACL‑IRFL represents a significant step toward more robust, context‑aware fault localization that leverages rich relational data beyond plain text similarity.

