Towards Effective Bug Triage with Software Data Reduction Techniques

February 23, 2026

Reading time: 6 minutes

...

📝 Abstract

Software companies spend over 45 percent of their costs on dealing with software bugs. An inevitable step in fixing bugs is bug triage, which aims to correctly assign a developer to a new bug. To decrease the time cost of manual work, text classification techniques are applied to conduct automatic bug triage. In this paper, we address the problem of data reduction for bug triage, i.e., how to reduce the scale and improve the quality of bug data. We combine instance selection with feature selection to simultaneously reduce data scale on the bug dimension and the word dimension. To determine the order of applying instance selection and feature selection, we extract attributes from historical bug data sets and build a predictive model for a new bug data set. We empirically investigate the performance of data reduction on a total of 600,000 bug reports from two large open source projects, namely Eclipse and Mozilla. The results show that our data reduction can effectively reduce the data scale and improve the accuracy of bug triage. Our work provides an approach to leveraging data processing techniques to form reduced and high-quality bug data in software development and maintenance.
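To make the two reduction dimensions concrete, here is a minimal sketch in plain Python. It is not the authors' method: the abstract does not name specific algorithms, so the word scoring in `select_features`, the duplicate and empty-report filter in `reduce_data`, and the small naive Bayes classifier are simple stand-ins chosen for illustration. The toy reports and the developer names `alice` and `bob` are hypothetical. The sketch also fixes one order (feature selection, then instance selection), whereas the paper predicts which order to apply.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data: (report text, developer who fixed the bug).
REPORTS = [
    ("editor crashes while saving", "alice"),
    ("editor crashes while saving", "alice"),  # redundant report
    ("crash in editor when saving a file", "alice"),
    ("debugger ignores the breakpoint", "bob"),
    ("breakpoint never hit in debugger", "bob"),
    ("it does not work", "bob"),  # noisy report
]


def tokenize(text):
    return text.lower().split()


def select_features(reports, k):
    """Word dimension: keep the k words most concentrated on one developer.

    Stand-in score: document frequency for the dominant developer minus
    document frequency for all other developers.
    """
    per_dev = defaultdict(Counter)  # word -> developer -> document frequency
    for text, dev in reports:
        for word in set(tokenize(text)):
            per_dev[word][dev] += 1

    def score(word):
        counts = per_dev[word]
        top = max(counts.values())
        return top - (sum(counts.values()) - top)

    ranked = sorted(per_dev, key=lambda w: (-score(w), w))
    return set(ranked[:k])


def reduce_data(reports, k):
    """Bug dimension: after projecting onto the kept words, drop reports
    that became empty or that repeat an already kept bag of words."""
    vocab = select_features(reports, k)
    seen, reduced = set(), []
    for text, dev in reports:
        words = [w for w in tokenize(text) if w in vocab]
        key = (tuple(sorted(words)), dev)
        if words and key not in seen:
            seen.add(key)
            reduced.append((words, dev))
    return vocab, reduced


def train(reduced):
    """Collect per-developer report counts and word counts."""
    doc_counts = Counter(dev for _, dev in reduced)
    word_counts = defaultdict(Counter)
    for words, dev in reduced:
        word_counts[dev].update(words)
    return doc_counts, word_counts


def predict(model, vocab, text):
    """Multinomial naive Bayes with add-one smoothing over the kept words."""
    doc_counts, word_counts = model
    total_docs = sum(doc_counts.values())
    words = [w for w in tokenize(text) if w in vocab]
    best_dev, best_score = None, -math.inf
    for dev in sorted(doc_counts):
        total = sum(word_counts[dev].values())
        score = math.log(doc_counts[dev] / total_docs)
        for w in words:
            score += math.log((word_counts[dev][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best_dev, best_score = dev, score
    return best_dev


vocab, reduced = reduce_data(REPORTS, k=5)
model = train(reduced)
print(len(REPORTS), "->", len(reduced), "reports;", len(vocab), "words kept")
print(predict(model, vocab, "editor crashes when saving project"))
print(predict(model, vocab, "debugger skips breakpoint"))
```

On this toy input, the six reports shrink to three and the vocabulary to five words, and the classifier is trained only on the reduced set. A real pipeline would likely swap in established feature selection and instance selection algorithms and evaluate both application orders.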

📄 Content

IEEE TRANSACTIONS ON JOURNAL NAME, MANUSCRIPT ID 1

Towards Effective Bug Triage with Software Data Reduction Techniques
Jifeng Xuan, He Jiang, Member, IEEE, Yan Hu, Zhilei Ren, Weiqin Zou, Zhongxuan Luo, Xindong Wu, Fellow, IEEE
Index Terms—Mining software repositories, application of data preprocessing, data management in bug repositories, bug data reduction, feature selection, instance selection, bug triage, prediction for reduction orders.
1 INTRODUCTION
Mining software repositories is an interdisciplinary domain, which aims to employ data mining to deal with software engineering problems [22]. In modern software development, software repositories are large-scale databases for storing the output of software development, e.g., source code, bugs, emails, and specifications. Traditional software analysis is not completely suitable for the large-scale and complex data in software repositories [58]. Data mining has emerged as a promising means to handle software data (e.g., [7], [32]). By leveraging data mining techniques, mining software repositories can uncover interesting information in software repositories and solve real-world software problems.
A bug repository (a typical software repository for storing details of bugs) plays an important role in managing software bugs. Software bugs are inevitable, and fixing bugs is expensive in software development. Software companies spend over 45 percent of their costs on fixing bugs [39]. Large software projects deploy bug repositories (also called bug tracking systems) to support information collection and to assist developers in handling bugs [14], [9]. In a bug repository, a bug is maintained as a bug report, which records the textual description for reproducing the bug and is updated according to the status of bug fixing [64]. A bug repository provides a data platform to support many types of tasks on bugs, e.g., fault prediction [7], [49], bug localization [2], and reopened-bug analysis [63]. In this paper, bug reports in a bug repository are called bug data.
There are two challenges related to bug data that may affect the effective use of bug repositories in software development tasks, namely the large scale and the low quality. On one hand, due to the daily-reported bugs, a large number of new bugs are stored in bug repositories. Taking an open source project, Eclipse [13], as an example, an average of 30 new bugs were reported to bug repositories per day in 2007 [3]; from 2001 to 2010, 333,371 bugs were reported to Eclipse by over 34,917 developers and users [57]. It is a challenge to manually examine such large-scale bug data in software development. On the other hand, software techniques suffer from the low quality of bug data. Two typical characteristics of low-quality bugs are noise and redundancy. Noisy bugs may mislead related developers [64], while redundant bugs waste the limited time of bug handling [54].
A time-consuming step of handling software bugs is bug triage, which aims to assign a correct developer to fix a new bug [1], [25], [3], [40]. In traditional software development, new bugs are manually triaged by an expert developer, i.e., a human triager. Due to the large number of daily bugs and the lack of expertise on all the bugs, manual bug triage is expensive in time cost and low in accuracy. In manual bug triage in Eclipse, 44 percent of bugs are assigned by mistake, while the time between opening a bug and its first triage is 19.3 days on average [25]. To avoid the expensive cost of manual bug triage, existing work [1] has proposed an automatic bug triage approach, which applies text classification techniques to predict developers for

This content is AI-processed based on ArXiv data.
