Investigating Effort Prediction of Software Projects on the ISBSG Dataset

Many cost estimation models have been proposed over the last three decades. In this study, we investigate the fuzzy ID3 decision tree as a method for software effort estimation. The fuzzy ID3 effort estimation model is designed by combining the principles of the ID3 decision tree with concepts from fuzzy set theory, allowing the model to handle the uncertain and imprecise data that describe software projects. MMRE (Mean Magnitude of Relative Error) and Pred(l) (Prediction at level l) are used as measures of prediction accuracy in this study. A series of experiments is reported using the ISBSG software projects dataset. Fuzzy trees are grown using different fuzziness control thresholds. Results show that optimizing the fuzzy ID3 parameters can greatly improve the accuracy of the generated software cost estimates.
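The two accuracy measures named in the abstract have standard definitions, which can be sketched as follows (a minimal illustration, not code from the paper; the sample effort values are made up):

```python
def mmre(actual, predicted):
    """Mean Magnitude of Relative Error: mean of |actual - predicted| / actual."""
    return sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)

def pred(actual, predicted, level=0.25):
    """Pred(l): fraction of projects whose relative error is at most l."""
    hits = sum(1 for a, p in zip(actual, predicted) if abs(a - p) / a <= level)
    return hits / len(actual)

# Hypothetical effort values (person-months) for three projects.
actual = [100.0, 200.0, 50.0]
predicted = [110.0, 140.0, 52.0]
print(mmre(actual, predicted))  # mean of 0.10, 0.30, 0.04
print(pred(actual, predicted))  # 2 of the 3 projects are within 25%
```

Lower MMRE and higher Pred(l) indicate a more accurate estimator; Pred(0.25) is the level most commonly reported in the effort-estimation literature.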


💡 Research Summary

The paper addresses the long‑standing challenge of accurately estimating software development effort, a critical factor for project planning and budgeting. While numerous models such as COCOMO, linear regression, and various machine‑learning techniques have been proposed over the past three decades, many of them struggle with the inherent uncertainty, non‑linearity, and multicollinearity present in real‑world project data. To mitigate these issues, the authors introduce a fuzzy version of the ID3 decision‑tree algorithm (Fuzzy‑ID3) and evaluate its performance on the International Software Benchmarking Standards Group (ISBSG) dataset, one of the most comprehensive publicly available repositories of software project records.

Methodology
The core contribution lies in integrating fuzzy set theory with the classic ID3 algorithm. Each attribute value is mapped to a set of linguistic labels (e.g., “low”, “medium”, “high”) through predefined membership functions, typically triangular or trapezoidal. Instead of using crisp probabilities in the entropy calculation, the algorithm employs fuzzy membership degrees, thereby computing a “fuzzy information gain”. This allows the tree to split on attributes even when the data points lie near decision boundaries, reducing the risk of over‑fitting that plagues conventional decision trees on noisy datasets.
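The idea described above can be sketched in a few lines. This is an illustrative reconstruction, not the authors' exact formulation: the triangular membership shape, the breakpoints, and the two-class example masses are all assumptions made for the demonstration.

```python
import math

def triangular(x, a, b, c):
    """Triangular membership function rising from a, peaking at b, falling to c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def fuzzy_entropy(class_mass):
    """Shannon entropy over class proportions formed from membership mass."""
    total = sum(class_mass)
    return -sum((m / total) * math.log2(m / total) for m in class_mass if m > 0)

def fuzzy_information_gain(parent_mass, children_mass):
    """Parent entropy minus the membership-weighted entropy of the children."""
    parent_total = sum(parent_mass)
    weighted = sum(
        (sum(child) / parent_total) * fuzzy_entropy(child)
        for child in children_mass if sum(child) > 0
    )
    return fuzzy_entropy(parent_mass) - weighted

# A 400-FP project partially belongs to both "low" and "medium" size terms,
# so it contributes membership mass to two branches instead of one.
print(triangular(400, 0, 250, 500))    # membership in "low"    -> 0.4
print(triangular(400, 250, 500, 750))  # membership in "medium" -> 0.6
```

Because a record near a boundary splits its unit of mass across branches, the resulting gain varies smoothly as the data shift, which is the mechanism the paragraph credits with reducing over-fitting on noisy data.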

The ISBSG Release 10 dataset was filtered to retain 4,200 complete records. Independent variables include Function Points, Lines of Code, development methodology (traditional vs. agile), team size, tool usage, and project duration; the dependent variable is actual effort measured in person‑months. Missing values were imputed using the mean (for continuous attributes) or the mode (for categorical attributes), and all continuous features were normalized to the [0, 1] interval.
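The preprocessing described above, mean/mode imputation followed by min-max scaling, can be sketched with the standard library alone. The column names and sample values below are illustrative, not taken from the ISBSG data:

```python
from statistics import mean, mode

def impute(values, categorical=False):
    """Replace None entries with the column mode (categorical) or mean."""
    observed = [v for v in values if v is not None]
    fill = mode(observed) if categorical else mean(observed)
    return [fill if v is None else v for v in values]

def min_max(values):
    """Scale a numeric column to the [0, 1] interval."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical columns with one missing value each.
team_size = impute([4, None, 8, 6])                     # mean of 4, 8, 6 fills the gap
methodology = impute(["agile", None, "agile", "waterfall"], categorical=True)
print(min_max(team_size))  # [0.0, 0.5, 1.0, 0.5]
```

Min-max scaling matters here because the membership functions of the fuzzy terms are defined over a fixed interval, so every continuous attribute must share that range.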

