Software effort estimation based on optimized model tree
📄 Content
Software Effort Estimation Based on Optimized Model Tree
Mohammad Azzeh Faculty of Information Technology Applied Science University Amman, Jordan POBOX 166 m.y.azzeh@asu.edu.jo
ABSTRACT
Background: It is widely recognized that software effort
estimation is a regression problem. Model Tree (MT) is a
machine-learning-based regression technique that is useful for
software effort estimation but, like other machine learning
algorithms, MT has a large configuration space and requires
careful parameter setting. The choice of these parameters is
dataset dependent, so no general guideline can govern this
process, which forms the motivation of this work. Aims: This
study investigates the effect of using a recent optimization
algorithm, the Bees algorithm, to select the MT parameters that
best fit a dataset and therefore improve prediction
accuracy. Method: We used MT with the optimal parameters
identified by the Bees algorithm to construct a software effort
estimation model. The model was validated over eight
datasets drawn from two main sources: PROMISE and ISBSG.
We used 3-fold cross-validation to empirically assess the
prediction accuracy of the different estimation models. As a
benchmark, results are also compared with those obtained with
Stepwise Regression, Case-Based Reasoning, and Multi-Layer
Perceptron. Results: The results obtained from the combination of
MT and the Bees algorithm are encouraging and outperform the other
well-known estimation methods applied to the employed datasets.
They also suggest that MT is effective among the techniques
that are suitable for effort estimation.
Conclusions: The use of the Bees algorithm enabled us to
automatically find the optimal MT parameters required to construct
effort estimation models that fit each individual dataset. It also
provided a significant improvement in prediction accuracy.
Categories and Subject Descriptors
D.2.9 [Software Engineering]: Management—cost estimation.
General Terms
Management, Measurement
Keywords
Software Effort Estimation, Model Tree, Bees Algorithm.
INTRODUCTION
Estimating the likely effort of a software project is one of the major challenges in software engineering and has attracted considerable interest within the scientific research community [2, 3, 15, 16]. A variety of software effort estimation models have been proposed in the literature, but they suffer from common problems such as very large performance deviations and high dataset dependence [10]. The evaluation and comparison results of those models are often contradictory, so no single model consistently outperforms the others [11, 12]. The principal reason is the nature of software datasets, which are characteristically noisy. However, software effort estimation is recognized as a regression problem [1], and machine learning methods such as Regression Tree [9], Model Tree (MT) [23], Support Vector Machine [1], and Radial Basis Functions are more capable of handling noisy datasets than statistical regression models that focus on the correlation between variables. This paper focuses on MT.
MT [23, 25] is a special type of decision tree and regression tree, but unlike a regression tree, which has numerical values at the leaves, an MT has linear functions at the leaves, as illustrated in Figure 1. MT is a powerful regression method because it can include categorical features when constructing the model without converting them into dummy variables, as is done in basic regression models. However, like other machine learning techniques, the performance of MT is data dependent, and MT has a large space of configuration possibilities and design options for each individual dataset. It is therefore not surprising to see contradictory results and different performance figures when slight changes are made to the MT parameters. These parameters include the minimum number of cases (C) that one node may represent, whether to prune the tree (P), the smoothing coefficient (K), and the split threshold (T).
Figure 1. Difference between regression trees and Model Trees.
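The contrast between the two leaf types can be illustrated with a minimal Python sketch. It is not the MT implementation used in this work: the tree has a single split that is fixed by hand (a real model-tree learner would choose splits from the data), there is only one feature, and the names fit_line, build_stump, and predict as well as the toy data are hypothetical.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x with a single feature."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = 0.0 if sxx == 0 else sxy / sxx
    return my - b * mx, b  # (intercept, slope)

def build_stump(xs, ys, split, model_leaves):
    """One-split tree; every leaf stores (intercept, slope)."""
    leaves = {}
    for side in ("left", "right"):
        pts = [(x, y) for x, y in zip(xs, ys)
               if (x <= split) == (side == "left")]
        lx, ly = [p[0] for p in pts], [p[1] for p in pts]
        # Regression tree: constant leaf (mean, slope 0).
        # Model tree: linear function fitted to the cases in the leaf.
        leaves[side] = fit_line(lx, ly) if model_leaves else (mean(ly), 0.0)
    return split, leaves

def predict(tree, x):
    split, leaves = tree
    a, b = leaves["left" if x <= split else "right"]
    return a + b * x

# Hypothetical toy data: project size -> effort (not from any real dataset).
size = [1, 2, 3, 4, 10, 11, 12, 13]
effort = [2, 4, 6, 8, 50, 53, 56, 59]

reg_tree = build_stump(size, effort, split=4, model_leaves=False)
model_tree = build_stump(size, effort, split=4, model_leaves=True)

for x in (3, 12):
    print(x, predict(reg_tree, x), predict(model_tree, x))
```

On this toy data the regression tree returns the same value for every case that falls in a leaf, while the model tree follows the linear trend inside each leaf.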
Since the selection of these parameter values is subjective and dataset dependent, no general guideline can govern this process, which forms the motivation of this work. In this paper we employ the Bees algorithm [18] to search for the optimal design options of MT that fit a specific dataset. The Bees algorithm is a population-based search algorithm first proposed by Pham [18]. It mimics the food-foraging behavior of swarms of honey bees. In its basic version, the algorithm performs a kind of neighborhood search combined with random search and can be used for optimization. The present paper investigates the effect on the improvement of effort estimation accuracy in MT when t
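The basic Bees algorithm described above (neighborhood search around the best sites combined with random scouting) can be sketched as follows. This is a minimal illustration under stated assumptions, not the configuration used in this work: the parameter names (n scouts, m selected sites, e elite sites, nep and nsp recruits, ngh patch size) and their default values are illustrative, and toy_error is a hypothetical stand-in for the cross-validated error of an MT built with two numeric parameters.

```python
import random

def bees_search(objective, bounds, n=20, m=5, e=2, nep=6, nsp=3,
                ngh=0.1, iterations=50, seed=0):
    """Minimise `objective` over the box `bounds` with a basic Bees search."""
    rng = random.Random(seed)

    def random_point():
        # Global random search: a scout bee lands anywhere in the box.
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    def neighbour(p):
        # Local search: perturb within a patch of relative size ngh,
        # clipped so the point stays inside the bounds.
        return [min(hi, max(lo, x + rng.uniform(-ngh, ngh) * (hi - lo)))
                for x, (lo, hi) in zip(p, bounds)]

    scouts = sorted((random_point() for _ in range(n)), key=objective)
    for _ in range(iterations):
        next_gen = []
        for rank, site in enumerate(scouts[:m]):
            # Elite sites receive more recruited bees than other selected sites.
            recruits = nep if rank < e else nsp
            candidates = [site] + [neighbour(site) for _ in range(recruits)]
            next_gen.append(min(candidates, key=objective))
        # The remaining bees keep scouting at random.
        next_gen += [random_point() for _ in range(n - m)]
        scouts = sorted(next_gen, key=objective)
    return scouts[0]

def toy_error(params):
    # Hypothetical stand-in for the estimation error of an MT; lowest at (3, 0.5).
    c, k = params
    return (c - 3.0) ** 2 + (k - 0.5) ** 2

best = bees_search(toy_error, bounds=[(1.0, 10.0), (0.0, 1.0)])
print(best, toy_error(best))
```

In an actual tuning run, the objective would build an MT with the candidate parameter values and return its validation error. Categorical options such as whether to prune would need their own encoding, which this sketch does not cover.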
This content is AI-processed based on ArXiv data.