Model tree based adaption strategy for software effort estimation by analogy
📝 Abstract
Background: Adaptation is a crucial step in analogy-based estimation. Current adaptation techniques often use linear size or linear similarity adjustment mechanisms, which are often unsuitable for datasets that have a complex structure with many categorical attributes. Furthermore, nonlinear adaptation techniques such as neural networks and genetic algorithms require considerable user interaction and parameter optimization to configure (network model, number of neurons, activation functions, training functions, mutation, selection, crossover, etc.). Aims: In response to these challenges, this paper proposes a new adaptation strategy that uses Model Tree based attribute distance to adjust estimation by analogy and derive new estimates. A Model Tree has the advantage of handling categorical attributes, minimizing user interaction, and improving the efficiency of model learning through classification. Method: Seven well-known datasets were used with 3-fold cross-validation to empirically validate the proposed approach. The method was investigated with K = 1 to 3 analogies. Results: The proposed approach produced better results than estimation by analogy with linear size adaptation, linear similarity adaptation, ‘regression towards the mean’, and null adaptation. Conclusions: Model Tree could form a useful extension to estimation by analogy, especially for complex datasets with a large number of categorical attributes.
📄 Content
Model Tree Based Adaption Strategy for Software Effort Estimation by Analogy
Mohammad Azzeh
Department of Software Engineering, Applied Science University
Amman, Jordan, PO BOX 133
m.y.azzeh@asu.edu.jo
Keywords: Adaptation Strategy, Analogy-based estimation, Model Tree.
I. INTRODUCTION
Estimation by Analogy (EBA) makes a prediction for a new
project by retrieving similar, previously completed projects that
have been recorded as historical projects
[2, 7, 18, 21, 22, 23]. The effort values of the retrieved projects
are reused as the proposed prediction for the new project. In a few
cases, particularly when the dataset is large enough and exhibits
some normal characteristics, the effort of the retrieved project
can be reused directly without adaptation [20]. In other cases, it
is common for the retrieved project to be regarded as an initial
solution that should be refined to capture the differences
between the new and retrieved projects [20].
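The retrieval-and-reuse step described above can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes numeric attributes only, min-max scaling, Euclidean distance, and the mean effort of the K closest projects as the unadjusted estimate. The function names and the sample data are hypothetical.

```python
import math


def normalize(rows):
    """Min-max scale each numeric column to [0, 1]; constant columns map to 0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]
    return [[(v - lo) / sp for v, lo, sp in zip(row, lows, spans)] for row in rows]


def estimate_by_analogy(history, efforts, target, k=1):
    """Return the mean effort of the k historical projects closest to target.

    No adaptation is applied here: the retrieved efforts are reused directly.
    """
    scaled = normalize(history + [target])
    scaled_history, scaled_target = scaled[:-1], scaled[-1]
    distances = [math.dist(p, scaled_target) for p in scaled_history]
    nearest = sorted(range(len(history)), key=distances.__getitem__)[:k]
    return sum(efforts[i] for i in nearest) / k


# Hypothetical projects described by (size, team size) with known efforts.
history = [[100, 3], [200, 5], [400, 8]]
efforts = [10.0, 22.0, 50.0]
target = [210, 5]

print(estimate_by_analogy(history, efforts, target, k=1))
print(estimate_by_analogy(history, efforts, target, k=2))
```

The estimate returned here is the "initial solution" in the sense of the text; an adaptation step would then refine it using the differences between the target and the retrieved projects.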
Adaptation (synonymously, adjustment) is a mechanism
used to capture the differences between the target project and the most
similar project(s) and then derive a new estimate [14, 20]. It is
an important step in estimation by analogy because it reflects the
structure of the target project on the retrieved projects. Figure 1
illustrates the process of adjusted analogy-based estimation.
Many adaptation techniques have been proposed in the literature
to improve the prediction accuracy of estimation by
analogy, such as ‘regression towards the mean’ [11],
genetic-based similarity adjustment [6], linear size adjustment
[10, 14, 24], and nonlinear adjustment [16].
Figure 1. Process of adjusted analogy based method [16]
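As an illustration of the simplest technique cited above, the following sketch shows linear size adjustment in its commonly described form, where the analogue's effort is scaled by the ratio of target size to analogue size. This formulation is an assumption for illustration and may differ in detail from the cited works; the function names and values are hypothetical.

```python
def linear_size_adjustment(analogue_effort, analogue_size, target_size):
    """Scale the analogue's effort by the size ratio target/analogue.

    Assumed form: effort_target = effort_analogue * size_target / size_analogue.
    """
    if analogue_size <= 0:
        raise ValueError("analogue_size must be positive")
    return analogue_effort * target_size / analogue_size


def adjusted_estimate(analogues, target_size):
    """Average the size-adjusted efforts of several (effort, size) analogues."""
    adjusted = [linear_size_adjustment(e, s, target_size) for e, s in analogues]
    return sum(adjusted) / len(adjusted)


# Hypothetical analogues given as (effort, size) pairs; target size is 210.
print(adjusted_estimate([(22.0, 200), (10.0, 100)], target_size=210))
```

Because the adjustment depends on a single numeric size attribute, it has no way to use categorical attributes, which is the limitation the paper sets out to address.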
The majority of these adjustment mechanisms are linear,
such as size adjustment, similarity adjustment and
productivity adjustment; they are generally restricted to the size
attribute and cannot accept non-numeric attributes
[16]. In practice, these approaches are often inefficient
because software project datasets often have a complex
structure, exhibit non-normal characteristics [2, 3, 16], and
contain a large proportion of categorical attributes [3, 8].
Moreover, learning-based adaptation techniques such
as genetic algorithms and neural networks are often challenging to apply
because they need parameter optimization and a configuration
setup that requires many user decisions, for example
about the network model, number of neurons, activation functions,
training functions, mutation, selection, crossover, etc.
Learning and optimization through neural networks
and genetic algorithms can also take a long time to train and
may reduce the performance of the model. Therefore, any useful
adaptation mechanism should learn from the structure of the
historical dataset and should involve categorical attributes, as
they contain useful information for improving the accuracy of
effort estimation [3, 8]. In addition, it should minimize
user interaction and reduce the number of configuration parameters.
In response to these issues, the present
paper proposes a new flexible adaptation technique based on
Model Tree (see Section 3 for more details) using attribute
distance values between source historical projects and their
closest analogies. In this approach, the conventional EBA
procedure