Analogy-based effort estimation: a new method to discover set of analogies from dataset characteristics

February 23, 2026

📝 Abstract

Analogy-based effort estimation (ABE) is one of the efficient methods for software effort estimation because of its outstanding performance and its capability of handling noisy datasets. Conventional ABE models usually use the same number of analogies for all projects in a dataset in order to make good estimates. The authors claim that using the same number of analogies may produce the best overall performance for the whole dataset but not necessarily the best performance for each individual project. There is therefore a need to better understand the dataset characteristics in order to discover the optimum set of analogies for each project, rather than using a static k nearest projects. Method: The authors propose a new technique based on the Bisecting k-medoids clustering algorithm to find the best set of analogies for each individual project before making the prediction. Results & Conclusions: With Bisecting k-medoids it is possible to better understand the dataset characteristics and to automatically find the best set of analogies for each test project. The performance figures of the proposed estimation method are promising and better than those of other regular ABE models.

📄 Content

Analogy-Based Effort Estimation: A New Method to Discover Set of Analogies from Dataset Characteristics

Mohammad Azzeh
Department of Software Engineering
Applied Science University
Amman, Jordan POBOX 166
m.y.azzeh@asu.edu.jo

Ali Bou Nassif
Department of Computer Science
University of Western Ontario
London, Ontario, Canada, N6A 5B9
abounas@uwo.ca

ABSTRACT.
Background: Analogy-Based Effort Estimation (ABE) is one of the efficient methods for software effort estimation because of its outstanding performance and its capability of handling noisy datasets. Problem & Objective: Conventional ABE models usually use the same number of analogies for all projects in a dataset in order to make good estimates. Our claim is that using the same number of analogies may produce the best overall performance for the whole dataset but not necessarily the best performance for each individual project. There is therefore a need to better understand the dataset characteristics in order to discover the optimum set of analogies for each project, rather than using a static k nearest projects. Method: We propose a new technique based on the Bisecting k-medoids clustering algorithm to find the best set of analogies for each individual project before making the prediction.
Results & Conclusions: With Bisecting k-medoids it is possible to better understand the dataset characteristics and to automatically find the best set of analogies for each test project. The performance figures of the proposed estimation method are promising and better than those of other regular ABE models.

Keywords: Software Effort Estimation, Analogy-Based Effort Estimation, Cluster analysis.

INTRODUCTION
In its simplest form, Analogy-Based Effort Estimation (ABE) is a process of finding the nearest analogies based on the notion of retrieval by similarity [1, 12, 16, 24]. It has been remarked that the predictive performance of ABE is dataset dependent, with each dataset requiring different configurations and design decisions [14, 15, 19, 20]. Recent publications have reported that an adjustment mechanism is important for generating better estimates in ABE than the null-adjustment mechanism [1, 13, 26]. However, irrespective of the type of adjustment technique followed, the process of discovering the best set of analogies to use is still a key challenge.
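To make the retrieval-by-similarity idea concrete, the following is a minimal sketch of a conventional ABE estimate with a fixed number of analogies. It is not taken from the paper: the function name `abe_estimate`, the min-max normalisation, the Euclidean distance, and the use of the plain mean of the retrieved efforts (a null adjustment) are all illustrative assumptions, and the project data are hypothetical.

```python
import numpy as np

def abe_estimate(train_X, train_effort, query, k=3):
    """Estimate effort as the mean effort of the k most similar past projects.

    Assumptions (not from the source): min-max scaled features,
    Euclidean distance, and no adjustment of the retrieved efforts.
    """
    train_X = np.asarray(train_X, dtype=float)
    train_effort = np.asarray(train_effort, dtype=float)
    query = np.asarray(query, dtype=float)

    # Scale each feature to [0, 1] so that no single feature dominates.
    lo, hi = train_X.min(axis=0), train_X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    scaled_X = (train_X - lo) / span
    scaled_q = (query - lo) / span

    # Retrieve the k nearest historical projects (the "analogies").
    dist = np.sqrt(((scaled_X - scaled_q) ** 2).sum(axis=1))
    nearest = np.argsort(dist, kind="stable")[:k]
    return float(train_effort[nearest].mean()), nearest

# Hypothetical projects: two features each, with known effort.
projects = [[10, 2], [12, 3], [50, 8], [55, 9]]
efforts = [100, 120, 500, 560]
estimate, analogies = abe_estimate(projects, efforts, [11, 2], k=2)
print(estimate, analogies)
```

Here the same `k` is applied to every query project, which is exactly the design decision the paper questions.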
This paper focuses on the following problem: how can we automatically find the optimum set of analogies for each individual project before making the prediction? As yet, there is no reliable method that can discover such a set of nearest analogies before making the prediction. Conventional ABE models start with one analogy and increase this number depending on the overall performance over the whole dataset; they then use the first k analogies that produce the best overall performance. However, a fixed k value that produces the best overall performance does not necessarily provide the best performance for each individual project, and may not be suitable for other datasets. Our claim is that we can avoid sticking to a fixed best-performing number of analogies, which changes from dataset to dataset or even from one project to another within the same dataset. We therefore propose an alternative technique to tune ABE using a Bisecting k-medoids (BK) clustering algorithm. The bisecting procedure is used with k-medoids to avoid guessing the number of clusters: it recursively applies the basic k-medoids algorithm, splitting each cluster into two sub-clusters to form a binary tree of clusters, starting from the whole dataset. This allows us to discover the structure of the dataset efficiently and to automatically find the best set of analogies, while excluding irrelevant analogies, for each individual test project. It is important to note that the discovered set of analogies does not necessarily follow the same order of nearest analogies as in conventional ABE.
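The recursive splitting described above can be sketched as follows. This is an illustrative sketch only, not the authors' implementation: the stopping rule (`min_size`), the choice of the most distant pair as initial medoids, and the rule of following the nearest medoid down to a leaf to pick analogies are all assumptions, since this section does not specify them. All function names are hypothetical.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance matrix between all projects."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def two_medoids(D, idx, max_iter=50):
    """Split the projects in `idx` around two medoids (k-medoids with k=2)."""
    sub = D[np.ix_(idx, idx)]
    if sub.max() == 0:                      # all projects identical: no split
        return None
    # Assumed initialisation: the two most distant projects.
    a, b = np.unravel_index(np.argmax(sub), sub.shape)
    medoids = [int(a), int(b)]
    for _ in range(max_iter):
        labels = np.argmin(sub[:, medoids], axis=1)
        updated = []
        for c in (0, 1):
            members = np.where(labels == c)[0]
            # New medoid: member with the smallest total distance to its cluster.
            costs = sub[np.ix_(members, members)].sum(axis=1)
            updated.append(int(members[np.argmin(costs)]))
        if updated == medoids:
            break
        medoids = updated
    labels = np.argmin(sub[:, medoids], axis=1)
    return [(idx[labels == c], int(idx[medoids[c]])) for c in (0, 1)]

def bisect(D, idx, min_size=3):
    """Build a binary tree of clusters, starting from the whole dataset."""
    node = {"members": idx, "children": None}
    if len(idx) >= 2 * min_size:            # assumed stopping rule
        split = two_medoids(D, idx)
        if split and all(len(part) >= min_size for part, _ in split):
            node["children"] = [(m, bisect(D, part, min_size)) for part, m in split]
    return node

def select_analogies(tree, X, query):
    """Assumed selection rule: follow the nearest medoid down to a leaf."""
    node = tree
    while node["children"]:
        node = min(node["children"],
                   key=lambda child: np.linalg.norm(X[child[0]] - query))[1]
    return node["members"]

# Hypothetical dataset with two obvious groups of projects.
X = np.array([[1, 1], [1, 2], [2, 1], [10, 10], [10, 11], [11, 10]], dtype=float)
tree = bisect(pairwise_distances(X), np.arange(len(X)))
print(select_analogies(tree, X, np.array([9.0, 9.0])))
```

With a tree like this, the number of analogies is no longer a fixed `k`: it depends on the size of the cluster the test project falls into, which is the kind of per-project flexibility the paper argues for.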
The rest of the article is structured as follows: Section 2 defines the research problem in more detail. Section 3 reviews the related work. Section 4 describes the methodology we propose to address the research problem. Section 5 presents the results we obtained. Section 6 discusses our results and findings. Lastly, Section 7 summarizes our conclusions and future work.
RESEARCH PROBLEM
Several studies in software effort estimation have tried to address the problem of finding the optimum number of nearest analogies to be used by ABE [14, 15, 16, 31]. The conclusion drawn from these studies is that using a static k value that produces the lowest overall mean magnitude of relative error (MMRE) does not necessarily provide the lowest magnitude of relative error (MRE) for each individual project, and may not be suitable for other datasets. This shows that every dataset has different characteristics, which has a significant impact on the process of discovering the best set of analogies. To illustrate our point of view and better understand this problem we carried out an intensive search to find the mean e

This content is AI-processed based on ArXiv data.
