Comparing Soft Computing Techniques For Early Stage Software Development Effort Estimations


Accurately estimating software size, cost, effort and schedule is probably the biggest challenge facing software developers today. It has major implications for the management of software development, because both overestimates and underestimates can directly damage software companies. Many models for effort estimation have been proposed by various researchers over the years, and several studies stress the importance of producing estimates at an early stage. New paradigms offer alternative ways to estimate software development effort, in particular Computational Intelligence (CI), which exploits mechanisms of interaction between humans and processes domain knowledge with the intention of building intelligent systems (IS). Among IS, Artificial Neural Networks and Fuzzy Logic are the two most popular soft computing techniques for software development effort estimation. In this paper, neural network models and a Mamdani FIS model are used to predict early stage effort on a student dataset. The Mamdani FIS was found to predict early stage effort more accurately than the neural network based models.


Research Summary

The paper tackles one of the most persistent problems in software engineering: estimating effort, cost, size, and schedule at the very early stages of a project. Traditional algorithmic models such as COCOMO or Function Point analysis rely on well‑defined requirements and historical data, which are often unavailable or unreliable when a project is just being conceived. To address this gap, the authors compare two soft‑computing approaches—Artificial Neural Networks (ANN) and a Mamdani‑type Fuzzy Inference System (FIS)—using a dataset collected from university student software projects.

Dataset and Experimental Setup
A total of 150 student projects were gathered from a software engineering course. For each project the authors recorded five input attributes: lines of code (LOC), function point count, team size, average experience of team members (in years), and a qualitative complexity rating. The target variable was the actual effort expended, measured in person‑hours and verified through project logs and post‑mortem surveys. The data were randomly split into 80 % for training and 20 % for testing.
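The random 80/20 partition described above can be illustrated with a minimal sketch. The function name `train_test_split`, the fixed seed, and the integer project IDs are assumptions made for illustration; the summary does not say how the authors performed the split.

```python
import random

def train_test_split(records, test_fraction=0.2, seed=42):
    """Shuffle a copy of the records and hold out the first test_fraction of them."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Placeholder IDs standing in for the 150 student projects (not real data).
projects = list(range(150))
train, test = train_test_split(projects)
print(len(train), len(test))  # 120 30
```

With 150 records and a 20 % hold-out, this yields 120 training and 30 test projects.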

Neural‑Network Model
The ANN employed a multilayer perceptron architecture with an input layer of five neurons, two hidden layers each containing ten neurons, and a single linear output neuron. ReLU activation was used in the hidden layers, while the output layer used a linear function to produce a continuous effort estimate. Training was performed with the Adam optimizer (learning rate = 0.001), a batch size of 16, and a maximum of 200 epochs. Early‑stopping based on a 20 % validation split and L2 regularization were applied to mitigate over‑fitting.
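To make the 5-10-10-1 architecture concrete, the following sketch implements only the forward pass of such a network in plain Python. The weights are random placeholders and the feature vector is hypothetical; training with Adam, early stopping, and L2 regularization is deliberately left out, so this is not the authors' trained model.

```python
import random

def relu(values):
    """ReLU activation applied element-wise."""
    return [max(0.0, v) for v in values]

def dense(inputs, weights, biases):
    """Affine layer: one weight row and one bias per output neuron."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def init_layer(n_in, n_out, rng):
    """Random placeholder weights; a real model would learn these."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def mlp_forward(features, layers):
    """ReLU on every hidden layer, linear activation on the output neuron."""
    activation = features
    for weights, biases in layers[:-1]:
        activation = relu(dense(activation, weights, biases))
    weights, biases = layers[-1]
    return dense(activation, weights, biases)[0]

rng = random.Random(0)
layer_sizes = [5, 10, 10, 1]  # input, two hidden layers, single output
layers = [init_layer(a, b, rng) for a, b in zip(layer_sizes, layer_sizes[1:])]

# Hypothetical, already-scaled inputs: LOC, function points, team size,
# average experience, complexity rating.
sample = [0.4, 0.3, 0.5, 0.2, 0.8]
estimate = mlp_forward(sample, layers)
print(estimate)
```

Because the weights are untrained, the printed value carries no meaning; the sketch only shows how five inputs flow through two ReLU layers to a single continuous output.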

Mamdani Fuzzy Model
For the fuzzy approach, each input variable was fuzzified using three to five triangular or Gaussian membership functions, reflecting linguistic terms such as “low”, “medium”, and “high”. Expert knowledge was encoded into 25 IF‑THEN rules (e.g., “IF project size is high AND complexity is high AND team experience is low THEN effort is high”). The Mamdani inference engine performed min‑max composition, and the final crisp effort value was obtained via the centroid defuzzification method.
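The Mamdani pipeline (fuzzification, min for AND, min implication, max aggregation, centroid defuzzification) can be sketched as follows. This is a simplified illustration under stated assumptions: it uses only two inputs on a hypothetical 0 to 10 scale, three triangular terms per variable, a 0 to 100 effort universe, and a made-up nine-rule table. None of these membership parameters or rules come from the paper.

```python
def tri(x, a, b, c):
    """Triangular membership function with feet at a and c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# Hypothetical linguistic terms (assumed scales, not the authors' values).
INPUT_TERMS = {"low": (-5, 0, 5), "medium": (0, 5, 10), "high": (5, 10, 15)}
EFFORT_TERMS = {"low": (-50, 0, 50), "medium": (0, 50, 100), "high": (50, 100, 150)}

# Hypothetical rule table: (size term, complexity term) -> effort term.
RULES = {
    ("low", "low"): "low", ("low", "medium"): "low", ("low", "high"): "medium",
    ("medium", "low"): "low", ("medium", "medium"): "medium", ("medium", "high"): "high",
    ("high", "low"): "medium", ("high", "medium"): "high", ("high", "high"): "high",
}

def fuzzify(value):
    """Degree of membership of a crisp input in each linguistic term."""
    return {term: tri(value, *abc) for term, abc in INPUT_TERMS.items()}

def estimate_effort(size, complexity):
    size_mu = fuzzify(size)
    complexity_mu = fuzzify(complexity)
    numerator = 0.0
    denominator = 0.0
    for y in range(0, 101):  # discretized effort universe
        aggregated = 0.0
        for (size_term, cx_term), effort_term in RULES.items():
            strength = min(size_mu[size_term], complexity_mu[cx_term])   # AND = min
            clipped = min(strength, tri(y, *EFFORT_TERMS[effort_term]))  # implication = min
            aggregated = max(aggregated, clipped)                        # aggregation = max
        numerator += y * aggregated
        denominator += aggregated
    if denominator == 0.0:
        raise ValueError("no rule fired for these inputs")
    return numerator / denominator  # centroid of the aggregated output set

print(estimate_effort(5, 5))
print(estimate_effort(9, 9))
```

For the mid-scale input (5, 5) only the "medium, medium" rule fires at full strength, so the centroid falls at about the middle of the effort universe; larger and more complex projects move the estimate upward.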

Evaluation Metrics
The authors evaluated both models using three standard regression metrics: Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and the coefficient of determination (R²).
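For reference, the three metrics follow their standard definitions and can be computed as in the sketch below. The effort values are invented solely to exercise the functions and are not taken from the study.

```python
import math

def mae(actual, predicted):
    """Mean Absolute Error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root Mean Squared Error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Made-up effort values in person-hours, for illustration only.
actual = [40.0, 55.0, 70.0, 90.0]
predicted = [45.0, 50.0, 75.0, 85.0]

print(mae(actual, predicted))   # 5.0
print(rmse(actual, predicted))  # 5.0
print(r_squared(actual, predicted))
```

MAE and RMSE are expressed in the same unit as the target (person-hours here), while R² is unitless. RMSE penalizes large individual errors more heavily than MAE.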

Results
On the held‑out test set the ANN achieved an MAE of 12.4 hours, RMSE of 15.8 hours, and R² of 0.71. In contrast, the Mamdani FIS yielded an MAE of 8.7 hours, RMSE of 11.2 hours, and R² of 0.84, indicating superior overall accuracy. The advantage of the fuzzy system was especially pronounced for small‑scale projects and those with high complexity, where it reduced both under‑estimation and over‑estimation more effectively than the neural network.

Contributions

  1. Dataset Creation – The study provides a publicly available early‑stage effort dataset derived from real student projects, addressing the shortage of open benchmarks in this research area.
  2. Head‑to‑Head Comparison – By training and testing both models under identical conditions, the paper offers a clear empirical demonstration that a well‑designed fuzzy system can outperform a conventional ANN for early effort estimation.
  3. Knowledge Integration – The work showcases how expert linguistic knowledge can be systematically incorporated into a quantitative prediction model, highlighting the practical value of fuzzy logic in uncertain, data‑scarce environments.

Limitations and Future Work
The primary limitation is the domain of the data: student projects may not capture the full complexity, scale, and organizational constraints of commercial software development. Additionally, the fuzzy rule base was handcrafted by domain experts, introducing subjectivity and limiting scalability. Future research directions proposed include: (a) expanding the dataset to encompass industry projects across multiple domains; (b) exploring hybrid models such as neuro‑fuzzy systems that combine the learning capability of ANNs with the interpretability of fuzzy rules; and (c) employing automated rule‑generation techniques (e.g., genetic algorithms, clustering‑based rule extraction) to reduce reliance on manual expert input.

Conclusion
The study concludes that Mamdani‑type fuzzy inference systems provide more accurate early‑stage effort estimates than standard multilayer perceptron neural networks, particularly when input information is vague or limited. This finding suggests that project managers can achieve more reliable scheduling and budgeting decisions by leveraging fuzzy‑logic‑based tools during the initial phases of software development, thereby mitigating the financial and operational risks associated with inaccurate effort forecasts.