Predicting Coding Effort in Projects Containing XML Code
This paper studies the problem of predicting the coding effort for a subsequent year of development by analysing metrics extracted from project repositories, with an emphasis on projects containing XML code. The study considers thirteen open source projects and applies machine learning algorithms to build models that predict one-year coding effort, measured as lines of code added, modified and deleted. Both organisational metrics and code metrics associated with revisions are taken into account. The results show that coding effort is largely determined by developer expertise, while source code metrics do little to improve the accuracy of effort estimates. The study also shows that models trained on one project are unreliable at estimating effort in other projects.
💡 Research Summary
The paper tackles the problem of forecasting how much coding work a software project will require in the next calendar year. The authors focus on open‑source projects that contain a substantial amount of XML code, because XML is often used for configuration, data exchange, and domain‑specific languages, and its structural characteristics may affect effort estimation differently from plain source code. Thirteen mature projects were selected from public repositories (GitHub, Apache, etc.). Each project had at least two years of commit history, yielding roughly 45 000 revisions for analysis.
For every revision the authors extracted three families of metrics: (1) Organizational / human metrics – total number of past commits per developer, tenure in the project, activity frequency in the last six months, average lines changed per commit, and a simple collaboration network measure (co‑commit count). (2) Code‑level metrics – file size in lines of code, number of XML tags, maximum tag depth, number of namespaces, schema complexity indicators, and token density. (3) Revision‑level metrics – number of files touched, added/modified/deleted lines, commit‑message length, and temporal attributes (day of week, month, presence of a version tag). The target variables were the total lines of code that would be added, modified, and deleted during the following year, treated as three separate regression problems.
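The XML structural metrics in family (2) can be computed directly from a document's element tree. The sketch below is illustrative only, assuming plain Python with the standard library; the function name and returned fields are hypothetical, not the authors' extraction pipeline.

```python
import xml.etree.ElementTree as ET

def xml_structure_metrics(xml_text: str) -> dict:
    """Compute simple structural metrics for one XML document:
    total tag count, maximum nesting depth, and distinct namespace count.
    (Illustrative sketch; not the authors' exact tooling.)"""
    root = ET.fromstring(xml_text)
    namespaces = set()
    tag_count = 0
    max_depth = 0

    def walk(elem, depth):
        nonlocal tag_count, max_depth
        tag_count += 1
        max_depth = max(max_depth, depth)
        # ElementTree encodes namespaced tags as '{uri}localname'
        if elem.tag.startswith("{"):
            namespaces.add(elem.tag[1:elem.tag.index("}")])
        for child in elem:
            walk(child, depth + 1)

    walk(root, 1)
    return {"tags": tag_count, "max_depth": max_depth, "namespaces": len(namespaces)}
```

For example, `xml_structure_metrics("<a xmlns='urn:x'><b><c/></b></a>")` reports three tags, a maximum depth of three, and one namespace.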
The authors trained a suite of machine‑learning models: ordinary linear regression, ridge and lasso regularisation, random‑forest regression, gradient boosting machines, and XGBoost. Hyper‑parameters were tuned with five‑fold cross‑validation, and performance was measured using Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE). Feature importance was assessed through permutation importance and SHAP values.
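The tuning-and-evaluation pipeline described above can be sketched with scikit-learn. The features and target below are synthetic stand-ins, not the paper's data; the hyper-parameter grid is an assumption chosen for brevity.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Synthetic stand-in for revision-level features and one target
# (e.g. lines added in the following year); not the study's dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 6))
y = 100 + 40 * X[:, 0] + rng.normal(scale=10, size=300)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Five-fold cross-validated hyper-parameter search, as in the study.
search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [3, None]},
    cv=5,
    scoring="neg_mean_absolute_error",
)
search.fit(X_tr, y_tr)

# Report both error metrics used by the authors on held-out data.
pred = search.predict(X_te)
mae = mean_absolute_error(y_te, pred)
rmse = np.sqrt(mean_squared_error(y_te, pred))
```

Each of the three targets (lines added, modified, deleted) would get its own such regressor, since the paper treats them as separate regression problems.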
Key findings:
- Human‑centric metrics dominate predictive power. The most influential features across all projects were the cumulative commit count of the developer who made the revision and the recent activity frequency. These two variables alone reduced MAE by roughly 20–30 % compared with models that relied only on code‑level metrics. The result suggests that a developer’s experience, familiarity with the codebase, and recent engagement are far stronger drivers of future coding effort than the structural complexity of the XML files themselves.
- XML‑specific structural metrics have limited impact. Tag depth, namespace count, and schema complexity contributed little to model accuracy; in some cases they introduced noise and slightly worsened performance. This indicates that, at least for the projects examined, the effort required to modify XML is not strongly correlated with the intrinsic “hardness” of the XML structure.
- Cross‑project generalisation is poor. When a model trained on one project was applied to another, MAE increased on average by 45 % and RMSE by a similar margin. The degradation was especially pronounced when the source and target projects differed in size, domain, or development process. Consequently, a one‑size‑fits‑all predictor is unrealistic; each project needs its own calibrated model or a meta‑learning approach that can adapt to project‑specific distributions.
- Temporal drift matters. Even within the same project, models trained on early years performed worse on later years, highlighting the need for periodic retraining to capture evolving team composition, tooling, and process changes.
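The temporal-drift finding can be demonstrated with a chronological split: train on an early period, then compare a stale model against one retrained on a recent window. The data below is synthetic, with a deliberately drifting coefficient standing in for evolving team composition and process changes; it is an illustration of the evaluation idea, not the authors' experiment.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(1)
n = 400
t = np.arange(n)
X = rng.normal(size=(n, 3))
# The feature-target relationship drifts over time, mimicking
# changing teams, tooling, and processes within one project.
coef = 10 + 0.05 * t
y = coef * X[:, 0] + rng.normal(scale=2.0, size=n)

# Chronological split: fit on the early half only.
split = n // 2
stale = Ridge().fit(X[:split], y[:split])

# Retrain on a recent window before predicting the final quarter.
recent = slice(split, 3 * n // 4)
fresh = Ridge().fit(X[recent], y[recent])

final = slice(3 * n // 4, n)
mae_stale = mean_absolute_error(y[final], stale.predict(X[final]))
mae_fresh = mean_absolute_error(y[final], fresh.predict(X[final]))
```

Under drift, the retrained model's error on the final period is lower, which is the paper's argument for periodic retraining.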
Limitations: The dataset is confined to open‑source projects, so the findings may not transfer directly to proprietary environments where development practices differ. Only quantitative metrics were used; qualitative aspects such as sentiment in commit messages, code‑review comments, or developer burnout were omitted. The study also excluded other structured formats (JSON, YAML) that are increasingly common, leaving an open question about whether the observed dominance of human metrics holds for them.
Future work suggested by the authors includes: (a) enriching the feature set with natural‑language processing of commit messages and review discussions; (b) exploring meta‑learning or transfer‑learning techniques to build models that can be fine‑tuned across projects; (c) applying deep‑learning time‑series models (LSTM, Transformer) to capture long‑term trends; and (d) conducting industrial case studies to validate the approach in corporate settings.
In summary, the paper provides empirical evidence that, for XML‑heavy software projects, the primary determinant of future coding effort is the expertise and recent activity of the developers involved, while XML structural metrics add little predictive value. Moreover, effort‑prediction models are highly project‑specific and must be trained or adapted for each codebase individually. These insights suggest that project managers and resource planners should prioritise human‑resource analytics over purely code‑centric metrics when estimating future development work.