Requirements for Automated Assessment of Spreadsheet Maintainability


The use of spreadsheets is widespread. Whether in business, finance, engineering, or other areas, spreadsheets are created for their flexibility and the ease with which they allow a problem to be modeled quickly. Very often they evolve from simple prototypes into implementations of crucial business logic. Spreadsheets that play a crucial role in an organization naturally have a long lifespan and are maintained and evolved by several people. It is therefore important to look not only at their reliability, i.e., how well the intended functionality is implemented, but also at their maintainability, i.e., how easy it is to diagnose a spreadsheet for deficiencies and to modify it without degrading its quality. In this position paper we argue for the need to create a model that estimates the maintainability of a spreadsheet based on (automated) measurement. We propose to do so by applying a structured methodology that has already shown its value in estimating the maintainability of software products. We also argue for the creation of a curated, community-contributed repository of spreadsheets.


💡 Research Summary

The paper addresses the largely overlooked issue of spreadsheet maintainability, arguing that as spreadsheets evolve from simple prototypes to mission‑critical business logic and are often maintained by multiple users over many years, their maintainability – the ease with which they can be understood, diagnosed, modified, and validated – becomes a crucial quality attribute. The authors propose an automated, metric‑based assessment framework that adapts the Software Improvement Group’s (SIG) methodology, which has been successfully applied to software products, to the spreadsheet domain.

The SIG approach consists of a layered quality model grounded in ISO/IEC 9126. Low‑level static analysis collects quantitative data on basic elements (in software: lines, methods, classes; in spreadsheets: cells, formulas, rows, sheets). These raw metrics are mapped to ordinal ratings (0.5–5.5, rounded to a 1‑5 star scale) for high‑level properties such as volume, duplication, and complexity. The mapping uses risk profiles that partition metric values into four risk categories (low, moderate, high, very high). Property ratings are then aggregated into the four ISO maintainability sub‑characteristics and finally into an overall maintainability score. Crucially, the model’s thresholds are calibrated on a large benchmark repository and are periodically re‑calibrated to keep pace with evolving development practices.
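The layered mapping described above can be sketched in a few lines of Python. Note that all thresholds and cut-offs below are purely illustrative placeholders; the actual SIG model uses metric-specific thresholds calibrated against a large benchmark repository.

```python
# Sketch of a SIG-style layered quality model: raw metric observations are
# binned into four risk categories, the resulting risk profile is mapped to
# an ordinal property rating, and property ratings would then be aggregated
# into sub-characteristics. Thresholds here are illustrative, not calibrated.

RISK_THRESHOLDS = {"low": 10, "moderate": 25, "high": 50}  # metric-specific in practice

def risk_category(value: float) -> str:
    """Assign one metric observation (e.g. formula length) to a risk bin."""
    if value <= RISK_THRESHOLDS["low"]:
        return "low"
    if value <= RISK_THRESHOLDS["moderate"]:
        return "moderate"
    if value <= RISK_THRESHOLDS["high"]:
        return "high"
    return "very high"

def risk_profile(values):
    """Fraction of observations falling into each risk category."""
    n = len(values)
    profile = {"low": 0.0, "moderate": 0.0, "high": 0.0, "very high": 0.0}
    for v in values:
        profile[risk_category(v)] += 1 / n
    return profile

def property_rating(profile) -> int:
    """Map a risk profile to a 1-5 star rating (illustrative cut-offs)."""
    if profile["very high"] > 0.05 or profile["high"] > 0.15:
        return 2
    if profile["high"] > 0.05 or profile["moderate"] > 0.25:
        return 3
    if profile["moderate"] > 0.10:
        return 4
    return 5

formula_lengths = [3, 5, 8, 12, 30, 60]  # e.g. operator counts per formula
stars = property_rating(risk_profile(formula_lengths))
```

Aggregating risk profiles rather than averaging raw values is the key design choice: a handful of very risky formulas should dominate the rating even when the median formula is simple.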

To transfer this methodology to spreadsheets, the authors identify several concrete steps. First, they will define measurement goals using the Goal‑Question‑Metric (GQM) paradigm: the ultimate goal is a reliable maintainability rating; supporting questions address understandability, change risk, and validation effort; and specific metrics (e.g., formula complexity, cell‑dependency graph density, duplicate formula ratio, macro usage) will be selected to answer those questions. Second, a representative corpus of spreadsheets must be assembled. The existing EUSES corpus (≈4,500 files) is useful for preliminary experiments but suffers from a lack of professional relevance, insufficient meta‑information, unclear legal permissions, outdated file formats, and a static composition.
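To make one of these candidate metrics concrete, here is a minimal sketch of a duplicate formula ratio. The dict-based cell model is a simplification I introduce for illustration; a real tool would parse the actual spreadsheet file and normalise relative references before comparing formulas.

```python
# Illustrative sketch of one candidate GQM metric: the fraction of formula
# cells whose formula text is duplicated elsewhere in the sheet. High values
# suggest copy-paste reuse that a maintainer must change in many places.
from collections import Counter

def duplicate_formula_ratio(cells: dict) -> float:
    """Fraction of formula cells whose formula occurs more than once.

    `cells` maps cell addresses to values; formulas are strings starting
    with '=' (a simplified, hypothetical cell model)."""
    formulas = [v for v in cells.values()
                if isinstance(v, str) and v.startswith("=")]
    if not formulas:
        return 0.0
    counts = Counter(formulas)
    duplicated = sum(c for c in counts.values() if c > 1)
    return duplicated / len(formulas)

sheet = {
    "A1": 100, "A2": 200,
    "B1": "=A1*2", "B2": "=A1*2", "B3": "=A1*2",  # copy-pasted formula
    "C1": "=SUM(A1:A2)",
}
ratio = duplicate_formula_ratio(sheet)  # 3 of 4 formulas are duplicates -> 0.75
```

A metric like this would feed the risk-profile layer of the quality model rather than being reported to users directly.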

Consequently, the paper calls for the creation of a curated, dynamic, open, and useful spreadsheet repository. “Curated” means each uploaded spreadsheet is screened for usability, relevance, and accompanied by rich metadata (author, domain, creation date, etc.). “Dynamic” implies continuous growth and versioning, enabling periodic re‑calibration of the quality model. “Open” ensures free access and community contributions, while allowing restricted sections under NDAs for confidential data. “Useful” guarantees that contributors receive feedback on the quality of their submissions and that researchers obtain a reliable benchmark for studies.

The authors argue that an automated assessment offers four major advantages over expert‑based reviews: (1) objectivity – results depend solely on measured data; (2) repeatability – the same tool yields identical outcomes; (3) cost‑effectiveness – computers are cheaper and faster than human experts; and (4) scalability – thousands of spreadsheets can be evaluated, supporting organization‑wide risk monitoring.

In conclusion, the paper outlines the requirements for an automated spreadsheet maintainability assessment: a well‑defined GQM‑driven metric set, a large and representative benchmark corpus, a calibrated quality model, and an infrastructure for continuous repository maintenance. The authors invite the spreadsheet developer and user communities to contribute spreadsheets, metadata, and expertise, offering to lead the effort while emphasizing that community participation is essential for success. By establishing such a framework, organizations can gain actionable insight into the hidden risks of poorly maintainable spreadsheets, improve overall spreadsheet quality, and strengthen their broader IT governance and risk management practices.

