A Maintainability Checklist for Spreadsheets

Spreadsheets are widely used in industry because they are flexible and easy to use; often they even support business-critical applications. It is, however, difficult for spreadsheet users to correctly assess the maintainability of their spreadsheets. Maintainability matters because spreadsheets often have a long lifespan, during which they pass through the hands of several users. In this paper, we present a checklist for measuring the maintainability of a spreadsheet. By answering a series of questions, a user can gauge whether the spreadsheet is safe to use now and in the future. We demonstrate the applicability of our approach on 11 spreadsheets from the EUSES corpus.


💡 Research Summary

Spreadsheets have become indispensable tools in many enterprises because they allow users to quickly prototype calculations, visualise data, and share results without requiring programming expertise. However, the very flexibility that makes spreadsheets attractive also leads to poorly structured, undocumented, and hard‑to‑maintain artefacts, especially when they evolve over months or years and are handed over to multiple users. While software engineering has a rich set of metrics and checklists for assessing maintainability of source code, comparable guidance for spreadsheets is scarce. This paper addresses that gap by proposing a systematic maintainability checklist specifically designed for spreadsheet artefacts.

Checklist Design
The authors start from established software‑maintainability dimensions—modularity, documentation, complexity, and testability—and reinterpret them for the spreadsheet domain. The resulting checklist contains 25 questions grouped into four categories:

  1. Structural Design – evaluates naming conventions for sheets and ranges, the clarity of cell‑range definitions, and the presence of excessive external links.
  2. Documentation & Annotation – checks whether cells contain explanatory comments, whether version information or change logs are kept, and whether a user guide or inline documentation exists.
  3. Complexity & Computational Stability – measures nesting depth of formulas, the use of array formulas, reliance on volatile functions (e.g., NOW, RAND), and the presence of error‑handling constructs such as IFERROR.
  4. Testing & Validation – looks for input validation rules, dedicated test sheets, and scenario‑based verification procedures.
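Several items in the Complexity & Computational Stability category can be answered mechanically from the formula text alone. As a minimal sketch (the parsing approach and the exact set of volatile functions are assumptions for illustration, not the paper's tooling), nesting depth and volatile-function usage can be measured like this:

```python
import re

# Functions commonly classified as volatile; this list is an
# illustrative assumption, not taken from the paper.
VOLATILE = {"NOW", "TODAY", "RAND", "RANDBETWEEN", "OFFSET", "INDIRECT"}

def nesting_depth(formula: str) -> int:
    """Maximum parenthesis nesting depth of a spreadsheet formula."""
    depth = max_depth = 0
    for ch in formula:
        if ch == "(":
            depth += 1
            max_depth = max(max_depth, depth)
        elif ch == ")":
            depth -= 1
    return max_depth

def volatile_functions(formula: str) -> set:
    """Names of volatile functions referenced in the formula."""
    called = set(re.findall(r"([A-Z][A-Z0-9.]*)\s*\(", formula.upper()))
    return called & VOLATILE
```

For example, `=IF(A1>0, SUM(B1:B2), NOW())` has nesting depth 2 and uses one volatile function, so it would trip the "volatile functions" question but not a depth-greater-than-four threshold.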

Each question can be answered with a binary “yes/no” or on a five‑point Likert scale. The authors assign weights to the questions based on a small expert survey and a literature‑derived risk assessment. Scores are normalised to a 0–100 scale and interpreted as follows: ≥80 = “Safe”, 50–79 = “Caution Needed”, <50 = “Risk”.

Empirical Evaluation
To validate the checklist, the authors selected 11 spreadsheets from the EUSES corpus, a publicly available repository containing real‑world spreadsheets from diverse domains (finance, education, scientific research, etc.). Two independent assessors applied the checklist to each file; inter‑rater reliability was high (Cohen’s κ = 0.78). The average overall score was 58, placing most spreadsheets in the “Caution Needed” band. Detailed findings include:

  • Structural Design – average score 70; most files used meaningful sheet names and avoided unnecessary external links.
  • Documentation – average score 45; many spreadsheets lacked comments, version histories, or user instructions.
  • Complexity – 30 % of files contained formulas with nesting depth greater than four, and 20 % over‑used volatile functions, raising the risk of hidden errors.
  • Testing – scores were consistently low (average 30); few spreadsheets provided systematic input checks or dedicated test scenarios.
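The inter-rater reliability reported above (Cohen's κ = 0.78) follows the standard definition κ = (p_o − p_e) / (1 − p_e), where p_o is observed agreement and p_e is the agreement expected by chance from each rater's label frequencies. A minimal sketch of that computation (the rater data below is invented for illustration):

```python
from collections import Counter

def cohens_kappa(rater1, rater2):
    """Cohen's kappa for two raters' categorical labels."""
    assert len(rater1) == len(rater2) and rater1
    n = len(rater1)
    # Observed agreement: fraction of items labelled identically.
    p_o = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Chance agreement: from each rater's marginal label frequencies.
    c1, c2 = Counter(rater1), Counter(rater2)
    p_e = sum(c1[label] * c2[label] for label in c1) / (n * n)
    return (p_o - p_e) / (1 - p_e)
```

For instance, with labels `["yes", "yes", "no", "no"]` and `["yes", "yes", "no", "yes"]`, observed agreement is 0.75, chance agreement is 0.5, and κ = 0.5.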

After the assessment, the authors conducted short interviews with the spreadsheet creators and end‑users. Participants reported that the checklist was intuitive, helped them identify “red‑flag” areas before further development, and could serve as a pre‑deployment gate. However, they also noted that the checklist does not cover VBA macros or custom add‑ins, which are common in more sophisticated spreadsheets.

Discussion
The study demonstrates that a concise, question‑based checklist can surface maintainability concerns that are otherwise invisible to casual users. By translating qualitative observations into a numeric score, organisations can establish policies such as “no spreadsheet may be released to production unless it scores at least 70”. Moreover, the checklist’s structure lends itself to automation: static‑analysis tools could extract answers for many items (e.g., naming conventions, volatile‑function usage) and compute a preliminary score, leaving only the documentation‑related items for human review.

Limitations include the modest sample size (11 spreadsheets) and the reliance on expert‑derived weights, which may not generalise across industries. The checklist also assumes a primarily formula‑driven spreadsheet; files that heavily depend on macros, external APIs, or complex user‑defined functions would require additional items.

Conclusion and Future Work
The authors conclude that their maintainability checklist is both feasible and valuable for real‑world spreadsheet governance. Future research directions are outlined: (1) integration with automated static‑analysis engines to produce fully automated scores, (2) extension of the checklist to cover macro‑heavy spreadsheets, (3) large‑scale field studies measuring the impact of checklist‑driven governance on error rates and maintenance effort, and (4) refinement of weighting schemes through broader expert participation. By providing a practical, repeatable assessment method, the work paves the way for more disciplined spreadsheet development practices, ultimately reducing the hidden costs and risks associated with long‑lived, mission‑critical spreadsheet applications.