Impact of Errors in Operational Spreadsheets


Authors: Stephen G. Powell, Barry Lawson, Kenneth R. Baker

Tuck School of Business, Dartmouth College, Hanover NH 03755 USA

May 22, 2007

Abstract

All users of spreadsheets struggle with the problem of errors. Errors are thought to be prevalent in spreadsheets, and in some instances they have cost organizations millions of dollars. In a previous study of 50 operational spreadsheets we found errors in 0.8% to 1.8% of all formula cells, depending on how errors are defined. In the current study we estimate the quantitative impacts of errors in 25 operational spreadsheets from five different organizations. We find that many errors have no quantitative impact on the spreadsheet. Those that have an impact often affect unimportant portions of the spreadsheet. The remaining errors do sometimes have substantial impacts on key aspects of the spreadsheet. This paper provides the first fully-documented evidence on the quantitative impact of errors in operational spreadsheets.

1. Introduction

Although previous research has suggested that errors are prevalent in spreadsheets (Panko and Halverson, 1996; Panko, 2005), we have only limited information on the types of errors that occur, why they occur, and how to avoid them. We recently summarized the research literature on spreadsheet errors (Powell et al., 2006a) and came to the following conclusions:

• There is no generally agreed upon taxonomy of errors in spreadsheets.
• The distinction between actual errors and poor practices is not well-defined.
• All studies that have examined spreadsheets in the field have found errors, although there is no standardization in these studies of
  o the categories of errors
  o the methods used to detect errors
  o the sample of spreadsheets studied.
• It is not possible at present to estimate accurately the prevalence or the impact of errors in spreadsheets.
The research we report here was designed to test for errors in a large sample of spreadsheets in actual use by organizations. We developed an explicit auditing protocol and trained a group of researchers to apply it consistently. Using an explicit protocol is important for two reasons: one, it allows other researchers to replicate and improve on our work; two, it contributes to the development of improved auditing procedures, which is important in its own right. The auditing procedure is described in detail in Powell et al. (2006b).

We begin this paper with a brief review of previous work on spreadsheet errors. Then we describe the design of our study and the sample of spreadsheets we audited. In a previous study we audited 50 spreadsheets from a variety of organizations and summarized the results in terms of error instances (the occurrence of a single type of error) and error cells (the number of cells affected by a single error instance). The current study focuses on the quantitative impact of errors. Each error identified in this study was verified by the developer of the spreadsheet, and the quantitative impact of fixing the error was determined.

2. Research design

Our research into spreadsheet errors is focused on completed, operational spreadsheets, not errors made in a laboratory setting or errors made during the development of a spreadsheet. In a previous study we used an auditing procedure on a large number of spreadsheets, but we did not have access to the developers. This allowed us to audit a large number of spreadsheets, but it also meant that we could not check our understanding of a model with the developer. In practice, this meant that at times we accepted a suspicious formula as correct because we could not be sure that it was incorrect. In the current study we worked with 25 volunteer spreadsheet developers within five different organizations.
The developers each completed a survey describing their spreadsheet's purpose and design, as well as any unusual or special formulas or assumptions. Two researchers then independently audited each spreadsheet and pooled their results. Finally, we debriefed the developer on each issue our audit raised, and categorized each issue as an error, a poor practice, or not an error. Actual errors were then corrected and the change in the relevant output cell was recorded as the quantitative measure of the impact of the error.

2.1 Sample spreadsheets

We first identified five organizations that were willing to provide volunteers and to have their spreadsheets audited. These included two consulting companies, a large financial services firm, a manufacturing company, and a college. Each organization identified five volunteers, each of whom provided one spreadsheet for auditing. We provided the following specifications to the volunteers for help in choosing a spreadsheet to audit:

- contains 3-10 worksheets
- contains 250-10,000 formulas
- occupies 200-1,000 kb of memory
- developed by the volunteer in the last twelve months
- contains no complex Visual Basic code
- is well understood by the developer
- has no broken external links

Not all the spreadsheets we audited conform to these specifications. In fact, the average number of sheets in our sample was 15.2 (the range was 2 to 135 sheets) and the average size in kilobytes was 1,463 (the range was 45 to 7,450 kb). Many of the spreadsheets in our sample were larger on one or more of these dimensions than specified above. While our sample is not strictly random, it is certainly representative of the general population of spreadsheets (with the caveats cited above). The sample includes spreadsheets from different types of organizations, spreadsheets created by developers with different degrees of expertise, and spreadsheets that span a broad range from small and simple to large and complex.
2.2 Auditing protocol

The auditing protocol we developed for a previous study is a highly-detailed document that specifies the steps to take in auditing a spreadsheet for size, complexity, several types of qualitative features, and errors. (A complete description of the protocol is available at our research project's website, http://mba.tuck.dartmouth.edu/spreadsheet/index.html.) This protocol evolved over several months of testing. During this time we trained auditors and tested the protocol ourselves and through our assistants on dozens of operational spreadsheets.

Our current study used a very similar approach but focused less on gathering data about the spreadsheets and more on finding potential errors. The typical approach was to review the survey provided by the developer, especially the portion describing the various worksheets and their interrelationships. The next step was to run the auditing software Spreadsheet Professional (http://www.spreadsheetinnovations.com), which provides summary maps for each sheet and reports on the location of potential errors. We then examined each sheet in turn, first looking at the map for suspicious cells or ranges and scanning the error tests for unusual conditions. Then we inspected the sheet itself, first determining the location of data and formulas and subsequently auditing every unique formula and most copied formulas.

In this study we recorded data only on cells that were potentially errors. For each problematic cell or range we recorded the following information:

• location: cell address or range
• type of error (see below)
• how it was discovered (whether by map analysis, error tests, code inspection, or sensitivity testing)
• description of the possible error

After we had discussed the potential error with the developer, we then recorded how the issue was resolved (Error, Poor Practice, No Error).
For Errors, we also recorded the cell used to measure the impact and the absolute and percentage changes in that cell when the error was corrected.

2.3 Measuring impacts

Measuring the impact of an error on a spreadsheet is necessarily somewhat subjective. First, some errors occur in a single cell while others occur in many cells. Do we consider each cell separately and measure the impact of correcting it alone, or do we correct all cells with a similar error and measure the overall impact? Second, some error cells are used to calculate many other cells while others impact no other cells. When a cell impacts many other cells it is not always obvious which of the impacted cells to use to measure the effect. (And, of course, different errors can impact different cells.) Third, it is not always clear how to correct an error. For example, if erroneous inputs were used, do we replace them with average inputs or extreme inputs? Finally, it is necessary to decide whether to measure errors in absolute or relative terms, and whether to combine all the errors in a given workbook into one overall error.

In this study we chose to measure the effect of each error separately. In most cases we corrected all cells with a given type of error, considering this one error with a single impact. When such an error impacted only the erroneous cells themselves we computed the maximum change from the base case and took that as our error estimate. When such an error impacted a single cell we measured the impact of correcting all the error cells on that one cell. We did not attempt to determine a single error estimate for each workbook but measured each error separately. In many cases the only cell impacted by an error was the cell itself. When an error cell had dependencies, we traced these to what we judged to be the most important dependent cell and measured the impact of correcting the error on that cell.
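The impact measure just described reduces to a simple computation: record the change in the chosen output cell when the error is corrected, in both absolute and percentage terms, treating a zero base value as NA. The Python sketch below is our illustration of that rule, not code from the study:

```python
def error_impact(base_value, corrected_value):
    """Impact of correcting an error, measured on a chosen output cell.

    Returns (absolute_change, percentage_change). The percentage change
    is None ("NA") when the base value is zero, since a relative change
    is undefined in that case.
    """
    absolute = corrected_value - base_value
    if base_value == 0:
        return absolute, None
    return absolute, 100.0 * absolute / base_value

# Hypothetical example: correcting an error moves an output from 200 to 250.
abs_chg, pct_chg = error_impact(200.0, 250.0)
```

The None case matters in practice: as noted below, several errors in the sample occurred in cells whose initial value was zero, so only their absolute impact could be reported.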
When we began this research, we had expected that most spreadsheets would be set up to calculate a few key outputs. If this were the case, it would be straightforward to determine the impact on these key outputs of an error anywhere in the spreadsheet. We were surprised to discover during our research that many spreadsheets do not have a small number of key output cells. Rather, we found many examples in which hundreds or thousands of results were calculated that themselves had no dependents. This made it more difficult to determine the impact of errors. In the end we had to use quite a bit of judgment to decide which cell or cells to use to measure the impact of errors.

3. Error types

One of the challenges of spreadsheet error research is how to categorize errors. As we pointed out earlier, many different error classifications have been offered (Rajalingham et al., 2000; Purser and Chadwick, 2006). Most of these suffer from the same flaw: errors that arise from different causes cannot be distinguished by an auditor. For example, when we encounter an error in a formula we can rarely determine whether the error was due to sloppy typing, lack of domain knowledge, lack of Excel knowledge, a subsequent user changing the formula, or an unknown cause. We can, however, easily detect some formulas that give the wrong result. We can also identify many practices that are likely to cause errors as the spreadsheet is used or that simply make it harder than necessary to use the spreadsheet productively. Other poor practices include limited or nonexistent documentation, duplication of inputs, illogical physical layout, and so on.
After considerable testing we settled on the following six error types, which our experience with auditing suggested were well-defined in theory and could be identified with high reliability in practice:

• Logic error - a formula is used incorrectly, leading to an incorrect result
• Reference error - a formula contains one or more incorrect references to other cells
• Hard-coding numbers in a formula - one or more numbers appear in formulas, and the practice is sufficiently dangerous
• Copy/Paste error - a formula is wrong due to inaccurate use of copy/paste
• Data input error - an incorrect data input is used
• Omission error - a formula is wrong because one or more of its input cells is blank.

More information on this error taxonomy is available in Powell et al. (2006c). We should point out that hard coding was identified in our previous study as the most common poor practice. In this study we ignored hard coding in most instances, on the grounds that it was unlikely to represent an outright error. However, there were a few instances in which contradictory inputs were hard-coded, and we did cite those as potential errors.

4. Impact of errors

Table 1 summarizes our results. In column 1 we have used a two-digit code to label each spreadsheet. For example, spreadsheet 3.4 is the fourth spreadsheet from organization 3. The table gives the following information for each spreadsheet:

• number of issues raised in our audits
• number of errors confirmed in interviews
• number of errors with non-zero quantitative impact
• maximum percentage impact
• maximum absolute impact

Within this sample of 25 spreadsheets we identified a total of 381 issues. After we discussed these issues with the developers we found that nine spreadsheets had no errors; among the remaining 16 spreadsheets we found a total of 117 errors. Of these 117, 47 had zero quantitative impact, leaving 70 errors with non-zero impact.
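Of the six error types above, hard-coded numbers are the easiest to screen for mechanically, which is one reason auditing tools flag them. The sketch below is purely illustrative (it is not the test Spreadsheet Professional applies): it reports numeric constants embedded in a formula string, after stripping cell references so that their row numbers are not mistaken for constants:

```python
import re

# Matches cell references such as B12 or $AB$34, so that the digits in a
# reference are not counted as hard-coded constants.
CELL_REF = re.compile(r"\$?[A-Za-z]{1,3}\$?\d+")
NUMBER = re.compile(r"\d+(?:\.\d+)?")

def hard_coded_numbers(formula):
    """Return the numeric constants embedded in a formula string."""
    stripped = CELL_REF.sub("", formula)  # remove cell references first
    return [m.group() for m in NUMBER.finditer(stripped)]

# A formula mixing cell references with an embedded constant (the 1.08):
flagged = hard_coded_numbers("=B4*1.08+SUM(C2:C10)")
```

A real auditing tool would parse formulas properly rather than use regular expressions, but even this crude filter illustrates why the check is cheap to automate while the judgment of whether a flagged constant is "sufficiently dangerous" is not.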
As we pointed out above, there are two ways to measure the impact of errors: absolute and relative. Absolute impacts are important because they tell us how large errors are in the units of the spreadsheet. However, they cannot be compared easily across workbooks, since a million-dollar error may be trivial in one spreadsheet and catastrophic in another. Relative (or percentage) errors more accurately reflect the significance of an error, but they have their shortcomings as well. One problem with relative errors is that percentage changes cannot be determined when the initial value is zero; another is that percentage changes in percentages are not generally as meaningful as percentage changes in dollar amounts. We present our results here in both absolute and relative terms.

Figure 1 shows the distribution of absolute error impacts. The most salient point to draw from this figure is that 47 of the errors we found had zero impact on the spreadsheet. This often came about when a formula had an erroneous reference, but both the erroneous and the correct input cells had the same value. Thus when the error was fixed the results did not change. Returning to Figure 1, we see that 12 of the errors involved percentages; among these the average absolute change was 22%. Twenty-four of the remaining 58 errors involved absolute errors less than $10,000. However, some errors were huge: the largest single absolute error we found was over $100 million!

Figure 2 shows the distribution of percentage error impacts. (The NA category includes four errors in cells with an initial value of zero, for which a percentage change is not defined.) Of course 47 of the 117 errors had zero impact, regardless of how measured. Forty-seven of the 66 remaining errors were less than 10% of the initial value. As with absolute errors, there are some very large errors: four, in fact, were over 100%.
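The groupings in Figures 1 and 2 can be produced mechanically from the recorded impacts. The sketch below is illustrative: the bin labels follow the Figure 2 categories, with an NA bin for errors whose base value was zero, and we arbitrarily place an impact of exactly 100% in the 20%-100% bin:

```python
from collections import Counter

def percentage_bin(pct):
    """Assign a percentage impact to one of the Figure 2 categories."""
    if pct is None:            # base value was zero: relative change undefined
        return "NA"
    pct = abs(pct)
    if pct == 0:
        return "0%"
    if pct < 5:
        return "<5%"
    if pct < 10:
        return "5%-10%"
    if pct < 20:
        return "10%-20%"
    if pct <= 100:             # boundary choice: exactly 100% goes here
        return "20%-100%"
    return ">100%"

# Hypothetical sample of recorded percentage impacts (None = NA):
impacts = [None, 0, 0, 3.6, 8.9, 16.0, 137.5]
counts = Counter(percentage_bin(p) for p in impacts)
```

Tallying the bins with a Counter gives exactly the histogram heights plotted in Figure 2, so the figure can be regenerated whenever an audit record is revised.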
Our evidence suggests that spreadsheet practice is very different among the five organizations we studied. In the five spreadsheets from Organization 5 we could identify only five issues to discuss with the developers, and no errors were identified among those five issues. Organization 5 is a small consulting company with highly educated employees and a culture that demands excellence. The spreadsheets we audited from this firm were works of art: thoughtfully designed, well documented, easy to understand, and error-free.

Organization 4 had two spreadsheets with no errors and two with 22 and 44 errors, respectively. The quality of the spreadsheet practice in this organization clearly depends on just where one looks. In this case we found both the best of practice and the worst of practice in offices just a few miles apart.

In Organization 3, which is another consulting company, all the spreadsheets we audited had errors, but in three cases no error had a measurable impact on the results. Even in the remaining two spreadsheets the errors were few in number and fairly small in terms of impact.

Organizations 1 and 2 are both very large. One is a financial firm and the other is a manufacturing firm. Some of the spreadsheets we audited from these companies were astonishingly large and complex. Perhaps for this reason, only two of the ten we audited were error-free (four had no errors with impact). The quality of spreadsheet practice in both of these companies was inconsistent, with inadequate attention paid to spreadsheet design, simplicity, ease of use, documentation, and consistency.

We can summarize our findings as follows:

• Some organizations have spreadsheets that are essentially error-free.
• Within a single organization, spreadsheet practice can range from excellent to poor.
• Some organizations use spreadsheets that are rife with errors, and some of these errors are of substantial magnitude.
• Many errors have zero impact, or impact unimportant calculations.
• There is little correlation between the importance of the application or the risk involved and the quality of the spreadsheet.
• Few spreadsheets contain errors that, in the eyes of their developers, would destroy their usefulness.

5. Qualitative observations

Many writers have observed that the spreadsheet, for all its attractiveness to end-users, is in some ways a dangerous modeling platform. Not only are the logic of the model and the numbers commingled, but the physical layout permits unstructured designs. It is not surprising that amateur programmers, who lack structured design methods, make errors when using such free-form software.

Our research has shown that errors come in more varieties than perhaps even the most extensive taxonomy can encompass. Because the spreadsheet platform is so unstructured, and because end-users generally follow unique designs, errors and poor practices can arise in thousands of different guises. Error researchers inevitably must use their judgment in deciding what is an error and what is not. Thus we should be skeptical of claims about the frequency and impact of errors based on rough averages and casual research.

Another general observation that our research supports is that spreadsheet auditing is more difficult and limited than might have been anticipated. We knew in advance that we would not be able to identify errors in problem formulation or in the use of the model by auditing the spreadsheet itself, although these types of errors may be the most consequential. But even within the narrower domain of our audits we encountered limitations. First, the data used in most spreadsheets is undocumented and there is no practical way to check it. Even the original developer would have difficulty checking the data.
Second, many spreadsheets are so chaotically designed that auditing (especially of a few formulas) is extremely difficult or impossible. Finally, we have found that many spreadsheets do not have just a few key outputs but are used to calculate hundreds or thousands of results. This makes it difficult to unambiguously measure the impact of a particular error.

One important generalization our work supports is that many errors are benign: either they have no impact on the results, the quantitative impact is very small, or the effect is on a vestigial portion of the spreadsheet that is no longer of interest. One can conjecture that this is the result of a sensible attitude toward errors on the part of spreadsheet developers. Perhaps developers look out for errors that impact the key outputs, and in general are good at correcting them. However, they pay less attention to inconsequential errors, and therefore more of these survive to be observed. And, as we know from our interviews, many developers do not clean up their spreadsheets before they move on to other tasks.

One factor that might explain the substantial differences within and among companies in the quality of their spreadsheets is the degree of risk involved. We might hypothesize that companies devote their best resources to high-risk spreadsheets, and fewer resources to low-risk ones. (There is some evidence in a user survey we conducted that would support this conjecture. See Lawson et al. (2006).) We did not measure this feature of the spreadsheets we audited (this would certainly be difficult to do), but our impression is that no such correlation existed within our sample. For example, one of the best spreadsheets we audited was designed to help with daily staffing of nurses and doctors in a medical practice. The spreadsheet was elegantly engineered, error-free, and easy to use.
But little was at risk: an error in staffing at worst would assign the same person to adjacent shifts or to too many consecutive days. Errors of this type would almost certainly be caught, and their impact would be negligible. Nonetheless, the spreadsheet was nearly perfect. By contrast, we also audited spreadsheets in use in a major financial firm for calculating tax liabilities (measured in the billions) to various state and national entities. These spreadsheets were astonishingly complex, difficult to understand, difficult to work with, and error-prone. So factors other than risk appear to explain spreadsheet quality.

Another observation that helps to explain our results is that many of the developers we worked with were not especially surprised or devastated when we pointed out potential errors. Sometimes the reaction was that they knew the formula "wasn't quite right" but they saw that it gave the right answer and thus was acceptable. Sometimes the reaction was that the result was "close enough," or that the result in question was no longer used, or not important. So developers seem to have a sense of what level of accuracy is appropriate for a given spreadsheet. (It is another question entirely whether their perceptions are correct, and the spreadsheets are actually as accurate as they need to be.)

An experienced auditor can rather quickly detect a spreadsheet that is likely to have errors. The major symptoms we observed of poor spreadsheet practice are the following:

• Chaotic design
• Embedded numbers
• Special cases
• Non-repeating structures
• Complex formulas

Chaotic design refers to a poorly structured physical layout of the formulas and data. Numbers embedded in formulas, while not necessarily direct causes of errors, are strongly correlated with the presence of other problems.
Special cases refers to designs in which similar results are calculated in slightly different ways, which requires great care in building and checking formulas. Non-repeating structures includes designs in which the formulas in a row or column change structure repeatedly, precluding the use of copying and pasting. In the hands of experts, complex formulas can be used to great effect. But in the hands of novices the same formulas can be error-prone.

Why is spreadsheet practice sometimes so poor? We cannot know for sure, but we did gather some anecdotal evidence during our interviews. When asked what kept them from building better spreadsheets, our developers typically cited one or more of the following reasons:

• Time pressure
• Organic design
• Changing specs
• Lack of testing
• Lack of relevant knowledge and skills

Time pressure was the most often cited reason. Many spreadsheets are built under great time pressure, which precludes use of some of the most effective methods for avoiding errors. Managers of spreadsheet developers should be aware of the effects of putting their employees under excessive time pressure. Another commonly cited factor was organic design: either the spreadsheet design was inherited from a predecessor spreadsheet, or it grew organically during the project without ever consciously being designed. Another complaint was changing specifications: if the designer had only known from the start what the spreadsheet was going to be used for, he or she could have designed it more appropriately. We also observed that very few of our developers used any formal approach to testing their spreadsheets; in fact, most of them did no testing as such. Finally, in some cases we could observe directly that the cause of an error was lack of relevant knowledge, either of the problem domain or of spreadsheet tools. It was remarkable, however, how rare this cause appeared to be.
Most developers could see quickly that a particular formula was an error, once we had pointed it out to them. Only very rarely did we have to explain to them why it was an error, or how to fix it.

Finally, we offer a comment on the importance to auditing, and to good spreadsheet usage in general, of good design. Our work makes us extremely conscious that a well-designed spreadsheet is simple, consistent, and general. Simplicity means a logical use of worksheets and a logical and intuitive layout of each individual sheet. Simplicity makes building and auditing easier. Consistency means, for example, that a single formula can be written and then copied down an entire row or across an entire column. Such a formula can easily be checked. Rows or columns in which the formulas change structure constantly often hide errors. Generality means that the spreadsheet is built to handle all of the likely combinations of inputs that users will want to use. The opposite is a workbook in which individual cases are calculated separately, which makes it difficult to keep inputs consistent across cases.

6. Summary and Future Research

We have audited 25 spreadsheets from five organizations. We identified cells or ranges that appeared to be problematic and discussed each one with the developer of the spreadsheet. For each issue that was classified as an actual error, we then identified the cell or cells affected and measured the absolute and percentage impact of correcting the error. Several conclusions emerge from this research:

• The quality of spreadsheet practices differs substantially among and within organizations.
• Some individuals and organizations are capable of developing essentially error-free spreadsheets.
• Many spreadsheets are built in ways that violate good design practices.
• Operational spreadsheets are highly complex and often poorly structured.
• Poor practices (such as hard-coding numbers in formulas) abound, but quantitative errors are relatively rare.
• The quantitative impact of many errors is negligible or zero, or occurs in unimportant cells.
• Devastating errors are rare.

None of these conclusions should be taken as proven by the current research. Rather, they are suggestive hypotheses that should be refined through further research.

References

Lawson, B., Baker, K., Powell, S., and Foster-Johnson, L. (2006). "Spreadsheet Experience and Expertise," submitted to Omega, November 15, 2006.

Panko, R. (2005). "What We Know About Spreadsheet Errors." http://panko.cba.hawaii.edu/ssr/Mypapers/whatknow.htm, accessed September 2, 2006.

Panko, R. and Halverson, R. (1996). "Spreadsheets on Trial: A Survey of Research on Spreadsheet Risks." Proceedings of the 29th Annual Hawaii International Conference on Systems Sciences, pp. 326-335.

Powell, S., Baker, K., and Lawson, B. (2006a). "A Critical Review of the Literature on Spreadsheet Errors." Spreadsheet Engineering Research Project working paper.

Powell, S., Baker, K., and Lawson, B. (2006b). "An Auditing Protocol for Spreadsheet Models." Spreadsheet Engineering Research Project working paper.

Powell, S., Baker, K., and Lawson, B. (2006c). "Errors in Operational Spreadsheets." Spreadsheet Engineering Research Project working paper.

Purser, M. and Chadwick, D. (2006). "Does An Awareness of Differing Types of Spreadsheet Errors Aid End-Users in Identifying Spreadsheet Errors?" Proceedings of the European Spreadsheet Risks Interest Group Annual Conference, Cambridge, UK, pp. 185-204.

Rajalingham, K., Chadwick, D., and Knight, B. (2000). "Classification of Spreadsheet Errors." Proceedings of the European Spreadsheet Risks Interest Group Annual Conference, Greenwich, UK, pp. 23-34.
Table 1. Summary of Audit Results

Workbook   # Issues   Max % Impact   Max Absolute Impact
1.2        50         28.8%          $32,105,400
1.3        18         137.5%         $110,543,305
2.1        19         3.6%           $13,909,000
2.2        27         16.0%          $74,000,000
2.4        30         416.5%         $10,650,000
2.5        40         NA             8.90%
3.1        19         5.3%           $238,720
3.3        11         15.6%          $4,930,000
4.1        27         116.7%         $13,355,445
4.2        8          141.8%         $272,000
4.5        79         39.1%          $216,806

(The remaining workbooks, 1.1, 1.4, 1.5, 2.3, 3.2, 3.4, 3.5, 4.3, 4.4, and 5.1-5.5, had no errors with measurable impact: NA. Totals across all 25 workbooks: 381 issues, 117 errors, 70 errors with non-zero impact.)

Figure 1. Distribution of Errors, by Absolute Impact (n = 117). [Histogram over the categories: % changes; $0; <$10,000; $10,000-$100,000; $100,000-$1,000,000; $1,000,000-$10,000,000; $10,000,000-$100,000,000; >$100,000,000.]

Figure 2. Distribution of Errors, by Percentage Impact (n = 117). [Histogram over the categories: NA; 0%; <5%; 5%-10%; 10%-20%; 20%-100%; >100%.]
