Considering Functional Spreadsheet Operator Usage Suggests the Value of Example Driven Modelling for Decision Support Systems

Most spreadsheet surveys both for reporting use and error focus on the practical application of the spreadsheet in a particular industry. Typically these studies will illustrate that a particular percentage of spreadsheets are used for optimisation a…

Authors: Simon Thorne, David Ball

Considering Functional Spreadsheet Operator Usage Suggests the Value of   Example Driven Modelling for Decision Support Systems
Considering functional spreadsheet operat or usage suggests the value of Example Driven Modelling for Decision Support Systems: Thorne & Ball Page 147 Considering functional spread sheet operator usage suggests the value of Example Driven Modelling for Decision Support Systems S. Thorne, D. Ba ll University of Wales Institute Cardiff Sthorne@uwic.ac.uk Dball@uwic.ac.uk Abstract Most spreadsheet surveys both for reporting use an d error focus on the prac tical application of t he spreadsheet in a particular industry. Typically these st udies will illustrate that a particular percentag e of spreadsheets are use d for optimisation an d a further perce ntage are used for ‘What if’ an alysis. Much less comm on is examini ng the classes of functi on, as defi ned by the vendor, used by m odellers to build their spreadsheet models. This alternative analysis allows furthe r insight into the programming n ature of spreadsheets and may assist resea rchers in targetin g partic ular structures in spreads heet software for further invest igation. Furt her, underst anding the f unctional m ake-up of s preadsheets allows e ffective evaluation of novel appr oaches fr om a progra mming poi nt of view. It all ows greater i nsight int o studies that report what spreadsheets are used for since it is explicit which functional structures are in use in spreadsheets. We concl ude that a deeper understa ndi ng of the use of operators a nd the operat or’s relationshi p to error w ould provi de fresh i nsight i nto the sp readsheet err or problem . Cons idering functional spreadsheet operator u sage suggests the value of Example Driven Modelling for Decision Support System s Electronic Spreadsheets and business Research has sho wn that the use of spreads heets in organi sations is pe rvasive, use d in many i ndustries and for ma ny purposes. This pervasive use of sp readsheets i s reflected in a gradual increas e in spreadsheet error research; this is partly due to the sl ow realisation that spreadsheets, as end user tools, are error prone and that sprea dsheets are us ed in strategic business app lications Most End User Develop ment (EUD) academic surv eys such as (Davies 1987, Jenne 1996 and Taylor et al. 1998) ra rely focus on s preadsheets. It is therefore difficult to precisely say who uses spreadsheets and for what purpos e. However, there are some surveys that provide data on sprea dsheet usage specifically. Pemberton and Robson ( 2000) surveyed 227 respond ents from a mix of Private and Public sector organisa tions in t he UK. They found that o ver 80% of re spondent s were sprea dsheet users at some level. The activities spreadsheets were used for ranged from data storing to Decision Support Systems (DSS) implem ented in spreadsheets. The Spreadsheet Engineering Research Programm e (SERP) is an international resea rch effort aim ed at surveying spre adsheet usage across a num ber of la rge organi zations. It is not cu rrently com plete but has already su rveyed over 1300 par ticip ants in a number of organizations in the United States and th e United Kingdom. SERP (2005) found that spreadsheets ar e used predomin antly for alternative modeling ( what if analy sis) wit h 98% of resp ondents in dicating so. T his was cl osely follo wed by 90% using spreadsheets for data an alysis. The most comm on technique used in these models is statistical analysis (49% ) followe d by o ptimizat ion technique s (31%). Interestingly the results presente d by SERP (2005) differ signi ficantly to Pembert on and R obson (2000). The SERP data s uggests that m ost spreadsheet s are used for “what if” analysis usi ng statistical techniques, c onversely Pem berton and Robson ( 2000) foun d that stati stical analy sis was the least popular use for spreadsheets, the most popular being data sorting. Further, Branc heau and Wet herbe (199 0) found t hat the adoptio n of sprea dsheet applicati ons by use rs started in 1981 with VisiCalc and grew significan tly until 19 87 when the survey was completed . In historical terms 1987 marked the introduction of GUIs and subsequently ex cel was introduced in t he same year. Taylor et al. (1998) surveyed 34 UK organisation s and found that 85% of those organisations used spreads heets as their primary m ethod for EUD. Howeve r, since this study was aimed at EUD and not sprea dsheets, there is no furt her detai l on type o f use. Considering functional spreadsheet operat or usage suggests the value of Example Driven Modelling for Decision Support Systems: Thorne & Ball Page 148 In addition, there are case studies and audit ex pe riences written by both academics and practition ers. Such investigat ions are us ually dri ven by som e fina ncial loss or realisation that t he practice they are partaking in carries some significant risk . Two su ch cases are Fernandez (2002) and Gosling (2003) who both surveyed spreadsh eet usage and policy in their respective organisation s. Both found that spreadsheets are used as low-level data stores and for calculation of trivial items such as expenses. They also discove red that s preadsheets ha d become part of the IT i nfrastructu re of the organi sations and that the business woul d be seriou sly affected if t hese spreadsheet applications failed. This clearly shows a strategic reliance on sprea dsheets for decision analysis. Croll (2005) reports on the usage of spreads heets in the fin ancial m arkets in London . Croll fo und that there is great reliance on spreadsheets fo r modelling as the below quote from one of th e participant responses highlights. “Excel is utterly pervasive. Nothing l arge (good or bad) happens without it passing at some time through Excel” (Croll, 2005) The evidence provid ed so far in this section su gges ts spreadsheets as the most comm only used end user tool. There is also eviden ce to suggest that spr eadsheets are used in a strate gic manner (Croll 2005, Fernandez 2002, Gosling 2003). Croll’s ev idence comes in the form of quotes from participants studied. Both Fernandez a nd Gosling are ve rtical cas e studies in tw o different organisat ions. Fer nandez investigates an intern ational private company and Gosling a large National Health Serv ice (NHS) trust. The aim of both st udies was to establish the use of sprea dsheets in each organisation and examine the implications of their use. Bo th Fernandez and Gosling found th at spreadsheets were pervasiv e in the organisations and that s preadsheets were used in a strate gic manner. In some cases spreadsheets were used to m ake decisions on how an enti re departm ent was run. Sp readsheets had al so become part of the Informat ion System s Architecture, rem oving dat a from t he corporate syst em, ma nipulating it and then re entering it into the Corporate system. Clearly this pr actice of merging validated and un-validated data is undermines the integrity of corpor ate systems. Spreadsheet Errors The first study into spreadsheet er ror was condu cted by Brown and Gould for IBM in 1987. This study took 9 expe rienced sprea dsheet devel opers and ex amined their perform ance when asked to create a number of s preadsheets f rom scratch. They found that 44% of the spreads heets developed contained errors such as mistyping formul ae. This study was condu cted because the au thors had noted that business sprea dsheet usage had boomed and that it had been suggeste d that spreads heets might co ntain errors. Since this original pape r, there have been many st udies in spreadsheet error and the statistics reported from these stu dies varies fr om 30% to 100% of m odels wit h errors. Table 1 depicts som e experim ental studies with relevant err or rates. Author and Year Percentage of models with errors Hicks and Pank o, 1995 91% Javrin and Morrison, 1996 84% KPMG, 1997 91% Panko and Halver son, 1997 80% Javrin and Morrison, 2000 95% Table 1 Spreadsheet error rates These statisti cs have led ot her researchers t o inves tigate spreads heet error in more depth, asking a variety of pertinent questions: what causes er rors, ho w many types are there, how can e rrors be reduced or removed. T hus far m ost spreadsheet e rror research st udies have co nsidered error from a gene ral point of vi ew, i.e. a f ocus on the number of models pr oduced wi th error o r the average number of e rrors per cell. Whilst this information is useful for estimating the extent of error in certain domains it do es not necessarily tell us why error co mes about or what causes the erro r. Research into taxonomies of error such as ( Panko and Hal verson 1998, Teo a nd Tan 19 99 and Ayal ew et al 2000, R ajalingham et al. Considering functional spreadsheet operat or usage suggests the value of Example Driven Modelling for Decision Support Systems: Thorne & Ball Page 149 2000, Rajali ngham 200 5) have all examined and defined e rror types i n varyi ng levels of detail. The se error types are not specific to the actual ap plication, rathe r they are ge neric. Limitations of current spreadsheet research General survey s of spreads heet error ha ve traditional ly focused o n the end user products, i .e. the studies are themed accordi ng to the final applicati on of the technology. They are often written from a management point of view, highlighting inadequacies in policy or pr actice and attributing these inadequacies to poor spreadsheet quality. Typical o utput would be the percentage of spreadsheets used in the accounting industry togeth er with the percent used for optimisation. Whilst this serves a purpo se, it does not s hed any ne w light on the nature of spread sheet error or what causes spreadsheets to be s o error prone. An alterna tive view of re porting s preadsheet e rror is to e xamine the p rogramm ing structures that spreadsheets ar e composed of. Pr ogramming structures i n spreadsheets consist of formulae constructed utilising bu ilt in vendor operators. A deeper understanding of the use of operators and the operators relationship to error would provide fresh insigh t into the spreadsheet error problem. Within spreadsheet software, there are a number of vendor defined classes of functio n, each funct ion contains various operators that relate to the cl ass they are a member of. Functional Classes in Spreadsheet Software As defined by t he vendor M icrosoft, t here are 11 classes of functi on offered with the stan dard Excel spreadsheet software. Excel is chosen since it is the m ost commonly used spreadsheet application according to Walchenbach (2005). Walchenbach stat es that Excel now a ccounts for 90% of the spreadsheet market, althoug h it is difficult to determin e the exact number of Excel users, in 1997 alone Microsoft s hipped ove r 70 mil lion copies o f Excel 97. These classes contain operators to be used in form ulae expre ssions and are grouped acc ording to t heir actual purpose. The 11 classes co ntain varyin g amounts of operators rangin g from 5 to 78 ope rators in a class, offerin g a grand tot al 343 uniq ue operator s. The 11 class groupings are shown in table 2. Class Name Number of operators Database 12 Date and Tim e 20 Financial 53 Engineering 39 Information 18 Logical 6 Look-up an d Reference 17 Math and Tri gonom etry 60 Statistical 78 Text 35 External linking 5 Table 2 Excel function classes Studies of function al usage As previously mentio ned, there have b een very few documented studies of operator function ality usage. The studies that do ex ist offer some insight but often lack detail or are a minor aspect of a larger study. Chan and Stor ey (1996) surveyed 256 analysts using Lotus 123 on the functionality of spreadsheets used, see figure 1. The survey wa s based upon a Likert scale (1 being never and 5 always).Th e participant s indicated h ow often they use a particul ar class of funct ion in thei r spreadsheet and that was rec orded on a Likert scale. For exampl e if they never use d the Goal See k function, t his woul d be recorded as a 1 on the Li kert scale. The ma in findings of this st udy, see figu re 1, show that m athematical and statistical functions are the most frequently used an d that goal seeking is the least used . However, since this study was conducted on Lo tus 123 users, the fun ctional classes are different to that of Excel. Unfortunately the vendor Lotus were unable to prov ide a detailed functiona lity listing for Lotus 1-2-3. Th e difference Considering functional spreadsheet operat or usage suggests the value of Example Driven Modelling for Decision Support Systems: Thorne & Ball Page 150 between Lotus and E xcel makes direct com parison difficult , e.g. som e operators i n Excel are not supported in Lo tus 1-2-3 an d vice-versa. Figure 1 Chan and Storey (1996) Fre quency of spreadsheet operator use Ballinger et al. (2003) presented spreads heet functional data collected from 259 Excel work books used to record student marks in a University. Figure 2 s hows the results of the survey, in this case the data shows how many oper ators of a function type were used, i.e. ther e were 751 logical operators in th e 259 workbooks. Figure 2 (Ballinger et al. , 2003) Frequency of sp readsheet oper ator use The data clearly shows that logical and m ath functio ns are used more extensively than ot her classes. Indeed, most classes were not used in the worksheet s which suggest that the sample may be biased due to the specific app lication. The results of this study concur with that of Chan and Storey (1996) t o som e extent. Both st udies identify that mathematical functions are used extens ively. However, it is uncle ar if Chan an d Storey (1996) include the L ogic operator s in their mathem atics class. Through Pri vate Comm unication wit h Barry Lawson of T he Tuck m anagem ent school D artmout h College in the US, further data regarding functiona l utilisation was obtained . The data was extracted Spreadsheet function usage 1 1.5 2 2.5 3 3.5 4 4.5 5 0 2 4 6 8 10 Functions Uptake (1 Never, 5 Always) Math/statistical functions Macros Graphics Data import/export What-I f Linking Financial functions Database functions Goal seeking Considering functional spreadsheet operat or usage suggests the value of Example Driven Modelling for Decision Support Systems: Thorne & Ball Page 151 from a ‘base of knowledge’ gathered at the college via the Spreadsheet Engi neeri ng Research Project (SERP). The data was based upon 35 randomly se lected spreadsheets that were submitted by the schools alum ni. The result s of the st udy are presente d in fig ure 3. Figu re 3 suggests t here is a disproportio nate am ount of Financia l and Sta tistical function usa ge, given t he other stu dies. One possible reaso n for this a pparent bi as is that t he data was e xtracted from the Tuck sc hools al umni who had all studi ed financi al managem ent in som e sense on predom inantly MBA programm es which traditionally contain fina ncial managem ent aspects. “I would only obs erve that because we are a business school, one might expect th at our contact s may be biased i n that directio n - hence perhaps a lar ger than ra ndom number of financial spreadsheets” Barry Laws on, SERP, Tuck Scho ol of Ma nageme nt (2006) Excel f unct i on usage by % 8.6% 31.4% 65.7% 11.4% 82.9% 31.4% 100.0% 60.0% 11.4% 0.0% 20.0% 40.0% 60.0% 80.0% 100.0% D a t a ba s e Da t e a nd Tim e Fi na n ci a l I n f o r m at io n Lo g ic a l Lo oku p & R e fer en ce M a th S t a t i s t i c al T e xt Figure 3 (SERP, 2006) Spreadshee t function use by percentage The results of t he data we re arrive d at by exam ining each of the 35 sprea dsheets and det ermini ng what functions each spreadsheet c ontained. F or exam ple the Math class was use d by 100% of spreads heets in the survey, i.e. all 35 of th e spreadsheets exam ined used one of more mat h functions. This does suggest some confusion in interpreting figure 3. For example if 65.7% of spreadsheets c ontain one example of a fi nancial op erator then we get the bar on the figu re as shown. On the other hand if t here are 100 exam ples of all 53 financial o perators in 6 5.7% of sprea dsheets then we would appare ntly get the same bar resulting. However, this study is more comprehensive than either of the previo us studies, nevertheless it does share so me commonality in the results. All three stud ies (SERP 2006, Ballinger 2003 and Chan and Storey 1996) identify that Math functions are used extensively in sp readsheets. Both SERP (2006) and Balling er (2003) identify that Logical functio ns are used extensively. If we consider the possibility that Chan and Storey interrupt logical operators as part of the math class, this further reinforc es this theory. For further i nformat ion regardin g SERPs re search visit http://mba.tuck.dartmouth .edu/spreadsheet/ind ex.html Conclusions of function al usage a nalysis From the dat a available one mi ght hypothe sise a researc h question: “Are t he majority of functi ons used in spreadsheets eith er of the math or logical cl ass?” Since the SERP (2006) data offers the most comprehensive data, it is the best indicator as to the proporti ons of fu nction classes i n spreadsheets. This data sugg ests that Math functions appe ar in near to 100% of sprea dsheets and that Logic functi ons appear in around 80 % of all spreadsh eets, however this does not indicate how many of each type occur. One possibility is that modellers may prefer to build their own bespo ke models via simpler operators Considering functional spreadsheet operat or usage suggests the value of Example Driven Modelling for Decision Support Systems: Thorne & Ball Page 152 rather than utilise some pre defined operator. Further, possibly a series of statements built with simpler operators may suggest a n increased us e of logical operator connectives. This theory is supporte d by Napi er (1989 an d 1992) wh o demonst rated that not only di d spreads heet modellers use very little of the functionality in spreadsheets but were largely u naware of much of the functionality on offer. If we apply th is theory to the rest of the data contained in this survey, it further supports the notion that sim ple sp readsheet operators are m ore usef ul than specific pre defined functions. Thi s may expl ain why Ba llinger et al. ha d a high num ber of Ma ths and L ogic functi ons and no others exce pt a minori ty using St atistical fu nctio ns. This suggests that the spreadsheet m odellers utilised simple tools to build complex models without utilisin g pre-defined structures. When conside ring n on progra mming err or reductio n methods, t he signi ficance of these finding s for strategy formulation maybe substantial. Considering spreadsh eets in terms of functionality may allow more accurate risk analysis which i n turn could allow m ore effective appl ication of controls to minimise such risks and improv e quality. Indeed, further work should be conducted to crucially examine which structures within the functionality cla sses are the most prone to error. This data could then allow easier identification of spreadsh eets that carry a higher risk since th ey are m ore error prone. This would t hen allow m ore inform ed decisions to be m ade in term s of risk m anagement , spreadsheet use and auditing . One novel approach to reducing spreadsheet errors optimises on the use of logical operators (Tho rne and Ball, 2 005) A novel approach spreadshee t error management We have considered a novel approach to spreadsh eet error management at UWIC (UK) c alled Example Driven Modelling (EDM). This requir es the user to produce exam ples of attribute classifications (see below) which then deduce th e functio n of those e xamples and ge neralises to new unsee n examples. EDM uses machi ne learning techniques a nd research t o date suggests this results in a more accurate spreadsheet. Machine learning, in the context of EDM, is b est described as the ability to adap t and extrapolate patt erns in data as define d by Russel and Norvig (2 003). In parti cular, Neural Networks can be used in exa mple att ribute classifications of data. For example, the use r provides sim ple examples of the problem data. This dat a is then fed into th e learning m achine an d it pro duces an e quivale nt model of the probl em. Thorne et al. (2004) discussed a n experim ent to te st the relative levels of accuracy gained from both tra ditionally modelling a form ulae and utilising an EDM approach, over successively more difficult problems. The results of this study f ound that producing the formulae with the tradition al method was error pro ne (80% of m odels wit h error). The results o f the EDM method y ielded a m uch lower error rate (2% of models with er ror). Example Driven Modelling The basic premise of E xample Driven Modelling is the concept that the user pr ovides exam ple data of the problem they wish to solve. The exam ple data co m prises of attribute classifications for the problem they wish to m odel. This exam ple data set is the n used to trai n a Neural Network an d hence gene ralise the problem, see Figure 4. Figure 4 Example Driven Modelling concept To demonstrat e how this pr ocess works i n practice, an example of EDM in execution will be presented. This example will extend from the construction of the data set to the performance results gained from the network aft er testi ng. The exam ple probl em is ta ken from an instructio n book on how to im plem ent Decision Support Systems (DSS) i n Excel (Gross et al. , 2006). The exampl e takes the f orm of a Cre dit Risk DSS that is entirely contained i n one spreads heet. User Produce Example data set Fed into Neural Network Learns General model Considering functional spreadsheet operat or usage suggests the value of Example Driven Modelling for Decision Support Systems: Thorne & Ball Page 153 Credit Risk Decisi on Support System This DSS is used to assess the credit worthiness of poten tial clients. The model classifies the applicants in to one of three possible classifications. Th es e are: Accept; Further En quire and Reject. These decisions are based upon equations that evaluate th e classification based upon a num ber of variable inputs. These variable inputs are used as key iden tifiers as to the businesses worthiness for credit. These variables include: Current Yea r ’s sales, previous debt balances, Net wo rth and a number of risk class indexes. Each classification has an ass ociated rule that it mu st pass to satisfy the cla ssification. For e xample, Rule 1 demands that Inpu t 1 (Previous debt balances) is less than or equal to 10% of Input 2 (Cur rent Year’s Sales). If all of the conditions are True, the applicant p asses on to the next rule for assessment. If the rule Fails, the applicant is given class 3 and classification equals reje ct. The rules increase in complexity for each class, they requ ire m ore variables with m ore complex relationships to be satisfied. All of the rules and variab le cond itions are presented in Figure 5. When using EDM the first task that must be completed is the gen eration of the example data set. This data set must cover all rules and all condition s that make up these rules. The simplest way to do this is to examine each rule i n turn and consider what valu es satisfy or re ject the rule’s classification. Once the paramet ers of the model have been de fined a n example data set can be con structed a round the true and false classifications of the m odel. The example data set appears as a set of values nee ded to satisfy or decline classifications in t he model. For example, if we examine Rule 3 (Classification 1) in Figure 4 it has the following con ditions to be true: To satisfy Class 1, it is necessary that Va riable 3 (Net worth) is greater than or equal to 50,000 AND Variable 4 (D&B Credit Index) is greater than or equal to 2 AND Variable 5 (D&B Paydex index) is greater than o r equal t o 70 AND V ariable 6 (D&B Stress cl ass index) i s equal to 1. Based upon that statement the user m ust construct a ttribute classifications that both sati sfy and reject that rule. For e xample, t he user coul d constr uct an example where the Net Worth val ue equals 66,000, satisfying the first part of the rule. They would also need an example whe re Net Worth fails the rule i.e. a value of belo w 50,000 which woul d reject t his part of t he rule a nd the classifi cation as a wh ole. This is of course onl y part of the rule , there are ot her clauses wh ich need to have the sam e treatme nt. The complete set would com prise of a case of exam ple values with the appropriate classifications that would be awa rded by t he model. Credit Risk C lassifi cation Fl ow Ch ar t Ye s No Ye s No No No No No Ye s Pa ss R ule 3 , C l ass = 1 Fa il Rule 3 , C la ss = 2 Is I n put 3 > = 500 000 AN D Inp u t 4 <= 2 AN D Inp ut 5 > = 70 AND Input 6 = 1 Fa il Rule 2 , C la ss = 2 Is In pu t 4 >= 1 OR I n p ut 5 > = 9 0 Fa il R ule 1 , Cla ss = 3 Is Inpu t 1 < = 1 0 % of Input 2 Figure 5 Example risk classificati on system, gener ated from Gross et al. (2006) R1 R3 R2 Considering functional spreadsheet operat or usage suggests the value of Example Driven Modelling for Decision Support Systems: Thorne & Ball Page 154 Once the example data set is constru cted these values are then fed into a neural network so t hat the network can l earn the di mensions of the pr oblem based upon the pa rameters provided in the data set. An excerpt from the example data set used in thi s problem can be seen i n Table 3, see ap pendix. CYS PDB NW D&B C D&B P D&B S C1 Coded Var / example No. Input 1 Input 2 Input 3 Input 4 Input 5 Input 6 Classification 1 11000 500 50000 4 15 3 2 2 10000 2000 45000 3 20 3 1 3 27000 1000 75000 2 70 2 2 4 30000 2000 85000 1 85 2 2 5 45000 5000 69000 2 92 2 2 6 31000 500 77000 1 96 2 2 7 180000 200 0 500000 2 79 1 3 8 210000 460 45000 0 1 72 1 2 9 100000 500 0 505000 3 76 1 2 10 600000 20 00 700000 1 69 1 2 Table 3 Example Training Set Training, testing an d blind se ts in Neur al Networks During training and testing there ar e three fundamental components that allow the user to train and test a network accurately. The training set is used by th e network as the source to learn the given problem . A training set, consisting of examples of input data for which the outpu t is known, is presented to the network. The test set consi sts of exam ples not use d in traini ng but avai lable to the networ k during training for cross validation. Th e blind set consists of examples that are completely un seen to the network and used in pa rt to de termine overall accuracy of t he network give n the universe of possibilities. The universe contains all po ssible training, test and blind sets. Results Once the mode l has learnt t he proble m, it out puts the Mea n Squared E rror (MSE) of the n etwork (Chi squared). This value indicates ho w well the model has learnt the task and hence how well it will perform in t esting. MSE i ndicate s the differe nce betw een t he training set and the act ual output, i.e. a comparison bet ween the netw ork output s and the kn own outputs. Blind testing is the best ab solute test, i.e. passing unseen data through th e network and checking th e classifications i t gives on that basis. The trai ned networ k is given new exam ples and is assessed o n how well it classifies those examples. Below in Figure 6 the results of the blind testing are displayed. In this blind test 25 unseen exam ples were passe d throug h the trained network. T he network values were t hen compared with the actual classifications that woul d be outp ut given t he input pattern. As can be observed, the network in t his instance ga ve a hig h level of accuracy with no misclassifications. There are some values that diverge from the actual classi fications but these value s fall within the accepta ble class value range that stop it from being misclassified. Considering functional spreadsheet operat or usage suggests the value of Example Driven Modelling for Decision Support Systems: Thorne & Ball Page 155 Test i ng dat a out put 0 0. 5 1 1. 5 2 2. 5 3 3. 5 0 5 10 15 20 25 E x a m pl e num be r C la s s ific a tio n De s ir ed Ac t u al Figure 6 Blind testing results Conclusions The limitations of current spre adsheet research su ggest t hat a deeper understanding of the use of operators a nd the ope rators re lationshi p to error woul d provide fresh insight in to the spreadsheet error problem . The proposed method (E DM) perf orms well i n business pro blems that are based up on logic in decisi on making. Beyond this domain, EDMs usefu lness diminishes; it really requires some sort of decision process that incl udes a final c onclusion. In the presented e xample, the accur acy of the network was m ore than satisfactory, judging from the blind testing results, this model is un likely to misclassify any examples presented to it. Research shows (Thorne et al., 2004) that althoug h the process of generating examples is a very novel approach for users, it is actua lly easier than creating the equi valent formul ae by program ming a spreadsheet. A weakness of EDM is t hat the model must be understood completely for the technique to be implemented effectively. However, in compensation, generating typical examples in norma lly a very easy task. Further research is require d but ( as pr esented ab ove) EDM a ppears to be more reliabl e than convention al spreadsheet methods particularly for th e spreadsheet application subset: de cision supp ort systems. Where domain knowledge is less sure the valu e of EDM maybe uncertain. Domain knowledge is often the Achilles h eel of automated spreadsheet tools. There may well be othe r challenge s facing users of such a novel appr oach. In particul ar Base Er ror Rate (BER) c ould be a seri ous problem for users wishing to adopt this t echnique. The research shows that BER is most prevalent when the user is co mpleting simple tasks in large vo lume. An example often quot ed is copy ty ping, where t he subject has t o copy l arge am ounts of text . This process y ields a certain BER since it is repetitive and contains signif icant volume. Since the process of modelling using Acceptable class value range Considering functional spreadsheet operat or usage suggests the value of Example Driven Modelling for Decision Support Systems: Thorne & Ball Page 156 EDM requires the user to create repetitive data sets based upon the rules of a problem, EDM m ay be at risk from a significant BER level. This paper has demonstrated that a novel a pproach that has an appreci ation for Hum an Factors can be executed successfully to replace a D SS implemented in spreads heets. APPENDIX Learning in Neural Networks There are a nu mber of int roductory t exts on Ne ural Netwo rks (NN), ( Haykin 199 9, Rumel hart et al. 1988, Principe et al. 1999). Briefly the proced ure is as follows. A processin g unit takes a number of input signals, x1 ,...,xn with cor responding weig ht s w1...,wn, respectively. These values are passed through the network t o give an output w hich is the n com pared to the trai ning set provided by the use r. The network then adjusts the weights in an attempt to mi mic the input/outp ut pattern of the training set. This process is repeated until the network reaches som e predeter m ined level of accuracy. This allows the network to become m ore and more accurate and hence the network learns the proble m. The neuron will only be fired if the threshold function (T) is satisfied and is governed by this equation: X1W1 + X2W2 + … + XnWn > T There are m any diffe rent par adigms and algorithm s for lea rning in ne ural netw orks. The most comm on is Backpropogation (supervised learning) since this offers the greatest generality (Haykin, 19 99). The process of su pervised lear ning follo ws the subseq uent seque nce: 1. A training set, consisting of examples of input data for which th e output is known, is presented to the network. 2. The network weigh ts are adjusted u ntil the netw ork produces results that are in agreement with the training set. 3. The network c an then ge neralise to unsee n exam ples in the universe In addition to the backpropogation rule, ge netic Optimisation is us ed in the inpu t space for this example. GO is used in conjunction with NN to optimise the “Input space” of th e problem. This has proved part icularly useful w hen there is a limi ted amount of data a vailable for t raining (Chang an d Lippmann, 1990). The Genetic Algorithm seeks the best combination of inpu ts, i.e. the combination that gives the best accuracy. Considering functional spreadsheet operat or usage suggests the value of Example Driven Modelling for Decision Support Systems: Thorne & Ball Page 157 References Ayalew, Y., M. Clermont, R. Mittermeir. (2000), ‘Detecting Errors in Spreadsheets’ , Proceedings of EuSpRIG Sym posium , EuSpRIG 200 0 Symposi um, Spreadsheet Risks—the Hi dden Co rporate Gamble. (pp. 65-76). G reenwich, Engl and: Greenwich University Press. Ballinger. D ,Biddle. R ,Noble . P, (2003) , ‘Spreadsheet structure inspec tion using low level access and visualisation’ , Proceedings of the Fourth Australian user i nterface confere nce on User interfaces 2003, p.91-94, February 01, 200 3, Adelaide, Australia Brancheau. J and Wetherbe. J, (1990), ‘T he adopti on of spre adsheet softw are: Testing Innovatio n Diffusion Theory in the cont ext of End User Developme nt’ , Inform ation systems researc h, 1 (2), pp 115-142 Brown. P and Gould. J, (1989), ‘An experi mental study of pe ople creating spreads heets’ , ACM transactions o n inform ation syst ems, 5 (3 ), pp253-2 72 Chan and Storey (1996), ‘The use of spreadsheets in organisations: de terminates and conseq uences’ , Informat ion and Ma nagement, 3 1, pp 119 -134 Chang, E.I. a nd Lippm ann, R. P. ( 1990), ‘Using Genetic Algorithms to improve pattern classification performance’ , NIPS 1990, pp 797-803. Croll. G, (2005) , ‘The importance and criticality o f spreadsheets in the city of London ’ , Proceedings of EUSPRIG 2005 – Man aging spreadsh eets in the ligh t of Sarbanes- Oxley, London, UK, pp 8 2-94, ISBN 1-902724-16-X Davis. B, (1987), ‘Commentary on Infor mation systems’, Accounting horizons, 43 Fernandez, K. (2003), ‘In vestigation a nd Managem ent of E nd User Com puting Ri sk’, Not pu blished, MSc thesis April 2003. Available from University of Wales Institute Card iff (UWIC) Business School Gosling. C, (2003), ‘To what ext ent are syst ems desig n and devel opment techni ques use d in the production of no n clinical corporate sp readsheets at a large NHS trust’, Not published, MBA thesis May 2003, available from: University o f Wale s Institute Cardiff (U WIC ) Business School. Gross. D, Akaiwa. F, Nor dquist. K, (2006 ), ‘Succeeding in business with Microsoft Office: Excel 2003 a problem solving approach’ , Thompson Course Tec hnology, C anada, IS BN 0-619- 267-40- 2, Page 254 Haykin. S, (1999), ‘Neural Networks a com prehensive found ation’ , Prentice Hall publishers, 2 nd Edition, New York, ISBN 0-13-9083 85-5 Hicks and Pank o, (1995), ‘Cap ital Budgeting Spr eadsheet Code Inspection at NYNEX’, Internet http://panko.cba.hawaii.edu/ssr/Hick s/HICKS.HTM , 12.1.05, 12.00, Available. Janvrin. D and Mo rrison. J, (1996), ‘Factors I nfluencing Risks and Outcomes in End-User Development’ Proceedings of the Twe nty-Ninth Ha waii International C onference on Systems Sciences, Vol. II, Hawaii, IE EE Computer Society Press, pp. 346-355. Janvrin. D and Morrison . J, (2000), ‘Using a structur ed design approach to reduce risks in End User Spreadsheet devel opment ’, Informati on & management, 37 , pp 1-12 Jenne. S, (1996), Audits of End User Computin g’ , Internal Auditor, 53 (6), pp 30-35 KPMG, (1997), ‘Supportin g the Decision Maker - A Guide to th e Value of Business Modeling’ press release, July 30, 1997. http://www.kpmg.co.uk/uk /services/manage/press/970605a.html , Unavailable Considering functional spreadsheet operat or usage suggests the value of Example Driven Modelling for Decision Support Systems: Thorne & Ball Page 158 Lawson. B, ( 2006), Privat e Comm unication wi th Barry La wson of the T uck Managem ent School of Dartmouth C ollege in t he US, as Figu re 3. Napier. A. Batsel l. R. Lane. D. Guadagno. N., (1992), ‘Knowl edge of comma nd usage in a spreadshee t program’, Databa se, 23 (1), pp 13-21. Napier. A. Lane. D. Batsell. R . Guadango. N., (1 989) , ‘Impact of restricte d natural language inte rface on ease of lear ning and pr oductivi ty’, Comm unications of the AC M, 32 (1 0), pp 11 90-1198 Panko. R and Halverson . R, (1997), ‘ Are Two Heads Better th an One? (At Reducing Errors in Spreadsheet Modelling?’ Office Systems Research Journal, 15 (1), pp. 21-32. Pemberton. J a nd Robs on. A, (20 00), ‘Sprea dsheets in business’ , Industrial managem ent and systems, 100 (8), pp 37 9-388 Principe. J.C., Euliano. N. R., Lefebvre. W.C., (1999). ‘Neural and adaptive systems: Fundam entals through simulation ’ , John Wiley and Sons, New Y ork, ISBN 0-471-351 67-9 Rajalingham K.; Chadwick, D.; Knig h t, B. & Edward s, D., (2000), ‘Quality Control in Spread sheets: A Software E ngineering- Based Approac h to Spreads heet Devel opment’ , Proceedi ngs of the 33rd Hawaii International Conference on System Sciences, Maui, Hawaii. Rajalingham. K., (2005 ), ‘A revised classification of spreadsheet errors’ , Proceedings of EUSPRIG 2005 – Ma naging sprea dsheets in the li ght of Sarba nes-Oxley, Lo ndon, UK, pp 185-200, ISBN ? Rumelhart. D.E., McCe lland. J.L., (1988), ‘Parallel distributed Processing’ , T he MIT Press, Cambridge Massachusetts, ISBN 0262-1 8120-7 Russel. S and Norvig. P, (2003) , ‘Artificial Intelligence – A Modern Appro ach’ , 2nd Edition, Pearson education inc., New Jersey, ISBN 0-13-080302-2 SERP, (2005), ‘Spreadsheet Engi neering Res earch Proj ect’ , Tuck B usiness Sch ool, Dartm outh College, USA Taylor. M, Moynihan. P, Wood-Harper. T, (1 998), ‘End User Computing and information systems methodolo gies’, In format ion system s Journal, 8, pp 85- 96 Teo.T and Tan. M, (1999), ‘Sp readsheet development and ‘wh at if’ analysis, quantitative versu s qualitative error’ , Accounting managem ent and Inform ation Technol ogy, 9 (3), p p 141-160 Thorne. S and Ball. D, (2005), ‘The relevance of Hu man Factors t o the mana gement of s preadsheet risks’ , INFORMS 2005, Sa n Francisco Thorne. S. Bal l. D. Laws on. Z., (2 004), ‘A novel appr oach to s preadsheet fo rmulae prod uction and overconfidenc e measurement to reduce risk in sp readsheet modelling’, Proceedi ngs of EUSPRIG 2004 – Risk red uction in End User Computi ng, Klagen furt, pp 71-85, ISB N 1 90272 4 94 1

Original Paper

Loading high-quality paper...

Comments & Academic Discussion

Loading comments...

Leave a Comment