Spreadsheet Components For All

Spreadsheet Co mponents for All Jocelyn Paine Proce edings of EuSp RIG 2008 Confere nce "In P ursuit of Spreadsheet Excelle nce" ISBN : 9 78-905617-69-2 Copyright © 20 08 Europ ean Spreadsheet Risks I nterest Group ( www.eusprig.or g ) and Author 109 S preadsheet Compon ent s For A ll Jocelyn Paine www.j-paine.org/ and www .spreadsheet-parts.org/ popx@j-paine.org ABSTRACT We have pro totyped a " spreadsheet co mponent rep ository" Web site, f rom which users can co py "compo nents" into their own Excel o r Google spreadsheets . Compo nents are collections of cells contain ing formu lae: in rea l life, they wou ld d o u seful calc ulations that many prac titioners fin d hard to program, a nd would be rigorou sly tested and documen ted. Crucially, the use r can tell the repository which cells in their spreadsheet to use for a compon ent's in puts and outputs. The repository will then reshape the comp onent to fit. A sing le compo nent can therefore be used in many d ifferent sizes and shapes o f spreadsheet. We ho pe to set u p a spread sheet equiva lent of the high-qua lity numerical sub routine libraries tha t revolutionised scien tific computin g, but where instead o f subroutines, the library contain s such compon ents. NOTE You ca n tr y a demonstration o f the r epositor y for Google Spreadsheets via the W eb form at [Paine 2008b] , www.j-paine.org/cgi- bin/google/filter_ form.py . It is e xplained a t [Pa ine 2008 a], www.j- paine.org/do bbs/spreadlets.ht ml . 1. INTRODUCTION In everyday usage, a " component" is a part, usually manufactured. But the word often carries another implication. To an electronics engineer, a component is something such as a r esistor, t ransistor or integrated circuit t hat you select from a catalogue. T he catalogue documents t he component's specification precisely enough that you are in no doubt abou t how to use it. If, having bought it, you find it fails to meet the spec, you can demand a replacement: in other words, t he component comes with a guarantee. Should you need more components of a similar kind, you can buy them as readily as you bought the first: you're entitled to presume that if a supplier stocks a 5.6KΩ resistor, you'll have no t roub le obtaining other standard values su ch as 4.7KΩ an d 6.8KΩ. Finally , a component is manufactured via pr ocesse s you probab ly can't invoke for yourself, usually involving extreme temperatures, exotically toxic metalloids, and abstruse quantum-physical calculations. This, mutatis mutandis , i s a spreadsheet component. Something t hat is to become part of your spreadsheet — i.e. a group of cells; that performs a calculation you neither know nor care how to program for yourself; that is comprehensively and comprehensibly documented; that has been t ested so rigorously that the supplier feels safe i n offering a no-quibble support agreement; and that admits all re asonable variations. Meaning that if its job is (say) to rem ove duplicates from an input range, you're enti tled to presume that if it can accept rows as inputs, it can also accept colum ns as inputs, and probably any rectangle of cells on any sheet. In the rest of th is paper, I de scribe a prototype Web site for delivering such components to Google Spreadshee ts [Goog le b] and Excel: a " spreadsheet parts r epository ". For Excel, i t sends components as files representing groups of cells that c an, with a suitable Spreadsheet Co mponents for All Jocelyn Paine Proce edings of EuSp RIG 2008 Confere nce "In P ursuit of Spreadsheet Excelle nce" ISBN : 9 78-905617-69-2 Copyright © 20 08 Europ ean Spreadsheet Risks I nterest Group ( www.eusprig.or g ) and Author 110 VBA (Visual Basic f or Appl ications) program, be c opied into your Excel spreadsheet. For Google Spreadsheets, you identify your spr eadsheet by filling in your Google Docs details and the spreadsheet name in a f orm . The repository will then copy t he cell groups directly into that spreadshe e t. You can try th is via the form a t [Paine 2008b]. Crucial is t hat although the repository sends components as groups of cells, it does not store them as such groups. Instead, it stor es them a s "templates", im plemented in the Excelsior s preadsheet- generator language I desc ribe in Sections 2.2 and 2.3. From t hese, it can generate many differently- shaped cell groups. This means that you can tell the repository which cells in your spreadsheet to use as a component's inputs and outputs. It will then apply this info rmation to the appropr iate template, in ef fect reshaping it to fit. As far as I know, t he Web-based repository I describe in this paper i s the first such system. T here are many other programs and utilities for generating Excel files, but none appear to have been use d to deliver components in the way I propose. Probably the most important task r emaining is to discover what components a re most likely to benefit Excel users; and then, of course, t o i mplem e nt them. Then, I hope, the repository will do for spreadsheeting what high-qual ity numerical subroutine libraries such a s the Numerical Algorithms Group [NAG] di d for scientific co m puting. I also hope i t will be possible to find permanent funding so that the repository w ill be free for a ll users. 1.1 Content of this paper The next section gives background on the nature of components. Section 3 describes how the proto type Web site was implem ented. Section 4 is a brief evaluation. Section 5 proposes future line s of study and implementation. Section 6 is a brief conclusion. Section 7 lists re ference s and links. Th e Appendices contain technical material that may not interest all r eaders: examples of Excelsior for specifying and documenting components. Appendix 1 lists the Excelsior source of the component described in Section 2. Appendix 2 is a longer e xample showing how an unusual s preadshee t, one t ha t generates science-fiction plots, can be specified and documented. The point is to show Excelsior-style documentation at work on a task that al most all readers will find unfamiliar. If the documentation makes i t clear to you, the appendix will have hel ped to show that Excelsior is a viable documentation tool. Appendix 3 e xplains just enough of the Excelsior languag e for you to mak e sense of Appendices 1 and 2. 2. BACKGROUND In this se ction, I expl ain what I mean by " spreadsheet component" (Section 2.1); how they would be used (Sectio n 2.2); and how the repos itory stores them (Sec tion 2.3). 2.1 What do I mean by "s preadsheet component"? In the I ntroduction, I ta lke d i n very general terms about co m ponents. It's now time to examine the idea in more detail, and I shall describe what a typical component might look like, why it's the kind of thing wort h making a vailable, and why user s might find i t h ar d Spreadsheet Co mponents for All Jocelyn Paine Proce edings of EuSp RIG 2008 Confere nce "In P ursuit of Spreadsheet Excelle nce" ISBN : 9 78-905617-69-2 Copyright © 20 08 Europ ean Spreadsheet Risks I nterest Group ( www.eusprig.or g ) and Author 111 to program for themselves. As my example, I' l l use a component for filtering elem ents out of tables, which I needed recently for creating dropdow n menus. Last year, I worked on a spreadsheet for modelling the finances of urban regeneration projects. T hese projects e ntailed building h ouses and flats, and users had to be abl e to input lists of dwelling t ypes. Later on, the spreadshee t had to generate dropdown menus from t hese l ists. Now, Excel allows a dropdown menu t o take its options from a cell range. So, for instance, you can put in cell C7 of a sheet, a dropdown that takes its options from cells A1:A3 of t he sheet. If t hese happen t o contain t he strings "House", " Flat" and "Bungalow", then the dropdown will display "House", "Flat" and "Bungalow" as options. We tried doing this with ou r spreadsheet, by associat ing the category names table with the dropdown. The problem was that because of how the tables were organised, some ce lls in the table were blank. This meant some of the options on t he menu also became bl ank , which looked ugly a nd confused users. So we needed a way to remove these bl anks , copying the remaining n on- bla nk cells into a new table from which t he dropdown would take its options. The spreadsheet below i llustrates almost t his, except that to show the idea generalised to other patterns, it actually removes cells that don't start with X . Should you want to t ry it yourself, it is linked from [Paine 2007c]. This also links to the original of the component documentation displayed in Appendix 1. This is an ex ample of the f ilter part that remov es all strings that do not mat ch a specified p attern. In this examp le, the patte rn is X* elements to search the index matching elements Not X 2 X X 5 X2 Not X 10 X4 Not X 11 X5 X2 -1 Not X -1 Not X -1 Not X -1 -1 X4 -1 X5 -1 Not X -1 Not X -1 The fir st t wo rows are text. T he rest of column A is a is a t able from which we want to copy all cells starting with X to c olum n C, closing up gaps. In Excel, you can f ind such cells by a f ormula suc h as match( "X*", A3:A15, 0 ), the st ar be ing a wildc ard. This i s why the text i n the first row talks about X*. Colum n B is a working table. The first cell of the table, B3, is the offset of the first cell i n colum n A st arting with an X , 2. (I.e. cell A4.) Spreadsheet Co mponents for All Jocelyn Paine Proce edings of EuSp RIG 2008 Confere nce "In P ursuit of Spreadsheet Excelle nce" ISBN : 9 78-905617-69-2 Copyright © 20 08 Europ ean Spreadsheet Risks I nterest Group ( www.eusprig.or g ) and Author 112 The second cell of the table, B4, i s the offset of the first cell starting wit h an X, 5. ( I.e. cell A7.) The third cell of the ta ble, B5, i s the offset of the third cell starting with an X, 10. (I.e. cell A12.) And so on. The -1's indicate that there are no m ore such cells. Now, it is possible to program this in Excel, without ha ving to drop into V BA — whi ch for various reasons, we didn't want to do. However, it is not easy. In the ne xt paragraph, I'll explain a little why, so you have one example of why some components are hard to do in Excel. Technically speakin g, the natural w ay to pro gram this is t o use a technique called recursion, as so: Dear Excel, this is how to remov e blank cells from location i of table In onwards, put ting the non-blank ones into table O ut at location o onwa r ds: 1. If the cell at location i of In is blank, remov e blank cells from location i+1 of table In onwards, putting th e non-blank ones into tab le Out at location o onwar ds. 2. If the cell at location i of In is not blank, copy it to location o of table Out. Then remov e blank cells from location i+1 of table I n onwards, putting the non- blank ones into table Out at location o+1 onwards. 3. If there are no m ore cells at or after location i o f In, stop. This is a terribly familiar style of programm ing if you a re a computer scientist: recursion is t hen something you'll have been taught in your first term, if not first week. However, if you're an Excel developer (unless perhap s you learnt programm in g in some other language first), it probably isn't. Recursion is normally used when defining functions; but Excel doesn't let you define functions (except in VBA). Ev en if it di d, it' s not good at data structures, and probably wouldn't have a no tation for talk ing about tables and tabl e segments in the way needed to translate the above rule s. 2.2 How are spreadshee t components used? I've explai ned why something might be worth regarding as a c om ponent. How should it be used? If you inspect the f orm ulae i n the Web copy of t he spreadsheet above, linked from [Paine 2007c], you will see that t here is nothing special about where the three table s are. W e could wri te a similar spreadsheet with the "elements to search" ta ble in column E instead of A, or column AJ , or even arranged along a row. We could put the "matching elements" table in c olum n X , or r ow 19 of this s heet, or row 6 of any other sheet. Whatever the tables' relative positions and orientations, the formulae follow t he same scheme. Similarly, they f ollow th e same scheme no matter how large the "elements to s earch" table. What we need, therefore, is a way to encode t his scheme in a way ind ependent of t he tables' sizes, orientations, and relative positions; and then to enable a user t o take t his encoding. And we need a way for a user to t ell this encoding where in their spreadsheet the "elements to search" table (i.e. the inputs) is, and where the "matching elements" tabl e (i.e. the outputs) is. Also to tell it whether the tables run horizontally or vertically, a nd how big they are. We can then substitute these values in to the schem e, and obtain Spreadsheet Co mponents for All Jocelyn Paine Proce edings of EuSp RIG 2008 Confere nce "In P ursuit of Spreadsheet Excelle nce" ISBN : 9 78-905617-69-2 Copyright © 20 08 Europ ean Spreadsheet Risks I nterest Group ( www.eusprig.or g ) and Author 113 formulae appropriate to their spreadsheet. All we then need is t o make thes e formulae available as an XLS file, or something else the user can copy into thei r spreadsheet. This is what the Excelsior sp readshee t generator does. I described it in earlier papers [Paine 2005; Paine, Tek, Williamson 2006; Paine 2007a]; and explain briefly in Appendix 3 i n c ase you want to read the code examples in Appendices 1 and 2. Here, therefore, I shall just exp lain the basic idea. 2.3 How are spreadshee t components implemen ted? Excelsior in a nutsh ell Suppose we have the equation: t[i] = u[i] * 2 Suppose also that t and u are both groups of 5 cells. Let t be cells A1:A5, and u be cells C1:G1. Assume the square brackets signify "array indexing" or "array subscripting". (If you've pr ogramm e d in any conventional languages, you' ll be f amiliar w ith this.) The idea is that the index i denotes an offset from the first cell in the table being i ndexed. Thus, t[1] is cell A1, t [2] is cell A2, t [5] is cell A5. Similarly, u[1] is cell C1, u[2] is ce ll C2, u[5] is cell G1. Then we can regard t[i] = u[i] * 2 as shorthand for the equations t[1] = u[1] * 2 ... t[5] = u[5] * 2 We can then interpret the in dices in each equation a s in the previous paragraph. This gives us the formulae: A1 = C1 * 2 ... A5 = G1 * 2 This is how Excelsior works. Had I said that t was c ells P77:P81, the pr ocess would have worked j ust as well; and it would have worked just as well if t and u held 9 or 12 7 elements instead of 5. This gives us the "way to encode this scheme in a way indepe nden t of tables' sizes, positions a nd orientations" that Sect ion 2.2 asked for. Components are encoded as Excelsior programs. Generating a group of cells from a component j ust entails inserting the user- wanted tables' sizes, positions and orientations i nto the program, running Excelsior over it, and sending the resulting spr eadshee t to the user to copy into their own spreadsheet. Spreadsheet Co mponents for All Jocelyn Paine Proce edings of EuSp RIG 2008 Confere nce "In P ursuit of Spreadsheet Excelle nce" ISBN : 9 78-905617-69-2 Copyright © 20 08 Europ ean Spreadsheet Risks I nterest Group ( www.eusprig.or g ) and Author 114 3. IMPLEMENTATION Because the repository is a Web site, we had to design some Web pages a s an interface. These are explained next, in Sec tion 3.1. We also had to write code on the Web server to accept i nputs from the com ponent-customisation page, run Excelsior, and send back a link to the resulting downloadable file. T his is explained in Section 3.2. These sections deal with t he version for Excel. Section 3.3 briefly describe s how the version for Google differs. 3.1 The Web pages We designed f our pa ges : a welc om e page; the reposito ry contents page; a page via which the user can request a com p onent; and a download pag e. 3.1.1 The introduction pa ge This page welcomes t he user, and explai ns wha t the repository can do for them. The underlined "contents list " in the second paragraph is a l ink: Barking Dog Spreadshee t Parts Repository: Welcome You are a sp readsheet develop er. You are the Ha rry Potter of modelling: you kno w precisel y how to model your compan y's business into Excel. But man y Excel tas ks require spec ialised knowledge . Some, such as statis tical calculation s, go badl y wrong if you progra m them by a metho d that looks right b ut doesn't co nsider things suc h as numerical acc uracy, rounding errors, a nd division by zero. Others, such as creatin g lists of drop down menus options fro m bigger lists, are j ust plain hard. We provide a Web-based libra ry of parts t o carry out such ta sks! Just look up your needs in our c ontents list, and choo se which part you want. T hen, using a Web form, tell our server where in your sprea dsheet the par t should take its input s from, and wh ere it should put its res ults. WE DO THE REST ! Our server will A UTOMATICAL LY reshape the part to fit your spreadsheet . It will then send it as a link whic h you can cop y into your spreadsheet usi ng a small VB A (Visual Basic for Applications) pr ogram. Should you be unsure about h ow a part works o r how to use it, full documentatio n is available. All parts come with an example spread sheet demonstratin g usage, which you can do wnload and try. And all parts come with fully cross-referenced documentatio n pages, which li ke the example spreadsheets, ar e linked from a p art's contents entr y. Dougal the barking dog says WHY KEEP A DO G AND BARK Y OURSELF!? ! 3.1.2 The contents page This is t he catalogue of components, linked from t he welcome page. In our protot ype , there is one component, described after the single bullet point. A real- life repository would have many more, and would provide tools for searching the catalogue. The component is the one I explained in Section 2; the u nderlined "here"' s are links: Spreadsheet Co mponents for All Jocelyn Paine Proce edings of EuSp RIG 2008 Confere nce "In P ursuit of Spreadsheet Excelle nce" ISBN : 9 78-905617-69-2 Copyright © 20 08 Europ ean Spreadsheet Risks I nterest Group ( www.eusprig.or g ) and Author 115 Spreadsheet Parts Repos itory: Contents • Filter: re m oves all stri ngs not matchin g a pattern. This part takes an input table of strings. It re moves all strings that do not match a specified patter n, placing the o thers in the output table . Please click here to custo mise and do wnload the part. Please click here for an e xample spreadsheet. Please click here to read the source co de and explanator y commentary. 3.1.3 The form-submissio n page This page contains a form where the user customizes the component, and is linked from the contents page above. Customisation involves telling the repository: where the component should take i ts i nputs (t he "elements to search" table of Section 2.1) from; where it should p lace its outputs (the "matching elements" table of Section 2. 1); where the working table, "the index" in S ection 2.1, should g o; and the pattern to search f or. Spreadsheet Parts Repos itory: Filter, Remove Non -matches This part takes an input table of strings. It re moves all strings that do not match a specified patter n, placing the o thers in the output table . To get a co py of the part customised to your own spread sheet, please fill i n the fields belo w and click the Sub mit button. Pattern to m atch Input Sheet: First input cell (to p left) Final input cell (bottom right) Output Sheet: First output cell (top left) Final output cell ( bottom right) Working Sheet: Su bm i t 3.1.4 The "download cust omised component" page Spreadsheet Co mponents for All Jocelyn Paine Proce edings of EuSp RIG 2008 Confere nce "In P ursuit of Spreadsheet Excelle nce" ISBN : 9 78-905617-69-2 Copyright © 20 08 Europ ean Spreadsheet Risks I nterest Group ( www.eusprig.or g ) and Author 116 The repository g enerates this page dynam ically when the com ponent is ready. I t just contains a link to a tempora ry file generated by the server: Spreadsheet Parts Repos itory: Download Your co mponent is read y . You can do wnload it from here. 3.2 The server code I implemented the system on my Internet Service Provider's (Mythic Beasts) machine, a fairly standard Unix box. Excelsior is written in SWI- Prolog [SWI], and fortunately, there is an SWI implementation for that machine and operating system. It can easily be c alled from Web scripting languages such as PHP, Perl and Python. Given these facts, implementation was s traightfoward, using te chniques familiar to any Web implementor. It worked as follows: 1. The user fills in the f ields on the form described in the prev ious section and subm it s it. The Web browser then sends th e fields' v alues to the Web server. 2. A s cript on the server, written in PHP, extracts the fields' values. From these, by some simple string manipulation, it generates a scratch file c ontaining Excelsior statements defining the component' s user-specifiable co nstant s and cell locations. For example, the pattern t o match would become a consta nt declaration such as constant pattern = "X*" . It appends this to t he main Excelsior source file for the component. For the "Filter, Remove Non -m atches" com pone nt, this is the file shown in Appendix 1. It then invokes Excelsior to compile this file. 3. By followin g t he process described in Section 2.3, Exc elsior generates a scratch text file containing the corresponding Excel formulae. 4. The server script em its a download page containing a link to this file. 5. The user downloads the file, and i nvok es a VBA macro to read the formulae from it into the appropriate cells of their spreadsheet. 3.3 The Google version This works similarly. The key difference is that the user nominates the Google Spreadsheet t o be updated, by f illing in their Google Docs details and the spreadsheet name in a form [ Paine 2008b]. They must al so tell the r epos itory where the component should be placed, as in Section 3.1.3. The repository then generates the formulae as in steps 2 and 3 above, but instead of putting the m into a downloadable file, it copies them directly to the Google Spre adsheet. It does so by calling Google’s Spreadsheets Data API (GData) library. At the lowest le vel, one has to manipulate Google Spreadsheets via HTTP requests [Google a] , encoding information in URLs and as chunks of XML. The Spreadsheet Co mponents for All Jocelyn Paine Proce edings of EuSp RIG 2008 Confere nce "In P ursuit of Spreadsheet Excelle nce" ISBN : 9 78-905617-69-2 Copyright © 20 08 Europ ean Spreadsheet Risks I nterest Group ( www.eusprig.or g ) and Author 117 GData library hi des the details of these encodings, and can be used without needing to know much about them . 4. EVALUAT ION Evaluation was just a matter of testing that the implementation worked as expected. It did. Implementation often throws up unpleasant c once ptual or technical surprises, but t his time there were none. I did not evaluate the i nterface for ease of us e; that s hould be one line of future research, usin g a m ore realistic catalogue of com ponents. 5. FURTHER WORK The most important task is to survey users to discover which co m ponents would be most useful. Then to implement them and promote the use of tested, documented, guaranteed, free components. We should consider the use r interface. With my prototype, users must re-request a component every time t hey c hange the si ze, shape, or location of its inputs or outputs in their spreadsheet. C an we design a user interface that h andles this, w hilst remaining acceptably unintrusive? (Popping up a "Do you want to reload component?" modal window every time the use r edits any cell is not acceptably unintrusive.) One possibility is that t he user define n amed ranges for e ach input a nd output. An acol yt e process could watch these ranges, a utomatically regenerating the component whenever they change. Going further, we could program a "wiring tool " with which users could dr aw "wiring diagrams" depicting their spreadsheets as networks o f interconnected component s. Although such a tool would be primarily for use with components from the repository, it would also be usable with users' own com ponents. (I shall sa y more in a future paper about how these will be created.) The "wiring diagrams" could then sit besides Excel as an alternative a nd more structured representation of the spreadsheet. T hus by stealth we encourage users to structur e their spreadsheets. We could consider using one of t he Web-based s prea ds heets for demonstrating e xam ple uses of components and perhaps for delivering to Excel. This was one reason I decided to experiment with a Google Spreadsheets version. It will be, of course, useful to Google Spreadsheets users anyway. Indeed, it would sit very ni cely besides Google’s own repository of charts and other data visualisation aids for spreadsheets, Google Gadgets [Chitu 2008]. Spreadsheet Co mponents for All Jocelyn Paine Proce edings of EuSp RIG 2008 Confere nce "In P ursuit of Spreadsheet Excelle nce" ISBN : 9 78-905617-69-2 Copyright © 20 08 Europ ean Spreadsheet Risks I nterest Group ( www.eusprig.or g ) and Author 118 6. CONCLUSIONS I have prototyped a Web-based "spreadsheet compone nt repository" from which users can copy "components" i nto their own Excel or Google spreadsheets. T hese components a r e collections of cells containing formulae that (in real l ife) would do c alculations useful but too hard for many users to program. Cruci ally, the user can tell the repository which cells in their spreadsheet to use for a component's inputs and outputs. Because the repository stores components as “tem plates” (code for the Excelsior spreadsheet-g enerator) fr om which spreadsheets can be generated by parameterization, it can "reshape" the com ponent to fit before sending it. A single component can theref ore be used in m any different sizes and shapes of spreadsheet. I believe t here to be two sig nificant contributions to spreadsheet practice. Firstly, I hope to set up a spreadsheet equivalent of the high-quality numerical subroutine libraries that revolutionised scientific com p uting, but where instead of subrout ines, t he library contains “templates” for pieces of spreadsheet. This will bring t o spreadsheeting the benefits that prefabrication of standard ised parts brought to m anufacturing. Secondly, i t seem s that w e can design an inter face that will display spre adsheets a s “wiring diagrams” f or systems of interconnected components. Users may star t off using this only for components downl oaded fr om a repository. But once they are used to the idea, they may go on define and “wire in” their own components, structuring their o w n spreadsheets as collections of their own com ponents. Thus, by st ealth we encourag e structured spreadsheets. Spreadsheet Co mponents for All Jocelyn Paine Proce edings of EuSp RIG 2008 Confere nce "In P ursuit of Spreadsheet Excelle nce" ISBN : 9 78-905617-69-2 Copyright © 20 08 Europ ean Spreadsheet Risks I nterest Group ( www.eusprig.or g ) and Author 119 7. REFERENCES Ionut Chitu. Google Spreadsheets Ad ds Gadgets, a Directory of Features blog posting for 19/3/2008, googlesystem.blogspot.com/2008/03/ google-spreadsheets-adds-gadgets.html 1:08pm, 20/6/08. Google a. Google Spreadsheets Dat a API developer’s guide, code.google.com/apis/spreadsheets/develop ers_guide_protocol.html 1:07pm, 20/6/08. Google b. Welcome to Googl e Spreadsheets page, www.google.com/googlespreadsheets/tr y_out.htm l 1:10pm, 20/6/08. NAG. How we started page, www.nag.co.uk/about_earlyyears.asp 4:42p m , 10/4/08. Jocelyn Paine 2005. Excelsio r: br inging the benefits of modularisa tion to Excel . In Proceedings of EuSpRIG 2005 , Greenwich, 6-8 July 2005. www.j-paine.org/eusprig2005 .htm l 10:24pm, 11/4/08. Jocelyn Paine 2007a. It Ain't What You View, But The Way That You View It: documenting spreadsheets with Excelsior, semantic wikis, and l iterate programming . In Proceedings of EuSpRIG 2007 , Greenwich, 8-10 Jul y 2007. www .j-paine.org/eusp rig2007_docume nting.do c 10:24pm , 11/4 /08. Jocelyn Paine 2007b. SF story-g enerator - Index , www.j-paine.org/excelsior/repository/spin/i ndex.htm l 8:47pm, 9/4/08. Jocelyn Paine 2007c. Remove non-match es - Index page, www.j- paine.org/excelsior/repository/remove_no n_matches/index.html 6:22pm, 11/4/08. Jocelyn Paine 2008a. Spread sheet Components, Google Spreadsheets, and Code Reuse , www.j- paine.org/dobbs/spreadl ets.htm l 5:36pm, 18/6/08. Jocelyn Paine 2008b. Spread let Demo form, www.j-paine.org/cgi-bin/goo gle/f ilter_form.py 5:43p m , 18/6/08. Jocelyn Paine, Emre Tek, Duncan Willia mson 2006. Rapid Spreadsheet Reshaping with Excelsior: multiple drastic changes to conten t and layout are easy when you represent enough structure . In Proceedings of EuSpRIG 2006 , Cambridge, 7 -9 July 2006. www .j-paine.org/eusp rig2006.html 10:24pm, 11/4/08. SWI. SWI Prolog site, www .swi-pro log.org/ 9:08pm, 10/4/08. Gahan Wilson 1972. Th e Science Fiction Horror Movie Pocket Computer , in The Yea r's Best Science Fiction No. 5 , edited by Harry Harrison and Brian Aldiss, 1972. Spreadsheet Co mponents for All Jocelyn Paine Proce edings of EuSp RIG 2008 Confere nce "In P ursuit of Spreadsheet Excelle nce" ISBN : 9 78-905617-69-2 Copyright © 20 08 Europ ean Spreadsheet Risks I nterest Group ( www.eusprig.or g ) and Author 120 APPENDIX 1: EXCELSI O R SOURCE A ND DOCUMENTA TION FOR "Filter, Remove Non- matches" COM PONENT In the code b elow, the input ce lls are table elem ents_to_sear ch, to be searched for occurrences of pat tern . The output ce lls are matching_ elements . m atching_elem ents[i] is the i 'th element of elements_ to_search tha t matches pattern , or blank if there are no more s uch elements. The w orking cells a re the_index . Element the_index [i] is the ind ex of the i'th ele ment of elements_ to_search tha t matches pattern , or -1 if there are no more suc h elements. constan t pattern. table e lements_to_s earch : elem ents_base -> text. table m atching_elem ents : eleme nts_base -> text. type el ements_base. table t he_index : e lements_base -> text. the_ind ex[ 1 ] = if( i sna( match( pattern, ele ments_to_sea rch[all], 0 ) ) , - 1 , m atch( patter n, elements_ to_search[al l], 0 ) ). the_ind ex[ i > 1 ] = if( t he_index[i-1 ] = -1 / / There was no previous one, so can' t be anoth er. , - 1 , i f( the_index [i-1] = upb( elements_bas e) // Previo us one was a t the end of the table , // so the re can't be any more. , -1 , if( isna( match( patt ern , elem ents_to_sear ch[ (the_i ndex[i- 1]+1):upb (elements_ba se) ] , 0 ) ) , -1 , match ( pattern , elements_t o_search[ (t he_index[i - 1]+1):upb (elements_ba se) ] , 0 ) + the_inde x[ i-1 ] ) ) ). matchin g_elements[ i ] = if( t he_index[ i ] <> -1 , e lements_to_s earch[ the_i ndex[i] ] , " " ). Spreadsheet Co mponents for All Jocelyn Paine Proce edings of EuSp RIG 2008 Confere nce "In P ursuit of Spreadsheet Excelle nce" ISBN : 9 78-905617-69-2 Copyright © 20 08 Europ ean Spreadsheet Risks I nterest Group ( www.eusprig.or g ) and Author 121 APPENDIX 2: EXCELSI O R SOURCE A ND DOCUMENTA TION FOR SF STORY SPREADSHEET This Appendix is a longer exa m ple, demonstrating specification and documentation of a n unusual Excel spreadsheet, one t hat generates science- fi ction plots. The point is to s how Excelsior-style documentation at work on a task that al most all readers will find unfamiliar. If the docum entation makes it clear to you, it will help to show Excelsi or to be a viable documentation tool . The t ext below is how Exce lsior’s documentation engine renders the text as HTML: f o r details, see [Paine 2007a]. For t he print c opy of this paper, I have re moved the links between tables since they don't show up on paper. I have also indented the code sections to make them stand out. On the Web, they would stand out anyway, because the documentation engine pl aces t hem on a pale bl ue background. The actual HTML generated, and the spreadsh eet, can be seen at [Pain e 2007b]. This is an Excelsior version of the Prolog science- fiction generator at www.j- paine.org/cgi-bin/spin.php . I adapted that somewh at broadly from [Wilson 1972]. It demonstrates how recursion is possi ble when one thinks of tables as functions. Defining the content o f possible stories The raw material from which I generate stories is a graph — that is, a ne twork — of possib le stor y events. T he graph consists of nodes (i.e. p oints), and edges, whic h lead from one nod e to another. Each edge is as sociated with a chunk of text. A nod e can have more than one edge lead ing fro m it. To generate a story, we start a t node 0 and sele ct, at random, one o f the three edges lead ing from it. The text assoc iated with the selected edge be comes the first few wor ds of the stor y. Every edge leads to a node; an d so the edge selected from nod e 0 will too. To generate the rest o f the story, we rep eat the above process w ith that node; a nd we simply co ntinue follo wing edges and using their asso ciated text unti l we arrive at a node from whic h no edges lead. Representing the graph s in Excel Nodes: ta ble node_nos I repr esent each node by an int eger, starting at 0. The data sto red in this version of t he spreadsheet has nodes n umbered from 0 to 42: type node _base = 0:42 . I'm going to set up a table that lists the edges leadi ng out of each nod e. To mak e it easier to read, I'll put it next to a table that sh ows all the nod e numbers — i.e. the n 'th element of t he table is just n : table n ode_nos : no de_base -> g eneral. node_no s[n] = n. Edges: table o ut_edges Now I'll define t he edges that lead out of each node . For each ed ge, I need to stor e its text, and the number of the node it leads to. I must do so in a way that makes it eas y to count the ed ges, and to rando m ly select one o f them, a nd to find the node it leads to. Spreadsheet Co mponents for All Jocelyn Paine Proce edings of EuSp RIG 2008 Confere nce "In P ursuit of Spreadsheet Excelle nce" ISBN : 9 78-905617-69-2 Copyright © 20 08 Europ ean Spreadsheet Risks I nterest Group ( www.eusprig.or g ) and Author 122 It must also be easy to type ne w text and node numbers when d efining the stor y data. Because Excel 's string-handling i s rudimentary, I decided to store the text of an edge a nd the number of the node it leads to in separate but adj acent cells, ra ther than in one cell which would have to be chop ped apar t to get these components. So each edge is t wo cells, text follo wed by a destination nu mber. I'll store a ll the edges for a node in the sa me row, so that a r ow becomes a sequence of Te xt Destina tionNodeNu mber cell pair s. This way, if I generate a rando m number to indica te an edge, it's ea sy to get the corresp onding text and d estination node number. So I w ant a two-di mensional table , whose first di mension is node number , and whose seco nd dimension is ed ge data: type ou t_edges_base = 0:12. table o ut_edges : n ode_base out _edges_base -> general . This will contai n the stor y data. I've put the data at the e nd of this file; to show you ho w it works, here are the fir st few lines: // out_ edges[0,0]=" Earth". // out_ edges[0,1]=1 . // out_ edges[0,2]=" Mars". // out_ edges[0,3]=1 . // out_ edges[0,4]=" Planet 9 of Alpha-Centau ri". // out_ edges[0,5]=1 . // out_ edges[1,0]=" is used as t he cue ball in a game of galactic bar-billi ards". // out_ edges[1,1]=2 . // out_ edges[1,2]=" falls toward the Sun". // out_ edges[1,3]=2 . // out_ edges[1,4]=" falls toward a black hol e". // out_ edges[1,5]=2 . Counting the edges: table o ut_edge_coun t As alread y mentioned, to gene rate a story, we pick t he start no de, randomly select an ed ge lead ing from it, outp ut the associated t ext, and repeat with the edge's de stination node. To do the random selection, I use the Excel funct ion rand() , asking it to generate a number between 0 a nd the number of ed ges minus 1. It tur ned out to be convenient to simpli fy the formula by calculatin g this number of edges in an auxil iary table: table o ut_edge_coun t : node_bas e -> general . out_edg e_count[ n ] = count( ou t_edges[n, 0 :12] ). The generated story I'll explain what I did as thoug h the story is bei ng generated o ver a sequence of ti mepoints, w ith the first node ge nerated at ti me 0. Let's have 100 timepoints altogether: type st ory_base = 0 :99. The co re of this representation will be the node n umber selected a t each timepoint. Story no des: table stor y_node_nos table s tory_node_no s : story_ba se -> genera l. Spreadsheet Co mponents for All Jocelyn Paine Proce edings of EuSp RIG 2008 Confere nce "In P ursuit of Spreadsheet Excelle nce" ISBN : 9 78-905617-69-2 Copyright © 20 08 Europ ean Spreadsheet Risks I nterest Group ( www.eusprig.or g ) and Author 123 We start wi th node n umber 0, at ti me 0: story_n ode_nos[ 0 ] = if( go[1] = 'Recalcul ate', 0, 0 ). (The r eference to table go enables the user to for ce generation of a new story b y selecting from a dropdo wn at the to p of the spre adsheet.) For the re maining timepoints, the story node at time t will be the node to which the edge selecte d at time t-1 lead s: story_n ode_nos[ t>0 ] = story_o ut_edge_dest ination_no de_no[ t-1 ]. Story destina tion nodes: sto ry_out_edge_ destination_ no The right-hand side of that equatio n introduces a new table which holds t he number of the node selected at ea ch time: table s tory_out_edg e_destinatio n_node_no : story_base -> general. I'll define this i n terms of an a uxiliary index. Reca ll that I've stored a node 's out-edges in a ro w, as pairs of cells: T ext Destinatio nNodeNumber T ext Destinatio nNodeNumber T ext DestinationNode Number ... Suppose w e have selec ted pair number i : i is 0 for the f irst pair, and so on. Then the i 'th pair's text is offset b y i*2 fro m the first cell i n this row. T he i 'th pair's destinatio n node number is offset by i*2 + 1 . So now suppo se that we have a table stor y_out_edge_i ndex , giving the value of this i at time t . Then we can say that: // stor y_out_edge_d estination_n ode_no[ t ] = // offs et( out_edge s[0,0], stor y_node_nos[ t ], story_out _edge_index[ t ]*2+1 ). When we c rash into the end o f the story Note that I've co mmented out the above equation. T he one I'm actuall y using is this: story_o ut_edge_dest ination_node _no[ t ] = if( s tory_out_edg e_index[ t ] <> -1 , o ffset( out_e dges[0,0], s tory_node_no s[ t ], story_out _edge_index[ t ]*2+1 ) , - 1 ). It's as above, but returns -1 if t he index is -1. I 'm using this to handle the case where I 've run out of story nodes — i.e . after co ming to a node w ith no out-ed ges. It's impor tant to deal with such cases, otherwise we 'll end up with cellsful o f #REF's. The index: ta ble story_ou t_edge_inde x This is where t he random edge selection happe ns. The index of t he out-edge selecte d at time t is - 1 if there are no edges, otherwise it's a random n umber between 0 and the number of edges minus 1: table s tory_out_edg e_index : st ory_base -> general. Spreadsheet Co mponents for All Jocelyn Paine Proce edings of EuSp RIG 2008 Confere nce "In P ursuit of Spreadsheet Excelle nce" ISBN : 9 78-905617-69-2 Copyright © 20 08 Europ ean Spreadsheet Risks I nterest Group ( www.eusprig.or g ) and Author 124 story_o ut_edge_inde x[ t ] = if( s tory_out_edg e_count[ t ] > 0 , f loor( rand() * story_out _edge_count[ t ], 1 ) , - 1 ). Note that 'rand' is better than 'randbetween', because t he latter requires the Excel Analysis ToolP ak to be lo aded. Thanks to Alice Campbell for p ointing out this i m prove m ent. The out-edge count: table story_out_ed ge_count I used an auxiliar y table to get the number of out-ed ges at time t . It's j ust the number o f out-edges for the corr esponding node, o r -1 if we've run out of s tory so the nod e is -1: table s tory_out_edg e_count : st ory_base -> general. story_o ut_edge_coun t[ t ] = if( s tory_node_no s[ t ] <> -1 , o ffset( out_e dge_count[0] , story_node _nos[ t ], 0 ) , 0 ). The text associa ted with a n out-edge: story_out_te xt This uses the i ndex of the selecte d out-edge, as explained earlier, to get the edge's text: table s tory_out_tex t : story_ba se -> genera l. story_o ut_text[ t ] = if( s tory_out_edg e_index[ t ] <> -1 , o ffset( out_e dges[0,0], s tory_node_no s[ t ], story_out _edge_index[ t ]*2 ) , - 1 ). The output: table story_ node_text This app ears near the top o f the spreadsheet and is held in the table story_n ode_text : table s tory_node_te xt : story_b ase -> text. It's a cop y of story_o ut_text , cleaned up to d isplay blanks after we've hit the end o f the story: story_n ode_text[ t ] = if( s tory_out_tex t[ t ] <> -1 , s tory_out_tex t[ t ] , " " ) & if( a nd( t = upb( story_base ) , story_ou t_edge_count [ t ] > 0 ) , " ... NO SPAC E TO CONTINU E!" , " " ). The second if appends a suitable message i f we're about to r un off the end of the tab le. Spreadsheet Co mponents for All Jocelyn Paine Proce edings of EuSp RIG 2008 Confere nce "In P ursuit of Spreadsheet Excelle nce" ISBN : 9 78-905617-69-2 Copyright © 20 08 Europ ean Spreadsheet Risks I nterest Group ( www.eusprig.or g ) and Author 125 Forcing r ecalculation As explained above, this forces r ecalculation. In the la yout, I've attached a dropd own to this table. table go : -> general . Layout layout( 'Spin' , rows( [ ski p(1,2) ] , [ 'St ory', 'Recal culate' ] , [ sto ry_node_text as y, as( g o, y, ['Re calculate'] ) ] , [ ski p(1,2) ] , headi ng , [ sto ry_node_nos as y , sto ry_out_edge_ count as y , sto ry_out_edge_ index as y , sto ry_out_edge_ destination_ node_no as y , sto ry_out_text as y ] , [ ski p(1,2) ] , headi ng , [ nod e_nos as y , out _edges as yx , out _edge_count as y ] ) ) . Story data out_edg es[0,0]="Ear th". out_edg es[0,1]=1. out_edg es[0,2]="Mar s". out_edg es[0,3]=1. out_edg es[0,4]="Pla net 9 of Alp ha-Centauri" . out_edg es[0,5]=1. out_edg es[1,0]="is used as the cue ball in a game of galactic bar-billi ards". out_edg es[1,1]=2. out_edg es[1,2]="fal ls toward th e Sun". out_edg es[1,3]=2. out_edg es[1,4]="fal ls toward a black hole". out_edg es[1,5]=2. out_edg es[1,6]="is struck by a comet". out_edg es[1,7]=2. out_edg es[1,8]="is invaded by n asty aliens" . ... etc . out_edg es[28,0]="". out_edg es[28,1]=-1. out_edg es[28,2]="an d". out_edg es[28,3]=22. out_edg es[42,0]="42 ". Spreadsheet Co mponents for All Jocelyn Paine Proce edings of EuSp RIG 2008 Confere nce "In P ursuit of Spreadsheet Excelle nce" ISBN : 9 78-905617-69-2 Copyright © 20 08 Europ ean Spreadsheet Risks I nterest Group ( www.eusprig.or g ) and Author 126 out_edg es[42,1]=13. APPENDIX 3: THE EXC ELSIOR LANGU AGE This Appendix explains j ust enough of t he Excelsior language f or you to make sense of Appendices 1 and 2. Excelsior describes spreadsheets as sets of equation s between groups of c ells calle d tables. The Excelsior compiler reads program files containing t he equations and generates an Excel spreadsheet by translating each equation in to one or more form ulae. As well as equations, a pro gram can contain constant declarations, type declarat ions, table declarations, and layout descriptions. The program's meaning is independent of their order. Prog rams are free fo rmat; statements end with a full st op; and Excelsior recognises /* to */ as block comm e nts and // as end- of-line comments. Each equation has a left-hand side and a right-hand side. T he right-hand side i s an Excel formula, except that instead of cell references, it contains Exc elsior table references. It can also refer to declared constants. T he left-hand side is a reference to a table element. To translate the equation, Excelsior rewrites the right-hand side, replacing table references by cell references, and constants by their values. It then places t he formula int o the cell denoted by the le ft-hand side. Excelsior calculates cell positions from layout statem ents, each of which depicts a sheet as a grid of t ables. In the exa m ple below, t here is one layout statement, depicting she et Demo. The following contrived p rogram illustrates this: constan t two = 2. // An unin teresting co nstant. constan t hundred = 100. /* A bigge r constant. */ type sp an = 1:4. table n ums: span -> general. // I'll pu t numbers // in here . There'll b e 4, because // 'span' is 1:4. table s trings: span -> text. /* I'll pu t messages i n here. */ nums[1] = two. strings [1] = "Two = " & two & " .". nums[2] = two * num s[1]. strings [2] = "Twice two = " & n ums[2] & "." . nums[3] = len( stri ngs[2] ). strings [3] = "Lengt h of above t ext = " & nu ms[3] & ". ". nums[4] = sum( nums [1], nums[2: 3], hundred, 250.12+24 9.88 ). Spreadsheet Co mponents for All Jocelyn Paine Proce edings of EuSp RIG 2008 Confere nce "In P ursuit of Spreadsheet Excelle nce" ISBN : 9 78-905617-69-2 Copyright © 20 08 Europ ean Spreadsheet Risks I nterest Group ( www.eusprig.or g ) and Author 127 strings [4] = "Sum o f above numb ers plus 600 = " & num s[4] & ".". layout( 'Demo' , rows( [ num s, strings ] ) ) . // This ma kes sheet 'D emo' contain // 'nums' to the left of 'strings' . This program generates one sheet, called Demo . This contains tables nums and strings . Both t ables contain 4 elements, as specified by the type span used in their declarations. Whe n arranged on sheet Demo , nums is in column A and to the left of strings , as specified in the layout statement. The spreadsheet generated from the above is as follows: 2 Two = 2. 4 Twice two = 4. 14 Length of above text = 14. 620 Sum of ab ove numbers plus 100 = 6 20. The table declarations, incidentally, have a syntax copied from the convention often u sed to describe functions in mathematics and computer science: the function's name, foll owed by the t ypes of its a rgum ents, followed by the type of i ts result. This i s because it's often convenient to think of tables as functions, and table references as function calls. The story-generator i n Appendix 2 il lustrates this: by re cursing over tables, it f inds a random path through a transition m atrix represented as a g raph. In Excel sior, t he argument types specify the size of a table's dimensions. T he result type specifies the type of the elements t he table sh ould contain. Excelsior uses this both t o detect errors such as adding strings to numbers, and to generate an appropriate Ex cel style for formatting the cells.

Spreadsheet Components For All

Original Paper

Comments & Academic Discussion

Leave a Comment

Original Paper

Related Papers

Comments & Academic Discussion

Leave a Comment