Workbook Structure Analysis - 'Coping with the Imperfect'

Reading time: 5 minute
...

📝 Original Info

  • Title: Workbook Structure Analysis - ‘Coping with the Imperfect’
  • ArXiv ID: 1111.6858
  • Date: 2011-11-30
  • Authors: Authors not specified in the provided excerpt.

📝 Abstract

This Paper summarises the operation of software developed for the analysis of workbook structure. This comprises: the identification of layout in terms of filled areas formed into "Stripes", the identification of all the Formula Blocks/Cells and the identification of Data Blocks/Cells referenced by those formulas. This development forms part of our FormulaDataSleuth toolset. It is essential for the initial "Watching" of an existing workbook and enables the workbook to be subsequently managed and protected from damage.

💡 Deep Analysis

Figure 1

📄 Full Content

A study [Lawson et al, 2007] has shown that inexperienced spreadsheet developers have markedly less understanding of the benefits of practices such as integrating spreadsheet modules, keeping data and formulas separate and testing than experienced developers. This can lead to critical spreadsheets and workbooks being vulnerable to error because of poor workbook structure. This is a particular problem in small to medium sized organisations where resources to support spreadsheet development are limited. We have developed a software algorithm to enable the structure of a Workbook to be analysed and subsequently improved.

There is a school of thought that there is no such thing as “spreadsheet good practice” [Colver, 2004], arguing that there are many different ways of designing a workbook, each of which has advantages and disadvantages. In contrast, our experience of spreadsheet design and use shows that there is “good practice” at least for spreadsheet layout. This “good practice” is often adopted intuitively by experienced spreadsheet practitioners. This is distinct from overall workbook design (e.g. having Inputs, Outputs and Workings in separate sheets). Layout good practice can be based on assessing whether modifications (e.g. inserting rows) are likely to cause damage and tracking whether Blocks and Single Cells are accurately referenced by formulas.

Algorithms have been developed that perform Workbook Structure Analysis and find all the Formula Blocks/Cells and Data Blocks/Cells. To achieve this all the Formula Blocks are examined and the ones that share a common width or height are identified. The relationships between the Formula Blocks/Cells can thus be recorded. The Data Blocks/Cells are then found from the Formula References. Manual inspection by an expert practitioner could achieve the same end. These algorithms have evolved from a “Find Formulas” procedure that originally found formulas only and displayed the structure as borders around the filled Formula Blocks. This feature is not unique and is included in products such as OAK V4 [OPERIS, 2009] and SpACE [AuditWare, 2006].

Why perform the complex analysis needed to reveal the workbook layout and structure beyond identifying filled Blocks? At least one recent study [Aurigemma, 2010] has shown that automated analysis performs less well at finding errors in spreadsheets than detailed study by a person. Our Workbook Structure Analysis does not find all the errors and instead concentrates on what can be done automatically to reveal the layout. It then checks whether references in formulas accurately follow the layout/structure. Once the underlying structure is revealed then areas of a workbook that appear untidy, e.g. with Blocks that conflict with the overall structure, can be targeted for manual detailed analysis. Our experience of using the algorithm, on real spreadsheets from our client base, is that revealing the structure greatly speeds up the process of understanding and analysing a workbook. This addresses some of the problems encountered when needing to understand legacy spreadsheets [Hodnigg, 2008]. As well as visually revealing the structure, potentially incorrect cell references are indicated as warnings or errors for the user to inspect. The algorithm also gives an early indication of the likely scope of the problems as a percentage “score”, and will show when the workbook is past redemption and needs to be re-worked rather than checked [Murphy, 2007].

The original version of the “Find Formulas” procedure did not record spreadsheet structure which could only be determined by the user inspecting the borders. Nevertheless, familiar patterns reflecting both good practice and poor layouts could be observed. Experience gained from manually analysing/checking spreadsheets has now enabled the enhancement of our FormulaDataSleuth ® toolset [Bekenn, 2008]. The latest version of the “Find Formulas and Data” procedure records the spreadsheet structure and flags layout anomalies. The correct identification of spreadsheet structure enables the accurate detection of all Data Blocks/Cells. This includes identifying, if present, Data Blocks which may be “unknown” to the user/developer but are nevertheless referred to by Formula Blocks. These are denoted “Unintentional Data Blocks”.

Spreadsheet structure may be defined in terms of “Stripes” and “Blocks” which are defined in figure 1 and the following sub-sections. These concepts are examined in more detail in [Bekenn, 2009]. 1. Single Row, 2. Single Column, 3. Two Dimensional (Multiple Row and Column).

A horizontal set of Blocks spanning the same rows, make up a “Horizontal Stripe”. A vertical set of Blocks spanning the same columns, make up a “Vertical Stripe”. Stripes are more important if they contain multiple rows/columns than if they contain just single rows/columns.

An “Idealised Layout” is a layout composed only of Stripes. These stripes can have whole rows or whole columns inserted at any

📸 Image Gallery

cover.png

Reference

This content is AI-processed based on open access ArXiv data.

Start searching

Enter keywords to search articles

↑↓
ESC
⌘K Shortcut