Understanding and Building Spreadsheet Tools
Tutorial at VL/HCC 2007
Tutorial Outline
Spreadsheets are among the most widely used programming systems [13]. Unfortunately, there is a high incidence of errors within spreadsheets that are employed for a wide variety of computations [11, 12]. Some of these errors have a huge impact on individuals and organizations [10]. As part of our research on spreadsheets, we have developed several approaches that are targeted at helping end-user programmers prevent [8, 9, 4], detect [1, 5, 3, 7], and correct [2, 6] faults within their spreadsheets.
In this tutorial, we explain fundamental principles on which spreadsheet tools can be based. We then illustrate how some simple inference mechanisms and visualization techniques that are based on these principles can be derived to detect errors or anomalous areas within spreadsheets. We also introduce a flexible framework for the quick prototype development of such spreadsheet tools and visualizations. The tutorial consists of the following two main parts.
(A) In the first part, we provide an introduction to the formal foundations of spreadsheet programs. This part will provide a precise description of what spreadsheet programs are and how they are different from programs written in general-purpose languages. These differences—the limitations as well as the additional structures offered by spreadsheet—reveal what information can be exploited by spreadsheet tools and how. The formal foundations part will include syntax and semantics of spreadsheets, an informal comparison with other computational models, such as lambda calculus, and, in particular, concepts to support the static analysis of spreadsheets, such as cell usage relationships and cell classifications, cell profiles and cell equivalence classes, spreadsheet regions and partitions, and a generic analysis approach based on region graphs.
(B) In the second part, we demonstrate an implementation framework and a variety of tools and visualizations built using this framework. The discussion in this part of the tutorial will be motivated by the kinds of errors that can be detected using the various tools. We will also look into building new tools, and combining existing tools and effective visualizations to interpret the feedback from the combined systems. The framework consists of two components. The front end is implemented in Visual Basic for Applications (VBA). It captures the required information about the spreadsheets and also controls the display of the feedback presented to the user. The back end is implemented in Haskell and has the inference/auditing mechanisms. The two components communicate over a socket connection. The goal of this tutorial is to provide the participants with a mathematically sound background for the understanding of spreadsheet programs, spreadsheet errors, and spreadsheet tools. Moreover, the goal is to provide the participants, based on this background, with an understanding of the working of existing tools and to enable the participants to modify tool implementations. To be able to make modifications to the tools presented during the tutorial, the participants must have a basic understanding of Haskell and VBA. We will provide the source code for our framework (including the latest version of the GHC compiler for Haskell) and all the examples (the Haskell code and example spreadsheets) used in the tutorial on CDs. Any participant who has a Windows laptop with Microsoft Excel installed will be able to get a hands-on experience with the framework during the course of the tutorial. For example, participants might come up with their own ideas for tools and visualizations and build them using our framework.
Format of tutorial: Half day
Organizers
Dr. Martin Erwig is an Associate Professor of Computer Science at Oregon State University. He received his M.S. in Computer Science from the University of Dortmund, Germany, in 1989, and his Ph.D. and Habilitation from the University of Hagen, Germany, in 1994 and 1999, respectively. His research interests include functional programming, domain-specific languages, and visual languages.
Contact information:
School of EECS, Oregon State University, Corvallis, OR 97331, USA, erwig@eecs.oregonstate.edu
Robin Abraham is a Ph.D. student at the Oregon State University. He received his M.S. in Computer Science from the Oregon State University in 2003. His research interests include end-user programming, software engineering, and visual languages.
Contact information:
School of EECS, Oregon State University, Corvallis, OR 97331, USA, abraharo@eecs.oregonstate.edu
References
[1] R. Abraham and M. Erwig. Header and Unit Inference for Spreadsheets Through Spatial Analyses. In IEEE Int. Symp. on Visual Languages and Human-Centric Computing, pages 165–172, 2004.
[2] R. Abraham and M. Erwig. Goal-Directed Debugging of Spreadsheets. In IEEE Int. Symp. on Visual Languages and Human-Centric Computing, pages 37–44, 2005.
[3] R. Abraham and M. Erwig. AutoTest: A Tool for Automatic Test Case Generation in Spreadsheets. In IEEE Int. Symp. on Visual Languages and Human-Centric Computing, pages 43–50, 2006.
[4] R. Abraham and M. Erwig. Inferring Templates from Spreadsheets. In 28th IEEE Int. Conf. on Software Engineering, pages 182–191, 2006.
[5] R. Abraham and M. Erwig. Type Inference for Spreadsheets. In ACM Int. Symp. on Principles and Practice of Declarative Programming, pages 73–84, 2006.
[6] R. Abraham and M. Erwig. GoalDebug: A Spreadsheet Debugger for End Users. In 29th Int. Conf. on Software engineering, 2007. to appear.
[7] R. Abraham and M. Erwig. UCheck: A Spreadsheet Unit Checker for End Users. Journal of Visual Languages and Computing, 18(1):71–95, 2007.
[8] M. Erwig, R. Abraham, I. Cooperstein, and S. Kollmansberger. Automatic Generation and Maintenance of Correct Spreadsheets. In 27th IEEE Int. Conf. on Software Engineering, pages 136–145, 2005.
[9] M. Erwig, R. Abraham, S. Kollmansberger, and I. Cooperstein. Gencel — A Program Generator for Correct Spreadsheets. Journal of Functional Programming, 16(3):293–325, 2006.
[10] EuSpRIG. European Spreadsheet Risks Interest Group. http://www.eusprig.org/.
[11] R. R. Panko. Spreadsheet Errors: What We Know. What We Think We Can Do. In Symp. of the European Spreadsheet Risks Interest Group (EuSpRIG), 2000.
[12] K. Ra jalingham, D. R. Chadwick, and B. Knight. Classification of Spreadsheet Errors. Symp. of the European Spreadsheet Risks Interest Group (EuSpRIG), 2001.
[13] C. Scaffidi, M. Shaw, and B. Myers. Estimating the Numbers of End Users and End User Programmers. In IEEE Symp. on Visual Languages and Human-Centric Computing, pages 207-214, 2005.