RANGE NAMES
A Risky Practice in Spreadsheet Development?
Ruth McKeever and Kevin McDaid
Dundalk Institute of Technology, Dundalk, Co., Louth, Ireland
Keywords: Spreadsheets.
Abstract: The use of range names to improve spreadsheet development is advocated by both academics and
practitioners, however there is a clear absence of supporting scientific evidence. This paper describes the
latest in a series of experiments that examine the impact of range name structures on spreadsheet reliability,
and formula development time. The aim of this paper is to compare the reliability of simple spreadsheet
formulas developed by intermediate users through both cell references and range names. The results are
consistent with the findings of previous experiments that, contrary to widespread opinion, the use of range
names does not improve the quality of spreadsheets.
1 INTRODUCTION
The importance of spreadsheets cannot be
overstated, as there are now believed to be 400
million Excel users worldwide. The importance, and
uncontrolled use, of spreadsheets in the financial
sector is investigated by Croll (2005). It is reflected
in the following quote made in relation to the
financial sector in the City of London: “Excel is
utterly pervasive. Nothing large (good or bad)
happens without it passing at some time through
Excel”.
Many authors acknowledge that spreadsheet
development is programming. For example, Burnett
et al (2004) declares that “spreadsheet languages are
the most widely used end-user programming
languages to date—in fact, they may be the most
widely used of all programming languages”.
Spreadsheets are both powerful and flexible and,
as a result, are an indispensable tool in modern
business. Flexibility, however, comes at a cost. Any
user can become a developer, without any
knowledge of risk or testing procedures. This leads
to widespread uncontrolled use of poorly developed
spreadsheet models. Considerable research has
recognised that spreadsheets are rarely developed by
professional programmers. For example, Purser and
Chadwick (2006) found that 85% of survey
participants developed the spreadsheets that they
use. This is not to say that the developers are not
professionals, but that their expertise lies within
their domain rather than in programming.
Consequently, spreadsheet development is known to
be highly unreliable, and spreadsheets have been
linked with many recent high profile and costly
errors. Powell et al. (2007) found errors in 94% of
spreadsheets, and 1-2% of cells. The reliability of a
spreadsheet is essentially the accuracy of the data
that it produces, and is measured by the number and
magnitude of errors found in the spreadsheet. To
improve reliability many expert practitioners
advocate the use of range names in the development
of spreadsheets. However, there is no empirical
research to support these claims.
This study is part of the first empirical
investigation into the impact of range names on
spreadsheet reliability. It summarises the results of
earlier experiments on the impact of range names on
spreadsheet debugging and it presents in detail the
results of a recent experiment investigating the use
of range names in the development of simple
spreadsheet formulas. This work is important in that,
contrary to published opinion, our evidence suggests
that range names have a negative impact on the
reliability of simple spreadsheet formulas, and on
the time it takes to develop them.
1.1 Spreadsheets
Errors in spreadsheets that lead to bad decisions are
often reported in the media, and a list of examples
269
McKeever R. and McDaid K..
RANGE NAMES - A Risky Practice in Spreadsheet Development?.
DOI: 10.5220/0003510202690274
In Proceedings of the 6th International Conference on Software and Database Technologies (ICSOFT-2011), pages 269-274
ISBN: 978-989-8425-76-8
Copyright
c
2011 SCITEPRESS (Science and Technology Publications, Lda.)
can be found on the European Spreadsheet Risks
Interest Group (EuSpRIG) website.
Spreadsheet engineering is frequently described
as end-user programming. A 2005 study (Scaffidi et
al., 2005) estimates that by 2012 there will be over
13 million end-user developers in the US, compared
with 3 million professional programmers. It is
acknowledged that end-user developed systems
introduce risk into an organization, and these risks
can have many influencing factors, for instance,
spreadsheet developers do not follow a structured
process, many are untrained in the use of processes
as in software development, and are rarely aware of
the unreliability of spreadsheets. One study found
that only 6% of development time is spent testing
spreadsheets (Baker et al., 2006).
Spreadsheet researchers all converge on the same
findings – spreadsheet use is ubiquitous, and
spreadsheet quality is not considered paramount
within organisations. Testing is not considered
crucial, and critical decisions are made based on
unregulated spreadsheets.
1.2 Naming
Many researchers have examined the importance of
naming in programming. Keller (1990) found that
people who read programs that followed a defined
naming scheme found them easier to read, but could
not pinpoint why. Jones (2008) raises the issue of
incorrect spelling: “if people make spelling mistakes
for words whose correct spelling they have seen
countless times, it is certain that developers will
make mistakes, based on the same reasons, when
typing a character sequence they believe to be the
spelling of an identifier”.
Range names are a feature in Excel that allows a
developer to assign a meaningful name to a cell or
group of cells. This name can then be used
throughout the spreadsheet instead of the cell
reference. Names can be easily created, modified,
and deleted through the Name Manager, located in
the formulas tab of the Excel ribbon.
Practitioners often suggest that range names can
make spreadsheets easier to understand and to
develop, in books, academic papers, journals, and on
websites, illustrated by the following examples:
“Range names improve reliability. If you need
to change references to the range, you only
have to change the definition of the range
name. Then every formula that uses it will refer
to the new address.” (O'Beirne, 2005)
“Clearly, using the Defined Names makes the
formula much easier to understand and
maintain.” (Pearson, 2009)
In contrast, some experts caution against using
range names. Panko and Ordway (2005) warn that
range names “should be considered potentially
dangerous until research on using range names is
done.” Blood (2006) states that names are
unnecessary if the model is well designed, and that
range names make it more difficult to audit
formulas, as important information becomes hidden.
He also criticizes range names for making formulas
unnecessarily long.
1.3 Summary
Many researchers have explored spreadsheet errors,
in terms of their frequency and causes. It is widely
recognised that spreadsheets are unreliable, and
range name use is often mentioned as a practice that
improves spreadsheet quality. The majority of
practitioners are in favour or range names, yet a few
vocal opponents remind us that there is no scientific
evidence to support these recommendations.
This paper outlines earlier aspects of our
research programme, and then describes in detail an
experiment to establish the effect of range names on
the reliability of basic spreadsheet formulas.
2 BACKGROUND
This study is part of a wider research project that
investigates the impact of range names on
spreadsheet reliability, in order to assess the
feasibility of recommending range names for use in
spreadsheets, and is guided by the following
objectives:
Objective 1: Investigate the impact of range
names on the ability of novice users to successfully
identify and correct errors in a spreadsheet.
Objective 2: Investigate the reliability of
spreadsheets developed using range names.
Objective 1 was addressed by experiments
detailed in Section 2.1. The study presented in this
paper begins to address Objective 2 by examining
the reliability of formulas developed using range
names, as compared to formulas developed using
cell references.
2.1 Debugging Experiments
The experiments that addressed Objective 1 were
adapted from a design first used in a study by Howe
and Simkin (2006), and later used by Bishop and
ICSOFT 2011 - 6th International Conference on Software and Data Technologies
270
McDaid (2007). Participants were given a
spreadsheet seeded with errors, and were asked to
correct any mistakes they could find, directly in the
spreadsheet. They were not told how many errors
were in the spreadsheet, or what types of errors were
included. Their cell clicks were recorded by T-CAT,
a “time-stamped cell activity tracking tool”.
2.1.1 Experiment 1
Initially a small exploratory experiment was carried
out on 21 computing students (McKeever et al.,
2009). The spreadsheet, seeded with 42 errors, was
first modified to include range names in formulas.
Table 1: Results of exploratory study.
Error Type No. of Seeded
Errors
% Corrected by
Participants
Clerical 4 11%
Rule Violation 4 63%
Data Entry 8 64%
Formula 26 47%
When these results were compared with the
results of an identical study by Bishop and McDaid
(2007) which did not involve range names, it was
found that the students in the exploratory trial found
19% fewer formula errors than the students who
took part in the trial without range names. Further
analysis showed that the exploratory group found
29% fewer cell reference errors, and 24% fewer
range reference errors.
The significant difference in results indicated
that the inclusion of range names in a spreadsheet
does not make it easier to debug. Importantly it
prompted a second, better-controlled experiment
with more focussed and detailed research questions.
2.1.2 Experiment 2
Based on the results and feedback from the first
experiment, a new set of research questions were
developed for a controlled second experiment which
examined, in a more rigorous way, the impact of
range names on the spreadsheet debugging process
(McKeever and McDaid, 2010). This experiment
was designed to investigate whether users are better
at finding errors in formulas when the formulas
contain range names as opposed to cell references,
and to provide information on the following four
cases of error involving range names:
RQ1: The error is due to the wrong range being
assigned to a name.
RQ2: The error is due the wrong range name being
used in a formula.
RQ3: The formula contains a name, but the error is
not due to the name.
RQ4: There are no names in the formula, but names
in the spreadsheet.
The spreadsheet used for the first trial was
modified for this experiment, to reflect feedback
received. The number of names in the spreadsheet
was reduced from 152 to 12. The number of errors
was reduced from 42 to 39. 29 students took part in
this trial, and this time they were divided randomly
into control and treatment groups.
The treatment group found 4% fewer errors than
the control group, for the errors relating to RQ1 and
RQ2; these results are not statistically significant.
The treatment group also found 20% fewer errors
relating to RQ3, and 12% fewer errors relating to
RQ4; these results are statistically significant.
2.1.3 Summary
These findings demonstrate how range names make
debugging a spreadsheet more difficult for novice
users. Development however, is arguably more
important than debugging; if range names can help
developers avoid errors initially then less debugging
is necessary. For this reason, we now focus our
research on the role of range names in formula
development.
3 RESEARCH QUESTIONS
To address Objective 2, to investigate the reliability
of spreadsheets developed using named ranges, the
authors first looked at the reliability of basic
formulae developed using range names. The work
was guided by the following two research questions:
RQ1: Do users make more mistakes using range
names or cell references, when asked to develop a
simple spreadsheet formula?
RQ2: Does the time it takes users to develop a
simple spreadsheet formula differ for formulas using
range names than for formulas using cell references?
Range names, as with programming variables, can
be chosen according to various conventions. The
work will examine each of the research questions
above for each of the following six range naming
structures:
a. Where no two names begin with the same word.
b. Where several different names begin with the
same word, but end in a different word.
c. Where several different names begin and end in
the same words, with a change in the number in
the middle of the name.
d. Where names begin with the same word with a
RANGE NAMES - A Risky Practice in Spreadsheet Development?
271
change in the trailing number.
e. Where several different names begin and end
with the same word, with a minor change in the
word in the middle.
f. Where names do not follow any naming
convention, and are inconsistent.
4 METHODOLOGY
To address these questions we decided to isolate a
basic formula task, one that the participants would
have used many times. It was decided to ask the
students to add the values from a number of cells
together. The participants would not require domain
knowledge to complete the tasks.
4.1 Task Design
The spreadsheet designed for this experiment held
six worksheets, each of which contained two tasks.
For each task they were asked to calculate the total
of a number of specified cells, one task using cell
references, the other using range names. This
resulted in a total of twelve tasks, six for cell
references and six for range names.
The range names in each sheet were developed
using a different naming convention, in order to
examine each structure defined in the research
questions. Examples are shown in Table 2:
Table 2: Naming Conventions.
Sheet Name Example
Sheet 1 ArnottsSales, ClearysSales
Sheet 2 TopshopGP, TopshopNP.
Sheet 3 HMV2008Profits, HMV2009Profits
Sheet 4 PrimarkTax2006, PrimarkTax2007
Sheet 5 GAPSecWages, GAPSupWages
Sheet 6 PPatchVar,
Fixed_Costs_Pumpkin_Patch
Sheet level names were used for this experiment,
so that the subject would only be able to view the
names relevant to the task on which they were
working. There were 264 names in the workbook,
and this allowed us to examine the naming structures
in isolation without confusing the participants.
4.2 Conducting the Experiment
The participants in this trial were 15 postgraduate
students from the Higher Diploma in Computing
class in Dundalk Institute of Technology, most of
whom had spend a period of time a period in the
workplace before returning to education. The
participants were considered intermediate users,
based on a background survey carried out on a
sample of 14 members of the class.
Within subject design was chosen for this study,
due to the low number of available participants.
Each participant developed twelve basic formulas,
six using cell references and six using range names.
To avoid any carryover effect that might occur, the
order in which the participants carried out the tasks
was alternated, by dividing them into two groups.
Group A used range names for the first task on each
worksheet and cell references for the second; group
B used cell references for the first and range names
for the second.
After randomly dividing the subjects into groups
the researcher explained how to complete the
experiment. T-CAT was used to record the time each
task took, and the participants were observed
throughout.
4.3 Example
The following task was given to the participants in
Group A, on Sheet 5.
Task 9: With a formula that uses range names, in
cell C28 calculate the total wages for the
following employees :
TJ Hughes supervisor
Mackays secretary
GAP secretary
Clintons Card Shops supervisor
Eisengger supervisor
Eisengger secretary
Oasis secretary
One participant made the following erroneous
answer to this task:
=TJHughesSupWages+MackaysSecWages+Clin
tonsCardShopsSupWages+EisenggerSupWages+Eis
enggerSecWages+OasisSecWages
The correct answer would also include
“GAPSecWages”.
5 RESULTS
Out of the fifteen participants, only seven completed
all tasks correctly, with eight students making at
least one error. The following results are based on
the individual errors made, rather than the
participants who made the error.
ICSOFT 2011 - 6th International Conference on Software and Data Technologies
272
5.1 Errors Made
Table 2 shows the number of errors made in each of
the sheets on the spreadsheet, according to whether
the task included range names or cell references.
Table 3: Experiment results.
Errors Named Ranges Cell References
Sheet 1 0 1
Sheet 2 4 0
Sheet 3 3 0
Sheet 4 0 1
Sheet 5 3 2
Sheet 6 2 0
Total 12 4
We expected to find two error types – selection
errors, where the wrong range is used in the formula,
and omission errors, where a reference is left out.
Five of the range name errors were selection errors,
and six were omission errors; one cell reference
error was a selection error, and three were omission
errors. In addition to these errors, another type
occurred when participants used range names: in two
cases a subject added an extra name to the formula.
During task 3 one subject added the name ZaraNP
twice, and during task 6 a participant included
Costcutter2008Profits and Costcutter2009Profits.
Statistical analysis was conducted on these
results, using McNemar’s test for paired proportions.
This indicates that in two cases, the second and third
naming conventions, the subjects were less effective
at developing formulas using range names, than with
cell references.
5.2 Time Results
Table 3 shows the average time in minutes it took
the participants to complete the tasks on each sheet,
according to whether the task included range names
or cell references. It took on average 0.7 minutes
longer to complete the tasks using range names.
Table 4: Average times (minutes).
Errors Named Ranges Cell References
Sheet 1 1.98 1.33
Sheet 2 1.95 1.3
Sheet 3 1.45 1.1
Sheet 4 1.35 1.04
Sheet 5 1.31 1.19
Sheet 6 3.19 1.04
Total 1.87 1.17
The times recorded by T-CAT included the
assimilation time for both beginning the experiment,
and for each sheet. This means that the time it took
each participant to complete the second task on each
sheet should have been naturally less than the first.
Such bias was eliminated through the structure of
the experiment and randomisation of participants.
5.3 Findings
Statistical analysis of the significance of the
difference in the times to perform each task was
performed based on a paired T- test, with a 5% level
of significance. These statistical tests support the
following statements:
1. Intermediate users make fewer mistakes when
developing formulae using cell references than
using range names where:
a. Several different names begin with the same
word, but end in a different word.
b. Several different names begin and end in the
same words, with a change in the number in
the middle of the name.
2. Intermediate users take less time to develop
formulae using cell references than using range
names where:
a. Several different names begin with the same
word, but end in a different word.
b. Several different names begin and end in the
same words, with a change in the number in
the middle of the name.
c. Names begin with the same word with a
change in the trailing number.
d. Names do not follow any naming convention,
and are inconsistent.
5.4 Issues
One problem with the task was the if the user
clicked on a cell that was named while writing a
formula, the name of the cell appeared in the
formula rather than the cell reference. Unfortunately
this is not something that can be caught by the T-
CAT macro. The participants were observed
throughout the task however, and the researcher is
satisfied that they followed the instructions exactly.
Any future repetitions of this experiment on larger
groups will have to take this into consideration.
Students were used in these experiments.
Although this approach can be controversial, studies
have shown that students have similar abilities to
professionals. As stated previously, it is frequently
the case that spreadsheets are, in fact, developed by
novice or intermediate users.
The participants in this trial were taught how to
use range names, which is not the case for real-
world users.
RANGE NAMES - A Risky Practice in Spreadsheet Development?
273
6 CONCLUSIONS
This experiment shows that there is no evidence to
support the theory that range names reduce the
quantity of errors found in spreadsheets, or make
them easier to use. The number of errors made when
the participants used range names was higher overall
than when the same subjects used cell references.
The average time it took to complete each task was
also higher when the participants used names.
Importantly, the increase in selection errors
illustrates that range names do not help the user to
avoid referring to the wrong cell, as is often claimed.
The increase in the time it takes to develop a
formula dispels the notion that range names make
formulas easier to create.
Methods that appear to work for a small number
of expert developers must not be presumed to work
for other less experienced users who do not have the
same experience of errors. This work concludes that,
while spreadsheet quality is a real and important
issue, the use of range names is not the solution for
novice or intermediate level users.
6.1 Future Work
First we plan to repeat this experiment on a larger
group of subjects to improve validity. Range names
could possibly save developers time when cells are
located on different worksheets; this is something
we intend to investigate in the future.
More generally, we anticipate the work will
look at impact of range names on the entire
spreadsheet development process. This is part of a
larger study and the there are also plans to extend
this experiment to focus on more general
spreadsheet development. It is important to examine
the performance of expert users with regard to range
names.
REFERENCES
Baker, K. R., Foster-Johnson, L., Lawson, B. & Powell, S.
G. 2006. A Survey of MBA Spreadsheet Users.
Bishop, B. & McDaid, D. K. 2007. An Empirical Study of
End-User Behaviour in Spreadsheet Debugging.
PPIG. Salford 2007.
Blood, A. T. 2006. Elements of Good Spreadsheet Design:
Spreadsheet Design Notes [Online]. Available:
http://www.xl-logic.com/modules.php?name=Content
&pa=showpage&pid=2 [Accessed May 2009].
Burnett, M., Cook, C. & Rothermel, G. 2004. End-User
Software Engineering. Communications of the ACM,
47, 53-58.
Croll, G. J. 2005. The Importance and Criticality of
Spreadsheets in the City of London. EuSpRIG.
Howe, H. & Simkin, M. G. 2006. Factors Affecting the
Ability to Detect Spreadsheet Errors. Decision
Sciences Journal of Innovative Education, 4, 101-122.
Jones, D. M. 2008. The New C Standard (Identifiers) An
Economic and Cultural Commentary.
Keller, D. 1990. A Guide to Natural Naming. SIGPLAN
notices, 25, 95-102.
McKeever, R. & McDaid, K. 2010. How do Range Names
Hinder Novice Debugging Performance? EuSpRIG.
London.
McKeever, R., McDaid, K. & Bishop, B. 2009. An
Exploratory Analysis of the Impact of Named Ranges
on the Debugging Performance of Novice Users.
EuSpRIG. Paris, France.
O'Beirne, P. 2005. Spreadsheet Check and Control: 47 key
practices to detect and prevent errors, Gorey, Systems
Publishing.
Panko, R. R. & Ordway, N. 2005. Sarbanes-Oxley: What
About all the Spreadsheets? Controlling for Errors and
Fraud in Financial Reporting. EuSpRIG.
Pearson, C. H. 2009. Defined Names [Online]. Available:
http://www.cpearson.com/Excel/named.htm [Accessed
November 25 2008].
Powell, S. G., Baker, K. R. & Lawson, B. 2007. Errors in
Operational Spreadsheets [Online]. Available:
http://mba.tuck.dartmouth.edu/pages/faculty/steve.po
well/publications.html [Accessed].
Purser, M. & Chadwick, D. 2006. Does an awareness of
differing types of spreadsheet errors aid end-users in
identifying spreadsheets errors? EuSpRIG.
Scaffidi, C., Shaw, M. & Myers, B. 2005. The “55M End-
User Programmers” Estimate Revisited. Pittsburgh,:
Institute for Software Research Human-Computer
Interaction Institute.
ICSOFT 2011 - 6th International Conference on Software and Data Technologies
274