A WEB-BASED ALGORITHM ANALYSIS TOOL
An Online Laboratory for Conducting Sorting Experiments
James TenEyck
Department of Computer Science, Marist College, Poughkeepsie, NY, USA
Keywords: Sorting algorithms, Algorithm analysis, Asymptotic behavior of algorithms, Big-oh notation.
Abstract: In this paper, an on-line laboratory is described in which students can test theoretical analyses of the run-
time efficiency of common sorting algorithms. The laboratory contains an applet that allows students to
select an algorithm with a type of data distribution and sample size and view the number of compares
required to sort a particular instance of that selection. It provides worksheets for tabulating the results of a
sequence of experiments and for entering qualitative and quantitative observations about the results. It also
contains a second applet that directly measures the goodness of fit of recorded data with common functions
such as cn
2
and cn(lg(n)). The laboratory is intended to reinforce classroom learning activities and other
homework assignments with a practical demonstration of the performance of a variety of sorting algorithms
on different kinds of data sets. It is a singular on-line tool that complements other online learning tools such
as animations of various sorting algorithms and visualizations of self-adjusting data structures. The
laboratory has been used in algorithms courses taught by the author at Marist College and Vassar, and is
available on-line for use by a more general audience.
1 INTRODUCTION
Algorithm analysis is a particularly difficult concept
for many computer science students to learn.
Despite the fact that students are taught to analyze
the run-time efficiency of algorithms using big-oh
notation early in their course of study, many upper-
level students are uncomfortable using this analysis
tool. Many students find the study of algorithm
analysis to be too abstract and consider it to be not
particularly relevant to their prospective career.
Students often state that when they need to find an
efficient algorithm to apply to a particular problem,
they can find one in a book or on the web, even
though, without the appropriate analytical skills,
they are relying more on the authority of their source
than on their own assessment.
This sorting laboratory attempts to make the
study of algorithm analysis more concrete by
providing students with a hands-on experience of
comparing the observed run-time behavior of
various sorting algorithms with results obtained
from analysis of asymptotic behavior. It also
requires the students to evaluate how the various
algorithms perform upon varying data sets.
Sorting algorithms are well suited for an
algorithm analysis laboratory. They are familiar to
students and provide a good basis for comparing
alternative approaches. Sorting algorithms are
among the first algorithmic procedures that students
encounter in their course of study, and by the time
they are ready to take a class in Algorithm Analysis
and Design; they are usually familiar with several
alternative approaches that they can choose from to
perform a sort. With a number of different
algorithms that can be used to sort the same initial
data set, a comparison of the relative performance of
each is easily made. Students are able to observe for
themselves that choosing an appropriate sorting
algorithm for a particular application is not strictly
pre-determined, but requires an analysis of the data
set to be sorted and a familiarity with the strengths
and weaknesses of the various algorithmic
approaches. With this laboratory experience
accompanying classroom instruction and other
homework assignments, the student should obtain a
more comprehensive appreciation of sorting
algorithms.
2 COMPARISON WITH OTHER
AVAILABLE TOOLS
The sorting laboratory is different from most of the
other on-line material augmenting courses in
Algorithms in that it deals with the analysis rather
485
TenEyck J. (2005).
A WEB-BASED ALGORITHM ANALYSIS TOOL - An Online Laboratory for Conducting Sorting Experiments.
In Proceedings of the First International Conference on Web Information Systems and Technologies, pages 485-489
DOI: 10.5220/0001233304850489
Copyright
c
SciTePress
than the step-by-step display of the workings of an
algorithm. Animation tools are readily located on
the web by performing a search using algorithm
animation or sorting algorithms as a key. Analysis
tools on the web are far scarcer, and it requires a
much deeper traversal of the list of possible matches
to find anything similar to this laboratory.
The most similar tool to the laboratory described
here is one produced by Carol Wellington at
Shippensburg University (Wellington, 1998). It also
focuses on sorting algorithms. It allows the user to
select a sorting algorithm, a data set and a sample
size, and it combines an animation with a report of
the number of compares and swaps that were
executed. It encourages the user to perform an
analysis of the results, but it does not produce a
worksheet to assist in that effort.
Another prominent example of an algorithm
analysis tool is the KLYDE workbench developed at
DePauw University (Berque, 1994). It is one in a
line of locally implemented laboratories (Collins,
1991; Baldwin, 1992; Epp, 1992) developed at the
time in response to the Computing Curricula 1991
Report of the ACM/IEEE Joint Curriculum Task
Force (Tucker, 1990) that recommended
experiences involving experimentation should be
included in the undergraduate computer science
curriculum. The original KLYDE system was
developed in Turbo Pascal to run under DOS on the
x86 based platforms of the time. It supported a
varied collection of algorithms, and used execution
time as the metric for evaluating performance.
KLYDE is not an on-line tool, but is available from
the developers.
The intent of the sorting laboratory described in
this paper differs from the two previously cited in
several important ways:
Its only focus is analysis. It does not do
animations. Animations are useful for
explicating the algorithmic approach and links
to other animation sites are provided on the
initial page, but this tool uses larger sample
sizes that would make animation impractical.
It reinforces the teaching of asymptotic analysis.
The laboratory experience allows the student to
compare empirical results with the worst-case
asymptotic bounds developed in class. The
algorithms are implemented essentially as they
appear in the standard text, and are not specially
modified to enhance their execution speed. The
metrics used here are comparison and swap
counts, as is the case in the classroom
discussion.
The applets and all of the supplementary
material for performing experiments are
available online..
3 DESCRIPTION OF THE SORTING
LABORATORY
The laboratory consists of an initial html page with
links to the various resources it provides. The
laboratory has two main components. The first is an
applet in which the student may repeatedly select a
sorting algorithm, a sample size and a data set, and
read from the display the number of comparisons
and swaps needed to perform the sort. At present
the choice of algorithms consists of insertion sort,
selection sort, bubble sort, quicksort, and mergesort.
The data sets include randomly generated integers,
highly degenerate data, almost sorted data, and
reverse ordered integer values; and the sample sizes
are all the even powers of two from 16 to 4096. A
Figure 1: User Interface for Main Applet
WEBIST 2005 - E-LEARNING
486
depiction of the applet’s interface is shown in Figure
1 on the previous page.
In performing an experiment, the student selects
an algorithm, a data type, and a sample size from the
three JList objects located beneath their respective
labels. When a selection has been made, the student
clicks on the Select button in the panel at the right of
the screen, and an output string is appended to the
text area at the bottom-left (center) of the display.
The output indicates the algorithm selected, the type
of data set operated upon, the sample size chosen,
and the number of compares and swaps that were
counted during the run. The applet appends the
output string from each new selection to the contents
of the text area. The text area can be cleared at any
time by the student clicking on the Clear button at
the bottom of the panel at the right of the screen.
This panel also contains a small text area above the
Select button containing the instructions for the user.
The student may obtain downloadable
worksheets for recording data and making
qualitative and quantitative observations about that
data by clicking on one of the other primary links on
the first page. The worksheet provides boxes for
entering data and lines for student responses to
questions about the performance of the various
algorithms on each of the different data sets. The
worksheet also provides the values of n
2
and n lg(n)
for each of the sample sizes in one of the questions
and contains instructions for how to compare the
recorded data points with a multiple of either of
these two common functions. The laboratory does
not presently provide its own graphing tool, but the
worksheet provides instructions to the students on
how to use the graphing facility in Excel to visualize
comparisons between different algorithmic
approaches and between curves generated from the
data and plots of the two standard functions.
However, because there is such a wide disparity
between the range of sample sizes and the range of
comparison counts, the difference in scale between
the horizontal and vertical axes distorts the shape of
the curves and makes visual recognition of the
algorithmic behavior less transparent.
The second component directly measures the
goodness of fit of the generated data points to the
functions n
2
and n lg(n) and automatically generates
the parameters of the best fitting curve. The selected
algorithm is run over a range of sample sizes of
randomly generated data, and the best-fitting curve
for the resulting data points is determined. In this
approach, data points for ten different sample sizes
are obtained. These samples sizes range from 400 to
4000 and for each sample size the average of five
runs is used to determine the number of
comparisons. Curvilinear regression is used to
determine a best fit to one of the two standard
curves, and the parameters of this curve are
appended to an output string (Miller, 1965).
In figure 2, the interface for this second applet is
displayed. The general features are consistent with
those of the first applet. The student using this
applet need only select the sorting algorithm and the
program will run multiple sorts, collecting the data
points to be used in the regression analysis. When
the program completes, the parameters of the best
fitting curve are appended to the output string and
displayed in the text area at the bottom of the applet.
The average number of compares and the predicted
number for each sample size are copied into the
table located in the center of the applet. The only
data set type is the randomly generated distinct
integers, and the sample sizes are pre-set.
This second component eliminates the role of the
student in recording observations and evaluating the
data that he or she has collected. It effectively
automates out the traditional role of the
experimenter. Its principal value is that it provides
an immediate comparison between the average case
performance of the algorithm with the worst-case
performance (in the case of quicksort, the average
case performance) predicted by big-oh analysis. It
can also be used by the student to quickly check the
reasonableness of the results obtained from doing his
or her own calculations on random data sets.
Figure 2: User Interface for the Second Applet
A WEB-BASED ALGORITHM ANALYSIS TOOL - An Online Laboratory for Conducting Sorting Experiments
487
4 INTEGRATION OF THE
LABORATORY INTO THE
CURRICULUM
In the Algorithm Analysis and Design course in
which the sorting laboratory has been used, it has
been integrated into a learning module with the
following objectives:
Reinforce the student’s grasp of algorithm
analysis.
Develop the student’s appreciation of the
strengths and weaknesses of the various sorting
algorithms.
Provide a learning experience in which the
student applies the knowledge acquired through
the classroom and laboratory activities to a new
situation.
The study of sorting algorithms directly follows
a unit on asymptotic analysis and solving recurrence
relations. The laboratory activity is assigned after
an initial lecture that emphasizes the design of
iterative algorithms using loop invariants and a
homework assignment that includes implementing
insertion sort and quicksort, clearly stating and
adhering to appropriate loop invariants.
At the completion of the unit the students are
asked to write a program that will efficiently sort a
suite of data sets supplied by the instructor. They
are not told the exact composition of this suite but
they do know that each of the data set types they
encountered in the sorting laboratory are represented
to one extent or another. They are in competition
with each other to either select the algorithm with
the best overall performance or produce an efficient
hybrid that uses a different algorithm for large and
small segments of the data. The student algorithms
are incorporated into a benchmark program provided
by the instructor and run on a common platform on
the common suite of data sets. The algorithm that
realizes the best performance "wins" the
competition, and the student that submitted it
receives bonus points that are added directly to his
or her final grade-point average.
At this writing there is only a qualitative
judgment that the laboratory experience augmented
by the programming competition has enhanced the
learning experience of the students. They seem to
enjoy the exercise and have a better on-time
completion percentage than they achieve on other
assignments.
5 EVOLUTION OF THE SORTING
LABORATORY
This laboratory evolved from an activity that was
created to give visiting high school students a taste
of an aspect of computer science. It was one of a
number of activities designed to be fun as well as
educational. The initial activity ran as a standalone
menu-driven Pascal program and required the
students to tabulate the results of the sorts and
answer some qualitative questions.
A number of years after this event, the program
was rewritten as a java applet with a GUI interface
and incorporated into an expanded laboratory
exercise for students in an Algorithm Analysis and
Design class. The applet directly displaying the
parameters of the least squares best-fit curve to a
standard function was added recently.
The plan for the near future is to add shellsort
and heapsort to the set of algorithms, and replace the
too small text area displaying user instructions with
a pop-up message dialog box. With shellsort, the
student will have an additional option of selecting
the step size. A more long-range objective is to
augment the printable worksheet with an interactive
spreadsheet. The goal is to remove as much of the
routine drudgery in the calculations as possible with
the student retaining control over which calculations
to perform. Ultimately the sorting laboratory serves
as a model for an algorithm analysis laboratory
encompassing other kinds of algorithms.
6 CONCLUSIONS
A search of the internet has revealed few similar
online tools for evaluating the performance of
algorithms. Several of the more frequently
referenced animations of sorting algorithms indicate
the sample size and number of comparisons and
swaps in performing the sort, but they do not attempt
to evaluate performance on a range of sample sizes.
This laboratory is a contribution to a niche of online
learning tools that has yet to be adequately filled.
The laboratory for analysis of sorting algorithms
described in this paper enhances student learning in
the following ways:
It is easy to use and provides a rather enjoyable
learning experience.
It provides concrete examples of the run-time
behavior of the various algorithms on a variety
of data set types.
It uses the scientific method of collecting and
analyzing data and using the results to test
hypotheses. It provides a worksheet with a set
WEBIST 2005 - E-LEARNING
488
of questions that require students to observe and
write out salient features of the performance of
the algorithms on the different data sets.
It augments other learning activities in the unit
teaching about the design, implementation, and
analysis of performance of sorting algorithms.
This tool has been used several times by the
author in courses in the analysis of algorithms.
While no quantitative measure of the benefit it
provides has been attempted, the qualitative
observations of the benefit include greater student
interest and more timely completion of assignments.
The sorting laboratory is available online for use by
the academic community at the web site listed
below.
www.academic.marist.edu/~jzbv/algorithms/sorts
REFERENCES
Berque, D., Bogda, J., Fisher, B., Harrison, T., and Rahn,
N., 1994. The KLYDE Workbench for Studying
Experimental Algorithm Analysis. Proceedings of the
Twenty-fifth SIGSCE Symposium on Computer
Science Education, pp 83-87. Pheonix, AZ.
Collins, W., 1991. Estimating Execution Times: A
Laboratory Exercise for CS2. SIGSCE Bulletin, pp
358-363, Vol23, No. 1, March 1991.
Epp, E., 1992. Yet Another Analysis of Algorithms
Laboratory. SIGSCE Bulletin, pp 11-14, Vol. 24, No.
4, December 1992.
Miller, I. and Freund, J. E., 1965. Probability and
Statistics for Engineers, Prentice-Hall.
Tucker, A. (editor), 1990. Computing Curricula 1991:
Report of the ACM/IEEE-CS Joint Task Force. IEEE
Computer Society Press.
Wellington, C. http://www.ship.edu/~cawell/Sorting/
A WEB-BASED ALGORITHM ANALYSIS TOOL - An Online Laboratory for Conducting Sorting Experiments
489