UNDERSTANDING THE DYNAMICS
OF INFORMATION SYSTEMS
Abdelwahab Hamou-Lhadj
Electrical and Computer Engineering, Concordia University, 1455 de Maisonneuve West, Montreal, Canada
Keywords: Software Engineering, Information Systems, Program Comprehension, Dynamic Analysis, Reverse
Engineering.
Abstract: Information systems are in the process of undergoing significant transformations triggered by the Internet
technology. However, most existing systems suffer from poor to non-existent documentation, which makes
the maintenance process a daunting task even for a skilled software engineer. As a result, software engineers
are often faced with the inevitable problem of understanding different aspects of the system before
undertaking a simple maintenance task. This paper describes ongoing research in the area of program
comprehension that aims at investigating efficient techniques for the understanding of the dynamics of
software systems with a particular emphasis on information systems. The proposed approach is based on the
analysis of system’s execution traces. The long-term objective is to create effective tool support for software
engineers working on maintenance tasks.
1 INTRODUCTION
Today’s information systems are object-oriented,
component-based, and distributed in multi-tier
architectures. Maintaining such systems is often a
complex task; software engineers must understand
various aspects of a system before they can make
changes that preserve reliability and other system
attributes. The problem is further exacerbated by the
fact that documentation of the system under study is
rarely updated while key developers, knowledgeable
of the system's design, commonly move to new
projects or companies.
Understanding a software system requires both
static and dynamic analysis techniques. The former
focuses on exploring the structure of the system by
analysing its source code whereas the latter provides
insight into its behavioural properties. Both
approaches aim to extract the system’s components
and their relations at different levels of abstraction.
Today’s most prominent maintenance activities
in the context of information systems consist of the
migration of existing systems into Web technology,
and the integration of the system’s components
using Web Services. Both activites necessitate the
understanding of the way the system functions, i.e.
software maintainers must understand the behavior
of the system before they can undertake the
previsously mentioned maintenance tasks.
In this paper, we describe ongoing research that
focuses on techniques that permit the analysis of the
dynamics of a software system. These techniques
revolve around efficient analysis of execution traces.
Traces have the advantage of being precise and
sensitive to the input data (Ball 1999). Unlike static
analysis, where the analyst needs to go through the
many different relationships of all the system
artefacts, traces can be collected in such a way that
they contain only the information needed to perform
the maintenance activity at hand. In addition, system
execution can be driven by specific input data which
provides a powerful mechanism for relating program
inputs, outputs, and behaviour.
Traces, however, have historically been
difficult to work with since they may contain
millions of events. There is a need to find ways to
“shrink” their content while keeping as much of
their essence as possible.
The proposed research aims at investigating
how to best represent traces. The long-term
objective is to integrate the results into reverse
engineering tools that can be used by software
maintainers to efficiently analyse the content of
traces while performing maintenance tasks.
498
Hamou-Lhadj A. (2007).
UNDERSTANDING THE DYNAMICS OF INFORMATION SYSTEMS.
In Proceedings of the Ninth International Conference on Enterprise Information Systems - DISI, pages 498-502
DOI: 10.5220/0002399904980502
Copyright
c
SciTePress
This paper is organised as follows: The next
section discusses briefly the components of an
execution trace. In Section 3, we present our
research framework and the particular topics that are
being investigated by our research team. In Section
4, we present related work. We conclude the paper
in Section 5.
2 EXECUTION TRACES
Traces are often generated by executing the features
of the system under study. Test cases have also been
used at a less extent. There are different techniques
for generating traces. The first is based on
instrumenting the source code, i.e. inserting probes
such as print-out statements at different locations in
the source code. In the context of object-oriented
systems, probes are usually inserted at each entry
and optionally each exit of every method.
Instrumentation is usually done automatically. The
second approach consists of instrumenting the
execution environment in which the system runs.
For example, the Java Virtual Machine can be
instrumented to generate events of interest. The
advantage of this technique is that it does not require
the modification of the source code. Finally, it is
also possible to run the system under the control of a
debugger. In this case, breakpoints are set at
different locations (e.g. entry and exit of a method).
This technique has the advantage of not modifying
the source code and the environment; however, it
can slow down considerably the execution of the
system.
To reproduce the execution of a distributed
object-oriented system, we need to collect the events
related to object construction/destruction, method
entry/exit, and process execution and
synchronization (De Pauw 2002, Richner 2002).
The latter usually requires a global clock in order to
replay the execution of the system accurately.
There are various sorts of traces depending on
the type of analysis performed. For example, traces
of methods calls are often used to understand object
interactions. Statement-level traces represent the
information at a lower level of abstraction allowing
software maintainers to detect potential defects in
the system. Some researchers suggest using traces of
inter-component interactions to depict the system’s
behaviour at the architectural level, e.g. (Walker
1998). The distributed and object-oriented nature of
today’s information systems might require the
combination of multiple types of traces. One aspect
of our research is to determine which traces are most
suitable for the comprehension of the dynamics of
information systems.
3 RESEARCH FRAMEWORK
Figure 1 depicts our approach for trace analysis.
Traces are generated through code instrumentation.
They are then visualized using a visualization
environment. Due to the extraordinary size of typical
traces, we need to develop simplification algorithms
that can help software engineers explore the content
of traces in an easy way. In addition, we would like
to explore traces so as to recover behavioural design
models that are usually lost as the system undergoes
several ad-hoc maintenance tasks.
The bulk of the proposed research is three-fold
that we present here and elaborate in more detail in
the subsequent sections:
Trace-Simplification Techniques: This research
tackles the problem of reducing the complexity
of traces while keeping as much of their essence
as possible.
Design Recovery: This work focuses on the
recovery of high-level behavioural design
models from traces.
Trace Visualization: This work focuses on
techniques for efficient representation of trace
information.
Figure 1: Framework of Proposed Research.
3.1 Trace Simplification Techniques
This work focuses on the problem of ‘compressing’
traces, thus enabling software engineers to better
understand the behaviour of software. The term
‘compressing’ is not used here to refer to the well-
known concept of data compression, but rather to
Source
Code
Instrumented
Code
Simplification
Algorithms
Different Types
of Traces
Visualization
Environment
Design
Recovery
UNDERSTANDING THE DYNAMICS OF INFORMATION SYSTEMS
499
making a trace appear simpler and smaller so that
software engineers can understand relevant parts
more easily. The objective of the proposed project is
to develop a number of algorithms for compression,
particularly focusing on criteria by which various
parts of a trace can be treated as the same pattern.
For example, consider a portion of a trace of routine
calls, T1: A(B(CCCCCD)(B(DCC)), where A(B)
denotes “A calls B”. This trace can be transformed
into T2: A(B(CD)) if the contiguous repetitions of
“C” and the order of calls from “C” to “D” are
ignored when comparing the two subtrees rooted at
“B”. At a high level, the information contained in T2
might be sufficient for the programmer’s purposes.
Existing trace analysis tools, e.g. (Jerding 1997,
Systä 2000, De Pauw 2002), support a variety of
matching criteria that software engineers can use
during the exploration of the trace. However, the
sheer size of typical traces makes this exploration
process a daunting task, further complicated by the
fact that some criteria require, in advance, the setting
of specific parameters. In addition, the order in
which they are applied can have a significant impact
on the resulting trace. Automated assistance is
clearly needed. We propose developing a set of
algorithms that will combine several criteria and
automatically suggest appropriate settings for the
rapid exploration of the trace content. The
algorithms should be designed by taking into
consideration the nature of the trace being studied
(e.g. trace of routine calls, inter-process messages,
etc.), as well as the current goals and experience of
the maintainer. They will vary depending on the
criteria used, the order in which they are applied,
and input parameters specific to each criterion.
The proposed approach encompasses several
steps. First, we need to conduct a comprehensive
study of the most cited criteria in order to identify
the ones that are best suited to the analysis of
information systems. The study will involve
applying these criteria to several large traces and
using statistical techniques to analyse the results.
Next, we will design and implement the algorithms
starting with a few matching criteria. The remaining
steps are performed iteratively: (1) Experiment with
the algorithms using a multitude of traces generated
from various information systems, (2) Validate the
results by involving software engineers with
different levels of knowledge of the system under
study working on different maintenance tasks, (3)
Refine the algorithms by modifying their input
parameters, adding new matching criteria, etc, and
finally going back to Step (1).
3.2 Design Recovery
The objective of this work is to develop efficient
techniques for the recovery of high-level
behavioural design views from execution traces.
These views record the essence of traces in terms of
a few abstracted elements, making it easier for an
engineer to comprehend the information. In previous
work, we introduced the concept of trace
summarization (Hamou-Lhadj 2005) so as to extract
summaries from large traces. The process relies on
successive filtering of trace content by removing
utilities. We worked with software engineers from
the telecom. industry on developing a ‘utilityhood’
metric in order to assess, in the absence of proper
documentation, the extent to which a component
(e.g. class, method, etc) can be considered a utility
(Hamou-Lhadj 2004, Hamou-Lhadj 2005). This
metric is based on the idea that a component with
higher fan-in is more likely to be a utility especially
if the calls come from diverse parts of the system. In
contrast, a component that is called from only a few
places but calls many other components would most
likely be an important component of the system.
The first step of the proposed project is to
continue the work with the utilityhood metric in
order to improve its effectiveness when applied to
information systems. The second step is to
experiment with the utilityhood concept so as to
detect additional types of utility components,
including processes, classes, packages, etc. The
experiment will require the use of different target
systems. The validation should involve the original
designers of the systems if available or whatever
other valid documentation is available. Furthermore,
we need to experiment with traces generated from
these systems and assess the accuracy of the high-
level models extracted using the utility removal
approach. The most important challenge we
anticipate is to determine a proper utility threshold
that can lead to views that are neither too abstract
nor too detailed; in other words, views that are as
informative as possible to software engineers.
3.3 Trace Visualization
The objective of this research is to develop a
visualization environment for representing trace
information using multiple views in order to provide
effective support for program comprehension across
a wide range of maintenance tasks. The main
practical result we expect to achieve is a working
tool that incorporates various views of the system’s
behaviour. The views will be linked to allow
software engineers to navigate from one to another,
ICEIS 2007 - International Conference on Enterprise Information Systems
500
enabling them to analyse the system dynamics at
different levels of abstraction.
The first step of this work will focus on
determining the types of traces necessarily for the
understanding of the behaviour of information
systems. Due to the nature of today’s information
systems, we anticipate that the following views will
be in use:
Call View: A trace will be viewed as a call tree
exhibiting the call relationship between the
system components.
Process View: This view will show a trace as a
set of processes interacting by exchanging
messages.
Object View: This view will focus on object
creation and deletion. It is particularly interesting
for maintenance tasks that revolve around defect
detection, performance analysis, etc.
Component View: This view will allow software
engineers to understand the interaction among
the system components. This is important for
information systems since they tend to be
component-based.
Data View: This view will focus on the way
particular items in databases are updated.
In addition, the tool is expected to have
supporting views. The main ones that we envision to
be useful are: The Source Code View, which will be
used to map trace elements to the source code, and
the Statistics View, that will display statistical
information to orient the user during the exploration
of traces. The tool will allow the traditional
browsing capabilities as well as the simplification
algorithms discussed earlier. We anticipate building
our tool as an Eclipse plug-in, so most of the tool
infrastructure will be provided.
There are a number of key research challenges
associated with this design. First, we need to have an
internal model to represent the information
displayed. This model must be scalable to handle
lengthy traces and must have a sufficient power of
expression to characterise the data generated from
information systems. The second issue is related to
the user interface widget that represents the traces.
The problem is that most user interface elements for
displaying large amounts of information build a
complete representation of the display in memory,
and then make sections of it visible as the user
scrolls through the information. This is further
complicated by the fact that when the user applies
the compression algorithms, or simply changes some
parameters, the entire display will need to be re-
created; despite the fact that only a tiny fraction will
be visible. In the context of this research, we will
investigate a new type of browsing widget that will
generate the display for only that part of the trace
that can currently be viewed. Furthermore, we will
investigate the best way to represent traces in the
user interface. We anticipate usability challenges
related to the ability of each view to convey
massive amounts of data. Research into software
visualization and usability engineering will need to
be carried out.
4 RELATED WORK
Existing trace simplification techniques can be
grouped into four categories. The first focuses on
grouping similar sequences of events invoked in a
trace as instances of the same pattern (Jerding 1997,
De Pauw 1998, Systä 2000, Richner 2002). Patterns
are not easily exploitable unless generalized. A set
of pattern matching criteria have been proposed by
many authors, e.g. (Jerding 1997, De Pauw 1998).
However, the use of these criteria has raised many
research issues. One of the main objectives of the
proposed research is to address these issues. The
second category encompasses techniques that
operate by limiting the amount of trace data gathered
(Systä 2002). These techniques assume that software
engineers have some knowledge of the system under
study. This assumption is not valid in practice.
Sampling techniques, which are representative of the
third category, suggest that only a sample of the
trace is needed for comprehension, eliminating the
need to generate the entire trace (Chan 2003).
Sampling is still at its early research stages. The
main issue lies in determining appropriate sampling
parameters. Finally, the last techniques suggest that
trace simplification could be performed by
clustering various trace components and only
visualizing the interaction among these clusters
(Walker 1998).
Another alternative to trace-simplification
consists of generating summaries from large traces.
Software engineers can use these high-level views to
look at the big picture (i.e. main content) first and
then delve into the detailed if desired. Amyot et al.
suggest tagging the source code at particular places
in order to generate a trace that can later be
represented using a Use Case Map (Amyot 2002).
Systä proposes a semi-automatic technique in which
state diagrams can be synthesized using various
UML sequence diagrams, extracted from traces
(Systä 2002). Her approach combines static and
dynamic analysis techniques. Wilde et al. propose a
simple method for the recovery of design threads
from inter-process systems by identifying, using
UNDERSTANDING THE DYNAMICS OF INFORMATION SYSTEMS
501
dynamic analysis, the implementation components
relevant to each thread (Wilde 1997).
We have previously published a survey of the
many existing trace visualization tools; this included
descriptions of their advantages and limitations
(Hamou-Lhadj 2004). These tools support features
ranging from simple trace exploration techniques to
more sophisticated types of analysis (e.g. querying
of trace model, etc). Traces have been represented in
various ways depending on their type. Traces of
routine (method) calls are often visualized using tree
structures (De Pauw 1998), UML sequence diagrams
(Jerding 1997, Systä 2000, Richner 2002), and Use
Case Maps (Amyot 2002). Traces of inter-process
messages are usually represented using crossing
lines among the processes (De Pauw 2002). Tools
that support traces of architectural components use
boxes and lines to represent the components and
their dynamic interactions (Walker 1998).
5 CONCLUSIONS
In this paper, we discussed ongoing research in the
area of reverse engineering of software systems with
an emphasis on information systems. Our approach
is based on analysing the content of large traces.
Traces, however, can be extremely large.
Therefore, there is a need to investigate ways to
reduce their size and complexity while keeping as
much of their essence as possible. We discussed
three research topics that we are currently
investigating. The first one focuses on shrinking
traces by grouping various sequences as instances of
the same pattern. The key challenges consist of
finding the proper matching criteria as a measure of
similarity. We proposed developing a set of
simplification algorithms based on these criteria.
Design recovery techniques, which are
representative of the second category, focus on
recovering behavioural design models from large
traces. These models can be used by software
engineers to explore a trace by looking at the main
content first and then dig into the details.
Finally, the last research topic focuses on
developing a visualization environment for
representing traces. The environment should support
multiple views so as to allow software engineers
browse the content of traces at different levels of
abstraction.
REFERENCES
Amyot, D., Mussbacher, G., and Mansurov, N., 2002.
Understanding Existing Software with Use Case Map
Scenarios. In SAM’02, 3rd SDL and MSC Workshop,
LNCS Vol.2599, Springer-Verlag.
Ball T, 1999. The Concept of Dynamic Analysis. In
ESEC’99, 7th European Software Engineering
Conference, Springer-Verlag.
Chan A., Holmes R., Murphy G. C., and Ying A. T. T.,
2003. Scaling an Object-Oriented System Execution
Visualizer through Sampling. In IWPC’03, 11th
International Workshop on Program Comprehension,
IEEE Computer Society.
De Pauw W., Lorenz D., Vlissides J., and Wegman M.,
1998. Execution Patterns in Object-Oriented
Visualization. In USENIX’98, 4th Conference on
Object-Oriented Technologies and Systems.
De Pauw W., Jensen E., Mitchell N., Sevitsky G., and
Vlissides J., Yang J., 2002. Visualizing the Execution
of Java Programs. In LNCS Vol. 2269, Springer-
Verlag.
Hamou-Lhadj, A., Braun, E., Amyot, D and Lethbridge,
T.C., 2005. Recovering Behavioral Design Models
from Execution Traces. In CSMR’05, 9th European
Conference on Software Maintenance and
Reengineering, IEEE Computer Society.
Hamou-Lhadj, A., and Lethbridge T., 2004. A Survey of
Trace Exploration Tools and Techniques. In
CASCON’04, 14th Annual IBM Centers for Advanced
Studies Conferences, IBM Press.
Hamou-Lhadj, A., and Lethbridge, T.C., 2004. Reasoning
About the Concept of Utilities. In ECOOP-PPPL’04,
1st International Workshop on Practical Problems of
Programming in the Large, LNCS Vol 3344,
Springer-Verlag.
Jerding D., Stasko J. and Ball T., 1997. Visualizing
Interactions in Program Executions. In ICSE’97, 19th
International Conference on Software Engineering,
ACM Press.
Richner T. and Ducasse S., 2002. Using Dynamic
Information for the Iterative Recovery of
Collaborations and Roles. In ICSM’02, 18th
International Conference on Software Maintenance,
IEEE Computer Society.
Systä T., 2000. Understanding the Behaviour of Java
Programs. In WCRE’00, 7th Working Conference on
Reverse Engineering, IEEE Computer Society.
Walker R. J., Murphy G. C., Freeman-Benson B.,
Swanson D., and Isaak J., 1998. Visualizing Dynamic
Software System Information through High-level
Models. In OOPSLA’98, 13th Object-Oriented
Programming Systems, Languages, and Applications,
ACM Press.
Wilde N., Casey C., Vandeville J., Trio G., Hotz D.,
1997. Reverse Engineering of Software Threads: A
Design Recovery Technique for Large Multi-Process.
The Journal of Systems and Software, Elsevier.
ICEIS 2007 - International Conference on Enterprise Information Systems
502