SURVEY AND PROPOSAL OF A METHOD FOR BUSINESS
RULES IDENTIFICATION IN LEGACY SYSTEMS SOURCE
CODE AND EXECUTION LOGS
Wanderley Augusto Radaelli Junior, Gleison Samuel do Nascimento and Cirano Iochpe
Informatics Institute, Federal University of Rio Grande do Sul, 9500 Bento Gonçalves Av., Porto Alegre, Brazil
Keywords: Legacy systems modernization, Reverse engineering, Business process, Business rules, Source code
manipulation, Execution logs mining.
Abstract: Computer systems implement business processes from different organizations. Among the currently
operating computer systems, much of it is classified as legacy system. Typically, legacy systems are
complex applications that are still active, due to the high cost of modernization and a high degree of
criticality. In recent years, were published several works addressing the importance of legacy systems
modernization, emphasizing the extraction of the business process model implemented in these systems.
Within this context, a key step is to extract knowledge from source code and / or systems execution logs,
aiming to use this information in reverse engineering processes. In this work are presented and analyzed
methods based on source code manipulation and system’s execution logs mining, which can be used to
extract knowledge from legacy systems, prioritizing business rules identification. A comparison between the
two different approaches is presented, as well as their positive and negative characteristics. Our results
include a list of desired features and a proposal of a method for legacy systems reverse engineering and
business rules identification.
1 INTRODUCTION
Legacy systems are information systems which
accomplish useful tasks within an organization, but
were developed based on technologies which are no
longer in use. These systems include information
and procedures which are fundamental for the
operation of the organization. However, to keep a
legacy system running is a complex, error-prone and
costly task (Pressman, 2001). In particular its
program code can be obsolete, hard to understand,
and poorly documented.
To reduce these problems, many organizations
invest in rewriting their legacy systems based on
modern technologies (Pressman, 2001). In
(Nascimento, Iochpe, Thom, Reichert, 2009), we
propose a sequence of rewriting steps that should be
followed in order to restructure legacy code on both
Business Process Management (BPM) and Service
Oriented Architecture (SOA). In short, this method
encompasses two subsequent phases. The first one
consists of identifying and translating the dynamic
behavior of a legacy system into business processes
models. Source code is analyzed and its behavior is
translated to directed graphs representing business
processes models. In a second phase those business
processes models are implemented and can be
executed by using a BPMS (Business Process
Management System) (Weske, 2007).
To identify the business processes which are
implicitly implemented within a legacy system
constitutes a major challenge of any rewriting
method that relies on BPM and SOA technologies.
In order to discover business processes within legacy
code, in (Nascimento, Iochpe, Thom, Reichert,
2009) we present a technique that consists of
analyzing the legacy source code in order to identify
its dynamic behavior and to use this knowledge later
on to construct directed graphs that represent the
business processes that exist within it.
Business rule is a key concept to the proposed
legacy code analysis technique. A business rule is an
abstracting statement that describes either a
behavioral or an execution constraint of an
information system (Ross, 1997). A business rule
defines either an action or control point of the
execution flow of an information system.
207
Radaelli Junior W., do Nascimento G. and Iochpe C..
SURVEY AND PROPOSAL OF A METHOD FOR BUSINESS RULES IDENTIFICATION IN LEGACY SYSTEMS SOURCE CODE AND EXECUTION
LOGS.
DOI: 10.5220/0003483802070213
In Proceedings of the 13th International Conference on Enterprise Information Systems (ICEIS-2011), pages 207-213
ISBN: 978-989-8425-55-3
Copyright
c
2011 SCITEPRESS (Science and Technology Publications, Lda.)
Usually, business rules are identified and applied
to the logical specification of information systems.
By the implementation of an information system,
these statements are translated into source code
fragments. Therefore, it is correct to say that the at
least some parts of a legacy system code implement
business rules that define as well as produce its
dynamic behavior during system execution.
In a similar way, a business process model, too,
can be specified through business rules (Knolmayer,
Endl, Pfahrer, 2000). By the implementation of a
business process model, the business rules that must
be represented within it are mapped onto individual
process components or process fragments. Thus, we
propose to use business rules as invariants in the
mapping of parts of legacy code onto process model
fragments of equivalent behavior.
In this work, we present a set of business rule
types to describe the behavior of different parts of a
legacy code. This set was found sufficient in all case
studies we have carried out until now in the context
of the BPM/SOA based rewriting technique we are
investigating.
For every business rule type contained in the
specified subset and for every programming
language, a syntactical structure can be defined that
allows for the representation of rules of that type in
any source code written in that specific
programming language. Therefore, one can create a
set of rule type templates in any programming
language he/she selects. Each one of the rule
templates can then be used either manually or
automatically in order to identify rule instances
within a source code written in the respective
programming language.
An important step of the BPM/SOA based
rewriting technique we are investigating is the
identification of business rules that are represented
in the legacy source code by applying a set of rule
type templates as in a pattern matching procedure
onto the legacy source code.
In this context, in the literature, there are several
studies addressing methods that assist the task of
business rules identification in legacy systems. Most
of these methods are based on reverse engineering
processes (Nascimento, Iochpe, Thom, Reichert,
2009) (Kalsing, Nascimento, Iochpe, Thom,
Reichert, 2010a) (Erlikh, 2000).
There are two main approaches to reverse
engineer systems, aiming at further business rules
identification:
1. Methods that involve source code manipulation:
usually, these methods are based on static analysis of
the source code.
2. Methods that involve data mining: these methods
are based on the application of data mining
algorithms, which analyze execution logs of the
target system's.
This paper present and analyze methods which allow
source code manipulation and / or system execution
logs mining, in order to identify our business rules
set that make up the business process model of the
system being analyzed. To reduce the need of human
intervention and to identify business rules
automatically or semi-automatically constitute an
important contribution to the research area.
Thus, the main contributions of this paper are:
Definition of the business rules set used in the
rewriting method presented in (Nascimento, Iochpe,
Thom, Reichert, 2009);
An analysis about how the presented methods
can be utilized to identify the business rules set used
in the rewriting method presented in (Nascimento,
Iochpe, Thom, Reichert, 2009);
Presentation of a legacy systems reverse
engineering hybrid method of business rules
identification.
The remainder of this paper is organized as follows:
section 2 shows methods of legacy systems reverse
engineering used to identify business rules in source
code and execution logs. Section 3 presents the
business rules set that is used in the rewriting
method presented in (Nascimento, Iochpe, Thom,
Reichert, 2009). In Section 4 are listed some desired
features for legacy systems reverse engineering
methods. Section 5 contains a proposal of a legacy
systems reverse engineering hybrid method suitable
for the identification and extraction of the whole set
of business rules listed in section 3. Finally, the
conclusion outlines the main contributions of this
research work.
2 RELATED WORK
In this section are presented and analyzed legacy
systems reverse engineering methods, which can be
used in knowledge extraction processes,
emphasizing business rules identification.
2.1 Methods based on Source Code
Manipulation
In order to identify business rules in legacy systems,
methods based on source code manipulation, mainly,
use a combination of techniques (e.g. program
slicing), concepts (e.g. domain variables), and data
ICEIS 2011 - 13th International Conference on Enterprise Information Systems
208
structures (e,g. abstract syntax tree - AST).
In (Chiang, 2006) the author presents a method
of legacy systems reverse engineering, which uses
program slicing technique as base for source code
manipulation and business rules identification. For
better application within large systems, the method
presented, first classify the source code into three
different categories, which are: user interface layer,
data access layer and business rules layer. The
author argues that this division simplifies the source
code manipulation process. After the classification,
dependency graphs are created to guide the program
slicing process, which leads to business rules
identification with human aid.
In (Wang, Zhou, Chen, 2008), is presented a
method of legacy systems reverse engineering based
on domain variables identification. The method also
uses program slicing technique and dependency
graphs. The article presents a classification for
system variables, introducing the concept of pure
domain variables, arguing that this kind of variable
is generally directly related to business rules,
referencing external data sources such as files or
database connections. In the end of the process there
is a validation step, in which the identified business
rules candidates are presented for system experts.
Similarly, in (Wang, Sun, Yang, He, Maddineni,
2004), knowledge extraction and subsequent
identification of business rules is based on domain
variables management. The proposed method create
dependency graphs and identify domain variables
related to input/output operations. According to the
authors, the method was proposed to overcome a
limitation that exists in other methods, which make
the identification and manipulation of domain
variables manually. In large systems the proposed
method should be implemented module by module,
due to the large number of domain variables found.
A method for business rules identification and
extraction is also presented in (Putrycz, Kark, 2007).
The method consists of several steps, and the most
important are: creation of an AST representing the
system, extraction of information from the AST and
creation of a knowledge base (code blocks,
identifiers, conditional branches, loops, comments in
source code). The final step correspond to extracted
data validation, and is performed with the aid of
system specialists, using a prototype developed by
the authors, which evaluates the information
contained in the knowledge base created earlier.
A semi automatic method of legacy systems
reverse engineering, which uses a combination of
AST, dependency graphs and program slicing
technique is presented in (Paradauskas,
Laurikaitidis, 2006). The method consists of eight
steps, among which we can highlight the following:
generation of AST, generation of dependency
graphs, application of program slicing technique,
final validation through human intervention in the
end of process. The variant of program slicing
applied depends on the type of the variable being
analyzed: backward slicing to handle output
variables, and forward slicing to handle input
variables.
In (Huang, Tsai, Bhattacharya, Chen, Wang,
Sun, 1996) is presented an interactive method for
business rules identification within legacy systems
source code. In the first step, the source code is
parsed, resulting in an AST and a data dictionary. In
the sequence, a dependency graph is created based
on the AST. Using the DG and some heuristic rules
presented in the paper, the variables present in the
source code are classified. After this step, the most
important variables are selected and guide the
program slice technique application. The result is
then presented in a user interface prototype, where
the users can participate of the business rules
identification and extraction process.
2.2 Methods based on Mining
Execution Logs
In (Van Der Aalst, Reijers, Weijters, 2007) is
proposed a method of legacy systems reverse
engineering and modernization based on systems
execution logs mining. The text is based on ideas
present in ProM framework, designed by the
authors. The main characteristics of the proposed
method are: it is generic in the sense that it is not
geared to the characteristics of the system being
analyzed. Represents the information in ProM
framework syntax, which is not trivial, specially for
non-technical staff. More than one data mining
algorithm is used. The main contributions of the
paper are: authors present arguments and examples
which justify the combination of various data mining
algorithms, in order to obtain better results within
mining large systems execution logs. Authors claim
that interaction with system users is necessary to
resolve doubts and validate the business rules
extracted, because in some cases, the information in
the log are not self-explanatory, making sense only
to users with good experience in the system.
A method of reverse engineering system based
on data mining is presented in (Stroulia, El-Ramly,
Kong, Sorenson, Matichuk, 1999). The proposed
method is based on mining logs of the user
interaction with the system. The authors used the
SURVEY AND PROPOSAL OF A METHOD FOR BUSINESS RULES IDENTIFICATION IN LEGACY SYSTEMS
SOURCE CODE AND EXECUTION LOGS
209
naming traces of execution, rather than execution
logs. The text is focused on the ideas presented by
the project CelLEST, which is conducted by the
authors. The main steps of the proposed method are:
identify the business process model implemented by
the system using Genetic Miner algorithm. The
interaction patterns are derived from the sequence of
screens accessed during user navigation, based on
information such as title and code of each screen.
Similarly, in (El-Ramly, Stroulia, Sorenson,
2002) is presented another proposal of a method for
legacy system reverse engineering based on systems
execution logs mining. The difference between this
proposal and the proposal presented above is that
this one led to the discovery of use cases in the
system. Below are listed the main features of the
method: application of associative mining algorithm.
Data mining phase, aiming patterns discover. The
discovered patterns are used in the process of use
cases identification, which serve as basis for
mapping the various business rules embedded in
source code. Once a pattern is discovered, it is
incremented, with the aid of human experts.
3 BUSINESS RULES SET
In (Kalsing, Nascimento, Iochpe, Thom, Reichert,
2010a), we define a set of 8 business rules required
for mapping legacy systems to business process
models, as proposed by our rewriting method.
Now, in this work, for each business rule type,
we define a syntactic structure that represents a code
fragment which is necessary to characterize the
implementation of the business rule in the legacy
source code. We can use these structures as
templates to identify the business rules in legacy
source code.
For a better understanding of these rules, we
subdivided our business rule set into two subsets:
1) Business rule for work units: These rules are
atomic operations executed in legacy system. They
represent units of work (processing) that are
indivisible.
2) Business rule for control flows: These rules are
operations that control the flow of execution of the
legacy system. They represent conditional, loop and
others structures of control flow.
In this paper it will be studied methods which allow
identification of the following types of business
rules (see Table 1).
The syntactic structures presented in Table 1 can
be found in most imperative programming
languages.
Table 1: Business Rules Classification.
Business
Rule Type
Description
Business Rules for Work Units
Math
Calculation
Represents a math calculation. It can be a
simple math expression or a complex calculus
related to several statements of code.
Ex: Discount = (price*20)/100;
Function o
r
Procedure
Call
Represents an external function or procedure
call in source code. An external function or
procedure is usually a black box, where the
source code is not available.
Ex: ret = InvokeCreditAnalysis(customer);
Data
Persistence
Represents statements associated to data
manipulation. It can be a SQL statement or a
file I/O operation.
Ex: sql = “SELECT * FROM customer
WHERE name = “ + customer_name;
User
Interaction
Represents an input or output data form,
where the user interacts with the system.
Ex: printf("*Customer Name/Address: *");
scanf("%s", customer_name); …
Business Rules for Control Flows
Conditional
Decision
Represents a decision, i.e., a conditional
statement (e.g. if-then-else, switch, etc).
Ex: IF (approved && price>1000) { … }
Iteration Represents a repetition, i.e., a loop statement
(e.g. for, while, etc).
Ex: WHILE (items > 0) { … }
Exception
Handling
Represents special structures to handle
exceptions (e.g. try-catch).
Ex: try { … } catch (Exception e) { … }
Parallel Some programming languages can execute
commands in parallel through the structure
fork/join.
Ex: fork { … } : { … } join
4 DESIRED FEATURES FOR
BUSINESS RULES
IDENTIFICATION METHODS
In this section are presented desired features of
legacy systems reverse engineering methods
obtained in bibliography research.
1) Application Domain Independent: second
(Newcomb, Kotik, 1995), the method should be
generic enough to not be directed to specific
characteristics of the application domain where the
analyzed system is operating;
2) Programming Language Independent: it
would be interesting to have a method of legacy
systems reverse engineering that is not tied only to
systems implemented in a particular programming
language (Kuipers, Moonen, 2000).
3) Scalability: second (Almonaies, Cordy, Dean,
2009) it would be interesting to have a method of
systems reverse engineering which provide good
scalability when applied to the whole legacy system,
ICEIS 2011 - 13th International Conference on Enterprise Information Systems
210
or when applied to large modules. Therefore, it is
expected that the method is capable of identifying
the proper relationship between the various business
rules contained in the modules of the system and
also be able to manage successfully the question of
the growing volume of data to be analyzed.
4) Types of Business Rules Identified: the method
should be able to recognize at least the types of
business rules listed in section 3.
5) Reduce Human Intervention: second
(Paradauskas, Laurikaitidis, 2006) the search for an
automatic legacy systems reverse engineering
method is the main objective of a great number of
papers available in literature. It would be great to be
able to decrease the amount of human intervention
in the process of business rules identification, either
during the step of extracted data validation, either
during the stage of business rules annotation in
legacy systems source code, before mining the
execution logs.
5 PROPOSAL OF A SUITABLE
METHOD FOR BUSINESS
RULES IDENTIFICATION
A hybrid method will be utilized to identify the
whole set of business rules listed in section 3. That
is, our method of business rules identification will
use both source code manipulation and execution
logs mining techniques.
The application of a reverse engineering method
based on source code manipulation, using AST,
dependency graphs, domain variables management
and program slicing technique is suitable for the
identification of business rules for work units. The
application of techniques based on systems
execution logs mining is suitable for the
identification of business rules for control flows.
The implementation of our method consists of
two algorithms: the first algorithm use techniques of
source code manipulation to identify business rules
for work units; the second algorithm use systems
execution logs mining to identify business rules for
control flows.
The first algorithm identifies the business rules
for unit works and to annotate both the start and end
points of the business rules found in the source code
(Figure 1(a)). Annotations are source code lines that
write records into a log file during next executions
of the legacy code (Figure 1(b)).
Figure 1: Main Steps of our Method.
Below is presented the steps of the algorithm to
make the source code manipulation and to annotate
the business rules for work units:
1) AST Creation: an AST representing the whole
system is obtained after source code parsing
(Putrycz, Kark, 2007) (Paradauskas, Laurikaitidis,
2006);
2) Generation of Dependency Graphs: the
dependency graphs are obtained based on the AST.
These graphs are used to assist the systems source
code understanding task, representing the
relationships between the different variables and
instructions present in the source code (Wang, Sun,
Yang, He, Maddineni, 2004a) (Horwitz, Reps,
Binkley, 2003) (Ottenstein, Ellcey, 1992);
3) Domain Variables Identification: based on
variables classification (Chen, Tsai, Joiner,
Gandamaneni, Sun, 1994) and input/output variables
management (Wang, Zhou, Chen, 2008);
4) Application of Program Slicing Technique:
guided by the domain variables identified in the
previous step (Xu, Qian, Zhang, Wu, 2005). Second
(Harman, Hierons, 2001), static program slicing is
more suited for business rules identification in
source code, because it works representing all the
possible branches that the systems execution flow
can follow, unlike other program slicing approaches,
such as dynamic, conditional and amorphous slicing;
5) Extracted Data Validation: in the end of the
process, human intervention is needed, in order to
validate the extracted business rules. The validation
step is necessary because none of the methods
available in literature is able to automatically
identify the business rules ensuring an acceptable
level of accuracy.
After the business rules for work units
annotation, a set of meaningful execution scenarios
is selected according to the most frequently used
legacy system running parameters. In the sequence,
the system is recompiled and executed repeatedly
SURVEY AND PROPOSAL OF A METHOD FOR BUSINESS RULES IDENTIFICATION IN LEGACY SYSTEMS
SOURCE CODE AND EXECUTION LOGS
211
according to the selected running scenarios. Thus, a
log file will be created and extended with
information about business rules executions within
most probable running scenarios (Figure 1(b)).
In the second step, the system execution log file
is analyzed by the process mining algorithm. This
algorithm is called Incremental Miner and has been
developed by the research group. The Incremental
Miner Algorithm was presented in (Kalsing, Thom,
Iochpe, 2010b). The Incremental Miner Algorithm
analyses the log file and produces as its output one
or more probable partial orders of business rules
executions according with some predefined
threshold (Figure 1(c)).
Note in Figure 1(c) that the boxes Rule 01, Rule
02, Rule 03 and Rule 04 represent the execution of
the business rules for work units annotated by the
algorithm of source code manipulation. Note also
that there are two transitions out of the Rule 02.
These transitions represent a business rule for
control flow. Thus, the Incremental Miner algorithm
captures the behavior of legacy source code control
flow instructions, i.e., it identifies the business rules
for control flows.
The combination of the source code
manipulation and the systems execution logs mining
incremental algorithm is able to recognize the set of
business rules listed in section 3 and also provides
the desired features for legacy systems reverse
engineering methods listed in section 4.
6 SUMMARY AND OUTLOOK
In this paper were studied and analyzed methods of
legacy systems reverse engineering, which allow
knowledge extraction, emphasizing business rules
identification and extraction.
Throughout the text, we presented its main
characteristics and a comparison between the
different techniques, data structures and concepts
used by the cited methods.
The hybrid method, based on source code
manipulation, which uses AST, dependency graphs,
domain variables management, program slicing
technique and systems execution logs mining,
presented in section 5 is suitable to identify and
extract business rules from legacy systems. In the
end of the process, human intervention is needed to
validate the extracted data. The validation step is
necessary because none of the methods studied is
able to automatically identify the business rules
ensuring an acceptable level of accuracy.
It is on course the implementation of a prototype
which will be able to apply to legacy systems the
reverse engineering hybrid method presented in
section 5. In future work, using the prototype and the
knowledge obtained in this work, we intend to be
able to identify and extract the whole set of business
rules listed in section 3.
REFERENCES
Almonaies A., Cordy J., Dean T., Legacy System
Evolution towards Service-Oriented Architecture,
2009.
Chiang C. Extracting Business Rules from Legacy
Systems into Reusable Components, 2006.
El-Ramly M., Stroulia E., Sorenson P., Mining System-
User Interaction Traces for Use Case Models, 2002.
Erlikh L., Leveraging legacy system dollars for e-business,
IT Professional, p. 17-23, 2000.
Harman M., Hierons M., An overview of program slicing.
Software Focus p. 85-92, 2001.
Huang H., Tsai W., Bhattacharya S., Chen X., Wang Y.,
Sun J., Business Rule Extraction from Legacy Code.
In: 20th Compute software & Applications Conf. IEEE
Computer Society Press (1996).
Kalsing A. C., Nascimento G. S. do., Iochpe C., Thom L.
H., Reichert M. An Incremental Process Mining
Approach to Extract Knowledge from Legacy
Systems, In: 14th Int. IEEE EDOC Conf., IEEE
Computer Society Press: Vitória-Brazil, (2010).
Kalsing A. C., Thom L. H., C. Iochpe. An Incremental
Process Mining Algorithm. In: 12th Int. Conf. on
Enterprise Information Systems, (2010).
Knolmayer G., Endl R., Pfahrer M.. Modelling Processes
and Workflows by Business Rules, In: LNCS -
Business Process Management, Springer: London,
(2000).
Kuipers T., Moonen L., Types and Concept Analysis for
Legacy Systems, 2000.
Nascimento G. S., Iochpe C., Thom L. H., Reichert M. A
Method for Rewriting Legacy Systems using Business
Process Management Technology, In: 11th Int. Conf.
on Enterprise Information Systems, (2009).
Newcomb P., Kotik G. Reengineering Procedural Into
Object-Oriented Systems, 1995.
Paradauskas B., Laurikaitidis A., Business Knowledge
Extraction from Legacy Information Systems,
Information Technology and Control, Vol.35, No.3,
2006.
Pressman R. S. Software Engineering: A Practitioner´s
Approach, (McGraw-Hill, 2001).
Putrycz E., Kark A. W., Recovering Business Rules from
Legacy Source Code for System Modernization,
Lecture notes in computer science, p 107, 2007.
Ross R. G. The Business Rule Book: Classifying,
Defining and Modeling Rules, 2nd edition, Business
Rule Solutions, (1997).
ICEIS 2011 - 13th International Conference on Enterprise Information Systems
212
Stroulia E., El-Ramly M., Kong L., Sorenson P., Matichuk
B., Reverse Engineering Legacy Interfaces: An
Interaction-Driven Approach. In Proc. of the 6th
Working Conf. on Reverse Engineering, (1999).
Van der aalst W., Reijers H., Weijters A., Business
Process Mining: An Industrial Application, (2007).
Wang C., Zhou Y., Chen J. Extracting Prime Business
Rules from large legacy system, Int. Conf. on
Computer Science and Software Engineering, (2008).
Wang X., Sun J., Yang X., He Z., Maddineni S. Business
Rules Extraction from Large Legacy Systems, In: 8th
Euro. Conf. on Software Maintenance and
Reengineering, 2004.
Weske M. Business Process Management: Concepts,
Languages, Architectures, Springer; Berlin (2007).
SURVEY AND PROPOSAL OF A METHOD FOR BUSINESS RULES IDENTIFICATION IN LEGACY SYSTEMS
SOURCE CODE AND EXECUTION LOGS
213