A METHOD FOR DISCOVERING THE RELEVANCE OF
EXTERNAL CONTEXT VARIABLES TO BUSINESS PROCESSES
Eduardo Costa Ramos, Flavia Maria Santoro and Fernanda Baião
NP2Tec, Department of Applied Informatics
Federal University of the State of Rio de Janeiro (UNIRIO), Rio de Janeiro, Brazil
Keywords: Business Process, External Context, Knowledge Management, Competitive Intelligence, KDD.
Abstract: Organizations are required to detect and respond efficiently to changes in their environment, which
depends on their ability to adapt their business processes. Taking internal and external environment variables
into account makes it possible to address questions such as: how a business process was executed the last time
the country experienced a similar economic scenario; whether that process execution brought positive results;
and which external environmental factors provoked changes in previous process executions.
These environmental variables are typically referred to in the literature as the context of the process. In this
paper, we propose a method to identify and prioritize external variables that impact the execution of specific
activities of a process. The proposed method applies competitive intelligence concepts and data mining
techniques, and was evaluated in a case study.
1 INTRODUCTION
Organizations are under pressure to quickly detect and
respond to changes in their environment, which may
involve social, political, economic or
technological issues. This fast adaptation depends on
their ability to use both internal and external
information about the environment and to adapt
to the changes and other contingencies imposed. Such
disruptions in the routine should be reflected in
business processes (Recker and Rosemann, 2006).
Knowledge Management and Competitive
Intelligence approaches can be used in this direction
(Jung et al., 2006).
Both Knowledge Management (KM) and
Competitive Intelligence (CI) focus on the
organization's strategic goals. While CI looks outward,
monitoring and internalizing information from the
external environment, KM encodes, shares and uses
knowledge generated and stored internally in the
organization. Taking internal and external
environment variables into account enables the
organization to address important questions such as:
how a business process was executed the last time the
country experienced a similar economic scenario;
whether that process execution brought positive
results; and which external
environmental factors prompted changes in
previous process executions. Those environmental
variables are typically referred to in the literature as the
context of the process.
Context is defined as any information that can be
used to characterize the situation of an entity (Dey,
2001). In a business process scenario, context is the
minimum set of variables containing all relevant
information that impacts the design and
implementation of a business process. Context
information can be associated with any process
element, such as activities, events, or actors.
Furthermore, its analysis should provide insights that
help to identify problems, learn from the past, and
make decisions.
However, manipulating all stored organizational
knowledge, as well as external environmental
information, requires the application of knowledge
discovery techniques to automatically handle
it and extract patterns from it. In this regard, Liebowitz
(2003) proposed a set of frameworks to help a
project manager conceptualize and
implement knowledge management initiatives,
and posed some important questions that need to be
addressed: (i) how can knowledge discovery techniques
be applied for mining knowledge bases?; (ii)
how is knowledge originating from outside a unit
evaluated for internal use?; (iii) does lack of a shared
context inhibit the adoption of knowledge
originating from outside a unit?; (iv) how much
context needs to be included in knowledge storing to
ensure effective interpretation and application?

Costa Ramos E., Maria Santoro F. and Baião F..
A METHOD FOR DISCOVERING THE RELEVANCE OF EXTERNAL CONTEXT VARIABLES TO BUSINESS PROCESSES.
DOI: 10.5220/0003668603990408
In Proceedings of the International Conference on Knowledge Management and Information Sharing (RDBPM-2011), pages 399-408
ISBN: 978-989-8425-81-2
Copyright © 2011 SCITEPRESS (Science and Technology Publications, Lda.)

Although there are a few proposals that deal with
context associated with business processes (Nunes et al.,
2009; Rosemann et al., 2008; Saidani and
Nurcan, 2007), defining the relevance of external
information for the execution of a process in an
organization is still a challenge.
We propose a method to identify and prioritize
external variables that impact the execution of
specific activities of a process. The proposed method
applies Competitive Intelligence concepts and data
mining techniques (feature selection and decision
trees). We have evaluated the method in a case
study, which showed how the discovered variables
influenced specific activities of the process.
This paper is structured as follows: Section 2
defines context and KM concepts, and presents
related work. Section 3 details the proposed method,
which was applied to a case study explained in
Section 4. Section 5 concludes this work and points
to promising directions for extending it.
2 RELATED WORK ON
CONTEXT-AWARE PROCESS
The concept of context has recently proven
relevant in the business process management area.
Identifying, documenting and analyzing contextual
issues might help to clarify how changes in the
environmental setting of an organization should lead
to adaptations in processes. The literature points to the
importance of considering contextual information
both in the design of business processes and
throughout the execution of process instances. As a result,
an important issue is identifying the contextual
elements that impact the process.
Saidani and Nurcan (2007) describe a taxonomy for
context, composed of the most
usual contextual information (location, time,
resource and organization), which aims at supporting
context elicitation. Nunes et al. (2009) also
presented a model for context to support knowledge
management within the scenario of a business
process. The model developed by these authors is an
ontology that establishes a representation for context
elements associated with process activities. Based on
this model, process instances and their context are
stored and can later be reused. The types of
context elements presented are: (i) information that
exists during the execution of an activity (time,
artifacts); (ii) information about individuals or
groups that perform an activity; and (iii) information that
describes the interaction between individuals within
the activity performed. Neither proposal
provides explicit methods for context elicitation or
considers the external environment context.
Rosemann et al. (2008) integrate context into
process modeling and define a meta-model
concerning the structure of a process, its goals, and
its context. They also describe a context framework
in which diverse context levels are depicted in layers,
and a procedure to use it: (i) identify process goals;
(ii) decompose the process; (iii) determine the relevance of
context; (iv) identify contextual elements; and (v) type
the context. Our research is directly related to the
detailing of step (iv) as an evidence-based task.
Another approach for eliciting context is
presented by Soffer et al. (2010), with the goal of
learning and gradually improving business processes
by considering three elements: process paths, context
and goals. Similar to our work, they argue that the
success of a process instance can be affected not
only by the actual path performed, but also by
environmental conditions that are not controlled by the
process. Their work is based on an experience base
that includes data on past process instances: actual path,
achieved outcome, and context information.
We propose that context identification be handled
at the activity level, thus enabling process
stakeholders to dynamically intervene in the result of a
specific activity by applying previously acquired
knowledge during the execution of a process. The
circumstances are defined according to the external
environment. External contingencies can be
considered as opportunities or constraints that
influence the structure and internal processes of
organizations, according to Competitive Intelligence
initiatives (Jung et al., 2006). CI
implementation cycles generally include steps to
identify the information that should be collected.
Based on Jung et al. (2006), Kimball
and Ross (2002), Cook and Cook (2000) and Herring
(1999), Ramos et al. (2010) described the CI process
cycle steps that support a Context-based KM Model.
The first step is to identify the process: key
business processes are chosen based on the goals and
strategy of the organization. Then, external variables
are identified, represented and associated
with the process model through a Bus Matrix (Kimball
and Ross, 2002). After that step, it is possible to start
collecting and maintaining this information through
appropriate sources (databases, sensors, etc.). All
information is stored in a repository called
Organizational Memory, and a number of techniques
(KDD, inferences) are applied in order to search for
KMIS 2011 - International Conference on Knowledge Management and Information Sharing
evidence of their impact on process instances. This
might result in scenarios and recommendations,
which might improve the process, either at the
instance or at the model level. The process manager
is able to make decisions based on those outcomes, which
could lead to process adaptations. Then, the
cycle starts over again.
The problem addressed in this paper is
specifically related to steps 2 and 3 of this cycle.
The next section describes a method to identify the
external context, that is, the kind of information that
generally cannot be captured in transactional
systems and must come from outside the organization.
3 A METHOD FOR
DISCOVERING EXTERNAL
CONTEXT
In order to capture and use context information, it is
first necessary to specify which context information
has to be handled by the organization (Nunes et al.,
2009). We propose a method to discover external
context variables (Figure 1) that may not be part of
the organizational memory elements, but can be very
relevant to the organization in achieving its process
goals. This method also identifies which specific
activities and process outcomes are impacted by the
external context variables. Once these variables are discovered, the
intelligence analyst may retrieve and analyze
them to define scenarios and
recommend actions for decision-makers. The
decision-makers evaluate previous decisions and
make new decisions that may result in improving,
creating or removing processes.
There are several methods related to the
definition of information needs, e.g., questionnaire,
interview and observation that are widely used in
different contexts (Vuori and Pirttimäki, 2005).
However, the most suitable methods for the
definition of information at the strategic level used
by competitive intelligence are Key Intelligence
Topics (KIT) (Herring, 1999) and Critical Success
Factors (CSF). The use of a systematized or formal
“management-needs identification process” is a
proven way to accomplish this task (Herring, 1999).
Key Intelligence Topics (KITs) support the specification,
definition and prioritization of information needs at
the strategic level of the organization. KITs are
items that must be constantly monitored to guarantee
business success. They should be further detailed in
the form of Key Intelligence Questions (KIQs),
which are items that specify the contents of each
KIT. For example, the KIT “Strategic Investment
Decisions” may consist of the following KIQs:
"What is the involvement of other investors in
competitors?" and "What are the critical investments
from competitors?" (Vuori and Pirttimäki, 2005).
The KITs are identified through interviews with
managers, asking open questions. They fall into
three categories: (i) strategic decisions and actions;
(ii) topics for early warning, considering threats and
issues on which decision makers do not want to be
surprised, and (iii) major players in the market, such
as customers, competitors, suppliers and partners
(Herring, 1999). The technique also proposes the
concept of surveillance areas, which are
macroeconomic variables that impact the business
sector, and that should be monitored.
The method steps are described as follows.
Step 1 – Identify Process Goal(s). Identify the goal
related to a given process and its appropriate
measures (Rosemann et al., 2008). Repeat this step to
identify other goals after concluding the last step.
Step 2 – Select KIT Category. Herring (1999) has
divided KITs into three categories: 1) Strategic
Decisions and Issues, 2) Early-warning KITs,
considering threats and issues on which decision
makers do not want to be surprised and 3) Key
player KITs (such as customers, competitors,
suppliers and partners).
Figure 1: Method for external context variables identification.
Step 3 – Select Surveillance Area. Steps 3 to 6 form
a top-down approach to defining the
external context variables. Top-level areas must be
considered to support the next step. A model
to categorize context information helps to
select those areas. The areas can be selected from
any framework or a combination of frameworks, such as the
Five Forces model (Porter, 1979), or SLEPT or
STEEP Analysis (The Times, 2010). In general, they
are: social, technological, economic, ecological, political,
legal and competitive, since all industries are
influenced by them. These forces are continually in a
state of change and should therefore be scanned. Most
research on context in business processes deals with
internal context, i.e. process attributes inherent to the
way the process is performed, to the organization of
activities and to internal rules. Few context categories
are proposed, such as location, time, and
organization environment. Our work focuses on the
events that occur externally to the process, or
ultimately to the organization where it runs, but that
somehow interfere with this process, causing
good or bad effects. There are not many proposals to
categorize this kind of context information.
Rosemann et al. (2008) propose that the external
layer of their model is composed of the following
types of context: suppliers, capital providers,
workforce, partners, customers, lobbies, states and
competitors. Repeat this step for each of the three
KIT categories.
Step 4 – Identify KIT. Key Intelligence Topics
(KITs) are identified by interviewing the key
decision-makers and asking them open-ended,
non-directive questions (Herring and Francis, 1999). An
interview protocol can be very useful to ensure the
consistency of results (Herring, 1999). Repeat this
step for each surveillance area selected.
Step 5 – Identify KIQ. Key Intelligence Questions
(KIQs) should be identified for each KIT. KIQs
represent the information needs listed in the KIT, i.e.
what the manager needs to know to be able to make
the decisions. It is possible to have the same KIQ for
more than one KIT. Repeat this step for each KIT
selected.
Step 6 – Identify External Context Variables.
Each KIQ may reference one or more external
variables. These are the external context variables
and are identified in this step. It is possible to have
the same variable for more than one KIQ. Repeat
this step for each KIQ identified in the previous step.
For each process goal, the result of all the executions
of steps 2 to 6 will be the final Intelligence Tree with
the following columns: Process Goal, KIT category,
Surveillance Area, KIT, KIQ and External Context
Variable.
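As an illustration, the Intelligence Tree can be held as a flat list of rows, one per External Context Variable, using the columns listed above. The following is only a minimal sketch in Python: the class name IntelligenceTreeRow, the helper distinct_variables and the sample rows are hypothetical and are not part of the method itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntelligenceTreeRow:
    """One row of the Intelligence Tree (columns follow the text above)."""
    process_goal: str
    kit_category: str
    surveillance_area: str
    kit: str
    kiq: str
    external_context_variable: str


def distinct_variables(rows):
    """Return the external context variables in first-seen order, without
    duplicates (the same variable may serve more than one KIQ)."""
    seen = []
    for row in rows:
        if row.external_context_variable not in seen:
            seen.append(row.external_context_variable)
    return seen


# Hypothetical rows, for illustration only.
rows = [
    IntelligenceTreeRow("Goal A", "Early-warning", "Economic", "KIT 1", "KIQ 1", "Inflation Rate"),
    IntelligenceTreeRow("Goal A", "Early-warning", "Economic", "KIT 1", "KIQ 2", "Unemployment Rate"),
    IntelligenceTreeRow("Goal A", "Key player", "Economic", "KIT 2", "KIQ 3", "Inflation Rate"),
]
candidates = distinct_variables(rows)
```

The de-duplicated list is what Step 7 would then try to collect history for.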
Step 7 – Collect Past Information of the External
Context. In this step, the history of the external
context is collected and stored in the organizational
memory.
Step 8 – Determine Relevance of the External
Context to the Process Outcomes and to the
Process Activities Outcomes. It is not feasible to
store all context information that could form part of
the Organizational Memory. This step therefore
helps to prioritize which context to capture and store,
by ranking the variables by relevance using data
mining. This step follows the KDD process of
Fayyad et al. (1996), which is interactive and iterative,
involving numerous steps with many decisions made
by the user. The term Knowledge Discovery in
Databases (KDD) is generally used to refer to the
overall process of discovering useful knowledge
from data, where data mining is a particular step in
this process (Fayyad et al., 1996).
Several data mining problem types or analysis
tasks are typically encountered during a data mining
project. Depending on the desired outcome, several
data analysis techniques with different goals may be
applied successively to achieve a desired result
(Jackson, 2002). Before applying the KDD process,
it is necessary to develop an understanding of the
application domain and the relevant prior knowledge,
and to identify the goal of the KDD process from
the customer’s viewpoint (Fayyad et al., 1996). Our
method uses KDD with the following goal:
predict the process goal and determine the relevance
of the external context to the process outcomes and
to the process activities outcomes in achieving the
process goal defined in step 1. The KDD process
steps (Fayyad et al., 1996) are:
Step 8.1 (Selection) - this step consists of creating a
target data set, or focusing on a subset of variables
or data samples, on which discovery is to be
performed. In this step, the history of the external
context is associated with the process activities
outcomes and with the process execution results for
the same period.
Step 8.2 (Pre-processing) - this step consists of
cleaning and pre-processing the target data in order to
obtain consistent data.
Step 8.3 (Transformation) - this step consists of
data reduction and projection: finding useful features
to represent the data depending on the goal of the
task. With dimensionality reduction or
transformation methods, the effective number of
variables under consideration can be reduced, or
invariant representations for the data can be found
(Fayyad et al., 1996).
Step 8.4 (Data Mining - DM) - this step consists of
searching for patterns of interest in a particular
representational form, depending on the DM
objective (usually, prediction). In the case of prediction,
several models can be created and compared to find the one
with the best accuracy for predicting the target attribute. The
chosen model must clearly
show the relevant variables that must be scanned and
the specific values that may trigger decisions.
Step 8.5 (Interpretation/Evaluation) - this step
consists of the interpretation and evaluation of the
mined patterns.
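To illustrate the kind of pattern search performed in Step 8.4, the sketch below finds the best binary split of a single numeric predictor using Gini impurity, a criterion commonly used by classification trees. This is an assumption made for illustration only: the method does not prescribe a split criterion, and the function names and toy data are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_threshold(values, labels):
    """Return (threshold, weighted_gini) of the best binary split
    `value <= threshold` for one numeric predictor."""
    pairs = sorted(zip(values, labels))
    distinct = sorted(set(values))
    best = (None, gini(labels))  # start from the unsplit impurity
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2  # midpoint between adjacent values
        left = [lab for val, lab in pairs if val <= threshold]
        right = [lab for val, lab in pairs if val > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical toy data: a duration in months and the final status.
durations = [2, 3, 4, 8, 10, 12]
statuses = ["inactive", "inactive", "inactive", "mature", "mature", "mature"]
threshold, impurity = best_threshold(durations, statuses)
```

A tree built this way exposes its thresholds directly, which is the property the method asks of the chosen model: the values that may trigger a decision can be read off the split conditions.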
4 A CASE STUDY USING DATA
FROM OPEN SOURCE
PROJECTS
An explanatory case study was conducted in order to
evaluate the proposed method. A case study was
used in this research because it does not require
control of behavioural events and because it focuses on
contemporary events (Yin, 2009). The research
question is: “how to determine the relevance of
variables of the external context to a business
process?”.
4.1 Source Forge Software
Development Process Model
We applied the approach in a scenario in the domain
of Open Source Software Development. Figure 2
presents a process of Source Forge software
development projects modeled with the Bizagi
Process Modeler (Bizagi, 2011) using BPMN 1.2
notation (OMG, 2010). In this software development
process, organizations may be interested in
whether new or existing projects will be
concluded with the production or mature status,
i.e., organizations must make decisions such as:
whether or not to authorize the start of a software development
project; what to do to maximize the chances of an
ongoing project being concluded with the production
or mature status; and when it is better to deactivate a
project than to continue with it.
The software development process of Source
Forge (SF) is not formally published by Source
Forge; thus, we made some assumptions in
creating the process model in Figure 2. For
example, we considered only the projects that started
in the Specify Requirements activity, although
other projects started in other
activities.
Each project can be classified into one of six
different levels, from the earliest stage of production
to fully developed software: planning, pre-alpha,
alpha, beta, production stable and mature (Comino et
al., 2007). The process in Figure 2 was based on
these statuses and on the literature (PMI, 2008; Madey,
2011). In the Authorize the start of the project
activity, the decision maker, who can be a project
manager for example, creates the project in SF; in
the Specify Requirements activity the requirements
are specified; in the Design and Code activity the
software engineers design and develop the software,
and perform the unit tests; in the Perform Alpha Test
activity and in the Perform Beta Test activity, the
software is tested; in the Deployment activity the
official software is published to the users with the
production or mature status, which is why we consider
just one status: “production/mature”; in the
Deactivate activity, the project is canceled
temporarily or permanently by the decision maker.
Figure 2: Source Forge Software Development Process Model.
4.2 The Data Set
The proposed method was applied to the Open
Source (OS) projects from the Source Forge projects
database (Madey, 2011). SourceForge (SF) thrives
on community collaboration to help create the
leading resource for open source software
development and distribution. With the tools it
provides, 2.7 million developers create software in
over 260,000 projects. SF connects more than 46
million consumers with these open source projects
and serves more than 2,000,000 downloads a day
(Madey, 2011). SourceForge.net is the largest
existing online platform providing OS developers
with useful tools to control and manage software
development. Project administrators register their
software project on SF and provide the required
information, which is then available online (Comino
et al., 2007).
The dataset we employed in our analysis consists
of 1,087 OS projects that were hosted on SF, that
had an English version, that started after
January 2005 at the “Specify Requirements”
activity, and that first reached one of the
following activities before January 2011: “Deploy”
or “Deactivate”. All the 1,087 projects are aligned
with the process of Figure 2. This dataset has 1
dependent variable and 10 predictors pertaining to
projects. These predictors consist of 1 process
outcome and 9 process activities outcomes.
For each project, the binary outcome (dependent)
variable “final status” is available and indicates
whether the project first reached the status of
“production/mature” (good projects) or “inactive”
(bad projects). This dataset contains 295 bad
projects and 792 good projects. This means that 27% of
the 1,087 projects reached the “final status”
inactive, and 73% of them production/mature. In
addition, this dataset also has 9 process activities
outcomes available for each project, describing the
total duration of the project in each process activity
and the percentage it represents of the project
duration. The project duration is the single process
outcome and represents the duration of the project
from the Specify Requirements activity to the first
month of one of the following activities: Deploy or
Deactivate. The duration is measured in
months.
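The two kinds of activity outcome described above (duration in an activity, and its share of the project duration) can be derived as in the following sketch. It assumes that the project duration is the sum of the months spent in the activities that precede Deploy or Deactivate; the function name, field names and sample project are hypothetical.

```python
def activity_outcomes(months_per_activity):
    """Given the months spent in each activity, return the project duration
    and, per activity, its duration and its share of the project duration."""
    project_duration = sum(months_per_activity.values())
    outcomes = {}
    for activity, months in months_per_activity.items():
        share = 100.0 * months / project_duration if project_duration else 0.0
        outcomes[activity] = {
            "duration_months": months,
            "percent_of_project": share,
        }
    return project_duration, outcomes


# Hypothetical project, for illustration only.
duration, outcomes = activity_outcomes({
    "Specify Requirements": 2,
    "Design and Code": 5,
    "Perform Alpha Test": 2,
    "Perform Beta Test": 1,
})
```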
In our work, we introduce new variables from the
external context and relate them to the process activities
and to the process execution results to support these
decisions.
4.3 Application of the Method
In this explanatory case study, we applied all 8
steps of the proposed method to define relevant
external variables that influenced the
conclusion of SF projects, using the dataset detailed
in section 4.2 and considering the software
development process defined in section 4.1. The
result of applying steps 1 to 6 of the proposed
method is a list of potentially relevant external
variables. The result of applying steps 7 and 8 is a list
showing just the relevant variables among the
external contexts, the activities outcomes and the
process outcomes, and a decision tree showing the
relation among these relevant variables.
Step 1 – The goal “Conclude the software
development in the Deploy activity” was considered
for the process of Figure 2. This goal is achieved
when the dependent variable “final status” is
production/mature.
Steps 2 to 6 – For the defined process goal, the result
of all the executions of steps 2 to 6 was a table
similar to Table 1. This table contains potentially
relevant external variables that can impact the
process goal.
Table 1: Part of the Final Intelligence Tree after all the executions of steps 2 to 6.
KIT category | Surveillance Area | KIT | KIQ | External Context Variable
Strategic decisions and actions | Economic | Economic recession | What are the predictions for IT investments of public and private organizations for the next years? | IT Investment Prediction
Strategic decisions and actions | Economic | Economic recession | What are the predictions for the unemployment rate for next years? | Unemployment Rate prediction; Unemployment Rate
Strategic decisions and actions | Economic | Economic recession | What are the predictions for the inflation rate for next years? | Inflation Rate prediction; Inflation Rate
Strategic decisions and actions / Early-warning | Political | IT goals of the Government | What are the Open Source Software patterns adopted by the Government? | Open Source Software patterns
Step 7 – In this step the focus is on collecting the
past information of the external variables defined
previously. As the projects could be developed by
people who live in different countries anywhere on
the planet, it was necessary to make a simplification,
assuming that the USA was the country of origin of
everyone involved in the 1,087 projects of the
dataset. The USA was chosen because it is one of
the most influential countries in the global economy,
as seen in the global economic crisis of
2008, which began in the USA. Another aspect to
consider in this step is that sometimes it is not
possible to collect the past information of all the
external variables because, for example, it may not
exist. In our research, we collected the history
of 2 of the external variables defined previously: the USA
unemployment rate and the USA inflation rate
(IndexMundi, 2010).
Step 8 – In this step we followed the KDD process
(Fayyad et al., 1996). We applied the Feature
Selection technique to show the relevance of the variables,
and we used the C&RT decision tree (Standard
Classification Trees with Deployment) to show
explicitly the rules relating the
relevant external contexts, the relevant process
outcomes and the relevant process activities
outcomes for predicting the dependent variable
“final status”. This was the KDD goal.
Below, we explain how the data mining technique
determined that Unemployment Rate was a relevant
external context variable to the defined process goal
and to one of its activity outcomes. We used the
STATISTICA Data Miner software (StatSoft, 2010),
which uses the CRISP-DM process (CRoss-Industry
Standard Process for Data Mining). According to
Azevedo and Santos (2008), CRISP-DM can be
viewed as an implementation of the KDD process of
Fayyad et al. (1996). The KDD process steps were applied as follows:
Steps 8.1 (Selection) and 8.2 (Pre-processing) -
These 2 steps were among the most
time-consuming, as Mack et al. (2005) also
experienced. The data requirements
and the data acquisition itself had
already been taken care of with the data dump from
SourceForge (SF). The output of step 8.1 is the
process log, i.e. the dataset detailed in section
4.2, and the output of step 8.2 is a new dataset with
the history of the collected external contexts (step
7) associated with the process activities outcomes and
process outcomes (step 8.1).
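The association produced by step 8.2 can be pictured as a join between the process log and the external context history. The sketch below is a minimal one under stated assumptions: it joins on the month in which a project ended, which is a hypothetical choice of reference period, and the field names and values are invented for illustration rather than taken from the SF data.

```python
def attach_external_context(projects, context_by_month):
    """Join each project record with the external context values observed
    in the month the project ended (hypothetical reference period)."""
    dataset = []
    for project in projects:
        context = context_by_month.get(project["end_month"])
        if context is None:
            continue  # no external history for that period: skip the record
        dataset.append({**project, **context})
    return dataset


# Hypothetical values, for illustration only.
context_by_month = {
    "2008-06": {"inflation_rate": 3.0, "unemployment_rate": 5.5},
    "2009-06": {"inflation_rate": 1.0, "unemployment_rate": 9.5},
}
projects = [
    {"project_id": 1, "end_month": "2008-06", "final_status": "production/mature"},
    {"project_id": 2, "end_month": "2009-06", "final_status": "inactive"},
    {"project_id": 3, "end_month": "2010-01", "final_status": "inactive"},
]
target_dataset = attach_external_context(projects, context_by_month)
```

Records without matching external history are dropped here; keeping them with missing values would be an equally reasonable pre-processing choice.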
Step 8.3 (Transformation) – In this case study, we
ran the Feature Selection of STATISTICA Data
Miner (StatSoft, 2010) to automatically find and
rank important predictor variables for predicting the
dependent variable “final status”, which discriminates
between good and bad projects, as shown in Figure
3. The Feature Selection (FS) technique is “the process
of reducing dimensionality by removing irrelevant
and redundant features” (Guyon & Elisseeff, 2003
apud Refaeilzadeh et al., 2007; Blum & Langley,
1997 apud Refaeilzadeh et al., 2007), reducing “the
complexity of the problem, transforming the data set
into a data set of lower dimensions” (Nisbet et al.,
2009). Figure 3 shows that among the 12 variables
of the dataset created in the last step, there are 5 that
have a p-value of less than 0.01, i.e., that stand out
as the most important predictor variables to
determine whether a project would be finalized in
the production/mature or in the inactive status.
Starting from the most relevant to the least
relevant, these 5 variables are: 1-Project duration; 2-
Specify requirements duration; 3-Inflation rate; 4-
Unemployment rate; 5-Perform Beta Test Duration.
Note that 2 of these relevant variables are process
activities outcomes; 1 is a process outcome; and the
third and fourth most relevant variables are
from the external context.
Figure 3: Best predictor variables for the categorical
dependent variable status_final, ordered top to bottom from
lowest p-value to highest (Stratified Random Sampling).
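Ranking predictors by p-value and keeping those below 0.01 can be sketched as follows. The statistic used internally by the Feature Selection tool is not described here, so this example substitutes Pearson's chi-square test on a 2x2 table (predictor split into low/high versus final status) as a stand-in; the function names and the counts are hypothetical.

```python
import math


def chi_square_p_value(table):
    """p-value of Pearson's chi-square test for a 2x2 contingency table
    [[a, b], [c, d]] (one degree of freedom, no continuity correction)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 1.0
    statistic = n * (a * d - b * c) ** 2 / denom
    # For one degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(statistic / 2))


def rank_predictors(tables, alpha=0.01):
    """Sort predictors by ascending p-value and keep those below alpha."""
    scored = sorted((chi_square_p_value(t), name) for name, t in tables.items())
    return [(name, p) for p, name in scored if p < alpha]


# Hypothetical counts: rows = predictor low/high, columns = inactive/mature.
tables = {
    "project_duration": [[40, 10], [10, 40]],
    "alpha_test_duration": [[25, 25], [24, 26]],
}
relevant = rank_predictors(tables)
```

With these invented counts only the first predictor passes the 0.01 cut-off, mirroring how the 12 candidate variables were reduced to 5.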
Step 8.4 (Data Mining) - Decision trees are
powerful tools for classification and prediction. The
C&RT decision tree (Standard Classification Trees
with Deployment) of Figure 4 was built using
STATISTICA Data Miner (StatSoft, 2010)
considering the relevant variables found in the
previous step. We used V-fold cross-validation
and a 30% sample of the dataset for testing to assess the
accuracy of the model. Based on the 1,087 projects
of the full dataset, we initially used a training data
sample to build the decision tree (training phase),
then a testing data sample to refine and evaluate the
decision tree (testing phase), and finally, we used
another dataset with different projects to re-evaluate
the accuracy of the decision tree (re-evaluation
phase).
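The separation into training and testing samples can be sketched as a stratified hold-out split that reserves 30% of each class for testing. This is a minimal illustration and covers only the hold-out part, not V-fold cross-validation; the function name and the seed are assumptions, and only the class counts come from the dataset description.

```python
import random


def stratified_split(records, label_key, test_fraction=0.3, seed=0):
    """Split records into (train, test), sampling test_fraction of each class."""
    rng = random.Random(seed)
    by_class = {}
    for record in records:
        by_class.setdefault(record[label_key], []).append(record)
    train, test = [], []
    for members in by_class.values():
        shuffled = members[:]
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * test_fraction)  # truncate toward zero
        test.extend(shuffled[:cut])
        train.extend(shuffled[cut:])
    return train, test


# Class counts from the dataset description; the records are placeholders.
records = ([{"final_status": "inactive"}] * 295
           + [{"final_status": "production/mature"}] * 792)
train, test = stratified_split(records, "final_status")
```

Stratifying keeps the proportion of good and bad projects roughly the same in both samples, which matters when one class is almost three times larger than the other.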
In the training phase the decision tree had an
error rate of 19.12%; in the testing phase, 21.53%;
and in the re-evaluation phase, 20%. The error rate
of 19.12% (training phase) means that the C&RT decision
tree predicts with an accuracy of
80.88% whether a project will be finalized in the
production/mature or in the inactive status. The
percent of correct predictions for the bad projects
(final status = inactive) is 77.44%; and for the good
projects (final status = production/mature) is
82.10%.
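As a rough consistency check, the overall accuracy can be recovered as the class-size-weighted mean of the two per-class rates. The sketch below assumes that the training sample has the same class proportions as the full dataset (295 inactive and 792 production/mature projects), which is an assumption and not something stated in the text; the function name is hypothetical.

```python
def overall_accuracy(per_class_accuracy, class_counts):
    """Overall accuracy as the class-size-weighted mean of per-class accuracy."""
    total = sum(class_counts.values())
    correct = sum(per_class_accuracy[c] * class_counts[c] for c in class_counts)
    return correct / total


# Per-class rates and class sizes as reported in the text; applying the
# full-dataset class sizes to the training sample is an assumption.
accuracy = overall_accuracy(
    {"inactive": 0.7744, "production/mature": 0.8210},
    {"inactive": 295, "production/mature": 792},
)
error_rate = 1.0 - accuracy
```

Under that assumption the weighted accuracy comes out at roughly 0.81 and the error rate at roughly 0.19, in line with the training-phase figures.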
Step 8.5 (Interpretation/Evaluation) - The
decision tree C&RT (Standard Classification Trees
with Deployment) of Figure 4 shows the relation
between the relevant external contexts, the relevant
process activity outcomes and the relevant process
outcomes. This decision tree shows that the process
outcome “Project duration” is related to the Perform
beta test activity through its outcome “Perform beta test
duration”, and that these outcomes are related to the
external context “Inflation rate”, as we can see in
nodes 1, 2, 4, 6, 8 and 11. Node 11 clearly shows the
relevance of the external variable to the Perform
beta test activity. It indicates that, when the
inflation rate is greater than 1.67 and less than or
equal to 2.705, projects that have Project duration <= 4.5
and Perform beta test duration <= 0.5 have a higher
probability of being deactivated,
i.e., of being concluded as inactive.
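The condition read from node 11 can be written as a simple predicate. This is a minimal sketch that encodes only the thresholds quoted in the text; the function and parameter names are hypothetical, and the units (inflation rate in percent, durations in months) are assumptions.

```python
def matches_node_11(inflation_rate, project_duration, beta_test_duration):
    """Return True when a project falls into the node-11 region of the tree.

    Thresholds come from the text; units are assumed to be percent for the
    inflation rate and months for both durations.
    """
    return (
        1.67 < inflation_rate <= 2.705
        and project_duration <= 4.5
        and beta_test_duration <= 0.5
    )

# A project inside the region, and one outside it because inflation is too high.
print(matches_node_11(2.0, 4.0, 0.5))
print(matches_node_11(3.0, 4.0, 0.5))
```

A True result only flags a higher probability of the inactive final status according to the tree; it is not a certain outcome.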
The project manager or the decision maker can
use the decision tree when, for example, deciding
whether to develop a new software project that will
last at most 0.5 month in the “Perform beta test”
activity. The decision maker can look up the estimated USA
inflation rate for the period when the project is supposed to be
concluded. If this rate is greater than 1.67 and less than
or equal to 2.705, there is a higher probability
that the project will be concluded as inactive; the
decision maker can then decide not to start the project or
take actions to maximize the chances of the
project being deployed and minimize the chances of
it becoming inactive. The same scenario can arise with
an ongoing project, which is why the relevant external
contexts must be monitored: a change in them may trigger a
change during process execution or before the
project starts.
Figure 4: Part of the decision tree C&RT (training phase) for the SF projects dataset considering the best predictor variables
to the dependent variable “final status”.
5 CONCLUSIONS
5.1 Analysis and Discussion
It is important to note that external variable
relevance is discovered based on the process log. As
with any data mining approach, the discovered
knowledge depends on the amount of detailed
information available in the log. This is a limitation
of the proposed method. Therefore, when our
approach finds that a specific external variable
is not very relevant, it does not mean that it is not
relevant at all; instead, it means that the process log
did not include enough evidence of the
relevance of this external variable to the historical
process executions, when compared to other variables.
It is therefore important to use a
process log with enough information to run our
method, and also to consider other methods and the
experience and intuition of the specialists and
decision makers when deciding which external
variables should be scanned. At a minimum, the scanned
set must include the relevant variables found by our
proposed method. Another limitation of the method
is that transforming some KIQs into external
variables may be very difficult, as may collecting
these variables.
In this explanatory case study, our goal was not
to find the most relevant external variables that exist,
but to confirm the relevance of the
variables identified by applying our proposed
method. This explains why we could accept some
limitations in this case study, such as interviewing
people who were not involved in any of the 1087
projects of this dataset and had no experience in OS.
Our method differs from existing approaches in
the literature (Rosemann et al., 2008; Soffer et al.,
2010) in two ways: it suggests new external context
variables that may not be part of the organizational
memory and that can be very relevant for the
organization to achieve its process goals; and it shows
which specific process activities are impacted by the
external context variables, affecting whether the
organization achieves the process goal.
5.2 Conclusion and Future Work
Successful organizations are those able to identify
and respond appropriately to changes in their internal
and external environments. The organizations'
decision makers need to make important decisions in
order to carry this out.
In this paper we described a method for
supporting the identification and prioritization of
external environment variables that impact process
execution. This method also shows which specific
process activities are impacted by these variables, affecting
whether the organization achieves its process goals. An
explanatory case study illustrated the application of
our method to a software development process using
real data from SourceForge.net projects. The
method is based on CI and data mining techniques
and provides the process manager with a fact-based
understanding of which external variables
most influenced previous
process executions, among the many variables that
could otherwise be taken into consideration unnecessarily. The
case study showed that changes in relevant variables
of the external context may prompt the
decision maker to respond quickly to these
changes by adapting the process specification or
creating other business rules to be followed by the
business process.
As future work we suggest applying our
proposed method to other scenarios, such
as oil & gas and risk management; applying it to larger
process log samples with more variables; and
interviewing decision makers from the organization
that owns the process log. We also suggest refining the model
evaluation of our method.
REFERENCES
Azevedo, A., Santos, M. F., 2008. KDD, Semma and
CRISP-DM: A Parallel Overview, European
Conference Data Mining-IADIS.
Comino, S., Manenti, F., & Parisi, M., 2007. From
planning to mature: On the success of open source
projects. Research Policy, 36(10), 1575-1586.
Retrieved from http://www.scopus.com.
Crerie, R., 2009. A method for discovering of business
rules by using mining. UNIRIO. Master degree.
BizAgi Process Modeler., Version 1.6.1.0, 2011. BPMN
Software. http://www.bizagi.com. May/2011.
Cook, M., Cook, C. 2000. Competitive Intelligence.
London: Kogan Page Limited.
Dey, A. K. 2001. Understanding and using context,
Personal and Ubiquitous Computing, 5(1), pp 4–7.
Fayyad, U. M., Piatetsky-Shapiro, G., Smyth, P.,
Uthurusamy, R. 1996. Advances in Knowledge
Discovery and Data Mining. AAAI/MIT Press.
Herring, J. P. 1999. Key Intelligence Topics: A Process to
Identify and Define Intelligence Needs. Competitive
Intelligence Review, Vol. 10, No. 2.
Herring, J. P., Francis, D. B. 1999. “Key Intelligence
Topics: A Window on the Corporate Competitive
Psyche”, Competitive Intelligence Review 10(4).
IndexMundi. USA Unemployment and Inflation rate.
Available at http://www.indexmundi.com. April/2011.
Jackson, J., 2002. Data Mining: a Conceptual Overview,
Comm. Association for Information Systems 8(19).
Available at: http://aisel.aisnet.org/cais/vol8/iss1/19.
Jung, J., Choi, I., Song, M. 2006. An integrated
architecture for knowledge management systems and
business process management systems. Computers in
Industry 58, pp 21–34.
Kimball, R., Ross, M. 2002. The Data Warehouse Toolkit.
New York, Wiley Computer Publishing.
Liebowitz, J., 2003. A set of frameworks to aid the
project manager in conceptualizing and implementing
knowledge management initiatives. Sciencedirect.
Mack, D., Chawla, N. V., Madey, G., 2005. Activity
Mining in Open Source Software, In NAACSOS 2005.
Nisbet, R., Elder, J., Miner, G. 2009. Handbook of
statistical analysis And Data Mining Applications.
California, Elsevier Inc.
Nunes, V. T., Santoro F.M., Borges R. B. 2009. A
Context-based Model for Knowledge Management
embodied in Work Processes, Information Sciences
179, pp 2538-2554.
OMG-Object Management Group/Business Process
Management Initiative. BPMN Specification Releases:
BPMN 1.2. http://www.bpmn.org. October/2010.
PMI. 2008. A guide to the project management body of
knowledge (PMBOK® Guide) – (Fourth ed.).
Newtown Square, PA: Project Management Institute.
Porter, Michael E., 1979. How competitive forces shape
strategy, Harvard business Review, March/April 1979.
Refaeilzadeh, P., Tang, L., Liu, H., 2007. On Comparison
of Feature Selection Algorithms, Association for the
Advancement of Artificial Intelligence (www.aaai.org).
Ramos, E. C., Santoro, F. M.. 2010. A Model to Support
Knowledge Management based on External Context.
In: Workshop of Theses and Dissertations– Brazilian
Symposium on Information Systems, Marabá, Brazil.
Recker, J. C., Rosemann, M. 2006. Context-aware Process
Design: Exploring the Extrinsic Drivers for Process
Flexibility. In: The 18th International Conference on
Advanced Information Systems Engineering.
Proceedings of Workshops and Doctoral Consortium.
Rosemann, M., Recker, J., Flender, C. 2008.
"Contextualization of Business Processes,"
International Journal of Business Process Integration
and Management, vol. 3, pp. 47-60.
Saidani, O., Nurcan, S. 2007. Towards Context Aware
Business Process Modelling, Workshop on Business
Process Modelling, Development, and Support
(BPMDS), Trondheim, Norway.
Soffer P., Ghattas J., Peleg M. 2010. A Goal-Based
Approach for Learning in Business Processes, In:
Nurcan et al (eds), Intentional Perspectives on
Information Systems Engineering, Springer.
StatSoft Inc. 2010. STATISTICA Data Miner.
http://www.StatSoft.com. May/2011.
Greg Madey, ed., The SourceForge Research Data
Archive (SRDA). University of Notre Dame.
Available at http://srda.cse.nd.edu. July/2011.
The Times, SLEPT analysis. 100 Edition. 2010.
www.thetimes100.co.uk. Last accessed Apr/2010.
Vuori, V., Pirttimäki, V. 2005. Identifying of Information
Needs in Seasonal Management, Frontiers of E-business
Research, pp. 588-602.
Yin, R. K., 2009. Case Study Research: Design and
Methods. Fourth Ed. SAGE Publications. California.
KMIS 2011 - International Conference on Knowledge Management and Information Sharing