AGMI - AN AGENT-MINING TOOL AND ITS APPLICATION TO
BRAZILIAN GOVERNMENT AUDITING
Carlos Vin
´
ıcius Sarmento Silva
Controladoria-Geral da Uni
˜
ao, SAS, Qd 01, Bl A, Edif
´
ıcio Darcy Ribeiro, Bras
´
ılia, DF, CEP 70.070-905, Brazil
C
´
elia Ghedini Ralha
Computer Science Department, University of Bras
´
ılia, POBox 4466, Bras
´
ılia, DF, CEP 70.904-970, Brazil
Keywords:
Agent-mining tool, Brazilian government auditing, AGMI, e-Government.
Abstract:
This paper presents research combining two originally separated areas increasingly interrelated: distributed
multi-agent systems and data mining. In our approach, we prove the interaction features in a bilateral and
complementary way, since we have defined an integrated architecture and developed a prototype, which has
been used in a government auditing study case. In Brazil, government auditing is performed by the Office
of the Comptroller General (CGU), where several approaches are being used to prevent and fight corruption.
However, some activities such as government purchasing fraud detection are limited by the difficulty in finding
effective ways to implement. Considering data mining perspective, we have used different model functions,
such as clusterization and link analysis with association rules. Our approach integrating multi-agent and data
mining techniques resulted in expressive discovered knowledge, which would help detection of cartels acting
in public bidding processes at CGU.
1 INTRODUCTION
The CGU is the agency of the Federal Govern-
ment in charge of assisting the president of Brazil
in matters which, within the executive branch,
are related to defending public assets and en-
hancing management transparency through inter-
nal control activities, public audits, corrective and
disciplinary measures, corruption prevention and
combat, and coordinating ombudsman’s activities
(http://www.cgu.gov.br/english/).
Nowadays, a large volume of information has
been produced and stored by the Brazilian govern-
ment information systems. Considering only 2009,
the Federal Accountability System (SIAFI) registered
one billion of financial transactions. All this data are
normally used to support the preparation and execu-
tion of government auditing. In this way, CGU has
driven efforts to apply technologies in order to pro-
mote transparency and prevent corruption. However,
the analysis of the available data to produce useful
knowledge to auditing activities is a hard task. Data
mining (DM) and knowledge-discovery in databases
(KDD) are playing an important role, especially when
integrated to other computational techniques to ana-
lyze and explore information.
In the past decade, intelligent agents/multi-agent
systems (MAS) and DM/KDD have emerged as
two increasingly interrelated research areas, opening
space to the agent-mining interaction and integration
(AMII) research field. This new field has driven ef-
forts from both sides to find benefits and comple-
mentarities to both communities (Cao, 2009; Ralha,
2009).
In the context of CGU and AMII research field,
this paper presents an agent-mining tool AGMI.
AGMI has been tested using the Brazilian Gov-
ernment Auditing data, in order to help to pre-
vent corruption through KDD. The Brazilian Fed-
eral Bidding database used is entitled ComprasNet
(http://www.comprasnet.gov.br/).
The rest of this paper is presented as follows:
in Section 2, we discuss the problem at the Brazil-
ian Government Auditing domain; in Section 3, we
present AGMI architecture and prototype; in Sec-
tion 4, we present the experimental results and the
discovered knowledge; and finally, in Section 5, we
conclude and suggest future work.
535
Vinícius Sarmento Silva C. and Ghedini Ralha C..
AGMI - AN AGENT-MINING TOOL AND ITS APPLICATION TO BRAZILIAN GOVERNMENT AUDITING.
DOI: 10.5220/0003333905350538
In Proceedings of the 7th International Conference on Web Information Systems and Technologies (WEBIST-2011), pages 535-538
ISBN: 978-989-8425-51-5
Copyright
c
2011 SCITEPRESS (Science and Technology Publications, Lda.)
2 THE PROBLEM
In general, the identification of cartels is a difficult
task since it requires analysis of several public bid-
ding processes, which usually exceeds the scope of
only one government department. Cartels can op-
erate in various government departments, cities and
even states of the Federation. Furthermore, the anal-
ysis of data from databases using Structured Query
Language (SQL) queries is also impractical because
of the exponential solution space. Thus, the problem
consists in creating an efficient way to identify groups
of companies which might be suspected of practicing
cartels in public bidding processes.
In (Silva and Ralha, 2010), a solution using As-
sociation Rules to help solving the problem of car-
tels detection in public bidding processes is proposed.
The proposal is due to the fact that this technique is
useful to find strong relationships among attributes.
Thus, it is possible to apply this technique creating
a dataset, so each attribute is a boolean value, which
indicates the participation or not of the company in
each bidding process of the database. The dataset for
association rule technique must be constructed as the
matrix A consisting of m rows and n columns, where
m is the total of bidding processes from database and
n is the total of companies from database.
a
i, j
=
(
true if j has participated of i
f alse if j has not participated of i
1 i m; 1 j n;
where i is a bidding and j is a company
Thus, we expect to obtain rules with transi-
tive properties, like the following: company
A
=
true, company
B
= true company
C
= true.
However, tasks like dataset preparation, the exe-
cution of different DM algorithms and rule evaluation
are adequate to be integrated to the MAS approach,
considering either the distribution aspects of the au-
tomated tasks or the parallel execution of algorithms.
Thus, we have to propose a way to improve the time
execution of the tasks, since it took us weeks to per-
form manually the tasks of DM using Weka.
3 AGMI ARCHITECTURE AND
PROTOTYPE
DM has many characteristics, which make interac-
tion and integration to MAS possible by combining
KDD through the different DM techniques. In this
section, we present our architecture of cooperative
DM, which uses two frameworks: Java Agent De-
velopment Framework (JADE), a Java framework to
support the development of MAS (Bellifemine et al.,
2007); and Data Mining for Agents (DMA), a Weka-
based framework, developed to enable integration of
DM in a multi-agent environment.
Figure 1: AGMI Architecture Schema.
Figure 1 presents the structural schema of AGMI.
The agents run on JADE platform, and may be dis-
tributed among different physical hosts. The model
uses three different types of agents: Mining Agent,
Coordinator Agent and Evaluator Agent.
Mining Agent responsible for running specific
DM algorithms. Thus, we have multiple agents
performing the same or different services of DM,
whereby the DM services are classified by DM tech-
niques (e.g. Association Rule Service, Clustering Ser-
vice).
Coordinator Agent responsible for overall coor-
dination of the DM activities executed by the mining
agents. It prepares datasets for the agents and coordi-
nates the interactions allowing the cooperation among
different agents. It also has the ability to hire a mining
agent to perform a service required. When the service
required is provided by more than one agent, the co-
ordinator chooses the agent that has the best profile
to perform the task, based on memory and process-
ing capacities. Agent services can be found in JADEs
Yellow Pages a service provided by JADE to pub-
lish the services of registered agents (Directory Facil-
itator).
Evaluator Agent responsible for evaluating the
found knowledge before being presented to the user.
In general, DM algorithms work with evaluating crite-
ria (e.g. support and confidence in association rules).
However, it is possible to adopt other criteria to dis-
covered knowledge. Thus, we can adapt this agent
WEBIST 2011 - 7th International Conference on Web Information Systems and Technologies
536
to perform a specific function or just define the mini-
mal values of covering and accuracy of rules resulting
from mining algorithms.
4 EXPERIMENTAL RESULTS
In order to test AGMI, we used real data from Com-
prasNet the Brazilian Federal Bidding online sys-
tem. The used data is relative to all bidding processes
of a specific type of service contracted by Federal
Executive agencies, between years 2005 and 2008,
including all states of Brazil (26 states + the Fed-
eral District). The database includes 26,615 records,
2,701 bidding processes and 3,051 companies. Each
record in the dataset represents one bid of a company
in a specific bidding process.
4.1 First Experiment
We started our experiments with AGMI using three
agents: the coordinator, the evaluator and the rule as-
sociation agent. The Rule Association Agent used the
Apriori algorithm, available in Weka framework. The
algorithm has been adapted to DMA in order to be
used in a multi-agent environment. In previous tests,
before AGMI prototype was implemented (Silva and
Ralha, 2010), we found that a rule with high lower
bound of support might just imply the presence of big
companies in bidding processes (frequent itemsets).
Thus, setting a high lower bound of support in this al-
gorithm can suppress the appearance of several good
rules, with real features of cartels. On the other hand,
high lower bound of confidence ensures the selection
of good rules.
With the help of experts, in our first experiment
with AGMI, we set the lower bound of support to
get rules with 9 occurrences on the database, and the
lower bound of confidence was set in 90%. We have
also defined Equation 1 for evaluating the rules ob-
tained through the DM process. This function was im-
plemented in the Evaluator Agent for measuring the
quality of the rules, and means the probability of one
company of the suspicious group winning a bidding
process. The higher the Rule Quality (RQ) value, the
more suspicious the group is of cartel practicing.
RQ = 100.
V (C)
Sup. × Inst.
(1)
In Equation 1, Sup. is the rule support value; Inst.
is the total number of instances; C is the company
set from rule; B is the bidding processes where the
company group C has participated; and V (C) is the
number of victories in B of any company in C.
We ran our tests using two computers: Host A (In-
tel Core 2, 2.40 GHz, 2.00 GB RAM) and Host B
(Intel Pentium Dual, 1.86 GHz, 2.00 GB RAM). The
coordinator and evaluator agent were set in Host A,
and the association rule agent was set in Host B. In
this experiment, we found 128 rules, and the execu-
tion time was 29 minutes. The top 100 rules scored
an average of 16.56 in RQ evaluation. The average of
support, on the other hand, was 30.98.
The top 10 rules scored an average of 33.00 in RQ
evaluation, and the average of support was 26.90. The
best rule, according to Equation 1 had 46 points of RQ
and 26 of support. As noticed, the top ten rules have
a smaller average support than the top 100 rules. Val-
idating the results with auditing experts, we conclude
that higher limits of support do not guarantee better
rules.
4.2 Second Experiment
In the First Experiment, with only one mining agent
in the system (Rule Association), we couldn’t set the
lower bound support to less than 24. This happened
due to the lack of memory resources available in our
machines. As lower the bound of support is set, as
more memory is consumed by the rule association al-
gorithm. Furthermore, it’s quite possible to have rules
with support less than 24 with real characteristics of
a cartel. Thus, we needed to introduce a strategy to
divide our space to seek the groups more accurately.
For this, clustering is an appropriate DM technique to
do such activity.
In addition to the Rule Association Agent we
have used Clustering Agent, which implemented the
Expectation-Maximization (EM) algorithm to dis-
cover clusters considering the companies and the
Brazilian states. According to (Han and Kamber,
2005), in EM algorithm, each object is assigned to
each cluster according to a weight representing its
probability of membership. For the experiments with
the Clustering Agent we have included it in Host A.
With the addition of the Clustering Agent, we
found the regions of public biddings, and the com-
panies that participated of those biddings in each re-
gion. Seven clusters of states were found, and, for
each cluster region, the rule association agent ran his
technique searching associated companies. The found
clusters were: 1 - {AM, PA, AC, RO, AP, RR, MT,
MS, RJ, ES and MG}; 2 - {RS, SC, PR}; 3 - {BA,
SE}; 4 - {PE, PB, RN, CE, PI, MA} 5 - {TO, DF,
GO}; 6 - {SP}; 7 - {AL}.
We found in this experiment 6,150 rules. In the
top 100 rules, we found an average of 69.71 in RQ,
and 9.78 of support. In the top 10 rules, the RQ
AGMI - AN AGENT-MINING TOOL AND ITS APPLICATION TO BRAZILIAN GOVERNMENT AUDITING
537
average was 89.70, and the average of support was
9.40. The time spent in the execution was 75 minutes.
Based on this result, we can notice that the addition
of the Clustering Agent improved the results dramat-
ically. Considering the top 10 rules of both experi-
ments, the values for quality of rules has grown up
more than 150%.
If we compare the average of the Top 100 rules,
the Second Experiment presents an average more than
4 times greater than the First Experiment’s average.
With the addition of the Clustering Agent in our tests,
the search space was divided, and so it was possible
to reduce the lower bound of support generating more
and better rules. Thus, we improved a lot the quality
of rules produced by the system. This proves the po-
tential to integrate different DM techniques to SMA.
We also expected that all clusters were made
up by states with common borders. In general,
the companies do not act in states far from each
other. However, the results have shown one cluster
with large proportions and with some very discon-
nected states (Cluster 1). For more information about
the geographic position of the Brazilian states, visit
http://www.ibge.gov.br/estadosat/.
4.3 Discovered Knowledge
The experiments present lots of rules, specially the
second one (Section 4.2), due to the value of lower
bound of support. And several rules presented by the
Evaluator Agent after the execution showed lots of
company groups with evidence of cartel acting. The
Clustering Agent result also revealed the trends of
companies’ participation in public bidding in Brazil.
Following, we present some of the best rules as exam-
ple of discovered knowledge:
In 9 different public biddings in the same state,
and in just one government agency, one rule has
pointed the participation of two specific compa-
nies. In spite of the fact that both companies have
participated in all biddings, just one of the com-
panies has won all of the biddings. When we an-
alyzed the history of the loser company, we found
that it had only participated in those 9 exactly bid-
ding processes, evidencing a possible simulation
of competition to hide the cartel characteristics.
Probably the company was created just to sim-
ulate competition in the public biddings, where
competition is mandatory.
A group with cartel characteristics, made up by
4 companies, has acted in Region 2 in 10 differ-
ent public biddings, and has won 9 of these. Two
companies of that group have appeared in other
7 different groups, in the same region, discovered
by other rule association.
5 CONCLUSIONS AND FUTURE
WORK
Apart from being in charge of inspecting and detect-
ing frauds in the use of federal public funds, the CGU
is also responsible for developing mechanisms to pre-
vent corruption. Thus, in this article we presented the
Agent-Mining Tool AGMI, a MAS based tool to
support distributed DM activities which was used at
the Brazilian Government Audition domain.
In addition to enable the combination of distinct
DM techniques, in order to improve the KDD process,
AGMI also includes the DMA (Weka-based frame-
work) made up by a set of DM algorithms adapted to
multi-agent environment.
We tested AGMI as a proposed solution to the
problem of detecting cartels in public biddings. Two
experiments were conducted and the results proved
that AGMI is a potential solution for the problem.
Several association rules indicating evidences of car-
tel acting in public biddings were found. Besides, the
combination of DM techniques enabled an increase of
150% in the quality of the top ten association rules.
For future work, we will study the inclusion of
other DM techniques and the reduction of the over-
all processing time, through different data preparation
methods and its automation. We are also studying
other mechanisms and heuristics to use, in order to
improve our agents interaction protocol, and to give
more autonomy to coordinator and evaluator agents
during the KDD process.
REFERENCES
Bellifemine, F. L., Caire, G., and Greenwood, D. (2007).
Developing Multi-Agent Systems with JADE. Wiley.
Cao, L. (2009). Introduction to agent mining interaction
and integration. In Cao, L., editor, Data Mining and
Multi-agent Integration. Springer US.
Han, J. and Kamber, M. (2005). Data Mining: Concepts
and Techniques. Morgan Kaufmann Publishers Inc.,
San Francisco, CA, USA.
Ralha, C. G. (2009). Towards the integration of multiagent
applications and data mining. In Cao, L., editor, Data
Mining and Multi-agent Integration. Springer US.
Silva, C. V. S. and Ralha, C. G. (2010). Utilizac¸
˜
ao
de t
´
ecnicas de minerac¸
˜
ao de dados como aux
´
ılio
na detecc¸
˜
ao de cart
´
eis em licitac¸
˜
oes. In XXX
Congresso da Sociedade Brasileira de Computac¸
˜
ao
(SBC/WCGE), Porto Alegre, RS, Brazil.
WEBIST 2011 - 7th International Conference on Web Information Systems and Technologies
538