The Way of Adjusting Parameters of the Expert
System Shell McESE: New Approach
I. Bruha and F. Franek
McMaster University, Department of Computing & Software, Hamilton, Ont., Canada, L8S 4K1
Abstract. We have designed and developed a general knowledge representation
tool, an expert system shell called McESE (McMaster Expert System Environment);
it derives a set of production (decision) rules of a very general form.
Such a production set can be equivalently represented as a decision tree.
McESE exhibits several parameters, such as weights, thresholds, and
certainty propagation functions, that have to be adjusted (designed) for
a given problem, for instance, from a given set of training examples.
Traditional machine learning (ML) or data mining (DM) algorithms can be
used to induce these parameters.
In this methodological case study, we discuss an application of genetic
algorithms (GAs) to adjust (generate) the parameters of a given tree that can
then be used in the rule-based expert system shell McESE. The only requirement
is that a set of McESE decision rules (or, more precisely, the topology of a
decision tree) be given.
1 Introduction
When developing a decision-making system, we (as builders, knowledge engineers)
utilize an existing expert system shell, either developed by ourselves or by a
specialized expert-system tool builder.
We have designed and implemented a software tool (expert system shell) called
McESE (McMaster Expert System Environment) that yields (induces) a set of
production (decision) rules of a very general form; among others, one of its
advantages is a large set of routines for handling uncertainty [9], [10].
Note that a production (decision) set derived by McESE can be equivalently
represented as a decision tree. The main and only constraint of our new
approach, in this methodological case study, is that the logical structure
(topology) of the set of decision rules (the decision tree) is given. The
point of the study is that even when this logical structure is provided,
particularly in real-world tasks, the designer may still lack knowledge of
the other parameters of the tree. These parameters are usually adjustable
values (either discrete or numerical) of production rules or of other
knowledge representation formalisms such as frames.
Our McESE system exhibits these parameters: the weights and thresholds for
terms, and the selection of the certainty value propagation functions (CVPFs
for short) from a predefined set. In order to select the optimal (or at least
suboptimal) values/formulas for these parameters, we follow the traditional
approach of machine learning (ML) and data mining (DM): we adjust the above
parameters according to a set of training (representative) observations
(examples). For the inductive process itself, however, we use a different and
relatively new approach based on the paradigm of genetic algorithms (GAs).
A genetic algorithm involves a long process of evolution of a large population
of chromosomes (individuals, objects) before selecting values that have a
better chance of being globally optimal than those found by traditional
methods. The fundamental idea is simple: individuals (chromosomes) selected
according to a certain evaluation criterion are allowed to cross over so as
to produce one or more offspring. The offspring are slightly different from
their 'parents'. How well a genetic algorithm performs evidently depends on
how the term 'slightly different' and the evaluation criterion are defined.
We present in this paper a simulation of applying GAs to generate/adjust the
parameter values of a McESE decision tree. Section 2 briefly describes our rule-based
expert system shell McESE with emphasis on the form of rules. Section 3 then
surveys the structure of GAs. Afterwards, Section 4 introduces the methodology of
this project including a case study.
2 Methodology: Rule-based Expert System Shell McESE
McESE (McMaster Expert System Environment) [9], [10] is an interactive
environment for the design, creation, and execution of backward- as well as
forward-chaining rule-based expert systems. The main objectives of the project
are focused on two aspects: (i) to provide extensions of conventional
programming languages for dealing with McESE rule bases and inferencing over
them, and (ii) to provide versatile machinery for dealing with uncertainty.
As for the first aspect, the language extension is facilitated through a set
of functions with the native syntax that provide the full functionality
required (for instance, in the Common-Lisp extension these are Common-Lisp
functions callable in both the interactive and compiled modes; in the C
extension, these are C functions callable from any C program).
As for the latter, the versatility of the treatment of uncertainty is
facilitated by the design of McESE rules utilizing weights, threshold
directives, and CVPFs (Certainty Value Propagation Functions). A McESE rule
has the following syntax:

   R: T1 & T2 & ... & Tn =F=> T

Here T1, ..., Tn are the left-hand side terms of the rule R, T is the
right-hand side term of the rule R, and F symbolizes a formula for the CVPF.
A term has the form:
weight * predicate [op cvalue]
where weight is an explicit certainty value,
predicate is a predicate possibly with variables (it could be negated by ~ ), and
op cvalue is the threshold directive: op can be >, >=, <, or <=, and cvalue is
an explicit certainty value.
If the weight is omitted it is assumed to be 1 by default. The threshold directive can
also be omitted. The certainty values are reals in the range 0..1.
It should be emphasized that a value of a term depends on the current value of the
predicate for the particular instantiation of its variables; if the threshold directive is
used, the value becomes 0 (if the current value of the predicate does not satisfy the
directive), or 1 (if it does). The resulting value of the term is then the value of the
predicate modified by the threshold directive and multiplied by the weight.
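For illustration, the following C fragment sketches the evaluation of a single
term under the semantics just described; the type and function names are ours,
not part of the published McESE interface, so treat it as a minimal sketch.

    /* Sketch of McESE term evaluation: weight * predicate [op cvalue].
       All names are our own illustration. */
    typedef enum { OP_NONE, OP_GT, OP_GE, OP_LT, OP_LE } ThresholdOp;

    typedef struct {
        double      weight;   /* defaults to 1.0 when omitted          */
        ThresholdOp op;       /* OP_NONE when the directive is omitted */
        double      cvalue;   /* explicit certainty value in 0..1      */
    } Term;

    /* pred is the current certainty value of the (possibly negated)
       predicate for the given instantiation of its variables. */
    double eval_term(const Term *t, double pred)
    {
        double v = pred;
        switch (t->op) {             /* threshold directive yields 0 or 1 */
        case OP_GT: v = (pred >  t->cvalue) ? 1.0 : 0.0; break;
        case OP_GE: v = (pred >= t->cvalue) ? 1.0 : 0.0; break;
        case OP_LT: v = (pred <  t->cvalue) ? 1.0 : 0.0; break;
        case OP_LE: v = (pred <= t->cvalue) ? 1.0 : 0.0; break;
        default:    break;           /* no directive: keep pred as is    */
        }
        return t->weight * v;        /* modified value times the weight  */
    }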
When the backward-chaining mode is used in the McESE system, each rule that
has the predicate being evaluated as its right-hand side predicate is eligible to ‘fire’.
The firing of a McESE rule consists of instantiating the variables of the left-hand side
predicates by the instances of the variables of the right-hand side predicate, evaluating
all the left-hand side terms and assigning the new certainty value to the predicate of
the right-hand side term (for the given instantiation of variables). The value
is computed by the CVPF F based on the values of the terms T1, ..., Tn. In
simplified terms,
the certainty of the evaluation of the left-hand side terms determines the
certainty of the right-hand side predicate. There are several built-in CVPFs
the user can choose from (min, max, average, weighted average), or the user
can provide his/her own custom-made CVPFs. This approach allows one, for
instance, to create expert systems with fuzzy logic, Bayesian logic, and many
others [14].
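The four built-in CVPFs might then look as follows. This is our own C sketch:
the signatures, and in particular how the weighted average obtains the rule's
weights, are assumptions rather than McESE's actual interface.

    /* Sketch of the built-in CVPFs; t[0..n-1] are the values of the
       left-hand side terms T1, ..., Tn of one rule. */
    double cvpf_min(const double *t, int n) {
        double m = t[0];
        for (int i = 1; i < n; i++) if (t[i] < m) m = t[i];
        return m;
    }
    double cvpf_max(const double *t, int n) {
        double m = t[0];
        for (int i = 1; i < n; i++) if (t[i] > m) m = t[i];
        return m;
    }
    double cvpf_avg(const double *t, int n) {
        double s = 0.0;
        for (int i = 0; i < n; i++) s += t[i];
        return s / n;
    }
    /* weighted average; w[0..n-1] are assumed to be the term weights */
    double cvpf_wavg(const double *t, const double *w, int n) {
        double s = 0.0, sw = 0.0;
        for (int i = 0; i < n; i++) { s += w[i] * t[i]; sw += w[i]; }
        return (sw > 0.0) ? s / sw : 0.0;
    }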
It is widely known that any rule-based expert system must deal with the
problem of which of the eligible rules should be 'fired'; this is commonly
referred to as conflict resolution. The problem in McESE is slightly
different: each eligible rule is fired and provides an evaluation of the
right-hand side predicate, and we face the problem of which of these
evaluations should be used. McESE provides the user with three predefined
conflict resolution strategies: min (one of the rules leading to the minimal
certainty value is considered fired), max (one of the rules leading to the
maximal certainty value is considered fired), and rand (a randomly chosen rule
is considered fired). The user also has the option to supply his/her own
conflict resolution strategy.
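A minimal C sketch of the three predefined strategies, assuming the certainty
values produced by all the fired rules for one predicate have been collected
into an array (the names are ours):

    #include <stdlib.h>

    typedef enum { STRAT_MIN, STRAT_MAX, STRAT_RAND } Strategy;

    /* cv[0..n-1] are the certainty values produced by the n fired
       rules for one right-hand side predicate. */
    double resolve(const double *cv, int n, Strategy s)
    {
        double m = cv[0];
        switch (s) {
        case STRAT_MIN:
            for (int i = 1; i < n; i++) if (cv[i] < m) m = cv[i];
            return m;
        case STRAT_MAX:
            for (int i = 1; i < n; i++) if (cv[i] > m) m = cv[i];
            return m;
        default:                         /* STRAT_RAND */
            return cv[rand() % n];
        }
    }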
3 Survey of Genetic Algorithms
Data Mining (DM) consists of several procedures that process real-world data.
One of its components is the induction of concepts from databases; it consists
of searching a usually large space of possible concept descriptions. There
exist several paradigms for controlling this search, for instance various
statistical methods, logical/symbolic algorithms, neural nets, and the like.
However, such traditional algorithms tend to select immediate (usually only
locally) optimal values.
Genetic algorithms (GAs) represent a newer paradigm for searching the space of
concept descriptions. They comprise a long process of evolution of a large
population of individuals (objects, chromosomes) before selecting optimal
values, thus giving a 'chance' to weaker, worse objects. They exhibit two
important characteristics: the search is usually global, and it is parallel in
nature, since a GA processes not just a single individual but a large set
(population) of individuals.
Genetic algorithms emulate biological evolution and are generally used in
optimization processes. The optimization is performed by processing a
population of individuals (chromosomes). A designer of a GA has to provide an
evaluation function, called fitness, that evaluates any individual. The fitter
an individual, the greater its chance to participate in forming the new
generation. Given an initial population of individuals, a genetic algorithm
proceeds by choosing individuals to become parents and then replacing members
of the current population by the new individuals (offspring) that are modified
copies of their parents. This process of reproduction and population
replacement continues until a specified stop condition is satisfied or a
predefined amount of time is exhausted.
Genetic algorithms exploit several so-called genetic operators:
The selection operator chooses individuals (chromosomes) as parents depending
on their fitness; the fitter individuals have on average more children
(offspring) than the less fit ones. Selecting the fittest individuals tends
to improve the population.
The crossover operator creates offspring by combining the information
contained in the parents.
Mutation causes the offspring to differ from their parents by introducing a
localized change.
Other routines are optional, such as hill-climbing, which processes (modifies)
the objects in a narrow 'neighbourhood' of each new offspring.
Details of the theory of genetic algorithms may be found in several books, e.g. [11],
[13]. There are many papers and projects concerning genetic algorithms and their
incorporation into data mining [1], [8], [4], [5], [12], [15], [16].
We now briefly describe the performance of the genetic algorithm we have
designed and implemented for general purposes, including this project. The
foundation for our algorithms is the CN4 learning algorithm [2], a significant
extension of the well-known algorithm CN2 [6], [7]. For our new learning algorithm
(genetic learner) GA-CN4, we removed the original search section (so-called beam
search) from the inductive algorithm and replaced it by a domain-independent genetic
algorithm working with fixed-length chromosomes. The other portions of the
original CN4 remain unchanged; its parameters have been set to their default
values.
The learning starts with an initial population of individuals (chromosomes) and
lets them evolve by combining them by means of genetic operators introduced above.
More precisely, its high-level logic can be described as follows:
procedure GA
Initialize randomly a new population
Until stop condition is satisfied do
1. Select individuals by the tournament selection operator
2. Generate offspring by the two-point crossover operator
3. Perform the bit mutation
4. Check whether each new individual has a correct value (depending
on the type of the task); if not, the individual's fitness is set to 0
(i.e., to the worst value)
enddo
Select the fittest individual
If this individual is statistically significant then
return it
else return nil
The above algorithm mentions some particular operations used in our GA; their
detailed description can be found, e.g., in [3], [11], [13]. More specifically:
- the generational mode of population replacement is used;
- the fitness function is derived from the Laplacian evaluation formula.
The default parameter values in our genetic algorithm are: the population size
is 30, and the probability of mutation is Pmut = 0.002. The genetic algorithm
stops the search when the Laplacian criterion does not improve over 10000
generations.
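For illustration, the operators named in steps 1, 2, and 3 of the procedure
above could be realized on fixed-length bit-string chromosomes as in the
following C sketch; this is not the GA-CN4 source, and the chromosome length
and one-byte genes are arbitrary example choices.

    #include <stdlib.h>

    #define LEN  64          /* chromosome length (example value)  */
    #define PMUT 0.002       /* default bit mutation probability   */

    typedef struct { unsigned char gene[LEN]; double fitness; } Chrom;

    /* tournament selection: the fitter of two random individuals */
    const Chrom *tournament(const Chrom *pop, int size)
    {
        const Chrom *a = &pop[rand() % size], *b = &pop[rand() % size];
        return (a->fitness >= b->fitness) ? a : b;
    }

    /* two-point crossover: the child takes the segment between the
       two cut points from p2 and the rest from p1 */
    void crossover(const Chrom *p1, const Chrom *p2, Chrom *child)
    {
        int x = rand() % LEN, y = rand() % LEN;
        if (x > y) { int t = x; x = y; y = t; }
        for (int i = 0; i < LEN; i++)
            child->gene[i] = (i >= x && i < y) ? p2->gene[i] : p1->gene[i];
    }

    /* bit mutation: flip each bit independently with probability PMUT */
    void mutate(Chrom *c)
    {
        for (int i = 0; i < LEN; i++)
            if ((double)rand() / RAND_MAX < PMUT)
                c->gene[i] ^= 1;
    }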
Our GA also includes a check of statistical significance of the fittest
individual. It has to comply with the statistical characteristics of the
database used for training; the χ² statistic is used for this test of
conformity. If no fittest individual can be found, or it does not comply with
the χ² statistic, then nil is returned in order to stop further search; the
details can be found in [4].
4 A Case Study
As we have already stated, our methodological study utilizes GA-CN4 for
deriving some parameters of the rule-based expert system shell McESE. In
particular, an individual (chromosome) is formed by a fixed-length list
(array) of the following parameters of the McESE system:
- the weight of each term of a McESE rule,
- the threshold value cvalue of each term,
- the selection of the CVPF of each rule from a predefined set of CVPFs, and
- the conflict resolution strategy for the entire decision tree.
Note that our GA-CN4 is able to process numerical (continuous) attributes;
therefore, the above parameters weight and cvalue can be properly handled. As
for the CVPF, it is considered a discrete attribute with these singular values
(as mentioned above): min, max, average, and weighted average. Similarly, the
conflict resolution is treated as a discrete attribute.
Since the list of the above parameters is of fixed size, we can apply the
GA-CN4 algorithm, which can process fixed-length chromosomes (objects) only.
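Such a fixed-length individual can be pictured as a simple record. The
following C layout is our illustration only; the bounds MAX_RULES and
MAX_TERMS are hypothetical, not taken from GA-CN4.

    /* One individual for the McESE parameter-tuning task. */
    #define MAX_RULES 8
    #define MAX_TERMS 8

    typedef enum { CVPF_MIN, CVPF_MAX, CVPF_AVG, CVPF_WAVG } Cvpf;
    typedef enum { RES_MIN, RES_MAX, RES_RAND } Resolution;

    typedef struct {
        double     weight[MAX_RULES][MAX_TERMS]; /* weight of each term    */
        double     cvalue[MAX_RULES][MAX_TERMS]; /* threshold of each term */
        Cvpf       cvpf[MAX_RULES];              /* CVPF chosen per rule   */
        Resolution res;       /* conflict resolution for the whole tree    */
    } McESEChrom;

In the case study below, 24 of these slots are occupied by the weights and
thresholds, four by the CVPFs, and one by the conflict resolution, which
yields the 29 attributes counted at the end of this section.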
The entire process of deriving the right values of the above parameters
(weights, cvalues, CVPFs, conflict resolution) looks as follows:
1. A dataset of typical (representative) examples for a given task is selected
(usually by the knowledge engineer who is to solve the task).
2. The knowledge engineer (together with a domain expert) designs the set of
decision rules, i.e., the topology of the decision tree, without specifying
the values of the above parameters.
3. The genetic learner GA-CN4 induces the right values of the above parameters
by processing the training database.
To illustrate our new methodology of knowledge acquisition, we introduce the
following case study. We consider a very simple task of heating and mixing
three liquids L1, L2, and L3. The first two have to be controlled by their
flow and temperature; then they are mixed with L3. Thus, we can derive these
four rules:
   R1: w11 * F1 [>= c11] & w12 * T1 [>= c12] =f1=> H1
   R2: w21 * F2 [>= c21] & w22 * T2 [>= c22] =f2=> H2
   R3: w31 * H1 [>= c31] & w32 * F1 [>= c32] &
       w33 * H2 [>= c33] & w34 * F3 [>= c34] =f3=> A1
   R4: w41 * H2 [>= c41] & w42 * F2 [>= c42] &
       w43 * H1 [>= c43] & w44 * F3 [>= c44] =f4=> A2

Here Fi is the flow of Li, Ti its temperature, Hi the resulting mix, and Ai
the adjusted mix, for i = 1, 2 (or 3). The corresponding decision tree is
shown in Fig. 1.
We assume that the above topology of the decision tree (without the right
values of its parameters) was derived by the knowledge engineer. The unknown
parameters wij, cij, fi, together with the conflict resolution strategy, then
form a chromosome (individual) of length 29 attributes: 12 weights wij and 12
thresholds cij (two terms each in R1 and R2, four terms each in R3 and R4),
four CVPFs fi, and one conflict resolution strategy. The globally optimal
value of this chromosome is then induced by the genetic algorithm GA-CN4.
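To make this induction step concrete, the following hedged C sketch shows one
possible fitness evaluation over the training database, reusing the McESEChrom
record sketched earlier. The Example record and the classify_tree() helper are
hypothetical, and the Laplacian-style score is our simplification of the
criterion that GA-CN4 derives from the Laplacian evaluation formula.

    /* Hypothetical helpers: a training example with its class label,
       and a classifier that evaluates the decision tree with the
       parameter values decoded from the chromosome. */
    typedef struct { const double *attr; int label; } Example;
    int classify_tree(const McESEChrom *ch, const Example *e);

    /* Laplacian-style fitness: (correct + 1) / (examples + classes). */
    double fitness(const McESEChrom *ch, const Example *train,
                   int n, int classes)
    {
        int correct = 0;
        for (int i = 0; i < n; i++)
            if (classify_tree(ch, &train[i]) == train[i].label)
                correct++;
        return (correct + 1.0) / (n + classes);
    }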
5 Analysis
The aim of this project was to design a new methodology for inducing the
parameters of an expert system under the condition that the topology (the
decision tree) is known. We have selected a domain-independent genetic
algorithm that searches for globally optimal parameter values.
Our analysis of the methodology indicates that it is quite viable. The
traditional algorithms explore a small number of hypotheses at a time, whereas
the genetic algorithm carries out a parallel search within a robust
population. The only disadvantage our study found concerns time complexity:
our genetic learner is about 20 times slower than the traditional machine
learning algorithms. This disadvantage can be overcome by specialized hardware
with parallel processors; however, this can be accomplished only at
well-equipped research units.
In the near future, we are going to implement the entire system discussed here and
compare it with other inductive data mining tools. The McESE system will thus
comprise another tool for rule-based knowledge processing (besides neural nets
and Petri nets) [10].
The algorithm GA-CN4 is written in C and runs under both Unix and Windows.
The McESE system has been implemented both in C and Lisp.
References
1. Bala, J. et al.: Hybrid learning using genetic algorithms and decision trees for pattern
classification. Proc. IJCAI-95 (1995), 719-724
2. Bruha, I. and Kockova, S.: A support for decision making: Cost-sensitive learning system.
Artificial Intelligence in Medicine, 6 (1994), 67-82
3. Bruha, I., Kralik, P., Berka, P.: Genetic learner: Discretization and fuzzification of
numerical attributes. Intelligent Data Analysis J., 4 (2000), 445-460
4. Bruha, I.: Some enhancements in genetic learning: A case study on initial population. 14th
International Symposium on Methodologies for Intelligent Systems (ISMIS-2003), Japan
(2003), 539-543
5. Bruha, I.: Rule representation and initial population in genetic learning. Industrial
Simulation Conference: Complex System Modelling (ISC-2005), Berlin (2005), 37-41
6. Clark, P. and Boswell, R.: Rule induction with CN2: Some recent improvements. EWSL-
91, Porto, Springer-Verlag (1991), 151-163
7. Clark, P. and Niblett, T.: The CN2 induction algorithm. Machine Learning, 3 (1989), 261-
283
8. De Jong, K.A., Spears, W.M., Gordon, D.F.: Using genetic algorithms for concept learning.
Machine Learning, 13, Kluwer Academic Publ. (1993), 161-188
9. Franek, F.: McESE-FranzLISP: McMaster Expert System Extension of FranzLisp. In:
Computing and Information, North-Holland (1989)
10. Franek, F. and Bruha, I.: An environment for extending conventional programming
languages to build expert system applications. Proc. IASTED Conf. Expert Systems,
Zurich (1989)
11. Goldberg, D.E.: Genetic algorithms in search, optimization, and machine learning.
Addison-Wesley (1989)
12. Giordana, A. and Saitta, L.: REGAL: An integrated system for learning relations using
genetic algorithms. Proc. 2nd International Workshop Multistrategy Learning (1993), 234-
249
13. Holland, J.: Adaptation in natural and artificial systems. University Michigan Press, Ann
Arbor (1975)
14. Jaffer, Z.: Different treatments of uncertainty in McESE. MSc. Thesis, Dept Computer
Science & Systems, McMaster University (1990)
15. Janikow, C.Z.: A knowledge-intensive genetic algorithm for supervised learning. Machine
Learning, 13, Kluwer Academic Publ. (1993), 189-228
16. Turney, P.D.: Cost-sensitive classification: Empirical evaluation of a hybrid genetic
decision tree induction algorithm. J. Artificial Intelligence Research, 2 (1995), 369-409
Fig. 1. The decision tree of our case study.