
 
namely true and false. The inner zone adjacent to
that frontier, specifically the zone near the edges and
vertices of the polyhedral surface, is the risk
decision zone or caution zone, and this is where
the frontier must be redefined. An accumulation in a
single vector of several variables whose values do
not exceed the hazard maximums but are close to
them, as is the case for state vectors in the
caution zone, may belong (in principle, at the
judgment of the expert) to a category other than the
one it would be assigned owing to its position with
respect to the polyhedral frontier (figure 1).
 
Figure 1: 2-dimensional depiction of natural frontier. 
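As an illustration of the caution zone described above, the following minimal Python sketch places a state vector relative to a box-shaped polyhedral frontier. The function name zone, the zone labels, the numeric values, and the rule that two or more variables lying within their safety margins indicate the neighbourhood of an edge or vertex are all assumptions made for illustration. The sketch only flags candidate vectors; deciding their class remains a matter for the expert.

```python
from typing import Sequence


def zone(state: Sequence[float],
         maxima: Sequence[float],
         margins: Sequence[float]) -> str:
    """Place a state vector relative to a box-shaped (polyhedral) frontier.

    Hypothetical helper: maxima are the per-variable hazard maximums and
    margins are the per-variable safety margins measured inward from them.
    """
    if not (len(state) == len(maxima) == len(margins)):
        raise ValueError("state, maxima and margins must have the same length")

    # Any variable above its hazard maximum puts the vector outside the frontier.
    if any(x > m for x, m in zip(state, maxima)):
        return "outside"

    # Count the variables that lie inside their safety margin (close to the maximum).
    near = sum(1 for x, m, d in zip(state, maxima, margins) if x >= m - d)

    # Assumption: two or more near-maximum variables means the vector is close
    # to an edge or vertex of the frontier, i.e. in the caution zone.
    return "caution" if near >= 2 else "inside"


if __name__ == "__main__":
    maxima, margins = [10.0, 10.0], [2.0, 2.0]   # illustrative values only
    print(zone([1.0, 1.0], maxima, margins))     # inside
    print(zone([9.0, 9.0], maxima, margins))     # caution
    print(zone([11.0, 1.0], maxima, margins))    # outside
```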
Having modelled the problem in this way, we
considered which method should be used to solve
it and decided to rule out conventional approaches
based on analytic mathematical models (i.e., a
formula to determine risk), mainly because of the
large degree of subjectivity that experts apply
in assessing risk.
Consequently, we decided to use one of the 
existing systems with the capacity for supervised 
inductive learning. The system should learn from 
state vectors that reflect past situations that have 
been classified by an expert according to the risk 
they entailed. The classification model provided by
the system would then induce a classification for state
vectors not necessarily included in the
learning process; that is, it would trace the
new frontier in the caution zone on the basis of the
expert's decisions on past state vectors.
An activity at a given instant of the maritime
work is identified with a state vector to which a
Boolean class variable, with possible values true
or false, is added. The new state vector is
n-dimensional, where n-1 is the number of
variables defined to assess the risk in that
activity and the n-th is the special class variable. An
example or case is a specific state vector.
The measurements from which examples are generated
are commonly made at one-hour intervals. Examples
used to train the system have a class variable
value that classifies each as true, a situation of high
risk, or false, when the risk is low or at least
acceptable. Classification of these examples will 
have been performed – or at least supervised – by an 
expert. Given a database whose entries are vectors
of this type, learning systems extract models that enable
subsequent classification of new cases. Models are
abstractions of the structural patterns that distinguish
vectors of one class from those of
another; that is, systems learn to
distinguish high-risk situations from low-risk ones
by using the knowledge accumulated in the learning
process and retained as a model.
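One possible in-memory representation of such examples is sketched below in Python. The class name Example, the helper split_by_class and the three variables in the sample values are hypothetical; the text does not fix which variables are measured for a given activity.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Example:
    """One state vector: n-1 risk variables plus the Boolean class variable."""
    variables: Tuple[float, ...]  # the n-1 variables measured at one instant
    high_risk: bool               # class variable: True = high risk, False = low or acceptable

    @property
    def dimension(self) -> int:
        # n counts the class variable as the n-th component.
        return len(self.variables) + 1


def split_by_class(examples: Sequence[Example]) -> Tuple[List[Example], List[Example]]:
    """Separate expert-classified examples into high-risk and low-risk groups."""
    high = [e for e in examples if e.high_risk]
    low = [e for e in examples if not e.high_risk]
    return high, low


if __name__ == "__main__":
    # Illustrative hourly records; the values and their meaning are made up.
    training = [
        Example((25.0, 1.8, 4.0), True),
        Example((5.0, 0.4, 10.0), False),
    ]
    high, low = split_by_class(training)
    print(training[0].dimension, len(high), len(low))
```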
The abundance of learning systems means that
multiple solutions or models are possible, usually
more than one per system, since each system offers
parameters whose settings lead it to
produce different solutions. An important task is therefore
to decide which learning system and which set of
parameters to use, in addition to studying the
suitability of the variables used and perhaps
reducing or increasing their number; in short,
a good job of data mining is needed (Witten et al.,
2005).
Following these considerations, discussions and 
the pertinent tests, we decided to pre-select two 
systems of supervised inductive learning for trials 
and a more thorough comparison in our problem: 
these were C4.5 (Quinlan, 1993) and Support Vector 
Machines (SVMs, hereinafter) (Cortés, Vapnik, 
1995), (Cristianini, Shawe-Taylor, 2004). 
Conceptually, these systems are quite different:
while the first is based on a heuristic approach, the
second rests on a complete mathematical theory
that explains its method. We now provide a brief
description of each.
3.1  The C4.5 System 
C4.5 is a long-established machine learning system that
nevertheless remains fully valid (Jaudet et al., 2005)
and needs no introduction. For this paper, its main
feature is that it expresses the knowledge learned in
an explicit form, by means of a decision tree or
classification rules; in both cases, these can be
compared with the experience of an expert in the
field, an aspect of the utmost interest to us. C4.5
works with both qualitative and quantitative
variables and is robust to noise.
C4.5 builds a decision tree incrementally; each
new level arises from a variable that is selected
for its importance in determining the class.
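The selection of a variable for its importance in determining the class can be illustrated with the gain ratio, the criterion with which C4.5 is usually described. The Python sketch below is a simplified illustration for qualitative variables only, not the C4.5 implementation; the function names and the toy data (the variables sea and shift) are assumptions.

```python
from collections import Counter
from math import log2
from typing import Dict, Hashable, List, Sequence


def entropy(items: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of the value distribution in items."""
    total = len(items)
    return -sum((c / total) * log2(c / total) for c in Counter(items).values())


def gain_ratio(values: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Information gain of splitting on values, normalised by the split information."""
    total = len(labels)
    groups: Dict[Hashable, List[Hashable]] = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Expected class entropy that remains after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    split_info = entropy(values)
    # A variable with a single value cannot split the examples.
    return gain / split_info if split_info > 0 else 0.0


def best_variable(columns: Dict[str, Sequence[Hashable]],
                  labels: Sequence[Hashable]) -> str:
    """Name of the variable with the highest gain ratio."""
    return max(columns, key=lambda name: gain_ratio(columns[name], labels))


if __name__ == "__main__":
    # Toy data: class True = high risk, False = low risk.
    labels = [True, True, False, False]
    columns = {
        "sea": ["rough", "rough", "calm", "calm"],
        "shift": ["day", "night", "day", "night"],
    }
    print(best_variable(columns, labels))  # sea
```

In this toy data the variable sea separates the two classes perfectly, so it would be chosen for the first level of the tree, whereas shift carries no information about the class.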
[Figure 1 labels: origin O; axes X and Y; X-safety margin; Y-safety margin; polyhedral frontier; natural frontier.]
ICEIS 2009 - International Conference on Enterprise Information Systems