
distance will be described. Section 4 then introduces the criterion used for the comparison between the neural and the statistical classifiers. Simulation results are presented and analyzed in Section 5. In the last section, we apply this comparative study to a real pattern recognition problem: we test the stability and performance of the different classifiers on handwritten digit recognition by classifying the corresponding Fourier Descriptors. These features form a set of parameters that are invariant under similarity transformations and closed-curve reparameterizations, and the set has good properties such as completeness and stability.
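As a concrete illustration, the following minimal sketch (assuming the digit boundary is available as complex samples z = x + iy; the function name and coefficient count are hypothetical) shows the classical construction of similarity-invariant Fourier Descriptors, which may differ in detail from the exact descriptor set used here:

import numpy as np

def fourier_descriptors(z, n_coeffs=16):
    # z: complex samples of a closed contour, z[k] = x[k] + 1j*y[k]
    F = np.fft.fft(z)
    F[0] = 0.0                        # drop the DC term: translation invariance
    F = F / np.abs(F[1])              # normalize by |F[1]|: scale invariance
    return np.abs(F[1:n_coeffs + 1])  # magnitudes: rotation and starting-point invariance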
2 NEURAL APPROACHES 
The most widely used and studied category of networks is the mixed NNs, which combine feature-extractor networks with classifier networks: the first layers carry out feature extraction, and the last layers classify the extracted features. An interesting example is the Multi-Layer Perceptron.
2.1  Multi-Layer Perceptron: MLP 
Based on the results of (Steven, 1991), an MLP with one hidden layer is generally sufficient for most problems, including classification. Thus, all networks used in this study have a single hidden layer. The number of neurons in the hidden layer can only be determined by experimentation, as no general rule is available; the numbers of nodes in the input and output layers, in contrast, are set to match the number of input and target parameters of the given process, respectively. Designing the optimal architecture for a given application is therefore far from easy.
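To make these sizing rules concrete, the following minimal sketch (hypothetical sizes, not the exact networks of this study) sets up the weight matrices of a one-hidden-layer MLP whose input and output widths match the feature and class counts of the task:

import numpy as np

n_in, n_hidden, n_out = 16, 10, 10   # e.g. 16 descriptors, 10 digit classes; n_hidden chosen by trial
rng = np.random.default_rng(0)

W1 = rng.normal(scale=0.1, size=(n_in, n_hidden))   # input -> hidden weights
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.1, size=(n_hidden, n_out))  # hidden -> output weights
b2 = np.zeros(n_out)

def forward(x):
    h = np.tanh(x @ W1 + b1)     # hidden activations
    return np.tanh(h @ W2 + b2)  # network outputs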
In order to reduce the difference between the ANN outputs and the known target values, the training algorithm estimates the weight matrices so that an overall error measure is minimized. The MLPs considered here are trained with the back-propagation algorithm.
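Continuing the sketch above, one back-propagation step on a single sample could look as follows (squared-error measure, tanh activations, targets assumed in [-1, 1]; the learning rate eta is an assumed hyper-parameter):

def train_step(x, target, eta=0.01):
    global W1, b1, W2, b2             # update the weights defined above
    h = np.tanh(x @ W1 + b1)
    y = np.tanh(h @ W2 + b2)
    err = y - target                  # difference between outputs and targets
    dy = err * (1 - y**2)             # gradient through the output tanh
    dh = (dy @ W2.T) * (1 - h**2)     # error back-propagated to the hidden layer
    W2 -= eta * np.outer(h, dy); b2 -= eta * dy
    W1 -= eta * np.outer(x, dh); b1 -= eta * dh
    return 0.5 * np.sum(err**2)       # overall error measure being minimized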
2.2  Criticisms of Neural Networks
Despite the effectiveness and significant progress of ANNs in several applications, especially classification, they present several limitations. First, the MLP outputs are treated as approximations of posterior probabilities, yet no proof of the quality of this approximation has been given to date. Second, NN architectures are complex enough that designing the optimal model for a given application is far from easy. Unlike simple linear classifiers, which may underfit the data, the architectural complexity of NNs tends to overfit the data and makes the model unstable. Breiman proved, in (Breiman, 1996), the instability of ANN classification results: small changes in the training set can introduce a large variance in the predictions. A good model should therefore find an equilibrium between under-fitting and over-fitting.
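This instability can be observed directly. The sketch below (synthetic data; scikit-learn's MLPClassifier stands in for the networks of this study) retrains the same MLP on bootstrap replicates of a training set and measures how often the replicates disagree on fixed test points:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier
from sklearn.utils import resample

X, y = make_classification(n_samples=300, n_features=16, n_informative=8,
                           n_classes=3, random_state=0)
X_test = X[:50]

preds = []
for i in range(10):
    Xb, yb = resample(X, y, random_state=i)   # small perturbation of the training set
    net = MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000,
                        random_state=0).fit(Xb, yb)
    preds.append(net.predict(X_test))

preds = np.array(preds)
# fraction of test points on which the bootstrap replicates disagree
disagreement = np.mean(preds.max(axis=0) != preds.min(axis=0))
print(f"fraction of test points with unstable predictions: {disagreement:.2f}")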
Owing to this instability, neural classifiers produce a black-box model that delivers only crisp outputs and hence cannot be interpreted mathematically as statistical approaches can. We therefore recall in the next section some statistical methods, namely the basic linear discriminant analysis and the proposed Patrick-Fischer distance estimator.
3 STATISTICAL APPROACHES 
Traditional statistical classification methods are based on the Bayesian decision rule, which is the ideal classification technique in terms of the minimum probability of error. In the nonparametric context, however, applying the Bayes classifier requires estimating the conditional probability density functions, and it is well known that this task needs a large sample size in high dimensions. A dimension reduction is therefore required as a first step.
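For illustration, the nonparametric Bayes rule can be sketched as follows (made-up two-dimensional data; gaussian_kde is one possible density estimator, not necessarily the one proposed here): each class-conditional density is estimated by a kernel estimator, and a point is assigned to the class maximizing prior times estimated density.

import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
X0 = rng.normal(loc=0.0, size=(200, 2))   # class 0 training samples
X1 = rng.normal(loc=2.0, size=(150, 2))   # class 1 training samples

kdes = [gaussian_kde(X0.T), gaussian_kde(X1.T)]   # class-conditional density estimates
priors = np.array([200, 150]) / 350.0             # empirical class priors

def bayes_classify(x):
    # the posterior is proportional to prior * estimated conditional density
    scores = [p * kde(x.reshape(2, 1))[0] for p, kde in zip(priors, kdes)]
    return int(np.argmax(scores))

print(bayes_classify(np.array([1.8, 2.1])))   # likely class 1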
3.1  Linear Discriminant Analysis: LDA
Linear discriminant analysis is the best-known supervised linear dimension-reduction method; it is based on scatter matrices. In the reduced space, the between-class scatter is maximized while the within-class scatter is minimized. To that end, LDA searches for an orthogonal linear projection matrix W that maximizes the following so-called Fisher criterion (Fukunaga, 1990):
J(W) = \frac{\mathrm{trace}(W^{T} S_{b} W)}{\mathrm{trace}(W^{T} S_{w} W)}        (1)
where S_w is the within-class scatter matrix and S_b is the between-class scatter matrix. Their two well-known expressions are given by:
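These standard definitions also translate directly into code. The sketch below (hypothetical labeled data X, y) builds S_w and S_b and evaluates the criterion J of Eq. (1) for a candidate projection W:

import numpy as np

def fisher_criterion(X, y, W):
    mu = X.mean(axis=0)                           # global mean
    d = X.shape[1]
    Sw, Sb = np.zeros((d, d)), np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mu_c = Xc.mean(axis=0)
        Sw += (Xc - mu_c).T @ (Xc - mu_c)         # within-class scatter
        diff = (mu_c - mu).reshape(-1, 1)
        Sb += len(Xc) * (diff @ diff.T)           # between-class scatter
    return np.trace(W.T @ Sb @ W) / np.trace(W.T @ Sw @ W)

In practice, the maximizing W is obtained from the leading eigenvectors of S_w^{-1} S_b.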