and the decoder of the DCAE will be extracted and 
used as the final high-level features of the system.  
The DCAE will help to encode the geometrical 
details of the cells contained in the original images. 
The discriminative power carried by the 
extracted features allows us to feed them as the inputs 
of a shallow nonlinear classifier, which can then 
separate the different cell classes. The proposed 
method was tested on the SNP HEp-2 Cell dataset 
(Wiliem et al.) and the results show that the proposed 
features outperform the conventional and popular 
handcrafted features by a large margin and perform at 
least as well as the state-of-the-art supervised deep 
learning based methods.  
2 PROPOSED METHODOLOGY 
Auto-encoders (Hinton et al.) are unsupervised 
learning methods that are used for the purpose of 
feature extraction and dimensionality reduction of 
data. A neural network based auto-encoder consists of 
an encoder and a decoder. The encoder takes an input 
x of dimension d and maps it to a hidden 
representation y of dimension r, using a 
deterministic mapping function f such that: 

y = f(Wx + b)  (1)

where the parameters W and b are the weights and 
biases associated with the encoder. The decoder then 
takes the output y of the encoder and uses the same 
mapping function f in order to provide a 
reconstruction z that should be as close as possible to 
the original input signal x. Using equation (1), the output 
of the decoder is given by: 

z = f(W'y + b')  (2)
 
where the parameters W' and b' are the weights and 
biases associated with the decoder layer. Finally, the 
network must learn the parameters W, W', b and b' 
so that z is as close as possible or, ideally, equal to x. In 
other words, the network learns to minimize the difference 
between the encoder's input x and the decoder's 
output.  
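Equations (1) and (2) can be sketched numerically as follows; the sigmoid choice for f, the toy dimensions, and the random initialization are illustrative assumptions, not the paper's actual settings:

```python
import numpy as np

def f(a):
    # sigmoid nonlinearity: one common choice for the mapping function f
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(0)
d, r = 8, 3                               # toy input and hidden dimensions
W = rng.standard_normal((r, d)) * 0.1     # encoder weights
b = np.zeros(r)                           # encoder biases
Wp = rng.standard_normal((d, r)) * 0.1    # decoder weights W'
bp = np.zeros(d)                          # decoder biases b'

x = rng.standard_normal(d)                # toy input signal
y = f(W @ x + b)                          # equation (1): hidden representation
z = f(Wp @ y + bp)                        # equation (2): reconstruction
loss = np.mean((x - z) ** 2)              # reconstruction error to be minimized
```

Training would then adjust W, W', b and b' (e.g. by gradient descent) to drive this reconstruction error down.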
This encoding-decoding process can also be carried 
out with convolutional neural networks, using 
what we call the deep convolutional autoencoder 
(DCAE). Unlike conventional neural networks, 
in which the output size can be set freely, 
convolutional neural networks are 
characterized by a down-sampling process, 
accomplished by the pooling layers 
incorporated in their architecture. This sub-
sampling progressively discards the 
input's spatial information as we go deeper into 
the network. 
To tackle this problem, we can use a DCAE instead 
of a conventional convolutional neural network. In the 
DCAE, after the down-sampling process 
accomplished by the encoder, the decoder up-
samples the representation until the original size 
is recovered. This is done through backwards-
convolution operations, often called "deconvolutions" 
or transposed convolutions. 
The final solution of the network can be written in the 
form: 

(W*, W'*, b*, b'*) = argmin over (W, W', b, b') of L(x, z)  (3)

 
where z denotes the decoder's output and x is the 
original image. The loss function L in equation (3) 
measures the difference between x and z. So, the 
solution of equation (3) is the set of parameter 
values that minimizes the difference between the 
input x and the reconstruction z. 
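A minimal numerical sketch of this down-sampling/up-sampling pair, assuming average pooling on the encoder side and a stride-2, 2x2 transposed convolution on the decoder side (a toy single-channel example, not the actual DCAE architecture):

```python
import numpy as np

def pool2x2(x):
    # 2x2 average pooling with stride 2: the encoder's down-sampling step
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def transposed_conv2x2(x, k):
    # stride-2 "backwards" (transposed) convolution with a 2x2 kernel:
    # each input pixel is scattered into a 2x2 block of the output,
    # doubling the spatial size -- the decoder's up-sampling step
    h, w = x.shape
    out = np.zeros((2 * h, 2 * w))
    for i in range(h):
        for j in range(w):
            out[2 * i:2 * i + 2, 2 * j:2 * j + 2] += x[i, j] * k
    return out

img = np.arange(16.0).reshape(4, 4)       # toy 4x4 "image"
code = pool2x2(img)                       # encoder output: 2x2
kernel = np.ones((2, 2))                  # stand-in for a learnable kernel
recon = transposed_conv2x2(code, kernel)  # decoder output: back to 4x4
```

The point of the sketch is the shapes: the encoder halves the spatial resolution and the transposed convolution restores it, which is what lets the DCAE reconstruct the input at its original size.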
In our experiments, the feature vectors extracted 
from the DCAE contain 4096 elements. The second 
part of the method consists of giving this feature 
vector to a shallow artificial neural network (ANN). 
Finally, in order to predict the cell type, a supervised 
learning process is conducted using the features 
extracted from the DCAE as the inputs and a two-layer 
ANN as the classifier. 
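The classification stage could be sketched as follows; the hidden-layer width, the ReLU and softmax choices, and the random initialization are illustrative assumptions, since the text does not fix them here:

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_hidden, n_classes = 4096, 128, 5   # 4096-dim DCAE features,
                                                 # hidden width is an assumption

W1 = rng.standard_normal((n_hidden, n_features)) * 0.01
b1 = np.zeros(n_hidden)
W2 = rng.standard_normal((n_classes, n_hidden)) * 0.01
b2 = np.zeros(n_classes)

def classify(feat):
    # shallow two-layer ANN: one hidden ReLU layer, then a softmax
    # over the cell types
    h = np.maximum(0.0, W1 @ feat + b1)
    logits = W2 @ h + b2
    e = np.exp(logits - logits.max())    # stable softmax
    return e / e.sum()

feat = rng.standard_normal(n_features)   # stand-in for a DCAE feature vector
probs = classify(feat)                   # class probabilities for one cell
```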
3 RESULTS AND DISCUSSION 
There are 1,884 cellular images in the dataset, all of 
them extracted from 40 different specimen 
images. Different specimens were used for 
constructing the training and testing image sets, so 
that the two sets never contain images from the same 
specimen. Of the 40 specimens, 20 were used for the 
training sets and the remaining 20 for the testing sets. In total 
there are 905 and 979 cell images for the training and 
testing sets, respectively. Each set (training and 
testing) contains five-fold validation splits of 
randomly selected images. In each set, the different 
splits are used for cross validating the different 
models, each split containing approximately 450 
images. The SNPHEp-2 dataset was 
presented by Wiliem et al. (2016). Figure 1 shows 
example images of the five different cell types, 
randomly selected from the dataset. 
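The specimen-disjoint split described above can be sketched as follows; the specimen IDs, per-specimen image counts, and random seeds are toy values, not the real SNPHEp-2 metadata:

```python
import random

# toy specimen -> image mapping; the real IDs and counts come from
# the SNPHEp-2 metadata (905 training and 979 testing images in total)
images_by_specimen = {s: [f"spec{s}_img{i}" for i in range(45)]
                      for s in range(40)}

specimens = list(images_by_specimen)
random.Random(0).shuffle(specimens)
train_specs, test_specs = specimens[:20], specimens[20:]  # disjoint specimens

train_imgs = [im for s in train_specs for im in images_by_specimen[s]]
test_imgs = [im for s in test_specs for im in images_by_specimen[s]]

# five validation splits of randomly selected training images,
# used to cross-validate the different models
random.Random(1).shuffle(train_imgs)
splits = [train_imgs[k::5] for k in range(5)]
```

Because the train/test partition is made at the specimen level, no image of a given specimen can appear on both sides of the split.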
As previously mentioned, the feature 
vectors extracted from the DCAE contain 4096 
elements. So, our network will have 4096 neurons in