LEUKOCYTES CLASSIFICATION USING BAYESIAN NETWORKS

Verónica Rodríguez-López and Raúl Cruz-Barbosa

Computer Science Institute, Universidad Tecnológica de la Mixteca, 69000, Huajuapan, Oaxaca, México

Keywords:

Bayesian networks, Classification, Leukocyte recognition.

Abstract:

In this paper, the use of Bayesian networks for the leukocyte classification problem is explored. The complexity of this problem is mainly due to the morphological diversity among cells of the same type and the similar features found in different types of cells, which complicate the classification task. Since Bayesian networks have proven useful both as classifiers and as a powerful tool for knowledge representation and inference under uncertainty, this graphical model is applied to the leukocyte classification problem. The design of two Bayesian network models based on expert knowledge and data is presented. Preliminary results show that the proposed models classify all types of leukocytes with acceptable accuracy.

1 INTRODUCTION

White blood cells, or leukocytes, are cells of the immune system involved in defending the body against infection. There are five types of leukocytes that normally appear in blood: neutrophils, basophils, eosinophils, lymphocytes and monocytes (Greer et al., 2009).

One of the most frequently requested tests in a hematology laboratory is the complete blood count (CBC). As part of the CBC, a white blood cell count and a differential white blood cell count are performed. The former measures the total number of white blood cells in a given volume of blood. The latter consists of a blood examination to determine the presence and the number of the different types of white blood cells (Estridge et al., 1999; Carr and Rodak, 2004).

Leukocytes can be counted either manually or with automated hematology analyzers. The manual leukocyte count is a time-consuming task and is highly dependent on the skills of the lab technician who performs the differential analysis. In manual counts, human classification errors are the main source of misclassification, and the main problem is the scarcity of cell samples (sample sizes usually range from 100 to 200 cells). On the other hand, automated hematology analyzers classify cell populations using both electrical and optical techniques. These machines reduce the time needed to perform routine examinations and, at the same time, increase cell classification accuracy. However, these analyzers are unable to accurately identify and classify all types of cells and, in particular, are insensitive to abnormal or immature cells. For this reason, most tests performed by this equipment require a review by a skilled lab technician for definitive identification of the cell type (Greer et al., 2009).
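To make the effect of small samples concrete, the Python sketch below turns a differential count into percentages and attaches a binomial standard error to each one. Treating every counted cell as an independent draw is a simplifying assumption of ours, and the 100-cell count is hypothetical, not data from this study.

```python
import math


def differential_count(counts):
    """Return {cell type: (percentage, standard error in percentage points)}.

    Assumes a binomial model: each of the n counted cells is an independent draw.
    """
    n = sum(counts.values())
    result = {}
    for cell_type, k in counts.items():
        p = k / n
        standard_error = math.sqrt(p * (1 - p) / n)
        result[cell_type] = (100 * p, 100 * standard_error)
    return result


# Hypothetical 100-cell manual differential (illustrative numbers only).
manual_100 = {"neutrophils": 60, "lymphocytes": 30, "monocytes": 6,
              "eosinophils": 3, "basophils": 1}
for cell_type, (pct, se) in differential_count(manual_100).items():
    print(f"{cell_type:12s} {pct:5.1f}% +/- {se:.1f}")
```

With these numbers the neutrophil share is 60% with a standard error of about 4.9 percentage points; counting 200 cells instead of 100 shrinks that error only by a factor of √2.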

To help lab technicians identify leukocytes, many computational systems based on digital image processing and pattern recognition techniques have been developed. Although several systems have reported good performance (Colunga et al., 2009; Mircic and Jorgovanovic, 2006; Rodrigues et al., 2008), automating leukocyte recognition is not an easy task. There are two main problems in this process. First, cell morphology is very diverse among cells of the same type (e.g., neutrophil morphology). Second, different types of cells share some characteristics, such as shape and texture.

In this work, taking into consideration that leukocyte classification is an expert task in an uncertain domain, we explored the use of Bayesian networks for discriminating among the five types of leukocytes. Bayesian networks have proven useful both as classifiers and as a powerful tool for knowledge representation and inference under uncertainty.

This paper is organized as follows. Section 2 gives a brief description of Bayesian networks. The design of the Bayesian network models for leukocyte classification and the corresponding results are presented in section 3. Finally, some preliminary conclusions are presented in section 4.


Rodríguez-López V. and Cruz-Barbosa R..

LEUKOCYTES CLASSIFICATION USING BAYESIAN NETWORKS.

DOI: 10.5220/0003197706810684

In Proceedings of the 3rd International Conference on Agents and Artificial Intelligence (ICAART-2011), pages 681-684

ISBN: 978-989-8425-40-9

© 2011 SCITEPRESS (Science and Technology Publications, Lda.)

2 BAYESIAN NETWORKS

Bayesian networks (BN), also known as belief networks, belong to the family of probabilistic graphical models. These graphical structures are used to represent knowledge about uncertain domains and, when combined with statistical techniques, they offer several advantages for data analysis (Heckerman, 1996).

A formal definition of a BN is as follows. A Bayesian network model, or simply a Bayesian network, is a pair $(D, P)$, where $D$ is a directed acyclic graph (DAG), $P = \{p(x_1 \mid \pi_1), \ldots, p(x_n \mid \pi_n)\}$ is a set of $n$ conditional probability distributions, one for each variable, and $\Pi_i$ is the set of parents of node $X_i$ in $D$, with $\pi_i$ denoting an instantiation of $\Pi_i$ (Castillo et al., 1997). The set $P$ defines the associated joint probability distribution as

$$p(x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} p(x_i \mid \pi_i) \qquad (1)$$
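Equation (1) can be checked numerically on a toy network. The Python sketch below uses a hypothetical three-node DAG (a class node with two children) and made-up probabilities, not the parameters of the models described later; it multiplies one conditional probability per node and confirms that the factorised joint distribution sums to 1.

```python
from itertools import product

# Hypothetical toy DAG: Class -> Texture and Class -> Nucleus (made-up numbers).
parents = {"Class": [], "Texture": ["Class"], "Nucleus": ["Class"]}
domains = {
    "Class": ["granulocyte", "agranulocyte"],
    "Texture": ["granular", "smooth"],
    "Nucleus": ["lobed", "round"],
}
# cpt[node][tuple of parent values][node value] = p(x_i | pi_i)
cpt = {
    "Class": {(): {"granulocyte": 0.5, "agranulocyte": 0.5}},
    "Texture": {("granulocyte",): {"granular": 0.9, "smooth": 0.1},
                ("agranulocyte",): {"granular": 0.2, "smooth": 0.8}},
    "Nucleus": {("granulocyte",): {"lobed": 0.8, "round": 0.2},
                ("agranulocyte",): {"lobed": 0.1, "round": 0.9}},
}


def joint_probability(assignment):
    """Equation (1): the product over all nodes of p(x_i | pi_i)."""
    prob = 1.0
    for node, node_parents in parents.items():
        parent_values = tuple(assignment[p] for p in node_parents)
        prob *= cpt[node][parent_values][assignment[node]]
    return prob


nodes = list(parents)
total = sum(joint_probability(dict(zip(nodes, values)))
            for values in product(*(domains[n] for n in nodes)))
print(joint_probability({"Class": "granulocyte", "Texture": "granular", "Nucleus": "lobed"}))
print(total)  # 1.0 up to floating-point rounding
```

Each node only needs a table conditioned on its own parents, which keeps the representation compact compared with a single table over every joint state.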

The construction of a Bayesian network involves the definition of its structure and the estimation of its parameters. In the simplest case, the structure of a Bayesian network is specified by an expert and the corresponding parameters are then learned from the available data.
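With complete data, the maximum likelihood estimate of each conditional probability is a relative frequency: the number of cases with a given child value and parent configuration divided by the number of cases with that parent configuration. The Python sketch below shows this for one discrete node with one discrete parent; the class and texture labels are hypothetical.

```python
from collections import Counter, defaultdict


def fit_cpt(parent_values, child_values):
    """ML estimate of p(child | parent) from complete data: relative frequencies.

    Returns cpt[parent][child]. Combinations never seen in the data get no entry,
    i.e. an ML probability of zero.
    """
    joint = Counter(zip(parent_values, child_values))
    parent_totals = Counter(parent_values)
    cpt = defaultdict(dict)
    for (parent, child), n in joint.items():
        cpt[parent][child] = n / parent_totals[parent]
    return dict(cpt)


# Hypothetical labelled cases: (class, cytoplasm texture).
classes = ["granulocyte"] * 4 + ["agranulocyte"] * 2
texture = ["granular", "granular", "granular", "smooth", "smooth", "smooth"]
print(fit_cpt(classes, texture))
```

Combinations never observed get an estimated probability of zero (here, a granular texture for agranulocytes). With very small classes, smoothed estimates are a common alternative, although this paper reports maximum likelihood estimation.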

3 EXPERIMENTS

3.1 Experimental Design and Settings

In order to explore the performance of Bayesian networks in the leukocyte classification problem, we designed two models based on this approach. That is, two experiments for classifying all types of leukocytes (neutrophils, basophils, eosinophils, lymphocytes and monocytes) were conducted. In the first experiment, a Bayesian network that includes some important morphological features for leukocyte classification was built. In the second experiment, we searched for a simpler Bayesian network with better performance than the one designed in the first experiment.

For the first experiment, the leukoA model was developed. In the leukoA model, we proposed a leukocyte classification node as the main node and, in order to express the real dependencies among the features of leukocytes, we used a tree structure. In this model, we aimed to use some of the characteristics that experts take into account in the classification process. These features were incorporated into the model as discrete latent variables. Furthermore, when building the Bayesian network structure, we placed some observable nodes (which are linked to the latent variables) representing the description or measurements of the corresponding features (see Figure 1). These measurements were obtained by applying digital image processing techniques. The observable nodes are continuous variables that follow a normal distribution. The knowledge incorporated into the leukoA model is described as follows.
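Each observable node is a continuous measurement with a normal distribution whose parameters depend on the state of its discrete latent parent. The Python sketch below assumes that each observable node has exactly one discrete parent; it fits one mean and one variance per parent state by maximum likelihood and evaluates the resulting density. The state names and values are hypothetical.

```python
import math
from collections import defaultdict


def fit_gaussian_cpd(parent_states, values):
    """ML mean and variance of a continuous node for each state of its discrete parent.

    Note: a parent state with a single training case would get zero variance.
    """
    grouped = defaultdict(list)
    for state, value in zip(parent_states, values):
        grouped[state].append(value)
    params = {}
    for state, xs in grouped.items():
        mean = sum(xs) / len(xs)
        variance = sum((x - mean) ** 2 for x in xs) / len(xs)  # ML: divide by n, not n - 1
        params[state] = (mean, variance)
    return params


def gaussian_density(x, mean, variance):
    """Normal density N(x | mean, variance)."""
    return math.exp(-(x - mean) ** 2 / (2 * variance)) / math.sqrt(2 * math.pi * variance)


# Hypothetical compactness values labelled with a latent nucleus-shape state.
states = ["round", "round", "lobed", "lobed"]
compactness = [0.9, 0.8, 0.5, 0.3]
params = fit_gaussian_cpd(states, compactness)
print(params)
print(gaussian_density(0.85, *params["round"]))
```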

The first characteristic considered in the leukoA model was the shape of the nucleus. The nucleus of lymphocytes is round, and monocytes have a large reniform or horseshoe-shaped nucleus. The nucleus of neutrophils has from 2 to 5 lobules and can present S, C or glass shapes. The nucleus of eosinophils has 2 lobules and is usually glass shaped. The nucleus of basophils is bilobed or trilobed, but it is hard to see because of the number of granules that hide it (Carr and Rodak, 2004; Greer et al., 2009; Estridge et al., 1999). This knowledge about the shape of the nucleus was encoded into the nucleus shape node. The shape was estimated by means of region descriptors; in particular, we used the compactness, the dispersion and the first Hu moment (Nixon and Aguado, 2007). These descriptors were included in the leukoA model as the compactness, dispersion and MH1 nodes.
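The three region descriptors can be computed from a binary nucleus mask. The Python sketch below (NumPy assumed) uses textbook forms as we understand them: compactness 4πA/P², dispersion as the area of the centroid-centred circle enclosing the region divided by the region area A, and the first Hu invariant φ1 = η20 + η02. Measuring the perimeter P as the length of the marching-squares contour at level 0.5 is our own choice; other perimeter estimators give somewhat different compactness values.

```python
import numpy as np


def contour_perimeter(mask):
    """Length of the 0.5 iso-contour of a binary mask, summed over 2x2 cells
    (marching squares): 1 or 3 region pixels -> sqrt(2)/2, two adjacent -> 1,
    two diagonal (saddle) -> sqrt(2), 0 or 4 -> 0."""
    m = np.pad(mask.astype(np.int8), 1)
    a, b = m[:-1, :-1], m[:-1, 1:]  # top-left, top-right of each 2x2 cell
    c, d = m[1:, :-1], m[1:, 1:]    # bottom-left, bottom-right
    count = a + b + c + d
    length = np.where((count == 1) | (count == 3), np.sqrt(2) / 2, 0.0)
    length += np.where((count == 2) & (a != d), 1.0, 0.0)         # adjacent pair
    length += np.where((count == 2) & (a == d), np.sqrt(2), 0.0)  # diagonal pair
    return float(length.sum())


def shape_descriptors(mask):
    """Return (compactness, dispersion, first Hu moment) of a binary region."""
    mask = np.asarray(mask, dtype=bool)
    ys, xs = np.nonzero(mask)
    area = xs.size
    x_bar, y_bar = xs.mean(), ys.mean()
    # Compactness 4*pi*A / P**2: close to 1 for a disc, smaller for elongated shapes.
    compactness = 4 * np.pi * area / contour_perimeter(mask) ** 2
    # Dispersion: area of the centroid-centred circle enclosing the region over A.
    max_r2 = ((xs - x_bar) ** 2 + (ys - y_bar) ** 2).max()
    dispersion = np.pi * max_r2 / area
    # First Hu invariant phi1 = eta20 + eta02, with eta_pq = mu_pq / mu00**2 for p + q = 2.
    mu20 = ((xs - x_bar) ** 2).sum()
    mu02 = ((ys - y_bar) ** 2).sum()
    hu1 = (mu20 + mu02) / area ** 2
    return float(compactness), float(dispersion), float(hu1)


square = np.zeros((20, 20), dtype=bool)
square[5:15, 5:15] = True
print(shape_descriptors(square))
```

All three values are invariant to translation and approximately invariant to scale, which suits cells imaged at slightly different sizes.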

Since nucleus size is more relevant than cytoplasm size for leukocyte identification, only the nucleus size was considered in the leukoA model. To measure the nucleus size, we took the number of pixels belonging to the nucleus region divided by the total number of pixels of the cell (nucleus and cytoplasm pixels). This nucleus size information was included in the nucleus size node, which was linked to the nucleus shape node because of the relationship between these two features.
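A minimal Python sketch of the nucleus size measure follows, assuming a segmented label image in which 0 marks background, 1 cytoplasm and 2 nucleus; this label coding is hypothetical. Because the measure is a ratio, it does not depend on the absolute size of the cell in the image.

```python
import numpy as np

# Hypothetical label coding for a segmented cell image.
BACKGROUND, CYTOPLASM, NUCLEUS = 0, 1, 2


def nucleus_size(label_image):
    """Nucleus pixels divided by all cell pixels (nucleus + cytoplasm)."""
    labels = np.asarray(label_image)
    nucleus_pixels = np.count_nonzero(labels == NUCLEUS)
    cell_pixels = np.count_nonzero((labels == NUCLEUS) | (labels == CYTOPLASM))
    return nucleus_pixels / cell_pixels


cell = np.zeros((6, 6), dtype=np.uint8)
cell[1:5, 1:5] = CYTOPLASM   # 16 cell pixels
cell[2:4, 2:4] = NUCLEUS     # 4 of them belong to the nucleus
print(nucleus_size(cell))    # 0.25
```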

The cytoplasm texture is an important characteristic of leukocytes; it allows the cells to be grouped by the presence or absence of granules in their cytoplasm (Greer et al., 2009). The granulocytes are neutrophils, basophils and eosinophils. The agranulocytes are lymphocytes and monocytes. To obtain information about the cytoplasm texture, the energy descriptor (Nixon and Aguado, 2007) was used. This knowledge about the cytoplasm texture and its corresponding descriptor was captured by the cytoplasm texture and energyC nodes.
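The energy descriptor is commonly computed from a grey-level co-occurrence matrix as the sum of its squared normalised entries: it equals 1 for a perfectly uniform region and decreases as the texture becomes more varied. The Python sketch below (NumPy assumed) fixes several details the text does not specify, all of them our assumptions: 8 grey levels, a single horizontal displacement of one pixel, and a mask restricting which pixel pairs are counted.

```python
import numpy as np


def glcm_energy(gray, mask, levels=8, offset=(0, 1)):
    """Energy (sum of squared entries) of a normalised grey-level co-occurrence matrix.

    gray: 2-D uint8 image; mask: boolean region such as the cytoplasm.
    Only pixel pairs with both pixels inside the mask are counted.
    """
    quantised = gray.astype(np.int64) * levels // 256  # grey levels 0 .. levels-1
    dy, dx = offset
    h, w = gray.shape
    glcm = np.zeros((levels, levels))
    for y in range(max(0, -dy), min(h, h - dy)):
        for x in range(max(0, -dx), min(w, w - dx)):
            if mask[y, x] and mask[y + dy, x + dx]:
                glcm[quantised[y, x], quantised[y + dy, x + dx]] += 1
    total = glcm.sum()
    if total == 0:
        return 0.0
    p = glcm / total
    return float((p ** 2).sum())


flat = np.full((8, 8), 120, dtype=np.uint8)
checker = (np.indices((8, 8)).sum(axis=0) % 2 * 255).astype(np.uint8)
everywhere = np.ones((8, 8), dtype=bool)
print(glcm_energy(flat, everywhere))     # 1.0: a single grey-level pair
print(glcm_energy(checker, everywhere))  # 0.5: two equally frequent pairs
```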

The texture of the nucleus is another important characteristic of leukocytes reported in the medical literature (Greer et al., 2009; Estridge et al., 1999). For this reason, we included this knowledge in the leukoA model in the same way as the cytoplasm texture.


The cytoplasm colour was the last leukocyte feature taken into account in the leukoA model. Granulocytes are characterized by the presence of cytoplasmic granules that stain in different colours: neutrophils have pink granules, eosinophils have orange granules, and basophils have dark purple granules. Among the agranulocytes, the cytoplasm of lymphocytes is light blue and that of monocytes is greyish blue (Carr and Rodak, 2004; Estridge et al., 1999). The colour descriptor was obtained as the average intensity value in the RGB space. The knowledge about the colour was encoded into the cytoplasm colour, Rvalue, Gvalue and Bvalue nodes.
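Since the model has separate Rvalue, Gvalue and Bvalue nodes, one reading of the colour descriptor is a mean per channel over the cytoplasm pixels. The Python sketch below (NumPy assumed) computes exactly that; the channel order, the mask and the pixel values are assumptions for illustration.

```python
import numpy as np


def mean_rgb(image, mask):
    """Average R, G and B values over the pixels selected by a boolean mask."""
    pixels = np.asarray(image, dtype=np.float64)[np.asarray(mask, dtype=bool)]  # (N, 3)
    r, g, b = pixels.mean(axis=0)
    return float(r), float(g), float(b)


# Hypothetical 4x4 RGB image: the left half is "cytoplasm", the right half background.
image = np.zeros((4, 4, 3), dtype=np.uint8)
image[:, :2] = (180, 150, 220)
image[0, 0] = (200, 170, 240)
cytoplasm = np.zeros((4, 4), dtype=bool)
cytoplasm[:, :2] = True
print(mean_rgb(image, cytoplasm))  # (182.5, 152.5, 222.5)
```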

In summary, the topology of the leukoA model is shown in Figure 1.

Figure 1: Topology of the leukoA Bayesian network model for leukocyte classification.
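To show how a tree of this kind yields a class decision, the Python sketch below computes the posterior probability of each class in a simplified structure where the class node is the parent of discrete latent features and each latent feature is the parent of Gaussian measurement nodes; the latent states are summed out for every class. This is a hand-written illustration with made-up numbers, not the Hugin Lite software used for the experiments, and it omits the link between the nucleus size and nucleus shape nodes.

```python
import math


def gaussian_density(x, mean, variance):
    """Normal density N(x | mean, variance)."""
    return math.exp(-(x - mean) ** 2 / (2 * variance)) / math.sqrt(2 * math.pi * variance)


def class_posterior(prior, features, evidence):
    """p(class | measurements) for a tree Class -> latent feature -> Gaussian measurements.

    prior[c] = p(c)
    features[f] = {"cpt": {c: {h: p(h | c)}}, "obs": {node: {h: (mean, variance)}}}
    evidence[node] = observed value of each measurement node
    """
    scores = {}
    for c, p_c in prior.items():
        score = p_c
        for feature in features.values():
            # Sum out the latent state h: sum_h p(h | c) * prod_o N(o | mean_h, variance_h).
            marginal = 0.0
            for h, p_h in feature["cpt"][c].items():
                term = p_h
                for node, params in feature["obs"].items():
                    mean, variance = params[h]
                    term *= gaussian_density(evidence[node], mean, variance)
                marginal += term
            score *= marginal
        scores[c] = score
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}


# Hypothetical two-class example with one latent feature and one measurement.
prior = {"granulocyte": 0.5, "agranulocyte": 0.5}
features = {
    "cytoplasm texture": {
        "cpt": {"granulocyte": {"granular": 0.9, "smooth": 0.1},
                "agranulocyte": {"granular": 0.2, "smooth": 0.8}},
        "obs": {"energyC": {"granular": (0.2, 0.01), "smooth": (0.6, 0.01)}},
    }
}
print(class_posterior(prior, features, {"energyC": 0.25}))
```

The predicted class is the one with the largest posterior; adding further latent features simply multiplies in one more summed-out factor per class.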

For the second experiment, we explored the possibility of finding a tree-type Bayesian network model with a minimal set of nodes that classifies leukocytes with an acceptable degree of accuracy. The new model was defined by modifying the leukoA model as follows. Analyzing the leukoA model, we observed that the cytoplasm colour node is redundant because it does not encode uncertain information. For this reason, the cytoplasm colour node was removed. Since each of the cytoplasm and nucleus textures is described by a single measurement, we decided to remove the cytoplasm texture and nucleus texture nodes from the leukoA model. We hypothesize that the energy nodes are sufficient to capture the texture information. Following these observations, we defined the second Bayesian network model, named leukoB. The topology of the leukoB model is presented in Figure 2.
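The structural change from leukoA to leukoB can be sketched as deleting latent nodes and attaching their measurement nodes to the removed node's parent. The Python sketch below does this on a hypothetical fragment of a leukoA-like tree; the node names follow the text, but the edges are our assumption, and the actual topologies are those of Figures 1 and 2.

```python
def remove_node(parents, node):
    """Delete `node` from a DAG stored as {child: [parents]} and attach each of its
    children to the removed node's own parents. The input dict is not modified."""
    grandparents = parents[node]
    pruned = {}
    for child, child_parents in parents.items():
        if child == node:
            continue
        new_parents = []
        for p in child_parents:
            for q in (grandparents if p == node else [p]):
                if q not in new_parents:
                    new_parents.append(q)
        pruned[child] = new_parents
    return pruned


# Hypothetical fragment of a leukoA-like tree (edges assumed for illustration).
leuko_a_fragment = {
    "Class": [],
    "CytoplasmColour": ["Class"],
    "Rvalue": ["CytoplasmColour"],
    "Gvalue": ["CytoplasmColour"],
    "Bvalue": ["CytoplasmColour"],
    "CytoplasmTexture": ["Class"],
    "energyC": ["CytoplasmTexture"],
}
leuko_b_fragment = leuko_a_fragment
for latent in ["CytoplasmColour", "CytoplasmTexture"]:
    leuko_b_fragment = remove_node(leuko_b_fragment, latent)
print(leuko_b_fragment)
```

Before pruning, a measurement such as Rvalue depended on the class only through the latent colour state, so p(Rvalue | class) was a mixture of Gaussians over that state; after pruning, each measurement gets a single Gaussian per class.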

3.2 Preliminary Results

In order to evaluate the performance of the leukoA and leukoB Bayesian network models, we used a set of 190 colour images of leukocytes with a resolution of 256×256 pixels. The images were obtained with a microscope that has a built-in CCD camera with a resolution of 640×480 pixels. The leukocyte regions were manually selected and cropped in all images. For the nucleus and cytoplasm segmentation, we used free software developed by Zoltan Kato (Berthod et al., 1996). The image set consisted of 8 basophils, 72 neutrophils, 9 eosinophils, 31 monocytes and 70 lymphocytes. All images were previously classified by a human expert.

Figure 2: Topology of the leukoB Bayesian network model for leukocyte classification.

The classification performance of the designed Bayesian networks was evaluated by five-fold cross-validation. The parameters of the corresponding models were obtained by maximum likelihood estimation from complete data (Heckerman, 1996). The models were tested using the Hugin Lite 7.3 software.
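The Python sketch below shows five-fold cross-validation as an average of per-fold accuracies. The text does not say whether the folds were stratified; here they are, as an assumption, so that every fold keeps some of the 8 basophils and 9 eosinophils. The fit/predict pair is a hypothetical majority-class placeholder standing in for the Bayesian networks, which were trained and tested in Hugin Lite 7.3.

```python
import random
from collections import Counter, defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Split sample indices into k folds, dealing each class out round-robin."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


def cross_validated_accuracy(features, labels, fit, predict, k=5, seed=0):
    """Mean of the k per-fold test accuracies."""
    accuracies = []
    for test_idx in stratified_folds(labels, k, seed):
        test_set = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in test_set]
        model = fit([features[i] for i in train_idx], [labels[i] for i in train_idx])
        correct = sum(predict(model, features[i]) == labels[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / k


def fit_majority(_features, train_labels):
    """Hypothetical placeholder model: remember the most frequent training class."""
    return Counter(train_labels).most_common(1)[0][0]


def predict_majority(model, _feature):
    return model


# Class sizes of the 190-image set; the features are dummies for this illustration.
counts = {"basophil": 8, "neutrophil": 72, "eosinophil": 9, "monocyte": 31, "lymphocyte": 70}
labels = [name for name, n in counts.items() for _ in range(n)]
features = [None] * len(labels)
baseline = cross_validated_accuracy(features, labels, fit_majority, predict_majority)
print(f"majority-class baseline: {baseline:.3f}")
```

Any model exposing the same fit/predict pair can be dropped in; the baseline only shows the mechanics and is not a result of this study.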

The average classification accuracy of the proposed Bayesian network models is shown in Table 1. The results are slightly better for the leukoB model (90.5% versus 87.9%), which, as we expected, favours the simpler model. The results in Table 1 also compare favourably with those of alternative methods that consider fewer types of leukocytes than the ones used here. For example, the classifiers presented in (Mircic and Jorgovanovic, 2006) and (Colunga et al., 2009) are complex and consider only the most common types of leukocytes. In (Mircic and Jorgovanovic, 2006), only four types of leukocytes (neutrophils, eosinophils, lymphocytes and monocytes) are classified, with 86% accuracy, while in (Colunga et al., 2009), neutrophils, eosinophils and lymphocytes are classified with 84% accuracy. In contrast, our Bayesian network models are simple classifiers that can be easily understood and verified by experts.

Table 2 presents the average classification accuracy for each type of leukocyte. This table shows that both models can classify all types of leukocytes, including basophils and eosinophils, which usually form imbalanced classes (they appear less frequently among blood cells). These preliminary results show that Bayesian networks are promising models for leukocyte classification.
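The Python sketch below reads per-type accuracy as the fraction of cells of that type that were classified correctly (our interpretation of Table 2) and contrasts it with overall accuracy. The predictions are hypothetical and chosen to show how a rare class can be missed entirely while the overall figure stays high.

```python
from collections import defaultdict


def per_class_accuracy(true_labels, predicted_labels):
    """Fraction of the samples of each true class that were classified correctly."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for truth, guess in zip(true_labels, predicted_labels):
        totals[truth] += 1
        hits[truth] += int(truth == guess)
    return {label: hits[label] / totals[label] for label in totals}


def overall_accuracy(true_labels, predicted_labels):
    """Fraction of all samples classified correctly."""
    return sum(t == p for t, p in zip(true_labels, predicted_labels)) / len(true_labels)


# Hypothetical predictions: a frequent class done well, a rare class done badly.
truth = ["neutrophil"] * 9 + ["basophil"]
guess = ["neutrophil"] * 9 + ["neutrophil"]
print(overall_accuracy(truth, guess))    # 0.9
print(per_class_accuracy(truth, guess))  # {'neutrophil': 1.0, 'basophil': 0.0}
```

Overall accuracy is dominated by the frequent classes, so per-type figures such as those in Table 2 are what reveal how a model handles basophils and eosinophils.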


Table 1: Average classification accuracy results (using cross-validation) for the leukoA and leukoB Bayesian network models.

Model     Classif. acc.
leukoA    87.9%
leukoB    90.5%

Table 2: Average classification accuracy results for each type of leukocyte for the leukoA and leukoB Bayesian network models.

Model     Type of leukocyte    Classif. acc.
leukoA    basophils            93.3%
leukoA    neutrophils          95.3%
leukoA    eosinophils          83.3%
leukoA    monocytes            61.0%
leukoA    lymphocytes          88.0%
leukoB    basophils            93.3%
leukoB    neutrophils          95.3%
leukoB    eosinophils          83.3%
leukoB    monocytes            82.7%
leukoB    lymphocytes          89.5%

4 CONCLUSIONS

In this paper, we presented two Bayesian network models for leukocyte classification. We proposed a tree structure for both models and defined their variables using expert knowledge and the medical literature. Although the analyzed data set does not contain enough images of some types of leukocytes (imbalanced classes), the performance of the proposed Bayesian networks is comparable with that reported in the literature. Our proposed models can classify all types of leukocytes, including the less frequent types, with a high degree of accuracy (93.3% for basophils and 83.3% for eosinophils with both models). These preliminary results show that Bayesian network models could be competitive with other types of classifiers.

As future work, we will use the leukocyte features found in this analysis to build a naive Bayes model and a neural network model, which can then be compared with our proposed Bayesian network models in terms of average accuracy.

REFERENCES

Berthod, M., Kato, Z., Yu, S., and Zerubia, J. (1996). Bayesian image classification using Markov random fields. Image and Vision Computing, 14:285–295.

Carr, J. H. and Rodak, B. F. (2004). Clinical Hematology Atlas. Saunders, 2nd edition.

Castillo, E., Gutierrez, J. M., and Hadi, A. S. (1997). Expert Systems and Probabilistic Network Models. Springer-Verlag.

Colunga, M. C., Siordia, O. S., and Maybank, S. J. (2009). Leukocyte recognition using EM-algorithm. In Aguirre, A. H., Borja, R. M., and García, C. A. R., editors, MICAI '09: Proceedings of the 8th Mexican International Conference on Artificial Intelligence, pages 545–555. Springer-Verlag.

Estridge, B. H., Reynolds, A. P., and Walters, N. J. (1999). Basic Medical Laboratory Techniques. Delmar Cengage Learning, 4th edition.

Greer, J. P., Foerster, J., Rodgers, G. M., Paraskevas, F., Glader, B., Arber, D. A., and Means, R. T., Jr. (2009). Wintrobe's Clinical Hematology, volume 1. Lippincott Williams & Wilkins, 12th edition.

Heckerman, D. (1996). A tutorial on learning with Bayesian networks. Technical report, Microsoft Research.

Mircic, S. and Jorgovanovic, N. (2006). Automatic classification of leukocytes. Journal of Automatic Control, 16(1):29–32.

Nixon, M. S. and Aguado, A. S. (2007). Feature Extraction & Image Processing. Academic Press, 2nd edition.

Rodrigues, P., Ferreira, M., and Monteiro, J. (2008). Segmentation and classification of leukocytes using neural networks: A generalization direction. In Bhanu Prasad, S. M. P., editor, Speech, Audio, Image and Biomedical Signal Processing using Neural Networks, pages 373–396. Springer Berlin / Heidelberg.
