Classifying Breast Cytological Images using Deep Learning

Architectures

Hasnae Zerouaoui

and Ali Idri

1,2 b

Modeling, Simulation and Data Analysis, Mohammed VI Polytechnic University,

Benguerir, Morocco

Software Project Management Research Team, ENSIAS, Mohammed V University in Rabat, Morocco

Keywords: Computer-aided Diagnosis, Breast Cancer, Classification, Deep Convolutional Neural Networks, Image

Processing, Histological Images.

Abstract: Breast cancer (BC) is a leading cause of death among women worldwide. It remains a critical challenge,

causing over 10 million deaths globally in 2020. Medical images analysis is the most promising research area

since it provides facilities for diagnosing several diseases such as breast cancer. The present paper carries out

an empirical evaluation of recent deep Convolutional Neural Network (CNN) architectures for a binary

classification of breast cytological images based fined tuned versions of seven deep learning techniques:

VGG16, VGG19, DenseNet201, InceptionResNetV2, InceptionV3, ResNet50 and MobileNetV2. The

empirical evaluations used: (1) four classification performance criteria (accuracy, recall, precision and F1-

score), (2) Scott Knott (SK) statistical test to select the best cluster of the outperforming architectures, and (3)

borda count voting system to rank the best performing architectures. All the evaluations were over the FNAC

dataset which contain 212 images. Results showed the potential of deep learning techniques to classify breast

cancer in malignant and benign, therefor the findings of this study recommend the use of MobileNetV2 for

the classification of the breast cancer cytological images since it gave the best results with an accuracy of

98.54%.

1 INTRODUCTION

Breast cancer (BC) is still the leading cause of death

among women worldwide (Metelko et al., 1995). It

remains a global challenge, causing over 1 million

deaths globally in 2019 (Bish et al., 2005). As the

number of patients infected by this disease increases,

it turns out to be increasingly hard for radiologists to

accurately deal with the diagnosis process in the

constrained accessible time (Zhang et al., 2011).

Medical images analysis is one of the most promising

research areas, it provides facilities for diagnosis and

making decisions of several diseases such as breast

cancer. Recently, more attention are paid to imaging

modalities and Deep Learning (DL) in BC

(Mendelson and Eb, 2019). Therefore, interpretation

of these images requires expertise and consequently

several algorithms have been developed and

evaluated to improve and help oncologist’s diagnosis.

https://orcid.org/0000-0001-7268-8404

https://orcid.org/0000-0002-4586-4158

In general, DL showed better performance in breast

cancer detection, and provided high accurate

classifications compared with classical Machine

Learning (ML) techniques (Saha, Mukherjee and

Chakraborty, 2016) (Sadoughi et al., 2018). For

instance, the study (Saha, Mukherjee and

Chakraborty, 2016) showed that the use of deep CNN

architectures is very powerful and efficient in the

domain of DL since it tested the InceptionRecurrent

Residual CNN in the dataset BreakHis and it gave

better results compared to existing techniques such as

CNN and SVM. The study (Xie et al., 2019) used

AlexNet and LeNet for the binary classification of the

BreakHis dataset and showed an improvement of the

accuracy compared to the traditional ML techniques.

However, the present study develops and evaluates

the performances measured in terms of accuracy,

sensitivity, recall, precision and F1-score of seven of

the most recent DL techniques in BC classification

Zerouaoui, H. and Idri, A.

Classifying Breast Cytological Images using Deep Learning Architectures.

DOI: 10.5220/0010850000003123

In Proceedings of the 15th Inter national Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2022) - Volume 5: HEALTHINF, pages 557-564

ISBN: 978-989-758-552-4; ISSN: 2184-4305

557

over the FNAC dataset. To the best of our knowledge,

this study is the first to evaluate and compare seven

DL techniques (VGG16, VGG19, DenseNet201,

InceptionResNetV2, InceptionV3, ResNet50 and

MobileNetV2) using the Scott Knott (SK) statistical

test and the borda count voting method in BC binary

classification. Note that the SK test has been widely

used to comparing, clustering and ranking multiple

machine learning models for parameters tunning

(Idri, Hosni and Abran, 2016; Ottoni et al., 2020) in

different fields such as software engineering (Ottoni

et al., 2020) and breast cancer (Idri et al., 2020) .

Therefore, we use the SK test since: (1) it shows high

performance compared to other statistical tests such

as Jollife (Jolliffe, Allen and Christie, 1989), Calinski

and Corsten (Calinski and Corsten, 1985), and Cox

and Spjotvoll (Worsley, 1986) and (2) its ability to

select the best non-overlapping groups of machine

learning techniques. Moreover, we use the Borda

Count voting method (García-Lapresta and Martínez-

Panero, 2002; Emerson, 2013) to rank the best SK

selected techniques based on the four performance

criterias.

The present study discusses two research

questions (RQs):

 (RQ1): What is the overall performance of DL

techniques in BC classification?

 (RQ2): Is there any DL techniques which

distinctly outperform the others?

The main contributions of this empirical study are the

following: (1) Designing seven DL architectures:

VGG16, VGG19, DenseNet201,

InceptionResNetV2, InceptionV3, ResNet50 and

MobileNetV2 in BC classification, (2) Avoiding

overfitting by using weight decay and L2

regularizers, (3) Comparing the performances of the

seven architectures using SK clustering test and borda

count voting method.

The remainder of this paper is organized as

follow. Section 2 describes the related work. In

Section 3, we present the configuration and

parametrization of the seven DL techniques, the

empirical methodology followed throughout the

research, the data preparation which includes data

acquisition and image processing and the

abbreviations. Section 4 presents and discusses the

empirical results. Section 5 outlines conclusions and

future works.

2 RELATED WORKS

This section presents the results and the main findings

of the related work as shown in Table 1, the results

are summarized as follow:

 Accuracy is the most frequently criterion used to

evaluate the performance of the DL techniques in

BC when using balanced datasets (Zerouaoui and

Idri, 2021).

 Most of the studies only compared two to three

DL techniques. Although the DL architectures

used in the selected studies were different, it is

worth notable that the most investigated

techniques were InceptionResNet, VGG16,

VGG19 and ResNet50

(Alom et al., 2019; Spanhol

et al., 2016; Xie et al., 2019)

 Some studies (

Kassani et al., 2019; (Nahid, Mehrabi

and Kong, 2018; Zhu et al., 2019)

combined more

than two DL techniques in order to have better

Table 1: Summary of the literature review of the use of DL

techniques in BC classification.

Authors Findings and results

Alom et

al.

(Alom et

al.,

2019)

Conduct a study on the use of Inception

Recurrent Residual Convolutional Neural

Network (IRRCNN) which is a hybrid

DCNN architecture based on RCNN,

Residual Network and Inception tested on

two datasets: BreakHis and BC

classification challenge 2015.

The performance was evaluated on image

level, patch level, image based and patch-

based analysis. The results show an

improvement of 3,67% and 2.14% of

accuracy on the BreakHis dataset compared

to scientific results since 2016

Fabio et

al.

(Spanhol

et al.,

2016)

Investigate a deep learning approach, using

two DCNN architecture which are AlexNet

and LeNet, in order to avoid hand crafted

features. The results of the experiments

demonstrated an improved accuracy

compared to the experiments that used

traditional feature extractor techniques.

Xie et al.

(Xie et

al.,

2019)

Evaluate Inception_V3 and Inception_

ResNet_V2 for classification of Brekhis

using two types of learning: the supervised

and the unsupervised learning. The authors

used the transfer learning by pre training the

model on ImageNet, applying Data

augmentation and Fine tuning using the

BreakHis dataset. The experiment shows

that the results of the augmented dataset is

much better than the normal dataset, that

Inception_ ResNet_V2 has better results for

the feature extraction and the supervised

learning has a better accuracy than

clusterin

HEALTHINF 2022 - 15th International Conference on Health Informatics

558

Table 1: Summary of the literature review of the use of DL

techniques in BC classification (cont.).

Authors

Findings and results

Nahid et

al.

(Nahid,

Mehrabi

and

Kong,

2018)

Use three models: CNN techniques, LSTM

structure and the combination of CNN and

LSTM on the BreakHis Dataset. As results

91% of accuracy was obtained using 200x

images dataset, 96 of precision with 40x

images dataset, and the best F-Measure was

obtained using 40x and 100x datasets.

Kassani

et al.

(Kassani

et al.,

2019)

Use of a combination of CNNs deep

learning architectures such as VGG19,

MobileNet and Densenet for feature

extraction on 4 histopathological dataset

images: ICIAR, BreakHis, Densenet and

MobilNet. Then they compared the results

of the classification on traditional single

models and machine learning techniques

namely Decision tree, Random Forest,

XGBoost, AdaBoost and bagging classifier.

The main finding is that the proposed

ensemble method gives better results thant

the solo methods with an accuracy of:

98.13%,

Chuang

et al.

(Zhu et

al.,

2019)

Propose a hybrid architecture where they

assembled multiple CNN architectures

(Inception module, Residual Net and Batch

Normalization techniques) and tested it on

two datasets (BreakHis and BACH). The

proposed model shows a comparable and

etter performance.

Jiang et

al. (Jiang

et al.,

2019)

Design a new CNN that includes a

convolutional layer, a small SE-ResNet

module and a fully connected layer,

they tested their architecture on the

BreakHis dataset and the results

achieved 98,87% and 99,34% for the

binary class and 90.66% and 93.81% for

the multi class.

prediction results, since the ensemble methods in

general outperformed their single techniques.

3 EXPERIMENT

CONFIGURATION

This section presents the parameter tuning of the DL

models, the empirical design, the data preparation

followed and finally the abbreviations.

3.1 Experiment Configuration

Toward an automatic binary BC classification based

on publicly available image FNAC dataset, the

different DL architectures have been implemented

using several parameters tuning experiments. All the

images of the FNAC dataset were resized to 224x224

pixels except those of InceptionV3 and

InceptionResNetV2 models that were resized to

299x299 since it is the default input size in their

architectures. To train the models, we used the

transfer learning technique where we downloaded the

seven DL techniques pre-trained in the ImageNet

dataset (Fei-Fei, Deng and Li, 2010). For the

parameter tunning, we set the batch size to 32 and the

number of epochs to 300. As for the optimization, we

used Adam (adaptive moment estimation) (Kingma

and Ba, 2015) with β1=0.9, β2=0.999, and an initial

learning rate set to 0. 0001 and decrease exponentially

to 0.000001. Moreover, we used weight decay and

L2- regularizers to reduce the overfitting for different

models. A fully connected layer was trained with the

ReLU, followed by a dropout layer with a probability

of 0.5. We updated the last dense layer in all models

to output two classes corresponding to benign and

malignant instead of 1000 classes as was used for

ImageNet.

3.2 Empirical Design

Figure 1 shows the methodology followed to carry

out all the empirical evaluations. It consists of three

steps we describe hereafter. Note that similar

methodologies were used in (Worsley,

2009)(SHARMA et al., 2003)(Azzeh, Nassif and

Minku, 2015)(Idri, Abnane and Abran,

2018)(Zerouaoui et al., 2021)(Idri and Abnane,

2017).

Figure 1: Experimental process.

Classifying Breast Cytological Images using Deep Learning Architectures

559

3.3 Data Preparation

This section presents the data preparation process

followed for the FNAC as described in Figure 2,

which consists of Data pre-processing by using

intensity normalization and Contrast Limited

Adaptive Histogram Equalization (CLAHE) and Data

augmentation. The Images of the FNAC dataset were

captured by us using Leica ICC50 HD microscope

using 400 resolution and 24 bits color depth and with

5 megapixels camera associated with the

microscope(Saikia et al., 2019). Digitized images

captured were then reviewed by experienced certified

cyto-pathologists and selected a total of 212 images

(113 Malignant and 99 Benign). The database can be

downloaded from the link in (Saikia et al., 2019).

Figure 2: Data preparation process.

Data Processing: The next stage is to pre-process

input images using intensity normalization and

Contrast Limited Adaptive Histogram Equalization

(CLAHE). Intensity normalization is a pre-processing

step in image processing applications (Kassani et al.,

2019). We normalized input images to the standard

normal distribution using min-max normalization of

Equation 1. Furthermore, before feeding input images

into the proposed models, CLAHE is a necessary step

to improve the contrast in images as shown in Figure

3 (Makandar and Halalli, 2015; Kharel et al., 2017).

Figure 3: Original and transformed images.

minmax

min

norm

−

(1)

Data Augmentation: was used for the training

process after dataset pre-processing and has the goal

to avoid the risk of overfitting (Perez and Wang,

2017). Moreover, the strategies we used include

geometric transforms such as rescaling, rotations,

shifts, shears, zooms and flips.

3.4 Abbreviations

To assist the reader and shorten the names of the DL

techniques, we use the following naming rules in the

rest of this paper. We abbreviate the name of each

variant of DL techniques as shown in Table 2.

Table 2: Abbreviations used for the DL techniques for the

FNAC dataset.

D.L techniques with the image

magnification factor

Abbreviation

VGG 16 VGG16

VGG 19 VGG19

ResNet 50 Res50

Inception V3 INV3

Inception ResNet V2 INRES

DensNet201 DENS

MobilNetV2 MOB

4 RESULTS

This section presents and discusses the results of the

empirical evaluations of seven DL techniques:

VGG16, VGG19, InceptionV3, ResNet50,

InceptionResNetV2, DenseNet201, and

MobileNetV2, over the FNAC dataset. The

performances of the DL techniques were evaluated

using 5-fold cross validation and four criteria’s:

accuracy, recall, precision and F1-score. First the

performance is compared in terms of accuracy of each

DL technique (RQ1). Thereafter, we use the SK

statistical test to cluster the selected DL techniques,

and borda count to rank the DL techniques belonging

to the best SK cluster (RQ2).

4.1 Accuracy Evaluation and

Comparison of the Seven DL

Techniques

This section compares the accuracy values of the

seven DL techniques to each other over the FNAC

dataset. Note that the training and testing of the DL

techniques are implemented in Python using Keras

and Tensorflow DL frameworks and run on a TPU

processing unit of 8 cores with 35 GB in RAM and

Linux-based OS, provided by google in Colab

Notebook.

Figure 4 and Table 3 show the accuracy values of

the baseline VGG16, VGG19, DenseNet201,

InceptionResNetV2, InceptionV3, ResNet50 and

HEALTHINF 2022 - 15th International Conference on Health Informatics

560

MobileNetV2 vs the number of epochs over the

FNAC dataset. We observe that the best accuracy

values were achieved with the number of epochs 25

using VGG16, VGG19, InceptionV3, DenseNet201,

InceptionResNetV2 and MobileNetV2; and with the

number of epochs 100 when using ResNet 50 since.

The best accuracy value was achieved by MobileNet

V2 (98.54%) followed by DenseNet201, VGG16,

VGG19, InceptionV3 and InceptionResNetV2 with

an accuracy value of 98.07%, 97.65%, 95.72%,

93.51% and 91.34% respectively. The worst accuracy

value was achieved using ResNe 50 with a value of

91.34%

Figure 4: Accuracy values vs number of epochs of the seven

deep learning architectures over the FNAC dataset (The

abbreviation used in Figure 3 for the DL techniques for the

FNAC dataset are defined in Table 4 of section 6.5).

Table 3: Accuracy values of the seven deep learning

architectures over the FNAC dataset (The abbreviation used

in Table 3 for the DL techniques for the FNAC dataset are

defined in Table 1 of section 3.4).

DL technique Accuracy (%)

VGG16 97.65

VGG19 95.72

INV3 95.06

Res50 91.34

INRES 93.51

DENS 98.07

MOB 98.54

4.2 Clustering DL Techniques using

SK Test and Ranking Them using

Borda Count

This step uses the SK statistical test to evaluate the

predictive capabilities of the DL techniques evaluated

in step 4.1 and discusses the ranking results when

applying the borda count voting method based on

accuracy, recall, precision and F1-score on the best

SK clusters. Table 4 shows the values of the four

performance measures of all the DL techniques over

FNAC dataset. Note that the SK test consists of

grouping DL techniques with no significant

difference between their accuracy values. Since the

SK test requires that its inputs should be normally

distributed we verified the normality of the data by

the Kolmogorov-Smirnov test, and since the data is

normally distributed we didn’t use the Box Cox

transformation (Sakia, 2012). Afterwards, we

performed the SK test to cluster the selected DL

techniques into overlapping free groups and

identified the best group based on accuracy. The DL

techniques belonging to the same group have similar

predictive capability and the best group contains the

DL techniques that have the highest value of

accuracy.

Table 4: Best performance values of the DL techniques over

the FNAC Dataset (The abbreviation used in Table 4 for the

DL techniques for the FNAC dataset are defined in Table 1

of section 3.4).

DL Accur

acy

(%)

Recall

(%)

Precis

ion

(%)

F1 score

(%)

VGG16 97.65 98.15 97.49 97.8

VGG19 95.72 94.98 96.95 95.95

INV3 95.06 94.19 96.5 95.32

Res50 91.34 91.9 91.98 91.86

INRES 93.51 94.28 93.79 93.94

DENS 98.07 98.42 98.86 98.63

MOB 98.54 98.15 98.24 98.2

From the results of the SK test shown in Figure 5, it

is noticeable that the SK test results gives 4 clusters

which implies that the accuracy performances are

highly influenced by the DL model used for the

classification. The figure shows that the best SK

cluster contains 3 DL models including

DenseNet201, MobileNetV2 and VGG16 and that

last SK cluster contains the DL model ReseNet50.

Figure 5: Results of SK test for the DL techniques over the

FNAC dataset.

Table 5 shows the borda count ranking of the

architectures belonging to the best SK clusters for the

Classifying Breast Cytological Images using Deep Learning Architectures

561

FNAC dataset. The DL architecture MobileNetV2 is

ranked first followed by DenseNet201 and finally

VGG16.

Table 5: Best performance values.

Borda Count Ranking Deep Learning model

1 MOB

2 DENS

3 VGG16

As a summary, when using the cytological FNAC

dataset, it is recommended to use the DL architecture

MobileNetV2 since it was ranked first when using the

borda count voting method based on accuracy,

precision, recall and F1-score and achieved an

accuracy of 98.54%. On top of that MobileNetV2

remains the perfect DL architecture since it is light

weighted in terms of architecture and is designed for

mobile and web application. Therefore, we highly

recommend the use of MobileNetV2 for a binary

cytological classification.

5 CONCLUSIONS

The present paper presented and discussed the results

of an empirical comparative study of seven recent DL

techniques (VGG16, VGG19, DenseNet201,

InceptionResnetV2, InceptionV3, ResNet50 and

MobileNetV2) for BC binary imaging classification.

All the empirical evaluations used four performance

criteria’s, SK statistical test, and borda Count to

assess and rank these seven DL techniques over the

FNAC dataset. The findings of this study are:

(RQ1): What is the Overall Performance of DL

Techniques in BC Classification?

The accuracy results of the seven DL techniques were

highly influenced by the characteristics of the dataset.

Nevertheless, we observed that MobileNetV2,

DenseNet201, VGG16, VGG19 and InceptionV3

gave the best results. However, ReseNet50

underperformed compared to the others.

(RQ2): Is There Any DL Techniques, Which

Distinctly Outperform the Others?

MobileNetV2 technique gave the best results since it

belonged to the best SK clusters for the FNAC dataset

and was ranked first using the borda count voting test

based on accuracy, recall, precision and F1-score. As

results we recommend the use of MobileNet V2 to

develop DL computer assister diagnosis systems

since it gives good results when using cytological

images for binary BC classification.

Ongoing works investigate homogenous and

heterogeneous ensembles whose members are deep

learning techniques with different meta-learning

techniques such as bagging, boosting and stacking for

breast cancer imaging classification.

ACKNOWLEDGEMENTS

This work was conducted under the research project

“Machine Learning based Breast Cancer Diagnosis

and Treatment”, 2020-2023. The authors would like

to thank the Moroccan Ministry of Higher Education

and Scientific Research, Digital Development

Agency (ADD), CNRST, and UM6P for their

support.

This study was funded by Mohammed VI

polytechnic university at Ben Guerir Morocco.

Compliance with ethical standards.

Conflicts of interest/competing interests Not

applicable.

Code availability not applicable.

REFERENCES

Alom, M. Z. et al. (2019) ‘Breast Cancer Classification

from Histopathological Images with Inception

Recurrent Residual Convolutional Neural Network’,

Journal of Digital Imaging. Journal of Digital Imaging.

doi: 10.1007/s10278-019-00182-7.

Azzeh, M., Nassif, A. B. and Minku, L. L. (2015) ‘An

empirical evaluation of ensemble adjustment methods

for analogy-based effort estimation’, Journal of

Systems and Software. Elsevier Ltd., 103, pp. 36–52.

doi: 10.1016/j.jss.2015.01.028.

Bish, A. et al. (2005) ‘Understanding why women delay in

seeking help for breast cancer symptoms B’, 58, pp.

321–326. doi: 10.1016/j.jpsychores.2004.10.007.

Bony, S. et al. (2001) ‘The relationship between mycotoxin

synthesis and isolate morphology in fungal endophytes

of Lolium perenne’, New Phytologist, 152(1), pp. 125–

137. doi: 10.1046/j.0028-646X.2001.00231.x.

Calinski, T. and Corsten, L. C. A. (1985) ‘Clustering Means

in ANOVA by Simultaneous Testing’, Biometrics,

41(1), p. 39. doi: 10.2307/2530641.

Emerson, P. (2013) ‘The original Borda count and partial

voting’, Social Choice and Welfare, 40(2), pp. 353–

358. doi: 10.1007/s00355-011-0603-9.

Fei-Fei, L., Deng, J. and Li, K. (2010) ‘ImageNet:

Constructing a large-scale image database’, Journal of

Vision, 9(8), pp. 1037–1037. doi: 10.1167/9.8.1037.

García-Lapresta, J. L. and Martínez-Panero, M. (2002)

‘Borda count versus approval voting: A fuzzy

HEALTHINF 2022 - 15th International Conference on Health Informatics

562

approach’, Public Choice, 112(1), pp. 167–184. doi:

10.1023/A:1015609200117.

Hamza, M. and Larocque, D. (2005) ‘An empirical

comparison of ensemble methods based on

classification trees’, Journal of Statistical Computation

and Simulation, 75(8), pp. 629–643. doi: 10.1080/

00949650410001729472.

He, K. and Sun, J. (2016) ‘Deep Residual Learning for

Image Recognition’. doi: 10.1109/CVPR.2016.90.

Hosni, M. et al. (2019) ‘Reviewing ensemble classification

methods in breast cancer’, Computer Methods and

Programs in Biomedicine, 177, pp. 89–112. doi:

10.1016/j.cmpb.2019.05.019.

Huang, G. et al. (2017) ‘Densely Connected Convolutional

Networks’. doi: 10.1109/CVPR.2017.243.

Idri, A. et al. (2020) ‘Assessing the impact of parameters

tuning in ensemble based breast Cancer classification’,

Health and Technology. Health and Technology, 10(5),

pp. 1239–1255. doi: 10.1007/s12553-020-00453-2.

Idri, A. and Abnane, I. (2017) ‘Fuzzy Analogy Based Effort

Estimation: An Empirical Comparative Study’, IEEE

CIT 2017 - 17th IEEE International Conference on

Computer and Information Technology, (Ml), pp. 114–

121. doi: 10.1109/CIT.2017.29.

Idri, A., Abnane, I. and Abran, A. (2018) ‘Evaluating

Pred(p) and standardized accuracy criteria in software

development effort estimation’, Journal of Software:

Evolution and Process, 30(4), pp. 1–15. doi: 10.1002/

smr.1925.

Idri, A., Hosni, M. and Abran, A. (2016) ‘Improved

estimation of software development effort using

Classical and Fuzzy Analogy ensembles’, Applied Soft

Computing Journal. Elsevier B.V., 49, pp. 990–1019.

doi: 10.1016/j.asoc.2016.08.012.

Jiang, Y. et al. (2019) ‘Breast cancer histopathological

image classification using convolutional neural

networks with small SE-ResNet module’, PLoS ONE,

14(3), pp. 1–21. doi: 10.1371/journal.pone.0214587.

Jolliffe, I. T., Allen, O. B. and Christie, B. R. (1989)

‘COMPARISON OF VARIETY MEANS USING

advantage of this approach is that the divisions into

groups can be done at more’, 25, pp. 259–269.

K, H. T. (2013) ‘c r v i h o e f c r v i h o e f’, 4(2), pp. 627–

635.

Kassani, S. H. et al. (2019) ‘Classification of

Histopathological Biopsy Images Using Ensemble of

Deep Learning Networks’. Available at: http://

arxiv.org/abs/1909.11870.

Kharel, N. et al. (2017) ‘Early diagnosis of breast cancer

using contrast limited adaptive histogram equalization

(CLAHE) and Morphology methods’, 2017 8th

International Conference on Information and

Communication Systems, ICICS 2017, pp. 120–124.

doi: 10.1109/IACS.2017.7921957.

Kingma, D. P. and Ba, J. L. (2015) ‘Adam: A method for

stochastic optimization’, 3rd International Conference

on Learning Representations, ICLR 2015 - Conference

Track Proceedings, pp. 1–15.

Makandar, A. and Halalli, B. (2015) ‘Breast Cancer Image

Enhancement using Median Filter and CLAHE’,

International Journal of Scientific & Engineering

Research, 6(4), pp. 462–465. Available at: http://

www.ijser.org.

Mendelson, E. B. and Eb, M. (2019) ‘Imaging : Potentials

and Limitations’, American Journal of Roentgenology,

(February), pp. 1–7. doi: 10.2214/AJR.18.20532.

Metelko, Z. et al. (1995) ‘Pergamon THE WORLD

HEALTH ORGANIZATION QUALITY OF LIFE

ASSESSMENT ( WHOQOL ): POSITION PAPER

FROM THE WORLD HEALTH ORGANIZATION’,

41(10).

Mittas, N. and Angelis, L. (2013) ‘Ranking and clustering

software cost estimation models through a multiple

comparisons algorithm’, IEEE Transactions on

Software Engineering, 39(4), pp. 537–551. doi:

10.1109/TSE.2012.45.

Nahid, A. Al, Mehrabi, M. A. and Kong, Y. (2018)

‘Histopathological breast cancer image classification

by deep neural network techniques guided by local

clustering’, BioMed Research International, 2018. doi:

10.1155/2018/2362108.

Ottoni, A. L. C. et al. (2020) ‘Tuning of reinforcement

learning parameters applied to SOP using the Scott–

Knott method’, Soft Computing. Springer Berlin

Heidelberg, 24(6), pp. 4441–4453. doi: 10.1007/

s00500-019-04206-w.

Perez, L. and Wang, J. (2017) ‘The Effectiveness of Data

Augmentation in Image Classification using Deep

Learning’. Available at: http://arxiv.org/abs/

1712.04621.

Razzak, M. I., Naz, S. and Zaib, A. (no date) ‘Deep

Learning for Medical Image Processing : Overview ,

Challenges and the Future’.

Sadoughi, F. et al. (2018) ‘Artificial intelligence methods

for the diagnosis of breast cancer by image processing:

A review’, Breast Cancer: Targets and Therapy, 10,

pp. 219–230. doi: 10.2147/BCTT.S175311.

Sagi, O. and Rokach, L. (2018) ‘Ensemble learning: A

survey’, Wiley Interdisciplinary Reviews: Data Mining

and Knowledge Discovery, 8(4), pp. 1–18. doi:

10.1002/widm.1249.

Saha, M., Mukherjee, R. and Chakraborty, C. (2016)

‘Computer-aided diagnosis of breast cancer using

cytological images: A systematic review’, Tissue and

Cell. Elsevier Ltd, 48(5), pp. 461–474. doi:

10.1016/j.tice.2016.07.006.

Saikia, A. R. et al. (2019) ‘Comparative assessment of

CNN architectures for classification of breast FNAC

images’, Tissue and Cell. Elsevier Ltd, 57, pp. 8–14.

doi: 10.1016/j.tice.2019.02.001.

Sakia, A. R. M. (2012) ‘The Box-Cox transformation

technique : a review’, 41(2), pp. 169–178.

Sandler, M. et al. (2018) ‘MobileNetV2: Inverted Residuals

and Linear Bottlenecks’, Proceedings of the IEEE

Computer Society Conference on Computer Vision and

Pattern Recognition. IEEE, pp. 4510–4520. doi:

10.1109/CVPR.2018.00474.

SHARMA, J. et al. (2003) ‘Symbiotic Seed Germination

and Mycorrhizae of Federally Threatened Platanthera

praeclara (Orchidaceae)’, The American Midland

Classifying Breast Cytological Images using Deep Learning Architectures

563

Naturalist, 149(1), pp. 104–120. doi: 10.1674/0003-

0031(2003)149[0104:ssgamo]2.0.co;2.

Simonyan, K. and Zisserman, A. (2015) ‘Very deep

convolutional networks for large-scale image

recognition’, 3rd International Conference on Learning

Representations, ICLR 2015 - Conference Track

Proceedings, pp. 1–14.

Spanhol, F. A. et al. (2016) ‘A Dataset for Breast Cancer

Histopathological Image Classification’, IEEE

Transactions on Biomedical Engineering, 63(7), pp.

1455–1462. doi: 10.1109/TBME.2015.2496264.

Szegedy, C. et al. (2014) ‘Rethinking the Inception

Architecture for Computer Vision’.

Szegedy, C. et al. (no date) ‘the Impact of Residual

Connections on Learning’, pp. 4278–4284.

Worsley, A. K. J. (2009) ‘A Non-Parametric Extension of a

Cluster Analysis Method by Scott and Knott Published

by: International Biometric Society Stable URL: http://

www.jstor.org/stable/2529369’, 33(3), pp. 532–535.

Worsley, K. J. (1986) ‘Confidence regions and tests for a

change-point in a sequence of exponential family

random variables’, Biometrika, 73(1), pp. 91–104. doi:

10.1093/biomet/73.1.91.

Xie, J. et al. (2019) ‘Deep learning based analysis of

histopathological images of breast cancer’, Frontiers in

Genetics, 10(FEB), pp. 1–19. doi: 10.3389/

fgene.2019.00080.

Zerouaoui, H. et al. (2021) ‘Breast Fine Needle Cytological

Classification Using Deep Hybrid Architectures BT -

Computational Science and Its Applications – ICCSA

2021’, in Gervasi, O. et al. (eds). Cham: Springer

International Publishing, pp. 186–202.

Zerouaoui, H. and Idri, A. (2021) ‘Reviewing Machine

Learning and Image Processing Based Decision-

Making Systems for Breast Cancer Imaging’. Journal of

Medical Systems.

Zhang, G. et al. (2011) ‘A review of breast tissue

classification in mammograms’, Proceedings of the

2011 ACM Research in Applied Computation

Symposium, RACS 2011, pp. 232–237. doi: 10.1145/

2103380.2103426.

Zhu, C. et al. (2019) ‘Breast cancer histopathology image

classification through assembling multiple compact

CNNs’, BMC Medical Informatics and Decision

Making, 19(1), pp. 1–17. doi: 10.1186/s12911-019-

0913-x.

HEALTHINF 2022 - 15th International Conference on Health Informatics

564