A Convolutional Neural Network Based Patch Classifier Using Mammograms

Yumnah Hasan¹ (https://orcid.org/0000-0001-9310-8886), Aidan Murphy² (https://orcid.org/0000-0002-6209-4642),
Meghana Kshirsagar¹ (https://orcid.org/0000-0002-8182-2465) and Conor Ryan¹ (https://orcid.org/0000-0002-7002-5815)

¹Biocomputing and Developmental Systems Lab, University of Limerick, Ireland
²Department of Computer Science, University College Dublin, Ireland

Keywords: Convolutional Neural Networks, Breast Cancer, Patch Extraction, Image Pre-Processing, Deep Learning.
Abstract: Breast cancer is the most prevalent cancer among females worldwide. Early detection is key to a good prognosis, and mammography is the most widely used technique, particularly in screening programs. However, reading mammograms is a highly skilled and often time-consuming task. Deep learning methods can facilitate the detection process and assist clinicians in disease diagnosis. Much research has shown Deep Neural Networks' successful use in medical imaging for early and accurate diagnosis. This paper proposes a patch-based Convolutional Neural Network (CNN) classification approach that classifies patches (small sections) obtained from mammogram images as either benign or malignant. A novel patch extraction method, which we call Overlapping Patch Extraction, is developed and compared with two other techniques, Non-Overlapping Patch Extraction and Region-Based-Extraction. Experimentation is conducted using images from the Curated Breast Imaging Subset of the Digital Database for Screening Mammography. Five deep learning models, three configurations of EfficientNet-V2 (B0, B2, and L), ResNet-101, and MobileNet-V3L, are trained on the patches extracted using the discussed methods. Preliminary results indicate that the proposed Overlapping patch extraction approach produces a more robust patch dataset. Promising results are obtained using the Overlapping patch extraction technique with the EfficientNet-V2L model, achieving an AUC of 0.90.
1 INTRODUCTION
Machine learning (ML) plays a significant role in computer-aided diagnosis (CAD) systems to provide early and accurate results in different areas of medical imaging. Continuous advancements in ML are making remarkable contributions to clinical decision support systems, diagnostic reasoning, automatic detection of disease, and classification of cases.
Deep Learning (DL) is a sub-field of ML built on artificial neural networks (ANNs), which were in turn inspired by the function and structure of the human brain. Convolutional Neural Networks (CNNs) provide models that learn directly from data. In the area of cancer research, CNNs have been successfully used for the automated classification of breast cancer from mammographic images, a very challenging task owing to the fine-grained variability in the appearance of breast lesions (Mahmood et al., 2022).
Image resolution plays an important role in the performance of a CNN because higher-dimensional images yield more trainable features. However, high-resolution images are computationally expensive to process in CNNs due to the memory limitations of Graphical Processing Units (GPUs). Therefore, downsizing is required, which results in the loss of discriminative features. One feasible approach to overcome this challenge is to develop a classifier trained on high-resolution image patches, i.e., relatively small sections of the whole image, which preserve the maximum information encoded within each patch.
This research aims to develop a patch extraction method that removes the need to train a CNN on downsized full images. The proposed technique, which we call Overlapping Patch Extraction, is compared against two other patch extraction approaches: Non-Overlapping Patch Extraction and Region-Based-Extraction (RBE).
RBE is the current state-of-the-art method in the literature. Five deep learning models are trained to analyze the efficiency of the proposed method. All models are trained on the Curated Breast Imaging Subset of the Digital Database for Screening Mammography (CBIS-DDSM) dataset. EfficientNet-V2L trained on patches from our extraction method achieves an Area Under the Curve (AUC) of 0.90, outperforming the other two approaches examined in this paper.
This paper is organized as follows: the background is described in Section 2, and the details of the proposed methodology are covered in Section 3. The experimental details are covered in Section 4, results are discussed in Section 5, and the conclusion and future implications are presented in Section 6.
2 LITERATURE SURVEY
There has been substantial research on image classification using DL techniques, including exceptional results attained by CNN models on breast cancer classification. This section reviews significant work using full images, followed by work using patches to train robust classifiers.
2.1 Classification Using Whole Image
The use of full mammography images has been shown to yield impressive results. A notable example, (Al-Antari et al., 2020), proposed a novel CAD system that detects and classifies breast cancer using entire mammograms. Three modified DL models were implemented, namely a regular feedforward CNN, ResNet-50, and InceptionResNet-V2. The best results were attained using the InceptionResNet-V2 classifier, with an overall accuracy of 97.50% on the DDSM dataset.
(Zahoor et al., 2022) used fine-tuned MobileNet-V2 and NasNet-Mobile to extract deep features from full mammograms. The Modified Entropy Whale Optimization Algorithm (MEWOA) was used to optimize the extracted features. Finally, images were classified as benign or malignant using machine learning classifiers. The maximum accuracies achieved for INbreast, MIAS, and CBIS-DDSM were 99.7%, 99.8%, and 93.8%, respectively.
A deep one-stage U-Net model was proposed by (Soulami et al., 2021) for semantic segmentation and classification of mammograms, achieving an AUC of 0.998 for both INbreast and DDSM. They trained the model from scratch on full images without any pre-trained weights.
Despite these excellent results, these approaches have drawbacks, such as the need for a large amount of annotated training data, which is presently limited. Moreover, as the amount of data increases, the computational cost and time also increase. Using lower-resolution images is one way to shrink the size of the dataset used. However, low-resolution input images used for training a classifier are more likely to result in low accuracy. Therefore, methods are explored that reduce the need for huge amounts of data while remaining computationally efficient.
2.2 Classification Using Patches
Patch-based classifiers offer an attractive alternative to the use of full-sized images. A patch is a subsection of the whole image. In patch-wise training, small patches are constructed and processed individually. One advantage is that high-resolution images can be processed more quickly when divided into small patches. Moreover, computational cost is significantly decreased when small patches are processed instead of the whole image. Much research has been conducted in this area.
In (El Houby and Yassin, 2021), two DNN-based models were created. The first was a patch classifier for Regions Of Interest (ROIs), while the second was a whole-image classifier. The patches were extracted from the INbreast dataset, which is annotated with the diagonal extreme points of each breast lesion. From the given points (Xmin, Ymin) and (Xmax, Ymax), a bounding box around the ROI was created and cropped for patch extraction.
A sliding-window approach (Agarwal et al., 2019) was used to scan the whole image and extract patches using a 56 × 56 stride, which determines the minimum overlap between two consecutive patches. Labelling was based on the central pixel of each patch: using the annotated ROI labels, a patch was given a positive label if its central pixel lay inside the mass and a negative label otherwise. They used benign and malignant masses from the CBIS-DDSM dataset, from which positive patches were extracted first, followed by an equal number of randomly selected negative patches. Their patch classifier based on the Inception-V3 model produced a test accuracy of 84.16%.
In (Yu et al., 2020), a deep fusion model was developed to perform image classification using the Mammographic Image Analysis Society (MIAS) dataset, which contains a small set of mammographic images. Training a deep model on a small sample set is challenging, so they used a patch-based approach to solve this problem. ROIs were extracted from the images using preprocessing techniques. Afterwards, random patches of size 72 × 72 were extracted around each ROI. They used two classes, benign and malignant, and the best result, obtained from their proposed VGG-16 Fusion Model 1, was an accuracy of 89.06%.
(Shen et al., 2019) presented an end-to-end training approach based on a patch classifier. Original images were downsized to fit the limited GPU memory. They created two patch datasets, referred to as S1 and S10. The S1 patches were extracted from the centre of the ROI, while the S10 patches were extracted randomly around the ROI with a minimum overlapping ratio between the ROI and background area set to 0.9. All patches were of size 224 × 224. For the patch classifier, they used the CBIS-DDSM dataset and achieved their best results with a pre-trained ResNet-50 model, reaching an accuracy of 0.89. They used five classes for the classification task: benign mass, malignant mass, benign calcification, malignant calcification, and background.
(Petrini et al., 2022) created a patch classifier as the basis for single-view and two-view whole-image classifiers. Their patch extraction approach was inspired by (Shen et al., 2019). They applied two different test configurations of the dataset: 5-fold cross-validation (CV) and the original division (OD). They used five classes, the same as (Shen et al., 2019). The highest patch-classifier accuracy obtained on the OD configuration was 75.54%, using EfficientNet-B0.
Figure 1: Pipeline of the entire process from patch extraction to tumor classification.
3 METHODOLOGY
The proposed framework, shown in Figure 1, is divided into the following stages. In the first step, the data is pre-processed as discussed in Section 3.1 to remove additional noise and artefacts, which would otherwise negatively affect a model's accuracy. Secondly, patches are extracted from the pre-processed dataset and labelled according to the annotations provided in the dataset. Finally, five CNN models are trained and evaluated on the test dataset to compare performance. We compare our proposed method, Overlapping Patch Extraction, described in Section 3.4, to two other methods, Non-Overlapping Patch Extraction and RBE, described in Sections 3.3 and 3.2, respectively.
3.1 Pre-Processing
Image pre-processing is an essential part of designing an efficient CAD system. In this step, unwanted objects such as annotations and noise are removed from the images. In this research, the following pre-processing steps are applied to enhance the quality of the images (a minimal code sketch follows the list):

- Images are converted from the DICOM format (generally used for medical images, as it can contain rich meta-data) to the Portable Network Graphics (PNG) format.
- Otsu thresholding (Otsu, 1979) is applied to separate the foreground pixels from the background.
- Artefacts are removed using contour detection. To do this, the binarised version of the image is evaluated to find the largest contour. Next, the maximum connected region is identified, and all other small, residual regions are eliminated.
- Images are cropped by removing additional background areas to increase the recognition rate.
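The following is a minimal sketch of these steps, assuming OpenCV (cv2) and an 8-bit grayscale mammogram already converted from DICOM to PNG; the function and variable names are illustrative, not taken from our implementation.

```python
# Sketch of the pre-processing pipeline described above, assuming an 8-bit
# grayscale mammogram (already converted from DICOM to PNG).
import cv2
import numpy as np

def preprocess(img: np.ndarray) -> np.ndarray:
    # Otsu thresholding separates breast tissue (foreground) from background.
    _, mask = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

    # Find the largest contour (the breast); everything else is treated as
    # an artefact (labels, markers) and removed.
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    largest = max(contours, key=cv2.contourArea)
    clean = np.zeros_like(mask)
    cv2.drawContours(clean, [largest], -1, 255, thickness=cv2.FILLED)
    img = cv2.bitwise_and(img, img, mask=clean)

    # Crop to the breast bounding box to drop the empty background.
    x, y, w, h = cv2.boundingRect(largest)
    return img[y:y + h, x:x + w]
```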
3.2 Region-Based-Extraction
The RBE technique developed by (Shen et al., 2019) is implemented in this study to extract patches. Mammograms are first downsized to 1152 × 896 using interpolation and pre-processed as described previously to remove noise and artefacts. The rectangular area around the contour is selected and defined as the ROI for patch extraction. Ten patches of size 224 × 224 are extracted randomly around the ROI, with an overlapping ratio between 0.5 and 0.9. If no ROI is present, as in the case of a benign image, ten patches are extracted from anywhere within the image. Labels of benign and malignant patches are assigned according to the annotations provided in the dataset. Patches containing more than 50% black pixels are ignored and not included in the training set. Benign patches outnumber malignant patches, so sampling is used to resolve the data imbalance problem.
A total of 11,828 benign and malignant training patches are used in this work.
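A hedged sketch of the RBE sampling loop under our reading of this description follows; the overlap ratio is taken as the fraction of the patch covered by ROI pixels, and all identifiers are assumptions.

```python
# Sketch of RBE random patch sampling as described above. The overlap ratio
# is interpreted as the fraction of the patch covered by ROI pixels; this
# reading, and all names, are assumptions rather than the original code.
import random
import numpy as np

def sample_rbe_patches(img, roi_mask, n=10, size=224,
                       lo=0.5, hi=0.9, max_tries=10000):
    H, W = img.shape
    patches, tries = [], 0
    while len(patches) < n and tries < max_tries:
        tries += 1
        y = random.randint(0, H - size)
        x = random.randint(0, W - size)
        patch = img[y:y + size, x:x + size]
        if (patch == 0).mean() > 0.5:      # ignore patches >50% black
            continue
        if roi_mask.max() == 0:            # benign image with no ROI:
            patches.append(patch)          # sample from anywhere
            continue
        ratio = roi_mask[y:y + size, x:x + size].mean()
        if lo <= ratio <= hi:              # overlap with the ROI in [0.5, 0.9]
            patches.append(patch)
    return patches
```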
3.3 Non-Overlapping Patch Extraction
In this method, images are resized and sliced into equal-sized patches. A patch size of 256 × 256 is selected, and a total of 16 patches per image are extracted. No overlap between patches is allowed during extraction. Labelling is threshold-based: if more than 70% of the ROI overlaps with a patch, the patch is labelled as malignant; otherwise, it is labelled as benign. This approach yields 12,099 benign training patches but only 926 malignant ones, giving quite an unbalanced set. To reduce the number of benign patches, those containing more than 90% black pixels were removed. A minimal sketch of this slicing appears below.
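The sketch assumes an image resized so its sides are multiples of the patch size and a binary ROI mask; the 70% rule is read as the fraction of the ROI area covered by the patch, and identifiers are illustrative.

```python
# Sketch of non-overlapping slicing with the 70% ROI-overlap labelling rule
# described above (read here as the fraction of ROI area covered by the
# patch; this interpretation is an assumption).
import numpy as np

def slice_non_overlapping(img, roi_mask, size=256, roi_thresh=0.7):
    patches, labels = [], []
    roi_area = roi_mask.sum()
    for y in range(0, img.shape[0] - size + 1, size):
        for x in range(0, img.shape[1] - size + 1, size):
            patch = img[y:y + size, x:x + size]
            if (patch == 0).mean() > 0.9:  # drop patches that are >90% black
                continue
            covered = roi_mask[y:y + size, x:x + size].sum()
            malignant = roi_area > 0 and covered / roi_area > roi_thresh
            patches.append(patch)
            labels.append("malignant" if malignant else "benign")
    return patches, labels
```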
3.4 Proposed Method: Overlapping Patch Extraction
The CBIS-DDSM dataset contains lesions for each category of images, i.e., benign or malignant. Segmented binary masks are also provided and are used to differentiate between the background and the cancerous region. This research proposes a novel approach to extracting patches from the whole image. The images are not downsized, in order to retain the maximum information from the original sample. Our method automatically extracts patches from the images using the image height, ImgH, and width, ImgW, as key attributes. A sliding window of size 224 × 224, with height WinH and width WinW, is used to scan the full image and extract patches. It scans the image from the top-left area to the bottom-right area with a stride of size S = 56 × 56, where StrH is the stride height and StrW is the stride width. After scanning a whole row, the window moves to the next row with the same stride and continues until the entire image is scanned. In this way, scanning the whole image extracts the maximum number of patches per image. The equations below formalize the process.
\[ \text{TotalRows}(R) = \frac{ImgH - WinH}{StrH} \tag{1} \]

\[ \text{TotalColumns}(C) = \frac{ImgW - WinW}{StrW} \tag{2} \]

\[ \text{TotalPatches}(P) = R \times C \tag{3} \]
where R and C are the numbers of patch rows and columns, used to calculate the number of patches per image, P. It is important to mention that the patch size of 224 × 224 was selected because smaller patches do not contain enough information to extract unique features, while a larger patch size increases the computational cost and training time. The most appropriate size was found after experimentation with patch sizes of 64 × 64, 128 × 128, and 512 × 512. A sketch of the extraction loop is given below.
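The sketch below follows Equations 1–3, assuming a full-resolution grayscale image and a binary ROI mask; the 50% labelling rule described in the next paragraph is included under our reading, and all identifiers are illustrative rather than taken from our code.

```python
# Sketch of the Overlapping Patch Extraction sliding window, following
# Equations 1-3 with win = 224 and stride = 56. The labelling rule is read
# as: malignant if the patch covers more than 50% of the ROI area
# (an assumption about the threshold's definition).
import numpy as np

def extract_overlapping(img, roi_mask, win=224, stride=56, roi_thresh=0.5):
    img_h, img_w = img.shape
    rows = (img_h - win) // stride     # Equation (1)
    cols = (img_w - win) // stride     # Equation (2)
    # total patches = rows * cols      # Equation (3)
    roi_area = roi_mask.sum()
    patches, labels = [], []
    for r in range(rows):
        for c in range(cols):
            y, x = r * stride, c * stride
            patch = img[y:y + win, x:x + win]
            covered = roi_mask[y:y + win, x:x + win].sum()
            malignant = roi_area > 0 and covered / roi_area > roi_thresh
            patches.append(patch)
            labels.append("malignant" if malignant else "benign")
    return patches, labels
```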
For labelling, the image annotations from the provided CSV file are used. The segmented ROI mask of the corresponding image is used to assign the labels: if a patch contains more than 50% of the ROI, it is labelled as malignant; otherwise, it is declared benign. Malignant patches are extracted first, and then the benign patches are sampled to minimize the difference between the numbers of malignant and benign patches. To increase the number of patches for training the DL models, data augmentation is used: patches are randomly selected, and one of four operations, vertical flip, horizontal flip, crop, or rotation, is performed, as sketched below.
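A minimal augmentation sketch matching the four named operations follows; the crop margin and the fixed 90-degree rotation are our assumptions, since the paper does not specify these parameters.

```python
# Sketch of the augmentation step: one of the four operations named above
# is applied to a randomly selected patch. The 10% crop margin and the
# fixed 90-degree rotation are assumptions.
import random
import cv2
import numpy as np

def augment(patch: np.ndarray) -> np.ndarray:
    op = random.choice(["vflip", "hflip", "crop", "rotate"])
    if op == "vflip":
        return np.flipud(patch)
    if op == "hflip":
        return np.fliplr(patch)
    if op == "crop":
        h, w = patch.shape[:2]
        m = h // 10                    # assumed 10% crop margin
        return cv2.resize(patch[m:h - m, m:w - m], (w, h))
    return cv2.rotate(patch, cv2.ROTATE_90_CLOCKWISE)
```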
4 EXPERIMENTAL DETAILS
We chose five CNN models to test the performance of the proposed technique: EfficientNet-V2B0, EfficientNet-V2B2, EfficientNet-V2L, ResNet-101, and MobileNet-V3L. All five models are fine-tuned from pretrained ImageNet weights. These models were chosen based on their performance in previous works that used the same dataset as ours. As discussed in the literature, these architectures were shown to have a high train/test accuracy. In addition, they are faster to train and computationally efficient.
4.1 Dataset Preparation
The dataset used in this paper was developed by (Lee et al., 2017). It is a publicly available, updated, and standardized version of the Digital Database for Screening Mammography (DDSM). The images are compressed to 5000 × 3000 pixels and converted into the Digital Imaging and Communications in Medicine (DICOM) format. Verified pathological information is present in CSV files for three different categories: normal, benign, and malignant. The database contains full mammograms, ROIs, masses, and calcifications for both benign and malignant cases. In this work, only mass images are used for training and testing. There are a total of 1,318 full-mammogram images for training and 378 for testing: 681 benign and 637 malignant images are used for training, while 231 benign and 147 malignant images are used for testing. The details are presented in Table 1.
Table 1: CBIS-DDSM dataset details used for patch extraction.

Category    Training  Testing  Total
Benign      681       231      912
Malignant   637       147      784
Table 2: Details of patches used for training and testing (Tr = training patches, Ts = testing patches, Tr Ag = augmented training patches).

Method                   Class      Tr     Ts    Tr Ag
Overlapping              Benign     968    454   1936
Overlapping              Malignant  1525   454   3050
Non-Overlapping          Benign     12099  454   -
Non-Overlapping          Malignant  926    454   -
Region-Based-Extraction  Benign     5914   1465  -
Region-Based-Extraction  Malignant  5914   1465  -
4.2 Training Setup
Five CNN models are selected for the evaluation of the proposed approach: EfficientNet-V2L, EfficientNet-V2B0, EfficientNet-V2B2, ResNet-101, and MobileNet-V3L. They are initially trained on the ImageNet dataset with input dimensions of 224 × 224 × 3, the three dimensions indicating the red, green, and blue channels. However, patches extracted from mammograms contain only a single (grey-level) channel, so all patches of size 224 × 224 × 1 are replicated across three channels to construct inputs compatible with the pre-trained CNN models. For training, we divide the dataset into training and validation sets and initialize the models with the pre-trained ImageNet weights. One pass over the training data is an epoch, and the validation set is used to determine the level of performance after each epoch. The training hyperparameters are fixed for all five models. How much the model's weights change in response to the estimated error is determined by the learning rate (LR); the LR value is 10^-4, the same for all five models. Adam, short for Adaptive Moment Estimation, is an efficient stochastic gradient descent optimizer used in training, as it takes only first-order gradients and has reduced memory requirements. A batch size of 16 is used because a larger batch size can reduce the learner's ability to generalize (Keskar et al., 2016), leading to a model becoming stuck in a local minimum; a lower batch size helps the model find global minima. All the experiments are performed on Google Colab Pro+.
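A hedged Keras sketch of this shared configuration follows; the exact API calls, the channel-stacking helper, and the inclusion of an AUC metric are our assumptions, not the authors' code.

```python
# Sketch of the shared training configuration described above: grayscale
# patches stacked to three channels, an ImageNet-pretrained backbone, Adam
# with LR 1e-4, categorical cross-entropy, and batch size 16.
import tensorflow as tf

def to_three_channels(gray_patch):
    # (224, 224, 1) grey-level patch -> (224, 224, 3) pseudo-RGB input.
    return tf.repeat(gray_patch, repeats=3, axis=-1)

def compile_model(model):
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
        loss="categorical_crossentropy",
        metrics=["accuracy", tf.keras.metrics.AUC(name="auc")],
    )
    return model

# Training would then use batches of 16, e.g.:
# model.fit(train_ds.batch(16), validation_data=val_ds.batch(16))
```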
The numbers of patches used for training the models with the Overlapping Patch Extraction, Non-Overlapping Patch Extraction, and RBE methods are provided in Table 2. In the first experiment, Non-Overlapping training patches are used. In the second experiment, patches extracted with the Overlapping Patch Extraction technique are used without an augmented dataset. The third experiment uses Overlapping Patch Extraction with the augmented dataset. In the final experiment, patches extracted with the RBE technique are used to evaluate the CNN models. The details of the experiments performed during training are listed below:
Experiment No. 1 using Non-Overlapping Patch Extraction: Equal-sized patches of 256 × 256 are extracted and used to generate a training patch dataset. Downsized images are used with a stride equal to the patch size so that only non-overlapping patches are taken. No data augmentation is performed during training, and the original patches are used to train the five CNN models. The training model consists of a feature extractor as the backbone, connected to a global average pooling layer, followed by one hidden layer and a softmax output layer (a sketch contrasting this head with the Experiment 3 head appears after this list). The model is trained with categorical cross-entropy loss and the Adam optimizer, with a dropout rate of 0.2.
Experiment No. 2 using Overlapping Patch Extraction: Patches are extracted from mammographic images of input shape 4921 × 2085. The optimal patch size of 224 × 224 is used to extract patches with a stride of 56 × 56, so overlapping patches are produced while slicing the image. These patches are used to reduce the computational cost while preserving the maximum information from the original-size input image. During training, a data augmentation pipeline including flip, rotate, and crop is applied to the patches at each epoch, so more patch variants are available to train a successful model. There are 681 benign and 968 malignant patches used for training. The model configuration used in this experiment is the same as that described in Experiment No. 1.
Experiment No. 3 using Overlapping Patches: Image scanning with the sliding-window approach is used to extract patches of size 224 × 224 with a stride of size 56 × 56; 50% overlapping patches are taken to increase the number of patches. The original dimensions of the image are used, and no downsizing is performed, to retain the maximum information within the image. Data augmentation is performed exclusively before training: one of the four different operations is selected randomly to augment each patch. In this way, a total of 1,936 benign patches and 3,050 malignant patches are used for training the models. The training model consists of a feature extractor as the network's backbone, with an input size of 224 × 224 × 3. The backbone connects to a flatten layer, followed by four hidden layers and a softmax output layer (see the sketch after this list). It is trained with the categorical cross-entropy loss and the Adam optimizer, and the dropout rate is 0.2.
Experiment No. 4 using RBE: Images are first downsized to 1152 × 896, and then patches are extracted using the RBE technique (Shen et al., 2019) as discussed previously. Patches of size 224 × 224 are used for training the five CNN models. As in the original paper, the data is not augmented explicitly before training; the patches are augmented randomly during training using one arbitrary operation among horizontal flip, vertical flip, crop, and rotate. A total of 11,828 training and 2,930 testing patches are used in this experiment. The training settings of all the models and the hyper-parameter details are the same as in Experiment No. 1.
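The sketch below contrasts the two classification heads named in the experiments; all layer widths are assumptions, since the paper does not state them.

```python
# Sketch contrasting the two heads described above. Experiments 1, 2, and 4
# use GAP -> one hidden layer -> softmax; Experiment 3 uses flatten -> four
# hidden layers -> softmax. Layer widths are assumptions.
import tensorflow as tf

def head_gap(backbone):
    return tf.keras.Sequential([
        backbone,
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(256, activation="relu"),   # assumed width
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(2, activation="softmax"),  # benign / malignant
    ])

def head_flatten(backbone):
    layers = [backbone, tf.keras.layers.Flatten()]
    for units in (512, 256, 128, 64):                    # assumed widths
        layers.append(tf.keras.layers.Dense(units, activation="relu"))
    layers.append(tf.keras.layers.Dropout(0.2))
    layers.append(tf.keras.layers.Dense(2, activation="softmax"))
    return tf.keras.Sequential(layers)
```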
5 RESULTS AND DISCUSSION
The performance of the proposed framework is measured using the following evaluation metrics: accuracy, F-score, precision, AUC, and sensitivity. Overall system performance is summarized by the AUC; the other metrics are calculated from the true positives, true negatives, false positives, and false negatives. Accuracy is defined by Equation 4 below. Here, T_Pos denotes correctly identified malignant cases, and T_Neg denotes lesions correctly diagnosed as benign. F_Pos denotes benign regions incorrectly classified as malignant, and F_Neg represents malignant lesions incorrectly classified as benign.
\[ Acc = \frac{T_{Pos} + T_{Neg}}{(T_{Pos} + F_{Neg}) + (F_{Pos} + T_{Neg})} \tag{4} \]
The AUC shows how well a model can discriminate between benign and malignant cases. It is the area under the receiver operating characteristic (ROC) curve, which plots the true-positive rate (TPR) against the false-positive rate (FPR) and is bounded by one. The TPR is known as sensitivity (recall) and is given by Equation 5:
\[ Sensitivity = \frac{T_{Pos}}{T_{Pos} + F_{Neg}} \tag{5} \]
Precision describes the ratio of correctly predicted positive cases to all predicted positive cases. By reducing the number of false positives, high precision can be achieved. It is defined by Equation 6:
\[ Precision = \frac{T_{Pos}}{T_{Pos} + F_{Pos}} \tag{6} \]
The F-score, the harmonic mean of precision and recall, summarizes the model's performance on the test dataset. It is calculated by Equation 7:
\[ \text{F-score} = 2 \times \frac{\frac{T_{Pos}}{T_{Pos}+F_{Pos}} \times \frac{T_{Pos}}{T_{Pos}+F_{Neg}}}{\frac{T_{Pos}}{T_{Pos}+F_{Pos}} + \frac{T_{Pos}}{T_{Pos}+F_{Neg}}} \tag{7} \]
The ratio of correctly predicted negatives to all actual negatives, known as specificity, is defined by Equation 8:
\[ Specificity = \frac{T_{Neg}}{T_{Neg} + F_{Pos}} \tag{8} \]
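For concreteness, a small sketch computing Equations 4–8 from the four confusion counts follows; the argument names mirror the notation above.

```python
# Sketch computing the metrics of Equations 4-8 from the confusion counts;
# argument names mirror the notation above (T_Pos, T_Neg, F_Pos, F_Neg).
def metrics(t_pos: int, t_neg: int, f_pos: int, f_neg: int):
    accuracy = (t_pos + t_neg) / ((t_pos + f_neg) + (f_pos + t_neg))   # Eq. 4
    sensitivity = t_pos / (t_pos + f_neg)                              # Eq. 5
    precision = t_pos / (t_pos + f_pos)                                # Eq. 6
    f_score = 2 * precision * sensitivity / (precision + sensitivity) # Eq. 7
    specificity = t_neg / (t_neg + f_pos)                              # Eq. 8
    return accuracy, sensitivity, precision, f_score, specificity
```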
The proposed patch extraction method discussed in this paper is based on a sliding-window approach with a patch size of 224 × 224 and is shown to improve accuracy over existing patch extraction methods. Three patch extraction approaches are compared to identify the best-performing method for onward model training. Moreover, a modified CNN classification head is used, and the highest AUC, 0.90, is obtained with the EfficientNet-V2L architecture. The results reveal that our proposed model successfully classifies breast mass ROIs into malignant and benign categories. Four experiments are performed to find the best model using the three different patch extraction methods.
The overall best-performing model with the Overlapping method on an augmented database, described in Experiment No. 3, was EfficientNet-V2L, with precision, F1, sensitivity, and specificity of 0.90, 0.90, 0.91, and 0.88, respectively, as shown in Table 3, Model A. When using this approach without augmentation of the training database (Experiment No. 2), EfficientNet-V2L achieves precision = 0.89, F1 = 0.89, sensitivity = 0.91, and specificity = 0.87, as given in Table 3, Model B. The Non-Overlapping method of Experiment No. 1 showed the lowest precision, F1, AUC, sensitivity, and specificity values, with 0.81, 0.58, 0.55, 0.10, and 0.99, respectively; see Table 3, Model C. Since the results from the Non-Overlapping method are extremely poor, augmentation was not expected to make a significant difference to the model's accuracy, and therefore no augmentation is performed for this method.
Table 3: Results of models. (A) Augmented Overlapping Patch Extraction; (B) Non-Augmented Overlapping Patch Extraction; (C) Non-Overlapping Patch Extraction; (D) Region-Based-Extraction.

Model A            Precision  F1    AUC   Accuracy  Sensitivity  Specificity
EfficientNet-V2B2  0.88       0.88  0.87  0.87      0.86         0.88
ResNet-101         0.86       0.85  0.85  0.85      0.92         0.78
EfficientNet-V2B0  0.87       0.87  0.87  0.87      0.89         0.85
MobileNet-V3L      0.81       0.80  0.79  0.79      0.68         0.91
EfficientNet-V2L   0.90       0.90  0.90  0.90      0.91         0.88

Model B            Precision  F1    AUC   Accuracy  Sensitivity  Specificity
EfficientNet-V2B2  0.84       0.84  0.84  0.84      0.82         0.85
ResNet-101         0.86       0.86  0.85  0.86      0.90         0.82
EfficientNet-V2B0  0.86       0.86  0.86  0.86      0.83         0.89
MobileNet-V3L      0.79       0.75  0.75  0.76      0.93         0.58
EfficientNet-V2L   0.89       0.89  0.88  0.88      0.91         0.87

Model C            Precision  F1    AUC   Accuracy  Sensitivity  Specificity
EfficientNet-V2B2  0.77       0.67  0.62  0.62      0.26         0.99
ResNet-101         0.47       0.49  0.50  0.50      0.00         1.00
EfficientNet-V2B0  0.77       0.65  0.61  0.61      0.22         0.99
MobileNet-V3L      0.74       0.64  0.60  0.60      0.22         0.98
EfficientNet-V2L   0.81       0.58  0.55  0.54      0.10         0.99

Model D            Precision  F1    AUC   Accuracy  Sensitivity  Specificity
EfficientNet-V2B2  0.76       0.75  0.74  0.75      0.64         0.84
ResNet-101         0.77       0.75  0.75  0.75      0.86         0.64
EfficientNet-V2B0  0.75       0.74  0.73  0.74      0.65         0.82
MobileNet-V3L      0.77       0.77  0.77  0.77      0.81         0.72
EfficientNet-V2L   0.78       0.78  0.78  0.78      0.76         0.79
Figure 2: ROC analysis. (a) Augmented Overlapping Patch Extraction; (b) Non-Augmented Overlapping Patch Extraction; (c) Non-Overlapping Patch Extraction; (d) the RBE state-of-the-art method.
We also tested RBE, in Experiment No. 4, and the results in Table 3, Model D reveal the low performance of all five CNN models with this technique. The maximum precision is 0.78, sensitivity reaches 0.76, F1 is 0.78, and specificity is 0.79; the AUC and accuracy are both 0.78, obtained with the EfficientNet-V2L model. The test dataset is created separately for this approach because the images are downsized initially, as described in the original paper; in the other two approaches, patches are extracted from the full-size images.
The proposed framework using the Overlapping technique with an augmented database, trained on the EfficientNet-V2L architecture, reveals the highest AUC of 0.90 when tested on the CBIS-DDSM dataset, as shown in Figure 2(a). A decrease in the AUC is observed with the same Overlapping method when used without an augmented dataset; the highest AUC there is 0.88, again with EfficientNet-V2L, as seen in Figure 2(b). The RBE technique reveals mediocre results: Figure 2(d) shows a maximum AUC of 0.78, achieved with the EfficientNet-V2L model. The worst results are obtained from the Non-Overlapping technique, where the AUC barely reaches 0.55 for the same deep model, as shown in Figure 2(c).
The proposed framework of patch-based CNN training shows promising results; however, a direct comparison with other proposed approaches is difficult as there are some differences in the testing scenarios. Results from the proposed method are competitive and hold the potential to outperform the existing techniques when the same testing parameters
are applied.
6 CONCLUSION AND FUTURE WORK
This study proposes a patch-based CNN model training technique to classify breast mammograms into benign or malignant categories and tests it on a publicly available dataset of mammograms, CBIS-DDSM, used here to classify cancerous and non-cancerous regions. Our proposed system extracts overlapping patches using the Overlapping Patch Extraction method, which we compare with the Non-Overlapping Patch Extraction approach and the state-of-the-art Region-Based-Extraction approach. The state-of-the-art approach downsizes the images, which may result in the loss of discriminative features, whereas full-size images are used in this work for patch extraction. The patches are labelled based on an ROI threshold using the segmented masks. The latest CNN models are explored to test the performance of the proposed technique. In our suggested Overlapping method, whole images are scanned using the sliding-window approach, and a patch database is created for training. The best results are obtained using an augmented version of our proposed approach, the Overlapping Patch Extraction method trained on the EfficientNet-V2L architecture, revealing an AUC of 0.90.
In the future, a density-based patch extraction technique could extract more informative patches to help improve the model's performance. Moreover, Generative Adversarial Networks (GANs) could be used to generate synthetic data that can directly contribute to the successful training of DL models.
ACKNOWLEDGEMENTS
This work was conducted with the financial support of the Science Foundation Ireland Centre for Research Training in Artificial Intelligence under Grant No. 18/CRT/6223. Moreover, we would like to thank Naveed Shahid and Allan de Lima for their immense support.
REFERENCES
Agarwal, R., Diaz, O., Lladó, X., Yap, M. H., and Martí, R. (2019). Automatic mass detection in mammograms using deep convolutional neural networks. Journal of Medical Imaging, 6(3):031409.

Al-Antari, M. A., Han, S.-M., and Kim, T.-S. (2020). Evaluation of deep learning detection and classification towards computer-aided diagnosis of breast lesions in digital x-ray mammograms. Computer Methods and Programs in Biomedicine, 196:105584.

El Houby, E. M. and Yassin, N. I. (2021). Malignant and nonmalignant classification of breast lesions in mammograms using convolutional neural networks. Biomedical Signal Processing and Control, 70:102954.

Keskar, N. S., Mudigere, D., Nocedal, J., Smelyanskiy, M., and Tang, P. T. P. (2016). On large-batch training for deep learning: Generalization gap and sharp minima. arXiv preprint arXiv:1609.04836.

Lee, R. S., Gimenez, F., Hoogi, A., Miyake, K. K., Gorovoy, M., and Rubin, D. L. (2017). A curated mammography data set for use in computer-aided detection and diagnosis research. Scientific Data, 4(1):1–9.

Mahmood, T., Li, J., Pei, Y., Akhtar, F., Rehman, M. U., and Wasti, S. H. (2022). Breast lesions classifications of mammographic images using a deep convolutional neural network-based approach. PLoS ONE, 17(1):e0263126.

Otsu, N. (1979). A threshold selection method from gray-level histograms. IEEE Transactions on Systems, Man, and Cybernetics, 9(1):62–66.

Petrini, D. G., Shimizu, C., Roela, R. A., Valente, G. V., Folgueira, M. A. A. K., and Kim, H. Y. (2022). Breast cancer diagnosis in two-view mammography using end-to-end trained efficientnet-based convolutional network. IEEE Access, 10:77723–77731.

Shen, L., Margolies, L. R., Rothstein, J. H., Fluder, E., McBride, R., and Sieh, W. (2019). Deep learning to improve breast cancer detection on screening mammography. Scientific Reports, 9(1):1–12.

Soulami, K. B., Kaabouch, N., Saidi, M. N., and Tamtaoui, A. (2021). Breast cancer: One-stage automated detection, segmentation, and classification of digital mammograms using unet model based-semantic segmentation. Biomedical Signal Processing and Control, 66:102481.

Yu, X., Pang, W., Xu, Q., and Liang, M. (2020). Mammographic image classification with deep fusion learning. Scientific Reports, 10(1):1–11.

Zahoor, S., Shoaib, U., and Lali, I. U. (2022). Breast cancer mammograms classification using deep neural network and entropy-controlled whale optimization algorithm. Diagnostics, 12(2):557.