Deep Learning in Breast Calciﬁcations Classiﬁcation: Analysis of

Cross-Database Knowledge Transferability

Adam Mra

cko

1 a

, Ivan Cimr

1 b

, Lucia Vanov

canov

2,3 c

and Viera Lehotsk

2,3 d

Faculty of Management Science and Informatics, University of

Zilina, 010 26

Zilina, Slovakia

2nd Radiology Department, Faculty of Medicine, Comenius University in Bratislava, 813 72 Bratislava, Slovakia

St. Elizabeth Cancer Institute, 812 50 Bratislava, Slovakia

Keywords:

Convolutional Neural Networks, Artiﬁcial Intelligence, Mammography, Machine Learning, Breast

Calciﬁcations.

Abstract:

Study delves into the application of deep learning models for the classiﬁcation of breast calciﬁcations in mam-

mography images. Initial objective was to investigate various convolutional neural network (CNN) architec-

tures and their inﬂuence on model accuracy. ResNet101 emerged as the most effective architecture, although

other models exhibited comparable performances. The insights gained were subsequently applied to the main

goal, which focused on examining the transferability of knowledge between models trained on digitalized

ﬁlms (Curated Breast Imaging Subset of Digital Database for Screening Mammograph) and those trained on

digital mammography images (Optimam Database). Results conﬁrmed the lack of seamless transferability,

prompting the creation of a combined dataset for training, signiﬁcantly improving overall model accuracy to

76.2%. The study also scrutinized instances of incorrect predictions across different models, particularly those

posing challenges even for medical professionals. Visualizations using Grad-Cam aided in understanding the

models’ decision-making process.

1 INTRODUCTION

Breast cancer is the most common type of cancer

among women (Sung et al., 2021). Early detection

through mammographic screening leads to prompt

treatment and better patient prognosis. Many coun-

tries have implemented mammographic screening to

detect individuals with carcinoma before the onset

of symptoms, starting from the age of 45. Common

ﬁndings in mammography include masses, calciﬁca-

tions (macro- and micro-), architectural distortions,

and asymmetries. Mammography excels in detect-

ing pathological microcalciﬁcations, with their detec-

tion often leading to the discovery of ductal carci-

noma in situ (DCIS), a pre-invasive type of breast

cancer that can progress to a more dangerous inva-

sive type. Approximately 80%-90% of DCIS cases

are diagnosed through mammography (Grimm et al.,

2022), accounting for about 20-30% of all breast can-

https://orcid.org/0009-0004-6538-6896

https://orcid.org/0000-0002-0389-7891

https://orcid.org/0000-0003-2363-1238

https://orcid.org/0000-0003-4083-8097

cer types (Allred, 2010).

Diagnosing microcalciﬁcations is complex due to

variations in shape, density, size, number, and dis-

tribution—either diffuse or clustered. The challeng-

ing diagnosis of suspicious ﬁndings results in a high

number of false positives, with only about 15%-

45% (Chhatwal et al., 2010) of biopsy cases turn-

ing out positive. Waiting for biopsy results nega-

tively impacts patients’ health due to increased stress.

Mammographic examinations undergo double read-

ing, where two independent radiologists assess pa-

tient images and must agree on the ﬁnal evaluation.

The high patient volume, demanding diagnostics, and

double reading contribute to a heavy workload for

medical personnel.

Introducing artiﬁcial intelligence models into the

examination process could potentially expedite the

entire procedure and, with high accuracy in classiﬁ-

cation tasks, even emulate the second doctor in dou-

ble reading. Convolutional Neural Networks (CNN)

are the most suitable models, currently unparalleled

in image data processing tasks. CNNs can handle

classiﬁcation, detection, and segmentation tasks. This

study focuses on the classiﬁcation of ﬁndings with

Mra

cko, A., Cimrák, I., Vanov

canová, L. and Lehotská, V.

Deep Learning in Breast Calciﬁcations Classiﬁcation: Analysis of Cross-Database Knowledge Transferability.

DOI: 10.5220/0012535200003657

Paper published under CC license (CC BY-NC-ND 4.0)

In Proceedings of the 17th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2024) - Volume 1, pages 527-535

ISBN: 978-989-758-688-0; ISSN: 2184-4305

527

microcalciﬁcations, speciﬁcally binary classiﬁers de-

termining whether a given ﬁnding belongs to the ma-

lignant or benign class. It will utilize two databases

with mammography images obtained through differ-

ent technologies. The analysis will focus on the trans-

ferability of knowledge between models trained on

older technology (digitalized ﬁlms) and modern tech-

nology (digital mammography, also known as full-

ﬁeld digital mammography). Additionally, it ad-

dresses how to manage high-resolution mammogra-

phy images and interpret models to ensure they make

decisions based on crucial radiological features.

2 MAMMOGRAPHY DATA

The mammography images used were obtained

from the Curated Breast Imaging Subset of Digi-

tal Database for Screening Mammography (CBIS-

DDSM) (Lee et al., 2017) and the Optimam database

(OMI-DB) (Halling-Brown et al., 2021).

CBIS-DDSM is a subset of the Digital Database

for Screening Mammography (DDSM), which has

been updated and standardized. DDSM contains

digitalized screen ﬁlms (indirect digital mammogra-

phy). In CBIS-DDSM, these images were converted

to the DICOM format, which is the current standard

in medicine. The database is divided into two main

groups based on the type of ﬁndings: masses and

calciﬁcations. Each ﬁnding includes a segmentation

mask (Figure 1) and a histopathological result. The

database also includes a split into training and testing

sets.

Figure 1: Mammogram (left) and binary mask (right).

OMI-DB is a comprehensive database that con-

tinuously collects images with associated data

(histopathology results) from several oncology insti-

tutes in the United Kingdom. Unlike CBIS-DDSM, it

is not freely available, and access requires afﬁliation

with a commercial, academic, or non-proﬁt organiza-

tion. Mammographic images come from direct digital

mammography, which is a signiﬁcant advantage com-

pared to CBIS-DDSM. OMI-DB includes all types of

ﬁndings (masses, calciﬁcations, architectural distor-

tions, asymmetries, and their combinations), as well

as images of patients without a record of the biopsy

performed. Findings can be localized with rectangle

bounding boxes (coordinates of the top left corner and

bottom right corner).

2.1 Data Preprocessing

The study focused on patches with ﬁndings of cal-

ciﬁcations. The ﬁrst step in data preprocessing was

to ﬁlter the correct data from the used databases.

CBIS-DDSM provided a direct distribution contain-

ing only calciﬁcation ﬁndings. For OMI-DB, data

needed more complex ﬁltering. The ﬁlter included

the following conditions:

• Only images with a single bounding box (had to

be present on the image and have a non-zero area).

• Only calciﬁcation ﬁndings (without various com-

binations with other ﬁndings).

• We accepted only the ﬁndings with which the

histopathological result could be unambiguously

associated.

• Accepted histopathological result values were

only Malignant or Benign.

From the obtained dataset, all ﬁndings that would

not ﬁt into a square (patch) of size 674x674 pixels due

to their size were removed (Figure 3, 4, 5, 6). The de-

cision to use the resolution of 674x674 pixels as input

for models was based on histograms, aiming to re-

move as few ﬁndings as possible while not exceeding

GPU memory limitations. More details on why not to

resize patches and why not to use entire images are in

Section 3.1.

Figure 2: Patch creation from whole mammogram.

For images with dimensions smaller than

674x674, the surrounding area from the mammogram

was added (Figure 2). If possible, the patches were

centered on the lesion. In the case of ﬁndings at

the edges of the mammogram, the patch was shifted

more towards the inside of the mammogram. Before

BIOINFORMATICS 2024 - 15th International Conference on Bioinformatics Models, Methods and Algorithms

528

creating patches, mammograms were normalized to

values between 0 and 1.

Figure 3: Resolution distribution of calciﬁcation ﬁndings in

the CBIS-DDSM.

Figure 4: Resolution distribution of calciﬁcation ﬁndings,

not higher than 674x674 pixels, in the CBIS-DDSM.

The CBIS-DDSM database contained numerous

masks with different resolutions compared to the cor-

responding mammograms. These masks were scaled

to the correct resolution. Additionally, about 30 addi-

tional adjustments were made, involving slight shifts

of masks located next to the ﬁnding. If one mammo-

gram had multiple masks and their ﬁndings were too

close, the masks were uniﬁed. Some ﬁndings were re-

moved if the mask did not contain any calciﬁcations.

The OMI-DB database contained several inverted

images, which were corrected using inversion. Im-

ages with lower quality and incorrect gray back-

grounds were discovered and retained in the dataset

(digital mammography should contain a black back-

ground - value 0, or completely white for inverted im-

ages).

After data processing, 2947 training data (OMI-

DB: 1786, CBIS-DDSM: 1161) and 684 validation

data (OMI-DB: 443, CBIS-DDSM: 241) were ob-

tained. For CBIS-DDSM data, their ofﬁcial distri-

bution into training/testing sets was used. In both

databases, the classes were imbalanced (Table 1). The

data between the training and validation sets were in-

dependent, and thus, the same patient could not be

in both the validation and training sets. Methods like

k-fold cross-validation were not used due to the time-

consuming nature of experiments on high-resolution

images.

Table 1: Amount of data in speciﬁed classes.

Dataset Malignant Benign

OMI-DB train 1329 457

OMI-DB val. 327 116

CBIS-DDSM train 309 852

CBIS-DDSM val. 74 167

Figure 5: Resolution distribution of calciﬁcation ﬁndings in

the OMI-DB.

Figure 6: Resolution distribution of calciﬁcation ﬁndings,

not higher than 674x674 pixels, in the OMI-DB.

3 EXPERIMENTAL STUDIES

The study focused on four types of experiments. Ini-

tially, it was essential to explore how the downsiz-

ing of patches would impact the accuracy of the mod-

els. The goal of the second experiment was to ﬁnd

a suitable convolutional architecture for classifying

patches. The last two experiments took advantage

of access to the two databases with mammograms

obtained using different technologies. Firstly, we

observed how models performed with images from

a different technology. Subsequently, we analyzed

patches for which predictions were consistently incor-

rect across all models.

3.1 Image Downscaling

In convolutional neural networks (CNNs), it is com-

mon to downscale input images for various bene-

ﬁts, such as faster training and reduced GPU mem-

ory requirements. The typical input resolution

used is 224x224 pixels. However, mammography

images have very high resolutions, often exceed-

ing 4000x4000 pixels. Shrinking mammograms to

224x224 would result in signiﬁcant information loss.

Deep Learning in Breast Calciﬁcations Classiﬁcation: Analysis of Cross-Database Knowledge Transferability

529

While reducing to half or quarter size may seem like

a solution, training models on entire images for clas-

siﬁcation tasks (e.g., presence or absence of cancer)

would not yield reasonable accuracy due to limited

data, image complexity, and information loss.

Therefore, this study focused on classifying

patches from mammograms containing calciﬁcations.

Speciﬁcally, it dealt with ﬁndings of calciﬁcations

that ﬁt into patches with a resolution of 674x674 pix-

els. Using patches helps models understand what is

essential and where to focus. A model trained on such

patches could later be transformed to process the en-

tire input, possibly through an end-to-end approach

introduced in (Shen et al., 2019).

The patches in our study had high resolution, pos-

ing hardware demands. An experiment was con-

ducted to assess the impact of reducing patch resolu-

tion to 224x224 pixels. Table 2 shows relative differ-

ences in accuracy between models trained on patches

with resolutions of 674x674 and 224x224 pixels. The

ResNet50 architecture was used, and each hyperpa-

rameter setting underwent three training runs to im-

prove statistical sampling. On average, there was

a relative accuracy decrease of 1.51% with reduced

patch resolutions.

It was observed that downsizing, whether the en-

tire image or patches, is not suitable as it leads to the

loss of crucial details. Similar behavior has been seen

in other experiments with other architectures as well.

Several studies, including (Geras et al., 2017), have

addressed the inadequacy of downsizing mammogra-

phy images.

3.2 Convolutional Architectures

In general, the most signiﬁcant contributors to the

overall accuracy of models are the training data, the

chosen architecture, and the proper setting of the

learning rate. Therefore, the next series of exper-

iments focused on trying different well-known ar-

chitectures with varying learning rate values. Ar-

chitectures tested included VGG, Inception, ResNet,

DenseNet, and EfﬁcientNet.

3.2.1 Architectures Description

• Year 2014 - VGG (Visual Geometry Group) (Si-

monyan and Zisserman, 2015): Known for its

simplicity, which can be a signiﬁcant advantage

for implementing various methods (such as model

interpretation methods). It was the ﬁrst deeper ar-

chitecture (up to 19 layers). More prone to over-

ﬁtting, computationally expensive.

• Year 2015 - Inception (GoogLeNet) (Szegedy

et al., 2014): Uses inception modules with mul-

tiple ﬁlter sizes (1x1, 3x3, 5x5) in parallel. Aims

to capture different scales of information simulta-

neously. Increased computational complexity.

• Year 2016 - ResNet (Residual Network) (He et al.,

2015): Introduced skip connections, enabling the

creation of very deep networks (up to 152 lay-

ers) at the cost of greater model complexity. Until

this point, the major issue with deep CNN net-

works was the vanishing/exploding gradient prob-

lem (Glorot and Bengio, 2010). Adding more lay-

ers improved performance.

• Year 2017 - DenseNet (Densely Connected Con-

volutional Network) (Huang et al., 2016): In-

troduces dense connectivity where each layer re-

ceives inputs from all preceding layers. Reduces

vanishing gradient problems and promotes feature

reuse. Higher memory consumption due to dense

connectivity. Computationally more intensive.

• Year 2019 - EfﬁcientNet (Tan and Le, 2019):

Employs a compound scaling method to balance

model width, depth, and resolution. Achieves bet-

ter performance with fewer parameters. Improved

efﬁciency in terms of accuracy and computational

cost.

3.2.2 Architectures Experiments

Most architectures have several versions that differ in

the number of trainable parameters. All versions (pro-

vided in the PyTorch (Paszke et al., 2019) library) that

ﬁt into the GPU memory capacity (RTX 4080 16GB)

with a set mini-batch of 8 were tried. For DenseNet,

only version 121 was used, and for EfﬁcientNet, ver-

sions B0 to B2 were tested. Inception had only one

implementation, V3.

Pre-trained weights on the ImageNet dataset were

used for each architecture. The use of pre-trained

weights resulted in faster training and better ﬁnal ac-

curacy. The following learning rate values were tested

with the Adam optimizer: 1e-2, 1e-3, 1e-4, 1e-5, 1e-

6, 1e-7. The early stopping technique was also em-

ployed. The top 3 models for each architecture are

shown in Tables 3, 4, 5, 6, 7.

From the results, it was observed that for our

combined dataset from CBIS-DDSM and OMI-DB

databases, it is most suitable to use learning rate val-

ues ranging from 1e-5 to 1e-6. Smaller values resulted

in signiﬁcant accuracy oscillations during training,

while larger values considerably extended the train-

ing duration and couldn’t achieve as high accuracy as

the mentioned values.

Surprisingly, all architectures performed very sim-

ilarly. The top 8 models (Table 8) included all ar-

BIOINFORMATICS 2024 - 15th International Conference on Bioinformatics Models, Methods and Algorithms

530

Table 2: Accuracy comparison of patches with different resolution.

Patch Resolution 674x674 Patch Resolution 224x224 Difference

LR Avg. Val. Acc. Best Val. Acc. Avg. Val. Acc. Best Val. Acc.

1e-6 75,4% 76,0% -2,79% -3,03%

1e-5 74,9% 75,4% -2,00% -1,72%

1e-3 72,5% 72,8% -0,97% -0,82%

1e-4 72,3% 72,8% -0,28% -0,41%

Table 3: Top 3 models - ResNet architecture.

Model LR Val. Acc. Train Acc.

ResNet101 1e-6 77,2% 81,3%

ResNet50 1e-6 75,9% 79,9%

ResNet152 1e-6 75,7% 90,9%

Table 4: Top 3 models - Inception architecture.

Model LR Val. Acc. Train Acc.

Inception-V3 1e-4 75,4% 80,8%

Inception-V3 1e-5 75,1% 78,4%

Inception-V3 1e-6 73,8% 78,0%

chitectures except Inception-V3. The average accu-

racy of the top 8 was 76.2%. It is important to note

that due to the small amount of training data, the ﬁ-

nal accuracy of the same model may vary by up to +-

1.5% after different training runs. In our experiment,

the ResNet101 architecture proved to be the best with

77,2% validation accuracy.

3.3 Cross-Dataset Generalization Study

The main focus of this experiment was to train three

models, each on a different dataset:

• CBIS-DDSM model – trained on CBIS-DDSM

training data.

• OMI-DB model – trained on OMI-DB training

data.

• Combined model – trained on data from both

databases.

The goal was to observe how these models per-

form on validation sets from CBIS-DDSM, OMI-DB,

and both databases combined. The experiment aimed

to answer whether a model trained on digitalized

screen ﬁlms (CBIS-DDSM) could be transferable to

data from modern digital mammography (OMI-DB)

without any ﬁne-tuning, and vice versa. Additionally,

it sought to determine if combining databases would

lead to better accuracies.

The ResNet101 architecture was used with learn-

ing rate, which proved to be the best in previous ex-

periments with architectures. Different class weights

Table 5: Top 3 models - VGG architecture.

Model LR Val. Acc. Train Acc.

VGG-16-BN 1e-6 75,7% 83,4%

VGG-19-BN 1e-6 75,7% 81,5%

VGG-13 1e-6 75,6% 83,4%

Table 6: Top 3 models - EfﬁcientNet architecture.

Model LR Val. Acc. Train Acc.

EfﬁcientNet-B1 1e-5 75,9% 78,7%

EfﬁcientNet-B2 1e-5 75,4% 89,5%

EfﬁcientNet-B0 1e-5 75,3% 82,1%

during training were applied due to the high imbal-

ance in the datasets. The settings were as follows:

• CBIS-DDSM model [Benign - 0.266, Malign -

0.734]

• OMI-DB model [Benign - 0.744, Malign - 0.256]

• Combined model [Benign - 0.556, Malign -

0.444]

Training occurred for 40 epochs, with models be-

ing saved and relevant statistics computed after each

epoch.

For the CBIS-DDSM model, the experiment re-

sults are presented in Table 9. Table shows only

epochs where the accuracy on individual validation

sets was the highest. At ﬁrst glance, it may seem that

the model performed better on the OMI-DB valida-

tion set than on its own. However, in that epoch, the

model assigned 97.7% of OMI-DB data to the malig-

nant class, resulting in an extremely low speciﬁcity

of 2.58%. On its own data, the database achieved

more reasonable results shown in Figure 7. Gener-

ally, the model tended to classify more towards the

benign class.

The OMI-DB model had a similar outcome to the

CBIS-DDSM model. According to Table 10, the best

achieved accuracy for CBIS-DDSM data was only

47.7%, which is practically unusable. As expected,

on its own validation data, the model performed bet-

ter (Figure 8), but it still leaned towards predicting the

malignant class.

The use of a combined training set signiﬁcantly

contributed to improving accuracy (Table 11). Nat-

ural class balancing also played a role. The confu-

Deep Learning in Breast Calciﬁcations Classiﬁcation: Analysis of Cross-Database Knowledge Transferability

531

Table 7: Top 3 models - DenseNet architecture.

Model LR Val. Acc. Train Acc.

DenseNet-121 1e-6 76,5% 85,5%

DenseNet-121 1e-5 76,3% 83,1%

DenseNet-121 1e-4 74,0% 87,1%

Table 8: Top 8 models - all architectures.

Model LR Val. Acc. Train Acc.

ResNet101 1e-6 77,2% 81,3%

DenseNet-121 1e-6 76,5% 85,5%

DenseNet-121 1e-5 76,3% 83,1%

EfﬁcientNet-B1 1e-5 75,9% 78,7%

ResNet50 1e-6 75,9% 79,9%

ResNet152 1e-6 75,7% 90,9%

VGG-16-BN 1e-6 75,7% 83,4%

Figure 7: Confusion matrix of the CBIS-DDSM model and

its validation set.

Figure 8: Confusion matrix of the OMI-DB model and its

validation set.

sion matrix of the combined validation sets (Figure 9)

shows that the model no longer strongly prefers either

the benign or malignant class.

From the observed experiments, it can be con-

ﬁrmed that models trained on separate databases

struggle to generalize well to data acquired using

different technologies. The creation of a combined

dataset led to an overall improvement in accuracy.

Table 9: Validation accuracies on the CBIS-DDSM model.

CBIS-DDSM Model

Validation Best Acc Epoch

CBIS-DDSM 70,5% 35

OMI-DB 72,9% 20

Combined 70,8% 26

Table 10: Validation accuracies on the OMI-DB model.

OMI-DB Model

Validation Best Acc Epoch

CBIS-DDSM 47,7% 16

OMI-DB 72,2% 17

Combined 61,1% 19

Figure 9: Confusion matrix of the Combined model and its

validation set.

3.4 Intersection of Incorrect Predictions

The last part of the study focused on incorrect pre-

dictions using models from the previous experiment

(CBIS-DDSM, OMI-DB, and Combined Model). For

each validation set, the epoch with the highest accu-

racy was chosen. It was revealed that models often

struggled with predictions on the same data, despite

the CBIS-DDSM model preferring predictions into

the benign class and the OMI-DB model into the ma-

lignant class.

The CBIS-DDSM validation set contained 17

patches with incorrect predictions across all three

models. On average, this constituted 22.6% of data

with incorrect predictions (false positives + false neg-

atives). OMI-DB had as many as 59 such patches,

averaging 50.9% of data with incorrect predictions.

For a deeper analysis of problematic data, the in-

terpretation method Grad-Cam (Gildenblat and con-

tributors, 2021) was used to visualize important re-

gions in the input patch that contributed most to the

ﬁnal decision. In the medical ﬁeld, explaining models

is essential, ensuring that the model genuinely makes

decisions based on what is important.

The analysis revealed that most CBIS-DDSM

BIOINFORMATICS 2024 - 15th International Conference on Bioinformatics Models, Methods and Algorithms

532

Table 11: Validation accuracies on the Combined model.

Combined Model

Validation Best Acc Epoch

CBIS-DDSM 76,8% 10

OMI-DB 76,1% 9

Combined 76,2% 10

Figure 10: Patch with dense tissue.

patches were of lower quality (less sharp images)

and contained dense ﬁbroglandular tissue, potentially

complicating the detection of small abnormalities

(Figure 10). The OMI-DB model performed the

worst, often failing to correctly detect clusters of mi-

crocalciﬁcations. Even when it managed to identify

a crucial area in the patch, the result was inﬂuenced

by a large amount of irrelevant tissue (Figure 11). In

comparison, the Combined model performed better,

marking the signiﬁcant area more accurately (Figure

12). However, the result for both models was an in-

correct malignant prediction. Both Combined and

CBIS-DDSM models also had trouble detecting the

crucial area in some patches.

Figure 11: Incorrect prediction of CBIS-DDSM patch with

OMI-DB model.

On OMI-DB patches, the models performed sig-

niﬁcantly better. In almost all cases, the models man-

aged to identify important clusters of microcalciﬁca-

tions. The CBIS-DDSM model had the least accurate

detection, giving importance to areas with only fatty

or ﬁbroglandular tissue without visible abnormalities.

For comparison, the CBIS-DDSM model (Figure 13)

and the Combined model (Figure 14) were examined.

Figure 12: Incorrect prediction of CBIS-DDSM patch with

Combined model.

Almost all 59 patches belonged to the benign class,

but the models classiﬁed them as malignant. This fact

was discussed with breast radiologists, clarifying that

it is very challenging or even impossible to determine

the correct class for patches with which the models

struggled using mammography alone. In such cases,

a biopsy is necessary for the most accurate determina-

tion of whether the ﬁnding contains cancerous tissue.

Figure 13: Incorrect prediction of OMI-DB patch with

CBIS-DDSM model.

Figure 14: Incorrect prediction of OMI-DB patch with

Combined model.

This part of the study demonstrated that using

a combined dataset during training positively con-

tributes to more accurate detection of important areas

in patches. However, it is essential to note that the pri-

mary purpose of the models is not detection but rather

the classiﬁcation of patches.

Deep Learning in Breast Calciﬁcations Classiﬁcation: Analysis of Cross-Database Knowledge Transferability

533

4 CONCLUSIONS

The ﬁrst goal of the study was to investigate the im-

pact of different architectures on the resulting accu-

racy of the models. The best-performing architecture

was ResNet101, but other architectures achieved very

comparable accuracies.

The second goal aimed to explore the transferabil-

ity of knowledge from a model trained on digitalized

ﬁlms (indirect digital mammography) to direct digital

mammography images and vice versa. It was con-

ﬁrmed that models are not transferable to data ob-

tained using different technology. Combining these

training data into a uniﬁed dataset signiﬁcantly con-

tributed to the overall improvement of model accu-

racy. Such a model achieved an accuracy of 76.2

The ﬁnal part involved examining patches with

incorrect predictions, speciﬁcally focusing on those

where the prediction was incorrect across all tested

models. The results were discussed with radiologists,

conﬁrming that many patches incorrectly classiﬁed as

malignant pose a signiﬁcant challenge even for med-

ical professionals and cannot be classiﬁed without a

tissue biopsy.

During the experiments, it was observed that the

decision-making in some patches involved the area

around the ﬁnding, which did not contain abnormali-

ties. This behavior could potentially be addressed, for

example, by adding a third class containing patches

from healthy tissue. Adding such a class will be the

subject of our next study.

There is a relatively wide scope for improving re-

sults, including better hyperparameter optimization,

adding augmented data, or incorporating regulariza-

tion methods. However, the primary intent of this

work was to explore the questions outlined in the

stated goals.

It is important to note that the created models may

be biased, as all training/validation data used had un-

dergone a biopsy. This means they represent ﬁndings

where doctors were uncertain whether the abnormal-

ity was benign or malignant.

ACKNOWLEDGEMENTS

This research was supported by the Ministry of Ed-

ucation, Science, Research and Sport of the Slovak

Republic under the contract No. VEGA 1/0525/23.

REFERENCES

Allred, D. (2010). Ductal carcinoma in situ: Terminology,

classiﬁcation, and natural history. Journal of the Na-

tional Cancer Institute. Monographs, 2010:134–8.

Chhatwal, J., Alagoz, O., and Burnside, E. S. (2010). Opti-

mal breast biopsy decision-making based on mammo-

graphic features and demographic factors. Operations

research, 58(6):1577–1591.

Geras, K. J., Wolfson, S., Kim, S. G., Moy, L., and Cho,

K. (2017). High-resolution breast cancer screening

with multi-view deep convolutional neural networks.

ArXiv, abs/1703.07047.

Gildenblat, J. and contributors (2021). Pytorch li-

brary for cam methods. https://github.com/jacobgil/

pytorch-grad-cam.

Glorot, X. and Bengio, Y. (2010). Understanding the dif-

ﬁculty of training deep feedforward neural networks.

Journal of Machine Learning Research - Proceedings

Track, 9:249–256.

Grimm, L. J., Rahbar, H., Abdelmalak, M., Hall, A. H., and

Ryser, M. D. (2022). Ductal carcinoma in situ: State-

of-the-art review. Radiology, 302(2):246–255. PMID:

34931856.

Halling-Brown, M. D., Warren, L. M., Ward, D., Lewis,

E., Mackenzie, A., Wallis, M. G., Wilkinson, L. S.,

Given-Wilson, R. M., McAvinchey, R., and Young,

K. C. (2021). Optimam mammography image

database: A large-scale resource of mammography

images and clinical data. Radiology: Artiﬁcial Intelli-

gence, 3(1):e200103. PMID: 33937853.

He, K., Zhang, X., Ren, S., and Sun, J. (2015). Deep

residual learning for image recognition. CoRR,

abs/1512.03385.

Huang, G., Liu, Z., and Weinberger, K. Q. (2016).

Densely connected convolutional networks. CoRR,

abs/1608.06993.

Lee, R., Gimenez, F., Hoogi, A., Miyake, K., Gorovoy, M.,

and Rubin, D. (2017). A curated mammography data

set for use in computer-aided detection and diagnosis

research. Scientiﬁc Data, 4:170–177.

Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J.,

Chanan, G., Killeen, T., Lin, Z., Gimelshein, N.,

Antiga, L., Desmaison, A., Kopf, A., Yang, E., De-

Vito, Z., Raison, M., Tejani, A., Chilamkurthy, S.,

Steiner, B., Fang, L., Bai, J., and Chintala, S. (2019).

Pytorch: An imperative style, high-performance deep

learning library. In Advances in Neural Information

Processing Systems 32, pages 8024–8035. Curran As-

sociates, Inc.

Shen, L., Margolies, L., Rothstein, J., Fluder, E., McBride,

R., and Sieh, W. (2019). Deep learning to improve

breast cancer detection on screening mammography.

Scientiﬁc Reports, 9:1–12.

Simonyan, K. and Zisserman, A. (2015). Very deep convo-

lutional networks for large-scale image recognition.

Sung, H., Ferlay, J., Siegel, R. L., Laversanne, M., Soerjo-

mataram, I., Jemal, A., and Bray, F. (2021). Global

cancer statistics 2020: Globocan estimates of in-

cidence and mortality worldwide for 36 cancers in

BIOINFORMATICS 2024 - 15th International Conference on Bioinformatics Models, Methods and Algorithms

534

185 countries. CA: a cancer journal for clinicians,

71(3):209—249.

Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S. E.,

Anguelov, D., Erhan, D., Vanhoucke, V., and Rabi-

novich, A. (2014). Going deeper with convolutions.

CoRR, abs/1409.4842.

Tan, M. and Le, Q. V. (2019). Efﬁcientnet: Rethink-

ing model scaling for convolutional neural networks.

CoRR, abs/1905.11946.

Deep Learning in Breast Calciﬁcations Classiﬁcation: Analysis of Cross-Database Knowledge Transferability

535