Selection of Backbone for Feature Extraction with U-Net in Pancreas
Segmentation
Alexandre de Carvalho Araújo a, João Dallyson Sousa de Almeida b, Anselmo Cardoso de Paiva c and Geraldo Braz Junior d
Applied Computing Group (NCA), Federal University of Maranhão (UFMA), São Luís, MA, Brazil
a https://orcid.org/0000-0002-0250-6211
b https://orcid.org/0000-0001-7013-9700
c https://orcid.org/0000-0003-4921-0626
d https://orcid.org/0000-0003-3731-6431
Keywords:
Image Segmentation, Pancreas Segmentation, U-Net.
Abstract:
The survival rate for pancreatic cancer is among the worst, with a mortality rate of 98%. Diagnosis in the
early stage of the disease is the main factor that defines the prognosis. Imaging scans, such as Computerized
Tomography scans, are the primary tools for early diagnosis. Computer-Aided Diagnosis tools that use these
scans usually include pancreas segmentation as one of the initial steps of their pipeline.
This paper presents a comparative study of the use of different backbones in combination with the U-Net. This
study aims to demonstrate that using pre-trained backbones is a valuable tool for pancreas segmentation and
to provide a comparative benchmark for this task. The best result obtained was a Dice of 85.96% on the MSD
dataset for pancreas segmentation using the efficientnetb7 backbone.
1 INTRODUCTION
Compared to other cancers, pancreatic cancer is rela-
tively rare. Some symptoms associated with this can-
cer are weight loss, jaundice, pain, anemia, and oth-
ers. In Brazil alone, this type of cancer accounts for
about 1% of all cancer diagnoses, but it represents 5%
of all cancer deaths in the country. The prognosis for
pancreatic cancer is unfavorable. Among all tumor
forms, pancreatic cancer has one of the lowest sur-
vival rates, with a mortality rate of 98%, and diagno-
sis in the early stages of the disease is the main factor
that defines the prognosis (Cheng, 2018).
Imaging exams such as abdominal ultrasound, computed tomography (CT), and magnetic resonance imaging (MRI) are the main tools used for early diagnosis, but small lesions make it challenging for the specialist to identify the early stages of pancreatic cancer; this is one of the main challenges in the early diagnosis of this type of cancer (Dietrich and Jenssen, 2020). These tools also bring further challenges. A professional is required to analyze the numerous images produced by a CT scan, for instance. This analysis is highly complex, thus requiring
the attention and experience of the specialist. By its nature, this analysis is a repetitive process that may lead to physical and mental fatigue and, consequently, to lapses in the specialist's attention. Lesions may therefore go unnoticed, with consequences both from the cancer itself and from the medical procedures required to treat it. For this reason, technologies that supplement these image-based examinations are needed.
Many studies in the biomedical field focus on
Computer-Aided Diagnosis (CAD) as a tool to facil-
itate disease detection, decrease errors in diagnosis,
aid and reduce invasive procedures, and save time and
costs related to analysis. Automatic segmentation of
the pancreas is an essential topic in this area, since it
is an initial step often used in cancer analysis, lesion
detection and three-dimensional visualization of the
pancreas (Gong et al., 2019). In CT scans, this step
is hampered by the fact that the pancreas occupies a minimal part of the scan and that its shape, size, and location in the abdomen vary drastically between patients (Zheng et al., 2020).
Although there are examples of pre-trained back-
bones being used for feature extraction for pancreas
segmentation (Yu et al., 2019; Liu et al., 2019; Hu
et al., 2020), as well as the original U-Net architecture
being used as a coarse segmentation step, we have
not found a comparison between different backbones
for pancreas segmentation. That being the case, in
this paper, a comparative study is done between dif-
ferent networks as encoders or feature extractors in a
U-Net model for pancreas segmentation. This study
benchmarks U-Net architectures in pancreas segmen-
tation, providing a useful baseline for comparing architectural modifications of the U-Net model. In
our pipeline, we apply Hounsfield value windowing
and Histogram Matching as preprocessing to increase
the contrast between the pancreas and neighboring
structures and decrease the variation in the contrast
between scans from different CT scanners. As CT
scan slices are single-channel images, we also add an initial convolutional layer to our models that generates two feature maps with the same spatial dimensions as the original slice. These maps serve as the second and third channels, instead of simply repeating the single-channel slice, since the ImageNet pre-trained weights used in our backbone networks require three-channel inputs.
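A minimal sketch of this adapter, assuming a Keras implementation (the helper name is illustrative, not from the paper):

```python
# Sketch of the 1-to-3 channel adapter placed before a pre-trained backbone:
# two learned maps are generated from the single-channel slice and
# concatenated with it to form a 3-channel input.
import tensorflow as tf
from tensorflow.keras import layers

def add_channel_adapter(single_channel_input):
    # Two learned maps with the same spatial dimensions as the slice.
    extra = layers.Conv2D(filters=2, kernel_size=3, padding="same",
                          activation="relu")(single_channel_input)
    # Channels: [original slice, learned map 1, learned map 2].
    return layers.Concatenate(axis=-1)([single_channel_input, extra])

inputs = tf.keras.Input(shape=(256, 256, 1))   # single-channel CT slice
three_channel = add_channel_adapter(inputs)    # tensor of shape (256, 256, 3)
```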
The rest of this paper is organized as follows: Section 2 describes works related to the problem addressed. Sections 3 and 4 describe the method used in this work, as well as the results obtained, discussions and comparisons with other works. Finally, Section 5 presents the final considerations about the results and proposals for future work.
2 RELATED WORK
For pancreas segmentation and other segmentation
tasks in medical images, U-Net (Ronneberger et al.,
2015) is one of the most widely used architectures in
the literature. This is due to the quality of segmenta-
tion provided by this network. Due to its simple yet
effective structure, composed of an encoder and a de-
coder, many proposals in the literature aim to modify
the U-Net to improve its ability to segment the pan-
creas.
One example is changes to the original network
blocks with the addition of layers within the blocks
(Fan and Tai, 2019). Another possibility is to modify,
in addition to the network blocks, the skip connections
of the network (Oktay et al., 2018; Ma et al., 2021;
Dai et al., 2023). Another possible improvement is
in the bottleneck layer, such as using dilated convo-
lutions to increase the receptive field of convolutions
(Giddwani et al., 2020).
The use of more than one U-Net to perform the segmentation of the pancreas can also be observed, such as the combination of two 3D U-Nets proposed in (Zhao et al., 2019), two Deformable U-Nets in (Huang et al., 2019) and a 2.5D U-Net with a Multi-View 2D U-Net (Li et al., 2021a). In these approaches, the output of the first network is a coarse segmentation that serves as input to the second network, which refines the segmentation to obtain the final result. Other possibilities are to use a U-Net in combination with another network, sequentially (Yang et al., 2019), in a parallel manner (Cai et al., 2019), or with an ensemble of U-Nets (Liu et al., 2019). In these approaches, the U-Net is combined with FCNs or more complex networks to improve the segmentation it provides.
However, one point that still needs to be explored is the use of different backbones as the U-Net encoder for pancreas segmentation. Using pre-trained backbones can
provide good results, with different backbones having
different applications where they stand out (Ahmed
et al., 2022). Hence, a study of the use of different
backbones for U-Net is performed in this paper. Ta-
ble 1 compares the related works.
Table 1: Studies using U-Net to segment the pancreas.
Authors Dataset Dice
(Fan and Tai, 2019) NIH 68.45%
(Cai et al., 2019) MSD 74.30%
(Boers et al., 2020) NIH 78.10%
(Giddwani et al., 2020) NIH 83.30%
(Liu et al., 2019) NIH 84.10%
(Zhao et al., 2019) NIH 85.99%
(Huang et al., 2019) NIH 87.25%
(Yang et al., 2019) NIH 87.82%
(Ma et al., 2021) NIH 88.48%
(Li et al., 2021a) MSD 88.52%
(Dai et al., 2023) MSD 91.22%
3 METHOD
This section details the preprocessing techniques, as
well as the backbone choice method used in combi-
nation with U-Net.
We use the well-known U-Net architecture (Ron-
neberger et al., 2015) to address our pancreas segmen-
tation task. The U-Net architecture is a fully convolu-
tional network in the encoder-decoder neural network
family. The spatial information is downsampled in the encoding stage using convolution and pooling operations; this stage serves as the feature extractor. The spatial information is then upsampled back to the original size in the decoding stage using transposed convolutions. High-resolution features from the
encoder are concatenated to the corresponding fea-
tures from the decoder via skip connections, infusing
high-resolution information into the decoder. The ad-
dition of skip connections provides the network with
its “U” shape.
In our work, we experiment with different network
architectures pre-trained on the ImageNet dataset as
the feature extractor, replacing the standard encoder
of U-Net.
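Assuming the Segmentation Models library (Iakubovskii, 2019) listed in the implementation details is used for this step, the encoder swap can be sketched as follows; the helper name and backbone list are illustrative, not the authors' released code.

```python
# Sketch of replacing the standard U-Net encoder with an ImageNet pre-trained
# backbone via the Segmentation Models library.
import segmentation_models as sm

def build_model(backbone_name="efficientnetb7"):
    return sm.Unet(
        backbone_name,
        input_shape=(256, 256, 3),    # slices resized to 256 x 256, 3 channels
        classes=1,                    # binary mask: pancreas vs. background
        activation="sigmoid",
        encoder_weights="imagenet",   # pre-trained encoder weights
        encoder_freeze=False,         # encoder is fine-tuned (see Section 3.3)
    )

# Any supported backbone name can be swapped in:
for name in ["vgg16", "resnet50", "mobilenetv2", "densenet121", "efficientnetb7"]:
    model = build_model(name)
```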
3.1 Preprocessing
The preprocessing step aims to increase the contrast
of the pancreas relative to the other organs present
in the scan, providing better features for the follow-
ing steps of the method. To achieve this, we per-
form Hounsfield (HU) values windowing and apply
Histogram Matching.
HU value windowing is a process in which the voxel values of a CT scan are truncated to highlight specific structures (Seeram, 2015). Two thresholds define this process. From
these thresholds, the truncation of HU values is per-
formed. Any HU value greater than the upper thresh-
old is truncated to the upper threshold value, and any
HU value less than the lower threshold is truncated to
the lower threshold value. The thresholds used were
[-150, 250]. This range highlights soft tissue in the
abdomen, a category to which the pancreas belongs
(Mo et al., 2020).
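A minimal sketch of this windowing step, assuming the volume is already a NumPy array of HU values:

```python
# HU windowing with the thresholds used here ([-150, 250]): values above or
# below the window are clipped to the window limits.
import numpy as np

def window_hu(volume, lower=-150.0, upper=250.0):
    return np.clip(volume, lower, upper)
```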
After windowing the HU values, the Histogram
Matching (HM) algorithm is applied. Given two his-
tograms, this algorithm finds a color mapping that ap-
proximates one histogram to the other (Castleman,
1996). However, a reference histogram must be de-
fined, which will be the approximated histogram. The
reference histogram was defined empirically. First,
we trained a 3D U-Net where the only preprocessing
applied was the windowing of HU values. Then, the
volume of the training set where the network obtained
the best Dice for segmentation of the pancreas was
chosen. Using this volume as a basis, the HM was ap-
plied to all volumes in the set, with the histogram of
the chosen volume as reference. Finally, the volumes
were transformed into slices. We employ 3D U-Net
because using one histogram for the entire scan leads
to superior contrast normalization.
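Assuming scikit-image is used for this step (an assumption, since the paper does not name the implementation), the matching can be sketched as:

```python
# Histogram Matching applied volume-by-volume against the empirically chosen
# reference volume (both volumes already HU-windowed).
from skimage.exposure import match_histograms

def apply_histogram_matching(volume, reference_volume):
    return match_histograms(volume, reference_volume)
```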
Figure 1 illustrates the application of preprocess-
ing. The red square represents the pancreas and the
area around it. It is possible to see that after pre-
processing, the pancreas becomes more evident compared with the surrounding structures.
3.2 Networks for Feature Extraction
For training, the U-Net architecture is used as a basis
and the network encoder is exchanged for different ar-
chitectures pre-trained on the ImageNet dataset such
Figure 1: Preprocessing step: (A) original slice and (B) slice after preprocessing.
as VGG (Simonyan and Zisserman, 2014), ResNet (He et al., 2016), EfficientNet (Koonce and Koonce, 2021), MobileNet (Howard et al., 2017), InceptionV3 (Szegedy et al., 2016) and DenseNet (Iandola et al., 2014). Two versions of each of these architectures were used: the version with the most parameters and the version with the fewest parameters. For EfficientNet specifically, three models were used, as this network has several versions: the smallest (B0), the median (B3) and the largest (B7). The chosen
versions for each architecture can be seen in Table 2.
Top-1 Accuracy refers to the accuracy obtained on the
ImageNet validation set, Number of Parameters refers
to the number of trainable parameters in the architec-
ture, and depth refers to the number of layers with
parameters, such as convolution layers and batch nor-
malization layers.
3.3 Models Training
For training, we use the same preprocessing and the same hyperparameters, such as optimizer and learning rate, for all models. The loss
function used for training is a combination of the Dice
Loss (Sudre et al., 2017) and the Focal Loss (Lin
et al., 2017b). Dice Loss is calculated as:
L_dice(tp, fp, fn) = 1 − [(1 + β²) · tp] / [(1 + β²) · tp + β² · fn + fp]    (1)
where tp are true positives, fp are false positives and fn are false negatives.
Table 2: Summary of the backbones used in this study. Number of parameters in millions.
Method             Top-1 Accuracy   Params   Depth
VGG16              71.3%            138.4M   16
VGG19              71.3%            143.7M   19
ResNet50           74.9%            25.6M    107
ResNet152          76.6%            60.4M    311
InceptionV3        77.9%            23.9M    189
InceptionResNetV2  80.3%            55.9M    449
MobileNet          70.4%            4.3M     55
MobileNetV2        71.3%            3.5M     105
DenseNet121        75.0%            8.1M     242
DenseNet201        77.3%            20.2M    402
EfficientNetB0     77.1%            5.3M     132
EfficientNetB3     81.6%            12.3M    210
EfficientNetB7     84.3%            66.7M    438
Meanwhile, Focal Loss is calculated as:
L_focal(gt, pr) = −gt · α · (1 − pr)^γ · log(pr) − (1 − gt) · α · pr^γ · log(1 − pr)    (2)
where gt is ground truth, pr is the model prediction, α
is the weighting factor and γ is the focusing parameter
for the modulating factor (1 − pr).
These two loss functions address class imbalance, and their combination aims to take advantage of the strengths of each while minimizing their disadvantages. As the Focal Loss values are generally smaller than the Dice Loss values in high-confidence prediction cases, we multiply the Focal Loss by a factor of 10, as this was the factor empirically observed to best balance the two functions. The final loss function is
defined in Equation 3.
total_loss = Dice_loss + 10 · Focal_loss    (3)
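A sketch of this combined loss, assuming the Dice and binary Focal losses from the Segmentation Models library:

```python
# Combined loss (Equation 3): Dice loss plus the Focal loss weighted by 10
# to balance its smaller magnitude.
import segmentation_models as sm

dice_loss = sm.losses.DiceLoss()           # Equation 1
focal_loss = sm.losses.BinaryFocalLoss()   # Equation 2 (default alpha, gamma)

def total_loss(y_true, y_pred):
    return dice_loss(y_true, y_pred) + 10.0 * focal_loss(y_true, y_pred)
```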
The pre-trained weights are unfrozen. This is necessary because ImageNet is composed of images from diverse domains, while our targets are CT scan images, which are not present in the ImageNet dataset. Unfreezing the backbone weights therefore allows the already learned feature extraction to be specialized for CT scan images.
4 RESULTS
4.1 Datasets
We evaluate the performance of the different models
on two publicly available datasets: The first dataset
contains 281 contrast-enhanced CT scans with la-
beled pancreas and pancreatic tumor from the Med-
ical Segmentation Decathlon (MSD) challenge pan-
creas segmentation dataset (Simpson et al., 2019),
where each CT volume has dimensions 512 x 512 x Z, with Z ∈ [37, 751]. Following similar studies (Chen
et al., 2022), pancreas and pancreatic tumor labels
were combined into a single target label. The sec-
ond dataset contains 82 abdominal contrast-enhanced
CT scans from the National Institutes of Health (NIH)
Clinical Center pancreas segmentation dataset (Roth
et al., 2015). Each volume has dimensions 512 x 512 x Z, where Z ∈ [181, 466].
4.2 Evaluation Metric
To evaluate segmentation performance, we use the Dice similarity coefficient (Dice). Dice is the most com-
mon metric used for evaluating segmentation results
in medical image segmentation (Dai et al., 2023).
This metric represents the harmonic mean of the pre-
cision and sensitivity and is calculated as in Equation
4.
Dice = 2 · tp / (2 · tp + fp + fn)    (4)
where tp are the true positives, fp are the false positives, and fn are the false negatives. In our work, tp corresponds to pancreas pixels that the model correctly classified as pancreas, fp to background pixels that the model mistakenly classified as pancreas, and fn to pancreas pixels that the model wrongly classified as background.
This metric ranges from 0 to 1, with higher values
representing better segmentation.
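For reference, a sketch of this computation on binary masks (illustrative, not the evaluation script used in the paper):

```python
# Dice coefficient (Equation 4) computed from predicted and ground-truth
# binary pancreas masks.
import numpy as np

def dice_coefficient(pred_mask, gt_mask, eps=1e-7):
    pred = pred_mask.astype(bool)
    gt = gt_mask.astype(bool)
    tp = np.logical_and(pred, gt).sum()    # pancreas pixels predicted as pancreas
    fp = np.logical_and(pred, ~gt).sum()   # background predicted as pancreas
    fn = np.logical_and(~pred, gt).sum()   # pancreas predicted as background
    return 2.0 * tp / (2.0 * tp + fp + fn + eps)
```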
4.3 Implementation Details
The neural network models were implemented in Python 3.8 with the TensorFlow 2.6.0, Keras 2.9.0 and Segmentation Models (Iakubovskii, 2019) libraries. The Nibabel 3.2.1 library was used to manipulate the NIfTI volumes, and the opencv-python 4.5.3.56 library was used to manipulate the slices extracted from the volumes. An
NVIDIA RTX 3060 graphics card with 12 GB of
memory was used for all experiments. Due to mem-
ory limitations, the images that entered the network
were resized from 512 x 512 to 256 x 256.
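A sketch of this volume-to-slice step under those library choices (paths and the helper name are illustrative):

```python
# Load a NIfTI CT volume with Nibabel and resize its axial slices to 256 x 256
# with opencv-python before feeding them to the network.
import cv2
import nibabel as nib
import numpy as np

def volume_to_slices(nifti_path, target_size=(256, 256)):
    volume = nib.load(nifti_path).get_fdata()            # shape (512, 512, Z)
    slices = []
    for z in range(volume.shape[-1]):
        slice_2d = volume[..., z].astype(np.float32)
        slices.append(cv2.resize(slice_2d, target_size))
    return slices
```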
For all experiments, the training parameters were kept the same. The models were trained for 100 epochs with the Adam optimizer, an initial learning rate (lr) of 0.0001, and lr decay by a factor of 0.1 whenever the validation loss had not decreased in the last 10 epochs. The data was preprocessed and split identically for all training and test experiments.
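Under those settings, the training configuration can be sketched as follows (the dataset pipelines are placeholders, and the loss follows Section 3.3):

```python
# Training setup sketch: Adam (lr = 1e-4), 100 epochs, and lr decay by 0.1
# after 10 epochs without validation-loss improvement.
import tensorflow as tf
import segmentation_models as sm

model = sm.Unet("efficientnetb7", input_shape=(256, 256, 3), classes=1,
                activation="sigmoid", encoder_weights="imagenet")

dice_loss = sm.losses.DiceLoss()
focal_loss = sm.losses.BinaryFocalLoss()

def combined_loss(gt, pr):
    return dice_loss(gt, pr) + 10.0 * focal_loss(gt, pr)

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss=combined_loss)

reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss",
                                                 factor=0.1, patience=10)
# model.fit(train_ds, validation_data=val_ds, epochs=100, callbacks=[reduce_lr])
```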
4.4 Segmentation Result on MSD
Dataset
To evaluate the performance of the trained models, we
compare them with networks that perform well on the
MSD dataset. The results obtained can be seen in Ta-
ble 3, where we compare the different backbones used
with works in the literature for pancreas segmentation
on the MSD dataset. The average Dice for all patient
slices is used to represent the segmentation quality for
the patient. It can be seen that the use of U-Net with a
backbone already trained on ImageNet obtains good
results, being superior to some works found in the lit-
erature. The best result obtained was with the efficien-
tenetb7 architecture as a backbone, obtaining a Dice
2.56% lower than the second best result and 5.26%
lower than the best result found in the literature for
this dataset.
Table 3: Pancreas segmentation on the MSD dataset.
Method                 Patient (Dice)
(Cai et al., 2019)     74.30%
resnet50               76.12%
(Boers et al., 2020)   78.10%
inceptionv3            79.71%
(Zhu et al., 2019)     79.94%
vgg16                  81.54%
efficientnetb0         81.88%
vgg19                  82.24%
(Zhang et al., 2021b)  82.74%
resnet152              83.10%
inceptionresnetv2      83.78%
mobilenet              84.33%
densenet121            84.52%
mobilenetv2            84.61%
(Fang et al., 2019)    84.71%
efficientnetb3         85.08%
densenet201            85.33%
(Zhang et al., 2021a)  85.56%
efficientnetb7         85.96%
(Li et al., 2021a)     88.52%
(Dai et al., 2023)     91.22%
In Table 3, it can be seen that the results obtained vary between [76.12%; 85.96%]. Despite this variation of 9.84% between the worst and best results, the top 5 backbones vary only between [84.52%; 85.96%]. This shows that the choice of backbone has a high impact, but also that backbones such as MobileNetV2, EfficientNetB3, EfficientNetB7, DenseNet121 and DenseNet201 have such close results that the choice among them will depend on the application, where GPU memory limitations or inference time may be the determining factors for the best backbone.
Figure 2 shows examples of the pancreas segmen-
tation performed by the U-Net with efficientnetb7 as
the backbone on multiple slices of the same patient.
For this patient, the Dice obtained was 93.39%, indi-
cating a segmentation very close to the ground truth.
As can be seen, in three of the four examples shown, the segmentation result is quite similar to the label.
In the second slice, an error case is presented, where
the network did not identify the connected body of the
pancreas. One of the reasons that may have caused
this error is the texture change that exists in this slice.
It is possible to notice that in the location where the
failure occurs, the scan presents a change in the inten-
sity of the pancreas pixels, making it difficult for the
network to recognize it.
Figure 2: Examples of pancreas segmentation with the U-
Net using backbone efficientnetb7. From left to right, input
slice, expert labelling (ground truth) and segmentation per-
formed by the model.
Figure 3 shows three examples of comparison be-
tween segmentation with different backbones. In
these examples, the three networks followed the same
pattern observed in Table 3, where ResNet50 obtained
the worst result among the backbones tested, VGG19
obtained a better result than ResNet50, but lower than
EfficientNetb7, which obtained the best average result
for the MSD dataset. ResNet50 had difficulty detecting the pancreas at two separate points in row A, a behavior that was repeated with VGG19, while EfficientNetB7 was able to capture this non-connectivity in this
slice. In row C, it can be observed that ResNet50 had great difficulty in segmenting the pancreas. In that same slice, VGG19 captures more information from the
pancreas, but there is still noticeable loss. Efficient-
Netb7 achieved a segmentation very close to the ex-
pert’s marking, reflecting the average result obtained
for the dataset as a whole.
Figure 3: Segmentation with 3 different backbones,
ResNet50 (79.02% patient dice), VGG19 (88.36% patient
dice) and EfficientNetb7 (91.11% patient dice).
4.5 Segmentation Result on NIH
Dataset
We also evaluate the performance of the trained mod-
els with networks that perform well on the NIH
dataset. The results obtained can be seen in Table 4.
The average Dice for all patient slices is used to repre-
sent the segmentation quality for the patient. As with
the MSD dataset, the use of U-Net with a backbone al-
ready trained on ImageNet obtains good results, being superior to some works found in the literature. The best result was also obtained with the efficientnetb7 architecture as the backbone, achieving a
Dice 2.11% lower than the third best result, 2.18%
lower than the second best result and 4.5% lower than
the best result in the literature for this dataset.
In Table 4 it can be seen that the results found vary
between [63.08%, 85.39%]. This is a much larger gap
than the one found on the MSD dataset. The VGG
family models had difficulty learning relevant features
for segmentation. Such variations can be attributed to
the difference in datasets, as the NIH dataset generally
presents greater difficulty for segmentation (Dai et al.,
2023). The choice of backbone shows an even bigger impact on this dataset, but the top 5 backbones, despite a wider range in performance, vary between [83.94%; 85.39%], which is a relatively small gap. As such, the choice of backbone
will mainly depend on the application, where GPU
memory limitations or inference time may be the de-
termining factors for the best backbone.
Table 4: Pancreas segmentation on the NIH dataset.
Method                 Patient (Dice)
vgg16                  63.08%
vgg19                  63.09%
densenet201            69.64%
resnet152              73.09%
resnet50               80.53%
efficientnetb0         80.99%
efficientnetb3         82.68%
(Li et al., 2021b)     83.21%
mobilenet              83.44%
inceptionresnetv2      83.94%
densenet121            84.02%
inceptionv3            84.24%
mobilenetv2            84.35%
(Zhang et al., 2021b)  84.47%
(Zhang et al., 2021a)  84.90%
(Chen et al., 2022)    85.19%
(Li et al., 2021a)     85.35%
efficientnetb7         85.39%
(Li et al., 2020b)     87.50%
(Li et al., 2020a)     87.57%
(Dai et al., 2023)     89.89%
5 CONCLUSION
The present study aimed to evaluate the use of different backbones as encoders for the U-Net architecture
for pancreas segmentation. Several models pretrained
on the ImageNet dataset were compared. The best
combination found was U-net with EfficientNetb7,
which showed positive results that are competitive
with the literature on the two most commonly used
datasets. One advantage of using backbones as an en-
coder is the good segmentation results with ease of
implementation, which can be useful in initial seg-
mentation steps for more complex models, such as
coarse-to-fine models. This study can also serve as
a benchmark for comparison for pancreas segmenta-
tion, providing a useful baseline for comparison of
architectural modifications to the U-Net model. As future work, we suggest investigating architectures such as Feature Pyramid Network (FPN)
(Lin et al., 2017a), LinkNet (Chaurasia and Culur-
ciello, 2017) and Pyramid Scene Parsing Network
(PSPNet) (Zhao et al., 2017) in place of U-Net as the
main architecture to further improve the baseline for
architectural modifications in those models for pan-
creas segmentation. Another line of research could
be the combination of models in an ensemble, as seen
in (Georgescu et al., 2023).
ACKNOWLEDGMENTS
The authors acknowledge the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES), Finance Code 001, the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), the Fundação de Amparo à Pesquisa e ao Desenvolvimento Científico e Tecnológico do Maranhão (FAPEMA) (Brazil), and the Empresa Brasileira de Serviços Hospitalares (Ebserh) Brazil (Grant number 409593/2021-4) for the financial support.
REFERENCES
Ahmed, I., Carni, D. L., Balestrieri, E., and Lamonaca, F.
(2022). Comparison of u-net backbones for morpho-
metric measurements of white blood cell. In 2022
IEEE International Symposium on Medical Measure-
ments and Applications (MeMeA), pages 1–6. IEEE.
Boers, T., Hu, Y., Gibson, E., Barratt, D., Bonmati,
E., Krdzalic, J., van der Heijden, F., Hermans, J.,
and Huisman, H. (2020). Interactive 3d u-net for
the segmentation of the pancreas in computed to-
mography scans. Physics in Medicine and Biology,
65(6):065002.
Cai, J., Xia, Y., Yang, D., Xu, D., Yang, L., and Roth, H.
(2019). End-to-end adversarial shape learning for ab-
domen organ deep segmentation. In Machine Learn-
ing in Medical Imaging: 10th International Workshop,
MLMI 2019, Held in Conjunction with MICCAI 2019,
Shenzhen, China, October 13, 2019, Proceedings 10,
pages 124–132. Springer.
Castleman, K. R. (1996). Digital image processing. Pren-
tice Hall Press.
Chaurasia, A. and Culurciello, E. (2017). Linknet: Exploit-
ing encoder representations for efficient semantic seg-
mentation. In 2017 IEEE visual communications and
image processing (VCIP), pages 1–4. IEEE.
Chen, H., Liu, Y., Shi, Z., and Lyu, Y. (2022). Pancreas
segmentation by two-view feature learning and multi-
scale supervision. Biomedical Signal Processing and
Control, 74:103519.
Cheng, S. (2018). Punção ecoendoscópica de massas sólidas pancreáticas por técnica de pressão negativa versus capilaridade: estudo prospectivo e randomizado. PhD thesis, Universidade de São Paulo.
Dai, S., Zhu, Y., Jiang, X., Yu, F., Lin, J., and Yang,
D. (2023). Td-net: Trans-deformer network for
automatic pancreas segmentation. Neurocomputing,
517:279–293.
Dietrich, C. F. and Jenssen, C. (2020). Modern ultra-
sound imaging of pancreatic tumors. Ultrasonogra-
phy, 39(2):105.
Fan, J. and Tai, X.-c. (2019). Regularized unet for auto-
mated pancreas segmentation. In Proceedings of the
Third International Symposium on Image Computing
and Digital Medicine, pages 113–117.
Fang, C., Li, G., Pan, C., Li, Y., and Yu, Y. (2019). Glob-
ally guided progressive fusion network for 3d pan-
creas segmentation. In Medical Image Computing and
Computer Assisted Intervention–MICCAI 2019: 22nd
International Conference, Shenzhen, China, October
13–17, 2019, Proceedings, Part II 22, pages 210–218.
Springer.
Georgescu, M.-I., Ionescu, R. T., and Miron, A. I. (2023).
Diversity-promoting ensemble for medical image seg-
mentation. In Proceedings of the 38th ACM/SIGAPP
Symposium on Applied Computing, pages 599–606.
Giddwani, B., Tekchandani, H., and Verma, S. (2020). Deep
dilated v-net for 3d volume segmentation of pancreas
in ct images. In 2020 7th International Conference on
Signal Processing and Integrated Networks (SPIN),
pages 591–596. IEEE.
Gong, Z., Zhu, Z., Zhang, G., Zhao, D., and Guo, W.
(2019). Convolutional neural networks based level set
framework for pancreas segmentation from ct images.
In Proceedings of the Third International Symposium
on Image Computing and Digital Medicine, pages 27–
30.
He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep resid-
ual learning for image recognition. In Proceedings of
the IEEE conference on computer vision and pattern
recognition, pages 770–778.
Howard, A. G., Zhu, M., Chen, B., Kalenichenko, D.,
Wang, W., Weyand, T., Andreetto, M., and Adam,
H. (2017). Mobilenets: Efficient convolutional neu-
ral networks for mobile vision applications. arXiv
preprint arXiv:1704.04861.
Hu, P., Li, X., Tian, Y., Tang, T., Zhou, T., Bai, X., Zhu, S.,
Liang, T., and Li, J. (2020). Automatic pancreas seg-
mentation in ct images with distance-based saliency-
aware denseaspp network. IEEE journal of biomedi-
cal and health informatics, 25(5):1601–1611.
Huang, M., Huang, C., Yuan, J., and Kong, D. (2019).
Fixed-point deformable u-net for pancreas ct segmen-
tation. In Proceedings of the Third International Sym-
posium on Image Computing and Digital Medicine,
pages 283–287.
Iakubovskii, P. (2019). Segmentation models. https://github.com/qubvel/segmentation_models.
Iandola, F., Moskewicz, M., Karayev, S., Girshick, R., Dar-
rell, T., and Keutzer, K. (2014). Densenet: Imple-
menting efficient convnet descriptor pyramids. arXiv
preprint arXiv:1404.1869.
Koonce, B. and Koonce, B. (2021). Efficientnet. Convo-
lutional Neural Networks with Swift for Tensorflow:
Image Recognition and Dataset Categorization, pages
109–123.
Li, F., Li, W., Shu, Y., Qin, S., Xiao, B., and Zhan, Z.
(2020a). Multiscale receptive field based on resid-
ual network for pancreas segmentation in ct im-
ages. Biomedical Signal Processing and Control,
57:101828.
Li, H., Li, J., Lin, X., and Qian, X. (2020b). A model-driven
stack-based fully convolutional network for pancreas
segmentation. In 2020 5th International Confer-
ence on Communication, Image and Signal Process-
ing (CCISP), pages 288–293. IEEE.
Li, J., Lin, X., Che, H., Li, H., and Qian, X. (2021a). Pan-
creas segmentation with probabilistic map guided bi-
directional recurrent unet. Physics in Medicine and
Biology, 66(11):115010.
Li, M., Lian, F., and Guo, S. (2021b). Automatic pancreas
segmentation using double adversarial networks with
pyramidal pooling module. IEEE Access, 9:140965–
140974.
Lin, T.-Y., Dollár, P., Girshick, R., He, K., Hariharan, B., and Belongie, S. (2017a). Feature pyramid networks for object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2117–2125.
Lin, T.-Y., Goyal, P., Girshick, R., He, K., and Dollár, P. (2017b). Focal loss for dense object detection. In Proceedings of the IEEE international conference on computer vision, pages 2980–2988.
Liu, S., Yuan, X., Hu, R., Liang, S., Feng, S., Ai, Y., and
Zhang, Y. (2019). Automatic pancreas segmentation
via coarse location and ensemble learning. IEEE Ac-
cess, 8:2906–2914.
Ma, H., Zou, Y., and Liu, P. X. (2021). Mhsu-net:
A more versatile neural network for medical image
segmentation. Computer Methods and Programs in
Biomedicine, 208:106230.
Mo, J., Zhang, L., Wang, Y., and Huang, H. (2020). It-
erative 3d feature enhancement network for pancreas
segmentation from ct images. Neural Computing and
Applications, 32:12535–12546.
Oktay, O., Schlemper, J., Folgoc, L. L., Lee, M., Heinrich,
M., Misawa, K., Mori, K., McDonagh, S., Hammerla,
N. Y., Kainz, B., et al. (2018). Attention u-net: Learn-
ing where to look for the pancreas. arXiv preprint
arXiv:1804.03999.
Ronneberger, O., Fischer, P., and Brox, T. (2015). U-
net: Convolutional networks for biomedical image
segmentation. In Medical Image Computing and
Computer-Assisted Intervention–MICCAI 2015: 18th
International Conference, Munich, Germany, October
5-9, 2015, Proceedings, Part III 18, pages 234–241.
Springer.
Roth, H. R., Lu, L., Farag, A., Shin, H.-C., Liu, J., Turkbey,
E. B., and Summers, R. M. (2015). Deeporgan: Multi-
level deep convolutional networks for automated pan-
creas segmentation. In Medical Image Computing and
Computer-Assisted Intervention–MICCAI 2015: 18th
International Conference, Munich, Germany, October
5-9, 2015, Proceedings, Part I 18, pages 556–564.
Springer.
Seeram, E. (2015). Computed Tomography-E-Book: Phys-
ical Principles, Clinical Applications, and Quality
Control. Elsevier Health Sciences.
Simonyan, K. and Zisserman, A. (2014). Very deep con-
volutional networks for large-scale image recognition.
arXiv preprint arXiv:1409.1556.
Simpson, A. L., Antonelli, M., Bakas, S., Bilello, M.,
Farahani, K., Van Ginneken, B., Kopp-Schneider, A.,
Landman, B. A., Litjens, G., Menze, B., et al. (2019).
A large annotated medical image dataset for the de-
velopment and evaluation of segmentation algorithms.
arXiv.
Sudre, C. H., Li, W., Vercauteren, T., Ourselin, S., and
Jorge Cardoso, M. (2017). Generalised dice over-
lap as a deep learning loss function for highly unbal-
anced segmentations. In Deep Learning in Medical
Image Analysis and Multimodal Learning for Clini-
cal Decision Support: Third International Workshop,
DLMIA 2017, and 7th International Workshop, ML-
CDS 2017, Held in Conjunction with MICCAI 2017,
Québec City, QC, Canada, September 14, Proceed-
ings 3, pages 240–248. Springer.
Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., and Wo-
jna, Z. (2016). Rethinking the inception architecture
for computer vision. In Proceedings of the IEEE con-
ference on computer vision and pattern recognition,
pages 2818–2826.
Yang, Z., Zhang, L., Zhang, M., Feng, J., Wu, Z., Ren, F.,
and Lv, Y. (2019). Pancreas segmentation in abdomi-
nal ct scans using inter-/intra-slice contextual informa-
tion with a cascade neural network. In 2019 41st An-
nual International Conference of the IEEE Engineer-
ing in Medicine and Biology Society (EMBC), pages
5937–5940. IEEE.
Yu, W., Chen, H., and Wang, L. (2019). Dense attentional
network for pancreas segmentation in abdominal ct
scans. In Proceedings of the 2nd International Con-
ference on Artificial Intelligence and Pattern Recogni-
tion, pages 83–87.
Zhang, D., Zhang, J., Zhang, Q., Han, J., Zhang, S., and
Han, J. (2021a). Automatic pancreas segmentation
based on lightweight dcnn modules and spatial prior
propagation. Pattern Recognition, 114:107762.
Zhang, Y., Wu, J., Liu, Y., Chen, Y., Chen, W., Wu, E. X.,
Li, C., and Tang, X. (2021b). A deep learning frame-
work for pancreas segmentation with multi-atlas reg-
istration and 3d level-set. Medical Image Analysis,
68:101884.
Zhao, H., Shi, J., Qi, X., Wang, X., and Jia, J. (2017).
Pyramid scene parsing network. In Proceedings of
the IEEE conference on computer vision and pattern
recognition, pages 2881–2890.
Zhao, N., Tong, N., Ruan, D., and Sheng, K. (2019). Fully
automated pancreas segmentation with two-stage 3d
convolutional neural networks. In Medical Im-
age Computing and Computer Assisted Intervention–
MICCAI 2019: 22nd International Conference, Shen-
zhen, China, October 13–17, 2019, Proceedings, Part
II 22, pages 201–209. Springer.
Zheng, H., Chen, Y., Yue, X., Ma, C., Liu, X., Yang, P.,
and Lu, J. (2020). Deep pancreas segmentation with
uncertain regions of shadowed sets. Magnetic Reso-
nance Imaging, 68:45–52.
Zhu, Z., Liu, C., Yang, D., Yuille, A., and Xu, D. (2019). V-
nas: Neural architecture search for volumetric medical
image segmentation. In 2019 International conference
on 3d vision (3DV), pages 240–248. IEEE.