EFL-Net: An Efficient Lightweight Neural Network Architecture for
Retinal Vessel Segmentation
Nasrin Akbari and Amirali Baniasadi
Department of Electrical and Computer Engineering, University of Victoria, Victoria, Canada
Keywords:
Blood Vessel Segmentation, Deep Learning, Image Processing.
Abstract:
Accurate segmentation of retinal vessels is crucial for the timely diagnosis and treatment of conditions like
diabetes and hypertension, which can prevent blindness. Deep learning algorithms have been successful in
segmenting retinal vessels, but they often require a large number of parameters and computations. To address
this, we propose an efficient and fast lightweight network (EFL-Net) for retinal blood vessel segmentation.
EFL-Net includes the ResNet branches shuffle block (RBS block) and the Dilated Separable Down block
(DSD block) to extract features at various granularities and enhance the network receptive field, respectively.
These blocks are lightweight and can be easily integrated into existing CNN models. The model also uses
PixelShuffle as an upsampling layer in the decoder, which has a higher capacity for learning features than
deconvolution and interpolation approaches. The model was tested on the DRIVE and CHASEDB1 datasets and
achieved excellent results with fewer parameters than other networks such as Ladder-Net and DCU-Net.
EFL-Net achieved F1 measures of 0.8351 and 0.8242 on the CHASEDB1 and DRIVE datasets, respectively,
with 0.340 million parameters, compared to 1.5 million for Ladder-Net and 1 million for DCU-Net.
1 INTRODUCTION
The retina is a layer of light-sensitive nerve tissue
located at the back of the eye that receives images
and transmits them to the brain as electric signals
through the optic nerve (Kolb, 2012). Changes in the
retina and optic nerve may indicate certain diseases
such as glaucoma (Salmon, 2019) or hypertensive
retinopathy (HR) (Irshad and Akram, 2014), which
can cause blurring of vision. As we age, the oxida-
tive load increases, leading to higher levels of oxida-
tive stress which can cause pathologies such as age-
related macular degeneration or neuropathic compli-
cations of diabetes in the eye (Payne et al., 2014).
Diabetic retinopathy (DR), a major complication of diabetes, causes gradual damage to the retina and can ultimately lead to vision loss.
Primary eye care (PEC) practitioners can use a funduscopy examination to provide early screening for drug-induced retinal toxicity (Alberta et al., 2022). During a funduscopy examination, an ophthalmologist looks
at the structures of the retina, retinal blood vessels,
and optic nerve head (disk) of the eye (Walker et al.,
1990). There are several ways to analyze retinal im-
ages and find diseases, one of which is retinal image
segmentation, which can be divided into manual and
automatic methods. Manual segmentation takes time
and expertise, while automated algorithms are useful
for early detection and treatment of eye diseases due
to their increased accuracy, reduced cost, and faster
speed compared to manual segmentation.
U-Net (Ronneberger et al., 2015) is an automatic
model used to segment vessels in retina images and
is one of the successful medical and biomedical im-
age segmentation methods based on deep neural net-
works. Humans often struggle to distinguish blood
vessel images from their distorted backgrounds, mak-
ing it more difficult to detect diseases. As a result,
developing practical algorithms to identify vessel im-
ages and their surroundings would be useful (Yang
et al., 2022).
Deep neural networks (DNNs) have been shown
to be effective in automatically learning reliable and
complex features from raw data without the need for
manual feature engineering (Ronneberger et al., 2015;
Zhou et al., 2021; Gu et al., 2019; Li et al., 2020).
These techniques have achieved significant success in
the fields of computer vision and medical health. Re-
search on retinal vessel segmentation using DNNs has
proposed various architectures for this task.
Figure 1: A retinal image from DRIVE dataset (left) and
retinal vessel segmentation (right) (Staal et al., 2004).
However, our observations indicate that many of these models
are not optimal in terms of architecture and num-
ber of parameters. Figure 2 illustrates the F1 mea-
sure and complexity of well-known DNN-based mod-
els (Azad et al., 2019; Zhou et al., 2021; Gu et al.,
2019; Mou et al., 2021; Li et al., 2020; Zhuang, 2018;
Zahangir Alom et al., 2018; Guo et al., 2021; Yang
et al., 2022). Further research is needed to reduce the number of parameters and improve the accuracy of current models. Our research aims to design a lightweight CNN
architecture with fewer parameters that can achieve
similar or better results in retinal vessel segmentation
compared to state-of-the-art networks.
The remainder of the paper is structured as fol-
lows: Section 2 describes the proposed network
architecture, Section 3 compares experimental re-
sults to state-of-the-art neural networks using the
CHASEDB1 and DRIVE datasets, and Section 4 pro-
vides concluding remarks and discussion of future re-
search directions.
2 METHODOLOGY
U-Net is a convolutional neural network that was
designed for image segmentation in the field of
biomedicine (Ronneberger et al., 2015). It is an improvement on the previously developed fully convolutional network (FCN) for semantic segmentation (Long et al., 2015). Its ability to perform well with
small training datasets has made it the most reliable
architecture for the semantic segmentation of biolog-
ical images.
The U-Net architecture consists of four encoder
blocks on the left side, known as the contracting path,
and four decoder blocks on the right side referred
to as the expansive path. The encoder captures fea-
tures from the input image and reduces its resolution
through pooling layers, while the decoder part recon-
structs the image and restores object details through
skip connections between the encoder and decoder
layers. While the U-Net model has been successful
in various tasks, it has several drawbacks including a
large number of parameters (31.031 million) and poor
performance on retinal vessel segmentation (an F1 score of 0.7783 on the DRIVE dataset; see Table 1). To address these issues, we analyzed existing architectures and their building blocks and introduce two new blocks for enhanced feature extraction: the ResNet Branches Shuffle Block (RBSB) and the Dilated Separable Down Block (DSDB). We also employ the efficient PixelShuffle layer (Shi et al., 2016) in place of deconvolution and interpolation techniques in the decoder path of the U-Net model. Our modified U-Net model aims to improve performance on retinal vessel segmentation tasks.
2.1 Our Proposed Architecture
Inspired by the U-Net (Ronneberger et al., 2015) and
ShuffleNetV2 (Ma et al., 2018) models, we propose
the Efficient and Fast Lightweight Neural Network
(EFL-Net) for retinal vessel segmentation. Our goal
is to create a lightweight and accurate deep learn-
ing model for this task. To increase the receptive
field of the U-Net model, we introduce the Resnet
Branches Shuffle Block (RBS block) and the Dilated
Separable Down Block (DSD block) to our architec-
ture. The encoder path of EFL-Net consists of four stages, each comprising RBS and DSD blocks for feature extraction and downsampling.
In the decoder path, we use the PixelShuffle layer
(Shi et al., 2016) for upsampling instead of
the deconvolution layer and add the encoder and de-
coder features rather than concatenating them to re-
duce computation. A diagram of our architecture is
shown in Figure 3. A dropout layer is included after each RBS block, and a batch normalization layer follows each convolution layer. In the following
section, we will provide a more in-depth discussion
of the core concepts that form the basis of our archi-
tecture and how they contribute to the overall design.
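To make the decoder design concrete, the following is a minimal PyTorch sketch of one decoder stage; `DecoderStage` and its channel counts are illustrative assumptions rather than the released implementation. It shows PixelShuffle upsampling followed by the additive skip connection used in place of concatenation.

```python
import torch
import torch.nn as nn

class DecoderStage(nn.Module):
    """Hypothetical decoder stage: PixelShuffle upsampling followed by
    an additive (rather than concatenated) skip connection."""
    def __init__(self, in_ch, out_ch, upscale=2):
        super().__init__()
        # 1x1 conv expands channels so PixelShuffle can trade them for resolution
        self.expand = nn.Conv2d(in_ch, out_ch * upscale ** 2, kernel_size=1)
        self.up = nn.PixelShuffle(upscale)  # (B, C*r^2, H, W) -> (B, C, rH, rW)

    def forward(self, x, skip):
        x = self.up(self.expand(x))
        return x + skip  # addition keeps the channel count fixed, unlike concat

stage = DecoderStage(128, 64)
out = stage(torch.randn(1, 128, 16, 16), torch.randn(1, 64, 32, 32))
print(out.shape)  # torch.Size([1, 64, 32, 32])
```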
2.2 Resnet Branches Shuffle Block
In this paper, we present an improved feature extrac-
tion method for image classification tasks. Our ap-
proach is based on the Res2Net (Gao et al., 2019)
and ShuffleNetV2 (Ma et al., 2018) architectures, and
aims to enhance the feature extraction capabilities of
the ShuffleNetV2 basic unit.
To this end, we propose the RBS block (Figure 4),
which modifies the number of split channels in the in-
put feature maps of the ShuffleNetV2 basic unit. The
input, which has n channels, is divided into four groups of n/4 channels each, denoted as x_i for i ∈ {1, 2, 3, 4}.
Figure 2: Accuracy and number of parameters of several retinal vessel segmentation papers in the past five years (CHASEDB1
dataset (Owen et al., 2009)).
Figure 3: EFL-Net Architecture.
Each x_i has the same spatial size and is passed through a bottleneck unit C_i(·), resulting in the output y_i. The output of the previous bottleneck unit, C_{i-1}(·), is added to the current group x_i before it is passed through C_i(·), which yields an output with a larger receptive field than x_i. In addition, the channels x_i for i > 1 are aggregated with x_1 to reuse features. The remainder of the RBS block is identical to the ShuffleNetV2 convolution block.
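A minimal PyTorch sketch of this computation follows. The paper does not specify the internal form of the bottleneck units C_i(·) or the exact point of the x_1 aggregation, so a depthwise-plus-pointwise bottleneck and pre-bottleneck aggregation are assumed here, and the identity branch of the full ShuffleNetV2 unit is omitted for brevity.

```python
import torch
import torch.nn as nn

class RBSBlock(nn.Module):
    """Sketch of the RBS block core: four-way channel split with
    Res2Net-style hierarchical connections, then channel shuffle.
    Bottleneck form and aggregation order are assumptions."""
    def __init__(self, channels):
        super().__init__()
        assert channels % 4 == 0
        g = channels // 4
        # Assumed bottleneck C_i(.): depthwise 3x3 + pointwise 1x1
        self.bottlenecks = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(g, g, 3, padding=1, groups=g, bias=False),
                nn.Conv2d(g, g, 1, bias=False),
                nn.BatchNorm2d(g),
                nn.ReLU(inplace=True),
            ) for _ in range(4)
        ])

    def forward(self, x):
        xs = torch.chunk(x, 4, dim=1)      # groups x_1..x_4, n/4 channels each
        ys, prev = [], None
        for i, (xi, Ci) in enumerate(zip(xs, self.bottlenecks)):
            if i > 0:
                xi = xi + xs[0] + prev     # reuse x_1 and add C_{i-1} output
            prev = Ci(xi)
            ys.append(prev)
        out = torch.cat(ys, dim=1)
        # Channel shuffle, as in ShuffleNetV2: reshape-transpose-flatten
        b, c, h, w = out.shape
        out = out.view(b, 4, c // 4, h, w).transpose(1, 2).contiguous()
        return out.view(b, c, h, w)
```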
The RBS block can be easily incorporated into any
network as a lightweight feature extractor. Our ex-
perimental results demonstrate the superiority of the
RBS block over the original ShuffleNetV2 basic unit
in terms of multi-scale feature extraction and the num-
ber of parameters. As shown in Figure 4, the block consists of several components, which we describe in detail in the following sections.
2.3 ShuffleNetV2 Basic Unit
The basic unit in ShuffleNetV2 (Ma et al., 2018) is
a block of layers that includes a depthwise separa-
ble convolution (Chollet, 2017), a pointwise convolu-
tion (Chollet, 2017), and a shuffle operation (Zhang
et al., 2018). This unit is used to construct larger
network architectures in a way that reduces computa-
tional complexity while maintaining representational
capacity. The use of depthwise separable convolu-
tions and the shuffle operation also make the network
Figure 4: a) ShuffleNetV2 basic unit (Ma et al., 2018), b) Resnet Branches Shuffle Block (RBS block), DWConv stands for
depth-wise convolution.
more efficient by reducing the number of parameters
and computations required.
2.4 Depthwise Separable Convolution
A depthwise separable convolution (Chollet, 2017) is
a way of decomposing a standard convolution in a
convolutional neural network into two smaller con-
volutions: a depthwise convolution that applies a sep-
arate kernel to each input channel, and a pointwise convolution that mixes the output channels, as shown in Figure 5. This can reduce the number of param-
eters and computations in a model and make it more
efficient, particularly for mobile and embedded appli-
cations.
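As a concrete illustration (with assumed channel counts of 64 in and 128 out), the factorization and its parameter savings look like this in PyTorch:

```python
import torch.nn as nn

# Standard 3x3 convolution, 64 -> 128 channels: 3*3*64*128 = 73,728 weights.
# Depthwise separable version: 3*3*64 (depthwise) + 64*128 (pointwise)
# = 8,768 weights, roughly an 8x reduction.
depthwise_separable = nn.Sequential(
    nn.Conv2d(64, 64, kernel_size=3, padding=1, groups=64, bias=False),  # one 3x3 kernel per channel
    nn.Conv2d(64, 128, kernel_size=1, bias=False),                       # 1x1 conv mixes channels
)
```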
2.5 Channel Shuffle
A channel shuffle (Zhang et al., 2018) operation is a
way of rearranging the channels of a feature map by
interleaving them into groups. It is used to allow a
convolutional neural network to mix and combine the
information in different channels more flexibly and is
often used with depthwise separable convolutions to
increase representational capacity while maintaining
efficiency.
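The operation itself is a simple reshape-transpose-flatten; a sketch with a toy example:

```python
import torch

def channel_shuffle(x, groups):
    """Interleave channels across groups (Zhang et al., 2018):
    reshape into (groups, channels_per_group), transpose, flatten."""
    b, c, h, w = x.shape
    x = x.view(b, groups, c // groups, h, w)
    x = x.transpose(1, 2).contiguous()
    return x.view(b, c, h, w)

# With 8 channels and 2 groups, channel order 0..7 becomes 0,4,1,5,2,6,3,7
x = torch.arange(8).view(1, 8, 1, 1).float()
print(channel_shuffle(x, 2).flatten().tolist())
```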
2.6 Dilated Separable Down Block (DSD)
The receptive field (RF) is a crucial concept in the
design of convolutional neural networks (CNNs). As
described in (Luo et al., 2016), the RF at each layer
is the size of the region in the input that contributes
to generating a particular feature in the output. In or-
der to accurately predict the boundaries of objects in
the input image, such as organs, tumors, or vessels, it
is necessary to provide the model with access to all
relevant parts of the image. In a CNN, each neuron responds to a specific region of the input data, known as its local receptive field, which it covers during the convolution process. In this paper, we propose the use
of the Dilated Separable Down block (DSD block) as
a method for increasing the RF of the network and
improving its ability to predict object boundaries.
The Dilated Separable Down (DSD) block,
as shown in Figure 6, consists of two branches of 3x3
group convolutional layers with a stride size of 2 and
a pointwise convolutional layer. Note that different
dilation rates are applied to the different groups to
extract multi-scale features. The results of the two
branches are concatenated to improve the ability of
the network to represent features. This block can be
used in place of pooling layers to increase the expres-
sive power of the model. The use of dilated convo-
lution allows the model to increase its field of view
without increasing the number of parameters.
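A sketch of one possible realization follows; the dilation rates (1 and 2), the group count, and the even channel split between branches are assumptions, since the paper gives only the overall structure.

```python
import torch
import torch.nn as nn

class DSDBlock(nn.Module):
    """Sketch of the DSD downsampling block: two grouped 3x3 branches
    with stride 2 and different dilation rates, each followed by a 1x1
    pointwise conv; the branch outputs are concatenated."""
    def __init__(self, in_ch, out_ch, groups=4):
        super().__init__()
        def branch(dilation):
            return nn.Sequential(
                nn.Conv2d(in_ch, in_ch, 3, stride=2, padding=dilation,
                          dilation=dilation, groups=groups, bias=False),
                nn.Conv2d(in_ch, out_ch // 2, 1, bias=False),
                nn.BatchNorm2d(out_ch // 2),
                nn.ReLU(inplace=True))
        self.b1 = branch(dilation=1)   # smaller receptive field
        self.b2 = branch(dilation=2)   # larger receptive field, same output size

    def forward(self, x):
        return torch.cat([self.b1(x), self.b2(x)], dim=1)  # halves resolution

block = DSDBlock(in_ch=32, out_ch=64)
print(block(torch.randn(1, 32, 64, 64)).shape)  # torch.Size([1, 64, 32, 32])
```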
2.7 PixelShuffle
PixelShuffle (Shi et al., 2016) is a type of
upsampling layer used in convolutional neural net-
works (CNNs) to increase the resolution of the out-
put feature maps. It has the advantage of being able
to achieve a higher resolution output than other up-
sampling methods and being more efficient, as it does
not require the use of additional convolutional kernels
or the insertion of zeros into the feature maps. These
Figure 5: Depthwise separable convolution (Pandey, 2018).
Figure 6: Dilated Separable Down Block.
properties make PixelShuffle a useful tool for tasks
that require a high degree of spatial resolution and for
use in mobile and embedded applications where com-
putational resources are limited.
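The operation is available directly in PyTorch; a short example of the channel-to-space rearrangement:

```python
import torch
import torch.nn as nn

# PixelShuffle rearranges a (B, C*r^2, H, W) tensor into (B, C, r*H, r*W):
# channels are traded for spatial resolution, so upsampling needs no
# transposed-convolution kernels and no zero insertion.
up = nn.PixelShuffle(upscale_factor=2)
x = torch.randn(1, 64, 32, 32)   # 64 = 16 * 2^2 channels
print(up(x).shape)               # torch.Size([1, 16, 64, 64])
```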
3 EXPERIMENTAL
ENVIRONMENT AND RESULT
3.1 Datasets and Data Preprocessing
DRIVE (Staal et al., 2004) and CHASEDB1 (Owen et al., 2009) are publicly available datasets for retinal vessel segmentation. The DRIVE dataset consists of 40 RGB images with a resolution of 565 x 584 pixels, split evenly into 20 training and 20 test images. The CHASEDB1 dataset includes 28 images with a resolution of 999 x 960 pixels, of which 20 are used for training and 8 for testing. The model's performance was
evaluated on both datasets using the ground truth la-
bels provided by the first expert.
We enhanced the size of the dataset by imple-
menting data augmentation techniques. To focus on
the relevant information and eliminate unnecessary
processing, we employed a mask or field of view
(FOV) to extract patches from the input image that
only contained vessels. While the DRIVE dataset in-
cludes a binary mask, the CHASEDB1 dataset does
not. Therefore, we manually created a mask for the
CHASEDB1 dataset.
Before training, the data was pre-processed to re-
move noise and uneven lighting in fundus images.
The green channel of the RGB image was chosen as
it allows for better visualization of blood vessels. The
data was then normalized and scaled, and contrast
limited adaptive histogram equalization (Zuiderveld,
1994) and gamma adjustment were applied to im-
prove the contrast between the foreground and back-
ground.
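A minimal sketch of this pipeline using OpenCV is shown below; the CLAHE clip limit, tile size, and gamma value are assumptions, as the paper does not report its exact settings.

```python
import cv2
import numpy as np

def preprocess(rgb):
    """Pre-processing as described above; parameter values are assumed."""
    green = rgb[:, :, 1]                                  # green channel: best vessel contrast
    norm = cv2.normalize(green, None, 0, 255, cv2.NORM_MINMAX)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    eq = clahe.apply(norm.astype(np.uint8))               # contrast-limited equalization
    gamma = 1.2                                           # assumed gamma value
    table = ((np.arange(256) / 255.0) ** (1.0 / gamma) * 255).astype(np.uint8)
    return cv2.LUT(eq, table)                             # gamma adjustment
```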
3.2 Evaluation Approaches
The performance of a segmentation model can be
evaluated by comparing its results to the ground truth
(GT) and considering four scenarios: true positive
(TP), false positive (FP), false negative (FN), and true
negative (TN). TP is the number of correctly classi-
fied blood vessel pixels, FP is the number of incor-
rectly classified background pixels as vessels, FN is
the number of incorrectly classified vessel pixels as
background, and TN is the number of correctly clas-
sified background pixels. In addition to these four in-
dicators, the model’s performance can also be eval-
uated using the following criteria: sensitivity (SE),
specificity (SP), accuracy (ACC), precision (Pr), and
F-Measure (F1).
$$\mathrm{Acc} = \frac{TP + TN}{TP + TN + FP + FN}$$

$$SE = \frac{TP}{TP + FN} \qquad SP = \frac{TN}{TN + FP}$$

$$\mathrm{Precision} = \frac{TP}{TP + FP} \qquad \mathrm{Recall} = \frac{TP}{TP + FN}$$
Table 1: Performance comparison between the EFL-Net and some state-of-the-art models on DRIVE.
Methods Year F1 SE SP Acc AUC Parameters (M)
U-NET (Ronneberger et al., 2015) 2015 0.7783 0.8288 0.9701 0.9578 0.9772 31.031
SGL (Zhou et al., 2021) 2021 0.8271 0.869 0.9843 0.9771 0.992 15.533
CE-Net (Gu et al., 2019) 2019 0.8054 0.8093 0.9797 0.9641 0.9834 15.28
CS2-Net (Mou et al., 2021) 2021 0.8141 0.8329 0.9784 0.9651 0.9851 8.91
Iternet (Li et al., 2020) 2020 0.8073 0.797 0.9823 0.9655 0.9851 8.244
Ladder-Net (Zhuang, 2018) 2018 0.8031 0.7978 0.9818 0.9656 0.9839 1.5
Recurrent U-Net (Zahangir Alom et al., 2018) 2019 0.781 0.7459 0.9836 0.9622 0.798 1.037
R2U-Net (Zahangir Alom et al., 2018) 2019 0.7928 0.7756 0.982 0.9634 0.9815 1.037
DCU-Net (Yang et al., 2022) 2022 0.8278 0.8075 0.9841 0.9664 0.9872 1.0004
SA-Net (Hu et al., 2021) 2020 0.8153 0.8573 0.9835 0.9755 0.9905 0.538707
EFL-Net (ours) 2022 0.8242 0.7957 0.9802 0.9567 0.9803 0.340
Table 2: Performance comparison between our EFL-Net and some state-of-the-art models on CHASEDB1.
Methods Year F1 SE SP Acc AUC Parameters (M)
U-NET (Ronneberger et al., 2015) 2015 0.8174 0.7537 0.982 0.9531 0.9755 31.031
BCDU-Net (d=3) (Azad et al., 2019) 2019 0.8224 0.8007 0.9786 0.956 0.9789 20.659
SGL (Zhou et al., 2021) 2021 0.8316 0.838 0.9834 0.9705 0.9886 15.533
CE-Net (Gu et al., 2019) 2019 0.8243 0.8276 0.9735 0.9545 0.9794 15.28
CS2-Net (Mou et al., 2021) 2021 0.8228 0.8154 0.9757 0.9553 0.9784 8.91
Iternet (Li et al., 2020) 2020 0.8205 0.7735 0.9838 0.9573 0.9816 8.244
Ladder-Net (Zhuang, 2018) 2018 0.8202 0.7856 0.981 0.9561 0.9793 1.5
Recurrent U-Net (Zahangir Alom et al., 2018) 2019 0.8155 0.7751 0.9816 0.9556 0.9782 1.037
R2U-Net (Zahangir Alom et al., 2018) 2019 0.8171 0.7792 0.9813 0.9556 0.9784 1.037
DCU-Net (Yang et al., 2022) 2022 0.8272 0.8115 0.978 0.9568 0.981 1.0004
SA-Net (Hu et al., 2021) 2020 0.8263 0.8212 0.984 0.9698 0.9864 0.538707
EFL-Net (ours) 2022 0.8351 0.7977 0.9860 0.9651 0.9868 0.340
$$F1 = 2 \times \frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
Specificity (SP) is the ratio of correctly segmented
background pixels to the total number of actual back-
ground pixels, while sensitivity (SE) is the ratio of
correctly segmented blood vessel pixels to the total
number of actual blood vessel pixels. Accuracy (Acc) is the percentage of all image pixels that are correctly segmented. Precision measures the quality of the model's positive predictions, while recall measures the fraction of actual vessel pixels that are correctly identified. A higher precision value indicates that the model produces fewer false positives on the given data. F1 is the harmonic mean of precision and recall.
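These criteria follow directly from the four counts; a small NumPy sketch:

```python
import numpy as np

def segmentation_metrics(pred, gt):
    """Compute the criteria above from binary prediction and ground-truth
    masks (1 = vessel, 0 = background)."""
    tp = np.sum((pred == 1) & (gt == 1))
    tn = np.sum((pred == 0) & (gt == 0))
    fp = np.sum((pred == 1) & (gt == 0))
    fn = np.sum((pred == 0) & (gt == 1))
    se = tp / (tp + fn)                      # sensitivity / recall
    sp = tn / (tn + fp)                      # specificity
    acc = (tp + tn) / (tp + tn + fp + fn)
    pr = tp / (tp + fp)                      # precision
    f1 = 2 * pr * se / (pr + se)             # harmonic mean of Pr and SE
    return dict(SE=se, SP=sp, Acc=acc, Pr=pr, F1=f1)
```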
3.3 Loss Function
The focal loss (Lin et al., 2017) is a method for ad-
dressing the class imbalance between foreground and
background pixels in a dataset, and is defined as fol-
lows:
$$FL(p_t) = -\alpha_t (1 - p_t)^{\gamma} \log(p_t), \qquad p_t = \begin{cases} p & \text{if } y = 1 \\ 1 - p & \text{otherwise} \end{cases} \qquad (1)$$
In the focal loss, the predicted probability of the
network output is denoted by p and the focusing pa-
rameter, which can be adjusted, is denoted by γ. Sam-
ples that are easy to classify contribute less to the loss
values, while samples that are difficult to classify con-
tribute more, causing the model to focus more on the
latter.
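A sketch of Eq. (1) in PyTorch for the binary vessel/background case follows; the defaults α = 0.25 and γ = 2 come from Lin et al. (2017) and are not necessarily the values used in this paper.

```python
import torch

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary focal loss per Eq. (1).
    p: predicted vessel probabilities, y: binary ground truth."""
    p_t = torch.where(y == 1, p, 1 - p)
    alpha_t = torch.where(y == 1, torch.full_like(p, alpha),
                          torch.full_like(p, 1 - alpha))
    eps = 1e-7                               # numerical stability
    return (-alpha_t * (1 - p_t) ** gamma * torch.log(p_t + eps)).mean()
```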
3.4 Experimental Environment and
Parameters
In this work, we trained our network from scratch for
200 epochs using a batch size of 256. We initialized
the weights with random values and used the Adam
optimizer with default parameters and an initial learn-
ing rate of 0.001. The learning rate was updated at
each epoch using a cosine function attenuation strat-
egy. Our experiments were conducted on a server
with a Linux operating system, 2.30 GHz processor,
128 GB RAM, an NVIDIA TESLA P100 GPU, and
the PyTorch 1.7.0 framework.
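The optimizer and schedule described above map onto standard PyTorch components; in the sketch below, `model` and `train_one_epoch` are hypothetical placeholders, and `CosineAnnealingLR` is one common way to realize the cosine attenuation strategy.

```python
import torch

# `model` is the EFL-Net instance (hypothetical placeholder here)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=200)

for epoch in range(200):
    train_one_epoch(model, optimizer)  # hypothetical training loop
    scheduler.step()                   # update learning rate once per epoch
```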
3.5 Experimental Result
Tables 1 and 2 present a comparison of the per-
formance of our proposed architecture with existing
methods on the DRIVE and CHASEDB1 datasets.
The results show that our model outperforms other
methods on the CHASEDB1 dataset and produces
Table 3: Ablation experiment on RBS and DSD blocks. The EFL-Net is trained and evaluated on CHASEDB1.
Method F1 SE SP Acc AUC Parameters (M)
EFL-Net (Standard convolution + maxpooling2D + cross entropy loss) 0.747399 0.6839 0.9815 0.9489 0.82892 1.936738
EFL-Net (Standard convolution + DSD block + cross entropy loss) 0.770642 0.7031103 0.986 0.957 0.979272 1.788498
EFL-Net (ShuffleNetV2 basic unit + DSD block + cross entropy loss) 0.815429 0.739147 0.990829 0.962996 0.985666 0.410922
EFL-Net (RBS block + DSD block + cross entropy loss) 0.801086 0.950963 0.947362 0.9648 0.988168 0.340950
EFL-Net (RBS block + DSD block+ Focal loss) 0.8351 0.7693 0.9891 0.9648 0.9871 0.340950
satisfactory results on the DRIVE dataset. In par-
ticular, our model achieves the highest F1 score of
0.8351 and the highest specificity of 0.9860 on the
CHASEDB1 dataset, demonstrating its superiority
in retinal vessel segmentation compared to previous
works. When comparing the results of our model
(EFL-Net) with other networks, it is clear that our net-
work achieves equivalent or better results compared to
the best performing networks with a smaller number
of parameters.
We conducted ablation experiments on the
CHASEDB1 dataset to study the impact of different
components of our proposed architecture. In the first
experiment, we compared the performance of net-
works with and without DSD blocks (using MaxPool-
ing instead) in the encoder. The results, shown in rows
1 and 2 of Table 3, indicate that the DSD block is su-
perior to MaxPooling in terms of F1 score.
In the second experiment, we replaced the RBS
block with the ShuffleNetV2 basic unit to evaluate the
performance of the RBS block in the proposed EFL-
Net architecture. By comparing the performance of
EFL-Net with the RBS block and DSD block (EFL-
Net (RBS block + DSD block)) to EFL-Net with the
ShuffleNetV2 basic unit and DSD block (EFL-Net
(ShuffleNetV2 basic unit + DSD block)), we found
that the RBS block significantly improved the model’s
F1 score by approximately 1.47%.
In the third experiment, we compared the perfor-
mance of our model using cross entropy loss and focal
loss. The results showed that the model using focal
loss achieved better performance.
The results of our model on the DRIVE and
CHASEDB1 datasets are shown in Figure 7. The
original images of retinal vessels are shown in the first
column.
Figure 7: Retinal vessel images from the CHASEDB1 (first
row) and DRIVE (second row) datasets.
The output of our network for vessel segmentation is shown in the second column, the third column shows the binary result obtained by applying a threshold to the network output, and the ground truth for each input image is displayed in the last column.
4 CONCLUSION
EFL-Net is a lightweight network designed to im-
prove the accuracy and speed of blood vessel seg-
mentation. It uses two custom modules, the ResNet
Branches Shuffle Block (RBS) and the Dilated Sepa-
rable Down block (DSD), which have a high capac-
ity for feature extraction. The RBS block is based
on the ShuffleNetV2 basic unit, and the DSD block expands
the network’s receptive field while reducing feature
size without losing important information. In the up-
sampling path, the network uses PixelShuffle instead
of deconvolution or interpolation. The network has
0.34 million parameters and demonstrated good per-
formance on two datasets, achieving F1 scores of 0.8242 on the DRIVE dataset and 0.8351 on the CHASEDB1 dataset.
REFERENCES
Alberta, I. B., Rahmani, S. A., et al. (2022). Retinal impair-
ment associated with long-term use of ritonavir among
hiv patients: A systematic review for primary eye care
practice. International Journal of Retina, 5(1):48–48.
Azad, R., Asadi-Aghbolaghi, M., Fathy, M., and Escalera, S. (2019). Bi-directional convlstm u-net with densely connected convolutions. In Proceedings of the IEEE/CVF international conference on computer vision workshops.
Chollet, F. (2017). Xception: Deep learning with depthwise
separable convolutions. In Proceedings of the IEEE
conference on computer vision and pattern recogni-
tion, pages 1251–1258.
Gao, S.-H., Cheng, M.-M., Zhao, K., Zhang, X.-Y., Yang,
M.-H., and Torr, P. (2019). Res2net: A new multi-
scale backbone architecture. IEEE transactions on
pattern analysis and machine intelligence, 43(2):652–
662.
Gu, Z., Cheng, J., Fu, H., Zhou, K., Hao, H., Zhao, Y.,
Zhang, T., Gao, S., and Liu, J. (2019). Ce-net: Context
encoder network for 2d medical image segmentation.
IEEE transactions on medical imaging, 38(10):2281–
2292.
Guo, C., Szemenyei, M., Yi, Y., Wang, W., Chen, B.,
and Fan, C. (2021). Sa-unet: Spatial attention u-net
for retinal vessel segmentation. In 2020 25th inter-
national conference on pattern recognition (ICPR),
pages 1236–1242. IEEE.
Hu, J., Wang, H., Wang, J., Wang, Y., He, F., and Zhang, J.
(2021). Sa-net: A scale-attention network for medical
image segmentation. PloS one, 16(4):e0247388.
Irshad, S. and Akram, M. U. (2014). Classification of reti-
nal vessels into arteries and veins for detection of hy-
pertensive retinopathy. In 2014 Cairo International
Biomedical Engineering Conference (CIBEC), pages
133–136. IEEE.
Kolb, H. (2012). Simple anatomy of the retina.
Li, L., Verma, M., Nakashima, Y., Nagahara, H., and
Kawasaki, R. (2020). Iternet: Retinal image seg-
mentation utilizing structural redundancy in vessel
networks. In Proceedings of the IEEE/CVF winter
conference on applications of computer vision, pages
3656–3665.
Lin, T.-Y., Goyal, P., Girshick, R., He, K., and Dollár, P. (2017). Focal loss for dense object detection. In Proceedings of the IEEE international conference on computer vision, pages 2980–2988.
Long, J., Shelhamer, E., and Darrell, T. (2015). Fully con-
volutional networks for semantic segmentation. In
Proceedings of the IEEE conference on computer vi-
sion and pattern recognition, pages 3431–3440.
Luo, W., Li, Y., Urtasun, R., and Zemel, R. (2016). Under-
standing the effective receptive field in deep convolu-
tional neural networks. Advances in neural informa-
tion processing systems, 29.
Ma, N., Zhang, X., Zheng, H.-T., and Sun, J. (2018). Shuf-
flenet v2: Practical guidelines for efficient cnn archi-
tecture design. In Proceedings of the European con-
ference on computer vision (ECCV), pages 116–131.
Mou, L., Zhao, Y., Fu, H., Liu, Y., Cheng, J., Zheng, Y.,
Su, P., Yang, J., Chen, L., Frangi, A. F., et al. (2021).
Cs2-net: Deep learning segmentation of curvilinear
structures in medical imaging. Medical image anal-
ysis, 67:101874.
Owen, C. G., Rudnicka, A. R., Mullen, R., Barman, S. A.,
Monekosso, D., Whincup, P. H., Ng, J., and Paterson,
C. (2009). Measuring retinal vessel tortuosity in 10-
year-old children: validation of the computer-assisted
image analysis of the retina (caiar) program. Inves-
tigative ophthalmology & visual science, 50(5):2004–
2010.
Pandey, A. (2018). Depth-wise convolution and depth-wise
separable convolution.
Payne, A. J., Kaja, S., Naumchuk, Y., Kunjukunju, N.,
and Koulen, P. (2014). Antioxidant drug therapy ap-
proaches for neuroprotection in chronic diseases of
the retina. International journal of molecular sci-
ences, 15(2):1865–1886.
Ronneberger, O., Fischer, P., and Brox, T. (2015). U-net:
Convolutional networks for biomedical image seg-
mentation. In International Conference on Medical
image computing and computer-assisted intervention,
pages 234–241. Springer.
Salmon, J. F. (2019). Kanski’s Clinical Ophthalmology E-
Book: A Systematic Approach. Elsevier Health Sci-
ences.
Shi, W., Caballero, J., Huszár, F., Totz, J., Aitken, A. P., Bishop, R., Rueckert, D., and Wang, Z. (2016). Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1874–1883.
Staal, J., Abràmoff, M. D., Niemeijer, M., Viergever, M. A., and Van Ginneken, B. (2004). Ridge-based vessel segmentation in color images of the retina. IEEE transactions on medical imaging, 23(4):501–509.
Walker, H. K., Hall, W. D., and Hurst, J. W. (1990). Clinical
methods: the history, physical, and laboratory exami-
nations.
Yang, X., Li, Z., Guo, Y., and Zhou, D. (2022). Dcu-net:
a deformable convolutional neural network based on
cascade u-net for retinal vessel segmentation. Multi-
media Tools and Applications, 81(11):15593–15607.
Zahangir Alom, M., Hasan, M., Yakopcic, C., Taha, T. M., and Asari, V. K. (2018). Recurrent residual convolutional neural network based on u-net (r2u-net) for medical image segmentation. arXiv preprint arXiv:1802.06955.
Zhang, X., Zhou, X., Lin, M., and Sun, J. (2018). Shuf-
flenet: An extremely efficient convolutional neural
network for mobile devices. In Proceedings of the
IEEE conference on computer vision and pattern
recognition, pages 6848–6856.
Zhou, Y., Yu, H., and Shi, H. (2021). Study group learning:
Improving retinal vessel segmentation trained with
noisy labels. In International Conference on Medi-
cal Image Computing and Computer-Assisted Inter-
vention, pages 57–67. Springer.
Zhuang, J. (2018). Laddernet: Multi-path networks based
on u-net for medical image segmentation. arXiv
preprint arXiv:1810.07810.
Zuiderveld, K. (1994). Contrast limited adaptive histogram
equalization. Graphics gems, pages 474–485.