Multi-class Semantic Segmentation of Skin Lesions via Fully
Convolutional Networks
Manu Goyal¹, Moi Hoon Yap² and Saeed Hassanpour³
¹Department of Biomedical Data Science, Dartmouth College, Hanover, NH, U.S.A.
²Visual Computing Lab, Manchester Metropolitan University, Manchester, U.K.
³Departments of Biomedical Data Science, Computer Science and Epidemiology, Dartmouth College, Hanover, NH, U.S.A.
Keywords:
Skin Cancer, Fully Convolutional Networks, Multi-class Segmentation, Lesion Diagnosis.
Abstract:
Melanoma is clinically difficult to distinguish from common benign skin lesions, particularly melanocytic naevus and seborrhoeic keratosis. The dermoscopic appearance of these lesions shows huge intra-class variation and high inter-class visual similarity. Most current research focuses on single-class segmentation, irrespective of the type of skin lesion. In this work, we evaluate the performance of deep learning on multi-class segmentation of the ISIC-2017 challenge dataset, which consists of 2,750 dermoscopic images. We propose an end-to-end solution using fully convolutional networks (FCNs) for multi-class semantic segmentation to automatically segment melanoma, seborrhoeic keratosis and naevus. To improve the performance of the FCNs, transfer learning and a hybrid loss function are used. We evaluate the deep learning segmentation methods on multi-class segmentation and on lesion diagnosis (with a post-processing method) on the testing set of the ISIC-2017 challenge dataset. The results show that the two-tier transfer learning FCN-8s achieved the overall best result, with Dice scores of 78.5% for naevi, 65.3% for melanoma and 55.7% for seborrhoeic keratosis in multi-class segmentation, and an accuracy of 84.62% for recognition of melanoma in lesion diagnosis.
1 INTRODUCTION
Skin cancers are more common than all other cancers (Pathan et al., 2018). Malignant skin lesions are classified as melanocytic, i.e. melanoma, and non-melanocytic. The most common non-melanocytic cancers are keratinocytic: basal cell carcinoma and squamous cell carcinoma. Melanoma is less common but is more likely to prove fatal than keratinocytic skin cancers due to its aggressive invasion and metastasis (National Cancer Institute, 2017; Dvořánková et al., 2017). Hence, early detection is important to save lives. According to the prediction of the Melanoma Foundation (Melanoma Foundation (AIM), 2017), an estimated 178,560 cases of melanoma would be diagnosed in the United States in 2018, of which 91,270 would be invasive.
Melanocytic naevi and seborrhoeic keratosis
are very common benign skin lesions that may
be clinically difficult to differentiate from skin
cancer. Both melanoma and melanocytic naevi
are melanocytic lesions: uncontrolled growth of melanocytes (pigmented cells) results in melanoma, whereas non-cancerous growth in moles results in a benign melanocytic naevus. Seborrhoeic keratosis (SK) is a type of non-melanocytic skin lesion. However, it is very hard to distinguish SK lesions from melanocytic lesions (moles and melanoma), even with the help of dermoscopy, as these skin lesions share similar features such as irregular shapes and multiple colours.
With the rapid growth of deep learning approaches, many researchers (Yuan et al., 2017; Yu et al., 2017; Bi et al., 2017; Goyal et al., 2019) have proposed deep convolutional neural networks for (single-class) skin lesion segmentation. We have found no previous research on multi-class semantic segmentation for different types of skin lesions.
Our contributions are threefold. Firstly, we propose multi-class semantic segmentation of skin lesions for melanoma, seborrhoeic keratosis and naevus. To overcome data deficiency, two-tier transfer learning is used to train the fully convolutional networks (FCNs) for skin lesion segmentation. Secondly, we design a hybrid loss function to handle class imbalance in the multi-class segmentation. Thirdly, we assess the performance of state-of-the-art deep learning
Table 1: Distribution of images for the multi-class segmentation task.

                  Naevi   Melanoma   Seborrhoeic Keratosis   Total
Training set       1372        374                     254    2000
Validation set       78         30                      42     150
Testing set         393        117                      90     600
Total              1843        521                     386    2750
algorithms using our proposed multi-class segmentation and a post-processing method to determine lesion diagnosis on the ISIC-2017 Challenge dataset. Our proposed method can be generalised to other multi-class segmentation tasks in medical imaging.
2 METHODOLOGY
This section discusses the publicly available ISIC-2017 skin lesion dataset and its ground-truth labelling, the two-tier transfer learning approach, and the hybrid loss function.
2.1 Datasets and Ground Truth
We used the publicly available ISIC-2017 Skin Lesion Analysis Towards Melanoma Detection Challenge dataset (Codella et al., 2017) to train the fully convolutional deep learning models. All images in this dataset are represented in the RGB colorspace. The dataset contains dermoscopy images of three skin lesion types: naevi, melanoma and seborrhoeic keratosis. The segmentation task on these dermoscopy images is very challenging due to the high inter-class similarity between the three types of skin lesion. The dataset is imbalanced, with only 521 melanoma and 386 seborrhoeic keratosis images compared to 1,843 melanocytic naevi. In total, the ISIC-2017 challenge dataset contains 2,750 dermoscopy images, as summarised in Table 1.
In this dataset, image sizes vary between 540 × 722 and 4499 × 6748 pixels. To improve performance and reduce computational cost, all images are resized to 500 × 375. In the ISIC-2017 segmentation challenge, the task was to segment the lesion boundary, a one-class segmentation task; here, we target automatic multi-class segmentation. The ground truths are defined in the RGB colorspace as 8-bit paletted images. Figure 1 illustrates the dermoscopic images with the corresponding ground-truth labelling in PASCAL-VOC format (Garcia-Garcia et al., 2017; Everingham et al., 2015). Index 1 indicates naevus, index 2 indicates melanoma and index 3 indicates seborrhoeic keratosis.
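As a concrete illustration of this labelling scheme, the sketch below shows how a resized dermoscopic image and a multi-class indexed mask could be prepared from the ISIC-2017 binary ground-truth masks. The function name, file paths and lesion-type keys are hypothetical and not part of the released dataset tooling.

```python
import numpy as np
from PIL import Image

# Palette indices from Section 2.1: 0 = background/healthy skin,
# 1 = naevus, 2 = melanoma, 3 = seborrhoeic keratosis.
CLASS_INDEX = {"naevus": 1, "melanoma": 2, "seborrhoeic_keratosis": 3}

def prepare_pair(image_path, binary_mask_path, lesion_type, size=(500, 375)):
    """Resize a dermoscopic image and turn the ISIC binary ground-truth
    mask into a multi-class indexed label map for the given lesion type."""
    image = Image.open(image_path).convert("RGB").resize(size, Image.BILINEAR)
    # Nearest-neighbour interpolation keeps the mask binary; other modes
    # would introduce intermediate values at the lesion boundary.
    mask = Image.open(binary_mask_path).convert("L").resize(size, Image.NEAREST)
    label = (np.array(mask) > 127).astype(np.uint8) * CLASS_INDEX[lesion_type]
    return np.array(image), label
```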
2.2 Fully Convolutional Networks for
Multi-class Semantic Segmentation
FCNs and encoder-decoder CNNs can detect and localise multiple objects through pixel-wise prediction, learning which pixel of an image belongs to which object class. Recently, FCNs have become the state-of-the-art methods for segmentation tasks in both non-medical and medical imaging, outperforming conventional machine learning and other deep learning methods. We used four variants of FCNs (FCN-AlexNet, FCN-32s, FCN-16s and FCN-8s) and assessed their performance on multi-class skin lesion segmentation.
The first variant, FCN-AlexNet, is a modified version of the AlexNet classification model, which won the ImageNet ILSVRC-2012 competition in the classification category (Long et al., 2015; Krizhevsky et al., 2012). FCN-AlexNet enables pixel-wise prediction through deconvolutional layers that up-sample the features learned by the earlier convolutional layers. We trained FCN-AlexNet using the Caffe deep learning framework (Jia et al., 2014). The input and ground-truth images are both 500 × 375. We fine-tuned the network parameters to allow the method more time to learn features from the dermoscopy images, using 100 epochs of stochastic gradient descent with a learning rate of 0.0001.
The other FCN variants, FCN-32s, FCN-16s and FCN-8s, are based on another state-of-the-art classification network, VGG-16, which won the localization challenge and came second in the classification challenge of the ImageNet ILSVRC-2014 competition (Simonyan and Zisserman, 2014; Long et al., 2015). The differences between these models lie in the up-sampling layers and their pixel strides: as the names suggest, FCN-32s up-samples with a 32-pixel stride, whereas FCN-16s uses a 16-pixel stride and FCN-8s an 8-pixel stride. With a smaller pixel stride, the models can recover finer-grained object detail. The same network parameters as for FCN-AlexNet were used to train these models.
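To make the stride differences concrete, the following is a minimal PyTorch-style sketch of an FCN-8s-like decoding head in the spirit of Long et al. (2015). It is not the authors' Caffe implementation: bilinear interpolation stands in for the learned deconvolution layers, and the channel counts assume VGG-16 pool3/pool4 feature maps.

```python
import torch.nn as nn
import torch.nn.functional as F

class FCN8sHead(nn.Module):
    """Sketch of an FCN-8s-style decoding path: coarse class scores are
    up-sampled in two 2x steps, fused with scores from the shallower
    VGG-16 pool4 and pool3 features, then up-sampled 8x to full size.
    FCN-32s would instead jump straight from 1/32 resolution in one step."""
    def __init__(self, num_classes=4):  # background + 3 lesion classes
        super().__init__()
        self.score_pool3 = nn.Conv2d(256, num_classes, kernel_size=1)
        self.score_pool4 = nn.Conv2d(512, num_classes, kernel_size=1)

    def forward(self, score_final, pool4, pool3, out_size):
        x = F.interpolate(score_final, scale_factor=2, mode="bilinear",
                          align_corners=False)        # 1/32 -> 1/16 resolution
        x = x + self.score_pool4(pool4)               # fuse 16-pixel-stride scores
        x = F.interpolate(x, scale_factor=2, mode="bilinear",
                          align_corners=False)        # 1/16 -> 1/8 resolution
        x = x + self.score_pool3(pool3)               # fuse 8-pixel-stride scores
        # Final 8x up-sampling back to the input resolution.
        return F.interpolate(x, size=out_size, mode="bilinear",
                             align_corners=False)
```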
2.3 The Two-tier Transfer Learning
Approach
Convolutional neural networks generally require a
huge dataset to learn the features and detect objects
Multi-class Semantic Segmentation of Skin Lesions via Fully Convolutional Networks
291
Figure 1: Original images (first row) and ground-truth labels in PASCAL-VOC format (second row). The skin lesion diagnoses from left to right: (a) naevus, (b) melanoma and (c) seborrhoeic keratosis.
(a) Single-detection (b) No-detection (c) Multi-detection
Figure 2: Examples of the three types of semantic segmentation output on the ISIC-2017 testing set: (a) one lesion type detected; (b) no lesion detected; and (c) multiple lesion types detected. Green represents melanocytic naevus, red represents melanoma and yellow represents seborrhoeic keratosis.
Table 2: Number of cases for each type of inference on the ISIC-2017 testing set.

Inference      Single-detection   Multi-detection   No-detection
Testing set                 395               192             13
in images (LeCun et al., 2015). Since dermoscopic images are RGB images, it is beneficial to use two-tier transfer learning from large non-medical datasets such as ImageNet and Pascal-VOC to help the weights of each convolutional layer converge (Russakovsky et al., 2014; Everingham et al., 2015; Goyal et al., 2017). Transfer learning transfers the features learned by previous models on large non-medical datasets to medical image datasets. There are two types of transfer learning: partial transfer learning, in which only the features from a few convolutional layers are transferred, and full transfer learning, in which features are transferred from all layers of the previous pre-trained model. For the first tier of the two-tier transfer learning, we used partial transfer learning, transferring the features from the convolutional layers trained on ImageNet. For the second tier, we used full transfer learning from a model trained on Pascal-VOC.
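A minimal sketch of this two-tier initialisation, again in PyTorch rather than the authors' Caffe setup, is shown below. The encoder attribute, function name and Pascal-VOC checkpoint path are assumptions for illustration; the torchvision weights API is the modern (>= 0.13) one.

```python
import torch
import torchvision

def two_tier_initialise(fcn_model, pascal_voc_checkpoint=None):
    """Illustrative two-tier weight initialisation for a VGG-16-based FCN.

    Tier 1 (partial transfer): copy only the convolutional feature layers
    from an ImageNet-pretrained VGG-16 into the FCN encoder.
    Tier 2 (full transfer): load all layers from a model previously
    trained on Pascal-VOC segmentation, when such a checkpoint exists.
    """
    vgg16 = torchvision.models.vgg16(weights="IMAGENET1K_V1")
    # Assumes fcn_model.encoder mirrors vgg16.features layer-for-layer.
    fcn_model.encoder.load_state_dict(vgg16.features.state_dict())

    if pascal_voc_checkpoint is not None:
        state = torch.load(pascal_voc_checkpoint, map_location="cpu")
        # strict=False: the output layer differs (4 classes here vs. 21
        # classes in Pascal-VOC), so unmatched keys are skipped.
        fcn_model.load_state_dict(state, strict=False)
    return fcn_model
```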
2.4 Custom Hybrid Loss Function
For imbalanced dataset as summarized in Table 1, we
used a hybrid loss function, which is a combination of
softmax cross-entropy loss and Dice score loss func-
tion, to optimize the objective function. Dice Score
is a commonly used performance metric in medical
imaging segmentation. Softmax cross-entropy loss
function is a sum of per-pixel softmax cross-entropy
Table 3: Comparison of different FCN architectures on the ISIC-2017 Challenge dataset (SK denotes seborrhoeic keratosis).

                    Dice                   Specificity            Sensitivity            MCC
Method       Naevi  Melanoma  SK     Naevi  Melanoma  SK     Naevi  Melanoma  SK     Naevi  Melanoma  SK
FCN-AlexNet  0.819  0.609  0.488     0.989  0.982  0.987     0.798  0.486  0.456     0.814  0.541  0.484
FCN-32s      0.779  0.549  0.484     0.991  0.977  0.968     0.751  0.430  0.478     0.775  0.484  0.463
FCN-16s      0.761  0.590  0.506     0.988  0.979  0.978     0.706  0.471  0.466     0.764  0.528  0.501
FCN-8s       0.785  0.653  0.557     0.990  0.984  0.988     0.747  0.527  0.509     0.779  0.582  0.568
Figure 3: Segmentation results comparing the ground-truth delineation and the four FCNs (columns, left to right: ground truth, FCN-AlexNet, FCN-32s, FCN-16s, FCN-8s) on multi-class segmentation for a naevus (top row), a melanoma (middle row) and a seborrhoeic keratosis (bottom row).
losses, whereas the Dice loss term accounts for the overall segmentation score on the whole image.
L_s = Softmax(cross-entropy)                          (1)

where L_s is the overall softmax cross-entropy loss function and cross-entropy is the per-pixel cross-entropy loss.

L_d = 2|S ∩ G| / (|S| + |G|)                          (2)

where L_d is the Dice loss function, S is the segmented image and G is the ground truth.

L_h = L_s + L_d                                       (3)

where L_h is the hybrid loss function, the combination of the softmax cross-entropy loss function and the Dice loss function.
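A minimal PyTorch sketch of Eqs. (1)-(3) is given below, assuming the Dice term enters as 1 − Dice so that the combined objective is minimised (the paper does not spell out this detail); the function name and arguments are illustrative.

```python
import torch.nn.functional as F

def hybrid_loss(logits, target, num_classes=4, eps=1e-6):
    """L_h = L_s + L_d: per-pixel softmax cross-entropy plus a soft Dice
    term over the whole image, following Eqs. (1)-(3).

    logits: (N, C, H, W) raw network outputs; target: (N, H, W) class indices.
    """
    # Eq. (1): per-pixel softmax cross-entropy, averaged over all pixels.
    l_s = F.cross_entropy(logits, target)

    # Eq. (2): soft Dice from softmax probabilities, computed per class
    # over the whole batch and then averaged.
    probs = F.softmax(logits, dim=1)
    one_hot = F.one_hot(target, num_classes).permute(0, 3, 1, 2).float()
    intersection = (probs * one_hot).sum(dim=(0, 2, 3))
    denom = probs.sum(dim=(0, 2, 3)) + one_hot.sum(dim=(0, 2, 3))
    dice = (2 * intersection + eps) / (denom + eps)
    l_d = 1.0 - dice.mean()  # assumed reading: Dice *score* as 1 - Dice loss

    # Eq. (3): hybrid loss.
    return l_s + l_d
```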
3 RESULTS AND DISCUSSION
We experimented with four state-of-the-art fully convolutional networks for our proposed segmentation task, as described above. We trained the models on the ISIC-2017 training set of 2,000 dermoscopic images with an input size of 500 × 375, using stochastic gradient descent with a learning rate of 0.0001, 60 epochs and a dropout rate of 33%. In Table 3, we report the Dice similarity coefficient (Dice), sensitivity, specificity and Matthews correlation coefficient (MCC) as metrics for the performance evaluation of multi-class segmentation of skin lesions. We used the trained model with the best Dice score on the ISIC-2017 validation set to perform inference on the ISIC-2017 test set.
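For reference, the metrics in Table 3 can be computed per class from pixel-level confusion counts. The sketch below (with an illustrative function name) treats pixels of one class as positives and all other pixels as negatives.

```python
import math
import numpy as np

def per_class_metrics(pred, truth, cls):
    """Pixel-level Dice, sensitivity, specificity and MCC for one class,
    given predicted and ground-truth label maps of equal shape."""
    p, t = (pred == cls), (truth == cls)
    tp = int(np.sum(p & t)); fp = int(np.sum(p & ~t))
    fn = int(np.sum(~p & t)); tn = int(np.sum(~p & ~t))
    dice = 2 * tp / max(2 * tp + fp + fn, 1)
    sensitivity = tp / max(tp + fn, 1)
    specificity = tn / max(tn + fp, 1)
    # Python ints avoid overflow in the four-way product.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return dice, sensitivity, specificity, mcc
```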
Configuration of GPU Machine for Experiments. (1) Hardware: CPU - Intel i7-6700 @ 4.00GHz, GPU - NVIDIA TITAN X 12GB, RAM - 32GB DDR5. (2) Software: Caffe.
In the performance evaluation for multi-class segmentation, we observed three types of results at inference, as shown in Fig. 2; the number of cases of each type is given in Table 2. In Table 3, we report the performance evaluation of the fully convolutional networks for multi-class segmentation on the ISIC-2017 test set. In the naevi category, all FCNs achieved good segmentation results, but FCN-AlexNet achieved the best, with a Dice score of 0.819, an MCC of 0.814 and a sensitivity of 0.798. In this category, FCN-8s performed second best, with a Dice score of 0.785 and an MCC of 0.779. In the melanoma and seborrhoeic keratosis categories, FCN-8s achieved Dice scores of 0.653 and 0.557 respectively, and was the best performer on all the metrics. Fig. 3 visually compares the segmentation results on the different lesion types. The FCNs performed best on the naevi class because the dataset contains more images of naevi than of melanoma and seborrhoeic keratosis. Combined with the high intra-class variation and inter-class visual similarity, the performance on both melanoma and seborrhoeic keratosis suffers from this shortage of images: the dataset contains roughly 28% as many melanoma images and 21% as many seborrhoeic keratosis images as naevi images.
The results demonstrate that deep learning techniques are reliant on dataset size. The segmentation results for melanoma and seborrhoeic keratosis were notably poorer than those for naevi as a consequence of this data deficiency. Despite the limitations of the dataset, we have provided a fully automated end-to-end solution for multi-class segmentation.
3.1 Post-processing Method to
Determine Lesion Diagnosis
We used a post-processing method to determine a single label for lesion diagnosis, especially for multi-detection cases. We used only FCN-8s at this stage, as it provided the best scores for the segmentation of melanoma and seborrhoeic keratosis. For single-detection, we directly took the detected lesion class as the diagnosis. There were very few cases of no detection in the testing set (13 out of 600); we treated these cases as naevi for performance evaluation. For multi-detection, we adopted a priority-based strategy for class prediction, preferring malignant lesions over benign ones and classes with fewer training images, according to Table 4. For example, the multi-detection case (c) in Fig. 2 is classified as melanoma according to this priority-based strategy.
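A sketch of this priority rule, assuming the palette indices from Section 2.1, might look as follows; the function name is illustrative, and the no-detection default follows the protocol described above.

```python
import numpy as np

# Priority order from Table 4: malignant first, then the rarer benign class.
PRIORITY = [2, 3, 1]  # melanoma, seborrhoeic keratosis, naevus (palette indices)

def diagnose(label_map):
    """Collapse a multi-class segmentation map to a single lesion diagnosis.
    Multi-detection: the highest-priority class present wins; no-detection
    defaults to naevus, as in the evaluation protocol above."""
    present = set(np.unique(label_map)) - {0}  # drop the background index
    if not present:
        return 1  # no-detection: assume naevus
    for cls in PRIORITY:
        if cls in present:
            return cls
    return 1
```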
In Table 5, we report the performance of the selected FCN-8s model with the post-processing method to determine lesion diagnosis. We achieved an accuracy of 84.62%
Table 4: Priority strategy based on benign/malignant status and the number of images in the ISIC-2017 training set. SK is seborrhoeic keratosis.

Priority   Class      Benign/Malignant   No. of Images
1          Melanoma   Malignant                    374
2          SK         Benign                       254
3          Naevi      Benign                      1372
Table 5: Performance of FCN-8s with the post-processing method for lesion diagnosis on the ISIC-2017 testing set. SK is seborrhoeic keratosis.

Class      No. of Cases   Correct   Incorrect   Accuracy (%)
Naevi               393       319          74          81.17
Melanoma            117        99          18          84.62
SK                   90        67          23          74.44
Overall             600       485         115          80.83
for the recognition of melanoma and 74.44% for seborrhoeic keratosis with our proposed post-processing method, despite the poorer performance of the FCNs on the segmentation of melanoma and seborrhoeic keratosis.
4 CONCLUSION
We propose a fully automated multi-class semantic segmentation method for melanoma, naevi and seborrhoeic keratosis on the ISIC-2017 Challenge dataset. Segmentation of skin lesions is very challenging, as there are high intra-class variations and inter-class similarities in visual appearance, size and colour. The literature on skin lesion segmentation describes only one-class solutions. Computer vision algorithms can readily segment one class of skin lesion from the surrounding healthy skin, but achieving good multi-class segmentation results across multiple lesion categories remains a major challenge. We designed a hybrid loss function, implemented two-tier transfer learning, and successfully established a new baseline for multi-class segmentation of skin lesions. We further investigated a post-processing method to improve the lesion diagnosis produced by the FCNs. With a balanced skin lesion dataset and expert annotation, the method has the potential to further improve lesion diagnosis with multi-class segmentation.
REFERENCES
Bi, L., Kim, J., Ahn, E., Kumar, A., Fulham, M., and
Feng, D. (2017). Dermoscopic image segmentation
via multi-stage fully convolutional networks. IEEE
Transactions on Biomedical Engineering.
Codella, N. C., Gutman, D., Celebi, M. E., Helba, B.,
Marchetti, M. A., Dusza, S. W., Kalloo, A., Li-
opyris, K., Mishra, N., Kittler, H., et al. (2017).
Skin lesion analysis toward melanoma detection: A
challenge at the 2017 international symposium on
biomedical imaging (isbi), hosted by the international
skin imaging collaboration (isic). arXiv preprint
arXiv:1710.05006.
Dvořánková, B., Szabo, P., Kodet, O., Strnad, H., Kolář, M., Lacina, L., Krejčí, E., Naňka, O., Šedo, A., and Smetana, K. (2017). Intercellular crosstalk in human malignant melanoma. Protoplasma, pages 1–8.
Everingham, M., Eslami, S. M. A., Van Gool, L., Williams,
C. K. I., Winn, J., and Zisserman, A. (2015). The pas-
cal visual object classes challenge: A retrospective.
International Journal of Computer Vision, 111(1):98–
136.
Garcia-Garcia, A., Orts-Escolano, S., Oprea, S., Villena-
Martinez, V., and Garcia-Rodriguez, J. (2017). A re-
view on deep learning techniques applied to semantic
segmentation. arXiv preprint arXiv:1704.06857.
Goyal, M., Oakley, A., Bansal, P., Dancey, D., and Yap,
M. H. (2019). Skin lesion segmentation in dermo-
scopic images with ensemble deep learning methods.
IEEE Access, pages 1–1.
Goyal, M., Reeves, N. D., Rajbhandari, S., Spragg, J., and
Yap, M. H. (2017). Fully convolutional networks
for diabetic foot ulcer segmentation. arXiv preprint
arXiv:1708.01928.
Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J.,
Girshick, R., Guadarrama, S., and Darrell, T. (2014).
Caffe: Convolutional architecture for fast feature em-
bedding. In Proceedings of the 22nd ACM inter-
national conference on Multimedia, pages 675–678.
ACM.
Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2012). Im-
agenet classification with deep convolutional neural
networks. In Advances in neural information process-
ing systems, pages 1097–1105.
LeCun, Y., Bengio, Y., and Hinton, G. (2015). Deep learn-
ing. Nature, 521(7553):436–444.
Long, J., Shelhamer, E., and Darrell, T. (2015). Fully con-
volutional networks for semantic segmentation. In
Proceedings of the IEEE Conference on Computer Vi-
sion and Pattern Recognition, pages 3431–3440.
Melanoma Foundation (AIM) (2017). Melanoma stats,
facts and figures. last access: 27/10/2017.
National Cancer Institute (2017). Cancer stat facts:
Melanoma of the skin. last access: 26/10/17.
Pathan, S., Prabhu, K. G., and Siddalingaswamy, P. (2018).
Techniques and algorithms for computer aided diag-
nosis of pigmented skin lesions—a review. Biomedi-
cal Signal Processing and Control, 39:237–262.
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh,
S., Ma, S., Huang, Z., Karpathy, A., Khosla,
A., Bernstein, M., et al. (2014). Imagenet large
scale visual recognition challenge. arXiv preprint
arXiv:1409.0575.
Simonyan, K. and Zisserman, A. (2014). Very deep con-
volutional networks for large-scale image recognition.
arXiv preprint arXiv:1409.1556.
Yu, L., Chen, H., Dou, Q., Qin, J., and Heng, P.-A. (2017).
Automated melanoma recognition in dermoscopy im-
ages via very deep residual networks. IEEE transac-
tions on medical imaging, 36(4):994–1004.
Yuan, Y., Chao, M., and Lo, Y.-C. (2017). Automatic skin
lesion segmentation using deep fully convolutional
networks with jaccard distance. IEEE Transactions
on Medical Imaging.