CoDA-Few: Few Shot Domain Adaptation for Medical Image Semantic
Segmentation
Arthur B. A. Pinto 1,a, Jefersson A. dos Santos 1,5,b, Hugo Oliveira 2,c and Alexei Machado 3,4,d
1 Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
2 Institute of Mathematics and Statistics, University of São Paulo, Brazil
3 Department of Anatomy and Imaging, Universidade Federal de Minas Gerais, Brazil
4 Department of Computer Science, Pontifícia Universidade Católica de Minas Gerais, Brazil
5 Computing Science and Mathematics, University of Stirling, Scotland, U.K.
a https://orcid.org/0000-0003-2057-9489, b https://orcid.org/0000-0002-8889-1586, c https://orcid.org/0000-0001-8760-9801, d https://orcid.org/0000-0001-8077-3377
Keywords:
Few-Shot, Domain Adaptation, Image Translation, Semantic Segmentation, Generative Adversarial Networks.
Abstract:
Due to ethical and legal concerns related to privacy, medical image datasets are often kept private, prevent-
ing invaluable annotations from being publicly available. However, data-driven models such as machine learning
algorithms require large amounts of curated labeled data. This tension between ethical concerns regarding
privacy and performance is one of the core limitations to the development of artificial intelligence solutions in
medical imaging analysis. Aiming to mitigate this problem, we introduce a methodology based on few-shot
domain adaptation capable of leveraging organ segmentation annotations from private datasets to segment pre-
viously unseen data. This strategy uses unsupervised image-to-image translation to transfer annotations from
a confidential source dataset to a set of unseen public datasets. Experiments show that the proposed method
achieves equivalent or better performance when compared with approaches that have access to the target data.
The method’s effectiveness is evaluated in segmentation studies of the heart and lungs in X-ray datasets, often
reaching Jaccard values larger than 90% for novel unseen image sets.
1 INTRODUCTION
The Internet provides a virtually unlimited amount of
unlabeled, weakly-labeled, or even fully labeled im-
ages in the visible spectrum. Specific imaging domains such as medical data, however, deal with privacy and ethical concerns during the creation of public datasets, while also being harder and far more expensive to annotate. As the literature of medical im-
age analysis migrates from shallow feature extraction
to deep feature learning, the main limitation to the
performance of machine learning models becomes the
lack of labeled data.
Deep Neural Networks (DNNs) for visual recog-
nition (Krizhevsky et al., 2012) require extensive and
representative datasets for training, that may be un-
available for most clinical scenarios. While the lack
of annotated data is an issue that can be alleviated
with techniques such as transfer learning and semi-
supervised learning, one aspect that makes this task
difficult is that most labeled datasets are private or
not fully publicly accessible. In order to protect the patients' privacy, hospitals decline to share medical records to train machine learning models, even when these models are expected to support diagnostic counseling.
Domain Adaptation (DA) is traditionally handled
with the aid of supervised, semi-supervised, weakly-
supervised or even unsupervised methods (Zhang
et al., 2017) by leveraging source data/labels and tar-
get data. Unsupervised Domain Adaptation (UDA)
can be used to transfer representations between do-
mains or tasks without requiring any target labels,
while Semi-Supervised Domain Adaptation (SSDA)
considers the case of a few labeled samples on the
target set. However, as such DA methods demand si-
multaneous access to both source and target data, they
do not fit Few-Shot Domain Adaptation (Few-Shot
DA) cases, where the target-domain data for the task
of interest are unavailable. An example of Few-Shot
DA is the case of medical image datasets, where the
source or the target sets are often not publicly avail-
able due to privacy and ethical concerns. This limita-
tion represents reproducibility hurdles, as annotations
from specialized physicians end up being used only
for local research, remaining inaccessible to other institutions.
In this paper, we introduce a novel DA architec-
ture applicable in Few-Shot DA cases where the tar-
get domain data for the tasks of interest is unavailable.
The approach is based on the Conditional Domain Adaptation Generative Adversarial Networks (CoDAGANs) (Oliveira et al., 2020) and the Few-Shot Unsupervised Image-to-Image Translation (FUNIT) (Liu et al., 2019) frameworks, specifically applied to the context of biomedical image segmentation tasks.
For the current study, we claim the following con-
tributions:
1. We propose an innovative method that combines Few-Shot Image-to-Image translation with a segmentation model to perform successful Few-Shot DA in biomedical image segmentation tasks;
2. A strategy with a more consistent training phase, i.e., less instability from the Generative Adversarial Networks (GANs);
3. A thorough test of our technique on a large collection of Chest X-Ray (CXR) datasets utilizing various source dataset combinations.
The method’s improved stability in the training
phase and its performance with unseen images are
demonstrated by extensive evaluation of a large col-
lection of Chest X-Ray (CXR) datasets using different
combinations of source datasets for two segmentation
tasks: lungs and heart.
2 BACKGROUND AND RELATED
WORK
2.1 Image-to-Image Translation
Image-to-Image (I2I) translation aims to learn the
mapping from a source image domain to a target im-
age domain. I2I often employs Generative Adversar-
ial Networks (GANs) (Goodfellow et al., 2014) that
are capable of transforming samples from one image
domain into images from another. Early I2I networks use paired images to simplify the learning process and loss functions, comparing the original and translated images at pixel or patch levels. Pix2Pix (Isola
et al., 2017) uses a GAN to create the mapping func-
tion according to a source image that serves as con-
ditioning to the model. On the other hand, BiCycle-
GANs (Zhu et al., 2017b) generate diverse outputs in
I2I problems, promoting the one-to-one relationship
between the network results and the latent vector by
modeling continuous and multi-modal distributions.
Although high-quality results have been shown in both Pix2Pix and BiCycleGAN experiments (Zhu et al., 2017b), the training procedure of these architectures requires paired training data, which reduces the applicability of I2I translation to a small and limited subset of image domains where paired datasets can be generated. This limitation motivated
the conception of Unpaired Image Translation meth-
ods such as CycleGAN (Zhu et al., 2017a), Unsuper-
vised Image-To-Image Translation (UNIT) (Liu et al.,
2017), and Multimodal Unsupervised Image-To-Image Translation (MUNIT) (Huang et al., 2018), which aim to learn a conditional image generation mapping function able to translate input images of a source domain into analogous images of a target domain without pairing supervision. These methods
leverage Cycle-Consistency to regularize the training
and to model the translation process between two im-
age domains as an invertible process.
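For illustration, a minimal sketch of the Cycle-Consistency idea in PyTorch, assuming two generator networks G_ab and G_ba and an L1 reconstruction penalty; the function and argument names are ours and are not taken from any of the cited implementations:

```python
import torch.nn.functional as F

def cycle_consistency_loss(x_a, x_b, G_ab, G_ba):
    """L1 cycle-consistency: translating A -> B -> A (and B -> A -> B)
    should recover the original images, making the translation invertible."""
    x_aba = G_ba(G_ab(x_a))  # round trip through domain B
    x_bab = G_ab(G_ba(x_b))  # round trip through domain A
    return F.l1_loss(x_aba, x_a) + F.l1_loss(x_bab, x_b)
```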
2.2 Few-Shot Unsupervised Image
Translation
The FUNIT framework (Liu et al., 2019) proposes to
map an image of a source domain to a similar image
of an unseen target domain by leveraging only the few
target samples available at test time. During train-
ing, FUNIT uses images from a set of source datasets
(e.g. images of several animal species, or, closer to
our context, public medical imaging datasets) to train
a multi-source I2I translation model.
In the deployment phase, a few images from a novel domain are presented to the model. The model leverages these few target samples to translate any source sample into analogous images of the target class. Thus, whenever the model is fed a few target images from a different unseen class, it morphs source images into their analogous target translations.
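A rough sketch of this deployment step is given below, assuming a FUNIT-like split into a content encoder, a style encoder, and a decoder; averaging the style codes of the few target images is one natural way to condition the translation, and all module and function names here are illustrative rather than taken from the FUNIT code base:

```python
import torch

@torch.no_grad()
def few_shot_translate(x_source, target_shots, content_enc, style_enc, decoder):
    """Translate a source image to the style of an unseen target class using
    only the few target images available at test time."""
    content = content_enc(x_source)                          # spatial content code
    styles = torch.stack([style_enc(s) for s in target_shots])
    style = styles.mean(dim=0)                               # average style over the K shots
    return decoder(content, style)                           # render the content in the target style
```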
2.3 Domain Adaptation
A method often used in tasks such as classification,
detection, and segmentation is transfer learning via
fine tuning. This method adapts DNNs pre-trained
on larger source datasets to perform similar tasks
on smaller labeled target datasets. Although use-
ful, Fully Supervised Domain Adaptation (FSDA) ap-
proaches have the limitation of requiring at least small
quantities of labeled target datasets, while their unsu-
pervised counterpart (i.e. UDA) allows for zero su-
pervision on target domains.
In recent years, modern alternatives to perform
UDA in neural networks have emerged such as the
ones based on Maximum Mean Discrepancy (MMD)
(Yan et al., 2017; Sun and Saenko, 2016; Tzeng et al.,
2017). Aiming to improve MMD by exploiting the
prior probability on the source and target domains,
Yan et al. (Yan et al., 2017) propose a weighted MMD
that includes domain-specific auxiliary weights into
MMD. Sun and Saenko (Sun and Saenko, 2016) dis-
cuss the case when the target domain is unlabeled and
extend the Correlation Alignment method to layer ac-
tivations in DNNs. Tzeng et al. (Tzeng et al., 2017)
combine discriminative modeling, untied weight shar-
ing, and an adversarial loss in a method called Adver-
sarial Discriminative Domain Adaptation (ADDA).
A vast number of works have used I2I Translation
for Domain Adaptation in order to perform segmen-
tation. Among these works, the Cycle-Consistent Ad-
versarial Domain Adaptation (CyCADA) (Hoffman
et al., 2018) accomplishes UDA by adding an FCN
to the end of a CycleGAN (Zhu et al., 2017a). Other
important works to be mentioned are the I2IAdapt
(Murez et al., 2018), that uses a CycleGAN (Zhu
et al., 2017a) coupled with segmentation architec-
tures to perform UDA; and the Dual Channel-wise
Alignment Network (DCAN) (Wu et al., 2018) that
attaches a segmentation architecture to the target end
of a translation architecture.
DA using Cycle-Consistency GANs has also been applied to medical imaging, aiming to im-
prove cross-dataset generalization (Zhang et al., 2018;
Tang et al., 2019b; Tang et al., 2019a), transferring
knowledge between imaging modalities (Yang et al.,
2019) and even domain generalization (Oliveira et al.,
2020). However, all of these methods, except CoDA-
GANs (Oliveira et al., 2020), have the limitation of
not being multi-source/multi-target. In addition, all of the previously mentioned GANs for medical imaging DA need the source and target datasets to be available during the training phase, which precludes their use when the target data are private.
2.4 CoDAGANs
CoDAGAN (Oliveira et al., 2020) is a framework
that combines I2I translation architectures (Liu et al.,
2017; Huang et al., 2018) with Encoder-Decoder seg-
mentation models (Ronneberger et al., 2015) to per-
form UDA, SSDA, or FSDA between various im-
age sets from the same imaging modality. The base
translation models of CoDAGANs rely on Autoen-
coders as generators, containing down-sampling and
up-sampling residual blocks. The intermediate representations from the generator's encoders are used as the basis for the isomorphic representation that serves as input to the supervised segmentation module. By
employing supervision on an isomorphic space shared
across all datasets, CoDAGANs use the supervision
of the source datasets to perform inference across tar-
get data. Due to the nature of adversarial training, one
main disadvantage of CoDAGANs is the lack of sta-
bility in its DA performance. This limitation can be
mitigated by using historical averages, as discussed in
Section 3.
3 METHODOLOGY
We propose a new approach for Few-Shot DA in
cross-dataset semantic segmentation tasks applied to
medical imaging, henceforth referred to as CoDA-
Few. CoDA-Few is based on previous developments in UDA/SSDA translation (Oliveira et al., 2020) and Few-Shot I2I translation (Liu et al., 2019), and is therefore an incremental improvement over CoDAGANs (Oliveira et al., 2020). It uses the same proposition
of generating a mid-level isomorphic representation I
as CoDAGANs (Oliveira et al., 2020), with the dis-
tinction that a Few-Shot I2I translation network (Liu
et al., 2019) is used to compute I instead of the orig-
inal MUNIT/UNIT architectures (Huang et al., 2018;
Liu et al., 2017). During training, CoDA-Few uses the Few-Shot I2I translation network to learn to generate I in a way that generalizes to unseen datasets. Then, I is fed to a supervised model M capable of inferring over several datasets.
to infer over a dataset that was never seen in training.
The unsupervised translation process, followed by a
supervised learning model, can be seen in Figure 1.
This change effectively allows our Few-Shot DA net-
work to perform predictions on fully-unseen datasets,
while CoDAGANs can only infer over target distribu-
tions seen during training.
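At deployment, the inference path can be summarized by the sketch below, assuming a trained encoder and supervised model as described above; conditioning details such as domain codes are omitted, and the names are placeholders:

```python
import torch

@torch.no_grad()
def segment_unseen(x, encoder, segmenter):
    """Inference on a dataset never seen during training: the encoder maps the
    image into the isomorphic space I, and the supervised model M predicts a
    segmentation mask from that shared representation."""
    iso = encoder(x)               # isomorphic representation I
    logits = segmenter(iso)        # U-Net-style supervised model M applied to I
    return logits.argmax(dim=1)    # per-pixel class prediction
```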
A few-shot segmentation task F is defined as a
task where the dataset has a small number of labeled
samples. In particular, we define F as a zero-shot task
when we have a source dataset S used in training, and
an unseen target dataset F used for testing. The chal-
lenge is to segment images from F using information
from S .
The proposed method allows the multi-source/multi-target configuration on the Few-Shot DA scenario involving two meta-datasets, i.e., the source meta-dataset S = {S_1, S_2, ..., S_N} with an arbitrary number N ≥ 2 of labeled datasets, and the target dataset F = {F_1, F_2, ..., F_M} with an arbitrary number M of unlabeled unseen datasets. This allows the proposed method to be trained with multiple
source datasets and be applied in many target sets, as CoDA-Few does not need the presence of a target dataset in the training phase. For simplicity, we will refer to each target dataset F_i individually as F.
For this work, FUNIT (Liu et al., 2019) was used
as a base to generate I . Similarly to CoDAGANs
(Oliveira et al., 2020), a supervised model M based
on a U-Net (Ronneberger et al., 2015) was attached
on top of that, with some considerable changes to
the translation approaches, regarding the architecture
and conditional distribution modeling of the origi-
nal GANs. Specifically, the first two layers of the
segmentation network were removed, resulting in an
asymmetrical U-Net to compensate for the loss of
spatial resolution introduced by the Encoder. Also,
the number of input channels in M was changed in
order to match the number of channels of I . As in
the case of MUNIT (Huang et al., 2018), FUNIT (Liu
et al., 2019) also separates the content of an image
from its style. The U-Net is only fed the content infor-
mation, as the style vector can be ignored since it has
no spatial resolution. In contrast to MUNIT/UNIT, FUNIT (Liu et al., 2019) uses a progressive growth by historical average with a weighted update, resulting in a final generator G_µ = {E_µ, D_µ} that is an epochal version of the intermediate generators. With that, the stability in the training phase is considerably improved for both translation and DA.
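The historical-average generator can be maintained with an exponential moving average of the parameters, in the spirit of (Karras et al., 2017); a minimal sketch, where the helper and the default update weight are ours:

```python
import torch

def update_historical_generator(g_avg, g, beta=1e-3):
    """Historical averaging of generator weights:
    g_avg <- (1 - beta) * g_avg + beta * g, applied after each iteration."""
    with torch.no_grad():
        for p_avg, p in zip(g_avg.parameters(), g.parameters()):
            p_avg.mul_(1.0 - beta).add_(p, alpha=beta)

# g_avg would typically start as a frozen deep copy of the freshly
# initialized generator, e.g. g_avg = copy.deepcopy(g).
```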
A training iteration of CoDA-Few follows the sequence shown in Figure 1. The generator network G_µ is an Encoder-Decoder translation architecture. The encoding half (E_µ) receives images from the different source domains S and generates an isomorphic representation I within the image domains in a high dimensional space. Decoders (D_µ) are fed with I and produce synthetic images from the same or different domains used in the learning process. Then, a Discriminator D evaluates whether the fake images generated by G_µ according to the style of the target dataset are convincing samples to have been drawn from the target distribution. At last, E_µ is used to generate the isomorphic representation I, which is forwarded to a supervised model M that learns how to segment images. The aforementioned isomorphic representation is an essential part of CoDA-Few, as the whole supervised learning process is performed using I. At each training iteration of CoDA-Few, there are three routines for training the networks: (a) Dis Update, when the generator is frozen and the discriminator is updated; (b) Gen Update, when the discriminator is frozen and the generator is updated; and (c) Sup Update, when the supervised model is updated. These routines will be further detailed in the following paragraphs.
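Schematically, one iteration alternates the three routines as in the sketch below; the compute_* arguments are callables returning the corresponding scalar losses for the current batch, standing in for the Dis/Gen/Sup routines detailed next (this organization is our reading of the procedure, not code from the paper):

```python
def train_iteration(compute_d_loss, compute_g_loss, compute_sup_loss,
                    opt_d, opt_g, opt_m):
    """One CoDA-Few-style iteration: discriminator, generator, and supervised
    model are updated in turn by their own optimizers."""
    # (a) Dis Update: generator frozen, discriminator updated.
    opt_d.zero_grad()
    compute_d_loss().backward()
    opt_d.step()

    # (b) Gen Update: discriminator frozen, generator updated.
    opt_g.zero_grad()
    compute_g_loss().backward()
    opt_g.step()

    # (c) Sup Update: supervised model trained on the isomorphic representation I.
    opt_m.zero_grad()
    compute_sup_loss().backward()
    opt_m.step()
```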
Generative Update. This routine is responsible for the generator updates. First, a pair of source domains a ∼ p_S and b ∼ p_S are randomly selected from the N domains used in training. A batch X_a of images from S_a is then appended to a code h_a generated by a one-hot encoding scheme, intending to inform the encoder E_µ of the samples' domain. The encoded batch of samples X_a is passed to the encoder E_µ, producing an intermediate isomorphic representation I for the input X_a according to the marginal distributions computed by E_µ for domain S_a. Next, I is passed through the decoder D_µ and produces X_ab, a translation of images in batch X_a with the style of domain S_b.
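One common way to implement the one-hot conditioning is to append the domain code as constant extra input channels; the sketch below assumes this choice, and the encoder/decoder interfaces shown are illustrative:

```python
import torch

def one_hot_channels(x, domain_idx, num_domains):
    """Append a one-hot domain code h as constant extra channels so the
    encoder knows which source domain the batch X comes from."""
    b, _, h, w = x.shape
    code = torch.zeros(b, num_domains, h, w, device=x.device)
    code[:, domain_idx] = 1.0
    return torch.cat([x, code], dim=1)

def translate(x_a, a_idx, b_idx, num_domains, encoder, decoder):
    """Encode a batch from domain S_a into the isomorphic representation I and
    decode it with the style of domain S_b, producing X_ab."""
    iso_a = encoder(one_hot_channels(x_a, a_idx, num_domains))
    x_ab = decoder(iso_a, domain=b_idx)   # decoder conditioned on the target domain
    return iso_a, x_ab
```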
Discriminative Update. This routine is responsible for the discriminator updates. At the end of the Decoder D_µ, the synthetic image X_ab is presented. The original samples X_a and the translated images X_ab are merged into a single batch and passed to the discriminator D, which uses the adversarial loss component to classify between real and fake samples. In routines where the generators are being updated, the generator-side adversarial loss (i.e., the objective of fooling D) is computed instead.
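For concreteness, a binary cross-entropy form of the real/fake objective is sketched below; this is one standard GAN formulation and not necessarily the exact adversarial loss used by FUNIT, which relies on a multi-class discriminator:

```python
import torch
import torch.nn.functional as F

def d_adversarial_loss(D, x_real, x_fake):
    """Dis Update objective: classify real target samples as real and the
    translated samples X_ab as fake."""
    logits_real = D(x_real)
    logits_fake = D(x_fake.detach())   # no gradient flows back into the generator
    return (F.binary_cross_entropy_with_logits(logits_real, torch.ones_like(logits_real))
            + F.binary_cross_entropy_with_logits(logits_fake, torch.zeros_like(logits_fake)))

def g_adversarial_loss(D, x_fake):
    """Gen Update objective: produce translations the discriminator classifies as real."""
    logits_fake = D(x_fake)
    return F.binary_cross_entropy_with_logits(logits_fake, torch.ones_like(logits_fake))
```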
Supervised Update. This routine is responsible for updating the supervised model M. For each sample X^(i) ∈ S_a that has a corresponding label Y_a^(i), the isomorphisms I_a^(i) and I_aba^(i) are both fed to the same supervised model M. Then the model M performs the desired supervised task, generating the predictions Ŷ_a^(i) and Ŷ_aba^(i). These predictions can be compared in a supervised way to Y_a^(i) by employing L_S if there are labels for image i in this batch. Since there are always some labeled samples in this case, M is trained to infer over isomorphic representations of both the original labeled data and the data translated by CoDA-Few into the style of other datasets.
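A sketch of the Sup Update for one labeled batch, reusing the translate and one_hot_channels helpers sketched above and assuming a sup_loss callable implementing the Cross-Entropy plus Dice combination of Section 3.1 (whether gradients also flow into the generator here is an implementation choice we leave open):

```python
def supervised_update(x_a, y_a, a_idx, b_idx, num_domains,
                      encoder, decoder, segmenter, sup_loss, opt_m):
    """Sup Update: feed both the isomorphism of the original batch and the
    isomorphism of its translated version to the same model M, supervising
    both predictions with the label Y_a."""
    iso_a, x_ab = translate(x_a, a_idx, b_idx, num_domains, encoder, decoder)
    iso_ab = encoder(one_hot_channels(x_ab, b_idx, num_domains))  # representation of the translation
    loss = sup_loss(segmenter(iso_a), y_a) + sup_loss(segmenter(iso_ab), y_a)
    opt_m.zero_grad()
    loss.backward()   # detach iso_a / iso_ab above to restrict the update to M only
    opt_m.step()
    return loss.item()
```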
If domain shift is calculated and correctly adjusted during the training procedure, the properties X_a ≈ X_aba and I_a ≈ I_aba are both achieved, satisfying the Cycle-Consistency and Isomorphism, respectively. Then, after training, we achieve a state where I_a ≈ I_aba ≈ I_T. Now, it does not matter which domain S or F is fed to E_µ to generate the isomorphism I, since samples from all datasets must belong to the same joint distribution in I-space. Therefore, any learning performed in I_S and I_aba is universal for all domains used in the training procedure and for any future unseen domains.
Figure 1: CoDA-Few architecture for visual DA. Training (a): A single Generative network G_µ, divided into Encoder (E_µ) and Decoder (D_µ) blocks, performs translations between the datasets. A Discriminator D evaluates whether the fake images generated by G_µ according to the style of the target dataset are convincing samples to have been drawn from the target distribution. A single supervised model M is trained on the isomorphic representation I. Deployment (b): Images from the target dataset (F) are presented to the trained model, although the model has never seen a single sample from F during training. E_µ generates the isomorphic representation I, which is used by the supervised model M to segment the images.
3.1 CoDA-Few Loss
FUNIT jointly optimizes adversarial (L_adv), image reconstruction (L_rec), and feature matching (L_fea) loss components. The content reconstruction loss (L_rec) helps G_µ learn a translation model in an unsupervised fashion through cycle-consistency, mostly contributing to the low-frequency components and semantic consistency of the translation (Isola et al., 2017). The adversarial component (L_adv) encourages the network to produce images with higher fidelity and more accurate high-frequency components. The feature matching loss (L_fea) helps regularize the training and handles the instability of GANs by specifying a new objective for the generator that prevents it from overfitting the current discriminator (Liu et al., 2019). Instead of directly maximizing the output of the discriminator, this new objective instructs the generator to yield data that matches the statistics of the authentic samples. In this case, the discriminator is used only to specify the statistics that are worth matching (Salimans et al., 2016). A feature extractor f_D is created by removing the prediction layer from the discriminator. Then, the features from the translation output and the target image are extracted using f_D and used to calculate the complete loss function of FUNIT, L_F:
L_F = λ_adv [L_adv(X_b, X_ab) + L_adv(X_a, X_aba)]
    + λ_fea [L_fea(f_D(X_b), f_D(X_ab)) + L_fea(f_D(X_a), f_D(X_aba))]
    + λ_rec [L_rec(X_a, X_aba)].    (1)
More details about the FUNIT loss components can
be found in the original paper (Liu et al., 2019).
Aiming to tackle the class imbalance of semantic segmentation datasets, as a supervised loss component L_sup, CoDA-Few uses a combination of the Cross-Entropy loss, L_CE(Y, ŷ) = −Y log(ŷ) − (1 − Y) log(1 − ŷ), and the Dice loss, L_DSC(Y, ŷ) = (2Yŷ + 1)/(Y + ŷ + 1), where Y represents the pixel-wise semantic map and ŷ the probabilities for each class for a given sample. Therefore, the supervised loss is given as L_sup = L_CE(Y, ŷ) + L_DSC(Y, ŷ). The final loss L for CoDA-Few is consequently defined as:
L = λ_adv [L_adv(X_b, X_ab) + L_adv(X_a, X_aba)]
  + λ_fea [L_fea(f_D(X_b), f_D(X_ab)) + L_fea(f_D(X_a), f_D(X_aba))]
  + λ_rec [L_rec(X_a, X_aba)]
  + λ_sup [L_sup(Y_a, M(I_a)) + L_sup(Y_b, M(I_b)) + L_sup(Y_a, M(I_ab)) + L_sup(Y_b, M(I_ba))].    (2)
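In PyTorch, the supervised term can be written roughly as below for a binary mask; this is a minimal sketch of the L_CE + L_DSC combination in which the Dice similarity is turned into a loss as 1 − DSC, so that minimizing the total objective maximizes overlap (smoothing constant and reduction choices are ours):

```python
import torch
import torch.nn.functional as F

def supervised_loss(y_logits, y_true, eps=1.0):
    """L_sup = L_CE + (1 - DSC) for a binary segmentation mask.
    y_logits: raw model outputs, y_true: float tensor of 0/1 labels."""
    ce = F.binary_cross_entropy_with_logits(y_logits, y_true)
    y_prob = torch.sigmoid(y_logits)
    intersection = (y_true * y_prob).sum()
    dice = (2.0 * intersection + eps) / (y_true.sum() + y_prob.sum() + eps)
    return ce + (1.0 - dice)
```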
4 EXPERIMENTAL SETUP
The method was implemented using the PyTorch framework and the FUNIT repository (Liu et al., 2019). All experiments were executed on an NVIDIA Titan X Pascal GPU with 12 GB of memory; the implementation is available at https://github.com/Arthur1511/CoDA-Few.
CoDA-Few was trained for 10,000 iterations in the experiments. This number of iterations was empirically found to be a good stopping point for convergence (Oliveira et al., 2020). The learning rate was set
to 10^-4, with L2 normalization by weight decay with a value of 10^-4, and the RMSProp solver. The values λ_adv = 1, λ_rec = 0.1, λ_fea = 1, and λ_sup = 1 were also empirically chosen based on exploratory experiments and previous knowledge from CoDAGANs. Due to GPU memory constraints, a batch size of 3 was used. As in FUNIT, the final generator is a historical average version of the intermediate generators, where the update weight is 10^-3 (Karras et al., 2017).
The proposed method was applied to a total of 11 Chest X-Ray (CXR) datasets, including the Chest X-Ray 8 (Wang et al., 2017), the Japanese Society of Radiological Technology (JSRT) (Shiraishi et al., 2000), the Montgomery and Shenzhen sets (Jaeger et al., 2014), PadChest (Bustos et al., 2020), NLMCXR (Demner-Fushman et al., 2016), and the OpenIST (github.com/pi-null-mezon/OpenIST) datasets. A specialist manually labeled the lungs and heart for a random subset of 10 samples from the Chest X-Ray 8, PadChest, Montgomery, and Shenzhen datasets, which were used for evaluation purposes.
Two sets of baselines were defined:
a) CoDA-Unfair: In this case, unlabeled target images were included in the training procedure. We followed the original CoDAGAN training scheme, in which the unlabeled images of the target datasets are used during training to perform unsupervised domain adaptation between two or more image datasets. This baseline was called CoDA-Unfair.
b) CoDA-Fair: In this setting, images of the target datasets were not available during training. As the original CoDAGAN method is not designed for this setting, a baseline was created by extending the CoDAGAN framework based on MUNIT. The CoDAGAN model was trained purely on the source datasets, and at test time we evaluate its predictions on the unseen target datasets. This baseline was called CoDA-Fair.
To properly compare CoDA-Few, CoDA-Fair,
and CoDA-Unfair, all datasets were randomly split
into the same training and test sets according to an
80%/20% division. Aiming to simulate real-world
scenarios wherein the absence of labels is a significant
problem, no samples were kept for validation pur-
poses. Results were evaluated over the final iterations, computing mean and standard deviation values to account for the statistical variability of the methods at the end of training. Quantitative evaluation was conducted according to the well-known Jaccard score metric.
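For reference, the per-sample Jaccard score (intersection over union) used throughout the evaluation can be computed as in the sketch below, assuming binary masks; thresholding and averaging across samples are left to the caller:

```python
import numpy as np

def jaccard_score(pred, target, eps=1e-7):
    """Jaccard index (IoU) between a predicted and a ground-truth binary mask."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return float(intersection + eps) / float(union + eps)
```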
5 RESULTS AND DISCUSSION
Two segmentation tasks were evaluated: CXR lungs
and heart segmentation. Source datasets included the
JSRT, OpenIST, Shenzhen, and Montgomery repos-
itories due to the presence of labels for these tasks
in these sets. Different combinations with three and
two datasets being used as source were tested. Since
Chest X-Ray 8, PadChest, and NLMCXR do not have
training labels, they were only used as target datasets.
Among the source datasets in the heart segmentation
task, only JSRT has training labels, so the remain-
ing source datasets were used to improve the gen-
eralization of the isomorphic representations. The
cross-sample average Jaccard and confidence intervals (p ≤ 0.05) for the lungs and heart segmentation are shown in Figures 2 and 3. Tables 1, 2, 3, and 4 present Jaccard results and standard deviations; bold values represent the best overall results in a given source dataset configuration for a specific target dataset.
The proposed CoDA-Few framework outperforms
the baselines in most of the target datasets for both
lung and heart segmentation tasks. In the lung seg-
mentation task on CXRs, (a-d) in Figure 2 and (a-f)
in Figure 3, CoDA-Few presents better results for tar-
get datasets than CoDA-Unfair, even when only two
source datasets are employed to train CoDA-Few. In
the rare cases where the baselines outperform the pro-
posed method, CoDA-Few narrowly misses and, in
some circumstances, has a slightly smaller variation.
Heart segmentation proved to be a more difficult task, with J values below 85%, as shown in (e-g) of Figure 2 and (g-i) of Figure 3. One of the reasons why the heart segmentation task delivers worse results than the lung one is the low contrast between the heart and the surrounding tissues, unlike the lungs, which have well-defined boundaries. Once more, the proposed CoDA-Few framework outperforms the baselines in most of the target datasets, mainly when three source datasets are used in the training phase, implying that the method is able to learn from multiple source dataset distributions. When the baselines surpass the proposed method, they do so by a small margin.
Figures 2f and 3h clearly show that CoDA-Few outperforms all baselines for heart segmentation when well-behaved datasets, such as JSRT and OpenIST, are used as sources and less well-behaved datasets, such as PadChest, are used as targets. One should notice that the target datasets, in this case, are considerably harder than the source ones due to poor image contrast, the presence of unforeseen artifacts such as pacemakers, rotation, and
(Figure 2 shows grouped bar charts; panel sources: (a) J, O, S; (b) J, O, M; (c) J, S, M; (d) O, S, M for lungs and (e) J, O, S; (f) J, O, M; (g) J, S, M for heart.)
Figure 2: Jaccard results (in %) achieved for JSRT (J), OpenIST (O), Shenzhen (S), Montgomery (M), Chest X-Ray 8 (C), PadChest (P), and NLMCXR (N) using 3 sources for the segmentation of lungs (a-d) and heart (e-g). CoDA-Few, Unfair and Fair baselines are represented by blue, orange, and green bars, respectively.
Table 1: Jaccard results (in %) and standard deviation for lungs segmentation using 3 source datasets. Bold cells indicate the best Jaccard values for each target dataset.

Source datasets                Target        CoDA-Few         CoDA-Unfair      CoDA-Fair
JSRT, OpenIST, Shenzhen        Montgomery    93.47 ± 6.13     89.56 ± 10.94    89.83 ± 16.85
                               CXR8          86.87 ± 1.98     83.86 ± 3.43     86.59 ± 1.64
                               Padchest      86.48 ± 4.05     83.74 ± 4.37     86.72 ± 3.65
                               NLMCXR        87.24 ± 2.06     84.55 ± 3.47     87.32 ± 1.90
JSRT, OpenIST, Montgomery      Shenzhen      91.62 ± 5.37     91.02 ± 5.43     91.34 ± 6.13
                               CXR8          85.58 ± 2.42     85.31 ± 2.53     85.46 ± 1.81
                               Padchest      87.64 ± 2.24     86.11 ± 3.25     85.61 ± 4.41
                               NLMCXR        86.67 ± 2.43     82.82 ± 6.91     86.24 ± 3.22
JSRT, Shenzhen, Montgomery     OpenIST       92.54 ± 1.35     91.04 ± 1.57     92.35 ± 1.30
                               CXR8          83.65 ± 4.27     82.02 ± 5.27     83.66 ± 2.49
                               Padchest      85.58 ± 3.88     82.94 ± 4.48     83.87 ± 5.64
                               NLMCXR        85.83 ± 2.22     84.23 ± 2.80     85.08 ± 2.36
OpenIST, Shenzhen, Montgomery  JSRT          88.32 ± 2.48     88.47 ± 1.68     87.14 ± 3.27
                               CXR8          84.27 ± 1.24     84.00 ± 1.82     84.46 ± 1.31
                               Padchest      84.06 ± 4.22     83.47 ± 5.40     85.95 ± 2.93
                               NLMCXR        84.61 ± 2.62     85.54 ± 2.64     85.14 ± 3.24
scale differences, and health conditions. Those factors, paired with the fact that the samples from the JSRT dataset are the only source of labels for this task, evidence CoDA-Few's capability of generating a better isomorphic representation of unseen datasets.
5.1 Qualitative Results
Figures 5 and 7 show qualitative results for lungs seg-
mentation in CXR. Examples of predictions wherein
CoDA-Few outperformed the baselines are depicted
in Figure 5 while Figure 7 shows erroneous pre-
dictions achieved by the baselines and the proposed
method. Columns in both figures present the original
Table 2: Jaccard results (in %) and standard deviation for heart segmentation using 3 source datasets. Bold cells indicate the best Jaccard values for each dataset.

Source datasets                Target        CoDA-Few         CoDA-Unfair      CoDA-Fair
JSRT, OpenIST, Shenzhen        Montgomery    84.56 ± 4.34     83.82 ± 3.12     81.56 ± 3.11
                               CXR8          85.646 ± 3.79    82.56 ± 6.62     81.59 ± 5.05
                               Padchest      83.09 ± 5.33     76.57 ± 11.29    78.64 ± 7.44
JSRT, OpenIST, Montgomery      Shenzhen      82.53 ± 6.55     76.80 ± 3.73     78.50 ± 3.72
                               CXR8          84.89 ± 5.93     75.23 ± 5.00     83.83 ± 2.73
                               Padchest      85.86 ± 5.72     68.56 ± 10.07    72.48 ± 16.86
JSRT, Shenzhen, Montgomery     OpenIST       81.10 ± 12.93    72.37 ± 18.73    79.26 ± 12.08
                               CXR8          85.65 ± 3.57     79.89 ± 4.67     86.20 ± 3.95
                               Padchest      84.16 ± 5.23     75.06 ± 7.16     77.35 ± 7.70
(Figure 3 shows grouped bar charts; panel sources: (a) J, O; (b) J, S; (c) J, M; (d) O, S; (e) O, M; (f) S, M for lungs and (g) J, S; (h) J, O; (i) J, M for heart.)
Figure 3: Jaccard results (in %) achieved for JSRT (J), OpenIST (O), Shenzhen (S), Montgomery (M), Chest X-Ray 8 (C), PadChest (P), and NLMCXR (N) using 2 sources for the segmentation of lungs (a-f) and heart (g-i). CoDA-Few, Unfair and Fair baselines are represented by blue, orange, and green bars, respectively.
sample, the segmentation ground truth, and predic-
tions from CoDA-Few, CoDA-Unfair, and CoDA-Fair
for visual comparison. Each row presents an image
from each one of the target datasets.
Figure 5 shows DA results for lung field segmen-
tation using the JSRT, OpenIST, Shenzhen, and Mont-
gomery datasets both as source and target, and using
the Chest X-Ray 8, PadChest, and NLMCXR datasets
only as targets. The latter cases are considerably more
challenging than the others due to poor image con-
trast, the presence of unforeseen artifacts as pacemak-
ers, rotation and scale differences, as well as a much
wider variety of lung sizes, shapes, and health condi-
tions. However, the DA approach using CoDA-Few
for lung field segmentation was satisfactory for most
images, only showing errors on very challenging sam-
ples.
Figures 4 and 6 show qualitative results for
heart segmentation in CXR. Examples of predictions
wherein CoDA-Few outperformed the baselines are
Table 3: Jaccard results (in %) and standard deviation for lungs segmentation using 2 source datasets. Bold cells indicate the best Jaccard values for each dataset.

Source datasets          Target        CoDA-Few         CoDA-Unfair      CoDA-Fair
JSRT, OpenIST            Shenzhen      91.27 ± 6.90     88.64 ± 10.60    90.91 ± 7.41
                         Montgomery    93.98 ± 5.27     86.43 ± 15.26    86.68 ± 19.19
                         CXR8          86.39 ± 1.79     87.28 ± 2.30     86.98 ± 1.38
                         Padchest      87.21 ± 3.65     86.98 ± 5.22     87.92 ± 2.36
                         NLMCXR        87.71 ± 1.82     87.94 ± 2.75     84.15 ± 9.31
JSRT, Shenzhen           OpenIST       90.20 ± 9.09     82.82 ± 14.41    90.52 ± 5.98
                         Montgomery    87.81 ± 19.06    76.58 ± 19.00    87.65 ± 19.33
                         CXR8          85.38 ± 2.35     79.69 ± 7.95     85.22 ± 2.21
                         Padchest      85.25 ± 4.26     76.95 ± 8.78     81.30 ± 5.98
                         NLMCXR        85.65 ± 3.47     81.33 ± 3.76     84.64 ± 2.54
JSRT, Montgomery         OpenIST       92.24 ± 1.98     89.94 ± 4.40     93.34 ± 1.03
                         Shenzhen      91.41 ± 5.38     89.31 ± 7.50     91.59 ± 5.98
                         CXR8          84.02 ± 2.31     84.21 ± 3.12     85.87 ± 2.02
                         Padchest      84.95 ± 4.32     83.42 ± 5.23     85.06 ± 5.77
                         NLMCXR        84.65 ± 3.08     84.04 ± 3.53     86.61 ± 2.40
OpenIST, Shenzhen        JSRT          89.84 ± 4.47     88.86 ± 1.72     89.12 ± 2.04
                         Montgomery    91.42 ± 10.64    86.11 ± 12.37    91.50 ± 11.75
                         CXR8          86.09 ± 1.24     82.25 ± 4.28     86.75 ± 1.14
                         Padchest      82.02 ± 9.92     79.19 ± 7.35     85.43 ± 5.26
                         NLMCXR        86.18 ± 2.31     82.66 ± 3.15     86.96 ± 1.93
OpenIST, Montgomery      JSRT          89.36 ± 1.66     88.36 ± 3.66     88.39 ± 2.76
                         Shenzhen      91.14 ± 4.81     84.62 ± 10.74    90.06 ± 6.39
                         CXR8          85.26 ± 1.25     82.68 ± 4.75     84.93 ± 1.05
                         Padchest      86.04 ± 2.94     84.91 ± 4.03     84.88 ± 3.87
                         NLMCXR        85.23 ± 2.89     81.76 ± 7.82     85.13 ± 2.93
Shenzhen, Montgomery     JSRT          90.05 ± 1.77     89.79 ± 1.64     88.90 ± 1.88
                         OpenIST       91.08 ± 1.46     91.09 ± 1.25     92.63 ± 1.38
                         CXR8          83.52 ± 2.72     83.90 ± 3.00     82.55 ± 3.62
                         Padchest      81.92 ± 7.51     82.58 ± 5.15     84.15 ± 4.41
                         NLMCXR        84.16 ± 3.34     84.06 ± 3.65     85.28 ± 3.21
Table 4: Jaccard results (in %) and standard deviation for heart segmentation using 2 source datasets. Bold cells indicate the best Jaccard values for each dataset.

Source datasets          Target        CoDA-Few         CoDA-Unfair      CoDA-Fair
JSRT, OpenIST            Shenzhen      83.02 ± 3.50     71.00 ± 6.12     79.74 ± 7.93
                         Montgomery    82.92 ± 5.77     80.84 ± 3.38     80.16 ± 3.47
                         CXR8          80.36 ± 6.42     79.52 ± 5.18     83.89 ± 3.89
                         Padchest      84.11 ± 4.77     73.76 ± 10.30    77.19 ± 12.36
JSRT, Shenzhen           OpenIST       78.63 ± 10.96    82.78 ± 6.87     79.10 ± 6.62
                         Montgomery    75.56 ± 11.90    85.08 ± 4.71     77.27 ± 4.34
                         CXR8          80.71 ± 6.65     83.66 ± 4.60     79.50 ± 6.06
                         Padchest      77.32 ± 8.53     74.21 ± 12.33    75.24 ± 4.38
JSRT, Montgomery         OpenIST       82.43 ± 10.63    69.16 ± 27.98    75.58 ± 23.27
                         Shenzhen      82.61 ± 7.20     75.10 ± 5.03     80.86 ± 5.53
                         CXR8          84.38 ± 6.69     80.35 ± 6.20     86.53 ± 3.21
                         Padchest      82.82 ± 6.37     70.94 ± 19.79    82.48 ± 6.20
depicted in Figure 4 while Figure 6 shows erro-
neous predictions achieved by the baselines and the
proposed method. Columns in both figures present
the original sample, the segmentation ground truth,
and predictions from CoDA-Few, CoDA-Unfair, and
CoDA-Fair for visual comparison. Each row presents
an image from each one of the target datasets.
Figure 4 shows DA results for heart field segmen-
tation using the JSRT, OpenIST, Shenzhen, and Mont-
gomery datasets both as source and target, and us-
ing the Chest X-Ray 8 and PadChest datasets only as
targets. One should notice that the latter cases are
Figure 4: Qualitative heart segmentation results in CXR images for the unseen target datasets of heart segmentation.
Figure 5: Qualitative lungs segmentation results in CXR images for the unseen target datasets of lungs segmentation.
considerably more challenging than the others due to poor image contrast, the presence of unforeseen artifacts such as pacemakers, rotation and scale differences, as well as a much wider variety of heart sizes, shapes, and health conditions. However, the DA approach using CoDA-Few for heart segmentation yielded consistent and satisfactory prediction maps across all target datasets for most images, only showing errors on very challenging samples.
Figure 6: Noticeable errors in CoDA-Few and baseline results for the unseen target datasets of heart segmentation.
Figure 7: Noticeable errors in CoDA-Few and baseline results for the unseen target datasets of lungs segmentation.
6 CONCLUSION
This paper proposed and validated a method that performs Few-Shot Domain Adaptation in dense labeling tasks for multiple source and target biomedical datasets. Quantitative and qualitative experimental evaluations were performed on several distinct domains, datasets, and segmentation tasks. We found empirical evidence that CoDA-Few can segment images of an unseen target dataset made available at test
time based on the knowledge of seen source datasets.
CoDA-Few was shown to be a useful Domain
Adaptation method that could learn a single model
that performs satisfactory predictions for several dif-
ferent unseen target datasets in a domain, even when
the visual patterns of these data were different. The
proposed method was able to leverage both labeled and unlabeled data in the learning process, making it highly adaptable to a wide variety of data scarcity scenarios.
CoDA-Few reached results in Few-Shot DA that
are comparable to DA methods that do have access to
the target data distribution. Furthermore, it presented
better Jaccard values in most experiments where la-
beled data was scarce, such as in heart segmentation
where only JSRT provided labeled training data. The
method also presented good performance in Few-Shot DA tasks, even for highly imbalanced classes, such as in the case of heart segmentation, wherein the region of interest represented only a very small fraction of the image pixels.
One should notice that CoDA-Few is conceptually
not limited to 2D dense labeling tasks or biomedical
images, despite being tested only for non-volumetric
segmentation tasks in this paper. Future works will
investigate Few-Shot DA in the segmentation of vol-
umetric images, such as Computed Tomography (CT)
scans, Positron Emission Tomography (PET scans),
and Magnetic Resonance Imaging (MRI). We also
plan to test CoDA-Few in other image domains, such
as traditional Computer Vision datasets and Remote
Sensing data.
ACKNOWLEDGEMENTS
The authors would like to thank CAPES, CNPq
(424700/2018-2 and 306955/2021-0), FAPEMIG
(APQ-00449-17 and APQ-00519-20), FAPESP (grant
#2020/06744-5), and Serrapilheira Institute (grant
#R-2011-37776) for their financial support to this re-
search project.
REFERENCES
Bustos, A., Pertusa, A., Salinas, J.-M., and de la Iglesia-Vayá, M. (2020). Padchest: A large chest x-ray image
dataset with multi-label annotated reports. Medical
Image Analysis, 66:101797.
Demner-Fushman, D., Kohli, M. D., Rosenman, M. B.,
Shooshan, S. E., Rodriguez, L., Antani, S., Thoma,
G. R., and McDonald, C. J. (2016). Preparing a col-
lection of radiology examinations for distribution and
retrieval. Journal of the American Medical Informat-
ics Association, 23(2):304–310.
Goodfellow, I. J., Pouget-Abadie, J., Mirza, M., Xu, B.,
Warde-Farley, D., Ozair, S., Courville, A. C., and Ben-
gio, Y. (2014). Generative adversarial nets. In NIPS.
Hoffman, J., Tzeng, E., Park, T., Zhu, J.-Y., Isola, P.,
Saenko, K., Efros, A., and Darrell, T. (2018). Cy-
cada: Cycle-consistent adversarial domain adaptation.
In ICML, pages 1989–1998. PMLR.
Huang, X., Liu, M.-Y., Belongie, S., and Kautz, J. (2018).
Multimodal unsupervised image-to-image translation.
In Proceedings of the European conference on com-
puter vision (ECCV), pages 172–189.
Isola, P., Zhu, J.-Y., Zhou, T., and Efros, A. A. (2017).
Image-to-image translation with conditional adversar-
ial networks. In Proceedings of the IEEE conference
on computer vision and pattern recognition, pages
1125–1134.
Jaeger, S., Candemir, S., Antani, S., Wáng, Y.-X. J., Lu,
P.-X., and Thoma, G. (2014). Two public chest
x-ray datasets for computer-aided screening of pul-
monary diseases. Quantitative Imaging in Medicine
and Surgery, 4(6):475.
Karras, T., Aila, T., Laine, S., and Lehtinen, J. (2017). Pro-
gressive growing of gans for improved quality, stabil-
ity, and variation. arXiv preprint arXiv:1710.10196.
Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2012). Im-
ageNet classification with deep convolutional neural
networks. NIPS, 25:1097–1105.
Liu, M.-Y., Breuel, T., and Kautz, J. (2017). Unsupervised
image-to-image translation networks. In Proceedings
of the 31st International Conference on Neural Infor-
mation Processing Systems, pages 700–708.
Liu, M.-Y., Huang, X., Mallya, A., Karras, T., Aila, T.,
Lehtinen, J., and Kautz, J. (2019). Few-shot unsu-
pervised image-to-image translation. In Proceedings
of the IEEE/CVF International Conference on Com-
puter Vision, pages 10551–10560.
Murez, Z., Kolouri, S., Kriegman, D., Ramamoorthi, R.,
and Kim, K. (2018). Image to image translation for
domain adaptation. In Proceedings of the IEEE Con-
ference on Computer Vision and Pattern Recognition,
pages 4500–4509.
Oliveira, H. N., Ferreira, E., and Dos Santos, J. A.
(2020). Truly generalizable radiograph segmentation
with conditional domain adaptation. IEEE Access,
8:84037–84062.
Ronneberger, O., Fischer, P., and Brox, T. (2015). U-net:
Convolutional networks for biomedical image seg-
mentation. In International Conference on Medical
image computing and computer-assisted intervention,
pages 234–241. Springer.
Salimans, T., Goodfellow, I., Zaremba, W., Cheung, V.,
Radford, A., and Chen, X. (2016). Improved tech-
niques for training gans. Advances in neural informa-
tion processing systems, 29:2234–2242.
Shiraishi, J., Katsuragawa, S., Ikezoe, J., Matsumoto, T.,
Kobayashi, T., Komatsu, K.-i., Matsui, M., Fujita,
H., Kodera, Y., and Doi, K. (2000). Development
of a digital image database for chest radiographs with
and without a lung nodule: receiver operating char-
acteristic analysis of radiologists’ detection of pul-
monary nodules. American Journal of Roentgenology,
174(1):71–74.
Sun, B. and Saenko, K. (2016). Deep coral: Correla-
tion alignment for deep domain adaptation. In Euro-
pean conference on computer vision, pages 443–450.
Springer.
Tang, Y., Tang, Y., Sandfort, V., Xiao, J., and Summers,
R. M. (2019a). Tuna-net: Task-oriented unsupervised
adversarial network for disease recognition in cross-
domain chest x-rays. In International Conference on
Medical Image Computing and Computer-Assisted In-
tervention, pages 431–440. Springer.
Tang, Y.-B., Tang, Y.-X., Xiao, J., and Summers, R. M.
(2019b). Xlsor: A robust and accurate lung segmen-
tor on chest x-rays using criss-cross attention and cus-
tomized radiorealistic abnormalities generation. In
International Conference on Medical Imaging with
Deep Learning, pages 457–467. PMLR.
Tzeng, E., Hoffman, J., Saenko, K., and Darrell, T. (2017).
Adversarial discriminative domain adaptation. In Pro-
ceedings of the IEEE conference on computer vision
and pattern recognition, pages 7167–7176.
Wang, X., Peng, Y., Lu, L., Lu, Z., Bagheri, M., and Sum-
mers, R. (2017). Hospital-scale chest x-ray database
and benchmarks on weakly-supervised classification
and localization of common thorax diseases. In CVPR,
pages 3462–3471.
Wu, Z., Han, X., Lin, Y.-L., Uzunbas, M. G., Goldstein,
T., Lim, S. N., and Davis, L. S. (2018). Dcan:
Dual channel-wise alignment networks for unsuper-
vised scene adaptation. In Proceedings of the Euro-
pean Conference on Computer Vision (ECCV), pages
518–534.
Yan, H., Ding, Y., Li, P., Wang, Q., Xu, Y., and Zuo,
W. (2017). Mind the class weight bias: Weighted
maximum mean discrepancy for unsupervised domain
adaptation. In CVPR, pages 2272–2281.
Yang, J., Dvornek, N. C., Zhang, F., Chapiro, J., Lin,
M., and Duncan, J. S. (2019). Unsupervised domain
adaptation via disentangled representations: Appli-
cation to cross-modality liver segmentation. In In-
ternational Conference on Medical Image Computing
and Computer-Assisted Intervention, pages 255–263.
Springer.
Zhang, J., Li, W., and Ogunbona, P. (2017). Transfer learn-
ing for cross-dataset recognition: a survey. arXiv
preprint arXiv:1705.04396.
Zhang, Y., Miao, S., Mansi, T., and Liao, R. (2018). Task
driven generative modeling for unsupervised domain
adaptation: Application to x-ray image segmenta-
tion. In International Conference on Medical Im-
age Computing and Computer-Assisted Intervention,
pages 599–607. Springer.
Zhu, J.-Y., Park, T., Isola, P., and Efros, A. A. (2017a).
Unpaired image-to-image translation using cycle-
consistent adversarial networks. In Proceedings of
the IEEE international conference on computer vi-
sion, pages 2223–2232.
Zhu, J.-Y., Zhang, R., Pathak, D., Darrell, T., Efros, A. A.,
Wang, O., and Shechtman, E. (2017b). Multimodal
image-to-image translation by enforcing bi-cycle con-
sistency. In Advances in neural information process-
ing systems, pages 465–476.