Crossing Domain Borders with Federated Few-Shot Adaptation

Manuel Röder¹,ᵃ, Maximilian Münch²,ᵇ, Christoph Raab³,ᶜ and Frank-Michael Schleif¹,ᵈ

¹ Faculty of Computer Science and Business Information Systems, Technical University of Applied Sciences Würzburg-Schweinfurt, Würzburg, Germany

² Department of Computer Science, University of Groningen, Groningen, Netherlands

³ IAV GmbH, Berlin, Germany

Keywords:

Federated Learning, Domain Adaptation, Few-Shot Learning, Deep Transfer Learning, Resource Constraints, Sporadic Model Updates.

Abstract:

Federated Learning has gained significant attention as a data-protecting paradigm for decentralized, client-side learning in the era of interconnected, sensor-equipped edge devices. However, practical applications of Federated Learning face three major challenges. First, the expensive data labeling process required for target adaptation involves human participation. Second, the data collection process on client devices suffers from covariate shift due to environmental impact on attached sensors, leading to a discrepancy between source and target samples. Third, in resource-limited environments, continuous or even regular model updates are often infeasible due to limited data transmission capabilities or technical constraints on channel availability and energy efficiency. To address these challenges, we propose FedAcross, an efficient and scalable Federated Learning framework designed specifically for real-world client adaptation in industrial environments. It is based on a pre-trained source model that includes a deep backbone, an adaptation module, and a classifier running on a powerful server. By freezing the backbone and the classifier during client adaptation on resource-constrained devices, we let the domain-adaptive linear layer alone handle target domain adaptation and minimize the overall computational overhead. Our extensive experimental results validate the effectiveness of FedAcross in achieving competitive adaptation on low-end client devices with limited target samples, effectively addressing the challenge of domain shift. Our framework effectively handles sporadic model updates within resource-limited environments, ensuring practical and seamless deployment.

1 INTRODUCTION

Traditional machine learning requires a centralized data center to store and aggregate training data collected from local devices, such as mobile phones, drones or thin clients. This approach has proven impractical in many real-world application areas, as it requires considerable effort to collect and label data from different sources in compliance with data protection regulations.

Federated Learning (FL) was introduced in (McMahan et al., 2017) as a means of mitigating the security risks and costs associated with the implementation of traditional, centralized models. The proposed architecture enables multiple edge devices to jointly learn a global machine learning model under the administration of a central server, while local data stay with the client.

ᵃ https://orcid.org/0009-0003-4907-3999
ᵇ https://orcid.org/0000-0002-2238-7870
ᶜ https://orcid.org/0000-0002-6988-353X
ᵈ https://orcid.org/0000-0002-7539-1283

Recent years have witnessed remarkable advancements in hardware and software technologies, with notable growth in the proliferation and interconnection of sensor-equipped edge devices (Siqueira and Davis, 2021), which are nowadays frequently employed in industrial production environments. This development, coupled with the increasing use of 5G-capable end devices, has significantly boosted the attractiveness of FL for practical industry applications and research purposes (Hard et al., 2018; Yang et al., 2018; Yang et al., 2019; Yang et al., 2020).

A popular real-world use case where the typical FL approach is not directly applicable is the classification of waste items (Laier and Laier, 2023; Bashkirova et al., 2023) by resource-constrained client devices equipped with visual sensors and located at different waste sorting facilities (see the conceptual design in Figure 1). With constantly increasing amounts of garbage, waste sorting is a crucial challenge in many communities, and intelligent sensor systems are a key element in the different strategies to address the problem (Lange, 2021). In this scenario, each client's local model is trained using isolated processing units, while data is generated locally and remains decentralized. Cross-device federated learning (Kairouz et al., 2021) tackles the aforementioned challenges of collaborative learning across multiple machines with limited data collection, expensive labeling and restricted sharing.

Furthermore, several external drivers influence the availability and quality of relevant sensor information. Considering, e.g., visual sensors, domain shifts (see Figure 1, left) in captured images may occur due to variations in lighting or environmental conditions that affect brightness, contrast, color temperature, perspective, and noise (Koh et al., 2021). In general, the challenge of adjusting a system trained in the source domain to perform better in the target domain is referred to as source-to-target adaptation (Raab et al., 2022). Given the limited hardware resources of clients, such as computing power, transmission capacity, and memory, it is essential to design a training algorithm that minimizes client-side load while maintaining accurate model training. Consequently, this work focuses on integrating an established pre-train and fine-tune strategy into FL, aiming to transform domain-specific features into a task-invariant metric space to mitigate the effects of domain shift under resource constraints.

[Figure 1 (diagram). Left: domain shift across client devices (Domain A, Domain B), resource limitation in the client environment, and expensive annotating on edge devices (example labels "spray bottle", "drink cup"). Right: the server pretrains the global model and distributes (updated) model parameters across the device border to Clients 1 to 3, which adapt their local models and upstream parameters in each iteration.]

Figure 1: Challenges (left) and proposed approach (right) for federated client adaptation.

Röder, M., Münch, M., Raab, C. and Schleif, F.
Crossing Domain Borders with Federated Few-Shot Adaptation.
DOI: 10.5220/0012351900003654
Paper published under CC license (CC BY-NC-ND 4.0)
In Proceedings of the 13th International Conference on Pattern Recognition Applications and Methods (ICPRAM 2024), pages 511-521
ISBN: 978-989-758-684-2; ISSN: 2184-4313

To address transmission costs and privacy restrictions (see Section 3.5 for details) in cross-device FL, we provide a practical solution for FL across multiple domains that has not been considered so far. It is based on prototypes (Snell et al., 2017) and reduces data transfer overhead and the computational cost of client device inference: local prototypes are computed as class-wise averaged feature vectors for memory efficiency, and client-side labels are predicted by comparing the distances between projected inputs and class prototypes to reduce CPU usage. As detailed below, this article presents FedAcross, a novel FL framework that addresses the real-world challenges of data sparsity on isolated clients as well as domain shift across clients deployed in uncontrolled industrial settings.

Main Contributions.

1. A computation-efficient FL approach is proposed to tackle target adaptation issues with limited labeled samples and distributional shifts across siloed devices in real-world applications. The outlined concept aims to improve per-device local models on downstream classification tasks while iteratively optimizing global model parameters to enhance bootstrapping of new FL clients.

2. We provide a ready-to-deploy, efficient and highly scalable end-to-end FL solution based on PyTorch Lightning (Falcon and team, 2019) and Flower (Beutel et al., 2020), available on GitHub (https://github.com/cairo-thws/FedAcross).

3. We thoroughly assess our method on a waste item classification scenario using domain adaptation benchmark data sets that mirror production conditions, and we observe adaptation performance competitive with state-of-the-art methods. This scenario is exceptionally challenging and covers a broad spectrum of real-life particularities in cross-device FL. Notably, our approach is not limited to waste item classification but can be applied to various other use cases (e.g. supply chain optimization, smart grid energy management, epidemic and disease surveillance), highlighting its versatility and potential.

2 RELATED WORK

Previous studies have extensively examined the limitations of device-based FL, in which data remains isolated within individual entities or organizations. Researchers have investigated the challenges associated with this siloed approach, including data privacy concerns, communication overhead, and model performance degradation (Zhao et al., 2022). However, existing approaches still have limitations in efficiently utilizing the information present in siloed data while maintaining privacy. Efficient information transfer and compact encoding are crucial for crossing device borders with minimal transmission costs while achieving robust generalization (Zhao et al., 2022). This becomes even more challenging in realistic scenarios, where often only weakly labeled data are available.

Techniques for centralized scenarios have been proposed that use few-shot learning (FSL) to enable models to learn effectively from limited labeled data (Wang et al., 2020; Song et al., 2023; Hu et al., 2022). These approaches usually rely on fine-tuning strategies to improve the model's ability to generalize and adapt to new tasks, even when only a few labeled training samples are available. Two main training paradigms have emerged: learning by transfer (Dhillon et al., 2020), where a deep neural network is trained on a source data set and subsequently fine-tuned on a downstream few-shot learning task, and meta-learning (see e.g. Snell et al., 2017), where incremental parameter updates encode task-specific background information in the model optimization process.

Additionally, the siloed setting and the common phenomenon of error-prone measurement devices introduce various cross-domain shifts. Cross-domain learning focuses specifically on transferring knowledge from a source domain to a target domain, even when the characteristics or distribution of the data differ significantly. Extensive research in this area has explored domain adaptation (DA) techniques such as domain alignment, feature mapping and instance reweighting to improve model performance on unseen target domains (Raab et al., 2022).

Moreover, test-time adaptation methods (Nado et al., 2021; Zhang et al., 2022) require only a few labeled data points per class from a target domain to optimize domain-specific adaptation parameters, thereby providing an effective blueprint for our model architecture.

Cross-domain few-shot learning (CD-FSL) deals with a centralized combination of the aforementioned challenges, namely learning relevant information effectively and quickly from only a few samples while coping with the distributional shift between source and target data. Recent studies on CD-FSL (Guo et al., 2020) have shown that transfer learning approaches outperform state-of-the-art meta-learning methods on FSL benchmarks over multiple domains. Therefore, transfer learning is a reasonable choice, not only to avoid computationally intensive gradient update calculations for client models running on edge devices, but also to outsource the heavy lifting of training a source model on a large data set to a well-equipped server instance. Prototypes have already been used in FL contexts, e.g. by FedProto (Tan et al., 2022), a prototype-based aggregation method for heterogeneous clients, and by FPL (Huang et al., 2023), which constructs server-side cluster prototypes; however, these methods do not address client-side resource limitations and require large amounts of labeled data. Our approach combines ideas from these different fields to address the mentioned issues and provide a practical solution.

3 METHODOLOGY

In Section 3.2 we provide the main prerequisites, followed by an overview of the server model architecture. An end-to-end source model training procedure as well as some theoretical background is provided in Section 3.1. Section 3.3 examines the on-client model adaptation process under the constraints of minimal availability of labeled samples and domain shift. Subsequently, we introduce a computation-efficient, prototype-based client inference pipeline in Section 3.4, followed by the pseudo-code for FedAcross and prototype upstreaming options in Section 3.5.

3.1 Server Model Training

The proposed server model consists of four main components, as visualized in Figure 2. Throughout this section, we describe each of these components in detail and justify the respective design choices with regard to the challenges stated above.

[Figure 2 (diagram). Components: Source Data Set and Target Data Set, Transformation Module, Feature Extractor, Adaptation Module (Linear, BatchNorm, ReLU), Linear Classifier with Output Probabilities, and Prototype Generator producing Prototypes. Module labels: "pre-training only", "non-trainable", "pre-training and fine-tuning".]

Figure 2: Model architecture for server pre-training on source data set $\mathcal{D}_S$ and client fine-tuning on target data set $\mathcal{D}^{spt}_{T_i}$. Module colors indicate whether the parameters are frozen or trainable during the respective training stages.


3.2 Prerequisites

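We consider a publicly available, labeled source data set $\mathcal{D}_S = \{X_S, Y_S\} = \{(x_l^S, y_l^S)\}_{l=1}^{n_S} \overset{\text{i.i.d.}}{\sim} p_S(X_S, Y_S)$ of size $n_S$ in the source domain $S$ distributed by $p_S$. For each client $i$ we have a target data set $\mathcal{D}_{T_i} = \{(x_l^{T_i}, y_l^{T_i})\}_{l=1}^{n_{T_i}} \overset{\text{i.i.d.}}{\sim} p_{T_i}(X_{T_i}, Y_{T_i})$ of size $n_{T_i}$ in the target domain $T_i$. Without loss of generality, we assume the number of labeled samples per class $k = k_i \in \{0, 3, 5, 10\}$ for client $i$. Samples are given as $x_l \in \mathbb{R}^d$, with $d$ the number of features, and $y_l \in \mathcal{Y}$, with $\mathcal{Y}$ a discrete label space common to the source and target domains, $|\mathcal{Y}| = L$. The distributions $p_S(X_S, Y_S) \neq p_{T_i}(X_{T_i}, Y_{T_i})$ are subject to domain shift.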

Analogously to $x_l$, we introduce class-wise prototypes $\omega_S^{(n)}$ and $\omega_{T_i}^{(n)}$ for the source and target domains, where $n$ is a class index in $\mathcal{Y}$. A central server handles the bulk of model learning, while the training data is stored in separate silos at different clients with limited communication and processing power. The objective is to train a classifier model that can generalize to related target domains (see "Domain A" and "Domain B" in Figure 1 for an example of source and target domain differences). Let $f(\varphi)$ be a feature extractor parameterized by $\varphi$, $A(\psi)$ an adaptation module parameterized by $\psi$, and $g(\nu)$ a classifier parameterized by $\nu$. The approach is applicable to various tasks, although we assume the feature space is related to image processing. Given an FL setup as in Figure 1, a computationally powerful server instance can fully access the source domain data set $\mathcal{D}_S$. Additionally, each client $i$ participating in the distributed learning process has exclusive access to its target domain data set $\mathcal{D}_{T_i}$. To reflect real-world conditions in the modeling process, the following assumptions are also made:

• The number of annotated samples per class in the target domain data set $\mathcal{D}_{T_i}$ is fixed by $k$. The $k$-shot support set of client $i$ is then denoted by $\mathcal{D}^{spt}_{T_i} \subset \mathcal{D}_{T_i}$, so that all classes are equally represented in each observed data set. In production, operators collect and annotate only $k$ samples per class for local model fine-tuning.

• The number of classes used for local fine-tuning can be limited to $N$ for each client separately, giving the client operator the option to select a subset of the available classes that meets their needs.
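The $k$-shot support set from the assumptions above can be drawn as shown in the following minimal sketch. It is an illustration, not the FedAcross code. The helper name `sample_support_set` and the uniform per-class sampling without replacement are assumptions. The optional `classes` argument mirrors the operator-selected subset of classes from the second assumption.

```python
import random
from collections import defaultdict


def sample_support_set(labels, k, classes=None, seed=0):
    """Return indices of a k-shot support set (hypothetical helper).

    labels  -- integer class label of every sample in the client's target data
    k       -- number of annotated samples kept per class
    classes -- optional operator-selected subset of classes (default: all)
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    selected = sorted(by_class) if classes is None else sorted(classes)
    support = []
    for c in selected:
        if len(by_class[c]) < k:
            raise ValueError(f"class {c} has fewer than {k} samples")
        # uniform sampling without replacement within each class
        support.extend(rng.sample(by_class[c], k))
    return support


# Toy target data: 4 classes with 20 samples each
labels = [i % 4 for i in range(80)]
spt = sample_support_set(labels, k=5, classes=[0, 2])
print(len(spt))  # 10
```

Because every selected class contributes exactly $k$ indices, the resulting support set is class-balanced by construction.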

In the following paragraphs, we propose a model architecture that adheres to the above constraints and addresses the presented challenges end to end. A summary of the most important notation used in this work can be found in Table 1.

Table 1: Notation Summary.

Notation | Description
$\mathcal{D}_S$ | source domain data set on server S
$\mathcal{D}_{T_i}$ | target domain data set on client i
$\mathcal{D}^{spt}_{T_i}$ | k-shot support set on client i, $\mathcal{D}^{spt}_{T_i} \subset \mathcal{D}_{T_i}$
$N$ | number of observed classes, $n = 1, \dots, N$
$K$ | number of labeled samples per class, $k = 0, \dots, K$
$f(\varphi)$ | feature extractor $f$ parameterized by $\varphi$
$A(\psi)$ | adaptation module $A$ parameterized by $\psi$
$g(\nu)$ | classifier $g$ parameterized by $\nu$
$\omega_{T_i}^{(n)}$ | prototype of class $n$ on client $i$

The transformation module $T$ is the first component of the server model. It maps an input $x_l^S$ taken from the source domain data set $\mathcal{D}_S$ onto the output $\tilde{x}_l^S \leftarrow T(x_l^S)$ by applying a single augmentation chain of concatenated affine transformations to the model input. Affine data augmentation is a widely used and well-established tool to avoid overfitting in image classification with deep learning models (Perez and Wang, 2017). Further studies (Kim et al., 2022) revealed that these data augmentation techniques also benefit CD-FSL, both by increasing the data set size and by improving the training procedure on the transfer learning downstream task on the target data set. Since the server model is trained on the entire source data set $\mathcal{D}_S$, we follow their base augmentation method for full fine-tuning scenarios, where all network parameters are updated, and deploy horizontal flipping, random resized cropping and color jittering in the pipeline of $T$.
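The following NumPy sketch makes the augmentation chain $T$ concrete. It applies horizontal flipping, random resized cropping and brightness/contrast jitter to a single image. It is a simplified stand-in for library transforms such as torchvision's `RandomHorizontalFlip`, `RandomResizedCrop` and `ColorJitter`. The function name `augment`, the crop scale range, the nearest-neighbour resizing and the jitter strength are all assumptions. Unlike `RandomResizedCrop`, it does not sample a separate aspect ratio.

```python
import numpy as np


def augment(img, rng, scale=(0.08, 1.0), jitter=0.4):
    """Hypothetical stand-in for T; img has shape (H, W, 3) with values in [0, 1]."""
    h, w, _ = img.shape
    # 1) horizontal flip with probability 0.5
    if rng.random() < 0.5:
        img = img[:, ::-1, :]
    # 2) random resized crop: a crop covering a random fraction of the image area ...
    area = rng.uniform(*scale) * h * w
    ch = int(np.clip(round(np.sqrt(area)), 1, h))
    cw = int(np.clip(round(area / ch), 1, w))
    top = int(rng.integers(0, h - ch + 1))
    left = int(rng.integers(0, w - cw + 1))
    crop = img[top:top + ch, left:left + cw, :]
    # ... resized back to H x W with nearest-neighbour indexing
    rows = np.arange(h) * ch // h
    cols = np.arange(w) * cw // w
    img = crop[rows][:, cols]
    # 3) colour jitter: random brightness factor, then contrast around the mean
    img = img * rng.uniform(1 - jitter, 1 + jitter)
    mean = img.mean()
    img = (img - mean) * rng.uniform(1 - jitter, 1 + jitter) + mean
    return np.clip(img, 0.0, 1.0)


rng = np.random.default_rng(0)
x = rng.random((32, 32, 3))
x_aug = augment(x, rng)
print(x_aug.shape)  # (32, 32, 3)
```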

The feature extractor $f$, parameterized by $\varphi_S$, is the second model component. It retrieves relevant features from the transformed input data $\tilde{x}_l^S$ and produces the output $f(\tilde{x}_l^S) \mapsto \mathring{x}_l^S \in \mathbb{R}^m$, $m \ll d$. As a compromise between the network depth required for the image classification problem and the need to keep run-time resource usage as low as possible for client endpoints in the later fine-tuning stage, a pre-trained ResNet-34 (He et al., 2016) backbone with about 22 million parameters is employed on the server side.

Following the feature extractor $f$, the next component of the source model is the adaptation module $A$, parameterized by $\psi_S$. It is the core of our contribution and builds upon the concepts of task-specific adapters (Li et al., 2022) and universal templates for few-shot learning (Triantafillou et al., 2021). A domain-adaptive linear layer allocates a dedicated set of conditional batch normalization parameters and linear layer weights for the source domain pre-training and for each downstream target fine-tuning task. This makes the adaptation module $A$ solely responsible for DA (Chang et al., 2019). Formally, the server-side adaptation module is defined as

$$A(\psi) = A(\mu, \sigma, \gamma, \beta, W, b; \mathring{x}) = \frac{(W\mathring{x} + b) - \mu}{\sigma}\,\gamma + \beta \qquad (1)$$

with $\mathring{x}$ denoting the output of the feature extractor, $\{\mu, \sigma\}$ the batch norm statistics, $\{\gamma, \beta\}$ the batch norm parameters, and $\{W, b\}$, $W \in \mathbb{R}^{m \times m}$, $b \in \mathbb{R}^m$, the weights and bias of the linear layer, respectively. Parameter simplifications are possible by taking dependencies into account. For example, when $\mu$ is the batch mean of $W\mathring{x} + b$, the bias $b$ cancels out and its role is taken over by $\beta$.

In the server model, the classification head $g$ is realized by a fully connected layer that receives the output of the adaptation module and projects it onto a specific set of labels. With all essential components of the source model defined, the overall decision function is $F(\varphi_S, \psi_S, \nu_S) = g(\nu_S) \circ A(\psi_S) \circ f(\varphi_S)$. The training objective for our server-side classification task is defined as

$$\operatorname*{arg\,min}_{\varphi_S, \psi_S, \nu_S} \; \mathcal{L}_S\big(F(\varphi_S, \psi_S, \nu_S; \tilde{x}_l^S), y_l^S\big) \qquad (2)$$

where $\mathcal{L}_S$ is the cross-entropy loss regularized by label smoothing (Müller et al., 2019) to further encourage robust output features, and $y_l^S$ is the ground-truth label associated with $\tilde{x}_l^S$. All optimizations are performed with stochastic gradient descent with momentum until convergence.
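The following NumPy sketch shows the server-side forward pass through Eq. (1) and the label-smoothed loss in Eq. (2). It is a minimal sketch under several assumptions. Random arrays stand in for the backbone output $\mathring{x}$, and the function names and the `eps` stability term are assumptions. ReLU (shown in Figure 2) is omitted so that the code matches Eq. (1) term by term. The sketch also illustrates one parameter simplification: because $\mu$ is computed from the batch, the bias $b$ has no effect on the output.

```python
import numpy as np


def adaptation_module(x, W, b, gamma, beta, eps=1e-5):
    """Eq. (1) with batch statistics; x: (B, m), W: (m, m), b/gamma/beta: (m,).

    eps is a numerical-stability constant (an assumption, not part of Eq. (1)).
    """
    h = x @ W.T + b                       # linear layer W x + b
    mu = h.mean(axis=0)                   # batch mean
    sigma = np.sqrt(h.var(axis=0) + eps)  # batch standard deviation
    return (h - mu) / sigma * gamma + beta


def label_smoothing_ce(logits, y, smoothing=0.1):
    """Cross-entropy with label smoothing, averaged over the batch."""
    n_cls = logits.shape[1]
    z = logits - logits.max(axis=1, keepdims=True)  # numerically stable log-softmax
    log_p = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    target = np.full_like(log_p, smoothing / n_cls)  # smoothing mass on every class
    target[np.arange(len(y)), y] += 1.0 - smoothing  # remaining mass on the true class
    return -(target * log_p).sum(axis=1).mean()


rng = np.random.default_rng(0)
B, m, L = 16, 8, 5
x = rng.normal(size=(B, m))                  # stand-in for f(phi_S; x_tilde)
W, b = rng.normal(size=(m, m)), rng.normal(size=m)
gamma, beta = np.ones(m), np.zeros(m)
V, c = rng.normal(size=(L, m)), np.zeros(L)  # linear classifier g(nu_S)
y = rng.integers(0, L, size=B)

a = adaptation_module(x, W, b, gamma, beta)
loss = label_smoothing_ce(a @ V.T + c, y)    # objective of Eq. (2) for one batch
```

With `smoothing=0.0`, the loss reduces to plain cross-entropy. Because the adapter output does not depend on `b` under batch statistics, the bias could be dropped without changing Eq. (1).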

Overall, server pre-training on the source domain data set $\mathcal{D}_S$ is intended to learn and refine a set of features that are both discriminative and transferable, thereby mitigating the difference between source and target domains. The resulting server model parameters serve as initial parameters for client devices joining the FL process. Technically, the parameters can be transmitted in two ways:

• On Demand. When a client joins the FL process for the first time, the central server transmits the model parameters to it. Although flexible, this method incurs high communication costs for the initial transfer of all parameters.

• Pre-Configured. The client ships with a pre-trained model parameter configuration upon installation, which eliminates the need to update the weights at the start and reduces the amount of communication involved.

Low-end client devices then join the FL process of the central server one after another and follow the adaptation described in Section 3.3.
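The following back-of-the-envelope calculation puts the communication cost of the two transmission options into perspective. It compares the size of the full backbone with that of the adaptation module in Eq. (1). It assumes float32 parameters and a feature dimension m = 512, the penultimate feature size of a standard ResNet-34. Neither value is stated in the text, and the classifier head is ignored.

```python
BYTES_PER_PARAM = 4          # float32 weights (assumption)


def adapter_params(m):
    """Parameters of the adaptation module in Eq. (1): W (m x m), b, gamma, beta (m each)."""
    return m * m + 3 * m


def megabytes(n_params):
    return n_params * BYTES_PER_PARAM / 1e6


backbone = 22_000_000        # ResNet-34 backbone, "about 22 million parameters"
m = 512                      # assumed ResNet-34 feature dimension
print(f"backbone: {megabytes(backbone):.1f} MB")           # backbone: 88.0 MB
print(f"adapter:  {megabytes(adapter_params(m)):.2f} MB")  # adapter:  1.05 MB
```

Under these assumptions, the adapter is roughly 1% of the backbone size. This is why shipping or upstreaming only $A$ is comparatively cheap, while an on-demand transfer of the full model dominates the initial communication.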

3.3 Client Adaptation

For each client, the respective client model is designed from the ground up with all aforementioned limitations in mind. As shown in Figure 2, the client model implements the same training pipeline as the source model to ensure maximum parameter reuse. After the pre-trained weights are loaded into the feature extractor $f(\varphi_S)$, the adaptation module $A(\psi_S)$, and the classifier $g(\nu_S)$, the parameter sets of the feature extractor and the classifier are frozen, so these components remain fixed during training. This strategy is beneficial in several ways. First, disabling error backpropagation, especially through the deep feature extractor, avoids computationally expensive gradient calculations for the corresponding weights and thus significantly lowers the hardware requirements of the client model. Second, network training converges faster while remaining stable. Lastly, fine-tuning only the parameters of the adaptation module keeps domain-specific information in a single, purpose-built model component.

The adaptation of client $i$ is conducted using a reduced $k$-shot target support set $\mathcal{D}^{spt}_{T_i} \subset \mathcal{D}_{T_i}$, where $k \in \{3, 5, 10\}$ denotes the number of annotated samples per class, reflecting data scarcity in the target domain. We further discuss and evaluate the selection of $k$ in Section 4. The training objective for the fine-tuning task of client $i$ is defined as

$$\operatorname*{arg\,min}_{\psi_{T_i}} \; \mathcal{L}_{T_i}\big(F(\psi_{T_i}; \tilde{x}_l^{T_i}), y_l^{T_i}\big) \qquad (3)$$

with $F(\cdot)$ being the client decision function parameterized by $\psi_{T_i}$, $\tilde{x}_l^{T_i}$ an augmented data sample drawn from the target domain support set $\mathcal{D}^{spt}_{T_i}$, and $y_l^{T_i}$ the corresponding ground-truth label.
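The following NumPy sketch illustrates Eq. (3) together with the freezing strategy. The feature extractor output and the classifier weights stay fixed, and plain gradient descent updates only the adapter's $W$ and $b$. It is a simplified illustration, not the authors' training loop. Batch normalization, ReLU, augmentation and label smoothing are omitted (an assumption made to keep the gradients short), and all names are hypothetical.

```python
import numpy as np


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def finetune_adapter(z, y, W, b, G, c, lr=0.05, steps=100):
    """Minimise mean cross-entropy over the adapter (W, b) only.

    z    -- (B, m) features from the frozen extractor f(phi_S)
    G, c -- frozen linear classifier g(nu_S); never modified here
    """
    W, b = W.copy(), b.copy()
    onehot = np.eye(G.shape[0])[y]
    losses = []
    for _ in range(steps):
        h = z @ W.T + b                      # adapter output (linear part only)
        p = softmax(h @ G.T + c)             # frozen classifier
        losses.append(-np.mean(np.log(p[np.arange(len(y)), y])))
        d_logits = (p - onehot) / len(y)     # gradient of mean cross-entropy w.r.t. logits
        d_h = d_logits @ G                   # backpropagate through frozen g only
        W -= lr * d_h.T @ z                  # update adapter weights
        b -= lr * d_h.sum(axis=0)            # update adapter bias
    return W, b, losses


rng = np.random.default_rng(1)
k, N, m = 5, 3, 8                            # 5-shot support set with 3 classes
z = rng.normal(size=(k * N, m))
y = np.repeat(np.arange(N), k)
G, c = 0.5 * rng.normal(size=(N, m)), np.zeros(N)
G_before = G.copy()
W1, b1, losses = finetune_adapter(z, y, np.eye(m), np.zeros(m), G, c)
```

Gradients never reach the extractor, and the classifier is only read. This mirrors why the client-side cost scales with the adapter size rather than with the backbone.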

In the next step (see Figure 2), target prototypes are calculated in the same manner as class prototypes in ProtoNet (Snell et al., 2017) and FedProto (Tan et al., 2022), but in FedAcross they follow a particular fine-tuning strategy. We choose the prototypical representation for its high interpretability, simplicity of computation, and memory efficiency. Accordingly, the prototype modeling the $n$-th class on client $i$ is

$$\omega_{T_i}^{(n)} = \frac{1}{|\mathcal{D}^{spt}_{T_i,n}|} \sum_{(\tilde{x}_l^{T_i}, y_l^{T_i}) \in \mathcal{D}^{spt}_{T_i,n}} \tau_{T_i}(\tilde{x}_l^{T_i}) \qquad (4)$$

where $\mathcal{D}^{spt}_{T_i,n}$ is the subset of the support set with label $n$, and $\tau_{T_i}(\cdot)$ defines the embedding function over the client-side feature extractor and adaptation module with $\tau_{T_i}(\tilde{x}_l^{T_i}) = A_{\psi_{T_i}}(f_{\varphi_S}(\tilde{x}_l^{T_i}))$. The output of the client adaptation is thus a collection of prototypes tailored to the target data set.
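Eq. (4) is a class-wise mean over the embedded support samples. The minimal sketch below assumes the embeddings $\tau_{T_i}(\tilde{x})$ have already been computed and uses a toy 2-D example. The helper name is hypothetical.

```python
import numpy as np


def compute_prototypes(embeddings, labels):
    """Eq. (4): class-wise mean of the embedded support samples.

    embeddings -- (B, m) outputs of A(f(x_tilde)); labels -- (B,) integer classes.
    Returns the sorted class ids and an (N, m) array of prototypes.
    """
    classes = np.unique(labels)
    protos = np.stack([embeddings[labels == n].mean(axis=0) for n in classes])
    return classes, protos


emb = np.array([[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0]])
lab = np.array([0, 0, 1, 1])
classes, protos = compute_prototypes(emb, lab)
# rows of protos: [2, 0] for class 0 and [0, 3] for class 1
```

Only an $N \times m$ array has to be stored, independent of $k$. This is the memory efficiency referred to above.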


3.4 Client Inference

Inference on a client device is straightforward: the entire embedding pipeline, including the feature extractor $f$ and the adaptation module $A$, is reused to project an unlabeled, transformed sample $\tilde{x}_l^{T_i}$ observed on client device $i$ onto the corresponding query prototype, as illustrated in Figure 3.

[Figure 3 (diagram). Query Input, Transformation Module, Feature Extractor and Adaptation Module (Linear, BatchNorm, ReLU) produce the Query Prototype; the Prototype Generator provides the Target Prototypes; the Distance Metric module compares both and outputs the Prediction. Legend: non-trainable parameter module.]

Figure 3: Client inference.

The embedded query vector is then fed into the Distance Metric module, which computes the pairwise L2 distance between the query vector and the pre-computed target prototypes. The query sample is assigned to the class of the nearest target prototype:

$$\hat{y}_l^{T_i} = \operatorname*{arg\,min}_{n} \big\lVert \tau_{T_i}(\tilde{x}_l^{T_i}) - \omega_{T_i}^{(n)} \big\rVert_2 \qquad (5)$$

where $\hat{y}_l^{T_i}$ is the predicted label of sample $\tilde{x}_l^{T_i}$ observed on client $i$.
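The following minimal sketch shows the Distance Metric module for Eq. (5). It assumes the prototypes are given as an array of class-wise means and the embedded queries as a NumPy array. It uses squared L2 distances, which yield the same argmin as the L2 norm and avoid a square root. The function name is hypothetical.

```python
import numpy as np


def predict(query_emb, classes, prototypes):
    """Eq. (5): assign each embedded query to the class of the nearest prototype."""
    # pairwise squared L2 distances, shape (Q, N); squaring does not change the argmin
    d2 = ((query_emb[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=-1)
    return classes[d2.argmin(axis=1)]


classes = np.array([0, 1])
prototypes = np.array([[2.0, 0.0], [0.0, 3.0]])
queries = np.array([[1.8, 0.4], [0.1, 2.5], [5.0, 0.0]])
print(predict(queries, classes, prototypes))  # [0 1 0]
```

The cost per query is one embedding pass plus $N$ distance computations, with no classifier head or softmax involved.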

3.5 Client Prototype Upstreaming

Clients can not only benefit from the FL cycle but can also contribute to it by sharing their locally refined knowledge without risking the exposure of sensitive information. We argue that access to raw client data points is restricted in three ways. First, in contrast to traditional FL, our approach does not rely on exchanging model gradients, which avoids the threat of input data reconstruction from intercepted gradients. Second, target prototypes reside at the mean of their respective class in the embedding space, so they leak information only in the same way that mean value statistics leak information (Brinkrolf et al., 2019). Third, even if an adversary manages to reconstruct the feature vector of a single data point and additionally gains access to the fine-tuned client model, recovering a raw client data point from its deep-backbone encoding is considered practically infeasible.

We show one of the X-ray images from the WeSort.AI waste detection scenario. The rechargeable battery of a cell phone has to be detected.

Algorithm 1: FedAcross.

Prototype Upstreaming enables client devices to send their generated target prototypes and adaptation module parameters back to the server, minimizing data transfer while addressing bandwidth constraints and transmission latency. The prototypes are a highly compact injective encoding of the former training data. Similarly to how clients generate target prototypes, the server can produce source prototypes by applying Equation (4) to the source data set $\mathcal{D}_S$ after pre-training the source model. Compared with target prototypes, these source prototypes are more robust to outliers, yet they are also more specialized to the source data set. To compensate for this shortcoming, source prototypes can be enriched in a variety of ways with fine-tuned prototypes received from client devices, e.g. by applying an appropriate fusion strategy as described in (Tan et al., 2022). For enhanced bootstrapping of new clients in the FL cycle, the initial weights of the adaptation module $A$ can be refined by processing the fine-tuned adaptation parameters received from previous clients. FedAvg (McMahan et al., 2017) is one of the best-known approaches to combining model parameters in an FL context: weights are collected from remote devices and averaged on a central hub. This method integrates seamlessly into our client-server setup. The pseudo-code for FedAcross is given in Algorithm 1.
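FedAvg-style aggregation of the upstreamed adaptation parameters can be sketched as a sample-weighted average of per-client parameter dictionaries. Weighting by the number of local samples follows the usual FedAvg convention. The function name, the dictionary layout and the example numbers are assumptions for illustration, and this sketch does not reproduce Algorithm 1.

```python
import numpy as np


def fedavg(client_params, client_sizes):
    """Weighted average of per-client parameter dicts (name -> np.ndarray).

    client_sizes -- number of local (support) samples per client, used as weights
    """
    total = float(sum(client_sizes))
    return {
        name: sum(p[name] * (n / total) for p, n in zip(client_params, client_sizes))
        for name in client_params[0]
    }


# Two clients upstream their fine-tuned adapter parameters (toy 2-D adapter)
clients = [
    {"W": np.full((2, 2), 1.0), "b": np.array([0.0, 0.0])},
    {"W": np.full((2, 2), 4.0), "b": np.array([3.0, 3.0])},
]
global_adapter = fedavg(clients, client_sizes=[15, 30])  # e.g. 5-shot vs 10-shot, 3 classes
print(global_adapter["b"])  # [2. 2.]
```

The averaged dictionary can serve as the initial adapter for a newly joining client. Averaging prototypes of equal dimension in the same way is possible, but whether it is a suitable fusion strategy is a separate design decision.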


4 EXPERIMENTS

In this section, we describe the experiments used to evaluate our approach, which replicate a garbage classification scenario.

4.1 Implementation Setup

All experiments are designed to reflect the real-world challenges induced by domain shift across client observations, a shortage of annotations on target samples, and additional restrictions imposed by the FL environment. For each experiment, the feature extractor of the server instance is first initialized with a ResNet-34 or ResNet-50 architecture, using the corresponding weights from pre-training on ImageNet (Russakovsky et al., 2015). The weight parameters of the adaptation module and of the classifier's linear layers are initialized with random values from a normal distribution, whereas the batch normalization layer's weights are initialized using the Xavier normal initialization method (Glorot and Bengio, 2010). The main objective of server pre-training is to optimize the server model to recognize source domain-specific classes by minimizing the cross-entropy loss from Equation (2) on the full source data set $\mathcal{D}_S$. Following (Guo et al., 2020) for a pre-training configuration in conjunction with few-shot learning downstream tasks, an SGD optimizer with an initial learning rate of 0.01, momentum of 0.9 and weight decay of 0.001 is deployed. The learning rate is steadily reduced by a learning rate scheduler. Server training runs for 300 epochs on randomly shuffled mini-batches of size 128, with optional early stopping on convergence. The server-side transformation module follows the recommendation of (Kim et al., 2022) by applying horizontal flipping, random resized cropping and color jittering to augment the input data. After the pre-training step, the Flower-based server opens a gRPC network connection and listens for clients to join the FL round.
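For reference, the following sketch shows a single update of SGD with momentum and weight decay, using the hyperparameters reported above (learning rate 0.01, momentum 0.9, weight decay 0.001). It mirrors the update rule documented for PyTorch's `torch.optim.SGD` with zero dampening and without Nesterov momentum. The scalar toy parameter and the constant gradient are illustrative assumptions, and the learning rate schedule is not modeled.

```python
def sgd_momentum_step(param, grad, buf, lr=0.01, momentum=0.9, weight_decay=0.001):
    """One SGD step with momentum and L2 weight decay (dampening 0, no Nesterov)."""
    d_p = grad + weight_decay * param                      # add weight-decay term to gradient
    buf = d_p if buf is None else momentum * buf + d_p     # momentum buffer
    return param - lr * buf, buf


p, buf = 1.0, None
for _ in range(2):
    p, buf = sgd_momentum_step(p, grad=0.5, buf=buf)
print(round(p, 6))  # 0.985471
```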

The original X-ray and multi-spectral image data could not be made available for copyright reasons, but the data used in the experiments are sufficiently similar.

Fine-tuning starts by booting up client instances, which signal their availability to the server with a pre-configured parameter set taken from the pre-training stage. To further replicate the scarcity of labeled target data, each client $i$ is restricted to accessing only its corresponding $k$-shot support set $\mathcal{D}^{spt}_{T_i}$ during training. Each support set is randomly generated from the respective domain of the DA data set under inspection. The objective of client training is to fine-tune the client model by minimizing the cross-entropy loss from Equation (3). The training setup for clients and server is identical, except that the client learning rate is set to 0.1 and the mini-batch size to 32. For simulation purposes, the number of training epochs is set to 200 for one federated round, since clients can access all support samples straight away. In real-world scenarios, the training epochs can be split and distributed over the number of federated rounds. On completion of the fine-tuning process, the client creates the target prototypes based on Equation (4). Test accuracy is obtained by evaluating each client individually with mean-centroid classification on its respective held-out test set, and classification accuracy is averaged over five runs.
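The difference between the server and client training setups can be summarized as a small configuration object. The dataclass and its field names are hypothetical and not taken from the FedAcross repository. The values are those reported in this section.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TrainConfig:
    """Reported training hyperparameters (field names are hypothetical)."""
    optimizer: str = "SGD"
    lr: float = 0.01
    momentum: float = 0.9
    weight_decay: float = 0.001
    batch_size: int = 128
    epochs: int = 300


server_cfg = TrainConfig()
# Clients reuse the server setup except for learning rate, batch size and epochs per round.
client_cfg = replace(server_cfg, lr=0.1, batch_size=32, epochs=200)
print(client_cfg)
```

Deriving the client configuration with `dataclasses.replace` keeps the shared settings in one place, so the stated differences are the only fields that change.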

4.2 Model Evaluation

In order to adequately benchmark our model, we first evaluate our method against approaches of source-free unsupervised DA as set up in (Zhang et al., 2022), focusing on single-domain performance. We chose the official Office-31 (Saenko et al., 2010) and OfficeHome (Venkateswara et al., 2017) data sets as the most suitable fit for evaluation purposes, since their domains are based on images of real-world objects with visual differences in terms of lighting conditions, viewpoints and backgrounds. The Office-31 data set contains a total of 4110 images of 31 object classes, spread over three domains: Amazon (A), DSLR (D) and Webcam (W). The OfficeHome data set used in the second experiment includes 15500 images of 65 object classes divided into four domains: Art (A), Clipart (C), Product (P) and Real World (RW). For both experiments, we initially select a source domain (e.g., domain A), pre-train the server model on it and copy the model to fine-tune it on each of the remaining domains; for example, A → W and A → D denote the mean-centroid classification tasks on the test sets of the target domains W and D, respectively. The performance of FedAcross is evaluated using a ResNet-50 feature extractor, since all competing methods use this backbone. First, we compare our approach with source-free DA methods that permit access to the full target data set for fine-tuning: SHOT (Liang et al., 2020), SFDA (Kim et al., 2021) and SDDA (Kurmi et al., 2021). We also compare our approach to the recent state-of-the-art few-shot adaptation methods FLUTE (Triantafillou et al., 2021), which learns a universal template based on multiple source data sets, and LCCS (Zhang et al., 2022), which adapts batch normalization statistics on target samples.
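Choosing each domain in turn as the source and adapting to every remaining domain yields 3 × 2 = 6 transfer tasks for Office-31 and 4 × 3 = 12 for OfficeHome, which matches the task columns of Tables 2 and 3. A small sketch of that enumeration (function names are hypothetical):

```python
from collections import defaultdict
from itertools import permutations
from typing import Dict, List, Tuple

OFFICE_31 = ["A", "D", "W"]          # Amazon, DSLR, Webcam
OFFICE_HOME = ["A", "C", "P", "RW"]  # Art, Clipart, Product, Real World


def transfer_tasks(domains: List[str]) -> List[Tuple[str, str]]:
    """Every ordered (source, target) pair of distinct domains."""
    return list(permutations(domains, 2))


def tasks_by_source(domains: List[str]) -> Dict[str, List[str]]:
    """Group targets by source: one server pre-training per source domain,
    whose model copy is then fine-tuned on each remaining domain."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for source, target in transfer_tasks(domains):
        grouped[source].append(target)
    return dict(grouped)
```

Grouping by source makes explicit that only one pre-training run is needed per source domain; the targets then reuse copies of that model.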

The experimental outcome on Office-31 (Table 2) shows that FedAcross delivers adaptation results on par with all competitors despite the more challenging conditions induced by the FL setup: although LCCS produces the best overall adaptation performance with five labeled samples per class (88.9%), FedAcross achieves the best overall adaptation result of all methods under inspection with k = 10 (89.0%). Beyond accuracy, our approach offers practical advantages over LCCS for cross-device FL scenarios. First, FedAcross is more flexible, as client adaptation does not depend on the number of batch normalization layers in the feature extractor, which makes it applicable to a wider range of network architectures. Second, in contrast to FedAcross, LCCS requires a two-stage adaptation process that starts with a compute-intensive grid search. This demanding computational step determines the optimal parameter configuration for its learnable coefficients, which is then used to kick-start the gradient update stage.

Table 2: Results with ResNet-50 baseline, centralized source-free DA and few-shot transfer learning methods on Office-31. “→” indicates a domain change.

Method | k | A → W | A → D | W → A | W → D | D → A | D → W | Avg
Baseline | - | 68.4 | 68.9 | 60.7 | 99.3 | 62.5 | 96.7 | 76.1
SHOT | all | 90.1 | 94.0 | 74.3 | 99.9 | 74.7 | 98.4 | 88.6
SFDA | all | 91.1 | 92.2 | 71.2 | 99.5 | 71.0 | 98.2 | 87.2
SDDA | all | 82.5 | 85.3 | 67.7 | 99.8 | 66.4 | 99.0 | 83.5
FLUTE∗ | 5 | 84.6 | 88.2 | 66.4 | 99.1 | 66.4 | 95.3 | 83.3
LCCS∗ | 5 | 92.8 | 91.8 | 75.1 | 99.9 | 75.4 | 98.5 | 88.9
FedAcross | 5 | 89.4 | 90.4 | 63.5 | 94.5 | 60.0 | 90.4 | 81.4
FedAcross | 10 | 97.4 | 98.5 | 71.1 | 98.6 | 71.0 | 97.4 | 89.0
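The Avg column of Table 2 is consistent with an unweighted mean over the six transfer tasks, rounded to one decimal. The check below recomputes it from the transcribed values; Python's decimal module avoids binary floating-point artifacts at ties such as SDDA's 83.45. Half-up rounding is an assumption about how the table was rounded, and the helper names are hypothetical.

```python
from decimal import ROUND_HALF_UP, Decimal
from typing import Dict, List

# Accuracies (%) transcribed from Table 2 as strings, columns
# A→W, A→D, W→A, W→D, D→A, D→W, followed by the reported Avg.
TABLE_2: Dict[str, List[str]] = {
    "Baseline": ["68.4", "68.9", "60.7", "99.3", "62.5", "96.7", "76.1"],
    "SHOT": ["90.1", "94.0", "74.3", "99.9", "74.7", "98.4", "88.6"],
    "SFDA": ["91.1", "92.2", "71.2", "99.5", "71.0", "98.2", "87.2"],
    "SDDA": ["82.5", "85.3", "67.7", "99.8", "66.4", "99.0", "83.5"],
    "FLUTE (k=5)": ["84.6", "88.2", "66.4", "99.1", "66.4", "95.3", "83.3"],
    "LCCS (k=5)": ["92.8", "91.8", "75.1", "99.9", "75.4", "98.5", "88.9"],
    "FedAcross (k=5)": ["89.4", "90.4", "63.5", "94.5", "60.0", "90.4", "81.4"],
    "FedAcross (k=10)": ["97.4", "98.5", "71.1", "98.6", "71.0", "97.4", "89.0"],
}


def row_mean(values: List[str]) -> Decimal:
    """Unweighted mean over tasks, rounded half-up to one decimal place."""
    total = sum(Decimal(v) for v in values)
    return (total / len(values)).quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


def mismatched_rows(table: Dict[str, List[str]]) -> List[str]:
    """Names of rows whose recomputed mean differs from the reported Avg."""
    return [name for name, row in table.items() if row_mean(row[:-1]) != Decimal(row[-1])]
```

An empty result from `mismatched_rows(TABLE_2)` confirms the transcription; it also shows that the k = 10 FedAcross average exceeds LCCS by 0.1 percentage points.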

The results on the more challenging OfficeHome benchmark data set in Table 3 reveal that the unsupervised DA method SHOT outperforms all other competitors in this scenario, underlining the difficulty of adaptation in low-data regimes (71.8% for SHOT vs. 70.9% for FedAcross with k = 10). However, SHOT requires on average about six times as many (unlabeled) data points per class in the OfficeHome setup (59.6 images per class for SHOT vs. k = 10 images per class for FedAcross) to achieve only slightly better overall accuracy than FedAcross.
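The figure of 59.6 images per class matches OfficeHome's 15,500 images spread evenly over 65 classes and four domains, which is where the factor of roughly six comes from. A quick check, assuming this is how the figure was obtained (actual per-domain class sizes vary); the constant and function names are hypothetical:

```python
OFFICEHOME_IMAGES = 15_500
OFFICEHOME_CLASSES = 65
OFFICEHOME_DOMAINS = 4
K_SHOT = 10  # labeled samples per class available to a FedAcross client


def images_per_class_per_domain(images: int, classes: int, domains: int) -> float:
    """Average number of images per class within a single domain."""
    return images / (classes * domains)


per_class = images_per_class_per_domain(OFFICEHOME_IMAGES, OFFICEHOME_CLASSES, OFFICEHOME_DOMAINS)
ratio = per_class / K_SHOT
print(f"{per_class:.1f} images/class vs. k = {K_SHOT}: about {ratio:.1f}x more data")
# prints: 59.6 images/class vs. k = 10: about 6.0x more data
```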

Two further insights emerge from the results. First, there is a trade-off between the number of parameters that need to be transmitted to the client initially (communication effort) and the on-client adaptation performance, both of which are determined by the choice of feature extractor. Second, the number of ground-truth annotations k is essential for raising prediction accuracy to meet the specific needs of client operators.
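To make the communication side of this trade-off concrete, the sketch below estimates uncompressed float32 payload sizes. The ResNet-50 count is that of torchvision's ImageNet model minus its 1000-way classification head; the 256-dimensional adaptation layer is a hypothetical size chosen for illustration, not the dimension used in FedAcross. Because the backbone stays frozen during client adaptation, this sketch assumes it would only need to be sent once, while later exchanges could be limited to the adaptation layer. All names are hypothetical.

```python
BYTES_PER_FLOAT32 = 4

# torchvision's resnet50 has 25,557,032 parameters including its 1000-way
# ImageNet classification head (2048 x 1000 weights + 1000 biases).
RESNET50_WITH_HEAD = 25_557_032
IMAGENET_HEAD = 2048 * 1000 + 1000
RESNET50_BACKBONE = RESNET50_WITH_HEAD - IMAGENET_HEAD


def linear_params(d_in: int, d_out: int, bias: bool = True) -> int:
    """Parameter count of a fully connected layer."""
    return d_in * d_out + (d_out if bias else 0)


def payload_mb(n_params: int, bytes_per_param: int = BYTES_PER_FLOAT32) -> float:
    """Uncompressed transfer size in megabytes (10**6 bytes)."""
    return n_params * bytes_per_param / 1e6


# Hypothetical adaptation layer mapping 2048-d ResNet-50 features to 256 dims.
ADAPTER = linear_params(2048, 256)

print(f"backbone: {RESNET50_BACKBONE:,} params, {payload_mb(RESNET50_BACKBONE):.1f} MB")
print(f"adapter:  {ADAPTER:,} params, {payload_mb(ADAPTER):.1f} MB")
# prints: backbone: 23,508,032 params, 94.0 MB
#         adapter:  524,544 params, 2.1 MB
```

A smaller backbone would shrink the initial transfer accordingly, at the cost of the adaptation performance discussed above.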

Table 3: Results with ResNet-50 baseline, centralized source-free DA and few-shot transfer learning methods on OfficeHome. “→” indicates a domain change.

Method | k | A → C | A → P | A → RW | C → A | C → P | C → RW | P → A | P → C | P → RW | RW → A | RW → C | RW → P | Avg
Baseline | - | 34.9 | 50.0 | 58.0 | 37.4 | 41.9 | 46.2 | 38.5 | 31.2 | 60.4 | 53.9 | 41.2 | 59.9 | 46.1
SHOT | all | 57.1 | 78.1 | 81.5 | 68.0 | 78.2 | 78.1 | 67.4 | 54.9 | 82.2 | 73.3 | 58.8 | 84.3 | 71.8
SFDA | all | 48.4 | 73.4 | 76.9 | 64.3 | 69.8 | 71.7 | 62.7 | 45.3 | 76.6 | 69.8 | 50.5 | 79.0 | 65.7
FLUTE∗ | 5 | 49.0 | 70.1 | 68.2 | 53.8 | 69.3 | 65.1 | 53.2 | 46.8 | 70.8 | 59.4 | 51.7 | 77.3 | 61.2
LCCS∗ | 5 | 57.6 | 74.5 | 77.0 | 60.0 | 71.5 | 70.9 | 59.2 | 54.7 | 75.9 | 69.2 | 61.2 | 81.5 | 67.8
FedAcross | 5 | 45.9 | 68.9 | 66.1 | 53.9 | 67.6 | 64.8 | 55.8 | 47.2 | 67.2 | 59.5 | 48.1 | 73.3 | 59.9
FedAcross | 10 | 56.7 | 77.0 | 76.3 | 69.1 | 76.7 | 74.6 | 69.5 | 59.4 | 76.6 | 72.4 | 60.4 | 81.5 | 70.9

∗ Results referenced from (Zhang et al., 2022).

To investigate the effectiveness of our approach for waste item classification, the DA benchmark data sets OfficeHome and DomainNet (Peng et al., 2019) are modified to include only items typically observed in waste sorting scenarios; for DomainNet, this yields 30 waste object classes in the Clipart and Real domains. In our experiment, the waste sorting service provider (Srv) pre-trains its source model on all available waste object classes. Each waste sorting facility (Cl) then fine-tunes its local model on a specialized, randomly selected subset of ten classes. Table 4 reports the prediction accuracy averaged over five runs with k ∈ {0, 3, 5, 10}. The results for OfficeHome (Waste) show that FedAcross effectively improves the prediction accuracy on client devices for k > 3 in a photographic adaptation task, and that accuracy scales with the number of annotated samples. The more challenging DomainNet (Waste) adaptation tasks across two domains with a larger distributional gap show similar performance improvements, highlighting the flexibility of FedAcross.
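The split between service provider and facilities can be simulated as follows: the server model covers all waste classes, and each facility draws its own random ten-class subset and filters its local data accordingly. This is a sketch with placeholder class names (30 classes, matching the DomainNet (Waste) setup; the real names are listed in the FedAcross sources) and hypothetical helper names.

```python
import random
from typing import List, Sequence, Tuple

# Placeholder names; the actual waste object classes are listed in the FedAcross sources.
SERVER_CLASSES = [f"waste_class_{i:02d}" for i in range(30)]


def facility_classes(server_classes: Sequence[str], n: int, seed: int) -> List[str]:
    """Randomly select the n classes a waste sorting facility (Cl) specializes in."""
    return sorted(random.Random(seed).sample(list(server_classes), n))


def restrict_to_classes(samples: List[Tuple[object, str]],
                        classes: Sequence[str]) -> List[Tuple[object, str]]:
    """Keep only the samples whose label belongs to the facility's class subset."""
    keep = set(classes)
    return [(x, y) for x, y in samples if y in keep]


# Three hypothetical facilities, each with its own ten-class specialization.
facilities = {f"Cl{j}": facility_classes(SERVER_CLASSES, 10, seed=j) for j in range(3)}
```

Seeding per facility keeps each specialization reproducible across the five evaluation runs while letting facilities differ from one another.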

Table 4: Adaptability of FedAcross in a waste sorting scenario. “→” indicates a domain change.

k | OfficeHome (Waste) Srv → Cl | OfficeHome (Waste) Srv → Cl | DomainNet (Waste) Srv → Cl | DomainNet (Waste) Srv → Cl
0 | 87.82±0.26 | 78.68±0.86 | 54.51±0.12 | 65.0±0.50
3 | 84.42±0.25 | 75.73±0.73 | 54.74±0.06 | 69.0±0.65
5 | 89.43±0.45 | 84.44±0.29 | 57.77±0.24 | 76.87±0.25
10 | 93.45±0.56 | 88.91±0.19 | 66.48±0.32 | 83.18±0.53
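The reading of Table 4 in the text (on OfficeHome (Waste), accuracy dips at k = 3 but improves for k > 3, and it grows with the number of annotated samples) can be checked against the reported means. The extracted table does not preserve which domain pair each "Srv → Cl" column denotes, so the sketch numbers them; helper names are hypothetical.

```python
from typing import Dict

# Mean accuracies (%) from Table 4; standard deviations omitted.
TABLE_4: Dict[str, Dict[int, float]] = {
    "OfficeHome (Waste) #1": {0: 87.82, 3: 84.42, 5: 89.43, 10: 93.45},
    "OfficeHome (Waste) #2": {0: 78.68, 3: 75.73, 5: 84.44, 10: 88.91},
    "DomainNet (Waste) #1": {0: 54.51, 3: 54.74, 5: 57.77, 10: 66.48},
    "DomainNet (Waste) #2": {0: 65.0, 3: 69.0, 5: 76.87, 10: 83.18},
}


def gain(column: Dict[int, float], k_from: int = 0, k_to: int = 10) -> float:
    """Accuracy change in percentage points between two shot counts."""
    return round(column[k_to] - column[k_from], 2)


def increasing_from(column: Dict[int, float], k_start: int) -> bool:
    """True if accuracy strictly increases with k for all k >= k_start."""
    ks = sorted(k for k in column if k >= k_start)
    return all(column[a] < column[b] for a, b in zip(ks, ks[1:]))


for name, column in TABLE_4.items():
    print(f"{name}: +{gain(column):.2f} pts (k=0 -> 10), "
          f"monotone from k=3: {increasing_from(column, 3)}")
```

All four columns rise monotonically from k = 3 onward, and the largest k = 0 to k = 10 gain (18.18 points) occurs on DomainNet (Waste), consistent with its larger distributional gap.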

Finally, we take advantage of the interpretable nature of our approach to visualize the separation progress over multiple adaptation stages using t-SNE (van der Maaten and Hinton, 2008) on the Office-31 classification task A → W. One objective of our approach is to determine the optimal projection that brings samples from the same class closer together while pushing samples from different classes further apart, resulting in more representative prototypes. In Figure 4, the plots illustrate the target sample feature projections of five classes using (a) an off-the-shelf ResNet-50 backbone, (b) a model pre-trained on the source domain and (c) a model pre-trained on the source domain and fine-tuned on the support set $D^{spt}$, with the respective prototypes shown in red.
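Figure 4 assesses this objective qualitatively. A simple scalar proxy for the same goal, our own illustration rather than a metric used in the paper, compares how far samples lie from their own prototype with how far prototypes lie from each other. Computing it in the original feature space avoids relying on t-SNE distances, which preserve local rather than global structure. Function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from math import dist
from statistics import mean
from typing import Dict, Hashable, List, Sequence, Tuple

Sample = Tuple[Sequence[float], Hashable]  # (embedding, class label)


def class_means(samples: List[Sample]) -> Dict[Hashable, List[float]]:
    """Prototype per class: element-wise mean of its embeddings."""
    groups: Dict[Hashable, List[Sequence[float]]] = defaultdict(list)
    for z, y in samples:
        groups[y].append(z)
    return {y: [mean(col) for col in zip(*zs)] for y, zs in groups.items()}


def separation_ratio(samples: List[Sample]) -> float:
    """Mean sample-to-own-prototype distance divided by the mean pairwise
    distance between prototypes; lower means tighter, better-separated classes.
    Requires at least two classes."""
    protos = class_means(samples)
    intra = mean(dist(z, protos[y]) for z, y in samples)
    inter = mean(dist(protos[a], protos[b]) for a, b in combinations(protos, 2))
    return intra / inter
```

Evaluated on the embeddings behind panels (a), (b) and (c), one would expect the ratio to decrease from stage to stage if the visual impression of Figure 4 holds.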

Waste object classes are specified in the FedAcross sources.

ICPRAM 2024 - 13th International Conference on Pattern Recognition Applications and Methods


Figure 4: t-SNE plot of target class data (color-coded dots) and the respective target class prototypes (red triangles). Panels: (a) baseline, (b) server pre-train, (c) client fine-tune.

Although the baseline prototypes are relatively close to each other, pre-training assists in the initial separation of class samples. Fine-tuning compresses samples of the same class even further, creating a suitable basis for a nearest-centroid classifier.

5 CONCLUSION

In this work, we presented FedAcross, a computation-efficient FL approach offering a ready-to-deploy solution for target adaptation tasks under resource restrictions and distributional shifts across data silos. We demonstrated the scalability and flexibility of our method using an image recognition task motivated by intelligent waste sorting systems as a running example throughout this paper. By employing prototype-based few-shot learning in combination with cross-device domain adaptation techniques, our model achieves competitive results in a federated server-client environment whilst keeping communication and computation efforts to a minimum. An extensive set of experiments performed on both public and industry data sets has demonstrated the applicability of our proposed approach in production environments.

While our current approach offers significant insights, it also opens up several avenues for future research. An immediate extension of our work could involve adapting our methodology to handle data streams in a federated learning environment. This would require developing robust techniques to manage the dynamic and potentially large-scale nature of streaming data. Furthermore, integrating active learning strategies into federated clients presents an exciting opportunity. Such an approach would not only address the challenge of expensive labeling in distributed settings but also enhance the efficiency of the learning process. A critical aspect of this future work would be the quality assessment of data points obtained from streaming data, ensuring that the most informative samples contribute to the learning process. This progression could significantly improve the model's adaptability and performance in real-world, dynamic scenarios. Additionally, exploring the impact of these advancements on privacy preservation and communication efficiency in federated settings could provide valuable insights, in line with the growing need for secure and scalable machine learning solutions. Ultimately, these efforts would contribute to the development of more sophisticated, efficient, and practical federated learning systems capable of handling the complexities of real-world data distributions and applications.

ACKNOWLEDGEMENTS

MR and MM thank the Bavarian HighTech Agenda and the Würzburg Center for Artificial Intelligence and Robotics (CAIRO).

REFERENCES

Bashkirova, D., Mishra, S., Lteif, D., Teterwak, P., Kim, D., Alladkani, F. M., Akl, J., Çalli, B., Bargal, S. A., Saenko, K., Kim, D., Seo, M., Jeon, Y., Choi, D.-G., Ettedgui, S., Giryes, R., Hussein, S. A., Xie, B., and Li, S. (2023). VisDA 2022 challenge: Domain adaptation for industrial waste sorting. CoRR, abs/2303.14828.

Beutel, D. J., Topal, T., Mathur, A., Qiu, X., Fernandez-Marques, J., Gao, Y., Sani, L., Kwing, H. L., Parcollet, T., Gusmão, P. P. d., and Lane, N. D. (2020). Flower: A friendly federated learning research framework.

Brinkrolf, J., Göpfert, C., and Hammer, B. (2019). Differential privacy for learning vector quantization. Neurocomputing, 342:125–136.

Chang, W.-G., You, T., Seo, S., Kwak, S., and Han, B. (2019). Domain-specific batch normalization for unsupervised domain adaptation. CoRR, abs/1906.03950:7346–7354.

Dhillon, G. S., Chaudhari, P., Ravichandran, A., and Soatto, S. (2020). A baseline for few-shot image classification. In International conference on learning representations, volume abs/1909.02729 of International Conference on Learning Representations. OpenReview.net.

Falcon, W. and team, T. P. L. (2019). PyTorch lightning.

Glorot, X. and Bengio, Y. (2010). Understanding the difficulty of training deep feedforward neural networks. In Teh, Y. W. and Titterington, M., editors, Proceedings of the thirteenth international conference on artificial intelligence and statistics, volume 9 of Proceedings of machine learning research, pages 249–256, Chia Laguna Resort, Sardinia, Italy. PMLR.

Guo, Y., Codella, N. C., Karlinsky, L., Codella, J. V., Smith, J. R., Saenko, K., Rosing, T., and Feris, R. (2020). A broader study of cross-domain few-shot learning. In Vedaldi, A., Bischof, H., Brox, T., and Frahm, J.-M., editors, Computer vision – ECCV 2020, volume 12372 of European Conference on Computer Vision, pages 124–141. ECCV.


Hard, A., Rao, K., Mathews, R., Beaufays, F., Augenstein, S., Eichner, H., Kiddon, C., and Ramage, D. (2018). Federated learning for mobile keyboard prediction. ArXiv, abs/1811.03604.

He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep residual learning for image recognition. In 2016 IEEE conference on computer vision and pattern recognition (CVPR), Computer Vision and Pattern Recognition, pages 770–778. IEEE.

Hu, S. X., Li, D., Stühmer, J., Kim, M., and Hospedales, T. M. (2022). Pushing the limits of simple pipelines for few-shot learning: External data and fine-tuning make a difference. In IEEE/CVF conference on computer vision and pattern recognition, CVPR 2022, New Orleans, LA, USA, June 18-24, 2022, Computer Vision and Pattern Recognition, pages 9058–9067. IEEE.

Huang, W., Ye, M., Shi, Z., Li, H., and Du, B. (2023). Rethinking federated learning with domain shift: A prototype view. In 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 16312–16322, Los Alamitos, CA, USA. IEEE Computer Society.

Kairouz, P., McMahan, H. B., Avent, B., Bellet, A., Bennis, M., Nitin Bhagoji, A., Bonawitz, K., Charles, Z., Cormode, G., Cummings, R., D'Oliveira, R. G. L., Eichner, H., El Rouayheb, S., Evans, D., Gardner, J., Garrett, Z., Gascón, A., Ghazi, B., Gibbons, P. B., Gruteser, M., Harchaoui, Z., He, C., He, L., Huo, Z., Hutchinson, B., Hsu, J., Jaggi, M., Javidi, T., Joshi, G., Khodak, M., Konečný, J., Korolova, A., Koushanfar, F., Koyejo, S., Lepoint, T., Liu, Y., Mittal, P., Mohri, M., Nock, R., Özgür, A., Pagh, R., Qi, H., Ramage, D., Raskar, R., Raykova, M., Song, D., Song, W., Stich, S. U., Sun, Z., Suresh, A. T., Tramèr, F., Vepakomma, P., Wang, J., Xiong, L., Xu, Z., Yang, Q., Yu, F. X., Yu, H., and Zhao, S. (2021). Advances and open problems in federated learning. Found. Trends Mach. Learn., 14(1–2):1–210.

Kim, Y., Cho, D., Han, K., Panda, P., and Hong, S. (2021). Domain adaptation without source data. IEEE Transactions on Artificial Intelligence, 2(6):508–518.

Kim, Y., Oh, J., Kim, S., and Yun, S.-Y. (2022). How to fine-tune models with few samples: Update, data augmentation, and test-time augmentation.

Koh, P. W., Sagawa, S., Marklund, H., Xie, S. M., Zhang, M., Balsubramani, A., Hu, W., Yasunaga, M., Phillips, R. L., Gao, I., Lee, T., David, E., Stavness, I., Guo, W., Earnshaw, B. A., Haque, I. S., Beery, S., Leskovec, J., Kundaje, A., Pierson, E., Levine, S., Finn, C., and Liang, P. (2021). WILDS: A benchmark of in-the-Wild distribution shifts. In Meila, M. and Zhang, T., editors, International conference on machine learning (ICML), volume 139 of International Conference on Machine Learning, pages 5637–5664. PMLR.

Kurmi, V. K., Subramanian, V. K., and Namboodiri, V. P. (2021). Domain impression: A source data free domain adaptation method. CoRR, abs/2102.09003:615–625.

Laier, N. and Laier, J. (2023). WeSort.AI homepage. https://www.wesort.ai/. Accessed: 2023-10-24.

Lange, J.-P. (2021). Managing plastic waste-sorting, recycling, disposal, and product redesign. ACS Sustainable Chemistry & Engineering, 9(47):15722–15738.

Li, W., Liu, X., and Bilen, H. (2022). Cross-domain few-shot learning with task-specific adapters. In 2022 IEEE/CVF conference on computer vision and pattern recognition (CVPR), Computer Vision and Pattern Recognition, pages 7151–7160, Los Alamitos, CA, USA. IEEE Computer Society.

Liang, J., Hu, D., and Feng, J. (2020). Do we really need to access the source data? Source hypothesis transfer for unsupervised domain adaptation. In Proceedings of the 37th international conference on machine learning, ICML'20, pages 6028–6039. JMLR.org.

McMahan, B., Moore, E., Ramage, D., Hampson, S., and Arcas, B. A. y. (2017). Communication-efficient learning of deep networks from decentralized data. In Singh, A. and Zhu, J., editors, Proceedings of the 20th international conference on artificial intelligence and statistics, volume 54 of Proceedings of machine learning research, pages 1273–1282.

Müller, R., Kornblith, S., and Hinton, G. (2019). When does label smoothing help? In Wallach, H. M., Larochelle, H., Beygelzimer, A., d'Alché-Buc, F., Fox, E. A., and Garnett, R., editors, Neural Information Processing Systems, pages 4696–4705. Curran Associates Inc., Red Hook, NY, USA.

Nado, Z., Padhy, S., Sculley, D., D'Amour, A., Lakshminarayanan, B., and Snoek, J. (2021). Evaluating prediction-time batch normalization for robustness under covariate shift.

Peng, X., Bai, Q., Xia, X., Huang, Z., Saenko, K., and Wang, B. (2019). Moment matching for multi-source domain adaptation. In Proceedings of the IEEE international conference on computer vision, IEEE International Conference on Computer Vision, pages 1406–1415. IEEE.

Perez, L. and Wang, J. (2017). The effectiveness of data augmentation in image classification using deep learning.

Raab, C., Röder, M., and Schleif, F.-M. (2022). Domain adversarial tangent subspace alignment for explainable domain adaptation. Neurocomputing, 506:418–429.

Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., Berg, A. C., and Fei-Fei, L. (2015). ImageNet large scale visual recognition challenge. International Journal of Computer Vision (IJCV), 115(3):211–252.

Saenko, K., Kulis, B., Fritz, M., and Darrell, T. (2010). Adapting visual category models to new domains. In Daniilidis, K., Maragos, P., and Paragios, N., editors, Computer vision – ECCV 2010, volume 6314 of European Conference on Computer Vision, pages 213–226, Berlin, Heidelberg. Springer Berlin Heidelberg.

Siqueira, F. and Davis, J. G. (2021). Service computing for industry 4.0: State of the art, challenges, and research opportunities. ACM Computing Surveys, 54(9):1–38.

Snell, J., Swersky, K., and Zemel, R. (2017). Prototypical networks for few-shot learning. In Guyon, I., Luxburg, U. v., Bengio, S., Wallach, H. M., Fergus, R., Vishwanathan, S. V. N., and Garnett, R., editors, Advances in neural information processing systems, NIPS, pages 4077–4087.

Song, Y., Wang, T., Cai, P., Mondal, S. K., and Sahoo, J. P. (2023). A comprehensive survey of few-shot learning: Evolution, applications, challenges, and opportunities. ACM Computing Surveys, abs/2205.06743.

Tan, Y., Long, G., Liu, L., Zhou, T., Lu, Q., Jiang, J., and Zhang, C. (2022). FedProto: Federated prototype learning across heterogeneous clients. In AAAI conference on artificial intelligence, volume 36 of Proceedings of the AAAI Conference on Artificial Intelligence, pages 8432–8440. Association for the Advancement of Artificial Intelligence (AAAI).

Triantafillou, E., Larochelle, H., Zemel, R., and Dumoulin, V. (2021). Learning a universal template for few-shot dataset generalization.

van der Maaten, L. and Hinton, G. (2008). Visualizing high-dimensional data using t-SNE. Journal of Machine Learning Research, 9(Nov):2579–2605.

Venkateswara, H., Eusebio, J., Chakraborty, S., and Panchanathan, S. (2017). Deep hashing network for unsupervised domain adaptation. In Proceedings of the IEEE conference on computer vision and pattern recognition, Computer Vision and Pattern Recognition, pages 5018–5027. IEEE.

Wang, Y., Yao, Q., Kwok, J., and Ni, L. (2020). Generalizing from a few examples: A survey on few-shot learning. ACM Computing Surveys, 53:1–34.

Yang, K., Shi, Y., Zhou, Y., Yang, Z., Fu, L., and Chen, W. (2020). Federated machine learning for intelligent IoT via reconfigurable intelligent surface. IEEE Network, 34:16–22.

Yang, Q., Liu, Y., Chen, T., and Tong, Y. (2019). Federated machine learning: Concept and applications. ACM Transactions on Intelligent Systems and Technology, 10(2):1–19.

Yang, T., Andrew, G., Eichner, H., Sun, H., Li, W., Kong, N., Ramage, D., and Beaufays, F. (2018). Applied federated learning: Improving Google keyboard query suggestions.

Zhang, W., Shen, L., Zhang, W., and Foo, C.-S. (2022). Few-shot adaptation of pre-trained networks for domain shift. In Raedt, L. D., editor, Proceedings of the thirty-first international joint conference on artificial intelligence, IJCAI-22, volume abs/2205.15234 of International Joint Conference on Artificial Intelligence, pages 1665–1671. International Joint Conferences on Artificial Intelligence Organization.

Zhao, C., Sun, X., Yang, S., Ren, X., Zhao, P., and McCann, J. (2022). Exploration across small silos: Federated few-shot learning on network edge. IEEE Network, 36(1):159–165.
