PPVFL-SplitNN: Privacy-Preserving Vertical Federated Learning with

Split Neural Networks for Distributed Patient Data

Bashair Alrashed

1 a

, Priyadarsi Nanda

1 b

, Hoang Dinh

1 c

, Amani Aldahiri

, Hadeel Alhosaini

and Nojood Alghamdi

Faculty of Engineering and Information Technology, University of Technology Sydney, Sydney, Australia

Faculty of Computing and Information Technology, University of Jeddah, Jeddah, Saudi Arabia

Keywords:

Privacy-Preserving, Vertical Federated Learning, Split Neural Networks, Patient Data.

Abstract:

Medical data privacy regulations pose signiﬁcant challenges for sharing raw data between healthcare institu-

tions. These challenges are particularly critical when the data is vertically partitioned. In such scenarios, each

healthcare provider holds unique but complementary patient information. This makes collaborative learning

challenging while protecting patient privacy. As a result, developing effective machine learning models that

require integrated data becomes unfeasible. This leads to fragmented analyses and less effective patient care.

To address this issue, we developed a vertical federated learning framework using split neural networks to

enable secure collaboration while preserving privacy. The framework comprises three main stages: generating

symmetric keys to establish secure communication, aligning overlapping patient records across institutions

using a privacy-preserving record linkage algorithm, and collaboratively training a global machine learning

model without revealing patient privacy. We evaluated the framework on three well-known medical datasets.

Our evaluation focused on two critical scenarios: varying degrees of overlap in patient records and differing

feature distributions. The proposed framework ensures patient privacy and compliance with strict regulations,

providing a scalable and practical solution for real-world healthcare networks. It effectively addresses key

challenges in privacy-preserving collaborative machine learning.

1 INTRODUCTION

Over the past decade, the rapid digitization of health

systems and the exponential growth of digital med-

ical data have transformed the healthcare landscape.

This evolution offers new opportunities to revolution-

ize medical research and improve patient care deliv-

ery. Machine learning (ML) algorithms provide re-

searchers with new ways to efﬁciently analyze and

manage medical data. These advances drive innova-

tions that improve outcomes and streamline health-

care processes. Such algorithms enable predictive,

personalized, and cost-effective data management.

They can analyze medical images and patient records

to predict diseases and help healthcare providers de-

velop effective treatment plans. Moreover, they help

prevent complications by enabling early disease de-

tection through advanced medical systems. For ex-

https://orcid.org/0000-0002-7393-5355

https://orcid.org/0000-0002-5748-155X

https://orcid.org/0000-0002-9528-0863

ample, (Riedel et al., 2023) employed ResNetFed, a

modiﬁed ResNet50 model, to detect COVID-19 pneu-

monia on chest radiographs. Similarly, (Mali et al.,

2023) used artiﬁcial neural models to predict heart

disease.

The integration of ML algorithms into medical

systems delivers signiﬁcant beneﬁts to healthcare

providers, patients, and society. Despite their poten-

tial, ML applications in medical research face signif-

icant challenges. One key challenge is the distribu-

tion of medical data. Privacy regulations, such as

HIPAA and GDPR, restrict data sharing across health-

care providers, preventing the creation of centralized

repositories (Antunes et al., 2022). Hence, the data

remain within organizational boundaries.

In fact, patient records are distributed across mul-

tiple healthcare providers or institutions rather than

centralized in a single repository (Allaart et al., 2022).

(Allaart et al., 2022) also believed that the distribu-

tion of medical data generally follows two patterns:

horizontal or vertical, as shown in Figure 1. The hor-

izontal distribution involves sharing similar features

Alrashed, B., Nanda, P., Dinh, H., Aldahiri, A., Alhosaini, H., Alghamdi and N.

PPVFL-SplitNN: Privacy-Preserving Vertical Federated Learning with Split Neural Networks for Distributed Patient Data.

DOI: 10.5220/0013445300003979

In Proceedings of the 22nd International Conference on Security and Cryptography (SECRYPT 2025), pages 13-24

ISBN: 978-989-758-760-3; ISSN: 2184-7711

(a) Horizontal.

(b) Vertical.

Figure 1: Data Distribution Types.

across different population groups. In contrast, the

vertical distribution involves sharing data about the

same individuals, but with different attributes. For

example, in the vertical distribution, a data provider

might store attributes such as patient ID, gender, and

heart attack status. Another might have details such

as gender, age, and ST slope. This distributed nature

of medical data highlights the need for a decentral-

ized ML framework that enables collaborative model

training without sharing raw data. This framework

ensures the privacy of sensitive medical information.

Therefore, (McMahan et al., 2017) introduced a

new decentralized approach to collaboratively train

the global model without compromising data privacy,

known as federated learning (FL). FL has two main

categories based on the data distribution: horizontal

federated learning (HFL) and vertical federated learn-

ing (VFL). In HFL, each training sample shares the

same feature space. As a result, each data provider

creates its own local model independently based on its

local training samples. These local models are then

used to iteratively train a global model. This kind

of learning improves the performance of the global

model and resolves the data-shortage problem when

the data size is limited. Since HFL requires all data

providers to access the same feature space, it cannot

be directly applied to vertically distributed data.

To address the limitations of HFL for vertically

partitioned data, a VFL was introduced. Unlike HFL,

VFL enables collaboration between institutions that

hold different but complementary feature sets. To fa-

cilitate deep learning in VFL, split neural networks

were introduced (Vepakomma et al., 2018). This ar-

chitecture partitions the neural network layers among

participants, ensuring privacy by exchanging only in-

termediate outputs and gradients instead of full neu-

ral network updates. Since data providers possess

distinct feature sets, this method allows collabora-

tive model training without exposing raw data. Re-

cent research has widely adopted this architecture as

a baseline for VFL frameworks. For example, (Sun

et al., 2023) optimized communication efﬁciency in

split learning, while (Anees et al., 2024) explored its

application in scenarios with limited overlap between

participants, addressing real-world data sharing chal-

lenges.

These studies often ignore practical implementa-

tion details. They also fail to evaluate split learning

performance on diverse datasets, feature distributions,

and overlap conditions, leaving key VFL challenges

unresolved. Data heterogeneity is one of the main

challenges in the VFL framework. It requires han-

dling diverse feature distributions, which can degrade

model performance. In addition, incomplete overlap

between healthcare providers complicates the record

linking and training process in the VFL framework.

Moreover, none of the existing studies provides sys-

tematic evaluations across diverse datasets and real-

world scenarios.

Therefore, to address these limitations, we pro-

pose a novel privacy-preserving VFL framework us-

ing split neural networks (PPVFL-SplitNN). It is

designed to enable secure and efﬁcient collabora-

tion among healthcare providers while preserving pa-

tient privacy. PPVFL-SplitNN incorporates three

key stages. First, symmetric key generation estab-

lishes a secure communication channel among par-

ticipants, preventing unauthorized data access during

the linking and training process. Next, record link-

age uses privacy-preserving algorithms to accurately

align overlapping patient records across institutions

while enabling error-tolerant comparisons. Finally,

split model training exchanges intermediate embed-

dings and gradients instead of raw data to collabora-

tively train a global model. These components ad-

dress data heterogeneity, limited participant overlap,

and strict privacy constraints, offering a practical so-

lution for vertically partitioned medical data.

The proposed framework has been evaluated on

three diverse medical datasets under varying over-

lap percentages and feature distributions. The results

show that the framework achieves predictive perfor-

mance comparable to centralized learning (CL) while

preserving privacy. This makes it a robust and secure

solution for collaborative model training. To the best

of our knowledge, this is the ﬁrst work to systemat-

ically evaluate split learning across a broad range of

distributed patient data scenarios. It highlights the po-

tential of split learning to enable effective collabora-

tive learning in real-world healthcare networks. The

main contributions of this paper are summarized as

follows:

• Development of a Privacy-Preserving VFL

Framework. The proposed framework trains

split neural networks that are distributed among a

server and a number of healthcare providers. This

framework is signiﬁcant as it enables collabora-

SECRYPT 2025 - 22nd International Conference on Security and Cryptography

tive model training on vertically partitioned med-

ical data while preserving patient privacy. It en-

sures compliance with data protection regulations

and addresses challenges such as incomplete over-

lap and data heterogeneity.

• Implementation of Key Stages. The framework

includes three core stages: symmetric key gen-

eration, record linkage, and training split learn-

ing model. These stages ensure secure commu-

nication, accurate alignment of overlapping pa-

tient records, and collaborative training without

sharing raw data. Together, they address critical

challenges such as privacy preservation, data het-

erogeneity, and limited overlap. This enables se-

cure and efﬁcient model training on vertically par-

titioned datasets.

• Comprehensive Evaluation and Identiﬁcation

of Challenges. We evaluated the framework’s

predictive performance on three diverse medical

datasets with varying overlap percentages and fea-

ture distributions. The results demonstrate its ro-

bustness in real-world conditions. The results

show that the framework achieves predictive ac-

curacy and F1 scores comparable to CL while

preserving privacy. However, the evaluation re-

veals key challenges in current VFL frameworks,

including communication overhead, suboptimal

performance under limited overlap, and sensitiv-

ity to heterogeneous feature distributions. These

ﬁndings highlight areas for future research, such

as improving record linkage algorithms, optimiz-

ing communication efﬁciency, and enhancing ro-

bustness against feature heterogeneity.

2 SYSTEM OVERVIEW

2.1 System Design

This study proposes a privacy-preserving VFL frame-

work designed to address the challenges of training

ML models on vertically partitioned medical data.

The system consists of two types of entities: a cen-

tral server and multiple clients (healthcare providers),

as illustrated in Figure 2a.

A. Server. In a medical context, the server acts as a

central authority, such as a hospital group or an-

alytics provider. It ensures data privacy during

collaborative training. Figure 2a highlights the

server’s primary roles including:

• Symmetric Key Generation. It is respon-

sible for generating and distributing crypto-

(a) Entities Types.

(b) Architecture.

Figure 2: Framework Overview.

graphic keys for secure communication during

the record linkage and training process.

• Record Linkage. The server identiﬁes over-

lapping patients across participating hospitals

using their identiﬁer attributes. Techniques

like the Bloom ﬁlter are used to align patient

records while safeguarding privacy.

• Training and Updating Global Model. It has

the capability to store all the labels and the

global model, as shown in Figure 2b. In ad-

dition, it has the computing power to train and

analyze the distributed model and make pre-

dictions by aggregating the embeddings from

healthcare providers. The server is also respon-

sible for updating the global model and calcu-

lating the gradients to update the local model.

B. Client. Each client corresponds to a healthcare

provider, such as hospitals and laboratories, that

holds complementary patient attributes. For ex-

ample, in Figure 2a, Client 1 is a Cardiology

Clinic holding data on patient’s heart and blood

vessels. Client m stores radiological images, like

chest X-rays and MRIs, to help diagnose diseases

and plan treatments. Each healthcare provider can

usually store a large number of training samples

that can be used to train the ML model locally, as

shown in Figure 2b. As feedback, each client re-

ceives from the server the indices of the overlap

records and the gradients to shufﬂe its local data

and update the local model, respectively. Note

that any client can act as a “VFL server” if it has

the labels. We commonly refer to it as an ac-

PPVFL-SplitNN: Privacy-Preserving Vertical Federated Learning with Split Neural Networks for Distributed Patient Data

Figure 3: Symmetric Key Generation.

tive party, while a passive party refers to the client

holding features only.

In this paper, we adopt a semi-honest model in

which all entities honestly follow the protocol but ex-

ploit any opportunity to extract private data from in-

termediate results generated during the execution of

the proposed system. Therefore, each client cannot

interact with each other directly.

2.2 Proposed Framework

The proposed framework has been divided into three

main stages: symmetric key generation, record link-

age, and training and updating a global model. Each

stage involves the roles of the server and the clients.

• Stage 1: Symmetric Key Generation. The

server agrees ﬁrst on two public parameters (g

and p) where p should be a large prime number

and g is a primitive root modulo p (Rodriguez-

Henriquez et al., 2007), as follows:

mod p, ∀k ∈ {1, 2, . . . , p −1}. (1)

Then, each participant chooses a secret key and

exchanges its public key with the server only.

Next, each participant computes the shared secret

key using the shared public key. The secret key

should be the same for both parties (server and

healthcare provider). Then, each participant con-

verts the shared secret key to a symmetric key us-

ing a secure cryptographic hash function such as

SHA-256.

Figure 3 shows the complete process of the sym-

metric key generation. After generating the sym-

metric key, all messages between the server and

the client will be encrypted and decrypted using

the generated key. Adding this stage to the pro-

posed system helps enhance its security level dur-

ing the record linkage and model training.

• Stage 2: Record Linkage. The ﬁrst preprocess-

ing step in VFL to start a collaboration training

process is to link distributed records that belong to

the same sample ID anonymously using a privacy-

preserving algorithm. This algorithm is called a

Figure 4: Record Linkage Process.

long-term cryptographic key (CLK) and was pro-

posed in (Hardy et al., 2017) and (Nock et al.,

2018). It uses Bloom ﬁlters to preserve the pri-

vacy of identiﬁer attributes while enabling error-

tolerant comparisons.

At this stage, each participant must securely re-

ceive a hashing secret key from the server, along

with essential identiﬁer attributes that uniquely

distinguish each patient. These attributes, in-

cluding patient ID, age, and gender, are critical

for ensuring accurate and reliable identiﬁcation

throughout the process. The server encrypts these

information using the generated symmetric key

and shares them with all participants. Each client

decrypts these information using the same sym-

metric key and starts clustering the training sam-

ple using the K-means algorithm to minimize the

mean distances between the user data points and

their closest cluster centers. Followed by creating

a set of CLKs for each entity using a BLAKE2

hash function. Then, each client encrypts and ex-

changes the created CLKs with the server only.

The server decrypts and computes the similarity

between the three sets of CLKs using a dice co-

efﬁcient. It then extracts the indices of all possi-

ble pairs above the given threshold. After aligning

the overlapping patient records using the privacy-

preserving algorithm, the server encrypts the in-

dices of the matched records and securely shares

them with all clients. This ensures that each client

can identify and use only the aligned records for

collaborative training without compromising pa-

tient privacy.

Figure 4 shows the complete process of the record

linkage between three parties, a server and two

clients, in order to ﬁnd matching records without

revealing patient privacy. It is important to note

that all participants must formalize a combination

of personal characteristics, such as age and gen-

der, in the same data formats and presentations

before starting the linking process. The data for-

malization process ensures that similar records are

matched accurately while maintaining the conﬁ-

dentiality of the data involved.

SECRYPT 2025 - 22nd International Conference on Security and Cryptography

Figure 5: Model Overview.

• Stage 3: Training Split Learning Model. Due

to the medical data heterogeneity problem in

VFL, split learning is used to enable collaborative

model training while preserving patient privacy.

This approach offers several advantages, includ-

ing enhanced privacy, efﬁcient communication,

and adaptability to heterogeneous data distribu-

tions. At this stage, each client trains the bottom

model of the global model using the overlapped

samples only. Then, each client shares only the

output of the trained model, known as embed-

dings, with the server instead of sharing the com-

plete model parameters or the patient raw data in

order to preserve the patient’s privacy, as shown in

Figure 5. The server concatenates all the received

embeddings and feeds them as input to the server-

side top model. It completes the training process,

makes predictions, and computes the gradients

required to update the global model. To main-

tain the privacy-preserving advantage, the server

splits the computed gradients for each client and

shares them individually. Each client then up-

dates its local model using its respective gradients.

This process ensures that raw data and client-side

model details remain private throughout the train-

ing process. Split learning reduces communica-

tion overhead by avoiding the need to share com-

plete model parameters. It also works well with

vertically partitioned datasets, where healthcare

providers hold different features.

Figure 6 illustrates the complete process of train-

ing and updating the global model that splits into

two sub-models: the bottom and the top mod-

els located at the clients and the server side, re-

spectively. This training process iterates until the

model converges or a maximum number of itera-

tions is met.

Figure 6: Training Split Learning Model.

3 PRIVACY-PRESERVING

ALGORITHM

This section presents the formalization process of the

privacy-preserving record linkage algorithm and the

practical implementation of the split learning algo-

rithm.

3.1 Record Linkage Algorithm

Several solutions have been proposed to link the med-

ical record, including traditional merging techniques,

record linkage toolkit (De Bruin, 2019), dedupe

(Gregg and Eder, 2022), and splink (Linacre et al.,

2022). However, these solutions do not guarantee

the preservation of patient privacy when perform-

ing the record linkage process. Therefore, to ad-

dress this issue, we select the CLK algorithm to

link matching records without compromising individ-

ual privacy. This method was introduced by (Hardy

et al., 2017; Nock et al., 2018) to link related records

anonymously. It encodes identiﬁer attributes using

BLAKE2, which is a family of hash functions (Au-

masson et al., 2014). This method also uses Bloom

ﬁlters to construct a set of CLKs as follows:

clk =

∑

j=1

), (2)

where k represents the number of different indepen-

dent hash functions to compute the indices for an en-

try, and l is the length of the bit array. Using the

BLAKE2 hash function to encode identiﬁer attributes,

the possibility of a collision attack is minimized (Au-

masson et al., 2014).

Next, the constructed CLKs are used by the server

to assess the similarity between the two clients (A and

B) as follows:



1 i f D

clk

∼ D

clk

, and

0 otherwise

, (3)

where the operator ∼ can be interpreted as “the most

likely match”.

PPVFL-SplitNN: Privacy-Preserving Vertical Federated Learning with Split Neural Networks for Distributed Patient Data

Figure 7: Compute the Similarity between Two sets of

CLKs.

More precisely, the server uses the Dice coefﬁ-

cient algorithm to compare between bit strings as fol-

lows:

A,B

a + b

, (4)

where h represents the number of bit positions that are

set to 1 in both bit strings, a denotes the number of bit

positions set to 1 in A, and b denotes the number of

bit positions set to 1 in B (Schnell et al., 2009).

Figure 7 shows how the identiﬁer attributes in

client 1 are hashed using the BLAKE2 hash function

along with the secret hash key to add an additional

level of security to the record linking algorithm. Af-

ter creating the CLKs, each participant sends its own

set of CLKs to the server. The server then measures

the similarity between the two sets of CLKs to extract

the indices of the overlapped samples x

In this paper, we consider that there are com-

mon identiﬁer attributes shared between all clients

to uniquely identify the same sample ID using the

method proposed in (Hardy et al., 2017; Nock et al.,

2018). This assumption is often considered in real-

world healthcare systems to match similar records

across institutions accurately (Sun et al., 2022).

Therefore, the training sample set of each client is

divided into overlapped samples X

∈ R

×d

and

non-overlapped samples X

∈ R

×d

, where N

and N

represent the number of training samples

in the two datasets, respectively N

= N

+ N

In addition, the server stores the ground truth labels

∈ 0, 1

×C

for the overlapped samples, where C

represents the possible number of classes.

Algorithm 1 describes a general procedure of

record linkage using CLK. The server ﬁrst generates

the secret hash key and calculates the encryption pa-

rameters, which are distributed to all clients, as shown

in Figure 4. The server also selects and speciﬁes iden-

tiﬁer attributes to uniquely identify each sample and

shares the encrypted attribute with the participants.

Then, using the K-means algorithm, each participant

clusters its own data to minimize the mean distances

between the user data points and their closest cluster

centers. Next, each client uses a BLAKE2 hash

Input: hashing secret key (HSKey) and identiﬁer attributes

(IDAttr)

Output: X

∈ R

×d

Server:;

∥HSKey∥, ∥IDAttr∥ ← encrypt(HSKey, IDAttr);

Send ∥HSKey∥ and ∥IDAttr∥ to all clients;

for each client m = 1, 2, . . . , M in parallel do

HSKey, IDAttr = decrypt(∥HSKey∥, ∥IDAttr∥);

clusterData f rame =

∑

i=0

min(||x

− µ

);

clks =

∑

j=1

;

Send clks to the server;

end

Server:;

a+b

← Equation (4) ;

Send the index of X

to all clients;

for each client m = 1, 2, . . . , M in parallel do

← reshiftDataFrame(X

);

end

Algorithm 1: Privacy Preserving Record Linkage Algo-

rithm.

function to implement Bloom ﬁlters and create the set

of CLKs with Equation (2) and sends it to the server.

The server computes the similarity between the three

sets of CLKs using the dice coefﬁcient method ac-

cording to Equation (4) and returns the results X

all participating clients to re-shift their local data us-

ing reshiftDataFrame function.

After the record Linkage process, each participant

has to delete the non-overlap samples X

∈ R

×d

and uses only the overlapped samples X

∈ R

×d

to train the local neural network. Finally, using this

algorithm, which combines personally identiﬁable at-

tributes, the proposed system is able to link individual

records and extract machining records while preserv-

ing patient privacy.

3.2 Split Learning Algorithm

Each client at this stage can start training the split

learning model using the overlapped samples X

only in the privacy-preserving setting. The server ini-

tializes the training parameters θ

and sends them to

all clients. Each client trains the local model h

with

parameters θ

={x

, b

}. The output of the local

model train is called embedding or feature embed-

ding, which represents the data patterns within each

client. h

is deﬁned as follows:

= x

= σ



i−1

+ b



, i ∈ {1, 2, . . . , I},

= u

(5)

where σ⊙ is a linear function and u

is the ith

layer of the neural network (Li et al., 2023). The

server receives and concatenates all client embedding

vectors in a weighted manner (i.e. w=[h

⊙;...; h

⊙]).

SECRYPT 2025 - 22nd International Conference on Security and Cryptography

Concatenated embedding vectors act as input to the

server top model θ

that is connected to the interac-

tive layer to predict ˆy

. The server then calculates the

loss function L ⊙ as follows:

f (Θ) = L (h

(θ

, w); y

with w =







(θ

)

(θ

)







, m ∈ {1, 2, . . . , M},

(6)

where Θ = θ

i=0

represents the global model that con-

tains M local models (θ

, ..., θ

) and the top model θ

(Li et al., 2023).

The server calculates the gradients of the global

model

αι

αΘ

(

to update the global model Θ

t+1

. It also

computes the gradients for each client

αι

αh

and sends

them back to all the clients. Then, each client per-

forms a backward propagation and updates its local

model as follows:

∇

ι =

∂ι

∂θ

∑

∂ι

∂h

i−1

∂h

i−1

∂θ

, (7)

Next, each client obtains the new h

t+1

and sends

it to the server. The server repeats the process until

the global model converges or the maximum number

of iterations is met.

Algorithm 2 describes the pressure for standard

VFL training based on split neural networks using

adaptive moment estimation (ADAM). The server

ﬁrst initializes the top model θ

and sends the ini-

tialization parameters to all clients. Each client then

trains and computes the local model output h

σ(θ

, x

) in a mini-batch β of samples X

and sends

to the server. With all {h

}

(m=1)

, the server

concatenates the embeddings in a weighted manner

(w = [h

⊙;...; h

⊙]) and computes the loss function

following Equation (6). It also updates its global

model Θ using the calculated gradients

αι

αΘ

. Next, the

server computes the gradients

αι

αh

for each client and

sends them back to all clients. Finally, each client

computes and updates its local model θ

with Equa-

tion (7). This procedure iterates until the global model

Θ converges or the maximum number of iterations is

met.

4 PERFORMANCE EVALUATION

The effectiveness of the proposed framework is eval-

uated on three well-known medical datasets. The ﬁrst

section describes the simulation setups, including the

setting of three datasets and training parameters. We

Input: Feature data {X

}

m=1

, learning rate η, batch size β,

number of rounds T

Output: Global model parameters Θ

Server: Initialize top model parameters Θ

(0)

and send Θ

(0)

to all

clients;

for each round t = 0, 1, . . . , T − 1 do

for each client m = 1, 2, . . . , M in parallel do

Client m computes h

= σ(θ

· X

);

Client m sends h

to the server;

end

Server: w = {h

}

m=1

;

(t)

= L( f (Θ

(t)

, w), y);

(t+1)

= Θ

(t)

− η∇

(t)

;

∇

(t)

for each client’s output;

Send ∇

(t)

to all clients;

for each client m = 1, 2, . . . , M in parallel do

Client m ∇

(t)

;

Client m θ

(t+1)

= θ

(t)

− η∇

(t)

;

end

Algorithm 2: Training Split Learning Model.

then discuss the obtained results in detail. All exper-

iments are performed on a single machine using an

Intel (R) Core (TM) i7-8565U CPU.

4.1 Experimental Setups

4.1.1 Datasets

General information is provided on the three datasets

that were used to train and test the split learning

model: Diabetes Prediction Dataset (Mustafa, 2023),

Breast Cancer (Wolberg, 1990) and Gliomas (Tasci,

Erdal et al., 2022). The description of each dataset is

as follows.

• Diabetes Prediction. This dataset is a public col-

lection of medical and demographic data from the

Kaggle website. It is used to predict the possi-

bility of developing diabetes in patients based on

their medical history and demographic records.

It initially contains 100,000 records, each with

eight features along with the patient’s diabetes sta-

tus that is categorized as “Yes” and “No” indi-

cating the presence or absence of diabetes. For

this dataset, the learning rate is set to 0.001, and

the batch size is 256. Furthermore, the dataset

is signiﬁcantly imbalanced, which can lead the

model to disproportionately favor the majority

class (e.g., non-diabetic cases) during training. To

address this issue and ensure fair representation, it

is crucial to reduce the volume of data in the ma-

jority class, thereby achieving a balanced distribu-

tion with the minority class (e.g., diabetic cases).

• Breast Cancer. This database was obtained from

the University of Wisconsin Hospitals in 1992. It

is used to classify the cell nuclei of breast masses

PPVFL-SplitNN: Privacy-Preserving Vertical Federated Learning with Split Neural Networks for Distributed Patient Data

as malignant or benign. Initially, it contains 699

records, each with nine multivariate attributes,

along with the target value that describes whether

breast cancer is benign or malignant. For this

dataset, the learning rate is set to 0.001, and the

batch size is 32.

• Glioma. This dataset represents the histological

medical records of patients with brain tumor (i.e.,

glioma) grading. It initially contains 839 records,

each with 20 mutated genes and three clinical fea-

tures along with the grade value that determines

whether a patient is a lower grade glioma or a mul-

tiforme glioblastoma. For this dataset, the learn-

ing rate is set to 0.001, and the batch size is 32.

To train the ML model efﬁciently, the quality of the

input data must be maintained because it signiﬁcantly

impacts the output results. Therefore, it is essen-

tial to preprocess the selected datasets before making

predictions by cleaning the data, checking for miss-

ing data and duplicate records. There are no missing

data for the diabetes prediction dataset. However, it

has 3854 duplicated records. Therefore, duplicated

records have been deleted. Furthermore, records that

do not provide valuable information, such as a person

with unclear gender information or records with “no

information” in the smoking history variable, were

not included. However, the breast cancer dataset has

some missing data and duplications. Due to the lim-

itation of the dataset size, the missing values were

ﬁlled in with the mean, and duplicate records were

kept the same as in the Glioma dataset. Finally, the

datasets are divided into two portions: training and

testing using the following ratio 8:2.

4.1.2 Training Details

The proposed framework is designed to handle verti-

cally partitioned data efﬁciently while ensuring data

privacy. In this setup, the training features are split

vertically between two healthcare providers and the

labels are stored exclusively on the server side. Each

participant is equipped with speciﬁc neural network

components tailored to the dataset being trained.

For the Diabetes Prediction and Breast Cancer

datasets, each healthcare provider trains a bottom

model consisting of two fully connected layers with

Linear-ReLU activations. The server holds the top

model, which comprises a single Linear-Sigmoid

layer. The units of these fully connected layers are

16, 8 and 1, respectively.

For the Glioma dataset, which has a higher feature

dimensionality, the architecture is expanded to ac-

commodate the complexity of the data. Each client’s

bottom model consists of two fully connected Linear-

ReLU layers, while the server’s top model includes

three Linear-ReLU layers followed by a sigmoid out-

put layer. The units of these fully connected layers

are 32, 16, 16, 8, and 1, respectively.

The models are initialized with random weights

using PyTorch’s default initialization method to en-

sure consistent starting conditions across all training

rounds. The ADAM optimizer is used for training,

with a learning rate of 0.001 to balance convergence

speed and stability. Binary cross-entropy is used as

the loss function to handle binary classiﬁcation tasks

effectively.

The training process involves 200 rounds of com-

munication between the clients and the server. During

each round, the clients process their local data and

compute intermediate embeddings, which are sent to

the server. The server concatenates these embeddings,

trains the top model, and computes gradients that are

propagated back to update global and local models.

This iterative process ensures the privacy of raw data

while enabling collaborative model training.

4.2 Performance Metrics

The performance of the proposed system is evaluated

using three key metrics: Accuracy, F1 Score, and the

Confusion Matrix.

1. Accuracy. Measures the proportion of correctly

predicted samples to the total predictions, reﬂect-

ing the overall performance of the model.

2. F1 Score. It captures the model’s ability to bal-

ance precision and recall, minimizing false posi-

tives (FP) and false negatives (FN). This is crucial

in healthcare applications due to the challenges

posed by imbalanced datasets.

3. Confusion Matrix. Visualizes the performance

of the classiﬁcation model, showing the distribu-

tion of true positives (TP), true negatives (TN),

FP, and FN.

4.3 Evaluating Split Learning

Algorithm

The performance of the proposed PPVFL-SplitNN

framework is evaluated against four scenarios:

• Baseline (CL). In this scenario, all data are inte-

grated into a single repository to train a CL model

and achieve optimal performance. However, this

approach violates privacy regulations by requiring

raw data sharing (Blue line in the plots).

• PPVFL-SplitNN (Stander). The proposed

framework incorporates privacy-preserving tech-

niques such as symmetric key generation, record

SECRYPT 2025 - 22nd International Conference on Security and Cryptography

Table 1: Evaluation Metrics in Comparison between the Centralize and the PPVFL-SplitNN framework.

Database Name

Training

Samples

Testing

Samples

Centralize PPVFL-SplitNN

Accuracy (%) F1 Score (%) Accuracy (%) F1 Score (%)

Diabetes Prediction 5908 1478 88.70 90.22 85.38 86.55

Breast Cancer Wisconsin 559 140 96.42 94.38 95 92.30

Gliomas 671 168 86.30 86.22 80.95 77.77

linkage, and split learning. It represents the ideal

scenario with fully overlapping records and con-

sistent feature distributions (Orange line in the

plots).

• PPVFL-SplitNN + Varying Overlap Percent-

age. Evaluates the framework in conditions with

limited overlap among participants, simulating

real-world healthcare challenges (Green line in

the plots).

• PPVFL-SplitNN + Differ Feature Distribution.

Tests the framework’s ability to handle data het-

erogeneity, where participants hold diverse fea-

ture distributions (Red line in the plots).

Table 1 presents the evaluation metrics of the model

trained with the PPVFL-SplitNN framework com-

pared to the CL, while Figures 8 to 10 visualize the

confusion matrices for each dataset, showcasing the

number of TP, TN, FP and FN. These results demon-

strate that our framework achieves comparable perfor-

mance to the CL framework while preserving patient

privacy.

However, a slight performance gap is observed

between the two approaches. This gap can be pri-

marily attributed to the communication overhead in

split learning, which introduces latency and slows

down convergence due to the exchange of interme-

diate embeddings and gradients between the server

and clients. In contrast, CL beneﬁts from seamless

data integration and optimization within a single en-

vironment. Additionally, split learning suffers from

the lack of end-to-end gradient optimization across all

model layers. Since only partial gradients are visible

during training, updates to the bottom and top models

may become suboptimal, especially when the client

data features are highly heterogeneous. Despite these

challenges, the confusion matrices in Figures 8 to 10

indicate that our framework achieves similar TP and

TN values as CL, demonstrating its effectiveness for

binary classiﬁcation tasks.

In addition, Table 2 shows that our work achieves

comparable results to those reported by (Guo et al.,

2020), (Tasci et al., 2022) and (Fadillah et al., 2023).

However, there are signiﬁcant differences in the

methodology and the system design. Speciﬁcally,

(Tasci et al., 2022; Fadillah et al., 2023) relied on the

CL approach, which requires the aggregation of raw

data from all healthcare providers in a single repos-

(a) CL. (b) PPVFL-SplitNN.

Figure 8: Show a Comparison between the CL, and the

Model Trained with PPVFL-SplitNN Framework when us-

ing the Breast Cancer.

(a) CL. (b) PPVFL-SplitNN.

Figure 9: Show a Comparison between the CL, and the

Model Trained with PPVFL-SplitNN Framework when us-

ing the Diabetes Prediction.

(a) CL. (b) PPVFL-SplitNN.

Figure 10: Show a Comparison between the CL, and the

Model Trained with PPVFL-SplitNN Framework when us-

ing the Gliomas Prediction.

itory. Although their methods achieved high predic-

tive performance, CL raises substantial privacy con-

cerns, particularly in healthcare, where data sensitiv-

ity is critical and regulations restrict data sharing. On

the other hand, (Guo et al., 2020) adopted an HFL

approach, achieving an F1 score of 0.88 and an ac-

curacy of 0.91, results close to those obtained by our

framework. However, HFL assumes horizontally par-

titioned data, where data providers share the same fea-

tures but for different patients. This assumption limits

the applicability of their method in vertically parti-

tioned settings, where different providers hold com-

plementary features for the same individuals.

Our work addresses this limitation by imple-

menting a PPVFL-SplitNN framework, which en-

ables collaborative training across vertically parti-

tioned datasets while preserving privacy. Importantly,

our method achieves predictive performance compa-

rable to (Guo et al., 2020) while operating under

stricter data constraints. By exchanging only inter-

PPVFL-SplitNN: Privacy-Preserving Vertical Federated Learning with Split Neural Networks for Distributed Patient Data

Table 2: Evaluation Metrics of the CL.

Database Name Reference Model Type Accuracy (%) F1 Score (%)

Diabetes Prediction

PPVFL-SplitNN NN 88.70 90.22

(Fadillah et al., 2023)

K-Nearest Neighbors 87 86.64

Random Forest 90.70 90.41

Logistic Regression 88.64 88.45

Breast Cancer Wisconsin

PPVFL-SplitNN NN 96.42 94.38

(Guo et al., 2020) NN 95 92

Gliomas

PPVFL-SplitNN NN 86.30 86.22

(Tasci et al., 2022) SVM + RF + AdaBoost 86.4 84.2

Table 3: Training Accuracy (%) of Split Learning Model.

Reference MNIST Titanic

PPVFL-SplitNN 81.16 68.16

Flower (Beutel et al., 2020) - 65

PyVertical (Romanini et al., 2021) 91.982 -

mediate embeddings and gradients, our framework

ensures compliance with privacy regulations and en-

hances suitability for real-world healthcare applica-

tions by eliminating the need to share raw data.

On the other hand, in VFL, we compare the train-

ing accuracy of our framework with existing frame-

works such as Flower (Beutel et al., 2020) and PyVer-

tical (Romanini et al., 2021) using two well-known

datasets: MNIST and Titanic. As shown in Table

3, our results are comparable to these frameworks,

with a slight improvement in training accuracy. In

comparison, Flower provides a general purpose FL

framework. However, it does not explicitly focus

on vertical partitioning, which is essential in medical

datasets where data are distributed between health-

care providers with complementary features. Simi-

larly, PyVertical focuses on vertical data. However, it

does not implement advanced techniques for privacy

record linkage or consider varying overlap percent-

ages, which are critical factors affecting performance

in real-world scenarios.

By addressing these limitations, our framework

ensures better alignment of shared records and im-

proved training efﬁciency, leading to a slight yet con-

sistent improvement in training accuracy. These re-

sults demonstrate the practical applicability of our

framework for vertically partitioned medical data,

where privacy and performance must be balanced si-

multaneously. The effectiveness of our framework

is further evaluated in two distinct scenarios to ana-

lyze its performance under realistic medical data chal-

lenges. These evaluations highlight the framework’s

robustness in handling varying overlap percentages

and feature distributions, reﬂecting the complexities

of real-world healthcare applications.

4.3.1 Impact of Overlap Percentage

In this scenario, we consider the effect of incomplete

overlap, where fewer than 100% of patient records

are shared across participating hospitals. This sit-

uation reﬂects real-world challenges, such as frag-

mented healthcare systems where not all patients

have records in every hospital, particularly in ru-

ral or under-resourced areas. As shown in Table 4,

lower overlap percentages (e.g., 60%) lead to a slight

degradation in model accuracy and F1 scores due

to the reduced number of shared samples available

for training. This limits the server’s ability to ag-

gregate meaningful embeddings across participants,

impacting global model performance. However, the

degradation is not dramatic, demonstrating the robust-

ness of our framework under incomplete data con-

ditions. These ﬁndings highlight the importance of

robust record linkage techniques to maximize shared

sample alignment and suggest opportunities to lever-

age non-overlapping samples for better data utiliza-

tion in future work.

4.3.2 Impact of Feature Distribution

In this section, we investigate the effect of the fea-

ture distribution on the model performance using the

three medical datasets. Randomized feature distri-

butions introduce redundancy, imbalance, and noise,

which degrade accuracy and F1 scores compared to

manually engineered distributions. As shown in Ta-

ble 5, this degradation underscores the critical role of

feature engineering in VFL. For instance, integrating

heterogeneous data sources, such as imaging labora-

tories (holding radiology data) and clinical databases

(storing demographic and test results), requires care-

ful feature selection to ensure meaningful contribu-

tions from all participants. Thus, performing fea-

ture engineering or feature selection within the VFL

framework becomes essential to maintain model per-

formance.

4.4 Performance Analysis

The testing accuracy and F1 scores for the evaluated

scenarios are presented in Figures 11 to 13. The re-

sults conﬁrm that the CL achieves higher and more

stable performance due to seamless data integration

and full gradient optimization. However, CL is im-

practical for healthcare applications because of pri-

vacy regulations and the sensitive nature of patient

SECRYPT 2025 - 22nd International Conference on Security and Cryptography

Table 4: Evaluation Metrics of Our Framework in Case if the Number of Overlap Records is Different.

Overlap (%) Database Name Training Samples Testing Samples Accuracy (%) F1 Score (%)

100

Diabetes Prediction 5908 1478 85.38 86.55

Breast Cancer Wisconsin 559 140 95 92.30

Gliomas 671 168 80.95 77.77

Diabetes Prediction 3544 887 84.44 85.91

Breast Cancer Wisconsin 335 84 94.04 91.22

Gliomas 402 101 76.23 72.72

Table 5: Evaluation Metrics of Our Framework based on the Feature Distribution.

Database

Name

Feature Distribution Accuracy

(%)

F1 Score

(%)Type Client 1 Client 2

Diabetes

Prediction

manual New ID, gender, age, hypertension, heart dis-

ease

New ID, gender, age, smoking history, bmi,

HbA1c level, blood glucose level

85.38 86.55

random New ID, gender, age, hypertension, blood glu-

cose level, HbA1c level, BMI, smoking history

(e.g., ever, current)

New ID, gender, age, heart disease, smoking

history (e.g., never, not current, former)

84.57 86.11

Breast Cancer

Wisconsin

manual New ID, Clump thickness, Uniformity of cell

size, Uniformity of cell shape, Marginal adhe-

sion

New ID, Single epithelial cell size, Bare nu-

clei, Bland chromatin, Normal nucleoli, Mi-

toses

95 92.30

random New ID, Bland chromatin, Uniformity of cell

shape, Uniformity of cell size, Normal nucle-

oli, Single epithelial cell size

New ID, Bare nuclei, Mitoses, Marginal ad-

hesion, Clump thickness

92.14 89.32

Gliomas

manual New ID, Gender, Age at diagnosis, IDH1,

TP53, ATRX, PTEN, EGFR, CIC, MUC16,

PIK3CA, NF1, PIK3R1, Race

New ID, Gender, Age at diagnosis,

FUBP1, RB1, NOTCH1, BCOR, CSMD3,

SMARCA4, GRIN2A, IDH2, FAT4,

PDGFRA

80.95 77.77

random New ID, Gender, Age at diagnosis, FUBP1,

NF1, ATRX, BCOR, PDGFRA, PTEN,

MUC16, TP53, GRIN2A, EGFR, RB1,

NOTCH1, Race (e.g., american indian or

alaska native, white)

New ID, Gender, Age at diagnosis,

PIK3CA, IDH2, CIC, PIK3R1, FAT4,

CSMD3, IDH1, SMARCA4, Race (e.g.,

black or african American, asian)

80.95 77.14

(a) Testing Accuracy. (b) F1-Score.

Figure 11: Show the Model Performance under the Evalu-

ated Scenarios when using Breast Cancer.

(a) Testing Accuracy. (b) F1-Score.

Figure 12: Show the Model Performance under the Evalu-

ated Scenarios when using the Diabetes Prediction.

(a) Testing Accuracy. (b) F1-Score.

Figure 13: Show the Model Performance under the Evalu-

ated Scenarios when using the Gliomas Prediction.

data.

In contrast, our proposed framework preserves

medical data privacy while achieving performance

comparable to CL. This demonstrates its practical-

ity for collaborative model training in healthcare net-

works. However, careful data utilization is critical

to avoid model degradation. Scenarios with reduced

overlap or randomized feature distributions highlight

the need for robust record linkage and feature engi-

neering techniques to maintain model performance.

5 CONCLUSIONS AND FUTURE

WORKS

In this paper, we proposed a privacy-preserving VFL

framework that uses split learning to address chal-

lenges in the training of ML models on vertically par-

titioned data. Our framework ensures privacy preser-

vation, data security, and collaboration among health-

care providers in real-world scenarios. Evaluations on

three medical datasets show that the proposed frame-

work achieves a performance comparable to CL while

preserving patient privacy. It demonstrates robust-

ness in handling incomplete overlap and diverse fea-

ture distributions, offering a practical solution for sen-

sitive healthcare networks. These ﬁndings highlight

the potential of our frameworks for advancing med-

ical research and patient care while maintaining pri-

vacy. Future work should focus on optimizing record

linkage and reducing communication overhead to im-

prove scalability and efﬁciency in large-scale settings.

PPVFL-SplitNN: Privacy-Preserving Vertical Federated Learning with Split Neural Networks for Distributed Patient Data

REFERENCES

Allaart, C. G., Keyser, B., Bal, H., and Van Halteren, A.

(2022). Vertical split learning-an exploration of pre-

dictive performance in medical and other use cases. In

2022 International Joint Conference on Neural Net-

works (IJCNN), pages 1–8. IEEE.

Anees, A., Field, M., and Holloway, L. (2024). A neural

network-based vertical federated learning framework

with server integration. Engineering Applications of

Artiﬁcial Intelligence, 138:109276.

Antunes, R. S., Andr

e da Costa, C., K

uderle, A., Yari,

I. A., and Eskoﬁer, B. (2022). Federated learning for

healthcare: Systematic review and architecture pro-

posal. ACM Transactions on Intelligent Systems and

Technology (TIST), 13(4):1–23.

Aumasson, J.-P., Meier, W., Phan, R. C.-W., Henzen, L.,

Aumasson, J.-P., Meier, W., Phan, R. C.-W., and Hen-

zen, L. (2014). Blake2. The Hash Function BLAKE,

pages 165–183.

Beutel, D. J., Topal, T., Mathur, A., Qiu, X., Fernandez-

Marques, J., Gao, Y., Sani, L., Kwing, H. L., Par-

collet, T., Gusm

ao, P. P. d., and Lane, N. D. (2020).

Flower: A friendly federated learning research frame-

work. arXiv preprint arXiv:2007.14390.

De Bruin, J. (2019). Python Record Linkage Toolkit: A

toolkit for record linkage and duplicate detection in

Python.

Fadillah, M. I., Aminuddin, A., Rahardi, M., Abdulloh,

F. F., Hartatik, H., and Asaddulloh, B. P. (2023). Dia-

betes diagnosis and prediction using data mining and

machine learning techniques. In 2023 International

Workshop on Artiﬁcial Intelligence and Image Pro-

cessing (IWAIIP), pages 110–115. IEEE.

Gregg, F. and Eder, D. (2022). Dedupe. https://github.com/

dedupeio/dedupe.

Guo, Y., Liu, F., Cai, Z., Chen, L., and Xiao, N. (2020).

Feel: A federated edge learning system for efﬁcient

and privacy-preserving mobile healthcare. In Pro-

ceedings of the 49th International Conference on Par-

allel Processing, pages 1–11.

Hardy, S., Henecka, W., Ivey-Law, H., Nock, R., Patrini, G.,

Smith, G., and Thorne, B. (2017). Private federated

learning on vertically partitioned data via entity reso-

lution and additively homomorphic encryption. ArXiv

Preprint ArXiv:1711.10677.

Li, A., Huang, J., Jia, J., Peng, H., Zhang, L., Tuan, L. A.,

Yu, H., and Li, X.-Y. (2023). Efﬁcient and privacy-

preserving feature importance-based vertical feder-

ated learning. IEEE Transactions on Mobile Comput-

ing.

Linacre, R., Lindsay, S., Manassis, T., Slade, Z., Hep-

worth, T., Kennedy, R., and Bond, A. (2022). Splink:

Free software for probabilistic record linkage at scale.

International Journal of Population Data Science,

7(3):1794.

Mali, B., Saha, S., Brahma, D., Pinninti, R., and Singh, P. K.

(2023). Towards building a global robust model for

heart disease detection. SN Computer Science, 4(5):1–

12.

McMahan, B., Moore, E., Ramage, D., Hampson, S., and

y Arcas, B. A. (2017). Communication-efﬁcient learn-

ing of deep networks from decentralized data. In Ar-

tiﬁcial intelligence and statistics, pages 1273–1282.

PMLR.

Mustafa, M. (2023). Diabetes prediction dataset.

https://www.kaggle.com/datasets/iammustafatz/

diabetes-prediction-dataset.

Nock, R., Hardy, S., Henecka, W., Ivey-Law, H., Patrini,

G., Smith, G., and Thorne, B. (2018). Entity resolu-

tion and federated learning get a federated resolution.

ArXiv Preprint ArXiv:1803.04035.

Riedel, P., von Schwerin, R., Schaudt, D., Hafner, A., and

ate, C. (2023). ResNetFed: Federated deep learning

architecture for privacy-preserving pneumonia detec-

tion from COVID-19 chest radiographs. Journal of

Healthcare Informatics Research, pages 1–22.

Rodriguez-Henriquez, F., Perez, A. D., Saqib, N. A., and

Koc, C. K. (2007). A brief introduction to modern

cryptography. Cryptographic Algorithms on Recon-

ﬁgurable Hardware, pages 7–33.

Romanini, D., Hall, A. J., Papadopoulos, P., Titcombe, T.,

Ismail, A., Cebere, T., Sandmann, R., Roehm, R., and

Hoeh, M. A. (2021). Pyvertical: A vertical federated

learning framework for multi-headed splitnn. arXiv

preprint arXiv:2104.00489.

Schnell, R., Bachteler, T., and Reiher, J. (2009). Privacy-

preserving record linkage using bloom ﬁlters. BMC

medical informatics and decision making, 9:1–11.

Sun, C., van Soest, J., Koster, A., Eussen, S. J., Schram,

M. T., Stehouwer, C. D., Dagnelie, P. C., and Dumon-

tier, M. (2022). Studying the association of diabetes

and healthcare cost on distributed data from the maas-

tricht study and statistics netherlands using a privacy-

preserving federated learning infrastructure. Journal

of Biomedical Informatics, 134:104194.

Sun, J., Xu, Z., Yang, D., Nath, V., Li, W., Zhao, C., Xu, D.,

Chen, Y., and Roth, H. R. (2023). Communication-

efﬁcient vertical federated learning with limited over-

lapping samples. In Proceedings of the IEEE/CVF

International Conference on Computer Vision, pages

5203–5212.

Tasci, E., Zhuge, Y., Kaur, H., Camphausen, K., and

Krauze, A. V. (2022). Hierarchical voting-based fea-

ture selection and ensemble learning model scheme

for glioma grading with clinical and molecular charac-

teristics. International Journal of Molecular Sciences,

23(22):14155.

Tasci, Erdal, Camphausen, Kevin, Krauze, Andra Valentina,

and Zhuge, Ying (2022). Glioma Grading Clinical and

Mutation Features. UCI Machine Learning Reposi-

tory. DOI: https://doi.org/10.24432/C5R62J.

Vepakomma, P., Gupta, O., Swedish, T., and Raskar, R.

(2018). Split learning for health: Distributed deep

learning without sharing raw patient data. arXiv

preprint arXiv:1812.00564.

Wolberg, W. (1990). Breast Cancer Wisconsin (Orig-

inal). UCI Machine Learning Repository. DOI:

https://doi.org/10.24432/C5HP4Z.

SECRYPT 2025 - 22nd International Conference on Security and Cryptography