Research on Privacy and Security Issues in Federated Learning
Xinyuan Bi
International School, Beijing University of Posts and Telecommunications, Beijing, 100876, China
https://orcid.org/0009-0005-0284-640X
Keywords: Federated Learning, Privacy Security, Encryption Technology, Security Audit.
Abstract: In the digital age, data privacy and security have become key issues. Federated learning, as an emerging
distributed machine learning technology, has been widely applied in fields such as finance and healthcare.
However, federated learning still faces many challenges in privacy protection. This paper studies the
privacy and security issues of federated learning in depth, analyzing its technical principles, privacy protection
mechanisms, and performance in practical applications. Through the analysis of application cases in fields
such as finance and healthcare, it explores the advantages and disadvantages of federated learning in privacy
protection. On this basis, this paper proposes improvement strategies such as optimizing encryption
technology, strengthening model security, establishing a security audit mechanism, and improving laws and
regulations, aiming to enhance the privacy and security level of federated learning and provide guarantees for
its stable application in various fields. In the future, the research on privacy and security of federated learning
will develop towards more intelligent, efficient, and integrated directions. It is necessary to further study new
privacy protection technologies, strengthen dynamic security monitoring and adaptive defense capabilities,
and formulate unified privacy and security standards and norms to promote the safe application and
development of federated learning technology worldwide.
1 INTRODUCTION
At a time when big data and artificial intelligence are
booming, data has become a core resource driving
innovation and development in various fields. The
traditional centralized data processing model faces
the risks of data leakage and misuse in the process of
data collection, storage, and use, seriously threatening
user privacy and security. At the same time, the
phenomenon of data silos between different
institutions and organizations hinders data circulation
and sharing, limiting the development of AI
technology.
As an innovative distributed machine learning
paradigm, federated learning takes "the data does not
move, the model moves" as its core idea: participants
jointly train high-precision models without sharing
original data, exchanging only model parameters or
intermediate results. This effectively solves the data
silo problem and reduces the risk of data leakage.
Research on the privacy and security of federated
learning is of great significance, mainly reflected in
the protection of user privacy, the
promotion of data circulation and sharing, the
enhancement of model security and reliability, and
the promotion of its wide application, so as to provide
support for the development of innovation in various
fields (Li, Zhou, 2024; Xiao et al., 2024).
Currently, the research on the privacy and security
of federated learning has made some progress. In
terms of privacy protection techniques, scholars
propose a variety of methods. Differential privacy
achieves privacy protection by adding noise to the
data; homomorphic encryption allows computation
over ciphertexts; secure multiparty computation
enables multiple parties to jointly accomplish
computational tasks while keeping each party's data
private. Li et al. proposed a new privacy-
preserving framework to ensure that data is always
encrypted during transmission and computation,
effectively preventing data leakage and providing
end-to-end privacy protection from data generation to
model update. In terms of computational efficiency,
experiments show that the framework only slightly
decreases the model accuracy (e.g., the accuracy of
the MNIST dataset decreases from 98.2% to 98.0%)
after the introduction of fully homomorphic
encryption, which maintains a high model
performance while protecting privacy. In terms of
communication overhead, when the data batch size is
64, the communication overhead is only about 10%
higher than that of unencrypted nodes, balancing
privacy protection and communication efficiency (Li,
Zhou, 2024). Taking the medical field as an example,
due to its data sensitivity, there is a higher
requirement for privacy and security. Researchers
propose privacy protection schemes such as
blockchain-based federated learning frameworks,
which utilize blockchain's immutability and
traceability to enhance data security and
trustworthiness. However, in practical applications,
balancing privacy protection and model performance
to ensure secure and efficient data collaboration
among different healthcare organizations is still a
problem to be solved (Wang, 2024). On the industrial
side, driven by 6G technology, a new paradigm of
industrial federated learning based on a layered
cross-domain architecture has been proposed. The
algorithm of Chen Zhu et al., built on a
device-edge-cloud three-layer federated learning
architecture, significantly reduces the latency and
energy consumption of federated training and speeds
up model convergence while preserving the cloud-side
model's testing accuracy (Liu et al., 2024; Chen et al.,
2024). However, these techniques face challenges in
practice: differential privacy degrades model
accuracy, homomorphic encryption carries a high
computational overhead, and secure multi-party
computation involves complex communication.
This paper systematically studies the privacy and
security problems of federated learning, analyzes its
theoretical foundations, architecture, and privacy
protection technology, and discusses privacy and
security guarantee mechanisms and their effects in
combination with application cases in finance,
healthcare, and other fields. It analyzes the existing
privacy and security challenges in depth, puts forward
targeted improvement strategies, and looks ahead to
future development trends, aiming to provide
theoretical support and practical guidance for the
privacy and security protection of federated learning
and to support its healthy, sustainable development.
2 THEORETICAL FOUNDATIONS OF FEDERATED LEARNING
2.1 Federated Learning Architecture
There are three main architectures for federated
learning: horizontal federated learning, vertical
federated learning, and federated transfer learning.
Horizontal federated learning suits scenarios where
participants' data share similar features but cover
different samples; for example, banks in different
regions can jointly train credit assessment models this
way. Vertical federated learning suits situations where
the samples overlap but the features differ, such as a
bank and an e-commerce platform collaborating on
shared customers. Federated transfer learning
addresses model training when data distributions
differ substantially: when the source and target
domain data are not independent and identically
distributed, source domain data can still be used to
improve the performance of the target domain model.
The sketch below illustrates the two data-partitioning
patterns.
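To make the two partitioning patterns concrete, the following minimal Python sketch (illustrative only; the participant names and array shapes are hypothetical, not from any cited system) shows how the same tabular dataset would be split horizontally by samples and vertically by features.

```python
# Illustrative sketch: horizontal vs. vertical partitioning of one dataset.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 6))  # 1000 samples, 6 features

# Horizontal FL: participants share the feature space but hold
# disjoint samples (e.g., banks in different regions).
bank_a_rows, bank_b_rows = X[:500, :], X[500:, :]

# Vertical FL: participants share (overlapping) samples but hold
# different features (e.g., a bank and an e-commerce platform).
bank_cols, ecommerce_cols = X[:, :3], X[:, 3:]

print(bank_a_rows.shape, bank_b_rows.shape)   # (500, 6) (500, 6)
print(bank_cols.shape, ecommerce_cols.shape)  # (1000, 3) (1000, 3)
```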
2.2 Principle of Operation
The basic workflow of federated learning is that each
participant trains the model locally using its own data
and calculates the gradient or parameter update values
of the model. Then, these update values are uploaded
to the central server or other coordinating nodes by
encryption and other secure methods. The
coordination node aggregates the received update
values, such as using a weighted average method, and
determines the weights based on factors such as the
amount of data of each participant to obtain the global
model update. Finally, the updated global model is
distributed back to each participant, which continues
training on its local data; this cycle repeats until the
model converges, as the minimal sketch below illustrates.
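The following sketch illustrates this workflow in miniature, using plain linear-regression clients and a data-size-weighted average in the style of FedAvg. It is a simplified illustration of the loop described above, not any cited system's implementation; encryption of the uploaded updates (Section 2.3) is omitted here.

```python
# Minimal federated-averaging loop: local update -> weighted aggregation
# -> redistribution. All names and hyperparameters are illustrative.
import numpy as np

def local_update(global_weights, X, y, lr=0.1):
    """One local step of linear-regression gradient descent on a client's data."""
    preds = X @ global_weights
    grad = X.T @ (preds - y) / len(y)   # mean-squared-error gradient
    return global_weights - lr * grad

def aggregate(client_weights, client_sizes):
    """Weighted average of client models; weights follow local data volume."""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

rng = np.random.default_rng(1)
true_w = np.array([2.0, -1.0])
clients = []
for n in (400, 100):                     # unequal data volumes
    X = rng.normal(size=(n, 2))
    clients.append((X, X @ true_w + 0.01 * rng.normal(size=n)))

w_global = np.zeros(2)
for _ in range(200):                     # iterate until convergence
    updates = [local_update(w_global, X, y) for X, y in clients]
    w_global = aggregate(updates, [len(y) for _, y in clients])
print(w_global)                          # approaches [2.0, -1.0]
```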
2.3 Privacy Protection Technology
Homomorphic encryption is a commonly used
encryption technique in federated learning: it allows
specific computations to be performed on ciphertext
such that decrypting the result matches the same
computation performed on plaintext. When model
parameter updates are uploaded, participants encrypt
the gradient values with homomorphic encryption,
and the server aggregates them in the ciphertext state,
which guarantees privacy and security during data
transmission and computation. However,
homomorphic encryption has a large computational
overhead, which affects the efficiency of federated
learning (Li, Zhou, 2024).
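As an illustration of aggregation over ciphertext, the sketch below uses the open-source Python `phe` library, which implements Paillier encryption. Paillier is additively (not fully) homomorphic, so it is a lighter stand-in for the fully homomorphic scheme discussed in the cited work, but it demonstrates the key property: the server sums encrypted gradients without ever decrypting them.

```python
# Additively homomorphic aggregation with the `phe` (Paillier) library.
from phe import paillier

public_key, private_key = paillier.generate_paillier_keypair(n_length=2048)

# Each client encrypts its gradient value before upload.
client_gradients = [0.12, -0.05, 0.31]
encrypted = [public_key.encrypt(g) for g in client_gradients]

# The server adds ciphertexts without ever seeing plaintext values.
encrypted_sum = encrypted[0] + encrypted[1] + encrypted[2]

# Only the key holder can decrypt the aggregate.
print(private_key.decrypt(encrypted_sum))  # ~0.38
```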
Differential privacy achieves privacy protection
by adding noise to the data. In federated learning,
participants add noise drawn from a specific
distribution to the gradients or parameters before
uploading model update values. The added noise
makes it difficult for an attacker to infer the original
data from the output, protecting data privacy.
However, noise introduction affects model accuracy,
requiring a trade-off between privacy protection and
model performance. The FLFilter scheme proposed
by Xiao Di et al. uses local differential privacy to
protect client privacy and to differentiate normal
from malicious client behavior, and designs
clustering and filtering methods targeting the
characteristics of backdoor attacks. Its proposed
cosine gradient clustering index breaks the barrier
between model perturbation and backdoor model
identification. Theoretical analysis and experimental
simulation confirm that FLFilter achieves the
expected goals in terms of accuracy, robustness, and
privacy (Xiao et al., 2024).
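A minimal sketch of the noise-adding step described above follows: each client clips its gradient to bound sensitivity and then adds Gaussian noise before upload. The clipping norm and noise scale are illustrative placeholders; a real deployment would calibrate them to a chosen (epsilon, delta) privacy budget.

```python
# Clip-then-noise sanitization of a local gradient before upload.
import numpy as np

def dp_sanitize(grad, clip_norm=1.0, sigma=0.5, rng=np.random.default_rng()):
    norm = np.linalg.norm(grad)
    clipped = grad * min(1.0, clip_norm / max(norm, 1e-12))  # bound sensitivity
    return clipped + rng.normal(scale=sigma * clip_norm, size=grad.shape)

grad = np.array([3.0, -4.0])          # raw local gradient (norm 5)
print(dp_sanitize(grad))              # clipped to norm 1, then noised
```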
Secure multi-party computation allows multiple
participants to jointly compute an objective function
without revealing their respective data. In federated
learning, each participant computes the gradient, loss
function, etc., of the model through the secure multi-
party computation protocol. For example, secure data
interaction can be realized using an oblivious transfer
protocol, which ensures that each participant accesses
only the information required for the computation and
never the raw data of other parties. However, the
communication complexity of secure multi-party
computation is high, demanding substantial network
bandwidth and computational resources.
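The sketch below illustrates one simple secure-computation building block, additive secret sharing, rather than the oblivious transfer protocol itself: each client splits its private value into random shares so that no single share reveals anything, yet the shares still sum to the true total.

```python
# Additive secret sharing over a public prime modulus P.
import secrets

P = 2**61 - 1  # public prime; all arithmetic is mod P

def share(value, n_parties):
    """Split `value` into n random shares that sum to it mod P."""
    shares = [secrets.randbelow(P) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % P)
    return shares

# Three clients share their private inputs among three compute parties.
inputs = [42, 17, 99]
all_shares = [share(v, 3) for v in inputs]

# Each party locally sums the shares it received ...
partial_sums = [sum(col) % P for col in zip(*all_shares)]
# ... and only combining all partial sums reveals the total.
print(sum(partial_sums) % P)  # 158, with no raw input disclosed
```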
3 CASE STUDIES ON THE APPLICATION OF FEDERATED LEARNING IN DIFFERENT FIELDS
3.1 Financial Sector
3.1.1 Application Scenarios
In the field of financial risk control, different banks
can jointly train credit assessment models through
federated learning. Take Bank A and Bank B as an
example: they cannot share data directly because of
data privacy and competition concerns, but each can
train its credit assessment model locally on its own
data and upload encrypted gradient or parameter
updates to the federated learning platform. The
platform aggregates the updates and sends them back
to the banks for further training. For example, after
identifying where federated learning best fits
financial business, the Tencent security team
leveraged the technology to promote agile business
innovation on the industry side, screening and
federating more than 200 business indicators to
deliver intelligent credit card management for a
commercial bank (Zhang, 2024).
3.1.2 Privacy and Security Mechanisms
This process uses homomorphic encryption to
encrypt the uploaded model update values to ensure
secure data transmission. To prevent model inversion
attacks, differential privacy is introduced in the model
training process to add an appropriate amount of
noise to the gradient values. Meanwhile, a secure
multi-party computation protocol ensures that each
bank computes intermediate results, such as model
gradients, without disclosing their respective raw data.
For example, Qiuxian Li et al. proposed a new
privacy protection framework in which, through fully
homomorphic encryption, model parameters are
computed and updated entirely in the encrypted state.
This ensures that data remains encrypted during
transmission and computation, effectively preventing
data leakage and providing end-to-end privacy
protection from data generation to model update (Li,
Zhou, 2024).
3.1.3 Application Effects
With federated learning, banks can integrate data
from multiple sources to improve the accuracy of
their credit assessment models: models trained
federatively identify defaulting customers more
accurately than models trained on a single bank's
data. Meanwhile, with privacy and security
mechanisms effectively in place, collaborative data
utilization was achieved without any data leakage
incidents. For example, the asynchronous federated
learning technology of Jingdong Digital Technology
has been deployed in financial scenarios to build a
big data risk control model with partner institutions,
improving the model's generalization while allowing
locally stored data to participate in training the
overall model (Zhang, 2024).
3.2 Medical Field
3.2.1 Privacy Security Guarantee
Mechanism
To protect patient privacy, healthcare organizations
use a blockchain-based federated learning framework,
drawing on blockchain's immutability and traceability
to ensure data security and trustworthiness. During
transmission, data is protected with homomorphic
encryption. Against the source inference attacks that
medical data may face, a ring-signature-based defense
scheme protects patients' identity privacy, and
corresponding detection and defense mechanisms are
proposed to prevent poisoning attacks at both the
server and client ends. Wenshuo Wang discusses the
application of federated learning in smart healthcare
and proposes a series of solutions to its data privacy
problems: the three frameworks FRESH, ABPFL-SSH,
and PFHE effectively address the security and privacy
issues federated learning faces in smart healthcare and
provide technical support for the development of
smart healthcare (Wang, 2024).
3.2.2 Application Effect
FRESH framework: experiments verify the
effectiveness of its signature time consumption and
batch verification strategy under different public key
set capacities. For signature time, a single signature
takes 1.5 seconds when the public key set capacity is
100, and less than 3 seconds when the capacity is 200.
For batch verification, with a public key set capacity
of 100 and 50 clients, batch verification takes 0.5
seconds, saving 80% of the time compared to
one-by-one verification (Wang, 2024).
ABPFL-SSH framework: experiments verify the
relationship between server-side poisoning volume
and model accuracy, as well as the performance of the
client-side poisoning screening algorithm. When the
poisoning amount stays below the threshold of 0.5,
model testing accuracy exceeds 90%; once it exceeds
0.5, accuracy drops below 20%. For the client-side
poisoning screening algorithm, in a 100-client system
the testing accuracy was 92%, the false positive rate
11%, and the false negative rate 0% (Wang, 2024).
PFHE framework: computational cost, model
prediction accuracy, and communication cost were
evaluated experimentally. On computational cost,
client-side encoding and encryption took 0.03
seconds, and decoding and decryption took 0.02
seconds; the total time for server-side ciphertext
operations did not exceed 0.5 seconds. Model
prediction accuracy stays close to the original
model's, with an average gap of no more than 2%: on
the LBW dataset the original model reaches 95%
versus 93% for PFHE, and on the NHANES III
dataset 88% versus 86%. On communication cost,
each client upload is 3.125 MB, at most once per
minute, requiring a network speed of about 437 Kbps
(Wang, 2024).
4 EXISTING ISSUES IN FEDERATED LEARNING PRIVACY AND SECURITY
4.1 Data Leakage Risks
An attacker can exploit the outputs of a federated
learning model to try to reconstruct the original
training data. In an image recognition federated
learning scenario, an attacker who observes how
model outputs change as inputs are adjusted can
potentially reconstruct original training images,
leading to data leakage.
During federated learning, the gradient
information uploaded by participants may encode
features of the original data. An attacker who obtains
this gradient information can analyze its trends and
numerical features and may infer sensitive
information about the original data. The toy example
below makes this risk concrete.
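As a toy demonstration of this risk (a deliberately simplified case, not an attack from the cited literature): for a single training sample passing through a linear layer with mean-squared-error loss, the weight gradient is the outer product of the output error and the input, so anyone holding the gradient can recover the input exactly.

```python
# Gradient leakage in the simplest case: one sample, one linear layer.
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=4)              # a client's private input sample
W, b = rng.normal(size=(2, 4)), rng.normal(size=2)
y = rng.normal(size=2)

err = (W @ x + b) - y               # output error under MSE loss
grad_W = np.outer(err, x)           # what the client would upload
grad_b = err

x_reconstructed = grad_W[0] / grad_b[0]   # attacker's reconstruction
print(np.allclose(x, x_reconstructed))    # True: the input fully leaks
```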
4.2 Malicious Attack Threats
Malicious participants may intentionally inject
malicious data during model training to skew the
training direction and degrade model performance. In
federated learning of medical diagnostic models,
malicious parties uploading incorrectly labeled case
data can lead to biased disease diagnosis in the trained
model (Manzoor, 2024).
An attacker may steal the global model in
federated learning, or a participant's local model,
through means such as network attacks. After
obtaining the model, the attacker can further analyze
its structure and parameters, infer information related
to the original data, or use the stolen model for illegal
activities (Manzoor, 2024).
4.3 Limitations of Privacy Protection
Techniques
Although homomorphic encryption can effectively
protect data privacy, the computational overhead is
large, leading to a significant increase in federated
learning training time. In large-scale data and
complex model training scenarios, the computational
burden of homomorphic encryption may make it
difficult for federated learning to run in real time.
Federated learning breaks down data silos and
resolves some data privacy problems, but as the types
of participants multiply and attacks grow more
sophisticated, it faces mounting privacy leakage risks.
Although a number of privacy protection studies on
federated learning exist, many problems remain
unsolved owing to limited attack resistance, narrow
application scenarios, and huge communication
overheads; privacy protection in federated learning
remains a lasting challenge (Wang, Yin, Zhang, 2024).
Differential privacy protects privacy by adding
noise, but noise addition affects model accuracy. In
practice, it is difficult to determine the appropriate
noise intensity that balances privacy protection
requirements and model performance.
Secure multi-party computation requires a large
number of communication interactions among
participants, resulting in high communication
complexity. In unstable network environments or
under limited bandwidth, its efficiency can be
seriously degraded, even interrupting the federated
learning process.
4.4 Legal and Regulatory Issues
Currently, there are few privacy protection norms
for federated learning, and clear standards and
guidelines are lacking. In practical applications, the
responsibilities and obligations of each participant for
data privacy protection are not clearly defined, so
privacy protection measures are often not in place.
Liu Zegang reveals the legal defects of the existing
privacy protection path and puts forward
corresponding directions and suggestions for
improvement by analyzing the problems of federated
learning in terms of legal norms, implementation of
responsibility, protection of personality rights and
interests, and technical trade-offs. The loose and
"joint" nature of the federated learning process makes
attribution of responsibility difficult. In cross-
organizational federated learning, the server
controller may bear no greater responsibility for
privacy protection, and the legal status of each
participant is hard to determine, so it is difficult to
allocate and pursue responsibility from the
perspective of personal data law (Liu, 2025).
5 FEDERATED LEARNING PRIVACY AND SECURITY IMPROVEMENT STRATEGIES
5.1 Optimization of Encryption
Technology
Research and adopt new encryption algorithms, such
as lattice-based schemes, which offer higher security
and efficiency. Lattice-based encryption resists
quantum computing attacks and, relative to
traditional homomorphic encryption schemes,
reduces computational complexity, improving the
computational efficiency of federated learning while
safeguarding data privacy.
Combine encryption with other privacy protection
techniques, such as combining homomorphic
encryption with differential privacy. Before the data
is uploaded, the noise is added using differential
privacy, and then homomorphic encryption is
performed, which reduces the amount of encryption
computation and enhances the effect of privacy
protection. Chaudhury D. S. et al. designed SBTLF, a
federated learning framework combining blockchain,
local differential privacy, and incentivization, to
solve the problems of privacy preservation,
data-sharing incentives, and secure sharing of model
parameters in federated learning. Experimental
results show that the framework maintains high
model performance while protecting privacy and
promotes active client participation through its
incentive mechanism. In addition, SBTLF offers good
scalability and security for large-scale distributed
machine learning scenarios (Chaudhury, 2024).
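A minimal sketch of the "noise first, then encrypt" pipeline described above is shown below, pairing the Gaussian-noise step from Section 2.3 with the additively homomorphic `phe` (Paillier) library as a lighter stand-in for the schemes discussed in this section; all parameters are illustrative.

```python
# Differential-privacy noise followed by Paillier encryption per coordinate.
import numpy as np
from phe import paillier

public_key, private_key = paillier.generate_paillier_keypair()
rng = np.random.default_rng(3)

def sanitize_and_encrypt(grad, clip_norm=1.0, sigma=0.5):
    clipped = grad * min(1.0, clip_norm / max(np.linalg.norm(grad), 1e-12))
    noised = clipped + rng.normal(scale=sigma * clip_norm, size=grad.shape)
    return [public_key.encrypt(float(g)) for g in noised]  # encrypt per coord

# The server sums ciphertexts from two clients coordinate-wise, never
# seeing any individual (even noised) gradient in the clear.
enc_a = sanitize_and_encrypt(np.array([3.0, -4.0]))
enc_b = sanitize_and_encrypt(np.array([1.0, 2.0]))
aggregate = [a + b for a, b in zip(enc_a, enc_b)]
print([round(private_key.decrypt(c), 3) for c in aggregate])
```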
5.2 Model Security Reinforcement
To counter model inversion and poisoning attacks,
corresponding defense mechanisms should be
designed. Model watermarking technology embeds
specific identification information in the model so
that, when the model is stolen or maliciously
tampered with, anomalies can be found by detecting
the watermark. At the same time, a verification
mechanism for model updates should be established
to check the updates uploaded by participants and
ensure their authenticity and legitimacy; a simple
screening sketch follows.
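As a hedged illustration of such a verification step (a simple heuristic, not the watermarking or verification scheme of any cited work), the server below screens out uploaded updates whose norm lies far from the cohort median, a basic defense against crude poisoning; the threshold is hypothetical.

```python
# Server-side screening of client updates by norm deviation.
import numpy as np

def screen_updates(updates, tolerance=3.0):
    norms = np.array([np.linalg.norm(u) for u in updates])
    median = np.median(norms)
    keep = norms <= tolerance * median      # flag outsized updates
    return [u for u, ok in zip(updates, keep) if ok]

honest = [np.random.default_rng(i).normal(size=10) for i in range(9)]
poisoned = [100.0 * np.ones(10)]            # a crude poisoning attempt
print(len(screen_updates(honest + poisoned)))  # 9: the outlier is dropped
```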
Optimize the model structure of federated learning to
improve robustness. A decentralized architecture
reduces dependence on a single server and lowers the
privacy and security risks posed by server attacks. At
the same time, techniques such as model compression
reduce the amount of data the model transmits,
lowering the risk of data leakage. In a study of
machine learning models for Alzheimer's disease
detection, simulated membership inference attacks
showed that an FL model using SecAgg can
effectively protect client data privacy: the attacking
model could not determine which data samples had
been used to train client-specific models. This shows
that SecAgg has significant advantages in privacy
protection and can reduce the risk of information
leakage (Mitrovska et al., 2024).
5.3 Establishment of a Security Audit
Mechanism
Establish a strict data-use monitoring mechanism to
track, in real time, how data is used during the
federated learning process. Record operations such as
data access, transmission, and processing to ensure
that data use complies with privacy protection
regulations, and issue timely warnings and responses
once abnormal data operations are found. One
direction is dynamic adaptive defense technology:
defenses that monitor and adapt to changes in system
state in real time to cope with evolving threats (Chen
et al., 2024).
Audit the behavior of participants and verify the
authenticity and legitimacy of each participant's
identity. Record participants' operations through
technologies such as blockchain to ensure that they
follow the rules during the federated learning process
and to prevent attacks by malicious participants; a
minimal hash-chained log is sketched below.
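A minimal sketch of such a tamper-evident record follows: each audit entry's hash covers the previous entry's hash, the same chaining property that blockchain-based auditing relies on. Field names and actions are hypothetical.

```python
# Hash-chained audit log: any retroactive edit breaks the chain.
import hashlib, json, time

def append_record(log, participant, action):
    prev_hash = log[-1]["hash"] if log else "0" * 64
    body = {"participant": participant, "action": action,
            "time": time.time(), "prev": prev_hash}
    body["hash"] = hashlib.sha256(
        json.dumps(body, sort_keys=True).encode()).hexdigest()
    log.append(body)

log = []
append_record(log, "bank_a", "upload_encrypted_update")
append_record(log, "server", "aggregate_round_1")

# Verification: recompute each hash; tampering would be detected here.
for rec in log:
    check = {k: v for k, v in rec.items() if k != "hash"}
    assert rec["hash"] == hashlib.sha256(
        json.dumps(check, sort_keys=True).encode()).hexdigest()
print("audit chain intact,", len(log), "records")
```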
5.4 Legal and Regulatory
Improvements
Formulate privacy protection regulations specifically
for federated learning and clarify the data privacy
protection responsibilities and obligations of each
participant. Specify privacy protection standards for
all stages of data collection, storage, use, and
transmission to provide a legal basis for the privacy
and security of federated learning.
Establish a specialized regulatory agency to
supervise and manage the application of federated
learning. The regulatory body should review the
privacy and security programs of federated learning
projects and impose penalties for violations, ensuring
that federated learning operates in a legal and secure
environment.
6 CONCLUSION
Federated learning, as an emerging distributed
machine learning technology, has significant
advantages in solving the data silo problem and
protecting data privacy. Through case analyses in
finance, healthcare, and other fields, this paper
verifies that federated learning can realize
collaborative data use and improve model
performance while preserving privacy and security.
However, federated learning still faces privacy and
security challenges, such as data leakage risks,
malicious attack threats, limitations of privacy
protection technologies, and legal and regulatory
issues. To address these, this paper proposes
improvement strategies such as optimizing
encryption technology, reinforcing model security,
establishing a security audit mechanism, and
improving laws and regulations. Implementing these
strategies can effectively raise the privacy and
security level of federated learning and provide a
guarantee for its wide application in various fields.
With the continuous development of technology,
privacy and security research on federated learning
will move in more intelligent, efficient, and
integrated directions. In the future, further research
on new privacy protection techniques, such as
quantum-cryptography-based encryption, is needed to
cope with increasingly complex security threats. At
the same time, research on the dynamic security
monitoring and adaptive defense capabilities of
federated learning systems should be strengthened so
that security strategies can be adjusted promptly in
the face of ever-changing attack methods. In addition,
it is also necessary to strengthen international
cooperation and exchange to jointly develop unified
privacy and security standards and specifications for
federated learning, promoting the secure application
and development of federated learning technologies
on a global scale.
REFERENCES
Chen, C., Liu, J., Tan, H., et al., 2024. Trustworthy
federated learning: privacy, security, and beyond.
Knowledge and Information Systems, in press, pp. 1-36.
Chaudhury, D.S., Morreddigari, R.L., Varun, M., et al.,
2024. Blockchain based secure federated learning with
local differential privacy and incentivization. IEEE
Transactions on Privacy, pp. 131-144.
Liu, Z., 2025. The limitations and solutions of privacy
protection in federated learning in the era of artificial
intelligence. Zhongwai Faxiang, pp. 1-20.
Liu, M., Xia, Y., Zhao, H., et al., 2024. Federated learning
for 6G industrial Internet of Things: from demands,
visions to challenges and opportunities. Journal of
Electronics and Information Technology, 46(12), pp.
4335-4353.
Li, Q. & Zhou, Q., 2024. Research on federated learning
privacy protection technology based on fully
homomorphic encryption. Modern Information
Technology, 8(23), pp. 170-174.
Manzoor, U.H., Shabbir, A., Chen, A., et al., 2024. A
survey of security strategies in federated learning:
defending models, data, and privacy. Future Internet,
16(10), 374.
Mitrovska, A., Safari, P., Ritter, K., et al., 2024. Secure
federated learning for Alzheimer's disease detection.
Frontiers in Aging Neuroscience, 16, 1324032.
Ruan, M., Zhang, S., Xue, K., et al., 2024. Federated
learning algorithm design and optimization in edge
computing networks. Mobile Communication, 48(12),
pp. 122-128.
Wang, W., 2024. Research on key security technologies of
federated learning for smart healthcare. Master's thesis.
Yantai University.
Wang, B., Yin, X., Zhang, L., 2024. Research on privacy
protection based on horizontal federated learning.
Computer Technology and Development, 34(10), pp. 1-
7.
Xiao, D., Yu, Z., Li, M., et al., 2024. A secure federated
learning scheme based on differential privacy and
model clustering. Computer Engineering and Science,
46(09), pp. 1606-1615.
Zhang, Z., 2024. Research on data privacy protection based
on federated learning: An analysis based on examples
from WeBank, Ping An Technology, and others.
Frontiers of Foreign Social Sciences, (05), pp. 86-99.