
model. This ensures that no single model has complete access to the dataset, preserving privacy.
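As a minimal sketch of this partitioning step (NumPy assumed; function and variable names are illustrative, not taken from the original implementation), the dataset can be shuffled once and split into disjoint shards, one per teacher:

```python
import numpy as np

def partition_dataset(X, y, num_teachers, seed=0):
    """Split (X, y) into disjoint shards, one per teacher model."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(X))            # shuffle once
    shards = np.array_split(indices, num_teachers)
    # Each teacher sees only its own shard; no record appears in two shards,
    # so no single teacher has complete access to the dataset.
    return [(X[idx], y[idx]) for idx in shards]
```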
2.1.2 Teacher Model Training
Each teacher model is trained on its respective data subset using standard machine learning algorithms. To protect privacy, noise is added to the models’ predictions; the amount of noise is controlled by a privacy budget parameter that governs the trade-off between privacy and accuracy (Wagh et al., 2021).
2.1.3 Aggregation of Predictions
The teacher models’ noisy predictions are aggregated using a voting mechanism, which selects the most commonly predicted label for each data point. This process prevents individual data points from being directly inferred (Xu et al., 2021).
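A minimal sketch of this noisy-vote aggregation (NumPy assumed; parameter names are illustrative, and Laplace noise with scale 1/epsilon is one common calibration choice):

```python
import numpy as np

def noisy_vote_aggregate(teacher_votes, num_classes, epsilon, seed=None):
    """teacher_votes: one predicted label per teacher for a single query.
    Returns the label whose noised vote count is highest."""
    rng = np.random.default_rng(seed)
    counts = np.bincount(teacher_votes, minlength=num_classes).astype(float)
    # Laplace noise on each vote count masks the influence of any single
    # teacher, and hence of any single disjoint data shard.
    counts += rng.laplace(0.0, 1.0 / epsilon, size=num_classes)
    return int(np.argmax(counts))
```

Labels produced this way for a set of unlabeled queries become the training data for the student model in the next step.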
2.1.4 Student Model Training
The aggregated predictions are used to train a student model, which learns from the collective knowledge of the teacher models. As the aggregated predictions already include noise, the student model uses a smaller privacy budget (Boenisch et al., 2023).
2.1.5 Evaluation
The student model is tested on a separate dataset to
evaluate its accuracy and ability to generalize.
Figure 1: Overview of PATE (Papernot et al., 2018)
2.2 Differentially Private Stochastic Gradient Descent (DP-SGD)
DP-SGD ensures differential privacy during the training of machine learning models by adding noise to the gradients (Papernot et al., 2018). While DP-SGD is effective in preserving individual data privacy through noise addition, integrating advanced techniques such as homomorphic encryption can further enhance privacy by allowing computations to be performed on encrypted data, thus minimizing the exposure of sensitive gradients (Fang and Qian, 2021). This hybrid approach can address specific attack vectors, such as membership inference, that exploit plaintext gradient information.
2.2.1 Gradient Computation and Clipping
Gradients are computed using stochastic gradient descent (SGD) on randomly sampled data batches. To limit the influence of individual data points, gradients are clipped to a fixed norm.
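A minimal sketch of the clipping step (NumPy assumed; names are illustrative, and production libraries such as Opacus or TensorFlow Privacy perform this per example inside the training loop):

```python
import numpy as np

def clip_gradient(grad, clip_norm):
    """Rescale one example's gradient so its L2 norm is at most clip_norm."""
    norm = np.linalg.norm(grad)
    # Gradients below the threshold pass through unchanged;
    # larger ones are scaled down to the fixed norm bound.
    return grad * min(1.0, clip_norm / (norm + 1e-12))
```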
2.2.2 Adding Noise and Aggregation
Noise is added to the clipped gradients, adjusted based on the privacy budget. These noisy gradients are aggregated to compute the average gradient, which is used to update the model parameters.
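Continuing the sketch above (illustrative names; Gaussian noise scaled by a noise multiplier is the usual way the privacy budget enters this step), the clipped per-example gradients are summed, noised, and averaged:

```python
import numpy as np

def noisy_average_gradient(clipped_grads, clip_norm, noise_multiplier, rng):
    """clipped_grads: per-example gradients already clipped to clip_norm."""
    total = np.sum(clipped_grads, axis=0)
    # The noise standard deviation scales with clip_norm, so the contribution
    # of any single example is hidden within the noise.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=total.shape)
    return (total + noise) / len(clipped_grads)
```

The resulting noisy average then drives the standard parameter update described next.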
2.2.3 Model Updating and Evaluation
The model parameters are iteratively updated using
the average noisy gradients. The trained model is
evaluated on a test dataset to measure accuracy and
privacy guarantees (Boenisch et al., 2023).
2.3 AES-GCM Encryption for Data Security
AES-GCM is applied to enhance data security during preprocessing (Das et al., 2019; Gueron and Krasnov, 2014). Its performance and security have been extensively studied across different IoT-oriented microcontroller architectures, including 8-bit, 16-bit, and 32-bit cores, where it was found to balance cryptographic efficiency and resource constraints effectively (Sovyn et al., 2019). This algorithm’s ability to resist side-channel attacks, such as timing and power analysis, makes it suitable for resource-constrained IoT environments.
2.3.1 Data Pre-processing and Encryption
The MNIST dataset is normalized and split into training and testing sets. Selected data points are encrypted using AES-GCM, ensuring both confidentiality and integrity during training. AES-GCM’s practical strengths, such as balancing speed and security, have made it a suitable choice for ML applications (Arunkumar and Govardhanan, 2018).
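As an illustration (using the AESGCM interface from the Python cryptography package; the serialized record is a placeholder payload), a data point can be encrypted and its integrity verified as follows:

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=256)     # 256-bit AES key
nonce = os.urandom(12)                        # 96-bit nonce, unique per message

record = b"serialized MNIST example"          # placeholder payload
ciphertext = AESGCM(key).encrypt(nonce, record, None)  # auth tag is appended

# Decryption raises InvalidTag if the ciphertext was tampered with, which is
# what provides the integrity guarantee alongside confidentiality.
assert AESGCM(key).decrypt(nonce, ciphertext, None) == record
```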
2.4 Dataset: MNIST
The MNIST dataset is a standard benchmark in machine learning, featuring 70,000 grayscale images of handwritten digits ranging from 0 to 9. It includes 60,000 images for training and 10,000 for testing, each sized at 28×28 pixels. This dataset was employed to validate the proposed methodologies.
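For reference, this standard split can be loaded and normalized with, for example, the Keras loader (one common choice; the loader actually used in the experiments is not specified here):

```python
from tensorflow.keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.astype("float32") / 255.0   # scale pixel values to [0, 1]
x_test = x_test.astype("float32") / 255.0

assert x_train.shape == (60000, 28, 28) and x_test.shape == (10000, 28, 28)
```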