Privacy-Preserving Machine Learning in IoT: A Study of Data Obfuscation Methods

Yonan Yonan¹, Mohammad O. Abdullah¹, Felix Nilsson², Mahdi Fazeli¹, Ahmad Patooghy³ and Slawomir Nowaczyk¹

¹School of Information Technology, Halmstad University, Sweden
²HMS Industrial Networks, Halmstad, Sweden
³Department of Computer Systems Technology, North Carolina A&T State University, NC, U.S.A.
Keywords: Machine Learning, Neural Networks, Data Obfuscation, Image Classification, IoT, Cloud Computing, Resource Constrained Devices.
Abstract: In today’s interconnected digital world, ensuring data privacy is critical, particularly for neural networks operating remotely in the age of the Internet of Things (IoT). This paper tackles the challenge of data privacy preservation in IoT environments by investigating Utility-Preserving Data Transformation (UPDT) methods, which transform data so as to reduce or eliminate sensitive information while retaining its utility for analytical tasks, thereby balancing privacy preservation and utility in data analytics. This study examines the strengths and limitations of these methods, focusing on ObfNet, a neural network-based obfuscation algorithm, as a representative case study to contextualize our analysis. By analyzing ObfNet, we highlight its vulnerabilities, and based on these findings we introduce LightNet and DenseNet, two novel obfuscation networks used to expose ObfNet’s limitations on larger and more complex data. We uncover challenges such as information leakage and explore the implications for maintaining privacy during remote neural network inference. Our findings underscore both the challenges and the possibilities of preserving privacy during remote neural network inference with UPDT algorithms, especially on resource-limited edge devices.
1 INTRODUCTION
In recent years, the rapid advancement of deep learn-
ing has significantly enhanced the complexity and ca-
pabilities of inference models, which now demand
substantial computational resources. Deploying these
models on Internet of Things (IoT) devices presents
a formidable challenge due to their limited computa-
tional capacity and power constraints. IoT devices,
such as smartphones, rely on battery power and need
to conserve energy to function efficiently over ex-
tended periods, often requiring them to remain dor-
mant and occasionally transmit data for processing.
Given these constraints, offloading complex compu-
tations to remote servers or cloud-based backends,
which can handle the heavy processing requirements
of deep learning models, has become essential (Ni-
eto et al., 2024). However, this approach introduces
new concerns, particularly in terms of data privacy.
Edge computing, which involves shifting computa-
tion and storage from centralized cloud servers to net-
work edge nodes, offers notable advantages, includ-
ing reduced end-to-end latency and minimized band-
width usage. This paradigm is particularly benefi-
cial when applying machine learning techniques to
the vast amounts of data generated by IoT devices.
However, even with these benefits, ensuring the pri-
vacy of data during remote processing remains a crit-
ical issue. IoT devices, due to their ubiquitous na-
ture and the sensitive nature of the data they handle,
are particularly vulnerable to privacy breaches during
data transmission and processing. As a result, pro-
tecting the privacy of the data in these devices while
maintaining the efficiency of machine learning pro-
cesses is a key challenge (Zheng et al., 2019). De-
veloping lightweight and efficient privacy-preserving
methods that can operate effectively within these con-
straints is crucial. Such methods must not only safe-
guard sensitive information throughout the machine
learning pipeline, from data collection to model in-
ference, but also minimize communication and com-
putational overhead to facilitate remote inference ef-
fectively. This is essential because high communica-
tion overhead not only causes delays and increased
latency but also drives up data transfer costs, es-
pecially in IoT systems reliant on cellular networks
where charges are based on data usage. Similarly,
excessive computational overhead can drain the lim-
ited battery life of IoT devices. Efficient manage-
ment of these overheads ensures real-time process-
ing, scalability, and cost-effectiveness in IoT deploy-
ments (Zheng et al., 2019). One effective solution
in this context is Utility-Preserving Data Transforma-
tion (UPDT) methods (Zheng et al., 2019; Xu et al., 2020; Feng and Narayanan, 2021; Malekzadeh et al., 2020). UPDT techniques have emerged as an es-
sential approach in privacy-preserving machine learn-
ing, particularly in the context of Internet of Things
(IoT) and edge computing. These techniques aim
to mask sensitive information within data while re-
taining its utility for tasks such as remote machine
learning inference. UPDT methods are designed to
be lightweight, suitable for resource-constrained IoT
devices with limited computational and energy re-
sources. UPDT methods must balance the dual ob-
jectives of preserving privacy and retaining data util-
ity for inference. However, their effectiveness is of-
ten compromised by vulnerabilities, including resid-
ual data patterns that can expose sensitive information
and scalability issues with complex datasets (Dhi-
nakaran et al., 2024). This work seeks to address these
challenges by evaluating the privacy-preserving capa-
bilities and computational efficiency of UPDT meth-
ods, focusing specifically on their limitations and vul-
nerabilities. Using ObfNet (Zheng et al., 2019; Xu et al., 2020) as a case study, we conduct a comprehen-
sive analysis to uncover its shortcomings and explore
the broader implications for UPDT methods. Our key contributions are: (1) a detailed evaluation of ObfNet as a case study, highlighting vulnerabilities in its ability to obfuscate sensitive data while maintaining computational efficiency and utility, with these findings then used as a basis for exploring and assessing other UPDT solutions; (2) the introduction of LightNet and DenseNet, two novel architectures designed to stress-test UPDT methods on complex datasets; and (3) generalized insights and recommendations for enhancing the robustness, scalability, and privacy guarantees of UPDT methods in resource-constrained environments.
2 BACKGROUND AND LITERATURE
Utility-Preserving Data Transformation methods are
designed to transform data in a way that obscures,
obfuscates, or eliminates sensitive information while
preserving the attributes necessary for machine learn-
ing inference. These methods are especially rele-
vant in privacy-preserving inference for IoT and edge
computing, where devices need lightweight solu-
tions that ensure privacy without compromising utility.
Figure 1: ObfNet training and inference phases with fixed InfNet and partitioned ObfNet.
An example of a UPDT method is anonymiza-
tion, where sensitive identifiers such as names, ad-
dresses, or phone numbers are removed or replaced
with pseudonyms. However, while anonymization
can prevent direct identification, it often fails to pro-
tect against re-identification attacks, especially when
combined with auxiliary datasets (Ano, 2017). Dif-
ferential privacy (DP), on the other hand, provides a
framework for quantifying the privacy guarantees of
algorithms, ensuring that the removal or addition of a
single data point does not significantly affect the out-
come. This technique is particularly useful in scenar-
ios where multiple queries are made to a database, as
it provides robust privacy protection by adding noise
to the query results. Although differential privacy
provides strong privacy guarantees, it reduces data
utility due to noise addition, especially in small or
complex datasets. It also introduces implementation
complexity (Abadi et al., 2016; Ha et al., 2019).
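For illustration, the noise-addition principle can be sketched with the standard Laplace mechanism for a count query; this is a minimal example of the general DP idea discussed here, not the DP-SGD training procedure of (Abadi et al., 2016), and the privacy budget value below is an arbitrary placeholder:

    import numpy as np

    def laplace_count(records, predicate, epsilon):
        # A counting query has sensitivity 1 (adding or removing one record
        # changes the count by at most 1), so Laplace noise with scale
        # 1/epsilon yields epsilon-differential privacy for this single query.
        true_count = sum(1 for r in records if predicate(r))
        return true_count + np.random.laplace(loc=0.0, scale=1.0 / epsilon)

    # Example: a private count of sensor readings above a threshold.
    readings = [21.3, 35.7, 40.1, 18.9, 33.2]
    print(laplace_count(readings, lambda r: r > 30.0, epsilon=0.5))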
In contrast, ObfNet (Zheng et al., 2019; Xu et al., 2020)
is a neural network-based UPDT algorithm designed
for IoT applications. ObfNet transforms data at the
edge device using the first, smaller half of a parti-
tioned neural network with many-to-one mapping ac-
tivation functions, eliminating sensitive attributes be-
fore transmitting them to the server for final infer-
ence, as described in Figure 1. According to the au-
thors, many-to-one activation functions make data re-
construction virtually impossible, but recent research
(Ding et al., 2022) suggests potential vulnerabilities in
its obfuscation, as residual patterns may expose sensi-
tive information. Even if only intermediate layers of a
neural network are shared, these layers may still con-
tain sensitive information. This is because the training
methods used in the ObfNet algorithm (Zheng et al.,
2019) do not explicitly differentiate between utility
and sensitive data.
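To make the partitioned setup concrete, the sketch below shows a toy edge-side obfuscator with a many-to-one (ReLU) activation composed with a frozen server-side classifier, in the spirit of Figure 1; the layer sizes are illustrative and are not the architectures from (Xu et al., 2020):

    from tensorflow.keras import layers, models

    # Edge-side obfuscator: its output replaces the raw image before transmission.
    # ReLU is a many-to-one mapping, so the transformation is not invertible in general.
    obfnet = models.Sequential([
        layers.Input(shape=(28, 28, 1)),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),           # bottleneck
        layers.Dense(28 * 28, activation="sigmoid"),    # obfuscated "image"
        layers.Reshape((28, 28, 1)),
    ])

    # Server-side inference network (InfNet): trained first on raw data,
    # then kept fixed while the obfuscator is trained.
    infnet = models.Sequential([
        layers.Input(shape=(28, 28, 1)),
        layers.Flatten(),
        layers.Dense(256, activation="relu"),
        layers.Dense(10, activation="softmax"),
    ])
    infnet.trainable = False

    # Joint model: only the obfuscator's weights are updated, using the usual
    # classification loss computed through the fixed InfNet.
    joint = models.Sequential([obfnet, infnet])
    joint.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])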
Another example of a UPDT method is the Re-
placement AutoEncoder (RAE) (Malekzadeh et al.,
2020), where an autoencoder is trained with raw time-
series data as input and the same data as output, but
with sensitive information selectively replaced in spe-
cific time windows by neutral data. RAE has been
combined with a gradient reversal layer (GRL) (Ganin
and Lempitsky, 2015) to protect demographic infor-
mation (Feng and Narayanan, 2021). While ObfNet
and RAE share similarities in their approaches to
privacy-preserving transformation, they differ in the
residual sensitive information they may leave behind.
The authors of (Malekzadeh et al., 2020) designed
training methods for RAE to selectively replace sen-
sitive portions of the data while also validating the
removal of sensitive information through privacy loss
metrics. These metrics, measured as the adversary’s
accuracy in inferring sensitive information, approach
random guessing, indicating effective obfuscation. In
contrast, ObfNet employs a many-to-one activation
mapping without conducting comprehensive security
evaluations to determine the extent of data removal.
Consequently, it remains unclear whether ObfNet ef-
fectively removes sensitive information or if the trans-
formed data may still be exploited for purposes be-
yond its intended utility. Privacy-preserving inference
techniques focus on protecting sensitive data during
the inference phase. A prominent approach is the use
of a UPDT method like ObfNet (Zheng et al., 2019).
In these networks, data is transformed or masked
on the edge device before being sent to the server,
where a subsequent, larger model performs the fi-
nal inference task. This approach aims to prevent
the server from accurately reconstructing the origi-
nal data, thus preserving privacy. Other approaches
include CryptoNets, which adapt neural networks to
process homomorphically encrypted data, enabling
secure computation without decrypting sensitive in-
formation (Chen and Ran, 2019), and Generative Ad-
versarial Networks (GANs), which add controlled noise
to data, creating obfuscation that can confuse adver-
saries during inference (Romanelli et al., 2019).
3 METHODOLOGY
This section details the security evaluation approach
for examining ObfNet, with a focus on identifying po-
tential vulnerabilities and risks of information leak-
age. These findings may also have implications for
similar UPDT methods, shedding light on shared pri-
vacy challenges across this category. To examine
ObfNet, we followed the guidelines outlined in the
original ObfNet paper (Zheng et al., 2019) and used
the source code found in (Xu et al., 2020) during
the implementation of ObfNet. We used the CNN-
based inference model, and both CNN and MLP-
based ObfNets were explored, with bottleneck sizes
ranging from 8 neurons to 512 neurons.
Figure 2: Attack scenario with server and adversary.
In our ex-
periments with colored images, we tested bottleneck
sizes from 8 to 1024 to examine the consequences of
a larger bottleneck size as the authors claim it pro-
vides better obfuscation (Xu et al., 2020). When
working with colored images, the size of the input
layer of all architectures of ObfNets was modified
from 28x28 neurons to 28x28x3 neurons to accom-
modate the additional color channels. The architec-
tures of the ObfNets and the inference network can
be found in (Xu et al., 2020). The attack scenario is
as follows. We employ ObfNet to obfuscate MNIST
images (Deng, 2012) as well as Colored-MNIST (a
modified version of MNIST introduced later). Then
the server attempts to reconstruct them or extract use-
ful information, such as color, utilizing all available
resources, including the public ObfNet network and
the training dataset used to generate it. Another at-
tacker in this scenario is an adversary intercepting the
connection between the edge device and the server,
assuming the obfuscated images are sent without en-
cryption. The difference between the server and the
adversary is that the latter lacks access to the origi-
nal training data. The adversary also possesses back-
ground knowledge regarding the nature of the data be-
ing transmitted, such as recognizing that MNIST im-
ages depict numbers or that Colored-MNIST includes
color information. The adversary also has the capabil-
ity to gather or generate their own datasets. An overview of both attackers and their respective resources is depicted in Figure 2. To quantify privacy, many
methods tailored for obfuscation use similarity mea-
surements between the original and obfuscated im-
ages, creating a similarity score (Raynal et al., 2020).
However, in ObfNet’s case, the obfuscated images
look vastly different from the original ones, making
a visual similarity measure a poor method for pri-
vacy quantification. Despite trying to estimate a full
reconstruction of the original input from obfuscated
images, many of our tests focus on extracting color
information from obfuscated images since we know
color is a feature irrelevant for digit classification and
it should not remain in the obfuscated images. At
the same time, the adversary is aware of the pres-
ence of color information in the obfuscated images.
Figure 3: Samples from the Colored-MNIST dataset.
Figure 4: Samples from the Noisy dataset.
The attackers’ ability to accurately predict the col-
ors of the obfuscated images serves as a metric for
quantifying privacy. This feature-accuracy (F_a) represents the attackers’ ability
to extract private features from the obfuscated images and is calculated as

    F_a = C / T,

where C is the number of correctly predicted colors and T is the total number of
obfuscated images. Given the seven equally distributed colors in our datasets
(explained later), random guessing would result in an expected F_a of 14.2% for
both the server and the adversary. This baseline accuracy provides a reference
point. In practice, an ObfNet will reasonably be deemed compromised if F_a exceeds
twice the expected value, rounded up to 30%. When
it comes to datasets, in addition to using the original
MNIST dataset (Deng, 2012), we generated our own
datasets based on it. To avoid class imbalance, all col-
ors are randomly selected an equal number of times.
The Colored-MNIST dataset is used for obfuscation and
security assessment, while the adversary gener-
ates Noisy and Path datasets. Colored-MNIST is an
extension of the original MNIST dataset. Initially,
we select a random color from the predetermined set
comprising {red, green, blue, aqua, magenta, yellow,
and white}, which is subsequently applied to indi-
vidual digit images from the MNIST dataset. Exam-
ple images from the dataset can be seen in Figure 3.
The motivation behind this dataset is to evaluate the
color-prediction accuracy from obfuscated images, as
the standard MNIST dataset lacks the color feature.
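A minimal sketch of how such a colorization can be produced is shown below; the palette follows the seven colors listed above, while the helper name and the exact balancing scheme are our own illustrative choices:

    import numpy as np
    from tensorflow.keras.datasets import mnist

    # RGB multipliers for the seven-color palette described above.
    PALETTE = {"red": (1, 0, 0), "green": (0, 1, 0), "blue": (0, 0, 1),
               "aqua": (0, 1, 1), "magenta": (1, 0, 1), "yellow": (1, 1, 0),
               "white": (1, 1, 1)}

    def colorize(images, rng):
        # Assign each digit one of the seven colors in (almost) equal
        # proportions, then shuffle, so that no color dominates the dataset.
        names = list(PALETTE)
        reps = int(np.ceil(len(images) / len(names)))
        idx = rng.permutation(np.tile(np.arange(len(names)), reps)[:len(images)])
        colored = np.zeros((len(images), 28, 28, 3), dtype=np.float32)
        for i, (img, c) in enumerate(zip(images, idx)):
            colored[i] = (img[..., None] / 255.0) * np.array(PALETTE[names[c]])
        return colored, idx            # images plus integer color labels

    (x_train, _), _ = mnist.load_data()
    rng = np.random.default_rng(0)
    colored_train, color_labels = colorize(x_train, rng)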
The Noisy dataset is generated by assigning each pixel a
random intensity influenced by its distance from the
center, creating a higher intensity towards the middle.
The entire image is then assigned a random color from
the Colored-MNIST palette. The dataset consists of
30 thousand training images. Example images can
be seen in Figure 4. The Path dataset aims to simulate a
handwriting-like appearance by generating a random
walk pattern. The process involves initiating the walk
at the center and taking a step in a random direction
for a predetermined number of steps. The entire im-
age is subsequently assigned a random color from the
set used in the Colored-MNIST dataset. Example im-
ages from the dataset can be seen in Figure 5.
Figure 5: Samples from the Path dataset.
The motivation for this dataset closely parallels that of the
Noisy Dataset, where an adversary serves as the at-
tacker. In this scenario, however, we assume that the
adversary possesses slightly more background knowl-
edge regarding the Colored-MNIST data. Specifi-
cally, we presume that the adversary is aware of the
color information and the general shapes of the digits, resembling a random walk.
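For illustration, the adversary’s proxy data could be generated roughly as follows; the radial-intensity profile and the step length are our own illustrative choices rather than the exact procedure used in our experiments:

    import numpy as np

    def noisy_image(rng, size=28):
        # Noisy dataset: random intensities weighted towards the image center.
        yy, xx = np.mgrid[0:size, 0:size]
        dist = np.sqrt((yy - size / 2) ** 2 + (xx - size / 2) ** 2)
        weight = 1.0 - dist / dist.max()
        return rng.random((size, size)) * weight

    def path_image(rng, size=28, steps=80):
        # Path dataset: a random walk from the center, mimicking pen strokes.
        img = np.zeros((size, size))
        y = x = size // 2
        for _ in range(steps):
            img[y, x] = 1.0
            dy, dx = rng.integers(-1, 2, size=2)
            y = int(np.clip(y + dy, 0, size - 1))
            x = int(np.clip(x + dx, 0, size - 1))
        return img

    # Each grayscale image is then tinted with a random color from the
    # Colored-MNIST palette (e.g. with a colorize() helper as sketched earlier).
    rng = np.random.default_rng(0)
    noisy_batch = np.stack([noisy_image(rng) for _ in range(8)])
    path_batch = np.stack([path_image(rng) for _ in range(8)])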
Reconstruction network (RecNet) is an auto-encoder with an identical archi-
tecture to the MLP-based ObfNet found in (Xu et al.,
2020), with a fixed bottleneck of 1024 neurons. The
server uses RecNet to estimate the original input im-
ages based on obfuscated images. The server gener-
ates a unique RecNet for each ObfNet in an attempt
to reverse its specific transformations. An overview
of this attack is illustrated in Figure 6. Each ObfNet,
whether MLP or CNN, is paired with its own unique
RecNet counterpart. While all RecNets are identical
in terms of architecture, the distinction lies in the data
on which they are trained. The server obfuscates the
entire training dataset using one ObfNet. RecNet is
then trained using pairs of the original and obfuscated
images to recreate the original images. This process
is repeated for each ObfNet. Following the examples
provided in (Xu et al., 2020), standard MNIST im-
ages are obfuscated using both MLP and CNN vari-
ants of ObfNet, with bottleneck sizes varying from
8 to 512 neurons. For each ObfNet, the server con-
structed a corresponding RecNet to estimate the orig-
inal images. This identical attack is also tested us-
ing Colored-MNIST images to evaluate the impact of
color on the reconstructed images. ColorNet adopts
the architecture of the MLP-based inference network
described in (Xu et al., 2020) with two slight modi-
fications. The input layer of this network is adjusted
from 28x28x1 to 28x28x3 to accommodate the color
channels present in the Colored-MNIST dataset. The
output layer is also adjusted from 10 to 7, reflecting
the classification task among the seven available col-
ors. The primary objective of this network is to target
a specific feature that ObfNet aims to protect. This is
achieved by classifying the obfuscated images based
on their original digit colors. As color is not a nec-
essary feature for digit classification, it is expected
that ObfNet should have kept this feature private by
removing it from the obfuscated images. The feature-
accuracy of ColorNet serves as a determinant in as-
sessing the security of an ObfNet.
Figure 6: ColorNet and RecNet overview with example images.
An overview of this attack is illustrated in Figure 6. ColorNets are
generated on the server and trained using obfuscated
training images from Colored-MNIST. Each ObfNet
is paired with its corresponding ColorNet. Both MLP
and CNN variants of ObfNet are investigated, with
bottleneck sizes varying from 8 to 1024 neurons.
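A compressed sketch of this attack and of the F_a computation is given below; the hidden-layer width is illustrative and does not reproduce the exact MLP from (Xu et al., 2020), and NoisyNet and PathNet reuse the same classifier but train it on obfuscated proxy data instead:

    import numpy as np
    from tensorflow.keras import layers, models

    NUM_COLORS = 7   # red, green, blue, aqua, magenta, yellow, white

    def build_colornet(input_shape=(28, 28, 3), hidden=256):
        # Classifier predicting the original digit color from an obfuscated image.
        return models.Sequential([
            layers.Input(shape=input_shape),
            layers.Flatten(),
            layers.Dense(hidden, activation="relu"),
            layers.Dense(NUM_COLORS, activation="softmax"),
        ])

    def feature_accuracy(colornet, obfnet, x_test, color_labels):
        # F_a = C / T: fraction of obfuscated test images whose original color
        # is predicted correctly.
        x_obf = obfnet.predict(x_test, verbose=0)
        predictions = np.argmax(colornet.predict(x_obf, verbose=0), axis=1)
        return float(np.mean(predictions == color_labels))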
NoisyNet and PathNet are both ColorNets, but
they are trained on obfuscated Noisy and Path
datasets, respectively. The adversary utilizes these
networks to learn the colors of the obfuscated
Colored-MNIST images under the assumption that
they do not have access to original training data, but
only to a proxy. These networks are tested on CNN
and MLP-based ObfNets with bottleneck sizes vary-
ing from 8 to 1024.
All of our attacks thus far rely on the fact that,
as outlined in (Xu et al., 2020), the ObfNet network
is public and thereby accessible to everyone. If ObfNet
were trained and set up by a trusted third party, such
that neither attacker had access to it, this approach
would mitigate many risks associated with the net-
work’s security. In this attack scenario, both the
server and the adversary only have access to the ob-
fuscated images. However, the ability of the inference
network at the server to classify the obfuscated im-
ages into digits suggests that the features of the dig-
its are embedded within the obfuscated images. To
further analyze the information embedded in the ob-
fuscated images, both CNN and MLP-based ObfNets
are employed to obfuscate the testing images in the
Colored-MNIST dataset. This is done to see if the
sensitive color information is still present in the im-
ages. The obfuscated images are processed through a
t-SNE dimensionality reduction technique to reduce
their initial 2352 dimensions into a more manage-
able 2-dimensional representation. After many tests,
the perplexity value of the t-SNE hyperparameter was
set to 15 for all subsequent t-SNE experiments. This
choice balances preserving the global structure, such
as the numbers’ clusters, and capturing local struc-
tures, such as the colors’ sub-clusters.
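This analysis can be reproduced roughly as follows with scikit-learn’s t-SNE at perplexity 15; the plotting details are illustrative:

    import matplotlib.pyplot as plt
    from sklearn.manifold import TSNE

    def tsne_plot(obfuscated, digit_labels, color_labels, perplexity=15):
        # Flatten obfuscated Colored-MNIST images (28x28x3 = 2352 dimensions)
        # and embed them in two dimensions.
        flat = obfuscated.reshape(len(obfuscated), -1)
        embedding = TSNE(n_components=2, perplexity=perplexity,
                         init="pca", random_state=0).fit_transform(flat)

        # Digit clusters on the left, color sub-clusters on the right.
        fig, axes = plt.subplots(1, 2, figsize=(10, 4))
        axes[0].scatter(embedding[:, 0], embedding[:, 1], c=digit_labels, s=4, cmap="tab10")
        axes[0].set_title("colored by digit label")
        axes[1].scatter(embedding[:, 0], embedding[:, 1], c=color_labels, s=4, cmap="tab10")
        axes[1].set_title("colored by original digit color")
        plt.show()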
According to the work presented in (Zheng et al.,
2019), ObfNet is a small-scale neural network de-
signed to be deployed on resource-constrained edge
devices. To validate this assertion, we utilized a pre-
trained ResNet50 (He et al., 2015) as the inference
network at the server and employed ObfNet to obfus-
cate ImageNette (Howard, 2019) images. ImageNette
images are larger and more complex than MNIST,
particularly in classification tasks. This poses a chal-
lenge to the extent to which ObfNet can maintain its
lightweight and small-scale nature. When consider-
ing the proposed ObfNet architectures in the paper
(Xu et al., 2020), simply reshaping the input to have
the shape (224, 224, 3) results in a model with exces-
sive parameters. This, in turn, increases the size, the
one-time communication overhead, and the RAM us-
age on the edge device. This significantly complicates
the model training process and undermines the fun-
damental purpose of ObfNet as a small-scale neural
network. For this reason, we introduce two new ob-
fuscation networks, namely LightNet and DenseNet.
LightNet is a fully convolutional neural network
(CNN) constructed for obfuscation of images in im-
age classification tasks, specifically for the ImageNette
dataset (Howard, 2019). The CNN architecture con-
sists of several critical layers designed to extract fea-
tures from the input images and generate new obfus-
cated images that can be sent to the server. Light-
Net has no dense layers, making it very lightweight in
terms of parameters, size, and communication over-
head. LightNet uses ResNet50 (He et al., 2015) as an
inference network and is trained following the same
methodologies presented in (Xu et al., 2020) using
ImageNette training images. LightNet processes pre-
processed input images of size (224, 224, 3) using
only convolutional layers. Max pooling reduces spa-
tial dimensions, followed by dropout to prevent over-
fitting. Convolutional layers extract spatial features,
with batch normalization stabilizing training. Gaus-
sian noise is added as a regularizer. Finally, a trans-
posed convolution upsamples the feature map back to
(224, 224, 3).
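A sketch consistent with this description is shown below; the filter counts, kernel sizes, dropout rate, and noise level are not specified above and are therefore illustrative, and the network is trained end-to-end against a frozen ResNet50, as in the partitioned setup sketched earlier:

    from tensorflow.keras import layers, models

    def build_lightnet():
        # Fully convolutional obfuscator: no dense layers, so the parameter
        # count stays small regardless of input resolution.
        return models.Sequential([
            layers.Input(shape=(224, 224, 3)),
            layers.Conv2D(16, 3, padding="same", activation="relu"),
            layers.MaxPooling2D(2),                    # reduce spatial dimensions
            layers.Dropout(0.2),                       # regularization
            layers.Conv2D(32, 3, padding="same", activation="relu"),
            layers.BatchNormalization(),               # stabilize training
            layers.GaussianNoise(0.1),                 # additional regularizer
            layers.Conv2DTranspose(3, 4, strides=2, padding="same",
                                   activation="sigmoid"),  # back to (224, 224, 3)
        ])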
DenseNet is a CNN with two dense layers as a bot-
tleneck. This architecture is inspired by the two-dense
layer design outlined in the original paper (Xu et al.,
2020). DenseNet incorporates two down-sampling
blocks comprising a convolutional layer followed by
a maxpool layer. This design aims to reduce the di-
mensions of the image. The bottleneck size is fixed
at 128 neurons to keep the number of parameters and
size of this network to a minimum. DenseNet also
uses ResNet as an inference network and is trained
following the same guidelines in (Xu et al., 2020) us-
ing ImageNette training images. DenseNet processes
pre-processed input images of size (224, 224, 3) us-
ing two convolutional layers, each followed by max
pooling to progressively reduce spatial dimensions. A
dropout layer helps prevent overfitting. A bottleneck
dense layer (128 neurons) is used before a final dense
layer with 224 × 224 × 3 neurons to reconstruct the output image. The final reshape operation restores the original image dimensions.
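A sketch of this architecture is given below; the convolutional filter counts are illustrative since they are not specified above, while the 128-neuron bottleneck and the final dense reconstruction follow the description:

    from tensorflow.keras import layers, models

    def build_densenet_obfuscator():
        # Two conv + max-pool blocks shrink the image, a 128-neuron dense
        # bottleneck limits the parameter count, and a final dense layer
        # followed by a reshape restores the (224, 224, 3) image dimensions.
        return models.Sequential([
            layers.Input(shape=(224, 224, 3)),
            layers.Conv2D(8, 3, padding="same", activation="relu"),
            layers.MaxPooling2D(2),                    # 224 -> 112
            layers.Conv2D(16, 3, padding="same", activation="relu"),
            layers.MaxPooling2D(2),                    # 112 -> 56
            layers.Dropout(0.2),
            layers.Flatten(),
            layers.Dense(128, activation="relu"),      # fixed bottleneck
            layers.Dense(224 * 224 * 3, activation="sigmoid"),
            layers.Reshape((224, 224, 3)),
        ])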
Figure 7: MNIST reconstructions (MLP): original input images, obfuscated images for bottleneck sizes 8 to 512, and the corresponding RecNet reconstructions.
4 RESULTS
With a fixed bottleneck size of 1024 neurons, Rec-
Net was deployed to estimate the original input to
the ObfNet models obtained from the source code
provided in (Xu et al., 2020). Our findings in Fig-
ure 7 align with those reported in the original pa-
per, wherein the digit ’one’ appears slightly darker
than others when the bottleneck size is small. In the
same Figure 7, the output of RecNet, or the recon-
structed images, are displayed. The output of Rec-
Net when estimating the original input of obfuscated
Colored-MNIST images is illustrated in Figure 8 for
MLP-based ObfNets. The server attempts to ex-
tract color information from the obfuscated Colored-
MNIST images using ColorNet. The outcomes of
these efforts are depicted in Figure 9. The feature-
accuracy is showcased across the various bottleneck
sizes in the plot. The adversary employs NoisyNet
and PathNet to extract colors from the obfuscated im-
ages. The outcomes of these attempts are presented
in Figure 10. These tests reveal that even the adver-
sary can recognize the colors without access to the
original training data with feature-accuracy exceed-
ing 50%. The results thus far demonstrate that color
and digit information are embedded within the obfus-
cated images.
Figure 8: Colored-MNIST reconstructions (MLP): original input images, obfuscated images for bottleneck sizes 8 to 1024, and the corresponding RecNet reconstructions.
Figure 9: F_a of ColorNet with varying bottlenecks.
The obfuscated images underwent a t-SNE dimensionality reduction procedure to analyze
this phenomenon further. The results are visualized
on a 2D graph in Figure 11 and they show ten pri-
mary clusters corresponding to the ten different digits.
However, each cluster contains sub-clusters associ-
ated with colors, indicating that color information re-
mains embedded in the obfuscated images. While this
study focuses on ObfNet, the vulnerabilities we iden-
tified, such as information leakage during reconstruc-
tion attacks, are likely relevant to other UPDT meth-
ods. Similarities between ObfNet and techniques like
Replacement AutoEncoders, and Anonymization sug-
gest that residual sensitive information may persist
when transformations are insufficiently selective to protect specific sensitive information.
Figure 10: F_a with varying bottlenecks: (a) NoisyNet, (b) PathNet.
Figure 11: t-SNE of Colored-MNIST with two ObfNets: (a) MLP, bottleneck 1024; (b) CNN, bottleneck 8.
This under-
scores a broader challenge within UPDT frameworks:
achieving a balance between utility preservation and
robust privacy. When dealing with larger and more
complex data, scaling up ObfNet results in many
trainable parameters and FLOPs. In Table 1, a com-
parison between ObfNets and our proposed models
is presented. LightNet achieved 93% accuracy while
DenseNet achieved only 59%. Obfuscated images for
LightNet can be seen in Figure 12. DenseNet images
are visually obfuscated.
Table 1: ObfNet vs. our models on 224x224x3 images.
Model Parameters FLOPs Size
ResNet50 26.2M 7.73e9 105 MB
ObfNet-MLP 154.3M 3.08e8 617 MB
ObfNet-CNN 279.1M 6.46e8 1.11 GB
LightNet 20.7K 5.10e8 76 KB
DenseNet 25.0M 2.29e8 100 MB
5 CONCLUSION & FUTURE
WORK
All the outcomes from the experiment mentioned
above, particularly the t-SNE analysis, suggest that
the ObfNet algorithm tends to retain, and thereby leak, as much information as possible rather than removing information to protect privacy. Throughout all the conducted tests, the ObfNet algorithm consistently failed, for different reasons each time. While the small bot-
tleneck forces some information to be removed, the
lack of mechanisms to control what is eliminated underscores a broader challenge not just for ObfNet but for similar UPDT methods.
Figure 12: LightNet output on ImageNette (obfuscated).
This highlights the need
for more advanced approaches that explicitly priori-
tize the removal of sensitive information while main-
taining utility. The results of RecNet on obfuscated
test images from the Colored-MNIST dataset can be
seen in Figure 8 for the MLP variant. The CNN vari-
ant has very similar results. RecNet not only approx-
imated the original input shape and digit but also ac-
curately inferred the digit color from the obfuscated
image, given that the bottleneck of the ObfNet is large
enough. The obfuscated MNIST examples presented
in (Xu et al., 2020) were fully reconstructed in Figure 7, effectively undermining the obfuscation demonstrated by the examples in the source code. This further substantiates the vul-
nerabilities of ObfNet. This observation reinforces
the broader challenge faced by UPDT methods, as
similar reconstruction risks may arise when sensitive
attributes are insufficiently obfuscated. All ColorNet
models display the substantial color information em-
bedded within the obfuscated images, achieving peak
feature-accuracies of 99.46% and 99.40% on MLP-
based ObfNets with bottleneck sizes 512 and 1024,
respectively (Figure 9, Figure 10). The results of the
t-SNE graphs further strengthen the findings of Col-
orNet in Figure 11, with ten primary clusters in the MLP-based obfuscated images, each of which corresponds to one of the ten digits. However,
each main cluster displays distinct sub-clusters corre-
sponding to colors. This suggests that color informa-
tion remains embedded within the obfuscated images
and is susceptible to exploitation. This highlights a
broader challenge for UPDT methods: the residual
data left behind after the transformation often retains
sensitive attributes, such as color information, which
can compromise privacy. Ensuring that transformed
data is free from exploitable patterns while preserving
its utility remains a critical hurdle for these methods.
LightNet failed to obfuscate images effectively, as
demonstrated by Figure 12. This failure emphasizes
the need for better training methods where feature re-
moval is controlled. DenseNet, incorporating dense
layers and a small bottleneck, managed to obfuscate
images but at the cost of significantly reduced clas-
sification accuracy due to the complexity of the Ima-
geNette dataset. The balance between bottleneck size,
feature removal, and task complexity remains chal-
lenging to achieve with current training methods. The
images were preprocessed using ResNet’s preprocess-
ing function (TensorFlow, ). Therefore, any differ-
ences observed in Figure 12 can be attributed to the
preprocessing function. The original paper of ObfNet
(Xu et al., 2020) states that ObfNet is lightweight
and can run on devices without acceleration for in-
ference. This claim holds primarily for MNIST and
other small-scale, simple datasets. However, this fea-
sibility diminishes as dataset size and complexity in-
crease, leading to a substantial rise in network pa-
rameters, size, and FLOPs. This observation is not
unique to ObfNet but reflects a common challenge for
many UPDT methods when scaling to more complex
data: the trade-off between computational efficiency
and robust privacy preservation as the size of the data
that needs to be transformed grows. The original pa-
per (Xu et al., 2020) also claims that “when more neurons are used in the first hidden layer of O_M, the overall darkness levels of the obfuscation results of all digits are equalized, suggesting a better obfuscation quality”; however, our test results in Figure 7 show the opposite. As the bottleneck size increases, ObfNet
inadvertently retains more information, making sensi-
tive data more susceptible to leakage. This reliance
on visual indicators of obfuscation, rather than robust
privacy metrics, is a broader issue across UPDT tech-
niques. Each dataset comes with its unique privacy
requirements and characteristics, making it difficult
to establish a universal privacy metric that applies to
all cases. Furthermore, the lack of well-defined de-
sign principles in UPDT methods is a common chal-
lenge as each dataset is different (Malekzadeh et al.,
2020). For example, the results from LightNet (Fig-
ure 12) demonstrate that without explicit mechanisms
to enforce effective obfuscation, networks trained to
prioritize utility, such as inference accuracy, may in-
advertently leave sensitive data insufficiently trans-
formed. This issue is further exacerbated by train-
ing methodologies that do not impose strong con-
straints for selective feature removal, which can result
in residual sensitive information remaining within the
transformed datasets. Addressing these shortcomings
is essential for improving the scalability, robustness,
and privacy guarantees of UPDT architectures.
REFERENCES
(2017). Geo. L. Tech. Rev. 202.
Abadi, M., Chu, A., Goodfellow, I., McMahan, H.,
Mironov, I., Talwar, K., and Zhang, L. (2016). Deep
learning with differential privacy. pages 308–318.
Chen, J. and Ran, X. (2019). Deep learning with edge
computing: A review. Proceedings of the IEEE,
107(8):1655–1674.
Deng, L. (2012). The mnist database of handwritten digit
images for machine learning research. IEEE Signal
Processing Magazine, 29(6):141–142.
Dhinakaran, D., Sankar, S. M. U., Selvaraj, D., and Raja,
S. E. (2024). Privacy-preserving data in iot-based
cloud systems: A comprehensive survey with ai in-
tegration.
Ding, X., Fang, H., Zhang, Z., Choo, K.-K. R., and Jin, H.
(2022). Privacy-preserving feature extraction via ad-
versarial training. IEEE Transactions on Knowledge
and Data Engineering, 34(4):1967–1979.
Feng, T. and Narayanan, S. (2021). Privacy and utility pre-
serving data transformation for speech emotion recog-
nition. In 2021 9th International Conference on Af-
fective Computing and Intelligent Interaction (ACII),
pages 1–7.
Ganin, Y. and Lempitsky, V. (2015). Unsupervised domain
adaptation by backpropagation.
Ha, T., Dang, T., Dang, T. T., Truong, T., and Nguyen,
M. (2019). Differential privacy in deep learning: An
overview. pages 97–102.
He, K., Zhang, X., Ren, S., and Sun, J. (2015). Deep resid-
ual learning for image recognition.
Howard, J. (2019). Imagenette. https://github.com/fastai/imagenette.
Malekzadeh, M., Clegg, R. G., Cavallaro, A., and Haddadi,
H. (2020). Privacy and utility preserving sensor-data
transformations. Pervasive and Mobile Computing,
63:101132.
Nieto, G., de la Iglesia, I., Lopez-Novoa, U., and Perfecto,
C. (2024). Deep reinforcement learning techniques for
dynamic task offloading in the 5g edge-cloud contin-
uum. Journal of Cloud Computing, 13(1):94.
Raynal, M., Achanta, R., and Humbert, M. (2020). Image
obfuscation for privacy-preserving machine learning.
Romanelli, M., Palamidessi, C., and Chatzikokolakis, K.
(2019). Generating optimal privacy-protection mech-
anisms via machine learning. CoRR, abs/1904.01059.
TensorFlow. Preprocesses a tensor or numpy array encoding a batch of images. https://www.tensorflow.org/api_docs/python/tf/keras/applications/resnet/preprocess_input.
Xu, D., Zheng, M., Jiang, L., Gu, C., Tan, R., and Cheng,
P. (2020). Lightweight and unobtrusive data obfusca-
tion at iot edge for remote inference. IEEE Internet of
Things Journal, 7(10):9540–9551.
Zheng, M., Xu, D., Jiang, L., Gu, C., Tan, R., and Cheng,
P. (2019). Challenges of privacy-preserving machine
learning in iot. In Proceedings of the First Interna-
tional Workshop on Challenges in Artificial Intelli-
gence and Machine Learning for Internet of Things,
SenSys ’19. ACM.