GENERATION: An Efficient Denoising Autoencoders-Based Approach for Amputated Image Reconstruction

Leila Ben Othman¹, Parisa Niloofar² and Sadok Ben Yahia²
¹ Faculty of Sciences of Tunis, University of Tunis El Manar, Tunisia
² Mærsk Mc-Kinney Møller Institute, University of Southern Denmark, Odense, Denmark
ORCID: Leila Ben Othman, https://orcid.org/0000-0002-0075-9848; Parisa Niloofar, https://orcid.org/0000-0002-3147-0078; Sadok Ben Yahia, https://orcid.org/0000-0001-8939-8948
Keywords: Missing Data Mechanism, Amputation, Data Quality, Imputation, Denoising Autoencoder, Image Reconstruction.
Abstract: Missing values in datasets pose a significant challenge, often leading to biased analyses and suboptimal model performance. This study presents an approach for imputing missing values using Denoising AutoEncoders (DAE), a class of artificial neural networks known for learning robust data representations. The DAE is trained on the observed data and then used to impute the missing values. We assess the approach through extensive experiments on image datasets covering different missing data mechanisms and percentages of missingness. The experimental results show that the DAE-based imputation outperforms competing imputation methods, especially in handling informative missingness mechanisms.
1 INTRODUCTION
In the realm of image processing and analysis, the accurate reconstruction of images plays an important role in numerous applications, ranging from medical imaging to computer vision. However, missing data is a common problem in real-world datasets, which presents a challenge to researchers and practitioners. Imputing missing values yields a filled-in dataset that can be used for further analysis. Many imputation methods have been proposed, but this difficult task requires an understanding of the mechanism that renders a dataset incomplete: missing completely at random (MCAR), missing at random (MAR), or missing not at random (MNAR). MCAR and MAR are considered ignorable missingness mechanisms, while MNAR is nonignorable. Nonignorable missing data, in which the probability of missingness depends on data that have not been observed, presents a unique set of problems that calls for advanced imputation methods.

While most studies assume the missingness mechanism to be ignorable, in practice there is often a reason why a value is not observed, indicating a nonignorable mechanism. For example, in a cancer clinical trial, suppose the outcome variable is a biomarker reflecting the treatment effect and is only observed at the end of the study. Subjects may decide to drop out before the endpoint because the treatment seems ineffective, which leads to a violation of the ignorability assumption (Zhou et al., 2014). Moreover, the randomness assumption is too restrictive and does not take into account the specific reasons behind missingness (Ben Othman et al., 2009).
Autoencoders, a class of neural networks well-suited for representation learning, have demonstrated remarkable capabilities in capturing complex patterns and features within data. Autoencoders have also proven promising in imputation tasks (Pereira et al., 2020b). The primary goal of an autoencoder is to encode input data into a lower-dimensional representation and then decode it back to its original form.

Denoising AutoEncoders (DAE) add a denoising aspect to the original autoencoder and are hence able to reconstruct clean or denoised versions of input data, even when the input is corrupted by noise. This ability makes them particularly useful for tasks such as data imputation, where missing or noisy values in the input need to be filled in or corrected.
This paper delves into the application of DAE to imputing missing data for image reconstruction. However, an important but unaddressed topic for
autoencoders is the missing data mechanism. Considering the missingness mechanism when handling missing data is a topic not explicitly addressed by the majority of works in the literature. This lack of consideration was one of the challenges identified in a review of autoencoders for missing data imputation (Pereira et al., 2020b). Some researchers have studied how the missingness mechanism affects the performance of statistical or machine learning models (Niloofar et al., 2013; Niloofar and Ganjali, 2014). However, to the best of our knowledge, the interplay between missing data mechanisms and autoencoder-based imputation has not been investigated. To fill this gap, we introduce a new approach relying on a variant of a Denoising Autoencoder, called GENERATION (denoisinG autoENcodEr foR AmpuTed Image recOnstructioN), which imputes missing values to reconstruct the data.
We provide an in-depth analysis of how well the DAE fills in missing data under a range of ignorable and nonignorable missingness mechanisms. The evaluation encompasses several metrics, including the reconstruction error and the accuracy of a classification technique. Our approach is also compared to other imputation methods, and the results show clear improvements in classification accuracy.
The remainder of the paper is organized as follows: In Section 2, we present the problem of missing data, the missingness mechanisms, and a scrutiny of imputation methodologies. A brief overview of autoencoders and denoising autoencoders is also presented. In Section 3, we thoroughly describe the GENERATION approach and discuss its ability to impute missing values for the reconstruction of data. Section 4 describes the dataset, the experimental study, the results, and their discussion. Section 5 wraps this paper up and sketches future work issues.
2 PRELIMINARIES
Data analytics researchers are aware that handling missing data, also referred to as missing values, is a difficult problem that requires careful consideration. This well-recognized problem arises in real-world contexts, and dealing with missing values remains a major challenge in data analytics, since they can bias the results of statistical inferences and machine learning models. Missing values can occur for a variety of reasons, such as human error or machine failure. The missing data mechanisms are the procedures that govern the probabilities of missing data, according to Little and Rubin (Little and Rubin, 2002). These mechanisms can be categorized into three types (a minimal illustrative sketch follows the list):
- Missing Completely at Random (MCAR): the missingness is unrelated to any observed or unobserved data.
- Missing at Random (MAR): the cause of the missingness is related to observed data.
- Missing Not at Random (MNAR): the missingness is related to unobserved data (the missing value itself and/or other unobserved data).
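For intuition, here is a minimal NumPy sketch (our own illustration, not the amputation procedure used in Section 3) of how each mechanism ties the probability of a value going missing to different parts of the data; all names and thresholds are hypothetical.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))  # column 0 stays observed; column 1 may go missing

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# MCAR: the missingness probability is a constant, unrelated to any value.
mcar = rng.random(len(X)) < 0.3

# MAR: the probability depends only on the *observed* column 0.
mar = rng.random(len(X)) < sigmoid(X[:, 0])

# MNAR: the probability depends on the *unobserved* value itself.
mnar = rng.random(len(X)) < sigmoid(X[:, 1])

X_amputed = X.copy()
X_amputed[mnar, 1] = np.nan  # apply, e.g., the MNAR mask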
Choosing an imputation method suited to the missing data mechanism is crucial, as an ill-suited method can degrade classification performance (Twala, 2009).
2.1 Denoising Autoencoders (DAE)
An autoencoder (Michelucci, 2022) has a structure very similar to a feedforward neural network in which the number of neurons in the output layer equals the number of inputs; it is thus a specific kind of artificial neural network. Autoencoders learn a compact or compressed representation from unlabeled training data. This compact representation reduces the data dimensionality while preserving the important features, which can improve the performance of prediction and classification techniques. It can also be used to reconstruct the original data with high accuracy. An autoencoder comprises at least three layers (input, hidden, and output layers) and consists of two parts:
- the encoder, which goes from the input layer to the output of the hidden layer. It maps the input data into a compressed representation in a lower-dimensional space, called the latent space.
- the decoder, which goes from the hidden layer to the output layer. It reconstructs the original input data from the latent space representation.
The autoencoder is trained to minimize the reconstruction error (Michelucci, 2022), i.e., a loss function that measures the difference between the reconstructed data and the original data. Common loss functions include the Mean Squared Error (MSE) for continuous data and binary cross-entropy for binary data. The loss function is minimized with an optimization method such as Stochastic Gradient Descent.
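In symbols (our own notation, consistent with the description above), writing f_theta for the encoder and g_phi for the decoder, the MSE-based training objective over n samples reads:

\min_{\theta,\phi} \; \frac{1}{n} \sum_{i=1}^{n} \left\| x_i - g_\phi\bigl(f_\theta(x_i)\bigr) \right\|_2^2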
The main objective of autoencoders is to strike a trade-off between learning a compact representation and providing an accurate reconstruction. For example, if the dimension of the latent space is too small, the autoencoder may not capture all the essential information, which leads to a high reconstruction error. Autoencoders have been successfully applied to a number of real-world applications
involving data compression (Sriram et al., 2022), feature extraction (Maggipinto et al., 2018), denoising (Lee et al., 2021), data reconstruction (Liguori et al., 2021), anomaly detection (Torabi et al., 2023), food fraud detection on honey (Phillips and Abdulla, 2022), and the classification of its botanical origins (Phillips and Abdulla, 2019).
Denoising Autoencoders (DAE) are among the most recent models for imputing missing data (Costa et al., 2018) and show promising results compared to traditional methods. A DAE intentionally introduces noise (for instance, Gaussian noise) into the input data to prevent the network from learning the identity function (Wang et al., 2021). A DAE is thus trained to reconstruct clean data: the model is forced to learn to reconstruct the input from its noisy version.
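As a minimal sketch of this corruption step (Gaussian noise is one common choice; the function name and noise level below are ours):

import numpy as np

rng = np.random.default_rng(0)

def corrupt(x, sigma=0.1):
    # Classic DAE corruption: add zero-mean Gaussian noise to the input.
    # Training pairs are (corrupt(x), x): the clean sample is the target,
    # so merely copying the input no longer minimizes the loss.
    return x + rng.normal(0.0, sigma, size=x.shape)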
2.2 Scrutiny of Imputation Methodologies

There are many ways to handle missing values in data. Simply deleting the rows and/or columns with missing values, also known as available case analysis (Little and Rubin, 2002), is often not feasible in practice. Imputation handles missing values by replacing them with plausible values. Imputation methods can be divided into two categories: statistical and machine learning-based methods. Statistical imputation techniques include mean/mode/median imputation, which replaces missing values with the mean, the mode, or the median of the non-missing values; however, the resulting imputed values may not be representative of the true distribution of the missing variable, whereas accurate imputations are needed to avoid biasing the data. Therefore, several more advanced statistical imputation techniques exist, such as Expectation-Maximization (EM) (Dempster et al., 1977), an iterative two-step procedure in which a complete dataset is created by imputing one or more plausible values for the missing data; the EM approach thus proceeds without constructing a predictive model (Costa et al., 2018). Machine learning-based imputation techniques build a predictive model on the available data to estimate the missing values. The k-Nearest Neighbors (KNN) imputation algorithm (Batista and Monard, 2002) is a prominent example, and a number of variants have been suggested to make the original KNN imputation more accurate (Keerin and Boongoen, 2022). In (Twala, 2009), the authors conducted an empirical comparison of techniques for handling incomplete data using decision trees. Recently, a similar study (Gabr et al., 2023) assessed the effect of incomplete datasets on the performance of five classification models, employing different ratios of missing values in datasets that vary in size, type, and balance. In (Awan et al., 2022), the authors introduced a reinforcement learning (RL) approach for imputing missing data, which uses an action-reward principle to learn a data imputation policy: the agent learns to take decisions that best estimate the missing values.
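As a brief scikit-learn sketch of both families on a toy matrix (the data and parameter values are ours):

import numpy as np
from sklearn.impute import SimpleImputer, KNNImputer

X = np.array([[1.0, 2.0], [np.nan, 3.0], [7.0, np.nan], [4.0, 5.0]])

# Statistical imputation: replace each NaN with a per-column summary value.
mean_filled = SimpleImputer(strategy="mean").fit_transform(X)
median_filled = SimpleImputer(strategy="median").fit_transform(X)

# Machine-learning-based imputation: average the k nearest rows
# (distances are computed on the jointly observed features).
knn_filled = KNNImputer(n_neighbors=2).fit_transform(X)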
In recent years, deep learning-based methods have become increasingly popular for dealing with missing values, and more advanced treatment methods based on deep learning have been proposed (Phung et al., 2019; Pereira et al., 2020a; S. Li and et al., 2022). In (Costa et al., 2018), a comparison between state-of-the-art imputation techniques and a Denoising Autoencoder approach was performed; their research demonstrates that missingness mechanisms have an impact on the imputation methods. In (Venkataraman, 2022), Denoising Autoencoders have shown the ability to learn complex patterns from data, which makes them convenient for data reconstruction.
A classical approach to data imputation requires an evaluation step. In the literature, imputation methods are usually evaluated by measuring the distance between the reference data (the ground-truth dataset without missing values) and the imputed data (Ben Othman and Ben Yahia, 2006). Another evaluation technique consists of analyzing the impact of the imputation on the classification accuracy (Gabr et al., 2023). In (Ben Othman and Ben Yahia, 2018), the authors suggested using several criteria and considered an evaluation technique addressing both of the following issues: the stability of a clustering technique, by preserving the characteristics of the original data through the Rand metric (Rand, 1971), and the robustness of association rules (Le Bras et al., 2010). They thus framed the evaluation of missing value imputation as a multi-criteria decision-making problem and used the TOPSIS method (Technique for Order of Preference by Similarity to Ideal Solution) to rank the imputation methods according to the different criteria.
3 DENOISING AUTOENCODER FOR DATA IMPUTATION
Figure 1: Overall architecture of GENERATION.

In the following, we thoroughly describe our approach, called GENERATION. The broad strokes of our approach are as follows:
- Integrating the different missing data mechanisms
(MCAR, MAR, and MNAR) by introducing missing values using an amputation technique;
- Training a Denoising Autoencoder to learn the underlying data structure of missingness under the different missing value mechanisms in order to reconstruct the data; and
- Evaluating the impact of the different missing data mechanisms on imputation quality, in terms of both the reconstruction quality achieved by the autoencoder and the performance of a classification technique.
Thus, we extend denoising autoencoders by replacing the denoising part with data amputation. In the following, we describe the methodology of our proposed approach, GENERATION, for reconstructing the amputed data.
3.1 The GENERATION Methodology Description
Figure 1 shows the overall architecture of our approach, called GENERATION. It is composed of two main steps: the amputation step and the imputation step. The amputation step artificially introduces missing values under different mechanisms (MCAR, MAR, and MNAR). The imputation, or reconstruction, step uses an autoencoder to reconstruct the missing parts. By testing our suggested method, we can see how well it handles the different types of missing data and how it compares with existing imputation methods. The next few paragraphs describe the different steps of our approach.
1. Splitting Data. This step involves dividing the
dataset into two subsets: the training set and the test
set. The training set is used to train the autoencoder,
while the test set is used to assess the performance of
the model on unseen data. It is important to mention
that the autoencoder is not provided with labeled data;
it learns to encode and decode the input data in an
unsupervised manner.
2. Amputation. Amputation refers to the process of artificially generating missing values in complete data for simulation purposes. Its main objective is to simulate scenarios with different missing data mechanisms in order to assess imputation quality. In this work, we employed a well-established amputation technique proposed in (Schouten et al., 2018), where the amputation is performed according to different missingness patterns. A missingness pattern defines a missingness mechanism (MCAR, MAR, MNAR) and a proportion. A linear regression equation with user-specified coefficients then produces a weighted sum score for each data row and, based on this weighted sum score, each row receives a probability of being missing under a given pattern: a logistic distribution function transforms the weighted sum scores into probabilities that determine whether a value becomes missing. Performing this procedure on images amounts to replacing some pixels with missing values (missing pixels) and pre-imputing them with 0. In this step, we created an amputed version of both the train and test sets, which we call x_trainAmputed and x_testAmputed, respectively.
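A minimal sketch of this step with pyampute follows; it assumes the MultivariateAmputation interface roughly as in the library's documentation, and the pattern, proportion, and synthetic stand-in data are illustrative choices of ours.

import numpy as np
from pyampute.ampute import MultivariateAmputation

rng = np.random.default_rng(0)
x_train = rng.random((1000, 784))  # stand-in for flattened 28x28 images

# One pattern: MAR missingness on (say) the second half of the pixel
# columns, applied to a proportion of 50% of the rows.
ma = MultivariateAmputation(
    prop=0.5,
    patterns=[{"incomplete_vars": list(range(392, 784)), "mechanism": "MAR"}],
)
x_trainAmputed = ma.fit_transform(x_train)

# Pre-impute the missing pixels with 0, as described above.
x_trainAmputed = np.nan_to_num(x_trainAmputed, nan=0.0)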
3. Initialization of GENERATION. This step consists of defining the Denoising Autoencoder architecture. Taking into account the results and recommendations of the literature, we adopted an architecture consisting of an encoding network and a decoding network. The characteristics of the chosen Denoising Autoencoder can be described as follows:
- The architecture is symmetrical.
- Both the encoder and the decoder have several hidden layers.
- The number of neurons per layer decreases in the encoder and increases in the decoder.
- The number of outputs of the autoencoder is equal to the number of inputs.

The initialization of the Denoising Autoencoder consists of specifying the hyper-parameters. These include the number of layers, the number of neurons per layer, the dimension of the latent space, and the activation functions for the encoding and decoding parts. In the experiments, we varied the autoencoder architecture and the hyper-parameters to select the configuration that minimizes the reconstruction error. Figure 2 shows the architecture built for 28 × 28 pixel gray-scale images.
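A sketch of this architecture in Keras follows; the layer widths match Figure 2, while the activation functions (ReLU inside, sigmoid on the output for pixel intensities in [0, 1]) are our assumption, since the text leaves them as tunable hyper-parameters.

import tensorflow as tf
from tensorflow.keras import layers

# Symmetric dense autoencoder: 784 -> 196 -> 49 (latent) -> 196 -> 784.
inputs = tf.keras.Input(shape=(784,))
h = layers.Dense(196, activation="relu")(inputs)      # encoder, hidden layer 1
latent = layers.Dense(49, activation="relu")(h)       # latent space
h = layers.Dense(196, activation="relu")(latent)      # decoder, hidden layer 3
outputs = layers.Dense(784, activation="sigmoid")(h)  # same size as the input

generation = tf.keras.Model(inputs, outputs)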
Figure 2: The GENERATION architecture at a glance (input layer: 784 neurons; hidden layer 1: 196 neurons; hidden layer 2, the latent space: 49 neurons; hidden layer 3: 196 neurons; output layer: 784 neurons).

4. GENERATION Training. The training sample is made up of the amputed data (x_trainAmputed) as input and the original data (x_train) as target. This allows the DAE to learn to reconstruct data from data with missing values. The main goal is for the DAE to learn a hidden representation that captures the important characteristics of the data and remains robust when some values are missing. Using the original data as a target pushes GENERATION to generate a reconstruction that is as close as possible to the original data; in the training step, the target is the version of the data that we want the autoencoder to learn to reconstruct later. Training the autoencoder requires the application of an optimization method that minimizes a loss function. Here, the hyper-parameters include the number of epochs used to train the model, the reconstruction error (loss function), and the optimization method.
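Continuing the sketches above (the model generation from the initialization step and the pair x_trainAmputed / x_train from the amputation step), training uses Adam, the MSE loss, and 10 epochs, as reported in Section 4; the batch size is our assumption.

# Amputed images as input, original images as target.
generation.compile(optimizer="adam", loss="mse")
history = generation.fit(x_trainAmputed, x_train, epochs=10, batch_size=128)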
5. Reconstruction (Imputation). During the training step, GENERATION learns a compact representation that allows the previously amputed data to be reconstructed. Once the autoencoder is trained, we feed the amputed test sample (x_testAmputed) to the encoder to obtain its encoding (encodedData) and then pass that encoding through the decoder to reconstruct the original data (reconstructedData). The reconstructed data should be close to the input data.
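A sketch of this step, continuing the model above and assuming x_testAmputed was produced from the test split exactly as x_trainAmputed; splitting the trained model into its encoder and decoder parts mirrors the description, with variable names following the text.

import tensorflow as tf

# Encoder: from the input layer up to the 49-neuron latent layer (index 2).
encoder = tf.keras.Model(generation.input, generation.layers[2].output)

# Decoder: reuse the trained decoding layers on a fresh latent input.
latent_in = tf.keras.Input(shape=(49,))
decoder = tf.keras.Model(latent_in, generation.layers[4](generation.layers[3](latent_in)))

encodedData = encoder.predict(x_testAmputed)
reconstructedData = decoder.predict(encodedData)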
4 EXPERIMENTAL RESULTS
We usher in this section by providing information about the considered dataset. For these experiments, we used the well-known and publicly available Fashion-MNIST dataset (https://www.kaggle.com/datasets/zalando-research/fashionmnist). The latter covers images of a variety of fashion items, including t-shirts, trousers, dresses, etc. It is designed to facilitate the development and evaluation of machine learning models for the classification of different types of clothing items. The dataset consists of 60,000 training examples and 10,000 test examples. Each example is represented as a 28 × 28 pixel gray-scale image.
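Fashion-MNIST ships pre-split with Keras; a loading sketch follows (the [0, 1] scaling and the flattening to 784-dimensional vectors are our preprocessing assumptions).

import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()

# Scale pixel intensities to [0, 1] and flatten each 28x28 image.
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0
x_test = x_test.reshape(-1, 784).astype("float32") / 255.0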
We carried out a series of experiments with the aim of evaluating our proposed approach. All the steps described in Section 3 were implemented in Python using the TensorFlow framework (Abadi et al., 2015) and the Pandas and scikit-learn libraries. For data amputation, we used the pyampute Python library (https://rianneschouten.github.io/pyampute/build/html/index.html). After tuning and testing the autoencoder in terms of the number of hidden layers and their neurons, we achieved satisfactory training loss values for the autoencoder architecture depicted in Figure 2. We used the Adam (Adaptive Moment Estimation) optimizer (Kingma and Ba, 2014) to minimize the mean squared error loss, and 10 epochs proved to be sufficient for the learning process. In the following, we thoroughly describe our experimental validation outcomes.
Series 1 of Experiments: Gauging the Quality of the Reconstruction Visually. In this first experiment, we evaluated the reconstruction quality of GENERATION visually. Figure 3 shows this reconstruction for the first 10 test images, with 50% of values missing under MAR. It is easy to see that the quality of the amputated images has improved after reconstruction. The second series of experiments validates this visual impression by studying the loss function during the reconstruction step.
Series 2 of Experiments: Impact of the Mechanisms of Missing Data. In this second series of experiments, we study the impact of the missing data mechanisms: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR). We plot the loss function curves for each mechanism with a proportion of 60% missing values. We present the results in Figure 4, from which we can easily notice that:
- For the three mechanisms (MCAR, MAR, and MNAR), the loss function decreases over the learning process. This indicates that the model is learning and improving its ability to reconstruct the missing values.
Figure 3: Visualization of original, amputed and reconstructed images with GENERATION.
Figure 4: The loss function curves for MCAR, MAR, and MNAR mechanisms: 60% of missing data.
Table 1: Accuracy using KNeighborsClassifier (k = 5) for different missingness mechanisms.

Missingness Mechanism | #MV (%) | Before imputation | After imputation
MCAR | 30 | 0.6379 | 0.8421
MCAR | 50 | 0.6361 | 0.8399
MCAR | 70 | 0.6321 | 0.8332
MCAR | 90 | 0.6401 | 0.8464
MAR  | 30 | 0.6410 | 0.8422
MAR  | 50 | 0.5360 | 0.8399
MAR  | 70 | 0.5325 | 0.8432
MAR  | 90 | 0.5306 | 0.8405
MNAR | 30 | 0.6175 | 0.8418
MNAR | 50 | 0.5165 | 0.8531
MNAR | 70 | 0.5018 | 0.8620
MNAR | 90 | 0.5001 | 0.8710
- We can notice that the MNAR and MAR mechanisms yield higher loss values than MCAR, meaning that imputing missing values under MNAR and MAR is harder than under MCAR. In other words, performance varies with the mechanism: MCAR is the easiest to handle, MAR requires the observed variables to be taken into account, and MNAR presents additional difficulties due to the relationships between observed and unobserved variables. This result highlights the need for dedicated approaches and for consideration of the underlying missing data mechanisms in order to deal efficiently with missing values.
Table 2: Obtained accuracy values for KNeighborsClassifier (k = 5).

Mechanism | GENERATION | KNN Imputer | Median | Mean
MCAR | 0.8331 | 0.7336 | 0.7357 | 0.7340
MAR  | 0.8379 | 0.7328 | 0.7325 | 0.7361
MNAR | 0.8397 | 0.7374 | 0.7303 | 0.7355
Series 3 of Experiments: Assessing the Impact of the Reconstruction on a Classification Technique. In order to gauge how well GENERATION performed the imputation, we examined how the reconstruction affects a classification method; this constitutes an evaluation technique for missing data handling methods. To perform this evaluation, we used the KNeighborsClassifier: the autoencoder imputes the missing values, and the KNeighborsClassifier then makes predictions based on the imputed (reconstructed) data on the one hand and on the amputed data on the other hand. The idea is to study the impact on accuracy before and after imputation. Table 1 reports the results obtained when applying the classifier before and after imputation. We can tell how well GENERATION performed the imputation by comparing the predicted labels to the actual labels. The results show the importance of handling missing data when applying a classification technique: regardless of the missing data mechanism, the accuracy improves after imputation. In addition, we observe once again the impact of the missing data mechanism; before imputation, accuracy tends to deteriorate as the proportion of missing values increases under MAR or MNAR.
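One plausible protocol for this before/after comparison is sketched below, reusing names from the earlier sketches; the exact train/test pairing is our assumption, as the text leaves it open.

from sklearn.neighbors import KNeighborsClassifier

clf = KNeighborsClassifier(n_neighbors=5).fit(x_train, y_train)

acc_before = clf.score(x_testAmputed, y_test)     # amputed (zero-filled) images
acc_after = clf.score(reconstructedData, y_test)  # GENERATION-imputed images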
Series 4 of Experiments: Comparison of the Imputation Technique with Baseline Imputation. In the remainder, we compare our GENERATION-based imputation with baseline imputation techniques. We employed KNN imputation (KNNImputer) with k = 3, median imputation, and mean imputation. We then measured the accuracy on the imputed data using the KNeighborsClassifier with k = 5. We present the results in Table 2. Regardless of the missing data mechanism, our suggested GENERATION-based method is more accurate than the standard imputation methods. This confirms the effectiveness of autoencoders for handling missing values. However, the classification-based evaluation technique does not provide additional information with respect to the missing data mechanism: our autoencoder obtained an accuracy of about 0.83 under all mechanisms, whereas this was not the case with the loss-function-based evaluation. This result encourages us to reconsider the suitability of this evaluation technique for assessing missing data handling.
5 CONCLUSION AND FUTURE WORK
Addressing missing values in image datasets is a critical step in ensuring the reliability and accuracy of analyses and model performance. This study showed that denoising autoencoders are an effective means of imputing missing data in images. First, the image dataset is amputed with missing values at different percentages under informative and noninformative missingness mechanisms. Second, the DAE is trained on the training set and evaluated on the testing set. Third, its performance is compared with that of other well-known prediction methods. Although our proposed methodology shows promising results even for noninformative missingness, certain directions deserve further investigation: (i) the major limitation of DAEs lies in the absence of uncertainty quantification in the model parameter estimation, which could be addressed by applying a generative model such as variational autoencoders; and (ii) the way we artificially impose missing values on the dataset has a prominent effect on the imputation accuracy and its outcomes, so exploring other methodologies and algorithms for this purpose is also deemed worth further investigation.
REFERENCES
Awan, S., Bennamoun, M., Sohel, F., Sanfilippo, F., and
Dwivedi, G. (2022). A reinforcement learning-based
approach for imputing missing data. Neural Comput-
ing and Applications, 34:1–16.
Batista, G. and Monard, M.-C. (2002). A study of k-nearest
neighbour as an imputation method. volume 30, pages
251–260.
Ben Othman, L. and Ben Yahia, S. (2006). Yet another ap-
proach for completing missing values. In Proceedings
of the 4th Intl. Conf. on Concept Lattices and their
Applications (CLA 2006), Hammamet, Tunisia.
Ben Othman, L. and Ben Yahia, S. (2018). A multiple cri-
teria evaluation technique for missing values imputa-
tion. In 12th Intl. Conf. on Research Challenges in In-
formation Science, RCIS 2018, Nantes, France, May
29-31, 2018, pages 1–12. IEEE.
Ben Othman, L., Rioult, F., Ben Yahia, S., and Crémilleux, B. (2009). Missing values: Proposition of a typology and characterization with an association rule-based model. volume 5691, pages 441–452.
Costa, A. F., Santos, M. S., Soares, J. P., and Abreu, P. H.
(2018). Missing data imputation via denoising autoen-
coders: The untold story. In Duivesteijn, W., Siebes,
A., and Ukkonen, A., editors, Advances in Intelligent
Data Analysis XVII, pages 87–98, Cham. Springer
Intl. Publishing.
Dempster, A., Laird, N., and Rubin, D. (1977). Max-
imum likelihood from incomplete data via the EM
algorithm. Journal of the Royal Statistical Society,
39(1):1–38.
Gabr, M. I., Helmy, Y. M., and Elzanfaly, D. S. (2023). Ef-
fect of missing data types and imputation methods on
supervised classifiers: An evaluation study. Big Data
and Cognitive Computing, 7(1).
Keerin, P. and Boongoen, T. (2022). Improved knn imputa-
tion for missing values in gene expression data. Com-
puters, Materials and Continua, 70:4009–4025.
Kingma, D. P. and Ba, J. (2014). Adam: A
method for stochastic optimization. arXiv preprint
arXiv:1412.6980.
Le Bras, Y., Meyer, P., Lenca, P., and Lallich, S. (2010).
A robustness measure of association rules. In Proc.
of ECML PKDD’10: Part II, ECML PKDD’10, pages
227–242, Berlin, Heidelberg. Springer-Verlag.
Lee, W.-H., Ozger, M., Challita, U., and Sung, K. W.
(2021). Noise learning-based denoising autoencoder.
IEEE Communications Letters, 25(9):2983–2987.
Liguori, A., Markovic, R., Dam, T. T. H., Frisch, J., Treeck,
C., and Causone, F. (2021). Indoor environment data
time-series reconstruction using autoencoder neural
networks. Building and Environment, 191:107623.
Little, R. and Rubin, D. (2002). Statistical Analysis with
Missing Data, Second Edition. John Wiley, New York.
Maggipinto, M., Masiero, C., Beghi, A., and Susto, G. A. (2018). A convolutional autoencoder approach for feature extraction in virtual metrology. Procedia Manufacturing, 17:126–133. 28th Intl. Conf. on Flexible Automation and Intelligent Manufacturing (FAIM2018), June 11-14, 2018, Columbus, OH, USA. Global Integration of Intelligent Manufacturing and Smart Industry for Good of Humanity.
Abadi, M. et al. (2015). TensorFlow: Large-scale machine learning on heterogeneous systems. Software available from tensorflow.org.
Michelucci, U. (2022). An introduction to autoencoders.
CoRR, abs/2201.03898.
Niloofar, P. and Ganjali, M. (2014). A new multivariate im-
putation method based on bayesian networks. Journal
of applied statistics, 41(3):501–518.
Niloofar, P., Ganjali, M., and Rohani, M. F. (2013). Im-
proving the performance of bayesian networks in non-
ignorable missing data imputation. Kuwait Journal of
Science, 40(2).
Pereira, R. C., Santos, J. C., Amorim, J., Rodrigues, P., and
Henriques Abreu, P. (2020a). Missing image data im-
putation using variational autoencoders with weighted
loss.
Pereira, R. C., Santos, M. S., Rodrigues, P. P., and Abreu,
P. H. (2020b). Reviewing autoencoders for missing
data imputation: Technical trends, applications and
outcomes. Journal of Artificial Intelligence Research,
69:1255–1285.
Phillips, T. and Abdulla, W. (2019). Class embodiment au-
toencoder (ceae) for classifying the botanical origins
of honey. pages 1–5.
Phillips, T. and Abdulla, W. (2022). A new honey adulter-
ation detection approach using hyperspectral imaging
and machine learning. European Food Research and
Technology, 249.
Phung, S., Kumar, A., and Kim, J. (2019). A deep learning
technique for imputing missing healthcare data. vol-
ume 2019, pages 6513–6516.
Rand, W. M. (1971). Objective criteria for the evaluation of
clustering methods. Journal of the American Statisti-
cal association, 66(336):846–850.
S. Li, M. L. and et al. (2022). Handling missing values in
healthcare data: A systematic review of deep learning-
based imputation techniques.
Schouten, R., Lugtig, P., and Vink, G. (2018). Generating
missing values for simulation purposes: a multivariate
amputation procedure. Journal of Statistical Compu-
tation and Simulation, 88:1–22.
Sriram, S., Dwivedi, A., Chitra, P., Sankar, V., Abirami, S.,
Durai, S., Pandey, D., and Khare, M. (2022). Deep-
comp: A hybrid framework for data compression us-
ing attention coupled autoencoder. Arabian Journal
for Science and Engineering, 47.
Torabi, H., Mirtaheri, S., and Greco, S. (2023). Practical
autoencoder based anomaly detection by using vector
reconstruction error. Cybersecurity, 6.
Twala, B. (2009). An empirical comparison of techniques
for handling incomplete data using decision trees. Ap-
plied Artificial Intelligence, 23:373–405.
Venkataraman, P. (2022). Image denoising using convolu-
tional autoencoder.
Wang, Z., Akande, O., Poulos, J., and Li, F. (2021). Are
deep learning models superior for missing data impu-
tation in large surveys? evidence from an empirical
comparison. CoRR, abs/2103.09316.
Zhou, X.-H., Zhou, C., Lui, D., and Ding, X. (2014). Ap-
plied missing data analysis in the health sciences.
John Wiley & Sons.