
semble methods (Khanzadeh et al., 2018), as well as
deep neural network architectures (Patil et al., 2023).
In addition, unsupervised learning techniques, such
as self-organizing maps (SOM), K-means clustering,
and Density-Based Spatial Clustering of Applications
with Noise (DBSCAN), have also been explored for
identifying anomalies in thermal data during the DED
process (Taheri et al., 2019; García-Moreno, 2019;
Farea et al., 2024).
However, several research gaps remain unad-
dressed. First, there is a lack of comparative stud-
ies that rigorously evaluate supervised and unsuper-
vised ML methods side-by-side for DED defect de-
tection, especially in terms of robustness under real-
world conditions. Most prior works tend to focus
exclusively on one paradigm without assessing the
strengths and limitations of both. Second, threshold
selection in anomaly detection—particularly for un-
supervised methods like autoencoders—often lacks
standardized or domain-specific criteria, making it
difficult to generalize results across datasets and ap-
plications. Lastly, practical concerns such as extreme
class imbalance, data quality issues, and interpretabil-
ity are frequently underexplored, despite their signifi-
cant impact on real-world deployment.
In this study, we introduce a comprehensive
framework that leverages both supervised and unsu-
pervised ML techniques to detect porosity-related de-
fects in DED using thermal image data. The unsu-
pervised models include autoencoders and DBSCAN,
whilst the supervised models include Random For-
est, Extreme Gradient Boosting (XGBoost), and Con-
volutional Neural Networks (CNNs). These mod-
els are tested on a dataset comprising 1,564 thermal
images of the melt pool, where only 4.5% of im-
ages contain porosity-related defects (Zamiela et al.,
2023). This severe class imbalance introduces model-
ing challenges and necessitates preprocessing strate-
gies tailored to noisy, sparse, and imbalanced data.
To overcome these limitations, a preprocessing
pipeline was implemented that includes outlier removal,
data imputation, normalization, and class rebalancing.
Furthermore, feature extraction was used to support
interpretable models and reduce input dimensional-
ity. Extracted features include statistical descrip-
tors of the melt pool’s thermal distribution—such as
mean, standard deviation, skewness, and interquartile
range—which have been shown in prior research to
correlate with melt pool quality and porosity
formation (García-Moreno, 2019).
This work offers a comparative study of ML meth-
ods for DED defect detection, with an emphasis on
practical issues such as data imbalance, preprocess-
ing complexity, thresholding strategies, and model
interpretability. By combining domain knowledge
with modern ML techniques, the proposed framework
aims to advance real-time defect detection in additive
manufacturing. Ultimately, the results contribute to
the goal of establishing DED as a robust and reliable
process for safety-critical industrial applications.
2 METHODOLOGY
2.1 Preprocessing Pipeline
Due to inconsistencies and noise present in the raw
thermal image data, a robust preprocessing pipeline
was implemented to prepare model-ready inputs and
enhance the performance of both supervised and un-
supervised learning models.
Some images in the dataset contain zero-valued
pixels and/or pixels with missing values. The values
of these pixels were imputed using the mean of their
respective columns within each image. Then, the ther-
mal data underwent min-max normalization, scaling
pixel values to the range of [0, 1]. This normalization
step was crucial for stabilizing gradient-based opti-
mizers in neural network training and ensuring con-
sistency and comparability across ML models.
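The imputation and normalization steps described above can be
sketched as follows. This is a minimal illustration rather than
the implementation used in this study: the function name
impute_and_normalize is hypothetical, scaling is assumed to be
applied per image (the text does not state whether the minimum
and maximum are taken per image or over the whole dataset), and
the fallback to the image-wide mean for columns with no valid
pixel is an added assumption.

```python
import numpy as np

def impute_and_normalize(image):
    """Column-mean imputation followed by min-max scaling to [0, 1]."""
    img = np.asarray(image, dtype=float).copy()

    # Zero-valued and missing (NaN) pixels are both treated as invalid.
    invalid = np.isnan(img) | (img == 0)
    if invalid.all():
        raise ValueError("image has no valid pixels")

    # Mean of the valid pixels in each column of this image.
    valid_count = (~invalid).sum(axis=0)
    col_sum = np.where(invalid, 0.0, img).sum(axis=0)
    # Assumption: a column with no valid pixel falls back to the image mean.
    overall_mean = img[~invalid].mean()
    col_mean = np.where(valid_count > 0,
                        col_sum / np.maximum(valid_count, 1),
                        overall_mean)

    # Replace each invalid pixel with the mean of its column.
    rows, cols = np.nonzero(invalid)
    img[rows, cols] = col_mean[cols]

    # Min-max normalization (assumed per image).
    lo, hi = img.min(), img.max()
    if hi == lo:
        return np.zeros_like(img)
    return (img - lo) / (hi - lo)
```

For example, in a 3 x 3 image whose middle column holds 0, 1500,
and 1700, the zero is replaced by 1600 (the mean of the two valid
values) before the image is rescaled.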
For the shallow models, feature extraction was
employed to reduce dimensionality and enhance in-
terpretability. Each thermal image was transformed
into a structured feature vector comprising 11 statis-
tical descriptors: minimum (Min Temp), maximum
(Max Temp), mean (Mean Temp), standard devia-
tion (Std Temp), median (Median Temp), first quar-
tile (Q1), third quartile (Q3), interquartile range
(IQR), skewness, kurtosis, and peak temperature pixel
(High Temp Pixels). These features summarize the
spatial and statistical properties of the melt pool
temperature distribution, providing a compact and
informative input format for training the supervised
tree-based classifiers (Random Forest and XGBoost).
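As an illustration of the 11 descriptors listed above, the sketch
below computes one feature vector per image. Several details are
assumptions, since the text does not define them: the function
name extract_features and the parameter high_temp_fraction are
hypothetical, High Temp Pixels is interpreted here as the number
of pixels at or above a fraction of the image maximum, skewness
and kurtosis use population moments, and kurtosis is reported as
excess kurtosis.

```python
import numpy as np

def extract_features(image, high_temp_fraction=0.9):
    """Return the 11 statistical descriptors of one thermal image."""
    x = np.asarray(image, dtype=float).ravel()
    mean = x.mean()
    std = x.std()  # population standard deviation
    q1, median, q3 = np.percentile(x, [25, 50, 75])

    # Standardized third and fourth central moments.
    centered = x - mean
    if std > 0:
        skewness = np.mean(centered ** 3) / std ** 3
        kurtosis = np.mean(centered ** 4) / std ** 4 - 3.0  # excess kurtosis
    else:
        skewness = kurtosis = 0.0

    # Assumption: "High Temp Pixels" counts pixels near the peak temperature.
    threshold = high_temp_fraction * x.max()

    return {
        "Min Temp": x.min(),
        "Max Temp": x.max(),
        "Mean Temp": mean,
        "Std Temp": std,
        "Median Temp": median,
        "Q1": q1,
        "Q3": q3,
        "IQR": q3 - q1,
        "Skewness": skewness,
        "Kurtosis": kurtosis,
        "High Temp Pixels": int(np.sum(x >= threshold)),
    }
```

Stacking these dictionaries over all images yields a table with
one row per image and 11 columns, which is the kind of compact
input that tree-based classifiers accept directly.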
Considering the inherent imbalance in the dataset,
the Synthetic Minority Oversampling Technique
(SMOTE) was employed during the training of the
supervised models. SMOTE generates synthetic
minority-class samples by interpolating between
existing anomalous instances, which mitigates
classification bias and improves defect detection recall.
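The interpolation idea behind SMOTE can be sketched in a few
lines. This is a simplified illustration, not the implementation
used in this study: the function name smote_sketch and its
parameters are hypothetical, neighbors are found by brute-force
Euclidean distance, and in practice an established implementation
such as the SMOTE class in the imbalanced-learn library would
typically be used instead.

```python
import numpy as np

def smote_sketch(X_minority, n_synthetic, k=5, seed=0):
    """Create synthetic minority samples by neighbor interpolation."""
    X = np.asarray(X_minority, dtype=float)
    n = len(X)
    if n < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n - 1)
    rng = np.random.default_rng(seed)

    # Pairwise Euclidean distances; a sample is never its own neighbor.
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)
    neighbors = np.argsort(dists, axis=1)[:, :k]

    # Pick a base sample and one of its k nearest neighbors at random.
    base = rng.integers(0, n, size=n_synthetic)
    picked = neighbors[base, rng.integers(0, k, size=n_synthetic)]

    # Place each new point at a random position on the connecting segment.
    gap = rng.random((n_synthetic, 1))
    return X[base] + gap * (X[picked] - X[base])
```

Because every synthetic point lies on a segment between two real
minority samples, oversampling of this kind should be applied to
the training split only, so that no synthetic sample derived from
a test image influences evaluation.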
Collectively, these preprocessing strategies en-
sured the dataset’s compatibility and consistency
across diverse ML models, ultimately enhancing the
effectiveness and interpretability of the anomaly de-
tection framework.
ICINCO 2025 - 22nd International Conference on Informatics in Control, Automation and Robotics