
of quality and operational efficiency, ensuring that
manufacturing companies can meet market standards
as well as shortages.
Current AD methodologies in IIoT that are not
graph-based are extensive, spanning Long Short-Term
Memory networks (LSTMs), Auto-Encoders (AEs),
Convolutional Neural Networks (CNNs), and hybrids.
An AE-based
Digital Twin model (Jeon et al., 2024) is used
for “Extreme Rare Anomalies” in the fabrication
process, yielding a high F1-score of 0.955. An
LSTM-based Autoencoder (Hwang et al., 2023)
works on anonymised wafer fabrication data from
industry, whereas Hsieh et al. (2019) use an
LSTM-based Autoencoder, alongside an extensive
outline of the current problems in wafer fabrication
anomaly detection, to achieve an F1-score of 0.924.
Although the aforementioned methodologies are
effective in their use cases, they do not suit the
aims of this work with regard to achieving
graph-based deep learning anomaly detection.
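Since the works above are compared by F1-score, a minimal sketch of how that metric is computed may help. The counts below are hypothetical and are not taken from any cited paper; they only illustrate why F1 is preferred over accuracy when anomalies are rare.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return F1 = 2 * precision * recall / (precision + recall)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Return the fraction of all samples classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


# Hypothetical batch: 1000 wafers, of which 10 are anomalous.
# Detector A finds 8 anomalies and raises 2 false alarms.
f1_a = f1_score(tp=8, fp=2, fn=2)
acc_a = accuracy(tp=8, fp=2, fn=2, tn=988)

# Detector B never flags anything, so it misses all 10 anomalies.
f1_b = f1_score(tp=0, fp=0, fn=10)
acc_b = accuracy(tp=0, fp=0, fn=10, tn=990)

print(f"A: accuracy={acc_a:.3f}, F1={f1_a:.3f}")
print(f"B: accuracy={acc_b:.3f}, F1={f1_b:.3f}")
```

In this illustration both detectors reach an accuracy of at least 0.99, yet only the F1-score separates the useful detector from the one that flags nothing.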
GNNs have also been applied to AD in smart
manufacturing. Wu et al. (2022) present examples
from different sectors in IIoT, including a Smart
Factory, with basic models. Although deployed in
compelling use cases, the models are more of a
proof of concept. Guan et al. (2022) combine a
Temporal Convolutional Network (Bai et al., 2018)
with the Graph Attention Network GATv2 (Brody
et al., 2022) to detect anomalies in the Mars Science
Laboratory, Soil Moisture Active Passive, and Server
Machine datasets with F1-scores upwards of 0.95.
Although the proposed architecture is sound, Guan
et al. mostly outline example use cases and
semi-relevant public datasets. Cassoli et al. (2023)
use Knowledge Graph creation (Bretones Cassoli
et al., 2022) to implement a graph creation pipeline
for raw metrology manufacturing data and feed the
result into a GNN, achieving an F1-score of 0.48.
To our knowledge, this is the work closest to ours,
as it uses metrology data from a smart factory to
create knowledge graphs and a GNN to detect
graph-level anomalies.
Cassoli et al. create Knowledge Graphs from
the public Bosch dataset¹ in a similar way to that
in Section 3.2, to create a timeline of events
with manufacturing data. Although the methods
are similar, Cassoli et al. (2023) achieve an
F1-score of 0.48, suggesting that the GraphSAGE
architecture may be less suitable for the task of
graph classification than the Graph Attention
Network model explored later.
¹ Bosch Production Line Performance, https://www.kaggle.com/competitions/bosch-production-line-performance
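To make the idea of a timeline of events concrete, the following is a minimal sketch of turning one wafer's process records into a small directed graph. It is not the pipeline of Cassoli et al. nor the method of Section 3.2; the record fields (`step`, `machine`, `timestamp`, `measurement`) and the names `WaferGraph` and `build_timeline_graph` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class WaferGraph:
    """One graph per wafer: nodes are process events, edges follow time."""
    nodes: dict = field(default_factory=dict)  # node id -> feature dict
    edges: list = field(default_factory=list)  # (source id, target id)


def build_timeline_graph(records: list) -> WaferGraph:
    """Order the records by timestamp and chain consecutive events."""
    graph = WaferGraph()
    ordered = sorted(records, key=lambda r: r["timestamp"])
    for node_id, rec in enumerate(ordered):
        graph.nodes[node_id] = {
            "step": rec["step"],
            "machine": rec["machine"],
            "measurement": rec["measurement"],
        }
        if node_id > 0:
            # Directed edge from the previous event to the current one.
            graph.edges.append((node_id - 1, node_id))
    return graph


# Hypothetical records for a single wafer, deliberately out of order.
records = [
    {"step": "etch", "machine": "M07", "timestamp": 20, "measurement": 1.02},
    {"step": "deposit", "machine": "M03", "timestamp": 10, "measurement": 0.98},
    {"step": "inspect", "machine": "M11", "timestamp": 30, "measurement": 1.00},
]
graph = build_timeline_graph(records)
print(graph.nodes[0]["step"], graph.edges)
```

A graph-level classifier would then receive one such graph per wafer, with the node features and edge list as input and a nominal or anomalous label as target.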
As shown, there has been extensive work on AD in
smart manufacturing, using both non-graph and
graph-based methods. However, we believe there is
a gap in current research concerning a foundational
model that can identify anomalies in a timeline of
graph-based metrology data. Such a model would
provide an opportunity for interpretable and
accessible GNN results, with inherent access to
graph-based visualisations through the nature of the
graph form. Using a real-world use case from our
industry partner Seagate Technology, we demonstrate
the application of this type of model to real-world data.
3 METHODOLOGY
3.1 Fabrication Dataset
The data is taken from one of two Seagate
semiconductor wafer fabrication facilities in
Springtown, which manufactures read/write heads
for hard disk drives. The fabrication process contains
over 1000 steps and can frequently involve over 70
unique machines², with
many more Quality Control (QC) checks involving
engineer-set thresholds. These QC checks use
data from inspection stations, or process tools, that
contain many different types of metrology sensors
to take minuscule measurements on the wafers,
on the order of one nanometre. Certain sensors can
be the same type yet may have different parameters,
measuring important features on the wafer such
as thickness, uniformity and topology. However,
like any smart factory, the fabrication process can
suffer from defects through many different sources.
These defects can be caused by the equipment, the
environment, the materials, or the process itself.
The data from the QC checks are not always
indicative of the source of the defect and can be
challenging to interpret. When a defect occurs, the
wafer is pulled and inspected by an engineer to
determine whether it is nominal, can be reworked,
or must be scrapped. The latter two outcomes
then involve a data science team investigating the
root cause of the defect, which can be a very
time-consuming process and potentially costly if
more faulty wafers are produced in the meantime. An example of
a fault that occurred in the past involved a dry
etching machine that was not functioning correctly
² Springtown – A Hard Drive Factory Like No Other, https://www.silicon.co.uk/workspace/springtown-a-hard-drive-factory-like-no-other-119580
Predictive Quality of In-Fabrication Products in Smart Manufacturing Using Graph-Based Deep Learning