Mixed Pattern Recognition Methodology on Wafer Maps
with Pre-trained Convolutional Neural Networks
Yunseon Byun and Jun-Geol Baek
School of Industrial Management Engineering, Korea University, Seoul, South Korea
Keywords: Classification, Convolutional Neural Networks, Deep Learning, Smart Manufacturing.
Abstract: In the semiconductor industry, defect patterns on wafer bin maps are related to yield degradation. Most
companies control the manufacturing processes that cause critical defects by identifying the patterns on these maps,
so it is important to classify the patterns accurately. Engineers currently inspect the maps directly. However, it is
difficult to check many wafers one by one as the demand for semiconductors increases. Although
many studies on automatic classification have been conducted, it is still hard to classify maps on which two or more
patterns are mixed. In this study, we propose an automatic classifier that identifies whether
a map contains a single pattern or a mixed pattern and shows which types are mixed. Convolutional neural networks are
used as the classification model, and a convolutional autoencoder is used to initialize the convolutional
neural networks. After being trained on single-type defect map data, the model is tested on single-type and
mixed-type patterns. Whether a map is a mixed-type pattern is determined by comparing the probabilities
that the model assigns to each class with a threshold. The proposed method is evaluated on wafer bin
map data with eight defect patterns. The results show that single-type defect pattern maps and mixed-type defect
pattern maps are identified accurately without prior knowledge. The probability-based defect pattern classifier
can improve the overall classification performance. It is also expected to help control root causes and
manage the yield.
1 INTRODUCTION
The semiconductor manufacturing process is fine and
sophisticated, so a problem in any part of the
process can be fatal to the yield. Yield
is the percentage of good chips actually produced
relative to the maximum number of
chips on a wafer. Because yield represents product quality
in the semiconductor industry, many engineers strive
to increase it.
One way to increase the yield is to check the
defect patterns on wafer bin maps and control the
causes of yield degradation. Wafer bin maps are
obtained during the EDS (electrical die sorting) test,
a step that checks the quality of each chip
on a wafer. By testing various parameters such as
voltage, current, and temperature, the chips are
tagged as good or bad. Engineers can then identify
the defect patterns that appear on the map. Defect
patterns come in various types, such as center, donut,
scratch, and ring. Figure 1 shows examples of
pattern types on wafer bin maps. Each pattern is
related to different causal factors. If the pattern is
identified exactly, the underlying problem can be
estimated.
In fact, many engineers still check the map
visually (Park, J., Kim, J., Kim, H., Mo, K., and Kang,
P., 2018). So, it is difficult to identify many wafers
one by one as the demand for semiconductors
increases. Also, it is hard to classify the type,
especially when the patterns are mixed. Although
there are several studies for automatic classification,
more research on mixed patterns is still needed.
Figure 1: Various pattern types on wafer bin maps.
Byun, Y. and Baek, J.
Mixed Pattern Recognition Methodology on Wafer Maps with Pre-trained Convolutional Neural Networks.
DOI: 10.5220/0009177909740979
In Proceedings of the 12th International Conference on Agents and Artificial Intelligence (ICAART 2020) - Volume 2, pages 974-979
ISBN: 978-989-758-395-7; ISSN: 2184-433X
Copyright © 2022 by SCITEPRESS Science and Technology Publications, Lda. All rights reserved
Real data contains maps on which it is difficult
to distinguish any pattern, or on which several patterns are mixed
on one wafer. If mixed-type defects are incorrectly
determined to be a single-type defect, the causal factors
cannot be identified properly. Critical defects may then
go uncontrolled and cause yield degradation. So,
accurate pattern classification is needed.
In this paper, we propose a pattern classification
method with a pre-trained convolutional neural
network model. The automatic classification method
based on the probabilities increases the overall
classification accuracy. Also, it helps to identify the
exact causes of defects and improve the yield.
The paper is organized as follows. Section 2
presents the related algorithms, and Section 3
explains the proposed method. Section 4 presents
the experimental results on the wafer map image data
to verify the performance of the model. Finally,
Section 5 concludes the paper.
2 RELATED ALGORITHMS
Convolutional neural networks generally perform well
on image processing tasks, so we use a
convolutional neural network as the classifier. Before
classification, the weights are learned by a convolutional
autoencoder and then used to initialize the convolutional
neural network to increase performance. This
section describes the convolutional autoencoder and the
convolutional neural networks.
2.1 Convolutional Autoencoder
The convolutional autoencoder is an unsupervised
learning model that learns features from images
without label information (Guo, X., Liu, X., Zhu, E.,
and Yin, J., 2017). When the amount of data is small,
a supervised learning model can overfit the training
data. In that case, it is more effective to
use an unsupervised learning method such as an
autoencoder (David, O. E., & Netanyahu, N. S., 2016).
An autoencoder consists of an encoder and a
decoder, as shown in Figure 2. In the encoder, the key
features are learned in the hidden layer and the rest is
discarded, so high-dimensional data is
reduced to a latent representation. The decoder
produces an approximation of the input as its output,
which has to be as close to the input as possible.
When the number of nodes in the middle layer is
smaller than the number of nodes in the input layer,
the data is compressed. The conventional
autoencoder represents high-dimensional data as a
single vector.
Figure 2: The architecture of convolutional autoencoder.
The convolutional autoencoder, on the other
hand, compresses the data while maintaining the
coordinate space, so it does not lose spatial
information when image data is used.
A convolutional autoencoder has the structure of an
autoencoder built from convolutional layers.
An encoder extracts features with
convolutional layers and pooling layers. The
convolutional layers find kernel patterns in the input
image. Kernels slide across the height and width of the
image, performing convolution operations that
generate activation maps (Kumar, S., and Aggarwal,
R. K., 2018). The features are then compressed
in the pooling layers, which helps to find important
patterns and reduces the number of computations (Park,
J., Kim, J., Kim, H., Mo, K., and Kang, P., 2018).
Max pooling is the most representative pooling method. An activation
map is divided into m × n regions, and the largest value in each region is extracted.
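Then, representative features can be extracted. After
the convolutional layers and pooling layers, the latent
representation $h^k$ of the $k$-th feature map
is generated as in equation (1). $\sigma$ is an
activation function for the input data $x$, $W^k$ and $b^k$ are the
weights and bias of the $k$-th kernel, and $*$ means the
convolution operation.

$$h^k = \sigma(x * W^k + b^k) \qquad (1)$$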
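The encoder operations described above can be illustrated with a minimal sketch in plain Python. It assumes a single-channel image stored as a list of lists, stride 1, no padding, and a sigmoid activation; the function names `conv2d_valid`, `feature_map`, and `max_pool` are hypothetical and not taken from the paper.

```python
import math


def conv2d_valid(image, kernel, bias=0.0):
    """Slide `kernel` over `image` (stride 1, no padding).

    Like most deep learning libraries, this computes a cross-correlation,
    i.e. the kernel is not flipped before the element-wise products.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            s = bias
            for i in range(kh):
                for j in range(kw):
                    s += image[r + i][c + j] * kernel[i][j]
            row.append(s)
        out.append(row)
    return out


def sigmoid(z):
    """Assumed activation function; the paper does not name one."""
    return 1.0 / (1.0 + math.exp(-z))


def feature_map(image, kernel, bias=0.0, activation=sigmoid):
    """Apply the activation to every element of the convolution output."""
    return [[activation(v) for v in row]
            for row in conv2d_valid(image, kernel, bias)]


def max_pool(activation_map, m, n):
    """Keep the largest value of each non-overlapping m x n region.

    Rows or columns that do not fill a complete region are dropped.
    """
    rows = len(activation_map) // m
    cols = len(activation_map[0]) // n
    return [[max(activation_map[r * m + i][c * n + j]
                 for i in range(m) for j in range(n))
             for c in range(cols)]
            for r in range(rows)]
```

For a 4 × 4 input, `max_pool(image, 2, 2)` returns a 2 × 2 map, which shows how pooling reduces the amount of computation in later layers while keeping the strongest responses.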
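A decoder reconstructs the data from the encoded
representation so that it is as close as possible to the original input. The
reconstructed output $y$ can be obtained from
equation (2), where $\tilde{W}^k$ denotes the flip operation on the weights,
$H$ is the set of latent feature maps, and
$c$ is the bias.

$$y = \sigma\Big(\sum_{k \in H} h^k * \tilde{W}^k + c\Big) \qquad (2)$$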
The reconstruction loss measures how well the
decoder is performing, that is, how close the output is to
the original data (Kumar, S., and Aggarwal,
R. K., 2018). The model is trained to
minimize the reconstruction loss (Masci, J., Meier,
U., Ciresan, D., and Schmidhuber, J., 2011).
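As an illustration only, the reconstruction loss can be sketched as a mean squared error over all pixels. The paper does not state which loss was used, so the choice of mean squared error and the name `reconstruction_loss` are assumptions.

```python
def reconstruction_loss(original, reconstructed):
    """Mean squared error between two images of the same shape.

    Assumption: the paper does not specify the loss; MSE is a common
    choice for convolutional autoencoders.
    """
    total = 0.0
    count = 0
    for row_x, row_y in zip(original, reconstructed):
        for x, y in zip(row_x, row_y):
            total += (x - y) ** 2
            count += 1
    return total / count
```

A perfect reconstruction gives a loss of 0, and the loss grows as the decoder output moves away from the input.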
2.2 Convolutional Neural Networks
Convolutional neural networks (CNNs) are end-to-end
models that can be used for both feature extraction and
classification. A CNN consists of
convolutional layers, pooling layers, a fully connected
layer, and an output layer. The architecture is shown in
Figure 3.
Figure 3: The architecture of convolutional neural networks.
Features are extracted by the convolutional layers
and pooling layers and are then used for
classification in a fully connected layer. Neurons in a
fully connected layer have full connections to all
activations in the previous layer. The number of
nodes in the output layer is the same as the number of
classes. On the output layer, the probability of each class
is obtained using the softmax function in
(3). The input data is $x$, $W_j$ means the weights of the
$j$-th node, $b_j$ means the bias of the $j$-th node, and the sum
in the denominator runs over all $K$ classes.

$$P(y = j \mid x, W, b) = \frac{\exp(W_j^{\top} x + b_j)}{\sum_{k=1}^{K} \exp(W_k^{\top} x + b_k)} \qquad (3)$$
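A minimal sketch of the output layer, assuming the extracted features are given as a flat list `x` and each output node has its own weight vector and bias. The names `softmax` and `class_probabilities` are hypothetical; subtracting the maximum score is a standard numerical safeguard that does not change the result.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def class_probabilities(x, weights, biases):
    """Score each output node as W_j . x + b_j, then apply softmax."""
    scores = [sum(w * v for w, v in zip(w_j, x)) + b_j
              for w_j, b_j in zip(weights, biases)]
    return softmax(scores)
```

Two nodes with equal scores receive probability 0.5 each, and the node with the largest score always receives the largest probability.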
The loss function of the model in (4) can be
minimized using a backpropagation algorithm
(Cheon, S., Lee, H., Kim, C. O., and Lee, S. H., 2019).
The backpropagation algorithm looks for the
minimum error in weight space. The weights that
minimize the error are considered to be a solution to
the problem. $N$ is the number of data and $K$ is the
number of classes. $\mathbf{1}_{\{y_i \in j\}}$ is a function that has a
value of 1 when the actual class of the $i$-th sample is $j$.

$$L(W, b) = -\frac{1}{N} \left[ \sum_{i=1}^{N} \sum_{j=1}^{K} \mathbf{1}_{\{y_i \in j\}} \log P(y_i = j \mid x_i, W, b) \right] \qquad (4)$$
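The loss can be sketched in a few lines, assuming the softmax probabilities have already been computed for every sample and the labels are integer class indices. The name `cross_entropy` and the small clipping constant are illustrative assumptions.

```python
import math


def cross_entropy(probabilities, labels, eps=1e-12):
    """Average negative log-probability of the true class.

    `probabilities[i][j]` is the softmax output for sample i and class j;
    `labels[i]` is the index of the true class. The indicator function
    selects exactly one term per sample, so only that term is summed.
    `eps` guards against log(0) and is an assumption of this sketch.
    """
    n = len(labels)
    total = sum(math.log(max(p[y], eps)) for p, y in zip(probabilities, labels))
    return -total / n
```

The loss is 0 when the model assigns probability 1 to every true class and equals log 2 when a two-class model always outputs 0.5.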
The probabilities of belonging to the $j$-th class are
obtained from the softmax function. Then, the class
whose node has the highest probability is assigned as the
final class.
2.3 Pre-trained Convolutional Neural
Networks
The pre-trained convolutional neural network is a
combination of a convolutional autoencoder and a
convolutional neural network, as shown in Figure 4
(Masci, J., Meier, U., Ciresan, D., and Schmidhuber,
J., 2011). The weights of the convolutional autoencoder
are used to initialize the weights of the convolutional
neural network.
Figure 4: The architecture of the pre-trained model.
A typical convolutional neural network sets its
initial weights randomly. In contrast, using the
weights of the convolutional autoencoder allows the data
to be represented clearly in a low-dimensional structure,
which helps to create a classifier that better reflects the
features (Kumar, S., and Aggarwal, R. K., 2018).
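The difference between random initialization and autoencoder-based initialization can be sketched without any deep learning framework. In this hypothetical example, each layer's weights are a flat list stored in a dictionary keyed by layer name; the layer names and the functions `random_init` and `init_from_autoencoder` are assumptions made for illustration.

```python
import copy
import random


def random_init(layer_sizes, seed=0):
    """Baseline: draw a small random value for every weight."""
    rng = random.Random(seed)
    return {name: [rng.uniform(-0.05, 0.05) for _ in range(size)]
            for name, size in layer_sizes.items()}


def init_from_autoencoder(cnn_weights, encoder_weights):
    """Copy trained encoder weights into the matching CNN layers.

    Layers that exist only in the CNN (for example the fully connected
    layer) keep their random values. The input dictionary is not modified.
    """
    initialized = copy.deepcopy(cnn_weights)
    for name, values in encoder_weights.items():
        if name in initialized and len(initialized[name]) == len(values):
            initialized[name] = list(values)
    return initialized
```

After this step the convolutional layers start from weights that already encode the wafer map features, and supervised training only fine-tunes them together with the randomly initialized classification layers.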
3 THE PROPOSED METHOD
In this paper, we propose a pattern classification
method based on the probabilities of the softmax function.
The method consists of three steps to classify the defect
patterns of wafer maps. The first step is to initialize
the weights with the convolutional autoencoder. The
second step is to build a pre-trained convolutional
neural network for single-type pattern classification.
The final step is to determine whether the patterns on
a wafer map are mixed, based on the probabilities of the softmax
function. Figure 5 shows the proposed method with
the pre-trained model.
3.1 Initialization of Weights with
Convolutional Autoencoder
Image data contains coordinate information in pixel
units, and the coordinate information of a defective die is
important. Because the convolutional autoencoder preserves
all coordinate information of the inputs, it can prevent
distortion in the feature space. Training with the
convolutional autoencoder is therefore effective for extracting
the features of images (Masci, J., Meier, U., Ciresan,
D., and Schmidhuber, J., 2011).
Figure 5: The proposed method.
3.2 Pre-trained Convolutional Neural
Networks Training
Convolutional neural networks are known as
classifiers with high performance on image data.
They use the softmax function, one of the activation
functions, to assign classes. It takes the
extracted features as inputs and calculates the
probability of each class, and the probabilities sum
to 1. The class with the highest probability
becomes the classification result.
To obtain the probability of each class, we train the
convolutional neural network with the single-type
pattern data. The weights trained with the
convolutional autoencoder are used as the initial
values of the convolutional neural network. Training then
only fine-tunes the weights, so key information is
preserved and the features of the images are well
reflected. Also, the pre-trained model is effective when
the amount of labeled data is small (Kohlbrenner, M.,
Hofmann, R., Ahmmed, S., and Kashef, Y., 2017).
3.3 Probabilistic Mixed-type Pattern
Recognition on Wafer Maps
The model obtains class probabilities from the softmax
function, and the class with the highest probability
becomes the classification result. For single-type
defects, the probability of one node is
prominent. For mixed-type patterns, however, several
classes receive high probabilities, so the difference
between the maximum and the next maximum is not large.
Comparing this difference with a threshold, as in (5),
determines whether the patterns are mixed.
$\mathrm{prob}_{x \in j}$ means the probability that the map $x$
belongs to class $j$, and a map is judged to be mixed-type when the
inequality holds.

$$\text{threshold} > \max_{j} \mathrm{prob}_{x \in j} - \operatorname{nextmax}_{j} \mathrm{prob}_{x \in j} \qquad (5)$$
If the difference of probabilities is large, it means
that the pattern type of wafer map is clearly single.
And, if the difference is not large, it means that the
pattern type is mixed. The threshold value is specified
after checking the probability distribution of data.
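The decision rule can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the function name `classify_wafer_map` is hypothetical, a gap exactly equal to the threshold is treated as single-type, and the two most probable classes are reported as the components of a mixed-type map.

```python
def classify_wafer_map(probabilities, class_names, threshold):
    """Decide between single-type and mixed-type from softmax outputs.

    Returns ("single", [top class]) when the gap between the two largest
    probabilities reaches the threshold, otherwise
    ("mixed", [top class, second class]).
    """
    ranked = sorted(range(len(probabilities)),
                    key=lambda j: probabilities[j], reverse=True)
    top, second = ranked[0], ranked[1]
    gap = probabilities[top] - probabilities[second]
    if gap >= threshold:  # assumption: equality counts as single-type
        return "single", [class_names[top]]
    return "mixed", [class_names[top], class_names[second]]
```

With the threshold of 0.96 reported in Section 4.2, a map whose top probability is 0.99 is kept as single-type, while a map with probabilities 0.55 and 0.40 for its two strongest classes is reported as a mixture of those two classes.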
4 THE EXPERIMENTAL RESULT
The dataset used in the experiment is WM-811K. It
contains 172,950 wafer map images. Each pixel in an
image represents a die on the wafer. After testing, a
normal chip is represented as 1, and a defective chip
is represented as 2. Although the wafer is
circular, the inputs of the convolutional neural network
have to be square, so empty pixels are represented as
0. The goal of the experiment is to classify the
defect patterns, so we consider only the failure
types, not the normal or unlabeled maps. The number of usable
maps is 25,519, but the images are not all the same size, and
most of them are rectangular. We therefore resize the data
to 26 × 26 square images for the training of CNNs.
Finally, we use only 7,915 images, which are
divided at a ratio of 7:3 into training and test sets.
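The preprocessing can be sketched in plain Python. The paper does not state how the images were resized or shuffled, so nearest-neighbour resizing (which keeps the die codes 0, 1, and 2 intact) and a seeded random shuffle are assumptions, as are the names `resize_nearest` and `split_train_test`.

```python
import random


def resize_nearest(wafer_map, size=26):
    """Resize a rectangular wafer map to size x size.

    Nearest-neighbour sampling is assumed here because it never creates
    values other than the original codes (0 empty, 1 normal, 2 defective).
    """
    h, w = len(wafer_map), len(wafer_map[0])
    return [[wafer_map[r * h // size][c * w // size] for c in range(size)]
            for r in range(size)]


def split_train_test(samples, train_ratio=0.7, seed=0):
    """Shuffle the samples and split them into training and test sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]
```

Applying `resize_nearest` to every map and then `split_train_test` with the default ratio yields the 7:3 division described above.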
The inputs consist of eight single-type patterns,
shown in Figure 6: center, donut,
scratch, random, edge-loc, edge-ring, loc, and near-full.
Figure 6: The single-type defect pattern maps.
The given data contains only single-type defect
patterns, whereas recognizing mixed-type
defect patterns requires mixed-type wafer
map data. We therefore generated mixed-type patterns by
computationally combining single-type patterns. The target
patterns for generation are the center, scratch, edge-loc,
and edge-ring patterns. We assume that only two
single-type patterns can be mixed. The generated
patterns are the mixture of edge-loc and scratch, the
mixture of edge-loc and center, the mixture of
edge-ring and scratch, the mixture of edge-ring and center,
and the mixture of scratch and center, as shown in
Figure 7. The number of generated maps is 580.
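The paper does not specify the computation used to combine two single-type maps. One plausible way, shown here purely as an assumption, is an element-wise maximum over two maps of the same shape: with the encoding 0 (empty), 1 (normal), and 2 (defective), a die is defective in the mixed map if it is defective in either source map. The name `mix_patterns` is hypothetical.

```python
def mix_patterns(map_a, map_b):
    """Overlay two wafer maps of the same shape by element-wise maximum.

    Assumed encoding: 0 empty, 1 normal, 2 defective. This rule is an
    illustration, not the generation procedure used in the paper.
    """
    same_rows = len(map_a) == len(map_b)
    if not same_rows or any(len(ra) != len(rb) for ra, rb in zip(map_a, map_b)):
        raise ValueError("wafer maps must have the same shape")
    return [[max(a, b) for a, b in zip(ra, rb)]
            for ra, rb in zip(map_a, map_b)]
```

For example, overlaying a map with a defective center die and a map with a defective edge die gives a map in which both dies are marked defective.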
Figure 7: The mixed-type defect pattern maps.
4.1 The Result for Single-type Pattern
Classification
The pre-trained model is used to classify the
single-type pattern maps and to calculate the softmax
probability values. Figure 8 shows the 10-fold
classification performance of the models. Compared with
machine learning methods such as the support vector
machine and random forest, the
pre-trained convolutional neural network performs better.
In particular, the proposed model outperformed the
original convolutional neural network. This shows
that setting the initial weights with a convolutional
autoencoder, so that they reflect the data characteristics,
helps classify the patterns better.
Figure 8: Classification performance of models.
Table 1 shows the classification results
for the single-type test cases. The overall
classification accuracy is high, but the accuracy for the
scratch, edge-loc, and loc patterns is relatively low.
Table 1: Classification accuracy for the single-type data.
Pattern Type    Accuracy
Center          97%
Donut           100%
Scratch         85.5%
Random          90%
Edge-Loc        87%
Edge-Ring       98.5%
Loc             85.4%
Near-full       100%
4.2 The Result for Mixed-type Pattern
Classification
Whether the patterns on a defect map are mixed
is determined by comparing
the difference of the softmax probabilities with the
threshold. The value of the threshold is
obtained from the distribution of the softmax
probabilities. Figure 9 shows the histogram of the
probability differences. Based on the figure, the threshold is
empirically set to 0.96.
By comparing the threshold with the difference of
probabilities, the mixed-type patterns can be
recognized. Table 2 shows the test results on the
mixed-type data. In these results, we judge that the
model classified a map correctly only if it recognizes the
mixed-type pattern and accurately detects which patterns
are mixed. Although the model determines well whether a map is
mixed, it was not good at detecting the single-type
patterns that constitute a mixed-type pattern. Among
them, the accuracy is relatively low when an
edge-ring pattern is mixed.
Figure 9: The histogram for the difference of probabilities.
Table 2: Classification accuracy for the mixed-type data.
Pattern Type            Accuracy
Center + Edge-Loc       59%
Scratch + Edge-Loc      41.6%
Center + Edge-Ring      69%
Scratch + Edge-Ring     64.4%
Center + Scratch        40%
5 CONCLUSIONS
This paper proposed a probabilistic method for
classifying defect patterns on wafer bin maps. We
constructed a pre-trained model with a convolutional
autoencoder and a convolutional neural network, and
we determined whether the patterns on a wafer
map are mixed by comparing the threshold with the
difference of probabilities. Experiments with the
WM-811K data verify the performance of the model. The
classification performance of the model for single-type
patterns is excellent, but the performance for
mixed-type patterns is relatively low. We presume that
the patterns of the training data are not clearly
distinguished and that the threshold is set to a
very high value because of the imbalance between the number of
single-type and mixed-type data. These points
need to be addressed in future work. In addition, we
assumed that only two patterns can be mixed, so
mixtures of more patterns have to be studied
before the method can be applied to actual data.
ACKNOWLEDGEMENTS
This work was supported by the National Research
Foundation of Korea (NRF) grant funded by the Korea
government (MSIT) (NRF-2019R1A2C2005949).
This work was also supported by the BK21 Plus (Big
Data in Manufacturing and Logistics Systems, Korea
University) and by the Samsung Electronics Co., Ltd.
REFERENCES
Cheon, S., Lee, H., Kim, C. O., & Lee, S. H., 2019.
Convolutional Neural Network for Wafer Surface
Defect Classification and the Detection of Unknown
Defect Class. IEEE Transactions on Semiconductor
Manufacturing, 32(2), 163-170.
David, O. E., & Netanyahu, N. S., 2016. Deeppainter:
Painter classification using deep convolutional
autoencoders. In International conference on artificial
neural networks (pp. 20-28). Springer, Cham.
Guo, X., Liu, X., Zhu, E., & Yin, J., 2017. Deep clustering
with convolutional autoencoders. In International
Conference on Neural Information Processing (pp. 373-
382). Springer, Cham.
Kim, H., & Kyung, G., 2017. Wafer Map defects patterns
classification using neural networks. Korean Institute of
Industrial Engineers, 4319-4338.
Kohlbrenner, M., Hofmann, R., Ahmmed, S., & Kashef, Y.,
2017. Pre-Training CNNs Using Convolutional
Autoencoders.
Kumar, S., & Aggarwal, R. K., 2018. Augmented
Handwritten Devanagari Digit Recognition Using
Convolutional Autoencoder. In 2018 International
Conference on Inventive Research in Computing
Applications (ICIRCA) (pp. 574-580). IEEE.
Masci, J., Meier, U., Cireşan, D., & Schmidhuber, J., 2011.
Stacked convolutional auto-encoders for hierarchical
feature extraction. In International Conference on
Artificial Neural Networks (pp. 52-59). Springer,
Berlin, Heidelberg.
Nakazawa, T., & Kulkarni, D. V., 2018. Wafer map defect
pattern classification and image retrieval using
convolutional neural network. IEEE Transactions on
Semiconductor Manufacturing, 31(2), 309-314.
Park, J., Kim, J., Kim, H., Mo, K., & Kang, P., 2018. Wafer
Map-based Defect Detection Using Convolutional
Neural Networks. Journal of the Korean Institute of
Industrial Engineers, 44(4), 249-258.
Wu, M. J., Jang, J. S. R., & Chen, J. L., 2014. Wafer map
failure pattern recognition and similarity ranking for
large-scale data sets. IEEE Transactions on
Semiconductor Manufacturing, 28(1), 1-12.