performance of the model on unseen data and more
accurately reflects the generalization ability of the
model rather than just its performance on the training
data. By using the validation set, this work can detect
whether the model is overfitting to the training data:
if the model performs well on the training set but
poorly on the validation set, this indicates overfitting.
Fixing the random seed guarantees that the split is
reproducible, which is useful both for debugging and
for comparing model performance.
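As an illustration, a reproducible split of this kind can be obtained with scikit-learn's train_test_split; this is only a minimal sketch, and the placeholder file names, the 80/20 ratio, and the seed value are assumptions for the example rather than the exact settings used in this work.

from sklearn.model_selection import train_test_split

# Hypothetical placeholder data: in the real pipeline these would be the
# image file paths and their breed labels collected from the dataset.
image_paths = [f"cat_{i:04d}.jpg" for i in range(1000)]
labels = [i % 5 for i in range(1000)]  # assume 5 breeds for the example

# Fixing random_state makes the split reproducible across runs,
# which simplifies debugging and comparing model performance.
train_paths, val_paths, train_labels, val_labels = train_test_split(
    image_paths,
    labels,
    test_size=0.2,       # assumed 80/20 train/validation ratio
    stratify=labels,     # keep breed proportions similar in both sets
    random_state=42,     # fixed seed for reproducibility
)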
2.3 Data Augmentation
To enhance the model's learning capabilities, this
work implemented a series of data augmentation
techniques. Initially, pixel values were normalized
from [0, 255] to [0, 1] to streamline model processing
and expedite training convergence. The data
augmentation included random rotations up to 40
degrees and translations of up to 20% in width or
height, simulating varied object positions and
orientations. Shearing was applied to introduce
complex deformations, while random scaling up to
about 20% helped the model recognize objects of
different sizes as the same category. Horizontal
flipping of images was also performed to expand the
training dataset. Additionally, any blank spaces
created by these transformations were filled with the
nearest pixel value to preserve image integrity.
Consistent with the training data, the validation set
was normalized to maintain uniform data formatting,
which is essential for effective model training and
evaluation (Xu, 2023).
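A minimal sketch of such an augmentation pipeline, assuming the Keras ImageDataGenerator API; the shear strength and directory handling are illustrative assumptions, while the other values follow the description above.

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Training generator: rescaling plus the random transformations described above.
train_datagen = ImageDataGenerator(
    rescale=1.0 / 255,        # normalize pixel values from [0, 255] to [0, 1]
    rotation_range=40,        # random rotations up to 40 degrees
    width_shift_range=0.2,    # horizontal translations up to 20% of the width
    height_shift_range=0.2,   # vertical translations up to 20% of the height
    shear_range=0.2,          # shearing (illustrative value)
    zoom_range=0.2,           # random scaling up to about 20%
    horizontal_flip=True,     # random horizontal flips
    fill_mode="nearest",      # fill blank areas with the nearest pixel value
)

# Validation generator: only normalization, consistent with the training data.
val_datagen = ImageDataGenerator(rescale=1.0 / 255)

Generators built this way can then be connected to the image folders with flow_from_directory to supply batches during training and validation.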
2.4 Model Architecture
VGG16 is an early CNN model. When deep learning
first became popular, it was among the best-performing
neural network architectures of its time (Simonyan,
2014), which makes it a useful reference (control)
model for this study. It deepens the network by
stacking 3×3 convolutional layers. Its parameter count
is relatively large, about 138 million, and its design is
relatively simple, so its performance on basic image
classification tasks is still quite good. However,
distinguishing cat breeds may be difficult for it,
because the task requires capturing more complex and
subtle features, and VGG16 is not as strong as newer
architectures in this respect.
EfficientNetB0 is a more recent neural network
architecture (Tan, 2019). Its design is optimized
differently from VGG16: it balances network depth,
width, and input resolution through compound scaling,
which greatly improves accuracy. Its parameter count
is low, only about 5.1 million parameters, so from a
design point of view it is lighter and more efficient
than VGG16 and stronger at solving complex image
classification tasks. This study trains both models and
analyzes their results to compare the performance of
VGG16 and EfficientNetB0 when used to classify cat
breeds. Their performance is analyzed and compared
using precision, recall, F1 score, and the confusion
matrix (Hossin, 2015).
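As a rough sketch of how the two backbones can be instantiated for such a comparison, assuming Keras pretrained models with ImageNet weights; the input size, number of breeds, and classification head below are illustrative assumptions, not the exact configuration used in this work.

from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16, EfficientNetB0

NUM_BREEDS = 12              # assumed number of cat breeds
INPUT_SHAPE = (224, 224, 3)  # assumed input resolution

def build_classifier(backbone_name):
    """Attach a small classification head to a frozen pretrained backbone."""
    if backbone_name == "vgg16":
        base = VGG16(weights="imagenet", include_top=False, input_shape=INPUT_SHAPE)
    else:
        base = EfficientNetB0(weights="imagenet", include_top=False, input_shape=INPUT_SHAPE)
    base.trainable = False   # phase one: keep pretrained convolutional layers frozen

    model = models.Sequential([
        base,
        layers.GlobalAveragePooling2D(),
        layers.Dense(256, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(NUM_BREEDS, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

In this sketch, calling build_classifier("vgg16") or build_classifier("efficientnetb0") produces the two candidate models with identical heads, so the comparison isolates the effect of the backbone.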
3 EXPERIMENT AND RESULTS
3.1 Training Details
This study used the following training techniques to
optimize the model: ModelCheckpoint, EarlyStopping,
and ReduceLROnPlateau. ModelCheckpoint is a
callback that saves the best-performing model on the
validation set during training. EarlyStopping is a
callback used to stop training early (Yao, 2007): if
performance on the validation set (e.g., validation
loss) does not improve within a certain number of
epochs, training ends early to avoid overfitting.
ReduceLROnPlateau is a callback that reduces the
learning rate when model performance stops
improving; it automatically lowers the learning rate
when validation performance does not improve within
a certain number of epochs, which helps the model
adjust its weights more smoothly as it approaches the
optimal solution.
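A minimal sketch of these three callbacks using the Keras API; the monitored metric, patience values, and checkpoint path are assumptions chosen for illustration rather than the exact values used in this work.

from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping, ReduceLROnPlateau

callbacks = [
    # Save the best-performing model (lowest validation loss) seen so far.
    ModelCheckpoint("best_model.keras", monitor="val_loss", save_best_only=True),
    # Stop training early if validation loss stops improving, to avoid overfitting.
    EarlyStopping(monitor="val_loss", patience=10, restore_best_weights=True),
    # Reduce the learning rate when validation loss plateaus, so weights are
    # adjusted more smoothly near the optimum.
    ReduceLROnPlateau(monitor="val_loss", factor=0.2, patience=3, min_lr=1e-6),
]

# The list is then passed to model.fit(..., callbacks=callbacks).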
The model was trained in two phases. In the first
phase, the pretrained convolutional layers were frozen
and only the newly added layers were trained; freezing
the convolutional layers preserves the general
low-level features they have already learned and helps
prevent overfitting. In the second phase, the last few
layers were selectively unfrozen for fine-tuning, so the
model could learn task-specific features from the new
data while preserving the general features learned
previously. A low learning rate was applied during
fine-tuning so that the weights of these final layers
were adjusted smoothly while training on the new task.
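A rough sketch of this two-phase schedule, continuing the assumptions of the earlier sketches (a model whose first layer is the pretrained backbone, data generators train_gen and val_gen, and the callbacks list); the learning rates, epoch counts, and number of unfrozen layers are illustrative, not the exact values used in this work.

from tensorflow.keras.optimizers import Adam

# Assumes `model` was built as in the earlier sketch (frozen backbone as the
# first layer) and that `train_gen`, `val_gen`, and `callbacks` already exist.
base_model = model.layers[0]

# Phase 1: train only the new classification head on top of the frozen backbone.
model.compile(optimizer=Adam(learning_rate=1e-3),
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.fit(train_gen, validation_data=val_gen, epochs=20, callbacks=callbacks)

# Phase 2: unfreeze only the last few backbone layers for fine-tuning, and
# recompile with a much lower learning rate so the pretrained weights are
# adjusted smoothly rather than overwritten.
base_model.trainable = True
for layer in base_model.layers[:-20]:   # illustrative: keep earlier layers frozen
    layer.trainable = False

model.compile(optimizer=Adam(learning_rate=1e-5),
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.fit(train_gen, validation_data=val_gen, epochs=10, callbacks=callbacks)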
3.2 Result Comparison
To compare the performance of the two trained
models, this work evaluated the models in various
ways. In addition to accuracy, it also includes