computational costs. This paper uses max pooling and global average pooling, both of which are common pooling operations. The fully connected layer combines and classifies the features extracted by the convolutional and pooling layers. It connects every neuron in the current layer to every neuron in the previous layer to perform a comprehensive analysis and judgment of the features. The fully connected layer is usually placed in the last few layers of a CNN to output the classification result or prediction value.
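To make the effect of the two pooling operations concrete, the short sketch below (illustrative only; the 32-channel feature map is an arbitrary example, not taken from the model) prints the output shapes that max pooling and global average pooling produce in Keras.

import tensorflow as tf

# Illustrative example: how the two pooling operations change a feature map's shape.
x = tf.random.normal((1, 200, 200, 32))                    # one 200x200 feature map with 32 channels
print(tf.keras.layers.MaxPooling2D((2, 2))(x).shape)       # (1, 100, 100, 32): spatial size halved
print(tf.keras.layers.GlobalAveragePooling2D()(x).shape)   # (1, 32): one average value per channel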
In the model-building part of the code, a sequential model is created using the tf.keras.Sequential class. The model contains four convolutional layers, three max pooling layers, one global average pooling layer, three fully connected layers, and two Dropout layers. Specifically, the model structure is constructed as follows:
The first layer is a convolutional layer with 32 filters, a (3, 3) kernel size, a ReLU activation function, and an input shape of (200, 200, 3). The second layer is a max pooling layer with a (2, 2) pool size. The third layer is a convolutional layer with 64 filters; the kernel size is still (3, 3) and the activation function is still ReLU. Except for the number of filters, these settings are the same for every convolutional layer. The fourth layer is the same as the second one. Then there are two convolutional layers with 128 and 256 filters respectively, followed by another identical max pooling layer and then a global average pooling layer. After that are two fully connected layers with 256 and 128 neurons respectively. Both fully connected layers use the Rectified Linear Unit (ReLU) activation function and L2 regularization, and each is followed by a Dropout layer with a dropout rate of 0.5. Finally, there is an output layer with 102 neurons and a softmax activation function.
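The architecture described above can be written as the following Keras sketch. This is an illustrative reconstruction rather than the exact code of the study; in particular, the L2 regularization factor (0.01) is an assumption, since its value is not stated here.

import tensorflow as tf
from tensorflow.keras import layers, regularizers

# Sketch of the architecture described in the text.
# The L2 factor of 0.01 is assumed, not taken from the paper.
model = tf.keras.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(200, 200, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(128, (3, 3), activation='relu'),
    layers.Conv2D(256, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.GlobalAveragePooling2D(),
    layers.Dense(256, activation='relu', kernel_regularizer=regularizers.l2(0.01)),
    layers.Dropout(0.5),
    layers.Dense(128, activation='relu', kernel_regularizer=regularizers.l2(0.01)),
    layers.Dropout(0.5),
    layers.Dense(102, activation='softmax'),
])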
2.3 Implementation Details
This experiment is implemented with TensorFlow. When compiling the model, the optimizer is set to Adam. In deep learning, an optimizer adjusts the parameters of a model in order to minimize the loss function. The model is trained for 100 epochs. In machine learning and deep learning, an evaluation metric is a measure used to assess the performance of a model.
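As a hedged sketch of these settings, the model could be compiled and trained as follows. The loss function (sparse categorical cross-entropy), the accuracy metric, and the train_ds and val_ds dataset objects are illustrative assumptions consistent with the 102-class softmax output and the accuracy values reported later, not details specified in this section.

# Compile with the Adam optimizer and accuracy as the evaluation metric,
# then train for 100 epochs. The loss choice and dataset names are assumed.
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

history = model.fit(train_ds,                  # placeholder training dataset
                    validation_data=val_ds,    # placeholder validation dataset
                    epochs=100)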
3 RESULTS AND DISCUSSION
This study builds several versions of the model in pursuit of higher accuracy. In Figure 2 above, the upper plot shows the training history of the loss of the version 5 model, and the lower plot shows its accuracy; this figure corresponds to 20 epochs of training and presents the evaluation results recorded during training. As shown in Figure 3 above, the x-axis indicates the epoch number, and the y-axis represents the value of loss and accuracy. The loss rates decrease over time, while the accuracy rates increase. Training loss measures the model's error on the training dataset, and validation loss evaluates the performance of the model on a separate validation dataset. The training loss descends from about 5.1 to 2.9, and the validation loss decreases from about 4.9 to 3.2. Conversely, training accuracy demonstrates how well the model is learning on the training dataset, while validation accuracy indicates the model's ability to generalize to new data. Training accuracy rises from about 2% to 27%, and validation accuracy increases from about 1% to 23%.
After 100 epochs of training, the loss and accuracy change more markedly than after only 20 epochs. The training loss drops to about 1.2, and the validation loss falls to about 1.5; both are about 1.7 lower than their 20-epoch values. Correspondingly, training accuracy rises to about 77% and validation accuracy to about 69%, increases of about 50 and 46 percentage points over the additional 80 epochs.
First, random adjustments were made to the data in the pre-processing stage; then the parameters and layers of the model were adjusted and several regularizers were added to build up this accuracy. Afterwards, this study tried to improve the performance of the model by increasing its complexity and altering its parameters to achieve higher accuracy, which corresponds to the later versions 6 and 7. However, the accuracy of these two versions was only around 0.65, lower than that of version 5, so version 5 was ultimately selected. The sample size is insufficient to support higher accuracy: there are only 8,200 samples available to learn 102 flower categories, and part of these samples must be set aside for the test and validation datasets. Importing a pre-trained model may therefore be needed to reach higher accuracy, as sketched below.
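As one possible direction, the sketch below shows how a pre-trained backbone could be attached to the same 102-class output. MobileNetV2 is chosen here purely as an example; this code is not part of the reported experiments.

import tensorflow as tf

# Illustrative transfer-learning sketch (not used in the reported experiments):
# reuse ImageNet-pretrained MobileNetV2 features and train only a small head.
base = tf.keras.applications.MobileNetV2(input_shape=(200, 200, 3),
                                         include_top=False,
                                         weights='imagenet')
base.trainable = False  # freeze the pre-trained convolutional weights

transfer_model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(102, activation='softmax'),
])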
Besides, training the model takes a long time, and its efficiency still needs to be improved. There are several possible ways to speed up model training: changing the devices used and running on more powerful hardware, or optimizing the model architecture. For example,