feasible and suitable to solve the problem of flower
classification.
To achieve this goal, this paper first selected and
downloaded a suitable dataset, Oxford 102 Flowers,
which contains about 8,200 samples of 102 different
types of flowers. The study then standardized the
images by resizing all of them to 200 × 200 pixel
squares and, in a first attempt, converting them into
grayscale. The
model’s neural network consists of 4 convolutional
layers and 4 dense layers, all of which use the
Rectified Linear Unit (ReLU) as the activation
function. The input is the pixel data of an image, and
the output is a vector giving the probability of
each of the 102 types. In the training process,
Adaptive Moment Estimation (Adam) was used as the
optimizer and cross-entropy as the loss function.
After 100 epochs of training, the model accuracy
reaches about 70%.
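The output vector, loss function, and optimizer named above can be illustrated with a minimal sketch. It assumes a softmax output over the 102 classes and uses the commonly cited Adam default hyperparameters; the study's actual settings are not stated, so the function names and values below are illustrative only.

```python
import math

NUM_CLASSES = 102  # number of flower types given in the text


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, true_class):
    """Negative log-probability assigned to the correct class."""
    return -math.log(probs[true_class])


def adam_step(param, grad, m, v, t,
              lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar parameter.

    The defaults are the usual textbook values, not the study's settings.
    """
    m = beta1 * m + (1 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias correction for early steps
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# Hypothetical scores: the network favours class 7.
logits = [0.0] * NUM_CLASSES
logits[7] = 5.0
probs = softmax(logits)
loss_correct = cross_entropy(probs, 7)  # small loss if class 7 is the label
loss_wrong = cross_entropy(probs, 8)    # larger loss if class 8 is the label
```

The loss is low when the probability mass sits on the true class and grows as the prediction moves away from it, which is the signal the optimizer follows during training.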
2 METHOD
2.1 Dataset Preparation
For dataset preparation, Oxford 102 Flower
(102 Category Flower Dataset) is chosen as the
training dataset in this study (Nilsback, 2008). This
dataset consists of about 8,200 RGB images divided
into 102 flower categories that commonly occur in
the United Kingdom. Each class contains between 40
and 258 images, and the images have large scale,
pose, and light variations. Because it includes these
diverse and complicated features, this dataset is
suitable for a model that aims to solve the complex
flower classification task. When preprocessing the
images, conversion to grayscale was first considered
an appropriate method. The expectation was that
converting RGB images to grayscale would reduce
image noise while emphasizing texture and structural
features, thus making model processing more efficient.
Figure 1 shows the difference between the
RGB image and the grayscale image.
Figure 1: The RGB version (right) and the grayscale
version (left) of a flower image (Photo/Picture credit:
Original).
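A grayscale conversion of this kind can be sketched as a weighted sum of the three color channels. The weights below are the ITU-R BT.601 luma coefficients, which is an assumption; the text does not say which conversion the study used.

```python
def to_grayscale(pixel):
    """Collapse an (R, G, B) pixel into one intensity value.

    Uses the ITU-R BT.601 luma weights (an assumed choice).
    """
    r, g, b = pixel
    return 0.299 * r + 0.587 * g + 0.114 * b


# Two clearly different colors, chosen for illustration only.
pure_red = (255, 0, 0)
mid_green = (0, 130, 0)

gray_red = to_grayscale(pure_red)
gray_green = to_grayscale(mid_green)
# Both intensities come out near 76, so the two colors become
# almost indistinguishable once the hue information is dropped.
```

The example shows the trade-off of this step: the input shrinks from three channels to one, but pixels that differ only in hue can map to nearly the same gray level.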
However, the study soon found that this
preprocessing method leads to low accuracy
because color is a necessary indicator of the
flower category. As a result, grayscale images
are not feasible for the flower classification task;
instead, considering the importance of color,
random adjustments of image brightness, saturation,
contrast, and hue were used to enhance the model's
adaptability to color changes. In addition, random
up-and-down or side-to-side flips improve the
generalization of the model, and all of the images are
resized to 200 × 200 pixel squares, which normalizes
the input shape of the model.
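The augmentation steps described above can be sketched with the Python standard library alone. The jitter ranges, the 50% flip probability, the nearest-neighbour resize, and the order of operations are all assumptions made for illustration; the study does not report these details, and contrast is simplified here to scaling around mid-gray.

```python
import colorsys
import random


def clamp(x, lo=0.0, hi=1.0):
    return max(lo, min(hi, x))


def resize_nearest(image, size=200):
    """Nearest-neighbour resize of image[row][col] to size x size."""
    h, w = len(image), len(image[0])
    return [[image[r * h // size][c * w // size] for c in range(size)]
            for r in range(size)]


def random_flip(image, rng):
    """Flip side-to-side and/or up-and-down, each with probability 0.5."""
    if rng.random() < 0.5:
        image = [row[::-1] for row in image]  # side-to-side
    if rng.random() < 0.5:
        image = image[::-1]                   # up-and-down
    return image


def color_jitter(image, rng, brightness=0.2, contrast=0.2,
                 saturation=0.2, hue=0.05):
    """Randomly perturb brightness, contrast, saturation, and hue.

    Pixels are (r, g, b) tuples with values in [0, 1]. One random factor
    per property is drawn for the whole image.
    """
    bright = 1 + rng.uniform(-brightness, brightness)
    contr = 1 + rng.uniform(-contrast, contrast)
    sat = 1 + rng.uniform(-saturation, saturation)
    hue_shift = rng.uniform(-hue, hue)

    def jitter(pixel):
        # Brightness scales each channel; contrast stretches around 0.5.
        r, g, b = (clamp((ch * bright - 0.5) * contr + 0.5) for ch in pixel)
        # Saturation and hue are adjusted in HSV space.
        h, s, v = colorsys.rgb_to_hsv(r, g, b)
        return colorsys.hsv_to_rgb((h + hue_shift) % 1.0, clamp(s * sat), v)

    return [[jitter(p) for p in row] for row in image]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
toy_image = [[(0.9, 0.2, 0.3) for _ in range(6)] for _ in range(4)]  # 4 x 6
augmented = color_jitter(random_flip(resize_nearest(toy_image), rng), rng)
```

Each call yields a slightly different version of the same flower, so the model sees varied colors and orientations while the input shape stays fixed at 200 × 200.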
2.2 Convolutional Neural Network-Based Prediction
When building the neural network model,
the Convolutional Neural Network (CNN) is the first
to be considered. A CNN is predominantly used to
extract features from grid-like data, such as
images or videos (Li, 2021; Gu,
2018; O'Shea, 2015). The components of a CNN
include the Input Layer, Convolutional Layer,
Activation Layer, Pooling Layer, Flattening, Fully
Connected Layer, and Output Layer. Among these layers,
the Convolutional Layer and Pooling Layer are the
essential ones that distinguish CNNs from other models.
Imagine that an image is a cuboid: the
width and length of the cuboid are those of the image,
while the depth (channels) holds the RGB
values of each pixel. By taking a small patch of this
cuboid and running a small neural network, called
a filter, over it, the Convolutional Layer extracts
features of the image as the filter slides across the
cuboid and converts them into another volume with a
different width, height, and number of channels,
referred to as feature maps. As important as the
Convolutional Layer, the Pooling Layer is periodically
inserted between Convolutional Layers to reduce the
size of the volume, which speeds up computation,
reduces the memory consumed by multiple feature maps,
and helps prevent overfitting. CNNs offer high accuracy
in image analysis and robustness to image deformation
and rotation, but they need a large amount of labeled
data. Because the flower classification task meets
these conditions, this network is suitable for it.
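The sliding filter and the pooling step can be sketched on a toy image as follows. The 3 × 3 kernel, stride of 1, absence of padding, and 2 × 2 max pooling are assumptions chosen for illustration, not the configuration used in the study.

```python
def conv2d(image, kernel):
    """Slide one filter over an H x W x C image (stride 1, no padding).

    image[row][col][channel] and kernel[i][j][channel] are nested lists.
    Returns one feature map of size (H - k + 1) x (W - k + 1).
    """
    k = len(kernel)
    channels = len(kernel[0][0])
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    return [[sum(image[r + i][c + j][ch] * kernel[i][j][ch]
                 for i in range(k) for j in range(k) for ch in range(channels))
             for c in range(out_w)]
            for r in range(out_h)]


def relu(fmap):
    """Replace negative responses with zero."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool(fmap, size=2):
    """Keep the largest value in each non-overlapping size x size window."""
    return [[max(fmap[r + i][c + j] for i in range(size) for j in range(size))
             for c in range(0, len(fmap[0]) - size + 1, size)]
            for r in range(0, len(fmap) - size + 1, size)]


# Toy 6 x 6 RGB image: dark left half, bright right half (a vertical edge).
image = [[[0.0] * 3 if col < 3 else [1.0] * 3 for col in range(6)]
         for _ in range(6)]

# Hypothetical 3 x 3 x 3 filter that responds to dark-to-bright transitions.
kernel = [[[-1.0] * 3, [0.0] * 3, [1.0] * 3] for _ in range(3)]

feature_map = relu(conv2d(image, kernel))  # 4 x 4, strongest at the edge
pooled = max_pool(feature_map)             # 2 x 2 after pooling
```

The filter produces a 4 × 4 feature map that is large only where the edge lies, and pooling halves each spatial dimension while keeping that response, which is how the volume shrinks from layer to layer.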
Considering the complex data contained in each
image, the study uses 4 Convolutional Layers, 4
Pooling Layers, and 3 Fully Connected Layers to
construct the model. Nevertheless, in this version, the
model still does not perform well after 100 epochs.
By studying the flaws of the traditional CNN, the
MLSCM 2024 - International Conference on Modern Logistics and Supply Chain Management