Banana Ripeness Level Classification Using a Simple CNN Model Trained with Real and Synthetic Datasets
Luis E. Chuquimarca (1,2), Boris X. Vintimilla (1) and Sergio A. Velastin (3,4)
1 ESPOL Polytechnic University, ESPOL, CIDIS, Guayaquil, Ecuador
2 UPSE Santa Elena Peninsula State University, UPSE, FACSISTEL, La Libertad, Ecuador
3 Queen Mary University of London, London, U.K.
4 University Carlos III, Madrid, Spain
ORCID: Luis E. Chuquimarca https://orcid.org/0000-0003-3296-4309; Boris X. Vintimilla https://orcid.org/0000-0001-8904-0209; Sergio A. Velastin https://orcid.org/0000-0001-6775-7137
Keywords:
External-Quality, Inspection, Banana, Maturity, Ripeness, CNN.
Abstract:
The level of ripeness is essential in determining the quality of bananas. To correctly estimate banana maturity, the metrics of international marketing standards need to be considered. However, the process of assessing banana maturity at an industrial level is still carried out using manual methods. CNN models are an attractive tool to solve this problem, but a limitation is the availability of sufficient data to train these models reliably. Moreover, existing CNN models reported in the state of the art, trained with the available data, achieve acceptable accuracy in identifying banana maturity. For this reason, this work presents the generation of a robust dataset that combines real and synthetic data for different levels of banana ripeness. In addition, it proposes a simple CNN architecture that is trained with synthetic data and then improved through transfer learning to classify real data and determine the maturity level of the banana. The proposed CNN model is evaluated against several architectures, varying hyper-parameter configurations and optimizers. The results show that the proposed CNN model reaches a high accuracy of 0.917 and a fast execution time.
1 INTRODUCTION
Nowadays, most nutritionists agree that consuming
fruits is essential to have a daily nutritious diet. Many
people consume several fruits weekly. Markets are
the main vendors of fruit and need to offer their
customers high-quality fruit. Therefore, international
markets demand quality control of fruits by agro-
industries based on international standards (Reid,
1985; Kader, 2002). One of the parameters for
fruit quality inspection is the level of ripeness, which
is related to the consumer’s appreciation for buy-
ing the product and the consumption time of some
fruits (Wang et al., 2018; Bhargava and Bansal, 2021).
The determination of the maturity of the fruit is
carried out manually in the agro-industries. There
are several weaknesses in the manual method. For
example, it is time-consuming, labor-intensive, and
can cause inconsistencies in determining banana ma-
turity by the personnel in charge. The rise of ma-
chine vision technology together with the evolution
of deep learning techniques can overcome the prob-
lems mentioned above, potentially being relatively
fast, consistent, and accurate. In agriculture, inno-
vative technologies such as artificial vision are used
for various tasks, such as fruit detection, fruit clas-
sification, and fruit quality determination (apples,
bananas, mangoes, strawberries, blueberries, among
others) (Naranjo-Torres et al., 2020). For fruit qual-
ity inspection, international standards consider three
essential aspects: colorimetry (maturity), geometry
(shape and size), and defects (texture). This work fo-
cuses on colorimetry, which is directly related to
maturity; that is, depending on the fruit’s color level,
the maturity level can be identified (Tripathi and Mak-
tedar, 2021; Sun et al., 2021; Cao et al., 2021; Naik,
2019). For example, banana ripeness has seven levels,
according to the United States Department of Agriculture (USDA). Bananas are one of the most consumed fruits worldwide due to their good taste and high level of nutrients.
Convolutional Neural Network (CNN) models are
deep learning techniques applied to computer vision
to identify banana ripeness. In some works, seven
banana maturity levels are considered, but in others,
there are only four maturity levels due to the low num-
ber of images per level in the datasets (Saragih and
Emanuel, 2021).
Below is an overview of the state of the art on CNN models applied to banana maturity:
(Zhu and Spachos, 2021) uses a machine learning technique called Support Vector Machine (SVM) and compares its results with a YOLOv3 model trained on a dataset containing few banana images, which yields an inaccurate model. Moreover, the YOLOv3 model considers only two levels of maturity (semi-ripe and well-ripe) and obtains an accuracy of 90.16%. The maturity level of the banana depends on the number of small black areas detected in its texture: the greater the number of black dots found, the greater the maturity of the banana. However, this work does not take into account the recommendations of international standards.
(Saragih and Emanuel, 2021) evaluates only two
CNN models with the same number of epochs but
with a different number of initial layers to identify
four banana maturity levels (unripe/green, yellowish
green, semi-ripe, and overripe). The evaluation re-
sults of the MobileNetV2 and NASNetMobile models
showed an accuracy of 96.18% and 90.84%, respec-
tively. However, it uses a limited dataset for training and validation, so the models are likely to be inaccurate. Also, it only evaluates existing models.
(Ramadhan et al., ) used a deep CNN to identify
four maturity levels of Cavendish bananas. Banana
images are segmented using YOLO and then fed to a
VGG16 model trained with two different optimizers:
Stochastic Gradient Descent (SGD) and Adam. The model optimized with SGD achieves a better accuracy of 94.12% compared to Adam, which reaches an accuracy of 93.25%. In that study, the number of images in the dataset is small, and more CNN models could have been evaluated.
(Zhang et al., 2018) designed their CNN model
for banana classification considering seven maturity
levels. For the training and testing of the CNN model,
a dataset generated with a total of 17,312 images of
bananas is used. CNN performance results show an
accuracy of 95.6%. However, the research focuses on
a single CNN model. In addition, it does not present
evaluations of the proposed model against existing
models.
After reviewing the state of the art, it can be
said that one of the main problems in measuring ba-
nana maturity levels is that there are no public image
datasets robust enough to work with CNN models.
The generation of datasets can include not only real images but can also take advantage of available software tools to generate synthetic images, using for example: Unreal Engine, Unity3D, Dall·e mini, among others.
In this article, a CNN model is proposed to measure four levels of banana maturity, keeping the model simple and lightweight. Furthermore, this
work generates a robust dataset for the four banana
maturity levels by using real images plus synthetic
images. In the end, the proposed model with the gen-
erated dataset is evaluated against existing CNN mod-
els, setting specific hyper-parameters. The results ob-
tained from this evaluation verify that the proposed
CNN model obtains better metrics than existing CNN
models.
This paper is organized as follows. Section 2 de-
scribes the proposed methodology for developing the
work. Section 3 presents the results of banana matu-
rity inspection using the proposed model and evalu-
ates it with existing state-of-the-art CNN models. Fi-
nally, the conclusions are given in Section 4.
2 PROPOSED METHODOLOGY
The contributions of this paper are:
• Since there are currently no sufficiently large public datasets with different banana maturity levels, it is proposed to generate an extensive dataset of images that combines synthetic and real data.
• A new CNN model to measure banana maturity levels is proposed; the model is simple but gives good results.
• The proposed CNN model and the generated datasets are evaluated against existing CNN models, using various configurations of the hyper-parameters.
The generation of the dataset to be used in the
CNN models has two parts: the first is the generation
of the synthetic dataset, and the second is the gener-
ation of the real dataset. It should be noted that the
synthetic dataset is much larger than the real dataset.
The proposed CNN model has two components:
the first component is the design and implementa-
tion of a simple CNN model called CNN1, which
is trained with the synthetic dataset, resulting in the
generation of weights. The second component is the
application of transfer learning to the same proposed
CNN model but with the configuration of weights
obtained in the CNN1 model, resulting in a CNN2
model to estimate the maturity levels of the banana,
which is trained with a dataset of real images, result-
ing in a more adjusted CNN model for banana matu-
rity measurement. Finally, the CNN model is evalu-
ated by comparing it with existing CNN models such
as: InceptionV3, ResNet50, Inception-ResNetV2,
and VGG19. This evaluation considers the configu-
ration of hyper-parameters and the application of op-
timizers. For a better understanding of the methodol-
ogy, see Figure 1.
Figure 1: Banana maturity identification process.
2.1 Dataset Generation
For this work, two types of datasets are generated, one
real and the other synthetic, due to the limited num-
ber of real images available in public datasets and the
time demand in developing real image datasets. Both the real and the synthetic banana image datasets are made publicly available at https://github.com/luischuquim/BananaRipeness.
2.1.1 Real Data
The dataset developed consists of 3,495 real images of
Cavendish bananas, which were taken in a laboratory with a climate-controlled system kept between 15°C and 18°C for 28 days (the approximate duration of the ripening period of this type of banana). Four levels of banana maturity were considered for this work, so roughly every week the banana passed from one level of maturity to another, as indicated in Table 1. Although bananas have 7 maturity levels, the number of images that can be acquired per level is low; for this reason, they were grouped into 4 maturity levels to obtain a greater number of images per level. The acquisition
of the set of images was carried out daily, considering
the maturity cycles of the banana. Therefore, we pro-
ceeded to collect 150 images per day. In the end, a total of 4,200 images were acquired, which were then refined by removing images with noise, low quality, poor lighting, incorrect banana placement, and occlusions (see Figure 2). Therefore, the number of images per banana maturity level is variable. It should also be mentioned that some of the images from the last days of the final maturity level show rotting and were discarded (Ramadhan et al., ).
Table 1: Banana maturity levels per day.
Duration Level of maturity
1 - 6 days A
7 - 14 days B
15 - 22 days C
23 - 28 days D
Figure 2: Real Dataset Refinement.
This procedure is costly and tedious because staff must be dedicated to the data acquisition process during the time the bananas ripen. In addition, the conditions in which the bananas are kept, such as temperature, must be controlled. Care must also be taken when moving the bananas so as not to spoil them, since handling can cause bruises. So, to obtain a large number of images
for the dataset, which CNN models require, the ac-
quisition process must be performed multiple times,
leading to high costs and extensive staff time.
The acquisition of the images is carried out in dif-
ferent light conditions and backgrounds. However, a
refinement of the images of the real dataset is carried
out, considering several aspects, such as whether the
bananas are at the corresponding maturity levels, and
eliminating images where the banana is not clearly
defined. In the end, data augmentation techniques
such as rotation are applied, increasing the amount
of data. Therefore, the final number of images in the
real refined dataset is 3,495 images.
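To illustrate how such a rotation-based augmentation could be implemented, a minimal sketch is given below; the library choice (TensorFlow), the number of augmented copies, and the restriction to right-angle rotations are assumptions, not details taken from the paper.

```python
# Minimal sketch of rotation-based augmentation for the real banana images.
# The number of copies and the use of right-angle rotations are assumptions.
import tensorflow as tf

def augment_by_rotation(image, num_copies=3):
    """Return the original image plus `num_copies` randomly rotated versions."""
    augmented = [image]
    for _ in range(num_copies):
        # Rotate by a random multiple of 90 degrees (tf.image natively supports
        # only right-angle rotations).
        k = tf.random.uniform([], minval=1, maxval=4, dtype=tf.int32)
        augmented.append(tf.image.rot90(image, k=k))
    return augmented
```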
There are currently technological tools that allow the generation of synthetic datasets. In this work, one such tool, Unreal Engine, is explored. It
should be noted that there are other types of synthetic
image generation engines such as: Unity3D, CARLA,
or Dall·e mini (Ivanovs et al., 2022; Deiseroth et al.,
2022).
2.1.2 Synthetic Data
Synthetic datasets are an important complement to
the application in CNN models, due to the low cost
and ease of generating large numbers of synthetic
images. In addition, there are several CNN models
in the literature that make use of domain adaptation
and transfer learning techniques by applying synthetic
datasets (Charco et al., 2021).
This section describes the process of generating
the synthetic banana image dataset using a 3D mod-
eled banana. The Unreal Engine tool is used to cre-
ate a virtual scenario from which the synthetic data
is generated. The virtual environment contains: three
rails (rail-1, rail-2 and rail-3), cameras in different po-
sitions and angles mounted on each rail (positions C1
- C30), which allow the acquisition of synthetic im-
ages of bananas in 3D, as shown in Figure 3. The
appearance of the artificial banana ripeness is created
using texture from the real images.
Figure 3: Virtual scenario for the generation of the synthetic
images using Unreal Engine.
Various kinematic components of Unreal Engine
are used for the synthetic dataset generation process,
such as: Camera Rig Rail (rails), Cine Camera Ac-
tor (camera), Level Sequence (sequence), HDRIB-
ackdrop (sky and light), and Material (colors and tex-
tures). The camera is fixed to the rail in different po-
sitions to capture the synthetic images (see Figure 3).
The size of the synthetic images (224x224 pixels) is the same as in the real dataset. It must be taken into account that the camera's "film back" is configured in millimeters and is proportional to the pixels of the image (1 px = 0.26458333 mm).
For each camera scan, 30 images are taken on each
rail (positions C1 - C30 in Figure 3).
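For reference, assuming the stated conversion factor applies directly to the image width, the film back width corresponding to the 224-pixel image width works out to approximately:

224 px × 0.26458333 mm/px ≈ 59.27 mm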
Once the virtual scenario is ready, four banana
maturity levels are considered for the acquisition of
synthetic images as indicated in Figure 4 (labels: A,
B, C and D). Additionally, two sublevels per maturity
level are established and labeled: A1, A2, B1, B2,
C1, C2, D1, and D2. In this way, eight colorations
are used as shown in Figure 4, modifying the tonality
curves and adding spots at maturity levels C and D.
For this it was necessary to modify the texture of the
banana using Adobe Photoshop CS6 software.
Figure 4: Synthetic images of banana maturity levels using
different backgrounds.
To provide further variability to the virtual
scenery, eight different backgrounds were used to
capture the synthetic images. The background col-
ors used were: orange, purple, brown, and light blue.
In addition, materials that come by default in Unreal
Engine were used, such as: Asset Platform, Basic
Wall, Concrete Tiles (R1), and Rock Marble (R2) (see
Figure 4). For the last subclass (level D2), the back-
grounds of ”Concrete Tiles” were changed to ”Ce-
ramic Tile” and ”Rock Marble” to ”Rock SandStone”
to avoid confusing the banana with the background. In this way, considering the combi-
nations of the proposed scenarios, the total number of
synthetic images generated is 161,280, which is more than 40 times greater than the number of images in the real dataset, which consists of 3,495 im-
ages. The number of images per maturity level of both
real and synthetic bananas is summarized in Table 2.
Table 2: Number of images in the banana dataset.
Banana Maturity Level   Number of Synthetic Images   Number of Real Images
Level A 40,320 1,429
Level B 40,320 815
Level C 40,320 559
Level D 40,320 692
On the other hand, before feeding any CNN with
an image dataset, the RGB images must be normal-
ized to obtain good results and to speed up the com-
putational calculations (Sola and Sevilla, 1997). In
addition, the sizes of the images must be standardized,
a batch size defined, and the categorical variables en-
coded in numbers. This last process is applied to both
the real and the synthetic image datasets.
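As an illustration of this preprocessing step, a minimal Keras sketch is given below; the directory layout, the class-folder names, and the default batch size are assumptions.

```python
# Sketch of the preprocessing described above: resize to 224x224, scale RGB
# values to [0, 1], batch the data, and encode the four maturity levels
# (A, B, C, D) as integers 0-3. Paths and batch size are assumptions.
import tensorflow as tf

def load_dataset(directory, batch_size=50):
    ds = tf.keras.utils.image_dataset_from_directory(
        directory,
        image_size=(224, 224),        # standardize image size
        batch_size=batch_size,        # define the batch size
        label_mode="int",             # categorical labels encoded as integers
        class_names=["A", "B", "C", "D"],
    )
    # Normalize RGB values from [0, 255] to [0, 1].
    return ds.map(lambda x, y: (x / 255.0, y))

# Example usage (hypothetical folder layout: one sub-folder per maturity level).
# train_ds = load_dataset("dataset/real/train")
```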
2.2 Description of the Proposed Model
The proposed model (CIDIS) consists of two convo-
lution layers followed by a max pooling layer. This
configuration is repeated three times, with the rec-
tified linear units (ReLU) in the hidden layers, and
fully connected layers follow.
The model receives as input images of size 224x224
pixels with a depth of three due to the RGB color
channels, and in the end, there are four outputs as-
sociated with the four levels of maturity.
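Based on this description, a minimal Keras sketch of the CIDIS architecture is shown below; the filter counts and the size of the dense layer are not specified in the paper, so the values used are placeholders rather than the authors' exact configuration.

```python
# Sketch of the described CIDIS architecture: three blocks of
# (Conv2D, Conv2D, MaxPooling2D) with ReLU activations, followed by fully
# connected layers and a 4-way softmax output. Filter/unit counts are assumed.
from tensorflow.keras import layers, models

def build_cidis(num_classes=4):
    model = models.Sequential([
        layers.Input(shape=(224, 224, 3)),          # RGB input
        # Block 1: two convolutions followed by max pooling
        layers.Conv2D(32, (3, 3), activation="relu", padding="same"),
        layers.Conv2D(32, (3, 3), activation="relu", padding="same"),
        layers.MaxPooling2D((2, 2)),
        # Block 2
        layers.Conv2D(64, (3, 3), activation="relu", padding="same"),
        layers.Conv2D(64, (3, 3), activation="relu", padding="same"),
        layers.MaxPooling2D((2, 2)),
        # Block 3
        layers.Conv2D(128, (3, 3), activation="relu", padding="same"),
        layers.Conv2D(128, (3, 3), activation="relu", padding="same"),
        layers.MaxPooling2D((2, 2)),
        # Fully connected head: four outputs, one per maturity level
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),
    ])
    return model
```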
Then, the transfer learning technique is applied
using the CIDIS model trained with the synthetic im-
age dataset. This model has stored the scheme and
the weights of all its layers. Subsequently, the stored
model is loaded into a new instance of the same
network, transferring all the weight matrices learned
from training with the set of synthetic images. In ad-
dition, the last layers of the network (fully connected)
had to be removed to apply the optimizers. Finally,
the training of the first layers (convolutional and pool-
ing layers) is frozen so that the learned knowledge is
not modified, so only the fully connected layers are
trained.
When the transfer learning technique is applied to the CIDIS model, it is trained with the real image dataset, which is refined to obtain better results. For this training, the fully connected layers are added, and the Adagrad optimizer is used because the real dataset is considerably smaller than the synthetic image dataset. In this way, the CNN2 model was obtained.
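A hedged sketch of this transfer learning step is given below; the file name, the layer index used to cut the network, and the learning rate are assumptions, since the paper does not specify them.

```python
# Sketch of the transfer learning step: load the CNN1 weights learned on the
# synthetic dataset, freeze the convolutional/pooling layers, attach a new
# fully connected head, and compile with Adagrad for training on real images.
# File name, layer index, and learning rate are illustrative assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models

cnn1 = tf.keras.models.load_model("cnn1_synthetic.keras")   # assumed file name

# Keep everything up to the last pooling layer; drop the old dense head.
backbone = models.Model(inputs=cnn1.input,
                        outputs=cnn1.layers[-4].output)      # assumed index
backbone.trainable = False   # freeze convolutional and pooling layers

cnn2 = models.Sequential([
    backbone,
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(4, activation="softmax"),   # four maturity levels
])

cnn2.compile(optimizer=tf.keras.optimizers.Adagrad(learning_rate=0.001),
             loss="sparse_categorical_crossentropy",
             metrics=["accuracy"])
```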
It is important to mention that when the CIDIS model was trained directly with the real image dataset, the accuracy values obtained were lower than those of the CNN2 model, which was initialized by transfer learning with the weights of CNN1, trained on the synthetic dataset. The CNN2 model is optimized with the following actions: changing the learning rate, using dropout layers, changing the number of epochs, changing the batch size value, and choosing between the two proposed optimizers (Nadam and Adagrad).
During the training of the CNN2 model, the
Nadam and Adagrad optimizers were used. The first
is a Nesterov-accelerated adaptive moment estimation optimizer that combines ideas from Adam (a stochastic gradient descent method that uses few computational resources) and NAG (Nesterov accelerated gradient), both of which apply well to large datasets (Dozat,
2016). On the other hand, the second is an adaptive
algorithm that updates the learning rate as the number
of learning iterations increases and is more commonly used with small datasets. Both optimizers allowed the model to converge quickly and efficiently depending on the dataset used. Dropout layers are also added to reduce overfitting. Therefore, by modifying the hyperparameter values, it is possible to verify which ones give the best results and to build a final robust model, ready to predict banana maturity levels. Ultimately, this optimized model is evaluated with the real image dataset.
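The hyper-parameter exploration described above could be organized as a simple loop such as the one sketched below; the optimizer and learning rate values are taken from Table 4, while the helper functions (from the earlier sketches), the epoch count, and the selection criterion are assumptions. Dropout, batch size, and the number of epochs would be varied in the same way.

```python
# Sketch of the hyper-parameter exploration: vary the optimizer and learning
# rate, train, and keep the configuration with the best validation accuracy.
# `build_cidis` and `load_dataset` refer to the earlier sketches.
import tensorflow as tf

configs = [("nadam", 0.001), ("adagrad", 0.01), ("adagrad", 0.001), ("adam", 0.001)]
optimizers = {"nadam": tf.keras.optimizers.Nadam,
              "adagrad": tf.keras.optimizers.Adagrad,
              "adam": tf.keras.optimizers.Adam}

train_ds = load_dataset("dataset/real/train")   # hypothetical paths
val_ds = load_dataset("dataset/real/val")

best_acc, best_cfg = 0.0, None
for name, lr in configs:
    model = build_cidis()                        # same simple architecture
    model.compile(optimizer=optimizers[name](learning_rate=lr),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    history = model.fit(train_ds, validation_data=val_ds, epochs=30)
    val_acc = max(history.history["val_accuracy"])
    if val_acc > best_acc:
        best_acc, best_cfg = val_acc, (name, lr)
```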
2.3 Evaluation of the Proposed Model
For the evaluation of the proposed model, a literature review is carried out from which the best CNN models previously reported for the identification of banana maturity are chosen: InceptionV3, ResNet50, Inception-ResNetV2, and VGG19, listed in ascending order according to the number of trainable parameters (Faisal et al., 2020; Behera et al., 2021; Mohapatra et al., 2022).
The VGG19 model within the state-of-the-art re-
view has high performance, high levels of accuracy,
and a considerably low training time (less than the In-
ceptionResNetV2 model) (Behera et al., 2021).
ResNet models are designed with skip connections that jump over two or three layers; skipping layers reduces the vanishing gradient problem. This study uses the 50-
layer ResNet-50. Transfer learning and residual learn-
ing are applied to optimize network parameters and
system development (Helwan et al., 2021).
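As an illustration of this idea, a minimal sketch of a residual block with a two-layer skip connection is given below; it is not the actual ResNet-50 block, and the filter count is an assumption.

```python
# Minimal sketch of a residual block with a two-layer skip connection, the
# building idea behind ResNet. Assumes the input already has `filters` channels
# so the shortcut can be added element-wise.
from tensorflow.keras import layers

def residual_block(x, filters=64):
    shortcut = x                                                    # skip connection
    y = layers.Conv2D(filters, (3, 3), padding="same", activation="relu")(x)
    y = layers.Conv2D(filters, (3, 3), padding="same")(y)
    y = layers.Add()([shortcut, y])                                 # add the skipped input
    return layers.Activation("relu")(y)
```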
The Inception-ResNetV2 model has 164 layers.
It is selected because it obtained a lower percent-
age of losses compared to other Inception models
in the state-of-the-art (Szegedy et al., 2017). This
model combines two concepts: Inception modules and residual connections (He
et al., 2016). In addition, Inception models allow
for more efficient computations and increased depth
of networks through dimensionality reductions with
stacked 1x1 convolutions. Therefore, the model man-
ages to reduce the consumption of computational re-
sources and avoid overfitting (Szegedy et al., 2015).
The InceptionV3 model reduces computational power
consumption, being more efficient than the VGGNet
and InceptionV1 models (Kurama, 2020).
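As a small illustration of this dimensionality reduction, the sketch below places a 1x1 convolution before a 3x3 convolution so that the more expensive operation runs on fewer channels; the shapes are illustrative only.

```python
# Tiny illustration of dimensionality reduction with a 1x1 convolution, as used
# in Inception blocks: 256 input channels are reduced to 64 before a 3x3 conv.
from tensorflow.keras import Input, Model, layers

x_in = Input(shape=(28, 28, 256))
x = layers.Conv2D(64, (1, 1), activation="relu")(x_in)               # channel reduction
x = layers.Conv2D(64, (3, 3), activation="relu", padding="same")(x)  # cheaper 3x3 conv
model = Model(x_in, x)
```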
The results of the evaluation of the proposed CNN
model against the selected CNN models are indicated
in Section 3.
3 RESULTS
This section presents and analyzes the results ob-
tained with the generation of the synthetic banana im-
ages and with the refinement process using the real
image dataset. In addition, the results obtained with
the training of the selected CNN models are pre-
sented, as well as the application of the transfer learn-
ing technique with the final optimizations made to
the proposed CIDIS model. For the evaluation of the
models in all cases, a dataset distribution of 60% train,
20% test, and 20% validation is used.
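As an illustration, such a 60/20/20 split could be produced as in the sketch below; the use of scikit-learn, the stratification by label, and the fixed random seed are assumptions.

```python
# Sketch of a 60% train / 20% validation / 20% test split of the image paths.
# The use of scikit-learn, stratification, and the fixed seed are assumptions.
from sklearn.model_selection import train_test_split

def split_dataset(paths, labels, seed=42):
    # First hold out 40% of the data, then split that portion into val and test.
    x_train, x_rest, y_train, y_rest = train_test_split(
        paths, labels, test_size=0.4, stratify=labels, random_state=seed)
    x_val, x_test, y_val, y_test = train_test_split(
        x_rest, y_rest, test_size=0.5, stratify=y_rest, random_state=seed)
    return (x_train, y_train), (x_val, y_val), (x_test, y_test)
```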
Firstly, the selected CNN models are evaluated
without applying the transfer learning methodology,
and furthermore, they are trained only with the real
dataset. The results obtained with these models are
compared with the proposed CIDIS model using the
same conditions; Table 3 shows these results. The
metric used to evaluate the models was accuracy,
which calculates the frequency with which the pre-
dictions are equal to the proposed labels (0: level A,
1: level B, 2: level C, 3: level D). The time it takes for
a CNN model to classify an image was also measured,
as well as its total memory weight. With the results of
Table 3 a comparison of the accuracy values and the
average classification time is made. Therefore, the
CNN model with the best performance is CIDIS, and
so this is the CNN model chosen for CNN1. These
results were the starting point for this project, estab-
lishing a baseline of what can be achieved only with
the real dataset and without carrying out refinement or
transfer learning. The results of this test serve to com-
pare the accuracy between CNN1 and CNN2, and to
verify if the results obtained by applying the transfer
learning technique are viable.
The CIDIS model was evaluated with the images
of real bananas, obtaining an accuracy of 0.872. After
this, the CIDIS model (as CNN1 model) was trained
on the synthetic data and an accuracy of 1.0 was ob-
tained, which is perfect. This ideal result should not
be a good reference for the model, because the im-
ages of synthetic bananas with different objects, an-
gles and backgrounds are very similar to each other,
and therefore the model easily predicts maturity lev-
els. This means that although the CIDIS model ac-
curately predicts images of synthetic bananas, it is
necessary to train with images of real bananas to bet-
ter generalize the model. Then, the transfer learning
technique is applied to the CIDIS (CNN2) model con-
sidering the parameters and hyperparameters of the
CNN1 pre-trained model with the synthetic dataset,
and it is trained with the refined real dataset. For the
training of the CNN2 model with the real dataset, the
fully connected layers were added and the Adagrad
optimizer was selected because the real dataset is con-
siderably smaller compared to the synthetic dataset.
Figure 5 plots the loss and accuracy functions of the
CIDIS model as a CNN2 model, with the real dataset.
It can be seen that in the accuracy graph, the
model starts learning with an accuracy of 0.74, and
continues with an increasing trend until it stabilizes
when the validation accuracy does not improve, ob-
taining an accuracy of 0.917. The graph of the
loss has a decreasing trend until it stabilizes, without
reaching overfitting. Starting from the CNN2 model, optimizations are applied, such as varying the hyperparameters (batch size, learning rate, epochs); in addition, optimizers are changed and dropout layers are added. The results can be seen in Table 4.
4 CONCLUSIONS
It was possible to build a dataset of synthetic banana images, which required lower cost and time than acquiring real banana images, especially for large volumes of data. In this case, generating 3,495
images of real bananas took over 30 days and required
multiple people, while generating 161,280 synthetic
images took almost as long and was done by a sin-
Table 3: Comparison of results with CNN models using real data and without transfer learning.
CNN Model Accuracy Model Weight (MB) Average time (ms)
VGG19 0.562 160 364
ResNet-50 0.816 200 107
Inception-ResNetV2 0.869 1075 224
CIDIS (proposed CNN) 0.872 21 132
InceptionV3 0.849 187 79
Table 4: Results of the proposed CIDIS model using real/synthetic dataset and transfer learning.
CNN Model Optimizer Dropout Learning Rate Batch Size Accuracy
CIDIS Nadam 2 (0.2) 0.001 50 0.881
CIDIS Nadam 1 (0.2) 0.001 50 0.891
CIDIS Adagrad 2 (0.2) 0.01 50 0.904
CIDIS Adagrad 1 (0.2) 0.001 50 0.917
CIDIS Adam 2 (0.2) 0.001 50 0.916
CIDIS Adam 1 (0.2) 0.001 50 0.906
Figure 5: Results of the accuracy and loss function of the
CNN2 model using transfer learning.
gle person using the Unreal Engine software. In addi-
tion, a simple new CNN model was implemented to identify banana maturity; it was evaluated against other state-of-the-art CNN models using a dataset of real banana images. The new CNN model obtained better results and was therefore selected for the development of the proposed work.
In this work, the proposed CNN model (CNN1) was trained with synthetic images; then the transfer learning technique was applied to obtain a CNN model called CNN2, which has the same simple architecture as the proposed model. CNN2 was trained and evaluated with a real dataset, obtaining an accuracy of 0.917, higher than the 0.872 of the proposed CNN model without transfer learning. There-
fore, it was found that better results are obtained when
using the proposed methodology.
Admittedly, the number of real images was not balanced across the banana maturity levels; therefore, as future work, it is intended to balance the amount of data to obtain better results.
ACKNOWLEDGEMENTS
This work has been partially supported by the
ESPOL-CIDIS-11-2022 project.
REFERENCES
Behera, S. K., Rath, A. K., and Sethy, P. K. (2021). Maturity
status classification of papaya fruits based on machine
learning and transfer learning approach. Information
Processing in Agriculture, 8(2):244–250.
Bhargava, A. and Bansal, A. (2021). Fruits and vegetables
quality evaluation using computer vision: A review.
Journal of King Saud University-Computer and Infor-
mation Sciences, 33(3):243–257.
Cao, J., Sun, T., Zhang, W., Zhong, M., Huang, B., Zhou,
G., and Chai, X. (2021). An automated zizania
quality grading method based on deep classification
model. Computers and Electronics in Agriculture,
183:106004.
Charco, J. L., Sappa, A. D., Vintimilla, B. X., and Vele-
saca, H. O. (2021). Camera pose estimation in multi-
view environments: From virtual scenarios to the real
world. Image and Vision Computing, 110:104182.
Deiseroth, B., Schramowski, P., Shindo, H., Dhami, D. S.,
and Kersting, K. (2022). Logicrank: Logic induced
reranking for generative text-to-image systems. arXiv
preprint arXiv:2208.13518.
Dozat, T. (2016). Incorporating nesterov momentum into
adam. In Proceedings of the 4th International Confer-
ence on Learning Representations, pages 1–4.
Faisal, M., Albogamy, F., Elgibreen, H., Algabri, M., and
Alqershi, F. A. (2020). Deep learning and computer
vision for estimating date fruits type, maturity level,
and weight. IEEE Access, 8:206770–206782.
He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep resid-
ual learning for image recognition. In Proceedings of
the IEEE conference on computer vision and pattern
recognition, pages 770–778.
Helwan, A., Sallam Ma’aitah, M. K., Abiyev, R. H., Uze-
laltinbulat, S., and Sonyel, B. (2021). Deep learning
based on residual networks for automatic sorting of
bananas. Journal of Food Quality, 2021.
Ivanovs, M., Ozols, K., Dobrajs, A., and Kadikis, R. (2022).
Improving semantic segmentation of urban scenes for
self-driving cars with synthetic images. Sensors,
22(6):2252.
Kader, A. A. (2002). Us grade standards. Postharvest tech-
nology of horticultural crops, 3311(287):287–300.
Kurama, V. (2020). A review of popular deep learning
architectures: Resnet, inceptionv3, and squeezenet.
Consult. August, 30.
Mohapatra, D., Das, N., and Mohanty, K. K. (2022). Deep
neural network based fruit identification and grading
system for precision agriculture. Proceedings of the
Indian National Science Academy, pages 1–12.
Naik, S. (2019). Non-destructive mango (mangifera indica
l., cv. kesar) grading using convolutional neural net-
work and support vector machine. In Proceedings of
International Conference on Sustainable Computing
in Science, Technology and Management (SUSCOM),
Amity University Rajasthan, Jaipur-India.
Naranjo-Torres, J., Mora, M., Hernández-García, R., Barrientos, R. J., Fredes, C., and Valenzuela, A. (2020). A
review of convolutional neural network applied to fruit
image processing. Applied Sciences, 10(10):3443.
Ramadhan, Y. A., Djamal, E. C., Kasyidi, F., and Bon, A. T.
Identification of cavendish banana maturity using con-
volutional neural networks.
Reid, M. S. (1985). Product maturation and maturity in-
dices. Postharvest technology of horticultural crops,
pages 8–11.
Saragih, R. E. and Emanuel, A. W. (2021). Banana ripeness
classification based on deep learning using convolu-
tional neural network. In 2021 3rd East Indonesia
Conference on Computer and Information Technology
(EIConCIT), pages 85–89. IEEE.
Sola, J. and Sevilla, J. (1997). Importance of input data
normalization for the application of neural networks
to complex industrial problems. IEEE Transactions
on nuclear science, 44(3):1464–1468.
Sun, L., Liang, K., Song, Y., and Wang, Y. (2021). An
improved cnn-based apple appearance quality classi-
fication method with small samples. IEEE Access,
9:68054–68065.
Szegedy, C., Ioffe, S., Vanhoucke, V., and Alemi, A. A.
(2017). Inception-v4, inception-resnet and the impact
of residual connections on learning. In Thirty-first
AAAI conference on artificial intelligence.
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.,
Anguelov, D., Erhan, D., Vanhoucke, V., and Rabi-
novich, A. (2015). Going deeper with convolutions.
In Proceedings of the IEEE conference on computer
vision and pattern recognition, pages 1–9.
Tripathi, M. K. and Maktedar, D. D. (2021). Optimized
deep learning model for mango grading: Hybridizing
lion plus firefly algorithm. IET Image Processing.
Wang, F., Zheng, J., Tian, X., Wang, J., Niu, L., and Feng,
W. (2018). An automatic sorting system for fresh
white button mushrooms based on image processing.
Computers and electronics in agriculture, 151:416–
425.
Zhang, Y., Lian, J., Fan, M., and Zheng, Y. (2018). Deep
indicator for fine-grained classification of banana’s
ripening stages. EURASIP Journal on Image and
Video Processing, 2018(1):1–10.
Zhu, L. and Spachos, P. (2021). Support vector machine
and yolo for a mobile food grading system. Internet
of Things, 13:100359.