Simulation of near Infrared Sensor in Unity for Plant-weed Segmentation Classification
Carlos Carbone (https://orcid.org/0000-0001-5615-0225), Ciro Potena (https://orcid.org/0000-0003-2395-2170) and Daniele Nardi
Department of Computer, Control and Management Engineering, Sapienza University of Rome, Via Ariosto 25, Rome, Italy
Keywords:
Unity Engine, Near Infrared, Convolutional Neural Network, Plant-weed Segmentation Classification.
Abstract:
Weed spotting through image classification is one of the methods applied in precision agriculture to increase
efficiency in crop damage reduction. These classifications are nowadays typically based on deep machine
learning with convolutional neural networks (CNN), where a main difficulty is gathering large amounts of
labeled data required for the training of these networks. Thus, synthetic dataset sources have been developed
including simulations based on graphic engines; however, some data inputs that can improve the performance
of CNNs like the near infrared (NIR) have not been considered in these simulations. This paper presents a
simulation in the Unity game engine that builds fields of sugar beets with weeds. Images are generated to
create datasets that are ready to train CNNs for semantic segmentation. The dataset is tested by comparing
classification results from the Bonnet CNN trained with synthetic images against the same network trained
with real images, both with RGB and RGBN (RGB + near infrared) inputs. The preliminary results suggest
that the addition of the NIR channel to the simulation for plant-weed segmentation can be effectively
exploited: including the NIR data in the Unity-generated dataset yields a difference of 5.75% in the global
mean IoU over the 820 classified evaluation images.
1 INTRODUCTION
Precision agriculture is the use of techniques that pro-
vide key information from plant crops to improve de-
cision making about the use of the resources avail-
able in crop management (ISPA, 2020). In this area,
research activity has increased in recent years
(Stafford, 2000; Khanal et al., 2017; Patrício and
Rieder, 2018; Duhan et al., 2017; Carbone et al., 2018).
One of the main challenges in precision agricul-
ture is to obtain the location of weeds that need to
be removed to diminish possible quality losses in the
main plants being grown (Lottes et al., 2018). The
first step to obtain the location of such weeds is to
have a reliable system that can identify them (Lottes
et al., 2018). The most common system to accomplish
this type of task is image-based classification through
Deep Neural Networks. In particular, Convolutional
Neural Networks (CNNs) are one of the methods that
currently provide the most promising results for this
purpose (Krizhevsky et al., 2017; Potena et al., 2017;
Carvajal et al., 2017).
As with any machine learning based method, CNNs
require accurately labeled data to learn to reliably
identify plants and weeds in an image (Bah et al.,
2018). The required data are usually images taken in
the field under controlled conditions. Acquiring these
data takes considerable time and effort, and the images
then need to be labeled manually (Bah et al., 2018).
To ease this difficult task, simulation environ-
ments based on graphic engines have been developed
to generate synthetic datasets for CNN training (Ci-
cco et al., 2016). However, these simulators, to the
best of our knowledge, have not included the simu-
lation of the near-infrared (NIR) sensor, which has
proven quite effective on crop images (Milioto
et al., 2017).
The objective of our research is to create a sim-
ulation environment that includes the near-infrared
channel to generate a dataset for CNN training. This
is accomplished through a sugar beet field simulated
along with weeds in the Unity game engine (Xie,
2012), hereinafter referred to as "Unity", using tex-
tures gathered with a real world sensor. This simu-
lator then generates images to train a selected CNN
and compare its performance against the same net-
work trained with a dataset of real images. Our exper-
iments show that classification performance is improved
by including the NIR data. Additionally, the simula-
tor could be adjusted in future investigations for other
types of plants. This would require additional tex-
tures and adjustments to the 3D meshes. The simula-
tor could also be modified to include more points of
view representing sensors mounted on different types
of robots, such as ground robots or unmanned aerial
vehicles (UAVs).
The rest of this paper is structured as follows: Section 2
presents the state of the art for plant-weed classification
using CNNs and for simulation environments developed
to generate CNN training datasets. Section 3 presents the
simulation environment developed in Unity, the texture
gathering with a real sensor and the selected convolutional
neural network. Section 4 explains the test specifications.
Section 5 presents the results, the main key performance
indicators and the discussion of the values obtained.
Finally, Section 6 presents the conclusions and future work.
2 RELATED WORK
In this section we present the state of the art,
addressing first research on plant-weed classification
using convolutional neural networks and then simulation
environments that aim to support data generation for
deep learning research.
2.1 Plant-weed Classification using
Convolutional Neural Networks
The usage of UAVs in precision agriculture is one of
the current commercial trends for these robots as well
as for research approaches (Kolodny, 2017; Carbone
et al., 2018). In (Sa et al., 2018a), an approach to
achieve reliable plant-weed classification under UAV
hardware constraints was developed, using SegNet
with a Jetson TX2 integrated on the
UAV. Efforts to include multispectral sensor data in
convolutional neural network (CNN) training were
further explored in (Sa et al., 2018b). Another ap-
proach that reduces the requirements of the network
is presented in (Lottes and Stachniss, 2017). Here
a semi-supervised approach is presented exploiting
knowledge about the common arrangements of crops.
In (Lottes et al., 2018) a CNN was developed with
an encoder-decoder structure that includes spatial in-
formation within sequences of images. This approach
achieved a generalized improvement in results as it
performs well on new unseen fields without the need
to retrain the model. Considering spatial information
is indeed a desirable feature in agricultural inspec-
tions, as they are often performed by robots that need
to gather a sequence of images to cover the inspection
of crops.
In (Fawakherji et al., 2019) an approach using two
networks in sequence is developed for semantic
segmentation classification: the first network, based on
an encoder-decoder architecture, separates connected
patches of plant instances from the soil, and the second
network then assigns the class labels. The results
obtained remained accurate even in cases where the
images had challenging features. Furthermore, (Li et al.,
2019) addresses cases where the weeds are dense and
overlap the plants, using ResNet-10 as a base together
with the Adaptive Affinity Fields method.
2.2 Simulation Environments
This research builds upon the work "Automatic
Model Based Dataset Generation for Fast and Accurate
Crop and Weeds Detection" (Cicco et al., 2016), which
proposed the use of images from a simulation environment,
built with Unreal Engine 4 (Sanders, 2016) and hereinafter
referred to as "Unreal", to enhance the effectiveness of
CNNs for crop-weed classification. The new feature of our
research, which switches from Unreal Engine to Unity, is
the inclusion of simulated NIR data. Unity was chosen due to its better
compatibility and accessibility on the Linux operating
system. This will ease future research that requires
robotics libraries that are mostly supported on Linux,
like the Robot Operating System (ROS). To the best
of our knowledge, the inclusion of the NIR data for
plants in simulation environments based on graphics
engines has not been done before. In particular, the
research done in (Cicco et al., 2016) includes only the
red-green-blue (RGB) data for synthetic dataset gen-
eration.
Video games have been used directly as a source of
training datasets for machine learning research. In
(Richter et al., 2016), pixel output from the game as
well as commands being issued were used to generate
large datasets for machine learning research purposes.
In (Shafaei et al., 2016) output images of cityscapes
were generated and tested by comparing classification
results of a CNN trained with the generated images
and real images.
Further studies have included the customization of
the environment simulated by using video game en-
gines instead of video games with constraints defined
by the game design. In (Kim et al., 2019), Unreal
was used to simulate specific orientations of cars to
train a model that works with video data from the real
world; they highlight that the main problem in this
domain is the lack of labeled imagery which can be
generated automatically within Unreal. A similar ap-
proach was pursued with Unity in (Kaur et al., 2018)
where the data was used to train a CNN as part of
a system that provides driving assistance. Moreover,
(Yang et al., 2016) also develops a study in the car
domain in Unity, but with the focus of using the
generated data to test sensor reactions instead of
CNN training.
In (Acker et al., 2017) a cellular automaton is
implemented with Unreal to generate random pedestrian
and vehicle movement, producing imagery that interfaces
with a neural network through UnrealCV. The main idea
was to generate an environment whose movement is
established at an abstract level. Similarly, in (Hattori
et al., 2015) a simulator for pedestrian behavior is
developed. In turn, (Zhao et al., 2019) presents a Unity
simulation for the training of a CNN for occupancy
detection in a room, which reduced the average
localization error on a real testbed by 36.54% and
11.46% for private and public scenarios respectively.
In (Juliani et al., 2018) a platform for deep rein-
forcement learning is developed in Unity with the pur-
pose of making available an open source simulation of
learning agents in an environment with realistic visu-
als.
(Akiyama et al., 2018) presents a method without a
graphic engine that generates chart images to support
studies in information visualization. A web tool is
developed to synthetically and randomly generate the
charts based on probability distribution functions;
these charts are then rendered in PNG format.
The main contribution of this paper is the inclusion
of near infrared (NIR) data in a simulation environment
developed in Unity that builds procedural fields of sugar
beets and weeds and generates RGB and NIR images to be
included in a CNN training dataset.
3 INTEGRATED SIMULATION
ENVIRONMENT
In this research, Unity is the graphic engine used to
develop the simulation environment to generate the
synthetic dataset. Unity is designed to develop video
games by providing scripting functionality along with
realistic illumination. In Unity, we developed a real-
istic procedural crop field that randomizes parameters
of its generated plants and weeds.
3.1 Sensor and Textures
Texture images and 3D meshes are required to generate
the plants and weeds in Unity. Cropped images of real
plants are used as textures and fitted onto the 3D meshes
through UV mapping, the system that maps pixels from an
image onto a 3D mesh. These images were gathered using a
JAI AD-130 GE camera, which simultaneously provides a
visible color channel from 400-700 nm and a near infrared
(NIR) channel from 750-900+ nm.
The plant images were taken in a small sugar beet
field in the city of Ancona in Italy with the support
of the personnel of the Agenzia Servizi al settore
Agroalimentare delle Marche (ASSAM). In total, 8
textures were gathered; a cropped example image is
shown in Figure 1. The weed textures were extracted
from the 2016 sugar beet dataset recorded at Campus
Klein Altendorf in Bonn, Germany, hereinafter referred
to as the "Bonn Dataset". A total of 50 weed textures
were created by cropping the RGB and NIR images using
the masks in the labeled images; an example is shown
in Figure 2.
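This cropping step amounts to masking the RGB and NIR images with the weed labels and keeping the background transparent. The script below is only an illustrative sketch of that step, assuming weeds are encoded as red pixels in the label images and the file paths shown; it is not the extraction code actually used for the dataset.

```python
import numpy as np
from PIL import Image

def crop_weed_texture(rgb_path, nir_path, label_path, out_prefix):
    """Cut a weed texture out of an RGB/NIR image pair using the label mask.

    Assumes weeds are marked as pure red pixels in the label image; the mask
    test must be adapted to the actual label encoding of the dataset.
    """
    rgb = np.array(Image.open(rgb_path).convert("RGB"))
    nir = np.array(Image.open(nir_path).convert("L"))
    lab = np.array(Image.open(label_path).convert("RGB"))

    # Boolean mask of weed pixels (assumed encoding: red = weed).
    weed = (lab[..., 0] > 200) & (lab[..., 1] < 50) & (lab[..., 2] < 50)
    if not weed.any():
        return

    # Bounding box of the weed region, used to crop a small texture.
    ys, xs = np.where(weed)
    y0, y1, x0, x1 = ys.min(), ys.max() + 1, xs.min(), xs.max() + 1

    # Keep only weed pixels; everything else becomes transparent (alpha = 0),
    # matching the transparency shader used later in Unity.
    alpha = np.where(weed, 255, 0).astype(np.uint8)
    rgba = np.dstack([rgb, alpha])
    nir_rgba = np.dstack([np.repeat(nir[..., None], 3, axis=2), alpha])

    Image.fromarray(rgba[y0:y1, x0:x1]).save(f"{out_prefix}_rgb.png")
    Image.fromarray(nir_rgba[y0:y1, x0:x1]).save(f"{out_prefix}_nir.png")
```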
Figure 1: Sugar beet leaf textures: (a) RGB, (b) NIR.
3.2 Unity Engine
Once the texture images are obtained, they must be
placed in what Unity calls materials, with appropriate
shader code. The shader is the part of the simulator
that takes the texture inputs and places them on top of
the 3D meshes to render the appropriate pixels on the
screen; a material is then an instance of the shader code
in which the texture images are specified. Two main
shaders are used in this simulation: one with transparency
and double-sided rendering (used for plants and weeds),
and one that outputs raw colors to the screen while
discarding transparent pixels (used to create the masks
for the labeled images).
Figure 2: Weeds in a sugar beet field: (a) original RGB image, (b) labeled image (green labels are plants, red labels are weeds), (c) texture cropped to show only the weeds.

For the plants, the textures are applied to a rectangular
mesh with a skeleton that bends the mesh to resemble a
bent leaf. With the previously mentioned shaders, only the
leaf pixels are visible in the mesh, visually turning the
bent rectangle into a leaf. With the same shaders, the weed
textures are placed on a simple square due to the very
small size of the weeds.
With the plant and weed meshes ready to be spawned,
a script procedurally generates the field with randomized
parameters to create a high variety of plants and weeds.
The size and rotation of the sugar beet leaves are varied
randomly so that their shapes are similar to those in the
real images. The weeds are randomly placed as planes on
the ground over the field, with their rotation also
randomized. A generated field is shown in Figure 3, and
images taken from the camera in the generated level are
shown in Figure 4; the RGB images are also shown in
Figure 5, where the brightness has been manually adjusted
for demonstration purposes in this paper. During image
generation the following steps are taken repeatedly, as
sketched in the code after this list, until the desired
amount of images is generated:
1. The camera, placed in a fixed position, takes a
screenshot of the field setup with RGB data.
2. The field switches to the NIR data by changing
the Unity materials and the camera takes another
screenshot.
3. The field switches to the label mask data by
changing the Unity materials and the camera takes
another screenshot.
4. The intensity of the global illumination is slightly
varied with random values, the field is replaced
with a completely new one and the camera is
moved to a new random position within the crop.
5. The above steps are then repeated to generate more
images.
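Since the actual capture loop is implemented as a Unity C# script, the outline below is only a Python-style sketch of its control flow; every helper function (switch_materials, capture, regenerate_field, and so on) is a hypothetical stand-in for the corresponding Unity call, not a real API.

```python
import random

# Hypothetical hooks into the simulator. In the actual project these steps are
# Unity C# calls; they are stubbed here only to show the control flow.
def regenerate_field(): ...
def move_camera_to_random_position(): ...
def set_global_light_intensity(value): ...
def switch_materials(mode): ...   # mode in {"rgb", "nir", "label"}
def capture(path): ...            # screenshot of the current camera view

def generate_dataset(n_images, out_dir="dataset"):
    for i in range(n_images):
        # 1) RGB view of the current field setup (camera in a fixed position).
        switch_materials("rgb")
        capture(f"{out_dir}/{i:04d}_rgb.png")

        # 2) Same field and camera pose, materials switched to NIR textures.
        switch_materials("nir")
        capture(f"{out_dir}/{i:04d}_nir.png")

        # 3) Same pose again, label-mask materials (raw colors, no lighting).
        switch_materials("label")
        capture(f"{out_dir}/{i:04d}_label.png")

        # 4) Randomize the scene for the next sample: slight change of global
        #    light intensity, a completely new procedural field, new camera pose.
        set_global_light_intensity(1.0 + random.uniform(-0.1, 0.1))
        regenerate_field()
        move_camera_to_random_position()
```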
The close-up images present a side-by-side comparison
of the Unity synthetic images with the real images for
both RGB and near infrared data. The Unity images show
similar features, plant sizes and illumination, which
are essential to obtain good classification results on
real images. This allowed the generation of a synthetic
dataset of plant-weed images, including the RGB view,
the NIR view and the labeled view, that holds a similar
aspect to the real dataset. The specific properties of
the datasets built are introduced in Section 4.
Additionally, the dataset can be accessed at
https://github.com/CSCarbone07/SPQR_AgriSim_Unity.
Figure 3: Unity-generated sugar beet field: (a) RGB, (b) NIR, (c) labeled masks.
3.3 Convolutional Deep Learning
Neural Network
Figure 4: Camera images for the synthetic dataset compared to real images: (a) Unity RGB, (b) Unity NIR, (c) real RGB, (d) real NIR.

The deep neural network used is "Bonnet: An Open-Source
Training and Deployment Framework for Semantic Segmentation
in Robotics" (Milioto and Stachniss, 2019), hereinafter
referred to as "Bonnet". This network is used with its
default configuration for plant-weed classification, and
some modifications were made to the code to add a fourth
channel for the NIR input data. The original work reported
a mean intersection over union (IoU) of 80.1% for crop-weed
classification and a mean accuracy of 98.5%. The code can
also be mounted on robots with ROS if necessary, since the
network was released as an open source tool to ease its
usage in robotics applications.
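Composing the four-channel input amounts to stacking each RGB image with its co-registered NIR image before it reaches the widened input layer. The snippet below is a minimal sketch of that preprocessing step under these assumptions; it is not Bonnet's actual code.

```python
import numpy as np
from PIL import Image

def load_rgbn(rgb_path, nir_path):
    """Stack an RGB image and its co-registered NIR image into one RGBN array.

    The JAI AD-130 GE captures both channels through the same optics, so no
    extra registration step is assumed here.
    """
    rgb = np.array(Image.open(rgb_path).convert("RGB"), dtype=np.float32) / 255.0
    nir = np.array(Image.open(nir_path).convert("L"), dtype=np.float32) / 255.0
    return np.dstack([rgb, nir])  # shape (H, W, 4), ready for a 4-channel input layer
```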
The network is based on an encoder-decoder architecture;
the plant-weed configuration has 9 layers, with 3
down-sampling blocks of 8, 16 and 64 filters and 3
up-sampling blocks of 32, 16 and 8 filters. This
configuration was developed to work with the real dataset
used in this research, the Bonn Dataset (Chebrolu et al.,
2017), which is also the dataset used for testing.
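For illustration only, an encoder-decoder with the block sizes listed above can be sketched in a few lines of Keras. The sketch merely mirrors the reported filter counts and the four-channel RGBN input; it is not Bonnet's implementation, which uses its own building blocks and training pipeline.

```python
from tensorflow.keras import layers, models

def small_segmentation_net(height=512, width=512, channels=4, num_classes=3):
    """Toy encoder-decoder mirroring the described 3-down / 3-up layout.

    Down-sampling blocks with 8, 16 and 64 filters, up-sampling blocks with
    32, 16 and 8 filters, and a per-pixel softmax over ground/plant/weed.
    """
    inputs = layers.Input(shape=(height, width, channels))

    x = inputs
    for filters in (8, 16, 64):            # encoder
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
        x = layers.MaxPooling2D(2)(x)

    for filters in (32, 16, 8):            # decoder
        x = layers.UpSampling2D(2)(x)
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)

    outputs = layers.Conv2D(num_classes, 1, activation="softmax")(x)
    return models.Model(inputs, outputs)

# model = small_segmentation_net()  # RGBN input, 3 output classes
```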
4 TESTS
This section explains the key features of the datasets
used for the training of the deep learning neural net-
work. A summary of these features is presented in
Table 1.
The synthetic dataset was generated using the level
built in Unity, creating 1034 images with a size of
1024x1024 pixels, which is one of the common options for
Unity textures. This amount was chosen to be close to half
of the images available in the Bonn Dataset, which is used
as the source of real images for training and performance
evaluation.

Figure 5: Brightness-adjusted RGB images from Figure 4: (a) Unity, (b) real.

The Bonn Dataset provides a total of 1854 images, of which
1034 are used as the real dataset for the training of the
CNN, split into 734 train images, 150 validation images and
150 test images, as shown in Table 1. This leaves 820 real
images, which were used for the performance evaluation of
each CNN trained with each dataset. The Unity synthetic
dataset has the same distribution of images as the real
dataset for the training of the CNN. Then, a mixed dataset
is built in which 300 images from the real dataset are
added to the synthetic dataset and distributed equally over
the training, validation and test sets, as shown in the
third column of Table 1.
Each dataset built has an RGB and an RGBN ver-
sion and both share the same distribution of images.
The labels are defined as ground, weed and plant in all
the datasets. All the textures were gathered with the
same sensor model used to take the real images in the
Bonn Dataset.
Table 1: Dataset features (for both RGB and RGBN inputs).
Feature | Real | Unity | Unity + Real
Dimensions (WxH pixels) | 1296x966 | 1024x1024 | 512x512
Train (number of images) | 734 | 734 | 734 Unity + 100 real
Validation (number of images) | 150 | 150 | 150 Unity + 100 real
Test (number of images) | 150 | 150 | 150 Unity + 100 real
The pixels for the real dataset training are the same as
the sensor output; for the Unity dataset the output of
1024x1024 pixels is used, and for the mixed dataset the
images were reduced to 512x512 pixels so that all its
images have the same dimensions. Both the synthetic and
the mixed dataset use power-of-two image sizes, which is
a common property of Unity images. Each training run on
each dataset lasted at least 200 epochs, with the aim of
continuing the training until the gain in accuracy was
negligible.
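The splits in Table 1 can be assembled with a short script. The sketch below assumes flat directories of Unity and real RGB images with a *_rgb.png naming scheme (an assumption for illustration, not the actual project layout) and resizes every image to 512x512 as done for the mixed dataset.

```python
import random
from pathlib import Path
from PIL import Image

def build_mixed_split(unity_dir, real_dir, out_dir, size=(512, 512), seed=0):
    """Assemble the Unity+Real split described in Table 1.

    734/150/150 synthetic images plus 100 real images per subset, with every
    image resized to 512x512. Directory layout and naming are assumptions.
    """
    random.seed(seed)
    unity = sorted(Path(unity_dir).glob("*_rgb.png"))
    real = sorted(Path(real_dir).glob("*_rgb.png"))
    random.shuffle(unity)
    random.shuffle(real)

    splits = {
        "train": unity[:734] + real[:100],
        "valid": unity[734:884] + real[100:200],
        "test":  unity[884:1034] + real[200:300],
    }
    for name, files in splits.items():
        dest = Path(out_dir) / name
        dest.mkdir(parents=True, exist_ok=True)
        for f in files:
            Image.open(f).resize(size, Image.BILINEAR).save(dest / f.name)
```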
5 RESULTS
The results of each training are presented using the IoU
and the accuracy as key performance indicators (KPIs). The
former is calculated for each label through Equation 1,
where N is the number of pixels of a classified image and
T_pi, F_pi and F_ni are the true positive, false positive
and false negative indicators for pixel i and the given
label (T_pi = 1 when the pixel is classified correctly for
that label); the sum of the T_pi is equivalent to the
intersection of the classified image with the ground truth
image for the chosen label. The IoU mean of an image is
obtained by averaging the IoU over its labels, and the
total mean IoU is obtained by averaging these per-image
means over the 820 classified images used for performance
evaluation.

\mathrm{IoU}_{label} = \frac{\sum_{i=1}^{N} T_{p_i}}{\sum_{i=1}^{N} \left( T_{p_i} + F_{p_i} + F_{n_i} \right)}   (1)
The second KPI is the accuracy, which is calculated for
each label using Equation 2; again, a mean is computed per
image and then over the 820 evaluation images. The
variables are the same as in Equation 1, with the addition
of T_ni, the true negative indicator for when a pixel does
not carry a certain label in both the classified image and
the ground truth. Each KPI is then multiplied by 100 to
express the results as percentages.

\mathrm{Accuracy}_{label} = \frac{\sum_{i=1}^{N} \left( T_{p_i} + T_{n_i} \right)}{\sum_{i=1}^{N} \left( T_{p_i} + T_{n_i} + F_{p_i} + F_{n_i} \right)}   (2)
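Both KPIs reduce to pixel counting over the per-label confusion terms. The sketch below shows how the per-image means can be computed from a predicted label map and its ground truth; it is a minimal re-implementation of Equations 1 and 2, not the evaluation code used in the experiments.

```python
import numpy as np

def per_image_metrics(pred, truth, labels=(0, 1, 2)):
    """Per-image mean IoU and mean accuracy (in percent) over the given labels.

    `pred` and `truth` are integer label maps of identical shape, e.g.
    0 = ground, 1 = plant, 2 = weed.
    """
    ious, accs = [], []
    n = pred.size
    for label in labels:
        tp = np.sum((pred == label) & (truth == label))
        fp = np.sum((pred == label) & (truth != label))
        fn = np.sum((pred != label) & (truth == label))
        tn = n - tp - fp - fn
        # IoU: intersection over union of the label's predicted and true regions.
        ious.append(tp / (tp + fp + fn) if (tp + fp + fn) else 1.0)
        # Accuracy: fraction of pixels whose membership in the label is correct.
        accs.append((tp + tn) / n)
    return 100.0 * float(np.mean(ious)), 100.0 * float(np.mean(accs))

# The dataset-level values reported in Table 2 are then the mean of these
# per-image means over the 820 evaluation images.
```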
Tables 2 and 3 were obtained using the selected KPIs.
The former reports the overall result after calculating the
IoU and accuracy means for the classification of the 820
labeled evaluation images using RGB (red-green-blue) and
RGBN (red-green-blue-near infrared) inputs from the Bonn
Dataset (Chebrolu et al., 2017). The results of Table 2 are
also presented in Figures 6 and 7 for better visualization.
Table 3 presents the KPIs of interest for the image with
the highest difference between RGB and RGBN mean IoU using
the synthetic (Unity) trained network; the columns give the
classification result for this image using the CNN trained
with each dataset. This image is shown in Figure 8, which
also presents the classification ground truth (plants in
green, weeds in red) and the classification results (plants
in green, weeds in blue) for RGB and RGBN inputs using the
3 training datasets: real, Unity and Unity+real.
5.1 Discussion
As can be seen in Tables 2 and 3, adding the NIR channel
as an input to the network training increases the mean IoU
and mean accuracy. Overall, the mean IoU and mean accuracy
rows of Table 2 show differences of 5.75% (66.25% vs.
60.50%) and 0.64% (98.82% vs. 98.18%) respectively for the
network trained with the Unity dataset, shown in the third
column of Table 2. The same trend is found in the
classification results of the networks trained with the
real and Unity+real datasets, in the second and fourth
columns respectively. This means that the NIR data can
considerably increase the classification results,
especially for the synthetic dataset, which presented the
biggest improvement compared to the networks trained with
the other datasets.
Table 2 also includes the results for each class: plant,
weed and ground. In these rows it can be seen that the best
performance is obtained for the ground, followed by the
plants and then the weeds, which matches the order of the
number of pixels per class. This is because the ground data
is considerably different from the other two classes, and
there was more training and classification pixel data
available for the ground than for plants and weeds.
Therefore, the classification results favor soil
classification over plant and weed classification.
Table 2: Classification mean results (in percentage) over the 820 images used for performance evaluation.
Method | Real | Unity | Unity + Real
Plant IoU RGB | 83.35 | 60.83 | 73.51
Plant IoU RGBN | 83.07 | 71.20 | 75.61
Weed IoU RGB | 50.49 | 22.54 | 31.57
Weed IoU RGBN | 54.09 | 29.57 | 36.82
Ground IoU RGB | 98.56 | 98.14 | 98.41
Ground IoU RGBN | 98.53 | 97.97 | 98.73
mean IoU RGB | 77.47 | 60.50 | 67.83
mean IoU RGBN | 78.56 | 66.25 | 70.38
Plant Accuracy RGB | 99.04 | 97.23 | 98.63
Plant Accuracy RGBN | 99.18 | 98.33 | 98.70
Weed Accuracy RGB | 98.93 | 97.75 | 98.59
Weed Accuracy RGBN | 99.03 | 98.18 | 98.70
Ground Accuracy RGB | 99.99 | 99.97 | 99.99
Ground Accuracy RGBN | 99.99 | 99.97 | 99.99
mean Accuracy RGB | 99.32 | 98.18 | 99.07
mean Accuracy RGBN | 99.40 | 98.82 | 99.16
Figure 6: IoU results from Table 2 for each class.
Figure 7: Accuracy results from Table 2 for each class.
Table 3: Classification results (in percentage) for the selected image with a high difference between RGBN and RGB mean IoU score using the Unity-trained network.
Method | Real | Unity | Unity + Real
mean IoU RGB | 81.56 | 57.68 | 59.45
mean IoU RGBN | 83.38 | 74.03 | 71.93
mean Accuracy RGB | 99.76 | 99.14 | 99.29
mean Accuracy RGBN | 99.79 | 99.58 | 99.62
Figure 8: Classified image with a high difference between RGBN and RGB mean IoU using the Unity-trained network: (a) real RGB image and (e) its ground truth; (b, f) classification results with the network trained on the real dataset (RGB and RGBN inputs); (c, g) with the Unity-trained network; (d, h) with the Unity+real-trained network. Green labels plants, red labels weeds in the ground truth, and blue labels weeds in the network output.
Additionally, it is worth mentioning that the plants in
the available labeled images from the Bonn Dataset are at
a slightly earlier growth stage than those used for the
synthetic generated images, which also spreads the results
compared to previous research (Cicco et al., 2016).
The mean IoU results obtained using the real images,
shown in the second column of Table 2, are in line with
previously reported results for plant-weed classification
using Bonnet, namely 80.1% for the mean IoU and 98.5% for
the mean accuracy (Milioto and Stachniss, 2019). It is
worth noting that the differences between RGB and RGBN for
the real dataset training, 1.09% for the mean IoU and 0.08%
for the mean accuracy, could be considered small. Multiple
trends for these values have been found in previous
research when the NIR is added as an input for plant-weed
classification: (Milioto et al., 2017) reported differences
in mean IoU ranging between 2.98% and 27.94%, and (Deng
et al., 2014) reports a negative difference of 26.27%,
meaning that there are cases where the extra NIR data can
confuse the neural network.
Table 3 and Figure 8 show a case where the NIR data
yields a large improvement for plant-weed classification.
Here the biggest change of interest is in the mean IoU,
where a difference of 2.65% and 18.7% is achieved for the
network trained with the real dataset and the Unity
dataset respectively.
As both Tables 2 and 3 suggest, the improvement obtained
by including the NIR input data is more prominent for the
synthetic dataset, meaning that this contribution is
particularly relevant for simulation environments developed
to generate synthetic datasets.
In the previous research "Automatic Model Based Dataset
Generation for Fast and Accurate Crop and Weeds Detection",
which used Unreal as the simulation environment to generate
the synthetic dataset, the results obtained reach up to
60.2% and 99.5% for the mean IoU and mean accuracy
respectively (Cicco et al., 2016). That result was also
obtained using a set of images from the Bonn Dataset
(Chebrolu et al., 2017), with a modified version of the
RGB SegNet as the convolutional neural network (CNN)
(Badrinarayanan et al., 2015) and a synthetically generated
dataset of RGB images resized to 480x360 pixels.
6 CONCLUSIONS AND FUTURE
WORK
The main contribution of this paper is the addition
of the near infrared sensor in a simulation environ-
ment to generate datasets of plant-weed images that
include red, green, blue and near infrared data. The
results were tested against real data with a convolu-
tional neural network (CNN) designed for plant-weed
segmentation classification. The classification is eval-
uated using the mean intersection over union (IoU)
and the accuracy: both values are increased by adding
the near infrared data, with the most prominent im-
provement obtained by using the synthetic data.
The results obtained suggest that this method can be the
basis for the simulation of other types of plants and
weeds, with the purpose of providing a reliable source of
data for generating CNN training datasets. Simulation
parameters such as plant size, illumination intensity and
color variation can be adjusted to fit a specific
environmental condition and thus achieve similar
classification results. Additional types of plants and
weeds can be added with new textures or new 3D meshes,
depending on their complexity. If the general shape of the
new plant is the same, only additional textures and simple
adjustments to the 3D meshes would be required. The
developed simulator generates sugar beet leaves using a
simple bent rectangular mesh, which takes the shape of the
leaf through a transparency shader that only renders the
leaf pixels. Carrots, for example, could therefore be added
easily, since they also have a bent stem with leaves that
follow the pattern of the bent stem to some degree. For
some studies simple new textures could be enough, and if
more detail is required, extra bent meshes could be spawned
on top of the main stem to obtain more precise leaf shapes.
Plants with considerably different shapes, like sunflowers,
could also be added with appropriate meshes that follow the
general shape of the plant and their required textures.
The simulator could also be modified to include more
points of view, representing sensors mounted on different
types of robots such as ground robots or UAVs. These robots
could also be added to the simulator to perform inspection
patterns and collect data for machine learning training
and testing.
REFERENCES
Acker, O. V., Lachish, O., and Burnett, G. (2017). Cel-
lular automata simulation on FPGA for training neu-
ral networks with virtual world imagery. CoRR,
abs/1711.07951.
Akiyama, R., Araujo, T., Chagas, P., Miranda, B., Santos,
C., Morais, J., and Meiguins, B. (2018). Synthetic
chart image generator: An application for generating
chart image datasets. In 2018 22nd International Con-
ference Information Visualisation (IV). IEEE.
Badrinarayanan, V., Kendall, A., and Cipolla, R.
(2015). Segnet: A deep convolutional encoder-
decoder architecture for image segmentation. CoRR,
abs/1511.00561.
Bah, M., Hafiane, A., and Canals, R. (2018). Deep learn-
ing with unsupervised data labeling for weed detec-
tion in line crops in UAV images. Remote Sensing,
10(11):1690.
Carbone, C., Garibaldi, O., and Kurt, Z. (2018). Swarm
robotics as a solution to crops inspection for precision
agriculture. KnE Engineering, 3(1):552.
Carvajal, J. A., Romero, D. G., and Sappa, A. D. (2017).
Fine-tuning based deep convolutional networks for
lepidopterous genus recognition. In Progress in Pat-
tern Recognition, Image Analysis, Computer Vision,
and Applications, pages 467–475. Springer Interna-
tional Publishing.
Chebrolu, N., Lottes, P., Schaefer, A., Winterhalter, W.,
Burgard, W., and Stachniss, C. (2017). Agricultural
robot dataset for plant classification, localization and
mapping on sugar beet fields. The International Jour-
nal of Robotics Research, 36(10):1045–1052.
Cicco, M. D., Potena, C., Grisetti, G., and Pretto, A.
(2016). Automatic model based dataset generation for
fast and accurate crop and weeds detection. CoRR,
abs/1612.03019.
Deng, W., Zhao, C., and Wang, X. (2014). Discrimina-
tion of crop and weeds on visible and visible/near-
infrared spectrums using support vector machine, ar-
tificial neural network and decision tree. Sensors &
Transducers, 26:26–34.
Duhan, J. S., Kumar, R., Kumar, N., Kaur, P., Nehra, K., and
Duhan, S. (2017). Nanotechnology: The new perspec-
tive in precision agriculture. Biotechnology Reports,
15:11–23.
Fawakherji, M., Youssef, A., Bloisi, D., Pretto, A., and
Nardi, D. (2019). Crop and weeds classification for
precision agriculture using context-independent pixel-
wise segmentation. In 2019 Third IEEE International
Conference on Robotic Computing (IRC). IEEE.
Hattori, H., Boddeti, V. N., Kitani, K., and Kanade, T.
(2015). Learning scene-specific pedestrian detectors
without real data. In 2015 IEEE Conference on Com-
puter Vision and Pattern Recognition (CVPR), pages
3819–3827.
ISPA (2020). Home | international society of precision agri-
culture.
Juliani, A., Berges, V., Vckay, E., Gao, Y., Henry, H., Mat-
tar, M., and Lange, D. (2018). Unity: A general plat-
form for intelligent agents. CoRR, abs/1809.02627.
Kaur, P., Sobti, R., and Khamparia, A. (2018). Simulation
and deep cnn based architecture for validation of in-
telligent automotive functions. In 2018 International
Conference on Advances in Computing, Communica-
tions and Informatics (ICACCI), pages 2344–2348.
Khanal, S., Fulton, J., and Shearer, S. (2017). An overview
of current and potential applications of thermal remote
sensing in precision agriculture. Computers and Elec-
tronics in Agriculture, 139:22–32.
Kim, T. S., Peven, M., Qiu, W., Yuille, A., and Hager,
G. D. (2019). Synthesizing attributes with unreal en-
gine for fine-grained activity analysis. In 2019 IEEE
Winter Applications of Computer Vision Workshops
(WACVW), pages 35–37.
Kolodny, L. (2017). Fixed-wing drones not quite
taking off in commercial market, a new
DroneDeploy study finds | TechCrunch.
https://techcrunch.com/2016/08/15/fixed-wing-
drones-not-quite-taking-off-in-commercial-market-
a-new-dronedeploy-study-finds/. Visited on 2017-
08-09.
Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2017). Im-
ageNet classification with deep convolutional neural
networks. Communications of the ACM, 60(6):84–90.
Li, N., Zhang, X., Zhang, C., Guo, H., Sun, Z., and
Wu, X. (2019). Real-time crop recognition in trans-
planted fields with prominent weed growth: A visual-
attention-based approach. IEEE Access, 7:185310–
185321.
Lottes, P., Behley, J., Milioto, A., and Stachniss, C. (2018).
Fully convolutional networks with sequential infor-
mation for robust crop and weed detection in preci-
sion farming. IEEE Robotics and Automation Letters,
3(4):2870–2877.
Lottes, P. and Stachniss, C. (2017). Semi-supervised online
visual crop and weed classification in precision farm-
ing exploiting plant arrangement. In 2017 IEEE/RSJ
International Conference on Intelligent Robots and
Systems (IROS). IEEE.
Milioto, A., Lottes, P., and Stachniss, C. (2017). Real-time
semantic segmentation of crop and weed for precision
agriculture robots leveraging background knowledge
in cnns. CoRR, abs/1709.06764.
Milioto, A. and Stachniss, C. (2019). Bonnet: An Open-
Source Training and Deployment Framework for Se-
mantic Segmentation in Robotics using CNNs. In
Proc. of the IEEE Intl. Conf. on Robotics & Automa-
tion (ICRA).
Patrício, D. I. and Rieder, R. (2018). Computer vision and
artificial intelligence in precision agriculture for grain
crops: A systematic review. Computers and Electron-
ics in Agriculture, 153:69–81.
Potena, C., Nardi, D., and Pretto, A. (2017). Fast and ac-
curate crop and weed identification with summarized
train sets for precision agriculture. In Intelligent Au-
tonomous Systems 14, pages 105–121. Springer Inter-
national Publishing.
Richter, S. R., Vineet, V., Roth, S., and Koltun, V. (2016).
Playing for data: Ground truth from computer games.
In Leibe, B., Matas, J., Sebe, N., and Welling, M.,
editors, Computer Vision ECCV 2016, pages 102–
118, Cham. Springer International Publishing.
Sa, I., Chen, Z., Popovic, M., Khanna, R., Liebisch, F., Ni-
eto, J., and Siegwart, R. (2018a). weedNet: Dense
semantic weed classification using multispectral im-
ages and MAV for smart farming. IEEE Robotics and
Automation Letters, 3(1):588–595.
Sa, I., Popović, M., Khanna, R., Chen, Z., Lottes, P.,
Liebisch, F., Nieto, J., Stachniss, C., Walter, A., and
Siegwart, R. (2018b). WeedMap: A large-scale se-
mantic weed mapping framework using aerial multi-
spectral imaging and deep neural network for preci-
sion farming. Remote Sensing, 10(9):1423.
Sanders, A. (2016). An Introduction to Unreal Engine 4.
CRC Press.
Shafaei, A., Little, J. J., and Schmidt, M. (2016). Play
and learn: Using video games to train computer vi-
sion models.
Stafford, J. V. (2000). Implementing precision agriculture in
the 21st century. Journal of Agricultural Engineering
Research, 76(3):267–275.
Xie, J. (2012). Research on key technologies base unity3d
game engine. In 2012 7th International Conference
on Computer Science Education (ICCSE), pages 695–
699.
Yang, C., Lee, T., Huang, C., and Hsu, K. (2016). Unity 3d
production and environmental perception vehicle sim-
ulation platform. In 2016 International Conference
on Advanced Materials for Science and Engineering
(ICAMSE), pages 452–455.
Zhao, J., Frumkin, N., Ishwar, P., and Konrad, J. (2019).
Cnn-based indoor occupant localization via active
scene illumination. In 2019 IEEE International Con-
ference on Image Processing (ICIP), pages 2636–
2640.