TOWARDS AUTOMATED CROP YIELD ESTIMATION
Detection and 3D Reconstruction of Pineapples in Video Sequences
Supawadee Chaivivatrakul, Jednipat Moonrinta and Matthew N. Dailey
School of Engineering and Technology, Asian Institute of Technology, Pathumthani, Thailand
Keywords:
Object detection, Keypoint detection, Keypoint descriptors, Keypoint classification, Image segmentation,
Structure from motion, 3D reconstruction, Ellipsoid estimation, Pineapple, Mobile field robot, Agricultural
automation.
Abstract:
Towards automation of crop yield estimation for pineapple fields, we present a method for detection and 3D
reconstruction of pineapples from a video sequence acquired, for example, by a mobile field robot. The detec-
tion process incorporates the Harris corner detector, the SIFT keypoint descriptor, and keypoint classification
using an SVM. The 3D reconstruction process incorporates structure from motion to obtain a 3D point cloud
representing patches of the fruit’s surface followed by least squares estimation of the quadric (in this case an
ellipsoid) best fitting the 3D point cloud. We performed three experiments to establish the feasibility of the
method. Experiments 1 and 2 tested the performance of the Harris, SIFT, and SVM method on indoor and
outdoor data. The method achieved a keypoint classification accuracy of 87.79% on indoor data and 76.81%
on outdoor data, against base rates of 81.42% and 53.83%, respectively. In Experiment 3, we performed 3D
reconstruction from indoor data. The method achieved an average error of 34.96% in estimating the ratio of the
fruits' short axis length to major axis length. Future work will focus on increasing the robustness and accuracy of the
3D reconstruction method as well as resolving the 3D scale ambiguity.
1 INTRODUCTION
Agricultural automation has the potential to improve
farm yields, improve crop quality, and lower produc-
tion costs. In particular, autonomous in-field inspec-
tion of fruit fields could improve farmers’ ability to
track crops over time, plan maintenance and harvest-
ing activities, and predict yield. We are interested in
developing autonomous inspection robots for pineap-
ple farms that use low-cost cameras and machine vi-
sion to isolate and grade the fruit while it is still in
the field. We focus here on the related problems of 1)
segmenting a video to find the pineapple fruit in the
field, and 2) obtaining 3D models of detected fruits.
In segmentation, the first main challenge is that
since the plants are tightly spaced, we cannot typ-
ically see all of a particular fruit, ruling out shape-
based methods. Second, since pineapples often have
a similar color to the rest of the plant, we cannot rely
on color. We thus rely on texture. We use the fast
Harris algorithm (Harris and Stephens, 1988) to find
corner points, then apply the SIFT descriptor (Lowe,
2004) to the detected points. We then classify the
descriptors using an SVM (support vector machine).
For 3D modeling, we combine the pineapple de-
tector with 3D point cloud estimation using well
known structure from motion techniques (see, e.g.,
Pollefeys et al., 2004), and then we perform least-
squares estimation of the quadric (in this case an el-
lipsoid) best fitting the 3D point cloud. Our current
method is not robust to outlier points and does not
resolve the scale ambiguity of the 3D reconstruction
(future work will address these limitations), but the
method does provide useful information about fruit
orientation and shape.
The experiments establish the feasibility of using
texture to segment pineapples in video sequences and
using structure from motion to reconstruct pineapple
shape. This work is a step towards fully automatic
crop yield estimation by mobile field robots.
2 METHODOLOGY
Our methodology consists of frame selection, image
segmentation, and 3D reconstruction from the point
cloud. In our current prototype, we select two views
of each fruit in a video sequence manually. However,
we plan to perform automated key frame selection in
future work. We detail each of the other steps in turn.
2.1 Image Segmentation
Our segmentation algorithm consists of keypoint ex-
traction, keypoint descriptor calculation, keypoint
classification, and morphological operations to re-
trieve the fruit region in a given image.
We use the Harris corner detector (Harris and
Stephens, 1988) to find candidate keypoints over the
whole image, since images of a pineapple’s surface
have many points with corner-like structure. We find
that in practice, the Harris detector tends to find fairly
dense sets of keypoints on pineapple image regions
that are very useful for reconstructing 3D point clouds
representing the fruit surface.
Classifying the keypoints as pineapple and non-
pineapple points requires a rich description of the lo-
cal texture surrounding the keypoint. We compute
SIFT descriptors (Lowe, 2004) (a 128-element vec-
tor) for the high-gradient Harris keypoints that are not
too close to image boundaries.
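As a concrete illustration, the following Python sketch computes SIFT descriptors at Harris corner locations using OpenCV. This is an assumed reimplementation for exposition only; our experiments used the Kovesi and Vedaldi Matlab implementations described in Section 3, and the detector parameters here are illustrative defaults.

import cv2
import numpy as np

def pineapple_keypoint_descriptors(image_path):
    # Load the image and convert to grayscale for corner detection.
    img = cv2.imread(image_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # Detect corner points with the Harris criterion.
    corners = cv2.goodFeaturesToTrack(gray, maxCorners=2000,
                                      qualityLevel=0.01, minDistance=2,
                                      useHarrisDetector=True, k=0.04)

    # Wrap the corners as cv2.KeyPoint objects, discarding points too
    # close to the image boundary for a full descriptor window.
    border = 16
    h, w = gray.shape
    kps = [cv2.KeyPoint(float(x), float(y), 16)
           for x, y in corners.reshape(-1, 2)
           if border <= x < w - border and border <= y < h - border]

    # Compute 128-element SIFT descriptors at the Harris locations.
    sift = cv2.SIFT_create()
    kps, descriptors = sift.compute(gray, kps)
    return kps, descriptors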
We use support vector machines (SVM) to clas-
sify keypoints as pineapple or non-pineapple. In other
work, we have performed experiments on SIFT key-
point descriptor classification using a variety of SVM
kernels and hyperparameter settings, and we find that
the radial basis function (RBF) kernel has the best
overall performance. Here we use the RBF kernel with a
cross-validated grid search to find the best hyperparameter settings.
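The training procedure can be sketched as follows using scikit-learn's SVC (a wrapper around LIBSVM, which we used directly in our experiments); the power-of-two grids are an assumption chosen to cover the settings reported in Section 3.

import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# X: N x 128 array of SIFT descriptors;
# y: labels (1 = pineapple, 0 = background).
def train_keypoint_classifier(X, y):
    # Grid of C (error/complexity tradeoff) and gamma (RBF width),
    # searched over powers of two.
    param_grid = {
        "C": [2.0 ** e for e in np.arange(-5, 5, 0.5)],
        "gamma": [2.0 ** e for e in np.arange(-5, 5, 0.25)],
    }
    # 5-fold cross-validated grid search within the training set;
    # GridSearchCV refits on all training data with the best setting.
    search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
    search.fit(X, y)
    return search.best_estimator_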
For segmentation, we find contiguous pineap-
ple regions using morphological closing to connect
nearby pineapple points, then remove regions smaller
than 25% of the expected fruit area, based on assump-
tions of image resolution and distance to the camera.
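A sketch of this post-processing step, written with OpenCV rather than the Matlab image processing toolbox used in our experiments; the disk radius and area thresholds follow the values reported in Section 3.

import cv2
import numpy as np

def extract_fruit_regions(mask, min_area=20000, radius=30):
    # Close small gaps between classified pineapple points with a
    # disk-shaped structuring element.
    disk = cv2.getStructuringElement(cv2.MORPH_ELLIPSE,
                                     (2 * radius + 1, 2 * radius + 1))
    closed = cv2.morphologyEx(mask.astype(np.uint8), cv2.MORPH_CLOSE, disk)

    # Keep only connected regions of at least min_area pixels
    # (25% of the ~80,000-pixel expected fruit area at 30 cm).
    n, labels, stats, _ = cv2.connectedComponentsWithStats(closed)
    out = np.zeros_like(closed)
    for i in range(1, n):  # label 0 is the background
        if stats[i, cv2.CC_STAT_AREA] >= min_area:
            out[labels == i] = 1
    return out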
2.2 3D Reconstruction
To obtain 3D point clouds from candidate pineap-
ple image regions, we find point correspondences be-
tween image pairs and then apply standard algorithms
from the structure from motion literature, as described
in the following paragraphs.
The first step is feature point extraction. Once
pineapple regions have been identified in a pair of im-
ages of the same fruit, we extract SURF (Bay et al.,
2008) feature points from those regions. Although the
Harris corner detector and the SIFT keypoint descrip-
tor work well for image segmentation, we find that
the standard SURF algorithm gives us more reliable
correspondences for 3D point cloud reconstruction.
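Assuming an OpenCV build with the nonfree xfeatures2d contrib module (our experiments instead used Strandmark's SURFmex Matlab port, described in Section 3), SURF extraction restricted to the segmented fruit region might look like:

import cv2

def surf_in_region(gray, region_mask):
    # SURF lives in the contrib xfeatures2d module and requires a build
    # with OPENCV_ENABLE_NONFREE; the Hessian threshold is illustrative.
    surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)
    # Detect and describe keypoints only inside the segmented pineapple
    # region (region_mask: uint8 mask, nonzero on the fruit).
    keypoints, descriptors = surf.detectAndCompute(gray, region_mask)
    return keypoints, descriptors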
To find point correspondences between two im-
ages, we find, for each keypoint descriptor in one
image, the most similar descriptor in the other im-
age. We use the dot product similarity measure with
a threshold to find the most likely corresponding key-
point in one image for each keypoint in the other.
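A minimal sketch of this matching rule in Python; the 0.9 similarity threshold is an illustrative assumption, not the value used in our experiments.

import numpy as np

def match_descriptors(d1, d2, threshold=0.9):
    # Normalize rows so the dot product of two descriptors is their
    # cosine similarity.
    d1 = d1 / np.linalg.norm(d1, axis=1, keepdims=True)
    d2 = d2 / np.linalg.norm(d2, axis=1, keepdims=True)
    sim = d1 @ d2.T  # similarity of every descriptor pair

    # For each keypoint in image 1, take the most similar keypoint in
    # image 2, keeping the pair only if it exceeds the threshold.
    best = np.argmax(sim, axis=1)
    keep = sim[np.arange(len(d1)), best] > threshold
    return np.flatnonzero(keep), best[keep]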
To remove outliers in the resulting set of puta-
tive correspondences, we use the adaptive RANSAC
method for fundamental matrix estimation (Hartley
and Zisserman, 2004) to find the best fundamental
matrix and correspondence consensus set, removing
outliers inconsistent with the epipolar geometry. The
remaining inlier points are used for 3D point cloud
estimation.
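OpenCV provides a fixed-threshold RANSAC variant of fundamental matrix estimation, which approximates the adaptive procedure we use; a sketch:

import cv2

def filter_with_epipolar_geometry(pts1, pts2):
    # pts1, pts2: N x 2 float arrays of putative correspondences.
    # RANSAC fundamental matrix estimation; points further than 1 pixel
    # from their epipolar lines are flagged as outliers.
    F, inlier_mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC,
                                            ransacReprojThreshold=1.0,
                                            confidence=0.99)
    inliers = inlier_mask.ravel().astype(bool)
    return F, pts1[inliers], pts2[inliers]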
The next step is 3D point cloud estimation. We as-
sume that the camera’s intrinsic parameters are fixed
and given as a calibration matrix K. We next estimate
camera matrices for the two images, using the essen-
tial matrix method (Hartley and Zisserman, 2004).
Once two camera matrices are known, we com-
pute linear estimates of all of the 3D points then
refine those estimates using nonlinear least squares
(Levenberg-Marquardt).
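A sketch of the two-view reconstruction with OpenCV, given the calibration matrix K; for brevity, the final Levenberg-Marquardt refinement of the linear triangulation is omitted here.

import cv2
import numpy as np

def two_view_point_cloud(pts1, pts2, K):
    # pts1, pts2: N x 2 float arrays of inlier correspondences.
    # Estimate the essential matrix from calibrated correspondences.
    E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC)

    # Decompose E into the second camera's pose (R, t); the first camera
    # is taken as [I | 0]. The scale of t is arbitrary (scale ambiguity).
    _, R, t, mask = cv2.recoverPose(E, pts1, pts2, K, mask=mask)

    # Linear triangulation of the points consistent with the pose.
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K @ np.hstack([R, t])
    inl = mask.ravel() > 0
    X_h = cv2.triangulatePoints(P1, P2, pts1[inl].T, pts2[inl].T)
    return (X_h[:3] / X_h[3]).T  # dehomogenize to N x 3 points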
In a real field, we cannot rotate the pineapple or
move the camera to get a complete view of the fruit.
Therefore, we must estimate the fruit’s shape from a
partial view. We propose an algorithm for reconstruct-
ing the 3D shape of a pineapple from a 3D point cloud
estimated from a partial view of the fruit’s surface.
Since pineapples are nearly ellipsoidal, we model
each fruit as an ellipsoid and perform least squares
estimation of the ellipsoid’s parameters to fit the point
cloud data estimated in the previous step. Using Li
and Griffiths’ (2004) method, we actually estimate the
quadric
Q = \begin{pmatrix} a & h & g & p \\ h & b & f & q \\ g & f & c & r \\ p & q & r & d \end{pmatrix}   (1)

defining X^T Q X = 0 using least squares.
Once the best-fitting ellipsoidal quadric Q is
found, we extract the ellipsoid’s center, orientation,
and axis radii.
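The fitting and parameter extraction can be sketched as follows. Note that for brevity this uses a generic least-squares quadric fit (the smallest right singular vector of the design matrix) rather than Li and Griffiths' ellipsoid-specific constraint, so it is only an approximation of the method described above.

import numpy as np

def fit_quadric(points):
    # points: N x 3 array. Each row contributes one equation
    # a x^2 + b y^2 + c z^2 + 2f yz + 2g xz + 2h xy
    #   + 2p x + 2q y + 2r z + d = 0, matching Equation (1).
    x, y, z = points.T
    A = np.column_stack([x*x, y*y, z*z, 2*y*z, 2*x*z, 2*x*y,
                         2*x, 2*y, 2*z, np.ones_like(x)])
    # Least-squares solution: right singular vector with the smallest
    # singular value (generic quadric, not ellipsoid-specific).
    _, _, Vt = np.linalg.svd(A)
    a, b, c, f, g, h, p, q, r, d = Vt[-1]
    return np.array([[a, h, g, p],
                     [h, b, f, q],
                     [g, f, c, r],
                     [p, q, r, d]])

def ellipsoid_parameters(Q):
    # Center: solve the 3x3 quadratic block against the linear part.
    center = np.linalg.solve(Q[:3, :3], -Q[:3, 3])
    # Translate the quadric to the center to remove the linear terms.
    T = np.eye(4)
    T[:3, 3] = center
    Qc = T.T @ Q @ T
    # Eigen-decomposition of the centered quadratic form gives the
    # orientation (eigenvectors) and axis radii (from eigenvalues).
    evals, evecs = np.linalg.eigh(Qc[:3, :3] / -Qc[3, 3])
    radii = 1.0 / np.sqrt(evals)  # valid when all eigenvalues are positive
    return center, evecs, radii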
3 EXPERIMENTAL RESULTS
To evaluate our methods, we performed three exper-
iments: fruit segmentation on indoor data, fruit seg-
mentation on outdoor data, and 3D fruit reconstruc-
tion. The 3D reconstruction experiment was only ap-
plied to indoor data.
Table 1: Distribution of training and test keypoints for indoor and outdoor segmentation.

Data set   Number of fruit   Positive instances (indoor)   Positive instances (outdoor)
Training   20                88.76%                        35.48%
Test       10                81.42%                        46.17%
Table 2: SVM keypoint classification accuracy for indoor and outdoor segmentation.

Model type         Data set     Accuracy (indoor)   Accuracy (outdoor)
Cross-validation   Training     97.76%              99.85%
Cross-validation   Validation   93.86%              82.64%
Final              Training     97.67%              99.80%
Final              Test         87.79%              76.81%
3.1 Segmentation
In Experiment 1, we captured indoor videos of 30
pineapples from a distance of approximately 30 cen-
timeters. We chose one frame from each video and
split the data into 20 training and 10 test images.
For every image, we applied the Kovesi implemen-
tation of the Harris corner detector (Kovesi, 2000)
with a Gaussian smoothing standard deviation of 0.5,
a threshold of 1, and a non-maximum suppression ra-
dius of 2. We then extracted SIFT descriptors for the
Harris corner points by modifying an open implemen-
tation of SIFT (Vedaldi, 2006) using 4 scales and 8
orientations. The distribution of positive (pineapple)
and negative (background) keypoints over the training
set and test set is shown in Table 1.
We then built SVM models with the RBF kernel
using LIBSVM (Chang and Lin, 2001). The RBF
kernel based SVM requires two hyperparameters, c,
which controls the tradeoff between training error and
model complexity, and γ, which controls the width of
the RBF kernel. We used a grid search and 5-fold
cross validation within the training set to find optimal
values of c and γ then used the best parameters to train
a final model on all of the training data then used the
resulting model to classify the test set.
The best parameter setting for the indoor data was
(c = 2^2, γ = 2^0). Accuracy data for the classifiers are
shown in Table 2, and a sample of the results is shown
in Figure 1.
In Experiment 2, we performed the same steps
on data acquired outdoors. In the outdoor data,
there were many more negative keypoints due to the
complex background and occlusions (Table 1). The
best parameter setting for the outdoor data was (c = 2^1.5, γ = 2^1.75).
Figure 1: Sample SVM classification results; green circles mark pineapple points.

Figure 2: Pineapple segmentation and SURF feature points. (a) Result of the closing operation to obtain the pineapple region. (b) SURF feature points in the pineapple region.

After classifying points as lying on the pineapple surface, we performed morphological closing with
a disk-shaped structuring element of radius 30 using
Matlab’s image processing toolbox (The Mathworks,
2007). A sample of the results is shown in Figure 2(a).
We performed further processing on connected
regions containing 20,000 or more pixels (at 30 cm,
with our camera, the visible pineapple surface region
typically contains approximately 80,000 pixels).
3.2 3D Reconstruction
In Experiment 3, we applied Strandmark's
(2008) Matlab port of the SURF reference implemen-
tation to the pineapple region detected in each image.
A sample of the feature points we obtained is shown
in Figure 2(b). Sample results from our structure
from motion pipeline (matching and outlier detection,
point cloud estimation, a Delaunay triangulation of
the point cloud, and an estimated ellipsoid) are shown
in Figures 3–4. A quantitative evaluation of the el-
lipsoid estimation is presented in Table 3. We found
that the outdoor pineapple data had too many outlier
points for accurate ellipsoid estimation, so the data in
Table 3 are only for our indoor data.
4 CONCLUSIONS AND
DISCUSSION
Our experiments demonstrate the feasibility of the
proposed methods. The SVM model for pineap-
ple keypoint classification achieves an accuracy of
87.79% for indoor data, which is sufficient, with post
Table 3: Ratios of medium axis length to major axis length and short axis length to major axis length for actual fruit and estimated ellipsoids.

Fruit     Actual fruit              Estimated ellipsoid       Error
No.       med/maj     short/maj     med/maj     short/maj     med/maj     short/maj
1         0.83        0.83          0.71        0.58          14.46%      30.62%
2         0.96        0.96          0.66        0.58          31.54%      40.07%
3         0.85        0.85          0.65        0.53          23.41%      37.28%
4         0.84        0.84          0.69        0.57          18.66%      32.69%
5         0.88        0.88          0.73        0.58          17.04%      34.46%
6         0.85        0.85          0.72        0.58          15.57%      31.92%
7         0.86        0.86          0.63        0.53          26.30%      38.17%
8         0.89        0.89          0.71        0.60          19.99%      32.20%
9         0.83        0.83          0.67        0.56          19.39%      32.88%
10        0.88        0.88          0.65        0.54          25.96%      39.34%
Average   0.87        0.87          0.68        0.56          21.23%      34.96%
Figure 3: Feature point matching after outlier removal with
RANSAC.
Figure 4: Sample 3D reconstruction results. (a) 3D point
cloud reconstructed from the inlier points in Figure 3. (b)
Estimated ellipsoid.
processing, to accurately segment pineapple and non-
pineapple regions. The SURF keypoints computed for
these regions enable 3D point cloud estimation with
sufficient accuracy in most cases to estimate an el-
lipsoid with roughly accurate dimensions. In future
work, we will focus on increasing the robustness of
the method and resolving the 3D scale ambiguity, and
we will perform experiments with real field robots.
ACKNOWLEDGEMENTS
SC was supported by a graduate fellowship from the
Thailand National Science and Technology Develop-
ment Agency (NSTDA). Apisit Aroonnual helped de-
velop the ellipsoid fitting software. We thank Anupun
Terdwongworakul, Paul Janecek, and the members
of the AIT Vision and Graphics Lab for useful com-
ments on the work.
REFERENCES
Bay, H., Ess, A., Tuytelaars, T., and Van Gool, L. (2008).
Speeded-up robust features (SURF). Computer Vision
and Image Understanding, 110:346–359.
Chang, C.-C. and Lin, C.-J. (2001). LIBSVM: A library
for support vector machines. Available at
http://www.csie.ntu.edu.tw/~cjlin/libsvm.
Harris, C. and Stephens, M. (1988). A combined corner
and edge detector. In Proceedings of The Fourth Alvey
Vision Conference, pages 147–151.
Hartley, R. and Zisserman, A. (2004). Multiple View Geom-
etry in Computer Vision. Cambridge University Press,
second edition.
Kovesi, P. (2000). MATLAB and Octave functions for
computer vision and image processing. School of
Computer Science & Software Engineering, The
University of Western Australia. Available at
http://www.csse.uwa.edu.au/~pk/research/matlabfns/.
Li, Q. and Griffiths, J. G. (2004). Least squares ellipsoid
specific fitting. In Proceedings of the Geometric Mod-
eling and Processing Conference. IEEE Computer So-
ciety.
Lowe, D. G. (2004). Distinctive image features from scale-
invariant keypoints. International Journal of Com-
puter Vision, 60:91–110.
Pollefeys, M., Van Gool, L., Vergauwen, M., Verbiest, F.,
Cornelis, K., Tops, J., and Koch, R. (2004). Visual
modeling with a hand-held camera. International
Journal of Computer Vision, 59.
Strandmark, P. (2008). SURFmex [open source software].
Available at http://www.maths.lth.se/matematiklth/personal/petter/surfmex.php.
The Mathworks (2007). Image processing toolbox user’s
guide. Technical report.
Vedaldi, A. (2006). An implementation of SIFT detector
and descriptor. Available at
http://www.vlfeat.org/~vedaldi/assets/sift/sift.pdf.