ducing the pixel maps. These outputs are provided
to two separate analysis stages: one extracting the colour statistics and the other generating the dimension statistics. The platform is generalizable, allowing any number of analysis stages for extracting particular statistics. All statistics are then inserted into the database. The image files are stored in a file-based repository outside of the database. The image analysis, including the Mask R-CNN model and the trait extraction routines, is written in Python.
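To make the stage-based design concrete, the following Python sketch shows how pluggable analysis stages might be registered and run over a Mask R-CNN pixel map. The stage names, the statistics computed, and the record layout are illustrative assumptions, not the lab's actual code.

# Minimal sketch of pluggable analysis stages run over a segmentation mask.
from dataclasses import dataclass
from typing import Callable, Dict, List

import numpy as np


@dataclass
class TraitRecord:
    image_id: str
    stage: str
    statistics: Dict[str, float]


def colour_stage(image: np.ndarray, mask: np.ndarray) -> Dict[str, float]:
    """Per-channel colour statistics over the masked apple pixels (RGB assumed)."""
    pixels = image[mask.astype(bool)]            # (N, 3) values inside the mask
    means = pixels.mean(axis=0)
    return {"mean_r": float(means[0]), "mean_g": float(means[1]), "mean_b": float(means[2])}


def dimension_stage(image: np.ndarray, mask: np.ndarray) -> Dict[str, float]:
    """Apple dimensions estimated from the mask's bounding box (in pixels)."""
    ys, xs = np.nonzero(mask)
    return {"height_px": float(ys.max() - ys.min() + 1),
            "width_px": float(xs.max() - xs.min() + 1)}


# Any number of stages can be registered; each receives the image and its pixel map.
STAGES: Dict[str, Callable[[np.ndarray, np.ndarray], Dict[str, float]]] = {
    "colour": colour_stage,
    "dimension": dimension_stage,
}


def analyse(image_id: str, image: np.ndarray, mask: np.ndarray) -> List[TraitRecord]:
    """Run every registered stage and collect records ready for database insertion."""
    return [TraitRecord(image_id, name, stage(image, mask)) for name, stage in STAGES.items()]

Each stage returns a dictionary of statistics, so a new trait extractor can be added by registering another function without changing the rest of the pipeline.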
Field data collected includes tree observations on fruit quality, yield, size, and taste. These observations were previously recorded on paper and entered into Excel. The new pipeline allows data to be entered directly in the field using a mobile device, as well as directly by researchers. Previous data in Excel can be imported into the database.
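As an illustration of the import path for the legacy spreadsheets, the sketch below loads an Excel workbook with pandas and appends it to a PostgreSQL table. The file name, connection string, table name, and column names are hypothetical placeholders rather than the lab's actual schema.

# Illustrative import of historical field observations from Excel into PostgreSQL.
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("postgresql+psycopg2://user:password@localhost/fruit_research")

# Read the legacy spreadsheet; one row per tree observation.
observations = pd.read_excel("field_observations_2019.xlsx")

# Normalise column names to match the (assumed) database schema.
observations = observations.rename(columns={
    "Tree ID": "tree_id",
    "Fruit Quality": "fruit_quality",
    "Yield (kg)": "yield_kg",
    "Fruit Size": "fruit_size",
    "Taste": "taste",
})

# Append to the existing observations table rather than replacing it.
observations.to_sql("field_observation", engine, if_exists="append", index=False)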
Other data collected include ratings from judging panels of up to seven people who subjectively rate fruit based on taste, appearance, and quality. These observations are collected by a proprietary system, and the data can be extracted from that system for inclusion in the database.
Overall, the integrated database allows for analysis across the diverse set of data items and provides a consistent, historical record for data analysis and prediction. The database used is PostgreSQL.
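As an example of the kind of cross-source analysis the integrated database supports, the following sketch joins image-derived trait statistics with field observations in a single query. The table and column names are hypothetical and only indicate the shape of such a combined query.

# Example cross-source query over the integrated PostgreSQL database
# (table and column names are assumptions for illustration).
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("postgresql+psycopg2://user:password@localhost/fruit_research")

query = """
    SELECT f.tree_id,
           f.yield_kg,
           f.taste,
           t.statistic_name,
           AVG(t.statistic_value) AS mean_value
    FROM field_observation AS f
    JOIN image_trait AS t ON t.tree_id = f.tree_id
    GROUP BY f.tree_id, f.yield_kg, f.taste, t.statistic_name
"""

combined = pd.read_sql(query, engine)
print(combined.head())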
5 PRELIMINARY RESULTS
The image analysis accuracy is currently being evaluated over the test data set. The extracted apple traits are within the tolerance of human evaluators and can be computed in a few seconds, so the image analysis is not a bottleneck in the research lab’s data collection pipeline. Positioning and capturing the image is a much more time-consuming step than the image analysis. This is a significant savings over manual processes and may save hundreds of researcher hours on data collection and analysis.
The automated colour analysis and classification of overcolouring allow for higher-throughput analysis of apples and more precise measurements compared to human evaluation.
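One way such an overcolour measurement could be automated is sketched below: the fraction of apple pixels whose hue falls in a red range is taken as the overcolour percentage and then binned into coarse classes. The HSV thresholds and class boundaries are illustrative assumptions, not the lab's calibrated values.

# Sketch of automated overcolour estimation from a segmented apple image.
import cv2
import numpy as np


def overcolour_fraction(image_bgr: np.ndarray, mask: np.ndarray) -> float:
    """Return the fraction of masked pixels considered red overcolour."""
    hsv = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2HSV)
    mask = mask.astype(bool)
    hue = hsv[..., 0][mask]          # OpenCV hue range is 0-179
    sat = hsv[..., 1][mask]
    red = ((hue < 10) | (hue > 170)) & (sat > 80)
    return float(red.sum()) / max(int(mask.sum()), 1)


def overcolour_class(fraction: float) -> str:
    """Map the overcolour fraction to a coarse class label."""
    if fraction >= 0.75:
        return "high"
    if fraction >= 0.40:
        return "medium"
    return "low"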
Overall, the automated techniques allow for faster processing and the potential to perform higher-volume data analysis. This work has created a unique, high-quality data set for image analysis and fruit trait prediction. Important work in progress is using the integrated data set to predict which tree crosses to investigate. Determining which trees to breed is currently a manual process dependent on researcher intuition and experience.
6 CONCLUSIONS
Automating the collection and analysis of tree fruit
research data offers the potential to reduce the time
and cost of generating new fruit varieties. This work describes a data analysis pipeline that combines multiple data sources for analysis. A key component was automated image analysis for fruit trait extraction.
The accuracy of the automated process is high and
allows for more image analysis than would be possi-
ble using manual techniques. This research applies to
many other fruit applications beyond apples, and can
be used by growers and environmental agencies.
Ongoing and future work includes expanding the integrated database to include new data sets and
using the integrated information for automated pre-
diction.
REFERENCES
Abdullah, A., Harjoko, A., and Mahmod, O. (2023). Clas-
sification of fruits based on shape and color using
combined nearest mean classifiers. Jurnal RESTI
(Rekayasa Sistem dan Teknologi Informasi), 7:51–57.
Aslam, A., Irtaza, A., and Nida, N. (2020). Object De-
tection and Localization in Natural Scenes Through
Single-Step and Two-Step Models. In 2020 Interna-
tional Conference on Emerging Trends in Smart Tech-
nologies (ICETST), pages 1–7.
Batini, C., Lenzerini, M., and Navathe, S. B. (1986).
A comparative analysis of methodologies for
database schema integration. ACM Comput. Surv.,
18(4):323–364.
Bazame, H. C., Molin, J. P., Althoff, D., and Martello, M.
(2021). Detection, Classification, and Mapping of
Coffee Fruits During Harvest with Computer Vision.
Computers and Electronics in Agriculture, 183.
Dubey, S. R. and Jalal, A. S. (2015). Fruit and vegetable recognition by fusing colour and texture features of the image using machine learning. International Journal of Applied Pattern Recognition, 2(2):160–181.
Duong, L. T., Nguyen, P. T., Di Sipio, C., and Di Ruscio, D.
(2020). Automated fruit recognition using Efficient-
Net and MixNet. Computers and Electronics in Agri-
culture, 171.
Government of Canada (2023). Summerland Research
and Development Centre. [Online] Available: https:
//profils-profiles.science.gc.ca/en/research-centre/
summerland-research-and-development-centre.
He, K., Gkioxari, G., Dollár, P., and Girshick, R. (2017). Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision (ICCV).
He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778.