Authors:
Dupuy Charles 1,2; Pascal Pultrini 2 and Andrea Tettamanzi 1
Affiliations:
1 Université Côte d’Azur, I3S, Inria, Sophia Antipolis, France
2 Doriane Research Software & Consulting, Av. Jean Medecin, Nice, France
Keyword(s):
Outlier Detection, Multi-Environment Field Trials, Genomic Prediction, Machine Learning Clustering Methods.
Abstract:
In plant breeding, Multi-Environment Field Trials (MET) are commonly used to evaluate genotypes for multiple traits and to estimate their genetic breeding value using Genomic Prediction (GP). Outliers are common in MET and are known to reduce the accuracy of GP, so identifying them before GP analysis can lead to better results. However, Outlier Detection (OD) in MET is often overlooked. Indeed, MET give rise to different levels of residuals, which favor swamping and masking effects: regular sample points may be flagged as outliers while the true outliers go undetected. Consequently, without a sensitive and robust outlier detection algorithm, OD can be a waste of time and can even degrade the prediction accuracy of GP, especially when the data set is not large. In this study, we compare various robust outlier detection methods from different approaches to determine which one is most suitable for identifying MET anomalies. Each method is tested on eleven real-world MET data sets. Results are validated by injecting a proportion of artificial outliers into each set. The Subspace Outlier Detection Method stands out as the most promising among the tested methods.
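The validation idea mentioned in the abstract (inject a known proportion of artificial outliers, then check whether a detector recovers them) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data are synthetic values rather than MET data, the function names (`inject_outliers`, `mad_flags`, `precision_recall`) are hypothetical, and the detector is a simple median/MAD rule used as a stand-in, not the Subspace Outlier Detection Method or any other method evaluated in the study.

```python
import random
import statistics


def inject_outliers(values, proportion, shift, seed=0):
    """Return a corrupted copy of `values` and the set of injected indices.

    A fraction `proportion` of points is moved by `shift` sample standard
    deviations, in a randomly chosen direction (hypothetical scheme).
    """
    rng = random.Random(seed)
    n_out = max(1, round(proportion * len(values)))
    injected = set(rng.sample(range(len(values)), n_out))
    spread = statistics.stdev(values)
    corrupted = [
        v + shift * spread * rng.choice((-1, 1)) if i in injected else v
        for i, v in enumerate(values)
    ]
    return corrupted, injected


def mad_flags(values, threshold=3.5):
    """Flag indices whose modified z-score (median and MAD) exceeds `threshold`."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        return set()
    return {
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > threshold
    }


def precision_recall(flagged, injected):
    """Compare flagged indices with the known injected ones."""
    true_pos = len(flagged & injected)
    precision = true_pos / len(flagged) if flagged else 0.0
    recall = true_pos / len(injected) if injected else 0.0
    return precision, recall


# Synthetic stand-in for one trait measured on 200 plots (assumed values).
rng = random.Random(42)
yields = [rng.gauss(50.0, 5.0) for _ in range(200)]

corrupted, injected = inject_outliers(yields, proportion=0.05, shift=10.0, seed=1)
flagged = mad_flags(corrupted)
precision, recall = precision_recall(flagged, injected)
print(f"injected={len(injected)} flagged={len(flagged)} "
      f"precision={precision:.2f} recall={recall:.2f}")
```

Because the injected indices are known, precision reflects swamping (regular points wrongly flagged) and recall reflects masking (injected outliers missed), which is why an injection scheme of this kind is convenient for comparing detectors.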