Technical nuances of machine learning: implementation and validation of supervised methods for genomic prediction in plant breeding
Alencar Xavier
Abstract: The decision-making process in plant breeding is driven by data. The machine learning framework offers powerful tools for extracting useful information from data. However, the algorithms underlying these methods, along with their strengths and pitfalls, remain poorly understood. Machine learning has two main branches: supervised and unsupervised learning. In plant breeding, supervised learning is used for genomic prediction, where phenotypic traits are modeled as a function of molecular markers. The key supervised learning algorithms for genomic prediction are linear methods, kernel methods, neural networks, and tree ensembles. This manuscript provides insight into the implementation of these algorithms and into how cross-validation can be used to compare methods. The genomic prediction examples are drawn from plant breeding.