Benchmarking algorithms for genomic prediction of complex traits

Azodi, Christina B.; McCarren, Andrew; Roantree, Mark; de los Campos, Gustavo; Shiu, Shin-Han

Azodi, Christina B. ORCID: 0000-0002-6097-606X, McCarren, Andrew ORCID: 0000-0002-7297-0984, Roantree, Mark ORCID: 0000-0002-1329-2570, de los Campos, Gustavo ORCID: 0000-0001-5692-7129 and Shiu, Shin-Han ORCID: 0000-0001-6470-235X (2019) Benchmarking algorithms for genomic prediction of complex traits. G3: Genes, Genomes, Genetics, 9 (11). pp. 3691-3702. ISSN 2160-1836

Abstract

The usefulness of genomic prediction (GP) in crop and livestock breeding programs has led to efforts to develop new and improved GP approaches, including non-linear algorithms such as artificial neural networks (ANNs) (i.e., deep learning) and gradient tree boosting. However, the performance of these algorithms has not been compared systematically across a wide range of GP datasets and models. Using data for 18 traits across six plant species with different marker densities and training population sizes, we compared the performance of six linear and five non-linear algorithms, including ANNs. First, we found that hyperparameter selection was critical for all non-linear algorithms and that feature selection prior to model training was necessary for ANNs when the number of markers greatly exceeded the number of training lines. Across all species and trait combinations, no single algorithm performed best; however, predictions based on a combination of results from multiple GP algorithms (i.e., ensemble predictions) performed consistently well. While linear and non-linear algorithms each performed best for a similar number of traits, the performance of non-linear algorithms varied more between traits than that of linear algorithms. Although ANNs did not perform best for any trait, we identified strategies (i.e., feature selection, seeded starting weights) that boosted their performance to near the level of other algorithms. These results, together with the fact that even small improvements in GP performance could accumulate into large genetic gains over the course of a breeding program, highlight the importance of algorithm selection for the prediction of trait values.
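
The abstract describes ensemble predictions as a combination of results from multiple GP algorithms, without stating how the combination is computed. As a rough illustration only, the Python sketch below assumes the simplest possible rule, an unweighted mean of each line's predicted trait value, and scores it with a Pearson correlation against observed values. The function names, the algorithm labels, the choice of metric, and all numbers are hypothetical and are not taken from the paper.

```python
from statistics import mean


def ensemble_predict(predictions_by_algorithm):
    """Average per-line predictions across algorithms (unweighted mean).

    Assumption: the abstract does not specify the combination rule;
    a plain mean is used here purely for illustration.
    """
    columns = list(predictions_by_algorithm.values())
    if not columns:
        raise ValueError("need predictions from at least one algorithm")
    n_lines = len(columns[0])
    if any(len(col) != n_lines for col in columns):
        raise ValueError("all algorithms must predict the same lines")
    # zip(*columns) groups the predictions made for the same line.
    return [mean(values) for values in zip(*columns)]


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


# Made-up toy values, chosen so the two models' errors cancel out.
observed = [1.0, 2.0, 3.0, 4.0]
predictions = {
    "linear_model": [1.5, 1.5, 3.5, 3.5],     # hypothetical algorithm
    "nonlinear_model": [0.5, 2.5, 2.5, 4.5],  # hypothetical algorithm
}

ensemble = ensemble_predict(predictions)
for name, predicted in predictions.items():
    print(name, round(pearson_r(predicted, observed), 3))
print("ensemble", round(pearson_r(ensemble, observed), 3))
```

In this contrived case the averaged prediction tracks the observed values more closely than either input, which is the intuition behind combining algorithms; real data would not cancel errors this neatly, and a weighted or stacked combination is equally plausible.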

Metadata

Item Type:	Article (Published)
Refereed:	Yes
Subjects:	Biological Sciences > Biology; Humanities > Biological Sciences > Biology; Biological Sciences > Genetics; Humanities > Biological Sciences > Genetics; Computer Science > Artificial intelligence; Computer Science > Machine learning
DCU Faculties and Centres:	DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing; Research Institutes and Centres > INSIGHT Centre for Data Analytics
Publisher:	Genetics Society of America
Official URL:	https://doi.org/10.1534/g3.119.400498
Copyright Information:	© 2019 The Authors. CC-BY-NC-ND 4.0
Funders:	National Science Foundation Graduate Research Fellowship (Fellow ID: 2015196719) (USA), Graduate Research Opportunities Abroad (GROW) Fellowship to C.B.A., U.S. Department of Energy Great Lakes Bioenergy Research Center (BER DE-SC0018409), National Science Foundation (IOS-1546617, DEB-1655386) (USA), National Institutes of Health (R01GM099992, R01FM101219) (USA)
ID Code:	23439
Deposited On:	16 Jun 2020 13:57 by Andrew Mccarren. Last Modified 12 Aug 2022 10:08

DORAS | DCU Research Repository