Polygenic score
A polygenic score, also called a polygenic risk score, genetic risk score, or genome-wide score, is a number based on variation in multiple genetic loci and their associated weights (see regression analysis).[1][2] It serves as the best prediction for the trait that can be made when taking into account variation in multiple genetic variants.
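As a minimal illustration of the definition above, the following sketch computes a score as a weighted sum of allele counts. It assumes an additive coding in which each genotype is the number of copies (0, 1, or 2) of the counted allele; the function name `polygenic_score`, the toy genotypes, and the weights are hypothetical and not taken from any cited study.

```python
def polygenic_score(allele_counts, weights):
    """Return the weighted sum of one individual's allele counts."""
    if len(allele_counts) != len(weights):
        raise ValueError("one weight is required per variant")
    return sum(count * weight for count, weight in zip(allele_counts, weights))


# Hypothetical toy data: 3 individuals genotyped at 4 variants.
genotypes = [
    [0, 1, 2, 0],
    [1, 1, 0, 2],
    [2, 0, 1, 1],
]
# Hypothetical per-variant weights (for example, regression coefficients).
weights = [0.5, -0.25, 0.125, 1.0]

scores = [polygenic_score(row, weights) for row in genotypes]
```

In practice the weights would come from a fitted model, and the resulting scores are only meaningful relative to one another within a comparable sample.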
Polygenic scores are widely employed in animal, plant, and behavioral genetics for prediction and for understanding genetic architectures. In a GWAS, a polygenic score that predicts substantially better than the genome-wide statistically significant hits alone indicates that the trait in question is affected by more variants than just the hits, and that larger sample sizes will yield more hits. A conjunction of low variance explained and high heritability, as measured by GCTA, twin studies, or other methods, indicates that a trait may be massively polygenic and affected by thousands of variants. Once a polygenic score has been created that explains at least a few percent of variance, and so effectively identifies most of the genetic variants affecting a trait, it can be used to:
- provide a lower bound for testing whether heritability estimates may be biased;
- measure the genetic overlap of traits (genetic correlation), which might indicate, for example, shared genetic bases for groups of mental disorders;
- measure group differences in a trait such as height;
- examine changes over time in a trait such as intelligence due to natural selection, indicative of a soft selective sweep (where the change in frequency at each individual hit would be too small to detect but the polygenic score declines);
- perform Mendelian randomization (assuming no pleiotropy with relevant traits);
- detect and control for genetic confounds in outcomes (for example, the correlation of schizophrenia with poverty);
- investigate gene–environment interactions.
Polygenic scores are widely used in animal breeding (where the approach is usually termed genomic prediction) because of their practical value in breeding improved livestock and crops.[3] Their use in human studies is increasing.[4][5]
Estimating weights
Weights are usually estimated using some form of regression analysis. Because the number of genomic variants (usually SNPs) typically exceeds the sample size, OLS multiple regression cannot be used (the p > n problem[6][7]). Instead, researchers use other methods, including regressing the trait on one variant at a time (the usual approach in studies with human data) and penalized regression methods such as the LASSO and ridge regression.[1] Penalized regression can be interpreted as placing priors on how many genetic variants are expected to affect a trait and on the distribution of their effect sizes; Bayesian counterparts exist for the LASSO and ridge regression, and other priors have been suggested and used, which can perform better in some circumstances.[8] A multi-dataset, multi-method study[7] that compared 15 methods across four datasets found minimum redundancy maximum relevance to be the best-performing method, and variable selection methods tended to outperform the others. Variable selection methods do not use all the genomic variants available in a dataset but attempt to select an optimal subset. This leads to less overfitting but more bias (see bias-variance tradeoff).
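The one-variant-at-a-time approach mentioned above can be sketched as follows. This is only an illustration under simplifying assumptions: genotypes are coded as allele counts, no covariates are adjusted for, and correlation between variants (linkage disequilibrium) is ignored. The function name `marginal_weights` and the toy data are hypothetical. Each weight is the slope of a simple least-squares regression of the phenotype on a single variant, so the procedure still works when there are more variants than individuals.

```python
from statistics import mean


def marginal_weights(genotypes, phenotype):
    """Estimate one weight per variant by regressing the phenotype on that variant alone."""
    n_variants = len(genotypes[0])
    y_bar = mean(phenotype)
    weights = []
    for j in range(n_variants):
        x = [row[j] for row in genotypes]
        x_bar = mean(x)
        sxx = sum((xi - x_bar) ** 2 for xi in x)
        sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, phenotype))
        # A variant with no variation in the sample carries no information,
        # so it is given a weight of zero instead of dividing by zero.
        weights.append(sxy / sxx if sxx > 0 else 0.0)
    return weights


# Hypothetical toy data: 3 individuals, 5 variants (more variants than individuals).
genotypes = [
    [0, 1, 2, 0, 1],
    [1, 1, 0, 2, 1],
    [2, 0, 1, 1, 1],
]
phenotype = [1.0, 2.0, 3.0]

weights = marginal_weights(genotypes, phenotype)
```

Penalized methods such as ridge regression instead fit all variants jointly and shrink the coefficients; they are not shown here.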
Predictive validity
A benefit of polygenic scores is that they can be used to predict outcomes in advance. This has large practical benefits for animal breeding because it increases the precision of selection and allows for shorter generations, both of which speed up evolution.[9][3] In humans, polygenic scores can be used to predict future disease susceptibility and for embryo selection.[4][10]
Some accuracy values are given below for comparison purposes. These are given in terms of correlations and have been converted from explained variance if given in that format in the source.
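The conversion mentioned above is, in the simplest case, a square root: if a score explains a proportion R² of the variance in a trait, the corresponding correlation is r = √R². The sketch below assumes the source reports variance explained as a proportion between 0 and 1 and that the correlation is taken as non-negative; the function name is hypothetical.

```python
import math


def variance_explained_to_r(r_squared):
    """Convert a proportion of variance explained (R^2) to a correlation r."""
    if not 0.0 <= r_squared <= 1.0:
        raise ValueError("variance explained must be a proportion between 0 and 1")
    return math.sqrt(r_squared)


# Example: a score explaining 9% of the variance corresponds to r of about 0.30.
r = variance_explained_to_r(0.09)
```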
In humans
- In 2016, r ≈ 0.30 for educational attainment variation at age 16.[5] This polygenic score was based on a GWAS using data from 293,000 persons.[11]
- In 2016, r ≈ 0.31 for case/control status for first-episode psychosis.[12]
In non-human animals
- In 2016, r ≈ 0.30 for variation in milk fat percentage.[13]
- In 2014, r ≈ 0.18 to 0.46 for various measures of meat yield, carcass value, etc.[14]
In plants
- In 2015, r ≈ 0.55 for total root length in maize (Zea mays L.).[15]
- In 2014, r ≈ 0.03 to 0.99 across four traits in barley.[16]
References
- [1] de Vlaming, Ronald; Groenen, Patrick J. F. (2015). "The Current and Future Use of Ridge Regression for Prediction in Quantitative Genetics". BioMed Research International. 2015: 1–18. doi:10.1155/2015/143712.
- [2] Dudbridge, Frank (2013-03-21). "Power and Predictive Accuracy of Polygenic Risk Scores". PLOS Genet. 9 (3): e1003348. doi:10.1371/journal.pgen.1003348. ISSN 1553-7404. PMC 3605113. PMID 23555274.
- [3] Spindel, Jennifer E.; McCouch, Susan R. (2016-09-01). "When more is better: how data sharing would accelerate genomic selection of crop plants". New Phytologist. doi:10.1111/nph.14174. ISSN 1469-8137.
- [4] Spiliopoulou, Athina; Nagy, Reka; Bermingham, Mairead L.; Huffman, Jennifer E.; Hayward, Caroline; Vitart, Veronique; Rudan, Igor; Campbell, Harry; Wright, Alan F. (2015-07-15). "Genomic prediction of complex human traits: relatedness, trait architecture and predictive meta-models". Human Molecular Genetics. 24 (14): 4167–4182. doi:10.1093/hmg/ddv145. ISSN 0964-6906. PMC 4476450. PMID 25918167.
- [5] Selzam, S.; Krapohl, E.; von Stumm, S.; O'Reilly, P. F.; Rimfeld, K.; Kovas, Y.; Dale, P. S.; Lee, J. J.; Plomin, R. (2016-07-19). "Predicting educational achievement from DNA". Molecular Psychiatry. doi:10.1038/mp.2016.107. ISSN 1476-5578.
- [6] James, Gareth (2013). An Introduction to Statistical Learning: with Applications in R. Springer. ISBN 978-1461471370.
- [7] Haws, David C.; Rish, Irina; Teyssedre, Simon; He, Dan; Lozano, Aurelie C.; Kambadur, Prabhanjan; Karaman, Zivan; Parida, Laxmi (2015-10-06). "Variable-Selection Emerges on Top in Empirical Comparison of Whole-Genome Complex-Trait Prediction Methods". PLOS ONE. 10 (10): e0138903. doi:10.1371/journal.pone.0138903. ISSN 1932-6203. PMC 4595020. PMID 26439851.
- [8] Gianola & Rosa 2015, "One Hundred Years of Statistical Developments in Animal Breeding"
- [9] Heslot, Nicolas; Jannink, Jean-Luc; Sorrells, Mark E. (2015-01-02). "Perspectives for Genomic Selection Applications and Research in Plants". Crop Science. 55 (1). doi:10.2135/cropsci2014.03.0249. ISSN 0011-183X.
- [10] Shulman, Carl; Bostrom, Nick (2014-02-01). "Embryo Selection for Cognitive Enhancement: Curiosity or Game-changer?". Global Policy. 5 (1): 85–92. doi:10.1111/1758-5899.12123. ISSN 1758-5899.
- [11] Okbay, Aysu; Beauchamp, Jonathan P.; Fontana, Mark Alan; Lee, James J.; Pers, Tune H.; Rietveld, Cornelius A.; Turley, Patrick; Chen, Guo-Bo; Emilsson, Valur. "Genome-wide association study identifies 74 loci associated with educational attainment". Nature. 533 (7604): 539–542. doi:10.1038/nature17671. PMC 4883595. PMID 27225129.
- [12] Vassos, Evangelos; Forti, Marta Di; Coleman, Jonathan; Iyegbe, Conrad; Prata, Diana; Euesden, Jack; O’Reilly, Paul; Curtis, Charles; Kolliakou, Anna. "An Examination of Polygenic Score Risk Prediction in Individuals With First-Episode Psychosis". Biological Psychiatry. doi:10.1016/j.biopsych.2016.06.028.
- [13] Hayr, M. K.; Druet, T.; Garrick, D. J. (2016-04-01). "027 Performance of genomic prediction using haplotypes in New Zealand dairy cattle". Journal of Animal Science. 94 (supplement2). doi:10.2527/msasas2016-027. ISSN 1525-3163.
- [14] Chen, L.; Vinsky, M.; Li, C. (2015-02-01). "Accuracy of predicting genomic breeding values for carcass merit traits in Angus and Charolais beef cattle". Animal Genetics. 46 (1): 55–59. doi:10.1111/age.12238. ISSN 1365-2052.
- [15] Pace, Jordon; Yu, Xiaoqing; Lübberstedt, Thomas (2015-09-01). "Genomic prediction of seedling root length in maize (Zea mays L.)". The Plant Journal. 83 (5): 903–912. doi:10.1111/tpj.12937. ISSN 1365-313X.
- [16] Sallam, A. H.; Endelman, J. B.; Jannink, J.-L.; Smith, K. P. (2015-03-01). "Assessing Genomic Selection Prediction Accuracy in a Dynamic Barley Breeding Population". The Plant Genome. 8 (1). doi:10.3835/plantgenome2014.05.0020. ISSN 1940-3372.
Further reading
- Agerbo et al 2015, "Polygenic Risk Score, Parental Socioeconomic Status, Family History of Psychiatric Disorders, and the Risk for Schizophrenia: A Danish Population-Based Study and Meta-analysis"
- Benyamin et al 2014, "Childhood intelligence is heritable, highly polygenic and associated with FNBP1L"
- Breen et al 2016, "Translating genome-wide association findings into new therapeutics for psychiatry"
- Bulik-Sullivan et al 2015, "LD Score regression distinguishes confounding from polygenicity in genome-wide association studies"
- Carey et al 2016, "Associations between Polygenic Risk for Psychiatric Disorders and Substance Involvement"
- Carneiro et al 2014, "Rabbit genome analysis reveals a polygenic basis for phenotypic change during domestication"
- Conley et al 2016a, "Assortative mating and differential fertility by phenotype and genotype across the 20th century" (appendix)
- Conley et al 2016b, "Changing Polygenic Penetrance on Phenotypes in the 20th Century Among Adults in the US Population"
- Davies et al 2011, "Genome-wide association studies establish that human intelligence is highly heritable and polygenic"
- Domingue et al 2015, "Polygenic Influence on Educational Attainment: New Evidence From the National Longitudinal Study of Adolescent to Adult Health"
- Dudbridge 2013, "Power and Predictive Accuracy of Polygenic Risk Scores"
- Germine et al 2016, "Association between polygenic risk for schizophrenia, neurocognition and social cognition across development"
- Kirkpatrick et al 2014, "Results of a 'GWAS Plus': General Cognitive Ability Is Substantially Heritable and Massively Polygenic"
- Krapohl et al 2015, "Phenome-wide analysis of genome-wide polygenic scores"
- Martin et al 2016, "Population genetic history and polygenic risk biases in 1000 Genomes populations"
- Papageorge & Thom 2016, "Genes, Education, and Labor Market Outcomes: Evidence from the Health and Retirement Study"
- Pasaniuc & Price 2016, "Dissecting the genetics of complex traits using summary association statistics"
- Plomin et al 2009, "Common disorders are quantitative traits"
- Power et al 2015, "Polygenic risk scores for schizophrenia and bipolar disorder predict creativity"
- Robinson et al 2015, "Population genetic differentiation of height and body mass index across Europe"
- Srinivasan et al 2015, "Genetic Markers of Human Evolution Are Enriched in Schizophrenia"
- Stergiakouli et al 2016, "Association between polygenic risk scores for attention-deficit hyperactivity disorder and educational and cognitive outcomes in the general population"
- Visscher & Wray 2015, "Concepts and Misconceptions about the Polygenic Additive Model Applied to Disease"
- Woodley et al 2016, "How cognitive genetic factors influence fertility outcomes: A mediational SEM analysis."
- Wray et al 2014, "Research review: Polygenic methods and their application to psychiatric traits"
- Zheng et al 2016, "LD Hub: a centralized database and web interface to perform LD score regression that maximizes the potential of summary level GWAS data for SNP heritability and genetic correlation analysis"
- Sariaslan et al 2016, "Schizophrenia and subsequent neighborhood deprivation: revisiting the social drift hypothesis using population, twin and molecular genetic data"