Research Brief

College Completion and Your Genome (Don’t Get Too Excited)

It’s still early days in genetic research, though advances will aid study of educational attainment and, notably, disease

Scientists have known for some time that genetic differences affect cognitive abilities and personality traits that matter for school, and so play a role in how much formal schooling we complete.

By looking for these variations in the genomes of millions of people, scientists can predict, with a modicum of accuracy, whether the amount of time that an individual spends in school will be higher or lower than the average for the general population.

What does this mean for predicting the schooling potential of, say, your own child, or of any particular individual? Research on twins suggests that both the genes you inherit and your family environment play an important role in how far you are likely to go in school. By examining the influence of particular genetic variants, researchers can predict the level of education that an individual will achieve with about the same accuracy as a prediction based on how much education the individual’s parents received.


However, neither prediction is very accurate. While both matter, neither genes nor family environment determines our eventual level of education.

But even though genes are poor predictors of any individual’s education, they are useful for researchers investigating various scientific questions, so scientists want to improve the predictive power of the polygenic index for educational attainment. The index is a formula that adds up the estimated effects of the many genetic variants in a person’s genome into a single number that predicts years of schooling.
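In its simplest form, a polygenic index is a weighted sum: for each SNP, count how many copies (0, 1 or 2) of the allele associated with more schooling a person carries, multiply by that allele’s estimated effect, and add up the products. The sketch below shows only that arithmetic. The SNP IDs, effect sizes and the function name `polygenic_index` are hypothetical placeholders, not values or code from the study; real indexes use weights estimated in a genome-wide association study across a very large number of SNPs, typically adjusted for correlations between nearby SNPs.

```python
from typing import Mapping

# Hypothetical per-allele effects, in weeks of schooling, for three made-up SNP IDs.
# A real index uses GWAS-estimated weights for a very large number of SNPs.
EFFECT_WEEKS: dict[str, float] = {"snp_a": 1.4, "snp_b": -0.9, "snp_c": 0.6}


def polygenic_index(genotype: Mapping[str, int], weights: Mapping[str, float]) -> float:
    """Weighted sum: effect-allele count (0, 1 or 2) times per-allele effect, over SNPs.

    SNPs missing from the genotype are skipped in this simplified sketch.
    """
    score = 0.0
    for snp, effect in weights.items():
        count = genotype.get(snp)
        if count is None:
            continue  # SNP not measured for this person
        if count not in (0, 1, 2):
            raise ValueError(f"allele count for {snp} must be 0, 1 or 2, got {count}")
        score += count * effect
    return score


person = {"snp_a": 2, "snp_b": 1, "snp_c": 0}  # hypothetical genotype
print(f"index: {polygenic_index(person, EFFECT_WEEKS):.2f} weeks")  # → index: 1.90 weeks
```

Because each effect is tiny, a single person’s index only becomes informative when thousands of such terms are summed.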

Can It Predict Individual Outcomes?

A study in Nature Genetics increases the predictive power of the polygenic index for educational attainment by estimating more precisely the associations between years of schooling and thousands of single nucleotide polymorphisms (SNPs, pronounced “snips”). By comparing patterns in the genetic data of 3 million individuals, predominantly of European genetic ancestry, researchers identified 3,952 SNPs that correlate with years of schooling. A similar consortium study that the journal published in 2018 included genomes from 1.1 million individuals and found 1,271 relevant SNPs.

Research for both papers was coordinated by the Social Science Genetic Association Consortium, an international organization co-founded by UCLA Anderson’s Daniel Benjamin. The consortium has published some of the largest genetic studies in the world by combining genetic data on individuals from disparate research projects into one gigantic sample.

While most genome-wide association studies are confined to medical research institutes, consortium papers have dozens of authors or more from a range of natural and social sciences. Benjamin supervised the latest study along with UCLA Anderson’s Alexander Young, George Mason’s Jonathan P. Beauchamp, University of Queensland’s Peter M. Visscher and Loic Yengo, University of Southern California’s Patrick Turley, New York University’s David Cesarini, Oxford’s Augustine Kong and Vrije Universiteit Amsterdam’s Aysu Okbay. An additional 29 scientists, representing 19 institutions altogether, are co-authors on the paper, along with the consortium itself and a 23andMe research team.

The study has implications for researchers, among others, and for people with various common health conditions, which turn out to share genetic factors with educational attainment. But because genetic research is prone to misinterpretation and misuse, the consortium has published 40 pages of FAQs explaining how the latest findings should and should not be viewed. Key takeaways: The polygenic index for educational attainment is a useful but weak predictor of an individual’s years of schooling. While it is not very accurate for predicting individual outcomes, it is useful for researchers testing various scientific hypotheses.

The improved index is useful to researchers in areas that may not seem obvious given its focus on years of schooling. The latest findings raise the accuracy of certain disease-specific polygenic indexes, such as those used for asthma or Alzheimer’s, by roughly 50%. The results also improve the data that scientists already draw on from similar studies to explore topics such as brain function and attention deficit disorders. The findings suggest that larger samples in future research will yield better predictors of years of schooling, with benefits for many areas of health and social science research.

Bigger Studies, Better Predictions

The influence of a single SNP variant on educational attainment is tiny. (Even among the variants with the largest effects, the median effect in the study was about 1.4 weeks of schooling.) The same is true for many other social outcomes, such as well-being and risk aversion, as well as for disease risks that are heavily influenced by environment. For most complex human phenotypes, including most diseases, many thousands of SNP variants must be considered together before their combined effects on real-life outcomes become meaningful.

Scientists can’t find such weak influences by comparing one genome or a handful; it takes at least hundreds of thousands to estimate them reliably. As the sample size grows, the relevant SNPs become more and more apparent, and the indexes built from them become more and more predictive.
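Why sample size matters so much can be sketched with a standard power calculation for a single-SNP association test. The sketch assumes, hypothetically, that one SNP explains 0.002% of the variation in years of schooling (a made-up value for illustration), uses the conventional genome-wide significance threshold of p < 5 × 10⁻⁸, and relies on a normal approximation to the test statistic. It is not the consortium’s own calculation, and `gwas_power` is an illustrative name.

```python
from math import sqrt
from statistics import NormalDist

GENOME_WIDE_ALPHA = 5e-8  # conventional genome-wide significance threshold (two-sided)


def gwas_power(n: int, variance_explained: float, alpha: float = GENOME_WIDE_ALPHA) -> float:
    """Approximate power to detect one SNP in a sample of n people.

    Normal approximation: the association z-statistic has mean of roughly
    sqrt(n * q2 / (1 - q2)), where q2 is the share of outcome variance the SNP explains.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    mean_z = sqrt(n * variance_explained / (1 - variance_explained))
    # Probability that the statistic lands beyond either critical value.
    return std_normal.cdf(mean_z - z_crit) + std_normal.cdf(-mean_z - z_crit)


# Hypothetical SNP explaining 0.002% of the variance in years of schooling.
Q2 = 0.00002
for n in (100_000, 1_100_000, 3_000_000):
    print(f"n = {n:>9,}: power = {gwas_power(n, Q2):.3f}")
```

Under these assumptions, the chance of detecting such a SNP is essentially zero with 100,000 people, roughly one in five with 1.1 million, and close to certain with 3 million, which is why each jump in sample size uncovers many more relevant SNPs.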

Benjamin and his co-authors tripled the size of their earlier sample by adding new data to the genomes from the 69 studies used in the 2018 paper, including more than 2 million additional genomes from 23andMe customers who agreed to have their data used for research. In all, they looked at some 10 million SNPs in the roughly 3 million research participants.

The larger sample improved the predictive power of the index. For example, the latest study found that only 7% of the individuals whose polygenic index for educational attainment fell in the bottom tenth of the sample graduated from college. In the top tenth, 71% graduated. While that is an impressive degree of predictive power when comparing the bottom-tenth group with the top-tenth group, the index cannot make practical predictions about schooling for individuals.
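The decile comparison itself is simple to compute: sort people by their index, split them into ten equal groups and take the graduation rate in each. The sketch below does this on simulated data, not the study’s. It assumes, hypothetically, a normally distributed index correlated 0.37 with an underlying propensity for schooling, and a graduation rule under which about 35% of people graduate; the function name `rate_by_decile` is illustrative.

```python
import random
from statistics import NormalDist


def rate_by_decile(scores: list[float], outcomes: list[bool]) -> list[float]:
    """Share of positive outcomes within each tenth of the score distribution.

    Decile 0 holds the lowest-scoring tenth of people, decile 9 the highest.
    """
    if len(scores) != len(outcomes) or len(scores) < 10:
        raise ValueError("need equal-length inputs with at least 10 people")
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    rates = []
    for d in range(10):
        members = order[d * len(order) // 10:(d + 1) * len(order) // 10]
        rates.append(sum(outcomes[i] for i in members) / len(members))
    return rates


# Simulated data, NOT the study's: the index is correlated r = 0.37 with a latent
# propensity, and people "graduate" when the propensity clears a threshold chosen
# so that about 35% graduate. Both numbers are assumptions for illustration.
rng = random.Random(0)
r = 0.37
threshold = NormalDist().inv_cdf(1 - 0.35)
scores, graduated = [], []
for _ in range(100_000):
    pgi = rng.gauss(0, 1)
    propensity = r * pgi + (1 - r**2) ** 0.5 * rng.gauss(0, 1)
    scores.append(pgi)
    graduated.append(propensity > threshold)

rates = rate_by_decile(scores, graduated)
print(f"bottom tenth: {rates[0]:.0%}, top tenth: {rates[-1]:.0%}")
```

Under these assumptions the top tenth graduates at several times the rate of the bottom tenth, even though the index is only moderately correlated with the underlying propensity. Large gaps between extreme groups can coexist with weak prediction for any one person in the middle of the distribution.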

When the index is applied to individuals, the researchers explain in the FAQs, using it to predict whether a given person will complete more or less schooling than average would be right about 62% of the time. That is not much higher than the 50% chance of being right that you would have if you knew nothing at all about the individual.
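One way to see where a figure like 62% can come from is a classic result for two normally distributed variables with correlation r: the chance that both fall on the same side of their averages is 1/2 + arcsin(r)/π. The sketch below assumes, as a simplification, that the index and years of schooling are jointly normal and that the index explains about 14% of the variation (within the 13% to 17% range the authors report), so r is about √0.14 ≈ 0.37. It then checks the formula with a quick simulation. This is an illustration, not necessarily how the FAQs derive their number.

```python
import math
import random


def same_side_probability(r: float) -> float:
    """P(index and outcome are both above, or both below, their means).

    Exact for a bivariate normal pair with correlation r (Sheppard's formula).
    """
    return 0.5 + math.asin(r) / math.pi


def simulate_same_side(r: float, n: int = 200_000, seed: int = 1) -> float:
    """Monte Carlo check of the formula under the same normality assumption."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        index = rng.gauss(0, 1)
        outcome = r * index + math.sqrt(1 - r * r) * rng.gauss(0, 1)
        hits += (index > 0) == (outcome > 0)
    return hits / n


r = math.sqrt(0.14)  # assumed share of variance explained by the index
print(f"formula: {same_side_probability(r):.3f}")  # → formula: 0.622
print(f"simulation: {simulate_same_side(r):.3f}")
```

With r = 0, an index carrying no information, the formula gives exactly 50%, the coin-flip baseline. Even a much stronger index with r = 0.5 would be right only about two-thirds of the time.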

The researchers want the public to be aware of these limitations as more businesses offer personalized polygenic indexes for educational attainment. Some IVF clinics, for example, offer the service to clients who want to screen their embryos for the best possible shot at a long education. For a variety of reasons Benjamin discusses in other research, this process is much less likely to produce a straight-A graduate student than prospective parents might expect, and it may not be a service they should use.

Even as the indexes improve with future research, “there will always be many people whose polygenic indexes ‘predict’ lower educational attainment who in fact attain relatively high amounts of education and vice-versa,” according to the FAQs.

Education for Disease Researchers

Despite all the disclaimers, the latest improvements have made the polygenic index about as accurate for forecasting educational attainment as some better-known demographic predictors. The current polygenic index for educational attainment predicts 13% to 17% of the variation in educational attainment between people, which, as the authors point out, means it fails to predict about 83% to 87%. By comparison, a mother’s education level predicts roughly 15% of the variation across individuals in how long they stay in school, according to an earlier study.

This level of accuracy makes the polygenic index, like parental education, useful as a control variable in research about environmental influences, the authors note. A study looking at how preschool affects later educational success, for example, could control for the genetic contribution to schooling, as captured by the index, alongside other controls such as income. Controlling for as many sources of variation in schooling as possible, including genetic sources, improves the accuracy of results and makes it possible to do research with smaller sample sizes.
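The sample-size point can be made concrete with a textbook approximation. Suppose a randomized study adds a control variable that explains a share R² of the variation in the outcome and is unrelated to who receives the treatment (say, a hypothetical preschool program). Adding that control shrinks the standard error of the estimated treatment effect by a factor of about √(1 − R²), so the sample needed for the same precision shrinks by a factor of about 1 − R². The numbers and the name `precision_gain` below are illustrative assumptions, not results from the study.

```python
import math


def precision_gain(r2_control: float, n_without: int) -> tuple[float, int]:
    """Return (standard-error ratio, sample size needed) after adding a control.

    Assumes a linear model with treatment assigned independently of the control,
    where the control explains a share r2_control of the outcome's variance.
    """
    if not 0 <= r2_control < 1:
        raise ValueError("r2_control must be in [0, 1)")
    se_ratio = math.sqrt(1 - r2_control)
    n_needed = math.ceil(n_without * (1 - r2_control))
    return se_ratio, n_needed


# Hypothetical: an index explaining 15% of the variance, in a study that would
# otherwise need 10,000 children for the desired precision.
se_ratio, n_needed = precision_gain(0.15, 10_000)
print(f"standard error x {se_ratio:.3f}; sample needed: {n_needed:,}")
```

Under these assumptions, the study would need roughly 8,500 children instead of 10,000. In practice the relevant R² is the extra share explained by the index beyond the other controls, which may be smaller.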

Benjamin and his co-authors also found that predictions of certain common diseases from genetic data are substantially more accurate when scientists consider the polygenic index for educational attainment alongside the disease-specific index. These diseases include Alzheimer’s disease, bipolar disorder, ADHD, schizophrenia and coronary artery disease; the same holds for predictions of longevity.

The researchers point out, however, that the predictive power of polygenic indexes for these complex diseases is still very weak. Combined with the education index, a disease index predicts on average about 1.8% of the variation across individuals, compared with 1.2% for the disease index alone.
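How adding a second index can move the share explained from 1.2% to 1.8% can be sketched with the standard formula for the variance explained by a linear regression on two standardized predictors. The predictors have correlations r₁ and r₂ with the outcome and r₁₂ with each other. The correlations below are hypothetical, picked only so that the disease index alone explains 1.2% and the two indexes are uncorrelated, in which case the shares simply add. The last line shows that going from 1.2% to 1.8% is a relative improvement of roughly 50%.

```python
import math


def r2_two_predictors(r1: float, r2: float, r12: float) -> float:
    """Share of outcome variance explained by a linear regression on two predictors.

    r1, r2: correlation of each predictor with the outcome; r12: correlation
    between the two predictors (must not be +1 or -1).
    """
    return (r1**2 + r2**2 - 2 * r1 * r2 * r12) / (1 - r12**2)


# Hypothetical correlations, chosen for illustration only.
r_disease = math.sqrt(0.012)    # disease index alone explains 1.2%
r_education = math.sqrt(0.006)  # assumed: education index alone explains 0.6%
r_between = 0.0                 # assumed: the two indexes are uncorrelated

combined = r2_two_predictors(r_disease, r_education, r_between)
print(f"combined: {combined:.1%}")  # → combined: 1.8%
print(f"relative gain: {combined / r_disease**2 - 1:.0%}")  # → relative gain: 50%
```

If the two indexes were positively correlated, the gain from adding the education index would be smaller, because part of its information would already be in the disease index.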

Overlaps between the SNPs associated with educational attainment and those associated with a specific disease can help in disease research, Benjamin explains. An Alzheimer’s researcher looking for promising drug targets, for example, could narrow the search by stripping out the education-related SNPs. The SNPs not associated with education are more likely to be directly related to the brain biology the drug needs to change.

Why Does This Work at All?

Scientists don’t know exactly how the SNPs correlated with educational attainment actually affect that outcome. Some SNPs may affect cognitive skills. Others may affect years of schooling by influencing traits that make formal education more or less successful, such as sleep quality or the ability to concentrate through boring lectures. And researchers have only a partial understanding of why the predictive power of the index falls off a cliff when it is applied to populations of non-European genetic ancestry.

The vast majority of genetic studies have included only participants in the U.S., Europe and Australia whose recent genetic ancestors lived in Europe. Restricting studies to participants of relatively homogeneous genetic ancestry has helped reduce the confounding factors that concern researchers. But as a result, the indexes don’t work nearly as well for predicting outcomes for people whose recent genetic ancestors are more distantly related, including people whose recent genetic ancestors lived in Asia or Africa.

Our tendency to choose partners with a similar level of education boosts the predictive power of the index. The researchers find that many people also partner with people who are similar in other ways that correlate with educational attainment, such as cognitive aptitude or region of origin. These multiple similarities appear to amplify the predictive power of the index.

In science time, these types of studies are still very new. They are leading to new insights about the genetics of diseases, as well as behavioral traits like educational attainment. But perhaps even more importantly, they are teaching us about the many and complex ways that genetic influences are shaped by environmental and social factors.


About the Research

Okbay, A., Social Science Genetic Association Consortium, et al. (2022). Polygenic prediction of educational attainment within and between families from genome-wide association analyses in 3 million individuals. Nature Genetics, 54(4), 437–449.

