Feature

New Study Disavows Marshmallow Test’s Predictive Powers

Test's originator was a central co-author but died before its completion

If your 4-year-old cannot resist eating the marshmallow in front of her, even though you promised more treats if she waits, is she headed for a lifetime of struggle? You’re not alone if you think so.

For some 30 years, parents and scientists have turned to the marshmallow test to glean clues about kids’ futures. The experiment gained popularity after its creator, psychologist Walter Mischel, started publishing follow-up studies of the Stanford Bing Nursery School preschoolers he tested between 1967 and 1973. You can have this treat now, he famously told each 4- and 5-year-old, or have two when I get back to the room. According to well-publicized studies in the decades since, the kids who couldn’t hold out long generally proved, in their teens, 20s and 30s, quicker to frustrate, weaker in academic and social skills, and more likely to have drug use, mental health and weight issues.

Photo courtesy of David Dini/Columbia University

But the latest Bing follow-up study, by a team of researchers that included Mischel, casts doubt on whether a preschooler’s response to a marshmallow test can predict anything at all about her future.

Following the Bing children into their 40s, the new study finds that kids who quickly gave in to the marshmallow temptation are generally no more or less financially secure, educated or physically healthy than their more patient peers. The amount of time the child waited to eat the treat failed to forecast roughly a dozen adult outcomes the researchers tested, including net worth, social standing, high interest-rate debt, diet and exercise habits, smoking, procrastination tendencies and preventative dental care, according to the study published in the Journal of Economic Behavior and Organization.

“With the marshmallow waiting times, we found no statistically meaningful relationships with any of the outcomes that we studied,” UCLA Anderson’s Daniel Benjamin, whose expertise includes behavioral economics and statistical methodology, says in an interview.

Co-authors of the study include Mischel and his former graduate students Yuichi Shoda of the University of Washington and Philip K. Peake of Smith College, who collaborated with him for decades on follow-up projects. David Laibson of Harvard University, Alexandra Steiny Wellsjo of the University of California, Berkeley, and Nicole L. Wilson of the University of Oregon are also co-authors on the JEBO study.

What Changed?

The results differ from some earlier findings, such as those in highly publicized studies starting in the 1990s that linked short marshmallow wait times to obesity, Benjamin says. In the new study, which follows a somewhat smaller and different mix of preschoolers than earlier studies, “we’re not seeing a relationship with BMI (body mass index) that other studies did find a relationship with,” he says.

The latest follow-up is the first to include an extensive analysis of each Bing graduate’s capital formation, which considers wealth, debt and credit habits along with a few nonfinancial indicators. Capital formation reflects one’s ability to invest in something, like education and exercise, to generate later benefits, such as income or health.

The researchers found they could glean clues about a Bing preschooler’s future from assessments of self-control made by the subject’s parents and, as the subject aged, by the subject. A self-regulatory competence index, made up of 86 responses to survey questions that parents answered about their children around age 17 and that the graduates answered about themselves around ages 27 and 37, moderately predicts many of the individual outcomes, the study finds.

Adding the marshmallow test results to the index does virtually nothing to the prognosis, the study finds. A 5-year-old’s performance on the marshmallow test, the researchers suggest, is about as predictive of his adult behavior as any single component in that index; i.e., not very.

The new study may deal a final blow to the notion, drawn from marshmallow test research, that a preschooler’s wait time foretells her destiny. That notion, like many findings from earlier psychology studies, has been questioned in recent years.

Mischel spent about a decade working on this latest Bing follow-up. He helped formulate and, in 2014, pre-register the study’s multiple hypotheses, including those that appear to contradict conclusions from his previous studies. He died in 2018, shortly after a cancer diagnosis and before the study was completed.

Fated at Four?

Marshmallow tests are still popular with parents and educators partly because of the adorable things children do when left alone with temptation. They sniff it, taste it, put their faces really, really close to it. They twist away or squeeze their eyes shut. They dance, sing, rock in their chairs. And of course, sometimes they just eat it. Their reactions, demonstrated in amateur marshmallow experiments on YouTube, look a lot like those Mischel’s team documented at Bing some 50 years ago.

Initially, Mischel was more interested in the coping strategies kids employed to not eat the marshmallow than in potential predictive powers arising from measuring how long they waited. Self-control to delay gratification — the ability to resist temptation now to get something you want later — is required to achieve goals big and small. Go to bed early to feel rested in the morning; study today for more college choices next year; work now for next week’s paycheck; give up dessert this month so the clothes fit better the next. Mischel wanted to identify effective tactics, because unlike famous psychologists before him, he believed delayed gratification skills could be taught.

Yet Mischel’s follow-up research made the marshmallow test famous for an entirely different reason. In study after study, he and his research partners documented multiple ways his least patient 5-year-old treat eaters lagged behind the waiters as they grew into teenagers, 27-year-olds and 37-year-olds.

Parents, policymakers and educators embraced the studies’ unwritten take-home message: To raise successful, responsible kids, we must teach them to resist that first marshmallow.

Schools incorporated marshmallow tests and self-control techniques into curriculums.

Parents devised their own marshmallow tests. Success gurus gave TED Talks about it. Sesame Street’s notoriously out-of-control Cookie Monster starred in a series of videos demonstrating delay-of-gratification skills he learned from Mischel.

In 2015, the Bing project scientists won the Golden Goose Award, bestowed by a group of congressmen for government-funded research that leads to significant public benefits.

Replication Doubts

The follow-up studies had issues, many of which Mischel’s team pointed out early on. They stressed that the Bing cohort was a small sample (about 185 of about 550 original test takers for the first follow-up) and an exceptionally homogeneous one. The children were mainly the offspring of Stanford faculty and staff; only a few didn’t finish college, and those “probably went on to start Microsoft or something,” Mischel told interviewers.

Dividing the participants into experimental treatment groups shrank the sample further and clouded later results. Researchers, for example, sometimes offered the children advice to help them wait: Pretend the marshmallow is a cloud, or close your eyes.

Because of confidentiality agreements, only Mischel, the original research team and their colleagues had full access to the Bing subjects and their data. While other projects could (and did) mimic the experiments and report similar findings, they could not attempt direct replications of the original studies. Any other subject pool would be younger, likely having taken different versions of the marshmallow test decades after Mischel’s kids.

In 2018, a major marshmallow test study gained fame for failing to find strong correlations between wait times and adolescent outcomes. Published in Psychological Science and led by Tyler W. Watts (now at Columbia University), the study followed a much more diverse group of 900 preschoolers into their teens. After controlling for differences such as household income and cognitive abilities, the researchers found only weak relationships with academic outcomes and no significant correlations with later behaviors, such as anti-social tendencies.

The Watts study findings support a common criticism of the marshmallow test: that waiting out temptation for a later reward is largely a middle- or upper-class behavior. If you come from a place of shortages and broken promises, eating the treat in front of you now might be a better bet than trusting there will be more later.

But the Watts research differed from the Bing project in significant ways, including the shorter time the preschoolers had to wait for a reward. Because it controlled for differences in the kids’ backgrounds, differences that Mischel’s team expected would influence wait times, the study can’t be considered a failed replication of the Bing project, some scientists argued.

In other words, the Watts study helped explain why some kids wait longer. It didn’t exactly debunk the idea that kids who eat that first marshmallow are more likely to experience adult difficulties related to delaying gratification.

Modern Protocols, Different Results

The JEBO team didn’t set out to question the earlier follow-ups. “When we started the project around 2006-07, I think we all took at face value the earlier findings between the relationship between the waiting times and other outcomes,” Benjamin says. “We wanted to examine the relationship between the waiting times and economic outcomes.”

By 2011, however, standards in psychology research were beginning to shift toward more sophisticated statistical analyses and stricter research protocols. “We all wanted to adopt state-of-the-art statistical and methodological approaches,” Benjamin says.

Before analyses began, the team pre-registered the study with the Open Science Framework, a step intended to prevent researcher bias from affecting the focus and findings.

“That way, all of the decisions about how to analyze the data, you have to make in advance,” Benjamin explains. “It allows everyone else in the world to see that you’re doing what you said you were going to do.”

They used advanced statistical models to address small sample sizes and other issues that may have exaggerated earlier findings. They created a new measure of the time each original preschooler waited before taking a bite (or getting the reward) to adjust for variables such as age, gender and experiment conditions. For certain analyses, they separated results from preschoolers who spontaneously devised their own strategies for resisting treats from those who were helped with ideas from the researchers.
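The adjusted wait-time measure described above is, in spirit, a way of comparing each child with peers who were tested under the same conditions. The Python sketch below shows one simple version of that idea, standardizing wait times within groups; it is a hypothetical illustration, not the JEBO team’s actual model. The record fields (`condition`, `gender`, `wait_seconds`), the condition labels and all of the numbers are invented, and age is left out for brevity.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical toy records; field names, labels and values are invented for
# illustration and are not drawn from the Bing data.
records = [
    {"condition": "no_advice", "gender": "F", "wait_seconds": 60.0},
    {"condition": "no_advice", "gender": "F", "wait_seconds": 180.0},
    {"condition": "advice", "gender": "F", "wait_seconds": 300.0},
    {"condition": "advice", "gender": "F", "wait_seconds": 500.0},
    {"condition": "no_advice", "gender": "M", "wait_seconds": 30.0},
    {"condition": "no_advice", "gender": "M", "wait_seconds": 90.0},
]


def adjusted_wait_scores(records):
    """Return each child's wait time as a z-score within its (condition, gender) cell.

    A positive score means the child waited longer than peers tested under the
    same condition and of the same gender. Age could be handled the same way by
    adding an age band to the grouping key.
    """
    # Collect raw wait times for each (condition, gender) cell.
    cells = defaultdict(list)
    for r in records:
        cells[(r["condition"], r["gender"])].append(r["wait_seconds"])

    # Mean and population standard deviation per cell; a cell with no spread
    # (e.g. a single child) gets an SD of 1 so its scores are simply 0.
    cell_stats = {}
    for key, waits in cells.items():
        sd = pstdev(waits)
        cell_stats[key] = (mean(waits), sd if sd > 0 else 1.0)

    # Express each wait time relative to its own cell.
    scores = []
    for r in records:
        m, sd = cell_stats[(r["condition"], r["gender"])]
        scores.append((r["wait_seconds"] - m) / sd)
    return scores


scores = adjusted_wait_scores(records)
for rec, score in zip(records, scores):
    print(rec["condition"], rec["gender"], rec["wait_seconds"], f"{score:+.1f}")
```

In this toy data, the child who waited 300 seconds after receiving advice scores -1.0, lower than the child who waited 180 seconds without advice (+1.0), because each is judged only against children tested the same way.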

Smoking, Alcohol, Weight, Procrastination

For each of their 113 middle-aged subjects, the researchers built a self-regulatory index based on the subjects’ and their parents’ answers on long surveys taken as part of three follow-up studies. The index questions focus on self-control issues, asking for self-ratings on statements such as “is planful, thinks ahead,” “is calm and relaxed; easygoing” and “is persistent in activities; does not give up easily.” The preschool marshmallow wait time is the fourth component of the self-regulation index.

They identified 11 measures of capital formation, including credit card misuse, high-rate debt, income, social status and financial security, as well as diet, smoking and alcohol habits, weight and procrastination tendencies.

As the researchers predicted, the study finds only a tiny correlation between marshmallow test times and midlife capital formation. A graduate’s score on the self-regulation index was, however, modestly predictive of their middle-age capital formation, the study finds.

In secondary analyses, the researchers reexamined outcomes from earlier studies, such as SAT scores, smoking and BMI. The new study didn’t find the same strong relationships between waiting times and those outcomes. Benjamin describes these secondary findings as “intriguing but open to different interpretations” because of multiple differences between samples and protocols.

In one analysis, the researchers dug deeper to uncover why their results differed from those of a 2013 follow-up study. That study famously linked Bing kids who quickly ate the first marshmallow to higher body mass index (BMI) and possible obesity as 36-year-olds. More than half of the 2013 study’s 164 participants also reported their height and weight at roughly age 46 in the latest survey. In the new study’s sample, the JEBO team found no correlation between preschool marshmallow test results and BMI at either age 36 or age 46.

The team hypothesized that correlations would be higher among kids who didn’t get advice from an adult on how to wait. But while results from those marshmallow tests may be somewhat more predictive statistically, the predictive power, if there is any at all, is very small, according to the findings.

The greater predictive power of the self-control index likely comes from its wider pool of information, Benjamin says. The index, the authors note, is compiled from 86 responses from each subject in three different decades of life, while the marshmallow test is a single variable measured one day in preschool.
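Benjamin’s explanation reflects a familiar psychometric point: averaging many noisy indicators of the same trait cancels much of their noise, so a composite tracks the trait, and anything the trait influences, more closely than any single indicator does. The toy Python simulation below illustrates that general point under assumed parameters (86 equally noisy items, one simulated adult outcome, arbitrary noise levels); it is not a model of the Bing data, and its numbers say nothing about the study’s actual estimates.

```python
import random
from statistics import fmean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate(n_people=2000, n_items=86, item_noise=3.0, seed=0):
    """Compare how well one noisy item vs. an average of many predicts an outcome.

    All parameters are illustrative assumptions, not estimates from the study.
    """
    rng = random.Random(seed)
    single, composite, outcome = [], [], []
    for _ in range(n_people):
        trait = rng.gauss(0.0, 1.0)  # latent self-control, never observed directly
        # Each item is the trait plus its own independent measurement noise.
        items = [trait + rng.gauss(0.0, item_noise) for _ in range(n_items)]
        single.append(items[0])          # one noisy snapshot
        composite.append(fmean(items))   # average of all items
        # A simulated adult outcome that depends only partly on the trait.
        outcome.append(0.5 * trait + rng.gauss(0.0, 1.0))
    return pearson(single, outcome), pearson(composite, outcome)


r_single, r_composite = simulate()
print(f"single item: r = {r_single:.2f}; composite of 86 items: r = {r_composite:.2f}")
```

Under these assumptions the theoretical correlations are roughly 0.14 for the single item and 0.43 for the composite, and the simulated values should land close to them. In this toy setting the gap comes entirely from noise averaging: a single snapshot can carry real information about a trait and still predict a later outcome only weakly.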

Having It Both Ways?

In his 2014 book, The Marshmallow Test: Mastering Self-Control, Mischel reiterates his earlier findings linking preschool wait times to SAT scores, BMI, sense of self-worth, drug use and other adult outcomes. The same year, without equivocation, he finalized and pre-registered research hypotheses predicting that the Bing kids’ preschool marshmallow wait times wouldn’t be very indicative of their adult lives. Meanwhile, in numerous interviews around that time, he rejected the popular idea that flunking the marshmallow test meant a lifetime of underachievement.

Mischel embraced these seemingly incongruous convictions in a PBS NewsHour segment in 2015. If you exhibit self-control at an early age, “you have a much better chance of taking the future into account and likely to have better economic outcomes,” he says. “But the idea that your child is doomed if she chooses not to wait for her marshmallows is really just a serious misinterpretation.”

Was Mischel trying to have it both ways?

Before the marshmallow studies, Mischel was already well known in his profession for Personality and Assessment (1968), a takedown of a then-popular psychology theory holding that broad personality traits (extroversion, neuroticism or optimism, for example) determine behaviors throughout life. In the book, Mischel demonstrates that one’s behavior varies as much by situation as by trait; even extroverts, for instance, sometimes act shy.

The marshmallow test, Benjamin explains, fit into Mischel’s whole outlook on psychology. Mischel considered the test, which allowed researchers to see how people acted in real situations, a better measure of behavior than answers on questionnaires.

The popularity of the follow-up studies that linked the test to behavior much later in life created a conundrum.

“Although he had been a leading psychology researcher for decades, he became famous among the general public because people perceived that the marshmallow test was amazing,” Benjamin says. “Getting people to pay attention to the marshmallow test was valuable because he believed, as I do, that what it aims to measure — self-control — is central to understanding human psychology and behavior.”

But when it comes to his early theory? “[Mischel] also didn’t think that any simple measure of individual differences was going to be very good at predicting behavior,” Benjamin continues. “Despite the popular perception that the marshmallow test is a crystal ball,” he clearly expected to see only weak correlations with marshmallow test results in the latest study, Benjamin says.

Questions about the marshmallow test’s real predictive power never dampened Mischel’s support for teaching kids delay-of-gratification skills. He personally worked with charter schools to weave into each day’s lessons the importance of resisting immediate temptations to get something better later. Anyone can learn this willpower, he contended, even those who just couldn’t resist that first marshmallow.

About the Research

Shoda, Y., Mischel, W., & Peake, P.K. (1990). Predicting adolescent cognitive and self-regulatory competencies from preschool delay of gratification: Identifying diagnostic conditions. Developmental Psychology.

Mischel, W., & Ebbesen, E. (1970). Attention in delay of gratification. Journal of Personality and Social Psychology.

Benjamin, D.J., Laibson, D., Mischel, W., Peake, P.K., Shoda, Y., Wellsjo, A., & Wilson, N. (2020). Predicting mid-life capital formation with pre-school delay of gratification and life-course measures of self-regulation. Journal of Economic Behavior and Organization.

Watts, T.W., Duncan, G.J., & Quan, H. (2018). Revisiting the Marshmallow Test: A conceptual replication investigating links between early delay of gratification and later outcomes. Psychological Science.

Doebel, S., Michaelson, L. E., & Munakata, Y. (2020). Good things come to those who wait: Delaying gratification likely does matter for later achievement. Commentary on Watts, Duncan, & Quan: “Revisiting the Marshmallow Test: A Conceptual Replication Investigating Links Between Early Delay of Gratification and Later Outcomes.” Psychological Science.

Schlam, T.R., Wilson, N.L., Shoda, Y., Mischel, W., & Ayduk, O. (2013). Preschoolers’ Delay of Gratification Predicts Their Body Mass 30 Years Later. The Journal of Pediatrics.

Mischel, W. (2014). The Marshmallow Test: Mastering Self-Control. Little, Brown and Co.

Mischel, W. (1968). Personality and Assessment. Lawrence Erlbaum Associates.
