Beware the NAEP Overreactions: 4 Reasons Why Education Pundits Should Rein in the Rhetoric This Week

By Matt Barnum

October 22, 2015

On Wednesday, the latest results of the National Assessment of Educational Progress (NAEP), the gold-standard, low-stakes test given to a representative cross section of 4th and 8th graders across the country, will be released. Virtually every news outlet, including The Seventy Four, will cover the numbers, which makes sense since the scores are important and have some limited value for policymakers. But if history is any guide, for one day the educational world will collectively lose its mind over NAEP, engaging in what Mathematica researcher Steve Glazerman calls “misNAEPery.”

Here are four reasons why NAEP results should be interpreted very cautiously:

1. Raw NAEP data can tell us NOTHING about which education policies are effective and which aren’t.

When the 2013 NAEP scores were released, Secretary of Education Arne Duncan pointed to the relatively large student “gains” in Washington, D.C.’s NAEP scores, saying, “Leaders in D.C. have shown tremendous courage and taken bold steps that are resulting in strong growth.” Commentators across the country, including the editorial boards of the New York Times, Washington Post, and Wall Street Journal, picked up on this to suggest that D.C.’s NAEP gains (as well as Tennessee’s) were showing that school reform was working.

Unfortunately, such claims are entirely unsupported. Researchers have extensively documented why it is inappropriate to use raw NAEP scores to judge the success or failure of specific policies.

The basic reason is a wonky, boring, but crucially important one: NAEP scores, on their own, offer no comparison (or “control”) group by which to judge specific policies or even packages of policies. Remember eighth-grade science class? To make causal inferences, there must be both a treatment group and a control group.

As an example, let’s say Wednesday brings good news in the form of higher NAEP scores. Reformers will claim their policies are working, but how do we know? Maybe scores would have been even higher if a different set of policies had been pursued. Maybe scores went up for reasons entirely unrelated to reform policies. We simply can’t say.
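
To make the missing-control-group problem concrete, here is a minimal sketch with made-up numbers (these are not real NAEP results, and the function and variable names are purely illustrative). It shows that a single observed score change is compatible with a positive, zero, or negative policy effect, depending on what would have happened without the policy, which raw scores never reveal.

```python
def implied_policy_effect(observed_change, counterfactual_change):
    """Policy effect = what happened minus what would have happened anyway.

    The counterfactual change is an assumption here; raw scores never show it.
    """
    return observed_change - counterfactual_change


# Hypothetical: a state's average score rose 5 points between two test years.
observed_change = 5.0

# Three guesses about the change without the policy, none of them observable.
counterfactual_changes = [0.0, 5.0, 8.0]

effects = [implied_policy_effect(observed_change, c) for c in counterfactual_changes]

for c, effect in zip(counterfactual_changes, effects):
    print(f"if scores would have changed by {c:+.1f} anyway, implied effect = {effect:+.1f}")
```

The same 5-point rise implies a helpful policy, an irrelevant one, or a harmful one. Without a comparison group there is no way to pick among the three.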

2. Lots of things besides schools and education policies affect NAEP scores.

Student achievement is based on everything that has happened in a student’s life before taking the test.

We tend to think of schools as driving test scores because students take tests and formally learn academic content in schools. Indeed, schools have an extremely important impact on student learning, but out-of-school factors have an even greater effect on student test scores. This is yet another reason we can’t use NAEP to judge school policies. The many out-of-school factors driving achievement — the economy, access to healthcare, etc. — mean we can’t even be sure that changes in NAEP scores had anything to do with changes in schools.

3. Changes in NAEP scores are not actually “growth.”

In the coverage of NAEP scores, we will almost surely hear about some state whose students “showed the most growth.” For example, in 2013, the Washington Post reported that “the District [of Columbia]’s fourth- and eighth-graders made significant gains on national math and reading tests this year, posting increases that were among the city’s largest in the history of the exam.” This is not quite right, because the fourth-graders who took the test in 2013 are not the same fourth-graders who took the previous NAEP two years earlier. In other words, all we can say is that one group of students has a higher average score than a completely different group of students from a couple of years ago.

This may seem like an academic point, but it raises yet another problem with trying to make inferences about policy based on NAEP: demographic changes among students tested may contribute to changes in average test scores. What look like ‘gains’ may just be differences in which students were tested.
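
A small worked example may help show how a change in who is tested can move the average on its own. The sketch below uses invented group labels, scores, and shares (none of it is real NAEP data). Each group scores exactly the same in both years, yet the overall average rises because the mix of students changes.

```python
def weighted_mean(scores, shares):
    """Overall average: each group's score weighted by its share of test takers."""
    return sum(scores[group] * shares[group] for group in scores)


# Hypothetical group averages, identical in both years (no group improved).
group_scores = {"group_a": 250.0, "group_b": 220.0}

# Hypothetical shares of tested students; only the mix changes between years.
shares_earlier = {"group_a": 0.40, "group_b": 0.60}
shares_later = {"group_a": 0.50, "group_b": 0.50}

avg_earlier = weighted_mean(group_scores, shares_earlier)
avg_later = weighted_mean(group_scores, shares_later)
apparent_gain = avg_later - avg_earlier

print(f"earlier average: {avg_earlier:.1f}")
print(f"later average:   {avg_later:.1f}")
print(f"apparent gain:   {apparent_gain:.1f}")
```

In this made-up case the headline number goes up by about three points even though no student in either group learned more than their earlier counterparts.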

4. Most people will use NAEP data to reiterate what they already believe, no matter what the data say.

I can guarantee that the NAEP results, regardless of what the actual data are, will be used by commentators to reinforce their previously held policy positions. That people will use the same data to reach opposite conclusions is an indication that we shouldn’t read too much into said data.

Advocates will surely declare, “[State X, which had ‘good’ results] did [Policy Y, which I already like]; therefore everyone should do [Policy Y].” If scores show improvement, reformers will say, “This shows our policies are working. Full speed ahead!” If there isn’t improvement, reformers will say, “This shows why our schools are in desperate need of reform. Full speed ahead!”

Similarly, reform skeptics will gleefully point to disappointing results as evidence that reform policies are failing. But if scores rise, they will declare that NAEP scores shouldn’t be taken seriously and that tests don’t much matter.

People believe what they believe; NAEP scores won’t — and frankly shouldn’t — change this. But can we just drop the charade?

This is not to say that NAEP scores are useless. They are genuinely important indicators of whether students across the country are learning more math and reading than past students. And although raw data cannot be used to judge specific policies or policymakers, it is absolutely reasonable to make hypotheses about policy that can then be tested rigorously.

In turn, NAEP scores have been used by researchers with careful, statistically rigorous designs to test the efficacy of certain policies. (For example, much of the research on No Child Left Behind uses NAEP data, but does so by creating controls and applying careful statistical analyses.) The key words here are statistically rigorous — an eyeball test does not count.
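
As a rough illustration of what "creating controls" can mean, here is a minimal difference-in-differences sketch. It is only one simple design of this kind, and the article does not say which designs the cited research uses. The scores are invented, the function name is illustrative, and the estimate is only credible under the assumption that the two groups would have followed the same trend without the policy.

```python
def difference_in_differences(treated_before, treated_after,
                              comparison_before, comparison_after):
    """Change in the treated group minus change in the comparison group.

    Assumes both groups would have moved in parallel absent the policy.
    """
    treated_change = treated_after - treated_before
    comparison_change = comparison_after - comparison_before
    return treated_change - comparison_change


# Hypothetical average scores, not real NAEP results.
estimate = difference_in_differences(
    treated_before=230.0, treated_after=236.0,
    comparison_before=232.0, comparison_after=236.0,
)
raw_change = 236.0 - 230.0

print(f"raw change in treated group: {raw_change:+.1f}")
print(f"difference-in-differences:   {estimate:+.1f}")
```

Here the eyeball test would credit the policy with the full raw change, while the comparison suggests most of it would have happened anyway. Real studies add far more, such as multiple years of data, covariates, and uncertainty estimates.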

So, yes, although some rumors suggest that they’ll be lower, I hope NAEP scores go up on Wednesday. It will be nice to see and a hopeful sign for education reform and our country. But no, I won’t be using raw NAEP scores to judge the success of policies or politicians or to support the things I already believe — however tempting it might be.

Matt Barnum is a senior staff writer at The 74.

@matt_barnum matt@the74million.org

This story first appeared at The 74 (https://www.the74million.org), a nonprofit news site covering education.