
Analysis

Osborne: States Still Rely Too Heavily on Test Scores to Hold Schools Accountable. Here’s a Better Way for Them to Break It All Down

By David Osborne

February 9, 2021


Despite heated rhetoric to the contrary, most Americans think we need standardized tests to make sure kids are learning the basics. Last year, 61 percent of adults surveyed by Gallup and Phi Delta Kappa thought it appropriate to use test scores as a main factor in judging school quality. But in a previous version of the survey, five years ago, most respondents said other indicators, such as graduation rates, employment rates, and student engagement, were more important.

There is a lot of wisdom here. We need standardized tests to see if students are learning to read, do math, write, and understand science and history. If we don’t measure such things, how will we know which schools are failing and need to be replaced?

But for the last two decades, heavy reliance on test scores has encouraged cookie-cutter schools focused on preparing students for tests. Instead, we need diverse schools that cultivate the joy of learning, engage students in meaningful thinking and help them develop the character skills — such as conscientiousness and self-control — that lead to success in life.

The Every Student Succeeds Act (ESSA) requires states to measure school performance but leaves them largely free to design their accountability systems. Even so, the most recent study by the Education Commission of the States showed that at least 40 states still gave test scores 80 percent or more of the weight in their elementary and middle school rating systems. In high school accountability ratings, about 20 states still assigned 70 percent or more of the weight to test scores. These numbers are far too high. States are waiving accountability during the pandemic, but the overemphasis on test scores will distort school quality ratings again when the world returns to normal.

Relying too heavily on test scores creates myriad problems. We all know people who perform well in life and work but did not test well, because of stress, learning disabilities, trauma or other challenges. Many of us have also seen test questions that implicitly assume the test takers are from a white, middle-class culture. And standardized tests often give misleading signals about students who are still learning English or who speak a dialect of English. Even if tests did not put such kids at a disadvantage, they would still tempt schools to prioritize rote learning to drive up scores, in the process undermining children’s natural love of learning.

Heavy reliance on testing in accountability systems can also discourage the creation of schools for particularly challenging students, such as dropouts, children with disabilities or those who don’t speak English. It can deter schools from trying new methods, such as project-based learning, student internships or career and technical education, that might deepen learning but hurt test scores. We need more innovation in education, not less; we need to encourage educators to start schools aimed at students who do not thrive in cookie-cutter environments.

Because I think test scores are important but tell only half the story about a school, I believe we should give them about half the weight in state accountability systems. (The main focus should be on academic growth, not current proficiency levels — another area in which states have improved under ESSA but need more progress.) For high schools, part of that other half should focus on outcomes, such as graduation rates, college entrance and persistence, and training and employment for graduates who don’t choose college. For all schools, we should also measure student engagement, using surveys.

Finally, we should learn from other countries and develop School Quality Reviews conducted by teams of seasoned educators. (The British approach, in which teams visit schools for two to three days and rate them on things such as “quality of teaching, learning and assessment,” “outcomes for pupils,” “early years provision,” “effectiveness of leadership and management” and “personal development, behavior and welfare,” offers a good model. I will delve into this topic in my next piece in this series.)

ESSA allows all of this.

An ideal state rating and accountability system

Statewide rating systems for almost all public schools should have five or six basic elements, weighted roughly as follows. I say “roughly” because there is no way to establish the scientifically “correct” percentage for each element. In my view, quality reviews are more revealing and cover more ground than parental surveys about student engagement, so they should be given more weight. And for high schools, outcomes are more important than both. The numbers I propose are my best estimates, but as we use some of these measures, we will learn more about their accuracy and value and perhaps refine our sense of how best to balance them.

ESSA requires states to use English learners’ progress toward proficiency as an indicator, but I have not specified a recommended weight because it should vary by school. In schools with many English learners, it would be quite important; in schools with none, it would not apply. Finally, the balance between achievement and growth should depend on which method states use to measure growth; with some methods, achievement and growth can be combined in one score.

For high schools:

Student academic achievement: 20 percent
Student academic growth: 25 percent
English learner progress toward proficiency: variable
Student engagement: 10 percent
School Quality Reviews: 15 percent
Student outcomes: 25 percent

Since they do not have measurable student outcomes other than test scores, elementary and middle schools would use only five elements:

Student academic achievement: 20 percent
Student academic growth: 30 percent
English learner progress toward proficiency: variable
Student engagement: 10-20 percent
School Quality Reviews: 20-30 percent
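To make the weighting concrete, here is a minimal Python sketch of how a state might combine these elements into one rating. It assumes each element has already been converted to a 0-100 scale, and the element names are hypothetical. Because the high school weights for the fixed elements add up to 95 percent and the English learner weight is left “variable,” the sketch adopts one possible rule, which is an assumption rather than the author’s: the English learner share is set aside first, and the fixed weights are scaled proportionally to fill the remaining percentage. The elementary weights use the midpoints of the ranges above (15 percent for engagement, 25 percent for reviews), also an assumption.

```python
# Sketch only: element names, the 0-100 scale and the rescaling rule for the
# English learner (EL) weight are assumptions, not any state's actual system.

HIGH_SCHOOL_WEIGHTS = {  # percent, from the high school list
    "achievement": 20,
    "growth": 25,
    "engagement": 10,
    "quality_review": 15,
    "outcomes": 25,
}

ELEMENTARY_WEIGHTS = {  # percent; engagement and review use range midpoints
    "achievement": 20,
    "growth": 30,
    "engagement": 15,
    "quality_review": 25,
}


def composite_score(scores, fixed_weights, el_weight=0.0):
    """Return a 0-100 composite from element scores on a 0-100 scale.

    el_weight is the school-specific EL share in percent. The fixed weights
    are rescaled so that together they fill the remaining (100 - el_weight)
    percent, which keeps their relative sizes unchanged.
    """
    if not 0 <= el_weight < 100:
        raise ValueError("el_weight must be at least 0 and below 100")
    fixed_total = sum(fixed_weights.values())
    scale = (100 - el_weight) / fixed_total
    weighted = sum(scores[name] * w * scale for name, w in fixed_weights.items())
    if el_weight > 0:
        # Only look up the EL score when the school actually weights it.
        weighted += scores["el_progress"] * el_weight
    return weighted / 100


if __name__ == "__main__":
    school = {"achievement": 60, "growth": 80, "engagement": 70,
              "quality_review": 90, "outcomes": 75, "el_progress": 50}
    print(composite_score(school, HIGH_SCHOOL_WEIGHTS))
    print(composite_score(school, HIGH_SCHOOL_WEIGHTS, el_weight=10))
```

For the sample school, the rating is roughly 75 with no English learner weight and roughly 72.5 when English learner progress counts for 10 percent, since that element’s score (50) is below the others. A state could instead fix the English learner weight and trim particular elements; proportional scaling is used here only because it preserves the relative weights the article proposes.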

States could combine numerous indicators to measure these elements:

Student academic growth and achievement indicators:
- Test scores in math, reading, writing, science and the social sciences
- For English learners, scores on tests designed to measure their progress in learning English
- PSAT, SAT, ACT and/or state-approved international test scores
- Industry certifications
Student engagement:
- Parent surveys
Student outcomes, for high schools only:
- Graduation rate: four-year, five- to seven-year and with GED
- Quality of diplomas, if states offer different levels
- Percentage of graduates enrolling in college
- Percentage of enrollees required to take remedial college classes
- College persistence to second and third years
- Percentage of graduates earning a two-year college degree or credential
- Percentage of non-college-bound graduates employed, in training, or in the military
- Income levels for non-college-bound graduates employed full-time

Because results vary from year to year, and test scores can fluctuate even for the same students, accountability ratings should average two years of data whenever possible. Test scores from students who arrive at a school more than six weeks into an academic year should not be included. Schools should not be held accountable, or even measured, based on students they did not have an opportunity to educate for at least six months before a test.
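The enrollment rules and the two-year average described above can be sketched as a simple filter. The Python below is illustrative only: the StudentResult record, the use of raw mean scale scores (rather than growth), and giving each of the two years equal weight are assumptions made to keep the example short. Six weeks is measured from the first day of the school year, and six months is counted in calendar months back from the test date.

```python
import calendar
from dataclasses import dataclass
from datetime import date, timedelta
from statistics import mean
from typing import Iterable, Optional


@dataclass
class StudentResult:
    """Hypothetical record: one student's score on this year's state test."""
    student_id: str
    enrolled_on: date  # date the student arrived at this school
    score: float       # scale score on the test


def subtract_months(d: date, months: int) -> date:
    """Step back a number of calendar months, clamping to the month's last day."""
    index = d.year * 12 + (d.month - 1) - months
    year, month0 = divmod(index, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(d.day, last_day))


def is_counted(r: StudentResult, year_start: date, test_date: date) -> bool:
    # Rule 1: skip students who arrived more than six weeks into the year.
    arrived_late = r.enrolled_on > year_start + timedelta(weeks=6)
    # Rule 2: skip students enrolled less than six months before the test.
    too_recent = r.enrolled_on > subtract_months(test_date, 6)
    return not (arrived_late or too_recent)


def school_year_average(results: Iterable[StudentResult], year_start: date,
                        test_date: date) -> Optional[float]:
    """Mean score of counted students, or None if no one qualifies."""
    scores = [r.score for r in results if is_counted(r, year_start, test_date)]
    return mean(scores) if scores else None


def two_year_rating(current: Optional[float],
                    previous: Optional[float]) -> Optional[float]:
    """Average the two yearly figures when both exist; fall back to one."""
    years = [y for y in (current, previous) if y is not None]
    return mean(years) if years else None


if __name__ == "__main__":
    start, test_day = date(2019, 8, 26), date(2020, 4, 15)
    results = [
        StudentResult("a", date(2018, 8, 27), 300.0),  # returning student
        StudentResult("b", date(2019, 10, 1), 280.0),  # arrived within six weeks
        StudentResult("c", date(2019, 11, 1), 200.0),  # arrived too late
    ]
    this_year = school_year_average(results, start, test_day)
    print(this_year, two_year_rating(this_year, 270.0))
```

In the example, student “c” arrived after the six-week cutoff and is dropped, so the current-year average is 290; averaged with a hypothetical prior-year figure of 270, the two-year rating is 280.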

States must ensure that the data is audited, analyzed and spot-checked, to detect foul play. Districts and schools have been caught cheating on standardized tests and manipulating attendance, graduation and dropout rates.

Some commonly used measures are not ready for prime time

There is a big difference between measurement and accountability. There are many things states should measure and publicize but not hold schools accountable for, including:

Attendance rates. These are an important measure of student engagement, but they are easily manipulated by schools and difficult to audit effectively. Making them cheat-proof would be quite expensive, so while we should collect the data, we should not tie consequences to it. Yet 32 states make this mistake, a few giving attendance 15 to 20 percent of the weight in their accountability ratings. This may well lead to cheating scandals. Using this indicator is both foolish and unnecessary: if a school has poor attendance, that will show up in test scores and other measures.

Student surveys. At least seven states include student survey results in their accountability systems. This may lead schools to pressure students to emphasize the positive. (If you’ve ever bought a car and had the salesman tell you to expect a phone call asking you to rate his performance, and that his entire bonus rests on your answer, you’re familiar with the problem.) Some students may also use their answers to punish teachers who are more demanding or tougher graders, a phenomenon well known to college professors.

The solution is to use parent surveys. Parents are more likely than their children to express their true feelings about their children’s schools and less likely to comply with principals’ and teachers’ wishes. Only Idaho uses parent surveys in its accountability system, however.

Rather than counting student surveys toward accountability, states should require that the data be collected and distributed to schools and parents, to help schools improve and families choose appropriate schools for their children. Meanwhile, states should fund research on student surveys used for accountability, to learn where the pitfalls lie and how to overcome them.

Student demand. In some places, demand reflects parental judgments about a school’s quality. But many district schools do not allow families who live outside their zones to even apply, and some schools of choice are designed for specialized populations, such as pregnant students, dropouts or overage students. It would be silly to punish such schools because demand was low or falling, since declining demand might simply reflect lower pregnancy or dropout rates, which would be a sign of success.

Retention rates. Some districts and charter school authorizers measure student retention, another interesting statistic that should not carry consequences. Some schools lose students because they are more demanding than neighboring schools; should we punish them for their rigor?

Discipline rates. ESSA requires all states to report in- and out-of-school suspension rates, and two states, California and West Virginia, include those in their accountability systems. Publishing data on discipline rates is useful to keep schools honest, to encourage them to recognize the trauma that often underlies student misbehavior and to nudge them to use methods such as restorative justice rather than suspensions and expulsions. But we need to leave judgments about discipline up to the people who run schools. Students in one school may disrupt class frequently, so high rates of discipline may be required to ensure that their peers can learn uninterrupted. Students in another school may rarely disrupt class and thus need little disciplinary action. Any effort to punish schools for high disciplinary rates would undermine their ability to deal with the realities in their classrooms.

Here are a few more items states should measure and publicize but not hold schools accountable for:

Teacher absenteeism
Student-teacher ratios
Teacher retention

A few states and districts are experimenting with qualitative assessments of student performance, such as portfolios of work. The challenge is ensuring that all such assessments are scored on the same scale, a difficult task given that they depend on individuals making subjective judgments. Limiting variability across an entire state will be an enormous challenge, but one that states should be encouraged to take on.

There are also many questions about how to assess social-emotional skills. Again, some states and districts are experimenting with this, and they should be encouraged. But no state should include such assessments in its accountability system until much more research has been done.

David Osborne is author of Reinventing America’s Schools: Creating a 21st Century Education System, which includes a more in-depth discussion of how to measure school quality and hold schools accountable. He leads the K-12 education work of the Progressive Policy Institute.

David Osborne, author of Reinventing America’s Schools: Creating a 21st Century Education System, Reinventing Government, and other books about modernizing our public institutions, recently retired as director of the Progressive Policy Institute’s K-12 education work.

This story first appeared at The 74, a nonprofit news site covering education.
