States will be required to give every school a “comprehensive, summative” score rating its performance under draft regulations proposed by the Department of Education for implementing the nation’s new education law.
Some educators and analysts are skeptical that such an approach is supported by evidence. Josh Starr, the head of Phi Delta Kappa International and former Montgomery County Schools superintendent, tweeted as much. A report from the University of Colorado’s National Education Policy Center released last year claimed that A–F school report cards “merit a failing grade.”
In fact, there is evidence that suggests giving grades to schools has an impact on student outcomes, at least for schools that get the lowest grade. According to research in New York City and Florida, these schools respond by improving their students’ test scores — and it doesn’t appear that gains are just the result of gaming the system or teaching to the test.
School grades lead to gains in New York City and Florida
In 2014, New York City removed letter grades from its school report cards in favor of a dashboard approach providing disaggregated ratings on a variety of measures, including test scores, strength of instruction, and quality of school environment.
Two earlier studies published in peer-reviewed journals showed that, under the old system, students in New York City schools that received an ‘F’ made somewhat larger-than-expected test score gains the following year.1
The researchers were able to determine the impact of the letter grade by comparing schools near the score cutoff between an F and a D or C — schools that are similar, with the only difference being how they were labeled.2
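The comparison described above can be sketched in a few lines of Python. This is only an illustration of the general idea, often called a regression discontinuity design, and not the researchers' actual method: the function name, the cutoff, the bandwidth, and every number below are made up for the example.

```python
from statistics import mean


def near_cutoff_gap(schools, cutoff, bandwidth):
    """Mean next-year gain for schools just below the cutoff minus the mean
    for schools just above it. `schools` is a list of (score, gain) pairs."""
    below = [gain for score, gain in schools
             if cutoff - bandwidth <= score < cutoff]
    above = [gain for score, gain in schools
             if cutoff <= score <= cutoff + bandwidth]
    if not below or not above:
        raise ValueError("no schools on one side of the cutoff within the bandwidth")
    return mean(below) - mean(above)


# Hypothetical data, NOT from any study: (overall score, next-year test score gain).
example_schools = [
    (28.0, 0.06), (29.5, 0.05), (29.9, 0.07),  # just below the cutoff: labeled F
    (30.1, 0.02), (30.8, 0.01), (31.5, 0.03),  # just above the cutoff: labeled D
    (55.0, 0.00), (12.0, 0.10),                # far from the cutoff: ignored
]

gap = near_cutoff_gap(example_schools, cutoff=30.0, bandwidth=2.0)
print(f"Gap in mean gains near the cutoff: {gap:.2f}")
```

Restricting the comparison to a narrow band around the cutoff is what makes the two groups similar in everything but their label. A real analysis would also have to choose the bandwidth carefully and test whether the gap could be due to chance.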
A study from University of Colorado at Colorado Springs Professor Marcus Winters, released by the Manhattan Institute, a conservative think tank, uses more recent data that once more shows students in F schools, under the old system, making significant test score improvements.3
Winters then looked at the new system, reconstructing the report cards using 2014 data and determining the grades schools would have received under the previous regime. Based on 2015 results, he finds that once letter grades stopped being used, the test score bump associated with them disappeared.
Devora Kaye, a spokesperson for the New York City Department of Education, said in an email, “Letter grades were misleading and oversimplified school quality which is why the new Snapshot evaluates schools using multiple measures and more data so families have a complete picture of a school.”
The research in New York City generally squares with multiple studies from Florida, which also show test scores increased after schools received an F.
Gaming or learning?
Some observers have questioned whether the gains were the result of greater learning, especially given a good deal of evidence that evaluation by test scores can lead to unintended consequences like cheating and teaching to the test.
We can’t say for certain, but the research gives us reason to believe that the gains were at least in part educationally meaningful.
One of the New York studies showed increased parental satisfaction, measured by district-administered school surveys, along with the rise in test scores. The schools tended to spend more time on direct instruction, which some research has found to be quite effective. On the other hand, student satisfaction in schools dropped as this shift occurred.
Another of the New York City studies found test score gains persisted two years after schools received an F rating, suggesting that the improvement wasn’t solely the result of interventions like short-term test-taking strategies.
In perhaps the most comprehensive review of school grades, a Florida study found that after receiving a failing grade,4 schools “appear to focus on low-performing students, lengthen the amount of time devoted to instruction, adopt different ways of organizing the day and learning environment of the students and teachers, increase resources available to teachers, and decrease principal control.” In response to accountability pressure, in other words, schools didn’t just start teaching to the test; they made significant changes. Moreover, test score gains persisted three years after the initial improvement.
Another study of Florida found that improvements on low-stakes exams were about half the size of those on the high-stakes tests used for accountability purposes. This suggests that gains may have resulted from some combination of “gaming” and meaningful improvement.
Unanswered questions, unintended consequences
This research should not be read as closing the case on whether giving letter grades to schools is a good idea.
Perhaps the biggest caveat is that studies have generally focused on the impact of grades on low-performing schools. It’s harder to know how the report card approach affects the system as a whole; that is, the research can show that F schools make more gains than D schools, but it doesn’t tell us the aggregate effect of using letter grades as opposed to a different system. (However, research has found that stringent accountability systems generally tend to improve overall student achievement.)
There also may be unintended consequences of school letter grades. For instance, New York City faced frequent complaints that letter grades bounced around significantly from year to year for no apparent reason; there is also evidence that the system reduced parental support for higher standards when they led to lower school grades.
In Florida, one study showed that a failing grade created a significant increase in teacher turnover, particularly among the most effective teachers. (The research also found that teachers who remained showed improvement, consistent with the studies finding gains in student achievement.)
There are many factors to consider when designing an accountability system; reasonable people can disagree about how to weigh the costs and benefits of different approaches. It is clear, though, that by some measures, students in low-performing schools appear to benefit from grading schools.
Disclosure: The Walton Family Foundation, which is a funder of The 74, also funded the Manhattan Institute study on New York City school accountability.
1. One study showed gains in both math and English with larger increases in math; the other showed gains in English, but no gains in math.
2. Some may be wondering whether the gains are simply the result of “regression to the mean.” That’s quite unlikely because the researchers’ approach compared schools with relatively similar starting levels of achievement.
3. The gains Winters finds are statistically significant but about half the size of those found in one of the previous studies.
4. A failing letter grade also meant that students at the school had the opportunity to use a voucher to attend a private school. The study examined the combined impact of the letter grade and this ‘voucher threat,’ though other research suggests that most of the school improvement was due to stigma associated with the failing grade.