Lerum: Study Found Teacher Evaluation Reforms Had No Effect on Student Outcomes. But That Means Doing Them Better, Not Giving Up
Support The 74's year-end campaign. Make a tax-exempt donation now.
Researchers from Brown University, the University of Connecticut, the University of North Carolina and Michigan State recently released a very interesting study that examined the effects of teacher evaluation reforms on student outcomes across the country. While prior studies have looked at the effects of changes to evaluation in various individual districts, this new research is the first to analyze the overall impact nationally.
In short, the researchers found, on average, no effect on student outcomes, including not only student achievement in math and reading, but also high school graduation and college-going rates. This null effect was found across subgroups.
The results held even when researchers accounted for the rigor of the evaluation system’s design. Whether the design was intended to use more reliable measures (like structured observations that follow a rubric, or student growth) than once-a-year, drop-in classroom visits, to establish incentives and accountability for performance (e.g., bonuses or tying evaluation to tenure), or to provide meaningful feedback and inform professional development, researchers found the same thing: there was no change in student outcomes.
Matthew Kraft, one of the researchers from Brown, offered some additional insights regarding their findings on social media.
“These results serve as a kind of Rorschach test for how we think about scaling reforms: They can be viewed as an example of weak implementation & lack of sustained commitment, or as generalizable evidence that evaluation failed to move the needle in most contexts.” (Matthew A. Kraft, @MatthewAKraft, November 30, 2021)
So, what to make of it?
The results are hugely disappointing and cause for reflection. Those of us who have supported and advocated for these reforms for years certainly weren’t doing so to barely move the needle on student outcomes.
At the same time, I think it would be a mistake to write off teacher evaluation as a reform that’s not worth pursuing.
First, the research team confirmed previous findings demonstrating positive effects of new evaluation systems in a number of districts, like the District of Columbia and Newark. The National Council on Teacher Quality (NCTQ) observed that leaders in these districts used multiple measures to evaluate performance, linked evaluations to consequences (both positive and negative) and consistently assessed and improved their new systems. In other words, it’s not a fluke that evaluation can work, and it’s reasonable to assume that success comes from a combination of sound design and leadership that’s committed to thoughtful implementation.
It’s important to remember that these reforms were always meant to be part of a package — new and more measures, better teacher observation protocols, strong evaluator training, regular feedback, linked and personalized professional development, compensation tied to performance and consequences for poor performance, to name a few.
My sense is that very few places put in place all the pieces (design) and did them well (implementation), as D.C. and other jurisdictions have done. For instance, D.C.’s IMPACT system is highly regarded for its thoughtful design, strong connections to career and compensation advancement, and robust process for regularly collecting, analyzing and incorporating teacher and principal feedback on implementation. We should study these outliers and learn from them — what are the elements that made them work? What are the factors common among those that didn’t?
Second, while raising student achievement was a primary objective of strengthening teacher evaluations, it was never the only goal. In every case I can think of, a big part of the push was the idea that teachers deserved better evaluations than the binary ones they were almost universally given 15 years ago.
Teachers deserve meaningful evaluation, based on their contribution to student learning (given it’s their primary job). They deserve to have feedback and opportunities — such as leadership roles and special recognition — rather than being treated exactly the same. Teachers deserve to be recognized for something other than their start date, and they deserve to be paid based at least in part on the impact they have in the classroom.
Better evaluations aren’t a luxury item, and they’re not some reform folly — they’re essential management components for a workforce of professionals.
Third, we’ve learned that the principal’s role is undervalued and that school leaders require more and better training to help them differentiate performance meaningfully among their teacher corps.
Principals also need, and in many places didn’t get, cover for using evaluations to make hard decisions about opportunities for their staff. If a principal knows that a negative evaluation and/or recommendation for removal from the classroom will trigger endless appeals, scrutiny and the possibility that nothing will ultimately happen, what really is the incentive to take that challenge on? If the consequences don’t actually bear out any differences, why rate one teacher better or worse than another? Why not rate them all the same? We’ve seen this dynamic (which the authors of this study mention) play out in state after state where only minuscule percentages of teachers are ever rated poorly, despite student outcomes that reveal a lack of effectiveness.
Finally, it’s likely impossible to capture in this study the negative effects of the pushback, obstacle-throwing and combative situations that existed in many states and districts, but it’s reasonable to assume those effects are substantial.
Many places didn’t get started with their reforms in earnest until years after Race to the Top. And once started, there was often anything but the type of collaboration witnessed during the implementation of IMPACT in D.C. Teacher evaluation policy has been a reform constantly under attack and under threat of being rolled back, rather than one where folks came together and moved forward.
To be clear, none of this is to say the findings of the new study aren’t both legitimate and disturbing — they are. And kudos to the researchers for taking this project on. This is information and data that we need.
I just hope that everyone (educators, advocates, policymakers and funders alike) doesn’t take this as an easy exit from the conversation about evaluating teachers. Teacher quality matters. In order to recognize and improve it, we have to evaluate performance meaningfully.
Eric Lerum is chief operating officer at America Succeeds, responsible for ensuring the organization meets its goals and maximizes its impact as it grows the network of business-led advocacy partners.