5 Key Lessons from the Successes (and Failures) of President Obama’s Teacher Evaluation Reforms

By Matt Barnum

April 3, 2016

Lessons for the post-NCLB era.

The passage of the Every Student Succeeds Act and the waning of the Obama administration bring to a close federal efforts to improve teacher evaluation, a practice once widely derided for its infrequent and pro forma observations, inflated ratings, and lack of consequences.

Today most states combine different measures, including classroom observations and student test data, to produce a rating that describes effectiveness. But problems with the system persist.

Research by Matt Kraft of Brown University and Allison Gilmour of Vanderbilt University confirms other evidence that in most of the country new teacher evaluation systems still rate the vast majority of teachers effective, even though uniformly high ratings in the past were part of the impetus for creating new systems.

Based on this study, the American Enterprise Institute’s Rick Hess declared “all that time, money, and passion” dedicated to teacher evaluation “haven’t delivered much.” The Shanker Institute’s Matt Di Carlo also pointed out that evaluation systems can’t be judged primarily on how many low-performing teachers they identify.

(More: The War Over Evaluating Teachers — Where It Went Wrong and How It Went Right)

Another report, “Beyond Ratings,” by Kaylan Connelly and Melissa Tooley of the New America Foundation, argues that evaluation systems need to be better calibrated to enhance professional growth and development. A paper from the Aspen Institute lays out a 10-lesson “roadmap for improvement” on teacher evaluation.

Meanwhile, Georgetown University’s Thomas Toch has taken to The Washington Monthly, Education Next, and The Atlantic to defend the Obama administration’s accomplishments. Toch argues that “state and local studies, teacher surveys, and other evidence reveals that many of the new [teacher evaluation] systems have been much more beneficial than the union narrative would suggest.”

So, with the political fight moving to the states, what can we learn from the research debate? Here are five key lessons policymakers should consider as we head into (another) brave new world of teacher evaluation.

1. Determine why so many teachers get high ratings, and address the root causes

Kraft and Gilmour’s study not only documents the high marks teachers in many states are earning, but also asks principals in one district why they tend to grade teachers on a generous curve. What they find is revealing: Principals say they are worried about finding better teachers to replace low-performers, don’t like telling teachers they’re not doing well, fear that a low rating will damage a teacher’s morale, lack time to remediate, and are daunted by difficult-to-navigate teacher dismissal processes.

The new wave of evaluation systems doesn’t seem to have addressed these concerns sufficiently. If districts want better-differentiated teacher ratings (important for targeting professional development and making smart personnel decisions), they need to confer with principals to ensure new programs are useful to the school leaders who will be implementing them.

2. Don’t use test scores to evaluate every teacher in every grade and subject

Under Obama, states were strongly incentivized by the federal government to use test scores in teacher evaluation; the vast majority of states now do.

The problem with this approach was that while every state tests students in grades 3–8 in reading and math, there were few standardized assessments in other grades and subjects. Consequently, new tests — of generally unknown reliability or validity — materialized around the country for rating physical education, social studies, and first-grade teachers, among others. In some areas teacher ratings have been based on school averages or on test scores in subjects the teacher didn’t teach — prompting confusion, outrage, and multiple lawsuits.

Using test scores for all teachers was poor policy and proved to be even worse politics for reformers: it exacerbated the anti-testing backlash and contributed to the rollback of federal power in the new education law. That, in turn, has led many state policymakers to try to reduce or remove student growth from teacher evaluation systems.

There is a simple solution to the problem of overtesting and unfair attribution of test scores: Evaluate teachers by test scores only if there is a valid test to do so, one that rigorously isolates a teacher’s impact on student growth. Hastily creating new assessments is usually unwise.

3. Take the professional growth aspect of teacher evaluation seriously: systematize it

The New America report says, “For the most part, states have prioritized getting evaluation systems up and running and are only beginning to think about using them to promote ongoing teacher learning and growth.”

Research has not yet clearly identified how to use teacher evaluation systems as a tool for improving teacher practice. However, there is encouraging new evidence that when highly rated teachers work with poorer performers, the latter group improves.

Another study found that Chicago’s teacher evaluation pilot, which provided extensive training for principals to revamp how they observed teachers, had a positive impact on student achievement in its first year. It did not in its second year, when the pilot expanded and school leaders received less budgetary and central office support.

While there’s still a lot we need to learn, it’s clear that states and districts should create systems to help struggling teachers improve, provide support and training for evaluators, and not expect to get this done on the cheap.

4. Don’t rely on models that leave no room for principal discretion

Most states have systems that assign a fixed value to each part of the evaluation.¹ For example, 50 percent might be based on principal observations, 35 percent on student test scores, and 15 percent on student surveys. Sum the separate scores and out pops a rating.
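
To make the arithmetic concrete, here is a minimal sketch of a fixed-weight composite, using the example percentages from the paragraph above. The component names, the 1-to-4 scoring scale, and the sample scores are assumptions made for illustration only; they are not drawn from any particular state's system.

```python
# Hypothetical weights, copied from the example percentages in the text.
WEIGHTS = {"observation": 0.50, "test_scores": 0.35, "student_survey": 0.15}


def composite_rating(scores, weights=WEIGHTS):
    """Return the weighted sum of component scores.

    Assumes every component is scored on the same scale (here, 1 to 4).
    """
    if set(scores) != set(weights):
        raise ValueError("scores must cover exactly the weighted components")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(weights[name] * scores[name] for name in weights)


# Illustrative scores on the assumed 1-4 scale; not real data.
example = {"observation": 3.0, "test_scores": 2.0, "student_survey": 4.0}
print(round(composite_rating(example), 2))  # prints 2.8
```

Note that nothing in such a function lets an evaluator discount a component she considers misleading; the weights are fixed before any individual teacher is rated.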

It’s not obvious that this is the best way to do things, though. It constrains the principal’s judgment and discretion: she may believe a component of the evaluation to be misleading, for instance, but can do nothing to adjust it.

Some may argue that a mechanical model provides needed principal-proofing, but there is research suggesting that principals typically make smart personnel decisions. Given their accountability for school performance, it’s worth experimenting with less rigid systems that expand rather than diminish principal autonomy.

5. Pay attention to how evaluation affects the teacher labor market

Many pundits suggest that tougher accountability and evaluation systems have contributed to what some see as a nationwide teacher shortage. To my knowledge, there is no empirical evidence to support this claim.

However, it is certainly possible that recent evaluation systems have made teaching less appealing in some circumstances — high-poverty schools, for instance, which already often struggle to recruit and retain teachers in part because of poor working conditions. Teachers in these schools are generally at greater risk of being identified as low-performing, and potentially fired, under new evaluation systems. Making the teaching profession riskier, in perception or reality, may make it less appealing.

Some lessons may be drawn from Washington, D.C., which has been among the most aggressive in identifying and dismissing struggling teachers in disadvantaged schools. Researchers have found that the district has been able to replace poor performers with better ones, perhaps in part because of high salaries differentiated by performance and school population. D.C. public schools have also developed performance screens when hiring that seem to be helpful in determining who will be effective in the classroom.

Districts with aggressive evaluation systems that generate more teacher dismissals should pay particular attention to this issue, and ought to consider pairing evaluation reform with higher salaries or other efforts to make the job more appealing.

Footnotes:

1. A handful of states use a ‘matrix’ model in which scores on two dimensions are combined to create a summative rating. This is essentially a cruder version of a percentage-based system.

Matt Barnum is a senior staff writer at The 74.

@matt_barnum matt@the74million.org



