2 New Reports Show That We Really Don’t Have a Great Way to Evaluate Teachers

By Matt Barnum

November 17, 2015

A pair of recently released reports, each finding serious flaws in one of the two most common methods used to evaluate teachers (observing them in the classroom and trying to pinpoint how much they affect individual student achievement), could leave policymakers wondering where to turn next.

A Nov. 10 statement released by the American Educational Research Association (AERA) and picked up by national news outlets calls into question the use of what are known as value-added models. The group cites studies that show flaws in value-added — statistical measures that attempt to isolate a teacher’s impact on student growth — including inconsistency from year to year and the shortcomings of standardized tests in gauging student learning.

According to the statement, using value-added models to evaluate teachers and principals, or the programs that train them, comes with “considerable risks of misclassification and misinterpretation.” On the other hand, the statement points to teacher observation as “a promising alternative.”¹

Not so fast.

A paper presented a couple of days later at a policy conference in Miami finds that teacher observations suffer from many of the same flaws that plague value-added measures.

For instance, a teacher’s observation score may be significantly biased by the students she teaches. Specifically, teachers of students with higher test scores tend to get higher ratings. The researchers, including lead author Matthew Steinberg of the University of Pennsylvania and Rachel Garrett of the American Institutes for Research, found that math teachers with the highest-achieving students were nearly seven times more likely to get the top observation rating than teachers with the lowest-achieving students. This generally lines up with a 2014 Brookings Institution report that found a similar bias in observations.

Steinberg and Garrett’s paper suggests that how students are placed into classrooms could affect teachers’ observation scores. For example, a principal might match students with behavioral problems to a specific teacher with strong classroom management skills. But that teacher, with a classroom full of unruly students, could then be rated lower.²

Just as the AERA statement warned against relying too heavily on value-added measures, Steinberg and Garrett urge “greater caution… when making high-stakes personnel decisions based largely on teachers’ classroom observation scores.”

Criticism of the value-added approach has gotten much greater attention — both from researchers and the media — than the weaknesses of teacher observation, which could mean that value-added is taking an unfair beating. After all, since both approaches have flaws — some of them similar — it’s difficult to decide which measure is better.

“I think [value-added] opponents tend to set unreasonably unattainable targets for what [it] has to achieve in order to be used at all,” says Morgan Polikoff, a University of Southern California professor.

Researchers actually have a better grasp of the strengths and limitations of value-added than of observations. There is evidence, for instance, that strong value-added is positively related to long-run student outcomes like income and college attendance. No such evidence, one way or the other, exists for observations.

So where do these conflicting findings leave policymakers? If both measures are flawed, can any high-stakes decisions be made based on them?

Before trying to answer that question, consider one important caveat: High-stakes decisions in education are unavoidable. School districts either grant a teacher tenure or they don’t; they either give a teacher a raise or they don’t; they either dismiss a teacher who may be struggling or they don’t.

These important decisions can’t be ducked; the key question, rather, is how they are made.

Steinberg and Polikoff agree that the best bet is using both measures. If the two point in the same direction, particularly year after year, that likely says something meaningful about a teacher’s performance.

Combining imperfect but useful measures — rather than ignoring either one because of its faults — might be as much as we can hope for. That may be cold comfort to teachers who face evaluation based on flawed data, but the only alternative to an imperfect evaluation system is no system at all.

Footnotes:

1. It’s not clear why AERA refers to observations as an “alternative,” since it appears that every district that uses a value-added model for teacher evaluation also uses some form of observation.

2. Some have raised similar concerns with value-added, though more recent research has suggested non-random sorting may not be a major issue for value-added.


Matt Barnum is a senior staff writer at The 74.


This story first appeared at The 74, a nonprofit news site covering education.
