
Researchers: No Consensus Against Using Test Scores in Teacher Evaluations, Contra Democratic Platform

This originally appeared as part of The 74’s Democratic National Convention Live Blog, which was produced in partnership with Bellwether Education Partners. See our full DNC archive.
The Democratic platform states, “We oppose … the use of student test scores in teacher and principal evaluations, a practice which has been repeatedly rejected by researchers.”
It’s not often that educational research is mentioned in a major party platform. But several researchers who study teacher evaluation say the suggestion that there is a scholarly consensus against using test scores in teacher evaluation is misleading.
The 74 contacted a number of researchers who have studied teacher evaluation or value-added measures, a common method for assessing teacher impact on student test-score growth.
“There are many ways in which the use of test scores to inform teacher evaluation and school accountability can and should be improved. But the wholesale rejection of using test scores to inform teacher evaluations is an unproductive reaction to the limitations of test-score-based evaluation metrics,” said Matthew Kraft of Brown. “A balanced reading of the literature suggests there is mixed evidence for and against using test-score-based evaluation metrics.”
Kirabo Jackson of Northwestern said he disagreed with the platform’s language and that “test scores measures are valid, albeit imperfect, measures of teacher impacts on student skills.”
“VAMs, for the teachers for whom they can be created, do provide a piece of information about teachers’ abilities to improve student test scores,” said Katharine Strunk of the University of Southern California. “I think the research suggests that we need multiple measures — test scores, observations, and others – to rigorously and fairly evaluate teachers.”
Matthew Steinberg of the University of Pennsylvania said, “My view is that there is not in fact a consensus among academic researchers, particularly economists, who do this work, that value-added scores should not be used in high stakes teacher evaluation systems.”
Jim Wyckoff (University of Virginia), Cory Koedel (University of Missouri), and Dan Goldhaber (University of Washington Bothell) all also agreed research did not support categorically rejecting test-based teacher evaluation.
Several of the researchers said that measures of test score growth had significant limitations, but also provided meaningful information about a teacher’s impact on long-run outcomes; moreover, other ways to evaluate educators, particularly classroom observations, have some of the same flaws as value-added. Some studies have found that teacher evaluations that include test scores can lead to improved student outcomes.
However, Jesse Rothstein of the University of California Berkeley said that while there was not a “full consensus” on the issue, “I do think the weight of the evidence, and the weight of expert opinion, points to the conclusion that we haven’t figured out ways to use test scores in teacher evaluations that yield benefits greater than costs.”
Susan Moore Johnson of Harvard agreed: “Both standardized tests and value-added methods — widely used to calculate each teacher’s contribution to her students’ learning — fall far short of what is required to make sound, high-stakes decisions about individual teachers. Because standardized tests often are poorly aligned with state standards or a required curriculum, they fail to accurately measure what teachers teach and students learn… Combining standardized tests and VAMs for use in teacher assessment is unwise and indefensible.”
The platform may have been referring to statements from the American Statistical Association and the American Educational Research Association that raise concerns about the use of value-added measures in teacher evaluation and describe their limitations. (Notably, though, neither statement says that such scores should not be used at all in evaluation.) A 2010 position paper signed by several prominent scholars also raised concerns, though a response by other researchers argued that value-added had an important role in teacher evaluation.
It’s hard to say what level of agreement amounts to a consensus, and The 74’s poll of just nine researchers may not be a representative sample of expert opinion.
And while the scholarly debate has focused on value-added measures, teachers are actually more likely to be evaluated via “student learning objectives.” The 74 previously reported that such measures have limited research evidence and several teachers say they can be easily gamed.
All told, though, the researchers’ responses highlight significant disagreement — rather than clear consensus — even among scholars on this important issue.
The Democratic platform is certainly right that some researchers reject test-based teacher evaluation — but that’s hardly the full picture.
Researchers’ responses
The following researchers were asked by The 74 to respond to the Democratic platform on test-based teacher evaluation and whether there is a consensus among researchers on this issue. The platform states, “We oppose … the use of student test scores in teacher and principal evaluations, a practice which has been repeatedly rejected by researchers.”
Dan Goldhaber, University of Washington Bothell
“I absolutely don’t agree that there is a consensus. I’d be surprised if researchers would agree that we should not consider student test scores AT ALL as a means of evaluation. That said, I can see how one could make a case that there is a consensus because of the [American Statistical Association] statement on value added for teacher evaluation and the [American Educational Research Association] on the use of value added for evaluation of teacher education programs. The problem with both of those statements is that they don’t consider a counterfactual means of evaluation, such as teacher observations. I know that if I was trying to predict whether a teacher is likely to raise the test scores of the students she has next year, I’d put a great deal more weight on value added than any of the other means of evaluating teachers that we commonly see employed. Value added certainly doesn’t capture everything about teachers’ contributions to student learning and growth, but it seems strange to suggest that what lots of research (e.g. the [Measures of Effective Teaching] study) shows is the best predictor of whether a teacher will contribute to the test achievement of future students should be off the table entirely!”
Kirabo Jackson, Northwestern University
“I disagree with this statement. It is not an accurate statement for quantitative education policy researchers who write on these issues. To speak to this, I can say three things. First, we know that test scores measure real skills and are predictive of students' subsequent educational attainment, criminality, and earnings. There is little disagreement on this point. Second, we know that teachers who systematically raise test scores tend to improve longer-run educational attainment and labor market outcomes. Most quantitative researchers agree on this point. Third, there are important actions taken by teachers that are not well-measured by teachers' effects on test scores. My work on teacher effects on socioemotional skills speaks to this point, and there is a growing consensus that this is true. That is to say that test scores measures are valid, albeit imperfect, measures of teacher impacts on student skills. However, it is important to note that while test scores (test score growth) are generally accepted as valid measures of a teacher's impact on some dimensions of student skills, test scores have not been found to be valid measures for principals. Put simply, the statement is valid for principals but not for teachers.”   
Susan Moore Johnson, Harvard University
“It may seem obvious that teachers should be evaluated based on their students’ learning. However, both standardized tests and value-added methods (VAMs) — widely used to calculate each teacher’s contribution to her students’ learning — fall far short of what is required to make sound, high-stakes decisions about individual teachers. Because standardized tests often are poorly aligned with state standards or a required curriculum, they fail to accurately measure what teachers teach and students  learn. Also, panels sponsored by the National Research Council and the American Statistical Association have found that VAMs are not sufficiently reliable for use in teacher evaluation. These are not subtle problems. Consequently, combining standardized tests and VAMs for use in teacher assessment is unwise and indefensible.”
Cory Koedel, University of Missouri
“This statement seems to ignore the scientific evidence on the quality of test-based measures of teacher performance. Although test-based measures have their weaknesses, well-designed research studies show that they are superior to available alternatives in predicting how much students learn when assigned to different teachers. This information can be used to help students learn more. Future research may indeed uncover other, better measures of teacher performance, but as of yet this has not happened. There is still much to learn about these and other measures — and even more to learn about how they can be effectively integrated into teacher evaluations in practice — but to imply that there is a consensus of this nature is misleading, premature, and unscientific.”
Matthew Kraft, Brown University
“There are many ways in which the use of test scores to inform teacher evaluation and school accountability can and should be improved. But the wholesale rejection of using test scores to inform teacher evaluations is an unproductive reaction to the limitations of test-score-based evaluation metrics. For example, the decision to grant a teacher tenure is an incredibly consequential decision for school-systems, teachers, and students.  It seems irresponsible to not use all available information, including test scores, to inform such a high-stakes decision.  
“A balanced reading of the literature suggests there is mixed evidence for and against using test-score-based evaluation metrics. Yes, it has been ‘repeatedly rejected’ by some researchers — but it has been championed by many others. The Democratic Party Platform states that ‘standardized tests must be reliable and valid,’ but research has revealed that we should be equally concerned about the reliability and validity of other widely-used evaluation metrics such as classroom observation scores. I’d suggest that rather than opposing any use of test scores, students and teachers would be better served if we focused our efforts on using them in smarter ways.”
Jesse Rothstein, University of California Berkeley
“This is a contentious issue, so I don’t think there’s full consensus. But I do think the weight of the evidence, and the weight of expert opinion, points to the conclusion that we haven’t figured out ways to use test scores in teacher evaluations that yield benefits greater than costs, and thus that we should not be making strong pushes to implement the existing approaches more broadly. There’s perhaps a bit less clarity on whether it is possible to use test scores in a more intelligent way — here, I still think most experts would say that there is no good way to use them, but some hold out hope that we’ll figure it out. Of course, there are some high profile economists who I don’t think agree with either of those statements, but I think they are a distinct minority both within economics and within the education research community more broadly.”
Matthew Steinberg, University of Pennsylvania
“I do not think that there is universal consensus to support that platform … The proverbial notion of throwing the baby out with the bathwater I think is not the right approach here. There are multiple pieces of information that we can use — from student test score information, from classroom observation scores, from student surveys — to build multiple measure teacher evaluation systems that provide information to improve teacher practice, identify areas of instructional need, and to some extent make high-stakes accountability decisions in terms of differentiating teacher effectiveness. None of these measures — whether they’re based on student test scores, professional observation of teacher practice, or student surveys — are free from bias. … My view is that there is not in fact a consensus among academic researchers, particularly economists, who do this work, that value-added scores should not be used in high stakes teacher evaluation systems.”
Katharine Strunk, University of Southern California
“I don’t think there is any scholar, although I could be wrong, who would suggest that test scores be used to generate the sole measure of teacher performance. But VAMs, for the teachers for whom they can be created, do provide a piece of information about teachers’ abilities to improve student test scores. True, they are not perfect. But a lot of the newer research coming out about observations — even the most rigorous observations based on the latest observation protocol and standards — show that observations are also imperfect. I think the research suggests that we need multiple measures — test scores, observations, and others — to rigorously and fairly evaluate teachers.”
