Group measures of performance, in which teachers are evaluated based on test scores of students or subjects they don’t teach. A common example is teachers being judged on the entire school’s math or English score even if they teach, say, art. This occurred in Florida, Tennessee, New Mexico, and New York and generated significant controversy.

Student learning objectives (SLOs), in which teachers set goals for student performance on a test, either one they create themselves or a standardized one. The goals are approved by their supervisor, who then assesses the teacher based on how well the students meet those goals. One study of schools in Austin, Texas, found no correlation between a teacher's SLO score and his or her VAM score, while another study in Denver, Colorado, found a moderate correlation. These results may arise because SLOs and VAMs assess different aspects of teacher quality, but they might also call into question whether SLOs are valid measures of teacher performance.
VAMs are among the most common. Another common model is known as student growth percentile, which, like VAM, measures growth in student test scores, but with a different mathematical technique. These models rank students with similar prior achievement based on how much growth they make. Unlike VAMs, such models often do not include controls for student characteristics like poverty, and so may unfairly disadvantage teachers of at-risk students.
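To make the idea of ranking "academic peers" concrete, here is a minimal Python sketch. It is a simplification under stated assumptions: operational student growth percentile models typically use quantile regression on prior scores rather than the fixed score bands used here, and the function name `growth_percentiles`, the `band_width` parameter, and the sample students are hypothetical. Like the models described above, it applies no controls for poverty or other student characteristics.

```python
from bisect import bisect_left, bisect_right


def growth_percentiles(students, band_width=10):
    """Toy growth-percentile sketch (hypothetical, not an operational SGP model).

    students: list of (student_id, prior_score, current_score).
    Students whose prior scores fall in the same band of width `band_width`
    are treated as academic peers. Each student's percentile is the share of
    peers with a lower current score, with ties (including the student) counted as half.
    """
    bands = {}
    for sid, prior, current in students:
        bands.setdefault(prior // band_width, []).append((sid, current))

    result = {}
    for peers in bands.values():
        scores = sorted(current for _, current in peers)
        n = len(scores)
        for sid, current in peers:
            below = bisect_left(scores, current)
            ties = bisect_right(scores, current) - below
            result[sid] = 100 * (below + 0.5 * ties) / n
    return result


# Hypothetical students: (id, prior score, current score)
students = [("A", 52, 60), ("B", 55, 70), ("C", 58, 65), ("D", 81, 85)]
# A has the lowest growth in its band, B the highest; D has no peers, so it sits at 50.
print(growth_percentiles(students))
```

Note that nothing in this calculation asks whether a student is poor; two students with the same prior score are compared directly, which is the source of the fairness concern raised above.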
Different VAMs also use different variables and demographic factors to create students’ estimated scores. In general, models that account for more student characteristics do a better job of ensuring a level playing field for teachers of academically challenged students.
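The effect of adding demographic controls can be illustrated with a small, hypothetical example. The sketch below assumes a simple linear specification (current score predicted from prior score, optionally plus a poverty indicator, fit by ordinary least squares) and defines a teacher's value-added as the average residual of that teacher's students. Real VAMs differ in many details; the function names and the data are made up. In the toy data both teachers have zero true effect, and poverty lowers scores by 5 points.

```python
def ols(X, y):
    """Least-squares coefficients via the normal equations (X'X) b = X'y."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    b = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    # Gaussian elimination with partial pivoting
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for c in range(col, k):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution
    coef = [0.0] * k
    for i in reversed(range(k)):
        coef[i] = (b[i] - sum(A[i][j] * coef[j] for j in range(i + 1, k))) / A[i][i]
    return coef


def value_added(records, use_poverty):
    """Average residual per teacher; records are (teacher, prior, poverty, current)."""
    X = [[1.0, p, pov] if use_poverty else [1.0, p] for _, p, pov, _ in records]
    y = [c for *_, c in records]
    coef = ols(X, y)
    sums, counts = {}, {}
    for row, (t, _, _, c) in zip(X, records):
        predicted = sum(bk * xk for bk, xk in zip(coef, row))
        sums[t] = sums.get(t, 0.0) + (c - predicted)
        counts[t] = counts.get(t, 0) + 1
    return {t: sums[t] / counts[t] for t in sums}


# Hypothetical data: teacher X teaches only students in poverty (poverty = 1).
records = [
    ("X", 40, 1, 45), ("X", 50, 1, 55), ("X", 60, 1, 65), ("X", 70, 1, 75),
    ("Y", 50, 0, 60), ("Y", 60, 0, 70), ("Y", 70, 0, 80), ("Y", 80, 0, 90),
]
print(value_added(records, use_poverty=False))  # X looks worse than Y
print(value_added(records, use_poverty=True))   # both close to zero
```

Without the poverty control, teacher X comes out roughly two points below average and teacher Y roughly two points above, even though neither teacher made any difference in this toy setup; with the control, both estimates are essentially zero.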
Some models compare teachers only to other teachers in the same school, though most compare teachers across a given state.
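The choice of comparison group can change how a teacher looks. In this hypothetical sketch, the same VAM score is converted to a percentile rank first against colleagues in one school and then against a larger statewide pool; the scores and the `percentile_rank` helper are made up for illustration.

```python
def percentile_rank(value, pool):
    """Percent of the pool scoring below `value`, counting ties as half."""
    below = sum(v < value for v in pool)
    ties = sum(v == value for v in pool)
    return 100 * (below + 0.5 * ties) / len(pool)


teacher_score = 1.0
school_pool = [1.0, 2.0, 3.0]                    # the teacher plus two colleagues
state_pool = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]    # hypothetical statewide scores
print(percentile_rank(teacher_score, school_pool))  # about 16.7
print(percentile_rank(teacher_score, state_pool))   # about 58.3
```

The same teacher ranks near the bottom of a strong school but above the middle statewide.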
Note that 'validity' is used here in the statistical sense: how well a measure captures what it purports to measure, in this case teacher effectiveness.
VAM scores can and do fluctuate from year to year, and much of this fluctuation is the result of imprecise measurement (also known as "error"). For example, one study found that 57 percent of teachers who were in the bottom fifth of performance in one year had moved to another level in the subsequent year, and that 8 percent of the bottom-level teachers were in the top performance category the following year. In general, the year-to-year correlation ranges between .2 (weak) and .7 (fairly high).¹
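Statistics like the one quoted above (the share of bottom-fifth teachers who land in a different fifth the next year) can be computed as in the following sketch. It assumes each teacher has one VAM score per year and assigns quintiles by rank; the teacher IDs, scores, and function names are hypothetical, and this is not the method of the study cited.

```python
def quintile(scores):
    """Map teacher -> quintile by rank (1 = bottom fifth, 5 = top fifth)."""
    ranked = sorted(scores, key=scores.get)
    n = len(ranked)
    return {t: 1 + (5 * i) // n for i, t in enumerate(ranked)}


def bottom_fifth_movement(year1, year2):
    """Among year-1 bottom-fifth teachers: (share who left the bottom fifth,
    share who reached the top fifth) in year 2."""
    q1, q2 = quintile(year1), quintile(year2)
    bottom = [t for t in q1 if q1[t] == 1]
    moved = sum(q2[t] != 1 for t in bottom) / len(bottom)
    to_top = sum(q2[t] == 5 for t in bottom) / len(bottom)
    return moved, to_top


# Hypothetical scores for ten teachers; t0 jumps from the bottom to the top in year 2.
year1 = {f"t{i}": float(i) for i in range(10)}
year2 = dict(year1)
year2["t0"] = 100.0
print(bottom_fifth_movement(year1, year2))  # (0.5, 0.5)
```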
Reliability tends to be higher for math teachers than for English teachers. Some (but not all) of this instability can be addressed by averaging multiple years of data. The year-to-career correlation of a given teacher's VAM, which ranged from .55 (medium) to .78 (high) in one study, is significantly higher than the year-to-year correlation.
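Why averaging helps can be shown with a small simulation. It assumes, purely for illustration, that each teacher has a fixed true effect and that each year's VAM score adds independent random noise of the same size as the spread in true effects. Real measurement error is not necessarily this well-behaved, and averaging does nothing about systematic bias, which is one reason it addresses only some of the instability. The function names and parameters are hypothetical.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate(n_teachers=2000, noise_sd=1.0, seed=0):
    rng = random.Random(seed)
    # Assumed: a stable true effect per teacher, plus independent noise each year.
    true = [rng.gauss(0, 1) for _ in range(n_teachers)]
    years = [[t + rng.gauss(0, noise_sd) for t in true] for _ in range(6)]
    year_to_year = pearson(years[0], years[1])
    # Average years 1-3 and years 4-6 separately, then correlate the two averages.
    avg_a = [sum(ys) / 3 for ys in zip(*years[:3])]
    avg_b = [sum(ys) / 3 for ys in zip(*years[3:])]
    return year_to_year, pearson(avg_a, avg_b)


print(simulate())
```

Under these assumptions, the correlation between two single years comes out near .5, while the correlation between two three-year averages comes out near .75.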
Finally, it's crucial to note that all performance measures have some degree of instability. There is less evidence about the reliability of these alternative measures, but what exists generally suggests that principal observations are somewhat more stable over time than VAM, though stability (reliability) does not imply validity. In other words, a measure could be consistent over time, like a teacher's height, but not a valid way to judge how well that teacher teaches. Note that 'reliability' is used here in the statistical sense, meaning a measure's consistency.
1. In statistical terms, a correlation coefficient ranges between -1 and 1. A correlation of 0 means there is no linear association whatsoever; 1 means a perfect positive correlation; and -1 means a perfectly negative correlation.
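For readers who want to see how the coefficient is calculated, here is a minimal Pearson correlation in Python, with small made-up inputs that produce the three reference values described in the footnote.

```python
def pearson(xs, ys):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


xs = [1, 2, 3, 4]
print(pearson(xs, [2, 4, 6, 8]))    # 1.0: perfect positive correlation
print(pearson(xs, [8, 6, 4, 2]))    # -1.0: perfect negative correlation
print(pearson(xs, [1, -1, -1, 1]))  # 0.0: no linear association
```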
We don’t know for sure yet, though there’s certainly a possibility that it will, and there is some evidence suggesting both positive and negative outcomes.
There is research showing that holding schools accountable for student test scores has led to cheating and teaching to the test. At the same time, there is evidence that test-based accountability for schools has, in many circumstances, increased student achievement both on high-stakes tests, like the yearly standardized tests, and on low-stakes exams, like the National Assessment of Educational Progress test given every two years.
However, the gains on the low-stakes tests are often not as dramatic as those on the high-stakes exams, which raises the question of whether teachers are teaching to the high-stakes tests or cheating on them.
There have been relatively few studies on how the use of VAM in districts and schools affects students. The few pieces of research that do exist offer reasons for both caution and optimism.

A study found that providing districts with value-added data did not lead to improved student outcomes (relative to similar districts that did not have access to such data).

A study that offered teachers with high VAM scores a $20,000 bonus for transferring to a high-poverty school produced significant student achievement gains in elementary grades but no effect in middle school.

A study of New York City’s tenure system — which was made more rigorous, partly by using VAM scores — found that the reforms likely led to improvements in teacher quality.

A study in which a group of New York City principals were given VAM scores produced small improvements in student achievement (relative to students of principals who were not given such data).
