Putting the ‘Achievable Over the Ambitious’: The Problem When Teachers Grade Themselves
Keep in mind that your boss could then use those grades to decide which employees should stay on, how much they should be paid or whether top management should let some of them go.
That’s a scenario fraught with potential conflict, but it’s one that in some ways confronts teachers in districts all over the country.
Pushed by the federal government, the vast majority of states now require teacher evaluations to include some measure of student achievement — but gauging student growth in grades and subjects traditionally without standardized tests has proven especially tricky.
For teachers who can’t be linked to a state test (by far the majority, at around three-quarters in most districts), there are often what are known as student learning objectives (SLOs), referred to in some places as SMART goals or student growth objectives.
The precise method varies from state to state, district to district, and even school to school, but it usually means that at the beginning of the year a teacher sets a goal for student learning, sometimes based on specific academic standards. At the end of the year, the teacher is evaluated based on the extent to which students hit those targets.
For example, a first-grade teacher might create an objective that 80 percent of his or her students will master addition and subtraction up to 20, based on a test the teacher creates. Then, at the end of the year, the teacher presents evidence to his or her evaluator on whether this goal was met. The evaluator then rates the teacher, usually based on a rubric that allows for partial credit if the teacher’s students fall short of the target. (See additional descriptions and examples in New York, Connecticut, Hawaii, Indiana, and Washington.)
Used in at least 25 states, according to a 2014 report, SLOs are often high stakes; in some places, like New York, they determine up to half of a teacher’s evaluation. These evaluations can be tied to tenure, dismissal, and salary. Although much of the impetus for changing teacher evaluation centered on the pervasively high marks teachers were receiving, in most states and districts, the vast majority of teachers are still rated good or great.
Interviews with several teachers across the country about how SLOs work revealed that, in many cases, the goals teachers set are measured using assessments those same teachers write, administer, grade, and report. Although standardized tests can be used, many states also allow regular classroom tests to count towards SLOs. Principals are supposed to oversee the process, but it’s not clear they have the time, ability, or knowledge to do so.
In other words, when it comes to a critical aspect of their employment, some teachers are grading themselves.
When Mireille Ellsworth, a high school English teacher in Hilo, Hawaii, learned about SLOs, she was skeptical. Her biggest concern was that requiring teachers to set goals for students and then judging them on whether the goals were met would reward teachers with low expectations, since an unambitious goal is easier to achieve than a challenging one.
“If you have low expectations for a kid, it becomes a self-fulfilling prophecy,” she said, a claim backed by some research.
Ellsworth was also worried that the SLO would be easy to game, since teachers designed and graded the test used to measure whether the objective was met. “If you knew how to put on a good dog-and-pony show … you could manipulate it easily,” she said. (Ellsworth refused to participate in the SLO process as part of her evaluation, and subsequently won an appeal against the state after receiving a subpar evaluation rating.)
Christine O’Neil, a middle school science teacher across the country in Bridgeport, Connecticut, raised similar concerns. In her district, teachers set two SLOs: one based on a standardized test and the other “can be based on whatever [type of test] you want, and, in fact, the union encourages teacher[s] to not base it on a standardized test,” she said.
This flexibility makes it easy to game, according to O’Neil. “There are many teachers that say, ‘I’m going to make my second SLO based on this rubric and I’m going to make sure when I grade kids on this rubric [that] enough of them pass that I meet my goals,’” she said.
The high-stakes nature of the evaluation system pushes teachers to dumb down expectations, she said.
“It’s incentivized to set low rigor goals,” O’Neil said.
Spokespeople from the Hawaii and Connecticut departments of education did not respond to requests for comment.
In most places, a teacher’s evaluator — usually the school principal or assistant principal — is supposed to protect against potential manipulation or too-easy goals. But some teachers interviewed were skeptical that principals had the ability or time to ensure SLO integrity.
Colleen Filush, a high school music teacher in Bridgeport, said that her SLO is based on whether her band students make improvements playing their instruments. Filush said that her administrators attend the concerts and visit classes to see if it seems like students are getting better — but they lack the expertise to measure students’ progress with any degree of precision.
“If you’re that evaluator, who doesn’t have the background, how do you really know what’s happening? There has to be a good rapport between the teacher and the evaluator, and a serious sense of trust,” she said.
Ellsworth, the Hawaii teacher, said in her experience administrators were so overburdened they couldn’t look carefully at SLOs. “Evaluators basically just rubber-stamped most of them,” she said.
On the other hand, Melissa Scherle, a second-grade teacher in Indianapolis, said the system generally works well from her perspective. She’s judged on how many words her students can read over time and the assessment is given by another teacher in her school to prevent cheating.
Scherle said she confers with her principal at the beginning, middle, and end of the year to discuss progress; her principal then uses a district-provided rubric to help determine what score to give a teacher based on how many students met their learning goals.
“My principal is very good at looking at all the factors,” Scherle said.
Nate Bowling, a high school government teacher in Washington and the state’s teacher of the year, said his SLOs (or student growth goals as they’re called there) are based on assessments that he creates, which he ensures are closely aligned to the classroom content. He agrees that the evaluator is crucial for the process to work.
“I think the system in the way that my principal has implemented it and evaluated me is pretty good … but I hear horror stories from [other] teachers” in different schools, he said.
At least one state, New Jersey, evaluates principals based on the quality of their teachers’ SLOs.
Creating assessments for “specials” teachers has long proven difficult for teacher evaluation systems. Districts appear to be using one or a combination of strategies.
In Indianapolis, Scherle said that all teachers made SLOs based on assessments related to their content area, such as a pacer and personal fitness test for gym teachers.
Other districts require all teachers, regardless of subject area, to have at least one goal related to core academic subjects. For instance, Filush, the music teacher from Bridgeport, said she is required to include a reading goal for her music class. For her part, Filush said she doesn’t mind this, because she’s been able to set a goal based on reading sheet music, integrating it seamlessly into her curriculum.
“I think if it’s structured the right way, it doesn’t take away; it enhances,” she said.
But O’Neil, the Bridgeport teacher at a different school, said she’s seen gym teachers helping prepare kids for English tests because those scores will count toward their evaluations.
“I do worry if our P.E. teachers are required to help our students with reading — our students only get 45 minutes of physical education twice a week and we need a lot more than that,” she said.
Because SLOs have only come into systemic use fairly recently, there is a relatively limited research base by which to judge them. Research is also complicated because SLOs lack a consistent definition or method of implementation.
“SLO systems are really new in terms of their implementation, so that newness means that the evidence is still being collected, and there’s very little research [on] the implementation of SLOs, particularly as it relates to teacher evaluation,” said Lisa Lachlan-Haché, a principal researcher at the American Institutes for Research, a nonprofit headquartered in Washington, D.C.
Little is known about the reliability or validity of SLOs in teacher evaluation. Most studies find that teachers’ SLO scores are at least somewhat related to their scores on more sophisticated measures of student growth, but some research finds no relationship at all. The differing results shouldn’t be surprising, since SLO practice likely varies so much from place to place.
One recent study found that SLO ratings were somewhat more differentiated — meaning not all teachers received top scores — than classroom observation scores, which tend to yield high marks across the board.
“There are some promising findings even in the early stages … in terms of having distributions that are more varied and less positively skewed,” Lachlan-Haché said.
Still, in New York, 94 percent of teachers scored effective or highly effective on their student learning objectives in the 2013-14 school year; this was slightly more varied than principal observations, but less so than measures of the teachers’ contribution to student growth on state tests.
In New Jersey, teachers’ SLO (or student growth) scores were significantly higher than both their observation and state test growth ratings. As a state report drily put it, “Results show that educators likely emphasized the achievable over the ambitious” in setting learning targets.
In another report, principals said that SLOs were the most difficult aspect of teacher evaluation to implement, and many said they felt unprepared to accurately judge teachers based on them. However, there is some evidence that SLOs improve over time once they’re put in place.
Several districts that use SLOs have found that the issue of teachers exploiting the system’s weaknesses was a major challenge. “All types of stakeholders [across districts] expressed concern about the potential for some teachers to ‘game the system’ by setting easily attainable goals,” according to a report.
But there are still relatively few studies on the extent to which SLOs, particularly those using non-standardized assessments, are subject to manipulation. In high-stakes evaluations in any profession — including but not limited to teachers — all measures might be gamed. Past research has found evidence of such practices — cheating and teaching to the test — when stakes are attached to standardized tests.
The difference, however, is that standardized exams usually have extensive test security procedures in place, covering how they are created, administered, and graded. SLOs based on teacher- or school-designed tests would likely be easier to manipulate.
Many teachers — even some who were skeptical about SLOs’ role in high-stakes evaluation — said the process of setting a goal for student growth and striving to achieve it had proven valuable to their practice.
“[SLOs are] a good way to dig into the data at the beginning of the year and know where your kids are at and know where you need to go,” said Scherle, the Indianapolis teacher.
O’Neil, the Bridgeport teacher, echoed this sentiment: “It’s so easy for us as teachers to live in the moment or live in tomorrow’s lesson plan that sometimes being forced to reflect on what we’re doing — on how we’re growing as individuals and as professionals — is so useful … I think [SLOs] can be a very helpful practice of growth and reflection and goal setting.”
Amy Rosno, a middle and high school English teacher at an online charter school in Wisconsin, wrote in an email, “By establishing [an] annual SLO, I am more focused. With the SLO and monthly data collection, I am able to easily keep tabs on student success and areas for concern.”
Bowling, the Washington teacher, said, “[The SLO process] helps me be intentional about lesson design and it helps me think specifically about what I’m doing in the classroom.”
Research has found that in districts implementing SLOs, teachers reported increased collaboration as a result. Other studies have shown that SLOs get teachers to spend more time analyzing student assessment data.
Generalizations about SLOs are challenging because of how much they vary from place to place and how little research exists. But the limited research and the experience of several teachers suggest that policymakers should use them with care when teachers are creating the tests and the stakes are high, because of the risk of manipulation and the temptation for teachers to lower their expectations.
Yet such measures do seem to have promise as a tool to help inform teachers’ instruction and enhance collaboration.
More studies need to be done, particularly on the question of SLOs’ validity: How easy are they to manipulate in practice? Are SLO scores valid measures of teacher quality? “There’s a growing body of research right now, [but] I think the jury is still out,” Lachlan-Haché, the researcher, said.
For now, teachers across the country will continue to be evaluated, and to evaluate themselves, based on these largely inscrutable measures.