40 Years After ‘A Nation At Risk,’ What Needs to Change About School Staffing & Teacher Quality to Better Serve Students

The status quo for in-service teachers — low-stakes assessments & rigid single-salary schedules — needs to change if student outcomes are to improve.

By Thomas S. Dee

February 13, 2024

The 74 is partnering with Stanford University’s Hoover Institution to commemorate the 40th anniversary of the ‘A Nation At Risk’ report. Hoover’s A Nation At Risk +40 research initiative spotlights insights and analysis from experts, educators and policymakers as to what evidence shows about the broader impact of 40 years of education reform and how America’s school system has (and hasn’t) changed since the groundbreaking 1983 report. Below is an excerpt from the project’s chapter on school staffing and teacher quality. (See our full series)

The publication of A Nation at Risk (ANAR) in 1983 was the defining moment of the “first wave” of education reform. It articulated improbably long-lived insights that continue to define education policy and discourse to this day. In particular, ANAR underscored, with uncommon rhetorical flourishes, the contrast between the ambitious ideals of a “Learning Society” and existing educational standards defined by modest minimum requirements, such as the low expectations embedded in high schools’ minimum competency tests and “cafeteria-style” curricula. Clearly, ANAR’s most prominent recommendation was the adoption of high school graduation requirements grounded in a “New Basics” curriculum that would feature four years of English; three years of science, math, and social studies; a half year of computer science; and, for college-bound students, two years of foreign language instruction.

However, ANAR also commented on several other dimensions of the education system in the United States, including the state of the teaching profession. In particular, ANAR concluded that “too many teachers are being drawn from the bottom quarter of graduating high school and college students.” The report also underscored the inadequate subject-matter focus of teacher training, low pay, teachers’ limited influence on key professional decisions (e.g., textbooks), and the targeted character of teacher shortages. These findings—and the seven specific recommendations ANAR made regarding teaching—have been the focus of education research, commentary, and policymaking to this day.

Below, I provide a compact overview of key insights from the research and policymaking that occurred in the wake of these recommendations. I focus specifically on the developments relevant to in-service teachers, while the important issues related to recruitment, induction, and mentoring in the teaching profession are addressed separately by Michael Hansen in a previous analysis. ANAR made four specific recommendations relevant to in-service teachers. One is that teacher salaries should be “professionally competitive, market-sensitive, and performance-based” and linked to “an effective evaluation system” that rewards effective teachers and guides underperforming teachers toward improvement or termination. A related second recommendation advocates for collectively developed “career ladder” designations that distinguish beginning, experienced, and master teachers. ANAR’s remaining two recommendations for in-service teachers focus on supporting teacher improvement through funded time for professional development.

Theories of Action

ANAR’s recommendations for in-service teachers tacitly reflect two broad and complementary theories of action for improving teacher effectiveness and student outcomes. One involves improving the effectiveness of existing teachers. The intent is for this to occur through professional development activities and through the implementation of well-designed financial and professional incentives. Both of these intend to promote an understanding of high-quality classroom practices as well as their consistent use. The second theory of action focuses on selection—that is, performance assessment systems designed to retain and elevate the most effective teachers while ensuring that persistently ineffective teachers exit the classroom. Notably, these policy recommendations stand in sharp contrast to conventional efforts to promote teacher effectiveness through generic salary increases unrelated to performance or need and through reducing class sizes by hiring more teachers.

The motivations for ANAR’s theories of action rest upon several important stylized facts about teachers that have become increasingly well established since its publication. Arguably, the most foundational evidence concerns the variation in effectiveness across teachers. An older debate had questioned whether there are aspects specific to teaching that make it prohibitively difficult to measure teacher effectiveness in a valid and reliable manner. However, richer data and methodological advances have led to a consensus about the general validity of teacher effectiveness measures while also acknowledging important evidence on the degree of noisiness common to such measures.

These studies indicate that the variation in teacher effectiveness is large, particularly relative to the effects of other promising education interventions. Specifically, a one-standard deviation improvement in teacher effectiveness corresponds to a gain in student performance on standardized tests of roughly 0.1 to 0.2 standard deviations. Critically, the manner in which teachers are currently assessed — that is, informal, “drive-by” evaluations — captures virtually none of this documented variation, rates the vast majority of teachers as satisfactory, and results in little performance-based attrition of low-performing teachers from the classroom.

Another important stylized fact is that, at the hiring stage, school leaders have little capacity to identify the teachers who will become more effective. This combination of facts (that teachers vary considerably in impact, but that this impact can be observed much more easily after several years in the classroom than at the hiring stage) suggests the need for broader access to the teaching profession coupled with discerning assessment systems that guide subsequent personnel decisions. In particular, decisions to tenure rather than dismiss the lowest-performing teachers can have dramatic consequences given the length of teaching careers.

Over the past fifteen years, this evidence has motivated a number of ambitious public and philanthropic efforts to systematically improve the effectiveness of the teacher workforce through performance-based assessment systems. Recent research has also provided more credible evidence of direct initiatives designed to improve the performance of all in-service teachers through professional development. I discuss these policy innovations and the related research below.

Improving teacher effectiveness

ANAR recommended that teachers receive eleven-month contracts so that they could spend more time in professional development and provide additional instruction for students with special needs. While the eleven-month contract has not been widely adopted, broader efforts to improve the performance of in-service teachers through direct training and support involve a substantial expenditure of time and money. However, accurately identifying the magnitude of these outlays is not straightforward given the accounting challenges of categorizing such activities and their demands on time for both teachers and nonteaching staff. For example, a 2019 study by Alexander and Jang examined expenditure reports for Minnesota school districts and found that 1 percent of 2013–14 operational expenditures was spent on activities defined by the state as staff development. In contrast, a 2015 study by the New Teacher Project found that 2013–14 expenses related to teacher improvement constituted, on average, 8 percent of district budgets. This figure consisted of both direct expenditures on teacher improvement, such as professional development, coaching, and new-teacher support, as well as related indirect expenditures, such as the management, strategic, and operational expenses for these improvement efforts.

Focusing specifically on professional development, a 2014 study commissioned by the Gates Foundation found that the typical teacher spends sixty-eight hours per year on professional learning directed by districts, or eighty-nine hours when courses and self-guided professional learning are included. Most of the time spent by teachers in professional development occurs in workshops and professional learning communities conducted by district staff. The cost of this professional development was estimated at $18 billion per year in 2014. Teacher perceptions of the quality of these investments have generally not been encouraging, nor do they appear to have clear links to teacher performance or improvement. The Gates report also stresses the overwhelming use of district staff instead of market-tested external providers to provide professional development, as well as limited teacher voice in choosing their training.

Despite the considerable expense and prominence of teacher professional development, credible research on the impact of these investments has also been quite limited over much of the period since ANAR’s publication. For example, Yoon et al. reviewed more than 1,300 studies potentially addressing the impact of teacher professional development on student learning and found only nine studies that met the evidence standards in the federal What Works Clearinghouse: six randomized controlled trials and three quasi-experimental studies conducted between 1986 and 2003. However, what these studies revealed suggests a striking proof of concept: teachers who received substantial professional development could boost the achievement of the average control-group student by 21 percentile points. Notably, these nine professional development initiatives focused on elementary grades but differed in their theories of action.

However, other quasi-experimental studies serve as a reminder that implementing effective professional development consistently at scale is a serious challenge. Jacob and Lefgren examined the effect of teacher training in Chicago Public Schools using a credible natural experiment in which schools with low baseline test scores received additional resources for staff development. They found that this initiative had “no statistically or academically significant effect” on math or reading achievement of elementary students. Similarly, Harris and Sass examined student-level longitudinal data linked to teacher data for the state of Florida and did not find an overall impact of professional development on teacher productivity. However, they did find positive effects of content-focused math professional development on student outcomes at the elementary and middle-school levels.

Over the past decade, experimental studies of teacher professional development have proliferated. In general, they have provided mixed evidence of the learning impact of investments in professional development. For example, experimental studies by Garet et al. found that reading- and math-focused training changed teacher knowledge and practice but without clearly improving student achievement. However, meta-analytic summaries of such experimental professional development evaluations suggest that positive effects exist but vary considerably by program design. For example, Basma and Savage examined seventeen literacy-focused professional development studies and found an overall effect size for reading achievement of 0.225. Similarly, in a meta-analysis of ninety-five STEM-focused professional development studies with experimental and quasi-experimental designs, Lynch et al. report an average effect size of 0.21.

However, other multisubject meta-analyses suggest smaller but still positive effects on student learning. For example, Fletcher-Wood and Zuccollo identified fifty-three experimental evaluations of teacher professional development and found an overall effect size of 0.09. Similarly, Sims et al. reviewed 104 experimental evaluations and found an overall effect size of 0.05. Given the considerable financial expense of most training investments, effects of this size, though positive, raise serious questions about cost-effectiveness.
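To put the effect sizes in this section on a common footing with the percentile-point figures quoted elsewhere, the following is a minimal Python sketch of the usual conversion. It assumes test scores are normally distributed and that an effect size is measured in control-group standard deviations. The function names are illustrative, and the conversion is a textbook approximation rather than a calculation taken from the studies cited.

```python
from statistics import NormalDist

# Standard normal distribution (mean 0, standard deviation 1).
STANDARD_NORMAL = NormalDist()


def percentile_gain(effect_size: float) -> float:
    """Percentile points gained by the median control-group student
    if their score rose by `effect_size` standard deviations.

    Assumes normally distributed scores (an assumption, not a finding).
    """
    return 100.0 * (STANDARD_NORMAL.cdf(effect_size) - 0.5)


def effect_size_from_gain(points: float) -> float:
    """Inverse conversion: the effect size implied by a percentile-point
    gain for the median control-group student."""
    return STANDARD_NORMAL.inv_cdf(0.5 + points / 100.0)


if __name__ == "__main__":
    # Effect sizes reported in the meta-analyses discussed in the text.
    reported = {
        "Basma and Savage (literacy)": 0.225,
        "Lynch et al. (STEM)": 0.21,
        "Fletcher-Wood and Zuccollo": 0.09,
        "Sims et al.": 0.05,
    }
    for label, d in reported.items():
        print(f"{label}: d = {d:.3f} -> {percentile_gain(d):.1f} percentile points")

    # The 21-percentile-point gain reported by Yoon et al., expressed as an effect size.
    print(f"21 percentile points -> d = {effect_size_from_gain(21.0):.2f}")
```

Under these assumptions, an effect size of 0.05 moves the median student by about 2 percentile points and an effect size of 0.225 by about 9, while a 21-percentile-point gain corresponds to an effect size a little above 0.5.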

These reviews also note and seek to examine the considerable variation across professional development programs in terms of impact. Kennedy argues that the widely discussed design features of teacher professional development — namely program duration, emphasis on content knowledge, and use of professional learning communities — are far less relevant than whether the training addresses any of the four persistent challenges of teaching: portraying content, managing student behavior, enlisting student participation, and knowing what students understand. In a similar vein, Sims et al. characterize professional development programs by the more general ways they change teacher skills and behaviors. Specifically, they characterize teacher professional development by four “IGTP” traits that indicate whether teachers are provided with new insights (I), goal-oriented behaviors (G), and techniques (T) that are embedded in practice (P). And they conclude that professional development programs with all four traits have an effect size on student learning of 0.17. However, these assessments may obscure the relevance of professional development initiatives that focus on the most effective elements of content and practice, such as an emphasis on “science of reading” approaches in literacy-focused training.

Overall, this evidence indicates that ANAR was prescient in emphasizing the need for ongoing training of in-service teachers. The available evidence suggests that such training can have substantial effects on student learning. However, realizing the increasingly well-established potential of this training is not straightforward. It involves the perennial challenge of translating research findings (that is, the critical design features of effective professional development) into genuine changes in high-impact practice at scale.

Teacher evaluation and performance-based incentives

ANAR also made prominent recommendations to dramatically change how we pay and evaluate public school teachers. In general, the status quo to this day compensates teachers according to single-salary schedules that rigidly structure pay according to years of experience and observed qualifications (e.g., a graduate degree) that do not consistently predict teacher effectiveness. This approach has historical origins in well-intentioned efforts to eliminate overt discrimination and capriciousness in teacher pay. Today, critics allege that this inflexible approach has led to low and undifferentiated salaries that do little to attract, motivate, and retain the most-effective teachers and to direct the least-effective teachers out of the classroom, particularly in hard-to-staff schools and high-need subjects. Furthermore, this approach to pay is coupled with low-stakes, “drive-by” teacher evaluations that capture little of the variation in teacher performance and do not provide reliable guidance for professional learning.

ANAR envisioned an alternative in which teacher compensation was substantially higher but also based on performance in a manner that would direct persistently underperforming teachers either to improve or to leave the profession. In the aftermath of ANAR’s publication, several states and districts experimented with providing teachers with extra pay and career-ladder recognitions for demonstrated merit (though generally not with dismissing chronically underperforming teachers). These reforms tended to be short-lived despite encouraging results. While the rollback of these reforms was clearly a policy choice, the underlying causes are debated. Ballou argued that it largely reflected the opposition of teachers’ unions. Murnane and Cohen contended that it reflected the distinctive character of teachers’ professional practice, which is multidimensional and difficult to observe. However, random-assignment evidence from a comparatively well-implemented career ladder program in Tennessee indicates that it was effective in identifying teachers who raised student achievement.

The past two decades have witnessed a diverse variety of ambitious efforts, often encouraged by prominent philanthropic and federal initiatives, to measure teacher performance and to link it to improvement supports and incentives such as financial benefits, career-ladder designations, and dismissal threats. The research on these different reforms suggests their promise but also underscores the nontrivial challenges (e.g., design features, implementation, and political credibility) that make the consistent realization of this promise difficult. For example, the Obama administration’s Race to the Top (RttT) initiative disbursed more than $5 billion to states in a competition based in part on their commitment to developing systems for promoting teacher effectiveness. While RttT was effective in promoting state policy adoption, its effects on key design features and implementation are far less clear. In particular, while states were more likely to have multiple measures of teacher performance in the wake of RttT, the use of this data to inform salary and retention decisions remained uncommon. The state reforms over this period were “rarely sustained over time,” offered low bonuses, and rated fewer than 1 percent of teachers as unsatisfactory.

A more granular focus on the available evidence from specific initiatives provides richer insights into these issues of design, implementation, and political durability. For example, several studies focused narrowly on simply providing teachers with incentives for improved performance. These studies often found null (or weak) effects that are likely to reflect the unique character of these programs. “Cash for test scores” experiments with individual incentives for teachers in Nashville and group incentives for teachers in Round Rock, Texas, found little to no evidence of effects on teacher practices, attitudes, and the learning gains of their students. Similarly, studies of a group-based teacher-incentive experiment in New York City found that it had no overall effects on key teacher or student outcomes.

Critics of teacher incentives suggest that these null findings reflect a misunderstanding of teacher motivations and the manner in which such incentives might debase intrinsic motivation. However, three design features of these studies could also contribute to these null findings and have important implications for performance-based assessment and compensation. First, the fact that participants know that these experimental incentives have a short term (e.g., two years) can sharply attenuate the resulting motivation to undertake changes in professional practices. This same concern can also apply to the incentives embedded in at-scale policy reforms that are viewed as faddish and unlikely to endure politically. Second, these initiatives generally focused on student achievement as the incentivized outcome. This may weaken the impact of incentives if teachers do not see or understand how they should change everyday practice to realize these rewards. A related third point is that these incentive studies generally did little to support and guide teachers in how they could change their professional practices to earn these rewards.

Three other studies suggest the potential importance of other design features. First, a teacher-incentive study in Chicago Heights, Illinois, found positive effects on student achievement (but only in the first wave of the experiment) when the incentives were framed as the loss of an award rather than a gain. Second, the Talent Transfer Initiative (TTI) found positive effects when offering high-performing teachers a high-powered incentive ($20,000) linked to a distinctly clear, easily observed, and important behavior: working in a hard-to-staff school for two years. However, it is notable that these incentive-based gains were difficult to realize. More than 1,500 teachers had to be approached in order to fill only eighty-one vacancies. Third, the Accelerating Campus Excellence (ACE) program in Dallas similarly provided large incentives to highly effective teachers willing to work in hard-to-staff schools. Morgan et al. presented evidence that ACE produced dramatic gains in student performance: a 0.3 effect size in reading and 0.4 in math. This study also found that this success replicated as the program went to scale and that these gains were reversed when the program was eliminated.

Notably, these focused incentive programs all fall short of the more comprehensive system of assessments, supports, and incentives recommended by ANAR. TAP: The System for Teacher and Student Advancement (formerly known as the Teacher Advancement Program), which was introduced in 1999 and is currently active in “nearly twenty states and hundreds of school districts across the US,” is closer to ANAR’s vision. Specifically, the defining features of TAP include career ladder designations for teachers and job-embedded, professional learning led by master teachers. In support of this professional learning, TAP also provides teachers with comprehensive evaluations of their professional practice. However, it is not clear that this “instructionally focused accountability” articulates clear mechanisms for directing consistently low-performing teachers out of the classroom (the selection mechanism in ANAR’s theory of change). Finally, TAP includes performance pay typically linked to observations of teachers’ professional practice, such as classroom observation, portfolios, and interviews, as well as test scores.

The available evidence suggests that TAP is effective in improving teacher performance and student outcomes. Specifically, in a quasi-experimental study based on 1,200 schools from two states, Springer, Ballou, and Peng found that TAP increased student performance, particularly at the elementary school level, with effect sizes varying from 0.12 to 0.34 by grade. Similarly, Cohodes, Eren, and Ozturk, leveraging the rollout of TAP across schools in South Carolina, found that it generated improvements in several long-run outcomes, including educational attainment, criminal activity, and the take-up of government assistance. However, a random-assignment evaluation of TAP in Chicago schools by Glazerman and Seifullah found that it did not improve student achievement and that it was also vexed by the challenges of implementing this reform with fidelity, such as teacher payouts being smaller than originally stated and no rewards based on value added because of inadequate data systems.

Two other high-profile studies provided further evidence of the serious challenges of implementing comprehensive reforms of teacher assessments and compensation as well as of credibly assessing their effects. The first example is the federal Teacher Incentive Fund (TIF). Congress established TIF in 2006 to provide grants to high-need schools implementing performance-based compensation systems. The four required components of TIF reforms also resembled those suggested by ANAR: (1) measures of teacher performance, including observations of classroom practice; (2) large, differentiated, difficult-to-earn performance bonuses; (3) additional pay for career-ladder opportunities, such as becoming a master teacher and coach; and (4) professional development linked to the teacher assessments. A congressionally mandated study of TIF focused on the 2010 grant recipients in more than 130 school districts and found that it raised student achievement by 1 to 2 percentile points in reading and math.

However, there are two important caveats to this evidence of modest impact. First, the implementation of these reforms in the study districts was incomplete. Only about half of the participating districts reported implementing all four components of the reforms required by TIF. In particular, professional development was frequently not provided, and most teachers received bonuses, “a finding inconsistent with making bonuses challenging to earn.” Second, the treatment–control contrast assessed in this random-assignment study did not examine the effect of TIF versus “business as usual.” Instead, the treatment schools in the study were intended to receive pay-for-performance bonuses while the control group received automatic bonuses. And all study participants, both treatment and control, were assigned access to the three other TIF components: career ladder responsibilities and rewards, evaluative feedback, and professional development. In this critical but often overlooked detail, the federal study of TIF more closely resembles the studies of teacher incentives noted above than a true evaluation of teacher assessment systems.

The Gates-funded Intensive Partnerships for Effective Teaching initiative is a second widely discussed example of implementing and evaluating teacher assessment systems. This initiative sought to introduce assessment reforms within three school districts and four charter management organizations. Similar to both TAP and TIF, this effort featured focused professional development and career ladder incentives along with performance pay and retention decisions based on direct, structured observation of teacher practice and value-added scores. A quasi-experimental study found that these reforms did not clearly improve the focal student outcomes of high school graduation and college attendance. However, the implementation of the reforms appears to have been weak. The teacher evaluations flagged few teachers as poor performers, and in sites with available data, only 1 percent were dismissed for poor performance. As with the federal TIF evaluation, the treatment contrast that was studied was muted because the comparison schools in this study often adopted similar policies.

IMPACT, the highly controversial teacher assessment reforms introduced in the District of Columbia Public Schools (DCPS), is distinctive as a seminal and enduring effort to implement ANAR’s recommendations with fidelity. IMPACT evaluated DCPS teachers on multiple measures with a heavy emphasis on structured classroom observations, including some conducted by district staff, and linked professional development. These evaluations resulted in measures of teacher performance that exhibited variation rather than being largely uniform. IMPACT linked these measures to high-stakes consequences: substantial pay increases for “highly effective” teachers, particularly those in high-poverty schools; dismissal for a small number of “ineffective” teachers; and a dismissal threat for “minimally effective” teachers who did not become effective within a year.

A quasi-experimental study of the incentive contrasts embedded in IMPACT found it had positive effects on teacher performance. This study’s design leveraged a feature of IMPACT in which teachers with performance scores just below a threshold value were deemed “minimally effective” and subject to a dismissal threat while those with scores at or above the threshold were not. A comparison of teachers just below and above this threshold found that the threat of dismissal caused minimally effective teachers either to leave the district or to improve their measured performance substantially. A powerful financial incentive for highly effective teachers to repeat their prior performance also appeared to have positive effects.
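The threshold comparison described above follows the logic of a regression discontinuity design. The Python sketch below illustrates that logic on simulated data: teachers just below a rating cutoff face a dismissal threat, and the estimate is the gap between two straight-line fits evaluated at the cutoff. Every number here (the cutoff of 250, the size of the jump, the noise level, the bandwidth) is invented for illustration and is not drawn from IMPACT data, and the local linear estimator is a generic textbook version, not necessarily the specification used in the study.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rd_estimate(scores, outcomes, cutoff, bandwidth):
    """Estimated jump in the outcome at the cutoff.

    Fits one line to teachers just below the cutoff and another to those
    at or above it (scores are centered on the cutoff), then returns the
    below-cutoff intercept minus the above-cutoff intercept.
    """
    below = [(s - cutoff, y) for s, y in zip(scores, outcomes)
             if cutoff - bandwidth <= s < cutoff]
    above = [(s - cutoff, y) for s, y in zip(scores, outcomes)
             if cutoff <= s <= cutoff + bandwidth]
    intercept_below, _ = fit_line(*zip(*below))
    intercept_above, _ = fit_line(*zip(*above))
    return intercept_below - intercept_above


def simulate(n, cutoff, true_effect, seed):
    """Hypothetical data: next-year performance rises smoothly with the
    current score, plus an extra `true_effect` for teachers below the
    cutoff (those facing the dismissal threat), plus random noise."""
    rng = random.Random(seed)
    scores, outcomes = [], []
    for _ in range(n):
        score = rng.uniform(cutoff - 50.0, cutoff + 50.0)
        threatened = score < cutoff
        outcome = 0.5 * (score - cutoff) + rng.gauss(0.0, 5.0)
        if threatened:
            outcome += true_effect
        scores.append(score)
        outcomes.append(outcome)
    return scores, outcomes


if __name__ == "__main__":
    scores, outcomes = simulate(n=6000, cutoff=250.0, true_effect=10.0, seed=1)
    estimate = rd_estimate(scores, outcomes, cutoff=250.0, bandwidth=25.0)
    print(f"Estimated effect of the dismissal threat at the cutoff: {estimate:.2f}")
```

The design rests on the idea that teachers scoring just below and just above the cutoff are otherwise similar, so a gap in later outcomes at the cutoff can be attributed to the consequence attached to the rating rather than to underlying differences in teacher quality.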

Three other aspects of IMPACT merit emphasis. First, the political credibility and resiliency of IMPACT appeared to be highly salient. In 2010, when the city (and district) leadership who championed IMPACT were forced out of office, the first “minimally effective” designations did not appear to change teacher behavior. However, the ratings reported in the summer of 2011, when it appeared that IMPACT would endure, did drive changes in teacher behavior.

Second, evidence indicates that IMPACT not only improved the performance of existing teachers but also replaced underperforming teachers who exited with substantially more effective instructors. Specifically, a quasi-experimental study by Adnot et al. finds that, when a low-performing teacher exited, their replacement raised student performance by 0.14 standard deviations in reading and 0.24 standard deviations in math. Third, the performance benefits of IMPACT’s incentives endured through subsequent revisions to the teacher supports and ratings structure.

A second district reform of note (and one with strong parallels to IMPACT) began in the Dallas Independent School District in 2015. Specifically, like IMPACT, the Teacher Excellence Initiative (TEI) replaced a single-salary schedule with compensation based on multiple measures of teacher performance. Furthermore, like IMPACT, it did so in the context of accountability for school principals. TEI also implemented a unique design feature to discourage inflated or arbitrary ratings of teachers. It fixed the overall distribution of ratings and penalized principals for subjective ratings that were highly misaligned with test-based ratings. A synthetic-control study by Hanushek et al. found that these reforms led to statistically significant increases in student achievement that grew over time to roughly 0.2 standard deviations in math and 0.1 standard deviations in reading.

Concluding thoughts

ANAR’s recommendations that focused on improving the effectiveness of in-service teachers were a harbinger of some of the most dramatic education policy innovations of the past forty years. And these innovations have provided us with several proofs of concept and new insights that establish the potential to improve student learning through dramatic changes in teacher evaluation, in-service training, and compensation.

However, it must also be acknowledged that there has clearly not been large-scale, lasting change regarding ANAR’s teacher-focused recommendations. Uninformative, low-stakes assessments of professional practice and rigid single-salary schedules are still the norm for the vast majority of teachers in US public schools. And while in-service teachers do engage in extensive professional development, the impact of these expensive and highly variable investments is uncertain at best.

Any serious effort to reimagine the assessment, training, and compensation of in-service teachers should begin by confronting the factors that have contributed to the long durability of the status quo. There appear to be three broad and interrelated impediments to substantive change. The first is the need to improve the knowledge base of how best to design the key features of these reforms. For example, efforts to improve teacher evaluation and introduce performance-based teacher pay rely critically on valid and reliable measures of teacher performance. Promising gains in measuring teacher effectiveness are likely to come from continued improvements to structured rubrics for classroom practices. Incentives can better guide the professional improvement of teachers when they are linked to the high-impact, everyday classroom practices teachers directly control and can enhance through complementary training.

Another important area where improved knowledge is critical to driving at-scale change concerns the design of teacher professional development. The typical professional development experience, workshops directed by internal district staff, is often criticized (e.g., the New Teacher Project 2015). At the same time, a recent and growing body of experimental studies indicates that purposively designed professional development can have substantial impact. This literature generally emphasizes the particular benefits of in-service training that focuses on meeting more general challenges of teacher practice. While more can be learned about the design of professional development, the question of how to design its delivery is even more uncertain. A study from the Gates Foundation suggests that relying more on external providers of professional development will make it easier to move nimbly to market-tested and effective approaches. However, several of the teacher assessment reforms discussed here instead emphasize redesigning internally provided professional development to rely on master teachers who may be better positioned to serve as coaches providing embedded and relevant training. These issues underscore the need to build a complementary learning agenda around any new reforms (e.g., inquiry cycles, networked improvement communities).

A second impediment to realizing ANAR’s vision concerns the multifaceted operational challenges of implementing meaningful reforms effectively at scale. The null findings from credibly identified studies of professional development in at-scale field settings point to this issue. However, more-direct and sobering evidence comes from several well-funded, high-profile efforts to introduce teacher assessment and compensation reforms at some scale. These include (1) the failure to deliver value-added bonuses because of data-system inadequacies in TAP; (2) the limited variation in teacher ratings and their infrequent use in personnel decisions in the Gates Foundation’s Intensive Partnerships for Effective Teaching; (3) the inconsistent delivery of professional development and the broad distribution of bonuses under the federal Teacher Incentive Fund; and (4) the limited use of teacher evaluations to guide salary and retention decisions under the RttT initiative.

A third and closely related impediment is political opposition. With regard to introducing performance-based pay, this most obviously refers to the opposition of teachers’ unions. However, it can also involve unresponsive public-sector bureaucracies. Furthermore, reform efforts can also fail when their success and durability rely on politically determined funding commitments. The political opposition to reform in the broader public also turns on misinformation about what the existing evidence discussed here actually indicates. Specifically, opponents of the types of reforms recommended by ANAR often argue that investments in professional development are effective while performance-based pay has failed.

Given these interlocking issues, a promising way to achieve change at scale may involve forming political coalitions around compelling reforms that adopt some but not all of ANAR’s proposals. For example, it may be possible to move school districts toward more effective professional development delivered by a carefully curated set of outside vendors if their provision involved cost-sharing that saved district resources. Alternatively, it may be possible to achieve durable political support for a teacher evaluation system if that system focuses narrowly on identifying master teachers and providing them with training and extra pay to coach their peers but takes a more incremental approach toward dismissing underperforming teachers. Intentionally combining such efforts with careful evaluation could, over the longer term, seed further evidence-based change in this important domain.

See the full Hoover Institution initiative: A Nation At Risk +40.

Thomas S. Dee is an education professor at Stanford University, a senior fellow at both the Hoover Institution and the Stanford Institute for Economic Policy Research, and a research associate at the National Bureau of Economic Research.


                <h1>Key to Improving America’s Schools: Rethinking School Staffing & Teacher Quality</h1>

                <h2>The status quo for in-service teachers — low-stakes assessments & rigid single-salary schedules — needs to change if student outcomes are to improve.</h2>

                <p class="sans">By <a rel="author" href="https://www.the74million.org/contributor/706069/">Thomas S. Dee</a></p>

                <img src="https://www.the74million.org/wp-content/uploads/2024/02/teacher-quality-a-nation-at-risk-education.jpg">

                <p>This story first appeared at <a href="https://www.the74million.org">The 74</a>, a nonprofit news site covering education. <a href="https://www.the74million.org/about/newsletters/?utm_source=republish-button&utm_medium=website&utm_campaign=republish">Sign up for free newsletters from The 74</a> to get more like this in your inbox.</p>
                
<p><em>The 74 is partnering with Stanford University’s Hoover Institution to commemorate the 40th anniversary of the ‘A Nation At Risk’ report. Hoover’s </em><a href="https://www.hoover.org/nation-risk-40-review-progress-us-public-education">A Nation At Risk +40<em> research initiative</em></a><em> spotlights insights and analysis from experts, educators and policymakers as to what evidence shows about the broader impact of 40 years of education reform and how America’s school system has (and hasn’t) changed since the groundbreaking 1983 report. Below is an excerpt from the project’s chapter on <a href="https://www.hoover.org/research/school-staffing-and-teacher-quality-nation-risk-40">school staffing and teacher quality</a>. (</em><a href="https://www.the74million.org/series/a-nation-at-risk-plus-40/"><em>See our full series</em></a><em>)</em></p>



<p></p>



<p>The publication of A Nation at Risk (ANAR) in 1983 was the defining moment of the “first wave” of education reform. It articulated improbably long-lived insights that continue to define education policy and discourse to this day. In particular, ANAR underscored, with uncommon rhetorical flourishes, the contrast between the ambitious ideals of a “Learning Society” and existing educational standards defined by modest minimum requirements, such as the low expectations embedded in high schools’ minimum competency tests and “cafeteria-style” curricula. Clearly, ANAR’s most prominent recommendation was the adoption of high school graduation requirements grounded in a “New Basics” curriculum that would feature four years of English; three years of science, math, and social studies; a half year of computer science; and, for college-bound students, two years of foreign language instruction.</p>



<p>However, ANAR also commented on several other dimensions of the education system in the United States, including the state of the teaching profession. In particular, ANAR concluded that “too many teachers are being drawn from the bottom quarter of graduating high school and college students.” The report also underscored the inadequate subject-matter focus of teacher training, low pay, teachers’ limited influence on key professional decisions (e.g., textbooks), and the targeted character of teacher shortages. These findings—and the seven specific recommendations ANAR made regarding teaching—have been the focus of education research, commentary, and policymaking to this day. </p>







<p>Below, I provide a compact overview of key insights from the research and policymaking that occurred in the wake of these recommendations. I focus specifically on the developments relevant to in-service teachers, while the important issues related to recruitment, induction, and mentoring in the teaching profession are addressed separately by Michael Hansen <a href="https://www.the74million.org/article/40-years-after-a-nation-at-risk-key-lessons-for-reinvigorating-americas-teacher-workforce/">in a previous analysis</a>. ANAR made four specific recommendations relevant to in-service teachers. One is that teacher salaries should be “professionally competitive, market-sensitive, and performance-based” and linked to “an effective evaluation system” that rewards effective teachers and guides underperforming teachers toward improvement or termination. A related second recommendation advocates for collectively developed “career ladder” designations that distinguish beginning, experienced, and master teachers. ANAR’s remaining two recommendations for in-service teachers focus on supporting teacher improvement through funded time for professional development.</p>



<aside class="inline_story shortcode simple"><a href="https://www.the74million.org/article/40-years-after-a-nation-at-risk-key-lessons-for-reinvigorating-americas-teacher-workforce/"><figure style="background-image: url(https://www.the74million.org/wp-content/uploads/2024/01/a-nation-at-risk-plus-40-teachers.jpg);"></figure><div><span class="sans related_tag">Related</span><h4 class="sans">‘A Nation At Risk’ Turns 40: How America Can Reinvigorate Its Teacher Workforce</h4></div></a></aside>



<p></p>



<h3 class="wp-block-heading">Theories of Action</h3>



<p>ANAR’s recommendations for in-service teachers tacitly reflect two broad and complementary theories of action for improving teacher effectiveness and student outcomes. One involves improving the effectiveness of existing teachers. The intent is for this to occur through professional development activities and through the implementation of well-designed financial and professional incentives. Both are intended to promote an understanding of high-quality classroom practices as well as their consistent use. The second theory of action focuses on selection: that is, performance assessment systems designed to retain and elevate the most effective teachers while ensuring that persistently ineffective teachers exit the classroom. Notably, these policy recommendations stand in sharp contrast to conventional efforts to promote teacher effectiveness through generic salary increases unrelated to performance or need and through reducing class sizes by hiring more teachers.</p>



<p>The motivations for ANAR’s theories of action rest upon several important stylized facts about teachers that have become increasingly well established since its publication. Arguably, the most foundational evidence concerns the variation in effectiveness across teachers. An older debate had questioned whether there are aspects specific to teaching that make it prohibitively difficult to measure teacher effectiveness in a valid and reliable manner. However, richer data and methodological advances have led to a consensus about the general validity of teacher effectiveness measures while also acknowledging important evidence on the degree of noisiness common to such measures.</p>



<p>These studies indicate that the variation in teacher effectiveness is large, particularly relative to the effects of other promising education interventions. Specifically, a one-standard-deviation improvement in teacher effectiveness corresponds to a gain in student performance on standardized tests of roughly 0.1 to 0.2 standard deviations. Critically, the manner in which teachers are currently assessed (that is, informal, “drive-by” evaluations) captures virtually none of this documented variation, rates the vast majority of teachers as satisfactory, and results in little performance-based attrition of low-performing teachers from the classroom.</p>



<p>Another important stylized fact is that, at the hiring stage, school leaders have little capacity to identify the teachers who will become more effective. This combination of facts (teachers vary considerably in impact, but this impact can be observed much more easily after several years in the classroom than at the hiring stage) suggests the need for broader access to the teaching profession coupled with discerning assessment systems that guide subsequent personnel decisions. In particular, decisions to tenure rather than dismiss the lowest-performing teachers can have dramatic consequences given the length of teaching careers.</p>



<p>Over the past fifteen years, this evidence has motivated a number of ambitious public and philanthropic efforts to systematically improve the effectiveness of the teacher workforce through performance-based assessment systems. Recent research has also provided more credible evidence of direct initiatives designed to improve the performance of all in-service teachers through professional development. I discuss these policy innovations and the related research below.<br></p>



<h3 class="wp-block-heading">Improving teacher effectiveness</h3>



<p>ANAR recommended that teachers receive eleven-month contracts so that they could spend more time in professional development and provide additional instruction for students with special needs. While the eleven-month contract has not been widely adopted, broader efforts to improve the performance of in-service teachers through direct training and support involve a substantial expenditure of time and money. However, accurately identifying the magnitude of these outlays is not straightforward given the accounting challenges of categorizing such activities and their demands on time for both teachers and nonteaching staff. For example, a 2019 study by Alexander and Jang examined expenditure reports for Minnesota school districts and found that 1 percent of 2013–14 operational expenditures was spent on activities defined by the state as staff development. In contrast, a 2015 study by the New Teacher Project found that 2013–14 expenses related to teacher improvement constituted, on average, 8 percent of district budgets. This figure consisted of both direct expenditures on teacher improvement, such as professional development, coaching, and new-teacher support, as well as related indirect expenditures, such as the management, strategic, and operational expenses for these improvement efforts.</p>



<p>Focusing specifically on professional development, a 2014 study commissioned by the Gates Foundation found that the typical teacher spends sixty-eight hours per year on professional learning directed by districts, or eighty-nine hours when courses and self-guided professional learning are included. Most of the time spent by teachers in professional development occurs in workshops and professional learning communities conducted by district staff. The cost of this professional development was estimated at $18 billion per year in 2014. Teacher perceptions of the quality of these investments have generally not been encouraging, nor do they appear to have clear links to teacher performance or improvement. The Gates report also stresses the overwhelming use of district staff instead of market-tested external providers to provide professional development, as well as limited teacher voice in choosing their training.</p>



<p>Despite the considerable expense and prominence of teacher professional development, credible research on the impact of these investments has also been quite limited over much of the period since ANAR’s publication. For example, Yoon et al. reviewed more than 1,300 studies potentially addressing the impact of teacher professional development on student learning and found only nine studies that met the evidence standards in the federal What Works Clearinghouse: six randomized controlled trials and three quasi-experimental studies conducted between 1986 and 2003. However, what these studies revealed suggests a striking proof of concept: teachers who received substantial professional development could boost the achievement of the average control-group student by 21 percentile points. Notably, these nine professional development initiatives focused on elementary grades but differed in their theories of action.</p>



<p>However, other quasi-experimental studies serve as a reminder that implementing effective professional development consistently at scale is a serious challenge. Jacob and Lefgren examined the effect of teacher training in Chicago Public Schools using a credible natural experiment in which schools with low baseline test scores received additional resources for staff development. They found that this initiative had “no statistically or academically significant effect” on math or reading achievement of elementary students. Similarly, Harris and Sass examined student-level longitudinal data linked to teacher data for the state of Florida and did not find an overall impact of professional development on teacher productivity. However, they did find positive effects of content-focused math professional development on student outcomes at the elementary and middle-school levels.</p>



<p>Over the past decade, experimental studies of teacher professional development have proliferated. In general, they have provided mixed evidence of the learning impact of investments in professional development. For example, experimental studies by Garet et al. found that reading- and math-focused training changed teacher knowledge and practice but without clearly improving student achievement. However, meta-analytic summaries of such experimental professional development evaluations suggest that positive effects exist but vary considerably by program design. For example, Basma and Savage examined seventeen literacy-focused professional development studies and found an overall effect size for reading achievement of 0.225. Similarly, in a meta-analysis of ninety-five STEM-focused professional development studies with experimental and quasi-experimental designs, Lynch et al. report an average effect size of 0.21.</p>



<p>However, other multisubject meta-analyses suggest smaller but still positive effects on student learning. For example, Fletcher-Wood and Zuccollo identified fifty-three experimental evaluations of teacher professional development and found an overall effect size of 0.09. Similarly, Sims et al. reviewed 104 experimental evaluations and found an overall effect size of 0.05. Given the considerable financial expense of most training investments, effects of this size, though positive, raise serious questions about cost-effectiveness.</p>



<p>These reviews also note and seek to examine the considerable variation across professional development programs in terms of impact. Kennedy argues that the widely discussed design features of teacher professional development — namely program duration, emphasis on content knowledge, and use of professional learning communities — are far less relevant than whether the training addresses any of the four persistent challenges of teaching: portraying content, managing student behavior, enlisting student participation, and knowing what students understand. In a similar vein, Sims et al. characterize professional development programs by the more general ways they change teacher skills and behaviors. Specifically, they characterize teacher professional development by four “IGTP” traits that indicate whether teachers are provided with new insights (I), goal-oriented behaviors (G), and techniques (T) that are embedded in practice (P). And they conclude that professional development programs with all four traits have an effect size on student learning of 0.17. However, these assessments may obscure the relevance of professional development initiatives that focus on the most effective elements of content and practice, such as an emphasis on “science of reading” approaches in literacy-focused training.</p>



<p>Overall, this evidence indicates that ANAR was prescient in emphasizing the need for ongoing training of in-service teachers. The available evidence suggests that such training can have substantial effects on student learning. However, realizing the increasingly well-established potential of this training is not straightforward. It involves the perennial challenge of translating research findings (that is, the critical design features of effective professional development) into genuine changes in high-impact practice at scale.<br></p>



<h3 class="wp-block-heading">Teacher evaluation and performance-based incentives</h3>



<p>ANAR also made prominent recommendations to dramatically change how we pay and evaluate public school teachers. In general, the status quo to this day compensates teachers according to single-salary schedules that rigidly structure pay according to years of experience and observed qualifications (e.g., a graduate degree) that do not consistently predict teacher effectiveness. This approach has historical origins in well-intentioned efforts to eliminate overt discrimination and capriciousness in teacher pay. Today, critics allege that this inflexible approach has led to low and undifferentiated salaries that do little to attract, motivate, and retain the most-effective teachers and to direct the least-effective teachers out of the classroom, particularly in hard-to-staff schools and high-need subjects. Furthermore, this approach to pay is coupled with low-stakes, “drive-by” teacher evaluations that capture little of the variation in teacher performance and do not provide reliable guidance for professional learning.</p>



<p>ANAR envisioned an alternative in which teacher compensation was substantially higher but also based on performance in a manner that would direct persistently underperforming teachers either to improve or to leave the profession. In the aftermath of ANAR’s publication, several states and districts experimented with providing teachers with extra pay and career-ladder recognitions for demonstrated merit (though generally not with dismissing chronically underperforming teachers). These reforms tended to be short-lived despite encouraging results. While the rollback of these reforms was clearly a policy choice, the underlying causes are debated. Ballou argued that it largely reflected the opposition of teachers’ unions. Murnane and Cohen contended that it reflected the distinctive character of teachers’ professional practice, which is multidimensional and difficult to observe. However, random-assignment evidence from a comparatively well-implemented career ladder program in Tennessee indicates that it was effective in identifying teachers who raised student achievement.</p>



<p>The past two decades have witnessed a diverse variety of ambitious efforts, often encouraged by prominent philanthropic and federal initiatives, to measure teacher performance and to link it to improvement supports and incentives such as financial benefits, career-ladder designations, and dismissal threats. The research on these different reforms suggests their promise but also underscores the nontrivial challenges (e.g., design features, implementation, and political credibility) that make the consistent realization of this promise difficult. For example, the Obama administration’s Race to the Top (RttT) initiative disbursed more than $4 billion to states in a competition based in part on their commitment to developing systems for promoting teacher effectiveness. While RttT was effective in promoting state policy adoption, its effects on key design features and implementation are far less clear. In particular, while states were more likely to have multiple measures of teacher performance in the wake of RttT, the use of these data to inform salary and retention decisions remained uncommon. The state reforms over this period were “rarely sustained over time,” offered low bonuses, and rated fewer than 1 percent of teachers as unsatisfactory.</p>



<p>A more granular focus on the available evidence from specific initiatives provides richer insights into these issues of design, implementation, and political durability. For example, several studies focused narrowly on simply providing teachers with incentives for improved performance. These studies often found null (or weak) effects that are likely to reflect the unique character of these programs. “Cash for test scores” experiments with individual incentives for teachers in Nashville and group incentives for teachers in Round Rock, Texas, found little to no evidence of effects on teacher practices, attitudes, or the learning gains of their students. Similarly, studies of a group-based teacher-incentive experiment in New York City found that it had no overall effects on key teacher or student outcomes.</p>



<p>Critics of teacher incentives suggest that these null findings reflect a misunderstanding of teacher motivations and the manner in which such incentives might debase intrinsic motivation. However, three design features of these studies could also contribute to these null findings and have important implications for performance-based assessment and compensation. First, the fact that participants know that these experimental incentives have a short term (e.g., two years) can sharply attenuate the resulting motivation to undertake changes in professional practices. This same concern can also apply to the incentives embedded in at-scale policy reforms that are viewed as faddish and unlikely to endure politically. Second, these initiatives generally focused on student achievement as the incentivized outcome. This may weaken the impact of incentives if teachers do not see or understand how they should change everyday practice to realize these rewards. A related third point is that these incentive studies generally did little to support and guide teachers in how they could change their professional practices to earn these rewards.</p>



<p>Three other studies suggest the potential importance of other design features. First, a teacher-incentive study in Chicago Heights, Illinois, found positive effects on student achievement (but only in the first wave of the experiment) when the incentives were framed as the loss of an award rather than a gain. Second, the Talent Transfer Initiative (TTI) found positive effects when offering high-performing teachers a high-powered incentive ($20,000) linked to a distinctly clear, easily observed, and important behavior: working in a hard-to-staff school for two years. However, it is notable that these incentive-based gains were difficult to realize. More than 1,500 teachers had to be approached in order to fill only eighty-one vacancies. Third, the Accelerating Campus Excellence (ACE) program in Dallas similarly provided large incentives to highly effective teachers willing to work in hard-to-staff schools. Morgan et al. presented evidence that ACE produced dramatic gains in student performance: a 0.3 effect size in reading and 0.4 in math. This study also found that this success replicated as the program went to scale and that these gains were reversed when the program was eliminated.</p>



<p>Notably, these focused incentive programs all fall short of the more comprehensive system of assessments, supports, and incentives recommended by ANAR. TAP: The System for Teacher and Student Advancement (formerly known as the Teacher Advancement Program), which was introduced in 1999 and is currently active in “nearly twenty states and hundreds of school districts across the US,” is closer to ANAR’s vision. Specifically, the defining features of TAP include career ladder designations for teachers and job-embedded, professional learning led by master teachers. In support of this professional learning, TAP also provides teachers with comprehensive evaluations of their professional practice. However, it is not clear that this “instructionally focused accountability” articulates clear mechanisms for directing consistently low-performing teachers out of the classroom (the selection mechanism in ANAR’s theory of change). Finally, TAP includes performance pay typically linked to observations of teachers’ professional practice, such as classroom observation, portfolios, and interviews, as well as test scores.</p>



<p>The available evidence suggests that TAP is effective in improving teacher performance and student outcomes. Specifically, in a quasi-experimental study based on 1,200 schools from two states, Springer, Ballou, and Peng found that TAP increased student performance, particularly at the elementary school level, with effect sizes varying from 0.12 to 0.34 by grade. Similarly, Cohodes, Eren, and Ozturk, leveraging the rollout of TAP across schools in South Carolina, found that it generated improvements in several long-run outcomes, including educational attainment, criminal activity, and the take-up of government assistance. However, a random-assignment evaluation of TAP in Chicago schools by Glazerman and Seifullah found that it did not improve student achievement and that it was also vexed by the challenges of implementing this reform with fidelity, such as teacher payouts being smaller than originally stated and no rewards based on value added because of inadequate data systems.</p>



<p>Two other high-profile studies provided further evidence of the serious challenges of implementing comprehensive reforms of teacher assessments and compensation as well as of credibly assessing their effects. The first example is the federal Teacher Incentive Fund (TIF). Congress established TIF in 2006 to provide grants to high-need schools implementing performance-based compensation systems. The four required components of TIF reforms also resembled those suggested by ANAR: (1) measures of teacher performance, including observations of classroom practice; (2) large, differentiated, difficult-to-earn performance bonuses; (3) additional pay for career-ladder opportunities, such as becoming a master teacher and coach; and (4) professional development linked to the teacher assessments. A congressionally mandated study of TIF focused on the 2010 grant recipients in more than 130 school districts and found it led to student achievement of 1 to 2 percentile points higher in reading and math.</p>



<p>However, there are two important caveats to this evidence of modest impact. First, the implementation of these reforms in the study districts was incomplete. Only about half of the participating districts reported implementing all four components of the reforms required by TIF. In particular, professional development was frequently not provided, and most teachers received bonuses, “a finding inconsistent with making bonuses challenging to earn.” Second, the treatment–control contrast assessed in this random assignment study did not examine the effect of TIF versus “business as usual.” Instead, the treatment schools in the study were intended to receive pay-for-performance bonuses while the control group received automatic bonuses. And all study participants, both treatment and control, were assigned access to the three other TIF components: career ladder responsibilities and rewards, evaluative feedback, and professional development. In this critical but often overlooked detail, the federal study of TIF more closely resembles the studies of teacher incentives noted above than a true evaluation of teacher assessment systems.</p>



<p>The Gates-funded Intensive Partnerships for Effective Teaching initiative is a second widely discussed example of implementing and evaluating teacher assessment systems. This initiative sought to introduce assessment reforms within three school districts and four charter management organizations. Similar to both TAP and TIF, this effort featured focused professional development and career ladder incentives along with performance pay and retention decisions based on direct, structured observation of teacher practice and value-added scores. A quasi-experimental study found that these reforms did not clearly improve the focal student outcomes of high school graduation and college attendance. However, the implementation of the reforms appears to have been weak. The teacher evaluations flagged few teachers as poor performers, and in sites with available data, only 1 percent were dismissed for poor performance. As with the federal TIF evaluation, the treatment contrast that was studied was muted because the comparison schools in this study often adopted similar policies.</p>



<p>IMPACT, the highly controversial teacher assessment reforms introduced in the District of Columbia Public Schools (DCPS), is distinctive as a seminal and enduring effort to implement ANAR’s recommendations with fidelity. IMPACT evaluated DCPS teachers on multiple measures with a heavy emphasis on structured classroom observations, including some conducted by district staff, and linked professional development. These evaluations resulted in measures of teacher performance that exhibited variation rather than being largely uniform. IMPACT linked these measures to high-stakes consequences: substantial pay increases for “highly effective” teachers, particularly those in high-poverty schools; dismissal for a small number of “ineffective” teachers; and a dismissal threat for “minimally effective” teachers who did not become effective within a year.</p>



<p>A quasi-experimental study of the incentive contrasts embedded in IMPACT found it had positive effects on teacher performance. This study’s design leveraged a feature of IMPACT in which teachers with performance scores just below a threshold value were deemed “minimally effective” and subject to a dismissal threat while those with scores at or above the threshold were not. A comparison of teachers just below and above this threshold found that the threat of dismissal caused minimally effective teachers either to leave the district or to improve their measured performance substantially. A powerful financial incentive for highly effective teachers to repeat their prior performance also appeared to have positive effects.</p>



<p>Three other aspects of IMPACT merit emphasis. First, the political credibility and resiliency of IMPACT appeared to be highly salient. In 2010, when the city (and district) leadership who championed IMPACT were forced out of office, the first “minimally effective” designations did not appear to change teacher behavior. However, the ratings reported in the summer of 2011, when it appeared that IMPACT would endure, did drive changes in teacher behavior.</p>



<p>Second, evidence indicates that IMPACT not only improved the performance of existing teachers but also replaced underperforming teachers who exited with substantially more effective instructors. Specifically, a quasi-experimental study by Adnot et al. finds that, when a low-performing teacher exited, their replacement raised student performance by 0.14 standard deviations in reading and 0.24 standard deviations in math. Third, the performance benefits of IMPACT’s incentives endured through subsequent revisions to the teacher supports and ratings structure.</p>



<p>A second district reform of note (and one with strong parallels to IMPACT) began in the Dallas Independent School District in 2015. Specifically, like IMPACT, the Teacher Excellence Initiative (TEI) replaced a single-salary schedule with compensation based on multiple measures of teacher performance. Like IMPACT, it did so in the context of accountability for school principals. TEI also implemented a unique design feature to discourage inflated or arbitrary ratings of teachers: it fixed the overall distribution of ratings and penalized principals for subjective ratings that were highly misaligned with test-based ratings. A synthetic-control study by Hanushek et al. found that these reforms led to statistically significant increases in student achievement that grew over time to roughly 0.2 standard deviations in math and 0.1 standard deviations in reading.</p>



<h3 class="wp-block-heading">Concluding thoughts</h3>



<p>ANAR’s recommendations that focused on improving the effectiveness of in-service teachers were a harbinger of some of the most dramatic education policy innovations of the past forty years. And these innovations have provided us with several proofs of concept and new insights that establish the potential to improve student learning through dramatic changes in teacher evaluation, in-service training, and compensation.</p>



<p>However, it must also be acknowledged that there has clearly not been large-scale, lasting change regarding ANAR’s teacher-focused recommendations. Uninformative, low-stakes assessments of professional practice and rigid single-salary schedules are still the norm for the vast majority of teachers in US public schools. And while in-service teachers do engage in extensive professional development, the impact of these expensive and highly variable investments is uncertain at best.</p>



<p>Any serious effort to reimagine the assessment, training, and compensation of in-service teachers should begin by confronting the factors that have contributed to the durability of the status quo. There appear to be three broad and interrelated impediments to substantive change. The first is the need to improve the knowledge base of how best to design the key features of these reforms. For example, efforts to improve teacher evaluation and introduce performance-based teacher pay rely critically on valid and reliable measures of teacher performance. Promising gains in measuring teacher effectiveness are likely to come from continued improvements to structured rubrics for classroom practices. Incentives can better guide the professional improvement of teachers when they are linked to the high-impact, everyday classroom practices teachers directly control and can enhance through complementary training.</p>



<p>Another important area where improved knowledge is critical to driving at-scale change concerns the design of teacher professional development. The typical professional development experience, workshops directed by internal district staff, is often criticized (e.g., the New Teacher Project 2015). At the same time, a recent and growing body of experimental studies indicates that purposively designed professional development can have substantial impact. This literature generally emphasizes the particular benefits of in-service training that focuses on meeting more general challenges of teacher practice. While more can be learned about the design of professional development, the question of how to design its delivery is even more uncertain. A study from the Gates Foundation suggests that relying more on external providers of professional development will make it easier to move nimbly to market-tested and effective approaches. However, several of the teacher assessment reforms discussed here instead emphasize redesigning internally provided professional development to rely on master teachers who may be better positioned to serve as coaches providing embedded and relevant training. These issues underscore the need to build a complementary learning agenda (e.g., inquiry cycles, networked improvement communities) around any new reforms.</p>



<p>A second impediment to realizing ANAR’s vision concerns the multifaceted operational challenges of implementing meaningful reforms effectively at scale. The null findings from credibly identified studies of professional development in at-scale field settings point to this issue. However, more-direct and sobering evidence comes from several well-funded, high-profile efforts to introduce teacher assessment and compensation reforms at some scale. These include (1) the failure to deliver value-added bonuses because of data-system inadequacies in TAP; (2) the limited variation in teacher ratings and their infrequent use in personnel decisions in the Gates Foundation’s Intensive Partnerships for Effective Teaching; (3) the inconsistent delivery of professional development and the broad distribution of bonuses under the federal Teacher Incentive Fund; and (4) the limited use of teacher evaluations to guide salary and retention decisions under the RttT initiative.</p>



<p>A third and closely related impediment is political opposition. With regard to introducing performance-based pay, this most obviously refers to the opposition of teachers’ unions. However, it can also involve unresponsive public-sector bureaucracies. Furthermore, reform efforts can also fail when their success and durability rely on politically determined funding commitments. The political opposition to reform in the broader public also turns on misinformation about what the existing evidence discussed here actually indicates. Specifically, opponents of the types of reforms recommended by ANAR often argue that investments in professional development are effective while performance-based pay has failed.</p>



<p>Given these interlocking issues, a promising way to achieve change at scale may involve forming political coalitions around compelling reforms that adopt some but not all of ANAR’s proposals. For example, it may be possible to move school districts toward more effective professional development delivered by a carefully curated set of outside vendors if their provision involved cost-sharing that saved district resources. Alternatively, it may be possible to achieve durable political support for a teacher evaluation system if that system focuses narrowly on identifying master teachers and providing them with training and extra pay to coach their peers but takes a more incremental approach toward dismissing underperforming teachers. Intentionally combining such efforts with careful evaluation could, over the longer term, seed further evidence-based change in this important domain.</p>



<p><em>See the full Hoover Institution initiative: </em><a href="https://www.hoover.org/nation-risk-40-review-progress-us-public-education"><em>A Nation At Risk +40</em></a><em>. </em></p>
