In this interview, Jackson, an economist, discusses his work on the importance of school spending and why students seem to benefit from increased resources. He also talks about what drew him to education research, what test scores do not measure, how providing teachers with curriculum resources can improve student achievement, and what research he’s working on presently.
The interview has been edited for clarity and length.
The 74: Can you tell me just a little bit about your background, how you became a researcher and what interested you in education research specifically?
Jackson: I’ve always been interested in social science, and the question of how and why there are differences in outcomes across different populations has always been something that’s interested me. If you look across countries, there are some countries where people are living very well on average and others where people aren’t living so well. In general, that tends to correlate very heavily with the level of education in those countries.
My dad is also an economist, and I grew up hearing these things. One strategy for many countries, especially after they gained independence, was to invest a lot in education, and the idea was that by investing in education you are investing in economic growth and that’s going to lead to prosperity. That’s at the national level, but even within a country, when you look at areas where there’s a lot of poverty, it tends to be associated with low levels of educational attainment as well.
Education has always been something that seems like it’s going to be the key to reducing some of these disparities. That’s why I’ve always been interested in the basic topic.
One critique I’ve heard of the role of economists in education is that sometimes you all can be hyperfocused on the numbers and that there can be a sort of “garbage in, garbage out” problem when there’s too much of a focus on numbers in education, especially when the prevailing numbers are test scores. Do you think there’s any validity to that criticism?
For certain I agree with the idea that garbage in means garbage out, but I don’t think it’s fair to criticize economists. I think in general what economists have done is, we try to be very disciplined and we believe in being able to quantify things, generally. We believe in analyzing things we can measure, and if you can’t measure it, then you can’t verify things. I think that’s what separates us from other fields. We are social scientists, we’re not in the humanities, and if you’re doing social science, measurement is a big part of that, and being able to falsify theories is part of that, and the only way to falsify something is doing empirical work where you can document things. Because of that, economists have been very focused on things that can be measured, and test scores are one of those things that can be very well measured. You can always make the argument, along the lines of the joke of the drunk man who’s lost his keys — he’s fumbling around and he says, “I lost my keys over there, but I’m looking over here because this is where there’s light.”
In some sense, perhaps economists are guilty of that, but I think most of us who are really thoughtful about it — and I think many economists are — recognize that there is a problem in just focusing on, say, test scores because we can measure it, and are informed by the work of psychologists, sociologists and anthropologists who have looked in classrooms and looked in schools and thought hard about what are the things that matter.
Some of us, like myself, have thought about ways we can measure some of the more difficult-to-measure, nebulous concepts. That’s the way I’ve taken it, which is not to say we’re not going to focus on measurement, but to say, can we move measurement forward to capture more of the things that matter that may or may not be measured by test scores. The critique that test scores are not the only thing or the best thing is absolutely valid. But I would push back against the claim that movement toward measuring things and verifying things is a bad thing. I think it’s a good move and we just have to be smarter about measurement.
Let’s talk about some of your specific research interests, starting with your frequently cited research on school funding. Can you talk about the impetus for your research and how you approached the question?
The question started out with the acknowledgment that if you look at the literature on the effects of school spending on student outcomes, it’s actually pretty old. A lot of the studies that have looked closely at the effect of school spending on student outcomes were written in the 1990s, some in the early 2000s.
The quality of the data that we have on student outcomes has improved dramatically in the past 15 to 20 years. It actually relates to the first question you asked about the growth of the economics of education — we have much better data on educational outcomes than we used to have. Also the methods that economists have been using, specifically applied economists, to tease out causal relationships from data have improved dramatically in the past 15 to 20 years. I’m not saying that to criticize the old literature — just that the old literature is old.
If you look at the old literature, the first thing that sort of struck me is that there is an idea, which is very pervasive among economists, that school spending doesn’t matter. It’s something I learned in graduate school. In casual conversation with most economists, they would say, “Yeah, yeah, we know that school spending doesn’t matter.” I sort of started from that standpoint and thought, Let me look at the literature and see what the evidence base is for that statement. As I kept on looking through, it became pretty clear that the evidence supporting that idea was pretty weak. It just would not stand up to the level of scrutiny that we put on empirical evidence today.
So you looked at court-ordered school spending.
That’s right. One of the problems with the existing literature at the time was that it was basically based on correlations between spending and outcomes. We know that raw correlations may give you the right answer, but they could also give you the wrong answer because we don’t know what else could be going on in the data. If you wanted to get a sense of what the causal relationship was between increases in school spending and student outcomes, ideally you want something that’s kind of random. In the extreme, what you really want is to just randomly give some school districts more money and see what happens to the kids who are in those schools. That’s in the ideal case what you’d want. The closest approximation of that in the real world, that we could think of, was these school finance reforms.
The school finance reforms started in 1972. The basic idea was that a court case would be brought against a state arguing that its system for distributing funds to schools was inequitable, and if the courts decided the existing system was not equitable, the system typically had to be overhauled. In response, states would change their funding formulas, which typically resulted in some school districts getting additional funds within a year or two of the court decision. We argue that this is kind of random: we know that it’s going to be the lower-spending districts that get more money, but the timing of when that money drop occurs is essentially random. Comparing the change in outcomes between students who were in school during the money drop and those who were in school before it, relative to the same change in districts that didn’t get any money, approximates this randomized experiment.
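The comparison described here can be pictured, in stylized form, as a two-by-two difference-in-differences: the before-to-after change in reform districts minus the same change in districts that got no money. The Python sketch below assumes exactly that simplified setup; the function name and the numbers are hypothetical, and this is not the study's actual specification.

```python
def diff_in_diff(treated_before: float, treated_after: float,
                 control_before: float, control_after: float) -> float:
    """Two-by-two difference-in-differences estimate (illustrative sketch).

    'before'/'after' are mean outcomes for cohorts in school before vs.
    during the money drop; 'treated' districts received reform money and
    'control' districts did not.
    """
    treated_change = treated_after - treated_before
    control_change = control_after - control_before
    # The control districts' change stands in for what would have happened
    # in the treated districts without the extra money.
    return treated_change - control_change


# Hypothetical mean years of completed schooling, for illustration only.
effect = diff_in_diff(treated_before=12.0, treated_after=12.5,
                      control_before=12.1, control_after=12.3)
print(round(effect, 2))
```

Subtracting the control districts' change is what removes trends that would have happened anyway, which is the role the districts "that didn’t get any money" play in the design.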
What were the main effects of school spending that you found?
What we basically do is compare the educational attainment, earnings and poverty levels in adulthood for individuals who were different ages when the money drop occurred. The finding is that individuals who were of school-going age, between 5 and 17 years old, when their school district experienced an inflow of additional money had higher educational attainment, higher earnings and lower levels of adult poverty compared to their counterparts from the same district who were, say, 18 years old when the money drop happened and so weren’t in school at that point.
How do you know that the money drops were random? We might worry that it’s not random — and this has been a critique of your paper — in that maybe courts drop the money when they think the district is most in need or are responsive to some other non-random factor.
It’s certainly possible that it’s not random, and as I said, it’s not a randomized experiment, so there’s always the possibility in any empirical work that there could be some other confounding variables — even if you have an experiment, you’re still not sure. In our analysis we’re very, very careful to present an array of evidence suggesting that the effects are real.
One thing that we show is that the more years you’re in school after the money arrives, the more it affects you. If you look at kids who were at the same school at different points in time, the kids who were, say, 12 years old when the money drop occurred benefit more than kids who were 16 years old when it occurred. Kids who were 18 years old are totally unaffected. That tells you that whatever happened has to do with actually being in school and the duration of time the individual spent in a school with the additional money.
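The dose-response argument rests on how many school-going years each cohort spent after the money arrived. This Python sketch counts those years for the ages mentioned above, assuming continuous enrollment from age 5 through 17 and that the extra money persists once it arrives; the helper and constant names are hypothetical.

```python
SCHOOL_START_AGE = 5   # first school-going age mentioned in the interview
SCHOOL_END_AGE = 17    # last school-going age mentioned in the interview


def years_exposed(age_at_money_drop: int) -> int:
    """School-going years (ages 5 to 17) spent after the money drop.

    Assumes a child attends every year from 5 through 17 and that the
    additional funding continues once it arrives.
    """
    first_exposed_age = max(age_at_money_drop, SCHOOL_START_AGE)
    return max(0, SCHOOL_END_AGE - first_exposed_age + 1)


for age in (3, 12, 16, 18):
    print(age, years_exposed(age))
```

Under these assumptions a 12-year-old gets six funded years, a 16-year-old two and an 18-year-old none, which is the ordering of effects described above.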
Do you worry that how court-ordered money is spent might be different than money that goes through the regular political meat grinder or the state funding formula — might those two different pots of money be spent differently, so that your results may not apply to non-court-ordered spending?
Absolutely. The benefit of the study is that arguably we’ve identified spending changes that could be considered quasi-random, so we’re getting a real causal relationship. The downside of that is that it’s a causal relationship for a specific kind of spending increase, so what you gain in credibility, you lose a little bit in external applicability, and that’s always a trade-off that you’re going to face.
It’s the same problem if you run an experiment: what may hold in an experimental setting may not hold in a non-experimental setting, and this is a similar situation. The kinds of effects we’re observing, associated with school finance reforms, may not be the same thing you experience for just general increases in spending or what you would experience if you were to slash spending. That’s always something to think about, and I don’t really have a good answer for that. That’s something I’m researching right now: trying to look at what happens when schools lose money, specifically what happened during the recession.
One idea that has a lot of traction in the political debate is the idea that spending money wisely is much more important than how much money is spent. Indeed, some scholars and researchers also make that claim. What do you think of that?
I find the dichotomy kind of annoying, to be perfectly honest. If you take a step back, whether money is spent efficiently or not is certainly important. But if you don’t have the money, you can’t spend it. It’s really quite as simple as that.
The fact that there may be waste does not in any way, shape or form negate the fact that more money may have a positive effect. We should do both: We should spend more money and we should spend it more wisely. They’re just not opposed to each other.
Your research finds that the districts that got this court-ordered money spent it on some traditional inputs, right?
Yes. To be clear, there’s no way to know everything they did, because we can only look at the things that we measured. But in terms of the things that we measured, we were able to see that the districts that got the increased spending used it to reduce class sizes, increase teacher pay and increase the length of the school year. That’s the basic finding, but there could have been other things that went on as well that we did not measure.
Let’s talk about your research on teacher quality and what you found in terms of teachers’ impacts on students’ outcomes beyond test scores.
I’m actually working on a few things on the same idea. In one paper, I started out from the standpoint that we know that teacher effects on test scores measure something, and we’ve seen through work by Raj Chetty and co-authors that teachers who improve test scores have important impacts on long-run outcomes for kids. We know that they measure something, and it’s a valuable tool to assess teacher quality.
And in fact you’ve also documented that test-score value-added [teachers’ impact on their students’ test score growth] measures something, separately from Chetty et al.’s work, right?
That is correct. It’s a real thing.
This paper did not set out to be a critique of using test scores. It really asks the question: We know that test scores measure something and that teacher effects on test scores measure something, but how much are they not measuring?
Informed by literature in economics that tries to infer non-cognitive skills, soft skills or socioemotional skills from behaviors, I identified a set of non-test-score outcomes: behaviors that might be sensitive to non-cognitive skills that are not well picked up by test scores. The ones I used specifically are grades, attendance, discipline and on-time grade progression. The idea is that students who come to school on time and come to school every day are the types of students who are exhibiting stick-to-it-iveness, grit and academic engagement, and those may not be exactly the same traits picked up by having higher test scores.
What I do basically is create an index of these behaviors and see whether teachers have systematic effects on those skills. The question, once I do that, is, Do the teachers who improve these skills also improve long-run outcomes in ways that we would not detect if we used their effects on test scores alone? And the answer is an unequivocal “yes.”
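One simple way to build an index like the one described here is to standardize each behavior and average the results. The Python sketch below assumes equal weights, one list per behavior with one entry per student, and suspensions as the discipline measure. The function names, weighting and data are illustrative assumptions, not the paper's actual construction, and the sketch covers only the index, not the separate step of estimating each teacher's effect on it.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize to mean 0, standard deviation 1 (values must not all be equal)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def behavior_index(grades, attendance, suspensions, on_time):
    """Equal-weighted average of standardized behaviors, one value per student.

    Suspensions are negated so that a higher index always means better
    behavior. Equal weighting is an illustrative assumption.
    """
    columns = [zscores(grades), zscores(attendance),
               [-z for z in zscores(suspensions)], zscores(on_time)]
    # zip(*columns) regroups the four standardized behaviors by student.
    return [mean(student) for student in zip(*columns)]


# Hypothetical data for four students.
index = behavior_index(grades=[2.0, 3.0, 3.5, 4.0],
                       attendance=[150, 170, 175, 180],
                       suspensions=[3, 1, 0, 0],
                       on_time=[0, 1, 1, 1])
print([round(x, 2) for x in index])
```

Standardizing first keeps a behavior measured in days (attendance) from swamping one measured in grade points.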
In that sense, the teachers who are very good at improving test scores are not necessarily the same ones who are good at improving these soft skills or non-cognitive skills, right?
That is correct. Now, to be clear, the teachers who improve one tend to be on average better at the other one, so I’m definitely not finding that teachers who are good at test scores are bad at improving these soft skills.
So there’s a positive correlation, but not a particularly strong one, between the two sets of skills.
That is correct.
But that does not mean there’s not a trade-off. You can imagine that it depends on the context. Even if they’re positively correlated in the data that I look at, if you put into place an incentives system that says we’re going to reward you for improving your students’ test scores, and we’re going to ignore what you’re doing with soft skills, you might actually create a situation where you could generate a negative correlation between the two. The fact that they’re positively correlated does not mean there’s not a trade-off — there could very well be a trade-off.
To put numbers on this, only about 5 or 6 percent of the variability of the teacher effect on test scores is associated with the effect on these softer skills, and vice versa.
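Assuming the "5 or 6 percent of the variability" refers to the share of variance one teacher effect explains in the other under a simple linear relationship (an R² of 0.05 to 0.06), the implied correlation is the square root of that share, consistent with a positive but weak relationship:

```latex
R^2 = \rho^2 \quad\Longrightarrow\quad \rho = \sqrt{R^2}, \qquad
\sqrt{0.05} \approx 0.22, \qquad \sqrt{0.06} \approx 0.24
```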
Can you describe the relationship between the teacher effects and students’ long-run outcomes?
In this current study, I look at students in ninth grade and their ninth-grade teachers. Then I look at their long-run outcomes: their 12th-grade outcomes, whether they graduate from high school and whether they report that they’re going to attend college. The finding there is that the teacher effects on these soft skills are much more predictive of their effects on these longer-run outcomes than the teacher effects on test scores. Teacher effects on test scores do predict these outcomes: teachers who raise test scores are associated with higher levels of high school graduation and higher levels of students planning to go to a four-year college, but the effects of teachers on the soft skills are much larger in magnitude.
Some people have looked at your research and made the argument that this suggests that test-score-based value-added measures of teacher quality shouldn’t be used for individual teacher evaluation. Where do you come down on that?
It really comes down to what we mean by “used.” For sure they’re imperfect measures; for sure they’re noisy.
There are two reasons why policymakers might not want to use test scores for evaluation. One is that they’re missing a lot of important skills, which is what I’m documenting. The other one is that even if we only cared about the skills measured by test scores, the measures are pretty noisy and they bounce around from year to year. Both are true.
To me, what that means is that there’s a lot of noise, so we have to take everything with a grain of salt. But there’s also a considerable amount of signal, and insofar as there’s signal, the measures are valuable. I think the way we should use them is as part of the evaluation system, taken as one piece of the whole, not as the entirety. To be honest, I think most or many school districts that are using test scores for evaluation are using them as one component of evaluation and not for the entire thing.
No district that I’m aware of is using test scores for the entire thing. The most that I’ve seen is 50 percent.
My gut sense is 30 percent is probably about where it should be.
What can policymakers take from your research on non-test-based measures?
The main conceptual takeaway that has policy content is that there is something that we can detect, that teachers are having effects on, that predicts, to a great extent, how effective teachers are at improving long-run outcomes of kids. That’s important.
Of course, then the question is, how do we measure these things and how do we use them. Arguably, that’s a separate question, and I can speak to that as well. The first point is sufficiently important that it stands on its own.
The second piece is, well, how do we measure it and how do we use it. I think one way to do it would be to come up with good psychometric measures of these constructs that I’m describing, that would be difficult to manipulate.
Another potential avenue, which I explore a little bit, is to use measures from the following year. In principle, you could say, “I’m going to evaluate a ninth-grade teacher based on how well her students do in 10th grade, not on how well they do in ninth grade.” That way, a teacher can’t benefit from suddenly giving all her students A’s, because that’s not going to make them do better in 10th grade. That’s one way to get around some of these issues. There are ways to structure incentives to deal with this gaming issue.
I would also say it’s not obvious that we necessarily have to attach stakes to it. We could just use it as an evaluation tool. We could observe these measures and target teachers who are doing poorly on one dimension for remediation, training, special professional development or something of that nature. I don’t think it necessarily has to be used in a high-stakes environment for it to be useful for policy.
Let’s turn to your recent research on curriculum. Can you talk to me about what you found in that study?
I’m writing this paper with a co-author, Alexey Makarin. In this particular intervention the curriculum was designed by Mathalicious, which is a private company that writes pretty high-quality lessons for schools. The lessons tend to be very real-world lessons, and they’re designed to be interesting and engaging for the students. They tend to be project-based.
We ended up with three different school districts, and we randomly gave some teachers access to these online lessons, which means the lessons are provided online, teachers can download them, and then they can teach them in the class. They’re not taught online; they’re just provided online to the teachers. So teachers are randomly given access to these online lessons, and among those who were given access, half were randomly given access to some additional online supports: basically an Edmodo group, which is like a Facebook group. You can log in and see a webinar describing the lessons you have access to, with suggestions such as, if you teach this lesson, you may want to highlight certain things. Essentially, it’s additional supports to facilitate use of these lessons.
The basic theory behind why we thought this might be useful was that teachers are engaged in a lot of stuff. They’re managing the classroom, they’re planning lessons, they’re also delivering lessons, and it’s a complicated job. You might think not everyone is good at all three of those things, and if you can just provide teachers with high-quality lessons, they might be able to give lessons that are better than what they would have planned themselves, and it also provides some time savings that they could spend doing other things.
In some sense it’s a very light-touch intervention. We just provided it to them, and they can use it if they want to use it; they don’t have to use it if they don’t want to use it. This could actually improve outcomes a lot if the teachers who use it are those whose lessons would not have been as good as the one that’s provided online.
That’s essentially what we find. We find that providing the lessons plus the supports led to a pretty sizable improvement in student test scores at the end of the year. And those test score gains were much more highly concentrated among teachers at the lower end of the quality distribution. It was precisely the weaker teachers who benefited the most from having access to these lessons. We didn’t find any negative effects for anybody else. It suggests that giving these high-quality lessons improved the outcomes quite a bit.
And it’s a pretty cheap thing to do. Talking about spending money wisely, it would easily pass a cost-effectiveness test, right?
Yeah, in terms of cost-effectiveness, it’s ridiculous. As for the cost per pupil, I forget exactly what the numbers were, but every license at the time was just about $200 per teacher. If you throw in the costs of the additional supports, it ends up being maybe $300 max per teacher. Every teacher has about 80 kids across their classes, because we’re talking about high school in this case. If you’re teaching 80 or 90 kids and spending $300 total, the per-pupil cost is really, really small.
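The per-pupil arithmetic can be checked directly. This small Python sketch (the function name is hypothetical) divides the roughly $300 per-teacher cost quoted here by the stated 80 to 90 students. It gives about $3.33 to $3.75 per student, somewhat below the $5 or $6 figure quoted below, which may rest on a fuller cost accounting than these recalled figures; either way, the cost is small.

```python
def per_pupil_cost(cost_per_teacher: float, students_per_teacher: int) -> float:
    """Spread a per-teacher cost evenly across the students that teacher serves."""
    return cost_per_teacher / students_per_teacher


# Figures as recalled in the interview: about $300 per teacher, 80 to 90 students.
for students in (80, 90):
    print(students, round(per_pupil_cost(300, students), 2))
```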
The test score gains are about 8 percent of a standard deviation. To put that into context, that’s about the same benefit as reducing class size by a quarter. It’s a pretty big effect, and it’s costing on average about $5 or $6 per student. In terms of cost benefit, I think the cost-benefit ratio we computed was over 100. It blows most interventions that we think of, like reducing class size or increasing teacher quality, out of the water.
Having said all that, I’m not saying that we shouldn’t do those less cost-effective things, but if you want to improve outcomes, you should start with the low-hanging fruit first. This seems like a pretty low-hanging-fruit way to start.
What other research topics are you working on currently?
I’m doing two projects right now.
One is actually a continuation of the work looking at the effects of K-12 spending. What we’re doing in this study, which is also with Berkeley’s Rucker Johnson, is asking this question: To what extent is spending at the early-education level related to the effectiveness at the K-12 level?
Specifically, if you think of the schooling system as a whole, as opposed to two different pieces, if you could improve early-childhood education such that students are better able to take advantage of the K-12 system, each additional dollar spent could actually increase the impact of every dollar you spend in K-12.
To explore that, we use the rollout of Head Start as a quasi-random shock to exposure to early-childhood education for low-income kids. Essentially what that means is that there are some kids who were exposed to Head Start when they were young and were exposed to a school finance reform when they were between the ages of 5 and 17. Others were only exposed to Head Start but not exposed to increases in K-12 spending. Others were not exposed to Head Start but did have increases in K-12 spending. By making comparisons across these groups, we can ask whether the effect of getting both is greater than the sum of their separate effects. That’s essentially what we’re looking for, what we’d call dynamic complementarity. Is there dynamic complementarity between early-education spending and K-12 spending?
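The question of whether the effect of both exceeds the sum of the two is an interaction in a two-by-two comparison. This Python sketch computes it from four hypothetical group means (neither, Head Start only, K-12 spending only, both); the names and numbers are illustrative assumptions, and the actual study's estimation necessarily accounts for far more than four raw means.

```python
def complementarity(neither: float, head_start_only: float,
                    k12_only: float, both: float) -> float:
    """Interaction effect from four group means (a 2x2 comparison).

    A positive value means the combined effect of Head Start and K-12
    spending exceeds the sum of their separate effects.
    """
    head_start_effect = head_start_only - neither
    k12_effect = k12_only - neither
    combined_effect = both - neither
    return combined_effect - (head_start_effect + k12_effect)


# Hypothetical adult outcome means (e.g., log wages), for illustration only.
print(round(complementarity(neither=2.00, head_start_only=2.05,
                            k12_only=2.06, both=2.15), 2))
```

If the two investments were purely additive, this quantity would be zero; a positive value is what dynamic complementarity would look like in this simplified setup.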
The basic findings are that the effect of Head Start on long-run outcomes is bigger if the child subsequently attends a K-12 school that is well funded. The opposite is also true: Each additional dollar spent on K-12 is more effective at increasing wages if it was preceded by Head Start spending.
The other research is also a continuation of the study where we’re looking at the effects of teachers on non-test-score outcomes. I’m using data from the University of Chicago Consortium on School Research. The idea is to replicate what I’ve already done and use richer data. In those data, they collect psychometric measures of non-cognitive skills, such as grit, study habits and social skills. Basically, I want to see whether the teachers who are improving the behaviors that I described are the ones who are improving measures of grit and the self-reported measures of soft skills.
Those data can also be linked to college outcomes, and we’re hoping to link them to labor market outcomes to track the students through adulthood. We want to see what effects teachers who improve soft skills, versus those who improve test scores, have on long-run outcomes such as college-going, crime, college graduation and earnings.
The nice thing about this is that these data also contain classroom observations of teachers by principals, and survey measures from parents, teachers and students, that are all going to allow us to hopefully get a sense of what is going on inside the classrooms that are generating these differences. What are teachers actually doing such that they are improving test scores versus improving soft skills, and vice versa? We want to get inside that black box a bit to get a sense of what’s going on. Only when we get a sense of which teacher practices are generating gains for students can we really start to inform professional development and all the rest of the kinds of things you want to do to improve student outcomes.