‘Countries Aren’t Sports Teams’: International Test Rankings Distort More Than They Reveal, Renowned Statistician Warns
Renowned statistician and education expert Judith Singer has issued a new warning against overinterpreting the results of international test rankings. In an article published Thursday in the journal Science, she writes that tests like PISA and TIMSS contain important data for education policymakers around the world, but that ranking systems based on the scores ultimately distort more than they reveal.
U.S. students’ mediocre performance on tests of math, reading, and science routinely yields glum narratives in the national press, particularly when held up against the higher scores in East Asian countries like Korea, Japan, and Singapore.
The insight gained from large-scale assessments provides a valuable window into academic practices, Singer says, but journalists and politicians are too often transfixed by their countries’ respective spots in the global pecking order.
“The rankings that are commonly used to report the results of [international tests] draw headlines, but they are often incredibly misleading,” she told The 74. “The countries aren’t sports teams to be ranked as winners and losers.” Indeed, she observed, the British press uses the same term to describe the hierarchy of international testing performance — “league tables” — as for soccer and rugby standings.
Singer, a senior vice provost and the James Bryant Conant Professor of Education at Harvard, has written widely on improving methods of quantitative analysis in public policy. As the chair of a committee assembled by the National Academy of Education, she has helped edit a report on international large-scale assessments that test students from various countries on content knowledge. She will speak on a panel about the report’s release in Washington on Friday.
Worse than the alarmism accompanying news stories, Singer says, is that the rankings themselves are frequently arbitrary and mercurial. Positions change from year to year for reasons having little or nothing to do with student performance in a given country. And the rules of the tests allow for a certain amount of gamesmanship, as when Shanghai earned a top ranking for math on the 2012 PISA exam — only for the world to later discover that it had excluded 27 percent of its 15-year-olds from taking it.
On the 2015 PISA, Japan improved on its fourth-place ranking for scientific literacy three years earlier, moving to second overall. But the jump wasn’t because of improved performance; scores actually went down, though not as much as other countries’.
In a Japanese news item on the results, a graph shows scores and rankings over time. A line representing the country’s science ranking ascends from 2012 to 2015 — even though actual scores dropped by nine points.
Deep-seated national differences also tend to skew our perceptions of who’s up and who’s down. It doesn’t really make sense, Singer remarked, to group countries with decentralized education sectors — like the United States, Canada, and Germany — alongside those with truly national school systems, such as France, that can mandate instructional and curricular choices at will across their entire student populations.
Motivation matters too. In Singapore, surveys indicate that eight out of 10 primary school-aged children receive private instruction for testing. In Korea, the government spends 3.5 percent of gross domestic product on schooling; independently, families spend another 2.6 percent on private tutors and other resources. Top tutors for wealthy pupils can become millionaires themselves.
Meanwhile, students in the United States and other Western countries are generally thought not to take low-stakes international tests particularly seriously, given that they have no impact on grades or college eligibility. A recent study indicated that American students’ performance on PISA would improve substantially if they were offered money for correct answers.
“Singapore has fewer schools than Massachusetts has school districts,” Singer said. “So when you look at the results of Singapore — which is a city-state, though it’s treated as a country — you’re talking about a very small jurisdiction. There are undoubtedly school districts in Massachusetts that far exceed the performance of Singapore.”
Attempting apples-to-apples comparisons among disparate countries with wildly varying educational approaches leads to false narratives about what produces success, with low performers looking to emulate the “special sauce” driving high achievement — whether it’s special curricula, smaller class sizes, or something else — in high-flying countries like Finland or Korea.
Rather than spending millions trying to ape the tactics of international competitors, Singer says that countries should use testing data to learn more about themselves. She particularly recommends comparing within-country analyses across nations: similar countries, like the United States and Canada, could each study a phenomenon such as kindergarten entry age, then match and analyze their data on, for instance, achievement gaps.
Lastly, with the release of reading and math results from the National Assessment of Educational Progress (NAEP) — sometimes referred to as “the nation’s report card” — coming next week, Singer cautions that too-broad comparisons between New Hampshire and New Mexico can be just as dangerous as those between Israel and Mexico.
“There are only 50 states, just like there are only 50-75 countries in any assessment. And I could sit here, with no effort, and come up with more variables than there are states that could plausibly explain the variation in test scores. It’s got to be more nuanced.”