“Criticizing test results for reflecting these inequities is like blaming a thermometer for global warming.”
That was the viral moment from the recent NCME statement on admissions testing. The line was clearly intended to go viral, and it did; well, as viral as any technical defense of standardized testing can go – quoted and retweeted tens of times.
I like a glib “test as thermometer” quip as much as the next psychometrician, and I have enjoyed the various versions of this one that have been used in the context of college admissions testing. Something about the line, and the statement in general, however, just didn’t feel right.
NCME framed the statement as “highlighting the critical distinctions between group score differences and test bias.” Along with an obligatory quote from the Standards and an academic reference to correlation and causality, the test-as-thermometer equivalence appears to draw a clear distinction between test scores and test use. Test scores, it appears, can reflect real differences between groups without the tests being biased. This separation of test scores from test use is something that we have not seen in the organization’s arguments on validity. As NCME president Steve Sireci has written, “To ignore test use in defining validity is tantamount to defining validity for ‘useless’ tests.” Does the same argument apply to test bias?
When the tests in question are college admissions tests, their primary intended use is fairly explicit. One can assume that a claim that the tests are biased refers at least as much to their use in college admissions as to a technical claim about the accuracy of the scores. To dismiss this claim with a technical lesson on misconceptions about test scores comes across as defensive at best, and as tone deaf and somewhat self-serving at worst.
NCME could have chosen to focus their response on this portion of their quote from the Standards: “group differences in testing outcomes should trigger heightened scrutiny for possible sources of test bias…”
- They could have discussed whether the construct being assessed by the college admissions tests is academic achievement (in English language arts and mathematics) or college readiness. If the former, then we are back to the question of whether the focus on the accuracy of the group differences is tantamount to defining bias for useless tests.
- They could have discussed differential validity and the importance of establishing that the relationship between English language arts and mathematics achievement and college readiness (or success in college) is the same for students whose low performance is “caused by disparities in educational opportunities” as it is for other students.
- They could have discussed the role that test scores play in the “proper use and interpretation of all data associated with college readiness” and explained how limited or extensive that role should be given what the field knows about college admissions tests and test scores – particularly with respect to the performance of the subgroups of students in question.
Instead, NCME chose to offer a heavily nuanced defense of college admissions tests and test scores. I have to wonder whom they see as the primary audience for this statement.