The world turned upside down – the phrase referenced in Hamilton to describe the impact of the end of the American Revolution – seems an apt choice to summarize the current state of affairs in K-12 large-scale testing and educational measurement. The field still cannot seem to get out of its own way when discussing whether annual state summative tests should provide actionable information to inform instruction. Standardized testing is popping up fairly regularly as a topic of discussion on national news and talk shows, which, based on my experience, is never a good thing. The assertion that the Joint Standards are racist went unchallenged in an NCME-sponsored webinar featuring a current, past, and future NCME president.
The issues listed above and others like them, however, are merely a reflection of, rather than the cause of, the revolution that is turning the world of large-scale testing and educational measurement upside down. The revolution, which has been simmering for decades, started to approach a boiling point over the past ten years, and perhaps has now boiled over, is centered on the shift in focus from the measurement of groups (or individual differences within a group) to the measurement of individuals.
Yet let’s be content, and the times lament, you see the world turn’d upside down.
Quantum Leap
This shift in focus from groups to individuals is far more consequential (there’s that word again) than the change from norm-referenced to criterion-referenced testing or the criterion-referenced interpretation of test scores. It involves far more than reporting individual student scores, achievement levels, and subscores.
There is an expectation that educational measurement will be able to provide accurate, precise, and detailed information (preferably in real time) about what an individual student knows and is able to do, and to enable educators to draw solid inferences about what that student does not yet know and is not yet able to do. That expectation cannot be met simply by reporting individual student scores on a test constructed on a foundation designed to model group-level performance. It also seems highly unlikely that the expectation can be met by educational measurement that remains focused on one sample of student performance at a single point in time or ignores the context in which student performance occurs.
Measuring and describing the performance of individual students in the desired manner requires a quantum leap in educational measurement (i.e., a breakthrough, a sudden highly significant advance). It requires new models, new science, and new ways of thinking. This goes beyond making use of multidimensional IRT models. (Question: If a multidimensional IRT software package drops, but is never actually used in an operational assessment program, does it make a sound?) Cognitive diagnostic models may be a step in the right direction, but are likely not a large enough step. Research in computational psychometrics, like that being done by Alina von Davier, promises the type of advance that will be necessary.
In his 1997 paper, Postmodern Test Theory, Mislevy describes the emergence of quantum mechanics in physics, but also the value in the continuing coexistence of quantum mechanics and “outmoded” principles and models from Newtonian physics. We may well still have a use for our “Newtonian” psychometric models for describing the performance of large groups of students, but we also need more. We need the educational measurement equivalent of quantum mechanics with models that better describe the performance of individual students, and we must also be prepared to deal with all that comes with them.
In the same 1997 article, Mislevy is just beginning to weave the complex sociocognitive web that emerges in his recent work. In the course of developing models to better describe the performance of individuals within such complex social and cognitive structures, we are almost certain to arrive at our own version of Heisenberg’s uncertainty principle. That’s OK. Education is messy. People are messy. Educational measurement will be messy, but it can be so much more useful than it is now in providing information about individual students to people who need it.
Theoretical Equilibrium
As we figure out which side is up in an educational assessment and measurement world turned upside down, we must also reach a state of theoretical equilibrium. That is, we must find balance regarding the role of theory in our work.
It may sound paradoxical, but the modern era of K-12 assessment and measurement, driven by item response theory, has been markedly atheoretical – statistical theory being something very different from educational or learning theory. (You can revisit Mark Wilson’s 2017 NCME Presidential Address if you are unclear on the difference between the two and how it affects test design and construction.) The inexorable growth of data science and the critical eye being cast toward the prevailing educational, statistical, and social theories that have driven educational assessment and measurement can only make the role of theory more precarious, at least in the near term.
On the other hand, there is a push for education, in general, and educational assessment and measurement, in particular, to be more cognizant of and driven by learning theory. Of course, learning theory, like psychometrics, is less a field unto itself than it is the synthesis and application of principles from a host of related fields. How to apply learning theory at scale and the extent to which learning theory will be subject to the same social scrutiny as measurement theory are two unknowns.
At some point, however, the field will acknowledge the need to reconnect educational measurement and educational psychology. There is a balance to be attained – theory-assisted data science (or psychometrics) and data-assisted learning theory. Balance, of course, does not imply that the field will be stagnant, or even stable. Change is constant. Change is good. Balance, however, can keep the world from being turned upside down – until it needs to be.