As we tear the March page off our digital calendars and enter April, traditionally regarded as a time of renewal and rebirth, it seems appropriate to pause and reflect on the state of our field; a field which in my case is an amalgam of assessment, K-12 state testing, and the broader field of educational measurement.
My observations these days are those of someone watching the game from the cheap seats (think upper-deck end zone seats at a football game). It’s a different view of the game than the ones I had for so many years, either from the sidelines as a consultant or from the middle of the fray as a player for the state or the assessment contractor. It’s even different than the view from the box seats I sit in while attending two or three “games per season” as a TAC member.
Sadly, I miss out on seeing and hearing some of what is going on in the trenches and lack a firsthand understanding of the game plan. On the other hand, this vantage point not only promotes more of a focus on the big picture within the game but also allows me to look up, peruse the landscape, and see what’s going on outside of the stadium – something which should not be underestimated. With this disclaimer or caveat out of the way, let’s jump right in.
AI – We’ve Been Waiting For You
There has never been a need for a crystal ball to see that the future of educational measurement and assessment would involve computers and would be in the classroom, integrated with instruction – you know, where the students and teachers are. In the third edition of Educational Measurement in 1989, Bunderson et al. predicted and described four generations of computerized educational measurement:
- Computerized testing: administering conventional tests by computer
- Computerized adaptive testing: tailoring the difficulty or contents of the next item or an aspect of the timing of the next item on the basis of examinees’ responses
- Continuous measurement: using calibrated measures embedded in a curriculum to continuously and unobtrusively estimate dynamic changes in the student’s achievement trajectory and profile as a learner
- Intelligent measurement: producing intelligent scoring, interpretation of individual profiles, and advice to learners and teachers, by means of knowledge bases and inferencing procedures
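The second of these generations, adaptive testing, rests on a simple loop: estimate the examinee’s ability, then administer whichever remaining item is most informative at that estimate. Here is a minimal sketch of the item-selection step using a 2PL IRT model; the item bank and all parameter values are invented purely for illustration.

```python
import math

def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

def pick_next_item(theta, items, administered):
    """Return the index of the unadministered item with maximum
    information at the current ability estimate theta."""
    return max(
        (i for i in range(len(items)) if i not in administered),
        key=lambda i: item_information(theta, *items[i]),
    )

# Hypothetical item bank: (discrimination a, difficulty b) pairs.
bank = [(1.0, -1.0), (1.2, 0.0), (0.8, 1.0), (1.5, 0.5)]

# With an estimated ability of 0.4 and item 1 already administered,
# the selector favors the highly discriminating item near theta.
next_item = pick_next_item(0.4, bank, administered={1})
```

In a full CAT, the ability estimate would be updated after each response (e.g., by maximum likelihood or a Bayesian update) and the loop would repeat until a stopping rule, such as a target standard error, is met.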
As early as the 1960s, we saw the development of PLATO, a computer-assisted learning system. Since 1989, computer-assisted instruction has flourished, interim assessment companies have forged ahead into the second generation of CAT, and major publishing companies have abandoned their traditional large-scale tests in favor of curriculum- and classroom-based offerings. To a large extent, however, educational measurement and testing were frozen in place, awaiting the technological advancements needed to support the infrastructure necessary for Bunderson et al.’s second, third, and fourth generations.
In the past 15 years, those advancements allowed the field at large finally to progress to the second generation and a handful of innovative companies to begin to venture into the third. As Sue Brookhart and I wrote in Assessment to Inform Instruction and Learning, our chapter in the 5th edition of Educational Measurement, “the movement that Bunderson et al. (1989) anticipated and recent history has borne out. The arc of the history of assessment is moving from institutional purposes and bending toward assessment to inform teaching and learning” (p. 1078).
The future seemed bright and closer than ever before. Even as we completed the first draft of our chapter in 2020, however, it was clear that full evolution into the third and fourth generations remained beyond our reach, simply too labor intensive and cost prohibitive to be implemented at scale.
Enter AI.
With the emergence and continued rapid development of AI tools, continuous and intelligent measurement are no longer aspirational visions of an unattainable future. Rather, they are the bright lines and pillars by which innovations in educational measurement, assessment, and testing should be judged beginning today.
Efforts to use AI to do what we currently do a little bit better or a little bit more efficiently are fine, but they don’t interest me all that much. I want to see AI used to make possible the things we’ve dreamed of doing for the past 30, 40, or 50 years but that remained far beyond our reach. Why not?
It’s a brave new world that will require brave new thinkers and brave new leaders willing to try and perhaps fail.
Discovering Our Toes
As I watch my earnest psychometric and large-scale testing colleagues trip over themselves to embrace students, teachers, instruction, schools, etc., the image that comes to mind is the sense of wonder experienced by a baby discovering its toes for the first time. Those “toes” have been there all along, but we’ve never really noticed them or given them much thought. But now that we’ve found them, just think of the possibilities.
There’s sure to be some awkwardness at first, but as we reach out, interact with, and come to understand what’s going on in the world that was just beyond our reach, we will strengthen our core, increase our flexibility, and ready ourselves for the complex coordination needed to support student learning.
As we take our first tentative steps into assessment to inform instruction and learning, the hardest part will be finding and keeping our balance. It will take some time, I’m sure, for some of us who have been in measurement and testing our entire career to accept that we are not the straw that stirs the drink, but merely a member of the ensemble in the production of student learning. And at the other end of the spectrum will be those imbued with the zeal of the convert who preach that each and every test and testing occasion must be designed to provide useful information to individual teachers and students to inform instruction and learning. Balance.
The bottom line is that regarding the use of measurement and testing to support instruction and learning, we are still in our infancy, like the 4- to 6-month-old discovering its toes. There will be growing pains along the way, but I’m confident that at the end of the day, we will find our footing, stand tall, and step smartly as we apply our knowledge and skills in measurement, assessment, and testing to the field of education.
Like that infant growing into a toddler, young child, etc., it’s critical that we develop new knowledge and skills.
Take Me To A Ruler
Is educational measurement destined to become a real science?
Given that we’ve just spent the better part of a decade recognizing and reckoning with our measurement and testing roots, the notion of returning to those roots in any way may seem ludicrous. However, even as we leave phrenology, Galton, and Cattell in the past, it is with caution and some trepidation that I suggest a not-too-distant future in which physical measurements supplant the “cognitive” instruments and measures that have defined educational measurement for the past century or so. The writing is on the wall.
As I described in a previous post, at a foundational level cognitive science and neuroscience have become inextricably intertwined as cognitive neuroscience. The theories and interpretations remain oriented toward social science. The measurements, not so much.
At a more applied level, we are more regularly seeing articles discussing things such as tracking eye movements, sequences, elapsed time, and other physical actions during problem solving (and not simply as security measures). As we focus more on durable skills that involve managing interactions with others, I’m certain that we will be using wearables to collect biometric information about increasingly complex physical reactions during those encounters.
As our focus moves more from documenting achievement to understanding learning, it is inevitable that we will be collecting data on what parts of the brain light up and what clenches, twitches, increases in speed, or decreases in intensity under certain conditions or as certain behaviors of interest are demonstrated.
The key will be how we decide to use those physical measurements and whether we have learned lessons from our past.
Face of Assessment
Every field needs its champion, its frontperson, its voice. That need is magnified in a multidisciplinary field such as ours about to step into the unknown.
As one-by-one the old guard fades into history, we’ve been forced to bid final farewells to those who served as the faces of our field as we navigated the worlds of NRT, CRT, IRT, NCLB, and at long last, CBT and CAT. For the past few years, like Simeon at the Temple, I’ve been watching and patiently waiting for that new face to emerge.
There have been some notable candidates. For a while, it appeared that Derek Briggs was poised to don the mantle connecting the past, present, and future of educational measurement. Alina von Davier has stood front and center stretching and reshaping our understanding of psychometrics. About a decade ago, Steve Sireci emerged from a sabbatical tanned, rested, ready, and loaded for bear. Jennifer Randall has made us uncomfortable in and with our own measurement and testing skin. Their lights, as well as the lights of many others in the field, still burn bright, and each remains a formidable force in their own right.
At this moment, however, there is one figure who transcends them all. As the face of the field, I give you the titan of TACs, the doctor of discontinuity, the hero of happy hour, professor plushie himself, Andrew “The Null Hypothesis” Ho.
Befitting his moniker, you may not be able to accept Andrew as the face of educational measurement and assessment, yet it’s clear that we have failed to reject him. More than any other entity in the field at this moment (person, Pearson, or platform), Andrew stands tall as the nexus between measurement, testing, and assessment; between academia, industry, and the policy world; as well as between east coast, west coast, and Iowa.
He strengthens our foundations while encouraging us to branch out in new directions.
He strives to help each of us to feel a bit more normal, to determine our fit in this skewy joint distribution of psychometricians, assessment specialists, educators, and policymakers.
He continues to raise the bar while on more than one occasion closing said bar and picking up the tab.
He is our opalite, promoting calmness, emotional healing, growth, and clear communication.
On The Cusp of Great Change
As I wrote in Fundamentals and Flaws, across my three decades in large-scale testing there was a constant feeling that the field was on the cusp of great change.
In 2026, however, it’s clear that we are approaching one of those watershed moments in measurement and K-12 assessment; one of those turning points that occur every 20 years or so, roughly coinciding with the publication of a new volume of Educational Measurement and a revision of the joint Standards.
The period between 1965 and 1970 gave us Title I, NAEP, and minimal competency or basic skills testing.
1985-1990 ushered in the shift from commercial norm-referenced test batteries to custom-designed state testing programs and from percentile ranks to percent proficient, along with state standards, state NAEP, and a unified theory of validity.
2005-2010 saw the development of the Common Core State Standards, the shift to college-and-career-readiness, and technological advances that allowed us to build the infrastructure needed to support the changes described above.
What can we expect to see from educational measurement, assessment, and testing in 2030 when all of the dust that is currently being churned up finally settles? How will this turning point be defined?
Finally, it’s not too early to think ahead to 2050, when kids in elementary school today will be entering the field with their freshly minted PhDs and their children will be entering elementary school. Perhaps the most important question we can ask today is this: what groundwork will we lay for what we hope educational measurement and assessment will look like in 2050?