They Told Me There’d Be Consequences

I have been led to believe that, for all intents and purposes, the intense battle over the importance of consequences in educational measurement and testing is over. Even those of us who remain convinced that trying to craft a unified theory of validity encompassing all uses and consequences, whether proposed or not, intended or unintended, has only made things worse believe that the theory, design, and practices of educational measurement have consequences, and that those consequences must be studied and understood. (That was a mouthful.)

A half decade has passed since the 2020 NCME presidential address, “Psychometricians in the Hands of an Angry Mob,” in which Steve Sireci crafted an impassioned argument for the organization and the field to be more cognizant of the consequences of our actions and more proactive in monitoring and safeguarding how our tests and test results are interpreted and used. As they say, the only thing necessary for the triumph of bad measurement and evil testing is for good psychometricians and their AI partners to do nothing.

Therefore, as I sat down this morning to review the preliminary program for the 2026 NCME annual meeting in Los Angeles, I expected to see a healthy number of sessions dedicated to research examining the consequences of policies and practices related to educational measurement and testing.

Now mind you, I’m not naïve. This is not my first NCME rodeo. I knew that the bulk of the sessions would be dedicated to what they’ve always been dedicated to: academics and their graduate students tweaking this thingamabob and that parameter to build a better multidimensional mousetrap. Although now, of course, they would be exploring ways to use AI to do so better, faster, and stronger. And perusing the program, that is what I found. 

Still, I was surprised by the dearth of sessions dedicated to consequences or even to examining uses of testing. 

A quick search of the program using the keyword “consequence” yields a grand total of nine results – not all of them actually addressing the consequences of testing policies and practices, and two of the nine assigned to the highly valued 3:30 – 5:00 pm slot on the final day of the conference.

What if we do the research thing and dig a little deeper? 

Measurement, Tests, Uses, and Consequences

Since the aforementioned presidential address and the events of the late 2010s and early 2020s there has certainly been an uptick in interest in the field of educational measurement and testing in topics such as social justice, cultural relevance, equity, and of course, fairness. That interest is certainly reflected in the program. 

To a large extent, however, it appears to me (and the program suggests) that much of the field’s research capital in those areas is still dedicated to the generation of the test score rather than to the consequences of its use. It is certainly true that there are very real downstream consequences related to the fundamental measurement and test design decisions that go into producing a test score. Research in those areas, however, is still not the same as empirical research examining the consequences of policies and practices related to the use of tests. 

A few other observations about the 2026 NCME sessions that purport to address the consequences of testing:

  1. A common classification for those sessions was “organized discussion” as opposed to coordinated sessions or presentations of research papers. I am all for discussions (organized or otherwise) about reimagining educational measurement, the importance of examining consequences, and the potential benefits of doing so, but such discussions are still precursors to evaluations and empirical research. 
  2. Several sessions focused on classroom assessment (many sponsored by the Classroom Assessment SIGIMIE) shine a spotlight on the need to recognize the differences among interpreting test scores, being able to use those scores, and the consequences of their use on student learning and other key outcomes. 
  3. A lot of the work on consequences seems to be occurring in niche areas such as alternate assessment for specific subpopulations or personalized assessment. 
  4. Consequences still seem to be the focus of a relatively small group of researchers and practitioners within NCME, with the same dedicated band of sisters and brothers appearing across multiple sessions.  
  5. If equity, fairness, and consequences are the hill you’re willing to die on, then I would recommend attending the sessions dedicated to the memory, lives, and work of Neil Dorans, Robert Mislevy, and Jim Popham.

Finally, a humorous note related to consequences: how gutsy of our Pearson friends to begin the title of their session on what happens when the data stops with the word “CANCELLED!” As a former program co-chair, I can only imagine the potential consequences when some people interpret that word as they quickly scroll through the program.

NCME is Perfectly Designed

“Every system is perfectly designed to get the results it gets.” – W. Edwards Deming

We are all familiar with the principle that a system produces what it is designed to produce. The fact is that NCME is producing the type of research that it is designed to produce. As Derek Briggs noted in his book, educational measurement and educational testing are distinct, albeit related, fields. In my experience as a testing specialist, often it has been a stretch to figure out how to apply the basic research on measurement presented at NCME to my practical testing problems. At best, I knew that there was going to be a significant lag time between research at NCME and my work with states. (I’m still waiting for the application of all of those wonderful multi-dimensional models I saw presented in the early 2010s.) 

NCME is designed and best suited to produce basic research on using educational measurement to generate “better” test scores – scores that better support fairness, equity, and validity. It is certainly also within the charge and responsibility of NCME to support the appropriate interpretation of those test scores and clearly identify the inferences that they can and cannot support. 

I’ve never been convinced, however, that NCME is the most appropriate group to focus on the more applied uses of educational measurement for educational testing and, ultimately, one step further removed, to conduct research on the consequences of those uses. Sometimes the question, “If not us, who?” is more than rhetorical.

I could be wrong. But with regard to NCME and testing, history and the current program support my case. 

Ultimately, the organization needs to decide whether it’s better to do well what it’s designed to do or to redesign itself to do something that it has never done.

Image by Patrick Blaise from Pixabay

Published by Charlie DePascale

Charlie DePascale is an educational consultant specializing in the area of large-scale educational assessment. When absolutely necessary, he is a psychometrician. The ideas expressed in these posts are his (at least at the time they were written) and are not intended to reflect the views of any organizations with which he is affiliated personally or professionally.