I will say upfront that among my favorite memories from my years in large-scale testing are the conversations I have had with Dr. Derek Briggs. Those conversations covered topics as wide-ranging as the Celtics-Lakers rivalry in the 1980s, Charlottesville 2017, and just what we mean by "a year's worth of growth." They took place in settings as varied as a small-group discussion during a Center for Assessment Colloquium and a small restaurant in Kansas over a 3-hour dinner featuring a few too many Diet Cokes (I don't recall what Derek was drinking). The defining features of all of those conversations (at least for me) were that they were entertaining and informative, they challenged my thinking, they ended with major issues unresolved, and I left looking forward to our next encounter.
I wasn’t surprised, therefore, to come away from his recent book, Historical and Conceptual Foundations of Measurement in the Human Sciences: Credos and Controversies, with very much the same impression. In addition, it left me wanting to read, and write, and think so much more about the one Credo and several controversies detailed throughout the book.
Before going any further, I should acknowledge an obvious fact.
With the possible exception of those reverse mortgage commercials featuring Tom Selleck, I accept that I am not a member of the target audience for most products, and that is true for this book as well.
- I am not a graduate student starting out in educational measurement for whom it will provide a solid foundation in the foundations of measurement.
- I am not an early- or mid-career professional seeking a better understanding of what was done in the past with an eye toward building on it to improve the future.
- I am not even an advocate of one position or another combing through the text in search of that smoking gun or explanatory argument to support my position.
No, my interests are much more personal. I am the man standing in front of the mirror making a reckoning of the choices I made during a 30-year career in large-scale testing. I wasn't expecting Derek to be my Clarence, but on one level I was hoping, in today's parlance, to see myself in at least some small part of those special "characters" Briggs described and, ideally, to draw a thread between my work and theirs.
On another level, to paraphrase Toby Keith, this field that I gave my best 30 years to has fallen under attack (again). A mighty sucker punch has come flyin’ in and there will be no turning back. There is always an opportunity to take a look back, however, and to better understand what has taken place and why. My hope was that one way or another, this book would help me determine the extent to which I have cheated that guy in the glass.
So, what did I learn about myself and about our field?
Through the Looking Glass
For the most part, the world that Briggs describes in his book is not one that I recognize.
Oh, sure, I can relate to Galton having reached the point where mathematics no longer came easy (or came at all, even after hard work and effort) and having accepted his placement on the proverbial Wright Map of mathematics and mathematicians. And anyone who ever attended a staff meeting with me knows that, much like Spearman and Stevens, I "relish academic combat"; there was a reason, after all, that I was once given the sarcastic nickname Dale Carnegie. I do hope, however, that I never reached the point attributed to Spearman of being "more interested in winning the argument than in genuinely trying to understand the argument." (pp. 251-252)
But the worlds of Galton, Binet, Spearman, Thurstone, Stevens et al., the tasks they were undertaking, and the questions they were trying to answer were all very different from mine.
And that, too, did not come as a surprise. I knew that something was up the first time I wandered into an NCME session while attending AERA. As Briggs states quite clearly and emphatically in Chapter 1, What is Measurement?:
Testing and measurement are two distinct activities. This assertion is so important that it bears repeating: Testing and measurement are two distinct activities. When certain assumptions are made and conditions are met, they overlap, and in such instances, it may well be a reasonable shorthand to refer to testing as “educational measurement.” But it is important to appreciate the way that this move implies an elevation of testing onto the same level of implied authority that would be found in the field of metrology for the measurement of temperature and time…When tests are automatically granted the status of measurement, they are that much more easily appropriated as vehicles for social injustice, even when this may well have been the opposite of the intent of the test designer. (pp. 13-14)
On this platform I refer to myself as an assessment consultant (this is not the place to get into the discussion of assessment v. testing). I recognized long ago that I was involved in the subfield within assessment concerned with K-12 large-scale testing. I was (am) a testing specialist, not a measurement specialist. And although I allow myself to be labelled a psychometrician when it benefits the people signing the checks, we all understand that the modern psychometrician is a person who, in general, has no soul, believes only in relationships among numbers, and has no allegiance to measurement, testing, or education. (And be warned, the requirement that a psychometrician be a person will soon be obsolete.)
OK, that's a bit harsh. As Briggs describes, psychometrics itself "was a field of study and practice that overlapped whatever boundaries existed between the emerging traditions of experimental psychology, educational psychology, and educational measurement." (p. 259) Let's agree that "psychometrician" is simply a catch-all, generic term that can mean a lot of different things to a lot of different people. That characteristic, of course, makes it perfect for the field of "educational measurement," a discipline (and I use that term loosely) that apparently is as latent as the traits and putative constructs it purports to measure.
Again, not a surprise.
The first clue, of course, was the decision not to include "Educational Measurement" in the title of the book. At first glance that might appear to be a publisher's decision to increase the market for the book. Any doubt about intent, however, is dispelled by the decision to open the book with a recreation of the "measurement" scene from Dead Poets Society (one of the greatest movies ever made) and end it with an equally detailed recounting of the Michell takedown of Stevens, psychological measurement, and therefore, either by heredity or the transitive property, educational measurement.
(If you are not familiar, picture Joel Michell leading an SNL cold open where Stevens and psychological measurement are something associated with Trump or Fox News. Not an SNL fan? Picture Michell as John Oliver, and Stevens and psychological measurement as any topic he decides to feature on Last Week Tonight. Still no? Well, picture Michell as Chris Hansen waiting for educational measurement to knock on the door.)
I'm a Mirrorball – I Can Change Everything About Me to Fit In
So, did I ever find myself and K-12 large-scale testing within the pages of the Briggs book? Why yes, I did. Thanks for asking.
The first snippets that resonated with me occurred in the chapter on Thurstone, the concept of invariance, and the measurement of attitudes.
But it finally all came together, crystallized if you will, in the last chapter (page 311 of 332), as Briggs discussed Stevens and operationalism:
A problem with operationalism, at least in the strictest rendition, and one of the reasons it fell out of favor as a philosophy of science (see Chang, 2019), is that in the absence of any way to independently observe some attribute of interest, there will be as many measures of the attribute as there are unique operational procedures being applied. This would seem to move backward to a time when all measurement was a local affair, contingent on decisions about standard units that were often the province of the ruling class. (p. 311)
Eureka! That’s me. This Is Us.
That’s K-12 State Testing in English language arts, mathematics, and science. It’s who we are. It’s what we do.
Apparently, Stevens was confident that because "science is knowledge," and agreement would be fostered through critique and debate among members of society, the potential problems associated with, let's say, 50+ unique sets of standards and definitions of proficiency were something we would never allow to occur. Hey, Stevens, I could give you fifty reasons why that didn't work out as expected.
Give us a set of standards and we will build a test aligned to them. The scale we report on has no underlying meaning. Not a problem. We will do our best to place people in the right order and help you to set an achievement standard. Our research is focused on how to make the process of doing that more effective, more efficient, and more fair.
Would it be nice if we devoted some mental energy and resources to actually trying to better understand what this so-called proficiency that we are measuring is, how it is acquired, and how it relates to things like mastery of the standards being assessed, individual differences among students, or instruction? Well, sure. I guess so.
Magic Mirror on the Wall, Who Is the Fairest of Them All
I have a better understanding of the relationship between what I did for 30 years and measurement. I am comfortable with the perspective that K-12 testing is the application of certain procedures for a particular, limited purpose. It’s not measurement. We are not psychologists. Testing serves a purpose. I served a purpose. But what about that purpose?
What do I tell that guy in the glass about the other aspect of all of this, about the elephant in the room, about whether the tests that we worked on were “easily appropriated as vehicles for social injustice, even when this may well have been the opposite of” my intent, and how should the two of us feel about that?
That answer will have to wait until next week’s blog post.