As we approach the middle of July, I find myself glancing more often at the thermometer propped up on the side of my desk. More often than I like over the past few weeks, the temperature in my home office has been hovering around 80º F, even in the mornings when I do most of my writing. It’s not ideal, but neither is it oppressive. With my 3-speed Honeywell fan from Target, a water bottle from some conference sweating on one of the felt-covered coasters my daughter made for me from those AOL CDs that came in the mail when she was a child, and wearing one of my light-colored Taylor Swift t-shirts, I soldier on fairly comfortably.
The heat doesn’t make me yearn for the air-conditioned office where I used to work (nope, not even for as long as it took me to write this sentence); but it does get me thinking about thermometers. Among the collection of measurement, or measuring, devices that I had in my office (scales, measuring cups, rulers, etc.), thermometers were always my favorites. It didn’t matter whether it was the Galileo thermometer, the candy thermometer, the Statue of Liberty with the thermometer on her front, or the Taylor thermometer that sits on my desk today. Thermometers and their indirect, invisible cause-and-effect relationships are fascinating.
The Taylor thermometer is nothing special. It’s made of plastic. You can get one today on Amazon for $6. But like most thermometers, it does the one thing that it’s supposed to do. It tells me what the temperature is with an acceptable level of accuracy, thereby living up to its lofty name, Taylor Precision Wall Thermometer.
As an added bonus, the numbers are large and bold enough for me to see even without my reading glasses, and they are color-coded – red for temperatures above freezing and blue for temperatures below freezing. Plus, the thermometer contains side-by-side Fahrenheit and Celsius scales ranging from highs of 120º F and 50º C to a low of that magical psychometric favorite of -40º.
Ah, there’s just something special about thermometers.
Standardized Tests as Thermometers
There was a time not too long ago when the thermometer was also the go-to metaphor for large-scale standardized tests. You might also see “snapshot” or “dipstick” from time to time, depending on the audience, but thermometer was the prevailing choice to describe the function, purpose, and use of standardized tests to everyone from beginning measurement students to lay audiences to policymakers as well as to teachers and students.
There are three main reasons, of course, that things like thermometers, snapshots, and dipsticks worked well as metaphors for standardized tests. The first is that the comparisons were accurate. The second, and more important, reason is that everybody understood what thermometers, snapshots, and dipsticks did and didn’t do, what information each provided and also what they didn’t provide. Finally, the third, and most important, reason is that they set an appropriate level of expectations for standardized tests and the accuracy and precision of their scores much more effectively than any standard error bar or confidence interval ever could. Nobody taking their own temperature expected the thermometer to tell them why they had a fever, or the dipstick to tell them why they were burning and/or leaking oil, or the snapshot to tell them what those three smiling cherubs in their pristine outfits looked like 30 seconds after it was taken.
Over time, however, it seems that such simple, straightforward metaphors have fallen out of favor. These days, you are much more likely to hear aligned, coherent, balanced, diagnostic, accessible, instructionally useful, formative, normative and other primarily performative terms bandied about when describing standardized tests. I’m not suggesting that any one of those terms is completely devoid of meaning; and I’ll concede that when used appropriately, some portion of that meaning is relevant to discussions about standardized tests. All, however, lack the clarity of their metaphorical predecessors in conveying meaningful information to either professional or lay audiences.
Why did our simple metaphors for standardized tests fall out of favor?
Well, dipstick and snapshot are easy to explain away as quaint terms from a bygone era. In the case of snapshot, one might even argue that the metaphorical usage has become the term’s primary meaning, and that’s too much for me to process on a hot summer day.
But why abandon “thermometer”? Thermometers, in one form or another, are ubiquitous. Or perhaps I should say that the simple information that the thermometer provides – the temperature – is ubiquitous whether or not we see an actual thermometer attached to it.
It’s Not You, It’s Us
I think that the simple fact is that we came to believe that the thermometer was too simple a metaphor to meet our needs. That is, our need to give the people what they want and to market our tests. When people are clamoring for more information, it’s not a winning strategy to compare your product to a device that provides one simple piece of information – no matter how well it does its job.
And it’s not a question of improving the product.
No matter how much you improve a thermometer, it’s still a thermometer. It still simply tells you the temperature.
Think about how the thermometers we use to check our body temperature have changed over the years.
The state of the art has moved well beyond the glass mercury thermometers that many of us struggled to hold under our tongue for three minutes when we already weren’t feeling well. Long gone are the days of “shaking down” the thermometer and then trying to round up shards of glass and those little toxic balls of mercury when you accidentally whacked it against the bathroom counter.
We made the switch to digital thermometers that were easy to read and cut the time down to less than a minute.
Then they gave us disposable sleeves to make the whole process more sanitary.
Next came non-contact thermometers with almost instantaneous readings.
And we even have wearables that can monitor body temperature and changes in body temperature constantly, if we like.
There have also been corresponding underlying changes to how body temperature is measured or estimated, but across all of these changes, the output is still the same:
It’s simply the temperature.
I could attempt to make a similar timeline showing the evolution of the standardized test over the same time period, but the bottom line would be that with increased rigor, length, and attached accountability consequences, one could argue that the “improvements” in standardized tests have been comparable to moving from the oral to the rectal version of the glass mercury thermometer. In other words, standardized tests are a pain in the butt.
Be that as it may, here’s the rub: the simple temperature and the standardized test score are both still important and useful pieces of information.
People still want to know their own temperature, how warm or cold it is outside, the oven temperature when they are baking a cake, etc. – all information that can be provided by a thermometer.
People still want an overall measure of student achievement or proficiency.
Of course, they also want more.
We Really Don’t Know Clouds At All
We’ve looked at the standardized test score from all sides and every possible angle, trying to get more out of it to give people what they want. But we really don’t know any more about student achievement than we did before. For the most part, that’s because we’ve failed to develop measures of other key factors that people need in addition to the test score to inform policy and/or instruction; that is, our version of measures and indicators of things like amount of precipitation, wind speed and direction, barometric pressure, humidity, dew point, wind chill factor, heat index, etc. We have done little to develop models to describe and predict future performance based on currently available information.
Watch a weather report on any nightly news broadcast and you get all of that, nicely packaged into a weather forecast in which the plain old temperature is still a featured player.
The basic standardized test score, too, can still be a featured player in informing educational policy and evaluating instructional programs, but only if we develop measures and instruments to provide the other information that people also want and need to supplement the information provided by the test score.
Because when it comes to pretending that the standardized test is more than a thermometer and test scores provide more information than they actually do, we’ve been up and down and over and out, and I know one thing: no amount of processing is going to allow us to wring any more out of a standardized test score than was there from the beginning.
The test is a thermometer. The score is a temperature.
And we should accept them for what they are.
Image by Jarosław Kwoczała from Pixabay