The truth of the matter (and that’s the only context in which I’ll use the word truth when discussing the reporting of test scores) is that those five words in the title would provide much better guidance on the interpretation and use of test scores than any of our attempts at technically based explanations of the limitations of those scores.
Based on a True Story
When that message appears at the beginning of a movie, Netflix original, or docudrama, we know what to expect. The events set before us actually occurred, and the people who put the show together did their best to depict those events accurately. Without any further prompting, however, we understand that what we are watching isn’t exactly what took place.
We believe that we are getting a good sense of the story packaged neatly as a 2-hour summary or perhaps unveiled in an 8-episode limited series that we can consume in a weekend binge.
We know that dialogue has been created, the timeline altered, and that some events may have been left out altogether. We know that multiple people may have been combined into a single character who seems to be in the middle of everything in order to better tell the story. But when the credits roll, we feel that we have a better understanding of the event that took place than when we sat down to watch, and perhaps we may even have gained some insight into why events unfolded as they did.
And when you give it some thought, isn’t that what we want stakeholders to experience when they sit down with our test score reports?
The best that we can do
You’re probably thinking: surely we can do more than that, better than that. Our stakeholders deserve better. The children deserve better. The Standards demand better. Oh, my heavens, think about the Standards!
The reality, however, is no, we can’t do better than that. “Based on a true story” is the best that we can do and would be a vast improvement over the current state of the art in our reporting of state test scores.
The truth, sometimes she hurts. (Truth, there’s that word again.)
Let me be clear. When I say that “Based on a true story” is the best that we can do and a far, far better thing than we have ever done before, I am not referring to the reporting of state test results in general. Test score reports can and must do a much better job of telling the story contained in those test scores that we have devoted our lives to producing.
Rather, I am referring specifically to our attempts to depict test scores (e.g., scaled scores, raw scores, percentages, and percentiles) and the measurement error associated with them in some meaningful way. There we have failed, failed spectacularly, an epic fail.
Consider the following:
- Standard operating procedure is either to
- report test scores without any indication of the uncertainty contained therein – which I don’t need the Standards to tell me is just plain wrong, or
- place an error bar around the observed score accompanied by a description of standard error of measurement tailored for a lay audience – which we do in a way that conveys information that is just plain wrong, as I explained in one of the first posts in this blog, seven years and 165 posts ago.
- We have never come to grips with the difference between the standard error of true scores and the standard error of test scores, a difference which Frederick Lord made clear way back in 1952 in A Theory of Test Scores. 1952. Seventy years ago. Forty years before IRT became the coin of the realm in state assessment. (In case there’s any confusion, our state assessment reports contain test scores not true scores.)
- In our effort to report standard error of measurement, we are focusing all of our attention and all of the accumulated knowledge of our profession (well, except for that bit from Lord in 1952) on accounting for and describing the few test-related factors that might influence individual student performance (or group performance). Are those factors important? Sure. Are they even close to being the most important factors for test users to consider when trying to understand the performance of an individual student on a particular day on a particular test form, or in attempting to interpret the performance of groups of students within a school? Hells no!
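One way to make Lord's distinction concrete is in classical test theory terms. The usual report draws a band of plus or minus one standard error of measurement around the observed score, but a band for the true score would be centered on Kelley's regressed estimate and use the smaller standard error of estimation. The sketch below illustrates the arithmetic only; the function name, scale mean, standard deviation, and reliability are all invented for illustration and do not come from any particular state testing program.

```python
import math

def score_bands(observed, scale_mean=500.0, scale_sd=50.0, reliability=0.90):
    """Contrast the usual observed-score band (+/- one SEM) with a band
    for the true score, centered on Kelley's regressed estimate and using
    the (smaller) standard error of estimation.

    All of the default values here are made up for illustration.
    """
    # Standard error of measurement: sd * sqrt(1 - reliability)
    sem = scale_sd * math.sqrt(1.0 - reliability)
    # Standard error of estimation: sem * sqrt(reliability)
    see = sem * math.sqrt(reliability)
    # Kelley's estimate regresses the observed score toward the scale mean
    kelley = reliability * observed + (1.0 - reliability) * scale_mean
    return {
        "observed_band": (observed - sem, observed + sem),       # what most reports draw
        "true_score_band": (kelley - see, kelley + see),         # narrower, pulled toward the mean
    }

bands = score_bands(560.0)
```

With these made-up numbers, the true-score band is both narrower than the observed-score band and shifted toward the scale mean, which is exactly the information that a plus-or-minus-SEM error bar around the observed score fails to convey.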
Based on all of the above, I have concluded that providing the disclaimer “Based on a True Story” and dispensing with the rest would be a major improvement over current practice, and is, in fact, the best that we can do.
The Story of Us
If this change in practice accomplished nothing else, it would enable attention (and time and money) to be shifted to the telling of the actual story, or stories, that we want to tell with test scores – the stories that increase the utility of state testing programs, the stories that facilitate the interpretation and use of test scores to improve instruction and student learning.
It should come as no surprise that left to our own devices, we (i.e., measurement and assessment specialists, psychometricians) designed test score reports which focused on standard error of measurement. SEM is our story. It is the story that matters most to us and the one that we know best – even if we have failed miserably in conveying its meaning and importance to others.
One of the reasons that we have failed in that task, however, is that we were doomed from the start. Measurement error is OUR story. It is not a story that interests policy makers, educators, students, or parents – the primary audience for test score reports.
Unfortunately, most of us are not equipped to tell the story of the educational implications of the test scores we produce, in the same way that we are not equipped to write the items that are included on those tests.
Who tells that story?
Short answer – not us.
Slightly longer answer – people who specialize in and are actively involved in education policy, instruction, and student learning.
The bottom line is that it is our job to ensure that the tests we develop measure what they are supposed to measure and have a standard error of measurement and level of reliability that is appropriate to support the inferences that users of the test want, or need, to make – inferences that allow them to tell their story. Having accomplished that, we should get out of the way.
Acting in good faith
Most of the problems associated with the interpretation and use of test scores have nothing to do with the standard error of measurement. Rather, they are caused by the desire to use test scores in ways that simply cannot be supported – no matter how small the standard error of measurement or how accurate or consistent the performance level classification.
The phrase “Based on a True Story” is not an empty promise. It carries with it some expectations and responsibilities.
A work that is “Based on a True Story” is not simply a period piece or a novel set in the midst of historical events. It is not simply a vehicle being used to promote a particular position or advocate for a certain cause. There is an expectation that it has offered a fair and balanced portrayal of all of the key information needed to understand the story.
There are other phrases such as “Based on True Events” or “Inspired by True Events” that convey lesser degrees of a connection to reality.
We want test users to have faith that the test results that we have given them reflect actual realities of student and school performance – that is, they are based on a true story. The safeguards that we have in place to monitor state testing programs such as Peer Review and professional Standards could stand to focus more on that aspect of testing programs.
For additional information …
Finally, even when we reach the point that test score reports do a much better job of telling the right story, any good work “based on a true story” will end with a list of resources for those who would like to learn more about the topic. When appropriate, the number for a crisis hotline is provided.
We do a bit of that now, but we can do a much better job directing people toward additional well-designed, accessible, multimedia resources designed to help them better understand student and school performance – as opposed to better understanding the test score. Developing and promoting such resources should be a top priority in reporting the results from state assessment programs.
If we continue to use state tests to produce test scores for individual students, that step in the storytelling process is even more critical.
Test score reports for individual students routinely direct parents and guardians to their child’s teacher for more information. It is essential, however, that we be clear that this is for additional information about student performance, and not primarily for additional information about the test or the test score.
We have to do more to make sure that parents, guardians, and teachers believe that statement is not just a throwaway line at the bottom of the report. A starting point to accomplish that is that we really have to believe that teachers know much more about the performance of individual students than we could ever measure on a state test or convey through a test score report that is, at best, based on a true story.