It’s pretty much impossible to engage with any media platform on any topic without someone telling you, “The data speak for themselves!” or perhaps the less pedantic, “The data speaks for itself!” Often, that message is being delivered by a trusted authority like rock star Dr. Fauci (lately, very often). Other times, it may be a cable news infotainer or random Twitter user urging you to simply listen to the data. In every case, however, it seems that data need a spokesperson. The data never actually speak for themselves.
In fact, I don’t recall ever hearing data speaking for themselves, and I have been listening. Not to get all arrogant Pharisee about it, but I would like to believe that if data were going to talk directly to anybody it would be to someone like me. I have loved data my entire life, and with the exception of that one time experimenting with confirmatory factor analysis when I was young and desperate for a job, my intentions have been pure.
But no, speaking directly to us is not the way data work. Data, as they say, work in more mysterious ways.
And just maybe after the third time through the data …
Just because data won’t speak for themselves, however, doesn’t mean they won’t open up to you if you treat them right. Treating data right means that you have to respect the data, take the time necessary to understand them, and not ask more of them than they are able to offer. If you do that, you may be amazed at what a data set has to offer. You just cannot rush it.
- The first time with data is an opportunity to become acquainted.
- The second time, you start to develop a deeper mutual understanding – what’s in the data set and what’s missing; who’s included and, perhaps more importantly, who’s not; how the data were collected; and whether this is the first time they have been collected or there is a history to explore.
- With that understanding, it’s only after the third time with data that you begin to get a sense of whether these data can meet your needs and expectations – whether you will be able to forge a long-term relationship or whether this would be just a one-off analysis of convenience because the data are available.
Every data analyst has to be willing to part ways with a data set when it becomes clear that things just aren’t going to work out. There are so many quotes from John Tukey that apply to this situation, but the best may be:
“The data may not contain the answer. The combination of some data and an aching desire for an answer does not ensure a reasonable answer can be extracted from a given body of data.”
Then there is the more serious warning, of disputed origin, that if you torture data long enough they will tell you anything. It may be true that there are eight million stories in the naked data if we conduct enough analyses, but few of them will stand up to cross-validation or replication. Those aren’t the stories we want our data to tell.
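For anyone who wants to watch that happen, here is a minimal sketch in Python. Everything in it is an assumption made up for illustration: the function names `pearson` and `torture_noise`, the sample sizes, and the 0.2 correlation cutoff (roughly the conventional .05 significance level with 100 rows). The outcome and every predictor are pure random noise, so any “finding” is a story the data never meant to tell. The sketch looks for findings in the first half of the rows and then checks whether each one shows up again, with the same sign, in the second half.

```python
import random


def pearson(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def torture_noise(n_rows=200, n_predictors=500, cutoff=0.2, seed=0):
    """Correlate many noise predictors with a noise outcome.

    Returns (discovered, replicated): how many predictors pass the cutoff
    in the first half of the rows, and how many of those pass it again,
    with the same sign, in the second half.
    """
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(n_rows)]
    half = n_rows // 2
    discovered = 0
    replicated = 0
    for _ in range(n_predictors):
        predictor = [rng.gauss(0, 1) for _ in range(n_rows)]
        r_first = pearson(predictor[:half], outcome[:half])
        if abs(r_first) >= cutoff:
            discovered += 1
            r_second = pearson(predictor[half:], outcome[half:])
            # A replication needs the same direction and the same strength.
            if abs(r_second) >= cutoff and r_first * r_second > 0:
                replicated += 1
    return discovered, replicated


if __name__ == "__main__":
    found, held_up = torture_noise()
    print(f"'Findings' in the first half: {found}")
    print(f"Still there in the second half: {held_up}")
```

Run enough analyses and the first number will not be zero, even though there is nothing to find. The second number is the one to trust.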
Ah, but when you have found the right data set, the one matched to your needs and expectations, oh, the places you’ll go and the stories you’ll tell.
“Stories constitute the single most powerful weapon in a leader’s arsenal.” – Howard Gardner
When you are fortunate to have found the right data, the right finding (and not simply a finding that supports your position), and the right story, the data still won’t speak for themselves. They need someone to tell that story. When presented well, stories with and about the data can be highly effective. With regard to the results and findings from large-scale testing, however, our storytelling has been, for lack of a better term, piss-poor.
While technology has led to advances in virtually all aspects of large-scale testing over the past two decades, I would argue that reporting large-scale test results has regressed from where it was conceptually in the 1990s and where it was heading aesthetically through the work of organizations like The Grow Network in the early 2000s.
Although there are exceptions, they are just that, exceptions. The most common method of conveying large-scale test results on paper and via the internet is still a static, flat data table or a series of flat data tables that mistake a data dump for a data report.
The shortcomings are many, but a few examples will suffice for this post:
- Since annual testing began under NCLB, 11 cohorts of students have now progressed through grades 3 through 8. Test reports at all levels (student, school, district, state) ignore that wealth of longitudinal data and information in favor of presenting results from individual years or cross-sectional results – driven by compliance with federal law and regulations.
- Although probability and uncertainty are central to the large-scale testing story, we either choose to ignore them altogether in our reports or depict them via a standard error bar that is both technically inaccurate and conveys misleading information to its intended audience – a bad storytelling perfecta.
- In the late 1980s and early 1990s, the test score was a supporting character in reports of large-scale test results that focused on how various attitudes, beliefs, practices, and programs were related to student and school performance. With much better access to all sorts of supporting and explanatory data today, better tools to model the relationship between those data and test performance, and the technology to present that story effectively, we simply report test scores disaggregated by federally required subgroups. If you want tests to provide actionable information and to move away from a deficit mindset, let’s move from reporting what to describing why.
Perhaps we have sacrificed utility for perceived accessibility. Perhaps we don’t know what story to tell. Perhaps we have not thought enough about it. Perhaps we don’t have the right people thinking about it.
In A Whole New Mind (2005), Daniel Pink wrote, “The future belongs to a different kind of person with a different kind of mind: artists, inventors, storytellers – creative and holistic ‘right-brain’ thinkers.” You may be thinking that doesn’t sound like the type of person you’ll find leading a testing company, a graduate program in psychometrics, or a state assessment program, but it could be.
Who Tells Their Story? – A Final Thought
If “Let the data speak for themselves!” has been one of the dominant messages of 2020-2021, the other, without question, has been that it matters profoundly whose story is being told, whose perspective is included in the story and who is included in the storytelling. We cannot forget that the story of large-scale test results always comes with a perspective. In closing, I’ll leave you with these lines from the song Wonderful from the musical Wicked.
Elphaba, where I’m from
We believe all sorts of things that aren’t true
We call it “history”
A man’s called a traitor or liberator
A rich man’s a thief or philanthropist
Is one a crusader or ruthless invader?
It’s all in which label is able to persist
There are precious few at ease
With moral ambiguities
So we act as though they don’t exist