If asked to identify the biggest successes of the Education Reform movement over the past two decades, I would have to put selling the importance of disaggregating data at or near the top of my list. Acceptance and adoption of the practice of disaggregating data is well beyond what one might expect from mere compliance with federal requirements or even testing industry standards. The abundance of disaggregated data, however, can be a double-edged sword. Like virtually everything else in our assessment and accountability enterprise, disaggregated data are subject to unintended consequences.
The same set of disaggregated assessment results may be used by one group of stakeholders to advance the goals of education reform, while it is being used simultaneously by other stakeholders as evidence for preserving the status quo. Used as intended, disaggregated data can provide valuable, finer-grained information to guide the interpretation and use of assessment results to evaluate and improve curriculum and instruction for all students. Conversely, disaggregated data can also facilitate the efforts of well-meaning individuals (and others) to adopt or accept lower expectations for historically low-performing groups of students.
The greatest concern with the deluge of disaggregated data, however, may be that it can result in paralysis by analysis as policymakers and educators try to make sense of multiple layers of score reports from dozens of assessments disaggregated by overlapping subgroups of students.
Is that an elephant?
When I think of people trying to make sense of disaggregated K-12 large-scale and interim assessment data I cannot help but think of rooms filled with elephants, all kinds of elephants. There is that big old elephant standing right there in the middle of the room that nobody wants to talk about. Then there are all those pink elephants flying about. Sometimes, of course, the room is so cluttered with tables and walls of disaggregated data that you cannot see the elephants through the tree, pie, bar, and other charts.
One of the first perceived benefits of disaggregating state assessment results was to shine a spotlight on the elephant in the room. Even within the highest performing districts and schools in the state there were often subgroups of students for whom the educational experience was quite different. At the state level, there might be persistent achievement gaps that stayed the same or grew wider even as overall state performance improved from one year to the next. Placing an emphasis on disaggregated results made it more difficult to ignore those gaps and groups of students. Repeatedly shining a spotlight on a problem, however, is quite different from fixing the problem.
As disaggregation became the norm and accountability the law, educators came face-to-face on an annual basis with literally hundreds of year-to-year comparisons across their assessments, and those comparisons are often based on small groups of students. That situation, of course, is the perfect breeding ground for statistical pink elephants, or worse, red herrings, that result in resources being misdirected toward problems that don’t exist or redirected each year as new random results pop up.
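The scale of this problem is easy to demonstrate with a short simulation. The numbers below are purely illustrative (300 comparisons, subgroups of 15 students, a scale with mean 500 and standard deviation 50, and a simple two-sample z-style check, none of which come from any actual state program): even when nothing whatsoever has changed from one year to the next, a meaningful share of comparisons will look "significant" by chance alone.

```python
import random
import statistics

random.seed(42)  # fixed seed so the illustration is repeatable

def spurious_gap_count(n_comparisons=300, group_size=15,
                       mean=500.0, sd=50.0, z_crit=1.96):
    """Simulate year-to-year subgroup comparisons in which the true
    performance never changes, and count how many comparisons would
    nonetheless be flagged as a notable difference."""
    flagged = 0
    for _ in range(n_comparisons):
        # Both "years" are drawn from the identical distribution:
        # any observed gap is pure sampling noise.
        year1 = [random.gauss(mean, sd) for _ in range(group_size)]
        year2 = [random.gauss(mean, sd) for _ in range(group_size)]
        diff = statistics.mean(year2) - statistics.mean(year1)
        se = (statistics.variance(year1) / group_size
              + statistics.variance(year2) / group_size) ** 0.5
        if abs(diff) > z_crit * se:  # looks like a real change
            flagged += 1
    return flagged

print(spurious_gap_count())
```

With a 5-percent criterion, roughly fifteen of the three hundred comparisons will be flagged in a typical run; each one is a pink elephant that could pull a school's attention and resources toward a "problem" that does not exist.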
Now cross the subgroup information with the layers of disaggregation of the test results themselves: total scores (scale scores and achievement levels), domain or claim scores, standard scores, scores by item type, and perhaps item-level scores for a sample of released items. The result is a thousand points of mirrored light bouncing off one another, making it impossible to recognize that the piece of data you are holding is part of an elephant. In that situation it becomes impossible (a) to identify the major problems to be solved (i.e., the elephant in the room) and/or (b) to focus sufficient resources on those major problems.
Accepting without reservation the importance of disaggregating data, is there a way to do so that will minimize unintended consequences? Can we present assessment results in a way that will promote and facilitate effective action to improve curriculum, instruction, and student learning?
The short answer is yes, with a much more thoughtful and proactive approach to the interpretation and use of results from K-12 assessments.
I Need Data.
Is there any other kind?
Back in 1998, when Massachusetts released the first round of results from its brand-new state assessment program, the Massachusetts Comprehensive Assessment System (MCAS), the state made the deliberate decision not to disaggregate or otherwise interpret results by categories of race and ethnicity, sex, or economic status. Its primary reporting strategy was driven by the desire to communicate the state’s commitment to holding all students to the same standard. Consequently, the primary focus of reporting was on the participation and performance of all students.
The only categories reported in the state release of that program-defining initial set of results were the non-overlapping, state-defined student status categories: Students in Regular Education Programs, Students with Disabilities, and Limited English Proficient Students. The anachronistic names notwithstanding, the state chose to focus attention on the participation and performance of two groups of students who had been systematically exempted from the previous assessment program: students with disabilities and English learners.
Their thinking regarding the reporting of results by race/ethnicity was also driven by the desire to communicate the state’s commitment to holding all students to the same standard. In their previous assessment program, reports of school and district results included a regression-based “comparison score band” that allowed schools and districts to compare their performance against demographically similar schools rather than a fixed criterion. The district summary excerpt below from the previous assessment system shows both the high percentage of students exempted and the use of the comparison score band.
The decision not to report results by race/ethnicity resulted in backlash from people accusing the department of trying to hide the low performance of certain subgroups and also from people upset over the loss of their demographic safety net. As the old adage says, if neither side likes your decision, you must be doing something right.
Making Sense of It All
Just as there are reasons for disaggregating data, there are also reasons why we aggregate assessment data across items and across students. At one level, of course, there are statistical benefits to aggregation, benefits that support the interpretation and use of test scores. Perhaps more importantly, however, we aggregate data into units that are practical and actionable. Although it is not possible (or advisable) to avoid disaggregating data by all federally required reporting categories, it is possible to prioritize areas for improvement and to structure reporting efforts accordingly to best communicate those priorities. With the limited amount of data available due to the pandemic, this might be a good year to start thinking seriously about what you want and need to communicate with the data that you have.
States can learn a lesson from the release of NAEP results. States may not have the time, resources, or legal authority to shape their messages in the same manner as NAEP, but the reporting of assessment results must be more than a simple data dump. Whether it is the achievement gap, chronic absenteeism, or performance on items measuring higher-order cognitive skills, a state (or district) can prioritize the actionable information that it wants to feature in its release of results. Disaggregated data are a powerful tool, but like all tools, they are useful only if people are given the right tool for the job and know how to use it.