Seeking Comparability in an Incomparable Year

The past year has been a year like no other. We experienced a summer like no other, a World Series like no other, an election like no other, followed by a Thanksgiving, Christmas, Presidential Inauguration, and Super Bowl like no other.

How can we possibly expect to produce comparable state test results in this school year like no other?

The purpose of this post is not to address the general question of whether administering spring 2021 state tests is a good or bad idea. There are many facets to that question that can be debated on their merits. [For the record, I think the benefits of testing outweigh the potential costs.]

Rather, the purpose of this post is to offer counterarguments to three issues that have been raised as significant technical concerns threatening the comparability of results from state tests administered in spring 2021: remote test administration, not being able to test all students, and the difficulty in interpreting test results.

Given that debates of this kind appear to rely heavily on a call to the authority of those making the argument, I will provide a brief summary of my experience with the issue of comparability to establish such authority. The sad fact is that I have spent more time wrestling with the issue of state test comparability over the past 15 years than any sane person should.

I have co-authored CCSSO-commissioned reports on comparability in 2013 and 2016, a peer-reviewed journal article, and a book chapter.
I authored a $93K report on the comparability of state test results and a series of annual $15K reports. (Money is substance, after all.)
I participated in a federally funded multi-year study on test score comparability.
I was there in DC when the field debated whether it should be pronounced “compare – ability” or the finally accepted [KOM] + [PUH] + [RUH] + [BIL] + [UH] + [TEE].
I was there the following year when the field spent valuable time and resources trying to convince people that “compare – ability” was also something important, but different from comparability.
If all of that is not sufficient, I spent a solid week in 2009 driving the incomparable Miss Phoebe from North Carolina to Virginia to Washington, DC for a series of meetings on comparability.

It is from that lofty perch, therefore, that I offer these thoughts on the issue of comparability and spring 2021 state tests.

Results from Remote and In-person Test Administrations Are Not Comparable

Remote test administration appears to be a particular bogeyman for the comparability crowd. The lists of potential threats to comparability read like the ingredients label of any processed comfort food that we have relied on during the pandemic:

Differential access to the internet, differential access to devices (some students taking the test on a cell phone), suitable environments for testing, interrupted testing due to technology issues (e.g., dropped Zoom connections, screens freezing), interrupted testing due to physical issues (phone ringing, other people in the house, outside noise), differences in student motivation due to any or all of the above, having to take a test alone with no other students in the room, access to accommodations, differences in or lack of proctoring support, having to test at the kitchen sink [OK not really, but these lists have included everything but the kitchen sink]

Remote test administration is especially interesting because so many of the issues raised in its name get at the heart of the incomplete understandings (i.e., misconceptions) about comparability that plague the field. Comparability, like validity, is centered on the inferences that one wants to make from test results. Conditions that may increase comparability for one set of inferences may actually decrease it for another.

The comparability question that matters in spring 2021 is not whether the performance of students taking the test remotely should be compared to those students who have been attending school and were administered the test in-person.
The question of interest is not whether students taking the test remotely might have performed differently under more favorable test administration conditions (maybe).
The question is not whether students instructed and tested remotely would have performed differently if there had been no pandemic.

The question of interest is how well test performance reflects students’ level of achievement given the instruction they have received for the past year. If there is enough confidence that test results reflect actual student achievement well enough for the purposes for which the test results are being used, then the results are comparable.

Testing conditions which we might label as noise under normal circumstances may be signal this year. Many of the conditions included in the list above are very real and arguably will affect a student’s test performance. For students taking the test remotely, however, those same conditions have likely affected their instruction for the past twelve months. The same factors that might depress test performance have likely had a negative impact on instruction and depressed achievement throughout the school year. How likely is it that those conditions will affect student performance differently on test day than they have throughout the year?

A final note in this section is that the last thing states want to consider in 2021 is the use of mode adjustments to attempt to control for differences due to the pandemic. At a minimum, mode adjustments require an assumption that the differences being adjusted for are due solely to a construct-irrelevant factor such as students’ unfamiliarity with a computer test platform. For the reasons discussed above, it’s not likely that is the case this year. With the introduction of computer-based testing a few years ago, the field made many mistakes in attempting to use mode adjustments as a panacea – a miracle cure for all sorts of design flaws. We don’t want to be error-repeaters in that regard.

We Won’t Be Able to Test All of the Students

It’s almost certainly true that many states will be unable to test 95% or more of their students in spring 2021 as they might in a normal year. There will be opt-outs, issues with remote and hybrid schooling, and other reasons why some students will not be tested. OK, fine. Is this an all-or-nothing question? 95% or bust?

To this point in time, missing from the “not comparable because of the sample tested” argument is serious discussion of and guidance on what states will be able to do with test results if some, but not all, students can be tested. Is there a cutoff in terms of percentage of students tested or representativeness of the sample?

Do we throw out state testing if 90% of students can be tested?
What about 70%? 60%?
What about 50%, if it’s a representative sample?
What if the percentage of righteous students tested is five less than fifty? Will you throw out state testing for the lack of five percentage points?

The interpretation of test results will be more challenging if not all students are tested. The jump to the conclusion that this is a valid reason for not testing, however, is dependent upon the legitimacy of the third comparability-related argument.

Results are Likely to be Misinterpreted

The crux of this argument is that test results are difficult to interpret and are often misinterpreted or misused under normal circumstances. Therefore, there is little chance that they will be interpreted correctly this year. In essence, it is our professional responsibility to protect assessment-illiterate and data-illiterate stakeholders (educators, policymakers, media members, the public, parents and students) from themselves. (We’ll start by telling them that a two-week lockdown will be enough to flatten the curve.)

This argument either a) reeks of professional arrogance, or b) is an indictment of a field that has failed miserably for the past two decades at helping stakeholders interpret test results. Actually, my money is on both. The effort to find reasons to stop state testing in spring 2021 has produced some strange bedfellows.

If anything, spring 2021 should provide a golden opportunity to make real progress in helping people understand the uncertainty and limitations that are always associated with state test results. Expectations of certainty and absolutes are low this year. People are open to cautions, caveats, and understanding alternate explanations for causes of performance. Why would we give up this opportunity to advance the public’s understanding of state test scores?

In conclusion

There are solid arguments to be made on both sides of the issue of whether state tests should be administered in spring 2021. Among those are legitimate technical concerns (e.g., equating, field testing) and concerns about the security of test items needed for future use. Let’s not make this decision more difficult for state policymakers, however, by clouding the debate with pseudo-technical arguments intended only to promote an anti-testing in 2021 agenda.

Published by Charlie DePascale