IADA and the Comparability Fallacy

The recent Request for Information by the U.S. Department of Education (USED) related to improving the Innovative Assessment Demonstration Authority (IADA) has produced a spate of posts, letters, articles, etc. about comparability.

Spate is such a lovely onomatopoeic word. And so appropriate to describe the arguments being made about comparability.

Spate.

It feels like some sort of formal (perhaps British) past tense of “spit,” as in

Those chaps really were spating into the wind with those proposals.

or

That take on comparability spate in the face of conventional measurement thinking.

On top of that, when I say it out loud, it evokes the same sense as the word “splat”; that is, the sound that their arguments about comparability and IADA make as they hurl them against the wall in the hope that one might stick.

While there are many reasons why IADA may have fallen short of its intended goals (whatever those might have been), it is difficult for me to regard its comparability requirements as a primary culprit. In my view, it’s more likely that comparability is being used as either a convenient scapegoat or as a straw man to be knocked down in the service of some ulterior motive.

Before circling back to what that ulterior motive might be, let’s briefly consider two key questions:

  • Why require comparability in IADA?
  • What is comparability?

Why Require Comparability in IADA?

To answer the question “Why require comparability?” we first have to answer the questions “What is IADA?” and “Why was it part of ESSA?” Unfortunately, the word-salad name of the program, “Innovative Assessment Demonstration Authority,” provides few clues to those of us who are not native speakers of the “Capitol Hill” dialect of English. Even more unfortunately, those who jump on the first two somewhat familiar words, “Innovative Assessment,” are headed down the wrong path.

Although it was less than a decade ago, we tend to forget the context in which ESSA and IADA were formed.

The federal government had just poured billions of dollars into the Race to the Top Program, and hundreds of millions of dollars were dedicated specifically to building the next generation of large-scale assessment systems. ESSA was an opportunity to reap what had been sown.

It was also a time when the federal government was pushing assessment audits and caps on testing time. The law included a provision that allowed states to use college admissions tests in place of their required high school assessment and to replace the traditional end-of-year state test with interim assessments. ESSA was a time for efficient solutions.

One can reasonably conclude, therefore, that the “innovation” IADA was looking for was the innovative use of the technological infrastructure states had built under NCLB and RTTT: using next-generation assessment systems to more efficiently produce the estimates of students’ college-and-career readiness that states needed to feed into their accountability systems and meet federal accountability requirements.

I will discuss how the educational assessment community tends to use the term “innovation” in my next post, but for the purpose of this post it’s important to understand that the colloquial use of the term by the rest of the world (including those on Capitol Hill) is focused primarily on efficiency. As one online definition puts it:

The purpose of innovation is to come up with new ideas and technologies that increase productivity and generate greater output and value with the same input.

Innovation is about efficiency, increasing productivity while reducing costs – including the cost of time. Innovation is a value-add for the investment in assessment, student information systems, computer-based testing platforms, the Common Core content standards, and added provisions for assessment flexibility in the new law.

The goal was not to come up with a new definition of college-and-career readiness. Rather, the goal was to find a more efficient way to determine whether high school students were, in fact, college-and-career ready and whether other students were on track to college-and-career readiness.

Within that context, the requirement for comparability makes perfect sense.

What is Comparability?

When I reflect on my own experiences focused on answering the question, “What is Comparability?” the oft-cited observation made by President John F. Kennedy at a 1962 White House dinner honoring Nobel Laureates comes to mind:

I think this is the most extraordinary collection of talent, of human knowledge, that has ever been gathered together at the White House, with the possible exception of when Thomas Jefferson dined alone.

Over the past couple of decades, I had the good fortune several times to be in the same room with the “most extraordinary collection of talent” from state departments of education, academic institutions, and the assessment industry as they attempted to define the concept of test and test score comparability and determine how to evaluate comparability in real-life assessment settings.

Around the same time that the Common Core State Standards were being developed, I had the privilege of playing a small part in a multi-year, multi-state project for CCSSO, led by Phoebe Winter, Evaluating the Comparability of Scores from Achievement Test Variations.

The first major takeaway from that project was:

  • “The comparability of test scores is a matter of degree. How comparable scores need to be for a specific test variation depends on how the test scores will be interpreted and used. We can think of the degree of comparability along two related dimensions, content and score level, as shown in Figure 1.” (p. 6)

[Figure 1: Degrees of comparability along two related dimensions: content and score level]

In December 2016, USED made it clear that for IADA they were interested in rather low levels of comparability on both dimensions (i.e., state content standards and either achievement level or pass/fail scores).

A second takeaway from the project was that there is a difference between how psychometricians traditionally have used the term “comparability” and the use of the term with regard to questions about the use and interpretation of scores in a program such as IADA. The IADA interpretation is much closer to the colloquial use of the term.

Those takeaways in no way suggest that comparability questions about the use of test scores are not complex and do not require careful investigation. Those questions are complex, and they must be addressed thoughtfully and completely. There is no need, however, to blur the comparability issue with irrelevant technical concerns and their associated baggage.

About a decade later, I was able to participate with another extraordinary collection of talent in the conceptualization and production of the National Academy of Education publication, Comparability of Large-Scale Educational Assessment: Issues and Recommendations, which “provides guidance to key stakeholders on how to accurately report and interpret comparability assertions concerning large-scale educational assessments as well as how to ensure greater comparability by paying close attention to key aspects of assessment design, content, and procedures.”

As for the Thomas Jefferson aspect of the JFK quote, throughout the decade bookended by those two projects, I was most fortunate to have an office next to Brian Gong, where my thinking on comparability evolved over the course of countless formal, informal, and often impromptu discussions that ran well beyond the time one or both of us was supposed to have been home for dinner.

In addition to a chapter co-authored in the aforementioned NAEd publication, another tangible product of those discussions with Brian was a 2013 paper commissioned by CCSSO, Different But the Same: Assessment and comparability in the era of the Common Core State Standards, in which we addressed some of the most pressing questions on the minds of state chiefs and policymakers at the time (if not state assessment staff):

  • Just how comparable would results from the PARCC and Smarter Balanced tests be?
  • How comparable did they need to be?
  • How could states and researchers go about answering both of the previous questions?

Comparability questions not all that different from those that might be posed under IADA.

Just as the comparability claims that states and others wanted to make about college-and-career readiness based on PARCC and Smarter Balanced scores were rather straightforward, so too, I believe, are the comparability claims expected by and within IADA:

Regardless of the assessment system used (current traditional assessment or a new innovation), can the state make the same claim, with the same level of confidence, about whether a student meets the state’s standard for college-and-career readiness, proficiency, mastery, or whatever the state has chosen to call its desired outcome?

Frankly, within the framework of “innovation as efficiency,” the current comparability requirement serves as a good, if indirect, gatekeeper for the IADA. At a minimum, it requires people to demonstrate that they are able to think outside of the box.

Here’s the bottom line: If a state, its assessment contractor, and/or its technical advisors are unable to accomplish the relatively pedestrian task of crafting a comparability strategy and argument good enough to suffice for the Secretaries and functionaries at the USED, I have little faith that they have what it takes to conceive of, develop, establish, implement, and evaluate an innovative assessment program.

For the sake of argument, however, let’s accept the premise that our best and brightest have not run off to certification testing and that there are at least still a few folks left working on state testing who are able to think outside of the box.

What, then, is the problem with the IADA comparability requirement?

Thinking of a Different Box

The simple answer is that the people protesting most loudly about the IADA’s comparability requirement are not interested in thinking outside of the current box, or more accurately, innovating within the current box. They are interested in building a different box.

They are not interested in doing the same thing differently (i.e., innovatively, more efficiently, more productively). They are interested in doing a different thing.

Comparability with the current definition of college-and-career-readiness in English language arts, mathematics, or science is not their goal. They have a different outcome in mind. They may argue that theirs is a better outcome for any number of reasons, and most likely they will be correct.

So be it. Different is Different. And states have done different before, well before there was an IADA.

Sometimes, states have double tested a sample of students and schools for a year, or two, as they considered a new system.

Sometimes, states have petitioned USED for a transition year as they implement a new system.

Often, states have simply made a clean break from one year to the next as they switched assessments, content standards, and/or achievement standards. (Perhaps not the best option, but who am I to judge?)

What’s different this time that states feel they need to pull students, and perhaps more importantly schools, out of their current assessment and accountability system for multiple years in order to effectively implement their new system?

That is the question that advocates calling for changes to the comparability (and other) requirements of IADA must answer – along with explicitly making their argument that it is time to do something different, not simply do, measure, and assess the same thing differently.

Then it will be up to the powers that be to decide whether that endeavor fits within IADA or whether there is a need to come up with a new program.

In my next post, I will review how innovative assessment programs, and the way we discuss innovation in educational assessment, differ from the innovation-as-efficiency model, and I will discuss whether it makes sense to force-fit innovation into a program like IADA.

Spoiler alert: It doesn’t.

Header image by Mediamodifier from Pixabay

