Life is full of three-word phrases.
Some tend to have profound and lasting consequences that extend far beyond what may have been intended when they were uttered. Phrases such as I Love You, That Looks Safe, and, for those among us wavering on New Year’s resolutions, Just One Bite might fall into this category.
Other ubiquitous three-word phrases like While Supplies Last, Limited Time Offer, Exclusions May Apply, and Void Where Prohibited function exactly as intended, even if we are usually not happy to see them. Often hidden in the fine print, their sole purpose is to put constraints on an offer or claim that is being made.
In the last couple of years, a three-word phrase has begun to make its way into the assessment lexicon – on this test. At first glance, the phrase, or a close variation of it, seems neither new nor threatening when used to describe student or group performance. Charlie spelled 23 words correctly on this week’s spelling test. Karla met the college readiness benchmark on the SAT. In Vermont, 42% of grade 5 students performed at the Proficient level or higher on the Smarter Balanced mathematics test. Taken at face value, the phrase is used simply to identify the test that was taken.
Recent use of this common phrase, however, is intended to do much more than identify the source of performance. Its purpose is to limit interpretation of student or school performance: to make it clear that the performance should be interpreted within the specific framework of the test or testing program.
Again, at first glance, we might regard this use of the phrase as innocuous or perhaps even a step forward in test use and interpretation. Identifying the source of a test score seems quite consistent with many of our Standards for Educational and Psychological Testing, beginning with Standard 1.0:
Clear articulation of each intended test score interpretation for a specified use should be set forth, and appropriate validity evidence in support of each intended interpretation should be provided.
When considered as part of a larger effort to marginalize and vilify large-scale assessment, however, the connotation of the phrase on this test changes dramatically. It is the second punch in a one-two combination intended to knock out large-scale assessment. The left jab that has weakened the credibility of large-scale assessment is the argument “Scores on large-scale assessments are ______” – fill in the blank with your favorite criticism: not valid, unfair, inaccurate, not representative, unstable, insufficient, not authentic, etc. Now with the right cross of on this test, critics of large-scale assessment (or its uses) seek to nullify test scores by limiting their interpretation to that already weakened large-scale assessment.
Even the most well-designed assessment program can sustain only so many of these blows before collapsing in a heap to the canvas.
My first encounters with the on this test crowd occurred while working with two states setting achievement standards on their new college-and-career-readiness tests. A vocal minority in both states were adamant that the phrase on this test should be added to each achievement level description. Their stated intent was to convey that students’ performance on the state assessment was not representative of their overall level of achievement.
My most recent encounter came late last year in a Stephen Sawchuk post in Edweek about the decision to add the modifier NAEP in front of the achievement level classifications on the National Assessment of Educational Progress, as in NAEP Basic, NAEP Proficient, NAEP Advanced. As stated in the post, “[t]he rewording may seem awfully minor to the uninitiated. But there’s a deeper subtext behind the changes, and that’s why this is worth noting.”
For its part, the National Assessment Governing Board (NAGB) makes the argument that the addition of the NAEP modifier is intended to clarify that the NAEP Proficient level “is not intended to reflect ‘grade level’ performance expectations, which are typically defined normatively and can vary widely by state and over time. NAEP Proficient may convey a different meaning from other uses of the term ‘proficient’ in common terminology or in reference to other assessments.” Forgoing for now a discussion of whether the NAEP achievement levels are defined any more or less normatively than any other achievement levels, nobody can deny that achievement standards can and do vary widely by state and over time. Consequently, there is confusion when the same label Proficient is used across states and assessments to describe those varying standards. From that perspective, the label NAEP Proficient serves the purpose of clearly identifying the set of achievement standards against which student performance is being judged.
For long-time critics of the NAEP achievement standards, however, the modifier is another weapon in their fight to marginalize NAEP results. The achievement level results no longer represent what proficient fourth or eighth grade students across the United States should know and be able to do; rather, they simply reflect NAEP Proficient – a mythical concept that is not tied to any state’s grade level standards and expectations.
As assessment/measurement specialists, our professional values and standards have made us unwitting accomplices in the effort to undermine large-scale assessment. We agree with and/or can be quoted making statements such as:
- test scores should not be considered in isolation,
- a student’s score on a given day or test might not reflect her/his true performance,
- multiple measures should be used to evaluate student achievement, or
- a test score reflects student performance on this test.
In the past, we mastered the art of expanding on those statements via PowerPoint bullets and charts to defend large-scale assessment with winning arguments before policy makers and the courts. In this era of soundbites, tweets, and memes, however, we may never get that far.
With hubris, we attach a great deal of importance to our work and our high-quality assessments. Remember, however, that without the ability to generalize student or school performance beyond a particular test we have nothing. The task before us is clear, and if we envision a future in which large-scale assessment makes a valuable contribution to improving student learning, we must not fail on this test.