What images come to mind when you hear the word vertical?
- Is it the towering buildings that surround you when you attend a conference in New York City or Chicago?
- Is it your favorite ski run in Utah?
- Snowboarders soaring above the halfpipe a few months ago at the Olympics.
- Maybe that scene from The Polar Express. (Hold on Tightly!)
Now close your eyes and picture your favorite K-12 testing vertical scale.
What? You don’t have a favorite vertical scale? Oh, I see. You’re a normal person.
Even if you regard yourself as a practitioner in the fine art of large-scale testing, however, it may have taken you a bit of time to conjure up an image of a vertical scale.
Because, you see, while we will spend hours (careers even) putting our scales and scale scores under a microscope trying to find the meaning hidden within, we spend relatively little time thinking about the scale as a whole and what it is telling us – the big picture, the whole elephant, as it were.
When you did form that image of a vertical scale, I bet that it looked very different than the mental images of vertical listed above.
Sure, the typical K-12 achievement test vertical scale starts out like an escalator on the DC Metro. By the time it reaches the end of middle school, however, it begins to look more like your typical moving walkway in an airport terminal.
Why is that?
Vertically challenged vertical scales are not just a feature of the criterion-referenced state summative tests developed to meet NCLB and ESSA requirements. They are not an unintended consequence. They do not reflect an attempt by policymakers and their assessment contractors to game the system – to flatten the curve to show that more kids are college-and-career ready.
The same drooping phenomenon was present in the vertical scales underlying all of the low-stakes norm-referenced, standardized achievement tests that preceded the current generation of state tests.
Why is that?
The Same but Different
At first glance, the graphs of vertical scales from the old NRT and current state tests look quite similar. Don’t be deceived.
There is one key difference that makes the persistence of the flattened vertical scale more confusing, and perhaps more troubling.
A typical NRT graphic would have shown 50th percentile performance at each grade level, perhaps with Q1 and Q3 thrown in for fun. The flattening of the vertical scale across grade levels on the NRT was simply depicting actual student performance in an era of high dropout rates and well-defined high school tracks.
When we look at graphs depicting vertical scales on current state tests, we are most often looking at the progression of achievement level cut scores across grade levels.
That is, the graphs of state test vertical scales do not depict actual student performance (what is), they depict expected student performance (what should be). And those expectations become quite flat across the high school grades.
That’s a horse of a different color.
Of course, I know that standard setting methodologies and the achievement level cut scores they produce, by design, are heavily influenced by actual student performance.
“Scratch a criterion and you’ll find a norm” – a quote attributed to Bob Linn and used often over the years by my Center for Assessment colleagues – certainly applies here.
But I cannot attribute all of the flattening to norms. There has to be another reason to explain why expectations for student achievement are so flat across high school grades.
Why is that?
Getting SMARTER about Vertical Scales
How flat are vertical scales? Very flat.
Even flatter than they appear on graphs that begin with grade 3 tests.
By starting at the end of grade 3, state tests cut off the most vertical portion of the vertical scale. That’s right. Most of the real action in a K-12 vertical scale occurs between kindergarten and third grade. You can still see that phenomenon if you examine the vertical scales for interim assessment programs that begin testing students before grade 3.
Let’s use the Smarter Balanced scale as an example of how flat vertical scales become at higher grade levels. I know that it may seem as though I have been picking on Smarter Balanced and its scale since the earliest days of this blog, but that is not the case. Some of my best friends are Smarter – and others wish they were. Smarter Balanced is just a victim of its own success. They are an established assessment program with a vertical scale and lots of publicly available information.
Smarter Balanced reports test scores from grade 3 to grade 11 on a scale that ranges from 2000 to 3000, with three cut scores at each grade level that classify student performance into one of four achievement levels. As described by the state of Hawaii in their Family Report Interpretive Guide, “Students who performed at Level 3 or 4 have demonstrated the knowledge and skills necessary for college and career readiness if they continue their progress.”
Tale of the Tape
English Language Arts/Literacy
- The Level 3 cut score increases by 151 points from grade 3 (2432) to grade 11 (2583).
- The increase is just 16 points from grade 8 (2567) to grade 11 (2583).
Mathematics
- The Level 3 cut score increases by 192 points from grade 3 (2436) to grade 11 (2628).
- The increase is 42 points from grade 8 (2586) to grade 11 (2628).
Flat but wide:
For context, compare the difference in Level 3 cut scores across grades 3 to 11 to the difference between the Level 2 and Level 4 cut score within a single grade, grade 8.
In English Language Arts/Literacy, there is a 151-point difference in the Level 3 cut across grades 3 to 11, while the difference between the Level 2 and Level 4 cuts at grade 8 is 181 points on the Smarter Balanced Scale.
In Mathematics, there is a 192-point difference in the Level 3 cut across grades 3 to 11, while the difference between the Level 2 and Level 4 cuts at grade 8 is 149 points on the Smarter Balanced Scale.
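For readers who like to check the arithmetic, here is a minimal sketch of the comparison above. It uses only the numbers quoted in this post; the names `LEVEL3_CUTS`, `GRADE8_BAND_WIDTH`, and `span_vs_band` are made up for illustration and are not part of any Smarter Balanced tool.

```python
# Level 3 cut scores quoted in this post (Smarter Balanced scale).
LEVEL3_CUTS = {
    "ELA": {3: 2432, 11: 2583},
    "Mathematics": {3: 2436, 11: 2628},
}

# Distance between the Level 2 and Level 4 cuts at grade 8, as quoted above.
GRADE8_BAND_WIDTH = {"ELA": 181, "Mathematics": 149}


def span_vs_band(subject):
    """Return (growth in the Level 3 cut from grade 3 to grade 11,
    that growth divided by the grade 8 Level 2-to-Level 4 width)."""
    cuts = LEVEL3_CUTS[subject]
    growth = cuts[11] - cuts[3]
    return growth, growth / GRADE8_BAND_WIDTH[subject]


for subject in LEVEL3_CUTS:
    growth, ratio = span_vs_band(subject)
    print(f"{subject}: {growth} points across grades 3 to 11, "
          f"{ratio:.2f} times the grade 8 Level 2-to-Level 4 width")
```

In ELA the ratio comes out below 1: eight grades of rising expectations cover less of the scale than the middle two achievement levels at a single grade.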
Why is that?
Where Do We Go from Here?
With regard to how our field has handled vertical scales, and perhaps should handle vertical scales in the future, two popular sayings come to mind.
Some very fine people in our field have put themselves through hell trying to make content-based interpretations of vertical scale scores. We want so, so badly to be able to place all students (and items) on a single scale. That elusive scale is our Ark of the Covenant, the one true score scale to rule them all; and it always seems to be dangling there just beyond our reach. So, we keep on going.
The problem, however, is that you keep on going when you’re in hell in order to get out before the devil knows you’re there. The assumptions are a) if you keep going you will find a way out and b) the devil doesn’t know that you are there.
In the case of searching for meaning in vertical scale scores, there is no way out and the devil is well aware of our presence and our desires. The meaning we seek from vertical scales is an illusion. It is a temptation undoubtedly placed there by the devil themself.
The more appropriate response with regard to interpreting and acting upon vertical scales is to try something different. If the “micro” approach has not been fruitful, let’s try interpreting vertical scales from a “macro” level.
Vertical scales built on our norm-referenced and criterion-referenced tests have repeatedly told us two things:
- A great deal of change in student performance is occurring between kindergarten and grade 3.
- Relatively little is taking place after grade 8.
One interpretation of those two pieces of information is that things are going well in the early grades and, therefore, we need to do more at the high school level to get things moving there. That seems to be the interpretation that has driven K-12 education policy in the United States.
That interpretation made some sense in the norm-referenced (what is) world but makes much less sense in the criterion-referenced (what should be) world.
An alternative interpretation, and the answer to our repeated “Why is that?” question, is that the critical action in K-12 education takes place before eighth grade, particularly before third grade. At least with regard to the things that we are measuring on our large-scale achievement tests, virtually all of the knowledge, skills, and abilities that we want all students to attain – that is, the common core (lowercase c’s) – are taught and learned through eighth grade.
That makes sense. High school primarily is for students to engage with their interests, develop an area of specialization, and travel along multiple pathways in preparation for college and career – all the while, honing, building on, and applying the knowledge and skills that they have acquired in kindergarten through eighth grade in personalized pursuits.
It’s all right there in the vertical scale. It’s been staring us in the face the whole time.
Flip The K-12 Script
Therefore, the logical policy conclusion to be drawn from our horizontal vertical scales is that we should be investing more heavily in primary education and early childhood education.
And if public funding of education is a zero-sum game (with the possible exception of federal money), then that likely will mean pulling public resources from high schools. I’m OK with that.
If we are reinventing public education, I am fine with shifting more of the responsibility for financing secondary education to industries, organizations, institutions of higher education, and foundations; with having them take more responsibility for the transition to college and career (with adequate social safeguards, of course). The pendulum has already started to swing in that direction with some innovative programs in schools, districts, and states across the country.
If we want to reinvent public education, we have to be open to thinking differently about things.
After all, I never imagined there could be any value in trying to interpret vertical scales. I just needed to look at them from a fresh perspective.