It’s Saturday morning. The weekend lies before me. We might take the yacht up the river and allow ourselves to be enraptured by the foliage as it envelops us, or head over to the club for some golf, tennis, or one of the other myriad activities that well-to-do white folks engage in before ski season.
Perhaps, on a whim, I will don my Crimson togs and take a drive to cheer on the boys at the Stadium. After all, football is to a crisp fall day as apple pie and the flag are to America.
Oh, but we must save the ritualistic male bonding for Sunday night – football night in America. Toss some red meat and asparagus on the grill, grab a craft beer, gather before the big screen, and immerse ourselves in a gridiron event so testosterone-fueled that it must be balanced by Adele and a specially written Carrie Underwood song.
As I sit on the veranda, however, sipping my blueberry-lime-lemongrass smoothie and gazing out over the meticulously manicured pastoral grounds, the result of sedulous attention and care, I cannot help but reflect on how far we have come in test development since that magnificent red oak before me was a mere sapling.
But it’s all right now, we learned our lesson well
When I myself was the proverbial sapling still being shaped by my surroundings, all of the above would have been fair fodder for standardized test items – even the cigarette-selling slogan in the title.
And there were gaps in performance between groups of students. There were gaps based on sex, gaps based on race and ethnicity, gaps based on socioeconomic status.
But then, a young Jim Popham (Can you be young if you are ageless?) showed us the problems associated with test items based on yachts, tennis, football, and blueberry-lime-lemongrass smoothies. So, we were more careful about the context of our items.
Thus began a decades-long odyssey as we sought to find engaging passages that yielded relevant test items, but were not biased on the basis of race, SES, or SEX.
We warned classroom teachers about test items set in amusement parks. First, some students may not have encountered amusement parks. Second, “amusement park” is a difficult word to read.
We stopped asking even generic questions about celebrating holidays because some cultures and religions didn’t observe holidays.
We stopped asking questions about snow days or vacation activities.
Alas, the siren song of engagement was too much for some of our best and brightest, sending them crashing into the rocks. Nevertheless, we persisted.
We mobilized the National Guard (and state police) when a high school test contained a poem about suicide.
We prostrated ourselves and cried “Mea Culpa!” when accused of spreading northeast, liberal environmentalist ideology through our reading passages and mathematics items.
We developed long lists of topics not to include on large-scale tests because of potential bias and sensitivity issues and convened diverse committees to ensure that those lists stayed current and that we did not stray from them.
And still there were gaps in performance between groups of students. There were gaps based on sex, gaps based on race and ethnicity, gaps based on socioeconomic status, gaps based on disability status, gaps based on English language proficiency, gaps in growth as well as gaps in achievement, and now with the pandemic there are gaps in the gaps.
Insanity is doing the same thing over and over and expecting different results
So, it’s back to the drawing board to design and build tests that are engaging, culturally relevant, and allow students to demonstrate what they know and can do.
Perhaps we had it all wrong. Rather than trying to minimize person-task interaction (an offshoot of aptitude-treatment interaction), perhaps we want to optimize it. We won’t say that explicitly (or out loud), of course, because we left the psychologists and instructional design people behind years ago, but you get the idea. We will attempt to match test items with students.
And you see, we have technology in place to make it possible, or to at least make it seem like it might be possible.
We have adaptive testing engines with algorithms to select specific items for individual students.
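The selection logic inside such engines is no mystery. A common approach (one of several; the engines referenced here may use different models) is to pick, at each step, the unadministered item with maximum Fisher information at the current ability estimate under an item response theory model. The sketch below uses the two-parameter logistic (2PL) model; the item bank values are purely illustrative.

```python
import math

def prob_2pl(theta, a, b):
    """Probability of a correct response under the 2PL IRT model,
    where a is discrimination and b is difficulty."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information contributed by one item at ability theta."""
    p = prob_2pl(theta, a, b)
    return a * a * p * (1.0 - p)

def select_next_item(theta, item_bank, administered):
    """Return the index of the unadministered item that is most
    informative at the current ability estimate."""
    candidates = [i for i in range(len(item_bank)) if i not in administered]
    return max(candidates, key=lambda i: item_information(theta, *item_bank[i]))

# Illustrative item bank: (discrimination a, difficulty b) pairs.
bank = [(1.2, -1.0), (0.8, 0.0), (1.5, 0.5), (1.0, 1.5)]
# With item 2 already administered and theta estimated at 0.5,
# the harder item 3 is the most informative remaining choice.
print(select_next_item(0.5, bank, administered={2}))  # -> 3
```

Real engines layer content-balancing and exposure-control constraints on top of this core step, which is where the tailoring described below would have to live.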
We already use student personal profiles to ensure that the proper accommodations are available to students during large-scale testing. On Smarter Balanced, for example, this is known as the Individual Student Assessment Accessibility Profile (ISAAP). The ISAAP process could certainly be adapted to collect information to identify the assessment stimuli students will find most engaging and will best allow them to demonstrate their knowledge and skills.
With that information in hand, the procedures developed by Ric Luecht and his disciples can be applied to produce parallel test forms tailored to each student.
Of course, there are some details to work out. It wouldn’t be appropriate to make decisions based on applying gross group classifications of race, ethnicity, or sex to individual students. And that’s before we even begin to consider the impact of all of the possible sociocultural combinations. Artificial intelligence will help.
And with all of those sociocultural combinations, it still might not be that easy to create equivalent assessment stimuli for individual students, but it will be better. And artificial intelligence will help.
That should keep us busy for a few years.
And still there will be gaps in performance between groups of students, if anyone is still interested in looking at groups of students.
Progress is rarely linear, but it is never circular
We have to stop finding ourselves right back where we started.
We can make assessment more engaging and culturally relevant, but those improvements will produce only marginal or incremental improvement in student performance unless accompanied by radical changes in instruction, curriculum, school structure, school financing, and society.
We also have to be clear and certain about what we hope to get out of assessment that is more engaging and culturally relevant.
- Engaging and culturally relevant instruction – that’s a no-brainer. Watch any movie ever made about teaching. Last month, I re-watched Freedom Writers on Netflix and read Erin Gruwell’s memoir, Teach With Your Heart.
- Engaging and culturally relevant assessment tied to that instruction at the local level – sure.
- Engaging and culturally relevant large-scale testing – that’s a little trickier.
The need for a connection between curriculum/instruction (how something is taught) and large-scale testing (how the outcome is measured) has been a longstanding topic of debate. The ageless Jim Popham’s arguments on instructional sensitivity have been cited (on occasion, appropriately) to support the need for a close connection.
But large-scale testing is not instruction.
At some point, we will still want students to be able to apply their knowledge and skills beyond contexts that are engaging and culturally relevant. At least I think that we will.
Finally, as a nation, we also have to be clear and certain about what we want students to know and be able to do and why.
Is there a common core (generic, lowercase) of knowledge and skills in English language arts and mathematics that we want all students in the United States to possess? I believe that there is, but that common core is much more limited than currently expressed in state content and achievement standards.
I am less clear on whether there is a common set of outcomes, proficiencies, competencies, etc. on which we want all students to demonstrate the application of that common core. If there is, I am confident that it occurs much closer to the beginning than the end of secondary school. There are just so many beautiful ways for students to apply their literacy, numeracy, and thinking skills for me to believe that a common set of outcomes makes sense after the fundamentals are attained.
It’s Saturday morning. I think I’ll take a walk.