We are at halftime of the 2022 NAEP Reporting Bowl and, my friends, let’s just say we have some catching up to do. The NAEP Long Term Trend results are in the books, and they are not good. Others have chosen more colorful terms like “harrowing” and “shocking” to describe the results but, at least for now, I’ll stick with not good.
For the second half still has to play out. After a brief break to analyze the LTT trends and tendencies and make some halftime adjustments to the narrative, we’ll be right back on the field for the release of the “Main NAEP” results in October.
The chances that things will change dramatically in the second half are slim to none. They rarely do. And if the results do change dramatically, man, that would be a royal mess.
But you never know, right? That’s why you play the game hard from the opening kickoff until the final whistle.
After all, there has to be a good reason why NAGB chose to administer two very different NAEP tests in 2022 and report the results a month apart. It certainly wasn’t because people were clamoring for more testing during the 2021-2022 school year. As I recall, there was still some pretty strong and heated opposition to subjecting schools and students to large-scale testing last year – think about the students.
If there was no reasonable expectation that the two NAEP tests might provide some very different information to inform policy and support the learning recovery, why break with tradition and force-fit an extra LTT test into the mix?
Otherwise, you would think that NAGB might have just followed the lead of most states (#StatesLeading) and reported National results from the Main NAEP in September and followed those up with “local” State results in October.
But it is what it is. There were two NAEP tests and we are at halftime in the reporting cycle. So, here are a few halftime observations.
Erased Two Decades of Progress
One of the most popular immediate hot takes from the NAEP LTT results is that the pandemic has erased two decades of progress in public education. Now that’s a bold claim and a lot of weight to put on the results of a single no-stakes test that measures knowledge and skills considered important more than a half century ago, is administered via a paper-and-pencil test form, and provides no results back to the student.
But for the sake of argument let’s accept the results at face value.
The problem with the “erased two decades of progress” interpretation is that NAEP LTT results don’t show two decades of progress.
Let’s start with Reading. What the Reading results of 9-year-olds basically show is not two decades of progress, but rather one “jump” in progress about the time that NCLB was being enacted. Note also that this jump or “one giant leap” occurred when NCLB was a novel idea, before the full annual testing requirements for NCLB kicked in, and before the idea of reaching 100% proficiency in Reading was fully abandoned as a fool’s errand.
The story in Mathematics is similar.
Instead of one jump in performance there are two. Again, one at the time of NCLB and one about a decade earlier. You can do your own research on what was taking place in the math world in the late 1980s and early 1990s. It’s quite fascinating.
Ready, Set, Jump …
In both Reading and Mathematics, performance in the periods prior to and after the jump is relatively flat. What we see is that there is some type of event that serves as a catalyst that stimulates instruction and results in higher performance on NAEP. Then we wait for the next catalyst.
In Mathematics, there might appear to be slight “progress” from 2000 to 2012 and a slight “decline” from 2012 to 2020, but NAGB tells us none of that wobble was significant. That is how they can say things like the drop from 2020 to 2022 was the first ever drop in math scores even though the graph suggests that scores have been declining since 2012. Because of the pandemic, of course, we’ll never know whether the 2012-2020 “decline” would have continued to become the actual “first ever drop in math scores” in 2024.
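For readers curious what “significant” means in these reports: comparisons like this typically boil down to dividing the score change by the combined standard error of the two estimates and checking the result against a critical value. A minimal sketch with made-up numbers (the function, scores, and standard errors below are all illustrative, not NAEP’s actual jackknifed statistics):

```python
from math import sqrt

def significant_change(mean_a, se_a, mean_b, se_b, z_crit=1.96):
    """Return the score change from Time A to Time B and whether it is
    statistically significant at roughly the 95% level: the difference
    divided by the combined standard error of the two estimates.
    All inputs used below are hypothetical."""
    diff = mean_b - mean_a
    z = diff / sqrt(se_a ** 2 + se_b ** 2)
    return diff, abs(z) > z_crit

# A hypothetical 2-point "wobble" with ~1.1-point standard errors
wobble, wobble_sig = significant_change(241.0, 1.1, 239.0, 1.1)

# A hypothetical 7-point pandemic-era drop with the same standard errors
drop, drop_sig = significant_change(241.0, 1.1, 234.0, 1.1)
```

On these made-up numbers, the 2-point wobble does not clear the bar while the 7-point drop does, which is the distinction NAGB is drawing when it calls a decade of apparent drift “not significant” but the 2020–2022 drop real.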
In this case, the event that precipitated change in instruction was the pandemic. Rather than serving as a catalyst, of course, the pandemic had the opposite effect. The results of NAEP and pretty much all other tests reflect the effect of the pandemic on instruction and student learning.
One scrap of good news is that, unlike the previous events that served as catalysts, there is no reason to believe the pandemic produced a lasting or long-term negative effect on instruction – at least not directly, although we certainly cannot ignore the indirect effects on teachers and students.
A Five-Alarm Fire
Some have compared the NAEP LTT results to an alarm, a clear signal that there is a national problem that must be addressed, a call to arms. There may be some truth to this interpretation. There is no question that the release of NAEP results commands a level of attention that no other test results can match.
The problem with the “NAEP as alarm” metaphor is that alarms are generally most useful when they warn us of a situation that is occurring (e.g., there is smoke, do something about it or get out now) or better yet warn of impending danger (e.g., someone is trying to break into my house). These NAEP results tell us that the house burned down six months ago after burglars had broken in and removed everything of value. And that’s not a criticism or flaw of this NAEP test. It’s a characteristic of all NAEP tests and pretty much all state summative tests.
Large-scale summative state tests have often been compared to autopsies or postmortems – and not in a favorable way. As the gold standard of large-scale K-12 summative assessment, it makes complete sense, therefore, that NAEP is the ultimate postmortem.
What else could it be?
Postmortems and autopsies serve a purpose.
Just the facts, Ma’am (plus a whole lot more)
It has been said so often and by so many that it’s taken as fact that NAEP’s job is to tell us simply and directly what happened and not why it happened.
NAEP measures performance at Time A, measures performance again at Time B, and tells us whether there was a significant change in performance (positive or negative) during the interval between Time A and B. NAEP results do not tell us why performance changed.
Nothing could be further from the truth.
Just Sit Right Back and You’ll Hear a Tale
NAEP is designed specifically to tell us why — not only to report performance but to tell the story of that performance, the Story of US Education.
As a state assessment practitioner (whether at an assessment company, at a state department of education, or as a consultant) I always harbored a bit of jealousy toward the folks working on NAEP. You probably didn’t know that. I’m sure I hid it well.
I envied not only the time that they had to process results and prepare the narrative, but perhaps even more I envied the information they had at their disposal. As I wrote in my November 2020 post, Give NAEP a Chance, contextualizing reporting is one of the things that NAEP does very well and that states would do well to emulate.
NAEP reporting is old school. NAEP may report results in terms of scale scores and statistical significance, but let’s face it, that information is as meaningless as it is useless. I am not even sure that the scores that NAEP reports fit the criteria that we normally use to define a scale. It doesn’t matter.
The value-added that NAEP provides is in the wealth of background and demographic information that it uses to dig deeper into the results – information about inputs, attitudes, and more. Some of that information may be collected from questionnaires or surveys. Some of it may be gleaned from other available data sources. All of it is used to shape and tell a story.
Many state assessment programs used to do the same thing or at least attempt to do the same thing. But that was before there was tremendous pressure (political, legal, instructional) to release results as soon as possible. That was before questionnaires became invasions of privacy. That was before the focus of reporting state tests shifted from the school and describing school performance to the individual student – at the same time as the high-stakes use of state test results for school accountability increased. But that’s another story.
The Second Half
The score is 28-3 and we know how this game is going to end. There is not going to be a miraculous second-half comeback, no last second Hail Mary pass. In fact, as I said at the top of this piece, NAGB is royally screwed if the results of the Main NAEP tell a different story than these NAEP LTT results. NAGB knows that and the rest of us know that. So, I am confident that won’t occur.
So, why should we keep watching the game?
Well, one of the most powerful pieces of “background” or “demographic” information that NAEP uses in its reporting is “State”. When Main NAEP (formerly known as State NAEP) came into being, it gave NAEP a second lease on life.
It was a simple bar chart comparing NAEP and State Test percent proficient by state that was the catalyst for the Common Core. Yes, NAEP caused the CCSS even if NAGB didn’t adopt them. It was the image of Tennessee falling off the bottom of the NAEP chart of ordered State Test proficiency cuts – a walk of shame – that helped spur reform in Tennessee.
Is there a state, or states, that weathered the pandemic better than others or that has begun to recover more quickly? If so, then we begin to look at the other background variables. What was different about their situation or how they handled it? What information is there that I can use in my state?
It’s not that NAEP is a better test than all other K-12 summative tests. It’s not that NAGB and its assessment contractors are better than all state departments of education and their contractors. And it’s not that NAEP provides any information at all about one’s own state that the state didn’t already know (or shouldn’t have already known). No, NAEP is the gold standard because of the information it provides about other states.
That is why we await the Main NAEP results next month. Frankly, that is why we await the Main NAEP results every two years.