In The Interim

All signs indicate that we are entering a time of transition for large-scale state testing. Consensus on the need to reform state testing may be surpassed only by widespread agreement on the need to reinvent and reimagine public education in the United States. I hesitate to add the modifier K-12 or P-14 to state testing or public education because, frankly, it is not clear to me what the starting and ending points of either state testing or public education should be, or will be, 5, 10, or 15 years from now.

What is clear to me is that the utility of state testing going forward will be determined largely by how we approach this transition: the interim between current state testing, with its policies and practices dictated by tradition and regulated by the Every Student Succeeds Act (ESSA), and whatever form state testing may take in the future.

If it is true that past actions predict future behavior (and that premise is, after all, the cornerstone upon which educational measurement and testing rests), I would have little reason to be optimistic about the field’s ability to handle this transition effectively.

However, ever the optimist and a believer in lifelong learning for institutions as well as individuals, I am confident that the field will learn from its past and do better this time.

Because this is not the first time we have ventured down this path.

Once More Unto the Breach

Conservatively, I would argue that this is the fourth major transition, or inflection point, in large-scale state testing since I entered the field in the fall of 1989. Like this one, the three previous transitions each had their own unique point of emphasis.

The first transition, in the early 1990s, spurred in part by a rejection of traditional NRTs (i.e., standardized, multiple-choice, norm-referenced tests), reflected a shift toward the classroom and authentic assessment.
The second, in the early 2000s, a direct result of the No Child Left Behind Act (NCLB), reflected a shift to annual testing and a focus on individual student results.
The third, in the early 2010s, was the shift beyond the bubble test, with states leading the way to the next generation of state tests designed to assess rigorous college-and-career-ready standards and to report results in terms of agreed-upon nationwide achievement standards.

We can look back on those three attempts to reform state testing, neatly spaced every ten years, and lament that none of them had the desired impact on state testing and, more importantly, on instruction and learning. None met or exceeded expectations; none fulfilled the dreams of the dreamers who championed their cause.

And there would be some truth to that.

Why did we fall short?

As in 1989, state testing still relies heavily on selected-response items and predominantly measures lower-level cognitive skills; achievement standards still vary across states; and testing may remain more norm-referenced than we like to believe.

But we cannot ignore the fact that state tests and state testing are very different than they were in 1989 – even improved in significant ways.

I could generate a long list of changes and improvements, beginning with what is tested and how it is tested; who is tested, and how and how often they are tested; and what results are reported, how they are reported, and how they are used.

Most of those improvements, however, I would characterize as renovations to the existing structure. Reimagining and reinventing require razing and rebuilding, not simply renovating.

Why did we fall short?

Reflecting on three decades and three transitions, I would argue that in each case the battle was lost before it began, in the planning stages – in the interim.

Reap What You Sow

Then he told them many things in parables, saying: “A farmer went out to sow his seed. As he was scattering the seed, some fell along the path, and the birds came and ate it up. Some fell on rocky places, where it did not have much soil. It sprang up quickly because the soil was shallow. But when the sun came up, the plants were scorched, and they withered because they had no root. Other seed fell among thorns, which grew up and choked the plants. Still other seed fell on good soil, where it produced a crop–a hundred, sixty or thirty times what was sown.” (Matthew 13:3–8)

Our efforts to reform state testing in the 1990s were very much like scattering seed along the path. Ideas were strong and intentions good, but planning and execution were rushed on both the technical side and the implementation side. We didn’t prepare the field – either ourselves or the test users.

With NCLB, the field mobilized for annual testing. Technology was leveraged to create and assign student identifiers, accommodations and accommodation policies were expanded to include all students, and statistics were developed to monitor student growth. But a continuous cycle of assessment and accountability allotted no time for school improvements to be designed, developed, and implemented, and to take root where they were needed most.

Our reform efforts in the 2010s repeated many of the prior problems; they also fell among the thorns, which grew up and choked the Common Core State Standards (CCSS).

During each testing transition, we failed to take the time necessary in the interim to ensure that we had good seeds to sow and to prepare the soil in which we would scatter them.

Preparing The Fields

Reimagining and reinventing take time – time necessary to accomplish very important tasks.

The 1990s was an era of irrational exuberance, and as Rich Hill suggested in his 2000 reflection on the successes and failures of the assessment efforts in Kentucky, we underestimated the time and resources necessary to effect change. The Kentucky Education Reform Act was signed in April 1990, and the first assessment was administered just two years later, in 1992.

As for the 2010s, I have long contended that the goal of the 2010–2015 Race to the Top Assessment Program should have been to produce a blueprint and a request for proposals (RFP) for a next-generation state assessment program, not an operational assessment program. An RFP for a research-based and empirically tested assessment program is the outcome most commensurate with the timeframe and the money allocated to the program.

We repeatedly fail to devote the time and energy needed to prepare our own field (i.e., large-scale testing, educational measurement, psychometrics) to reinvent state testing. And states repeatedly fail to devote the time and energy needed to prepare the broader field (i.e., educators, students, parents, policymakers), or to allow them to prepare themselves, for the changes in practice that the reforms require.

This is not a call to be overly cautious or conservative, or to maintain the status quo. It is not a call to wait until we have all of the answers, and everything is perfect, before doing anything.

I am well aware of the perils of analysis paralysis, and I firmly believe that perfection is the enemy of the good. However, I have also witnessed the results when assessment programs are implemented before they or the field is ready.

Good is good, but it could be better if we do things the right way this time.

Inch by inch, row by row, I’m gonna make this garden grow

What is the right way? What will it take to prepare the fields?

An internet search for “creating the conditions for effective change” will return some combination of these ‘C’ words:

Commitment, Clarity, Capacity, Construction

Add “balanced assessment systems” to your search and expect to see:

Coherence, Comprehensiveness, Continuity

Of course, near the top of every list is:

Communication

Lots of words. Lots of activity taking place in the interim.

To wrap up this post, let’s focus on those first four overarching, critical concepts that may have been most lacking in our previous attempts to reform assessment.

Commitment

There must be commitment to change on the part of stakeholders at all levels – including the long-term commitment to provide sufficient time and resources. Commitment cannot be mandated or generated via a top-down approach. It requires creating buy-in.

Clarity

A good first step in creating buy-in is clarity. Clarity in the purpose of the assessment and its intended uses. Clarity in the changes in curriculum, instruction, and student learning that will be necessary. Clarity in the relationship between the assessment, its results, and the desired changes in practice. Clarity in the instrument. Clarity in the criterion.

Capacity

Commitment and clarity are insufficient unless we also build the capacity to implement the desired changes. Computer-based testing, in general, and adaptive testing, in particular, require new skills. Assessing deeper learning requires new skills. Interpreting complex student performances and processes requires new skills. Effectively modeling complex school-teacher-student interactions requires new skills. Developing and implementing curricula and instructional materials for comprehensive college-and-career-ready standards requires new skills.

Construction

A new test and testing program will have to be constructed, and if it really is a reinvented assessment that is being constructed, take your initial time and cost estimates and double or triple them. Of course, test construction is just the tip of the iceberg. Reimagining and reinventing assessment and education will require massive infrastructure changes. Consider the construction that was necessary between 2010 and 2015 to successfully roll out computer-based testing. Consider the construction that was necessary between 2002 and 2006 to create statewide information systems and assign student identifiers. That level of effort and activity was needed simply for renovations to assessment. Reinvention may require exponentially more – to borrow a favorite pandemic word.

It will take time to reimagine and reinvent state testing and public education. But there is plenty to keep us busy in the interim.

Image by J Garget from Pixabay