I have discussed re-imagining assessment and offered cautions to the assessment/educational measurement community as we enter the next once-in-a-generation period of re-imagining and reinventing assessment, particularly large-scale testing. In this post, I synthesize those thoughts into a straightforward recommendation to the field:
- When we truly re-imagine assessment, in virtually all cases, we are doing so in support of efforts to re-imagine education.
- Implementing a new vision of assessment will be messy. Implementing a new vision of education will be magnitudes messier.
Do It Anyway!
The fight is never about grapes or lettuce. It is always about people. – César Chávez
Earlier this month, I listed several of the assessment innovations that my colleagues and I introduced at NCSA over the years. Each and every one of those large-scale testing innovations was intended to support a specific education reform.
- Efforts to design, administer, score, and set standards on constructed-response items were in response to the need for the large-scale assessment program to align with education reform efforts in Kentucky.
- We developed a method for setting standards on portfolios because Massachusetts had determined (rightly so, in my humble opinion) that an alternate assessment portfolio was the best method of assessment to support instruction for students with significant cognitive disabilities.
- When Achieve and the states participating in the American Diploma Project introduced the Algebra II end-of-course exam in the early 2000s, it was in support of their research-based belief that it was essential for all high school graduates to be able to apply the higher-level math concepts historically taught in Algebra II.
- The PARCC and Smarter Balanced consortia were developed to support states’ implementation of the Common Core State Standards.
Standards-based assessment is in support of standards-based instruction.
Competency-based assessment is in support of competency-based education.
Assessment Reform supports Education Reform. We can never view it the other way around.
Better a diamond with a flaw than a pebble without – Confucius
I have rarely been a fan of the aphorism perfect is the enemy of good when it has been applied to large-scale testing; I have too often seen it used as a justification for mediocrity, expedience, or a lack of imagination. Similarly, as my colleague Carla Evans pointed out in a recent post, our field has not been served well by the practice of flying the plane while we are building it. There is a huge difference, however, between trying to figure out how to attach the wings and engine, as we were doing in the 1990s, and identifying adjustments needed to improve the user experience or to reach the destination more efficiently.
Not being able to do something perfectly out of the gate should not stop us from doing it at all.
As much as we like to think that assessment can lead the way, the reality is that large-scale test development will always lag behind instruction. When it comes to duplicating or modeling desirable instructional practices on large-scale tests, we will know how to do something in the classroom long before we figure out that we would like to see the same practice on large-scale tests, and even longer before we figure out how to make it work on a large-scale test – at scale – and perhaps for accountability. Item types developed specifically for large-scale tests rather than the classroom (e.g., some of those now being developed to assess the NGSS) invariably will have to be tested, evaluated, and adjusted multiple times in an iterative process before they function optimally.
We now live in a world where concepts like rapid product development, agile development, and continuous design are the norm. Customers expect a functional product from the start, but they also expect that product to continue to evolve and improve to better meet their needs.
The sad part of this is that large-scale testing already operates in a state where we have to make significant adjustments to our products after their initial release and continue to refine and adjust them year after year – even as we try to hang on to the claim that we are not changing the measure. We make all of these adjustments because we have to. Imagine how much better at it we might be if we incorporated this iterative design thinking process into the large-scale test design, development, and implementation process – and communicated that process effectively to stakeholders and users.
Note: Don’t confuse knowing that something works in a classroom (in a strong school, with an effective teacher, supportive school leadership, and sufficient resources) with being able to apply that same instructional practice at scale. Applying instructional practices at scale is a different challenge, and an even more difficult one, than applying large-scale testing practices at scale.
Keep Your Eyes on the Prize
When attempting to re-imagine and innovate in assessment, the most important and most difficult task will be to keep your eyes on the prize – and as stated above, the prize is a change in education, not a change in assessment.
There will be many times when the temptation to look in another direction is great.
- During the initial design phase, the pressure will be to allow yourself to be boxed in by constraints, past practices, and the context-specific interpretations of the Standards that have grown up around current large-scale testing practice.
- When the initial rollout reveals that adjustments are needed, as described above, there will be calls to abandon the innovation in favor of tried-and-true approaches.
- When the initial excitement accompanying the new initiative wears off, results are not immediate, and the realization sets in of how much work lies ahead to improve both the assessment and the education system, political will, and the tangible and intangible support that comes with it, may falter.
Keep your eyes on the prize.
The last great push to innovate in large-scale assessment in the 1990s was driven by many of the same factors at play today. Portfolios, performance events, essays, and constructed-response items were seen as a way not only to measure deeper, more authentic learning, but also as a way to allow more students to demonstrate what they knew and could do on their own terms and in their own way. To some extent, each of the reasons listed above played a part in shutting down those efforts across the country.
I will not argue that we were ready to implement large-scale portfolios, performance events, and other school-based assessments for accountability purposes in the early 1990s. We weren’t. We were learning (while flying) and the assessment programs were improving fairly quickly, but they weren’t ready for prime time.
With the benefit of hindsight, I have to wonder what education and large-scale testing would look like today, three decades later, if we had somehow been able to find a way to stay the course through the second half of the 1990s. Do it anyway.
Keep your eyes on the prize.
In the period ahead, the assessment and measurement community again will be asked to consider constructs not yet measured, approaches to assessment that extend well beyond traditional large-scale testing, and technical challenges that it has either not considered previously, or more likely, has considered but not yet solved. Beyond those measurement issues, there will be social issues related to the appropriate use of assessment. Even before the events of 2020, we were being asked to address social issues related to the use of assessment and test results.
Keep your eyes on the prize.
In response to my contention that as a measurement community, we need to measure what society wants to measure (that is, we cannot let our shortcomings define and limit what is measured), my colleague Susan Lyons asked:
What is our role in ensuring that “society” isn’t just those with the most power and privilege, but [ensuring that] those involved in defining the construct reflect our aspirations for a pluralistic democracy?
As members of society, we can and should play an active, or even activist, role in determining what is measured and how assessment is used.
As a measurement community and stewards of the use of assessment in education, I view our primary responsibility as staying true to our core concepts of validity, reliability, and fairness. That will be much easier said than done, of course. It is always that way with high-stakes endeavors, and few things are more high-stakes for our democracy than public education.
Staying true to our core principles does not mean adhering blindly to the instantiation of those principles that has been defined by large-scale testing. Coefficient alpha is not the answer to all of life’s questions. Rather, it means:
- Validity and validation focused on whether all of the desired inferences are fully supported.
- Reliability, as part of validity, focused not blindly on consistency for the sake of consistency, but on the sufficiency of the evidence gathered to support those desired inferences.
- Fairness, as part of validity, focused on the utility of an assessment program for all of its participants and all of its desired uses.
To do those things well, we will have to seek the input and active participation of stakeholders outside of our narrowly defined measurement and assessment community: content specialists, policy makers, educators, parents, students, and the community as a whole.
This is not a new concept, or requirement, but it is one on which we have come up short in the past as we increasingly looked inward toward building “better” assessment instruments, rather than outward toward the essential elements necessary to build better assessment.
Keep your eyes on the prize.
When we truly re-imagine assessment, in virtually all cases, we are doing so in support of efforts to re-imagine education.
Implementing a new vision of assessment will be messy. Implementing a new vision of education will be magnitudes messier.
Do It Anyway!