The definition of insanity is doing the same thing over and over again, but expecting different results. (source unknown)
Those who cannot remember the past are condemned to repeat it. (Santayana)
This is one of those times when there are so many quotes that describe the situation so well that it is impossible to select just one.
We are at a crossroads in educational assessment, where the decisions states make in 2016 and 2017 likely will shape assessment for the next ten to fifteen years. There is a new federal law in place with assessment and accountability requirements that states are trying to interpret. There are forces pushing for high-quality assessments that require students to demonstrate college- and career-readiness; assessments that emphasize writing, critical thinking, problem-solving, and research skills. At the same time there are forces citing feasibility and practicality in calling for assessments that take less time, are less expensive, and produce immediate results.
If all of the above feels oddly familiar, it should. In too many ways, 2015-2017 is beginning to look like a replay of the period between 2002 and 2004; and that is where our quotes begin.
Newman! – or in this case, Neuman!
In 2002-2003, states were trying to figure out how they would implement the new state assessment requirements of No Child Left Behind (NCLB) that would take effect in 2006. Instead of testing students once at the elementary, middle, and high school levels, the assessment requirements of NCLB required annual testing of students at grades 3 through 8, plus an additional test at high school. NCLB also required those state assessments to be high-quality assessments aligned with the state’s challenging academic achievement standards, and to involve multiple up-to-date measures of student academic achievement, including measures that assess higher-order thinking skills and understanding. Additionally, the accountability requirements of NCLB required the results of those assessments to be processed quickly to allow states to issue school and district accountability reports in time for accountability requirements such as “school choice” to take effect prior to the beginning of the new school year.
States did not know whether it was possible or whether they had the capacity to test students at seven grades instead of three, process the results of those assessments quickly enough to meet the accountability requirements, and also administer the high-quality assessments described in the law. Something had to give, and Susan Neuman, Assistant Secretary for Elementary and Secondary Education in the U.S. Department of Education, stepped up to do the giving. At a keynote luncheon during the annual Large-Scale Assessment Conference sponsored by the Council of Chief State School Officers (now the National Conference on Student Assessment), Neuman announced that the NCLB assessment requirements could be satisfied by tests consisting solely of multiple-choice items. While we ate cake, the die was cast on the design of state assessment for the next decade.
Those who cannot remember the past are condemned to repeat it.
It is critical to the current situation to remember the state of large-scale assessment in the period leading up to NCLB and the USED acceptance of multiple-choice tests. In many ways, the 1990s was a golden age of innovation in large-scale state assessment. The modern era of state assessment began in the mid-1980s, shortly before the backlash against traditional, multiple-choice, norm-referenced standardized testing reached a peak with uproar over the Lake Wobegon Effect. Many state assessment programs such as the New Jersey Assessment of Knowledge and Skills (NJ ASK) and Massachusetts Comprehensive Assessment System (MCAS) were able to incorporate a direct writing assessment and a variety of item types into an otherwise traditional test format. Other states attempted to push the assessment envelope well beyond the traditional end-of-year bubble test. Examples of assessment programs that explored innovations such as portfolios and performance tasks, along with constructed-response items and direct writing assessments, included the following:
- New Standards Project
- Vermont Portfolio Assessment Program
- Maryland School Performance Assessment Program (MSPAP)
- Kentucky Instructional Results Information System (KIRIS)
- Maine Educational Assessment (MEA) and Local Assessment Systems (LAS)
- California Learning Assessment System (CLAS)
- Rhode Island Distinguished Merit Testing Program
- Nebraska Student-based Teacher-led Assessment and Reporting System (STARS)
In some fundamental way, each of those programs attempted to push the assessment envelope well beyond traditional multiple-choice items. Some attempts were more successful than others. Careers were made pointing out issues of technical quality associated with programs such as KIRIS or the Vermont Portfolio Assessment Program. All of those programs, however, reflected a spirit of innovation and improvement – a sense that it was possible to re-imagine large-scale state assessment. A statement by Lauren Resnick, co-founder of the New Standards Project, conveys the feeling behind most of the programs listed above: “When you do something as expansive as what we did, you have to have a belief you can make almost anything happen.”
Two decades later, many of the practical and technical problems associated with the innovative assessments of the 1990s remain unsolved and unstudied, and potential solutions remain untested. That might be acceptable if we could argue that 20 years of research had resulted in little progress in solving those problems. The reality, however, is that much of that research never occurred. The impetus for innovative assessment and solving those problems stopped in the 1990s. When push came to shove at the end of the 1990s, innovation gave way to expediency. The pressure to meet the increased testing and accountability demands of NCLB simply outweighed any countervailing forces to produce high-quality, innovative, performance-based assessments.
How likely is it that the experience of the late 1990s will repeat itself in 2016, 2017, or 2018? What will this next generation of state assessments look like after two or three test administration cycles? Is the recent decision by the state of Connecticut to eliminate the performance task from the Smarter Balanced English language arts assessment an anomaly or a tipping point? How far will assessments such as PARCC go in an attempt to rightsize themselves – what design features will be compromised? Should we be excited or nervous that words like rightsize are being used in discussions about assessment?
The definition of insanity is doing the same thing over and over again, but expecting different results.
A key feature of each of the innovative assessment programs listed in the previous section is that their developers were willing to think outside of the traditional, end-of-year, summative assessment box. They did not limit themselves to on-demand assessments that could be administered neatly to individual students in one, two, or three sessions during a tightly defined administration window. Most of those programs reflected the belief that the authentic assessment of higher-order cognitive skills such as critical thinking and problem solving required a different kind of assessment experience. Assessment of those skills requires students to engage with problems and performance tasks over an extended period of time, and it requires assessments embedded in curriculum and instruction. In short, assessment of the skills that we consider critical is messy.
As states built the new Common Core-aligned assessments such as PARCC and Smarter Balanced, however, they stayed firmly within the traditional assessment box. PARCC may have tried to build a bigger box, and Smarter Balanced used adaptive technology to try to measure more precisely the types of things that you can assess well in that box. However, neither program attempted anything close to the innovative assessment programs of the 1990s. That outcome was not a surprise in the current era of accountability, but it does evoke a sense of doing the same thing over and over again, but expecting different results.
End-of-year assessments such as PARCC and Smarter Balanced can and will do a better job at assessing skills such as problem solving and critical thinking than most state assessments administered in the NCLB era. There is, however, a ceiling to what such assessments can do. As reflected in the 2014 National Research Council report Developing Assessments for the Next Generation Science Standards, the effective assessment of complex content standards such as the Common Core State Standards and the Next Generation Science Standards requires a fundamental change in our thinking about assessment, and requires a new kind of assessment program.
ESSA offers states some opportunity for flexibility and innovation in the design of their testing programs. States may explore ways to incorporate results from interim assessment programs into an end-of-year performance level classification for students. States may also propose innovative assessment designs for assessment systems to support their accountability systems. Such efforts, however, will be limited by the constraints of accountability as well as the practical and technical challenges faced by the innovative assessment programs of the 1990s. Without a true commitment to fundamental change, to accepting the messiness, and to providing the time and resources needed to build comprehensive, cohesive assessment systems across the school, district, and state levels, it is likely that the once-in-a-lifetime window of opportunity for assessment reform that Secretary Duncan described in 2009 and 2010 will close with only incremental improvement in the state of state assessment.
To end on a positive note, all hope is not lost. Acknowledging the limitations of end-of-year, on-demand assessment, realizing that it has become unmanageable, and admitting that we are powerless over our addiction to it is the first step toward the solution. There are pieces of a comprehensive system being built through solid work on formative assessment practices, the use of interim assessments, and through efforts to build and sustain educators’ assessment literacy. The NRC Committee on Developing Assessments of Science Proficiency has provided a solid framework for moving forward. The recently published NCME volume, Meeting the Challenges to Measurement in an Era of Accountability, addresses many of the technical challenges laid out in the 1990s and provides examples of performance-based assessment programs in a variety of content areas. The pieces are there. Will we pick them up and stop doing the same thing over and over again or are we condemned to repeat the errors of the past?