assessment, accountability, and other important stuff

Archive for December, 2015

Don’t Look Back – A Blog Year in Review


As 2015 comes to an end, it is futile to resist the temptation to reflect briefly on this first year of the Embrace the Absurd blog.  Reflections, after all, are the lifeblood of the blog.  As advertised, the fifteen essays posted in 2015 addressed assessment, accountability, and other important stuff.  There were posts on technical topics such as the way we describe standard error of measurement (The Best That We Can Do) and vertical scales (Are You Smarter Than a 5th Grader?).  There were posts that addressed the new next generation assessments directly (Shall We Dance?) and indirectly (Take Me Out To The Ballgame).  Of course, with the signing of ESSA, there were also recent posts related to ESEA (ESEA – It’s So Much More Than A Test) and NCLB (Goals: Assets or Distractions).  Through site stats and a nifty little annual report, I learned that the blog was viewed (usually once) by more than 200 people in six different countries and was viewed by me lots of times.  All in all, it feels like 2015 was the next logical step in the progression that began with the virtual blog that I kept in my head for several years and continued with Thoughts from the Center – the private predecessor to Embrace the Absurd that I shared with colleagues in 2014.

Although the posts covered a variety of topics, most of them revolved around the central theme of the blog: If you are in search of meaning, you must be willing to look well beyond a test score.  Two of the initial posts, Psychometrician, Do No Harm and Taylor Swift, Headbanging, Summative Assessment, and Actionable Information, conveyed the message that, in an ideal world, results from large-scale assessment should confirm information that educators already possess about student achievement.  For years, I attempted to deliver that message at workshops using another large-scale data collection effort, the U.S. Census, as an example; that is, we do not wait for the census results to tell us how many people we have living in our household, whether classes are too crowded in our schools, etc.  Then with the 2010 Census, the government began running advertisements stating that it was important to complete the census so that you would know whether your town needed a new traffic light.  Embrace the Absurd.

When I entered the large-scale assessment industry in 1989 (trademark pending, Taylor Swift), the most important lessons I learned from my mentors Rich Hill and Stuart Kahl involved the importance of understanding not only the strengths of large-scale assessment, but also its limitations; recognizing that data from large-scale assessments can supplement, but not supplant information that educators process on a daily basis in the classroom.  In that vein, I once borrowed from John Lennon’s Imagine and the National Research Council to end a session at the old CCSSO Large-Scale Assessment Conference with a verse that began with “Imagine there’s no Harcourt, It’s easy if you try” and concluded with “Imagine all the teachers, Knowing What Students Know”.  In the years that followed, however, we have seen hundreds upon hundreds of millions of dollars allocated to building bigger and better large-scale assessments and complex statistical machinery employed to wring every drop of information from that single, summative score.  Embrace the Absurd.

A new year and a new reauthorization of ESEA, however, bring the promise of new hope, new challenges, and new blog posts.  There will be new opportunities to ponder and comment on issues in assessment, accountability and other important stuff.  There will be new reasons to keep the conversation going, to ask why (or why not), to continue the search for meaning, and to embrace the absurd.

Goals: Assets or Distractions

A noticeable difference between NCLB and ESSA is that ESSA is devoid of explicit goals.  Yes, one could argue that “Every Student Succeeds” is a goal.  I am still hedging my bet, however, on whether people will treat that tagline as a goal or as a policy statement, as in every student succeeds becomes the ESEA equivalent of “everyone gets a trophy”.  Sure, ESSA does try to sneak some things like college-readiness and target high school graduation rates in through the back door, but there is nothing that corresponds to the NCLB rallying cry of 100% Proficient by 2014.  Is that a good thing?

Goals as a Distraction

There is little question that, on so many levels, the NCLB goal of all students Proficient by 2014 was a distraction almost from the very beginning.  First, nobody ever believed that all students would be performing at the proficient level by 2014, regardless of who determined what qualified as proficient performance.  Second, the reliance on test results as the sole determiner of proficiency allowed concerns about test use to serve as a diversion from the real issues associated with improving student proficiency.  Third, the mechanisms put in place to define Annual Measurable Objectives (AMO) and to measure Adequate Yearly Progress (AYP) toward the goal of 100% Proficient by 2014 caused immeasurable damage by focusing on unmeasurable and largely irrelevant intermediate goals.  Fourth, the penalties, or sanctions, for not meeting intermediate goals somehow became conflated with programs designed to help districts meet the goals. (Aside: Was confusing punishments with programs just a fluke or a reflection of a deeper mindset in education, consistent with the manner in which grades are assigned and attempts are made to change behaviors in schools?)

The 100% Proficient goal was so much of a distraction that most people did not understand, or did not want to understand, that it was never actually a requirement.  The requirement was that schools reduce the percentage of non-Proficient students by 10% each year.  A school with 10%-20% of its students Proficient in 2002 would have been required to have approximately 75% of its students Proficient by 2014.  A school with 90% Proficient in 2002 had a 2014 target of 97% Proficient.  Perhaps the 75% and 97% targets were no more attainable than the 100% Proficient goal, but they would have certainly changed the conversation.
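For readers who want to check the arithmetic, here is a minimal sketch in Python. It assumes the 10% reduction compounds annually over the twelve years from 2002 to 2014; the function name and the compounding assumption are mine, not language from the law.

```python
def proficiency_target(start_pct, years=12, annual_reduction=0.10):
    """Percent Proficient after shrinking the non-Proficient share each year.

    Assumes the reduction compounds: each year removes 10% of whatever
    non-Proficient percentage remained the year before.
    """
    non_proficient = 100.0 - start_pct
    remaining = non_proficient * (1.0 - annual_reduction) ** years
    return 100.0 - remaining


for start in (10, 20, 90):
    target = proficiency_target(start)
    print(f"{start}% Proficient in 2002 -> {target:.1f}% Proficient by 2014")
```

Under those assumptions the sketch prints targets of 74.6%, 77.4%, and 97.2%, in line with the rounded figures above.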

The Obama administration provided evidence of just how much simply changing language can change perceptions with its NCLB waivers.  The waivers were widely portrayed and perceived as providing districts and schools with relief from the unrealistic and impossible mandate of the fundamentally flawed requirements of a broken NCLB.  However, when you do the math, the waiver requirement that schools cut achievement gaps in half by 2018 is for all intents and purposes identical to the original NCLB requirements. As shown in the chart below, regardless of whether a school is low-performing, high-performing, or somewhere in between, there is no appreciable difference between cutting an achievement gap in half in seven years and reducing the percentage of non-Proficient students by 10% each year – a practical example of the importance of basic mathematics literacy.
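The comparison can also be sketched in a few lines of Python. This is an illustration under my own simplifying assumptions: the "gap" is treated as the distance between a school's current percent Proficient and 100%, the 10% reduction compounds annually, and the starting percentages are hypothetical.

```python
def compare_targets(start_pct, years=7, annual_reduction=0.10):
    """Return (annual-reduction target, cut-the-gap-in-half target) as percent Proficient."""
    non_proficient = 100.0 - start_pct
    # Original rule: remove 10% of the remaining non-Proficient share each year.
    annual_rule = 100.0 - non_proficient * (1.0 - annual_reduction) ** years
    # Waiver rule: halve the non-Proficient share by the end of the period.
    halving_rule = 100.0 - non_proficient / 2.0
    return annual_rule, halving_rule


for start in (20, 50, 90):  # hypothetical low, middle, and high performers
    annual_rule, halving_rule = compare_targets(start)
    print(f"start {start}%: annual rule {annual_rule:.1f}%, halving rule {halving_rule:.1f}%")
```

Seven years of 10% reductions leaves about 48% of the original non-Proficient share, so the two rules land within two percentage points of each other for every starting point shown.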



The entire AMO and AYP process offers another example of the need for a basic understanding of mathematics and statistics.  The amount of time, effort, and money devoted to developing and implementing the machinery to determine whether schools actually met their AMO of 43%, 66%, or 83% Proficient was appalling.  First, it is pretty much impossible to determine with a single test whether a school actually has 66% of its students Proficient versus 63% or 69%.  Second, who cares whether 66% of students are Proficient?  We care whether individual students are Proficient and we care whether the school is making real progress toward implementing a program that will produce the long-term goal of 100% Proficient, but nobody should care whether 43%, 66%, or 83% of students are Proficient in a given year.
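One rough way to see why 63%, 66%, and 69% are hard to tell apart is to compute a simple binomial margin of error. The sketch below assumes the tested students behave like a random sample of the students the school could have served, and it ignores measurement and classification error in the test itself, so it likely understates the real uncertainty. The school sizes are hypothetical.

```python
import math


def proportion_interval(pct_proficient, n_students, z=1.96):
    """Approximate 95% interval for a school's percent Proficient (normal approximation)."""
    p = pct_proficient / 100.0
    standard_error = math.sqrt(p * (1.0 - p) / n_students)
    margin = z * standard_error * 100.0
    return pct_proficient - margin, pct_proficient + margin


for n in (100, 400):  # hypothetical numbers of tested students
    low, high = proportion_interval(66, n)
    print(f"{n} students tested: 66% Proficient, interval {low:.1f}% to {high:.1f}%")
```

With 100 tested students the interval runs from roughly 57% to 75%; even with 400 students it still covers both 63% and 69%.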

The consequences of focusing on test scores and achieving intermediate targets at the expense of programs designed to promote long-term improvement have been well-documented.  When the focus shifts from the intended outcome—improved student achievement—to the performance target—a test score—and it is much easier to produce short-term improvements in test scores than it is to produce long-term, sustainable improvements in student performance, the end result is a consequential validity disaster of epic proportions.  Of course, the phenomenon is not limited to school accountability, test scores, and NCLB.  Countless weight loss programs and get-rich-quick schemes and fundamentally flawed business models provide ample evidence of the allure and consequences of focusing on targets rather than changing behaviors.

Goals as an Asset

For all of the problems that can be linked to the 100% Proficient goal of NCLB, there is ample evidence that goals can be a valuable asset to support the changes in behavior needed to achieve long-term results.  At the program level, goals can provide the foundation for securing commitment, resources, and funds to support a long-term initiative.  At the individual level, explicit goals can help safeguard equity by ensuring that everyone is working toward the same long term outcome.

Many people have written about the characteristics of establishing productive goals that lead to sustainable changes in behavior.  The qualities ascribed to good goals and goal setting are well-known:

  • Goals need to attract and keep attention; that is, people have to perceive achieving the goal as an important and necessary outcome to commit to it.
  • Goals need to be linked directly to programs designed to change behaviors.
  • Goals need to be specific.
  • Goals need to be difficult yet attainable.
  • Goals need to be future-oriented. (One of the cardinal sins of NCLB is that it remained in effect so long that its goals were no longer in the future.)

A common theme underlying each of the points above is that goals have to be appropriate for a given situation.  In psychology, the Yerkes-Dodson law describes an empirical relationship between arousal and performance: there is an appropriate amount of emotional arousal that stimulates optimal performance or functioning.  Too little or too much arousal leads to less than optimal functioning.

In sports, the approaches used to stimulate optimal performance in a professional athlete are not likely to be as effective with an amateur or novice.  Effective fitness programs are personalized, tailored to the individual’s current fitness level, lifestyle demands, and long-term goals.  In school accountability, goals and consequences must take account of the context.  It is unlikely that a single set of goals and consequences will be equally effective with a high-performing school, a school just meeting accountability targets, and a school struggling to succeed.  Equally important, the programs implemented to elicit the sustainable changes in behavior needed to produce the desired outcomes will vary for each of those schools.

ESSA provides states the flexibility to design and build the personalization of goals and consequences into their district and school accountability programs. There is also a danger, however, associated with such flexibility and ESSA’s lack of explicitly stated outcomes for all students.  As I wrote in a prior piece, the incentives to establish different achievement expectations for different sets of students are strong.  Computing conditional growth scores for students and value-added scores for schools and teachers are the latest approaches to accounting for context when evaluating the performance of students, teachers, and schools.  Like their predecessors such as alternative norm groups and similar school score bands, these measures can be valuable tools for establishing realistic short-term goals and consequences for particular groups of students.  Without a clear focus on a long-term goal or outcome, however, the use of these conditional scores in isolation can result in separate and unequal expectations for various subgroups of students who are difficult to teach or who attend schools lacking in resources – a phenomenon which I have referred to as the slippery slope of growth.

The Goal

The goal, therefore, is for states to develop school accountability programs under ESSA that improve upon, rather than abandon, the outcomes-based goals of NCLB.  Hopefully, those accountability programs will find a way to use those goals as an asset rather than a distraction.

ESEA – It’s so much more than a test

Whether ESSA is signed into law before the end of the year – or like the release of the Iran hostages in 1981 we have to wait until Arne Duncan officially leaves office – it appears that we will finally see a reauthorization of ESEA.  Much of the coverage of the long-awaited and hotly debated end of NCLB has focused on the accountability and assessment requirements of ESSA; specifically, on identifying the ways in which ESSA requirements differ from those of NCLB – annual testing remains but states will have more control over accountability systems; the waivers are gone but key components of the priority and focus school concepts remain; growth and alternate assessments based on alternate achievement standards (AA-AAS) which were introduced after NCLB was enacted in 2001 have found their way into ESSA.  Overall, one can conclude that even with the shift of control over accountability systems from the federal government to the states and the much more nebulous statement of goals, test-based accountability remains alive and well under ESSA.

In designing new test-based accountability systems under ESSA, however, we must find a way to reconnect the assessment and accountability requirements with the rest of the law.  Listening to much of the criticism directed at NCLB over the last decade, a person unfamiliar with ESEA might walk away with the impression that the federal government simply implemented testing requirements and expected that schools would improve.  Missing from much of the rancor directed at NCLB and test-based accountability has been even an acknowledgement that ESEA, in general, and Title I, in particular, allocate billions of dollars to fund programs that are intended to improve the academic achievement of disadvantaged students.

I witnessed the impact of this disconnect firsthand as a delegate to the 2004 Maine State Democratic Convention.  In an awe-inspiring display of the power of the combination of misinformation and a frenzied mob being caught up in the heat of the moment, the delegates blocked any debate on the floor and through a nearly unanimous voice vote raucously adopted an amendment adding the following one-sentence paragraph to the party platform:

The No Child Left Behind Act should be repealed.

Few in the auditorium that afternoon appreciated the tragic irony of the juxtaposition of that line with the opening two paragraphs of the party’s position on education:

Life-long access to education is critical to the well being of our citizens, our economy, and our democracy. Education must begin with early childhood programs, such as Head Start and child development services, that prepare children to learn. We urge free preschool education for all Maine children.

Life-long learning for all requires a strong public education system that provides opportunities for students of all ages throughout the state, including the physically and mentally challenged. We recognize the importance of special education, gifted and talented programs, and multicultural programs, and we support the long overdue full funding of these programs as mandated by state and federal law. We also support increased funding to bring existing schools into compliance with federal and state accessibility mandates. And we recognize the problem of bullying within schools and support providing local school districts with help in developing and implementing anti-bullying policies.

Of course, one could simply accept with bemusement that a bunch of Maine Democrats failed to see the connection between the No Child Left Behind Act that they demanded be repealed and the programs they so whole-heartedly endorsed.  After all, over the course of the last decade the Maine Democratic Party has been teetering on the brink of becoming the third party in a two-party system.  However, the thinking exhibited in Maine in 2004 is fairly representative of the debate surrounding ESEA/NCLB, and more important, is also reflected in the accountability systems states have designed to meet ESEA requirements.  In most cases, state test-based accountability systems implemented under NCLB stop at assigning ratings based on an aggregation of status, improvement, and growth indicators derived from school performance on state assessments.  Those accountability systems fail to make a direct connection between school (or student) performance on those assessments and the programs that were funded and implemented to improve that performance.

From the beginning, Title I of ESEA included assessment and accountability requirements as a safeguard to ensure that the federal money being allocated to programs to improve the achievement of the disadvantaged was being spent wisely.  In a 1968 study of the implementation of the new law, Bailey and Mosher wrote the following about the law’s accountability requirements:

For various reasons, the evaluation mandate, even if ambiguous and ambitious, was considered not only important, but necessary:

Sec. 205. (a)(5) that effective procedures, including provision for appropriate objective measurements of educational achievement, will be adopted for evaluating at least annually the effectiveness of the programs in meeting the special educational needs of educationally deprived children.

Major proponent of the provision, Senator Robert F. Kennedy, … regarded it as a protection against the infusion of Title I funds into on-going school programs unlikely to upgrade the achievement of educationally disadvantaged children. (ESEA – The Office of Education Administers a Law)

More than thirty-five years after the original ESEA, a similar connection between funding, programs, and accountability can be seen in various quotes attributed to President Bush in describing the assessment and accountability requirements of NCLB:

For the first time, the federal government basically demanded results in return for money.

That’s why Ted Kennedy and George Miller were very effective.  We didn’t agree on the funding formulas and certain issues, but we did agree on the basics.  And that is, you cannot expect excellence unless you measure.

If you’re going to fund [schools], like we’ve been doing for years, we in the federal government ought to demand accountability, which seems to me a very conservative principle.

Somewhere along the way, we lost the program evaluation perspective to ESEA accountability – the concept that specific programs were being implemented to meet particular goals and it was necessary to evaluate the effectiveness of those programs, including the fidelity of their implementation.  Perhaps NCLB contributed to the disconnect between accountability and programs by separating the consequences of failing to meet accountability targets (e.g., school choice, tutoring, restructuring) from the programs that were being funded.  By not linking the consequences of accountability systems directly to an evaluation of program effectiveness, perhaps NCLB inadvertently fed into the all-too-American predilection for achieving the outcome without changing behaviors (see historical data on weight loss supplements).

In any event, moving forward with ESSA it is important to reestablish the link between inputs and outputs, between the federally funded programs that states and schools are implementing under ESSA and the expected results of those programs.  At a minimum, establishing such a link may force someone to be able to provide a rationale for how the proposed programs are supposed to help produce the desired outcomes.  If we are fortunate, establishing such a link also will reduce the effectiveness of rhetoric that suggests that the federal government and states believe that testing alone will lead to improved achievement.  Ideally, establishing a link between programs and expected outcomes may even lead to a fruitful conversation about what types of outcomes can and should actually be expected from our schools, teachers, and students.

Test scores, or accountability indicators based on test scores, can provide important information about whether the Title I goals “to provide all children significant opportunity to receive a fair, equitable, and high-quality education, and to close educational achievement gaps” are being met. Tests and test scores alone, however, can neither achieve those goals nor tell us why/why not states and schools are meeting those goals.   There is a reason why under ESSA, Title I alone allocates more than $15 billion per year to programs and only $378 million per year to state assessments.  Under ESSA, we must commit to doing a better job of evaluating the effectiveness of those programs.