assessment, accountability, and other important stuff

Archive for December, 2016

You Can’t Always Get What You Want – A Blog Year in Review

Charlie DePascale

As I look back at the thirteen essays that I posted in 2016, it is clear that the dominant theme of the blog this year was that we need to acknowledge and accept our limitations.  Specifically, across the essays there were three messages:

  1. As psychometricians, we need to embrace rather than run from the uncertainty in educational measurement.
  2. As assessment specialists and influencers of educational policy, we need to convey useful information about the limitations of large-scale assessment and its appropriate role in K-12 education.
  3. As citizens, we need to engage in thoughtful conversations about the purpose and goals of public education; with a particular focus on the future of K-12 public education and the resources that we are willing to commit to it.

On the surface, a year of essays focused on limitations sounds somewhat bleak.  Overall, however, I am pretty certain that the message was intended to be uplifting and encouraging.  My July 4th essay, We Hold These Truths To Be Self-Evident…, addressed the truth that there are no truths or at least there are no absolutes: reality is contextual, we cannot eliminate uncertainty, and modeling is not measurement.  As they say, a man’s got to know his limitations.  After all, without an awareness and appreciation of our current limitations, how can we hope to improve?

Uncertainty in educational measurement

Four of the essays addressed issues related to uncertainty in educational measurement.  We have to acknowledge that there is uncertainty in educational measurement, in general, and in large-scale assessment, in particular.  Not only do we have to acknowledge that there is uncertainty, we have to convey the message that uncertainty is not a shortcoming. We have to engender an appreciation for uncertainty.  We have to provide users of assessment information the tools that they need to process and use information that is incomplete.

Growing Pains – my most recent post, addressed the complexity involved in defining, let alone measuring, growth and making inferences from growth scores.

Citius, Altius, Fortius – used the Rio 2016 Olympics as the context to discuss the ways in which particular values and constraints shape the design of accountability systems.

Interval Scales, Unicorns, and Non-stick pans – turned my frustration with sitting in traffic into a discussion of the relevance, or lack thereof, of certain measurement concepts (i.e., interval scales) that perhaps we rely on a little more than we should.

And all the teachers are above average – made a case for the appropriateness of using a norm-referenced criterion in an accountability system, particularly when there is no established standard, or when a great deal of system-wide improvement over an extended period of time is needed to reach an acceptable standard.

The limited role of large-scale assessment

Five of the essays addressed various issues related to the role that large-scale assessment can and should play in K-12 education.  Large-scale assessment is a good tool for a very limited set of purposes, but is really a poor tool for most other purposes to which it is commonly applied.

It’s January. Can Johnny Read? – we began the year with the seemingly obvious observation that annual assessments cannot provide information about what students can do at any given point in time during the school year (i.e., when teachers are making instructional decisions for individual students).

Is that all there is? – addressed the need to make a compelling case for the use of large-scale assessment, particularly when those assessments require the allocation of significant resources and are used for high-stakes accountability purposes.

This is my fight song? – used the common task of placing wristbands on concert-goers old enough to purchase alcoholic beverages as a call for giving more thought to whether the current requirements for large-scale assessment are consistent with the principle that form follows function.

It’s Deja Vu All Over Again – offered a cautionary look back at the decisions that influenced the design of NCLB assessment programs as states are beginning to plan their assessments under ESSA.

The Road to Hell and High-Quality Assessment – called for restraint in holding large-scale assessments up to standards that a) they cannot meet or b) they should not attempt to meet.

The Purpose of Education

Two of the essays addressed the need for a serious conversation about the purpose and future of public education.  What is it that we really want our schools to accomplish?  Are we willing to identify the resources needed to accomplish those goals and purposes? And are we willing to make a commitment to allocate those resources before holding students and educators accountable?

In the blink of an eye – used the occasion of my 35th college reunion to address the dual goals of equity and excellence, the conflicts between those goals, things that had changed over the last 35 years and things that had stayed the same.

One Small Step – proposed three goals that might be achievable “small steps” or starting points for public discourse on education reform: develop a clear and complete definition of college- and career-readiness; agree on the limited amount of content knowledge and skills that are critical for all adults to possess when they complete public schooling; and renew the commitment to meet President Clinton’s 1996 technology goals for schools.

The Gift Was Ours to Borrow

My November essay, A Month of Goodbyes, reminds us of the need to enjoy, appreciate, and make the most of the time that we have.  In that essay, I say goodbye to Stan Deno, who passed away in October.  Earlier this month, we said goodbye to George Madaus.  Both men were giants in the field and influenced the lives of so many people over so many years.  Both men also lived long lives and knew that the end of their journey on earth was near. Life, however, can change in an instant.  In March, I watched a young singer, Christina Grimmie, perform on her 22nd birthday.  Three months later she was murdered at a similar concert.  Earlier this month, a friend’s 21 year old son was driving at lunchtime when his truck rolled off the road, changing the rest of their lives.

In the musical Hamilton, one of the questions asked of Alexander Hamilton throughout the show is “why do you write like you’re running out of time?”  Of course, none of us knows when our time will run out.  I am looking forward to 2017 and a third year of sharing my thoughts through this blog.  I may not be able to emulate Alexander Hamilton and write day and night, but I resolve to write more often and more regularly.  Looking ahead to 2017, I turn once more to lyrics from Hamilton:

And there’s a million things I haven’t done
But just you wait, just you wait

Happy 2017!  Embrace the Absurd!

Growing Pains

Charlie DePascale

“To ensure greater flexibility in tracking individual students’ annual progress, growth models provide states with more options for a nuanced accountability system, while adhering to the core principles of No Child Left Behind.”

— Secretary of Education Margaret Spellings

It all seemed so simple in 2005 when states wanted to include growth in their assessment and accountability systems for NCLB.  In short, districts and schools wanted credit in the accountability system for progress made by students who had no real chance of attaining grade-level proficiency within a single school year; often, those were students whose achievement at the beginning of the school year was well below their nominal grade level.  All that we had to do was to determine how many years it would take students to reach proficiency if they continued to make progress at their current rate.  If a student was on track to be Proficient within a reasonable amount of time (e.g., three years) the school could include her in the numerator when calculating the percentage of students who had met the accountability target for a given year. No problem.  Or perhaps schools would get credit for a student if he made a year’s worth of growth in a given year.  Sure, why not?
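
The simple 2005-era logic described above can be sketched in a few lines. This is an illustration only: the scores, proficiency cut, and three-year horizon below are hypothetical, not taken from any actual state assessment program.

```python
def years_to_proficiency(current_score: float,
                         annual_gain: float,
                         proficient_cut: float) -> float:
    """Project how many years of growth at the current rate are needed
    to reach the proficiency cut score (linear projection)."""
    if current_score >= proficient_cut:
        return 0.0
    if annual_gain <= 0:
        return float("inf")  # no progress at the current rate
    return (proficient_cut - current_score) / annual_gain

def on_track(current_score: float, annual_gain: float,
             proficient_cut: float, horizon_years: float = 3) -> bool:
    """A student counts as on track if projected to reach proficiency
    within the horizon (e.g., three years)."""
    return years_to_proficiency(current_score, annual_gain,
                                proficient_cut) <= horizon_years

# Hypothetical student: scored 420, gaining 30 points/year, cut score of 500
print(on_track(420, 30, 500))  # True: 80/30 ≈ 2.67 years, within the horizon
```

Of course, this is exactly the "no problem" version of the calculation; the rest of the essay is about why it is not so simple.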

Then it got real.

Simple concepts like on track to proficiency and a year’s worth of growth were not quite so simple.  Since 2005, CCSSO alone has published more than a half dozen guides to growth models.  And if those guides are any indication, one of the few things that has clearly grown over the last decade is the number of pages it takes to explain growth models.


As Castellano and Ho state in the 117-page 2013 document, A Practitioner’s Guide to Growth Models,

In the practice of modeling growth, the operational definition of growth does not always align with the intuitive definition of growth.  If this were a guide only for the growth models that aligned with intuition, it would be a short guide that excluded a number of models in active use across states. (p. 12)

What is an intuitive definition of or perspective about growth? 

At the recent annual conference of the Northeastern Educational Research Association, I moderated an invited panel discussion titled, Growth – You get it, right?  The first question I posed to panelists Peter Swedzewski, Lisa Keller, and Damian Betebenner was,

What is growth? How would you define it?

As each of the panelists answered the question and we discussed their responses, my impression was that a few intuitive notions about educational growth emerged:

  1. Growth is not the same as change; growth is an interpretation of change.
  2. Growth requires some sort of reference point (i.e., a basis for evaluation or comparison).
  3. There are many different reference points for growth.

(Disclaimer: Those are my impressions of the general discussion. The panelists and audience members may hold other views).

None of those notions of growth should have seemed new or come as a surprise to me.  Although I am guilty of using terms like growth, change, gain, etc. interchangeably on occasion, I know that change, or a gain score, usually has little meaning by itself and is usually of little interest to anyone receiving or using the score. In education, “So what?” questions are always more interesting and more important than “How much?” questions.

And the answers to the “So what?” questions about change virtually always require a reference point.  We have become pretty good at providing norm-referenced and criterion-referenced descriptions of changes in performance from one year to the next.

  • Johnny scored 75 points higher this year than last year, but to remain at the Proficient level he would have had to score 150 points higher.
  • Jane’s gain of 100 points from last year to this year was more than 99% of sixth grade students in the state.
  • If Tim continues progress at the same rate next year as he has over the last two years, it is likely that he will no longer be on track to college- and career-readiness.
  • Tina’s score indicates that she now has the following skills, which she did not have last year: find and position integers and other rational numbers on a number line, and use long division to convert a rational number to its decimal form.

It is worth noting, however, that many such descriptions tend to be descriptions of the student’s current status (i.e., the result of change or growth) rather than descriptions of the change or growth itself.

On track

We have a long way to go to really understand what it means to claim that a student is on track to college- and career-readiness or even simply on track to proficiency at the next grade level.  Such claims must be based on assumptions about the content standards, the content of the assessment, the achievement standards, and the interactions among them.  Such claims must also be based on empirical information about the performance of particular groups of students performing at various points along the performance continuum.  In some way, information about the effectiveness of districts, schools, and teachers must also be baked into the process.

And when we have decided how to model all of that information and produced some type of score, we still must determine what it means to be on track.  Is a student on track if there is a 50% chance that she will meet the desired goal or target?  67%?  80%?  Who makes that decision, and when is it better to simply report or describe probabilities rather than to make a claim that a student is or is not on track?
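
The threshold question can be made concrete: the same model-estimated probability supports or contradicts an “on track” claim depending entirely on where the cut is set. The probability below is a placeholder for whatever a growth model would actually produce.

```python
def on_track_claim(p_meet_target: float, threshold: float) -> bool:
    """Turn a model-estimated probability of meeting the target
    into a binary 'on track' claim at a chosen decision threshold."""
    return p_meet_target >= threshold

# Hypothetical: the model says this student has a 70% chance of
# meeting the college- and career-readiness target.
p = 0.70
for cut in (0.50, 0.67, 0.80):
    print(f"threshold {cut:.2f}: on track = {on_track_claim(p, cut)}")
```

At a 50% or 67% threshold the student is “on track”; at 80% she is not. Which claim gets reported is a policy decision, not a measurement result.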

Bringing content back into the discussion about growth

Much of the discussion about growth and growth models has been back-end discussion; that is, it has focused on the processing of assessment scores (with or without additional information) to produce some type of growth score.  Much less attention has been paid to the front-end assumptions about growth that have gone into the design of the assessments that produce the scores that are being processed.  In a 2010 presentation, Brian Gong identified five distinct ways to define growth:

  • Growth is increase in performance on the same thing, toward mastery.
  • Growth is learning one topic and then learning a more advanced topic in a sequence of content.
  • Growth is increase in expertise on the same thing (e.g., a more powerful mental model, increased fluency, greater independence).
  • Growth is increase in integration across content and skills.
  • Growth is increase of knowledge and skills outside the defined areas.

One would assume that folks developing content standards or designing assessments would have reached consensus on which of these statements best fits their notion of growth; and that such a notion of growth is reflected in the standards and the assessments.

Within our current model of determining growth based on change in performance across annual end-of-year assessments, which of the five conceptions of growth are most likely to be supported and which are not?  Would different conceptions of growth be better supported in a within-year system of interim or benchmark assessments?  What about within a competency-based system of learning and assessments?

When the USED requires that assessments must be designed to support the measurement of student growth, the common interpretation of this requirement is that the assessment must produce scores that differentiate among students along the performance continuum.  In other words, we interpret the requirement to mean that assessments cannot have large floor or ceiling effects or simply support coarse achievement level classifications. Under our current state assessment model, those might be necessary, but not sufficient, conditions for supporting the measurement of growth.  Under different models (e.g., a mastery model) those conditions may be neither necessary nor sufficient.

As we move into the second decade of discussing growth, we may want to shift the conversation from how best to define and measure growth within our current assessment model to what type of assessment model best fits our desired definition of growth.

A Year’s Worth of Growth

The missing link, Rosetta Stone, or Holy Grail in our understanding of growth continues to be the elusive definition of a year’s worth of growth.  Historically, we have the empirical, norm-referenced definition/interpretation in which a year’s worth of growth is defined by the 50th percentile performance on consecutive annual assessments.  More recently, we have moved toward a more quasi-criterion based interpretation in which a year’s worth of growth is defined in terms of the knowledge and skills required to be Proficient on the state standards from one year to the next – or perhaps simply by the progression of knowledge and skills in the state grade-level content standards from one year to the next. Neither of those approaches alone, of course, provides a sufficient or complete understanding of the concept of a year’s worth of growth.
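
One way to operationalize the empirical, norm-referenced definition is to take the median gain across matched students on consecutive annual assessments. The scores below are fabricated for illustration; operational norming studies use far larger samples and more careful matching.

```python
from statistics import median

def years_worth_of_growth(prior_scores, current_scores):
    """Median (50th percentile) score gain across matched students
    on consecutive annual assessments -- one norm-referenced
    operationalization of 'a year's worth of growth'."""
    gains = [cur - pri for pri, cur in zip(prior_scores, current_scores)]
    return median(gains)

prior   = [410, 435, 450, 470, 500]   # hypothetical grade-4 scores
current = [445, 460, 480, 505, 520]   # the same students in grade 5
print(years_worth_of_growth(prior, current))  # 30
```

Note what this number is silent about: whether a 30-point gain means the same thing for low and high performers, or whether it reflects any particular progression of knowledge and skills.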

At this point in time, discussions of the meaning of a year’s worth of growth inevitably and invariably produce more questions than answers.  To the extent that those questions help identify the desired definition of growth or clarify how growth should be used in accountability systems, more questions than answers is not necessarily a bad outcome.  The last thing that we want is answers that are based on foundations of sand; that is, answers based on incomplete understanding of assumptions, claims, and consequences.

There are unanswered questions about whether a year’s worth of growth means the same thing for low performing, typically performing, and high performing students.  What can and/or should we expect from each of those groups; and what are the implications of those expectations for outcomes such as achievement gaps or readiness gaps?

There are unanswered questions about whether a year’s worth of growth means the same thing across grade levels.   Most assessment programs that employ a vertical, or developmental, scale reveal that, at least in terms of what is being measured on the scale, a year’s worth of growth requires a smaller year-to-year gain (or change) as students progress across grade levels.

A graph of the achievement level thresholds for the Smarter Balanced English Language Arts/Literacy tests demonstrates both the within grade and across grade questions described above.  Moving from grade 4 to grade 5, a gain of 29 points is needed to maintain performance at the Level 3 threshold, but a gain of 49 points is needed to maintain performance at the Level 4 threshold.  A gain of only 16 points is needed to maintain performance at the Level 3 threshold across the three years between grades 8 and 11.  In contrast, a 41 point gain is needed to maintain performance at Level 3 from grade 3 to grade 4.  Do all of these gains reflect a year’s worth of growth?
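
The gains quoted above are simply differences between the same achievement level’s threshold at consecutive grades on the vertical scale. The Level 3 threshold values below were chosen to reproduce the quoted gains and should be checked against the published Smarter Balanced cut scores before being treated as authoritative.

```python
# Illustrative Level 3 thresholds by grade on the vertical scale
# (values assumed here to match the gains quoted in the text).
level3_thresholds = {3: 2432, 4: 2473, 5: 2502, 8: 2567, 11: 2583}

def gain_to_maintain(grade_from: int, grade_to: int, thresholds: dict) -> int:
    """Scale-score gain needed to stay at the same achievement level
    threshold when moving between two grades."""
    return thresholds[grade_to] - thresholds[grade_from]

print(gain_to_maintain(3, 4, level3_thresholds))   # 41
print(gain_to_maintain(4, 5, level3_thresholds))   # 29
print(gain_to_maintain(8, 11, level3_thresholds))  # 16
```

The same arithmetic with Level 4 thresholds would show the within-grade discrepancy (29 vs. 49 points from grade 4 to 5), which is the point of the question: these are very different gains, all nominally “a year’s worth of growth.”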


The Big Picture – Looking beyond growth

Personally, I also wonder whether there are some types of courses, particularly in high school, from which we do not really expect much, if any, student growth – depending, of course, on how we define growth.  Consider the following examples:

  • A 12th grade English elective with a curriculum centered on writings of or about the Holocaust. The primary purpose of the course may be for students to grow on standards related to understanding the period and the human experience, while applying and maintaining skills related to reading comprehension, writing, or research.
  • An introductory unit/course on the types, or classes, of rock: how they are formed, where they are found, what they are used for, what they tell us about the earth. For most students the content of such a unit may change very little from middle school to high school to college – with each exposure designed to refresh or reinforce previous knowledge.
  • Business, vocational, or technical mathematics courses in high school in which students apply basic skills in arithmetic, algebra, and perhaps, geometry to specific contexts. Students may not exhibit growth defined as acquiring new content knowledge or skills in mathematics, but may solidify their mathematics skills while exhibiting growth in a different field.
  • A 12th grade mathematics course designed specifically to ensure that a student who has attained college-readiness in mathematics by the end of grade 11 but is not interested in taking any higher-level mathematics courses does not regress before arriving at college.

In summary, while it may be true that “if we don’t grow, we aren’t really living” and “every experience is an opportunity to grow,” at some point the purpose of K-12 education must shift toward the maintenance and application of knowledge and skills that have been acquired.  Undoubtedly, that point will vary across content areas for individual students as they begin to identify their long-term goals and the pathways that they intend to follow.  Defining and measuring growth in those contexts is probably going to require more than a state assessment in English language arts and mathematics.