On Scales, Achievement Standards, and Trends

As part of my series of posts on NAEP last month, I used the absurd hypothetical of a 250-year trend line in support of the much less absurd argument that perhaps, like all good things, it was time for the NAEP trend line as currently conceived and constituted to come to an end.

As expected, the post elicited a response from Andrew Ho, the self-proclaimed “protector of trend,” who deftly posed the question, “who does breaking the NAEP trend help and how?” I might respond that I know firsthand of one state hurt by the manner in which a single “average” national adjustment was applied to maintain the NAEP trend during the 2017 conversion to computer-based testing. But I know that some would respond that sometimes it’s necessary to sacrifice a prawn for the greater good – Laissez les tendances rouler!

My post also spurred a thoughtful response from Enis Dogan in which he laid out a series of alternative approaches to demonstrate that keeping or breaking the NAEP trend is not a simple yes/no, all-or-nothing decision. I agree with Enis 100%, and with the proper beverages at hand, I would even be willing to argue with Andrew that, on at least one occasion, NAGB already has employed similar methods to bruise, sprain, and perhaps even break its trend while talking of adjustments and continuing on with reporting. But I’ll save that argument for another day.

Today, while my friends, colleagues, countrymen, and the rest of the measurement and assessment world gather in Los Angeles, I come neither to bury nor to praise the NAEP trend. Rather, I come to offer an alternative not to how we use the trend but, more deeply, to how we conceive of trends and the relationship among scales, achievement standards, and trends.

This post is an update to a presentation that I made around 2010 to the Massachusetts Comprehensive Assessment System (MCAS) Technical Advisory Committee at a time when the department (DESE) was considering making the switch from MCAS to PARCC – a switch that the department feared would break the decade-long trends that the state had established with its MCAS tests and used as a thermometer or yardstick for measuring the effectiveness of its Education Reform initiative.

My point today, as it was then, is that the NAEP trend as currently conceived and constituted is based on a far too narrow interpretation of the concept of a trend or a trend line in educational assessment and education reform. That is, the concept is one in which the scale, achievement standard, and trend are inextricably linked. Break one link and the trend train is derailed. That belief is what makes it necessary to “maintain” the LTT scale back to the 1970s and the main (state) NAEP scale back to the 1990s.

On the contrary, I argue that scales, achievement standards, and trends are, in fact, distinct entities. It is entirely possible, feasible, and indeed quite common to decouple the trend from both the scale and the achievement standard and watch it continue to roll merrily along the track. To wit…

Scales

In simplest terms, a scale is connected to a single test or measure.

That statement applies whether we are discussing the Celsius and Fahrenheit scales, a meter stick and a yard stick, the SAT and ACT, MCAS, Smarter Balanced, NWEA MAP, or NAEP.

We can extend the concept of “single test” a bit to include carefully constructed parallel test forms, which we argue are interchangeable, and therefore by definition, still a single test.
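To make “interchangeable” a bit more concrete, here is a minimal sketch of one common way to treat two parallel forms as a single test: place scores from the new form onto the base form’s scale. The mean–sigma linear linking and all of the numbers below are my illustration, not a claim about how any particular program does it.

```python
# A minimal sketch (not any program's actual procedure): placing a parallel
# form onto the base form's scale via mean-sigma linear linking.
# All scores below are hypothetical.

from statistics import mean, pstdev

base_form_scores = [210, 225, 240, 255, 270, 285]   # scores on the original form's scale
parallel_form_scores = [32, 38, 41, 47, 52, 58]     # raw scores on a new, parallel form

mu_x, sigma_x = mean(base_form_scores), pstdev(base_form_scores)
mu_y, sigma_y = mean(parallel_form_scores), pstdev(parallel_form_scores)

def to_base_scale(y):
    """Map a parallel-form score onto the base form's scale (linear linking)."""
    return sigma_x / sigma_y * (y - mu_y) + mu_x

print(round(to_base_scale(45)))   # a parallel-form score expressed on the common scale
```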

Unlike with thermometers and measures of distance, the temperature rises and the ground gets a little shaky beneath our feet when we don our psychomagician’s hat and apply statistical wizardry to build tests that attempt to extend the scale in either direction or to measure more precisely a narrow range of the scale, but that’s an argument for another day, post, or treatise.

The point is that the scale is connected to the test.

Achievement Standards

In our current world of large-scale state testing for accountability, achievement standards are tied directly to content standards. NAEP is similar. In bygone days, it was common to say that the content standards told us “what” and the achievement standards told us “how much” was needed for student performance to be declared Proficient at the end of a grade level (or Basic or Advanced).

We’ve come a long way from the early days of thinking of standard setting as simply identifying cutscores on a test. Today, rich performance level descriptions based on complex content standards are the foundation of standard setting.
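Mechanically, though, the “how much” still comes down to cut scores that partition the reporting scale into achievement levels. A minimal sketch, with hypothetical cut scores and level names:

```python
# Minimal sketch: cut scores partition a reporting scale into achievement levels.
# The cut scores and level names below are hypothetical.

from bisect import bisect_right

CUT_SCORES = [220, 240, 260]                       # thresholds for Basic / Proficient / Advanced
LEVELS = ["Below Basic", "Basic", "Proficient", "Advanced"]

def achievement_level(scale_score):
    """Return the achievement level implied by a scale score."""
    return LEVELS[bisect_right(CUT_SCORES, scale_score)]

print(achievement_level(247))   # -> "Proficient"
```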

In massive comparability studies conducted between 2005 and 2015, we demonstrated and established that achievement standards need not be tied to a single test. I was part of one such study – a USED-funded study, supported by CCSSO and led by Phoebe Winter, charged with “Evaluating the Comparability of Scores from Achievement Test Variations.” I believe that my first professional contact with Enis Dogan was related to an evaluation of the appropriateness of Louisiana applying PARCC achievement standards to a state assessment based on PARCC. Additionally, it was accepted that it was possible for PARCC and Smarter Balanced, both aligned to the Common Core State Standards, to share an underlying set of achievement standards. (Again, a topic for another day.)

In a recent webinar, the point was raised that a problem with state testing is that trends are broken because the lifespan of state tests has dropped to under five years (I believe). A change in state tests (or contractors), however, does not necessitate a change in achievement standards or trends. As long as the content standards to which the tests are aligned remain constant, it is possible and practical for a state to maintain its achievement standards and continue to report statistics such as “percent proficient” (sorry, Andrew) that maintain trends over time.
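As a sketch of that claim, with invented scores and cut scores: if the achievement standard is re-established on each new test’s scale, the percent-proficient statistic – and therefore its trend – keeps right on rolling through the test change.

```python
# Minimal sketch: a percent-proficient trend that runs across a test change.
# Test A and Test B report on different scales, but each carries a cut score
# representing the same achievement standard. All numbers are hypothetical.

results_by_year = {
    2018: {"cut": 240, "scores": [231, 252, 245, 238, 229, 249]},   # Test A scale
    2019: {"cut": 240, "scores": [236, 255, 242, 239, 263, 251]},   # Test A scale
    2020: {"cut": 350, "scores": [341, 362, 355, 351, 371, 339]},   # Test B scale (new test)
    2021: {"cut": 350, "scores": [346, 365, 352, 353, 373, 358]},   # Test B scale
}

for year, data in results_by_year.items():
    pct = 100 * sum(score >= data["cut"] for score in data["scores"]) / len(data["scores"])
    print(year, f"{pct:.0f}% proficient")
```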

Of course, in some cases, states welcome a change in achievement standards when they change tests or content standards.

And in some cases, content standards and/or achievement standards do need to change over time. Which brings us to the third car in our train: trends.

Is it possible to maintain meaningful trends without maintaining either a constant scale or achievement standards?

Trends

I’ll answer the question above with two of my own:

  • What do you mean by trend?
  • To whom are you reporting the trend and why?

As suggested above, with regard to state testing and NAEP, we apply a most narrow and conservative definition of trend; that is, one in which a constant scale and achievement standard are maintained.  It’s true that there are cases outside of education in which such an approach is used. For example,

  • We do want and need to know that the temperature of the planet is increasing.
  • It is important to know that the US population is becoming more obese.

Such straightforward applications of trends tend to be the exception rather than the rule. Most trends of interest and import are based on statistics or indicators that are temporal – bound to the time and context in which they are defined.

We are not so much interested in actual income or even changes in income over time as we are in

  • The percentage of people living below the poverty line.
  • The standard of living available to a family today with 1 full-time income, 2 full-time incomes, or more than 2 full-time incomes.

We are not as interested in how well people can read and write compared to a time past as we are in changes to the literacy rate over time.

Definitions of the poverty line, standard of living, and yes, even literacy (i.e., the achievement standards) can change and evolve over time, but we still report the trend. The levels of and trends in those applied indicators are of primary interest to policymakers and the general public – much more so than a number on the original scale.

That income number and $10 will buy you a cup of coffee.
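A toy illustration of that point, with entirely invented figures: the poverty threshold is redefined from period to period, yet the reported indicator – the percent of households below the line – remains a single, continuous trend.

```python
# Minimal sketch: an indicator trend whose underlying definition evolves.
# The poverty threshold is redefined over time, but the reported statistic
# (percent of households below the line) remains one trend.
# All figures below are invented for illustration.

thresholds = {1990: 13_000, 2000: 17_000, 2010: 22_000, 2020: 27_000}
household_incomes = {
    1990: [9_000, 15_000, 24_000, 31_000, 12_500, 40_000],
    2000: [14_000, 21_000, 16_500, 35_000, 52_000, 19_000],
    2010: [20_000, 30_000, 45_000, 21_500, 60_000, 26_000],
    2020: [25_000, 41_000, 55_000, 33_000, 26_500, 70_000],
}

for year, incomes in household_incomes.items():
    below = sum(income < thresholds[year] for income in incomes)
    print(year, f"threshold ${thresholds[year]:,}:",
          f"{100 * below / len(incomes):.0f}% below the poverty line")
```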

The same principle and phenomenon occur in our field of education.

We are interested in whether students graduating from high school are ready for college and careers. Are there more college-and-career-ready students today than there were 5 years ago, 15 years ago, or 20 years ago?

For the policymaker, public, admissions officer, and employer, it matters little whether the definition of college-and-career readiness has changed.

Is there some benefit to knowing whether kids can read, write, or do mathematics better today than their siblings, parents, and grandparents did? Sure, but that information is not where the utility of our large-scale assessment program lives.

Progress

Some defenders of the NAEP trend latch onto the word “progress” in the name of the assessment program and the charge from Congress to report on academic achievement and trends.

It’s at least worthy of discussion whether progress and trends are best measured and reported in terms of differences on a fixed scale, as the current model does, or by an approach in which progress and trends are defined in terms of proximity to, and movement toward, a tangible goal.
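As a rough sketch of the contrast, with hypothetical numbers and an invented readiness goal, the same set of results can be framed either way:

```python
# Minimal sketch: two framings of "progress" for the same results.
# The scale scores, readiness percentages, and the 80% goal are all hypothetical.

goal_pct_ready = 80                          # tangible goal: 80% of graduates ready
avg_scale_score = {2015: 282, 2025: 285}     # fixed-scale view
pct_ready = {2015: 38, 2025: 47}             # goal-referenced view

# Framing 1: progress as a difference on a fixed scale
print("Scale-score change:", avg_scale_score[2025] - avg_scale_score[2015], "points")

# Framing 2: progress as proximity to and movement toward the goal
gap_then = goal_pct_ready - pct_ready[2015]  # 42 points from the goal in 2015
gap_now = goal_pct_ready - pct_ready[2025]   # 33 points from the goal in 2025
print(f"Share of the gap to the goal closed: {100 * (gap_then - gap_now) / gap_then:.0f}%")
```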


Published by Charlie DePascale

Charlie DePascale is an educational consultant specializing in the area of large-scale educational assessment. When absolutely necessary, he is a psychometrician. The ideas expressed in these posts are his (at least at the time they were written) and are not intended to reflect the views of any organizations with which he is affiliated personally or professionally.