Would AYP Have Sucked Less Without Test Scores?

In my previous post, Whose Job Is It, Anyway?, I did not include accountability systems in my discussion of a state’s responsibility for the validation of common uses of test scores.  As I mentioned in that post, that omission was not because accountability systems do not require careful scrutiny; they certainly do. It is precisely because there are so many aspects of accountability systems that must be examined closely even after making the decision that school or teacher accountability is an appropriate use for test scores that these systems constitute a different class of test use and warrant their own discussion. As an exemplar for this point, I have selected Adequate Yearly Progress (AYP) – the source of so much of our current state of discontent and a system so fundamentally and fatally flawed that it could only be the product of the legislative process.

First, A Validity Question

Before examining the mechanics of AYP, there is a validity question to consider regarding the use of percent proficient as an indicator of school effectiveness:

Are interpretations or claims related to the effectiveness of a school supported by the percentage of students performing at the proficient level or above in English language arts and mathematics?

That question has been debated for two decades and is a legitimate validity question. In the abstract, there is, of course, no right or wrong answer, only different sets of values to be considered. If you do not believe that some aggregate of student performance in English language arts and mathematics is an appropriate or sufficient indicator of school effectiveness then you would not be designing an accountability system like AYP. If you do believe that ensuring student proficiency in English language arts and mathematics is a minimum bar that all schools must meet then you can move forward in designing an accountability system based on the percentage of proficient students (but hopefully, not a system that looks anything like AYP).

When a decision is made to use percent proficient in English language arts and mathematics as an indicator of school effectiveness, it is important to clearly communicate and support the claim that is being made about the school (e.g., why schools in which one subgroup doesn’t meet their proficiency target are “failing schools” ??!!). It is also important to fully consider the consequences of that decision, but that is a discussion for another day.

Second, Whither Test Scores?

Note that to this point we have not mentioned the use of test scores in this discussion of the appropriateness of using percent proficient as an indicator of school effectiveness. Performance on a common state assessment is merely one of many ways that the percentage of students in a school who are proficient in English language arts and mathematics could be estimated. The accuracy of the estimate based solely on test scores versus estimates based on teacher judgments, a combination of teacher judgments and test scores, or perhaps some other modeling approach is a completely separate question from whether percent proficient is an appropriate indicator of school effectiveness.

The decision to use test scores for accountability will also have practical implications and consequences that must be taken into account. However, those consequences, although related, are likely different from the consequences associated with the original decision to use percent proficient as the primary indicator in the school accountability system. If there is already an existing state assessment program, we know that attaching higher stakes to test scores may lead to practices that affect the interpretation of those scores. If a state assessment program would have to be developed specifically to provide input data for the accountability system, there are well-known costs and benefits that must be weighed.

It may also be the case that associating the state assessment program and its tests with a poorly designed accountability system may have a negative impact on the credibility of the state’s assessment program and how the state assessment is perceived by stakeholders, which leads to the question posed in the title of this post, “Would AYP have sucked less without test scores?”

Third, AYP DOA

There may be ways to use percent proficient effectively in a school accountability system, but let me be clear, Adequate Yearly Progress (AYP) as delineated by NCLB was not one of them. In honor of the 2020 Democratic ticket I offer the following two thoughts on AYP:

At best, AYP was a nice slogan. If we had a goal of 100% Proficient by 2014, schools making yearly progress toward that goal would have been a good thing.  However, without any empirical information on what adequate progress to 100% Proficient would look like from starting points of 25%, 50%, or 80%, the federal government established arbitrary and capricious guidelines for setting annual measurable objectives (i.e., targets) and states made their best guesses at how to implement those guidelines.

The arbitrary annual targets, however, were just the tip of the iceberg in terms of the problems with AYP.

  • It was not feasible to reliably estimate whether the percentage of students proficient in a school met a target for a given year and also account for students in the subgroups that were the focus of the system.
  • Sampling error in annual estimates of the percentage of proficient students in a school population overwhelmed the change in performance that could be reasonably expected or would be required from one year to the next.
  • While it may be possible to observe and measure real progress over 3 years, 5 years, or 10 years, year-to-year changes are too small and/or too subject to noise to inform policy and practice. 
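The sampling-error point in the list above can be made concrete with a little arithmetic. What follows is only a rough sketch, not a description of how any state actually computed AYP: it assumes each year's tested students behave like a simple random sample under a binomial model, it ignores measurement error in the test itself, and the function names are hypothetical ones chosen for illustration.

```python
import math


def percent_proficient_se(p, n):
    """Standard error, in percentage points, of an observed percent proficient.

    p is the underlying proportion proficient (0 to 1) and n is the number
    of students tested. Assumes a simple binomial sampling model.
    """
    return 100 * math.sqrt(p * (1 - p) / n)


def year_to_year_margin(p1, n1, p2, n2, z=1.96):
    """Approximate 95% margin of error, in percentage points, on the change
    in percent proficient between two independent yearly cohorts.
    """
    se_diff = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return 100 * z * se_diff


# Hypothetical school: 100 students tested each year, half of them proficient.
se_one_year = percent_proficient_se(0.5, 100)
margin_on_change = year_to_year_margin(0.5, 100, 0.5, 100)
```

For this hypothetical school, the standard error on a single year's percent proficient is 5 percentage points, and the margin of error on the year-to-year change is roughly 14 points. Under these assumptions, an annual gain of a few points cannot be told apart from noise, and the problem only gets worse for subgroups with far fewer than 100 students.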

Perhaps the most significant problem with AYP is that the system created false and distracting targets. Even if precise measurement were possible, when the goal is 100% proficient, it is a waste of time and resources to focus on whether 52.8% of the students in a school are proficient in a given year or whether the percent proficient increased from 65.6% to 69.1% from one year to the next. The long-term curricular and instructional interventions and programs that one implements in an attempt to ensure that 100% of students are proficient in 12 years likely will be very different from efforts to ensure that 46% of students are proficient next year or 52% the following year.

What’s the Use?

My goal in concluding this series of posts on validity by questioning whether AYP would have sucked less without test scores was to demonstrate the complexity in answering questions about validity and the use of test scores.  Our tendency to allow tests to be conflated with the accountability system and to miss the nuances in validity questions only makes it more difficult to improve both the tests and the accountability system.

  • If you don’t believe that percent proficient in English language arts and mathematics is an appropriate indicator of school effectiveness then no amount of technical tweaking would make an accountability system based on percent proficient a better system.
  • On the other hand, if you do believe that percent proficient in English language arts and mathematics is an appropriate indicator of school effectiveness, it may be worth the effort to attempt to improve a system like AYP.
  • However, even if you believe that percent proficient in English language arts and mathematics is an appropriate indicator of school effectiveness and you believe that the school accountability system is sound, there may still be concerns about how the use of test scores in such a high-stakes system might affect the accuracy of the interpretation of those test scores as measures of student proficiency on the state’s content standards.

And the list could go on.

So, no, AYP would not have sucked less without test scores.  And, state assessment programs might be in a better position today if they had never been associated with AYP. Understanding why, however, is the key to making things better under ESSA and whatever the next reauthorization might be.

(Image source: http://ge.projects.history.ucsb.edu/iv/)

Published by Charlie DePascale

Charlie DePascale is an educational consultant specializing in the area of large-scale educational assessment. When absolutely necessary, he is a psychometrician. The ideas expressed in these posts are his (at least at the time they were written), and are not intended to reflect the views of any organizations with which he is affiliated personally or professionally.
