## A case for a norm-referenced criterion

*Charlie DePascale, June 2014*

*This brief was written in response to questions about the use of norm-referenced criteria in teacher evaluation systems. Specifically, questions were raised about the fairness of systems in which the bottom ‘x’ percent of teachers were always identified as less effective regardless of their level of effectiveness. The issues addressed here apply to schools and districts as well as teachers and are being debated once again as states design their new accountability systems to meet the requirements of the Every Student Succeeds Act (ESSA).*

As the popularity of norm-referenced growth scores increases, there is also a growing concern about the appropriateness, or fairness, of using a norm-based criterion as part of a teacher evaluation system. The case against norm-referenced scores includes arguments that the use of a norm-referenced criterion

- creates a moving target for teachers,
- means that some teachers will always fail to meet the target, and
- makes it difficult, if not impossible, for teachers to know what they have to do to meet the target.

Arguments such as those apply to overtly norm-referenced scores such as student growth percentiles as well as other regression-based and multivariate approaches in which the underlying norm-referencing may be less obvious.

One proposed *solution* to the issues described above has been to “anchor” the measure by comparing performance each year to a fixed set of norms. The fixed set of norms may be based on performance in a single baseline year or may reflect combined performance across a selected set of years. Comparing student and school performance to an anchored set of norms has a long history in educational testing and accountability. With traditional norm-referenced test series, it was common to establish norms during the development of a test and apply those norms for several years until a new version (or edition) of the test was developed. With K-12 achievement test series, it would not be unusual for a set of norms to be in place for 7-10 years. For college admissions tests like the SAT and ACT, norms may be in place for decades.
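The contrast between same-year norms and anchored norms can be illustrated with a small calculation. What follows is only a sketch under stated assumptions: the scores are invented, the 20th-percentile cutoff is arbitrary, and the helper names `percentile_rank` and `share_meeting_target` are hypothetical rather than drawn from any actual evaluation system.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(score, norm_scores):
    """Percent of the norm group scoring below `score` (ties count as half)."""
    ordered = sorted(norm_scores)
    below = bisect_left(ordered, score)
    ties = bisect_right(ordered, score) - below
    return 100.0 * (below + 0.5 * ties) / len(ordered)


def share_meeting_target(scores, norm_scores, cutoff=20.0):
    """Fraction of `scores` whose percentile rank in the norm group is >= cutoff."""
    met = sum(percentile_rank(s, norm_scores) >= cutoff for s in scores)
    return met / len(scores)


# Hypothetical growth scores for illustration only (not real data).
baseline_year = [40, 45, 50, 55, 60]  # the anchored (fixed) norm group
current_year = [50, 55, 60, 65, 70]   # every score is 10 points higher

# Same-year norms: the current group is ranked against itself,
# so someone always lands below the cutoff.
same_year = share_meeting_target(current_year, current_year)

# Anchored norms: the current group is ranked against the baseline year,
# so the whole group can clear the cutoff if everyone has improved.
anchored = share_meeting_target(current_year, baseline_year)

print(f"Same-year norms: {same_year:.0%} meet the target")
print(f"Anchored norms:  {anchored:.0%} meet the target")
```

In this toy case the lowest scorer always falls below the cutoff when the group is ranked against itself, while every score clears it when ranked against the baseline year. That is the property that makes anchoring attractive, and it is also the one the rest of this brief questions.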

The practice of anchoring norms does have some appeal, and it does address some of the concerns expressed above. Anchoring norms provides a common point of reference for as long as those norms remain in place. It also provides the opportunity, in theory, for all teachers in subsequent years to meet a target based on the established norms – that is, for all teachers to be effective. On the surface, that approach may seem fair to teachers. Is it, however, an appropriate foundation for a teacher evaluation system when

- there is no clearly established criterion for teacher effectiveness (or an effective teacher), and
- there is an expectation for continuous improvement on the part of individual teachers and the system as a whole?

I would argue that those two conditions do exist now with regard to teacher effectiveness in K-12 education; and they are likely to always exist – and that is not necessarily a bad thing.

At this time, there is no clearly established definition of effective teaching; and such a definition would certainly be a prerequisite for establishing a measure and criterion for effective teaching. To be sure, there are well-known systems that identify and define components of effective teaching (e.g., Danielson, Marzano); and there is a body of educational research that has identified teaching behaviors associated with effective teaching. We are a long way, however, from determining how these components or dimensions come together in a particular context to produce effective teaching or an effective teacher. Although there may be some minimum levels that any teacher must meet on one or more dimensions to even be allowed to teach, it has become clear that understanding context is critical to determining how much and what combination of *x*, *y*, and *z* might produce the best results in a particular situation. And given the complexity of the teaching situation, attempting to prescribe a one-size-fits-all approach to teachers along with a corresponding measure or criterion seems foolhardy. At its worst, such an approach produces the current situation in which effective teaching is defined by a student test score.

Moreover, for the sake of argument, let’s assume that

- we could define effective teaching and determine a minimum criterion for “effectiveness” tomorrow, and
- consistent with current teacher evaluation results, 85% – 95% of teachers are meeting that criterion and should be classified as effective.

Even if that were the case, nothing about K-12 education and teaching is static. The field is continually changing, and the teacher evaluation and accountability systems we develop should reflect and promote continuous improvement. Consider the following:

- We expect instruction and student achievement to improve as teachers and school systems become more familiar with the new college-readiness standards.
- We expect instruction to change as successive cohorts of students progress through curricula and instruction aligned to college-readiness standards.
- We expect advances in educational research and expect teachers to apply those advances.
- We expect advances in technology to make teaching more efficient and effective.
- We expect the global economy and marketplace to place ever-increasing demands on the K-12 and K-16 system – demanding that the systems produce students with different and higher skills.

None of those five factors are consistent with a teacher evaluation or accountability system based on a static criterion for teacher effectiveness. Rather, they call for systems that reward teachers who are flexible, continually learning, continually improving, and yes, teachers who are performing better than their peers facing similar conditions.

Finally, in terms of fairness, I think that there is reason to consider the argument that including a norm-referenced criterion within a teacher evaluation system is not only fair, but is, in fact, more appropriate than a real or pseudo criterion based on a comparison to a set of anchored norms. A primary question being addressed by norm-referenced systems is

*What impact on student learning was a teacher able to achieve compared to other teachers under similar circumstances?*

In essence, that is the question at the core of most personnel evaluation systems. Most employers do not want employees who simply meet a minimum criterion for acceptable performance – particularly when that standard was established several years ago. Most employers want employees who are able to make the best of the situation in which they are placed. In short, most personnel evaluation is norm-referenced. We should at least consider that before rushing to fabricate a criterion so that all teachers can be above average.