Psychometrician, Do No Harm

(Prepared for presentation on April 18, 2015 at the NCME annual conference in Chicago, IL)

Last fall, I was asked to participate in a panel discussion, responding to questions from teachers on the broad topic of making use of assessments and data from assessments in the classroom. Over the course of the winter and spring, as is often the case, the plans for the panel and my role in it morphed until finally coalescing into the daunting task described in the conference program:

Charlie DePascale will talk about psychometricians’ roles in ensuring high-quality measures, competing measurement priorities for, and barriers to, providing educators with more useful information.

One reason the task was intimidating is that I have never considered myself a psychometrician. I only became a psychometrician when one of my previous employers changed the name of my position to principal psychometrician. What is a psychometrician and what is psychometrics? I was unsure how to answer those questions, so I decided to check the Psychometric Society website. As it turns out, they are also a little unclear on the answers. To address the question, they asked four noted psychometricians to offer a definition of psychometrics. The following carefully selected portions of their definitions made me feel better about calling myself a psychometrician:

Because many of the questions that psychometricians study transcend disciplinary boundaries, and concern general issues of measurement and data analysis, the boundaries of the discipline are fuzzy…

Because measurement in psychology is often done with tests and questionnaires, it is rather imprecise and subject to error. Consequently, statistics plays a major role in psychometrics…

Today, psychometrics covers virtually all statistical methods that are useful for the behavioral and social sciences…

Feeling reassured that I can speak as a psychometrician, I will address the charge to this panel in the context of four big picture issues in which I am involved and on which we would benefit from a stronger connection between psychometricians and educators: large-scale assessment, interim assessments, teacher evaluation and SLO, and fundamental concepts of assessment, measurement, and data literacy.

Large-scale Assessment

In educational testing, psychometricians are most closely associated with large-scale assessments; that is, external, standardized assessments such as
Custom state assessments,
Norm-referenced achievement tests,
National interim assessment programs, and
College admissions exams.

Among those, state assessments have taken on added importance and a seemingly ever-increasing presence in the lives of K-12 educators since the full implementation of the assessment and accountability requirements of NCLB in 2006. With regard to state assessments, the most important message that I, as a psychometrician, can deliver to educators is that the state assessment should not provide you with any information that you didn’t already know. Of course, you already knew that. However, in the midst of the hoopla around data-driven instruction and the hubbub about assessment results informing instruction, and with the new and improved next generation assessments growing in every conceivable dimension, perhaps you were beginning to doubt yourself. Rest easy. I am here to assure you that in a well-functioning, cohesive, local system of curriculum, instruction, and assessments aligned to the state content and achievement standards, state assessment results should confirm what you already knew about the performance of your students, schools, teachers, programs, etc.

What information does the state assessment provide?

You are familiar with the expression no news is good news and have probably heard the term multiple measures. Ideally, when a system is working well, the state assessment will serve as an additional measure confirming what you already knew; an external audit that lets you know that you are in sync with other districts and the state in your interpretation of the state content and achievement standards. The state assessment also provides a common metric that can be used to compare results across students, schools, districts, and over time.

With certain tests, it is also possible to compare performance across states. The catch to all of the above, of course, is the requirement for a well-functioning and aligned local system. If the local system is not well-designed or has been implemented poorly, the state assessment may, in fact, provide information that is discrepant with your local information. In that case, the response should be to determine how the results are different and to ask why. Figuring out why the results are different and what to do about it will involve some analysis of the test results, but will primarily focus on an examination of local materials, practices, and most important, student work in relation to the state standards.

Am I saying that the state assessment results are always right and local practices must always be adjusted when there is a discrepancy? No! However, there must always be an understanding of why there is a discrepancy.

Now one may be tempted to ask: if state assessment is intended to serve the confirmatory purpose that I describe, does it need to be so long, do we need to test all students every year, and do we need to place so much emphasis on the results? Those are good questions to be discussed on another day.

Interim Assessments

Interim assessments such as the MAP tests offered by NWEA and Renaissance Learning’s STAR Reading and Math exams have become quite popular with school districts across the country. They are relatively easy to administer, return results immediately, and provide a variety of informative and useful reports about student achievement in the content area as a whole and on specific skills. What’s not to like?

As the assessment reports provided with the interim assessments continue to improve and the level of information provided continues to increase, it is critical for educators to have a working understanding of how the reported information was derived. How can the test produce such detailed diagnostic and prescriptive information on the basis of such a short administration?

To some extent, the detailed or diagnostic information provided in those reports is based on statistical relationships gleaned from vast amounts of data. In a sense, the results are predictions or descriptions of patterns of typical performance based on the available data. What they are not, in most cases, is a certification that an individual student has mastered a specific standard or set of standards based on assessing the student directly on those standards with a sufficient number of items to make such a determination of mastery. And that’s OK, as long as teachers and administrators understand what the scores mean and how to use them appropriately.

One important thing to keep in mind at this point in time is that the stability or consistency of the relationship between student performance on the test items administered and students’ overall performance outside of the test is critical to the usefulness of those scores. In the midst of the implementation of new college-ready standards, curriculum, and instruction, there is a high likelihood that those relationships will become unstable (at least temporarily) and may change permanently. The testing companies will make the necessary adjustments to reflect the new relationships as those stabilize, but in the meantime, it may be prudent to use additional caution in interpreting and using those results.

Educator Evaluation and SLO

In the last few years, I have become involved in the design of educator evaluation systems for states, particularly in the design and use of Student Learning Objectives (SLO). On the surface, the basic concept behind SLO is pretty straightforward: at the beginning of the year a teacher defines the knowledge and skills students are expected to acquire during the year; the teacher provides appropriate instruction and monitors student progress throughout the year; at the end of the year the teacher determines the extent to which students have attained the desired knowledge and skills. To a psychometrician, that sounds like teaching. However, I have been told by multiple K-12 educators and teacher educators that this way of thinking represents a paradigm shift. Clearly, a great topic for additional discussion between teachers and psychometricians. Of course, with the implementation of SLO, the devil is in the details; or perhaps some might argue, how the devil is using the details to classify teacher effectiveness. Again, a topic for additional discussion between educators and psychometricians.

Two important points to remember about SLO:

A wide variety of programs that differ in critical aspects are being implemented under the label SLO.
An SLO is a process that includes assessment, but an SLO is not an assessment.

Fundamental Assessment, Measurement, and Data Literacy

For educators to use assessment well in each of the three contexts described above, as well as in the classroom on a regular basis, there is a fundamental level of assessment, measurement, and data literacy required. The first step to acquiring that literacy is to understand that those are three interrelated, but different, concepts.

Assessment literacy refers to the understanding of practices and procedures related to the development and use of assessments in the classroom.

Developing or selecting the appropriate test for a particular purpose
Determining whether an assessment is accessible and free from bias
Understanding the ways in which the format of a test item impacts the information that it provides

Measurement literacy refers to the understanding of some fundamental measurement principles, particularly those related to validity and the uncertainty of measurement.
Understanding that all test scores are imprecise (contain error), and the impact of that on setting targets for pre-post gain scores
Understanding what is gained and what is lost from allowing a student to retake a test
Awareness of the interrelationship between achievement levels, the distribution of student scores, and student performance on a test
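
The first point in the list above, the effect of score imprecision on pre-post gain targets, can be made concrete with a little arithmetic. The following is a minimal sketch under classical test theory assumptions, not a procedure taken from any particular testing program. The score scale, the standard deviation of 10, the reliability of 0.90, and the function names are all hypothetical values chosen for illustration, and the measurement errors on the two administrations are assumed to be independent.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """SEM under classical test theory: sd * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)


def gain_standard_error(sem_pre, sem_post):
    """Standard error of a (post - pre) gain, assuming independent errors."""
    return math.sqrt(sem_pre ** 2 + sem_post ** 2)


# Hypothetical values, for illustration only.
score_sd = 10.0        # standard deviation of scores on the test
reliability = 0.90     # reliability of both the pre-test and the post-test
observed_gain = 5.0    # one student's post-test score minus pre-test score

sem_each = standard_error_of_measurement(score_sd, reliability)
gain_se = gain_standard_error(sem_each, sem_each)

# Approximate 95% band around the observed gain.
lower = observed_gain - 1.96 * gain_se
upper = observed_gain + 1.96 * gain_se

print(f"SEM of each score: {sem_each:.2f}")
print(f"Standard error of the gain: {gain_se:.2f}")
print(f"Approximate 95% band for the gain: {lower:.2f} to {upper:.2f}")
```

Under these assumed values, the error in a gain score is larger than the error in either score alone, and the band around a 5-point observed gain includes zero. A growth target set without regard to that band may end up rewarding or penalizing measurement noise.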

Data literacy refers to the skills needed to organize and manipulate data so that it can be analyzed, interpreted, and used to support instruction.

Knowing how to combine data across multiple assessments
Working knowledge of the various tools used to organize, analyze, and present data
Having the skills to be a wise consumer and producer of data, and to know how to protect and to share data

Takeaways

This is hard, complicated, and messy.

In 2009, I attended a seminar titled Measuring 21st Century Skills: New Tools for a New Era. The keynote speaker for the session was Elena Silva, then a Senior Policy Analyst with Education Sector. During the course of her presentation, she happily reported that she had met with psychometricians who told her that if she could define the construct, they could measure it. What the psychometricians failed to tell her, however, was that in education it is virtually impossible to define the construct in such a way that we can actually measure it. The truth is that education is complicated and messy. There are too many factors and too many complex interactions and too much human involvement to stand a chance at true measurement.

Trust your intuition (sometimes), but verify.

In 2005, Phi Delta Kappan published an article by Braun and Mislevy warning of the dangers of policy based on what they referred to as Intuitive Test Theory. They made the case against assessment policy based on commonly held misconceptions or intuitions about assessment and measurement without input from assessment and measurement specialists. In that article, however, they also made the case for intuition in the use of assessment at the classroom level; and the need to be able to trust teachers’ and administrators’ intuition about assessment and assessment results is more critical today in this data-driven world. With the basic literacy described above, understanding of the imprecision of our measurements, and the role that we expect teachers and administrators to play in interpreting and using data, we have to be able to trust their intuitions. If something seems wrong with the data, there probably is something wrong with the data. For their part, teachers and administrators need the willingness and the tools to verify and support those intuitions with empirical data.

Tradeoffs are necessary, but never trade away the important stuff.

One of the first things we learn as budding psychometricians is that nothing is more important than validity and that validity is limited by reliability. Somewhere along the line that message is reshaped to “you cannot have validity without reliability” (yet another discussion for another day) and suddenly our focus shifts to reliability. Soon all sorts of decisions are made to preserve reliability at the expense of validity. That is our contribution to the Tradeoffs fiasco. After we are done, there are all of the non-measurement related tradeoffs caused by concerns such as cost, testing time, fairness, and acceptability. We end up with an assessment instrument that is ill-suited for its intended purpose, one where the results can never support the claims that are being made. Of course, we seldom go back and adjust those claims or statements of purpose to reflect the final instrument. It is critical, therefore, to understand what you may be sacrificing in the name of standardization or to save time and money.
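
For readers who want the statement that validity is limited by reliability in symbols, one common classical test theory formulation is sketched below. The notation is an assumption introduced here and does not come from the talk: X is the test score, Y is the criterion it is meant to predict, T_X and T_Y are their true scores, and the primed subscripts denote reliabilities.

```latex
\rho_{XY} \;=\; \rho_{T_X T_Y}\,\sqrt{\rho_{XX'}\,\rho_{YY'}} \;\le\; \sqrt{\rho_{XX'}}
```

With a reliability of 0.81, for example, the correlation with any criterion can be at most 0.90. The inequality is only a ceiling. Raising reliability raises the ceiling but guarantees nothing about the validity actually achieved, which is why trading away validity to protect reliability is a poor bargain.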

It is much easier for teachers to understand psychometrics than for psychometricians to understand teaching.

With some basic training and practice, teachers (and administrators and even policy makers) can acquire the knowledge and skills that they need to make sound assessment choices, interpret and use assessment results effectively, and even construct assessments of sufficient quality for their uses. Most important, with some basic training and practice, teachers will know what not to do with assessments and assessment results. On the other hand, we are far, far away from a point where psychometricians can understand and model teaching, learning, and student achievement with anywhere near the accuracy and precision of meteorologists predicting the weather. What psychometricians can do, however, is work better to understand the type of questions that educators are asking of assessments and assessment results; and then use that information to produce better reports, design better assessments, and convey better information about the limitations of assessment.

This is only a test!

Assessment is a powerful tool, and it can be a dangerous tool if used improperly or recklessly. The bottom line, however, is that assessment is only a tool. A deep understanding of content and pedagogy is necessary to make the tool useful in the classroom. Beyond the classroom, we need policy makers who have a deep enough understanding of assessment and measurement to know when to seek expert support before establishing assessment-based policy and laws that have a negative impact on the classroom and ruin the fun for all of us. An ongoing dialogue among psychometricians, educators, and policy makers is critical. At this point in time, psychometricians cannot do nothing and do no harm.


Published by Charlie DePascale
