assessment, accountability, and other important stuff

Charlie DePascale

 

 

There are certain things that you hear about for the first time and with all your heart you want them to be real. Although your brain tells you to be skeptical, you desperately want those things to exist and to have the magical powers that people ascribe to them.  For me, interval scales, unicorns, and non-stick pans are three such wondrous things.  It would be great to see a unicorn, or the even rarer pegacorn, wander through my yard every now and then along with the deer, turkeys, and occasional fox.  Who doesn’t tear up as the Ark pulls away without them in The Unicorn Song? And non-stick pans…  Every time that I see the commercial for the amazing pan made with ceramic and titanium, I want to believe that the fried eggs, melted cheese, and S’mores will slide right out of the pan.  And if the pan does work as advertised, I want to believe that there will not be research in a few years that tells me about the dangers of ceramic and titanium pans.

Alas, so far I have seen no unicorns and have not sent in the $19.99 (plus shipping and handling) for the pan.  However, as much as I would like to believe that a unicorn will walk by my office window or that I will never again lose sleep over scratching a new non-stick pan, my expectations for unicorns and non-stick pans are low.  That is not the case for interval scales.  Time after time, I have put my faith in interval scales and time after time they have let me down.

My latest encounter with the deceptive nature of interval scales occurred last week when I attended a concert at Symphony Hall in Boston. I met my wife and daughter in Cambridge at 6:35 p.m. for the 5.5-mile drive to Symphony Hall and the 8:00 p.m. Pops concert. At 8:05 p.m., 90 minutes after starting out, I arrived at the parking garage across the street from Symphony Hall. (Fortunately, my wife and daughter got out of the car a half-mile from Symphony Hall and were settled in their seats 10 minutes before the conductor took the stage.) Of course, because the measurement gods can never pass up an opportunity to drive home a point, the 65-mile drive home to Maine took only 84 minutes. Interval scales had let me down again. It took less time to traverse the 65 miles home after the concert than it took to drive the 5.5 miles to the concert.

To recap:

5.5 miles – 90 minutes

65 miles – 84 minutes
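For anyone who wants the arithmetic behind that recap, here is a minimal Python sketch. The helper name `average_speed_mph` is made up for illustration; the distances and times are the ones listed above.

```python
def average_speed_mph(miles: float, minutes: float) -> float:
    """Average speed in miles per hour for a trip of the given length and duration."""
    return miles / (minutes / 60.0)

# The two trips from the recap above.
trip_in = average_speed_mph(5.5, 90)    # Cambridge to Symphony Hall
trip_home = average_speed_mph(65, 84)   # Boston back home to Maine

print(f"To the concert: {trip_in:.1f} mph")
print(f"Back home:      {trip_home:.1f} mph")
print(f"Home trip was {trip_home / trip_in:.1f} times as fast")
```

That works out to roughly 3.7 mph on the way in and 46.4 mph on the way home, so the same "mile" cost more than twelve times as much time in one context as in the other.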

In reality, of course, I had no expectation that the time required for the 5.5-mile trip from Cambridge to Boston at 6:35 p.m. would be directly proportional to the time required for the trip home. I did not even think that the time required would be directly calculable from the known distance and posted speed limits. Having grown up in Boston, I know that the question “How far is it from A to B?” is always answered in units of time and not distance. Yes, distance is measured with equal-interval scales (ratio scales, actually) such as miles, kilometers, inches, or centimeters, but those scales and their intervals are largely irrelevant to driving in or around Boston. They tell us nothing of interest. Knowing that it was 5.5 miles from Cambridge to Boston told me nothing about the time it would take to make the trip. It also told me nothing about the amount of gasoline that would be consumed during the trip, the amount of oil that would be burned off in 90 minutes of stop-and-go traffic, or whether we should have stopped in the restroom in Cambridge before starting out.

In most cases, distance will not be quite as irrelevant in planning a trip as it was Thursday evening. The amount of time and other resources needed to cover a particular mile, or 1,000 miles, however, is always context-dependent. Are you traveling on a highway or on city streets? Are you driving through the flat farmland of Indiana or winding through the mountains in Colorado? If you are in Indiana, is there a 40 mph headwind absolutely destroying your gas mileage or a thunderstorm making it impossible to see more than a couple of car lengths ahead of you? It all seemed so simple back in elementary school when one of the first formulas we learned was D = RT. We should have been suspicious when D = RT returned to play such a prominent role in our first calculus class.

If well-established, physical, “real” interval scales that we can see and touch can be made irrelevant by context, what hope do we have for scales in educational measurement? One of the first things we learn in educational measurement is that in 1946, Stevens defined a typology of levels of measurement or scales: nominal, ordinal, interval, and ratio. In many ways, that is the measurement equivalent of “In 1492, Columbus sailed the ocean blue.” We quickly learn that most of the data that we deal with is either nominal (gender, race/ethnicity) or ordinal (grade level, ratings or letter grades) and that, in practice, none of it will be ratio (because we want to believe that there is no such thing as having absolutely no knowledge or proficiency). The best that we can hope for is to find, or produce, interval-level scales. Ironically, the data that we deal with most often in educational measurement, counts of items answered correctly, may in fact be considered ratio level, but the count is rarely the variable of interest. Yes, we know that if Peggy answers 12 questions correctly on a reading test and Derek answers 6 questions correctly, then Peggy has answered twice as many questions correctly as Derek. However, the simple counts usually do not provide enough information to draw conclusions about Peggy’s or Derek’s proficiency in reading – the variable of interest – or conclusions about differences between their levels of reading proficiency. So, we apply psychometric techniques to use the relationships between those counts of items answered correctly to define a measure of reading proficiency on an interval scale.
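A small sketch may make the ratio-versus-interval distinction concrete. The counts (12 and 6) come from the example above; the scale scores of 520 and 480 and the `rescale` function are hypothetical, invented purely for illustration, and do not come from any real assessment.

```python
# Counts of items answered correctly (from the example above): ratio level.
peggy_correct, derek_correct = 12, 6
count_ratio = peggy_correct / derek_correct  # "twice as many" is meaningful

# Hypothetical interval-scale scores (assumed values, for illustration only).
peggy_score, derek_score = 520, 480

def rescale(score):
    """A hypothetical relabeling of the same interval scale (new origin and unit)."""
    return (score - 400) / 10

# Ratios of scores depend on the arbitrary origin, so they are not meaningful.
ratio_before = peggy_score / derek_score
ratio_after = rescale(peggy_score) / rescale(derek_score)

# The gap survives the relabeling, shrunk only by the change of unit.
gap_before = peggy_score - derek_score
gap_after = rescale(peggy_score) - rescale(derek_score)

print("Ratio of counts:", count_ratio)
print("Ratio of scores:", round(ratio_before, 2), "vs", ratio_after)
print("Gap in scores:  ", gap_before, "vs", gap_after)
```

Under these assumed numbers, the ratio of scores moves from about 1.08 to 1.5 when nothing but the labeling of the scale has changed, while the gap between Peggy and Derek shrinks by the same factor of ten as every other interval on the scale.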

The resulting interval scale can provide us with information about the level of Peggy’s and Derek’s reading proficiency – as defined by the set of items we have administered. From the scale, we can also determine the distance between Peggy and Derek on the reading proficiency scale. However, with regard to knowing how much time, effort, and other resources it will take to improve Derek’s reading proficiency to Peggy’s level, are we any better off than I was with the knowledge that it was 5.5 miles from Cambridge to Boston? No. Without an understanding of context, it is pretty much impossible to make a sound judgment about what is needed to close the reading proficiency gap between Derek and Peggy. Closing that gap could be similar to driving the 5.5 miles to Boston or it could be more like the 65-mile drive back home to Maine. We just don’t know. Establishing an interval-level scale has done nothing to put us in a better position to know what it will take to close the gap in reading proficiency between Peggy and Derek. As was the case with miles, the equal-interval reading proficiency scale, considered by itself, is largely irrelevant to our questions of interest:

  • How much instructional time and resources are needed to improve Derek’s reading proficiency?
  • How effective has Derek’s teacher, school, or district been in improving his reading proficiency?

To answer those questions, we need to consider context.  We understand the need for context when it comes to driving, but often seem blind to context when developing accountability systems and expectations for student improvement.

I could end this post at this point, but it is probably important to say just a little bit more about the interval scales we develop in educational measurement. Familiar physical scales for distance have established standards. That is not the case for most of the constructs and scales we use in educational measurement. For example, there is no universally accepted definition of reading proficiency in general, or even of a more limited construct such as third-grade reading proficiency. Rather, reading proficiency is defined by a set of content standards and the assessment designed to measure student achievement of those standards. As President Bush stated in 2003, “Well, if a child can pass the reading test, the child has learned to read, as far as I’m concerned.”

In this regard, our constructs and scales are more like unicorns than non-stick pans.  Non-stick pans are real, but can vary greatly in quality and other characteristics.  Unicorns and our scales, however, are not tangible.  Their existence is dependent upon our belief in them.  And our belief in them is dependent upon their utility.  In other words, we must understand context not only to interpret intervals and differences on our scales, but to understand the constructs themselves.  Ultimately, the uncertainty in our constructs and scales may be a good thing.  Understanding the importance of context may help shift the focus from arbitrary test scores to rich descriptions of what students know and are able to do under certain conditions; and then to a focus on what is needed to change student performance and/or those conditions.  With all my heart, I really want to believe that.
