assessment, accountability, and other important stuff

Archive for September, 2015

Faith and Validity


Pondering validity on the occasion of Pope Francis’ visit to the United States

From Kane back to Ebel, there are religious overtones, sometimes thinly veiled, to discussions of validity as the alpha and the omega of educational measurement.  Without validity, there is no measurement.  Validity was in the beginning, is now, and ever shall be.  Amen.

As a lifelong practicing Catholic, I find that this spiritual framing of validity has a familiar and comfortable feel to it.  Begin with the idea that although there is but one true validity, it comprises multiple (often three) validities, none of which is greater than construct validity.  Consider the mystery that, unlike virtually everything else in measurement, validity itself cannot be measured.  There are statistics that support a validity argument, but there is no overall validity statistic, per se.  Validity is all around us, and our effort to find and compile evidence of validity is never ending.  Most of the important things that our field is trying to measure cannot be seen or touched, but their existence can be demonstrated through words and deeds. Those are not foreign concepts to me.

The manner in which threats to validity are regarded also fits well within my upbringing as an American Catholic – everything in moderation and everything in context.  Test preparation and teaching to the test in moderation may be fine, but taken to the extreme are likely threats to validity.  Collaboration with peers, support from teachers, or the use of tools such as calculators or dictionaries may be fine under some conditions, but not others.  Assessment developers strive to produce high quality items and tests, but the very use of the same items or tests repeatedly will make them useless.  In short, there are few absolutes.

Above all that, there is a strange comfort in knowing that the meaning of validity is a mystery even to the most learned scholars in the field. In the 1961 article, Must All Tests be Valid?, Ebel quotes Cronbach, Gulliksen, and other test specialists of the day on validity and concludes, “[i]t would be difficult to state in words a core of meaning common to all the various definitions of test validity…”  Subsequent treatises by Messick (1989) and Kane (2006) undeniably have furthered the discussion of validity, but arguably have done little to establish a core meaning of validity and validation.  Newton and Shaw (2015) describe the current state of arriving at a core meaning of validity “as a standoff between scholars (and their followers) who advocate radically different usages.”


Within the Catholic Church, there are, of course, issues much more highly contentious and important than the role of consequences in validity. On a day-to-day basis, however, we do not let such issues paralyze us.  We seek our best answer to the question What Would Jesus Do?, and we act upon it.  Later, we reflect on our actions, we ask forgiveness when wrong, and we always try to find a better answer and do better the next time.   All of us engaged in assessment today must adopt a similar approach to validity and validation.  We must move forward and vow to design our testing programs with care, implement them with fidelity, report results accurately, make claims cautiously, gather evidence of their effectiveness, and do better the next time.

The final section of the 12-page introduction to the Validity chapter of the 2014 Joint Standards includes the following statement:

[A] test interpretation for a given use rests on evidence for a set of propositions making up the validity argument, and at some point validation evidence allows for a summary judgment of the intended interpretation that is well supported and defensible.  At some point the effort to provide sufficient validity evidence to support a given test interpretation for a specific use does end (at least provisionally, pending the emergence of a strong basis for questioning that judgment).

If assessment specialists, educators, and policy makers move forward with a focus on presenting, challenging, and refining our set of propositions, determining what evidence is necessary (not simply what evidence is readily available) to evaluate those propositions, gathering that evidence, and making a summary judgment that is well supported and defensible, I believe that an epiphany will occur.  In a moment of divine inspiration and clarity, the veil will be lifted.  We will move from absolute darkness to glimmering light, and thence to the bright and clear vision that issues directly related to the test are often nothing more than a mote of dust in the validation and evaluation of our test-based programs and policies.

  • The validation and acceptance of test-based graduation policies rests little on the question of whether the test is providing an accurate measure of students’ proficiency in mathematics or English language arts. It is much more often concerned with the fairness and appropriateness of holding all students to a minimum level of proficiency as a prerequisite for earning a diploma.


  • The use of student test scores in teacher evaluation systems is never a question of whether a mathematics or English language arts test is a measure of teacher or teaching effectiveness. It is not.  The test, we claim, is a measure of student achievement in mathematics or English language arts; and we can gather evidence to validate that claim.  With regard to teacher evaluation, however, it is our set of propositions about the way in which student achievement and teacher effectiveness are interrelated that must be validated to make a well supported and defensible summary judgment about the effectiveness of an individual teacher.


  • Test scores in mathematics and English language arts do not provide a sufficient basis on which to evaluate the overall quality of a school. However, since its inception in 1965, a primary purpose of Title I has been to close achievement gaps in reading, writing, and mathematics between students living in low-income households (particularly in high concentrations in urban and rural areas) and those who are not.  Therefore, student achievement in mathematics and English language arts is certainly an important outcome to consider in evaluating the effectiveness of Title I programs or the use of Title I funds.

The Consequences of Misunderstanding Validity

In closing, we return to the Joint Standards and the final section of the Introduction to the chapter on validity.  In contrast to the statement discussed above, the final paragraph of the Introduction begins with the statement:

Ultimately, the validity of an intended interpretation of test scores relies on all the available evidence relevant to the technical quality of a testing system.

As practitioners in the field, we can continue gnashing our teeth and wailing over what does and does not merit consideration as evidence relevant to the technical quality of a testing system.  In doing so, however, we cannot let others lose sight of the fact that the technical quality of a testing system and the validation of a particular interpretation of a test are usually just one of the propositions on which their program or policy has been built.  If they do not validate and evaluate the rest of their propositions, they are like the foolish man who built his house on sand.

He was my teacher, and he was effective


Labor Day is one of those times each year when memories of my father come flooding back.  Dad was a high school teacher for forty years from the late 1950s until the late 1990s.   Labor Day, signaling the end of summer and the beginning of each new school year, was a major event for our entire family that I recall fondly.  These last few years, however, my thoughts have turned to how he would have fared in this era of high-stakes testing, school accountability, and teacher effectiveness.

I have no doubts about the impact that he had on his students.  Growing up, when we went to a shopping mall, restaurant, or ball game, it was the exception when we did not run into one of his former students with thanks and a story to share.  He mentored students long after they graduated from high school, and when I got my first teaching job in 1981, it was one of my father’s first students who became my colleague and mentor.  And when Dad passed away in 2009 (fittingly at the end of June), the bureau drawer packed with notes on the back of yearbook photos, cards, poems, drawings, and lengthy letters from students, parents, and former students made it clear that he had a tremendous impact on many, many lives.

But, how would that translate into a value-added score, median growth percentile, and an overall effectiveness rating?

As a starting point, in general, he did not teach the top mathematics students.  Among the titles of the courses he was assigned over the years were Algebra 1C and variations on the theme mathematics for everyday life.  As one of his students from the early 1970s recalled 30 years later in a 2008 blog post:

I had Mr. Dee for a Senior Math class called, “Trig and Topics in Algebra”.  This class was mainly for college bound kids who were good in humanities, but not so hot in Math.  Kids who were good in Math took Calculus in Senior year… I am terrible at math.  I do remember learning Sign, Co Sign, and Tangent (or is is [sic] “sine”??) in Mr. Dee’s class, but I couldn’t tell you any more than that.  I guess that stuff is used in engineering, but I’ve never used it since I left high school.

The blogger recalls and describes many vivid details from the class, but of course, none of them involve the teaching or learning of mathematics.  This led him in 2008 to this observation:

Even though I hate math, and even though in 1972 I thought Mr. Dee was very cool, TODAY as a 53-year-old it bothers me that he really didn’t do his job.  Maybe he should have been a teen counselor or something.  Maybe he SHOULD have had an “issues” show on radio or television.  He was a cool guy but I learned almost nothing in his class.

In response to that blog post, another former student from the same era comments:

Mr. D was a very kind teacher who loved to tell a joke…and teach about life…when he did teach math he did teach math…he was always there for the student….he always listened….always laughed and always manage [sic] to teach that there was a lesson in life besides math…,friendship….

All of which leads to the question that I am asking today: what was his job as a teacher, and did he do it effectively?  I know that he could teach children mathematics and have seen firsthand evidence that he did so successfully. I know that his teaching changed as the world, education, and the requirements of the job changed dramatically over the years.  His first teaching assignment, in the late 1950s, was at a private boys’ school with some classes of 60+ students enrolled in what we would now call a career-technical program (a full seven years before he earned his bachelor’s degree in 1965).  His first public school position, and the classes referenced above, came in the late 1960s and early 1970s.  In the 1980s, he was upset each week when students returned from the resource room with perfect scores on tests he had prepared but were unable to answer a single question in class.  And in June 1998, with the advent of test-based accountability, he finally left that very same classroom he had entered 30 years before.

However, I also know that across all of those years, mathematics, basketball, and driving were simply the vehicles through which he taught children.


When I think back to my own high school days, it is not the content that I remember, not even the content from advanced placement classes.  I remember nothing about the Latin grammar and structure of the Aeneid, but I will never forget Mr. Jameson dramatically explaining that Hell hath no fury like a woman scorned; and the many times he stood at the front of the class, putter in hand, simulating a smooth stroke, and telling us that Form Follows Function.  Although I was a high school mathematics teacher and am known by some as a psychometrician, I remember next to nothing of the calculus I learned in high school and could not learn in college (Do psychometricians need to understand calculus?).  However, I do remember Mr. Durante’s warnings to us throughout junior and senior year in high school to be prepared for the sharks out there that we would encounter throughout our lives.  Even from graduate school, the single lesson from my master’s program that has had the most lasting impact on me and my career was the class in which my advisor set aside the syllabus and the readings for the week to spend the entire evening providing us with a detailed outline and examples on how to write a research paper.

In addition to the life lessons and life skills discussed above, there are also relevant cognitive skills not necessarily reflected in students’ test scores that are a central focus of effective teaching.  In a 2012 op-ed piece in the Hartford Courant, my colleague Steve Stemler, an associate professor of psychology at Wesleyan University, begins by asking readers to Think about the best teacher you ever had.  What is it that made him or her great?  In his response he makes the following observation:

Sure, students should be mastering content. Nobody disputes that. But aside from a few basics, most content knowledge in a field of study changes over time as thinking evolves and research emerges. What students really need to develop is resourcefulness, creativity, a passion for learning and the skills for learning how to learn, among other things.

As the 2015-2016 school year starts, we are regrouping from the most recent attempts to reform teacher evaluation.  The last time down this path we added student outcomes (i.e., test scores) to the traditional review of inputs (i.e., classroom observations).  Accepting that student outcomes should be part of the teacher evaluation equation, let’s begin this time by trying to identify, account for, and balance all of the critical student outcomes that define effective teaching.

It won’t be easy to find the proper balance between fleeting content knowledge and skills that can be measured on an end-of-course test and those enduring lessons that may be so much more important in the long run.  It won’t be easy to come up with a single, simple metric on which to rank order teachers and set effectiveness level cut scores.

I am confident, however, that we will end up in a good place as long as we start the conversation on effective teaching and teacher effectiveness with questions such as:

Think about the best teacher you ever had.  What is it that made him or her great?

[photo and drawing are from the collection of memories he left behind. sources unknown]