assessment, accountability, and other important stuff

How Arne Works

Charlie DePascale

During my August trip to Minnesota I was able to check two books off of my summer reading list: Relativity – The Special and the General Theory by Albert Einstein and How Schools Work by Arne Duncan.  As the old joke goes, one was a book that asked me to rethink basic concepts and ideas long-held as fundamental truths, and the other was a book by Einstein.

I will attempt to reconcile Relativity and large-scale assessment in a later post.  Today’s post is devoted to my five takeaways from Arne Duncan and How Schools Work.

1. Lies and Incentives

“Education runs on lies.”  This is the first sentence of the first chapter titled Lies, Lies Everywhere.

The in-your-face focus on lies no longer has the same shock value that it did when most of us were introduced to Arne in 2009; no, not after eight years of life in the honesty gap that rolled into the current era of fake news and alternative facts.

What was surprising, however, was how freely he uses the word lie. In some circles, the word lie implies more than a simple departure from the “truth” or reality. To say that a person has lied or is a liar suggests an intent to deceive or mislead. Arne, however, uses the word lie to describe a broad array of statements and actions that one might refer to as myths, misconceptions, misinterpretations, untested beliefs, or defense mechanisms. In one example involving a Chicago principal, Arne begins the section stating, “One such principal told this lie directly to Mrs. Daley and me, and I’ll never forget it.” He ends the same story about the same principal stating, “I loved Chester’s honesty throughout – first when he challenged Mrs. Daley and then when he told me he’d been mistaken about his kids.”

In the end, perhaps actions based on lies or misperceptions cause the same problems and have the same negative impact on children. If your role is to solve those problems, however, understanding whether you are dealing with a lie or a misperception should influence your approach to a solution. And if you are counting on current teachers, administrators, and policy makers to be part of the solution, starting off by calling them liars might not be the best approach.

Incentive is another special word in the Arne lexicon.  Arne rightfully notes the importance for school improvement efforts to include incentives as well as the sticks associated with NCLB.  One example he offers of an incentive, however, is firing Chicago teachers caught cheating on a standardized test. I believe his argument is that the district ensuring that bad behavior is not rewarded is an incentive for the good behavior of all of the other teachers.  A second incentive he discusses is related to the teacher evaluation requirements associated with Race to the Top and the administration’s NCLB waivers.  I don’t know many teachers who viewed state-designed educator evaluation systems as an incentive.

You can only show me a stick and tell me it’s a carrot for so long before I figure out that’s a lie.

2. Story Driven

After reading How Schools Work, it is clear to me that Arne is story-driven.  By story-driven, I am not referring to the many stories that drive the narrative in How Schools Work.  Rather, I am referring to the concept of story-driven described by Bernadette Jiwa in her 2018 book Story Driven – you don’t need to compete when you know who you are.  Story-driven individuals and the organizations they lead have a “clear sense of purpose and identity” that defines and drives them.

Jiwa’s story-driven framework is defined by five words – Backstory, Values, Purpose, Vision, and Strategy. The backstory is our journey to now, which creates our values (guiding beliefs) and purpose (reason to exist). In a story-driven organization, those are the forces that drive the organization’s vision (aspiration for the future) and strategy (align opportunities, plans, and behavior).

Arne’s backstory, the one that defines his identity, values, and purpose, is his experience growing up in Chicago with his mother’s inner-city after-school program. As he describes it, the Chicago that he saw with her program was just two miles but a world away from the section of Chicago where he lived. That’s not a bad backstory for a U.S. Secretary of Education.

3. No place for states

Virtually my entire career has been spent closely connected to state departments of education, as an assessment contractor, an employee, and for the last 16 years as a consultant. It appears, however, that state departments play, at best, a minor supporting role in Arne’s world.  At worst, they are another one of the liars, a barrier to improving schools.

There are three direct references to state departments of education that stand out in the book. The first is a reference to low achievement standards set on the Illinois state assessment, offered as a direct instance of the lies told to students and parents in Chicago and as a general example of the so-called race to the bottom by states across the country as they prepared for NCLB accountability requirements. In the second reference, a DOE official in New Jersey is simply a pawn in a story detailing how the arrogance/incompetence of the Christie administration led to the state not being awarded millions of dollars of Race to the Top funding. The third was a reference to speaking with Connecticut’s “chief education officer” on the day of the Sandy Hook shooting in the emotional and powerful chapter on guns in schools and society.

I guess this should not be a surprise.  Arne made his mark at the district level and it is clear that his vocation is in schools.  He does acknowledge the role that strong (and weak) governors can play in improving education, but like many in education does not seem to have a handle on the role that a state department of education can and should play.

Can the department be more than simply an agent implementing the policies of the federal government, governor or state chief? Can a state department of education be a change agent on its own? It behooves those of us who have centered our careers at the state level to be proactive in answering that question.

4. Time Travel

Reading How Schools Work, I felt that I had traveled back in time. It is the same feeling that I get when I read remarks from former President Obama, and I am sure I would feel the same if I spent the $500 for an Intimate Conversation with Michelle Obama. It is the sense of hope and change that had me sitting in a storefront office in Portsmouth, New Hampshire in the summer and fall of 2007 updating databases and making phone calls for an upstart candidate for president.

Then I remember that it is 2018. This group had their eight years in office. Yes, they made some improvements, but they fell far short of achieving their vision. I understand the obstacles in their way. What I have not yet determined for myself is how hard they tried to overcome those obstacles. And then the frightening thought: if they did do their absolute best, what will it take and how long will it take to truly make a difference?

5. The Public School Model

It may be confirmation bias, but after reading How Schools Work I am convinced now more than ever that our public school model is not only broken but is outdated and is not something that we should try to repair.

To be clear, the ideal and concept of public education (i.e., the right to access for all to a high quality education) is as important as it ever was, arguably more important.

Also, there are fine schools and educators in suburbs, rural towns, and cities across the country where children are receiving a world-class education.

Our general model of K-12 public education, however, is broken at its core.  The funding model is not sustainable. We are well beyond the point where it is possible to fit the student-centered policies of the last 50 years into an educator-centered system.  We have burst through the age-based boundaries of the K-12 system at both ends and we long ago passed the point where the internal markers of grade levels have any meaning.

Everything in Arne’s book, from his mother’s after-school program to the foundation(s) he founded to his experiences in Chicago and USED to his plans for the future, tells us that we need a new model for public education.

Arne and many of the rest of us have spent our lives trying to improve education from within the current system.  Arne’s mother worked outside of the system – although not necessarily by choice. I think that it is time to abandon a K-12 system clinging to a past that no longer exists for a new system that reflects the present and anticipates the future.

My Miss Brooks

Charlie DePascale

Our Miss Brooks was a highly successful comedy series on radio and early television that followed the life and career of a fictional high school English teacher, Connie Brooks. My Miss Brooks, Ann Brooks, was a highly successful teacher of the fifth and sixth grade Advanced Work Class at the Mather School in Dorchester, Massachusetts when I entered her class in the fall of 1969.

The Advanced Work Class (AWC) was, and still is, a program within the Boston Public Schools “that provides an accelerated academic curriculum for highly motivated and academically capable students. Coursework is challenging, and performance standards are high.”  According to BPS and borne out by data, a major benefit of the program is “[s]tudents who successfully complete AWC are well prepared to compete for admission to the three BPS exam schools or to other accelerated programs.”

In my 1969 instantiation of the AWC, 20 students from elementary schools throughout Dorchester (the largest “neighborhood” in Boston) were selected to spend 5th and 6th grade in Miss Brooks’ class at the Mather School.  There may have been some testing involved in the selection process, perhaps including IQ testing, but I was unaware of that.

The class included 10 girls and 10 boys and we were diverse by Boston/Dorchester standards of the time; that is, there were students from Irish and Italian backgrounds (along with a few other ethnic groups) and the class was 90% white.  We were from a mix of blue- and white-collar middle class families. Almost all of the original 20 students completed the two years, but there were a couple of replacements along the way.

From the beginning of the fifth grade, the openly acknowledged goal was that at the end of the two-year program all of us would pass the entrance exam to one of the city’s Latin schools: Boston Latin School (aka Boys Latin) for the boys and Girls Latin for the girls.  (The two single-sex grade 7-12 Latin schools became coed as we were entering the eighth grade and remain separate coeducational schools today.)

Although passing the standardized, multiple-choice test administered in the spring of sixth grade was the goal, as I think back there is nothing that I recall from those two years that now would be considered test prep. I am certain that I am forgetting some things through the fog of 50 years.  Surely, we must have had some basic English and mathematics lessons.  There were quizzes, tests, grades, and lots of homework. Those things, however, were not what defined the class, and they are not what I remember from this pivotal time in my K-12 school career.

It was clear that this was going to be a different experience the moment we walked through the door of Room 8 at the Mather School. For the first time since kindergarten, this was not a classroom with rows of wooden desks bolted to the floor.  This room contained shiny modern desks that were arranged around the room in four u-shaped clusters of five, but could be easily rearranged or cleared away, when necessary – and there were plenty of times when it was necessary.  And then there was the first class activity.

A boy and girl were selected to stand at the front of the class, introduce themselves to each other and have a conversation.  Looking back, we could have gone with where did you go to school last year or what did you do this summer; and sure, we were just weeks removed from minor events like the first moon landing, Woodstock, and Chappaquiddick.  But, standing at the front of that class we had nothing but fidgeting and uncomfortable silence. I tried unsuccessfully for two years to talk to that little red-haired girl…  No wait, I was the one with red hair and that’s a different Charlie’s story.

Anyway, those first awkward conversations were just the beginning of two years of constant interacting, collaborating, performing, and celebrating with each other. The biggest event was the annual Christmas play our class performed; rehearsals throughout the fall culminating in two school-wide performances – for grades 1-3 and 4-6. These were full performances with hand-painted, wood-frame sets, costumes, and props.  Our 5th grade performance of Charles Dickens’ A Christmas Carol was followed in the 6th grade with the heart-wrenching The Birds’ Christmas Carol by Kate Douglas Wiggin. (It was at the class Christmas party following the 6th grade performance that I learned that Jeremiah was a bullfrog.)

In addition to the Christmas plays, other examples of special activities included:

  • Our class newspaper complete with school and local news, sports, entertainment, and comic sections. Mimeographed copies were widely distributed.
  • The Greek festival at the end of our unit on Ancient Greece where we made presentations, displayed the results of our efforts working with wet clay, and most of us had our first taste of feta cheese and baklava.
  • Our performance of Raindrops Keep Fallin’ On My Head, in costume, and in French at the annual schoolwide Mother and Daughter night.
  • Keeping with the French theme, the end-of-the-year French festival where we tried our hand at making various French dishes and produced a mimeographed collection of recipes. The French custard recipe became a Father’s Day tradition at our house.

All of those activities supplemented the discussions, collaborative projects, and presentations that were a regular part of our daily routine. And we constantly rearranged those desks into various small groups where we pushed, challenged, and supported each other.

In the spring of sixth grade we all passed the standardized entrance exam and were admitted into our respective Latin schools. And six (or seven) years later, most of us graduated from either Boston Latin School or the newly named Boston Latin Academy. We were prepared for the school and not simply for the test.

As I look back on it now, preparing us to succeed at the school was much more important than preparing us for the test because, in reality, the entrance exam was not a high-stakes or high-risk test for us. Yes, the Latin schools were selective and admission was competitive. In the early 1970s, I estimate that there were about 10,000 sixth graders in the Boston Public Schools. If equally divided between boys and girls, that would be 5,000 boys competing for the approximately 500 seventh grade seats at Boston Latin.

There was probably little doubt, however, that our carefully selected group of 10 boys would perform in the top 10% of BPS students on the entrance exam. What we didn’t understand at the time was that the hard part was staying in the school and graduating.  Although approximately 500 students entered in the seventh grade in 1971 and an additional batch of students entered our class in the ninth grade, our graduating class in 1977 had just over 200 students. During 7th grade orientation we received the Latin school version of look at the boy on your left and the boy on your right, two of you won’t be here by 12th grade.

In 1971, the entrance exam was a broad net, collecting three times as many students as would ultimately graduate.  Additional filtering was done at the school.   There was a large tolerance for selection error on the test.

The admissions math changed, of course, the very next year when the school became coed, potentially doubling the pool of applicants.  It changed again as enrollment in the Boston Public Schools dwindled and a much greater portion of the Latin School class came from private elementary schools. And at some point in the intervening years, the admissions philosophy changed.  The goal was to do what was necessary to ensure that all admitted students had the opportunity to make it to graduation.  Last year, Boston Latin had 417 seventh grade students and 412 twelfth grade students.
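For anyone who wants to check the admissions math, here is a minimal sketch in Python using only the figures quoted in this post. The function and variable names are hypothetical, and two assumptions are baked in: the 1977 graduating class is rounded to 200 (it was "just over 200"), and the ninth grade entrants are left out because I do not have that number, so the 1971 ratio is a lower bound. The recent figures compare two different cohorts in the same year, so that ratio is only a rough proxy.

```python
def selection_rate(seats: int, applicants: int) -> float:
    """Share of the applicant pool that can be offered a seat."""
    return seats / applicants


def entrants_per_graduate(entrants: int, graduates: int) -> float:
    """How many students entered for each one who graduated."""
    return entrants / graduates


# Author's estimates for the early 1970s.
sixth_graders = 10_000
boys = sixth_graders // 2          # assumes an even split of boys and girls
seventh_grade_seats = 500

rate_1971 = selection_rate(seventh_grade_seats, boys)

# Lower bound: excludes the additional students who entered in ninth grade,
# and rounds "just over 200" graduates down to 200.
ratio_1977 = entrants_per_graduate(500, 200)

# Cross-sectional proxy: last year's seventh graders vs. last year's seniors.
ratio_recent = entrants_per_graduate(417, 412)

print(f"Selection rate among boys, early 1970s: {rate_1971:.0%}")
print(f"Entrants per graduate, class of 1977 (lower bound): {ratio_1977:.2f}")
print(f"Seventh graders per twelfth grader, last year: {ratio_recent:.2f}")
```

The gap between the last two numbers is the point: when nearly everyone admitted is expected to graduate, almost all of the filtering has to happen at the entrance exam.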

All of the changes described above raise the stakes associated with the entrance exam.  I wonder what the impact has been on the Advanced Work Class.

Recipe and Play Script

Ten Years of Taylor

Charlie DePascale

Ten years ago, August 22, 2008, my daughter and I attended our first Taylor Swift concert in Hartford, Connecticut. The original plan was to attend the concert a few weeks later in Massachusetts – a bit closer to our home in Maine.  About to start high school, however, she was a bit worried (i.e., panicked): what if I have too much homework …  So, we made the late August trip to Hartford.  (The September concert was our second concert.)

This summer, my wife and I visited our daughter in Maryland. On a very hot and humid night at FedEx Field, the three of us attended my 30th Taylor Swift concert.

There are so many memories across 10 years and 30 shows – from that 8-song set opening for Rascal Flatts to the 2-hour reputation Stadium Tour show.

With my daughter –

  • Finding a public library with wifi close to my meeting in Rhode Island, waiting for Fearless tickets to go on sale and then sending her the two-word e-mail, We’re In!
    • Later that summer driving home from that concert in Connecticut through a tropical storm.
  • Eating lunch in the Wesleyan dining hall before driving home for Christmas and deciding, sure we can make the drive to Philadelphia for a concert during spring break; then buying tickets when they went on sale at noon.
  • Foxboro in the rain (more on that later)
  • Stopping in DC for a concert on the way to college visits in North Carolina and Virginia
  • Driving to Tanglewood on the rumor that Taylor would perform at the James Taylor concert. (she did)
  • Capping off a family trip to Colorado, Wyoming, and Mt. Rushmore with a concert in Denver
  • Walking around Georgetown last November taking pictures with ‘reputation’ UPS trucks.

And on my own –

  • Trips driving around North Carolina for back-to-back shows in the Raleigh – Greensboro – Charlotte triangle (extended to Charlottesville/UVA and back-to-back-to-back shows in 2013)
  • The 20-hour visit to NYC via Amtrak during Thanksgiving week for a concert at Madison Square Garden – with surprise guest James Taylor
  • Combining concerts in the Twin Cities with visits to old friends from the University of Minnesota

There are two memories, however, that will always stand out above the rest.

Taylor in Maine – aka Hello, Boats!  (August 27, 2010)

Hello, Boats!

Taylor was in Maine for the premiere of the video for Mine, to be broadcast that evening on CMT. The radio was full of Taylor Sightings and rumors about the location of the secret broadcast. I was spending one of the last Fridays of the summer with the family.  At the end of lunch, I decided to turn on my laptop to check my e-mail – no smartphone at that time.

There in my Inbox was an e-mail from Taylor Nation with the subject line “You’re Invited: Taylor’s CMT Taping in Maine”.  The e-mail contained instructions about how to dress/behave for a live television show, and directions to a school in Kennebunkport where a bus would take guests to/from the still unnamed location of the taping.

This had to be some sort of prank.  Sure, I lived in Maine, had attended a few concerts, and had already purchased way too much merchandise, but still…   But what if it’s real?  All we had to lose was the time for a short drive to Kennebunkport.

The e-mail didn’t mention anything about bringing a guest.  If it’s real, will it work for two people?  We made a plan and my daughter and I drove to the pickup location.

The good news, there was a bus and a small group of people.  The bad news, all of the other people seemed to have had some involvement in the taping of the video earlier that summer – as extras, volunteers, or staff in places Taylor visited.  There was a local person with a check-in list.  Of course, we weren’t on her list and she knew nothing about the e-mail invitation. Thankfully, after reading the invitation and looking at my daughter she told us to get on the bus!

 

While the bus was on its way, the secret location was announced on the radio.  When we arrived, there was a crowd lining the street and driveway leading up to the seaside property where the show would take place.  With orange wristbands secure, we were led through the crowd to our designated area on the lawn.  We were told what to expect and how to react during the run through without Taylor, the rehearsal with Taylor, the live show, and that there would be a small concert following the show.  We were told that some of George H.W. Bush’s grandchildren were there and that the former President would arrive soon.

I stayed off to the side and watched Taylor arrive, and then a bit later, President Bush. My daughter, being better able to fit in small places, worked her way up toward the front of the crowd. It all went perfectly. Taylor came out for the interview, turned and waved hello to the crowd of boats off shore, and the video premiered. After a short break Taylor returned with her band and performed a short concert. What a night!

Foxboro in the Rain (June 25, 2011)

I’m not a fan of stadium shows, but I have attended five of Taylor’s ten concerts at Gillette Stadium (only half, clearly not obsessed). Each Gillette show has been a memorable experience, but none will ever match Foxboro in the Rain.

My daughter and I didn’t sit together.  I had a seat on the floor near the stage. She didn’t want to sit on the floor, so she had a seat near the top of the lower level – in what fortuitously was one of the few covered rows at Gillette Stadium.

For me, the two highlights of the night were Mean and the rain.  Coming into the show, I didn’t understand how strongly young fans related to Mean.  Many of Taylor’s songs were autobiographical, but Mean was the first that was not a generic experience. It was explicitly about her and the claim that she couldn’t sing – “drunk and grumbling on about how I can’t sing.”  I thought that the direct reference made the song less relatable. Apparently, however, the magic of Mean is that direct connection to Taylor’s experience.

Walking around before the show we saw a sea of ‘why you gotta be so mean’ t-shirts and small groups singing the song all around the stadium. And then the concert: sitting in front of the stage and hearing 50,000 young voices singing out strong:

But someday I’ll be living in a big old city
And all you’re ever gonna be is mean, yeah
Someday I’ll be big enough so you can’t hit me
And all you’re ever gonna be is mean
Why you gotta be so mean?

It was an overwhelming experience.

And then there was the rain.  It started out as a light shower over part of the stadium.  My daughter thought it was part of the show – Oh, they even have fake rain. There had already been fake snow earlier in the show (Back to December). It all fit together.  No, the rain was real.  It rained hard and it kept raining.  Taylor kept singing, dancing, and lying in puddles of water on the stage when the choreography called for it.  And fans kept singing, dancing, and taking pictures with their iPhones. This was so different than the time I was soaked to the skin in my wool band uniform at a Harvard-Cornell football game.

(I was amazed that the phones were not damaged by the rain.  Reading the online forums the next day, many of them were.)

As it turns out, it’s not a good idea to lie in puddles of water singing in the rain. Within a week, Taylor was sick and had to postpone shows.  I was there for her first show back in Montreal in mid-July, but that’s another story…

It Was The Start of a Decade…

When my daughter and I drove to Hartford in August 2008 I had no idea it was the start of a decade of 30 shows in eleven states plus Washington, DC plus Montreal.   The first time I listened to her first album, ‘Taylor Swift’, I knew these were great stories and this was a great storyteller. When I heard Fifteen those were words I wanted to tell my daughter.  And when I heard The Best Day those were words I hoped to hear from my daughter someday.  And when I heard All Too Well, well ….

So, next weekend I will sit in U.S. Bank Stadium in Minneapolis, sobbing through Long Live one more time.  And if fate steps in and there are no more Taylor Swift tours or concerts for me after this summer, I know that for as long as I live, and I hope as long as my daughter lives, these last ten years will be remembered.

 

 

 

 

 

Charlie DePascale

When I think about educational measurement the first thing that comes to mind is a high-fructose corn syrup commercial from about 10 years ago.

 

 

On one side there is the man who holds, but cannot articulate, the widespread, but ill-defined, perception (misperception?) that high-fructose corn syrup is inherently bad. On the other side is the woman with the tempting treat who provides a couple of carefully selected facts and makes the claim that high-fructose corn syrup is fine in moderation. Man takes the treat from Woman and all is right in their 30-second commercial world. It is truly an Adam and Eve moment – although that’s probably not the allusion the sponsors of the commercial were after.

In 2018, as a field and as an industry, educational measurement finds itself in much the same place as high-fructose corn syrup. We developed an appealing, inexpensive product (i.e., large-scale standardized tests), exulted in its success, and then could do little when we lost control of the product that defines us. For much of this century, we have taken the role of the woman in the HFCS commercial. We assure people that there is nothing harmful or evil about educational measurement used properly and in moderation, all the while watching test use soar beyond anything that can be called moderation. Assuming that it will be impossible to produce a cute 30-second video on the benefits of educational measurement that will be as effective as the counterargument that John Oliver has already produced, where do we go from here?

I think that the only solution is to engage aggressively in rebranding. Educational Measurement is ripe for a makeover or perhaps even a complete do-over.  Now is the time to change not only the surface image of educational measurement, but to actually change what we mean when we talk about educational measurement.

Two assessment industry icons have already started this rebranding. College Board began by changing the name of the SAT (much like KFC), conducted a major overhaul of their flagship instrument, and then created a new suite of products and services aimed at a new market. ACT has gone even further as it redefines itself as a learning company rather than an assessment company. As described in an EdSurge article earlier this year, ACT CEO Marten Roorda “wants the ACT to become more involved in the learning process, and provide more analytics solutions to teachers and students.”

Reforms to the ACT and SAT assessments, of course, are just the tip of the iceberg. Learning analytics, big data, personalized instruction, and adaptive learning are trending topics in education that are already impacting the measurement community. At the ITC 2018 conference in July, John Hattie and Alina von Davier delivered keynote addresses on visible learning and computational psychometrics, respectively, which forced those listening to reconsider how we think about and do educational measurement. As Kathleen Scalise explained at the 2018 NCME conference in April, it is not a question of if or when big data and learning analytics will impact educational measurement; they are already here and they already have.

Back to our roots

We must begin any attempt to rebrand, redefine, or refocus educational measurement by revisiting our roots.  And the best place to reconnect with those roots is the first edition of the so-called bible of our field, Educational Measurement, published in 1951.  We choose this as a starting point because as explained by E.F. Lindquist (editor), “prior to its publication … no book had yet been published that would even begin to fill an urgent need…for a comprehensive handbook and textbook on the theory and technique of educational measurement.”

It can also be argued that among the four editions of Educational Measurement (1951, 1971, 1989, 2006), the initial edition made the best attempt to ask and answer the Why? question; that is, to define the purpose of educational measurement. It is from understanding the purpose of educational measurement that we are able to glean the core values and guiding principles of our field, which is the first step.

In Part 1, The Functions of Measurement in Education, the 1951 edition begins with four chapters that address fundamental issues related to the primary functions of measurement in education at that time:

  • The Functions of Measurement in the Facilitation of Learning
  • The Functions of Measurement in Improving Instruction
  • The Functions of Measurement in Counseling
  • The Functions of Measurement in Educational Placement

Part 3, Measurement Theory, begins with a chapter on The Fundamental Nature of Measurement that ends with a section titled, Explanation as the End of Measurement, and the following admonition:

The primary concern of measurement, however, should be for an understanding of the entire field of knowledge rather than with statistical or mathematical manipulations upon observations.

Knowledge will be advanced by recognizing what the empirical methods of measurement ignore…The aim of measurement must ever be the explanation of, or the meaning for, observed phenomena.

A practical application of those statements is provided by Ralph Tyler in describing the organization of his chapter on the functions of measurement in improving instruction:

Since the purpose of this chapter is to outline the ways in which educational measurement, that is, achievement testing, can serve to improve instruction, we shall consider first what steps are involved in an effective program of instruction and then indicate the contributions that achievement testing can make to each of these steps.  In this connection it will be noted that educational measurement is conceived, not as a process quite apart from instruction, but rather as an integral part of it.

Tyler then goes on to describe four sequential phases of instruction:

  1. To decide what ends to seek; that is, what changes in student behavior to try to bring about.
  2. To determine the content and learning experiences likely to attain those ends.
  3. To determine an effective organization of those learning experiences to bring about the desired ends effectively and efficiently.
  4. To appraise the effects of the learning experiences to determine whether they have brought about the desired ends or changes in student behavior.

He argued in 1951 that educational measurement as a field had become stuck on Step 4 (documenting the effects of instruction) and was not focusing enough on how educational measurement can and should inform and support the other three steps of instruction.

This problem has only been exacerbated in the last 60+ years as our field has become more technical, more specialized, and more separated from instruction.  While we may pay lip service to the notion that a key purpose of educational measurement is to facilitate learning and improve instruction, we do little to understand and support that function.

Educational measurement must find a way to support all aspects of the instructional process, toward the ultimate goal of improving student learning.  And having taken on that task, we must find a way to convey the message that measurement is more than a test of student outcomes.

Rebuilding and Rebranding

Obviously, rebuilding and rebranding educational measurement will not be simple.  It will require more than a quick fix like a 30-second commercial, a catchy new slogan, or a name change. However, although not sufficient, I do think that a name change is necessary.  The term ‘educational measurement’ is too closely associated with achievement testing to continue to serve a useful purpose.  Additionally, it does not accurately reflect either what we have been doing as a field for the last 60 years or the new directions in which the field is moving (e.g., with a focus on personalization and computational psychometrics).

My suggestion for a starting point is to replace the term ‘measurement’ with ‘modeling’ – Educational Modeling.  What is the case for modeling? Just for starters …

  1. With a few notable exceptions, modeling is a much more accurate description of what we do as a field than measurement. (Yes, I see you out there Rasch folks.)
  2. By its very nature, the term modeling conveys a sense of concern with an entire process or an entire system and the interactions among the components of that system.
  3. Measurement, not just educational measurement, is an outdated 20th century concept. The 21st century world is just much too complex to measure. Our field, and psychology in general, latched on to the term measurement last century because it was cool and gave the field credibility. Modeling is the new measurement.
  4. Finally, through the Common Core State Standards (and its offspring) we have invested nearly a decade in spreading the word to K-12 educators, students, and the general public of the importance of modeling and its central role in all that we do as intelligent human beings.  Let’s take advantage of that and create coherence between what we say and what we do in education.

The Common Core defines modeling as “the process of choosing and using appropriate mathematics and statistics to analyze empirical situations, to understand them better, and to improve decisions.”  That sounds like what we are doing (or should be doing) in educational measurement.  In further describing modeling, the Common Core states “Real-world situations are not organized and labeled for analysis; formulating tractable models, representing such models, and analyzing them is appropriately a creative process. Like every such process, this depends on acquired expertise as well as creativity.”

Again, isn’t that what we are supposed to be doing in educational measurement?

Let the games begin

To get this ball rolling, I call on NCME, consistent with their vision to be the recognized authority in measurement in education, to take the first step by changing their name to the National Council on Modeling in Education.  They won’t even have to change their logo, URL, or Twitter handle.

The next step would be for NCME and co-editors Linda Cook and Mary Pitoniak to make the upcoming 5th edition of our bible, Educational Measurement, a New Testament for our field.  Educational Modeling has a nice ring to it as a title.

We have to start somewhere to restore the reputation of educational measurement.

Are you ready for it?

Give Me A Lever

Charlie DePascale

[Image: Archimedes quotation about the lever]

I realized very early in my career that the law of the lever, as explained by Archimedes in some variation of the quote above, was critical to my success. In short, there was little that I could do on my own as an assessment specialist, or psychometrician, to improve education; but working in concert with the right lever, we could move the world. My task, therefore, was to identify and associate with those levers.

In large-scale assessment that lever most often is a state policymaker; that is, a deputy commissioner, commissioner, board member, or governor. I left state assessment directors off of this list, because in many ways they are in the same position as I am. Without a policymaker as a lever, there is little that an assessment director can do. And there is no need to explain why federal education officials are never the appropriate lever.

[Aside: From my perspective as an assessment specialist, I see policymakers as my lever.  From their perspective, I may be their lever.  That is not really important.  What matters is the understanding that we need each other.  I may need them more than they need me, but we do need each other.]

Like operating a simple lever, the process of working with a policymaker is quite straightforward.  Begin with a policymaker who has a clear vision of what she or he wants to accomplish and a sense that assessment can help.  From that starting point, identify ways in which assessment can be used to support or advance the policymaker’s goals. Working together, determine what type of assessment is needed and how best to convey information from the assessment.  Understand what the policymaker would like to say (or needs to say) and work together to figure out a way to help her or him say it in their own voice.  Equally important, help them understand what the assessment cannot do and what they should not say.

When it all comes together just right, it’s a beautiful thing. Over the course of my career, I have been fortunate to be associated with several policymakers and assessment programs where things did come together quite well.

Of course, the pieces do not always come together exactly as you hoped.  Perhaps there are too many constraints (cost, time, capacity) to design and develop the assessment that is needed.  Perhaps the education leadership, governor, and legislature are not on the same page.  Perhaps other goals are higher on the policymaker’s priority list.  Or perhaps things came together for one brief, shining moment, but could not be sustained.

It is in such less-than-ideal situations that working with the right levers becomes even more important. With the right partners, you are often able to adapt, work through the issues, and make the best of the situation, sometimes just treading water until the context changes.  Without the right partners in place, however, oh you’ve got trouble. There is little that an assessment specialist can do – the assessment program flounders and the state moves on to another assessment program.

What about assessment in the classroom?

The importance of having the right partners is easy to see with regard to large-scale assessment.  Ideally, the assessment specialist and policymaker are working side-by-side to implement and maintain the assessment program.  That type of direct relationship rarely exists with regard to assessment in the classroom.  However, understanding the importance of partners is just as important to an assessment specialist when considering assessment at the classroom level.

My lever in the classroom is the teacher rather than the policymaker.  As an assessment specialist, however, I am likely to be much farther removed from a teacher than I was from a policymaker. I will seldom be in a position to interact directly with teachers as they make assessment decisions and use assessment information. The basic equation, however, remains the same.  For my work to make a difference in the classroom there must be an appropriate partner ready and willing to use it.

My task becomes providing tools that will help put the right information at the right time into the hands of a teacher who can use it to inform and improve instruction – in support of the ultimate goal of improved student learning.

That task is complicated by the fact that there are some teachers who are not prepared to be good levers and others who may be in situations that do not allow them to be good levers.  I might provide the same information, in the same way, at the same time to two teachers in the same school and see very different results.  Without a teacher prepared to use it, any information that I can provide will be much less effective.

How does this impact what I do and how I do it?

First, I have to acknowledge and understand the role of the teacher as my partner.  Although I will not be working side-by-side with individual teachers during implementation, I cannot work in isolation from teachers during the design and development process.

Second, I have to make the assumption that there will be a good partner in place at the classroom level. I have to design tools and resources that will be useful to an effective teacher.

Third, I have to realize that there will be many cases where the second assumption above is false.  I am convinced, however, that the solution is not to try to develop tools and resources that can be used by any teacher, regardless of the knowledge and skills they bring to the table. That is a fool’s errand.

Rather, the solution is to identify and work with other partners to make the second assumption above more likely to be true.  Those partners will include state-level policymakers, district and school administrators, developers of curriculum and instructional support materials, and teacher educators. Throughout the entire process of preparing, certifying, and providing in-service support to teachers there must be a concerted effort to ensure that teachers are equipped to effectively use assessment in the classroom.

In short, …

The solution is not to try to make assessment teacher proof.

The solution is not add-on programs and materials designed to make teachers “assessment literate”.

The solution is to work with partners on multiple levels to better provide useful information to teachers who are prepared to make use of it.

If I Did It

Confessions of a Psychometrician

By OJ Simpsons Paradox with Charlie DePascale

Charlie – As we waited six long months for the release of the 2017 NAEP results, some wondered whether we would ever know the whole story; what really happened that February when NAEP reading and math went digital. Now that those results have been released and the NAEP trend line preserved, what do we really know?

This week, we are pleased to welcome OJ Simpsons Paradox, a statistician and part-time psychometrician, usually locked deep within the bowels of the government where he has the ear of top education policy makers.  Today, he is here to offer his hypothetical account of how a broken trend line could be and should be “fixed” without anyone suspecting a thing.

OJ:  It all starts with NAEP.  The one constant through all the years, Ray, has been NAEP. America has rolled by like an army of steamrollers. It’s been erased like a blackboard, rebuilt, and erased again. But NAEP has marked the time.  This assessment, this trend line, is a part of our past, Ray.  It reminds us of all that once was good, and that could be great again. People want the trend line, Ray.  People definitely want the trend line.

Charlie: OK. You can call me Ray.  But aren’t people skeptical?

OJ: Ray Ray, you just tell them what they want to hear hear hear hear hear.  You need to tell em tell em tell em What they wanna hear.

Charlie: Sure, people will hear what they want to hear, believe what they want to believe; but this is psychometrics, measurement, facts…

OJ: It’s statistics, son.  Facts are stubborn things, but statistics are pliable.

Charlie: Pliable, yes. But, if the trend line were broken, how could you fix it?  You tell us that in the national sample students taking the test on paper performed 4 percentage points better on each item than those taking the test on computer.  That sounds like a big difference.

How does that compare to the p-value difference normally found between a top-performing state like Massachusetts and the national average or with states near the bottom of the list?

OJ: Right, in State A there is a 5-point scale score difference …

Charlie: Wait.  Sorry to interrupt.  No, I am asking about the national p-value difference.

OJ: Mindset.  You start with the mindset that the trend has been preserved and that you need incontrovertible evidence to prove that it has been broken.  The rest is just statistics.

You tell me that there is a 5-point difference between a state’s performance on paper and computer.  You think, “Damn, five points on NAEP is huge!”  NAEP can go 30 years without changing by five points.

But, could a difference that large happen by chance?  Maybe not too often, but 5/100 times, 1/100 times, 1/1000 times – you see where I am going with this?

Charlie: But what about Power?  With a paper sample of only 500 students…

OJ: Power!  We can take a year to report results and nobody bats an eye.  We can post cute little Twitter surveys while people are waiting and people ‘like’ them.

We can take the time we need to prepare the message. When I worked for a state we were taken to court and lost when we wanted to take two days to prepare a memo before releasing results.

We can bury you with videos, charts, graphs, data tools when we release the results.

That’s the only power I need.

Charlie:  People will want to know what happened to the trend line.

OJ:  We are reporting that nothing happened to the trend line, Don.  Reports that something hasn’t happened are always interesting to me, because as we know there are known knowns; there are things we know we know.  We also know there are known unknowns; that is to say we know there are some things we do not know.  But there are also unknown unknowns – the ones we don’t know we don’t know.

Charlie:  What does that mean?

OJ: Exactly!

Charlie: The trend line.  Was it broken?

OJ: Son, we live in a world that has trend lines; and those trend lines need to be maintained.  Who’s gonna do it?  You?  You have the luxury of not knowing what I know – that misrepresenting performance of an individual state, while tragic, probably saved lives; and my existence, while grotesque and incomprehensible to you, saves lives.

You want me maintaining that trend line!

YOU NEED ME MAINTAINING THAT TREND LINE!

Charlie: Well, thank you OJ.  That’s all the time we have today.  We are all looking forward to the release of the 2018 NAEP results later this fall.

OJ: We’ll see.
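For readers who want to put rough numbers on the exchange above about chance and power, here is a minimal sketch of a textbook two-sided z-test for a difference in mean scale scores. Everything numeric in it is an assumption made for illustration: the standard deviation of 35, the computer sample of 2,000, and the function name `two_sample_power` are hypothetical. Only the 5-point difference and the paper sample of 500 come from the conversation. It also assumes simple random samples, which ignores the clustering and weighting of a real survey design.

```python
from statistics import NormalDist


def two_sample_power(diff, sd, n_paper, n_computer, alpha=0.05):
    """Approximate power of a two-sided z-test for a difference in means.

    Assumes equal standard deviations and simple random sampling
    (both are simplifying assumptions for this sketch).
    """
    std_normal = NormalDist()
    # Standard error of the difference between the two sample means
    se = sd * (1 / n_paper + 1 / n_computer) ** 0.5
    # Critical value for the chosen significance level
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # True difference expressed in standard-error units
    shift = abs(diff) / se
    # Probability that the test statistic lands in either rejection region
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


# Hypothetical inputs: 5-point difference, SD of 35, 500 paper vs. 2,000 computer
for alpha in (0.05, 0.01, 0.001):
    power = two_sample_power(5, 35, 500, 2000, alpha=alpha)
    print(f"alpha={alpha}: power={power:.2f}")
```

Under these assumed inputs, tightening the threshold from 5/100 to 1/1000 lowers the chance of detecting the same 5-point difference, which is the lever OJ is describing when he starts from the mindset that the trend has been preserved.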

It’s about time

Charlie DePascale

We have all asked the question, “Where did the time go?”

As troubling as that question can be, more recently, I find myself pondering an even more vexing question, Where did time go?

Every day, it seems as though time has been removed as a dimension or component of some part of our lives in which it was always really important.

Television, of course, is a prime example.  I grew up with “same bat time, same bat channel,” and Sunday nights at 8 with the family in front of the television (could I stay awake long enough to see Topo Gigio?).  Later there was 11:30 on Saturday nights and “must see TV” on Thursday.  Appointment television!

Now, I can watch a show whenever, wherever, and however I want – on demand.  I can still watch any of those shows referenced above as easily as a show that aired last night.  And not just whole shows.  I can pull up a clip of my favorite moments; like Sheldon erasing time as he makes a basic mistake while explaining the time parameter to Penny on The Big Bang Theory.

Not only can we pull up television or movie clips, clips of our own lives are now also neatly stored and readily available on demand.  We are supposed to forget certain things over time and to be able to process, shape, and reshape our memories. However, as Taylor Swift wrote recently,

“This is the first generation that will be able to look back on their entire life story documented in pictures on the internet, and together we will all discover the after-effects of that.”

Will it become more difficult for time to heal all wounds if we remove the passing of time; if every day, or at any time, moments in our lives are replayed for us in full color, with video and even sound?

Our Brief History of Time

Educational measurement, of course, has not been immune to this loss of time.  In previous posts, I have discussed our loss of the time needed to design, develop and evaluate assessment programs before making them operational. There is also the apparent lack of any understanding or consideration of time and the foundational formula D = RT when setting accountability goals for individual students, schools, or states.  The loss of time that I want to discuss today, however, is more fundamental to educational assessment.

Not so long ago, time was central to the design and administration of tests and also to the reporting and interpretation of test scores.  In the heyday of norm-referenced testing, test scores were based directly on the interpretation of a student’s performance at a particular point in time.  Grade Equivalent scores described student performance in terms of what was typical (or expected) at a given point of time within a school year.  Those scores, as well as percentile ranks and stanines, were based on the particular point in time at which the test was administered; with separate norm tables developed for each week within a defined test administration window.  As we moved to the NCLB era and more criterion-referenced achievement levels, student performance was still evaluated and interpreted in comparison to expectations at a fixed point in time (i.e., at the end of a particular grade level).  Time was still in play as recently as 2010 with the advent of the Common Core State Standards, when we spoke of student proficiency in grades 3-8 in terms of being “on track for college-and-career readiness” by the end of high school.  Referring again to our old friend, D = RT, the use of the term ‘on track’ implies that we have a fairly thorough understanding of distance, rate, and of course, time.
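To make the D = RT point concrete, here is a minimal sketch of what calling a student “on track” presupposes: a known distance to the target, a known amount of time remaining, and therefore a required rate of growth. The function names and all of the scale scores are hypothetical, chosen only for illustration; they do not come from any real assessment.

```python
def required_rate(current_score, target_score, years_remaining):
    """Growth per year needed to reach the target: R = D / T."""
    if years_remaining <= 0:
        raise ValueError("years_remaining must be positive")
    distance = target_score - current_score  # D: scale points still to cover
    return distance / years_remaining        # R: scale points per year


def is_on_track(current_score, target_score, years_remaining, observed_rate):
    """A student is 'on track' only if observed growth meets the required rate."""
    return observed_rate >= required_rate(current_score, target_score, years_remaining)


# Hypothetical student: scores 220 now, needs 280 seven years from now
rate_needed = required_rate(220, 280, 7)
print(f"Required growth: {rate_needed:.1f} points per year")
print("On track at 9 points per year:", is_on_track(220, 280, 7, 9.0))
print("On track at 8 points per year:", is_on_track(220, 280, 7, 8.0))
```

Remove any one of the three quantities (the target, the time remaining, or the growth rate) and the phrase “on track” has nothing to compute.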

Losing Track of Time

Somewhere over the last five years, however, the assessment/measurement community lost track of time.  Ironically, in part, our loss of an appreciation for time can be attributed to pressures directly related to time – too much testing time, too long to report results, and the well-intentioned yet poorly conceived backlash against “seat time” in favor of competencies to be defined later.

But those reasons can only partially explain our complete abandonment of time. Perhaps we simply have succumbed to the pressures of an on-demand world.  Perhaps we started to believe our own rhetoric about vertical scales, invariance, and the wonders of IRT. Perhaps the assessment industry is simply trying to adapt to technology and the “lean startup” concept – get the product in the hands of the customer faster.

With almost reckless faith in psychometric theory we are willing to boldly go where no assessment person has gone before.  We will administer items anytime, anywhere, in any combination, and apply item parameters generated across wide swaths of time (it all averages out in the end) to produce a theta estimate for a student.
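As a rough illustration of that “time-free” step, here is a minimal sketch of producing a theta estimate from fixed item parameters. It assumes a two-parameter logistic (2PL) model and a damped Newton-Raphson search for the maximum-likelihood estimate; the model choice, the function names, and the item parameters are all assumptions for this example, not a description of any operational program. Note that nothing in the calculation knows when an item was calibrated or when the student answered it.

```python
import math


def p_correct(theta, a, b):
    """2PL probability of a correct response (a: discrimination, b: difficulty)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def estimate_theta(responses, items, iterations=25):
    """Maximum-likelihood theta for 0/1 responses, treating item parameters as known.

    responses: list of 0/1 scores; items: list of (a, b) pairs in the same order.
    """
    if len(set(responses)) < 2:
        # All-correct or all-incorrect patterns have no finite ML estimate
        raise ValueError("need at least one correct and one incorrect response")
    theta = 0.0
    for _ in range(iterations):
        gradient = 0.0     # first derivative of the log-likelihood
        information = 0.0  # negative second derivative (test information)
        for u, (a, b) in zip(responses, items):
            p = p_correct(theta, a, b)
            gradient += a * (u - p)
            information += a * a * p * (1.0 - p)
        # Newton-Raphson step, clipped to one logit to keep the search stable
        step = max(-1.0, min(1.0, gradient / information))
        theta += step
        if abs(step) < 1e-8:
            break
    return theta


# Hypothetical three-item test; the parameters could come from any calibration year
items = [(1.0, -1.0), (1.0, 0.0), (1.0, 1.0)]
print(round(estimate_theta([1, 1, 0], items), 3))
```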

And what do we do with that theta estimate?  That’s where things get tricky.  Our “time-based” tools for reporting and interpreting test scores have not caught up with this new “time-free” approach to assessment.  We convert the theta estimate to a scale score – even a vertical scale score.  And then …

Time is all we have

And then we are face-to-face with the reality that educational assessment cannot exist without time.  Without slipping into the philosophical argument over whether any type of psychological measurement, including educational measurement, is “real measurement,” we have to acknowledge that virtually all of our IRT-based assessment lacks the underpinning of a theory-based scale.  At our best, we assemble an agreed-upon collection of items and collect data on student performance on those items at a particular point in time.  We cannot interpret student performance on our large-scale assessments without a consideration of time and both the expected and relative performance of students at that point in time.  We can make awkward attempts to couch test scores in criterion-referenced terms, but as the quote often attributed to Bob Linn says, “scratch a criterion and you’ll find a norm.”

But if we have the serenity to accept the ways in which we cannot change our dependence on time, perhaps we will have the courage to change the things that we can change, and the wisdom to know the difference.

At this time, we are embarking on one of our field’s greatest adventures and challenges – the development of assessments to measure attainment of the Next Generation Science Standards.  It is a task that challenges everything we know and hold dear about alignment, item construction, test construction, scoring, reporting, reliability, and of course, validity.  With nothing more than a meager notion of a construct, we are developing and implementing NGSS assessments.  Perhaps these NGSS assessments will be an example of the old principles of test construction meeting the new principles of the lean startup strategy – iterating with the client to understand the construct and build the product that is needed.  The NGSS assessments and construct will form and re-form each other over time.  If that’s the mindset of the assessment developers, clients, and policy makers, that’s not necessarily a bad approach.

Only time will tell.