assessment, accountability, and other important stuff


I’m With The Band



Harvard University Band


Charlie DePascale ’81

This weekend the Harvard University Band celebrates its 100th anniversary.  Along with meeting my wife, my time in the band remains one of the two happiest memories of my four years at Harvard. Actually, my memories of the band begin with the end of my junior year of high school.

It was the summer of 1976, the Bicentennial Year, and a high school classmate told me about this Summer Pops band at Harvard: anybody can join, they rehearse one night a week, and there are two concerts at the end of the summer – one in Harvard Yard and one at the Hatch Shell, where the Boston Pops play. OK, sign me up.

During that summer, on stage at Sanders Theater with a couple of hundred people of all ages and musical ability, I had my first interactions with Tom Everett, director of Harvard bands.  Until that summer, I had no intention of applying to Harvard.  Harvard was for other people.  But during that summer with Tom, I remember thinking, hey, if this is what Harvard people are like, I could spend four years here.  So, I applied, was accepted, joined the band and the wind ensemble, and quickly learned that there were no other people at Harvard like Tom.

Despite that, my decision to attend Harvard was a net positive (did I mention meeting my wife?), and my experience with the band was definitely positive. In my short time with the band, I enjoyed performing at the Kennedy Center, traveling to New York City, Washington, DC, and Montreal, performing a song conducted by the legendary Arthur Fiedler, playing for Jackie Onassis, and on one magical December night witnessing the beginning of a major collegiate point-shaving scandal and fulfilling my childhood dream of playing Amen in the Holy Cross basketball band.  Dare to dream!

And then there are the lessons learned that extended well beyond my years in Cambridge.

First, there are a few practical takeaways:

  • A wool jacket can absorb several times its weight in rainwater and still be fine the following week – a clarinet, not so much.
  • If you’re tired enough, you can sleep anywhere – on the cement floor of a game room in Ithaca, in an end zone at Princeton, sharing a sofa bed with a virtual stranger in an apartment in Montreal, or on a bandmate’s shoulder during a long, late-night bus ride.
  • At least one time in their life, everyone should experience walking through a dark tunnel into a sunlit stadium to hear and literally feel the roar of 60,000 cheering people.

And then there are the larger life lessons that have served me well throughout my career.

  • Illegitimum non carborundum – Enough said.
  • Lines (1) – Sometimes when the gun sounds and you are jumping, or scrambling, from one formation to the next you end up on the wrong 45-yard line (they all look alike, you know). When that happens, just fall into line with the trumpet section, play the song, and rejoin the clarinets for the next formation.
  • Lines (2) – Everything and everyone is fair game for the halftime humor of the Harvard Band – even the band itself. However, there are times when you know you are crossing a line that shouldn’t be crossed – for me, it was the formation that paired Ted Kennedy with a popular Bee Gees song. Don’t shy away from the line, but try to stay on the right side.
  • “The Game” Syndrome – Every week, the halftime show had to fit into a tight window. When our time was up, we were off the field – this wasn’t American Pie (reference is to the 1971 song; we can discuss the resemblance of the HUB to the early 2000s movie franchise at another time). That limit was a good match for our practice of rehearsing the show for the first time the morning of the game. The Harvard-Yale game, however, had a longer halftime, which provided a few extra minutes for an extended halftime show. Of course, the temptation to turn our show into a Super Bowl-worthy extravaganza was too great to resist – often with the same result as recent Super Bowl halftime shows.  Forty years later, there are still nights when my dreams are haunted by giant royal stick figures trying to “walk” across the field.  Dream big, but know your limitations.
  • A Dedicated Core – No volunteer organization, whether it is a college band, a town Democratic committee, a regional educational research organization, or a national professional association, can function without a dedicated core of passionate people who are willing to devote way too much of their own time to doing all of the big and little things that must be done so that everything runs smoothly when the rest of us just show up. Treasure those people.
  • Leader of the Band – With the right person leading them, a group of 200 community members, or 150 Harvard students looking to have fun, or 50 student musicians grateful for one more opportunity to keep playing can each make such beautiful music. It takes a special person to know how to pick the right music, create the right environment, and effectively structure a limited amount of rehearsal time to get the most out of each of those groups and individuals, teaching and gently moving them in the right direction with humor, skill, grace, and a wealth of knowledge, skills, and experience.  Thanks, Tom.

So yes, I’m with the band and the band will forever be a part of me.

Happy Anniversary HUB!  Here’s to the next 100 years.


A good day ruined

Charlie DePascale

After a wonderful late summer day spent enjoying a rare weekday afternoon baseball game in Boston, I sat down last night and looked at my Twitter feed.  There among the trending items was this headline:

The University of Texas’s Secret Strategy to Keep Out Black Students

Without even clicking to look at the article I knew without a doubt that this ‘secret strategy’ had to involve standardized testing.  Is this how gun manufacturers, drug companies, and those e-cigarette people feel?

Standardized tests don’t kill people …

Where will this end?

Will NCME soon be thought of in the same way as the NRA – but you know, without the money, membership, or political clout? The president of AERA fired a direct shot at testing last spring with her presidential address, “An Inconvenient Truth About the New Jim Crow of Education” – a catchy title.

Will San Francisco lawmakers be next to set their sights on NCME – labeling it a domestic terrorist organization?  Will there be a resolution to block NCME from holding its 2020 conference in San Francisco?  If so, will AERA support it?  After all, it wouldn’t be the first time AERA moved a conference in California in the name of supporting a social cause.

What about New York, already regarded as the Lexington and Concord of the opt-out revolution?  Will the governor and legislature take aim at NCME and deny state funds to people doing business with the testing industry?  What will the pineapple say?

Where do we go from here?

Is there a way to stem this anti-testing tide and restore the shine to this field of ours? (dare I say, to make testing great again. no, I think I’ll pass on that.)

I can support increased background checks for all test users.  I am more skeptical about federal- or state-mandated bans or limits on state and local testing.  Perhaps those efforts can reduce the damage caused by high-stakes census testing, but an ill-conceived and poorly-developed teacher-made test in the hands of an inexperienced teacher can still cause a lot of harm one child at a time.

Standards, lists of best and fair testing practices, and policy statements are necessary, but not sufficient.  I am pretty sure that I already read somewhere that it’s not acceptable practice to base a high-stakes decision on a single test score.

Improving the assessment literacy of all involved in the testing process including students, teachers, policy makers, the media, the general public, and psychometricians is a good place to start.  I know some folks in New Hampshire who are doing some nice work in that area. (sorry, no names or links.  need to keep that personal/professional firewall intact.)

Improved assessment literacy, of course, won’t stop people who want to use testing for unsavory or evil purposes from doing their thing.  It might, however, make others more aware of when testing is being used to do harm; and make them more likely to speak out; and make them less easily persuaded to opt out.

The trend toward moving the locus of assessment from the state house to the classroom also seems like a step in the right direction. Not only will that put actionable information in the hands of teachers, where it belongs; that shift will help eliminate problems like the one faced by Massachusetts last spring. As we have known for a long time, passage-based triggers are much easier to avoid in the classroom than on a state assessment.

Improving local assessment is a step, however, that will require tremendous investment in the infrastructure of teacher preparation programs and schools.  Remember that one of the big advantages of large-scale assessment is that it’s cheap and doesn’t require much training of teachers and school administrators.

Advances in instructional and assessment technology, personalized learning systems, and modeling based on a much broader base of data than test scores and student attendance also have a great deal of promise, but will not be without their own technical and social challenges.

I look back fondly on the days when we were accused of trying to peek into family life with our student survey questions on how much time was spent watching television each night or whether the student had a part-time job; of trying to brainwash students with our questions about the environment; or of simply trying to get students to fail with our trick questions that contained plausible distractors.  So, yes, tracking and developing psychometric models for a student’s eye movement, heart rate, and other signals of student engagement might face some resistance.

For now, however, I will go back to Twitter and read about climate change.  That ‘inconvenient truth’ reference aside, I am pretty sure testing isn’t being blamed for climate change – well, not yet – or maybe I’ll just listen to some music.


Is this person college-and-career ready?


yes no


Charlie DePascale

Now that the administration has dropped efforts to include a citizenship question on the 2020 Census, perhaps there is space on the form for the proficiency question, “Is this person college-and-career ready?”  For persons 18 and under, the question would be, “Is this person on track to college-and-career readiness?”

Think about it. We ask the question in April 2020 and by December 31st we have a national count of the number of college-and-career ready residents in the United States.  By March 31, 2021 we have state-level counts disaggregated by race, ethnicity, and other key demographic factors.  In about the same amount of time that it took to produce the 2017 NAEP Reading and Mathematics results, we would have proficiency information for the entire U.S. resident population instead of the small portion of the population captured by that ill-defined social construct, grade level.

The federal government could then make decisions about how much money to allocate to programs designed to improve college-and-career readiness and how best to distribute that funding across the states (just as they do with other information collected through the Census). States could begin to redesign their early childhood, K-12, postsecondary, and adult education programs to better meet the needs of their residents (just as they do with other information collected through the Census).

I bet that you are thinking, but Charlie, just how accurate could that self-reported information possibly be?  Well, you see, accuracy is a funny concept; it’s one of those eye of the beholder, depends on what the meaning of “is” is type of things.  Would a U.S. Census count of college-and-career readiness be any more or less accurate than the differences we have had in proficiency estimates among the 50 states or between states and NAEP?  Would the actions triggered by a U.S. Census count of college-and-career readiness be any more or less appropriate than actions resulting from the wide variations across states in the percentage of schools identified for support and improvement under ESSA accountability systems?

Or perhaps you are thinking, but Dr. DePascale – psychometrician – a single Census question on college-and-career readiness is not measurement.  Where are the “big 5” sources of validity evidence?  Where are the external alignment studies?  Where is the USED Peer Review?

All valid points, but here’s the thing: federal and state assessment policy has never been about measurement.  As I have argued in previous posts, determining the percentage of students in a state who have met minimal competency standards, attained proficiency, are on track to college-and-career readiness, or have made progress from fall to spring is now, always has been, and always will be, at its core, a data collection problem and not a measurement problem.

At one time, the most efficient and accurate way to solve that data collection problem was with a large-scale state assessment; that is, with a short, on demand, machine-scored test administered to students in the general education program at selected grade levels.  But that time was a long time ago.  Policies and laws on inclusion changed.   The student population became more diverse. Content and performance standards became more rigorous and complex.

Thought experiment: Imagine you have  placed a group of experts in a room (or even a group of testing company psychometricians), and tasked them with coming up with the most efficient and effective way to determine the number or percentage of students in a state who are on track to college-and-career readiness or the number and percentage of high school graduates who are college-and-career ready.  If you don’t like a group of experts, you can crowd-source the task or use artificial intelligence to solve it.

Whatever approach you take, it is highly unlikely that the solution that is generated will be a single, on demand, end-of-year, state assessment.  If you expand the task to determining the number for the country rather than a single state, I guarantee that the solution will not be 40-50 unique state assessments.

The solution may include a limited amount of state and federal assessment (e.g., something like NAEP), but it is virtually certain that the solution will be more centered on quality data collection than on high quality assessment; and if we are looking for a data collection solution, what better place to begin than the U.S. Census Bureau?  Their self-described mission is “to serve as the nation’s leading provider of quality data about its people” with the goal “to provide the best mix of timeliness, relevancy, quality and cost for the data we collect and services we provide.”  Does any state department of education assessment or any testing company claim the same mission and goal?  Would we want them to?

Where would we begin?

So, with the proficiency question on the 2020 Census where would we begin to ensure the most accurate count possible?  The first step would probably be to develop a common definition of college-and-career readiness that we want people to use when answering the question.  The next step might be a public education campaign to get the public on board with the importance of collecting the information. That campaign undoubtedly would include clear descriptions and real-life examples of college-and-career readiness or of being on track to college-and-career readiness – descriptions that people can easily grasp and apply to themselves and the people in their home.

Now you may be asking yourself, aren’t those the same things that we should do when introducing a new set of content standards or a new assessment program?  The answer, of course, is yes; but often those steps are forgotten or are given insufficient attention and resources when the focus is on building a better assessment or accountability system rather than on collecting better data.

There have been efforts at such public relations campaigns in the past, and they have been somewhat successful.  When the MCAS tests and new performance standards were introduced in Massachusetts in the late 1990s, “What Does Proficient Look Like” workshops were held in communities across the state and “Test Yourself” brochures were distributed at toll booths, grocery stores, and public libraries.  When the Common Core State Standards were introduced, it was impossible to watch a professional golf tournament on network television without seeing a “Support the Common Core” commercial sponsored by EXXON or some other major corporation (yes, that sentence was intentionally Bidenesque).

Massachusetts no longer has toll booths, people buy groceries online, and public libraries are being repurposed to meet the changing needs of communities.  Women’s soccer matches may be a better option than professional golf tournaments for spending advertising dollars (at least every four years).  Yes, the medium will change, but the message and the need for the message remain the same.

We can develop the best large-scale assessment ever imagined; but at the end of the day and at the end of the school year, if every teacher, parent, and student cannot give an accurate answer to the question “Is this person on track to college-and-career readiness?” without looking at a score on a state assessment, what have we really accomplished?

Gold Standard?




Charlie DePascale

Disclaimer:  I did not have access to my laptop and was forced to prepare this post on my Surface tablet. I apologize in advance for any effect that had on the length or quality of the post.

By any metric, 2017 was, and continues to be, a very bad year for NAEP.  Troubles began in April 2018 with the utter fiasco that was the long-delayed release of the 2017 Reading and Mathematics results.  Seldom in the course of human history have so many good statistics been sacrificed in the name of preserving an illusory trend line.  Then late last week came the announcement that no amount of statistical sleight of hand could save the results from the 2017 Writing assessment.  (And by announcement, I mean burying information deep on a website on the first Friday of summer that no results were forthcoming and that a more detailed report would be available in the spring of 2020.)

Perhaps, however, the worst news for NAEP was that when they announced that they were not releasing results from a major assessment, few people noticed and fewer people cared.

Where does all of this leave NAEP as we await the results from the 2019 NAEP Reading and Mathematics assessments?

As I started to write this post, I will admit that I was feeling a bit cynical toward NAEP and my original title was ‘Gold Standard, My Ass!’

But then I thought, who among us hasn’t wanted just 3 more days or even 3 more hours to figure out what the hell was going on as we tried to equate writing results across years. If NAEP can lead the way and establish 3 years as an acceptable time frame, more power to them.

Plus, there have been plenty of times in the last 30 years when I have had to help state leaders make the hard decision between making necessary changes to their Reading and Mathematics tests or preserving their reporting scale and trend line.  If NAEP can lead the way on having your cake and eating it, too, pass me another slice.

And why should a state go through all of those hoops to convince USED that they really did administer a test and hold all students accountable this year if it is possible to just decide not to report results?  Be that shining light in the darkness, NAEP! We will follow!

Still not totally sure which direction I should go with this post, I thought a little bit more about the term ‘gold standard’ and what it represents.

There are several things about NAEP that do make it a symbol of the ideal in large-scale assessment:

  • Testing periodically rather than every year.
  • Testing intermittently across grade levels rather than at every grade level
  • Testing samples of students rather than all students
  • Using matrix sampling to improve the sampling of content on each assessment
  • Separate scaling of domain areas so that subscores might actually be useful
  • Demonstrating a total disdain for deadlines in the name of getting it right
  • The willingness to serve as an example of how difficult it is to set meaningful performance standards on a large-scale assessment.

Those things should be more than enough to offset the lack of transparency and the absence of individual student scores and establish NAEP as an ideal; that is, a gold standard.

The final thing that turned me around on NAEP as a gold standard, however, was remembering that the gold standard is an antiquated monetary concept that was abandoned by virtually all nations decades ago; it is an anachronism that simply no longer works in the real world.

So NAEP, I owe you an apology.  Feel free to hold firm as the old white male of assessments in the changing world of 2019.  In so many ways, you are and will always be the gold standard.

IASA – Refreshing our Memory

Charlie DePascale

This year marks the 25th anniversary of the 1994 reauthorization of ESEA, known as the Improving America’s Schools Act (IASA).  Throughout the year, I will explore how various aspects of that law shaped my career, educational assessment and accountability, and K-12 education, in general. All of this will be done, of course, with an eye toward the next reauthorization of ESEA and the future of K-12 assessment and accountability.

As we begin the year, however, let’s just take a few minutes to refresh our memories on the thoughts about equity, excellence, and education that drove the 1994 law. Sometimes it’s not necessary to write anything new.  The words speak for themselves. I call particular attention to the middle section titled, What Has Been Learned Since 1988.






‘‘(1) IN GENERAL.—The Congress declares it to be the policy of the United States that a high-quality education for all individuals and a fair and equal opportunity to obtain that education are a societal good, are a moral imperative, and improve the life of every individual, because the quality of our individual lives ultimately depends on the quality of the lives of others.

‘‘(2) ADDITIONAL POLICY.—The Congress further declares it to be the policy of the United States to expand the program authorized by this title over the fiscal years 1996 through 1999 by increasing funding for this title by at least $750,000,000 over baseline each fiscal year and thereby increasing the percentage of eligible children served in each fiscal year with the intent of serving all eligible children by fiscal year 2004.

‘‘(b) RECOGNITION OF NEED.—The Congress recognizes that—

‘‘(1) although the achievement gap between disadvantaged children and other children has been reduced by half over the past two decades, a sizable gap remains, and many segments of our society lack the opportunity to become well educated;

‘‘(2) the most urgent need for educational improvement is in schools with high concentrations of children from low income families and achieving the National Education Goals will not be possible without substantial improvement in such schools;

‘‘(3) educational needs are particularly great for low-achieving children in our Nation’s highest-poverty schools, children with limited English proficiency, children of migrant workers, children with disabilities, Indian children, children who are neglected or delinquent, and young children and their parents who are in need of family-literacy services;

‘‘(4) while title I and other programs funded under this Act contribute to narrowing the achievement gap between children in high-poverty and low-poverty schools, such programs need to become even more effective in improving schools in order to enable all children to achieve high standards; and

‘‘(5) in order for all students to master challenging standards in core academic subjects as described in the third National Education Goal described in section 102(3) of the Goals 2000: Educate America Act, students and schools will need to maximize the time spent on teaching and learning the core academic subjects.

‘‘(c) WHAT HAS BEEN LEARNED SINCE 1988.—To enable schools to provide all children a high-quality education, this title builds upon the following learned information:

‘‘(1) All children can master challenging content and complex problem-solving skills. Research clearly shows that children, including low-achieving children, can succeed when expectations are high and all children are given the opportunity to learn challenging material.

‘‘(2) Conditions outside the classroom such as hunger, unsafe living conditions, homelessness, unemployment, violence, inadequate health care, child abuse, and drug and alcohol abuse can adversely affect children’s academic achievement and must be addressed through the coordination of services, such as health and social services, in order for the Nation to meet the National Education Goals.

‘‘(3) Use of low-level tests that are not aligned with schools’ curricula fails to provide adequate information about what children know and can do and encourages curricula and instruction that focus on the low-level skills measured by such tests.

‘‘(4) Resources are more effective when resources are used to ensure that children have full access to effective high-quality regular school programs and receive supplemental help through extended-time activities.

‘‘(5) Intensive and sustained professional development for teachers and other school staff, focused on teaching and learning and on helping children attain high standards, is too often not provided.

‘‘(6) Insufficient attention and resources are directed toward the effective use of technology in schools and the role technology can play in professional development and improved teaching and learning.

‘‘(7) All parents can contribute to their children’s success by helping at home and becoming partners with teachers so that children can achieve high standards.

‘‘(8) Decentralized decisionmaking is a key ingredient of systemic reform. Schools need the resources, flexibility, and authority to design and implement effective strategies for bringing their children to high levels of performance.

‘‘(9) Opportunities for students to achieve high standards can be enhanced through a variety of approaches such as public school choice and public charter schools.

‘‘(10) Attention to academics alone cannot ensure that all children will reach high standards. The health and other needs of children that affect learning are frequently unmet, particularly in high-poverty schools, thereby necessitating coordination of services to better meet children’s needs.

‘‘(11) Resources provided under this title can be better targeted on the highest-poverty local educational agencies and schools that have children most in need.

‘‘(12) Equitable and sufficient resources, particularly as such resources relate to the quality of the teaching force, have an integral relationship to high student achievement.

‘‘(d) STATEMENT OF PURPOSE.—The purpose of this title is to enable schools to provide opportunities for children served to acquire the knowledge and skills contained in the challenging State content standards and to meet the challenging State performance standards developed for all children. This purpose shall be accomplished by—

‘‘(1) ensuring high standards for all children and aligning the efforts of States, local educational agencies, and schools to help children served under this title to reach such standards;

‘‘(2) providing children an enriched and accelerated educational program, including, when appropriate, the use of the arts, through schoolwide programs or through additional services that increase the amount and quality of instructional time so that children served under this title receive at least the classroom instruction that other children receive;

‘‘(3) promoting schoolwide reform and ensuring access of children (from the earliest grades) to effective instructional strategies and challenging academic content that includes intensive complex thinking and problem-solving experiences;

‘‘(4) significantly upgrading the quality of instruction by providing staff in participating schools with substantial opportunities for professional development;

‘‘(5) coordinating services under all parts of this title with each other, with other educational services, and, to the extent feasible, with health and social service programs funded from other sources;

‘‘(6) affording parents meaningful opportunities to participate in the education of their children at home and at school;

‘‘(7) distributing resources, in amounts sufficient to make a difference, to areas and schools where needs are greatest;

‘‘(8) improving accountability, as well as teaching and learning, by using State assessment systems designed to measure how well children served under this title are achieving challenging State student performance standards expected of all children; and

‘‘(9) providing greater decisionmaking authority and flexibility to schools and teachers in exchange for greater responsibility for student performance.






Three Little Words


Charlie DePascale

Life is full of three-word phrases.

Some tend to have profound and lasting consequences that extend far beyond what may have been intended when they were uttered.  Phrases such as I Love You, That Looks Safe, and for those among us wavering on new year’s resolutions, Just One Bite might fall into this category.

Other ubiquitous three-word phrases like While Supplies Last, Limited Time Offer, Exclusions May Apply, and Void Where Prohibited function exactly as intended; even if we are usually not happy to see them.  Often hidden in the fine print, their sole purpose is to put constraints on an offer or claim that is being made.

In the last couple of years, a three-word phrase has begun to make its way into the assessment lexicon – on this test.  At first glance, the phrase, or a close variation of it, seems neither new nor threatening when used to describe student or group performance. Charlie spelled 23 words correctly on this week’s spelling test. Karla met the college readiness benchmark on the SAT. In Vermont, 42% of grade 5 students performed at the Proficient level or higher on the Smarter Balanced mathematics test. Taken at face value, the phrase is used simply to identify the test that was taken.

Recent use of this common phrase, however, is intended to do much more than identify the source of performance.  Its purpose is to limit interpretation of student or school performance; to make it clear that the performance should be interpreted within the specific framework of the test or testing program.

Again, at first glance, we might regard this use of the phrase as innocuous or perhaps even a step forward in test use and interpretation.  Identifying the source of a test score seems quite consistent with many of our Standards for Educational and Psychological Testing, beginning with Standard 1.0:

Clear articulation of each intended test score interpretation for a specified use should be set forth, and appropriate validity evidence in support of each intended interpretation should be provided.

When considered as part of a larger effort to marginalize and vilify large-scale assessment, however, the connotation of the phrase on this test changes dramatically. It is the second punch in a one-two combination intended to knock out large-scale assessment.  The left jab that has weakened the credibility of large-scale assessment is the argument “Scores on large-scale assessments are ______” – fill in the blank with your favorite criticism: not valid, unfair, inaccurate, not representative, unstable, insufficient, not authentic, etc.  Now with the right cross of on this test, critics of large-scale assessment (or its uses) seek to nullify test scores by limiting their interpretation to that already weakened large-scale assessment.

Even the most well-designed assessment program can sustain only so many of these blows before collapsing in a heap to the canvas.


My first encounters with the on this test crowd occurred while working with two states setting achievement standards on their new college-and-career-readiness tests.  A vocal minority in both states were adamant that the phrase on this test should be added to each achievement level description. Their stated intent was to convey that students’ performance on the state assessment was not representative of their overall level of achievement.

My most recent encounter came late last year in a Stephen Sawchuk post in Edweek about the decision to add the modifier NAEP in front of the achievement level classifications on the National Assessment of Educational Progress; as in NAEP Basic, NAEP Proficient, NAEP Advanced. As stated in the post, “[t]he rewording may seem awfully minor to the uninitiated. But there’s a deeper subtext behind the changes, and that’s why this is worth noting.”

For its part, the National Assessment Governing Board (NAGB) argues that the addition of the NAEP modifier is intended to clarify that the NAEP Proficient level “is not intended to reflect ‘grade level’ performance expectations, which are typically defined normatively and can vary widely by state and over time. NAEP Proficient may convey a different meaning from other uses of the term ‘proficient’ in common terminology or in reference to other assessments.” Forgoing for now a discussion of whether the NAEP achievement levels are defined any more or less normatively than any other achievement levels, nobody can deny that achievement standards can and do vary widely by state and over time. Consequently, there is confusion when the same label Proficient is used across states and assessments to describe those varying standards.  From that perspective, the label NAEP Proficient serves the purpose of clearly identifying the set of achievement standards against which student performance is being judged.

For long-time critics of the NAEP achievement standards, however, the modifier is another weapon in their fight to marginalize NAEP results. The achievement level results no longer represent what proficient fourth or eighth grade students across the United States should know and be able to do; rather, they simply reflect NAEP Proficient – a mythical concept that is not tied to any state’s grade level standards and expectations.

As assessment/measurement specialists, our professional values and standards have made us unwitting accomplices in the effort to undermine large-scale assessment.  We agree with and/or can be quoted making statements such as

  • test scores should not be considered in isolation,
  • a student’s score on a given day or test might not reflect her/his true performance,
  • multiple measures should be used to evaluate student achievement, or
  • a test score reflects student performance on this test.

In the past, we mastered the art of expanding on those statements via PowerPoint bullets and charts to defend large-scale assessment with winning arguments before policy makers and the courts.  In this era of soundbites, tweets, and memes, however, we may never get that far.

With hubris, we attach a great deal of importance to our work and our high-quality assessments.  Remember, however, that without the ability to generalize student or school performance beyond a particular test we have nothing.  The task before us is clear; and if we envision a future in which large-scale assessment makes a valuable contribution to improving student learning, we must not fail on this test.

Look What You Made Me Do

A 2018 Blog Year in Review

Charlie DePascale


We have reached the end of 2018 and another year of posts on Embrace the Absurd. When I look back at the ten essays posted this year, I think that the phrase that best sums up this year of posts is look what you made me do – and not simply for the obligatory Taylor Swift reference.

A primary theme that ran across my posts this year is that we, as a field, may be just a tiny bit out of control; reactive rather than proactive; allowing ourselves to be defined by others; or perhaps overwhelmed by the moment.

I began 2018 with the post, Implausible Values, discussing the stress and strain being put on the field and our equating infrastructure by demands for shorter tests, alternate tests and adaptive forms, less standardization and more flexibility, more accuracy and precision, and immediate results.  I also wrote of the paradox of taking at least six months to produce results for a few NAEP tests and no more than six days to complete equating for a dozen state assessments.

NAEP returned as a topic in April with, If I Did It, a satirical treatment of the efforts to control mode effect and preserve the trend line in the reporting of the 2017 NAEP Reading and Mathematics state results; an effort which could serve as the poster child for our 2018 theme. I have little doubt that 2017 NAEP results will serve as a cautionary tale in educational policy and measurement courses for generations to come.

Across the year, a trio of posts addressed the broad issues of time, validity, and the essence of educational measurement.  In It’s About Time we address not only the lack of time mentioned above, but also the extent to which our measurements and interpretations are dependent upon and bound by time, and the growing need to incorporate time into our measurement models.  Bring Back Valid Tests addresses our ongoing struggle to develop an operational definition of validity.

In my 2016 NERA presidential address, Living in a Post-Validity World: Cleaning Up Our Messick, I argue that in the nearly 30 years since Messick’s 1989 chapter, we have wandered the desert searching for the Promised Land of a unified theory of validity.  As with many of the constructs that we attempt to measure, we still lack a clear understanding of validity; yet one of our guiding principles is that you have to understand and clearly define something before you can measure it.

This leads to my call for Rebranding Educational Measurement, with the argument that the field will be better served by not only acknowledging but also embracing the uncertainty in what we do. That post included this reminder from the 1951 first edition of Educational Measurement, “[t]he primary concern of measurement, however, should be for an understanding of the entire field of knowledge rather than with statistical or mathematical manipulations upon observations.”

A Year of Professional and Personal Journeys

2018 was also a year of personal and professional journeys. In Ten Years of Taylor I describe literal and figurative journeys with my daughter across ten years of Taylor Swift concerts from 2008 – 2018.  In My Miss Brooks I describe the 4th and 5th grade class that set me on this assessment/measurement journey nearly fifty years ago; and with the benefit of hindsight reflect on the high-stakes test that awaited at the end of those two years that may not have been as high-stakes as we thought at the time.

And throughout 2018, there were other journeys not noted directly in the blog, including the Red Sox eight-month, 119-win journey from Opening Day to their fourth World Series championship since 2004.  And now we have solid empirical evidence (n=2) that the Red Sox own the first 18 years of the century.

After 25 years of organizing regional conferences in small venues in places like Rocky Hill (CT), Springfield (MA), Buffalo, and Pittsburgh, in April 2018 I finally made it to the big time – a national conference on Broadway. Serving as 2018 NCME co-chairs, long-time friend and colleague April Zenisky and I were able to bring together past, present, and future leaders of our field to reflect on the past, present, and future of the field.

Outside of the conference, New York City brought feelings of awe when standing in the middle of Times Square at night or earlier that day sitting in front of a Renoir painting at the Metropolitan Museum of Art. That feeling was matched, if not surpassed, a month later driving through the mountains of Northern Utah and Southern Idaho on a Sunday afternoon with Maren Morris’ My Church on repeat on iTunes. And then on my first trip outside of North America, there was the incomparable and simply indescribable feeling standing in the middle of Anne Frank’s room in Amsterdam.

A New Year and New Beginnings

Today we look forward to a new year with new journeys, and new beginnings.  For the second time in my career it feels like we are on the cusp of a new era in K-12 assessment and educational measurement. Technology, personalized learning, big data, more complex and higher-order content standards, and a renewed interest in assessment in the classroom have created a perfect storm of challenges and opportunities for assessment and educational measurement. NCME has begun work on the fifth edition of Educational Measurement, which brings with it the opportunity to take the time needed to reflect on where the field is now, how it got here, and the directions it might, could, and dare I suggest, should go in the future.

So, as we begin 2019, let’s renew our commitment to keep the faith, fight the good fight, and as always, embrace the absurd.

A Letter to Santa



Dear Santa,

I am the next generation of large-scale assessment and I am 4 1/2 years old.  I have been very good this year. At least I have tried very hard to be good.  I have been reliable and fair. I think that I have been valid, but Uncle Steve says that’s not for me to decide. I have tried not to do things that I really shouldn’t do like evaluating teachers and promoting little kids from third grade to fourth grade.

Some of the bigger kids try to get me to play in their accountability games.  They like to do all sorts of strange things to my scores before they report them.  I am not even sure that what’s reported are even my scores anymore.  I tell them why can’t you just use percent proficient – everybody understands that.  Andy from across the street just laughs at me, “Ho, Ho, Ho”, and says looking at those percentages is like “viewing progress through a funhouse mirror.” My best friend Joey is even meaner.  He just runs around yelling, “Liar, Liar, Hair on Fire!” I don’t even know what that means.

Santa, it seems like people are always trying to change me.  They want me to be shorter, but they want five performance levels and subscores.  They want me to cost less, but they want to use authentic texts and measure high-level skills. They want me to tell them if kids are on track to be college- and career-ready and they don’t even know what that means.  I try to adapt, but it’s really hard.  You know, real people used to take such care in putting me together; now it seems algorithms just grab items off of a shelf like a shopper on Christmas Eve and like magic, Happy Birthday, a test is born ready to administer!  You know Santa, sometimes I don’t even feel like I am the same test when they put me on a computer.


I have to tell you Santa, I am a little worried about 2019.  Can you believe that in a couple of months I have to test NAEP Reading and Mathematics again?  It seems like they just reported results from my 2017 tests.  I hope that goes more smoothly this time around.

And then there are all of the things they are asking me to do to assess the next generation science standards.  There are just so many changes and things that have never been tried before. Everyone tells me I look phenomenal, but I am not so sure.

Does anyone really understand what the performance expectations mean?

Has anyone tried to define proficient performance on different combinations of performance expectations?

Has anyone even thought about what proficient performance across a whole science test is supposed to look like?

I am afraid that we might be putting the sleigh before the reindeer here, Santa.

I mean, what’s the rush? I would really hate for this to be the 1990s all over again – the last time they tried to introduce next generation assessments before they were ready.  A whole lot of promising young assessments were cut down before they reached their prime in that purge.

Santa, I can’t take another heartbreak. Lately it feels like everything I do turns into a disaster. I guess I really don’t know what large-scale testing is all about. Santa, isn’t there anyone who knows what large-scale testing is all about?

So Santa, if you can bring me only one gift this year it would be to help people remember the true meaning and purpose of large-scale assessment.  Help them understand where I fit within a coherent and balanced system of assessments.

I know that’s a lot to ask; but I believe, Santa.  I believe.




How Arne Works

Charlie DePascale

During my August trip to Minnesota I was able to check two books off of my summer reading list: Relativity – The Special and the General Theory by Albert Einstein and How Schools Work by Arne Duncan.  As the old joke goes, one was a book that asked me to rethink basic concepts and ideas long-held as fundamental truths, and the other was a book by Einstein.

I will attempt to reconcile Relativity and large-scale assessment in a later post.  Today’s post is devoted to my five takeaways from Arne Duncan and How Schools Work.


1. Lies and Incentives

“Education runs on lies.”  This is the first sentence of the first chapter titled Lies, Lies Everywhere.

The in-your-face focus on lies no longer has the same shock value that it did when most of us were introduced to Arne in 2009; no, not after eight years of life in the honesty gap that rolled into the current era of fake news and alternative facts.

What was surprising, however, was how freely he uses the word lie. In some circles, the word lie implies more than a simple departure from the “truth” or reality.  To say that a person has lied or is a liar suggests an intent to deceive or mislead. Arne, however, uses the word lie to describe a broad array of statements and actions that one might refer to as myths, misconceptions, misinterpretations, untested beliefs, or defense mechanisms.  In one example involving a Chicago principal, Arne begins the section stating, “One such principal told this lie directly to Mrs. Daley and me, and I’ll never forget it..”  He ends the same story about the same principal stating, “I loved Chester’s honesty throughout – first when he challenged Mrs. Daley and then when he told me he’d been mistaken about his kids.”

In the end, perhaps actions based on lies or misperceptions cause the same problems and have the same negative impact on children. If your role is to solve those problems, however, understanding whether you are dealing with a lie or a misperception should influence your approach to a solution.  And if you are counting on current teachers, administrators, and policy makers to be part of the solution, starting off by calling them liars might not be the best approach.

Incentive is another special word in the Arne lexicon.  Arne rightfully notes the importance for school improvement efforts to include incentives as well as the sticks associated with NCLB.  One example he offers of an incentive, however, is firing Chicago teachers caught cheating on a standardized test. I believe his argument is that the district ensuring that bad behavior is not rewarded is an incentive for the good behavior of all of the other teachers.  A second incentive he discusses is related to the teacher evaluation requirements associated with Race to the Top and the administration’s NCLB waivers.  I don’t know many teachers who viewed state-designed educator evaluation systems as an incentive.

You can only show me a stick and tell me it’s a carrot for so long before I figure out that’s a lie.

2. Story Driven

After reading How Schools Work, it is clear to me that Arne is story-driven.  By story-driven, I am not referring to the many stories that drive the narrative in How Schools Work.  Rather, I am referring to the concept of story-driven described by Bernadette Jiwa in her 2018 book Story Driven – you don’t need to compete when you know who you are.  Story-driven individuals and the organizations they lead have a “clear sense of purpose and identity” that defines and drives them.

Jiwa’s story-driven framework is defined by five words – Backstory, Values, Purpose, Vision, and Strategy. The backstory is our journey to now, which creates our values (guiding beliefs) and purpose (reason to exist).  In a story-driven organization, those are the forces that drive the organization’s vision (aspiration for the future) and strategy (align opportunities, plans, and behavior).

Arne’s backstory, which defines his identity, values, and purpose, is his experience growing up in Chicago with his mother’s inner-city after-school program.  As he describes it, the Chicago that he saw with her program was just two miles but a world away from the section of Chicago where he lived. That’s not a bad backstory for a U.S. Secretary of Education.

3. No place for states

Virtually my entire career has been spent closely connected to state departments of education, as an assessment contractor, an employee, and for the last 16 years as a consultant. It appears, however, that state departments play, at best, a minor supporting role in Arne’s world.  At worst, they are another one of the liars, a barrier to improving schools.

There are three direct references to state departments of education that stand out in the book.  The first is a reference to low achievement standards set on the Illinois state assessment; offered as a direct instance of the lies told to students and parents in Chicago and as a general example of the so-called  race to the bottom by states across the country as they prepared for NCLB accountability requirements.  In the second reference, a DOE official in New Jersey is simply a pawn in a story detailing how the arrogance/incompetence of the Christie administration led to the state not being awarded millions of dollars of Race to the Top funding.  The third was a reference to speaking with Connecticut’s “chief education officer” on the day of the Sandy Hook shooting in the emotional and powerful chapter on guns in schools and society.

I guess this should not be a surprise.  Arne made his mark at the district level and it is clear that his vocation is in schools.  He does acknowledge the role that strong (and weak) governors can play in improving education, but like many in education does not seem to have a handle on the role that a state department of education can and should play.

Can the department be more than simply an agent implementing the policies of the federal government, governor or state chief? Can a state department of education be a change agent on its own? It behooves those of us who have centered our careers at the state level to be proactive in answering that question.

4. Time Travel

Reading How Schools Work, I felt that I had traveled back in time.  It is the same feeling that I get when I read remarks from former President Obama; and I am sure I would feel the same if I spent the $500 for an Intimate Conversation with Michelle Obama.  It is the sense of hope and change that had me sitting in a storefront office in Portsmouth, New Hampshire in the summer and fall of 2007 updating databases and making phone calls for an upstart candidate for president.

Then I remember that it is 2018.  This group had their eight years in office.  Yes, they made some improvements, but they fell far short of achieving their vision.  I understand the obstacles in their way.  What I have not yet determined for myself is how hard they tried to overcome those obstacles. And then the frightening thought: if they did do their absolute best, what will it take, and how long will it take, to truly make a difference?

5. The Public School Model

It may be confirmation bias, but after reading How Schools Work I am convinced now more than ever that our public school model is not only broken but is outdated and is not something that we should try to repair.

To be clear, the ideal and concept of public education (i.e., the right of all to access a high-quality education) is as important as it ever was, arguably more important.

Also, there are fine schools and educators in suburbs, rural towns, and cities across the country where children are receiving a world-class education.

Our general model of K-12 public education, however, is broken at its core.  The funding model is not sustainable. We are well beyond the point where it is possible to fit the student-centered policies of the last 50 years into an educator-centered system.  We have burst through the age-based boundaries of the K-12 system at both ends and we long ago passed the point where the internal markers of grade levels have any meaning.

Everything in Arne’s book, from his mother’s after-school program to the foundation(s) he founded to his experiences in Chicago and USED to his plans for the future, tells us that we need a new model for public education.

Arne and many of the rest of us have spent our lives trying to improve education from within the current system.  Arne’s mother worked outside of the system – although not necessarily by choice. I think that it is time to abandon a K-12 system clinging to a past that no longer exists for a new system that reflects the present and anticipates the future.

My Miss Brooks

Charlie DePascale

Our Miss Brooks was a highly successful comedy series on radio and early television that followed the life and career of a fictional high school English teacher, Connie Brooks. My Miss Brooks, Ann Brooks, was a highly successful teacher of the fifth and sixth grade Advanced Work Class at the Mather School in Dorchester, Massachusetts when I entered her class in the fall of 1969.

The Advanced Work Class (AWC) was, and still is, a program within the Boston Public Schools “that provides an accelerated academic curriculum for highly motivated and academically capable students. Coursework is challenging, and performance standards are high.”  According to BPS and borne out by data, a major benefit of the program is “[s]tudents who successfully complete AWC are well prepared to compete for admission to the three BPS exam schools or to other accelerated programs.”

In my 1969 instantiation of the AWC, 20 students from elementary schools throughout Dorchester (the largest “neighborhood” in Boston) were selected to spend 5th and 6th grade in Miss Brooks’ class at the Mather School.  There may have been some testing involved in the selection process, perhaps including IQ testing, but I was unaware of that.

The class included 10 girls and 10 boys and we were diverse by Boston/Dorchester standards of the time; that is, there were students from Irish and Italian backgrounds (along with a few other ethnic groups) and the class was 90% white.  We were from a mix of blue- and white-collar middle class families. Almost all of the original 20 students completed the two years, but there were a couple of replacements along the way.

From the beginning of the fifth grade, the openly acknowledged goal was that at the end of the two-year program all of us would pass the entrance exam to one of the city’s Latin schools: Boston Latin School (aka Boys Latin) for the boys and Girls Latin for the girls.  (The two single-sex grade 7-12 Latin schools became coed as we were entering the eighth grade and remain separate coeducational schools today.)

Although passing the standardized, multiple-choice test administered in the spring of sixth grade was the goal, as I think back there is nothing that I recall from those two years that now would be considered test prep. I am certain that I am forgetting some things through the fog of 50 years.  Surely, we must have had some basic English and mathematics lessons.  There were quizzes, tests, grades, and lots of homework. Those things, however, were not what defined the class, and they are not what I remember from this pivotal time in my K-12 school career.

It was clear that this was going to be a different experience the moment we walked through the door of Room 8 at the Mather School. For the first time since kindergarten, this was not a classroom with rows of wooden desks bolted to the floor.  This room contained shiny modern desks that were arranged around the room in four u-shaped clusters of five, but could be easily rearranged or cleared away, when necessary – and there were plenty of times when it was necessary.  And then there was the first class activity.

A boy and girl were selected to stand at the front of the class, introduce themselves to each other and have a conversation.  Looking back, we could have gone with where did you go to school last year or what did you do this summer; and sure, we were just weeks removed from minor events like the first moon landing, Woodstock, and Chappaquiddick.  But, standing at the front of that class we had nothing but fidgeting and uncomfortable silence. I tried unsuccessfully for two years to talk to that little red-haired girl…  No wait, I was the one with red hair and that’s a different Charlie’s story.

Anyway, those first awkward conversations were just the beginning of two years of constant interacting, collaborating, performing, and celebrating with each other. The biggest event was the annual Christmas play our class performed; rehearsals throughout the fall culminating in two school-wide performances – for grades 1-3 and 4-6. These were full performances with hand-painted, wood-frame sets, costumes, and props.  Our 5th grade performance of Charles Dickens’ A Christmas Carol was followed in the 6th grade with the heart-wrenching The Birds’ Christmas Carol by Kate Douglas Wiggin. (It was at the class Christmas party following the 6th grade performance that I learned that Jeremiah was a bullfrog.)

In addition to the Christmas plays, other examples of special activities included:

  • Our class newspaper complete with school and local news, sports, entertainment, and comic sections. Mimeographed copies were widely distributed.
  • The Greek festival at the end of our unit on Ancient Greece where we made presentations, displayed the results of our efforts working with wet clay, and most of us had our first taste of feta cheese and baklava.
  • Our performance of Raindrops Keep Fallin’ On My Head, in costume, and in French at the annual schoolwide Mother and Daughter night.
  • Keeping with the French theme, the end-of-the-year French festival where we tried our hand at making various French dishes and produced a mimeographed collection of recipes. The French custard recipe became a Father’s Day tradition at our house.

All of those activities supplemented the discussions, collaborative projects, and presentations that were a regular part of our daily routine. And we constantly rearranged those desks into various small groups where we pushed, challenged, and supported each other.

In the spring of sixth grade we all passed the standardized entrance exam and were admitted into our respective Latin schools. And six (or seven) years later, most of us graduated from either Boston Latin School or the newly named Boston Latin Academy. We were prepared for the school and not simply for the test.

As I look back on it now, preparing us to succeed at the school was much more important than preparing us for the test because, in reality, the entrance exam was not a high-stakes or high-risk test for us. Yes, the Latin schools were selective and admission was competitive.  In the early 1970s, I estimate that there were about 10,000 sixth graders in the Boston Public Schools. If equally divided between boys and girls that would be 5,000 boys competing for the approximately 500 seventh grade seats at Boston Latin.

There was probably little doubt, however, that our carefully selected group of 10 boys would perform in the top 10% of BPS students on the entrance exam. What we didn’t understand at the time was that the hard part was staying in the school and graduating.  Although approximately 500 students entered in the seventh grade in 1971 and an additional batch of students entered our class in the ninth grade, our graduating class in 1977 had just over 200 students. During 7th grade orientation we received the Latin school version of look at the boy on your left and the boy on your right, two of you won’t be here by 12th grade.

In 1971, the entrance exam was a broad net, collecting three times as many students as would ultimately graduate.  Additional filtering was done at the school.   There was a large tolerance for selection error on the test.

The admissions math changed, of course, the very next year when the school became coed, potentially doubling the pool of applicants.  It changed again as enrollment in the Boston Public Schools dwindled and a much greater portion of the Latin School class came from private elementary schools. And at some point in the intervening years, the admissions philosophy changed.  The goal was to do what was necessary to ensure that all admitted students had the opportunity to make it to graduation.  Last year, Boston Latin had 417 seventh grade students and 412 twelfth grade students.
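For readers who like to see the arithmetic, here is a minimal sketch of the admissions math using the rough figures quoted above. The names are hypothetical, "just over 200" graduates is rounded to 200, every sixth-grade boy is assumed to be in the applicant pool, and the ninth-grade entrants are left out because I do not have a number for them.

```python
# Rough figures quoted in the post; all are the author's estimates.
BPS_SIXTH_GRADERS = 10_000   # estimated BPS sixth graders, early 1970s
BOYS_SHARE = 0.5             # assumes an even split between boys and girls
SEVENTH_GRADE_SEATS = 500    # approximate seventh grade seats at Boston Latin, 1971
GRADUATES_1977 = 200         # "just over 200", rounded down for illustration


def admission_rate(applicant_pool: float, seats: int) -> float:
    """Share of the applicant pool that can be offered a seat."""
    return seats / applicant_pool


def retention_rate(entrants: int, finishers: int) -> float:
    """Share of entrants who are still enrolled at the end."""
    return finishers / entrants


boys_pool = BPS_SIXTH_GRADERS * BOYS_SHARE
rate_1971 = admission_rate(boys_pool, SEVENTH_GRADE_SEATS)
retention_1971 = retention_rate(SEVENTH_GRADE_SEATS, GRADUATES_1977)

# Last year's grade 7 and grade 12 enrollments are different cohorts,
# so this ratio is only a rough proxy for retention.
retention_recent = retention_rate(417, 412)

print(f"1971 admission rate (boys): {rate_1971:.0%}")
print(f"1971 cohort retention, ignoring ninth-grade entrants: {retention_1971:.0%}")
print(f"Recent grade 12 / grade 7 enrollment: {retention_recent:.1%}")
```

Under these assumptions the test admitted roughly one boy in ten, and fewer than half of those admitted graduated, which is the sense in which the exam was a broad net with the real filtering done at the school.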

All of the changes described above raise the stakes associated with the entrance exam.  I wonder what the impact has been on the Advanced Work Class.


Recipe and Play Script