assessment, accountability, and other important stuff

A good day ruined

Charlie DePascale

After a wonderful late summer day spent enjoying a rare weekday afternoon baseball game in Boston, I sat down last night and looked at my Twitter feed.  There among the trending items was this headline

The University of Texas’s Secret Strategy to Keep Out Black Students

Without even clicking to look at the article I knew without a doubt that this ‘secret strategy’ had to involve standardized testing.  Is this how gun manufacturers, drug companies, and those e-cigarette people feel?

Standardized tests don’t kill people …

Where will this end?

Will NCME soon be thought of in the same way as the NRA – but you know, without the money, membership, or political clout? The president of AERA fired a direct shot at testing last spring with her presidential address, “An Inconvenient Truth About the New Jim Crow of Education” – a catchy title.

Will San Francisco lawmakers be next to set their sights on NCME – labeling it a domestic terrorist organization?  Will there be a resolution to block NCME from holding its 2020 conference in San Francisco?  If so, will AERA support it?  After all, it wouldn’t be the first time AERA moved a conference in California in the name of supporting a social cause.

What about New York, already regarded as the Lexington and Concord of the opt-out revolution?  Will the governor and legislature take aim at NCME and deny state funds to people doing business with the testing industry?  What will the pineapple say?

Where do we go from here?

Is there a way to stem this anti-testing tide and restore the shine to this field of ours? (dare I say, to make testing great again. no, I think I’ll pass on that.)

I can support increased background checks for all test users.  I am more skeptical about federal- or state-mandated bans or limits on state and local testing.  Perhaps those efforts can reduce the damage caused by high-stakes census testing, but an ill-conceived and poorly-developed teacher-made test in the hands of an inexperienced teacher can still cause a lot of harm one child at a time.

Standards, lists of best and fair testing practices, and policy statements are necessary, but not sufficient.  I am pretty sure that I already read somewhere that it’s not acceptable practice to base a high-stakes decision on a single test score.

Improving the assessment literacy of all involved in the testing process including students, teachers, policy makers, the media, the general public, and psychometricians is a good place to start.  I know some folks in New Hampshire who are doing some nice work in that area. (sorry, no names or links.  need to keep that personal/professional firewall intact.)

Improved assessment literacy, of course, won’t stop people who want to use testing for unsavory or evil purposes from doing their thing.  It might, however, make others more aware of when testing is being used to do harm; and make them more likely to speak out; and make them less easily persuaded to opt out.

The trend toward moving the locus of assessment from the state house to the classroom also seems like a step in the right direction. Not only will that put actionable information in the hands of teachers, where it belongs; that shift will help eliminate problems like the one faced by Massachusetts last spring. As we have known for a long time, passage-based triggers are much easier to avoid in the classroom than on a state assessment.

Improving local assessment is a step, however, that will require tremendous investment in the infrastructure of teacher preparation programs and schools.  Remember that one of the big advantages of large-scale assessment is that it’s cheap and doesn’t require much training of teachers and school administrators.

Advances in instructional and assessment technology, personalized learning systems, and modeling based on a much broader base of data than test scores and student attendance also hold a great deal of promise, but they will not be without their own technical and social challenges.

I look back fondly on the days when we were accused of trying to peek into family life with our student survey questions on how much time was spent watching television each night or whether the student had a part-time job; of trying to brainwash students with our questions about the environment; or of simply trying to get students to fail with our trick questions that contained plausible distractors.  So, yes, tracking and developing psychometric models for a student’s eye movement, heart rate, and other signals of student engagement might face some resistance.

For now, however, I will go back to Twitter and read about climate change.  That ‘inconvenient truth’ reference aside, I am pretty sure testing isn’t being blamed for climate change – well, not yet – or maybe I’ll just listen to some music.



yes no


Charlie DePascale

Now that the administration has dropped efforts to include a citizenship question on the 2020 Census, perhaps there is space on the form for the proficiency question, “Is this person college-and-career ready?”  For persons 18 and under, the question would be, “Is this person on track to college-and-career readiness?”

Think about it. We ask the question in April 2020 and by December 31st we have a national count of the number of college-and-career ready residents in the United States.  By March 31, 2021 we have state-level counts disaggregated by race, ethnicity, and other key demographic factors.  In about the same amount of time that it took to produce the 2017 NAEP Reading and Mathematics results, we would have proficiency information for the entire U.S. resident population instead of the small portion of the population captured by that ill-defined social construct, grade level.

The federal government could then make decisions about how much money to allocate to programs designed to improve college-and-career readiness and how best to distribute that funding across the states (just as they do with other information collected through the Census). States could begin to redesign their early childhood, K-12, postsecondary, and adult education programs to better meet the needs of their residents (just as they do with other information collected through the Census).

I bet that you are thinking, but Charlie, just how accurate could that self-reported information possibly be?  Well, you see, accuracy is a funny concept; it’s one of those eye of the beholder, depends on what the meaning of “is” is type of things.  Would a U.S. Census count of college-and-career readiness be any more or less accurate than the differences we have had in proficiency estimates among the 50 states or between states and NAEP?  Would the actions triggered by a U.S. Census count of college-and-career readiness be any more or less appropriate than actions resulting from the wide variations across states in the percentage of schools identified for support and improvement under ESSA accountability systems?

Or perhaps you are thinking, but Dr. DePascale – psychometrician – a single Census question on college-and-career readiness is not measurement.  Where are the “big 5” sources of validity evidence?  Where are the external alignment studies?  Where is the USED Peer Review?

All valid points, but here’s the thing: federal and state assessment policy has never been about measurement.  As I have argued in previous posts, determining the percentage of students in a state who have met minimal competency standards, attained proficiency, are on track to college-and-career readiness, or have made progress from fall to spring is now, always has been, and always will be, at its core, a data collection problem and not a measurement problem.

At one time, the most efficient and accurate way to solve that data collection problem was with a large-scale state assessment; that is, with a short, on demand, machine-scored test administered to students in the general education program at selected grade levels.  But that time was a long time ago.  Policies and laws on inclusion changed.   The student population became more diverse. Content and performance standards became more rigorous and complex.

Thought experiment: Imagine you have placed a group of experts in a room (or even a group of testing company psychometricians) and tasked them with coming up with the most efficient and effective way to determine the number or percentage of students in a state who are on track to college-and-career readiness or the number and percentage of high school graduates who are college-and-career ready.  If you don’t like a group of experts, you can crowd-source the task or use artificial intelligence to solve it.

Whatever approach you take, it is highly unlikely that the solution that is generated will be a single, on demand, end-of-year, state assessment.  If you expand the task to determining the number for the country rather than a single state, I guarantee that the solution will not be 40-50 unique state assessments.

The solution may include a limited amount of state and federal assessment (e.g., something like NAEP), but it is virtually certain that the solution will be more centered on quality data collection than on high quality assessment; and if we are looking for a data collection solution, what better place to begin than the U.S. Census Bureau.  Their self-described mission is “to serve as the nation’s leading provider of quality data about its people” with the goal “to provide the best mix of timeliness, relevancy, quality and cost for the data we collect and services we provide.”  Does any state department of education assessment or any testing company claim the same mission and goal?  Would we want them to?

Where would we begin?

So, with the proficiency question on the 2020 Census where would we begin to ensure the most accurate count possible?  The first step would probably be to develop a common definition of college-and-career readiness that we want people to use when answering the question.  The next step might be a public education campaign to get the public on board with the importance of collecting the information. That campaign undoubtedly would include clear descriptions and real-life examples of college-and-career readiness or of being on track to college-and-career readiness – descriptions that people can easily grasp and apply to themselves and the people in their home.

Now you may be asking yourself, aren’t those the same things that we should do when introducing a new set of content standards or a new assessment program?  The answer, of course, is yes; but often those steps are forgotten or are given insufficient attention and resources when the focus is on building a better assessment or accountability system rather than on collecting better data.

There have been efforts at such public relations campaigns in the past, and they have been somewhat successful.  When the MCAS tests and new performance standards were introduced in Massachusetts in the late 1990s, “What Does Proficient Look Like” workshops were held in communities across the state and “Test Yourself” brochures were distributed at toll booths, grocery stores, and public libraries.  When the Common Core State Standards were introduced, it was impossible to watch a professional golf tournament on network television without seeing a “Support the Common Core” commercial sponsored by EXXON or some other major corporation (yes, that sentence was intentionally Bidenesque).

Massachusetts no longer has toll booths, people buy groceries online, and public libraries are being repurposed to meet the changing needs of communities.  Women’s soccer matches may be a better option than professional golf tournaments for spending advertising dollars (at least every four years).  Yes, the medium will change, but the message and the need for the message remains the same.

We can develop the best large-scale assessment ever imagined; but at the end of the day and at the end of the school year, if every teacher, parent, and student cannot give an accurate answer to the question “Is this person on track to college-and-career readiness?” without looking at a score on a state assessment, what have we really accomplished?

Gold Standard?




Charlie DePascale

Disclaimer:  I did not have access to my laptop and was forced to prepare this post on my Surface tablet. I apologize in advance for any effect that had on the length or quality of the post.

By any metric, 2017 was, and continues to be, a very bad year for NAEP.  Troubles began in April 2018 with the utter fiasco that was the long-delayed release of the 2017 Reading and Mathematics results.  Seldom in the course of human history have so many good statistics been sacrificed in the name of preserving an illusory trend line.  Then late last week came the announcement that no amount of statistical sleight of hand could save the results from the 2017 Writing assessment.  (And by announcement, I mean burying information deep on a website on the first Friday of summer that no results were forthcoming and that a more detailed report would be available in the spring of 2020.)

Perhaps, however, the worst news for NAEP was that when they announced they were not releasing results from a major assessment, few people noticed and fewer people cared.

Where does all of this leave NAEP as we await the results from the 2019 NAEP Reading and Mathematics assessments?

As I started to write this post, I will admit that I was feeling a bit cynical toward NAEP and my original title was ‘Gold Standard, My Ass!’

But then I thought, who among us hasn’t wanted just 3 more days or even 3 more hours to figure out what the hell was going on as we tried to equate writing results across years. If NAEP can lead the way and establish 3 years as an acceptable time frame, more power to them.

Plus, there have been plenty of times in the last 30 years when I have had to help state leaders make the hard decision between making necessary changes to their Reading and Mathematics tests or preserving their reporting scale and trend line.  If NAEP can lead the way on having your cake and eating it, too, pass me another slice.

And why should a state go through all of those hoops to convince USED that it really did administer a test and hold all students accountable this year if it is possible to just decide not to report results?  Be that shining light in the darkness, NAEP! We will follow!

Still not totally sure which direction I should go with this post, I thought a little bit more about the term ‘gold standard’ and what it represents.

There are several things about NAEP that do make it a symbol of the ideal in large-scale assessment:

  • Testing periodically rather than every year
  • Testing intermittently across grade levels rather than at every grade level
  • Testing samples of students rather than all students
  • Using matrix sampling to improve the sampling of content on each assessment
  • Separate scaling of domain areas so that subscores might actually be useful
  • Demonstrating a total disdain for deadlines in the name of getting it right
  • The willingness to serve as an example of how difficult it is to set meaningful performance standards on a large-scale assessment

Those things should be more than enough to offset the lack of transparency and no individual student scores and establish NAEP as an ideal; that is, a gold standard.

The final thing that turned me around on NAEP as a gold standard, however, was remembering that the gold standard is an antiquated monetary concept that was abandoned by virtually all nations decades ago; it is an anachronism that simply no longer works in the real world.

So NAEP, I owe you an apology.  Feel free to hold firm as the old white male of assessments in the changing world of 2019.  In so many ways, you are and will always be the gold standard.

Charlie DePascale

This year marks the 25th anniversary of the 1994 reauthorization of ESEA, known as the Improving America’s Schools Act (IASA).  Throughout the year, I will explore how various aspects of that law shaped my career, educational assessment and accountability, and K-12 education, in general. All of this will be done, of course, with an eye toward the next reauthorization of ESEA and the future of K-12 assessment and accountability.

As we begin the year, however, let’s just take a few minutes to refresh our memories on the thoughts about equity, excellence, and education that drove the 1994 law. Sometimes it’s not necessary to write anything new.  The words speak for themselves. I call particular attention to the middle section titled, What Has Been Learned Since 1988.






‘‘(1) IN GENERAL.—The Congress declares it to be the policy of the United States that a high-quality education for all individuals and a fair and equal opportunity to obtain that education are a societal good, are a moral imperative, and improve the life of every individual, because the quality of our individual lives ultimately depends on the quality of the lives of others.

‘‘(2) ADDITIONAL POLICY.—The Congress further declares it to be the policy of the United States to expand the program authorized by this title over the fiscal years 1996 through 1999 by increasing funding for this title by at least $750,000,000 over baseline each fiscal year and thereby increasing the percentage of eligible children served in each fiscal year with the intent of serving all eligible children by fiscal year 2004.

‘‘(b) RECOGNITION OF NEED.—The Congress recognizes that—

‘‘(1) although the achievement gap between disadvantaged children and other children has been reduced by half over the past two decades, a sizable gap remains, and many segments of our society lack the opportunity to become well educated;

‘‘(2) the most urgent need for educational improvement is in schools with high concentrations of children from low income families and achieving the National Education Goals will not be possible without substantial improvement in such schools;

‘‘(3) educational needs are particularly great for low-achieving children in our Nation’s highest-poverty schools, children with limited English proficiency, children of migrant workers, children with disabilities, Indian children, children who are neglected or delinquent, and young children and their parents who are in need of family-literacy services;

‘‘(4) while title I and other programs funded under this Act contribute to narrowing the achievement gap between children in high-poverty and low-poverty schools, such programs need to become even more effective in improving schools in order to enable all children to achieve high standards; and

‘‘(5) in order for all students to master challenging standards in core academic subjects as described in the third National Education Goal described in section 102(3) of the Goals 2000: Educate America Act, students and schools will need to maximize the time spent on teaching and learning the core academic subjects.

‘‘(c) WHAT HAS BEEN LEARNED SINCE 1988.—To enable schools to provide all children a high-quality education, this title builds upon the following learned information:

‘‘(1) All children can master challenging content and complex problem-solving skills. Research clearly shows that children, including low-achieving children, can succeed when expectations are high and all children are given the opportunity to learn challenging material.

‘‘(2) Conditions outside the classroom such as hunger, unsafe living conditions, homelessness, unemployment, violence, inadequate health care, child abuse, and drug and alcohol abuse can adversely affect children’s academic achievement and must be addressed through the coordination of services, such as health and social services, in order for the Nation to meet the National Education Goals.

‘‘(3) Use of low-level tests that are not aligned with schools’ curricula fails to provide adequate information about what children know and can do and encourages curricula and instruction that focus on the low-level skills measured by such tests.

‘‘(4) Resources are more effective when resources are used to ensure that children have full access to effective high-quality regular school programs and receive supplemental help through extended-time activities.

‘‘(5) Intensive and sustained professional development for teachers and other school staff, focused on teaching and learning and on helping children attain high standards, is too often not provided.

‘‘(6) Insufficient attention and resources are directed toward the effective use of technology in schools and the role technology can play in professional development and improved teaching and learning.

‘‘(7) All parents can contribute to their children’s success by helping at home and becoming partners with teachers so that children can achieve high standards.

‘‘(8) Decentralized decisionmaking is a key ingredient of systemic reform. Schools need the resources, flexibility, and authority to design and implement effective strategies for bringing their children to high levels of performance.

‘‘(9) Opportunities for students to achieve high standards can be enhanced through a variety of approaches such as public school choice and public charter schools.

‘‘(10) Attention to academics alone cannot ensure that all children will reach high standards. The health and other needs of children that affect learning are frequently unmet, particularly in high-poverty schools, thereby necessitating coordination of services to better meet children’s needs.

‘‘(11) Resources provided under this title can be better targeted on the highest-poverty local educational agencies and schools that have children most in need.

‘‘(12) Equitable and sufficient resources, particularly as such resources relate to the quality of the teaching force, have an integral relationship to high student achievement.

‘‘(d) STATEMENT OF PURPOSE.—The purpose of this title is to enable schools to provide opportunities for children served to acquire the knowledge and skills contained in the challenging State content standards and to meet the challenging State performance standards developed for all children. This purpose shall be accomplished by—

‘‘(1) ensuring high standards for all children and aligning the efforts of States, local educational agencies, and schools to help children served under this title to reach such standards;

‘‘(2) providing children an enriched and accelerated educational program, including, when appropriate, the use of the arts, through schoolwide programs or through additional services that increase the amount and quality of instructional time so that children served under this title receive at least the classroom instruction that other children receive;

‘‘(3) promoting schoolwide reform and ensuring access of children (from the earliest grades) to effective instructional strategies and challenging academic content that includes intensive complex thinking and problem-solving experiences;

‘‘(4) significantly upgrading the quality of instruction by providing staff in participating schools with substantial opportunities for professional development;

‘‘(5) coordinating services under all parts of this title with each other, with other educational services, and, to the extent feasible, with health and social service programs funded from other sources;

‘‘(6) affording parents meaningful opportunities to participate in the education of their children at home and at school;

‘‘(7) distributing resources, in amounts sufficient to make a difference, to areas and schools where needs are greatest;

‘‘(8) improving accountability, as well as teaching and learning, by using State assessment systems designed to measure how well children served under this title are achieving challenging State student performance standards expected of all children; and

‘‘(9) providing greater decisionmaking authority and flexibility to schools and teachers in exchange for greater responsibility for student performance.






Three Little Words


Charlie DePascale

Life is full of three-word phrases.

Some tend to have profound and lasting consequences that extend far beyond what may have been intended when they were uttered.  Phrases such as I Love You, That Looks Safe, and for those among us wavering on new year’s resolutions, Just One Bite might fall into this category.

Other ubiquitous three-word phrases like While Supplies Last, Limited Time Offer, Exclusions May Apply, and Void Where Prohibited function exactly as intended; even if we are usually not happy to see them.  Often hidden in the fine print, their sole purpose is to put constraints on an offer or claim that is being made.

In the last couple of years, a three-word phrase has begun to make its way into the assessment lexicon – on this test.  At first glance, the phrase, or a close variation of it, seems neither new nor threatening when used to describe student or group performance. Charlie spelled 23 words correctly on this week’s spelling test. Karla met the college readiness benchmark on the SAT. In Vermont, 42% of grade 5 students performed at the Proficient level or higher on the Smarter Balanced mathematics test. Taken at face value, the phrase is used simply to identify the test that was taken.

Recent use of this common phrase, however, is intended to do much more than identify the source of performance.  Its purpose is to limit interpretation of student or school performance; to make it clear that the performance should be interpreted within the specific framework of the test or testing program.

Again, at first glance, we might regard this use of the phrase as innocuous or perhaps even a step forward in test use and interpretation.  Identifying the source of a test score seems quite consistent with many of our Standards for Educational and Psychological Testing, beginning with Standard 1.0:

Clear articulation of each intended test score interpretation for a specified use should be set forth, and appropriate validity evidence in support of each intended interpretation should be provided.

When considered as part of a larger effort to marginalize and vilify large-scale assessment, however, the connotation of the phrase on this test changes dramatically. It is the second punch in a one-two combination intended to knock out large-scale assessment.  The left jab that has weakened the credibility of large-scale assessment is the argument “Scores on large-scale assessments are ______” – fill in the blank with your favorite criticism: not valid, unfair, inaccurate, not representative, unstable, insufficient, not authentic, etc.  Now with the right cross of on this test, critics of large-scale assessment (or its uses) seek to nullify test scores by limiting their interpretation to that already weakened large-scale assessment.

Even the most well-designed assessment program can sustain only so many of these blows before collapsing in a heap to the canvas.


My first encounters with the on this test crowd occurred while working with two states setting achievement standards on their new college-and-career-readiness tests.  A vocal minority in both states were adamant that the phrase on this test should be added to each achievement level description. Their stated intent was to convey that students’ performance on the state assessment was not representative of their overall level of achievement.

My most recent encounter came late last year in a Stephen Sawchuk post in Edweek about the decision to add the modifier NAEP in front of the achievement level classifications on the National Assessment of Educational Progress; as in NAEP Basic, NAEP Proficient, NAEP Advanced. As stated in the post, “[t]he rewording may seem awfully minor to the uninitiated. But there’s a deeper subtext behind the changes, and that’s why this is worth noting.”

For their part, the National Assessment Governing Board (NAGB) makes the argument that the addition of the NAEP modifier is intended to clarify that the NAEP Proficient level, “is not intended to reflect ‘grade level’ performance expectations, which are typically defined normatively and can vary widely by state and over time. NAEP Proficient may convey a different meaning from other uses of the term ‘proficient’ in common terminology or in reference to other assessments.” Forgoing for now a discussion of whether the NAEP achievement levels are defined any more or less normatively than any other achievement levels, nobody can deny that achievement standards can and do vary widely by state and over time. Consequently, there is confusion when the same label Proficient is used across states and assessments to describe those varying standards.  From that perspective, the label NAEP Proficient serves the purpose of clearly identifying the set of achievement standards against which student performance is being judged.

For long-time critics of the NAEP achievement standards, however, the modifier is another weapon in their fight to marginalize NAEP results. The achievement level results no longer represent what proficient fourth or eighth grade students across the United States should know and be able to do; rather, they simply reflect NAEP Proficient – a mythical concept that is not tied to any state’s grade level standards and expectations.

As assessment/measurement specialists, our professional values and standards have made us unwitting accomplices in the effort to undermine large-scale assessment.  We agree with and/or can be quoted making statements such as

  • test scores should not be considered in isolation,
  • a student’s score on a given day or test might not reflect her/his true performance,
  • multiple measures should be used to evaluate student achievement, or
  • a test score reflects student performance on this test.

In the past, we mastered the art of expanding on those statements via PowerPoint bullets and charts to defend large-scale assessment with winning arguments before policy makers and the courts.  In this era of soundbites, tweets, and memes, however, we may never get that far.

With hubris, we attach a great deal of importance to our work and our high-quality assessments.  Remember, however, that without the ability to generalize student or school performance beyond a particular test we have nothing.  The task before us is clear; and if we envision a future in which large-scale assessment makes a valuable contribution to improving student learning, we must not fail on this test.

Look What You Made Me Do

A 2018 Blog Year in Review

Charlie DePascale


We have reached the end of 2018 and another year of posts on Embrace the Absurd. When I look back at the ten essays posted this year, I think that the phrase that best sums up this year of posts is look what you made me do – and not simply for the obligatory Taylor Swift reference.

A primary theme that ran across my posts this year is that we, as a field, may be just a tiny bit out of control; reactive rather than proactive; allowing ourselves to be defined by others; or perhaps overwhelmed by the moment.

I began 2018 with the post, Implausible Values, discussing the stress and strain being put on the field and our equating infrastructure by demands for shorter tests, alternate tests and adaptive forms, less standardization and more flexibility, more accuracy and precision, and immediate results.  I also wrote of the paradox of taking at least six months to produce results for a few NAEP tests and no more than six days to complete equating for a dozen state assessments.

NAEP returned as a topic in April with, If I Did It, a satirical treatment of the efforts to control mode effect and preserve the trend line in the reporting of the 2017 NAEP Reading and Mathematics state results; an effort which could serve as the poster child for our 2018 theme. I have little doubt that 2017 NAEP results will serve as a cautionary tale in educational policy and measurement courses for generations to come.

Across the year, a trio of posts addressed the broad issues of time, validity, and the essence of educational measurement.  In It’s About Time, I addressed not only the lack of time mentioned above, but also the extent to which our measurements and interpretations are dependent upon and bound by time, and the growing need to incorporate time into our measurement models.  Bring Back Valid Tests addressed our ongoing struggle to develop an operational definition of validity.

In my 2016 NERA presidential address, Living in a Post-Validity World: Cleaning Up Our Messick, I argued that in the nearly 30 years since Messick’s 1989 chapter, we have wandered the desert searching for the Promised Land of a unified theory of validity.  As with many of the constructs that we attempt to measure, we still lack a clear understanding of validity; yet one of our guiding principles is that you have to understand and clearly define something before you can measure it.

This led to my call for Rebranding Educational Measurement, with the argument that the field will be better served by not only acknowledging, but also embracing, the uncertainty in what we do. That post included this reminder from the 1951 first edition of Educational Measurement: “[t]he primary concern of measurement, however, should be for an understanding of the entire field of knowledge rather than with statistical or mathematical manipulations upon observations.”

A Year of Professional and Personal Journeys

2018 was also a year of personal and professional journeys. In Ten Years of Taylor I describe literal and figurative journeys with my daughter across ten years of Taylor Swift concerts from 2008 to 2018.  In My Miss Brooks I describe the 4th and 5th grade class that set me on this assessment/measurement journey nearly fifty years ago and, with the benefit of hindsight, reflect on the high-stakes test that awaited at the end of those two years; a test that may not have been as high-stakes as we thought at the time.

And throughout 2018, there were other journeys not noted directly in the blog, including the Red Sox eight-month, 119-win journey from Opening Day to their fourth World Series championship since 2004.  And now we have solid empirical evidence (n=2) that the Red Sox own the first 18 years of the century.

After 25 years of organizing regional conferences in small venues in places like Rocky Hill (CT), Springfield (MA), Buffalo, and Pittsburgh, in April 2018 I finally made it to the big time – a national conference on Broadway. Serving as 2018 NCME conference co-chairs, long-time friend and colleague April Zenisky and I were able to bring together past, present, and future leaders of our field to reflect on the past, present, and future of the field.

Outside of the conference, New York City brought feelings of awe when standing in the middle of Times Square at night or, earlier that day, sitting in front of a Renoir painting at the Metropolitan Museum of Art. That feeling was matched, if not surpassed, a month later driving through the mountains of Northern Utah and Southern Idaho on a Sunday afternoon with Maren Morris’ My Church on repeat on iTunes. And then on my first trip outside of North America, there was the incomparable and simply indescribable feeling of standing in the middle of Anne Frank’s room in Amsterdam.

A New Year and New Beginnings

Today we look forward to a new year with new journeys, and new beginnings.  For the second time in my career it feels like we are on the cusp of a new era in K-12 assessment and educational measurement. Technology, personalized learning, big data, more complex and higher-order content standards, and a renewed interest in assessment in the classroom have created a perfect storm of challenges and opportunities for assessment and educational measurement. NCME has begun work on the fifth edition of Educational Measurement, which brings with it the opportunity to take the time needed to reflect on where the field is now, how it got here, and the directions it might, could, and dare I suggest, should go in the future.

So, as we begin 2019, let’s renew our commitment to keep the faith, fight the good fight, and as always, embrace the absurd.

A Letter to Santa



Dear Santa,

I am the next generation of large-scale assessment and I am 4 1/2 years old.  I have been very good this year. At least I have tried very hard to be good.  I have been reliable and fair. I think that I have been valid, but Uncle Steve says that’s not for me to decide. I have tried not to do things that I really shouldn’t do like evaluating teachers and promoting little kids from third grade to fourth grade.

Some of the bigger kids try to get me to play in their accountability games.  They like to do all sorts of strange things to my scores before they report them.  I am not even sure that what’s reported are my scores anymore.  I ask them, “Why can’t you just use percent proficient? Everybody understands that.” Andy from across the street just laughs at me, “Ho, Ho, Ho”, and says looking at those percentages is like “viewing progress through a funhouse mirror.” My best friend Joey is even meaner.  He just runs around yelling, “Liar, Liar, Hair on Fire!” I don’t even know what that means.

Santa, it seems like people are always trying to change me.  They want me to be shorter, but they want five performance levels and subscores.  They want me to cost less, but they want to use authentic texts and measure high-level skills. They want me to tell them if kids are on track to be college- and career-ready and they don’t even know what that means.  I try to adapt, but it’s really hard.  You know, real people used to take such care in putting me together; now it seems algorithms just grab items off a shelf like a shopper on Christmas Eve and, like magic, Happy Birthday, a test is born ready to administer!  You know Santa, sometimes I don’t even feel like I am the same test when they put me on a computer.


I have to tell you Santa, I am a little worried about 2019.  Can you believe that in a couple of months I have to test NAEP Reading and Mathematics again?  It seems like they just reported results from my 2017 tests.  I hope that goes more smoothly this time around.

And then there are all of the things they are asking me to do to assess the Next Generation Science Standards.  There are just so many changes and things that have never been tried before. Everyone tells me I look phenomenal, but I am not so sure.

Does anyone really understand what the performance expectations mean?

Has anyone tried to define proficient performance on different combinations of performance expectations?

Has anyone even thought about what proficient performance across a whole science test is supposed to look like?

I am afraid that we might be putting the sleigh before the reindeer here, Santa.

I mean, what’s the rush? I would really hate for this to be the 1990s all over again – the last time they tried to introduce next generation assessments before they were ready.  A whole lot of promising young assessments were cut down before they reached their prime in that purge.

Santa, I can’t take another heartbreak. Lately it feels like everything I do turns into a disaster. I guess I really don’t know what large-scale testing is all about. Santa, isn’t there anyone who knows what large-scale testing is all about?

So Santa, if you can bring me only one gift this year it would be to help people remember the true meaning and purpose of large-scale assessment.  Help them understand where I fit within a coherent and balanced system of assessments.

I know that’s a lot to ask; but I believe, Santa.  I believe.