Papers

Selected papers, book chapters, presentations, and publications
Overcoming Barriers to School-based, Large-Scale Assessment with the Support of Artificial Intelligence Tools (2025)

Despite decades of technological advances and calls for more authentic assessment, large-scale assessment in elementary and secondary schools in the United States remains characterized by infrequent, external, on-demand, standardized tests consisting primarily of selected-response items. It could even be argued that advances in technology that have facilitated the growth of computer-based and computer-adaptive testing have exacerbated the issue by being more focused on increasing efficiency than on enhancing authenticity, relevance, and utility. In many ways, the type of state-supported, school-based and curriculum-embedded authentic assessment of higher order thinking and 21st century skills envisioned in the 1990s seems farther from reality than ever before. The advent of artificial intelligence tools that can be applied to support both assessment and instruction, however, offers the promise of overcoming the barriers to state-supported, school-based, large-scale assessment.

This paper was presented at the 8th International Association for Innovations in Educational Assessment annual conference.

The Time Trap: Why it’s misguided to report state assessment results as “years of learning” (2024)

with Damian Betebenner

In the wake of the COVID-19 pandemic, educators and policymakers have scrambled to assess the impact on student learning. Popular metrics that have gained traction are the notions of “years of learning lost” or “months behind,” which attempt to quantify the educational setbacks caused by the pandemic. The allure of these time-based metrics is understandable; they provide a seemingly straightforward way to communicate the magnitude of learning loss to non-technical audiences. But beneath their simplicity lies a complex web of assumptions, statistical manipulations, and, ultimately, misleading conclusions that may do more harm than good.

This paper, presented at NCME 2024, delves into the flaws of time-based metrics and argues for a more meaningful approach to measuring student progress. It challenges the assumption that more time automatically equates to more learning and advocates for assessments that provide richer, content-based insights into student abilities and needs. By critically examining the current narrative, the paper seeks to inspire educators and policymakers to reconsider how we measure and respond to educational challenges in a post-pandemic world.

Assessment to Inform Teaching and Learning (2024)

with Susan Brookhart

Focusing on assessment of what students know and can do in relation to school learning goals with the intent of (a) informing teacher instructional planning and instructional moves and (b) informing students’ thinking and understanding during classroom lessons, this chapter reviews the purpose and uses, design, measurement considerations, implementation in practice, and impact on learning of eight types of assessment. The review supports the thesis that in recent years, expectations have risen that all assessments—not just those designed to be used formatively—should inform teaching and learning. The nature of assessment information is increasingly broad. Its utility depends on both the quality of information and the quality of the process with which the assessment is implemented. A corollary of this trend is increasing demand on the level of teacher and student assessment literacy. From this foundation, future trends in assessment to inform teaching and learning are identified.

Pre-print of a chapter from the fifth edition of Educational Measurement (expected publication date, 2025).

Teaching Literacy – A Holistic Reframing of Teacher Assessment Literacy (2021)

There has been longstanding and widespread agreement that some degree of assessment literacy is an essential component of effective instruction. It is also universally acknowledged that efforts to enhance teachers’ assessment literacy historically have been inadequate and largely unsuccessful. As the concept of assessment literacy has evolved, recent efforts are much more focused on the use of assessment by teachers within the context of instruction. To a certain extent, however, there is a lingering perspective that assessment is important to support instruction, but is different or separate from instruction. At the same time, the terms instruction and teaching have become synonymous. In this paper, I propose a more holistic perspective of assessment as an inseparable component of teaching, that is, there can be no teaching without assessment. From that perspective, teacher assessment literacy can best be viewed as teaching literacy. In the final sections of the paper, implications for supporting teachers’ interpretation and use of external, large-scale test results are discussed along with recommendations for reporting results from large-scale tests in a way that supports teaching literacy.

A Brief History of Innovation in Educational Assessment – Bursting the Bubble (2021)

Invited presentation made as part of the opening session of the Center for Assessment’s virtual RILS conference on Design Innovations in Educational Assessment Systems.
Act 1 – A discussion of innovation v. invention and a comparison of innovation in assessment with changes in how we have purchased and listened to music in my lifetime.
Act 2 – The Good, The Bad, and The Ugly of attempts to innovate in educational assessment over the past three decades.
Act 3 – While innovation in “real life” is usually associated with making life easier, innovation in educational assessment has usually been linked to raising standards, higher stakes, and more complex assessments.

State Assessment and High School – A square peg for a round hole (2020)

Like the proverbial square pegs and round holes, some things just don’t quite fit well together. Over the past twenty-five years, it has become clear that high school and state assessment fall into that category. The American concept of the comprehensive high school has been structured around students pursuing a variety of pathways to diverse postsecondary destinations. State assessment has been structured around the concept of all students traveling the same route at the same rate; arriving at a common destination at the same time. It should not come as a surprise, therefore, that when the irresistible force of high school meets the immovable object that is USED the result is nothing more than raised temperatures and a lot of wasted energy. That is not to say that there is no role for state-sponsored or even state-mandated assessment in high school. As with any use of assessment, the key is determining how best to use assessment in a way that is consistent with, and ideally even advances, the purposes and goals of high school.

Comparability of Individual Students’ Scores on the “Same Test” (2020)
with Brian Gong

Chapter in National Academy of Education publication Comparability of Large-Scale Assessments: Issues and Recommendations (Eds: Berman, Haertel, & Pellegrino). In large-scale assessments, individual student test scores on the same test are expected to be comparable, but meeting this goal is challenging. The challenge is exacerbated in large-scale K–12 testing because the term “same test” refers to various cases in which students may take different sets of items under different conditions. This chapter addresses how to evaluate whether comparability across conditions is sufficient to support a particular inference or test use. Common threats to comparability arise from a lack of attention to design decisions and psychometric procedures. There are also external threats that might affect the accuracy and/or interpretation of students’ scores. Students’ opportunity to learn (OTL) the content assessed and familiarity with the item formats and tools used on the assessment are two types of comparability threats related primarily to their prior experiences.

Building A Conceptual Framework for Assessment Literacy (2018)
with Amy Sharp, Kelli Ryan, and Damian Betebenner

Living in a Post-Validity World: Cleaning Up our Messick (2016)

Paper to accompany the Presidential Address delivered at the 2016 conference of the Northeast Educational Research Association in Trumbull, Connecticut. In the address I examine the “state of validity” in educational measurement and assessment. More than a quarter-century after the publication of Messick’s groundbreaking chapter in the 3rd edition of Educational Measurement, are we any closer now to understanding and being able to explain validity? Are we making and promoting better interpretations of test scores than we were in 1989? Are we making and promoting better use of tests and testing programs than we were in 1989?

Salvaging RTT Assessment (2011)

EdWeek commentary addressing the complexity of the goals the Obama administration hoped to achieve through the Race to the Top Assessment Program. It warns that if there is any chance for the RTT assessment program to accomplish its goals and not simply produce a few “pilot projects” or “discrete tests, cobbled together,” as Sec. Duncan put it, all involved must face the truths about how extensive those goals are and what will be necessary to accomplish them. The piece offers four clear steps that states and the federal government can take to salvage Race to the Top assessment and avoid an epic fail.

Formative Reform: Purposeful planning for the next generation of assessment and accountability systems (2009)

As we consider the next generation of assessment and accountability systems, it is to our advantage to pause and engage in a process of formative education reform; that is, to define the purposes of our accountability systems and assessment systems, as well as the purposes and goals of the public education system that they are intended to support. The goal of this paper is to provide background information and pose questions to inform this process of purposeful reflection and decision-making.

The Ideal Role of Large-Scale Testing in a Comprehensive Assessment System (2003)

Article in the ATP Journal of Applied Testing Technology. The role of large-scale assessment in public education has grown tremendously since the mid-1980s and unquestionably will continue to grow with the implementation of the assessment and accountability requirements of the No Child Left Behind Act. In the rush to meet the demand to measure validly and reliably the performance of all students, however, it must not be forgotten that large-scale assessment is only one component of a comprehensive assessment system. The factors that led to the predominance of large-scale assessment are reviewed and the appropriate role of large-scale assessment in a comprehensive assessment system is discussed.