assessment, accountability, and other important stuff


The loss of state assessment results in the wake of COVID-19 does not have to mean a loss of information about student proficiency

Charlie DePascale

Given that the COVID-19 pandemic is affecting nearly all aspects of our lives, it is no surprise that it has brought a critical nationwide, federally mandated data collection effort to a halt.  I am not referring to Census 2020, which was forced to suspend all of its field operations.  Nor am I referring to the IRS and Tax Day, which has been moved from April 15 to July 15.  No, the nationwide data collection effort to which I am referring is the annual administration of state assessments to millions of public school students in grades 3 through 8 and high school.

With school closures affecting more than 55 million students across the country and nearly all states obtaining testing waivers, it is nearly certain that there will not be, and should not be, state testing this spring.  This cancellation of testing causes a significant hardship for assessment contractors and, more importantly, deprives states of information used to inform policy, a condition which, if we believe in the reasons for testing, is ultimately harmful to students.

The “good news” is that data that would have been collected through state testing is not lost.  Like the data that is collected through the census or tax filings, data on student proficiency on state standards in the 2019-2020 school year is still there waiting to be collected.  We may just have to adjust our thinking on what data we are collecting and how we are collecting it.

State Assessment is a Data Collection Effort

First and foremost, we have to recognize that state assessment is at its core a data collection effort.  Because the current solution includes an assessment, we have fallen into the trap of viewing the task of collecting data on student proficiency from a measurement perspective and treating all challenges to the process as measurement problems.  The fundamental task, however, similar to the census, is to produce an accurate count of the number of students in the state who are meeting state achievement standards.  The task is not to measure student proficiency.

It can certainly be argued that at one time the most accurate and efficient way to collect the desired information was through an assessment administered to students statewide.  That solution, however, became less desirable over time as state content standards became more complex, state achievement standards became more rigorous, requirements to include all students became more rigid, and the consequences associated with the results of the assessment increased (see Campbell’s Law).

At the present time, the current model of state assessment is fast becoming an anachronism; perhaps not as much of an anachronism as annual tax filings, but more of an anachronism than the census, if for no other reason than the annual frequency of state assessment.

States have known since at least the 1990s that an on-demand test composed primarily of selected-response items was insufficient to fully measure student proficiency, but for 25 years that remained the most feasible and efficient solution.  The PARCC assessment, however, was likely the field’s gallant last gasp at developing an on-demand state assessment to measure college-and-career readiness standards.

Moving forward, state assessment will still be at least a component of the best available solution to compile accurate information about student proficiency, but assessment is not the only solution.

There are Proficient Students Even if there Is No Assessment

There may be doubt about whether a tree falling in the forest makes a sound if no one is around to hear it, but there is no such doubt about student proficiency.

After accepting that the task is to count, not measure, we must recognize that students are proficient (or not) in English language arts, mathematics, science, and a host of other areas regardless of whether we administer an assessment.

Over time, the belief became ingrained that we need a state assessment to determine whether a student is proficient.  The state assessment and its items defined the meaning of loosely worded state content standards.  Achievement level descriptors were most often developed for the assessment rather than the content standards, and were used in conjunction with the unfortunately named process of standard setting to define the state’s achievement standards.  In short, the state assessment system and student proficiency became a closed system.

Federal policy that decreed performance on state assessment as the gold standard for student proficiency and elevated alignment to state content standards as the most important evidence in the validation of state assessment programs only helped to keep the system closed.

The fact remains, however, that students acquire proficiency in English language arts and mathematics through curriculum and instruction aligned to state content and achievement standards.  That proficiency builds over the course of the school year and resides within the student, not within the test, when she or he sits down in the spring to take the state assessment.

Our actions as assessment professionals, policy makers, and educators suggest that we have forgotten the principle that the purpose of assessment is not to define a construct such as proficiency in English language arts and mathematics, but rather to provide us with information that helps us accurately and consistently distinguish among students at various places along the proficiency continuum.

Teachers Should Be the Best Judges of Student Proficiency

If we accept that proficiency exists outside of the assessment, then it follows logically that the best judge of a student’s proficiency should be the teacher who a) has deep knowledge of the state content and achievement standards and b) has been instructing that student for seven months with curriculum, instruction, and formative assessment practices aligned to those standards.  Setting aside for now debate about the extent to which those two conditions are met in classrooms across the country, nobody is in a better position than the student’s teacher to make an informed judgment about student proficiency.

There are, of course, many reasons why states do not and should not rely on teachers’ judgments alone when collecting information about student proficiency for school accountability.  Concerns about self-reporting of results for accountability purposes are real, as is the fact that one of the primary things that we are measuring or evaluating through school accountability systems is the extent to which there is alignment between the state’s and local educators’ understanding of the state content and achievement standards.

The current situation, however, presents an opportunity to collect those teacher judgments with minimal risk.  First, accountability waivers will eliminate the high-stakes uses that might bias judgments.  Second, most states have school- and student-level data from previous years against which to monitor these judgments.
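That monitoring step can be sketched as a simple screen. A minimal, hypothetical Python illustration (the school names, rates, and 10-point threshold are all invented for the example, not any state’s actual procedure):

```python
# Hypothetical sketch: flag schools where this year's teacher-judgment
# proficiency rate diverges sharply from last year's assessment-based rate.
# All names and rates below are invented for illustration.
prior_rates = {"School A": 0.62, "School B": 0.45, "School C": 0.58}
judgment_rates = {"School A": 0.64, "School B": 0.71, "School C": 0.55}

THRESHOLD = 0.10  # flag differences of more than 10 percentage points

flagged = {
    school: round(judgment_rates[school] - prior_rates[school], 2)
    for school in prior_rates
    if abs(judgment_rates[school] - prior_rates[school]) > THRESHOLD
}
print(flagged)  # {'School B': 0.26}
```

A flag like this would not prove the judgments are wrong; it simply identifies where a follow-up conversation with the school is warranted.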

The next critical question is whether enough instruction has taken place to enable teachers to make the necessary judgments.  The answer to that question is unequivocally yes.  If state testing had already started or was about to start within the next month, then teachers have sufficient evidence to make an informed judgment of student proficiency.  I would argue that teacher judgments about student proficiency at the time of school closures are a more accurate reflection of the level of proficiency a student acquired during the 2019-2020 school year than an assessment administered when school resumes, either this year or next year.  There will be other reasons for measuring student performance at that time.

So, if teachers have the data that states need, is there a feasible way for the state to collect it?

Collecting Data from Teachers on 2019-2020 Student Proficiency

With relatively minor adjustments, it should be possible to use the same infrastructure already in place to administer state assessments to collect teacher judgments of student proficiency.  Given that testing was about to begin, we can assume that student registration lists had already been prepared to sign students into computer-based tests and that procedures were in place to provide access to teacher test administrators as well.  States or assessment contractors may not have access to information needed to assign individual students to specific teachers, but that is a minor inconvenience.

Preparing online resources, instructions, and a form for teachers to enter ratings of student proficiency would not be a heavy lift, certainly not in comparison to scoring, processing, and equating tests.  States and their contractors can decide what judgment they would like teachers to make.

Using the NAEP achievement level categories of Below Basic, Basic, Proficient, Advanced as an example, a state might ask teachers to assign students to one of the four achievement levels or simply to indicate whether the student’s level of proficiency was at the Proficient level or above (i.e., Proficient or Advanced).  In activities conducted in conjunction with standard setting for a state assessment, we have asked teachers to designate students’ proficiency as Low, Medium, or High within one of the four achievement levels (a total of 12 possible classifications).  My personal preference, however, is to allow teachers to use borderline categories as shown below for a total of 7 possible classifications: Below Basic, Borderline Below Basic/Basic, Basic, Borderline Basic/Proficient, Proficient, Borderline Proficient/Advanced, Advanced.
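The counts in these schemes are easy to get wrong, so as a sanity check, here is a purely illustrative Python sketch that builds the 7-category borderline scheme from the four NAEP achievement levels:

```python
# Build the 7-category borderline classification scheme described above:
# the 4 NAEP achievement levels plus a "Borderline" category between each
# adjacent pair. Illustrative only; not any state's actual rating form.
levels = ["Below Basic", "Basic", "Proficient", "Advanced"]

categories = []
for i, level in enumerate(levels):
    categories.append(level)
    if i < len(levels) - 1:
        categories.append(f"Borderline {level}/{levels[i + 1]}")

print(len(categories))  # 7
print(categories)
```

The same logic generalizes: k achievement levels plus a borderline between each adjacent pair yields 2k − 1 rating categories, while the Low/Medium/High variant yields 3k (hence 12 for four levels).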

Will the results of the teacher judgment process be totally accurate, complete, or interchangeable with assessment results?  Probably not, but that’s OK.  They can still become useful information to support the school improvement process.

More Than A Stopgap

If I viewed the collection of teacher judgments only as a one-time stopgap to make the best of the 2019-2020 school year, I might hesitate to suggest it.  It is a fact, however, that if we have any hope for education reform and school improvement efforts to be successful, we need teachers to understand what proficiency is and to be able to classify student performance along the proficiency continuum.

One of the big unanswered questions when state assessment results are released each year is whether those results are consistent with the way that local administrators and teachers perceive their students’ performance.

It is also a fact that we are not going to be able to continue to use on-demand large-scale assessment to measure the complex knowledge, skills, and abilities that we want students to acquire.  It is inevitable and desirable that in the near future states are going to have to rely on teacher judgment of student performance as a key part of the information they collect from schools each year.

Given the conditions in a particular state, it might seem foolish for state assessment leaders to consider any type of data collection in the coming weeks or months.  However, if a state is seeking a way to recover the data lost from cancelling testing in 2019-2020, this is a unique opportunity to take the first step toward collecting that information.  We might as well use it.

A Useless Test Bias Argument

Charlie DePascale



“Criticizing test results for reflecting these inequities is like blaming a thermometer for global warming.”

That was the viral moment from the recent NCME statement on admissions testing. The line clearly was intended to go viral and it did go viral; well, as viral as any technical defense of standardized testing can go – quoted and retweeted tens of times.

I like a glib “test as thermometer” quip as much as the next psychometrician, and I have enjoyed the various versions of this one that have been used in the context of college admissions testing.  There was something about the line, and the statement in general, however, that just didn’t feel right.

NCME framed the statement as “highlighting the critical distinctions between group score differences and test bias.” Along with an obligatory quote from the Standards and an academic reference to correlation and causality, the test-as-thermometer equivalence appears to be drawing a clear distinction between test scores and test use.  Test scores, it appears, can reflect real differences between groups without the tests being biased.  This separation of test scores from test use is something that we have not seen in the organization’s arguments on validity.  As NCME President Steve Sireci has written, “To ignore test use in defining validity is tantamount to defining validity for ‘useless’ tests.”  Does the same argument apply to test bias?

When the tests in question are college admissions tests, their primary intended use is fairly explicit.  One can assume that a claim that the tests are biased refers at least as much to their use in college admissions as to a technical claim about the accuracy of the scores.  To dismiss this claim with a technical lesson on misconceptions about test scores comes across as, at best, defensive, tone-deaf, and somewhat self-serving.

NCME could have chosen to focus their response on this portion of their quote from the Standards: “group differences in testing outcomes should trigger heightened scrutiny for possible sources of test bias…”

  • They could have discussed whether the construct being assessed by the college admissions tests is academic achievement (in English language arts and mathematics) or college readiness. If the former, then we are back to the question about whether the focus on the accuracy of the group differences is tantamount to defining bias for useless tests.
  • They could have discussed differential validity and the importance of establishing that the relationship between English language arts and mathematics achievement and college readiness (or success in college) is the same for students whose low performance is “caused by disparities in educational opportunities” as it is for other students.
  • They could have discussed the role that test scores play in the “proper use and interpretation of all data associated with college readiness” and explained how limited or extensive that role should be given what the field knows about college admissions tests and test scores – particularly with respect to the performance of the subgroups of students in question.

Instead, NCME chose to offer a heavily nuanced defense of college admissions tests and test scores.  I have to wonder who they see as the primary audience for this statement.


I’m With The Band



Harvard University Band


Charlie DePascale ’81

This weekend the Harvard University Band celebrates its 100th anniversary.  Along with meeting my wife, my time in the band remains one of the two happiest memories of my four years at Harvard. Actually, my memories of the band begin with the end of my junior year of high school.

It was the summer of 1976, the Bicentennial Year, and a high school classmate told me about this Summer Pops band at Harvard: anybody can join, they rehearse one night a week, and there are two concerts at the end of the summer – one in Harvard Yard and one at the Hatch Shell, where the Boston Pops play. OK, sign me up.

During that summer, on stage at Sanders Theater with a couple of hundred people of all ages and musical ability, I had my first interactions with Tom Everett, director of Harvard bands.  Until that summer, I had no intention of applying to Harvard.  Harvard was for other people.  But during that summer with Tom, I remember thinking, hey, if this is what Harvard people are like, I could spend four years here.  So, I applied, was accepted, joined the band and the wind ensemble, and quickly learned that there were no other people at Harvard like Tom.

Despite that, my decision to attend Harvard was a net positive (did I mention meeting my wife), and my experience with the band was definitely positive. In my short time with the band, I enjoyed performing at the Kennedy Center, traveling to New York City, Washington, DC, and Montreal, performing a song conducted by the legendary Arthur Fiedler, playing for Jackie Onassis, and on one magical December night witnessing the beginning of a major collegiate point-shaving scandal and fulfilling my childhood dream of playing Amen in the Holy Cross basketball band.  Dare to dream!

And then there are the lessons learned that extended well beyond my years in Cambridge.

First, there are a few practical takeaways:

  • A wool jacket can absorb several times its weight in rainwater and still be fine the following week – a clarinet, not so much.
  • If you’re tired enough, you can sleep anywhere – on the cement floor of a game room in Ithaca, in an end zone at Princeton, sharing a sofa bed with a virtual stranger in an apartment in Montreal, or on a bandmate’s shoulder during a long, late-night bus ride.
  • At least one time in their life, everyone should experience walking through a dark tunnel into a sunlit stadium to hear and literally feel the roar of 60,000 cheering people.

And then there are the larger life lessons that have served me well throughout my career.

  • Illegitimum non carborundum – Enough said.
  • Lines (1) – Sometimes when the gun sounds and you are jumping, or scrambling, from one formation to the next you end up on the wrong 45-yard line (they all look alike, you know). When that happens, just fall into line with the trumpet section, play the song, and rejoin the clarinets for the next formation.
  • Lines (2) – Everything and everyone is fair game for the halftime humor of Harvard Band – even the band itself. However, there are times when you know you are crossing a line that shouldn’t be crossed – for me, it was the formation that paired Ted Kennedy with a popular Bee Gees song. Don’t shy away from the line, but try to stay on the right side.
  • “The Game” Syndrome – Every week, the halftime show had to fit into a tight window. When our time was up, we were off the field – this wasn’t American Pie (reference is to the 1971 song; we can discuss the resemblance of the HUB to the early 2000s movie franchise at another time). That limit was a good match for our practice of rehearsing the show for the first time the morning of the game. The Harvard-Yale game, however, had a longer halftime, which provided a few extra minutes for an extended halftime show. Of course, the temptation to turn our show into a Super Bowl-worthy extravaganza was too great to resist – often with the same result as recent Super Bowl halftime shows.  Forty years later, there are still nights when my dreams are haunted by giant royal stick figures trying to “walk” across the field.  Dream big, but know your limitations.
  • A Dedicated Core – Every volunteer organization, whether it is a college band, a town Democratic committee, a regional educational research organization, or a national professional association, cannot function without a dedicated core of passionate people who are willing to devote way too much of their own time to doing all of the big and little things that must be done so that everything runs smoothly when the rest of us just show up. Treasure those people.
  • Leader of the Band – With the right person leading them, a group of 200 community members, or 150 Harvard students looking to have fun, or 50 student musicians grateful for one more opportunity to keep playing can each make such beautiful music. It takes a special person to know how to pick the right music, create the right environment, and effectively structure a limited amount of rehearsal time to get the most out of each of those groups and individuals; teaching and gently moving them in the right direction with humor, skill, grace, and a wealth of knowledge, skills, and experience.  Thanks, Tom.

So yes, I’m with the band and the band will forever be a part of me.

Happy Anniversary HUB!  Here’s to the next 100 years.


A good day ruined

Charlie DePascale

After a wonderful late summer day spent enjoying a rare weekday afternoon baseball game in Boston, I sat down last night and looked at my Twitter feed.  There among the trending items was this headline:

The University of Texas’s Secret Strategy to Keep Out Black Students

Without even clicking to look at the article I knew without a doubt that this ‘secret strategy’ had to involve standardized testing.  Is this how gun manufacturers, drug companies, and those e-cigarette people feel?

Standardized tests don’t kill people …

Where will this end?

Will NCME soon be thought of in the same way as the NRA – but you know, without the money, membership, or political clout? The president of AERA fired a direct shot at testing last spring with her presidential address, “An Inconvenient Truth About the New Jim Crow of Education” – a catchy title.

Will San Francisco lawmakers be next to set their sights on NCME – labeling it a domestic terrorist organization?  Will there be a resolution to block NCME from holding its 2020 conference in San Francisco?  If so, will AERA support it?  After all, it wouldn’t be the first time AERA moved a conference in California in the name of supporting a social cause.

What about New York, already regarded as the Lexington and Concord of the opt-out revolution?  Will the governor and legislature take aim at NCME and deny state funds to people doing business with the testing industry?  What will the pineapple say?

Where do we go from here?

Is there a way to stem this anti-testing tide and restore the shine to this field of ours? (dare I say, to make testing great again. no, I think I’ll pass on that.)

I can support increased background checks for all test users.  I am more skeptical about federal- or state-mandated bans or limits on state and local testing.  Perhaps those efforts can reduce the damage caused by high-stakes census testing, but an ill-conceived and poorly-developed teacher-made test in the hands of an inexperienced teacher can still cause a lot of harm one child at a time.

Standards, lists of best and fair testing practices, and policy statements are necessary, but not sufficient.  I am pretty sure that I already read somewhere that it’s not acceptable practice to base a high-stakes decision on a single test score.

Improving the assessment literacy of all involved in the testing process including students, teachers, policy makers, the media, the general public, and psychometricians is a good place to start.  I know some folks in New Hampshire who are doing some nice work in that area. (sorry, no names or links.  need to keep that personal/professional firewall intact.)

Improved assessment literacy, of course, won’t stop people who want to use testing for unsavory or evil purposes from doing their thing.  It might, however, make others more aware of when testing is being used to do harm; and make them more likely to speak out; and make them less easily persuaded to opt out.

The trend toward moving the locus of assessment from the state house to the classroom also seems like a step in the right direction. Not only will that put actionable information in the hands of teachers, where it belongs; that shift will help eliminate problems like the one faced by Massachusetts last spring. As we have known for a long time, passage-based triggers are much easier to avoid in the classroom than on a state assessment.

Improving local assessment is a step, however, that will require tremendous investment in the infrastructure of teacher preparation programs and schools.  Remember that one of the big advantages of large-scale assessment is that it’s cheap and doesn’t require much training of teachers and school administrators.

Advances in instructional and assessment technology, personalized learning systems, and modeling based on a much broader base of data than test scores and student attendance also hold a great deal of promise, but will not be without their own technical and social challenges.

I look back fondly on the days when we were accused of trying to peek into family life with our student survey questions on how much time was spent watching television each night or whether the student had a part-time job; of trying to brainwash students with our questions about the environment; or of simply trying to get students to fail with our trick questions that contained plausible distractors.  So, yes, tracking and developing psychometric models for a student’s eye movement, heart rate, and other signals of student engagement might face some resistance.

For now, however, I will go back to Twitter and read about climate change.  That ‘inconvenient truth’ reference aside, I am pretty sure testing isn’t being blamed for climate change – well, not yet – or maybe I’ll just listen to some music.



yes no


Charlie DePascale

Now that the administration has dropped efforts to include a citizenship question on the 2020 Census, perhaps there is space on the form for the proficiency question, “Is this person college-and-career ready?”  For persons 18 and under, the question would be, “Is this person on track to college-and-career readiness?”

Think about it. We ask the question in April 2020 and by December 31st we have a national count of the number of college-and-career ready residents in the United States.  By March 31, 2021, we have state-level counts disaggregated by race, ethnicity, and other key demographic factors.  In about the same amount of time that it took to produce the 2017 NAEP Reading and Mathematics results, we would have proficiency information for the entire U.S. resident population instead of the small portion of the population captured by that ill-defined social construct, grade level.

The federal government could then make decisions about how much money to allocate to programs designed to improve college-and-career readiness and how best to distribute that funding across the states (just as they do with other information collected through the Census). States could begin to redesign their early childhood, K-12, postsecondary, and adult education programs to better meet the needs of their residents (just as they do with other information collected through the Census).

I bet that you are thinking, but Charlie, just how accurate could that self-reported information possibly be?  Well, you see, accuracy is a funny concept; it’s one of those eye of the beholder, depends on what the meaning of “is” is type of things.  Would a U.S. Census count of college-and-career readiness be any more or less accurate than the differences we have had in proficiency estimates among the 50 states or between states and NAEP?  Would the actions triggered by a U.S. Census count of college-and-career readiness be any more or less appropriate than actions resulting from the wide variations across states in the percentage of schools identified for support and improvement under ESSA accountability systems?

Or perhaps you are thinking, but Dr. DePascale – psychometrician – a single Census question on college-and-career readiness is not measurement.  Where are the “big 5” sources of validity evidence?  Where are the external alignment studies?  Where is the USED Peer Review?

All valid points, but here’s the thing: federal and state assessment policy has never been about measurement.  As I have argued in previous posts, determining the percentage of students in a state who have met minimal competency standards, attained proficiency, are on track to college-and-career readiness, or have made progress from fall to spring is now, always has been, and always will be, at its core, a data collection problem and not a measurement problem.

At one time, the most efficient and accurate way to solve that data collection problem was with a large-scale state assessment; that is, with a short, on-demand, machine-scored test administered to students in the general education program at selected grade levels.  But that time was a long time ago.  Policies and laws on inclusion changed.  The student population became more diverse.  Content and performance standards became more rigorous and complex.

Thought experiment: Imagine you have placed a group of experts in a room (or even a group of testing company psychometricians) and tasked them with coming up with the most efficient and effective way to determine the number or percentage of students in a state who are on track to college-and-career readiness or the number and percentage of high school graduates who are college-and-career ready.  If you don’t like a group of experts, you can crowd-source the task or use artificial intelligence to solve it.

Whatever approach you take, it is highly unlikely that the solution generated will be a single, on-demand, end-of-year state assessment.  If you expand the task to determining the number for the country rather than a single state, I guarantee that the solution will not be 40-50 unique state assessments.

The solution may include a limited amount of state and federal assessment (e.g., something like NAEP), but it is virtually certain that the solution will be more centered on quality data collection than on high quality assessment; and if we are looking for a data collection solution, what better place to begin than the U.S. Census Bureau.  Their self-described mission is “to serve as the nation’s leading provider of quality data about its people” with the goal “to provide the best mix of timeliness, relevancy, quality and cost for the data we collect and services we provide.”  Does any state department of education assessment or any testing company claim the same mission and goal?  Would we want them to?

Where would we begin?

So, with the proficiency question on the 2020 Census where would we begin to ensure the most accurate count possible?  The first step would probably be to develop a common definition of college-and-career readiness that we want people to use when answering the question.  The next step might be a public education campaign to get the public on board with the importance of collecting the information. That campaign undoubtedly would include clear descriptions and real-life examples of college-and-career readiness or of being on track to college-and-career readiness – descriptions that people can easily grasp and apply to themselves and the people in their home.

Now you may be asking yourself, aren’t those the same things that we should do when introducing a new set of content standards or a new assessment program?  The answer, of course, is yes; but often those steps are forgotten or are given insufficient attention and resources when the focus is on building a better assessment or accountability system rather than on collecting better data.

There have been efforts at such public relations campaigns in the past, and they have been somewhat successful.  When the MCAS tests and new performance standards were introduced in Massachusetts in the late 1990s, “What Does Proficient Look Like” workshops were held in communities across the state and “Test Yourself” brochures were distributed at toll booths, grocery stores, and public libraries.  When the Common Core State Standards were introduced, it was impossible to watch a professional golf tournament on network television without seeing a “Support the Common Core” commercial sponsored by EXXON or some other major corporation (yes, that sentence was intentionally Bidenesque).

Massachusetts no longer has toll booths, people buy groceries online, and public libraries are being repurposed to meet the changing needs of communities.  Women’s soccer matches may be a better option than professional golf tournaments for spending advertising dollars (at least every four years).  Yes, the medium will change, but the message and the need for the message remains the same.

We can develop the best large-scale assessment ever imagined; but at the end of the day and at the end of the school year, if every teacher, parent, and student cannot give an accurate answer to the question “Is this person on track to college-and-career readiness?” without looking at a score on a state assessment, what have we really accomplished?

Gold Standard?

Charlie DePascale

Disclaimer:  I did not have access to my laptop and was forced to prepare this post on my Surface tablet. I apologize in advance for any effect that had on the length or quality of the post.

By any metric, 2017 was, and continues to be, a very bad year for NAEP.  Troubles began in April 2018 with the utter fiasco that was the long-delayed release of the 2017 Reading and Mathematics results.  Seldom in the course of human history have so many good statistics been sacrificed in the name of preserving an illusory trend line.  Then late last week came the announcement that no amount of statistical sleight of hand could save the results from the 2017 Writing assessment.  (And by announcement, I mean burying information deep on a website on the first Friday of summer that no results were forthcoming and that a more detailed report would be available in the spring of 2020.)

Perhaps, however, the worst news for NAEP was that when they announced they were not releasing results from a major assessment, few people noticed and fewer people cared.

Where does all of this leave NAEP as we await the results from the 2019 NAEP Reading and Mathematics assessments?

As I started to write this post, I will admit that I was feeling a bit cynical toward NAEP and my original title was ‘Gold Standard, My Ass!’

But then I thought, who among us hasn’t wanted just 3 more days or even 3 more hours to figure out what the hell was going on as we tried to equate writing results across years? If NAEP can lead the way and establish 3 years as an acceptable time frame, more power to them.

Plus, there have been plenty of times in the last 30 years when I have had to help state leaders make the hard decision between making necessary changes to their Reading and Mathematics tests or preserving their reporting scale and trend line.  If NAEP can lead the way on having your cake and eating it, too, pass me another slice.

And why should a state jump through all of those hoops to convince USED that it really did administer a test and hold all students accountable this year if it is possible to just decide not to report results?  Be that shining light in the darkness, NAEP! We will follow!

Still not totally sure which direction I should go with this post, I thought a little bit more about the term ‘gold standard’ and what it represents.

There are several things about NAEP that do make it a symbol of the ideal in large-scale assessment:

  • Testing periodically rather than every year
  • Testing intermittently across grade levels rather than at every grade level
  • Testing samples of students rather than all students
  • Using matrix sampling to improve the sampling of content on each assessment
  • Separate scaling of domain areas so that subscores might actually be useful
  • Demonstrating a total disdain for deadlines in the name of getting it right
  • The willingness to serve as an example of how difficult it is to set meaningful performance standards on a large-scale assessment

Those things should be more than enough to offset the lack of transparency and no individual student scores and establish NAEP as an ideal; that is, a gold standard.

The final thing that turned me around on NAEP as a gold standard, however, was remembering that the gold standard is an antiquated monetary concept that was abandoned by virtually all nations decades ago; it is an anachronism that simply no longer works in the real world.

So NAEP, I owe you an apology.  Feel free to hold firm as the old white male of assessments in the changing world of 2019.  In so many ways, you are and will always be the gold standard.

Charlie DePascale

This year marks the 25th anniversary of the 1994 reauthorization of ESEA, known as the Improving America’s Schools Act (IASA).  Throughout the year, I will explore how various aspects of that law shaped my career, educational assessment and accountability, and K-12 education, in general. All of this will be done, of course, with an eye toward the next reauthorization of ESEA and the future of K-12 assessment and accountability.

As we begin the year, however, let’s just take a few minutes to refresh our memories on the thoughts about equity, excellence, and education that drove the 1994 law. Sometimes it’s not necessary to write anything new.  The words speak for themselves. I call particular attention to the middle section titled “What Has Been Learned Since 1988.”


‘‘(1) IN GENERAL.—The Congress declares it to be the policy of the United States that a high-quality education for all individuals and a fair and equal opportunity to obtain that education are a societal good, are a moral imperative, and improve the life of every individual, because the quality of our individual lives ultimately depends on the quality of the lives of others.

‘‘(2) ADDITIONAL POLICY.—The Congress further declares it to be the policy of the United States to expand the program authorized by this title over the fiscal years 1996 through 1999 by increasing funding for this title by at least $750,000,000 over baseline each fiscal year and thereby increasing the percentage of eligible children served in each fiscal year with the intent of serving all eligible children by fiscal year 2004.

‘‘(b) RECOGNITION OF NEED.—The Congress recognizes that—

‘‘(1) although the achievement gap between disadvantaged children and other children has been reduced by half over the past two decades, a sizable gap remains, and many segments of our society lack the opportunity to become well educated;

‘‘(2) the most urgent need for educational improvement is in schools with high concentrations of children from low income families and achieving the National Education Goals will not be possible without substantial improvement in such schools;

‘‘(3) educational needs are particularly great for low-achieving children in our Nation’s highest-poverty schools, children with limited English proficiency, children of migrant workers, children with disabilities, Indian children, children who are neglected or delinquent, and young children and their parents who are in need of family-literacy services;

‘‘(4) while title I and other programs funded under this Act contribute to narrowing the achievement gap between children in high-poverty and low-poverty schools, such programs need to become even more effective in improving schools in order to enable all children to achieve high standards; and

‘‘(5) in order for all students to master challenging standards in core academic subjects as described in the third National Education Goal described in section 102(3) of the Goals 2000: Educate America Act, students and schools will need to maximize the time spent on teaching and learning the core academic subjects.

‘‘(c) WHAT HAS BEEN LEARNED SINCE 1988.—To enable schools to provide all children a high-quality education, this title builds upon the following learned information:

‘‘(1) All children can master challenging content and complex problem-solving skills. Research clearly shows that children, including low-achieving children, can succeed when expectations are high and all children are given the opportunity to learn challenging material.

‘‘(2) Conditions outside the classroom such as hunger, unsafe living conditions, homelessness, unemployment, violence, inadequate health care, child abuse, and drug and alcohol abuse can adversely affect children’s academic achievement and must be addressed through the coordination of services, such as health and social services, in order for the Nation to meet the National Education Goals.

‘‘(3) Use of low-level tests that are not aligned with schools’ curricula fails to provide adequate information about what children know and can do and encourages curricula and instruction that focus on the low-level skills measured by such tests.

‘‘(4) Resources are more effective when resources are used to ensure that children have full access to effective high-quality regular school programs and receive supplemental help through extended-time activities.

‘‘(5) Intensive and sustained professional development for teachers and other school staff, focused on teaching and learning and on helping children attain high standards, is too often not provided.

‘‘(6) Insufficient attention and resources are directed toward the effective use of technology in schools and the role technology can play in professional development and improved teaching and learning.

‘‘(7) All parents can contribute to their children’s success by helping at home and becoming partners with teachers so that children can achieve high standards.

‘‘(8) Decentralized decisionmaking is a key ingredient of systemic reform. Schools need the resources, flexibility, and authority to design and implement effective strategies for bringing their children to high levels of performance.

‘‘(9) Opportunities for students to achieve high standards can be enhanced through a variety of approaches such as public school choice and public charter schools.

‘‘(10) Attention to academics alone cannot ensure that all children will reach high standards. The health and other needs of children that affect learning are frequently unmet, particularly in high-poverty schools, thereby necessitating coordination of services to better meet children’s needs.

‘‘(11) Resources provided under this title can be better targeted on the highest-poverty local educational agencies and schools that have children most in need.

‘‘(12) Equitable and sufficient resources, particularly as such resources relate to the quality of the teaching force, have an integral relationship to high student achievement.

‘‘(d) STATEMENT OF PURPOSE.—The purpose of this title is to enable schools to provide opportunities for children served to acquire the knowledge and skills contained in the challenging State content standards and to meet the challenging State performance standards developed for all children. This purpose shall be accomplished by—

‘‘(1) ensuring high standards for all children and aligning the efforts of States, local educational agencies, and schools to help children served under this title to reach such standards;

‘‘(2) providing children an enriched and accelerated educational program, including, when appropriate, the use of the arts, through schoolwide programs or through additional services that increase the amount and quality of instructional time so that children served under this title receive at least the classroom instruction that other children receive;

‘‘(3) promoting schoolwide reform and ensuring access of children (from the earliest grades) to effective instructional strategies and challenging academic content that includes intensive complex thinking and problem-solving experiences;

‘‘(4) significantly upgrading the quality of instruction by providing staff in participating schools with substantial opportunities for professional development;

‘‘(5) coordinating services under all parts of this title with each other, with other educational services, and, to the extent feasible, with health and social service programs funded from other sources;

‘‘(6) affording parents meaningful opportunities to participate in the education of their children at home and at school;

‘‘(7) distributing resources, in amounts sufficient to make a difference, to areas and schools where needs are greatest;

‘‘(8) improving accountability, as well as teaching and learning, by using State assessment systems designed to measure how well children served under this title are achieving challenging State student performance standards expected of all children; and

‘‘(9) providing greater decisionmaking authority and flexibility to schools and teachers in exchange for greater responsibility for student performance.