It is clear to even the staunchest advocates of state testing and test-based accountability that item response theory (IRT) is not the best foundation on which to build models of school performance, let alone school effectiveness. It is time, therefore, to shift our accountability focus from IRT and building better tests to lessons we can learn about building better accountability systems from the IRS – yes, the Internal Revenue Service.
I’ve got your attention. Great.
There is widespread agreement that we are in dire need of a new model for school accountability. The current test-based accountability model, focused on a limited set of outcomes and centered on the results of an end-of-year state summative test is, at best, incomplete.
If we are in search of a new model, we need not look any further than the IRS. After all, what organization is more closely associated with accountability than the Internal Revenue Service?
Before you twist yourself into a knot like the ICC of a set of items with non-uniform DIF, I am not suggesting that we simply hand over responsibility for school accountability, lock, stock, and barrel, to the IRS and those 87,000 new agents funded by the Inflation Reduction Act.
Although now that I think about it, why not place the Treasury Department in charge of school accountability? Federal involvement in public education, after all, is largely concerned with the transfer and use of funds. Plus, it’s not like we don’t already spread responsibility for public education across an alphabet soup of federal agencies. It takes a village. The school lunch program historically has gotten its seal of approval from the USDA. Much of current school policy, driven by civil rights laws and regulations, falls under the purview of the DOJ. The DOT helps to ensure that our students make it safely to and from school. And we learned over the past few years just how much the CDC has its gloved hands in the school pie.
Exactly what is it that the USED does?
But I digress.
What I am suggesting is that as we think about redesigning school accountability we consider using the IRS and the annual exercise of paying our income taxes as a model.
This post is not the first time that I have drawn upon the IRS and the mid-April Tax Day as an analogy for state testing and accountability.
For years, I liked to point out that with regard to paying taxes, the flow of information is almost exclusively in one direction: from lowercase us to the uppercase US. We do not sit around each spring waiting for the IRS to tell us how much money we made in the previous year. I will admit that each year after completing our tax returns, many of us are truly surprised by the size of the refund we will be receiving or, heaven forbid, the additional payment we’re required to make, but that is more an indictment of the state of financial literacy in the US than anything to do with the IRS as a model for school accountability.
More recently, I used the IRS as an example to help make my argument that state testing to meet federal assessment and accountability requirements is much more appropriately regarded as a data collection activity than a measurement activity.
The newest addition to my IRS analogy is that we have now reached the point where the IRS already has virtually all of the information that taxpayers are “reporting” to them during tax season. As I reviewed the pile of forms and schedules that my wife and I submitted to the IRS in April, I was hard pressed to find a single piece of information which had not previously been reported to the IRS by a bank, employer, or other institution.
What are the key takeaways about our interactions with the IRS that we might want to incorporate into a reimagining of school accountability? To recap:
- Data flows in one direction.
- The primary activity that we wish to optimize is data collection.
- It is possible to collect data continuously, in real-time, from multiple sources.
Data flows in one direction
The premise that a school should already know whether it is effective is critical to the design of state assessment and accountability programs.
Schools should know which standards and competencies their students have mastered (and which they have not yet mastered), and whether those students are proficient or on track to college-and-career-readiness. A school should understand its climate, its students, and everything else that goes into being an effective school.
The purpose of accountability programs is NOT for the state to tell local educators, parents, and students something about their own school and students that they did not already know.
What the state can provide is context that helps schools better interpret their own data. At a minimum, that context includes normative data and information that allow schools to make comparisons to what is going on in other schools across the state. Ideally, the state has used data collected over time from schools across the state to develop models of effective schools that a school can apply to its own situation.
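To make the normative piece concrete, here is a minimal sketch of the kind of context a state could compute and return. Everything in it is hypothetical: the metric, the data structures, and the "within 20% enrollment" definition of similar schools are invented for illustration, not drawn from any actual state system.

```python
# Hypothetical sketch of the normative context a state could return to a school:
# a statewide percentile rank on a metric, plus the interquartile range of that
# metric among "similar" schools (here, schools within +/-20% enrollment).
from statistics import quantiles

def percentile_rank(value: float, statewide_values: list[float]) -> float:
    """Percent of schools statewide scoring at or below `value`."""
    at_or_below = sum(1 for v in statewide_values if v <= value)
    return 100.0 * at_or_below / len(statewide_values)

def similar_school_band(enrollment: int, schools: list[dict], metric: str,
                        tolerance: float = 0.2) -> tuple[float, float]:
    """IQR of `metric` among schools with enrollment within the tolerance."""
    peers = [s[metric] for s in schools
             if abs(s["enrollment"] - enrollment) <= tolerance * enrollment]
    q1, _median, q3 = quantiles(peers, n=4)
    return q1, q3
```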
Note that when teachers ask the state assessment program to provide more information to support the instruction of individual students, or school administrators ask the accountability system to provide more information than an overall score or rating, they are generally asking for one of two things:
- For the state to provide tools for them to collect data that they do not already have.
- For state support in interpreting the data that they already have.
In short, they are seeking information and state support to assist them in doing their own job better; that is, providing instruction and supporting student learning. Their motivation is not to help the state do its job better, and that’s as it should be.
The primary activity that we want to optimize is data collection
There are two initial questions that we need to ask and answer in designing a school accountability system:
- What evidence is necessary to demonstrate whether a school is effective? and
- What is the best way to collect that evidence?
Our answers to those questions might involve the use of some measurement tools, but it should become obvious very quickly that our primary activity is collecting data from schools.
It is possible to collect data continuously, in real-time, from multiple sources.
Advances in technology have made continuous, real-time data collection not only possible, but practical at scale.
One of the positive legacies of No Child Left Behind and Race to the Top is that they incentivized states to apply those advances in technology toward the development of student and school information systems. Those systems, which started with individual student identification numbers, enabled states to monitor student performance over time and made possible the development and use of growth scores and models. More importantly, those systems provided states with a whole host of options for collecting input and outcome data, samples of student work, and other artifacts directly from schools regularly, unobtrusively, and if necessary, in real-time or continuously.
What kinds of data?
We can start with assessment data, but state test results are just the tip of the iceberg. Information systems make it possible to collect data from interim assessments, screening tests, district-wide assessment programs, embedded performance tasks and projects, and student course grades (final and across grading periods).
Information systems also make it possible to collect school-level data on critical inputs such as curriculum, course offerings, as well as other programs offered to support academic success and overall well-being.
Information on teacher certifications and credentials can be mapped to course assignments for both teachers and students.
That data can be linked to real-time data on student and teacher attendance, discipline, safety, achievement, SEL, etc.
AI can be used to selectively sample, review and rescore artifacts of student work stored in the system.
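As a hedged sketch of what that selective sampling might look like, the code below draws a stratified sample of stored artifacts and flags large rescoring discrepancies for review. The artifact fields, the sampling rate, and the `rescore` callable (which could wrap an AI scoring model or route work to a human reader) are all assumptions, not a description of any existing system.

```python
# Hypothetical sketch of selective sampling for review and rescoring. The
# artifact fields ("school_id", "score") and the `rescore` callable are
# assumptions invented for this illustration.
import random

def sample_for_rescoring(artifacts: list[dict], rate: float = 0.05,
                         seed: int = 42) -> list[dict]:
    """Draw a random sample of artifacts, stratified by school."""
    rng = random.Random(seed)
    by_school: dict[str, list[dict]] = {}
    for artifact in artifacts:
        by_school.setdefault(artifact["school_id"], []).append(artifact)
    sample: list[dict] = []
    for school_artifacts in by_school.values():
        k = max(1, round(rate * len(school_artifacts)))
        sample.extend(rng.sample(school_artifacts, k))
    return sample

def flag_discrepancies(sample: list[dict], rescore,
                       threshold: float = 1.0) -> list[dict]:
    """Flag artifacts whose rescored value differs materially from the original."""
    return [a for a in sample if abs(rescore(a) - a["score"]) >= threshold]
```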
All of the above may have a “big brother” feel to it, and the memory of the InBloom fiasco still lingers.
I could easily build on that uncomfortable feeling by extending the IRS analogy to describe how the data collected from schools can be used to identify likely places where it is necessary to conduct audits to identify and close honesty gaps or opportunity gaps. Handled correctly, such a use is not something I would particularly oppose.
However, what a waste it would be if we just stopped there when there is so much more to be done.
What should the state be doing with all that data?
Despite 20+ years of federally mandated school accountability, we know little about effective schools. We have barely scratched the surface on defining school effectiveness, let alone modeling it.
Data modeling is what states and their partners should be doing to help refine, test, and validate a definition of school effectiveness (and college-and-career-readiness while we’re at it), to identify and better understand key drivers and levers of effectiveness, and to support schools in their efforts to be more effective in supporting student learning.
We have taken baby steps in data modeling to support state assessment and school accountability before. The development and reporting of “similar school bands” or “comparison score bands” was a primitive form of modeling. We have developed growth models ranging from simplistic and simple-minded “models” like gain scores, to Student Growth Percentiles, to value-added models such as EVAAS, but have done relatively little to understand and apply the results of those models.
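To illustrate the range that sentence covers, here is a toy contrast between a gain score and an SGP-style normative growth measure. Real Student Growth Percentiles are estimated with quantile regression conditioned on score histories (e.g., via the R SGP package); the crude peer-group version below exists only to show the conceptual difference.

```python
# Toy contrast between a gain score and an SGP-style normative growth measure.
# Real SGPs are estimated with quantile regression conditioned on score
# histories; this nearest-prior-score peer group is illustration only.

def gain_score(prior: float, current: float) -> float:
    """Gain score: current minus prior, blind to where the student started."""
    return current - prior

def crude_growth_percentile(prior: float, current: float,
                            cohort: list[tuple[float, float]],
                            window: float = 10.0) -> float:
    """Percentile of `current` among students with similar prior scores."""
    peer_currents = [c for p, c in cohort if abs(p - prior) <= window]
    at_or_below = sum(1 for c in peer_currents if c <= current)
    return 100.0 * at_or_below / len(peer_currents)
```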
Even within assessment, until recently we have been extremely conservative in our use of modeling, barely acknowledging that what we are doing with IRT is, in fact, modeling, while sneaking other models in through the back door via alternate assessments.
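If it helps to see the model qua model, here is the standard two-parameter logistic (2PL) item characteristic curve that underlies much of operational IRT, written as a plain function. The parameter values in the example are invented.

```python
# IRT as modeling, in one function: the two-parameter logistic (2PL) item
# characteristic curve, which models the probability of a correct response
# given ability (theta), item discrimination (a), and item difficulty (b).
import math

def icc_2pl(theta: float, a: float, b: float) -> float:
    """P(correct response | theta) under the 2PL IRT model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Example (values invented): a student one logit above an item's difficulty,
# on a moderately discriminating item, answers correctly about 77% of the time.
assert abs(icc_2pl(theta=1.0, a=1.2, b=0.0) - 0.768) < 0.01
```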
To some extent, it is fair to argue that our willingness and ability to make the most of data modeling have been limited by the scarcity of the key ingredient: data. That barrier has been effectively eliminated.
One might also point to the need to meet immediate demands (i.e., producing annual results and reports) as a barrier to taking the time needed to develop and apply models of student achievement and school effectiveness, noting that projects such as SEDA tend to live outside of the annual accountability cycle.
Our unwillingness and inability to actually define and model school effectiveness also may be due to a failure of imagination or one of the other shortcomings one often encounters when humans are tasked with solving wicked problems (e.g., our inability to disentangle the concepts of transparency and simplicity). To quote the philosopher Jack Sparrow, “The problem is not the problem. The problem is your attitude about the problem. Do you understand?”
Whatever the cause, it is essential that we understand that we can choose to model student achievement and school effectiveness well or we can choose to continue to apply inadequate, incomplete, and inappropriate models, but we cannot choose NOT to model student achievement and school effectiveness. In the words of William James, “When you have to make a choice and don’t make it, that is in itself a choice.”
Every time we select a set of accountability indicators, assign weights to those indicators to create a composite score, or even choose to use a profile instead of a composite score to assign schools star ratings or letter grades, we are engaging in modeling school effectiveness.
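A minimal sketch makes the point: the moment we write down indicators, weights, and cut points, we have written down a model of school effectiveness, whether or not we call it one. The indicators, weights, and star cut points below are invented for illustration.

```python
# Hypothetical composite: the indicators, weights, and star cut points are
# invented. Writing them down IS specifying a model of school effectiveness.
WEIGHTS = {"achievement": 0.40, "growth": 0.35, "attendance": 0.15, "climate": 0.10}
STAR_CUTS = [(90, 5), (75, 4), (60, 3), (45, 2)]  # (minimum composite, stars)

def composite_score(indicators: dict[str, float]) -> float:
    """Weighted composite on a 0-100 scale; the weights encode value judgments."""
    return sum(WEIGHTS[k] * indicators[k] for k in WEIGHTS)

def star_rating(composite: float) -> int:
    """Map the composite to stars; the cut points are themselves modeling choices."""
    for cut, stars in STAR_CUTS:
        if composite >= cut:
            return stars
    return 1
```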
Perhaps at one point in time computing percent proficient and simply aggregating a few indicators to produce a school accountability rating was the best that we could do.
But with the data and resources that we have available at our fingertips this time around, we can and must do so much better, so much better.
Image by Gerd Altmann from Pixabay