
I Can ‘C’ Clearly Now


Charlie DePascale

The pandemic, concerns related to social justice and fairness, and a host of pre-2020 pre-existing conditions have pushed private and public institutions of higher education to either temporarily suspend or more permanently drop the use of tests such as the ACT and SAT as part of the college admissions process.  At the K-12 level, state assessments were cancelled in spring 2020 and some states are beginning to ask whether it makes sense to test in the spring of 2021. As a result, it appears inevitable that the oft-maligned high school GPA will take on greater significance in the next few years in the absence of large-scale standardized testing.

This has led me to reflect on my own experiences with grades and GPA.  By pretty much any measure in use at the time, I was quite successful in mastering the academic demands of public school through high school. And I was successful enough in college as I transitioned from music to educational research.  As I look back from a distance of 50 years, however, it is clear to me that the areas in which I struggled did as much as, if not more than, the areas in which I thrived to shape my future.

Measure Twice, Cut Once

I still remember the feeling upon opening the manila report card envelope in the fall of 5th grade and for the first time in my life not seeing the enclosed white card with “Honor Roll” printed in bold blue font at the top. Searching frantically through the report card, we found the ‘C’ in Woodworking.  From the ensuing conversation between my parents, teacher, and the shop teacher I recall the phrases “can’t cut a straight line” and “don’t ever let him become a surgeon” – the 1960s version of constructive feedback. I accepted that my hands were meant for different work as the song says, proudly displayed my dog-shaped key holder with the much too thin body, and moved on.  (Many years later while setting up my own basement work area I realized that everything in the elementary school shop had been set up for right-handed students, but that’s a discussion for another day.)

There were more C’s to follow at Boston Latin School.  Although most of my final grades were A’s and B’s, there were enough C’s mixed in across the five marking periods each year to map out my relative strengths and weaknesses. Growing up in the midst of the space race I spent hours with my telescope studying the moon, planets, and stars and reading everything I could find. My dreams of being a NASA scientist slowly faded, however, across three years of high school physics. At the same time, my interest in music as a career soared; so, it was on to Harvard and a concentration (i.e., major) in music.

At Harvard, I made the Dean’s List twice: in my first semester freshman year and my last semester senior year.  It was not lost on me that those two semesters were the semester before I started taking classes toward my concentration and the semester after I completed my class requirements in music.  My senior year included my first courses in computer programming, statistics, and an education policy course at the Graduate School of Education.

My pattern of growth in year-long college music theory and composition classes mirrored my performance in high school physics – from struggles at the beginning of the year to a certain level of comfort mid-year to mild success in the spring.  Although that pattern produced final grades of at least B- in high school, that was not the case in college.

In the uniquely and thoroughly Harvard grading system, the full-year average of a C+ in the fall semester and B in the spring was a C+.  As that pattern played out over several music theory and composition courses, I began to realize that music and musicology might not be my future.  Sure, it was disappointing, but I’m not bitter. Each year, I set aside $100 in the first six months of the year and $10,000 in the second six months to donate to Harvard, and then in December I send them a check for $100.

I never lost my love of music, I still go outside to watch the ISS pass over southern Maine, and I have even built a bookcase or two.  After college, I spent the next decade sharpening my skills in other passions that had always been there: moving on to teaching, a master’s degree in educational research, a doctorate in educational measurement and evaluation, and ultimately, a 30-year career in assessment and education policy that fit nicely between Joe Biden’s first and final presidential campaigns.

Find Your Passion

Would my life have turned out differently if those music grades had been B- instead of C+?  Probably not. Like the physics courses in high school, it was the experience of being able to put that particular passion to the test that made the difference. I wouldn’t trade the experience of those physics and music courses for the chance to start studying statistics a few years earlier.

This spring, as I watched the virtual Harvard Commencement and Boston Public Schools virtual graduation celebrations I heard student speakers talk about discovering themselves, finding their passion, and trying again, failing again, and failing better. I certainly felt that I was able to do those things in high school and college. I never had the feeling that GPA was a gatekeeper that could prevent me from reaching the next step. I am not sure that is the case with many high school students today.

This new concern about the potential negative consequences of attaching increased significance to high school GPA rests on top of already existing concerns about what I see as the growing trend to make high school and college (i.e., postsecondary training) much more of a direct path toward a career.  Although I understand the issues related to student debt and the high cost of higher education and the desirability of having students graduate from high school “career-ready,” in my case I know that I could see my career path much more clearly at the end of college than in the fifth, ninth, or even twelfth grade.

To borrow one of the top phrases of 2020, we cannot let the solution be worse than the problem.  In the name of student choice, practicality, affordability, and eliminating evil standardized tests it is easy for me to envision a system in place 15-20 years from now in which students are placed into immutable pathways (or tracks) as early as middle school.  In our country, in general, and in education and assessment, in particular, we have not proved ourselves to be very proficient in thinking through the long-term consequences of complex reforms and initiatives. Looking back, however, I can see clearly now.

Why I Belong to the WBCA

A Father’s Day Reflection on Basketball and My Family



Charlie DePascale

Since 2006, I have been a member of the WBCA – the Women’s Basketball Coaches Association.  That may seem odd given that I have never coached a women’s basketball team or any basketball team for that matter.

The WBCA does now have a membership category for fans. Many people know me as a devoted fan of Boston College Women’s Basketball, and, let’s say, an ardent admirer of UCONN. I also support my Crimson and Golden Gophers, and in general, I am a fan of women’s basketball.  That might be enough to explain why I faithfully renew my WBCA membership each year, but it would only be part of the story.

In this Father’s Day post, I would like to share the rest of the story.

Father

It would be an understatement to say that coaching basketball had a profound impact on my father’s life and career.  Remarkably, his decision to stop coaching basketball had an even greater impact on his life and on our family.

After graduating from high school in 1949, he served four years in the Air Force before returning home to the Roxbury section of Boston where he was born. By 1957 he was married and working full-time as a side laster in the local shoe factory. He was also a player-coach on the company basketball team in the Boston Park League and volunteered three afternoons and three nights per week as a basketball coach (among other things) at the Emmanuel House.

Emmanuel House was a Catholic settlement house in the neighborhood. He had found refuge there after his father died when he was 8 years old.  It was at the Emmanuel House that he developed his lifelong love for basketball, despite never “outgrowing” the nickname  “Pee Wee” that his mother gave him as a child.

After winning the settlement league championship with Emmanuel House and the division championship with the company team he was offered the coaching job at Don Bosco Technical High School, a Catholic vocational/technical school that had recently relocated to downtown Boston – about a mile from his neighborhood.


In 1958, he led Don Bosco to the Catholic League Suburban Division championship – the school’s first athletic championship. The following year he was offered a position as a history and math teacher. He left the shoe factory (returning in the summer months) and began what would become a 40-year career as a math teacher.  Don Bosco didn’t have its own gym until the 1970s, so after school as many players as possible would pile into his shiny black Plymouth and drive to practice.

I was born in 1959 and my sister followed in 1961. His career as a coach and teacher was thriving. Around that time, the new director of the school, a selfless priest named Fr. Vincent Duffy, encouraged my father to go to college – yes, he had not yet gone to college. He replied that with two young kids at home he could not possibly go to night school and coach basketball, the job he was hired to do.  Fr. Duffy told him that in the long run it would be better for him to get his credentials to be a teacher even if it cost the school a coach.

My father stopped coaching (kept teaching) and enrolled in night school at Northeastern University, about a mile from Don Bosco.  After earning his Bachelor’s Degree with honors, he continued on another mile up the road, earning his M.Ed. from Boston State College in 1968. In the 1967-1968 school year, he also left his position at Don Bosco, moving to the public schools and Canton High School.

Father and Son

After he settled into his new position, basketball called again.  Taking on the position of coach of the freshman boys basketball team, he also served as advance scout for the varsity team. As a 10-year-old, I was able to accompany him around southeastern Massachusetts on Friday nights (but not Tuesdays) as he scouted the next opponent.  I eagerly watched him chart shots and make notes throughout the game.  I learned how to chart shots along with him. After returning home, I would head off to bed as he sat at the kitchen table writing up his notes to deliver to the varsity team on Saturday morning.  He would patiently explain to me the key things he was noting, but I just couldn’t see the game the way he did. I assumed that skill would come with age and experience.  It didn’t.

The varsity team won its first league championship. After coaching the freshman team for a few more years, he took another short break from basketball.  Then in the mid-1970s he was offered the opportunity to coach the varsity girls basketball team.

Now in my last years of high school and first years at Harvard I was able to catch the bus to Canton and attend his games.  By that time, I had sharpened my skills at charting shots as well as tracking offensive and defensive rebounds, assists, and turnovers.  I would summarize the data, compute the game and cumulative stats, prepare visualizations, call in scores, and write game summaries for the local papers. Although I had clearly found my passion, I didn’t realize it at the time.  I was just thrilled to be working with my father and making a small contribution to his team.


And together we became fans of girls’ and women’s basketball.  Never ones to do something halfway, we watched the annual Iowa 6 on 6 basketball tournament on public television.  I sent away for an Iowa Cornets t-shirt (how could I resist that name as a basketball fan and music major?).  We watched Delta State, Louisiana Tech, and Old Dominion win championships and cheered for the Mighty Macs.

At Harvard, I would climb the stairs to the top of the IAB (Indoor Athletic Building) to the “glorified” high school gym one floor above the swimming pool. There I would watch the Harvard women’s team with players such as Wendy Carle (a friend of a friend and the woman who took piano lessons right before me) and later Rose Guarino (daughter of a Boston Public Schools music administrator I knew and one of the kindest men I ever met). I would also get to catch up with some of my father’s former players who now played for local colleges and universities.  Things were a lot more personal and informal in those days – you could just mingle with players on the court after the game.

When I arrived at the University of Minnesota in 1983 for graduate school, one of the first things I did was spend $12 for a season ticket to all women’s athletics (yes, $12).  Because I had the ticket, I started attending women’s volleyball games.  At those games I met the women’s basketball team – fundraising in the concourse during each game. Again, it was a different time.  And so, that winter I spent many nights making the cold walk to Williams Arena to cheer for Laura, Carol, Gretchen and the rest of the Gophers.

I was a fan of women’s basketball.

Father and Daughter

Fast forward to 2005 and a dreary Saturday afternoon at Logan Airport in Boston. I am waiting for a flight to Indianapolis for my first NCAA Women’s Final Four. Having taken the bus to the airport from Portsmouth, NH, I am sitting comfortably at the gate two hours early.

As other people began to arrive, I was joined in my small cluster of seats by two assistant coaches from Boston College and Harvard head coach Kathy Delaney-Smith.

(Aside for my non-basketball readers:  At that time Kathy Delaney-Smith had been at Harvard for 20+ years with 9 first place finishes in the Ivy League and 5 NCAA tournament appearances. But she was already a legend in Massachusetts basketball before coaching her first game at Harvard due to the success of her Westwood High School team and her position as a champion for gender equality.  My father’s team had considered it a moral victory to be able to contain her team for one quarter.)

The coaches chatted briefly about basketball strategy and then began discussing how they supported their players as student-athletes, the struggles and challenges the players faced, and the philosophies of their programs.  It was a brief, but inspiring, conversation to observe.

At that time my daughter, Mary, was in fifth grade and had just completed her second year of organized basketball.  She was the tallest in her class – the first in our family to play with her back to the basket – and already seemed to have inherited her grandfather’s understanding of the game (maybe it skips a generation).  We had already been looking at basketball camps for her to attend that summer; and before I got on the plane I knew it had to be either Harvard or Boston College.

As it turned out, the Boston school calendar was extended because of a large number of snow days, the Harvard camp was cancelled, and Mary was on to Boston College.


Mary attended basketball camp at BC for seven years – it was her annual refuge and oasis as she progressed through middle school and high school.  The basketball was fun, but the life lessons learned from Cathy Inglese, her staff, and her players were invaluable.  There were different life lessons when “Coach” was let go, a new coach was hired, and players transferred.  In the years that followed, it wasn’t quite the same with the new staff, but Coach Crawley, with her “Spirit Day” sermons (probably the most appropriate word), dispensed her own invaluable lessons to the young fans and campers.

We bought BC season tickets and became very familiar with the drive to/from Chestnut Hill. Through the ACC schedule, Mary was exposed to other powerful role models like Brenda Frese, Coach P (we are from Maine, after all), and Kay Yow. We made the trip to Greensboro to visit colleges and attend the ACC tournament (for my money, the best week of basketball anywhere).  As time went on, we travelled to Connecticut to see Carolyn play with the Chicago Sky, watched Ayla sing at the Grand Ole Opry, and Mary bought Vic’s social justice t-shirts. Through Facebook, Mary watched other players begin successful post-basketball careers.


As for Mary’s playing days, the 5’3” she reached in fifth grade was her final growth spurt. She moved from center to forward to point guard, but never lost her love for boxing out and fighting for a rebound.  She played basketball through high school and was co-captain of the JV team her junior and senior years. She is now a graduate student at the University of Maryland, cheers on Brenda Frese and the once-hated Terps at the Xfinity Center and even has an office in the historic Cole Field House. And when she was home for Christmas last December, we were in our regular seats at Conte Forum to watch BC take on NC State.

The Rest of the Story

Now you know the rest of the story.

When I think of the impact that coaching and the game itself have had on our family, how can I not support women’s basketball and women’s basketball coaches?  When I think of the impact that teachers and coaches like my father and Cathy Inglese had on countless students, players, and campers, how can I not support women’s basketball and women’s basketball coaches?

So, on this Father’s Day I will remember my father and FaceTime with my daughter and thank basketball for all that it has given our family.


Hamilton & The Future of Assessment

Your Obedient Servant, C. DePascale

roughly to the beat of Alexander Hamilton –
(with sincere apologies to Lin-Manuel Miranda and the entire rap community)

How could three simple, questions, multiple-choice
And some False-True, dropped in the middle of the Hamilton app
On my iPhone, by all accounts just trivia
Invalid, reveal to me the future of assessment? 

The through-course assessment without assessment
Gets you much farther without testing being harder
By being a lot smarter, by students being self-starters
By springtime, you’ll know exactly what they’ve mastered.

 You think this sounds insane, man, that thing is just a game.
It’s candy for your brain, no alignment to our claims
But there’s a million things we haven’t done
And we just can’t wait. We just can’t wait.

After answering the daily trivia questions on the Hamilton app off and on for the past two years I finally reached the 1,000-star milestone. If you’re not familiar with the app, here are the basics:

  • Three questions per day
  • One point/star awarded for each correct answer
  • Extra stars for answering all three questions correctly

The bottom line is that you have to make a commitment and answer a lot of questions correctly over an extended period of time to reach 1,000 stars.
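
For the data-minded, here is a minimal sketch of that scoring rule in Python. The app awards “extra stars” for a perfect day, but I won’t swear to the exact amount, so the bonus below is an assumption, not a documented feature of the app.

```python
# A minimal sketch of the app's daily scoring rule as described above.
# BONUS_STARS is an assumption -- the app awards "extra stars" for a
# perfect day, but the exact amount is a placeholder here.
BONUS_STARS = 1

def stars_for_day(num_correct: int) -> int:
    """One star per correct answer (out of 3), plus a bonus for going 3-for-3."""
    return num_correct + (BONUS_STARS if num_correct == 3 else 0)

# Even playing perfectly every day, 1,000 stars takes 250 days under
# this assumed bonus -- hence the commitment.
print(1000 / stars_for_day(3))  # 250.0
```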


Look around, look around

As I reflected and pondered the significance of this major life milestone I arrived at two conclusions:

  • I know a lot, but not too much, about the musical Hamilton; and wait for it …
  • This trivia game is a perfect example of, or at least a metaphor for, what we want out of assessment and instruction.

Based on the questions I have answered correctly and, perhaps more importantly, those that I did not, I can make a solid summative self-appraisal of my knowledge of two constructs: Hamilton the Musical and US History during the Revolutionary War period – the two major themes addressed by the game.  The developers of the app or a teacher could make a similar judgment. Focusing on the musical:

  • I am quite proficient in areas related to the music or performance of the musical itself. Seeing the show three times and listening to the CD countless times have paid off.
  • I am still somewhat proficient, but less so, in areas related to the creative and production team and the lives and careers of the original Broadway cast.
  • I am not proficient in other productions of the show and the details of ancillary material such as the Hamilton Mixtape and Hamildrops, although I am familiar with both.

This information would allow someone to make fairly accurate inferences about the depth of my obsession with Hamilton.  There is a logical underlying progression to the questions I am able to answer correctly in the trivia game that allows one to accurately place me along a Hamilfan or Hamfam scale.

True, there may be outliers who are totally obsessed with the Mixtape and know next to nothing about the show; there will always be outliers. Similarly, there may be a class of fourth-graders somewhere who have been taught the low-level skill of determining the first derivative of y = 3x², but we make the assumption that most people who can answer questions about first derivatives have successfully advanced through a sequence of mathematics courses to arrive at Calculus I.

We build learning progressions and assessments based on information that maximizes our ability to make accurate judgments and instructional decisions, but we don’t rely solely on a test to make summative judgments about student performance.

Thinking Past Tomorrow

Now before you challenge me to a duel, no I am not suggesting that the future of assessment is low-level multiple-choice, true-false, complete-this-lyric questions.

I am suggesting that the future of assessment is the continuous collection of information along a well-defined learning progression using both formal and informal methods: information collected so that a teacher or student will have sufficient information at a given point in time to be able to evaluate the evidence in front of her and make an appropriate formative or summative judgment.

I am suggesting a shift in mindset from assessment as a separate event to assessment as an integral part of the teaching and learning process.  We have had limited success in conveying that idea with respect to formative assessment, but it applies much more broadly. As the poet Pellegrino taught us, it’s all about knowing what students know.  Traditional end-of-year large-scale assessments as we know them in K-12 education should be, at their best, a proxy for and confirmation of the year-long collection of evidence compiled by teachers and students.

Yes, this brave new world of assessment is wide enough for both a greater focus on continuous assessment based on well-defined learning progressions and psychometrics, IRT, adaptive testing, and all of the technical tools that we have come to love.  We have to accept and embrace the idea, however, that it’s time for our assessment world to be turned upside down.

What comes next?

Well, for me, it’s time to take a break, sit for a moment alone in the shade, lament the cancelled Hamilton performances at the Kennedy Center this summer, and wait patiently for #Hamilfilm to drop on Disney+ on July 3rd.

As I have written previously, however, I am certain that 30 years from now today’s graduate students, young faculty, and freshly minted psychometricians will be retelling the story of how they seized this moment in time and blew us all away by not giving up their shot to improve assessment and education.

Something is Not Quite Write

Nagging Issues That Can Affect the Utility of Assessments



Charlie DePascale

Starting with the direct assessment of writing, the inclusion of items requiring students to produce written responses may be the most significant development in large-scale assessment in the past three decades.  We now stand on the cusp of a new wave of advances with automated scoring supporting locally administered curriculum-embedded performance tasks that measure 21st century skills. As we move forward, however, we need to acknowledge and reflect on key aspects of assessing student writing that we have not quite figured out.

In this post, I offer examples of unresolved issues at four critical points in the assessment process.  Most importantly, each of these issues affects the ability of teachers and students to interpret and use the results of assessments to support instruction and improve student learning.

  • Designing assessments that mirror authentic writing
  • Understanding the accuracy and precision of scoring
  • Assumptions in the scaling and calibration of writing items
  • Equating and linking assessments containing writing tasks

Designing Assessments that Mirror Authentic Writing

One of the strongest arguments for moving from traditional selected-response writing tests to direct writing assessment was the perceived importance of authentic assessment – of measuring students’ ability to actually produce effective writing rather than to simply recognize the elements of good writing.  A similar argument about authentic assessment was made regarding the need to shift as quickly as possible from paper-based to computer-based assessment of writing – students (and everyone else) write on computers, not paper.

Despite our best efforts, however, we must acknowledge that the writing students produce for on-demand large-scale assessments in response to prompts or stimuli that they have never seen is far from authentic writing.  It is simply a good sample of the writing that students are able to produce under a particular set of conditions and it has been found to be a fairly solid indicator of students’ general writing achievement.

At the very least, we must clearly communicate that there should be a difference between the quality of writing considered “proficient” on an on-demand large-scale assessment and the product that would be considered proficient in the classroom after the full application of the writing process, including the use of resources not available in an on-demand assessment setting.

Although it may be beneficial to use a single criterion-based rubric to classify student writing on the assessment and in the classroom, there must be consideration of the context in which the writing is produced.  For example, if a scoring rubric calls for “effective selection and explanation of evidence and/or details,” the presentation of evidence and details considered effective for a student response produced on demand in a setting where the student may not have access to primary sources should be different than the presentation of evidence and details expected in response to the same task administered in a classroom setting where the student has access to other materials as well as the opportunity to review and revise their work.  Although the criterion-based rubric may remain constant, the quality of work required to meet certain criteria will vary based on context. 

It is a disservice to teachers and students to perpetuate the notion that the exemplars reflecting writing at various points along a scoring rubric are context free.  The same considerations of context apply within the classroom setting as teachers evaluate student responses produced on a classroom assessment, for a two-day homework assignment, or for an extended research project.

Understanding the Accuracy and Precision of Scoring

There is little doubt that we are now moving rapidly from human to automated scoring of students’ written responses.  When moving to a new and improved system, we must consider that there may be reasons why things are done the way they are, and those reasons may not be obvious to us 10, 15, or 20 years down the line. For example, I am concerned that in building automated scoring models we might forget the thinking that shaped current scoring processes.  Although one might assume that the desired outcome when scoring essays is exact agreement between two raters of the same piece of writing, that might not always be true.

When holistic scoring of student essays gained popularity in large-scale assessment in the late 1980s and early 1990s, it was common practice for each student essay to be scored by two raters. If the rubric contained six score categories, each rater assigned the essay a score of 1-6.  The student score was the sum of the two raters’ scores (i.e., 2-12 for a 6-point rubric).  Using an adjacent-agreement scoring model, students received an even-numbered score if the two raters assigned the essay the same score (1-1, 2-2, 3-3, 4-4, 5-5, 6-6) and an odd-numbered score if the raters assigned adjacent scores (i.e., 1-2, 2-3, 3-4, 4-5, 5-6).
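
For readers who have never sat through a range-finding meeting, here is a minimal sketch of that scoring arithmetic. (How non-adjacent ratings were handled – typically a resolution read by a third rater – is beyond the scope of this sketch.)

```python
# A sketch of the two-rater, adjacent-agreement scoring model described
# above: each rater assigns 1-6 and the reported score is the sum (2-12).
# Exact agreement yields an even score; adjacent ratings yield an odd one.

def combined_score(rater1: int, rater2: int) -> int:
    if abs(rater1 - rater2) > 1:
        # Non-adjacent ratings typically went to a resolution read;
        # that process is outside this sketch.
        raise ValueError("non-adjacent ratings require resolution")
    return rater1 + rater2

print(combined_score(3, 3))  # 6 -- a solid response in category 3
print(combined_score(3, 4))  # 7 -- a borderline 3/4 response
print(combined_score(4, 4))  # 8 -- a solid response in category 4
```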

A major focus of blind double-scoring has always been reliability and inter-rater agreement. Therefore, when other approaches to increasing reliability became more practical, such as administering multiple writing tasks per student and having different raters score each item on a student’s test, the practice of double scoring all student responses became less common.  As the field moves to automated scoring, all student scores will be based on single scoring.

Monitoring inter-rater agreement, however, was not the only reason for double-scoring and expedience was not the only reason for accepting adjacent scores. Analyses of scored student essays conducted across several years and assessment programs have shown a discernible difference between the quality of writing in student responses assigned an odd-numbered score (e.g., 3-4) and responses with the next lower and higher even scores (3-3 and 4-4).  Responses at the borderline between two broad score categories were different from responses in the middle of either category. Like achievement levels, each score category contains a continuum of student performance. At least at the aggregate level, allowing adjacent scores from two raters added to the accuracy and precision of scoring.

This approach to human scoring could be quite useful in building automated models. It can be applied directly by training the model on student responses scored by two raters or indirectly by training the model on single-scored responses but assigning points to students using a decision rule that accounts for responses that the model identifies as borderline. For example, clear 1’s receive 2 points, clear 2’s receive 4 points, but borderline 1-2 papers receive 3 points.  Such an approach might not only maintain the advantages of the human scoring model but extend them to the individual level as well.
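
As a sketch of the indirect approach: suppose a hypothetical engine returns a continuous score estimate on the 1-6 rubric scale. A decision rule like the one above might then look something like this. (The 0.25 borderline band and the continuous-estimate interface are illustrative assumptions on my part, not features of any particular engine.)

```python
# A sketch of the borderline decision rule described above, assuming a
# hypothetical automated engine that returns a continuous estimate on the
# 1-6 rubric scale. Clear category-k responses earn 2k points; responses
# near a category boundary earn the odd, in-between value.
BORDERLINE_BAND = 0.25  # assumed half-width of the "borderline" zone

def points_from_estimate(estimate: float) -> int:
    nearest_boundary = round(estimate + 0.5) - 0.5  # 1.5, 2.5, 3.5, ...
    if abs(estimate - nearest_boundary) <= BORDERLINE_BAND:
        return int(2 * nearest_boundary)  # e.g., borderline 1/2 -> 3 points
    return 2 * round(estimate)            # e.g., clear 2 -> 4 points

print(points_from_estimate(1.1))  # 2 -- clear 1
print(points_from_estimate(1.6))  # 3 -- borderline 1/2
print(points_from_estimate(2.2))  # 4 -- clear 2
```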

Assumptions in the scaling and calibration of writing items

Whether within a standalone writing assessment or an English language arts assessment that combines reading and writing, the standard operating practice has been to apply a unidimensional IRT model that treats each writing task as an individual item.  The problem with this approach is that any differences in performance across tasks are attributed wholly to differences in item difficulty. There is no room within a unidimensional model for the concept that a student of a particular ability level may be a better writer in one genre than another.
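
To make the conflation concrete, here is one common unidimensional formulation, a generalized partial credit model (the notation is mine, and other polytomous models share the same essential structure). The probability that student i earns score k on writing task j is

```latex
% One theta per student; everything task-specific is a difficulty parameter.
P(X_{ij} = k \mid \theta_i)
  = \frac{\exp\left( \sum_{v=1}^{k} a_j (\theta_i - b_{jv}) \right)}
         {\sum_{c=0}^{m_j} \exp\left( \sum_{v=1}^{c} a_j (\theta_i - b_{jv}) \right)}
```

with the empty sum for k = 0 defined as zero. The student contributes a single ability θᵢ; each task j contributes only a discrimination aⱼ and step difficulties b_jv. A student who is genuinely stronger in narrative than in persuasive writing has nowhere in the model to show it; any systematic genre effect is absorbed into the b’s and reported as item difficulty.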

For example, if students at a particular grade level are less effective at responding to prompts that require them to generate persuasive essays than they are at responding to prompts that require them to produce narratives or informative/explanatory essays then the persuasive prompts are classified as more difficult.  In practice, this plays out as students and teachers seeing differences in ratings of student writing against the criterion-based writing rubric and no differences in the scaled scores or achievement level results.

In reality, there undoubtedly are differences in difficulty among prompts within a genre as well as differences in students’ effectiveness in writing across genres.  However, by applying scoring models that conflate the two and mask cross-genre differences in students’ writing ability we are not providing useful information to support instruction in writing.  Such an approach might be “fair” at the aggregate level from an accountability or measurement perspective but does not account for individual differences within genre and lacks utility in improving instruction and student learning.

Equating and linking assessments containing writing tasks

Nearly all of the unexpected and inexplicable issues I have encountered in equating state assessments across 30 years have been related to equating English language arts tests that include reading items and a writing task.  It’s not that the process never works, but when it appears that it hasn’t worked there is no good way to figure out why it is not working or to fix it.

I have spent far too many summer Sundays (yes, these problems always come to a head on a Sunday) in a testing company office or on a conference call with state assessment staff and psychometricians trying to figure out what to do so that the state assessment results make sense: do we fix the writing item parameters to a particular value, drop writing from the equating, or “freeze” the writing results with some version of equipercentile linking? And then how do we document the decision?

And even when the equating process appears to work, you cannot really be certain that it has actually done what you designed the assessment to do. I cannot count the number of times that equating an English language arts test went smoothly because the writing prompt was dropped from the anchor set based on pre-determined equating decision rules. 
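
To illustrate what such a pre-determined decision rule can look like, here is a sketch of an anchor-stability screen. The robust-z form and the 1.645 flag threshold are common choices, but the specifics here are illustrative, not drawn from any particular program.

```python
import statistics

# Illustrative anchor-stability screen of the kind that quietly drops a
# writing prompt from the anchor set. An item is flagged when the change
# in its difficulty estimate between years is an outlier relative to the
# other anchor items.

def flag_unstable_anchors(old_b, new_b, threshold=1.645):
    """old_b, new_b: dicts of item -> difficulty estimate in each year."""
    shifts = {item: new_b[item] - old_b[item] for item in old_b}
    med = statistics.median(shifts.values())
    # median absolute deviation, scaled to be consistent with an SD
    mad = statistics.median(abs(d - med) for d in shifts.values()) / 0.6745
    return [item for item, d in shifts.items()
            if mad > 0 and abs(d - med) / mad > threshold]

# Hypothetical anchor set: four reading items drift a little; the writing
# prompt drifts a lot and is the one flagged (and dropped).
old_b = {"R01": 0.12, "R02": -0.40, "R03": 0.85, "R04": -1.10, "WRITING": 0.30}
new_b = {"R01": 0.10, "R02": -0.38, "R03": 0.88, "R04": -1.12, "WRITING": 0.95}
print(flag_unstable_anchors(old_b, new_b))  # ['WRITING']
```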

There are several good, technically sound reasons for dropping the writing task from the anchor set or even not including it in the equating process by design. When we drop the writing task from the equating, however, we have to ask whether the results are an accurate reflection of student performance.  If student performance from one year to the next improves more in writing than in reading due to a new focus on writing and improved instruction, will that improvement be captured if the writing task is not included in the equating process?  By not including writing in equating, might we mask real differences between reading and writing performance or perhaps real differences in writing performance across genres in the same way discussed in the previous section on item calibration?

We have tried to force writing to fit nicely into a unidimensional box, but every so often it likes to stick its head out and tell us that it really doesn’t fit. Reading and writing are related, but separate, constructs.  There are other non-IRT-based approaches to combining reading and writing performance into a composite English language arts score, but those pose their own challenges such as the need for adequate field testing to understand differences in difficulty among writing tasks.  As stated above, there are reasons why things are done the way they are now, and those reasons must be understood before we try to make “improvements” to the current process.

Moving assessment forward

The inclusion of items requiring students to produce written responses had an immediate and profound impact on instruction (although there is still room for the use of well-designed selected-response and technology-enhanced items in the assessment of writing). Demonstrating that the large-scale assessment of writing was practical as well as possible from the perspective of technical quality opened the door for the use of a variety of constructed-response items across content areas.

Advances in administration and scoring technology are now making it feasible to expand the use of tasks in large-scale, standardized assessments that require written responses in ways that were unthinkable ten or even five years ago. These advances will also make it possible to expand our concept of large-scale standardized assessment to include curriculum-embedded performance assessment tasks that are administered locally at different times of the year based on the curriculum and individual student achievement. It is critical, however, that we devote at least as much time considering how the results of those new assessment tasks will be interpreted and used to support instruction as we devote to the technical challenges of administering and scoring them.