Does my writing compute?

A few years ago, my local school district invested in software designed to teach students better writing skills. The computer program — without the help of a teacher — would rate their work on a scale of 1 to 6 and give them feedback on the needed improvements, such as fixing grammatical errors or expanding sentence fragments into full sentences. The students could watch their scores rise as they made corrections, actively engaged in the process of learning new English usage skills, while their teachers were freed from the chore of reading every draft.

Great theory. Now some reality: During my daughter’s initial assignment using the software, her first draft earned a 5.9 out of 6. The tenth-of-a-point deduction was for repeating a short phrase. Fair enough. She changed the wording — maybe four words — and her score inexplicably plummeted to a 4. She put the original wording back and her score rose by a couple of tenths of a point. Then she spent the next three hours trying to figure out how to get her score back up and left the computer sobbing, declaring that she hated writing and school.

Other students were having similar experiences, the teacher said. The district used the software less and less, and three years later, the principal mentioned that they were dropping it altogether.

So it was hard to avoid a little cynicism when a professor at the University of Akron recently reported that automatic essay-scoring software, used to assess the writing samples in the state standards tests administered annually under the No Child Left Behind Act, rates most essays about the same as human scorers do.

The field of natural language processing, a subfield of artificial intelligence, is always growing more sophisticated. It would be unfair to judge this software — several different types from different creators were tested — by the computer disaster at my local school. But sophisticated enough to lead us into an era of good writing?

What software can do, according to Akron education professor Mark Shermis, is assess structure — the skeleton of writing but not the flesh. It can judge whether a sentence is complete, but not whether the sentence says anything worth a damn. It can hunt for sophisticated vocabulary but can’t determine whether those words are employed in meaningful and appropriate ways. It can suss out poor grammar, incorrect spelling and repeated phrases but not originality, flow or liveliness. It also cannot begin to measure accuracy, depth, logic or critical thinking skills. Sad to say, machines just don’t understand us.

Meanwhile, the software takes a dim view of creative flourishes that enliven prose if they reach outside a narrow box of English expository writing. Sentence fragments? Forget it. And sentences that begin with “and.” What about intentional repetition for effect, a sort of linguistic ostinato? Safer to stick with the plodding voice of the traditional five-paragraph essay.

There are no official proposals at the moment to replace human scorers with machines, Shermis said, but there are plenty of groups interested in pursuing the idea. Right now, the movement smacks more of a desire to save money than to improve education.

To be fair to the machines, this isn’t solely about the weaknesses of automated thinking. They’re following a scoring rubric that calls for rating composition skills based on form rather than substance. That’s why their scores are similar to those arrived at by humans, who have often been instructed to follow the same paint-by-number rules. The rubric itself is a product of the standardized testing movement’s emphasis on basic skills.

But no one should mistake that for good writing.

The education reform movement pushed schools into an era of ever more multiple-choice tests. In response, teachers moved away from a deeper curriculum. When complaints arose that this was dumbing down education, most states adopted a purportedly richer common curriculum for reading and math, which will show up in schools in a couple of years and make essays a bigger part of the testing equation. But if the scoring process cannot measure whether a student has melded fact, thought and verbal grace into cohesive written form, we might as well stick with having students fill in the bubbles.

--Karin Klein