Computer Reads Poetry, Supports Claim It’s Work of Shakespeare

Times Staff Writer

A statistical analysis based on every word in the writings of William Shakespeare supports the claim that a recently discovered love poem attributed to him is genuine, a statistician said Thursday.

The untitled poem was discovered late last year by Gary Taylor, a 32-year-old researcher from Topeka, Kan., who found it in a book in the Bodleian Library at Oxford University.

The discovery touched off a lively literary debate over whether Shakespeare actually wrote the poem, since some scholars consider it poor poetry and point to other anomalies in it.

Enter statistics.

Ten years ago, two statisticians, Brad Efron of Stanford University and Ron Thisted of the University of Chicago, analyzed the frequency of word usage in Shakespeare. They published their results in the journal Biometrika in an article subtitled “How Many Words Did Shakespeare Know?”

The entire body of Shakespeare’s known work contains 884,647 words. Of these, 14,376 different words are used just once, 4,343 are used twice, 2,292 are used three times and so forth. It is possible statistically, based on a frequency distribution of this kind, to predict how many new words should appear in a new work, based on its length. (More about this later.)

After the 350-year-old poem was discovered, the two put the poem to the test.

The work contains a total of 430 words.

Efron and Thisted predicted that 430 additional words by Shakespeare would contain, on the average, about seven new words--plus or minus a little less than three.

The actual number of new words in the poem is nine.

Similarly, their theory predicts that 4.21 words that have been used exactly once before by Shakespeare would be used a second time. The actual number is seven.
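The arithmetic behind these two predictions can be approximated in a few lines. The sketch below is not Efron and Thisted's published procedure. It assumes a simple first-order rule: the expected number of new words is the count of words used once, scaled by the ratio of the poem's length to the whole body of work, and the expected number of second uses is twice the count of words used twice, scaled the same way. The function names and the Poisson-style spread are illustrative assumptions.

```python
import math

# Figures quoted in the article.
TOTAL_WORDS = 884_647   # all words in Shakespeare's known work
USED_ONCE = 14_376      # distinct words appearing exactly once
USED_TWICE = 4_343      # distinct words appearing exactly twice
POEM_WORDS = 430        # length of the newly discovered poem


def expected_new_words(poem_words: int) -> float:
    """First-order estimate of never-before-seen words in new text."""
    return USED_ONCE * poem_words / TOTAL_WORDS


def expected_second_uses(poem_words: int) -> float:
    """First-order estimate of once-used words that turn up again."""
    return 2 * USED_TWICE * poem_words / TOTAL_WORDS


new_words = expected_new_words(POEM_WORDS)
# Assumption: the count behaves roughly like a Poisson variable,
# so its standard deviation is the square root of its mean.
spread = math.sqrt(new_words)
second_uses = expected_second_uses(POEM_WORDS)

print(f"new words: {new_words:.2f} +/- {spread:.2f}")
print(f"second uses: {second_uses:.2f}")
```

Under these assumptions the estimates come to roughly seven new words, give or take about 2.6, and about 4.2 second uses, in line with the figures quoted above.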

“The poem comes out very well,” Efron said by telephone Thursday from Stanford. “The statistics are remarkably similar to Shakespeare statistics.”

“That doesn’t mean that Shakespeare wrote it,” he hastened to add. “It means you can’t obviously reject the idea that Shakespeare wrote it.”

By contrast, Efron said, similar statistical analyses of the works of Marlowe, Jonson and John Donne (who are sometimes claimed to be the “real” authors of the works attributed to Shakespeare) come out with very different numbers.

“Right away you can see that Marlowe did not write Shakespeare,” Efron said. “You can see that Jonson did not write Shakespeare. You can see that Donne did not write Shakespeare. But the new poem passes the test.”

The usual statistical analyses of authors’ works focus on common words such as “and” and “the.”

Efron and Thisted’s test focuses on rarer words, which are thought to be less dependent on context and therefore a more reliable clue.

The statistical method for predicting how many new words will appear, based on the frequency of previously used words, was developed early in this century and grew out of butterfly collecting in Malaysia, Efron said. A collector had seen some species of butterflies several hundred times and others just once, and he wanted to know how many he had not seen at all. Strange as it seems, statistics can handle this problem.

Consider: Suppose the collector saw the same single species of butterfly every time and never any other. He would be justified in concluding that there probably weren’t any others. Similarly, if he saw a different species each time, he could reasonably conclude that there were many more to come.
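The two extremes can be illustrated with a short sketch. It assumes a common rule of thumb, the Good-Turing estimate, which takes the chance that the next sighting is a new species to be the share of sightings that belong to species seen exactly once. This is a related simplification, not necessarily the method Efron described, and the function name and sample data are hypothetical.

```python
from collections import Counter


def chance_next_is_new(sightings: list) -> float:
    """Good-Turing style estimate: species seen exactly once / total sightings."""
    counts = Counter(sightings)
    seen_once = sum(1 for c in counts.values() if c == 1)
    return seen_once / len(sightings)


# The same species every time: no reason to expect anything new.
same_every_time = ["swallowtail"] * 50
# A different species every time: expect many more to come.
different_every_time = [f"species_{i}" for i in range(50)]

print(chance_next_is_new(same_every_time))
print(chance_next_is_new(different_every_time))
```

Real collections fall between these two cases, which is where the frequency distribution does the work.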

Somewhere in between, statistics can take the frequency distribution of the known world and predict the unknown. That is how Efron and Thisted predicted how many new words there would be in the newly discovered poem attributed to Shakespeare.

“It was fun to see our theory--which we never thought we’d get to test--come out so well,” Efron said.
