Humans talk happy.
Maybe it comes from our being on top of the evolutionary heap. Maybe we suppress the language of naysaying and grouchiness to make social relations smoother. Maybe we're just happy.
Whatever the explanation, the most commonly used words in human languages across a wide range of cultures are more likely to carry positive connotations than negative ones, says the largest-ever study of natural language and its emotional capacity.
The new research, published Monday in the journal PNAS, is the first to use "big data" to confirm what has widely been called the Pollyanna hypothesis: the notion that, since humans are fundamentally happiest when socializing, human communication -- no matter where you find it -- will generally skew happy.
First put forth in 1969, the Pollyanna hypothesis posits that, as social creatures, humans take a basic pleasure in communication with other humans, and that, as the currency of such social exchange, our languages will largely reflect that positive feeling.
The Pollyanna hypothesis would suggest, then, that compared with negative words, words that convey positive emotions could be expected to be "more prevalent, more meaningful, more diversely used and more readily learned," wrote the authors.
Led by the University of Vermont's Peter Sheridan Dodds, an international group of mathematicians, modelers and linguists set out to test that hypothesis with a set of tools not available back in 1969. They combed through Twitter, the New York Times, the Google Books Project, Google's Web Crawl, and a library of movie and television subtitles and song lyrics to draw up lists of the roughly 10,000 most frequently used words in each of ten languages.
Sometimes using several of these sources, the researchers generated a body of the most commonly used words in English, Spanish, French, German, Brazilian Portuguese, Korean, Chinese, Russian, Indonesian and Egyptian Arabic. (To generate the body of most commonly used words in the English language, for instance, the researchers used Google Books, The New York Times, Twitter and music lyrics.)
Then, they paid native speakers of each of those languages to rate how they felt in response to each of those words on a nine-point scale, where 1 is most negative or saddest, 5 is neutral, and 9 is most positive or happiest.
For each word, they collected 50 ratings from native speakers.
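The rating step described above can be illustrated with a short Python sketch. It is only an illustration, not the researchers' code: the function names, the sample words and the three made-up ratings per word (instead of 50) are all assumptions, and averaging the ratings is assumed here as the simplest way to turn them into one score per word.

```python
from statistics import mean

NEUTRAL = 5  # midpoint of the nine-point scale (1 = saddest, 9 = happiest)

def average_scores(ratings):
    """Map each word to the mean of the scores its raters gave it."""
    return {word: mean(scores) for word, scores in ratings.items()}

def share_positive(avg_scores):
    """Fraction of words whose mean score sits above neutral."""
    positive = sum(1 for score in avg_scores.values() if score > NEUTRAL)
    return positive / len(avg_scores)

# Hypothetical ratings: three raters per word here, not the study's 50.
ratings = {
    "laughter": [9, 8, 9],
    "the": [5, 5, 5],
    "war": [2, 1, 3],
    "food": [7, 8, 6],
}

avg = average_scores(ratings)
print(avg)
print(share_positive(avg))
```

A share above one half for a language's most frequently used words would be the kind of positive skew the hypothesis predicts.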
They found that each language, on the whole, uses positive words more frequently and in a wider range of forms than it does negative words. There were gradations of relative linguistic happiness, of course: Spanish, followed by Brazilian Portuguese, English and Indonesian, topped the list for happy language; Chinese appeared least happy, with Korean, Russian and Arabic -- in that order -- showing low but increasing levels of linguistic happiness.
The conclusion that this confirms the Pollyanna hypothesis comes with some big ifs: if the ten languages studied successfully reflect the whole of human language; if the "corpora" of oft-used words in each of those languages accurately reflect the emotional balance of that language; and finally, if the language we use -- and the frequency with which we use it -- actually conveys our emotional states, and not just our circumstances.
This sort of "big data" approach is gaining increasing traction in the study of social networks, their impact on the larger society and their influence on individuals.
The authors assert that language-based instruments such as this might serve as "hedonometers," or measures of overall happiness or satisfaction. Depending on what sources are used, such an instrument might reflect either bedrock levels of happiness or shifting moods across large populations of people sharing a language, a cultural outlook and a social network.
Refined and improved, they wrote, such instruments might be used "to chart the dynamics of our collective social self."
In an earlier effort, the authors scoured Twitter in English for oft-used words and found, over time, that their emotional valence rose and fell in close parallel to the Gallup organization's well-being polls and related indices at the state and city level. They've also developed a literary hedonometer, which uses word frequency to illustrate the emotional trajectory -- or "happiness time signature" -- of three great works of literature: "Moby Dick" (in English), "Crime and Punishment" (in Russian) and "The Count of Monte Cristo" (in French).
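One plausible way such a word-frequency instrument could work is sketched below in Python. This is an assumption for illustration, not the authors' published method: it takes a passage's score to be the frequency-weighted average of its rated words, and it traces a trajectory by scoring successive windows of text. The function names, the window sizes and the three word scores are hypothetical.

```python
from collections import Counter

def text_happiness(words, scores):
    """Frequency-weighted mean score over the words that have a rating."""
    counts = Counter(w for w in words if w in scores)
    total = sum(counts.values())
    if total == 0:
        return None  # no rated words in this passage
    return sum(scores[w] * n for w, n in counts.items()) / total

def trajectory(words, scores, window, step):
    """Score successive windows of text to trace a rough emotional arc."""
    last_start = max(len(words) - window, 0)
    return [
        text_happiness(words[i:i + window], scores)
        for i in range(0, last_start + 1, step)
    ]

# Hypothetical per-word scores on the 1-to-9 scale; unrated words are skipped.
scores = {"happy": 8.0, "sad": 2.0, "sea": 6.0}
words = "happy happy sea the sad sad sad sea".split()

print(text_happiness(words, scores))
print(trajectory(words, scores, window=4, step=4))
```

In this toy passage the first window scores higher than the second, which is the kind of rise and fall a "happiness time signature" would chart across a whole novel.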
"This apparent linguistic encoding of our social nature," the authors write, appears to be universal. But it can only be gleaned, they caution, by measuring the whole of a language, not just dipping in, willy-nilly, to certain parts. In the future, they added, such linguistic measurements should be done in new languages, on different demographic groups and using phrases as well as words.