Researchers Track Web-Wide Words

Associated Press

Computer scientist Jon Kleinberg is taking a virtual stroll down the information superhighway, surfing cyberspace for verbal megatrends.

Did you wince?

Those hopelessly passé terms were passably hip just a few years back. Then, because of overuse or a feckless public, they fell out of fashion.

Kleinberg’s research is actually scientific. He uses algorithms to identify sudden jumps in the use of words, offering a glimpse into the mechanics of language evolution -- what makes a word hot, or not.

“It’s a fun tool to aim at things and see what happens,” said Kleinberg, a Cornell University associate professor.

Search engines that scour Web pages for specific words work fairly well, although users must weed out a lot of old and odd results. Kleinberg’s software is different. It looks at data without being given a keyword and reports back on significant topics.

For instance, the program scanned State of the Union addresses going back to 1790 (they are all online) and produced a list of “word bursts,” or words that jumped in frequency.

The program found “depression,” “banks” and “recovery” on presidential lips in the 1930s. In the late ’40s and ’50s, “atomic” was the explosive catchword.
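For readers who want a feel for the idea, here is a minimal sketch in Python. It is not Kleinberg’s algorithm, which the article does not describe in detail; it is a much simpler stand-in that flags a word when its share of the text in one period is several times its share in all earlier periods. The function name `word_bursts`, the `min_count` and `ratio` thresholds, and the toy speeches are all assumptions made for illustration.

```python
import re
from collections import Counter


def word_bursts(docs_by_period, min_count=3, ratio=3.0):
    """Flag words whose relative frequency jumps in a period.

    docs_by_period: list of (period_label, text) pairs in chronological order.
    A word is flagged when it appears at least `min_count` times in a period
    and its rate there is at least `ratio` times its rate in earlier periods.
    This is a simple frequency-ratio heuristic, not Kleinberg's method.
    """
    bursts = {}
    history = Counter()  # word counts summed over all earlier periods
    history_total = 0    # total number of words seen in earlier periods

    for period, text in docs_by_period:
        words = re.findall(r"[a-z']+", text.lower())
        counts = Counter(words)
        total = len(words)
        hot = []

        # The first period has nothing to compare against, so it flags nothing.
        if history_total > 0 and total > 0:
            for word, n in counts.items():
                if n < min_count:
                    continue
                rate_now = n / total
                # Add-one smoothing keeps never-before-seen words finite.
                rate_before = (history[word] + 1) / (history_total + 1)
                if rate_now / rate_before >= ratio:
                    hot.append(word)

        bursts[period] = sorted(hot)
        history.update(counts)
        history_total += total

    return bursts


if __name__ == "__main__":
    # Toy stand-ins for speeches; real input would be the full texts.
    speeches = [
        ("1920s", "the nation is strong and the nation is growing " * 5),
        ("1930s", "the banks need recovery and the depression hurts "
                  "the banks recovery depression " * 5),
    ]
    for period, words in word_bursts(speeches).items():
        print(period, words)
```

Common words such as “the” are not flagged because their rate barely changes from one period to the next. A crude ratio like this also picks up incidental words that happen to be new, which is one reason a real system would need a more careful statistical model.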

The speech scan was a test to show that the software could come up with results that correlate to the real world.

The program is intended to look at data about which the searcher has no clue -- say a mountain of unread e-mail or documents -- and divulge a list of what topics were hot and when they started to heat up.

So far, the software detects trends only in retrospect. Kleinberg is working to make it more predictive.

Prabhakar Raghavan, chief technology officer of Sunnyvale, Calif.-based software company Verity Inc., has used Kleinberg’s software to analyze Web logs, online journals commonly known as “blogs.”

Seeking emerging trends among cutting-edge bloggers, Raghavan looked for bursts of references and links to other people’s Web sites. Raghavan found that the software successfully identified such bursts, a skill that could ultimately help advertisers target their sales pitches.

To Web word watcher Paul McFedries, the burst software sounds like a great idea. He’s been using “wetware” -- his wits -- to trawl Internet databases for new uses of language.

McFedries posts the results on his site, the Word Spy.

“I kind of now have this sixth sense,” he said. “I see a word I recognize as being new, and then I check and see whether it’s just something the writer made up or if it’s something a lot of people are using.”

But, alas, new word watching can pinch nerves in these days of closely protected Internet trademarks.

McFedries got into a spot of trouble when he noted that people have started using “google” -- from the popular search engine -- as a verb, meaning scoping out a subject or person, as in, “Naturally, I googled him before I agreed to go out with him.”

Google Inc. lawyers took exception to using “google” as anything but a trademark. Harmony was restored after McFedries agreed to reference the Google trademark on his site.

Some of the words McFedries has spotted are tech-related, such as “ham,” which means legitimate e-mail that gets lost in “spam” filters because it contains some spam-like phrases. Others are free-floating jargon, such as “induhvidual,” meaning one who acts foolishly.

(Spam, the now-mainstream label for junk e-mail, is believed to derive from a Monty Python sketch that made fun of the eponymous canned meat.)

The Internet both creates and propagates terminology, “e-tymologist” McFedries says.

So there’s “dead cat bounce,” an unlovely phrase referring to a stock that dives, starts to rise, then falls back. The phrase has left the trading floor for the world at large.

“Ping,” the sound of a sonar pulse, has evolved to mean getting someone’s attention online, as in, “I’ll ping Frank to see if he’s there.”

And then there’s “bandwidth,” which describes not just transmission but also mental capacity. For instance: “I’m not sure he’s up to the job. He’s got awfully low bandwidth.”

What makes a new word stick?

Simple sells; clever crashes, says Allan Metcalf, author of “Predicting New Words: The Secrets of Their Success” and executive secretary of the American Dialect Society.

Of course, there’s nothing new about creating words, he points out.

One of the most dramatic language upheavals came after the Normans conquered England in 1066. Suddenly, all the “nobs” were speaking French. With no one to lay down the law about proper English, the peasants had their merry way, dropping the Germanic inflections of Old English and developing easier-on-the-tongue Middle English.

What’s different today is the Internet effect, capturing each mutation and revolutionizing the study of words.

“It’s kind of like the Hubble telescope,” Metcalf said. “The stars are what they were before, but now you can see them more clearly.”

*

A burst of online catchwords

Granular: Looking at something in finer detail.

Ping: To check whether someone is available. From a software command that determines whether a computer is running and reachable over a network. The term derives from the “ping” of sonar bouncing off an object underwater.

Blogger: One who writes an online journal -- a Web log, or blog.

Multislacking: Surfing the Web instead of working.

Cracker: Hacker who uses talents for malicious or criminal ends. Also dark-side hacker.

Crash-test dummy: Someone who buys the initial release of a software package, probably riddled with bugs.

Dot snot: An arrogant young person who became rich by creating a dot-com. Similar to millionerd.

Feature shock: Computer user’s reaction when faced with a program with many features.

Internot: Someone who does not use the Internet.

Meatspace: The real world; the opposite of cyberspace.

Mouse potato: Someone who spends a lot of time at the computer.

Screenager: A young person who has grown up with TVs, computers, ATMs and the like.

*

Source: Associated Press
