Google improves its speech recognition

Google Inc. switched on a program this week that lets a smart phone learn to understand its user’s voice, dramatically increasing the effectiveness of verbal commands to search the Internet, send an e-mail or post a Facebook update.

The newly updated Voice Search app for Android phones gradually learns the patterns of the user’s speech so it can more accurately interpret the person’s voice commands.

That ability is of growing importance to the Mountain View, Calif., search giant, which sees Internet searches on smart phones as a significant part of its business. Although the company doesn’t disclose specific numbers, about 1 in 4 searches on Android devices are now done by voice, and the search volume on Android phones climbed 50% in the first six months of 2010.

“A lot of the world’s information is spoken, and if Google’s mission is to organize the world’s information, it needs to include the world’s spoken information,” said Mike Cohen, who heads the company’s speech efforts.

Google’s ambitions don’t stop at improving voice recognition. Its recent purchase of Phonetic Arts, a British company that specializes in speech output, highlights Google’s plans to allow your computer or smart phone to speak back to you, in a voice that will sound increasingly natural.

Google earns the vast majority of its revenue through search advertising. It expects a majority of its Internet business to flow through smart phones and other wireless devices in the future, so high-quality voice services are a priority.

Cohen’s team has spent the last six years at Google helping develop linguistic models based on more than 230 billion search queries typed into google.com and on speech inflections recorded from millions of people who used voice search. The models are now so vast and complex that it would take a single PC several centuries to create Google’s digital model of spoken English.

“Voice is a critical strategic competence” for Google, said Al Hilwa, an analyst with the research firm IDC.

Google’s Dec. 3 announcement of the Phonetic Arts acquisition — terms were not disclosed — is “complementary to what Google is doing in social networking, video and mobile where it should be possible for people on the go to talk to their mobile devices, search engines or social networks as an alternative mechanism of interaction,” Hilwa said.

Speech also is a key area where Google competes with Microsoft Corp. The Redmond, Wash., software giant purchased Tellme Networks Inc. in 2007 to bulk up its speech services and offers voice search through its Bing search engine.

Before joining Google, Cohen co-founded Nuance Communications, a Menlo Park, Calif., speech technology company, in 1994. Cohen, a part-time jazz guitarist who once worked as a piano tuner and whose sextet once played the Montreux Jazz Festival in Switzerland, has been a research scientist in the field of computer speech technology for more than a quarter-century.

Cohen likes to laugh about Google placing a native Brooklynite — given the New York borough’s famously stretched and tangled dialect — in charge of speech recognition.

“I’m from Brooklyn,” Cohen said. “I’ve never parked my car; I only ‘paahhk’ my car.”

A person’s accent, he said, is one of the most difficult challenges for speech recognition services and is one problem that the new personal voice recognition service should help overcome.

Still, deciphering human speech involves much more than understanding accents. Variables include the shape of a person’s mouth, teeth and throat and the cadence and pitch of sentences.

“It’s all different from one person to another, and that all affects the sounds that come out,” Cohen said. “There is tremendous variation between individuals. It’s been a known thing that you can do better [at speech recognition] if you can do something to try to adapt to an individual’s speech patterns.”

Swift writes for the San Jose Mercury News/McClatchy.
