Don’t expect the Tower of Babel to tumble, but computer scientists are breaking down language barriers with machines that can translate languages in seconds and recognize speech regardless of accent.
“Where can it lead to?” said Bill Wulf, assistant director of the National Science Foundation. “How big is your imagination?”
Need to call Tokyo? Hotel switchboards, within a few years, will be able to connect English-speaking callers to a special computer that translates their words into Japanese, then responds by machine in English.
Stuck in a foreign hospital? Patients and doctors who speak different languages will be able to communicate via computer in about a decade.
Traveling abroad and can’t speak the language? Forget a dictionary. Plan on packing a pocket translator.
Too busy to type information into the computer? Just dictate.
The list goes on and on, including eyeglasses for the deaf with screens that provide transcripts of conversation, and the ability to order merchandise like plane tickets by talking on the phone to computers.
“One of the most distinguishing characteristics of human beings is our ability to communicate,” Wulf said. “What you’re seeing here are techniques that improve that ability.”
Companies are beginning to cash in on voice-recognition programs even while experts are trying to make the computers capable of dealing with the ambiguity of human language, bad grammar and different accents, among other things.
“We need to reduce the costs and increase the size of the vocabulary and increase the capabilities,” said Raj Reddy, director of Carnegie-Mellon University’s Robotics Institute. “All of those are yet to be solved.”
“What is there is obviously a very significant improvement over what we used to know how to do even six months ago.”
Dragon Systems Inc. of Newton, Mass., developed a program that Xerox Corp. used in 1986 to save nearly $10 million by inventorying all 2.2 million of its parts for the first time, said Dragon’s assistant marketing manager, Jonathan Robbins.
The $100,000 system recognized 1,000 words. A 5,000-word program is out, and a 20,000-word version is expected by early 1989, Robbins said. Dragon is also working with the federal Defense Advanced Research Projects Agency to develop a voice-controlled jet fighter cockpit.
Kurzweil Applied Intelligence Inc. of Waltham, Mass., has developed voice-recognition systems for radiologists and emergency rooms. Doctors dictate their reports to a computer, and the reports are printed. The programs originally recognized 1,000 words, but the vocabulary has been expanded to 5,000.
More than 100 U.S. hospitals use the radiology system, VoiceRAD, and about a dozen have VoiceEM for emergency rooms, spokesman Martin Schneider said.
Both Dragon’s and Kurzweil’s systems require brief pauses between words and must be trained to recognize each speaker.
“Most of us don’t speak properly most of the time,” said Reddy, who is also president of the American Assn. for Artificial Intelligence. “If you transcribe anybody’s voice . . . there are all types of pauses and repetition, hums and haws, which we know how to ignore. Computers don’t know that yet.”
“Either we have to come up with ways to deal with (this) or we have to teach people to speak right,” said Kai-Fu Lee, a Carnegie-Mellon researcher who has developed a speech recognition system known as Sphinx.
It’s a slow, complex task because human language is rife with ambiguity, unlike mathematical, scientific or computer language.
Take, for example, this sentence: “The box is in the pen.”
“For humans, it’s obvious because we all know that the box cannot be in the writing pen (through) our unconscious kind of common sense,” said Masaru Tomita, associate director of Carnegie-Mellon University’s Center for Machine Translation.
“If you want computers to do this, it’s going to be very difficult,” Tomita said. “The computers have to know the typical size of a box and the typical size of a pen. One thing we can do very quickly is that whenever the word pen appears, we can assume it’s a writing pen. But on the other hand, then the system will make mistakes.”
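The shortcut Tomita describes, always assuming the most common sense of a word, can be sketched in a few lines. This is a toy illustration, not his system; the sense table is invented.

```python
# Toy sketch of the "assume the most frequent sense" shortcut Tomita
# describes: fast, context-blind, and therefore wrong on sentences like
# "The box is in the pen." The sense table below is invented.

MOST_FREQUENT_SENSE = {
    "pen": "writing pen",   # the common case always wins
    "box": "container",
}

def disambiguate(word):
    """Return the assumed sense, ignoring context entirely."""
    return MOST_FREQUENT_SENSE.get(word, word)

# Right for "I signed it with a pen"; wrong for "The box is in the pen",
# where an animal pen is meant:
print(disambiguate("pen"))  # → writing pen
```

Because the lookup never consults the surrounding sentence, it makes exactly the mistake Tomita warns about whenever the rarer sense is the right one.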
A classic example of machine misinterpretation involves the epigram, “The spirit is willing, but the flesh is weak.” Early translating systems turned that into, “The vodka is good, but the meat is rotten.”
Lee’s Sphinx, unlike earlier systems, can identify English words spoken continuously. It boasts up to 96% accuracy identifying 997 words; people don’t have to spend hours training it to recognize their speech patterns, and minor variations in accent pose little problem.
Air Pressure to Voltage
“It’s the first system that has all these capabilities,” said Lee, 26, who has been working on Sphinx for a year and a half, funded by the Defense Advanced Research Projects Agency.
Sphinx transforms spoken words from changes in air pressure into changes in voltage. Each 10-millisecond slice of sound is assigned a string of digits and, through a mathematical process, matched against stored sequences to find the most likely word. To avoid wasting time searching the entire vocabulary, Sphinx takes syntax into account.
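The process just described can be sketched as a toy program: quantize each 10-millisecond frame to a symbol, score candidate words against the symbol sequence, and let a crude syntax table prune which words are even considered. This is an illustration of the general idea, not Sphinx itself; the frame data, word templates, and grammar are all invented.

```python
# Toy sketch of the pipeline the article describes, not Sphinx itself:
# quantize 10 ms frames to symbols, score candidate words against the
# symbol string, and let a syntax table limit the search.

def quantize(frames):
    """Map each 10 ms slice (here, a single number) to a symbol."""
    return ["lo" if f < 0.5 else "hi" for f in frames]

def score(symbols, template):
    """Edit distance between observed symbols and a word template."""
    d = list(range(len(template) + 1))
    for i, s in enumerate(symbols, 1):
        prev, d[0] = d[0], i
        for j, t in enumerate(template, 1):
            prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (s != t))
    return d[-1]

TEMPLATES = {"yes": ["lo", "hi", "hi"], "no": ["hi", "lo"]}
GRAMMAR = {"<start>": ["yes", "no"]}  # syntax: legal words in this state

def recognize(frames, state="<start>"):
    symbols = quantize(frames)
    candidates = GRAMMAR[state]  # search only syntactically legal words
    return min(candidates, key=lambda w: score(symbols, TEMPLATES[w]))

print(recognize([0.2, 0.9, 0.8]))  # → yes
```

Real systems of the era used probabilistic models rather than edit distance, but the shape is the same: short fixed-length frames, a per-word match score, and a grammar that keeps the search from touching the whole vocabulary.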
Tomita, 30, who plans to work this fall with Lee, has developed a system capable of recognizing 100 words of doctor-patient dialogue in Japanese, then translating the Japanese into English uttered by a computer’s speaker.
The computer transforms the spoken Japanese into its written equivalent, which is translated into written English and sent to another part of the computer that generates English speech.
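That staged design, recognition, translation, and speech generation as separate steps, can be sketched as three stand-in functions chained together. Every body here is a placeholder, and the phrase table is invented for illustration.

```python
# Minimal sketch of the modular pipeline the article describes for
# Tomita's system: speech recognition, text translation, and speech
# synthesis as separate stages. All bodies are stand-ins; the phrase
# table is invented.

JA_TO_EN = {"atama ga itai": "I have a headache"}  # toy phrase table

def recognize_japanese(audio):
    """Stand-in for recognition: audio -> written Japanese."""
    return audio  # pretend the audio arrives already transcribed

def translate(ja_text):
    """Written Japanese -> written English via the toy table."""
    return JA_TO_EN.get(ja_text, "<unknown>")

def synthesize(en_text):
    """Stand-in for the speech generator: just label the output."""
    return f"[spoken] {en_text}"

def pipeline(audio):
    return synthesize(translate(recognize_japanese(audio)))

print(pipeline("atama ga itai"))  # → [spoken] I have a headache
```

The virtue of the modular layout is the one the article implies: each stage can be improved, or its vocabulary enlarged, without rebuilding the others.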
Tomita has also devised a program that translates written English or Japanese into written English, Japanese or German. A hypothetical exchange between a doctor and patient (“I have a headache” and “Take two aspirins”) is typed into the computer, and the translation appears within seconds.
Like Sphinx, Tomita’s 1,000-word systems consider context.
He hopes to increase the programs’ vocabularies and eventually expand them to conversation used in making hotel reservations and registering for conferences. His work is funded, in part, by IBM Corp. of Tokyo.
“We need to pick some domain where it’s very, very defined, always clear what you’re talking about,” he said.
The need for translating systems is considerable, especially for those not fluent in English, according to Tomita, who counts himself in that category.
There are more than 3,000 world languages and dialects, said Jaime Carbonell, director of the Center for Machine Translation.
90% Don’t Speak English
“Americans don’t seem to be aware of the fact that maybe 90% of the world population doesn’t speak English at all,” Tomita said. “But everywhere you go, there is somebody who can speak English.”
Because of that smugness and sense of superiority, as well as insufficient funding, machine translation has received scant attention in the United States since the 1960s, experts say. Most work has been done in Europe and Japan.
“The man on the street may or may not directly benefit from a machine-translation system,” Wulf said. “On the other hand, he may benefit indirectly from American scientists’ being able to interact better with the Russian or the Japanese or the French scientists.”