The Cutting Edge: COMPUTING / TECHNOLOGY / INNOVATION : Software Knows to Deconstruct in Plain English

SPECIAL TO THE TIMES

Who has set the table?

Who has the sets of tables?

To humans, those are very different questions. But to computers, they are nearly identical. Kathleen Dahlgren is trying to teach machines to recognize the difference.

Dahlgren is president of Intelligent Text Processing, a Santa Monica firm made up of linguists and artificial-intelligence experts who have spent six years building InQuizit, a dizzyingly complex piece of software that knows approximately 200,000 of the most common meanings for words in the English language.

InQuizit has learned how to recognize that in the first sentence, “table” is a piece of furniture where people sit down to eat, while in the second sentence, “table” describes a list of figures in a chart.

Dahlgren and her colleagues hope that ability will vault InQuizit past a raft of competitors and make it the premier software for helping people find information in databases of books, articles and other texts, particularly on the World Wide Web.

Traditional methods for finding what amounts to a needle in a digital information haystack rely on computer programs, called search engines, that use "true or false" (Boolean) logic to determine whether a key word or phrase is contained in a document. Some advanced search engines are programmed to find synonyms of key words or use complex statistical analysis to try to identify the particular meaning of a word based on the other words that are near it.

But those kinds of searches often retrieve documents that aren’t relevant, while at the same time skipping over documents that contain useful information. Some of them also require users to pose their questions in an arcane and nonintuitive way.
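To make that weakness concrete, here is a minimal sketch in Python of the "true or false" keyword matching described above: a document either contains every key word or it does not. The function names and the three sample sentences are hypothetical illustrations, not taken from any actual search engine.

```python
import re

def tokens(text: str) -> set[str]:
    """Lowercase the text and split it into alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))

def boolean_match(document: str, keywords: list[str]) -> bool:
    """True only if every keyword occurs as an exact word (a Boolean AND query)."""
    words = tokens(document)
    return all(k.lower() in words for k in keywords)

# Hypothetical documents for a query about setting a dinner table.
docs = [
    "She set the table for dinner.",                     # relevant: furniture
    "The table lists figures for each set of results.",  # irrelevant: a chart
    "Please lay out the plates and forks.",              # arguably relevant, no key words
]

hits = [d for d in docs if boolean_match(d, ["set", "table"])]
print(hits)
```

The query returns the sentence about a chart, because it happens to contain both words, and skips the sentence about laying out plates, because it contains neither: the two failures the paragraph above describes.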

*

InQuizit, by contrast, is a “natural language” search engine, allowing users to ask a question in plain English. And because it understands the nuance of a question, 87% of the documents it retrieves are relevant, and it captures 96% of all relevant documents, the company said.
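In standard information-retrieval terms, the company's two figures are precision (the share of retrieved documents that are relevant) and recall (the share of relevant documents that are retrieved):

$$
\text{precision} = \frac{|\text{relevant} \cap \text{retrieved}|}{|\text{retrieved}|} \approx 0.87,
\qquad
\text{recall} = \frac{|\text{relevant} \cap \text{retrieved}|}{|\text{relevant}|} \approx 0.96
$$

As a hypothetical illustration: if a collection held 100 relevant documents, 96% recall would mean 96 of them are returned, and 87% precision would then imply about 110 documents retrieved in all (96 / 0.87 ≈ 110), roughly 14 of them irrelevant.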

“They have really broken the code,” said Michael Golden, an engineering psychologist at the Army Research Laboratory in Aberdeen, Md., which gave Intelligent Text Processing a $550,000 grant to develop better interfaces between people and machines. Last summer, the Army honored ITP as one of the most innovative small companies it funded.

“It’s beyond anything that’s currently out there,” said John Lin, president of Digital Facades, a Santa Monica interactive media design firm that creates Web pages and CD-ROMs. “It’s going to be just an incredibly revolutionary tool to locate information that would otherwise be nearly impossible or very difficult to find.”

For example, consider the sentence, “The mother can transmit AIDS to her baby.” First, InQuizit deconstructs the sentence and identifies “transmit” as the verb. Then, based on the words “AIDS” (a kind of disease) and “baby” (a kind of person), it realizes that “transmit” means “infect” rather than “send,” “broadcast,” or any of its other definitions.
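The step described above, picking a verb's meaning from the kinds of things its object and recipient name, resembles what linguists call selectional restrictions. The Python below is a toy sketch under that assumption only: the mini-lexicon, the category labels and the `disambiguate` function are hypothetical, and the article does not say how InQuizit actually stores or ranks its senses.

```python
from typing import Optional

# Hypothetical mini-lexicon mapping words to semantic categories.
CATEGORY = {
    "mother": "person",
    "baby": "person",
    "aids": "disease",
    "signal": "message",
    "station": "organization",
}

# Hypothetical senses of "transmit", most specific first. Each sense lists
# the category it expects for the object and the recipient (None = anything).
SENSES = {
    "transmit": [
        ("infect",    {"object": "disease", "recipient": "person"}),
        ("broadcast", {"object": "message", "recipient": None}),
        ("send",      {"object": None,      "recipient": None}),  # fallback
    ],
}

def disambiguate(verb: str, obj: str, recipient: Optional[str]) -> Optional[str]:
    """Return the first sense whose expected categories fit the actual words."""
    obj_cat = CATEGORY.get(obj.lower())
    rec_cat = CATEGORY.get(recipient.lower()) if recipient else None
    for sense, slots in SENSES[verb]:
        if slots["object"] not in (None, obj_cat):
            continue  # object is the wrong kind of thing for this sense
        if slots["recipient"] not in (None, rec_cat):
            continue  # recipient is the wrong kind of thing for this sense
        return sense
    return None

print(disambiguate("transmit", "AIDS", "baby"))
```

With "AIDS" (a disease) as the object and "baby" (a person) as the recipient, only the "infect" sense fits; a "signal" sent to a "station" would instead select "broadcast", and an unknown object falls back to plain "send".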

Later, when a user asks a question like, “Can a mother give AIDS to her baby?” or “Did the baby get AIDS from its mother?” or any of the thousands of other ways to pose that query, InQuizit will know to retrieve the sentence “The mother can transmit AIDS to her baby.”
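One way to picture the matching described above is to reduce both the stored sentence and each question to the same normalized "who passed what to whom" frame, then compare frames rather than words. This Python sketch assumes a parser has already identified the verb and its roles; the `Frame` type, the `normalize` rule and the one-entry index are hypothetical illustrations, not ITP's design.

```python
from typing import NamedTuple, Optional

class Frame(NamedTuple):
    sense: str                # normalized meaning of the verb
    source: Optional[str]     # who passes something on
    theme: str                # what is passed on
    recipient: Optional[str]  # who receives it

DISEASES = {"aids"}  # hypothetical category list

def normalize(verb: str, source: Optional[str], theme: str,
              recipient: Optional[str]) -> Frame:
    """Map several surface verbs to one sense when the theme is a disease."""
    theme = theme.lower()
    if theme in DISEASES and verb in {"transmit", "give", "get", "pass"}:
        sense = "infect"
    else:
        sense = verb
    return Frame(sense,
                 source.lower() if source else None,
                 theme,
                 recipient.lower() if recipient else None)

# Index the stored sentence under its normalized frame.
index = {
    normalize("transmit", "mother", "AIDS", "baby"):
        "The mother can transmit AIDS to her baby.",
}

def answer(frame: Frame) -> Optional[str]:
    return index.get(frame)

# "Can a mother give AIDS to her baby?"
print(answer(normalize("give", "mother", "AIDS", "baby")))
# "Did the baby get AIDS from its mother?" (parser assigns mother as source)
print(answer(normalize("get", "mother", "AIDS", "baby")))
```

Both questions reduce to the same frame as the stored sentence, so a simple lookup finds it even though neither "give" nor "get" appears in the original text.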

“The machine should act like a good Web librarian,” said Edward Stabler, vice president of ITP and a linguistics professor at UCLA. “It should figure out what information you need, go get it and put it up on your screen.”

Dahlgren and Stabler studied natural language processing at IBM in the 1980s. In 1990, they founded ITP to use what they had learned at IBM to build an easy-to-use and highly accurate search engine.

InQuizit was originally intended to search text databases of any sort, whether they were on client-server networks, CD-ROMs or the Internet. When the popularity of the World Wide Web (the graphics-rich, user-friendly portion of the Internet) exploded last year, the company decided to target Web sites as potential clients.

“This is the bread and butter of what people want,” said Jack Reynolds, ITP’s marketing director. “This is what makes Web sites useful.”

Reynolds says he hopes to license InQuizit to a few hundred Web sites by the end of the year. Pricing would depend on the size of a client’s text database and other factors, but licensing fees would run at least $10,000 per Web site, he said.

Pathfinder is one potential client. The mammoth Web site features archives of magazines such as Time, People and Sports Illustrated, and “searching is at the core of visiting the Web site,” said Pathfinder’s producer, Jack Mason, who tested InQuizit at ITP’s office last week.

“The idea that you could ask a question in plain English and get an appropriate answer is naturally appealing,” he said. “It enables people to have an experience with their computer that is more human than the current searching techniques.”