Advances Offer Hope for Truly Useful Filters

Steve G. Steinberg is an editor at Wired magazine. His e-mail address: steve@wired.com

Every day I sift through a mountain of newspapers, magazines and electronic messages to find the news that matters to me. And every day computer scientists work on artificial intelligence and natural language-processing techniques in the hope of someday automating my task. But despite the prodigious amounts of money and brainpower expended over the last 20 years, their efforts appear to be as Sisyphean as mine.

In a reflection of this lack of progress, the focus has recently shifted away from enabling machines to understand meaning and context and toward a new set of techniques known as collaborative filtering, which attempt to take advantage of people’s collective intelligence rather than replace it. The idea is simple: find out the opinions of people similar to me to help predict which articles I will find useful.

With the emergence of global computer networks that allow the opinions of thousands of readers to be quickly collected and disseminated, collaborative filtering has the potential to finally put a dent in the problem of information overload. It’s already being used by Agents Inc.’s Firefly system (https://www.ffly.com) to help consumers find record albums they like.

But collaborative filtering also has its share of shortcomings and subtle dangers that, if not addressed, could lead to the same kind of disappointment that has long plagued more traditional information filtering research.

Probably the best way to understand the promise and perils of collaborative filtering is to look at how it is being used by GroupLens (https://www.cs.umn.edu/Research/GroupLens), a research project at the University of Minnesota that filters messages on the global Internet bulletin board system known as Usenet.

Usenet is divided into 8,000 or so newsgroups, each focusing on a specific interest. People can post messages to these groups that can then be read by anyone else. The problem is that of the 100,000 or so messages posted to Usenet each day, only a handful are of interest to any one person, and that person may not know exactly where to look.

Currently, a few traditional information filtering techniques are used to sort through this barrage. Search engines such as Alta Vista (https://www.altavista.digital.com) and Deja News (https://www.dejanews.com) allow users to find documents and messages that contain specific keywords. That’s wonderful in some cases--if you’re, say, just looking for messages about the writer Bruno Schulz. But it’s not much help if you’re interested in more general topics that can’t be identified with just a few words.

Another partial solution is moderated newsgroups. These are special newsgroups that only carry messages that have been approved by an elected moderator. This cuts down on the redundant and useless messages that often clutter a newsgroup, but if you don’t agree with the moderator’s opinion of what’s important, you’re out of luck.

By using collaborative filtering, the GroupLens system is able to avoid both of these shortcomings. With GroupLens, you are asked to rate every message you read with a score from 1 to 5. A 1 means you didn’t find the message useful to you at all; a 5 means you found it very useful.

Once you’ve rated a few dozen messages, GroupLens can match you with users who have similar tastes and interests. Then, based on the opinions of those users, GroupLens can predict with a good deal of accuracy how useful you will find messages you haven’t yet read.

A critical aspect of GroupLens is that it doesn’t merely compute an average of what everybody thinks about a message. Instead, it looks at only those people who have demonstrated tastes similar to yours.

Brad Miller, one of the system’s developers, illustrates why this is important by pointing to a newsgroup in which users exchange their favorite recipes. If you’re a vegetarian on this group, you won’t find chicken recipes very useful, even though the majority of the group’s users may love them. By identifying groups of similar users, GroupLens is able to provide the kind of filtering you’d find on a moderated newsgroup while still supporting a multiplicity of opinions.
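
To make the recipe example concrete, here is a minimal Python sketch of the difference between a plain average and a prediction drawn only from like-minded readers. It is an illustration under stated assumptions, not GroupLens's actual code: it assumes Pearson correlation as the measure of similar taste, and every user name, message name and rating in it is invented.

```python
from math import sqrt

# Hypothetical 1-to-5 ratings for a recipe newsgroup (all data invented).
ratings = {
    "vegetarian_a": {"lentil_soup": 5, "tofu_stir_fry": 4, "roast_chicken": 1, "veggie_chili": 5},
    "vegetarian_b": {"lentil_soup": 4, "tofu_stir_fry": 5, "roast_chicken": 1, "chicken_curry": 1},
    "omnivore_a": {"lentil_soup": 2, "tofu_stir_fry": 1, "roast_chicken": 5, "chicken_curry": 5},
    "omnivore_b": {"lentil_soup": 3, "tofu_stir_fry": 2, "roast_chicken": 5, "chicken_curry": 4},
    "omnivore_c": {"lentil_soup": 2, "tofu_stir_fry": 2, "roast_chicken": 4, "chicken_curry": 5},
}


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def pearson(a, b):
    """Pearson correlation over messages both users rated; 0.0 if undefined."""
    shared = [m for m in a if m in b]
    if len(shared) < 2:
        return 0.0
    mean_a = mean(a[m] for m in shared)
    mean_b = mean(b[m] for m in shared)
    cov = sum((a[m] - mean_a) * (b[m] - mean_b) for m in shared)
    var_a = sum((a[m] - mean_a) ** 2 for m in shared)
    var_b = sum((b[m] - mean_b) ** 2 for m in shared)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def plain_average(user, message):
    """What everybody else thinks, ignoring taste entirely."""
    scores = [r[message] for name, r in ratings.items()
              if name != user and message in r]
    return mean(scores) if scores else None


def predict(user, message):
    """Shift the user's own mean by the opinions of positively correlated users."""
    own = ratings[user]
    base = mean(own.values())
    weighted_sum = 0.0
    weight_total = 0.0
    for name, other in ratings.items():
        if name == user or message not in other:
            continue
        weight = pearson(own, other)
        if weight <= 0:
            continue  # assumption: only readers with similar tastes get a vote
        # Use each neighbor's deviation from their own mean, so generous and
        # stingy raters are put on a comparable scale.
        weighted_sum += weight * (other[message] - mean(other.values()))
        weight_total += weight
    if weight_total == 0:
        return base  # no similar reader has rated this message
    return min(5.0, max(1.0, base + weighted_sum / weight_total))
```

With this invented data, `plain_average("vegetarian_a", "chicken_curry")` comes out at 3.75, because three of the four people who rated the chicken curry loved it. `predict("vegetarian_a", "chicken_curry")` comes out at about 2.0, because the only reader whose past ratings track the vegetarian's is the other vegetarian, who gave the recipe a 1.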

So how well does GroupLens work in practice? That’s hard to say. Right now GroupLens is used by only about 200 people and only rates messages in 20 of the most popular newsgroups. That makes the system a lot less useful than it could be. Nonetheless, the anecdotal evidence is positive. According to Miller, GroupLens filtering has made rec.humour, a newsgroup where people are supposed to trade jokes but usually end up trading insults, funny again. As anyone who has tried to read rec.humour can tell you, that’s a major accomplishment.

But a close look at GroupLens also shows some weaknesses that could develop into serious problems for larger-scale collaborative filtering systems. The first weakness is what economists call the free-rider problem. After rating a few dozen messages in order to establish his or her interests, a GroupLens user can get all the benefits of the system without ever rating another message. Yet if everyone did this, GroupLens would quickly become useless.

To combat this problem, researchers have tried to make rating messages as painless as possible. In the case of GroupLens, it’s a matter of a single keystroke. However, it remains to be seen if people will put up with even that modicum of effort.

A more fundamental problem is with the idea of peer recommendations. Collaborative filtering experts are quick to admit that there are some types of information--medical advice, for example--where we would rather know the opinion of an expert instead of the opinion of a hundred peers. However, I wonder if this isn’t true for most types of information. Even for something as simple as movie recommendations, I’d hate to get recommendations only from people with tastes as plebeian as mine.

Paul Resnick, a scientist at AT&T Research and one of the original developers of GroupLens, believes that we’ll eventually see hybrid solutions that base predictions on expert opinions when available (and when they show some correlation to our tastes) but otherwise use peer recommendations. This kind of social filtering would be far superior to anything artificial intelligence researchers could ever hope to develop, and is well within our reach.
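
The hybrid rule Resnick describes might be sketched as follows. This is only one reading of the idea, not a description of any existing system: the function name, its parameters and the 0.5 correlation threshold are all hypothetical choices made for illustration.

```python
def hybrid_prediction(peer_score, expert_score=None,
                      expert_correlation=0.0, min_correlation=0.5):
    """Prefer an expert's rating when one exists and the expert's past
    ratings correlate with the reader's; otherwise fall back to peers.

    min_correlation is an arbitrary illustrative cutoff, not a published value.
    """
    expert_available = expert_score is not None
    expert_trusted = expert_correlation >= min_correlation
    if expert_available and expert_trusted:
        return expert_score
    return peer_score
```

A critic who has agreed with you in the past overrides the crowd, so `hybrid_prediction(2.0, expert_score=4.5, expert_correlation=0.8)` returns the expert's 4.5. A critic whose taste runs opposite to yours, or no critic at all, leaves you with the peer prediction.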
