Take My Site, Please

TIMES STAFF WRITER

When D.R. Peck of San Diego launched a Web site listing all 3,800 auto repair shops in the county, he figured it was a slam-dunk to be on the first page of any Internet search using the keywords “San Diego,” “auto” and “repair.”

He was wrong. Peck tried search engines such as Yahoo and AltaVista but could not find one that listed his site in the top 10, or even the top 100. Somehow he had been outranked by hundreds of barely related newspaper, tourist and other sites.

“How could we not be there?” Peck asked. After all, his was the only Web site offering a directory of repair shops.

It was then that Peck began his journey--some would say descent--into the dark side of the Web. Closely analyzing the algorithms, or formulas, used by the search engines to rank sites, Peck redesigned his site to suit the search engines’ systems. Through a series of tricks ranging from secreting dozens of invisible keywords on the page to laying out certain words in a specific order, Peck raised his site’s ranking to a consistent No. 1.

“To me, it’s all a big game,” said Peck, now a jaded consultant in the war of manipulating information. “A legitimate quality Web site is just never going to show up [higher than] one that has been optimized.”

The ease of manipulating information has turned out to be among the most frustrating and intractable problems of the Information Age. At a time when trillions of bytes of data are freely surging around the globe in what should be a renaissance of knowledge, technology also has provided the means to distort and manipulate it all.

Search engine companies such as AltaVista, Yahoo and Excite@Home have become ensnared in a kind of arms race, trying to discover and defeat the latest tricks of what is now broadly known as “spamdexing”--an offspring of the computer-jock term “spam,” which denotes irritating or burdensome e-mail.

“I would say it has become the most time-intensive activity in our business,” said Kris Carpenter, director of search products and services for Excite@Home, the third-most popular search engine on the Internet.

Web Traffic Equals Money

While most Web surfers assume that search engines are neutral arbiters of relevance that look for keywords on sites and display their results with machine-like detachment, the reality is that the programs are easily fooled. The Web-savvy--or the unscrupulous--can boost even bogus sites onto the first page of search results for certain keywords.

The manipulation of commercial directories to gain prominent placement is certainly nothing new. Take the yellow pages, where almost every category starts with a handful of companies whose names begin with A, AA or AAA, simply to ensure they sit at the top of the alphabetized listing.

But the economic stakes on the Web make such schemes even more important. In Internet commerce, where traffic equals money, a site ranked in the first few pages of a search can attract millions of surfers. Being listed lower, among the proletariat of Web pages, so to speak, is often tantamount to not being listed at all.

“If [your site is ranked] 157,000 out of 300,000, you won’t be found,” said Randy J. Ellis, sales manager of Native Sun, a Signal Hill telecommunications equipment dealer that relies on spamdexing techniques to draw traffic to its site.

The root of the problem is the huge amount of information that pours onto the Internet each second: The Web now contains an estimated 6 trillion bytes of information spread over 800 million Web pages and 3 million computers that distribute the information.

There is no way to index all these pages by hand. Search engines rely on programs known as “spiders,” which automatically roam the Internet seeking out new pages. But the sheer vastness of the Web overwhelms even them.

The most comprehensive search engines have managed to index no more than 16% of the Web, according to a study by the NEC Research Institute in Princeton, N.J.

Moreover, just getting listed by a search engine means almost nothing. Oren Etzioni, chief technology officer of the multi-service Web site Go2Net and a professor of computer science at the University of Washington, said that in a study he conducted at the university, half of all clicks from search engines went to just the top 10 search listings. More than 90% of the clicks went to the top 50 sites.

“Just being in the top 1% means nothing,” he said. “[That means only] you’re in the top million. Good luck.”

The craft of search engine manipulation has therefore advanced quickly.

One of the earliest methods to deceive the spiders was called simply “spamming”--writing a certain keyword hundreds or even thousands of times in the invisible title of a Web page.

For example, when a spider stumbled on a page that contained “Porsche” 1,000 times, it would deduce that the page was about Porsches and would rank it very high for a search on that keyword.

The search engine companies were able to defeat this trick by commanding their spiders to ignore repetitive keywords in the invisible title.
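The weakness spamdexers exploited, and the defense described above, can be illustrated with a toy Python scorer. It assumes a ranking based purely on how often a keyword appears, which is a deliberate simplification; the function names are hypothetical, and `capped_score` models "ignoring repetitive keywords" as counting each keyword at most `cap` times (default once), not as any engine's actual rule.

```python
import re
from collections import Counter


def term_counts(text: str) -> Counter:
    # Lowercase the text and split it into alphanumeric words.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def naive_score(text: str, term: str) -> int:
    # Raw frequency: every repetition adds to the score, so stuffing pays off.
    return term_counts(text)[term.lower()]


def capped_score(text: str, term: str, cap: int = 1) -> int:
    # Hypothetical defense: repetitions beyond `cap` earn nothing extra.
    return min(term_counts(text)[term.lower()], cap)


honest = "Porsche 911 repair and Porsche parts in San Diego"
stuffed = "Porsche " * 1000 + "cheap watches"

print(naive_score(honest, "porsche"), naive_score(stuffed, "porsche"))    # 2 1000
print(capped_score(honest, "porsche"), capped_score(stuffed, "porsche"))  # 1 1
```

Under the naive scorer the stuffed page wins by a factor of 500; once repeats are ignored, the two pages tie and the spammer's 1,000 copies buy nothing.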

Spamdexers soon learned that they could embed invisible keywords in the text of Web pages themselves by using the same color as the background or by using extremely small text that shows up as thin lines on a Web page. Viewers would not see the text, but the search engines would detect it.
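The trick works because a naive indexer reads every word in a page, while a browser renders some of those words invisibly. A search engine's counter-check can be sketched as below, assuming the indexer already knows each run of text's color, background color and font size (in practice that means interpreting the page's HTML and styling, which this toy skips). The `TextRun` type and the 4-pixel threshold are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TextRun:
    text: str
    color: str        # text color, e.g. "#ffffff"
    background: str   # color of whatever the text sits on
    font_px: float    # rendered font size in pixels


MIN_READABLE_PX = 4.0  # hypothetical threshold for "too small to read"


def is_hidden(run: TextRun) -> bool:
    # Flag text drawn in the background color or shrunk to a thin line.
    same_color = run.color.lower() == run.background.lower()
    too_small = run.font_px < MIN_READABLE_PX
    return same_color or too_small


def visible_text(runs: list[TextRun]) -> str:
    # Index only what a human viewer could actually see.
    return " ".join(r.text for r in runs if not is_hidden(r))


page = [
    TextRun("Auto repair shops in San Diego", "#000000", "#FFFFFF", 14),
    TextRun("Porsche Porsche Porsche", "#FFFFFF", "#ffffff", 14),   # white on white
    TextRun("Princess Diana MP3 Genesis", "#000000", "#FFFFFF", 1),  # 1-pixel text
]
print(visible_text(page))  # Auto repair shops in San Diego
```

Exact color matching is the crudest possible test; a spamdexer could simply pick a color one shade off the background, which is one reason the article describes the contest as an arms race.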

Because there are essentially no space limitations on a Web page, spamdexers eventually learned that it was just as easy to spam many keywords at once. One variant, celebrity spamming, has become indiscriminate.

One pornographic Web site had the names of 400 female celebrities on its site written in tiny text, including Connie Chung, Amy Irving, Audrey Hepburn, Diane Sawyer, Gillian Anderson, Hedy Lamarr, Monica Lewinsky, Molly Ringwald and Olympic skater Nancy Kerrigan.

“If you’re a female celebrity in America, your name is going to be used to sell pornography online,” said Victor Polk, an attorney for Kerrigan, who filed a suit against the owners of one pornographic site to stop them from using her name.

The tricks have evolved far beyond spamming keywords. They now include techniques that can automatically hijack viewers searching for information on, say, Princess Diana, to a pornographic site. Modern Web page massagers say they can consistently place even the most ridiculously unrelated Web sites into the top few positions.

Do a search using a familiar keyword string like “Nintendo Game Boy,” “Princess Diana,” “MP3,” or the rock group “Genesis,” and you will find real estate firms, music stores, personal Web pages and pornography sites that use these popular keywords to steer traffic their way.

Even the keyword “spamdexing” first turns up an article about search engines that flashes on the screen for a few seconds before sending you off to Allsexgames.com.

One of the most abusive spamdexing techniques, according to Edgar Whipple, technical director of index engineering at AltaVista, the fifth-most popular search engine, is to place an entire, invisible copy of a dictionary into a page’s code so the page will pop up no matter what search term is entered.

“We’re always on the defensive,” he said. “It’s a real pain.”

For all that, spamdexing also has its positive side. To a certain extent, a well-tuned site helps the search engines accomplish their Herculean task of indexing the ever-expanding Web. And even legitimate businesses often refine their own sites using spamdex techniques (in which case the practice is more commonly known as “optimizing”).

“We, unfortunately, view these people as both outlaws and partners,” Carpenter said.

Native Sun used several optimizing tricks in the last year to boost itself into the top 10 rankings for such keywords as “Norstar,” a manufacturer whose equipment it sells.

“If you don’t do this, you’re nowhere,” Ellis said. “It’s a very important marketing strategy. You’ve got to use your head.”

Going After the Spiders

The proliferation of consulting firms and specialized software tools to inflate search engine rankings, however, may undermine the purity of information in the Information Age--raising the question of whether any information in the future will stand on its own or simply be a construction of search engine manipulations.

Peck, who now runs his own consulting firm on Web site optimization, Green Flash Systems, readily concedes that he is seen by some as an outlaw even though all his clients are legitimate companies only seeking to keep themselves high in the rankings.

“Even a quality Web site won’t show up,” Peck said. “This is a market now where sellers have a huge interest in being No. 1 on a search list. Money talks in terms of the pages that show up.”

“Anything created by a machine can be manipulated by a machine,” said Fredrick Marckini, author of “Achieving Top 10 Ranking in Internet Search Engines” and the head of a consulting firm specializing in Web site optimization. “It’s a constant cat-and-mouse game. But you absolutely have to pay attention to how the search engines work. If you don’t, it’s just like putting a sign up in the middle of a forest.”

The latest technique involves creating special Web pages that will appear only to search engine spiders, each of which has its own electronic identifier. When a Web site detects the spider’s identifier, it serves up a special page, precisely tuned with the right keywords and layout to ensure a high ranking.

When regular users click on the link in the search engines’ listings, they are sent to the same computer the spider visited but given a different page, which could be anything from car parts to pornography.

“That one took us a long time to diagnose,” Whipple said, adding that he calls the technique “spoon-feeding.”

The search engines were finally able to defeat most spoon-feeding by fighting fire with fire. Whipple said the spiders can be electronically disguised to appear to be ordinary users.
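Both halves of this contest, the spoon-feeding site and the disguised spider, fit in a short Python sketch. It assumes the site decides what to serve by reading the visitor's identifier, as the article describes (on the Web that identifier travels in the HTTP User-Agent header; cloakers can also key on a visitor's network address, which this sketch ignores). The agent strings and function names are hypothetical, and plain functions stand in for real HTTP requests so the example runs offline.

```python
from typing import Callable

# Hypothetical identifier strings, for illustration only.
SPIDER_AGENT = "ExampleSpider/1.0"
BROWSER_AGENT = "Mozilla/4.0 (compatible; ordinary browser)"


def spoon_feeding_site(user_agent: str) -> str:
    """Serves a keyword-tuned page to spiders and a different page to people."""
    if "spider" in user_agent.lower():
        return "<title>San Diego auto repair</title> San Diego auto repair shops"
    return "<title>Discount car parts</title> Today's specials"


def honest_site(user_agent: str) -> str:
    """Serves the same page no matter who asks."""
    return "<title>San Diego auto repair</title> Directory of repair shops"


def looks_spoon_fed(fetch: Callable[[str], str]) -> bool:
    """Fetch once as the spider and once disguised as a user, then compare."""
    return fetch(SPIDER_AGENT) != fetch(BROWSER_AGENT)


print(looks_spoon_fed(spoon_feeding_site))  # True
print(looks_spoon_fed(honest_site))         # False
```

Comparing the two pages character for character is a simplification: legitimate pages often change between visits (dates, rotating ads), so a working check would need a looser notion of "same page."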

Spoon-feeding is actually the method that Peck uses most often to help his clients.

A controversy erupted last year when it became known that one of those clients, State Farm Insurance, was spoon-feeding pages to the search engines.

In the end, the vast majority of search engines decided that State Farm’s use was allowable.

Most of the search engine companies say that as long as the information they index is related to the actual Web pages, they have no problem with spamdexing.

‘The Words Mean Nothing Now’

But even in its most benign form, spamdexing raises the issue of whether the traditional reliance on keywords will have a place in the future.

“It’s got to move more and more away from just looking at the words on a page,” said Danny Sullivan, the London-based editor of Search Engine Watch, a Web journal on search engine developments. “The words mean nothing now.”

Most search engines are moving toward a greater role for humans in deciding what is important.

Google, for example, is a search engine that ranks each Web page based on the number and importance of the pages that link to it, making it more resistant to keyword tampering.
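The best-known published form of link-based ranking is PageRank, in which a page's score is built from the scores of the pages linking to it. The sketch below is the textbook power-iteration version run on a made-up four-page web, not Google's production system; the 0.85 damping factor is the conventional textbook value, and the page names are hypothetical.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Textbook power iteration: each page splits its score among its outlinks.

    Assumes every link target also appears as a key in `links`.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}               # start with equal scores
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}  # baseline "random jump" share
        for page, outlinks in links.items():
            if outlinks:
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new[target] += share             # a link passes on part of its source's score
            else:
                for target in pages:                 # a page with no links spreads its score evenly
                    new[target] += damping * rank[page] / n
        rank = new
    return rank


# Hypothetical four-page web. The "stuffed" page may repeat keywords endlessly,
# but no page links to it, and its text never enters the calculation.
links = {
    "directory": ["shop_a", "shop_b"],
    "shop_a": ["directory"],
    "shop_b": ["directory"],
    "stuffed": ["directory"],
}
scores = pagerank(links)
for page in sorted(scores, key=scores.get, reverse=True):
    print(f"{page:10s} {scores[page]:.3f}")
```

The directory, which every other page links to, comes out on top, while the stuffed page gets only the minimum baseline share. Raising its score would require persuading other pages to link to it, which is harder to fake than typing a word 1,000 times.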

Yahoo and About.com are two search engines that mainly rely on human editors to pick the best sites. Both are largely immune to manipulation, although their collection of sites is tiny compared with such automated engines as AltaVista, with 140 million sites indexed.

Scott Kurnit, president of About.com, said the human approach makes more sense in the flood of information on the Web. Humans are hard to fool, and their picks are more relevant than those generated by a machine, he said.

“Humans are just smarter,” he said. “It’s the human touch that makes things relevant.”

Perhaps the most radical approach has been taken by Goto.com, a search engine that simply charges companies to be listed at the top in their categories. AltaVista has adopted a similar strategy on a limited basis.

Jeff Brewer, chief executive of Goto.com, said there is no point in trying to maintain an aura of neutrality.

Instead, he said, search engines should let the market decide what is important and what isn’t.

Paid search rankings, he argued, can work like advertisements in the yellow pages, signaling a prosperous Web site that is useful to people.

“It just happens that people give weight to big advertising because it shows that company has made an investment in reaching consumers,” he said.

Web sites pay Goto.com from a few cents to a few dollars every time someone clicks on their link.

“With Goto, you know it’s advertising,” Brewer said. “With others, you never know.”
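The pay-for-placement model Goto.com uses, in which sites pay per click to be listed at the top of their categories, can be reduced to a short Python sketch. It assumes listings are ordered purely by each site's per-click bid; the article does not describe Goto.com's actual rules (tie-breaking, minimum bids, editorial review), and the site names and amounts here are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Listing:
    site: str
    bid_cents: int  # what the advertiser agrees to pay for each click


def rank_by_bid(listings: list[Listing]) -> list[Listing]:
    """Highest per-click bid is listed first."""
    return sorted(listings, key=lambda item: item.bid_cents, reverse=True)


def bill_cents(listing: Listing, clicks: int) -> int:
    """The advertiser pays only when someone actually clicks."""
    return listing.bid_cents * clicks


listings = [
    Listing("shop-a.example", 5),
    Listing("shop-b.example", 150),
    Listing("shop-c.example", 40),
]
ranked = rank_by_bid(listings)
print([item.site for item in ranked])  # ['shop-b.example', 'shop-c.example', 'shop-a.example']
print(bill_cents(ranked[0], 200))      # 30000 cents, i.e. $300 for 200 clicks
```

In this scheme there is nothing on the page for a spamdexer to tune: placement depends only on the bid, which is the sense in which Brewer says "you know it's advertising."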