Biotech Firm’s Mix-Up on Fly Genome Creates a Stir

TIMES STAFF WRITERS

In a scientific mix-up that has become the talk of gene researchers, the biotechnology company that deciphered the genetic code of the fruit fly inadvertently included stretches of human genetic material in data it posted on a public Web site.

The error by Maryland-based Celera Genomics was discovered by federal officials who monitor GenBank, the database where the codes of the fly and other creatures are freely available. The company retracted the information a few days ago, soon after being told of the error.

The mix-up, which is unlikely to cause any lasting scientific damage, has become another issue dividing the scientific community at a time when Celera and a consortium of public genome centers are racing to complete and publish a working draft of the human genetic code.

The race to be first with the completed code is a high-stakes game not just for scientific bragging rights but also for determining who will have first shot at using the new genetic data to develop medical breakthroughs: publicly funded researchers or private drug companies.

The human genome is a detailed instruction manual for running the inner machinery of every member of the species. When researchers learn to read the manual they will have at hand a tool for fighting disease, maintaining health and perhaps extending the human life span. The mapping of the fruit fly genome was an important test run for Celera’s effort.

Scientists with the public Human Genome Project say that the discovery of human genetic code fragments and other more vexing errors in Celera’s fly data is evidence of the company’s rush to publish and the pressures of competition. Some wryly compare the mix-up to the plight of the genetically confused part-human, part-fly character in the classic horror movie “The Fly.”

The dust-up is just the latest example of a growing acrimony between the company and many of the academic scientists in the public human genome effort. It follows a complaint to Congress from Celera’s president, J. Craig Venter, who contended that the public scientists were taking shortcuts in order to win the race to map the human genome.

Richard K. Wilson, co-director of the Washington University genome sequencing center, points to that testimony when asked about the significance of mixing fly and human data.

“What does it tell me?” Wilson asked. “It tells me that people who live in glass houses shouldn’t throw stones. I’m responding to [Venter’s] testimony, his public comments, his penchant for knocking the public effort. If anybody else put out this kind of sequence with this degree of contamination in it, Celera would be all over them and telling [the press] what an awful job whoever produced that sequence had done.”

J. Paul Gilman, Celera’s director of policy planning, acknowledged that the fly data were contaminated with a small amount of human code. “There is nothing new about any of this in terms of the sequences that have been published,” he said.

A Celera collaborator on the fly map, Gerald Rubin of UC Berkeley, said the contamination was “absolutely true and absolutely trivial.”

He said that he could not explain how the human data ended up in the fly material and that there could be other types of contamination that have not yet been caught. “This is the equivalent of your misspelling someone’s name in a story,” Rubin said.

The completion of the genetic code of the fly, one of the most thoroughly studied lab animals, has been hailed as a major triumph for Celera, even by the company’s critics.

The genome of any organism--human or fly--is its entire genetic code, contained in long DNA molecules made up of four chemical building blocks, identified by the letters A, T, C and G. There are 3 billion of these chemical letters in the human genome packaged in 23 pairs of chromosomes. In contrast, the fly genome contains about 180 million letters in just four pairs.

But even in the fly, determining the exact order of those letters--sequencing its genome--is a daunting task, made possible by high-speed automated machines that do the job with minimal human intervention.

Celera, a subsidiary of PE Corp., was launched two years ago with the mission of being first to complete the human genome sequence.

However, it began instead with the smaller fly genome, working with Rubin’s group and the publicly funded genome center at the Baylor College of Medicine in Texas.

The fly became a kind of test run for Celera’s rapid-fire sequencing technique that relied on hundreds of machines to read out the chemical letters of genetic fragments and a phalanx of supercomputers to assemble all the pieces like a mammoth jigsaw puzzle.

The firm began work on the fly last May and published its finished sequence last month. Like most of the genomes of organisms regarded as completed so far, the fly code has not been totally mapped; there are gaps throughout the chromosomes. And some fragments of fly DNA that have been decoded cannot be placed in the larger landscape.

The human DNA data that contaminated the fly sequence were found among those fragments that could not be mapped.

David Lipman, director of the National Center for Biotechnology Information, said the human contaminants in the fly data were identified by the center’s staff after Celera submitted its results to GenBank.

By comparing the fly code with other organisms in the database, a center scientist discovered that 69 pieces of code--150,000 letters in all--were from human DNA and not fly. That’s about a tenth of 1% of the total genome.

In papers published last month in the journal Science, Venter, Rubin and their collaborators acknowledged that more than 2% of the fly’s genetic code could not be mapped precisely and that some fragments “may represent as-yet-unidentified foreign DNA.”

“It offends me, the degree to which this has been politicized,” Rubin said. “I’m not ashamed. I’m not embarrassed that some of these sequences had to be withdrawn.”

He also noted that the published sequence was described as “Release 1” and that the collaborators intend to make further releases as they fill gaps and improve accuracy.

Lipman agrees with Rubin that contamination errors do pop up from time to time at every sequencing center. “I don’t really see this as evidence of careless work,” he said.

However, Lipman’s staff has found some other problems with the fly sequence published by Celera that are only slowly being resolved. Specifically, some genetic information included in the fly database that describes possible functions of genes is “out of sync” with the sequence.

“That is a bookkeeping problem that is avoidable,” Lipman said. “In the rush to get out there, there was some confusion.”

The firm, he said, is working to fix those problems, and they should not detract from Celera’s achievement.

“I’m overwhelmingly impressed with what they could do in the time period,” Lipman said.
