
An Unfolding Gene Map at ‘Finish Line’

TIMES STAFF WRITERS

Over the next several days, a 2-year-old biotechnology company, Celera Genomics, is expected to announce that it has completed a version of the human genetic code. That will be followed in June by a similar announcement from the public Human Genome Project.

The mapping of the human genome is one of the most significant and widely trumpeted achievements in modern science. The research it enables promises new treatments for disease, new drugs to promote healthy growth and delay aging, and new ways to detect disorders early while there is still time to do something about them.

The expected announcement by Celera will be a defining moment in the bruising, often bitter competition between the biotech firm and the international collaboration of academic researchers--a race that remained remarkably close going into the final straightaway.


Yet many of the participants on the public side say that it isn’t the time to shoot off fireworks in celebration. This is a meaningful milestone that has come much faster than expected, they say, but the work is far from done.

And there are good reasons to question whether the race to complete the human genome is indeed over:

* The “finish line” in the race is arbitrary and has been drawn and then redrawn by the participants themselves. Originally, both Celera and the international academic centers making up the Human Genome Project had established exacting standards for completion: determining the order--or sequence--of 99.9% or more of the chemical building blocks in the human genetic code. But the sequences to be announced this spring are unlikely to be quite that complete, and the public effort’s “working draft” will cover as little as 90% of the sequence.

* A “working draft” of the human genome is, by definition, not a finished product but a work in progress. Both the public draft and the Celera genome will contain many gaps in the listing of the estimated 3 billion chemical building blocks in the DNA molecules that make up the human genetic code. Such incomplete efforts are indeed useful but also frustrating, like a map that includes most main roads and bridges but leaves out some local streets.

* Even in a “finished” version of the genome, which the public project hopes to complete by 2003, there will be vast uncharted regions of human genetic code. That’s because there are sizable stretches of DNA within the 23 pairs of human chromosomes that are simply too difficult to decipher with current technology. At a meeting in England last fall, scientists from the Human Genome Project recognized that and agreed on a definition of “essentially finished,” knowing that large tracts of terra incognita would remain in each chromosome.

* For a time at least, Celera’s map is certain to be more complete than the public one, even though the company’s scientists did far less decoding than they promised when they formed their company in 1998. That’s because Celera will be able to incorporate the public data into its own. Over the next two years, the gap will gradually close as the Human Genome Project continues its decoding efforts.


Still, despite these reservations, the pending announcements signal a remarkable achievement for mankind: the first detailed looks at the whole set of genetic instructions that shape our individuality, our response to medications and our susceptibility to disease.

Imperfect Maps of ‘New World’

These drafts can be compared to the first imperfect maps of the “new world” that enabled generations of explorers to navigate the globe. Except in this case, the terrain to be explored is the microscopic world of human genetics that lies within our cells--coiled molecules of DNA embedded in tiny chromosomes.

“It’s appropriate to make a fuss,” said Dr. Francis S. Collins, director of the National Human Genome Research Institute, which has pumped some $250 million into deciphering the human genome in recent years. “But let’s be sure as we’re doing so to keep in context all of the caveats about how much more work lies ahead of us to be able to understand what this means.”

J. Craig Venter, Celera’s flamboyant president and chief scientific officer, agrees that cracking the human genetic code is just the beginning of a huge scientific enterprise. “If people want to view it as a race, it is a race to the starting line,” Venter told The Times shortly after he helped launch Celera with $300 million from PE Corp.

For Celera, being first with a draft of the genome provides a sharp demonstration of its scientific prowess, which could translate into a boost in its share price. Moreover, it has already filed provisional patent applications on thousands of novel human genes, intellectual property that it will share with its pharmaceutical company customers.

Deciphering Genome Opens New Worlds

For their part, academic researchers working on the public effort are eager to win credit within the scientific community for years of pioneering work and to secure financial support from Congress to continue their sequencing efforts.


Everyone agrees that deciphering the human genome, even in a working-draft stage, has opened up new worlds to explore, even if the health gains that are promised will not come immediately. It will take years of research to understand how human genes interact, and it can take eight to 10 years to bring any new medications to market.

Collins points out that both academic and corporate scientists have already tapped into the public project’s data, which is updated nightly on a government-operated Web site and already covers 85% of the genome.

In the last year alone, Collins said, researchers using the posted sequence identified genes responsible for a dozen different hereditary disorders, including two forms of deafness and a rare form of epilepsy.

And private companies have plumbed the data to find genes that play roles in diabetes, asthma, psoriasis and migraines.

The human genome is a complete set of genetic instructions, contained within the chromosomes and present in almost every cell in the body. Those instructions are written out in molecules of DNA in a compact, four-letter code--the letters A, T, C and G standing for the individual chemical building blocks adenine, thymine, cytosine and guanine.

The human genome is so long that writing out all of its letters would take as many as 100,000 pages of this newspaper, even in smaller type than this.


No one knows precisely how big the human genome is, and they won’t until all the letters are accounted for. Current estimates--ranging from 2.8 billion to 3.2 billion chemical letters--are based on random sampling, the way news organizations poll voters.

The first methods for spelling out the letters in order--also known as sequencing--were developed in the 1970s. A decade later, a handful of scientists proposed an all-out effort to decipher a reference genome--a composite made from several anonymous individuals that would become a standard for comparison in the hunt for the genetic causes of disease.

In 1985, Charles DeLisi, then director of health and environmental research for the Department of Energy, began to enlist support for what eventually would become a $3-billion, 15-year effort to decipher the human genome and those of other species, primarily paid for by the federal government and England’s Wellcome Trust, one of the world’s largest charities.

International teams of researchers divided up the chromosomes, splitting them into chunks--separating the encyclopedia-sized genome into more manageable chapters for eventual decoding.

The process was an extraordinarily slow and painstaking one, requiring scientists to grow, track and decode millions of overlapping fragments of DNA and then piece them together.

To do the job efficiently, scientists at academic centers participating in the Human Genome Project set up sequencing assembly lines using automated sequencers and sophisticated software.


Richard A. Gibbs, director of the Human Genome Sequencing Center at Baylor College of Medicine in Texas, compares the process to putting together a map of the continent by piecing together millions of small, overlapping aerial photos.

Some areas proved difficult if not impossible to decipher because they contained large runs of repeating letters. In Gibbs’ analogy, these would be like large stretches of landscape without well-defined landmarks, such as vast areas of desert or ocean or forest. Within these areas, one aerial shot would be virtually indistinguishable from the next.
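For readers curious how overlapping pieces can be stitched back together, here is a minimal sketch of one textbook approach, greedy overlap merging, written in Python. It is an illustration under simplifying assumptions, not the software the Human Genome Project or Celera actually used: the pieces are assumed to be error-free, no piece sits entirely inside another, and the function names and sample pieces are hypothetical. At each step it joins the two pieces whose end and beginning overlap the most, much as one might line up the two aerial photos that share the most landscape.

```python
from itertools import permutations


def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest end of `a` that matches the start of `b`.

    Overlaps shorter than `min_len` letters are ignored (returns 0).
    """
    start = 0
    while True:
        # Look for the first few letters of b somewhere in a.
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        # If everything from that point to the end of a matches b's start,
        # we found the longest overlap (leftmost hit = longest suffix).
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(pieces: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of pieces with the largest overlap.

    Returns the remaining stretches; more than one means gaps remain.
    """
    pieces = list(pieces)
    while True:
        best_len, best_pair = 0, None
        for a, b in permutations(pieces, 2):
            olen = overlap(a, b, min_len)
            if olen > best_len:
                best_len, best_pair = olen, (a, b)
        if best_pair is None:
            return pieces  # nothing overlaps any more
        a, b = best_pair
        pieces.remove(a)
        pieces.remove(b)
        pieces.append(a + b[best_len:])  # glue b onto a, dropping the shared letters


if __name__ == "__main__":
    sample = ["ATTAGACCTG", "CCTGCCGGAA", "AGACCTGCCG", "GCCGGAATAC"]
    print(greedy_assemble(sample))  # → ['ATTAGACCTGCCGGAATAC']
```

Repeats are exactly where this simple strategy breaks down. When a run of repeating letters is longer than the pieces themselves, many pairs overlap equally well, so the greedy choice becomes essentially arbitrary and can collapse or misplace the repeated region, the software equivalent of mistaking one patch of desert for another.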

The original goal was to start at one end of each chromosome and work across to the other end, said Dr. Robert H. Waterston, director of the Washington University School of Medicine gene sequencing center. But it soon became clear that this was impossible without breakthroughs in technology.

A number of scientists are working on these problems. But, says Waterston, “we’ll all be dead before we can say we’re finished.”

Fortunately, the richest--and most commercially promising--sections of the genome can be sequenced. These are the areas where most of the genes can be found--the pieces of DNA that carry instructions for making proteins and that control the basic processes of life.

Officially launched in 1990, the Human Genome Project began the huge task with its scientists unsure of how much of the human genetic code could be deciphered and whether it could meet its 2005 deadline.


Beginning in 1996, the consortium members all agreed to post their raw sequence data nightly on GenBank, a public Web site maintained at the National Center for Biotechnology Information. Whatever results the consortium produced would be immediately available to all scientists, whether they worked for private companies or universities.

From 1993 to 1998, the project decoded about 120 million letters of human code, Collins said. The sequencing centers had made only a small dent in the task, but the project was 50% ahead of its original goals.

Enter Venter and Celera.

“We’ll do in three years what was taking 15, for hundreds of millions rather than billions,” Venter said in an interview at the time. There was no time to waste while millions of women were dying of breast cancer, he testified last month to a congressional subcommittee, quoting the company’s motto, “Speed matters. Discovery can’t wait.”

Venter, a onetime National Institutes of Health scientist and head of The Institute for Genomic Research, proposed taking a shortcut to finish the job that the Human Genome Project had started.

Celera’s strategy, as spelled out in the journal Science, was to skip the laborious, time-consuming job of dividing up the genome chromosome by chromosome. Instead, the company would take multiple copies of the entire DNA from a single anonymous volunteer and break that into millions of overlapping pieces.

With 300 highly automated DNA decoding machines from sister company PE Biosystems, sequencing would be incredibly fast. But the company could not know whether the approach worked until it proved that its battery of supercomputers could put all the pieces together again.


Many of the scientists with the public project scoffed. But the effect on the consortium was electric.

At various times, both sides insisted that this was not a race. But it was hard not to hear the report of the starting gun and the pounding of feet on gravel. And each side began redrawing the finish line.

In the fall of 1998, the Human Genome Project said it would produce a finished genome by 2003, two years ahead of schedule, and a less complete “working draft” by 2001.

Both sides announced milestones along the way. Most notably, the public genome project published a paper in December that said it had for the first time deciphered the genetic code of an entire human chromosome.

But, in fact, large areas of chromosome 22 were not sequenced at all. A few months before, the participants meeting in England had agreed that a chromosome could be deemed “essentially finished” when it was as complete as current technology would allow.

“We said it must be complete within the limits of technology,” said Washington University’s Waterston. By doing so, Waterston said, the scientists gave Celera an opportunity to define its own finish line.


In January, for its part, Celera held a news conference to announce that it had the sequence of 90% of the human genome in its database. But that figure included DNA sequences drawn from the Human Genome Project. And the company had not assembled the pieces.

In March, Celera and the publicly funded Berkeley Drosophila Genome Project announced publication of the entire genome of the drosophila, the fruit fly, perhaps the best-studied laboratory animal on Earth. The published result proved that the company’s software could assemble an entire genome. But the fly genome still has a number of gaps that will be filled over the next year. And large parts of the genome--about a third of its 180 million chemical letters--cannot be sequenced with current technology.

Last month, Celera’s Venter announced that the company had completed the sequencing phase of the first human genome. All that was left was to put the pieces together, Venter told Congress on April 6, and that would take another three to six weeks.

However, scientists with the public project quickly pointed out that Celera’s sequencing effort fell far short of what it had promised in 1998. Instead of covering the entire sequence 10 times to eliminate gaps and guarantee accuracy, the company’s team had only sequenced the genome three times.
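The difference between three passes and ten is easier to see with the standard back-of-the-envelope calculation for random sampling. If pieces land at random positions and each letter is read c times on average, the chance that a given letter is never read at all is roughly e^-c (a Poisson approximation). The Python sketch below illustrates this under idealized assumptions: uniformly random pieces, no repeats, no sequencing errors and no help from outside data such as GenBank. The function name is hypothetical, and the 3 billion figure is the article's own rough estimate of the genome's size.

```python
import math

# Rough genome size used in the article (estimates range from 2.8 to 3.2 billion).
GENOME_LETTERS = 3_000_000_000


def expected_uncovered_fraction(coverage: float) -> float:
    """Idealized Poisson estimate of the share of letters read zero times
    when random pieces cover the genome `coverage` times on average."""
    return math.exp(-coverage)


for passes in (3, 10):
    frac = expected_uncovered_fraction(passes)
    print(f"{passes:>2}x coverage: about {frac:.4%} of letters unread, "
          f"roughly {frac * GENOME_LETTERS:,.0f} letters")
```

Under those assumptions, three passes leave about 5 percent of the letters unread, on the order of 150 million for a 3-billion-letter genome, while ten passes leave less than 0.005 percent. Real projects fall short of this ideal, since repeats and errors add gaps of their own.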

Without tapping into the public sequences available to them on GenBank, Celera’s data “would be in millions of pieces,” Waterston said.

Celera’s combined sequence is certain to be better than the public data alone, acknowledges National Human Genome Research Institute director Collins: “The fact the sequence we’re producing is available for everybody in the public sector or the private sector was very much the point. We’re delighted to see any group take this information and add to it and make it even more valuable.”


Says Eric Lander, director of the genome center at the nonprofit Whitehead Institute for Biomedical Research in Cambridge, Mass., a major contributor to the public effort: “For all intents and purposes, the race is already over. Something like 85% of the human sequence is freely available on the Web, and scientists are using the data every day. It will continue to get incrementally better.

“But, crowds are not waiting to shoot off fireworks when the last DNA base is read,” Lander said. “Scientists have already moved on, trying to find all the ways to use the information.”
