$1 Gets You Into This Gene Pool

TIMES SCIENCE WRITER

For people gambling on the future of genetics, the human number is in play.

Around the world, scientists are placing bets on just how many genes it takes to make you or me.

So far, nobody knows.

Estimates rise and fall as precipitously as a biotechnology stock.

To make book on such uncertainty, Ewan Birney, a leading computational biologist in England, earlier this year challenged his sober-minded colleagues to gamble a dollar and some incalculable measure of their reputations on their best estimate of the true number of human genes.

So far, 228 geneticists and biocomputing experts have put their money on numbers ranging from 28,000 to 200,000 genes.

Surprisingly, the uncertainty about the number of human genes has grown--not narrowed--as researchers put the finishing touches on the first rough draft of all the DNA that makes up the human genetic sequence, which took hundreds of scientists a decade to compile.

These days, anybody with a few hours to kill can download from the Internet the entire human genetic sequence, comprising the billions of chemical components that make up an individual’s heredity. But scientists don’t really know where the actual genes start or stop, let alone what they do, or how they interact.

“Our difficulty in doing even the simplest thing--like counting the number of genes--is an indication of how much we have to learn to mine the information in the human genome sequence,” said UC Berkeley genetics expert Gerald M. Rubin, who is vice president for biomedical research at Howard Hughes Medical Institute in Chevy Chase, Md.

Indeed, at least one biotechnology company already may have applied for patents on more human genes than actually exist in nature.

“Nobody has figured how to reliably find genes in a DNA sequence,” Birney said. “All our estimates are all over the shop and the only way to deal with this is a bet.”

So far, researchers actually have identified 38,000 human genes. At least they think they have. Birney has wagered his dollar that, all in all, it takes exactly 48,251 genes to make a human being.

Whatever the actual winning number, it goes to the heart of biological complexity.

In so many ways, human potential may seem boundless, but one way to begin understanding its limits is to learn the number of genes that together are responsible for humanity’s biological promise.

So far, however, the genetic code conceals as much as it reveals. The human genome contains all the DNA required to build a person. Some of that DNA codes for genes; much of it does not. The active genes are broken up and hidden in the inactive stretches of genetic material. It all looks alike. And any certainty beyond that starts to become a betting matter.

The confusion surrounding that number is understandable, for the effort to decipher the human genome is a breakthrough in progress.

Despite a White House ceremony in June celebrating completion of the working draft of the human genome, “hundreds of thousands of gaps” still remain in both the publicly and the privately funded versions of the human genetic code, said genomics expert Philip Green at the University of Washington in Seattle.

Identifying Genes on Piecemeal Basis

For the time being, the complete human genome sequence is like a 24-volume encyclopedia--one thick book for each of the human chromosomes, the physical structures that contain the genes. But the only thing anyone can recognize reliably in the books is the order of the chemical alphabet in which it all is written.

Unlike any conventional book, however, these natural genetic texts can read and copy themselves, translating a few alphabetic signs into a galaxy of meanings.

Some of those meanings are understood. People have been identifying individual human genes on a piecemeal basis for years. Birney’s group at the European Molecular Biology Laboratory maintains an international interactive genome database called Project Ensembl, which is so large that it resides in a bank of 120 computers. A dozen new disease-related genes have been identified in just the past six months using Ensembl and other such public databases.

But overall, the questions posed by the billions of chemical DNA characters that make up the complete set of human genes are still so fundamental that, for betting purposes anyway, a few scientists are still arguing about how best to recognize this basic building block of heredity.

“One of the big problems . . . is that, even given the completed genome, finding genes in that sequence is still rather difficult,” said John Quackenbush at the Institute for Genomic Research in Rockville, Md. “Some of it has to do with what you classify as a gene.”

Quackenbush has bet a dollar that there are exactly 118,253 human genes.

“It is guaranteed to be absolutely wrong,” he said sheepishly. “If I were going to make another bet, I’d make a different one.”

Maybe higher. Maybe lower.

Human nature complicates the count.

Unlike bacteria, in which almost all of a creature’s DNA is made to work for a living by coding active genes, human genes are fragmented, and the fragments themselves are scattered throughout strings of garbled code.

“In the human and other higher organisms, the genes themselves are broken up by intervening sequences,” Quackenbush said. “When you survey the genome, it is very difficult to pick out these scattered pieces and put them together properly.”

Indeed, only about 2% of the human genome actually contains genes.

Almost 95% of the human genetic code consists of repetition, remains of ancient viruses and incomprehensible snatches of sequence that have no discernible function.

Some strings of human DNA consist of the same genetic message repeated thousands of times. Others are the remains of once-working genes, known as pseudo-genes, that time and mutation have transformed beyond recognition. Other sequences simply defy understanding--at least for the moment.

After a decade of feverish gene sequencing, researchers have determined the exact order of the thousands of nucleotide base pairs that make up the DNA of about two dozen less complex life forms.

The simplest known cellular organism--a microbe called Mycoplasma genitalium--has 517 genes. The gut bacterium E. coli has 4,300 genes. Each cell of baker’s yeast contains 6,000 genes. The genome of the fruit fly has 13,601 genes.

Many of the same genes--for regulating, editing, repairing and organizing DNA--show up again and again in different life forms.

The absolute minimum number of genes required for life can be as few as 300--if they are the right ones. No one knows if there is any upper limit.

As researchers comb through all of these genomes for the living texts of genes, many biologists are concluding that the biological construction kit for humankind has far fewer genes than previously believed.

It did not take researchers long to notice that size and complexity don’t seem to go hand in hand.

Some very simple organisms appear to have much more elaborate DNA sequences, and therefore more genes, than creatures that seem--to the eye at least--more complex.

A salamander has more raw DNA than a whale. A 1-millimeter-long wriggler called the nematode worm, whose body contains only 959 cells on average, has 19,099 genes in its genome--about 5,500 more than the more elaborately constructed Drosophila fruit fly.

If the lowest guesses in the human gene sweepstakes are correct, it means that it takes only twice as many genes to make a person, with an average of 100 trillion cells in the human body, as to make a worm or a fruit fly.
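The arithmetic behind that comparison can be checked in a few lines. The sketch below uses only figures quoted in this article: the 28,000-gene low end of the betting range and the published gene counts for the fruit fly and the nematode worm. The function and variable names are illustrative only.

```python
# Gene counts quoted in the article.
GENE_COUNTS = {"fruit fly": 13_601, "nematode worm": 19_099}

# Low end of the sweepstakes bets, as reported in the article.
LOWEST_HUMAN_BET = 28_000


def ratio_to_human(organism_genes, human_genes=LOWEST_HUMAN_BET):
    """Return how many times more genes a human would have than the organism."""
    return human_genes / organism_genes


if __name__ == "__main__":
    for name, genes in GENE_COUNTS.items():
        print(f"{name}: human/organism gene ratio = {ratio_to_human(genes):.2f}")
```

On these numbers, the low bet is about twice the fruit fly's count and roughly one and a half times the worm's.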

“I think any biologist who is worth his salt should throw his hands up at this point and say we don’t know how complex organisms are built up from genes,” Birney said. “You don’t do it just by having more genes.”

Clearly, there are other factors at work, involving gene regulation, splicing and other exercises in evolutionary sleight of hand.

“You can achieve complexity not just by increasing the number of genes but using the same genes in more inventive ways,” said Rubin at UC Berkeley. “There is no reason to assume that you would need a huge number of genes to build something as complicated as a human.”

Despite the second thoughts, each bet in the gene sweepstakes is more than a blind stab in the dark.

With a rough draft of the human genome to work with, researchers are using ingenious computer programs and elaborate pattern recognition strategies to identify the significant sequences in the raw stream of human DNA data that may contain active genes.

Some of these programs scan for strings of code believed to begin or end genes, while others compare sequences to known genes or pieces of genes. Others compare new sequence data to the genomes of other organisms such as the fruit fly, or attempt to detect genes by classifying the proteins they produce.
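To give a flavor of the first approach, here is a deliberately naive sketch, not any of the programs the researchers actually use. It scans one strand of a DNA string for the standard start signal (ATG) and reports the stretch up to the next in-frame stop signal (TAA, TAG or TGA). The function name, the `min_codons` cutoff and the sample sequence are all made up for illustration.

```python
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=3):
    """Return sorted (start, end) index pairs of simple open reading frames.

    `end` is exclusive and includes the stop codon. Only the given strand
    is scanned, in each of its three reading frames.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None  # index of the first unmatched ATG in this frame
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == START:
                start = i
            elif start is not None and codon in STOPS:
                # Keep the frame only if it spans enough codons.
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


if __name__ == "__main__":
    sample = "CCATGAAATTTTAGCC"  # made-up sequence
    for begin, end in find_orfs(sample):
        print(begin, end, sample[begin:end])
```

A scan like this assumes each gene is one unbroken run of code. That holds reasonably well for bacteria, but it is exactly where human DNA defeats simple tools, because human genes are split into pieces separated by intervening sequences.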

Last month, three laboratories using three independent analytical techniques published three different estimates in the scientific journal Nature Genetics, from a low of 35,000 genes to a high of 120,000 genes.

Green and his University of Washington colleagues have estimates close to the bottom of that range. By using two elaborate computerized search strategies to filter out the genetic dross, they came up with estimates of 34,700 and 33,630 human genes.

“We are fairly confident that the answer is in that ballpark,” Green said.

Conventional Wisdom Defied

Green is sure enough of his team’s work to bet on it.

He put his sweepstakes dollar on 35,000 human genes--a bet neatly bracketed by other geneticists. One scientist has a dollar on 34,999 genes, while another researcher put a dollar on 35,001 genes.

Taking a slightly different tack, researchers led by Jean Weissenbach at the Genoscope Sequencing Center in Evry, France, arrived at another low estimate of between 28,000 and 34,000 human genes. But DoubleTwist Inc., an Oakland-based biotechnology company, announced with just as much certainty that there are 105,000 human genes. Two other genomics companies, Incyte Genomics Inc. in Palo Alto and Human Genome Sciences in Rockville, Md., have asserted there may be at least 140,000 human genes.

Quackenbush and his colleagues at the Institute for Genomic Research came up with an estimate of 120,000.

Almost all of these educated guesses defy the previous conventional wisdom, which held that the human genetic sequence contains between 80,000 and 100,000 genes.

However ingenious these high-speed computer protocols and genetic databases, they still are far from perfect, several experts said.

“The tools aren’t really up to the task yet,” said Rubin at UC Berkeley.

The computer programs may err by counting fragments of genes that never actually do anything, counting two genes as one, or blurring crucial distinctions between the ends and beginnings of gene sequences.

To further confuse things, many genes have multiple names or more than one function.

The computer databases often contain many duplicates of the same sequence or overlapping fragments.

And, several scientists acknowledge, there is still some fudge in the calculations.

“It is some kind of numbers game, not exact science,” said Quackenbush. “At some point, it is like taking the age of your cat and dividing by your mother’s weight.”

Not until 2002 will the scientists even decide what method to use to determine the winning gene count, because there is so little scientific certainty today.

In the meantime, anyone can play the game.

The rules of the winner-takes-all sweepstakes are posted on the Internet at an official Gene Sweepstakes Web site at https://www.ensembl.org/genesweep.html.

As befits a scientific gaming enterprise, the eight rules governing bets have 11 technical footnotes.

The official winner will be declared in 2003. The closest number will win. In case of ties, the pot will be split.

The winner will take home a leather-bound copy of “The Double Helix,” autographed by its author, Nobel laureate James Watson, who helped discover the structure of DNA nearly 50 years ago and launch the age of genetic engineering.

To put a number in the official running, however, contestants must personally sign their name and write their e-mail address and their number in a book maintained by David Stewart at the Cold Spring Harbor Laboratory on New York’s Long Island, where Birney proposed the wager during a scientific conference earlier this year.

“I have turned away hundreds of e-mail guesses from around the world,” said Stewart, conference director at Cold Spring Harbor. He has abstained from betting himself.

Rubin, who led the publicly funded effort to sequence the genome of the fruit fly, is waiting to place his bet until he visits the lab on Long Island in August.

“I think my bet is going to end up on a number in the 50,000 to 60,000 range, something on the order of 55,000 genes, which is what most scientists think,” he said.

“This is like a public opinion poll among scientists,” Rubin said. “You can see what they all make of the conflicting data.”

(BEGIN TEXT OF INFOBOX / INFOGRAPHIC)

Making Book on Genes

Uncertainty over the number of human genes is so great that hundreds of researchers have entered a formal gene sweepstakes to bet on the correct number.

Contest rules and other information can be found on the Genesweep website at: https://www.ensembl.org/genesweep.html

*

The Rules

* It costs $1 to place a bet this year, $5 in 2001 and $20 in 2002.
* Bets are on one number. Closest number wins. In case of ties, the pot is split.
* Determination of the gene number will occur at a meeting in 2003.
* One bet per person per calendar year.
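
The settlement rules are simple enough to sketch in code. The example below is an illustration, not the organizers' procedure: the fee table assumes "this year" means 2000, the function name and data layout are invented, and the eventual gene count passed in is purely hypothetical. It does not enforce the one-bet-per-person rule.

```python
# Entry fee in dollars by year; treating "this year" as 2000 is an assumption.
ENTRY_FEE = {2000: 1, 2001: 5, 2002: 20}


def settle(bets, true_count):
    """Pick the closest guess and split the pot among tied winners.

    `bets` is a list of (name, guess, year) tuples. Returns a tuple of
    (list of winner names, dollars per winner).
    """
    pot = sum(ENTRY_FEE[year] for _, _, year in bets)
    best = min(abs(guess - true_count) for _, guess, _ in bets)
    winners = [name for name, guess, _ in bets
               if abs(guess - true_count) == best]
    return winners, pot / len(winners)


if __name__ == "__main__":
    # Guesses are ones quoted in the article; the final count is hypothetical.
    bets = [
        ("Birney", 48_251, 2000),
        ("Quackenbush", 118_253, 2000),
        ("Green", 35_000, 2000),
    ]
    print(settle(bets, true_count=40_000))
```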

*

Current Voting

Number of bets: 228

Highest estimate: 200,000

Lowest estimate: 27,462

Median: 53,700

*

Source: Ensembl project
