Scientists poised to decode human genome
Saturday, June 03, 2000 10:52 IST
Deep in the heart of Celera's headquarters is a control room that looks like something right out of a space fantasy. Banks of computer screens line the walls and a console stretches along the floor, but the swivel chairs in front of each workstation along the console are empty.
"We only use this room when something goes wrong," said Paul Gilman, who is in charge of policy planning for Celera.
And so far, very little has gone wrong for the firm, which is pushing ahead to decipher the human genome -- the entire collection of human genes. Celera was founded on the premise that it could use a radical shortcut called whole-genome shotgun sequencing to map the genome. It has done so much more quickly than even its brash founder, Craig Venter, thought possible.
Sometime in June, Celera is expected to announce it has the complete rough-draft sequence of the human genome. At about the same time, an international team of publicly funded scientists will announce the same thing.
But the word complete is misleading. What they will have will be just the first step -- a bare-bones outline -- of all the genes that make us tick.
The sequence will be three billion repeats of four letters, A, C, T and G, that represent the nucleotides that make up DNA. Both Celera and the Human Genome Project will know the order of these repeats, but the real work will lie in deciphering this repetitive code.
Able To Move On To Next Stage
"It is pretty exciting to be at this stage at last where we can see the whole genome, see exactly what needs to be done and be able to move on to the next stage," said Dr. Robert Waterston, director of the genome sequencing center at Washington University School of Medicine in St. Louis, one of the leaders of the Human Genome Project.
The ultimate prize will be an understanding of what life is: how it begins, how it develops and what makes it end. Doctors should be able to make a genetic examination of a patient and give a pretty accurate prediction of his or her risk for certain genetically determined diseases so they can tailor prevention and treatments for each individual.
Scientists will understand what genes turn on and off at each stage of development and will be able to grasp the precise nature of disease. But this is decades away. What scientists will have next month is a long, mostly undecipherable stretch of code -- like ATTGCTGCAT -- that will have to be read.
The letters stand for adenine, cytosine, thymine and guanine. These nucleotides pair up to form the "rungs" in the ladder of DNA, which twists into a coil. Each set of three nucleotides codes for an amino acid, one of the basic building blocks of proteins, which in turn make up cells and hormones and everything else in the body.
For instance, a sequence of AAA codes for lysine, GCA for alanine, TGT for cysteine and so on. Proteins can be made up of varying numbers and combinations of these amino acids. A protein coded for by an average-sized gene with 3,000 base pairs of A, C, T and G will be made up of about 1,000 amino acids.
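The three-letter reading described above can be sketched in a few lines of Python. This is only an illustration: the table holds just the three codons named in this article, and the names `CODON_TABLE`, `translate` and `amino_acid_count` are made up for the example.

```python
# Partial codon table: only the three codons mentioned in the article.
# A full table would list all 64 three-letter combinations.
CODON_TABLE = {
    "AAA": "Lys",  # lysine
    "GCA": "Ala",  # alanine
    "TGT": "Cys",  # cysteine
}


def translate(dna):
    """Read a DNA string three letters at a time.

    Codons missing from the partial table are reported as '???'.
    Trailing letters that do not fill a whole codon are ignored.
    """
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return [CODON_TABLE.get(dna[i:i + 3], "???") for i in range(0, usable, 3)]


def amino_acid_count(base_pairs):
    """Three nucleotides per amino acid, so divide by three."""
    return base_pairs // 3


print(translate("AAAGCATGT"))   # ['Lys', 'Ala', 'Cys']
print(amino_acid_count(3000))   # 1000
```

The second function is just the arithmetic behind the 3,000-base-pair example: three letters per amino acid gives about 1,000 amino acids.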
But codes do not read straightforwardly. Not all groupings of A, T, C and G make for a gene. Some are special codes that signal the start of a gene, some signal the end, and some are "junk DNA" that may be key to fully understanding the genome.
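A rough sketch of how a program might hunt for those start and end signals follows. It assumes the standard start codon ATG and the three standard stop codons TAA, TAG and TGA, none of which are spelled out in this article, and it scans only one strand. The function name `find_orfs` and the `min_codons` parameter are hypothetical; real gene-finding software is far more elaborate.

```python
START = "ATG"                  # standard start codon (assumption, not from the article)
STOPS = {"TAA", "TAG", "TGA"}  # standard stop codons (assumption, not from the article)


def find_orfs(dna, min_codons=2):
    """Return (start, end) index pairs of candidate genes on the forward strand.

    Each of the three reading frames is scanned separately. A candidate
    begins at a start codon and runs through the first stop codon that
    follows it in the same frame. `min_codons` counts codons before the stop.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == START:
                start = i
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


sequence = "CCATGAAAGCATGTTAAGG"
for begin, end in find_orfs(sequence):
    print(begin, end, sequence[begin:end])  # 2 17 ATGAAAGCATGTTAA
```

Everything the scan does not flag, here the leading "CC" and trailing "GG", is the kind of sequence the article calls "junk DNA".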
Powerful computers at Celera and the Human Genome Project's member labs will be used to sort out where the genes are. They can get a head start from human genes already sequenced and an even bigger boost from labs that have sequenced entire human chromosomes -- the structures that carry the genes.
Celera will have the advantage of seeing the Human Genome Project's data, which is posted on the Internet. And all teams can compare human sequences with those of various microbes and insects that have been mapped.
Celera finished mapping the fruit fly Drosophila in March and found many genes in common with human beings.
There is disagreement on just how many genes there are. The official estimate is somewhere between 60,000 and 100,000, but several papers published in this month's issue of Nature Genetics show what an imprecise science genomics still is.
Brent Ewing and Philip Green of the University of Washington said they had figured there are only about 34,000 genes in a human genome. Jean Weissenbach and colleagues at Genoscope in France said their estimate was 30,000.
In the same issue, John Quackenbush and colleagues at The Institute for Genomic Research in Rockville, Maryland, one of the anchors of the Human Genome Project, used a different method to estimate there are 120,000 genes.
A few months ago, California-based Incyte, which is quietly trying to do its own genome sequencing, predicted it would find 140,000 different human genes. And California-based DoubleTwist Inc. and Sun Microsystems Inc. said this month they had used data from the Human Genome Project to find 65,000 genes already and that they had a potential for 40,000 more.
The task of finding all these genes starts with about 15 anonymous volunteers and a whole lot of bacteria. Celera is using the genes of five people. It has finished the first step of sequencing one person already and plans to overlap this with the four others.
All will remain anonymous but were chosen to be somewhat diverse. "We are not looking for all variations in the human population," Mark Adams, vice president for genome programs at Celera, told Reuters. "We are sequencing the entire genome of each individual, unlike the NIH, which is doing a mosaic."
The publicly funded Human Genome Project he referred to is using a mosaic of about 10 different people, who are also not being identified. But NIH (National Institutes of Health) researchers say most of the samples the Human Genome Project laboratories are analyzing come from a single man.
The samples come from sperm or blood. DNA is removed and inserted into colonies of bacteria, usually the E. coli beloved of laboratory scientists, which pump out DNA like little factories.
In both Celera's labs and those working on the Human Genome Project, robots sample the growing colonies and find the "best" DNA, which is broken into pieces. Scientists just a few years ago would have had to painstakingly drip this onto gels that would filter out the DNA as the first step toward sequencing.
But now a visit to Celera or the Massachusetts Institute of Technology's Whitehead Institute, powerhouse of the Human Genome Project, reveals a roomful of gleaming "3700s," the machines made for shotgun sequencing by PE Biosystems, a sister company of Celera under the PE Corp. umbrella.
Sucking Up DNA 24 Hours A Day
The 3700s can labour for 24 hours a day without supervision, sucking up DNA in their 96 capillary tubes. As the DNA wicks up the tube, it spills over the top, nucleotide by nucleotide. A laser detects each one and a video camera inside picks up the fluorescent flashes. This information is made digital and sent to the computers.
At Celera, the computers stand in a huge room, silent but for the whirring of fans and giving off a ghostly bluish glow. Other rooms at Celera are entered using swipe cards but Gilman lays his hand on a palm reader to get into the computer room.
Here is where the real work is done. "We are taking tens of millions of pieces of puzzle and putting it into a pile and telling the computer, 'Now figure out the puzzle,'" he said.
Only a computer could see the patterns in 3 billion repeats of just four letters.
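The puzzle-solving idea can be illustrated with a toy greedy overlap merge. This is not Celera's software, which the article does not describe; it is a minimal sketch that assumes error-free fragments from a single strand, and the names `overlap`, `greedy_assemble` and `min_len` are invented for the example.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that is also a prefix of `b`.

    Overlaps shorter than `min_len` are treated as coincidence and
    reported as 0.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(fragments, min_len=3):
    """Repeatedly merge the pair of fragments with the largest overlap."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(frags) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            break  # nothing left that overlaps; stop with several pieces
        merged = frags[best_i] + frags[best_j][best_len:]
        frags = [f for k, f in enumerate(frags) if k not in (best_i, best_j)]
        frags.append(merged)
    return frags


# Three overlapping pieces of the ten-letter example used earlier in the article.
pieces = ["TGCAT", "ATTGCT", "GCTGCA"]
print(greedy_assemble(pieces))  # ['ATTGCTGCAT']
```

Comparing every fragment against every other, as this sketch does, quickly becomes impractical at tens of millions of pieces, which hints at why the job calls for a room full of computers.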
The whole room has 50 terabytes of storage -- equivalent to four or five Libraries of Congress. Its NIH counterpart, GenBank, is fed information daily by various Human Genome Project labs.
Here the two projects part ways. Celera hopes its hastily assembled version of the genome will offer a good enough picture to be worthwhile to subscribers. But Human Genome Project scientists predict it will be too full of holes.
Their more carefully assembled version will take longer to produce but, they hope, will be of much higher quality.
One key will lie in just how important the "junk" DNA is. Celera thinks it can get along fine by identifying the genes, but some researchers predict that the junk DNA may, like a dusty antique store, harbor treasures -- perhaps sequences that control how and when genes work.