By tracking coronavirus mutations, scientists aim to forecast the pandemic’s future

A woman wearing a face shield, mask and gloves holds a rack of vials
Selam Bihon, a clinical lab scientist at the Stanford Clinical Virology Laboratory, processes upper-respiratory samples from patients suspected of having COVID-19. The U.S. is stepping up efforts to monitor genetic changes in samples like these.
(Noah Berger / Associated Press)

When a novel coronavirus began its murderous run a little more than a year ago, the use of genetic sequencing as a tool for tracking infectious disease outbreaks was in its infancy.

How quickly a young science grows up in a pandemic.

This month, the practice of genomic surveillance begins its coming of age. The Biden administration has launched an unprecedented push to sequence tens of thousands of coronavirus samples from newly infected people in every state and territory of the U.S., and build a program to plumb that data for insights into where the pandemic is heading next.

The centerpiece of the project, the new National SARS-CoV-2 Strain Surveillance (NS3) program, will be tasked with detecting emerging variants of the virus in time to mount effective responses.


But that’s just the beginning. By allowing scientists to see the pandemic’s patterns of growth, genomic surveillance could help officials anticipate when new outbreaks will put stress on hospital capacity, where vaccination drives could cool hot spots, and whether the rise of worrisome strains should dictate more stringent public health measures.

If carried out as intended, the initiative could take a technique that has so far been used to reconstruct outbreaks after the fact and use it to see around the pandemic’s next corner.

It’s a capability that can’t come soon enough, researchers said.

The United States needs to expand its capacity for genomic surveillance “rapidly and exponentially,” said Kristian Andersen, who directs a program of infectious disease genomics at the Scripps Research Institute in La Jolla. “The faster, the better.”

There’s certainly room for improvement. During a single week in early January, for instance, just 251 viral samples were fully sequenced in the entire country. That figure represented just over 0.01% of the nation’s 1,972,530 new infections confirmed that week.
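The arithmetic behind that share can be checked in a few lines. This is a minimal sketch that uses only the two figures quoted above; the variable names are illustrative.

```python
# Figures quoted above for a single week in early January.
sequenced_samples = 251
confirmed_infections = 1_972_530

# Share of confirmed infections that were fully sequenced, as a percentage.
share_percent = sequenced_samples / confirmed_infections * 100

# Equivalent framing: one sequenced sample per this many confirmed infections.
infections_per_sequence = round(confirmed_infections / sequenced_samples)

print(f"{share_percent:.4f}% of confirmed infections were sequenced")
print(f"about 1 in {infections_per_sequence:,}")
```

The result lands slightly above 0.01%, which matches the description of "just over 0.01%."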

“We’re working with blindfolds on if we’re unable to track the emergence and spread of these new SARS-CoV-2 variants,” said Dr. Charles Chiu, a UC San Francisco geneticist whose lab played a key role in uncovering a homegrown variant that is growing fast in California.

If the surveillance program succeeds, Chiu added, “we’ll be prepared not only for SARS-CoV-2, but also for any other emerging infectious threat in the future.”


To gain that visibility, seven universities and a welter of private labs across the country have ramped up their efforts and are now analyzing 3,000 new viral samples a week, said Dr. Rochelle P. Walensky, the new director of the Centers for Disease Control and Prevention. As more universities come online in the next month or so, the CDC’s goal is to coordinate the genetic sequencing of 7,000 samples each week and disseminate insights gleaned from them to states and the public, she said.

In addition, the Biden administration is spending $15 million to help state and territorial public health departments build up their capacity to collect, share and analyze viral samples, and then merge that data with the traditional practice of epidemiology. Funding for the broader initiative comes from a pot of $19.1 billion set aside in an emergency funding measure to expand the nation’s capacity for testing and contact tracing.

The NS3 program calls for subjecting a weekly sampling of 750 viral specimens to lab tests designed to detect whether they have picked up new capabilities, including the ability to “escape” the effects of medicines or vaccines used to treat or protect against COVID-19.

“This is a good start,” Walensky said.

For more than a decade, the CDC and state agencies have used genomic sequencing to investigate hospital infections, monitor outbreaks caused by contaminated food, and track competing strains of influenza, HIV and tuberculosis. But they were not prepared for the size and speed of the COVID-19 pandemic.

As the outbreak has exploded, state and local public health departments have largely abandoned hope of tracking viral spread through contact tracing. With so many people infected, so much asymptomatic spread and such spotty testing, there’s been limited time for, or value in, the gumshoe practice of asking people where they’ve traveled and with whom they had contact in order to sketch out a tree of sequential infections.

Genomic surveillance offers a high-tech alternative for mapping the virus’s spread.

To understand how, it helps to know that the 30,000 nucleotides that make up the SARS-CoV-2 genome usually mutate at a predictable rate as the virus passes from person to person. If you sequenced viral samples from several people who were all infected by a single person, you could lay each sample’s long sequence of letters alongside the others and see that they sprang from a common source. If you then sequenced samples from the people they subsequently infected, the progression of letter changes observed in those later samples would allow you to infer who infected whom.


Before long, you’d have a tree of infections that reveals when and where new shoots have emerged, how robustly they have grown, and where they are likely to lead.
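The idea can be illustrated with a toy example. The Python sketch below is not the CDC's pipeline or any real phylogenetics tool. It assumes made-up 12-letter sequences, invented sample names and collection days, and a deliberately simplified rule: each sample is linked to the earlier-collected sample whose sequence differs from it by the fewest letters.

```python
from typing import Dict, Optional, Tuple

# Hypothetical samples: name -> (collection day, toy genome).
# Real SARS-CoV-2 genomes are about 30,000 letters long; these are invented.
SAMPLES: Dict[str, Tuple[int, str]] = {
    "A": (0, "ACGTACGTACGT"),
    "B": (3, "ACGTACGTACGA"),  # one letter changed relative to A
    "C": (4, "ACGTTCGTACGT"),  # a different single change relative to A
    "D": (7, "ACGTACGTCCGA"),  # carries B's change plus one more
    "E": (8, "ACGTTCGTACTT"),  # carries C's change plus one more
}


def letter_differences(seq_a: str, seq_b: str) -> int:
    """Count the positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def infer_likely_sources(
    samples: Dict[str, Tuple[int, str]]
) -> Dict[str, Optional[str]]:
    """Link each sample to the most similar sample collected before it."""
    sources: Dict[str, Optional[str]] = {}
    for name, (day, seq) in samples.items():
        # Only samples collected earlier can be a plausible source.
        earlier = [
            (letter_differences(seq, other_seq), other_name)
            for other_name, (other_day, other_seq) in samples.items()
            if other_day < day
        ]
        # Fewest differences wins; ties fall back to alphabetical order.
        sources[name] = min(earlier)[1] if earlier else None
    return sources


if __name__ == "__main__":
    for name, source in infer_likely_sources(SAMPLES).items():
        print(f"{name} <- {source if source else '(root)'}")
```

With these invented inputs, B and C both trace back to A, D traces to B, and E traces to C, giving two separate branches from a common source. Actual genomic surveillance relies on statistical phylogenetic methods and epidemiological records rather than a nearest-match rule like this one.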

If a new branch began sending out new shoots at a surprising rate, it might be evidence of a superspreader event — a holiday party, perhaps, or a political rally. Or you might suspect that the virus’ genetic makeup had changed, as it did in Britain, to make it more contagious.

Zoom in on this picture and you can draw lines among individual infections. Zoom out and the overall shape of the pandemic becomes evident: patterns of spread, populations with special vulnerabilities, inflection points.

But achieving that level of resolution will require a lot more work.

The CDC has been collecting and sequencing coronavirus specimens since the beginning of the pandemic. By Jan. 1, a public database managed by the National Institutes of Health included the complete genomes of 34,873 samples for researchers to pore over.

But those genomes didn’t offer a true picture of what was going on around the country. Most of the submissions came from states like Washington, California, New York and Texas, which implemented their own sequencing programs. And only a few states have the capability to test coronavirus samples in labs to see whether they are resistant to COVID-19 vaccines or medicines.

Gloved hands hold a rack of sample vials near lab equipment
A scientist at the Stanford Clinical Virology Laboratory processes samples from patients suspected of having COVID-19. Scientists analyze samples like these to watch for genetic changes that might make the coronavirus more dangerous.
(Noah Berger / Associated Press)

Universities and medical centers with well-funded genetics labs were churning through additional samples as well. By the start of 2021, they had shared sequencing data from another 52,364 U.S. specimens with a global consortium of genetics researchers known as GISAID.

Even when the two collections were combined, examining that sequencing data was like peeking through a keyhole. Entire places and populations were missing, and those that were included got only a cursory look. That made it impossible to get a realistic picture of the pandemic, much less identify or anticipate new genetic variants.

As it lashes together a vast system of commercial, academic, medical and governmental organizations, the CDC will have to ensure that those involved can share samples and communicate their findings quickly and in a common language.

Obstacles abound. Labs follow different practices in sharing, analyzing and retaining samples. Privacy practices vary. Some state public health departments are still figuring out how to link epidemiological data — the time and location a sample was taken, the demographic and medical background of the person who gave it and how sick he or she became — to a genetic sequence.

What’s more, the nation’s public health departments are largely staffed by microbiologists and epidemiologists who trained at a time when genetic science was a small part of the curriculum, said Dr. Gregory L. Armstrong, who directs the CDC’s Advanced Molecular Detection Program and will oversee the agency’s pandemic surveillance initiative.


“There’s a lot of enthusiasm” about how genetic sequencing could help sharpen the picture in public health, Armstrong said. “But we need a larger part of that workforce to be agile with genomic data.”

Dr. Marc Suchard, a professor of biomathematics and human genetics at UCLA, said that when genomic data is merged with epidemiological data, researchers and public health officials will be able to “reconstruct a much richer history of how, where and when the virus is moving through our communities” and act effectively to counter it.

Chiu likens the undertaking to Operation Warp Speed, the federal program to accelerate the development and distribution of COVID-19 vaccines, which has had its share of hiccups along the way.

“Everyone welcomes this,” Chiu said. “But the devil’s in the details.”