Op-Ed: The broken U.S. data systems that struggled during the pandemic can be fixed

Residents dine socially distant at a senior housing facility in New York in April, a few months after the state government confirmed it undercounted thousands of COVID-19 deaths in nursing homes.
(Associated Press)

COVID-19 has made us all armchair epidemiologists.

We have all been tracking case counts in our communities, deciphering the curves of hospitalizations and deaths on graphs and gauging what we each can do to reduce risks. The data we use are the same critical bits of information our government needs to make policy decisions about masks, vaccines, resource allocation and supplies.

For the first year of the pandemic, our nation relied on volunteers from academia, the media and elsewhere to collect and report data on testing and cases, rather than on our nation’s public health agency, the Centers for Disease Control and Prevention. The speed, scope and scale of the data overwhelmed the government’s outdated infrastructure until it finally created new systems for functions such as disease tracking and race and ethnicity reporting.

Heroically, for a year, hundreds of volunteers scraped state dashboards and parsed news conferences for data to share online, while public health officials scrambled to reenter data submitted by faxes and cobble together databases.

Our nation’s data are messy, unreliable, incomplete and slow to be reported, and reporting definitions vary by geography rather than following a common standard. Taken together, these inconsistencies make it easy to understand the litany of data misadventures that have characterized the pandemic response. We’ve had unprocessed, misplaced and forgotten information. We’ve also seen backlogs, delays and data dumps of hundreds to thousands of records on deaths or cases, resulting in underreporting, overreporting and distorted timelines.

In some cases, molecular tests used to identify current infections were combined with antibody tests that identify past infections, inflating the number of tests being done. Provisional deaths (those of people who probably had COVID-19 but had no confirmed test) and deaths that could be attributed to other underlying conditions were not reported. Nursing home deaths connected to COVID were not reported as such when those patients were transferred to hospitals. Officials in some states removed COVID data from their websites and thus from the public’s view.

Over the course of the pandemic, the federal government has made significant strides in tracking COVID-19 data, reporting deaths in a timely manner and creating a secure enclave of clinical data for research. The HHS Protect hub — built for the Department of Health and Human Services with technology from the company Palantir, for which one of us is chief medical officer — makes available near-real-time data from 6,000 hospitals and nursing homes on available ICU beds, ventilators and other scarce supplies and coordinates assistance across nine federal agencies and all 50 state health departments.

But more change is needed. Data are still largely locked into the government’s outdated legacy tech systems that can’t communicate with one another, and bureaucracies are too busy putting out fires to collaborate — both the result of underfunded public health. The U.S. still had to rely on international data to make decisions on vaccine boosters, despite administering more vaccine doses domestically, because we couldn’t link and analyze our own data.

The next pandemic is coming. And it will require a new comprehensive national data infrastructure. NASA offers a great model: it changed from a bureaucratic, siloed system that developed technology in-house to a more open network that embraces fresh ideas and tools from partners. The federal government’s Data Modernization Initiative is pursuing this path but needs a more comprehensive approach, one that links pertinent data from across federal and state agencies with clinical and personal behavior data.


Temporary funding made available during the pandemic should be replaced with $500 million in annual funding. The proposed federal Health Statistics Act would also help by codifying the need for data standards, easier sharing, use of electronic medical record data for surveys and disease monitoring, and requirements for governance to ensure the necessary partnerships. This would make data more apolitical and less subject to manipulation and misinformation.

Improving trust in public health is also essential to these efforts. Trust comes from transparency, and that means linking and sharing data without compromising individual privacy or enabling inadvertent public disclosures. The national data system could restrict clinical information to qualified researchers whose methods have been approved by an independent ethics board, and health data to public health officials who track and control outbreaks. The system could offer unrestricted data without individual identifiers for wider dissemination. Health information systems should also be auditable, with unalterable records of any processing or transformation of the data from collection to analyses and reports.

The private sector can often build tools more cheaply, quickly and efficiently than government can. HHS Protect was stood up in under two weeks. Meanwhile, academia can provide new perspectives on how best to link and analyze data for novel insights.

These investments will strengthen our abilities to handle future pandemics and other disasters. And they will also make a difference in “normal” times, helping to prevent chronic disease and injuries and to address health inequities.

A lot more than a better data platform will be needed to fuel the next revolution in public health: fixing the relationship between public health and the public, creating trust across the political spectrum, hiring and training new public health practitioners and building new partnerships. But now is the time to invest in a modern public health operating system, while interest is still high — and while health data are still part of everyone’s daily lives.

Ali S. Khan is dean of the College of Public Health at the University of Nebraska Medical Center. William J. Kassler is Palantir’s chief medical officer working with the U.S. government.