Readers have submitted hundreds of questions about The Times’ coronavirus tracker, which compiles the latest data on the spread of COVID-19 in California. Here are answers to the most common queries.
- Where does your data come from?
- Why does the number of new cases and deaths dip every weekend?
- What is the difference between a confirmed and probable case?
- How do you estimate the number of recovered cases?
- Why do your numbers sometimes vary from other websites?
- Why do your totals differ from those announced by Los Angeles County?
- How do you group counties into regions?
- Why can't I find a nursing home I'm looking for?
- How is the doubling time calculated?
- Can I have access to your underlying data?
- Who can I contact if I find an error?
Where does your data come from?
Figures for cases and deaths are drawn from data files released by the California Department of Public Health. The agency gathers figures from the 61 county and city health departments across the state.
These agencies independently compile lists of COVID patients in their area and post them on the web.
Up until September 2021, reporters in The Times’ Data and Graphics Department visited each local agency website, logging the latest totals for cases, deaths and recoveries in a Times database.
The data collection effort was done in partnership with journalists at the San Francisco Chronicle, the San Diego Union-Tribune, KQED, KPCC, CapRadio and CalMatters. Big Local News, a project of the Stanford Journalism and Democracy Initiative, also joined to help Stanford students and journalists at smaller outlets access the data.
For the first year of the pandemic, the figures gathered by this survey were typically days ahead of state tallies. When the state’s data feed improved, The Times ended its independent survey and shifted to rely on the consolidated database released by the public health department. That led to a one-time change in the charts and totals on this site.
Why does the number of new cases and deaths dip every weekend?
Due to bottlenecks at testing labs and other delays in government bureaucracy, the totals posted by local agencies do not reflect the precise date when patients test positive or die. That is especially true on weekends and holidays, when the government’s compilation of new data slows due to reductions in staffing.
That leads to a predictable drop in the number of new cases and deaths announced on weekends and holidays. It also results in a predictable surge in new cases and deaths at the start of each work week.
For this reason, The Times looks to seven-day averages of new cases and deaths instead of single-day totals to evaluate the latest trends. Readers are advised to do the same.
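For readers who want to reproduce this kind of smoothing, here is a minimal sketch of a trailing seven-day average in Python. The function name and the sample counts are hypothetical and chosen only for illustration; they are not The Times’ code or data.

```python
from collections import deque


def seven_day_average(daily_counts, window=7):
    """Return a trailing average for each day.

    Days that do not yet have a full window of data get None.
    """
    averages = []
    recent = deque(maxlen=window)  # holds only the most recent `window` days
    for count in daily_counts:
        recent.append(count)
        if len(recent) == window:
            averages.append(sum(recent) / window)
        else:
            averages.append(None)
    return averages


# Hypothetical daily new-case counts, Monday through the following Tuesday.
# The sixth and seventh values mimic a weekend reporting dip.
daily_new_cases = [140, 120, 110, 105, 100, 40, 30, 150, 125]
averages = seven_day_average(daily_new_cases)

for day, (raw, avg) in enumerate(zip(daily_new_cases, averages), start=1):
    label = "n/a" if avg is None else f"{avg:.1f}"
    print(f"day {day}: reported {raw}, seven-day average {label}")
```

In this made-up series the single-day figures swing from 30 to 150, while the seven-day average moves only slightly, which is why the average is the better guide to the trend.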
What is the difference between a confirmed and probable case?
A confirmed case is an infection that has been verified with a molecular test, such as a polymerase chain reaction test that looks for the genetic fingerprint of the coronavirus. Federal standards consider this method the main criterion for diagnosing a SARS-CoV-2 infection.
Some California counties report both confirmed and probable cases as part of their totals. The Times includes both confirmed and probable cases in its numbers.
How do you estimate the number of recovered cases?
There is no comprehensive source for the number of people who have tested positive for COVID-19 and recovered.
While many local health agencies publish a tally of how many patients have recovered from the virus, some of the most populous — like Los Angeles County — do not provide this information. The result is that there is no way to compile a complete statewide figure.
Some agencies and experts estimate the number of recovered cases using a simple formula.
The number is calculated by subtracting active cases, deaths and currently hospitalized patients from the total case count. Active cases are estimated as the number of new cases over the last 14 days, which is based on the amount of time the CDC says most adults remain infectious with COVID-19. The Times consulted biostatisticians at UCLA and UCSF to develop the approach. Some health agencies use the same method.
In areas that officially report recoveries, the estimate closely matches the published counts. One limitation is that there may be some overlap between active and hospitalized cases, leading to an underestimate of the number of recovered patients.
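The arithmetic can be sketched in a few lines of Python. The function name, argument names and all of the numbers below are hypothetical, used only to show the subtraction described above.

```python
def estimate_recovered(total_cases, new_cases_last_14_days, deaths, hospitalized):
    """Estimate recovered cases as total minus active, deaths and hospitalized.

    Active cases are approximated by the sum of new cases reported
    over the most recent 14 days.
    """
    active = sum(new_cases_last_14_days)
    return total_cases - active - deaths - hospitalized


# Made-up figures for an imaginary county.
recent_new_cases = [50] * 14  # 50 new cases a day for 14 days = 700 active
recovered = estimate_recovered(
    total_cases=10_000,
    new_cases_last_14_days=recent_new_cases,
    deaths=200,
    hospitalized=100,
)
print(recovered)  # 10,000 - 700 - 200 - 100 = 9000
```

Because a hospitalized patient who tested positive in the last 14 days is subtracted twice, the result leans low, which is the limitation noted above.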
Why do your numbers sometimes vary from other websites?
There are several reasons why the charts published by The Times can differ from those on other sites.
The Times’ count relies on the latest reports from the California Department of Public Health, which are the most comprehensive figures available. Some local health agencies publish more precise charts that reflect the date when each patient tested positive, rather than the day cases were first announced.
That method offers a more accurate reflection of the spread of the virus, but it can take days or weeks to gather the necessary data. It is also not universally published by all agencies, making it impossible to compile statewide. Due to the delays and the lack of universal availability, The Times chooses to plot tallies using the date that cases and deaths are reported.
Additionally, some local agencies revise past tallies after errors are identified and corrected. In cases where these updates are announced and posted online, The Times seeks to reconcile its count to match.
Finally, some local agencies choose to exclude cases from jails and prisons. The Times includes them.
Why do your totals differ from those announced by Los Angeles County?
Cases and deaths in Los Angeles County are initially compiled by three local agencies: the L.A. County Department of Public Health, the Long Beach Department of Health and Human Services and the Pasadena Public Health Department.
The county department compiles figures from all three agencies and releases the combined total each day. The data it includes from Long Beach and Pasadena are often a day behind. The Times gathers its numbers from the state public health department, which can consolidate the figures on a different time schedule than the local agencies.
How do you group counties into regions?
The California Department of Public Health organizes the state’s 58 counties into five regions.
Southern California is Imperial, Inyo, Los Angeles, Mono, Orange, Riverside, San Bernardino, San Diego, San Luis Obispo, Santa Barbara and Ventura counties.
San Joaquin Valley is Calaveras, Fresno, Kern, Kings, Madera, Mariposa, Merced, San Benito, San Joaquin, Stanislaus, Tulare and Tuolumne counties.
Greater Sacramento is Alpine, Amador, Butte, Colusa, El Dorado, Nevada, Placer, Plumas, Sacramento, Sierra, Sutter, Yolo and Yuba counties.
Bay Area is Alameda, Contra Costa, Marin, Monterey, Napa, San Francisco, San Mateo, Santa Clara, Santa Cruz, Solano and Sonoma counties.
Northern California is Del Norte, Glenn, Humboldt, Lake, Lassen, Mendocino, Modoc, Shasta, Siskiyou, Tehama and Trinity counties.
Why can't I find a nursing home I'm looking for?
Case and death counts for nursing homes and assisted-living facilities are drawn from two state sources: the California Department of Public Health, which oversees skilled-nursing facilities, and the California Department of Social Services, which oversees residential-care facilities for the elderly and other adult residential facilities.
The agencies post new totals most days, which are collected by Times reporters and entered into a combined database. To appear on our tracker, a facility must be licensed by one of the two agencies and appear on either list.
How is the doubling time calculated?
The growth in coronavirus cases has followed a similar pattern around the world. Left unchecked, the virus spreads in a predictable way, with each day’s count increasing by a reliable percentage over the previous day. In mathematics, this pattern is known as exponential growth.
One way to measure the speed of the spread is by calculating the amount of time it would take for an area’s total cases to double. For instance, if, over a series of days, an area reports case totals of 100, followed by 130, followed by 170, followed by 220, the case count is growing by about 30% per day. Looked at another way, the total number of cases doubled in less than three days.
To gauge recent trends, The Times follows this process for the state and all 58 counties with the last seven days of data. Using statistical software, those figures are fitted to an exponential curve to estimate the current rate of growth. With some algebra, that percentage growth can be translated to a doubling time. In the previous example, a 30% growth rate is equivalent to a doubling time of 2.6 days.
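The algebra is short: if cases grow by a fraction r each day, the doubling time is ln(2) / ln(1 + r). The Python sketch below applies that formula and shows one common way to fit an exponential curve, a least-squares line through the logarithm of the totals. That fitting method is an assumption made for illustration; the statistical software and exact procedure The Times uses are not specified here, and the function names are hypothetical.

```python
import math


def doubling_time(daily_growth_rate):
    """Days needed for the total to double at a constant daily growth rate."""
    return math.log(2) / math.log(1 + daily_growth_rate)


def fit_daily_growth_rate(totals):
    """Estimate the daily growth rate from a series of cumulative totals.

    Fits a straight line to log(total) against the day index using
    ordinary least squares; the slope is the log of the daily growth factor.
    """
    n = len(totals)
    days = range(n)
    logs = [math.log(t) for t in totals]
    mean_x = sum(days) / n
    mean_y = sum(logs) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, logs)) / sum(
        (x - mean_x) ** 2 for x in days
    )
    return math.exp(slope) - 1


# The example series from the text above.
totals = [100, 130, 170, 220]
rate = fit_daily_growth_rate(totals)

print(f"fitted growth rate: {rate:.1%} per day")
print(f"doubling time at the fitted rate: {doubling_time(rate):.1f} days")
print(f"doubling time at exactly 30%: {doubling_time(0.30):.1f} days")
```

The same functions work on a seven-day window of totals; only the length of the input list changes.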
Can I have access to your underlying data?
In an effort to aid scientists and researchers in the fight against COVID-19, The Times has released its database of California coronavirus cases to the public.
The database is available on GitHub, a popular website for hosting data and computer code. The files are updated daily at github.com/datadesk/california-coronavirus-data.
Who can I contact if I find an error?
Compiling The Times database requires hours of data entry each day. Mistakes happen. If you see information that you believe is incorrect or out of date, please contact Sean Greene at firstname.lastname@example.org or Iris Lee at email@example.com.