Editorial: Privacy vs. accuracy? Questions swirl around 2020 census

A new statistical technique meant to protect the privacy of respondents to the 2020 census has raised questions about the reliability of some final counts.
(U.S. Census Bureau)

In the long history of the United States census (the first was conducted in 1790), few counts have faced headwinds as strong as the current one.

First there was congressional underfunding that hampered the Census Bureau’s ability to do community outreach and other ground-prepping activities. Then came the Trump administration’s efforts to politicize the census by adding a citizenship question intended to suppress participation in Democratic-leaning immigrant communities, causing undercounts that could reduce federal aid to those communities and skew redistricting in favor of Republicans.

And then the COVID-19 pandemic delayed in-person data collection and forced the Census Bureau to push back by several months the deadlines for reporting each state’s allotment of seats in the House of Representatives and for sending states their redistricting data. The delays will make it harder for states to complete the redistricting process in time for the 2022 elections.


Now there’s a fresh breeze stirring: Some critics argue that the Census Bureau’s decision to use a statistical technique, differential privacy, to help protect respondents’ privacy may render some of the census data unreliable, potentially exaggerating the size of rural communities (and thus increasing their representation in Congress and state legislatures) and undercounting nonwhites. Those problems have particular significance in California and other states with large numbers of nonwhite residents, and for states with large rural populations.

The scope of the problem can be measured by the breadth of its critics. California legislative leaders have written to the White House questioning the effects of differential privacy and imploring that “the work of the Census Bureau ... be carefully reviewed by those who have the confidence of the president to accomplish the goals we all share.” The state of Alabama filed a lawsuit last month over similar questions. And two immigrant rights groups, Asian Americans Advancing Justice and the Mexican American Legal Defense and Education Fund, issued a report Monday warning that “if there is a systemic bias in the resulting data, the legal requisites of redistricting would not be met” and that the result could violate the federal Voting Rights Act.

How to fix the problem is unclear, although census officials are still working on it. If the issues can’t be resolved, some experts believe the bureau could still produce final numbers using the 2010 methodology, but it is running out of time to do so.

The Census Bureau faces two conflicting responsibilities: to protect the privacy of respondents, and to ensure an accurate head count, which is vital to both the reallocation of House seats and the distribution of roughly $1.5 trillion in federal spending each year. To improve the census’ accuracy, the government has in the past used statistical tricks to account for people census enumerators know exist but can’t reach, such as assigning to a non-responsive address average household data for that particular census block.

But after the 2010 census, officials came to realize that the advent of commercial data brokers and more powerful computers meant the privacy of census respondents could be compromised by sorting through commercially available data (often used by businesses to target consumers) and matching it against census data to extract individual characteristics, a process called re-identification. That led the bureau to incorporate differential privacy techniques that inject false data (“noise,” in the parlance) into the census, making re-identification more difficult.

The false data have little effect on statistics drawn from large sets of data, such as the total population count, but they have an amplified effect on small data sets of the sort that are used, for instance, in redistricting legislative seats.
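
For readers who want to see why noise matters more for small counts, here is a rough sketch. It is not the Census Bureau's actual system, which is far more elaborate; it swaps in the textbook Laplace mechanism as a stand-in, and the privacy parameter and both population figures are hypothetical values chosen only for illustration.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two independent exponential draws with mean `scale`
    # follows a Laplace distribution centered on zero.
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def noisy_count(true_count, epsilon, rng):
    # Textbook Laplace mechanism for a simple count: noise scale is 1/epsilon.
    # This is an assumed stand-in, not the bureau's published method.
    return true_count + laplace_noise(1 / epsilon, rng)


def mean_relative_error(true_count, epsilon, trials=10_000, seed=0):
    # Average size of the injected error as a fraction of the true count.
    rng = random.Random(seed)
    total = sum(
        abs(noisy_count(true_count, epsilon, rng) - true_count)
        for _ in range(trials)
    )
    return total / trials / true_count


EPSILON = 0.5                 # hypothetical privacy setting
STATE_COUNT = 39_500_000      # hypothetical state-sized population
BLOCK_COUNT = 25              # hypothetical census-block population

state_error = mean_relative_error(STATE_COUNT, EPSILON)
block_error = mean_relative_error(BLOCK_COUNT, EPSILON)

print(f"state-sized count: average error {state_error:.2e} of the total")
print(f"block-sized count: average error {block_error:.2%} of the total")
```

The same amount of noise is added in both cases. Against a count in the millions it vanishes; against a count of a few dozen people it can shift the figure by several percent, which is the scale at which district lines are drawn.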


So there’s the tension: privacy protection versus census data accuracy. The Census Bureau is expected later this month to release a new trial run using differential privacy that you can be sure will be combed over by advocates and others concerned with both achieving an accurate census and protecting the privacy of the respondents. This all may ultimately lead to more court challenges than the one filed by Alabama, leaving the outlook for the final census figures uncertain.

It is in the nation’s best interest that the government and census watchdogs find that sweet spot between completing an accurate census (in time for the redistricting of congressional and legislative districts) and maintaining acceptable levels of privacy protection.

But if the government can’t reach that balance, then it should abandon differential privacy or skew its methodology to emphasize accuracy. As important as protecting privacy is, it doesn’t warrant imperiling the reliability of vital data. Number crunchers might be able to glean a few personal tidbits, but much of that information is already available through commercial data harvesting. Emphasizing privacy over accuracy in this case is the wrong move.