
Managing Privacy Risks to Advance Health Equity through Dissemination of Disaggregated Data

January 10, 2023

Overview

Dissemination of data disaggregated by race and ethnicity is an important step in advancing health equity. However, the public dissemination of datasets that include race and ethnicity raises important legal considerations around privacy, primarily around re-identification. Re-identification refers to the ability to use data from a de-identified dataset to identify individuals. Modifications to the released data can reduce re-identification risks while maximizing the data’s utility.

While data aggregation (generally, the process of compiling and summarizing data from different sources) is a well-recognized concept, the term data disaggregation may be less familiar. Data disaggregation refers to the separation of compiled information into smaller units based on common characteristics such as race, ethnicity, or gender. Dissemination of data disaggregated by race and ethnicity (herein, disaggregated data) is an important step in advancing health equity. Public dissemination of COVID-19 datasets disaggregated by race and ethnicity, for example, exposed the virus’s disproportionate impact on Black and Latino individuals across the country. At the same time, the public dissemination of datasets that include race and ethnicity raises important legal considerations around privacy, particularly for members of underserved populations who may be at greater risk of “re-identification” and privacy harms.

Re-identification refers to the ability to use data from a de-identified dataset to identify individuals, and it is the most prominent risk presented by dissemination of disaggregated data. Although race and ethnicity are not direct identifiers, they can be thought of as quasi-identifiers, owing to the potentially significant increase in risk of re-identification when they are included in a dataset. Such risks can be realized through linkage with demographic characteristics, which are often found in external datasets.

Re-identification risks from race and ethnicity data can vary dramatically from one state to another, and even within different areas of a state depending on the geographic distributions of racial and ethnic groups. Further, re-identification risks from race and ethnicity increase with the presence of additional characteristics in public health datasets, such as age, gender, and marital status. These risks become even greater when the data relates to smaller populations.
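To make the effect of additional characteristics concrete, the following is a minimal sketch of one common way to gauge this kind of risk: counting how many records share each combination of quasi-identifier values. The records, field names, and helper functions are all hypothetical and invented for illustration; this is not a method prescribed by HIPAA or by the handbook, and a real assessment would also account for external data available for linkage.

```python
from collections import Counter

# Hypothetical toy dataset; every value is invented for illustration.
records = [
    {"race": "A", "age": 34, "county": "X"},
    {"race": "A", "age": 34, "county": "X"},
    {"race": "A", "age": 35, "county": "X"},
    {"race": "A", "age": 61, "county": "Y"},
    {"race": "B", "age": 34, "county": "X"},
    {"race": "B", "age": 62, "county": "Y"},
]


def group_sizes(records, quasi_identifiers):
    """Count the records sharing each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def smallest_group(records, quasi_identifiers):
    """Size of the smallest group; a value of 1 means someone is unique."""
    return min(group_sizes(records, quasi_identifiers).values())


def unique_fraction(records, quasi_identifiers):
    """Share of records whose quasi-identifier combination is unique."""
    sizes = group_sizes(records, quasi_identifiers)
    unique = sum(1 for count in sizes.values() if count == 1)
    return unique / len(records)


if __name__ == "__main__":
    # Each added characteristic splits the data into smaller groups.
    for fields in (["race"], ["race", "county"], ["race", "county", "age"]):
        print(fields, smallest_group(records, fields),
              round(unique_fraction(records, fields), 2))
```

In this toy data, no record is unique on race alone, but half of the records become unique once county is added, and two thirds once age is added as well. The same pattern is why smaller populations carry greater risk: fewer people share any given combination of characteristics.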

Compounding the problem, legal minds, including judges, frequently disagree on the extent of such re-identification risks. In a recent case, a majority of the Wisconsin Supreme Court affirmed the dismissal of a challenge to the state’s production of records listing certain businesses with two or more COVID-19 cases, opening the door to the release of the information. The chief justice, in a dissenting opinion, strongly disagreed, arguing the majority’s decision essentially sanctioned disclosure of private medical information of the individuals who had had COVID-19.

The Health Insurance Portability and Accountability Act’s (HIPAA’s) expert determination method provides a reasoned basis for responsible management of re-identification risks while balancing these risks with the need for greater dissemination of disaggregated data to promote health equity. Under that method, an expert familiar with generally accepted statistical and scientific principles and methods for rendering information not individually identifiable may determine there is no more than a “very small” risk of re-identification.

Qualified experts, trained in disclosure risk assessment and limitation methods, can suggest possible modifications to the released data that will reduce re-identification risks to acceptable thresholds while maximizing the data’s utility for the desired purposes. For example, when more detail is required regarding racial/ethnic categories, the level of detail can be reduced for other quasi-identifiers. Such trade-offs to ensure very small re-identification risks might be achieved, for example, by:

  1. providing age groups instead of age in years;
  2. increasing the size of the geographic reporting units;
  3. collapsing certain geographic areas together; or
  4. selectively censoring race/ethnicity in areas where too few individuals exist to allow safe reporting.
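The modifications listed above can be sketched in a few lines of code. The example below is illustrative only: the field names, the county-to-region mapping, the ten-year age bands, and the minimum cell size of 3 are all assumptions made for the sketch. HIPAA's expert determination method does not fix a numeric threshold, and a qualified expert would choose these parameters for the specific dataset and release context.

```python
from collections import Counter

# Hypothetical threshold; a qualified expert would set this per release.
MIN_CELL_SIZE = 3

# Hypothetical mapping that collapses counties into larger reporting regions.
REGION_OF = {"X": "North", "Y": "North", "Z": "South"}


def age_group(age, width=10):
    """Replace age in years with a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def generalize(record):
    """Coarsen age and geography while keeping race/ethnicity detail."""
    return {
        "race": record["race"],
        "age_group": age_group(record["age"]),
        "region": REGION_OF[record["county"]],
    }


def suppress_small_cells(records, min_size=MIN_CELL_SIZE):
    """Censor race where too few records share the same released values."""
    def cell(r):
        return (r["race"], r["age_group"], r["region"])

    counts = Counter(cell(r) for r in records)
    return [
        {**r, "race": "Suppressed"} if counts[cell(r)] < min_size else r
        for r in records
    ]


# Hypothetical toy records; every value is invented for illustration.
records = [
    {"race": "A", "age": 34, "county": "X"},
    {"race": "A", "age": 36, "county": "X"},
    {"race": "A", "age": 38, "county": "Y"},
    {"race": "B", "age": 31, "county": "X"},
    {"race": "A", "age": 61, "county": "Z"},
    {"race": "B", "age": 62, "county": "Z"},
]

released = suppress_small_cells([generalize(r) for r in records])

if __name__ == "__main__":
    for row in released:
        print(row)
```

In this toy data, only the three records in the North region's 30-39 band that share a race value keep it; the other three fall in cells smaller than the assumed threshold and have race censored. A single pass like this is not sufficient on its own: censored values can sometimes be inferred from published totals, so an expert would re-check the released data and weigh any lost detail against the intended use.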

To assist public health practitioners and attorneys across state, Tribal, and local governments in the use of data to advance health equity, the Network for Public Health Law has produced a legal handbook that addresses the role of law in collecting and disseminating public health data disaggregated by race and ethnicity. You can download Disaggregation of Public Health Data by Race and Ethnicity: A Legal Handbook on the Network’s website.

This post was written by Stephen Murphy, J.D., Senior Attorney, Network for Public Health Law – Mid-States Region Office, and Daniel Barth-Jones, M.P.H., Ph.D., Principal Privacy Expert and Contributor to Disaggregation of Public Health Data by Race & Ethnicity: A Legal Handbook.

The Network for Public Health Law provides information and technical assistance on issues related to public health. The legal information and assistance provided in this document do not constitute legal advice or legal representation. For legal advice, readers should consult a lawyer in their state.

Support for the Network is provided by the Robert Wood Johnson Foundation (RWJF). The views expressed in this post do not represent the views of (and should not be attributed to) RWJF.