Human Research Protection Program

De-identifying Non-Protected Health Information (PHI) Data

It is the investigator's responsibility to keep his/her subjects' data as safe and secure as possible. Investigators often prefer to work with de-identified data to reduce the chance of a breach of confidentiality.

De-identification is the process by which all links between the subjects' personally identifying information and their research data are severed, leaving the investigator with no code by which to re-identify them. Note, however, that even without a code there is always some risk of re-identification, however small.

A data set cannot be categorized as de-identified if it includes any of the following common direct identifiers:

  • Names

  • Addresses

  • Telephone numbers

  • Fax numbers

  • Email addresses

  • Social media usernames or handles

  • URLs/IP addresses

  • Social Security numbers

  • Dates of birth

  • Dates of death

  • Student identification numbers

  • License / certificate numbers

  • Medical record numbers

  • Health plan numbers

  • Dates of service

  • Account numbers

  • Vehicle/serial/device numbers

  • Facial photographs or images

  • Biometric identifiers, including voices and fingerprints

  • Any other unique identifying numbers, codes or characteristics
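As a rough illustration of removing direct identifiers, the sketch below drops identifier fields from a list of records. The field names and the helper strip_direct_identifiers are hypothetical and cover only part of the list above; a real data set will use its own column names. Removing named fields does not catch identifiers embedded in free-text responses, and it does nothing about indirect identifiers.

```python
# Hypothetical field names chosen for illustration only; they cover just part
# of the list of direct identifiers and must be adapted to the actual data set.
DIRECT_IDENTIFIER_FIELDS = {
    "name", "address", "phone", "fax", "email", "social_media_handle",
    "ip_address", "ssn", "date_of_birth", "date_of_death", "student_id",
    "medical_record_number",
}


def strip_direct_identifiers(records, identifier_fields=DIRECT_IDENTIFIER_FIELDS):
    """Return copies of the records with the listed identifier fields removed."""
    return [
        {
            field: value
            for field, value in record.items()
            if field.lower() not in identifier_fields
        }
        for record in records
    ]


# Made-up record for demonstration.
records = [
    {"name": "A. Example", "email": "a@example.org", "major": "philosophy", "score": 87},
]
print(strip_direct_identifiers(records))
```

The original records are left unchanged, so the identifiable copy can be destroyed or stored separately as the approved protocol requires.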

In addition to the direct identifiers listed above, the investigator must be cognizant of indirect identifiers, which, in some circumstances, may make it possible to identify an individual deductively (i.e., if the sample set is too small or the indirect identifiers too many). (See examples below.)

It is important to remember that the more indirect identifiers investigators collect, the higher the risk of re-identification. In addition, the investigator must keep in mind that there may be publicly available information about the subjects that is unconnected to the data collected for the specific research in question, and that cumulatively this information may be used to re-identify the subjects.

The IRB will consider all indirect identifiers along with sample size to determine whether a data set is truly de-identified.

Examples

  1. An investigator collects the religious affiliation, sex, grade level and academic major of a sample of 500 students. The sample set includes one Christian female freshman philosophy major ...

  2. An investigator collects the marital status of 50 recent college graduates. The sample set includes one widow / widower ...
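Both examples involve a value, or a combination of values, held by a single subject in the sample. One way an investigator might screen a data set for such cases before submission is sketched below. The function small_groups, the field names, the sample data, and the min_group_size threshold are all assumptions made for illustration; this guidance does not prescribe a threshold or a method, and the check cannot account for publicly available information outside the data set.

```python
from collections import Counter


def small_groups(records, indirect_fields, min_group_size=2):
    """Return each combination of indirect-identifier values shared by fewer
    than min_group_size records, mapped to the number of records sharing it."""
    counts = Counter(
        tuple(record[field] for field in indirect_fields) for record in records
    )
    return {combo: n for combo, n in counts.items() if n < min_group_size}


# Made-up sample modeled on the second example: 50 recent graduates,
# exactly one of whom is widowed.
graduates = (
    [{"marital_status": "single"}] * 37
    + [{"marital_status": "married"}] * 12
    + [{"marital_status": "widowed"}]
)

print(small_groups(graduates, ["marital_status"]))
# {('widowed',): 1}
```

Passing several fields at once (for instance religious affiliation, sex, grade level and academic major) checks combinations of indirect identifiers instead of a single one. An empty result does not by itself show that a data set is de-identified.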