Human Research Protection Program

Anonymous vs. Confidential

In research, the terms "anonymous" and "confidential" are often confused and treated as interchangeable. They have, however, very distinct meanings, and it is important for an investigator to understand the difference when developing his/her research study.

When conducting human subjects research, the level of risk to which subjects are exposed is an important consideration that the Institutional Review Board must determine in the course of its review. It is the investigator's responsibility to minimize the potential for harm to his/her research subjects.

The distinction between anonymous and confidential data relates to this level of risk to subjects, and the investigator must clearly define the activity as anonymous or confidential prior to conducting the research.

No single activity can be both anonymous and confidential, though a single research study may use several activities, or methods of data collection, some of which yield anonymous data and some of which yield confidential data.

Anonymous

Anonymous refers to data that can in no way be linked to information that could potentially be used to identify or trace a specific subject. When an investigator promises anonymity, even the investigator him/herself cannot link the research data collected with the individual from whom it was collected.

Example: Surveys are often conducted anonymously, particularly when completed online. However, even conducting a survey online does not guarantee anonymity. Computers have IP (internet protocol) addresses by which a user may be identified. The investigator, therefore, must ensure s/he uses survey software that does not track IP addresses.

It is commonly believed that data is anonymous if the investigator has not collected direct identifiers, such as name, Social Security number or student ID number. It should be understood, however, that indirect and demographic variables, such as age, race or sex, could, in some circumstances, be used to identify subjects, particularly when a number of them are being collected.

Therefore, if the investigator finds it necessary to collect identifiers, s/he should collect only those necessary for the research objectives. In addition, a small sample, or a sample that is not diverse, may undermine the anonymity of the data. If precautions are not taken, it may be difficult to conceal the identity of the subjects and relatively easy to link the subjects to their data.

Example: An investigator collects the religious affiliation, sex, grade level and academic major of a sample of 400 students. The sample includes one Christian female freshman philosophy major.
Example: An investigator collects the marital status of 50 recent college graduates. The sample includes one widow/widower.
Example: An investigator collects the income level (in ranges) of 100 recent high school graduates. The sample includes one graduate earning between $100,000 and $150,000.
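
One way to screen for cases like these is to count how many subjects share each combination of indirect identifiers and to flag any combination held by very few subjects. The following is a minimal Python sketch, not a prescribed procedure. The function name find_small_cells, the field names, the sample records and the threshold of 2 are all illustrative assumptions.

```python
from collections import Counter


def find_small_cells(records, fields, min_count=2):
    """Return combinations of field values shared by fewer than min_count records.

    records: list of dicts, one per subject.
    fields: names of the indirect identifiers to combine.
    min_count: illustrative threshold, not an institutional standard.
    """
    counts = Counter(tuple(record[field] for field in fields) for record in records)
    return {combo: n for combo, n in counts.items() if n < min_count}


# Hypothetical sample of 50 graduates, echoing the marital status example.
sample = (
    [{"marital_status": "single"} for _ in range(30)]
    + [{"marital_status": "married"} for _ in range(19)]
    + [{"marital_status": "widowed"}]
)

risky = find_small_cells(sample, ["marital_status"])
print(risky)  # {('widowed',): 1}
```

A combination that appears only once points to a subject who could be singled out. Passing several fields at once (for example, religious affiliation, sex, grade level and major) checks them in combination, which is where unique cases most often appear.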

Anonymous data collection involves the lowest level of risk or potential for harm to the subjects.

Confidential

Confidential refers to private information a subject discloses with the expectation that it will not be divulged to others without that subject's permission. When an investigator promises confidentiality, the subject is asked to supply information that could potentially identify that subject, and that information is linked to the research data collected from the subject. The subject provides it with the understanding that the investigator will not disclose the information to anyone other than those with whom the subject has explicitly consented to share it (i.e., the research team).

Example: Surveys conducted in person are considered confidential rather than anonymous, because the investigator can recognize the subject even if the investigator has collected no other identifying information.

It is important to note the use of the terms "divulge" and "disclose," as they point to an important aspect of confidentiality that the investigator must always keep in mind: privacy cannot be guaranteed. While the investigator may promise not to share the subjects' private information, it may still be discoverable by outside parties. When dealing with confidential information, then, the investigator must ensure the information is collected and stored in such a way as to minimize the risk of discovery by outside parties.

Example: One-to-one interviews that are conducted in a public place may be overheard by others.
Example: The investigator stores identifiable data on his/her computer unencrypted. The computer is left unattended and found by an outside party who traces the data to the subjects.

Confidential data collection involves a higher level of risk or potential for harm to the subjects than does anonymous data collection. It should be noted that there are multiple levels of risk in confidential data collection and storage.

Example: The fewer the number of individuals who have access to the data, the lower the level of risk. Focus groups involve a higher level of risk than do one-to-one interviews, as the subjects must rely on the ability of all other subjects as well as the investigator to maintain the confidentiality of the information shared.
Example: The more securely the data is stored, the lower the level of risk. Encrypted computers are more secure than locked file cabinets, and encrypted servers such as box.com are more secure than personal computers. Data should always be stored in the most secure manner possible for that particular data.

Protecting Confidentiality

Prior to a subject's participation in research, s/he must be told whether his/her involvement and the data collected will be anonymous or confidential. If the data are to remain confidential, the investigator must also discuss with the subject, during the process of informed consent, the level of confidentiality that can be offered and the potential for breach of confidentiality. This should also be noted on the informed consent form.

Note that there are times when breaking confidentiality may be required. Investigators who are mandated reporters, for example, must disclose to subjects that they are legally obligated to report suspected child or elder abuse, as well as situations in which the participant or others are at immediate risk of harm.

An important consideration in the use of confidential data is the investigator's responsibility to keep the data as safe and secure as possible. The investigator can do this in a variety of ways (note the following list is in no way exhaustive):

Limit access to the data to as few individuals as possible.
Code the data whenever feasible.
Store hard copies of the data in locked cabinets in locked rooms.
Store the data, master code list and informed consent forms in separate locations.
Transfer (from person to person, place to place) the data (field notes, recorded interviews, informed consent forms) promptly and securely.
Transcribe recorded data as soon as possible and destroy original recordings.
Store data on an encrypted computer or server. (At Brandeis University, it is strongly encouraged that investigators use the university’s encrypted server, box.com, where the default settings provide maximum security for all accounts.)
Upload data to an encrypted server promptly (do not wait until all data is collected).
Delete identifiers as soon as is feasible.

It is imperative that investigators keep in mind at all times the potential harm (social, legal, economic, physical) to subjects that may result from a breach in confidentiality. Plans for data security must be outlined in the research protocol when discussing the provisions for managing risk, and approved by the Institutional Review Board.

Coding Data

A common practice for reducing the risk of a breach of confidentiality is for the investigator to code the information and data collected from the subject. When data is coded, a subject's identifying information is separated from the subject's research data and replaced with a code. The investigator then keeps a "master list" of the subjects' names and identifying codes. For security purposes, the master list is kept separately from the subjects' data.

Example: Studies that involve interviews often utilize pseudonyms to mask subjects' identities. In such cases, the investigator assigns each subject a code name, which is used in all interview notes in lieu of the subject's name. A master list linking the subjects' names to their pseudonyms is developed to keep track of the data and is secured in a locked file cabinet. The interview notes are secured separately on an encrypted server.
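
The coding step described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than a required method. The function name code_subjects, the record layout ("name" and "notes" keys), the "S-" code format and the sample names are all hypothetical.

```python
import secrets


def code_subjects(interviews):
    """Split raw interview records into a master list and coded research data.

    interviews: list of dicts with "name" and "notes" keys (hypothetical layout).
    Returns (master_list, coded_data), which are meant to be stored separately.
    """
    master_list = {}   # code -> subject name; keep apart from the research data
    name_to_code = {}  # ensures one code per subject across several interviews
    coded_data = []
    for record in interviews:
        name = record["name"]
        if name not in name_to_code:
            code = "S-" + secrets.token_hex(4)  # random, not derived from the name
            while code in master_list:          # regenerate on the rare collision
                code = "S-" + secrets.token_hex(4)
            name_to_code[name] = code
            master_list[code] = name
        coded_data.append({"subject_code": name_to_code[name], "notes": record["notes"]})
    return master_list, coded_data


# Hypothetical records for illustration only.
interviews = [
    {"name": "Jane Doe", "notes": "First interview."},
    {"name": "John Roe", "notes": "First interview."},
    {"name": "Jane Doe", "notes": "Follow-up interview."},
]
master_list, coded_data = code_subjects(interviews)
```

In this sketch, coded_data carries only the code and the notes, while master_list is the sole link back to the subjects and would be stored in a separate, secured location. The sketch replaces only the name field; names or other identifying details mentioned inside the notes themselves would still need to be masked by the investigator.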

De-identification

Another common practice — one that often leads to confusion between anonymous and confidential — is for the investigator (or the individual/organization from which the data originates) to de-identify the data collected from the subject. De-identification is the process by which all links between the subjects' personally identifying information and their research data are severed and the investigator has no code by which to re-identify them.

Example: Often when an investigator conducts secondary data analysis, all identifying information has been scrubbed from the data before the investigator receives it. (Note that in this case, the individual/organization from which the data originate may retain the identifying information with the data.)
Example: An investigator removes all identifying information from his/her research data and maintains no code with which to re-identify the data.
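
The difference from coding can be shown with a short Python sketch in which the linking code itself is removed and no key is produced. The function name deidentify, the field names and the sample rows are hypothetical, and the sketch assumes that any master list kept elsewhere is also destroyed.

```python
def deidentify(coded_rows, code_field="subject_code"):
    """Return copies of the rows with the linking code removed.

    Unlike coding, no master list or key is produced, so the rows
    cannot be linked back to subjects through a code.
    """
    return [
        {key: value for key, value in row.items() if key != code_field}
        for row in coded_rows
    ]


# Hypothetical coded rows for illustration only.
rows = [
    {"subject_code": "S-01", "age_range": "18-24", "response": 4},
    {"subject_code": "S-02", "age_range": "25-34", "response": 2},
]
released = deidentify(rows)
```

Indirect identifiers, such as the hypothetical age_range field here, remain in the released rows, so removing the code does not by itself rule out re-identification.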

It is important to remember that the more indirect identifiers investigators collect, the higher the risk of re-identification. In addition, the investigator must keep in mind that there may be information available about the subjects that is unconnected to the data collected for the specific research in question, and that, cumulatively, this information may be used to re-identify the subjects.