Pseudonymous Data vs Anonymous Data
- Pseudonymous data is data that has been de-identified from the data’s subject but can be re-identified as needed.
- Anonymous data is data that has been changed so that reidentification of the individual is impossible.
- Pseudonymous data and anonymous data are treated differently under GDPR, the European Union data protection law.
- Tokenization and hash functions can be used to pseudonymize data, while noise addition, substitution, and aggregation can be used to anonymize data.
Protecting personal data should be a priority in any company’s data protection framework. Two tactics, data pseudonymization and data anonymization, not only secure personal data but also help companies comply with privacy laws such as GDPR.
If your company handles personal data, meaning any information relating to an identified or identifiable person, you may benefit from pseudonymizing or anonymizing it. Keep reading to discover the difference between the two kinds of data and how you can use each in your company.
What is Pseudonymous Data?
A ‘pseudonym’ is a false name, which can be helpful for multiple purposes. We are familiar with pseudonyms being used for authors’ pen names (like Theodor Geisel, who wrote under the name Dr. Seuss) or for superheroes (like Clark Kent, who was secretly Superman). However, pseudonyms can also be used to protect the privacy and security of personal data.
Pseudonymized data is personal data that has been changed so that it can no longer be attributed to the original data subject without additional information, such as a key or lookup table, that is kept separately. This can be achieved through hash functions and tokenization, which we’ll go through in more depth later. Creating pseudonymous data is particularly helpful for companies trying to achieve GDPR compliance. GDPR defines pseudonymisation in Article 4(5), and the law clearly distinguishes between pseudonymized data and anonymous data.
What is Anonymous Data?
Anonymous data is data that has been completely de-identified from the original data subject. Unlike pseudonymous data, anonymous data cannot be reversed or reconnected to the original individual. Anonymous data removes personal identification from personal data entirely so that it can be used more freely.
Anonymous data is not an option if the data eventually needs to be reconnected to the original individual. However, anonymous data can be used for statistical research without expanding the scope of compliance obligations under laws like GDPR. Anonymous data is also often used when sending data to third parties. Although your company may keep an original or pseudonymized version of this data, the version you send to a third party can be anonymized to reduce legal concerns.
Pseudonymous Data and Anonymous Data according to GDPR
Pseudonymous data and anonymous data are treated differently under GDPR, the European Union data protection law.
Pseudonymous data is still categorized as personal data but can be used to soften GDPR’s obligations for personal data. GDPR’s requirements for personal data are strict, but pseudonymized data is treated more as a subcategory of personal data, with fewer requirements and more flexibility. GDPR recognizes that using pseudonymized data reduces the risks associated with using personal data and incentivizes this method by relaxing requirements.
Anonymous data is no longer considered personal data, and because of this it is not subject to GDPR. However, the bar for anonymization is very high. Under GDPR’s Recital 26, the individual must not be identifiable by any means reasonably likely to be used, whether by your company or by anyone else. This makes anonymous data an impossible choice for certain use cases where identification needs to remain an option.
Tactics for Anonymizing and Pseudonymizing Data
If you want your data anonymized or pseudonymized, there are different de-identification tactics you can use. Here are some of the best tactics for both:
How to Anonymize Data
- Aggregation: Aggregation groups individuals with other similar individuals who share aspects of their personal data while removing certain identifying characteristics. Once the data has been scrubbed, reidentification is impossible because important characteristics have been replaced with symbols (a location like San Jose is replaced with X) or generalized (the age 32 is categorized as “between 30 and 35”).
- Noise Addition: Noise addition works well for numeric data; it adds or subtracts a random value within a certain range (a height of 5’2” is recorded with +/-6” of noise).
- Substitution: Certain data is substituted with a different identifier (in a study of BMIs, the numbers could be replaced with colors or symbols).
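To make the three tactics above concrete, here is a minimal Python sketch. The function names, the 5-year age bands, the color labels for BMI, and the sample record fields are all illustrative assumptions, not part of any standard or product. Running these steps does not by itself guarantee that a dataset meets GDPR’s bar for anonymization; it only shows the mechanics.

```python
import random
from collections import Counter


def generalize_age(age: int, width: int = 5) -> str:
    """Generalization: replace an exact age with a band, e.g. 32 -> '30-34'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def add_noise(value: float, spread: float, rng: random.Random) -> float:
    """Noise addition: shift a number by a random offset in [-spread, +spread]."""
    return value + rng.uniform(-spread, spread)


def substitute_bmi(bmi: float) -> str:
    """Substitution: swap a BMI figure for a coarse color label (hypothetical coding)."""
    if bmi < 18.5:
        return "blue"
    if bmi < 25:
        return "green"
    return "orange"


def anonymize(records: list[dict], rng: random.Random) -> list[dict]:
    """Drop direct identifiers and transform the remaining fields."""
    return [
        {
            "age_band": generalize_age(r["age"]),
            "height_in": round(add_noise(r["height_in"], 6, rng)),
            "bmi_code": substitute_bmi(r["bmi"]),
        }
        for r in records
    ]


def aggregate_by_age_band(rows: list[dict]) -> Counter:
    """Aggregation: report only how many people fall into each age band."""
    return Counter(row["age_band"] for row in rows)


if __name__ == "__main__":
    people = [
        {"name": "A. Example", "age": 32, "height_in": 62, "bmi": 22.4},
        {"name": "B. Example", "age": 34, "height_in": 70, "bmi": 27.1},
    ]
    rows = anonymize(people, random.Random())
    print(rows)
    print(aggregate_by_age_band(rows))
```

Note that the output rows no longer contain the name field at all, and the aggregated counts describe groups rather than individuals.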
How to Pseudonymize Data
- Hash Functions: Hash functions transform data of any size into a fixed-length output. This produces an unreadable string that’s easier to store. Because the same input always produces the same output, records can still be linked, and anyone who holds the original values can re-identify them by hashing those values again.
- Data Encryption: Data encryption changes data so that it is unrecognizable without a key; however, decrypting the data for use would bring it back into scope for PII compliance.
- Tokenization: Tokenization pseudonymizes data by exchanging personal information for customizable placeholder tokens. These tokens preserve the data’s utility, while the sensitive data is stored externally. On its own, a token cannot be traced back to the original data, so it is useless if stolen. If the process needs to be reversed, an authorized party can trade the token for the original data.
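The hashing and tokenization tactics above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the TokenEx API or any vendor’s implementation: `keyed_hash`, `TokenVault`, and the `tok_` prefix are hypothetical names, and an in-memory dictionary stands in for the externally stored vault. The sketch also swaps in a keyed hash (HMAC-SHA256) for a plain hash, because plain hashes of guessable values such as email addresses can be reversed by hashing candidate values.

```python
import hashlib
import hmac
import secrets


def keyed_hash(value: str, key: bytes) -> str:
    """Pseudonymize a value with HMAC-SHA256; the key must be stored separately."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


class TokenVault:
    """Toy token vault (hypothetical): maps random tokens to original values."""

    def __init__(self) -> None:
        self._token_to_value: dict[str, str] = {}
        self._value_to_token: dict[str, str] = {}

    def tokenize(self, value: str) -> str:
        """Return the existing token for a value, or issue a new random one."""
        if value in self._value_to_token:
            return self._value_to_token[value]
        token = "tok_" + secrets.token_hex(8)  # random, so it reveals nothing
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token: str) -> str:
        """Reverse the process; only callers with vault access can do this."""
        return self._token_to_value[token]


if __name__ == "__main__":
    key = secrets.token_bytes(32)
    print(keyed_hash("alice@example.com", key))

    vault = TokenVault()
    token = vault.tokenize("alice@example.com")
    print(token)
    print(vault.detokenize(token))
```

The token is generated randomly rather than derived from the data, which is why it carries no information without the vault, while the keyed hash stays consistent for the same input and key so that records can still be joined.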
Anonymization techniques like aggregation, noise addition, and substitution work well if the data doesn’t need to be connected to an individual anymore. However, if the data is needed for internal purposes, a tokenization solution will be able to pseudonymize the data without compromising its business utility.
If you are interested in pseudonymizing your data, consider checking out the TokenEx Tokenization & Pseudonymization tools that secure personal data while preserving its business utility.