Pseudonymization vs. Anonymization: GDPR
What is Pseudonymization?
The General Data Protection Regulation (GDPR) is now in effect, with strong requirements to protect the personal data of European Union (EU) data subjects “by design and by default.” Although the GDPR doesn’t contain detailed technical requirements for data security, it does mention pseudonymization as an appropriate mechanism for data protection and de-identification. So, what is pseudonymization?
Pseudonymization is defined in Article 4(5) of the GDPR as:
The processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information, provided that such additional information is kept separately and is subject to technical and organizational measures to ensure that the personal data are not attributed to an identified or identifiable natural person.
In other words, pseudonymization is the process of replacing identifying or sensitive data with a pseudonym. Tokenization is one widely used form of pseudonymization: it replaces sensitive data with a nonsensitive placeholder called a token. The payment card industry has used this technology for years to protect cardholder data.
Pseudonymization vs. Anonymization
In addition to pseudonymization, the GDPR also makes a reference to anonymous information in Recital 26:
The principles of data protection should therefore not apply to anonymous information, namely, information that does not relate to an identified or identifiable natural person or to personal data rendered anonymous in such a manner that the data subject is not or no longer identifiable.
So, what’s the difference between anonymized and pseudonymized information?
If you imagine a continuum of personal data—with fully identifiable, explicit personal data such as full names and Social Security numbers at one end and anonymized data with no identifiable personal information at the other end—pseudonymized data lies somewhere in the middle. A data subject’s pseudonymized data can be reidentified, or associated, with that individual by replacing the pseudonyms with the actual data. Fully anonymized data, on the other hand, cannot be linked back to the individual(s) with which it corresponds.
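To make the reidentification point concrete, here is a minimal sketch of pseudonymizing one field and later reversing it. The `PseudonymVault` class, its method names, and the sample record are hypothetical and used only for illustration. The in-memory dictionaries stand in for the "additional information" that would be kept separately in practice.

```python
import secrets


class PseudonymVault:
    """Hypothetical lookup table standing in for the separately kept mapping."""

    def __init__(self):
        self._by_pseudonym = {}
        self._by_value = {}

    def pseudonymize(self, value: str) -> str:
        # Reuse an existing pseudonym so the same person stays linkable
        # across records without exposing the real value.
        if value in self._by_value:
            return self._by_value[value]
        pseudonym = "p_" + secrets.token_hex(8)  # random, not derived from value
        self._by_value[value] = pseudonym
        self._by_pseudonym[pseudonym] = value
        return pseudonym

    def reidentify(self, pseudonym: str) -> str:
        # Only possible for whoever holds the vault.
        return self._by_pseudonym[pseudonym]


vault = PseudonymVault()
record = {"name": "Alice Example", "purchase_total": 42.50}

# The stored record keeps its analytical fields but no longer names the person.
stored = {**record, "name": vault.pseudonymize(record["name"])}

# With access to the vault, the pseudonym can be swapped back.
original_name = vault.reidentify(stored["name"])
```

Anyone who holds only `stored` sees a random pseudonym. Anyone who also holds the vault can recover the name, which is why pseudonymized data sits in the middle of the continuum rather than at the anonymous end.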
As noted in Recital 26, anonymous data is not subject to the data protection obligations created by the GDPR. So why not just anonymize all the sensitive data in your organization’s systems? For one, fully anonymizing a data set is a difficult task. Second, by definition, anonymous data can’t be linked back to identifiable individuals, which renders it useless for almost anything but very high-level data aggregation and analysis.
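As a rough illustration of why anonymized data mostly supports aggregate analysis, the sketch below drops direct identifiers, generalizes age into ten-year bands, and suppresses small groups. The sample records, function names, and the `min_group_size` threshold are assumptions made for this example. Passing such a step does not by itself establish that data is anonymous under the GDPR.

```python
from collections import Counter

# Hypothetical sample data; names are direct identifiers.
records = [
    {"name": "Alice Example", "age": 34, "country": "DE"},
    {"name": "Bob Example", "age": 37, "country": "DE"},
    {"name": "Carol Example", "age": 52, "country": "FR"},
]


def age_band(age: int) -> str:
    """Generalize an exact age into a ten-year band, e.g. 34 -> '30-39'."""
    low = (age // 10) * 10
    return f"{low}-{low + 9}"


def aggregate(rows, min_group_size: int = 2):
    """Count people per (age band, country), dropping names entirely.

    Groups smaller than min_group_size are suppressed, because a group
    of one could still single out an individual.
    """
    counts = Counter((age_band(r["age"]), r["country"]) for r in rows)
    return {key: n for key, n in counts.items() if n >= min_group_size}


summary = aggregate(records)
```

The resulting summary can answer questions such as how many customers in a country fall into an age band, but it can no longer answer anything about a particular customer.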
Benefits of Data Pseudonymization
In contrast to anonymized data, pseudonymized data retains some statistical utility relative to the level of pseudonymization. And in contrast to explicitly identified data, pseudonymized data provides obvious data protection benefits. Because of this, the GDPR provides several incentives for organizations to implement pseudonymization.
Both Article 25 and Recital 78 make reference to “appropriate technical and organizational measures” for data protection and cite pseudonymization as one of those measures. Recital 78 also cites “pseudonymising personal data as soon as possible” as a measure that can help demonstrate compliance with the GDPR.
Like many other data protection compliance frameworks, the GDPR advocates a risk-based approach. Under Article 32, organizations are directed to “ensure a level of security appropriate to the risk,” and again pseudonymization is described as an appropriate technical measure.
Another incentive to pseudonymize data appears in Article 34(1), which obliges organizations to notify affected data subjects of a data breach when it “is likely to result in a high risk to the rights and freedoms of natural persons.” If the breached data has been appropriately pseudonymized, the risk is lower, which may reduce or remove the need for notification. Many U.S. breach notification laws make similar allowances, typically for data that was encrypted or otherwise rendered unreadable.
Pseudonymization may also enable the processing of personal data beyond the purpose for which it was originally collected. The GDPR requires that personal data be collected only for “specific, explicit and legitimate purposes,” although further processing may be permissible if that processing is compatible with the original purpose. Article 6(4) describes the factors that must be taken into account when determining if further processing is compatible, including “the existence of appropriate safeguards, which may include encryption or pseudonymization.”
Methods of Pseudonymization
There are multiple methods for pseudonymizing data, including data masking, encryption, and tokenization. At a high level, encryption entails the use of a key to encode or protect a data set. Consequently, encryption is mathematically reversible by anyone who holds the key and is subject to the complexities of key management. Tokenization, by comparison, involves replacing identifying or sensitive data with a mathematically unrelated value. Therefore, the tokens cannot be mathematically reversed; they can only be mapped back to the original values through the system that issued them. Both encryption and tokenization can be format-preserving, and tokens may optionally include elements of the original value for data processing purposes. Data masking is a process for obfuscating data, for example by replacing some or all characters with placeholder symbols.
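The difference between masking and tokenization can be sketched in a few lines. The example below is an illustration under stated assumptions, not any vendor's implementation: `mask_card`, `TokenVault`, and the choice to keep the last four digits are hypothetical, and the card number is a common test value. Encryption is left out because Python's standard library does not include a symmetric cipher.

```python
import secrets


def mask_card(pan: str) -> str:
    """Masking: hide all but the last four digits. Nothing is stored,
    so the original value cannot be recovered from the result."""
    return "*" * (len(pan) - 4) + pan[-4:]


class TokenVault:
    """Hypothetical token vault: maps random tokens back to original values."""

    def __init__(self):
        self._token_to_pan = {}

    def tokenize(self, pan: str) -> str:
        # Format-preserving token: same length, digits only, last four kept.
        # The leading digits are random, so the token has no mathematical
        # relationship to the original number.
        while True:
            body = "".join(secrets.choice("0123456789") for _ in range(len(pan) - 4))
            token = body + pan[-4:]
            if token != pan and token not in self._token_to_pan:
                break
        self._token_to_pan[token] = pan
        return token

    def detokenize(self, token: str) -> str:
        # Reversal is a lookup, not a computation.
        return self._token_to_pan[token]


pan = "4111111111111111"  # widely used test card number
masked = mask_card(pan)

vault = TokenVault()
token = vault.tokenize(pan)
recovered = vault.detokenize(token)
```

A production system would add further checks that this sketch omits, such as making sure a token cannot be mistaken for a valid card number.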
The most suitable method of pseudonymization will depend on the specific use case and needs of an organization. It’s worth noting, however, that tokenization via a cloud-based tokenization provider enables an organization to completely remove sensitive or identifying data from its own systems, because the original values are held by the provider. This is a significant differentiator from both a compliance and a data security perspective.
Pseudonymization in the Cloud
As mentioned above, the definition of pseudonymization in the GDPR requires that the additional information needed to identify a data subject be “kept separately” and be “subject to technical and organizational measures to ensure that the personal data are not attributed to an identified or identifiable natural person.” Utilizing cloud-based tokenization for pseudonymization allows an organization to keep identifying attributes stored securely offsite, separate from the remaining data. Again, contrast this with encryption, where the encryption keys are often stored in the same environment as the encrypted data.
Cloud-based tokenization also allows an organization to tokenize data before it is stored, in line with Recital 78’s reference to “pseudonymising personal data as soon as possible” as a way of demonstrating compliance with the GDPR.
TokenEx can help your organization meet the data protection obligations created by the GDPR by pseudonymizing personal data at the ingestion point. The TokenEx Data Protection Platform provides flexible technologies and methodologies to make tokenizing, encrypting, and data vaulting work with any acceptance channel your organization uses.
How TokenEx Assists in Achieving GDPR Compliance
TokenEx’s tokenization solutions are a well-recognized and accepted form of pseudonymization, which can make GDPR compliance more certain, less costly, and easier to accomplish. The GDPR does not name tokenization, but tokenization fits the regulation’s definition of pseudonymization. It is the process TokenEx has used for over a decade to protect the private data of clients worldwide, without a single breach or exposure, and it can be used to help satisfy many of the compliance requirements of the GDPR.
Anonymization vs. Deanonymization
One method for compliance is desensitizing the data in question. In order to desensitize or de-identify information, companies commonly choose to employ anonymization or pseudonymization. Of the two, only anonymization removes data from the scope of the GDPR altogether: the regulation explicitly states that its data-protection principles do not apply to anonymous information, defined as “information which does not relate to an identified or identifiable natural person or to personal data rendered anonymous in such a manner that the data subject is not or no longer identifiable.”
However, fully anonymizing a data set is a difficult task, and once it’s done, the anonymous data is not meant to be restored to its original, identifiable form. That renders it useless for almost anything but very high-level data aggregation and analysis. Because the data’s business utility was likely the reason your organization was processing it in the first place, this isn’t a terribly attractive solution.
Although it is not impossible to deanonymize anonymized data, doing so requires extensive data-mining efforts to recover enough information to make cross-referencing feasible, which defeats the purpose of anonymizing the data to begin with. An alternative that “cleanses” sensitive data while still maintaining its valuable business intelligence purposes is pseudonymization, as defined in Article 4(5) of the GDPR and quoted in full above.
In other words, pseudonymization is the process of replacing identifying or sensitive data with a pseudonym. TokenEx’s Cloud Security Platform does just that via tokenization–the process of replacing sensitive data with a nonsensitive token. TokenEx’s cloud tokenization successfully pseudonymizes data while outsourcing the risk and security concerns of internal data storage, and many of our existing customers already use this technology to comply with Payment Card Industry Data Security Standard requirements.
TokenEx’s Cloud Security Platform can help your organization comply with the GDPR and maintain the business utility of your data by pseudonymizing sensitive information at the point where it enters your system. Our flexible technologies and methodologies make tokenizing and encrypting work with any acceptance channel your organization uses. For more information, contact us at firstname.lastname@example.org.