[GDPR Step 07] How to Establish Data Masking Standards

The General Data Protection Regulation (GDPR), introduced by the European Union, took effect on May 25, 2018. With the introduction of GDPR, organizations have an increased responsibility to protect personal data of subjects such as customers, employees, and prospects. This includes anonymizing data for controlled privacy protection.

We recently hosted an on-demand webinar, Practical Steps to GDPR Compliance, that focuses on a comprehensive 16-step plan to operationalize a data governance program that supports GDPR compliance.

Establishing data masking standards is Step 7 in this plan. To learn more about the first six steps, check out the links in the sidebar.

What Does the GDPR Say About Data Masking?

The GDPR recognizes the fact that while some personal data can directly reveal the identity of a person, other data may be processed or stored in a way that maintains confidentiality.

Minimizing Obligations for Anonymous Data

Recital 26 of the GDPR states that the principles of data protection should apply to any information “concerning an identified or identifiable natural person.” Hence, the principles do not apply to anonymous information, or to personal data rendered anonymous in such a way that the data subject is no longer identifiable.

Article 11 of the GDPR addresses processing that does not require identification. If a controller (an entity that determines the purposes and means of processing personal data) does not need the identity of the data subject, the obligations of the controller under the GDPR are significantly reduced.

Effectively, Recital 26 and Article 11 of the GDPR acknowledge the presence of anonymous information and release organizations from tighter security obligations in cases where personal data remain concealed.

Masking Personal Data for Security

Article 32 of the GDPR deals with the security of processing. It requires organizations to implement appropriate technical and organizational measures, such as pseudonymization and encryption of personal data, to ensure a level of security appropriate to the risk.
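As a rough illustration of pseudonymization, the sketch below replaces a direct identifier with a keyed hash (HMAC-SHA256) so that the same input always maps to the same token, while the original value cannot be read back without the key. The key value, field names, and the 16-character truncation are assumptions made for the example; this is one possible approach, not a method prescribed by the GDPR or by Talend.

```python
import hashlib
import hmac

# Hypothetical key for illustration only; a real key would be generated
# securely and stored separately from the pseudonymized data.
SECRET_KEY = b"replace-with-a-securely-stored-key"


def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Return a stable pseudonym for `value` using HMAC-SHA256."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    # Truncated to 16 hex characters for readability (an illustrative choice).
    return digest[:16]


# Made-up sample records.
records = [
    {"employee_name": "Alice Example", "department": "HR"},
    {"employee_name": "Bob Example", "department": "Finance"},
    {"employee_name": "Alice Example", "department": "HR"},
]

# Replace the name with its pseudonym; other fields are left untouched.
pseudonymized = [
    {**record, "employee_name": pseudonymize(record["employee_name"])}
    for record in records
]
```

Because the mapping is stable, records belonging to the same person can still be joined or counted. Pseudonymized data remains personal data under the GDPR, since whoever holds the key can re-link it to the individual.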

The data governance office must establish controls to appropriately mask or encrypt sensitive personal data. The data masking standards need to ensure that data cannot be reconstructed when multiple fields are combined.

For example, data scientists may request that the employee name field be masked prior to analytics. However, an experienced data scientist who is familiar with the organization may still be able to discern the identity of an employee by looking at title, compensation, and gender (e.g., “a female Director of HR with a base salary of $200,000”). In this situation, it may be more appropriate to also mask job title and to provide a salary band, such as “above $100,000.”
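The employee example can be sketched in a few lines of Python. The code drops the name and job title, replaces the exact salary with a band, and then counts how many records share each remaining combination of values; a smallest group of 1 means at least one person is still unique. The field names, the sample rows, and the single $100,000 threshold are assumptions made for illustration.

```python
from collections import Counter


def salary_band(salary: int) -> str:
    """Replace an exact salary with a coarse band (illustrative threshold)."""
    return "above $100,000" if salary > 100_000 else "$100,000 or below"


def generalize(record: dict) -> dict:
    """Drop name and job title; keep gender and a salary band only."""
    return {
        "gender": record["gender"],
        "salary_band": salary_band(record["base_salary"]),
    }


def smallest_group(rows: list[dict], keys: tuple[str, ...]) -> int:
    """Size of the smallest set of rows sharing the same values for `keys`."""
    counts = Counter(tuple(row[k] for k in keys) for row in rows)
    return min(counts.values())


# Made-up sample data.
employees = [
    {"name": "A. Example", "title": "Director of HR", "gender": "F", "base_salary": 200_000},
    {"name": "B. Example", "title": "VP of Sales", "gender": "F", "base_salary": 180_000},
    {"name": "C. Example", "title": "Engineer", "gender": "M", "base_salary": 90_000},
    {"name": "D. Example", "title": "Analyst", "gender": "M", "base_salary": 85_000},
]

# Masking only the name leaves every row unique on the remaining fields.
name_masked_only = [{k: v for k, v in e.items() if k != "name"} for e in employees]
generalized = [generalize(e) for e in employees]

print(smallest_group(name_masked_only, ("title", "gender", "base_salary")))  # 1
print(smallest_group(generalized, ("gender", "salary_band")))  # 2
```

A masking standard could set a minimum acceptable group size and require further generalization whenever a dataset falls below it.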

Using Talend for Data Masking

Talend Data Quality provides data masking and data shuffling as core components that can be enforced at any step of a data pipeline (see Figure 1).

Data shuffling is a type of data masking that randomly shuffles a column (or a more complex dataset, such as a group of columns or a partition) so that the identity behind each record is hidden while the original values stay in the dataset. In this way, privacy is preserved, but analytics and data testing can still take place using the original data values.
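To make the idea of shuffling concrete, here is a minimal Python sketch that permutes the values of one column across rows. It illustrates the general technique only and is not Talend's implementation; the function name `shuffle_column` and the sample data are hypothetical.

```python
import random


def shuffle_column(rows, column, seed=None):
    """Return new rows with the values of `column` randomly permuted.

    The set of values in the column is unchanged, but the link between
    each value and the rest of its row is broken.
    """
    rng = random.Random(seed)
    values = [row[column] for row in rows]
    rng.shuffle(values)
    # Build new dicts so the input rows are not modified.
    return [{**row, column: value} for row, value in zip(rows, values)]


# Made-up sample data.
customers = [
    {"customer_id": 1, "name": "Alice Example", "order_total": 120.0},
    {"customer_id": 2, "name": "Bob Example", "order_total": 75.5},
    {"customer_id": 3, "name": "Carol Example", "order_total": 310.0},
    {"customer_id": 4, "name": "Dan Example", "order_total": 42.0},
]

shuffled = shuffle_column(customers, "name", seed=7)
```

Column-level statistics such as distinct counts or value distributions are unaffected, which is why analytics and testing can still use the original values. A random permutation can leave some values in their original row, especially in small datasets, so shuffling alone may not be enough for small or highly distinctive groups.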

Figure 1: Data masking and shuffling can be applied to batch and real-time data streams, through preconfigured or customized functions.

Through Talend Data Preparation, data masking can also be enforced in an ad-hoc manner, allowing line-of-business users to protect sensitive data before sharing it with colleagues. For example, a marketing campaign manager, who wants to report on the success of a campaign with a partner, can share the dataset for analytics after anonymizing the data that could inappropriately reveal privacy-related information (Figure 2).

Figure 2: Self-service data masking for business users in Talend Data Preparation.

This is made possible because the data masking tool is semantically aware and preconfigured for typical PII (personally identifiable information) such as email addresses and phone numbers. In the case of an email address, the tool may hide the first part (before the “@”) but still display the domain name.
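The email behavior described above can be approximated with a short Python function. This is a sketch of the general idea rather than Talend's actual masking function; the name `mask_email`, the fixed-width mask, and the handling of malformed input are assumptions.

```python
MASK = "*****"  # Fixed width so the length of the hidden part is not revealed.


def mask_email(email: str) -> str:
    """Hide the part before the last '@' and keep the domain visible."""
    local, sep, domain = email.rpartition("@")
    if not sep or not local or not domain:
        # Not a recognizable address: mask the whole value to be safe.
        return MASK
    return MASK + "@" + domain


print(mask_email("jane.doe@example.com"))  # *****@example.com
print(mask_email("not-an-email"))          # *****
```

Keeping the domain lets a recipient still group results by company or mail provider, while the individual mailbox stays hidden.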

Next Steps to GDPR Compliance

In the past, data masking was a specialized discipline, used only by niche applications. However, the GDPR has changed that by making data masking a part of all kinds of applications. This also means that the feature has to be simple for both developers and business users who are not data masking experts. This is exactly how we’ve engineered Talend’s GDPR solutions.

By executing Steps 1 to 7 of the comprehensive 16-step plan, the foundation should be in place for achieving GDPR compliance. Organizations should, by now, have an understanding of their data and a set of well-defined standards on how to collect, process, and protect data. The next step is to conduct data protection impact assessments.
