[GDPR Step 07] How to Establish Data Masking Standards
The General Data Protection Regulation (GDPR), introduced by the European Union, took effect on May 25, 2018. With the introduction of GDPR, organizations have an increased responsibility to protect personal data of subjects such as customers, employees, and prospects. This includes anonymizing data for controlled privacy protection.
Establishing data masking standards is Step 7 in the comprehensive 16-step plan for GDPR compliance. To learn more about the first six steps, check out the links in the sidebar!
What Does the GDPR Say About Data Masking?
The GDPR recognizes the fact that while some personal data can directly reveal the identity of a person, other data may be processed or stored in a way that maintains confidentiality.
Minimizing Obligations for Anonymous Data
Recital 26 of the GDPR states that the principles of data protection should apply to any information “concerning an identified or identifiable natural person.” Hence, the principles do not apply to anonymous information, or to personal data that has been rendered anonymous in such a way that the data subject is no longer identifiable.
Article 11 of the GDPR addresses processing that does not require identification. If a controller (an entity that determines the purposes and means of processing personal data) does not need to identify the data subject for its purposes, it is not obliged to keep or acquire additional identifying information solely to comply with the GDPR, and its obligations are reduced accordingly.
Effectively, Recital 26 and Article 11 of the GDPR acknowledge the existence of anonymous information and release organizations from tighter security obligations in cases where the identity of the data subject remains concealed.
Masking Personal Data for Security
Article 32 of the GDPR deals with the security of processing. It requires organizations to implement appropriate technical and organizational measures (e.g., pseudonymization and encryption of personal data) to ensure a level of security appropriate to the risk, which matters most for sensitive personal data.
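As a rough illustration of pseudonymization, the sketch below replaces an identifier with a keyed hash (HMAC-SHA256) using only the Python standard library. The function name `pseudonymize` and the sample key are hypothetical and chosen for this example; this is one common approach under stated assumptions, not the method the GDPR or Talend prescribes.

```python
import hashlib
import hmac


def pseudonymize(value: str, secret_key: bytes) -> str:
    """Return a deterministic, keyed token for `value` (HMAC-SHA256, hex-encoded)."""
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical key for illustration only; a real key would be stored separately
# from the pseudonymized data (for example, in a secrets manager).
key = b"example-key-not-for-production"

token_a = pseudonymize("jane.doe@example.com", key)
token_b = pseudonymize("jane.doe@example.com", key)

# The same input and key always give the same token, so joins across
# datasets still work without exposing the original identifier.
print(token_a == token_b)
```

Because the token can be linked back to a person by anyone holding the key, data treated this way is pseudonymized rather than anonymous, and it remains personal data.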
The data governance office must establish controls to appropriately mask or encrypt sensitive personal data. The data masking standards need to ensure that data cannot be reconstructed when multiple fields are combined.
For example, data scientists may request that the employee name field be masked prior to analytics. However, an experienced data scientist who is familiar with the organization may still be able to discern the identity of an employee by looking at title, compensation, and gender (e.g., “Director of HR who is a female with base salary of $200,000”). In this situation, it may be more appropriate to also mask job title and to provide a salary band, such as “above $100,000.”
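The employee example above can be sketched in a few lines of Python. All names here (`salary_band`, `mask_record`, `smallest_group`) and the sample records are hypothetical; the sketch masks name and title, replaces the salary with a band, and then counts how many records share each combination of the remaining fields, a simple check in the spirit of k-anonymity.

```python
from collections import Counter


def salary_band(salary: int) -> str:
    """Generalize an exact salary into a coarse band."""
    return "above $100,000" if salary > 100_000 else "$100,000 or below"


def mask_record(record: dict) -> dict:
    """Mask name and title, keep gender, and replace salary with a band."""
    return {
        "name": "***",
        "title": "***",
        "gender": record["gender"],
        "salary_band": salary_band(record["salary"]),
    }


def smallest_group(records: list[dict], fields: list[str]) -> int:
    """Size of the smallest group of records sharing the same values in `fields`.

    A result of 1 means at least one record is unique on those fields.
    Assumes `records` is non-empty.
    """
    counts = Counter(tuple(r[f] for f in fields) for r in records)
    return min(counts.values())


# Made-up sample data for illustration.
employees = [
    {"name": "A. Example", "title": "Director of HR", "gender": "F", "salary": 200_000},
    {"name": "B. Example", "title": "Engineer", "gender": "F", "salary": 120_000},
    {"name": "C. Example", "title": "Engineer", "gender": "M", "salary": 90_000},
    {"name": "D. Example", "title": "Analyst", "gender": "M", "salary": 85_000},
]

masked = [mask_record(e) for e in employees]
before = smallest_group(employees, ["title", "gender", "salary"])
after = smallest_group(masked, ["title", "gender", "salary_band"])
print(before, after)  # 1 2
```

With this sample, every unmasked record is unique on title, gender, and salary, while each masked record shares its visible values with one other record. A masking standard could set a minimum group size that a dataset must meet before it is released for analytics.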
Using Talend for Data Masking
Talend Data Quality provides data masking and data shuffling as core components that can be enforced at any step of a data pipeline (see Figure 1).
Data shuffling is a type of data masking that randomly shuffles the values of a column (or of a more complex dataset, such as a group of columns or a partition) so that the link between a value and the individual it belongs to is hidden, while the values themselves stay in the dataset. In this way, privacy is preserved, but analytics and data testing can still take place using the original data values.
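A minimal sketch of the idea in plain Python follows. It is not Talend's implementation; the function name `shuffle_columns` and the sample rows are assumptions made for illustration. The selected columns are shuffled together across rows, so related values (here, city and postal code) stay paired with each other but are detached from the rest of the row.

```python
import random


def shuffle_columns(rows: list[dict], columns: list[str], seed=None) -> list[dict]:
    """Return new rows in which the values of `columns` are shuffled across rows.

    The listed columns move as a group; all other columns are left untouched.
    """
    rng = random.Random(seed)  # a seed makes the shuffle repeatable for testing
    groups = [tuple(row[c] for c in columns) for row in rows]
    rng.shuffle(groups)
    return [{**row, **dict(zip(columns, group))} for row, group in zip(rows, groups)]


# Made-up sample data for illustration.
customers = [
    {"id": 1, "city": "Paris", "postal_code": "75001", "spend": 120},
    {"id": 2, "city": "Lyon", "postal_code": "69001", "spend": 80},
    {"id": 3, "city": "Nantes", "postal_code": "44000", "spend": 45},
]

shuffled = shuffle_columns(customers, ["city", "postal_code"], seed=7)

# The set of (city, postal_code) pairs is unchanged, so column-level
# statistics are preserved even though rows no longer line up.
print(sorted((r["city"], r["postal_code"]) for r in shuffled))
```

Note that a shuffle can leave some values in their original row by chance, especially on small datasets, and rare values remain visible in the column, so shuffling alone may not be sufficient for highly identifying fields.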
Figure 1: Data masking and shuffling can be applied to batch and real-time data streams, through preconfigured or customized functions.
Through Talend Data Preparation, data masking can also be enforced in an ad-hoc manner, allowing line-of-business users to protect sensitive data before sharing it with colleagues. For example, a marketing campaign manager, who wants to report on the success of a campaign with a partner, can share the dataset for analytics after anonymizing the data that could inappropriately reveal privacy-related information (Figure 2).
Figure 2: Self-service data masking for business users in Talend Data Preparation.
This is made possible because the data masking tool is semantic-aware and preconfigured for typical PII (personally identifiable information) such as email addresses and phone numbers. In the case of an email address, the tool may hide the first part (before the “@”) but still display the domain name.
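The email behavior described above can be approximated with a short Python function. This is a sketch of the general technique, not Talend's masking function; `mask_email` and its `mask_char` parameter are hypothetical names.

```python
def mask_email(email: str, mask_char: str = "*") -> str:
    """Hide the part before the last "@" and keep the domain visible."""
    local, sep, domain = email.rpartition("@")
    if not sep or not local:
        # Not a recognizable address: mask the whole value to be safe.
        return mask_char * len(email)
    return mask_char * len(local) + "@" + domain


print(mask_email("jane.doe@example.com"))  # ********@example.com
```

Keeping the domain allows analysis by email provider or company, which is often the part a campaign report needs. This sketch preserves the length of the hidden part; a stricter standard might replace it with a fixed-length mask so that length cannot be used as a hint.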
Next Steps to GDPR Compliance
In the past, data masking was a specialized discipline, used only in niche applications. However, the GDPR has changed that by making data masking a part of all kinds of applications. This also means that the feature has to be simple for both developers and business users who are not data masking experts. This is exactly how we’ve engineered Talend’s GDPR solutions.
By executing Steps 1 to 7 of the comprehensive 16-step plan, the foundation should be in place for achieving GDPR compliance. Organizations should, by now, have an understanding of their data and a set of well-defined standards on how to collect, process, and protect data. The next step is to conduct data protection impact assessments.