[GDPR Step 5] How to Establish Data Collection Standards

The General Data Protection Regulation (GDPR), introduced by the European Union, took effect on May 25, 2018. The regulation changes how organizations handle the personal data of data subjects, including customers, employees, and prospects. Organizations have to revamp their processes and systems to comply with its stringent data protection standards.

We recently hosted an on-demand webinar, Practical Steps to GDPR Compliance, that focuses on a comprehensive 16-step plan to operationalize a data governance program that supports GDPR compliance. Establishing Data Collection Standards is Step 5 in this plan. We have already covered the first four steps here: establishing policies, standards, and controls; creating a data taxonomy; assigning data ownership; and identifying critical data elements.

GDPR’s Perspective on Data Collection

Article 25 of the GDPR addresses data protection by design and by default. Here are three aspects of the article that impact data collection:

1. Data Minimization: Only Collect Required Data

The regulation requires organizations to implement appropriate technical and organizational measures, such as data minimization, to ensure that, by default, only the personal data necessary for each specific purpose of the processing are collected.

For example, a software company may use an online form to allow users to download software on a trial basis. In this scenario, it would be reasonable for the form to request the applicant’s name and email address. However, it would be inappropriate for the company to also request the applicant’s date of birth and national identifier.

Beyond collecting only as much data as needed, the GDPR also expects organizations to use and store that data only for as long as the processing requires it. When the data is no longer required, an automated mechanism should clean it up.
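The two ideas above, an allow-list at collection time and an automated clean-up afterwards, can be sketched in a few lines of Python. This is an illustrative sketch, not a Talend feature: the field names, the `minimize` and `purge_expired` helpers, and the 90-day retention period are all assumptions chosen for the trial-download example.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical allow-list: the only fields the trial-download form needs.
ALLOWED_FIELDS = {"name", "email"}

# Hypothetical retention period; the real value is a policy decision.
RETENTION = timedelta(days=90)


def minimize(submission: dict) -> dict:
    """Keep only allow-listed fields and drop everything else before storage."""
    return {k: v for k, v in submission.items() if k in ALLOWED_FIELDS}


def purge_expired(records: list, now: datetime) -> list:
    """Return only the records that are still inside the retention window."""
    return [r for r in records if now - r["collected_at"] <= RETENTION]


form = {
    "name": "Ada",
    "email": "ada@example.com",
    "date_of_birth": "1990-01-01",  # not needed for a trial download
    "national_id": "X123",          # not needed for a trial download
}
stored = minimize(form)

now = datetime(2018, 5, 25, tzinfo=timezone.utc)
records = [
    {"email": "old@example.com", "collected_at": now - timedelta(days=120)},
    {"email": "new@example.com", "collected_at": now - timedelta(days=10)},
]
kept = purge_expired(records, now)
```

Running the purge on a schedule, rather than on demand, is what makes the clean-up automated; the scheduling itself is outside this sketch.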

2. Consent: Collect Explicit User Permission

The GDPR requires that consent to use personal data be collected explicitly from the data subject. Consent can no longer be ambiguous or assumed by default. For example, organizations cannot presume consent and then offer customers an opt-out. Instead, they must offer an explicit opt-in.

Also, whenever the way an organization processes personal data changes, fresh consent has to be obtained. Similarly, if special category data such as race, ethnic origin, sexual orientation, or political opinions is processed, a separate consent (in addition to any already collected for basic personal data) must be obtained.
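As a rough illustration, the consent rules described in this section can be expressed as a single check that runs before any processing. The `Consent` record, its fields, and the `may_process` function are hypothetical names for this sketch; a real system would also store when and how the consent was given.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Special category fields named in the text (illustrative, not exhaustive).
SPECIAL_CATEGORIES = {"race", "ethnic_origin", "sexual_orientation", "political_opinions"}


@dataclass(frozen=True)
class Consent:
    purpose: str                 # e.g. "email_marketing"
    policy_version: int          # version of the processing description the user agreed to
    opted_in: bool = False       # defaults to False: consent is never pre-ticked
    covers_special_categories: bool = False  # separate, explicit consent


def may_process(consent: Optional[Consent], purpose: str,
                current_version: int, fields: Iterable[str]) -> bool:
    """Allow processing only with a matching, current, explicit opt-in."""
    if consent is None or not consent.opted_in:
        return False  # no opt-in means no consent
    if consent.purpose != purpose:
        return False  # consent was given for a different purpose
    if consent.policy_version != current_version:
        return False  # processing changed, so fresh consent is needed
    if SPECIAL_CATEGORIES & set(fields) and not consent.covers_special_categories:
        return False  # special category data needs its own consent
    return True


basic = Consent(purpose="email_marketing", policy_version=2, opted_in=True)
allowed = may_process(basic, "email_marketing", 2, ["name", "email"])
stale = may_process(basic, "email_marketing", 3, ["name", "email"])
sensitive = may_process(basic, "email_marketing", 2, ["email", "political_opinions"])
```

Here `allowed` is `True`, while `stale` and `sensitive` are both `False`: the first because the processing description has moved to version 3, the second because no separate consent covers the special category field.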

3. Data Protection

During the data collection stage, there have to be sufficient measures, such as pseudonymization (replacing identifiable fields within a data record with pseudonyms), to protect personal data. For example, part of a zip code can be masked to protect a person’s specific address, while just enough digits remain visible to indicate that it is a US zip code.
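A minimal sketch of both techniques follows: partial masking for the zip code example, and a keyed hash as one common way to produce a stable pseudonym. The function names, the three visible digits, and the key are assumptions for illustration; in practice the key would live in a secrets store, separate from the data.

```python
import hashlib
import hmac


def mask_zip(zip_code: str, visible: int = 3) -> str:
    """Show the first `visible` characters and mask the rest."""
    return zip_code[:visible] + "*" * (len(zip_code) - visible)


def pseudonymize(value: str, key: bytes) -> str:
    """Replace a value with a keyed SHA-256 digest (HMAC).

    The same value and key always give the same pseudonym, so records can
    still be joined, but the original cannot be read back from the digest.
    """
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


SECRET_KEY = b"example-only-key"  # hypothetical; never hard-code a real key

masked = mask_zip("94107")  # → "941**"
token = pseudonymize("jim.smith@example.com", SECRET_KEY)
```

Masking loses information permanently, whereas a keyed pseudonym can be re-derived by anyone holding the key, which is why the key has to be protected as carefully as the data itself.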

How to Use Talend for Data Collection

Data collection and the corresponding operationalization of GDPR controls is one of Talend’s core functions. Talend Master Data Manager (MDM), Talend Big Data, and Talend Data Quality support the creation of an enterprise repository, where all information related to a data subject, including personal data and consents, is brought together. This GDPR data lake can then be used to reconcile data.

For example, these Talend tools may be used to discern that “James Smith” and “Jim Smith” are really the same person, although only the former has provided an email opt-in.
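To make the reconciliation idea concrete, here is a deliberately simple sketch of nickname-aware name matching. It does not represent how Talend's matching works; the `NICKNAMES` table and both helper functions are hypothetical, and a production match would also compare other attributes such as email or address before merging records.

```python
# Hypothetical nickname table mapping short forms to a canonical first name.
NICKNAMES = {"jim": "james", "jimmy": "james", "bill": "william"}


def match_key(full_name: str) -> str:
    """Normalize case and spacing, and map a known nickname to its canonical form."""
    parts = full_name.lower().split()
    if not parts:
        return ""
    first, rest = parts[0], parts[1:]
    return " ".join([NICKNAMES.get(first, first), *rest])


def same_person(a: str, b: str) -> bool:
    """Treat two names as a candidate match when their keys are equal."""
    return match_key(a) == match_key(b)


candidate = same_person("James Smith", "Jim Smith")   # True
different = same_person("James Smith", "John Smith")  # False
```

A name-only match should be treated as a candidate for review rather than proof of identity, especially when, as in the example above, only one of the two records carries an opt-in.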

The GDPR data lake can also manage a single inventory of all consents, including those for email campaigns, cookie consent, and phone contracts. This repository maintains a list of all consents for a single data subject across all applications within the enterprise. It provides an audit trail and record-level lineage of how and when consent was obtained for a specific data subject for a given application.
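One way to picture such an inventory is an append-only log of consent events, from which the current state is derived. The sketch below is an assumption about shape only: the `ConsentInventory` class, its method names, and the event fields are hypothetical and not a Talend API.

```python
from datetime import datetime, timezone


class ConsentInventory:
    """Append-only log of consent events, keyed by data subject."""

    def __init__(self):
        self._events = {}

    def record(self, subject_id, application, purpose, granted, source, at):
        """Append one event; nothing is overwritten, so history is preserved."""
        self._events.setdefault(subject_id, []).append({
            "application": application,  # which system collected the consent
            "purpose": purpose,          # e.g. "email_campaign", "cookies"
            "granted": granted,          # True for opt-in, False for withdrawal
            "source": source,            # how it was obtained, e.g. "web_form"
            "at": at,                    # when it was obtained
        })

    def current(self, subject_id):
        """Latest decision per (application, purpose) for one subject."""
        state = {}
        for event in sorted(self._events.get(subject_id, []), key=lambda e: e["at"]):
            state[(event["application"], event["purpose"])] = event["granted"]
        return state

    def audit_trail(self, subject_id):
        """Full history of how and when consent was given or withdrawn."""
        return list(self._events.get(subject_id, []))


inventory = ConsentInventory()
t1 = datetime(2018, 5, 25, tzinfo=timezone.utc)
t2 = datetime(2018, 6, 25, tzinfo=timezone.utc)
inventory.record("subject-1", "crm", "email_campaign", True, "web_form", t1)
inventory.record("subject-1", "website", "cookies", True, "cookie_banner", t1)
inventory.record("subject-1", "crm", "email_campaign", False, "unsubscribe_link", t2)

state = inventory.current("subject-1")
history = inventory.audit_trail("subject-1")
```

After these three events, `state` shows the email campaign consent as withdrawn and the cookie consent as still granted, while `history` keeps all three entries for audit purposes.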

Once all the information has been reconciled into the GDPR data lake, the data governance team can provide services such as data portability and the right to be forgotten.
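With records reconciled under one subject identifier, both services reduce to simple operations over that identifier. The following sketch assumes an in-memory list of records with a hypothetical `subject_id` field; the function names are illustrative, and a real erasure would also have to reach backups and downstream systems.

```python
import json


def export_subject(store: list, subject_id: str) -> str:
    """Data portability: return everything held about one subject as JSON."""
    records = [r for r in store if r["subject_id"] == subject_id]
    return json.dumps(records, indent=2, sort_keys=True)


def erase_subject(store: list, subject_id: str) -> list:
    """Right to be forgotten: return the store without that subject's records."""
    return [r for r in store if r["subject_id"] != subject_id]


store = [
    {"subject_id": "s1", "name": "James Smith", "email": "james@example.com"},
    {"subject_id": "s2", "name": "Ada Park", "email": "ada@example.com"},
]
exported = export_subject(store, "s1")
remaining = erase_subject(store, "s1")
```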

The data lake concept is effective for GDPR, especially in the big data era, where data comes from everywhere rather than from centrally designed and governed systems. In these environments, incoming data might not be fully structured and documented. That means there is a need to capture personal data from raw datasets, and then to crowdsource the rules for protecting this data from data specialists, who need to process it for their machine learning models.

Next Steps to GDPR Compliance

Establishing standards for data collection is an important step, given the GDPR’s focus on data minimization and consents. These standards have to be implemented from both technology and process perspectives. The data governance team must establish controls so that legal and compliance sign off on data collection for any new project during the design phase.

Just as standards are set for data collection, rules must also be agreed upon for how this data is used. This is discussed in the next step of our 16-step plan to operationalize a data governance framework for GDPR compliance.

To see all 16 steps together, don’t miss the on-demand webinar, Practical Steps to GDPR Compliance. The video covers this information, as well as developing standards and controls, identifying data owners and critical data elements, and more.
