[GDPR Step 4] How to Identify Critical Datasets and Critical Data Elements
The General Data Protection Regulation (GDPR), which went into effect on May 25, 2018, aims to create better data protection policies and holds the organizations that handle personal data more accountable than before. This means organizations must now focus on data governance. To achieve this, a clear understanding of personal data and how it is stored, used, and protected is required.
Talend recently hosted an on-demand webinar, Practical Steps to GDPR Compliance, that focuses on a comprehensive 16-step plan to operationalize a data governance program that supports GDPR compliance.
Identifying critical datasets and critical data elements (CDEs) is Step 4 in this plan. The first three steps are establishing policies, standards, and controls; creating a data taxonomy; and assigning data ownership.
Why Identifying Critical Data Elements is Important for GDPR
The following articles of the GDPR put data elements in the spotlight:
Article 4 of GDPR defines personal data as “any information relating to an identified or identifiable natural person (‘data subject’) … such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person.”
Article 9 of GDPR prohibits the processing of special categories of personal data, such as data revealing racial or ethnic origin or political opinions, unless a specific exception applies.
To comply with these articles of the GDPR, it is necessary to identify such personal data as CDEs and then take relevant steps to protect them.
For example, suppose an organization collects sensor data from a motor vehicle or tracks its coordinates. If the vehicle remains stationary overnight, it is reasonable to infer that the overnight coordinates correspond to the vehicle owner's home address, which in turn makes it easier to identify the person. Hence, sensor data or coordinates can become CDEs, as they can indirectly reveal the data subject's personal data.
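To make the inference concrete, here is a minimal Python sketch of how overnight pings could be reduced to a likely home location. The function name, the night-time window (22:00 to 05:00), and the rounding precision are all assumptions chosen for illustration, and the sample coordinates are made up; this is not how any particular product performs the analysis.

```python
from collections import Counter
from datetime import datetime


def infer_overnight_location(pings, night_start=22, night_end=5, precision=3):
    """Return the most frequent rounded (lat, lon) among night-time pings.

    pings: iterable of (timestamp, latitude, longitude) tuples.
    Returns None when no ping falls inside the assumed night window.
    """
    cells = Counter()
    for timestamp, lat, lon in pings:
        # The night window wraps around midnight, hence the "or".
        if timestamp.hour >= night_start or timestamp.hour < night_end:
            cells[(round(lat, precision), round(lon, precision))] += 1
    if not cells:
        return None
    return cells.most_common(1)[0][0]


# Made-up sample data: three overnight pings near one spot, one daytime ping elsewhere.
pings = [
    (datetime(2018, 5, 25, 23, 10), 48.85661, 2.35222),
    (datetime(2018, 5, 26, 2, 45), 48.85659, 2.35219),
    (datetime(2018, 5, 26, 13, 0), 48.87000, 2.33000),
    (datetime(2018, 5, 26, 23, 30), 48.85663, 2.35224),
]
likely_home = infer_overnight_location(pings)
print(likely_home)
```

Even this simple grouping is enough to turn raw coordinates into a probable address, which is why such fields are candidates for CDE status.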
Identifying CDEs highlights that an organization is dedicated to ensuring personal data is not compromised.
How to Identify Critical Data Elements Using Talend
Data stewards have an important role in this process. They should prioritize their efforts by identifying critical datasets and CDEs within their respective data categories. For example, employee identity consists of a number of CDEs, including name, gender, date of birth, and national ID. Employee social media information consists of a number of critical datasets, such as Facebook, Twitter, and LinkedIn profile information.
The data governance team needs to determine whether standards for data collection and data use are best set at the level of critical datasets, rather than for individual CDEs. For example, acceptable use and security standards may be better managed for overall Facebook information (critical dataset) rather than for Facebook ID (CDE).
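One way to picture this choice is a small data model in which a standard is attached to the critical dataset and only overridden for an individual CDE when needed. The sketch below is illustrative only: the class name, field names, and the wording of the standard are hypothetical and do not come from Talend or from the GDPR.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class CriticalDataset:
    """A critical dataset and the CDEs it contains (hypothetical model)."""
    name: str
    elements: List[str]
    standard: Optional[str] = None  # standard set at dataset level
    element_standards: Dict[str, str] = field(default_factory=dict)  # per-CDE overrides

    def standard_for(self, element: str) -> Optional[str]:
        """Return the CDE-specific standard if one exists, else the dataset-level one."""
        if element not in self.elements:
            raise KeyError(f"{element!r} is not part of {self.name!r}")
        return self.element_standards.get(element, self.standard)


# Hypothetical example: one acceptable-use standard covers all Facebook information.
facebook = CriticalDataset(
    name="Facebook profile information",
    elements=["facebook_id", "display_name"],
    standard="Acceptable use: documented purpose only (illustrative wording)",
)
print(facebook.standard_for("facebook_id"))
```

With this shape, the Facebook ID inherits the dataset-level standard automatically, so the governance team maintains one rule instead of one per field.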
Talend Metadata Manager supports an ISO 11179 business glossary that contains personal data-related business terms. For example, it may contain an inventory of business terms for customer identity such as name, email address, and phone number. It also defines the semantics of the critical data elements using predefined semantic types (such as e-mail, first name, last name, and IBAN) so that footprints of those critical data elements can be captured automatically across datasets. This means that Metadata Manager can act not only as a business glossary but also as the single point of entry for capturing personal data footprints across datasets.
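The idea of predefined semantics can be illustrated with a generic pattern-matching sketch in Python. This is not Talend Metadata Manager's API or its detection logic; the patterns, the 80% threshold, and all names below are simplifying assumptions (the IBAN pattern checks only the overall shape, not the checksum).

```python
import re

# Simplified, assumed patterns for two semantic types.
SEMANTIC_PATTERNS = {
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "iban": re.compile(r"[A-Z]{2}\d{2}[A-Z0-9]{11,30}"),
}


def detect_semantic_type(values, threshold=0.8):
    """Return the first semantic type matched by at least `threshold` of non-empty values."""
    cleaned = [v.strip() for v in values if v and v.strip()]
    if not cleaned:
        return None
    for semantic_type, pattern in SEMANTIC_PATTERNS.items():
        hits = sum(1 for v in cleaned if pattern.fullmatch(v))
        if hits / len(cleaned) >= threshold:
            return semantic_type
    return None


# Made-up sample columns from a hypothetical dataset.
columns = {
    "contact": ["ann@example.com", "bob@example.org", ""],
    "account": ["DE89370400440532013000", "GB82WEST12345698765432"],
    "notes": ["call back", "n/a"],
}
footprint = {name: detect_semantic_type(vals) for name, vals in columns.items()}
print(footprint)
```

Columns that resolve to a semantic type are candidates for linking to the matching glossary term; columns that resolve to nothing still need human review before they are ruled out.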
Here are two approaches that Talend Metadata Manager supports to identify CDEs for GDPR:
- Top-down approach — Describing the enterprise data landscape as a whole, the tool supports the mapping of high-level data definitions to actual physical fields in source systems across the enterprise.
- Bottom-up approach — In this approach, physical data points are captured automatically and then linked to high-level GDPR data definitions as applicable. The physical fields will be based on technical metadata, harvested from source systems based on a rich variety of connectors from Talend Metadata Manager (see Figure 1).
Figure 1: Defining or retro-engineering data models and data elements with Talend Metadata Manager.
The broad range of connectors provides an accurate view of the data landscape, similar to a GPS navigator that can alert a driver when traffic conditions change.
The second, bottom-up approach is more popular in the big data era: because data comes from multiple sources, it becomes essential to profile and discover the data automatically before confirming whether it contains personal data and taking action for compliance accordingly.
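As a rough illustration of the bottom-up flow, the Python sketch below harvests technical metadata (table and column names) from an in-memory SQLite database and proposes links to glossary terms based on column-name hints. The glossary, the hints, and the table names are hypothetical, and SQLite stands in for whatever source systems a real connector would read; the output is a list of candidates for a data steward to confirm, not a final classification.

```python
import sqlite3

# Hypothetical glossary: business term -> column-name fragments that hint at it.
GLOSSARY_HINTS = {
    "Customer name": ("first_name", "last_name", "full_name"),
    "Email address": ("email", "e_mail"),
    "Phone number": ("phone", "mobile"),
}


def harvest_columns(conn):
    """Return (table, column) pairs read from the SQLite catalog."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    columns = []
    for table in tables:
        # PRAGMA table_info yields one row per column; index 1 is the column name.
        for row in conn.execute(f'PRAGMA table_info("{table}")'):
            columns.append((table, row[1]))
    return columns


def link_to_glossary(columns):
    """Map each glossary term to the physical fields whose names contain a hint."""
    links = {}
    for table, column in columns:
        for term, hints in GLOSSARY_HINTS.items():
            if any(hint in column.lower() for hint in hints):
                links.setdefault(term, []).append(f"{table}.{column}")
    return links


# Made-up source schema for demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE crm_contacts (id INTEGER, first_name TEXT, email TEXT)")
conn.execute("CREATE TABLE billing (invoice_id INTEGER, contact_phone TEXT, amount REAL)")
candidates = link_to_glossary(harvest_columns(conn))
print(candidates)
```

Name-based hints alone will miss poorly named fields, which is why profiling the actual values is a natural complement before anything is confirmed as personal data.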
Next Steps in Identifying Critical Data
Legal and compliance need to sign off on the processing of personal data during the design phase of a project. So, irrespective of which approach is followed, data governance must work with these teams to define “personal data” for the GDPR.
Identifying CDEs related to personal data is crucial to taking actions for GDPR controls, and Talend Metadata Manager can help. The next step involved in the 16-step plan is establishing data collection standards.