[GDPR Step 2] The Importance of Creating Data Taxonomy
The General Data Protection Regulation (GDPR), introduced by the European Union, took effect on May 25, 2018. Under the GDPR, organizations face intense scrutiny over how they handle the personal data of customers, employees, and prospects. Non-compliance can lead to heavy fines, so it is imperative for organizations to understand the data that they store and take steps to protect it.
Take a look at the first step of the plan, “How to Develop Policies, Standards, and Controls.” Creating a data taxonomy is the second step.
What is Data Taxonomy?
Data taxonomy is the classification of data into categories and sub-categories. It provides a unified view of the data in an organization and introduces common terminologies and semantics across multiple systems. Establishing a hierarchy within a set of metadata and segregating it into categories creates a better understanding of the relationships between data points.
To establish GDPR compliance, the data governance team of an organization needs to collaborate with enterprise data architecture to classify data.
For example, the data taxonomy may include employee information as a level 1 category. Employee information may then be further classified into multiple, level 2 categories, such as salary and benefits, identity, contacts, protected health information, social media, and employee performance.
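The example above can be sketched as a small data structure. This is a minimal illustration, assuming a two-level hierarchy held in a plain dictionary; the names `TAXONOMY`, `level1_of`, and `paths` are hypothetical and not part of any Talend product.

```python
# Hypothetical two-level taxonomy mirroring the employee example in the text.
TAXONOMY = {
    "Employee information": [
        "Salary and benefits",
        "Identity",
        "Contacts",
        "Protected health information",
        "Social media",
        "Employee performance",
    ],
}


def level1_of(level2_name, taxonomy=TAXONOMY):
    """Return the level 1 category that contains a level 2 category, or None."""
    for level1, children in taxonomy.items():
        if level2_name in children:
            return level1
    return None


def paths(taxonomy=TAXONOMY):
    """List every level 2 category as a 'level 1 / level 2' path."""
    return [
        f"{level1} / {level2}"
        for level1, children in taxonomy.items()
        for level2 in children
    ]
```

Calling `level1_of("Identity")` returns the parent category, while `paths()` gives a flat listing that could be shared with data owners for review.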
Why is Data Taxonomy Critical for GDPR?
The GDPR requires that organizations protect the privacy of personal data, share how they store and use this data, and take responsibility for the privacy and protection of that data. Creation of a data taxonomy becomes essential in this context. Here, we look at a few benefits gained by creating a taxonomy:
- Fundamental understanding of data: Many existing data elements may not comply with the GDPR and may need to be fixed. A taxonomy helps uncover such issues by providing a basic understanding of what the data is and where it comes from (its lineage).
- Data access: The GDPR gives data subjects the right to access their personal data, including in an electronic format. Categorizing data speeds up retrieval, because a keyword search can be extended automatically to other, closely related terms.
- Risk analysis: Classifying data helps determine where the risk of non-compliance lies. The process identifies data that falls into highly sensitive categories, which call for stronger safeguards such as pseudonymization or anonymization. Data that contains no personal information can be given lower priority in compliance analysis, saving time and effort.
- Reduce unwanted data: The GDPR's data minimization principle requires collecting and storing only as much personal data as is necessary. A taxonomy helps identify and remove existing ROT (redundant, obsolete, or trivial) data, which decreases the risk of storing non-compliant personal data.
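Two of the benefits listed here, search expansion and risk analysis, can be sketched in a few lines. This is an illustration only: the `CATEGORIES` table, the sensitivity labels, and both function names are assumptions made for this sketch, and the labels are not a legal classification under the GDPR.

```python
# Hypothetical level 2 categories with an illustrative sensitivity label.
# The labels are assumptions for this sketch, not legal guidance.
CATEGORIES = {
    "Salary and benefits": {"parent": "Employee information", "sensitivity": "high"},
    "Identity": {"parent": "Employee information", "sensitivity": "high"},
    "Contacts": {"parent": "Employee information", "sensitivity": "standard"},
    "Protected health information": {
        "parent": "Employee information",
        "sensitivity": "high",
    },
}


def related_terms(term, categories=CATEGORIES):
    """Expand a search term to every category that shares its parent."""
    if term not in categories:
        return [term]  # unknown terms are searched as-is
    parent = categories[term]["parent"]
    return sorted(
        name for name, info in categories.items() if info["parent"] == parent
    )


def high_sensitivity(categories=CATEGORIES):
    """Return the categories flagged for stronger safeguards."""
    return sorted(
        name for name, info in categories.items() if info["sensitivity"] == "high"
    )
```

A search for "Identity" would then also cover its sibling categories, and `high_sensitivity()` gives a starting list for prioritizing compliance work.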
Using Talend Tools for Automatic Taxonomy Generation
In Talend Metadata Manager, a business glossary can be used to define a collection of terms and link them to categories and subcategories, as shown in Figure 1. A business glossary can be built by dragging in an existing, well-documented data model, by importing terms and definitions from other sources (e.g., CSV, Microsoft Excel), or by authoring terms interactively in the user interface while classifying objects.
Once published, the glossary can be accessed through a search-based interface by anyone who has proper authorizations.
Figure 1: Talend Metadata Manager Business Glossary
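To make the idea of importing terms from a CSV file concrete, here is a minimal sketch using only the Python standard library. The column layout (`term`, `definition`, `category`, `subcategory`) and the sample rows are assumptions for illustration; this is not Talend Metadata Manager's import format or API.

```python
import csv
import io
from collections import defaultdict

# Hypothetical glossary export; the column names are assumptions.
SAMPLE_CSV = """term,definition,category,subcategory
Base salary,Fixed annual pay before bonuses,Employee information,Salary and benefits
Work email,Corporate email address,Employee information,Contacts
"""


def load_glossary(text):
    """Group glossary terms by category, then by subcategory."""
    glossary = defaultdict(lambda: defaultdict(list))
    for row in csv.DictReader(io.StringIO(text)):
        glossary[row["category"]][row["subcategory"]].append(
            {"term": row["term"], "definition": row["definition"]}
        )
    return glossary


glossary = load_glossary(SAMPLE_CSV)
```

The nested result groups each term under its category and subcategory, which is the same linkage the business glossary expresses.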
Next Steps in Creating Data Taxonomy
Creation of a taxonomy is just the beginning. After this broad classification, it is important to assign data owners and map these high-level categories down to actual data points in different IT systems. It then becomes easier to run profiling and cleansing jobs on these data points so that the data is trustworthy and GDPR compliant.
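The mapping step described above might look like the following sketch. Every system, table, column, and owner name here is hypothetical; the point is only to show how categories, owners, and physical data points could be tied together so that gaps become visible.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataPoint:
    """A physical location of data in an IT system (all names hypothetical)."""
    system: str
    table: str
    column: str


# Hypothetical owner assignments per taxonomy category.
OWNERS = {
    "Identity": "HR data steward",
    "Contacts": "HR data steward",
}

# Hypothetical mapping from taxonomy categories to actual data points.
MAPPING = {
    "Identity": [
        DataPoint("hr_db", "employees", "national_id"),
        DataPoint("payroll", "staff", "passport_no"),
    ],
    "Contacts": [DataPoint("hr_db", "employees", "email")],
    "Salary and benefits": [DataPoint("payroll", "staff", "base_salary")],
}


def unowned_categories(mapping=MAPPING, owners=OWNERS):
    """Return mapped categories that still have no assigned data owner."""
    return sorted(category for category in mapping if category not in owners)


def profiling_targets(category, mapping=MAPPING):
    """Return fully qualified column names to profile for a category."""
    return [f"{p.system}.{p.table}.{p.column}" for p in mapping.get(category, [])]
```

Here `unowned_categories()` highlights categories that still need an owner, and `profiling_targets("Identity")` lists the columns a profiling or cleansing job would need to cover.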