[GDPR Step 2] The Importance of Creating a Data Taxonomy

The General Data Protection Regulation (GDPR), introduced by the European Union, became enforceable on May 25, 2018. Under the GDPR, organizations face intense scrutiny over how they handle the personal data of customers, employees, and prospects. Non-compliance can lead to heavy fines, so organizations must understand the data they store and take steps to protect it.

We recently hosted a webinar, Practical Steps to GDPR Compliance (now available on demand), which walks through a comprehensive 16-step plan for operationalizing a data governance program in support of GDPR compliance.

Take a look at the first step of the plan, “How to Develop Policies, Standards, and Controls.” Creating a data taxonomy is the second step.

What Is a Data Taxonomy?

A data taxonomy is the classification of data into categories and subcategories. It provides a unified view of an organization's data and introduces common terminology and semantics across multiple systems. Establishing a hierarchy within a set of metadata and dividing it into categories gives a clearer picture of the relationships between data points.

To establish GDPR compliance, an organization's data governance team needs to work with the enterprise data architecture team to classify data.

For example, the data taxonomy may include employee information as a level 1 category. Employee information may then be further classified into multiple level 2 categories, such as salary and benefits, identity, contacts, protected health information, social media, and employee performance.
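
As a minimal sketch, the two-level example above can be represented as a mapping from each level 1 category to its level 2 categories. The `TAXONOMY` structure and the `parent_category` helper are hypothetical illustrations of the idea, not part of any Talend product.

```python
from typing import Dict, List, Optional

# Hypothetical two-level taxonomy: level 1 category -> its level 2 categories.
TAXONOMY: Dict[str, List[str]] = {
    "Employee Information": [
        "Salary and Benefits",
        "Identity",
        "Contacts",
        "Protected Health Information",
        "Social Media",
        "Employee Performance",
    ],
}


def parent_category(subcategory: str) -> Optional[str]:
    """Return the level 1 category containing a level 2 category, or None if unclassified."""
    for level1, level2_list in TAXONOMY.items():
        if subcategory in level2_list:
            return level1
    return None


if __name__ == "__main__":
    print(parent_category("Identity"))          # Employee Information
    print(parent_category("Purchase History"))  # None
```

A returned `None` is a useful signal in itself: it marks a data element that has not yet been placed in the taxonomy.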

Why Is a Data Taxonomy Critical for GDPR?

The GDPR requires organizations to protect the privacy of personal data, be transparent about how they store and use it, and take responsibility for its privacy and protection. In this context, creating a data taxonomy becomes essential. Here are a few of the benefits a taxonomy provides:

  • Fundamental understanding of data: Under the GDPR, many existing data elements may turn out to be non-compliant and need to be fixed. A taxonomy helps uncover such data quality issues by establishing what the data is and where it comes from (its lineage).
  • Data access: The GDPR gives data subjects the right to access their personal data, including in electronic form. Categorizing data speeds up retrieval, because a search for one keyword can automatically extend to closely related terms in the same category.
  • Risk analysis: Classifying data helps determine whether it poses a risk of non-compliance and identifies data that falls into highly sensitive categories. Such data may call for stronger safeguards, such as pseudonymization or anonymization. Lower-risk data can be deprioritized in the compliance analysis, saving time and effort.
  • Reduce unwanted data: The GDPR's data minimization principle calls for collecting and storing only as much personal data as is necessary. A taxonomy helps identify and remove existing ROT (redundant, obsolete, or trivial) data, which reduces the risk of holding non-compliant personal data.
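
To make the data access and risk analysis points more concrete, here is a minimal Python sketch. It assumes a glossary that files business terms under level 2 categories and a list of sensitive categories chosen by the data governance team; every term, category choice, and column name in it is hypothetical, and a real catalog would hold much richer metadata.

```python
from typing import Dict, List, Set

# Hypothetical glossary: each level 2 category lists the business terms filed under it.
GLOSSARY: Dict[str, List[str]] = {
    "Contacts": ["email address", "phone number", "home address"],
    "Protected Health Information": ["diagnosis", "sick leave", "medical claim"],
    "Salary and Benefits": ["base salary", "bonus", "pension plan"],
}

# Assumption: which categories count as highly sensitive is a governance policy
# decision; this set is an example only.
SENSITIVE_CATEGORIES: Set[str] = {"Protected Health Information"}


def expand_search(keyword: str) -> List[str]:
    """Return the keyword followed by the other terms filed under the same category."""
    for terms in GLOSSARY.values():
        if keyword in terms:
            return [keyword] + [t for t in terms if t != keyword]
    return [keyword]  # unclassified term: search for the keyword alone


def flag_sensitive(column_categories: Dict[str, str]) -> List[str]:
    """Given column name -> taxonomy category, return the columns in sensitive categories."""
    return sorted(
        column
        for column, category in column_categories.items()
        if category in SENSITIVE_CATEGORIES
    )


if __name__ == "__main__":
    print(expand_search("phone number"))
    # ['phone number', 'email address', 'home address']
    hr_columns = {
        "emp_email": "Contacts",
        "diag_code": "Protected Health Information",
        "base_pay": "Salary and Benefits",
    }
    print(flag_sensitive(hr_columns))  # ['diag_code']
```

Both functions depend only on the category assignments, which is the point: once data is classified, search expansion and sensitivity checks become simple lookups.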

Using Talend Tools for Automatic Taxonomy Generation

In Talend Metadata Manager, a business glossary can be used to define a collection of terms and link them to categories and subcategories, as shown in Figure 1. Building a business glossary can be as simple as dragging in an existing, well-documented data model, importing terms and definitions from other sources (e.g., CSV, Microsoft Excel), or authoring terms interactively in the user interface while classifying objects.
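
As a rough illustration of the CSV import route, the sketch below groups glossary terms by category and subcategory using Python's standard `csv` module. The column layout (`term`, `definition`, `category`, `subcategory`) and the `load_glossary` helper are assumptions for illustration only; this is not Talend Metadata Manager's import format or API.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical CSV layout; Talend Metadata Manager's actual import format may differ.
SAMPLE_CSV = """term,definition,category,subcategory
Base Salary,Fixed annual pay before bonuses,Employee Information,Salary and Benefits
Work Email,Company-issued email address,Employee Information,Contacts
Diagnosis Code,Medical diagnosis recorded for leave,Employee Information,Protected Health Information
"""


def load_glossary(text: str) -> Dict[Tuple[str, str], List[Dict[str, str]]]:
    """Group glossary rows under their (category, subcategory) pair."""
    glossary: Dict[Tuple[str, str], List[Dict[str, str]]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        key = (row["category"].strip(), row["subcategory"].strip())
        glossary[key].append(
            {"term": row["term"].strip(), "definition": row["definition"].strip()}
        )
    return dict(glossary)


if __name__ == "__main__":
    glossary = load_glossary(SAMPLE_CSV)
    for (category, subcategory), terms in glossary.items():
        print(category, ">", subcategory, ":", [t["term"] for t in terms])
```

Definitions that contain commas would need to be quoted in the CSV, which `csv.DictReader` handles under its default dialect.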

Once published, the glossary can be accessed through a search-based interface by anyone with the proper authorization.


Figure 1: Talend Metadata Manager Business Glossary

Next Steps After Creating a Data Taxonomy

Creating a taxonomy is just the beginning. After this broad classification, it is important to assign data owners and map the high-level categories down to actual data points in different IT systems. It then becomes easier to run profiling and cleansing jobs on these data points to make them trustworthy and GDPR compliant.
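
As a rough sketch of this next step, assume each physical column is recorded together with the taxonomy subcategory it belongs to and a named owner. The mapping and a very basic profiling check might then look like the following. The `DataPoint` record, the system, table, column, and owner names, and the `null_rate` metric are all hypothetical; real profiling jobs typically check far more than missing values.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DataPoint:
    """A physical column in some IT system, mapped to a taxonomy subcategory."""
    system: str
    table: str
    column: str
    subcategory: str
    owner: str  # accountable data owner (hypothetical)


# Hypothetical mapping from taxonomy subcategories down to actual columns.
MAPPING: List[DataPoint] = [
    DataPoint("HRIS", "employees", "work_email", "Contacts", "hr-data-owner"),
    DataPoint("CRM", "contacts", "email", "Contacts", "sales-data-owner"),
    DataPoint("Payroll", "pay", "base_salary", "Salary and Benefits", "finance-data-owner"),
]


def points_for(subcategory: str) -> List[DataPoint]:
    """Return every mapped column that belongs to the given subcategory."""
    return [p for p in MAPPING if p.subcategory == subcategory]


def null_rate(values: List[Optional[str]]) -> float:
    """Share of missing or blank values: a very simple profiling metric."""
    if not values:
        return 0.0
    missing = sum(1 for v in values if v is None or not v.strip())
    return missing / len(values)


if __name__ == "__main__":
    for p in points_for("Contacts"):
        print(f"{p.system}.{p.table}.{p.column} (owner: {p.owner})")
    print(null_rate(["a@example.com", None, "", "b@example.com"]))  # 0.5
```

Recording the owner alongside each column means that any profiling issue found on a data point can be routed to the person responsible for fixing it.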

