Pillars to GDPR Success (1 of 5): Data Classification and Lineage

The General Data Protection Regulation (GDPR) now requires organizations to closely track and organize personal data. Data classification and lineage is the first step of the process, so it is the first of Talend’s 5 Pillars for GDPR Compliance.

What exactly does data classification and lineage mean? And how does it help achieve GDPR compliance?

  • Data lineage tracks the origin of data, which helps companies understand where their data comes from and shows them where it exists in their data lake.
  • Data classification is the process of sorting that data into different categories, based on characteristics set by the user. A typical example would be to classify any data structure in a data landscape as “personal data” for compliance with data privacy regulations.

In short, data classification and lineage help companies know their data and keep it organized.
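To make the two ideas concrete, here is a minimal sketch in Python. It is not how Talend models lineage; the `Dataset` class, the `origins` and `inherits_label` helpers, and the dataset names are all hypothetical and chosen only for illustration. Each dataset records its direct sources (lineage) and a set of labels (classification), so a derived dataset can be traced back to its origins and checked for inherited personal data.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Dataset:
    """A node in a toy lineage graph (hypothetical model, for illustration only)."""
    name: str
    upstream: list[Dataset] = field(default_factory=list)  # direct sources (lineage)
    labels: set[str] = field(default_factory=set)          # classification tags


def origins(dataset: Dataset) -> set[str]:
    """Walk the upstream links and return the names of the root sources."""
    if not dataset.upstream:
        return {dataset.name}
    roots: set[str] = set()
    for parent in dataset.upstream:
        roots |= origins(parent)
    return roots


def inherits_label(dataset: Dataset, label: str) -> bool:
    """True if the dataset or any of its ancestors carries the label."""
    return label in dataset.labels or any(
        inherits_label(parent, label) for parent in dataset.upstream
    )


# Hypothetical example: a marketing view built from CRM contacts and web logs.
crm = Dataset("crm_contacts", labels={"personal data"})
weblogs = Dataset("web_logs")
marketing = Dataset("marketing_view", upstream=[crm, weblogs])

print(sorted(origins(marketing)))                  # ['crm_contacts', 'web_logs']
print(inherits_label(marketing, "personal data"))  # True
```

The derived view is flagged as containing personal data because one of its sources is, which is the kind of question lineage and classification answer together.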

All five pillars are described in the 5 Pillars for GDPR Compliance webinar, which includes a demonstration of how Talend’s products can assist with data classification and lineage. This article takes a closer look at pillar one: data classification and lineage.

Why Data Classification and Lineage?

Implementing data classification and lineage is like having a GPS for your data. It allows all personal data (customer, employee, visitor, prospect, user, etc.) that is processed within an organization to be referenced quickly and easily. Knowing where data comes from, and categorizing it based on its origin and purpose, is an important step in data organization.

Data classification and lineage can be useful for:

  • Creating data inventories and increasing data accessibility through documentation and searchability.
  • Getting to know your personal data and controlling its use.
  • Classifying and showing lineage for auditing and change management purposes.

These steps help create a single place from which all personal data within the scope of GDPR can be described, which makes compliance far more manageable.

Talend and Data Classification and Lineage

Within the Talend Platform, as part of Talend Data Quality, the dictionary service allows users to define the data footprints that get tracked in a data landscape. The most frequently used types of Personally Identifiable Information (PII), such as IBANs, e-mail addresses, first names, and social security numbers, are pre-configured.

Tools such as Talend Data Preparation let users discover data across datasets and check whether they contain personal data. By revealing the structure and semantics of the data, these tools make it possible to classify it.
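As a rough illustration of pattern-based discovery, the Python sketch below scans the columns of a small dataset and labels those whose values look like e-mail addresses or IBANs. It does not use or reproduce Talend's dictionary service; the `SEMANTIC_TYPES` patterns, the `classify_column` and `find_personal_data` functions, the 0.8 threshold, and the sample data are all assumptions made for this example, and the patterns are deliberately simplified (the IBAN check, for instance, ignores checksums).

```python
import re

# Hypothetical, simplified footprints; production validators are stricter.
SEMANTIC_TYPES = {
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "iban": re.compile(r"[A-Z]{2}\d{2}[A-Z0-9]{11,30}"),
}


def classify_column(values, threshold=0.8):
    """Return the first type matched by at least `threshold` of the
    non-empty values, or None if no type qualifies."""
    cleaned = [v.strip() for v in values if v and v.strip()]
    if not cleaned:
        return None
    for name, pattern in SEMANTIC_TYPES.items():
        # Spaces are removed so that grouped IBANs ("DE89 3704 ...") still match.
        hits = sum(1 for v in cleaned if pattern.fullmatch(v.replace(" ", "")))
        if hits / len(cleaned) >= threshold:
            return name
    return None


def find_personal_data(dataset):
    """Map each column name to its detected type, keeping only matches."""
    found = {}
    for column, values in dataset.items():
        semantic_type = classify_column(values)
        if semantic_type:
            found[column] = semantic_type
    return found


# Made-up sample data for illustration.
sample = {
    "contact": ["ana@example.com", "bob@example.org", ""],
    "account": ["DE89 3704 0044 0532 0130 00", "GB82WEST12345698765432"],
    "city": ["Paris", "Berlin"],
}
print(find_personal_data(sample))  # {'contact': 'email', 'account': 'iban'}
```

Using a match-ratio threshold rather than requiring every value to match keeps the sketch tolerant of dirty or partially filled columns.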

With Talend Metadata Manager, data classification and lineage can extend beyond the scope of what is managed by the Talend integration platform. Metadata Manager allows users to create a glossary of the critical data elements that an organization wants to track within a data privacy initiative, and capture anything that relates to those elements across data management platforms, databases, and analytics tools. Users can then generate a holistic and auditable view of the information supply chain in a language that everyone can understand.

Get Started Today

Knowing where data comes from and categorizing it is an important step toward GDPR compliance. Data classification and lineage will help keep your customer data organized for quick and easy reference.

Ready to get working on your organization’s data classification and lineage for optimal GDPR compliance? Check out the entire 5 Pillars for GDPR Compliance webinar for a broader understanding of how to comply with GDPR and a walk-through of Talend’s relevant products.
