[GDPR Step 10] How to Improve Data Quality

The General Data Protection Regulation (GDPR), introduced by the European Union, took effect on May 25, 2018. Under the GDPR, organizations have to ensure that the personal data of data subjects such as customers, employees, and prospects is complete and accurate.

Talend recently hosted an on-demand webinar, Practical Steps to GDPR Compliance, that focuses on a comprehensive 16-step plan to operationalize a data governance program that supports GDPR compliance.

Improving data quality is Step 10 in this plan. For details on the first nine, check out the links in the sidebar!

GDPR’s Perspective on Data Quality

Article 16 of the GDPR requires companies to rectify inaccurate personal information and complete any missing personal data without undue delay. Governance teams need to establish controls to allow data subjects to address any quality issues relating to their personal information in a timely manner.
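To make the idea of a completeness control concrete, here is a minimal sketch in plain Python. It is not Talend functionality; the `REQUIRED_FIELDS` list and the record layout are assumptions chosen for illustration, and a real governance team would define its own mandatory fields.

```python
from typing import Dict, List, Optional

# Hypothetical set of fields a governance team treats as mandatory.
REQUIRED_FIELDS = ("name", "email", "phone")


def missing_fields(record: Dict[str, Optional[str]]) -> List[str]:
    """Return the required fields that are absent, None, or blank."""
    return [
        field
        for field in REQUIRED_FIELDS
        if not (record.get(field) or "").strip()
    ]


customer = {"name": "Ada Lovelace", "email": "", "phone": None}
print(missing_fields(customer))  # ['email', 'phone']
```

A check like this could feed a rectification queue, so that incomplete records are flagged before a data subject has to ask for a correction.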

Data reconciliation is also important because customer and employee data might be fragmented in many places and systems within the organization. The GDPR mandates that this information should be reconciled into a consistent and complete view that can be exposed to data subjects on request.

Using Talend for Data Quality 

By one widely cited estimate, bad data costs US businesses $3 trillion per year. It leads to a poor understanding of customers and to non-compliance with the GDPR, which can bring heavy fines. For most organizations, this is an important issue to address.

Here are a few factors that cause bad data and the corresponding Talend tools that help resolve them:

  1. Information silos: In the big data world, information comes from multiple sources and systems. Talend Data Integration Platform collates data from various sources into a common platform.
  2. Diverse technologies: Because companies use an assortment of technologies, data arrives in diverse formats. Talend Data Integration Platform helps different tools and apps integrate with the technology in use (for example, by writing a MapReduce or Spark job) and presents data in a consistent format.
  3. Inconsistent data: Because data comes from various sources, the sources may disagree. For example, the marketing and sales systems may hold different records of a customer’s mobile number. Talend Data Quality and Talend Data Stewardship help reconcile these into a master record (single version of truth).
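The mobile-number example in the third item can be sketched in a few lines of Python. This is an illustration only, not how Talend Data Quality or Talend Data Stewardship work internally. The rule used here (the most recently updated non-empty value wins) is just one possible survivorship rule, and the record layout and the `build_master_record` helper are hypothetical.

```python
from datetime import date
from typing import Dict, List


def build_master_record(records: List[dict]) -> Dict[str, str]:
    """Merge records for one person into a single master record.

    Assumed survivorship rule: for each field, the most recently
    updated non-empty value wins.
    """
    master: Dict[str, str] = {}
    # Process oldest first so that newer values overwrite older ones.
    for record in sorted(records, key=lambda r: r["updated"]):
        for field, value in record["fields"].items():
            if value:  # skip empty values so they never erase good data
                master[field] = value
    return master


marketing = {
    "source": "marketing",
    "updated": date(2018, 1, 10),
    "fields": {"email": "ada@example.com", "mobile": "555-0100"},
}
sales = {
    "source": "sales",
    "updated": date(2018, 4, 2),
    "fields": {"mobile": "555-0199", "email": ""},
}

print(build_master_record([marketing, sales]))
# {'email': 'ada@example.com', 'mobile': '555-0199'}
```

In practice, a data steward would likely review conflicts that a simple rule like this cannot settle, which is where a stewardship workflow comes in.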

Integration with the Data Quality Lifecycle

Talend Data Quality helps in remediating issues at all stages of the data quality lifecycle (Figure 1). The product provides solutions for all scenarios: from discovering issues and standardizing the data using libraries, to resolving duplicates and merging the records into a single version of truth, and finally monitoring for data quality at all times.

Figure 1: Data Quality Lifecycle.

Talend Data Quality generates native code to run data quality controls and data anonymization at the right place (on-premises inside a Hadoop cluster or in the cloud) and at the right time (on data at rest or on streaming data). It also provides sophisticated deduplication and matching capabilities to reconcile or connect datasets across systems (see Figure 2).

Figure 2: Talend Data Quality can automatically match personal data against new data sources based on patterns, dictionary, or ontologies, and then tag or apply rules on highlighted data.
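As a rough illustration of what matching and deduplication involve, the sketch below flags likely duplicate names using Python's standard `difflib` module. It is not Talend's matching algorithm; the normalization step, the `likely_duplicates` helper, and the 0.85 threshold are all assumptions made for this example.

```python
from difflib import SequenceMatcher
from itertools import combinations
from typing import List, Tuple


def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so trivial differences don't block a match."""
    return " ".join(name.lower().split())


def similarity(a: str, b: str) -> float:
    """Similarity ratio between 0.0 and 1.0 on the normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def likely_duplicates(
    names: List[str], threshold: float = 0.85
) -> List[Tuple[str, str]]:
    """Return every pair of names whose similarity meets the threshold."""
    return [
        (a, b)
        for a, b in combinations(names, 2)
        if similarity(a, b) >= threshold
    ]


names = ["Jon Smith", "John  Smith", "Maria Garcia"]
print(likely_duplicates(names))  # [('Jon Smith', 'John  Smith')]
```

Comparing every pair is fine for a small list but grows quadratically, so production matching typically groups records into candidate blocks first and tunes the threshold against reviewed samples.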

Organizations also need to delegate authority from data protection officers to data stewards or business users. For example, a sales engineer might be best positioned to ensure that contact data for his or her accounts is kept up to date. A campaign manager needs to ensure that a consent mechanism has been put in place within the marketing database.

To allow anyone in the organization to manage their data usage in a compliant manner, organizations will need to provide workflow-based, self-service apps, such as Talend Data Preparation and Talend Data Stewardship, to different departments. This gives those departments more autonomy without putting the data at risk (see Figure 3).

Figure 3: Talend Data Stewardship allows the orchestration of stewardship workflows and the delegation of activities to potentially anyone in the organization.

Next Steps to Improving Data Quality

Addressing data quality is critical for organizations to be compliant with the GDPR. Even in a complex software ecosystem, this can be done quickly and consistently using automated solutions and self-service tools that collect, reconcile, and consolidate data.

The next step of Talend’s comprehensive 16-step plan for the GDPR is stitching data lineage.

