What Exactly is Talend Data Stewardship and Why Do You Need It?

What does your next data-driven project have to do with data stewardship? 

Well, actually, a lot, if you want to get the most out of your data. Many companies today are filling their data lakes with vast amounts of structured and unstructured data, but they tend to forget an important fact: on average, organizations believe that 32 percent of their data is inaccurate. Addressing this data quality issue before your data lake turns into a data swamp sounds like a must, not an option, right? That is where data stewardship comes into play.

Data stewardship is becoming a critical requirement for successful data-driven insight across the enterprise. Cleaner data gets used more widely, and it reduces the costs associated with “bad data quality,” such as decisions based on incorrect analytics.

What is Data Stewardship?

If you think of all the data you need to work with each day, you know that it is often incomplete and sometimes incorrect. You may be able to fix it yourself because you know it well, but that approach does not scale when you are dealing with vast amounts of data, or when other groups “bring their own data” and are the ones who know what it should look like. Using email or Excel to resolve data quality issues one by one is also inefficient, not to mention the risks that come with uncontrolled copies of potentially sensitive data proliferating across file folders throughout the enterprise. To manage data quality effectively and sustainably, you need purpose-built tools, processes and policies.

As a critical component of data governance, data stewardship is the process of managing the lifecycle of data from curation to retirement. Data stewardship is about defining and maintaining data models, documenting the data, cleansing the data, and defining the rules and policies. It enables the implementation of well-defined data governance processes covering several activities including monitoring, reconciliation, refining, de-duplication, cleansing and aggregation to help deliver quality data to applications and end users.

In addition to improving data integrity, data stewardship helps ensure that data is used consistently throughout the organization, and it reduces data ambiguity through metadata and semantics. Simply put, data stewardship reduces “bad data” in your company, which translates into better decision-making and eliminates the costs incurred when using incorrect information.

Traditionally, data stewardship tasks are assigned to a staff of data experts, the so-called data stewards. The challenge is that companies have few data stewards, and they are generally dedicated to high-risk projects such as regulatory compliance. In the absence of data stewards, nobody knows who is accountable for data quality. That leads to a frustrating situation: organizations are fully aware that almost one third of their data assets are inaccurate, but nobody acts on it.

Data Stewardship, now a team activity

With more data-driven projects, more “bring your own data” projects from lines of business, and increased use of data by data workers such as data scientists and marketing and operations teams, data stewardship needs to be rethought. Next-generation data stewardship tools need to evolve to support:

  • Self-service: so that any user, from IT to the business, can solve data quality issues in a controlled way
  • Team collaboration: including workflow and task orchestration
  • Manual interaction: for data arbitration or certification, where human intervention is required to validate, certify, tag, or select a dataset
  • Integration with data preparation: defining a process for “bring your own data”
  • Built-in privacy: empowering the data protection officer and compliance teams to meet new privacy regulations such as the GDPR (General Data Protection Regulation)

Introducing Talend Data Stewardship

With Talend Winter ’17, we are proud to launch a new capability: the Talend Data Stewardship app, a comprehensive tool for configuring and managing data assets that addresses the data quality challenges holding your data-driven projects back.

Talend Data Stewardship is more than a tool for data stewards with specialized data expertise: IT can use it to empower business users to curate their own data with a point-and-click, Excel-like interface. With Talend Data Stewardship you can manage and quickly resolve any data integrity issue to achieve “trusted” data across the enterprise. You define the common data models, semantics, and rules needed to cleanse and validate data, then define user roles, workflows, and priorities, and delegate tasks to the people who know the data best. The tool also improves productivity in data curation tasks such as matching and merging data, resolving data errors, and certifying or arbitrating content.

Delegating tasks that used to be done by data professionals, such as data experts, to the operational workers who know the data best is called self-service. It requires workflow-driven, easy-to-use tools with an Excel-like user experience and smart guidance. In this respect, Talend Data Stewardship uses the same user interface as Talend Data Preparation, and the two tools are bundled together in a unified suite for self-service data access, preparation, integration and curation. While Talend Data Preparation empowers business users to get clean, useful data in minutes, not hours, in an ad-hoc way, Talend Data Stewardship orchestrates the collaborative work of fixing, merging and certifying data through self-service data curation. Much as office workers move between Excel and Word, data workers get access to both tools with a consistent user experience and use whichever fits their use case.

Because Talend Data Stewardship is fully integrated with the Talend Platform, it can be associated with any data flow and integration style that Talend can manage, so you can embed governance and stewardship into data integration flows, MDM initiatives, and matching processes.

Tools for everyone

The core concepts of Talend Data Stewardship are campaigns and tasks, and the product comes with two predefined roles: campaign owners and data stewards.

  • Campaign owners can define different types of campaigns, including Arbitration, Resolution and Merging; engage the data stewards who will contribute to each campaign; define the structure of the data used by the campaigns; refer to Talend Jobs to load tasks into the campaigns; retrieve tasks from the campaigns; and assign tasks in the campaigns to different data stewards.
  • Data stewards can explore the data that relates to their tasks, resolve tasks one by one or for a whole set of records at once, delegate tasks to colleagues, and monitor and audit stewardship campaigns and the resolution of data errors.

Additionally, Talend Data Stewardship can trigger validation workflows for tasks that should be double-checked. Because it is workflow-driven and offers a guided user experience, anyone can participate in data curation efforts, with clear responsibilities and efficient tools to carry them out.

CRM example use case

Consider a use case where you want to improve the quality of the data in your CRM system, which contains incorrect data and many duplicates. As the campaign owner in Talend Data Stewardship, you would define a Resolution campaign and its objective (e.g. resolve incorrect addresses), and quarantine the data that needs attention, typically records with invalid or empty contact data, or potential duplicates. You would then define the participants in the campaign, for example all regional marketing managers, digital marketing managers, and the sales admin. Next, you would assign tasks: for example, the error resolution tasks for German marketing contacts go to the German marketing managers, because they know this data well enough to certify it, correct it, or reconcile multiple versions of the truth. They also benefit from the cleansed data through higher conversion rates in their marketing campaigns. As each stakeholder updates the data, you can track the changes made, e.g. that marketing verified mailing addresses, telephone numbers and email addresses.
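Talend Data Stewardship is configured through its web interface, so the following is not Talend code or a Talend API. It is a minimal Python sketch, under stated assumptions, of the logic such a Resolution campaign applies: flag CRM records with empty or invalid contact data, route each flagged record to the steward responsible for its country, and log every correction for auditing. The `Contact` and `Task` classes, the `STEWARDS_BY_COUNTRY` routing table, the steward names, and the validation rules are all hypothetical.

```python
import re
from dataclasses import dataclass, field

# Hypothetical CRM record; field names are illustrative, not a Talend schema.
@dataclass
class Contact:
    id: int
    name: str
    country: str  # ISO country code, used here to route tasks
    email: str
    phone: str

# Hypothetical stewardship task: a quarantined record, why it was flagged,
# who owns it, and an audit trail of the changes made while resolving it.
@dataclass
class Task:
    record: Contact
    reasons: list[str]
    assignee: str
    history: list[tuple[str, str, str]] = field(default_factory=list)

# Deliberately loose email check; real validation rules would be stricter.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

# Hypothetical routing table: the people who know each region's data best.
STEWARDS_BY_COUNTRY = {"DE": "german.marketing", "FR": "french.marketing"}
DEFAULT_STEWARD = "sales.admin"

def validation_errors(c: Contact) -> list[str]:
    """Return the reasons a record needs attention (empty list means clean)."""
    errors = []
    if not c.email:
        errors.append("missing email")
    elif not EMAIL_RE.match(c.email):
        errors.append("invalid email")
    if not c.phone:
        errors.append("missing phone")
    return errors

def quarantine_and_assign(contacts: list[Contact]) -> list[Task]:
    """Quarantine records that fail validation and route each to a regional steward."""
    tasks = []
    for c in contacts:
        errors = validation_errors(c)
        if errors:
            owner = STEWARDS_BY_COUNTRY.get(c.country, DEFAULT_STEWARD)
            tasks.append(Task(record=c, reasons=errors, assignee=owner))
    return tasks

def resolve(task: Task, field_name: str, new_value: str, steward: str) -> None:
    """Apply a steward's correction in place and record it in the audit trail."""
    old = getattr(task.record, field_name)
    setattr(task.record, field_name, new_value)
    task.history.append((steward, field_name, f"{old!r} -> {new_value!r}"))

contacts = [
    Contact(1, "Anna Schmidt", "DE", "anna@example", "+49 30 123456"),
    Contact(2, "Luc Martin", "FR", "luc@example.fr", ""),
    Contact(3, "Jane Doe", "US", "jane@example.com", "+1 555 0100"),
]
tasks = quarantine_and_assign(contacts)
for t in tasks:
    print(t.record.id, t.assignee, t.reasons)
# The German steward fixes the malformed email; the change is logged.
resolve(tasks[0], "email", "anna@example.de", "german.marketing")
```

Only records 1 and 2 are quarantined here; record 3 passes validation and never reaches a steward. Keeping the audit trail on the task mirrors the change tracking described above, so you can see who changed which field and from what value.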

Next, you create a Merging campaign that matches duplicate records, and the sales admin reviews and merges them.
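As a rough illustration of what a Merging campaign does, here is a hypothetical Python sketch; it is not Talend's matching engine or API. It groups records whose normalized names are identical and proposes a merged "golden" record for the sales admin to review. Exact matching on a normalized name is a simplification chosen for brevity (production matching typically uses fuzzy comparison across several fields), and the survivorship rule used here, keeping the newest record and filling its gaps from older duplicates, is likewise an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass, replace

# Hypothetical CRM record; field names are illustrative only.
@dataclass(frozen=True)
class Contact:
    id: int
    name: str
    email: str
    phone: str
    updated: str  # ISO-8601 date, so string order matches date order

def normalize(name: str) -> str:
    """Case-fold and collapse whitespace so trivially different spellings match."""
    return " ".join(name.casefold().split())

def find_duplicate_groups(contacts: list[Contact]) -> list[list[Contact]]:
    """Group records sharing a normalized name; groups of 2+ are merge candidates."""
    groups: dict[str, list[Contact]] = defaultdict(list)
    for c in contacts:
        groups[normalize(c.name)].append(c)
    return [g for g in groups.values() if len(g) > 1]

def propose_merge(group: list[Contact]) -> Contact:
    """Build a proposed golden record for a steward to review.

    Survivorship rule (an assumption): keep the most recently updated record
    and fill its empty email/phone fields from older duplicates.
    """
    newest_first = sorted(group, key=lambda c: c.updated, reverse=True)
    golden = newest_first[0]
    for other in newest_first[1:]:
        if not golden.email and other.email:
            golden = replace(golden, email=other.email)
        if not golden.phone and other.phone:
            golden = replace(golden, phone=other.phone)
    return golden

contacts = [
    Contact(1, "Anna Schmidt", "anna@example.de", "", "2017-01-15"),
    Contact(2, "anna  schmidt", "", "+49 30 123456", "2016-11-02"),
    Contact(3, "Luc Martin", "luc@example.fr", "+33 1 23 45 67 89", "2016-12-01"),
]
for group in find_duplicate_groups(contacts):
    print(propose_merge(group))
```

`propose_merge` only returns a suggestion and never writes it back, which reflects the division of labor in the campaign: the matching step proposes a merge, and a person who knows the data, here the sales admin, decides whether to accept it.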

Take it for a test drive

In summary, as companies consume more data and start providing self-service access to it, they clearly need self-service data quality tools to get the most out of that data. The business benefits from increased data usage and more informed decisions based on better data. IT also benefits by delegating data cleaning tasks to data workers.

Are you starting to fill the data lake and realize that you now need to manage it? Take Talend Data Stewardship for a test drive!
