5 best practices to innovate at speed in the Cloud: Tip #4 Perform faster root cause analysis thanks to data lineage


  • David Talaga
    David Talaga is Senior Product Marketing Manager for Data Governance at Talend. David has a rich and diverse marketing experience, including strategic, field, and product marketing roles in data-driven organizations. After graduating from EDHEC, David started his career as a Data analyst in the Healthcare Industry. In 2000, he joined Dassault Systèmes where he held several senior positions, notably heading up the technology partnership program in Augmented Reality and the strategic alliance with Microsoft. In 2006, David joined Microsoft as Product Marketer for the Software Engineering Product Line. In 2014, he became Marketing Manager for a new EdTech Offering at John Wiley and Sons before joining Talend as Product Marketing for Data Governance Solutions.

Starting in September, the Talend Blog Team began sharing tips to help you securely kick off your data project in the cloud at speed. This week, we’ll look at the fourth speed capability: performing faster root cause analysis thanks to data lineage.


Like any supply chain that aspires to be lean and frictionless, data chains need transparency and traceability. Automated data lineage is needed to understand where data comes from, where it goes, how it is processed and who consumes it. There is also a need for whistleblowers that flag data quality or data protection issues, and for impact analysis whenever change happens.



Why it’s important

The faster data flows, and the more it is used to automate and drive decisions rather than just influence them, the more important it is to sense issues or changes and react accordingly. A modern data platform establishes an audit trail for impact analysis, data error resolution, internal control and regulatory compliance.


When it’s important

Regulators ask for data transparency when it comes to managing sensitive data, mitigating risks, managing privacy and moving data across borders. The cost of data errors only compounds with time. As such, the earlier in the data flow errors are identified, the better.

If a business wants to review, for example, where sales information entered the system in order to test an idea about a new product or process, data lineage can quickly provide that information. An extraordinary amount of data enters a business system each day, and data lineage reduces risk by showing where the data originated and how it travels through the system.

When it comes to trusting data and ensuring governance, lineage information becomes especially important. For example, the healthcare and finance industries are subject to strict regulatory reporting; they must rely on data provenance and demonstrate lineage, especially with today’s large open source technologies. Providing a record of where data came from, how it was used, who viewed it and whether it was sent, copied, transformed or received, all in real time, ensures that full details about any person or system in contact with the data are available at any time.

Our recent data trust readiness report reveals that only 38% of respondents believe their organizations are excellent at tracing back errors into files.

Download Data Trust Readiness Report now.

How Talend tools can help

Data lineage is a map of the data journey, which includes its origin, each stop along the way, and an explanation of how and why the data has moved over time. Data lineage can be documented visually from source to eventual destination, noting stops, deviations, or changes along the way. The process simplifies tracking for operational aspects like day-to-day use and error resolution.
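To make the idea of a lineage "map" concrete, here is a minimal sketch in Python. It is not Talend Data Catalog's API or data model. The `LineageGraph` class, its method names and the dataset names are all hypothetical, chosen only for illustration. It treats lineage as a directed graph: walking the graph upstream supports root cause analysis, and walking it downstream supports impact analysis.

```python
from collections import defaultdict, deque


class LineageGraph:
    """Toy lineage map (hypothetical, not a Talend API).

    Nodes are datasets; an edge source -> target means
    'source feeds into target'.
    """

    def __init__(self):
        self.downstream = defaultdict(set)  # dataset -> datasets it feeds
        self.upstream = defaultdict(set)    # dataset -> datasets it reads from

    def add_flow(self, source, target):
        """Record one hop of the data journey."""
        self.downstream[source].add(target)
        self.upstream[target].add(source)

    def _walk(self, start, edges):
        """Breadth-first walk returning every dataset reachable from start."""
        seen, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            for nxt in edges.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def trace_origins(self, dataset):
        """Root cause analysis: everything this dataset depends on."""
        return self._walk(dataset, self.upstream)

    def impact_of(self, dataset):
        """Impact analysis: everything affected if this dataset changes."""
        return self._walk(dataset, self.downstream)


# Made-up example flows: two sources feeding a sales dashboard.
lineage = LineageGraph()
lineage.add_flow("crm_export.csv", "staging.customers")
lineage.add_flow("web_orders.json", "staging.orders")
lineage.add_flow("staging.customers", "warehouse.sales")
lineage.add_flow("staging.orders", "warehouse.sales")
lineage.add_flow("warehouse.sales", "sales_dashboard")

# Where could a wrong number on the dashboard have come from?
suspects = lineage.trace_origins("sales_dashboard")

# What is affected if the CRM export changes format?
affected = lineage.impact_of("crm_export.csv")
```

In this sketch, an error spotted on `sales_dashboard` narrows the investigation to the five datasets upstream of it, while a change to `crm_export.csv` flags the three datasets downstream of it. A real catalog would also record the transformations applied at each hop and work down to the attribute level.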


Data lineage is a core component of Talend Data Catalog. Beyond lineage, Talend Data Catalog helps you create a central, governed catalog of enriched data that can be shared and collaborated on easily. It can automatically discover, profile, organize and document your metadata and make it easily searchable. You can manage metadata by searching for, documenting, analyzing and comparing it, tracing end-to-end data lineage and performing impact analysis.


Figure 1: Talend Data Catalog builds end-to-end lineage down to the attribute level.


Talend Data Catalog supports data lineage across multiple platforms, including enterprise apps like SAP, cloud apps like Salesforce.com, data stores like file systems, Hadoop, SQL and NoSQL databases, BI and analytical tools, and ETL tools.

Feel free to watch this on-demand webinar where Stewart Bond, Research Director of IDC’s Data Integration and Integrity Software Service, and Talend will highlight this modern approach to data governance. You can also download our definitive guide to data governance to explore other capabilities of trust & speed.


Want to explore more capabilities?

This is the fourth of five speed capabilities. Can’t wait to discover the last one?

Download our Data Trust Readiness Report to discover more findings and the other 9 trust and speed capabilities.

