Introducing Talend Data Catalog: Creating a Single Source of Trust
Talend Fall ’18 is here! We’ve released a big update to the Talend platform this time around including support for APIs, as well as new big data and serverless capabilities. You will see blogs from my colleagues to highlight those major new product and features introductions. On my side, I’ve been working passionately to introduce Talend Data Catalog, which I believe has the potential to change the way data is consumed and managed within our enterprise. Our goal with this launch is to help our customers deliver insight-ready data at scale so they can make better and faster decisions, all while spending less time looking for data or making decisions with incomplete data.
You Can’t Be Data Driven without a Data Catalog
Before we jump into features, let’s look at why you need a data catalog. Remember the early days of the Internet? Suddenly, it became so easy and cheap to create content and publish it to anyone that everybody actually did it. Soon enough, that created a data sprawl, and the challenge was not any more to create content but to find it. After two decades we know that winners in the web economy are those that created a single point of access to content in their category: Google, YouTube, Baidu, Amazon, Wikipedia.
Now, we are faced with a similar data sprawl in out data-driven economy. IDC research has found that today data professionals are spending 81% of their time searching, preparing, and protecting data with little time left to turn it into business outcomes. It has become crucial that organizations establish this same single source of access to their data to be in the winner’s circle.
Although technology can help to fix the issue, and I’ll come back on it later in the article, among these, enterprises need to set up a discipline to organize their data at scale, and this discipline is called data governance. But traditional data governance must be re-invented with this data sprawl: according to Gartner, “through 2022, only 20% of organizations investing in information will succeed in scaling governance for digital business.” Given the sheer number of companies that are awash in data, that percentage is just too small.
Modern data governance is not only about minimizing data risks but also about maximizing data usage, which is why traditional authoritative data governance approaches are not enough. There is a need for a more agile, bottom-up approach. That strategy starts with the raw data, links it to its business context so that it becomes meaningful, takes control of its data quality and security, and fully organizes it for massive consumption.
Empowering this new discipline is the promise of data catalogs, leveraging modern technologies like smart semantics and machine learning to organize data at scale and turns data governance into a team sport by engaging anyone for social curation.
With the newly introduced Talend Data Catalog, companies can organize their data at scale to make data accessible like never before and address challenges head-on. By empowering organizations to create a single source of trusted data, it’s a win for both the business with the ability to find the right data, as well as the CIO and CDO who can now control data better to improve data governance. Now let’s dive into some details on what the Talend Data Catalog is.
Intelligently discover your data
Data catalogs are a perfect fit for companies that modernized their data infrastructures with data lakes or cloud-based data warehouses, where thousands of raw data items can reside and can be accessed at scale. The catalog acts as the fish finder for that data lake, leveraging crawlers across different file systems, traditional, Hadoop, or cloud, and across typical file format. Then automatically extracts metadata and profiling information, for referencing, change management classification and accessibility.
Not only can it bring all of those metadata together in a single place, but it can also automatically draw the links between datasets and connect them to a business glossary. In a nutshell, this allows businesses to:
- Automate the data inventory
- Leverage smart semantics for auto-profiling, relationships discovery and classification
- Document and drive usage now that the data has been enriched and becomes more meaningful
The goal of the data catalog is to unlock data from the application where they reside.
Orchestrate data curation
Once the metadata has been automatically harvested in a single place, data governance can be orchestrated in a much more efficient way. Talend Data Catalog allows businesses to define the critical data elements in its business glossary and assign data owners for those critical data elements. The data catalog then relates those critical data elements to the data points that refer it across the information system.
Now data is in control and data owners can make sure that their data is properly documented and protected. Comments, warnings, or validation can be crowdsourced from any business user for collaborative, bottom-up governance. Finally, the data catalog draws end-to-end data lineage and manages version control. It guarantees accuracy and provides a complete view of the information chain, which are both critical for data governance and data compliance.
Easy search-based access to trusted data
Talend Data Catalog makes it possible for businesses to locate, understand, use, and share their trusted data faster by searching and verifying data’s validity before sharing with peers. Its collaborative user experience enables anyone to contribute metadata or business glossary information.
Data governance is most often associated with control. A discipline that allows businesses to centrally collect data, process, and consume under certain rules and policies. The beauty of Talend Data Catalog is that not only does it control data but liberates it for consumption as well. This allows data professionals to find, understand, and share data ten times faster. Now data engineers, scientists, analysts, or even developers can spend their time on extracting value from those data sets rather than searching for them or recreating them – removing the risk of your data lake turning into a data swamp.
A recently published IDC report, “Data Intelligence Software for Data Governance,” advocates the benefits of modern data governance and positions the Data Catalog as the cornerstone of what they define as Data Intelligence Software. In the report, IDC calls it a “technology that supports enablement through governance is called data intelligence software and is delivered in metadata management, data lineage, data catalog, business glossary, data profiling, mastering, and stewardship software.”
For more information, check out the full capabilities of the Talend Data Catalog here.