Introducing Talend Data Catalog: Creating a Single Source of Trust
Remember how the Internet made it so easy and cheap to create content that everyone did it? Well, that’s where we are with the data economy today.
The challenge with the web was not a lack of fascinating content but the lack of a way to find it. After two decades, we know that the winners in the web economy are those that created a single point of access to content in their category: Google, YouTube, Baidu, Amazon, Wikipedia.
Now, we are faced with a similar data sprawl in our data-driven economy. IDC research has found that today data professionals are spending 81% of their time searching, preparing, and protecting data with little time left to turn it into business outcomes. It has become crucial that organizations establish this same single source of access to their data to be in the winner’s circle.
Technology can help fix the issue, and I’ll come back to it later in the article, but above all enterprises need to set up a discipline to organize their data at scale. This discipline is called data governance. But traditional data governance must be reinvented to cope with this data sprawl: according to Gartner, “through 2022, only 20% of organizations investing in information will succeed in scaling governance for digital business.” Given the sheer number of companies that are awash in data, that percentage is just too small.
Modern data governance is not only about minimizing data risks but also about maximizing data usage, which is why traditional authoritative data governance approaches are not enough. There is a need for a more agile, bottom-up approach. That strategy starts with the raw data, links it to its business context so that it becomes meaningful, takes control of its data quality and security, and fully organizes it for massive consumption.
You Can’t Be Data Driven without a Data Catalog
Empowering this new discipline is the promise of data catalogs, which leverage modern technologies like smart semantics and machine learning to organize data at scale and turn data governance into a team sport by engaging anyone in social curation.
With the newly introduced Talend Data Catalog, companies can organize their data at scale, make it accessible like never before, and address these challenges head-on. By empowering organizations to create a single source of trusted data, it’s a win both for the business, which gains the ability to find the right data, and for the CIO and CDO, who can now control data better to improve data governance. Now let’s dive into some details on what Talend Data Catalog is.
Intelligently discover your data
Data catalogs are a perfect fit for companies that have modernized their data infrastructures with data lakes or cloud-based data warehouses, where thousands of raw data items can reside and be accessed at scale. The catalog acts as the fish finder for that data lake, leveraging crawlers across different file systems (traditional, Hadoop, or cloud) and across typical file formats. It then automatically extracts metadata and profiling information for referencing, change management, classification, and accessibility.
Not only can it bring all of that metadata together in a single place, but it can also automatically draw the links between datasets and connect them to a business glossary. In a nutshell, this allows businesses to:
- Automate the data inventory
- Leverage smart semantics for auto-profiling, relationship discovery, and classification
- Document and drive usage now that the data has been enriched and becomes more meaningful
The goal of the data catalog is to unlock data from the applications where it resides.
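To make the crawl-and-profile idea above more concrete, here is a minimal sketch in Python. It is not Talend Data Catalog's implementation or API. It assumes a local folder of CSV files, and every name in it (`infer_type`, `profile_csv`, `crawl`) is hypothetical and chosen only for illustration. It walks a directory tree, reads each CSV file, and records the row count plus a dominant type and empty-cell count for each column.

```python
import csv
import os
from collections import Counter


def infer_type(value):
    """Classify a single string cell as 'empty', 'integer', 'float' or 'text'."""
    if value == "":
        return "empty"
    for caster, name in ((int, "integer"), (float, "float")):
        try:
            caster(value)
            return name
        except ValueError:
            pass
    return "text"


def profile_csv(path):
    """Return basic metadata and per-column profiling for one CSV file."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        columns = reader.fieldnames or []
        type_counts = {name: Counter() for name in columns}
        row_count = 0
        for row in reader:
            row_count += 1
            for name in columns:
                # Short rows yield None for missing cells; treat them as empty.
                cell = (row[name] or "").strip()
                type_counts[name][infer_type(cell)] += 1
    profile = {}
    for name, counts in type_counts.items():
        non_empty = {k: v for k, v in counts.items() if k != "empty"}
        dominant = max(non_empty, key=non_empty.get) if non_empty else "unknown"
        profile[name] = {"type": dominant, "empty": counts["empty"]}
    return {"path": path, "rows": row_count, "columns": profile}


def crawl(root):
    """Walk a directory tree and profile every CSV file found (hypothetical crawler)."""
    catalog = []
    for folder, _dirs, files in os.walk(root):
        for filename in sorted(files):
            if filename.lower().endswith(".csv"):
                catalog.append(profile_csv(os.path.join(folder, filename)))
    return catalog
```

Calling `crawl("/path/to/data")` on a folder of your own returns one dictionary per CSV file. A real catalog would go much further, covering more file formats, Hadoop and cloud storage, and semantic classification, but the inventory step has the same basic shape.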
Orchestrate data curation
Once the metadata has been automatically harvested in a single place, data governance can be orchestrated in a much more efficient way. Talend Data Catalog allows businesses to define the critical data elements in their business glossary and assign data owners for those critical data elements. The data catalog then relates those critical data elements to the data points that refer to them across the information system.
Now data is under control, and data owners can make sure that their data is properly documented and protected. Comments, warnings, or validations can be crowdsourced from any business user for collaborative, bottom-up governance. Finally, the data catalog draws end-to-end data lineage and manages version control. It guarantees accuracy and provides a complete view of the information chain, which are both critical for data governance and data compliance.
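The glossary linking described above can be pictured with a small Python sketch. This is an illustration only, not how Talend Data Catalog works internally. It assumes a simple normalized name match in place of the product's smart semantics, and the `GlossaryTerm` class, the `link_columns` function, and all dataset, column, and owner names are hypothetical.

```python
import re
from dataclasses import dataclass, field


@dataclass
class GlossaryTerm:
    name: str    # business name of the critical data element
    owner: str   # accountable data owner
    aliases: set = field(default_factory=set)  # technical names seen in sources


def normalize(label):
    """Lower-case a label and drop anything that is not a letter or digit."""
    return re.sub(r"[^a-z0-9]", "", label.lower())


def link_columns(terms, columns):
    """Map each glossary term name to the (dataset, column) pairs that refer to it."""
    lookup = {}
    for term in terms:
        for label in {term.name, *term.aliases}:
            lookup[normalize(label)] = term.name
    links = {term.name: [] for term in terms}
    for dataset, column in columns:
        match = lookup.get(normalize(column))
        if match is not None:
            links[match].append((dataset, column))
    return links


# Hypothetical glossary and harvested columns, for illustration only.
terms = [
    GlossaryTerm("Customer Email", owner="crm-team", aliases={"email_addr", "cust_email"}),
    GlossaryTerm("Order Total", owner="finance-team", aliases={"order_amount"}),
]
columns = [
    ("crm.customers", "CUST_EMAIL"),
    ("web.signups", "email-addr"),
    ("erp.orders", "order_amount"),
    ("erp.orders", "warehouse_id"),
]
links = link_columns(terms, columns)
```

Here `links["Customer Email"]` lists the two columns that carry customer email addresses, while `warehouse_id` stays unlinked because no glossary term claims it. Each term carries an owner, so every linked column has someone accountable for its documentation and protection.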
Easy search-based access to trusted data
Talend Data Catalog makes it possible for businesses to locate, understand, use, and share their trusted data faster by searching and verifying data’s validity before sharing with peers. Its collaborative user experience enables anyone to contribute metadata or business glossary information.
Data governance is most often associated with control: a discipline that allows businesses to centrally collect, process, and consume data under certain rules and policies. The beauty of Talend Data Catalog is that it not only controls data but also liberates it for consumption. This allows data professionals to find, understand, and share data ten times faster. Now data engineers, scientists, analysts, or even developers can spend their time extracting value from those data sets rather than searching for them or recreating them, which removes the risk of your data lake turning into a data swamp.
A recently published IDC report, “Data Intelligence Software for Data Governance,” advocates the benefits of modern data governance and positions the data catalog as the cornerstone of what it defines as data intelligence software. In the report, IDC states that “technology that supports enablement through governance is called data intelligence software and is delivered in metadata management, data lineage, data catalog, business glossary, data profiling, mastering, and stewardship software.”