Metadata Management 101: What, Why and How
Metadata management has slowly become one of the most important practices for a successful digital initiative. With the rise of distributed architectures such as Big Data and Cloud, which can create siloed systems and data, metadata management is now vital for managing an organization's information assets. There is a lot of literature on this concept online, and readers can easily get confused by the terminology. In this blog, I want to give readers a brief overview of metadata management in plain English.
What does metadata management do?
Let’s get started with the basics. Though there are many definitions of metadata management out there, its core function is to let a business user search for and identify information about key attributes in a web-based user interface.
An example of a searchable key attribute could be a Customer ID or a member name. With a proper metadata management system in place, business users can understand where the data for that attribute comes from and how it was calculated. They can visualize which enterprise systems in the organization use the attribute (Lineage), and they can understand how a change to the attribute, such as a change to its length, would affect other systems (Impact Analysis).
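To make Lineage and Impact Analysis a bit more concrete, here is a minimal sketch that treats lineage as a directed graph and walks it in both directions. The system and attribute names (crm, staging, warehouse and so on) are made up for illustration, and a real metadata tool would harvest these links automatically rather than have them typed in by hand.

```python
from collections import deque

# Hypothetical lineage links: each attribute feeds the attributes listed for it.
LINEAGE = {
    "crm.customer_id": ["staging.customer_id"],
    "staging.customer_id": ["warehouse.dim_customer.customer_id"],
    "warehouse.dim_customer.customer_id": [
        "reports.sales_dashboard.customer_id",
        "marketing.campaign_export.customer_id",
    ],
}


def downstream(attribute, lineage=LINEAGE):
    """Impact analysis: every attribute affected by a change to `attribute`."""
    seen = []
    queue = deque(lineage.get(attribute, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.append(node)
        queue.extend(lineage.get(node, []))
    return seen


def upstream(attribute, lineage=LINEAGE):
    """Lineage: every attribute that feeds `attribute`, nearest source first."""
    # Reverse the links, then reuse the same breadth-first walk.
    reverse = {}
    for source, targets in lineage.items():
        for target in targets:
            reverse.setdefault(target, []).append(source)
    return downstream(attribute, reverse)
```

Calling downstream("crm.customer_id") lists every system that would need a look before changing the length of the Customer ID at its source, while upstream(...) answers the "where does this come from?" question for a report column.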
Technical users also need metadata management. By combining business metadata with technical metadata, a technical user can find out which ETL job or database process loads data into the attribute. Operational metadata, such as the control tables of a data warehouse load, can also be combined into this integrated metadata model. This is powerful information for an end user to have at their fingertips. The end result of metadata management can take the form of another ‘database’ holding the metadata of the company's key attributes. The industry term for such a database is a Data Catalog, a glossary, or a Data Inventory.
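As a rough illustration of what one record in such a catalog might hold, the sketch below combines business, technical and operational metadata in a single entry and adds a simple keyword search. The field names, job names and sample values are assumptions made for this example, not the schema of any particular product.

```python
from dataclasses import dataclass


@dataclass
class CatalogEntry:
    """One key attribute, described by three kinds of metadata."""
    name: str         # business: the term users search for
    definition: str   # business: plain-English meaning
    data_type: str    # technical: physical type in the source system
    load_job: str     # technical: ETL job that populates the attribute
    last_loaded: str  # operational: e.g. read from a load control table


class DataCatalog:
    """A tiny in-memory catalog with keyword search (illustrative only)."""

    def __init__(self):
        self._entries = {}

    def add(self, entry):
        self._entries[entry.name.lower()] = entry

    def search(self, term):
        """Return entries whose name or definition contains `term`."""
        term = term.lower()
        return [
            entry
            for key, entry in sorted(self._entries.items())
            if term in key or term in entry.definition.lower()
        ]


# Hypothetical sample entries.
catalog = DataCatalog()
catalog.add(CatalogEntry(
    name="Customer ID",
    definition="Unique identifier assigned to each customer.",
    data_type="VARCHAR(10)",
    load_job="load_dim_customer",
    last_loaded="2024-01-15T02:00:00",
))
catalog.add(CatalogEntry(
    name="Member Name",
    definition="Full name of a registered member.",
    data_type="VARCHAR(100)",
    load_job="load_dim_member",
    last_loaded="2024-01-15T02:05:00",
))
```

A business user searching for "customer" gets the definition, while a technical user looking at the same entry can see which job loads it and when it last ran.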
How does metadata management work?
Metadata Management is only one of the initiatives in a holistic Data Governance program, but it is the only one that deals with “Metadata”. Other initiatives, such as Master Data Management (MDM) or Data Quality (DQ), deal with the actual “data” stored in various systems. Metadata management integrates metadata stores at the enterprise level.
Tools like Talend Metadata Manager provide an automated way to parse and load different types of metadata. The tool also lets you build an enterprise model based on the metadata generated by different systems such as your data warehouse, data integration tools, data modelling tools, etc.
Users will be able to resolve conflicts based on, for example, attribute names and types. They will also be able to create custom metadata types to “stitch” metadata between two systems. A completely built metadata management model gives a 360-degree view of how the different systems in your organization are connected. This model can be a starting point for any new Data Governance initiative. Data modelers will now have one place to look for a specific attribute and use it in their own data models. This model is also the foundation of the ‘database’ that we talked about in the earlier section. Just like any other Data Governance initiative, the model needs to be updated as the metadata in individual systems changes, following an SDLC methodology that includes versioning, workflows and approvals. Access to the metadata model should also be managed by creating roles, privileges and policies.
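To give a feel for what a name-and-type conflict looks like, here is a small sketch that compares the attributes harvested from several systems and reports any attribute whose data type differs between them. The input layout and the system names are hypothetical, and this is not how Talend Metadata Manager or any other tool does it internally, just the idea in a few lines.

```python
from collections import defaultdict


def find_type_conflicts(systems):
    """Return attributes declared with different data types across systems.

    `systems` maps a system name to a dict of {attribute name: data type}.
    Attribute names are matched case-insensitively.
    """
    seen = defaultdict(dict)
    for system, attributes in systems.items():
        for name, data_type in attributes.items():
            seen[name.lower()][system] = data_type
    return {
        name: by_system
        for name, by_system in seen.items()
        if len(set(by_system.values())) > 1
    }


# Hypothetical metadata harvested from three systems.
harvested = {
    "crm": {"Customer_ID": "VARCHAR(10)", "Member_Name": "VARCHAR(100)"},
    "warehouse": {"customer_id": "INTEGER", "member_name": "VARCHAR(100)"},
    "billing": {"customer_id": "VARCHAR(10)"},
}

conflicts = find_type_conflicts(harvested)
```

Here conflicts contains only customer_id, because the warehouse stores it as an INTEGER while the other two systems use VARCHAR(10). Deciding which definition wins is still a job for a person, ideally through the workflow and approval steps mentioned above.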
Why do we need to manage metadata?
The basic answer is trust. If metadata is not managed during the system lifecycle, silos of inconsistent metadata will form in the organization. These silos do not meet any team's full needs, and they provide conflicting information. Users will not know how much they can trust the data, because there is no metadata to indicate how and when the data got into the system and what business rules were applied.
Costs also need to be considered. Without effective metadata management, each development project has to go through the effort of defining its own data requirements, which increases costs and decreases efficiency. Users are presented with many tools and technologies, which creates redundancy and excess cost, yet these tools do not deliver the full value of the investment because the data users are looking for is not available. Data definitions are also duplicated across multiple systems, driving higher storage costs.
As a business matures and adds more and more systems, it needs to consider how its metadata (and not just its data) is governed. Managing metadata provides clear benefits to business users, technical users, and the organization as a whole. I hope this has been a useful intro to the very basics of metadata management. Until next time!