What is Metadata?
Data can be a double-edged sword. While the increase in availability of data makes it possible to make more effective business decisions, wading through a data lake is a huge pain point for technologists and analysts alike. This is where metadata can be a lifesaver.
Metadata is simply data about data. It is information that helps find, organize, maintain, and compare data.
Metadata helps address basic questions about the data such as what, when, why, who, where, which, and how. By answering these questions, it helps better characterize the data. For example, a document or spreadsheet may have attributes such as author, date created, date modified, and language that function as its metadata.
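As a small illustration, the sketch below reads a few metadata attributes of a file using only Python's standard library. The function name `describe_file` and the sample file `report.csv` are made up for this example. Attributes such as author or language are usually stored inside the document format itself, not in the file system, so they are not covered here.

```python
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def describe_file(path: Path) -> dict:
    """Return a few metadata attributes of a file without reading its contents."""
    info = path.stat()
    return {
        "name": path.name,
        "extension": path.suffix,
        "size_bytes": info.st_size,
        # Last-modified time reported by the file system, converted to UTC.
        "date_modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
    }


# Demo on a throwaway file (hypothetical name and contents).
with tempfile.TemporaryDirectory() as tmp:
    report = Path(tmp) / "report.csv"
    report.write_bytes(b"region,revenue\nEMEA,100\n")
    for field, value in describe_file(report).items():
        print(f"{field}: {value}")
```

Note that the function answers "what" and "when" about the file without ever opening it, which is the point of metadata.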
In most real-life applications, users rarely access metadata directly. During a web search, for example, search algorithms use metadata to show relevant results to the user. Similarly, when managing big data for compliance purposes, data platforms use metadata to categorize data for organization and governance.
How is Metadata Used?
By widely cited estimates, 2.5 quintillion bytes of data are created every day, and 90 percent of the data in the world was created in the previous two years alone. Amid this data boom, a metadata management tool is a valuable mechanism for finding data faster. It achieves that objective by adding context and provenance to content.
For example, in a blood collection lab, information such as the owner, the date of collection, and the tests to be performed is attached to the sample as soon as it is collected. Without this metadata, the actual data (the sample, in this case) becomes meaningless.
Let’s look at a few scenarios where metadata can be used to improve big data processes:
1. Big Data Master Record Management
Big data ingested from multiple sources takes a lot of work to make sense of. Records retrieved from different sources can conflict, for example, which makes creating a single master record essential for data integrity. There is also the complexity of understanding semi-structured data (such as XML files) and unstructured data (such as videos).
Metadata can be beneficial in both scenarios, making it critical for the success of big data governance. Fields such as ‘ingested source’ and ‘date modified’ can help resolve conflicts by determining which source system is more reliable in a given situation and which record is the most recent and up to date.
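A minimal sketch of this kind of conflict resolution might look like the following. The record fields, the `SOURCE_PRIORITY` ranking, and the rule "most recent wins, source trust breaks ties" are all assumptions made for illustration; a real master data process would define its own policy.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical ranking of source systems: a lower number means more trusted.
SOURCE_PRIORITY = {"crm": 0, "billing": 1, "web_form": 2}


@dataclass(frozen=True)
class CustomerRecord:
    customer_id: str
    email: str
    ingested_source: str  # metadata: which system supplied the record
    date_modified: date   # metadata: when the record last changed


def source_rank(record: CustomerRecord) -> int:
    """Unknown sources are ranked below every known source."""
    return SOURCE_PRIORITY.get(record.ingested_source, len(SOURCE_PRIORITY))


def pick_master(records: list[CustomerRecord]) -> CustomerRecord:
    """Most recently modified record wins; the more trusted source breaks ties."""
    return max(records, key=lambda r: (r.date_modified, -source_rank(r)))


conflicting = [
    CustomerRecord("C-1", "old@example.com", "crm", date(2023, 1, 5)),
    CustomerRecord("C-1", "new@example.com", "web_form", date(2023, 3, 9)),
    CustomerRecord("C-1", "alt@example.com", "billing", date(2023, 3, 9)),
]
print(pick_master(conflicting).email)  # alt@example.com
```

The decision uses only the two metadata fields; the business content of each record (here, the email address) is never compared.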
2. Access Control Management
As data privacy becomes a growing concern in the cloud, organizations need to control who can access which data. One of the simplest ways to achieve this is with metadata. In the big data world, this eliminates the need to process a lot of data to make an access decision. For example, consider an application that collects and stores employees’ salary data. The company mandates that only human resources professionals can be privy to this confidential data. A metadata field such as ‘department code’ can be sufficient to control this access.
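The salary example can be sketched as a check that looks only at metadata. The class names, the `allowed_departments` field, and the department codes below are hypothetical; real platforms typically express this through their own policy or role configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetMetadata:
    name: str
    allowed_departments: frozenset[str]  # metadata attached to the data set


@dataclass(frozen=True)
class User:
    username: str
    department_code: str  # metadata attached to the user


def can_access(user: User, dataset: DatasetMetadata) -> bool:
    """Decide access from metadata alone; the data set's rows are never read."""
    return user.department_code in dataset.allowed_departments


salaries = DatasetMetadata("employee_salaries", frozenset({"HR"}))
print(can_access(User("asha", "HR"), salaries))   # True
print(can_access(User("ben", "ENG"), salaries))   # False
```

Because the check touches two small metadata values instead of the salary table itself, its cost does not grow with the size of the data.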
3. Business Intelligence
Business intelligence is often associated with the crunching of actual data, not the metadata. However, metadata can be surprisingly useful in identifying patterns and making appropriate recommendations.
In a manufacturing organization, simple metadata such as dates and timestamps can show which machinery is idle, which has had the most downtime, and which is due for maintenance. This information may also help identify better optimization models to increase throughput.
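To make the idea concrete, here is a small sketch that derives idle time from nothing but job timestamps. The event log format `(machine_id, job_start, job_end)` and the machine names are assumptions for this example, and idle time is counted only between a machine's first and last job.

```python
from datetime import datetime, timedelta

# Hypothetical event log: (machine_id, job_start, job_end).
EVENTS = [
    ("press-1", datetime(2024, 5, 6, 8, 0), datetime(2024, 5, 6, 9, 0)),
    ("press-1", datetime(2024, 5, 6, 9, 30), datetime(2024, 5, 6, 11, 0)),
    ("press-1", datetime(2024, 5, 6, 13, 0), datetime(2024, 5, 6, 14, 0)),
    ("lathe-2", datetime(2024, 5, 6, 8, 0), datetime(2024, 5, 6, 12, 0)),
]


def idle_time(events, machine_id: str) -> timedelta:
    """Sum the gaps between one machine's jobs, tolerating overlapping jobs."""
    jobs = sorted((start, end) for m, start, end in events if m == machine_id)
    total = timedelta()
    busy_until = None
    for start, end in jobs:
        if busy_until is not None and start > busy_until:
            total += start - busy_until  # machine sat idle during this gap
        busy_until = end if busy_until is None else max(busy_until, end)
    return total


print(idle_time(EVENTS, "press-1"))  # 2:30:00
print(idle_time(EVENTS, "lathe-2"))  # 0:00:00
```

Ranking machines by this figure is one simple way to spot candidates for rescheduling or maintenance review.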
4. GDPR Compliance
The GDPR went into effect on May 25, 2018. Non-compliance with the regulation can lead to heavy fines of up to €20 million or four percent of the organization’s worldwide annual revenue, whichever is higher. GDPR mandates that organizations protect the privacy of customer data, share how they store and use this data, and take accountability for how they manage it. Creating a data taxonomy is one of the most important steps toward GDPR compliance, and that taxonomy consists entirely of metadata.
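As a rough sketch of what "a taxonomy is metadata" means in practice, the example below tags fields with a category and then lists the ones that need privacy handling. The category names, field names, and classifications are invented for illustration and are not a legal classification of any real data set.

```python
from enum import Enum


class Category(Enum):
    PERSONAL = "personal"
    SPECIAL_CATEGORY = "special_category"
    NON_PERSONAL = "non_personal"


# Hypothetical taxonomy: field name -> category. This mapping is pure metadata.
TAXONOMY = {
    "customer.email": Category.PERSONAL,
    "customer.health_notes": Category.SPECIAL_CATEGORY,
    "order.total": Category.NON_PERSONAL,
}


def fields_in_scope(taxonomy: dict) -> list:
    """List the fields whose category calls for privacy controls."""
    return sorted(
        name for name, category in taxonomy.items()
        if category is not Category.NON_PERSONAL
    )


print(fields_in_scope(TAXONOMY))
# ['customer.email', 'customer.health_notes']
```

Once such a mapping exists, questions like "where do we store personal data?" can be answered by querying the taxonomy instead of scanning the data itself.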
Metadata in the Real World: Improving Customer Service
One company that has used metadata well is Air France-KLM. The airline set a goal to become the #1 airline for customer service, but with over 90 million passengers per year, its customer service data would need to be nimble.
Metadata helped the airline make huge strides toward that goal. By effectively organizing customer data from trip searches, bookings, social media, airport lounge interactions, and more, the airline created a 360-degree view of each customer. Air France-KLM is now equipped to make individualized recommendations and provide unique service to each of its customers.
Metadata: Today and Tomorrow
The concept of metadata has been around since 280 BC, when a Greek grammarian in the Library of Alexandria attached a small tag to the end of each scroll with information about the author and title, so that library users did not have to unroll each scroll to see its properties.
In the technical world, metadata was long perceived as something used solely by database administrators and application developers to define relationships among data entities.
Today, metadata has evolved from its purely technical scope to create a strong business impact. For cloud-based solutions with massive data lakes, metadata plays a crucial role in decoding data. Data analysts use metadata to derive business insights from raw data dumps.
Uncovering the Power of Metadata
One of the biggest challenges for senior leadership is to prioritize metadata management. Often, it can get lost among other big data initiatives, but the increasing reliance on metadata means that there has to be a clear strategy to collect, store, and maintain it accurately.
Several approaches can help unleash the potential of metadata: building understanding and awareness of metadata, ensuring data quality, defining roles and assigning ownership, and investing in systems that provide a strong framework for metadata management.
At the other end of the spectrum, the debate on privacy and security in the cloud is worth thinking about. Mitigating risk will remain a key challenge as long as sensitive customer information is available for easy access. While regulations such as GDPR aim to close this gap, organizations also need a nuanced, sensitive approach to handling metadata, along with an unambiguous data governance program.
Talend Metadata Manager is a powerful, centralized tool that connects all the metadata from multiple platforms, databases, and analytical tools to generate a holistic view.