What is Data Quality? Definition, Examples, and Tools
What is data quality?
Data is your organization’s most valuable asset, and decisions based on flawed data can harm the business. That’s why you must have confidence in your data before it’s shared with everyone who needs it. Data quality is a measure of how fit the data is for the specific needs of business users.
Why is data quality important?
Organizations face countless data challenges: an exploding volume of data and data sources, competing data types and structures, and the regulatory compliance requirements that surround that data. But solving these challenges is not enough to make an organization truly data-driven. Without high-quality data, the dashboards and data analytics that organizations rely on to make decisions are incomplete, out of date, or just plain wrong.
The impact of poor data quality
The insights that a business can extract from data are only as good as the data itself. Bad data can come from every area of your organization, in many forms. It makes insights harder to extract and ultimately leads to poor decision-making.
Data quality worries many executives. According to a recent report, nearly half of the organizations surveyed are concerned about the quality of the data they use. Poor data quality is also expensive: a study conducted by MIT Sloan notes that bad data can cost an organization as much as 15% to 25% of its total revenue.
The cost of doing nothing about data quality issues grows rapidly over time. Poor data quality is much easier to mitigate if it is caught early in the data lifecycle, before the data is used. A common rule of thumb, often called the 1-10-100 rule, describes the difference. Suppose standardizing and verifying data at the point of entry, before it reaches your back-end systems, costs about $1. Matching and cleansing the same data after it has been entered then costs about $10 in time and effort. Leaving the bad data in your system, where it keeps supplying incorrect information for decisions, customer communications, and company presentations, costs about $100. The longer bad data sits in the system, the more it costs. The goal, therefore, is to catch bad data before it ever enters your systems.
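The $1, $10, and $100 figures above can be made concrete with a minimal Python sketch. It treats them as relative per-record multipliers, which are a rule of thumb and not measured costs. The batch of 1,000 bad records, the stage names, and the `remediation_cost` function are all hypothetical and exist only for illustration.

```python
# Relative cost of handling one bad record at each stage, following the
# $1 / $10 / $100 rule of thumb. These are illustrative multipliers, not measured costs.
COST_PER_RECORD = {
    "prevention": 1,   # standardized and verified at the point of entry
    "correction": 10,  # matched and cleansed after entry
    "failure": 100,    # left in the system and used for decisions
}

def remediation_cost(bad_records, stage):
    """Return the relative cost of handling `bad_records` bad records at `stage`."""
    return bad_records * COST_PER_RECORD[stage]

if __name__ == "__main__":
    bad_records = 1_000  # hypothetical number of bad records
    for stage in COST_PER_RECORD:
        print(f"{stage}: ${remediation_cost(bad_records, stage):,}")
```

For 1,000 bad records, this prints $1,000, $10,000, and $100,000 for the three stages. That is the hundredfold gap between catching bad data at entry and leaving it in place.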
The good news is that you don’t have to let bad data cost your company any more time and money. Keep the five data quality dimensions described below at the forefront of your data collection initiatives. Doing so promotes optimal performance of business systems and builds users’ trust in the data’s reliability.
Data quality measurement
We know the impact of low-quality data, but what exactly makes data “bad”? It’s easy to point to things like inconsistent or incomplete data. To get the most out of any data quality intervention, however, you need a complete understanding of the biggest factors that determine data quality.
There are five primary dimensions of data quality:
- Completeness — In order for data to be valuable for its intended purpose, it must be sufficiently complete.
- Accuracy — Data accuracy involves ensuring data is correct, reliable, and/or certified by some sort of data governance body.
- Timeliness — Data records should be as recent and fresh as possible; at least recent enough to be relevant for their intended use case.
- Consistency — Data consistency (sometimes referred to as data validity) means that data in a data set is in the same format and stays in that same format between versions and updates. Maintaining consistent data across data sets makes it easier to join and enrich them down the road.
- Accessibility — Data assets must be easily retrievable by the people who need to access them (without compromising compliance requirements).
These dimensions are not exhaustive. Uniqueness of data, for example, is also necessary for good data quality. A data set that fully satisfies the completeness and accuracy dimensions, however, should inherently be free of duplicates.
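As a rough illustration, several of these dimensions can be scored as the share of records that pass a simple check. The Python sketch below relies on several assumptions that are not standards: hypothetical customer records with `email`, `postal_code`, and `updated` fields, a five-digit postal code format, and a 365-day freshness window. Accuracy and accessibility are left out because they usually cannot be judged from the data alone. Accuracy needs a trusted reference to compare against, and accessibility concerns who can retrieve the data. Uniqueness, mentioned above, is included.

```python
import re
from datetime import date, timedelta

def _ratio(hits, total):
    """Return hits / total, treating an empty input as fully compliant."""
    return hits / total if total else 1.0

def completeness(records, field):
    """Share of records where `field` is present and non-empty."""
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return _ratio(filled, len(records))

def uniqueness(records, key):
    """Share of distinct `key` values among all records (1.0 means no duplicates)."""
    return _ratio(len({r[key] for r in records}), len(records))

def consistency(records, field, pattern):
    """Share of non-empty `field` values that match the agreed format `pattern`."""
    values = [r[field] for r in records if r.get(field)]
    return _ratio(sum(1 for v in values if re.fullmatch(pattern, v)), len(values))

def timeliness(records, field, as_of, max_age_days):
    """Share of records whose `field` date is within `max_age_days` of `as_of`."""
    cutoff = as_of - timedelta(days=max_age_days)
    return _ratio(sum(1 for r in records if r[field] >= cutoff), len(records))

if __name__ == "__main__":
    # Hypothetical customer records; field names and values are illustrative only.
    records = [
        {"id": 1, "email": "ana@example.com", "postal_code": "10115", "updated": date(2024, 5, 2)},
        {"id": 2, "email": None, "postal_code": "1011", "updated": date(2021, 1, 9)},
        {"id": 3, "email": "li@example.com", "postal_code": "80331", "updated": date(2024, 4, 28)},
        {"id": 3, "email": "li@example.com", "postal_code": "80331", "updated": date(2024, 4, 28)},
    ]
    print("completeness(email):", completeness(records, "email"))
    print("uniqueness(id):", uniqueness(records, "id"))
    print("consistency(postal_code):", consistency(records, "postal_code", r"\d{5}"))
    print("timeliness(updated):", timeliness(records, "updated", date(2024, 6, 1), 365))
```

Each function returns a score between 0.0 and 1.0. This makes it straightforward to compare the scores against agreed thresholds.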
Setting data quality expectations
Regardless of its size, function, or market, every organization needs to pay attention to data quality to understand its business and make sound business decisions. Data comes in many kinds and from many sources, and its quality affects the business differently depending on what the data is used for and why. That is why your business needs to set its own expectations for each of the five dimensions above. These expectations should be agreed upon collaboratively, based on what you hope to get out of the data.
Data delivers most of its value when it underpins a business process or decisions based on business intelligence. The agreed data quality rules should therefore take into account the value that data provides to the organization. If data is found to be especially valuable in a certain context, more rigorous data quality rules may be required there. Companies must therefore base their data quality standards on three things: the data quality dimensions themselves, any external standards the data must meet, and the impact of not meeting those standards.
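One way to record agreed expectations is as minimum scores per dimension and per business context, with stricter thresholds where the data is more valuable. The sketch below is hypothetical throughout. The contexts (`billing`, `marketing`), the thresholds, and the `failed_expectations` helper are illustrative. The scores are assumed to be ratios between 0.0 and 1.0, such as completeness or timeliness percentages.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Expectation:
    dimension: str   # e.g. "completeness" or "timeliness"
    minimum: float   # lowest acceptable score, between 0.0 and 1.0

# Hypothetical, collaboratively agreed thresholds. Billing data is assumed to be
# more valuable than marketing data, so its rules are stricter.
EXPECTATIONS = {
    "billing": [Expectation("completeness", 0.99), Expectation("timeliness", 0.95)],
    "marketing": [Expectation("completeness", 0.80), Expectation("timeliness", 0.60)],
}

def failed_expectations(context, scores):
    """Return the dimensions whose measured score is below the agreed minimum.
    A dimension with no measured score counts as a failure."""
    return [
        e.dimension
        for e in EXPECTATIONS[context]
        if scores.get(e.dimension, 0.0) < e.minimum
    ]

if __name__ == "__main__":
    measured = {"completeness": 0.90, "timeliness": 0.70}
    print(failed_expectations("billing", measured))    # ['completeness', 'timeliness']
    print(failed_expectations("marketing", measured))  # []
```

The same measured data passes in the marketing context but fails in the billing context. That is the point of tying the rules to the value of the data in each context.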
Data quality best practices
To do this, you need to establish a pervasive, proactive, and collaborative approach to data quality in your company. Every team, not just the technical ones, must be responsible for data quality. The approach has to cover every system and include rules and policies that stop bad data before it ever gets in.
Does this sound impossible? It’s not. Here’s your roadmap to develop this approach:
- Build your interdisciplinary team. Recruit data architects, business people, data scientists, and data protection experts as a core data quality team. Put it under a leader who acts as both team coach and promoter of data quality projects.
- Set your expectations from the start. Why data quality, and what is a good measure of data quality for the particular use case? Find your data quality answers among business people. Make sure you and your team know your finish line. Make sure you set goals with a high business impact.
- Anticipate regulation changes and manage compliance. Stay on top of regulatory compliance requirements and changes. Use your core data quality team to tackle compliance initiatives such as GDPR. This gives you immediate short-term value and strategic visibility.
- Establish impactful and ambitious objectives. When establishing your data quality plan, don’t hesitate to set bold, business-driven objectives. Bold objectives will hold the board’s attention and stretch people’s capabilities.
- Still deliver quick wins. Quick wins start by engaging the business in data management. Examples include onboarding data, migrating data faster to the cloud, or cleansing your Salesforce data.
- Be realistic. Define and actively use measurable KPIs that everyone accepts and understands. Data quality is tied to business, so drive your projects using business-driven indicators such as ROI or Cost-Saving Improvement Rate.
- Celebrate success. When you finish a project with measurable results, take the time to make it visible to key stakeholders. Know-how is good; know-how combined with good communication is even better.
Managing data across the enterprise
Accessing and monitoring data across internal, cloud, web, and mobile applications is a big task. The only way to scale that kind of monitoring across all of those systems is through data integration, but data integration by itself is not sufficient.
A proactive approach to data quality lets you check and measure the level of quality of any data set. When you connect different data sources, data profiling is key to assessing your data for completeness, accuracy, timeliness, and consistency. Profiling answers the question “What does my data look like?”. It also saves time and helps you spot inaccuracies quickly.
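A basic profile can be computed with the Python standard library alone. This minimal sketch assumes that rows arrive as dictionaries, which is a hypothetical shape chosen for illustration. For each column, it reports the number of missing values, the number of distinct values, the Python types observed, and the most frequent values. Mixed types or near-duplicate spellings in the output are typical early signs of consistency problems.

```python
from collections import Counter

def profile(rows):
    """Summarize each column of a list of dict rows."""
    columns = sorted({key for row in rows for key in row})
    report = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        present = [v for v in values if v not in (None, "")]
        report[col] = {
            "null_count": len(values) - len(present),              # missing or empty values
            "distinct_count": len(set(present)),                    # distinct non-empty values
            "types": sorted({type(v).__name__ for v in present}),   # observed Python types
            "top_values": Counter(present).most_common(3),          # most frequent values
        }
    return report

if __name__ == "__main__":
    rows = [
        {"country": "US", "age": 34},
        {"country": "usa", "age": "34"},
        {"country": None, "age": 51},
    ]
    for column, stats in profile(rows).items():
        print(column, stats)
```

In this example, the profile reports two types (`int` and `str`) for `age` and two distinct spellings for `country`. Both findings would be candidates for curation.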
Next comes data curation, which covers organizing and managing data sets to meet business needs. Typically, data is cleansed, standardized, converted, split, formatted, and otherwise transformed so that it can be easily consumed.
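A minimal curation sketch might standardize a raw record in four ways: trimming whitespace, splitting a full name, normalizing letter case, and mapping country spellings to a single code. The alias table, field names, and `standardize` function below are hypothetical. Production cleansing usually relies on much larger reference data.

```python
# Hypothetical alias table mapping observed spellings to ISO 3166-1 alpha-2 codes.
COUNTRY_ALIASES = {
    "us": "US", "usa": "US", "united states": "US",
    "de": "DE", "germany": "DE",
}

def standardize(record):
    """Return a cleansed, standardized copy of a raw customer record."""
    # Trim and collapse whitespace, then split into first name and the rest.
    full_name = " ".join((record.get("name") or "").split())
    first, _, last = full_name.partition(" ")
    # Normalize country spelling; unknown values become None for later review.
    country_key = (record.get("country") or "").strip().lower()
    return {
        "first_name": first.title(),
        "last_name": last.title(),
        "country": COUNTRY_ALIASES.get(country_key),
        # Lowercase emails (a common simplification) so duplicates compare equal.
        "email": (record.get("email") or "").strip().lower() or None,
    }

if __name__ == "__main__":
    raw = {"name": "  ada   LOVELACE ", "country": "USA", "email": " Ada@Example.com "}
    print(standardize(raw))
```

Returning a new dictionary rather than mutating the input keeps the raw record available for auditing. Unrecognized values are mapped to `None` rather than guessed, so they can be routed to review.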
Then there’s data validation, which applies business-centric rules to ensure that the data conforms to the relevant standards and stays within acceptable business parameters. In many cases, third-party validation functions are also needed to verify entities such as addresses and postal codes, and even to enrich data. Enrichment blends data with other data sources, filling in gaps and improving the overall context. If sensitive data is processed, data masking or obfuscation functions are also needed.
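Validation rules and masking can be sketched in the same way. In this hypothetical Python example, the business rules are assumptions chosen for illustration: a US ZIP code format and an order total greater than 0 and at most 50,000. Third-party address verification and enrichment are not shown. `mask_email` is a simple partial-masking function, not a substitute for a proper obfuscation or tokenization service.

```python
import re

# Hypothetical business rules, chosen for illustration only.
ZIP_PATTERN = re.compile(r"\d{5}(-\d{4})?")  # US ZIP or ZIP+4
MAX_ORDER_TOTAL = 50_000

def validate(record):
    """Apply business rules to a record and return a list of violations."""
    errors = []
    if not ZIP_PATTERN.fullmatch(record.get("postal_code") or ""):
        errors.append("postal_code is not a valid US ZIP code")
    total = record.get("order_total") or 0
    if not 0 < total <= MAX_ORDER_TOTAL:
        errors.append(f"order_total must be in (0, {MAX_ORDER_TOTAL}]")
    return errors

def mask_email(email):
    """Partially mask an email address, keeping the first character and domain."""
    local, sep, domain = email.partition("@")
    if not sep or not local:
        return "***"  # not a recognizable address: hide it entirely
    return local[0] + "***@" + domain

if __name__ == "__main__":
    print(validate({"postal_code": "9410", "order_total": 0}))
    print(mask_email("ada@example.com"))  # a***@example.com
```

Returning a list of violations, instead of raising on the first failure, lets a pipeline report every broken rule for a record at once.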
Finally, you need data observability. It covers completeness, accuracy, uniqueness, timeliness, and anomalies in the data itself. It also covers changes to data pipelines, data infrastructure, data lineage, and availability.
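Observability tools track far more than this, including lineage, schema changes, and infrastructure health. The sketch below covers only two common signals, freshness and volume. It uses a simple z-score rule as a stand-in for whatever anomaly detection a real tool applies. The function names, thresholds, and row counts are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from statistics import mean, stdev

def check_freshness(last_loaded_at, max_delay, now):
    """Return True if the data was loaded no more than `max_delay` before `now`."""
    return now - last_loaded_at <= max_delay

def is_volume_anomaly(history, today, z_threshold=3.0):
    """Flag `today`'s row count if it is more than `z_threshold` standard deviations
    away from the mean of recent daily row counts in `history`."""
    if len(history) < 2:
        return False  # too little history to estimate the spread
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return today != mu  # perfectly stable history: any change is unusual
    return abs(today - mu) / sigma > z_threshold

if __name__ == "__main__":
    now = datetime(2024, 6, 1, 12, tzinfo=timezone.utc)
    last_load = datetime(2024, 6, 1, 2, tzinfo=timezone.utc)
    print("fresh:", check_freshness(last_load, timedelta(hours=6), now))  # False: 10 hours old
    daily_rows = [10_000, 10_250, 9_900, 10_100, 10_050]  # hypothetical recent counts
    print("anomaly:", is_volume_anomaly(daily_rows, today=2_000))  # True
```

A z-score on a short history is crude. It assumes roughly stable daily volumes and would need seasonality handling for weekly or monthly patterns.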
With the right data quality tools and integrated data, you can set up automated checks that act as whistleblowers, detecting some of the root causes of data quality problems early. The cost of bad data quality can be counted in lost opportunities, bad decisions, and the time it takes to hunt down, cleanse, and correct bad data. Collaborative data management and tools that correct errors at their point of origin are the clearest ways to ensure good data quality for everyone who needs it. Learn about the numerous capabilities Talend’s data management platform offers to help achieve both of those goals.