10 Best Practices for Successful Data Quality
Many organizations today are plagued by poor data quality management, which in turn leads to poor decision-making. The costs of bad data add up quickly: poor data quality can cost as much as 15% to 25% of total revenue, according to a study conducted by MIT Sloan.
If bad data is so costly, you would expect organizations to go to great lengths to ensure good data quality. However, Harvard Business Review reported that 47% of newly created data records have at least one critical error. Why? Ensuring good data quality is a challenge that many organizations aren’t focused on.
Data is generated by people, who are inherently prone to error and who each have their own way of handling and formatting data. Additionally, each department in an organization inputs data with its own goals and its own view of what data is important and which errors are acceptable. These data quality best practices will help keep your data on the right track:
- Get buy-in and make data quality an enterprise-wide priority
- Establish metrics
- Investigate data quality failures
- Invest in internal training
- Establish and implement data governance guidelines
- Establish a data auditing process
- Assign a data steward in every department
- Implement a single source of truth
- Integrate and automate data streams
- Leverage the cloud
What is data quality?
Data quality is the degree to which data is error-free and able to serve its intended purpose. Certain properties of data contribute to its quality. Data must be:
- Complete with data in every field unless explicitly deemed optional
- Unique so that there is only one record for a given entity and context
- Formatted the same across all data sources
- Trusted by those that rely on the data
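The first three properties can be checked mechanically; trust is earned over time and is not something code can test. As a minimal sketch, assuming hypothetical customer records, a made-up list of required fields, and an assumed house date format of YYYY-MM-DD, the checks might look like this:

```python
import re
from collections import Counter

# Hypothetical records; field names and values are invented for illustration.
records = [
    {"id": "C001", "email": "ana@example.com", "signup_date": "2023-01-15", "phone": None},
    {"id": "C002", "email": "", "signup_date": "2023-02-01", "phone": "555-0100"},
    {"id": "C001", "email": "ana@example.com", "signup_date": "15/01/2023", "phone": None},
]

REQUIRED_FIELDS = ("id", "email", "signup_date")  # "phone" is treated as optional
DATE_FORMAT = re.compile(r"\d{4}-\d{2}-\d{2}")    # assumed standard: YYYY-MM-DD


def find_violations(rows):
    """Return (row_index, problem) tuples for completeness, uniqueness, and format."""
    violations = []
    id_counts = Counter(row.get("id") for row in rows)
    for i, row in enumerate(rows):
        # Complete: every required field must hold a non-empty value.
        for field in REQUIRED_FIELDS:
            if not row.get(field):
                violations.append((i, f"missing {field}"))
        # Unique: the same id must not appear on more than one record.
        if row.get("id") and id_counts[row["id"]] > 1:
            violations.append((i, "duplicate id"))
        # Formatted the same: dates must follow the assumed standard format.
        signup = row.get("signup_date")
        if signup and not DATE_FORMAT.fullmatch(signup):
            violations.append((i, "bad date format"))
    return violations


violations = find_violations(records)
```

With the sample above, the first and third records are flagged as duplicates of each other, the second is flagged for its empty email, and the third is also flagged for its date format.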
When organizations achieve consistently high-quality data, they are better positioned to make strategic business decisions that yield valuable insights and drive revenue.
1. Get buy-in and make data quality an enterprise-wide priority
If only half of the company is committed to ensuring data quality, then you can expect no better than 50% data quality. All stakeholders must understand and take responsibility for data quality.
To get enterprise buy-in, data quality has to be supported and promoted at every level of management, including the C-suite. If executives and business leaders don’t prioritize data and good data quality, data managers won’t either.
2. Establish metrics
To measure data quality, establish metrics that apply to the goals and business targets you are trying to achieve with your data. Measuring data quality is essential to:
- Advising management on the effectiveness of data quality to gain buy-in
- Understanding how accurate your data is
- Quantifying missing, incomplete, or inconsistent data
- Taking corrective action to improve data quality
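Two metrics that many teams start with are a completeness rate and a duplicate rate. The sketch below is one possible way to compute them and compare them against targets; the sample orders, field names, and target values are all assumptions made for illustration, not recommended thresholds.

```python
def completeness_rate(rows, field):
    """Share of rows where `field` is present and non-empty."""
    if not rows:
        return 0.0
    filled = sum(1 for row in rows if row.get(field) not in (None, ""))
    return filled / len(rows)


def duplicate_rate(rows, key):
    """Share of rows whose `key` value already appeared in an earlier row."""
    if not rows:
        return 0.0
    seen = set()
    duplicates = 0
    for row in rows:
        value = row.get(key)
        if value in seen:
            duplicates += 1
        else:
            seen.add(value)
    return duplicates / len(rows)


# Hypothetical sample data and targets.
orders = [
    {"order_id": 1, "customer_email": "a@example.com"},
    {"order_id": 2, "customer_email": ""},
    {"order_id": 2, "customer_email": "b@example.com"},
    {"order_id": 3, "customer_email": None},
]
targets = {"email_completeness": 0.95, "duplicate_rate": 0.01}

metrics = {
    "email_completeness": completeness_rate(orders, "customer_email"),
    "duplicate_rate": duplicate_rate(orders, "order_id"),
}

# Completeness should be at or above its target; duplicates at or below theirs.
missed = []
if metrics["email_completeness"] < targets["email_completeness"]:
    missed.append("email_completeness")
if metrics["duplicate_rate"] > targets["duplicate_rate"]:
    missed.append("duplicate_rate")
```

Tracking the same numbers over time shows management whether corrective action is working.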
3. Investigate data quality failures
If you fail to investigate data quality failures, errors will continue to occur. Correcting errors in data can be a difficult, time-consuming task, and once the data is corrected, it is tempting to consider the task complete. But a correction fixes only the symptom; unless you find the root cause, the same errors will recur.
Data errors result from a variety of causes. Experian's 2019 Global data management research states that the top causes of inaccurate data are human error, too many data sources, and lack of communication between departments. Once you know the cause of the error, you can take action to prevent similar errors in the future.
4. Invest in internal training
Attaining good data quality is a difficult task. It requires a deep understanding of data quality principles, processes, and technologies. This knowledge is best obtained through formal training. Following the training track for a data management certification such as Certified Data Management Professional (CDMP), Certified Information Management Professional (CIMP), or Certified Data Steward (CDS) would provide a good road map.
Encourage data quality staff to earn a certification so that they better understand:
- Basic concepts, principles, and practices of quality management
- How quality management principles are applied to data
- How to think through both the benefits of high-quality data and the costs of poor quality
- How to create, deliver, and sell a business case for data quality
- The key principles in building data quality organizations
- Basic concepts, principles, and practices of a data stewardship program
- The data quality challenges that are inherent in data integration
5. Establish and implement data governance guidelines
Data governance goes beyond rules and data protection. By definition, data governance is a collection of processes, roles, policies, standards, and metrics that ensure the effective and efficient use of information in enabling an organization to achieve its goals. Every organization should establish a set of data governance guidelines specific to its own processes, use cases, and structure.
The best way to implement these guidelines across an organization is to engage business users, both in following best practices and as members of the data team. A collaborative approach to governance, applied whenever people run reports or use data-driven information, promotes a culture of data quality.
6. Establish a data auditing process
We know that there is great value in good data quality, so we implement processes to create and maintain it. But how do we know that those processes are effective? How do we gain the trust of others that our data quality is good?
Audits on the data within data repositories are the best way to build trust in the data. The data audit process should check for any cases of poor data quality including but not limited to:
- Poorly populated fields
- Incomplete data
- Inconsistencies in formatting
- Duplicate entries
- Outdated entries
The frequency of audits is important to the acceptance and success of the data audit process. If you audit once a year, errors may exist for a year before being found. It would also take a very long time to find, correct, and investigate a full year's worth of errors. Ideally, audits should have an automated, continuous component with periodic incremental audits.
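One way to picture an incremental audit is shown below. This is a sketch under stated assumptions: each record carries a hypothetical `updated` date, field checks run only on records changed since the last audit, and the staleness check runs on every record because old records are exactly the ones at risk of being outdated. The one-year age limit is an arbitrary example value.

```python
from datetime import date, timedelta


def audit(rows, last_audit, today, max_age=timedelta(days=365)):
    """Return (record_id, finding) tuples for one audit run."""
    findings = []
    for row in rows:
        # Outdated entries: checked on every record, not just recent changes.
        if today - row["updated"] > max_age:
            findings.append((row["id"], "outdated"))
        # Incremental part: only records changed since the last audit
        # are re-checked for empty fields.
        if row["updated"] > last_audit:
            empty = [name for name, value in row.items() if value in (None, "")]
            if empty:
                findings.append((row["id"], "empty fields: " + ", ".join(empty)))
    return findings


# Hypothetical repository contents.
accounts = [
    {"id": 1, "name": "Acme", "city": "Lyon", "updated": date(2022, 3, 1)},
    {"id": 2, "name": "Globex", "city": "", "updated": date(2024, 5, 20)},
    {"id": 3, "name": "Initech", "city": "Austin", "updated": date(2024, 5, 2)},
]

findings = audit(accounts, last_audit=date(2024, 5, 1), today=date(2024, 6, 1))
```

Run on a schedule, a check like this keeps each audit small, so errors are found days after they appear rather than a year later.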
7. Assign a data steward in every department
Data stewards are responsible for maintaining the integrity and quality of specified data sets. They make sure that their data sets meet the data quality standards defined by the data governance team. This role is key to ensuring good data quality.
Since data management has historically been the responsibility of IT personnel, data stewards may be found within IT. However, organizations have learned that those closest to the data's origin make better data stewards. For example, a sales administrator or CRM manager may know a customer database better than someone in IT, resulting in more accurate and higher quality data.
8. Implement a single source of truth
A single source of truth (SSOT) is a concept used to ensure that everyone in an organization bases business decisions on the same consistent and accurate data. With critical business decisions being data-driven, it is important that all business units agree on one source they trust to contain accurate, high-quality data. Once the SSOT is accepted as the source of accurate data across the organization, that data can be maintained according to the organization’s data quality standards and used by anyone for any purpose to gain trusted business insights.
9. Integrate and automate data streams
Cloud computing has made it easier to access data from a wide variety of sources. With that capability comes the challenge of integrating disparate data in different formats from multiple data streams, possibly with duplicate and poor-quality data, into a single data repository. To address that challenge, data must be cleansed and de-duplicated to identify and resolve corrupt, inaccurate, irrelevant, or duplicate data. This complex process often requires a data preparation tool to reduce the manual workload. Once the process is established, however, organizations can better ensure data quality.
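To make the cleanse and de-duplicate step concrete, here is a small sketch that merges two hypothetical streams (a CRM export and a web form) into one list. The field names, the choice of email as the matching key, and the "first record wins" rule are all assumptions for illustration; a real data preparation tool applies far richer matching rules.

```python
def normalize(record):
    """Bring a raw record into a common shape: trimmed lower-case email,
    single-spaced title-case name."""
    return {
        "email": record["email"].strip().lower(),
        "name": " ".join(record["name"].split()).title(),
    }


def merge_sources(*sources):
    """Combine sources, keeping the first record seen for each normalized email.
    Records without a usable email are set aside for manual review."""
    merged = {}
    rejected = []
    for source in sources:
        for raw in source:
            record = normalize(raw)
            if "@" not in record["email"]:
                rejected.append(raw)
                continue
            merged.setdefault(record["email"], record)
    return list(merged.values()), rejected


# Hypothetical input streams.
crm = [
    {"email": " Ana@Example.com ", "name": "ana  silva"},
    {"email": "bob@example.com", "name": "Bob Ray"},
]
web_form = [
    {"email": "ana@example.com", "name": "Ana Silva"},
    {"email": "n/a", "name": "Unknown"},
]

clean, rejected = merge_sources(crm, web_form)
```

Normalizing before comparing is the design choice that matters here: the two entries for Ana differ only in case and spacing, so they collapse into one record once both are in the same format.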
10. Leverage the cloud
Data from multiple sources and locations, on and off the corporate network, is used by decision-makers around the globe. If your data quality tools are sitting in one or two corporate data centers, getting consistent data from widespread sources to business analysts around the world comes with needless complexity and latency. Move your data quality tools to the cloud to get them closer to your data sources and users, resulting in higher adoption of the tools and better data quality practices.
A cloud-native solution provides high availability, elasticity to accommodate fluctuating demand, pay-as-you-go pricing, and the benefits of a cloud service provider's ecosystem of services. It also offloads the burden of building, configuring, and maintaining servers and storage, as well as managing hardware and software procurement, updates, and upgrades.
The cloud and the future of data quality
Good data quality is both essential and difficult to achieve. Fortunately, cloud technologies like cloud data warehouses are making it easier to access data, while efficiently and effectively ingesting and preparing data from multiple sources in a wide variety of formats.
Newer cloud technologies like containers and serverless computing make it even easier to leverage data quality tools. It seems inevitable that cloud technology innovation will make hardware and scaling concerns irrelevant to customers of cloud services and increase portability across networks and cloud service providers.
Start improving data quality
So how do you get started on the path to improving data quality? The quickest, easiest, and most effective path to data you can trust is to use a cloud-native suite of apps focused on data integrity and integration, like Talend Data Fabric. Talend Data Fabric solves some of the most complex aspects of the data value chain, end to end. Users can collect data across systems, govern it to ensure proper use, transform it into new formats, improve quality, and share it with internal and external stakeholders, all at the speed of business.
Once you've checked out Talend, go back to our #1 best practice for data quality: gain buy-in and make data quality an enterprise-wide priority. This guide combined with Talend Data Fabric’s capabilities will clear the path for a successful data quality process.