What is data health?
If being surrounded by data were all it took to ensure better business decisions, we would have it made. As we are learning every day, data collection alone is not enough to turn a data-rich organization into a data-driven organization.
The irony of the big data era is that the more data an organization manages, the harder it becomes to stay on top of that data. The only way to meet fundamental business goals is to take action based on high-quality, trusted data: in a word, data that’s healthy.
Ask any organization how they measure the health of their business, and they will list metrics backed by the data they run their business on. Most people know intuitively that healthy data should be clean, complete, and in compliance with legal and regulatory requirements. Unfortunately, those factors alone won’t guarantee that data is ready to use for business operations. Most organizations can’t measure how healthy their data is — and it's foolish to rely on data whose health you can’t measure.
Part of the problem is that while people think they understand what it means to have healthy data, they struggle to define or evaluate data health. So let’s start with a clear definition.
Data health definition
Data health describes how well an organization’s data supports its business objectives. Data is healthy if it is easily discoverable, understandable, and of value to the people who need to use it, and if these characteristics are sustained throughout its lifecycle. You’ll know that your organization’s data is healthy when you can prove that it’s valid, complete, and of sufficient quality to produce analytics that decision-makers can comfortably rely on.
Talend’s vision of data health combines technologies and behaviors to measure and manage data for better discoverability, understandability, and value. Healthy data means that everyone in the organization can access the information they need, when they need it, and use it without wondering about its validity.
Like any healthcare system, data health involves monitoring and intervention across the entire lifecycle. Organization-wide data health is only possible when an organization combines three key elements:
- Data agility to quickly deliver data to those who need it. Agility requires a flexible, scalable environment with end-to-end lifecycle management.
- Data culture shared throughout the organization. Culture depends on the data literacy of every employee and a shared understanding of the origin, importance, and meaning of various data points, data sets, and data sources.
- Data trust within and across departments. Trust comes from data that’s visible and verifiable across lines of business, and gives data experts and users confidence in actions and decisions.
Risk factors for good data health
It may sound straightforward, but most organizations struggle to maintain good data health. Much like the healthcare industry, the data management industry must constantly evolve and adapt to keep pace with new threats and a constantly shifting landscape.
To meet these challenges, we must first understand the dynamics in the market that pose the greatest risk to organizations that don’t make data health a priority:
The speed and flow of data are on the rise
Data demand is driven by the business, and every business unit is demanding more data and access to real-time data. As a result, data environments are growing increasingly complex. At the same time, the speed at which data flows must increase so that the organization can respond faster to business opportunities and threats. An organization’s data infrastructure must be nimble enough to handle this load.
People with varying levels of data skills are getting involved in data management
Data management now involves people who are not data engineers. This is driven on the demand side by data consumers who have become citizen data integrators to self-serve their own data needs, and on the supply side by strapped IT teams that can’t build solutions for skills gaps fast enough to meet demand. That’s why a robust data architecture should include capabilities that reduce reliance on niche technical skills and be supported by a culture that fosters the understanding and usage of data by every employee.
Organizations have accelerated the move to hybrid and multi-cloud environments
Thanks to the flexibility of cloud services, hybrid and multi-cloud environments can increase productivity and optimize costs by cutting investments in on-premises hardware. Industry analysts predicted that by 2022, 75% of all databases would be deployed or migrated to a cloud platform. In these complex and shifting data environments, an organization’s data management strategy needs to be well defined but also flexible.
The regulatory environment is continuously changing
Given the rise in privacy concerns and regulations (GDPR, CCPA, and others), the lack of a comprehensive, consistent approach to data governance and quality slows processes further and creates significant risk exposure. Data experts are left to their own devices, and employees are getting their hands on data they don’t understand or should not have access to. All the while, data and IT leaders have no control over the health of their enterprise data.
Advantages of healthy data
When organizations talk about data, they usually have a specific objective or initiative in mind. Every initiative has an associated business outcome: increased revenue, reduced costs, or mitigated risk.
These initiatives fall into a few common categories that will be familiar to most data users:
Moving data to obtain faster time to insights involves the whole data lifecycle that makes analytics possible within an organization, from acquisition of raw data to delivery of trusted data for reports and models, and everything in between. These are some common initiatives for organizations that are ready to turn their data into business results:
- Corporate reporting and enterprise analytics projects
- Marketing funnel optimization
- Pricing optimization
- Selecting the next logical purchase
- Real-time fraud detection
- Churn management
- Preventive maintenance
- Single customer view and “customer 360” projects
Modernizing cloud and data
Apps and technology can make data more scalable, adaptable, and agile, but only by moving and managing data in a cloud, hybrid, or multi-cloud environment. Cloud modernization projects represent a unique opportunity to modernize the data as well. Failure to take advantage of cloud modernization means slower time-to-value, lost productivity due to suboptimal data orchestration efforts, and decreased business agility.
Here are some of the key data initiatives organizations tackle as part of their larger cloud projects:
- App modernization
- App decommissioning
- IT infrastructure modernization
- IT cost optimization
- Data sovereignty
- Data monetization
Establishing data excellence
Maintaining the delicate balance between access and security often means that IT and data teams focus more on solving data access issues across systems and less on the foundational job of ensuring the business is running on trusted, accurate data. This results in lost productivity and agility, and it delays organizations in achieving their desired business objectives.
A healthy organization must institute centralized standards, programs, and processes to balance IT priorities and ensure that data is compliant and secure as well as accessible and understandable. A few common initiatives around this goal include the following:
- Data platform strategies
- Data governance initiatives
- Access management
- Data quality
- Business data glossary / data literacy initiatives
- Data marketplace
- Compliance management with regulations such as GDPR, CCPA, and others
Accelerating operational data
Every organization wants to make data available, accessible, and consumable, internally and externally, through app integration and API delivery. But employees, partners, and customers often do not have access to the data they need when they need it. With the growing number of systems, sources, endpoints, data volumes, and use cases, data teams spend much of their time working across many apps, which in turn reduces team productivity.
These are some common data initiatives to help organizations share the right data across systems and people more quickly and efficiently, internally and externally:
- Data sharing between apps
- Data monetization
- Unified business reporting
- Inter-enterprise data acquisition
- App modernization
With data health metrics to prove the business value of data, an organization can improve nearly any aspect of its operations. But without healthy data, all of those processes go awry. You can’t address the right customers, shorten sales cycles, or improve processes if the available data you’re basing your work on is inaccurate, uncontrolled, or out of date. Unhealthy data costs companies time and quality in their decision-making, which adds costs and can negatively affect revenue. As you scale up to using big data, the health of the data becomes increasingly important. It is critical for companies working with big data to institute health metrics.
Measuring data health
Data quality is a major consideration for data health. The Data Management Association of the UK defines six dimensions for measuring data quality:
- Accuracy: The degree to which data correctly describes the real-world object or event being described
  - Example: Are the calculations of employees’ wages based on their actual work hours?
- Completeness: The proportion of data stored in a dataset against the potential for 100%
  - Example: Do address records contain data in all address fields necessary to get a postal mailing to its destination? Full postal code? Country name?
- Consistency: The absence of difference, when comparing two or more representations of a thing against a definition
  - Example: Does one table contain data characterized as belonging to a particular division, even though that division has been eliminated after a reorganization?
- Timeliness: The degree to which data represents reality from the required point in time
  - Example: If budget decisions are made based on sales statistics, how quickly is sales data made available to decision makers?
- Uniqueness: No item, or entity instance, is recorded more than once based upon how that thing is identified
  - Example: When a system updates a record, can you be sure it isn’t creating a duplicate of the original record with more current information?
- Validity or conformity: The degree to which data conforms to the syntax (format, type, or range) of its definition
  - Example: A street address of 1000 Data Way is valid (though not necessarily accurate), while an address of /03H8 Data Way is not.
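To make a few of these dimensions concrete, here is a minimal Python sketch that scores a small set of address records for completeness, uniqueness, and validity. Everything in it is an assumption made for illustration: the sample records, the field names, the choice of `id` as the identifier, and the rule that a valid street address begins with a house number. Real rules would come from your own data definitions.

```python
import re

# Hypothetical sample records; field names and values are illustrative only.
records = [
    {"id": 1, "street": "1000 Data Way", "postal_code": "94000", "country": "US"},
    {"id": 2, "street": "/03H8 Data Way", "postal_code": "", "country": "US"},
    {"id": 3, "street": "12 Main Street", "postal_code": "75001", "country": None},
    {"id": 1, "street": "1000 Data Way", "postal_code": "94000", "country": "US"},
]

REQUIRED_FIELDS = ("street", "postal_code", "country")
# Assumed syntax rule: a street address starts with a house number.
STREET_PATTERN = re.compile(r"^\d+\s+\S.*$")


def is_populated(value):
    """Treat None and empty strings as missing."""
    return value not in (None, "")


def completeness(rows, fields):
    """Share of required field values that are actually populated."""
    total = len(rows) * len(fields)
    filled = sum(1 for row in rows for f in fields if is_populated(row.get(f)))
    return filled / total if total else 0.0


def uniqueness(rows, key):
    """Share of rows with a distinct identifier (1.0 means no duplicates)."""
    if not rows:
        return 0.0
    return len({row[key] for row in rows}) / len(rows)


def validity(rows, field, pattern):
    """Share of populated values that conform to the expected syntax."""
    values = [row[field] for row in rows if is_populated(row.get(field))]
    if not values:
        return 0.0
    return sum(1 for v in values if pattern.match(v)) / len(values)


scores = {
    "completeness": completeness(records, REQUIRED_FIELDS),
    "uniqueness": uniqueness(records, "id"),
    "validity": validity(records, "street", STREET_PATTERN),
}
for name, score in scores.items():
    print(f"{name}: {score:.2f}")
```

Each function returns a ratio between 0 and 1, so the results can be tracked over time or compared across datasets. Accuracy, consistency, and timeliness usually need a reference source or timestamps to check against, so they are left out of this sketch.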
Data teams must make their own assessments of the necessary level of data quality to qualify for data health — and they should be able to certify that level of quality to data users, so they in turn can be confident using the data. Remember, though, that data that is sound but not available or trusted is still not supporting business decisions. It isn’t healthy data.
Since data health is a measure of data’s value to the business, transparency and accessibility are as important as quality. If decision makers don’t have ready access to the data they need, the organization may as well not have that data. On the other hand, privacy requirements for personally identifiable information (PII) may apply, and in those cases it is best to isolate some data from unprivileged users. A strong data governance platform that enlists relevant business experts as data stewards can help improve data accuracy and security alike.
At your organization, data health metrics may include additional factors such as reasonability and integrity. Whatever factors you include, the point is to be able to rely on your data to be useful across the enterprise. The higher you can rate your data across each of these dimensions, the healthier you can consider your data.
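One way to turn per-dimension ratings into a single reading is a weighted average, sketched below in Python. This is an assumption rather than a formula the article prescribes: the ratings, the 0-to-1 scale, and the optional weights are all hypothetical, and an organization might reasonably prefer a different aggregation, such as taking the lowest-rated dimension.

```python
# Hypothetical ratings on a 0-to-1 scale, one per quality dimension.
ratings = {
    "accuracy": 0.9,
    "completeness": 0.8,
    "consistency": 1.0,
    "timeliness": 0.6,
    "uniqueness": 0.75,
    "validity": 0.75,
}


def health_score(ratings, weights=None):
    """Weighted average of per-dimension ratings (equal weights by default)."""
    if not ratings:
        raise ValueError("at least one dimension rating is required")
    weights = weights or {name: 1.0 for name in ratings}
    total_weight = sum(weights[name] for name in ratings)
    weighted_sum = sum(ratings[name] * weights[name] for name in ratings)
    return weighted_sum / total_weight


def weakest_dimension(ratings):
    """Name of the lowest-rated dimension, a natural place to intervene first."""
    return min(ratings, key=ratings.get)


overall = health_score(ratings)
print(f"overall health: {overall:.2f}")
print(f"weakest dimension: {weakest_dimension(ratings)}")
```

Additional factors such as reasonability or integrity can be added as extra keys in `ratings`. Passing a `weights` dictionary lets a team emphasize the dimensions that matter most for a given use case.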
Data health assessment
Once you know what to measure, how do you go about assessing the well-being of your data?
A holistic data health system relies on universal metrics of data quality. With standard metrics, evaluation of data’s trustworthiness and actionability becomes possible. As described above, it is not enough for those preparing corporate data to know that the data meets quality standards. End users can only truly trust their decisions when they have metrics proving data quality.
Talend’s 2021 Data Health Survey revealed that fewer than half of executives are certain that their company even uses data quality standards. About a third of executives said there were no documented standards in place, and 19% more said they weren’t sure. When asked if they saw a need for universal, cross-industry data quality standards, 95% of executives agreed.
Given the volume of data your organization is probably managing through SaaS platforms, databases, and public-facing web servers, it’ll be impossible to have someone examine every record across all datasets. The best approach is to employ a data platform that includes both data integration and governance capabilities.
You can use the software both to get a reading on data health and to cure unhealthy data. Ideally, you should be able to get instant insight into what data you can trust and have tools to fix the data you can’t. The platform should address data health issues by offering self-service access, pervasive data quality tools, and comprehensive governance capabilities that span all data flows and data sources from end to end.
How healthy is your data?
Do you have confidence in your organization’s ability to deliver decision-ready data? Do you wonder about your data health statistics? Talend can help. Start with a free checkup: export a subset of your data and run it through the Talend Trust Assessor. This free service provides a rapid evaluation of the validity, completeness, and uniqueness of your data. If you just want to see how it works, try it with our sample dataset first.