Can you trust your organization’s data?
Data health research conducted in 2021 shows that 60% of business executives don’t always trust their company’s data. More than a third still don’t base most of their decisions on data. This is a crisis for organizations across industries worldwide. How can a decision-maker who doesn’t trust the data trust their own decisions?
Let’s start by looking at what that data is. As the world becomes increasingly data-driven, organizations’ networks have become saturated with information. Some of the data comes in through SaaS and web applications. Other data comes directly from people’s data entry (think web forms). More and more data comes from machines, such as smartphones and Internet of Things (IoT) devices. Estimates put the amount of data created annually in the zettabytes. A zettabyte is a billion terabytes. That is a ridiculous amount of data.
Manual data quality management simply can’t scale to those volumes. People make mistakes. Machines aren’t foolproof. Additionally, the data often passes through complex information systems coded by multiple developers. That raises the risk of buggy code introducing errors.
But what does it mean to trust your data?
Data trust definition
Data trust means having confidence that your organization’s data is healthy and ready to act on.
Trust is the key to making successful use of your data. By ensuring trust in corporate data, an organization provides its teams the ability to design exceptional customer experiences, improve operations, ensure compliance, and drive innovation. But data trust must be earned and quantified. It can’t be taken on faith. Before trusting corporate data, you should prove that it can produce reliable analytics to support well-informed business decisions.
How do you measure data trust? The Data Management Association of the UK defines six dimensions of data quality:
- Accuracy — The degree to which data correctly describes the real-world object or event in question
- Example: Say an accounting system expects dates in the US format MM/DD/YYYY. Data entered in the European DD/MM/YYYY format could lead to an invoice due May 8th not being paid until August 5th.
- Completeness — The proportion of data stored against the potential for being 100% complete
- Example: Blank values indicate that certain data has not been populated. An address record with 300 rows and 12 missing postal codes would have usable data for 288 addresses, and a completeness rate of 288/300, or 96%.
- Consistency — The absence of difference when comparing two or more representations of an item against a definition
- Example: Do an organization’s HR, legal, and finance teams all use one date format, or would the same date appear as 11/12/2022, 12/11/22, and 22-NOV-12 in reports generated by different departments?
- Timeliness — The degree to which data is current enough to represent reality as needed to support business functions
- Example: In a field that represents company earnings, it’s vital to have access to the latest data. What is the delay in providing that data — is it on the order of minutes, days, or weeks?
- Uniqueness — No item, or entity instance, is recorded more than once based upon how that item is identified
- Example: Duplication of a single customer’s record based on multiple entries, such as A. Lee, Alan R. Lee, and Alan Lee appearing as three individuals with the same address and contact information.
- Validity or conformity — The degree to which data conforms to the syntax (format, type, or range) of its definition
- Example: A street address of 1000 Integration Drive is valid, though not necessarily accurate. A street address of H/*27 Integration Drive is not valid.
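The checks behind most of these examples can be sketched in a few lines of Python. This is a minimal illustration, not a Talend feature: the department date formats, the address pattern, the freshness window, and the rule that a shared address and phone number mark a probable duplicate are all assumptions made for the sketch. Note that code alone can't prove accuracy, which needs a trusted reference; the sketch only shows how format ambiguity silently produces a wrong date.

```python
import re
from collections import defaultdict
from datetime import date, datetime, timedelta

# Accuracy: an invoice due 8 May 2022, typed in European DD/MM/YYYY order,
# is misread by a system that assumes US MM/DD/YYYY order.
def read_as_us(text: str) -> date:
    return datetime.strptime(text, "%m/%d/%Y").date()

# Completeness: fraction of records whose field is populated (not None or blank).
def completeness(records: list[dict], field: str) -> float:
    filled = sum(1 for r in records if str(r.get(field) or "").strip())
    return filled / len(records)

# Consistency: assumed date format per department, normalized to ISO 8601.
DEPT_FORMATS = {"hr": "%m/%d/%Y", "legal": "%d/%m/%y", "finance": "%y-%b-%d"}

def to_iso(text: str, dept: str) -> str:
    return datetime.strptime(text, DEPT_FORMATS[dept]).date().isoformat()

# Timeliness: is the latest data no older than the business allows?
def is_timely(last_updated: datetime, now: datetime, max_age: timedelta) -> bool:
    return now - last_updated <= max_age

# Uniqueness: records that share an address and phone are probable duplicates.
def probable_duplicates(customers: list[dict]) -> list[list[str]]:
    groups: dict[tuple[str, str], list[str]] = defaultdict(list)
    for c in customers:
        key = (c["address"].strip().lower(), c["phone"])
        groups[key].append(c["name"])
    return [names for names in groups.values() if len(names) > 1]

# Validity: an assumed rule that a street address is a house number
# followed by a street name made of letters, spaces, periods, apostrophes, hyphens.
ADDRESS_PATTERN = re.compile(r"\d+ [A-Za-z][A-Za-z .'-]*")

def is_valid_address(text: str) -> bool:
    return ADDRESS_PATTERN.fullmatch(text) is not None

if __name__ == "__main__":
    print(read_as_us("08/05/2022"))  # 2022-08-05, not 2022-05-08

    addresses = [{"postal_code": "12345"}] * 288 + [{"postal_code": ""}] * 12
    print(completeness(addresses, "postal_code"))  # 0.96

    print({to_iso("11/12/2022", "hr"), to_iso("12/11/22", "legal"),
           to_iso("22-NOV-12", "finance")})  # {'2022-11-12'}

    now = datetime(2022, 11, 12, 12, 0)
    print(is_timely(datetime(2022, 11, 12, 11, 55), now, timedelta(minutes=10)))  # True

    customers = [
        {"name": "A. Lee", "address": "1 Main St", "phone": "555-0100"},
        {"name": "Alan R. Lee", "address": "1 main st ", "phone": "555-0100"},
        {"name": "Alan Lee", "address": "1 Main St", "phone": "555-0100"},
    ]
    print(probable_duplicates(customers))  # [['A. Lee', 'Alan R. Lee', 'Alan Lee']]

    print(is_valid_address("1000 Integration Drive"))   # True
    print(is_valid_address("H/*27 Integration Drive"))  # False
```

Matching on address and phone is deliberately crude; real deduplication usually adds fuzzy name matching, but even this simple key collapses the three Lee records into one group.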
Bear in mind that data quality is only one dimension of data trust. Analysts also include factors such as reasonability, accessibility, and integrity as important ways to measure organizational data trust. Whatever factors you include, the point is to quantify how usable your data is across the enterprise. Is it decision-ready?
The more highly you can rate the data across each of these dimensions for all tables, records, and fields, the more you can trust it, and the more decision-ready your data will be. Data that scores well in one dimension can’t necessarily be trusted completely. As shown above, you might have information that’s valid but not accurate, or accurate but incomplete. It could also be high-quality but inaccessible.
What matters most will vary depending on the business need. For example, finance teams require a particularly high level of accuracy, while other departments may place a premium on timeliness instead. Data teams must make their own assessments of the metrics that trusted data should meet. They should also communicate that certification of data trust to data users in measurable terms. A combination of trust and transparency gives decision-makers confidence to use the data.
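To make the point about differing requirements concrete, here is a minimal Python sketch that checks one dataset’s dimension scores against per-team minimums. The scores, the thresholds, and the team names are hypothetical; each data team would set its own, and this is not how the Talend Trust Score™ is calculated.

```python
# Hypothetical dimension scores (0.0 to 1.0) for one dataset,
# for example produced by automated checks on each dimension.
DATASET_SCORES = {
    "accuracy": 0.97,
    "completeness": 0.96,
    "timeliness": 0.80,
    "uniqueness": 0.99,
    "validity": 0.98,
}

# Hypothetical minimum scores each team requires before treating the data as decision-ready.
TEAM_THRESHOLDS = {
    "finance": {"accuracy": 0.99, "completeness": 0.95},
    "marketing": {"timeliness": 0.75, "uniqueness": 0.95},
}

def shortfalls(scores: dict[str, float], thresholds: dict[str, float]) -> dict[str, tuple[float, float]]:
    """Return {dimension: (actual, required)} for every dimension below its minimum.

    A dimension with no score is treated as 0.0, so unmeasured data never passes.
    """
    return {
        dim: (scores.get(dim, 0.0), minimum)
        for dim, minimum in thresholds.items()
        if scores.get(dim, 0.0) < minimum
    }

def readiness_report(scores: dict[str, float],
                     team_thresholds: dict[str, dict[str, float]]) -> dict[str, dict[str, tuple[float, float]]]:
    """Map each team to the dimensions where the dataset falls short for that team."""
    return {team: shortfalls(scores, t) for team, t in team_thresholds.items()}

if __name__ == "__main__":
    for team, gaps in readiness_report(DATASET_SCORES, TEAM_THRESHOLDS).items():
        status = "decision-ready" if not gaps else "below threshold on " + ", ".join(sorted(gaps))
        print(f"{team}: {status}")
    # finance: below threshold on accuracy
    # marketing: decision-ready
```

Reporting which dimension falls short, rather than a single pass or fail, is one way to give data users the transparency described above: the same dataset can be decision-ready for one team and not yet for another.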
Data trust framework
To achieve data trust in a world drowning in so much data, organizations must implement and automate processes for auditing, assessing, and cleaning their data. But data trust can’t be accomplished with technology alone. Complete data trust solutions require data infrastructure that considers human processes along with software. It’s necessary to create a data-centric culture that works in concert with data quality automation.
Data health is Talend’s concept of a holistic system that actively promotes data quality to ensure data trust. Talend’s vision of data health sees people and technology working together across the data life cycle:
- Preventative measures to identify and resolve data problems across the organization
- Effective treatments to combine automated and manual processes to cure problems
- Cultural support to document and embrace cooperative monitoring of the data
Infrastructure for data health will leverage both the knowledge of line-of-business users to clean data and sophisticated tools that let data engineers perform complex operations without the need for coding expertise. In short, it relies on technology solutions chosen with people in mind. The right solution is one that will make it easier for everyone in the organization to work with, understand, and trust the data.
Talend Data Fabric is the only cloud-native platform that brings together data integration, data integrity, and data governance capabilities in a single, user-friendly environment. This platform is uniquely able to simplify every aspect of working with data across your entire data landscape.
To provide a framework for data trust in any organization, Talend Data Fabric features the Talend Trust Score™, an industry-first innovation that assesses the reliability of any dataset. It makes trust tangible with standards that provide instant insight into how much you can trust your data. This data trust metric shows at a glance the extent to which your data is:
- thorough — is the data clean, complete, and consistent across your systems?
- transparent — is the data accessible and understandable?
- timely — is the data up-to-date and readily available to the people who need it?
- traceable — does the data tell you where it came from and how it has been used?
- tested — has the data been rated and certified by other users?
With open access to complete, clean, trusted data, data end-users can make better, bolder decisions with confidence. Among other benefits, data trust improves the relationship between business and IT departments.
Data trust examples
Clothing manufacturer Carhartt deduplicated 50,000 consumer records in its first six hours with Talend. As discussed above, uniqueness of data is a key aspect of data quality and a prerequisite for data trust. But that was just the beginning.
“We’ve now got one consolidated record containing all the important information on a consumer or customer, and that’s going to help us better manage our mix of channels and use the best one to communicate with specific purchasers,” says Steve Brennan, Vice President of Data Strategy and Analytics. “We’ll be able to make it easier, for example, for a consumer to buy something online and return it to a store.”
Trusted data is also making it easier for organizations to comply with data privacy regulations such as GDPR and CCPA. Companies hold all kinds of customer information, with sources ranging from emails and call center logs to customer reviews and preferences entered into loyalty programs.
Global hotel group Accor created a company data culture to prioritize respect for customers’ personal data. “When the General Data Protection Regulation came into effect, we saw it as an opportunity to bring in more transparency in our interactions with customers,” says Maud Bailly, Chief Digital Officer at Accor.
By implementing new quality controls and governing access to trusted customer data with Talend, Accor reduced its data access request response time from the legally mandated maximum of 30 days to just five. Beyond the legal implications, a speedier response time strengthens customer relationships. “Through our quick responses, we also show that we are a company with efficient processes and that we take the rights of our customers very seriously,” says Accor’s Data Protection Officer Thomas Elm.
Thousands more organizations around the world achieve data trust with Talend. Using a single, comprehensive system providing data integration, data integrity, and data governance features takes the work out of working with data. Decision-makers gain open access to the data they need and metrics to trust that data. They become better equipped to respond to their customers’ needs, which drives improved customer satisfaction.
Try our data trust solutions
With trustworthy data, everyone in the organization benefits. They gain the confidence that comes from basing decisions on a complete, accurate, and timely picture of the real world. When you make decisions using trusted data, you’re more likely to make decisions that lead to better outcomes, higher revenues, and more growth.
Do you have confidence in the quality of your organization’s data? Do you wonder how accurate, complete, and timely it is? Talend can help. Export a subset of your data and run it through the Talend Trust Assessor. This free tool gives you access to our Trust Score™ technology. You’ll get a rapid report with feedback on the validity, completeness, and uniqueness of your data. You can also try it with our sample dataset just to see how it works. Think of it as the first step on your data health journey.