What is data trust?
Can you trust your organization’s data?
Data health research conducted in 2021 shows that 60% of business executives don’t always trust their company’s data. More than a third still don’t base most of their decisions on data. This is a crisis for organizations across industries worldwide. How can a decision-maker who doesn’t trust the data trust their own decisions?
Let’s start by looking at what that data is. In recent years, the world has become increasingly data-driven, leaving organizations’ networks saturated with information. Some of the data comes in through SaaS and web applications. Other data comes from direct data entry, such as web forms. Some data is unstructured, like social media posts. And more and more data comes from machines, such as smartphones and Internet of Things (IoT) devices. Estimates put the amount of data created annually in the zettabytes. A zettabyte is a billion terabytes. That is a ridiculous amount of data.
Manual data quality management simply can’t scale to those volumes. People make mistakes. Machines aren’t foolproof. Additionally, the data often passes through complex information systems coded by multiple developers. That raises the risk of buggy code introducing errors.
But what does it mean to trust your data?
Data trust definition
Data trust means having confidence that your organization’s data is healthy and ready to act on.
Trust is one of the keys to making successful use of your data. Combined with culture and agility, it enables an organization to achieve data health. By ensuring trust in data across the organization and its departments, an organization gives its teams the ability to design exceptional customer experiences, improve operations, streamline decision-making, ensure compliance, and drive innovation. But data trust must be earned and quantified. It can’t be taken on faith. Before trusting your organization’s data, you should prove that it can produce reliable analytics to support well-informed business decisions.
Dimensions of data quality
How do you measure data trust? The Data Management Association of the UK defines six dimensions of data quality:
- Accuracy — The degree to which data correctly describes the real-world object or event in question
- Example: Say an accounting record uses the US date format MM/DD/YYYY. Data entered using the European DD/MM/YYYY format could lead to an invoice due May 8 not being paid until August 5.
- Completeness — The proportion of data stored against the potential for being 100% complete
- Example: Blank values indicate that certain data has not been populated. An address record with 300 rows and 12 missing postal codes would have usable data for 288 addresses, and a completeness rate of 288/300, or 96%.
- Consistency — The absence of difference when comparing two or more representations of an item against a definition
- Example: Do an organization’s HR, legal, and finance teams all use one date format, or would the same date appear as 11/12/2022, 12/11/22, and 22-NOV-12 in reports generated by different departments?
- Timeliness — The degree to which data is current enough to represent reality as needed to support business functions
- Example: In a field that represents company earnings, it's vital to have access to the latest data. What is the delay in providing that data — is it on the order of minutes, days, or weeks?
- Uniqueness — No item or entity instance is recorded more than once based upon how that item is identified
- Example: A single customer recorded more than once from separate entries, such as A. Lee, Alan R. Lee, and Alan Lee appearing as three individuals with the same address and contact information.
- Validity or conformity — The degree to which data conforms to the syntax (format, type, or range) of its definition
- Example: A street address of 1000 Integration Drive is valid, though not necessarily accurate. A street address of H/*27 Integration Drive is not valid.
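Several of these dimensions can be scored with simple ratios. The sketch below is a minimal Python illustration, assuming records are held as a list of dictionaries; the field names, sample rows, and street-address pattern are hypothetical stand-ins for whatever rules a data team actually defines. It is not how any particular product computes these metrics.

```python
import re

# Hypothetical sample records; field names and values are illustrative only.
records = [
    {"name": "Alan Lee", "street": "1000 Integration Drive", "postal_code": "94063"},
    {"name": "Alan Lee", "street": "1000 Integration Drive", "postal_code": "94063"},
    {"name": "Maria Rossi", "street": "H/*27 Integration Drive", "postal_code": ""},
    {"name": "Jo Smith", "street": "12 Main Street", "postal_code": None},
]

# Assumed validity rule: a house number, a space, then letters and spaces only.
STREET_PATTERN = re.compile(r"\d+ [A-Za-z ]+")


def completeness(rows, field):
    """Share of rows in which `field` is populated (neither None nor blank)."""
    filled = sum(1 for row in rows if row.get(field) not in (None, ""))
    return filled / len(rows)


def uniqueness(rows, key_fields):
    """Distinct identifying keys divided by row count (1.0 means no exact duplicates)."""
    keys = {tuple(row[f] for f in key_fields) for row in rows}
    return len(keys) / len(rows)


def validity(rows, field, pattern):
    """Share of rows whose `field` fully matches the expected syntax."""
    valid = sum(1 for row in rows if pattern.fullmatch(row.get(field) or ""))
    return valid / len(rows)


print(completeness(records, "postal_code"))         # 0.5
print(uniqueness(records, ("name", "street")))      # 0.75
print(validity(records, "street", STREET_PATTERN))  # 0.75

# The 300-row address example: 12 blank postal codes out of 300.
address_rows = [{"postal_code": "00000"}] * 288 + [{"postal_code": ""}] * 12
print(completeness(address_rows, "postal_code"))    # 0.96
```

Exact-match uniqueness would not recognize A. Lee and Alan R. Lee as the same person; catching those duplicates typically needs fuzzy or rule-based matching, which this sketch deliberately leaves out.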
The more highly you can rate the data across each of these dimensions for all tables, records, and fields, the more you can trust it, and the more decision-ready your data will be. But a record or dataset that scores well in one dimension can’t necessarily be fully trusted. As the examples above show, information can be valid but not accurate, or accurate but incomplete.
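The invoice-date example from the accuracy dimension shows how a value can pass a format check and still be wrong. The following minimal Python sketch uses only the standard library's `datetime.strptime` to parse the same string under both conventions; the `readings` and `is_ambiguous` helpers are hypothetical illustrations of how a pipeline might flag entries for human review, not a complete accuracy check.

```python
from datetime import datetime

DATE_FORMATS = ("%m/%d/%Y", "%d/%m/%Y")  # US and European conventions


def readings(text):
    """Return every date that `text` could denote under the supported formats."""
    dates = set()
    for fmt in DATE_FORMATS:
        try:
            dates.add(datetime.strptime(text, fmt).date())
        except ValueError:
            pass  # not valid under this convention
    return dates


def is_ambiguous(text):
    """True if the string is valid under both conventions but means different days."""
    return len(readings(text)) > 1


entered = "08/05/2022"  # typed using DD/MM/YYYY, intending May 8
us_reading = datetime.strptime(entered, "%m/%d/%Y").date()
eu_reading = datetime.strptime(entered, "%d/%m/%Y").date()

print(us_reading)                      # 2022-08-05
print(eu_reading)                      # 2022-05-08
print((us_reading - eu_reading).days)  # 89
print(is_ambiguous(entered))           # True
print(is_ambiguous("13/05/2022"))      # False: only the European reading is valid
print(is_ambiguous("05/05/2022"))      # False: both readings agree
```

A flag like this can only route the record to a person or to a source-system rule; the string alone cannot say which date was meant.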
What matters most will vary depending on the business need. For example, finance teams require a particularly high level of accuracy, while other departments may place a premium on timeliness instead. Data teams must assess which metrics their data should meet, and then communicate that certification of data quality to data users in quantified terms. The combination of trust and transparency gives decision-makers the confidence to use the data.
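One way for data teams to record which metrics data should meet is a per-team table of minimum scores. The minimal Python sketch below assumes per-dimension scores between 0 and 1 have already been measured; the scores, team names, and thresholds are all hypothetical, chosen to show how the same dataset can be decision-ready for one team and not for another.

```python
# Hypothetical measured scores (0.0 to 1.0) for one dataset.
scores = {
    "accuracy": 0.97,
    "completeness": 0.96,
    "consistency": 0.90,
    "timeliness": 0.99,
    "uniqueness": 0.98,
    "validity": 0.99,
}

# Hypothetical minimums each team sets for its own use case.
REQUIREMENTS = {
    "finance": {"accuracy": 0.99, "completeness": 0.95, "validity": 0.98},
    "marketing": {"timeliness": 0.90, "uniqueness": 0.95},
}


def failing_dimensions(scores, minimums):
    """Dimensions scoring below the team's minimum (a missing score counts as 0)."""
    return sorted(dim for dim, floor in minimums.items() if scores.get(dim, 0.0) < floor)


for team, minimums in REQUIREMENTS.items():
    failures = failing_dimensions(scores, minimums)
    status = "ready" if not failures else "not ready: " + ", ".join(failures)
    print(f"{team}: {status}")
# finance: not ready: accuracy
# marketing: ready
```

Reporting which dimensions fall short, rather than a single pass/fail flag, is one way to give data users the transparency described in this section.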
But bear in mind that data quality is only one aspect of data trust. Talend’s vision for trusted data also includes factors such as tools that make data easy to find, improve, verify, and use, as well as self-service apps that put line-of-business data users in control of their own data. For example, if data is of high quality but the people who need it don't have access to it, is that data really contributing to data trust? Whatever factors you include in your measure of trusted data, the point is to quantify how usable your data is across the enterprise: Is it decision-ready?
Data trust framework
To achieve data trust in a world drowning in so much data, organizations must implement and automate processes for auditing, assessing, and cleaning their data. But data trust can’t be accomplished through technology alone. Complete data trust solutions require data infrastructure that considers human processes along with software. It's necessary to create a data-centric culture that works in concert with data quality automation.
Infrastructure for data health will draw on two resources: the knowledge of line-of-business stakeholders, who help clean the data, and sophisticated tools and artificial intelligence that let data engineers perform complex operations without the need for coding expertise. In short, it relies on technology solutions chosen with people in mind. The right solution is one that will make it easier for everyone in the organization to work with data, share data, understand data, and trust the data.
Talend’s modular self-service apps and Trust Score remove skills-based barriers to trusting data across departments by involving lines of business in the preparation and quality control of their own data. Our cloud-native platform brings together data integration, data integrity, data stewardship, and data governance capabilities in a single, user-friendly environment. This platform is uniquely able to simplify every aspect of working with data across your entire data environment.
To provide a framework for data trust in any organization, Talend Data Fabric features the Talend Trust Score™, an industry-first innovation that assesses the reliability of any dataset. It makes trust tangible with standards that provide instant insight into how much you can trust your data. This data trust metric shows at a glance the extent to which your data meets the criteria of healthy data:
- Thorough — is the data clean, complete, and consistent across your systems?
- Transparent — is the data accessible and understandable?
- Timely — is the data up-to-date and readily available to the people who need it?
- Traceable — does the data tell you where it came from and how it has been used?
- Tested — has the data been rated and certified by other users?
With open access to complete, clean, trusted data, data end-users can make better, bolder decisions with confidence. Data science and analytics teams and citizen analysts get a complete picture of the business and can trust and verify the data they’re using for better insights, responsive strategic recommendations, and confident decisions. Among its other benefits, data trust even improves the relationship between business and IT departments.
Data trust case studies
To understand the importance of data trust, it helps to see it in action. The following case studies demonstrate use cases that are common across a range of organizations, from private companies to public institutions:
Beneva – Achieving data trust to better serve and retain three million customers
Canada’s largest mutual insurance company, Beneva (formerly SSQ Insurance), serves three million customers with a full range of insurance and investment products. As can happen after 75 years in business, the company found its data systems had grown too complex and siloed to use customer data effectively.
While financial and insurance clients expect a high level of personalization, employees could not see customer data across lines of business. “If you called about another product, it was as if we didn’t know you at all,” says Simon Latouche, Director of Data Engineering at Beneva.
To put healthy data at the heart of its business and improve data sharing, Beneva created a unified customer portal. It automatically registers customers’ operations, and Talend Data Quality and Data Stewardship make sure the data is trustworthy. Now employees have access to comprehensive, trusted customer data. As a result, call centers can help customers more efficiently and marketers can customize campaigns with predictive models. In fact, Beneva was able to triple customer win-back conversions.
Aeroporti Di Roma — Analyzing Data for 48.8 Million Travelers in Compliance with GDPR
Aeroporti Di Roma (ADR) manages and develops Roma Fiumicino (Leonardo da Vinci) and Ciampino airports. Nearly 100 airlines operate from these airports, carrying passengers to more than 230 destinations worldwide.
ADR knows how critical trusted data is to understanding and anticipating customer behaviors at speed. It also understands its responsibility to protect its customers’ personal data. To improve data sharing while still safeguarding personal data, ADR and its partners built a Big Data Analytics platform using Cloudera for the data lake and Talend Big Data for the ingestion engine. Pietro Caminiti, Head of IT Solutions for Aeroporti Di Roma SpA, reports excellent results: "With Talend, we can analyze large data volumes in order to extract strategic information through advanced statistical algorithms while also complying with the General Data Protection Regulation (GDPR) standards."
"We have improved our 48.8 million passengers’ experience and operation’s efficiency,” says Pietro Caminiti. “And we have been recognized as Europe’s number one airport with over 40 million passengers, according to ACI World’s globally-established Airport Service Quality programme.”
Wolters Kluwer Health — Paving the way for healthier business decisions
Wolters Kluwer Health provides professional information, services, and solutions for the healthcare industry. When the company’s appetite for trusted data threatened to overwhelm the capacity of the business intelligence (BI) staff, they launched an innovative “Citizen Analyst” initiative to democratize the use of data.
Talend had the combination of modularity, scalability, simplicity, cost efficiency, and support for extremely high data quality that Wolters Kluwer Health needed to achieve its immediate goals and long-term vision for the initiative. The shift to Talend has saved millions of dollars and also allowed the BI team to use advanced, predictive analytics and AI to find new patterns in data that facilitate better decision making.
The Citizen Analyst initiative has helped non-technical staff do their own data analysis using simple interfaces, easy-to-use tools and high-quality data — all integrated by Talend — to drive better patient care and healthier business decisions.
“We’re creating a culture of curiosity,” explains Kevin Ryan, Director of Business Intelligence at Wolters Kluwer Health. “That’s a cultural shift, but it’s a change that people are embracing because everyone wins. Product teams get insights sooner, they are less dependent on the BI team, they can share results with business leaders and get buy-in faster, and ultimately the outcome is better products and services that benefit doctors and patients.”
Try our data trust solutions
With trustworthy data, everyone in the organization benefits. They gain the confidence of basing decisions on a complete, accurate, and timely picture of the real world. When you make decisions using trusted data, you’re likely making decisions that will lead to better outcomes, higher revenues, and more growth.
Do you have confidence in the quality of your organization’s data? Do you wonder how accurate, complete, and timely it is? Talend can help. Export a subset of your data and run it through the Talend Trust Assessor. This free tool gives you access to our Trust Score™ technology. You’ll get a rapid report with feedback on the validity, completeness, and uniqueness of your data. You can also try it with our sample dataset just to see how it works. Think of it as the first step on your data health journey.