Can you trust your organization’s data?
Data health research conducted in 2021 shows that 60% of business executives don’t always trust their company’s data. More than a third still don’t base most of their decisions on data. This is a crisis for organizations across industries worldwide. How can a decision-maker who doesn’t trust the data trust their own decisions?
Let’s start by looking at what that data is. As the world becomes increasingly data-driven, organizations’ networks have become saturated with information. Some of the data comes in through SaaS and web applications. Other data comes directly from people’s data entry, such as web forms. More and more data comes from machines, such as smartphones and Internet of Things (IoT) devices. Estimates put the amount of data created annually in the zettabytes. A zettabyte is a billion terabytes. That is a ridiculous amount of data.
Manual data quality management simply can’t scale to those volumes. People make mistakes. Machines aren’t foolproof. Additionally, the data often passes through complex information systems coded by multiple developers. That raises the risk of buggy code introducing errors.
But what does it mean to trust your data?
Data trust definition
Data trust means having confidence that your organization’s data is healthy and ready to act on.
Trust is the key to making successful use of your data. By ensuring trust in corporate data, an organization provides its teams the ability to design exceptional customer experiences, improve operations, ensure compliance, and drive innovation. But data trust must be earned and quantified. It can’t be taken on faith. Before trusting corporate data, you should prove that it can produce reliable analytics to support well-informed business decisions.
How do you measure data trust? The Data Management Association of the UK defines six dimensions of data quality:
- Accuracy — The degree to which data correctly describes the real-world object or event in question
- Example: Say an accounting record uses the US date format MM/DD/YYYY. Data entered using the European DD/MM/YYYY format could lead to an invoice due May 8th not being paid until August 5th.
- Completeness — The proportion of data stored against the potential for being 100% complete
- Example: Blank values indicate that certain data has not been populated. An address record with 300 rows and 12 missing postal codes would have usable data for 288 addresses, and a completeness rate of 288/300, or 96%.
- Consistency — The absence of difference when comparing two or more representations of an item against a definition
- Example: Do an organization’s HR, legal, and finance teams all use one date format, or would the same date appear as 11/12/2022, 12/11/22, and 22-NOV-12 in reports generated by different departments?
- Timeliness — The degree to which data is current enough to represent reality as needed to support business functions
- Example: In a field that represents company earnings, it’s vital to have access to the latest data. What is the delay in providing that data — is it on the order of minutes, days, or weeks?
- Uniqueness — No item, or entity instance, is recorded more than once based upon how that item is identified
- Example: A single customer’s record is duplicated across multiple entries, such as A. Lee, Alan R. Lee, and Alan Lee appearing as three individuals with the same address and contact information.
- Validity or conformity — The degree to which data conforms to the syntax (format, type, or range) of its definition
- Example: A street address of 1000 Integration Drive is valid, though not necessarily accurate. A street address of H/*27 Integration Drive is not valid.
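Several of the examples above reduce to simple ratios and checks. The following is a minimal Python sketch of how they might be computed, not a description of any Talend product. The function names, the sample records, the year 2023, and the house-number rule for a valid street address are all assumptions made for illustration.

```python
import re
from datetime import datetime

def completeness(values):
    """Share of values that are populated (not None and not blank)."""
    if not values:
        return 1.0
    populated = [v for v in values if v is not None and str(v).strip() != ""]
    return len(populated) / len(values)

def uniqueness(keys):
    """Share of records that are distinct under the chosen identifying key."""
    if not keys:
        return 1.0
    return len(set(keys)) / len(keys)

def validity(values, pattern):
    """Share of values whose full text matches the expected syntax."""
    if not values:
        return 1.0
    regex = re.compile(pattern)
    return sum(1 for v in values if regex.fullmatch(v)) / len(values)

def is_ambiguous_date(text):
    """True if text parses as both MM/DD/YYYY and DD/MM/YYYY to different dates."""
    parsed = []
    for fmt in ("%m/%d/%Y", "%d/%m/%Y"):
        try:
            parsed.append(datetime.strptime(text, fmt).date())
        except ValueError:
            pass  # the text is not a valid date under this format
    return len(set(parsed)) > 1

# Completeness: 300 address rows, 12 of them missing a postal code.
postal_codes = ["90210"] * 288 + [""] * 12
print(f"completeness: {completeness(postal_codes):.0%}")  # 96%

# Uniqueness: three name variants sharing one address and phone number.
# Keying on (address, phone) instead of name exposes the duplicates.
customers = [
    ("A. Lee", "1000 Integration Drive", "555-0100"),
    ("Alan R. Lee", "1000 Integration Drive", "555-0100"),
    ("Alan Lee", "1000 Integration Drive", "555-0100"),
]
by_contact = [(address, phone) for _, address, phone in customers]
print(f"uniqueness: {uniqueness(by_contact):.0%}")  # 33%

# Validity: assumed rule that a street address starts with a house number.
HOUSE_NUMBER_PATTERN = r"\d+ [A-Za-z ]+"
addresses = ["1000 Integration Drive", "H/*27 Integration Drive"]
print(f"validity: {validity(addresses, HOUSE_NUMBER_PATTERN):.0%}")  # 50%

# Accuracy risk: 08/05/2023 is August 5 in US format but May 8 in European format.
print(is_ambiguous_date("08/05/2023"))  # True
```

Note that the validity check only confirms syntax. As with the street address example, a value can pass it and still be inaccurate, which is why each dimension is measured separately.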
Bear in mind that data quality is only one dimension of data trust. Analysts also include factors such as reasonability, accessibility, and integrity as important ways to measure organizational data trust. Whatever factors you include, the point is to quantify how usable your data is across the enterprise. Is it decision-ready?
The more highly you can rate the data across each of these dimensions for all tables, records, and fields, the more you can trust it — and the more decision-ready your data will be. Data that performs well in one dimension can’t necessarily be 100% trusted. As shown above, you might have information that’s valid but not accurate, or accurate but incomplete. It could also be high-quality, but inaccessible.
What matters most will vary depending on the business need. For example, finance teams require a particularly high level of accuracy, while other departments may place a premium on timeliness instead. Data teams must make their own assessments of the metrics that trusted data should meet. They should also quantify the results and communicate that certification of data trust to data users. A combination of trust and transparency gives decision-makers confidence to use the data.
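One way a data team might record such per-team requirements is as explicit thresholds that measured scores are checked against. The sketch below is hypothetical throughout: the team names, dimension names, threshold values, and measured scores are invented for illustration and do not come from any real assessment.

```python
# Hypothetical minimum scores (0.0 to 1.0) each team requires per dimension.
REQUIREMENTS = {
    "finance": {"accuracy": 0.99, "completeness": 0.95},
    "marketing": {"timeliness": 0.90, "completeness": 0.80},
}

def unmet_requirements(team, measured):
    """Return {dimension: (measured, required)} for every threshold the data misses.

    A dimension with no measurement counts as unmet, since unmeasured
    quality cannot be certified.
    """
    gaps = {}
    for dimension, required in REQUIREMENTS[team].items():
        score = measured.get(dimension)
        if score is None or score < required:
            gaps[dimension] = (score, required)
    return gaps

# Hypothetical measured scores for one dataset.
measured = {"accuracy": 0.97, "completeness": 0.96, "timeliness": 0.92}

for team in REQUIREMENTS:
    gaps = unmet_requirements(team, measured)
    status = "decision-ready" if not gaps else f"not ready: {gaps}"
    print(f"{team}: {status}")
```

With these invented numbers, the same dataset is decision-ready for marketing but not for finance, because its accuracy score falls short of the stricter finance threshold. Reporting the gap alongside the verdict is one way to give data users the transparency described above.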
Data trust framework
To achieve data trust in a world drowning in so much data, organizations must implement and automate processes for auditing, assessing, and cleaning their data. But data trust can’t be accomplished with technology alone. Complete data trust solutions require data infrastructure that considers human processes along with software. It’s necessary to create a data-centric culture that works in concert with data quality automation.
Data health is Talend’s concept of a holistic system that actively promotes data quality to ensure data trust. Talend’s vision of data health sees people and technology working together across the data life cycle:
- Preventative measures to identify and resolve data problems across the organization
- Effective treatments to combine automated and manual processes to cure problems
- Cultural support to document and embrace cooperative monitoring of the data
Infrastructure for data health will leverage both the knowledge of line-of-business users to clean data and sophisticated tools that let data engineers perform complex operations without the need for coding expertise. In short, technology solutions should be chosen with people in mind. The right solution is one that will make it easier for everyone in the organization to work with, understand, and trust the data.
Talend Data Fabric is the only cloud-native platform that brings together data integration, data integrity, and data governance capabilities in a single, user-friendly environment. This platform is uniquely able to simplify every aspect of working with data across your entire data landscape.
To provide a framework for data trust in any organization, Talend Data Fabric features the Talend Trust Score™, an industry-first innovation that assesses the reliability of any dataset. It makes trust tangible with standards that provide instant insight into how much you can trust your data. This data trust metric shows at a glance the extent to which your data is:
- thorough — is the data clean, complete, and consistent across your systems?
- transparent — is the data accessible and understandable?
- timely — is the data up-to-date and readily available to the people who need it?
- traceable — does the data tell you where it came from and how it has been used?
- tested — has the data been rated and certified by other users?
With open access to complete, clean, trusted data, data end-users can make better, bolder decisions with confidence. Among other benefits, data trust improves the relationship between business and IT departments.
Data trust examples
SSQ Insurance – Achieving Data Trust to Better Serve and Retain Three Million Customers
Canada’s largest mutual insurance company, SSQ Insurance serves three million customers with a full range of insurance and investment products. As can happen after 75 years in business, the company found its data systems had grown too complex and siloed to use customer data effectively. While financial and insurance clients expect a high level of personalization, employees could not see customer data across lines of business. “If you called about another product, it was as if we didn’t know you at all,” says Simon Latouche, Director of Data Engineering at SSQ Insurance. In order to put healthy data at the heart of its business, SSQ Insurance created a unified customer portal. It automatically registers customers’ operations, and Talend Data Quality and Data Stewardship make sure the data is trustworthy. Now employees have access to comprehensive, trusted customer data. As a result, call centers can help customers more efficiently and marketers can customize campaigns with predictive models. In fact, SSQ Insurance was able to increase customer win-back conversions threefold.
Aeroporti Di Roma – Analyzing Data for 48.8 Million Travelers in Compliance with GDPR
Aeroporti Di Roma (ADR) manages and develops Roma Fiumicino (Leonardo da Vinci) and Ciampino airports. Nearly 100 airlines operate from these airports, carrying passengers to more than 230 destinations worldwide. ADR knows how critical trusted data is to understanding and anticipating customer behaviors at speed. ADR also understands its responsibility to protect customer data. That’s why ADR and its partners built a Big Data Analytics platform using Cloudera for the data lake and Talend Big Data for the ingestion engine. Pietro Caminiti, Head of IT Solutions for Aeroporti Di Roma SpA, reports excellent results. “With Talend, we can analyze large data volumes in order to extract strategic information through advanced statistical algorithms while also complying with the General Data Protection Regulation (GDPR) standards,” he says. “We have improved our 48.8 million passengers’ experience and operation’s efficiency,” Caminiti adds. “And we have been recognized as Europe’s number one airport over 40 million passengers, according to ACI World’s globally-established Airport Service Quality programme.”
Try our data trust solutions
With trustworthy data, everyone in the organization benefits. They gain the confidence that comes from basing decisions on a complete, accurate, and timely picture of the real world. When you make decisions using trusted data, you’re likely making decisions that will lead to better outcomes, higher revenues, and more growth.
Do you have confidence in the quality of your organization’s data? Do you wonder how accurate, complete, and timely it is? Talend can help. Export a subset of your data and run it through the Talend Trust Assessor. This free tool gives you access to our Trust Score™ technology. You’ll get a rapid report with feedback on the validity, completeness, and uniqueness of your data. You can also try it with our sample dataset just to see how it works. Think of it as the first step on your data health journey.