What is Data Integrity and Why Is It Important?
Imagine this: A pharmaceutical company touts the safety of its new wonder drug. But when the FDA inspects the offshore production facility, work is halted immediately because important quality control data is missing. Unfortunately, this real-life example of compromised data integrity isn’t uncommon. Problems with data accuracy and consistency exist across all industries and can cause everything from minor hassles to significant business problems.
In this era of big data, when more information is processed and stored than ever before, data health has become a pressing issue, and implementing measures that preserve the integrity of collected data is increasingly important. Understanding the fundamentals of data integrity and how it works is the first step in keeping data safe. Read on to learn what data integrity is, why it’s essential, and what you can do to keep your data healthy.
What is data integrity?
Data integrity is the overall accuracy, completeness, and consistency of data. Data integrity also refers to the safety of data with regard to regulatory compliance, such as GDPR compliance, and security. It is maintained by a collection of processes, rules, and standards that are put in place during the design phase and enforced for as long as the data is kept. When data integrity is secure, the information stored in a database remains complete, accurate, and reliable no matter how long it’s stored or how often it’s accessed.
The importance of data integrity in protecting yourself from data loss or a data leak cannot be overstated: to keep your data safe from outside forces with malicious intent, you must first ensure that internal users are handling data correctly. Appropriate data validation and error checking help ensure that sensitive data is not miscategorized or stored incorrectly, either of which could expose you to risk.
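As a rough illustration of validation and error checking, the Python sketch below checks a record before it is stored and returns a list of problems instead of silently accepting it. The field names (`customer_id`, `email`, `classification`), the allowed classification labels, and the email pattern are hypothetical assumptions chosen for the example, not part of any particular product.

```python
import re
from typing import List

# Hypothetical sensitivity labels; a real system would load these
# from its own data-classification policy.
ALLOWED_CLASSIFICATIONS = {"public", "internal", "confidential"}

# Deliberately simple email check, for illustration only.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

# Hypothetical required fields for this example.
REQUIRED_FIELDS = ("customer_id", "email", "classification")


def validate_record(record: dict) -> List[str]:
    """Return a list of problems found; an empty list means the record passed."""
    errors = []
    # Completeness: every required field must be present and non-empty.
    for field in REQUIRED_FIELDS:
        if not record.get(field):
            errors.append(f"missing required field: {field}")
    # Format: the email must look like an address.
    email = record.get("email")
    if email and not EMAIL_PATTERN.match(email):
        errors.append(f"malformed email: {email!r}")
    # Categorization: sensitive data must carry a known label.
    label = record.get("classification")
    if label and label not in ALLOWED_CLASSIFICATIONS:
        errors.append(f"unknown classification: {label!r}")
    return errors


if __name__ == "__main__":
    ok = {"customer_id": "C001", "email": "ana@example.com", "classification": "confidential"}
    bad = {"customer_id": "C002", "email": "not-an-email", "classification": "secret"}
    print(validate_record(ok))   # []
    print(validate_record(bad))  # two errors: malformed email, unknown classification
```

Rejecting or quarantining records that come back with errors, rather than storing them as-is, is what keeps miscategorized data out of the database in the first place.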
Types of data integrity
Maintaining data integrity requires an understanding of the two types of data integrity: physical integrity and logical integrity. Each is a collection of processes and methods that enforce data integrity in hierarchical and relational databases alike.
Physical integrity is the protection of the wholeness and accuracy of data as it’s stored and retrieved. When natural disasters strike, power goes out, or hackers disrupt database functions, physical integrity is compromised. Human error, storage media degradation, and a host of other issues can also make it impossible for data processing managers, system programmers, applications programmers, and internal auditors to obtain accurate data.
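Physical corruption often happens silently, so one common way to detect it (though not prevent it) is to record a checksum when data is written and compare it when the data is read back or restored from a backup. The sketch below uses SHA-256 from Python’s standard `hashlib` module; the file name `orders.csv` and the helper names are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Compute a SHA-256 digest by streaming the file in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_file(path: Path, expected_digest: str) -> bool:
    """Return True if the file's current digest matches the one recorded earlier."""
    return sha256_of_file(path) == expected_digest


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data_file = Path(tmp) / "orders.csv"  # hypothetical data file
        data_file.write_bytes(b"order_id,amount\n1,19.99\n")
        recorded = sha256_of_file(data_file)  # store this alongside the data or backup
        print(verify_file(data_file, recorded))  # True

        # Simulate silent corruption: a single byte changes on disk.
        data_file.write_bytes(b"order_id,amount\n1,19.98\n")
        print(verify_file(data_file, recorded))  # False
```

A checksum only tells you that data changed; recovering the correct version still depends on having a good backup.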
Logical integrity keeps data unchanged as it’s used in different ways in a relational database. Logical integrity protects data from human error and hackers as well, but in a much different way than physical integrity does. There are four types of logical integrity:
- Entity integrity: Entity integrity relies on the creation of primary keys, the unique values that identify pieces of data, to ensure that data isn’t listed more than once and that no primary key value in a table is null. It’s a feature of relational systems, which store data in tables that can be linked and used in a variety of ways.
- Referential integrity: Referential integrity refers to the series of processes that make sure data is stored and used consistently across related tables. Rules embedded into the database’s structure about how foreign keys are used ensure that only appropriate changes, additions, or deletions of data occur; for example, a foreign key value must match an existing primary key in the table it references. Rules may include constraints that eliminate the entry of duplicate data, guarantee that data entry is accurate, and/or disallow the entry of data that doesn’t apply.
- Domain integrity: Domain integrity is the collection of processes that ensure the accuracy of each piece of data in a domain. In this context, a domain is a set of acceptable values that a column is allowed to contain. It can include constraints and other measures that limit the format, type, and amount of data entered.
- User-defined integrity: User-defined integrity involves the rules and constraints created by the user to fit their particular needs. Sometimes entity, referential, and domain integrity aren’t enough to safeguard data. Often, specific business rules must be taken into account and incorporated into data integrity measures.
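To make the four types of logical integrity concrete, here is a minimal sketch using SQLite through Python’s built-in `sqlite3` module, assuming a hypothetical `customers`/`orders` schema. Each constraint maps to one of the types above; the shipping-date rule is an invented example of a user-defined business rule. Other database engines express the same ideas with their own syntax, and SQLite in particular enforces foreign keys only when `PRAGMA foreign_keys` is turned on.

```python
import sqlite3

CONSTRAINT_SCHEMA = """
CREATE TABLE customers (
    -- Entity integrity: each customer has a unique, non-null key.
    customer_id TEXT PRIMARY KEY NOT NULL,
    name        TEXT NOT NULL
);

CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    -- Referential integrity: an order must reference an existing customer.
    customer_id TEXT NOT NULL REFERENCES customers(customer_id),
    -- Domain integrity: values must come from an allowed range or set.
    quantity    INTEGER NOT NULL CHECK (quantity BETWEEN 1 AND 1000),
    status      TEXT NOT NULL CHECK (status IN ('pending', 'shipped', 'cancelled')),
    order_date  TEXT NOT NULL,
    ship_date   TEXT,
    -- User-defined integrity: hypothetical business rule that an order cannot
    -- ship before it was placed (ISO-8601 date strings compare correctly as text).
    CHECK (ship_date IS NULL OR ship_date >= order_date)
);
"""

ORDER_SQL = (
    "INSERT INTO orders (customer_id, quantity, status, order_date, ship_date) "
    "VALUES (?, ?, ?, ?, ?)"
)
CUSTOMER_SQL = "INSERT INTO customers (customer_id, name) VALUES (?, ?)"


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with the schema and one valid customer."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is enabled per connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(CONSTRAINT_SCHEMA)
    conn.execute(CUSTOMER_SQL, ("C001", "Ana"))
    return conn


def try_insert(conn: sqlite3.Connection, sql: str, params: tuple) -> str:
    """Run one INSERT and report whether the database accepted it."""
    try:
        conn.execute(sql, params)
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


if __name__ == "__main__":
    conn = make_db()
    print(try_insert(conn, CUSTOMER_SQL, ("C001", "Duplicate")))  # rejected: duplicate key (entity)
    print(try_insert(conn, CUSTOMER_SQL, (None, "No key")))  # rejected: null key (entity)
    print(try_insert(conn, ORDER_SQL, ("C999", 1, "pending", "2024-05-01", None)))  # rejected (referential)
    print(try_insert(conn, ORDER_SQL, ("C001", 0, "pending", "2024-05-01", None)))  # rejected (domain)
    print(try_insert(conn, ORDER_SQL, ("C001", 2, "shipped", "2024-05-03", "2024-05-01")))  # rejected (user-defined)
    print(try_insert(conn, ORDER_SQL, ("C001", 2, "shipped", "2024-05-01", "2024-05-03")))  # accepted
```

Pushing these rules into the database, rather than relying only on application code, means every program that writes to the tables is held to the same constraints.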
What data integrity isn’t
With so much talk about data integrity, it’s easy for its true meaning to be muddled. Often data security and data quality are incorrectly substituted for data integrity, but each term has a distinct meaning.
Data integrity is not data security
Data security is the collection of measures taken to keep data from getting corrupted. It incorporates the use of systems, processes, and procedures that restrict unauthorized access and keep data inaccessible to others who may use it in harmful or unintended ways. Breaches in data security may be small and easy to contain or large and capable of causing significant damage.
While data integrity is concerned with keeping information intact and accurate for the entirety of its existence, the goal of data security is to protect information from outside attacks. Data security is but one of the many facets of data integrity and is not broad enough to include the many processes necessary for keeping data unchanged over time.
Data integrity is not data quality
Does the data in your database meet company-defined standards and the needs of your business? Data quality answers these questions with an assortment of processes that measure your data’s age, relevance, accuracy, completeness, and reliability.
Much like data security, data quality is only a part of data integrity, but a crucial one. Data integrity encompasses every aspect of data quality and goes further by implementing an assortment of rules and processes that govern how data is entered, stored, transferred, and much more.
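Some data quality measures can be computed directly from the data. The sketch below measures two of the dimensions mentioned above, completeness and age, for a list of records. The field names, the `last_updated` timestamp, and the one-year staleness threshold are assumptions for illustration; each organization sets its own standards.

```python
from datetime import datetime, timedelta


def completeness(records, fields):
    """Share of required field values that are filled in (not None or empty)."""
    total = len(records) * len(fields)
    if total == 0:
        return 1.0  # nothing required, so nothing is missing
    filled = sum(1 for r in records for f in fields if r.get(f) not in (None, ""))
    return filled / total


def stale_share(records, max_age, now):
    """Share of records whose last_updated timestamp is older than max_age."""
    if not records:
        return 0.0
    stale = sum(1 for r in records if now - r["last_updated"] > max_age)
    return stale / len(records)


if __name__ == "__main__":
    now = datetime(2024, 6, 1)
    records = [
        {"email": "a@example.com", "phone": "555-0100", "last_updated": datetime(2024, 5, 20)},
        {"email": "", "phone": "555-0101", "last_updated": datetime(2023, 1, 10)},
        {"email": "c@example.com", "phone": None, "last_updated": datetime(2024, 5, 30)},
        {"email": "d@example.com", "phone": "555-0103", "last_updated": datetime(2022, 11, 2)},
    ]
    print(completeness(records, ["email", "phone"]))  # 0.75
    print(stale_share(records, timedelta(days=365), now))  # 0.5
```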
Data integrity and GDPR compliance
Data integrity is key to complying with data protection regulations like GDPR. Non-compliance with these regulations can make companies liable for large penalties. In some instances, they may be sued on top of these significant fines. Repeated compliance violations can even put companies out of business.
Fortunately, there are ways to ensure the data integrity you need to comply with GDPR and other data protection legislation. Take a look at our series, Practical Steps to GDPR Compliance.
Data integrity risks
An assortment of factors can affect the integrity of the data stored in a database. A few examples include the following:
- Human error: When individuals enter information incorrectly, duplicate or delete data, don’t follow the appropriate protocols, or make mistakes during the implementation of procedures meant to safeguard information, data integrity is put in jeopardy.
- Transfer errors: When data can’t successfully transfer from one location in a database to another, a transfer error has occurred. In a relational database, a transfer error can show up as a piece of data that is present in the destination table but not in the source table, or the reverse.
- Bugs and viruses: Software bugs can corrupt or mishandle data, and spyware, malware, and viruses are pieces of software that can invade a computer and alter, delete, or steal data.
- Compromised hardware: Sudden computer or server crashes and malfunctioning computers or other devices are significant failures and may indicate that your hardware is compromised. Compromised hardware may render data incorrectly or incompletely, limit or eliminate access to data, or make information hard to use.
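A transfer error can often be caught by reconciling the source and destination after a load. This minimal sketch, assuming both sides can be read as lists of dictionaries that share a hypothetical `id` key, compares keys to find missing or unexpected records and compares content hashes to find records that changed in transit.

```python
import hashlib
import json


def row_fingerprint(row: dict) -> str:
    """Hash a row's contents; sorting keys makes column order irrelevant."""
    payload = json.dumps(row, sort_keys=True, default=str).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def reconcile(source_rows, dest_rows, key="id"):
    """Compare source and destination rows by key and by content hash."""
    src = {row[key]: row_fingerprint(row) for row in source_rows}
    dst = {row[key]: row_fingerprint(row) for row in dest_rows}
    return {
        # Records that never arrived.
        "missing_in_destination": sorted(src.keys() - dst.keys()),
        # Records with no counterpart in the source.
        "unexpected_in_destination": sorted(dst.keys() - src.keys()),
        # Records that arrived but whose contents differ.
        "changed_in_transit": sorted(k for k in src.keys() & dst.keys() if src[k] != dst[k]),
    }


if __name__ == "__main__":
    source = [{"id": 1, "amount": 10}, {"id": 2, "amount": 20}, {"id": 3, "amount": 30}]
    dest = [{"id": 1, "amount": 10}, {"id": 2, "amount": 25}, {"id": 4, "amount": 40}]
    print(reconcile(source, dest))
    # {'missing_in_destination': [3], 'unexpected_in_destination': [4], 'changed_in_transit': [2]}
```

For large tables, the same comparison is usually done in batches or on per-partition hashes rather than by loading every row into memory.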
Risks to data integrity can be minimized by doing the following:
- Limiting access to data and changing permissions to restrict changes to information by unauthorized parties
- Validating data to make sure it’s correct both when it’s gathered and when it’s used
- Backing up data
- Using logs to keep track of when data is added, modified, or deleted
- Conducting regular internal audits
- Using error detection software
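Several of these measures can be built into the database itself. As one example of using logs to track when data is added, modified, or deleted, the sketch below uses SQLite triggers (via Python’s `sqlite3` module) to append every change to a hypothetical `accounts` table into an `audit_log` table. It is a sketch under stated assumptions: it does not record which user made a change, and in practice the log itself would need restricted permissions and its own backups.

```python
import sqlite3

AUDIT_SCHEMA = """
CREATE TABLE accounts (
    account_id INTEGER PRIMARY KEY,
    balance    REAL NOT NULL
);

-- Append-only record of every change made to accounts.
CREATE TABLE audit_log (
    log_id      INTEGER PRIMARY KEY,
    action      TEXT NOT NULL,
    account_id  INTEGER NOT NULL,
    old_balance REAL,
    new_balance REAL,
    changed_at  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);

CREATE TRIGGER log_insert AFTER INSERT ON accounts
BEGIN
    INSERT INTO audit_log (action, account_id, new_balance)
    VALUES ('INSERT', NEW.account_id, NEW.balance);
END;

CREATE TRIGGER log_update AFTER UPDATE ON accounts
BEGIN
    INSERT INTO audit_log (action, account_id, old_balance, new_balance)
    VALUES ('UPDATE', NEW.account_id, OLD.balance, NEW.balance);
END;

CREATE TRIGGER log_delete AFTER DELETE ON accounts
BEGIN
    INSERT INTO audit_log (action, account_id, old_balance)
    VALUES ('DELETE', OLD.account_id, OLD.balance);
END;
"""


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(AUDIT_SCHEMA)
    conn.execute("INSERT INTO accounts (account_id, balance) VALUES (1, 100.0)")
    conn.execute("UPDATE accounts SET balance = 75.0 WHERE account_id = 1")
    conn.execute("DELETE FROM accounts WHERE account_id = 1")
    query = "SELECT action, account_id, old_balance, new_balance FROM audit_log ORDER BY log_id"
    for row in conn.execute(query):
        print(row)
    # ('INSERT', 1, None, 100.0)
    # ('UPDATE', 1, 100.0, 75.0)
    # ('DELETE', 1, 75.0, None)
```

Because the triggers fire inside the database, changes are logged no matter which application issued them, which also gives internal audits a consistent trail to review.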
Getting started with data integrity
Protecting the integrity of your company’s data using traditional methods can seem like an overwhelming task. Secure, cloud-based data integration platforms offer a modern alternative that provides a real-time view of all of your data. With industry-leading cloud integration tools, you can connect multiple source data applications and get access to all of your company’s data in one location.
Take a look at the Definitive Guide to Data Governance to find out how to establish a framework for data integrity.
Ready to get started with Talend?