Imagine this: a pharmaceutical company touts the safety of its new wonder drug, but when the FDA inspects the offshore production facility, work is halted immediately because important quality control data is missing. Unfortunately, compromised data integrity like this real-life example isn’t uncommon. Problems with the accuracy and consistency of data exist across all industries and can cause everything from minor hassles to significant business problems.
In this era of big data, when more information is processed and stored than ever before, implementing measures that preserve the integrity of collected data is increasingly important. Understanding the fundamentals of data integrity and how it works is the first step in keeping data safe. Read on to learn what data integrity is, why it’s essential, and what you can do to keep your data intact.
What is data integrity?
Data integrity is the overall accuracy, completeness, and consistency of data. The term also covers the safety of data with regard to security and regulatory compliance, such as GDPR compliance. It is maintained by a collection of processes, rules, and standards implemented during the design phase. When the integrity of data is secure, the information stored in a database remains complete, accurate, and reliable no matter how long it’s stored or how often it’s accessed. Data integrity also helps protect your data from outside forces.
Types of data integrity
There are two types of data integrity: physical integrity and logical integrity. Each is a collection of processes and methods that enforce data integrity in both hierarchical and relational databases.
Physical integrity is the protection of data’s wholeness and accuracy as it’s stored and retrieved. When natural disasters strike, power goes out, or hackers disrupt database functions, physical integrity is compromised. Human error, storage erosion, and a host of other issues can also make it impossible for data processing managers, system programmers, applications programmers, and internal auditors to obtain accurate data.
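One common safeguard for physical integrity is a checksum: record a cryptographic digest when data is written, then recompute it later to detect silent corruption on disk. The sketch below uses Python’s standard hashlib module; the helper names sha256_of and verify and the sample file orders.csv are hypothetical. A checksum only detects damage, so recovering the data still depends on backups.

```python
import hashlib
import tempfile
from pathlib import Path

# Hypothetical helpers, a sketch rather than a production integrity tool.

def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify(path: Path, expected_digest: str) -> bool:
    """True if the file's current contents still match the recorded digest."""
    return sha256_of(path) == expected_digest

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data_file = Path(tmp) / "orders.csv"  # hypothetical sample file
        data_file.write_bytes(b"order_id,amount\n1,19.99\n")
        recorded = sha256_of(data_file)  # store this digest alongside the data

        print(verify(data_file, recorded))  # True: contents unchanged

        # Simulate silent corruption: flip one bit in the last digit.
        raw = bytearray(data_file.read_bytes())
        raw[-2] ^= 0x01
        data_file.write_bytes(bytes(raw))
        print(verify(data_file, recorded))  # False: corruption detected
```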
Logical integrity keeps data unchanged as it’s used in different ways in a relational database. Logical integrity protects data from human error and hackers as well, but in a much different way than physical integrity does. There are four types of logical integrity.
Entity integrity relies on the creation of primary keys, unique values that identify each record, to ensure that no record is listed more than once and that no primary key field is null. It’s a feature of relational systems that store data in tables that can be linked and used in a variety of ways.
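To make entity integrity concrete, here is a minimal sketch using Python’s built-in sqlite3 module and an in-memory database. The customers table and its columns are hypothetical. The primary key rejects both a duplicate ID and a missing (NULL) ID. NOT NULL is spelled out explicitly because SQLite, for historical reasons, allows NULL in a non-INTEGER primary key of an ordinary table unless told otherwise.

```python
import sqlite3

# Hypothetical customers table in a throwaway in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customers (
        customer_id TEXT PRIMARY KEY NOT NULL,  -- unique, never NULL
        name        TEXT NOT NULL
    )
""")
conn.execute("INSERT INTO customers VALUES ('C001', 'Ada Lovelace')")

def try_insert(row):
    """Attempt to insert (customer_id, name) and report what happened."""
    try:
        conn.execute("INSERT INTO customers VALUES (?, ?)", row)
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"

print(try_insert(("C001", "Duplicate Ada")))  # duplicate key: rejected
print(try_insert((None, "Nameless key")))     # NULL key: rejected
print(try_insert(("C002", "Grace Hopper")))   # new, unique key: accepted
```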
Referential integrity refers to the series of processes that keep relationships between tables consistent, so that data is stored and used uniformly. Rules embedded into the database’s structure about how foreign keys are used ensure that only appropriate changes, additions, or deletions of data occur. These rules may include constraints that reject a foreign key value with no matching primary key in the referenced table, or that block (or cascade) the deletion of a row that other rows still reference.
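Here’s a minimal sketch of foreign-key rules using Python’s sqlite3 module, assuming hypothetical customers and orders tables. The constraint rejects an order for a customer that doesn’t exist and blocks deleting a customer who still has orders. Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON` has been run on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is enabled per connection.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical schema: every order must belong to an existing customer.
conn.executescript("""
    CREATE TABLE customers (
        customer_id TEXT PRIMARY KEY NOT NULL
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id TEXT NOT NULL REFERENCES customers(customer_id)
                    ON DELETE RESTRICT
    );
    INSERT INTO customers VALUES ('C001');
    INSERT INTO orders (customer_id) VALUES ('C001');
""")

def attempt(sql, params=()):
    """Run one statement and report whether the database allowed it."""
    try:
        conn.execute(sql, params)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"

# An order pointing at a nonexistent customer would be an orphan: rejected.
print(attempt("INSERT INTO orders (customer_id) VALUES (?)", ("C999",)))
# Deleting a customer who still has orders would orphan them: rejected.
print(attempt("DELETE FROM customers WHERE customer_id = ?", ("C001",)))
# An order for an existing customer: accepted.
print(attempt("INSERT INTO orders (customer_id) VALUES (?)", ("C001",)))
```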
Domain integrity is the collection of processes that ensure the accuracy of each piece of data in a domain. In this context, a domain is a set of acceptable values that a column is allowed to contain. It can include constraints and other measures that limit the format, type, and amount of data entered.
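Domain rules can often be declared directly in the schema. This sketch, again using Python’s sqlite3 module with a hypothetical products table, uses CHECK constraints to limit each column to an allowed length, set of values, or range. Because SQLite’s typing is flexible, constraints like these are only part of the picture, and applications usually validate input as well.

```python
import sqlite3

# Hypothetical table: each CHECK constraint describes one column's domain.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE products (
        sku       TEXT    NOT NULL CHECK (length(sku) = 8),
        status    TEXT    NOT NULL CHECK (status IN ('active', 'discontinued')),
        unit_cost REAL    NOT NULL CHECK (unit_cost >= 0),
        stock_qty INTEGER NOT NULL CHECK (stock_qty BETWEEN 0 AND 100000)
    )
""")

def insert_product(row):
    """Try to insert (sku, status, unit_cost, stock_qty); report the outcome."""
    try:
        conn.execute("INSERT INTO products VALUES (?, ?, ?, ?)", row)
        return "accepted"
    except sqlite3.IntegrityError:  # CHECK and NOT NULL violations land here
        return "rejected"

print(insert_product(("AB-12345", "active", 4.50, 120)))    # within every domain
print(insert_product(("AB-1", "active", 4.50, 120)))        # SKU too short
print(insert_product(("AB-12345", "archived", 4.50, 120)))  # status not allowed
print(insert_product(("AB-12345", "active", -1.00, 120)))   # negative cost
```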
User-defined integrity involves the rules and constraints created by the user to fit their particular needs. Sometimes entity, referential, and domain integrity aren’t enough to safeguard data. Often, specific business rules must be taken into account and incorporated into data integrity measures.
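A user-defined rule usually encodes business logic that built-in constraints can’t express on their own. As an illustration, assume a hypothetical rule that the total of a customer’s orders may not exceed that customer’s credit limit. The sketch below enforces it with a SQLite trigger that aborts any insert that would break the rule; amounts are stored in integer cents to avoid floating-point rounding.

```python
import sqlite3

# Hypothetical schema and business rule, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        customer_id        TEXT PRIMARY KEY NOT NULL,
        credit_limit_cents INTEGER NOT NULL
    );
    CREATE TABLE orders (
        order_id     INTEGER PRIMARY KEY,
        customer_id  TEXT NOT NULL,
        amount_cents INTEGER NOT NULL
    );
    -- Business rule: the sum of a customer's orders, including the new one,
    -- must not exceed that customer's credit limit. Unknown customers get a
    -- limit of 0, so their orders are refused as well.
    CREATE TRIGGER enforce_credit_limit
    BEFORE INSERT ON orders
    WHEN NEW.amount_cents
         + (SELECT COALESCE(SUM(amount_cents), 0) FROM orders
            WHERE customer_id = NEW.customer_id)
         > COALESCE((SELECT credit_limit_cents FROM customers
                     WHERE customer_id = NEW.customer_id), 0)
    BEGIN
        SELECT RAISE(ABORT, 'order exceeds customer credit limit');
    END;
    INSERT INTO customers VALUES ('C001', 50000);
""")

def place_order(customer_id, amount_cents):
    """Insert an order and report whether the business rule allowed it."""
    try:
        conn.execute(
            "INSERT INTO orders (customer_id, amount_cents) VALUES (?, ?)",
            (customer_id, amount_cents),
        )
        return "accepted"
    except sqlite3.DatabaseError:  # RAISE(ABORT, ...) surfaces as a database error
        return "rejected"

print(place_order("C001", 30000))  # running total 30000: accepted
print(place_order("C001", 15000))  # running total 45000: accepted
print(place_order("C001", 10000))  # would reach 55000 > 50000: rejected
```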
What data integrity isn’t
With so much talk about data integrity, it’s easy for the true meaning to be muddled. Often data security and data quality are incorrectly substituted for data integrity, but each term has a distinct meaning.
Data integrity is not data security
Data security is the collection of measures taken to keep data from being accessed, corrupted, or stolen by unauthorized parties. It incorporates the use of systems, processes, and procedures that keep data inaccessible to others who may use it in harmful or unintended ways. Breaches in data security may be small and easy to contain, or large enough to cause significant damage.
While data integrity is concerned with keeping information intact and accurate for the entirety of its existence, the goal of data security is to protect information from outside attacks. Data security is but one of the many facets of data integrity. Data security is not broad enough to include the many processes necessary for keeping data unchanged over time.
Data integrity is not data quality
Does the data in your database meet company-defined standards and the needs of your business? Data quality answers this question with an assortment of processes that measure your data’s age, relevance, accuracy, completeness, and reliability.
Much like data security, data quality is only a part of data integrity, but a crucial one. Data integrity encompasses every aspect of data quality and goes further by implementing an assortment of rules and processes that govern how data is entered, stored, transferred, and much more.
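Some of these data quality measurements are simple to compute. As a hypothetical example, the Python sketch below scores completeness (the share of records with a non-empty email) and age (the share of records not updated in roughly two years) over a small in-memory list. The field names, sample values, and the 730-day threshold are assumptions for illustration, not standard metrics.

```python
from datetime import date

# Hypothetical customer records; None or "" marks a missing value.
records = [
    {"id": 1, "email": "ada@example.com",   "updated": date(2024, 5, 1)},
    {"id": 2, "email": None,                "updated": date(2021, 1, 15)},
    {"id": 3, "email": "grace@example.com", "updated": date(2024, 3, 20)},
    {"id": 4, "email": "",                  "updated": date(2019, 7, 4)},
]

def completeness(rows, field):
    """Share of rows where `field` holds a non-empty value."""
    filled = sum(1 for r in rows if r.get(field) not in (None, ""))
    return filled / len(rows) if rows else 0.0

def stale_share(rows, field, as_of, max_age_days):
    """Share of rows whose date in `field` is older than `max_age_days`."""
    stale = sum(1 for r in rows if (as_of - r[field]).days > max_age_days)
    return stale / len(rows) if rows else 0.0

as_of = date(2024, 6, 1)
print(f"email completeness: {completeness(records, 'email'):.0%}")  # 50%
print(f"older than 2 years: {stale_share(records, 'updated', as_of, 730):.0%}")  # 50%
```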
Data integrity and GDPR compliance
Data integrity is key to complying with data protection regulations like GDPR. Non-compliance with these regulations can make companies liable for large penalties: under GDPR, fines can reach €20 million or 4% of a company’s worldwide annual turnover, whichever is higher. In some instances, companies may also be sued on top of these fines. Repeated compliance violations can even put companies out of business.
Fortunately, there are ways to ensure the data integrity you need to comply with GDPR and other data protection legislation. Take a look at our series Practical Steps to GDPR Compliance.
Data integrity risks
A variety of factors can affect the integrity of the data stored in a database. A few examples include:
- Human error: When individuals enter information incorrectly, duplicate or delete data, don’t follow the appropriate protocol, or make mistakes during the implementation of procedures meant to safeguard information, data integrity is put in jeopardy.
- Transfer errors: When data can’t successfully transfer from one location in a database to another, a transfer error has occurred. In a relational database, a transfer error shows up as a mismatch between tables: for example, a record that is present in the destination table but not in the source table, or one that never arrived at the destination.
- Bugs and viruses: Software bugs can corrupt or delete data unintentionally, while spyware, malware, and viruses are malicious programs that can invade a computer and alter, delete, or steal data.
- Compromised hardware: Sudden computer or server crashes, and problems with how a computer or other device functions, are examples of significant failures and may be indications that your hardware is compromised. Compromised hardware may render data incorrectly or incompletely, limit or eliminate access to data, or make information hard to use.
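Transfer errors are usually caught by reconciling the source and destination after a load. The Python sketch below compares two hypothetical tables by primary key and by a hash of each row, flagging records that never arrived, records that appear only in the destination, and records whose values changed in transit. In practice the rows would come from database queries rather than literal lists.

```python
import hashlib

def row_fingerprint(row):
    """Hash of a row's values (via repr, so None and "" stay distinct)."""
    joined = "\x1f".join(repr(v) for v in row)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()

def reconcile(source, destination, key_index=0):
    """Compare two lists of rows keyed by the column at `key_index`."""
    src = {row[key_index]: row_fingerprint(row) for row in source}
    dst = {row[key_index]: row_fingerprint(row) for row in destination}
    return {
        "missing_in_destination": sorted(src.keys() - dst.keys()),
        "unexpected_in_destination": sorted(dst.keys() - src.keys()),
        "altered": sorted(k for k in src.keys() & dst.keys() if src[k] != dst[k]),
    }

# Hypothetical sample rows: (id, name, balance).
source_rows = [(1, "Ada", 120.0), (2, "Grace", 75.5), (3, "Alan", 42.0)]
dest_rows = [(1, "Ada", 120.0), (2, "Grace", 57.5), (4, "Edsger", 10.0)]

print(reconcile(source_rows, dest_rows))
# {'missing_in_destination': [3], 'unexpected_in_destination': [4], 'altered': [2]}
```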
Risks to data integrity can be minimized by doing the following:
- Limiting access to data and changing permissions to restrict changes to information by unauthorized parties
- Validating data to make sure it’s correct both when it’s gathered and when it’s used
- Backing up data
- Using logs to keep track of when data is added, modified, or deleted
- Conducting regular internal audits
- Using error detection software
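Logging changes can be handled inside the database itself. As a minimal sketch, assuming a hypothetical accounts table in SQLite, the triggers below record every insert, update, and delete in an audit_log table, and two further triggers refuse edits to the log so past entries can’t be quietly rewritten. A production setup would also record who made each change, which SQLite alone doesn’t track.

```python
import sqlite3

# Hypothetical schema: triggers, not application code, write the audit trail.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (
        account_id INTEGER PRIMARY KEY,
        balance    INTEGER NOT NULL
    );
    CREATE TABLE audit_log (
        log_id     INTEGER PRIMARY KEY,
        action     TEXT NOT NULL,
        account_id INTEGER NOT NULL,
        old_value  INTEGER,
        new_value  INTEGER,
        changed_at TEXT NOT NULL DEFAULT (datetime('now'))
    );
    CREATE TRIGGER log_insert AFTER INSERT ON accounts BEGIN
        INSERT INTO audit_log (action, account_id, new_value)
        VALUES ('INSERT', NEW.account_id, NEW.balance);
    END;
    CREATE TRIGGER log_update AFTER UPDATE ON accounts BEGIN
        INSERT INTO audit_log (action, account_id, old_value, new_value)
        VALUES ('UPDATE', NEW.account_id, OLD.balance, NEW.balance);
    END;
    CREATE TRIGGER log_delete AFTER DELETE ON accounts BEGIN
        INSERT INTO audit_log (action, account_id, old_value)
        VALUES ('DELETE', OLD.account_id, OLD.balance);
    END;
    -- Make the log append-only: updates and deletes on it are refused.
    CREATE TRIGGER audit_log_no_update BEFORE UPDATE ON audit_log BEGIN
        SELECT RAISE(ABORT, 'audit_log is append-only');
    END;
    CREATE TRIGGER audit_log_no_delete BEFORE DELETE ON audit_log BEGIN
        SELECT RAISE(ABORT, 'audit_log is append-only');
    END;
""")

conn.execute("INSERT INTO accounts VALUES (1, 100)")
conn.execute("UPDATE accounts SET balance = 80 WHERE account_id = 1")
conn.execute("DELETE FROM accounts WHERE account_id = 1")

query = "SELECT action, account_id, old_value, new_value FROM audit_log ORDER BY log_id"
for row in conn.execute(query):
    print(row)
# ('INSERT', 1, None, 100)
# ('UPDATE', 1, 100, 80)
# ('DELETE', 1, 80, None)
```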
Getting started with data integrity
Protecting the integrity of your company’s data using traditional methods can seem like an overwhelming task. Secure, cloud-based data integration platforms offer a modern alternative that provides a real-time view of all of your data. With industry-leading cloud integration tools, you can connect multiple source applications and get access to all of your company’s data in one location.
Take a look at the Definitive Guide to Data Governance to find out how to establish a framework for data integrity.