What is Data Obfuscation?

Data privacy has never been more of a concern than it is today. Breaches of private data continue to grow in frequency and magnitude. According to the Privacy Rights Clearinghouse's Chronology of Data Breaches, over 10 billion data records have been breached in more than 9,000 data breaches made public since 2005. Learn how data obfuscation could have prevented the disclosure of many of those records, even if the breaches were successful. 

Data obfuscation is the process of obscuring the meaning of data. Organizations should obfuscate their sensitive data now so that, in the event of a breach, the stolen data is useless to the attacker and the organization is not compromised.

Data obfuscation methods 

If you ask ten people the definition of data obfuscation, you will get 12 different answers. That is because there are many different ways to obfuscate data, each designed for specific goals and purposes. Three of the most common techniques used to obfuscate data are encryption, tokenization, and data masking. 

 

Encryption, tokenization, and data masking work in different ways. Encryption and tokenization are reversible: the original value can be derived from the obfuscated data. Data masking is irreversible if done correctly. Here is how the three main types of data obfuscation differ:

  • Encryption is very secure, but you lose the ability to work with or analyze the data while it's encrypted. It is a good obfuscation method if you need to store or transfer data securely.
  • Tokenization substitutes specific data with a value that is meaningless. However, authorized users can link the token back to the original data. Operations can be performed on tokenized data; for example, a payment can be processed without revealing the credit card number to a third-party processor.
  • Data masking substitutes realistic but false data for the original to ensure privacy. It is used to allow for testing, training, application development, or support personnel to work with the data set without sharing sensitive data. 

The most common method of data obfuscation is data masking. Data masking permits you to make data public without revealing private information, which is an essential function in today's world. The National Institutes of Health funded $700,000 of research to develop data masking technology for medical records because the need for it is so great.

Benefits of data obfuscation

The most obvious and essential benefit of data obfuscation is that it hides data from those who are not authorized to see it. Proper data obfuscation brings other benefits too, such as reducing the scope of data that falls under regulatory requirements like GDPR. A smaller scope can mean substantial savings and a lower risk of breaches and fines. The manner in which the data is hidden yields further benefits as well.

Encryption is the most secure way to hide data. It is reversible, so the original can be retrieved, but only by those who have the keys necessary to decrypt it. Encryption is good for secure storage and transfer of data.

Tokenization allows you to replace sensitive data with substitute data. Once tokenized, you can provide the data set to a third party for processing without revealing the sensitive data. They can then process the data and return the resultant data set. You can then reverse the tokenized data to its original value.
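The round trip described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the TokenVault class, its method names, and the "tok_" prefix are hypothetical, and the vault is a plain in-memory dictionary, whereas a real token vault would be a secured, access-controlled service.

```python
import secrets


class TokenVault:
    """Hypothetical in-memory token vault (illustration only)."""

    def __init__(self):
        self._token_to_value = {}
        self._value_to_token = {}

    def tokenize(self, value: str) -> str:
        # Reuse the existing token so the same value always maps to one token.
        if value in self._value_to_token:
            return self._value_to_token[value]
        token = "tok_" + secrets.token_hex(8)
        while token in self._token_to_value:  # extremely unlikely collision
            token = "tok_" + secrets.token_hex(8)
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token: str) -> str:
        # Only code with access to the vault can recover the original value.
        return self._token_to_value[token]


vault = TokenVault()
records = [
    {"card": "4111111111111111", "amount": 25.00},
    {"card": "5500005555555559", "amount": 12.50},
    {"card": "4111111111111111", "amount": 7.25},
]

# What you would hand to the third party: tokens instead of card numbers.
outbound = [{**r, "card": vault.tokenize(r["card"])} for r in records]

# What you do with the returned data set: map tokens back to the originals.
restored = [{**r, "card": vault.detokenize(r["card"])} for r in outbound]
```

Because the token is random, it carries no information about the card number; the mapping lives only in the vault, which is why the vault itself must be protected.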

Data masking also uses substitution, but it is irreversible. Eliminating the need to be reversible makes data masking very secure and less expensive than encryption. A unique benefit of data masking is that you can maintain data integrity so that testers or application developers see and use realistic, albeit false, data.

How can fake data have data integrity? In this case, integrity does not mean accurate data, but rather that it behaves exactly the same as the original data. For example, a credit card number can be replaced by a different 16-digit numerical value that will pass the checksum for a valid credit card number. If it fails the checksum, it does not have data integrity. Any references to other fields must remain functional to maintain integrity, as well.
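The checksum in question is the Luhn algorithm, which payment card numbers use for their final check digit. As a minimal sketch, the Python below generates a random replacement of the same length whose last digit is computed so that the result passes the Luhn check. The function names are made up for this example, and a production tool would also need to decide how to handle substitutes that happen to match a real card number.

```python
import random


def luhn_check_digit(partial: str) -> str:
    """Return the digit that makes partial + digit pass the Luhn checksum."""
    total = 0
    # Walk right to left. The rightmost digit of `partial` sits next to the
    # check digit, so it and every second digit after it are doubled.
    for i, ch in enumerate(reversed(partial)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return str((10 - total % 10) % 10)


def luhn_valid(number: str) -> bool:
    """True if `number` is all digits and its last digit is the Luhn check digit."""
    return len(number) > 1 and number.isdigit() and luhn_check_digit(number[:-1]) == number[-1]


def mask_card_number(original: str, rng=None) -> str:
    """Replace a card number with random digits of the same length that pass Luhn."""
    rng = rng or random.SystemRandom()
    body = "".join(str(rng.randrange(10)) for _ in range(len(original) - 1))
    return body + luhn_check_digit(body)


masked = mask_card_number("4111111111111111")
```

The masked value has no relationship to the original, so it cannot be reversed, yet an application that validates card numbers will accept it.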

Data masking also benefits from being highly customizable. You can select which data fields get masked and exactly how each substitute value is selected and formatted. For example, U.S. Social Security numbers have the format nnn-nn-nnnn, where each n is a digit from 0 to 9. You might replace the first five digits with the letter x, replace all nine digits with random digits, or make any other substitution. The right choice depends on what best suits your needs.
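The two Social Security number substitutions mentioned above might look like this in Python. This is a sketch rather than a reference implementation; the function names and the sample number are invented for illustration.

```python
import random
import re

# nnn-nn-nnnn, where each n is a digit
SSN_PATTERN = re.compile(r"\d{3}-\d{2}-\d{4}")


def mask_ssn_prefix(ssn: str) -> str:
    """Replace the first five digits with 'x' and keep the last four."""
    if not SSN_PATTERN.fullmatch(ssn):
        raise ValueError("expected the format nnn-nn-nnnn")
    return "xxx-xx-" + ssn[-4:]


def mask_ssn_random(ssn: str, rng=None) -> str:
    """Replace all nine digits with random digits, keeping the dashes in place."""
    if not SSN_PATTERN.fullmatch(ssn):
        raise ValueError("expected the format nnn-nn-nnnn")
    rng = rng or random.SystemRandom()
    return "".join(str(rng.randrange(10)) if ch.isdigit() else ch for ch in ssn)


partially_masked = mask_ssn_prefix("123-45-6789")  # "xxx-xx-6789"
fully_masked = mask_ssn_random("123-45-6789")      # same format, random digits
```

The first option leaves the last four digits visible, which is convenient for support staff but reveals part of the real value. The second reveals nothing while still fitting any field that expects the nnn-nn-nnnn format.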

Challenges of data obfuscation

Just as data obfuscation has its benefits, it also has its challenges. The biggest challenge is planning, as it can consume a lot of time and resources. Data management is an enterprise-wide effort, so data owners, data stewards from each department (if you are fortunate enough to have them), and recipients of your obfuscated data should all be involved in planning any data obfuscation effort. Even just selecting which data needs to be obfuscated will likely require more effort than you imagine.

Implementation can also be a significant effort. Data masking's customizability is a great benefit, but it comes with the challenge of configuring each field to your specifications. You can offset that to some extent with a proper data masking tool.

Data obfuscation and the cloud

Data management can take advantage of cloud technologies to deliver data services faster and with better results than on-premises solutions. Cloud-native data solutions provide a variety of tools and services that are simple to use and improve data management. While cloud computing has proven to be as safe as, if not safer than, on-premises solutions, some organizations still have security concerns about cloud-based services.

Data obfuscation can mitigate these concerns. If data is obfuscated before being ingested into a cloud-native data repository, then even if that data is breached, it will be useless to the attacker: the stolen data contains only the fake values substituted by data masking. Some cloud-native data services have data masking tools built in, making masking relatively easy to implement.

Data obfuscation best practices

Successful data obfuscation starts with careful planning. The old carpenter's adage, "measure twice, cut once," applies just as well here. Make sure that you include these steps in your data obfuscation plan:

  • Get buy-in and support from your data owners, data stewards, and management
  • Identify sensitive data by collaborating with your organization's departmental data stewards
  • Include internal and external regulations, policies, and standards, such as GDPR, with which your organization must comply when identifying sensitive data
  • Determine the data masking techniques, rules, and formats for each piece of sensitive data. Organizing data into groups with common characteristics can simplify this process
  • Select a proper tool to automate as much as possible 
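To make the step about techniques, rules, and groups more concrete, here is one way the outcome of that planning could be written down in Python. Everything in it is an assumption for illustration: the field names, the group names, and the two masking rules are hypothetical, and a dedicated masking tool would normally manage this configuration for you.

```python
def mask_digits_keep_last(value: str, keep: int = 4) -> str:
    """Replace every digit with 'x' except the last `keep` digits."""
    to_mask = max(sum(ch.isdigit() for ch in value) - keep, 0)
    out = []
    for ch in value:
        if ch.isdigit() and to_mask > 0:
            out.append("x")
            to_mask -= 1
        else:
            out.append(ch)  # separators and the kept digits pass through
    return "".join(out)


def redact_all(value: str) -> str:
    """Replace the whole value with a fixed placeholder."""
    return "[REDACTED]"


# Fields with common characteristics share one rule (hypothetical grouping).
RULE_GROUPS = {
    "numeric_identifiers": (["ssn", "card_number"], mask_digits_keep_last),
    "free_text": (["notes"], redact_all),
}


def build_field_rules(groups):
    """Flatten the groups into a field -> masking function lookup."""
    return {field: rule for fields, rule in groups.values() for field in fields}


def mask_record(record, field_rules):
    """Apply the matching rule to each sensitive field; leave other fields as-is."""
    return {
        key: field_rules[key](value) if key in field_rules else value
        for key, value in record.items()
    }


field_rules = build_field_rules(RULE_GROUPS)
record = {
    "customer_id": 7,
    "ssn": "123-45-6789",
    "card_number": "4111111111111111",
    "notes": "called about a refund",
}
masked_record = mask_record(record, field_rules)
```

Grouping means that adding a new sensitive field is a one-line change to the group's field list rather than a new rule.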

Unless there is a specific need for your obfuscation technique to be reversible, use irreversible data masking. It is the surest way to protect sensitive data. It is also less expensive than most reversible techniques. If the masking is done correctly, the resulting data set will be as useful to your trainers, application developers, and testers as the original data.

For data masking to be done right, you must ensure that data integrity is maintained. Data integrity is essential if users of the resulting data set are to use the masked data as effectively as the original data. For example, suppose you are analyzing credit card usage. Specifically, you need to determine how many credit cards in your data set were issued by each bank, based solely on the credit card numbers. The first six digits of a credit card number are the bank identification number (BIN). To maintain integrity and protect sensitive data in this case, you should keep the first six digits and obfuscate the rest.
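A minimal Python sketch of that BIN-preserving approach follows. The function name and the sample card numbers are invented for illustration. Note that this version does not recompute the checksum digit, so the masked numbers would not necessarily pass card number validation; that is acceptable for the per-bank count described here, but not for every use case.

```python
import random
from collections import Counter


def mask_keep_bin(card_number: str, rng=None) -> str:
    """Keep the first six digits (the BIN) and randomize every remaining digit."""
    rng = rng or random.SystemRandom()
    bin_part = card_number[:6]
    rest = "".join(str(rng.randrange(10)) for _ in range(len(card_number) - 6))
    return bin_part + rest


# Made-up sample numbers: two share one BIN, the third has a different BIN.
cards = ["4111111111111111", "4111119876543210", "5500005555555559"]
masked_cards = [mask_keep_bin(c) for c in cards]

# The per-BIN counts are identical before and after masking.
counts_original = Counter(c[:6] for c in cards)
counts_masked = Counter(m[:6] for m in masked_cards)
```

The analysis gives the same answer on the masked set as on the original, which is what maintaining integrity means in this example.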

How to make data obfuscation work for you

Data obfuscation can be done in several ways. The most common use cases — testing, training, application development, and support — call for data masking, the irreversible substitution of the original sensitive data with realistic but fake data. Masked data can maintain integrity and may be customized to meet the needs of each specific use case.

While data masking has many benefits, it can consume a lot of time and resources. Use best practices to achieve successful data obfuscation and automate the process wherever possible.

Talend Data Fabric allows you to simplify the data masking process. As a comprehensive suite of apps focused on data integration and data integrity, Talend Data Fabric empowers companies to collect, govern, transform, and share trusted data all at the speed of business.

Learn how data masking enables you to reduce your regulatory footprint, realize savings, and reduce risk. Share quality data across your organization without exposing personally identifiable information by using Talend Data Fabric today for data you can trust.

Last Updated: November 11th, 2019