Data Conversion 101: Improving Database Accuracy

In this digital era, businesses are faced with the complex challenge of managing a vast array of data generated by multiple applications, devices, and operating systems. The first step in meeting this daunting challenge? Data conversion. Without this simple yet essential step, organizations would have an abundance of useless data and miss out on the opportunity to gain valuable business insights on customer behavior, operations, and trends.

In this article, we define “data conversion” and show you how this critical step can help you attain data integration goals.

What is data conversion?

Data conversion is the process of translating data from one format to another. While the concept itself may seem simple, data conversion is a critical step in the process of data integration. This step enables the data to be read, altered, and executed in an application or database other than that in which it was created.
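As a rough illustration of translating data from one format to another, the sketch below converts CSV text into JSON using only the Python standard library. The function name `csv_to_json` and the sample data are invented for this example, and a real conversion would also need to handle data types, encodings, and malformed input.

```python
import csv
import io
import json


def csv_to_json(csv_text: str) -> str:
    """Convert CSV text (with a header row) into a JSON array of objects."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Every CSV field is read as a string; no numeric typing is attempted here.
    return json.dumps(list(reader))


# Hypothetical sample data, used only for illustration.
sample = "id,name\n1,Ada\n2,Grace\n"
print(csv_to_json(sample))
```

The content is the same in both formats; only the representation changes, which is what allows a second application to read it.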

For instance, if your network uses different techniques for storing data (especially numeric values) or if your business needs to communicate between networks or users who view data in different character/symbol sets, you need a process for data conversion.

A typical example of data conversion is the way in which many of us consume entertainment. In order to play a music or video file on a mobile device, the file needs to be converted from its source format (such as an MKV file) to one that can be read by the device (such as an MP4 file).

The goal of data conversion is to prevent data loss or corruption by maintaining the integrity of the data and embedded structures. This can be done easily if the destination format supports the same features and data structures as the source data. If the source formatting is not supported, however, businesses must correctly and comprehensively convert the format and structure to read, modify, and analyze data.

What data conversion is not

Data conversion is often confused with data migration, data transformation, and data cleansing. Here is how these processes differ.

  • Data migration: Where data conversion translates individual computer objects and data types from one format to another, data migration transfers entire databases or programs from one location to another. Data migration often entails data conversion, data transformation, and/or data cleansing.
  • Data transformation: Data conversion translates one format to another. An example would be converting an RTF file to a Word file. Data transformation changes the data presentation. A common data transformation process is to condense the data, as shown in this example. Note: the format itself does not change.

    Transform:
        foo ("string-A", 77, kCommon);
        bar (Obj-W, Obj-X);
        foo ("string-B", 23, kCommon);
        bar (Obj-Y, Obj-Z);

    To:
        foobar ("string-A", 77, Obj-W, Obj-X);
        foobar ("string-B", 23, Obj-Y, Obj-Z);
  • Data cleansing: Data cleansing finds and corrects inaccurate, repeated, and incomplete data. This procedure often occurs after a data conversion, data transformation, or data migration process.
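To make the contrast with conversion concrete, here is a minimal sketch of a cleansing pass that drops incomplete and repeated records. The record layout, the `cleanse` function, and the choice of `email` as the duplicate key are all assumptions made for this example, not part of any particular tool.

```python
def cleanse(records, required=("id", "email")):
    """Drop incomplete records and duplicates (keyed on normalized email)."""
    seen = set()
    cleaned = []
    for rec in records:
        # Incomplete: a required field is missing or empty.
        if any(rec.get(field) in (None, "") for field in required):
            continue
        # Normalize before comparing so "A@x.com " and "a@x.com" match.
        email = rec["email"].strip().lower()
        if email in seen:
            continue  # Repeated record.
        seen.add(email)
        cleaned.append({**rec, "email": email})
    return cleaned


# Hypothetical records, used only for illustration.
records = [
    {"id": 1, "email": "A@x.com "},
    {"id": 2, "email": "a@x.com"},  # duplicate of the first record
    {"id": 3, "email": ""},         # incomplete
]
print(cleanse(records))
```

Note that the format of the data never changes here; only its quality does.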

Types of data that can be converted

The first step to data conversion is to understand the different types of data you can convert. All programming languages rely on data types that tell the compiler or interpreter how to use the data. The data type determines the operations that can be performed on the data and defines the structure in which the data is to be stored. Most data types (both primitive and composite) can be converted. Here are just a few of many examples:

  • Programming language data types (C language versus Java, for example)
  • Code pages (character/symbol sets) that are language specific (English versus Spanish, for example)
  • Code pages that are specific to operating systems (ASCII versus EBCDIC, for example)
  • Document types, including different text, audio, and video file formats
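The code page case can be sketched with Python's built-in codecs. This example assumes `cp037` as the EBCDIC variant (EBCDIC has several); the same text produces different bytes under each code page, and converting means decoding with the source code page and encoding with the destination one.

```python
text = "HELLO"

ascii_bytes = text.encode("ascii")
ebcdic_bytes = text.encode("cp037")  # cp037 is one common EBCDIC code page

print(ascii_bytes.hex())   # 48454c4c4f
print(ebcdic_bytes.hex())  # c8c5d3d3d6

# Convert EBCDIC bytes to ASCII bytes: decode with the source code page,
# then encode with the destination code page.
converted = ebcdic_bytes.decode("cp037").encode("ascii")
print(converted == ascii_bytes)  # True
```

Reading the EBCDIC bytes as if they were ASCII would fail or produce garbage, which is why the source code page has to be known before conversion.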

Accurate, complete data conversion is essential for applications used by professionals who are dependent on available data. To understand the different types of data that can be converted, let’s look at these real-life applications.

  • The healthcare industry relies on high-quality conversion of historical data to keep health records accurate. This industry frequently uses data conversion as it transitions between electronic medical record systems.
  • Telecommunications and networking companies rely on vendor-agnostic input and output that can only be achieved with data conversion.
  • The scientific community often needs to merge the findings of complex studies that were recorded in dissimilar formats.
  • Insurance companies make frequent use of data conversions, particularly to read and manage documents that are shared in different formats. Data must be compatible to flow freely across the industry so the claim process runs as efficiently as possible.

How data conversion works

Data conversion can be a complex process, though it doesn’t have to be. Tools for automating the process can improve both the accuracy and completeness of the converted data, while reducing development time. The basic steps that most data conversions incorporate are as follows:

  1. A comprehensive plan is developed based on user requirements.
  2. The data, along with its character/symbol set, is extracted from its source.
  3. That source data is converted to the format of the destination.
  4. The data is reviewed and loaded to the target system.
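The four steps above can be sketched as a small pipeline. Everything named here (the `PLAN` mapping, the column names, and the `extract`, `convert`, `review`, and `load` functions) is hypothetical, and the "target system" is just a JSON string standing in for a real database or application.

```python
import csv
import io
import json

# Step 1 (plan): a hypothetical mapping agreed with users:
# source column -> (destination key, conversion function).
PLAN = {
    "cust_id": ("customerId", int),
    "signup": ("signupDate", str),
    "balance": ("balance", float),
}


def extract(source_text):
    """Step 2: read rows from the source format (CSV in this sketch)."""
    return list(csv.DictReader(io.StringIO(source_text)))


def convert(rows, plan):
    """Step 3: rename fields and convert each value to its destination type."""
    return [
        {dest: cast(row[src]) for src, (dest, cast) in plan.items()}
        for row in rows
    ]


def review(source_rows, converted_rows):
    """Step 4a: a very basic check that no rows were lost or added."""
    if len(source_rows) != len(converted_rows):
        raise ValueError("row count changed during conversion")


def load(converted_rows):
    """Step 4b: stand-in for loading; serializes to the destination format."""
    return json.dumps(converted_rows)


source = "cust_id,signup,balance\n7,2020-01-31,12.50\n"
rows = extract(source)
converted = convert(rows, PLAN)
review(rows, converted)
target = load(converted)
print(target)
```

Keeping the plan as data rather than code makes it easier to review against user requirements before any records are touched.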

These basic steps vary based on several different factors. One important factor of data conversion is whether the source data type is converted to another data type or whether it is only reinterpreted as another data type.

Another factor is whether the conversion is implicit or explicit. With implicit conversion, the compiler or interpreter performs the conversion automatically. It does so by comparing the source data type with the destination data type and, where the language's rules allow it, converting the source value to the destination data type.
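A small Python illustration of implicit conversion follows. Python is interpreted rather than compiled, so here the interpreter does the work, and the variable names are chosen only for this example. Mixing an integer and a float converts the integer automatically, while mixing a string and an integer is refused rather than guessed at.

```python
# The int 3 is implicitly converted to a float before the addition.
total = 3 + 0.5
print(total, type(total).__name__)  # 3.5 float

# Python does not implicitly convert between str and int.
refused = False
try:
    "3" + 4
except TypeError:
    refused = True
print(refused)  # True
```

Which conversions happen implicitly differs from language to language, so the second case may succeed elsewhere with a result that depends on that language's rules.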

With explicit conversion, objects and data types are converted in one of three ways:

  • A runtime check is performed prior to the conversion to determine if the destination data type can hold the source value. If it cannot, an error occurs.
  • No checks are performed. If the destination data type cannot hold the source value, no error occurs. Rather, the resulting value is left undefined.
  • The raw bit pattern is copied without data being interpreted.
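The three approaches above can be approximated in Python as follows. This is only a sketch: the function names are invented for the example, and the 8-bit signed range is used because Python's own integers have no fixed size. The unchecked case uses `ctypes`, which does no overflow checking and simply keeps the low 8 bits; in other languages the result of an unchecked conversion may be undefined rather than wrapped.

```python
import ctypes
import struct


def checked_to_int8(value: int) -> int:
    """Checked: fail if the destination type cannot hold the value."""
    if not -128 <= value <= 127:
        raise OverflowError(f"{value} does not fit in a signed 8-bit integer")
    return value


def unchecked_to_int8(value: int) -> int:
    """Unchecked: ctypes keeps only the low 8 bits and raises no error."""
    return ctypes.c_int8(value).value


def float_bits(value: float) -> int:
    """Raw copy: reuse the 32-bit pattern of a float as an unsigned integer."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


print(unchecked_to_int8(300))  # 44
print(hex(float_bits(1.0)))    # 0x3f800000

try:
    checked_to_int8(300)
except OverflowError as exc:
    print("checked conversion failed:", exc)
```

The checked version is the safest default for business data, since a silent wrap such as 300 becoming 44 is hard to detect after the data has been loaded.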

Each programming language has its own rules for how data types are converted. Strongly typed languages, which enforce stricter rules (often at compile time), typically require explicit conversion between data types. Weakly typed languages, which have looser rules, are more likely to convert or reinterpret a data type implicitly, and this can produce unpredictable results.

Data conversion in a cloud-native world

Failure to properly convert data can lead to an inaccurate, incomplete database that could take months to fix. Even worse, you may be making business decisions based on inaccurate data, decisions that can have real consequences for your bottom line. The fastest and most efficient way to ensure that data is converted properly is to use a data management platform that automates the conversion process.

Talend Cloud provides a complete, scalable, and secure data management solution that simplifies data conversion. The platform includes a full range of tools to manage your data from start to finish, so that your data is ready for integration when you are. Start your free trial and see how Talend Cloud can address your data conversion needs.
