Understanding Data Migration: Strategy and Best Practices
Big data is what drives most modern businesses, and big data never sleeps. That means data integration and data migration need to be well-established, seamless processes — whether data is migrating from inputs to a data lake, from one repository to another, from a data warehouse to a data mart, or in or through the cloud. Without a competent data migration plan, businesses can run over budget, end up with overwhelming data processes, or find that their data operations are functioning below expectations.
What is Data Migration?
Data migration is the process of moving data from one system to another. While this might seem straightforward, it involves a change in storage, database, or application.
In the context of the extract/transform/load (ETL) process, any data migration will involve at least the transform and load steps. This means that extracted data needs to go through a series of functions in preparation, after which it can be loaded into a target location.
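To make the extract, transform, and load steps concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `customers` table, its columns, and the cleanup rules in `transform()` are hypothetical and only illustrate the shape of the process. A real migration would use the source and target systems' own connectors.

```python
import sqlite3

def extract(source: sqlite3.Connection) -> list[tuple]:
    # Read raw rows from the hypothetical source table.
    return source.execute("SELECT id, name, email FROM customers").fetchall()

def transform(rows: list[tuple]) -> list[tuple]:
    # Prepare rows for the target: trim names, lower-case emails,
    # and drop rows without an email (an example rule, not a standard one).
    prepared = []
    for row_id, name, email in rows:
        if not email:
            continue
        prepared.append((row_id, (name or "").strip(), email.strip().lower()))
    return prepared

def load(target: sqlite3.Connection, rows: list[tuple]) -> int:
    # Insert all prepared rows; the `with` block commits, or rolls back on error.
    with target:
        target.executemany(
            "INSERT INTO customers (id, name, email) VALUES (?, ?, ?)", rows
        )
    return len(rows)

# In-memory databases stand in for the real source and target systems.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE customers (id INTEGER, name TEXT, email TEXT)")
source.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "  Ada Lovelace ", "ADA@Example.com"), (2, "No Email", None)],
)
target = sqlite3.connect(":memory:")
target.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT)"
)
loaded = load(target, transform(extract(source)))
```

Keeping the three steps as separate functions makes it easy to test the transform logic on its own before anything touches the target.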
Organizations undertake data migrations for a number of reasons. They might need to overhaul an entire system, upgrade databases, establish a new data warehouse, or merge new data from an acquisition or other source. Data migration is also necessary when deploying another system that sits alongside existing applications.
Why a Data Migration Strategy is Important
Regardless of the exact purpose for a data migration, the goal is generally to enhance performance and competitiveness.
But you have to get it right.
Less successful migrations can result in inaccurate data that contains redundancies and unknowns. This can happen even when source data is fully usable and adequate. Further, any issues that did exist in the source data can be amplified when it’s brought into a new, more sophisticated system.
A complete data migration strategy prevents a subpar experience that ends up creating more problems than it solves. Aside from missing deadlines and exceeding budgets, incomplete plans can cause migration projects to fail altogether. In planning and strategizing the work, teams need to give migrations their full attention, rather than making them subordinate to another project with a large scope.
A strategic data migration plan should include consideration of these critical factors:
- Knowing the data — Before migration, source data needs to undergo a complete audit. Unexpected issues can surface if this step is ignored.
- Cleanup — Once you identify any issues with your source data, they must be resolved. This may require additional software tools and third-party resources because of the scale of the work.
- Maintenance and protection — Data undergoes degradation after a period of time, making it unreliable. This means there must be controls in place to maintain data quality.
- Governance — Tracking and reporting on data quality is important because it enables a better understanding of data integrity. The processes and tools used to produce this information should be highly usable and automate functions where possible.
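Knowing the data starts with measuring it. The following is a minimal profiling sketch that assumes records have already been extracted as Python dictionaries. The field names, and the choice to treat `None` or an empty string as "missing", are illustrative assumptions rather than a standard.

```python
from collections import Counter

def profile(rows: list[dict], key: str) -> dict:
    """Summarize basic quality issues in a list of records."""
    columns = sorted({col for row in rows for col in row})
    # Count absent, None, or empty-string values per column.
    missing = {
        col: sum(1 for row in rows if row.get(col) in (None, ""))
        for col in columns
    }
    # Any key value seen more than once is a duplicate to resolve in cleanup.
    key_counts = Counter(row.get(key) for row in rows)
    duplicates = [k for k, n in key_counts.items() if n > 1]
    return {"rows": len(rows), "missing": missing, "duplicate_keys": duplicates}

report = profile(
    [
        {"id": 1, "email": "a@example.com", "country": "DE"},
        {"id": 2, "email": "", "country": "FR"},
        {"id": 2, "email": "c@example.com"},
    ],
    key="id",
)
```

Running the same report on a schedule after migration is one simple way to feed the governance reporting described above.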
In addition to a structured, step-by-step procedure, a data migration plan should include a process for bringing on the right software and tools for the project.
Data Migration Strategies
There is more than one way to build a data migration strategy. An organization’s specific business needs and requirements will help establish what’s most appropriate. However, most strategies fall into one of two categories: “big bang” or “trickle.”
“Big Bang” Migration
In a big bang data migration, the full transfer is completed within a limited window of time. Live systems experience downtime while data goes through ETL processing and transitions to the new database.
The draw of this method is, of course, that it all happens in one time-boxed event, requiring relatively little time to complete. The pressure, though, can be intense, as the business operates with one of its resources offline. This risks a compromised implementation.
If the big bang approach makes the most sense for your business, consider running through the migration process before the actual event.
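One way to rehearse a big bang migration is to run the full load inside a single transaction and then roll it back. This minimal sketch uses Python's `sqlite3`, and the `orders` table is hypothetical. With a production database, a realistic dry run may instead need a staging copy of the target.

```python
import sqlite3

def big_bang_migrate(rows, target, dry_run=True):
    """Load every row in one transaction; discard it all when dry_run is True."""
    try:
        target.executemany("INSERT INTO orders (id, total) VALUES (?, ?)", rows)
        loaded = target.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
    except sqlite3.Error:
        target.rollback()  # a failed load leaves the target as it was
        raise
    if dry_run:
        target.rollback()  # rehearsal only: undo the whole load
    else:
        target.commit()    # the real cutover: make the load permanent
    return loaded

target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
orders = [(1, 19.99), (2, 5.00), (3, 42.50)]

rehearsal = big_bang_migrate(orders, target, dry_run=True)
after_rehearsal = target.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
final = big_bang_migrate(orders, target, dry_run=False)
```

The rehearsal reports how many rows the target would hold but leaves it empty. The second call runs the identical code path for real.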
“Trickle” Migration
Trickle migrations, in contrast, complete the migration process in phases. During implementation, the old system and the new are run in parallel, which eliminates downtime or operational interruptions. Processes running in real-time can keep data continuously migrating.
Compared to the big bang approach, these implementations can be fairly complex in design. However, the added complexity — if done right — usually reduces risks, rather than adding them.
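Running the old and new systems in parallel usually means copying data in small increments. Here is a minimal sketch of one common pattern, a high-water mark on an ever-increasing key. It uses hypothetical `events` tables in `sqlite3`, and the batch size of 2 is chosen only to make the batching visible.

```python
import sqlite3

def trickle_batch(source, target, last_id, batch_size=2):
    """Copy the next batch of rows with id > last_id; return the new high-water mark."""
    rows = source.execute(
        "SELECT id, payload FROM events WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, batch_size),
    ).fetchall()
    if rows:
        with target:  # each batch commits on its own
            target.executemany("INSERT INTO events (id, payload) VALUES (?, ?)", rows)
        last_id = rows[-1][0]
    return last_id

def run_until_caught_up(source, target, last_id=0):
    # Keep pulling batches until a pass finds nothing new.
    while True:
        new_id = trickle_batch(source, target, last_id)
        if new_id == last_id:
            return last_id
        last_id = new_id

source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")
for db in (source, target):
    db.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
source.executemany("INSERT INTO events VALUES (?, ?)", [(i, f"e{i}") for i in range(1, 6)])
source.commit()

mark = run_until_caught_up(source, target)        # copies ids 1-5 in batches
source.execute("INSERT INTO events VALUES (6, 'e6')")  # the old system keeps running
source.commit()
mark = run_until_caught_up(source, target, mark)  # picks up only the new row
```

Tracking only an increasing `id` catches new rows but not updates or deletes. Covering those is the job of change data capture.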
Best Practices for Data Migration
Regardless of which implementation method you follow, there are some best practices to keep in mind:
- Back up the data before executing. In case something goes wrong during the implementation, you can’t afford to lose data. Make sure there are backup resources and that they’ve been tested before you proceed.
- Stick to the strategy. Too many data managers make a plan and then abandon it when the process goes “too” smoothly or when things get out of hand. The migration process can be complicated and even frustrating at times, so prepare for that reality and then stick to the plan.
- Test, test, test. During the planning and design phases, and throughout implementation and maintenance, test the data migration to make sure you will eventually achieve the desired outcome.
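For a database source, backing up and then testing the backup can be as simple as the following sketch. It uses the `backup()` method of Python's `sqlite3.Connection` (available since Python 3.7), then checks integrity and row counts. Other database systems have their own backup tools, and the `accounts` table here is hypothetical.

```python
import sqlite3

def backup_and_verify(source, backup):
    """Copy `source` into `backup`, then check the copy before migrating."""
    source.backup(backup)
    status = backup.execute("PRAGMA integrity_check").fetchone()[0]
    if status != "ok":
        raise RuntimeError(f"backup failed integrity check: {status}")
    tables = [name for (name,) in source.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    for table in tables:
        # Table names come from the database catalog, not from user input.
        src_n = source.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
        bak_n = backup.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
        if src_n != bak_n:
            raise RuntimeError(f"{table}: {src_n} rows in source, {bak_n} in backup")
    return tables

source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, owner TEXT)")
source.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, "ada"), (2, "bob")])
source.commit()
backup = sqlite3.connect(":memory:")  # in practice, a file on separate storage
verified = backup_and_verify(source, backup)
```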
6 Key Steps in a Data Migration Strategy
Each strategy will vary in the specifics, based on the organization’s needs and goals, but generally, a data migration plan should follow a common, recognizable pattern:
1. Explore and Assess the Source
Before migrating data, you must know (and understand) what you’re migrating, as well as how it fits within the target system. Understand how much data is being pulled over and what that data looks like.
There may be data with lots of fields, some of which won’t need to be mapped to the target system. There may also be fields missing from a source that will need to be pulled from another location to fill the gap. Ask yourself what needs to migrate over, what can be left behind, and what might be missing.
Beyond meeting the requirements for data fields to be transferred, run an audit on the actual data contained within. If there are poorly populated fields, a lot of incomplete data pieces, inaccuracies, or other problems, you may want to reconsider whether you really need to go through the process of migrating that data in the first place.
If an organization skips this source review step, and assumes an understanding of the data, the result could be wasted time and money on migration. Worse, the organization could run into a critical flaw in the data mapping that halts any progress in its tracks.
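The questions in this step (what needs to migrate, what can be left behind, what might be missing) become simple set comparisons once the source fields, target fields, and a field mapping are written down. Here is a minimal sketch. All field names and the mapping are hypothetical.

```python
def field_gap_analysis(source_fields, target_fields, mapping):
    """Compare source fields with target fields through a source->target mapping."""
    mapped_targets = {mapping[f] for f in source_fields if f in mapping}
    return {
        # Source fields that have a destination in the target.
        "migrate": sorted(f for f in source_fields if f in mapping),
        # Source fields with no destination: candidates to leave behind.
        "leave_behind": sorted(f for f in source_fields if f not in mapping),
        # Target fields no source field fills: they must come from elsewhere.
        "missing": sorted(set(target_fields) - mapped_targets),
    }

gaps = field_gap_analysis(
    source_fields=["cust_id", "cust_nm", "fax", "legacy_flag"],
    target_fields=["customer_id", "full_name", "country"],
    mapping={"cust_id": "customer_id", "cust_nm": "full_name"},
)
```

Writing the mapping down explicitly, even in a form this simple, makes a mapping flaw visible before migration starts rather than halfway through.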
2. Define and Design the Migration
The design phase is where organizations define the type of migration to take on — big bang or trickle. This also involves drawing out the technical architecture of the solution and detailing the migration processes.
Considering the design, the data to be pulled over, and the target system, you can begin to define timelines and any project concerns. By the end of this step, the whole project should be documented.
During planning, it’s important to consider security plans for the data. Any data that needs to be protected should have protection threaded throughout the plan.
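Threading protection through the plan can mean that sensitive fields never reach the target in clear text. One option is keyed pseudonymization with HMAC-SHA256 from Python's standard library, shown here as a sketch and not as the only approach. The same input always maps to the same token, so joins on the protected field still work. The field names and the key are hypothetical.

```python
import hashlib
import hmac

def pseudonymize(record, sensitive_fields, key):
    """Return a copy of `record` with sensitive values replaced by HMAC-SHA256 hex digests."""
    protected = dict(record)  # leave the caller's record untouched
    for field in sensitive_fields:
        value = protected.get(field)
        if value is not None:
            digest = hmac.new(key, str(value).encode("utf-8"), hashlib.sha256)
            protected[field] = digest.hexdigest()
    return protected

# Hypothetical key; in practice it would come from a secrets manager, not source code.
KEY = b"example-key-do-not-use"
row = {"id": 7, "email": "ada@example.com", "plan": "pro"}
safe = pseudonymize(row, ["email"], KEY)
```

HMAC is one-way. If the original values must be recoverable in the target, encryption is the better fit.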
3. Build the Migration Solution
It can be tempting to approach migration with a “just enough” development approach. However, since you will only undergo the implementation one time, it’s crucial to get it right. A common tactic is to break the data into subsets and build out one category at a time, followed by a test. If an organization is working on a particularly large migration, it might make sense to build and test in parallel.
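Building out one category at a time, each followed by a test, can be sketched as a loop that stops at the first subset that fails. `load` and `check` are hypothetical callables that a project would supply for its own target system. The demo uses an in-memory dictionary as the target.

```python
from collections import defaultdict

def migrate_in_subsets(rows, load, check):
    """Migrate one category at a time, testing each subset before moving on."""
    by_category = defaultdict(list)
    for row in rows:
        by_category[row["category"]].append(row)
    done = []
    for category, subset in by_category.items():
        load(category, subset)
        if not check(category, subset):
            raise RuntimeError(f"subset {category!r} failed its test; stopping")
        done.append(category)
    return done

# Demo: a dictionary stands in for the target system.
target = defaultdict(list)

def load_into_target(category, subset):
    target[category].extend(subset)

def counts_match(category, subset):
    return len(target[category]) == len(subset)

rows = [
    {"id": 1, "category": "customers"},
    {"id": 2, "category": "orders"},
    {"id": 3, "category": "customers"},
]
completed = migrate_in_subsets(rows, load_into_target, counts_match)
```

Stopping at the first failing subset keeps a defect from spreading into categories that have not been built yet.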
4. Conduct a Live Test
The testing process isn’t over after testing the code during the build phase. It’s important to test the data migration design with real data to ensure the accuracy of the implementation and completeness of the application.
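A live test with real data is most useful when the comparison is systematic. The following is a minimal reconciliation sketch. It assumes both sides can be read as lists of dictionaries that share a primary key (`id` here, hypothetical), and it fingerprints each record with SHA-256.

```python
import hashlib
import json

def row_fingerprint(row):
    # Stable hash of a record: sorted keys so field order does not matter.
    payload = json.dumps(row, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def reconcile(source_rows, target_rows, key="id"):
    """Compare source and target record by record."""
    src = {r[key]: row_fingerprint(r) for r in source_rows}
    tgt = {r[key]: row_fingerprint(r) for r in target_rows}
    return {
        "missing_in_target": sorted(src.keys() - tgt.keys()),
        "unexpected_in_target": sorted(tgt.keys() - src.keys()),
        "changed": sorted(k for k in src.keys() & tgt.keys() if src[k] != tgt[k]),
    }

source = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Bob"}, {"id": 3, "name": "Cy"}]
target = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "BOB"}, {"id": 4, "name": "Dee"}]
result = reconcile(source, target)
```

Where the migration deliberately transforms values, apply the same transform to the source rows before comparing. Otherwise every intended change shows up as a mismatch.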
5. Flipping the Switch
After final testing, implementation can proceed, using the approach (big bang or trickle) defined in the plan.
6. Audit
Once the implementation has gone live, set up a system to audit the data in order to ensure the accuracy of the migration.
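Auditing after go-live does not have to mean re-reading everything. One lightweight approach is to spot-check a random sample of records on a schedule. This sketch assumes records can be fetched by primary key from both systems, represented here as plain dictionaries. The sample size and seed are illustrative parameters.

```python
import random

def sample_audit(source, target, sample_size, seed=None):
    """Return sampled keys whose target record is absent or differs from the source.

    `source` and `target` map primary key -> record.
    """
    rng = random.Random(seed)  # a fixed seed makes an audit run repeatable
    keys = sorted(source)
    sample = rng.sample(keys, min(sample_size, len(keys)))
    return sorted(k for k in sample if target.get(k) != source[k])

source = {i: {"id": i, "status": "active"} for i in range(1, 101)}
target = dict(source)
target[42] = {"id": 42, "status": "inactive"}  # a record that drifted after go-live

full_sweep = sample_audit(source, target, sample_size=100, seed=1)
spot_check = sample_audit(source, target, sample_size=10, seed=7)
```

Sampling trades coverage for cost. A small sample can miss rare errors, so it complements, rather than replaces, thorough testing before go-live.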
Data Migration Software
Building out data migration tools from scratch, and coding them by hand, is challenging and incredibly time-consuming. Data tools that simplify migration are more efficient and cost-effective. When you start your search for a software solution, look for these factors in a vendor:
- Connectivity — Does the solution support the systems and software you currently use?
- Scalability — What are the data limits for the software, and will data needs exceed them in the foreseeable future?
- Security — Take time investigating a software platform’s security measures. Your data is one of your most valuable resources, and it must remain protected.
- Speed — How quickly can processing occur on the platform?
Migrating Data to the Cloud
Increasingly, organizations are migrating some or all of their data to the cloud in order to increase their speed to market, improve scalability, and reduce the need for technical resources.
In the past, data architects were tasked with deploying sizeable server farms on-premises to keep data within the organization’s physical resources. Part of the reason for pushing ahead with on-site servers had been a concern for security on the cloud. However, as major platforms have adopted security practices that put them on par with traditional IT security (and necessarily in compliance with the GDPR), this barrier to migration has largely been overcome.
The right cloud integration tools help customers accelerate cloud data migration projects with a highly scalable and secure cloud integration platform-as-a-service (iPaaS). Talend’s suite of open source, cloud-native data integration tools enables drag-and-drop functionality to simplify complex mapping, and our open-source foundations make our solution cost-effective and efficient.
Getting Started with Data Migration
If your organization is upgrading systems, moving to the cloud, or consolidating data, a data migration is on the horizon. It’s a big and important project, and the integrity of the data demands that it gets done right.
Talend’s platform includes free, open source data tools that can streamline every step in the data migration process, from Data Preparation, to Integration, to continued Data Streaming. Kickstart your data migration process by exploring the software that can help you get it done. Try Talend Data Fabric today.