Key Considerations for Converting Legacy ETL to Modern ETL
Recently, there has been a surge in customers who want to move away from legacy data integration platforms and adopt Talend as the one-stop shop for all their integration needs. Some of these organizations have thousands of legacy ETL jobs to convert to Talend before they are fully operational. The big question on everyone’s mind is how to get past this hurdle.
Defining Your Conversion Strategy
To begin with, every organization undergoing such a change needs to focus on three key aspects:
- Will the source and/or target systems change, or is this purely an ETL conversion from the legacy system to a modern ETL platform like Talend?
- Is the goal to re-platform as well? Will the target system change?
- Will the new platform reside on the cloud or continue to be on-premise?
This is where Talend’s Strategic Services can help carve out a successful conversion strategy and implementation roadmap for our customers. In the first part of this three-blog series, I will focus on the first aspect: conversion.
Before we dig into it, it’s worthwhile to note a very important point - the architecture of the product itself. Talend is a Java code generator. Unlike its competitors, where the code itself is migrated from one environment to the other, Talend builds the code and migrates the built binaries from one environment to the other. In many organizations, it takes a few sprints for this to sink in fully, as the architects and developers are used to the old ways of thinking about code migration.
The upside of this architecture is that it enables a continuous integration environment that was not possible with legacy tools. A complete Talend platform architecture includes not only the product itself but also third-party products such as Jenkins, an artifact repository such as Nexus and a source control repository such as Git. Compare this to a Java programming environment and you can clearly see the similarities. In short, it is extremely important to understand that Talend works differently, and that is what sets it apart from the rest of the crowd.
Where Should You Get Started?
Let’s focus on the first aspect, conversion. Assuming that nothing changes except the ETL jobs that integrate, cleanse, transform and load the data, this is a good opportunity to leverage a conversion tool - something that ingests legacy code and generates Talend code. It is not a good idea to replicate the entire business logic of all ETL jobs manually, as there is a great risk of introducing errors that lead to prolonged QA cycles. However, as anyone with a sound technical background will tell you, it is also not a good idea to rely completely on the automated conversion process, since the comparison between legacy and Talend features may not always be apples to apples. The right approach is to use the automated conversion process as an accelerator, with some manual intervention.
Bright minds bring in success. Keeping that mantra in mind, first build your team:
- Core Team - Identify architects, senior developers and SMEs (data analysts, business analysts, people who live and breathe data in your organization)
- Talend Experts - Bring in experts of the tool so that they can guide you and provide best practices and solutions for all your conversion-related efforts. They will also participate in performance tuning activities
- Conversion Team - A solid team that leverages a solid conversion tool to automate the conversion process and is open to enhancing the tool along the way to automate new designs and specifications
- QA Team - Seasoned QA professionals who help you breeze through your QA testing activities
Now comes the approach. Follow these steps for each sprint:
Analyze the ETL jobs and categorize them depending on the complexity of the jobs based on functionality and components used. Some good conversion tools provide analyzers that can help you determine the complexity of the jobs to be converted. Spread a healthy mix of varying complexity jobs across each sprint.
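To make the categorization step concrete, here is a minimal sketch of how jobs could be scored and spread across sprints. It is not the output of any particular conversion tool's analyzer: the `LegacyJob` class, the component weights and the thresholds are all hypothetical and would need to be tuned to your own job inventory.

```python
from dataclasses import dataclass

# Hypothetical weights: how much each kind of component adds to a job's complexity.
COMPONENT_WEIGHTS = {"lookup": 3, "aggregate": 2, "custom_code": 5}
DEFAULT_WEIGHT = 1  # assumed weight for any component type not listed above


@dataclass
class LegacyJob:
    name: str
    components: list  # component types used by the job, e.g. ["lookup", "filter"]


def complexity_score(job):
    """Sum the weights of the components the job uses."""
    return sum(COMPONENT_WEIGHTS.get(c, DEFAULT_WEIGHT) for c in job.components)


def categorize(job, medium_at=5, complex_at=10):
    """Map a score to a simple/medium/complex bucket (thresholds are illustrative)."""
    score = complexity_score(job)
    if score >= complex_at:
        return "complex"
    if score >= medium_at:
        return "medium"
    return "simple"


def plan_sprints(jobs, sprint_count):
    """Sort jobs by score, then deal them round-robin so every sprint gets a mix."""
    ordered = sorted(jobs, key=complexity_score, reverse=True)
    sprints = [[] for _ in range(sprint_count)]
    for index, job in enumerate(ordered):
        sprints[index % sprint_count].append(job)
    return sprints
```

Dealing the sorted jobs round-robin is just one simple way to avoid loading all the hardest jobs into a single sprint; a real plan would also account for dependencies between jobs and team capacity.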
Leverage a conversion tool to automate the conversion of the jobs. Certain legacy functionalities, such as an “unconnected lookup”, have no direct equivalent but can be achieved through an innovative method in Talend. Seasoned conversion tools will help automate such functionalities.
Focus on job design and performance tuning. This is your chance to revisit the design, if required, either to leverage better components or to go for a complete redesign. Also focus on performance optimization. For high-volume jobs, you could increase throughput and performance by leveraging Talend’s big data components. It is not uncommon for us to end up completely redesigning a converted Talend Data Integration job as a Talend Big Data job to drastically improve performance. Another feather in our hat is that you can seamlessly execute standard data integration jobs alongside big data jobs.
Unit test and ensure all functionality and performance acceptance criteria are satisfied before handing over the job to QA.
Use an automated QA approach to compare the result sets produced by the old ETL jobs and the new ETL jobs. At the least, focus on:
- Compare row counts from the old process to that of the new one
- Compare each data element loaded by the old load process to that of the new one
- Verify “upsert” and “delete” logic work as expected
- Introduce an element of regression testing to ensure fixes are not breaking other functionalities
- Performance testing to ensure SLAs are met
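The first two checks in the list above can be sketched in a few lines. This is a minimal illustration rather than a full QA framework: it assumes both result sets have already been fetched into lists of dictionaries and that each row carries a unique business key. The function name `compare_result_sets` and the report layout are hypothetical.

```python
def compare_result_sets(old_rows, new_rows, key):
    """Compare two result sets (lists of dict rows) on a unique business key.

    Assumes `key` is present in every row and unique within each result set.
    """
    old_by_key = {row[key]: row for row in old_rows}
    new_by_key = {row[key]: row for row in new_rows}

    report = {
        # Check 1: row counts from the old process versus the new one.
        "row_count_match": len(old_rows) == len(new_rows),
        # Keys loaded by only one of the two processes.
        "missing_in_new": sorted(old_by_key.keys() - new_by_key.keys()),
        "extra_in_new": sorted(new_by_key.keys() - old_by_key.keys()),
        # Check 2: per-column differences for keys present in both.
        "field_mismatches": [],
    }

    for k in sorted(old_by_key.keys() & new_by_key.keys()):
        old_row, new_row = old_by_key[k], new_by_key[k]
        for column in sorted(old_row.keys() | new_row.keys()):
            old_value, new_value = old_row.get(column), new_row.get(column)
            if old_value != new_value:
                report["field_mismatches"].append((k, column, old_value, new_value))

    return report
```

Running the same comparison after every fix doubles as a simple regression check. Note that matching row counts alone prove little: in a run where one key is dropped and another is added, the counts agree while the key-level checks still flag both rows.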
Now, for several reasons, there can be instances where you need to design a completely new ETL process for a certain functionality in order to continue processing data in the same way as before. In such situations, leverage the “Talend Experts” team. This team not only liaises with the team doing the automated conversion but also works closely with the core team to ensure that the best solution is proposed. That solution is then converted to a template and provided to the conversion team, who can automate the new design into the affected jobs.
As you can see, these activities can be part of the “Categorize” and “Convert” phases of the approach.
Finally, I would suggest chunking the conversion effort into logical waves. Do not go for a big bang approach, since the conversion effort can be a lengthy one depending on the number of legacy ETL jobs in the organization.
This brings me to the end of the first part of the three-blog series. Below are the five key takeaways of this blog:
- Define scope and spread the conversion effort across multiple waves
- Identify core team, Talend experts, a solid conversion team leveraging a solid conversion tool and seasoned QA professionals
- Follow an iterative approach for the conversion effort
- Explore Talend’s big data capabilities to enhance performance
- Innovate new functionalities, create templates and automate the conversion of these functionalities
Stay tuned for the next two!!