How to containerize your Talend jobs with one click

  • Edward Ost
    Over 25 years of experience in application and web development with extensive experience on both Java and .NET platforms. Ed works as the Channels Technical Director at Talend working with technology partners, commercial use customers, and SI partners in the Talend ecosystem. Current focus is on enterprise integration strategies spanning DW, Data Lake, and operational decision support systems using the Talend Unified Platform.

This blog post is part of a series on serverless architecture and containers. The first post of this series discussed the impact of containers on DevOps.

Talend Data Integration is an enterprise data integration platform that combines visual design with generated, plain Java code. This lightweight, modular design approach is a great fit for containers. In this blog post, we'll walk you through how to containerize your Talend job with a single click. All of the code for this post is in the Talend Job2Docker Git repository, whose README also includes step-by-step instructions.

Building Job2Docker

There are two parts to Job2Docker. The first part is a very simple Bash script that packages your Talend job zip file as a Docker image. During packaging, the script tweaks the job's launch command so that the job runs as PID 1 in the container. It does not modify the job in any other way. As a result, your Talend job is the only process in the container, which is consistent with Docker best practices. When you create a container instance, your Talend job runs automatically, and when the job finishes, the container shuts down.

Your application logic is now safely decoupled from the hosting compute infrastructure. You can use container orchestration tools such as Kubernetes, OpenShift, Docker Swarm, Amazon EC2 Container Service (ECS), or Azure Container Instances to manage your job containers efficiently. Whether you operate in the cloud or on-premises, the improved elasticity can help reduce your total cost of ownership.
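The PID 1 tweak can be illustrated with a short Python sketch (the actual Job2Docker script is written in Bash). It assumes, hypothetically, that the launcher shell script inside the Talend build zip starts the JVM on a line beginning with `java`, and that the launcher itself is started as the container's main process. Prefixing that line with `exec` makes the shell replace itself with the JVM instead of forking a child process, so the job runs as PID 1 and directly receives signals such as the `SIGTERM` sent by `docker stop`. The function name `exec_java_launch` is hypothetical.

```python
import re

# Assumption: the Talend launcher starts the JVM on a line that begins with "java".
# "\b" keeps lines such as "javac ..." or "java_home=..." from matching.
_JAVA_LINE = re.compile(r"^(\s*)java\b")


def exec_java_launch(script: str) -> str:
    """Return the launcher script with 'exec' prepended to the java command line.

    With 'exec', the shell is replaced by the JVM rather than forking a child,
    so inside a container the job itself becomes PID 1.
    """
    out = []
    for line in script.splitlines():
        # Lines that already start with "exec java", or are not java calls,
        # do not match the pattern and are kept unchanged.
        out.append(_JAVA_LINE.sub(r"\1exec java", line))
    return "\n".join(out) + "\n"
```

For example, `exec_java_launch('java -cp job.jar demo.MyJob "$@"\n')` returns the same line prefixed with `exec `, leaving the `"$@"` argument forwarding intact.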

Running Job2Docker

The second part of Job2Docker is a simple utility Job written in Talend itself. It monitors a shared directory, and whenever you build a Job from Studio into this directory, the Job2Docker listener invokes the packaging script to create the Docker image.
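The listener is itself a Talend Job. Purely to illustrate the polling idea, here is a Python sketch. The function names, the five-second interval, and the rule that a zip counts as finished once its size and modification time are unchanged across two consecutive scans are assumptions made for this example, not details of Job2Docker. The `handler` callback stands in for invoking the packaging script.

```python
import time
from pathlib import Path
from typing import Callable, Dict, List, Tuple

Signature = Tuple[int, int]  # (size in bytes, modification time in nanoseconds)


def scan_once(directory: Path,
              pending: Dict[Path, Signature],
              done: Dict[Path, Signature]) -> List[Path]:
    """Return zip files that look completely written and have not been handled yet.

    A file is reported once its size and mtime are unchanged between two
    consecutive scans, which helps avoid picking up a zip that is still being written.
    """
    ready = []
    for path in sorted(directory.glob("*.zip")):
        st = path.stat()
        sig = (st.st_size, st.st_mtime_ns)
        if done.get(path) == sig:
            continue  # this exact version of the file was already handled
        if pending.get(path) == sig:
            ready.append(path)
            done[path] = sig
            pending.pop(path)
        else:
            pending[path] = sig  # new or still changing; check again next scan
    return ready


def watch(directory: Path, handler: Callable[[Path], None], interval: float = 5.0) -> None:
    """Poll `directory` forever, calling `handler` for each finished zip file."""
    pending: Dict[Path, Signature] = {}
    done: Dict[Path, Signature] = {}
    while True:
        for zip_path in scan_once(directory, pending, done):
            handler(zip_path)
        time.sleep(interval)
```

Polling is the simplest portable choice here, and it also works on folders shared between a Windows host and a Linux VM, where file-change notifications are not always delivered reliably.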

All you need to run the examples is Talend Studio 7.0.1, which is available as a free download, and a server running Docker.

  • If you run Studio on Linux, you can simply install Docker and select a directory to share your Talend Jobs with Docker.
  • If you run Studio on Windows, you can either try Docker for Windows or install Linux in a VM.

The examples here were run with Docker on a Linux VM and Talend Studio on the Windows host OS. For Studio and Docker to communicate, you need to share a folder between the host and the VM using your VM tool of choice.

Once you have installed the Job2Docker scripts and listener, the workflow is transparent to the Studio user:

1. Start the Job2Docker_listener job so that it monitors the shared directory.

2. Click “Build” in Talend Studio to create a Talend job zip file in the shared directory. From here on, packaging is automatic.

3. The Job2Docker_listener triggers the Job2Docker script, which converts the Talend zip file to a .tgz file ready for Docker.

4. The Job2Docker_listener then triggers the Job2Docker_build script, which creates a Docker image.

5. Create a container with the docker run command. Your job runs automatically.
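Steps 3 and 4 can be sketched in Python as follows. This is not the Job2Docker script itself: the function names, the `/opt/talend` install path, the `openjdk:8-jre` base image, and the launcher path passed in are hypothetical choices for illustration. One plausible reason for converting the zip to a .tgz is that a Dockerfile `ADD` instruction automatically unpacks a local tar archive (including gzip-compressed ones) into the image, whereas a zip file is copied as-is.

```python
import io
import tarfile
import zipfile
from pathlib import Path
from typing import List


def zip_to_tgz(zip_path: Path, tgz_path: Path) -> None:
    """Copy every file entry of a Talend build zip into a gzip-compressed tar archive."""
    with zipfile.ZipFile(zip_path) as zf, tarfile.open(tgz_path, "w:gz") as tf:
        for info in zf.infolist():
            if info.is_dir():
                continue  # only file entries are copied; directory entries are skipped
            data = zf.read(info)
            member = tarfile.TarInfo(name=info.filename)
            member.size = len(data)
            # Zips built on Windows may carry no Unix permissions, so mark shell
            # launchers executable explicitly and give everything else 0o644.
            member.mode = 0o755 if info.filename.endswith(".sh") else 0o644
            tf.addfile(member, io.BytesIO(data))


def dockerfile_for(tgz_name: str, launcher: str, base_image: str = "openjdk:8-jre") -> str:
    """Render a minimal Dockerfile that unpacks the job and runs its launcher.

    ADD unpacks a local .tgz automatically, and the exec-form ENTRYPOINT passes any
    arguments given after the image name in `docker run` on to the launcher.
    """
    return (
        f"FROM {base_image}\n"
        "WORKDIR /opt/talend\n"
        f"ADD {tgz_name} /opt/talend/\n"
        f'ENTRYPOINT ["/bin/sh", "/opt/talend/{launcher}"]\n'
    )


def build_command(context_dir: Path, image_tag: str) -> List[str]:
    """Return the `docker build` invocation for a directory holding the Dockerfile and .tgz."""
    return ["docker", "build", "-t", image_tag, str(context_dir)]
```

Writing the Dockerfile and the .tgz into the same directory and running the returned command (for example with `subprocess.run(cmd, check=True)`) would complete step 4.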

The Job2Docker repository includes some basic Hello World examples and also walks you through passing parameters to your Job. The step-by-step process is shown in a short video on YouTube.
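Talend-generated launchers accept context values on the command line through `--context=<name>` and `--context_param <name>=<value>`. Assuming the image's entrypoint forwards its arguments to the launcher (as an exec-form `ENTRYPOINT` running a script that passes `"$@"` on to Java would), a run command could be assembled as in this sketch. The image name and parameter names are hypothetical, and the Job2Docker README remains the reference for how it actually handles parameters.

```python
from typing import Dict, List, Optional


def docker_run_command(image: str,
                       context_params: Dict[str, str],
                       context: Optional[str] = None,
                       remove: bool = True) -> List[str]:
    """Build a `docker run` argument list that forwards Talend context values to the job."""
    cmd = ["docker", "run"]
    if remove:
        cmd.append("--rm")  # delete the container once the job exits
    cmd.append(image)
    # Everything after the image name is handed to the container's entrypoint.
    if context:
        cmd.append(f"--context={context}")
    for name, value in context_params.items():
        cmd += ["--context_param", f"{name}={value}"]
    return cmd
```

Building an argument list rather than a shell string avoids quoting problems when values contain spaces, and the list can be passed straight to `subprocess.run`.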

Once you have your Docker image, you can publish it to the Docker registry of your choice and stage your containerized application as part of your Continuous Integration workflow.
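Publishing amounts to tagging the local image with the registry's address and pushing it. The sketch below only builds the `docker tag` and `docker push` argument lists, which a CI step could run with `subprocess.run(cmd, check=True)`. The function name, registry host, repository, and tag are hypothetical.

```python
from typing import List


def push_commands(local_image: str, registry: str, repository: str, tag: str) -> List[List[str]]:
    """Return the `docker tag` and `docker push` invocations for publishing an image."""
    # A trailing slash on the registry address is tolerated.
    remote = f"{registry.rstrip('/')}/{repository}:{tag}"
    return [
        ["docker", "tag", local_image, remote],
        ["docker", "push", remote],
    ]
```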

This process is deceptively simple, but it has big implications for how you manage your workflows at scale. In our next post, we'll show you how to orchestrate multiple containerized jobs using container orchestration tools such as Kubernetes, Amazon ECS, AWS Fargate, or Azure Container Instances.
