DOWNLOAD : The Cloud Data Integration Checklist from TDWI

Going Serverless with Talend through CI/CD and Containers

Going Serverless with Talend through CI/CD and Containers

  • Thibaut Gourdel
    Thibaut Gourdel is Technical Product Marketing Jr at Talend since 2017. His area of interest includes cloud technologies, containerization, serverless computing and data stream processing.

Why should we care about CI/CD and Containers?

Continuous integration, delivery and deployment, known as CI/CD, has become such a critical piece in every successful software project that we cannot deny the benefits it can bring to your project. At the same time, containers are everywhere right now and are very popular among developers. In practice, CI/CD delivery allows users to gain confidence in the applications they are building by continuously test and validate them. Meanwhile, containerization gives you the agility to distribute and run your software by building once and being able to deploy “anywhere” thanks to a standard format that companies adopt. These common DevOps practices avoid the “it works on my machine!” effect. Direct consequences are a better time to market and more frequent deliveries.

How does it fit with Talend environment?

At Talend, we want to give you the possibility to be part of this revolution giving you access to the best of both worlds. Since the release of Talend 7.0 you can now build your Talend Jobs within Docker images thanks to standard Maven build. In addition, we also help you to smoothly plug this process into your CI/CD pipeline.

What about serverless?

The serverless piece comes at the end of it. This is the way we will deploy our Talend jobs. In fact, by shipping our jobs in containers we now have the freedom to deploy integration jobs anywhere. Among all the possibilities, a new category of services that are defined as serverless is raising. Most of the major cloud providers are starting to offer their serverless services for containers such AWS Fargate or Azure Container Instances to name a few. They allow you to run containers without the need to manage any infrastructure (servers or clusters). You are only billed for your container usage.

These new features have been presented at Talend Connect US 2018 during the main keynote and have been illustrated with a live demo of a whole pipeline from the build to the run of the job in AWS Fargate as well as Azure ACI. In this blog post, we are going to take advantage of Jenkins to create a CI/CD pipeline. It consists of building our jobs within Docker images, making the images available in a Docker registry, and eventually calling AWS Fargate and Azure ACI to run our images.

Schema for going serverles with Talend through CI/CD and containers

Let’s see how to reproduce this process. If you would like to follow along, please make sure you fulfill the following requirements.

Requirements

  • Talend Studio 7.0 or higher
  • Talend Cloud Spring 18’ or higher
  • Have a Jenkins server available
  • Your Jenkins server needs to have access to a Docker daemon installed on the same machine
  • Talend CommandLine installed on the Jenkins’ machine
  • Nexus version 2 or 3
  • Have installed the Talend CI Builder configured along with your Talend CommandLine.

* All Talend components here are available in the Talend Cloud 30 Day Trial

Talend Environment

If you are new to Talend then let me walk you through a high-level overview of the different components. You need to start out with Talend Cloud and create a project in the Talend Management Console (TMC), you will then need to configure your project to use a Git repository to store your job artifacts. You also need to configure the Nexus setting in TMC to point to your hosted Nexus server to store any 3rd party libraries you job may need. Talend Cloud account provides the overall project and artifact management in the TMC.

Next, you need to install the Studio and connect it to the Talend Cloud account. The Studio is the main Talend design environment where you build integration jobs. When you log in with the Studio to the cloud account following these steps you will see the projects from the TMC and use the project that is configured to the Git repository. Follow the steps below to add the needed plugins to the Studio.

The last two components you need are the CI builder and Talend CommandLine or cmdline (if using cloud trial, the CommandLine tool is included with the studio install files). When installing or using the CommandLine tool the first time you will need to give the CommandLine tool a license as well, you can use the same license from your Studio. The CommandLine tool and the CI builder tool are the components that allow you to take the code from the job in Studio (really in Git) and build and deploy fully executables processes to environments via scripts. The CI builder along with the profile in the studio is what determines if that is going to say a Talend Cloud runtime environment or a container. Here are the steps to get started!

1) Create a project in Talend Cloud

First, you need to create a project in your Talend Cloud account and link your project to a GitHub repository. Please, have a look at the documentation to perform this operation.

Don’t forget to configure your Nexus in your Talend Cloud account. Please follow the documentation for configuring your Nexus with Talend cloud. As a reminder your Nexus needs to have the following repositories:

  • releases
  • snapshots
  • talend-custom-libs-release
  • talend-custom-libs-snapshot
  • talend-updates
  • thirdparty

2) Add the Maven Docker profiles to the Studio

This step is no longer needed with Talend Studio 7.1 and higher. Please skip this part if you have this version or higher as the Maven Docker profiles are now generated by the Talend Studio. Talend also introduces OpenJDK 8 support in 7.1 which means that you can use OpenJDK Docker base images in production environement.

We are going to configure the Studio by adding the Maven Docker profiles to the configuration of our project and job pom files. Please find the two files you will need here under project.xml and standalone-job.xml.

You can do so in your studio, under the menu Project Properties -> Build -> Project / Standalone Job.

You just need to replace them with the ones you had copied above. No changes are needed.

In fact, what we are really doing here is adding a new profile called “docker” using the fabric8 Maven plugin. When building our jobs with this Maven profile, openjdk:8-jre-slim will be used as a base image, then the jars of our jobs are going to be added to this image along with a small script to indicate how to run the job. Please be aware that Talend does not support OpenJDK nor Alpine Linux. For testing purposes only, you can keep the openjdk:8-jre-slim image, but for production purposes, you will have to build your own Java Oracle base image. For more information please refer to our supported platforms documentation.

3) Set up Jenkins

The third step is to set up our Jenkins server. In this blog post, the initial configuration of Jenkins will not be covered. If you have never used it before please follow the Jenkins Pipeline getting started guide. Once the initial configuration is completed, we will be using the Maven, Git, Docker, Pipeline and Blue Ocean plugins to achieve our CI/CD pipeline.

We are going to store our Maven settings file in Jenkins. In the settings of Jenkins (Manage Jenkins), go to “Managed files” and create a file with ID “maven-file”. Copy this file in it as in the screenshot below. Make sure to modify the CommandLine path according to your own settings and to specify your own nexus credentials and URL.

What you also need to achieve before going into the detail of the pipeline is define some credentials. To do so go to “Manage Jenkins” and “Configure credentials” then on the left “Credentials”. Look at the screenshot below:

Create four credentials for GitHub, Docker Hub, AWS and Azure. If you only plan to use AWS, you don’t need to specify your Azure credentials and conversely. Make sure you set your ACCESS KEY as username and SECRET ACCESS KEY as password for the AWS credentials.

Finally, and before going through the pipeline, we must get two CLI Docker images available on the Jenkins machine. Indeed, Jenkins will use docker images with AWS and Azure CLIs to perform CLI commands to the different services. This is an easy way to use these CLIs without the need to install them on the machine. Here are the images we will use:

  • vfarcic/aws-cli (docker pull vfarcic/aws-cli:latest; docker tag vfarcic/aws-cli:latest aws-cli)
  • microsoft/azure-cli:latest (docker pull microsoft/azure-cli:latest; docker tag microsoft/azure-cli:latest azure-cli)

You can of course use different images at your convenience.

These Docker images need to be pulled on the Jenkins machine, this way in the pipeline we can use the Jenkins Docker plugin to use the “withDockerContainer(‘image’)” function to execute the CLI commands as you will see later. You can find more information about running build steps inside a Docker container in the Jenkins documentation here.

Now that all the pre-requisites have been fulfilled let’s create a “New item” on the main page and choose “Pipeline”.

Once created you can configure your pipeline. This is where you will define your pipeline script (Groovy language).

You can find the script here.

Let’s go through this file and I will highlight the main steps.

At the top of the file you can set your own settings through environment variables. You have an example that you can follow with a project called “TALEND_JOB_PIPELINE” and a job “test”. The project git name should match the one in your GitHub repository. That is why the name is uppercase. Please be aware that in this script we use the job name as the Docker image name, so you cannot use underscores in your job name. If you want to use an underscore you need to define another name without underscore for your Docker image. You can override the Docker image name using -Dtalend.docker.name=NAME option in your Maven command. The following environment variables must be set:

env.PROJECT_GIT_NAME = 'TALEND_JOB_PIPELINE'
env.PROJECT_NAME = env.PROJECT_GIT_NAME.toLowerCase()
env.JOB = 'test'
env.VERSION = '0.1'
env.GIT_URL = 'https://github.com/tgourdel/talend-pipeline-job.git'
env.TYPE = "" // if big data = _mr
env.DOCKERHUB_USER = "talendinc"

In this file, each step is defined by a “stage”. The first two stages are here for pulling the latest version of the job using the Git plugin.

Then comes the build of the job itself. As you can see we are utilizing the Maven plugin. The settings are in a Jenkins Config file. This is the file we added earlier in the Jenkins configuration with the maven-file ID.

In the stages “Build, Test and Publish to Nexus” and “Package Jobs as Container” the line to change is:

-Dproduct.path=/cmdline -DgenerationType=local -DaltDeploymentRepository=snapshots::default::http://nexus:8081/repository/snapshots/ -Xms1024m -Xmx3096m

Here you need to specify your own path to the CommandLine directory (relatively to the Jenkins server) and your Nexus URL.

After the build of the job in a Docker image we are going to push the image to Dockerhub registry. For this step and the next one we will use CLIs to use the different third-parties. As the Docker daemon should be running on the Jenkins’ machine you can use directly the docker CLI. We use the withCredentials() function to get your Dockerhub username and password:

stage ('Push to a Registry') {
            withCredentials([usernamePassword(credentialsId: 'dockerhub', passwordVariable: 'dockerhubPassword', usernameVariable: 'dockerhubUser')]) {
               sh 'docker tag $PROJECT_NAME/$JOB:$VERSION $DOCKERHUB_USER/$JOB:$VERSION'
               sh "docker login -u ${env.dockerhubUser} -p ${env.dockerhubPassword}"
               sh "docker push $DOCKERHUB_USER/$JOB:$VERSION"
           }
}

The stage “Deployment environment” is simply an interaction with the user when running the pipeline. It asks whether you want to deploy your container in AWS Fargate or Azure ACI. You can remove this step if you want to have a continuous build until the deployment. This step is for demo purposes.

The next two stages are the deployment itself to AWS Fargate or Azure ACI. In each of the two stages you need to modify with your own settings. For example, in the AWS Fargate deployment stage you need to modify this line:

aws ecs run-task --cluster TalendDeployedPipeline --task-definition TalendContainerizedJob --network-configuration awsvpcConfiguration={subnets=[subnet-6b30d745],securityGroups=[],assignPublicIp=ENABLED} --launch-type FARGATE

You need to modify the name of your Fargate Cluster and your task definition. For your information you need to create them in your AWS console. You can read the documentation to achieve this operation. At the time of writing, AWS Fargate is only available in N. Virginia region, but other regions will come. The container you that will be defined in your task definition is the one that will be created in your Docker Hub account with the name of your job as image name. For example, it would be talendinc/test:0.1 with the default configuration in the pipeline script.

The same applies to Azure ACI, you need to specify your own resource group and container instance.

4) Configure the Command line

As a matter of fact, Maven will use the CommandLine to build your job. The CommandLine can be used in 2 modes: script and server mode. Here we will use the CommandLine in server mode. First, you need to indicate the workspace of your CommandLine (which in our case will be the Jenkins workspace). Modify command-line.sh file as follow with your own path to Jenkins workspace (it depends on your pipeline name you choose in the previous step):

./Talend-Studio-linux-gtk-x86_64 -nosplash -application org.talend.commandline.CommandLine -consoleLog -data $WORKSPACE startServer -p 8002

The variable $WORKSPACE will be specified by your Jenkins instance and will be replaced by the Jenkins' workspace of your pipeline. Thanks to this environement variable provided by Jenkins you can run multiple pipelines with different Jenkins' workspaces and the CommandLine will adapt its own workspace accordingly.

As you can see in the Maven Options of the Groovy script the parameter -Dgeneration.type=local is specified. It means the CommandLine will be started to achieve the build and then will be shut down. Using this option allows us to avoid keeping a CommandLine running indefinitely and to use the $WORKSPACE variable to change the workspace depending on the pipeline running.

Last thing to do is to modify the /configuration/maven_user_settings.xml file. To do so copy paste this file with your own nexus URL and login information.

5) Run the pipeline

Once all the necessary configuration has been done you can run your pipeline. To do so, you can go in the Open Blue Ocean view and click on “run” button. It will trigger the pipeline and should see the pipeline progress:

Screenshot of Jenkins Pipeline to build Talend Jobs into Docker Containers

Jenkins Pipeline to build Talend Jobs into Docker Containers

The pipeline in the context of this blog will ask you where you want to deploy your container. Choose either AWS Fargate or Azure ACI. Let’s take the Fargate example.

After having proceeded the deployment, your Fargate cluster should now have one pending task:

If you go into the detail of your task once run, you should be able to access the logs of your job:

You can now run your Talend integration job packaged in a Docker container anywhere such as:

  • AWS Fargate
  • Azure ACI
  • Google Container Engine
  • Kubernetes or OpenShift
  • And more …

Thanks to Talend CI/CD capabilities you can automate the whole process from the build to the run of your jobs.

If you want to become cloud agnostic and take advantage of the portability of the containers this example shows you how you can use a CI/CD tool (Jenkins in our case) to automate the build and run in different cloud container services. This is only an example among others but being able to build your jobs as containers opens you to a whole new world for your integration jobs. Depending on your use-cases you could find yourself spend way less money thanks to these new serverless services (such as Fargate or Azure ACI). You could also now spend less time configuring your infrastructure and focus on designing your jobs.

If you want to learn more about how to take advantage of containers and serverless technology, join us at Talend Connect 2018 in London and Paris. We will have dedicated break-out sessions on serverless to help you go hands on with these demos. See you there!

Join The Conversation

0 Comments

Leave a Reply

Your email address will not be published. Required fields are marked *