Talend and Red Hat OpenShift Integration: A Primer

  • Nikhil Thampi
    Nikhil Thampi is a Customer Success Architect at Talend whose core expertise is in data integration, database and data warehousing technologies. He has more than 12 years of IT experience, during which he has helped create technical solutions for customers around the globe. His areas of interest also include Cloud, Containers, Big Data, Data Governance and Machine Learning technologies. He is passionate about teaching and increasing awareness of Talend among IT developers, and he is one of the top contributors to the Talend Community site.

One aspect of Talend that has always fascinated me is its ability to run programs using multiple job execution methodologies. Today I want to give an overview of a new way of executing data integration jobs using Talend and the Red Hat OpenShift Platform.

First and foremost, let us do a quick recap of the standard ways of running Talend jobs. Users usually run Talend jobs using Talend schedulers, which can be either in the Cloud or on-premises. Other methods include creating standalone jobs, building web services from Talend jobs, and building OSGi bundles for ESB. The latest entry to this list, from Talend 7.1 onwards, is building the job as a Docker image. For this blog, we are going to focus on the Docker route and show you how Talend Data Integration jobs can be used with the Red Hat OpenShift Platform.

I would also highly recommend reading two other interesting Talend blogs related to the interaction between Talend and Docker, which are:

  1. Going Serverless with Talend through CI/CD and Containers by Thibaut Gourdel
  2. Overview: Talend Server Applications with Docker by Michaël Gainhao 

Before going into other details, let's cover the basics of containers, Docker and the Red Hat OpenShift Platform. If you are already proficient in container technology, I would recommend skipping ahead to the next section of the blog.

Containers, Docker, Kubernetes and Red Hat OpenShift

What is a container? A container is a standardized unit of software that is lightweight and can be executed without environment-related constraints. Docker is the most popular container platform, and it has helped the information technology industry on two major fronts: reducing infrastructure and maintenance costs, and reducing the turnaround time to bring applications to market.

The diagram above shows how the various layers of the Docker container platform are stacked, with the Talend jobs running in application containers at the top. The Docker platform interacts with the underlying infrastructure and host operating system, and it lets the application containers run seamlessly without knowing the complexities of the underlying layers.

Kubernetes

Next, let us quickly talk about Kubernetes and how it has helped in the growth of container technology. As we build more and more containers, we need an orchestrator that can handle the management, automatic deployment and scaling of those containers, and Kubernetes is the software platform that does this orchestration.

Kubernetes coordinates a cluster of computers as a single unit, and we can deploy containerized applications on top of the cluster. It consists of Pods, which act as logical hosts for the containers, and these Pods run on worker machines called Nodes. There are a lot of other concepts in Kubernetes, but let us limit ourselves to the context of this blog, since Talend job containers are executed inside these Pods.

Red Hat OpenShift

OpenShift is the open source container application platform from Red Hat, built on top of Docker containers and the Kubernetes container cluster manager. I am republishing the official OpenShift block diagram from the Red Hat website for your quick reference.

OpenShift comes in a few flavors apart from the free (Red Hat OpenShift Online Starter) version.

  1. Red Hat OpenShift Online Pro
  2. Red Hat OpenShift Dedicated
  3. Red Hat OpenShift Container Platform

OpenShift Online Pro and Dedicated run on Red Hat hosted infrastructure, while OpenShift Container Platform can be set up on the customer’s own infrastructure.

Now let's move to more familiar territory, where we will convert the Talend job to a Docker container.

Talend Job Conversion to Container and Image Registry Storage

For the benefit of customers who are using older versions of Talend, we will first create a Docker image from a sample Talend job. If you are already using Talend 7.1, you can export Talend jobs to Docker directly, as mentioned in the introduction, so you can safely move to the next section, where the Docker image is already available, and we will meet you there. For those who are still with me, let us quickly build a Docker image for a sample job 😊.

I have created a simple job that generates random first and last names and prints them on the console. We are going to build a standalone job zip file from the Talend job and place the zip in a target directory on the server where Docker is available.

The next step is to create a Dockerfile that stores the instructions to perform while building a Docker image from the Talend standalone zip file. The contents of the Dockerfile are shown below.

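# Base image: Alpine Linux with a Java 8 JDK
FROM anapsix/alpine-java:8u121b13_jdk

# Build arguments supplied with --build-arg
ARG talend_job
ARG talend_version

LABEL maintainer="nthampi@talend.com" \
    talend.job=${talend_job} \
    talend.version=${talend_version}

# Keep the job name and version available to later build steps and at run time
ENV TALEND_JOB ${talend_job}
ENV TALEND_VERSION ${talend_version}
# Optional arguments passed to the job's run script
ENV ARGS ""

WORKDIR /opt/talend

# Copy the standalone job archive into the image
COPY ${TALEND_JOB}_${TALEND_VERSION}.zip .

# Unpack the archive, delete it, and make the run script executable
RUN unzip ${TALEND_JOB}_${TALEND_VERSION}.zip && \
    rm -rf ${TALEND_JOB}_${TALEND_VERSION}.zip && \
    chmod +x ${TALEND_JOB}/${TALEND_JOB}_run.sh

CMD ["/bin/sh","-c","${TALEND_JOB}/${TALEND_JOB}_run.sh ${ARGS} "]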

If you look at the instructions in the Dockerfile, you can see that we start from a base Alpine Java image and add further instructions on top of it in a layered format. The instructions unzip the file that contains the Talend job and execute the job's shell script. We now have the Dockerfile that will be used for the image build.

The command to build the Docker image for the Talend job is below.

docker build /home/centos/talend/ -f /home/centos/talend/dockerfile.txt -t nikhilthampi/helloworld:0.1 --build-arg talend_job=helloworld --build-arg talend_version=0.1

The docker images command will list the newly created image, with the repository name and tag shown as "nikhilthampi/helloworld" and "0.1" respectively.
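As a quick check before pushing anything, the image can be listed and run once on the build server. This is a minimal sketch that assumes the image name and tag from the build command above; the check_image function name is only a hypothetical wrapper for illustration.

```shell
# Image name and tag taken from the docker build command above
IMAGE="nikhilthampi/helloworld:0.1"

# check_image is a hypothetical helper name used only in this sketch.
check_image() {
  # List local images for the repository (the part before the colon)
  docker images "${IMAGE%%:*}"
  # Run the job once and remove the container when the job exits
  docker run --rm "$IMAGE"
}
```

Calling check_image should print the image entry followed by the console output of the Talend job. Because the Dockerfile passes the ARGS environment variable to the run script, extra job arguments can be supplied with the -e option of docker run.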

If you are interested in moving the Docker image to a Docker repository, you can log in to Docker from the command line and push the image to Docker Hub.

The image will now be available in the Docker Hub repository as shown below.

Similarly, you can load the image into a Red Hat OpenShift image registry. The first step is to set up the OpenShift client on the server; the steps below install it on CentOS.
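The login and push steps can be sketched as follows. It assumes a Docker Hub account that owns the "nikhilthampi" namespace used in the build command; push_to_docker_hub is a hypothetical function name for this sketch.

```shell
# Image name and tag taken from the docker build command above
IMAGE="nikhilthampi/helloworld:0.1"

# push_to_docker_hub is a hypothetical helper name used only in this sketch.
push_to_docker_hub() {
  # Prompts for the Docker Hub user name and password
  docker login
  # Uploads the image layers to the repository named in the tag
  docker push "$IMAGE"
}
```

Run push_to_docker_hub from the server where the image was built.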

wget https://mirror.openshift.com/pub/openshift-v3/clients/3.9.31/linux/oc.tar.gz
tar -xvf oc.tar.gz
cd /opt
mkdir oc
mv /home/centos/oc /opt/oc/oc
export PATH=$PATH:/opt/oc

The next step is to go to the OpenShift console and get the login command from the site as shown below. You will be provided with login credentials that include a token.

Using the token, you will be able to log in to OpenShift, and the details of a successful login are shown below. I have already created a project called “docker” inside OpenShift, and OpenShift will start using this project.

We can now tag the image we have created and push it to the OpenShift image registry. The sample pattern is shown below.
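On the command line, the login might look like the sketch below. The API URL, token and registry host are hypothetical placeholders (use the values shown in your own OpenShift console), and openshift_login is just an illustrative function name. The last command is an assumption on my side that the Docker client also has to be authenticated against the OpenShift registry before an image can be pushed to it.

```shell
# Hypothetical placeholders: replace with the values from your OpenShift console
OPENSHIFT_API_URL="https://openshift.example.com:8443"
OPENSHIFT_TOKEN="<token>"
REGISTRY_HOST="registry.example.com"

# openshift_login is a hypothetical helper name used only in this sketch.
openshift_login() {
  # Log in to the cluster with the token copied from the console
  oc login "$OPENSHIFT_API_URL" --token="$OPENSHIFT_TOKEN"
  # Switch to the project that will hold the image
  oc project docker
  # Reuse the session's user name and token to log Docker in to the registry
  docker login -u "$(oc whoami)" -p "$(oc whoami -t)" "$REGISTRY_HOST"
}
```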

docker tag <docker image id> <OpenShift region registry>/<project>/<container image name>

docker push <OpenShift region registry>/<project>/<container image name>

The screenshot below is sample output from OpenShift after executing the commands.

The container image can also be viewed through the OpenShift console, where it is available under the Image Streams section of the project.

Alright! We have completed the tasks involved in transferring the Talend job Docker image to the OpenShift image registry. Don’t you think it is easy? Instead of migrating the container image manually, CI/CD can also be used to deploy to Docker registries. That is outside the scope of this blog, and I would recommend going through the CI/CD blogs from Talend to automate the above steps.

Talend Job execution in OpenShift

Now, let's get to Talend job execution in OpenShift. The first step in creating a job in OpenShift is to configure the corresponding YAML file. Below is the sample YAML file I have created for the "helloworld" job.

apiVersion: batch/v1
kind: Job
metadata:
  name: helloworld
spec:
  template:
    spec:
      activeDeadlineSeconds: 180
      containers:
      - name: helloworld
        image: docker-registry.default.svc:5000/docker/helloworld
      restartPolicy: Never
  backoffLimit: 4

Instead of a Pod, Route or Service, we have created a resource of kind Job, and we have also added the source image registry details to the YAML file. Once the YAML file is ready, the job is created in OpenShift from the command line.

Once the command returns a success message, we will be able to see the new entry under the Other Resources -> Job section of OpenShift.

If you go to the Pods section of OpenShift, you will be able to see that the Talend job has been executed successfully and the logs have been captured as shown below.
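Creating the job from the YAML definition can be sketched like this. The file name helloworld-job.yaml and the create_job function name are assumptions for illustration; use whatever name you saved the YAML under.

```shell
# Hypothetical file name for the Job definition shown above
JOB_FILE="helloworld-job.yaml"

# create_job is a hypothetical helper name used only in this sketch.
create_job() {
  # Create the Job resource in the current project
  oc create -f "$JOB_FILE"
  # List the jobs in the project to confirm that it was created
  oc get jobs
}
```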
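The same logs can also be read from the command line. The sketch below relies on the job-name label that Kubernetes adds to every pod created by a Job; show_job_logs is a hypothetical function name, and it reads only the first pod it finds, which is an assumption that holds when the job succeeded on its first attempt.

```shell
# Job name from the metadata section of the YAML file
JOB_NAME="helloworld"

# show_job_logs is a hypothetical helper name used only in this sketch.
show_job_logs() {
  # Look up the name of the first pod that carries the job-name label
  pod="$(oc get pods --selector=job-name="$JOB_NAME" --output=jsonpath='{.items[0].metadata.name}')"
  # Print the console output captured from the Talend job
  oc logs "$pod"
}
```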

I hope your journey through this blog on running Talend jobs on Red Hat OpenShift was easy and interesting. There are many other interesting blogs on various Talend subject areas. I would highly recommend checking them out to increase your knowledge of Talend and how it interacts with other technologies.

The microservices / OSGi runtimes referenced in this blog can be generated if you have Talend Cloud API Services or Real-Time Big Data.

Till I come back with a new blog, enjoy your time using Talend! 

References

https://www.docker.com/resources/what-container

https://kubernetes.io/docs/tutorials/kubernetes-basics/

https://www.openshift.com/learn/what-is-openshift/

https://docs.openshift.com/container-platform/3.5/dev_guide/jobs.html#creating-a-job
