How to deploy Talend Jobs as Docker images to Amazon, Azure and Google Cloud registries

  • Thibaut Gourdel
    Thibaut Gourdel has been Technical Product Marketing Jr at Talend since 2017. His areas of interest include cloud technologies, containerization, serverless computing and data stream processing.

Since the release of Talend 7.1, users can build Talend Jobs as Docker images and publish them to Docker registries. In this blog post, I am going to run through the steps to publish to the container registries of the major cloud providers (AWS, Azure and Google Cloud). Before I dig into publishing container images to registries, I am going to remind you of the basics of building Talend Jobs as Docker images from Talend Studio, and point out the difference between a local build and a remote build.

Requirements

  • Talend Studio 7.1.1 or higher
  • Platform license
  • Docker software installed and accessible from Studio

What is a container registry?

First, let’s explain the concept of a container registry. For those of you familiar with this, feel free to skip ahead.

A container registry is basically a set of repositories for Docker images. This is where you store your Docker images and distribute them for further use. Most registries also offer access control over who can view and download images, as well as CI/CD integration and vulnerability scanning.

Let’s take a look at the major Docker registries available:

  • DockerHub is the world’s largest library and community for container images. This is the in-house registry of Docker.
  • Amazon ECR is the Amazon Web Services registry.
  • Google GCR is the Google Cloud Platform registry.
  • Azure ACR is the Microsoft Azure registry.

How do I build a Talend Job as a Docker image?

Before we publish to a registry, let me remind you of the ways to perform a build of a Docker image in Talend Studio:

Local build

To build a Job as a Docker image:

  • Right-click on a Job.
  • Select “Build Job”.
  • Select “Docker Image” as build type and fill in the form with your own settings.

The Docker Host is the Docker daemon running on the machine where you want to build your image. You can build the image either on your local machine or on a remote host. In this example, the Docker daemon is running on the same machine as Studio, hence we use the local mode.

Remote Build

In most cases, Docker is installed on the machine where you run Talend Studio. However, you might want to build your Docker image on a remote host such as a local virtual machine or a virtual machine in the cloud. If this is the case, you need to select the remote mode.

In my case, I have a Windows laptop where I run a Linux virtual machine in VMware, because I am more comfortable running Docker on a Linux machine. To build my image on my Linux VM, I need to select the remote mode and specify the VM's IP address using the TCP protocol. A port on the VM also needs to be opened so that the remote Docker daemon can be reached from outside. Please refer to the Docker documentation to enable TCP socket access to Docker.
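As a rough sketch, the value for the remote Docker Host field can be assembled from the VM's address and the port opened for the daemon. The function name `docker_host_url`, the address 192.168.56.101 and the port 2375 are illustrative assumptions, not values from Studio; use whatever address and port your own daemon listens on.

```shell
# Build the remote Docker Host value in the form tcp://<host>:<port>.
# docker_host_url is a hypothetical helper, not part of Docker or Talend.
docker_host_url() {
  # $1 = IP address or hostname of the remote Docker host, $2 = TCP port
  if [ -z "$1" ]; then
    echo "host is required" >&2
    return 1
  fi
  case "$2" in
    ''|*[!0-9]*) echo "port must be numeric" >&2; return 1 ;;
  esac
  printf 'tcp://%s:%s\n' "$1" "$2"
}

# Assumed example values: a local VM address and the port opened on it.
docker_host_url 192.168.56.101 2375
```

Before entering the value in Studio, you can check that the daemon is reachable with `docker -H tcp://192.168.56.101:2375 info` (again with your own address and port). Keep in mind that a plain TCP socket is unencrypted, so it is best limited to a trusted network.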

How to publish a Talend Docker Image to a Registry

With Talend Studio, you can also build your Docker image and push it to a registry in a single action called publishing.

DockerHub

Let’s start with DockerHub. To publish your Talend Job as an image to a Docker registry, right-click on your Job and select “Publish”.

Compared to the build function, you now have three more fields. Because you are also pushing your image to a Docker registry, you need to specify the registry and the credentials used to access it.

Registry: docker.io/<DOCKER_USERNAME>

Username: <DOCKER_USERNAME>

Password: <DOCKER_PASSWORD>
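To see how these fields fit together, here is a minimal sketch that composes a full image reference from the registry, image name and tag. It assumes the published image follows the usual Docker naming scheme `<registry>/<image name>:<tag>`; the helper `image_ref` and the values `jdoe`, `talendjob` and `latest` are made up for illustration.

```shell
# Compose a full image reference: <registry>/<image name>:<tag>.
# image_ref is a hypothetical helper used only for this illustration.
image_ref() {
  # $1 = registry (e.g. docker.io/<DOCKER_USERNAME>), $2 = image name, $3 = tag
  if [ -z "$1" ] || [ -z "$2" ] || [ -z "$3" ]; then
    echo "usage: image_ref <registry> <image name> <tag>" >&2
    return 1
  fi
  printf '%s/%s:%s\n' "$1" "$2" "$3"
}

# Assumed example values for a DockerHub user called jdoe.
image_ref docker.io/jdoe talendjob latest
```

Under that assumption, the published image could then be fetched on any Docker host with `docker pull docker.io/jdoe/talendjob:latest`.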

Amazon ECR

If you want to publish your Docker image to your own AWS account, you can use Amazon Elastic Container Registry (ECR), which is a private registry in your AWS account. With Amazon ECR you only get a single registry per account and per region. You also need to create the image repositories beforehand. In my case, I created the repository “talendjob”.

Then you can fill in the publish form as follows:

The image name will be the name you have given to your image repository and the registry will be the URI composed of your account number and region.

Image Name: Amazon ECR repository name

Registry: <AWS_ACCOUNT_NUMBER>.dkr.ecr.<REGION>.amazonaws.com

Username: <AWS_ACCESS_KEY>

Password: <AWS_SECRET_KEY>

Unusually, the username and password here are your AWS credentials. Talend Studio uses the Fabric8 Maven plugin, and, as described in the plugin's documentation, a custom connection method has been developed to ease authentication to Amazon ECR.
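The registry address above can be derived from the account number and region. The following sketch shows one way to do it; the helper `ecr_registry`, the account number 123456789012 and the region eu-west-1 are placeholders for illustration only.

```shell
# Compose the Amazon ECR registry address for an account and region.
# ecr_registry is a hypothetical helper; the values below are placeholders.
ecr_registry() {
  # $1 = AWS account number (12 digits), $2 = AWS region, e.g. eu-west-1
  case "$1" in
    ''|*[!0-9]*) echo "account number must contain only digits" >&2; return 1 ;;
  esac
  if [ "${#1}" -ne 12 ]; then
    echo "account number must be 12 digits long" >&2
    return 1
  fi
  if [ -z "$2" ]; then
    echo "region is required" >&2
    return 1
  fi
  printf '%s.dkr.ecr.%s.amazonaws.com\n' "$1" "$2"
}

ecr_registry 123456789012 eu-west-1
```

If you prefer the command line to the AWS console, the repository can be created beforehand with the AWS CLI, for example `aws ecr create-repository --repository-name talendjob --region eu-west-1`.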

Azure ACR

Unlike Amazon ECR, Azure lets you create as many registries as you want in your Azure Portal. In my case, I created a registry called “tgourdel”.

You can get your registry credentials using the Azure CLI as follows:

Command:

$ az acr credential show --name tgourdel
{
  "passwords": [
    {
      "name": "password",
      "value": "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
    },
    {
      "name": "password2",
      "value": "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
    }
  ],
  "username": "tgourdel"
}

Then you can build and publish your job in Talend Studio:

Image Name: Name of your image repository (can be created on the fly)

Registry: <AZURE_REGISTRY_NAME>.azurecr.io

Username: <AZURE_ACR_USERNAME>

Password: <AZURE_ACR_PASSWORD>
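The lookup of these values can also be scripted. The sketch below wraps the `az acr credential show` command from above with the Azure CLI's `--query` and `--output tsv` options to print only the first password, and derives the registry address from the registry name. The function names `acr_password` and `acr_registry` are my own, and the example assumes the registry's admin account is enabled.

```shell
# Print only the first admin password of an Azure container registry.
# Requires a logged-in Azure CLI; it is defined here but not called.
acr_password() {
  # $1 = registry name, e.g. tgourdel
  az acr credential show --name "$1" --query "passwords[0].value" --output tsv
}

# Compose the registry address to enter in the publish form.
acr_registry() {
  # $1 = registry name
  if [ -z "$1" ]; then
    echo "registry name is required" >&2
    return 1
  fi
  printf '%s.azurecr.io\n' "$1"
}

acr_registry tgourdel
```

Calling `acr_password tgourdel` would then return the value to paste into the Password field, without having to copy it out of the JSON output by hand.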

Google GCR

On GCP, authentication is again slightly different from the other clouds. You need to use “oauth2accesstoken” as the username and an access token as the password. Use the gcloud CLI to get your token:

$ gcloud auth print-access-token
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Your registry address is linked to your GCP Project name:

Registry: gcr.io/<GCP_PROJECT_NAME>

(Note that you can also use a regional registry hostname such as eu.gcr.io or us.gcr.io.)
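As an illustration, the registry address can be built from the project name and an optional regional prefix, and the same token-based credentials can be tried out with a plain `docker login`. The helpers `gcr_registry` and `gcr_login` and the project name `my-project` are hypothetical.

```shell
# Compose the GCR registry address, optionally with a regional prefix.
# gcr_registry and gcr_login are hypothetical helpers for illustration.
gcr_registry() {
  # $1 = GCP project name, $2 = optional regional prefix such as eu or us
  if [ -z "$1" ]; then
    echo "project name is required" >&2
    return 1
  fi
  if [ -n "$2" ]; then
    printf '%s.gcr.io/%s\n' "$2" "$1"
  else
    printf 'gcr.io/%s\n' "$1"
  fi
}

# Log in to a GCR host with a fresh access token as the password.
# Requires the gcloud and docker CLIs; it is defined here but not called.
gcr_login() {
  # $1 = registry host, e.g. gcr.io or eu.gcr.io
  gcloud auth print-access-token |
    docker login -u oauth2accesstoken --password-stdin "https://$1"
}

gcr_registry my-project eu
```

Access tokens are short-lived, so it is safer to generate one right before you publish rather than reuse an old one.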

Finally, you can see your image in GCR.

Summary

As we have seen in this article, each cloud provider's registry has its own authentication mechanism. That is why I gathered all the needed information into the tables below, which I hope will help you publish to any cloud registry:

Build  | Docker Host | Image Name | Image Tag
-------|-------------|------------|----------
Local  | Local if the Docker daemon is installed on the same machine as Talend Studio. | Created on the fly on your machine. | Any tag.
Remote | Remote if the Docker daemon is on a remote machine: tcp://IP_ADDRESS_DOCKER_DAEMON:DOCKER_DAEMON_PORT | Created on the fly on the remote machine. | Any tag.

Publish    | Registry | Username | Password
-----------|----------|----------|---------
DockerHub  | docker.io/DOCKER_USER | DOCKER_USER | DOCKER_PWD
Amazon ECR | AWS_ACCOUNT.dkr.ecr.REGION.amazonaws.com | AWS_ACCESS_KEY | AWS_SECRET_KEY
Azure ACR  | AZ_REGISTRY.azurecr.io | AZURE_ACR_USER | AZURE_ACR_PWD
Google GCR | gcr.io/GCP_PROJECT | oauth2accesstoken | GCLOUD_AUTH_ACCESS_TOKEN

If you are looking for information on integrating the process of publishing your Jobs to Docker registries into your CI/CD pipeline, you can follow my previous blog article, where I show how to achieve this.
