Since the release of Talend 7.1, users can build Talend Jobs as Docker images and publish them to Docker registries. In this blog post, I am going to run through the steps to publish to the container registries of the major cloud providers (AWS, Azure and Google Cloud). Before I dig into publishing container images to registries, I am going to remind you of the basics of building Talend Jobs as Docker images from Talend Studio, and point out the difference between a local build and a remote build.
Requirements
- Talend Studio 7.1.1 or higher
- Platform license
- Docker software installed and accessible from Studio
What is a container registry?
First, let’s explain the concept of a container registry. For those of you familiar with this, feel free to skip ahead.
A container registry is basically a set of repositories for Docker images. This is where you store your Docker images and distribute them for later use. Most registries also offer access control over who can view and download images, as well as CI/CD integration and vulnerability scanning.
Let’s take a look at the major Docker registries available:
- DockerHub is the world’s largest library and community for container images. This is the in-house registry of Docker.
- Amazon ECR is the Amazon Web Services registry.
- Google GCR is the Google Cloud Platform registry.
- Azure ACR is the Microsoft Azure registry.
How do I build a Talend Job as a Docker image?
Before we publish to a registry, let me remind you of the ways to perform a build of a Docker image in Talend Studio:
Local build
To build a job as a Docker Image:
- Right-Click on a Job
- Select “Build Job”.
- Select “Docker Image” as build type and fill in the form with your own settings.
The Docker Host is the Docker daemon running on the machine where you want to build your image. You can build the image either on your local machine or on a remote host. In the example above, the Docker daemon is running on the same machine as Studio, hence we use the local mode.
Remote Build
In most cases, Docker is installed on the machine where you run Talend Studio. However, you might want to build your Docker image on a remote host such as a local virtual machine or a virtual machine in the cloud. If this is the case, you need to select the remote mode.
In my case, I have a Windows laptop where I run a Linux virtual machine in VMware, because I am more comfortable running Docker on a Linux machine. To build my image on the Linux VM, I need to select the remote mode and specify the VM's IP address using the TCP protocol. I also need to open a port on the VM so that the remote Docker daemon is reachable from outside. Please refer to the Docker documentation to enable TCP socket access to Docker.
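Before selecting remote mode, it can help to confirm from the Studio machine that the remote daemon answers on its TCP socket. The sketch below is only an illustration: the helper names, the example IP address and the default port 2375 (the port conventionally used for unencrypted Docker TCP access) are assumptions, not values taken from Talend Studio.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; names and defaults are illustrative assumptions.

# Build the tcp:// address expected in the Docker Host field for remote mode.
docker_host_uri() {
  local ip="$1" port="${2:-2375}"   # 2375 is an assumed default, adjust to your setup
  printf 'tcp://%s:%s\n' "$ip" "$port"
}

# Ask the remote daemon for its version; this fails if the socket is unreachable.
check_remote_docker() {
  docker -H "$(docker_host_uri "$@")" version
}
```

For example, `check_remote_docker 192.168.1.50` should print both the client and the server version if the VM is reachable. An unencrypted TCP socket gives full control over the daemon, so it is best exposed only on a trusted network.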
How to publish a Talend Docker Image to a Registry
With Talend Studio you can also build your Docker images and push them to a registry with only one action called publishing.
DockerHub
Let’s start with DockerHub. To publish your Talend Job as an image into a Docker registry, right-click on your job and select Publish:
Compared to the build function you now have 3 more fields. As you are also pushing your image to a Docker registry you need to specify your registry and the credentials used to access it.
Registry: |
Username: |
Password: |
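For comparison, the same result can be reached by hand with the Docker CLI. This is a rough sketch of a login, tag and push sequence, not a description of what Studio runs internally; the function names, the image name `talendjob` and the tag `0.1` are placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; names are illustrative assumptions.

# Compose the full image reference: REGISTRY/IMAGE:TAG.
image_ref() {
  printf '%s/%s:%s\n' "$1" "$2" "$3"
}

# Log in, tag a locally built image and push it.
# Arguments: registry, username, image name, tag. The password is read from stdin.
push_image() {
  local registry="$1" user="$2" name="$3" tag="$4" ref
  ref="$(image_ref "$registry" "$name" "$tag")"
  # Log in to the registry host only (the part before the first slash).
  docker login --username "$user" --password-stdin "${registry%%/*}" &&
    docker tag "$name:$tag" "$ref" &&
    docker push "$ref"
}
```

A call could look like `printf '%s' "$DOCKER_PWD" | push_image docker.io/DOCKER_USER DOCKER_USER talendjob 0.1`, which keeps the password out of the command line.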
Amazon ECR
If you want to publish your Docker image into your own AWS account, you can use Amazon Elastic Container Registry (ECR), which is a private registry in your AWS account. With Amazon ECR you get a single registry per account and per region. You also need to create the image repositories beforehand. So, in my case I created the repository “talendjob”:
Then you can fill in the publish form as follows:
The image name will be the name you have given to your image repository and the registry will be the URI composed of your account number and region.
Image Name: |
Registry: |
Username: |
Password: |
Unusually, the username and password here are AWS credentials (your access key and secret key). Talend Studio uses the Fabric8 Maven plugin, which, as its documentation describes, implements a custom connection method to ease authentication to Amazon ECR.
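The ECR preparation steps can also be scripted with the AWS CLI. The following is a minimal sketch under a few assumptions: the function names are made up for illustration, and the account ID and region you pass in are your own. It composes the registry URI, creates the repository beforehand, and optionally logs the plain Docker CLI in to the same registry.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; names are illustrative assumptions.

# Registry URI for an account and region, in the form used in the publish form.
ecr_registry() {
  printf '%s.dkr.ecr.%s.amazonaws.com\n' "$1" "$2"
}

# Create the image repository beforehand, as described above.
create_ecr_repo() {
  local repo="$1" region="$2"
  aws ecr create-repository --repository-name "$repo" --region "$region"
}

# Optional: log the plain Docker CLI in to the same registry.
ecr_docker_login() {
  local account="$1" region="$2"
  aws ecr get-login-password --region "$region" |
    docker login --username AWS --password-stdin "$(ecr_registry "$account" "$region")"
}
```

For instance, `create_ecr_repo talendjob eu-west-1` would create the repository used in this example in an assumed region. Note the difference in credentials: the Docker CLI logs in with the user `AWS` and a temporary token, whereas the Studio publish form takes the access key and secret key directly.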
Azure ACR
Unlike Amazon ECR, Azure lets you create as many registries as you want in your Azure Portal. In my case I created a registry called “tgourdel”:
You can get your registry credentials using the Azure CLI as follows:
Command:
$ az acr credential show --name tgourdel
{
  "passwords": [
    {
      "name": "password",
      "value": "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
    },
    {
      "name": "password2",
      "value": "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
    }
  ],
  "username": "tgourdel"
}
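If you only need the values to paste into the publish form, the Azure CLI can extract them for you. This sketch relies on the CLI's generic `--query` (JMESPath) and `--output tsv` options; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; names are illustrative assumptions.

# Login server for a registry name, e.g. tgourdel -> tgourdel.azurecr.io.
acr_registry() {
  printf '%s.azurecr.io\n' "$1"
}

# Print only the admin username instead of the full JSON document.
acr_username() {
  az acr credential show --name "$1" --query 'username' --output tsv
}

# Print only the first admin password.
acr_password() {
  az acr credential show --name "$1" --query 'passwords[0].value' --output tsv
}
```

With the registry from this example, `acr_password tgourdel` prints the first password on its own line. If the CLI reports that the admin user is disabled, it can be enabled with `az acr update --name tgourdel --admin-enabled true`.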
Then you can build and publish your job in Talend Studio:
Image Name: |
Registry: |
Username: |
Password: |
Google GCR
On GCP the authentication is again slightly different from the other clouds. You need to use “oauth2accesstoken” as the username and an access token as the password. Use the gcloud CLI to get your token:
$ gcloud auth print-access-token
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
Your registry address is linked to your GCP Project name:
Registry: gcr.io/<GCP_PROJECT_NAME>
(note that you can also use a location-specific registry hostname such as eu.gcr.io or us.gcr.io)
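The registry address and the token-based login can be sketched as shell helpers. The function names and the project name `my-project` are assumptions for illustration; the login command follows the access-token method described above, applied to the plain Docker CLI rather than to Studio.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; names are illustrative assumptions.

# Registry address for a project, with an optional location prefix such as eu or us.
gcr_registry() {
  local project="$1" location="${2:-}"
  printf '%s/%s\n' "${location:+$location.}gcr.io" "$project"
}

# Log the Docker CLI in, using a freshly printed access token as the password.
# Optional argument: location prefix (eu, us, ...).
gcr_docker_login() {
  local host="${1:+$1.}gcr.io"
  gcloud auth print-access-token |
    docker login -u oauth2accesstoken --password-stdin "https://$host"
}
```

Here `gcr_registry my-project eu` yields the address for the European hostname. Access tokens are short-lived, so a token pasted into the publish form will likely need to be regenerated for later publishes.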
Finally, you can see your image in GCR:
Summary
As we have seen in this article, each cloud provider's registry has its own authentication mechanism. That is why I gathered all the needed information into a single table that I hope will help you publish to any cloud registry:
Build | Docker Host | Image Name | Image Tag
---|---|---|---
Local | Local if the Docker daemon is installed on the same machine as Talend Studio. | Created on the fly on your machine. | Any tag.
Remote | Remote if the Docker daemon is on a remote machine: tcp://IP_ADDRESS_DOCKER_DAEMON:DOCKER_DAEMON_PORT | Created on the fly on the remote machine. | Any tag.
Publish | Registry | Username | Password
---|---|---|---
DockerHub | docker.io/DOCKER_USER | DOCKER_USER | DOCKER_PWD
Amazon ECR | AWS_ACCOUNT.dkr.ecr.REGION.amazonaws.com | AWS_ACCESS_KEY | AWS_SECRET_KEY
Azure ACR | AZ_REGISTRY.azurecr.io | AZURE_ACR_USER | AZURE_ACR_PWD
Google GCR | gcr.io/GCP_PROJECT | oauth2accesstoken | GCLOUD_AUTH_ACCESS_TOKEN
If you are looking for information on integrating the publishing of your jobs to Docker registries into your CI/CD process, you can follow my previous blog article, where I show how to achieve this.