Containerization in the Cloud – Getting Started

Containerization in the cloud is an important trend in software development, and for good reason: it lets your applications scale elastically with demand. If you are thinking about deploying containers, there are a few things you should know first that will help you maximize the benefits.

Application Containers — Amazon EC2

Application containers run on specific Amazon EC2 instance types. Instances are grouped into families according to target application profiles: CPU, memory, storage, and networking capacity. The standard On-Demand instances are the ones you run every day with Amazon: you spin them up, run your tasks, and shut them down. But when you move into an operational environment, you must think about the various kinds of workloads you have. Amazon offers a variety of purchasing options; here are three we recommend:

  • Reserved Instances: You have some predictable and known workloads, and if you use Amazon Reserved Instances (RIs), you can save a significant amount. With a three-year RI, you can save up to 75% of your IT costs for the portion of your workload portfolio that you know is fixed, stable, and repeatable.
  • On-demand Instances: For more dynamic workloads, you can use EC2 On-Demand instances or AWS Fargate.
  • Spot Instances: The third category is underutilized but perfect for unpredictable workloads that vary over time: AWS Spot Instances. Spot Instances offer savings comparable to Reserved Instances without the big upfront commitment of paying for an entire year, and they're elastic. Spot Instances do come with a challenge, however: they're cheaper because they can be interrupted, so you must build more robust, resilient applications and integration workflows (a simple sketch of handling an interruption follows this list), and that's how Talend can help.
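
To illustrate the resilience point, here is a minimal sketch of how a containerized job might watch for the Spot interruption notice that EC2 publishes on the instance metadata service (roughly two minutes before the instance is reclaimed) and checkpoint before exiting. The do_work and save_checkpoint functions are illustrative placeholders, and the sketch assumes IMDSv1 is enabled; this is not Talend's built-in mechanism.

```python
# Minimal sketch: poll the EC2 instance metadata service for a Spot
# interruption notice so a running job gets ~2 minutes to checkpoint.
# Assumes IMDSv1; IMDSv2 would additionally require a session token.
import json
import time
import urllib.error
import urllib.request

METADATA_URL = "http://169.254.169.254/latest/meta-data/spot/instance-action"

def interruption_notice():
    """Return the interruption notice dict, or None if none is scheduled."""
    try:
        with urllib.request.urlopen(METADATA_URL, timeout=2) as resp:
            return json.loads(resp.read())
    except urllib.error.HTTPError:
        return None  # 404 means no interruption has been scheduled

def run_with_checkpoints(do_work, save_checkpoint):
    while True:
        notice = interruption_notice()
        if notice:
            save_checkpoint()  # persist state so another instance can resume
            print(f"Interruption at {notice.get('time')}; state saved, exiting.")
            return
        do_work()              # one small, resumable unit of work
        time.sleep(5)
```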

Using Containers to Build a Resilient Integration Workflow

Talend solutions can help you build resilient integration workflows, and you can run them in containers hosted on Spot Instances. If a Spot Instance is interrupted, Talend will have preserved the state, so the workflow can fail over to another instance. We have observed that this type of resilient data integration fabric has the potential to save 60% on that portion of the workload.
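
As an illustration of that preserved-state idea, here is a minimal sketch that stores a job checkpoint in Amazon S3 so a replacement container on another instance can resume where the interrupted one left off. The bucket, key, and state layout are assumptions for the example; Talend's own state handling lives inside the generated job.

```python
# Minimal sketch: checkpoint/restore job state via Amazon S3 so a
# replacement container can pick up where an interrupted one stopped.
import json
import boto3

s3 = boto3.client("s3")
BUCKET = "my-integration-state"              # assumption: an existing bucket
KEY = "jobs/orders-load/checkpoint.json"     # assumption: illustrative key

def save_checkpoint(state: dict) -> None:
    s3.put_object(Bucket=BUCKET, Key=KEY, Body=json.dumps(state).encode())

def load_checkpoint() -> dict:
    try:
        obj = s3.get_object(Bucket=BUCKET, Key=KEY)
        return json.loads(obj["Body"].read())
    except s3.exceptions.NoSuchKey:
        return {"last_processed_row": 0}     # fresh start if no checkpoint exists

# A replacement container calls load_checkpoint() at startup and continues
# from state["last_processed_row"] instead of reprocessing everything.
```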

An end-to-end unified integration infrastructure, such as Talend Data Fabric, would be part of your bigger ecosystem. It helps standardize end-to-end DevOps; if you have five different islands or variations of DevOps, one for each of your enterprise tools, that doesn't really streamline your overall IT environment. In a traditional setup, an enterprise application server with Talend Administration Center takes your jobs from a Nexus binary repository and sends them on a schedule to your job server agents. A Java Virtual Machine (JVM) is started, your job runs, and once it's finished, the JVM shuts down.

Delegating More to the Cloud with Talend

Just as we delegated to the cloud for serverless, we can also delegate the responsibility for scaling out, scheduling, logging, and provisioning to the cloud provider. Talend is built for the cloud: it is a Java code generator uniquely positioned to generate lightweight jobs that run as process ID 1 (PID 1) in containers.

PID 1 is the only process in the container, so to start a Talend job, instead of building custom infrastructure, you can use something like Amazon ECS (EC2 Container Service). Scheduling is simple: you spin up the container, and the Talend job, packaged as part of the Docker image, starts to run. When it finishes, the container shuts down, because the job is the only process in it. After that, you don't have to manage anything else, because you've lived up to your part of the contract by containerizing your Talend app.
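
For example, a scheduler or upstream event handler could launch the containerized job as a one-off task through the Amazon ECS API. The cluster name, task definition, and subnet below are placeholder values, and Fargate is just one possible launch type.

```python
# Minimal sketch: launch a containerized job as a one-off ECS task.
# The job is the single PID 1 process baked into the Docker image.
import boto3

ecs = boto3.client("ecs")

response = ecs.run_task(
    cluster="integration-cluster",          # assumption: existing cluster
    taskDefinition="talend-orders-load:1",  # assumption: registered task definition
    launchType="FARGATE",
    count=1,
    networkConfiguration={
        "awsvpcConfiguration": {
            "subnets": ["subnet-0123456789abcdef0"],
            "assignPublicIp": "DISABLED",
        }
    },
)
print(response["tasks"][0]["taskArn"])  # the task runs the job, then stops on exit
```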

This process is efficient and more elastic, and you reap the benefits from your cloud provider. Amazon provides a lot of great capabilities, including built-in security and logging, elasticity, and scalability. This takes us far beyond the days of getting a big application server and having to integrate configuration tools, monitoring, management, and control tools, security islands, and the like into it.

Today, that infrastructure has been standardized and commoditized down to the cloud provider, and that standardization translates to cost savings and more time to focus on the data.

If you are running containers in the cloud, you will want to loosely couple the chain of jobs that implements your workflow while keeping them tightly integrated. For example, when a file lands in Amazon S3, an Amazon Simple Notification Service (SNS) message is published and delivered to an Amazon Simple Queue Service (SQS) queue. The second job pulls the message from SQS and runs some externalized business validation rules. These jobs run in containers on Spot Instances, which brings us back to the different instance options.
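
A minimal sketch of that second job's consumer loop is shown below; the queue URL is a placeholder, and the validation function stands in for your externalized business rules.

```python
# Minimal sketch: the second job polls the SQS queue fed by the S3 event
# notification (via SNS) and applies externalized validation rules.
import json
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/new-files"  # placeholder

def validate(record: dict) -> bool:
    # placeholder for externalized business validation rules
    return bool(record.get("s3", {}).get("object", {}).get("key"))

while True:
    resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=10,
                               WaitTimeSeconds=20)   # long polling
    for msg in resp.get("Messages", []):
        body = json.loads(msg["Body"])               # SNS envelope
        event = json.loads(body["Message"])          # original S3 event
        for record in event.get("Records", []):
            if validate(record):
                ...  # hand the object key to the next job in the chain
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
```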

Which Containerization Solution is Right for You?

We have touched on some of the instance options available, but how do you know which to use? That is driven by your environment: is it highly predictable or completely variable? To better explain, we will mix and match those options to show you how to manage the workload. Let's focus on two parts:

  • Containerizing the platform
  • Containerizing your jobs in Talend

You can start by containerizing the jobs, bringing over only the capabilities that define your containerization strategy and Dockerizing a specific job or data service within Talend Studio. Your developer can then drive the packaging right from the beginning, from defining and implementing the microservices that get deployed out into the container.

Next, you will need to productize all the plugins that you have in the build pipeline. To do this, you create the packages and deployments for the jobs, then immediately create the image and publish it to the Docker registry; at that point, it's ready for provisioning.
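
As a rough illustration, the image-build step of such a pipeline could be as simple as the following; the image name and registry are placeholders, and in a real setup this step would typically be driven by your CI server or Talend's build tooling rather than a hand-written script.

```python
# Minimal sketch: build the job's Docker image and push it to a registry
# as one step of the build pipeline.
import subprocess

IMAGE = "registry.example.com/integration/orders-load:1.0.0"  # placeholder

subprocess.run(["docker", "build", "-t", IMAGE, "."], check=True)  # package the job
subprocess.run(["docker", "push", IMAGE], check=True)              # publish to registry
# The image is now available for provisioning into any target environment.
```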

Provisioning the Deployment

Provisioning starts with deploying the container on-premises and making sure you have quick turnaround test cycles. As you move up through environments and test instances, you can continue and deploy the container to Amazon. Then, by integrating it with Talend's scheduling capabilities in those environments, you can ultimately leverage all the cloud benefits.

Containerizing the Platform

Containers alone provide a method of deployment, but they don't give you scale. Luckily with Talend, as your workload grows, you can go beyond the jobs you have built and scale Talend as a platform.

You can start with a simple configuration: the developer experience. Developers tend to be less concerned about security and constraints, and don't need to scale out to large workloads because their workloads don't usually have a lot of data passing through. At the other end of the spectrum is the production deployment, which needs to auto-scale. Talend provides a variety of configuration templates, all of which include auto-scaling.
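
As one way to picture what an auto-scaling production configuration involves, here is a sketch that registers target-tracking scaling for a containerized service on Amazon ECS; the service name, capacity limits, and CPU target are illustrative and are not a Talend-provided template.

```python
# Minimal sketch: target-tracking auto-scaling for a containerized service
# so the production deployment scales with load.
import boto3

autoscaling = boto3.client("application-autoscaling")

autoscaling.register_scalable_target(
    ServiceNamespace="ecs",
    ResourceId="service/integration-cluster/talend-jobserver",  # placeholder names
    ScalableDimension="ecs:service:DesiredCount",
    MinCapacity=1,
    MaxCapacity=10,
)

autoscaling.put_scaling_policy(
    PolicyName="cpu-target-tracking",
    ServiceNamespace="ecs",
    ResourceId="service/integration-cluster/talend-jobserver",
    ScalableDimension="ecs:service:DesiredCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 60.0,  # keep average CPU around 60%
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
        },
    },
)
```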

Talend also gives you the ability to plug in your own system monitoring and operational tools, making it possible for you to run a private cloud. It’s easy to leverage your own tools when you can plug them in and integrate them into the larger integration solution.

Realize the Value of Containerization in the Cloud

There is so much value that can come from the cloud: managed, monitored, and controlled applications that can be delivered on demand in an elastic way, with security, visibility, and measurability baked right in. The potential is immense. Enterprises looking to take the leap into the container world can start small. If you are in the beginning stage, you can certainly start with an on-premises deployment. Just make sure to have your sights ultimately set on the cloud, so you can take advantage of the wealth of benefits cloud providers offer.

Last Updated: August 12th, 2019