
ELK with Talend Cloud


  • Rekha Sree
    Rekha Sree is a Customer Success Architect who uses her expertise in Data Integration, Data Warehousing, and Big Data to help drive customer success at Talend. Prior to joining Talend, Rekha worked at Target Corporation India Pvt Ltd for more than a decade, building their enterprise and analytical data warehouse.

Overview

ELK is the acronym for three open source projects: Elasticsearch, Logstash, and Kibana. Together they form a robust solution for log management and data analysis. Each project has a specific role in ELK, as follows:

  • Elasticsearch handles storage and provides a RESTful search and analytics endpoint (see the example query after this list).
  • Logstash is a server-side data processing pipeline that ingests, transforms and loads data.
  • Kibana lets you visualize your Elasticsearch data and navigate the Elastic Stack.
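
For example, a search against Elasticsearch is just an HTTP call. Below is a minimal sketch in Python; the endpoint URL and index name are hypothetical placeholders, and it assumes the domain's access policy permits the client:

import requests

# Hypothetical Amazon ES endpoint and index name; replace with your own.
host = 'https://my-es-domain.eu-west-1.es.amazonaws.com'
resp = requests.get(host + '/talend-cloud-logs/_search', params={'q': 'error'})
print(resp.json())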

In this blog, I am going to show you how to configure ELK while working with Talend Cloud. The focus is on loading streaming data into Amazon Elasticsearch Service (Amazon ES) from Amazon S3. Refer to this help document from AWS for more details.

 

Process Flow

Talend Cloud enables you to save the execution logs automatically to an Amazon S3 bucket. The flow for getting Talend Cloud logs into ELK is shown below.

(Flow: Talend Cloud execution logs → Amazon S3 → AWS Lambda → Amazon ES → Kibana)

Once you have configured Talend Cloud logs to be saved to the Amazon S3 bucket, a Lambda function is written. Lambda is used to send data from S3 to the Amazon ES domain. As soon as a log arrives in S3, the bucket triggers an event notification to Lambda, which then runs the custom code to perform the indexing. The custom code in this blog is written in Python; a skeleton of the handler is sketched below.
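
To make the flow concrete, here is a minimal sketch of the Lambda entry point and the fields an S3 event notification carries; the handler name is a placeholder, and the full indexing logic comes in Step 3:

# Minimal sketch of a Lambda handler for an S3 event notification.
def handler(event, context):
    # Each record describes one new object in the bucket.
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']  # bucket that received the log
        key = record['s3']['object']['key']      # key of the new log object
        print('New log object: s3://{}/{}'.format(bucket, key))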

Prerequisites

To configure ELK with Talend Cloud logs, you need:

  • Talend Cloud account with log configuration in TMC – refer to this help document for Talend Cloud log configuration
  • Amazon S3 bucket – refer to this amazon page on Amazon S3
  • Configure cross-account roles – refer to this amazon page on how to configure cross-account roles
  • Amazon Lambda function – refer to this amazon page on Amazon Lambda functions
  • Amazon Elasticsearch domain – refer to this amazon page on Amazon Elasticsearch domain

Steps

This section outlines the steps needed to load streaming Talend Cloud logs into an Amazon ES domain.

Step 1: Configure Talend Cloud

  • Download the CloudFormation template.
  • Open your AWS account in a new tab and start the Create Stack wizard on the AWS CloudFormation Console. In the Select Template step, select Upload a template to Amazon S3 and pick the template provided by Talend Cloud.

  • In the Specify Details section, define the External ID, S3 bucket name, and S3 prefix.

  • Click Create. Once the stack is created, select it to find the RoleARN key value in the Outputs tab; a scripted way to read this value is sketched below.
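
If you prefer to script this check, here is a minimal boto3 sketch that reads the RoleARN from the stack outputs; the stack name is a hypothetical placeholder:

import boto3

cfn = boto3.client('cloudformation')
# 'talend-cloud-logs' is a placeholder; use the name you gave the stack.
stack = cfn.describe_stacks(StackName='talend-cloud-logs')['Stacks'][0]
for output in stack.get('Outputs', []):
    if output['OutputKey'] == 'RoleARN':
        print('RoleARN:', output['OutputValue'])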

  • In the Review step, select I acknowledge that AWS CloudFormation might create IAM resources.

  • Go back to the Talend Cloud Management Console and enter the details.

Step 2: Create an Amazon Elasticsearch Domain

  • Give the domain a name.

  • For the rest of the options, choose values as needed by your organization and click Create; a scripted equivalent is sketched below.
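
The domain can also be created programmatically. Here is a minimal boto3 sketch; the domain name, Elasticsearch version, and instance sizing are placeholders to adjust to your organization's needs:

import boto3

es = boto3.client('es')
# All values below are placeholders, not recommendations.
es.create_elasticsearch_domain(
    DomainName='my-es-domain',
    ElasticsearchVersion='6.7',
    ElasticsearchClusterConfig={'InstanceType': 'm4.large.elasticsearch',
                                'InstanceCount': 1},
    EBSOptions={'EBSEnabled': True, 'VolumeType': 'gp2', 'VolumeSize': 10},
)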

Step 3: Create the Lambda Function

  • There are multiple ways to create a Lambda function. For this blog, I am using an Amazon Linux machine with the AWS CLI configured.
  • Using PuTTY, log in to the EC2 instance.
  • Install pip and virtualenv using these commands:

yum -y install python-pip zip

pip install virtualenv

  • Run the next set of commands:

# Prepare the log ingestor virtual environment

mkdir -p /var/s3-to-es && cd /var/s3-to-es

virtualenv /var/s3-to-es

cd /var/s3-to-es && source bin/activate

pip install requests_aws4auth -t .
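# (-t . installs requests_aws4auth and its dependencies, including requests,
# into the current directory so they get zipped with the Lambda code)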

pip freeze > requirements.txt

  • Validate that the required files are installed.

  • Create a file s3-to-es.py and paste the attached code into the file; a sketch of what that code can look like follows.
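
The attached code is not reproduced here, but a minimal sketch of what such an ingest script can look like is below. The region, endpoint, and index name are hypothetical placeholders, and the actual attached code may differ:

# s3-to-es.py -- illustrative sketch only, not the attached code.
# Reads each new log object from S3 and indexes its lines into Amazon ES.
import boto3
import requests
from requests_aws4auth import AWS4Auth

region = 'eu-west-1'                                      # placeholder region
host = 'https://my-es-domain.eu-west-1.es.amazonaws.com'  # placeholder endpoint
url = host + '/talend-cloud-logs/_doc'                    # placeholder index

# Sign requests with the Lambda execution role's credentials.
credentials = boto3.Session().get_credentials()
awsauth = AWS4Auth(credentials.access_key, credentials.secret_key,
                   region, 'es', session_token=credentials.token)
s3 = boto3.client('s3')

def handler(event, context):
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        key = record['s3']['object']['key']
        body = s3.get_object(Bucket=bucket, Key=key)['Body'].read().decode('utf-8')
        # Index each non-empty log line as its own document.
        for line in body.splitlines():
            if line.strip():
                requests.post(url, auth=awsauth, json={'message': line})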

  • Change the file permissions to 754.

  • Run the command to package the Lambda runtime:

# Package the lambda runtime

zip -r /var/s3-to-es.zip *

  • Send the package to the S3 bucket:

aws s3 cp /var/s3-to-es.zip s3://rsree-tcloud-eu-logs/log-ingester/

  • Validate the upload in the S3 bucket

  • Create Lambda function

  • In the function code section, select ‘Upload a file from Amazon S3’ and click Save; a scripted equivalent is sketched below.
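
For reference, the same function can be created from the uploaded package with boto3; the function name, runtime, and role ARN below are hypothetical placeholders, while the bucket and key match the upload step above:

import boto3

lam = boto3.client('lambda')
lam.create_function(
    FunctionName='s3-to-es',
    Runtime='python3.7',         # match the Python version you packaged with
    Role='arn:aws:iam::123456789012:role/lambda-s3-to-es',  # placeholder ARN
    Handler='s3-to-es.handler',  # <file name>.<function name> in the package
    Code={'S3Bucket': 'rsree-tcloud-eu-logs',
          'S3Key': 'log-ingester/s3-to-es.zip'},
)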

  • Add a trigger by selecting the S3 bucket; a scripted equivalent is sketched below.
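
A scripted equivalent of wiring the trigger is sketched below with a placeholder function ARN; note that S3 must also be granted permission to invoke the function (for example via the Lambda add-permission API) before notifications are delivered:

import boto3

s3 = boto3.client('s3')
s3.put_bucket_notification_configuration(
    Bucket='rsree-tcloud-eu-logs',
    NotificationConfiguration={
        'LambdaFunctionConfigurations': [{
            # Placeholder function ARN; use your function's real ARN.
            'LambdaFunctionArn': 'arn:aws:lambda:eu-west-1:123456789012:function:s3-to-es',
            'Events': ['s3:ObjectCreated:*'],
        }]
    },
)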

  • Validate that the trigger is added to the S3 bucket.

  • Now let’s execute a Talend job so that its log is routed to S3. You can see from the Lambda Monitoring tab that the log is being pulled in. You can also view the logs in CloudWatch.

Step 4: Create a Visualization in Kibana

  • Navigate to the Elasticsearch domain and notice that a new index has been created; a quick scripted check is sketched below.
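
One quick way to confirm the index outside the console is a signed request to the _cat/indices API. Here is a minimal sketch; the region and endpoint are hypothetical placeholders:

import boto3
import requests
from requests_aws4auth import AWS4Auth

region = 'eu-west-1'                                      # placeholder
host = 'https://my-es-domain.eu-west-1.es.amazonaws.com'  # placeholder
credentials = boto3.Session().get_credentials()
awsauth = AWS4Auth(credentials.access_key, credentials.secret_key,
                   region, 'es', session_token=credentials.token)
# List all indices; the new Talend Cloud log index should appear here.
print(requests.get(host + '/_cat/indices?v', auth=awsauth).text)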

  • You can also search for this index in the Kibana dashboard.

  • Click on Discover to view the sample data.

 

 

 

  • You can now create visualizations and see them in the dashboard.

Conclusion

In this blog, we saw how to leverage the power of ELK with Talend Cloud. Once you have ELK configured, you can use it for diagnosing and resolving bugs and production issues, or for metrics about the health and usage of jobs and resources. Well, that’s all for now. Keep watching this space for more blogs, and until then, happy reading!

 
