ELK with Talend Cloud
ELK is an acronym for three open source projects: Elasticsearch, Logstash, and Kibana. Together they form a robust solution for log management and data analysis. Each project has a specific role in ELK:
- Elasticsearch handles storage and provides a RESTful search and analytics endpoint.
- Logstash is a server-side data processing pipeline that ingests, transforms and loads data.
- Kibana lets you visualize your Elasticsearch data and navigate the Elastic Stack.
In this blog, I am going to show you how to configure ELK while working with Talend Cloud. The blog focuses on loading streaming data into Amazon ES from Amazon S3. Refer to this help document from AWS for more details.
Talend Cloud enables you to save the execution logs automatically to an Amazon S3 bucket. The flow for getting Talend Cloud logs into ELK is shown below.
Once you have configured Talend Cloud to save its logs to an Amazon S3 bucket, you write a Lambda function that sends the data from S3 to the Amazon ES domain. As soon as a log arrives in S3, the bucket sends an event notification to Lambda, which then runs the custom code to perform the indexing. The custom code in this blog is written in Python.
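As a rough illustration of the notification described above, here is a minimal Python sketch of how a Lambda function might pull the bucket name and object key out of an S3 event. The function name `extract_s3_objects` and the sample bucket and key are made up for illustration. The event layout follows the standard S3 event notification format, in which object keys arrive URL-encoded.

```python
from urllib.parse import unquote_plus


def extract_s3_objects(event):
    """Return (bucket, key) pairs for every record in an S3 event notification."""
    objects = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # S3 URL-encodes object keys in notifications (spaces arrive as "+").
        key = unquote_plus(record["s3"]["object"]["key"])
        objects.append((bucket, key))
    return objects


# Hypothetical event, trimmed to the fields used above.
sample_event = {
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "my-talend-logs"},
                "object": {"key": "logs/job+run%3A42.log"},
            },
        }
    ]
}

print(extract_s3_objects(sample_event))
```

Decoding the key matters because a log file whose name contains spaces or special characters would otherwise not be found when the function later reads it from S3.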
To configure ELK with Talend Cloud logs, you need:
- Talend Cloud account with log configuration in TMC – refer to this help document for Talend Cloud logs configuration
- Amazon S3 bucket – refer to this Amazon page on Amazon S3
- Cross-account roles – refer to this Amazon page on how to configure cross-account roles
- AWS Lambda function – refer to this Amazon page on AWS Lambda functions
- Amazon Elasticsearch domain – refer to this Amazon page on Amazon Elasticsearch domain
This section outlines the steps needed for loading streaming Talend Cloud logs into an Amazon ES domain.
Step1: Configure Talend Cloud
- Download the CloudFormation template.
- Open your AWS account in a new tab and start the Create Stack wizard on the AWS CloudFormation Console. In the Select Template step, select Upload a template to Amazon S3 and pick the template provided by Talend Cloud.
- In the Specify Details section, define the External ID, S3BucketName, and S3 prefix.
- In the Review step, select I acknowledge that AWS CloudFormation might create IAM resources.
- Click Create. The stack is created. If you select the stack, you can find the RoleARN key value in the Outputs.
- Go back to Talend Cloud Management Console and enter the details from the stack, such as the RoleARN value, in the log configuration.
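If you prefer to script the stack creation instead of using the console wizard, the same inputs can be expressed in Python. The sketch below only builds the request and reads the output, so it runs without AWS access. The parameter keys `ExternalId` and `S3Prefix`, the stack name, and all sample values are assumptions (only `S3BucketName` and `RoleARN` appear in the steps above), so check them against the template you downloaded from Talend Cloud.

```python
def build_stack_request(stack_name, template_body, external_id, bucket, prefix):
    """Build keyword arguments for CloudFormation's create_stack call."""
    parameters = {
        # Parameter keys are assumptions; check the template from Talend Cloud.
        "ExternalId": external_id,
        "S3BucketName": bucket,
        "S3Prefix": prefix,
    }
    return {
        "StackName": stack_name,
        "TemplateBody": template_body,
        "Parameters": [
            {"ParameterKey": k, "ParameterValue": v} for k, v in parameters.items()
        ],
        # Equivalent of ticking the "I acknowledge ..." box in the Review step.
        "Capabilities": ["CAPABILITY_IAM"],
    }


def find_output(describe_stacks_response, key="RoleARN"):
    """Return the value of one stack output, or None if it is absent."""
    for stack in describe_stacks_response.get("Stacks", []):
        for output in stack.get("Outputs", []):
            if output["OutputKey"] == key:
                return output["OutputValue"]
    return None


# Hypothetical values, for illustration only.
request = build_stack_request(
    "talend-cloud-logs", "<template body>", "my-external-id", "my-talend-logs", "logs/"
)
fake_response = {
    "Stacks": [
        {
            "Outputs": [
                {
                    "OutputKey": "RoleARN",
                    "OutputValue": "arn:aws:iam::123456789012:role/example",
                }
            ]
        }
    ]
}
print(find_output(fake_response))
```

With boto3 installed and credentials configured, the request could be passed as `boto3.client("cloudformation").create_stack(**request)`, and the response of `describe_stacks(StackName=...)` handed to `find_output`. If the template creates IAM resources with custom names, CloudFormation asks for `CAPABILITY_NAMED_IAM` instead.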
Step2: Create Amazon Elasticsearch Domain
- Refer to this document: https://docs.aws.amazon.com/elasticsearch-service/latest/developerguide/es-gsg-create-domain.html
- For this blog, I am selecting the Development and testing deployment type
- Enter a domain name
- Set the rest of the options as required by your organization and click Create
Step3: Create Lambda function
- There are multiple ways to create a Lambda function. For this blog, I am using an Amazon Linux EC2 instance with the AWS CLI configured.
- Log in to the EC2 instance using PuTTY
- Install pip, zip, and virtualenv using these commands (run them as root or with sudo)
yum -y install python-pip zip
pip install virtualenv
- Run the next set of commands
# Prepare the log ingestor virtual environment
mkdir -p /var/s3-to-es && cd /var/s3-to-es
# Create the virtual environment so that bin/activate exists
virtualenv /var/s3-to-es
cd /var/s3-to-es && source bin/activate
pip install requests_aws4auth -t .
pip freeze > requirements.txt
- Validate that the files needed are installed
- Create a file s3-to-es.py and paste the attached code into the file
- Change the permission of the file to 754
- Run the command to package
# Package the lambda runtime
zip -r /var/s3-to-es.zip *
- Copy the package to your S3 bucket (the bucket name below is the one used for this blog)
aws s3 cp /var/s3-to-es.zip s3://rsree-tcloud-eu-logs/log-ingester/
- Validate the upload in the S3 bucket
- Create the Lambda function
- In the function code, select ‘Upload a file from Amazon S3’ as shown below and click Save
- Add a trigger by selecting the S3 bucket
- Validate that the trigger is added to the S3 bucket
- Now execute a Talend job so that its log is routed to S3. The Lambda Monitoring tab shows that the log is being pulled in. You can also view the logs in CloudWatch
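The code pasted into s3-to-es.py is not reproduced in this blog. As a stand-in, here is a minimal sketch of what such a function might look like, not the exact code used here. It indexes each non-empty log line as one document. The index name, the `ES_HOST` environment variable, the document fields, and the helper names are all assumptions, and the `_doc` endpoint may need adjusting for your Elasticsearch version. The S3 read and the HTTP post are passed in as functions so the core logic can be tried without AWS access.

```python
import os
from urllib.parse import unquote_plus

INDEX = "talend-cloud-logs"  # hypothetical index name


def to_documents(body, bucket, key):
    """Yield one document per non-empty log line in the S3 object body."""
    text = body.decode("utf-8", errors="replace")
    for number, line in enumerate(text.splitlines(), start=1):
        if line.strip():
            yield {"bucket": bucket, "key": key, "line": number, "message": line}


def index_object(bucket, key, fetch, post):
    """Index one S3 object. `fetch` and `post` are injected for offline testing."""
    count = 0
    for document in to_documents(fetch(bucket, key), bucket, key):
        post(document)
        count += 1
    return count


def handler(event, context):
    # Imported here so the helpers above can run without the AWS libraries.
    import boto3
    import requests
    from requests_aws4auth import AWS4Auth

    region = os.environ["AWS_REGION"]  # set by the Lambda runtime
    host = os.environ["ES_HOST"]  # assumption: domain endpoint, including https://
    credentials = boto3.Session().get_credentials()
    auth = AWS4Auth(credentials.access_key, credentials.secret_key,
                    region, "es", session_token=credentials.token)
    s3 = boto3.client("s3")
    url = f"{host}/{INDEX}/_doc"

    def fetch(bucket, key):
        return s3.get_object(Bucket=bucket, Key=key)["Body"].read()

    def post(document):
        response = requests.post(url, auth=auth, json=document)
        response.raise_for_status()

    total = 0
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])
        total += index_object(bucket, key, fetch, post)
    return {"indexed": total}
```

Posting one document per request keeps the sketch short. For larger log files, batching lines through the Elasticsearch bulk API would reduce the number of round trips.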
Step4: Create Visualization in Kibana
- Navigate to the Elasticsearch domain and notice that a new index is created
- You can also search for this index in the Kibana dashboard
- Click Discover to view the sample data
- You can now create visualizations and see them in the dashboard
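Besides browsing in Discover, you can check the indexed logs with a search request. The sketch below builds a simple match query in Python. The field name `message` and the search term are assumptions, so use whichever field your Lambda code actually writes.

```python
import json


def build_message_search(term, size=10):
    """Build an Elasticsearch query body that matches `term` in the message field."""
    return {
        "size": size,
        "query": {"match": {"message": term}},  # "message" is an assumed field name
    }


print(json.dumps(build_message_search("ERROR"), indent=2))
```

The printed JSON can be sent to the `_search` endpoint of the index, for example from the Dev Tools console in Kibana.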
In this blog, we saw how to leverage the power of ELK with Talend Cloud. Once you have ELK configured, you can use it to diagnose and resolve bugs and production issues, or to track metrics about the health and usage of jobs and resources. Well, that’s all for now. Keep watching this space for more blogs and until then, happy reading!