Talend Big Data Advanced – Spark Streaming

Talend provides a development environment that lets you interact with many Big Data source and target systems without having to learn and write complicated code.

This course covers Big Data streaming Jobs that use the Spark Streaming framework.

Duration: 1 day (7 hours)
Target audience: Anyone who wants to use Talend Studio to interact with Big Data systems
Prerequisites: Completion of Talend Big Data Basics
Course objectives

After completing this course, you will be able to:

  • Connect to a Hadoop cluster from a Talend Job
  • Use context variables and metadata
  • Read and write data in HDFS or HBase in a Big Data batch or Big Data streaming Job
  • Read and write messages in a Kafka topic in real time
  • Configure a Big Data batch Job to use the Spark framework
  • Configure a Big Data streaming Job to use the Spark Streaming framework
  • Save logs to Elasticsearch
  • Configure a Kibana dashboard
  • Ingest a stream of data into a NoSQL database (HBase)
Course agenda

Spark in context

  • Concepts

Reading and writing messages with Kafka

  • Understanding Kafka basics
  • Creating a new topic in Kafka
  • Publishing messages to a specific topic using a standard Job
  • Consuming messages from a specific topic using a standard Job
  • Publishing messages to Kafka topics in real time using a Big Data Spark Streaming Job
  • Consuming messages from Kafka topics in real time using a Big Data Spark Streaming Job
  • Enriching data using a MySQL table and a lookup in a Big Data Spark Streaming Job
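
In the course, these publish and consume flows are built graphically with Talend's Kafka components; purely as a conceptual sketch of what such a flow does under the hood, the plain-Java round trip below publishes one message and reads it back. The broker address localhost:9092, the topic name customer_events, and the group id course-demo are assumptions for illustration.

    import java.time.Duration;
    import java.util.List;
    import java.util.Properties;

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringDeserializer;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class KafkaRoundTrip {
        public static void main(String[] args) {
            String broker = "localhost:9092";   // assumed broker address
            String topic = "customer_events";   // hypothetical topic name

            // Publish a single message to the topic.
            Properties prodProps = new Properties();
            prodProps.put("bootstrap.servers", broker);
            prodProps.put("key.serializer", StringSerializer.class.getName());
            prodProps.put("value.serializer", StringSerializer.class.getName());
            try (KafkaProducer<String, String> producer = new KafkaProducer<>(prodProps)) {
                producer.send(new ProducerRecord<>(topic, "id-1", "hello from the producer"));
            }

            // Consume it back from the beginning of the topic.
            Properties consProps = new Properties();
            consProps.put("bootstrap.servers", broker);
            consProps.put("group.id", "course-demo");
            consProps.put("auto.offset.reset", "earliest");
            consProps.put("key.deserializer", StringDeserializer.class.getName());
            consProps.put("value.deserializer", StringDeserializer.class.getName());
            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consProps)) {
                consumer.subscribe(List.of(topic));
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("key=%s value=%s%n", record.key(), record.value());
                }
            }
        }
    }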

Introduction to Spark

  • Understanding Spark basics
  • Analyzing customer data
  • Producing and consuming messages in real time
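
As a rough illustration of the Spark basics covered in this section (outside Talend Studio, which generates this kind of code for you), the sketch below loads a hypothetical customers.csv extract and aggregates it with the Spark SQL API; the file name and the state column are assumptions.

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class CustomerAnalysis {
        public static void main(String[] args) {
            // Local Spark session for the sketch; a Big Data batch Job targets the cluster instead.
            SparkSession spark = SparkSession.builder()
                    .appName("CustomerAnalysis")
                    .master("local[*]")
                    .getOrCreate();

            // Load a hypothetical customer extract and count customers per state.
            Dataset<Row> customers = spark.read()
                    .option("header", "true")
                    .csv("customers.csv");   // hypothetical input file
            customers.groupBy("state").count().show();

            spark.stop();
        }
    }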

Log processing use case – monitoring

  • Introduction to the log processing use case
  • Monitoring enriched logs
  • Saving logs to Elasticsearch
  • Using and modifying a Kibana dashboard to visualize data
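
The Job in this section writes enriched logs to Elasticsearch so that Kibana can chart them. As a minimal sketch of that idea, the snippet below indexes one log document over Elasticsearch's REST API; the local cluster URL, the logs index name, and the document fields are assumptions for illustration.

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class LogIndexer {
        public static void main(String[] args) throws Exception {
            // One enriched log event as a JSON document.
            String json = "{\"status\":200,\"url\":\"/index.html\",\"user\":\"demo\"}";

            // POST it to the _doc endpoint of a (hypothetical) logs index.
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("http://localhost:9200/logs/_doc"))
                    .header("Content-Type", "application/json")
                    .POST(HttpRequest.BodyPublishers.ofString(json))
                    .build();

            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode() + " " + response.body());
        }
    }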

Log processing use case – reporting

  • Generating reports based on data windows
  • Consuming messages from a Kafka topic
  • Using the tWindow component to schedule processing
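
The window-and-slide idea behind these reports maps onto Spark Streaming's DStream API. As a sketch only, the code below counts events over a 60-second window recomputed every 30 seconds; the socket source, host, port, and durations are placeholders (the course Job reads from a Kafka topic instead).

    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaDStream;
    import org.apache.spark.streaming.api.java.JavaReceiverInputDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;

    public class WindowedReport {
        public static void main(String[] args) throws InterruptedException {
            SparkConf conf = new SparkConf().setAppName("WindowedReport").setMaster("local[2]");
            // Micro-batches every 10 seconds.
            JavaStreamingContext ssc = new JavaStreamingContext(conf, Durations.seconds(10));

            // Toy text source for the sketch.
            JavaReceiverInputDStream<String> lines = ssc.socketTextStream("localhost", 9999);

            // Report over a 60-second window, recomputed every 30 seconds --
            // the same window/slide idea the tWindow component exposes.
            JavaDStream<Long> counts = lines
                    .window(Durations.seconds(60), Durations.seconds(30))
                    .count();
            counts.print();

            ssc.start();
            ssc.awaitTermination();
        }
    }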

Log processing use case – batch analysis

  • Ingesting streams of data
  • Analyzing logs with a batch Job
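
For context on the HBase side of this use case, the sketch below writes one log record and scans it back with the plain HBase client API; the ZooKeeper quorum, table name, column family, and sample values are assumptions, and in the course the equivalent work is done by the Job's HBase components.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HBaseLogStore {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            conf.set("hbase.zookeeper.quorum", "localhost"); // assumed quorum
            try (Connection connection = ConnectionFactory.createConnection(conf);
                 Table table = connection.getTable(TableName.valueOf("logs"))) { // hypothetical table

                // Ingest one log event (the streaming Job does this per record).
                Put put = new Put(Bytes.toBytes("row-1"));
                put.addColumn(Bytes.toBytes("data"), Bytes.toBytes("message"),
                              Bytes.toBytes("GET /index.html 200"));
                table.put(put);

                // Batch analysis: scan the table back and print each stored message.
                try (ResultScanner scanner = table.getScanner(new Scan())) {
                    for (Result result : scanner) {
                        System.out.println(Bytes.toString(
                                result.getValue(Bytes.toBytes("data"), Bytes.toBytes("message"))));
                    }
                }
            }
        }
    }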