Talend Big Data Advanced

Talend provides a development environment that lets you interact with many source and target Big Data stores, without having to learn and write complicated code.

This course first covers Big Data batch Jobs that use the MapReduce or Apache Spark framework. It then covers Big Data streaming Jobs that use the Spark Streaming framework.

Duration: 2 days (14 hours)
Target audience: Anyone who wants to use the Talend Studio to interact with Big Data systems
Prerequisites: Completion of Talend Data Integration Basics and Talend Big Data Basics
Course objectives
After completing this course, you will be able to:
  • Connect to an Apache Hadoop cluster from a Talend Job
  • Use context variables and metadata
  • Read and write data in HDFS or HBase in a Big Data batch or Big Data streaming Job
  • Use the Twitter API with Talend components
  • Schedule Big Data Job execution from the Talend Administration Center (TAC)
  • Read and write messages in an Apache Kafka topic in real time
  • Configure a Big Data batch Job to use the Apache Spark or the MapReduce framework
  • Configure a Big Data streaming Job to use the Apache Spark Streaming framework
Course agenda

Big Data Advanced–YARN (1 day)
Clickstream use case

  • Setting up a development environment
  • Loading data into HDFS
  • Enriching logs
  • Computing statistics
  • Converting a standard Job to a Big Data batch Job
  • Understanding MapReduce jobs
  • Using the Studio to configure resource requests to YARN

Sentiment analysis use case

  • Loading dictionary and time zone data into HDFS
  • Loading tweets into HDFS
  • Processing tweets with MapReduce
  • Scheduling Job execution

Big Data Advanced–Spark (1 day)
Introduction to Apache Kafka

  • Understanding basics of Kafka
  • Publishing messages to a Kafka topic
  • Consuming messages

Introduction to Apache Spark

  • Understanding basics of Spark
  • Analyzing customer data
  • Producing and consuming messages in real time

Logs processing use case–generating enriched logs

  • Introduction to the logs processing use case
  • Generating raw logs
  • Generating enriched logs

Logs processing use case–monitoring

  • Monitoring enriched logs

Logs processing use case–reporting

  • Generating reports based on data windows

Logs processing use case–batch analysis

  • Ingesting streams of data
  • Analyzing logs with a batch Job

SMS classification use case (optional)

  • Understanding the basics of machine learning
  • Creating an SMS classification model
  • Testing the SMS classification model