Talend Big Data Advanced – Spark Batch

Talend provides a development environment that lets you interact with many source and target Big Data stores without having to learn and write complicated code.

This course covers Big Data batch Jobs that use the Spark framework.

Duration: 1 day (7 hours)
Target audience: Anyone who wants to use Talend Studio to interact with Big Data systems
Prerequisites: Completion of Talend Big Data Basics

Course objectives

After completing this course, you will be able to:

  • Create a Big Data batch Job using the Spark framework
  • Copy data from a local file to HDFS
  • Copy data from MySQL to HDFS
  • Create a Hive table and copy data from HDFS to it
  • Import tweets to HDFS
  • Join, sort, and aggregate data
  • Use caches for faster processing
  • Query data from a Hive table using HiveQL
  • Query data from Spark datasets using Spark SQL

Course agenda

Spark in context

  • Concepts

Introduction to Spark

  • Monitoring the Hadoop cluster
  • Setting up the development environment
  • Understanding the basics of Spark
  • Analyzing customer data

Sentiment analysis use case

  • Monitoring the Hadoop cluster
  • Setting up the development environment
  • Loading tweets into HDFS
  • Processing tweets with Spark
  • Scheduling Job execution

Download analysis use case

  • Setting up the development environment
  • Loading customers to Hive
  • Download analysis
  • Using Spark SQL to query data