Talend Big Data Basics

Talend provides a development environment that enables users to interact with many Big Data sources and targets without having to understand or write complicated code.

Talend Big Data Basics introduces the Talend components that interact with Big Data systems and are shipped with several products.

Duration: 1 day (7 hours)
Target audience: Anyone who wants to use the Talend Studio to interact with Big Data systems
Prerequisites: Completion of Talend Data Integration Basics or Talend Data Integration Advanced
Course objectives
After completing this course, you will be able to:
  • Create cluster metadata manually, from configuration files, or automatically
  • Create HDFS and Hive metadata
  • Connect to your cluster to use HDFS, HBase, Hive, Pig, Sqoop, and MapReduce
  • Read and write data to/from HDFS (HDFS, HBase)
  • Read and write tables to/from HDFS (Hive, Sqoop)
  • Process tables stored on HDFS with Hive
  • Process data stored on HDFS with Pig
  • Process data stored on HDFS with Big Data batch Jobs
Course agenda
Basic concepts
  • Opening a project
  • Monitoring the Hadoop cluster
  • Creating cluster metadata
Reading and writing data in HDFS
  • Storing a file on HDFS
  • Storing multiple files on HDFS
  • Reading data from HDFS
  • Using HBase to store sparse data on HDFS
Working with tables
  • Importing tables with Sqoop
  • Creating tables with Hive
Processing data and tables in HDFS
  • Processing Hive tables with Jobs
  • Profiling Hive tables
  • Processing data with Pig
  • Processing data with batch Jobs