Big Data (BD) Hadoop Components

Talend provides a development environment for interacting with Big Data sources and targets without having to learn and write complicated code. Running 100% natively on Apache Hadoop, Talend Big Data provides massive scalability.

Talend Big Data Hadoop Components is an introduction to the components, shipped with several Talend products, that are used to interact with Big Data systems.

Duration: 1 day (approximately 8 hours)
Target Audience: Anyone who wants to create Jobs using Talend Big Data components
Prerequisites: None required; DI Basics or an equivalent level of knowledge is recommended
Course Objectives
After completing this class, you will be able to:
  • Connect to a Hadoop cluster from a Talend Job
  • Store a raw Web log file to HDFS
  • Write text data files to HDFS
  • Read text files from HDFS
  • Read data from a SQL database and write it to HDFS
  • List a folder's contents and operate on each file separately
  • Move, copy, append, delete, and rename HDFS files
  • Read selected file attributes from HDFS files
  • Conditionally operate on HDFS files
  • Connect to a Hive database from a Talend Job
  • Use a Talend Job to load data from HDFS into a Hive table
  • Use a Talend Job to read data from a Hive table and use it in a Job
  • Execute Hive commands iteratively in a Talend Job, based on variable inputs
  • Develop and run Pig Jobs using Talend components
  • Sort, join, and aggregate data using Pig components
  • Filter data in multiple ways using Pig components
  • Replicate Pig data streams
  • Run Talend Jobs with the Apache Oozie Job Manager
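Several of the objectives above follow one pattern: list a folder's contents, read each file's attributes, and conditionally move, copy, or rename it. In the course this is done with Talend's graphical HDFS components; as a rough illustration of the pattern only, here is a minimal local-filesystem sketch in Python (the function name, folder layout, and `.done` suffix are illustrative assumptions, not part of the course):

```python
import os
import shutil

def process_folder(folder, archive):
    """Illustrative sketch: list a folder's contents and operate on
    each file separately. Non-empty files are copied to an archive
    folder and then renamed with a ".done" suffix, mirroring the
    list / read-attributes / conditionally copy-and-rename steps."""
    os.makedirs(archive, exist_ok=True)
    results = []
    # sorted(...) materializes the listing before we modify the folder
    for entry in sorted(os.scandir(folder), key=lambda e: e.name):
        if not entry.is_file():
            continue
        size = entry.stat().st_size      # read a file attribute
        if size > 0:                     # conditionally operate on the file
            shutil.copy(entry.path, archive)
            os.rename(entry.path, entry.path + ".done")
        results.append((entry.name, size))
    return results
```

In the course itself, the equivalent steps are built graphically from Talend components rather than written by hand.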
Course Agenda
  • Read and Write HDFS Files
  • Work with HDFS Files
  • Work with Hive
  • Work with Pig
  • Run Talend Jobs with Oozie