Talend Big Data Advanced – Machine Learning

Talend provides a development environment that lets you interact with many source and target Big Data stores without having to learn and write complicated code.

This course covers the implementation of machine learning algorithms in Big Data batch Jobs using the Spark framework.

Duration: 1 day (7 hours)
Target audience: Anyone who wants to use Talend Studio to industrialize machine learning algorithms
Prerequisites: Completion of Talend Data Quality Essentials or Talend Big Data Basics

Course objectives

After completing this course, you will be able to:

• Connect to a Hadoop cluster from a Talend Job
• Use context variables and metadata
• Read and write files in HDFS in a Big Data batch Job
• Configure a Big Data batch Job to use the Spark framework
• Create and test recommendation models
• Create and test classification models
• Use a machine learning algorithm to deduplicate data

Course agenda

SMS classification use case

• Monitoring the Hadoop cluster
• Exploring an SMS classification use case: decision trees
• Connecting to your cluster
• Creating an SMS classification model
• Testing the SMS classification model

Movie recommendation use case

• Exploring a movie recommendation use case: alternating least squares
• Building a movie recommendation model
• Testing the movie recommendation model

Iris classification use case

• Exploring an Iris flower classification use case: Naïve Bayes classifier
• Building an iris classification model
• Testing the iris classification model

Child-care deduplication use case

• Exploring a child-care use case and dataset: matching
• Setting up the environment
• Pairing data
• Building a matching model
• Using the matching model
• Merging groups of duplicates