Recommendation Engine POC

Talend Big Data and Machine Learning Cookbook 



Now that you have downloaded Talend’s Big Data and Machine Learning Sandbox virtual environment, it’s time to explore some practical ways it can be used in business. In this example, we will demonstrate how to recommend the most relevant movies to users of the fictitious Talend Movie Database website. Using Talend’s machine learning capabilities, we can recommend movies based on each visitor’s ratings. First, you will train the recommendation model on a large set of ratings data collected from the MovieLens website (at least 100,000 ratings are used to build the model). This model lets us recommend movies that a visitor is more likely to enjoy, because previous users with similar tastes, as judged by their own ratings, enjoyed them.
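
To make the shape of the training data concrete, here is a minimal sketch of reading ratings into memory. It assumes the tab-separated layout used by the MovieLens 100K `u.data` file (user id, movie id, rating, timestamp); the seed files shipped with the sandbox may be laid out differently. The names `Rating`, `parse_ratings`, `ratings_by_user`, and the sample lines are invented for illustration.

```python
from collections import namedtuple

# Hypothetical record type for one rating event.
Rating = namedtuple("Rating", ["user_id", "movie_id", "rating", "timestamp"])

# Made-up sample lines in the assumed "user<TAB>movie<TAB>rating<TAB>timestamp" layout.
SAMPLE_LINES = [
    "1\t10\t5\t881250949",
    "1\t20\t3\t881250950",
    "2\t10\t4\t881250951",
]


def parse_ratings(lines):
    """Parse tab-separated rating lines, skipping blank lines."""
    ratings = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        user_id, movie_id, rating, timestamp = line.split("\t")
        ratings.append(
            Rating(int(user_id), int(movie_id), float(rating), int(timestamp))
        )
    return ratings


def ratings_by_user(ratings):
    """Group ratings as {user_id: {movie_id: rating}}."""
    grouped = {}
    for r in ratings:
        grouped.setdefault(r.user_id, {})[r.movie_id] = r.rating
    return grouped
```

Grouping by user gives each visitor a sparse row of ratings, which is the input a collaborative filtering model works from.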


Sandbox Recommendation Engine Schema



Machine Learning

Use Talend machine learning capabilities to provide a Recommendation Engine to your website.


Spark Streaming / Real-time

Use Spark Streaming technology to deliver real-time recommendations to your users.



REST Service to Live Dashboard

Use a RESTful web service to track user movements in a web-based dashboard.



Access the Recommendation Engine use case portal from the Sandbox loading page for quick-run directions and an interactive web interface.

Sandbox Recommendation Engine Web Page Access

Open Talend Studio within the sandbox environment.  For this example, we will be working in the RecommendationEngine folder found in the repository view.  We will explore jobs in the Standard, Big Data Batch, and Big Data Streaming Job Designs.  When ready to begin, follow the steps below:

  1. Navigate to the RecommendationEngine folder under Standard jobs.  Run the job Step_01_EnvironmentSetup in the A_Setup folder.  This job initializes the demo environment based on the Big Data Platform you have chosen.  Specifically, it loads seed data into HDFS and initializes the tables in a NoSQL database.
  2. Navigate to the RecommendationEngine folder under Standard jobs.  For quick execution, run the job Step_02_TrainModel found in the B_Model folder.  This step trains a model on previous ratings data using a tALSModel component.  This single job runs four individual Standard and Big Data Batch jobs.
  3. Optional:  For a deeper understanding of the Recommendation Engine process, and in particular the training of the machine learning model, you can run each of the four jobs that make up Step_02_TrainModel individually.  To do so, follow these steps:
    • Job 1 – Navigate to the RecommendationEngine folder under Standard jobs and go to B_Model > Sub_Steps.  Run job DeleteModel.  This job simply removes any machine learning model that already exists in the model directory.
    • Job 2 – Navigate to the RecommendationEngine folder under Big Data Batch jobs and go to Sub_Steps.  Run job PrepareMovieData.  This job stages the movie data and populates Cassandra NoSQL tables for fast retrieval during the real-time recommendation execution.
    • Job 3 – Navigate to the RecommendationEngine folder under Big Data Batch jobs and go to Sub_Steps.  Run job Train.  This job uses the prepared movie data to train an Alternating Least Squares (ALS) model, which the recommendation engine uses to produce individualized movie recommendations.
    • Job 4 – Navigate to the RecommendationEngine folder under Standard jobs and go to B_Model > Sub_Steps.  Run job StageModel.  Once the model has been created and trained, this job copies it onto HDFS so it can be accessed by the Recommendation Engine.
  4. Navigate to the RecommendationEngine folder under Standard jobs.  Run the 3 jobs listed in the C_Services folder to enable the API Services that are required for this demo’s web interface:


    Recommendation Engine Movies Service


    Recommendation Engine Ratings Service


    Recommendation Engine Recommendations Service

  5. Navigate to the RecommendationEngine folder under the Big Data Streaming jobs.  Run job Step_04a_RecommendationStream.  This job reads input from the Kafka queue and, based on that input, sends real-time movie recommendations from a Cassandra NoSQL database to be displayed to the user via the web interface.
  6. Now navigate to the web interface.  By default, you are logged in as the user Charlie Chaplin.  This user has already rated some movies.  Choose a Genre and then choose a Movie within that Genre.  After a few seconds of processing, you should get recommendations based on the movies he has already rated.
    • If you rate more movies for this user, you will have to retrain the model.  To do so, first stop the Recommendation Stream job that is currently running, and then follow the steps below to make a new model available to the recommendation engine:
      • Navigate to the RecommendationEngine folder under the Big Data Batch jobs.  Run job Step_06_RetrainModel.  This job factors the new ratings into the machine learning model so that it reflects the most up-to-date information for the user.
      • Navigate to the RecommendationEngine folder under the Standard jobs.  Run job StageModel in the B_Model > Sub_Steps folder.  As noted before, this will place the newly trained model on HDFS, making it accessible to the Recommendation Engine.
      • Restart the Recommendation Stream job.  With the Recommendation Stream job running, navigate back to the web interface and again choose a new Genre and Movie.  If enough movies were newly rated, you should see new recommendations appear.
  7. For additional practice, you can also start with a new user from the top right of this web page.  This user will be brand new and have no existing ratings.  Following the same process as before, rate some movies, retrain the model, and restart the Recommendation Stream job.  You should then get recommendations based on the ratings of the new user.
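
The Train job fits an Alternating Least Squares model, but this walkthrough does not show what that fitting involves. The following is a minimal plain-Python sketch of the general ALS idea, not the tALSModel component's implementation: each user and each movie gets a short factor vector, and the algorithm alternates between solving a small regularized least-squares problem for every user (with movie factors held fixed) and for every movie (with user factors held fixed). All function names, parameters (`rank`, `reg`, `iterations`), and the sample ratings are assumptions made for illustration.

```python
import random

# Made-up ratings as (user, movie, rating); user 1 has not rated "B", user 3 has not rated "D".
SAMPLE_RATINGS = [
    (1, "A", 5), (1, "C", 1), (1, "D", 1),
    (2, "A", 5), (2, "B", 5), (2, "C", 1), (2, "D", 1),
    (3, "A", 1), (3, "B", 1), (3, "C", 5),
    (4, "A", 1), (4, "B", 1), (4, "C", 5), (4, "D", 5),
]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [a[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def update_factors(ratings_by_row, fixed, rank, reg):
    """For each row id, solve (sum v v^T + reg * I) u = sum r * v over its ratings."""
    updated = {}
    for row_id, rated in ratings_by_row.items():
        a = [[reg if i == j else 0.0 for j in range(rank)] for i in range(rank)]
        b = [0.0] * rank
        for col_id, r in rated:
            v = fixed[col_id]
            for i in range(rank):
                b[i] += r * v[i]
                for j in range(rank):
                    a[i][j] += v[i] * v[j]
        updated[row_id] = solve(a, b)
    return updated


def train_als(ratings, rank=2, reg=0.1, iterations=10, seed=0):
    """Alternate between updating user factors and movie factors."""
    rng = random.Random(seed)
    by_user, by_movie = {}, {}
    for user, movie, r in ratings:
        by_user.setdefault(user, []).append((movie, r))
        by_movie.setdefault(movie, []).append((user, r))
    movie_f = {m: [rng.random() for _ in range(rank)] for m in by_movie}
    user_f = {}
    for _ in range(iterations):
        user_f = update_factors(by_user, movie_f, rank, reg)
        movie_f = update_factors(by_movie, user_f, rank, reg)
    return user_f, movie_f


def predict(user_f, movie_f, user, movie):
    """Predicted rating is the dot product of the two factor vectors."""
    return sum(u * v for u, v in zip(user_f[user], movie_f[movie]))


def objective(ratings, user_f, movie_f, reg):
    """Regularized squared error that each ALS half-step minimizes."""
    sse = sum((r - predict(user_f, movie_f, u, m)) ** 2 for u, m, r in ratings)
    factors = list(user_f.values()) + list(movie_f.values())
    return sse + reg * sum(x * x for f in factors for x in f)
```

After `user_f, movie_f = train_als(SAMPLE_RATINGS)`, a call such as `predict(user_f, movie_f, 1, "B")` estimates how user 1 would rate a movie that user has not yet rated. Because the factors are fitted to a fixed set of ratings, newly entered ratings only influence the predictions after the model is retrained, which is why the steps above ask you to retrain and restage the model.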


This example highlights the use of a recommendation engine to provide real-time movie recommendations based on information gathered from individual users’ previous ratings of movies.  The more information that is gathered from individual users, the more valuable the recommendations will be to the user.  Behind the scenes, Talend used Spark Streaming and the Alternating Least Squares model to generate the recommendations, and a NoSQL database such as Cassandra, with its fast-read capabilities, to feed the recommendations to the web front end in a matter of seconds.
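
The serving side of this flow can be sketched as a precompute-and-lookup pattern: recommendations are stored per user ahead of time, and each incoming event triggers only a fast read. The sketch below is an illustration under stated assumptions, not the Step_04a_RecommendationStream job itself. A plain list of JSON strings stands in for the Kafka queue, a dictionary stands in for the Cassandra table, and the message field `user_id`, the function names, and the scores are all hypothetical.

```python
import heapq
import json


def top_n(scores, n):
    """Return the n highest-scoring movie ids, best first (ties broken by movie id)."""
    best = heapq.nsmallest(n, scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [movie for movie, _ in best]


def build_store(predicted, n=2):
    """Precompute a top-n list per user; the dict stands in for a NoSQL table."""
    return {user: top_n(scores, n) for user, scores in predicted.items()}


def recommendation_stream(messages, store):
    """Consume JSON events (a stand-in for a Kafka queue) and yield responses."""
    for raw in messages:
        event = json.loads(raw)
        user = event["user_id"]
        # Unknown users get an empty list rather than an error.
        yield {"user_id": user, "recommendations": store.get(user, [])}


# Hypothetical predicted scores per user, as a trained model might produce.
PREDICTED = {
    1: {"A": 4.8, "B": 3.1, "C": 4.9},
    2: {"A": 2.0, "B": 2.0},
}

if __name__ == "__main__":
    store = build_store(PREDICTED, n=2)
    for response in recommendation_stream(['{"user_id": 1}'], store):
        print(response)
```

Keeping the expensive scoring out of the request path is what allows the web front end to receive recommendations within seconds of a user's selection.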