Talend Big Data and Machine Learning Sandbox Cookbook

Before we dive into the practical ways the Talend Big Data and Machine Learning Sandbox can benefit your business, we want to help you install it properly.

What is the Sandbox?

Talend’s Big Data and Machine Learning Sandbox is a virtual environment that utilizes Docker containers to combine the Talend Real-time Big Data Platform with some sample scenarios that are pre-built and ready-to-run. 

In the links below, you will find POCs built on real-world use cases that demonstrate how Talend, Spark, NoSQL, and real-time messaging can be integrated into your daily business. Whether it's batch, streaming, or real-time data integration, you will begin to understand how Talend can be used to address your big data challenges and move your business into the data-driven age.

Visit the POCs:

You can now take full advantage of your Sandbox with these pre-built and ready-to-run sample scenarios:

  • Recommendation Engine
  • IoT Predictive Maintenance
  • Real-time Risk Assessment
  • Data Warehouse Optimization

What are the system requirements for the Sandbox?

The Sandbox is packaged as a virtual machine (VM) and requires a virtual machine player.

Supported VM Players are:

  • VMware
  • VMware Fusion (for Mac users)
  • VirtualBox

For the host machine, we recommend:

  • 8-10GB of available RAM
  • 50GB of available disk space
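
If you would like to confirm the disk space recommendation before downloading, a minimal Python sketch along these lines could help. It only checks free disk space (available RAM is not checked, because the Python standard library has no portable call for it), and the function names are illustrative assumptions rather than anything shipped with the Sandbox.

```python
import shutil

REQUIRED_DISK_GB = 50  # free disk space recommended for the host machine


def free_disk_gb(path="."):
    """Return the free space, in GB (10^9 bytes), on the volume holding `path`."""
    return shutil.disk_usage(path).free / 1e9


def has_enough_disk(path=".", required_gb=REQUIRED_DISK_GB):
    """Return True if the volume holding `path` has at least `required_gb` free."""
    return free_disk_gb(path) >= required_gb


if __name__ == "__main__":
    free = free_disk_gb(".")
    verdict = "OK" if has_enough_disk(".") else "below the recommended 50GB"
    print(f"Free disk space: {free:.1f} GB ({verdict})")
```

Pass the folder where you plan to store the virtual machine as `path` so that the check runs against the correct drive.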

Once the player of your choice is downloaded and installed following VM player installation instructions, you will be able to download and install the Talend Big Data and Machine Learning Sandbox. Then you can walk through the sandbox demos (links above) that integrate Apache Kafka, Spark, Spark Streaming, Hadoop, and NoSQL.

How do I set up and configure the sandbox?

It is important to note that a steady and reliable internet connection is required to complete the installation and configuration of the Talend Big Data and Machine Learning Sandbox.  Once you have completed the online registration and chosen the desired sandbox download file, you will receive a small Download Manager Application (.dlm).  Open this application to manage the rest of the download of the sandbox.  The Talend Big Data and Machine Learning Sandbox is a 6GB Open Virtualization Format Archive (.ova) file and could take some time to download depending on internet connection speeds.  For this reason, the Download Manager Application can be used to pause and restart the download process.
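
Because the download is large and can be paused and resumed, you may want a quick sanity check on the finished file before importing it. An .ova file is a tar archive that bundles an .ovf descriptor with the disk images, so the sketch below simply confirms that the file opens as a tar archive and contains a descriptor. The function name `has_ovf_descriptor` is an illustrative assumption, and this is not a full integrity check: a file that was cut off part-way through a disk image could still pass.

```python
import tarfile


def has_ovf_descriptor(ova_path):
    """Return True if `ova_path` opens as a tar archive containing an .ovf file."""
    try:
        with tarfile.open(ova_path) as archive:
            names = archive.getnames()
    except (tarfile.TarError, OSError):
        # Not a readable tar archive (or the file is missing)
        return False
    return any(name.lower().endswith(".ovf") for name in names)
```

If the check returns False, restarting the download is likely the simplest fix.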

Once you have completed the download and saved the .ova file to your local hard drive (e.g. C:/TalendSandbox), follow the instructions for importing to VirtualBox or importing to VMware, based on the virtual machine player and matching Sandbox file that you are using.

VirtualBox

  1. Open the VirtualBox application.
  2. From the menu bar, select File > Import Appliance…
  3. Navigate to the .ova file that you downloaded. Select it and click Next.
  4. Accept the default Appliance Settings by clicking

Talend Machine Learning Sandbox Import Virtualbox
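
If you prefer the command line to the VirtualBox menus, the same import can be done with the `VBoxManage import` command that ships with VirtualBox. The Python sketch below builds that command and, by default, adds `--dry-run` so that VirtualBox only reports what it would import. The helper names and the example file name are assumptions for illustration; substitute the path of the file you actually downloaded.

```python
import shutil
import subprocess


def build_import_command(ova_path, dry_run=True):
    """Build the VBoxManage command line that imports an .ova appliance."""
    command = ["VBoxManage", "import", ova_path]
    if dry_run:
        # Show what would be imported without creating the VM
        command.append("--dry-run")
    return command


def import_appliance(ova_path, dry_run=True):
    """Run the import; requires VBoxManage to be on the PATH."""
    if shutil.which("VBoxManage") is None:
        raise RuntimeError("VBoxManage was not found on the PATH")
    return subprocess.run(build_import_command(ova_path, dry_run), check=True)


if __name__ == "__main__":
    # Hypothetical file name; use the .ova file you downloaded
    print(" ".join(build_import_command("C:/TalendSandbox/sandbox.ova")))
```

Call `import_appliance(path, dry_run=False)` once the dry run lists the appliance settings you expect.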

 

VMware

  1. Open the VMware Player application.
  2. Click on “Open a Virtual Machine”.
  3. Navigate to the .ova file that you downloaded. Select it and click Open.
  4. Select the Storage path for the new Virtual Machine (e.g. C:/TalendSandbox/vmware) and then click Import.

Talend Machine Learning Sandbox VMWare Import

The Talend Big Data and Machine Learning Sandbox Virtual Machines come pre-configured to run with 8GB RAM and 2 CPUs.  You may need to adjust these settings based on your PC’s capabilities.  To run the MapR examples, it is recommended to boost the VM RAM setting to 10GB or more if available.
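
For VirtualBox users, the RAM and CPU settings can also be changed from the command line with `VBoxManage modifyvm`, which takes memory in megabytes. The sketch below builds that command under a few assumptions: the VM name `TalendSandbox` is hypothetical (check the name shown in your VirtualBox Manager), the helper name is illustrative, and the VM must be powered off when the command runs. VMware users can make the same change in the virtual machine settings dialog.

```python
def build_resize_command(vm_name, ram_gb=8, cpus=2):
    """Build the VBoxManage command that sets a VM's RAM and CPU count."""
    if ram_gb <= 0 or cpus <= 0:
        raise ValueError("ram_gb and cpus must be positive")
    return [
        "VBoxManage", "modifyvm", vm_name,
        "--memory", str(ram_gb * 1024),  # VBoxManage expects megabytes
        "--cpus", str(cpus),
    ]


if __name__ == "__main__":
    # Hypothetical VM name; 10GB follows the recommendation for the MapR examples
    print(" ".join(build_resize_command("TalendSandbox", ram_gb=10)))
```

The resulting list can be passed to `subprocess.run(..., check=True)` on a machine where VirtualBox is installed.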

What should I expect when I start the VM for the first time?

When you start the Talend Big Data and Machine Learning Sandbox for the first time, the virtual machine will start by loading a web landing page that tracks the Sandbox setup. This process can take 15-30 minutes depending on internet connection speeds and network traffic. After a short time, you will be asked to choose a Hadoop Platform. You can choose Cloudera, Hortonworks, or MapR. You may also choose to explore the Sandbox environment without selecting a Hadoop Platform. If you later decide to select a Platform, or to change to a different Platform, you can access the available Platforms at any time by clicking “Choose a Hadoop Platform” in the top right of the landing page.

Sandbox Big Data Platform Selection

Be patient during the loading process and let the Sandbox complete its build. Do not open Talend Studio while the build is in progress. Once you receive the indication that the Sandbox is ready for use, you can begin working in the virtual environment.

Sandbox is ready

When the sandbox is officially ready, you can access additional resources and demo content by scrolling down on the landing page.  Here you will have access to demo-specific web applications that provide quick-start instructions on how to execute the demos within the sandbox.  You will also have access to the Hadoop Cluster Resource Manager WebUI by accessing “Hadoop Cluster”, as well as the HDFS WebUI by accessing “HDFS Browser”.

With the MapR Distribution, the HDFS Browser directs you to the MapR Control System (MCS), where you can look at your volumes, data tables, and streams. To access this in Firefox, you will need to add a security exception for its certificate.

How do I launch Talend Real-time Big Data Studio?

Now that your Sandbox is up and running, you can launch Talend Studio. To do so, click the Talend icon on the left bar of your desktop. Follow these steps the first time you run it:

  1. First you need to configure a connection. Click on Manage Connection and enter your email address and then click OK.
  2. Then you need to select the project you want to open. Depending on the Big Data Platform you have chosen, you will get one or more of the following choices:
    • CLOUDERA_DEMOS
    • HORTONWORKS_DEMOS
    • MAPR_DEMOS
    • LOCAL_DEMOS (if you declined to load a Big Data Platform)

Machine Learning Sandbox Select Project

Note: If you have downloaded multiple Big Data Platforms, you will have several projects. Choose the project that matches the Big Data Platform you have selected.

  3. Once Talend Studio opens, you will be presented with a Welcome screen. Close the Welcome screen, and you will be presented with a pop-up to install additional packages. Keep "Required third-party libraries" selected, also select "Optional third-party libraries", and click Finish.

  4. Accept all third-party licenses that need acceptance. Click the "I accept the terms of the selected license agreement" radio button and click Accept All.
    Sandbox Talend Studio Accept License
  5. Let the downloads complete before continuing (be patient, as the downloads can take a while).

Last Updated: October 11th, 2018