Creating Cluster Connection Metadata from Configuration Files

In this tutorial, you create Hadoop cluster metadata by importing the connection settings from the Hadoop configuration files.

This tutorial uses Talend Data Fabric Studio version 6 and a Hadoop cluster running Cloudera CDH version 5.4.

1. Create a new Hadoop cluster metadata definition

  1. Ensure that the Integration perspective is selected.
  2. In the Project Repository, expand Metadata, right-click Hadoop Cluster, and click Create Hadoop Cluster to open the wizard.
  3. In the Name field of the Hadoop Cluster Connection wizard, type MyHadoopCluster_files. In the Purpose field, type Cluster connection metadata. In the Description field, type Metadata to connect to a Cloudera CDH 5.4 cluster. Then click Next.

The Hadoop Configuration Import wizard opens.

2. Import the configuration from Hadoop configuration files

  1. In the Distribution list of the Hadoop Configuration Import wizard, select Cloudera, and in the Version list, select 5.4(YARN mode), which matches the CDH 5.4 cluster used in this tutorial. There are three ways to create Hadoop cluster metadata:

    - Automatic configuration by retrieving the configuration from Cloudera Manager or from Ambari
    - Automatic configuration by importing the configuration from the Hadoop configuration files
    - Manual configuration
  2. To import the configuration from files, in the Option panel, select Import configuration from local files, and click Next.

3. Locate the configuration files folder and retrieve configuration

  1. Click Browse…, navigate to C:/StudentFiles/HadoopConf, and click OK.

    The Hadoop configuration files must be available on the local machine. You provide the location of the folder that contains them, and the wizard parses the files to retrieve the connection information.

    The Hadoop Configuration Import wizard detects the configuration files and lists the corresponding services. In this tutorial, keep the default configuration, which creates metadata definitions for YARN and HDFS.
  2. To import the configuration to the Hadoop cluster metadata created, click Finish.
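
To make the parsing step more concrete, the sketch below shows one way connection information could be read from a folder of Hadoop configuration files. It is not the wizard's implementation. It only relies on the standard Hadoop layout, in which each *-site.xml file holds property elements with a name and a value. The file-to-service mapping, the host names, and the helper names (read_properties, import_configuration, write_site_file) are assumptions made for this illustration.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

# Assumed mapping from standard Hadoop file names to services (illustration only;
# the wizard may detect services differently).
SERVICE_FILES = {
    "HDFS": ["core-site.xml", "hdfs-site.xml"],
    "YARN": ["yarn-site.xml"],
}


def read_properties(path):
    """Return the name/value pairs of one Hadoop *-site.xml file as a dict."""
    props = {}
    root = ET.parse(str(path)).getroot()
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def import_configuration(conf_dir):
    """Group the properties found in conf_dir by detected service."""
    conf_dir = Path(conf_dir)
    services = {}
    for service, file_names in SERVICE_FILES.items():
        merged = {}
        for file_name in file_names:
            candidate = conf_dir / file_name
            if candidate.is_file():
                merged.update(read_properties(candidate))
        if merged:  # a service is listed only if at least one of its files exists
            services[service] = merged
    return services


def write_site_file(path, properties):
    """Write a minimal Hadoop-style configuration file (used for the demo only)."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    ET.ElementTree(root).write(str(path))


# Demo with a temporary folder standing in for C:/StudentFiles/HadoopConf.
# The host names below are hypothetical placeholders.
with tempfile.TemporaryDirectory() as tmp:
    conf_dir = Path(tmp)
    write_site_file(conf_dir / "core-site.xml",
                    {"fs.defaultFS": "hdfs://namenode.example.com:8020"})
    write_site_file(conf_dir / "yarn-site.xml",
                    {"yarn.resourcemanager.address": "resourcemanager.example.com:8032"})
    services = import_configuration(conf_dir)

for service, props in services.items():
    print(service, props)
```

To try the sketch against real files, pass the path of your own configuration folder to import_configuration instead of the temporary folder.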

4. Create definitions corresponding to YARN and HDFS

  1. In the authentication panel of the Hadoop Cluster Connection wizard, type student in the User name field, and click Check Services. The Checking Hadoop Services window opens. Verify that the status of both the Namenode and the Resource Manager is 100%.
  2. Close the Checking Hadoop Services window. To close the Hadoop Cluster Connection window and create the metadata, click Finish.
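
If the service check fails, a quick way to rule out basic network problems is to test whether the service ports accept TCP connections. The sketch below does only that. It is a much weaker test than Check Services, which is not reproduced here, and the function names (is_reachable, check_services) are hypothetical. The demo uses a local listening socket as a stand-in for a real Namenode so that it runs without a cluster.

```python
import socket


def is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(endpoints, timeout=3.0):
    """Map each service name to the reachability of its (host, port) endpoint."""
    return {
        name: is_reachable(host, port, timeout)
        for name, (host, port) in endpoints.items()
    }


# Demo: a local listener stands in for a cluster service (assumption for
# illustration). Replace the endpoint with the host and port from your own
# configuration files to test a real cluster.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
listener.listen(1)
demo_port = listener.getsockname()[1]

status = check_services({"Namenode": ("127.0.0.1", demo_port)})
listener.close()
print(status)
```

A reachable port only shows that something is listening. It does not confirm that the service is healthy or that the user name is accepted.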

5. Inspect the metadata created in the Repository

  1. In the Repository, expand Hadoop Cluster. The metadata definition is now available.
  2. Expand the main definition named MyHadoopCluster_files, which corresponds to the YARN service. Then expand HDFS. The HDFS definition is saved in a subfolder of the main MyHadoopCluster_files metadata.

The Hadoop cluster metadata definition is now ready to be used in a Talend Job.