In this tutorial, you create Hadoop cluster metadata automatically by connecting to Cloudera Manager.
This tutorial uses Talend Data Fabric Studio version 6 and a Cloudera CDH 5.4 Hadoop cluster.
1. Create a new Hadoop cluster metadata definition
- Ensure that the Integration perspective is selected.
- In the Project Repository, expand Metadata, right-click Hadoop Cluster, and click Create Hadoop Cluster to open the wizard.
- In the Hadoop Cluster Connection wizard, type MyHadoopCluster in the Name field, Cluster connection metadata in the Purpose field, and Metadata to connect to a Cloudera CDH 5.4 cluster in the Description field. Then click Next.
The Hadoop Configuration Import wizard opens.
2. Select the automatic configuration method
- In the Distribution list of the Hadoop Configuration Import wizard, select Cloudera, and in the Version list, select 5.4(YARN mode).
There are three ways to create Hadoop cluster metadata:
– Automatic configuration by retrieving the configuration from Ambari or from Cloudera Manager
– Automatic configuration by importing the configuration from the Hadoop configuration files
– Manual configuration
- To select the automatic configuration method, in the Option panel, select Retrieve configuration from Ambari or Cloudera, and click Next.
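To make the second method in the list above more concrete, here is a minimal Python sketch of reading connection properties from a Hadoop configuration file. It is not how the Studio wizard is implemented. The sample XML is hypothetical, and the NameNode port 8020 is an assumed default; only the host name clusterCDH54 comes from this tutorial.

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpt of a core-site.xml file. A real file comes from the
# cluster's configuration directory; port 8020 is an assumed default.
SAMPLE_CORE_SITE = """<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://clusterCDH54:8020</value>
  </property>
</configuration>
"""


def read_hadoop_properties(xml_text):
    """Return a dict of property names to values from a Hadoop *-site.xml document."""
    root = ET.fromstring(xml_text)
    properties = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name is not None:
            properties[name] = prop.findtext("value")
    return properties


properties = read_hadoop_properties(SAMPLE_CORE_SITE)
print(properties["fs.defaultFS"])
```

The same function can be applied to other files of the same layout, such as hdfs-site.xml or yarn-site.xml, since they all use name and value pairs inside property elements.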
3. Connect to the Cloudera Manager
The Cloudera Manager is an end-to-end application for managing Cloudera CDH clusters. To retrieve the connection information and create the corresponding metadata, you will connect to the Cloudera Manager.
- To connect to the Cloudera Manager, enter its address and credentials. In the Manager URI (with port) box, type http://clusterCDH54:7180. In the Username and Password boxes, type admin in each, and then click Connect.
The cluster named Cluster 1 appears in the Discovered clusters list.
- To retrieve the discovered cluster configuration, click Fetch.
The wizard detects configuration files and lists the corresponding services. In this tutorial, we will keep the default configuration and create metadata definitions for YARN, HDFS, Hive and HBase. The definition for Spark is not available.
- To import the configuration to the Hadoop cluster metadata created, click Finish.
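Cloudera Manager also exposes a REST API, so the cluster discovery in this step can be approximated outside the Studio. The sketch below only builds the request for listing clusters and does not send it. The API version v10 and the helper name build_clusters_request are assumptions for illustration, and whether the wizard itself calls this endpoint is not stated in this tutorial.

```python
import base64
import urllib.request


def build_clusters_request(manager_uri, username, password, api_version="v10"):
    """Build an HTTP request that lists the clusters known to a Cloudera Manager.

    api_version is an assumption; check which versions your manager supports.
    """
    url = f"{manager_uri.rstrip('/')}/api/{api_version}/clusters"
    # HTTP Basic authentication: base64 of "username:password".
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url)
    request.add_header("Authorization", f"Basic {token}")
    return request


request = build_clusters_request("http://clusterCDH54:7180", "admin", "admin")
print(request.full_url)

# To query a reachable manager, send the request:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

Building the request separately from sending it makes it possible to inspect the URL and headers before any network access.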
4. Create metadata corresponding to the listed services except Spark
- In the authentication panel of the Hadoop Cluster Connection wizard, type student as the username, and click Check Services. The Checking Hadoop Services window opens. When the check succeeds, the Namenode and Resource Manager status reaches 100%.
- Close the Checking Hadoop Services window. To close the Hadoop Cluster Connection wizard and create the metadata, click Finish.
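If the service check does not complete, a basic reachability test from the Studio machine can help narrow down the cause. The following sketch only tests whether a TCP port accepts connections, which is far less than what the Check Services button verifies. The ports 8020 (NameNode) and 8032 (Resource Manager) are assumed defaults, and the function names are hypothetical.

```python
import socket

# Assumed default ports; confirm them against the cluster's configuration.
ASSUMED_SERVICE_PORTS = {"NameNode": 8020, "ResourceManager": 8032}


def is_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and host names that do not resolve.
        return False


def check_services(host, service_ports=ASSUMED_SERVICE_PORTS):
    """Return a dict mapping each service name to its reachability on the host."""
    return {name: is_port_open(host, port) for name, port in service_ports.items()}


# Example call against the cluster used in this tutorial:
# print(check_services("clusterCDH54"))
```

A False result for a service points to a network, firewall, or port configuration problem rather than to the metadata definition itself.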
5. Inspect the metadata created in the Repository
- In the Repository, expand Hadoop Cluster.
The metadata definitions are now available.
- Expand the main definition named MyHadoopCluster, which corresponds to the YARN service. Expand HBase, HDFS, and Hive.
The metadata definitions are now ready to be used in a Talend Job.