What is a Data Warehouse and Why Does It Matter To Your Business?

A data warehouse is a large collection of business data used to help an organization make decisions. It is the foundational component of business intelligence efforts. Learn how data warehouses work, how they differ from databases and data marts, why they are moving to the cloud, and more.

Cloud Integration 101

Cloud integration lets businesses connect data hosted on local servers to cloud-native data stores and applications. Cloud integration also provides a path to data analytics platforms, CRM systems, and other applications hosted by third-party providers. These include data warehouses such as Google BigQuery and Snowflake, CRM systems such as Salesforce, and cloud platforms such as AWS and Microsoft Azure.

Data Lake vs Data Warehouse

Data lakes and data warehouses are both widely used for storing big data, but the terms are not interchangeable. A data lake is a vast pool of raw data whose purpose is not yet defined. A data warehouse is a repository for structured, filtered data that has already been processed for a specific purpose.

What is a Data Mart?

A data mart is a subject-oriented database that meets the demands of a specific group of users. Data marts accelerate business processes by allowing access to information in a data warehouse or operational data store within days as opposed to months or longer.

What is Data Preparation?

Data preparation is the process of cleaning and transforming raw data prior to processing and analysis. It is a time-consuming process, but the business intelligence benefits justify it. Today, self-service data preparation tools are making it easier and more efficient than ever.
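
As a rough illustration of what "cleaning and transforming" can mean in practice, here is a minimal Python sketch. The sample rows, the field names, and the `prepare` function are all hypothetical and chosen only for this example; real preparation tools handle far more cases.

```python
# Hypothetical raw records: stray whitespace, a missing amount, a duplicate row.
RAW_ROWS = [
    {"customer": "  Alice ", "amount": "120.50", "country": "us"},
    {"customer": "Bob", "amount": "", "country": "US"},
    {"customer": "  Alice ", "amount": "120.50", "country": "us"},
]


def prepare(rows):
    """Clean and transform raw rows: trim text, convert types, drop bad and duplicate rows."""
    cleaned = []
    seen = set()
    for row in rows:
        # Cleaning: skip rows whose amount is missing.
        if not row["amount"].strip():
            continue
        # Transforming: trim names, parse numbers, normalize country codes.
        record = (
            row["customer"].strip(),
            float(row["amount"]),
            row["country"].strip().upper(),
        )
        # Cleaning: skip exact duplicates of rows already kept.
        if record in seen:
            continue
        seen.add(record)
        cleaned.append(
            {"customer": record[0], "amount": record[1], "country": record[2]}
        )
    return cleaned


prepared = prepare(RAW_ROWS)
print(prepared)
```

In this sketch the row with the empty amount and the repeated row are dropped, leaving a single normalized record.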

What is Data Processing?

Data processing converts raw data into a more readable format that computers can interpret and employees throughout an organization can use.

Creating Cluster Connection Metadata from Configuration Files

In this tutorial, you create Hadoop cluster metadata by importing the configuration from the Hadoop configuration files.
This tutorial uses Talend Data Fabric Studio version 6 and a Hadoop cluster: Cloudera CDH version 5.4.
1. Create a new Hadoop cluster metadata definition
Ensure that the Integration perspective is selected.
In the Project Repository, expand Metadata, right-click Hadoop Cluster, and click Create Hadoop Cluster to open the wizard.
In the Name field of the Hadoop Cluster Connection wizard, type MyHadoopCluster_files. In the Purpose field, type Cluster connection metadata. In the Description field, type Metadata to connect to a Cloudera CDH 5.4 cluster. Then click Next.
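
For context on what the wizard reads, Hadoop configuration files such as core-site.xml store settings as name/value properties. The Python sketch below is not part of Talend Studio and does not show how the wizard works internally; it only illustrates the file format. The sample XML, the host name, and the `parse_hadoop_config` helper are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the style of core-site.xml; host and port are made up.
SAMPLE_CORE_SITE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode.example.com:8020</value>
  </property>
</configuration>
"""


def parse_hadoop_config(xml_text):
    """Return a dict mapping property names to values from Hadoop *-site.xml text."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            # Treat a missing <value> element as an empty string.
            props[name.strip()] = (value or "").strip()
    return props


config = parse_hadoop_config(SAMPLE_CORE_SITE)
print(config["fs.defaultFS"])
```

Reading these files directly is one way to check which connection settings an import from configuration files would be expected to pick up.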
