Writing and Reading Data in HDFS

In this tutorial, generate random data and write it to HDFS. Then read the data back from HDFS, sort it, and display the result in the Console. This tutorial uses Talend Data Fabric Studio version 6 and a Hadoop cluster: Cloudera CDH version 5.4. 1. Create a new standard …

Watch Now
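Outside the Studio, the same generate → write → read → sort → display flow can be sketched in plain Python. This is only an illustration of the data flow, not the tutorial's Talend Job: a local temporary file stands in for HDFS, and the row format (`id;value`) is an assumption.

```python
import os
import random
import tempfile

# Generate random (id, value) rows -- stand-in for the tutorial's random data.
rows = [(i, random.randint(0, 999)) for i in range(10)]

# "Write" the rows, here to a local file standing in for HDFS.
path = os.path.join(tempfile.gettempdir(), "random_rows.csv")
with open(path, "w") as f:
    for row_id, value in rows:
        f.write(f"{row_id};{value}\n")

# "Read" the data back, sort by value, and display the result.
with open(path) as f:
    read_back = [line.strip().split(";") for line in f]
read_back.sort(key=lambda r: int(r[1]))
for row_id, value in read_back:
    print(row_id, value)
```

In the actual tutorial, the write, read, and sort steps are each handled by dedicated Studio components wired together in a Job rather than by hand-written code.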

Creating Cluster Connection Metadata from Configuration Files

In this tutorial, create Hadoop Cluster metadata by importing the configuration from the Hadoop configuration files. This tutorial uses Talend Data Fabric Studio version 6 and a Hadoop cluster: Cloudera CDH version 5.4. 1. Create a new Hadoop cluster metadata definition Ensure that the Integration perspective is …

Watch Now

Creating Cluster Connection Metadata

In this tutorial, create Hadoop Cluster metadata automatically by connecting to the Cloudera Manager. This tutorial uses Talend Data Fabric Studio version 6 and a Hadoop cluster: Cloudera CDH version 5.4. 1. Create a new Hadoop cluster metadata definition Ensure that the Integration perspective is selected. In …

Watch Now

Running a Job on YARN

In this tutorial, create a Big Data batch Job running on YARN, read data from HDFS, sort it, and display it in the Console. This tutorial uses Talend Data Fabric Studio version 6 and a Hadoop cluster: Cloudera CDH version 5.4. It reuses the HDFS connection metadata …

Watch Now

Running a Job on Spark

In this tutorial, create a Big Data batch Job using the Spark framework, read data from HDFS, sort it, and display it in the Console. This tutorial uses Talend Data Fabric Studio version 6 and a Hadoop cluster: Cloudera CDH version 5.4. It reuses the HDFS connection …

Watch Now

Get Value Out of Your Data With a 360° Data Hub

Data is the lifeblood of digital transformation, but it is far more valuable when it is trusted and actionable. Gathering trusted data in real time yields numerous business benefits, such as personalizing offers and recommendations, strengthening compliance across the organization, and improving product development and innovation. To deliver truly valuable data that achieves …

Download Now

Apache Sqoop: A Complete Guide

Maximizing the value of the large amounts of unstructured data available today requires timely integration with structured data in relational systems. Setting up manual integrations between RDBMS options like MySQL and contemporary data stores like Hadoop is time-consuming, costly, and inefficient for modern workflows. However, organizations can largely …

View Now

Data Extraction Tools: Improving Data Warehouse Performance

With corporate data increasing approximately 40 percent each year, it’s almost impossible for organizations to manually keep track of and collect every data point available. Enterprises that do rely on manual processes dedicate an inordinate amount of time and resources to getting the data they need—and …

View Now

TDWI: Governing Big Data and Hadoop

Get ready to take operational analytics and a 360-degree view of customers to a whole new level with big data. Download this new Checklist Report from TDWI. The report covers governance best practices and tools for: self-service access to big data, new data exploration and discovery, metadata …

Download Now

Harnessing Data Quality for Better Decisions

Tired of dealing with the impact of poor data quality in your organization? Frustrated with the cost and effort associated with reworking processes and reports as a result of inaccurate or incomplete data? A good data quality solution empowers you to readily embrace enterprise-class data governance as …

Watch Now

Why ELT Tools Are Disrupting the ETL Market

Research indicates approximately 50 percent of business data resides in the cloud, illustrating the importance of external data sources to the modern enterprise. Organizations need similarly modern tools to process and integrate this data at a pace commensurate with the current speed of business. …

View Now

Machine Learning Sandbox – Data Warehouse Optimization

Talend’s Big Data and Machine Learning Sandbox is a virtual environment that utilizes Docker containers to combine the Talend Real-time Big Data Platform with some sample scenarios that are pre-built and ready-to-run. This example demonstrates a Data Warehouse Optimization approach that utilizes the power of Spark to …

View Now

Machine Learning Sandbox – Real-Time Risk Assessment

Talend’s Big Data and Machine Learning Sandbox is a virtual environment that utilizes Docker containers to combine the Talend Real-time Big Data Platform with some sample scenarios that are pre-built and ready-to-run. This video demonstration shows how an online bank is trying to mitigate their exposure and …

View Now

Machine Learning Sandbox – IoT Predictive Maintenance

Talend’s Big Data and Machine Learning Sandbox is a virtual environment that utilizes Docker containers to combine the Talend Real-time Big Data Platform with some sample scenarios that are pre-built and ready-to-run. This video shows you how to get signed up and download the Talend Big Data …

View Now

Machine Learning Sandbox – Sign Up and Download

Talend’s Big Data and Machine Learning Sandbox is a virtual environment that utilizes Docker containers to combine the Talend Real-time Big Data Platform with some sample scenarios that are pre-built and ready-to-run. This video shows you how to get signed up and download the Talend Big Data …

View Now

Best Practices Report: Multiplatform Data Architectures

This TDWI Best Practices Report shows how companies are taming cost, complexity, and governance with what TDWI calls multiplatform data architectures (MDAs). By expanding and integrating portfolios of data platforms and tools into MDAs, data-driven organizations are better able to capture distributed enterprise data, big data, and …

Download Now

ETL in the Cloud

Since the dawn of big data, the ETL (extract, transform, and load) process has been the heart that pumps information through modern business networks. Today, cloud-based ETL is a critical tool for managing massive data sets, and one that companies will increasingly rely on in the future. …

View Now

How to Move Data from Salesforce to AWS Redshift

As companies become more data-driven, there is a greater need to collect and analyze large volumes of data from on-premises and cloud applications. A popular IT request is to extract customer relationship management (CRM) data from Salesforce and load it into the Amazon Web Services (AWS) Redshift …

View Now
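A real pipeline of this kind would call the Salesforce APIs on the extract side and stage files for Redshift's COPY command on the load side. The in-between step can be sketched in plain Python: flattening extracted records into a CSV payload of the sort Redshift can ingest. The record fields below are hypothetical, not a real Salesforce schema.

```python
import csv
import io

# Hypothetical records as returned by a Salesforce query
# (field names are illustrative only).
records = [
    {"Id": "0015x001", "Name": "Acme Corp", "AnnualRevenue": 1200000},
    {"Id": "0015x002", "Name": "Globex", "AnnualRevenue": 950000},
]

# Flatten the records into a CSV payload that a Redshift COPY command
# could load, for example after staging the file in S3.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["Id", "Name", "AnnualRevenue"])
writer.writeheader()
writer.writerows(records)
csv_payload = buf.getvalue()
print(csv_payload)
```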

What is ELT?

In a data-driven world, an efficient process for moving and transforming data for analysis is critical to business growth and innovation. Loading a data warehouse can be an extremely time-consuming process. The process of extracting, loading, and transforming (ELT) data streamlines the tasks of modern data warehousing …

View Now
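The defining trait of ELT is that raw data is loaded into the warehouse first and transformed there afterwards, using the warehouse's own SQL engine. A minimal sketch of that ordering, with Python's built-in sqlite3 standing in for a cloud warehouse:

```python
import sqlite3

# sqlite3 stands in for the data warehouse in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_orders (id INTEGER, amount TEXT)")

# Load: ingest the raw, untyped extract as-is (the "L" happens before the "T").
raw = [(1, "19.99"), (2, "5.00"), (3, "12.50")]
conn.executemany("INSERT INTO raw_orders VALUES (?, ?)", raw)

# Transform: cast and reshape inside the warehouse, after loading.
conn.execute("""
    CREATE TABLE orders AS
    SELECT id, CAST(amount AS REAL) AS amount FROM raw_orders
""")
total = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
print(total)
```

In classic ETL the cast would happen in a separate transformation engine before the insert; here the warehouse does that work itself.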

What is MapReduce?

In today’s data-driven market, algorithms and applications are collecting data 24/7 about people, processes, systems, and organizations, resulting in huge volumes of data. The challenge, though, is how to process this massive amount of data with speed and efficiency, and without sacrificing meaningful insights. This is where …

View Now
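MapReduce answers that challenge by splitting processing into independent map and reduce phases that can run in parallel across a cluster. The canonical example is word counting; the pure-Python sketch below shows the three logical phases (map, shuffle, reduce) on a single machine:

```python
from collections import defaultdict

docs = ["big data big insights", "data pipelines move data"]

# Map: each input record independently emits (word, 1) pairs.
mapped = [(word, 1) for doc in docs for word in doc.split()]

# Shuffle: group the intermediate pairs by key.
groups = defaultdict(list)
for word, count in mapped:
    groups[word].append(count)

# Reduce: aggregate each key's values; keys can be reduced in parallel.
counts = {word: sum(vals) for word, vals in groups.items()}
print(counts)
```

In Hadoop, the map and reduce steps run as distributed tasks and the framework performs the shuffle between them; the logic per phase is the same.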

ETL Testing: An Overview

ETL — Extract/Transform/Load — is a process that extracts data from source systems, transforms the information into a consistent data type, then loads the data into a single repository. ETL testing refers to the process of validating, verifying, and qualifying data while preventing duplicate records and data …

View Now
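Two of the checks named above, duplicate detection and source-to-target reconciliation, can be sketched in a few lines of Python. The data here is illustrative; in practice these checks run as SQL or tool-driven assertions against the actual source and target systems.

```python
# Source extract and loaded target rows (illustrative key/value pairs).
source = [("c1", "alice"), ("c2", "bob"), ("c3", "carol")]
target = [("c1", "alice"), ("c2", "bob"), ("c2", "bob"), ("c3", "carol")]

# Duplicate check: the same key must not be loaded twice.
keys = [row[0] for row in target]
duplicates = {k for k in keys if keys.count(k) > 1}

# Reconciliation check: the deduplicated target should match the source.
count_ok = len(set(target)) == len(source)

print("duplicate keys:", sorted(duplicates))
print("counts match after dedup:", count_ok)
```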

What is Hadoop?

Hadoop is an open-source, Java-based framework used for storing and processing big data. The data is stored on inexpensive commodity servers that run as clusters. Its distributed file system enables concurrent processing and fault tolerance. Developed by Doug Cutting and Michael J. Cafarella, Hadoop uses …

View Now

Big Data Quality

The Open Source Solution for Big Data Quality Management. With the advent of big data, data quality management is both more important and more challenging than ever. Fortunately, the combination of Hadoop's open source distributed processing technologies and Talend's open source data management solutions brings big data …

View Now