What it Does
Integration at Cluster Scale
Talend redefines the development skills needed for big data and facilitates the organization and orchestration required by these projects so that you can focus on the key question: “What use should we make of data, big and small, and how am I going to be the leader in using data to help my business?”
Talend’s big data product combines big data components for MapReduce 2.0 (YARN), Hadoop, HBase, Hive, HCatalog, Oozie, Sqoop and Pig into a unified open source environment so you can quickly load, extract, transform and process large and diverse data sets from disparate systems.
How it Works
Big Data Without The Need To Write / Maintain Code
Ready to Use Big Data Connectors
Talend provides an easy-to-use graphical environment that lets developers visually map big data sources and targets without having to learn or write complex code. Talend Big Data runs natively on Hadoop, so jobs scale with the cluster. Once a big data connection is configured, the underlying code is generated automatically and can be deployed remotely as a job that runs natively on your big data cluster: HDFS, Pig, HCatalog, HBase, Sqoop or Hive.
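Conceptually, a visual source-to-target mapping is a declarative spec from which code is generated. The sketch below illustrates that idea in plain Python and is not Talend's mapping format or its generated code. The names `FIELD_MAP` and `apply_mapping`, and the sample fields, are hypothetical. Each target field names a source field and a transform, and one function applies the spec to every row.

```python
from typing import Any, Callable

# Hypothetical mapping spec: target field -> (source field, transform).
# It stands in for what a graphical mapper captures; it is NOT Talend's format.
FIELD_MAP: dict[str, tuple[str, Callable[[Any], Any]]] = {
    "customer_id": ("id", int),
    "full_name": ("name", str.strip),
    "country": ("country_code", str.upper),
}


def apply_mapping(record: dict[str, Any]) -> dict[str, Any]:
    """Build one target row from one source row using FIELD_MAP."""
    return {target: fn(record[source]) for target, (source, fn) in FIELD_MAP.items()}


source_rows = [
    {"id": "42", "name": "  Ada Lovelace ", "country_code": "gb"},
    {"id": "7", "name": "Alan Turing", "country_code": "uk"},
]
target_rows = [apply_mapping(r) for r in source_rows]
print(target_rows)
```

Keeping the mapping as data rather than code is what allows a tool to render it graphically and regenerate the executable logic when the mapping changes.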
Big Data Distribution and Big Data Appliance Support
Talend's big data components have been tested and certified to work with leading Hadoop distributions and big data platforms, including Amazon EMR, Cloudera, IBM PureData, Hortonworks, MapR, Pivotal Greenplum, Pivotal HD, and SAP HANA. Talend provides out-of-the-box support for big data platforms from the leading appliance vendors, including Greenplum/Pivotal, Netezza, Teradata, and Vertica.
Because the Studio is released under the Apache License, developers can use and modify it free of charge. Since Talend's big data products rely on standard Hadoop APIs, users can migrate their data integration jobs between Hadoop distributions without being tied to platform-specific dependencies. Apache Oozie is supported out of the box, so operators can schedule their data jobs with open source software.
Pull Source Data from Anywhere Including NoSQL
With 450+ connectors, Talend integrates almost any data source, so you can transform and integrate data in real time or in batch. Pre-built connectors for HBase, MongoDB, Cassandra, CouchDB, Couchbase, Neo4J and Riak speed up development without requiring specialized NoSQL knowledge. Talend big data components can be configured to bulk upload data to Hadoop or another big data appliance, either manually or on an automatic schedule for incremental data updates.
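A common way to implement scheduled incremental updates is a high-water mark. Each run selects only rows modified after the timestamp recorded by the previous run, then advances that timestamp. The sketch below shows this generic technique; it is not a description of how Talend's components work internally. The `Row` type and `incremental_batch` function are hypothetical, and the actual upload step is left out.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Row:
    id: int
    updated_at: datetime


def incremental_batch(rows: list[Row], watermark: datetime) -> tuple[list[Row], datetime]:
    """Return rows changed after `watermark` and the new watermark.

    The new watermark is the latest `updated_at` among the selected rows,
    so the next scheduled run only picks up rows modified since this one.
    """
    changed = [r for r in rows if r.updated_at > watermark]
    new_watermark = max((r.updated_at for r in changed), default=watermark)
    return changed, new_watermark


source = [
    Row(1, datetime(2024, 1, 1, 9, 0)),
    Row(2, datetime(2024, 1, 2, 9, 0)),
    Row(3, datetime(2024, 1, 3, 9, 0)),
]

# First run: only rows modified after the initial watermark are selected.
batch, wm = incremental_batch(source, datetime(2024, 1, 1, 12, 0))
print([r.id for r in batch], wm)
# Second run with no new changes selects nothing and keeps the watermark.
batch2, wm2 = incremental_batch(source, wm)
print([r.id for r in batch2], wm2)
```

Using a strict `>` comparison avoids re-uploading the last row. However, a row written later with exactly the watermark timestamp would be skipped, so real pipelines often pair the timestamp with a unique key or use a small overlap window.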
The strategy for data quality with Big Data will depend on whether the application is mission-critical, whether regulatory compliance ramifications are involved, and the degree to which bad quality data will materially impact the business.