Talend with Spark and Hadoop
An acceleration engine for your integration platform
The speed and scale of data processing unleashed by Apache Spark on Hadoop bring the promise of Big Data closer than ever. Talend Big Data provides the platform to take advantage of it today.
Optimize for the speed and scale of Spark on Hadoop
Talend generates native Spark code that takes advantage of the Spark features behind the speed and scale that big data and the Internet of Things demand.
- Optimal management of distributed computing: partition up front for better performance
- Unmatched performance with massively parallel streaming of data straight from the source, with data kept in memory for reuse in compressed columnar storage
- Mix messaging and batch at scale: Talend connectors for Kafka and other systems provide an end-to-end distributed solution for large-scale messaging
- A new category of JDBC connectors native to Spark enables ingestion from RDBMS sources using partitioned parallel reads
- In-memory windowing helps compare data values over a set period of time
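To make the partitioned parallel read in the list above more concrete, here is a minimal sketch in plain Python of how a numeric column range can be split into one WHERE clause per partition, so that each worker reads its own slice of a table. It is loosely modeled on the idea behind the `partitionColumn`, `lowerBound`, `upperBound`, and `numPartitions` options of Spark's JDBC data source. The function name `partition_predicates` and the example column `order_id` are hypothetical, and this is not the code Talend generates nor Spark's exact algorithm.

```python
def partition_predicates(column, lower, upper, num_partitions):
    """Return one SQL predicate per partition covering the whole column range.

    Simplified illustration only: assumes an integer column and
    splits [lower, upper) into equally sized strides.
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    if upper <= lower:
        raise ValueError("upper must be greater than lower")

    # Never create more partitions than there are distinct values in the range.
    num_partitions = min(num_partitions, upper - lower)
    if num_partitions == 1:
        return ["1=1"]  # a single partition reads everything

    stride = (upper - lower) // num_partitions
    predicates = []
    for i in range(num_partitions):
        start = lower + i * stride
        end = start + stride
        if i == 0:
            # First partition also picks up values below the range and NULLs.
            predicates.append(f"{column} < {end} OR {column} IS NULL")
        elif i == num_partitions - 1:
            # Last partition is open-ended so no row above the range is lost.
            predicates.append(f"{column} >= {start}")
        else:
            predicates.append(f"{column} >= {start} AND {column} < {end}")
    return predicates


# Hypothetical usage: four parallel readers over order_id values 0..99.
for predicate in partition_predicates("order_id", 0, 100, 4):
    print(f"SELECT * FROM orders WHERE {predicate}")
```

The first and last predicates are deliberately open-ended: the bounds only decide how the work is divided, not which rows are read, so rows outside the stated range still land in exactly one partition.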
Leverage the full power of Spark Machine Learning
Spark can combine batch and streaming in a single runtime, and Talend provides a single tool and code base for building both batch and real-time applications. These applications can draw on high-speed messaging, real-time data ingestion and processing, and fast NoSQL connectivity.
- You can combine historical data with real-time clickstream, geolocation, or sensor data
- Talend helps you build intelligent data pipelines, powered by Spark Machine Learning, that connect real-time and batch data to feed real-time analytics
- Pre-built drag-and-drop developer components leverage Spark machine learning algorithms, including logistic and linear regression, image classification, text analysis, decision tree classification, gradient-boosted tree forecasting, random forest, ALS, and Naïve Bayes, as well as clustering algorithms such as K-Means
- Developers and data scientists can do everything in a single tool with appropriate tracking and governance to build Spark-based real-time analytics models for recommendations, customer segmentation, forecasting, classification, and regression analysis
- Talend's continuous delivery tools put data science models into production with fast and frequent iterations for massive learning on processed data
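As a rough illustration of the K-Means clustering mentioned above for use cases like customer segmentation, the following is a minimal, single-machine sketch of the algorithm in plain Python. It is not Spark MLlib and not a Talend component; the function name `kmeans`, the sample points, and the starting centroids are all hypothetical, and a real job would run the distributed implementation instead.

```python
from math import dist


def kmeans(points, initial_centroids, max_iterations=10):
    """Cluster points by repeatedly assigning each one to its nearest centroid
    and moving every centroid to the mean of its assigned points."""
    centroids = [tuple(c) for c in initial_centroids]

    def nearest(point):
        # Index of the centroid closest to this point (Euclidean distance).
        return min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))

    for _ in range(max_iterations):
        # Assignment step: group points by nearest centroid.
        clusters = [[] for _ in centroids]
        for point in points:
            clusters[nearest(point)].append(point)

        # Update step: move each centroid to the mean of its members.
        updated = []
        for old, members in zip(centroids, clusters):
            if members:
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(old)  # keep an empty cluster's centroid in place

        if updated == centroids:
            break  # converged: no centroid moved
        centroids = updated

    return centroids, [nearest(point) for point in points]


# Hypothetical customers described by (visits per month, average basket size).
customers = [(1, 1), (1, 2), (2, 1), (9, 9), (10, 9), (9, 10)]
centroids, segments = kmeans(customers, initial_centroids=[(0, 0), (10, 10)])
print(centroids)
print(segments)
```

Starting centroids are passed in explicitly here to keep the sketch deterministic; production implementations usually choose them with a randomized initialization strategy.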
Stay current with the most up-to-date Hadoop distributions for Spark
Talend is the only data integration platform that supports the latest Hadoop distributions. Native Spark connectors in Talend optimize data feeds from external sources into Spark, so you can ingest data, load it in parallel, and put it to use faster.
Run on affordable commodity hardware, and deploy to your existing Hadoop cluster.
Manage the elasticity of your AWS EMR cluster within your job using Talend Studio.
Deliver Spark in the cloud via Google, Amazon, IBM, Oracle, and Microsoft Azure.
Get started with over 100 drag-and-drop Spark components.
Track data used and apply security policies in Cloudera Navigator and Hortonworks Atlas.