Compare All Big Data Products


Products compared: Open Studio for Big Data | Big Data Entry-Level | Big Data Platform | Real-Time Big Data Platform

License

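Open Studio for Big Data: Free open source | Big Data Entry-Level: User-based subscription | Big Data Platform: User-based subscription | Real-Time Big Data Platform: User-based subscription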
Free open source Apache license
Available as downloadable software
Subscription license with warranty and indemnification
Available as cloud service and downloadable software
2 free Data Preparation and 2 free Data Stewardship licenses with any Talend subscription

Design and Productivity Tools

Generates native MapReduce and Spark batch code
Generates native Spark Streaming code
Visual mapping for complex XML and EDI on Spark
Spark and MapReduce job designer
Serverless Spark processing through Databricks and Qubole
Dynamic distribution support
Hadoop job scheduler with YARN
Hadoop security for Kerberos
Ingestion, loading, and unloading data into a data lake
Graphical design environment
Team collaboration with shared repository
Continuous integration / Continuous delivery
Audit, job compare, impact analysis, testing, debugging, and tuning
Metadata bridge for metadata import/export and centralized metadata management
Distant run and parallelization
Dynamic schema, re-usable joblets, and reference projects
Repository manager
ETL and ELT support
Wizards and interactive data viewer
Versioning
Change data capture (CDC)
Automatic documentation
Customizable assessment
Pattern library
Cloud Pipeline Designer*

Connectors

Cloud: AWS, Microsoft Azure, Google Cloud Platform, and more
Supported big data distributions: Amazon EMR, Azure HDInsight, Cloudera, Google Dataproc, Hortonworks, MapR
Serverless: Cloudera Altus, Databricks, Qubole
Spark MLlib (classification, clustering, recommendation, regression)
NoSQL: Cassandra, Couchbase, DynamoDB, MongoDB, Neo4j, and more
High-speed messaging components (Kafka, Kinesis, Flume)
RDBMS: Oracle, Teradata, Microsoft SQL Server, and more
SaaS: Marketo, Salesforce, NetSuite, and more
Packaged Apps: SAP, Microsoft Dynamics, SugarCRM, and more
Technologies: Dropbox, Box, SMTP, FTP/SFTP, LDAP, and more
Cleansing, masking, and error resolution
Optional 3rd-party address validation services

Components

Hadoop components: HDFS, HBase, Hive, Pig, Sqoop
File management: open, move, compress, decompress without scripting
Control and orchestrate data flows and data integrations with master jobs
Map, aggregate, sort, enrich, and merge data
Standard support: REST, SOAP, OpenID Connect, OAuth, SAML, WSDL, Swagger™, and more
Transports/protocols support: HTTP, JMS, MQTT, AMQP, UDP, Apache Kafka, WebSphere MQ, and more
Enterprise Integration Patterns for service mediation, routing, and messaging

Data Quality and Governance

Data profiling and analytics with graphical charts and drilldown data
Automated data standardization, cleansing and rules enforcement
Data privacy with masking and encryption
Data quality portal with monitoring, reporting, and dashboards
Semantic discovery with automatic detection of patterns
Comprehensive survivorship
Data sampling
Enrichment, harmonization, fuzzy matching, and de-duplication

Big Data Quality

Data cleansing, profiling, masking, parsing, and matching on Spark and Hadoop
Machine learning for data matching and deduplication
Support for Cloudera Navigator and Apache Atlas
HDFS file profiling

API Development*

Visual API Designer
Support for OAS / Swagger™ and RAML
Visual API Tester
Automatic API mocking
API testing automation
Hosted API documentation
API contract import into Talend Studio

Agile Application Integration

Drag-and-drop creation and simulation of routes, data services, and web/REST services
Deliver and route messages and events based on Enterprise Integration Patterns (EIPs)
Reliable messaging backbone based on ActiveMQ
Command line and scripting tools
Build and deploy as an OSGi feature
Build a microservice
Deploy and manage a microservice

Advanced Data Profiling

Fraud pattern detection using Benford's Law
Advanced statistics with indicator thresholds
Column set analysis
Advanced matching analysis
Time column correlation analysis

Pipeline Designer*

Design pipelines in the cloud and run on-premises or in the cloud
Run pipelines on Amazon EMR and Databricks
Read/write support for Snowflake; Amazon Redshift, S3; Azure SQL Database, SQL Data Warehouse, Blob Storage, Data Lake Store Gen2; Amazon RDS (Oracle, SQL Server, MySQL, PostgreSQL, Aurora); and on-premises databases through JDBC (Oracle, SQL Server, MySQL, MariaDB, PostgreSQL)
Connectors for SaaS: Salesforce; Streaming: Apache Kafka, Amazon Kinesis (source only), Azure Event Hubs; NoSQL: Elasticsearch
Native cloud data warehouse connectors: Snowflake and Amazon Redshift bulk loaders (destination only)
Lightweight data transformations including filter, flatten/normalize, aggregate, replicate, look up, join, and time windowing
Live preview of sample data, and pipeline sharing
Design batch and streaming pipelines in the same interface, using the same connectors
Schema-on-read support
Easily embed Python code
Supported data formats include Avro, JSON, Parquet, and CSV
Stores datasets in a shared, common repository across all Talend products
Manage users and licenses, schedule pipelines, and monitor status (Talend Management Console, TMC)

Data Preparation and Stewardship

Import, export, and combine data from any database, Excel, or CSV file
Import, export, and combine CSV, Parquet, and Avro files
Export to Tableau
Self-service on-demand access to sanctioned datasets
Share data preparations and datasets
Operationalize preparations into any big data and cloud integration flow
Auto-discovery, standardization, auto-profiling, smart suggestions, and data visualization
Customization of semantic type for auto-profiling and standardization
Smart and selective sampling, and full runs
Data tracking and masking with role-based security
Cleansing and enrichment functions
Data Stewardship App for data curation and certification
Define data models and data semantics, and profile data accordingly. Define and apply rules (survivorship, mass updates)
Merge and match data, resolve data errors, and arbitrate on data (classification and certification)
Orchestrate and collaborate on activities in campaigns
Define user roles, workflows and priorities, assign and delegate tasks, tag and comment
Embed governance and stewardship in data integration flows and manage rejects
Embed human certification and error resolution into MDM processes
Make matching decisions that cannot be made automatically
De-duplicate data at scale with machine learning
Audit and track data error resolution actions. Monitor progress of campaigns. Undo/redo based on business needs

Management and Monitoring

High availability, load balancing, failover for jobs
Deployment manager and team collaboration
Manage users, groups, roles, projects, and licenses
Manage execution engines
Execution plan, time, and event-based scheduler for jobs
Checkpoints and error recovery
Context management (dev, QA, prod)
Activity monitoring
Log collection and display
Optional Admin user add-on*
Engine clusters for jobs*
Static IP addresses*
Job execution log history (2 months for Entry products, 3 months for Platforms)*
Environments (2 for Entry products, unlimited for Platforms)*
Single Sign-On (SSO) integration with several SSO providers
Cloud Security Information and Event Management (SIEM), Intrusion Detection System (IDS), Intrusion Prevention System (IPS) and Web Application Firewall (WAF)*

Services Management

System monitoring: JMX / Jolokia
Services and routes runtime engine (Talend Runtime on-prem, Remote Engine cloud)
Containerized service generation
Access to live statistics of message flow activity
Integrated artifact repository
Interface to deploy data services and routes

Support

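Open Studio for Big Data: Self-service | Big Data Entry-Level: Web, email | Big Data Platform: Web, email, phone | Real-Time Big Data Platform: Web, email, phone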

