Compare All Big Data Products

Open Studio for Big Data | Big Data Entry-Level | Big Data Platform | Real-Time Big Data Platform


Free Open Source | User-based subscription | User-based subscription | User-based subscription
Free open source Apache license
Subscription license with warranty and indemnification

Design & Productivity Tools

Generates native Spark Streaming code
Visual mapping for complex XML & EDI on Spark
Spark & MapReduce job designer
Generates native MapReduce & Spark batch code
Hadoop job scheduler with YARN
Hadoop security for Kerberos
Ingestion, loading and unloading data into a data lake
Enterprise Messaging (JMS, ActiveMQ, AMQP)
Eclipse-based developer tooling & job designer
Continuous delivery/integration & team collaboration with shared repository
Audit, job compare, impact analysis, testing, debugging & tuning
Metadata bridge for metadata import/export & centralized metadata management
Distant run & parallelization
Dynamic schema, re-usable joblets & reference projects
Repository manager
ETL & ELT support
Wizards & interactive data viewer
Change data capture (CDC)
Drools business rule management system
Automatic documentation


Hadoop components: HDFS, HBase, Hive, Pig, Sqoop
File management: open, move, compress, decompress without scripting
Control and orchestrate data flows and data integrations with master jobs
Map, aggregate, sort, enrich & merge data
Standards support: REST, SOAP, OpenID Connect, OAuth, SAML, STS, WSDL, Swagger and more
Transports/protocols support: HTTP, JMS, MQTT, AMQP, UDP, Apache Kafka, WebSphere MQ and more
Enterprise Integration Patterns for service mediation, routing and messaging


Cloud: AWS, Microsoft Azure, Google Cloud Platform, and more
Supported big data distributions: Amazon EMR, Azure HDInsight, Cloudera, Google Dataproc, Hortonworks, MapR
Spark MLlib (classification, clustering, recommendation, regression)
NoSQL: Cassandra, Couchbase, DynamoDB, MongoDB, Neo4j, and more
High-speed messaging components (Kafka, Kinesis, Flume)
RDBMS: Oracle, Teradata, Microsoft SQL Server, and more
SaaS: Marketo, Salesforce, NetSuite, and more
Packaged Apps: SAP, Microsoft Dynamics, SugarCRM, and more
Technologies: Dropbox, Box, SMTP, FTP/SFTP, LDAP, and more
Cleansing, masking & error resolution
Optional 3rd-party address validation services

Management & Monitoring

High availability, load balancing, failover for jobs
Deployment manager & team collaboration
Talend Administration Center
Amazon EC2 lifecycle control
Execution plan, time & event-based scheduler
Checkpoints, error recovery
Context management (dev, QA, prod)
Activity Monitoring Console
Log server with dashboard

Data Quality & Governance

Data profiling & analytics with graphical charts & drilldown data
Automate data quality error resolution and enforce rules
Data masking
Data quality portal with monitoring, reporting & dashboards
Semantic discovery with automatic detection of patterns
Comprehensive survivorship
Data sampling
Enrichment, harmonization, fuzzy matching & de-duplication

Big Data Quality

Data cleansing, profiling, masking, parsing & matching on Spark & Hadoop
Machine learning for data matching and deduplication
Support for Cloudera Navigator & Apache Atlas
HDFS file profiling

Advanced Data Profiling

Fraud pattern detection using Benford's law
Advanced statistics with indicator thresholds
Column set analysis
Advanced matching analysis
Time column correlation analysis

Data Preparation & Stewardship

Import, export & combine data from any database, Excel or CSV file
Import, export & combine CSV, Parquet & Avro files from/to Hadoop
Export to Tableau
Self-service on-demand access to sanctioned datasets
Share data preparations & datasets
Operationalize preparations into any big data & cloud integration flow
Run preparations on Apache Beam
Auto-discovery, profiling, smart suggestions, and data visualization
Auto-discovery & auto-profiling of custom semantic types
Smart & selective sampling & full-runs
Data tracking & masking with role-based security
Cleansing and enrichment functions
Data Stewardship App for data curation and certification

ESB Management

JMX monitoring, service activity monitoring
System monitoring
Visibility into live statistics of message flow activity
Integrated artifact repository
Centralized event logging service, provisioning service
Hyperic HQ plug-ins
Job conductor
Identity management & authorization
Web services high availability


Self-service | Web, email | Web, email, phone | Web, email, phone
