Big Data Platform

Turn big data into trusted insights.

Get up and running fast with the leading open source big data tool

Talend Big Data Platform simplifies complex integrations to take advantage of Apache Spark, Databricks, Qubole, AWS, Microsoft Azure, Snowflake, Google Cloud Platform, and NoSQL, and provides integrated data quality so your enterprise can turn big data into trusted insights. Leverage the full power and scale of your big data framework with the leading data integration and data quality platform built on Spark for cloud, hybrid and multi-cloud architectures.

Integrate data sources and run on the leading data platforms

Big Data Platform Features

License and Support

  • Subscription license with warranty and indemnification
  • 2 free Data Preparation and 2 free Data Stewardship licenses with any Talend subscription
  • Available as cloud service and downloadable software

Design and Productivity Tools

  • Generates native MapReduce and Spark batch code
  • Visual mapping for complex JSON, XML, and EDI on Spark
  • Spark and MapReduce job designer
  • Serverless Spark processing through Databricks and Qubole
  • Dynamic distribution support
  • Hadoop job scheduler with YARN
  • Hadoop security for Kerberos
  • Ingestion, loading, and unloading data into a data lake
  • Graphical design environment
  • Team collaboration with shared repository
  • Continuous integration / Continuous delivery
  • Visual mapping for complex JSON, XML, and EDI
  • Audit, job compare, impact analysis, testing, debugging, and tuning
  • Metadata bridge for metadata import/export and centralized metadata management
  • Distant run and parallelization
  • Dynamic schema, re-usable joblets, and reference projects
  • Repository manager
  • ETL and ELT support
  • Wizards and interactive data viewer
  • Versioning
  • Change data capture (CDC)
  • Automatic documentation
  • Customizable assessment
  • Pattern library
  • Cloud Pipeline Designer

Data Quality, Self-Service, and Governance

  • Data profiling and analytics with graphical charts and drill-down data
  • Automated data standardization, cleansing, and rules enforcement
  • Data privacy with masking and encryption
  • Data quality portal with monitoring, reporting, and dashboards
  • Semantic discovery with automatic detection of patterns
  • Comprehensive survivorship
  • Data sampling
  • Enrichment, harmonization, fuzzy matching, and de-duplication
  • Data sampling, semantic discovery, and auto-profiling
  • Social curation with data sharing, ratings and endorsement
  • Cross reference between datasets and data pipelines for data lineage and impact analysis
  • Cross reference between datasets and data preparations for data lineage and impact analysis
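
To illustrate the fuzzy matching and de-duplication item above, here is a minimal sketch that uses only Python's standard library. It is not Talend's implementation: the function names, the 0.85 threshold, and the choice of `difflib` similarity are assumptions made purely for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity score, ignoring case and surrounding spaces."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def deduplicate(records, threshold=0.85):
    """Keep the first record of each group of near-duplicates.

    `threshold` is a hypothetical cut-off: two records whose similarity
    is at or above it are treated as the same entity.
    """
    kept = []
    for record in records:
        # Drop the record if it is close enough to one already kept.
        if not any(similarity(record, k) >= threshold for k in kept):
            kept.append(record)
    return kept


if __name__ == "__main__":
    names = ["Acme Corp", "acme corp ", "Globex"]
    print(deduplicate(names))
```

This pairwise comparison is quadratic in the number of records, so it only suits small samples; matching at scale typically relies on blocking or distributed processing instead.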

Connectors

  • Cloud: Amazon Web Services (AWS), Microsoft Azure, Google Cloud Platform, and more
  • Cloud Data Warehouse and Data Lakes: Snowflake, Amazon Redshift, Azure Data Lake Storage Gen2, Azure SQL Data Warehouse, Databricks Delta Lake, Google BigQuery
  • Supported big data distributions: Amazon EMR, Azure HDInsight, Cloudera, Google Dataproc, Hortonworks, MapR
  • Serverless: Cloudera Altus, Databricks, Qubole
  • Spark MLlib (classification, clustering, recommendation, regression)
  • NoSQL: Cassandra, Couchbase, DynamoDB, MongoDB, Neo4j, and more
  • RDBMS: Oracle, Teradata, Microsoft SQL Server, and more
  • SaaS: Marketo, Salesforce, NetSuite, and more
  • Packaged Apps: SAP, Microsoft Dynamics, SugarCRM, and more
  • Technologies: Dropbox, Box, SMTP, FTP/SFTP, LDAP, and more
  • Optional 3rd-party address validation services

Components

  • Hadoop components: HDFS, HBase, Hive, Pig, Sqoop
  • File management: open, move, compress, decompress without scripting
  • Control and orchestrate data flows and data integrations with master jobs
  • Map, aggregate, sort, enrich, and merge data

Data Preparation and Stewardship

  • 2 free licenses with subscription
  • Import, export and combine data from database, Excel, CSV, Parquet and Avro files
  • Export to Tableau
  • Self-service on-demand access to sanctioned datasets
  • Share data preparations and datasets
  • Operationalize preparations into any data, big data or cloud integration flow
  • Run preparations on Apache Beam*
  • Auto-discovery, standardization, auto-profiling, smart suggestions, and data visualization
  • Customization of semantic type for auto-profiling and standardization
  • Smart and selective sampling and full-runs
  • Data tracking and masking with role-based security
  • Cleansing and enrichment functions
  • Data Stewardship App for data curation and certification
  • Define data models and data semantics, profile data accordingly, and define and apply rules
  • Merge and match data, resolve data errors, and arbitrate on data (classification and certification)
  • Orchestrate and collaborate on activities in campaigns
  • Define user roles, workflows and priorities, assign and delegate tasks, tag and comment
  • Embed governance and stewardship in data integration flows and manage rejects
  • Embed human certification and error resolution into MDM processes
  • Take matching decisions that cannot be processed automatically
  • De-duplicate data at scale with machine learning
  • Audit and track data error resolution actions. Monitor progress of campaigns. Undo/redo based on business needs

Management and Monitoring

  • High availability, load balancing, failover for jobs
  • Deployment manager and team collaboration
  • Manage users, groups, roles, projects, and licenses
  • Manage execution engines
  • Single Sign-On (SSO) integration with several SSO providers
  • Execution plan, time, and event-based scheduler for jobs
  • Check points, error recovery
  • Context management (dev, QA, prod)
  • Log collection and display
  • Optional Admin user add-on*
  • Engine clusters for jobs*
  • Static IP addresses*
  • Job execution log history (2 months for Entry products, 3 months for Platforms)*
  • Environments (2 for Entry products, unlimited for Platforms)*
  • Cloud Security Information and Event Management (SIEM), Intrusion Detection System (IDS), Intrusion Prevention System (IPS) and Web Application Firewall (WAF)

Big Data Quality

  • Data cleansing, profiling, masking, parsing, and matching on Spark and Hadoop
  • Machine learning for data matching and deduplication
  • Support for Cloudera Navigator and Apache Atlas
  • HDFS file profiling

Advanced Data Profiling

  • Fraud pattern detection using Benford's Law
  • Advanced statistics with indicator thresholds
  • Column set analysis
  • Advanced matching analysis
  • Time column correlation analysis
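
For readers unfamiliar with Benford's Law, mentioned in the list above: it predicts that in many naturally occurring numeric datasets the leading digit d appears with probability log10(1 + 1/d), so large departures from that distribution can flag data worth a closer look. The sketch below illustrates the general idea only; it does not describe how Talend implements the indicator, and the function names are hypothetical.

```python
import math
from collections import Counter


def leading_digit(value: float) -> int:
    """Return the first non-zero digit of a non-zero number."""
    # Scientific notation always puts exactly one non-zero digit first.
    return int(f"{abs(value):.15e}"[0])


def benford_expected(d: int) -> float:
    """Expected share of values whose leading digit is d (1..9)."""
    return math.log10(1 + 1 / d)


def benford_deviation(values):
    """Map each digit 1..9 to (observed share - expected share)."""
    digits = [leading_digit(v) for v in values if v != 0]
    if not digits:
        raise ValueError("need at least one non-zero value")
    counts = Counter(digits)
    total = len(digits)
    return {d: counts.get(d, 0) / total - benford_expected(d)
            for d in range(1, 10)}


if __name__ == "__main__":
    amounts = [1200.0, 1875.5, 230.0, 94.2, 1010.0, 3400.0, 150.0]
    for digit, delta in benford_deviation(amounts).items():
        print(digit, round(delta, 3))
```

A real analysis would use far more values and a statistical test (for example chi-squared) rather than raw differences; the sample amounts here are made up.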

Keep your data integration projects under budget

Flexible

Keep costs predictable and resources flexible with annual or monthly subscriptions.

Predictable

Talend charges per user, not by data volume or number of connectors.

Simple

50% lower total cost of ownership with a single solution running in the cloud.

With Talend, we have been able to decode the Panama Papers, rapidly ‘connecting the dots’ between the corporate information for secret offshore companies and the people behind them.

Mar Cabra, Head of the Data & Research Unit

In the stock exchange sector, we follow three watchwords: integrity, because it is impossible to lose a single order; permanent availability; and governance in a highly regulated market. Talend has met these expectations.

Abderrahmane Belarfaoui, Chief Data Officer (CDO), Euronext

With Talend, we have improved our 48.8 million passengers' experience and operational efficiency. And we have been recognized as Europe's number one airport with over 40 million passengers, according to ACI World's globally established Airport Service Quality programme.

Pietro Caminiti - Head of IT Solutions, Aeroporti di Roma

Ready to get started with Talend?