What’s New in Talend Summer ’17

Run Big Data Integration on Any Cloud


Make Better, Faster Decisions

Talend Summer ’17 delivers the latest cloud and big data innovations so you can get a 360-degree view of your customers across multiple cloud platforms. Accelerate AWS, Microsoft Azure and Google Cloud Platform adoption with the flexibility and portability to easily reuse development work across clouds.

Big Data Integration on Azure
Big Data Integration on Google Cloud
First and only to support Cloudera Altus
20X faster bulk loading performance

Speed Azure data warehousing and big data projects

Quickly move legacy information and build new cloud data integration pipelines using Talend and Azure. Extensive support for Azure big data, NoSQL and data storage services powers your analytics applications for deeper insight. Easily create Spark Streaming jobs that integrate real-time and historical data using HDInsight, Cosmos DB, Data Lake Store, Table Storage, Blob Storage, Queue Storage, and SQL Data Warehouse.

Build robust big data pipelines on Google Cloud

Google Cloud Platform, together with Talend’s support for Google BigQuery, Dataproc, Cloud Storage and Pub/Sub, allows businesses to rapidly create cloud data lakes, run high-performance cloud data warehousing workloads and power real-time decision making. Using graphical tools, you can build cloud data pipelines that ingest, process, enrich and cleanse data at the speed of Spark.

Easily deploy big data projects to Cloudera Altus

Big data in the cloud is more agile than ever before. Cloudera Altus, Cloudera’s new big data-as-a-service offering, makes getting big data insight easier and faster while reducing data management costs. With Talend and Cloudera Altus, you can fire up transient Hadoop and Spark clusters in the cloud, specify the node capacity needed, and run your Talend data job with a single click. There is no need to worry about DevOps or provisioning: Cloudera Altus handles all server management. Talend support for Cloudera Altus is available as a technical preview.

Rapidly build a cloud data lake with Talend and Snowflake

As a leading SQL cloud data warehouse, Snowflake quickly and cost-effectively crunches demanding analytic workloads. Talend delivers the fastest bulk loader into Snowflake, 20X faster than previous Talend versions. Using visual ETL and data quality tools, Talend dramatically shortens the time needed to migrate on-premises and cloud databases to Snowflake.

Additional Updates

Big Data Integration

Improve the performance and productivity of your big data projects:

  • Streamline DevOps processes for Hadoop by defining custom cluster configurations for development, test, and production environments

  • Run Spark SQL queries significantly faster with Spark 2.1 support

  • Innovate faster with support for the latest big data distributions


Data Integration

Improve your productivity and project security:

  • Increased management and security flexibility through Talend Administration Center (TAC) updates: custom roles, separate security and TAC administrator roles, and single sign-on (SSO)

  • Continuous integration updates include AWS CodeCommit (Git) support and the ability to version jobs individually (by group ID, artifact ID, and deploy version)

  • Improved tMap filtering and Studio wizards make it easier to search large schemas

  • Improved security through Talend Identity Access Management (IAM) updates

  • New Connector SDK to create a single component for multiple integration styles (Technical Preview)

  • New JDBC slowly changing dimension component (tJDBCSCD) with ELT support

Data Preparation

Deliver the best data preparation user experience at extreme scale:

  • Easily access, clean, fix and format Salesforce data with a self-service Salesforce connector

  • Easily access, clean, fix, format and store any dataset or data preparation on Amazon S3 with a secure Amazon S3 connector

  • Track modifications with versioning and easily leverage preparations created by other users

  • Support IT project management standards by easily promoting data preparations across non-production and production environments

  • Run big data preparations on streaming data for real-time insight with Spark Streaming support (tDataPrepRun in Talend Real-Time Big Data)

  • Easily migrate, with a single click, all preparations and datasets from Data Preparation Free Desktop to the commercial version

  • Save time with single sign-on for Data Stewardship and Data Preparation

Talend Data Mapper

Increase the performance of your complex mappings:

  • Faster results through Spark 2.1 support

  • Consume hierarchical records using Spark Streaming (tHMapRecord)

  • Improved handling of SAP IDoc structures

  • Document signature enhancements for Spark Batch

Data Quality

Increase the integrity of data as it flows through the business:

  • Enhanced dictionary service with support for compound semantic types and new semantic types for North American states and international phone numbers

  • Improved matching performance with Spark continuous matching, which matches only new records against the original large dataset

  • Fine-tune survivorship with support for complex and cascading survivorship rules

  • Automatically extract information from unstructured data using Natural Language Processing (NLP)

Data Stewardship

Speed the curation of quality data through Data Stewardship:

  • Improve deduplication productivity using a new user interface for Spark machine learning and matching with grouping campaigns

  • Optimize performance by applying smarter and more complex matching with MDM-integrated matching

  • Anticipate the impact of data model changes through improved impact analysis

  • Improve auto-discovery capabilities through an enhanced dictionary service with support for compound semantic types and new semantic types for North American states and international phone numbers

Master Data Management

Design, ingest, author, curate, and update your master data faster:

  • Faster performance with read/write and cluster improvements

  • Integrated matching and native survivorship with smarter rules, plus the ability to delegate stewardship to potentially any user through the Data Stewardship App

  • Smarter and more flexible impact analysis to facilitate change management


ESB

Improve your ESB productivity and project security:

  • Streamline development efforts using the ESB test runtime packaged with Studio. Debug data services and routes from Studio in the Talend runtime.

  • Leverage the latest security standards when integrating with mobile or native apps through advanced REST APIs

  • Improve data services security through Talend Identity Access Management updates supporting OpenID Connect, SAML, and OAuth

  • Improve productivity by sharing custom JARs with team members (cConfig)

Talend Metadata Manager

Get a holistic view of metadata across the data lake:

  • Collect and integrate metadata from cloud and big data systems with new metadata bridges for S3, Hadoop HDFS, Hive, MongoDB, Couchbase, Cassandra, and Apache Atlas

  • Enhanced script parsing with support for Teradata BTEQ, SQL and PL/SQL formats

  • Build a data inventory for the data lake by automatically harvesting data structures across file systems (S3, Hadoop HDFS, Unix, Windows, and Linux) and file formats (CSV, Excel, JSON, Avro, and Parquet)

  • Business users can subscribe to notifications about alerts and metadata changes. Stewardship roles can be assigned to anyone, and stewards are notified, with change impact analysis, when new content is harvested.

Extend Your Integration Reach

Talend Studio features over 900 enterprise application components and connectors. For a complete list, go to Talendforge.org.

New and Updated Hadoop Distributions

Amazon EMR 5.5/5.6 | Cloudera CDH 5.10.1 | Google Dataproc 1.1 | Hortonworks HDP 2.6 | Microsoft Azure HDInsight 3.6 | Spark 2.1

New and Updated Components

Exasol | Excel | Google Cloud Dataproc, Google Cloud Storage, Google Cloud Pub/Sub, Google BigQuery | MapR-DB, MapR-Streams | Marketo | Microsoft Azure Table Storage, Azure Blob Storage, Azure Queue Storage, Azure Data Lake Store, Azure SQL Data Warehouse, Azure Cosmos DB, Microsoft Dynamics CRM (365/2016) | NetSuite | Salesforce | SAP | Snowflake | Sybase