What’s New in Talend Spring ’18

Introducing Talend Data Streams

Put More Data to Work

New cloud, big data, and governance enhancements will dramatically improve your team’s ability to deliver data-driven results.

Ingesting Streaming Data Just Got Easier

Make streaming data integration easy for data scientists, analysts, and data engineers


Reduce your cloud data processing costs by up to 67%

Big Data Integration

Quickly build cloud data warehouses and data lakes with the latest technology

Cloud Data Stewardship

Empower those who know the data best with a self-service data curation and validation app

Catch Your Streaming Data

Designed for data scientists, analysts, and engineers, Talend Data Streams is a free, self-service application that makes streaming data integration faster, easier, and more accessible. Built for the cloud, it can be up and running in minutes. Ingest new data types and streaming data painlessly with schema-on-read, all in a single interface for streaming and batch pipelines, powered by Apache Beam. Accelerate pipeline development with embedded Python coding components and a unique live preview that shows your data at every step of your design.

Webinar: Put More Data to Work

Cut Cloud Data Processing Costs by Up to 67% with Serverless

Deploy on serverless services and focus less on managing your infrastructure and more on delivering data-driven insight. Through Maven Plug-ins, you can easily integrate Docker into your build process and deploy on serverless services such as AWS Fargate and Azure Container Instances (ACI). Metering by the second and faster execution speeds reduce data processing costs by up to 67%, while processing more data in parallel increases performance by up to 50%. Support for Qubole and Cloudera Altus on Azure enables serverless big data integration, minimizing server management tasks and automatically scaling cloud resources up and down.
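
To see why metering by the second can lower a bill, here is a minimal arithmetic sketch. It assumes a hypothetical 20-minute job and compares hourly billing with per-second billing at the same rate. The function names and the job length are illustrative assumptions, and this is not how Talend derived its published figure.

```python
import math


def billed_seconds(runtime_s: int, granularity_s: int) -> int:
    """Round a runtime up to the next whole billing unit."""
    return math.ceil(runtime_s / granularity_s) * granularity_s


def savings_fraction(runtime_s: int, coarse_s: int = 3600, fine_s: int = 1) -> float:
    """Fraction of cost saved by moving from coarse to fine billing units,
    assuming the price per second is the same in both schemes."""
    coarse = billed_seconds(runtime_s, coarse_s)
    fine = billed_seconds(runtime_s, fine_s)
    return 1 - fine / coarse


# Hypothetical job that runs for 20 minutes.
job_runtime_s = 20 * 60
print(f"Billed hourly:     {billed_seconds(job_runtime_s, 3600)} s")
print(f"Billed per second: {billed_seconds(job_runtime_s, 1)} s")
print(f"Savings:           {savings_fraction(job_runtime_s):.1%}")
```

The shorter a job is relative to the coarse billing unit, the larger the saving, so actual results depend on your workload and your provider's pricing.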

Article: How to go Serverless with Talend and AWS Lambda

Faster Big Data Integration

Process more data more quickly across cloud data warehouses and data lakes. Gain rapid insight with new ELT push-down capabilities for Snowflake, new Spark and Spark Streaming support in Azure Data Lake Store, and enhanced data extraction capabilities for SAP.

Talend now supports dynamic distribution (technical preview) for Cloudera, giving you instant access to the latest Cloudera features with no need to upgrade Talend, saving weeks to even months of admin time. Develop big data jobs once and deploy on-premises, on any cloud, or as a Talend Cloud-managed service.

Learn More: What’s New with Big Data

Introducing Talend Cloud Data Stewardship

Increase trust in your data with Talend Cloud Data Stewardship, a team-based, self-service data curation and validation app that enables the people who know the data best to quickly identify, manage, and resolve any data integrity issue. Using a simple, web-based UI, you can define user roles, workflows, and priorities for data curation, then delegate tasks. Establish a single version of the truth, no matter which cloud or location your data is on. Nothing to install, just turn it on as a Talend Cloud service.

(Data Stewardship is available as a Talend Cloud app or as Talend software you download and install)

Webinar: Team-driven Data Quality and Data Stewardship


This section lists new features in Talend Spring ’18 and Talend Winter ’18.
To see what is in each release and product (downloadable software or Talend Cloud), visit help.talend.com.

Big Data Integration

Improve the performance and productivity of your big data projects:
  • New dynamic distribution support (technical preview) for Cloudera CDH—instantly add Hadoop distribution updates without upgrading Talend
  • Run Spark jobs in YARN cluster mode, removing the need for a job server on an edge node at runtime, simplifying and speeding up your deployment with no single point of failure
  • Dramatically increase your ability to extract data from SAP at the application, database, and data warehouse level. New SAP bulk-extraction capabilities allow you to extract nearly unlimited amounts of data from SAP. Easily extract new or changed pre-packaged SAP data using the business content extractor with delta mode (technical preview). ELT push-down support for SAP enables processing natively within SAP before moving data to the cloud
  • Snowflake component support is enhanced so you can do ELT push-down, where data processing and transformations are done on Snowflake clusters, leveraging the massive performance and scalability of Snowflake for faster analytics
  • Ingest into and query Cloudera Kudu, a Hadoop columnar storage manager used for rapid analytics on fast data scenarios like IoT, GDPR, and fraud detection. Advanced tuning options provide optimal performance
  • MapR-DB OJAI support, so you can perform advanced hierarchical transformations graphically and query MapR-DB OJAI from your job, delivering faster performance and greater flexibility for web, mobile, social, and IoT-based applications
  • Simplify AWS S3 security implementation by using IAM roles and secure token service for your job
  • Run your Talend workloads on Cloudera Altus on Azure (in addition to AWS today)
  • Process more data faster with Spark and Spark Streaming support for Microsoft Azure Data Lake Store
  • Track application IDs in Hive Query to better manage your Talend / Hive jobs
  • Get and set rowkeys in HBase, so you can leverage HBase best practices and work with time-series data
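
As an illustration of the rowkey point above, one commonly cited HBase pattern for time-series data is a composite key with a reversed timestamp, so the newest reading for an entity sorts first. The sketch below only builds and parses such keys in plain Python. The key layout, the `#` separator, and the function names are assumptions for illustration, not the behavior of any Talend component.

```python
import struct
from typing import Tuple

# Largest signed 64-bit value; subtracting a timestamp from it reverses the order.
MAX_TS_MS = 2**63 - 1


def make_rowkey(sensor_id: str, ts_ms: int) -> bytes:
    """Build '<sensor_id>#<reversed timestamp>' as bytes.

    The reversed timestamp is packed big-endian, so byte-wise (lexicographic)
    ordering matches numeric ordering and newer readings sort first.
    """
    reversed_ts = MAX_TS_MS - ts_ms
    return sensor_id.encode("utf-8") + b"#" + struct.pack(">q", reversed_ts)


def parse_rowkey(key: bytes) -> Tuple[str, int]:
    """Recover (sensor_id, ts_ms) from a key built by make_rowkey."""
    sensor = key[:-9].decode("utf-8")  # everything before '#' plus 8 packed bytes
    (reversed_ts,) = struct.unpack(">q", key[-8:])
    return sensor, MAX_TS_MS - reversed_ts


older = make_rowkey("sensor-42", 1_520_000_000_000)
newer = make_rowkey("sensor-42", 1_520_000_060_000)
print(sorted([older, newer])[0] == newer)  # newest reading sorts first
print(parse_rowkey(newer))
```

Whether a reversed timestamp, a salted prefix, or another layout fits best depends on your read and write patterns.
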

Data Integration

Improve your productivity and project security:
  • Job server security and productivity improvements including:
    • Role-based security: A Studio developer can only execute jobs belonging to a project for which they have authorization
    • Enhanced job server data cleansing actions to ignore active running jobs and any linked dependencies or libraries
    • Scheduling and error handling improvements to restart tasks on unavailable job servers, and virtual job servers with weighted round-robin load balancing
  • Talend Administration Center (TAC) improvements including:
    • Additional Single Sign-on (SSO) options, including support for Ping Identity PingFederate Server, and Microsoft Active Directory Federation Services
    • Greater visibility of what is happening through auditing and security logging, which traces all user interactions including access, modifications, and configuration changes
    • A new auditor role for configuring and accessing the audit log, providing a greater level of security
  • Talend Cloud reduces testing and debugging time from minutes to seconds with a free test engine and the ability to remotely debug big data jobs, and debug jobs on either Talend Cloud Engines or Remote Engines
  • Continuous integration updates including using Maven standards for incremental builds in Studio, broader Git support including Bitbucket Server 5.x, Nexus 3 support for the Talend Artifact Repository, standard Maven commands for data integration and application integration (technical preview), and the ability to easily extend the build process through Maven Plug-ins and custom Project Object Models (POMs)
  • Increase productivity by building custom Talend components. Develop once with the Talend Component Kit, then reuse across all Talend products and integration styles, batch to real-time, data integration to big data, on-premises to cloud
  • Save time by automatically matching similarly named columns with Smart tMap Fuzzy Auto-mapping, which uses data quality algorithms (Levenshtein, Jaccard) to do fuzzy matching
  • Increased flexibility and productivity in job design with the ability to change table names at runtime through ELTMap, and new routines to adapt to changing schemas
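
The fuzzy auto-mapping bullet above names two algorithms, Levenshtein and Jaccard. The following is a minimal sketch of how such measures can pair similarly named columns. The normalization step, the 0.6 threshold, the use of character bigrams, and the way the two scores are combined are all assumptions made for illustration, and this is not Talend's implementation.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def jaccard(a: str, b: str, n: int = 2) -> float:
    """Jaccard similarity over character n-grams."""
    grams_a = {a[i:i + n] for i in range(len(a) - n + 1)}
    grams_b = {b[i:i + n] for i in range(len(b) - n + 1)}
    if not grams_a and not grams_b:
        return 1.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def normalize(name: str) -> str:
    """Lower-case and drop separators so 'zip_code' and 'ZipCode' compare equal."""
    return name.lower().replace("_", "").replace(" ", "")


def auto_map(inputs, outputs, threshold=0.6):
    """Map each output column to its best-scoring input column, if good enough."""
    mapping = {}
    for out in outputs:
        best, best_score = None, 0.0
        for inp in inputs:
            a, b = normalize(inp), normalize(out)
            lev_sim = 1 - levenshtein(a, b) / max(len(a), len(b), 1)
            score = max(lev_sim, jaccard(a, b))
            if score > best_score:
                best, best_score = inp, score
        if best is not None and best_score >= threshold:
            mapping[out] = best
    return mapping


print(auto_map(["cust_name", "zip_code"], ["CustomerName", "ZipCode", "Phone"]))
```

Columns with no sufficiently similar counterpart, such as the hypothetical `Phone` column here, are left unmapped for manual review.
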

Data Quality

Increase the integrity of cloud and on-premises data as it flows through the business:
  • Improved survivorship rules with per column support so you have finer control of the master value you want to keep
  • New component tPatternMasking to define new types of masking patterns for privacy and security control
  • Import and export semantic types from the Dictionary Service UI, making it easier to manage the promotion of semantic types across environments
  • Talend Dictionary Service REST APIs are now publicly available and self-documented via Swagger. You can leverage Talend Dictionary Service in data/application integration scenarios and populate the Talend Dictionary Service programmatically
  • The Dictionary Service UI has been translated to French
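
To make per-column survivorship concrete, here is a small sketch that merges duplicate records into one master record, applying a different rule to each column. The rule names, the sample records, and the `updated` field are hypothetical, and the code only illustrates the general idea rather than how Talend evaluates survivorship rules.

```python
from collections import Counter


def most_recent(records, column):
    """Keep the non-empty value from the most recently updated record."""
    candidates = [r for r in records if r.get(column)]
    return max(candidates, key=lambda r: r["updated"])[column] if candidates else None


def longest(records, column):
    """Keep the longest non-empty value (often the most complete one)."""
    values = [r[column] for r in records if r.get(column)]
    return max(values, key=len) if values else None


def most_common(records, column):
    """Keep the value that appears most often among the duplicates."""
    values = [r[column] for r in records if r.get(column)]
    return Counter(values).most_common(1)[0][0] if values else None


def build_master(records, rules):
    """Apply one survivorship rule per column to produce the master record."""
    return {column: rule(records, column) for column, rule in rules.items()}


# Hypothetical duplicates of the same customer; ISO dates compare correctly as strings.
duplicates = [
    {"name": "J. Smith", "email": "js@old.example", "country": "US", "updated": "2017-03-01"},
    {"name": "John Smith", "email": "john@new.example", "country": "US", "updated": "2018-01-15"},
    {"name": "John S.", "email": "", "country": "USA", "updated": "2018-02-20"},
]
rules = {"email": most_recent, "name": longest, "country": most_common}
print(build_master(duplicates, rules))
```

Choosing the rule per column, rather than picking one winning record, is what gives finer control over each master value.
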

Data Preparation

Deliver the best data preparation user experience at extreme scale:
  • With the Cloud Dictionary Service, you can define new business terms for your data to facilitate data understanding and usage by both people and machines
  • Expanded connectivity options with Redshift and Snowflake self-service connectors
  • Dynamic preparation selection in a Talend job, to improve maintenance and productivity
  • Improved flexibility with new data preparation functions: Basic deduplication, Standardization via data dictionaries, Fill from above, Generate a sequence, Percentages management
  • Support for custom enclosure and escape characters for CSV files makes it possible to handle non-standard or complex CSV files without the need to standardize the file outside Talend Data Preparation
  • UI now supports both French and Japanese
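
For readers unfamiliar with custom enclosure and escape characters, the sketch below parses a non-standard CSV file with Python's standard `csv` module. The sample data, the pipe delimiter, the single-quote enclosure, and the backslash escape are assumptions chosen for illustration. It shows the concept only, not how Talend Data Preparation reads files.

```python
import csv
import io

# Non-standard file: '|' separates fields, single quotes enclose values,
# and a backslash escapes a quote that appears inside a value.
raw = "id|note\n1|'it\\'s here'\n2|'a|b'\n"

reader = csv.reader(
    io.StringIO(raw),
    delimiter="|",
    quotechar="'",      # custom enclosure character
    escapechar="\\",    # custom escape character
    doublequote=False,  # quotes are escaped, not doubled
)
rows = list(reader)

for row in rows:
    print(row)
```

With the enclosure and escape characters declared up front, the embedded quote and the embedded delimiter are both read as part of the value, so the file does not need to be rewritten first.
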

Data Stewardship

Quickly identify, manage, and resolve any data integrity issue:
  • Empower those who know the data best with Talend Cloud Data Stewardship, a team-based, self-service data curation and validation app where you quickly identify, manage, and resolve any data integrity issue
  • With the Cloud Dictionary Service, you can define new business terms for your data to facilitate data understanding and usage by others, both people and machines
  • Users can now import and export campaigns and data models directly from the Talend Data Stewardship UI. This makes it easier to comply with IT policies by managing the promotion of configuration across different environments (downloadable software only)
  • UI now supports French and Japanese

MDM

Design, ingest, author, curate, and update your master data faster:
  • License and identity management via Talend Administration Center for improved security
  • Single sign-on with Data Preparation and Data Stewardship saves time
  • REST API improvement (“IN” operator)
  • Survivorship rules per column in MDM integrated matching
  • Audit all user actions, including login/logout and configuration deployment, for security compliance

Talend Data Mapper

Increase the performance of your complex mappings:
  • tHMapRecord, in addition to receiving, can send complex hierarchical structures to queue outputs such as Kafka (tKafkaOutput) and Kinesis (tKinesisOutput)
  • tHMap can create multiple outputs from a single input, improving productivity
  • New transformation and expression language functions including upper-case, lower-case, translate, and contains
  • Improved conversion between hierarchical data and flat records
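
As a rough picture of what converting hierarchical data to flat records involves, the sketch below flattens a nested order into one flat row per line item. The sample order, the dotted column names, and the function names are hypothetical. Talend Data Mapper defines such mappings graphically, so this is a conceptual illustration only.

```python
def flatten(node: dict, prefix: str = "") -> dict:
    """Collapse nested dictionaries into a single level with dotted keys."""
    flat = {}
    for key, value in node.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, path + "."))
        else:
            flat[path] = value
    return flat


def explode(record: dict, list_field: str):
    """Yield one flat row per element of a repeating child list,
    repeating the parent fields on every row."""
    parent = {k: v for k, v in record.items() if k != list_field}
    for child in record.get(list_field, []):
        yield flatten({**parent, list_field: child})


# Hypothetical hierarchical input.
order = {
    "id": 7,
    "customer": {"name": "Ada", "city": "Paris"},
    "lines": [{"sku": "A1", "qty": 2}, {"sku": "B2", "qty": 1}],
}
flat_rows = list(explode(order, "lines"))
for row in flat_rows:
    print(row)
```

Going the other way, from flat rows back to a hierarchy, means grouping rows by the parent key and nesting the repeated fields.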

Extend Your Data Integration Reach

To see what components are in each Talend product, visit help.talend.com.

New and Updated Hadoop Distributions

  • Amazon EMR 5.8
  • Cloudera CDH 5.12, 5.13
  • MapR 6.0
  • Spark 2.2

New and Updated Components

  • Amazon S3
  • Cloudera Kudu
  • Couchbase
  • FTP
  • HBase
  • Hive
  • MapR-DB OJAI
  • Marketo
  • MarkLogic
  • Microsoft Azure Data Lake Store
  • Microsoft Dynamics CRM 2016 (on-premises)
  • MongoDB
  • Neo4j
  • Oracle Cloud
  • SAP Business Suite
  • SAP HANA
  • SAP S/4HANA
  • Snowflake
  • Sybase
  • Vertica