7 Hidden Gems in Talend Summer ’19 (Talend 7.2)
Over the summer, Talend released Talend Summer ’19, which includes updates to Talend Cloud and Talend 7.2. The new release serves as the industry’s first fast and easy on-ramp to accelerate the development of all types of integration environments, from simple ingestion tasks to the most comprehensive integration scenarios. The key themes are:
- Start Instantly. Get started online with a credit card and pay only for what you use with Talend Pipeline Designer PAYG.
- Faster Integration. Leverage enhanced connectivity (Azure, Databricks) and broader deployment, and simplify governance.
- Scale Development. Enable agile scalability with Docker support for data services, and leverage CI/CD effortlessly.
For those who missed the live webinar, a recording is available.
But as exciting as the release is, it also contains many “hidden gems” that did not make the headlines yet offer tremendous value to the Talend community today.
#1 Hidden Gem – Improved Data Privacy
As data privacy becomes increasingly important, there is a critical need to protect data as it flows across the enterprise (and beyond). With GDPR and similar regulations in force, the following enhancements can save your development team a lot of time.
The tDataMasking and tPatternMasking components can now securely mask data using Format-Preserving Encryption algorithms; providing a password makes the masking repeatable and bijective.
Bijective masking functions have the following characteristics:
- They are consistent masking functions.
- They are injective, meaning that they output two different masked values for two different input values.
- They check that the input data is in a valid format. If the input value is valid, bijective masking functions output a valid value. If the input value is not valid, they output an invalid value or replace the value with null, depending on the masking function used.
The original data is unreadable without knowledge of the password. When data has been masked by tDataMasking or tPatternMasking using a Format-Preserving Encryption algorithm and a password, the tDataUnmasking or tPatternUnmasking component, respectively, can restore the original data by reversing the masking with the same password.
The new tDataEncrypt component protects data by encrypting it with the AES-GCM or Blowfish algorithm and a user-defined password. The encrypted data is unreadable without both the password and the generated cryptographic file. The tDataDecrypt component decrypts data that was encrypted with tDataEncrypt.
Additionally, Talend Data Preparation, Talend Data Stewardship and Talend Dictionary Service are now available for hybrid deployment. This setup gives you the best of both the cloud and on-premises worlds: your sensitive data stays behind your firewall, while you still manage your users and the rest of your platform from Talend Cloud.
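To make repeatable, bijective, password-based masking more concrete, here is a minimal Python sketch of format-preserving encryption for digit strings. It is a toy 10-round Feistel network loosely modelled on the structure of NIST’s FF1 mode, keyed with a PBKDF2-derived key and using HMAC-SHA256 as the round function. It is not Talend’s implementation, not a compliant FF1, and not suitable for production; the salt, iteration count, and the names `derive_key`, `mask`, and `unmask` are illustrative assumptions.

```python
# Toy format-preserving masking sketch (assumption-laden illustration):
# NOT Talend's algorithm and NOT a standards-compliant FF1 implementation.
import hashlib
import hmac
from typing import Optional

ROUNDS = 10  # number of Feistel rounds (FF1 also uses 10)
DIGITS = "0123456789"


def derive_key(password: str, salt: bytes = b"demo-salt") -> bytes:
    """Derive a 32-byte key from the password with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


def _round_value(key: bytes, rnd: int, n: int, half: str) -> int:
    """Keyed pseudo-random round function: HMAC-SHA256 over round, length and half."""
    msg = f"{rnd}|{n}|{half}".encode("ascii")
    return int.from_bytes(hmac.new(key, msg, hashlib.sha256).digest(), "big")


def _is_valid(value: str) -> bool:
    return len(value) >= 2 and all(ch in DIGITS for ch in value)


def mask(value: str, key: bytes) -> Optional[str]:
    """Map a digit string to another digit string of the same length."""
    if not _is_valid(value):
        return None  # invalid format: this sketch replaces it with null
    n, u = len(value), len(value) // 2
    a, b = value[:u], value[u:]
    for i in range(ROUNDS):
        m = u if i % 2 == 0 else n - u  # width of the half updated this round
        c = (int(a) + _round_value(key, i, n, b)) % 10**m
        a, b = b, str(c).zfill(m)
    return a + b


def unmask(masked: str, key: bytes) -> Optional[str]:
    """Invert mask() by running the Feistel rounds backwards with the same key."""
    if not _is_valid(masked):
        return None
    n, u = len(masked), len(masked) // 2
    a, b = masked[:u], masked[u:]
    for i in reversed(range(ROUNDS)):
        m = u if i % 2 == 0 else n - u
        c = (int(b) - _round_value(key, i, n, a)) % 10**m
        a, b = str(c).zfill(m), a
    return a + b


if __name__ == "__main__":
    key = derive_key("s3cret")
    masked = mask("4111111111111111", key)
    print(masked)               # 16 digits, unreadable without the password
    print(unmask(masked, key))  # → 4111111111111111
```

Because each Feistel round is invertible given the key, the mapping is a permutation of all n-digit strings: the same input and password always give the same output (consistent), distinct inputs never collide (injective), and `unmask` with the same password restores the original. Invalid input such as `"12a4"` yields `None` here, which corresponds to the "replace with null" behavior described above.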
#2 Hidden Gem – Data Preparation “Magic Fill”
Artificial Intelligence (AI) is all around us, from smartphones that give us guidance to cars that help us drive more safely. Reviewing and cleaning data can be a laborious task, and that is where AI can help. In Data Preparation, the new “Magic Fill” function lets you define a pattern from a handful of examples and then uses a machine learning algorithm to apply the transformation to a whole column. Magic Fill offers many formatting possibilities, on any data type.
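Talend describes Magic Fill as machine-learning based, and its internals are not covered here. The minimal Python sketch below only illustrates the general idea of learning a transformation from a few examples and applying it to a whole column, using simple rule induction instead of machine learning. It tokenizes each example into runs of digits, letters, and other characters, works out which input token (or which literal) produces each output token, and then runs that learned program on new values. The function names and the tokenization scheme are assumptions for illustration.

```python
# Toy programming-by-example sketch: rule induction, NOT machine learning,
# and NOT the algorithm behind Talend's Magic Fill.
import re
from typing import List, Tuple, Union

# Runs of digits, runs of letters, or runs of anything else (separators).
TOKEN = re.compile(r"\d+|[A-Za-z]+|[^A-Za-z\d]+")

# A program is a list of steps: an int copies that input token, a str is a literal.
Step = Union[int, str]


def tokenize(value: str) -> List[str]:
    return TOKEN.findall(value)


def learn(examples: List[Tuple[str, str]]) -> List[Step]:
    """Infer a token-level program consistent with every (input, output) example."""
    pairs = [(tokenize(i), tokenize(o)) for i, o in examples]
    n_in, n_out = len(pairs[0][0]), len(pairs[0][1])
    if any(len(i) != n_in or len(o) != n_out for i, o in pairs):
        raise ValueError("examples do not share one token structure")
    program: List[Step] = []
    for k in range(n_out):
        # Input positions whose token equals output token k in *all* examples.
        copies = [j for j in range(n_in) if all(i[j] == o[k] for i, o in pairs)]
        if copies:
            program.append(copies[0])
        elif len({o[k] for _, o in pairs}) == 1:
            program.append(pairs[0][1][k])  # same literal in every example
        else:
            raise ValueError(f"cannot explain output token {k}")
    return program


def apply_program(program: List[Step], value: str) -> str:
    """Run a learned program on a new value with the same token structure."""
    tokens = tokenize(value)
    return "".join(tokens[s] if isinstance(s, int) else s for s in program)


if __name__ == "__main__":
    program = learn([("2019-07-04", "04/07/2019"), ("2020-12-25", "25/12/2020")])
    column = ["2018-01-31", "2017-11-05"]
    print([apply_program(program, v) for v in column])  # → ['31/01/2018', '05/11/2017']
```

With a single example such as `2019-07-07` → `07/07/2019`, the output token `07` could come from two input positions, so the sketch relies on a second example to disambiguate. Supplying a handful of varied examples helps for the same reason.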
#3 Hidden Gem – Support for Databricks transient clusters and Delta Lake
Controlling cloud costs for big data jobs is a must. One way to do this is with transient clusters: compute clusters that shut down, and stop billing, once the process finishes. With this release, transient clusters are supported for Databricks on AWS through a new option not to restart the Databricks cluster when submitting a Job. This helps control costs and improves the enterprise readiness of Talend with Databricks.
With technical-preview support for Databricks Delta Lake (the tDeltaLakeInput and tDeltaLakeOutput components), you can leverage Delta Lake’s ACID compliance, Time Travel (data versioning), and unified batch and streaming processing. This brings three benefits to your big data projects:
- Better data consistency by leveraging Talend’s native data quality capabilities with Delta Lake ACID transactions
- Easy rollbacks and reprocessing thanks to Talend’s integration with Delta Lake Time Travel and data versioning capabilities
- High-volume processing at scale due to Talend’s support of the Delta Lake scale-out architecture
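Time Travel works because Delta Lake records every change to a table as an ordered commit in a transaction log (the `_delta_log` directory), so any earlier version can be reconstructed by replaying the log up to that point. The Python sketch below is a deliberately simplified, in-memory model of that idea, with commits reduced to `add`/`remove` file actions. It is not the Delta Lake API or Talend’s implementation, and the file names are made up; in Spark itself, an older version is read with Delta’s `versionAsOf` read option.

```python
# Toy, in-memory model of a Delta-style transaction log (illustration only).
# Each commit is a list of actions such as {"add": "file"} or {"remove": "file"}.
# Real Delta commits are JSON files under _delta_log/ with much richer actions.
from typing import Dict, List, Set

Commit = List[Dict[str, str]]


def snapshot(log: List[Commit], version: int) -> Set[str]:
    """Replay commits 0..version and return the data files visible at that version."""
    if not 0 <= version < len(log):
        raise ValueError(f"version {version} does not exist")
    files: Set[str] = set()
    for commit in log[: version + 1]:
        for action in commit:
            if "add" in action:
                files.add(action["add"])
            elif "remove" in action:
                files.discard(action["remove"])
    return files


def restore(log: List[Commit], version: int) -> None:
    """Roll back by appending a new commit that makes `version` current again."""
    target = snapshot(log, version)
    current = snapshot(log, len(log) - 1)
    commit: Commit = [{"remove": f} for f in sorted(current - target)]
    commit += [{"add": f} for f in sorted(target - current)]
    log.append(commit)


if __name__ == "__main__":
    log: List[Commit] = [
        [{"add": "part-000.parquet"}],    # v0: initial load
        [{"add": "part-001.parquet"}],    # v1: append
        [{"remove": "part-000.parquet"},  # v2: faulty overwrite
         {"remove": "part-001.parquet"},
         {"add": "part-002.parquet"}],
    ]
    print(sorted(snapshot(log, 2)))  # → ['part-002.parquet']
    print(sorted(snapshot(log, 1)))  # → ['part-000.parquet', 'part-001.parquet']
    restore(log, 1)                  # v3 undoes v2; v2 stays in the history
    print(sorted(snapshot(log, 3)))  # → ['part-000.parquet', 'part-001.parquet']
```

A rollback here is just another commit, so the faulty version stays in the history and can still be inspected or reprocessed. This only works because removed files are logically deleted rather than physically erased; in real Delta tables, old files remain on storage until they are vacuumed.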
#4 Hidden Gem – Talend Studio Commit/Push Projects
It is a small change, but it makes life a lot easier: Talend Cloud (Studio) users can now manually commit and push projects to Git with a custom commit message.
#5 Hidden Gem – Free REST API Tester
We know that testing your own services, and other people’s, can be a real chore. Available as a free Chrome extension, the API Tester (formerly known as Restlet Client) lets you visually create and run HTTP requests and scenarios, making it easier to discover and test APIs. It supports:
- API Invocation and Interaction - Make calls to HTTP APIs to validate their functional behavior, testing different parameters and values without writing a program or script.
- API Testing - Create and run unit tests as well as complex API scenarios decorated with assertions to validate your APIs.
- API Automation - Run API scenarios and tests with a Maven plugin that you can integrate into your CI/CD pipeline (e.g. Jenkins, Travis CI, Bamboo).
- API Orchestration - Combine multiple API requests into API scenarios, passing variables from one response to the next request.
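The API Tester is a visual tool, but it can help to see what a small scenario amounts to. The Python sketch below uses only the standard library to run two chained requests against a local stand-in service: it asserts on each response and passes a value captured from the first response into the second request. The `DemoApi` server, its `/orders` endpoints, and `run_scenario` are hypothetical, and this is not the API Tester’s scenario format or its Maven plugin.

```python
# Plain-Python illustration of an API test scenario (hypothetical endpoints);
# not the format used by Talend API Tester.
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Bypass any system proxy settings so requests reach the local demo server.
OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


class DemoApi(BaseHTTPRequestHandler):
    """Stand-in for a real service with two hypothetical endpoints."""

    def _send_json(self, status: int, payload: dict) -> None:
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_POST(self) -> None:  # POST /orders creates an order
        self.rfile.read(int(self.headers.get("Content-Length", 0)))
        if self.path == "/orders":
            self._send_json(201, {"id": 42})
        else:
            self._send_json(404, {"error": "not found"})

    def do_GET(self) -> None:  # GET /orders/42 returns that order
        if self.path == "/orders/42":
            self._send_json(200, {"id": 42, "status": "created"})
        else:
            self._send_json(404, {"error": "not found"})

    def log_message(self, *args) -> None:  # keep the demo output quiet
        pass


def run_scenario(base_url: str) -> dict:
    """Two chained requests with assertions, like a small API test scenario."""
    create = urllib.request.Request(
        f"{base_url}/orders", data=b"{}", method="POST",
        headers={"Content-Type": "application/json"},
    )
    with OPENER.open(create) as resp:
        assert resp.status == 201, resp.status
        order_id = json.load(resp)["id"]  # variable captured from response 1
    with OPENER.open(f"{base_url}/orders/{order_id}") as resp:  # reused in request 2
        assert resp.status == 200, resp.status
        order = json.load(resp)
    assert order["status"] == "created", order
    return order


if __name__ == "__main__":
    server = HTTPServer(("127.0.0.1", 0), DemoApi)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        print(run_scenario(f"http://127.0.0.1:{server.server_address[1]}"))
    finally:
        server.shutdown()
        server.server_close()
```

In a CI/CD pipeline, the same idea applies: a failed assertion fails the build, which is what makes scenario-based API tests useful for automation.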
#6 Hidden Gem – Snowflake updates
Snowflake is an extremely popular cloud data warehouse. In this release, the Talend Snowflake components come with several new features so you can leverage the flexibility of external tables and materialized views directly within Talend.
In the tSnowflakeBulkExec component, you can now specify:
- The path or location where data is uploaded, and the prefix of the files to upload
- The maximum number of retries when an error occurs while loading data to the S3 bucket
- The maximum number of retries when an error occurs while loading data to internal Snowflake storage
The tSnowflakeOutputBulkExec, tSnowflakeBulkExec, and tSnowflakeOutputBulk components now offer the “Use Custom S3 Connection Configuration” option. The tSnowflakeOutputBulk and tSnowflakeOutputBulkExec components now also support the Non-empty Storage Folder Action option, which lets you choose whether to add new files, cancel the upload, or replace the existing files.
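The three choices offered by the Non-empty Storage Folder Action option can be pictured with a local-filesystem analogy. The Python sketch below is purely illustrative: the enum, `upload_files`, and the exact semantics of each mode (for example, that adding a file with an existing name overwrites it) are assumptions rather than the components’ documented behavior, and it writes to a local folder instead of S3 or Snowflake storage.

```python
# Local-filesystem analogy for the Non-empty Storage Folder Action option.
# Hypothetical names and semantics; not the Snowflake components' code.
import tempfile
from enum import Enum
from pathlib import Path
from typing import Dict


class NonEmptyFolderAction(Enum):  # hypothetical names for the three choices
    ADD = "add"          # keep existing files and add the new ones
    CANCEL = "cancel"    # abort the upload if the folder already has files
    REPLACE = "replace"  # delete the existing files, then upload


def upload_files(files: Dict[str, bytes], folder: Path,
                 action: NonEmptyFolderAction) -> None:
    """Write `files` into `folder`, honoring `action` if the folder is not empty."""
    folder.mkdir(parents=True, exist_ok=True)
    existing = [p for p in folder.iterdir() if p.is_file()]
    if existing and action is NonEmptyFolderAction.CANCEL:
        raise FileExistsError(f"{folder} is not empty; upload cancelled")
    if existing and action is NonEmptyFolderAction.REPLACE:
        for p in existing:
            p.unlink()
    for name, data in files.items():  # in ADD mode a same-named file is overwritten
        (folder / name).write_bytes(data)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        stage = Path(tmp) / "stage"
        upload_files({"day1.csv": b"a,b\n"}, stage, NonEmptyFolderAction.ADD)
        upload_files({"day2.csv": b"c,d\n"}, stage, NonEmptyFolderAction.ADD)
        print(sorted(p.name for p in stage.iterdir()))  # → ['day1.csv', 'day2.csv']
        upload_files({"day3.csv": b"e,f\n"}, stage, NonEmptyFolderAction.REPLACE)
        print(sorted(p.name for p in stage.iterdir()))  # → ['day3.csv']
```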
Finally, the Snowflake components for Spark Batch have been added (technical preview).
#7 Hidden Gem – ESB Docker images
Docker makes it incredibly easy to create, deploy and run applications in containers. Microservices built from data services or routes can now be packaged with all of their dependencies and shipped as a single package. This new feature lets users easily package their services for multi-cloud platforms, speed up deployment, improve CI efficiency, and ensure compatibility and maintainability.
That’s it for our selection of hidden gems in the Talend Summer ’19 release. If you’re still in a treasure-hunting mood, check out 5 Hidden Gems in Talend Spring ’18 (Talend 7.0) and Hidden gems in Talend Component Palette.