This past January, Talend Data Preparation 2.0 officially went live! In this article, I want to highlight some of the new features we’ve packed into our latest release.
In a nutshell, Talend Data Preparation 2.0 democratizes the exploitation of big data by taking into account the data types specific to each customer and by scaling its performance to match. It is also the first data preparation tool to support Apache Beam, which keeps you at the forefront of data processing environments (first Spark, then MapR, Apache Flink, Apache Apex, etc.).
The latest release of Talend Data Preparation also retains all the functional power and user experience that let users get data perfectly cleansed, enriched, and standardized in minutes rather than days. Here are three big updates to Talend Data Preparation that you should know about:
Data Preparation to Democratize your Big Data and Data Lake
How do you allow non-technical business users to fully exploit the treasure trove of big data stored in data lakes? Think of marketing teams who want to analyze click streams from the website or sales tickets coming back from the store network, or finance, accounting, and purchasing users who want to work with vendor billing details or historical data on customers' financial health.
Talend Data Preparation helps you unleash the full power of your data lake! Business users can confidently access all the data sources to which they are entitled, so that data can be viewed, discovered, cleansed, and standardized according to their own management rules in minutes rather than days.
Functionally, IT provides users with self-service "sanctioned" big datasets from the data lake through an HDFS connector. Depending on their rights, users can go further and access the data lake directly. From there, they prepare the data intuitively through a web interface, at their own pace as they explore each data file.
To let business users handle big data files with millions or billions of rows, Talend Data Preparation has them work on a smart sample of the data; the preparation steps they record are then automatically applied to the full dataset. These preparations are put into production by IT, which delivers the results back into the data lake or into any business application, on-premises or in the cloud. Here again, users can be granted greater autonomy according to your governance rules: they can generate their own export files.
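To make this sample-then-apply pattern concrete, here is a minimal Python sketch. The names (`Recipe`, `build_sample`) and the steps are purely illustrative assumptions, not Talend's actual API: preparation steps are recorded while the user edits a sample, then replayed unchanged over the full dataset.

```python
import random

def build_sample(rows, size=10000, seed=42):
    """Draw a reproducible random sample for the user to work on interactively."""
    rng = random.Random(seed)
    return rng.sample(rows, min(size, len(rows)))

class Recipe:
    """Ordered list of preparation steps recorded during sample editing."""
    def __init__(self):
        self.steps = []

    def add(self, fn):
        self.steps.append(fn)
        return self

    def apply(self, rows):
        """Replay every recorded step over the full dataset."""
        for step in self.steps:
            rows = [step(row) for row in rows]
        return rows

# The user cleans a country column while looking only at the sample...
recipe = Recipe().add(lambda r: {**r, "country": r["country"].strip().upper()})

# ...and the exact same steps then run against every row.
full_dataset = [{"country": " fr "}, {"country": "de"}, {"country": " Us"}]
cleaned = recipe.apply(full_dataset)
print(cleaned)  # [{'country': 'FR'}, {'country': 'DE'}, {'country': 'US'}]
```

The key design point is that the recipe is data, not ad hoc edits, so IT can later run it in production against the full data lake.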
Talend Data Preparation also provides self-service access and export to the Hadoop Distributed File System (HDFS) in CSV, Parquet, and Avro formats, with native support for Kerberos authentication.
Note that Talend Data Preparation also lets any user prepare and integrate data from any database (via a JDBC connector), any application, or any Excel or CSV file received by email or stored locally. This broad connectivity serves all data exploitation scenarios.
Talend Data Preparation Automatically Learns Data Language
Every company works both with standard data (last names, first names, telephone numbers, VAT numbers, cities, countries, etc.) and with data specific to its business (customer numbers, product codes, analytical accounting codes, etc.).
If your data preparation application can't recognize the semantic type of this specific data, how can it guarantee reliable data discovery, effective diagnosis, and genuine self-sufficiency? Simply put, if it does not know how to recognize, and then learn, your specific data types, you will fall into the old 80/20 trap of data preparation.
Talend Data Preparation speaks your business language by taking your specific semantic data types into account. Its Data Dictionary Service analyzes and defines them once and for all, so you benefit from the same automated, effortless analysis no matter which data you need to prepare.
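As an illustration of how semantic type detection can work in principle, here is a toy Python sketch; the type names, patterns, and threshold are assumptions for the example, not Talend's Data Dictionary Service. A column is tagged with whichever registered type validates the largest share of its values:

```python
import re

# A registry mixing standard types with a company-specific one.
# All patterns here are simplified, illustrative examples.
SEMANTIC_TYPES = {
    "fr_vat_number": re.compile(r"^FR\d{11}$"),           # e.g. FR12345678901
    "phone_number":  re.compile(r"^\+?\d[\d \-]{6,14}$"),
    "product_code":  re.compile(r"^PRD-\d{4}$"),          # company-specific type
}

def detect_semantic_type(values, threshold=0.8):
    """Return the best-matching type name, or None if no type clears the bar."""
    best_name, best_ratio = None, 0.0
    for name, pattern in SEMANTIC_TYPES.items():
        hits = sum(1 for v in values if pattern.match(v))
        ratio = hits / len(values)
        if ratio > best_ratio:
            best_name, best_ratio = name, ratio
    return best_name if best_ratio >= threshold else None

column = ["PRD-0001", "PRD-0002", "PRD-9134", "n/a", "PRD-0777"]
print(detect_semantic_type(column))  # 4 of 5 values match -> "product_code"
```

Once a column is tagged this way, the tool can flag the non-matching values ("n/a" above) as invalid, which is exactly what makes self-diagnosis possible on company-specific data.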
Talend Data Preparation Now Supports Apache Beam
Talend Data Preparation opens possibilities for all users by democratizing the operation of your big data environment and data lake in a matter of minutes. But exploiting these enormous volumes of data, which are both extremely varied and generated in real time, requires advanced data processing performance. Given the pace of innovation in big data, investments can quickly become obsolete and cost-prohibitive for companies. The race for innovation becomes a brake on adoption (for example, a new version of Spark ships on average every six months).
To help companies escape this vicious cycle, Talend now supports Apache Beam, and was the first data preparation application to do so. This enables companies to deliver a sustainable data preparation service to their users, regardless of the underlying platform.
Functionally, Apache Beam spares users from rewriting applications as technologies evolve, as systems migrate to the cloud, or as integration scenarios change (batch, real time). Users create their data preparation models once and run them anywhere, on unlimited data volumes. Talend Data Preparation 2.0 delivers unprecedented agility, seamless scalability, and state-of-the-art performance.
Technically, Apache Beam adds an abstraction layer between the data preparation application and the various data processing execution environments. Beam hides this complexity, allowing Talend Data Preparation to remain agnostic to the underlying processing technology.
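To show what such an abstraction layer buys you, here is a deliberately simplified Python sketch; it is not Beam's actual API, just the core idea. The pipeline only records a description of the transforms, and a pluggable runner decides how to execute that description: an in-process runner here, where a Spark or Flink runner would translate the same description into a distributed job.

```python
class Pipeline:
    """Records transforms as data instead of executing them immediately."""
    def __init__(self):
        self.transforms = []  # (kind, fn) pairs

    def map(self, fn):
        self.transforms.append(("map", fn))
        return self

    def filter(self, fn):
        self.transforms.append(("filter", fn))
        return self

class InProcessRunner:
    """One possible execution backend. A Spark or Flink runner would
    interpret the very same pipeline description in its own engine,
    so the pipeline never needs rewriting when the engine changes."""
    def run(self, pipeline, data):
        for kind, fn in pipeline.transforms:
            if kind == "map":
                data = [fn(x) for x in data]
            elif kind == "filter":
                data = [x for x in data if fn(x)]
        return data

p = Pipeline().map(lambda x: x * 2).filter(lambda x: x > 4)
result = InProcessRunner().run(p, [1, 2, 3, 4])
print(result)  # [6, 8]
```

Because the pipeline is just a description, swapping execution environments means swapping the runner, not the preparation logic, which is the portability argument made above.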
Test Talend Data Preparation now and look out for more updates in the future!