Big Data Features Comparison Matrix

Modeling / Documentation

  Talend Open Studio for Big Data Talend Enterprise Big Data Talend Platform for Big Data
Business Modeler   checkmark checkmark
Auto Doc checkmark checkmark checkmark


  Talend Open Studio for Big Data Talend Enterprise Big Data Talend Platform for Big Data
Job Designer checkmark checkmark checkmark
Big Data Components checkmark checkmark checkmark
MapReduce Job Designer   checkmark checkmark
Pig Map checkmark checkmark checkmark
Google BigQuery checkmark checkmark checkmark
YARN (MapReduce 2.0) Support checkmark checkmark checkmark
Visual MapReduce Job Optimization checkmark checkmark
Data Viewer for Hadoop checkmark checkmark
Hadoop Security with Kerberos checkmark checkmark checkmark
Support for Big Data Hadoop Distributions checkmark checkmark checkmark
ETL Support checkmark checkmark checkmark
ELT Support checkmark checkmark checkmark
Versioning   checkmark checkmark
Shared Repository   checkmark checkmark
Repository Manager     checkmark
Wizards checkmark checkmark
Impact Analysis   checkmark checkmark
Data Lineage   checkmark checkmark
Change Data Capture   checkmark checkmark
Business Rules   checkmark checkmark
NoSQL Support checkmark checkmark checkmark
Centralized Metadata Management   checkmark checkmark
Visual Mapping for Complex XML and EDI   checkmark


  Talend Open Studio for Big Data Talend Enterprise Big Data Talend Platform for Big Data
Talend Administration Center   checkmark checkmark
Web Based Deployment   checkmark checkmark
Execution Plan, Time And Event Based Scheduler   checkmark checkmark
High Availability, Load Balancing And Failover   checkmark checkmark
Hadoop Deployer and Scheduler checkmark checkmark checkmark


  Talend Open Studio for Big Data Talend Enterprise Big Data Talend Platform for Big Data
Activity Monitoring Console   checkmark checkmark
Dashboard   checkmark checkmark
Error Recovery   checkmark checkmark

Data Quality

  Talend Open Studio for Big Data Talend Enterprise Big Data Talend Platform for Big Data
Data Profiling And Database Analysis     checkmark
Big Data Profiling     checkmark
Built-In Pattern Library     checkmark
Batch Execution Of Analyses     checkmark
Indicators (Simple, Text, Summary And Advanced Statistics & Metrics)     checkmark
History Of Analyses     checkmark
Report Generation     checkmark

Data Cleansing

  Talend Open Studio for Big Data Talend Enterprise Big Data Talend Platform for Big Data
Pattern Matching     checkmark
Name And Address Cleansing     checkmark
Third-Party Address Validation Services     checkmark
Fuzzy Matching (Soundex, Soundexfr, Levenshtein, Jaro-Winkler, Q-Gram)     checkmark
Record Matching (Match, No Match, Suspect)     checkmark
Hadoop Matching     checkmark

Reporting and Portal

  Talend Open Studio for Big Data Talend Enterprise Big Data Talend Platform for Big Data
Intuitive Administration Web-Based Console     checkmark
Report On Potential Primary Keys     checkmark
Report On Orphan Tables     checkmark
Predefined Global Quality Gauges     checkmark
Access To OLAP Analysis Structures     checkmark
Customized Queries And Reports     checkmark
Report Import/Export     checkmark

License Type & Indemnification

  Talend Open Studio for Big Data Talend Enterprise Big Data Talend Platform for Big Data
Open Source License checkmark    
Source Code Access checkmark checkmark checkmark
Subscription License   checkmark checkmark
Indemnification And Warranty   checkmark checkmark

Support & Documentation

  Talend Open Studio for Big Data Talend Enterprise Big Data Talend Platform for Big Data
Community-Based: Forums, Bugtracker... checkmark checkmark checkmark
Access To Talend Technical Support checkmark checkmark checkmark
Enterprise Grade Support With SLAs   checkmark checkmark
Documentation checkmark checkmark checkmark
Premium Services Levels     checkmark

Job Designer

The Job Designer provides both a graphical and a functional view of the actual integration processes using a graphical palette of components and connectors. Integration processes are built by simply dragging and dropping the components and connectors onto a graphical workspace, drawing connections and relationships between them, and setting their properties.
The Job Designer gives access, via an exhaustive library of components, to all types of sources and targets needed for data integration, data migration or synchronization processes. Components and connectors cover all types of tasks and operations on the data itself, on data management, and on data flow sequencing. Connectors help access, read and write all data source and target systems for data integration, data migration and data synchronization. Parameters are configured centrally in one view when selecting each component involved in the Job, or can be inherited from the Metadata Manager (repository). Complex components are equipped with dedicated, intuitive graphical interfaces or built-in wizards that help users build their Jobs.

To maintain the readability of a Job design, the Job diagram can be divided into Subjobs, which can then be set out as child and parent Jobs to sequence their execution. Orchestration components and various types of relationships help users sequence process execution. A built-in console view lets users quickly monitor execution and check and track performance directly from the Studio.

Big Data Components

Support for a rich palette of easy-to-configure components for Hadoop Distributed File System (HDFS), Pig, HCatalog, HBase, Sqoop and Hive.

MapReduce Job Designer

Talend MapReduce jobs are a major productivity tool that allows ETL developers with zero Hadoop experience to develop complex data transformations, just as they would for Oracle or MySQL, and have Talend auto-generate a Map/Reduce job that can be run on any Hadoop platform as a native job. The generated code is 100% Apache Hadoop compliant and Apache licensed. Developers simply create an input schema, visually map the data to some output format, and then deploy the job on Hadoop, where it is executed just like any other M/R job.
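
For readers new to the model, the map, shuffle and reduce phases that such a job goes through can be sketched in a few lines of plain Python. This is only an illustration of the general MapReduce pattern, not the code Talend generates; the `orders` data and all function names are made up for the example.

```python
from collections import defaultdict

def map_phase(records):
    """Emit one (key, value) pair per input row: here (country, amount)."""
    for row in records:
        yield row["country"], row["amount"]

def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Collapse the values of each key into a single result (a sum here)."""
    return {key: sum(values) for key, values in groups.items()}

# Hypothetical input rows, standing in for data read from HDFS.
orders = [
    {"country": "FR", "amount": 10},
    {"country": "US", "amount": 5},
    {"country": "FR", "amount": 7},
]

totals = reduce_phase(shuffle(map_phase(orders)))
print(totals)
```

On a real cluster the map and reduce functions run in parallel on many nodes; the sketch only shows the data flow between the phases.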

Pig Map

Pig Map allows developers familiar with Pig Latin to design and implement complex data transformations from within Studio. Developers simply drag and drop inbound data and map it to the outbound data to create a data processing pipeline, which is then automatically generated as a Pig job that can be executed on any Hadoop platform. Pig Map also supports custom UDFs (User Defined Functions), built either within Talend Open Studio or elsewhere, allowing custom libraries to be reused as functions in a data job.

Google BigQuery

Support for Google BigQuery, allowing users to upload large datasets to Google's Cloud Platform and extract results.

YARN (MapReduce 2.0) Support

Talend supports YARN natively, so customers can immediately benefit from the better resource management (optimization and scheduling) that YARN offers, leading to higher performing applications.

Visual MapReduce Job Optimization

Visual statistics and indicators are provided for MapReduce jobs in Talend Studio, such as seeing the impact of running an aggregate at different points in the job.

Data Viewer for Hadoop

With Talend Studio, users can see a sample of the data set they are working on, helping developers quickly build logic.

Support for Big Data Hadoop Distributions

Talend simplifies integration with these new big data platforms, so you spend less time integrating systems and more time using them.

ETL Support

ETL (Extract, Transform & Load) is the default mode used by Talend's data integration solutions. It consists of processing data rows one right after the other in a flow mode. This mode is specifically adapted for heterogeneous environments and it enables the integration of any technology in the source and target systems (web service, files, databases, MOM, business applications, etc.). ETL mode can also be used in both batch and real-time processing. The ETL processes can be run in parallel to further accelerate their execution.

Talend's unique architecture is not restricted to any execution engine since it generates autonomous processes that can be deployed on any server (internal or external to the company). Also, the ETL processes can be executed as close to the data as possible minimizing access time and bandwidth consumption in addition to eliminating bottlenecks.

In the same Job, ETL can be combined with the ELT approach (see following paragraph) to obtain the highest level of performance without any architectural constraints.
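
The row-by-row flow mode described above can be pictured with Python generators, where each row travels through extract, transform and load before the next one is read. This is a minimal sketch under assumed names (`extract`, `transform`, `load`, a tiny in-memory CSV source and a list as the target), not a representation of Talend's generated code.

```python
import csv
import io

def extract(text):
    """Read rows one at a time from a CSV source."""
    yield from csv.DictReader(io.StringIO(text))

def transform(rows):
    """Clean each row as it flows through; no full dataset is held in memory."""
    for row in rows:
        yield {"name": row["name"].strip().title(), "age": int(row["age"])}

def load(rows, target):
    """Write each transformed row to the target (a list stands in for a table)."""
    for row in rows:
        target.append(row)
    return target

# Hypothetical source data with inconsistent formatting.
source = "name,age\n alice ,30\nBOB,25\n"
target = load(transform(extract(source)), [])
print(target)
```

Because the stages are chained lazily, the source and the target can be entirely different technologies, which is the point of the heterogeneous-environment argument.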

ELT Support

Talend's data integration solutions also support ELT mode (Extract, Load & Transform) for processing data in set operations (using the Union, Except and Intersect operators) directly on the DBMS of the target database.

This mode is for use in a homogeneous environment (one database) and has advantages for processing very large volumes of data in "data warehouse appliance" environments like Teradata, Netezza, etc.

In the same Job, ELT can be combined with the ETL approach (see previous paragraph) to obtain the highest level of performance without architectural constraints.
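
To make the contrast with ETL concrete, here is a small sketch of a set-based transformation pushed down to the database. It uses Python's built-in `sqlite3` module purely as a stand-in for a target DBMS; the table names (`staging_customers`, `dw_customers`) and data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE staging_customers (id INTEGER, email TEXT);
CREATE TABLE dw_customers (id INTEGER, email TEXT);
INSERT INTO staging_customers VALUES
    (1, 'a@example.com'), (2, 'b@example.com'), (3, 'c@example.com');
INSERT INTO dw_customers VALUES
    (1, 'a@example.com'), (2, 'b@example.com');
""")

# The transformation runs inside the database as one set operation:
# only rows present in staging but absent from the warehouse are inserted.
conn.execute("""
INSERT INTO dw_customers
SELECT id, email FROM staging_customers
EXCEPT
SELECT id, email FROM dw_customers
""")

rows = conn.execute("SELECT id, email FROM dw_customers ORDER BY id").fetchall()
print(rows)
```

No row ever leaves the database here, which is why this mode suits large volumes in a single, homogeneous environment.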


Versioning

Talend Open Studio simplifies versioning of items. Versioning best practices facilitate item reusability and simplify reverting to a previous development stage.

A major and minor version number is automatically set at Job creation, and can then be easily incremented over time and when updates occur by using the dedicated version control panel available directly in the Designer perspective.

All items created in Studio can be versioned: Business Models, Jobs, Routines, Metadata, and Documentation.

Shared Repository

The Shared Repository (or Metadata Manager) is designed to consolidate all project information and enterprise metadata in a centralized repository shared by all stakeholders in the integration processes. It helps to store and share all Talend items including: Business Models, Jobs (processes), Joblets, Routines, Metadata definitions (such as connections to source/target systems).

Studio users are granted access to projects according to their roles and permissions defined in Talend Administration Center.

Behind the shared repository is an industry-standard source manager (Subversion) that stores and manages all versions of all items. It handles different branches and provides check-in/check-out, manual or automatic commit, and commenting. An automatic locking system guarantees that the job being designed is effectively locked and that no other user can change the same job at the same time.


Wizards

Talend Studio graphical wizards speed development and deployment.

Data Lineage

The Data Lineage feature helps you understand where a change occurred.

This feature is available from the Metadata Manager and can be carried out on any column of any metadata (database, file). The result of the data lineage analysis is shown in a report that traces a change from the target end component of a job back to the source end.
You can export this report as an HTML file.

Change Data Capture

Data warehousing involves the extraction and transfer of data from one or more databases into one or more target systems for analysis. However, this means extracting and transferring huge volumes of data, which can consume a great deal of both resources and time.

The ability to capture only the changed data in real time is known as Change Data Capture (CDC). Capturing changes reduces the traffic of data between systems and helps reduce ETL time.
Talend CDC architecture is based on a publisher/subscriber model. The publisher captures the data changes and makes them available to the subscribers (Talend Jobs). Subscribers utilize the data changes obtained from the publisher.

This feature detects changed records in real time, allowing the changed data to be sent immediately to Subscriber Jobs and consequently cutting the time needed to load and update data during ETL or operational data integration.

Talend's Change Data Capture features the most commonly used modes: Trigger and Redo logs. The available mode depends on the type of databases involved.
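
The Trigger mode and the publisher/subscriber model can be illustrated with a small, self-contained sketch. It uses Python's built-in `sqlite3` module as a stand-in database; the change table layout, the trigger names and the `consume_changes` helper are assumptions made for the example and do not reflect the objects Talend actually creates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);

-- Publisher side: a change table filled by triggers on the source table.
CREATE TABLE customers_changes (
    change_id INTEGER PRIMARY KEY AUTOINCREMENT,
    op TEXT, id INTEGER, name TEXT
);
CREATE TRIGGER cdc_ins AFTER INSERT ON customers BEGIN
    INSERT INTO customers_changes (op, id, name) VALUES ('I', NEW.id, NEW.name);
END;
CREATE TRIGGER cdc_upd AFTER UPDATE ON customers BEGIN
    INSERT INTO customers_changes (op, id, name) VALUES ('U', NEW.id, NEW.name);
END;
CREATE TRIGGER cdc_del AFTER DELETE ON customers BEGIN
    INSERT INTO customers_changes (op, id, name) VALUES ('D', OLD.id, OLD.name);
END;
""")

def consume_changes(conn, last_seen):
    """Subscriber side: fetch only the changes recorded after `last_seen`."""
    rows = conn.execute(
        "SELECT change_id, op, id, name FROM customers_changes "
        "WHERE change_id > ? ORDER BY change_id",
        (last_seen,),
    ).fetchall()
    new_last = rows[-1][0] if rows else last_seen
    return rows, new_last

conn.execute("INSERT INTO customers VALUES (1, 'Ada')")
conn.execute("UPDATE customers SET name = 'Ada L.' WHERE id = 1")
changes, offset = consume_changes(conn, 0)

conn.execute("DELETE FROM customers WHERE id = 1")
later, offset = consume_changes(conn, offset)
print(changes, later)
```

Each subscriber keeps its own offset, so it only ever processes the rows that changed since its last run instead of re-reading the whole source table.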

Business Rules

Business rules are generally defined by business users through specification documents which are then interpreted and implemented by technical staff.

Talend embeds a business rule engine that helps users configure their own business rules. Users can define market segmentation criteria (by age, region...) and set their business rules via an Excel spreadsheet or through the Drools Guvnor interface, directly in the web-based Talend Administration Center.

The Drools Guvnor interface enables business experts to use a graphical editor to create and edit rules quickly and directly, control access to rules and other features, and manage rule versions and modification over time. Rules can be tested and called from the developed jobs.
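
As a rough picture of what a segmentation rule set looks like once it is separated from job logic, the sketch below keeps the rules in a table of (segment, condition) pairs evaluated in order. It is plain Python, not Drools syntax, and the segment names, thresholds and the `segment` function are all hypothetical.

```python
# Ordered rule table: the first matching rule wins.
RULES = [
    ("senior", lambda c: c["age"] >= 65),
    ("young_adult_emea", lambda c: 18 <= c["age"] < 30 and c["region"] == "EMEA"),
    ("adult", lambda c: c["age"] >= 18),
]

def segment(customer, rules=RULES, default="minor"):
    """Return the name of the first rule whose condition the customer meets."""
    for name, condition in rules:
        if condition(customer):
            return name
    return default

print(segment({"age": 24, "region": "EMEA"}))
print(segment({"age": 70, "region": "APAC"}))
```

Keeping the rules as data rather than code is what lets business users edit them (in a spreadsheet or a rule editor) without touching the job that calls them.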

NoSQL Support

Developers use the Talend Studio palette to wire together a data-flow to move data in and out of their NoSQL systems.

Visual Mapping for Complex XML and EDI


Business Modeler

The Business Modeler is a non-technical tool (like Microsoft Visio). It helps you structure all relevant documentation and technical elements supporting the data integration process in a business-friendly diagram, allowing different teams (Design, Dev, Test, Prod...) to work on a common model using a common tool.
For example, business users use business models to express their data integration needs. The IT development and operation staff can thus better understand these business needs and translate them into technical processes (Jobs). After each technical implementation stage (Jobs) is completed, the business model can easily be updated, showing the progress of development for other stakeholders to follow up.

DBAs can use business models to share the required DB connection metadata, and system architects can thus get the big picture of the data integration requirements.
Designing business models is a best practice that organizations should adopt at a very early stage of a data management or integration project in order to ensure its success. Because Business Models usually help to quickly detect and resolve project bottlenecks and weak points, they help limit budget overruns and/or reduce the upfront investment.

Auto Doc

This functionality generates, on request, detailed technical documentation for all your jobs. This documentation gathers job metadata (author, version, status, update date, etc.), a graphical view of the job and all the parameters of all the components used in the job, in an easy-to-use interactive format (HTML / XML).

This documentation can be easily enriched with personalized comments.

Talend Administration Center

All subscription offers come with one Studio (or more depending on the user number) and a software part which can be installed on a server and administrated through a web-based interface, the Talend Administration Center, a lightweight application (in a browser, no deployment needed) that helps integration project managers to administrate users, projects, user privileges, and licenses.

All Studios switch from local mode to a remote connection to the projects defined in the Talend Administration Center.

Project authorizations are assigned easily on a per user basis (supporting LDAP directory). Users are granted rights to access projects based on their role, e.g. No permission, Read Only, Read & Write.

Users can then share repository items (Jobs, Business Models, DB connection metadata...) with other users directly from the Studio for the projects they are authorized on.

Numerous additional plug-ins are available on the left navigation panel (Dashboard, SOA manager, Server manager...) depending on your license.

Execution Plan, Time And Event Based Scheduler

The Event Scheduler extends time-based scheduling capabilities for real-time integration.
The event listener allows process executions to be triggered on demand or in response to an event.
The Execution Plan feature helps you sequence and orchestrate various job executions and ease error recovery, directly from the Job Conductor. The execution plan is a task-based feature that outlines dependencies among different tasks in a sequence.

The task dependencies are defined through a hierarchical view of main and child tasks where each task can have a subordinate task.

Execution plans can be scheduled and triggered, and can use environment-defined execution parameters from this Job Conductor.
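
The hierarchy of main and child tasks can be thought of as a tree that is walked from the top. The sketch below is one possible reading of that idea, with the assumption (not stated in the text above) that children run only when their parent succeeds while sibling branches continue independently; the plan structure, task names and `run_plan` function are hypothetical.

```python
def run_plan(task, run, log=None):
    """Run `task`, then its children only if it succeeded (depth-first)."""
    log = [] if log is None else log
    ok = run(task["name"])
    log.append((task["name"], "ok" if ok else "failed"))
    if ok:
        for child in task.get("children", []):
            run_plan(child, run, log)
    return log

# Hypothetical plan: one main task with two child branches.
plan = {
    "name": "load_staging",
    "children": [
        {"name": "build_dims", "children": [{"name": "build_facts"}]},
        {"name": "refresh_reports"},
    ],
}

# Simulated executor: every task succeeds except build_dims.
log = run_plan(plan, run=lambda name: name != "build_dims")
print(log)
```

In this run `build_facts` is skipped because its parent failed, which is the kind of dependency an execution plan is meant to make explicit.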

Events can be file-based, such as a file appearing, disappearing or being modified, or SQL-based, using "wait for" conditions. Once the expected event is identified, the execution task is triggered and the job deployment is carried out.

You can easily add new event triggers to any task for further automation.
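
The three file-based events mentioned above (a file appearing, disappearing or being modified) can be detected by comparing two snapshots of a watched directory. The sketch below shows that comparison; the polling approach and the `snapshot` and `diff_events` helpers are assumptions for illustration, not a description of how the product detects events.

```python
from pathlib import Path

def snapshot(directory):
    """Map each file name in `directory` to its last-modified timestamp."""
    return {
        p.name: p.stat().st_mtime_ns
        for p in Path(directory).iterdir()
        if p.is_file()
    }

def diff_events(before, after):
    """Compare two snapshots and list the file events between them."""
    events = []
    for name in sorted(after.keys() - before.keys()):
        events.append(("appeared", name))
    for name in sorted(before.keys() - after.keys()):
        events.append(("disappeared", name))
    for name in sorted(before.keys() & after.keys()):
        if before[name] != after[name]:
            events.append(("modified", name))
    return events

# Snapshots are plain dicts, so the logic can be shown without touching disk.
events = diff_events({"a.csv": 1, "b.csv": 1}, {"b.csv": 2, "c.csv": 1})
print(events)
```

A scheduler loop would call `snapshot` periodically on the watched directory and trigger the task attached to each event returned by `diff_events`.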

The Time-based scheduler helps you roll out a job execution at a defined time and date (e.g. the first Monday of the month, every Tuesday...) or on a regular basis over a period of time. A Task is used to centralize all information necessary for the job execution (project name, job name, job version, server...).
The task is then triggered on schedule and the job is deployed and executed automatically on the defined server at the defined time. A convenient status system helps you monitor the triggering state and the execution success or failure directly from the Job Conductor.
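
As an example of what a rule like "first Monday of the month" resolves to, the following sketch computes the next such date with the standard `datetime` module. The function names are hypothetical and the sketch says nothing about how the scheduler itself stores or evaluates its triggers.

```python
from datetime import date, timedelta

def first_monday(year, month):
    """Return the first Monday of the given month."""
    first_day = date(year, month, 1)
    # weekday() is 0 for Monday, so this is the offset to the next Monday.
    return first_day + timedelta(days=(0 - first_day.weekday()) % 7)

def next_first_monday(today):
    """Return the next 'first Monday of the month' on or after `today`."""
    candidate = first_monday(today.year, today.month)
    if candidate >= today:
        return candidate
    if today.month == 12:
        return first_monday(today.year + 1, 1)
    return first_monday(today.year, today.month + 1)

print(next_first_monday(date(2024, 1, 2)))
```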

With the Professional Edition, an additional event/file based scheduling feature is available.

High Availability, Load Balancing And Failover

High availability is achieved by deploying multiple job conductors and job execution servers, while clustering the databases guarantees failover and prevents any execution disruption.
The Grid Conductor module (accessible through the Job Conductor) optimizes the scalability and availability of integration processes. The Grid Conductor relies on the definition of virtual servers, which group available resources, regardless of the system type (CPU, OS...).

Tasks are assigned to virtual servers of the Grid Conductor rather than to a single execution server.

Via a constant monitoring of the resources available on the execution servers, Grid Conductor removes bottlenecks by guaranteeing that all jobs execute smoothly at triggering time while leveraging available resources. This alleviates any concerns related to resource preemption when a large number of jobs run concurrently, or when non-dedicated servers are used. Grid Conductor also provides automatic failover in the event an execution resource becomes unavailable.
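
The idea of assigning a task to a virtual server rather than to one machine can be sketched as a selection over the group's members: choose the least-loaded server that is currently reachable, and fall over to another when one goes down. The `pick_server` function, the load figures and the server names below are hypothetical, and the real selection criteria may differ.

```python
def pick_server(virtual_server, load, is_up):
    """Choose the available execution server with the lowest current load."""
    candidates = [s for s in virtual_server if is_up.get(s, False)]
    if not candidates:
        raise RuntimeError("no execution server available")
    return min(candidates, key=lambda s: load.get(s, 0.0))

# Hypothetical virtual server grouping three execution servers.
virtual_server = ["exec-1", "exec-2", "exec-3"]
load = {"exec-1": 0.9, "exec-2": 0.2, "exec-3": 0.5}

chosen = pick_server(virtual_server, load, {"exec-1": True, "exec-2": True, "exec-3": True})

# Failover: exec-2 becomes unavailable, so the next best server is used.
fallback = pick_server(virtual_server, load, {"exec-1": True, "exec-2": False, "exec-3": True})
print(chosen, fallback)
```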

Hadoop Deployer and Scheduler

The Hadoop Deployer and Scheduler allows users to remotely deploy a job to Hadoop. Leveraging Apache Oozie, the scheduler is 100% open source and will work with any Hadoop distribution that has Oozie installed.

Users can configure the scheduler to run the job immediately or using a time-based trigger.

Activity Monitoring Console

Talend Activity Monitoring Console (AMC) provides detailed monitoring capabilities for consolidating collected log information, understanding underlying job interactions, preventing faults that could be unexpectedly generated and supporting system management decisions.

The AMC monitors job events (successes, failures, warnings, etc.), execution times and data volumes through a single console.

This tool is available as a stand-alone tool accessible through a browser or as part of Talend Studio.


Dashboard

The Dashboard is a browser-based version of the Activity Monitoring Console. The Dashboard provides execution performance diagrams and status indicators, enabling any stakeholder to view both the current and historical status of any integration process execution.

It also provides detailed monitoring capabilities that can be used to consolidate log information collected, understand the underlying component and job interaction, provide task execution information in a timely manner, prevent faults that could be unexpectedly generated, and support system management decisions.

Error Recovery

Job execution processes can be time-consuming, as are backup and restore operations.

Talend Open Studio includes a recovery checkpoint capability that is set up at Job design time.

In case of failure, processes can be resumed from one of the checkpoints. Job developers can also design and integrate specific error management in response to specific error conditions using the checkpoint "on-failure" instruction function.

Recovery checkpoints can be appropriately initiated at specified intervals of the data flow (on trigger connections). The purpose is to minimize the amount of time and effort necessary when a job execution process needs to be restarted due to a failure.

With the help of the error recovery checkpoint feature, the process can be restarted from the latest checkpoint prior to the failure (or any other checkpoint before the failure occurred), rather than from the beginning of the job execution process.
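
The restart-from-checkpoint behavior can be sketched as a job runner that reports the step at which it failed and can later resume from that step. This is a simplified model in which every step boundary is a checkpoint; `run_job`, the step functions and the simulated failure are hypothetical.

```python
def run_job(steps, state, resume_from=None):
    """Run steps in order; return the name of the failing step, or None."""
    names = [name for name, _ in steps]
    start = names.index(resume_from) if resume_from else 0
    for name, action in steps[start:]:
        try:
            action(state)
        except Exception:
            return name  # checkpoint to resume from
        state["completed"].append(name)
    return None

attempts = {"transform": 0}

def extract(state):
    state["rows"] = [1, 2, 3]

def transform(state):
    attempts["transform"] += 1
    if attempts["transform"] == 1:
        raise RuntimeError("simulated failure on first attempt")
    state["rows"] = [r * 10 for r in state["rows"]]

def load(state):
    state["loaded"] = list(state["rows"])

steps = [("extract", extract), ("transform", transform), ("load", load)]
state = {"completed": []}

failed_at = run_job(steps, state)                        # fails at "transform"
result = run_job(steps, state, resume_from=failed_at)    # extract is not rerun
print(failed_at, result, state["loaded"])
```

The second call skips `extract` entirely, which is where the time saving comes from when the early steps of a job are expensive.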

Data Profiling And Database Analysis

Offers an overview of the content of a catalog. It computes the number of tables and the number of rows per table for each catalog and/or schema.
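
The kind of overview described here (number of tables, rows per table) can be reproduced on a small scale with Python's built-in `sqlite3` module. SQLite stands in for the analyzed database, and the `catalog_overview` function and sample tables are assumptions for the example.

```python
import sqlite3

def catalog_overview(conn):
    """Return {table_name: row_count} for every user table in the database."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )
    ]
    return {
        table: conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
        for table in tables
    }

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER, name TEXT);
CREATE TABLE orders (id INTEGER, customer_id INTEGER);
INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
INSERT INTO orders VALUES (10, 1), (11, 1), (12, 2);
""")

overview = catalog_overview(conn)
print(len(overview), overview)
```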

Big Data Profiling

Big data profiling allows users to analyze the data in their Hive database on Hadoop through Talend Studio, which generates native HiveQL code that is executed on the Hadoop cluster.

Built-In Pattern Library

Use out-of-the-box or custom SQL expressions to evaluate and test data. Use regular expressions to evaluate data validity, including e-mail, part numbers, postcodes and more.
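
A pattern library boils down to named regular expressions applied to a column, with the share of matching values reported as a validity measure. The sketch below uses deliberately simple patterns; the `PATTERNS` dictionary, the pattern definitions and the `match_rate` function are illustrative assumptions, not the patterns shipped with the product.

```python
import re

# Simplified, illustrative patterns; production patterns are usually stricter.
PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "us_zip": re.compile(r"^\d{5}(-\d{4})?$"),
}

def match_rate(values, pattern_name):
    """Return the fraction of values that match the named pattern."""
    pattern = PATTERNS[pattern_name]
    matched = sum(1 for value in values if pattern.match(value))
    return matched / len(values) if values else 0.0

emails = ["a@b.com", "bad", "c@d.org", "x@y"]
print(match_rate(emails, "email"))
```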

Batch Execution Of Analyses

Run your analysis as part of a data integration or MDM job, or call analysis from an outside application.

Indicators (Simple, Text, Summary And Advanced Statistics & Metrics)

Indicators include row counts, null counts, unique values, duplicate counts, blank counts, min/max lengths, frequencies, patterns and specific phone number statistics and much more. Also includes mathematical statistics like mean, median and range.
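
Several of these indicators are straightforward to compute on a single column, as the sketch below shows. The definitions used (for instance, "unique" meaning values that occur exactly once and "duplicate" meaning values that occur more than once) are assumptions for the example and may not match the product's exact definitions.

```python
from collections import Counter
from statistics import mean, median

def text_indicators(values):
    """Simple and text indicators for one column; None represents a null."""
    non_null = [v for v in values if v is not None]
    counts = Counter(non_null)
    lengths = [len(v) for v in non_null]
    return {
        "row_count": len(values),
        "null_count": len(values) - len(non_null),
        "blank_count": sum(1 for v in non_null if v.strip() == ""),
        "distinct_count": len(counts),
        "unique_count": sum(1 for c in counts.values() if c == 1),
        "duplicate_count": sum(1 for c in counts.values() if c > 1),
        "min_length": min(lengths) if lengths else None,
        "max_length": max(lengths) if lengths else None,
    }

def numeric_indicators(values):
    """Summary statistics for a numeric column."""
    return {
        "mean": mean(values),
        "median": median(values),
        "range": max(values) - min(values),
    }

country = text_indicators(["FR", "US", "FR", None, ""])
amounts = numeric_indicators([1, 2, 3, 10])
print(country, amounts)
```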

History Of Analyses

Store the history of data quality.

Report Generation

Generate reports to share with your team in PDF, HTML, XLS formats and more.

Pattern Matching

Ensure that data conforms to specific shapes and patterns.

Name And Address Cleansing

Cleanse common address attributes, like name, address, state, city, postal code using included patterns and reference data. Leverage any trusted source to standardize and enrich your data.

Third-Party Address Validation Services

Leverage third-party address validation vendors to check addresses and validate them for postal discounts.

Fuzzy Matching (Soundex, Soundexfr, Levenshtein, Jaro-Winkler, Q-Gram)

Includes algorithms for finding relationships in data using fuzzy matching. Use one of the included algorithms or customize with your own.
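
Of the algorithms listed, Levenshtein distance is the easiest to show in full: it counts the minimum number of single-character insertions, deletions and substitutions needed to turn one string into another. The implementation below is the standard dynamic-programming version; the `similarity` normalization is one common convention, chosen here as an assumption.

```python
def levenshtein(a, b):
    """Minimum number of single-character edits turning `a` into `b`."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]

def similarity(a, b):
    """Normalize the distance to a score between 0 (different) and 1 (equal)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest

print(levenshtein("kitten", "sitting"), similarity("Smith", "Smyth"))
```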

Intuitive Administration Web-Based Console

Publish data quality metrics to a web-based portal to share the status of data quality with a cross-functional team.

Report On Potential Primary Keys

Understand which attributes are potential primary keys and validate those attributes that should be unique.
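
A column is a potential primary key when it contains no nulls and no repeated values. The sketch below applies that test to a small in-memory sample; the `candidate_keys` function and the sample rows are hypothetical, and a real report would run the equivalent check in the database.

```python
def candidate_keys(rows):
    """Return the columns whose values are all non-null and all distinct."""
    if not rows:
        return []
    keys = []
    for column in rows[0]:
        values = [row[column] for row in rows]
        if None not in values and len(set(values)) == len(values):
            keys.append(column)
    return keys

rows = [
    {"id": 1, "email": "a@x.com", "country": "FR"},
    {"id": 2, "email": "b@x.com", "country": "FR"},
    {"id": 3, "email": None, "country": "US"},
]
print(candidate_keys(rows))
```

Here `email` is rejected because of the null and `country` because of the repeated value, leaving `id` as the only candidate.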

Report On Orphan Tables

Check the relationships of tables in your relational database and uncover orphan tables.

Predefined Global Quality Gauges

Create a data quality dashboard that tracks current and historical status of data quality.

Open Source License

Talend Open Studio products are free to download and use under an open source license.
Details of the license used can be found on the specific product download page.

The Apache License

Source Code Access

Talend Open Studio source code is available at:

For complete transparency and consistency, Talend also provides the clients who request it access to the source code of all of the tools available in the commercial edition.

Subscription License

The "enterprise" versions include value-added features and services that enhance the open source products; these versions are distributed under a commercial license.
Talend's pricing model guarantees transparency and predictability: the price is not based on the volumes of data or potential additional needs for connectors or CPUs, rather it corresponds to the number of developers (Studio), the level of features (edition selected) and the subscription term.

This subscription approach guarantees your return on investment: the number of licenses can be increased or decreased every year to adapt to the evolution of a project's range and its staff.

The Talend solutions are cheaper to deploy, maintain and support; they are 50 to 80% less expensive than the equivalent proprietary solutions.

Indemnification And Warranty

Because open source software results from collaborative development efforts, the final code combines contributions from diverse resources. If the integration of the various contributions to the code is not carefully managed and controlled, the final software use might infringe upon the original contributors' rights. The end user might then be subject to legal and financial prosecution for infringement, even though such infringement was not intentional.

Talend offers an Indemnification clause to its subscription customers. This is a guarantee for the user that Talend will provide legal and financial protection, in the event that the Talend code infringes the rights of a third party.

Community-Based: Forums, Bugtracker...

The Talend user community, composed of tens of thousands of professionals, is extremely active. The main contributions of the community include:

• Testing and the quality of new versions,

• Requests for new features,

• Product translation and localization,

• Support and exchanges via the forums,

• Development and sharing of new components, connectors, jobs, models and other plug-ins.

Talend Exchange enables community members to publish their own plug-ins in order to share them with other users. Some of these contributions are ultimately integrated into the product, after Talend's in-house R&D team completes in-depth testing and improvements.

Additionally, Talend contributes to numerous key open source projects and is a member of the Eclipse and Apache Foundations. For more info on this see

Access To Talend Technical Support

By subscribing to Talend support services, users benefit from the knowledge of Talend's technical experts, who are directly connected with Talend's R&D organization.

Enterprise Grade Support With SLAs

By subscribing to Talend Support Services, you benefit from the experience of our in-house technical services experts, who are in daily contact with our research and development team. These services were established to ensure the effectiveness, security, and peace of mind of our subscription customers. They are available in three levels: Silver, Gold and Platinum. Each of these levels is associated with guarantees related to the initial time spent to respond to a declared bug, the response time spent to provide a patch, etc.


Documentation

The documentation is available as a free download in PDF format, in English and French. Two guides, the User Guide and the Components Reference Guide, are available at:

Furthermore, you can have a look at our Tutorials which are a valuable source of information!

You can access them at:

Premium Services Levels

Talend offers several levels of service to address your data management requirements.

Repository Manager

Talend Repository Manager is a cross-platform, multi-repository administration tool to access the common centralized resources and to manage project and artifact migration between repositories.

Talend Repository Manager enables software development lifecycle (SDLC) best practices by allowing cooperative work between administrators and users who work with multiple repositories across multiple IT environments, for example development, test and production.

Record Matching

Based on your thresholds for matching, identify records that are matches, records that are unique, and those records that need manual inspection to determine match status.
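
The three outcomes (match, no match, suspect) follow from comparing a similarity score against two thresholds. The sketch below uses `difflib.SequenceMatcher` from the Python standard library as a stand-in scoring function; the threshold values and the `classify` and `compare` helpers are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher

def compare(a, b):
    """Case-insensitive similarity score between 0 and 1."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def classify(score, match_threshold=0.9, suspect_threshold=0.7):
    """Sort a pair of records into match, suspect or no match."""
    if score >= match_threshold:
        return "match"
    if score >= suspect_threshold:
        return "suspect"  # needs manual inspection
    return "no match"

print(classify(compare("Acme Corp", "acme corp")))
print(classify(0.8), classify(0.3))
```

Raising the match threshold sends more pairs to manual inspection, while lowering it accepts more pairs automatically; where to set the two values depends on the cost of a wrong merge.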