Open Source License
Talend Open Studio for Data Integration and Talend Open Studio for Big Data are free to download and use under an open source license.
GNU General Public License (GPL)
The GNU General Public License (GPL) establishes the legal conditions for the distribution of free software. The purpose of the GNU GPL is to guarantee the following rights to the user:
• the right to execute the software for any use and without limitation;
• the right to study how the software works and adapt it to their needs.
If the author of modifications to the software decides to distribute the modified software, he or she must do so under the GPL. The full text of the GPL can be viewed at: http://www.opensource.org/licenses/gpl-2.0.php
Talend Commercial License
The enterprise versions of Talend offerings include value-added features and services that enhance the open source products; these versions are distributed under a commercial license.
Source Code Access
Talend Open Studio for Data Integration source code is available at: http://www.talendforge.org/trac/tos/
For complete transparency and consistency, Talend also provides clients who request it with access to the source code of all the tools available in the commercial edition.
The "enterprise" versions include value-added features (see below) and services that enhance the open source products; these versions are distributed under a commercial license. Talend's pricing model guarantees transparency and predictability: the price is not based on data volumes or on potential additional needs for connectors or CPUs; rather, it corresponds to the number of developers (Studio), the feature level (edition selected) and the subscription term. This subscription approach guarantees your return on investment: the number of licenses can be increased or decreased every year to adapt to the evolution of a project's scope and staffing. Talend solutions are cheaper to deploy, maintain and support; they are 50 to 80% less expensive than equivalent proprietary solutions.
The Business Modeler is a non-technical tool that enables collaboration between business and technical users. It structures all relevant documentation and technical elements supporting the data integration process in a business-friendly diagram, allowing different teams (Design, Dev, Test, Prod...) to work on a common model, using a common tool.
For example, business users use business models to express their data integration needs. The IT development and operation staff can thus better understand these business needs and translate them into technical processes (Jobs). After each technical implementation stage (Jobs) is completed, the business model can easily be updated, showing the progress of development for other stakeholders to follow up.
DBAs can use business models to share the required DB connection metadata, and system architects can thus get the big picture of the data integration requirements.
Designing business models is a best practice that organizations should adopt at a very early stage of a data management or integration project in order to ensure its success. Because Business Models usually help detect and resolve project bottlenecks and weak points quickly, they help limit budget overruns and/or reduce the upfront investment.
This functionality generates, on request, detailed technical documentation for all your Jobs. This documentation gathers Job metadata (author, version, status, update date, etc.), a graphical view of the Job, and all the parameters of every component used in the Job, in an easy-to-use interactive format (HTML/XML).
This documentation can be easily enriched with personalized comments.
With AutoDoc+, the technical documentation (see Auto Doc) is automatically generated for each version of each Job: when you save a Job, its documentation is updated and stored in the repository, so it is automatically shared and available to all users.
AutoDoc+ also lets you customize the graphical display of this documentation by adding your own logo and company name, or by changing the colors through a customized CSS.
Enterprise grade support with SLAs
By subscribing to Talend Support Services, you benefit from the experience of our in-house technical experts, who are in daily contact with our R&D team. These services were established to ensure the effectiveness, security, and peace of mind of our subscription customers. They are available in three levels: Silver, Gold and Platinum. Each level is associated with guarantees on the initial response time for a reported bug, the time to deliver a patch, etc.
User Guide, Reference Guide
The documentation of Talend Open Studio for Data Integration is available as a free download in PDF format, in English and French. Two guides, the User Guide (276 pages) and the Components Reference Guide, are available at: http://www.talend.com/resources/documentation.php
You can also buy a printed version of those guides on Amazon:
Furthermore, you can have a look at our Tutorials, which are a valuable source of information!
You can access them at: http://www.talendforge.org/tutorials/menu.php
The Job Designer provides both a graphical and a functional view of the actual integration processes using a graphical palette of components and connectors. Integration processes are built by simply dragging and dropping the components and connectors onto a graphical workspace, drawing connections and relationships between them, and setting their properties.
The Job Designer capabilities give access, via an exhaustive library of components, to all types of sources and targets needed for data integration, data migration or synchronization processes. Components and connectors cover all types of tasks and operations on the data itself, on data management, and on data-flow sequencing. Connectors provide read/write access to all source and target systems for data integration, data migration and data synchronization. Parameters are configured centrally in one view when selecting each component involved in the Job, or can be inherited from the Metadata Manager (repository). Complex components are equipped with dedicated, intuitive graphical interfaces or built-in wizards that help users build their Jobs.
To maintain the readability of a Job design, the Job diagram can be divided into Subjobs, which can then be set out as child and parent Jobs to sequence their execution. Orchestration components, as well as various types of relationships, help users sequence their process execution. A built-in console view lets users quickly monitor execution and check and track performance directly from the Studio.
Talend offers native technical and business open source connectors to access all IT environments. This wide array of ever-expanding connectors is the key to the successful interoperability of applications and databases. It allows bridging diverse and heterogeneous data structures at unmatched performance rates. More than 450 components are available, free of charge, 60% of which are designed and developed by the Talend community.
Connectors and components developed externally can be shared via the Talend Exchange. A number of submitted components go through validation and optimization by Talend, before they get integrated natively and supported.
Refer to http://www.talendforge.org/components for an exhaustive list of supported connectors.
ETL (Extract, Transform & Load) is the default mode used by Talend's data integration solutions. It consists of processing data rows one after another in a flow mode. This mode is specifically adapted to heterogeneous environments and enables the integration of any technology in the source and target systems (web services, files, databases, MOM, business applications, etc.). ETL mode can be used for both batch and real-time processing. ETL processes can be run in parallel to further accelerate their execution.
Talend's unique architecture is not restricted to any execution engine since it generates autonomous processes that can be deployed on any server (internal or external to the company). Also, the ETL processes can be executed as close to the data as possible minimizing access time and bandwidth consumption in addition to eliminating bottlenecks.
In the same Job, ETL can be combined with the ELT approach (see following paragraph) to obtain the highest level of performance without any architectural constraints.
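The flow mode described above, where rows stream one by one from source through transformations to target, can be sketched as follows. This is a conceptual illustration only; Talend actually generates autonomous Java processes, and the field names here are invented for the example.

```python
# Conceptual sketch of the ETL flow mode: rows stream one at a time
# from the source, through a transformation, to the target.
# (Illustration only -- not Talend's generated code.)

def extract(source_rows):
    """Yield rows one at a time (flow mode)."""
    for row in source_rows:
        yield row

def transform(rows):
    """Apply a row-level transformation, e.g. normalise a name field."""
    for row in rows:
        yield {**row, "name": row["name"].strip().upper()}

def load(rows, target):
    """Write each transformed row to the target as it arrives."""
    for row in rows:
        target.append(row)

source = [{"id": 1, "name": " alice "}, {"id": 2, "name": "bob"}]
target = []
load(transform(extract(source)), target)
print(target)  # rows arrive transformed, one after the other
```

Because the generators pull rows one at a time, memory use stays constant regardless of the source volume, which is what makes the flow mode suitable for both batch and real-time processing.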
Talend's data integration solutions also support ELT mode (Extract, Load & Transform), processing data in set operations (using the Union, Except and Intersect operators) directly on the DBMS of the target database.
This mode is for use in a homogeneous environment (one database) and has advantages for processing very large volumes of data in "data warehouse appliance" environments like Teradata, Netezza, etc.
In the same Job, ELT can be combined with the ETL approach (see previous paragraph) to obtain the highest level of performance without architectural constraints.
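The ELT principle, loading data into the target DBMS first and then transforming it there with set operations, can be sketched as below. SQLite stands in for the target database (a real ELT Job would push equivalent SQL to Teradata, Netezza, etc.), and the table names are illustrative.

```python
# Sketch of the ELT approach: data is loaded into the target DBMS first,
# then transformed there with a set operation (EXCEPT, in this case).
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE staging_a (customer_id INTEGER)")
cur.execute("CREATE TABLE staging_b (customer_id INTEGER)")
cur.executemany("INSERT INTO staging_a VALUES (?)", [(1,), (2,), (3,)])
cur.executemany("INSERT INTO staging_b VALUES (?)", [(2,), (3,), (4,)])

# The transformation runs inside the DBMS as a single set operation,
# so no rows travel back to the integration server.
cur.execute("""
    CREATE TABLE only_in_a AS
    SELECT customer_id FROM staging_a
    EXCEPT
    SELECT customer_id FROM staging_b
""")
rows = cur.execute("SELECT customer_id FROM only_in_a").fetchall()
print(rows)  # [(1,)]
```

Keeping the transformation inside the database is what gives ELT its advantage on very large volumes: the set operation runs where the data already lives.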
Talend Open Studio simplifies versioning of items. Versioning best practices facilitate item reusability and simplify reverting to a previous development stage.
A major and minor version number is automatically set at Job creation, and can then be easily incremented over time and when updates occur by using the dedicated version control panel available directly in the Designer perspective.
All items created in Studio can be versioned: Business Models, Jobs, Routines, Metadata, and Documentation.
While developing jobs with Talend you may need to view the content of various source or target systems (files, DB, etc.). The Data Viewer is directly accessible within the Studio through a simple right-click on any component. It is a convenient way to view the data contained in your source/target systems, regardless of their format (Excel, DB table, CSV...), while you are developing your integration processes. The Data Viewer drills down into the source/target systems regardless of the application usually needed to open them: Notepad for txt and csv files, a SQL query browser for database tables, MS Excel for .XLS files, an HTML browser, etc. There is no need to browse systems with multiple tools: the Data Viewer uses the defined source/target path settings to go straight to the actual data.
Dynamic schemas allow jobs to be designed when the column structure and number of columns are unknown. Depending on the developer's choice, dynamic columns can be mapped directly to the target in pass-through mode. Dynamic schemas make certain types of jobs easy to design, such as a replication scenario, a simple one-to-one mapping of many columns, or migrating a whole database with hundreds of tables without knowing all of the table structures.
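The pass-through idea can be sketched as follows: the same job logic handles any column structure because it never names the columns. This is a conceptual illustration with invented table data, not Talend's internal schema model.

```python
# Sketch of a dynamic-schema pass-through: the job does not know the
# column names in advance, so every incoming column is mapped one-to-one
# to the target, whatever the structure is at run time.

def pass_through(source_rows, target_rows):
    """Copy rows regardless of their column structure (dynamic schema)."""
    for row in source_rows:
        target_rows.append(dict(row))  # every column carried over as-is

# The same job works for tables with entirely different structures:
customers = [{"id": 1, "name": "alice", "country": "FR"}]
orders = [{"order_id": 9, "amount": 12.5}]

migrated = []
pass_through(customers, migrated)
pass_through(orders, migrated)
print(migrated)
```

This is why a single dynamic-schema job can migrate hundreds of tables: the mapping is structural (every column to its namesake) rather than enumerated column by column.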
The Data Lineage feature helps you understand where a change occurred.
This feature is available from the Metadata Manager and can be carried out on any column of any metadata item (DB, file). The result of the data lineage is a report that traces a change from the target end component of a Job back to its source.
You can export this report as an HTML file.
The Job Compare feature helps identify differences between two job versions or different jobs.
Job Compare is fully integrated in Talend Enterprise Data Integration Studio. The result of Job Compare is a visual, interactive report in HTML or XML in which differences are highlighted.
In this example, the comparison report shows that the delimiter field in the tFileInputDelimited component properties is not defined the same way for both jobs being compared: in version 3.2 delimiter is "\t" while in version 4.2 it is "\n".
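The kind of comparison involved can be sketched as a diff over component property maps, like the tFileInputDelimited example above. The property keys below are illustrative, not Talend's exact internal names.

```python
# Sketch of a property-level comparison between two job versions:
# differences are collected as {property: (old_value, new_value)}.

def compare_properties(old, new):
    """Return every property whose value differs between the versions."""
    keys = set(old) | set(new)
    return {k: (old.get(k), new.get(k))
            for k in keys if old.get(k) != new.get(k)}

# Two versions of a tFileInputDelimited component's settings
# (property names are illustrative):
v32 = {"component": "tFileInputDelimited", "delimiter": "\\t", "header": 1}
v42 = {"component": "tFileInputDelimited", "delimiter": "\\n", "header": 1}

diff = compare_properties(v32, v42)
print(diff)  # only the delimiter differs between the two versions
```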
Joblets help you factorize a job part (or Subjob) into a Joblet component. Simply select the components forming the Job part you need to reuse or want to factorize and click on the "Refactor to Joblet" menu item.
Automatically, the job design gets simplified, as the selected components are collapsed into a single Joblet component. This Joblet component can be shared through the dedicated Joblets folder in the Palette of components and is thus easily reusable in any other Job.
Joblets drastically simplify the maintenance of redundant and complex jobs.
Additionally, an "Impact Analysis" mechanism helps you find out which jobs use a defined Joblet.
The Reference Projects help avoid duplication (copy-paste) of items (Jobs, Routines, Documentation, Metadata...) between projects.
"Slave" projects are linked to one (or more) "Master" project(s) by reference and thus inherit items from the parent project(s).
The resources coming from the Master project appear in the Slave project in read-only mode: they are available for reuse and execution but cannot be modified.
Because a strong link is established between Slave and Master projects, as soon as someone modifies an item in the Master project, all Slave projects are updated accordingly.
The Reference Projects share all redundant items of a project (Jobs, templates, metadata) in order to make them available to other projects. This feature helps leverage and reuse the 30% of items that are usually common to all Data Integration projects, drastically reducing the associated maintenance.
Change Data Capture
Data warehousing involves the extraction and transfer of data from one or more databases into one or more target systems for analysis. However, this means extracting and transferring huge volumes of data, which can be very costly in both resources and time.
The ability to capture only the changed data in real time is known as Change Data Capture (CDC). Capturing changes reduces the traffic of data between systems and helps reduce ETL time.
Talend CDC architecture is based on a publisher/subscriber model. The publisher captures the data changes and makes them available to the subscribers (Talend Jobs). Subscribers utilize the data changes obtained from the publisher.
This feature detects changed records in real time, allowing the changed data to be sent immediately to Subscriber Jobs, thus cutting the time needed to load and update data during ETL or operational data integration.
Talend's Change Data Capture features the most commonly used modes: Trigger and Redo logs. The available mode depends on the type of databases involved.
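Trigger mode can be sketched as follows: a database trigger (the publisher) records each change in a change table that subscriber Jobs then read. SQLite stands in for the source database, and the table and column names are illustrative.

```python
# Sketch of trigger-mode CDC: a trigger captures every change into a
# change table, so subscribers read only the deltas, not the full table.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
cur.execute("CREATE TABLE cdc_changes (table_name TEXT, row_id INTEGER, op TEXT)")

# The 'publisher': a trigger recording each insert in the change table.
cur.execute("""
    CREATE TRIGGER capture_insert AFTER INSERT ON customers
    BEGIN
        INSERT INTO cdc_changes VALUES ('customers', NEW.id, 'INSERT');
    END
""")

cur.execute("INSERT INTO customers VALUES (1, 'alice')")
cur.execute("INSERT INTO customers VALUES (2, 'bob')")

# The 'subscriber' (a Talend Job) consumes only the captured changes.
changes = cur.execute("SELECT * FROM cdc_changes").fetchall()
print(changes)  # [('customers', 1, 'INSERT'), ('customers', 2, 'INSERT')]
```

Redo-log mode achieves the same result without triggers, by reading the database's transaction log; which mode is available depends on the database involved.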
Business rules are generally defined by business users through specification documents which are then interpreted and implemented by technical staff.
Talend Enterprise Data Integration embeds a business rule engine that helps users configure their own business rules. Users can thus define market segmentation criteria (by age, region...) and set their business rules via an Excel spreadsheet or through the Drools Guvnor interface, directly in the web-based Talend Administration Center.
The Drools Guvnor interface enables business experts to use a graphical editor to create and edit rules quickly and directly, control access to rules and other features, manage rule versions and modification over time. Rules can be tested and called from the developed jobs.
Contexts enable nearly any parameter of components or Jobs to be externalized. This lets users, for example, define parameters on the fly at run time, or use different settings for testing and production.
Contexts can be defined as needed for all types of environments (Development, Test, Production...) with no limitation in terms of number of context created.
Users can switch contexts at any time, at design time or run time, to use the defined settings.
Parameter values can also be changed via a dialog box at design and testing time. Additionally, a dedicated parameter-loading component can be used to override any value dynamically.
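The context mechanism can be sketched as below: the same job definition resolves its parameters from whichever context is active, and a run-time override plays the role of the parameter-loading component. Context and parameter names are illustrative; Talend actually stores contexts in the repository.

```python
# Sketch of context switching: one job, several externalized parameter
# sets, with optional dynamic overrides at run time.

CONTEXTS = {
    "Development": {"db_host": "localhost", "db_name": "dev_db"},
    "Production":  {"db_host": "db.example.com", "db_name": "prod_db"},
}

def run_job(context_name, overrides=None):
    """Resolve parameters from the chosen context, allowing overrides."""
    params = dict(CONTEXTS[context_name])
    params.update(overrides or {})   # e.g. values loaded dynamically
    return "connecting to {db_name}@{db_host}".format(**params)

print(run_job("Development"))
print(run_job("Production", overrides={"db_name": "prod_db_v2"}))
```

Because the parameters live outside the job logic, promoting a job from Development to Production is a matter of switching contexts, not editing the job.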
The Distant Run feature enables the remote execution of jobs on any server directly from the studio.
This can be extremely useful when you need to test jobs, for example:
• in a configuration similar to the production environment;
• on various operating systems;
• upon request on specific systems;
as it avoids going through complex deployment procedures.
The target system can be selected dynamically at run time, directly from the Studio. All regular debug, trace and real-time statistics options remain available in this remote execution mode.
Talend Administration Center
All subscription offers come with one or more Studios (depending on the number of users) and a server-side component which can be installed on a server and administered through a web-based interface, the Talend Administration Center.
Studios are thus no longer in local mode but remotely connected to the projects defined in the Talend Administration Center.
Talend Administration Center is a lightweight application (running in a browser, with no deployment needed) that helps integration project managers administer users, projects, user privileges, licenses, and more.
Project authorizations are assigned easily on a per-user basis (with LDAP directory support), and users are granted rights to access projects based on their role: No permission, Read Only, Read & Write, etc.
Users can then share repository items (Jobs, Business Models, DB connection metadata...) with other users, directly in their Studio, for the projects they are authorized on. More information on the shared repository is given in the slides hereafter.
Depending on the Talend Enterprise Data Integration Edition you subscribed to, numerous additional plug-ins are available on the left navigation panel (Dashboard, SOA manager, Server manager...).
The Job conductor coordinates the execution of data integration jobs. It provides a centralized execution interface from which all jobs can be started upon request or according to time-based (from Team Edition) or event-based (from Professional Edition) schedules.
The Job Conductor module relies on "JobServers", or agents: small applications installed on each server where Jobs will be executed.
After your agents are set up, the Job Conductor allows you to monitor, in real time, all your hardware resources (available CPU, RAM, HD...), helping you distribute job executions over the grid based on the best available server. Native JMX support allows you to monitor over 40 indicators. Any job can thus be deployed onto any server in just one click!
Integration processes developed with the Job Designer can be deployed, updated and executed outside the Talend Studio GUI, using the Command Line module.
Talend Command Line module provides a set of command line options that allow developers and administrators to easily perform batch operations.
Nearly all Job management functions offered through the Talend Studio and the Talend Administration Center are also available through the Command Line. This includes for example functions like: updating Job properties, promoting projects to production, exporting/importing Jobs or sets of Jobs, etc.
The Command Line feature makes it easy and quick to roll out numerous and complex Job deployments and executions including their dependencies and execution metadata.
The native command line Help provides an exhaustive list of all available commands with a short function description.
The Time-based Scheduler helps you roll out a job execution at a defined time and date (the first Monday of the month, every Tuesday...) or on a regular basis over a period of time. A Task is used to centralize all the information necessary for the job execution (project name, job name, job version, server...).
The task is then triggered on schedule, and the job is deployed and executed automatically on the defined server at the defined time. A convenient status system helps you monitor the triggering state and the execution roll-out success/failure directly from the Job Conductor.
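The triggering rule for a schedule such as "every Tuesday at 02:00" can be sketched as a simple time check. This is purely conceptual; the Job Conductor configures such schedules in its web interface, and the function name is invented.

```python
# Sketch of a time-based trigger check: a task fires when the current
# time falls on the scheduled weekday and hour (0 = Monday, 1 = Tuesday).
from datetime import datetime

def should_trigger(now, weekday, hour):
    """True when 'now' matches the scheduled weekday/hour window."""
    return now.weekday() == weekday and now.hour == hour

# 1 May 2012 was a Tuesday, so 02:15 matches an "every Tuesday
# during the 02:00 hour" schedule:
print(should_trigger(datetime(2012, 5, 1, 2, 15), weekday=1, hour=2))
```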
From the Professional Edition, an additional event/file based scheduling feature is available.
The Event Scheduler extends time-based scheduling capabilities for real-time integration.
The event listener allows executions to be triggered on demand or in response to an event.
Events can be file-based (a file appearing, disappearing or being modified) or SQL-based (using a "wait for" condition). Once the expected event is identified, the execution task is triggered and the job deployment and roll-out are carried out.
You can easily add new event triggers to any task, extending the industrialization of automatic executions.
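A file-based trigger of the "file appearing" kind can be sketched as a poll that fires the task once the watched file exists. This is a conceptual illustration; the Event Scheduler performs this server-side, and the file name is invented.

```python
# Sketch of a file-based event trigger: poll for the watched file and
# fire the execution task the moment it appears.
import os
import tempfile

def poll_for_file(path, executed):
    """Fire the execution task once the watched file exists."""
    if os.path.exists(path):
        executed.append(path)   # stands in for deploying & running the job
        return True
    return False

with tempfile.TemporaryDirectory() as d:
    watched = os.path.join(d, "incoming.csv")
    executed = []
    first = poll_for_file(watched, executed)    # nothing happens yet
    open(watched, "w").close()                  # the event: file appears
    second = poll_for_file(watched, executed)   # now the task triggers

print(first, second)  # False True
```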
The Execution Plan feature helps you sequence and orchestrate the various Job executions and ease error recovery, directly from the Job Conductor. An execution plan is a task-based feature that outlines dependencies among different tasks, orchestrating the execution sequence.
The task dependencies are defined through a hierarchical view of main and child tasks where each task can have a subordinate task.
Execution plans can be scheduled, triggered and can use environment-defined execution parameters from this single view of Job Conductor.
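The hierarchical main/child task ordering can be sketched as a depth-first walk of the plan: each task runs before its subordinate tasks. The task names below are illustrative.

```python
# Sketch of an execution plan: a parent task runs, then each of its
# subordinate tasks, recursively, producing the execution sequence.

def run_plan(task, order):
    """Run a task, then its children, recording the sequence."""
    order.append(task["name"])
    for child in task.get("children", []):
        run_plan(child, order)
    return order

plan = {
    "name": "load_staging",
    "children": [
        {"name": "load_dimensions",
         "children": [{"name": "load_facts"}]},
        {"name": "refresh_reports"},
    ],
}
print(run_plan(plan, []))
# ['load_staging', 'load_dimensions', 'load_facts', 'refresh_reports']
```

Expressing dependencies this way also localizes error recovery: a failure in one branch identifies exactly which subtree of tasks still needs to run.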
The Grid Conductor module (accessible through the Job conductor) optimizes the scalability and availability of the integration processes by ensuring an optimal use of the execution grid.
The grid conductor relies on the definition of virtual servers, which group available resources, regardless of the system type (CPU, OS...).
Tasks are assigned to virtual servers of the Grid Conductor rather than to a single execution server.
Through constant monitoring of the resources available on the execution servers, Grid Conductor guarantees that all jobs execute smoothly at trigger time and fully leverage the available resources, removing the bottlenecks created by the traditional single-server approach.
This alleviates any concerns related to resource preemption when a large number of jobs run concurrently, or when non-dedicated servers are used. Grid Conductor also provides automatic fail-over in the event an execution resource becomes unavailable.
High availability is achieved through the ability to deploy multiple Job Conductors and job execution servers.
In addition, clustering the databases guarantees failover and prevents any execution disruption.
Apache Hadoop is an open source Java software framework that supports data-intensive distributed applications. It leverages the MapReduce architecture and enables applications to work with thousands of nodes and petabytes of data using large grids of inexpensive servers. Talend Enterprise Data Integration Big Data includes native support for Hadoop, making it possible to scale to any level and support any complex data type, so companies can leverage their Hadoop clusters for peak data volumes and complex transformations.
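The MapReduce model that Hadoop implements can be sketched in miniature: a map phase emits key/value pairs, a shuffle groups values by key, and a reduce phase aggregates each group. A real Hadoop job distributes these phases across the cluster; this single-process sketch only shows the programming model.

```python
# Minimal single-process sketch of the MapReduce model: map emits
# (key, value) pairs, the shuffle groups them by key, reduce aggregates.
from collections import defaultdict

def map_phase(lines):
    """Map: emit (word, 1) for every word in the input."""
    for line in lines:
        for word in line.split():
            yield (word, 1)

def reduce_phase(pairs):
    """Shuffle + reduce: group by key and sum the values."""
    grouped = defaultdict(int)
    for key, value in pairs:
        grouped[key] += value
    return dict(grouped)

counts = reduce_phase(map_phase(["big data", "big grids"]))
print(counts)  # {'big': 2, 'data': 1, 'grids': 1}
```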
A dedicated set of components, available from the component Palette, helps read from and write to HDFS and Hive systems, and includes ELT and SQL template features.
Talend Activity Monitoring Console is a convenient graphical interface and a centralized supervising tool.
It provides detailed monitoring capabilities that can be used to consolidate the collected log information, understand the underlying Job interaction, prevent faults that could be unexpectedly generated and support system management decisions.
The Activity Monitoring Console monitors job events (successes, failures, warnings, etc.), execution times and data volumes through a single console from a centralized point.
This tool is available as a stand-alone tool or may be fully integrated in the Studio.
The Dashboard is a Web-based version of the Activity Monitoring Console that can be accessed easily through a Web browser.
The Dashboard provides execution performance diagrams and status indicators, enabling any stakeholder to view both the current and historical status of any integration process execution.
It also provides detailed monitoring capabilities that can be used to consolidate the collected log information, understand the underlying component and Job interaction, provide task execution information in a timely manner, prevent unexpected faults, and support system management decisions.
Job execution processes can be time-consuming, as are backup and restore operations.
Talend Enterprise Data Integration Studio includes a recovery checkpoint capability that is set up at Job design time.
In case of failure, processes can be resumed from one of the checkpoints. Job developers can also design and integrate specific error management in response to specific error conditions using the checkpoint “on-failure” instruction function.
Recovery checkpoints can be initiated at specified intervals of the data flow (on trigger connections). Their purpose is to minimize the time and effort needed when a Job execution process must be restarted after a failure.
With the help of the error recovery checkpoint feature, the process can be restarted from the latest checkpoint prior to the failure (or any other checkpoint before the failure occurred), rather than from the beginning of the Job execution process.
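The recovery behavior can be sketched as below: steps already checkpointed before the failure are skipped, so only the remaining work is redone. This is a conceptual illustration, not Talend's on-failure instruction API; the step names are invented.

```python
# Sketch of checkpoint-based recovery: on restart, steps completed
# before the failure are skipped and execution resumes from the latest
# checkpoint instead of the beginning of the job.

def run_with_checkpoints(steps, state):
    """Run steps in order, skipping those already checkpointed."""
    for name, step in steps:
        if name in state["done"]:
            continue               # completed before the failure
        step()
        state["done"].add(name)    # checkpoint reached

log = []
steps = [("extract", lambda: log.append("extract")),
         ("transform", lambda: log.append("transform")),
         ("load", lambda: log.append("load"))]

# Simulate a restart after a run that failed past the 'transform'
# checkpoint: only 'load' should execute on recovery.
state = {"done": {"extract", "transform"}}
run_with_checkpoints(steps, state)
print(log)  # ['load']
```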
Talend Repository Manager is a cross-platform, multi-repository administration tool to access the common centralized resources and to manage project and artifact migration between repositories.
Talend Repository Manager enables software development lifecycle (SDLC) best practices by allowing cooperative work between administrators and users who work with multiple repositories across multiple IT environments, for example development, test and production.
Because open source software results from collaborative development efforts, the final code combines contributions from diverse sources. If the integration of the various contributions is not carefully managed and controlled, use of the final software might infringe upon the original contributors' rights. The end user might then be subject to legal and financial prosecution for infringement, even if the infringement was unintentional. Talend offers an indemnification clause to its subscription customers, guaranteeing that Talend will provide legal and financial protection in the event that Talend code infringes the rights of a third party.