What YARN Brings to Talend

November 12, 2013 --

Announced a while ago and in “beta” for several months (if there is such a thing as “beta” for open source technology), YARN is viewed by many as the promised land of Hadoop. Already supported by most major Hadoop distribution vendors, it enables Hadoop to move from a data storage and processing platform to an actual computing platform – the computing platform of the future.

YARN is a name that only an open source project could have come up with – it’s actually an acronym that stands, in true self-deprecating fashion, for “Yet Another Resource Negotiator”. There is ample literature out there on YARN (Hortonworks has a good resource page), so I won’t dwell on explanations of what YARN is or how it works.

The most interesting part about YARN is that it enables the Hadoop platform to become a multi-workload environment, on which different types of processes can run concurrently. YARN comes with resource management, optimization, and scheduling, and overall enables better and more efficient use of Hadoop. It also provides the ability to run types of tasks other than batch-oriented MapReduce jobs.

For Talend users, the native support for YARN added in v5.4 means that any big data integration or big data quality job, built originally for MapReduce code generation, can be run inside YARN. This is completely transparent to users, who only need to select the target environment. With YARN, Talend jobs no longer need to “fight” with other big data processes for Hadoop cluster resources, but can run alongside them. And since Talend’s approach to big data is engine-less, Hadoop is the engine. This decreases overhead, reduces maintenance, and ensures the best performance.

From a positioning standpoint, Talend’s commitment to YARN widens the technology gap with other integration products, which are still struggling to assemble a big data strategy. Ask the vendors who are porting their hub-and-spoke engines to Hadoop what their plans for YARN support are!