Tackling Big Data with Hadoop support

July 1, 2010

Earlier this week, Talend announced native support for Hadoop, available immediately. What does this mean?

From a technical standpoint, it means that Talend’s solutions can now not only connect to Hadoop-based data sources such as HDFS (the Hadoop Distributed File System) and the Hive data warehouse, but also run data transformations natively inside Hive. You no longer need to move your data out of Hadoop-based systems to process and transform it. This, in itself, is a major benefit for users.

One way to think of it is as the “ELT of Hadoop”. Just as ELT lets you run your data integration jobs inside the database engine, Talend Integration Suite runs your data integration logic inside the Hadoop architecture. It does so by generating native Hadoop code. (Maybe we ought to call this EHT.)
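To make the pushdown idea concrete, here is a minimal Python sketch, not Talend’s actual code generator, that builds a HiveQL `INSERT OVERWRITE TABLE ... SELECT` statement so the transformation runs inside Hive rather than in an external engine. The function `build_hive_elt` and the table and column names (`raw_weblogs`, `clean_weblogs`, `ip`, `ts`, `status`) are hypothetical, chosen only for illustration.

```python
def build_hive_elt(source_table, target_table, columns, where=None):
    """Build a HiveQL statement that transforms rows inside Hive (ELT-style).

    columns: list of (alias, expression) pairs, e.g. ("visit_date", "to_date(ts)").
    Identifiers and expressions are inserted verbatim; nothing is escaped or validated.
    """
    # Each output column is a Hive expression evaluated on the cluster.
    select_list = ",\n  ".join(f"{expr} AS {alias}" for alias, expr in columns)
    sql = (
        f"INSERT OVERWRITE TABLE {target_table}\n"
        f"SELECT\n  {select_list}\n"
        f"FROM {source_table}"
    )
    # Optional row filter, also applied inside Hive.
    if where:
        sql += f"\nWHERE {where}"
    return sql


if __name__ == "__main__":
    # Hypothetical example: clean raw web logs without pulling them out of Hadoop.
    print(build_hive_elt(
        "raw_weblogs",
        "clean_weblogs",
        [("ip", "trim(ip)"), ("visit_date", "to_date(ts)")],
        where="status = 200",
    ))
```

Submitted to Hive (for example through the Hive CLI), a statement like this is compiled by Hive into MapReduce jobs, so the data stays on the cluster the whole time. The sketch only assembles the string; a real generator would also need to handle quoting, schemas and error checking.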

From a usage standpoint, it means that organizations with large amounts of complex data can now fully leverage the Hadoop architecture. Previously, there was no native data integration option for Hadoop. You had to extract data from Hadoop, transform it inside a separate data integration engine, and load it back into Hadoop. Not only did this hurt performance, it was also wasteful, since Hadoop’s MapReduce architecture is itself especially well suited to running complex data transformations.

From an innovation standpoint, this feature once again highlights the unique innovation engine that drives open source (or that open source drives). As far as I know, Talend is the first vendor to ship a functional data integration solution for Hadoop. Sure, many other vendors have already made announcements, with availability promised “in the summer” or “by end of year”. But nothing is available today, except from Talend.