Talend, a global leader in open source integration software, today announced the availability of Talend Open Studio for Big Data, to be released under the Apache Software License. Talend Open Studio for Big Data is based on the world’s most popular open source integration product, Talend Open Studio, augmented with native support for Apache Hadoop. In addition, Talend Open Studio for Big Data will be bundled in Hortonworks’ leading Apache Hadoop distribution, Hortonworks Data Platform, as a key integration component. Hortonworks Data Platform is a massively scalable, 100 percent open source platform for storing, processing and analyzing large volumes of data.
Talend Open Studio for Big Data is a powerful and versatile open source solution for data integration that dramatically improves the efficiency of integration job design through an easy-to-use graphical development environment. Talend Open Studio for Big Data provides native support for Hadoop Distributed File System (HDFS), Pig, HBase, Sqoop and Hive. By leveraging Hadoop’s MapReduce architecture for highly distributed data processing, Talend generates native Hadoop code and runs data transformations directly inside Hadoop for maximum scalability. This feature enables organizations to easily combine Hadoop-based processing with traditional data integration processes, whether ETL- or ELT-based, for superior overall performance.
“By making Talend Open Studio for Big Data a key integration component of the Hortonworks Data Platform, we are providing Hadoop users with the ability to move data in and out of Hadoop without having to write complex code,” said Eric Baldeschwieler, CTO & co-founder of Hortonworks. “Talend provides the most powerful open source integration solution for enterprise data, and we are thrilled to be working with Talend to provide to the Apache Hadoop community such advanced integration capabilities.”
Talend Platform for Big Data
Talend Open Studio for Big Data is a core component of the Talend Platform for Big Data, which enables organizations to increase their productivity by deploying big data solutions in hours instead of weeks or months. The Talend Platform for Big Data easily integrates data of all types - structured, semi-structured and unstructured - and maximizes an organization’s resources by abstracting the technical complexity of big data tools and technologies. The Talend Platform for Big Data is compatible with all Apache Hadoop distributions and has been certified for use with Hortonworks Data Platform.
Talend Platform for Big Data provides:
- Big Data Integration: Loading big data into Hadoop via HDFS, HBase, Sqoop or Hive is considered an operational data integration problem. Talend Platform for Big Data provides an intuitive workspace and set of graphical components for interacting with a big data source or target without the need to learn and write complicated code.
- Big Data Quality: Talend Platform for Big Data provides data quality functions that take advantage of Hadoop’s massively parallel environment. Developers can use this high-performance processing environment to identify duplicate records across huge data stores in moments, not days. Because the Talend data quality functions can be applied to big data tasks, the platform also covers big data profiling and other important quality issues.
- Project Optimization: Talend Platform for Big Data includes the ability to schedule, monitor and deploy any big data job, built on a shared repository so that data analysts can collaborate and share project metadata and artifacts.
According to a recent report from Gartner, Inc., “Big data will bring huge challenges in information governance. Much of the data organizations will want to leverage comes from outside their control, is less-structured and is much less understood (semantics and relationships) than the transactional data they have traditionally dealt with.” Gartner also states within the report, “The strong desire to apply analytics to these new and different data types (often in support of critical decision-making), means suitable levels of data quality are essential.”
Open Source: Leading the Way for Big Data Applications
Open source technology is helping organizations of all sizes convert massive data sets into meaningful business intelligence. While proprietary systems are expensive to deploy across large, distributed big data environments, open source software is far more cost-effective and flexible, and it supports real-time scaling of big data environments with no increase in licensing costs. This makes it the technology of choice for big data applications.
“Talend’s big data solutions provide a full open source solution that connects Apache Hadoop to the rest of enterprise applications, greatly benefiting data scientists in their ability to access and analyze massive amounts of data efficiently and effectively,” said Fabrice Bonan, COO & co-founder of Talend. “Through the release of Talend Open Studio for Big Data under the Apache Software License, and our partnership with Hortonworks, we are proud to be contributing to the democratization of big data, greatly simplifying the process of integrating Hadoop into existing data architectures, without investing hefty budgets into non-scalable proprietary solutions.”
Talend Open Studio for Big Data will be available in May 2012. A preview version of the product is available immediately at http://www.talend.com/download-tosbd.