Big Data Tools for Developers

A Changing Environment

Ever-changing business requirements translate into ever-increasing pressure on developers to deliver more and more complex projects, on time and on spec. This is especially true as big data projects emerge from their sandboxes and evolve into full-scale big data analytics, soon to become real-time and even operational projects.

Throughout this journey, developers face many challenges that hinder their ability to deliver projects that meet expectations.

New Skills

Big data platforms such as Hadoop and NoSQL are an entirely new game. Few developers have been trained on MapReduce programming, and even fewer organizations have the resources to invest in this training. Legacy integration tools aggravate this problem by applying old recipes to new challenges, ensuring that developers won’t grow the skills they need to remain competitive. Developers need native Hadoop tools, but these tools must be usable with their existing skillsets.

Diverse Data Sources

The multiplicity and disparity of data sources create new technical challenges for the developer, and for legacy integration tools. Big data projects must integrate traditional systems such as ERPs, databases, SaaS applications, flat files, and also new sources including social media, web data, sensors, logs, etc. Further integration complexity is introduced as each data source has its own protocols, formats, security rules, APIs, integrity & bandwidth needs, and more. Simple ETL becomes extreme ETL.
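To make the format problem concrete, here is a minimal sketch in plain Python of mapping two differently shaped sources onto one common record layout. The sample data, the field names (customer_id, amount, user, value) and the function names are all hypothetical, chosen only for illustration; a real project would also have to handle protocols, security rules and bandwidth limits that this sketch ignores.

```python
import csv
import io
import json

# Hypothetical sample inputs: a flat-file export and one JSON log line.
CSV_EXPORT = "customer_id,amount\n42,19.99\n7,5.00\n"
JSON_LOG = '{"user": "42", "event": "purchase", "value": "3.50"}\n'


def from_csv(text):
    """Yield common-schema records from a CSV flat file."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {
            "customer_id": int(row["customer_id"]),
            "amount": float(row["amount"]),
            "source": "csv",
        }


def from_json_lines(text):
    """Yield common-schema records from newline-delimited JSON log entries."""
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        event = json.loads(line)
        yield {
            "customer_id": int(event["user"]),
            "amount": float(event["value"]),
            "source": "log",
        }


def integrate(csv_text, log_text):
    """Combine both sources into one list of uniformly shaped records."""
    return list(from_csv(csv_text)) + list(from_json_lines(log_text))


records = integrate(CSV_EXPORT, JSON_LOG)
```

Each additional source needs its own adapter of this kind, which is why the integration effort grows with the number and variety of sources.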

New Infrastructure

Traditional data stores have evolved to meet big data demands. Relational databases are now supplemented by Hadoop clusters, and augmented or replaced by a multiplicity of NoSQL database technologies (key-value stores, document databases, graph databases, column-oriented databases). Developers must deal with this ever-changing landscape. Native support for these platforms becomes a must-have, and legacy integration tools were not designed for this task. For example, transformations that were once run on a server or in an RDBMS can now be run in Hadoop for greater performance.

Data Security & Quality

A security standard has emerged for Hadoop: Kerberos, which has been ported to integrate natively with the Hadoop platform. Legacy integration tools require their own proprietary security methods, which won’t work in the Hadoop environment, leaving the data vulnerable. As new data quality requirements grow, traditional approaches that require data to be extracted from Hadoop for processing simply don’t work and don’t scale.

On-Time and On-Spec

While demands from the business keep accelerating, the challenges that developers face are severely impacting their ability to deliver projects on time and on spec.

Legacy integration tools, designed for a prior era of data management, are now obsolete. No matter how creative the marketing messages can be, developers quickly realize that big data projects require native big data tools and that what most vendors attempt to do is apply old solutions to new problems.

Similarly, manual MapReduce coding may have been used successfully in a sandbox for a proof-of-concept, but this approach doesn’t scale, and is certainly not sophisticated enough to address tough challenges such as data source connectivity, security, or data quality.
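To give a sense of what manual MapReduce coding involves, the following is a minimal sketch of a word-count job in plain Python. It mimics the map, shuffle and reduce phases inside a single process and is not Hadoop code; the function names are illustrative assumptions. Note that even this toy version contains nothing for source connectivity, security or data quality.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values collected for one key into a total."""
    return key, sum(values)


def run_job(lines):
    """Run the three phases in sequence and return the word counts."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = run_job(["big data tools", "Big Data projects"])
```

On a real cluster, the framework distributes these phases across nodes, and the developer additionally has to write job configuration, input and output handling, and error recovery by hand.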

Ramping Up to Deliver

Because big data and traditional data environments differ greatly, Talend provides developers with a complete toolset that enables them to quickly get started integrating big data, while building up skills for future projects.

Native Big Data Support

Unlike legacy integration tools, Talend natively supports Hadoop, generating Pig Latin and MapReduce/YARN code. Talend requires zero footprint on the Hadoop cluster since there is no runtime component. Natively optimized for major Hadoop distributions such as Cloudera, Hortonworks, MapR, Pivotal HD and more, Talend also uses Kerberos, the native Hadoop security standard, and is the only data quality solution to run inside Hadoop, again via native MapReduce code generation.

Open and Easy-to-use

Like Hadoop, Talend is committed to open source and open standards and the benefits that they bring: the largest developer community, collaboration tools that include a vibrant forum and a component/code sharing platform, and of course portability of skills. Big data complexities are abstracted using graphical Eclipse-based tools and wizards, making it easier to work with big data using existing skillsets. With Talend, any integration or data developer can become a big data developer in no time!

Future Proofing

As the big data journey continues through analytics to real-time and operational use cases, the investment in Talend continues to bear fruit thanks to the commitments made to the big data platform, but also to the unification of what legacy vendors view as separate integration challenges: data and application integration. With Talend, real-time, REST web services and ESB become part of every developer’s toolkit.

Download your free evaluation now

We were particularly impressed with Talend’s graphical development environment. It enables us to program complex interfaces with just a few mouse clicks, not to mention multiple automation options. It can also be expanded to include various other components.
Wolfram Zimmermann, Systems Analyst, Berlin Attorney General Office

As the business demands faster access to more data, developers need to adopt new technologies, new practices and even new paradigms to deliver on these expectations.

Today’s developers require tools that equip them for the big data journey, the quest for real-time and operational integration, while helping them deliver today’s tasks faster.
