Last year, I predicted that 2013 would see Hadoop become enterprise-acceptable. Pure-play distribution vendors such as Cloudera and Hortonworks, along with larger vendors that have made significant commitments to Hadoop (such as Pivotal and IBM), clearly contributed to making this a reality. So what is the next step for Hadoop?
My prediction: 2014 will see Hadoop transition from a single-purpose batch data-processing environment to a multi-use computing platform — one that runs diverse applications and mixed workloads, and drives more real-time and operational uses of big data.
In its “1.0” incarnation, Hadoop is essentially a single-task platform, able to run only one process at a time. Of course, different processes can be sequenced, but this need to “queue” jobs reduces the interactivity and reactivity of the processes Hadoop runs. That makes it pretty much limited to batch data crunching — for use cases ranging from ETL offload to data science to predictive analytics — but limits its ability to serve data sets on demand, with the short response times needed for interactive navigation or for operational consumption by applications.
With version 2.0, and especially YARN, Hadoop becomes much more of a general computing platform, able to run all kinds of concurrent workloads: some batch, with “longer” response times, and some interactive or real-time. YARN also extends Hadoop beyond MapReduce to other processing frameworks, such as Storm for stream processing.
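To make this concrete: YARN's CapacityScheduler lets a cluster be partitioned into queues so that batch and interactive workloads run concurrently instead of waiting in a single job queue. The sketch below is illustrative only — the queue names ("batch", "interactive") and capacity percentages are hypothetical choices, not defaults.

```xml
<!-- capacity-scheduler.xml (illustrative sketch; queue names and
     percentages are hypothetical, not Hadoop defaults) -->
<configuration>
  <!-- Split the root queue into two child queues -->
  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>batch,interactive</value>
  </property>
  <!-- Reserve 70% of cluster resources for long-running batch jobs -->
  <property>
    <name>yarn.scheduler.capacity.root.batch.capacity</name>
    <value>70</value>
  </property>
  <!-- Reserve 30% for short, latency-sensitive interactive work -->
  <property>
    <name>yarn.scheduler.capacity.root.interactive.capacity</name>
    <value>30</value>
  </property>
</configuration>
```

With a layout like this, a nightly ETL job and an ad-hoc interactive query can hold resources side by side on the same cluster — exactly the mixed-workload behavior Hadoop 1.0's one-job-at-a-time model could not offer.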
In a rare case of business needs running well ahead of the technology curve, plentiful use cases already exist that will boost the adoption of Hadoop 2.0 and turn it into the “next” enterprise computing platform.