Hadoop is Not a Plus One

September 06, 2013 --
Tags: hadoop, nosql, database, CPU

There are two right ways to leverage Hadoop for data management. The first is to make Hadoop the data management infrastructure. The second is to merge Hadoop symbiotically into the existing infrastructure.

There is also a wrong way: putting Hadoop alongside the existing infrastructure and treating it as an auxiliary engine, a “Plus One”.

A recent guest blog post from a legacy integration vendor baffled me. The post refers to a new modern data architecture that includes Hadoop as a “+1” to the existing systems.

A long time ago, when the CPUs of personal computers were not powerful enough to render the 3D graphics of “kill them all” games, one would add a video card with dedicated graphics chips (remember ATI, Matrox, etc.?). These graphics capabilities are now integrated into the motherboard or even the CPU.

In a modern data architecture, Hadoop is the data infrastructure. It’s not a secondary system to which other systems offload peak activity. Hadoop itself, complemented by NoSQL database systems native to HDFS, supports all workloads – even those traditionally borne by legacy systems, which thereby become obsolete.

In a slightly less modern data architecture (or one that has to accommodate legacy systems), Hadoop is part of the legacy system, not its “+1”. Vendors such as Oracle, IBM, Teradata, and Pivotal (among others) have demonstrated how to combine the processing capabilities of their legacy databases with those of Hadoop.

Viewing Hadoop as a “+1” is, in many ways, the same as adding a video card to your PC. While we are at it, why not compare data integration to the PCI bus?




- by de Montcheuil Yves on September 06, 2013
Timo, thanks for the comment. The trouble with analogies is that you had better master the topic lest you be challenged... I guess my point is that far fewer people go and "bolt on" a video card to their PC than when I was younger. As far as the main topic, Hadoop, is concerned, I would argue that Hadoop can be the heart (sometimes with add-ons such as Cassandra, which does support transactions), or it can be part of the heart (as in SAP's reference architecture, or in your future "Hanadoop"). It all depends on what you need to do.
- by Elliott Timo on September 06, 2013
Yves, I think you missed a bit on this one. First, the analogy: I haven't bought a PC for a while, but as far as I can tell, Windows PCs still have separate graphics cards. And even the new Apple Mac Pro has a separate processor for graphics... Second, how can a system that doesn't support transactions (at least today) be considered the "heart" of a company's infrastructure? (Big data and data warehouse vendors often seem to dismiss transaction systems as something separate from their so-called "unified" architecture visions.) I think what you meant to say was that Hadoop is more than just an add-on to your existing systems (I completely agree), but it's part of the heart, not the whole thing. [I'm an innovation evangelist for SAP, and we call our vision of a unified information infrastructure the "Real Time Data Platform"; yes, it includes Hadoop etc. as more than just a "+1".] Regards, Timo. Business Analytics Blog: timoelliott.com