There are two right ways to leverage Hadoop for data management. The first is to make Hadoop the data management infrastructure. The second is to merge Hadoop symbiotically into the existing infrastructure.
There is also the wrong way. This wrong way is to put Hadoop alongside the existing infrastructure and view it as an auxiliary engine. A “Plus One”.
A recent guest blog post from a legacy integration vendor baffled me. In this post, a reference is made to “a new modern data architecture that includes Hadoop as a ‘+1’ to the existing systems”.
A long time ago, when the CPUs of personal computers were not powerful enough to render the 3D graphics of “kill them all” games, one would add a video card with dedicated graphics chips (remember ATI, Matrox, etc.?). These graphics capabilities are now integrated into the motherboard or even the CPU.
In a modern data architecture, Hadoop is the data infrastructure. It’s not a secondary system to which other systems offload peak activity. Hadoop itself, complemented by NoSQL database systems native to HDFS, supports all workloads – even the ones traditionally borne by legacy systems, which become obsolete.
In a slightly less modern data architecture (or one that has to accommodate legacy systems), Hadoop is part of the legacy system; it’s not its “+1”. Vendors such as Oracle, IBM, Teradata, and Pivotal have demonstrated how to combine the processing capabilities of their legacy databases with those of Hadoop.
Viewing Hadoop as a “+1” is, in a lot of ways, the same as adding a video card to your PC. While we are at it, why not compare data integration to the PCI bus?