Applications are from Cloud, Data Platforms are from Open Source

Remember “Men Are from Mars, Women Are from Venus”, the best-selling hardcover nonfiction book published in the 1990s? It strikes me as a good metaphor for two key delivery models that are currently disrupting the software industry: cloud and open source.

The cloud has profoundly changed the way business applications are consumed. From HR to customer relationship management and collaborative applications, it has become the delivery model of choice for the majority of new projects. It gave birth to new hypergrowth companies, such as Salesforce in CRM, Workday in HR, and Box and Dropbox in office automation and collaborative software. They took the market by storm because they moved the center of gravity of enterprise applications to a new place. And this model is so disruptive that established mega vendors had to acquire new entrants to get in the game, while struggling to bring their “legacy” platforms into the new model: for example, Microsoft acquired Skype, Yammer and Parature; SAP acquired Ariba and SuccessFactors; and Oracle acquired Eloqua, Responsys, BigMachines, Taleo and many more.

So far, however, there is one market that the cloud has not managed to disrupt: data-centric platforms. By data-centric platforms, I mean all the solutions that need to connect to large and heterogeneous sets of data; discover and profile them; reconcile, process and augment them; and then publish them to a large and diverse set of people and applications. Typically, this is the case for Business Intelligence, Analytics, Data Management, Data Integration, Enterprise Information Management and Data Governance.

Take the example of Business Intelligence and analytics. Last year, I wrote a series of articles for the Beye Network investigating the BI market and trying to understand why cloud BI was so slow to take off (with the notable exception of web analytics and data marketing platforms, which were born in the cloud). Admittedly, this is now changing, and some cloud-centric players are already aiming to disrupt the market; but, as of today, no player clearly stands out as having the same impact as Salesforce.com, Workday or Marketo.

Meanwhile, open source is bringing that disruption to the market, right now. Take a look at the data management market. Over the last 25 years, it has turned into a very conservative market with a very slow pace of innovation. Now, NoSQL databases and Hadoop are taking the market by storm, and new poster-child companies such as Cloudera, Hortonworks, MapR or MongoDB are drawing the attention of investors and customers. As in the case of the cloud for enterprise applications, those platforms are profoundly disrupting the industry because they drive gravity shifts. There are new mandates for managing data, and the legacy environment cannot handle them, so sooner or later the data has to move elsewhere if you want to extract its full value. New environments are needed. They need to be open to handle the heterogeneity and variety of data, extensible to evolve at the pace of innovation, and affordable in order to embrace the long tail of Information Management. Open source is establishing itself as the right model to provide this.

On top of this reinvented data management layer, tools and solutions need to provide the same characteristics. This is where the use cases for open source Data Integration and data governance, open source Business Intelligence platforms, open source predictive analytics and decision management engines, etc. come in. This rising trend is evidenced by the latest releases of Gartner’s Magic Quadrants and Forrester’s Waves in those markets: see, as an illustration, the evolution of Talend’s positioning across the yearly releases of the Magic Quadrants for Data Integration, Data Quality or MDM; or the rise of Jaspersoft and Pentaho in the latest Magic Quadrant for BI Platforms; and see as well how RapidMiner, KNIME or Revolution Analytics shine in the new Magic Quadrant for Advanced Analytics.

With this post, I don’t want to underestimate the impact that the cloud delivery model can have on data-centric applications; but again, my feeling is that it is a matter of center of gravity. The more the data for a data-centric application has to be sourced from beyond the firewall, the stronger the rationale for deploying that application on the cloud too, and this will happen, sooner or later. So, there is no doubt: data-centric applications will finally move to the cloud. But, guess what, there is a prerequisite: the data-centric platforms needed to make that happen have to be highly open and extensible. Amazon Elastic MapReduce, Google BigQuery or Microsoft Azure HDInsight show that cloud and open source are a winning combination for data management on the cloud.

This brings us back to our Mars and Venus metaphor. Long after the book was published, a study from the University of Rochester involving 13,301 individuals found that men and women, by and large, do not fall into different groups. Time will tell, but what if the futures of Enterprise Applications and Data Platforms were similar, based on a mixed cloud and open source delivery model?

Jean-Michel