Have you ever stood up a datamart that was needed to build a handful of analytical reports, but then that repository sits idle until the next time those reports need refreshing (which may be a week, a month or several months…)? At many points in my career I have built data warehouses and datamarts for that exact scenario and have been frustrated by the length of time that database sits idle…it seemed like such a waste of energy and technical resources.
With the world becoming more connected and data savvy with every passing year, there’s a rising need for businesses to efficiently manage the trillions of bytes of data that they capture, and gain insights from them. Talend helps businesses do exactly this while boosting developer productivity and reducing time-to-value for ETL data warehouse projects. Talend for Big Data is seeing rapid growth, emerging as a must-have tool for quickly and effectively cleansing and analyzing Big Data.
In my previous blog “Beyond ‘The Data Vault’” I examined various data storage options and a practical architecture/design for an Enterprise Data Vault Warehouse. As you may have realized by now, I am quite smitten with this innovative data modeling methodology and recommend that anyone who is developing a ‘Data Lake’ or Data Warehouse on Big Data platforms consider it as a critical design paradigm.
I wrote a blog around another favorite topic of mine, DevOps, a while back and in it I discussed the notion of perfection being the enemy of ‘good enough’. After some conversations these last few weeks, I have reaffirmed my stance and broadened it to include everything, especially analytics.
Our Puzzled Customers
In this era of Big Data, many of the IT people I talk with have a number of questions about the technology and trends associated with this new paradigm.
For example, many of them are feeling somewhat overwhelmed with the amount of data they now have to deal with – data that seems to be growing exponentially.
Many of the comments I hear go something like this: “I never thought we would be swamped with so much information. Are there Big Data solutions available now that can help me deal with this deluge of data?”
Every day we all make thousands of decisions, many of them subconsciously or based on minimal information. And we often make them lightning fast - even when it comes to making decisions at work. So clearly we often rely on our intuition and gut feeling.
I wrote in a previous post about the fallacy of the word big in the phrase “Big Data”. This catch phrase that has been associated with “everything having to do with the analysis of data” is a poisonous co-opting of the analytics space to fit a vendor’s needs. It creates a major barrier to adoption for analytics and confusion over exactly what data analytics can and can’t do with its reliance on capturing massive amounts of data. If you accept the premise that “Big Data” is a lie, and that anyone can benefit from analyzing data then the next logical question is, “now what?”
The Internet of Things is now a very effective way to bring the digital world to the physical world, a way of closing the gap between information systems and the field. In the past, there was always a divide between the two since field data were not integrated into the information system. Most often, field data were incorporated later, in "batch" mode or by manual entry, which made it impossible to use them in real time and limited the ability to respond to market demands or uncertainties in operations.
Enterprise Hadoop may be less than a decade old, but analysts at Forrester recently estimated that 100% of all large enterprises will adopt it and related technologies such as Spark for big data analytics within the next two years. It’s clear that we are getting beyond the need for businesses to be educated about Hadoop. Time and again, Hadoop has gained significant interest across various industries globally, and is considered to be one of the core platforms available for managing Big Data (structured and unstructured).
What a delight to have had such a positive response to my previous blog on Talend “Job Design Patterns” & Best Practices. To all those who have read it, thank you! If you haven’t read it, I invite you to read it now before continuing, as Part 2 will build upon it and dive a bit deeper. It also seems appropriate for me to touch on some advanced topic
Gartner research recently posted the results of its inaugural survey of chief data officers (CDOs), which revealed that the primary mandate and objective of today’s CDO is to manage, govern and utilize information as an organizational asset. Well, at Talend we see ‘enterprise information’ as more than just ‘an asset’ – we believe it is a company’s most valuable commodity. In keeping with this philosophy, we view the CDO as one of the most important roles in today’s C-suite.