This blog is the first in a series of posts explaining the overarching goal and purpose of the Apache Beam project. In future posts, we will explain how to use Apache Beam to implement data processing jobs.
Our Puzzled Customers
In this era of Big Data, many of the IT people I talk with have a number of questions about the technology and trends associated with this new paradigm.
For example, many of them are feeling somewhat overwhelmed by the amount of data they now have to deal with – data that seems to be growing exponentially.
Many of the comments I hear go something like this: “I never thought we would be swamped with so much information. Are there Big Data solutions available now that can help me deal with this deluge of data?”
Every day we all make thousands of decisions; many of them are made subconsciously or are based on minimal information. And we often make them lightning fast, even when it comes to decisions at work. Clearly, then, we often rely on intuition and gut feeling.
I wrote in a previous post about the fallacy of the word big in the phrase “Big Data”. This catchphrase, which has come to mean “everything having to do with the analysis of data”, is a poisonous co-opting of the analytics space to fit vendors’ needs. Its emphasis on capturing massive amounts of data creates a major barrier to analytics adoption and confusion over exactly what data analytics can and cannot do. If you accept the premise that “Big Data” is a lie, and that anyone can benefit from analyzing data, then the next logical question is, “now what?”
The Internet of Things is now a very effective way to bring the digital world to the physical world, closing the gap between information systems and the field. In the past, there was always a divide between the two, since field data were not integrated into the information system. Most often, field data were incorporated later, in "batch" mode or by manual entry, which made it impossible to use them in real time and limited the ability to respond to market demands or operational uncertainties.
Enterprise Hadoop may be less than a decade old, but analysts at Forrester recently estimated that 100% of all large enterprises will adopt it and related technologies such as Spark for big data analytics within the next two years. Clearly, we are moving beyond the need to educate businesses about Hadoop. Hadoop has gained significant interest across industries worldwide and is considered one of the core platforms for managing Big Data, both structured and unstructured.
What a delight to have had such a positive response to my previous blog on Talend “Job Design Patterns” & Best Practices. To all those who have read it, thank you! If you haven’t read it, I invite you to do so now before continuing, as Part 2 will build upon it and dive a bit deeper. It also seems appropriate for me to touch on some advanced topics.
Gartner research recently posted the results of its inaugural survey of chief data officers (CDOs), which revealed that the primary mandate and objective of today’s CDO is to manage, govern and utilize information as an organizational asset. Well, at Talend we see ‘enterprise information’ as more than just ‘an asset’ – we believe it is a company’s most valuable commodity. In keeping with this philosophy, we view the CDO as one of the most important roles in today’s C-suite.
Customer data is everywhere. In some organizations, it becomes the root cause of serious business inefficiencies: undeliverable outbound e-mails, returned shipments, unaddressed customer claims, non-compliance with privacy regulations, or data breaches. Managed correctly, however, customer data can fuel new levels of customer acquisition, company performance, sales conversion rates, and overall customer lifetime value.
Several years ago, the cloud was a concept that many forward-looking businesses were only beginning to consider, and one that many others feared.