How To Turn Any Big Data Project Into a Success (And Key Pitfalls To Avoid)


Guest Blog by Bernard Marr, Founder and CEO of The Advanced Performance Institute

I could start this article by saying that Big Data is taking over the world, spreading into every industry and that if you aren’t “doing it”, you’re going to be a dinosaur destined for extinction. That’s how hundreds of other articles that have been written about Big Data in business over the last few years have started. And many of them are fine articles filled with useful information.

But it’s really a lazy way to start an article – I’m going to assume most people reading this will know this already. In fact, I’m willing to bet that it’s the main reason you’re reading this article in the first place.

But as true (and clichéd) as those sentiments may be, they oversimplify the situation. The truth is that even if you are “doing it”, there’s no guarantee of success. What’s important is that you do it well – and although that might seem obvious, it’s an obstacle that I’ve seen many companies fail to navigate.

So with this post I’m going to start from the beginning, and explain a few steps that need to be carefully considered before you spend a penny on collecting data, hiring analysts or setting up your very own distributed cloud storage infrastructure. It will hopefully help you to avoid the most common pitfalls – pitfalls which I’ve seen many businesses of all sizes tumble into, usually because they got carried away on the Big Data hype train.

The very first thing to do is quickly mention what Big Data is – and just as importantly, what it isn’t – because this is so commonly misunderstood. Apologies if you think I am over-simplifying here, but this is still something I constantly see people getting wrong! Big Data is not simply “a lot of data”. And understanding that is key to avoiding some of the most common errors.

Recap – so what’s Big Data?

Big Data is the term used to describe our ability to make sense of the ever-increasing volumes of data in the world. It refers not only to the data but also to the methodologies and technologies used to store and analyze the data. Increasingly, it is particularly concerned with understanding and drawing insights from unstructured data (data which does not fit nicely into the rows and columns of a spreadsheet, such as video, audio or social media data) as well as machine-generated data (data which comes from smart devices and internet-connected things), because this is the sort of data that is growing at an enormous rate in today’s digital, mobile, always-on world.

This, among other reasons, is why I often prefer to use the term Smart Data rather than Big Data. It’s really not the size that counts, it’s what you do with it.


The most important step

The next and most important point you need to consider is why you want to use Big Data in the first place. And the answer really shouldn’t be “because everyone else is”.

Instead, what you need is a clear business case. You need to know how, and why, Big Data is useful to your company. Can it help you to solve a particular problem? Almost certainly it can – but you need to be certain that you’ve identified the right problems. A retailer client of mine once called me in to help with some Big Data projects which they had under way. During our first meeting I asked them to explain their projects to me, only to be told that would take some time as they had over 250 different data-driven projects on the go! Many of these were not even directed at solving a particular problem, or were focused on altering metrics whose significance they could not demonstrate (for example, predicting which days of the year staff were most likely to call in sick – when there was no evidence that staff absenteeism was impacting their business performance). With relatively little work we were able to cull most of those projects and allocate resources to the ones which were likely to drive positive change. The bottom line is that if you can’t say immediately how a project is going to answer important questions about your business, improve your service to your customers, or create efficiencies in your operations, you’re probably wasting your time.

Finding the right data

This leads nicely into the next pitfall I want to highlight: Make sure that the data you’re collecting is the right data. Too many companies launch into a Big Data initiative with the idea that “We’ll just collect all the data that we can, and work out what to do with it later”. This is an extremely wrong-headed way to go about things and all too often leads to disaster. It brings with it two potentially project-crippling dangers. The first is that with too much data you “won’t be able to see the woods for the trees”. Rather than focusing on the data that’s likely to drive the insights or change you’re looking for, you will become distracted by patterns and maybe even insights which have little potential to teach you anything useful.

The second problem with the “gotta collect it all” attitude is that any data collection and storage brings with it expense, as well as legal obligations and compliance – and with Big Data projects often involving personal data, these expenses and obligations can be immense.

The need for data-skills

As well as a financial cost, there’s obviously also a cost in human resources and time. If you have data scientists bumbling their way through hundreds of projects with no clear aim, or decoding terabytes of data you have no clear, immediate use for, they’re likely to be unavailable, or distracted, when something of real value comes along. Having the right people with the right skills in the right place is essential. Good data scientists don’t come cheap – generally commanding salaries of $100k or more, and the best are constantly in demand and rarely short of work. The more you know about precisely what you want to achieve, the more likely you are to find the right people for the job. This won’t necessarily mean hiring externally – one client of mine in the financial industry realized there was a heavy skillset crossover between some money market analysts already employed at the company, and the work they were looking to hire data scientists for. By offering on-the-job training to financial analysts interested in working in Big Data science, they were able to fill the roles far more efficiently.

Good project-management

Another point, which I can’t stress enough, is the importance of good communication throughout the project. This involves ensuring that there is “buy in” for your project both across the team carrying out the work and in the wider organization – from C-level executives to the nuts-and-bolts techies who will be carrying out the analytics and the customer-facing or workforce staff whose work will be affected by your results. Everyone needs to clearly understand what it is you are trying to achieve and, crucially, why. Plenty of Big Data projects fail because the frontline staff responsible for putting the insights you’ve gleaned into play don’t understand why they are suddenly being told to do things differently from how they’ve done them for years. This isn’t their fault – it’s almost always because no one has taken the time to explain things properly to them. They don’t need to understand the ins and outs of the machine learning algorithms which are running across the distributed, unstructured data you’re analyzing in real time. But there should always be a logical, common-sense reason for what you’re asking them to do. The only real difference is that you now (hopefully) have stats and analytical evidence that backs up your decisions regarding both overall strategy and day-to-day business procedures.

So, back near the start of the article I mentioned that I prefer to talk about Smart Data, rather than Big Data. This isn’t just because I think it is a more accurate description for what the term really entails. It’s also because it works as a handy breakdown of the steps you need to take to make sure your data analytics activity proves fruitful. Those steps are:

S – Start with strategy: ensure you have a clear business case for what you are doing.

M – Measure metrics and data: make sure it’s the right data!

A – Apply analytics: be certain you have the right skills and technology in place for the job you need to get done.

R – Report results: ensure you have clear lines of communication, from top to bottom of your organization.

T – Transform your business in a positive way, based on the insights you’ve discovered.

Following that basic template would be a good start to making sure your Big Data project doesn’t become one of the many which fail to deliver any real benefits, and instead delivers real business value and performance improvements to your organization.
