I recently had a chance to meet Dr. Shahzad Cheema, a Lead Data Scientist at IBM’s IoT Industry Lab in Munich. We had an interesting discussion about Data Science and its applications in the real world. According to Dr. Cheema, Data Science is probably the most fascinating and least understood field in IT. Luckily, we are out of the “Hype” phase for Big Data, as we are already witnessing its adoption and acceptance in almost all industries. Like the Industrial Revolution, big data will continue to drive technological change in many different forms. The “smart” features showing up in products today are built on data and analytics, which shows that data science is a key foundation for both business and technological innovation.
So, what exactly is Data Science? Data Science is an interdisciplinary field that combines data, science, and technology with their business impact. It usually employs sophisticated tools and techniques to extract knowledge and actionable insights from structured or unstructured data. The business value that comes out of that process is what matters most: the goal is to optimize business objectives.
Wikipedia defines it as “a field of scientific methods, processes, algorithms, and systems to extract knowledge or insights from data in various forms, either structured or unstructured, like data mining”. While the definition of data science is widely accepted, its implications and implementations in the real world remain a bit of a mystery. To get down to the business implications, we need to better understand the main building blocks of Data Science and how they are tied together. In this article, I am going to summarize our discussion around the four major components of Data Science: Data, Science, Technology, and Business.
Data is the most important component in Data Science. What matters isn’t the size of the data (the term “big” is relative anyway) but how it’s used. This idea has been given a more sensible name: “smart data”. While the four famous V’s (Volume, Velocity, Variety, and Veracity) explain the underlying landscape of big data, it’s the “Value” that matters in the end. Velocity alone can make it very difficult to maintain and analyze data, for example at more than 2 million records per day. Feature Engineering, i.e. creating meaningful and useful attributes from raw data, is a key trend in the space. Another key trend is using Feature Engineering to deal with unstructured data by embedding it in powerful Machine Learning models such as deep neural networks.
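To make the idea of Feature Engineering a little more concrete, here is a minimal sketch in plain Python. The purchase records, the field names, and the `engineer_features` helper are all hypothetical and were not part of our discussion; they only illustrate how raw rows can be turned into attributes that a model can actually use.

```python
from collections import defaultdict
from datetime import date

# Hypothetical raw purchase records: (customer_id, purchase_date, amount).
RAW_PURCHASES = [
    ("c1", date(2018, 1, 5), 20.0),
    ("c1", date(2018, 2, 10), 35.0),
    ("c2", date(2018, 1, 20), 12.5),
    ("c1", date(2018, 3, 1), 15.0),
    ("c2", date(2018, 3, 15), 40.0),
]


def engineer_features(purchases, as_of):
    """Aggregate raw purchase rows into one feature dict per customer."""
    by_customer = defaultdict(list)
    for customer_id, purchase_date, amount in purchases:
        by_customer[customer_id].append((purchase_date, amount))

    features = {}
    for customer_id, rows in by_customer.items():
        amounts = [amount for _, amount in rows]
        last_purchase = max(purchase_date for purchase_date, _ in rows)
        features[customer_id] = {
            "n_purchases": len(rows),                    # how often they buy
            "total_spend": sum(amounts),                 # how much they spend overall
            "avg_spend": sum(amounts) / len(amounts),    # typical basket size
            "days_since_last": (as_of - last_purchase).days,  # recency
        }
    return features


FEATURES = engineer_features(RAW_PURCHASES, as_of=date(2018, 4, 1))
```

None of these four attributes exists in the raw data; each one is derived from it, and each one is far easier for a churn or forecasting model to work with than a list of individual transactions.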
Data processing algorithms (better known as Machine Learning) are the backbone of Data Science. A data scientist follows a rigorous process (such as CRISP-DM) to explore and analyze data sets while training and building the Machine Learning models.
A machine learning model solves a specific problem, such as predicting customer churn or identifying the most influential factors in a purchase pattern. From neural networks in the 1950s to sophisticated algorithms such as Support Vector Machines and Random Forests, Machine Learning has not disappointed practitioners. What is most fascinating is the immediate feedback on the model through the train-validate-test process. If done properly, this exploration always adds value, even when the final model does not reach the desired goal.
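The train-validate-test process can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the data is synthetic, the 60/20/20 split is one common choice among many, and the one-dimensional k-nearest-neighbour classifier is a stand-in I picked because it needs no libraries. The training set is what the model learns from, the validation set picks the setting (here `k`), and the test set is touched only once for the final estimate.

```python
import random


def make_synthetic_rows(n=200, seed=0):
    """Generate (feature, label) pairs; the label depends on the feature plus noise."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.uniform(0, 10)
        label = 1 if x + rng.gauss(0, 1.5) > 5 else 0
        rows.append((x, label))
    return rows


def split_dataset(rows, train_frac=0.6, val_frac=0.2, seed=42):
    """Shuffle a copy of the rows and cut it into train, validation and test parts."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


def knn_predict(train, x, k):
    """Majority vote among the k training rows whose feature is closest to x."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > k else 0


def accuracy(train, rows, k):
    """Fraction of rows whose label is predicted correctly from the training set."""
    hits = sum(1 for x, label in rows if knn_predict(train, x, k) == label)
    return hits / len(rows)


ROWS = make_synthetic_rows()
TRAIN, VAL, TEST = split_dataset(ROWS)

# Validation: choose k using data the model has not been trained on.
CANDIDATE_K = [1, 3, 5, 9, 15]
BEST_K = max(CANDIDATE_K, key=lambda k: accuracy(TRAIN, VAL, k))

# Test: a single, final estimate on data used for neither training nor tuning.
TEST_ACCURACY = accuracy(TRAIN, TEST, BEST_K)
```

The feedback is immediate: every candidate `k` gets a validation score straight away, and the gap between validation and test accuracy says something about how well the model generalizes.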
Talend has many built-in Machine Learning components: tALSModel, tClassify, tKMeansModel, tRecommend, tPredict, tLogisticRegressionModel, and tRandomForestModel, to name a few. Talend’s intuitive interface makes it easier to customize and train the model.
The advancement in data processing and management tools has breathed life into Machine Learning models. While conventional spreadsheets and SQL continue to be major tools, an exceptional number of new tools have recently entered the landscape, especially for cases where scale and rapid development matter.
Who would have thought a few years ago that Python and NoSQL would be competing with Java and SQL, respectively? We have seen rapid progress in, and adoption of, open-source tools, cloud platforms, SaaS, and APIs. Talend provides a novel way of doing integration, offering a choice of technologies and letting you develop jobs rapidly and effectively with little understanding of the underlying language. Talend leverages Apache Beam as a single programming model for both batch and streaming data-parallel processing pipelines. Read more about Apache Beam use cases on the Talend Blog.
Distributed computing and technology are being democratized and have become the norm (e.g. Apache Spark and blockchain). Building large-scale, compute-intensive, real-world applications has become much less difficult, thanks to smart and low-cost sensors, powerful GPUs such as the Tesla P100, and compute environments such as IBM’s PowerAI. Have a look at the work of Matt Turck if you want to learn a bit more.
Business KPIs and their impact are the most important aspect of data science, and the one most underrated by many new entrants to the field. Every now and then, I meet data science enthusiasts, new graduates, and researchers with bright eyes (I used to have such a pair) who believe that being a data scientist means beating some benchmark. No! It’s about meeting some objective, and that is a business objective in 99% of the cases. Yes, there are cases and situations where you will be challenged by the underlying problem and will have to exhibit the magic, but that is not a starting point.
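A small worked example shows how a benchmark and a business objective can point in different directions. Everything here is hypothetical: the scored customers, the two cost figures, and the simplifying assumption that a retention offer always keeps the customer. The sketch compares two decision thresholds for a churn model, once by accuracy and once by cost.

```python
# Hypothetical model output: (predicted churn probability, actually churned).
SCORED_CUSTOMERS = [
    (0.9, True), (0.8, True), (0.7, False), (0.6, True),
    (0.4, False), (0.35, True), (0.3, False), (0.2, False),
    (0.1, False), (0.05, False),
]

OFFER_COST = 10.0           # assumed cost of sending one retention offer
LOST_CUSTOMER_COST = 100.0  # assumed cost of losing a churner we did not contact


def accuracy_at(threshold):
    """Benchmark view: share of customers whose churn label is predicted correctly."""
    hits = sum(1 for p, churned in SCORED_CUSTOMERS if (p >= threshold) == churned)
    return hits / len(SCORED_CUSTOMERS)


def business_cost_at(threshold):
    """Business view: offers sent plus churners missed, in currency units."""
    cost = 0.0
    for p, churned in SCORED_CUSTOMERS:
        if p >= threshold:
            cost += OFFER_COST          # contacted, assumed to be retained
        elif churned:
            cost += LOST_CUSTOMER_COST  # not contacted and lost
    return cost


for threshold in (0.5, 0.3):
    print(threshold, accuracy_at(threshold), business_cost_at(threshold))
```

On this toy data, the threshold of 0.5 has the higher accuracy (0.8 against 0.7), while the threshold of 0.3 has the lower cost (70 against 140). Which one is “better” depends entirely on the objective you are asked to meet.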
Most traditional businesses are still in transition, many of them still in the digitization phase, so a great many problems can be solved through Automation, Data Analysis, and Predictive Modelling. In my short career, I have witnessed success stories across a range of applications: volume forecasting, churn prediction, routing optimization, real-time bidding, fine-grained image recognition, crop optimization, web analysis, insurance estimation, and vehicle control optimization, to name a few.
In this article, we have listed the four building blocks of practical Data Science and its business application. In our next article, we will dive one level deeper and discuss some of the ingredients of the above-mentioned quadrants, i.e. Data, Science, Technology, and their Business Applications.
Reference:  https://link.springer.com/article/10.1007/s10586-018-1799-6