Talend provides a number of Machine Learning components that can be used for a variety of purposes. I have previously described some of these components, some in more detail than others, and outlined what they can do. One question remains, however: which use cases can these Machine Learning components solve?
Machine learning components
Firstly, a quick overview. Talend provides a set of ‘out of the box’ components for various ML techniques. These can be classified into four groups:
- Classification – Classification is the problem of identifying which of a set of categories a new observation belongs to, on the basis of a training set of observations whose categories are already known.
- Clustering – Clustering is the task of grouping a set of objects in such a way that objects in the same group are more like each other than those in other groups.
- Recommendation – Recommendation is a class of information filtering that tries to predict the “rating” or “preference” that a user would give to an item.
- Regression – Regression is a process for estimating the relationships among variables.
All of the above leverage Apache Spark for scale and performance. They enable a faster time to insight and value, they focus on business outcomes rather than development, and they lower the skills barrier to use.
So, given that I have a number of components in each of the above groups, how do I use them to suit my own use cases? Let us take a look at each of the four groups mentioned above in turn.
As mentioned above, classification is the problem of identifying where new observations belong. In practice, this can cover a wide range of things. Some of the most common use cases for classification algorithms involve predicting whether certain events will happen in the future. The area of ‘classification’ contains many different algorithms that do many different things, but the common thread is that they use data sets to make predictions about future events.
As for use cases, the most common example here is ‘what will likely happen next?’. If you would like to predict what your customer will do next, what their next behaviour could be, or what your supplier will do, then classification algorithms are what you need. In all fields, from life sciences to retail, large amounts of data are being used to build predictive models. These can predict clinical outcomes, customer behaviour, the next move in a business process, or the likely increase in sales over Christmas. The choices are many. Within the classification group in Talend there are different types of component that allow you to build different models, such as Decision Trees, Regression models and Random Forest models, and you can choose whichever of these models best fits your ‘what will happen next?’ use cases.
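To make the idea of ‘learning from labelled history to predict what happens next’ concrete, here is a minimal sketch in plain Python. It is not how the Talend components are implemented; it is a one-rule ‘decision stump’, the simplest possible decision tree. The data set (support calls per customer and whether that customer later churned) and the function names are hypothetical and exist only for illustration.

```python
# Hypothetical training data: (support_calls_last_month, customer_churned)
training = [(0, False), (1, False), (2, False), (3, True), (5, True), (6, True)]


def fit_stump(rows):
    """Pick the threshold that misclassifies the fewest training rows.

    The learned rule is: predict True (churn) when value > threshold.
    """
    best_threshold, best_errors = None, len(rows) + 1
    for threshold in sorted({value for value, _ in rows}):
        # Count the rows where the rule disagrees with the known label.
        errors = sum((value > threshold) != label for value, label in rows)
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


def predict(threshold, value):
    """Apply the learned rule to a new, unlabelled observation."""
    return value > threshold


threshold = fit_stump(training)
print("churn predicted for 4 calls:", predict(threshold, 4))
print("churn predicted for 1 call:", predict(threshold, 1))
```

A real decision tree or random forest repeats this kind of split search across many features and many levels, which is the part a Spark-backed component handles at scale.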
Clustering is the task of grouping together a set of objects in such a way that objects in the same group are more similar to each other than to those in other groups. Clustering is really useful for identifying separate groups and is therefore used to solve use cases such as “who are my premium customers?”.
As a simple example, if you plot your customers against how much they spend on a graph, you can easily identify separate groups, and therefore the premium customers that you want to hold on to. Clustering can also be used for other business cases. In life sciences, clustering can be used in drug discovery; in climate science, cluster analysis is used to analyse weather patterns. Network analysis is another good example: who is using my network, and how is it being hit? In economics, clustering is used widely for a variety of applications.
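The ‘premium customers’ example can be sketched with a very small k-means loop. This is an illustrative plain-Python version restricted to one dimension and two clusters, not the Talend implementation; the spend figures and the name `two_means` are made up for the example.

```python
def two_means(values, iterations=20):
    """Split numeric values into two clusters with a basic k-means loop."""
    # Deterministic start: the smallest and largest values are the first centroids.
    centroids = [min(values), max(values)]
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        for v in values:
            # Assign each value to the nearer centroid.
            nearest = 0 if abs(v - centroids[0]) <= abs(v - centroids[1]) else 1
            groups[nearest].append(v)
        # Move each centroid to the mean of its group (keep it if the group is empty).
        new_centroids = [sum(g) / len(g) if g else c
                         for g, c in zip(groups, centroids)]
        if new_centroids == centroids:
            break  # assignments have stopped changing
        centroids = new_centroids
    return centroids, groups


# Hypothetical annual spend per customer.
spend = [120, 150, 90, 110, 980, 1050, 1100, 130]
centroids, (regular, premium) = two_means(spend)
print("premium customers spend:", sorted(premium))
```

No labels are supplied here: the two groups emerge from the data alone, which is what separates clustering from classification.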
Recommendation algorithms are largely self-explanatory, and we are all familiar with them. If you use Amazon, Netflix, eBay etc. you will be all too aware of them recommending all sorts of things, based on your previous purchases and behaviour. They simply work by predicting the “rating” or “preference” that a user would give to an item.
We have a couple of Talend components in this space, and they work by analysing data from a preceding model component and then making that prediction. The potential use cases are many here. If you sell items to customers, buy from a supplier, or provide services, then recommendation algorithms are for you, and you can fit your recommendation use cases in here.
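As a rough illustration of ‘predicting the rating a user would give to an item’, the sketch below uses simple user-based collaborative filtering in plain Python. This is an assumption chosen for readability and should not be read as the algorithm behind the Talend components; the users, items and ratings are invented.

```python
from math import sqrt

# Hypothetical ratings on a 1-5 scale: user -> {item: rating}
ratings = {
    "ann": {"kettle": 5, "toaster": 4, "blender": 1},
    "bob": {"kettle": 4, "toaster": 5, "blender": 2, "juicer": 5},
    "cat": {"kettle": 1, "toaster": 2, "blender": 5, "juicer": 1},
}


def cosine(a, b):
    """Cosine similarity of two users over the items both have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = sqrt(sum(b[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b)


def predict_rating(user, item):
    """Similarity-weighted average of other users' ratings for the item."""
    weighted, total = 0.0, 0.0
    for other, their_ratings in ratings.items():
        if other == user or item not in their_ratings:
            continue
        similarity = cosine(ratings[user], their_ratings)
        weighted += similarity * their_ratings[item]
        total += similarity
    return weighted / total if total else None


print("predicted rating:", round(predict_rating("ann", "juicer"), 2))
```

Because ann's tastes resemble bob's far more than cat's, bob's high rating for the juicer dominates the prediction.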
Finally, regression analysis is a statistical process for estimating the relationships among variables. It includes various techniques for modelling and analysing several variables at once, where the focus is on the relationship between a dependent variable and one or more independent variables. This means regression analysis can be used to analyse your data in various ways, and the results are then used to build a model.
Within Talend we have one component, the tModelEncoder component, which is used to do just that. It performs various featurisation operations that transform data into the format expected by the Talend model training components. As regression can be used to find the relationship between variables, it can be used in a number of business use cases.
Examples could be the relationship between temperature and sales, which is useful if you sell air conditioning units. You could look for relationships between advertising spend and sales through various channels: which ones generate the best sales? In life sciences you could find relationships between various drugs and clinical outcomes. The possibilities are wide and varied, but the bottom line is this: if you have data in which you think there may be relationships between variables, then regression analysis can help you find and quantify those relationships. From that you can build a predictive model which you can use to help your business.
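The temperature and sales example can be quantified with ordinary least squares, the standard way of fitting a straight line to two variables. The sketch below is plain Python with made-up figures and a hypothetical `fit_line` helper; it only illustrates the idea of finding a relationship and then reusing it as a predictive model.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both left unnormalised).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical daily observations: temperature in Celsius, units sold.
temperatures = [20, 25, 30, 35]
units_sold = [12, 19, 31, 38]

slope, intercept = fit_line(temperatures, units_sold)
forecast = slope * 32 + intercept  # reuse the fitted line as a simple predictive model
print(f"about {slope:.1f} extra units per degree; forecast at 32C: {forecast:.1f}")
```

The slope is the quantified relationship (extra units sold per degree), and the fitted line doubles as the predictive model.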
Talend and machine learning
As we have seen, there are a number of components that fit into four main areas. Each of those areas covers a large number of use cases that you could apply within your business. Although we have mentioned a few use cases, there are always more. The point is to give some examples so that you can think about your own use cases, and about how Talend’s Machine Learning components can help your company bridge the gap between business, IT, and data scientists to seamlessly deploy critical machine learning models.