Talend Step-by-Step: Continuous Data Matching & Machine Learning with Microsoft Azure
Today, almost everyone has big data, machine learning and cloud at the top of their IT “to-do” list. The importance of these technologies can’t be overemphasized as all three are opening up innovation, uncovering opportunities and optimizing businesses.
Machine learning isn’t a brand-new concept. Simple machine learning algorithms date back to the 1950s, though today they are applied to large-scale data sets and applications.
Today, I want to run through some step-by-step videos that will teach you how to use Talend’s machine learning capabilities with Microsoft Azure to pinpoint errors in large data sets, so the data can be cleansed before it enters the analytics pipeline.
Machine Learning in Practice:
Machine learning techniques bring tremendous opportunity to better target customers and improve operations. Yet data-driven insights are only as good and as trusted as the data going into them. Let’s jump into how to use Talend’s simple and automated machine learning approach to match a very high volume of data, ultimately accomplishing what’s called continuous matching.
In the first video above, we learned how to set up the initial machine learning matching process. We started with a pairing exercise in order to pre-analyze the data sets and ultimately create a set of sample pairs that can be sent to a Data Stewardship user. The pairing exercise we go through on the sample helps us build our machine learning scenario.
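To make the pairing idea concrete outside of Talend Studio, here is a minimal Python sketch. It is not Talend’s implementation: the blocking rule, the field names (`first_name`, `last_name`, `zip`) and the score thresholds are all assumptions chosen for illustration. It groups records by a blocking key, scores the pairs inside each block, and keeps only the uncertain ones, which are the kind of sample pairs you would hand to a data steward for labeling.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def blocking_key(record):
    # Hypothetical blocking rule: first letter of the last name plus zip code.
    return (record["last_name"][:1].lower(), record["zip"])


def similarity(a, b):
    # Simple string similarity on the full name (an assumption, not a Talend rule).
    name_a = f'{a["first_name"]} {a["last_name"]}'.lower()
    name_b = f'{b["first_name"]} {b["last_name"]}'.lower()
    return SequenceMatcher(None, name_a, name_b).ratio()


def suspect_pairs(records, low=0.6, high=0.98):
    # Only compare records that share a blocking key, to avoid an all-pairs scan.
    blocks = defaultdict(list)
    for record in records:
        blocks[blocking_key(record)].append(record)

    pairs = []
    for block in blocks.values():
        for a, b in combinations(block, 2):
            score = similarity(a, b)
            # Keep pairs that are similar but not near-identical: these need a human label.
            if low <= score < high:
                pairs.append((a["id"], b["id"], round(score, 2)))
    return pairs


customers = [
    {"id": 1, "first_name": "Jon", "last_name": "Smith", "zip": "10001"},
    {"id": 2, "first_name": "John", "last_name": "Smith", "zip": "10001"},
    {"id": 3, "first_name": "Mary", "last_name": "Jones", "zip": "94105"},
]

pairs = suspect_pairs(customers)
for left_id, right_id, score in pairs:
    print(left_id, right_id, score)
```

In this toy data set, only records 1 and 2 share a blocking key, so they form the single suspect pair that would go out for review.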
Using Continuous Matching on Machine Learning:
Now that we have our algorithm set and our rules in place, let’s learn how to accomplish continuous matching and continuously feed new customer data through matching models to produce new suspect duplicates and unique data records.
At the end of our quick, two-video tutorial, we’ve built a continuous cycle of pairing, matching and updating data. That cycle grows our Elasticsearch search index, which we view through Kibana, and ultimately grows our ability to match the new data sets we run through it.
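The cycle described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not what the Talend job does: the `MatchIndex` class is a hypothetical in-memory stand-in for the search index, a fixed similarity threshold stands in for the trained matching model, and the `name` field is invented for the example. Each incoming record is either flagged as a suspect duplicate of something already known or added to the index as a new unique record.

```python
from difflib import SequenceMatcher


class MatchIndex:
    """Hypothetical in-memory stand-in for a search index of known customers."""

    def __init__(self, threshold=0.85):
        self.records = []
        self.threshold = threshold  # assumed cut-off; a trained model would replace this

    def classify(self, record):
        # Find the most similar known record by name.
        best_score, best_match = 0.0, None
        for known in self.records:
            score = SequenceMatcher(
                None, record["name"].lower(), known["name"].lower()
            ).ratio()
            if score > best_score:
                best_score, best_match = score, known

        if best_match is not None and best_score >= self.threshold:
            return "suspect_duplicate", best_match

        # Unique records grow the index, so later batches are matched against them.
        self.records.append(record)
        return "unique", None


index = MatchIndex()
incoming = [
    {"id": 1, "name": "John Smith"},
    {"id": 2, "name": "Jon Smith"},
    {"id": 3, "name": "Mary Jones"},
]

results = [index.classify(record)[0] for record in incoming]
print(results)
```

Because unique records are written back to the index, each new batch is compared against everything accepted so far, which is the "continuous" part of continuous matching.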
Want to give this a go on your own? You can try Talend Cloud free for 30 days and put this exact scenario into practice on your data. Questions? Tweet me @MarkBalk or catch me on the road during our Craft Beer and Data Series, where we will be in your town talking data and…drinking beer, of course!