Talend Big Data and Machine Learning Cookbook
In this example, an online bank wants to reduce its exposure and risk by targeting credit offers only to customers who are deemed low risk and most likely to accept the offer. Using Web APIs and machine learning, this job uses a decision tree model to determine, at login, whether to display a specific credit offer or no offer at all.
NoSQL / SQL
Use different database models: Cassandra, MySQL
Access the Real-time Risk Assessment use case portal from the sandbox loading page for quick-run directions and an interactive web interface.
Open Talend Studio within the sandbox environment. For this example, we will be working in the RealTimeRiskAssessmentEngine folder found in the repository view. We will explore jobs in the Standard, Big Data Batch and Big Data Streaming Job Designs. When ready to begin, follow the steps below:
- Navigate to the RealTimeRiskAssessmentEngine folder under Standard jobs. Run job Step_01_SetupMarketingCampaignsEnv. This job initializes the demo environment based on the selected Big Data Platform. Specifically, it loads the data into HDFS and into a NoSQL database for quick data retrieval.
- Navigate to the RealTimeRiskAssessmentEngine folder under Big Data Batch jobs. Run job Step_02_Train_MarketingCampaignData. This job uses a previous dataset to train a decision tree model using Talend’s tDecisionTreeModel component.
- Optional: Navigate to the RealTimeRiskAssessmentEngine folder under Big Data Batch jobs. Run job Step_02bis_Test_MarketingCampaignData. The results of this job show how the model's correct predictions compare with its errors, such as false positives. In machine learning terminology, this is called the confusion matrix (or error matrix): a summary of prediction results on a classification problem. This job tests the trained model on a separate dataset.
- Navigate to the RealTimeRiskAssessmentEngine folder under Big Data Streaming jobs. Run job Step_03_RealtimeConversionPrediction. This job will predict, in real-time, the ad to display to the user.
- Navigate to the RealTimeRiskAssessmentEngine folder under Standard jobs. Run jobs Step_04_AdService and Step_05_LoginService. These jobs provide the Web APIs for the Real-time Risk Assessment web portal and allow you to test the results.
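To make the confusion matrix from the optional test step more concrete, here is a minimal sketch in plain Python. It is not the code the Talend job runs; the label lists are made-up illustration data, and the function names `confusion_matrix` and `accuracy` are hypothetical helpers, not part of any Talend or Spark API.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count (actual, predicted) label pairs for a binary classifier."""
    counts = Counter(zip(actual, predicted))
    return {
        "tp": counts[(1, 1)],  # offer predicted, offer accepted
        "fp": counts[(0, 1)],  # offer predicted, offer not accepted
        "fn": counts[(1, 0)],  # no offer predicted, but would have been accepted
        "tn": counts[(0, 0)],  # no offer predicted, not accepted
    }


def accuracy(cm):
    """Share of all predictions that were correct."""
    total = sum(cm.values())
    return (cm["tp"] + cm["tn"]) / total if total else 0.0


# Hypothetical labels: 1 = customer accepted the offer, 0 = did not.
actual = [1, 0, 0, 1, 0, 1, 0, 0]
predicted = [1, 0, 1, 1, 0, 0, 0, 0]

cm = confusion_matrix(actual, predicted)
print(cm)
print(accuracy(cm))
```

Reading the four counts side by side shows whether the model's errors are mostly false positives (offers shown to the wrong customers) or false negatives (missed offers), which a single accuracy figure hides.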
With the Web Services running, navigate to or reload the Real-time Risk Assessment portal page. Fill in the form on the web page and look at the ad result that is displayed. This example provides a database of around 1500 users. Log in with an id from 0 to 1547 and look at the result. For most users, no ad is displayed. For a select few, however, you will see an indication that a targeted marketing ad will be displayed for the identified user. For example, log in with id 569 to see the indication of a targeted marketing ad. If you log in twice with the same user id, the resulting decision is displayed immediately the second time, because targeted-ad decisions are stored along the way.
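The repeat-login behavior described above can be pictured as a simple store-then-reuse lookup. The sketch below is only an illustration under stated assumptions: the class `AdDecisionService`, the in-memory dictionary, and the `toy_predict` rule are all hypothetical stand-ins, whereas the demo itself stores decisions in its own database and uses the trained model.

```python
class AdDecisionService:
    """Return an ad decision per user, computing it at most once."""

    def __init__(self, predict):
        self._predict = predict  # stand-in for a call to the trained model
        self._decisions = {}     # stand-in for the demo's decision store
        self.model_calls = 0     # counts how often the model was consulted

    def decide(self, user_id):
        # Only consult the model when no decision has been stored yet.
        if user_id not in self._decisions:
            self.model_calls += 1
            self._decisions[user_id] = self._predict(user_id)
        return self._decisions[user_id]


def toy_predict(user_id):
    # Hypothetical rule so the example runs; NOT the trained decision tree.
    return user_id == 569


service = AdDecisionService(toy_predict)
first = service.decide(569)   # computed by the (toy) model
second = service.decide(569)  # served from the stored decision
print(first, second, service.model_calls)
```

The second call never reaches the prediction function, which is why a repeated login can show its result without waiting for a new prediction.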
This example highlights the use of machine learning and Spark to provide immediate insight and decision processing. We made a decision on targeting marketing campaigns to specific customers using a Decision Tree Model.
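For readers new to decision trees, the following sketch shows the core idea behind a single tree node: choosing the threshold on one feature that best separates the two classes, measured by Gini impurity. It is a simplified illustration, not the implementation behind tDecisionTreeModel. The feature (`ages`), the labels, and the helper names are assumptions invented for this example.

```python
def gini(labels):
    """Gini impurity of a list of binary (0/1) labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)


def best_threshold(values, labels):
    """Return (threshold, weighted_gini) for the split `value <= threshold`."""
    best = (None, float("inf"))
    # The largest value is skipped: splitting there leaves the right side empty.
    for t in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (t, score)
    return best


# Hypothetical training data: customer age and whether the offer was accepted.
ages = [22, 25, 31, 38, 45, 52]
accepted = [0, 0, 0, 1, 1, 1]

threshold, impurity = best_threshold(ages, accepted)
print(threshold, impurity)
```

A full decision tree repeats this search across all features and then recursively on each resulting subset until a stopping rule is met.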