Real-time Risk Assessment Engine POC

Talend Big Data and Machine Learning Cookbook


In this example, an online bank is trying to mitigate its exposure and risk by targeting credit offers only to customers who are deemed low risk and most likely to accept the offer. Using Web APIs and machine learning, this example uses a decision tree model to determine, at login, whether to display a specific credit offer or no offer at all.
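To make the idea concrete, here is a minimal sketch of how a trained decision tree turns a customer's attributes into an offer decision at login. The feature names (`credit_score`, `debt_ratio`), thresholds, and offer name are hypothetical and are not taken from the Talend jobs. In the sandbox, the tree is learned from data by the tDecisionTreeModel component rather than written by hand.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Leaf:
    """Terminal node: the offer to show, or None for 'no offer'."""
    offer: Optional[str]


@dataclass
class Split:
    """Internal node: go left if customer[feature] <= threshold, else right."""
    feature: str
    threshold: float
    left: "Node"
    right: "Node"


Node = Union[Leaf, Split]


def predict(node: Node, customer: dict) -> Optional[str]:
    """Walk from the root to a leaf and return that leaf's offer."""
    while isinstance(node, Split):
        node = node.left if customer[node.feature] <= node.threshold else node.right
    return node.offer


# Hypothetical hand-written tree; feature names and thresholds are illustrative only.
TREE = Split(
    feature="credit_score", threshold=650,
    left=Leaf(None),  # higher risk: show no offer
    right=Split(
        feature="debt_ratio", threshold=0.35,
        left=Leaf("premium_card_offer"),  # low risk: show the offer
        right=Leaf(None),
    ),
)

if __name__ == "__main__":
    print(predict(TREE, {"credit_score": 720, "debt_ratio": 0.2}))  # premium_card_offer
    print(predict(TREE, {"credit_score": 600, "debt_ratio": 0.1}))  # None
```

Each login only needs one root-to-leaf walk, which is why a tree model is cheap enough to evaluate while the page is loading.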


Figure: Sandbox Real-time Risk Assessment schema


Machine Learning

Use the power of Talend's machine learning capabilities to build a decision tree model.


Real-time / Spark Streaming

Use Spark to run your streaming jobs with Apache Kafka.



Use different database models: Cassandra (NoSQL) and MySQL (relational).



Access the Real-time Risk Assessment use case portal from the sandbox loading page for quick-run directions and an interactive web interface.

Open Talend Studio within the sandbox environment. For this example, we will be working in the RealTimeRiskAssessmentEngine folder found in the Repository view. We will explore jobs in the Standard, Big Data Batch, and Big Data Streaming Job Designs. When ready to begin, follow the steps below:

  1. Navigate to the RealTimeRiskAssessmentEngine folder under Standard jobs. Run job Step_01_SetupMarketingCampaignsEnv. This job initializes the demo environment based on the selected Big Data Platform. Specifically, it loads the data into HDFS and into a NoSQL database for quick data retrieval.
  2. Navigate to the RealTimeRiskAssessmentEngine folder under Big Data Batch jobs. Run job Step_02_Train_MarketingCampaignData. This job uses a historical dataset to train a decision tree model with Talend’s tDecisionTreeModel component.
  3. Optional: Navigate to the RealTimeRiskAssessmentEngine folder under Big Data Batch jobs. Run job Step_02bis_Test_MarketingCampaignData. This job tests the trained model on a separate dataset. Its results compare correct predictions against incorrect ones, such as false positives. In machine learning terminology, this summary of prediction results on a classification problem is called a confusion matrix (or error matrix).
  4. Navigate to the RealTimeRiskAssessmentEngine folder under Big Data Streaming jobs. Run job Step_03_RealtimeConversionPrediction. This job predicts, in real time, which ad to display to the user.
  5. Navigate to the RealTimeRiskAssessmentEngine folder under Standard jobs. Run jobs Step_04_AdService and Step_05_LoginService. These jobs provide the Web APIs used by the Real-time Risk Assessment web portal and allow you to test the results.
  6. With the web services running, navigate to or reload the Real-time Risk Assessment portal page. Fill in the form on the web page and look at the ad that is displayed. This example provides a database of around 1,500 users: log in with an ID from 0 to 1547 and look at the result. For most users, no ad is displayed, but for a select few the page indicates that a targeted marketing ad will be displayed for the identified user. For example, log in with ID 569 to see the indication of a targeted marketing ad. If you log in twice with the same user ID, the decision is displayed immediately the second time, because targeted ads are stored as they are determined.
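The confusion matrix produced by the optional test job in step 3 can be illustrated with a short sketch. The function below tallies true and false positives and negatives for a binary "accepts the offer" label and derives accuracy from them. The label encoding (1 = accepts, 0 = declines) and the sample values are hypothetical and are not output from Step_02bis_Test_MarketingCampaignData.

```python
from collections import Counter
from typing import Dict, Sequence


def confusion_matrix(actual: Sequence[int], predicted: Sequence[int]) -> Dict[str, int]:
    """Count outcomes for a binary classifier (1 = positive class, 0 = negative)."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    counts = Counter(zip(actual, predicted))
    return {
        "TP": counts[(1, 1)],  # predicted accept, actually accepted
        "FP": counts[(0, 1)],  # predicted accept, actually declined
        "FN": counts[(1, 0)],  # predicted decline, actually accepted
        "TN": counts[(0, 0)],  # predicted decline, actually declined
    }


def accuracy(cm: Dict[str, int]) -> float:
    """Share of predictions that were correct."""
    total = sum(cm.values())
    return (cm["TP"] + cm["TN"]) / total if total else 0.0


if __name__ == "__main__":
    actual = [1, 0, 0, 1, 0, 0, 1, 0]
    predicted = [1, 0, 1, 0, 0, 0, 1, 0]
    cm = confusion_matrix(actual, predicted)
    print(cm)            # {'TP': 2, 'FP': 1, 'FN': 1, 'TN': 4}
    print(accuracy(cm))  # 0.75
```

For a credit offer, false positives (offers shown to customers who decline or are higher risk) are usually the costly cell, so it is worth reading that count directly rather than relying on accuracy alone.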
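The behavior described in step 6, where a second login with the same user ID shows its decision immediately, is consistent with storing each decision the first time it is computed. Below is a minimal sketch of that pattern under stated assumptions: `score_user` is a hypothetical stand-in for the model call, and an in-memory dict stands in for whatever store the sandbox actually uses to keep the targeted ads.

```python
from typing import Callable, Dict, Optional


class DecisionStore:
    """Remember each user's ad decision so repeat logins skip the model call."""

    def __init__(self, score_user: Callable[[int], Optional[str]]):
        self._score_user = score_user  # hypothetical model-scoring function
        self._decisions: Dict[int, Optional[str]] = {}
        self.model_calls = 0

    def decision_for(self, user_id: int) -> Optional[str]:
        # Check membership rather than using .get(), so a stored "no offer" (None)
        # is also reused instead of triggering another model call.
        if user_id not in self._decisions:
            self.model_calls += 1
            self._decisions[user_id] = self._score_user(user_id)
        return self._decisions[user_id]


def fake_score(user_id: int) -> Optional[str]:
    """Hypothetical scorer that targets only user 569, echoing the example in step 6."""
    return "targeted_ad" if user_id == 569 else None


if __name__ == "__main__":
    store = DecisionStore(fake_score)
    print(store.decision_for(569))  # targeted_ad (computed)
    print(store.decision_for(569))  # targeted_ad (reused)
    print(store.decision_for(12))   # None
    print(store.model_calls)        # 2
```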


This example highlights the use of machine learning and Spark to provide immediate insight and decision processing. We used a decision tree model to decide, for each customer at login, whether to target them with a specific marketing campaign.