Standardization of Customer Data – Hidden gems in Talend Component Palette

  • Nikhil Thampi
    Nikhil Thampi is a Customer Success Architect at Talend whose core expertise is in data integration, database and data warehousing technologies. He has more than 12 years of IT experience, during which he has helped create technical solutions for customers in different parts of the globe. His areas of interest also include Cloud, Containers, Big Data, Data Governance and Machine Learning technologies. He is passionate about teaching and increasing awareness of Talend among IT developers, and he is one of the top contributors to the Talend Community site.

If you were a banker, would you like to hear that your company’s name is in the headlines of every news channel because the bank accidentally delivered a credit card to a scammer instead of the genuine customer? Unfortunately, this scenario can occur when address records are not in a standard format and a letter is delivered to a similar-looking address. The issue can arise not only in banks but also in government institutions, hospitals and, for that matter, any customer-facing organization.

  

In this era of data privacy and high corporate scrutiny, every customer-friendly institution wants to stay in the good books of its customers. Because postal address, email and telephone are the three major channels of customer interaction for any institution, the data quality of these attributes deserves the utmost care.

 

Data quality and standardization of these three interaction channels is a crucial step towards long-term customer satisfaction. We have already discussed a scenario involving address standardization. Similarly, if phone numbers or email addresses are incorrect, SMS messages and emails bounce, and companies cannot deliver the right offers and recommendations at the right time.

In this blog, I would like to highlight some of the hidden gems of the Talend component Palette. Talend developers sometimes overlook them because the importance of data quality and standardization is easy to underestimate. I hope the ideas in this blog will help them plan customer data standardization tasks more efficiently.

 

Address Standardization

Address standardization is one of the most important aspects of ensuring data privacy for a customer. The most popular customer merge rule used by companies is based on the customer’s name, address and date of birth. Many customers share common names that differ only by a minor element such as Junior or II. On non-essential websites, customers also tend to provide arbitrary values, such as 1st January of some year, as their date of birth. As a result, the address becomes a crucial parameter for identifying the correct customer and building a 360-degree view of that customer.

 

Talend helps with address standardization by providing components that integrate with the Experian QAS, Loqate, MelissaData and Google address standardization services. The components available in the Talend Palette are shown below.

Let us look at a quick address validation scenario. In this example, we use the tMelissaDataAddress component to standardize the input address information.

The input address details are shown below.

The MelissaData component verifies and enriches the data, and produces the standardized address output shown below.

   

Standardized address records speed up the address match process, which in turn reduces the overall time required for customer de-duplication. These components also help to identify wrong or non-existent addresses at an early stage of the data processing flow. Processing can take place either through real-time API calls to the cloud or in batch mode, depending on the Talend component and the chosen address standardization vendor.
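To illustrate why standardized records match faster, here is a minimal Python sketch. It is not the logic of any Talend component or vendor service, which verify addresses against postal reference data. The suffix table and the function name address_key are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical, deliberately tiny abbreviation table (an assumption for
# illustration); real services rely on full postal reference data.
SUFFIXES = {"STREET": "ST", "AVENUE": "AVE", "ROAD": "RD", "BOULEVARD": "BLVD"}


def address_key(address):
    """Build a crude comparison key from a free-text address."""
    # Upper-case the text and turn punctuation into spaces.
    cleaned = re.sub(r"[^\w\s]", " ", address.upper())
    # Abbreviate known street suffixes and collapse repeated whitespace.
    tokens = [SUFFIXES.get(token, token) for token in cleaned.split()]
    return " ".join(tokens)


# Two spellings of the same address reduce to the same key.
print(address_key("12 Main Street, Apt. 4") == address_key("12 MAIN ST APT 4"))
```

Once both records reduce to the same key, de-duplication can use a plain equality check instead of comparing every pair of free-text addresses.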

Other address lookup components, such as tGoogleMapLookup and tGoogleGeoCoder, also help Talend developers to identify an address from geographical coordinates (latitude and longitude) and vice versa.

The detailed component specifications and associated scenarios for each vendor are available from the links below.

 

Vendor      | Talend Component Reference Link
------------|-------------------------------------
Experian    | Experian QAS address standardization
Google      | Google address standardization
Loqate      | Loqate address standardization
MelissaData | MelissaData address standardization

 

A cloud version of Talend address data standardization is also available, with two components in this category. The cloud components can process data either in real time or in batch mode.

 

Email Standardization

Email has become the primary medium of communication for customers in the digital age. As a result, most companies now scrutinize and validate email addresses more closely.

Email standardization is achieved in Talend with the tVerifyEmail component, which verifies and formats email addresses against patterns and regular expressions. The component can also whitelist or blacklist specific email domains based on business requirements.
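For readers who want to see the idea outside Talend, the following Python sketch combines a regular expression check with a domain blacklist. It is only an illustration, not the tVerifyEmail implementation. The pattern, the blocked domain and the status labels (VALID, INVALID, REJECTED) are assumptions made for this example.

```python
import re

# Deliberately simple pattern (an assumption): local part, "@", then a
# domain with at least one dot. Real-world rules are considerably richer.
EMAIL_PATTERN = re.compile(r"^[A-Za-z0-9._%+-]+@([A-Za-z0-9-]+\.)+[A-Za-z]{2,}$")

# Hypothetical blacklist supplied by the business.
BLOCKED_DOMAINS = {"mailinator.com"}


def check_email(raw, blocked=BLOCKED_DOMAINS):
    """Return (formatted_email, status) for one input value."""
    # Formatting step: trim whitespace and lower-case the address.
    email = raw.strip().lower()
    if not EMAIL_PATTERN.match(email):
        return email, "INVALID"
    # Domain rule: reject addresses whose domain is on the blacklist.
    domain = email.rsplit("@", 1)[1]
    if domain in blocked:
        return email, "REJECTED"
    return email, "VALID"


for value in ["  John.Doe@Example.COM ", "john.doe@example", "x@mailinator.com"]:
    print(check_email(value))
```

A whitelist works the same way with the test inverted: accept the address only when its domain appears in the allowed set.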

 

A sample scenario for email verification is shown below.

The input data for the sample scenario was entered in the inline table of the input component.

 

In this example, instead of regular expression verification, the contents of the name column are used to create validation rules, as shown below.

 

The emails were validated and categorized into multiple groups, along with suggested emails, as shown below.

 

The detailed specification of the properties and usage of this component can be found here.

 

Phone Standardization

Standardization of phone numbers is another basic requirement for customer interactions and for the match process. Nearly every customer now uses a mobile phone, so invalid or junk phone numbers become a headache when companies try to reach their customers. Talend helps with telephone number standardization through the tStandardizePhoneNumber component. The telephone number can be standardized to one of the formats below.

  1. E164
  2. International
  3. National
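To make the three formats concrete, here is a small Python sketch that renders a ten-digit North American number in each of them. It is a simplified illustration, not what tStandardizePhoneNumber does internally. The function name is hypothetical, only the digit count is checked, and real validation needs per-country numbering plan data.

```python
import re


def standardize_us_phone(raw):
    """Render a North American number in three formats, or return None.

    Assumption: the input is a US/Canada number with an optional
    leading country code 1. Only the number of digits is checked.
    """
    digits = re.sub(r"\D", "", raw)  # drop everything except digits
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # strip the country code
    if len(digits) != 10:
        return None
    area, exchange, line = digits[:3], digits[3:6], digits[6:]
    return {
        "E164": f"+1{digits}",
        "International": f"+1 {area}-{exchange}-{line}",
        "National": f"({area}) {exchange}-{line}",
    }


print(standardize_us_phone("415.555.2671"))
print(standardize_us_phone("+1 (415) 555-2671"))
print(standardize_us_phone("12345"))
```

E164 is the compact form best suited to storage and matching, while the International and National forms are meant for display.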

A sample scenario for telephone number standardization is shown below.

 

Telephone numbers in various formats have been added inline to the input flow, as shown below.

The telephone standardization component verifies the data and provides the standardized format. It also provides Boolean values that indicate whether the input value is a valid phone number, a possible phone number or already in standard form, along with the phone number type and any error message.

The detailed specification of the properties and usage of this component can be found here.

 

First Name Standardization

The last data standardization component I would like to discuss in this blog is tFirstNameMatch, which helps to standardize the first name of a customer. The process can be explained with the quick example shown below.

 

The sample names have been added inline to the input component, as shown below.

In the above example, the details are filtered so that only name and gender are sent. It is also possible to select the columns that contain the gender or the country, which optimizes system performance and gives more precise results. You can also use the Fuzzy Logic option for precise results.
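As a rough picture of what fuzzy matching against a reference list of first names looks like, here is a Python sketch built on the standard library difflib module. It is a stand-in technique and not the algorithm or index used by tFirstNameMatch. The reference names, the function name and the 0.8 cutoff are assumptions made for this example.

```python
from difflib import get_close_matches

# Hypothetical reference list; a real reference index is far larger.
REFERENCE_NAMES = ["Michael", "Jonathan", "Catherine", "Nikhil"]


def standardize_first_name(raw, reference=REFERENCE_NAMES, cutoff=0.8):
    """Return the closest reference name, or None if nothing is close."""
    candidate = raw.strip().title()
    if candidate in reference:
        return candidate  # already in standard form
    # Fuzzy lookup: keep the single best match scoring at least `cutoff`.
    matches = get_close_matches(candidate, reference, n=1, cutoff=cutoff)
    return matches[0] if matches else None


for name in ["micheal", "Jonathon", " nikhil ", "Zzz"]:
    print(repr(name), "->", standardize_first_name(name))
```

Restricting the reference list by gender or country before the lookup shrinks the candidate set, which is one way to read the performance and precision benefit mentioned above.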

The output of the flow is shown below, where the standardized name details are provided for further use.

The detailed specification of the properties and usage of this component can be found here.

 

Conclusion

Standardizing the customer name, address, phone number and email helps customer data matching and de-duplication in a big way during later stages of the data processing flow. The above components also spare Talend developers from writing time-consuming logic within tMap using various regular expression rules. So the next time a business user asks a Talend designer or developer whether they can standardize customer interaction data, they can answer with confidence and a smile: I can do it! 😊

 
