Epsilon Streamlines a Legacy Database

Talend Enterprise Data Integration builds a demographic, compiled list file from multiple formats.
Talend Open Studio for Data Integration worked very well for us, but as we scaled our use of it we were confronted with issues around teamwork, centralization of the projects and code, etc. We decided to invest in Talend Enterprise Data Integration, and it was well worth the investment.
Aaron Dix, Senior Technical Manager, Data Engineering

Leading marketing services firm

Founded in 1969, Epsilon is the industry's leading marketing services firm. Ad Age ranks it the #1 U.S. Marketing Services Firm and the #1 U.S. Direct Marketing Agency. Comprising three divisions - Purple@Epsilon, Marketing Technology, and Epsilon Targeting (including Abacus) - Epsilon combines the world's most extensive collections of consumer and business data with world-class creative and proven techniques to maximize marketing success for clients worldwide. Services include strategic consulting, database and loyalty technology, proprietary data, predictive modeling and a full range of creative and interactive services including brand and promotional development, web design, email deployment, search engine optimization and direct mail production. Because Epsilon is a full-service marketing company, its clients can take advantage of a complete range of services, or choose among many online or offline solutions. In addition, Epsilon is the world's largest permission-based email marketer.

Epsilon's problem was typical for a company that aggregates data for its clients. Data arrived in many different formats and required a lot of tedious hand coding. They needed to streamline the integration process.

Interfacing with legacy systems

Epsilon had prior experience in-house with proprietary tools and realized that they didn't want to be tied to a closed solution's restrictions as they took a new project forward. "We wanted something that was more formally focused on data integration," said Aaron Dix, Senior Technical Manager of the Data Engineering group. While cost was a contributing factor in the choice of an open source solution, Epsilon found that, of the products they tested, Talend was one of the easiest to use. Because in-house programs used previously were written in Java or Perl, the developers were already at home with the technology. "We quickly noted that Talend outperformed some of the other products we were testing," said Aaron Dix. "However, the determining factor was that the project involved legacy code that we needed to integrate into our build solutions. With its ease-of-use of external applications through the system, or through the Java drivers, Talend allows us to easily interface with external processes. Basically, we've replaced or overlaid a lot of legacy technology with Talend and it's much easier to maintain."

Building a demographic database

Epsilon's project entailed building a compiled list file with geographic and demographic attributes - age, ethnicity, occupation, income, etc. - appending more than 800 attributes. This data arrives in over a dozen different formats and is put into the database after processing through Talend. "It's a very large database," explained Aaron Dix, "containing around 430 million records. We could distribute the work among several servers, but it was time consuming and required a lot of manual processing."

"€œInitially the data is imported in flat files that are usually fixed or delimited - ASCII, DOS, or UNIX," continued Aaron Dix. "We actually use the Alterian Integrated Marketing Platform on the back end. Talend handles the integration aspect in between. Typically, Talend also performs file retrieval; it's got a very efficient set of components that integrates well into what we're doing."

"€œThe code generation approach is also a plus,"€ continued Aaron Dix. "€œWe can verify a job by looking at the generated code and then adapt it to our needs. We can also write code that is then invoked from within our Talend processes. By adding user routines we create components or import them from Talend's extensive library. And we can reuse roughly 80% of it just by calling the routine we developed."€

From Talend Open Studio for Data Integration to Talend Enterprise Data Integration

Epsilon used Talend Open Studio for Data Integration, the GPL product, for quite some time before deciding to subscribe to Talend Enterprise Data Integration, the enterprise solution. "Talend Open Studio for Data Integration worked very well for us," said Aaron Dix, "but as we scaled our use of it we were confronted with issues around teamwork, centralization of the projects and code, etc. We decided to invest in Talend Enterprise Data Integration, and it was well worth the investment." Beyond value-added features for larger projects, the Talend Enterprise Data Integration subscription also includes technical support and IP indemnification.

"€œIf you work on many different systems,"€ continued Aaron Dix, "€œeven for testing, the product is very efficient. Instead of manually exporting your code over to many different systems, Talend Enterprise Integration Data lets you launch that code and test it on different systems from a single repository. It also facilitates reusability and makes teamwork pretty seamless."

Talend Enterprise Data Integration is also in use elsewhere in the company. "We have a sandbox for ad hoc projects," explained Aaron Dix. "Talend lets us take external data formats and quickly create a database where we can parse and analyze data. We currently have 10 licenses and are considering more."

Going forward with open source technology

Epsilon is currently considering testing Talend Enterprise Data Integration MPx Edition, Talend's latest solution based on FileScale, a breakthrough technology that allows organizations to conduct highly parallelized data processing. Talend Enterprise Data Integration MPx Edition enables enterprises to process large volumes of data in even less time, while also eliminating the limitations inherent in traditional data integration architectures. "Our data sets are quite large and growing rapidly," said Aaron Dix, "and we're always interested in ways to process faster and increase parallelization. Talend Enterprise Data Integration MPx Edition sounds like an interesting solution."