This post is co-authored by Hari Umapathy, Lead Data Engineer at Beachbody and Aarthi Sridharan, Sr.Director of Data (Enterprise Technology) at Beachbody.
Beachbody is a leading provider of fitness, nutrition, and weight-loss programs that deliver results for our more than 23 million customers. Our 350,000 independent “coach” distributors help people reach their health and financial goals.
The company was founded in 1998, and has more than 800 employees. Digital business and the management of data is a vital part of our success. We average more than 5 million monthly unique visits across our digital platforms, which generates an enormous amount of data that we can leverage to enhance our services, provide greater customer satisfaction, and create new business opportunities.
Building a Big Data Lake
One of our most important decisions with regard to data management was deploying Talend’s Real Time Big Data platform about two years ago. We wanted to build a new data environment, including a cloud-based data lake, that could help us manage the fast-growing volumes of data and the growing number of data sources. We also wanted to glean more and better business insights from all the data we are gathering, and respond more quickly to changes.
We are planning to gradually add at least 40 new data sources, including our own in-house databases as well as external sources such as Google Ads, Google Marketing Platform, Facebook, and a number of other social media sites.
We have a process in which we ingest data from the various sources, store the data that we ingested into the data lake, process the data and then build the reporting and the visualization layer on top of it. The process is enabled in part by Talend’s ETL (Extract, Transform, Load) solution, which can gather data from an unlimited number of sources, organize the data, and centralize it into a single repository such as a data lake.
We already had a traditional, on-premise data warehouse, which we still use, but we were looking for a new platform that could work well with both cloud and big data-related components, and could enable us to bring on the new data sources without as much need for additional development efforts.
The Talend solution enables us to execute new jobs again and again when we add new data sources to ingest in the data lake, without having to code each time. We now have a practice of reusing the existing job via a template, then just bringing in a different set of parameters. That saves us time and money, and allows us to shorten the turnaround time for any new data acquisitions that we had to do as an organization.
The Results of Digital Transformation
For example, whenever a business analytics team or other group comes to us with a request for a new job, we can usually complete it over a two-week sprint. The data will be there for them to write any kind of analytics queries on top of it. That’s a great benefit.
The new data sources we are acquiring allow us to bring all kinds of data into the data lake. For example, we’re adding information such as reports related to the advertisements that we place on Google sites, the user interaction that has taken place on those sites, and the revenue we were able to generate based on those advertisements.
We are also gathering clickstream data from our on-demand streaming platform, and all the activities and transactions related to that. And we are ingesting data from the Salesforce.com marketing cloud, which has all the information related to the email marketing that we do. For instance, there’s data about whether people opened the email, whether they responded to the email and how.
Currently, we have about 60 terabytes of data in the data lake, and as we continue to add data sources we anticipate that the volume will at least double in size within the next year.
Getting Data Management in Shape for GDPR
One of the best use cases we’ve had that’s enabled by the Talend solution relates to our efforts to comply with the General Data Protection Regulation (GDPR). The regulation, a set of rules created by the European Parliament, European Council, and European Commission that took effect in May 2018, is designed to bolster data protection and privacy for individuals within the European Union (EU).
We leverage the data lake whenever we need to quickly access customer data that falls under the domain of GDPR. So when a customer asks us for data specific to that customer we have our team create the files from the data lake.
The entire process is simple, making it much easier to comply with such requests. Without a data lake that provides a single, centralized source of information, we would have to go to individual departments within the company to gather customer information. That’s far more complex and time-consuming.
When we built the data lake it was principally for the analytics team. But when different data projects such as this arise we can now leverage the data lake for those purposes, while still benefiting from the analytics use cases.
Looking to the Future
Our next effort, which will likely take place in 2019, will be to consolidate various data stores within the organization with our data lake. Right now different departments have their own data stores, which are siloed. Having this consolidation, which we will achieve using the Talend solutions and the automation these tools provide, will give us an even more convenient way to access data and run business analytics on the data.
We are also planning to leverage the Talend platform to increase data quality. Now that we’re increasing our data sources and getting much more into data analytics and data science, quality becomes an increasingly important consideration. Members of our organization will be able to use the data quality side of the solution in the upcoming months.
Beachbody has always been an innovative company when it comes to gleaning value from our data. But with the Talend technology we can now take data management to the next level. A variety of processes and functions within the company will see use cases and benefits from this, including sales and marketing, customer service, and others.
About the Authors:
Hari Umapathy is a Lead Data Engineer at Beachbody working on architecting, designing and developing their Data Lake using AWS, Talend, Hadoop and Redshift. Hari is a Cloudera Certified Developer for Apache Hadoop. Previously, he worked at Infosys Limited as a Technical Project Lead managing applications and databases for a huge automotive manufacturer in the United States. Hari holds a bachelor’s degree in Information Technology from Vellore Institute of Technology, Vellore, India.